Document Organisation Part 4 – Document Management System

Overview

An update to my overall document management is the use of a Document Management System (DMS). It provides an easily searchable repository with additional metadata such as tags, without having to fudge it using file names, yet at its core it can still be browsed using files and folders.
Paperless NGX dashboard

Paperless-NGX

Paperless-ngx is an open source project that was forked from Paperless-ng, itself a fork of the original Paperless. It is a self-hosted solution that may not suit everyone. The benefit of using something like this over files and folders is the management and search capabilities.

The killer productivity improvements include:

In part 2, I used an app called DropIt to rename and move files to a folder. I ended up storing that folder in cloud storage so documents were synced and accessible remotely, albeit without Windows indexing to help search for files. With the web interface and mobile app, I can upload files from anywhere and Paperless NGX will process them ready for review.

User management means I can share documents with other people safely, and webhooks give me control over events such as when documents are uploaded and ready for review.

The real winning automation is the email integration. It can pull in emails and/or their attachments automatically, saving the swivel-chair step of getting a file out of email and into files and folders.

The trade-off is losing a files-and-folders structure that does not rely on any software. I can roughly retain the structure I had before using the configuration in Paperless NGX, but the way it adds tags to the file name is not ideal.

Setup

The installation is beyond the scope of this post; however, the custom configuration I made is detailed here.

In docker, I set:
PAPERLESS_FILENAME_FORMAT={created_year}/{created_month}/{created} +{tag_list} - {title}
This will set up and store files in the same year/month/date, tags and title structure. However, the tags are not as well formatted as with the manual method.
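
As a rough illustration of where a file lands, the sketch below builds the path by hand for one made-up document. The date, tag and title values are hypothetical and the .pdf extension is an assumption; Paperless NGX does the real rendering itself.

```shell
# Hypothetical sample values, not taken from a real document.
created="2024-03-15"
tags="bills"
title="Electricity statement"

# Split the ISO date into the year and month folder names.
created_year="$(printf '%s' "$created" | cut -d- -f1)"
created_month="$(printf '%s' "$created" | cut -d- -f2)"

# Same layout as the format string: year/month/date +tags - title
stored_path="${created_year}/${created_month}/${created} +${tags} - ${title}.pdf"
echo "$stored_path"
```

With these sample values the script prints 2024/03/2024-03-15 +bills - Electricity statement.pdf.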

Migration

I manually uploaded the files and, as Paperless can upload multiple files at once, it was not a big hassle. You could also set up a shared consumption folder, which Paperless watches for new files and then uploads and processes them in the same way.

As I had a lot of files to upload, I made the following tweaks in docker:
PAPERLESS_TASK_WORKERS=8
PAPERLESS_THREADS_PER_WORKER=2

This allows a total of 16 threads, i.e. 8 documents processed at the same time with 2 threads each, utilising half of the 32 threads available on my AMD 3950X. I removed these settings once everything was uploaded because day-to-day use does not need that much parallel processing.
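
The arithmetic can be sketched as a small shell helper. It assumes the goal is for workers multiplied by threads per worker to equal half of the CPU threads, matching the numbers above; the function name suggest_workers is made up for illustration.

```shell
# Hypothetical helper: how many task workers fit if the workers' threads
# should add up to half of the CPU threads.
suggest_workers() {
  cpu_threads="$1"          # e.g. 32 on a 16-core, 32-thread CPU
  threads_per_worker="$2"   # value planned for PAPERLESS_THREADS_PER_WORKER
  echo $(( cpu_threads / 2 / threads_per_worker ))
}

suggest_workers 32 2   # prints 8, so 8 workers x 2 threads = 16 threads
```

On Linux, nproc reports the thread count to feed in as the first argument.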

It still took several hours to review the documents and create the tags, correspondents, etc., but it’s worth it.

I would also advise batching the uploads because, as you add metadata such as tags and correspondents to documents, Paperless NGX will learn and attempt to add this information automatically for you. At that point, you can review, change if necessary and move on.
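
If you go the consumption folder route, the batching can be scripted. This is a minimal sketch, assuming the scans are PDFs sitting in one folder; move_batch and both paths in the usage comment are hypothetical, and the destination must be the host folder you mapped to the Paperless NGX consumption directory.

```shell
# Move up to $3 PDF files from folder $1 into consumption folder $2
# and print how many files were moved.
move_batch() {
  src="$1"
  dest="$2"
  limit="$3"
  count=0
  for f in "$src"/*.pdf; do
    [ -e "$f" ] || continue                 # no PDFs: the glob stays literal
    if [ "$count" -ge "$limit" ]; then
      break                                 # batch is full
    fi
    mv -- "$f" "$dest"/
    count=$((count + 1))
  done
  echo "$count"
}

# Hypothetical usage: feed 20 documents, review them, then run it again.
# move_batch /home/me/scans /myconsumefolder 20
```

Reviewing each batch before sending the next gives the automatic matching more examples to learn from.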

Paperless Tagging screen

Document Review

Documents are automatically processed and added to the inbox ready for review. The processing can OCR the document and apply metadata such as tags, correspondent, date of the letter, etc.

Email Integration

The email integration is one of the best parts of this. Paperless checks every 10 minutes for new emails with a document attachment and automatically uploads them to Paperless NGX. Once ready, they are available in the inbox for review.

Paperless email integration

There are various rules available which can be applied to the email integration such as from, to, subject and body filters.

Home Assistant

Using the Paperless NGX scripting feature, I call a webhook trigger in a Home Assistant automation whenever a new document has been processed.

Create a volume where all Paperless NGX scripts will be held and executed:
/myscriptfolder:/usr/src/paperless/scripts

Set a variable to execute this script inside the above folder after a document has been processed:
PAPERLESS_POST_CONSUME_SCRIPT=/usr/src/paperless/scripts/homeassistant.sh

Note that the path is the container path, not the host path. The script file in this case is called homeassistant.sh.

My example of the homeassistant.sh script is here.

The file contains all the available variables at the time.
curl -X POST http://homeassistant.local:8123/api/webhook/5dc5fc04-365e-4834-97e9-c6967bda3909 \
Change the address after POST to match your Home Assistant webhook, keeping the trailing backslash (\).

-H 'Authorization: Bearer [replaceme:]' \
Generate a long-lived access token in Home Assistant and use it to replace [replaceme:], including the brackets. Keep the closing apostrophe and the backslash at the end.

The last line is the data in JSON format. If you need to add or remove any information, this is the line to change.
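
Putting those pieces together, a cut-down version of such a script might look like the sketch below. It is an illustration rather than the linked file: HA_WEBHOOK_URL and HA_TOKEN are placeholder variable names introduced here, the JSON keys are arbitrary, and only four of the document variables that Paperless NGX passes to post-consume scripts are used.

```shell
#!/bin/sh
# Sketch of a post-consume script. HA_WEBHOOK_URL and HA_TOKEN are
# hypothetical names; set them to your own webhook address and token.

# Build the JSON body from variables Paperless NGX sets for the script.
# Values are inserted as-is, so a value containing a double quote would
# produce invalid JSON.
build_payload() {
  printf '{"id": "%s", "file_name": "%s", "correspondent": "%s", "tags": "%s"}' \
    "$DOCUMENT_ID" "$DOCUMENT_FILE_NAME" "$DOCUMENT_CORRESPONDENT" "$DOCUMENT_TAGS"
}

# Only call Home Assistant when a webhook address has been configured.
if [ -n "${HA_WEBHOOK_URL:-}" ]; then
  curl -X POST "$HA_WEBHOOK_URL" \
    -H "Authorization: Bearer ${HA_TOKEN}" \
    -H 'Content-Type: application/json' \
    -d "$(build_payload)"
fi
```

The script also has to be executable, for example by running chmod +x /myscriptfolder/homeassistant.sh on the host.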

An example of the Home Assistant automation can be found here.

Summary

The system has definitely made searching for documents a lot easier. The mobile app is a bonus for finding a document quickly on the go. The next step is data archiving / data retention.

About Danny

I.T software professional always studying and applying the knowledge gained, and one way of doing this is to blog. Danny also participates in a part-time project called Energy@Home [http://code.google.com/p/energyathome/] for monitoring energy usage on a premises. Dedicated to I.T since studying pure Information Technology from the age of 16, Danny Tsang is working in the field that he has aimed for since leaving school.
