Document Organisation Part 1 – Physical To Virtual Documents

Overview

This is the first post in a series where I describe the electronic document management system I use. It’s a system that works fairly well for me, and I hope I can pass on this information and, even better, improve it.

This part covers getting physical (usually paper) documents into an electronic format. In my opinion this is the hardest part to do well, efficiently and for as little money as possible.

Devices

Something needs to do the conversion from paper to electronic. A scanner is ideal because its sole purpose is to capture paper documents electronically. Just like cameras, scanners have come down in price to the point where they are available to almost all consumers, even at budget level, built into things like printers.

Flatbed scanners are great at capturing flat documents and isolating them from any background noise; however, they only work one page at a time.

A scanner with an Automatic Document Feeder (ADF) solves this. A stack of paper is loaded onto a tray and, similar to a printer, the feeder passes one sheet at a time into the scanner.

Next in the order of best-suited devices is the camera. Cameras have become so good, in some cases even beating the resolution a scanner can produce, that people use them to capture copies of documents. They can’t remove background distractions, but they are portable and readily available.
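To put the resolution comparison into numbers, here is a rough sketch in Python. It assumes an A4 page that exactly fills the frame of a hypothetical 12 megapixel camera (4000 x 3000 pixels); neither figure comes from a specific device, and a real photo will rarely fill the frame this neatly, so treat the result as an upper bound.

```python
def effective_dpi(pixels_long, pixels_short, page_long_in, page_short_in):
    """Return the dots per inch achieved when the page exactly fills the frame.

    The lower of the two axis values is returned because that axis limits
    how much detail is captured.
    """
    dpi_long = pixels_long / page_long_in
    dpi_short = pixels_short / page_short_in
    return min(dpi_long, dpi_short)


# A4 is 297 mm x 210 mm; convert to inches (25.4 mm per inch).
A4_LONG_IN = 297 / 25.4
A4_SHORT_IN = 210 / 25.4

# Assumed example camera: 4000 x 3000 pixels (12 megapixels).
camera_dpi = effective_dpi(4000, 3000, A4_LONG_IN, A4_SHORT_IN)
print(round(camera_dpi))  # prints 342
```

For comparison, 300 DPI is a common setting for scanning documents, so under these assumptions the phone is in the same range.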

The problem with cameras is the variety of factors that make document capture less than ideal. For example, if the photo is taken at an angle the document can look skewed, and bad lighting (such as using the flash) can spoil the image. All of these make the result look unprofessional.
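As a small illustration of the skew problem, the sketch below estimates how far a page is rotated in a photo from the pixel positions of its two top corners. The function name and the corner coordinates are made up for the example, and it only covers in-plane rotation; the perspective distortion you get from shooting at an angle needs a more involved correction.

```python
import math


def skew_angle_degrees(top_left, top_right):
    """Return the angle of the page's top edge relative to horizontal.

    Points are (x, y) pixel coordinates with y increasing downwards,
    as is usual for images. 0 means the top edge is perfectly level.
    """
    dx = top_right[0] - top_left[0]
    dy = top_right[1] - top_left[1]
    return math.degrees(math.atan2(dy, dx))


# Hypothetical corner positions picked out of a photo.
angle = skew_angle_degrees((120, 80), (1120, 130))
print(f"Top edge is tilted by {angle:.1f} degrees")
```

Rotating the image by the opposite angle would level the top edge again.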

Optical Character Recognition (OCR)

OCR has been around for decades and has improved so much that computers can even defeat “captcha” boxes (images containing text that you are asked to type into a box). On mobile it seems to be roughly 99% accurate, whereas on the computer it still lags behind. This became more obvious when I tested different software.

One benefit of OCR’ing documents is that they can be indexed and searched. The other is that their text can be copied and pasted. In PDF format the actual text is overlaid on top of the image of the document, making it look like an editable document.

Whilst OCR is a great addition, for now it is a nice-to-have rather than a requirement.
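To show what "indexed and searchable" can mean in practice, here is a minimal sketch in Python of a word index built from OCR output. The file names and text are invented for the example, and it assumes the OCR step has already produced plain text for each document; it is not how any particular OCR product works.

```python
import re
from collections import defaultdict


def build_index(documents):
    """Map each lower-cased word to the set of documents containing it."""
    index = defaultdict(set)
    for name, text in documents.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(name)
    return index


def search(index, query):
    """Return the documents that contain every word in the query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Hypothetical OCR output keyed by file name.
ocr_text = {
    "gas_bill_jan.pdf": "Gas bill for January. Amount due: 42.10",
    "bank_letter.pdf": "Your bank statement for January is enclosed",
}

index = build_index(ocr_text)
print(search(index, "gas january"))  # prints {'gas_bill_jan.pdf'}
```

Without the OCR text there would be nothing to index, which is why an image-only scan can only be found by its file name.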

Camera Capture

With the above two as the main methods, I went for the camera option. A typical modern dedicated scanner (a Doxie, for example) costs from around £100 upwards. The ideal ones have an ADF and double-sided (duplex) scanning.

It would be ideal for the device to be cross-platform compatible, but the main aim was to get it working on Microsoft Windows. As I did not want to spend that sort of money up front, the camera seemed like the best option.

On Android I got CamScanner, which can OCR and apply automatic enhancements to the image. The results can later be exported to other places such as Google Docs. It can also capture multiple images to form one document. It keeps the original image, so if the enhancement applied to the document turns out not to work, you can go back to the original and redo or tweak the enhancements.

Summary

Without spending the money needed for dedicated hardware, the mobile phone option works fairly well. You have to be careful that it doesn’t eat all the space on your device, and that your documents are exported somewhere else in case you lose your phone.

About Danny

I.T software professional, always studying and applying the knowledge gained; one way of doing this is to blog. Danny also participates in a part-time project called Energy@Home [http://code.google.com/p/energyathome/] for monitoring energy usage on a premises. Dedicated to I.T since studying pure Information Technology from the age of 16, Danny Tsang is working in the field that he has aimed for since leaving school.
