Document Organisation Part 3 – Optical Character Recognition

Filed Under Windows, Workflow at 6th June 2016 0:01 by Danny

Tags: ABBYY, document, Evernote, FreeOCR, ocr, Omnipage, PDF, Scan, Scanner

Overview

I wanted the recognised text from my scans to be embedded in the PDF document itself, which the PDF format supports. Other solutions I've seen store a copy of the text in a separate text file with the same name as the scan, which seems clunky. So I set out to find the best OCR software for PDFs, ideally one that would fit into my workflow.

Test Capture

The first thing to tackle was getting something from the physical world into an electronic format. As usual, this was done in collaboration with Dave Bradford, my Apple counterpart. He used an app called PDFPen, which has OCR built in; unfortunately, it is Mac/iOS software only. Our benchmark image for testing OCR capabilities was a pub leaflet, scanned at 300 DPI on the flatbed scanner of an Epson multifunction printer.

PDFPen handled the OCR extremely well, picking up the title at the top, including the text overlaid on the green background. This led me on a quest to find something on Windows as good as, if not better than, PDFPen. I tried the following OCR software, in no particular order:

FreeOCR
ABBYY FineReader
Omnipage Ultimate
Evernote

None of them lived up to PDFPen, even though I believe PDFPen uses Omnipage's OCR engine. The main problem for the Windows software I tested was the text on the green background, which PDFPen managed to pick up. Here's the sample from PDFPen.

Summary

I was unable to match PDFPen's functionality on the Windows side. I tested all of them using trial versions on a Windows 7 computer. The tests were brief, so I did not tweak any settings to see whether the OCR accuracy could be improved, but then PDFPen didn't need any tweaking either.

I'd be interested in hearing from anyone who has solved this problem on Windows. The solution must be able to run from the command line so that it fits into my workflow.

About Danny

IT software professional, always studying and applying the knowledge gained; one way of doing this is to blog. Danny also participates in a part-time project called Energy@Home [http://code.google.com/p/energyathome/] for monitoring energy usage on a premises. Dedicated to IT since studying pure Information Technology from the age of 16, Danny Tsang is working in the field he has aimed for since leaving school.