How to configure OCR (learning by doing)

Discussion for the End User use of OCR in PDF-XChange Editor and Viewer

Peter2
User
Posts: 770
Joined: Mon Sep 13, 2010 10:09 am
Location: Switzerland

How to configure OCR (learning by doing)

Post by Peter2 » Wed Oct 17, 2012 8:29 am

I tried OCR on a PDF from AutoCAD. (The SHX fonts are not embedded as fonts but as vector elements.) The text is generated by CAD, not written by hand.

The first result seems good, but it does not recognize small characters like "t". Example:

Code:

Real Text: Dieser Schaltungsfall-Steckerpunkt ist im angegebenen Relaissatz
OCR Result: Dieser Schal†ungsfall-S†eckerpunk† is† im angegebenen Relaissa†z

Real Text:  ist zu kontrollieren,
OCR Result: is† zu konlrollieren,

Is there a way to configure this?

Regards

Peter
Win 7 Prof German; PDF-X-Change Pro German

Tracker Supp-Stefan
Site Admin
Posts: 13188
Joined: Mon Jan 12, 2009 8:07 am
Location: London

Re: How to configure OCR (learning by doing)

Post by Tracker Supp-Stefan » Wed Oct 17, 2012 9:37 am

Hi Peter,

Pretty much all OCR products out there rely on dictionaries to match their initial findings with actual words, and some technical terms might not be present in those dictionaries. When no match is found, each character is recognized on its own, and your technical font's lowercase "t" seems to be problematic for our tool.
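To picture the dictionary step described above, here is a minimal Python sketch. It is not how the Tracker OCR tool is implemented; the word list, the `match_word` helper and the 0.8 cutoff are all assumptions made up for illustration. A raw OCR token is compared against a word list, and `None` stands for "no match found, fall back to character-by-character recognition".

```python
import difflib

# Hypothetical word list for illustration; a real OCR dictionary is far larger.
DICTIONARY = ["dieser", "ist", "im", "angegebenen", "relaissatz", "zu", "kontrollieren"]


def match_word(raw, dictionary=DICTIONARY, cutoff=0.8):
    """Return the closest dictionary word, or None if nothing is similar enough.

    None models the case where the tool has to trust the individual
    characters it recognized, errors included.
    """
    matches = difflib.get_close_matches(raw.lower(), dictionary, n=1, cutoff=cutoff)
    return matches[0] if matches else None


# A common word with one misread character can still be recovered.
print(match_word("konlrollieren"))
# A technical compound that is not in the word list finds no match.
print(match_word("Schal†ungsfall-S†eckerpunk†"))
```

With this toy word list, the misread "konlrollieren" is close enough to a known word to be corrected, while the technical compound has nothing to match against, so its wrongly recognized characters would stay as they are.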

The current OCR tool we provide is very basic and can't be fine-tuned. A much more advanced version of it will be available in V3. The current tool is intended to let you search through the text of your document, so for now you will need to use some of the other, correctly recognized words as search phrases.
In V3 of the Viewer you will be able to easily "touch up" the OCR result and fix any such errors, but not for now :(



Best,
Stefan

Peter2
User
Posts: 770
Joined: Mon Sep 13, 2010 10:09 am
Location: Switzerland

Re: How to configure OCR (learning by doing)

Post by Peter2 » Wed Oct 17, 2012 9:41 am

Thanks for your reply.
"A much more advanced version of it will be available in V3."
Fine. Will it be integrated in the Pro version or will it be an additional package?
And will the dictionaries be configurable (adding my own words)?

Peter

Walter-Tracker Supp
User
Posts: 383
Joined: Mon Jun 13, 2011 5:10 pm

Re: How to configure OCR (learning by doing)

Post by Walter-Tracker Supp » Wed Oct 17, 2012 8:06 pm

We are going to include character whitelist / blacklist features (which would certainly help in this case), but the exact licensing is not something I am sure of. That is up to the marketing and sales guys, and I am not at liberty to speculate about it.
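How the whitelist / blacklist will actually work in V3 is not stated here, so the following Python sketch is only one way to picture the idea. The `pick_character` helper, the candidate list and its scores are invented for illustration and do not come from the product: for each glyph the recognizer is assumed to offer several candidate characters with a score, and the filter drops candidates that are not allowed before the best one is chosen.

```python
def pick_character(candidates, whitelist=None, blacklist=frozenset()):
    """Pick the highest-scoring candidate that passes the filters.

    candidates: list of (character, score) pairs for one glyph.
    whitelist:  if given, only these characters are allowed.
    blacklist:  these characters are never allowed.
    Returns None when every candidate is filtered out.
    """
    allowed = [
        (ch, score)
        for ch, score in candidates
        if ch not in blacklist and (whitelist is None or ch in whitelist)
    ]
    if not allowed:
        return None
    return max(allowed, key=lambda pair: pair[1])[0]


# Made-up scores for the problematic lowercase "t" of the SHX font.
candidates = [("†", 0.61), ("t", 0.58), ("l", 0.22)]

print(pick_character(candidates))                   # unfiltered choice
print(pick_character(candidates, blacklist={"†"}))  # dagger forbidden
```

In this picture, blacklisting "†" would be enough to make the tool fall back to its next-best guess for every affected glyph in a document like Peter's, where the dagger never legitimately occurs.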
