How to configure OCR (learning by doing)

Discussion for the End User use of OCR in PDF-XChange Editor and Viewer


Peter2
User
Posts: 946
Joined: Mon Sep 13, 2010 10:09 am
Location: Switzerland

How to configure OCR (learning by doing)

Post by Peter2 »

I tried OCR on a PDF exported from AutoCAD. (The SHX fonts are embedded not as fonts but as vector elements.) The text was generated by the CAD program, not written by hand.

The first result looks good, but it does not recognize small characters like "t". Example:


Real Text: Dieser Schaltungsfall-Steckerpunkt ist im angegebenen Relaissatz
OCR Result: Dieser Schal†ungsfall-S†eckerpunk† is† im angegebenen Relaissa†z

Real Text:  ist zu kontrollieren,
OCR Result: is† zu konlrollieren,

Is there a way to configure this?

Regards

Peter
PDF-X-Change Pro German
Tracker Supp-Stefan
Site Admin
Posts: 17824
Joined: Mon Jan 12, 2009 8:07 am
Location: London

Re: How to configure OCR (learning by doing)

Post by Tracker Supp-Stefan »

Hi Peter,

Pretty much all OCR products rely on dictionaries to match their initial findings with actual words, and some technical terms might not be present in those dictionaries. When no match is found, each character is recognized on its own, and your technical font's lowercase "t" seems to be problematic for our tool.
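
As a rough illustration of the dictionary matching described above, here is a minimal Python sketch. It is not the actual PDF-XChange implementation: the confusion table, the tiny word list, and all function names are assumptions made up for this example, and punctuation handling is left out.

```python
from itertools import product

# Hypothetical confusion table: OCR glyph -> characters it might really be.
CONFUSIONS = {"†": ["t"], "l": ["l", "t"]}

# Tiny stand-in dictionary; a real engine would use a full word list.
DICTIONARY = {"ist", "zu", "kontrollieren", "im", "angegebenen"}


def candidates(word):
    """Yield every spelling obtainable by swapping confusable glyphs."""
    options = [CONFUSIONS.get(ch, [ch]) for ch in word]
    for combo in product(*options):
        yield "".join(combo)


def correct_word(word, dictionary=DICTIONARY):
    """Return a dictionary match if one exists, else the raw OCR word."""
    if word.lower() in dictionary:
        return word
    for cand in candidates(word):
        if cand.lower() in dictionary:
            return cand
    # No dictionary match: keep the character-by-character result.
    return word


def correct_line(line):
    """Correct each whitespace-separated word independently."""
    return " ".join(correct_word(w) for w in line.split())


print(correct_line("is† zu konlrollieren"))  # ist zu kontrollieren
print(correct_word("Relaissa†z"))            # Relaissa†z (not in the word list)
```

The second call shows the failure mode from the first post: a technical term that is missing from the word list falls back to the raw per-character result, dagger included.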

The current OCR tool we provide is very basic and can't be fine-tuned. A much more advanced version of it will be available in V3. The current tool is intended to let you search through the text of your document, so for now you will need to use other words that are recognized correctly as search phrases.
In V3 of the Viewer you will be able to easily "touch up" the OCR result and fix any such errors, but not for now :(



Best,
Stefan
Peter2
User
Posts: 946
Joined: Mon Sep 13, 2010 10:09 am
Location: Switzerland

Re: How to configure OCR (learning by doing)

Post by Peter2 »

Thanks for your reply.
"A much more advanced version of it will be available in V3."
Fine. Will it be integrated in the Pro-Version or will it be an additional package?
And will the dictionaries be configurable (for example, adding your own words)?

Peter
PDF-X-Change Pro German
Walter-Tracker Supp
User
Posts: 381
Joined: Mon Jun 13, 2011 5:10 pm

Re: How to configure OCR (learning by doing)

Post by Walter-Tracker Supp »

We are going to include character whitelist / blacklist features (which would certainly help for this case), but the exact licensing is not something I am sure of. This is up to the marketing and sales guys, and I am not at liberty to speculate about it.
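
To show why a character blacklist would help in this case, here is a minimal Python sketch. It assumes a hypothetical engine that returns ranked (character, score) candidates for each glyph; the function names and the scores are invented for illustration and do not describe the planned V3 feature.

```python
def pick_char(ranked, whitelist=None, blacklist=frozenset()):
    """Pick the highest-scoring allowed candidate for one glyph.

    ranked: list of (char, score) pairs, hypothetical engine output.
    """
    allowed = [
        (ch, score) for ch, score in ranked
        if ch not in blacklist and (whitelist is None or ch in whitelist)
    ]
    if not allowed:
        return None  # nothing permitted; the caller decides what to do
    return max(allowed, key=lambda pair: pair[1])[0]


def decode(glyphs, whitelist=None, blacklist=frozenset()):
    """Decode a sequence of glyphs, using '?' where no candidate is allowed."""
    chars = (pick_char(g, whitelist, blacklist) for g in glyphs)
    return "".join(ch if ch is not None else "?" for ch in chars)


# Made-up scores for the word "ist", where the engine prefers the dagger.
glyphs = [
    [("i", 0.9), ("l", 0.1)],
    [("s", 0.95)],
    [("†", 0.6), ("t", 0.4)],
]

print(decode(glyphs))                   # is†
print(decode(glyphs, blacklist={"†"}))  # ist
```

With the dagger blacklisted, the next-best candidate "t" wins. A whitelist works the other way round by allowing only the listed characters.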