I tried OCR on a PDF from AutoCAD. (The SHX fonts are not embedded as fonts but as vector elements.) The text was generated by CAD, not written by hand.
The first result looks good, but it does not recognize small characters like "t". Example:
Real Text: Dieser Schaltungsfall-Steckerpunkt ist im angegebenen Relaissatz
OCR Result: Dieser Schal†ungsfall-S†eckerpunk† is† im angegebenen Relaissa†z
Real Text: ist zu kontrollieren,
OCR Result: is† zu konlrollieren,
Pretty much all OCR products out there rely on dictionaries to match their initial findings against actual words, and some technical terms might not be present in those dictionaries. When no match is found, each character is recognized on its own, and your technical font's lowercase "t" seems to be problematic for our tool (it comes out as "†" or "l").
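To make the dictionary fallback more concrete, here is a minimal Python sketch. It is only an illustration and not how our tool (or any specific OCR product) works internally: the word list `DICTIONARY`, the confusion table `CONFUSIONS`, and the helpers `correct_token`/`correct_line` are all hypothetical. A token is kept if it is a known word; otherwise the sketch tries swapping confusable characters and accepts the first variant found in the dictionary; if there is none, the raw character-by-character result is returned unchanged.

```python
from itertools import product

# Hypothetical toy dictionary; real OCR engines ship much larger word lists.
DICTIONARY = {"dieser", "ist", "im", "angegebenen", "zu", "kontrollieren"}

# Hypothetical confusion table: glyphs assumed to be misread lowercase "t"s.
CONFUSIONS = {"†": "t", "l": "t"}


def correct_token(token: str) -> str:
    """Return a dictionary word reachable by fixing confusable characters,
    or the unchanged per-character result if no dictionary match exists."""
    core = token.rstrip(",.")   # keep trailing punctuation out of the lookup
    punct = token[len(core):]
    if core.lower() in DICTIONARY:
        return token
    # Every combination of "keep the glyph" vs. "swap in its confusion partner"
    # (exhaustive, so it grows as 2^k in the number of confusable glyphs).
    options = [(ch, CONFUSIONS[ch]) if ch in CONFUSIONS else (ch,) for ch in core]
    for combo in product(*options):
        candidate = "".join(combo)
        if candidate.lower() in DICTIONARY:
            return candidate + punct
    # No dictionary match: fall back to the character-by-character result.
    return token


def correct_line(line: str) -> str:
    return " ".join(correct_token(tok) for tok in line.split())


print(correct_line("is† zu konlrollieren,"))
print(correct_line("Dieser Schal†ungsfall-S†eckerpunk† is† im angegebenen Relaissa†z"))
```

With this toy dictionary, `is†` and `konlrollieren,` get repaired, while `Schal†ungsfall-S†eckerpunk†` and `Relaissa†z` stay broken because no dictionary entry exists for them, which is the situation described above for technical terms.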
The OCR tool we currently provide is very basic and can't be fine-tuned. A much more advanced version will be available in V3. The current tool is intended to let you search through the text of your document, so for now you will need to use other words that were recognized correctly as search phrases.
In V3 of the Viewer you will be able to easily "touch up" the OCR result and fix errors like these, but that is not possible yet.
We are going to include character whitelist/blacklist features (which would certainly help in this case), but I am not sure about the exact licensing. That is up to the marketing and sales guys, and I am not at liberty to speculate about it.
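To illustrate why a whitelist or blacklist would help here, below is a small Python sketch of the general idea: for each glyph, the recognizer is assumed to produce several candidate characters with confidence scores, and the filter simply drops disallowed candidates before picking the best one. This is not the V3 API (its details are not settled); `pick_character`, `GLYPH_CANDIDATES`, the scores, and `GERMAN_WHITELIST` are hypothetical values chosen for illustration.

```python
from __future__ import annotations

import string

# Hypothetical candidates for one "t" glyph in your CAD font; the scores are
# made up to mimic the case where "†" narrowly beats "t".
GLYPH_CANDIDATES = [("†", 0.48), ("t", 0.45), ("+", 0.07)]

# Hypothetical whitelist: Latin letters, German umlauts/ß, digits, and basic punctuation.
GERMAN_WHITELIST = set(string.ascii_letters + "äöüÄÖÜß" + string.digits + ",.-")


def pick_character(candidates: list[tuple[str, float]],
                   whitelist: set[str] | None = None,
                   blacklist: set[str] | None = None) -> str:
    """Return the highest-scoring candidate that passes the white/blacklist."""
    allowed = [(ch, score) for ch, score in candidates
               if (whitelist is None or ch in whitelist)
               and (blacklist is None or ch not in blacklist)]
    if not allowed:
        # Nothing survives the filter: fall back to the unfiltered best guess.
        allowed = candidates
    return max(allowed, key=lambda pair: pair[1])[0]


print(pick_character(GLYPH_CANDIDATES))                              # no filter
print(pick_character(GLYPH_CANDIDATES, blacklist={"†"}))             # blacklist
print(pick_character(GLYPH_CANDIDATES, whitelist=GERMAN_WHITELIST))  # whitelist
```

Without a filter the sketch returns "†"; blacklisting "†" or whitelisting only ordinary German text characters makes it fall through to "t".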