Enhanced OCR

Discussion of end-user use of OCR in PDF-XChange Editor and Viewer


MarkusRei
User
Posts: 1
Joined: Mon Apr 08, 2019 5:20 pm

Enhanced OCR

Post by MarkusRei » Tue Apr 09, 2019 7:27 pm

Hi,

Enhanced OCR:

1. does not recognize the euro currency symbol "€" (you find it on *every* invoice in Europe)
2. the uppercase German umlaut "Ö" is rarely recognized
3. Europeans always write the digit "1" with an upstroke, but the engine often renders it as a slash "/", an uppercase "I", a lowercase "l", and so on
4. it seems the German dictionary works against good results. The original word was "öffentlich", but OCR turned it into "ordentlich", a completely different word with different characters.

Please improve.

Paul - Tracker Supp
Site Admin
Posts: 4884
Joined: Wed Mar 25, 2009 10:37 pm
Location: Chemainus, Canada

Re: Enhanced OCR

Post by Paul - Tracker Supp » Tue Apr 09, 2019 8:59 pm

Hi Markus, and welcome to the Tracker Forums.

Thank you for that post. We would indeed like to investigate this. Could you please send us your document for testing? If it is too large for the forum, you can email it to support@tracker-software.com.

If it is too large even for that, you can upload it to https://useruploads.tracker-software.support/

regards
_________________
If posting files to this forum, you must archive them in a ZIP, RAR or 7z file, or they will not be uploaded. Thank you.

Best regards

Paul O'Rorke
Tracker Support North America
http://www.tracker-software.com

substorm
User
Posts: 2
Joined: Tue May 28, 2019 11:57 pm

Re: Enhanced OCR

Post by substorm » Wed May 29, 2019 12:23 am

Just to add to Markus' comments, as shown below, I am also getting pretty poor OCR results even with the latest "Enhanced" version.
I have also attached the same file as PDF for your team to test on your end.
TestOCR.zip (59.29 KiB)
BEFORE OCR:
Image

AFTER OCR:
Image

TrackerSupp-Daniel
Site Admin
Posts: 2239
Joined: Wed Jan 03, 2018 6:52 pm

Re: Enhanced OCR

Post by TrackerSupp-Daniel » Wed May 29, 2019 5:22 pm

Hello Substorm,

Thank you for the examples. Looking these over, roughly 95% of the text came out correctly, and after some investigation and testing I found a few reasons for the remaining errors.

The first and most important factor is image quality. Optical Character Recognition (OCR) relies heavily on the quality of the image and works best in the 300-600 dpi range, which most modern scanners can achieve. This image is only 143 dpi; with that in mind, these results are phenomenal.
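If you want to check where a scan falls relative to that range, the effective resolution is simply the image's pixel width divided by the physical page width in inches. Here is a minimal Python sketch, assuming you know both values; the function names are hypothetical, and the 1216-pixel width on an 8.5-inch (US Letter) page is an illustrative number chosen to land near 143 dpi, not a measurement taken from the attached file.

```python
def effective_dpi(pixel_width: int, page_width_inches: float) -> float:
    """Return pixels per inch along the horizontal axis of a scanned page."""
    if page_width_inches <= 0:
        raise ValueError("page width must be positive")
    return pixel_width / page_width_inches


def in_recommended_ocr_range(dpi: float, low: float = 300.0, high: float = 600.0) -> bool:
    """True if the resolution lies within the 300-600 dpi range mentioned above."""
    return low <= dpi <= high


if __name__ == "__main__":
    # Hypothetical example: an 8.5 in wide page stored as a 1216-pixel-wide image.
    dpi = effective_dpi(1216, 8.5)
    print(round(dpi))                      # 143
    print(in_recommended_ocr_range(dpi))   # False: below the recommended range

    # The same page scanned at 2550 pixels wide works out to exactly 300 dpi.
    print(in_recommended_ocr_range(effective_dpi(2550, 8.5)))  # True
```

The check only looks at one axis; for a non-uniformly scaled image you would compute the vertical resolution the same way from the pixel height and page height.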

The second factor is the accuracy setting used for OCR: the higher the accuracy, the more likely it is that artifacts and errors will appear in "imperfect" documents. For a document below the optimal dpi range, you will almost always want to use the "low accuracy" mode when performing OCR to achieve the best possible results. Doing this did further improve the character recognition, leaving only one very minor mistake: the $ in the first column was read as a capital S.
image.png
As you can see, the remainder of the text does indeed match the original.

Finally, regarding the title bar (Apple, book, big, etc.), note that the Editor currently fully supports only black-on-white text during OCR (other color combinations can work in many cases but are not yet fully supported). This is why much of that row was missed, and why the portions that were recognized changed slightly in appearance. Once again, a higher-quality image would improve this situation significantly.

Kind regards,
Daniel McIntyre
Support Technician
Tracker Software Products (Canada) LTD

Sales: +1 (250) 324-1621
Fax: +1 (250) 324-1623

substorm

Re: Enhanced OCR

Post by substorm » Thu May 30, 2019 6:42 pm

Hi Daniel.

Sorry, I don't think "poor" was the right word. I should have said that there is some room for improvement, especially with symbols like $ and with colored backgrounds. Compared to the old version of your OCR, this enhanced release is definitely a big jump forward. I hope to see it take the podium one day by beating ABBYY.

Thanks!

TrackerSupp-Daniel

Re: Enhanced OCR

Post by TrackerSupp-Daniel » Thu May 30, 2019 6:53 pm

Hello substorm,

We too hope to see that kind of improvement in the future. There are always teething troubles with new features like this, and we are doing our best to resolve them as we go. We greatly appreciate feedback like this and will certainly use this file for future testing as we work on these features.

Kind regards,
Daniel McIntyre
