Some documents' OCR results are excellent. This one's are not. Why? What would help?
I suspect that the original document is of too poor quality.
Poor OCR results
-
- User
- Posts: 5
- Joined: Wed Jan 04, 2012 5:47 pm
Poor OCR results
- Attachments
-
- test document.pdf
- The top is the text that had the OCR process applied, the bottom is the result.
- (161.43 KiB) Downloaded 327 times
-
- User
- Posts: 2393
- Joined: Wed Jan 18, 2006 12:10 pm
Re: Poor OCR results
Hi SteepleChase.
The OCR functionality is a nice piece of software that has been added to the PDF-XChange Viewer Pro.
I ran several tests with it and found that the "Accuracy" parameter does not really do what it is supposed to do. When you set it to "Low" or "Medium" the result is fairly good. When you set it to "High" the result is rather bad.
A test with your "test document.pdf" confirms this.
So, to the people of Tracker-Software: if you could turn this into something that gives the best combination of "Low" and "Medium" accuracy, it would be almost perfect. Thanks already to all of you for the effort so far!
- Paul - Tracker Supp
- Site Admin
- Posts: 6897
- Joined: Wed Mar 25, 2009 10:37 pm
- Location: Chemainus, Canada
- Contact:
Re: Poor OCR results
Hi, we have the document and I'll pass it on to the OCR lead developer when he returns from his Christmas holidays next week. We'll provide feedback here.
hth
Best regards
Paul O'Rorke
Tracker Support North America
http://www.tracker-software.com
Re: Poor OCR results
Just out of interest, I ran the "test document.pdf" through PDF-XChange Viewer's OCR (each of the three levels) and ABBYY FineReader 9. You can see that FineReader's results are not perfect either. The original is too blurred.
Wilfried
- Attachments
-
- test document ocr.zip
- (1.03 KiB) Downloaded 273 times
- Paul - Tracker Supp
- Site Admin
- Posts: 6897
- Joined: Wed Mar 25, 2009 10:37 pm
- Location: Chemainus, Canada
- Contact:
Re: Poor OCR results
Hi wilfriedh,
Thanks for the input. That's quite interesting to see. FineReader did do a better job, but it is also a $169.99 product that has had years of marketplace trial. This is a free OCR that is in its first release.
We will keep working on this. hth
Best regards
Paul O'Rorke
Tracker Support North America
http://www.tracker-software.com
-
- User
- Posts: 381
- Joined: Mon Jun 13, 2011 5:10 pm
Re: Poor OCR results
Hi,
The recommended accuracy setting for most documents is "medium"; in some cases the trade-off between speed and accuracy makes it worthwhile to use "low", which is faster but slightly more error-prone. High accuracy should generally be used for high-resolution documents with small text; for general use with typical scanned documents (letters, forms, etc.) it may end up performing worse than medium. We have left it up to the end users to determine which setting is best for their specific document.
Also, the input document you provided is fairly low resolution. If you "zoom in" you can see a lot of things that typically cause problems for OCR: the letters are poorly delineated and often touch their neighbours quite significantly. A higher scanning resolution should resolve this.
We are continually developing our OCR functionality and our products in general, so your feedback is greatly appreciated.
-Walter
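To put a rough number on the resolution point above, here is a small sketch of how many pixels a line of printed text gets at different scan resolutions. It relies only on the standard typographic point of 1/72 inch; the function name `text_height_px` and the sample point size and resolutions are illustrative assumptions, and none of this reflects how the PDF-XChange Viewer OCR engine works internally.

```python
# Rough sketch: how many pixels tall is printed text at a given scan resolution?
# Assumes the standard typographic point of 1/72 inch. The function name and
# the sample values are illustrative, not part of PDF-XChange Viewer.

POINTS_PER_INCH = 72


def text_height_px(point_size: float, dpi: float) -> float:
    """Approximate pixel height of the font's em box when scanned at `dpi`."""
    return point_size * dpi / POINTS_PER_INCH


if __name__ == "__main__":
    # Compare a few common scanner settings for 10 pt body text.
    for dpi in (75, 150, 300, 600):
        height = text_height_px(10, dpi)
        print(f"10 pt text at {dpi} dpi: about {height:.1f} px tall")
```

Doubling the scan resolution doubles the pixel height (and width) available to each letter, which is why letters that touch or blur together in a low-resolution scan are more likely to come out separated in a rescan at a higher setting.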
The recommended accuracy setting for most document is "medium"; in some cases the trade-off between speed and accuracy makes it worthwhile to use "low", which is faster but slightly more error-prone. High accuracy should generally be used for high resolution documents with small text; for general use with typical scanned documents (letters, forms, etc) it may end up performing worse than medium. We have left it up to the end users to determine which method is best for their specific document.
Also, the input document you provided is fairly low resolution. If you "zoom in" you can see that there are a lot of things that typically cause problems for OCR - poor delineation of letters and letters often contact their neighbours quite significantly. A higher scanning resolution should resolve this.
We are continually developing our OCR functionality and our products in general, so your feedback is greatly appreciated.
-Walter