PDF-XChange - Tracker PDF Viewer - TIFF-XChange - Image-XChange - XMF-XChange - Raster-XChange - Support


Topic Author
Posts: 6
Joined: Fri Nov 30, 2018 10:26 am

OCR struggles to recognise "4" in this typewritten text

Thu Dec 06, 2018 1:43 pm


I've been comparing OCR results for scanned documents. In PDF-XChange, the Enhanced Scanned Pages OCR is generally accurate, but one error that stands out is its very frequent failure to recognise the number 4 in some typewritten pages from the 1980s. Page and chapter numbers, and text like "45 degrees", all suffer. (Also, the degree symbol seems to be recognised better when the resolution is lower!)

I've attached a sample page in case it helps you evaluate any future OCR changes you're making in PDF-XChange. It makes no difference if I align the text first, and very little difference to the OCR whether I use the original greyscale or convert it to black and white.
Site Admin
Posts: 1492
Joined: Wed Jan 03, 2018 6:52 pm

Re: OCR struggles to recognise "4" in this typewritten text

Thu Dec 06, 2018 6:15 pm

Hello sjm8,

This appears to be due to the nonstandard form of the number. While the human eye can easily discern that it is the number 4, a computer is trained to consider all the possibilities and then takes whichever it thinks is most likely (in my tests I saw ")4", "h", "L", and "l+"). As our OCR developers are now focusing on our new OCR engine, I must inform you that it is unlikely this will see much attention. When the new engine (which is much more robust) comes out, it should hopefully be able to interpret this correctly; if not, we will certainly take another look at what is causing the issue.
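
As a rough illustration of that "most likely candidate" behaviour, here is a minimal Python sketch. It is not how our OCR engine is implemented; the function name and the confidence scores are made up purely for the example.

```python
# Hypothetical confidence scores for one typewritten glyph.
# The numbers are invented for illustration, not taken from any real OCR engine.
candidates = {"4": 0.21, "h": 0.27, "L": 0.19, ")4": 0.18, "l+": 0.15}


def pick_most_likely(scores):
    """Return the candidate string with the highest confidence score."""
    return max(scores, key=scores.get)


best = pick_most_likely(candidates)
print(best)  # the unusual glyph shape lets a wrong candidate outscore "4"
```

With scores like these, a wrong candidate narrowly beats the correct "4", which is the kind of result I was seeing in my tests.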

For now, I apologize for the inconvenience.

Kind Regards,
Daniel McIntyre
Support Technician
Tracker Software Products (Canada) LTD

Sales: +1 (250) 324-1621
Fax: +1 (250) 324-1623
