OCR worse than before

Forum for the PDF-XChange Editor - Free and Licensed Versions


DWC121
User
Posts: 66
Joined: Thu Jul 30, 2015 5:18 am

OCR worse than before

Post by DWC121 »

Greetings.

I rely on OCR working fairly well. Within the past year something changed, and OCR accuracy seems to have gone downhill. I always use the "Medium" accuracy setting.

Last year I created a pdf from a bmp file and applied OCR. The results were very accurate.

Today I added text to the same bmp, re-created a new pdf, and applied OCR. The results were terrible.

Both pdf files are attached; one labeled "NEW", the other labeled "OLD".

My version of PDF-XChange Editor is V 7.0 (Build 323.1). Could there be a plug-in I'm missing?

Thanks - David
Attachments
OLD.pdf
(276.97 KiB) Downloaded 71 times
NEW.pdf
(305.05 KiB) Downloaded 62 times
Tracker Supp-Stefan
Site Admin
Posts: 17910
Joined: Mon Jan 12, 2009 8:07 am
Location: London

Re: OCR worse than before

Post by Tracker Supp-Stefan »

Hello DWC121,

How did you create the new BMP file after you added the word in question?
I see that in the "Old" file the image is 1275 x 1643 pixels, and in the new one it is 1275 x 1650 pixels, so it is slightly taller. While this is not noticeable when looking at the file, the image does get distorted, and this affects the OCR result.
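
To put a number on that size difference, here is a minimal sketch in Python. It assumes, purely for illustration, that the 1643-row image was resampled to 1650 rows; the thread does not establish how the extra rows actually appeared. The helper name `stretch_factors` is hypothetical and is not part of PDF-XChange Editor.

```python
from fractions import Fraction

def stretch_factors(old_size, new_size):
    """Return the (horizontal, vertical) scale factors that map old_size onto new_size.

    Sizes are (width, height) tuples in pixels. Hypothetical helper for illustration.
    """
    (old_w, old_h), (new_w, new_h) = old_size, new_size
    return Fraction(new_w, old_w), Fraction(new_h, old_h)

old = (1275, 1643)  # image size reported for OLD.pdf
new = (1275, 1650)  # image size reported for NEW.pdf

sx, sy = stretch_factors(old, new)

# Unequal factors mean the aspect ratio changed, i.e. glyphs would be
# stretched in one direction only (under the resampling assumption above).
non_uniform = sx != sy
vertical_percent = (float(sy) - 1) * 100

print(f"horizontal x{float(sx):.4f}, vertical x{float(sy):.4f}")
print(f"vertical stretch of about {vertical_percent:.2f}%, non-uniform: {non_uniform}")
```

Under that assumption the width is unchanged while the height grows by 7 rows, so any scaling would be vertical only and well under one percent.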

I'd recommend that you instead take the old file and add the new word inside the PDF document. You can use e.g. the typewriter tool or, if you want it to be base content, the "Add" -> "Add Text" tool. Neither will interfere with your original image or with the OCR text layer that is already in the file.

Regards,
Stefan
DWC121
User
Posts: 66
Joined: Thu Jul 30, 2015 5:18 am

Re: OCR worse than before

Post by DWC121 »

Stefan,

I have no idea how the dimensions got changed, but I see they were.

When I applied OCR to the NEW document, the old typewritten information in the middle of the document (in the box under the phrase Catalog No and Description) did not get OCR'd.

I tried your suggestion of adding my text using the typewriter tool. That worked. Plus, since I presume it was "real text", it got OCR'd. Occasionally I have to use a photo editor to move the old typewritten text around and add new text, then make the result into a new pdf and OCR the new document. For those I might have to take a fairly blank pdf and use the typewriter tool to add all the text.

David

PS - Since PDF-XChange could not recognize some of the old typewritten text on the NEW pdf (1275 x 1650 pixels - 8-1/2" x 11"), I still think something got changed in the newer version of PDF-XChange. Maybe the new version has problems recognizing text that may be a bit blurry or lighter than other text on the page.
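
For reference, the resolution implied by those figures can be worked out with a short Python sketch. It assumes the image fills the whole 8-1/2" x 11" page with no margins or scaling, which the thread does not confirm; the function name `pixels_per_inch` is made up for this example.

```python
def pixels_per_inch(pixels, inches):
    """Resolution implied by a pixel count spread over a physical length."""
    return pixels / inches

# NEW image: 1275 x 1650 pixels on an 8.5" x 11" page (assumed full-page).
new_dpi_x = pixels_per_inch(1275, 8.5)
new_dpi_y = pixels_per_inch(1650, 11)

# OLD image: 1275 x 1643 pixels. At the same horizontal resolution its
# height covers slightly less than 11 inches.
old_height_in = 1643 / new_dpi_x

print(f"NEW: {new_dpi_x:.0f} x {new_dpi_y:.0f} pixels per inch")
print(f"OLD height at that resolution: {old_height_in:.3f} in")
```

So, under the full-page assumption, the NEW image works out to 150 pixels per inch in both directions, and the OLD image is a little short of a full 11" page at that same resolution.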
Tracker Supp-Stefan
Site Admin
Posts: 17910
Joined: Mon Jan 12, 2009 8:07 am
Location: London

Re: OCR worse than before

Post by Tracker Supp-Stefan »

Hi David,

Yes - if you use the typewriter tool, the text remains a text / vector object, and you can even move and edit it as text, as long as you do not flatten/rasterize it at some point. So there is no need to OCR it anew if the original image stays the same and you only need to add a few new characters on top.

Regards,
Stefan