Greetings.
I rely on OCR working fairly well. Within the past year something changed and OCR accuracy seems to have gone downhill. I always use the "Medium" accuracy setting.
Last year I created a pdf from a bmp file and applied OCR. The results were very accurate.
Today I added text to the same bmp, re-created a new pdf, and applied OCR. The results were terrible.
Both pdf files are attached; one labeled "NEW", the other labeled "OLD".
My version of PDF-XChange Editor is V 7.0 (Build 323.1). Could there be a plug-in I'm missing?
Thanks - David
OCR worse than before
- Tracker Supp-Stefan
- Site Admin
- Posts: 17910
- Joined: Mon Jan 12, 2009 8:07 am
- Location: London
Re: OCR worse than before
Hello DWC121,
How did you create the new BMP file after you added the word in question?
I see that in the "Old" file the image is 1275 x 1643 pixels, and in the new one it is 1275 x 1650 pixels, so it is slightly taller. While this is not noticeable when looking at the file, the image does get distorted, and this affects the OCR result.
I'd recommend instead taking the old file and adding the new word inside the PDF document. You can use, e.g., the typewriter tool, or, if you want it to be base content, the "Add" -> "Add Text" tool. Neither will interfere with your original image or with the OCR text layer that is already in the file.
Regards,
Stefan
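The size difference between the two images can be put into numbers with a little arithmetic. The following is only a minimal sketch: it assumes both images are stretched to fill an 8.5 x 11 inch page (the page size David quotes for the NEW file later in the thread; whether the OLD PDF uses the same page size is an assumption), and the function name `effective_dpi` is hypothetical.

```python
def effective_dpi(width_px, height_px, page_w_in=8.5, page_h_in=11.0):
    """Return (horizontal, vertical) pixels per inch when the image fills the page."""
    return width_px / page_w_in, height_px / page_h_in


# Pixel dimensions reported in the thread for the two attached files.
images = {"OLD": (1275, 1643), "NEW": (1275, 1650)}

for label, (w, h) in images.items():
    x_dpi, y_dpi = effective_dpi(w, h)
    # A ratio other than 1 means the pixels are not square on the page,
    # i.e. the image is scaled slightly differently in each direction.
    print(f"{label}: {x_dpi:.2f} x {y_dpi:.2f} dpi, ratio {y_dpi / x_dpi:.4f}")
```

The two heights differ by 7 pixels, roughly 0.4%. Which file ends up stretched depends on the page size actually used in each PDF, so the page dimensions in both files are worth comparing before drawing a conclusion.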
Re: OCR worse than before
Stefan,
I have no idea how the dimensions got changed, but I see they were.
When I applied OCR to the NEW document, the old typewritten information in the middle of the document (in the box under the phrase Catalog No and Description) did not get OCR'd.
I tried your suggestion of adding my text using the typewriter tool. That worked. Plus, since I presume it was "real text", it got OCR'd. Occasionally I have to use a photo editor to move the old typewritten text around and add new text, then make the result into a new pdf and OCR the new document. For those I might have to take a fairly blank pdf and use the typewriter tool to add all the text.
David
PS - Since PDF-XChange could not recognize some of the old typewritten text on the NEW pdf (1275 x 1650 pixels - 8-1/2" x 11"), I still think something got changed in the newer version of PDF-XChange. Maybe the new version has problems recognizing text that may be a bit blurry or lighter than other text on the page.
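For the photo-editor workflow described above, one way to catch an accidental size change is to compare the pixel dimensions of the edited BMP against the original before building the new PDF. This is a rough sketch, not part of PDF-XChange: it reads the width and height directly from the BMP header using only the Python standard library, and assumes the common BITMAPINFOHEADER layout. The function name `bmp_size` and the file names in the usage note are hypothetical.

```python
import struct


def bmp_size(path):
    """Return (width, height) in pixels read from a BMP file's header."""
    with open(path, "rb") as f:
        header = f.read(26)
    if len(header) < 26 or header[:2] != b"BM":
        raise ValueError(f"{path} does not look like a BMP file")
    # Assumption: BITMAPINFOHEADER layout, where width and height are
    # little-endian signed 32-bit integers at byte offsets 18 and 22.
    # Older OS/2-style headers store these fields differently.
    width, height = struct.unpack_from("<ii", header, 18)
    # A negative height marks a top-down bitmap; the pixel count is its magnitude.
    return width, abs(height)
```

For example, `bmp_size("old_scan.bmp") == bmp_size("edited_scan.bmp")` should hold if the photo editor kept the canvas size unchanged.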
- Tracker Supp-Stefan
Re: OCR worse than before
Hi David,
Yes, if you use the typewriter tool, the text remains a text / vector object, and you can even move and edit it as text as long as you do not flatten/rasterize it at some point. So there is no need to OCR it anew if the original image stays the same and you only need to add a few new characters on top.
Regards,
Stefan