OCR accuracy V6 vs V7

Forum for the PDF-XChange Editor - Free and Licensed Versions


DWC121
User
Posts: 66
Joined: Thu Jul 30, 2015 5:18 am


Post by DWC121

Greetings - I'm using two versions of PDF-XChange Editor. The one on my desktop is V7.0 build 326.0; my portable copy is V6.0 build 317.1. The portable copy is older, but its OCR is way more accurate. Examples from both are attached. Both PDFs were created from the same BMP file.

I'm almost sure I have all the settings the same for both (Lang: English, Accuracy: Medium, Output type: Preserve Original Content and Add Text Layer, Quality: 300). Is there another setting in my V7 I need to adjust to increase the accuracy? Is there a "hidden" file somewhere (maybe in my V6) that I can import into V7 to increase the accuracy?

A few weeks ago I tried darkening and sharpening another bmp, but the result in V7 was still sub-par compared to V6.

Thanks - David

I'm using Windows 7 Home Premium edition
Attachments
V6-0 Build 317-1.pdf
(241.71 KiB)
V7-0 Build 326-0.pdf
(214.05 KiB)
Timur Born
User
Posts: 874
Joined: Tue Jun 26, 2012 1:50 pm

Re: OCR accuracy V6 vs V7

Post by Timur Born

Has the "Medium" OCR accuracy setting been fixed yet? Quite some time ago it was announced that the whole OCR engine would get an overhaul, but I never heard anything more about it.
Willy Van Nuffel
User
Posts: 2393
Joined: Wed Jan 18, 2006 12:10 pm

Re: OCR accuracy V6 vs V7

Post by Willy Van Nuffel

Hello,

Tracker Software introduced a renewed OCR feature a few releases ago (starting from 7.0.325):
"Enhance Scanned Pages", found in the Document menu (Classic Toolbars UI) or in the Convert tab (Ribbon UI).

I suppose this one will replace the current "OCR Page(s)..." feature over time.

See:
viewtopic.php?f=65&t=30794
viewtopic.php?f=62&t=30797

Regards.
TrackerSupp-Daniel
Site Admin
Posts: 8588
Joined: Wed Jan 03, 2018 6:52 pm

Re: OCR accuracy V6 vs V7

Post by TrackerSupp-Daniel

Hello DWC121,

Thank you for those comparison files. Could you try the new "Enhance Scanned Pages" function in the latest build and let us know whether the results are better?

Thanks for the forum references Willy :)
Timur, Willy is correct: the "Enhance Scanned Pages" option is the OCR overhaul that was mentioned. Its development is still ongoing, so we still offer both options to users.
Dan McIntyre - Support Technician
Tracker Software Products (Canada) LTD

+++++++++++++++++++++++++++++++++++
Our website domain and email address have changed as of 26/10/2023.
https://www.pdf-xchange.com
Support@pdf-xchange.com
DWC121
User
Posts: 66
Joined: Thu Jul 30, 2015 5:18 am

Re: OCR accuracy V6 vs V7

Post by DWC121

Daniel,

I tried the new "Enhance Scanned Pages" function and the results are much better. The original image is on the light side (attached as PDF TEST.jpg). Note that when I create a PDF I import a BMP file, not a JPG file.

The "Enhance Scanned Pages" function has several options. In my case, I found that the following settings made no difference:
"Color/Grayscale" = JPG, default quality
"Color/Grayscale" = JPG, "Small size"
"Color/Grayscale" = JPG, "High Quality"
"Color/Grayscale" = JPG2000, default quality
"Color/Grayscale" = JPG2000, "Small size"
"Color/Grayscale" = JPG2000, "High Quality"
Even though my image is monochrome, I left that setting at JBIG2.
For all of the above, I left the "Filter" options at their default settings (On, Off, Low, Low).

I then tried the above JPG combinations with "Filters - Text Sharpening" set to Medium and again to High. Both gave satisfactory results, and the OCR output differed only minimally between them. These files are attached. The High setting made the text much darker (as expected). The Medium setting kept the tone of the black text very close to the original, which I prefer since the PDF might be printed later; I need the PDF to look close to the original BMP file.

It is great that there are many options for applying OCR. I just hope people don't get set in their ways with one particular set of options, only to find out after applying OCR to hundreds of documents that "I should have set another option differently".

David
Attachments
TEST 08 Default Quality JPEG Text sharpen high.pdf
(79.91 KiB)
TEST 07 Default Quality JPEG Text sharpen med.pdf
(71.14 KiB)
PDF TEST.jpg
Tracker Supp-Stefan
Site Admin
Posts: 17906
Joined: Mon Jan 12, 2009 8:07 am
Location: London

Re: OCR accuracy V6 vs V7

Post by Tracker Supp-Stefan

Hello David,

Glad to hear you are now happy with the results!

And many thanks for sharing your findings and samples!
I am sure they will be useful to others as well!

Cheers,
Stefan