Enhanced OCR quality tuning

Discussion for the End User use of OCR in PDF-XChange Editor and Viewer


Puffolino
User
Posts: 317
Joined: Wed Feb 09, 2011 1:06 pm

Enhanced OCR quality tuning

Post by Puffolino »

Hi, I tried to OCR some pages and was surprised that the engine very often replaces certain common characters with special characters. Another point is that the result can show many different font faces and font sizes within a single paragraph; maybe an additional filter could reduce the number of fonts used to a minimum.

The example below also shows that it is not easy to decide whether the medium or high quality setting should be used. The first page (high quality) shows nearly perfect text but eliminates an image; page two (medium quality) shows multiple errors in text content and placement, along with some font variations.
Attachments
OCR example result.pdf
(221.19 KiB) Downloaded 143 times
TrackerSupp-Daniel
Site Admin
Posts: 8436
Joined: Wed Jan 03, 2018 6:52 pm

Re: Enhanced OCR quality tuning

Post by TrackerSupp-Daniel »

Hi, Puffolino

Could I ask you to send us a copy of the original, from before OCR was performed on this page, so we can run a few tests here and see what is happening? Also, you specifically mentioned running this on "high" and "medium". How does the "auto" quality level work for you? And regarding the disappearing image, what happens on your end if you enable the "ignore text in graphics" option?

Kind regards,
Dan McIntyre - Support Technician
Tracker Software Products (Canada) LTD

+++++++++++++++++++++++++++++++++++
Our Web site domain and email address has changed as of 26/10/2023.
https://www.pdf-xchange.com
Support@pdf-xchange.com
Puffolino
User
Posts: 317
Joined: Wed Feb 09, 2011 1:06 pm

Re: Enhanced OCR quality tuning

Post by Puffolino »

Hi Daniel,

I have not found the perfect setting so far. I was also surprised that the "Fine Page Content" output created a file that is more than 6 times larger than the original image file.

I've uploaded the original file 'r.pdf' and the new output file 'r (auto).pdf' as well.

Cheers.
TrackerSupp-Daniel
Site Admin
Posts: 8436
Joined: Wed Jan 03, 2018 6:52 pm

Re: Enhanced OCR quality tuning

Post by TrackerSupp-Daniel »

Hi, Puffolino

My apologies, but I do not see the new sample files you say you've uploaded. Where are they?

Kind regards,
Dan McIntyre - Support Technician
Tracker Software Products (Canada) LTD

Puffolino
User
Posts: 317
Joined: Wed Feb 09, 2011 1:06 pm

Re: Enhanced OCR quality tuning

Post by Puffolino »

Oh, didn't mention that - sorry.
The files are in the Temp (OCR) directory of the user server for uploads.
:roll:
TrackerSupp-Daniel
Site Admin
Posts: 8436
Joined: Wed Jan 03, 2018 6:52 pm

Re: Enhanced OCR quality tuning

Post by TrackerSupp-Daniel »

Hi, Puffolino

Understood, I will go look there. Out of curiosity, is there a reason you didn't simply attach them to your forum post as you did earlier? They are certainly small enough that we could have placed them here (and if you are not against it, I would like to attach them to one of these posts to keep the topic complete and self-contained).

Kind regards,
Dan McIntyre - Support Technician
Tracker Software Products (Canada) LTD

TrackerSupp-Daniel
Site Admin
Posts: 8436
Joined: Wed Jan 03, 2018 6:52 pm

Re: Enhanced OCR quality tuning

Post by TrackerSupp-Daniel »

Hi again,

I have finished looking into the files here. The Auto mode produced a fairly good result: I could not identify any errors in the text, aside from some sections of text being slightly taller than in the original image. I did, however, see what you mean about the vastly increased size, and have created a ticket on that matter for you:
RT#5515: EOCR Auto-quality increases image size substantially

Kind regards,
Dan McIntyre - Support Technician
Tracker Software Products (Canada) LTD

Puffolino
User
Posts: 317
Joined: Wed Feb 09, 2011 1:06 pm

Re: Enhanced OCR quality tuning

Post by Puffolino »

It seems that the auto result is far better than the other settings. There's still one point I couldn't fix: all hyphenation delimiters are lost ('Zahlen- system' -> 'Zahlen system' and so on)...
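
To make the possible outcomes concrete, here is a minimal Python sketch that checks how a word split across a line break shows up in text copied out of an OCR result. It is not part of PDF-XChange; the function name classify_hyphenation and the sample strings are assumptions made for illustration only.

```python
import re

def classify_hyphenation(left: str, right: str, ocr_text: str) -> str:
    """Report how a word split as 'left-' / 'right' in the scan appears in OCR text.

    Hypothetical helper for writing up a bug report; not a PDF-XChange API.
    """
    l, r = re.escape(left), re.escape(right)
    if re.search(rf"\b{l}-\s*{r}\b", ocr_text):
        return "kept"       # hyphen survived, e.g. 'Zahlen- system'
    if re.search(rf"\b{l}{r}\b", ocr_text):
        return "joined"     # halves were merged, e.g. 'Zahlensystem'
    if re.search(rf"\b{l}\s+{r}\b", ocr_text):
        return "lost"       # hyphen dropped but gap kept, e.g. 'Zahlen system'
    return "not found"

# Sample strings below are made up to mirror the case reported above.
print(classify_hyphenation("Zahlen", "system", "Das Zahlen system ist alt."))
```

The "lost" case is the one described in this post: the two halves are still there, but the hyphen that marked them as one word is gone.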
TrackerSupp-Daniel
Site Admin
Posts: 8436
Joined: Wed Jan 03, 2018 6:52 pm

Re: Enhanced OCR quality tuning

Post by TrackerSupp-Daniel »

Hi, Puffolino

Thank you for that pointer. After looking again, I do see this happening in the file you sent me. I'll make a note of that for the Dev team to look into as well.

Kind regards,
Dan McIntyre - Support Technician
Tracker Software Products (Canada) LTD

Puffolino
User
Posts: 317
Joined: Wed Feb 09, 2011 1:06 pm

Re: Enhanced OCR quality tuning

Post by Puffolino »

Thanks for that as well :)
TrackerSupp-Daniel
Site Admin
Posts: 8436
Joined: Wed Jan 03, 2018 6:52 pm

Enhanced OCR quality tuning

Post by TrackerSupp-Daniel »

:)
Dan McIntyre - Support Technician
Tracker Software Products (Canada) LTD
