Enhanced OCR quality tuning
Hi, I tried to OCR some pages and was surprised that the engine very often replaces certain common characters with special characters. Another point is that the result may show many different font faces and font sizes within a single paragraph; maybe an additional filter could reduce the number of fonts used to a minimum.
The example below also shows that it is not easy to decide whether the medium or high quality setting should be used. The first page (high quality) shows nearly perfect text but drops an image; the second page (medium quality) shows multiple errors in text content and placement, as well as some font variations.
- Attachments
- OCR example result.pdf (221.19 KiB)
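The font-reduction filter suggested in the post above could be sketched roughly as follows. This is only an illustration of the idea, not a PDF-XChange feature or API: the span dictionaries with the keys "text", "font" and "size", the function name `unify_fonts` and the `size_step` parameter are all assumptions made for the example. The sketch picks the style that covers the most characters in a paragraph and applies it to every span.

```python
from collections import Counter


def unify_fonts(spans, size_step=0.5):
    """Assign the dominant (font, size) of a paragraph to all of its spans.

    `spans` is a list of dicts with the hypothetical keys "text", "font"
    and "size". Styles are weighted by character count, so a long run of
    text outweighs a few stray glyphs in another font.
    """
    if not spans:
        return []
    weights = Counter()
    for span in spans:
        # Snap sizes to a grid so 10.9 pt and 11.1 pt count as the same size.
        size = round(span["size"] / size_step) * size_step
        weights[(span["font"], size)] += len(span["text"])
    (font, size), _ = weights.most_common(1)[0]
    # Return new dicts; the input spans are left unchanged.
    return [{**span, "font": font, "size": size} for span in spans]


# Made-up paragraph resembling the mixed-font OCR output described above.
paragraph = [
    {"text": "Das Zahlensystem ", "font": "Times New Roman", "size": 11.0},
    {"text": "ist", "font": "Arial", "size": 10.9},
    {"text": " dezimal.", "font": "Times New Roman", "size": 11.1},
]
unified = unify_fonts(paragraph)
```

A real filter would probably need to leave deliberate emphasis (bold, italic, headings) alone, which this sketch does not attempt.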
- TrackerSupp-Daniel
- Site Admin
- Posts: 8592
- Joined: Wed Jan 03, 2018 6:52 pm
Re: Enhanced OCR quality tuning
Hi, Puffolino
Could I ask you to send us a copy of the original, before OCR was performed on this page? So we can run a few tests here and see what is happening. Also, you mentioned specifically running this on "high" and "medium". How does the "auto" quality level work for you? What happens on your end if you enable the "ignore text in graphics" option, with regards to the disappearing image?
Kind regards,
Dan McIntyre - Support Technician
Tracker Software Products (Canada) LTD
+++++++++++++++++++++++++++++++++++
Our Web site domain and email address has changed as of 26/10/2023.
https://www.pdf-xchange.com
Support@pdf-xchange.com
Re: Enhanced OCR quality tuning
Hi Daniel,
I have not found the perfect setting so far. I was also surprised that the "Fine Page Content" output created a file more than 6 times larger than the original image file.
I've uploaded the original file 'r.pdf' and the new output file 'r (auto).pdf' as well.
Cheers.
- TrackerSupp-Daniel
Re: Enhanced OCR quality tuning
Hi, Puffolino
My apologies, but I do not see the new sample files you say you uploaded. Where are they?
Kind regards,
Dan McIntyre - Support Technician
Re: Enhanced OCR quality tuning
Oh, didn't mention that - sorry.
The files are in the Temp (OCR) directory of the user server for uploads.
- TrackerSupp-Daniel
Re: Enhanced OCR quality tuning
Hi, Puffolino
Understood, I will go look there. Out of curiosity, is there a reason you didn't simply attach them to your forum post as you did earlier? They are certainly small enough that we could have placed them here (and if you are not against it, I would like to attach them to one of these posts to keep the topic complete and self-contained).
Kind regards,
Dan McIntyre - Support Technician
- TrackerSupp-Daniel
Re: Enhanced OCR quality tuning
Hi again,
I have finished looking into the files here. Auto mode produced a fairly good result: I could not identify any errors in the text, aside from some sections of text being slightly taller than in the original image. I did, however, see what you mean about the vastly increased size, and have created a ticket on that matter for you:
RT#5515: EOCR Auto-quality increases image size substantially
Kind regards,
Dan McIntyre - Support Technician
Re: Enhanced OCR quality tuning
It seems that the auto result is far better than the other settings. There is still one point I couldn't fix: all delimiters of the hyphenations are lost ('Zahlen- system' -> 'Zahlen system' and so on)...
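Until this is addressed in the engine, suspicious spots like 'Zahlen system' could be flagged in the extracted text with a small script. The following is a rough sketch under two assumptions: that the lost delimiters come from words hyphenated across a line break, and that a word list of expected compounds is available. The function name `find_lost_hyphens` and the sample vocabulary are made up for illustration and have nothing to do with the OCR engine itself.

```python
import re


def find_lost_hyphens(ocr_text, vocabulary):
    """Return (left, right) pairs of adjacent words whose concatenation
    is a known word, which hints at a dropped hyphenation delimiter."""
    # Letters only (Unicode-aware), so punctuation and digits split words.
    words = re.findall(r"[^\W\d_]+", ocr_text)
    known = {word.lower() for word in vocabulary}
    hits = []
    for left, right in zip(words, words[1:]):
        if (left + right).lower() in known:
            hits.append((left, right))
    return hits


# Hypothetical word list and OCR output for demonstration.
vocabulary = ["Zahlensystem", "Dezimalsystem"]
ocr_text = "Das Zahlen system ist ein Dezimal system."
suspects = find_lost_hyphens(ocr_text, vocabulary)
# suspects == [("Zahlen", "system"), ("Dezimal", "system")]
```

The check only reports candidates; whether a pair should be rejoined or left as two words still needs a look at the source image.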
- TrackerSupp-Daniel
Re: Enhanced OCR quality tuning
Hi, Puffolino
Thank you for that pointer. After looking again, I do see this happening in the file you sent me. I'll make a note of that for the Dev team to look into as well.
Kind regards,
Dan McIntyre - Support Technician
Re: Enhanced OCR quality tuning
Thanks for that as well