Latin language - OCR-result not satisfying

Discussion for the End User use of OCR in PDF-XChange Editor and Viewer

Moderators: TrackerSupp-Daniel, Tracker Support, Paul - Tracker Supp, Vasyl-Tracker Dev Team, Chris - Tracker Supp, Sean - Tracker, Ivan - Tracker Software, Tracker Supp-Stefan

jürgen somorjai
User
Posts: 51
Joined: Sun Feb 14, 2016 3:09 pm

Latin language - OCR-result not satisfying

Post by jürgen somorjai »

Hello,
I am now using the newest version of PDF-XChange Editor Plus. I was very happy to see that many new OCR languages are available. I often need Latin, but I was disappointed when I scanned a Latin page. After running OCR I copied one line after another into Word, and the results were not satisfactory. Is it possible to train the OCR for that language to get better results? Or are there other ideas I could try?
Will - Tracker Supp
Site Admin
Posts: 6815
Joined: Mon Oct 15, 2012 9:21 pm
Location: London, UK

Re: Latin language - OCR-result not satisfying

Post by Will - Tracker Supp »

Hi Jürgen,

As per my response to your other post, there is no way to train the OCR. Could you please send a sample file and a screenshot of your OCR settings?

Thanks,
If posting files to this forum, you must archive the files to a ZIP, RAR or 7z file or they will not be uploaded.
Thank you.

Best regards

Will Travaglini
Tracker Support (Europe)
Tracker Software Products Ltd.
http://www.tracker-software.com
jürgen somorjai
User
Posts: 51
Joined: Sun Feb 14, 2016 3:09 pm

Re: Latin language - OCR-result not satisfying

Post by jürgen somorjai »

Hello,
Thank you for your interest.
Here are the two examples: settings and sample.
:-)
I would like to hear from you.
Thanks
PDF-OCR-1.jpg
PDF-OCR_Beispielseite_Latein.7z
(172.67 KiB) Downloaded 249 times
Attachments
PDF-OCR_Beispielseite_Latein.pdf
(176.29 KiB) Downloaded 239 times
Will - Tracker Supp
Site Admin
Posts: 6815
Joined: Mon Oct 15, 2012 9:21 pm
Location: London, UK

Re: Latin language - OCR-result not satisfying

Post by Will - Tracker Supp »

Hi Jürgen,

Beautiful, thanks for that! Please try using Medium accuracy instead, as there is an issue with High accuracy that makes it worse than Medium. I believe that the issue is with Google's Tesseract libraries (which we use), so it isn't something that we can fix.

Thanks,
jürgen somorjai
User
Posts: 51
Joined: Sun Feb 14, 2016 3:09 pm

Re: Latin language - OCR-result not satisfying

Post by jürgen somorjai »

OK, thank you. I have changed the setting and will keep an eye on the results.
Will - Tracker Supp
Site Admin
Posts: 6815
Joined: Mon Oct 15, 2012 9:21 pm
Location: London, UK

Re: Latin language - OCR-result not satisfying

Post by Will - Tracker Supp »

:D
Timur Born
User
Posts: 874
Joined: Tue Jun 26, 2012 1:50 pm

Re: Latin language - OCR-result not satisfying

Post by Timur Born »

I have noticed before that "High" sometimes performs worse. Maybe it should either not be offered until it is fixed, or a warning should be shown when it is selected?
Will - Tracker Supp
Site Admin
Posts: 6815
Joined: Mon Oct 15, 2012 9:21 pm
Location: London, UK

Re: Latin language - OCR-result not satisfying

Post by Will - Tracker Supp »

Hi Timur,

We're actually in the process of re-writing the OCR, so it should be much better and, I believe, this should be one of the concerns addressed.

Thanks,
Timur Born
User
Posts: 874
Joined: Tue Jun 26, 2012 1:50 pm

Re: Latin language - OCR-result not satisfying

Post by Timur Born »

Yes, you mentioned that in January. :P No need to hurry. Once I noticed the differences between Medium and High, I knew how to work with them. Adobe Acrobat really is at the forefront of OCR nowadays, even beating dedicated OCR applications. But you certainly pay a steep price for that (in money, that is).
Will - Tracker Supp
Site Admin
Posts: 6815
Joined: Mon Oct 15, 2012 9:21 pm
Location: London, UK

Re: Latin language - OCR-result not satisfying

Post by Will - Tracker Supp »

Ah, sorry! It's sometimes hard to keep track of what has/hasn't been said due to the number of people I deal with daily :oops:
DIV
User
Posts: 252
Joined: Fri Jun 23, 2017 1:47 am

Re: Latin language - OCR-result not satisfying

Post by DIV »

Hi, Timur & co.

I have done a little test on some German text.
Based on this: Medium Accuracy had some mistakes; High Accuracy fixed some of those mistakes, but added new mistakes.

It seems the algorithm must be making a trade-off in each case between what it 'sees', and what it considers is likely to appear — in particular, what is contained in a dictionary.
It is notable that whereas occasionally just one or two letters will be misinterpreted, there are also several cases where one word has been replaced by a totally different word.
Examples:
  • Desinfektion, — Destination, (Med.) – Desinfektiom (High)
  • Atlas, — Atlas-. (Med.) – Mais, (High)
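The trade-off described above can be illustrated with a toy sketch. This is not how Tesseract or PDF-XChange actually decides; the function name `snap_to_dictionary`, the `min_similarity` parameter, and the tiny word list are all hypothetical, chosen only to show how a dictionary bias can swap a nearly correct reading for a different word.

```python
from difflib import SequenceMatcher


def snap_to_dictionary(raw, dictionary, min_similarity):
    """Replace `raw` with the most similar dictionary word, but only if
    that word is at least `min_similarity` similar (0.0 to 1.0).
    Hypothetical helper for illustration, not a real OCR API."""
    best_word, best_score = raw, 0.0
    for word in dictionary:
        score = SequenceMatcher(None, raw.lower(), word.lower()).ratio()
        if score > best_score:
            best_word, best_score = word, score
    # Below the threshold we trust what was 'seen' over the dictionary.
    return best_word if best_score >= min_similarity else raw


# Assumed word list that lacks the German word actually printed on the page.
words = ["Destination", "Atlas"]

# A loose threshold lets the dictionary override the correct reading.
print(snap_to_dictionary("Desinfektion", words, 0.75))

# A strict threshold keeps the raw reading unchanged.
print(snap_to_dictionary("Desinfektion", words, 0.90))
```

With the loose threshold the correctly read "Desinfektion" is replaced by "Destination", because the two strings are about 78% similar and the right word is missing from the list; with the strict threshold the raw reading survives.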
—DIV
Will - Tracker Supp
Site Admin
Posts: 6815
Joined: Mon Oct 15, 2012 9:21 pm
Location: London, UK

Re: Latin language - OCR-result not satisfying

Post by Will - Tracker Supp »

Hi DIV,

OCR results should be drastically improved with the new OCR.

Thanks,