Latin language - OCR-result not satisfying

Discussion for the End User use of OCR in PDF-XChange Editor and Viewer

Moderators: Tracker Support, TrackerSupp-Daniel, Paul - Tracker Supp, Chris - Tracker Supp, Vasyl-Tracker Dev Team, Sean - Tracker, Tracker Supp-Stefan, Ivan - Tracker Software

jürgen somorjai
User
Posts: 18
Joined: Sun Feb 14, 2016 3:09 pm

Latin language - OCR-result not satisfying

Post by jürgen somorjai » Tue May 16, 2017 11:26 am

Hello,
I'm now using the newest version of PDF-XChange Editor Plus. I was very happy to see that there are a lot of new languages for OCR. I often use and need the Latin language, but I was disappointed when I scanned a Latin page. After running OCR I copied one line after another into Word, but the results were not satisfying. Is it possible to train that language to get better results? Or are there other ideas I can try?

Will - Tracker Supp
Site Admin
Posts: 6724
Joined: Mon Oct 15, 2012 9:21 pm
Location: London, UK

Re: Latin language - OCR-result not satisfying

Post by Will - Tracker Supp » Thu May 18, 2017 8:33 am

Hi Jürgen,

As per my response to your other post, there is no way to train the OCR. Can you please send a sample file and a screenshot of your OCR settings?

Thanks,
If posting files to this forum, you must archive the files to a ZIP, RAR or 7z file or they will not be uploaded.
Thank you.

Best regards

Will Travaglini
Tracker Support (Europe)
Tracker Software Products Ltd.
http://www.tracker-software.com

Re: Latin language - OCR-result not satisfying

Post by jürgen somorjai » Mon May 29, 2017 9:59 am

Hello,
Thank you for your interest.
Here are the two examples: settings and sample.
:-)
I would like to hear from you.
Thanks
PDF-OCR-1.jpg
PDF-OCR_Beispielseite_Latein.7z
(172.67 KiB) Downloaded 91 times
Attachments
PDF-OCR_Beispielseite_Latein.pdf
(176.29 KiB) Downloaded 95 times

Re: Latin language - OCR-result not satisfying

Post by Will - Tracker Supp » Mon May 29, 2017 10:08 am

Hi Jürgen,

Beautiful, thanks for that! Please try using Medium accuracy instead, as there is an issue with High accuracy that makes it worse than Medium. I believe that the issue is with Google's Tesseract libraries (which we use), so it isn't something that we can fix.

Thanks,

Re: Latin language - OCR-result not satisfying

Post by jürgen somorjai » Fri Jun 02, 2017 12:15 am

OK, thank you. I have changed the setting and will keep an eye on the results.

Re: Latin language - OCR-result not satisfying

Post by Will - Tracker Supp » Fri Jun 02, 2017 7:11 am

:D

Timur Born
User
Posts: 581
Joined: Tue Jun 26, 2012 1:50 pm

Re: Latin language - OCR-result not satisfying

Post by Timur Born » Thu Aug 17, 2017 9:17 pm

I noticed the sometimes worse performance of "High" before. Maybe it should either not be offered until fixed or a warning should be issued upon selecting it?

Re: Latin language - OCR-result not satisfying

Post by Will - Tracker Supp » Mon Aug 21, 2017 9:51 am

Hi Timur,

We're actually in the process of re-writing the OCR, so it should be much better and, I believe, this should be one of the concerns addressed.

Thanks,

Re: Latin language - OCR-result not satisfying

Post by Timur Born » Mon Aug 21, 2017 2:52 pm

Yes, you mentioned that in January. :P No need to hurry: once I noticed the differences between Medium and High, I knew how to work with them. Adobe Acrobat really is at the forefront of OCR nowadays, even beating dedicated OCR applications. But you certainly pay a steep price for that (in money, that is).

Re: Latin language - OCR-result not satisfying

Post by Will - Tracker Supp » Mon Aug 21, 2017 3:17 pm

Ah, sorry! It's sometimes hard to keep track of what has/hasn't been said due to the number of people I deal with daily :oops:

DIV
User
Posts: 62
Joined: Fri Jun 23, 2017 1:47 am

Re: Latin language - OCR-result not satisfying

Post by DIV » Sun Aug 27, 2017 1:02 pm

Hi, Timur & co.

I have done a little test on some German text.
Based on this: Medium Accuracy had some mistakes; High Accuracy fixed some of those mistakes, but added new mistakes.

It seems the algorithm must be making a trade-off in each case between what it 'sees' and what it considers likely to appear, in particular what is contained in a dictionary.
It is notable that, whereas occasionally just one or two letters are misinterpreted, there are also several cases where one word has been replaced by a totally different word.
Examples:
  • Original: "Desinfektion," / Medium: "Destination," / High: "Desinfektiom"
  • Original: "Atlas," / Medium: "Atlas-." / High: "Mais,"
—DIV
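
A comparison like the one above can be made a little more systematic by counting how many character edits separate each OCR result from the text on the page. The following is only a minimal sketch in plain Python, not anything taken from PDF-XChange or Tesseract: the helper name `edit_distance` and the `samples` list are made up for illustration, and the sample strings are simply the two examples quoted in this post. A small distance points to a slip of one or two letters, while a large one suggests the word was swapped for a different dictionary word.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions and substitutions that turn a into b."""
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # turning a[:i] into the empty string takes i deletions
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


# (text on the page, Medium result, High result), as quoted in the post above
samples = [
    ("Desinfektion,", "Destination,", "Desinfektiom"),
    ("Atlas,", "Atlas-.", "Mais,"),
]

for original, medium, high in samples:
    print(f"{original!r}: Medium {edit_distance(original, medium)} edits, "
          f"High {edit_distance(original, high)} edits")
```

Summed over a whole page and divided by the length of the reference text, the same count gives a rough character error rate for each accuracy setting, which is easier to compare than individual words.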

Re: Latin language - OCR-result not satisfying

Post by Will - Tracker Supp » Mon Aug 28, 2017 7:14 am

Hi DIV,

OCR results should be drastically improved with the new OCR.

Thanks,
