
Latin language - OCR-result not satisfying

Posted: Tue May 16, 2017 11:26 am
by jürgen somorjai
Hello,
I'm now using the newest version of PDF-XChange Editor Plus, and I was very happy to see that a lot of new OCR languages have been added. I often use and need Latin, but I was disappointed when I scanned a Latin page. After running OCR, I copied the text line by line into Word, but the results were not satisfying. Is it possible to train the OCR for that language to get better results? Or are there other things I could try?

Re: Latin language - OCR-result not satisfying

Posted: Thu May 18, 2017 8:33 am
by Will - Tracker Supp
Hi Jürgen,

As per my response to your other post, there is no way to train the OCR. Could you please send a sample file and a screenshot of your OCR settings?

Thanks,

Re: Latin language - OCR-result not satisfying

Posted: Mon May 29, 2017 9:59 am
by jürgen somorjai
Hello,
Thank you for your interest.
Here are the two examples: settings and sample.
:-)
I look forward to hearing from you.
Thanks
PDF-OCR-1.jpg
PDF-OCR_Beispielseite_Latein.7z
(172.67 KiB)

Re: Latin language - OCR-result not satisfying

Posted: Mon May 29, 2017 10:08 am
by Will - Tracker Supp
Hi Jürgen,

Beautiful, thanks for that! Please try using Medium accuracy instead, as there is an issue with High accuracy that makes its results worse than Medium. I believe the issue lies in Google's Tesseract libraries (which we use), so it isn't something we can fix.

Thanks,

Re: Latin language - OCR-result not satisfying

Posted: Fri Jun 02, 2017 12:15 am
by jürgen somorjai
OK, thank you. I've changed the setting and will keep an eye on the results.

Re: Latin language - OCR-result not satisfying

Posted: Fri Jun 02, 2017 7:11 am
by Will - Tracker Supp
:D

Re: Latin language - OCR-result not satisfying

Posted: Thu Aug 17, 2017 9:17 pm
by Timur Born
I've noticed before that "High" sometimes performs worse. Perhaps it should not be offered until it's fixed, or a warning should be shown when it's selected?

Re: Latin language - OCR-result not satisfying

Posted: Mon Aug 21, 2017 9:51 am
by Will - Tracker Supp
Hi Timur,

We're actually in the process of rewriting the OCR, so it should be much better, and I believe this will be one of the issues addressed.

Thanks,

Re: Latin language - OCR-result not satisfying

Posted: Mon Aug 21, 2017 2:52 pm
by Timur Born
Yes, you mentioned that in January. :P No need to hurry; once I noticed the differences between Medium and High, I knew how to work with them. Adobe Acrobat really is at the forefront of OCR nowadays, even beating dedicated OCR applications. But you certainly pay a steep price for that (in money, that is).

Re: Latin language - OCR-result not satisfying

Posted: Mon Aug 21, 2017 3:17 pm
by Will - Tracker Supp
Ah, sorry! It's sometimes hard to keep track of what has and hasn't been said, given the number of people I deal with every day :oops:

Re: Latin language - OCR-result not satisfying

Posted: Sun Aug 27, 2017 1:02 pm
by DIV
Hi Timur & co.,

I did a small test on some German text.
Medium accuracy made some mistakes; High accuracy fixed some of them but introduced new ones.

It seems the algorithm must be making a trade-off in each case between what it 'sees' and what it considers likely to appear, in particular what is contained in a dictionary.
Notably, while sometimes only one or two letters are misread, there are also several cases where a word has been replaced by an entirely different one.
Examples:
  • "Desinfektion," was read as "Destination," (Medium) and "Desinfektiom" (High)
  • "Atlas," was read as "Atlas-." (Medium) and "Mais," (High)
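To illustrate the kind of trade-off described above, here is a toy Python sketch. It is not how Tesseract or PDF-XChange Editor actually score words: the candidate readings, their visual confidences, the word list, and the `dict_weight` parameter are all made up for illustration. It only shows how a large enough dictionary bonus can make a recognizer prefer a less visually plausible word, such as "Mais" over "Atlas".

```python
import math

# Toy illustration only: NOT Tesseract's or PDF-XChange's real scoring.
# Each candidate is (text, visual_confidence); all values are made up.


def pick_word(candidates, dictionary, dict_weight):
    """Return the candidate text with the highest combined score.

    score = log(visual_confidence) + (dict_weight if text is in dictionary else 0)
    A larger (hypothetical) dict_weight trusts the word list more than the pixels.
    """

    def score(item):
        text, visual_confidence = item
        bonus = dict_weight if text in dictionary else 0.0
        return math.log(visual_confidence) + bonus

    best_text, _ = max(candidates, key=score)
    return best_text


# Hypothetical readings of one word image and a hypothetical word list
# that happens to contain "Mais" but not "Atlas".
candidates = [("Atlas", 0.55), ("Mais", 0.30)]
dictionary = {"Mais"}

print(pick_word(candidates, dictionary, dict_weight=0.0))  # Atlas
print(pick_word(candidates, dictionary, dict_weight=1.0))  # Mais
```

With no dictionary weight, the visually best reading wins; with a weight of 1.0, the in-dictionary word overtakes it. Whether Medium and High accuracy actually differ along these lines is not confirmed in this thread.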
DIV

Re: Latin language - OCR-result not satisfying

Posted: Mon Aug 28, 2017 7:14 am
by Will - Tracker Supp
Hi DIV,

OCR results should be drastically improved with the new OCR engine.

Thanks,