Latin language - OCR-result not satisfying
Posted: Tue May 16, 2017 11:26 am
by jürgen somorjai
Hello,
I'm now using the newest version of PDF-XChange Editor Plus. I was very happy to see that many new languages are available for OCR. I often need the Latin language, but I was disappointed when I scanned a Latin page. After running OCR I copied one line after another into Word, and the results were not satisfactory. Is it possible to train that language to get better results? Or are there other ideas I could try?
Re: Latin language - OCR-result not satisfying
Posted: Thu May 18, 2017 8:33 am
by Will - Tracker Supp
Hi Jürgen,
As per my response to your other post, there is no way to train the OCR. Can you please send a sample file and a screenshot of your OCR settings?
Thanks,
Re: Latin language - OCR-result not satisfying
Posted: Mon May 29, 2017 9:59 am
by jürgen somorjai
Hello,
Thank you for your interest.
Here are the two examples: settings and sample.
I would like to hear from you.
Thanks
Re: Latin language - OCR-result not satisfying
Posted: Mon May 29, 2017 10:08 am
by Will - Tracker Supp
Hi Jürgen,
Beautiful, thanks for that! Please try using Medium accuracy instead, as there is an issue with High accuracy that makes it worse than Medium. I believe that the issue is with Google's Tesseract libraries (which we use), so it isn't something that we can fix.
Thanks,
Re: Latin language - OCR-result not satisfying
Posted: Fri Jun 02, 2017 12:15 am
by jürgen somorjai
OK, thank you. I have changed the setting and will keep an eye on the results.
Re: Latin language - OCR-result not satisfying
Posted: Fri Jun 02, 2017 7:11 am
by Will - Tracker Supp
Re: Latin language - OCR-result not satisfying
Posted: Thu Aug 17, 2017 9:17 pm
by Timur Born
I noticed the sometimes worse performance of "High" before. Maybe it should either not be offered until fixed or a warning should be issued upon selecting it?
Re: Latin language - OCR-result not satisfying
Posted: Mon Aug 21, 2017 9:51 am
by Will - Tracker Supp
Hi Timur,
We're actually in the process of re-writing the OCR, so it should be much better and, I believe, this should be one of the concerns addressed.
Thanks,
Re: Latin language - OCR-result not satisfying
Posted: Mon Aug 21, 2017 2:52 pm
by Timur Born
Yes, you mentioned that in January.
No need to hurry; once I noticed the differences between Medium and High, I knew how to work with them. Adobe Acrobat really is at the forefront of OCR nowadays, even beating dedicated OCR applications. But you sure pay a steep price for that (money, that is).
Re: Latin language - OCR-result not satisfying
Posted: Mon Aug 21, 2017 3:17 pm
by Will - Tracker Supp
Ah, sorry! It's sometimes hard to keep track of what has/hasn't been said due to the number of people I deal with daily.
Re: Latin language - OCR-result not satisfying
Posted: Sun Aug 27, 2017 1:02 pm
by DIV
Hi, Timur & co.
I have done a little test on some German text.
Based on this: Medium Accuracy had some mistakes; High Accuracy fixed some of those mistakes, but added new mistakes.
It seems the algorithm must be making a trade-off in each case between what it 'sees', and what it considers is likely to appear — in particular, what is contained in a dictionary.
It is notable that whereas occasionally just one or two letters will be misinterpreted, there are also several cases where one word has been replaced by a totally different word.
Examples:
- Original "Desinfektion,": Medium gave "Destination,"; High gave "Desinfektiom"
- Original "Atlas,": Medium gave "Atlas-."; High gave "Mais,"
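To put rough numbers on the two kinds of error in these examples, here is a minimal Python sketch. It is not taken from the OCR engine; it only compares each OCR result with the original word by edit distance, after stripping surrounding punctuation. The function names `levenshtein` and `classify` and the 0.25 cut-off are my own assumptions, chosen purely for illustration.

```python
import string


def levenshtein(a: str, b: str) -> int:
    """Edit distance: insertions, deletions and substitutions each cost 1."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def classify(original: str, ocr: str, threshold: float = 0.25) -> str:
    """Roughly label an OCR result relative to the original word.

    The threshold is a hypothetical cut-off, not a value used by any OCR engine.
    """
    if original == ocr:
        return "exact"
    # Ignore leading/trailing punctuation so only the word itself is compared.
    word = original.strip(string.punctuation)
    guess = ocr.strip(string.punctuation)
    if word == guess:
        return "punctuation only"
    ratio = levenshtein(word, guess) / max(len(word), len(guess))
    return "character slip" if ratio <= threshold else "word substitution"


# The two examples from this post: (original, Medium result, High result).
samples = [
    ("Desinfektion,", "Destination,", "Desinfektiom"),
    ("Atlas,", "Atlas-.", "Mais,"),
]

for original, medium, high in samples:
    print(f"{original!r}: Medium = {classify(original, medium)}, "
          f"High = {classify(original, high)}")
```

With this cut-off, a result that differs by a single letter counts as a character slip, while a result that shares little with the original counts as a word substitution, which is the pattern one would expect if a dictionary is overriding what was actually seen.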
—DIV
Re: Latin language - OCR-result not satisfying
Posted: Mon Aug 28, 2017 7:14 am
by Will - Tracker Supp
Hi DIV,
OCR results should be drastically improved with the new OCR engine.
Thanks,