OCR got worse in PDF-XChange Editor 6.0.317

ChrisZ16 · Post by **ChrisZ16** » Sun Apr 03, 2016 12:48 pm

I use OCR on scanned journal articles (PDF) in German.

OCR was working almost error-free in previous versions of PDF-XChange Editor. In 6.0.317 (licensed version) it is really bad: I get 5 to 20 times as many errors in comparison.

My settings:
Language: German
Accuracy: Medium (High gives even slightly worse results)
Output Type: Create New Searchable PDF
Quality: 600 or 300 (tried both)

First problem: German umlauts (äöüß) are never recognized now; in previous versions this worked. This, and the fact that switching the OCR language between "German", "English" and "German,English" gives exactly the same result, seems to indicate that German OCR is not working at all.

In c:\programme\Tracker Software\PDF Editor\PluginsData\OCRLanguages the eng_pxvocr.dat is 21,364 KB and the deu_pxvocr.dat is only 2,381 KB, but the sizes were the same in 5.5.316.0, which was working OK.

Second problem: recognition of normal (not language-specific) characters is also worse, confusing c/e, t/i, i/I, O/D, o/0 etc. The error rate here was almost zero before; now there are a lot of errors.
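For anyone who wants to put numbers on a comparison like this, here is a minimal sketch. It assumes you have a hand-corrected reference text and the plain text exported from each OCR run; the function names are made up for illustration and have nothing to do with PDF-XChange Editor itself.

```python
from collections import Counter

# Characters treated as "language-specific" for German in this sketch.
UMLAUTS = "äöüÄÖÜß"


def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def char_error_rate(reference: str, ocr_text: str) -> float:
    """Edit distance divided by the length of the reference text."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return levenshtein(reference, ocr_text) / len(reference)


def umlaut_recall(reference: str, ocr_text: str) -> float:
    """Fraction of reference umlauts that also appear in the OCR text.

    This only compares character counts, not positions, so it is a rough
    indicator rather than an exact alignment.
    """
    ref = Counter(c for c in reference if c in UMLAUTS)
    got = Counter(c for c in ocr_text if c in UMLAUTS)
    total = sum(ref.values())
    if total == 0:
        return 1.0
    return sum(min(n, got[c]) for c, n in ref.items()) / total
```

Running both functions on the output of two program versions against the same reference gives a rough "error factor" (the ratio of the two error rates) and shows directly whether umlauts are being recognized at all.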

I can upload examples if needed.

Best regards

josch · Post by **josch** » Mon Apr 04, 2016 10:21 am

I can confirm that OCR for German documents is much worse now!

Mon Apr 04, 2016 10:19 pm

Hi guys,

Thanks for the posts - I've passed this along to our OCR dev. and am waiting to hear back. I'll update you as soon as possible.

Thanks,

Post by **Will - Tracker Supp** » Tue Apr 05, 2016 2:00 pm

Hi all,

I should have asked previously: can we get a sample document?

Thanks,

ChrisZ16 · Post by **ChrisZ16** » Tue Apr 05, 2016 5:43 pm

Will,

Here are two examples.

5.5.316.0 only had problems with superscripted footnote numbers (which is acceptable) and § signs.

6.0.317 recognizes § signs, but does not recognize any umlauts.

Example-01.zip: (645.92 KiB) Downloaded 165 times

Example-02.zip: (253.68 KiB) Downloaded 157 times

Post by **Will - Tracker Supp** » Tue Apr 05, 2016 7:19 pm

Hi Chris,

Thanks for those - I've passed them along to the Dev. Team.

Cheers,

caroll consulting · Post by **caroll consulting** » Sun Apr 10, 2016 8:24 pm

I am trying to OCR Swedish characters but I get a 0% success rate. Absolutely none of the ÅÄÖ characters are identified.

Tue Apr 12, 2016 10:59 am

Hi Caroll,

Thanks - this is with our development team and we will respond in due course, thanks for your patience.

Tue Apr 12, 2016 11:01 am

I can confirm that an issue has been located and this will be corrected in a service release in the coming days.

ChrisZ16 · Post by **ChrisZ16** » Wed Apr 20, 2016 11:46 pm

In 6.0.317.1 the OCR is back to how it was up to 5.5.316 (at least for German). As far as I can see, the errors from 6.0.317.0 are gone and the results are exactly the same as in 5.5.316. So it is working well again.

But it also lost the ability to recognize § signs, which only 6.0.317.0 had.

Would it be possible to get that in as well?

Cheers
Chris

Post by **John - Tracker Supp** » Thu Apr 21, 2016 6:09 am

Hi Chris,

I have passed this to the team member responsible and asked him to investigate - will advise once we have his response.

Thanks for your patience.

anybodym · Post by **anybodym** » Thu May 05, 2016 11:15 am

Thank you for fixing this.
Just a few days ago I noticed the weird German umlaut OCR problem at a customer of mine who uses PDF-XChange Editor.
I've now tried it with the new 6.0.317.1 on my PC and umlauts are recognized again.

Also: could you please improve OCR speed by a factor of four? Since I was already at it, I've done some tests:
I scanned 24 pages (text, but also some tables), and it took more than 7 minutes with "English, German" and "Accuracy High" for the OCR to complete.
Meanwhile, CPU usage from PDFXEdit.exe was only 12-13% (it's a 4-core CPU with Hyperthreading, i.e. 8 logical cores, so 12.5% would be one fully loaded logical core).
Why doesn't it run the OCR for the pages in parallel? It could easily be 4 times faster. I don't think the pages depend on each other for OCR'ing!
And OCR'ing could start even earlier! Why doesn't OCR already run in the background for the pages already scanned while the scanner is still busy scanning the remaining pages? No need to make the user wait more than necessary!
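To illustrate the idea of OCR'ing pages in parallel, here is a minimal sketch in Python. It says nothing about how PDF-XChange Editor works internally: `fake_ocr_page` is a hypothetical stand-in for a real per-page OCR call, and the sketch assumes pages can be processed independently of each other.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor
from typing import Callable, List, Sequence


def fake_ocr_page(page: bytes) -> str:
    """Hypothetical stand-in for a per-page OCR call (assumption, not a real API)."""
    return page.decode("utf-8").upper()


def ocr_pages_parallel(
    pages: Sequence[bytes],
    ocr_page: Callable[[bytes], str] = fake_ocr_page,
    workers: int = 4,
    use_processes: bool = True,
) -> List[str]:
    """Run ocr_page on every page concurrently and return texts in page order.

    Processes suit CPU-bound work in pure Python; threads are enough when the
    OCR engine is native code that releases the interpreter lock.
    """
    pool_cls = ProcessPoolExecutor if use_processes else ThreadPoolExecutor
    with pool_cls(max_workers=workers) as pool:
        # map() preserves input order, so result i belongs to page i.
        return list(pool.map(ocr_page, pages))
```

With four workers and pages that take roughly equal time, the wall-clock time would ideally drop to about a quarter, minus the overhead of handing page data to the workers. When using processes, the call should sit under an `if __name__ == "__main__":` guard on platforms that spawn new interpreters (such as Windows). Starting OCR while the scanner is still running would follow the same pattern, with each page submitted to the pool as soon as it arrives.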

Thanks for listening (and fixing the bug)!

Thu May 05, 2016 11:18 am

Hello anybodym,

Glad to hear the umlauts issue is sorted for you!
As for the OCR speed and using multiple cores (if available): we are considering this, but it is a rather complex task and we will need some more time to achieve it.

Regards,
Stefan
