OCR got worse in PDF-XChange Editor 6.0.317

Forum for the PDF-XChange Editor - Free and Licensed Versions

ChrisZ16
User
Posts: 28
Joined: Sun Apr 06, 2014 11:29 pm

OCR got worse in PDF-XChange Editor 6.0.317

Post by ChrisZ16 »

I use OCR on scanned journal articles (PDF) in German.

OCR was working almost error-free in previous versions of PDF-XChange Editor. In 6.0.317 (licensed version) it is really bad: the error count is 5 to 20 times higher by comparison.

My settings:
Language: German
Accuracy: Medium (High gives even slightly worse results)
Output Type: Create New Searchable PDF
Quality: 600 or 300 (tried both)

First problem: German umlauts and ß (äöüß) are never recognized now, whereas they were in previous versions. This, together with the fact that switching the OCR language between "German", "English" and "German,English" gives exactly the same result, seems to indicate that German OCR is not working.
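
One rough way to put a number on this is to count the umlauts and ß in the text layer of each output PDF. The following is only a minimal sketch: it assumes the text has already been extracted to plain strings by some other tool, and the function name and both sample sentences are made up for illustration.

```python
import unicodedata
from collections import Counter

GERMAN_SPECIAL = set("äöüÄÖÜß")

def count_german_special(text: str) -> Counter:
    """Count German umlauts and ß in a string (hypothetical helper)."""
    # NFC merges "u" + combining diaeresis into the single code point "ü",
    # so both encodings of an umlaut are counted the same way.
    normalized = unicodedata.normalize("NFC", text)
    return Counter(ch for ch in normalized if ch in GERMAN_SPECIAL)

# Made-up sample data, not taken from the attached examples.
expected = "Über die Größe der Prüfung"
ocr_output = "Uber die GroBe der Prufung"

print(count_german_special(expected))
print(count_german_special(ocr_output))
```

If the count for a German document comes out at zero regardless of the selected OCR language, that would be consistent with the language data not being used.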

In c:\programme\Tracker Software\PDF Editor\PluginsData\OCRLanguages the eng_pxvocr.dat is 21,364 KB and the deu_pxvocr.dat is only 2,381 KB, but this was the same in 5.5.316.0, which was working fine.

Second problem: Recognition of ordinary (not language-specific) characters is also worse, with confusions such as c/e, t/i, i/I, O/D, o/0 etc. The error rate here was almost zero before; now there are many errors.
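
Confusions like these can be tallied by aligning the OCR text against a known-good transcription. The sketch below uses Python's standard `difflib.SequenceMatcher` for the alignment; the function name and the sample strings are assumptions for illustration, and the alignment is a heuristic rather than a true minimal edit distance.

```python
from collections import Counter
from difflib import SequenceMatcher

def substitutions(reference: str, ocr: str) -> Counter:
    """Count character substitutions as (reference char, OCR char) pairs."""
    pairs = Counter()
    # autojunk=False disables difflib's heuristic that ignores very
    # frequent characters in long inputs.
    matcher = SequenceMatcher(None, reference, ocr, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # Only equal-length replacements map one character to one character.
        if tag == "replace" and (i2 - i1) == (j2 - j1):
            pairs.update(zip(reference[i1:i2], ocr[j1:j2]))
    return pairs

# Made-up sample: "O" read as "D", and "0" read as "O".
print(substitutions("October 10", "Dctober 1O"))
```

Running the same reference page through two versions and comparing the totals would give a concrete figure for how much the error rate changed.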

I can upload examples if needed.

Best regards
josch
User
Posts: 7
Joined: Mon Sep 21, 2009 11:04 am

Re: OCR got worse in PDF-XChange Editor 6.0.317

Post by josch »

I can confirm that OCR for German documents is much worse now!
Will - Tracker Supp
Site Admin
Posts: 6815
Joined: Mon Oct 15, 2012 9:21 pm
Location: London, UK

Re: OCR got worse in PDF-XChange Editor 6.0.317

Post by Will - Tracker Supp »

Hi guys,

Thanks for the posts - I've passed this along to our OCR dev. and am waiting to hear back. I'll update you as soon as possible.

Thanks,
If posting files to this forum, you must archive the files to a ZIP, RAR or 7z file or they will not be uploaded.
Thank you.

Best regards

Will Travaglini
Tracker Support (Europe)
Tracker Software Products Ltd.
http://www.tracker-software.com

Re: OCR got worse in PDF-XChange Editor 6.0.317

Post by Will - Tracker Supp »

Hi all,

Should have asked previously: can we get a sample document?

Thanks,

Re: OCR got worse in PDF-XChange Editor 6.0.317

Post by ChrisZ16 »

Will,

Here are two examples.

5.5.316.0 only had problems with superscripted footnote-numbers (which is acceptable) and §-signs.

6.0.317 understands §-signs, but does not recognize any Umlauts.
Attachments: Example-01.zip (645.92 KiB), Example-02.zip (253.68 KiB)

Re: OCR got worse in PDF-XChange Editor 6.0.317

Post by Will - Tracker Supp »

Hi Chris,

Thanks for those - I've passed them along to the Dev. Team.

Cheers,
caroll consulting
User
Posts: 1
Joined: Sun May 13, 2012 5:21 pm

Re: OCR got worse in PDF-XChange Editor 6.0.317

Post by caroll consulting »

I am trying to OCR Swedish characters but I get a 0% success rate. Absolutely none of the ÅÄÖ characters are identified.
John - Tracker Supp
Site Admin
Posts: 5219
Joined: Tue Jun 29, 2004 10:34 am
Location: United Kingdom

Re: OCR got worse in PDF-XChange Editor 6.0.317

Post by John - Tracker Supp »

Hi Caroll,

Thanks - this is with our development team and we will respond in due course. Thanks for your patience.
If posting files to this forum - you must archive the files to a ZIP, RAR or 7z file or they will not be uploaded - thank you.

Best regards
Tracker Support
http://www.tracker-software.com

Re: OCR got worse in PDF-XChange Editor 6.0.317

Post by John - Tracker Supp »

I can confirm that an issue has been located and this will be corrected in a service release in the coming days.

Re: OCR got worse in PDF-XChange Editor 6.0.317

Post by ChrisZ16 »

In 6.0.317.1 the OCR is back to how it was up to 5.5.316 (at least for German). As far as I can see, the errors from 6.0.317.0 are gone and the results are exactly the same as in 5.5.316, so it is working well again.

But it also lost the ability to recognize §-signs, which only 6.0.317.0 had.

Would it be possible to bring that back as well?

Cheers
Chris

Re: OCR got worse in PDF-XChange Editor 6.0.317

Post by John - Tracker Supp »

Hi Chris,

I have passed this to the team member responsible and asked him to investigate. We will advise once we have his response.

Thanks for your patience.
anybodym
User
Posts: 5
Joined: Thu Mar 20, 2014 2:17 pm

Re: OCR got worse in PDF-XChange Editor 6.0.317

Post by anybodym »

Thank you for fixing this.
Just a few days ago I noticed the weird German umlaut OCR problem at a customer of mine who uses PDF-XChange Editor.
I've now tried it with the new 6.0.317.1 on my PC and Umlauts are now recognized again.

Also: could you please improve OCR speed by a factor of four? While I was at it, I ran some tests:
I scanned 24 pages (text, plus some tables), and with "English, German" and "Accuracy High" the OCR took more than 7 minutes to complete.
Meanwhile, CPU usage from PDFXEdit.exe was only 12-13% (it's a 4-core CPU with Hyper-Threading, so 12.5% corresponds to one of its 8 logical cores).
Why doesn't it run OCR on the pages in parallel? It could easily be 4 times faster. I don't think the pages depend on each other for OCR!
And OCR could start even earlier! Why doesn't it already run in the background on the pages scanned so far while the scanner is still busy with the remaining pages? No need to make the user wait longer than necessary!
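
To illustrate the idea (not how PDF-XChange Editor works internally, which I don't know), here is a minimal Python sketch. `ocr_page` is a hypothetical stand-in for a real per-page OCR call, and the sketch assumes that pages are independent and that the real engine releases Python's GIL; for pure-Python work, `ProcessPoolExecutor` would be the closer fit.

```python
from concurrent.futures import ThreadPoolExecutor

def ocr_page(page_image: bytes) -> str:
    """Hypothetical placeholder for a real per-page OCR call."""
    return f"text of {len(page_image)}-byte page"

def ocr_while_scanning(scanned_pages, workers=4):
    """OCR pages on a worker pool as they arrive from the scanner.

    scanned_pages may be a generator that yields each page when its scan
    finishes; every page is submitted as soon as it is yielded, so OCR of
    early pages overlaps with scanning of later ones.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(ocr_page, page) for page in scanned_pages]
        # Collect results in submission order, i.e. in page order.
        return [future.result() for future in futures]

# Made-up input: three fake "page images" of different sizes.
print(ocr_while_scanning(iter([b"a", b"bb", b"ccc"]), workers=2))
```

With four workers and independent pages, the best case would be roughly a quarter of the single-threaded time, minus whatever overhead the final merge into one PDF adds.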

Thanks for listening (and fixing the bug)!
Tracker Supp-Stefan
Site Admin
Posts: 17910
Joined: Mon Jan 12, 2009 8:07 am
Location: London

Re: OCR got worse in PDF-XChange Editor 6.0.317

Post by Tracker Supp-Stefan »

Hello anybodym,

Glad to hear the umlauts issue is sorted for you!
As for the OCR speed and using multiple cores (if available): we are considering this, but as it is a rather complex task, we will need some more time to achieve it.

Regards,
Stefan