OCR got worse in PDF-XChange Editor 6.0.317
I use OCR on scanned journal articles (PDF) in German.
OCR worked almost error-free in previous versions of PDF-XChange Editor. In 6.0.317 (licensed version) it is really bad: I get roughly 5 to 20 times as many errors.
My settings:
Language: German
Accuracy: Medium (High gives even slightly worse results)
Output Type: Create New Searchable PDF
Quality: 600 or 300 (tried both)
First problem: German umlauts and ß (äöüß) are never recognized now; in previous versions this worked. This, and the fact that switching the OCR language between "German", "English" and "German,English" gives exactly the same result, seems to indicate that German OCR is not working at all.
In c:\programme\Tracker Software\PDF Editor\PluginsData\OCRLanguages, eng_pxvocr.dat is 21,364 KB and deu_pxvocr.dat is only 2,381 KB, but the sizes were the same in 5.5.316.0, which worked fine.
Second problem: recognition of ordinary (not language-specific) characters is also worse, confusing c/e, t/i, i/I, O/D, o/0 and so on. The error rate here was almost zero before; now there are many errors.
I can upload examples if needed.
Best regards
Re: OCR got worse in PDF-XChange Editor 6.0.317
I can confirm that OCR for German documents is much worse now!
- Will - Tracker Supp
- Site Admin
- Posts: 6815
- Joined: Mon Oct 15, 2012 9:21 pm
- Location: London, UK
Re: OCR got worse in PDF-XChange Editor 6.0.317
Hi guys,
Thanks for the posts - I've passed this along to our OCR dev. and am waiting to hear back. I'll update you as soon as possible.
Thanks,
If posting files to this forum, you must archive the files to a ZIP, RAR or 7z file or they will not be uploaded.
Thank you.
Best regards
Will Travaglini
Tracker Support (Europe)
Tracker Software Products Ltd.
http://www.tracker-software.com
- Will - Tracker Supp
Re: OCR got worse in PDF-XChange Editor 6.0.317
Hi all,
Should have asked previously: can we get a sample document?
Thanks,
Re: OCR got worse in PDF-XChange Editor 6.0.317
Will,
Here are two examples.
5.5.316.0 only had problems with superscripted footnote-numbers (which is acceptable) and §-signs.
6.0.317 understands §-signs, but does not recognize any Umlauts.
- Will - Tracker Supp
Re: OCR got worse in PDF-XChange Editor 6.0.317
Hi Chris,
Thanks for those - I've passed them along to the Dev. Team.
Cheers,
-
- User
- Posts: 1
- Joined: Sun May 13, 2012 5:21 pm
Re: OCR got worse in PDF-XChange Editor 6.0.317
I am trying to OCR Swedish characters but get a 0% success rate. Absolutely none of the ÅÄÖ characters are recognized.
- John - Tracker Supp
- Site Admin
- Posts: 5219
- Joined: Tue Jun 29, 2004 10:34 am
- Location: United Kingdom
Re: OCR got worse in PDF-XChange Editor 6.0.317
Hi Caroll,
Thanks - this is with our development team and we will respond in due course. Thanks for your patience.
If posting files to this forum - you must archive the files to a ZIP, RAR or 7z file or they will not be uploaded - thank you.
Best regards
Tracker Support
http://www.tracker-software.com
- John - Tracker Supp
Re: OCR got worse in PDF-XChange Editor 6.0.317
I can confirm that an issue has been located and this will be corrected in a service release in the coming days.
Re: OCR got worse in PDF-XChange Editor 6.0.317
In 6.0.317.1 the OCR is back to how it was up to 5.5.316 (at least for German). As far as I can see, the errors from 6.0.317.0 are gone and the results are exactly the same as in 5.5.316, so it is working well again.
But it has also lost the ability to recognize §-signs, which only 6.0.317.0 had.
Would it be possible to bring that back as well?
Cheers
Chris
- John - Tracker Supp
Re: OCR got worse in PDF-XChange Editor 6.0.317
Hi Chris,
I have passed this to the team member responsible and asked him to investigate. We will advise once we have his response.
Thanks for your patience.
Re: OCR got worse in PDF-XChange Editor 6.0.317
Thank you for fixing this.
Just a few days ago I noticed the odd German umlaut OCR problem at a customer of mine who uses PDF-XChange Editor.
I've now tried it with the new 6.0.317.1 on my PC and Umlauts are now recognized again.
Also: could you please improve OCR speed by a factor of four? Since I was already at it, I've done some tests:
I scanned 24 pages (text, plus some tables); with "English, German" and "Accuracy High" the OCR took more than 7 minutes to complete.
Meanwhile, CPU usage of PDFXEdit.exe was only 12-13% (it's a 4-core CPU with Hyper-Threading, so 12.5% corresponds to one logical core).
Why doesn't it run the OCR for the pages in parallel? It could easily be 4 times faster. I don't think the pages depend on each other for OCR'ing!
And OCR'ing could start even earlier! Why doesn't OCR already run in the background for the pages already scanned while the scanner is still busy scanning the remaining pages? No need to make the user wait more than necessary!
Thanks for listening (and fixing the bug)!
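To illustrate the suggestion above, here is a minimal Python sketch of page-level parallelism. It is only an illustration under stated assumptions, not how PDF-XChange Editor works internally: `fake_ocr_page`, `ocr_pages` and `single_core_share` are hypothetical names, and the stand-in "OCR" merely upper-cases text so the example runs without any OCR library.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor
from typing import Callable, List, Sequence


def single_core_share(logical_cores: int) -> float:
    """CPU usage (in percent) shown when exactly one logical core is busy."""
    return 100.0 / logical_cores


def fake_ocr_page(page: bytes) -> str:
    """Hypothetical stand-in for a real, CPU-heavy per-page OCR call."""
    return page.decode("utf-8").upper()


def ocr_pages(
    pages: Sequence[bytes],
    ocr_page: Callable[[bytes], str] = fake_ocr_page,
    workers: int = 4,
    executor_cls=ProcessPoolExecutor,
) -> List[str]:
    """Run ocr_page on every page using a pool of workers.

    Assumes pages are independent of each other. Processes are the
    default because CPU-bound Python code does not scale across threads.
    """
    with executor_cls(max_workers=workers) as pool:
        # map() yields results in input order, so text stays aligned
        # with page numbers even if pages finish out of order.
        return list(pool.map(ocr_page, pages))
```

With 8 logical cores, `single_core_share(8)` gives 12.5, which matches the CPU usage I saw. Calling `ocr_pages(pages, workers=4)` on a list of 24 page images would spread them over four worker processes; the real speed-up depends on memory use and on how independent the pages really are, so "4 times faster" is a best case.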
- Tracker Supp-Stefan
- Site Admin
- Posts: 17929
- Joined: Mon Jan 12, 2009 8:07 am
- Location: London
Re: OCR got worse in PDF-XChange Editor 6.0.317
Hello anybodym,
Glad to hear the umlauts issue is sorted for you!
As for OCR speed and using multiple cores (if available): we are considering this, but as it is a rather complex task we will need some more time to achieve it.
Regards,
Stefan