OCR for german language doesn't work PDF-XChange Editor V8

Posted: Tue May 14, 2019 7:52 am
by Biber7
Many characters are not recognized correctly; see the attached original and OCR output. Examples:
€/m²
,
.
Tested in many other documents.
In other documents, OCR turns an X into ;(
If OCR turns an X into a negative smiley ;( our customers misunderstand something. So we have to read every letter, for example across 100 pages, before forwarding the documents (for example, to invoicing).

Re: OCR for german language doesn't work PDF-XChange Editor V8

Posted: Tue May 14, 2019 7:23 pm
by TrackerSupp-Daniel
Hello Biber7,

Thank you for the report. I see the issues with €/m² and other similar issues; however, I was not able to reproduce the issue with X on my end. Might I ask for an example where that occurs as well, so that I can report all these issues together?

Beyond that, I notice that the original scanned image is around 200 dpi. Note that OCR operates best in the 300-600 dpi range, so if possible you may wish to try rescanning this document at a higher DPI setting and see if that helps with the OCR output.
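
As a rough illustration of the resolution point above, here is a minimal Python sketch that estimates a scan's effective DPI from its pixel width and the physical page width. The function names and the A4 example values are assumptions for illustration only; they are not part of PDF-XChange Editor.

```python
MM_PER_INCH = 25.4


def effective_dpi(pixel_width: int, page_width_mm: float) -> float:
    """Estimate dots per inch from the scan's pixel width and the paper width."""
    if page_width_mm <= 0:
        raise ValueError("page width must be positive")
    return pixel_width / (page_width_mm / MM_PER_INCH)


def in_recommended_range(dpi: float, low: float = 300, high: float = 600) -> bool:
    """Check a DPI value against the 300-600 dpi range mentioned above."""
    return low <= dpi <= high


# Hypothetical example: an A4 page (210 mm wide) scanned 1654 pixels wide.
dpi = effective_dpi(1654, 210.0)
print(round(dpi), in_recommended_range(dpi))
```

A scan that falls below the range is not necessarily unusable, but it is a reasonable first thing to rule out.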

Kind regards,

Re: OCR for german language doesn't work PDF-XChange Editor V8

Posted: Wed May 15, 2019 5:55 am
by Biber7
Hello,

Please see the original, no. 03.1: X
OCR: )(

Original, no. 03.4: X
OCR: ;(
So we thought the customer did not want these 2 numbers. But no: the customer does want them! So we cannot use OCR.

Thank you for your professional support over many years. We bought your software, PDF-XChange Editor V8, for 2000 users. We love it, as long as it doesn't falsify documents. If OCR gets 2 or 3 letters wrong, it doesn't matter. But please don't distort documents so that we can no longer see what customers have marked with crosses.

Yes, I know about dpi. We get about 4000 requests daily. We cannot show all our customers' secretaries how to change the scanner resolution. And even if we did, they would not do it.

Resolutions above 400 dpi are disabled on all our 2000 workstations. We cannot scan, for example, 100 pages per document at 600 dpi.

Earlier versions of your software never converted crosses or letters; they just superimposed the OCR-recognized text OVER the document. We can't find this feature: leave the original document untouched and only put the OCR text over the original. In earlier versions we could select text with the mouse, copy, and paste.

Kind regards

Re: OCR for german language doesn't work PDF-XChange Editor V8

Posted: Wed May 15, 2019 6:24 pm
by TrackerSupp-Daniel
Hello Biber7,

Thank you for the files. I see this now and will be putting all the information into a bug report shortly. I cannot guarantee that we can resolve this one, as it pertains to drawn letters, which OCR is not intended to work with in the first place, but hopefully we can find a way to at least ignore these cases.

Regarding the old OCR: it is still present, and you can enable it from the preferences (Ctrl+K) under the OCR tab. Simply change the setting from Enhanced to Default, and you will be able to go back to the old OCR engine.

Alternatively, if you wish to take advantage of the increased speed and accuracy of the new engine, you simply need to change the setting in the OCR dialog to "searchable image".
I hope this helps!

Re: OCR for german language doesn't work PDF-XChange Editor V8

Posted: Mon May 27, 2019 12:53 pm
by Biber7
(2 attachments with 7 mistakes in 1 contact; maximum dpi = 400 in our network.)

We will try. --- We tried: the old OCR works reliably. Thank you so much.

Re: OCR for german language doesn't work PDF-XChange Editor V8

Posted: Mon May 27, 2019 4:46 pm
by TrackerSupp-Daniel
Hello Biber7,

Thank you for the sample files. OCR does innately have issues with minimalist fonts like this one. Choosing "High accuracy" mode seems to help somewhat, but some items, such as the bold lowercase letter "a", still cause problems. The dev team is aware of these items and is working on improving them, but in that specific case it will likely be a long time coming.

It is always advised to review the document after an OCR operation to ensure that no serious errors were introduced.
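
For larger batches, part of that review could be pre-screened automatically. The following is a minimal Python sketch, assuming the OCR text has already been exported to a plain string by some other means; the pattern and function name are hypothetical and only target punctuation-only tokens such as ;( or )( like the ones reported earlier in this thread. It does not replace a manual check.

```python
import re

# Assumption: a standalone run of two or more bracket/punctuation characters
# is suspicious, because a hand-drawn cross (X) was reported to come out
# of OCR as ";(" or ")(".
SUSPICIOUS = re.compile(r"(?<!\S)[;:)(\]\[|\\/]{2,}(?!\S)")


def flag_suspicious(text: str) -> list[tuple[int, str]]:
    """Return (line number, token) pairs for punctuation-only tokens."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for match in SUSPICIOUS.finditer(line):
            hits.append((line_no, match.group()))
    return hits


sample = "03.1 )(\n03.4 ;(\n03.5 X\n12,50 €/m²"
for line_no, token in flag_suspicious(sample):
    print(f"line {line_no}: suspicious token {token!r}")
```

Tokens embedded in other text, such as the slash in €/m², are deliberately not flagged, so the check stays narrow and will miss other kinds of misrecognition.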

I am glad to hear that the old OCR engine is working better for you. In my test, it had some of the same issues that the new one does, such as the letter "a" mentioned above, so you may still wish to double-check the results there. This article will help you convert documents that were OCR'd using the old method into an editable format: https://www.pdf-xchange.com/knowle ... -performed

Kind regards,