PDF XChange Forum

Posted: **Fri Feb 24, 2012 6:54 pm**

In a few documents, the OCR feature seems to convert the "-" character to "?".

For example, "RFP-S-0129-0-2011/EM - Disaster Debris" becomes "RFP?S?0129?0?2011/EM ? Disaster Debris".

Has anyone else noticed this?

Posted: **Fri Feb 24, 2012 7:00 pm**

Could you send us a sample file (input)? OCR is always going to make mistakes from time to time, but diagnosing why (and whether or not we can come up with a solution) will require some insight into the file causing it.

If you don't want to attach it to this post, you can email it to support@pdf-xchange.com.

-Walter

Posted: **Tue Feb 28, 2012 7:01 pm**

Hi Arnold, I have already followed up with you by email, but I am posting this to the forum as well for completeness.

The documents you originally sent already contained bad searchable text. I’m not sure if this came from an older version of our viewer or from something else (e.g. a scanner’s OCR software), but if I remove this bad text, OCR works fine with our current viewer build (version 201). OCR also works fine with the fresh scan you just sent. In both cases I get the correct text, with hyphens as expected.

The settings I used were:
- English
- Accuracy: Medium
- Mode: Preserve original content and add text as layer

Therefore I would recommend upgrading to build 201 and trying again from a clean document. If you use “Preserve original content and add text as layer” with the documents you sent originally (that already have OCR text in them), that bad OCR text layer is retained, giving the impression that OCR failed. If you use “Convert page content to image only – add text as layer” then the original bad text is removed and you can see that OCR will work as expected.

But please do make sure you are using build 201 of the viewer to rule out bugs in previous versions that have already been fixed.

-Walter

Problem with - character
