question(ocr): black text on noise background

Discussion for the End User use of OCR in PDF-XChange Editor and Viewer

Moderators: TrackerSupp-Daniel, Tracker Support, Paul - Tracker Supp, Vasyl-Tracker Dev Team, Chris - Tracker Supp, Sean - Tracker, Ivan - Tracker Software, Tracker Supp-Stefan

SashaChernykh
User
Posts: 11
Joined: Tue Aug 20, 2019 8:01 am

question(ocr): black text on noise background

Post by SashaChernykh »

1. Summary

PDF-XChange Editor doesn't recognize black text on a noisy background when adding an OCR layer.

2. Data
  • KiraSuperhero.pdf: a black-and-white page from a scanned Russian book, to which I want to add an OCR layer

The characters in the bottom block of text are easily readable by a human.

3. Steps to reproduce

Open KiraSuperhero.pdf in PDF-XChange Editor → add an OCR layer with Russian as the language → save the file with OCR.

4. Actual behavior

PDF-XChange Editor doesn't recognize the letters in the lower block.

5. Expected behavior
  1. Make it possible to recognize black letters on a noisy background, as in my case.
  2. Or tell me how I can edit this PDF, without losing the book's design, so that PDF-XChange Editor recognizes this document correctly.
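For item 2, one common preprocessing approach outside the editor is a median filter, which suppresses isolated "pepper" specks before OCR is run. The following is a minimal pure-Python sketch, assuming the page has already been extracted as a binary image stored as a list of lists (1 = black ink, 0 = white). The function name `median_filter` is hypothetical and is not a PDF-XChange Editor feature; it returns a cleaned copy, so by itself it does not preserve the book's original appearance.

```python
def median_filter(img, radius=1):
    """Return a cleaned copy of a binary image (1 = black, 0 = white).

    Hypothetical helper: each pixel becomes the median of its
    (2*radius+1) x (2*radius+1) neighbourhood. The window is clipped at
    the image borders, and for even-sized clipped windows the upper
    median is taken.
    """
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Collect the neighbourhood values that fall inside the image.
            window = [
                img[yy][xx]
                for yy in range(max(0, y - radius), min(h, y + radius + 1))
                for xx in range(max(0, x - radius), min(w, x + radius + 1))
            ]
            window.sort()
            out[y][x] = window[len(window) // 2]
    return out
```

A 3×3 window (`radius=1`) removes single-pixel specks, but it also erases strokes that are only one pixel wide, so the radius is a trade-off to test against the actual scan.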
6. Note

I don't have a paper copy of this book → I can't re-scan this page in grayscale or at a better quality.

7. Environment
  • Windows 10 Enterprise LTSB 64-bit EN
  • PDF-XChange Editor 8.0 Build 331.0, Portable
  • Russian language pack for the Default OCR engine
Thanks.
Dimitar - Tracker Supp
Site Admin
Posts: 1778
Joined: Mon Jan 15, 2018 9:01 am

Re: question(ocr): black text on noise background

Post by Dimitar - Tracker Supp »

Hello SashaChernykh,

Unfortunately, our Enhanced OCR currently has some problems recognizing text on a colored background.

The developers are working on resolving this issue.

Regards.
Ovg
User
Posts: 461
Joined: Tue Sep 05, 2017 4:56 pm

Re: question(ocr): black text on noise background

Post by Ovg »

The Default OCR engine doesn't work either.
It's impossible to lead us astray for we don't care even to choose the way.
PDF-XChange PRO, 10.1.1 (Build 381) / W7 SP1 x64
TrackerSupp-Daniel
Site Admin
Posts: 8436
Joined: Wed Jan 03, 2018 6:52 pm

Re: question(ocr): black text on noise background

Post by TrackerSupp-Daniel »

Hello OVG,
Have you tried this with the accuracy setting on "Low"?

In my tests, that allowed the lower text to be recognized. However, note that this is a fairly extreme case of "peppering" on the page, which will almost always seriously reduce OCR accuracy.
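Another common way to reduce this kind of "peppering" before OCR is to delete connected groups of black pixels that are too small to be part of a letter. The following is a minimal sketch under the assumption that the page is available as a binary image (list of lists, 1 = black ink). `remove_specks` and its `min_size` threshold are hypothetical illustrations, not PDF-XChange Editor settings.

```python
from collections import deque


def remove_specks(img, min_size=3):
    """Return a copy of a binary image with small black specks removed.

    Hypothetical helper: black pixels are grouped into 8-connected
    components, and any component with fewer than min_size pixels is
    turned white. The input image is not modified.
    """
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    seen = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if img[y][x] != 1 or seen[y][x]:
                continue
            # Breadth-first search collects one connected component.
            component = []
            queue = deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                component.append((cy, cx))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and img[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            # Components smaller than the threshold are treated as noise.
            if len(component) < min_size:
                for cy, cx in component:
                    out[cy][cx] = 0
    return out
```

Specks that touch a letter merge into its component and survive, and too large a `min_size` would also erase small marks such as periods or the dots of "ё", so the threshold needs tuning against the actual page.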

Kind regards,
Dan McIntyre - Support Technician
Tracker Software Products (Canada) LTD

+++++++++++++++++++++++++++++++++++
Our website domain and email address have changed as of 26/10/2023.
https://www.pdf-xchange.com
Support@pdf-xchange.com
Ovg
User
Posts: 461
Joined: Tue Sep 05, 2017 4:56 pm

Re: question(ocr): black text on noise background

Post by Ovg »

Hello Daniel! Yes, I have tried Low accuracy; it doesn't work for me.
TrackerSupp-Daniel
Site Admin
Posts: 8436
Joined: Wed Jan 03, 2018 6:52 pm

Re: question(ocr): black text on noise background

Post by TrackerSupp-Daniel »

Hello OVG, and Sasha!

Sorry for the delay. I've just come back to retry this one and realized that I was testing with the EOCR (Enhanced OCR) instead of the Default OCR. I do indeed see the issue: it doesn't recognize anything, even on Low accuracy. I have informed the Dev team of this, but cannot offer a timeline for a resolution.

Kind regards,
Dan McIntyre - Support Technician
Tracker Software Products (Canada) LTD

SashaChernykh
User
Posts: 11
Joined: Tue Aug 20, 2019 8:01 am

Re: question(ocr): black text on noise background

Post by SashaChernykh »

@Tracker Software support:

Type: Question

What priority (e.g. low, medium, high) does this task have in PDF-XChange Editor's task prioritization?

Thanks.
Dimitar - Tracker Supp
Site Admin
Posts: 1778
Joined: Mon Jan 15, 2018 9:01 am

Re: question(ocr): black text on noise background

Post by Dimitar - Tracker Supp »

Hello SashaChernykh,

Since this affects our customers' workflow, this problem has high priority.

However, we cannot give an exact date for when this issue will be resolved, since this OCR engine is developed and supported by external developers.

Regards.