OCR filter between text and numbers

Discussion for the End User use of OCR in PDF-XChange Editor and Viewer

Moderators: TrackerSupp-Daniel, Tracker Support, Paul - Tracker Supp, Vasyl-Tracker Dev Team, Chris - Tracker Supp, Sean - Tracker, Ivan - Tracker Software, Tracker Supp-Stefan

alexB
User
Posts: 1
Joined: Thu Oct 14, 2021 9:14 am

OCR filter between text and numbers

Post by alexB »

Hi.

Is there a filter option to make the OCR better at recognizing mixed text and numbers?

Usually it tries to recognize the whole string as either text or numbers, but I am working with technical part numbers that mix the two.

Example 1:
The string "TS14052" turns into "TSHoSZ"

Example 2:
The string "14H21" turns into "14421"
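
For readers who can post-process the recognized text outside the OCR tool, here is a minimal Python sketch of one possible workaround. It is not a PDF-XChange feature. It assumes the part-number format is known in advance and is written as a pattern of "L" (letter) and "D" (digit); the `coerce` function, the pattern notation, and the two confusion tables are hypothetical and built only from the two examples above.

```python
# Hypothetical confusion tables, taken only from the examples in this post.
# Example 1: "0" read as "o", "5" as "S", "2" as "Z".
TO_DIGIT = {"o": "0", "S": "5", "Z": "2"}
# Example 2: "H" read as "4".
TO_LETTER = {"4": "H"}


def coerce(text, pattern):
    """Coerce OCR output to a known pattern of 'L' (letter) and 'D' (digit).

    Returns the corrected string, or None when the lengths differ or a
    character cannot be mapped to the expected class.
    """
    if len(text) != len(pattern):
        return None
    out = []
    for ch, kind in zip(text, pattern):
        if kind == "D":
            if ch.isdigit():
                out.append(ch)
            elif ch in TO_DIGIT:
                out.append(TO_DIGIT[ch])
            else:
                return None
        elif kind == "L":
            if ch.isalpha():
                out.append(ch)
            elif ch in TO_LETTER:
                out.append(TO_LETTER[ch])
            else:
                return None
        else:
            raise ValueError(f"unknown pattern symbol: {kind!r}")
    return "".join(out)


# Example 2: the pattern says the third character must be a letter.
print(coerce("14421", "DDLDD"))
# Example 1: the OCR merged "14" into "H", so the length no longer matches.
print(coerce("TSHoSZ", "LLDDDDD"))
```

With the pattern "DDLDD", the misread "14421" is coerced back to "14H21". Example 1 shows the limit of this approach: because "14" was merged into a single "H", the result is one character short and the sketch returns None rather than guessing. Real confusion tables would also be ambiguous (a "4" could stand for "H" or "A", for instance), so any such correction should be checked against a list of valid part numbers.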

Kind regards,
Alex.
Willy Van Nuffel
User
Posts: 2342
Joined: Wed Jan 18, 2006 12:10 pm

Re: OCR filter between text and numbers

Post by Willy Van Nuffel »

Hi Alex,


The most important option here, in OCR, is the "Accuracy"-setting with choices "Auto / Low / Medium / High".

Myself, I get the best results (not yet perfect) with "Low" (using the Default OCR Engine).
In case you have a licensed version, you can try the Enhanced OCR Engine, with the different Accuracy settings.

After applying OCR in PDF-XChange Editor, you can check the result via View > Panes > Content pane.

Kind regards.


Willy
Attachments
PDF-XChange Editor - OCR - Low.pdf
TrackerSupp-Daniel
Site Admin
Posts: 8371
Joined: Wed Jan 03, 2018 6:52 pm

Re: OCR filter between text and numbers

Post by TrackerSupp-Daniel »

Hello, Alex

If you could send us a copy of a document in which you can reliably reproduce this issue, it would greatly help us with troubleshooting, and eventually with resolving the issue for you.

I should also note that, as Willy mentioned, the "accuracy" setting will be of great import here. Accuracy is used to define the quality of the Document however, and not the quality of the scanning that occurs (frankly, if it was the latter, why would we even offer options, of course everyone would want the highest quality OCR scan). In brief, you should generally only NEED to choose Auto or Low accuracy; choosing Normal or High is often unnecessary, as the Auto setting is usually able to determine where to use each as it goes.

~ If your document is old, damaged, stained, speckled, has a low contrast background or faded text, etc., you should be using Low or Normal accuracy.
~ If your document is a normal, decent quality scanned image, without many blemishes, and fairly clear text, Normal or Auto accuracy should be used.
~ If your document is pristine quality, either a completely unblemished very high quality scan, or a file which has never left the digital format, you should use Auto, or in some cases, High accuracy.

Kind regards,
Dan McIntyre - Support Technician
Tracker Software Products (Canada) LTD

+++++++++++++++++++++++++++++++++++
Our Web site domain and email address has changed as of 26/10/2023.
https://www.pdf-xchange.com
Support@pdf-xchange.com
DIV
User
Posts: 252
Joined: Fri Jun 23, 2017 1:47 am

Re: OCR filter between text and numbers

Post by DIV »

TrackerSupp-Daniel wrote: Fri Oct 15, 2021 12:14 am Accuracy is used to define the quality of the Document however, and not the quality of the scanning that occurs (frankly, if it was the latter, why would we even offer options, of course everyone would want the highest quality OCR scan).
OMG, I never guessed this from the GUI!!!

PLEASE change the GUI text from "Accuracy" to "Image quality" or "Input quality" or similar!
Alternatively invert and change to "Image imperfections" or similar.
Alternatively, instead of naming that panel "Recognition options", rename it to "Input document settings". That will then contrast well with the existing lowermost panel, "Output Options".

In answer to your (rhetorical?) question of why "Accuracy" might be interpreted as quality of OCR analysis [paraphrasing what I think you mean], there are some clear answers for me:
  • high-quality analyses can require long computational times, large memory capacity, and so on; lower-quality analyses are often considered acceptable if the higher-quality analyses are prohibitively slow or otherwise computationally demanding;
  • the option appears in a dialogue box containing a whole lot of settings about how the OCR should be performed, produced after the user has clicked the "OCR Page(s)" button.
—DIV
Paul - Tracker Supp
Site Admin
Posts: 6813
Joined: Wed Mar 25, 2009 10:37 pm
Location: Chemainus, Canada

Re: OCR filter between text and numbers

Post by Paul - Tracker Supp »

Give us some time on this one and we will have further discussion here about this.

Things are not normal at the moment, it may be this week, or maybe next.

Best regards

Paul O'Rorke
Tracker Support North America
http://www.tracker-software.com
DIV
User
Posts: 252
Joined: Fri Jun 23, 2017 1:47 am

Re: OCR filter between text and numbers

Post by DIV »

Thanks, Paul.
Even next month would be great, as far as I'm concerned :-)
Actually, the suggestion's intended more for the benefit of new/other users (and hence also the benefit of Tracker).
I didn't mean to stress you out with my all-caps 'yelling' ;-)

—DIV
Paul - Tracker Supp
Site Admin
Posts: 6813
Joined: Wed Mar 25, 2009 10:37 pm
Location: Chemainus, Canada

Re: OCR filter between text and numbers

Post by Paul - Tracker Supp »

No worries, as you are aware, the discussion is happening: viewtopic.php?f=63&t=37543

Let's see what comes from that percolating...

:-)
Best regards

Paul O'Rorke
Tracker Support North America
http://www.tracker-software.com
DIV
User
Posts: 252
Joined: Fri Jun 23, 2017 1:47 am

Sign of the times

Post by DIV »

Paul - Tracker Supp wrote: Mon Mar 07, 2022 4:04 pm Things are not normal at the moment, it may be this week, or maybe next.
Paul,
People around the world are horrified by the Russian military's attack on Ukraine,
search.php?keywords=Ukraine
https://www.pdf-xchange.com/index. ... s/view/274


My thoughts are with you all, and I encourage others to dig deep to oppose the war.
https://www.defendukraine.org/donate

—DIV

P.S. I realise that this is a technical support forum, so I apologise for posting on another matter, and I understand that this forum is not set up for political discussion.
Tracker Supp-Stefan
Site Admin
Posts: 17771
Joined: Mon Jan 12, 2009 8:07 am
Location: London

Re: OCR filter between text and numbers

Post by Tracker Supp-Stefan »

Hello DIV,

Many thanks for the kind words and consideration!

Kind regards,
Stefan