How to improve OCR performance

Discussion for the End User use of OCR in PDF-XChange Editor and Viewer

CFCF
User
Posts: 1
Joined: Mon Dec 23, 2019 5:53 am

How to improve OCR performance

Post by CFCF »

All,

I own a powerful 8-core (16 threads with Hyper-Threading) Windows 10 64-bit PC with 32 GB of RAM, and I'd like to put that power to use for OCR.

I've just upgraded my installation to PDF-XChange Editor Plus V8 (Build 335.0) with the Enhanced OCR plugin.

No matter what settings I choose in the OCR dialog or in Settings/Performance (16 threads), CPU usage in the Windows 10 Task Manager doesn't rise beyond 35% during OCR.

OCR of larger PDFs should be ideal for parallelization, so I'm hoping to find a way to make the OCR plugin use my compute resources better.

Thanks for your insights

Christoph
Tracker Supp-Stefan
Site Admin
Posts: 17824
Joined: Mon Jan 12, 2009 8:07 am
Location: London

Re: How to improve OCR performance

Post by Tracker Supp-Stefan »

Hello Christoph,

I am checking with colleagues from the dev team to see if the EOCR engine is affected by these settings, and if not - what can be done.

Season's greetings,
Stefan
Vasyl-Tracker Dev Team
Site Admin
Posts: 2352
Joined: Thu Jun 30, 2005 4:11 pm
Location: Canada

Re: How to improve OCR performance

Post by Vasyl-Tracker Dev Team »

Hi Christoph.

We found an issue that limits the number of threads that can be used for OCR on x64 systems. We will fix it in the upcoming build.
Sorry for the inconvenience and thanks for the report.

Cheers.
Vasyl Yaremyn
Tracker Software Products
Project Developer

Timur Born
User
Posts: 874
Joined: Tue Jun 26, 2012 1:50 pm

Re: How to improve OCR performance

Post by Timur Born »

I only just noticed that I was still using build 334, which was limited in its number of OCR threads (a maximum of 3 fully loaded threads). I just tested build 336 and am happy to say that it now makes full use of all my CPU cores. It creates more threads than there are CPU cores, which may or may not be intentional, but in the end it speeds up OCR considerably.
Timur Born
User
Posts: 874
Joined: Tue Jun 26, 2012 1:50 pm

Re: How to improve OCR performance

Post by Timur Born »

Unfortunately with "Fine Page Content" the "Rasterizing" and especially "Applying results of recognition" parts seem to be mostly single-threaded and correspondingly can take a long time to complete.
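As a rough illustration of why mostly single-threaded stages matter so much, here is a minimal sketch of Amdahl's law, which gives an upper bound on overall speedup when only part of a job runs in parallel. The parallel fractions used below are made-up assumptions for illustration, not measurements of PDF-XChange Editor, and `amdahl_speedup` is a hypothetical helper name.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Upper bound on overall speedup per Amdahl's law.

    parallel_fraction: share of the single-threaded run time that can be
        spread across workers (an assumed value, not a measured one).
    workers: number of threads or cores working on the parallel part.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    # The serial part takes the same time no matter how many workers exist;
    # only the parallel part shrinks with the worker count.
    return 1.0 / (serial_fraction + parallel_fraction / workers)


if __name__ == "__main__":
    # Hypothetical splits between parallel recognition and serial stages.
    for fraction in (0.50, 0.80, 0.95):
        bound = amdahl_speedup(fraction, 16)
        print(f"{fraction:.0%} parallel on 16 threads: at most {bound:.1f}x faster")
```

Under these assumptions, even a job that is 80% parallel cannot run more than about 4x faster on 16 threads, which is why the single-threaded "Rasterizing" and "Applying results of recognition" parts can dominate the total time.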
Tracker Supp-Stefan
Site Admin
Posts: 17824
Joined: Mon Jan 12, 2009 8:07 am
Location: London

Re: How to improve OCR performance

Post by Tracker Supp-Stefan »

Hello Timur,

I will check with Vasyl whether either of those steps can be improved, and we will post any further news as soon as we have it!

Cheers,
Stefan
Tracker Supp-Stefan
Site Admin
Posts: 17824
Joined: Mon Jan 12, 2009 8:07 am
Location: London

Re: How to improve OCR performance

Post by Tracker Supp-Stefan »

Hello Timur,

Our devs said that they will investigate what can be done for those two steps of the OCR process, and I've made a ticket for it:
#5101: OCR Performance optimisations for "Fine Page Content" and "Rasterizing" steps of the process
We will post here again as soon as there is any further news.

Regards,
Stefan
DrStoertebecker
User
Posts: 1
Joined: Mon Feb 15, 2021 8:50 am

Re: How to improve OCR performance

Post by DrStoertebecker »

Same problem here:

My CPU is a 16-core Ryzen 9 5950X with 32 GB of RAM. I am running PDF-XChange Editor Plus (Version: 9.0 (Build 352.0) (Feb 4 2021; 17:55:44) 64bit) on Windows 10 Home (19041.1.amd64fre.vb_release.191206-1406).

When using OCR, multi-threading is pretty much non-existent. Running OCR on large files with several hundred pages sometimes takes over half an hour. Overall CPU utilization sits at around 5% the whole time, with only one core (which one keeps changing) in use at around 30-80%.
My first instinct was that the software is not very good at distributing the pages of a single document across different threads. So I tried running OCR on a large number of files simultaneously using batch processing in "PDF-Tools". Same problem: CPU utilization is around 5% and OCR takes forever.
I also tried changing the multi-threading option from "automatic" to "16 cores", with no effect.

The weird thing is: every once in a while, with some files, OCR suddenly does use all 16 cores/32 threads at around 95% core usage, and everything runs extremely fast and smoothly. However, so far I have not been able to find any rule behind this behaviour (file size or similar). It all seems quite random to me.

For the record: the problem is most annoying with OCR because a job takes forever to finish, but I have the impression that multi-threading does not work very well in general. For instance, when I print a large document to PDF using the "PDF-XChange Standard" PDF printer, it also takes a very long time, and CPU utilization is mostly below 5% with only one core doing all the work.

I would be very grateful for a solution to this problem. Given my CPU and its extremely low utilization, I assume I could cut the time for many jobs by over 90% if multi-threading worked properly.
Thanks in advance!

Sincerely,
Chris
Tracker Supp-Stefan
Site Admin
Posts: 17824
Joined: Mon Jan 12, 2009 8:07 am
Location: London

Re: How to improve OCR performance

Post by Tracker Supp-Stefan »

Hello DrStoertebecker,

At our last meeting with the devs this subject was discussed, and they told me that we are currently looking at ways to allow multithreading to work fully when performing compute-heavy tasks like OCR. Some things still need to be tested to ensure that this will not have negative impacts elsewhere, but we are definitely working on it and will release it as soon as possible (no specific ETA yet)!

Kind regards,
Stefan