How to improve OCR performance

Discussion for the End User use of OCR in PDF-XChange Editor and Viewer

Moderators: TrackerSupp-Daniel, Tracker Support, Vasyl-Tracker Dev Team, Sean - Tracker, Paul - Tracker Supp, Chris - Tracker Supp, Tracker Supp-Stefan, Ivan - Tracker Software

CFCF
User
Posts: 1
Joined: Mon Dec 23, 2019 5:53 am

How to improve OCR performance

Post by CFCF » Mon Dec 23, 2019 6:05 am

All,

I own a powerful 8 core (16 with hyper threading) Win10 64 bit PC with 32 GB of RAM whose power I'd like to employ for OCR.

I've just upgraded my installation to PDF-XChange Editor Plus V8 Build 335.0 with the enhanced OCR plugin.

No matter what settings I choose in the OCR dialog or in Settings/Performance (16 threads), CPU consumption in the Win10 Task Manager doesn't rise beyond 35% during OCR.

OCR of larger PDFs should be well suited to parallelization, so I hope there is a way for the OCR plugin to make better use of my compute resources.

Thanks for your insights

Christoph

Tracker Supp-Stefan
Site Admin
Posts: 14037
Joined: Mon Jan 12, 2009 8:07 am
Location: London

Re: How to improve OCR performance

Post by Tracker Supp-Stefan » Mon Dec 23, 2019 11:24 am

Hello Christoph,

I am checking with colleagues from the dev team to see if the EOCR engine is affected by these settings, and if not - what can be done.

Season's greetings,
Stefan

Vasyl-Tracker Dev Team
Site Admin
Posts: 1983
Joined: Thu Jun 30, 2005 4:11 pm
Location: Canada

Re: How to improve OCR performance

Post by Vasyl-Tracker Dev Team » Mon Dec 23, 2019 5:33 pm

Hi Christoph.

We found an issue that limits the number of threads that can be used for OCR on x64 systems. We will fix it in the upcoming build.
Sorry for the inconvenience and thanks for the report.

Cheers.
Vasyl Yaremyn
Tracker Software Products
Project Developer


Timur Born
User
Posts: 698
Joined: Tue Jun 26, 2012 1:50 pm

Re: How to improve OCR performance

Post by Timur Born » Tue Feb 11, 2020 10:50 am

I only just noticed that I was still using build 334, which was limited in its number of OCR threads (a maximum of 3 fully loaded threads). I just tested build 336 and am happy to say that it now makes full use of all my CPU cores. It creates more threads than there are CPU cores, which may or may not be intentional, but in the end it speeds up OCR considerably.

Timur Born
User
Posts: 698
Joined: Tue Jun 26, 2012 1:50 pm

Re: How to improve OCR performance

Post by Timur Born » Tue Feb 11, 2020 10:58 am

Unfortunately, with "Fine Page Content", the "Rasterizing" and especially the "Applying results of recognition" steps seem to be mostly single-threaded and can correspondingly take a long time to complete.
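
To illustrate why mostly single-threaded steps matter so much, here is a rough sketch using Amdahl's law. The 70/30 split between the parallel recognition step and the single-threaded steps is purely an assumed figure for illustration, not a measurement of PDF-XChange Editor, and `amdahl_speedup` is a hypothetical helper name.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Overall speedup when only `parallel_fraction` of the runtime scales with `workers`."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


if __name__ == "__main__":
    # Assumed split (not measured): 70% of the job is recognition that
    # scales across threads, 30% is rasterizing/applying results that does not.
    for workers in (1, 4, 8, 16):
        print(f"{workers:2d} threads -> {amdahl_speedup(0.70, workers):.2f}x")
```

Under that assumed split, 16 threads would give a little under a 3x overall speedup, and no thread count could exceed 1 / 0.30, or roughly 3.3x. The real proportions depend on the document and the OCR settings.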

Tracker Supp-Stefan
Site Admin
Posts: 14037
Joined: Mon Jan 12, 2009 8:07 am
Location: London

Re: How to improve OCR performance

Post by Tracker Supp-Stefan » Tue Feb 11, 2020 2:33 pm

Hello Timur,

I will check with Vasyl whether either of those steps can be improved, and we will post any further news as soon as we have it!

Cheers,
Stefan

Tracker Supp-Stefan
Site Admin
Posts: 14037
Joined: Mon Jan 12, 2009 8:07 am
Location: London

Re: How to improve OCR performance

Post by Tracker Supp-Stefan » Thu Feb 13, 2020 10:25 am

Hello Timur,

Our devs said that they will investigate what can be done for those two steps of the OCR process, and I've made a ticket for it:
#5101: OCR Performance optimisations for "Fine Page Content" and "Rasterizing" steps of the process
So we will post again here as soon as there is any further news.

Regards,
Stefan
