How to speed up OCR?  SOLVED

PDF-XChange Editor SDK for Developers

hjerteblod
User
Posts: 26
Joined: Thu Sep 05, 2019 9:03 am

How to speed up OCR?

Post by hjerteblod »

Hello,
I have noticed that the OCR engine in PDF-XChange Editor (version 7) needs significantly less time for text recognition than my own implementation.
For a test document, the Editor needed about 30 seconds, whereas my implementation took 3 minutes with the same settings.

Is there a way to make my implementation as fast as the Editor?
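
To make a comparison like this reproducible, the OCR call can be wrapped in a timer. The following is a minimal VB.NET sketch that assumes only the standard .NET Stopwatch class; `TimeAction` is a hypothetical helper name, and the `Thread.Sleep` call merely stands in for the real OCR operation.

```vbnet
Imports System
Imports System.Diagnostics
Imports System.Threading

Module OcrTiming

    ' Hypothetical helper: runs the given action once and returns the elapsed wall-clock time.
    Function TimeAction(work As Action) As TimeSpan
        Dim sw As Stopwatch = Stopwatch.StartNew()
        work()
        sw.Stop()
        Return sw.Elapsed
    End Function

    Sub Main()
        ' Placeholder workload (assumption); in the real program this would be
        ' the call that runs the OCR operation and waits for it to finish.
        Dim elapsed As TimeSpan = TimeAction(Sub() Thread.Sleep(200))
        Console.WriteLine("OCR run took {0:F1} s", elapsed.TotalSeconds)
    End Sub

End Module
```

Timing the same document before and after each change (thread settings, x86 versus x64) shows which change actually matters.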

I also tried this code, as described in this forum thread (viewtopic.php?f=66&t=33528&p=141558&hil ... ty#p141558):

Code:

pdfCtl.Inst.Settings["Performance.MaxThreads"].v = 16;
pdfCtl.Inst.Settings["Performance.MaxBackgroundThreads"].v = 8;
pdfCtl.Inst.FireAppPrefsChanged(PDFXEdit.PXV_AppPrefsChanges.PXV_AppPrefsChange_Performance);

However, I did not notice any difference in performance.


Snippet of the code I use:

Code:

Try
    ' Open the source PDF through the SDK's default file system.
    docAFSName = FsInst.DefaultFileSys.StringToName(inputpdf)
    resDoc = MPxcInst.OpenDocumentFrom(docAFSName, Nothing)

    ' Bind the opened document to the OCR operation's input.
    Inputcab = Op_OCR.Params.Root("Input")
    Outputcab = Op_OCR.Params.Root("Output")
    Inputcab.v = resDoc

    ' OCR options: all pages, German + English plus any additional languages.
    options = Op_OCR.Params.Root("Options")
    options("PagesRange.Type").v = "All"
    options("ExtParams.Accuracy").v = iOCRDPI
    options("ExtParams.Language").v = "deu+eng" & sAddationalLanguages

    If Not AutoDeskew Then
        options("OutputType").v = 0
    Else
        options("OutputType").v = 1
        options("OutputDPI").v = 0
    End If

    options("ExtParams.AutoDeskew").v = AutoDeskew

    ' Run the operation and block until it finishes, without UI or progress dialog.
    MInst.AsyncDoAndWaitForFinish(Op_OCR, CUInt(OpExecFlags.OpExecFlag_NoUI Or OpExecFlags.OpExecFlag_NoProgress))

    If AutoDeskew Then
        outputDoc = CType(Outputcab.v, IPXC_Document)
    End If
Catch ex As Exception
    ' Report the failure reason before exiting instead of swallowing it.
    Console.Error.WriteLine("OCR failed: " & ex.Message)
    Environment.ExitCode = PDFXEDIT_OPERATION_FAILURE
    Environment.Exit(Environment.ExitCode)
End Try

Many thanks in advance.
Sasha - Tracker Dev Team
User
Posts: 5522
Joined: Fri Nov 21, 2014 8:27 am

Re: How to speed up OCR?

Post by Sasha - Tracker Dev Team »

Hello hjerteblod,

Are you comparing against the normal OCR in the End-User Editor, or against the Enhanced OCR?

Cheers,
Alex
Subscribe at:
https://www.youtube.com/channel/UC-TwAMNi1haxJ1FX3LvB4CQ

Post by hjerteblod »

Hello Sasha,

My comparison refers to the normal OCR, not the Enhanced OCR.

Post by Sasha - Tracker Dev Team »

Hello hjerteblod,

Another thing: are you using the x64 versions of both the application and the DLLs?

Cheers,
Alex

Post by hjerteblod »

I only have a reference to the PDFXEditCore.x86.dll.

Post by Sasha - Tracker Dev Team »

Hello hjerteblod,

Well, that's the problem: you are using the x86 DLL (a 32-bit program), which has memory limits, so the speed is low. Try using the x64 version.
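
Whether the host process really runs as 64-bit can be checked at run time. The sketch below is VB.NET, matching the snippet in the first post, and relies only on standard .NET members (`Environment.Is64BitProcess`, `Environment.Is64BitOperatingSystem`, `IntPtr.Size`); `DescribeBitness` is a hypothetical helper name. Note that a .NET Framework project built as x86, or as AnyCPU with "Prefer 32-bit" enabled, runs as a 32-bit process even on 64-bit Windows, and a 32-bit process cannot load a 64-bit DLL in-process.

```vbnet
Imports System

Module BitnessCheck

    ' Hypothetical helper: describes the bitness of the current process and OS.
    Function DescribeBitness() As String
        ' 4-byte pointers mean a 32-bit process, 8-byte pointers a 64-bit one.
        Dim processBits As Integer = IntPtr.Size * 8
        Return String.Format("process: {0}-bit, OS: {1}-bit",
                             processBits,
                             If(Environment.Is64BitOperatingSystem, 64, 32))
    End Function

    Sub Main()
        Console.WriteLine(DescribeBitness())
        If Not Environment.Is64BitProcess Then
            Console.WriteLine("Running as 32-bit: only the x86 DLL can be loaded in-process.")
        End If
    End Sub

End Module
```

If this still reports a 32-bit process after the x64 DLL has been referenced, the project's platform target is the likely place to look.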

Cheers,
Alex

Post by hjerteblod »

Okay Sasha, thank you very much.

I'm now trying to include the 64-bit version of the DLL. However, I have a completely different problem that you might be able to help me with, which would be very nice.

I've had this problem for quite a while: it concerns the registration of the COM DLL.

In our case, direct access to PDFXEditCore.x86.dll never worked. Instead, PDFXEditCore.x86.dll had to be registered via regsvr32 under Programs.x86 in the installation directory of the SDK, where the corresponding DLLs are located.
Only then was it possible to work with it in Visual Studio.

To work with the 64-bit version of the DLL, I have now done the same with the x64 version, but the x86 version is still referenced.

Do you possibly have something like a developer's guide that shows how these DLLs can be included without registration? E.g., so that when the program is called, the corresponding DLL, which is located in the working directory, can be accessed.

Once again: Many thanks in advance.

Post by Sasha - Tracker Dev Team »

Hello hjerteblod,

What programming language are you using?

Cheers,
Alex

Post by hjerteblod »

We're using VB.NET with .NET Framework 4.7.2.

Post by Sasha - Tracker Dev Team »

Hello hjerteblod,

Well, if you use a manifest that references the needed DLL, there won't be any problems. There is a manifest sample in the FullDemo application.

Cheers,
Alex

Post by hjerteblod »

Hello Sasha,
that is how it should work in theory, yes.

In practice, however, the FullDemo project references the corresponding manifest files, but unfortunately they are missing from the folder.
[Attachment: Screenshot 2020-09-18 085758.png]
Could you make these files available to me? That would be great.

Re: How to speed up OCR?  SOLVED

Post by Sasha - Tracker Dev Team »

Hello hjerteblod,

Strange, they should be available. Nevertheless, here you go:
[Attachment: manifest.zip (2.19 KiB)]

Cheers,
Alex

Post by hjerteblod »

Thank you! :)
It took a while to get it working, but in the end it improved the performance.

There is one more thing I would like to ask: I could not find anything in the documentation about the following settings:

Code:

pdfCtl.Inst.Settings["Performance.MaxThreads"].v = 16;
pdfCtl.Inst.Settings["Performance.MaxBackgroundThreads"].v = 8;
pdfCtl.Inst.FireAppPrefsChanged(PDFXEdit.PXV_AppPrefsChanges.PXV_AppPrefsChange_Performance);

Could these entries be explained in more detail? For example, I have not found an entry called "MaxBackgroundThreads", but I do have "RenderThreads" and "ThumbThreads". How do these properties affect the program?

Post by Sasha - Tracker Dev Team »

Hello hjerteblod,

Well, these come from the Application Preferences, on the Performance tab (they should be fairly understandable from the labels):
[Attachment: image.png]

Cheers,
Alex