OCR Hanging

PDF-XChange Editor SDK for Developers

DolphinMann
User
Posts: 158
Joined: Mon Aug 04, 2014 7:34 pm

OCR Hanging

Post by DolphinMann »

This one is a bit confusing, so apologies if I am not completely clear; just let me know what else you need. I also don't expect an immediate resolution, just some guidance on whether I am doing something wrong.

I have a web service that listens for incoming requests using .NET HttpListener. When it gets a request it can perform any number of actions. One of those actions is to OCR a document using PDF-XChange (code below).

However, when I get a web request for OCR, it hangs until the timeout I've set up (the Task wait in the code below) and fails. I wasn't sure exactly what was going on, so I wrote a simple two-line program to OCR the same file, and it worked. I can confirm it hangs on the Op.Do() line.

OCR Code:
Init Function:

                if (viewerInstance == null)
                {
                    if (existingViewer == null)
                    {
                        try
                        {
                            viewerInstance = new PXV_Inst();
                            viewerInstance.Init(null, DolphinCorePDF.devKey);

                            Logger.Log("PDF Converter Object Initialized", 5);
                        }
                        catch (Exception ex)
                        {
                            Logger.Log("Error During Init for PDFTools", ex, 1);
                            int hr = Marshal.GetHRForException(ex);
                            PDFTools.LogErrMsg(hr);
                        }
                    }
                    else
                    {
                        viewerInstance = existingViewer;
                    }

                    string pluginLoadPath = Environment.GetEnvironmentVariable("DolphinPath");
                    viewerInstance.StartLoadingPlugins();
                    viewerInstance.AddPluginFromFile(pluginLoadPath + @"OCRPlugin.pvp");
                    viewerInstance.AddPluginFromFile(pluginLoadPath + @"ConvertPDF.pvp");
                    viewerInstance.FinishLoadingPlugins();
                    Logger.Log("Conversion and OCR Plugin Loaded", 5);

                    try
                    {
                        PDFTools.pdfToolsTimeoutInMin = Convert.ToInt32(ApplicationSettings.ReadSettingFromProfile(ConfigurationSettings.PDFToolsTimeoutInMin.ToString(), "Default"));
                    }
                    catch (Exception ex)
                    {
                        Logger.Log("Failed to read PDFTool Timeout. Defaulting to 10 min", ex, 1);
                        PDFTools.pdfToolsTimeoutInMin = 10;
                        ApplicationSettings.WriteSetting(ConfigurationSettings.PDFToolsTimeoutInMin.ToString(), Convert.ToString(PDFTools.pdfToolsTimeoutInMin), "Default");
                    }
                }

                if (auxInst == null)
                {
                    auxInst = (IAUX_Inst)viewerInstance.GetExtension("AUX");
                }
OCR Code:

            Logger.Log("Attempting to execute OCR on document: " + inputPDF, 5);
            var myTask = Task.Run(() =>
            {
                try
                {

                    if (File.Exists(inputPDF))
                    {
                        Init();
                        IPXC_Inst pxcInst = (IPXC_Inst)viewerInstance.GetExtension("PXC");
                        IPXC_Document doc = pxcInst.OpenDocumentFromFile(inputPDF, clbk);

                        int nID = viewerInstance.Str2ID("op.document.OCRPages", false);
                        PDFXEdit.IOperation Op = viewerInstance.CreateOp(nID);
                        PDFXEdit.ICabNode input = Op.Params.Root["Input"];
                        input.v = doc;
                        PDFXEdit.ICabNode options = Op.Params.Root["Options"];

                        if (pages.Length == 0 || (pages.Length == 1 && pages[0] == -1))
                        {
                            options["PagesRange.Type"].v = "All";
                        }
                        else
                        {
                            options["PagesRange.Type"].v = "Exactly";
                            // Join the requested page numbers into a comma-separated list.
                            options["PagesRange.Text"].v = string.Join(",", pages);
                        }

                        options["OutputType"].v = 0;
                        options["OutputDPI"].v = 300;

                        Op.Do();

                        doc.WriteToFile(inputPDF);

                        doc.Close();
                        options.Clear();
                        input.Clear();

                        Logger.Log("PDF File: " + inputPDF + " had OCR completed", 5);
                    }
                    else
                    {
                        Logger.Log("PDF File: " + inputPDF + ", does not exist. Cannot execute OCR", 1);
                    }
                }
                catch (Exception ex)
                {
                    Logger.Log("Error running OCR on PDF: " + inputPDF, ex, 1);
                    int hr = Marshal.GetHRForException(ex);
                    PDFTools.LogErrMsg(hr);
                }
            });
            bool completed = myTask.Wait(1000 * 60 * PDFTools.pdfToolsTimeoutInMin);

            if (!completed)
            {
                Logger.Log("Timeout for PDF Tools reached. OCR Process cancelled: " + PDFTools.pdfToolsTimeoutInMin, 1);
            }
Sample Program that works:

        public void QuickOCRTest(string filePath)
        {
            PDFTools.Init();
            PDFTools.RunOCRAndAddText(filePath, new int[] { });
        }
Obviously I can't paste the entire web listener code here as it's more complicated and can do far more than just OCR. But I'd appreciate any advice on what I may be doing wrong. The OCR Function works perfectly, as indicated by the sample program calling it with the same file passed from the service. I can confirm the file is the same and the access permissions to it are the exact same in both circumstances.

The INIT call is made when the Windows Service is started, but obviously each request comes in separately and fires off the chain of events for processing.
Sasha - Tracker Dev Team
User
Posts: 5522
Joined: Fri Nov 21, 2014 8:27 am
Contact:

Re: OCR Hanging

Post by Sasha - Tracker Dev Team »

Hello DolphinMann,

Can this be recreated by making a small sample with a task that runs the OCR?

Cheers,
Alex
Subscribe at:
https://www.youtube.com/channel/UC-TwAMNi1haxJ1FX3LvB4CQ

Re: OCR Hanging

Post by DolphinMann »

What do you mean exactly? The OCR code I pasted above already creates a Task to execute the OCR process as its very first step. That code is the same whether it is called from the sample program or the web service.

Are you asking me to also run the function on a separate thread in the sample program?

Re: OCR Hanging

Post by DolphinMann »

I have tried everything I can think of and still cannot figure this out.

If I run the OCR from the following it works:

-Direct Function call from "public static void main" of a simple 2 line program, as shown in my above code snippet

However, in my Windows service I cannot get it to work at all. I tried putting a hardcoded line at service start to OCR a static file, and it hangs. So it appears to be something related to the process running as a Windows service instead of a regular program.

Re: OCR Hanging

Post by DolphinMann »

More Info:

I think I have this fixed but am still doing some testing.

I moved the call to the INIT function, exactly as it is above, into the "static void main" of my Windows service, so PDFTools.Init is now the very first thing the service does. This seems to work even when actions are executed on a separate thread.

However, when I made the INIT call from a separate thread, the OCR would fail. Not all PDF calls failed, just OCR.

I am still not sure exactly what is going on or if what I've done is the real fix or perhaps it is something else that I am just not aware of, but that seemed to help.

Does that make sense? Does it seem logical that such a thing could be the point of failure for the OCR operation?

Re: OCR Hanging

Post by Sasha - Tracker Dev Team »

Hello DolphinMann,

Well, the Init and Shutdown methods should be called on the same thread, and preferably only once. Then everything else should behave normally. That said, "normally" applies to languages like C++, which give us direct control over memory and COM, and where we don't get the kind of special effects that .NET often gives us.

Cheers,
Alex

Re: OCR Hanging

Post by DolphinMann »

I did have some protection in INIT (and it was a static object) to ensure that it was only called once; if it had already been called, the function would be skipped.

I think my problem was that INIT was called on a different thread, and the instance was therefore not accessible to the worker threads.

Re: OCR Hanging

Post by Sasha - Tracker Dev Team »

Hello DolphinMann,

Yup, probably that was the cause.

Cheers,
Alex

Re: OCR Hanging

Post by DolphinMann »

Sorry to necro an old thread but this is related....

I am starting to have this issue again with a more complicated Windows service. I don't want to derail the entire forum with "best threading practice", but what should I do for something like a Windows service that spawns many worker threads?

When the Windows service starts, let's say the main thread is ID 1. When I start a worker thread, ID 2, I am not sure how to make a call back to thread 1, as this is not a Windows app with a GUI, in which I could always call back to the GUI thread. To my knowledge, in C# I would need to create a thread that polls and pulls from some type of blocking collection in order to accomplish this, which is no small change.

The other option, though I am curious whether it would cause problems, would be to simply create a new instance of the PDF-XChange library for each worker thread, but this also doesn't seem like the best approach.

Can you provide any guidance, or mention what others may have done? All I am looking to do is an Op.Do with OCR, but on a different thread each time I receive a request.
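
For illustration, the "thread that pulls from a blocking collection" idea could be sketched roughly as below. This is only a sketch under assumptions: `SingleThreadDispatcher` and `InvokeAsync` are hypothetical names, nothing here is part of the PDF-XChange SDK, and the thread does not confirm that this pattern resolves the hang. It only shows how every call can be funneled onto one long-lived thread.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper (not part of the PDF-XChange SDK): owns one dedicated
// thread and runs every queued delegate on it, in the order it was queued.
public sealed class SingleThreadDispatcher : IDisposable
{
    private readonly BlockingCollection<Action> _queue = new BlockingCollection<Action>();
    private readonly Thread _thread;

    public SingleThreadDispatcher()
    {
        _thread = new Thread(() =>
        {
            // Blocks while the queue is empty; the loop ends once
            // CompleteAdding() has been called and the queue is drained.
            foreach (Action work in _queue.GetConsumingEnumerable())
            {
                work();
            }
        });
        _thread.IsBackground = true;
        _thread.Start();
    }

    // Queues work for the dedicated thread and returns a Task that carries
    // its result, or its exception if the work throws.
    public Task<T> InvokeAsync<T>(Func<T> work)
    {
        var tcs = new TaskCompletionSource<T>(TaskCreationOptions.RunContinuationsAsynchronously);
        _queue.Add(() =>
        {
            try { tcs.SetResult(work()); }
            catch (Exception ex) { tcs.SetException(ex); }
        });
        return tcs.Task;
    }

    public void Dispose()
    {
        _queue.CompleteAdding(); // lets the consuming loop finish
        _thread.Join();
        _queue.Dispose();
    }
}
```

With a helper like this, a service could queue its Init call once at startup and route each request's OCR call through `InvokeAsync`, waiting on the returned Task with whatever timeout it already uses. The trade-off is that requests are then processed one at a time.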

Re: OCR Hanging

Post by DolphinMann »

I switched to INST.AsyncDoAndWaitForFinish and now I am getting:

[PXVLib]: Wrong thread

That makes sense based on our previous conversations, but I am not sure how to proceed here when NOT using the UI.

Re: OCR Hanging

Post by Sasha - Tracker Dev Team »

Hello DolphinMann,

AsyncDo/AsyncDoAndWaitForFinish only work on the thread where the instance was initialized. These methods launch the operation on another thread; in your case you should just use Op.Do(), as you are already on another thread.

Cheers,
Alex

Re: OCR Hanging

Post by DolphinMann »

Thank you Alex, however when I do that it seems to just hang. It is something related to permissions or to the fact that I am running the code as a Windows service.

If I use the demo app from Tracker or create my own simple demo app with a GUI, the OCR Op.Do command works fine. This includes using my same code but calling my GUI instance of the PXV_INST object.

However, the exact same code in a Windows service does not work and just hangs. It does not seem to do this everywhere, which is why I am confused. It is something about permissions, threading, or some other hidden scenario, and it has me completely stuck. Without some advanced logging of what it is doing during the hang, it's hard to isolate the source. Is there a way to get internal logging?

Re: OCR Hanging

Post by Sasha - Tracker Dev Team »

Hello DolphinMann,

I've consulted with my teammates, and they advised making a full dump (through the Task Manager, for example) when you experience a hang and sending it to us. We'll check whether there is any deadlock on our side. You can upload a dump here, for example:
https://www.pdf-xchange.com/knowle ... ile-server

Cheers,
Alex

Re: OCR Hanging

Post by DolphinMann »

One other question about this functionality....

Is there a way to set a timeout on Op.Do()? So I can kill/cancel that operation after X seconds?

Currently I have the entire function wrapped in a task, so while I can set a timer for my task, it technically doesn't kill the thread that Op.Do is executing.
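
The behaviour described here can be reproduced without the SDK. The sketch below uses hypothetical names, with `Thread.Sleep` standing in for a long blocking call such as Op.Do(). It shows that a timed `Task.Wait` only stops waiting; the work itself keeps running on its thread.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class WaitTimeoutDemo
{
    // Returns true if the work kept running after Wait() had already timed out.
    public static bool WorkOutlivesWait()
    {
        using (var finished = new ManualResetEventSlim(false))
        {
            // Stand-in for a long blocking call such as Op.Do().
            Task work = Task.Run(() =>
            {
                Thread.Sleep(500);
                finished.Set();
            });

            // Gives up waiting after 50 ms, but does not touch the running work.
            bool completedInTime = work.Wait(TimeSpan.FromMilliseconds(50));

            // The work still runs to completion afterwards.
            bool finishedLater = finished.Wait(TimeSpan.FromSeconds(10));

            work.Wait(); // make sure the task is done before the event is disposed
            return !completedInTime && finishedLater;
        }
    }
}
```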

Re: OCR Hanging

Post by Sasha - Tracker Dev Team »

Hello DolphinMann,

There is a way of doing this. You will have to implement your own IProgressMon-derived class and then use the https://sdkhelp.pdf-xchange.com/vi ... rogressMon property of the Instance to replace the progress monitor for the duration of the operation. First, get the standard IProgressMon from the Instance, both to reuse its default implementation and to put it back after the operation. In your class, implement all of the methods by forwarding to that original IProgressMon. In the https://sdkhelp.pdf-xchange.com/vi ... n_Canceled property, combine your timing logic with the original implementation: if your timer exceeds the limit you need, return bCanceled as true. All operations check the canceled flag at some point during their work if an IProgressMon exists, so that should stop the operation when the next check comes.
However, an operation can hang in a place where there is no cancel check. In that case, I'm afraid you will have to kill the service so that it can restart correctly.
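
The timing logic for the canceled flag could look roughly like the sketch below. It is not the SDK's API: `ICancelQuery` is a hypothetical stand-in for the single canceled property discussed above, and a real wrapper would also need to forward every other IProgressMon member to the original monitor, which is not shown.

```csharp
using System;
using System.Diagnostics;

// Hypothetical stand-in for the "canceled" part of a progress monitor.
// It is NOT the SDK's IProgressMon interface.
public interface ICancelQuery
{
    bool Canceled { get; }
}

// Wraps an existing monitor and additionally reports cancellation once a
// time limit has passed since the wrapper was created.
public sealed class DeadlineCancelQuery : ICancelQuery
{
    private readonly ICancelQuery _inner; // original monitor, may be null
    private readonly TimeSpan _limit;
    private readonly Stopwatch _clock = Stopwatch.StartNew();

    public DeadlineCancelQuery(ICancelQuery inner, TimeSpan limit)
    {
        _inner = inner;
        _limit = limit;
    }

    // True when the wrapped monitor reports cancellation or the limit is exceeded.
    public bool Canceled
    {
        get { return (_inner != null && _inner.Canceled) || _clock.Elapsed >= _limit; }
    }
}
```

The wrapper would be created just before the operation starts, so the stopwatch measures the operation's own running time, and the original monitor would be restored afterwards.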

Cheers,
Alex
hjerteblod
User
Posts: 26
Joined: Thu Sep 05, 2019 9:03 am

Re: OCR Hanging

Post by hjerteblod »

Hello,
I am currently working on a module for our application and am experiencing the same problem as DolphinMann, so I would like to ask what you concluded from examining the dump files.

In my case the facts are as follows:
I have separated the OCR of PDFXEdit as a separate x64 application from the rest of the application, so that the OCR is available as a separate .exe file. The documents to be processed are passed to this .exe file based on arguments when called. The exit code of the application then tells you about success and failure.

This .exe file is now called by a Windows Forms application on the one hand and by a Windows service on the other hand.

My experience is that this OCR application always works reliably when called by the Windows Forms application, but hangs sporadically when called by the Windows service. This happens with both:

- IOperation.Do(),
- PXV_Inst.AsyncDoAndWaitForFinish(),

so it does not matter which variant is used. The thread blocks at the call and the CPU load remains at 0%. It's as if the thread has been put into an unlimited "thread.sleep". However, if the process is stopped and restarted, it usually works after the second or third start.
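
On the caller side, the stop-and-restart behaviour described above could be wrapped in a watchdog along the lines of the sketch below. It uses only standard .NET process APIs. `OcrProcessRunner` is a hypothetical name, and the executable path, arguments, and time limit would come from the calling application.

```csharp
using System;
using System.Diagnostics;

public static class OcrProcessRunner
{
    // Runs an external program and returns its exit code, or null if it did
    // not exit within the timeout and had to be terminated.
    public static int? Run(string exePath, string arguments, TimeSpan timeout)
    {
        var info = new ProcessStartInfo(exePath, arguments)
        {
            UseShellExecute = false,
            CreateNoWindow = true
        };

        using (Process process = Process.Start(info))
        {
            if (process.WaitForExit((int)timeout.TotalMilliseconds))
            {
                return process.ExitCode;
            }

            try
            {
                process.Kill(); // treated as hung: terminate so the caller can retry
            }
            catch (InvalidOperationException)
            {
                // The process exited on its own just before the kill.
            }
            process.WaitForExit(); // wait until the process is really gone
            return null;
        }
    }
}
```

A caller could retry a fixed number of times when `Run` returns null, which matches the observation that the second or third start usually succeeds.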

Is there possibly a logical explanation for this behavior?


- Additionally:
I have also tried to solve the problem by monitoring the progress through IProgressMon with a timer and terminating the process if no progress is visible within a certain period of time.
But here too I have run into a problem. (Granted, it is probably due to my lack of knowledge.)
As described in the post above, I tried to implement the "IProgressMon" interface in a class of my own and use it to replace the corresponding property of the PXV_Inst instance.
The problem is that when certain properties (e.g. "Duration") are set, the setter throws an exception because the requested value cannot be accessed. That's because PXV_Inst.AsyncDo / PXV_Inst.AsyncDoAndWaitForFinish sets/gets this value from a thread other than the one it was created on (it should be thread ID 1, but in fact this is done on thread ID 5).
Admittedly, I could also use some help with implementing this.


Although these are probably quite beginner questions, I hope for some help regarding my questions. Thank you in advance for your support.