OCR Hanging

PDF-XChange Editor SDK for Developers

DolphinMann
User
Posts: 158
Joined: Mon Aug 04, 2014 7:34 pm

OCR Hanging

Post by DolphinMann » Tue Feb 06, 2018 9:42 pm

This one is a bit confusing, so apologies if I am not completely clear; just let me know what else you need. I don't expect an immediate resolution, merely some guidance if I am doing something that seems wrong.

I have a web service that listens for incoming requests using .NET HttpListener. When it gets a request, it performs any number of actions. One of those actions is to OCR a document using PDF-XChange (code below).

However, when I get a web request for that action, it hangs until the timeout I've set up (the Task in the code below) expires, and then fails. I wasn't sure exactly what was going on, so I wrote a simple two-line program to OCR the same file, and it worked. I can confirm it hangs on the Op.Do() line.

OCR Code:
Init Function:

Code:

                if (viewerInstance == null)
                {
                    if (existingViewer == null)
                    {
                        try
                        {
                            viewerInstance = new PXV_Inst();
                            viewerInstance.Init(null, DolphinCorePDF.devKey);

                            Logger.Log("PDF Converter Object Initialized", 5);
                        }
                        catch (Exception ex)
                        {
                            Logger.Log("Error During Init for PDFTools", ex, 1);
                            int hr = Marshal.GetHRForException(ex);
                            PDFTools.LogErrMsg(hr);
                        }
                    }
                    else
                    {
                        viewerInstance = existingViewer;
                    }

                    // Path.Combine adds the separator if DolphinPath has no trailing slash.
                    string pluginLoadPath = Environment.GetEnvironmentVariable("DolphinPath") ?? string.Empty;
                    viewerInstance.StartLoadingPlugins();
                    viewerInstance.AddPluginFromFile(Path.Combine(pluginLoadPath, "OCRPlugin.pvp"));
                    viewerInstance.AddPluginFromFile(Path.Combine(pluginLoadPath, "ConvertPDF.pvp"));
                    viewerInstance.FinishLoadingPlugins();
                    Logger.Log("Conversion and OCR Plugin Loaded", 5);

                    try
                    {
                        PDFTools.pdfToolsTimeoutInMin = Convert.ToInt32(ApplicationSettings.ReadSettingFromProfile(ConfigurationSettings.PDFToolsTimeoutInMin.ToString(), "Default"));
                    }
                    catch (Exception ex)
                    {
                        Logger.Log("Failed to read PDFTool Timeout. Defaulting to 10 min", ex, 1);
                        PDFTools.pdfToolsTimeoutInMin = 10;
                        ApplicationSettings.WriteSetting(ConfigurationSettings.PDFToolsTimeoutInMin.ToString(), Convert.ToString(PDFTools.pdfToolsTimeoutInMin), "Default");
                    }
                }

                if (auxInst == null)
                {
                    auxInst = (IAUX_Inst)viewerInstance.GetExtension("AUX");
                }
OCR Code:

Code:

            Logger.Log("Attempting to execute OCR on document: " + inputPDF, 5);
            var myTask = Task.Run(() =>
            {
                try
                {

                    if (File.Exists(inputPDF))
                    {
                        Init();
                        IPXC_Inst pxcInst = (IPXC_Inst)viewerInstance.GetExtension("PXC");
                        IPXC_Document doc = pxcInst.OpenDocumentFromFile(inputPDF, clbk);

                        int nID = viewerInstance.Str2ID("op.document.OCRPages", false);
                        PDFXEdit.IOperation Op = viewerInstance.CreateOp(nID);
                        PDFXEdit.ICabNode input = Op.Params.Root["Input"];
                        input.v = doc;
                        PDFXEdit.ICabNode options = Op.Params.Root["Options"];

                        if (pages.Length == 0 || (pages.Length == 1 && pages[0] == -1))
                        {
                            options["PagesRange.Type"].v = "All";
                        }
                        else
                        {
                            options["PagesRange.Type"].v = "Exactly";
                            // Build a comma-separated page list, e.g. "1,3,5".
                            options["PagesRange.Text"].v = string.Join(",", pages);
                        }

                        options["OutputType"].v = 0;
                        options["OutputDPI"].v = 300;

                        Op.Do();

                        doc.WriteToFile(inputPDF);

                        doc.Close();
                        options.Clear();
                        input.Clear();

                        Logger.Log("PDF File: " + inputPDF + " had OCR completed", 5);
                    }
                    else
                    {
                        Logger.Log("PDF File: " + inputPDF + ", does not exist. Cannot execute OCR", 1);
                    }
                }
                catch (Exception ex)
                {
                    Logger.Log("Error running OCR on PDF: " + inputPDF, ex, 1);
                    int hr = Marshal.GetHRForException(ex);
                    PDFTools.LogErrMsg(hr);
                }
            });
            bool completed = myTask.Wait(1000 * 60 * PDFTools.pdfToolsTimeoutInMin);

            if (!completed)
            {
                Logger.Log("Timeout for PDF Tools reached. OCR Process cancelled: " + PDFTools.pdfToolsTimeoutInMin, 1);
            }
Sample Program that works:

Code:

        public void QuickOCRTest(string filePath)
        {
            PDFTools.Init();
            PDFTools.RunOCRAndAddText(filePath, new int[] { });
        }
Obviously I can't paste the entire web listener code here as it's more complicated and can do far more than just OCR. But I'd appreciate any advice on what I may be doing wrong. The OCR Function works perfectly, as indicated by the sample program calling it with the same file passed from the service. I can confirm the file is the same and the access permissions to it are the exact same in both circumstances.

The INIT call is made when the Windows Service is started, but obviously each request comes in separately and fires off the chain of events for processing.

Sasha - Tracker Dev Team
User
Posts: 4408
Joined: Fri Nov 21, 2014 8:27 am

Re: OCR Hanging

Post by Sasha - Tracker Dev Team » Wed Feb 07, 2018 10:35 am

Hello DolphinMann,

Can this be recreated by making a small sample with a task that runs the OCR?

Cheers,
Alex
Join us at Google+:
https://plus.google.com/+PDFXChangeEditorTS
Subscribe at:
https://www.youtube.com/channel/UC-TwAMNi1haxJ1FX3LvB4CQ

DolphinMann
User
Posts: 158
Joined: Mon Aug 04, 2014 7:34 pm

Re: OCR Hanging

Post by DolphinMann » Wed Feb 07, 2018 3:51 pm

What do you mean exactly? The OCR code I've pasted above already creates a Task to execute the OCR process as its very first step. That code is the same whether it is called from the sample program or the web service.

Are you asking for me to also tell the sample program to run the function in a separate thread?

DolphinMann
User
Posts: 158
Joined: Mon Aug 04, 2014 7:34 pm

Re: OCR Hanging

Post by DolphinMann » Wed Feb 07, 2018 7:51 pm

I have tried everything I can think of and still cannot figure this out.

If I run the OCR from the following it works:

-Direct Function call from "public static void main" of a simple 2 line program, as shown in my above code snippet

However, in my Windows service I cannot get it to work at all. I tried putting a hardcoded line at service start to OCR a static file, and it hangs. So it appears to be something related to the process running as a Windows service instead of a regular program.

DolphinMann
User
Posts: 158
Joined: Mon Aug 04, 2014 7:34 pm

Re: OCR Hanging

Post by DolphinMann » Wed Feb 07, 2018 8:43 pm

More Info:

I think I have this fixed but am still doing some testing.

If I move the call to the Init function, exactly as it is above, into the "static void Main" of my Windows service, so that PDFTools.Init is the very first thing the service does, it seems to work even when actions are executed in a separate thread.

However if I made the INIT call from a separate thread, it seems like the OCR would fail, but not all PDF calls would fail, just OCR.

I am still not sure exactly what is going on or if what I've done is the real fix or perhaps it is something else that I am just not aware of, but that seemed to help.

Does that make sense? Does it seem logical that such a thing could be the point of failure for the OCR operation?

Sasha - Tracker Dev Team
User
Posts: 4408
Joined: Fri Nov 21, 2014 8:27 am

Re: OCR Hanging

Post by Sasha - Tracker Dev Team » Thu Feb 08, 2018 9:06 am

Hello DolphinMann,

Well, the Init and Shutdown methods should be called on the same thread, and preferably only once. Then everything else should behave normally. Again, "normally" applies to languages like C++, which give us direct control over memory and COM; there we don't get the special effects that .NET often gives us.

Cheers,
Alex

DolphinMann
User
Posts: 158
Joined: Mon Aug 04, 2014 7:34 pm

Re: OCR Hanging

Post by DolphinMann » Thu Feb 08, 2018 3:14 pm

I did have some protection in Init (and it was a static object) to ensure that it was only called once; if it had already been called, the function would be skipped.

I think my problem was that the INIT was called on a different thread and thus not accessible to worker threads.
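
A minimal sketch of the "initialize once, and remember on which thread" idea discussed above. ThreadAffinityGuard is a hypothetical helper written for illustration; it is not part of the PDF-XChange SDK and uses only standard .NET types. It does not call the SDK itself; the delegate passed in stands in for whatever performs the real Init.

```csharp
using System;
using System.Threading;

// Hypothetical helper (not an SDK type): runs an init action once and
// remembers which managed thread performed it.
public sealed class ThreadAffinityGuard
{
    private int _ownerThreadId = -1; // -1 means "not initialized yet"

    // Runs initAction only for the first caller and records that caller's
    // thread as the owner. Returns true if this call did the initialization.
    // Note: a concurrent second caller returns false immediately, possibly
    // while the first caller's initAction is still running.
    public bool InitializeOnce(Action initAction)
    {
        if (initAction == null) throw new ArgumentNullException(nameof(initAction));

        int current = Environment.CurrentManagedThreadId;
        if (Interlocked.CompareExchange(ref _ownerThreadId, current, -1) != -1)
            return false; // someone already claimed ownership

        try
        {
            initAction();
        }
        catch
        {
            // Init failed: release ownership so a later attempt can retry.
            Volatile.Write(ref _ownerThreadId, -1);
            throw;
        }
        return true;
    }

    public bool IsInitialized => Volatile.Read(ref _ownerThreadId) != -1;

    // True only when called from the thread that ran the init action.
    public bool IsOwnerThread =>
        Volatile.Read(ref _ownerThreadId) == Environment.CurrentManagedThreadId;
}
```

In the service, something like guard.InitializeOnce(PDFTools.Init) as the first statement of Main would pin initialization to the main thread, and IsOwnerThread could be checked (and logged) before calling Shutdown.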

Sasha - Tracker Dev Team
User
Posts: 4408
Joined: Fri Nov 21, 2014 8:27 am

Re: OCR Hanging

Post by Sasha - Tracker Dev Team » Thu Feb 08, 2018 3:16 pm

Hello DolphinMann,

Yup, probably that was the cause.

Cheers,
Alex

DolphinMann
User
Posts: 158
Joined: Mon Aug 04, 2014 7:34 pm

Re: OCR Hanging

Post by DolphinMann » Wed Aug 01, 2018 6:13 pm

Sorry to necro an old thread but this is related....

I am starting to have this issue again with a more complicated Windows service. I don't want to derail the entire forum with "best threading practice", but what should I do for something like a Windows service that spawns many worker threads?

When the Windows service starts, let's say the main thread is ID1. When I start a worker thread, ID2, I am not sure how to make a call back to thread 1, as this is not a Windows app with a GUI, in which I could always call back to the GUI thread. To my knowledge, in C# I would need to create a thread that polls and pulls from some type of blocking collection in order to accomplish this, which is no small change.

The other option, but curious if it will cause problems, would be to simply create a new instance of the PDF X-Change library for each worker thread, but this also doesn't seem like the best approach.

Can you provide any guidance or mention what others may have done? All I am looking to do is an Op.Do with OCR, but in a different thread each time I receive a request.
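
The "thread that pulls from a blocking collection" idea mentioned above could be sketched roughly as follows. DedicatedThreadDispatcher is a hypothetical helper built only on standard .NET types (BlockingCollection, TaskCompletionSource); nothing here is SDK API, and whether the SDK actually requires this is not confirmed in this thread.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper (not an SDK type): one long-lived thread executes every
// queued delegate, so all queued calls run on the same thread.
public sealed class DedicatedThreadDispatcher : IDisposable
{
    private readonly BlockingCollection<Action> _queue = new BlockingCollection<Action>();
    private readonly Thread _thread;

    public DedicatedThreadDispatcher(string name)
    {
        _thread = new Thread(Run) { IsBackground = true, Name = name };
        // Assumption: if the COM objects need a specific apartment, call
        // _thread.SetApartmentState(...) here, before Start().
        _thread.Start();
    }

    public int ThreadId => _thread.ManagedThreadId;

    // Queues work for the dedicated thread. The returned task completes,
    // or faults with the thrown exception, once the work has run there.
    public Task<T> InvokeAsync<T>(Func<T> work)
    {
        if (work == null) throw new ArgumentNullException(nameof(work));
        var tcs = new TaskCompletionSource<T>(TaskCreationOptions.RunContinuationsAsynchronously);
        _queue.Add(() =>
        {
            try { tcs.SetResult(work()); }
            catch (Exception ex) { tcs.SetException(ex); }
        });
        return tcs.Task;
    }

    private void Run()
    {
        // Blocks until an item arrives; the loop ends after CompleteAdding().
        foreach (Action item in _queue.GetConsumingEnumerable())
            item();
    }

    // Must not be called from the dispatcher thread itself (Join would deadlock).
    public void Dispose()
    {
        _queue.CompleteAdding();
        _thread.Join();
        _queue.Dispose();
    }
}
```

With this shape, the Init call and each OCR request would be queued through InvokeAsync, for example dispatcher.InvokeAsync(() => { PDFTools.RunOCRAndAddText(path, new int[] { }); return true; }), where path is the caller's file path. The trade-off is that requests are processed one at a time.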

DolphinMann
User
Posts: 158
Joined: Mon Aug 04, 2014 7:34 pm

Re: OCR Hanging

Post by DolphinMann » Wed Aug 01, 2018 7:40 pm

I switched to the INST.AsyncDoAndWaitForFinish and now I am getting:

[PXVLib]: Wrong thread

Which makes sense based on our previous conversations, but I am not sure how to proceed here when NOT using the UI

Sasha - Tracker Dev Team
User
Posts: 4408
Joined: Fri Nov 21, 2014 8:27 am

Re: OCR Hanging

Post by Sasha - Tracker Dev Team » Mon Aug 06, 2018 8:50 am

Hello DolphinMann,

AsyncDo/AsyncDoAndWaitForFinish only work on the thread where the instance was initialized. These methods launch the operation on another thread; in your case, you should just use Op.Do(), as you are already on another thread.

Cheers,
Alex

DolphinMann
User
Posts: 158
Joined: Mon Aug 04, 2014 7:34 pm

Re: OCR Hanging

Post by DolphinMann » Mon Aug 06, 2018 11:49 am

Thank you Alex, however when I do that it seems to just hang. It is something related to permissions or the fact I am operating the execution as a Windows Service.

If I use the demo app from Tracker or create my own simple demo app with a GUI, the OCR Op.Do command works fine. This includes using my same code but calling my GUI instance of the PXV_Inst object.

However, the exact same code in a Windows service does not work; it just hangs. It does not seem to do this everywhere, though, which is why I am confused. It is something about permissions, threading, or some other hidden scenario, and it has me completely stuck. Without some advanced logging of what it is doing during the hang, it's hard to isolate the source. Is there a way to get internal logging?

Sasha - Tracker Dev Team
User
Posts: 4408
Joined: Fri Nov 21, 2014 8:27 am

Re: OCR Hanging

Post by Sasha - Tracker Dev Team » Tue Aug 07, 2018 5:51 am

Hello DolphinMann,

I've consulted with my teammates, and they advised making a full dump (through Task Manager, for example) when you experience a hang and sending it to us. We'll check whether there is any deadlock on our side. You can upload a dump here, for example:
https://www.tracker-software.com/knowle ... ile-server

Cheers,
Alex

DolphinMann
User
Posts: 158
Joined: Mon Aug 04, 2014 7:34 pm

Re: OCR Hanging

Post by DolphinMann » Mon Aug 20, 2018 10:30 pm

One other question about this functionality....

Is there a way to set a timeout on Op.Do()? So I can kill/cancel that operation after X seconds?

Currently I have the entire function wrapped in a Task, so while I can set a timer for my Task, it technically doesn't kill the thread that Op.Do is executing on.
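
To illustrate the point about the Task timer: Task.Wait with a timeout only stops waiting; the work itself keeps running. The sketch below uses Thread.Sleep as a stand-in for a long blocking call such as Op.Do(); WaitTimeoutDemo is a hypothetical name and nothing here touches the SDK.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class WaitTimeoutDemo
{
    // Starts work lasting workMs, waits at most waitMs, and reports:
    //  - whether the work finished within the wait,
    //  - whether it was still running when the wait gave up,
    //  - whether it ran to completion in the end.
    public static (bool completedInTime, bool stillRunningAfterTimeout, bool finishedEventually)
        Run(int workMs, int waitMs)
    {
        using (var finished = new ManualResetEventSlim(false))
        {
            Task task = Task.Run(() =>
            {
                Thread.Sleep(workMs); // stand-in for a long blocking call
                finished.Set();
            });

            bool completed = task.Wait(waitMs);                  // stops waiting, not the work
            bool stillRunning = !completed && !task.IsCompleted; // the work carries on
            task.Wait();                                         // demo only: let it finish
            return (completed, stillRunning, finished.IsSet);
        }
    }
}
```

So a timed-out Wait leaves the worker thread occupied until the blocking call returns on its own, which is why a cooperative cancel check or a process restart is needed to actually stop it.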

Sasha - Tracker Dev Team
User
Posts: 4408
Joined: Fri Nov 21, 2014 8:27 am

Re: OCR Hanging

Post by Sasha - Tracker Dev Team » Tue Aug 21, 2018 6:24 am

Hello DolphinMann,

There is a way of doing this. You will have to implement your own IProgressMon-derived class and then use the https://sdkhelp.tracker-software.com/vi ... rogressMon property of the Instance to replace the progress monitor for the duration of the operation. First, get the standard IProgressMon from the Instance, both to reuse its default implementations and to put it back after the operation. In your class, implement all of the methods by forwarding to that original IProgressMon. In the https://sdkhelp.tracker-software.com/vi ... n_Canceled property, combine your timing logic with the original implementation: if your timer exceeds the limit you need, return bCanceled as true. Since all operations check the canceled flag at some point in their work when an IProgressMon exists, that should stop the operation when the next check comes.
The operation may, however, hang at a point where there is no cancel check; in that case I'm afraid you will have to kill the service so that it can restart correctly.

Cheers,
Alex
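
The wrapping approach described above could look roughly like this. It is only a sketch of the decorator idea: ICancelCheck is a hypothetical stand-in with a single member, not the SDK's IProgressMon, whose full member list is not given in this thread and would all need forwarding to the original monitor.

```csharp
using System;
using System.Diagnostics;

// Hypothetical stand-in for the SDK's progress monitor. A real wrapper would
// implement IProgressMon and forward every other member to the original.
public interface ICancelCheck
{
    bool Canceled { get; }
}

// Reports cancellation when the wrapped monitor does, or once the time
// budget (measured from construction) has been used up.
public sealed class TimeoutCancelCheck : ICancelCheck
{
    private readonly ICancelCheck _inner;
    private readonly TimeSpan _limit;
    private readonly Stopwatch _clock = Stopwatch.StartNew();

    public TimeoutCancelCheck(ICancelCheck inner, TimeSpan limit)
    {
        _inner = inner ?? throw new ArgumentNullException(nameof(inner));
        _limit = limit;
    }

    // Original answer OR "time is up".
    public bool Canceled => _inner.Canceled || _clock.Elapsed >= _limit;
}

// Trivial inner monitor, used here only for illustration.
public sealed class ManualCancelCheck : ICancelCheck
{
    public bool Canceled { get; set; }
}
```

Following the description above, the wrapper would be installed just before Op.Do() and the original monitor restored in a finally block, so that a failed or cancelled operation does not leave the wrapper in place.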
