PDF-XChange - Tracker PDF Viewer - TIFF-XChange - Image-XChange - XMF-XChange - Raster-XChange - Support

User avatar
MartinCS
User
Topic Author
Posts: 137
Joined: Thu Apr 07, 2011 10:01 am
Contact:

Retrieve text of 'OCRPages' function and correct use of languages (folder)

Tue Jan 16, 2018 5:12 am

Hi Tracker-Team,

I'm using the following code to run OCR on PDF documents (https://sdkhelp.tracker-software.com/vi ... t_OCRPages):
var nId = pdfCtl.Inst.Str2ID("op.document.OCRPages", false);
var pOp = pdfCtl.Inst.CreateOp(nId);
var input = pOp.Params.Root["Input"];

input.v = pdfDocumentModel.PxvDocument.CoreDoc;

ICabNode options = pOp.Params.Root["Options"];
options["PagesRange.Type"].v = rangeType;
options["OutputType"].v = outputType;
options["OutputDPI"].v = outputDpi;

pdfCtl.Inst.AsyncDoAndWaitForFinish(pOp);

The code works without problems. My question is: how can I get the searchable text of the newly created layer? I need to save this text in our backend database system.

Additionally, I'd like to ask how I should use the 'Languages' folder to get the best OCR results. I have put the folder in the same directory as the PDFXEdit DLLs:
16-01-_2018_07-29-37.jpg

But I don't get the same OCR result as when testing with the standalone PDF Editor. In the constructor of my main form (which contains the PDF control) I load the OCR plugin, and the saved PDF document does contain OCR text, so I don't think the problem is with the plugin.

// Martin
 
User avatar
Sasha - Tracker Dev Team
User
Posts: 2878
Joined: Fri Nov 21, 2014 8:27 am
Contact:

Re: Retrieve text of 'OCRPages' function and correct use of languages (folder)

Tue Jan 16, 2018 1:36 pm

Hello Martin,

The OCR operation internally calls the https://sdkhelp.tracker-software.com/vi ... addContent operation for each page, and that operation has a Content option. You can take the text from there.
As for your second question, make sure that the settings of the End-User Editor and of your sample are the same.

Cheers,
Alex
Join us at Google+:
https://plus.google.com/+PDFXChangeEditorTS
Subscribe at:
https://www.youtube.com/channel/UC-TwAMNi1haxJ1FX3LvB4CQ
 
MartinCS
Wed Jan 17, 2018 10:14 am

Hello Alex,

Thank you for the information. Unfortunately, it's not clear to me how I can access the "addContent" function, as it is internal to the "OcrPages" function. Is there maybe an event that I've not seen before that would give me access to this stage of the code execution, or could you please elaborate on how I would go about getting access to the "Content" property?

Thank you!

// Martin
 
Sasha - Tracker Dev Team
Wed Jan 17, 2018 10:42 am

Hello Martin,

In this case the OCR Pages operation calls the addContent operations internally, so you can listen to these inner operations and take their data.
Listen to the https://sdkhelp.tracker-software.com/vi ... oreExecute event and check whether the operation is the OCR Pages operation; if it is, set a bool flag. When the e.operExecuted event fires for the OCR operation, reset that flag.
While the flag is true, any operBeforeExecute event for an addContent operation means that the operation is being executed from inside the OCR operation, and you can read the needed information from it.
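
The flag logic described above can be sketched independently of the SDK. This is only an illustration: the class name OcrTextCollector and its methods are hypothetical, the operation IDs are plain ints (in a real project they would presumably come from Inst.Str2ID), and how the event handler extracts the operation ID and the Content value from the event arguments is an assumption that is not shown here.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: remembers whether an OCR operation is running and
// collects the text passed to the nested addContent operations.
public sealed class OcrTextCollector
{
    private readonly int _ocrOpId;
    private readonly int _addContentOpId;
    private bool _insideOcr; // the "bool value" mentioned in the post
    private readonly List<string> _pageTexts = new List<string>();

    public OcrTextCollector(int ocrOpId, int addContentOpId)
    {
        _ocrOpId = ocrOpId;
        _addContentOpId = addContentOpId;
    }

    // Text collected so far, one entry per addContent call.
    public IReadOnlyList<string> PageTexts => _pageTexts;

    // Call this from the operBeforeExecute handler.
    public void OnBeforeExecute(int opId, string content)
    {
        if (opId == _ocrOpId)
        {
            _insideOcr = true; // OCR started: inner operations now belong to it
        }
        else if (_insideOcr && opId == _addContentOpId && content != null)
        {
            _pageTexts.Add(content);
        }
    }

    // Call this from the operExecuted handler.
    public void OnExecuted(int opId)
    {
        if (opId == _ocrOpId)
        {
            _insideOcr = false; // OCR finished: stop collecting
        }
    }
}
```

An addContent operation that runs before the OCR operation starts, or after it finishes, is ignored, so only the text produced by OCR ends up in PageTexts.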

Cheers,
Alex
 
MartinCS
Thu Jan 18, 2018 1:10 pm

Hi Alex,

Thank you for your help!

Regarding my second question: I'm not able to configure the language that should be used by default. In my implementation I'm using the SDK with the ActiveX control, and I have put the OCR languages in the same folder as your DLLs:
18-01-_2018_13-51-36.jpg

The 'OCRLanguages' folder has all available language files included:
18-01-_2018_13-52-37.jpg

When my application loads, I load the OCR plugin (right after 'InitializeComponent'):
public FrmMain()
{
   this.InitializeComponent();
   
   Instance.PxvInst.StartLoadingPlugins();
   PxvInst.AddPluginFromFile(
                Path.Combine(EplassConfiguration.FilePath, "Plugins.x86", "OCRPlugin.pvp"));
   PxvInst.FinishLoadingPlugins();
}

Right after that I import my *.xcs settings file, hoping to set the default languages, which I configured and exported via the standalone PDF Editor:
18-01-_2018_14-00-08.jpg

with this code:
var op = pdfCtl.Inst.CreateOp(pdfCtl.Inst.Str2ID("op.settings.import"));

if (op == null)
{
    return;
}

op.Params.Root["Options.History"].v = false;

op.Params.Root["Input"].v = FsInst.DefaultFileSys.StringToName(filePathSettings);
op.Do();

After that I run the OCR function (see the code in my very first post). I don't get any exceptions and everything seems to run without problems. The new OCR-processed file is also created.

But the text in the searchable layer differs from what I get when I run OCR with the standalone PDF Editor.

I hope you guys can help me!

// Martin
 
Sasha - Tracker Dev Team
Thu Jan 18, 2018 1:22 pm

Hello Martin,

Try this in the OCR operation:
ICabNode options = pOp.Params.Root["Options"];
options["ExtParams.Language"].v = "deu+eng+fra+spa"; //separate the needed languages with +
options["ExtParams.Accuracy"].v = 300;
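
If the language list comes from configuration rather than a literal, the "+"-separated string can be built with a small helper. This is a sketch only: OcrLanguages.Join is a hypothetical name, and trimming and de-duplicating the codes is an assumption about what is convenient, not something the SDK is known to require.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class OcrLanguages
{
    // Joins language codes such as "deu" and "eng" into "deu+eng".
    public static string Join(IEnumerable<string> codes)
    {
        if (codes == null) throw new ArgumentNullException(nameof(codes));

        // Drop empty entries, trim whitespace, and remove duplicates.
        var cleaned = codes
            .Where(c => !string.IsNullOrWhiteSpace(c))
            .Select(c => c.Trim())
            .Distinct()
            .ToList();

        if (cleaned.Count == 0)
            throw new ArgumentException("At least one language code is required.", nameof(codes));

        return string.Join("+", cleaned);
    }
}
```

For example, OcrLanguages.Join(new[] { "deu", "eng" }) yields "deu+eng", which could then be assigned to options["ExtParams.Language"].v.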


Cheers,
Alex
 
MartinCS
Thu Jan 18, 2018 1:46 pm

Hi Alex,

I added your code lines, but it made no difference. Can I send you the PDF I'm using via PM or email? I would just need your email address.

The recognized text from the OCR function is:
Schon heule isl der an- und abschwellende Gülerzuglörm für uns sehr
slörend und hal bereils slark zugenommen. Die vorgelegle Planung eriülll níchl
einmal die geseizlíchen Grenzwene. Schienenlörm isl níchl harmloser als anderer Lörm,
der an- und abschwellende Lörm isl sogar schlimmer. Die „Millelung“ des Lörms muss
abgeschaffl werden, weil die slörende Unlerbrechungswírkung der Lörmspilzen so níchl
erfassl wird.


This is the result of the standalone Pdf Editor:
Schon heute ist der an- und abschwellende Güterzuglörm für uns sehr
störend und hat bereits stark zugenommen. Die vorgelegte Planung erfüllt nicht
einmal die gesetzlichen Grenzwerte. Schienenlörm ist nicht harmloser als anderer Lärm,
der an— und abschwellende Larm ist sogar schlimmer. Die „Mittelung“ des Lörms muss
abgeschafft werden, weil die störende Unterbrechungswirkung der LdrmSpitzen so nicht
erfasst wird.

I also noticed that text recognition in the PDF Editor takes a little longer, whereas the OCR function is much faster.

// Martin
 
User avatar
Tracker Supp-Stefan
Site Admin
Posts: 12191
Joined: Mon Jan 12, 2009 8:07 am
Location: London
Contact:

Re: Retrieve text of 'OCRPages' function and correct use of languages (folder)

Thu Jan 18, 2018 1:51 pm

Hi Martin,

You can send the sample file to support@tracker-software.com and we will pass it along to Alex!

Cheers,
Stefan
 
MartinCS
Thu Jan 18, 2018 2:05 pm

Hi Stefan,

I sent you an email containing the PDF file. Please also forward my second email, which contains the screenshot of my PDF Editor settings.

Thank you!

// Martin
 
Tracker Supp-Stefan
Thu Jan 18, 2018 2:09 pm

Hi Martin,

Thanks, we got the files and are already passing them along!

Cheers,
Stefan
 
MartinCS
Thu Jan 18, 2018 3:24 pm

:D Thank you!
 
Sasha - Tracker Dev Team
Fri Jan 19, 2018 11:03 am

Hello Martin,

I've reproduced some strange behavior - will investigate.

Cheers,
Alex
 
MartinCS
Fri Jan 19, 2018 3:57 pm

Hi Alex,

thank you for the information! I'm hoping you will find the issue. Fingers crossed!

Will wait for your reply.

// Martin
 
Sasha - Tracker Dev Team
Sat Jan 20, 2018 7:46 am

Hello Martin,

I've found what caused this behavior: it is the license key. When OCR renders the page for recognition, it also renders the watermark, so the watermark becomes part of the image that is recognized. If you specify a valid dev key, the OCR works the same as in the End-User Editor.

Cheers,
Alex
 
MartinCS
Mon Jan 22, 2018 5:13 am

Hi Alex,

I'm happy you've been able to find the issue, although I don't understand why our license key should be the problem. Last year we sent your sales department a signed license agreement together with the license key we received for the Editor SDK. I will send you the license agreement in a separate email. Could you please check the license and let me know if there is something wrong with it?

Thank you!

// Martin
 
Sasha - Tracker Dev Team
Mon Jan 22, 2018 8:07 am

Hello Martin,

That is the issue I ran into on my side (it occurs if you enter an invalid key and get a watermark on the page). In your case, it looks like some settings are off. Please try running the OCR with these parameters (they should be identical to the screenshot you provided earlier):
private void OCRPages(PDFXEdit.IPXV_Inst Inst, PDFXEdit.IPXV_Document Doc)
{
   int nID = Inst.Str2ID("op.document.OCRPages", false);
   PDFXEdit.IOperation Op = Inst.CreateOp(nID);
   PDFXEdit.ICabNode input = Op.Params.Root["Input"];
   input.v = Doc;
   PDFXEdit.ICabNode options = Op.Params.Root["Options"];
   options["PagesRange.Type"].v = "All"; //OCR all pages
   options["OutputType"].v = 0;
   options["OutputDPI"].v = 300;
   options["ExtParams.Language"].v = "deu+eng"; //separate the needed languages with +
   options["ExtParams.Accuracy"].v = 300;
   options["ExtParams.AutoDeskew"].v = false;
   Inst.AsyncDoAndWaitForFinish(Op);
}


Cheers,
Alex
 
MartinCS
Tue Jan 23, 2018 12:31 pm

Alex,

I sent you an email today containing a link for a solution file. If you have any further questions, please let me know.

// Martin
 
Sasha - Tracker Dev Team
Tue Jan 23, 2018 1:02 pm

Hello Martin,

What email are you talking about exactly? The code that I provided should work the same as the OCR in the End-User Editor. Have you tried it?

Cheers,
Alex
 
MartinCS
Wed Jan 24, 2018 4:24 am

Hi Alex,

I have to apologize! I just realized that I used the wrong address when I sent my email yesterday. I will re-send it to the correct address this time. Yes, I did try the code, but it didn't improve the results. I also noticed that this line of code doesn't take as long as OCR takes in the End-User Editor:
pdfCtl.Inst.AsyncDoAndWaitForFinish(pOp);

It also seems that the OCR stops right after the first page, and if I open the "OCR-Processed.pdf" (see the code in the solution), the additional text layer has not been added.

// Martin
 
Sasha - Tracker Dev Team
Wed Jan 24, 2018 8:18 am

Hello Martin,

Please email me directly at polaringu@tracker-software.com.

Cheers,
Alex
 
Sasha - Tracker Dev Team
Wed Jan 24, 2018 3:04 pm

Hello Martin,

It seems there is a problem with languages in your project - please read this post:
viewtopic.php?p=97913#p97951
Also, there is a problem with plugin loading and Instance usage:
1) The InitializeComponent method should be called after the Inst initialization and plugin loading.
2) You should also call the Shutdown method for Inst in the FormClosed event.
For all of this see FullDemo.
Also, the strange OCR results are reoccurring here - will investigate. Are you sure that you have taken all of the latest files from the End-User Editor?
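
The ordering from points 1) and 2) can be illustrated without the SDK. This is a sketch under stated assumptions: IEditorInstance and RecordingInstance are hypothetical stand-ins, only the method names StartLoadingPlugins, AddPluginFromFile, FinishLoadingPlugins and Shutdown come from this thread, their signatures are assumed, and the Inst initialization call itself is left out because its signature does not appear here. FullDemo remains the reference for the real code.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the SDK instance (signatures are assumptions).
public interface IEditorInstance
{
    void StartLoadingPlugins();
    void AddPluginFromFile(string path);
    void FinishLoadingPlugins();
    void Shutdown();
}

// Records the call order so the sequence can be inspected without the SDK.
public sealed class RecordingInstance : IEditorInstance
{
    public List<string> Calls { get; } = new List<string>();
    public void StartLoadingPlugins() => Calls.Add("StartLoadingPlugins");
    public void AddPluginFromFile(string path) => Calls.Add("AddPluginFromFile");
    public void FinishLoadingPlugins() => Calls.Add("FinishLoadingPlugins");
    public void Shutdown() => Calls.Add("Shutdown");
}

public sealed class MainFormSketch
{
    private readonly IEditorInstance _inst;

    public MainFormSketch(IEditorInstance inst, string ocrPluginPath, Action initializeComponent)
    {
        _inst = inst;

        // 1) Load the plugins first ...
        _inst.StartLoadingPlugins();
        _inst.AddPluginFromFile(ocrPluginPath);
        _inst.FinishLoadingPlugins();

        // ... and only then create the controls.
        initializeComponent();
    }

    // 2) Call this from the form's FormClosed handler.
    public void OnFormClosed() => _inst.Shutdown();
}
```

In a real form, initializeComponent would be the designer-generated InitializeComponent method, and OnFormClosed would be wired to the FormClosed event.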

Cheers,
Alex
 
MartinCS
Tue Jan 30, 2018 5:47 am

Hi Alex,

I addressed the two points you mentioned, and I'm absolutely positive that I'm using the latest files from the End-User Editor. I also tested the updated files from the version of the End-User Editor you published last week. But the results still haven't improved.

// Martin
 
Sasha - Tracker Dev Team
Tue Jan 30, 2018 7:55 am

Hello Martin,

We are holding a release in a couple of days - I will be able to tend to your problem afterwards.

Cheers,
Alex
 
MartinCS
Thu Feb 01, 2018 4:48 am

Hi Alex,

Thank you very much for the information!

// Martin
 
Tracker Supp-Stefan
Thu Feb 01, 2018 11:45 am

:D
 
Sasha - Tracker Dev Team
Sat Feb 03, 2018 1:28 pm

Hello Martin,

Have you tried the new build? Please do so and see whether the problem still reproduces.

Cheers,
Alex
 
MartinCS
Thu Feb 08, 2018 3:44 pm

Hi Alex,

I sent you an email with detailed information.

// Martin
 
MartinCS
Tue Feb 13, 2018 7:56 am

Hi Alex,

Have you received my email, and have you had a chance to look at it?

// Martin
 
Sasha - Tracker Dev Team
Tue Feb 13, 2018 3:57 pm

Hello Martin,

Got your E-Mail and will further investigate the problem tomorrow.

Cheers,
Alex
 
Sasha - Tracker Dev Team
Thu Feb 15, 2018 8:41 am

Hello Martin,

Yesterday I tested your sample and the End-User Editor; both give the same output.

Cheers,
Alex