OCR of pdf and pictures

PDF-X OCR SDK is a new product from us, intended to complement our existing PDF and Imaging Tools and provide developers with an expanding set of professional tools for Optical Character Recognition (OCR) tasks.

Moderators: TrackerSupp-Daniel, Tracker Support, Vasyl-Tracker Dev Team, Sean - Tracker, Chris - Tracker Supp, Tracker Supp-Stefan

crimsonlogic
User
Posts: 38
Joined: Tue Jan 12, 2016 2:25 am

OCR of pdf and pictures

Post by crimsonlogic » Sat Jan 16, 2016 1:51 am

We bought a Pro SDK license under CrimsonLogic Pte Ltd.

I have three problems doing OCR in my WPF application.

1) I am not able to OCR PDFs with 17 or more pages.

2) I notice that some successfully OCR'd files have overlaid text, as in the attached screenshot. How can I fix it?

3) When I convert an image to PDF, the image is much smaller than the original. Where can I change the image size?
I've played around with the last two values in the line below, but couldn't make the image bigger in the PDF file.
PDFXC_Funcs.PXC_PlaceImage(cpage, p, Common.I2L(1), Common.PH - Common.I2L(1), Common.I2L(3), Common.I2L(2));

Please advise. Thank you very much.
Attachments
pdf-xchange_screenshot.pdf
(63.23 KiB) Downloaded 151 times

John - Tracker Supp
Site Admin
Posts: 8202
Joined: Tue Jun 29, 2004 10:34 am
Location: Vancouver Island - Canada

Re: OCR of pdf and pictures

Post by John - Tracker Supp » Mon Jan 18, 2016 2:11 pm

Hi,

Can we please keep all OCR-related questions in one forum, or in email? You are posting in multiple forums and also sending emails, which is not helpful: it divides the effort to assist you, as we first have to check whether items have already been answered in emails or in other forums.

I will move this one, and any others, to the OCR forum so we can address them all in one place. Thank you.
If posting files to this forum, you must archive them as a ZIP, RAR or 7z file, or they will not be uploaded. Thank you.

Best regards
Tracker Support
http://www.tracker-software.com

John - Tracker Supp
Site Admin
Posts: 8202
Joined: Tue Jun 29, 2004 10:34 am
Location: Vancouver Island - Canada

Re: OCR of pdf and pictures

Post by John - Tracker Supp » Mon Jan 18, 2016 2:17 pm

RE: Questions:

1) I am not able to OCR PDFs with 17 or more pages.

Please advise which version of our products you are using and the hardware specification (processor, drive space, RAM and OS). Please also provide an example of the PDF being OCR'd. Could you be running out of resources? Perhaps try breaking the job into chunks.
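As an illustration of the "chunks" idea, the sketch below only computes page ranges; it does not call the OCR SDK. How a range would be handed to the OCR engine is an assumption left open here: the code in this thread passes IntPtr.Zero as the page list to OCR_MakeSearchable (meaning all pages), and the page-list format is described in the SDK documentation. PageChunker and Split are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not part of the SDK): splits a document of pageCount pages
// into zero-based, inclusive page ranges of at most chunkSize pages each.
public static class PageChunker
{
    public static List<(int First, int Last)> Split(int pageCount, int chunkSize)
    {
        if (pageCount < 0) throw new ArgumentOutOfRangeException(nameof(pageCount));
        if (chunkSize <= 0) throw new ArgumentOutOfRangeException(nameof(chunkSize));

        var ranges = new List<(int First, int Last)>();
        for (int first = 0; first < pageCount; first += chunkSize)
        {
            // The final chunk may contain fewer than chunkSize pages.
            int last = Math.Min(first + chunkSize, pageCount) - 1;
            ranges.Add((first, last));
        }
        return ranges;
    }
}
```

For a 17-page document, Split(17, 5) returns (0, 4), (5, 9), (10, 14) and (15, 16); each range could then be OCR'd and saved separately, which may keep the peak memory needed per run lower.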

2) I notice that some successfully OCR'd files have overlaid text, as in the attached screenshot. How can I fix it?

Please supply before/after PDF files for us to analyse along with a snippet of the code you are using for this specific task.

3) When I convert an image to PDF, the image is much smaller than the original. Where can I change the image size?
I've played around with the last two values in the line below, but couldn't make the image bigger in the PDF file.
PDFXC_Funcs.PXC_PlaceImage(cpage, p, Common.I2L(1), Common.PH - Common.I2L(1), Common.I2L(3), Common.I2L(2));

I have asked a colleague to help and advise on this specifically...

Best regards
Tracker Support
http://www.tracker-software.com

crimsonlogic
User
Posts: 38
Joined: Tue Jan 12, 2016 2:25 am

Re: OCR of pdf and pictures

Post by crimsonlogic » Tue Jan 19, 2016 2:14 am

Hi John,

1) I am not able to OCR PDFs with 17 or more pages.
>> We bought a PDF-XChange PRO SDK license.
>> Your website states:
**NEW OCR Module Included** - Now includes PDF-X OCR SDK Module for converting image based PDF files to fully text searchable PDF files at no charge. For more information on this exciting new module and usage requirements for the free new add-on please visit our PDF-X OCR SDK Module page
>> We are using this PDF-X OCR SDK.
>> Machine: 8 GB RAM, i7, 64-bit OS.
>> Attached is the 17-page PDF; please try to OCR it and let us know the outcome.
>> (Please note that this 17-page PDF was converted from a Word document, as your forum does not allow uploading the Word file.)
>> (Let us know if you need us to email you the Word copy.)
>> Please see the code below.

2) I notice that some successfully OCR'd files have overlaid text, as in the attached screenshot. How can I fix it?
>> Attached is the PDF for your investigation; please go through it to see the issue.
>> (The OCR program code is provided below.)

3) When I convert an image to PDF, the image is much smaller than the original. Where can I change the image size?
I've played around with the last two values in the line below, but couldn't make the image bigger in the PDF file.
PDFXC_Funcs.PXC_PlaceImage(cpage, p, Common.I2L(1), Common.PH - Common.I2L(1), Common.I2L(3), Common.I2L(2));
>> We will wait for your feedback on this.

>> The code for OCR'ing the PDF:
private string ConvertPDFToOCR(string m_SourceFilename, string m_DestFilename, string language)
{
    string result = "OK";
    IntPtr pdf;
    int hResult;
    string OCRretcode;
    int m_DPI;
    string m_Datapath = Path.GetDirectoryName(Assembly.GetExecutingAssembly().GetName().CodeBase).Replace("file:\\", "") + @"\OCRLanguages\";

    PDFXOCR_Funcs.PXO_Language m_Language = (PDFXOCR_Funcs.PXO_Language)Array.IndexOf(PDFXOCR_Funcs.OCR_LangFullArrayW, language); //GetOCRLanguage(language);

    string langinit = PDFXOCR_Funcs.OCR_LangArrayW[Array.IndexOf(PDFXOCR_Funcs.OCR_LangFullArrayW, language)];

    // Check if language file exists
    string langfile = m_Datapath + @"ocrdats\" + langinit + "_pxvocr.dat";// m_Datapath + @"ocrdats\eng_pxvocr.dat"; //OCR Language file

    // string err = string.Empty;

    try
    {
        if (!System.IO.File.Exists(langfile))
        {
            result += "Language File Missing";
        }
        m_DPI = 200; //quality of OCR

        string regkey = "XXXXXXXXXXXXXXXXXXXXXXX";
        string devcode = "XXXXXXXXXXXXXXXXXXXXXXX";

        //string key = "YOUR PRODUCT KEY";
        //string code = "YOUR DEVELOPER CODE";
        hResult = PDFXOCR_Funcs.OCR_Init(out pdf, regkey, devcode);

        if (PDFXOCR_Funcs.IS_DS_FAILED(hResult))
        {
            result += "OCR Initialization failure.";
        }

        hResult = PDFXOCR_Funcs.OCR_SetCallback(pdf, thecallback, 0);

        hResult = PDFXOCR_Funcs.OCR_LoadW(pdf, m_SourceFilename);
        if (PDFXOCR_Funcs.IS_DS_FAILED(hResult))
        {
            result += "Error loading file: \n" + m_SourceFilename + "OCR Library Error";
        }

        PDFXOCR_Funcs.PXO_Options Options = new PDFXOCR_Funcs.PXO_Options();
        Options.blacklist = string.Empty;
        Options.whitelist = string.Empty;
        Options.raster_dpi = m_DPI;
        Options.ImageFlags = (uint)PDFXOCR_Funcs.OCR_ImageProcessingFlags.OCR_Image_FastAutorotate;
        Options.DataPath = m_Datapath;
        Options.lang = m_Language;
        Options.RegionMode = PDFXOCR_Funcs.OCR_RegionMode.OCR_Auto;
        Options.reserved = 0;

        IntPtr pxoPagelist = IntPtr.Zero; // null pointer passed to OCR_MakeSearchable() will result in all pages being OCRd.

        hResult = PDFXOCR_Funcs.OCR_MakeSearchable(pdf, ref Options, pxoPagelist);

        if (PDFXOCR_Funcs.IS_DS_FAILED(hResult))
        {
            result += "Error running searchable.\nError code: " + hResult.ToString();
        }
        else
        {
            OCRretcode = hResult.ToString();
        }

        hResult = PDFXOCR_Funcs.OCR_SaveW(pdf, m_DestFilename);
        if (PDFXOCR_Funcs.IS_DS_FAILED(hResult))
        {
            result += "Error saving output PDF file.\nError code: " + hResult.ToString();
        }
        PDFXOCR_Funcs.OCR_Delete(out pdf);
    }
    catch (Exception ex)
    {
        //throw ex;
        result += "[EXCEPTION]" + ex.GetType();
        result += "[EXCEPTION]" + ex.Message;
        result += "[EXCEPTION]" + ex.StackTrace;
        //Dispose();
        //result += "Disposed OCRHelper class";
    }
    return result;
}

>> The code for converting Word to PDF:
private bool ConvertToPDF(string pdfpath, string inputfile)
{
    bool isDone = false;
    PXCComLib5.CPXCPrinter PDFPrinter;
    PXCComLib5.CPXCControlEx prnFactory = new PXCComLib5.CPXCControlEx();
    string regkey = "XXXXXXXXXXXX";
    string devcode = "XXXXXXXXXXXX";
    PDFPrinter = (PXCComLib5.CPXCPrinter)prnFactory.get_Printer("", "PDF-XChange Printer 2012", regkey, devcode);
    PDFPrinter.Option["Save.ShowSaveDialog"] = false;
    PDFPrinter.Option["Save.RunApp"] = false;
    PDFPrinter.Option["Save.Path"] = pdfpath;
    PDFPrinter.Option["Save.WhenExists"] = 1; //overwrite

    PDFPrinter.SetAsDefaultPrinter();

    System.Diagnostics.Process printJob = new System.Diagnostics.Process();
    printJob.StartInfo.FileName = inputfile;
    printJob.StartInfo.UseShellExecute = true;
    printJob.StartInfo.Verb = "print";
    printJob.StartInfo.WindowStyle = System.Diagnostics.ProcessWindowStyle.Minimized;
    printJob.Start();
    printJob.WaitForExit();
    isDone = true;
    return isDone;
}
Attachments
ABST.PDF
The overlay PDF
(20.76 KiB) Downloaded 147 times
test 17 pages and image-comment.pdf
17 pages PDF copy
(109.91 KiB) Downloaded 138 times

Lzcat - Tracker Supp
Site Admin
Posts: 711
Joined: Thu Jun 28, 2007 8:42 am

Re: OCR of pdf and pictures

Post by Lzcat - Tracker Supp » Tue Jan 19, 2016 7:41 am

Hi.
3) When I convert an image to PDF, the image is much smaller than the original. Where can I change the image size?
I've played around with the last two values in the line below, but couldn't make the image bigger in the PDF file.
PDFXC_Funcs.PXC_PlaceImage(cpage, p, Common.I2L(1), Common.PH - Common.I2L(1), Common.I2L(3), Common.I2L(2));
If you read the help for the PXC_PlaceImage function, you will see that the last two parameters specify the width and height of the image in points (1/72 inch). I cannot see the code of your I2L function, so I cannot say why you are getting such small images: either there is an error in I2L, or the values 3 and 2 are simply too small.
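Because those two arguments are in points, one way to choose them is to convert the image's pixel size at its resolution into points (points = pixels × 72 / dpi) and, if the result is larger than the page, scale it down proportionally. A minimal sketch, assuming the image's pixel dimensions and DPI are known; ImageSizing, PixelsToPoints and FitInside are hypothetical names, not SDK functions.

```csharp
using System;

// Hypothetical helpers (not part of the SDK) for choosing PXC_PlaceImage width/height in points.
public static class ImageSizing
{
    public const double PointsPerInch = 72.0;

    // Converts a length in pixels at the given resolution to PDF points (1/72 inch).
    public static double PixelsToPoints(int pixels, double dpi) => pixels * PointsPerInch / dpi;

    // Scales (width, height) down uniformly so it fits inside (maxWidth, maxHeight),
    // preserving the aspect ratio. Sizes that already fit are returned unchanged.
    public static (double Width, double Height) FitInside(double width, double height,
                                                          double maxWidth, double maxHeight)
    {
        double scale = Math.Min(1.0, Math.Min(maxWidth / width, maxHeight / height));
        return (width * scale, height * scale);
    }
}
```

For example, a 2550 × 3300 pixel scan at 300 dpi gives 612 × 792 pt, a full US Letter page; the resulting width and height would then be passed as the last two PXC_PlaceImage arguments.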
HTH.
Victor
Tracker Software
Project manager


Sasha - Tracker Dev Team
User
Posts: 4202
Joined: Fri Nov 21, 2014 8:27 am

Re: OCR of pdf and pictures

Post by Sasha - Tracker Dev Team » Tue Jan 19, 2016 7:51 am

Hello crimsonlogic,

As for the error code: it means OCR_ERR_INVALID_DICT_PATH, i.e. you supplied a wrong path to the dictionary folder.
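Because this error concerns the dictionary path, one option is to build the paths with Path.Combine from AppDomain.CurrentDomain.BaseDirectory (rather than string-editing the CodeBase URI) and check them before calling OCR_Init. A sketch, assuming the folder layout used in the code earlier in this thread (OCRLanguages\ocrdats\<lang>_pxvocr.dat under the application folder); OcrPaths is a hypothetical helper, and which folder the SDK actually expects in Options.DataPath should be confirmed in the SDK documentation.

```csharp
using System;
using System.IO;

// Hypothetical helper; the folder and file names are taken from the code posted in this thread.
public static class OcrPaths
{
    // Data folder with a trailing separator, mirroring the original "...\OCRLanguages\" string.
    public static string DataPath(string appDir) =>
        Path.Combine(appDir, "OCRLanguages") + Path.DirectorySeparatorChar;

    public static string LanguageFile(string appDir, string langPrefix) =>
        Path.Combine(DataPath(appDir), "ocrdats", langPrefix + "_pxvocr.dat");

    // Returns null when both the data folder and the language file exist,
    // otherwise a message naming the first missing item.
    public static string Check(string appDir, string langPrefix)
    {
        string dataPath = DataPath(appDir);
        if (!Directory.Exists(dataPath))
            return "OCR data folder not found: " + dataPath;

        string langFile = LanguageFile(appDir, langPrefix);
        if (!File.Exists(langFile))
            return "Language file not found: " + langFile;

        return null;
    }
}
```

In the application, `string problem = OcrPaths.Check(AppDomain.CurrentDomain.BaseDirectory, langinit);` would report a readable message up front instead of a later OCR error code.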

Please use these functions when investigating problems in the future:

Code: Select all

OCRCORE_API LONG OCR_API OCRE_Err_FormatSeverity(HRESULT errorcode, LPSTR buf, LONG maxlen);
OCRCORE_API LONG OCR_API OCRE_Err_FormatFacility(HRESULT errorcode, LPSTR buf, LONG maxlen);
OCRCORE_API LONG OCR_API OCRE_Err_FormatErrorCode(HRESULT errorcode, LPSTR buf, LONG maxlen);
HTH,
Alex
Join us at Google+:
https://plus.google.com/+PDFXChangeEditorTS
Subscribe at:
https://www.youtube.com/channel/UC-TwAMNi1haxJ1FX3LvB4CQ

crimsonlogic
User
Posts: 38
Joined: Tue Jan 12, 2016 2:25 am

Re: OCR of pdf and pictures

Post by crimsonlogic » Tue Jan 19, 2016 9:14 am

Hi Sasha,

Sorry, I don't quite understand. Which error code are you referring to?

Thanks

Sasha - Tracker Dev Team
User
Posts: 4202
Joined: Fri Nov 21, 2014 8:27 am

Re: OCR of pdf and pictures

Post by Sasha - Tracker Dev Team » Tue Jan 19, 2016 9:32 am

Hello crimsonlogic,

It's about the error code you asked about: -2113263855 (0x820A2711).
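Converting between the signed decimal form seen in C# and the hex form is a one-liner. The sketch also splits the value into fields, assuming the standard Windows HRESULT bit layout (bit 31 failure, bits 16-26 facility, bits 0-15 code); whether the SDK's codes follow that layout exactly is an assumption, and the OCRE_Err_Format* functions above remain the authoritative way to get the text. ErrorCodes is a hypothetical name.

```csharp
using System;

// Hypothetical helpers for reading the signed 32-bit error codes returned by the SDK.
// The field split assumes the standard HRESULT layout; use the SDK's *_Err_Format*
// functions for the authoritative meaning.
public static class ErrorCodes
{
    // Signed 32-bit code -> 8-digit hex, e.g. -2113263855 -> "820A2711".
    public static string ToHex(int code) => unchecked((uint)code).ToString("X8");

    // Bit 31 set (negative as int) marks a failure in the HRESULT layout.
    public static bool IsFailure(int code) => code < 0;

    // Bits 16-26 (assumed HRESULT facility field).
    public static int Facility(int code) => (code >> 16) & 0x7FF;

    // Bits 0-15 (assumed HRESULT code field).
    public static int Code(int code) => code & 0xFFFF;
}
```

For instance, the code -2113667071 reported later in this thread is 0x82040001.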

HTH

Sasha - Tracker Dev Team
User
Posts: 4202
Joined: Fri Nov 21, 2014 8:27 am

Re: OCR of pdf and pictures

Post by Sasha - Tracker Dev Team » Tue Jan 19, 2016 9:04 pm

By the way, it would help if you could provide a small sample project (with your DLLs included) in which the problems occur, along with steps to reproduce them. Then we could help you more efficiently: right now we have many questions that a working project would answer.

crimsonlogic
User
Posts: 38
Joined: Tue Jan 12, 2016 2:25 am

Re: OCR of pdf and pictures

Post by crimsonlogic » Wed Jan 20, 2016 9:23 am

Hi Sasha,

Because of the attachment size limit on this forum, we will email you a sample program and documents to try out (to support@tracker-software.com). We will send them in two separate emails. Thanks for your help.

crimsonlogic
User
Posts: 38
Joined: Tue Jan 12, 2016 2:25 am

Re: OCR of pdf and pictures

Post by crimsonlogic » Wed Jan 20, 2016 9:30 am

Hi Sasha,

We tried to send you the programs and sample files via email, but the files were too large to send. Is there another way we can deposit our files? Thanks.

John - Tracker Supp
Site Admin
Posts: 8202
Joined: Tue Jun 29, 2004 10:34 am
Location: Vancouver Island - Canada

Re: OCR of pdf and pictures

Post by John - Tracker Supp » Wed Jan 20, 2016 9:44 am

How big are the attachments?

Best regards
Tracker Support
http://www.tracker-software.com

crimsonlogic
User
Posts: 38
Joined: Tue Jan 12, 2016 2:25 am

Re: OCR of pdf and pictures

Post by crimsonlogic » Wed Jan 20, 2016 9:51 am

The program file is about 25 MB and the sample files are about 4 MB after zipping.

Sasha - Tracker Dev Team
User
Posts: 4202
Joined: Fri Nov 21, 2014 8:27 am

Re: OCR of pdf and pictures

Post by Sasha - Tracker Dev Team » Wed Jan 20, 2016 9:58 am

Please post them to Google Drive or Dropbox and send us a link.

Cheers,
Alex

crimsonlogic
User
Posts: 38
Joined: Tue Jan 12, 2016 2:25 am

Re: OCR of pdf and pictures

Post by crimsonlogic » Wed Jan 20, 2016 10:32 am

Hi Sasha,

Our client is a government agency, and for security reasons they prohibit us from uploading their code to the cloud.

Could you please provide a secure repository for us to upload the files to? Thank you very much.

Tracker Supp-Stefan
Site Admin
Posts: 13424
Joined: Mon Jan 12, 2009 8:07 am
Location: London

Re: OCR of pdf and pictures

Post by Tracker Supp-Stefan » Wed Jan 20, 2016 10:40 am

Hello crimsonlogic,

Maybe you can upload the files to our FTP server?
You can find the details here:
http://www.tracker-software.com/knowledgebase/321
However, as the FTP server is open to anyone, we recommend that you password-protect the uploaded files and then send us the password, e.g. via email to support@tracker-software.com

Regards,
Stefan

crimsonlogic
User
Posts: 38
Joined: Tue Jan 12, 2016 2:25 am

Re: OCR of pdf and pictures

Post by crimsonlogic » Thu Jan 21, 2016 3:30 am

Hi Stefan,

Thank you for your reply. We have uploaded the files and sent the password by email.

Sasha - Tracker Dev Team
User
Posts: 4202
Joined: Fri Nov 21, 2014 8:27 am

Re: OCR of pdf and pictures

Post by Sasha - Tracker Dev Team » Thu Jan 21, 2016 7:30 am

Hello crimsonlogic,

Thanks for the sample - we'll look at it.

crimsonlogic
User
Posts: 38
Joined: Tue Jan 12, 2016 2:25 am

Re: OCR of pdf and pictures

Post by crimsonlogic » Fri Jan 22, 2016 1:36 am

Hi Sasha,

Any updates?

Thanks

Sasha - Tracker Dev Team
User
Posts: 4202
Joined: Fri Nov 21, 2014 8:27 am

Re: OCR of pdf and pictures

Post by Sasha - Tracker Dev Team » Fri Jan 22, 2016 11:12 am

Hello crimsonlogic,

Looking at your files in media.zip, this is what we have found so far:
The DWC.pdf file had already been OCR'd by an external converter (libtiff / tiff2pdf - 2.3.606.0), which added an overlay of invisible text.

When this file is OCR'd again, that text becomes visible, and the background image plus this text go through our OCR engine. As a result you get the now-visible text (aligned to the top in your example) and the OCR'd background image with new invisible text on top of it. Naturally, the result looks garbled wherever this text overlaps the previously invisible text.

HTH,
Alex
Join us at Google+:
https://plus.google.com/+PDFXChangeEditorTS
Subscribe at:
https://www.youtube.com/channel/UC-TwAMNi1haxJ1FX3LvB4CQ

crimsonlogic
User
Posts: 38
Joined: Tue Jan 12, 2016 2:25 am

Re: OCR of pdf and pictures

Post by crimsonlogic » Mon Jan 25, 2016 8:21 am

Hi Sasha,

Is it possible to know whether a file has already been OCR'd when it is passed through the PDF-XChange SDK?

Any updates on the other issue?


Thanks
fya

Sasha - Tracker Dev Team
User
Posts: 4202
Joined: Fri Nov 21, 2014 8:27 am

Re: OCR of pdf and pictures

Post by Sasha - Tracker Dev Team » Mon Jan 25, 2016 8:28 am

Hello crimsonlogic,

Maybe it's better to look at the PDF generator and its options, so that it won't generate any text?

By the other problem, do you mean the 17-page problem?

Cheers,
Alex

crimsonlogic
User
Posts: 38
Joined: Tue Jan 12, 2016 2:25 am

Re: OCR of pdf and pictures

Post by crimsonlogic » Tue Jan 26, 2016 2:45 am

Hi Sasha,

Yes, we need a solution for the 17-page error.


Thanks
fya

crimsonlogic
User
Posts: 38
Joined: Tue Jan 12, 2016 2:25 am

Re: OCR of pdf and pictures

Post by crimsonlogic » Tue Jan 26, 2016 2:52 am

Hi Sasha,

We don't understand your statement:

Maybe it's better to look at the PDF generator and its options, so that it won't generate any text?

The PDF program we provided performs the OCR that causes the overlay. What do you mean by the PDF generator?

The other issue involves a Word file that is converted to PDF format and then OCR'd.
The conversion to PDF works without problems, whereas the OCR process throws an error.
Please try the program, which we made the effort to build to demonstrate the issue.
Please get a developer to look at the code if you are not able to do so.

We need a solution ASAP, as we reported these issues over a week ago and there has been no progress.

thanks
fya

Ivan - Tracker Software
Site Admin
Posts: 3607
Joined: Thu Jul 08, 2004 10:36 pm
Location: Vancouver Island - Canada

Re: OCR of pdf and pictures

Post by Ivan - Tracker Software » Tue Jan 26, 2016 7:54 am

Yes, we need a solution for the 17-page error.
As we already mentioned, the problem occurs because your process is 32-bit.
32-bit processes have a limited address space available. More importantly, in modern operating systems Address Space Layout Randomization (https://en.wikipedia.org/wiki/Address_s ... domization) makes this address space highly fragmented, so an application often cannot allocate a big contiguous memory buffer (for example, rasterizing one Letter page at 300 dpi requires about 32 MB of memory).
The only possible solutions I can recommend here are:
1. Create a separate .exe that OCRs the document, and turn off ASLR for that .exe (I am not sure whether .NET allows this).
2. Convert your app to 64-bit.
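The "about 32 MB" figure follows from simple arithmetic: a Letter page is 8.5 × 11 inches, which at 300 dpi is 2550 × 3300 pixels, and at an assumed 4 bytes per pixel (32-bit colour) that is 33,660,000 bytes, roughly 32 MiB that must fit in one contiguous block. The pixel format the OCR engine actually uses is not stated in this thread, so bytesPerPixel is an assumption. RasterMath is a hypothetical name.

```csharp
using System;

// Hypothetical helpers (not part of the SDK) for estimating raster memory
// and checking the process bitness.
public static class RasterMath
{
    // Size in bytes of one uncompressed raster of a page.
    // bytesPerPixel is an assumption (4 = 32-bit colour), not a documented SDK value.
    public static long RasterBytes(double widthInches, double heightInches, int dpi, int bytesPerPixel)
    {
        long widthPx = (long)Math.Ceiling(widthInches * dpi);
        long heightPx = (long)Math.Ceiling(heightInches * dpi);
        return widthPx * heightPx * bytesPerPixel;
    }

    // True when the current process runs as 64-bit (option 2 above).
    public static bool Is64Bit() => Environment.Is64BitProcess;
}
```

Logging RasterMath.Is64Bit() at startup is a quick way to confirm that option 2 took effect: an AnyCPU .NET Framework build with "Prefer 32-bit" enabled still runs as a 32-bit process on 64-bit Windows.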

HTH
Tracker Software (Project Director)


crimsonlogic
User
Posts: 38
Joined: Tue Jan 12, 2016 2:25 am

Re: OCR of pdf and pictures

Post by crimsonlogic » Tue Feb 02, 2016 4:49 am

Hi,

As Alex said above, the overlaid text is due to the PDF we use having already been OCR'd. How can we tell whether a PDF has already been OCR'd?

We have another problem converting Word files to PDF. Our code is as follows:

First, we open one Word document (doc1.docx). Then we launch our application and upload another Word document (doc2.docx), which runs the code below to convert it to PDF. The default printer is set to a physical printer.

The code below still uses the physical printer instead of the PDF-XChange Printer: doc2.docx is printed on the physical printer instead of being converted to PDF. Please advise ASAP, as this issue is blocking business flows in our live system.


PDFPrinter = (PXCComLib5.CPXCPrinter)prnFactory.get_Printer("", "PDF-XChange Printer 2012", regkey, devcode);
PDFPrinter.Option["Save.ShowSaveDialog"] = false;
PDFPrinter.Option["Save.RunApp"] = false;
PDFPrinter.Option["Save.Path"] = pdfpath;
PDFPrinter.Option["Save.WhenExists"] = 1; //overwrite

PDFPrinter.SetAsDefaultPrinter();

System.Diagnostics.Process printJob = new System.Diagnostics.Process();
printJob.StartInfo.FileName = inputfile;
printJob.StartInfo.UseShellExecute = true;
printJob.StartInfo.Verb = "print";
printJob.StartInfo.WindowStyle = System.Diagnostics.ProcessWindowStyle.Minimized;
printJob.Start();
printJob.WaitForExit(60000);

PDFPrinter.RestoreDefaultPrinter();

Sasha - Tracker Dev Team
User
Posts: 4202
Joined: Fri Nov 21, 2014 8:27 am

Re: OCR of pdf and pictures

Post by Sasha - Tracker Dev Team » Tue Feb 02, 2016 8:39 am

Hello crimsonlogic,

We suspect that this is a Windows 10 issue.
Please try this; we've just tested this code and it worked for us:

Code: Select all

            PXCComLib5.CPXCPrinter PDFPrinter;
            PXCComLib5.CPXCControlEx prnFactory = new PXCComLib5.CPXCControlEx();

            PDFPrinter = (PXCComLib5.CPXCPrinter)prnFactory.get_Printer("", "PDF-XChange Printer 2012", regkey, devcode);
            PDFPrinter.Option["Save.ShowSaveDialog"] = false;
            PDFPrinter.Option["Save.RunApp"] = false;
            PDFPrinter.Option["Save.Path"] = ocrfile;
            PDFPrinter.Option["Save.WhenExists"] = 1; //overwrite

            System.Diagnostics.Process printJob = new System.Diagnostics.Process();
            printJob.StartInfo.FileName = inputfile;
            printJob.StartInfo.UseShellExecute = true;
            printJob.StartInfo.Verb = "printto";
            printJob.StartInfo.Arguments = "\"" + PDFPrinter.Name + "\"";
            printJob.StartInfo.WindowStyle = System.Diagnostics.ProcessWindowStyle.Minimized;
            printJob.Start();
            printJob.WaitForExit(60000);

            return "ok";
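The key change in this code is the "printto" verb with the printer name as its argument, which targets the PDF-XChange printer directly instead of relying on (and changing) the system default printer. The sketch below packages that into a reusable helper; PrintToRequest is a hypothetical name, and "printto" works only if the application registered for the file type (here, presumably Word) supports that verb.

```csharp
using System.Diagnostics;

// Hypothetical helper: builds (but does not start) a shell "printto" request that
// targets a named printer, so the system default printer does not need to change.
public static class PrintToRequest
{
    public static ProcessStartInfo Build(string inputFile, string printerName)
    {
        return new ProcessStartInfo
        {
            FileName = inputFile,
            UseShellExecute = true,                // shell verbs require ShellExecute
            Verb = "printto",
            Arguments = "\"" + printerName + "\"", // quoted because printer names contain spaces
            WindowStyle = ProcessWindowStyle.Minimized
        };
    }
}
```

Usage would mirror the code above: `var job = Process.Start(PrintToRequest.Build(inputfile, PDFPrinter.Name)); job?.WaitForExit(60000);`. The null check matters because Process.Start can return null, or a process that has already exited, when the shell hands the request to an already-running instance, for example when Word is already open.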
HTH

crimsonlogic
User
Posts: 38
Joined: Tue Jan 12, 2016 2:25 am

Re: OCR of pdf and pictures

Post by crimsonlogic » Wed Feb 17, 2016 10:25 am

Hi Support,

I converted my application to 64-bit, as Tracker advised.
Now I am not able to convert image files to PDF. I've replaced all DLLs with those from the Bin.64 folders under Tracker Software\PDF-XChange PRO 5 SDK\Examples.
Our code is as follows:
if (Common.IS_DS_FAILED(PDFXC_Funcs.PXC_NewDocument(out pdf, regkey, devcode)))
    resultstr += "ConvertOthersToOCR: IS_DS_FAILED";
PDFXC_Funcs.PXC_SetDocumentInfoA(pdf, PDFXC_Funcs.PXC_StdInfoField.InfoField_Author, "Tracker Software");
PDFXC_Funcs.PXC_SetDocumentInfoA(pdf, PDFXC_Funcs.PXC_StdInfoField.InfoField_Title, "PDF-XChange 4.0 Examples");
PDFXC_Funcs.PXC_SetDocumentInfoA(pdf, PDFXC_Funcs.PXC_StdInfoField.InfoField_Creator, "PDF-XChange 4.0");
PDFXC_Funcs.PXC_SetDocumentInfoA(pdf, PDFXC_Funcs.PXC_StdInfoField.InfoField_Keywords, "PDF-XChange; Examples; 4.0; C#");
PDFXC_Funcs.PXC_EnableLinkAnalyzer(pdf, true);
PDFXC_Funcs.PXC_SetCompression(pdf, false, false, PDFXC_Funcs.PXC_CompressionType.ComprType_C_Auto,
75, PDFXC_Funcs.PXC_CompressionType.ComprType_I_Auto, PDFXC_Funcs.PXC_CompressionType.ComprType_M_Auto);


int res = PDFXC_Funcs.PXC_AddPage(pdf, Common.PW, Common.PH, out page);
if (Common.IS_DS_FAILED(res))
    resultstr += "ConvertOthersToOCR: " + res;
cpage = page;

double iw, ih;
res = PDFXC_Funcs.PXC_AddImageA(pdf, inputfile, out p);
if (Common.IS_DS_FAILED(res))
    resultstr += "ConvertOthersToOCR: " + res;
PDFXC_Funcs.PXC_GetImageDimension(pdf, p, out iw, out ih);
PDFXC_Funcs.PXC_PlaceImage(cpage, p, Common.I2L(1), Common.PH - Common.I2L(1), Common.I2L(7), Common.I2L(8));

PDFXC_Funcs.PXC_WriteDocumentExA(pdf, extractfile, extractfile.Length, fl, "");
PDFXC_Funcs.PXC_ReleaseDocument(pdf);

I am getting error code -2113667071 from the line below, and no PDF is generated.

res = PDFXC_Funcs.PXC_AddImageA(pdf, inputfile, out p);

Please advise.

Thank you very much.

Sasha - Tracker Dev Team
User
Posts: 4202
Joined: Fri Nov 21, 2014 8:27 am

Re: OCR of pdf and pictures

Post by Sasha - Tracker Dev Team » Wed Feb 17, 2016 10:58 am

Hello crimsonlogic,

Please do not post bare error codes; use the PXC_Err_FormatErrorCode method.
The error code that you've provided means Invalid Argument.
The code sample does not contain enough information to tell why that method received an invalid argument.
Please provide samples with FULL problem data.

crimsonlogic
User
Posts: 38
Joined: Tue Jan 12, 2016 2:25 am

Re: OCR of pdf and pictures

Post by crimsonlogic » Wed Feb 17, 2016 11:30 am

Hi Sasha,

We are uploading a sample project (TestPDFXChangeORG.zip) to Tracker's FTP server. Please unzip it with the password sent in a separate email to support@tracker-software.com.

The sample data file (CL.TIF) is in Temp.zip.

Please advise how we can use PXC_Err_FormatErrorCode in our program too.

Thank you very much.

Sasha - Tracker Dev Team
User
Posts: 4202
Joined: Fri Nov 21, 2014 8:27 am

Re: OCR of pdf and pictures

Post by Sasha - Tracker Dev Team » Wed Feb 17, 2016 3:18 pm

How to use the FormatErrorCode method:

Code: Select all

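byte[] bytes = new byte[128 * sizeof(char)]; // 256-byte buffer for the message text
PDFXC_Funcs.PXC_Err_FormatErrorCode(-2113667071, bytes, bytes.Length);
// The buffer is zero-initialized and the message is null-terminated, so trim the unused trailing zero bytes.
string str = System.Text.Encoding.ASCII.GetString(bytes).TrimEnd('\0');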
When you need to include an error code in a message, please post the error message along with the code itself.

Cheers,
Alex

Sasha - Tracker Dev Team
User
Posts: 4202
Joined: Fri Nov 21, 2014 8:27 am

Re: OCR of pdf and pictures

Post by Sasha - Tracker Dev Team » Thu Feb 18, 2016 12:42 pm

Hello crimsonlogic,

I've updated the zip archive ClassLibrary1.zip, protected with the same password that you specified.
The problem was the int type: in C#, int is always a 32-bit value, so when you switched to x64, the pointers stored in int variables became corrupted. I changed them to IntPtr and everything worked properly.
The archive contains the files I modified.
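The effect can be reproduced without the SDK. A C# int is always 32 bits, while IntPtr matches the process's pointer size (8 bytes in a 64-bit process). The sketch uses a made-up handle value above 4 GB to show what happens when it is stored in an int; PointerWidthDemo is a hypothetical name, and the value is illustrative, not a real SDK handle.

```csharp
using System;

// Hypothetical demo: what happens to a 64-bit native handle stored in int vs IntPtr.
public static class PointerWidthDemo
{
    // A C# int is always 32 bits, so the upper half of a 64-bit handle is silently dropped.
    public static long StoreInInt(long handle) => unchecked((int)handle);

    // IntPtr is pointer-sized: 4 bytes in a 32-bit process, 8 bytes in a 64-bit one.
    public static long StoreInIntPtr(long handle) => new IntPtr(handle).ToInt64();
}
```

StoreInInt(0x123456780) returns 0x23456780 because the upper bits are lost, whereas StoreInIntPtr returns the value unchanged in a 64-bit process (in a 32-bit process, new IntPtr(long) throws OverflowException for such a value). This is why handles passed to and from native calls are normally declared as IntPtr rather than int.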

HTH,
Alex
