PDFXChange PRO OCR SDK Issue OCR Text output is junk characters

Posted: Fri Oct 19, 2018 3:37 am
by admin-emmeluth
Hi,

We tried the latest PDFXChange PRO OCR SDK [main DLL: OcrTools.x64.dll] using the C# example demo provided with the SDK, but the text output in the PDF document after OCR is just junk characters. Attached are the C# demo code we used and the document that needs to be OCRed. Please also see the attached image file, which contains a sample of the junk text copied from the PDF document after OCR. Please advise what needs to be changed in the code to fix this issue.
textAfterOCR.PNG
Thanks,
CL team.

Re: PDFXChange PRO OCR SDK Issue OCR Text output is junk characters

Posted: Thu Nov 01, 2018 11:26 am
by Tracker Supp-Stefan
Hello admin-emmeluth,

Apologies for the delay in following up on this one!
I've passed it along to a colleague in the dev team who works with our OCR SDK, and as soon as we have any further advice we will post here!

Regards,
Stefan

Re: PDFXChange PRO OCR SDK Issue OCR Text output is junk characters

Posted: Thu Nov 01, 2018 1:11 pm
by Tracker Supp-Stefan
Hello admin-emmeluth,

My colleague who reviewed your code says that you've specified 100 DPI for rasterization. Please try 200 or 300 DPI, and you will get much better results!
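
To illustrate why the rasterization DPI matters, here is a rough back-of-the-envelope sketch. It assumes a US Letter page (8.5 x 11 in) and 10 pt body text, neither of which is stated in this thread, and the class and method names are made up for illustration; they are not part of the SDK.

```csharp
using System;

public static class DpiEstimate
{
    // Height in pixels of text of the given point size (1 pt = 1/72 in).
    public static double TextHeightPx(double pointSize, int dpi) =>
        pointSize / 72.0 * dpi;

    // Total number of pixels in a page of the given size (inches) at the given DPI.
    public static long PagePixels(double widthIn, double heightIn, int dpi) =>
        (long)Math.Round(widthIn * dpi) * (long)Math.Round(heightIn * dpi);

    public static void Main()
    {
        // Assumed page size and font size; adjust to match the real document.
        foreach (int dpi in new[] { 100, 200, 300 })
        {
            Console.WriteLine(
                $"{dpi} DPI: 10 pt text is about {TextHeightPx(10, dpi):F1} px tall, " +
                $"page has {PagePixels(8.5, 11, dpi)} pixels");
        }
    }
}
```

At 100 DPI the recognizer has only about 14 pixels of height per 10 pt character to work with. The pixel count grows with the square of the DPI, so a 300 DPI raster has nine times as many pixels as a 100 DPI one, which costs more to store if the raster ends up embedded in the output.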

Regards,
Stefan

Re: PDFXChange PRO OCR SDK Issue OCR Text output is junk characters

Posted: Fri Nov 02, 2018 2:58 am
by admin-emmeluth
Hi,

Thanks for your timely help.

As suggested, we tried 300 DPI, but the output file comes out at 13 MB for a 3 MB input file.
When we tried 200 DPI, the output was not correct: it still contains junk characters, and the file size is 8 MB.
Please help with this.

Re: PDFXChange PRO OCR SDK Issue OCR Text output is junk characters

Posted: Fri Nov 02, 2018 1:12 pm
by Tracker Supp-Stefan
Hello admin-emmeluth,

Currently you have this flag:
Options.ImageFlags = (uint)PDFXOCR.PDFXOCR_Funcs.OCR_ImageProcessingFlags.OCR_Image_Autorotate;

Please try to also add this one:
OCR_Content_Original

And let us know the result!
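
Since OCR_ImageProcessingFlags is a set of bit flags, "adding" a flag presumably means combining it with the existing one using bitwise OR rather than replacing the assignment. The sketch below is standalone: it redeclares a trimmed copy of the enum using the values quoted later in this thread (the real one lives in PDFXOCR.PDFXOCR_Funcs), and the local variable is a stand-in for Options.ImageFlags. Whether the SDK expects exactly this combination is an assumption.

```csharp
using System;

public static class FlagDemo
{
    // Trimmed stand-in for PDFXOCR.PDFXOCR_Funcs.OCR_ImageProcessingFlags;
    // the values are the ones quoted in this thread.
    [Flags]
    public enum OCR_ImageProcessingFlags : uint
    {
        OCR_Image_NoRotate = 0x0000,
        OCR_Image_Autorotate = 0x0001,
        OCR_Content_Original = 0x0040
    }

    // Combine both flags with bitwise OR so that neither bit is lost.
    public static uint BuildImageFlags() =>
        (uint)(OCR_ImageProcessingFlags.OCR_Image_Autorotate |
               OCR_ImageProcessingFlags.OCR_Content_Original);

    public static void Main()
    {
        uint imageFlags = BuildImageFlags(); // stand-in for Options.ImageFlags
        Console.WriteLine($"ImageFlags = 0x{imageFlags:X4}");

        // Check that the Autorotate bit survived the combination.
        bool autorotate =
            (imageFlags & (uint)OCR_ImageProcessingFlags.OCR_Image_Autorotate) != 0;
        Console.WriteLine($"Autorotate still set: {autorotate}");
    }
}
```

The combined value is 0x0041. Assigning OCR_Content_Original on its own would clear the Autorotate bit.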

Regards,
Stefan

Re: PDFXChange PRO OCR SDK Issue OCR Text output is junk characters

Posted: Mon Nov 05, 2018 4:04 am
by admin-emmeluth
Hi,

Thanks for the prompt reply.

I could not find an OCR_Content_Original option in the sample code provided.
Browsing your forum, I saw that there is an image flag for it, so I added it to the OCR_ImageProcessingFlags enum in the PDFXOCR_Funcs class (the last entry in the enum below) and then applied it in the OCR options as follows:

Code:

Options.ImageFlags = (uint)PDFXOCR.PDFXOCR_Funcs.OCR_ImageProcessingFlags.OCR_Content_Original;

Code:

public enum OCR_ImageProcessingFlags
{
    OCR_Image_NoRotate = 0x0000,
    OCR_Image_Autorotate = 0x0001,
    OCR_Image_EdgeRefine = 0x0002,
    OCR_Image_GaussianBlur = 0x0004,
    OCR_Image_SuppressOutput = 0x0008, // only place text layer
    OCR_Image_FastAutorotate = 0x0011, // OCR_Image_Autorotate bit included
    OCR_Text_PlaceByLines = 0x0020, // smaller but less accurate output
    OCR_Content_Original = 0x0040 // entry added manually
}

Is this the correct way to do it? I am getting the same results as described in my previous post: a larger file size, and junk characters when I reduce the DPI. Please help; we don't have many options left.

Re: PDFXChange PRO OCR SDK Issue OCR Text output is junk characters

Posted: Mon Nov 05, 2018 12:00 pm
by Tracker Supp-Stefan
Hello admin-emmeluth,

Please take a look at this topic from September:
viewtopic.php?f=42&t=31422

The discussion there is what made it necessary for our developers to add the OCR_Content_Original parameter.

The fixes from the custom DLLs posted there should already be part of the latest live SDK builds on our website, so please make sure to update to build 327.1 if you have not already!

Regards,
Stefan

P.S. I have also blocked your serial numbers, which you forgot to remove from your sample project, and removed the project from your original post.
Please contact us at sales@pdf-xchange.com so that we can issue you replacement ones.