Hi,
We tried the latest PDFXChange PRO OCR SDK (main DLL: OcrTools.x64.dll) using the C# example demo provided with the SDK, but the text output in the PDF document after OCR is just junk characters. Attached are the C# demo code we used and the document that needs to be OCRed. Please also see the attached image file, which contains a sample of the junk text copied from the PDF document after OCR. Please advise what needs to be changed in the code to fix this issue.
Thanks,
CL team.
PDFXChange PRO OCR SDK Issue OCR Text output is junk characters
-
- User
- Posts: 3
- Joined: Tue Mar 17, 2015 6:16 am
PDFXChange PRO OCR SDK Issue OCR Text output is junk characters
- Attachments
-
- WO2014044840A1.pdf
- (2.74 MiB) Downloaded 272 times
- Tracker Supp-Stefan
- Site Admin
- Posts: 17853
- Joined: Mon Jan 12, 2009 8:07 am
- Location: London
Re: PDFXChange PRO OCR SDK Issue OCR Text output is junk characters
Hello admin-emmeluth,
Apologies for the delay in following up on this one!
I've passed it along to a colleague in the dev team who works with our OCR SDK, and as soon as we have any further advice we will post it here!
Regards,
Stefan
- Tracker Supp-Stefan
- Site Admin
- Posts: 17853
- Joined: Mon Jan 12, 2009 8:07 am
- Location: London
Re: PDFXChange PRO OCR SDK Issue OCR Text output is junk characters
Hello admin-emmeluth,
My colleague who reviewed your code says that you've specified 100 DPI for rasterization. Please try 200 or 300 DPI instead, and you will get much better results!
Regards,
Stefan
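For a rough sense of what the DPI setting changes, the pixel size of a rasterized page scales linearly with DPI in each dimension, so the pixel count scales with its square. The sketch below is plain arithmetic, not SDK code: `PixelSize` is a hypothetical helper, and the A4 page size is an assumption, since the page dimensions of the attached document are not stated in this thread.

```csharp
using System;

public static class DpiSketch
{
    // Hypothetical helper (not part of the OCR SDK): pixel dimensions of a
    // page of the given physical size when rasterized at the given DPI.
    public static (int Width, int Height) PixelSize(double widthInches, double heightInches, int dpi)
    {
        return ((int)Math.Round(widthInches * dpi), (int)Math.Round(heightInches * dpi));
    }

    public static void Main()
    {
        // Assumption: an A4 page, roughly 8.27 x 11.69 inches.
        foreach (int dpi in new[] { 100, 200, 300 })
        {
            var (w, h) = PixelSize(8.27, 11.69, dpi);
            double megapixels = (double)w * h / 1_000_000.0;
            Console.WriteLine($"{dpi} DPI: {w} x {h} px ({megapixels:F1} megapixels)");
        }
    }
}
```

Under these assumptions a page at 300 DPI has nine times the pixels of the same page at 100 DPI. The recognizer gets more detail per character, but any image data written to the output is likely to be larger too.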
-
- User
- Posts: 3
- Joined: Tue Mar 17, 2015 6:16 am
Re: PDFXChange PRO OCR SDK Issue OCR Text output is junk characters
Hi,
Thanks for your timely help.
As suggested, we tried 300 DPI, but the output file comes out at 13 MB for a 3 MB input file.
With 200 DPI, the output is still incorrect: it contains junk characters, and the file size is 8 MB.
Please help with this.
- Tracker Supp-Stefan
- Site Admin
- Posts: 17853
- Joined: Mon Jan 12, 2009 8:07 am
- Location: London
Re: PDFXChange PRO OCR SDK Issue OCR Text output is junk characters
Hello admin-emmeluth,
Currently you have this flag:
Options.ImageFlags = (uint)PDFXOCR.PDFXOCR_Funcs.OCR_ImageProcessingFlags.OCR_Image_Autorotate;
Please try to also add this one:
OCR_Content_Original
And let us know the result!
Regards,
Stefan
-
- User
- Posts: 3
- Joined: Tue Mar 17, 2015 6:16 am
Re: PDFXChange PRO OCR SDK Issue OCR Text output is junk characters
Hi,
Thanks for the prompt reply.
I could not find an OCR_Content_Original option in the sample code provided.
Browsing your forum, I saw that there is an image flag for it, so I added it to the OCR_ImageProcessingFlags enum in the PDFXOCR_Funcs class and then set it in the OCR options, as shown in the code blocks in this post.
Is this the correct way to do it? I am getting the same results as described in my previous post: a larger file size at 300 DPI, and junk characters when I reduce the DPI. Please help. We don't have many options left.
Code: Select all
Options.ImageFlags = (uint)PDFXOCR.PDFXOCR_Funcs.OCR_ImageProcessingFlags.OCR_Content_Original;
Code: Select all
public enum OCR_ImageProcessingFlags
{
    OCR_Image_NoRotate = 0x0000,
    OCR_Image_Autorotate = 0x0001,
    OCR_Image_EdgeRefine = 0x0002,
    OCR_Image_GaussianBlur = 0x0004,
    OCR_Image_SuppressOutput = 0x0008, // only place text layer
    OCR_Image_FastAutorotate = 0x0011, // OCR_Image_Autorotate bit included
    OCR_Text_PlaceByLines = 0x0020, // smaller but less accurate output.
    OCR_Content_Original = 0x0040 // added by hand; not in the shipped sample code
}
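Note that the assignment above replaces OCR_Image_Autorotate with OCR_Content_Original, whereas the earlier suggestion was to add the new flag alongside the existing one. A minimal sketch of combining the two with a bitwise OR follows. The enum here is a cut-down stand-in that reuses only the values quoted in this post, and `FlagSketch` and `Combined` are hypothetical names rather than SDK members.

```csharp
using System;

public static class FlagSketch
{
    // Stand-in for PDFXOCR_Funcs.OCR_ImageProcessingFlags, reduced to the
    // entries used here. Values are copied from the enum quoted in this post.
    public enum OCR_ImageProcessingFlags
    {
        OCR_Image_NoRotate = 0x0000,
        OCR_Image_Autorotate = 0x0001,
        OCR_Content_Original = 0x0040
    }

    // Combine both flags with bitwise OR so that neither bit is lost.
    public static uint Combined()
    {
        return (uint)(OCR_ImageProcessingFlags.OCR_Image_Autorotate
                      | OCR_ImageProcessingFlags.OCR_Content_Original);
    }

    public static void Main()
    {
        uint imageFlags = Combined();
        Console.WriteLine($"0x{imageFlags:X4}"); // prints 0x0041
    }
}
```

In the demo project, the equivalent would presumably be to assign the OR of both enum members, cast to uint, to Options.ImageFlags. Whether that alone resolves the junk text has not been confirmed in this thread.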
- Tracker Supp-Stefan
- Site Admin
- Posts: 17853
- Joined: Mon Jan 12, 2009 8:07 am
- Location: London
Re: PDFXChange PRO OCR SDK Issue OCR Text output is junk characters
Hello admin-emmeluth,
Please take a look at this topic from September:
viewtopic.php?f=42&t=31422
The discussion there is what led our developers to add the OCR_Content_Original parameter.
The fixes from the custom DLLs posted there should already be in the latest live SDK builds on our website, so please make sure you have updated to build 327.1 if you have not already!
Regards,
Stefan
P.S. I have also blocked the serial numbers that you forgot to remove from your sample project.
I also removed the project from your original post.
Please contact us at sales@pdf-xchange.com so that we can issue you replacement ones.