Wrong PDF when using OCR_MakeSearchable

PDF-X OCR SDK is a new product from us, intended to complement our existing PDF and imaging tools and to provide developers with an expanding set of professional tools for Optical Character Recognition tasks.

Post by baumunk » Fri Nov 13, 2020 9:41 am

We are testing your product with your example application, OCRDemoCsharp.

The PDF contains both images and text. After OCR, the text in the images is searchable, but the original text is removed.

Original PDF:

[Image: image.png]

After OCR:

[Image: image1.png]

What am I doing wrong?
We are using the DEMO version.

If I use the same function in PDF-XChange Editor, the result is correct.

Code:


hResult = PDFXOCR.PDFXOCR_Funcs.OCR_SetCallback(pdf, thecallback, 0);

hResult = PDFXOCR.PDFXOCR_Funcs.OCR_LoadW(pdf, m_SourceFilename);
if (PDFXOCR.PDFXOCR_Funcs.IS_DS_FAILED(hResult))
{
    MessageBox.Show("Error loading file: \n" + m_SourceFilename, "OCR Library Error");
    break;
}

PDFXOCR.PDFXOCR_Funcs.PXO_Options Options = new PDFXOCR.PDFXOCR_Funcs.PXO_Options();
Options.blacklist = "";
Options.whitelist = "";
Options.raster_dpi = 300;
Options.ImageFlags = (uint)PDFXOCR.PDFXOCR_Funcs.OCR_ImageProcessingFlags.OCR_Image_NoRotate;
Options.DataPath = m_Datapath;
Options.lang = m_Language;
Options.RegionMode = PDFXOCR.PDFXOCR_Funcs.OCR_RegionMode.OCR_Auto;
Options.SecondLanguage = 0;

// A null pointer passed to OCR_MakeSearchable() results in all pages being OCRed.
IntPtr pxoPagelist = IntPtr.Zero;

hResult = PDFXOCR.PDFXOCR_Funcs.OCR_MakeSearchable(pdf, ref Options, pxoPagelist);

Regards.
Attachments:
ocr_77.pdf (633.6 KiB)
77.pdf (101.1 KiB)


Post by Paul - Tracker Supp » Mon Dec 07, 2020 8:57 pm

Hi baumunk,

I spoke to one of the dev team about this. He told me the issue has been reproduced and asked me to create a formal support ticket for it. While ticket numbers are intended for internal use only, if you refer to RT#5394: Wrong PDF when using OCR_MakeSearchable in correspondence with us, any support staff member can get you a status update on the ticket.

Back to the issue itself: you should see a post here with some suggestions within 24 hours.

Best regards

Paul O'Rorke
Tracker Support North America
http://www.tracker-software.com


Post by Sasha - Tracker Dev Team » Tue Dec 08, 2020 10:42 am

Hello baumunk,

We've inspected this behavior and it appears to be a bug: part of the content is not being rendered, and the SDK is not aware of it.
As a temporary workaround, you can try the additional flags OCR_Content_Original and OCR_Image_SuppressOutput. With these, the text that is lost in the rendering process is left as is, and the searchable text is placed on top of it.
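
A minimal sketch of how the two workaround flags might be combined with the flags already in use. Only the flag names and Options.ImageFlags come from this thread; the enum and struct types below, the ContentFlags field, and all numeric values are hypothetical stand-ins so that the example compiles on its own. Check the PDFXOCR wrapper for the real enum that holds each flag and the options field it belongs to.

```csharp
using System;

// Hypothetical stand-ins for the SDK flag enums. The real definitions live in
// the PDFXOCR wrapper, and their numeric values may differ from these placeholders.
[Flags]
public enum ImageFlagsSketch : uint
{
    None = 0,
    NoRotate = 1u << 0,        // stands in for OCR_Image_NoRotate
    SuppressOutput = 1u << 1,  // stands in for OCR_Image_SuppressOutput
}

[Flags]
public enum ContentFlagsSketch : uint
{
    None = 0,
    Original = 1u << 0,        // stands in for OCR_Content_Original
}

// Hypothetical stand-in for the flag-carrying part of PXO_Options.
public struct OptionsSketch
{
    public uint ImageFlags;    // field name taken from the original post
    public uint ContentFlags;  // assumed field name; verify against the SDK wrapper
}

public static class WorkaroundSketch
{
    // Adds the workaround flags with bitwise OR, so flags that are already
    // set (such as NoRotate) are preserved rather than overwritten.
    public static OptionsSketch ApplyWorkaround(OptionsSketch options)
    {
        options.ImageFlags |= (uint)ImageFlagsSketch.SuppressOutput;
        options.ContentFlags |= (uint)ContentFlagsSketch.Original;
        return options;
    }
}
```

Starting from options that only have NoRotate set, ApplyWorkaround keeps that bit and adds the two new ones. In the real code, the equivalent would be OR-ing the SDK's own constants into the matching PXO_Options fields before calling OCR_MakeSearchable.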

Cheers,
Alex