Wrong PDF when using OCR_MakeSearchable

PDF-X OCR SDK is a new product from us, intended to complement our existing PDF and imaging tools and to provide developers with an expanding set of professional tools for Optical Character Recognition tasks.

Moderators: TrackerSupp-Daniel, Tracker Support, Vasyl-Tracker Dev Team, Chris - Tracker Supp, Sean - Tracker, Tracker Supp-Stefan

baumunk
User
Posts: 38
Joined: Fri Nov 13, 2020 8:47 am

Wrong PDF when using OCR_MakeSearchable

Post by baumunk »

We are testing your product with your example application OCRDemoCsharp.

The PDF contains images and text. After OCR, the text in the images is searchable, but the original text has been removed.

Original PDF:

image.png
After OCR:

image1.png
What am I doing wrong?
We are using the demo version.

If I use the same function in PDF-XChange Editor, the result is correct.

Code:

// Register the progress callback and load the source PDF.
hResult = PDFXOCR.PDFXOCR_Funcs.OCR_SetCallback(pdf, thecallback, 0);

hResult = PDFXOCR.PDFXOCR_Funcs.OCR_LoadW(pdf, m_SourceFilename);
if (PDFXOCR.PDFXOCR_Funcs.IS_DS_FAILED(hResult))
{
    MessageBox.Show("Error loading file: \n" + m_SourceFilename, "OCR Library Error");
    break; // leave the enclosing loop/switch of the demo code
}

// OCR options used for this test (only the OCR_Image_NoRotate image flag is set).
PDFXOCR.PDFXOCR_Funcs.PXO_Options Options = new PDFXOCR.PDFXOCR_Funcs.PXO_Options();
Options.blacklist = "";
Options.whitelist = "";
Options.raster_dpi = 300;
Options.ImageFlags = (uint)PDFXOCR.PDFXOCR_Funcs.OCR_ImageProcessingFlags.OCR_Image_NoRotate;
Options.DataPath = m_Datapath;
Options.lang = m_Language;
Options.RegionMode = PDFXOCR.PDFXOCR_Funcs.OCR_RegionMode.OCR_Auto;
Options.SecondLanguage = 0;

IntPtr pxoPagelist = IntPtr.Zero; // a null pointer passed to OCR_MakeSearchable() causes all pages to be OCR'd

hResult = PDFXOCR.PDFXOCR_Funcs.OCR_MakeSearchable(pdf, ref Options, pxoPagelist);
Regards.
Attachments
ocr_77.pdf
(633.6 KiB)
77.pdf
(101.1 KiB)
Paul - Tracker Supp
Site Admin
Posts: 6837
Joined: Wed Mar 25, 2009 10:37 pm
Location: Chemainus, Canada

Re: Wrong PDF when using OCR_MakeSearchable

Post by Paul - Tracker Supp »

Hi baumunk,

I spoke to one of the dev team about this. He confirmed that the issue has been reproduced and asked me to open a formal support ticket for it. Although the ticket number is intended for internal use only, if you refer to RT#5394: Wrong PDF when using OCR_MakeSearchable in correspondence with us, any support staff member can get you a status update on the ticket.

As for the issue itself, you should see a post here with some suggestions within 24 hours.
Best regards

Paul O'Rorke
Tracker Support North America
http://www.tracker-software.com
Sasha - Tracker Dev Team
User
Posts: 5522
Joined: Fri Nov 21, 2014 8:27 am

Re: Wrong PDF when using OCR_MakeSearchable

Post by Sasha - Tracker Dev Team »

Hello baumunk,

We've inspected this behavior, and it appears to be a bug: part of the content is not being rendered, and the SDK is not aware of it.
As a temporary workaround, you can try the additional flags OCR_Content_Original and OCR_Image_SuppressOutput. With these flags, the text that would otherwise be lost during rendering is left as is, and the searchable text is placed on top of it.

Cheers,
Alex
Subscribe at:
https://www.youtube.com/channel/UC-TwAMNi1haxJ1FX3LvB4CQ
baumunk
User
Posts: 38
Joined: Fri Nov 13, 2020 8:47 am

Re: Wrong PDF when using OCR_MakeSearchable

Post by baumunk »

Hello Alex,

Thanks for the tip; it helped.
Note that OCR_Content_Original = 0x0040 is only listed in the documentation; it is missing from PDFOCR_Funcs.cs.

With kind regards
Ernest Baumunk
Sasha - Tracker Dev Team
User
Posts: 5522
Joined: Fri Nov 21, 2014 8:27 am

Wrong PDF when using OCR_MakeSearchable

Post by Sasha - Tracker Dev Team »

:)
baumunk
User
Posts: 38
Joined: Fri Nov 13, 2020 8:47 am

Re: Wrong PDF when using OCR_MakeSearchable

Post by baumunk »

I have another problem.

After OCR, I need to convert the file to PDF/A-1B. Since we have a PDF-XChange PRO SDK license, the file produced by OCR is then printed to the XChange PDF driver.

The resulting PDF is a valid PDF/A-1B, but the OCR text layer is no longer present.

Do you have a solution for this?

Unfortunately, nobody has answered my other ticket yet: https://forum.pdf-xchange.com/viewtopic.php?f=43&t=35816 :oops:

With kind regards
Ernest Baumunk
Tracker Supp-Stefan
Site Admin
Posts: 17824
Joined: Mon Jan 12, 2009 8:07 am
Location: London

Re: Wrong PDF when using OCR_MakeSearchable

Post by Tracker Supp-Stefan »

Hello baumunk,

It seems like the PDF/A-1B part of your problem is being handled in that other topic - so shall we consider this one resolved and continue the discussion in the other one?

Kind regards,
Stefan