How to save OCRed PDFs keeping unchanged background images?

PDF-X OCR SDK is a new product from us, intended to complement our existing PDF and imaging tools and to give developers an expanding set of professional tools for optical character recognition tasks.


technovia
User
Posts: 11
Joined: Wed Feb 22, 2012 3:59 pm

How to save OCRed PDFs keeping unchanged background images?

Post by technovia »

Hi

Using the OCR_Save (W or A) functions of the PDF-XOCR SDK, we can save an OCRed document as a PDF file.

However, it seems that all the B&W background images are saved as 8-bit-per-pixel, "JPG-like" images, which causes a huge increase in file size. For example:
-input PDF file (four A4 pages of scanned text embedded as B&W JBIG2 images), size: 87 kB
-output PDF file (the same four A4 pages after OCR, with the scanned text embedded as 8bpp JPG images), size: 2763 kB
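To put that growth in numbers, here is a minimal sketch that uses only the two file sizes and the page count quoted above. The variable names are illustrative and nothing in it comes from the SDK.

```python
# File sizes quoted in the post, in kB.
input_kb = 87       # four A4 pages, B&W JBIG2 images
output_kb = 2763    # the same four pages after OCR, 8bpp JPG-like images
pages = 4

growth = output_kb / input_kb       # overall growth factor
per_page_in = input_kb / pages      # average kB per page before OCR
per_page_out = output_kb / pages    # average kB per page after OCR

print(f"growth factor: {growth:.1f}x")
print(f"per page: {per_page_in:.2f} kB -> {per_page_out:.2f} kB")
```

That is roughly a 32-fold increase, from about 22 kB to about 691 kB per page on average.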

Most "well scanned" input files do not need skew detection or rotation before OCR, so the background images of the saved files could be identical to those of the input file.

So in this case, is there a way to keep the background layers of such a file unchanged, so that the output files stay at a "moderate" size?

Or could you give a code example that merges an input PDF file with the "text layer" produced using the OCR_Image_SuppressOutput flag, which seems to do this kind of job?

Many thanks

Gérard
Walter-Tracker Supp
User
Posts: 381
Joined: Mon Jun 13, 2011 5:10 pm

Re: How to save OCRed PDFs keeping unchanged background imag

Post by Walter-Tracker Supp »

You are right: in this version of the SDK we do not have any output image option that would let you write out the same image content as the input.

To merge pages, load the output PDF (with just invisible text, no images) and the source PDF (with just images) using the pro SDK DLL XCPRO40, then merge pages with the function PXCp_PlaceContents(). There is a good example of how to use this on page 340 of the PDF-ToolsV4SDK.pdf manual included with the pro SDK.

You may have to ensure that the input pages are already deskewed and/or that you do not deskew during OCR (i.e., don't pass the OCR_Image_AutoRotate flag).
technovia
User
Posts: 11
Joined: Wed Feb 22, 2012 3:59 pm

Re: How to save OCRed PDFs keeping unchanged background imag

Post by technovia »

Thank you very much!
Gérard
Tracker Supp-Stefan
Site Admin
Posts: 17941
Joined: Mon Jan 12, 2009 8:07 am
Location: London

Re: How to save OCRed PDFs keeping unchanged background imag

Post by Tracker Supp-Stefan »

Glad we could help, Gérard!

Cheers,
Stefan