Hi
Using the OCR_Save (W or A) functions of the PDF-XOCR SDK, we can save an OCRed document as a PDF file.
That works, but it seems that all the B&W background images are saved as 8-bit-per-pixel "JPG-like" images, which causes a huge growth in size. For example:
- input PDF file (four A4 pages of scanned text embedded as B&W JBIG2 images), size: 87 kB
- output PDF file (the same four OCRed A4 pages of scanned text, embedded as 8 bpp JPG images), size: 2763 kB
Most "well scanned" input files do not need skew detection and rotation before OCR, so the background images of the saved files could be the same as those of the input file.
So in this case, is there a way to keep the background layers of such a file unchanged, in order to keep the output files at a "moderate" size?
Or could you give a code example that merges an input PDF file with the "text layer", using the OCR_Image_SuppressOutput flag, which seems to do this kind of job?
Many thanks
Gérard
How to save OCRed PDFs keeping unchanged background images?
Re: How to save OCRed PDFs keeping unchanged background imag
Yes, in this version of the SDK we do not have any output image option that would let you output the same image content as the input.
To merge pages, load the output PDF (with just invisible text, no images) and the source PDF (with just images) using the pro SDK DLL XCPRO40, then merge the pages with the function PXCp_PlaceContents(). There is a good example of how to use this on page 340 of the PDF-ToolsV4SDK.pdf manual included with the pro SDK.
You may have to ensure that the input pages are already deskewed and/or that you do not deskew during OCR (i.e., don't pass the OCR_Image_AutoRotate flag); otherwise the invisible text will not line up with the original images.
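The real merge is done by PXCp_PlaceContents(), whose exact signature is in the manual and is not reproduced here. As a rough, language-neutral sketch of the bookkeeping around it (written in Python purely for illustration), the snippet below pairs each source page with its text-only page and rejects obvious mismatches before merging. `PageInfo`, `Placement` and `plan_overlay` are hypothetical names, not part of any Tracker SDK, and the page sizes would have to be read from the two documents by whatever means you have available.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PageInfo:
    """Hypothetical description of one page: size in points, rotation in degrees."""
    width_pt: float
    height_pt: float
    rotation: int = 0


@dataclass(frozen=True)
class Placement:
    """One overlay step: text page `text_page` goes on top of `source_page`."""
    source_page: int  # index in the original scanned PDF (images only)
    text_page: int    # index in the OCR output PDF (invisible text only)


def plan_overlay(source_pages, text_pages, tolerance_pt=0.5):
    """Pair pages one-to-one and fail early on gross geometry mismatches."""
    if len(source_pages) != len(text_pages):
        raise ValueError("page counts differ: %d vs %d"
                         % (len(source_pages), len(text_pages)))
    plan = []
    for i, (src, txt) in enumerate(zip(source_pages, text_pages)):
        if src.rotation != txt.rotation:
            raise ValueError("page %d: rotation differs" % i)
        if (abs(src.width_pt - txt.width_pt) > tolerance_pt
                or abs(src.height_pt - txt.height_pt) > tolerance_pt):
            raise ValueError("page %d: page size differs" % i)
        plan.append(Placement(source_page=i, text_page=i))
    return plan


# Usage: four A4 pages (595.28 x 841.89 pt) in both documents.
a4 = PageInfo(595.28, 841.89)
plan = plan_overlay([a4] * 4, [a4] * 4)
```

This check only catches gross differences, such as a page that was auto-rotated by 90 degrees during OCR. A small deskew angle does not change the page size, so it cannot be detected this way; for that, the safest route remains not deskewing during OCR at all.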
Re: How to save OCRed PDFs keeping unchanged background imag
Thank you very much!
Gérard
- Tracker Supp-Stefan
Re: How to save OCRed PDFs keeping unchanged background imag
Glad we could help, Gérard!
Cheers,
Stefan