Mapping to create searchable pdf using an OCR library

diana
User
Posts: 1
Joined: Tue Mar 29, 2005 7:29 am

Post by diana » Wed May 24, 2006 7:53 am

Hi

We have been using docutrack pdf tools in our product for years. We are now looking for a way to use the PDF-Tools SDK together with third-party OCR components to produce searchable PDF files. We have already found some third-party OCR components to do the recognition, but we are stuck at the point where the image has to be mapped to the hidden text (image on text). Many third-party components produce XML containing all the mapping information for the source image. How do I make pdf tools understand this mapping so that it produces searchable PDF documents? Is there any way I can achieve this with any of your products? If not, can you please advise us on what can be done?

John - Tracker Supp
Site Admin
Posts: 8202
Joined: Tue Jun 29, 2004 10:34 am
Location: Vancouver Island - Canada

Post by John - Tracker Supp » Wed May 24, 2006 9:28 am

Hi Diana,

You can indeed use PDF-Tools SDK (Xcpro35.dll) to create searchable PDFs with your image and OCR'ed text.

Here is a broad outline of the steps required:

1. Create a new PDF document using the pxclib30 library. (Do not forget to set the required compression options immediately after creation: they are used when the images are placed, so setting them later to achieve maximum compression is not possible.)

2. Acquire the text and its positions from an image using any third-party OCR package.

3. Create a page and place the image into it.

4. Recalculate text size and positions depending on the information from your OCR output and the rectangle where the image was placed.

5. Place the text in the correct positions using one of the text output functions. (Do not forget to set the text rendering mode to TextRenderingMode_None in the final release. For testing it is better to use TextRenderingMode_Fill with a contrasting color such as red or green, so that you can see where the text lands.)

6. Repeat steps 2-5 for all images that should be added to the PDF. (Placing two or more images on the same page is not a problem: just reuse the already created page and place each image at the correct position.)

7. Save the resulting pdf and close the document.
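
As a rough illustration of step 4 only, here is a minimal sketch in Python. It is plain arithmetic and does not call any PDF-Tools SDK function. It assumes the OCR output gives each word as a pixel box measured from the top-left corner of the image, and that the page uses the default PDF coordinate system (points, origin at the bottom-left). The names Rect, WordBox and map_word_to_page are made up for this example, and using the scaled box height as the font size is only an approximation; please check the units and origin your own OCR component and page setup actually use.

```python
from dataclasses import dataclass


@dataclass
class Rect:
    """Rectangle on the PDF page, in points, origin at bottom-left (assumed)."""
    left: float
    bottom: float
    width: float
    height: float


@dataclass
class WordBox:
    """One OCR word, in image pixels, origin at top-left (assumed)."""
    text: str
    x: int   # pixels from the left edge of the image
    y: int   # pixels from the top edge of the image
    w: int   # box width in pixels
    h: int   # box height in pixels


def map_word_to_page(word: WordBox, image_w_px: int, image_h_px: int,
                     placed: Rect) -> Rect:
    """Convert an OCR pixel box into page coordinates.

    `placed` is the rectangle where the image was drawn on the page (step 3).
    """
    # Scale factors from image pixels to page points.
    sx = placed.width / image_w_px
    sy = placed.height / image_h_px

    left = placed.left + word.x * sx
    # Flip the vertical axis: OCR measures down from the top of the image,
    # the page measures up from the bottom.
    bottom = placed.bottom + (image_h_px - (word.y + word.h)) * sy

    return Rect(left, bottom, word.w * sx, word.h * sy)


def approx_font_size(box: Rect) -> float:
    """Rough font size in points: the height of the mapped box."""
    return box.height


if __name__ == "__main__":
    # A 2550 x 3300 pixel scan placed so that it fills a 612 x 792 point page.
    page_area = Rect(left=0.0, bottom=0.0, width=612.0, height=792.0)
    word = WordBox(text="Invoice", x=300, y=300, w=600, h=75)

    box = map_word_to_page(word, 2550, 3300, page_area)
    print(word.text, box, approx_font_size(box))
```

With these sample numbers the word lands about 72 points from the left and 702 points from the bottom of the page, in a box roughly 144 by 18 points. The resulting position and size are what you would then pass to whichever text output function you use in step 5.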

All should be well.

Hope that helps.

Best regards
Tracker Support
http://www.tracker-software.com
