Mapping to create searchable pdf using an OCR library

Posted: Wed May 24, 2006 7:53 am
by diana
Hi

We have been using docutrack pdf tools in our product for years. We are now looking for a way to use the PDF-Tools SDK together with a third-party OCR component to produce searchable PDF files. We have already found OCR components that can do the recognition, but we are stuck at the point where the recognized text has to be mapped onto the image as hidden text (image on text). Many third-party components produce XML that contains all the mapping information relative to the source image. How do I make PDF-Tools understand this mapping so that it produces searchable PDF documents? Can this be achieved with any of your products? If not, could you please advise us on what can be done?

Posted: Wed May 24, 2006 9:28 am
by John - Tracker Supp
Hi Diana,

You can indeed use PDF-Tools SDK (Xcpro35.dll) to create searchable PDFs from your image and OCR'ed text.

Here is a broad outline of the steps required:

1. Create a new PDF document using the pxclib30 library. (Do not forget to set the required compression options immediately after creation: they are applied when the images are placed, so setting them later will not give maximum compression.)

2. Acquire the text and its positions from an image using any third-party OCR package.

3. Create a page and place the image into it.

4. Recalculate the text size and positions from the information in your OCR output and the rectangle where the image was placed.

5. Place the text at the correct positions using one of the text output functions. (Do not forget to set the text rendering mode to TextRenderingMode_None in the final release; for testing it is better to use TextRenderingMode_Fill with a contrasting color such as red or green.)

6. Repeat steps 2-5 for every image that should be added to the PDF. (Placing two or more images on the same page is not a problem: reuse the page already created and place each image at the required position.)

7. Save the resulting pdf and close the document.
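
To illustrate step 4, here is a minimal Python sketch of the coordinate recalculation. It is not PDF-Tools SDK code, and the names `Rect` and `map_ocr_box` are made up for this example. It assumes the OCR package reports word boxes in image pixels with the origin at the top-left corner and y increasing downwards, and that the page uses points with the origin at the bottom-left corner and y increasing upwards. Check both conventions against your OCR output and the SDK documentation before relying on it.

```python
from dataclasses import dataclass


@dataclass
class Rect:
    """Rectangle in page units (hypothetical helper type, not an SDK type)."""
    left: float
    bottom: float
    width: float
    height: float


def map_ocr_box(box_px, image_size_px, placed):
    """Map an OCR word box from image pixels to page coordinates.

    box_px        -- (x0, y0, x1, y1) in pixels, origin top-left, y down (assumed)
    image_size_px -- (width, height) of the source image in pixels
    placed        -- Rect where the image was placed on the page
    """
    x0, y0, x1, y1 = box_px
    img_w, img_h = image_size_px

    # Scale factors from image pixels to page units.
    sx = placed.width / img_w
    sy = placed.height / img_h

    left = placed.left + x0 * sx
    # Flip the y axis: the bottom edge of the box (y1) is measured from the
    # top of the image, so its distance from the image bottom is img_h - y1.
    bottom = placed.bottom + (img_h - y1) * sy

    return Rect(left, bottom, (x1 - x0) * sx, (y1 - y0) * sy)


# Example: a 1000 x 2000 pixel scan placed at (50, 100), 500 x 1000 units in size.
word = map_ocr_box((100, 200, 300, 240), (1000, 2000), Rect(50, 100, 500, 1000))
print(word)
```

The height of the returned rectangle is a reasonable first guess for the font size of the hidden text, and the width can be used to stretch or squeeze the text so that it covers the word in the image. Both are approximations that usually need tuning per font.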

All should be well.

Hope that helps.