Text extraction and position identification



Posts: 31
Joined: Fri Jun 10, 2005 7:24 pm

Post by paj » Thu Nov 10, 2005 7:38 pm


With 3.5, when using PDF2TEXT, how can I identify the x,y position of each piece of text? I need to extract the contents of a text string at a given x,y position, but I can't find a function that returns this information.



John - Tracker Supp
Site Admin
Posts: 8205
Joined: Tue Jun 29, 2004 10:34 am
Location: Vancouver Island - Canada

Post by John - Tracker Supp » Fri Nov 11, 2005 5:55 pm


Please see the example provided in the evaluation SDK Folder :


The example shows how to extract formatted text from one PDF and insert it into another PDF.

However, the principle is the same as the one you need.

You must extract text from the page element by element, using a matrix to acquire each element.

In the matrix, the first four parameters define scaling, rotation and so on, and the last two give the offset from the lower-left corner of the page (as described in the PDF specification).
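
To illustrate the matrix layout described above, here is a minimal Python sketch. It does not use the PDF-Tools SDK. `TextElement`, `origin`, `transform` and `text_at` are hypothetical names made up for this illustration, and the sample elements are invented. It only assumes that each extracted element comes with a six-value matrix (a, b, c, d, e, f) in the PDF specification's order, where e and f are the x,y offset from the lower-left corner of the page.

```python
import math
from dataclasses import dataclass


@dataclass
class TextElement:
    # Hypothetical container; the SDK's own structures will differ.
    text: str
    matrix: tuple  # (a, b, c, d, e, f), PDF specification order


def origin(matrix):
    """Return the (x, y) offset: the last two values of the matrix."""
    return matrix[4], matrix[5]


def transform(matrix, x, y):
    """Map a point through a PDF matrix: x' = a*x + c*y + e, y' = b*x + d*y + f."""
    a, b, c, d, e, f = matrix
    return a * x + c * y + e, b * x + d * y + f


def text_at(elements, x, y, tolerance=1.0):
    """Return the text of every element whose origin is within `tolerance` of (x, y)."""
    hits = []
    for el in elements:
        ex, ey = origin(el.matrix)
        if math.hypot(ex - x, ey - y) <= tolerance:
            hits.append(el.text)
    return hits


# Invented sample data standing in for elements extracted from a page.
elements = [
    TextElement("Invoice", (1, 0, 0, 1, 72.0, 720.0)),
    TextElement("Total", (1, 0, 0, 1, 72.0, 100.0)),
]

print(text_at(elements, 72, 720))
```

The tolerance is there because positions are floating-point values in page units, so an exact match on x,y would rarely succeed.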

That should provide you with the required methodology to progress.

Best regards
Tracker Support
