How to enumerate/identify objects? PXCp_llGetObjectByIndex?

This Forum is for Software Developers requiring help and assistance with Tracker Software's PDF-Tools SDK of Library DLL functions (only). Please use the PDF-XChange Drivers API SDK Forum for assistance with all PDF Print Driver related topics, or the PDF-XChange Viewer SDK Forum if appropriate.


afalsow
User
Posts: 17
Joined: Thu Jun 14, 2012 1:28 pm


Post by afalsow »

I recently evaluated (and rejected - for other reasons) an SDK that allowed very easy enumeration and identification (determination of type) of objects on a PDF page. That SDK lets the developer determine, for each object on the page, the type of object (text, path, image, etc.), its coordinates, and other attributes. It also lets the developer - again, very easily - set the object to visible=false, move the object, or delete the object entirely.

In searching the Tracker help files, I see that PXCp_llGetObjectsCount apparently lets me retrieve the total object count within a document (is that the entire document, or just a selected page?). And then, I presume, one may iterate through the objects via PXCp_llGetObjectByIndex? Are there existing methods to determine each object's type, position, etc.? Or, for Tracker's SDK, is that something the developer must implement using the Low-Level API? Perhaps PXCp_ObjectGetDictionary?

For this particular sub-task, I need to do the following:

1. Load a multi-page PDF (each page will contain header text and a footer logo/image). This is done easily enough.

2. Identify and delete the footer logo on every page. The footer logo is an image object - and is the ONLY image object on the page. So once I find an object of type image, I can safely delete it. Would this involve use of PXCp_ImageGetCountOnPage and PXCp_ImageGetFromPage? I do not see a high-level Image Manipulation function/method to delete the image once I have identified it. How would that be done? With the Low-Level API?

3. Identify and extract (not delete) the header text. The header text object is located well above any other objects, so any text object within the range X1,Y1 to X2,Y2 may safely be considered the header. I must extract this header text, but only from the first page: all subsequent pages have identical header text, so extracting it again would be redundant. I do not see a method (as in #2) such as "TextGetCountOnPage".

4. On every page, there will be several path objects. The uppermost path object (on each page) must be set to visible=false. I also have not found any high-level function to control an object's visibility. Is this a Low-Level API issue, perhaps?

5. Save the modified document. Easily done.
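
To make item 3 concrete, here is a minimal Python sketch of the header selection I have in mind. It assumes some extraction step has already produced text fragments with page coordinates; the `TextItem` type and both function names are hypothetical and are not part of any Tracker SDK. Coordinates are assumed to be in PDF user space, where y grows upward.

```python
from dataclasses import dataclass


@dataclass
class TextItem:
    """Hypothetical text fragment: its string and the (x, y) of its origin."""
    text: str
    x: float
    y: float


def items_in_rect(items, x1, y1, x2, y2):
    """Return the items whose origin lies inside the rectangle (edges inclusive)."""
    left, right = sorted((x1, x2))
    bottom, top = sorted((y1, y2))
    return [it for it in items
            if left <= it.x <= right and bottom <= it.y <= top]


def header_text(items, x1, y1, x2, y2):
    """Join the fragments inside the header rectangle in reading order."""
    hits = items_in_rect(items, x1, y1, x2, y2)
    # Top line first (largest y), then left to right within a line.
    hits.sort(key=lambda it: (-it.y, it.x))
    return " ".join(it.text for it in hits)


if __name__ == "__main__":
    first_page = [
        TextItem("Report", 140, 760),
        TextItem("Quarterly", 72, 760),
        TextItem("Body text", 72, 600),
    ]
    # Header band assumed to be the top strip of a US Letter page.
    print(header_text(first_page, 0, 740, 612, 792))
```

Only the first page would need to be passed through this, since the header repeats on every later page.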

I have read, and understand, Tracker's policy on offering no assistance on Low-Level API issues. If my project goals require use of the Low-Level API, I would ask only that you indicate that is where the answer lies (as opposed to the desired functionality being available in the high-level API, where perhaps I have simply failed to find the relevant documentation).

I do see another post - from a developer wishing to manipulate annotations - where tech support refers the developer to 3.2.5 PDF Dictionary Functions. Would this section also be applicable to my situation?

Thank you.
Lzcat - Tracker Supp
Site Admin
Posts: 677
Joined: Thu Jun 28, 2007 8:42 am

Re: How to enumerate/identify objects? PXCp_llGetObjectByInd

Post by Lzcat - Tracker Supp »

Hi afalsow.
1. Simple, as you mentioned.
2. You can find the image(s) in the page's Resources dictionary and delete them from there using the Low-Level API, but this is not a good idea: the page content will still contain reference(s) to that image, and some readers may report an error on the page. To remove an image completely you should update the page content, but this is not a trivial task and we do not provide any API for it (I mean for modification; you can extract/replace the page content stream(s) using the Low-Level API).
3. There are enough functions to extract text with letter positions and formatting - they begin with PXCp_ET_. Please see the Text Extraction section in the documentation.
4. You cannot easily show/hide path objects (or any other) on the PDF page. To do this you must recreate page content, as in step 2.
5. Same as 1.
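
To illustrate why point 2 is not just a dictionary edit: in PDF, an image is painted by a `/Name Do` operator in the page content stream, where `Name` is a key in the page's XObject resources. The following is a minimal Python sketch of locating and dropping those invocations, not Tracker SDK code. The function names are hypothetical, the stream is assumed to be already extracted and decoded (uncompressed), and the regular expression is a simplification that does not skip string literals or inline image data.

```python
import re

# A PDF name is "/" followed by regular characters, i.e. anything except
# whitespace and the delimiters ( ) < > [ ] { } / %.
_DO_CALL = re.compile(rb"/([^\s()<>\[\]{}/%]+)\s+Do\b")


def find_xobject_uses(content: bytes) -> list:
    """Return the XObject names painted with the Do operator, in stream order."""
    return [m.group(1).decode("latin-1") for m in _DO_CALL.finditer(content)]


def strip_xobject_use(content: bytes, name: str) -> bytes:
    """Remove every '/name Do' invocation for one resource name."""
    pattern = re.compile(rb"/" + re.escape(name.encode("latin-1")) + rb"\s+Do\b")
    return pattern.sub(b"", content)


if __name__ == "__main__":
    # Hypothetical decoded page content: a footer image plus header text.
    stream = (b"q 100 0 0 40 72 20 cm /Im1 Do Q\n"
              b"BT /F1 12 Tf 72 760 Td (Header) Tj ET\n")
    print(find_xobject_uses(stream))
    print(strip_xobject_use(stream, "Im1"))
```

Only after the invocation is gone from the content is it safe to drop the matching entry from the Resources dictionary. Note that `Do` also paints form XObjects, so a name found this way is not necessarily an image. Hiding a path object (point 4) needs the same kind of content rewrite, but path operators carry no resource name, which makes them harder to single out.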
HTH.
Victor
Tracker Software
Project manager

afalsow
User
Posts: 17
Joined: Thu Jun 14, 2012 1:28 pm

Re: How to enumerate/identify objects? PXCp_llGetObjectByInd

Post by afalsow »

Thank you!
Tracker Supp-Stefan
Site Admin
Posts: 17948
Joined: Mon Jan 12, 2009 8:07 am
Location: London

Re: How to enumerate/identify objects? PXCp_llGetObjectByInd

Post by Tracker Supp-Stefan »

:)