How to enumerate/identify objects? PXCp_llGetObjectByIndex?

Post by **afalsow** » Sun Aug 26, 2012 1:51 pm

I recently evaluated (and rejected, for other reasons) an SDK that made it very easy to enumerate and identify (determine the type of) objects on a PDF page. That SDK lets the developer determine, for each object on the page, its type (text, path, image, etc.), its coordinates, and other attributes. It also lets the developer, again very easily, set an object to visible=false, move it, or delete it entirely.

In searching the Tracker help files, I see that PXCp_llGetObjectsCount apparently lets me retrieve the total object count within a document (is that the entire document, or just a selected page?). I presume one may then iterate through the objects via PXCp_llGetObjectByIndex? Are there existing methods to determine each object's type, position, etc.? Or, with Tracker's SDK, is that something the developer must implement using the low-level API? Perhaps PXCp_ObjectGetDictionary?

For this particular sub-task, I need to do the following:

1. Load a multi-page PDF (each page will contain header text and a footer logo/image). This is done easily enough.

2. Identify and delete the footer logo on every page. The footer logo is an image object, and is the ONLY image object on the page, so once I find an object of type image, I can safely delete it. Would this involve PXCp_ImageGetCountOnPage and PXCp_ImageGetFromPage? I do not see a high-level Image Manipulation function/method to delete the image once I have identified it. How would that be done? LL API?

3. Identify and extract (not delete) the header text. The header text object is located well above any other objects, so any text object within the range X1,Y1 to X2,Y2 may safely be considered the header. I must extract this header text, but only from the first page: all subsequent pages have identical header text, so extracting it again would be redundant. I do not see a method analogous to those in #2, such as "TextGetCountOnPage".

4. On every page, there will be several path objects. The uppermost path object on each page must be set to visible=false. I also have not found any HL function to control an object's visibility. Is this a LL API issue, perhaps?

5. Save the modified document. Easily done.
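
To make the selection rule in step 3 concrete, here is a minimal sketch in Python. It does not call the SDK: `TextElement` is a hypothetical record standing in for whatever position data a text extraction pass returns, and the coordinates are made-up values in PDF user space (origin at bottom-left, y growing upward).

```python
from dataclasses import dataclass


@dataclass
class TextElement:
    # Hypothetical record, not an SDK structure: a run of text plus
    # the lower-left corner and size of its bounding box.
    text: str
    x: float
    y: float
    width: float
    height: float


def header_text(elements, x1, y1, x2, y2):
    """Join the text of all elements whose box lies fully inside the rectangle."""
    left, right = min(x1, x2), max(x1, x2)
    bottom, top = min(y1, y2), max(y1, y2)
    inside = [
        e for e in elements
        if left <= e.x and e.x + e.width <= right
        and bottom <= e.y and e.y + e.height <= top
    ]
    # Reading order: top to bottom (y grows upward), then left to right.
    inside.sort(key=lambda e: (-e.y, e.x))
    return " ".join(e.text for e in inside)


# Illustrative data for the first page only; later pages repeat the header.
page1 = [
    TextElement("Quarterly", 72, 740, 60, 12),
    TextElement("Report", 136, 740, 40, 12),
    TextElement("Body text", 72, 600, 300, 12),
]
header = header_text(page1, 36, 720, 576, 780)
```

Only the first page needs to be passed through this filter, since the header is identical on every page.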

I have read, and understand, Tracker's policy of offering no assistance on low-level API issues. If my project goals require use of the low-level API, I would ask only that you indicate that is where the answer lies (as opposed to the desired functionality being available in the high-level API, where perhaps I have simply failed to find the relevant documentation).

I do see another post, about manipulating annotations, where tech support refers the developer to 3.2.5 PDF Dictionary Functions. Would this section also apply to my situation?

Thank you.

Tue Aug 28, 2012 9:01 am

Hi afalsow.
1. Simple, as you mentioned.
2. You can find the image(s) in the page's Resources dictionary and delete them from there using the Low-Level API, but this is not a good idea: there will still be reference(s) to that image inside the page content, and some readers may report an error on the page. To completely remove an image you should update the page content, but this is not a trivial task and we do not provide any API to modify it (you can, however, extract/replace page content stream(s) using the Low-Level API).
3. There are enough functions to extract text with letter positions and formatting; they begin with PXCp_ET_. Please see the Text Extraction section in the documentation.
4. You cannot easily show/hide path objects (or any other objects) on the PDF page. To do this you must recreate the page content, as in step 2.
5. Same as 1.
HTH.
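
To illustrate the dangling reference mentioned in step 2, here is a rough sketch in Python rather than SDK code. In a PDF content stream, an XObject is painted with the `Do` operator, preceded by the resource name (for example `/Im0 Do`). The sketch assumes the stream has already been extracted and decompressed, that the name `Im0` is the one listed for the footer logo in the page's Resources dictionary, and that the pattern does not occur inside a string or inline image. The helper names are hypothetical, and a regex is not a real content stream parser.

```python
import re

# Matches "/Name Do", the operator that paints an XObject.
# Note: Do paints form XObjects as well as images; only the Resources
# dictionary tells you which names are images.
_PAINT = re.compile(rb"/([^\s/\[\]()<>{}%]+)\s+Do(?=\s|$)")


def painted_xobjects(content: bytes) -> list[str]:
    """Return the resource names painted by Do operators, in order."""
    return [m.group(1).decode("latin-1") for m in _PAINT.finditer(content)]


def drop_paint(content: bytes, name: str) -> bytes:
    """Remove every '/name Do' so the stream no longer references the XObject."""
    pattern = re.compile(
        rb"/" + re.escape(name.encode("latin-1")) + rb"\s+Do(?=\s|$)"
    )
    return pattern.sub(b"", content)


# Made-up page content: a footer image followed by header text.
stream = b"q 100 0 0 40 36 20 cm /Im0 Do Q\nBT /F1 12 Tf (Header) Tj ET\n"
names = painted_xobjects(stream)
cleaned = drop_paint(stream, "Im0")
```

Deleting `Im0` from Resources without also rewriting the stream this way is what leaves the reference that some readers complain about. The surrounding `q ... cm ... Q` is harmless once the `Do` is gone, which is why the sketch leaves it in place.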

Post by **afalsow** » Tue Aug 28, 2012 11:17 am

Thank you!

Tue Aug 28, 2012 11:20 am
