Headaches with PDF-XChange files

This forum is for software developers who need help with Tracker Software's PDF-Tools SDK of library DLL functions (only). Please use the PDF-XChange Drivers API SDK Forum for assistance with all PDF print driver related topics.

Moderators: TrackerSupp-Daniel, Tracker Support, Vasyl-Tracker Dev Team, Sean - Tracker, Chris - Tracker Supp, Tracker Supp-Stefan, Ivan - Tracker Software, Andrew - Tracker Support, Tracker - Clarion Support, John - Tracker Supp, Support Staff, moderators

Thanny
User
Posts: 4
Joined: Thu Nov 05, 2009 2:48 am

Headaches with PDF-XChange files

Post by Thanny » Wed Aug 25, 2010 1:24 am

I'm trying to polish up some code to strip text out of PDF files for automated parsing, and documents from two different versions of PDF-XChange are making things difficult.

First, version 3.40.0077. With a modified file, a secondary cross reference table is written, but startxref points to the first table in the file, which means none of the modified objects are correctly referenced. Beyond that, the offset is skewed by two bytes, as are several of the offsets in the new cross reference table. I found the simplest way to get the "correct" results was to scan the file front to back, replacing objects as duplicates were discovered. This prevents me from detecting whether an object has been marked deleted, however.
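A minimal sketch of the front-to-back scan described above, assuming the whole file fits in memory as bytes. The names `OBJ_RE` and `scan_objects` are hypothetical. It keeps the last offset seen for each object, so later duplicates replace earlier ones. It does not look at free entries in any cross reference table, so deleted objects still go undetected, and binary stream data that happens to contain bytes resembling an object header could produce a false match.

```python
import re

# Matches an indirect object header such as "12 0 obj".
# The lookbehind keeps the match from starting in the middle of a number.
OBJ_RE = re.compile(rb"(?<![0-9])(\d+)\s+(\d+)\s+obj\b")


def scan_objects(data: bytes) -> dict:
    """Map (object number, generation) to the offset of its LAST definition.

    The file is scanned front to back, so a later duplicate of an object
    overwrites the offset recorded for the earlier one.
    """
    offsets = {}
    for match in OBJ_RE.finditer(data):
        key = (int(match.group(1)), int(match.group(2)))
        offsets[key] = match.start()
    return offsets
```

The resulting dictionary can stand in for the broken cross reference tables: look up an object by number and generation, then parse from the recorded offset.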

Second, version 3.60.0106. Here, I have two problems. First, the file header says version 1.4, and there is no overriding "Version" parameter in the catalog dictionary. So it should be a version 1.4 file, right? Yet there's no cross reference table or trailer. There's a cross reference stream, which isn't valid until version 1.5. Next, the "stream" keyword is stuck right at the end of a dictionary. I don't know if this is quite in spec or not, because the PDF reference is very light on particulars like this. But it's sure annoying, as every other PDF I've seen has the "stream" keyword alone on a line, which makes parsing a whole lot easier.
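On the "stream" placement: as far as I can tell from the PDF reference, only what follows the keyword is constrained (CRLF or a lone LF, not a lone CR), and nothing requires whitespace between the closing ">>" and "stream". Below is a hedged sketch that tolerates both layouts, plus a check of what the last startxref actually points at. All names (`stream_data_start`, `xref_kind`, and the regex constants) are hypothetical, and treating any indirect object at that offset as a cross reference stream is an assumption, not a verification of its /Type.

```python
import re

# PDF whitespace characters: NUL, TAB, LF, FF, CR, SPACE.
PDF_WS = rb"[\x00\t\n\x0c\r ]"
STREAM_RE = re.compile(PDF_WS + rb"*stream(\r\n|\n)")
STARTXREF_RE = re.compile(rb"startxref" + PDF_WS + rb"+(\d+)")
INDIRECT_RE = re.compile(rb"\d+" + PDF_WS + rb"+\d+" + PDF_WS + rb"+obj")


def stream_data_start(data: bytes, dict_end: int) -> int:
    """Return the offset of the first stream byte.

    dict_end is the offset just past the dictionary's closing '>>'.
    Works whether 'stream' sits on its own line or directly after '>>'.
    """
    match = STREAM_RE.match(data, dict_end)
    if match is None:
        raise ValueError("no stream keyword after dictionary")
    return match.end()


def xref_kind(data: bytes) -> str:
    """Classify what the last startxref offset points at."""
    found = STARTXREF_RE.findall(data)
    if not found:
        return "missing"
    offset = int(found[-1])
    if data.startswith(b"xref", offset):
        return "table"
    # Assumption: an indirect object here is a cross reference stream.
    if INDIRECT_RE.match(data, offset):
        return "stream"
    return "unknown"
```

A result of "unknown" is the case where the offset is skewed or stale, which is when falling back to a full scan of the file makes sense.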

Maybe I'm missing something, and there's a way to make sense of these examples. Anyone?

John - Tracker Supp
Site Admin
Posts: 8202
Joined: Tue Jun 29, 2004 10:34 am
Location: Vancouver Island - Canada

Re: Headaches with PDF-XChange files

Post by John - Tracker Supp » Wed Aug 25, 2010 7:05 pm

Hi Thanny,

The best place for us to start is with examples of the two files, please, so that we can reply in an informed way. As you will appreciate, it has been several years since Version 3 was current, and since we may also be looking at differences between one build and another, examples are going to be crucial for any comment.

You also say the files have been modified. Copies of the same files before and after modification will be almost essential to see how, and with what tool, the modifications were made.

Please zip any files you send. If posting here raises confidentiality concerns, feel free to email them to support@tracker-software.com with a link back to this forum post.

thanks
If posting files to this forum - you must archive the files to a ZIP, RAR or 7z file or they will not be uploaded - thank you.

Best regards
Tracker Support
http://www.tracker-software.com

Thanny
User
Posts: 4
Joined: Thu Nov 05, 2009 2:48 am

Re: Headaches with PDF-XChange files

Post by Thanny » Wed Aug 25, 2010 7:18 pm

Examples attached.
Attachments
pdfexamples.zip
(26.28 KiB) Downloaded 236 times

John - Tracker Supp
Site Admin
Posts: 8202
Joined: Tue Jun 29, 2004 10:34 am
Location: Vancouver Island - Canada

Re: Headaches with PDF-XChange files

Post by John - Tracker Supp » Wed Aug 25, 2010 9:49 pm

Thank you for the files - however ...

After some detailed analysis, we are 99.9999% sure that a tool other than our products was involved in the editing process. To explain in a little more detail:

File 1 (noxref_14.pdf, 3.6.106): the file was originally created by PDF-XChange as PDF 1.4, but was later modified (possibly by an Adobe product, but certainly not by any of our PDF-XChange products) and saved as a linearized (web optimized) PDF. I cannot say why it was saved using object streams (a 1.5+ feature), but we have never supported linearization as a feature, and you can see for yourself in the file content that it has been added.

File 2 (botched_update.PDF): this is likewise not the original PDF-XChange file, nor was it modified solely by our products. The file contains two PDFs concatenated together. It looks like the first is the original PDF-XChange file and the second is the modified one, and that on saving, the modified file was appended instead of overwriting the original. The file therefore contains the content of both the original and the modified file, which causes the issue you describe. This would not be possible had you used our tools alone.
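One rough way to check for this kind of concatenation is to look for more than one "%PDF-" header in the file. The sketch below is only a heuristic under that assumption, with hypothetical names `HEADER_RE` and `header_offsets`: an embedded file or uncompressed stream could legitimately contain the same bytes, so a second hit is a hint to inspect the file, not proof.

```python
import re

# A PDF header looks like "%PDF-1.4"; capture major and minor version digits.
HEADER_RE = re.compile(rb"%PDF-(\d)\.(\d)")


def header_offsets(data: bytes) -> list:
    """Return the byte offset of every '%PDF-x.y' header found in data."""
    return [match.start() for match in HEADER_RE.finditer(data)]


def looks_concatenated(data: bytes) -> bool:
    """True when more than one header is present (heuristic only)."""
    return len(header_offsets(data)) > 1
```

If a second header is found, slicing the data from that offset gives a candidate standalone file that can be parsed on its own, with offsets relative to the slice.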

If you are able (or prepared) to share more detail on the complete modification process and the tools used, we can comment further. Without knowing the full history of the files from inception to their current state, I am afraid we cannot say how they came to be in their current problematic condition.

Best regards
Tracker Support
http://www.tracker-software.com

Thanny
User
Posts: 4
Joined: Thu Nov 05, 2009 2:48 am

Re: Headaches with PDF-XChange files

Post by Thanny » Wed Aug 25, 2010 9:58 pm

The files come from a client, so I have no idea exactly what process they are using. I don't know if I'll be seeing more of the earlier version, or if they updated completely to 3.60.

Thanks for the confirmation that the files are messed up, though. I'd bring it up with the client, but I'm sure they'd go with the "it works in Acrobat" line.

John - Tracker Supp
Site Admin
Posts: 8202
Joined: Tue Jun 29, 2004 10:34 am
Location: Vancouver Island - Canada

Re: Headaches with PDF-XChange files

Post by John - Tracker Supp » Wed Aug 25, 2010 10:03 pm

Hi Thanny,

When they say it 'works with Acrobat', in what sense do they mean, and what can they not do with our products?

We are often told this, and then try to do the same thing in Acrobat, only to find it does not do quite what was claimed ...

Best regards
Tracker Support
http://www.tracker-software.com
