Non-searchable PDF from web browser

This Forum is for the use of Software Developers requiring help and assistance for Tracker Software's PDF-XChange Printer Drivers SDK (only) - Please use the PDF-Tools SDK Forum for Library DLL assistance.

PerfectLaw
User
Posts: 8
Joined: Wed Mar 28, 2007 7:27 pm

Post by PerfectLaw » Tue May 01, 2007 6:55 pm

Dear Sirs,

When I print (save) a web page with an embedded PDF object, the resulting PDF file is not searchable. If the page is not a PDF object, the resulting PDF file is searchable.

Is there any PDF-XChange driver setting or line of code required to produce a searchable PDF when the source is a PDF embedded in the browser?

Thanks

John - Tracker Supp
Site Admin
Posts: 8202
Joined: Tue Jun 29, 2004 10:34 am
Location: Vancouver Island - Canada
Contact:

Post by John - Tracker Supp » Tue May 01, 2007 7:50 pm

Hi,

Could you please zip and upload a sample?

Also, please tell us the browser and version, and provide an example page link for comparison.

Thanks
If posting files to this forum - you must archive the files to a ZIP, RAR or 7z file or they will not be uploaded - thank you.

Best regards
Tracker Support
http://www.tracker-software.com

PerfectLaw
User
Posts: 8
Joined: Wed Mar 28, 2007 7:27 pm

Post by PerfectLaw » Tue May 01, 2007 9:38 pm

Dear Sirs,

It appears that it's the printer driver and you can reproduce the problem just using Acrobat Reader without any sample code.

I'm attaching PDFXCprint.zip that contains 2 files: beforePDFXCprint.pdf and afterPDFXCprint.pdf.

afterPDFXCprint.pdf was created from Acrobat Reader by printing with the PDF-XChange driver.

Open both documents and perform a search for the word "browser".

A search of beforePDFXCprint.pdf finds 6 hits, while afterPDFXCprint.pdf yields just 1.

The same behavior occurs with any PDF file, no matter if it has a single page or many.

Please, let me know ASAP how to overcome this problem.

Thanks

John - Tracker Supp
Site Admin
Posts: 8202
Joined: Tue Jun 29, 2004 10:34 am
Location: Vancouver Island - Canada
Contact:

Post by John - Tracker Supp » Wed May 02, 2007 6:52 am

Hi,

Your file was not attached: either it was not archived in ZIP, RAR or 7z format, or it exceeded the 2 MB maximum. Please either zip and attach it, or email it to usrfiles@tracker-software.com (still zipped, or our mail server will strip it as a precaution), and please paste a link to this forum thread for reference.

Thanks

PerfectLaw
User
Posts: 8
Joined: Wed Mar 28, 2007 7:27 pm

Post by PerfectLaw » Wed May 02, 2007 9:50 pm

Dear Sirs,

It appears that it's the printer driver, and you can reproduce the problem using just Acrobat Reader and a PDF file, without any sample code.

To reproduce the problem, just perform a search in Acrobat on a PDF file (one not created by the PDF-XChange driver) for a word that occurs more than once.

Now, print the same file using the PDF-XChange printer driver and save the new file.

Still in Acrobat, perform the same search on the file created by PDF-XChange: it finds fewer occurrences of the same word even though the content is the same.
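
The comparison described in the steps above could also be scripted rather than done by hand in Acrobat. The following is only a rough sketch under stated assumptions: it relies on the third-party `pypdf` package being installed, the helper names `count_hits`, `extract_page_texts` and `compare` are made up for illustration, and a text extractor will not necessarily report the same hit counts as Acrobat's own search.

```python
def count_hits(page_texts, word):
    """Count case-insensitive occurrences of `word` across a list of page texts."""
    needle = word.lower()
    return sum(text.lower().count(needle) for text in page_texts)


def extract_page_texts(pdf_path):
    """Return the extracted text of every page in the PDF at `pdf_path`.

    Assumption: the third-party pypdf package is available. It is imported
    here, not at module level, so count_hits() stays usable without it.
    """
    from pypdf import PdfReader

    reader = PdfReader(pdf_path)
    # extract_text() can yield nothing for a page; normalise that to "".
    return [page.extract_text() or "" for page in reader.pages]


def compare(before_path, after_path, word):
    """Return (hits_in_before, hits_in_after) for `word`."""
    before = count_hits(extract_page_texts(before_path), word)
    after = count_hits(extract_page_texts(after_path), word)
    return before, after
```

Calling `compare("original.pdf", "printed.pdf", "browser")` with the source file and the file printed through the driver (both file names are placeholders) returns the two counts side by side; a lower second number would match the behaviour described here.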

Please, let me know ASAP how to overcome this problem.

Thanks

John - Tracker Supp
Site Admin
Posts: 8202
Joined: Tue Jun 29, 2004 10:34 am
Location: Vancouver Island - Canada
Contact:

Post by John - Tracker Supp » Thu May 03, 2007 7:16 am

Thanks for coming back, but I cannot reproduce this here, which is specifically why I asked for example files. Please zip and upload 2 sample files, one that shows the problem and one that does not, and also provide the source file from which both were created.

Also, please provide the versions of the browser, Acrobat, Windows, etc. being used.

As soon as we have this - we will investigate ASAP.

Please note the previously outlined upload restrictions to ensure your files are received.

Thanks.

PerfectLaw
User
Posts: 8
Joined: Wed Mar 28, 2007 7:27 pm

Post by PerfectLaw » Thu May 03, 2007 1:48 pm

Attached is a zip file containing 2 files named before_PDFXC.pdf and after_PDFXC.pdf.

before_PDFXC.pdf was not created by the PDF-XChange printer driver.
after_PDFXC.pdf was created by the PDF-XChange printer driver having before_PDFXC.pdf as the source document.

Open both and you'll see that the content appears identical but if you search for the word `mark` you'll get 25 hits in before_PDFXC.pdf and none in after_PDFXC.pdf.

As I stated before, this has nothing to do with any program code, because I used only Acrobat Reader and the PDF-XChange printer driver to reproduce the problem.

Thanks
Attachments
PDFXC_Test.zip
(269.25 KiB) Downloaded 126 times

John - Tracker Supp
Site Admin
Posts: 8202
Joined: Tue Jun 29, 2004 10:34 am
Location: Vancouver Island - Canada
Contact:

Post by John - Tracker Supp » Fri May 04, 2007 10:17 am

Thanks for the files. We now understand the problem:

When Acrobat prints text to the PDF-XChange printer, it creates temporary fonts and passes each string not as a set of characters but as a set of glyph indices. Because of the way PDF-XChange functions, we normally have no need to convert glyphs back to characters, and in most cases that conversion is not easily achievable, if at all.

So, for now at least, I am afraid we have no direct solution to modify this behaviour.
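
To illustrate why glyph indices alone defeat a text search, here is a toy model in Python. It is not PDF-XChange or Acrobat code: the function names and the numbering scheme for the temporary font are assumptions made purely for illustration. In actual PDF files, the reverse mapping from character codes to Unicode is carried by a font's ToUnicode CMap when one is present.

```python
def build_temp_font(text):
    """Assign each distinct character a glyph index in order of first use.

    Hypothetical stand-in for the temporary font a printing application
    might create; index 0 is left unused here.
    """
    glyph_of = {}
    for ch in text:
        glyph_of.setdefault(ch, len(glyph_of) + 1)
    return glyph_of


def encode(text, glyph_of):
    """Turn a string into the list of glyph indices that would be printed."""
    return [glyph_of[ch] for ch in text]


def search(glyph_run, word, to_unicode=None):
    """Decode a glyph run to text, then report whether `word` occurs in it."""
    if to_unicode is None:
        # No reverse map: the indices are read as if they were character codes.
        decoded = "".join(chr(g) for g in glyph_run)
    else:
        # With a reverse map, each glyph index is translated back to its character.
        decoded = "".join(to_unicode.get(g, "\ufffd") for g in glyph_run)
    return word in decoded


font = build_temp_font("web browser")
run = encode("web browser", font)

# The page would still render correctly, but the search has nothing to match.
found_without_map = search(run, "browser")

# Inverting the font table restores searchability in this toy model.
reverse_map = {glyph: ch for ch, glyph in font.items()}
found_with_map = search(run, "browser", reverse_map)
```

In this sketch `found_without_map` is `False` and `found_with_map` is `True`. The hard part in practice, as the reply notes, is that the driver receives only the glyph indices and has no simple way to rebuild the reverse map.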

I am sure you have a very good reason for printing the embedded PDF pages, but can I ask why they are not simply saved from the URL link rather than printed? If you could explain a little more about the task at hand, we could possibly suggest an alternative means of achieving your goal.
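
As a minimal sketch of the "save rather than print" idea, the original PDF bytes can be fetched from the link and written to disk unchanged, which keeps whatever text layer the source file already has. This assumes the PDF is reachable at a direct URL without login or session cookies; the function name `save_pdf` is hypothetical.

```python
import shutil
import urllib.request


def save_pdf(url, dest_path):
    """Copy the bytes found at `url` to `dest_path` unchanged.

    Nothing is re-rendered, so the saved file stays exactly as searchable
    as the source document. Returns `dest_path` for convenience.
    """
    with urllib.request.urlopen(url) as response, open(dest_path, "wb") as out:
        shutil.copyfileobj(response, out)
    return dest_path
```

A call would look like `save_pdf(link_to_pdf, "saved.pdf")`, where `link_to_pdf` is the address of the embedded document. Pages that need authentication or build the PDF dynamically would require more than this.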

PerfectLaw
User
Posts: 8
Joined: Wed Mar 28, 2007 7:27 pm

Post by PerfectLaw » Mon May 07, 2007 6:56 pm

Dear Sirs,

Thanks for your help.

There are several reasons to use the PDF-XChange driver, but if it cannot be done with PDF files then I will write a component to process the file without PDF-XChange involved.

Again, thanks

John - Tracker Supp
Site Admin
Posts: 8202
Joined: Tue Jun 29, 2004 10:34 am
Location: Vancouver Island - Canada
Contact:

Post by John - Tracker Supp » Mon May 07, 2007 7:55 pm

I only wish we could have offered a more positive answer in this instance, but thanks for your pragmatism - appreciated! :)