PDF-XChange - Tracker PDF Viewer - TIFF-XChange - Image-XChange - XMF-XChange - Raster-XChange - Support

 
Timur Born
User
Topic Author
Posts: 341
Joined: Tue Jun 26, 2012 1:50 pm

Extracting images out of an "interactive" PDF

Mon Jan 30, 2017 3:28 pm

Hello.

I am trying to extract (single) images out of an interactive PDF. Unfortunately, both PDF-Tools and Acrobat extract only the non-interactive images. "Interactive" means that there are two buttons: one toggles the overlay text, the other toggles an overlay grid.

These images cannot be extracted or copied out of the PDF by themselves (not even in Acrobat DC), because they are some kind of special element that changes the mouse pointer to a hand with a pointing index finger. Exporting the PDF to Word, PowerPoint or RTF (using Acrobat instead of XChange) works, but the image resolution is then lowered to fit the original page size.

What I'd like to do is use PDF-Tools to extract *all* images out of the PDF, including these "interactive" ones. "Changing content" and "Page Extraction" are not allowed by security settings on the document, but "Content Copying" and "Content Copying for Accessibility" are allowed, as is "Printing: High Resolution".

I cannot post the original PDF here, because it is copyright protected and licensed to me. Maybe I can send it via e-mail for you to check on the content.

Thanks and regards!
 
Will - Tracker Supp
Site Admin
Posts: 5881
Joined: Mon Oct 15, 2012 9:21 pm
Location: Chemainus, BC

Re: Extracting images out of an "interactive" PDF

Mon Jan 30, 2017 3:37 pm

Hi Timur,

Thanks for the post - We would really need a sample document to say more, as it's not immediately clear what type of content they would be and how they work. Please send it to support@tracker-software.com with a link back to this topic, if possible.

Note: Documents are not shared with 3rd parties, only within the company and only where/when necessary.

Cheers,
If posting files to this forum, you must archive the files to a ZIP, RAR or 7z file or they will not be uploaded.
Thank you.

Best regards

Will Travaglini
Tracker Support
http://www.tracker-software.com
 
Timur Born
User
Topic Author
Posts: 341
Joined: Tue Jun 26, 2012 1:50 pm

Re: Extracting images out of an "interactive" PDF

Mon Jan 30, 2017 3:57 pm

E-Mail sent, thanks for the quick reply.

PS: Editor really needs a means to extract/copy single images, preferably with the ability to copy/extract directly from the content browser. Extracting all images out of a 700-page document via PDF Tools just isn't a viable option.
 
Will - Tracker Supp
Site Admin
Posts: 5881
Joined: Mon Oct 15, 2012 9:21 pm
Location: Chemainus, BC

Re: Extracting images out of an "interactive" PDF

Mon Jan 30, 2017 4:02 pm

Hi Timur,

Thanks for that - I've taken a look at the file and it's as I suspected: these are not images containing interactive elements (I'm not even sure that's possible), but are actually form fields, and so cannot be exported to an image. That is simply not possible, as no image format that I'm aware of supports this type of data. I'm afraid that there's nothing we can do to help here.

"PS: Editor really needs a means to extract/copy single images, preferably with the ability to copy/extract directly from the content browser. Extracting all images out of a 700-page document via PDF Tools just isn't a viable option."

I completely agree with that, and I believe that we may have spoken about it in a previous thread(?). I've passed this along as a request but, as always, cannot guarantee implementation. In this instance, I don't necessarily foresee any barriers.

Cheers,
 
Timur Born
User
Topic Author
Posts: 341
Joined: Tue Jun 26, 2012 1:50 pm

Re: Extracting images out of an "interactive" PDF

Mon Jan 30, 2017 4:21 pm

But they can be exported as single image files in a Word or Powerpoint document, so why would they not be extractable? I'd even take the export route, but XChange and Acrobat both insert these images at lower resolution/quality than their originals.

I also read and wrote the file via ABBYY FineReader, which seems to ignore the locked/password-protected state of the file once you write it. This enabled me to edit all content in Editor, except for those image forms. They don't even show up in the content browser?! I deleted everything listed as content and the images and buttons still remained.
 
Will - Tracker Supp
Site Admin
Posts: 5881
Joined: Mon Oct 15, 2012 9:21 pm
Location: Chemainus, BC

Re: Extracting images out of an "interactive" PDF

Mon Jan 30, 2017 4:36 pm

Hi Timur,

Can you email a sample Word document where this has worked, but the images are of lower quality?

It's very difficult to say what these fields are actually doing, because I can't view the properties of the fields due to the security on the document. They don't appear to be toggling image layers, as I initially thought, so there aren't multiple images for us to extract and those that are extracted are the only images present in the document, insofar as we're able to see.

If you export the pages to an image, you should be presented with exactly what is displayed on screen, but it would also export the entire page, so you would need to edit the resulting image files to remove that and you still would not have the interactive elements.

Cheers,
 
Timur Born
User
Topic Author
Posts: 341
Joined: Tue Jun 26, 2012 1:50 pm

Re: Extracting images out of an "interactive" PDF

Mon Jan 30, 2017 5:19 pm

I will prepare something usable. The image quality seems to depend on how the PDF was converted (XChange vs. Acrobat vs. others). While LibreOffice doesn't pop up a password box like Word does, it seems that the converted file is still protected.

I can click on the images and save them as image files, but the quality/size is lower than the original, at least for those extractable images from page 1. When I try to copy/paste the images out of Word to Paint.net or Photoshop I only get a low resolution (72 dpi) version. Copy/Paste within Word works at the resolution that is used within the Word document, same as saving the image.

In any case, all the images are present in the Word/PowerPoint/RTF file (depending on the converter). With the Word file converted by PDF Tools, you first need to delete some other image(s) on top to get to the map images underneath.

Just to clarify: I don't need the Map Tags (text) overlay, although the Grid overlay (for putting on top of the map) would be nice. The map itself is the most important part.
 
Timur Born
User
Topic Author
Posts: 341
Joined: Tue Jun 26, 2012 1:50 pm

Re: Extracting images out of an "interactive" PDF

Mon Jan 30, 2017 5:55 pm

PDF Tools interprets the images somewhat differently compared to other tools.

Example:

The map image on the title page is a full size (1:1) image that can be extracted via PDF Tools. Converting the image to Word via PDF Tools puts a highlight gradation on the image, going from left (no effect) to right (blended to foggy white). The same image is without such an effect when the PDF is converted via PDF Grabber. On the other hand the image in the PDF Grabber Word file is the smallest, then comes the PDF Tools converted Word file, but the extracted (PDF Tools) original image is larger than both.
 
Timur Born
User
Topic Author
Posts: 341
Joined: Tue Jun 26, 2012 1:50 pm

Re: Extracting images out of an "interactive" PDF

Mon Jan 30, 2017 6:23 pm

And to make things more complicated: The "person" on the front page is higher (full) resolution in the PDF Grabber Word file (still cut off at the bottom) while the PDF Tools version is lower resolution (DPI). This seems quite strange, because the map part on the title page is higher resolution in the PDF Tools Word file.

The size (width) of said map image in the PDF Tools version seems to be dictated by the letter page size (width) that the original PDF file is using.

Anyway, these are intricacies of the various conversion processes. The important part is that all of these conversion algorithms (including PDF Tools) are able to extract the images out of the form fields. So the image data is in there, and it should be quite possible to extract it using methods that are already present in PDF Tools' code. ;)
 
Will - Tracker Supp
Site Admin
Posts: 5881
Joined: Mon Oct 15, 2012 9:21 pm
Location: Chemainus, BC

Re: Extracting images out of an "interactive" PDF

Tue Jan 31, 2017 10:29 am

Hi Timur,

I apologize if I'm being dense, but I'm struggling to understand what you need here. Are you just looking for the maps to be exported to images? Or is the issue the quality of the images that are exported? Can you provide a brief and clear, point by point summary of what you're looking for?

e.g.
- Better image quality.
- Map images exported.

Then, could you also provide a clear point-by-point breakdown of what you are actually seeing, similar to the above example?

Thanks,
 
Timur Born
User
Topic Author
Posts: 341
Joined: Tue Jun 26, 2012 1:50 pm

Re: Extracting images out of an "interactive" PDF

Tue Jan 31, 2017 10:36 am

No problem. ;)

I need the maps on page 2/3 to be extracted at original (pixel) size, no downsampling, cropping or effects applied. Just the "original" files.

Since there is no such function available I tried the workaround of exporting the whole file into another format. I tried Word and Powerpoint via XChange and other software and additionally RTF via Acrobat DC. All these exported files resample the images in order to fit them to the original letter sized page, including the maps.

Furthermore, judging from how XChange transforms the images on the front-page - the ones that can also be extracted via PDF Tools - it becomes clear that XChange also applies other effects directly to the exported images (like the gradation on the map image part of the front-page).

It's a shame that end-users have to go through such ordeals after paying Paizo (the ones who create and sell these files) good money to get these maps. The whole "interactive" part is a gimmick that solves one little problem at the cost of producing new ones. They should just have created a normal PDF with one map option per page that could easily be extracted. I was hoping to use XChange to work around these restrictions created by Paizo.
 
Timur Born
User
Topic Author
Posts: 341
Joined: Tue Jun 26, 2012 1:50 pm

Re: Extracting images out of an "interactive" PDF

Tue Jan 31, 2017 1:39 pm

In Soda PDF the frames/maps are identified as "Widgets". I was not able to extract the content yet, but the widget parameters can be edited (like what happens on a mouse-click, border and the like). I can also move them around.

Editor doesn't show any properties for these at all.
 
Will - Tracker Supp
Site Admin
Posts: 5881
Joined: Mon Oct 15, 2012 9:21 pm
Location: Chemainus, BC

Re: Extracting images out of an "interactive" PDF

Tue Jan 31, 2017 1:51 pm

Hi Timur,

Widgets are individual instances of form fields, so Soda PDF is identifying these correctly. I believe that we don't show the properties because of the security, which is also why we don't allow them to be moved. It seems as though Soda PDF isn't respecting the security on the document.

As for the issues with the image quality, I see something similar here, although I'm using a pre-release of the next build and don't seem to get all of the images in the document when using the Extract Images feature of PDF-Tools, but do when converting to Word. I'll pass this along to the Dev. Team to look at.

Cheers,
 
Timur Born
User
Topic Author
Posts: 341
Joined: Tue Jun 26, 2012 1:50 pm

Re: Extracting images out of an "interactive" PDF

Tue Jan 31, 2017 2:50 pm

All that security stuff seems a bit mixed up. Soda doesn't allow extracting images because of the security settings, while XChange does allow it. One thing that Soda does better than PDF Tools / Editor is that it puts *all* images into a converted Word/RTF file, not just one.

So in the Soda Word file I get:

0. Background image underneath all other images.
1. Image of the markings.
2. Image of the shadow beneath the marking.
3. Image of the map without grid.
4. Image of the map with grid.

All of these are placed on top of each other and can individually be accessed in the exported Word/RTF file.

In the PDF Tools Word file I get:

0. Background image on top of all other images.
1. Single text boxes of the markings (one box per marking).
2. Single "form" box for every single shadow beneath every single marking letter.
3. Image of the map with grid.
4. There is no version of the map image without grid.

I compared the large map on the last page and noticed that both versions are downsampled to the same resolution. Differences in the downsampling filter make the Soda version look a bit more contrasty in parts (at the cost of other stuff). Both versions can be saved out of Word at the same resolution and pixel dimensions. But strangely, when the XChange version is copied and pasted into an image program, its detail resolution decreases to half the resolution in Word, even though the image dimensions stay the same.

What Soda does worse than PDF Tools is that it turns most (but not all) of the title page into a single image. Soda also lowers the image resolution, just like PDF Tools does. This is evident in the (R) and TM text at the right side of the title text.

One more thing I noticed: There is a distinct difference in how PDF Tools / Editor interpret colors in these extracted/exported images compared to most other software. XChange has more of a green tint, other software a more red/warmer tint. This is something I noticed with XChange Standard and Editor as well, so it is better left for another thread.
 
Tracker Supp-Stefan
Site Admin
Posts: 12027
Joined: Mon Jan 12, 2009 8:07 am
Location: London

Re: Extracting images out of an "interactive" PDF

Thu Feb 02, 2017 4:47 pm

Hello Timur,

The images of the map without grid are "backgrounds" for some form objects that are shown or hidden when you click the relevant buttons in that form. Unfortunately we do not currently extract those as images. PDF Tools will extract as an image only what is an actual base content image object, not those form elements.
I cannot comment on why the author of the file decided to put the images in such interactive form elements - that is up to them to answer, but the truth is that it is not easy to extract the images.
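
To illustrate where such images sit, here is a minimal sketch in Python. It assumes the usual PDF layout in which a widget annotation carries an appearance dictionary (/AP) whose streams are form XObjects that can reference image XObjects in their resources. The page is modelled with plain dictionaries, and the field names "MapNoGrid" and "MapGrid" and the image sizes are hypothetical. It is not how PDF Tools or any of the mentioned products work internally.

```python
def _collect_images(xobj, owner, found, seen):
    """Recursively gather image XObjects referenced by a form XObject."""
    if id(xobj) in seen:          # guard against circular references
        return
    seen.add(id(xobj))
    children = xobj.get("/Resources", {}).get("/XObject", {})
    for name, child in children.items():
        if child.get("/Subtype") == "/Image":
            found.append((owner, name, child["/Width"], child["/Height"]))
        elif child.get("/Subtype") == "/Form":
            _collect_images(child, owner, found, seen)


def find_widget_images(page):
    """Return (field name, image name, width, height) for widget appearances."""
    found = []
    for annot in page.get("/Annots", []):
        if annot.get("/Subtype") != "/Widget":
            continue              # links, comments etc. are not form widgets
        ap = annot.get("/AP", {})
        # /N (normal), /R (rollover) and /D (down) each hold either a single
        # appearance stream or a sub-dictionary of named appearance states.
        for key in ("/N", "/R", "/D"):
            entry = ap.get(key)
            if entry is None:
                continue
            streams = [entry] if "/Subtype" in entry else list(entry.values())
            for stream in streams:
                _collect_images(stream, annot.get("/T", "?"), found, set())
    return found


# Hypothetical page: one link, one plain button, one button with two states.
map_plain = {"/Subtype": "/Image", "/Width": 2550, "/Height": 3300}
map_grid = {"/Subtype": "/Image", "/Width": 2550, "/Height": 3300}
page = {
    "/Annots": [
        {"/Subtype": "/Link"},
        {"/Subtype": "/Widget", "/T": "MapNoGrid",
         "/AP": {"/N": {"/Subtype": "/Form",
                        "/Resources": {"/XObject": {"/Im0": map_plain}}}}},
        {"/Subtype": "/Widget", "/T": "MapGrid",
         "/AP": {"/N": {"/On": {"/Subtype": "/Form",
                                "/Resources": {"/XObject": {"/Im1": map_grid}}},
                        "/Off": {"/Subtype": "/Form"}}}},
    ]
}

for row in find_widget_images(page):
    print(row)
```

In a real file the objects are indirect references and the image data is a compressed stream, so a PDF library would be needed to resolve and decode them. The sketch only shows why a tool that scans page content alone would miss these images.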

Please note that the original image format might be lost - so it's not possible to extract them "as they are" without any modifications - unless you are e.g. extracting to BMP files - but then the resulting images will be huge.

Regards,
Stefan
 
Timur Born
User
Topic Author
Posts: 341
Joined: Tue Jun 26, 2012 1:50 pm

Re: Extracting images out of an "interactive" PDF

Thu Feb 02, 2017 5:41 pm

I would be fine with "huge" images as long as the original resolution is retained without resampling. The problem with using "Export" for this is that it resamples the images instead of retaining their original resolution (and then just changing magnification in Word to adjust their size). That being said, I would also use export without resampling as an alternative to image extraction, just anything that gives me full resolution images.

In the end I am going to print these maps at 1 grid box being 1 inch on paper, so they will end up being even larger when the PDF original is smaller than that. Usually I use Photoshop's special enlargement algorithm to prevent too much pixelation and blur, which is fine for table-top gaming material.
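
The print target described above (one grid square per inch) comes down to simple arithmetic. A rough sketch in Python follows. The pixel counts, the 300 dpi print resolution, the map size and the printable area are all hypothetical example values, not measurements from the actual file.

```python
import math


def upscale_factor(pixels_per_square, print_dpi):
    """Enlargement needed so that one grid square prints as one inch."""
    # At 1 square = 1 inch, each square must span print_dpi pixels.
    return print_dpi / pixels_per_square


def pages_needed(squares_wide, squares_high, printable_w_in, printable_h_in):
    """Number of sheets when the map is tiled at 1 square = 1 inch."""
    cols = math.ceil(squares_wide / printable_w_in)
    rows = math.ceil(squares_high / printable_h_in)
    return cols * rows


# Hypothetical map: 75 source pixels per square, 30 x 40 squares,
# printed at 300 dpi on letter paper with an 8 x 10.5 inch printable area.
print(upscale_factor(75, 300))          # enlargement factor
print(pages_needed(30, 40, 8.0, 10.5))  # sheets of paper
```

The factor shows how much any downsampling during export hurts: every pixel lost in the source has to be invented again by the enlargement step.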

No idea why Paizo doesn't provide full sized maps (which would print on several dozen pages for those very large maps). They take money for these files, so I would have expected more of an incentive to provide better quality images. But at least I want as good a source file as possible before I blow it up in size via Photoshop.
 
Timur Born
User
Topic Author
Posts: 341
Joined: Tue Jun 26, 2012 1:50 pm

Re: Extracting images out of an "interactive" PDF

Fri Feb 03, 2017 5:26 pm

Just to mention this: The overview chart that lists all features of your various software packages shows "Convert PDF to .DOC/.RTF (no OCR capability)" as "-" in every column.
 
Timur Born
User
Topic Author
Posts: 341
Joined: Tue Jun 26, 2012 1:50 pm

Re: Extracting images out of an "interactive" PDF

Mon Mar 13, 2017 4:44 pm

I found two web-sites that can extract all images at full resolution out of the PDF forms, including both the version with and without grid. Of course I would prefer to use PDF Tools (or even Editor) to do so, especially because the whole process is slower via uploading to web-sites.
 
Timur Born
User
Topic Author
Posts: 341
Joined: Tue Jun 26, 2012 1:50 pm

Re: Extracting images out of an "interactive" PDF

Mon Mar 13, 2017 5:35 pm

Just to mention it: One of these web-sites also offers conversion to Word and other formats. It works quite well overall and images are embedded at full resolution into the Word file. They are then downsized with Word's own percentage option to 64% to fit the page, but can be restored to 100% or saved as an image file without loss.
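
For anyone who wants to pull the embedded images out without going through Word: a DOCX file is a ZIP archive, and embedded pictures are normally stored under word/media/. A minimal sketch in Python using only the standard library is below. The function name and the demo archive are made up for illustration, and the demo bytes are a placeholder, not a real image.

```python
import io
import zipfile


def read_docx_media(docx):
    """Return {file name: raw bytes} for everything stored under word/media/.

    `docx` may be a path or a binary file-like object. The bytes are returned
    exactly as stored, so no resampling or recompression takes place.
    """
    media = {}
    with zipfile.ZipFile(docx) as archive:
        for info in archive.infolist():
            if info.filename.startswith("word/media/") and not info.is_dir():
                name = info.filename.rsplit("/", 1)[-1]
                media[name] = archive.read(info)
    return media


# Demo with an in-memory stand-in for a converted Word file.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as demo:
    demo.writestr("word/document.xml", "<w:document/>")
    demo.writestr("word/media/image1.png", b"placeholder image bytes")
buffer.seek(0)

images = read_docx_media(buffer)
print(sorted(images))
```

Writing each entry of `images` to disk gives the files as the converter embedded them, which also sidesteps the reduced resolution seen when copying and pasting out of Word.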
 
Timur Born
User
Topic Author
Posts: 341
Joined: Tue Jun 26, 2012 1:50 pm

Re: Extracting images out of an "interactive" PDF

Wed Mar 15, 2017 9:01 am

Since you asked for the web-site in the other (true type) thread, I suppose I can post these here, too.

http://www.extractpdf.com/

This is the one that also extracts true-type fonts. It extracts some of the original image files from the forms switches at original resolution. It only gets the images with the grid on top, though.

On the other hand it is more successful at extracting various non-forms images from the front-page of my document, both compared to PDF Tools and compared to the other web-site. These extra images are white halos/filled outlines that are layered below the other images.

http://www.pdfaid.com/ExtractImages.aspx

Here is another one that extracts these image files successfully. The main difference is that you choose an image format for saving, while the other one seems to extract the original image format. It creates various duplicates of the forms-based images, but it also manages to extract both the grid and non-grid version of one of the images.

It extracts the same (fewer) images from the non-forms front-page as PDF Tools, just without the transparencies that Tools can do now.
 
Tracker Supp-Stefan
Site Admin
Posts: 12027
Joined: Mon Jan 12, 2009 8:07 am
Location: London

Re: Extracting images out of an "interactive" PDF

Wed Mar 15, 2017 4:52 pm

Hi Timur,

I have a strong suspicion that this site simply rasterizes the content of the file - and then generates images from it.
You can do the same with the Editor: File -> Export to Images...
Select a high enough resolution and the original images' pixel widths and heights will be preserved. You can then manipulate those further in PS to achieve your desired result.
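
As a rough aid for picking that resolution, the sketch below (Python) computes the export DPI at which a rasterized page has as many pixels across an image as the image itself contains. It assumes you know the image's pixel width and the width at which it is placed on the page in points (72 points = 1 inch). The numbers are hypothetical, chosen for a letter-sized page.

```python
def native_export_dpi(image_px_width, placed_width_pt):
    """DPI at which the rendered image spans exactly its own pixel width."""
    placed_width_in = placed_width_pt / 72.0   # 72 pt = 1 inch
    return image_px_width / placed_width_in


def exported_page_pixels(page_w_pt, page_h_pt, dpi):
    """Pixel size of the whole page when rasterized at the given DPI."""
    return round(page_w_pt * dpi / 72), round(page_h_pt * dpi / 72)


# Hypothetical: a 2550 px wide map placed across the full 612 pt page width.
dpi = native_export_dpi(2550, 612)
print(dpi)
print(exported_page_pixels(612, 792, dpi))
```

Even at the matching DPI the renderer may still resample if the image is rotated or does not line up with the pixel grid, so this gives a starting value and is no guarantee of a pixel-identical result.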

Regards,
Stefan
 
Timur Born
User
Topic Author
Posts: 341
Joined: Tue Jun 26, 2012 1:50 pm

Re: Extracting images out of an "interactive" PDF

Wed Mar 15, 2017 6:36 pm

I don't agree. The images of the front-page are definitely extracted, and the forms-embedded images also very much seem to be extracted rather than cut out.

Using the Export to Images function uses a fixed DPI setting that results in specific resolutions that are upsampled from the original bitmap resolution. This is a problem, because I want to use more sophisticated upsampling later on to bring the maps to 1:1 size (1 grid square = 1 inch).

There is also a small issue: the image on the last page that defaults to a version with a grid on it is not exported as the grid version but as the grid-less version. Only once I switch the image back and forth via the forms buttons does Editor export the correct grid version.

These PDF files are awkward to work with, I blame their producers for that. But having PDF Tools extract all images, including the ones embedded in the forms would very much improve the experience.

That this is possible is proven by your very own Word export function, which puts all these images (with and without grid) inside the resulting DOCX file. Unfortunately it also downsamples the files to lower resolution during export. The PDFaid web-site's Word converter not only does quite a good job overall (tested with some complex PDFs), but also embeds the forms images at high resolution (zoomed to smaller size within Word).
