Optimizing PDF bigger than 150 Pages takes hours

Posted: Fri Aug 09, 2019 3:04 pm
by pesce
Hello!

I need to optimize large PDF Files, i.e. use "Save as optimized PDF".

By large I mean both in size and number of pages. Sample PDF documents can be found here
The PDF files are produced by a Java Swing application. They contain many complex transparent objects which can be optimized.

PDF-XChange Pro is able to optimize these files and compress them by a factor of about 10:1 :D However, it takes hours (!) to complete :(

Results when the document contains:
  • 100 pages only: Duration 2 mins
  • 150 pages only: Duration 5 mins
  • 250 pages only: Duration 20 mins
  • full ca. 1000 pages: ca 6 hours
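To make the growth easier to see, the durations above can be converted into an average time per page. This is only a minimal Python sketch; the variable names are illustrative, and "ca. 6 hours" is taken as exactly 360 minutes:

```python
# (pages, total duration in minutes) as reported in the list above;
# "ca. 6 hours" for ca. 1000 pages is approximated as 360 minutes.
measurements = [(100, 2), (150, 5), (250, 20), (1000, 360)]

# Average seconds per page for each run.
seconds_per_page = [round(minutes * 60 / pages, 1) for pages, minutes in measurements]

for (pages, _), spp in zip(measurements, seconds_per_page):
    print(f"{pages:>5} pages: {spp:.1f} s/page on average")
```

If the cost per page were constant, all four values would be about the same; instead the average rises with the page count.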
Windows version: Microsoft Windows 10 Pro, Version 10.0.16299 Build 16299.
PDF-XChange Editor Version: Version 8 Build 331.0
Hardware: LENOVO_MT_20HG_BU_Think_FM_ThinkPad T470s, i7-7600U CPU @ 2.80GHz, Physical Memory (RAM): 20.0 GB

Optimizer settings: see attachment. Note that the last setting, "Find and remove content outside of the Crop Box", is especially important; only this setting leads to small PDFs.
Optimize_Settings.PNG
Many thanks for your help

Re: Optimizing PDF bigger than 150 Pages takes hours

Posted: Tue Aug 13, 2019 4:55 pm
by TrackerSupp-Daniel
Hello Pesce,

Thank you for the report. Optimizing is a very in-depth process, and since the odds are slim that every page is identical, a simple "X pages takes Y time" comparison doesn't really work in this case.

The content that optimization needs to sift through changes from page to page, and while it might not seem like much, this includes tiny details that even the human eye cannot pick up. Take this screenshot for example.
image.png
On page 3, there are arrows created as images, which is no big issue. There are, however, also a few dozen images that are "clipped" to 1x1 and completely invisible to the naked eye. Each of these will negatively impact the optimization time, and a cursory glance through the rest of the document shows that many pages are set up like page 3. Conversely, page 2 is very clean, with no images whatsoever and minimal content to go through, which will result in more efficient optimizing.
image.png
In practice, if the issue is that optimizing locks up the Editor and you are unable to use it during this processing time, I would advise using PDF-Tools for the process instead, as it offers all the same functions while also being able to run in the background. It may not be ideal, but it is expected for a document of several hundred pages to take tens of minutes, up to multiple hours, to optimize, depending on the content on each page.

We on the support team are running some additional tests with your documents to see if we can find something that would provide any improvement.

Kind regards,

Re: Optimizing PDF bigger than 150 Pages takes hours

Posted: Tue Aug 13, 2019 6:17 pm
by TrackerSupp-Daniel
Hello Pesce,

Thank you again for the report. These tests have brought to light an issue we hadn't caught before. It appears that each consecutive page processed by "Save as optimized" with your settings takes slightly longer than the one before. For example, pages 1-10 take under a second, pages 40-50 take around 1 second each, pages 100+ take multiple seconds each, and so on. I have reported this to the Development team and created a high-priority ticket to rectify this issue. I cannot say when it will be resolved, but we are currently working on it.
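As a rough illustration of why this matters: if the time per page grows with the page index, the total time grows much faster than the page count. The following Python sketch assumes, purely for illustration, that page i costs base + growth * i seconds. The function name and both parameters are hypothetical and are not measurements from the Editor.

```python
def total_seconds(pages: int, base: float, growth: float) -> float:
    """Total time if page i (1-based) costs base + growth * i seconds.

    This linear per-page model is an assumption for illustration only.
    """
    return sum(base + growth * i for i in range(1, pages + 1))


# Hypothetical parameters: 0.5 s fixed cost, plus 0.02 s more for each further page.
for pages in (100, 200, 400):
    print(pages, "pages ->", round(total_seconds(pages, 0.5, 0.02) / 60, 1), "min")
```

Under this assumed model, doubling the page count more than triples the total time, and the factor approaches four as the growth term dominates.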

For reference, you can ask any member of our support team for the following ticket number, and we will provide an update if available:
RT#4872: Optimization Takes longer per page processed

I am looking for a workaround you can use to speed up the process in the meantime, but have so far been unsuccessful. I will be sure to come back as soon as I have found anything that may be of use.

Re: Optimizing PDF bigger than 150 Pages takes hours

Posted: Tue Aug 20, 2019 12:13 pm
by pesce
Hi @TrackerSupp-Daniel,

Many thanks for your analysis and especially for opening a ticket.

I am looking forward to hearing from you.

Kind regards,
Pesce

Re: Optimizing PDF bigger than 150 Pages takes hours

Posted: Tue Aug 20, 2019 3:08 pm
by Will - Tracker Supp
:)

Re: Optimizing PDF bigger than 150 Pages takes hours

Posted: Thu Aug 22, 2019 10:49 pm
by Timur Born
It would also be nice if XChange could use multithreading for its optimization process. Maybe something like one thread per page redacting or so.

Re: Optimizing PDF bigger than 150 Pages takes hours

Posted: Thu Aug 22, 2019 11:56 pm
by TrackerSupp-Daniel
Hello Timur,

Optimizing should now make use of multiple cores. At least in the test build for 332 it does, as you can see from the spike that occurs when I begin optimizing an 800-page document:
image.png
With that said, unfortunately the issue reported earlier in this thread is not yet resolved, so the overall process is still a long one.

Kind regards,

Re: Optimizing PDF bigger than 150 Pages takes hours

Posted: Fri Aug 23, 2019 7:15 am
by Timur Born
Not seeing any meaningful multi-threading here. The bottleneck is one single thread, the others are quick to come and go with little real extra load happening.

Re: Optimizing PDF bigger than 150 Pages takes hours

Posted: Wed Nov 20, 2019 6:08 pm
by chaspi
Is there anything new on this ticket?

I run into the same issue with documents that have a lot of pages. I can confirm that the process gets nearly exponentially slower the more pages are processed, but it also depends on the preprocessing. For example, when a 200 MB/1000-page file is first optimized with Acrobat Pro down to 20 MB/1000 pages, XChange Editor can process it within minutes, while optimizing the 200 MB/1000-page file directly takes hours (I stopped the process after 3 h).

Re: Optimizing PDF bigger than 150 Pages takes hours

Posted: Wed Nov 20, 2019 6:34 pm
by Paul - Tracker Supp
Hi chaspi,

I have asked the team for a status update on this item and will let you know what comes back.

hth

Re: Optimizing PDF bigger than 150 Pages takes hours

Posted: Tue Dec 10, 2019 3:13 pm
by pesce
It would be really great if you could give us an update on that! :D

Many thanks,
Pascal

Re: Optimizing PDF bigger than 150 Pages takes hours

Posted: Tue Dec 10, 2019 6:55 pm
by TrackerSupp-Daniel
Hello Pesce,

Thank you for reaching out to us. We made some changes in build 334 that aimed to address this. I am just running a test on your 996-page document, and it has completed processing in the time it has taken me to write this (around 1 minute) and is now just finishing up the save process.

Can you please update the software and see if you find the same?
[Edit: I forgot to change my settings while testing this just now. Going through again with your settings enabled, it definitely seems to be faster than it was before (~15% after a minute), but it is still a long process. I would still suggest updating to take advantage of this speed improvement, but know that we are still working on improvements in this area.]

Kind regards,

Re: Optimizing PDF bigger than 150 Pages takes hours

Posted: Mon Jan 06, 2020 4:06 pm
by pesce
That's good news indeed :)

I've tested it
  • Version 8.0.335.0
And can give you the following feedback
  • Optimizing with the settings above took around 1 h, plus ca. 3 mins for saving the optimized PDF file
  • The filesize got reduced to 33 MB
  • All the bookmarks are working, visual quality is as good as in the non-optimized version
So, it goes in the right direction :)

Re: Optimizing PDF bigger than 150 Pages takes hours

Posted: Mon Jan 06, 2020 4:43 pm
by Tracker Supp-Stefan
Hello Pesce,

I just ran a test myself with your settings from the original post.

I was at about 800 pages after 8 minutes of optimization, and while it did slow down a bit after that, it was still doing the "redacting" operation at a reasonable speed. The first 500 pages were done in about 4 and a half minutes, and at around the 1000th page it was taking a bit over a minute per 100 pages (I was at 955 pages at the 10th minute, and at 1000 at the 11th minute). A more noticeable slowdown was visible after the 1000th page, with the last page (1158) done at around the 18 minute 45 second mark for me.
Then the rest of the optimization process after "Redacting" completed in under a minute!
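The slowdown is easier to see as throughput per interval. A minimal Python sketch using the checkpoints mentioned above; the times are only approximate readings ("about 4 and a half minutes", "the 10th minute"), so the resulting rates are rough:

```python
# Approximate (pages done, elapsed minutes) checkpoints from the post above.
checkpoints = [(0, 0.0), (500, 4.5), (800, 8.0), (955, 10.0), (1000, 11.0), (1158, 18.75)]

# Pages per minute within each interval between consecutive checkpoints.
rates = [
    round((p1 - p0) / (t1 - t0), 1)
    for (p0, t0), (p1, t1) in zip(checkpoints, checkpoints[1:])
]

for (p0, _), (p1, _), rate in zip(checkpoints, checkpoints[1:], rates):
    print(f"pages {p0:>4}-{p1:<4}: {rate:6.1f} pages/min")
```

The rate falls from interval to interval, with the largest drop after page 1000.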

So it does seem that there is still a slowdown with "Find and Remove content outside of the Crop Box" turned on, even in the latest build, so I will ask my colleagues to investigate further.

If I do not use the "Find and remove content outside the Crop box" - the operation is much quicker though as no Redaction is needed! With the default settings I got the conversion done in under a minute (around 47 seconds), and then selecting the new file's location and actually saving it took me longer than the optimization process :)

An alternative approach to speed things up while still using your settings might be to split the file into portions of e.g. 300 or 600 pages (and select "only keep relevant bookmarks" when splitting the original). Optimizing each portion should then be very fast (600 pages were done in about 2.5 minutes for me and the remaining 558 pages in about 2 minutes), and you can then merge the optimized portions.
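The page ranges for such a split can be worked out up front. This is a small Python sketch: split_ranges is a hypothetical helper, not part of PDF-XChange Editor, and it only computes the ranges. The actual splitting, optimizing and merging would still be done in the Editor.

```python
def split_ranges(total_pages: int, chunk_size: int) -> list[tuple[int, int]]:
    """Return 1-based, inclusive (first, last) page ranges of at most chunk_size pages."""
    if total_pages < 0 or chunk_size < 1:
        raise ValueError("total_pages must be >= 0 and chunk_size >= 1")
    return [
        (first, min(first + chunk_size - 1, total_pages))
        for first in range(1, total_pages + 1, chunk_size)
    ]


# The 1158-page document split into portions of at most 600 pages.
print(split_ranges(1158, 600))  # [(1, 600), (601, 1158)]
```

With a portion size of 600, this gives the same 600 + 558 page split as in the test above.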

Regards,
Stefan

Re: Optimizing PDF bigger than 150 Pages takes hours

Posted: Wed Jan 08, 2020 9:11 am
by pesce
Thanks for your further investigations on that :-)

Splitting the PDF and then merging is a bit problematic because of the bookmarks. In particular, the "manual" bookmarks on pages 1 and 2 get lost after merging.

Re: Optimizing PDF bigger than 150 Pages takes hours

Posted: Wed Jan 08, 2020 8:17 pm
by TrackerSupp-Daniel
Understood Pesce,

We will keep trying to make improvements on this front; hopefully, for now, this is at least a tolerable amount of time. If you need to work on other items in the Editor while processing these, might I suggest optimizing the documents with PDF-Tools instead? This will leave the Editor free to continue working while the other documents are processed.

Kind regards,