Smaller pdf file size

Forum for the PDF-XChange Editor - Free and Licensed Versions


DWC121
User
Posts: 66
Joined: Thu Jul 30, 2015 5:18 am

Smaller pdf file size

Post by DWC121 »

Greetings,

I have a 140-page pdf that was created from jpg images. The images are 150 dpi (1218 pixels wide by 1650 pixels tall). The images are of a school yearbook. The resulting pdf is 539,579 KB.

The pdf will be available for viewing on our Alumni website. Some classmates may want to print the pdf.

We will have about 40 other pdfs on the website. Each pdf will be between 40 and 140 pages, and currently they all have the same dpi, width and height. The files will take up a lot of space on our website. I do not know what our allocated space is on our website.

Years ago 72 dpi was ideal for viewing and 96 dpi for printing. Monitors and printers have changed.

What is the best way to shrink the file size and have it look halfway decent on a computer monitor? Should I shrink the dpi (for viewing) and mention that if someone wants a pdf to print, they can let me know and I can share a larger pdf file with them using a service like Dropbox? (I've used Dropbox for other things.)

I have all the original images at 300 dpi and in bmp format. I'll use those to recreate jpgs at a lower dpi before creating a pdf if I have to.

Thanks - David
TrackerSupp-Daniel
Site Admin
Posts: 8592
Joined: Wed Jan 03, 2018 6:52 pm

Re: Smaller pdf file size

Post by TrackerSupp-Daniel »

Hello David,

That is quite the predicament indeed. My suggestion would fall in line with your idea: keep a lower-resolution version "live", and store the higher-resolution versions in another location for those who explicitly request them. In terms of the live version, I am not sure how much lower than 150 dpi you can go without sacrificing too much of the image fidelity; however, reducing the images to 96 dpi will likely still offer a nice "middle ground" between size and visual acceptability.
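
To give a feel for what 96 dpi would mean for the pages described in the first post, here is a minimal sketch that works out the pixel dimensions needed to keep the same printed size at a different dpi. The function name is just illustrative, and the only inputs are the figures quoted above (1218 x 1650 pixels at 150 dpi).

```python
def resample_dimensions(width_px, height_px, source_dpi, target_dpi):
    """Return the pixel dimensions that keep the same printed size at target_dpi."""
    new_width = round(width_px * target_dpi / source_dpi)
    new_height = round(height_px * target_dpi / source_dpi)
    return new_width, new_height


# Figures from the original post: 1218 x 1650 pixels at 150 dpi.
new_w, new_h = resample_dimensions(1218, 1650, 150, 96)
print(new_w, new_h)  # 780 1056
```

These would be the target dimensions to use when re-exporting the jpgs in whatever image tool is at hand; the actual file size will also depend on the jpg quality setting chosen.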

Theoretically, a 96dpi image can be as little as ~1/100 of a 300dpi image of the same resolution, to give an idea of the potential savings in that kind of reduction.

The Editor also offers an optimization function, via File > Save as optimized, which lets you tweak many settings before optimizing and audit the current space usage, so that you can see how much of the document is images, fonts, etc. This may help in your endeavours.

Kind regards,
Dan McIntyre - Support Technician
Tracker Software Products (Canada) LTD

+++++++++++++++++++++++++++++++++++
Our Web site domain and email address has changed as of 26/10/2023.
https://www.pdf-xchange.com
Support@pdf-xchange.com
DIV
User
Posts: 252
Joined: Fri Jun 23, 2017 1:47 am

Re: 1/100

Post by DIV »

I'm curious as to how ~1/100 was arrived at for the theoretical size ratio going from 300 dpi to 96 dpi.
I would have thought it to be (96/300)² ≈ 1/9.766 ~ 1/10.
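
As a quick check of that arithmetic, here is a small sketch. It assumes the printed size stays the same and that file size scales roughly with pixel count, which is only approximately true for jpg compression.

```python
def pixel_ratio(target_dpi, source_dpi):
    """Ratio of pixel counts when the same printed area is resampled to target_dpi."""
    return (target_dpi / source_dpi) ** 2


# 300 dpi originals reduced to 96 dpi.
from_300 = pixel_ratio(96, 300)
print(f"{from_300:.4f} (about 1/{1 / from_300:.3f})")  # 0.1024 (about 1/9.766)

# The current 150 dpi images reduced to 96 dpi.
from_150 = pixel_ratio(96, 150)
print(f"{from_150:.4f}")  # 0.4096
```

So going from the existing 150 dpi images to 96 dpi would leave roughly 41% of the pixels, all else being equal.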
DIV
User
Posts: 252
Joined: Fri Jun 23, 2017 1:47 am

Convert to text.

Post by DIV »

In general there is another way to drastically shrink file size, which is to convert from images (of text) to actual text.

A couple of warnings up front. This will only be appropriate if
  • the original scan quality is excellent; and
  • the document comprises mostly prose (not mostly photographs or graphics); and
  • you have plenty of time and patience.
The principle is as follows:
  1. Acquire high-quality scans. In general 300 dpi is suitable, but 400 dpi may be needed for small text.
  2. Perform Optical Character Recognition (a.k.a. "OCR"). This creates a 'layer' of real text. This text is set invisible, so that it won't intrude on viewing the underlying scan image.
  3. Make the text visible. (See also forum.)
  4. Delete the image.
As far as I know, in PDF-XChange Editor this process would generally have to be performed 'manually', per the link(s) above.
I originally supposed that it could only be done page by page, but now I discover the text colour (in the OCR layer) can be adjusted for the entire document in just one step. However, the images seem to only be deletable by manual selection. (Unless there's a clever trick?)
In any case, for a real-life use case I believe you would probably want to go page by page anyway, because (i) you should check that the OCR results are OK, and (ii) you won't want to rashly delete the scan images for pages that contain graphics or photographs. The latter would be troublesome to optimise 'manually', but in theory could be done by using the functionality to right-click images and edit them in third-party software. (In this case the editing would be cropping or blanking of text portions of a scan.)
As I say, a pain to do manually, but theoretically a good way of reducing file size.

Alternatively, before the scan images are deleted, the PDF file can be exported to Microsoft Word document format. The images can then be manipulated within Microsoft Word (deleted, cropped, moved, etc.). An advantage of this is that it would be easier to find and correct spelling issues caused by mistakes in the OCR.

This might also help explain the reason why file size should shrink: it is like 'reverse engineering' the scanned images to get a formatted-text document (with perhaps just a few embedded graphics).
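
To put very rough numbers on that, here is a back-of-envelope sketch. The pdf size and page count come from the first post; the characters-per-page figure is purely my assumption for a dense page of prose, and the estimate ignores fonts, layout data and compression, so a real text page would come out somewhat different.

```python
# Figures stated in the original post.
PDF_SIZE_KB = 539_579
PAGE_COUNT = 140

# Assumption (not from the thread): a dense page of prose holds about
# 3,000 characters, stored at roughly one byte per character.
ASSUMED_CHARS_PER_PAGE = 3_000


def image_kb_per_page(total_kb, pages):
    """Average size of one scanned page in the existing image-based pdf."""
    return total_kb / pages


def text_kb_per_page(chars, bytes_per_char=1):
    """Rough size of one page stored as plain text, before any compression."""
    return chars * bytes_per_char / 1024


image_kb = image_kb_per_page(PDF_SIZE_KB, PAGE_COUNT)
text_kb = text_kb_per_page(ASSUMED_CHARS_PER_PAGE)
print(f"image page: about {image_kb:.0f} KB")
print(f"text page:  about {text_kb:.1f} KB")
```

Even allowing generous overhead for fonts and formatting, a page of real text should be orders of magnitude smaller than a scanned image of the same page. Of course this only applies to pages that are mostly prose, per the warnings above.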

—DIV

P.S. In principle it would be possible for a software application to implement this automatically with a dedicated OCR option like "PDF Output Style" = "Formatted Text & Graphics". Hypothetically, of course :-)
TrackerSupp-Daniel
Site Admin
Posts: 8592
Joined: Wed Jan 03, 2018 6:52 pm

Re: Smaller pdf file size

Post by TrackerSupp-Daniel »

Hello DIV,

Thank you for the comprehensive writeup; I hope that it helps DWC121.

As for my slip-up regarding the ratio, it seems that was a typo on my part. Apologies for the error!
Dan McIntyre - Support Technician
Tracker Software Products (Canada) LTD
