
Get a Scribd book as a searchable PDF with PDF Xchange Editor

Posted: Mon May 01, 2017 3:54 am
by Mitch
Create a searchable PDF from any Scribd book with PDF Xchange Editor
How? Use software to automate flipping the pages and taking screenshots, then use PDF Xchange to create a high-quality searchable PDF.

Windows or Mac

I need my books to be available offline, and Scribd Premium's offline storage often fails. The solution is to capture the pages from the screen and create a searchable PDF. It's for personal use. I tried OSX Automator, Abbyy Finereader, Acrobat DC Professional, and ePub. All of these have issues with readability or OCR accuracy.

Here we go:

Log in to Scribd and open the book you want to read.

Step 1: Screenshot pages as PNG with Keyboard Maestro (Mac) or AutoHotkey (Windows). AutoHotkey is free; Keyboard Maestro is paid software with a free trial. Make sure the screenshots are taken in full screen.

Keyboard Maestro for Mac:
[Screenshot: Keyboard Maestro settings]
AutoHotkey for Windows
This is a bit more complex and involves a script:

; AutoHotkey v1 syntax
^!r:: ; Ctrl+Alt+R starts the script
Loop, 400 ; repeat once per page, in this case 400 times
{
    Send, +{PrintScreen} ; keystroke Shift+PrintScreen takes the screenshot
    Sleep, 5000 ; wait 5 seconds so the screenshot can be saved
    Send, {Right} ; keystroke Right arrow flips to the next page
    Sleep, 5000 ; wait 5 seconds so the next page can load
}
return
For more info check https://autohotkey.com/board/topic/5811 ... re-script/
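
To plan a capture run, it helps to estimate how long the loop takes. The sketch below assumes each iteration pauses about 5 seconds twice, as the comments in the script describe; the function name and parameters are illustrative and not part of the original script.

```python
def capture_minutes(pages: int,
                    delay_after_shot_s: float = 5.0,
                    delay_after_flip_s: float = 5.0) -> float:
    """Estimate the total capture time in minutes for the screenshot loop.

    Assumes one screenshot and one page flip per iteration, each followed
    by a fixed pause; the time spent on the keystrokes themselves is ignored.
    """
    if pages < 0:
        raise ValueError("pages must not be negative")
    seconds = pages * (delay_after_shot_s + delay_after_flip_s)
    return seconds / 60.0


if __name__ == "__main__":
    # 400 iterations with two 5-second pauses each
    print(f"{capture_minutes(400):.1f} minutes")  # prints "66.7 minutes"
```

So a 400-iteration run takes a little over an hour; shorter pauses speed things up, but only if the pages still load in time.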

Step 2: Batch conversion and rename with XnView (Mac) or IrfanView (Windows).
PDF Xchange can also sharpen scanned images, but not in batch. That's why I use XnView or IrfanView.
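
The post does not give a rename pattern, so the following is only a sketch of one possible scheme: zero-padded sequential names, so that an alphabetical sort matches the page order when the images are imported later. The prefix `page_` and the example file names are hypothetical.

```python
from typing import Dict, List


def sequential_names(files: List[str], prefix: str = "page_") -> Dict[str, str]:
    """Map screenshot names (already in capture order) to zero-padded names.

    Padding to at least 4 digits keeps "page_0010.png" after "page_0002.png"
    in a plain alphabetical sort.
    """
    width = max(4, len(str(len(files))))
    return {
        old: f"{prefix}{index:0{width}d}.png"
        for index, old in enumerate(files, start=1)
    }


if __name__ == "__main__":
    # Hypothetical screenshot names, listed in the order they were captured
    shots = ["Screen Shot 1.png", "Screen Shot 2.png", "Screen Shot 10.png"]
    for old, new in sequential_names(shots).items():
        print(old, "->", new)
```

The batch rename dialogs in XnView and IrfanView can produce numbered names as well; the point is only that the numbering should be zero-padded.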

XnView for Mac
Choose Tools - Batch Convert and set the actions below under the second tab:
[Screenshot: XnView settings]
IrfanView for Windows
Choose Menu - Batch conversion and rename
a. PNG compression level 6
b. Crop to 1170 x 770 (optional; this removes the grey Scribd borders. The size is based on a MacBook screen resolution of 1440 x 900. For higher-quality screenshots you need a higher screen resolution, which SwitchResX for Mac can provide).
c. Sharpen 10, Contrast 15
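
If your screen resolution differs, the crop rectangle has to be recalculated. The sketch below computes a crop box under the assumption that the page area is centered in the screenshot; the post does not say where the crop is positioned, so check the result against one of your own screenshots. The function name is illustrative.

```python
from typing import Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


def centered_crop_box(screen_w: int, screen_h: int,
                      crop_w: int, crop_h: int) -> Box:
    """Return a crop box of crop_w x crop_h centered in the screenshot."""
    if crop_w > screen_w or crop_h > screen_h:
        raise ValueError("crop size exceeds screenshot size")
    left = (screen_w - crop_w) // 2
    top = (screen_h - crop_h) // 2
    return (left, top, left + crop_w, top + crop_h)


if __name__ == "__main__":
    # 1440 x 900 screenshot cropped to 1170 x 770, as in step 2b
    print(centered_crop_box(1440, 900, 1170, 770))  # (135, 65, 1305, 835)
```

With the sizes from step 2b this trims 135 pixels from the left and right and 65 pixels from the top and bottom.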

Step 3: Image to PDF with PDF Xchange Editor (V6 build 321)
a. File - New Document from Image files
Go to Options
b. Select Paper size from Image size (under New Page Options)
c. Fit Image to Cell (centre-middle) under Images Layout Options
d. Flate compression all (True color, Grayscale, etc.) under Image compression
e. Set OCR to Medium under Image Postprocessing. You can also skip this setting at first and process the images without OCR to check the quality of the scanned images, then make the document searchable with the desired OCR accuracy via Menu - Document - OCR.
[Screenshot: Image options]
After setting the above options, click OK and then OK again to process the images. PDF Xchange will now OCR (recognize) the images you selected.
Alternatively, skip the OCR at this stage and run it afterwards via Menu - Document - OCR.
[Screenshot: PDF Xchange OCR process]
Step 4: Split Pages
a. Split pages with PDF Xchange Editor
b. Menu - Document - Split Pages
c. Click the icon to select a vertical split at 50%; a dotted vertical red line then appears in the preview
d. Select Remove Source pages
e. Select Change physical size
[Screenshot: Split Pages]
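
To picture what the 50% vertical split does, the sketch below computes the two resulting page rectangles for one image. It assumes each screenshot shows two book pages side by side, which the post implies but does not state; the function name is illustrative and this is not how PDF Xchange implements the split.

```python
from typing import Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


def split_vertical(width: int, height: int,
                   ratio: float = 0.5) -> Tuple[Box, Box]:
    """Split a page into a left and a right rectangle at the given ratio."""
    if not 0.0 < ratio < 1.0:
        raise ValueError("ratio must be between 0 and 1")
    split_x = round(width * ratio)
    left_page = (0, 0, split_x, height)
    right_page = (split_x, 0, width, height)
    return left_page, right_page


if __name__ == "__main__":
    # A 1170 x 770 image from step 2 becomes two 585 x 770 pages
    print(split_vertical(1170, 770))
```

A 400-image document therefore ends up with 800 pages once the source pages are removed.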
That's it. Good luck.

Mitch

Re: Get a Scribd book as a searchable PDF with PDF Xchange Editor

Posted: Wed May 03, 2017 9:42 am
by Tracker Supp-Stefan
Hello Mitch,

Many thanks for this tutorial!
Hope other people will find it useful as well!

Cheers,
Stefan