How can I get a PDF ready for efficient OCR?

Discussion for the End User use of OCR in PDF-XChange Editor and Viewer


dlmajor
User
Posts: 2
Joined: Thu Oct 01, 2020 1:09 am

How can I get a PDF ready for efficient OCR?

Post by dlmajor »

I need to OCR PDFs of old books. Here is a typical example of what I am processing:
[Attachment: book scan sample.jpg]
As you can see, the pages are dark yellow, and the text is nowhere near as dark or as clearly defined as it could be.
Is there anything I can do to files like this before running OCR to improve the results? Can I turn up the contrast, adjust colour curves, etc.?
Thanks,
David M.
Tracker Supp-Stefan
Site Admin
Posts: 17765
Joined: Mon Jan 12, 2009 8:07 am
Location: London

Re: How can I get a PDF ready for efficient OCR?

Post by Tracker Supp-Stefan »

Hello dlmajor,

Yes, you can edit the images in your PDF almost directly:
https://www.youtube.com/watch?v=bhnCAZUw2cA

That said, it would probably be easier to extract all those images (or start from the separate image files, if you already have them), apply any colour/brightness/levels adjustments as a batch process in external image-editing software, and only then use the processed images to create new PDF files and OCR those.
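
For the batch step, a "levels" adjustment boils down to a per-pixel lookup table: everything darker than a chosen black point becomes black, everything lighter than a chosen white point becomes white, and the range in between is stretched (optionally with a gamma curve). The Python sketch below (standard library only) illustrates that arithmetic on yellowed-paper and faded-ink pixels. The function names, the black/white/gamma values and the sample pixel colours are all hypothetical illustrations, not settings tuned for these scans or taken from any Tracker product.

```python
"""Sketch of a 'levels' adjustment for yellowed book scans (stdlib only).

Hypothetical example: the black/white/gamma values and sample pixels below
are illustrative, not tuned for any particular scan.
"""


def to_gray(r: int, g: int, b: int) -> int:
    """Convert an 8-bit RGB pixel to 8-bit luma using ITU-R BT.601 weights."""
    return round(0.299 * r + 0.587 * g + 0.114 * b)


def build_levels_lut(black: int, white: int, gamma: float = 1.0) -> list[int]:
    """Return a 256-entry lookup table for a levels adjustment.

    Inputs at or below `black` map to 0, inputs at or above `white` map to 255,
    and values in between are stretched linearly and then gamma-corrected
    (gamma > 1 brightens mid-tones, gamma < 1 darkens them).
    """
    if not 0 <= black < white <= 255:
        raise ValueError("need 0 <= black < white <= 255")
    if gamma <= 0:
        raise ValueError("gamma must be positive")
    lut = []
    for v in range(256):
        t = (v - black) / (white - black)
        t = min(max(t, 0.0), 1.0)  # clamp to [0, 1]
        lut.append(round(255 * t ** (1.0 / gamma)))
    return lut


def threshold(v: int, cutoff: int = 128) -> int:
    """Binarize a gray value: 0 (ink) below the cutoff, 255 (paper) otherwise."""
    return 0 if v < cutoff else 255


if __name__ == "__main__":
    paper = to_gray(210, 190, 120)  # made-up yellowed-paper pixel
    ink = to_gray(90, 80, 60)       # made-up faded-ink pixel
    lut = build_levels_lut(black=80, white=185)
    print(paper, ink, "->", lut[paper], lut[ink])
```

Here the yellowed paper ends up pure white and the faded ink close to black, which is the kind of contrast OCR engines tend to prefer. If you use the Pillow library, a table like this can be applied to a grayscale ("L" mode) image with `Image.point(lut)`; GIMP's Colors > Levels dialog does the same job interactively.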

If you do not have the images as separate files, our PDF Tools should let you extract all of them quickly, so that you can then work on them in an image-processing tool (e.g. GIMP, which is free and comparable to Photoshop in its image-processing features).
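
PDF Tools is the route suggested above. Purely as an illustration of what extraction involves, here is a rough standard-library Python sketch for one common case. It assumes the PDF is not encrypted and that each scanned page is stored as a stream with `/Filter /DCTDecode`, whose bytes are then a complete JPEG file. It is a naive byte scan rather than a real PDF parser: it skips any image stored another way, and the function name and output naming scheme are hypothetical.

```python
"""Rough sketch: pull JPEG page images out of a scanned-book PDF (stdlib only).

Assumptions (not true of every PDF): the file is not encrypted, and page scans
are stored as /DCTDecode streams, i.e. raw JPEG data. Everything else is skipped.
"""
import re
import sys
from pathlib import Path

# The 'stream' keyword followed by an end-of-line, excluding the 'endstream' keyword.
STREAM_START = re.compile(rb"(?<!end)stream(?:\r\n|\n)")


def extract_jpegs(pdf: bytes) -> list[bytes]:
    """Return the bytes of every DCTDecode (JPEG) stream found in `pdf`."""
    images = []
    for m in STREAM_START.finditer(pdf):
        # The stream dictionary sits between the object's 'obj' keyword and 'stream'.
        obj = pdf.rfind(b"obj", 0, m.start())
        header = pdf[max(obj, 0):m.start()]
        if b"/DCTDecode" not in header:
            continue
        end = pdf.find(b"endstream", m.end())
        if end == -1:
            continue
        data = pdf[m.end():end]
        eoi = data.rfind(b"\xff\xd9")  # JPEG end-of-image marker
        # Keep only data that starts with the JPEG start-of-image marker.
        if data.startswith(b"\xff\xd8") and eoi != -1:
            images.append(data[:eoi + 2])
    return images


def main() -> None:
    if len(sys.argv) != 2:
        print("usage: python extract_jpegs.py book.pdf")
        return
    src = Path(sys.argv[1])
    for i, jpeg in enumerate(extract_jpegs(src.read_bytes()), start=1):
        out = src.with_name(f"{src.stem}_img{i:04d}.jpg")  # e.g. book_img0001.jpg
        out.write_bytes(jpeg)
        print("wrote", out)


if __name__ == "__main__":
    main()
```

If this finds nothing (for example, because the scans use a different compression), a dedicated tool such as PDF Tools remains the more reliable option.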

Kind regards,
Stefan