How can I get a PDF ready for efficient OCR?
I need to OCR PDFs of old books. Here is a typical example of what I am processing:
As you can see, the pages are dark yellow, and the text is nowhere near as dark or as clearly defined as it could be.
Is there anything I can do to files like this before performing the OCR, to make things better? Can I turn the contrast up, adjust colour curves, etc.?
Thanks,
David M.
- Tracker Supp-Stefan
Re: How can I get a PDF ready for efficient OCR?
Hello dlmajor,
Yes, you can edit the images in your PDF almost directly:
https://www.youtube.com/watch?v=bhnCAZUw2cA
That said, it would probably be easier to extract all those images (or use the originals, if you already have them as separate image files), perform any colour/brightness/levels adjustments in external image editing software as a batch process, and only then create new PDF files from the processed images and OCR those.
If you do not have the images as external files, our PDF Tools should allow you to extract all the images quickly, so that you can then work on them in an image processing tool (e.g. GIMP, which is free and comparable in image processing features to, say, Photoshop).
Kind regards,
Stefan
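For anyone who would rather script the batch step than click through an editor, here is a minimal sketch of what a "levels" adjustment does to 8-bit grayscale pixel values. It is only an illustration, not what any particular editor runs internally: the function names `levels_lut` and `apply_levels` are made up for this example, and the black/white points (70 and 200) are hypothetical values that would need to be picked from the actual scans.

```python
def levels_lut(black_point, white_point, gamma=1.0):
    """Build a 256-entry table that stretches [black_point, white_point] to [0, 255]."""
    if not 0 <= black_point < white_point <= 255:
        raise ValueError("need 0 <= black_point < white_point <= 255")
    span = white_point - black_point
    lut = []
    for value in range(256):
        # Clamp to the chosen range, then normalise to 0..1.
        t = (min(max(value, black_point), white_point) - black_point) / span
        # gamma > 1 brightens midtones, gamma < 1 darkens them.
        lut.append(round(255 * t ** (1.0 / gamma)))
    return lut


def apply_levels(pixels, lut):
    """Apply the lookup table to a flat sequence of 8-bit grayscale values."""
    return [lut[p] for p in pixels]


# Hypothetical sample: faded ink around 70, yellowed paper around 200.
lut = levels_lut(black_point=70, white_point=200)
adjusted = apply_levels([60, 70, 135, 200, 230], lut)
print(adjusted)
```

Everything at or below the black point becomes pure black and everything at or above the white point becomes pure white, which is what pushes the yellowed paper to a clean background. With an imaging library such as Pillow, a 256-entry table like this can be passed to `Image.point` on a grayscale ("L" mode) image to apply it to a whole page.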