Hi
I have TWO distinct requirements for batch-updating a large library of PDF files (over 10k).
The first is to find all the files without OCR (they are a mix of scans and digital PDFs; some were OCR'd during scanning, some never had OCR and were generated elsewhere). Can this be done using Tools? The task is then to OCR them as a batch, either in situ, overwriting the original file, or saving a copy with something appended to the filename.
The second is to find PDF documents which contain more than 1 page. One of the "libraries" is expense invoices, and we need to check every month that no "doubling up" has occurred, i.e. that 2 expenses are not in a single file. So the search results need to be previewed to confirm that all pages of a multi-page file relate to the main first page. Splitting is easy, it's finding them that's the problem!
Is Tools the right product for this? Can it do both?
Or can Editor / Editor Pro offer a better solution?
Thanks in advance for any tips/how to's etc
Richard
search / find files - with no OCR OR multiple pages
- Tracker Supp-Stefan
Re: search / find files - with no OCR OR multiple pages
Hello Richard,
Using the PDF Tools, you can create a batch of all your files and ask the PDF Tools to OCR them all, and then in the settings set it up so that it skips any pages that already contain actual text.
That way Tools will go through all the files, and only process the necessary pages and documents.
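If you would rather see which files have no text at all before running the batch, a rough sketch outside of Tools could look like the following. It assumes Python with the third-party pypdf library installed, neither of which is part of our products, and the function names are made up for illustration. It treats a file as "not OCR'd" when no page yields any extractable text, which is only a heuristic.

```python
from pathlib import Path


def has_text_layer(page_texts, min_chars=1):
    """Return True if any page yielded at least `min_chars` non-whitespace characters."""
    return any(len("".join(text.split())) >= min_chars for text in page_texts)


def extract_page_texts(pdf_path):
    """Extract the text of every page using pypdf (assumed installed: pip install pypdf)."""
    from pypdf import PdfReader  # imported lazily so has_text_layer has no dependency

    reader = PdfReader(str(pdf_path))
    return [page.extract_text() or "" for page in reader.pages]


def find_pdfs_without_text(folder):
    """Yield PDFs under `folder` (recursively) where no page has extractable text."""
    for pdf_path in sorted(Path(folder).rglob("*.pdf")):
        try:
            texts = extract_page_texts(pdf_path)
        except Exception as exc:  # damaged or encrypted files are reported and skipped
            print(f"could not read {pdf_path}: {exc}")
            continue
        if not has_text_layer(texts):
            yield pdf_path


if __name__ == "__main__":
    import sys

    # Folder to scan is taken from the command line, defaulting to the current one.
    for path in find_pdfs_without_text(sys.argv[1] if len(sys.argv) > 1 else "."):
        print(path)
```

The resulting list is only a report; the OCR itself would still be done by the batch in Tools as described above.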
As for the second requirement, it could potentially be done with some JS run inside the Editor, but an easier approach would be to open the folder with all your files in Windows Explorer. With our Shell Extensions installed, you should be able to turn on a "Pages" column and then just sort your results by that, effectively letting Windows and our Shell Extensions do your 'filtering' for you.
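If the files are spread over many subfolders and sorting in Explorer becomes awkward, the same page-count check could be sketched in a small script. This again assumes Python and the third-party pypdf library rather than anything in our products, and the function names are hypothetical.

```python
from pathlib import Path


def multi_page_only(page_counts, threshold=1):
    """Keep entries with more than `threshold` pages, largest page count first."""
    flagged = [(name, pages) for name, pages in page_counts.items() if pages > threshold]
    return sorted(flagged, key=lambda item: (-item[1], item[0]))


def count_pages(folder):
    """Map each PDF under `folder` (recursively) to its page count using pypdf."""
    from pypdf import PdfReader  # assumed installed: pip install pypdf

    counts = {}
    for pdf_path in sorted(Path(folder).rglob("*.pdf")):
        try:
            counts[str(pdf_path)] = len(PdfReader(str(pdf_path)).pages)
        except Exception as exc:  # damaged or encrypted files are reported and skipped
            print(f"could not read {pdf_path}: {exc}")
    return counts


if __name__ == "__main__":
    import sys

    # Folder to scan is taken from the command line, defaulting to the current one.
    folder = sys.argv[1] if len(sys.argv) > 1 else "."
    for name, pages in multi_page_only(count_pages(folder)):
        print(f"{pages}\t{name}")
```

The output is just a list of candidates; each flagged file would still need to be opened and previewed to confirm whether its pages belong together.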
Regards,
Stefan