I would like to filter out the files I have already OCR-ed. They all have the suffix -OCR.PDF.
I would like to do something like: *.* NOT *-OCR.PDF
Or: -*-OCR.PDF
Is there some wildcard I can use to achieve this?
How to filter files you don't want / filter type "leave all except this type" [SOLVED]
Last edited by cunha00 on Wed Nov 06, 2019 12:13 pm, edited 1 time in total.
- Vladimir G - Tracker Dev
Re: How to filter files you don't want / filter type "leave all except this type"
Hello, cunha00
Unfortunately, it is currently not possible to filter out (i.e. skip processing) files whose names contain specific characters.
You can only filter files whose names match a pattern.
It is not clear exactly what you want to do with PDF-Tools (only filter the files, or also apply some action to them).
In general, I can suggest some possible workarounds:
1. If you are using the "OCR Pages" action from PDF-Tools and your documents that have not been OCR-ed do not contain any text, you can use this option of the "OCR Pages" action: "If document contains text - Skip processing the document".
2. You can filter the files in advance using the Windows command line and build a *.pdtfl file list that can be passed as input to any Tool.
For example, you can use this command:
dir /s /b /a-d | findstr /i /r /c:"\.pdf$" | findstr /v /i /r /c:"-OCR\.pdf$" > input.pdtfl
Here "\.pdf$" keeps only PDF files, and "-OCR\.pdf$" is the regular expression for the names of the files you want to ignore (/v drops every line that matches it). The /i switch makes both matches case-insensitive, so names ending in -OCR.PDF are excluded too, and > writes a fresh input.pdtfl each time the command is run.
After you have input.pdtfl, you can pass it as input to any Tool in PDF-Tools (in the UI or on the command line):
- In the UI, turn on the "Select file list" option in the "Choose Input Files" action.
- On the command line, pass it as an argument of the /RunTool command, e.g. PDFXTools.exe /RunTool <tool-id> input.pdtfl
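If you would rather use a script than the dir/findstr one-liner, the same filtering step can be sketched in Python. The script walks the folder tree, keeps *.pdf files, skips names ending in -OCR.PDF (compared case-insensitively) and writes the full paths to a list file. This is a minimal sketch under a few assumptions. It assumes PDF-Tools accepts a plain-text .pdtfl list with one full path per line, the same shape that dir /s /b produces. The function names build_file_list and write_file_list are hypothetical and are not part of PDF-Tools. The UTF-8 encoding is also an assumption that is worth checking if some file names contain accented characters.

```python
import os


def build_file_list(root, exclude_suffix="-OCR.PDF"):
    """Collect full paths of PDF files under root, skipping already OCR-ed ones.

    Both the .pdf extension and the exclude suffix are compared case-insensitively,
    so report-OCR.PDF and report-ocr.pdf are both skipped.
    """
    exclude = exclude_suffix.lower()
    paths = []
    # os.walk descends into every subfolder, like "dir /s".
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            lower = name.lower()
            if lower.endswith(".pdf") and not lower.endswith(exclude):
                paths.append(os.path.abspath(os.path.join(dirpath, name)))
    paths.sort()  # predictable processing order
    return paths


def write_file_list(paths, list_path="input.pdtfl"):
    """Write one full path per line (assumed .pdtfl layout, mirroring dir /s /b output)."""
    # Assumption: PDF-Tools reads this file as UTF-8; verify for non-ASCII names.
    with open(list_path, "w", encoding="utf-8") as f:
        for p in paths:
            f.write(p + "\n")
```

To use it, call build_file_list on the top-level folder of the documents and pass the result to write_file_list. You can then select the resulting input.pdtfl with the "Select file list" option or pass it to /RunTool as described above. The script lowercases both the file name and the suffix, so it does not have the case-sensitivity pitfall that findstr has without /i.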
Best regards,
vmgoshko
Vladimir Goshko
Software Developer
Tracker Software Products
Re: How to filter files you don't want / filter type "leave all except this type"
I think I'll take your suggestions.
In my case it is a very large nested folder with legal documents in it, and I don't want the OCR to be redone. I can't use the first option because some PDFs have pages with OCR and other pages without it, but creating the file list from the command line will work for me!
Thanks for the feedback and the available options.
- TrackerSupp-Daniel
Re: How to filter files you don't want / filter type "leave all except this type" [SOLVED]
If some documents already contain text, you can set up the OCR action itself to "skip pages with text content" or to "ignore existing text on page", so that none of the text is duplicated.
Using that together with the command-line approach suggested earlier should let you work through all of the files there without altering the existing text. I hope this helps!
Dan McIntyre - Support Technician
Tracker Software Products (Canada) LTD
+++++++++++++++++++++++++++++++++++
Our website domain and email address have changed as of 26/10/2023.
https://www.pdf-xchange.com
Support@pdf-xchange.com