How to filter out files you don't want / filter type "leave all except this type" [SOLVED]

This Forum is for the use of End Users requiring help and assistance for Tracker Software's PDF-Tools.

cunha00
User
Posts: 10
Joined: Mon May 01, 2017 4:29 pm


Post by cunha00 »

I would like to filter out the files I have already run OCR on. They all have the suffix -OCR.PDF.

I would like to do something like: *.* NOT *-OCR.PDF
Or: -*-OCR.PDF

Is there some wildcard I can use to achieve this?
Last edited by cunha00 on Wed Nov 06, 2019 12:13 pm, edited 1 time in total.
Vladimir G - Tracker Dev
User
Posts: 40
Joined: Thu Nov 30, 2017 1:24 pm


Post by Vladimir G - Tracker Dev »

Hello, cunha00

Unfortunately, for now it is not possible to filter out (that is, skip processing of) files that contain specific characters in the file name.
You can only select files whose names match a pattern.

It is not clear exactly what kind of work you want to do with PDF-Tools (only filter files, or apply some action too).

In general, I can suggest some possible workarounds:
1. If you are using the "OCR Pages" action from PDF-Tools and the documents that have not been OCR-ed yet do not contain text, you can use this option of the "OCR Pages" action: "If document contains text - Skip processing the document".

2. You can filter the files in advance using the Windows command line and build a *.pdtfl file list that can be passed as input to any Tool.
For example, you can use this command:

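dir /s /b /a-d | findstr /v /i /r ".*-OCR\.pdf$" | findstr /i /r "\.pdf$" > input.pdtfl
where ".*-OCR\.pdf$" is the regular expression for the names of the files you want to ignore. The /i switch makes the match case-insensitive, so names ending in -OCR.PDF are excluded too, and > overwrites input.pdtfl on each run instead of appending duplicate entries.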

Once you have input.pdtfl, you can pass it as input to any Tool in PDF-Tools (either in the UI or on the command line):
- In the UI, turn on the option "Select file list" in the "Choose Input Files" action.
- On the command line, pass it as an argument of the /RunTool command, e.g. "PDFXTools.exe /RunTool <tool-id> input.pdtfl"
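
For anyone who would rather script this step than chain dir and findstr, here is a rough Python sketch of the same filtering. It is only an illustration, not part of PDF-Tools: the function names collect_non_ocr_pdfs and write_file_list are made up for this example, and it assumes the file list is plain text with one full path per line (as the dir /b output above produces) and that UTF-8 encoding is acceptable.

```python
import os
from pathlib import Path


def collect_non_ocr_pdfs(root, skip_suffix="-ocr.pdf"):
    """Return sorted absolute paths of PDFs under root, skipping names
    that end with skip_suffix (compared case-insensitively)."""
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            lower = name.lower()
            # Keep only PDFs, and drop the ones already marked as OCR-ed.
            if lower.endswith(".pdf") and not lower.endswith(skip_suffix):
                matches.append(str(Path(dirpath, name).resolve()))
    return sorted(matches)


def write_file_list(paths, list_file):
    """Write one path per line.
    Assumption: this matches the layout the dir /b command generates;
    the UTF-8 encoding is also an assumption."""
    text = "".join(p + "\n" for p in paths)
    Path(list_file).write_text(text, encoding="utf-8")
```

Calling write_file_list(collect_non_ocr_pdfs(r"C:\path\to\folder"), "input.pdtfl") would then produce a list that can be tried in place of the one generated by the command above (the folder path here is a placeholder).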

Best regards,
vmgoshko
Vladimir Goshko
Software Developer
Tracker Software Products
cunha00


Post by cunha00 »

I think I'll take your suggestions.

In my case it is a very large nested folder of legal documents, and I don't want the OCR to be redone. I can't use the first option because some PDFs have pages with OCR and other pages without it, but creating the file list from the command line will work for me!

Thanks for the feedback and the available options.
TrackerSupp-Daniel
Site Admin
Posts: 8611
Joined: Wed Jan 03, 2018 6:52 pm


Post by TrackerSupp-Daniel »

If some documents already contain text, you can set up the OCR action itself to "skip pages with text content" or "ignore existing text on page" so that none of the text is duplicated:
[Screenshot: image.png]
Using that in conjunction with the previously suggested command line options should help you work through all of the files there without altering the existing text.
I hope this helps!
Dan McIntyre - Support Technician
Tracker Software Products (Canada) LTD

+++++++++++++++++++++++++++++++++++
Our website domain and email address changed as of 26/10/2023.
https://www.pdf-xchange.com
Support@pdf-xchange.com