Bulk OCR Existing Files in Folder  SOLVED

This Forum is for the use of End Users requiring help and assistance for Tracker Software's PDF-Tools.

bqxmprij
User
Posts: 128
Joined: Tue Dec 18, 2012 3:51 am

Bulk OCR Existing Files in Folder

Post by bqxmprij » Sun May 02, 2021 3:01 pm

I have a lot of PDF files in a folder and its subfolders that I need to review, and I want to OCR all of them. I don't understand the save option in PDF-Tools: the "Save Document" part of the OCR tool doesn't seem to offer a way to OCR each document and save it without renaming it or creating a new document. How do I OCR an existing file, save it in place, and move on to the next one?

Ovg
User
Posts: 361
Joined: Tue Sep 05, 2017 4:56 pm
Location: Moscow

Re: Bulk OCR Existing Files in Folder

Post by Ovg » Sun May 02, 2021 3:37 pm

20210502_183622.png
It's impossible to lead us astray for we don't care even to choose the way.
PDF-XChange PRO, 9.0 (Build 354.0) / W7 x64 SP1

bqxmprij

Re: Bulk OCR Existing Files in Folder

Post by bqxmprij » Sun May 02, 2021 3:40 pm

Ovg,

Thank you for your post. I agree. That is the window and the option in the bottom right. See how it will save a new file with an OCR name? I don't want that. I want PDF-Tools to open the file, OCR it, save it, and move on without creating new files or changing the file name.

Ovg

Re: Bulk OCR Existing Files in Folder  SOLVED

Post by Ovg » Sun May 02, 2021 3:55 pm

20210502_185334.png

bqxmprij

Re: Bulk OCR Existing Files in Folder

Post by bqxmprij » Sun May 02, 2021 7:52 pm

OVG, you are the best! For some reason I didn't think of just saving it with the same file name.

Now, I am wondering why some documents didn't OCR, but that is another issue.

Ovg

Re: Bulk OCR Existing Files in Folder

Post by Ovg » Mon May 03, 2021 7:32 am

bqxmprij wrote:
Sun May 02, 2021 7:52 pm
Now, I am wondering why some documents didn't OCR, but that is another issue.

Hi, bqxmprij
Check OCR settings:

20210503_102850.png

Tracker Supp-Stefan
Site Admin
Posts: 14586
Joined: Mon Jan 12, 2009 8:07 am
Location: London

Re: Bulk OCR Existing Files in Folder

Post by Tracker Supp-Stefan » Mon May 03, 2021 4:28 pm

Hello Ovg,

Many thanks for the help! That might indeed be the reason why some files were skipped for bqxmprij.

@bqxmprij - please let us know whether Ovg's suggestion helped you sort everything out.

Kind regards,
Stefan

bqxmprij

Re: Bulk OCR Existing Files in Folder

Post by bqxmprij » Mon May 03, 2021 6:58 pm

Of the three options, I used "do not OCR but continue processing." I don't know why some were not OCR'd.

I think there are 3 types of documents:
1. Documents with full text (e.g., computer-generated PDFs) or any text at all.
2. Documents with no text (e.g., a scan).
3. Documents with some text, plus other areas that have no text but could be OCR'd.

I think the options only cover types 1 and 2. How do you OCR a document of type 3? In other words, I think we need an option (or please tell me if one already exists) that reviews a document, OCRs the non-text areas that could be OCR'd, and ignores the areas that already have text.

TrackerSupp-Daniel
Site Admin
Posts: 4671
Joined: Wed Jan 03, 2018 6:52 pm

Re: Bulk OCR Existing Files in Folder

Post by TrackerSupp-Daniel » Mon May 03, 2021 7:13 pm

Hi, bqxmprij

To accomplish that, you would need to use the "OCR document" option instead of the "do not OCR" option, which automatically skips any document that contains any text-based content at all. (Yes, this means that all files, even those that already contain text, will be processed, so the tool will take extra time.)
With the OCR document function enabled, click "more options", and check off the options as you need:
image.png
-The "skip pages" option will skip processing any page that contains any text-based content at all, so enabling it would likely result in some pages of type 3 documents being skipped.
-The "Ignore existing text on page" option will instead process the entire page but skip areas where text already exists (meaning you will not get overlapping text). This is the slowest of the options presented to you, but it will also give the most complete result.
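To make the difference between these choices concrete, here is a small illustrative model in Python. It is only a sketch of the behaviour described in this thread, not PDF-Tools code or a PDF-Tools API: the function name and mode strings are hypothetical, each page is reduced to a single "already has some text" flag, and the area-level handling of "Ignore existing text on page" is not modelled beyond the fact that every page is processed.

```python
# Hypothetical model of the OCR options discussed above. The names used here
# are NOT part of PDF-Tools; a document is just a list of booleans, one per
# page, saying whether that page already contains any text.

def pages_to_ocr(pages_have_text, mode):
    """Return the indices of the pages that would be sent to OCR.

    mode:
      "do_not_ocr"  - skip the whole document if any page contains text
      "skip_pages"  - OCR the document, but skip every page that has text
      "ignore_text" - OCR every page; areas that already have text are left alone
    """
    all_pages = list(range(len(pages_have_text)))
    if mode == "do_not_ocr":
        # Any text anywhere means the entire document is skipped.
        return [] if any(pages_have_text) else all_pages
    if mode == "skip_pages":
        # Only pages with no text at all are processed.
        return [i for i, has_text in enumerate(pages_have_text) if not has_text]
    if mode == "ignore_text":
        # Every page is processed (existing text areas are skipped within it).
        return all_pages
    raise ValueError(f"unknown mode: {mode}")
```

For a type 3 document whose first and last pages contain some text, `pages_to_ocr([True, False, True], "skip_pages")` returns only the middle page, while `"do_not_ocr"` skips the document entirely and `"ignore_text"` sends all three pages to OCR.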

Kind regards,
Daniel McIntyre
Support Technician
Tracker Software Products (Canada) LTD

Support: <Support@tracker-software.com>
Sales: +1 (250) 324-1621
Fax: +1 (250) 324-1623

bqxmprij

Re: Bulk OCR Existing Files in Folder

Post by bqxmprij » Tue May 04, 2021 4:01 pm

So, operator error.

Thank you!

TrackerSupp-Daniel

Bulk OCR Existing Files in Folder

Post by TrackerSupp-Daniel » Tue May 04, 2021 6:07 pm

:)
