OCR'ing existing files / preserving timestamps

This Forum is for the use of End Users requiring help and assistance for Tracker Software's PDF-Tools.

fletch
User
Posts: 79
Joined: Wed Mar 11, 2020 2:53 am

OCR'ing existing files / preserving timestamps

Post by fletch »

I doubt this is possible, but I wanted to check just in case. I have 20+ years of documents to OCR. As they have migrated from one machine to another over the years, the files have retained their original timestamps. This is useful because I can sort by date and easily browse chronologically regardless of each document's name.

It certainly makes sense that OCR is essentially an edit operation, and logic would dictate that the file date/time be updated after editing. For my scenario, the ideal solution would be an option to preserve the file's existing timestamp during the OCR process.
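
For anyone who wants to script this kind of behavior outside the tool, here is a minimal Python sketch of the idea: record the file's timestamps, run the in-place edit, then put the timestamps back. It is only an illustration under stated assumptions. The `edit` callback stands in for whatever OCR step you use and is hypothetical; it is not a PDF-Tools API, and the helper name `edit_preserving_mtime` is made up for this example.

```python
import os
from pathlib import Path
from typing import Callable


def edit_preserving_mtime(path: Path, edit: Callable[[Path], None]) -> None:
    """Run `edit` on `path` in place, then restore the original timestamps.

    `edit` is a placeholder for any operation that rewrites the file
    (for example an OCR step); it is an assumption of this sketch.
    """
    # Capture access and modification times before the file is touched.
    before = path.stat()
    edit(path)
    # Restore both times with nanosecond precision where the filesystem allows.
    os.utime(path, ns=(before.st_atime_ns, before.st_mtime_ns))
```

Note that this restores only the access and modification times. On Windows the file creation time is tracked separately and is normally unaffected by an in-place rewrite.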
DenisO
Site Admin
Posts: 104
Joined: Fri Jun 09, 2017 5:40 pm

Re: OCR'ing existing files / preserving timestamps

Post by DenisO »

Hi,
It may be helpful to use the %[DocInfo:ModDate] option.
There is a MacroHelper button in the right corner of the FileName field.
[Attached image: image.png]
Kind regards,
Denis Oleksenko
Software Developer
Tracker Software Products (Canada) LTD
fletch
User
Posts: 79
Joined: Wed Mar 11, 2020 2:53 am

Re: OCR'ing existing files / preserving timestamps

Post by fletch »

That's an interesting thought, but I don't really want to alter the file name.

Below is exactly what I'm looking for - from another tool I have that does OCR. The default setting here is to preserve the original date/time stamp on the file.
[Attached image: image.png]
TrackerSupp-Daniel
Site Admin
Posts: 8436
Joined: Wed Jan 03, 2018 6:52 pm

Re: OCR'ing existing files / preserving timestamps

Post by TrackerSupp-Daniel »

Hi, fletch

Thank you for the detailed description. I have created a formal feature request for you on this topic:

#5378: FR: Tools "document properties" option to preserve original timestamp

As usual, I cannot make any guarantees of implementation or timelines, but we will be sure to consider this when we are next looking at new feature requests.

Kind regards,
Dan McIntyre - Support Technician
Tracker Software Products (Canada) LTD

+++++++++++++++++++++++++++++++++++
Our Web site domain and email address has changed as of 26/10/2023.
https://www.pdf-xchange.com
Support@pdf-xchange.com
fletch
User
Posts: 79
Joined: Wed Mar 11, 2020 2:53 am

Re: OCR'ing existing files / preserving timestamps

Post by fletch »

Thank you.
TrackerSupp-Daniel
Site Admin
Posts: 8436
Joined: Wed Jan 03, 2018 6:52 pm

OCR'ing existing files / preserving timestamps

Post by TrackerSupp-Daniel »

:)
Dan McIntyre - Support Technician
Tracker Software Products (Canada) LTD

fletch
User
Posts: 79
Joined: Wed Mar 11, 2020 2:53 am

Re: OCR'ing existing files / preserving timestamps

Post by fletch »

I was certainly surprised and excited to read about this in the release notes, so I didn't hesitate for a moment to try it. Unfortunately, it's not quite working, or I'm missing something.
[Attached image: image.png]
Each file OCR'd was timestamped with the current date/time.
DenisO
Site Admin
Posts: 104
Joined: Fri Jun 09, 2017 5:40 pm

Re: OCR'ing existing files / preserving timestamps

Post by DenisO »

Hi fletch,
It is very strange that the OCR'd documents were timestamped with the current date/time.
I have just tried this PDFTools functionality and it worked as expected.
[Attached image: image.png]
The files from the test are attached:
abc1.zip
Could you please export your OCR tool and post it here or send it to support, so that we can test it on our side?
Denis Oleksenko
Software Developer
Tracker Software Products (Canada) LTD
fletch
User
Posts: 79
Joined: Wed Mar 11, 2020 2:53 am

Re: OCR'ing existing files / preserving timestamps

Post by fletch »

It would have been more helpful if you had shared the settings you used, like I did.

My guess is that you used "save new with other unique name"?

That's not feasible. I don't want to create 10,000 duplicates of existing files. I want to OCR "existing" files, leaving their name intact. The only difference on disk is that they are now OCR'd and their original last modified date is preserved.

While preserving the date/time on newly created files is a step in the right direction, it requires that I delete the old 10,000 files (scattered among 70 folders) and change their names back to what they were.

The other program I referenced above does the logical operation: it just OCR's the file, leaving the date/time intact. It doesn't require that a new file be created.
DenisO
Site Admin
Posts: 104
Joined: Fri Jun 09, 2017 5:40 pm

Re: OCR'ing existing files / preserving timestamps

Post by DenisO »

fletch wrote:
It would have been more helpful if you had shared the settings you used, like I did.
My guess is that you used "save new with other unique name"

Yes, you are right, and we have reproduced the issue. It will be fixed in the next release.
Sorry for the inconvenience. :(
Denis Oleksenko
Software Developer
Tracker Software Products (Canada) LTD
fletch
User
Posts: 79
Joined: Wed Mar 11, 2020 2:53 am

Re: OCR'ing existing files / preserving timestamps

Post by fletch »

Thanks for confirming! I'm VERY grateful this enhancement was accepted and implemented so quickly. I can wait a little longer for the paint to dry :D
TrackerSupp-Daniel
Site Admin
Posts: 8436
Joined: Wed Jan 03, 2018 6:52 pm

OCR'ing existing files / preserving timestamps

Post by TrackerSupp-Daniel »

:)
Dan McIntyre - Support Technician
Tracker Software Products (Canada) LTD

DenisO
Site Admin
Posts: 104
Joined: Fri Jun 09, 2017 5:40 pm

Re: OCR'ing existing files / preserving timestamps

Post by DenisO »

Hi fletch,
Please try the new version 351 of PDFTools.
The issue should be fixed.
Denis Oleksenko
Software Developer
Tracker Software Products (Canada) LTD
fletch
User
Posts: 79
Joined: Wed Mar 11, 2020 2:53 am

Re: OCR'ing existing files / preserving timestamps

Post by fletch »

THE ISSUE IS FIXED ! THANK YOU !
[Attached image: image.png]
[Attached image: image1.png]
Once again, thanks to everyone for implementing this so quickly! Now I can OCR 25 years of PDFs (20,000+) and retain the original modification dates of those files.

THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU

VERY MUCH.
fletch
User
Posts: 79
Joined: Wed Mar 11, 2020 2:53 am

Re: OCR'ing existing files / preserving timestamps

Post by fletch »

First set of files completed...

[1/24/2021] ===== "OCR PaperPort" tool finished: 53 errors, 11,945 files processed, 0 files created, 1 warnings, 1d 2h 34m 53s was spent =====

It's also interesting to note that 53 files were locked/protected from changes: a mixture of legal documents and credit card statements from Citi and Chase. Not a big problem, since they already contain text, as if they had already been OCR'd (during creation, I suppose).

PDF-XChange Editor also forbids OCR'ing these particular documents. It seems that OCR'ing the document and storing metadata about it would not actually modify the document's original content and should be possible/allowed, though from a purely technical perspective you are modifying the document.
TrackerSupp-Daniel
Site Admin
Posts: 8436
Joined: Wed Jan 03, 2018 6:52 pm

Re: OCR'ing existing files / preserving timestamps

Post by TrackerSupp-Daniel »

Hi, fletch

Glad to hear it is working for you. We do need to respect the document security, so if it disables editing, OCR will not work. This is more than just a technicality. OCR does not simply alter metadata (for that matter, it would not have any impact on the metadata at all); it explicitly modifies the document content, and does not offer a function to do otherwise.

This holds whether you are running the new Enhanced OCR, which cuts out sections of the image content (modifying existing content) and places new text on the page (adding content), or the old OCR engine, which only creates searchable text (this is still text content being added to the page, albeit invisible). Adding or changing any content is a modification of content, and thus is not allowed when the document security says so.

Kind regards,
Dan McIntyre - Support Technician
Tracker Software Products (Canada) LTD
