OCR'ing existing files / preserving timestamps
I doubt this is possible, but I wanted to check just in case. I have 20+ years of documents to OCR. As they have migrated from one machine to another over the years, the files have retained their original timestamps. This is useful, as I can sort by date and easily browse chronologically regardless of the document's name.
It certainly makes sense that OCR is essentially an edit operation, and logic would dictate that the file date/time is updated after an edit. For my scenario, the ideal solution would be an option to preserve the file's existing timestamp during the OCR process.
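To make the request concrete, here is a minimal sketch of the general technique: record the file's timestamps, perform the in-place edit, then write the original timestamps back. It is only an illustration in Python, not how any particular OCR tool implements the option. The names `modify_preserving_times` and `fake_ocr` are hypothetical, and `fake_ocr` merely appends bytes as a stand-in for a real OCR step.

```python
import os
import tempfile
from typing import Callable


def modify_preserving_times(path: str, modify: Callable[[str], None]) -> None:
    """Run an in-place modification, then restore the original timestamps."""
    before = os.stat(path)  # capture access/modification times before editing
    modify(path)            # hypothetical stand-in for an in-place OCR step
    # Restore with nanosecond precision so date sorting is unaffected.
    os.utime(path, ns=(before.st_atime_ns, before.st_mtime_ns))


def fake_ocr(path: str) -> None:
    """Placeholder edit: appends bytes so the file really changes on disk."""
    with open(path, "ab") as f:
        f.write(b"\n% a text layer would be added here\n")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        pdf = os.path.join(tmp, "scan.pdf")
        with open(pdf, "wb") as f:
            f.write(b"%PDF-1.4\n")
        # Backdate the file to 2000-01-01 00:00:00 UTC to mimic an old scan.
        old = 946_684_800 * 10**9
        os.utime(pdf, ns=(old, old))
        modify_preserving_times(pdf, fake_ocr)
        print("modification time preserved:", os.stat(pdf).st_mtime_ns == old)
```

Note that `os.utime` only sets the access and modification times; a file's creation time (on systems that track one) is not touched by this sketch.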
Re: OCR'ing existing files / preserving timestamps
Hi
It may be helpful to use the %[DocInfo:ModDate] option.
There is a button with the MacroHelper in the right corner of the FileName field.
Kind regards,
Denis Oleksenko
Software Developer
Tracker Software Products (Canada) LTD
Re: OCR'ing existing files / preserving timestamps
That's an interesting thought but I don't really want to alter the file name.
Below is exactly what I'm looking for - from another tool I have that does OCR. The default setting here is to preserve the original date/time stamp on the file.
- TrackerSupp-Daniel
- Site Admin
- Posts: 8588
- Joined: Wed Jan 03, 2018 6:52 pm
Re: OCR'ing existing files / preserving timestamps
Hi, fletch
Thank you for the detailed description. I have created a formal feature request for you on this topic:
#5378: FR: Tools "document properties" option to preserve original timestamp
As usual, I cannot make any guarantees of implementation or timelines, but we will be sure to consider this when we are next looking at new feature requests.
Kind regards,
Dan McIntyre - Support Technician
Tracker Software Products (Canada) LTD
+++++++++++++++++++++++++++++++++++
Our Web site domain and email address has changed as of 26/10/2023.
https://www.pdf-xchange.com
Support@pdf-xchange.com
Re: OCR'ing existing files / preserving timestamps
I was certainly surprised and excited to read about this in the release notes, so I didn't hesitate one moment to try it. Unfortunately it's not quite working or I'm missing something.
Each file OCR'd was timestamped with the current date/time.
Re: OCR'ing existing files / preserving timestamps
Hi fletch,
It is very strange that the OCRed documents were timestamped with the current date and time.
I have just tried this PDFTools functionality and it worked as expected. In the attachment you can find the files from the test. Could you please export your OCR Tool and post it here or send it to support, so we can test it on our side?
Denis Oleksenko
Software Developer
Tracker Software Products (Canada) LTD
Re: OCR'ing existing files / preserving timestamps
It would have been more helpful if you had shared the settings you used, like I did.
My guess is that you used "save new with other unique name"?
That's not feasible. I don't want to create 10,000 duplicates of existing files. I want to OCR "existing" files, leaving their name intact. The only difference on disk is that they are now OCR'd and their original last modified date is preserved.
While preserving the date/time on newly created files is a step in the right direction, it requires that I delete the old 10,000 files (scattered among 70 folders) and change their names back to what they were.
The other program I referenced above does the logical operation: it just OCRs the file, leaving the date/time intact. It doesn't require that a new file be created.
Re: OCR'ing existing files / preserving timestamps
Yes, you are right, we've reproduced the issue. It will be fixed in the next release.
Sorry for the inconvenience.
Denis Oleksenko
Software Developer
Tracker Software Products (Canada) LTD
Re: OCR'ing existing files / preserving timestamps
Thanks for confirming! I'm VERY grateful this enhancement was accepted and implemented so quickly. I can wait a little longer for the paint to dry.
Re: OCR'ing existing files / preserving timestamps
Hi fletch,
Please try the new 351 version of PDFTools.
The issue should be fixed.
Denis Oleksenko
Software Developer
Tracker Software Products (Canada) LTD
Re: OCR'ing existing files / preserving timestamps
THE ISSUE IS FIXED! THANK YOU!
Once again, thanks to everyone for implementing this so quickly. Now I can OCR 25 years of PDFs (20,000+) and retain the original modification dates of those files.
THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU - THANK YOU
VERY MUCH.
Re: OCR'ing existing files / preserving timestamps
First set of files completed...
[1/24/2021] ===== "OCR PaperPort" tool finished: 53 errors, 11,945 files processed, 0 files created, 1 warnings, 1d 2h 34m 53s was spent =====
It's also interesting to note that I had 53 files that were locked/protected from changes: a mixture of legal documents and credit card statements from Citi and Chase. Not a big problem, since they already contain text, as if they were already OCR'd, presumably during creation.
PDF-XChange Editor also forbids OCR'ing these particular documents. It seems that OCR'ing the document and storing metadata about it would not actually modify the document's original content and should be allowed, though from a purely technical perspective you are modifying the document.
Re: OCR'ing existing files / preserving timestamps
Hi, fletch
Glad to hear it is working for you. We do need to respect the document security, so if it disables editing, OCR will not work. This is more than just a technical point. OCR does not simply alter metadata (for that matter, it would not have any impact on the metadata at all); it explicitly modifies the document content, and does not offer a function to do otherwise.
This holds whether you are running the new Enhanced OCR, which cuts out sections of the image content (modifying existing content) and places new text on the page (adding content), or the old OCR engine, which only creates searchable text (this is still text content being added to the page, albeit invisible). Adding or changing any content is a modification of content, and thus is not allowed when the document security says so.
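As a rough illustration of what "the document security says so" can mean in practice: in the PDF standard security handler, permissions are stored as an integer flag field (`/P`) in the encryption dictionary, and bit 4 (counting from 1) governs modifying document contents. The sketch below only shows how that bit can be tested; the function name is hypothetical, and it is an assumption that this bit alone reflects the check a given OCR tool performs.

```python
# Bit 4 (1-based) of the /P permission flags: "modify the contents".
MODIFY_CONTENTS = 1 << 3


def allows_content_changes(p_flags: int) -> bool:
    """Return True if the /P flags permit modifying document contents.

    /P is stored as a signed 32-bit integer, so typical values are negative.
    Python's bitwise AND treats negative ints as two's complement, which is
    why the mask can be applied directly.
    """
    return bool(p_flags & MODIFY_CONTENTS)


if __name__ == "__main__":
    # -44 sets the print and copy bits but leaves "modify contents" clear.
    print(allows_content_changes(-44))
    # -4 sets every permission bit.
    print(allows_content_changes(-4))
```

A file whose flags clear this bit would be expected to reject OCR for the reason given above: both OCR engines add or change page content.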
Kind regards,
Dan McIntyre - Support Technician
Tracker Software Products (Canada) LTD