OCR broken in V8 (used in Windows Explorer)

Forum for the PDF-XChange Editor - Free and Licensed Versions

DWC121
User
Posts: 66
Joined: Thu Jul 30, 2015 5:18 am

OCR broken in V8 (used in Windows Explorer)

Post by DWC121 »

Greetings,

I read a post saying there were problems with OCR in version 8.0. Perhaps my problem is the same.

Text can be found when the PDF is open, but when I search for that text in Windows Explorer, the PDF file is not found. Indexing is turned on. Explorer finds PDFs created in previous versions of PDF-XChange, but not PDFs created in version 8. I have tried both "Enhanced Scanned Pages" and "OCR Page(s)...".

I have attached a document. Hopefully I have removed personal information from the original file. The attachment does NOT have OCR applied to the file.

Luckily I have version V7 on a USB elsewhere. I use it at work for personal things.

Thanks - David
Attachments
2019-03-OCRTEST.pdf
(84.71 KiB) Downloaded 80 times
TrackerSupp-Daniel
Site Admin
Posts: 8436
Joined: Wed Jan 03, 2018 6:52 pm

Re: OCR broken in V8 (used in Windows Explorer)

Post by TrackerSupp-Daniel »

Hello David,

Thank you for the report. I do not seem to be having the same trouble when searching for OCR'd content. Can you please go through the steps here and see if they help at all?
https://www.pdf-xchange.com/knowle ... extensions

I would suggest checking for updates as well; we have just released build 331 of the Editor.
I also want to ask: why are you running OCR on this document? All of its content is already text and is searchable as is.

Kind regards,
Dan McIntyre - Support Technician
Tracker Software Products (Canada) LTD

+++++++++++++++++++++++++++++++++++
Our Web site domain and email address has changed as of 26/10/2023.
https://www.pdf-xchange.com
Support@pdf-xchange.com
DWC121
User
Posts: 66
Joined: Thu Jul 30, 2015 5:18 am

Re: OCR broken in V8 (used in Windows Explorer)

Post by DWC121 »

Daniel,

Thank you for the quick response.

Unfortunately I still have the problem. I used the instructions in the link (set to "none", applied, then re-established the options, applied). I downloaded build 331 of the editor.

I made a copy of the original PDF and opened it with PDF-XChange v8. I selected "Document>>OCR Page(s)...". The only option selected is "Ignore comments on page". Output Options is set to "Searchable image". After the OCR finished I saved the document. It is attached as 2019-03-OCRTESTv8.pdf. In Windows Explorer I searched for the word "Conklingville" (without the quotes). The PDF was NOT found.

I copied the original PDF and opened it with my portable version of PDF-XChange v7. I selected "Document>>OCR Page(s)...". Accuracy = Medium, Output Type = Preserve Original Content and Add Text Layer. After the OCR finished I saved the document. It is attached as 2019-03-OCRTESTv7.pdf. In Windows Explorer I searched for the word "Conklingville" (without the quotes). The PDF WAS found.

The word "Conklingville" is within a form field. In Windows Explorer I also searched for the word "Please", which appears in the PDF (but not in a form field). Windows Explorer found BOTH files.

In the past, I have had to run OCR to get Windows Explorer to index all the words in the pdf including the words in the form fields. Without running OCR, the only words Windows Explorer would find are words NOT in the form fields.

David
Attachments
2019-03-OCRTESTv8.pdf
(88.01 KiB) Downloaded 73 times
2019-03-OCRTESTv7.pdf
(87.85 KiB) Downloaded 74 times
TrackerSupp-Daniel
Site Admin
Posts: 8436
Joined: Wed Jan 03, 2018 6:52 pm

Re: OCR broken in V8 (used in Windows Explorer)

Post by TrackerSupp-Daniel »

Hello DWC121,

Thank you for the details there. I see what you mean now, and have informed the Dev team about this issue. I cannot speak for when it will be resolved, but you should be able to properly search for attachment names in the near future.

As for searching through form field content: running OCR like this is a troublesome way to accomplish that, so I have created a formal feature request to allow Windows Explorer to search through form fields directly. This should save you a few steps and keep your files smaller, with less content to sift through. Again, I cannot provide a timeline, but it should be coming in the future.

Kind regards,
Dan McIntyre - Support Technician
Tracker Software Products (Canada) LTD

Vasyl-Tracker Dev Team
Site Admin
Posts: 2352
Joined: Thu Jun 30, 2005 4:11 pm
Location: Canada

Re: OCR broken in V8 (used in Windows Explorer)

Post by Vasyl-Tracker Dev Team »

Hi David.

The problem occurred because you enabled the option "Ignore comments on page". The text you want to find is inside widgets (form fields), so that option excluded it from the OCR process.
There is also a second problem, in our IFilter implementation, which provides data to the system's search feature: it skips the data in comments and form fields. As a result, the system cannot search inside comments and fields directly. We will certainly fix this issue in the next build. After that, the system will be able to search in comments and fields too, even if the document was not OCR-ed before the search.
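
The two effects described above can be illustrated with a small toy model in Python. This is only a sketch to show the logic, not Tracker's actual code: the names `ToyPdf`, `run_ocr`, `ifilter_words`, and the `reads_fields` switch are invented for illustration, and real OCR and IFilter behaviour is far more involved.

```python
from dataclasses import dataclass, field


@dataclass
class ToyPdf:
    page_text: list[str]   # words in ordinary page content
    field_text: list[str]  # words inside form fields / comments
    ocr_layer: list[str] = field(default_factory=list)  # hidden text added by OCR


def run_ocr(doc: ToyPdf, ignore_comments: bool) -> ToyPdf:
    """Toy OCR: 'recognize' whatever is rendered on the page."""
    recognized = list(doc.page_text)
    if not ignore_comments:
        # Fields and comments are drawn on the page, so OCR picks them up too.
        recognized += doc.field_text
    return ToyPdf(doc.page_text, doc.field_text, recognized)


def ifilter_words(doc: ToyPdf, reads_fields: bool = False) -> set[str]:
    """Toy IFilter: the words handed to the system indexer."""
    words = set(doc.page_text) | set(doc.ocr_layer)
    if reads_fields:
        # Hypothetical behaviour after the planned fix.
        words |= set(doc.field_text)
    return words


form = ToyPdf(page_text=["Please"], field_text=["Conklingville"])

ignored = ifilter_words(run_ocr(form, ignore_comments=True))
included = ifilter_words(run_ocr(form, ignore_comments=False))

print("Conklingville" in ignored)   # field text never reached the OCR layer
print("Conklingville" in included)  # field text was copied into the OCR layer
```

In this model the word from the form field becomes searchable only when OCR is allowed to see the fields, or when the filter itself reads them (`reads_fields=True`), which is the situation the planned fix is meant to produce.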

HTH
Vasyl Yaremyn
Tracker Software Products
Project Developer

Please archive any files posted to a ZIP, 7z or RAR file or they will be removed and not posted.
DWC121
User
Posts: 66
Joined: Thu Jul 30, 2015 5:18 am

Re: OCR broken in V8 (used in Windows Explorer)

Post by DWC121 »

HTH (and Daniel),

Thanks for the reply.

I thought "Ignore comments on page" meant only data in the comment property. Does "Ignore comments on page" mean both comment and content properties?

IFilter... something must have changed. It worked in version 7 for data in form fields.

Before you go searching for a potential problem with the IFilter, let me try v8 again with "Ignore comments on page" UN-checked.

David
Vasyl-Tracker Dev Team
Site Admin
Posts: 2352
Joined: Thu Jun 30, 2005 4:11 pm
Location: Canada

Re: OCR broken in V8 (used in Windows Explorer)

Post by Vasyl-Tracker Dev Team »

I thought "Ignore comments on page" meant only data in the comment property. Does "Ignore comments on page" mean both comment and content properties?
Yes, "Ignore comments on page" means ignoring text data from comments (and fields) only. The problem is that, for example, the text "Photo: Two story home in Conklingville before the Conklingville Dam" is text data inside the corresponding form field, not page content.
Vasyl Yaremyn
Tracker Software Products
Project Developer

DWC121
User
Posts: 66
Joined: Thu Jul 30, 2015 5:18 am

Re: OCR broken in V8 (used in Windows Explorer)

Post by DWC121 »

After I run OCR, the results (including words in form fields) would appear in the Content pane, so I figured it was "Content" data. I thought "Comments" were something entirely different so that is why I told OCR to ignore "Comments". I learned something today. :D

Today I copied my original file and OCR'd it with "Ignore comments on page" UN-checked. That did the trick. I can now find the words in Windows Explorer. If you want to look at the file, it is attached as 2019-03-OCRTESTv8OCRnoOPT.pdf.

I'm going to have to redo some thinking. The original blank form has no fields filled in. I applied OCR to it when I created the form. When I use the original blank form I copy it and give it a new name, fill in the fields, and apply OCR (again). I can see now I really did not need to OCR the original blank form.

David
Attachments
2019-03-OCRTESTv8OCRnoOPT.pdf
(90.43 KiB) Downloaded 65 times
Will - Tracker Supp
Site Admin
Posts: 6815
Joined: Mon Oct 15, 2012 9:21 pm
Location: London, UK

Re: OCR broken in V8 (used in Windows Explorer)

Post by Will - Tracker Supp »

Hi David,

That's correct - OCRing should really only be done once. You should be fine to create the fields and then apply the OCR. As Vasyl mentioned, the next release should add the ability to search form fields and comments via Explorer.

Cheers,
If posting files to this forum, you must archive the files to a ZIP, RAR or 7z file or they will not be uploaded.
Thank you.

Best regards

Will Travaglini
Tracker Support (Europe)
Tracker Software Products Ltd.
http://www.tracker-software.com