OCR broken in V8 (used in Windows Explorer)

Posted: Tue Apr 23, 2019 1:00 pm
by DWC121
Greetings,

I read a posting saying there were problems with OCR in version 8.0. Perhaps my problem is the same.

The text can be found when the PDF is open, but when I search for it in Windows Explorer, the PDF file is not found. Indexing is turned on. Explorer finds PDFs created in previous versions of PDF-XChange, but not PDFs created in version 8. I have tried both "Enhanced Scanned Pages" and "OCR Page(s)...".

I have attached a document. Hopefully I have removed personal information from the original file. The attachment does NOT have OCR applied to the file.

Luckily I have version V7 on a USB elsewhere. I use it at work for personal things.

Thanks - David

Re: OCR broken in V8 (used in Windows Explorer)

Posted: Tue Apr 23, 2019 5:02 pm
by TrackerSupp-Daniel
Hello David,

Thank you for the report. I do not seem to be having the same trouble when searching for OCR'd content. Can you please go through the steps here and see if they help at all?
https://www.pdf-xchange.com/knowle ... extensions

I would suggest checking for updates as well; we have just released build 331 of the Editor.
I also want to ask, why are you running OCR on this document? All of its content is already text content, and is indeed searchable as is.

Kind regards,

Re: OCR broken in V8 (used in Windows Explorer)

Posted: Wed Apr 24, 2019 1:32 am
by DWC121
Daniel,

Thank you for the quick response.

Unfortunately I still have the problem. I used the instructions in the link (set to "none", applied, then re-established the options, applied). I downloaded build 331 of the editor.

I made a copy of the original PDF and opened it with PDF-XChange v8. I selected "Document>>OCR Page(s)...". The only option selected was "Ignore comments on page", with Output Options set to "Searchable image". After scanning I saved the document. It is attached as 2019-03-OCRTESTv8.pdf. In Windows Explorer I searched for the word "Conklingville" (without the quotes). The PDF was NOT found.

I copied the original PDF and opened it with my portable version of PDF-XChange v7. I selected "Document>>OCR Page(s)...", with Accuracy = Medium and Output Type = Preserve Original Content and Add Text Layer. After scanning I saved the document. It is attached as 2019-03-OCRTESTv7.pdf. In Windows Explorer I searched for the word "Conklingville" (without the quotes). The PDF WAS found.

The word "Conklingville" is within a form field. In Windows Explorer I also searched for the word "Please", which appears in the PDF (but not in a form field). Windows Explorer found BOTH files.

In the past, I have had to run OCR to get Windows Explorer to index all the words in the pdf including the words in the form fields. Without running OCR, the only words Windows Explorer would find are words NOT in the form fields.

David

Re: OCR broken in V8 (used in Windows Explorer)

Posted: Wed Apr 24, 2019 5:56 pm
by TrackerSupp-Daniel
Hello DWC121,

Thank you for the details there. I see what you mean now, and have informed the Dev team about this issue. I cannot speak for when it will be resolved, but you should be able to properly search for attachment names in the near future.

As for searching through form field content: since running OCR like this is a troublesome way to accomplish that, I have created a formal feature request to allow Windows Explorer to search through form fields directly. This should save you a few steps and keep your files smaller, with less content to sift through. Again, I cannot provide a timeline, but it should be coming in the future.

Kind regards,

Re: OCR broken in V8 (used in Windows Explorer)

Posted: Wed Apr 24, 2019 6:56 pm
by Vasyl-Tracker Dev Team
Hi David.

The problem occurred because you enabled the option "Ignore comments on page". The text you want to find is inside widgets (form fields), so you excluded it from the OCR process.
There is also a problem with our IFilter implementation, which provides data for the system's search feature: it skips the data from comments and form fields. As a result, the system cannot search inside comments and fields directly. We will fix this issue in the next build. After that, the system will be able to search in comments and fields too, even if the document was not OCR'd before the search.
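
The interaction described above can be sketched as a toy model in Python. This is only an illustration of the reasoning in this thread, not PDF-XChange's actual OCR or IFilter code: the names `ToyPdf`, `run_ocr`, `searchable_words` and the flag `ifilter_reads_fields` are all hypothetical, and the model assumes that OCR recognises whatever is rendered on the page and that the current IFilter exposes only page content plus any OCR text layer.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class ToyPdf:
    """Hypothetical stand-in for a PDF: three separate places text can live."""
    page_text: List[str]                                  # ordinary page content
    field_text: List[str]                                 # text inside form fields (widgets)
    ocr_layer: List[str] = field(default_factory=list)    # text layer added by OCR


def run_ocr(doc: ToyPdf, ignore_comments: bool) -> ToyPdf:
    """Assumed behaviour: OCR recognises what is rendered on the page.

    With ignore_comments=True, comments and form fields are left out of
    the rendering, so their text never reaches the OCR text layer.
    """
    recognised = list(doc.page_text)
    if not ignore_comments:
        recognised.extend(doc.field_text)
    return ToyPdf(doc.page_text, doc.field_text, recognised)


def searchable_words(doc: ToyPdf, ifilter_reads_fields: bool = False) -> Set[str]:
    """Assumed behaviour: the IFilter hands page content and the OCR layer
    to the system index, and skips form fields unless the flag is set
    (the flag stands for the fix promised for the next build)."""
    sources = doc.page_text + doc.ocr_layer
    if ifilter_reads_fields:
        sources = sources + doc.field_text
    return {word.strip('.,:;"').lower() for text in sources for word in text.split()}


form = ToyPdf(
    page_text=["Please complete this form"],
    field_text=["Photo: Two story home in Conklingville"],
)
ocr_ignoring = run_ocr(form, ignore_comments=True)     # option checked
ocr_including = run_ocr(form, ignore_comments=False)   # option un-checked

print("conklingville" in searchable_words(ocr_ignoring))
print("conklingville" in searchable_words(ocr_including))
print("conklingville" in searchable_words(form, ifilter_reads_fields=True))
```

In this model a word in page content, such as "Please", is always searchable, while a word that exists only in a form field becomes searchable either by running OCR with "Ignore comments on page" un-checked or by an IFilter that reads fields directly.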

HTH

Re: OCR broken in V8 (used in Windows Explorer)

Posted: Wed Apr 24, 2019 7:15 pm
by DWC121
HTH (and Daniel),

Thanks for the reply.

I thought "Ignore comments on page" meant only data in the comment property. Does "Ignore comments on page" mean both comment and content properties?

IFilter... something must have changed. It worked in version 7 for data in form fields.

Before you go searching for a potential problem with the IFilter, let me try v8 again with "Ignore comments on page" UN-checked.

David

Re: OCR broken in V8 (used in Windows Explorer)

Posted: Wed Apr 24, 2019 7:56 pm
by Vasyl-Tracker Dev Team
DWC121 wrote: I thought "Ignore comments on page" meant only data in the comment property. Does "Ignore comments on page" mean both comment and content properties?
Yes, "Ignore comments on page" ignores text data from comments (and fields) only. The problem is that, for example, the text "Photo: Two story home in Conklingville before the Conklingville Dam" is text data inside the corresponding form field, not page content.

Re: OCR broken in V8 (used in Windows Explorer)

Posted: Wed Apr 24, 2019 11:48 pm
by DWC121
After I run OCR, the results (including words in form fields) would appear in the Content pane, so I figured it was "Content" data. I thought "Comments" were something entirely different so that is why I told OCR to ignore "Comments". I learned something today. :D

Today I copied my original file and OCR'd it with "Ignore comments on page" UN-checked. That did the trick. I can now find the words in Windows Explorer. If you want to look at the file, it is attached as 2019-03-OCRTESTv8OCRnoOPT.pdf.

I'm going to have to redo some thinking. The original blank form has no fields filled in. I applied OCR to it when I created the form. When I use the original blank form I copy it and give it a new name, fill in the fields, and apply OCR (again). I can see now I really did not need to OCR the original blank form.

David

Re: OCR broken in V8 (used in Windows Explorer)

Posted: Thu Apr 25, 2019 9:54 am
by Will - Tracker Supp
Hi David,

That's correct: OCR should really only be run once. You should be fine to create the fields and then apply the OCR. As Vasyl mentioned, the next release should add the ability to search form fields and comments via Explorer.

Cheers,