Find vs Search for renderable text

Forum for the PDF-XChange Editor - Free and Licensed Versions

janeg
User
Posts: 14
Joined: Wed Jun 14, 2017 9:03 pm

Post by janeg »

I have noticed that in some files with renderable text, the Find function locates a given term but the Search feature does not. Any idea what is causing this? I'm using identical search terms. If I OCR the pages in question, Search works. But it seems silly to have to run OCR on something that obviously already has text attached to it somewhere (otherwise Find wouldn't work either). Is this a layering issue? Do I need to flatten the file somehow?

Thanks!
Jane
Patrick-Tracker Supp
Site Admin
Posts: 1645
Joined: Thu Mar 27, 2014 6:14 pm
Location: Vancouver Island

Post by Patrick-Tracker Supp »

Hi Jane,

Thank you for your post. I suspect this may be a settings issue, but I cannot be certain. Please send us the file in question so that we can investigate.

Thank you!
If posting files to this forum, you must archive the files to a ZIP, RAR or 7z file or they will not be uploaded.
Thank you.

Cheers,

Patrick Charest
Tracker Support North America
janeg
User
Posts: 14
Joined: Wed Jun 14, 2017 9:03 pm

Post by janeg »

I can't share the doc due to confidentiality issues, but I will explain it a bit and hopefully you'll be able to follow.

I have been further investigating and think I have found out *why* it's happening, but am hoping there is a better work-around than OCR'ing the text.

I looked more closely at the Content pane. The native document reads as text, but the program is arbitrarily grouping the words into separate text boxes. I'm searching for the phrase "copy of this report", but it ends up spread across multiple text boxes, with "copy of" in one box, "this" in a box by itself, and "report" in a box with words from the next sentence.

After OCRing, a second set of text boxes appears in the Content pane, and these group the words by line, so the phrase stays together and can be found.
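
To illustrate the behaviour described above, here is a minimal Python sketch. It is not how PDF-XChange Editor is implemented; the box contents and both function names are hypothetical, and each string simply stands in for the text of one box in the Content pane. It shows why a phrase match confined to one box at a time fails before OCR and succeeds after it, and what a match over the joined boxes would do.

```python
# Hypothetical model: each string is the text held by one text box.
boxes_before_ocr = ["copy of", "this", "report. The next sentence"]
boxes_after_ocr = ["Please keep a copy of this report. The next sentence"]


def phrase_in_any_box(boxes, phrase):
    """Phrase match that never looks beyond a single text box."""
    return any(phrase in box for box in boxes)


def phrase_across_boxes(boxes, phrase):
    """Phrase match over all boxes joined in reading order."""
    return phrase in " ".join(boxes)


phrase = "copy of this report"
before_per_box = phrase_in_any_box(boxes_before_ocr, phrase)
after_per_box = phrase_in_any_box(boxes_after_ocr, phrase)
before_joined = phrase_across_boxes(boxes_before_ocr, phrase)
```

In this model the per-box match only succeeds on the OCR-style boxes, while the joined match also succeeds on the fragmented ones.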

Here is how the two features behave with the phrase:
- Find, before OCR: with the phrase in quotes, it can't be found. Without quotes, it is found.
- Find, after OCR: the phrase is found with or without quotes.
- Search, before OCR: if I use the "any of these words" box and put the phrase in quotes, it can't be found. Without quotes, Search finds the words individually, but "copy", "of", "this", and "report" appear hundreds of times, so the results are relatively meaningless. I need to be able to find it as a phrase.
- Search, after OCR: the phrase in quotes in the "any of these words" box works as expected.

I need to be able to search for phrases and use the "any of these [phrases]" box because the phrase I'm looking for might appear in a variety of ways, for example "copy of the report", "copy of your report", "copy of report", etc. I don't want to set up a separate search for each way the phrase might appear, but rather one search with all the possibilities.
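
As an illustration of "one search with all the possibilities", the variants listed above can be written as a single regular expression. This is only a sketch for text that has already been extracted from the PDF; whether the Editor's Search pane accepts a pattern like this is not established in this thread, and the sample sentences are made up.

```python
import re

# "copy of", an optional "the", "this" or "your", then "report".
# \s+ accepts any run of whitespace, including a line break.
PATTERN = re.compile(
    r"\bcopy\s+of\s+(?:(?:the|this|your)\s+)?report\b",
    re.IGNORECASE,
)

samples = [
    "Enclosed is a copy of the report.",
    "You may request a copy of your report.",
    "A copy of report was mailed.",
    "Keep a copy of\nthis report on file.",
    "We will copy the report tomorrow.",
]

# One boolean per sample: does the sample contain any variant of the phrase?
matches = [bool(PATTERN.search(sample)) for sample in samples]
```

The last sample is there to show that the individual words alone are not enough to produce a hit.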

Is there something in the settings that can force the search function to look in adjacent text boxes?

This would also help with another issue I found: even after OCR, Search can't find a phrase if it wraps to the next line. I assume this is because each line is in its own text box and Search can't look across boxes when the phrase is enclosed in quotation marks.
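
A sketch of what matching across line boundaries could look like, assuming the page text is available as one string per line (for example after exporting the text). The function name `find_wrapped` and the sample lines are hypothetical, and this says nothing about how the Editor's own Search works. It joins the lines, matches the phrase with flexible whitespace, and maps each hit back to the line it starts on.

```python
import re
from bisect import bisect_right


def find_wrapped(lines, phrase):
    """Return 1-based line numbers where phrase starts, even if it wraps."""
    # Offset of each line inside the joined text.
    starts = []
    position = 0
    for line in lines:
        starts.append(position)
        position += len(line) + 1  # +1 for the joining space
    text = " ".join(lines)

    # Allow any whitespace between the words of the phrase.
    words = [re.escape(word) for word in phrase.split()]
    pattern = re.compile(r"\s+".join(words), re.IGNORECASE)

    # bisect_right turns a character offset into a 1-based line number.
    return [bisect_right(starts, m.start()) for m in pattern.finditer(text)]


lines = [
    "Please retain a copy of",
    "this report for your records.",
    "A copy of this report is on file.",
]
found_on_lines = find_wrapped(lines, "copy of this report")
```

Here the first hit starts on line 1 and wraps onto line 2, and the second hit sits entirely on line 3.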

I hope this helps to clarify the situation. Let me know if you still have questions and thanks so much for investigating!

Jane
Willy Van Nuffel
User
Posts: 2347
Joined: Wed Jan 18, 2006 12:10 pm

Post by Willy Van Nuffel »

Have you already tried whether it works better after changing the "Proximity" setting in the Search pane (via the Options... button)?

There, you can make a choice between:
- Only adjacent words
- Words from the same paragraph
- Words from the same page
- Words from the same document
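
To make the idea of a paragraph-level proximity match concrete, here is a rough Python sketch. It is an assumption about what "Words from the same paragraph" means, not the Editor's actual logic; the function name and the sample text are hypothetical. It reports every paragraph that contains all of the words, in any order.

```python
import re


def paragraphs_with_all_words(text, words):
    """Return indexes of paragraphs that contain every word, in any order."""
    # Paragraphs are taken to be blocks separated by a blank line.
    paragraphs = [p for p in re.split(r"\n\s*\n", text) if p.strip()]
    wanted = {word.lower() for word in words}
    hits = []
    for index, paragraph in enumerate(paragraphs):
        present = set(re.findall(r"\w+", paragraph.lower()))
        if wanted <= present:
            hits.append(index)
    return hits


document = (
    "Please keep a copy of\nthis report for your records.\n\n"
    "The report was filed late.\n\n"
    "A copy of the invoice is attached to this report."
)
hits = paragraphs_with_all_words(document, ["copy", "of", "this", "report"])
```

The third paragraph is reported even though it does not contain the phrase itself, so a match of this kind can return more hits than a true phrase search.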

Based on your very clear description, it seems obvious that the problem is due to the internal structure of the PDF.
I suppose you can also detect the different text boxes via the "Edit Content" feature of PDF-XChange Editor.
janeg
User
Posts: 14
Joined: Wed Jun 14, 2017 9:03 pm

Post by janeg »

Thanks for your response, Willy.
Willy Van Nuffel wrote: Have you already tried whether it works better after changing the "Proximity" setting in the Search pane (via the Options... button)?

Thanks for the suggestion. I've experimented a bit, and setting proximity to the same paragraph is somewhat useful, but it doesn't address the fact that I need to account for multiple slight variations in the phrase(s) I'm searching for. Plus, even excluding words like "the", some of the words are too common and produce too many hits.

Willy Van Nuffel wrote: Based on your very clear description, it seems obvious that the problem is due to the internal structure of the PDF. I suppose you can also detect the different text boxes via the "Edit Content" feature of PDF-XChange Editor.

I can detect them, but how would this help me search more effectively? We're talking about documents that are hundreds of pages long.

Thanks!
Jane
Willy Van Nuffel
User
Posts: 2347
Joined: Wed Jan 18, 2006 12:10 pm

Post by Willy Van Nuffel »

The last point was just a means to see the problem, not to resolve it.

Maybe, at this moment, the best solution is to OCR the document(s).
Tracker Supp-Stefan
Site Admin
Posts: 17824
Joined: Mon Jan 12, 2009 8:07 am
Location: London

Post by Tracker Supp-Stefan »

Hi All,

Thanks for the clear description janeg.
Thanks for the help Willy!

Janeg - you have already said that the files are sensitive, but can you send one of them to support@pdf-xchange.com with a link back to this topic?
This way the file won't be posted in the public forums, and we guarantee that it will only be used for testing and then destroyed completely (including the e-mail attachment on the file server). Is this an option you can consider?

Regards,
Stefan
janeg
User
Posts: 14
Joined: Wed Jun 14, 2017 9:03 pm

Post by janeg »

I have e-mailed a redacted example page. Please let me know if you have any issues receiving it or any questions.

Thanks!
Jane
Tracker Supp-Stefan
Site Admin
Posts: 17824
Joined: Mon Jan 12, 2009 8:07 am
Location: London

Post by Tracker Supp-Stefan »

Hi Jane,

Thanks for the sample file.
To me it seems like everything is working as it should. I've given you some images showing the settings I used and the results I got. If it does not work the same way for you, please check your version of the Editor to ensure you have 322.5 installed, as this will also help with resolving the problem.

Regards,
Stefan