highlight text consistency in PDFs with OCR layer text below image
I would like to suggest a feature improvement.
Under certain circumstances, when highlighting text that has been OCR'd with the original image of the text showing on top, the highlight appears jagged. See the attachment. To improve readability, the highlight ought to be smooth. The same attachment shows how Xodo handles highlighting in the same document: a lot smoother. TY.
- TrackerSupp-Daniel
- Site Admin
- Posts: 8593
- Joined: Wed Jan 03, 2018 6:52 pm
Re: highlight text consistency in PDFs with OCR layer text below image
Hello, Kunjan
Our Highlight feature is based on the text position and font height. Did you use our OCR process, or was this OCR done by a third-party app? I ask because our OCR function should generate a more consistent text block, instead of giving the text itself jagged positioning and size, which is what causes this issue.
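As a rough illustration of why that matters (a minimal sketch only, not PDF-XChange's actual code; the `Word` type, `highlight_rect`, `is_jagged`, and the tolerance value are all hypothetical names and numbers made up for this example): if each highlight rectangle is derived purely from a word's position and font height, then any wobble in the OCR'd baseline or size shows up directly in the highlight.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    x: float          # left edge of the word
    baseline: float   # y of the text baseline (y grows upward, as in PDF)
    width: float
    font_size: float  # used here as a stand-in for the font height


def highlight_rect(word):
    """Return (x0, y0, x1, y1) derived only from position and font height."""
    return (word.x, word.baseline,
            word.x + word.width, word.baseline + word.font_size)


def is_jagged(words, tol=0.5):
    """True if the top or bottom edges of the per-word rectangles
    differ by more than tol units along the line."""
    rects = [highlight_rect(w) for w in words]
    bottoms = [r[1] for r in rects]
    tops = [r[3] for r in rects]
    return (max(bottoms) - min(bottoms) > tol) or (max(tops) - min(tops) > tol)


# A consistent OCR text block: same baseline and size across the line.
consistent = [Word("highlight", 10, 100, 40, 10), Word("text", 55, 100, 20, 10)]

# A noisy OCR result: the second word sits higher and is larger.
noisy = [Word("highlight", 10, 100, 40, 10), Word("text", 55, 103, 20, 12.5)]
```

Here `is_jagged(consistent)` is false and `is_jagged(noisy)` is true, which is the visual difference between the two highlights in the attachment.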
Kind regards,
Dan McIntyre - Support Technician
Tracker Software Products (Canada) LTD
+++++++++++++++++++++++++++++++++++
Our Web site domain and email address has changed as of 26/10/2023.
https://www.pdf-xchange.com
Support@pdf-xchange.com
Re: highlight text consistency in PDFs with OCR layer text below image
Hello Daniel
I believe it was OCR'd via FileCenter DMS software.
I ran the OCR function in PDF-XChange Editor and yes, the highlighted text is no longer jagged. And it's a pleasant surprise that none of my previously highlighted text was "un-highlighted" in the process. Thank you.
It's interesting that Xodo managed it, however. If you are curious, I am happy to send you a few pages.
Thanks again.
- TrackerSupp-Daniel
Re: highlight text consistency in PDFs with OCR layer text below image
Hello, Kunjan
It wouldn't hurt to have a sample to take a look at. Best-case scenario, the Devs may be able to find a way to keep our highlights a bit more consistent after seeing this.
Kind regards,
Dan McIntyre - Support Technician
Re: highlight text consistency in PDFs with OCR layer text below image
Hello Daniel
Please see attached. Regards, Kunjan
- TrackerSupp-Daniel
Re: highlight text consistency in PDFs with OCR layer text below image
Hello, Kunjan
Thank you for the files. The quality of the FileCenter OCR on this is, quite frankly, extremely bad. In this screenshot we have made the invisible "searchable" text visible on top of the scanned page for comparison. The "off center" text you see in the screenshot is why the highlight is jagged, and unfortunately there is little we can do here. In the past, when we took Xodo's approach and normalized the highlight across an entire line of text, we received complaints that it generalized too much: many users (especially those in mathematics/science fields, as well as authors/editors) found that highlighting a line containing equations or publishing "drop caps", respectively, caused multiple lines of text to be highlighted. As such, it is not likely we will be able to make any changes here without reverting that critical change.
The best solution for you would be to re-do the OCR with a more capable engine whenever you notice such issues (or possibly avoid the FileCenter OCR entirely and use a more powerful engine from the start), to minimize how often you encounter this. I have passed this along to the Dev team, and they are discussing it right now, but at this time it is not looking likely that any changes will be made. I just wanted to give you an honest answer instead of instilling false hope.
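To make the trade-off concrete, here is a minimal sketch of per-line normalization (an illustration only, not our implementation or Xodo's; `normalize_line` and the coordinates are hypothetical). Rectangles are `(x0, y0, x1, y1)` with y growing upward, and every rectangle on the line is stretched to the line's overall vertical extent.

```python
def normalize_line(rects):
    """Stretch every rectangle on a line to the line's full vertical extent."""
    y0 = min(r[1] for r in rects)
    y1 = max(r[3] for r in rects)
    return [(x0, y0, x1, y1) for (x0, _, x1, _) in rects]


# A slightly jagged OCR line: normalization smooths it, as desired.
jagged = [(10, 100, 50, 110), (55, 102, 75, 113)]
smooth = normalize_line(jagged)

# A line that starts with a drop cap reaching down across the next two
# lines of body text (body height 10, drop cap height 34).
with_drop_cap = [(0, 76, 18, 110), (20, 100, 60, 110), (65, 100, 90, 110)]
inflated = normalize_line(with_drop_cap)
```

In the first case both rectangles end up spanning y = 100 to 113, which is the smooth result. In the second, every word on the line is stretched to span y = 76 to 110, so highlighting one line of body text also covers the lines beneath it. That is the behaviour the earlier complaints were about.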
Kind regards,
Dan McIntyre - Support Technician
Re: highlight text consistency in PDFs with OCR layer text below image
Hello Daniel
Wow, that is bad...!!! Thankfully it's easy to re-OCR using the Tracker engine without losing highlights.
Thank you for the extra information. It makes a big difference, as one relies on Windows indexing to index the text.
May I ask how you made the text visible?
Regards, Kunjan
-
- User
- Posts: 2393
- Joined: Wed Jan 18, 2006 12:10 pm
Re: highlight text consistency in PDFs with OCR layer text below image
How to make the OCR-ed text visible?
- Activate the Content-pane (via the View-ribbon > Panes > Content)
- In the Content-pane, in the toolbar, click Options > Select > Text
- Go into the Text/Arrange tab/ribbon that now appears
- Change the "Fill Color" (first icon in the ribbon) from 'none' to a color you would like the text to appear in
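For anyone who prefers to script the same idea outside the Editor, here is a rough sketch. It is not what the Editor does internally. It assumes the OCR layer is hidden with text rendering mode 3 (`3 Tr`, the PDF operator setting for invisible text) and that the page content stream has already been decompressed to plain text by some other tool. `make_text_visible` is a hypothetical helper name. If the OCR tool instead hides the text by drawing the scanned image over it, this changes nothing.

```python
import re

# Matches the operand/operator pair "3 Tr" (text rendering mode 3 = invisible).
# The lookbehind avoids touching other numbers that merely end in 3.
_INVISIBLE_TR = re.compile(r"(?<![\w.+-])3(\s+)Tr\b")


def make_text_visible(content_stream: str) -> str:
    """Switch invisible text (mode 3) to ordinary filled text (mode 0).

    Naive by design: it does not parse the stream, so a literal "3 Tr"
    inside a string operand would also be rewritten.
    """
    return _INVISIBLE_TR.sub(r"0\g<1>Tr", content_stream)


sample = "BT /F1 10 Tf 3 Tr 72 700 Td (Hello) Tj ET"
visible = make_text_visible(sample)
```

After the call, `visible` contains `0 Tr` in place of `3 Tr`, so a viewer would paint the text with the current fill color on top of whatever was drawn before it.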
Kind regards.
- TrackerSupp-Daniel
highlight text consistency in PDFs with OCR layer text below image
Dan McIntyre - Support Technician
Re: highlight text consistency in PDFs with OCR layer text below image
TY Willy and Daniel
- Paul - Tracker Supp
- Site Admin
- Posts: 6897
- Joined: Wed Mar 25, 2009 10:37 pm
- Location: Chemainus, Canada
highlight text consistency in PDFs with OCR layer text below image
Best regards
Paul O'Rorke
Tracker Support North America
http://www.tracker-software.com