Issue with text selection and text blocks

PDF-XChange Editor SDK for Developers

SMan
User
Posts: 23
Joined: Tue May 15, 2018 10:18 am

Post by SMan »

Hello all,

In the software our company makes, the PDF-XChange Editor SDK is used to display PDF files, generally to everyone's satisfaction.

Recently, customer feedback has made us aware of an issue we have when displaying certain auto-generated documents with the Editor SDK (it is also reproducible with the standalone Editor). These PDF documents result from scans processed by OCR software. A sample document (showing no customer data, only that of our own company) is attached.

In these documents, the OCR software has turned some table borders (vertical ones, in particular) into very tall text blocks. Because of that, when we manually select the text of certain lines with the text selection tool, the highlighted area is much taller than the actual text. In some cases you then cannot avoid selecting other, unwanted text along with it, or you are prevented from selecting the text you actually want because it is "hidden" behind another text block. Other PDF viewers we tested (Adobe, Chrome, Firefox...) don't show this behavior.

In the example file, try selecting the line containing the word "Gesamtbetrag". Then try selecting just the total "1703,05 €".

If we manually edit the file with the PDF-XChange Editor (via the Content pane), we can access the individual text blocks of the text layer that the OCR software added. Some of these have no printable (non-whitespace) text content, and when we select them in the Content pane, we can see that each of them aligns with one of the table's grid lines. If we delete these manually (in particular the ones aligned with vertical grid lines), text selection works as expected afterwards. But manually editing the PDF files is of course not a viable solution for our customers.
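The criterion applied by hand above (ignore text blocks whose content is empty or whitespace only) can be sketched as a simple filter. This is a minimal illustration assuming the page's text layer has already been extracted into a list of blocks with their text and bounding box; `TextBlock`, `has_printable_content` and `selectable_blocks` are hypothetical names and are not part of the PDF-XChange Editor SDK.

```python
from dataclasses import dataclass


@dataclass
class TextBlock:
    """Hypothetical stand-in for one text block of an OCR text layer (not an SDK type)."""
    text: str
    x0: float  # left edge, page units
    y0: float  # bottom edge
    x1: float  # right edge
    y1: float  # top edge


def has_printable_content(block: TextBlock) -> bool:
    # A block is kept if it contains at least one printable, non-whitespace character.
    return any(ch.isprintable() and not ch.isspace() for ch in block.text)


def selectable_blocks(blocks):
    """Return only the blocks that should take part in manual text selection."""
    return [b for b in blocks if has_printable_content(b)]


if __name__ == "__main__":
    page = [
        TextBlock("Gesamtbetrag", 50, 700, 140, 712),
        TextBlock("1703,05 €", 450, 700, 510, 712),
        # Tall, thin blocks: vertical grid lines that the OCR turned into "text".
        TextBlock("   ", 300, 100, 301, 780),
        TextBlock("\u00a0\t", 400, 100, 401, 780),
    ]
    for b in selectable_blocks(page):
        print(b.text)
```

Filtering on content rather than on geometry keeps the rule simple; a viewer could additionally check for tall, narrow bounding boxes, but that heuristic is not something the thread confirms.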

Is there a fix for this behavior, or perhaps an option to ignore text blocks without printable content during manual text selection?

Thanks for any assistance.

example.pdf (272.39 KiB)
Vasyl-Tracker Dev Team
Site Admin
Posts: 2351
Joined: Thu Jun 30, 2005 4:11 pm
Location: Canada

Re: Issue with text selection and text blocks

Post by Vasyl-Tracker Dev Team »

Hi SMan.

Sorry for the delayed answer.

As I understand it, the example document was OCR'ed without using the Editor's OCR? With our OCR we couldn't reproduce the same incorrect text content as in your example.pdf...

Anyway, it seems our text selection mechanism has trouble with some 'strange' text content. We will try to fix it in the near future.

Cheers.
Vasyl Yaremyn
Tracker Software Products
Project Developer

SMan
User
Posts: 23
Joined: Tue May 15, 2018 10:18 am

Re: Issue with text selection and text blocks

Post by SMan »

Hello Vasyl,

Thank you for your reply. Yes, the OCR was done by third party software, not the Editor's OCR.

Looking forward to the fix in the PDF-XChange Editor (SDK).

Best regards,
Sven (SMan)
Sasha - Tracker Dev Team
User
Posts: 5522
Joined: Fri Nov 21, 2014 8:27 am

Re: Issue with text selection and text blocks

Post by Sasha - Tracker Dev Team »

Hello SMan,

Well, the problem lies in the incorrect text blocks created by the third-party OCR software, which is why text selection does not work as intended. Please try using our OCR engine for this instead; I'm sure it will give better results.
Other than that, we'll have to wait for Vasyl on this one.

Cheers,
Alex