Hello all,
in the software that our company makes, PDF-XChange Editor SDK is used to display PDF files, generally to everyone's satisfaction.
Recently, customer feedback has made us aware of an issue that we have displaying certain auto-generated documents with the Editor SDK (also reproducible with the standalone Editor). These PDF documents result from scans processed by OCR software. A sample document (showing no customer data but that of our own company) is attached.
In these documents, the OCR software has turned some table borders (vertical ones in particular) into very tall text blocks. Because of that, manually selecting the text of certain lines with the text selection tool highlights areas much taller than the actual text. In some cases you cannot avoid selecting other, unwanted text along with it, or you cannot select the text you actually want because it is "hidden behind" another text block. The other PDF viewers we tested (Adobe, Chrome, Firefox...) do not show this behavior.
In the example file: Try selecting the line containing the word "Gesamtbetrag". Then try just selecting the total "1703,05 €".
If we manually edit the file with the PDF-XChange Editor (via the content pane), we can access the various text blocks of the text layer that the OCR software has added. Some of these have no (printable, non-whitespace) text content, and selecting these in the content pane, we can see that these each align with one of the table's grid lines. If we delete these manually (in particular the ones aligning with vertical grid lines), then afterwards selecting text works as expected. But manually editing the PDF files is of course not a viable solution for our customers.
Is there a fix for this behavior, or maybe an option to disregard text blocks without printable content for manual text selection?
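To illustrate the kind of option I mean, here is a minimal sketch in plain Python. It does not use the Editor SDK at all: `TextBlock`, `has_printable_text` and `selectable_blocks` are hypothetical names made up for this example, and the coordinates are invented. It only shows the idea of skipping text blocks that contain nothing but whitespace or non-printable characters when deciding what can be selected.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TextBlock:
    """Hypothetical stand-in for one text block of the OCR text layer."""
    text: str
    x: float
    y: float
    width: float
    height: float


def has_printable_text(block: TextBlock) -> bool:
    """True if the block contains at least one printable, non-whitespace character."""
    return any(ch.isprintable() and not ch.isspace() for ch in block.text)


def selectable_blocks(blocks: list[TextBlock]) -> list[TextBlock]:
    """Keep only the blocks that carry real text; empty 'grid line' blocks are dropped."""
    return [b for b in blocks if has_printable_text(b)]


# Invented sample data: two real text blocks and two tall, empty blocks
# of the kind the OCR software produced from vertical table borders.
blocks = [
    TextBlock("Gesamtbetrag", 40, 500, 90, 12),
    TextBlock(" ", 300, 200, 2, 400),
    TextBlock("1703,05 €", 320, 500, 60, 12),
    TextBlock("", 400, 200, 2, 400),
]

kept = selectable_blocks(blocks)
print([b.text for b in kept])  # prints ['Gesamtbetrag', '1703,05 €']
```

Whether the Editor's selection mechanism could apply a filter like this internally is of course something only the developers can say.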
Thanks for any assistance.
Issue with text selection and text blocks
Moderators: TrackerSupp-Daniel, Tracker Support, Paul - Tracker Supp, Vasyl-Tracker Dev Team, Chris - Tracker Supp, Sean - Tracker, Ivan - Tracker Software, Tracker Supp-Stefan
Forum rules
DO NOT post your license/serial key, or your activation code - these forums, and all posts within, are public and we will be forced to immediately deactivate your license.
When experiencing some errors, use the IAUX_Inst::FormatHRESULT method to see their description and include it in your post along with the error code.
- Vasyl-Tracker Dev Team
- Site Admin
- Posts: 2353
- Joined: Thu Jun 30, 2005 4:11 pm
- Location: Canada
Re: Issue with text selection and text blocks
Hi SMan.
Sorry for the delay in answering.
As I understand it, the example document was OCR'ed without using the Editor's OCR? With our OCR we couldn't reproduce the same wrong text content as in your example.pdf...
Anyway, it seems we have trouble with our text selection mechanism on some 'strange' text content. We will try to fix it in the near future.
Cheers.
Vasyl Yaremyn
Tracker Software Products
Project Developer
Please archive any files posted to a ZIP, 7z or RAR file or they will be removed and not posted.
Re: Issue with text selection and text blocks
Hello Vasyl,
Thank you for your reply. Yes, the OCR was done by third-party software, not the Editor's OCR.
Looking forward to the fix in the XChange Editor (SDK).
Best regards,
Sven (SMan)
- User
- Posts: 5522
- Joined: Fri Nov 21, 2014 8:27 am
Re: Issue with text selection and text blocks
Hello SMan,
Well, the problem lies in the incorrect text blocks that were created by the 3rd-party OCR software, which is why text selection does not work as intended. Please try using our OCR engine for this - I'm sure it will give better results.
Other than that - we'll have to wait for Vasyl on this one.
Cheers,
Alex
Subscribe at:
https://www.youtube.com/channel/UC-TwAMNi1haxJ1FX3LvB4CQ