PDF text with character by character

PDF-XChange Editor SDK for Developers


Post by prasantha »

Hi
I have a PDF document that has been processed with OCR. In the content view, I can see the text items inside the container, as shown below.

Container<Div> : Text("t h i s i s a t e s t d o c u m e n t")
Tt t
Tt h
Tt i
Tt s
Tt i
Tt s
Tt t
Tt e
Tt s
Tt t
Tt d
Tt t.......
This text is shown in the PDF as "this is a test document".
I am reading this text from a C# application (Text_GetText()), but because the text is stored character by character, I cannot assemble the whole text correctly, because I am supposed to add a space between each text element.

So can anyone let me know how I can read the container text in one call, without iterating through each text element inside the container?

Thank you
Prasantha
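
For readers with the same problem: one common way to rebuild words from per-character items is to look at the horizontal gap between neighbouring items and insert a space only when that gap is clearly wider than normal letter spacing. The sketch below is written in Python for brevity and does not call the SDK. The `Fragment` record, the `join_fragments` helper and the `gap_ratio` threshold are all hypothetical names for illustration; it assumes you can obtain each item's text, left edge and width from the content items, and that all items lie on one line.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    """One text item on a line (hypothetical record, not an SDK type)."""
    text: str
    x: float      # left edge of the item, in page units
    width: float  # horizontal extent of the item, in page units


def join_fragments(fragments, gap_ratio=0.3):
    """Join items on one line, adding a space only across a visible gap."""
    ordered = sorted(fragments, key=lambda f: f.x)
    parts = []
    prev = None
    for frag in ordered:
        if prev is not None:
            # Distance between the end of the previous item and this one.
            gap = frag.x - (prev.x + prev.width)
            # Average width of a single character in the two neighbours.
            prev_char = prev.width / max(len(prev.text), 1)
            frag_char = frag.width / max(len(frag.text), 1)
            avg_char = (prev_char + frag_char) / 2
            # gap_ratio is an assumed threshold; tune it for your documents.
            if gap > gap_ratio * avg_char:
                parts.append(" ")
        parts.append(frag.text)
        prev = frag
    return "".join(parts)


if __name__ == "__main__":
    chars = [
        Fragment("t", 0, 5), Fragment("h", 5, 5), Fragment("i", 10, 5),
        Fragment("s", 15, 5), Fragment("i", 25, 5), Fragment("s", 30, 5),
    ]
    print(join_fragments(chars))
```

The same rule also merges a word that was split into two items, since the gap between the two halves is close to zero. The threshold is a heuristic, so text with wide letter spacing or justified lines may need a different value.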

Post by Ivan - Tracker Software »

Can you send us the document?
Tracker Software (Project Director)


Post by prasantha »

Hi, I cannot provide the original document because of its sensitivity, but I have attached a sample document here. It does not contain character-by-character text, but it does contain text in which a word is split.
image.png

1) As shown in the image, the word "large" is split into two different text segments, "...... l" and "arge".
The file is attached as 1.pdf.
2)
image(1).png
As shown in the image, the word is split into two separate words.
I have attached the file for this as well, as 2.pdf.
I tried to retrieve the whole text using GetPageText of the page; in this case it omitted all the line breaks, so it concatenated the two lines.

However, as an end user in PDF-XChange Viewer, I can select the page text and copy the whole text as expected without any issue (copying and pasting into Notepad works as expected).

Can you please look into this?

Post by Vasyl-Tracker Dev Team »

Hi Prasantha.

You may try to use the IPXC_Page::GetText feature. Please look here:

https://github.com/tracker-software/PDF ... RoboReader

Tip: please also look at IPXC_PageText::LinesCount and IPXC_PageText::LineInfo; they will allow you to get information about the text lines composed by the Editor from the PDF content.
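
To illustrate why per-line information helps with the missing line breaks, here is a minimal sketch of the general idea: group text runs by baseline and emit one output line per group. It is written in Python for brevity and does not call the SDK; the exact data returned by IPXC_PageText::LineInfo is not shown in this thread, so the `Run` record, the `rebuild_lines` helper and the `y_tolerance` value are hypothetical. It also assumes each run is already a whole word and that the y coordinate grows upward, as in PDF page space.

```python
from dataclasses import dataclass


@dataclass
class Run:
    """One word-level text run (hypothetical record, not an SDK type)."""
    text: str
    x: float  # left edge of the run
    y: float  # baseline of the run; larger values are higher on the page


def rebuild_lines(runs, y_tolerance=2.0):
    """Group runs into lines by baseline and return text with line breaks."""
    # Visit runs from the top of the page down, left to right.
    ordered = sorted(runs, key=lambda r: (-r.y, r.x))
    lines = []  # each entry: (baseline of the first run, list of runs)
    for run in ordered:
        if lines and abs(lines[-1][0] - run.y) <= y_tolerance:
            # Close enough to the current line's baseline: same line.
            lines[-1][1].append(run)
        else:
            lines.append((run.y, [run]))
    # Within a line, restore left-to-right order before joining.
    return "\n".join(
        " ".join(r.text for r in sorted(group, key=lambda r: r.x))
        for _, group in lines
    )


if __name__ == "__main__":
    page = [
        Run("second", 10, 680), Run("line", 60, 680.5),
        Run("first", 10, 700), Run("line", 50, 700),
    ]
    print(rebuild_lines(page))
```

The tolerance absorbs small baseline jitter, which is common in OCR output. If the SDK already reports which characters belong to which line, that information should be preferred over a positional heuristic like this one.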

HTH.
Vasyl Yaremyn
Tracker Software Products
Project Developer
