PDF XChange Forum

Posted: **Fri Aug 17, 2018 2:47 pm**

I am evaluating your CoreAPI SDK and have downloaded the CoreAPIDemo from GitHub, which provides much useful insight and a framework within which I can do my evaluation. My primary interest is extracting the entire text from a PDF, paragraph by paragraph, because I will be doing further processing on the text of each paragraph. Can you point me toward useful resources or specific API calls to accomplish this? Thank you.

Posted: **Sat Aug 18, 2018 5:47 am**

Hello ddinnebeil,

Please check out the "9.3. Convert from PDF to txt file" sample. It writes the text to a txt file laid out visually, the way you see it on screen.
You can also obtain the individual characters of each line from the IPXC_PageText by using the information provided by

Text.GetChars(textsLineInfo[i].nFirstCharIndex, textsLineInfo[i].nCharsCount)

From this you can see where each line ends, so you can build paragraphs and implement your own logic.
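
As a rough illustration of that "own logic", here is a minimal sketch of one common heuristic: start a new paragraph whenever the vertical gap between two consecutive lines is noticeably larger than the line height. It does not call the SDK. The `Line` record, its `top`/`bottom` fields, the `group_paragraphs` helper and the `gap_factor` threshold are all hypothetical stand-ins for whatever per-line text and bounds you collect yourself, and the sketch assumes y coordinates grow upward, as in PDF page space.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Line:
    """Hypothetical stand-in for one extracted text line (not an SDK type)."""
    text: str
    top: float     # y of the line's top edge (assumed: larger = higher on page)
    bottom: float  # y of the line's bottom edge


def group_paragraphs(lines: List[Line], gap_factor: float = 0.6) -> List[str]:
    """Join consecutive lines, splitting where the vertical gap is large.

    gap_factor is an assumed tuning knob: a gap bigger than
    gap_factor * (previous line height) starts a new paragraph.
    """
    paragraphs: List[str] = []
    current: List[str] = []
    prev: Optional[Line] = None

    for line in lines:
        if not line.text.strip():
            # An empty line always closes the paragraph in progress.
            if current:
                paragraphs.append(" ".join(current))
                current = []
            prev = None
            continue

        if prev is not None:
            height = prev.top - prev.bottom
            gap = prev.bottom - line.top
            if gap > gap_factor * height:
                paragraphs.append(" ".join(current))
                current = []

        current.append(line.text.strip())
        prev = line

    if current:
        paragraphs.append(" ".join(current))
    return paragraphs


# Usage: two tightly spaced lines followed by one after a larger gap.
sample = [
    Line("First line of a paragraph", 700.0, 688.0),
    Line("that continues here.", 686.0, 674.0),
    Line("A new paragraph.", 650.0, 638.0),
]
for paragraph in group_paragraphs(sample):
    print(paragraph)
```

A real document would likely need more than this, for example treating a jump back up the page (new column or page) as a break, or using indentation of the first line as an extra signal, so the threshold and rules are starting points to tune rather than fixed values.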

Cheers,
Alex

Extract Text from PDF
