Get text in annotation

PDF-XChange Editor SDK for Developers

Post by ZY_BODS » Wed Jan 15, 2020 2:11 am

Hi,

I want to get the text inside an annotation (for example, a rectangle) after the user has drawn it.
I am using VB.NET and can already get the coordinates of the annotation once it has been drawn. Below is the code I am trying now.

If this approach is wrong, could you share an example of how to get the text? We have been struggling with this problem for quite a long time.

Thanks

Code:

' Get the current page
Dim doc As IPXV_Document = pdfCtl.Doc
Dim pl As PDFXEdit.IPXV_PagesLayoutManager = doc.ActiveView.PagesView.Layout
Dim pageNr = pl.CurrentPage
Dim curPage As PDFXEdit.IPXC_Page = doc.CoreDoc.Pages(pageNr)

' Set the rectangle coordinates
Dim rect As PXC_Rect
rect.left = dblLeft
rect.right = dblRight
rect.top = dblTop
rect.bottom = dblBottom
 
' Declare the OCR objects (the part below does not work; we assume the OCR extension must be obtained for the document declared above)
Dim OCRExt As IPXV_OCRExtension
Dim OCRRegions As IPXV_OCRRegions
Dim OCRRegion As IPXV_OCRRegion
Dim OCRTask As IPXV_OCRTask
Dim OCRResult As IPXV_OCRResult

OCRRegions = OCRExt.CreateOCRRegions()
OCRRegion = OCRRegions.InsertNew(PXV_OCRBlockType.PXV_OCRBlock_Text, rect)
OCRTask = OCRExt.CreateNewTask()
OCRResult = OCRTask.ProceedPage(curPage, OCRRegion, ,)
txtResult.Text = OCRResult.GetText(0, 0, )

Post by Sasha - Tracker Dev Team » Wed Jan 15, 2020 11:04 am

Hello ZY_BODS,

Is that text part of an image, or is it real text that you can select with the Select Text tool?

Cheers,
Alex

Post by ZY_BODS » Thu Jan 16, 2020 1:22 am

Hi, thanks for the reply.

Both cases may occur.

Below is an example; the text we expect to get is .618±.020
Attachment: image.png

Post by Sasha - Tracker Dev Team » Fri Jan 17, 2020 12:40 pm

Hello ZY_BODS,

This can help:
viewtopic.php?f=66&t=32582&p=133499&hil ... xt#p133499

Cheers,
Alex

Post by ZY_BODS » Fri Jan 31, 2020 7:11 am

With the example you gave, we can only get the text when we know which line it is on. In my case we do not know which line the text is on; all we have is the coordinates of the rectangle.

Post by zarkogajic » Fri Jan 31, 2020 8:52 am

Hi,

This is what you need to do: viewtopic.php?f=66&t=33540#p138629

-žarko

Post by ZY_BODS » Fri Jan 31, 2020 9:47 am

Hi, thanks for your reply,

but with the code from that link I face the same problem: I cannot get the line index for the rectangle.

Post by zarkogajic » Fri Jan 31, 2020 10:48 am

Hi,
Get the CharRect of every character and "build up" the bounding rectangle of the text from them. If that rectangle lies inside the annotation's rectangle, that is the text you are after.
Here's some Delphi code. I guess you will be able to convert it to the language of your choice:

Code:

function TPXC_PageWraper.ExtractPageText(const fromRect: PXC_Rect): string;
const
  DELTA = 2;
var
  lineString : string;
  i, j, k : integer;
  tli : PXC_TextLineInfo;
  lineChars, nextChar : WideString;
  fcRect, charRect : PXC_Rect;
begin
  result := '';

  for i := 0 to -1 + fPageLineCount do
  begin
    fPageText.Get_LineInfo(i, tli);

    fPageText.GetChars(tli.nFirstCharIndex, tli.nCharsCount, lineChars);

    lineString := Trim(StringReplace(lineChars, ' ', '', [rfReplaceAll, rfIgnoreCase]));

    if lineString <> '' then
    begin
      fPageText.Get_CharRect(tli.nFirstCharIndex, fcRect);

      if (fcRect.top <= fromRect.top + DELTA) AND (fcRect.bottom >= fromRect.bottom - DELTA) then
      begin
        if result <> '' then result := result + ' ';

        for j := tli.nFirstCharIndex to tli.nFirstCharIndex + tli.nCharsCount - 1 do
        begin
          fPageText.Get_CharRect(j, charRect); //first char in line - could be lower than the rest

          if charRect.left >= fromRect.left - DELTA then
          begin
            k := j;
            while (charRect.right <= fromRect.right + DELTA) AND (k < tli.nFirstCharIndex + tli.nCharsCount) do
            begin
              fPageText.GetChars(k, 1, nextChar);

              result := result + nextChar;

              Inc(k);
              fPageText.Get_CharRect(k, charRect);
            end;
            Break; //for j
          end;
        end;
      end;
    end;
  end;

  result := Trim(result);
end;
TPXC_PageWraper is my custom class that wraps IPXC_Page.

The constructor gets the page text and the line count (both used above):

Code:

constructor TPXC_PageWraper.Create(const thePage: IPXC_Page);
begin
  fPage := thePage;

  fPage.GetText(nil, true, fPageText);

  fPageText.Get_LinesCount(fPageLineCount);
end;
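
For readers who do not use Delphi, the geometric idea can be sketched on its own, independent of the SDK. The Python below is only an illustration under stated assumptions: `Rect`, `contains`, `text_in_rect` and `make_line` are hypothetical helpers, the character rectangles are made-up sample data rather than values read from IPXC_Page, and coordinates are assumed to follow the PDF convention where top is greater than bottom. Unlike the Delphi routine above, it tests every character individually instead of walking along each line.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    # Assumed PDF page coordinates: origin at bottom-left, so top > bottom.
    left: float
    bottom: float
    right: float
    top: float


def contains(outer: Rect, inner: Rect, delta: float = 2.0) -> bool:
    """Return True if `inner` lies inside `outer`, with `delta` units of slack."""
    return (inner.left >= outer.left - delta
            and inner.right <= outer.right + delta
            and inner.bottom >= outer.bottom - delta
            and inner.top <= outer.top + delta)


def text_in_rect(lines, area: Rect, delta: float = 2.0) -> str:
    """Collect characters whose rectangles fall inside `area`.

    `lines` is a list of text lines; each line is a list of (char, Rect) pairs.
    Matching text from different lines is joined with a single space.
    """
    parts = []
    for line in lines:
        picked = "".join(ch for ch, r in line if contains(area, r, delta)).strip()
        if picked:
            parts.append(picked)
    return " ".join(parts)


def make_line(text, x0, baseline, width=6.0, height=10.0):
    """Hypothetical helper: lay out `text` as fixed-width character boxes."""
    return [(ch, Rect(x0 + i * width, baseline,
                      x0 + (i + 1) * width, baseline + height))
            for i, ch in enumerate(text)]


# Made-up sample page: two lines of text and an annotation drawn around
# the tolerance value on the first line.
lines = [make_line("DIM A .618±.020 REF", 100.0, 500.0),
         make_line("NOTE 3", 100.0, 480.0)]
annotation = Rect(left=135.0, bottom=498.0, right=190.0, top=512.0)

print(text_in_rect(lines, annotation))  # prints .618±.020
```

In a real application the (char, rectangle) pairs would come from the page text object, as in the Delphi code above, and the tolerance would need tuning to how tightly users draw their annotations.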
HTH.

-žarko

Post by ZY_BODS » Mon Feb 03, 2020 6:14 am

Thanks, I have it working for searchable text. Do you have any idea for non-searchable text? After some attempts I believe I should use the OCR module, but I am not sure how to link the OCR module with the current PDF page.

Post by Sasha - Tracker Dev Team » Mon Feb 03, 2020 9:01 am

Hello ZY_BODS,

Please refer to these topics - they should help:
viewtopic.php?f=66&t=30356
viewtopic.php?f=66&t=33535

Cheers,
Alex

Post by ZY_BODS » Thu Feb 06, 2020 4:33 am

I tried it but hit an error. Below is the code I am using.

Code:

Dim nId As Integer = pdfCtl.Inst.Str2ID("op.document.OCRPages", False)
Dim Op As IOperation = pdfCtl.Inst.CreateOp(nId)
Dim Input As ICabNode = Op.Params.Root
Input("Input").v = pdfCtl.Doc

Dim options As ICabNode = Input("Options")

options("PagesRange.Type").v = "All"
options("OutputType").v = 0
options("OutputDPI").v = 300

pdfCtl.Inst.AsyncDoAndWaitForFinish(Op) ' the error is raised here

The error is as follows:

System.Runtime.InteropServices.COMException: 'Error HRESULT E_FAIL has been returned from a call to a COM component.'

Post by Sasha - Tracker Dev Team » Tue Feb 11, 2020 9:01 am

Hello ZY_BODS,

And what happens if you call Op.Do() instead of the AsyncDoAndWaitForFinish method?

Cheers,
Alex

Post by ZY_BODS » Tue Feb 11, 2020 9:40 am

Hi Alex,

I get the same error using Op.Do(), as shown below.

System.Runtime.InteropServices.COMException: 'Error HRESULT E_FAIL has been returned from a call to a COM component.'

I also tried pdfCtl.Inst.AsyncDo(Op).
This does not return any error, but it does not produce any output either.

Regards

Post by Sasha - Tracker Dev Team » Tue Feb 11, 2020 9:49 am

Hello ZY_BODS,

Have you done everything as advised in the topics I mentioned? They are long, so be sure to read everything.

Cheers,
Alex