
Get text in annotation

Posted: Wed Jan 15, 2020 2:11 am
by ZY_BODS
Hi,

I want to get the text inside an annotation after the user draws it (for example, a rectangle).
I am using VB.NET and I am already able to get the annotation's coordinates after the user draws it. Below is the code I am trying now.

If our approach is wrong, could you give an example of how to get the text? We have been struggling with this problem for quite a long time.

Thanks

Code:

' Get the current page
Dim doc As IPXV_Document = pdfCtl.Doc
Dim pl As PDFXEdit.IPXV_PagesLayoutManager = doc.ActiveView.PagesView.Layout
Dim pageNr = pl.CurrentPage
Dim curPage As PDFXEdit.IPXC_Page = doc.CoreDoc.Pages(pageNr)

' Set the rectangle coordinates
Dim rect As PXC_Rect
rect.left = dblLeft
rect.right = dblRight
rect.top = dblTop
rect.bottom = dblBottom

' Declare the OCR objects (the part below does not work; we assume the OCR has to be tied to the document declared above)
' Note: OCRExt is declared but never assigned, so it is still Nothing when CreateOCRRegions() is called (NullReferenceException)
Dim OCRExt As IPXV_OCRExtension
Dim OCRRegions As IPXV_OCRRegions
Dim OCRRegion As IPXV_OCRRegion
Dim OCRTask As IPXV_OCRTask
Dim OCRResult As IPXV_OCRResult

OCRRegions = OCRExt.CreateOCRRegions()
OCRRegion = OCRRegions.InsertNew(PXV_OCRBlockType.PXV_OCRBlock_Text, rect)
OCRTask = OCRExt.CreateNewTask()
OCRResult = OCRTask.ProceedPage(curPage, OCRRegion, ,)
txtResult.Text = OCRResult.GetText(0, 0, )
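The snippet above fills rect straight from dblLeft, dblRight, dblTop and dblBottom. In PDF's default user space the y axis grows upward, so a rectangle's top is the larger y value, and the containment test in the Delphi code later in this thread relies on that (it only matches when top >= bottom). If the corners of the drawn rectangle can arrive in either order (for example, when the user drags from the lower-right corner to the upper-left one), normalizing them first avoids an empty match. A minimal Python sketch, assuming the coordinates are already in PDF page space; Rect is a hypothetical stand-in for PXC_Rect, not an SDK type:

```python
from dataclasses import dataclass


@dataclass
class Rect:
    # Hypothetical stand-in for PXC_Rect (assumption: PDF page space, y grows upward).
    left: float
    right: float
    top: float
    bottom: float


def normalize_rect(x1: float, y1: float, x2: float, y2: float) -> Rect:
    """Build a rect from two opposite corners given in any order.

    Guarantees left <= right and bottom <= top, which is what a test of the
    form `char.top <= area.top and char.bottom >= area.bottom` expects.
    """
    return Rect(
        left=min(x1, x2),
        right=max(x1, x2),
        top=max(y1, y2),
        bottom=min(y1, y2),
    )


if __name__ == "__main__":
    # User dragged from the lower-right corner (300, 500) to the upper-left corner (100, 520).
    r = normalize_rect(300.0, 500.0, 100.0, 520.0)
    print(r)  # Rect(left=100.0, right=300.0, top=520.0, bottom=500.0)
```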

Re: Get text in annotation

Posted: Wed Jan 15, 2020 11:04 am
by Sasha - Tracker Dev Team
Hello ZY_BODS,

Is that text an image or a physical text that you can select with the Select Text tool?

Cheers,
Alex

Re: Get text in annotation

Posted: Thu Jan 16, 2020 1:22 am
by ZY_BODS
Sasha - Tracker Dev Team wrote: Wed Jan 15, 2020 11:04 am Hello ZY_BODS,

Is that text an image or a physical text that you can select with the Select Text tool?

Cheers,
Alex
Hi, thanks for the reply.

Both cases may occur.

Below is an example; the text we should get is .618±.020

Re: Get text in annotation

Posted: Fri Jan 17, 2020 12:40 pm
by Sasha - Tracker Dev Team
Hello ZY_BODS,

This can help:
viewtopic.php?f=66&t=32582&p=133499&hil ... xt#p133499

Cheers,
Alex

Re: Get text in annotation

Posted: Fri Jan 31, 2020 7:11 am
by ZY_BODS
From the example you gave, we can only get the text when we already know which line it is on. In my case we do not know which line the text is on; all we have is the rectangle's coordinates.
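Not knowing the line index up front is not a blocker: scanning every line and comparing its vertical position with the rectangle yields the matching line indices. A minimal Python sketch of that idea, assuming each line can be reduced to its text plus the rectangle of its first character (Line and Rect are hypothetical stand-ins for the per-line data the SDK's page-text object exposes through Get_LineInfo and Get_CharRect), with y growing upward and a small tolerance in points:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Rect:
    # Hypothetical stand-in for PXC_Rect (assumption: y grows upward).
    left: float
    right: float
    top: float
    bottom: float


@dataclass
class Line:
    # Hypothetical model of one text line: its characters and its first char's rect.
    text: str
    first_char_rect: Rect


def lines_in_rect(lines: List[Line], area: Rect, delta: float = 2.0) -> List[int]:
    """Return indices of lines whose first char lies vertically inside `area`.

    Every line is scanned; a line is kept if its first char's top/bottom fall
    within the area widened by `delta` points. Whitespace-only lines are skipped.
    """
    hits = []
    for i, line in enumerate(lines):
        if not line.text.strip():
            continue
        r = line.first_char_rect
        if r.top <= area.top + delta and r.bottom >= area.bottom - delta:
            hits.append(i)
    return hits


if __name__ == "__main__":
    page = [
        Line("Title", Rect(72, 110, 700, 688)),
        Line(".618±.020", Rect(100, 106, 520, 510)),
        Line("Notes", Rect(72, 100, 400, 390)),
    ]
    print(lines_in_rect(page, Rect(90, 200, 525, 505)))  # [1]
```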

Re: Get text in annotation

Posted: Fri Jan 31, 2020 8:52 am
by zarkogajic
Hi,

This is what you need to do: viewtopic.php?f=66&t=33540#p138629

-žarko

Re: Get text in annotation

Posted: Fri Jan 31, 2020 9:47 am
by ZY_BODS
Hi, thanks for your reply,

but with the approach from that link I have the same problem: I cannot get the line index for the rectangle.

Re: Get text in annotation

Posted: Fri Jan 31, 2020 10:48 am
by zarkogajic
Hi,
Get the CharRect of every char and "build" up the rectangle; if that rectangle is inside the annotation's rectangle, that is the text you're after.
Here's some Delphi code - I guess you would be able to convert this to the language of your choice:

Code:

function TPXC_PageWraper.ExtractPageText(const fromRect: PXC_Rect): string;
const
  DELTA = 2;
var
  lineString : string;
  i, j, k : integer;
  tli : PXC_TextLineInfo;
  lineChars, nextChar : WideString;
  fcRect, charRect : PXC_Rect;
begin
  result := '';

  for i := 0 to -1 + fPageLineCount do
  begin
    fPageText.Get_LineInfo(i, tli);

    fPageText.GetChars(tli.nFirstCharIndex, tli.nCharsCount, lineChars);

    lineString := Trim(StringReplace(lineChars, ' ', '', [rfReplaceAll, rfIgnoreCase]));

    if lineString <> '' then
    begin
      fPageText.Get_CharRect(tli.nFirstCharIndex, fcRect);

      if (fcRect.top <= fromRect.top + DELTA) AND (fcRect.bottom >= fromRect.bottom - DELTA) then
      begin
        if result <> '' then result := result + ' ';

        //"to" is inclusive in Delphi, hence the -1 (stay inside this line)
        for j := tli.nFirstCharIndex to tli.nFirstCharIndex + tli.nCharsCount - 1 do
        begin
          fPageText.Get_CharRect(j, charRect); //first char in line - could be lower than the rest

          if charRect.left >= fromRect.left - DELTA then
          begin
            k := j;
            //append chars until one sticks out past the right edge or the line ends;
            //the bound is checked before Get_CharRect, so no index past the line is read
            while k < tli.nFirstCharIndex + tli.nCharsCount do
            begin
              fPageText.Get_CharRect(k, charRect);
              if charRect.right > fromRect.right + DELTA then Break;

              fPageText.GetChars(k, 1, nextChar);
              result := result + nextChar;

              Inc(k);
            end;
            Break; //for j
          end;
        end;
      end;
    end;
  end;

  result := Trim(result);
end;
The TPXC_PageWraper is my custom class to wrap IPXC_Page.

The constructor gets the page text and the line count (both used above):

Code:

constructor TPXC_PageWraper.Create(const thePage: IPXC_Page);
begin
  fPage := thePage;

  fPage.GetText(nil, true, fPageText);

  fPageText.Get_LinesCount(fPageLineCount);
end;
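The horizontal part of ExtractPageText (the j/k loop) can be restated language-neutrally. This is a Python sketch under assumptions: Rect is a hypothetical stand-in for PXC_Rect, each character of a line comes with its own rectangle (as Get_CharRect returns in the Delphi code), and delta plays the role of DELTA. It starts at the first character whose left edge reaches the annotation's left edge, then appends characters until one extends past the right edge or the line ends:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Rect:
    # Hypothetical stand-in for PXC_Rect.
    left: float
    right: float
    top: float
    bottom: float


def chars_in_span(chars: str, rects: List[Rect], area: Rect, delta: float = 2.0) -> str:
    """Collect the run of chars on one line that fits horizontally in `area`.

    `rects[i]` is assumed to be the rectangle of `chars[i]`. Only the first
    qualifying start position is used, mirroring the `Break; //for j`.
    """
    for start, r in enumerate(rects):
        if r.left >= area.left - delta:
            out = []
            for ch, cr in zip(chars[start:], rects[start:]):
                if cr.right > area.right + delta:
                    break  # this char sticks out past the right edge
                out.append(ch)
            return "".join(out)
    return ""  # no char starts inside the area


if __name__ == "__main__":
    text = "ab.618±.020cd"
    # Made-up layout: every char is 5 points wide, starting at x = 50.
    rects = [Rect(50 + 5 * i, 55 + 5 * i, 520, 510) for i in range(len(text))]
    print(chars_in_span(text, rects, Rect(60, 105, 525, 505)))  # .618±.020
```

Running this per matching line and joining the pieces with a space gives the same shape of result as the Delphi function.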
HTH.

-žarko

Re: Get text in annotation

Posted: Mon Feb 03, 2020 6:14 am
by ZY_BODS
Thanks, that works for searchable text. Do you have any idea for non-searchable text? I have tried a few things and I believe I should use the OCR module, but I am not sure how to link the OCR module with the current PDF page.

Re: Get text in annotation

Posted: Mon Feb 03, 2020 9:01 am
by Sasha - Tracker Dev Team
Hello ZY_BODS,

Please refer to these topics - they should help:
viewtopic.php?f=66&t=30356
viewtopic.php?f=66&t=33535

Cheers,
Alex

Re: Get text in annotation

Posted: Thu Feb 06, 2020 4:33 am
by ZY_BODS
I tried it, but I get an error. Below is the code I am using.

Code:

Dim nId As Integer = pdfCtl.Inst.Str2ID("op.document.OCRPages", False)
Dim Op As IOperation = pdfCtl.Inst.CreateOp(nId)
Dim Input As ICabNode = Op.Params.Root
Input("Input").v = pdfCtl.Doc

Dim options As ICabNode = Input("Options")

options("PagesRange.Type").v = "All"
options("OutputType").v = 0
options("OutputDPI").v = 300

pdfCtl.Inst.AsyncDoAndWaitForFinish(Op) ' hits the error here
The error is:

System.Runtime.InteropServices.COMException: 'Error HRESULT E_FAIL has been returned from a call to a COM component.'

Re: Get text in annotation

Posted: Tue Feb 11, 2020 9:01 am
by Sasha - Tracker Dev Team
Hello ZY_BODS,

And what if you do Op.Do() instead of the AsyncDoAndWaitForFinish method?

Cheers,
Alex

Re: Get text in annotation

Posted: Tue Feb 11, 2020 9:40 am
by ZY_BODS
Hi Alex,

I get the same error using Op.Do():

System.Runtime.InteropServices.COMException: 'Error HRESULT E_FAIL has been returned from a call to a COM component.'


I also tried pdfCtl.Inst.AsyncDo(Op).
This doesn't return any error, but it doesn't produce any output either.

Regards

Re: Get text in annotation

Posted: Tue Feb 11, 2020 9:49 am
by Sasha - Tracker Dev Team
Hello ZY_BODS,

Have you done everything as advised in the topics I mentioned? They are big so be sure to read everything.

Cheers,
Alex