I want to get the text inside an annotation, such as a rectangle, after the user draws it.
I am using VB.NET and I am already able to get the coordinates of the annotation after the user draws it. Below is the code I am trying now.
If the method we are using is wrong, could we get an example of how to get the text? We have been struggling with this problem for quite a long time.
' Get the current page
Dim doc As IPXV_Document = pdfCtl.Doc
Dim pl As PDFXEdit.IPXV_PagesLayoutManager = doc.ActiveView.PagesView.Layout
Dim pageNr = pl.CurrentPage
Dim curPage As PDFXEdit.IPXC_Page = doc.CoreDoc.Pages(pageNr)
' Set the rectangle coordinates
Dim rect As PXC_Rect
rect.left = dblLeft
rect.right = dblRight
rect.top = dblTop
rect.bottom = dblBottom
' Declarations for OCR (the part below is not working; we assume the OCR extension has to be linked to the document declared above)
Dim OCRExt As IPXV_OCRExtension
Dim OCRRegions As IPXV_OCRRegions
Dim OCRRegion As IPXV_OCRRegion
Dim OCRTask As IPXV_OCRTask
Dim OCRResult As IPXV_OCRResult
OCRRegions = OCRExt.CreateOCRRegions()
OCRRegion = OCRRegions.InsertNew(PXV_OCRBlockType.PXV_OCRBlock_Text, rect)
OCRTask = OCRExt.CreateNewTask()
OCRResult = OCRTask.ProceedPage(curPage, OCRRegion, ,)
txtResult.Text = OCRResult.GetText(0, 0, )
With the example you gave, we can only get the text when we know which line it is on. In my case we do not know which line the text is on; all we have is the coordinates of the rectangle.
Get the CharRect of every character and "build up" the rectangle of the text; if that rectangle lies inside the annotation's rectangle, this is the text you're after.
Here's some Delphi code - I guess you would be able to convert this to the language of your choice:
function TPXC_PageWraper.ExtractPageText(const fromRect: PXC_Rect): string;
const
  DELTA = 2; // tolerance around fromRect, in page units
var
  lineString : string;
  i, j, k, lineEnd : integer;
  tli : PXC_TextLineInfo;
  lineChars, nextChar : WideString;
  fcRect, charRect : PXC_Rect;
begin
  result := '';
  for i := 0 to -1 + fPageLineCount do
  begin
    fPageText.Get_LineInfo(i, tli);
    fPageText.GetChars(tli.nFirstCharIndex, tli.nCharsCount, lineChars);
    // skip lines that contain only spaces
    lineString := Trim(StringReplace(lineChars, ' ', '', [rfReplaceAll, rfIgnoreCase]));
    if lineString <> '' then
    begin
      lineEnd := tli.nFirstCharIndex + tli.nCharsCount; // one past the last char of the line
      fPageText.Get_CharRect(tli.nFirstCharIndex, fcRect); //first char in line - could be lower than the rest
      // the line counts only if its first char fits vertically inside fromRect
      if (fcRect.top <= fromRect.top + DELTA) AND (fcRect.bottom >= fromRect.bottom - DELTA) then
      begin
        if result <> '' then result := result + ' ';
        for j := tli.nFirstCharIndex to lineEnd - 1 do
        begin
          fPageText.Get_CharRect(j, charRect);
          if charRect.left >= fromRect.left - DELTA then
          begin
            // j is the first char inside the left edge; collect chars up to the right edge
            k := j;
            while k < lineEnd do
            begin
              fPageText.Get_CharRect(k, charRect);
              if charRect.right > fromRect.right + DELTA then Break;
              fPageText.GetChars(k, 1, nextChar);
              result := result + nextChar;
              Inc(k);
            end;
            Break; //for j
          end;
        end;
      end;
    end;
  end;
  result := Trim(result);
end;
The TPXC_PageWraper is my custom class to wrap IPXC_Page.
Its constructor gets the page text and the line count (fPageText and fPageLineCount, used above).
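For anyone porting this to another language, the idea above (compare each character's rectangle with the annotation's rectangle, with a small tolerance) can be sketched without the SDK at all. The Python below is only an illustration under assumptions: Rect, contains, text_in_rect and make_line are hypothetical names, not part of the PDF-XChange API, and the character rectangles are made up by make_line instead of coming from Get_CharRect. It assumes PDF-style coordinates where top is greater than bottom. Unlike the Delphi code, it tests every character individually rather than only the first character of each line.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle in PDF-style coordinates (top > bottom)."""
    left: float
    bottom: float
    right: float
    top: float


def contains(outer, inner, delta=2.0):
    """True if `inner` lies inside `outer`, allowing `delta` units of slack."""
    return (inner.left >= outer.left - delta
            and inner.right <= outer.right + delta
            and inner.bottom >= outer.bottom - delta
            and inner.top <= outer.top + delta)


def text_in_rect(lines, region, delta=2.0):
    """Collect the characters whose rectangles fall inside `region`.

    `lines` is a list of text lines; each line is a list of
    (character, Rect) pairs in reading order.
    """
    parts = []
    for line in lines:
        picked = "".join(ch for ch, box in line if contains(region, box, delta))
        if picked.strip():
            parts.append(picked.strip())
    return " ".join(parts)


def make_line(text, x0, y0, width=6.0, height=10.0):
    """Hypothetical stand-in for the SDK: fixed-width character boxes."""
    return [(ch, Rect(x0 + i * width, y0, x0 + (i + 1) * width, y0 + height))
            for i, ch in enumerate(text)]


lines = [make_line("Invoice 1234", 50, 700),
         make_line("Total 99.50", 50, 680)]

# Rectangle drawn by the user around the amount on the second line.
region = Rect(left=84, bottom=675, right=130, top=695)
print(text_in_rect(lines, region))
```

In a real application the (character, rectangle) pairs would be filled from the page text object, one line at a time, in the same way the Delphi loop reads them.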
Thanks, I got it working for searchable text. Do you have any idea for non-searchable text? I have tried a few things and I believe I should use the OCR module, but I am not sure how to link the OCR module with the current PDF page.
Dim nId As Integer = pdfCtl.Inst.Str2ID("op.document.OCRPages", False)
Dim Op As IOperation = pdfCtl.Inst.CreateOp(nId)
Dim Input As ICabNode = Op.Params.Root
Input("Input").v = pdfCtl.Doc
Dim options As ICabNode = Input("Options")
options("PagesRange.Type").v = "All"
options("OutputType").v = 0
options("OutputDPI").v = 300
pdfCtl.Inst.AsyncDoAndWaitForFinish(Op) ' the error is raised here
The error is as follows:
System.Runtime.InteropServices.COMException: 'Error HRESULT E_FAIL has been returned from a call to a COM component.'