OCR function

PDF-X OCR SDK is a new product from us, intended to complement our existing PDF and Imaging Tools and to provide the developer with an expanding set of professional tools for Optical Character Recognition tasks.

Audros
User
Posts: 63
Joined: Fri Jun 08, 2018 1:39 pm

OCR function

Post by Audros » Mon Oct 15, 2018 8:24 am

Hello, we have questions about the OCR functions in C#. Are there functions to explore a PDF, so that we can:
• Identify the objects in the PDF
• Identify their type (here, an image). What other types exist?
• Get information about an image: its size? its coordinates? in which unit? What other information is available?

Thank you,
Best Regards.

Tracker Supp-Stefan
Site Admin
Posts: 13530
Joined: Mon Jan 12, 2009 8:07 am
Location: London

Re: OCR function

Post by Tracker Supp-Stefan » Mon Oct 15, 2018 10:46 am

Hello Audros,

Please take a look at what our OCR module has to offer here:
https://help.tracker-software.com/pdfxp ... odule.html

It will allow you to perform the actual OCR process on the pages. If you want to gather information about elements and structure of the pages - please use the Core API SDK (also part of the PRO SDK bundle which you need for the OCR SDK).

Regards,
Stefan

Audros
User
Posts: 63
Joined: Fri Jun 08, 2018 1:39 pm

Re: OCR function

Post by Audros » Mon Oct 15, 2018 4:08 pm

Thanks Sir :)

Audros
User
Posts: 63
Joined: Fri Jun 08, 2018 1:39 pm

Re: OCR function

Post by Audros » Mon Oct 15, 2018 5:40 pm

Hello,
I was able to use the OCR API to retrieve a symbol from the text. My goal is to draw a frame around a chosen symbol, but I cannot find the coordinates of the symbol. My code is quoted below.
Is there also a way to compare images?
Thank you

/////////////////////////////////////////////////////////////////////

IntPtr pg;
PDFXOCR.PDFXOCR_Funcs.OCR_RasterPageSettings pRasterSettings;
PDFXOCR.PDFXOCR_Funcs.OCRp_Page(pdf, 0, ref options, out pg, out pRasterSettings);
uint pRegionCount;
PDFXOCR.PDFXOCR_Funcs.OCRp_RegionCountFromPage(pg, out pRegionCount);
// Valid indices run from 0 to count - 1, so loop with '<' rather than '<='.
for (uint i = 0; i < pRegionCount; i++) {
    IntPtr pRegionResults;
    PDFXOCR.PDFXOCR_Funcs.OCRp_GetRegionFromPage(pg, i, out pRegionResults);
    uint pSymbolCount;
    PDFXOCR.PDFXOCR_Funcs.OCRp_SymbolCountFromRegion(pRegionResults, out pSymbolCount);
    if (pSymbolCount > 0) {
        for (uint j = 0; j < pSymbolCount; j++) {
            PDFXOCR.PDFXOCR_Funcs.OCR_SymbolBox pSymbolBox;
            PDFXOCR.PDFXOCR_Funcs.OCRp_GetSymbolFromRegion(pRegionResults, j, out pSymbolBox);
            Console.Write(pSymbolBox.wcSymbol);
            if (pSymbolBox.wcSymbol == "z")
            {
                uint nFreeText = ((PDFXEdit.IPXS_Inst)pdfCtl.Inst.GetExtension("PXS")).StrToAtom("FreeText");
                // Rectangle for the frame around the symbol (this is the part that does not work).
                PDFXEdit.PXC_Rect rc ;
                rc.left =pdfCtl.Width - pSymbolBox.rcBound.left ;//I can't get exact coordinate!!
                rc.right = rc.left + (-pSymbolBox.rcBound.right + pSymbolBox.rcBound.left);
                rc.top = 800;// pSymbolBox.rcBound.top;
                rc.bottom = rc.top - (pSymbolBox.rcBound.bottom - pSymbolBox.rcBound.top);// pSymbolBox.rcBound.bottom;

                // Create a FreeText annotation on the first page and style it.
                IPXC_Annotation pAnnotF = pdfCtl.Doc.CoreDoc.Pages[0].InsertNewAnnot(nFreeText, ref rc, 0);
                PDFXEdit.IPXC_AnnotData_FreeText SQDataF = (PDFXEdit.IPXC_AnnotData_FreeText)pAnnotF.Data;
                SQDataF.Opacity = 0.7;
                SQDataF.DefaultFontSize = pSymbolBox.pointsize;
                SQDataF.DefaultTextAlign = (int)PDFXEdit.UIX_AlignFlags.UIX_Align_Center;
                var borderF = new PDFXEdit.PXC_AnnotBorder();
                borderF.nWidth = 2f;
                borderF.nStyle = PDFXEdit.PXC_AnnotBorderStyle.ABS_Solid;
                SQDataF.set_Border(borderF);
                pAnnotF.Data = SQDataF;
                // Run the "add new annotation" operation with the annotation as input.
                int nID = pdfCtl.Inst.Str2ID("op.annots.addNew", false);
                PDFXEdit.IOperation pOp = pdfCtl.Inst.CreateOp(nID);
                PDFXEdit.ICabNode input = pOp.Params.Root["Input"];
                input.Add().v = pAnnotF;
                pOp.Do();
            }
        }
    }
}

Sasha - Tracker Dev Team
User
Posts: 4315
Joined: Fri Nov 21, 2014 8:27 am

Re: OCR function

Post by Sasha - Tracker Dev Team » Tue Oct 16, 2018 7:17 am

Hello Audros,

Are you using the Editor SDK? Because this code is not correct at all:

Code: Select all

PDFXEdit.PXC_Rect rc ;
rc.left =pdfCtl.Width - pSymbolBox.rcBound.left ;//I can't get exact coordinate!!
rc.right = rc.left + (-pSymbolBox.rcBound.right + pSymbolBox.rcBound.left);
rc.top = 800;// pSymbolBox.rcBound.top;
rc.bottom = rc.top - (pSymbolBox.rcBound.bottom - pSymbolBox.rcBound.top);// pSymbolBox.rcBound.bottom;

Cheers,
Alex
Join us at Google+:
https://plus.google.com/+PDFXChangeEditorTS
Subscribe at:
https://www.youtube.com/channel/UC-TwAMNi1haxJ1FX3LvB4CQ

Audros
User
Posts: 63
Joined: Fri Jun 08, 2018 1:39 pm

Re: OCR function

Post by Audros » Wed Oct 17, 2018 7:48 am

Hello,
Yes, I use the Editor SDK. How can I get the coordinates? We have the OCR SDK too. Thanks.
Best regards

Sasha - Tracker Dev Team
User
Posts: 4315
Joined: Fri Nov 21, 2014 8:27 am

Re: OCR function

Post by Sasha - Tracker Dev Team » Wed Oct 17, 2018 9:34 am

Hello Audros,

Then why don't you use the https://sdkhelp.tracker-software.com/vi ... t_OCRPages operation to OCR the document, instead of the entire OCR SDK?
Then you will have the text content items, which you can modify as you wish. Basically, you will have to get the IPXC_PageText from the IPXC_Page interface. That will give you access to all of the symbol coordinates that you need.
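
If boxes coming from a rasterized page are ever used directly, the usual reason a frame lands in the wrong place is a mismatch between coordinate systems. By default, PDF page space is measured in points (72 per inch) with the origin at the bottom-left, while raster images are normally measured in pixels with the origin at the top-left. The following is only a minimal sketch of that conversion under stated assumptions: the box is in pixels of the rasterized page, the rasterization DPI is known, and the page is not rotated and has its origin at (0, 0). The types PixelBox and PageRect and the helper OcrCoords.ToPagePoints are hypothetical names used for illustration and are not part of any Tracker SDK.

```csharp
using System;

// Hypothetical illustration types; these are NOT Tracker SDK types.
public struct PixelBox
{
    public int Left, Top, Right, Bottom;   // raster pixels, origin top-left, y grows downward
}

public struct PageRect
{
    public double Left, Top, Right, Bottom; // PDF points, origin bottom-left, y grows upward
}

public static class OcrCoords
{
    // Converts a pixel box on a page rasterized at 'dpi' into PDF points.
    // Assumes an unrotated page whose lower-left corner is at (0, 0).
    public static PageRect ToPagePoints(PixelBox box, double dpi, double pageHeightPts)
    {
        if (dpi <= 0)
            throw new ArgumentOutOfRangeException(nameof(dpi), "DPI must be positive.");

        double scale = 72.0 / dpi; // points per pixel

        return new PageRect
        {
            Left = box.Left * scale,
            Right = box.Right * scale,
            // Flip the vertical axis: a pixel row near the top of the image
            // corresponds to a large y value in page space.
            Top = pageHeightPts - box.Top * scale,
            Bottom = pageHeightPts - box.Bottom * scale
        };
    }
}
```

For example, on a page 842 points high rasterized at 300 DPI, a pixel box with left 300, top 150, right 360, bottom 210 maps to approximately left 72, top 806, right 86.4, bottom 791.6 in page points. Whether the OCR SDK symbol boxes really follow the pixel convention assumed here should be checked against the SDK documentation.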

Cheers,
Alex