OCR function

PDF-X OCR SDK is a new product from us, intended to complement our existing PDF and imaging tools and to provide developers with an expanding set of professional tools for Optical Character Recognition tasks.

Audros
User
Posts: 77
Joined: Fri Jun 08, 2018 1:39 pm

OCR function

Post by Audros »

Hello, we have some questions about the OCR functions in C#. Are there functions to explore a PDF, in order to:
• Identify the objects in the PDF
• Identify their type (here, an image). What other types exist?
• Get information about an image: its size? Its coordinates? In which unit? What other information is available?

Thank you,
Best regards.
Tracker Supp-Stefan
Site Admin
Posts: 17810
Joined: Mon Jan 12, 2009 8:07 am
Location: London

Re: OCR function

Post by Tracker Supp-Stefan »

Hello Audros,

Please take a look at what our OCR module has to offer here:
https://help.pdf-xchange.com/pdfxp ... odule.html

It will allow you to perform the actual OCR process on the pages. If you want to gather information about the elements and structure of the pages, please use the Core API SDK (also part of the PRO SDK bundle, which you need for the OCR SDK).

Regards,
Stefan

Re: OCR function

Post by Audros »

Thanks Sir :)

Re: OCR function

Post by Audros »

Hello,
I was able to use the OCR API code to retrieve a symbol from a text. My goal is to draw a frame around a chosen symbol, but I cannot find the coordinates of the symbol. My code is quoted below.
Is there also a way to compare images?
Thank you

/////////////////////////////////////////////////////////////////////

IntPtr pg;
PDFXOCR.PDFXOCR_Funcs.OCR_RasterPageSettings pRasterSettings;
PDFXOCR.PDFXOCR_Funcs.OCRp_Page(pdf, 0, ref options, out pg, out pRasterSettings);
uint pRegionCount;
PDFXOCR.PDFXOCR_Funcs.OCRp_RegionCountFromPage(pg, out pRegionCount);
// Loop bounds assume zero-based indices (i < count, not i <= count).
for (uint i = 0; i < pRegionCount; i++)
{
    IntPtr pRegionResults;
    PDFXOCR.PDFXOCR_Funcs.OCRp_GetRegionFromPage(pg, i, out pRegionResults);
    uint pSymbolCount;
    PDFXOCR.PDFXOCR_Funcs.OCRp_SymbolCountFromRegion(pRegionResults, out pSymbolCount);
    if (pSymbolCount > 0)
    {
        for (uint j = 0; j < pSymbolCount; j++)
        {
            PDFXOCR.PDFXOCR_Funcs.OCR_SymbolBox pSymbolBox;
            PDFXOCR.PDFXOCR_Funcs.OCRp_GetSymbolFromRegion(pRegionResults, j, out pSymbolBox);
            Console.Write(pSymbolBox.wcSymbol);
            if (pSymbolBox.wcSymbol == "z")
            {
                uint nFreeText = ((PDFXEdit.IPXS_Inst)pdfCtl.Inst.GetExtension("PXS")).StrToAtom("FreeText");
                PDFXEdit.PXC_Rect rc;
                rc.left = pdfCtl.Width - pSymbolBox.rcBound.left; // I can't get exact coordinate!!
                rc.right = rc.left + (-pSymbolBox.rcBound.right + pSymbolBox.rcBound.left);
                rc.top = 800; // pSymbolBox.rcBound.top;
                rc.bottom = rc.top - (pSymbolBox.rcBound.bottom - pSymbolBox.rcBound.top); // pSymbolBox.rcBound.bottom;

                // Create a free-text annotation over the symbol on the first page.
                IPXC_Annotation pAnnotF = pdfCtl.Doc.CoreDoc.Pages[0].InsertNewAnnot(nFreeText, ref rc, 0);
                PDFXEdit.IPXC_AnnotData_FreeText SQDataF = (PDFXEdit.IPXC_AnnotData_FreeText)pAnnotF.Data;
                SQDataF.Opacity = 0.7;
                SQDataF.DefaultFontSize = pSymbolBox.pointsize;
                SQDataF.DefaultTextAlign = (int)PDFXEdit.UIX_AlignFlags.UIX_Align_Center;
                var borderF = new PDFXEdit.PXC_AnnotBorder();
                borderF.nWidth = 2f;
                borderF.nStyle = PDFXEdit.PXC_AnnotBorderStyle.ABS_Solid;
                SQDataF.set_Border(borderF);
                pAnnotF.Data = SQDataF;

                // Register the annotation through the Editor's "add new annotation" operation.
                int nID = pdfCtl.Inst.Str2ID("op.annots.addNew", false);
                PDFXEdit.IOperation pOp = pdfCtl.Inst.CreateOp(nID);
                PDFXEdit.ICabNode input = pOp.Params.Root["Input"];
                input.Add().v = pAnnotF;
                pOp.Do();
            }
        }
    }
}
Sasha - Tracker Dev Team
User
Posts: 5522
Joined: Fri Nov 21, 2014 8:27 am

Re: OCR function

Post by Sasha - Tracker Dev Team »

Hello Audros,

Are you using the Editor SDK? Because this code is not correct at all:

Code:

PDFXEdit.PXC_Rect rc ;
rc.left =pdfCtl.Width - pSymbolBox.rcBound.left ;//I can't get exact coordinate!!
rc.right = rc.left + (-pSymbolBox.rcBound.right + pSymbolBox.rcBound.left);
rc.top = 800;// pSymbolBox.rcBound.top;
rc.bottom = rc.top - (pSymbolBox.rcBound.bottom - pSymbolBox.rcBound.top);// pSymbolBox.rcBound.bottom;
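
As a rough illustration of why these lines cannot work: pdfCtl.Width is presumably the width of the control on screen, and 800 is a hard-coded value, so neither relates to the page itself. If the OCR bounding box is expressed in raster pixels with a top-left origin (an assumption here, not something confirmed by the OCR SDK documentation), converting it to PDF page points with a bottom-left origin could be sketched as below. PixelBox, PdfRect and OcrCoords are hypothetical stand-ins, not SDK types, and the sketch ignores page rotation and any crop box offset.

```csharp
using System;

// Hypothetical stand-ins for the SDK structures; names are illustrative only.
public struct PixelBox { public int Left, Top, Right, Bottom; }    // top-left origin, y grows downward
public struct PdfRect { public double Left, Top, Right, Bottom; }  // bottom-left origin, y grows upward

public static class OcrCoords
{
    // Converts a pixel box on a raster rendered at `dpi` into PDF points (1/72 inch).
    // Assumes an unrotated page whose visible area starts at (0, 0).
    public static PdfRect PixelsToPdfPoints(PixelBox box, double dpi, double pageHeightPt)
    {
        if (dpi <= 0) throw new ArgumentOutOfRangeException(nameof(dpi));
        return new PdfRect
        {
            // Horizontal axis: only the unit changes (pixels to points).
            Left = box.Left * 72.0 / dpi,
            Right = box.Right * 72.0 / dpi,
            // Vertical axis: change the unit and flip, because PDF y grows upward.
            Top = pageHeightPt - box.Top * 72.0 / dpi,
            Bottom = pageHeightPt - box.Bottom * 72.0 / dpi,
        };
    }
}
```

For example, a box at pixels (300, 150)-(600, 450) on a 300 DPI raster of a page 842 points high maps to left 72, top 806, right 144, bottom 734.
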
Cheers,
Alex
Subscribe at:
https://www.youtube.com/channel/UC-TwAMNi1haxJ1FX3LvB4CQ

Re: OCR function

Post by Audros »

Hello,
Yes, I am using the Editor SDK. How can I get the coordinates? We have the OCR SDK too. Thanks.
Best regards

Re: OCR function

Post by Sasha - Tracker Dev Team »

Hello Audros,

Then why don't you use the https://sdkhelp.pdf-xchange.com/vi ... t_OCRPages operation to OCR the document, rather than the entire OCR SDK?
Then you will have the text content items, which you can modify as you wish. Basically, you will have to get the IPXC_PageText from the IPXC_Page interface, and then you will have access to all of the symbol coordinates that you need.

Cheers,
Alex