
Placing OCR Text into a PDF Document

Posted: Thu Oct 02, 2014 9:03 pm
by jeffp
I'm trying to better align my OCR text behind the image where the text appears.

Here's my issue.

I know from the OCR coordinates the exact width of the word I'm trying to place. However, when I use either PXC_TextOutA or PXC_DrawTextEx to place the text, the actual width of the placed word depends mostly on getting the font size and font ID right. That is, I can give PXC_DrawTextEx a rect that is, say, 100 wide, but with the font size and font ID being used, the word may only need 70% of that width.

Is there a way to align my words better, given that I know the exact width each word needs to be?

This matters for redaction. Some programs redact the image using the coordinates of the words they find in the text layer behind it. So if my word is too narrow, the redaction doesn't cover the entire word on the image.

Do you follow me here?

Any help would be greatly appreciated.

--Jeff

Re: Placing OCR Text into a PDF Document

Posted: Thu Oct 02, 2014 9:40 pm
by jeffp
Here's a thought: is there a way to place the word temporarily, then check the width of the placed word against the width reported by the OCR engine? If it is narrower, I increase the font size; if it is wider, I decrease it. That way, when I actually place the word, its width will match the width of the word in the image.

Re: Placing OCR Text into a PDF Document

Posted: Fri Oct 03, 2014 2:51 pm
by Serg - Tracker Dev
The last two parameters of PXC_DrawTextExW are flags and lpOptions:

```c
HRESULT PXC_DrawTextExW(
  _PXCContent* content,
  LPCPXC_RectF rect,
  LPCWSTR str,
  LONG sPos,
  LONG len,
  DWORD flags,                    // alignment flags; DTF_CalcOnly = 0x1000
  LPPXC_DrawTextStruct lpOptions  // endY and usedChars are filled in here
);
```
flags specifies the text drawing flags. These control horizontal and vertical text alignment and enable some additional behaviour, notably the DTF_CalcOnly value (0x1000): if this flag is specified, no text is output, but the endY and usedChars fields of the passed PXC_DrawTextStruct are calculated and filled in.

Hopefully that will help you match the width.

Re: Placing OCR Text into a PDF Document

Posted: Fri Oct 10, 2014 11:12 pm
by jeffp
OK, this is good. But now I'm running into another placement issue. To guide my text placement with PXC_DrawTextExW() and PXC_DrawTextStruct, I am drawing a bordered rectangle around my text from the same PXC_RectF, so I can see whether my coordinates are right.

However, if I use the same PXC_RectF for both PXC_Rect() and PXC_DrawTextExW(), the results don't line up. PXC_Rect() draws a perfect rectangle around the text on my image, but PXC_DrawTextExW() seems to interpret the PXC_RectF a bit differently: the text it draws sits about 5 or 6 points too high, or at least higher than the rectangle drawn with PXC_Rect().

Is there a reason for this or am I missing something?

--Jeff

Re: Placing OCR Text into a PDF Document

Posted: Sat Oct 11, 2014 6:32 am
by Serg - Tracker Dev
Can you please provide a code snippet that demonstrates the issue?

Re: Placing OCR Text into a PDF Document

Posted: Sat Oct 11, 2014 3:08 pm
by jeffp
Ok. Here is a snippet that demonstrates my issue.

The call to PXC_TextOutA() below works great. It places the text using the BOTTOM of the rectangle as the baseline.

However, the call to PXC_DrawTextExW() places the text using the TOP of the rectangle as the baseline.

Why is that?

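```delphi
// Text options: font, size and baseline-relative positioning
with AOpts do
begin
  cbSize := SizeOf(AOpts);
  fontID := AFont;
  fontSize := AWord.FontSize;
  nTextPosition := TextPosition_Baseline; // y passed to PXC_TextOutA is the baseline
  nTextAlign := TextAlign_Left;
  LineSpacing := 0;
  PapaSpacing := 0;
  SimItalicAngle := 0;
  SimBoldThickness := 0;
end;
hr := PXC_SetTextOptions(hPage, @AOpts);

// Word box in PDF user space (y axis points up, so top > bottom).
// I2P is a helper not shown here; presumably it converts inches to points.
ARect.left := 100;
ARect.right := ARect.left + 40;
ARect.bottom := I2P(AInchesH) - 300;
ARect.top := ARect.bottom + 20;

// Baseline origin for PXC_TextOutA: bottom-left corner of the box
APoint.x := ARect.left;
APoint.y := ARect.bottom;

// Draw a green outline of the box for visual comparison
hr := PXC_SetStrokeColor(hPage, RGB(0, 255, 0));
hr := PXC_Rect(hPage, ARect.left, ARect.top, ARect.right, ARect.bottom);
hr := PXC_StrokePath(hPage, TRUE);

// Works as expected: the baseline sits on the bottom edge of the box
hr := PXC_TextOutA(hPage, @APoint, PAnsiChar(AStr), Length(AStr));

with AStruct do
begin
  cbSize := SizeOf(AStruct);
end;

// Problem case: with flags = 0 the baseline is observed on the TOP edge of ARect
WStr := AStr;
ASize := Length(WStr);
hr := PXC_DrawTextExW(hPage, @ARect, PWideChar(WStr), 0, ASize, 0, @AStruct);
```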


Re: Placing OCR Text into a PDF Document

Posted: Mon Oct 13, 2014 4:27 pm
by Tracker Supp-Stefan
Thanks Jeff,

I've passed your post above to Serg, and he said he will look at it as soon as possible (the team is currently busy preparing build 310).

Regards,
Stefan