Placing OCR Text into a PDF Document

This Forum is for the use of Software Developers requiring help and assistance for Tracker Software's PDF-Tools SDK of Library DLL functions(only) - Please use the PDF-XChange Drivers API SDK Forum for assistance with all PDF Print Driver related topics or PDF-XChange Viewer SDK if appropriate.

jeffp
User
Posts: 857
Joined: Wed Sep 30, 2009 6:53 pm

Placing OCR Text into a PDF Document

Post by jeffp » Thu Oct 02, 2014 9:03 pm

I'm trying to better align my OCR text behind the image where the text appears.

Here's my issue.

I know from the OCR coordinates the exact width of the word I'm trying to place. However, when I use either PXC_TextOutA or PXC_DrawTextEx to place the text, the "actual" width of the placed word depends on my getting the Font Size and Font ID correct. That is, I can create a rect in PXC_DrawTextEx that is, say, 100 wide, but because of the font size and font ID being used, the word may only need 70% of that width.

Is there a way to better align my words given the fact that I know the exact width my word needs to be?

The reason this matters is redaction. Some programs out there redact the image using the coordinates of the word they find behind the image. So if my word's width is too small, the redaction doesn't cover the entire word on the image.

Do you follow me here?

Any help would be greatly appreciated.

--Jeff

Re: Placing OCR Text into a PDF Document

Post by jeffp » Thu Oct 02, 2014 9:40 pm

Here's a thought: Is there a way I can place the word on a temporary basis and then check the width of the word I just placed to see how it compares to the width of the word given to me by the OCR engine? If the width is less, I adjust the font size up; if the width is greater, I adjust the font size down. This would ensure that when I place the word, its width will match the width of the word on the image.
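
If the placed width can be measured once, the adjustment may not need repeated trial and error: for a given font, glyph advances scale linearly with font size, so a single measurement gives the corrected size directly. Below is a minimal sketch in C under that assumption. `scaled_font_size` and its parameters are hypothetical names for illustration, not part of the PDF-Tools SDK, and how the trial width is obtained is left out.

```c
/* Hypothetical helper, not an SDK function.
 * Given the width a word occupies at `trial_size`, returns the font size
 * at which the same word would span `target_width`.
 * Assumes glyph advances scale linearly with font size and that no extra
 * character or word spacing is applied.
 * Returns 0.0 when any input is not positive. */
double scaled_font_size(double trial_size, double measured_width,
                        double target_width)
{
    if (trial_size <= 0.0 || measured_width <= 0.0 || target_width <= 0.0)
        return 0.0;
    return trial_size * (target_width / measured_width);
}
```

With the numbers from the first post, a word that fills only 70 of a 100-wide rect at a trial size of 10 would need a size of roughly 14.3 to fill the full width.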

Serg - Tracker Dev
User
Posts: 14
Joined: Wed Sep 17, 2014 7:40 am
Location: Ukraine

Re: Placing OCR Text into a PDF Document

Post by Serg - Tracker Dev » Fri Oct 03, 2014 2:51 pm

The last two parameters of the PXC_DrawTextExW function are flags and lpOptions:

HRESULT PXC_DrawTextExW(
  _PXCContent* content,
  LPCPXC_RectF rect,
  LPCWSTR str,
  LONG sPos,
  LONG len,
  DWORD flags,
  LPPXC_DrawTextStruct lpOptions
);

flags specifies the text drawing flags. These control horizontal and vertical text alignment and provide some additional capabilities, such as the DTF_CalcOnly value (0x1000). If this flag is specified, no text output is produced, but the endY and usedChars fields of the passed PXC_DrawTextStruct are calculated and filled.

I hope that helps you match the width.
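
One way a calculate-only pass might be used for width matching is sketched below in C. It assumes that, after a DTF_CalcOnly call with a rect of the target width, comparing usedChars with the word length tells you whether the whole word fit; that reading of usedChars is an assumption and is not confirmed in this thread. The SDK call is hidden behind a hypothetical callback type (`fits_fn`), and `toy_fits` is only a stand-in measurement so the sketch runs without the library.

```c
#include <stddef.h>

/* Hypothetical callback: returns nonzero if the word, set at `font_size`,
 * fits within the target width. In real use this might wrap a
 * PXC_DrawTextExW call with DTF_CalcOnly and compare usedChars with the
 * word length (an assumption about how usedChars behaves). */
typedef int (*fits_fn)(double font_size, void *ctx);

/* Binary search for the largest font size in [lo, hi] at which the word
 * still fits. Assumes fitting is monotonic: if a size fits, every smaller
 * size fits too. Returns lo if even lo does not fit. */
double largest_fitting_size(fits_fn fits, void *ctx,
                            double lo, double hi, double tol)
{
    if (tol <= 0.0)
        tol = 1e-6;     /* guard against a loop that never terminates */
    if (!fits(lo, ctx))
        return lo;
    if (fits(hi, ctx))
        return hi;
    while (hi - lo > tol) {
        double mid = lo + (hi - lo) / 2.0;
        if (fits(mid, ctx))
            lo = mid;   /* mid fits: the answer is at mid or above */
        else
            hi = mid;   /* mid overflows: the answer is below mid */
    }
    return lo;
}

/* Stand-in measurement for trying the search without the SDK:
 * pretends every character is 0.5 * font_size wide. */
struct toy_word { size_t length; double target_width; };

int toy_fits(double font_size, void *ctx)
{
    const struct toy_word *w = (const struct toy_word *)ctx;
    return 0.5 * font_size * (double)w->length <= w->target_width;
}
```

The search only needs a yes/no answer per trial size, which is why it suits a calculate-only call that reports how many characters were used and does not report a width.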

Re: Placing OCR Text into a PDF Document

Post by jeffp » Fri Oct 10, 2014 11:12 pm

Ok. This is good. But I'm now running into another placement issue. To guide my text placement with PXC_DrawTextExW(), I am drawing a bordered rectangle around my text, using the same PXC_RectF, so I can see whether I have the right coordinates.

However, if I use the same PXC_RectF for both PXC_Rect() and PXC_DrawTextExW(), the coordinates seem to be off. PXC_Rect() draws a perfect rectangle around the text on my image, but the PXC_RectF passed to PXC_DrawTextExW() appears to be interpreted a bit differently. The text rectangle seems to be about 5 or 6 points too high, or at least higher than what was drawn with PXC_Rect().

Is there a reason for this or am I missing something?

--Jeff

Re: Placing OCR Text into a PDF Document

Post by Serg - Tracker Dev » Sat Oct 11, 2014 6:32 am

Can you please provide a code snippet that demonstrates the issue?

Re: Placing OCR Text into a PDF Document

Post by jeffp » Sat Oct 11, 2014 3:08 pm

Ok. Here is a snippet that demonstrates my issue.

The call to PXC_TextOutA() below works great. It places the text using the BOTTOM of the rectangle as the baseline.

However, the call to PXC_DrawTextExW() places the text using the TOP of the rectangle as the baseline.

Why is that?

with AOpts do
begin
  cbSize := SizeOf(AOpts);
  fontID := AFont;
  fontSize := AWord.FontSize;
  nTextPosition := TextPosition_Baseline; //BASELINE
  nTextAlign := TextAlign_Left;
  LineSpacing := 0;
  ParaSpacing := 0;
  SimItalicAngle := 0;
  SimBoldThickness := 0;
end;
hr := PXC_SetTextOptions(hPage, @AOpts);

ARect.left := 100;
ARect.right := ARect.left + 40;
ARect.bottom := I2P(AInchesH) - 300;
ARect.top := ARect.bottom + 20;

APoint.x := ARect.left;
APoint.y := ARect.bottom;

//Draw Rectangle
hr := PXC_SetStrokeColor(hPage, RGB(0, 255, 0));
hr := PXC_Rect(hPage, ARect.left, ARect.top, ARect.right, ARect.bottom);
hr := PXC_StrokePath(hPage, TRUE);

hr := PXC_TextOutA(hPage, @APoint, PAnsiChar(AStr), Length(AStr));

with AStruct do
begin
  cbSize := SizeOf(AStruct);
end;

WStr := AStr;
ASize := Length(WStr);
hr := PXC_DrawTextExW(hPage, @ARect, PWideChar(WStr), 0, ASize, 0, @AStruct);


Tracker Supp-Stefan
Site Admin
Posts: 13428
Joined: Mon Jan 12, 2009 8:07 am
Location: London

Re: Placing OCR Text into a PDF Document

Post by Tracker Supp-Stefan » Mon Oct 13, 2014 4:27 pm

Thanks Jeff,

I've passed your above post to Serg, and he said that he will take a look at it as soon as possible (the guys are currently busy with preparation of build 310).

Regards,
Stefan
