Placing OCR Text onto a PDF Page

PDF-XChange Editor SDK for Developers

jeffp
User
Posts: 914
Joined: Wed Sep 30, 2009 6:53 pm

Post by jeffp »

I'm trying to place OCR words onto a PDF page. If the page rotation is 0 (straight up), the code below works fine. However, if the PDF page is rotated by 90, 180 or 270 degrees, the placed text does not line up with the words in the image.

The AWords.Rotated property below tells me the rotation of the page. However, all my AWords coordinates are based on a rotation of 0.

I'd like to call APage.Set_Rotation() before I place the content to ensure the page is rotated upright, but this doesn't seem to work. It rotates the image part of the page, but the content placement doesn't get rotated.

How can I modify the code below to place the text properly regardless of the PDF page rotation? Again, I know before I make this call whether the PDF page is rotated or not, and the word coordinates I have are always based on a rotation of 0.

Also, am I using the IPXC_ContentCreator interface correctly in my loop over the words below? It places the words very slowly, apparently one word at a time. Is there a way to place all the words at once?

Code: Select all

procedure TMyPDF.PlaceOCRWords(APageNum: Integer; AWords: TOCRWords; AOneTextObject: Boolean);
var
  CC: IPXC_ContentCreator;
  AContent: IPXC_Content;
  AFont: IPXC_Font;
  AText: String;
  i, ADPI: Integer;
  AFontSize, x, y: Double;
  AInchesH, AInchesW, AHeight: Double;
  APage: IPXC_Page;
  AWord: TOCRWord;
begin
  if Assigned(FDoc) and IsValidPageNumber(APageNum) then
  begin
    try
      if (High(AWords.Items) = -1) and (Trim(AWords.WordsAsText) <> '') then
      begin
        AOneTextObject := True;
      end;

      ADPI := AWords.ImageDPI;
      if (ADPI = 0) then ADPI := 300;
      ADPI := Max(100, ADPI);

      if (AWords.ImageWidth <= 100) then AWords.ImageWidth := Floor(8.5 * ADPI);
      if (AWords.ImageHeight <= 100) then AWords.ImageHeight := Floor(11 * ADPI);

      //ALWAYS CREATE PAGE STRAIGHT UP. WILL ROTATE IT BELOW
      //Image size reflect original before any rotation
      if (Abs(AWords.Rotated) = 90) or (Abs(AWords.Rotated) = 270) then
      begin
        AInchesW := AWords.ImageHeight / ADPI;
        AInchesH := AWords.ImageWidth / ADPI;
      end else
      begin
        AInchesH := AWords.ImageHeight / ADPI;
        AInchesW := AWords.ImageWidth / ADPI;
      end;

    except
      AInchesH := 11;
      AInchesW := 8.5;
    end;

    AFont := FDoc.CreateNewFont('Arial', 0, 400);
    APage := GetPage(APageNum);  //Test Page is 612 x 792 points

    for i := 0 to High(AWords.Items) do
    begin
      AWord := AWords.Items[i];
      AText := AWord.Text;
      AFontSize := AWord.FontSize;

      x := X2P(AWord.Left, ADPI);
      y := I2P(AInchesH) - X2P(AWord.Top + AWord.Height, ADPI);

      CC := FDoc.CreateContentCreator;
      CC.SetTextRenderMode(TRM_None); //TRM_None; //TRM_Fill
      CC.SetFont(AFont);
      CC.SetFontSize(AFontSize);
      CC.SetStrokeColorRGB(RGB(0, 0, 0));
      CC.ShowTextLine(x, y, PChar(AText), -1, STLF_Baseline); //Use Baseline
      CC.Detach(AContent);

      APage.PlaceContent(AContent, PlaceContent_After);
    end;

    //APage.Set_Rotation(AWords.Rotated);

    APage := nil;

    WriteToLog(Format('placed %d words on page %d', [High(AWords.Items) + 1, APageNum]));
  end;
end;
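
The X2P and I2P helpers used above are not shown in the thread. Assuming they convert image pixels and inches to PDF points (72 points per inch), the arithmetic behind the x and y values might look like the sketch below. The names PixelsToPoints, InchesToPoints and BaselineY are hypothetical and are not part of the SDK.

```pascal
program PixelToPointSketch;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

const
  PointsPerInch = 72.0;

// Assumed equivalent of X2P: image pixels to PDF points at a given DPI.
function PixelsToPoints(Pixels: Double; DPI: Integer): Double;
begin
  Result := Pixels * PointsPerInch / DPI;
end;

// Assumed equivalent of I2P: inches to PDF points.
function InchesToPoints(Inches: Double): Double;
begin
  Result := Inches * PointsPerInch;
end;

// OCR boxes are measured from the top of the image, while PDF y values
// grow upwards from the bottom of the page, so the bottom edge of the
// word box is subtracted from the page height.
function BaselineY(PageHeightInches: Double;
  WordTop, WordHeight, DPI: Integer): Double;
begin
  Result := InchesToPoints(PageHeightInches)
    - PixelsToPoints(WordTop + WordHeight, DPI);
end;

begin
  // A word 300 px from the left edge of a 300 DPI scan.
  Writeln(PixelsToPoints(300, 300):0:1);      // prints 72.0
  // Its box starts 600 px from the top and is 60 px tall on an 11 inch page.
  Writeln(BaselineY(11, 600, 60, 300):0:1);   // prints 633.6
end.
```

This only covers the 0 rotation case; it says nothing about how a rotated page changes the mapping.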
Sasha - Tracker Dev Team
User
Posts: 5522
Joined: Fri Nov 21, 2014 8:27 am
Contact:

Re: Placing OCR Text onto a PDF Page

Post by Sasha - Tracker Dev Team »

Hello Jeff,

Do you mean that you place the content, then rotate the page, and the content is not rotated? Please send us a small sample that illustrates the problem so that we can debug it quickly.

As for the speed-up: you are not using the PlaceContent method correctly. It should be called once, when you have finished working with the Content Creator. That is, first fill the Content Creator with all of the text needed and then call PlaceContent ONCE to place all of the created content.

Cheers,
Alex
Subscribe at:
https://www.youtube.com/channel/UC-TwAMNi1haxJ1FX3LvB4CQ

Post by jeffp »

I'll work on a sample for you.

In the meantime, is there a way to rotate the content in the ContentCreator before I place it on the page? That way I won't have to rotate the actual page itself.

Also, can you clarify which of the CC... calls above go inside the loop and which go outside it? Do I just create the CC object once? Does CC.Detach go inside the loop or outside?

Post by Sasha - Tracker Dev Team »

Hello Jeff,

In your case, the Content Creator should be created once and the content should be placed once. Change the Content Creator settings only when they actually need to change; setting them again for every word is unnecessary.

To place the text correctly, you should use the https://sdkhelp.pdf-xchange.com/vie ... r_ConcatCS method with the inverted page matrix.
To get the page's matrix, use the https://sdkhelp.pdf-xchange.com/vie ... age_Matrix property, and to invert it, use the https://sdkhelp.pdf-xchange.com/vie ... rix_Invert method.
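
For readers unsure what inverting the page matrix achieves, here is a minimal, SDK-free sketch of the arithmetic. It assumes (this is not confirmed in the thread) that the page matrix maps page content coordinates to the rotated view of the page, so its inverse maps a point chosen in the view back to content coordinates. ConcatCS presumably concatenates a matrix onto the creator's current coordinate system. TAffine, MakeAffine, InvertAffine and ApplyAffine are hypothetical names, and the 90 degree matrix is written by hand for a 612 x 792 point page.

```pascal
program InvertPageMatrixSketch;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils;

type
  // Hypothetical stand-in for a PDF-style affine matrix [a b c d e f]:
  //   x' = a*x + c*y + e
  //   y' = b*x + d*y + f
  TAffine = record
    a, b, c, d, e, f: Double;
  end;

function MakeAffine(a, b, c, d, e, f: Double): TAffine;
begin
  Result.a := a; Result.b := b;
  Result.c := c; Result.d := d;
  Result.e := e; Result.f := f;
end;

// Inverse of an affine matrix; raises if the matrix is singular.
function InvertAffine(const M: TAffine): TAffine;
var
  Det: Double;
begin
  Det := M.a * M.d - M.b * M.c;
  if Abs(Det) < 1e-12 then
    raise Exception.Create('Matrix is not invertible');
  Result.a :=  M.d / Det;
  Result.b := -M.b / Det;
  Result.c := -M.c / Det;
  Result.d :=  M.a / Det;
  Result.e := (M.c * M.f - M.d * M.e) / Det;
  Result.f := (M.b * M.e - M.a * M.f) / Det;
end;

procedure ApplyAffine(const M: TAffine; x, y: Double; var rx, ry: Double);
begin
  rx := M.a * x + M.c * y + M.e;
  ry := M.b * x + M.d * y + M.f;
end;

var
  PageToView, ViewToPage: TAffine;
  cx, cy: Double;
begin
  // Assumed page matrix for a 612 x 792 page shown rotated 90 degrees
  // clockwise: x' = y, y' = 612 - x.
  PageToView := MakeAffine(0, -1, 1, 0, 0, 612);
  ViewToPage := InvertAffine(PageToView);

  // A point picked in the rotated view, mapped back to content coordinates.
  ApplyAffine(ViewToPage, 100, 500, cx, cy);
  Writeln(cx:0:1, ', ', cy:0:1);   // prints 112.0, 100.0
end.
```

Under that assumption, concatenating the inverse onto the content creator means every coordinate passed to it afterwards is interpreted in the view's frame and converted to content coordinates automatically.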

Cheers,
Alex

Post by jeffp »

I still need a bit more help. Below is what I have so far, but I can't figure out how to rotate the content before placing it. I'm not sure what the invert is doing, since what I really want is to rotate the text by 90, 180, or 270 degrees.

In my case the Word items are given to me based on a rotation of 0 (straight up), but the actual PDF page I'm placing the content on may be rotated by 90, 180, or 270 degrees. As such, I need to convert my 0 rotation coordinates accordingly.

Can you modify the code below to show me what a 90 degree rotation would look like with the CC object?

Thanks.

Code: Select all


    APage.Get_Matrix(M1);
    H := INST_AUX.Get_MathHelper;
    H.Matrix_Invert(M1, M2);

    CC := FDoc.CreateContentCreator;
    CC.ConcatCS(M2);
    CC.SetTextRenderMode(TRM_None);
    CC.SetFont(AFont);

    for i := 0 to High(AWords.Items) do
    begin
      AWord := AWords.Items[i];
      AText := AWord.Text;
      AFontSize := AWord.FontSize;
      x := X2P(AWord.Left, ADPI);
      y := I2P(AInchesH) - X2P(AWord.Top + AWord.Height, ADPI);
      CC.SetFontSize(AFontSize);
      CC.SetStrokeColorRGB(RGB(0, 0, 0));
      CC.ShowTextLine(x, y, PChar(AText), -1, STLF_Baseline); //Use Baseline
    end;

    CC.Detach(AContent);
    APage.PlaceContent(AContent, PlaceContent_After);



Post by Sasha - Tracker Dev Team »

Hello Jeff,

In your case you won't need to calculate any rotation values yourself. The inverted page matrix will place the visual text block onto the page, taking the rotation etc. into account.
Please give us a sample so that we can help you, because we don't know exactly what you are doing and how.

Cheers,
Alex

Post by jeffp »

See the two sample files attached.

Right - Before.pdf is the original file with no OCR text.

Right - After.pdf is the PDF file after I try to place OCR text using the code sample I sent in the last message.

The code sample in my last message details the calls I'm making. I just need some help to get them in the right order, or something similar, so that the OCR text is displayed behind the image words.

Remember, to see the OCR text positioning in the After file above, you need to click into the page and then press Ctrl-A.

Also, what do Invert and ConcatCS do? What does ConcatCS stand for? And what about CC.RotateCS? What does that do?

--Jeff
Attachments
Right - After.pdf
Right - Before.pdf

Post by Sasha - Tracker Dev Team »

Hello Jeff,

Please do the same thing with the attached file.

Cheers,
Alex
Attachments
Right - Before_Mod.pdf

Post by jeffp »

Same result as the one I sent you. The text is not rotated by 90 degrees to match the text in the image.

Maybe we need a much simpler example. Could you show me how to create a new empty PDF page and then place one word on that page at each of the rotations 0, 90, 180 and 270?

Again, I'm just trying to figure out a way to load a ContentCreator and then rotate everything I've loaded by 90, 180, or 270 degrees.

I know my rotation going into the call. I don't need to depend on anything related to the PDF page.

Alternatively, is there a call to place the contents of one PDF page onto another PDF page? That way I could place all my content onto a new PDF page, rotate that page accordingly, and then place its contents on my original PDF page. In the old DLLs you had a call named PXCp_PlaceContents which did this.

--Jeff

Post by Sasha - Tracker Dev Team »

Hello Jeff,

We sent you that file so that you could send us back the resulting file. We need to know how your third-party OCR works in order to assist you. Rotation is not the only thing to worry about: page cropping, media box shift etc. should also be taken into account. All of that can (and should) be done easily with matrices, but we need to know what you have as input. So please send back the file we sent you with OCR applied.

Cheers,
Alex

Post by jeffp »

See the attached file.

You'll notice that the hidden OCR text is still being placed straight up. My objective is to have the OCR text placed at a 90 degree rotation so that it sits behind the text in the image, which is rotated by 90 degrees.

Here's how my OCR engine works.

The PDF page is extracted to a TIF image.
The image is detected as rotated by 90 degrees and is then rotated back to straight up (0 degrees).
The image is then sent to the OCR engine straight up.
The OCR text is given back in my Words.Items record array.
The word coordinates are for the straight-up image, but the original rotation of 90 is noted in the OCR Words record so that I can place the text at 90 degrees.

As such, all I'm trying to do is rotate my content by 90 degrees before placing it on the page.
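
The relationship between a word box in the straight-up image and the same box in the image before it was turned upright is plain pixel geometry, which a short SDK-free sketch can make concrete. It is only an illustration and not the approach the thread ends up using. TPixelRect and MapRectToOriginal are hypothetical names, and the sketch assumes that the recorded rotation is the clockwise angle that was applied to make the image upright; an engine that reports the opposite direction would need 90 and 270 swapped.

```pascal
program MapUprightRectSketch;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils;

type
  // Hypothetical pixel rectangle: origin at the top-left corner, y pointing down.
  TPixelRect = record
    Left, Top, Width, Height: Integer;
  end;

// Maps a rectangle from the upright image back to the image as it was
// before being turned upright. ClockwiseDeg is the clockwise rotation
// (0, 90, 180 or 270) that was applied to make the image upright.
// UprightW and UprightH are the pixel dimensions of the upright image.
function MapRectToOriginal(const R: TPixelRect;
  UprightW, UprightH, ClockwiseDeg: Integer): TPixelRect;
begin
  case ((ClockwiseDeg mod 360) + 360) mod 360 of
    0:
      Result := R;
    90:
      begin
        Result.Left := R.Top;
        Result.Top := UprightW - (R.Left + R.Width);
        Result.Width := R.Height;
        Result.Height := R.Width;
      end;
    180:
      begin
        Result.Left := UprightW - (R.Left + R.Width);
        Result.Top := UprightH - (R.Top + R.Height);
        Result.Width := R.Width;
        Result.Height := R.Height;
      end;
    270:
      begin
        Result.Left := UprightH - (R.Top + R.Height);
        Result.Top := R.Left;
        Result.Width := R.Height;
        Result.Height := R.Width;
      end;
  else
    raise Exception.Create('Rotation must be a multiple of 90 degrees');
  end;
end;

var
  Box, Mapped: TPixelRect;
begin
  // A word box on an upright 2550 x 3300 px image (8.5 x 11 inches at 300 DPI).
  Box.Left := 300;
  Box.Top := 600;
  Box.Width := 450;
  Box.Height := 60;

  Mapped := MapRectToOriginal(Box, 2550, 3300, 90);
  Writeln(Mapped.Left, ' ', Mapped.Top, ' ', Mapped.Width, ' ', Mapped.Height);
  // prints 600 1800 60 450
end.
```

Note that width and height swap for 90 and 270 degrees, which is why simply offsetting the upright coordinates cannot line the text up with a rotated image.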

--Jeff
Attachments
Right - Before_Mod_Results.pdf

Post by Sasha - Tracker Dev Team »

Hello Jeff,

OK, one more test to go. Please do the same procedure with these two files and give us the results. Then we will have all of the information we need to help you.

Cheers,
Alex
Attachments
Test2.pdf
Test1.pdf

Post by jeffp »

See attached.
Attachments
Test2_Results.pdf
Test1_Results.pdf

Post by Sasha - Tracker Dev Team »

Hello Jeff,

Aha, now everything's clear. Please try your first code but rotate the page BEFORE creating the content creator. Everything should work.

Cheers,
Alex

Post by jeffp »

Nice. I've got it working now by just rotating the page before I place the text. Then I rotate it back.

This now works because of the Invert and ConcatCS calls, which I didn't have before. Without them, it doesn't work.

Can you explain what the Invert and ConcatCS calls are doing?

--Jeff

Here is the relevant part of the code for anyone else that may be looking at this.

Code: Select all

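    // Temporarily change the page rotation so that the 0-rotation word
    // coordinates line up; the previous rotation is saved in AInt.
    if (AWords.Rotated <> 0) then
    begin
      APage.Get_Rotation(AInt);
      APage.Set_Rotation(AWords.Rotated * -1);
    end;

    // Invert the page matrix so it can be applied to the content creator.
    APage.Get_Matrix(M1);
    H := INST_AUX.Get_MathHelper;
    H.Matrix_Invert(M1, M2);

    // Create the content creator once and set the shared state up front.
    CC := FDoc.CreateContentCreator;
    CC.ConcatCS(M2);
    CC.SetTextRenderMode(TRM_None); //TRM_None; //TRM_Fill
    CC.SetFont(AFont);

    // Add every word to the same content creator.
    for i := 0 to High(AWords.Items) do
    begin
      AWord := AWords.Items[i];
      AText := AWord.Text;
      AFontSize := AWord.FontSize;
      x := X2P(AWord.Left, ADPI);
      y := I2P(AInchesH) - X2P(AWord.Top + AWord.Height, ADPI);
      CC.SetFontSize(AFontSize);
      CC.SetStrokeColorRGB(RGB(0, 0, 0));
      CC.ShowTextLine(x, y, PChar(AText), -1, STLF_Baseline); //Use Baseline
    end;

    // Detach and place the content once, after the loop.
    CC.Detach(AContent);
    APage.PlaceContent(AContent, PlaceContent_After);

    // Restore the page rotation that was saved above.
    if (AWords.Rotated <> 0) then APage.Set_Rotation(AInt);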

Post by Sasha - Tracker Dev Team »

Hello Jeff,

I've updated the descriptions on the Wiki pages. Please see the See Also sections for more detailed information.
https://sdkhelp.pdf-xchange.com/vie ... r_ConcatCS
https://sdkhelp.pdf-xchange.com/vie ... rix_Invert

Cheers,
Alex