Text Under Image

Post by **ashmid** » Sat Jan 07, 2012 11:45 pm

Many OCR programs offer the option of storing "text under image" in PDF files, such that the full bitmap image is shown in the PDF, yet the deciphered characters are encoded invisibly within the document, "beneath" the bitmap image, to enable searching.
I'm wondering: if I start out with a PDF containing bitmap images, and I'd like to add specific words underneath the bitmap images for searching, how can I use the PDF-Tools SDK to add in this extra layer? Do I need to create a separate layer? Or do I simply need to give some command to place the words underneath the image? Or is there some way to give the words an "invisible" attribute?

Post by **ashmid** » Sun Jan 08, 2012 11:15 am

To be more specific: I'm looking for a way to use the PDF-Tools SDK to set the Z-Order of the elements in my PDF file.
For instance, when using the Pitstop plug-in in Acrobat, one can click on an element, and then use the right-click menu to choose "send to back", "bring to front", etc.
I'd like to do the same thing programmatically, via the SDK. Is this possible?
On a related note: Adobe Acrobat has a TouchUp Reading Order tool that lets you specify reading-order tags, which indicate the proper order for parsing the text. How can this be done via the PDF-Tools SDK?
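For the z-order part of the question, some background may help. A PDF page has no separate z-order property: content is painted in the order it appears in the page's content stream, so later operations cover earlier ones. "Send to back" and "bring to front" therefore amount to reordering drawing operations. The following is a minimal sketch, assuming a page modelled as an array of content chunks; `PageLayers`, `send_to_back` and `bring_to_front` are hypothetical names for illustration, not PDF-Tools SDK functions, and reading-order tagging is not covered.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical model of a page: an ordered list of drawing chunks.
 * PDF paints in stream order, so index 0 is the bottom-most element
 * and the last index is the top-most. */
typedef struct {
    const char *chunks[16];
    size_t count;
} PageLayers;

/* "Send to back": move chunk i to index 0 so it is painted first. */
void send_to_back(PageLayers *p, size_t i)
{
    if (i == 0 || i >= p->count)
        return;
    const char *moved = p->chunks[i];
    /* shift chunks [0, i) up by one slot */
    memmove(&p->chunks[1], &p->chunks[0], i * sizeof p->chunks[0]);
    p->chunks[0] = moved;
}

/* "Bring to front": move chunk i to the last index so it is painted last. */
void bring_to_front(PageLayers *p, size_t i)
{
    if (i >= p->count || i == p->count - 1)
        return;
    const char *moved = p->chunks[i];
    /* shift chunks (i, count) down by one slot */
    memmove(&p->chunks[i], &p->chunks[i + 1],
            (p->count - 1 - i) * sizeof p->chunks[0]);
    p->chunks[p->count - 1] = moved;
}
```

Writing the chunks out in array order afterwards produces a stream whose stacking matches the new order.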

Post by **John - Tracker Supp** » Mon Jan 09, 2012 3:53 pm

Hi,

The OCR we offer lets you OCR an image-based PDF; it adds an invisible text layer on top of the page, so that the text is available for searching and selection, rather than placing it in the background as you describe.

You can add and modify the text for searching and selection purposes using the PDF-Tools (XCPRO40) functionality. To make it look as though the text in the image has been changed, however, you would need to use the annotation options to add a flattened text box over the image text, matching the background paper, font and style of the original. This only 'mimics' a modification: the original image text is not changed, because the image remains the primary display layer and all OCR'd text is invisible.

Hope that helps

Post by **ashmid** » Mon Jan 09, 2012 4:01 pm

Hi John,
Thanks for the explanations. I'm going to look at the annotation functions to see how to add an invisible text box layer as you describe. If possible, it would be very helpful if you could please reply with a quick enumeration of the relevant functions/parameters that one would use to create such an invisible layer.
(I'm also still wondering about z-order control and tagging; but I'll post that as a separate thread because it's really a separate issue).

Post by **Walter-Tracker Supp** » Tue Jan 10, 2012 5:22 pm

One of the easiest ways to make text invisible is to simply set the rendering mode.

The PDF-Tools SDK lets you specify text rendering mode for text placement functions using PXC_SetTextRMode() (which takes an enum type, PXC_TextRenderingMode, as a parameter - see page 125 of the SDK manual). The value you would use to render invisible text is TextRenderingMode_None.

Code:

HRESULT PXC_SetTextRMode(
    _PXCContent* content,
    PXC_TextRenderingMode mode,
    PXC_TextRenderingMode* oldmode
);

Parameters

content
[in] Specifies the identifier of the page content to which the function will be applied.

mode
[in] Specifies the text rendering mode. The possible modes are enumerated in PXC_TextRenderingMode. (See comments for possible values.)

oldmode
[out] Pointer to a variable that will receive the previous text rendering mode when the function returns. (See comments for possible values.)

You can then use PXC_TextOutA() or PXC_TextOutW() to place text using this rendering mode.
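In raw PDF terms, the text rendering mode is set with the `Tr` operator, and mode 3 means the glyphs are neither filled nor stroked: they are invisible but still positioned, so they remain searchable and selectable. The sketch below writes such a text object directly as content-stream text. It assumes a font resource named `/F1` exists on the page, and `invisible_text_op` is a hypothetical helper, not an SDK function; that PXC_SetTextRMode with TextRenderingMode_None produces exactly `3 Tr` is an assumption, not something confirmed here.

```c
#include <stdio.h>
#include <string.h>

/* Writes a PDF text object that draws `text` invisibly at (x, y).
 * Assumes a font resource named /F1 on the page (hypothetical).
 * Returns snprintf's result, or -1 if `text` would need escaping
 * in a PDF literal string (not handled in this sketch). */
int invisible_text_op(char *buf, size_t size, double x, double y,
                      double font_size, const char *text)
{
    if (strpbrk(text, "()\\") != NULL)
        return -1;
    /* 3 Tr = rendering mode 3: no fill, no stroke (invisible),
     * but the characters keep their position for search/selection. */
    return snprintf(buf, size, "BT /F1 %g Tf 3 Tr %g %g Td (%s) Tj ET\n",
                    font_size, x, y, text);
}
```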

If you would still prefer to set the order and place the text behind the images, let me know and I will look into this for you.

Post by **ashmid** » Tue Jan 10, 2012 6:07 pm

Hi John,
Thanks for this info; I shall try it out.
Please clarify, though: is this an implementation of the invisible-annotation-text-box method that you described earlier, or is this a different method altogether? It seems to be the latter. If so, I'd also appreciate hearing about the relevant function calls for implementing the annotation layer that you had referenced earlier.
I am also still interested in the text-behind-image option, so if you know of a way to do that with your SDK, please let me know. I'd like to place two separate non-visible text layers on a single page. To avoid confusion between them while searching, I'm considering putting one layer behind the image and the second above the image as invisible text, so that PDF readers won't combine them into a single continuous text.
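Because later content covers earlier content, the two-layer layout described above can be expressed in a single content stream: the first text run is painted before the image (so the image hides it), and the second is painted after the image with rendering mode 3. This is a minimal sketch under stated assumptions: `/F1` and `/Im0` are hypothetical resource names, both runs are placed at a fixed position for brevity, and `layered_stream` is an illustrative helper rather than anything in the PDF-Tools SDK.

```c
#include <stdio.h>
#include <string.h>

/* Builds a content stream with two text layers around one image:
 *   1. hidden_text: painted first, so the opaque image covers it;
 *   2. image XObject /Im0 scaled to w x h points;
 *   3. ocr_text: painted last, with 3 Tr (invisible).
 * /F1 and /Im0 are assumed resource names. Returns -1 if either
 * string would need escaping in a PDF literal string. */
int layered_stream(char *buf, size_t size, double w, double h,
                   const char *hidden_text, const char *ocr_text)
{
    if (strpbrk(hidden_text, "()\\") || strpbrk(ocr_text, "()\\"))
        return -1;
    return snprintf(buf, size,
        "BT /F1 12 Tf 72 72 Td (%s) Tj ET\n"      /* layer 1: under the image */
        "q %g 0 0 %g 0 0 cm /Im0 Do Q\n"           /* image fills the page */
        "BT /F1 12 Tf 3 Tr 72 72 Td (%s) Tj ET\n", /* layer 2: invisible, on top */
        hidden_text, w, h, ocr_text);
}
```

Whether a given viewer keeps the two runs apart when it extracts or searches text depends on that viewer's text-extraction heuristics, which this sketch does not control.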

Post by **Walter-Tracker Supp** » Tue Jan 10, 2012 8:19 pm


This is Walter, not John, by the way (please excuse the confusion caused by several people answering this thread).

The method I mentioned simply places text that is not rendered visibly but still has a defined position, size, font, etc. You can therefore place it anywhere you want, so that a search selects the invisible characters and thus shows their position. Annotations are somewhat different: they display an icon that opens a text box when activated. You can add them with PXC_AddTextAnnotationW() or the related ASCII function.
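To make the difference concrete: an annotation is a separate object listed in the page's `/Annots` array, not part of the page content stream, which is why it behaves like a pop-up note rather than hidden page text. The sketch below formats the raw PDF dictionary of a text ("sticky note") annotation. `text_annot_dict` is a hypothetical helper, the 20 x 20 pt icon area is an arbitrary choice, and whether PXC_AddTextAnnotationW() writes exactly these keys is an assumption.

```c
#include <stdio.h>
#include <string.h>

/* Formats a raw PDF text-annotation dictionary. /Subtype /Text shows an
 * icon inside /Rect; /Contents is the text shown when it is opened.
 * Returns -1 if `contents` would need escaping (not handled here). */
int text_annot_dict(char *buf, size_t size, double x, double y,
                    const char *contents)
{
    if (strpbrk(contents, "()\\") != NULL)
        return -1;
    /* 20 x 20 pt icon area anchored at (x, y); /Open false = closed note */
    return snprintf(buf, size,
        "<< /Type /Annot /Subtype /Text /Rect [%g %g %g %g] "
        "/Contents (%s) /Open false >>",
        x, y, x + 20, y + 20, contents);
}
```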

I will look into the other possibility (placing text above and below images) and get back to you shortly.

-Walter

Post by **Walter-Tracker Supp** » Tue Jan 10, 2012 11:36 pm

It looks like you have received an answer to your other question in another thread. For the sake of posterity I am posting the link here (someone else may find this topic at some point!):

https://forum.pdf-xchange.com/ ... 44&t=12406

If there's anything else we can help with, or if this doesn't answer all of your questions, please don't hesitate to contact us, either here or at support@pdf-xchange.com

Thanks,
Walter

Post by **ronystg** » Thu Aug 23, 2012 3:59 pm

Hi,

For some reason, when I check the TextRenderingMode of a text element, it always returns TextRenderingMode_Fill instead of TextRenderingMode_None.

I use:

hr = PXCp_ET_GetElement(doc1, fnum, &TextElement, 0);
if (IS_DS_FAILED(hr) || (LONG)TextElement.Count <= 0)
    continue;

// Request the text, offsets, matrix, font info and text parameters
TextElement.mask = PTEM_Text | PTEM_Offsets | PTEM_Matrix | PTEM_FontInfo | PTEM_TextParams;

hr = PXCp_ET_GetElement(doc1, fnum, &TextElement, GTEF_IgnorePageRotation);
if (IS_DS_FAILED(hr))   // also check the result of the second call
    continue;

// Skip elements that contain only a single space or tab
if (TextElement.Count == 2 && (TextElement.Characters[0] == ' ' || TextElement.Characters[0] == '\t'))
    continue;
// 22.12.06 IV s
if (TextElement.RenderingMode != TextRenderingMode_None)
{
    visible_text = 1;     // 22.12.06 IV
    visible_text_num++;   // 23.8.12
}
All three PDF files return TextRenderingMode_Fill.

Please help.

Rony
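One thing worth ruling out, given the earlier discussion in this thread: invisible search text can also be produced by painting ordinary text behind an opaque image, in which case the text genuinely uses a fill mode (PDF's default rendering mode is 0, fill) and a rendering-mode check would report Fill. That is only one possible explanation, not a diagnosis. The naive sketch below scans an already-decompressed content stream and counts text-showing operators per rendering mode, to see whether mode 3 appears at all. `count_text_by_mode` is a hypothetical helper, not an SDK function; it assumes string operands contain no whitespace-separated operator names and at most 32 nested `q` levels.

```c
#include <stdlib.h>
#include <string.h>

/* Counts text-showing operators (Tj, TJ, ', ") per text rendering mode
 * (0-7) set via Tr in an uncompressed content stream. Tr belongs to the
 * graphics state, so q/Q save and restore it. Naive whitespace tokenizer. */
void count_text_by_mode(const char *stream, int counts[8])
{
    char *copy = malloc(strlen(stream) + 1);
    int stack[32], depth = 0, mode = 0;
    long last_num = 0;
    memset(counts, 0, 8 * sizeof counts[0]);
    if (copy == NULL)
        return;
    strcpy(copy, stream);
    for (char *tok = strtok(copy, " \t\r\n"); tok; tok = strtok(NULL, " \t\r\n")) {
        char *end;
        long v = strtol(tok, &end, 10);
        if (end != tok && *end == '\0')
            last_num = v;                       /* remember integer operand */
        else if (strcmp(tok, "Tr") == 0 && last_num >= 0 && last_num <= 7)
            mode = (int)last_num;               /* set rendering mode */
        else if (strcmp(tok, "q") == 0 && depth < 32)
            stack[depth++] = mode;              /* save graphics state */
        else if (strcmp(tok, "Q") == 0 && depth > 0)
            mode = stack[--depth];              /* restore graphics state */
        else if (!strcmp(tok, "Tj") || !strcmp(tok, "TJ") ||
                 !strcmp(tok, "'") || !strcmp(tok, "\""))
            counts[mode]++;                     /* text shown in current mode */
    }
    free(copy);
}
```

If `counts[3]` stays at zero for a file that is nonetheless searchable, the hidden text is probably achieved some other way (for example, drawn under the image).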

Post by **Nico - Tracker Supp** » Thu Aug 23, 2012 7:46 pm

Hi ronystg,

Thanks for your post.
Could you provide a small sample or small project with your code, which we can test and run?
(Make sure that any dev info is removed from the code before posting it in the forums!)

Also, how are you initializing the TextElement structure and doc1?
Thanks.

Sincerely,
