Text Under Image

This Forum is for the use of Software Developers requiring help and assistance for Tracker Software's PDF-Tools SDK of Library DLL functions(only) - Please use the PDF-XChange Drivers API SDK Forum for assistance with all PDF Print Driver related topics or PDF-XChange Viewer SDK if appropriate.

Moderators: TrackerSupp-Daniel, Tracker Support, Vasyl-Tracker Dev Team, Chris - Tracker Supp, Sean - Tracker, Tracker Supp-Stefan

ashmid
User
Posts: 27
Joined: Fri Jan 06, 2012 12:35 am

Text Under Image

Post by ashmid »

Many OCR programs offer the option of storing "text under image" in PDF files, such that the full bitmap image is shown in the PDF, yet the deciphered characters are encoded invisibly within the document, "beneath" the bitmap image, to enable searching.
I'm wondering: if I start out with a PDF containing bitmap images, and I'd like to add specific words underneath the bitmap images for searching, how can I use the PDF-Tools SDK to add in this extra layer? Do I need to create a separate layer? Or do I simply need to give some command to place the words underneath the image? Or is there some way to give the words an "invisible" attribute?
ashmid
User
Posts: 27
Joined: Fri Jan 06, 2012 12:35 am

Re: Text Under Image

Post by ashmid »

To be more specific: I'm looking for a way to use the PDF-Tools SDK to set the Z-Order of the elements in my PDF file.
For instance, when using the Pitstop plug-in in Acrobat, one can click on an element, and then use the right-click menu to choose "send to back", "bring to front", etc.
I'd like to do the same thing programmatically, via the SDK. Is this possible?
On a related note: Adobe Acrobat has a "touchup reading order" tool that allows the specification of reading-order tags, to indicate the proper order for parsing the text. How can this be done via the PDF-Tools SDK?
John - Tracker Supp
Site Admin
Posts: 5219
Joined: Tue Jun 29, 2004 10:34 am
Location: United Kingdom

Re: Text Under Image

Post by John - Tracker Supp »

Hi,

Our OCR processes an image-based PDF and adds an invisible text layer on top of the page, so that it is in focus for text searching and selection, rather than in the background as you outline.

You can add and modify text using the PDF-Tools (XCPRO40) functionality for search and selection purposes. To make it look as though the text in the image itself had been changed (it will not be), you would need to use the annotation options to add a flattened text box over the image text, matching the background paper, font and style of the original. The image remains the primary display layer, and all OCR'd text is invisible.

Hope that helps
If posting files to this forum - you must archive the files to a ZIP, RAR or 7z file or they will not be uploaded - thank you.

Best regards
Tracker Support
http://www.tracker-software.com
ashmid
User
Posts: 27
Joined: Fri Jan 06, 2012 12:35 am

Re: Text Under Image

Post by ashmid »

Hi John,
Thanks for the explanations. I'm going to look at the annotation functions to see how to add an invisible text box layer as you describe. If possible, it would be very helpful if you could please reply with a quick enumeration of the relevant functions/parameters that one would use to create such an invisible layer.
(I'm also still wondering about z-order control and tagging; but I'll post that as a separate thread because it's really a separate issue).
Walter-Tracker Supp
User
Posts: 381
Joined: Mon Jun 13, 2011 5:10 pm

Re: Text Under Image

Post by Walter-Tracker Supp »

One of the easiest ways to make text invisible is to simply set the rendering mode.

The PDF-Tools SDK lets you specify text rendering mode for text placement functions using PXC_SetTextRMode() (which takes an enum type, PXC_TextRenderingMode, as a parameter - see page 125 of the SDK manual). The value you would use to render invisible text is TextRenderingMode_None.

HRESULT PXC_SetTextRMode(
    _PXCContent* content,
    PXC_TextRenderingMode mode,
    PXC_TextRenderingMode* oldmode
);

Parameters

content
[in] Specifies the identifier of the page content to which the function will be applied.

mode
[in] Specifies the text rendering mode. The possible modes are enumerated in PXC_TextRenderingMode. (See comments for possible values)

oldmode
[out] Pointer to a variable that will contain the previous text rendering mode after the function returns. (See comments for possible values)

You can then use PXC_TextOutA() or PXC_TextOutW() to place text using this rendering mode.
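
As a rough sketch of the sequence described above, the following C fragment switches to TextRenderingMode_None, places one string, and then restores the previous mode. The typedefs and the two PXC_ functions at the top are stand-ins written only so that the sketch compiles without the SDK headers. The enum values, the PXC_PointF layout and the PXC_TextOutW parameter list are assumptions that should be checked against the SDK manual, and place_invisible_text is a hypothetical helper name.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* ---- Stand-ins (assumptions) so the sketch compiles without the SDK. ----
 * In a real project these come from the PDF-Tools SDK header instead. */
typedef long HRESULT;
#define S_OK 0L
#define FAILED(hr) ((hr) < 0)

/* Enum values are assumed; only the two names appear in this thread. */
typedef enum {
    TextRenderingMode_Fill = 0,
    TextRenderingMode_None = 3
} PXC_TextRenderingMode;

typedef struct _PXCContent {
    PXC_TextRenderingMode mode; /* current text rendering mode */
    int invisible_runs;         /* runs placed while the mode was None */
} _PXCContent;

typedef struct { double x, y; } PXC_PointF; /* layout assumed */

/* Signature as quoted from the manual; the body is a mock. */
static HRESULT PXC_SetTextRMode(_PXCContent *content,
                                PXC_TextRenderingMode mode,
                                PXC_TextRenderingMode *oldmode)
{
    *oldmode = content->mode;
    content->mode = mode;
    return S_OK;
}

/* Parameter list assumed; the body is a mock that only counts calls. */
static HRESULT PXC_TextOutW(_PXCContent *content, const PXC_PointF *origin,
                            const wchar_t *text, long len)
{
    (void)origin; (void)text; (void)len;
    if (content->mode == TextRenderingMode_None)
        content->invisible_runs++;
    return S_OK;
}
/* ---- End of stand-ins. ---- */

/* Hypothetical helper: place `text` invisibly at `origin`, then restore
 * whatever rendering mode was active before the call. */
HRESULT place_invisible_text(_PXCContent *content, const PXC_PointF *origin,
                             const wchar_t *text)
{
    assert(content && origin && text);

    PXC_TextRenderingMode old_mode, ignored;
    HRESULT hr = PXC_SetTextRMode(content, TextRenderingMode_None, &old_mode);
    if (FAILED(hr))
        return hr;

    hr = PXC_TextOutW(content, origin, text, (long)wcslen(text));

    /* Restore even if the text call failed, so later text stays visible. */
    HRESULT hr_restore = PXC_SetTextRMode(content, old_mode, &ignored);
    return FAILED(hr) ? hr : hr_restore;
}
```

With the real SDK header included, the stand-in section would be deleted and place_invisible_text() called once per word or line that should be searchable.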

If you would still prefer to set the order and place the text behind the images, let me know and I will look into this for you.
ashmid
User
Posts: 27
Joined: Fri Jan 06, 2012 12:35 am

Re: Text Under Image

Post by ashmid »

Hi John,
Thanks for this info; I shall try it out.
Please clarify, though: is this an implementation of the invisible-annotation-text-box method that you described earlier, or is this a different method altogether? It seems to be the latter. If so, I'd also appreciate hearing about the relevant function calls for implementing the annotation layer that you had referenced earlier.
I am also still interested in the text-behind-image option, so if you do know of a way to do that with your SDK please do let me know (I'd like to place two separate non-visible text layers onto a single page, and in order to avoid confusion between them while searching, I'm considering putting one layer behind the image and a second layer above the image in invisible text, to make sure that the PDF readers won't combine them into a single continuous text).
Walter-Tracker Supp
User
Posts: 381
Joined: Mon Jun 13, 2011 5:10 pm

Re: Text Under Image

Post by Walter-Tracker Supp »

This is Walter, not John BTW (excuse confusion from multiple people addressing this).

The method I mentioned is simply to place text that will not be rendered as visible, but which still has a defined position, size, font, etc. You can therefore position it anywhere you want, so that a search selects the invisible characters and thereby shows their position. Annotations are somewhat different: they provide an icon that opens a text box when it is activated. You can add them with PXC_AddTextAnnotationW() and the related ASCII function.
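
For reference, at the PDF level an invisible run of this kind is ordinary positioned text drawn with text rendering mode 3 (the "3 Tr" operator: neither fill nor stroke), which is presumably what TextRenderingMode_None maps to. The sketch below is not SDK code. It only formats the content-stream operators for one such run, to show that the text keeps its font, size and position while painting nothing. invisible_text_ops is a hypothetical helper, and "F1" in the usage note is a hypothetical font resource name.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Writes a PDF content-stream fragment that places `text` at (x, y) using
 * text rendering mode 3 (invisible). Returns the number of characters
 * written, or -1 if the buffer is too small or the text contains
 * characters that would need escaping in a PDF literal string. */
int invisible_text_ops(char *out, size_t cap, const char *font_res,
                       double size, double x, double y, const char *text)
{
    assert(out && font_res && text);

    /* Keep the sketch simple: refuse parentheses and backslashes. */
    if (strpbrk(text, "()\\") != NULL)
        return -1;

    /* BT/ET bracket a text object; Tf selects font and size; Tr sets the
     * rendering mode; Tm positions the text; Tj shows the string. */
    int n = snprintf(out, cap,
                     "BT /%s %.2f Tf 3 Tr 1 0 0 1 %.2f %.2f Tm (%s) Tj ET\n",
                     font_res, size, x, y, text);
    return (n < 0 || (size_t)n >= cap) ? -1 : n;
}
```

Calling invisible_text_ops(buf, sizeof buf, "F1", 12.0, 72.0, 700.0, "invoice") fills buf with a single line of operators. Because mode 3 paints nothing, such a run stays hidden whether it comes before or after the image in the page's content stream.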

I will look into the other possibility (placing text above and below images) and get back to you shortly.

-Walter
Walter-Tracker Supp
User
Posts: 381
Joined: Mon Jun 13, 2011 5:10 pm

Re: Text Under Image

Post by Walter-Tracker Supp »

It looks like you have received an answer to your other question in another thread. For the sake of posterity I am posting the link here (someone else may find this topic at some point!):

https://forum.pdf-xchange.com/ ... 44&t=12406

If there's anything else we can help with, or if this doesn't answer all of your questions, please don't hesitate to contact us, either here or at support@pdf-xchange.com

Thanks,
Walter
ronystg
User
Posts: 4
Joined: Mon Sep 18, 2006 5:23 pm

Re: Text Under Image

Post by ronystg »

Hi,

For some reason, when I check a text element's rendering mode, it always returns TextRenderingMode_Fill instead of TextRenderingMode_None.

I use:
"
hr = PXCp_ET_GetElement(doc1, fnum, &TextElement, 0);
if (IS_DS_FAILED(hr) || (LONG)TextElement.Count <= 0)
    continue;

TextElement.mask = PTEM_Text | PTEM_Offsets | PTEM_Matrix | PTEM_FontInfo | PTEM_TextParams;

hr = PXCp_ET_GetElement(doc1, fnum, &TextElement, GTEF_IgnorePageRotation);

if (TextElement.Count == 2 && (TextElement.Characters[0] == ' ' || TextElement.Characters[0] == '\t'))
    continue;
// 22.12.06 IV s
if (TextElement.RenderingMode != TextRenderingMode_None)
{
    visible_text = 1;     // 22.12.06 IV
    visible_text_num++;   // 23.8.12
}
"
All three PDF files in the attachment return "TextRenderingMode_Fill".

Please Help

Rony
Attachments
retextrenderingmode_none1.zip
(129.21 KiB) Downloaded 208 times
Nico - Tracker Supp
User
Posts: 205
Joined: Fri May 18, 2012 8:41 pm

Re: Text Under Image

Post by Nico - Tracker Supp »

Hi ronystg,

Thanks for your post.
Could you provide a small sample or small project with your code, which we can test and run?
(Make sure that any dev info is removed from the code before posting it in the forums!)

Also, how are you initializing the TextElement structure and doc1?
Thanks.

Sincerely,