How to Add OCR Text to an Image PDF?

PDF-XChange Viewer SDK for Developers
(ActiveX and Simple DLL Versions)

jeffp
User
Posts: 859
Joined: Wed Sep 30, 2009 6:53 pm

How to Add OCR Text to an Image PDF?

Post by jeffp » Wed Sep 30, 2009 7:41 pm

One of the primary items I need to replicate in my existing program if I make the switch to your PDF components is the ability to add OCR text to a PDF page in the viewer.

My OCR engine gives me a bunch of text with coordinates. With my existing PDF component (that I'm not too happy with), I use the text and coordinates to create a bunch of text objects and place them in the PDF page using the coordinates given by the OCR engine.

Could you please direct me to, or perhaps provide, a small Delphi code example of how I would do this with your ActiveX API to the Viewer (i.e., which Named Objects and Operations to use)? I need to set the TextRenderingMode to Invisible in this case. I'm using Delphi 2007.

Thanks.

Corwin - Tracker Sup
User
Posts: 670
Joined: Tue Nov 14, 2006 12:23 pm

Re: How to Add OCR Text to an Image PDF?

Post by Corwin - Tracker Sup » Thu Oct 01, 2009 2:25 pm

Hi,
I'm afraid that this cannot be done with the Viewer ActiveX component. You should use the PDF-XChange Tools SDK to do this.
HTH.

Re: How to Add OCR Text to an Image PDF?

Post by jeffp » Thu Oct 01, 2009 3:19 pm

In a phone conversation I had with John, he mentioned he thought this was possible in the Viewer. What about the JavaScript route? Does that provide a way to embed hidden text into the active viewer document?

My document will already be open in the viewer. If I went the PDF Tools route, I'd have to save it, reopen it with PDF Tools, add the text, save it again, and finally reopen it in the Viewer. I guess I could also pass a stream to PDF Tools, but that would still mean I have two instances of the document open, which I'd like to avoid.

Back to JavaScript? Do you know if that would be an option?

John - Tracker Supp
Site Admin
Posts: 8204
Joined: Tue Jun 29, 2004 10:34 am
Location: Vancouver Island - Canada

Re: How to Add OCR Text to an Image PDF?

Post by John - Tracker Supp » Thu Oct 01, 2009 4:01 pm

Hi Jeff,

My mistake - on checking, I am afraid Sasha is correct: you will have to utilise the xcpro40 functionality to add the text back into the file.

As mentioned to you, we are planning to merge the library functionality into the Viewer Ax, as far as is possible, during the coming year. For now, though, I am afraid this is not possible within the Viewer. My apologies for misleading you.
If posting files to this forum - you must archive the files to a ZIP, RAR or 7z file or they will not be uploaded - thank you.

Best regards
Tracker Support
http://www.tracker-software.com

Re: How to Add OCR Text to an Image PDF?

Post by jeffp » Thu Oct 01, 2009 7:43 pm

What about JavaScript? I found a post on this forum entitled "Problem with Doc.addAnnot method?" which seems to discuss a JavaScript route to adding text. Below is some of that post.

string script = "this.addAnnot({page : 0, type : \"FreeText\", rect : [255,10,355,30], fillColor : color.transparent, width : 0, strokeColor:color.red, alignment : 1, contents : \"sample text\"});";
axCoPDFXCview1.RunJavaScript(script, out bsResult, 0, 0);

Would I be able to do something like this in JavaScript while still inside the Viewer?
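
To make the question concrete, here is a minimal sketch of how a script like the one above could be generated for several OCR words at once. It is only an illustration: `buildAnnotScript` and the shape of the word list are hypothetical, and the only API it relies on is the `addAnnot` call and properties already quoted from the other post. It produces FreeText annotations, not text objects in the page content, and nothing in this thread confirms how the Viewer treats them.

```javascript
// Hypothetical helper: builds one addAnnot statement per OCR word, using the
// same properties as the snippet quoted above.
function buildAnnotScript(pageIndex, words) {
  return words
    .map((w) => {
      // JSON.stringify escapes quotes and backslashes in the recognised text
      const contents = JSON.stringify(w.text);
      // Round coordinates to two decimals to keep the script compact
      const rect = w.rect.map((n) => Number(n.toFixed(2))).join(",");
      return (
        `this.addAnnot({page: ${pageIndex}, type: "FreeText", ` +
        `rect: [${rect}], fillColor: color.transparent, width: 0, ` +
        `contents: ${contents}});`
      );
    })
    .join("\n");
}

// Assumed input shape: rect is [left, bottom, right, top] in page units.
const script = buildAnnotScript(0, [
  { text: "sample text", rect: [255, 10, 355, 30] },
  { text: 'say "hi"', rect: [72, 700, 140, 714] },
]);
console.log(script);
```

The resulting string would then be passed to RunJavaScript in the same way as the single-annotation script above.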

Re: How to Add OCR Text to an Image PDF?

Post by John - Tracker Supp » Thu Oct 01, 2009 8:28 pm

Hi Jeff,

The problem is not so much adding text as an annotation - which can be done - but the co-ordinates required to reproduce the location of the text in the image. Also, if the annotation is hidden, then its text is hidden from the perspective of selection as well.

So I am afraid that, at this point, the only real option is to use the XCPRO40 library.

Best regards
Tracker Support
http://www.tracker-software.com

Re: How to Add OCR Text to an Image PDF?

Post by jeffp » Thu Oct 01, 2009 8:59 pm

The OCR engine gives me the coordinates (since I will already have made a TIFF image of the PDF page and sent it to the OCR engine for processing), and I don't want the user to select the text box in the viewer, so that issue is OK.

As such, will RunJavaScript place text objects in the viewer document?

Also, you refer to "annotation". When I do this in the Amyuni component that I am currently using, the PDF object is referred to as a "Text" object. Is that the same as an annotation?
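
On the coordinates point: OCR engines typically report boxes in image pixels with the origin at the top-left, whereas PDF default user space is measured in points (1/72 inch) with the origin at the bottom-left. The conversion can be sketched as below. This is a minimal sketch under stated assumptions: the TIFF was rendered from the whole page at a known DPI, the page is not rotated, and its lower-left corner sits at (0, 0). The function name and the box shape are hypothetical.

```javascript
// Convert an OCR box in image pixels (origin top-left, y grows downward) to a
// PDF rectangle in points (origin bottom-left, y grows upward).
// Assumptions: full-page render at `dpi`, unrotated page, page origin at (0, 0).
function pixelBoxToPdfRect(box, dpi, pageHeightPt) {
  const toPt = (px) => (px * 72) / dpi; // 72 points per inch
  const left = toPt(box.x);
  const right = toPt(box.x + box.width);
  const top = pageHeightPt - toPt(box.y); // flip the y axis
  const bottom = pageHeightPt - toPt(box.y + box.height);
  return [left, bottom, right, top];
}

// A US Letter page (612 x 792 pt) rendered at 300 dpi is 2550 x 3300 px.
const rect = pixelBoxToPdfRect({ x: 300, y: 150, width: 600, height: 75 }, 300, 792);
console.log(rect); // values: 72, 738, 216, 756
```

The result is in the [left, bottom, right, top] order used by the `rect` property in the addAnnot snippet earlier in the thread. Pages with a rotation or a non-zero page origin would need an extra transform.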

Re: How to Add OCR Text to an Image PDF?

Post by John - Tracker Supp » Fri Oct 02, 2009 3:34 pm

Best regards
Tracker Support
http://www.tracker-software.com
