Possible to detect if pdf is searchable already?

PDF-X OCR SDK is a new product from us, intended to complement our existing PDF and imaging tools and to provide developers with an expanding set of professional tools for Optical Character Recognition tasks.

Moderators: TrackerSupp-Daniel, Tracker Support, Vasyl-Tracker Dev Team, Chris - Tracker Supp, Sean - Tracker, Tracker Supp-Stefan

elee
User
Posts: 1
Joined: Fri Aug 10, 2012 12:31 am

Post by elee »

I'm working on a program in VB.NET that will automatically make a PDF searchable if it isn't already. Is it possible for the program to detect, with this SDK, whether a PDF has already been OCRed?
Tracker Supp-Stefan
Site Admin
Posts: 17948
Joined: Mon Jan 12, 2009 8:07 am
Location: London

Re: Possible to detect if pdf is searchable already?

Post by Tracker Supp-Stefan »

Hello elee,

I don't think the OCR SDK is the right tool for this purpose, but you can use, for example, our Tools SDK to check whether any text can be extracted from a given page. If there is, the page has some text content. This still cannot guarantee that the whole page has been OCRed, or that it was created from a file format that allowed the PDF creation tool to preserve the text as such.
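
As a rough illustration of that per-page check, here is a minimal sketch in Python (used only for illustration, the same logic carries over to VB.NET). It assumes the text of each page has already been extracted by whatever tool you use, for example the Tools SDK, and is passed in as a list of strings. The function names and the min_chars threshold are hypothetical and are not part of any Tracker SDK.

```python
from typing import List


def pages_needing_ocr(page_texts: List[str], min_chars: int = 1) -> List[int]:
    """Return zero-based indices of pages with no meaningful extractable text.

    page_texts is assumed to hold one extracted-text string per page,
    produced elsewhere by a PDF text extraction tool (not shown here).
    """
    needs_ocr = []
    for index, text in enumerate(page_texts):
        # Count only non-whitespace characters, so pages that yield
        # nothing but spaces or line breaks are treated as empty.
        visible = sum(1 for ch in text if not ch.isspace())
        if visible < min_chars:
            needs_ocr.append(index)
    return needs_ocr


def is_probably_searchable(page_texts: List[str], min_chars: int = 1) -> bool:
    """Heuristic only: True if every page yields at least min_chars of text."""
    return len(page_texts) > 0 and not pages_needing_ocr(page_texts, min_chars)
```

For example, pages_needing_ocr(["Invoice 1234", "  \n", ""]) returns [1, 2], so only the second and third pages would be sent to OCR. As noted above, a page that passes this check is not guaranteed to be fully OCRed; it only shows that some text is present.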

Best,
Stefan
John - Tracker Supp
Site Admin
Posts: 5219
Joined: Tue Jun 29, 2004 10:34 am
Location: United Kingdom

Re: Possible to detect if pdf is searchable already?

Post by John - Tracker Supp »

Perhaps Stefan should have added:

The OCR SDK will be good for the purpose in the sense that you can use the OCR as required to create your text-searchable document. Checking whether there is already text in the document is probably going to be the best 'general' solution without getting very complex and doing some very heavy coding.

And, as Stefan states, you can do this using the XCPRO40.dll library that comes with the PDF-XChange PRO SDK (which is the SDK of ours required to use the OCR functionality). The PDF-Tools SDK has the required function to check for text, but does not allow use of our OCR SDK.

Hope that helps.

Best regards
Tracker Support
http://www.tracker-software.com