Possible to detect if pdf is searchable already?
I'm working on a program in VB.NET that will automatically make a PDF searchable if it isn't already. Is it possible for the program to detect, with this SDK, whether a PDF has already been OCR'ed?
- Tracker Supp-Stefan
- Site Admin
- Posts: 17948
- Joined: Mon Jan 12, 2009 8:07 am
- Location: London
Re: Possible to detect if pdf is searchable already?
Hello elee,
I don't think the OCR SDK is the right one for this purpose, but you can use, for example, our Tools SDK to check whether any text can be extracted from a given page. If there is, the page has some text content. This still cannot guarantee that the whole page has been OCRed, or that it was created from a file format that allowed the PDF creation tool to preserve the text as such.
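As a rough illustration of that check (a sketch only, not the Tools SDK's API): assuming each page's text has already been extracted into a string with whichever SDK you use, the decision step could look like the Python below. The function names and the `min_chars` threshold are made up for this example, and the extraction call itself is deliberately left out. The same logic should port to VB.NET directly.

```python
from typing import List, Sequence


def has_text_layer(extracted_text: str, min_chars: int = 1) -> bool:
    """Return True if the text has at least min_chars non-whitespace characters."""
    # Whitespace alone does not count as searchable content.
    visible = sum(1 for ch in extracted_text if not ch.isspace())
    return visible >= min_chars


def pages_needing_ocr(page_texts: Sequence[str], min_chars: int = 1) -> List[int]:
    """Return 1-based numbers of pages whose extracted text is empty or too short."""
    return [
        number
        for number, text in enumerate(page_texts, start=1)
        if not has_text_layer(text, min_chars)
    ]


if __name__ == "__main__":
    # Hypothetical extraction results for a three-page document.
    sample = ["Invoice 2041\nTotal: 12.50", "   \n", ""]
    print(pages_needing_ocr(sample))
```

Raising `min_chars` makes the check stricter, which may help with scanned pages that carry only a short stamp or header as real text. As noted above, a page that passes can still be only partly searchable.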
Best,
Stefan
- John - Tracker Supp
- Site Admin
- Posts: 5219
- Joined: Tue Jun 29, 2004 10:34 am
- Location: United Kingdom
Re: Possible to detect if pdf is searchable already?
Perhaps Stefan should have added:
The OCR SDK will be good for the purpose in the sense that you can use the OCR as required to create your text-searchable document. Checking whether there is already text in the document is probably going to be the best 'general' solution without getting very complex and doing some very heavy coding.
And, as Stefan states, you can do this using the XCPRO40.dll library that comes with the PDF-XChange PRO SDK (which is the SDK of ours required to use the OCR functionality). The PDF-Tools SDK has the required function to check for text, but does not allow use of our OCR SDK.
Hope that helps.
If posting files to this forum - you must archive the files to a ZIP, RAR or 7z file or they will not be uploaded - thank you.
Best regards
Tracker Support
http://www.tracker-software.com