
Extracting text

Posted: Sat Jun 03, 2006 3:26 pm
by bramkip
Hi,

I'm using the PDF2Text procedure to extract text from a scanned PDF file, but no text is shown. Do I have to scan with particular settings?

Thanks.

Bram Kip

Posted: Sat Jun 03, 2006 4:52 pm
by Tracker - Clarion Support
Hi Bram!

Text extraction only works with text-based PDF files, such as the ones produced by the PDF-Tools Report templates.

PDFs built from images (scans) contain only pictures of the pages, so there is no text in them to extract.

You'd need to extract the page as an image, and then feed the image to Optical Character Recognition software. Tracker SP doesn't offer such a product.
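
As a rough way to tell the two kinds of file apart before attempting extraction, here is a minimal Python sketch. It is not part of PDF-Tools, and `looks_scanned` is a hypothetical helper name. It assumes the PDF's font and image dictionaries are stored uncompressed, which is not true of every file (newer PDFs may pack them into compressed object streams), so treat the result as a hint only. A scan that already carries a hidden OCR text layer will declare fonts and will be reported as text-based.

```python
import re


def looks_scanned(pdf_bytes: bytes) -> bool:
    """Heuristic: True if the file declares image XObjects but no fonts.

    Assumption: resource dictionaries are readable in the raw bytes.
    PDFs that compress them into object streams will defeat this check.
    """
    # "/Font" as a whole name; does not match /FontDescriptor or /FontFile.
    has_font = re.search(rb"/Font\b", pdf_bytes) is not None
    # Image XObjects are declared with /Subtype /Image.
    has_image = re.search(rb"/Subtype\s*/Image\b", pdf_bytes) is not None
    return has_image and not has_font
```

If this returns True for a file read with `open(path, "rb").read()`, the file most likely needs the image-plus-OCR route described above rather than text extraction.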

Posted: Sun Jun 04, 2006 8:12 am
by bramkip
Hi Craig,

OK, that's clear to me now. Are there plans to build Optical Character Recognition into PDF-Tools? This would really be a nice feature.
Or can you recommend a 3rd party that has an OCR tool that works with Clarion?

Thanks

Best regards,
Bram

Posted: Sun Jun 04, 2006 8:33 am
by John - Tracker Supp
Hi Bram,

We have 2 developers who have been working on OCR pretty solidly for around 2 years now - but it is no small task, and I expect it will be at least 9 months or so before this shows any releasable results - possibly longer.

It will not be a native part of PDF-Tools but an optional add-on - as you can imagine the investment is not small.

In the meantime - there are some recent posts on the Clarion NG about an OCR SDK one Clarion dev has had some success using - I think in the 3rd party NG.

HTH

Posted: Sun Jun 04, 2006 2:32 pm
by John - Tracker Supp
Hi Bram,

John Dunn has written a Clarion class that interfaces with this product:

http://www.simpleocr.com/

See the comp.lang.clarion NG on the SV news server for more info.

HTH