Extracting text

This Forum is for the use of Clarion For Windows - Software Developers requiring help and assistance for Tracker Software's PDF-XChange Printer Drivers SDK (only) - Please use the PDF-Tools SDK Forum for Library DLL assistance.


Post by bramkip » Sat Jun 03, 2006 3:26 pm

Hi,

I'm using the PDF2Text procedure to extract text from a scanned PDF file, but no text is shown. Do I have to scan with particular settings?

Thanks.

Bram Kip

Post by Tracker - Clarion Support » Sat Jun 03, 2006 4:52 pm

Hi Bram!

Text extraction only works with text-based PDF files, such as the ones produced by the PDF-Tools Report templates.

PDF files built from images (scans) contain no text as such, so there is nothing to extract from them.

You'd need to extract the page as an image, and then feed the image to Optical Character Recognition software. Tracker SP doesn't offer such a product.
Craig Ransom
Tracker Software - Clarion Support
http://www.tracker-software.com
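
A rough way to tell the two kinds of file apart before calling a text extractor is to look for text-showing operators in the PDF content streams. The Python sketch below is only a heuristic under stated assumptions, not the PDF2Text procedure or any Tracker API: the function name `has_text_layer` is hypothetical, it handles only uncompressed and Flate-compressed streams, and it ignores encryption, object streams and other filters, so it can misjudge some files.

```python
import re
import zlib

# Raw bytes between the "stream" and "endstream" keywords of each PDF object.
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)

# A text object (BT ... ET) containing a string or array operand
# followed by a text-showing operator (Tj or TJ).
TEXT_RE = re.compile(rb"\bBT\b.*?[)>\]]\s*T[jJ]\b.*?\bET\b", re.DOTALL)


def has_text_layer(pdf_bytes: bytes) -> bool:
    """Heuristic: True if any stream appears to draw text.

    Assumption: streams are either uncompressed or Flate-compressed.
    Scanned pages normally hold only image data, so no match is found.
    """
    for match in STREAM_RE.finditer(pdf_bytes):
        data = match.group(1)
        try:
            # decompressobj tolerates the trailing newline before "endstream".
            data = zlib.decompressobj().decompress(data)
        except zlib.error:
            pass  # not Flate data: inspect the raw bytes instead
        if TEXT_RE.search(data):
            return True
    return False
```

If the function returns False for a file read with `open(path, "rb").read()`, the file is probably image-only and would need OCR before any text can be extracted.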

Post by bramkip » Sun Jun 04, 2006 8:12 am

Hi Craig,

OK, that's clear to me now. Are there plans to build Optical Character Recognition into PDF-Tools? This would really be a nice feature.
Or can you recommend a 3rd party that has an OCR tool that works with Clarion?

Thanks

Best regards,
Bram

Post by John - Tracker Supp » Sun Jun 04, 2006 8:33 am

Hi Bram,

We have 2 developers who have been working on OCR pretty solidly now for around 2 years - but it is no small task and it will be at least 9 months or so before I expect this to show any releasable results - possibly longer.

It will not be a native part of PDF-Tools but an optional add-on - as you can imagine the investment is not small.

In the meantime - there are some recent posts on the Clarion NG about an OCR SDK one Clarion dev has had some success using - I think in the 3rd party NG.

HTH
If posting files to this forum - you must archive the files to a ZIP, RAR or 7z file or they will not be uploaded - thank you.

Best regards
Tracker Support
http://www.tracker-software.com

Post by John - Tracker Supp » Sun Jun 04, 2006 2:32 pm

Hi Bram,

John Dunn has written a Clarion class that interfaces to this product:

http://www.simpleocr.com/

See the comp.lang.clarion NG on the SV news server for more info.

HTH
