Extract OCR, only numeric text

PDF-X OCR SDK is a new product from us, intended to complement our existing PDF and imaging tools and to provide developers with an expanding set of professional tools for Optical Character Recognition tasks.

Moderators: TrackerSupp-Daniel, Tracker Support, Vasyl-Tracker Dev Team, Sean - Tracker, Chris - Tracker Supp, Tracker Supp-Stefan

dataco
User
Posts: 8
Joined: Mon Nov 26, 2012 3:11 pm

Post by dataco » Thu Nov 29, 2012 7:56 am

Hi,

I use the Clarion SDK.

Is it possible to extract only (or force) numeric values when converting?

I've tried to use the whitelist option in OCR_Options.SetOptions(PXO_French, OCR_Auto, '0123456789', '', 'C:\cw60\3rdparty\examples\TrackerSP\PXC_OCR\ocrdats', OCR_Image_Autorotate, 300, 0) but it doesn't work!

Any ideas?

Thanks in advance

Tracker - Clarion Support
Site Admin
Posts: 1412
Joined: Wed Jun 30, 2004 4:45 pm
Location: Maryland, USA

Post by Tracker - Clarion Support » Thu Nov 29, 2012 6:01 pm

Hi!

I found two problems:

1. The pDataPath should not include "ocrdats" after the last backslash "\"; pass the parent folder instead:

'C:\cw60\3rdparty\examples\TrackerSP\PXC_OCR\'

2. I found a bug in the SetOptions method. Please download the attached files and unzip them into your 3rdparty or accessory \Libsrc folder or subfolder, overwriting the existing files.
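
For illustration, the call from the question with only the data path changed as described in point 1 might look like the fragment below. This is a sketch, not a tested fix: it reuses the arguments exactly as posted, assumes the same folder layout, and assumes the corrected class files are installed.

```clarion
! Whitelist restricted to digits; data path ends at the PARENT of \ocrdats,
! with the trailing backslash kept.
OCR_Options.SetOptions(PXO_French, OCR_Auto, '0123456789', '', 'C:\cw60\3rdparty\examples\TrackerSP\PXC_OCR\', OCR_Image_Autorotate, 300, 0)
```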
Attachments
OCR_Libsrc.zip
Modified OCR class files
Craig Ransom
Tracker Software - Clarion Support
http://www.tracker-software.com

Post by dataco » Fri Nov 30, 2012 9:42 am

I forgot to say that I am using the trial version of the Clarion SDK.

I did what you said, but when I unzip the files into the Libsrc\Win folder of the Clarion 8 directory, I get compilation errors (see attached image).

Thanks for your answer!
Attachments
Errors.zip
Compilation errors

Post by Tracker - Clarion Support » Fri Nov 30, 2012 12:01 pm

Hi!

I'll have to look into why that's happening. I should have an answer later today.

Later:

I think I posted the wrong set of files. Please try the attached instead.

Post by dataco » Mon Dec 03, 2012 11:20 am

Hi,

It compiles now, but the result is the same!

The output PDF file is OK, but when I try to export the PDF file to a text file, the result is an empty file!

If I use the blacklist parameter instead, OCR_Options.SetOptions(PXO_French, OCR_Auto, '', 'ioO', 'C:\cw60\3rdparty\examples\TrackerSP\PXC_OCR\', OCR_Image_Autorotate, 300, 0), everything is OK!

Do you know why?

Post by Tracker - Clarion Support » Mon Dec 03, 2012 11:28 am

Hi!

Yes, I do, and there will be a patch out later today after I finish testing it. :D

Later: I ran into some problems. I'll have it out tomorrow for certain.

Post by Tracker - Clarion Support » Tue Dec 04, 2012 10:54 pm

Hi!

Not quite yet. :(

I ran into Access Violations while testing and I'm trying to figure out what's causing that.

It shouldn't take too long.

Post by dataco » Wed Dec 05, 2012 12:59 pm

Hi,

I'm very interested in this template, so I look forward to the patch!

Post by Tracker - Clarion Support » Wed Dec 05, 2012 6:42 pm

Hi Koen!

It will be out tomorrow - I don't have the latest OCR Template Editor build yet. That will arrive later today.

Post by Tracker - Clarion Support » Thu Dec 06, 2012 5:36 pm

Hi!

Please try this version of the OCR class files. Just unzip into your 3rdparty or accessory \Libsrc folder.

I found that the class CLW file was not matching the INC file, and had to correct the CLW file.

It is working here. I have tested with Clarion 6 and 8.

If you have problems with access violations, I suggest omitting the DataPath variable, in which case the library uses an \ocrdats folder that should be in your application folder. Otherwise, double-check to make SURE you are using the correct parent folder for the \ocrdats folder. And don't forget the trailing backslash "\" on the path name.
Attachments
OCR_Libsrc.zip
OCR class files

Post by dataco » Mon Dec 10, 2012 3:53 pm

Hi,

Thank you for the library, it works.

But there is a small problem: when I open the output PDF file and save it as a text file, the result is an empty file!

Do you know why?

Post by Tracker - Clarion Support » Mon Dec 10, 2012 5:02 pm

Hi!

Not without more information.

Are you using one of our demos or a program you wrote? If one of ours, which one? Have you changed it in any way?

Please supply a sample PDF file (zipped) that displays this behaviour - thanks.

Post by dataco » Tue Dec 11, 2012 8:57 am

Hi,

I use your ocr1demo.app, and I have changed the SetOptions line to OCR_Options.SetOptions(PXO_French, OCR_Auto, '0123456789>+', '', 'C:\Users\Public\Documents\SoftVelocity\Clarion8\accessory\TrackerSP\PXC_OCR\', OCR_Image_Autorotate, 300, 0).

I am sending you the input file 150.pdf and the output file 150_xs.pdf.
Attachments
150_0.pdf
input
150_xs.pdf
output

Post by Tracker - Clarion Support » Tue Dec 11, 2012 10:37 am

Wait - I thought you were extracting numeric fields from a rasterized PDF page. ocr1demo.app only makes a rasterized PDF page "searchable" by creating an "invisible" text underlay for it. But for that to work, you should omit the whitelist and blacklist parameters.

ocr2demo.app demonstrates field extraction from a rasterized PDF page.
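
As a rough sketch of the ocr1demo.app case, the SetOptions line quoted earlier in this thread could be reverted to use neither list. The fragment assumes that passing empty strings is equivalent to omitting the whitelist and blacklist, as the earlier calls in this thread suggest. It has not been verified against the class source.

```clarion
! Searchable-PDF case (ocr1demo.app): no whitelist and no blacklist.
! Assumption: '' is treated the same as an omitted parameter.
OCR_Options.SetOptions(PXO_French, OCR_Auto, '', '', 'C:\Users\Public\Documents\SoftVelocity\Clarion8\accessory\TrackerSP\PXC_OCR\', OCR_Image_Autorotate, 300, 0)
```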