Extract OCR, only numeric text

Posted: Thu Nov 29, 2012 7:56 am
by dataco
Hi,

I use the Clarion SDK.

Is it possible to extract only numeric values (or force numeric output) when converting?

I've tried using the whitelist option in OCR_Options.SetOptions(PXO_French, OCR_Auto, '0123456789', '', 'C:\cw60\3rdparty\examples\TrackerSP\PXC_OCR\ocrdats', OCR_Image_Autorotate, 300, 0), but it doesn't work!

Any ideas?

Thanks in advance

Re: Extract OCR, only numeric text

Posted: Thu Nov 29, 2012 6:01 pm
by Tracker - Clarion Support
Hi!

I found two problems:

1. The pDataPath should not include ocrdats after the last backslash "\". It should end at the parent folder:

'C:\cw60\3rdparty\examples\TrackerSP\PXC_OCR\'

2. I found a bug in the SetOptions method. Please download and unzip the attached files into your 3rdparty or accessory \Libsrc folder or subfolder, over the existing files.

Re: Extract OCR, only numeric text

Posted: Fri Nov 30, 2012 9:42 am
by dataco
I forgot to mention that I'm using the trial version of the Clarion SDK.

I did what you said, but when I unzip the files into the Libsrc\Win folder of the Clarion 8 directory, I get compilation errors (see attached image).

Thanks for your answer!

Re: Extract OCR, only numeric text

Posted: Fri Nov 30, 2012 12:01 pm
by Tracker - Clarion Support
Hi!

I'll have to look into why that's happening. I should have an answer later today.

Later:

I think I posted the wrong set of files. Please try the attached instead.

Re: Extract OCR, only numeric text

Posted: Mon Dec 03, 2012 11:20 am
by dataco
Hi,

It's compiling now, but the result is the same!

The output PDF file is OK, but when I try to export the PDF file to a text file, the result is an empty file!

If I use the blacklist parameter instead, as in OCR_Options.SetOptions(PXO_French, OCR_Auto, '', 'ioO', 'C:\cw60\3rdparty\examples\TrackerSP\PXC_OCR\', OCR_Image_Autorotate, 300, 0), everything works!

Do you know why?

Re: Extract OCR, only numeric text

Posted: Mon Dec 03, 2012 11:28 am
by Tracker - Clarion Support
Hi!

Yes, I do, and there will be a patch out later today after I finish testing it. :D

Later: I ran into some problems. I'll have it out tomorrow for certain.

Re: Extract OCR, only numeric text

Posted: Tue Dec 04, 2012 10:54 pm
by Tracker - Clarion Support
Hi!

Not quite yet. :(

I ran into Access Violations while testing and I'm trying to figure out what's causing that.

It shouldn't take too long.

Re: Extract OCR, only numeric text

Posted: Wed Dec 05, 2012 12:59 pm
by dataco
Hi,

I'm very interested in this template, so I look forward to the patch!

Re: Extract OCR, only numeric text

Posted: Wed Dec 05, 2012 6:42 pm
by Tracker - Clarion Support
Hi Koen!

Be out tomorrow - I don't have the latest OCR Template Editor build yet. It'll be later today.

Re: Extract OCR, only numeric text

Posted: Thu Dec 06, 2012 5:36 pm
by Tracker - Clarion Support
Hi!

Please try this version of the OCR class files. Just unzip into your 3rdparty or accessory \Libsrc folder.

I found that the class CLW file was not matching the INC file, and had to correct the CLW file.

It is working here. I have tested with Clarion 6 and 8.

If you have problems with access violations, I suggest omitting the DataPath variable, which will use an \ocrdats folder that should be in your application folder. Otherwise, double-check to make SURE you are using the correct parent folder for the \ocrdats folder, and don't forget the trailing backslash "\" on the path name.

Re: Extract OCR, only numeric text

Posted: Mon Dec 10, 2012 3:53 pm
by dataco
Hi,

Thank you for your library, it works.

But there is a little problem: when I open the output PDF file and save it as a text file, the result is an empty file!

Do you know why?

Re: Extract OCR, only numeric text

Posted: Mon Dec 10, 2012 5:02 pm
by Tracker - Clarion Support
Hi!

Not without more information.

Are you using one of our demos or a program you wrote? If one of ours, which one? Have you changed it in any way?

Please supply a sample PDF file (zipped) that displays this behaviour - thanks.

Re: Extract OCR, only numeric text

Posted: Tue Dec 11, 2012 8:57 am
by dataco
Hi,

I use your ocr1demo.app, and I have changed the SetOptions line to: OCR_Options.SetOptions(PXO_French, OCR_Auto, '0123456789>+', '', 'C:\Users\Public\Documents\SoftVelocity\Clarion8\accessory\TrackerSP\PXC_OCR\', OCR_Image_Autorotate, 300, 0)

I am sending you the input file 150.pdf and the output file 150_xs.pdf.

Re: Extract OCR, only numeric text

Posted: Tue Dec 11, 2012 10:37 am
by Tracker - Clarion Support
Wait - I thought you were extracting numeric fields from a rasterized PDF page. ocr1demo.app only makes a rasterized PDF page "searchable" by creating an "invisible" text underlay for it. But for that to work, you should omit the whitelist and blacklist parameters.

ocr2demo.app demonstrates field extraction from a rasterized PDF page.