Low performance of the OCR_MakeSearchable method

PDF-X OCR SDK is a new product from us, intended to complement our existing PDF and Imaging Tools and to provide the developer with an expanding set of professional tools for Optical Character Recognition tasks.

igor_p
User
Posts: 24
Joined: Wed Oct 02, 2013 7:18 am

Low performance of the OCR_MakeSearchable method

Post by igor_p » Sat Nov 23, 2013 10:50 am

Hello,

We are using your OCR component in our ASP.NET application. Everything works correctly for us; however, we are concerned about the low performance of the OCR_MakeSearchable method. We compared your product to the Quick Scan Pro (QSP) solution, and QSP was definitely faster.
OCRing the file below (it is converted to PDF before OCRing) using the OCR_MakeSearchable() method takes about 3 minutes, which is quite long. For comparison, QSP processed the same document in about 20 seconds.


Is there any way to make this method faster? Are you planning to improve performance in the next release?

Our PXO_Options are:

				OCR.PXO_Options options = new OCR.PXO_Options();
				options.blacklist = String.Empty;
				options.whitelist = String.Empty;
				options.DataPath = OcrUtility.GetLanguagesDirectory();
				options.ImageFlags = (uint)OCR.OCR_ImageProcessingFlags.OCR_Image_SuppressOutput;
				options.lang = OCR.PXO_Language.PXO_English;
				options.raster_dpi = 300;
				options.RegionMode = OCR.OCR_RegionMode.OCR_Auto;
				options.reserved = 0;

Our test machine has 2 cores and 4 GB of physical memory. We use version 1.0.14.1 of ocrtools.dll.

PS. When are you going to release a new version of ocrtools? We are looking forward to two new capabilities. The first is full orientation detection during OCR. The second is new functionality that places only the text layer into the original PDF file. For now, we work around this by using the OCR_Image_SuppressOutput setting and the PlaceContents() method from xcpro40.dll. Unfortunately, this prevents us from using the rotation mode.

Thanks in advance and best regards,
Igor
Last edited by igor_p on Tue Dec 03, 2013 8:11 am, edited 1 time in total.

Tracker Supp-Stefan
Site Admin
Posts: 13651
Joined: Mon Jan 12, 2009 8:07 am
Location: London

Re: Low performance of the OCR_MakeSearchable method

Post by Tracker Supp-Stefan » Mon Nov 25, 2013 8:59 am

Hi Igor,

Thanks for the post. I will pass it to our OCR SDK experts, and we will post back here a bit later with further advice!

Regards,
Stefan

Walter-Tracker Supp
User
Posts: 383
Joined: Mon Jun 13, 2011 5:10 pm

Re: Low performance of the OCR_MakeSearchable method

Post by Walter-Tracker Supp » Mon Nov 25, 2013 7:01 pm

I've looked at your document, and while I don't see performance anywhere near as poor as you describe, I do note that it takes longer than typical files. You will notice that pages 8 and 9 are the culprits: the layout of those pages is complex and therefore difficult for our engine to process. This is a bit of an edge case for our engine (other OCR engines likewise have their own edge cases). Our benchmarks have shown performance comparable to other offerings across a broad spectrum of documents; however, this particular case happens to be troublesome.

Your other requests are on our feature wishlist and we will roll them out as soon as reasonably practical.
