OCR speed and CPU

Posted: Fri Oct 09, 2015 4:16 pm
by bulubuluplopplop
Hello,
I'm using OCR on some long pdf docs.

The OCR is very slow, but it uses only 25% of my CPU capacity.
Is it possible to make it use more CPU and run faster?

thank you

Re: OCR speed and CPU

Posted: Fri Oct 09, 2015 7:52 pm
by Will - Tracker Supp
Hi bulubuluplopplop,

Thanks for the post - can you please advise how long OCR takes and supply a sample document that you're working with?

Thanks,

Re: OCR speed and CPU

Posted: Sun Oct 11, 2015 2:52 pm
by bulubuluplopplop
Hello,
the OCR process takes about 30 minutes for a pdf document of 100 pages.
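
For reference, the figures above work out to roughly 18 seconds per page. A minimal sketch of that arithmetic in Python; the function name is made up for illustration and is not part of any PDF-XChange tool:

```python
def seconds_per_page(total_minutes: float, pages: int) -> float:
    """Average OCR time per page, in seconds."""
    if pages <= 0:
        raise ValueError("pages must be positive")
    return total_minutes * 60 / pages

# Figures reported above: about 30 minutes for a 100-page PDF.
per_page = seconds_per_page(30, 100)
print(f"{per_page:.0f} s per page")
```

This is only an average; it assumes the pages take similar time, which the thread does not confirm.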

Re: OCR speed and CPU

Posted: Mon Oct 12, 2015 5:44 pm
by Will - Tracker Supp
Hi bulubuluplopplop,

I've not experienced any issues like this, nor have I heard reports of others experiencing the problem, so I would need to see a specific sample.

Thanks,

Re: OCR speed and CPU

Posted: Tue Mar 22, 2016 12:38 am
by claude vidal
I did a speed test on the scan + OCR of a one page document.

The scanner is a Canon LIDE 220. I used PDFXchange Editor 316.1 and the software bundled with the scanner (Canon quick menu). Both scans were at 300 DPI, auto-detect for color, OCR was using the same language (French) and OCR set to auto after scan.

Canon's program completed the entire task in 10 seconds.

PDFXchange took 39 seconds to scan and another 62 seconds to OCR for a total of 101 seconds. That's 10 times slower. The OCR output quality was a bit better with Canon's program.

I understand PDFXchange's main purpose is not scan & OCR, so I expected a bit slower performance. Is 10 times slower to be expected? Anything I could tune while retaining the same quality output?
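
The "10 times" figure follows directly from the timings above. A small Python sketch of the calculation, using only the numbers reported in this post (the variable names are illustrative):

```python
# Timings reported above, in seconds, for a single page.
canon_total_s = 10     # Canon Quick Menu: scan + OCR combined
editor_scan_s = 39     # PDF-XChange Editor: scan step
editor_ocr_s = 62      # PDF-XChange Editor: OCR step

editor_total_s = editor_scan_s + editor_ocr_s
ratio = editor_total_s / canon_total_s
ocr_share = editor_ocr_s / editor_total_s

print(f"Editor total: {editor_total_s} s")
print(f"Slowdown vs Canon: {ratio:.1f}x")
print(f"Share of Editor time spent in OCR: {ocr_share:.0%}")
```

Splitting the total this way shows that OCR accounts for a little over 60% of the Editor's time, so both the scan step and the OCR step contribute to the gap.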

Re: OCR speed and CPU

Posted: Tue Mar 22, 2016 11:30 am
by Tracker Supp-Stefan
Hello Claude,

A 10 times difference is certainly significant, but please note that the scanner does everything internally, whereas we first need to obtain the image data from the scanner (which, for high DPI scans over a slower USB connection, might take longer than the scanner needs to process it internally). We then OCR the image data and optimize the image (also a slow process). The scanner, by contrast, prepares the PDF internally and then writes the already generated file to disk.
May I ask you to include in an archive and attach here the following files:
- a scanned image (as 300 dpi png/jpeg)
- the .pdf your device produces (with the OCR layer)
- the .pdf the Editor produces

Please note that the Viewer is a deprecated product now and no longer developed so I would ask you to download and test with the Editor instead:
https://www.pdf-xchange.com/produc ... nge-editor

Regards,
Stefan

Re: OCR speed and CPU

Posted: Tue Mar 22, 2016 3:50 pm
by claude vidal
Hi Stefan,

Thanks for looking into this.

Given that my initial speed test was using a document with sensitive personal information, I ran another test:
- 300 dpi
- OCR language English this time
- Color
- Scanned image contains more images, less text

For this test, the speed ratio was 6.6 instead of 10 for the previous test.

I attached the requested files, including a summary of my system specs. Please note that, although the port is USB 3, the Canon scanner is USB 2. Also, I left the "Image to PDF" options to their PDFXchange default.

P.S. Your post mentions the Viewer as being deprecated: as indicated in my first post, I use PDFXchange Editor 316.1 (Pro).

Re: OCR speed and CPU

Posted: Tue Mar 22, 2016 5:04 pm
by Tracker Supp-Stefan
Hi claude vidal,

Apologies for missing the part where you mention you use the Editor. But the topic itself is in the Viewer section, so that's what provoked my comment in the above post.

OCRing the page in the Editor using File -> New Document -> From Images, and leaving all options but the OCR to defaults produced the attached file in 12 seconds.
I notice that the file you've provided that was created by our Editor actually has two images in it. Maybe the processes the scanner uses when doing the PDF internally and when sending image data to external products are different and this causes the significant increase in processing time.

Can you try at your end and compare the speed of the scanner itself with the speed of the Editor generating the PDFs internally via File -> New Document -> From Images and tell us how it fares that way?

Regards,
Stefan

Re: OCR speed and CPU

Posted: Wed Mar 23, 2016 8:00 pm
by claude vidal
The object of my previous tests was created like this:
1- Print a page from an existing PDF document
2- Scan & OCR that page image

I did the following test. I loaded the original PDF document into the Editor and asked it to OCR that same page. I thought bypassing the analog part of scanning a printed page would help. Unfortunately, not by much: it took 50 seconds for that single page with well defined characters. But then you did it in 12 seconds, go figure.

I'm still puzzled how a low cost scanner can scan, OCR and transfer the same page in 10 seconds: it has neither the CPU cycles nor the memory that my PC has to work with.

Bottom line: I'll stick with the scanner for OCR. This does not take away the great features of PDFXchange; as I said initially, I don't see scanning and OCR as the main focus for PDFXchange in handling PDF.

I may try again with 317. I know you guys had issues, so is availability on for tonight?

Re: OCR speed and CPU

Posted: Thu May 12, 2016 11:42 am
by Tracker Supp-Stefan
Hello claude vidal,

It is indeed rather unusual that you get such high OCR times: my CPU is 3-4 years old now, so it's unlikely to be 3-4 times faster than yours. We have released a new build since we last wrote in this forum topic, so please update to build 317.1 and let us know whether it gives you a different result.

Regards,
Stefan

Re: OCR speed and CPU

Posted: Sun Aug 27, 2017 12:52 pm
by DIV
Here I have tested out the OCR capabilities on a colour 300dpi scan of German text that includes both roman fonts and fraktur (blackletter) fonts.

I compared Adobe Acrobat 7.0 OCR performance with three different accuracy settings in PDF-XChange Editor 6.0.

In summary, Acrobat is always much faster, but Editor is more accurate if either "Medium" or "High" accuracy is chosen:
  • Acrobat, GERMAN/EXACT/600DPI: 7 seconds, very poor accuracy (Note: this has no specific fraktur recognition capability.)
  • Editor, LOW ACCURACY: 60 seconds, poor accuracy
  • Editor, MEDIUM ACCURACY: 24 seconds, good accuracy
  • Editor, HIGH ACCURACY: 30 seconds, good accuracy
As shown, actual accuracy of the results is practically equivalent for the first two and the last two.

Please note that times are for just one single page.
I consider 20–30 seconds to be rather slow for just one page. However, the increased accuracy of the results makes it worthwhile.
60 seconds for a single page is completely impractical, especially when the results are poor.
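
To put the timings in this post side by side, here is a rough Python sketch that tabulates them and expresses each run as a multiple of the Acrobat time. The data are the four single-page results listed above; the labels and variable names are only illustrative.

```python
# Single-page timings as reported in this post: (tool/setting, seconds, accuracy).
results = [
    ("Acrobat 7.0, GERMAN/EXACT/600DPI", 7, "very poor"),
    ("Editor 6.0, low accuracy", 60, "poor"),
    ("Editor 6.0, medium accuracy", 24, "good"),
    ("Editor 6.0, high accuracy", 30, "good"),
]

baseline_s = results[0][1]  # Acrobat's time is used as the reference

# Slowdown of each run relative to Acrobat, rounded to one decimal place.
slowdown = {name: round(seconds / baseline_s, 1) for name, seconds, _ in results}

# Fastest run among those rated "good".
best_good = min((r for r in results if r[2] == "good"), key=lambda r: r[1])

for name, seconds, accuracy in results:
    print(f"{name}: {seconds} s, {slowdown[name]}x Acrobat, accuracy {accuracy}")
print("Fastest with good accuracy:", best_good[0])
```

These are single measurements on one page, so the ratios are indicative only.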

Due to copyright issues I am not going to post the entire document, but attached hereto are an overview of the page analysed, an enlarged view of the sample text, and various OCR results.

—DIV

N.B. As suggested also elsewhere, the newer versions of Adobe Acrobat can be expected to be much better than the old version (7.0) tested here!

Re: OCR speed and CPU

Posted: Mon Aug 28, 2017 7:16 am
by Will - Tracker Supp
Hi DIV,

As per my post in the other topic, this should be addressed in the new OCR.

Thanks,