OCR doesn't do anything

Posted: Thu Sep 15, 2016 3:29 pm
by vsrawat
Today I downloaded, installed, and ran the Editor.

I gave it a 78-page PDF file (English only) and ran OCR on all pages.

It took hours and finished without appearing to do anything or creating an OCR output file.

I did this again, this time for a single page, and again it did nothing.

It does not even ask me where to save the OCR-ed file. I don't know where it is saving it, if it created one at all.

The input PDF file contained only English text (no images), so it should have just read the ASCII letters.
I don't know why it went ahead and actually OCR-ed those pages.

Thanks.

Re: OCR doesn't do anything, just wasted time

Posted: Thu Sep 15, 2016 3:33 pm
by Tracker Supp-Stefan
Hello vsrawat,

Welcome to our forums.
The default action when you OCR a PDF file with our tool would be to add an invisible layer of text over the existing image, in the existing file. So that is why no new file is created, and why it seems as if nothing happened. Please try using the "text select" tool now - and you should be able to select your text. You can also use the search tools, and they should now find text in your document.

Regards,
Stefan

Re: OCR doesn't do anything, just wasted time

Posted: Thu Sep 15, 2016 3:38 pm
by vsrawat
The input PDF is already a fully extractable text file.
I could select the entire text without running OCR, so I can't tell whether anything has changed.

I ran OCR because some characters, such as " and ', were coming out as junk in normally selected text, so I thought OCR would be able to recognise them correctly.

I would say this method is very complicated, and the software doesn't give any message anywhere
about this invisible layer being created, or about how to proceed with it.

It would have been much simpler, and easier to understand and handle, if it had just created a txt or docx file on disk containing the OCR-ed text.

Thanks.

Re: OCR doesn't do anything

Posted: Thu Sep 15, 2016 3:47 pm
by vsrawat
Also, it should at least clean up the text,

like

- merging the different lines of a single paragraph into a single line, by removing the extra CR-LF line breaks that come from the PDF.
- putting the header and footer only on the first page, or wherever they have changed, and removing them from all other pages.
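
To illustrate, the two clean-up steps above could be sketched in Python roughly as follows. This is only a sketch of the idea, not how the Editor works. The function names `merge_wrapped_lines` and `drop_repeated_headers` are made up for this example, and it assumes the text has already been extracted as one plain-text string per page, with blank lines between paragraphs and the header and footer as the first and last line of each page.

```python
import re


def merge_wrapped_lines(page_text: str) -> str:
    """Join the hard-wrapped lines of each paragraph into one line.

    Assumption: paragraphs are separated by at least one blank line.
    """
    text = page_text.replace("\r\n", "\n")       # normalise CR-LF to LF
    paragraphs = re.split(r"\n\s*\n", text)      # split on blank lines
    merged = [
        " ".join(line.strip() for line in p.split("\n") if line.strip())
        for p in paragraphs
    ]
    return "\n\n".join(p for p in merged if p)


def drop_repeated_headers(pages: list[str]) -> list[str]:
    """Drop a page's first/last line when it repeats the previous page's.

    Assumption: the header is the first non-blank line of a page and the
    footer is the last one. A header or footer that changes is kept.
    """
    cleaned = []
    prev_header = prev_footer = None
    for page in pages:
        lines = page.replace("\r\n", "\n").split("\n")
        # Trim blank lines at the top and bottom of the page.
        while lines and not lines[0].strip():
            lines.pop(0)
        while lines and not lines[-1].strip():
            lines.pop()
        header = lines[0].strip() if lines else None
        footer = lines[-1].strip() if len(lines) > 1 else None
        if header is not None and header == prev_header:
            lines = lines[1:]                    # same header as last page
        if footer is not None and footer == prev_footer and lines:
            lines = lines[:-1]                   # same footer as last page
        prev_header, prev_footer = header, footer
        cleaned.append("\n".join(lines))
    return cleaned
```

One would run `drop_repeated_headers` on the list of pages first and then `merge_wrapped_lines` on each cleaned page. A footer that contains a page number differs on every page, so this sketch would leave it in place.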

I think adding the OCR-ed text in a new layer is cryptic; users would not like that and would rather have it the way I need it.

Thanks.

Re: OCR doesn't do anything

Posted: Thu Sep 15, 2016 3:50 pm
by vsrawat
I opened an image PDF file in the Editor and then in the Viewer,
but the "ocr pages" menu option is dimmed (not active) in both.

So it seems it doesn't OCR images; it only OCRs when the file is already text. What is the purpose, then?

The PDF file containing the image is attached.

Thanks.

Re: OCR doesn't do anything

Posted: Thu Sep 15, 2016 6:18 pm
by Willy Van Nuffel
Hello,

OCR (Optical Character Recognition) is a feature that can convert a scanned page (a photo or image of a page) into a page with a real text layer, so that you can select text and the page also becomes searchable. If you open a PDF with scanned pages that have not yet been OCR'ed, you cannot select any text in it. When you zoom in on the pages, you will probably see that they have been scanned: the text that you see will be of low(er) quality.

The example PDF that you sent has been secured by a password against modifications, copying, and so on.
You can verify this in PDF-XChange Editor, when the PDF is open, via File > Document Properties > Security.
Consequently, it is not even possible to use OCR on it.
In any case, running OCR on that PDF makes no sense, because it already contains real text. The content did not originate from a scanner.

Another thing that may be of interest to you is that, starting from the current version 6.0 - Build 318.0 of PDF-XChange Editor, you can convert a PDF to a Word document (via File > Save As), on the condition that the PDF is not secured.

Best regards.

Re: OCR doesn't do anything

Posted: Thu Sep 15, 2016 6:31 pm
by Patrick-Tracker Supp
:)

Re: OCR doesn't do anything

Posted: Sat Sep 17, 2016 2:45 am
by vsrawat
I joined here yesterday and posted in this subforum.

Now I see there is a specific subforum for the OCR plugin.

Could an admin please move this thread to that subforum?

Thanks.

Re: OCR doesn't do anything

Posted: Sat Sep 17, 2016 4:53 pm
by Will - Tracker Supp
Hi vsrawat,

The post has now been moved.

Cheers,