
OCR - multilanguage use

Posted: Fri Mar 02, 2012 10:48 am
by kejos
Hi,
is there any way to recognize text in two or more languages at the same time?

For example, if one page of a document contains Thai text and its French translation, could I convert both texts in one go?

Greets
kejos

Re: OCR - multilanguage use

Posted: Fri Mar 02, 2012 11:04 am
by Tracker Supp-Stefan
Hello Kejos,

I am afraid that this is not currently possible.
If you have the Thai and the French translations on separate pages - then you can easily OCR only the needed pages in a specific language, but it's not possible to tell our OCR to work only with e.g. half a page.

A possible workaround is to duplicate the page, then cover one of the languages with a white rectangle on the first copy and the other language on the second copy. Then OCR each copy separately with the appropriate language selected in the OCR tool.

Best,
Stefan

Re: OCR - multilanguage use

Posted: Fri Mar 02, 2012 12:15 pm
by kejos
Thanks Stefan!
kejos

Re: OCR - multilanguage use

Posted: Fri Mar 02, 2012 12:23 pm
by Tracker Supp-Stefan
:)

Re: OCR - multilanguage use

Posted: Thu Sep 19, 2013 8:51 am
by Ludwig
Hi there,
I just want to support Kejos' concern. It would be great if more than one language could be recognised in a single pass. I would prefer the option of OCRing a document in several languages simultaneously rather than telling the program which parts should be OCRed in which language. By this I mean that I would like to be able to enable two or three languages before OCRing, so that every single word is checked against each of them and the text layer ends up in the language that word most likely belongs to. Of course the OCR itself would then take two or three times as long as usual. I very often have documents with two languages on one page.
Please see the attached sample file with (Ancient) Greek and German: I would like "καὶ ἄρχοντα" to be recognised as "και αρχοντα" and not as "kai apxovta" (the correct transcription would be "kai archonta"), whereas the German parts should be recognised as German.

I also very often have scans of bilingual books where two book pages appear side by side on one (landscape) page, e.g. French on the left and English on the right. In this case the simultaneous OCR option I suggested would be even more helpful.

Best regards,
Ludwig
Sample.pdf (1.3 MiB)
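The per-word selection Ludwig describes amounts to reading each word once per enabled language and keeping the reading the engine is most confident in. A minimal Python sketch of just that selection step, assuming an OCR engine could report a per-word confidence for each language; the `Candidate` type, `pick_per_word`, and the scores are hypothetical and not part of any PDF-XChange API.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """One hypothetical OCR reading of a word under a single language."""
    language: str
    text: str
    confidence: float  # assumed to be a score in [0.0, 1.0]


def pick_per_word(words: list[list[Candidate]]) -> list[Candidate]:
    """For each word, keep the reading whose language scored highest."""
    # Each inner list holds the readings of the same word, one per enabled language.
    # On a tie, max() keeps the first reading in the list.
    return [max(readings, key=lambda c: c.confidence) for readings in words]


if __name__ == "__main__":
    # Made-up readings for a Greek phrase followed by a German word,
    # as if the page had been recognised once with "el" and once with "de".
    page = [
        [Candidate("el", "και", 0.93), Candidate("de", "kai", 0.41)],
        [Candidate("el", "αρχοντα", 0.88), Candidate("de", "apxovta", 0.35)],
        [Candidate("el", "υπδ", 0.22), Candidate("de", "und", 0.95)],
    ]
    chosen = pick_per_word(page)
    print(" ".join(c.text for c in chosen))  # και αρχοντα und
```

A simple maximum only helps when the per-language scores clearly differ; for languages that share an alphabet the scores can be close, so a real engine would likely need more context than a single word.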

Re: OCR - multilanguage use

Posted: Thu Sep 19, 2013 12:17 pm
by Tracker Supp-Stefan
Hello Ludwig,

Automatically recognizing different languages, especially if they use a similar or, worse, the same alphabet, can be quite tricky, and I can't promise that it will become available. As mentioned before (in another topic, I believe), we are considering an option that would let you specify zones to be OCRed and select a specific (but only one) language for each zone.

Regards,
Stefan
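The zone option Stefan mentions can be pictured as a list of page rectangles, each tied to exactly one OCR language. A minimal Python sketch under stated assumptions: the `Zone` type, coordinates in PDF points, and the rule that zones must not overlap (so every region maps to a single language) are hypothetical and not taken from the product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Zone:
    """Hypothetical rectangular OCR zone on a page, in PDF points."""
    x0: float
    y0: float
    x1: float
    y1: float
    language: str  # exactly one OCR language per zone


def zones_overlap(a: Zone, b: Zone) -> bool:
    """True if the two rectangles share any area (touching edges do not count)."""
    return a.x0 < b.x1 and b.x0 < a.x1 and a.y0 < b.y1 and b.y0 < a.y1


def validate(zones: list[Zone]) -> None:
    """Assumed rule: reject overlapping zones, so each region has one language."""
    for i, a in enumerate(zones):
        for b in zones[i + 1:]:
            if zones_overlap(a, b):
                raise ValueError(f"zones for {a.language!r} and {b.language!r} overlap")


if __name__ == "__main__":
    # A landscape scan holding two book pages: French on the left, English on the right.
    # The 842 x 595 point size (A4 landscape) is an illustrative assumption.
    page_zones = [
        Zone(0, 0, 421, 595, "fr"),
        Zone(421, 0, 842, 595, "en"),
    ]
    validate(page_zones)  # passes: the zones only share an edge
    for z in page_zones:
        print(z.language, (z.x0, z.y0, z.x1, z.y1))
```

Each zone would then be OCRed on its own with its language, which is what the duplicate-and-cover workaround earlier in the thread does by hand.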