PDF-XChange - Tracker PDF Viewer - TIFF-XChange - Image-XChange - XMF-XChange - Raster-XChange - Support

Moderators: Tracker Support, TrackerSupp-Daniel, Paul - Tracker Supp, Chris - Tracker Supp, Vasyl-Tracker Dev Team, Ivan - Tracker Software, Sean - Tracker, Tracker Supp-Stefan

 
kejos
User
Topic Author
Posts: 5
Joined: Tue Feb 28, 2012 2:57 pm

OCR - multilanguage use

Fri Mar 02, 2012 10:48 am

Hi,
is there any way to recognize text in two or more languages at the same time?

For example, if one page of a document contains Thai text and its French translation, could I convert both texts in one pass?

Greets
kejos
 
Tracker Supp-Stefan
Site Admin
Posts: 12027
Joined: Mon Jan 12, 2009 8:07 am
Location: London

Re: OCR - multilanguage use

Fri Mar 02, 2012 11:04 am

Hello Kejos,

I am afraid that this is not currently possible.
If you have the Thai and the French translations on separate pages - then you can easily OCR only the needed pages in a specific language, but it's not possible to tell our OCR to work only with e.g. half a page.

A possible workaround is to duplicate the page, then cover one of the languages with a white rectangle on the first copy and the other language on the second copy, and OCR each copy separately with the appropriate language selected in the OCR tool.
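
For anyone who wants to script the masking step outside the application, the idea can be sketched roughly as below. This is only an illustration in plain Python, not anything PDF-XChange provides: the page is assumed to be a grid of grayscale pixels, and `mask_region`, `split_for_ocr` and the box coordinates are hypothetical names chosen for this example.

```python
from typing import Dict, List, Tuple

Page = List[List[int]]            # grayscale pixels, 255 = white (assumed model)
Box = Tuple[int, int, int, int]   # (left, top, right, bottom), right/bottom exclusive

WHITE = 255


def mask_region(page: Page, box: Box) -> Page:
    """Return a copy of the page with the given box painted white."""
    left, top, right, bottom = box
    masked = [row[:] for row in page]   # copy rows so the original stays intact
    for y in range(top, bottom):
        for x in range(left, right):
            masked[y][x] = WHITE
    return masked


def split_for_ocr(page: Page, thai_box: Box, french_box: Box) -> Dict[str, Page]:
    """Build two page copies; each one hides the other language."""
    return {
        "thai": mask_region(page, french_box),    # OCR this copy as Thai
        "french": mask_region(page, thai_box),    # OCR this copy as French
    }
```

Each copy would then be OCRed on its own with the matching language selected, exactly as in the manual workaround.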

Best,
Stefan
 
kejos
User
Topic Author
Posts: 5
Joined: Tue Feb 28, 2012 2:57 pm

Re: OCR - multilanguage use

Fri Mar 02, 2012 12:15 pm

Thanks Stefan!
kejos
 
Tracker Supp-Stefan
Site Admin
Posts: 12027
Joined: Mon Jan 12, 2009 8:07 am
Location: London

Re: OCR - multilanguage use

Fri Mar 02, 2012 12:23 pm

:)
 
Ludwig
User
Posts: 16
Joined: Sun Feb 24, 2013 1:52 pm

Re: OCR - multilanguage use

Thu Sep 19, 2013 8:51 am

Hi there,
I just want to support Kejos' request. It would be great if more than one language could be recognised in a single pass. I would prefer the option of OCRing a document in several languages simultaneously rather than telling the program which parts should be OCRed in which language. By this I mean I would like to be able to enable two or three languages before OCRing, so that every single word is checked against these languages and, at the end, the text layer is added in the language that the word most likely belongs to. Of course the OCR itself will then take two or three times as long as usual. Very often I have documents with two languages on one page.
Please see the sample file with (Ancient) Greek and German: I would like "καὶ ἄρχοντα" to be recognised as "και αρχοντα" and not as "kai apxovta" (the correct transcription would be "kai archonta"), whereas the German parts should be recognised as German.
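
The per-word selection requested here could look roughly like the sketch below. It is purely illustrative: it assumes each enabled language has a recognizer that returns a text plus a confidence score between 0 and 1, which is an assumption of this example and not a description of how the PDF-XChange OCR engine works. `best_reading` and `ocr_words` are hypothetical names.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical recognizer: takes a word image, returns (text, confidence in 0..1).
Recognizer = Callable[[object], Tuple[str, float]]


def best_reading(word_image: object,
                 recognizers: Dict[str, Recognizer]) -> Tuple[str, str, float]:
    """Try every enabled language and keep the most confident reading."""
    best = None
    for lang, recognize in recognizers.items():
        text, confidence = recognize(word_image)
        if best is None or confidence > best[2]:
            best = (lang, text, confidence)
    if best is None:
        raise ValueError("no languages enabled")
    return best


def ocr_words(word_images: List[object],
              recognizers: Dict[str, Recognizer]) -> List[Tuple[str, str, float]]:
    """Pick the best (language, text, confidence) for each word on the page."""
    return [best_reading(image, recognizers) for image in word_images]
```

Every word is recognised once per enabled language, which is why the run time grows with the number of languages.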

Very often I also have scans of bilingual books where two pages end up on one (landscape) page. This means the left side is in French, for example, and the right side in English. In this case my suggested option of recognising several languages simultaneously would be even more helpful.

Best regards,
Ludwig

Sample.pdf
(1.3 MiB) Downloaded 122 times
 
Tracker Supp-Stefan
Site Admin
Posts: 12027
Joined: Mon Jan 12, 2009 8:07 am
Location: London

Re: OCR - multilanguage use

Thu Sep 19, 2013 12:17 pm

Hello Ludwig,

Automatically recognizing different languages, especially if they use similar or, worse, the same alphabet, could be quite tricky, and I can't make any promises that it will be available. As mentioned before (in another topic, I believe), we are considering an option that would let you specify zones to be OCRed and select a specific (but only one) language for each zone.
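
To make the zone idea concrete, here is a rough sketch of what "one language per zone" could mean in data terms. It is not a preview of any actual feature: `Zone`, `split_landscape`, `ocr_zones` and the per-language `engines` callables are all hypothetical, and the page is again assumed to be a simple grid of pixels.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Page = List[List[int]]


@dataclass(frozen=True)
class Zone:
    left: int
    top: int
    right: int      # exclusive
    bottom: int     # exclusive
    language: str   # exactly one language per zone


def split_landscape(width: int, height: int,
                    left_lang: str, right_lang: str) -> List[Zone]:
    """Two zones for a landscape scan holding two book pages side by side."""
    mid = width // 2
    return [Zone(0, 0, mid, height, left_lang),
            Zone(mid, 0, width, height, right_lang)]


def crop(page: Page, zone: Zone) -> Page:
    """Cut the zone's rectangle out of the page."""
    return [row[zone.left:zone.right] for row in page[zone.top:zone.bottom]]


def ocr_zones(page: Page, zones: List[Zone],
              engines: Dict[str, Callable[[Page], object]]) -> List[Tuple[str, object]]:
    """Run each zone through the (hypothetical) engine for its language."""
    return [(zone.language, engines[zone.language](crop(page, zone)))
            for zone in zones]
```

For the bilingual landscape scans mentioned earlier in the thread, `split_landscape(width, height, "fr", "en")` would give a French zone on the left and an English zone on the right.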

Regards,
Stefan
