
Latin and Gothic Letters for OCR

Posted: Wed Oct 16, 2013 4:36 pm
by Ludwig
Hi there,

I would like to ask if you could please add Latin to the list of additional OCR languages. Even more important to me, though, would be the ability to OCR books printed in Gothic/blackletter type. I am especially looking for the so-called "Unger-Fraktur", as many German books from the 19th century were printed in this typeface. Do you think this is possible?

Thanks a lot
Ludwig

Re: Latin and Gothic Letters for OCR

Posted: Wed Oct 16, 2013 4:53 pm
by Walter-Tracker Supp
We will add Slovakian, Swedish, and German "fraktur" language data in the final release of the editor. We will not have direct Latin support, though results using the English (or even another Latin-alphabet) language selection will be fairly good since the word dictionary weighting is fairly weak (i.e., it will not dominate results too seriously).

-Walter

Re: Latin and Gothic Letters for OCR

Posted: Wed Oct 16, 2013 8:24 pm
by Ludwig
I am very much looking forward to the final release then! Do you have a rough time horizon?

Re: Latin and Gothic Letters for OCR

Posted: Wed Oct 16, 2013 8:46 pm
by Paul - Tracker Supp
Hi Ludwig,

Walter tells me this should be available in the next few weeks.

hth

Re: Latin and Gothic Letters for OCR

Posted: Sat Oct 19, 2013 12:09 am
by Walter-Tracker Supp
Ludwig, I have prepared the Fraktur language pack and sent it to our installation guys. It may be a few days before it becomes available on the website but I thought I would update you to let you know that it will be very soon. It will work with both the viewer and the editor.

-Walter

Re: Latin and Gothic Letters for OCR

Posted: Sat Oct 19, 2013 9:05 am
by Ludwig
Thanks Walter! This is really good news.

Re: Latin and Gothic Letters for OCR

Posted: Mon Oct 21, 2013 10:28 am
by Tracker Supp-Stefan
:)

Re: Latin and Gothic Letters for OCR

Posted: Tue Oct 22, 2013 6:19 pm
by Walter-Tracker Supp
Ludwig, I have attached the language pack to this post, because I guess it will still be a few days before it is on the website, since our installer people are very busy with the new editor release. You will have to place the files in your language directory yourself, and we cannot provide support for this since we will have a proper installer generated pretty shortly. Languages for the *viewer* are placed in a directory called "ocrdats" under the main Viewer installation directory, e.g.:

C:\Program Files\Tracker Software\PDF Viewer\ocrdats

In the editor, you will have to find PluginsData\OCRLanguages, e.g.:

C:\Program Files\Tracker Software\PDF Editor\PluginsData\OCRLanguages

Copy all the .lng and .dat files into those directories and you should see the Fraktur choices in your OCR preferences / run dialog.
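
For anyone who would rather script the copy step than do it by hand, here is a minimal sketch in Python. It is only an illustration, not an official installer: the function name `copy_language_files` and the download folder in the usage comment are made up for this example, and the target path is simply the example Viewer path given above. Writing under "Program Files" will probably need an elevated (administrator) prompt.

```python
import shutil
from pathlib import Path


def copy_language_files(source_dir, target_dir):
    """Copy every .lng and .dat file from source_dir into target_dir.

    Returns the list of copied file names, sorted alphabetically.
    Hypothetical helper for illustration; not part of any Tracker tool.
    """
    source = Path(source_dir)
    target = Path(target_dir)

    # Refuse to guess: the language directory must already exist.
    if not target.is_dir():
        raise FileNotFoundError(f"Language directory not found: {target}")

    copied = []
    for path in sorted(source.iterdir()):
        # Only the language pack files are copied; anything else is skipped.
        if path.is_file() and path.suffix.lower() in {".lng", ".dat"}:
            shutil.copy2(path, target / path.name)
            copied.append(path.name)
    return copied


# Example call (both paths are assumptions; adjust them to your system):
# copy_language_files(
#     r"C:\Users\me\Downloads\fraktur_pack",
#     r"C:\Program Files\Tracker Software\PDF Viewer\ocrdats",
# )
```

For the editor, the same call would point at the PluginsData\OCRLanguages directory instead.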

-Walter

Re: Latin and Gothic Letters for OCR

Posted: Wed Oct 23, 2013 11:17 am
by Ludwig
Hi Walter,
Thank you very much for the files. Using the Viewer Pro (not the Editor), I have tried the new German Fraktur on three books so far (I don't really know what Swedish and Slovakian Fraktur are, so I didn't try those). Very promising! Great job!
Ludwig

Re: Latin and Gothic Letters for OCR

Posted: Wed Oct 23, 2013 4:35 pm
by Will - Tracker Supp
Great! I'll pass the message along to Walter :D

Re: Latin and Gothic Letters for OCR

Posted: Thu Oct 24, 2013 11:05 am
by Ludwig
Hi, is there a way to train the OCR program for better recognition of Fraktur letters? I found that the program systematically misreads "ch", turning it into just "c". For example, it produces "Bezeicnung" instead of "Bezeichnung".

Re: Latin and Gothic Letters for OCR

Posted: Thu Oct 24, 2013 5:01 pm
by Walter-Tracker Supp
Not at the moment. We may release a tool to help with training in the future. However, if you feel ambitious you can email us at support@pdf-xchange.com and I can point you in the right direction, but can't provide detailed support for it - you'd be on your own.

Re: Latin and Gothic Letters for OCR

Posted: Sat Nov 09, 2013 9:39 am
by zzmarko
Walter-Tracker Supp wrote: We will add Slovakian, Swedish, and German "fraktur" language data in the final release of the editor. We will not have direct Latin support, though results using the English (or even another Latin-alphabet) language selection will be fairly good since the word dictionary weighting is fairly weak (i.e., it will not dominate results too seriously).

-Walter
Will the Croatian language perhaps be added as well?

thank you

Re: Latin and Gothic Letters for OCR

Posted: Mon Nov 11, 2013 5:08 pm
by Walter-Tracker Supp
Croatian will be available on or before the next build, anticipated in about a month's time. Meanwhile you can use any other language we provide which uses the same diacritics, if applicable (I'm not familiar with Croatian myself), because the word dictionary coupling is weak.

I will update this forum posting once we have included it.

Re: Latin and Gothic Letters for OCR

Posted: Sat Nov 26, 2016 12:38 pm
by Leonatus
This thread is quite old; nevertheless, I wanted to express my sincere thanks for the "German Fraktur" OCR set! I had been desperately searching for this feature!

Re: Latin and Gothic Letters for OCR

Posted: Sat Nov 26, 2016 12:42 pm
by John - Tracker Supp
Pleasure :)