Searchable image issues in V9

Post by **frebbe** » Sat Jan 16, 2021 11:11 pm

Just updated from version 7 to 9. In the old version, without the Enhanced OCR PlugIn, I had no problems with text in tables. My first test of the new version with a PDF fresh from the digitization service of the Saxon State Library in Dresden (without OCR) is very sobering. Recognition is good on pure text pages, but tables are arbitrarily disfigured with newly drawn vertical and horizontal lines. That looks really bad:

These are the settings with which I carried out the text recognition. However, the settings make no difference: whichever ones I use, the lines in the table always appear.

This is a serious regression compared to the results I had with version 7. You can see that I have been using your products since 2004. Today is the first time I am considering withdrawing from the purchase of a new version!

And, by the way, why is there no recognition of Fraktur fonts, which unfortunately have been used in German for a very long time?

Post by **TrackerSupp-JohnG** » Sun Jan 17, 2021 2:22 am

Hello,

Thank you for informing us about this issue. To support you better, would you mind sending this document to support@pdf-xchange.com, as we would like to be able to take a closer look?

Kind regards,

Post by **Markus Stamm** » Mon Jan 18, 2021 3:23 pm

frebbe wrote: ↑Sat Jan 16, 2021 11:11 pm (...)And, by the way, why is there no recognition of Fraktur fonts, which unfortunately have been used in German for a very long time?

I second this question, see also my post on this topic: https://forum.pdf-xchange.com/viewtopic.php?f=63&p=147689#p147689

ABBYY FineReader 15 and FineReader Server are capable of black letter OCR, as was Tesseract, so perhaps this is only a question of adding the training data?
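For comparison, here is a minimal sketch of how a standalone Tesseract installation can be asked for Fraktur recognition and a searchable PDF (invisible text over the page image). This says nothing about how PDF-XChange integrates its OCR engines. It assumes the `tesseract` command-line tool is on the PATH and that a Fraktur language pack is installed under the name `frk` (the name can differ depending on the tessdata set); the helper names `build_tesseract_command` and `run_ocr` are hypothetical.

```python
import shutil
import subprocess


def build_tesseract_command(image_path, output_base, lang="frk"):
    """Build the command line: tesseract IMAGE OUTPUTBASE -l LANG pdf.

    The trailing "pdf" config asks Tesseract to write OUTPUTBASE.pdf,
    a searchable PDF with the recognized text layered over the image.
    "frk" is an assumed name for the installed Fraktur language data.
    """
    return ["tesseract", image_path, output_base, "-l", lang, "pdf"]


def run_ocr(image_path, output_base, lang="frk"):
    """Run Tesseract and return the path of the PDF it should produce."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    # check=True raises CalledProcessError if Tesseract exits with an error,
    # for example when the requested language data is not installed.
    subprocess.run(build_tesseract_command(image_path, output_base, lang), check=True)
    return output_base + ".pdf"
```

Calling `run_ocr("page.png", "page")` would then write `page.pdf` next to the working directory, provided the language data is available.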

Tue Jan 19, 2021 5:08 am

Hi frebbe

frebbe wrote: Just updated from version 7 to 9. In the old version, without the Enhanced OCR PlugIn, I had no problems with text in tables. My first test of the new version with a PDF fresh from the digitization service of the Saxon State Library in Dresden (without OCR) is very sobering. Recognition is good on pure text pages, but tables are arbitrarily disfigured with newly drawn vertical and horizontal lines. That looks really bad..

It seems that this output mode has a problem: in that mode the Editor should not add any lines at all, only invisible text over the scanned image.
We will fix that soon; sorry for the inconvenience.
Also, in the next build we will add an option to suppress those lines for tables. You will be able to use it for other output modes too.

Another tip: with V9 you can still use the previous OCR engine if you want. You can enable it there:

frebbe wrote: And, by the way, why is there no recognition of Fraktur fonts, which unfortunately have been used in German for a very long time?

As I said, with V9 you are still able to use the previous OCR engine and this engine has the ability to recognize Fraktur fonts as well:

Unfortunately, the new EnhancedOCR does not have this ability, although it is definitely faster and provides significantly better results for most documents. We are still improving its performance and quality...

Cheers.
