A bug for Enhance Scanned Pages

Posted: Sat Sep 25, 2021 11:04 am
by eu4you
Hi,

I have some PDF files that were OCR'ed by ABBYY FineReader 14.
FineReader is a great program for OCR, but one flaw is that it splits the text into many objects.
This makes the files slow to read in any viewer.

So I tried the 'Enhance Scanned Pages' feature in PDF-Tools,
and your tool works well for combining picture and text objects.
This only works when at least one filter in the feature is applied.
[Attachment: Calling Guide Book.pdf (3.9 MiB)]
But there is a problem with this: the object order is wrong.
The text object should be under the picture object, but it always ends up on top of the picture.
So the text in the PDF overlaps and looks horrible.

So I would like to request that you fix this or add a feature to order objects.
Thanks.

Re: A bug for Enhance Scanned Pages

Posted: Mon Sep 27, 2021 11:14 pm
by TrackerSupp-Daniel
Hello, eu4you

I am afraid that I do not fully understand the issue you are reporting here. The document you sent already appears to have had OCR run on it, and the OCR text is transparent, so it does not affect the images.
Could I ask you to:
1. Provide a screenshot of what you see in this file as problematic
2. Provide a screenshot of what your OCR settings are, so that we can try to reproduce the issue
3. Provide a copy of the original file, before any OCR has been performed.

Kind regards,

Re: A bug for Enhance Scanned Pages

Posted: Tue Sep 28, 2021 1:50 pm
by eu4you
Hello Daniel,
Yes, I think my explanation was lacking.

1. This is the original PDF, which was processed by the OCR feature of FineReader 14.
[Attachment: Calling Guide Book(Before).pdf (3.9 MiB)]

You can see the text and pictures normally in the PDF file, but any page may be very slow to display.
The contents of the file look like this:
Before Contents.png


2. So I ran the 'Enhance Scanned Pages' feature on the file with the settings below:
Setting.png


3. And the resulting file is below:
https://kutt.it/orc7up (over 5 MB)

You can see that the pages look abnormal. The text should be under the picture, but the two are reversed. So in page view the text layer overlaps the text that is part of the picture.

The contents of the file look like this:
After contents.png
As you know, all the contents of each page are structured as layers, and the item at the top of the list is placed at the bottom of the actual page.
In the original PDF, the container holding the text is above the container holding the pictures in the list, so the text is hidden under the picture in page view.
But the result produced by the feature in PDF-Tools is wrong: the text layer is below the picture layer in the list, so in page view the text sits on top of the picture, which already shows the same text. The two copies of the text overlap each other.
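
The ordering described above can be illustrated with a minimal sketch. It assumes a simplified model in which page objects are painted in list order and every object is opaque, so the last object covering a point is the one that shows. The PageObject class, the topmost_at helper, and the bounding boxes are hypothetical names and values made up for illustration; they are not part of PDF-Tools or of any PDF library.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class PageObject:
    """Hypothetical stand-in for one item in the page content list."""
    name: str
    # Bounding box as (x0, y0, x1, y1) in page units.
    bbox: Tuple[float, float, float, float]

    def covers(self, x: float, y: float) -> bool:
        x0, y0, x1, y1 = self.bbox
        return x0 <= x <= x1 and y0 <= y <= y1


def topmost_at(content: List[PageObject], x: float, y: float) -> Optional[str]:
    """Return the name of the object left showing at (x, y).

    Objects are painted in list order, so a later object that covers
    the point paints over any earlier one. Assumes opaque objects.
    """
    visible = None
    for obj in content:
        if obj.covers(x, y):
            visible = obj.name
    return visible


scan = PageObject("picture", (0, 0, 600, 800))
ocr_text = PageObject("text", (50, 700, 550, 720))

original_order = [ocr_text, scan]  # text listed first: the picture hides it
enhanced_order = [scan, ocr_text]  # picture listed first: the text shows on top

print(topmost_at(original_order, 100, 710))
print(topmost_at(enhanced_order, 100, 710))
```

With the original order the picture is what shows at the sampled point; with the reordered list the text is painted last and appears over the picture, which matches the overlap seen in the result file.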


So what I would suggest is either to change the order of the picture and text layers, or to add another function that sorts the layers by a priority the user can set.

Note that this problem does not appear when all filters in the feature are turned off. At least one filter must be turned on for it to appear.

Re: A bug for Enhance Scanned Pages

Posted: Tue Oct 12, 2021 4:05 pm
by eu4you
Hello?
Has this bug been accepted?

Re: A bug for Enhance Scanned Pages

Posted: Tue Oct 12, 2021 11:30 pm
by TrackerSupp-Daniel
Hello, eu4you

My apologies! Yes, I managed to reproduce this issue and created the following bug report for it:

RT#5745: Enhance Scans re-orders page content incorrectly.

I thought I had posted it here after creating the ticket, so again, please accept my apologies for missing this.

Kind regards,

Re: A bug for Enhance Scanned Pages

Posted: Mon Nov 15, 2021 11:43 am
by eu4you
Hello,

It looks like the latest update didn't include this bug fix.
May I know how it is going?

Re: A bug for Enhance Scanned Pages

Posted: Mon Nov 15, 2021 11:54 am
by Tracker Supp-Stefan
Hello eu4you,

Yes, indeed, this ticket and the bug reported in it could not be addressed in build 358.
The ticket is still in our system and will be looked at as soon as possible!

Kind regards,
Stefan

Re: A bug for Enhance Scanned Pages

Posted: Tue Nov 16, 2021 10:37 pm
by Vasyl-Tracker Dev Team
Hi eu4you.

The 'overlapped-text-becomes-visible-after-EnhanceScans' bug is confirmed and will be fixed soon.

Also, a tip related to your case, based on your screenshot of the EnhanceScans dialog: using Descreen=High isn't a good idea for the majority of images, because it may add visual artifacts (vertical and horizontal lines). And using a High value for all parameters doesn't guarantee the best result for your case. It is better to experiment more with these parameters. Sometimes the Medium or even Low value may give you better results, depending on the kind of image.

Cheers.

Re: A bug for Enhance Scanned Pages

Posted: Sat Dec 11, 2021 5:47 am
by eu4you
Thank you very much for the good news and also for your hard work!

Re: A bug for Enhance Scanned Pages

Posted: Mon Dec 13, 2021 10:01 am
by Tracker Supp-Stefan
:)

Re: A bug for Enhance Scanned Pages

Posted: Fri Apr 08, 2022 10:38 pm
by eu4you
Hi,
Has this problem been resolved in the latest version update?

Re: A bug for Enhance Scanned Pages

Posted: Mon Apr 11, 2022 12:19 pm
by Tracker Supp-Stefan
Hello eu4you,

I just tested with the sample file from the ticket - and no - it does not seem like this was fixed in build 360. I will ask our devs to check this again!

Kind regards,
Stefan

Re: A bug for Enhance Scanned Pages

Posted: Tue Apr 12, 2022 7:14 pm
by Vasyl-Tracker Dev Team
Hi eu4you.

This issue will be fixed in the upcoming 361 build.

Cheers.

Re: A bug for Enhance Scanned Pages

Posted: Fri May 27, 2022 8:23 am
by eu4you
Hi,
I recently updated PDF-Tools, and I've been busy, so I only tried this function today.

In the file I previously attached, the error seems to be resolved. But when I tried other files, the problem was still there.

Among the options, there is one that sends the background to the bottom of the layer order, but then the text appears on top of the background, so I don't think the problem is solved. Rather, I think the background should be at the top. I wish this option were added as well.

Please take a look at the file attached below.

I am providing the link below instead of attaching the file, because the upload failed with an error saying the attachment is too large:
https://Enoch.myqnapcloud.com:8001/share.cgi?ssid=81b900ccb27b4da1af5dcdd52026510c

Re: A bug for Enhance Scanned Pages

Posted: Fri May 27, 2022 6:09 pm
by TrackerSupp-Daniel
Hello, eu4you

Looking at these files, the document seems to already be well formed and to contain a set of text data. I need to ask why you are trying to run the "Enhance scanned pages" function on this document to begin with, as very little improvement, if any at all, could come from it.

Is there some specific reason that you are running Enhance scanned pages on this file to begin with?

Kind regards,

Re: A bug for Enhance Scanned Pages

Posted: Sat May 28, 2022 2:37 pm
by eu4you
Hi Daniel,

Because it takes too much time to read this file. When reading with a PDF reader program on a computer, it is slow when turning multiple pages quickly. And most importantly, I want to read these documents on an e-book reader (e.g. the Onyx Boox series) or a similar device, where it is seriously slow. It takes about 10 seconds to turn a page.

I think your program will be very useful in improving this problem, and in fact, some files that have been converted are already useful.

Re: A bug for Enhance Scanned Pages

Posted: Mon May 30, 2022 4:32 pm
by TrackerSupp-Daniel
Hello, eu4you

Thank you for the explanation. "Enhance scanned pages" is not the feature you are looking for in that case. It is not designed to reduce file size or overhead, and it can often increase the size of each page, making pages take even longer to load in the cases you just described.

What you are looking for here is the "save as optimized" function, which is located on the File tab. This is used to trim unnecessary information from pages, recompress images, and discard other unused data, such as duplicate fonts. I would recommend that you take a look at that tool and all of its various options, to see if they work for you.

Kind regards,