
Re: Enhance Scanned Pages

Posted: Fri Apr 12, 2019 11:29 pm
by TrackerSupp-Daniel
Hello Arnold,

That is an excellent question; I had to confer with our lead developer to answer it.

The Enhance Scanned Pages function is designed to do just that: enhance the scanned page, increasing fidelity while offering a few other functions. The key point is that it handles the entire page as a single object, so in cases like this file (where there are actually two images present) the compression is forced to alter the content drastically.

When running OCR, on the other hand, we do not recompress the images in this way. Instead we simply recognize what is present and then shuffle the content into a slightly more appropriate position. This is how two similar tasks can produce different results.

The main takeaway here is that the Enhance Scanned Pages function is specifically designed to enhance the content of a scanned page; it is not designed for optimizing or anything of the like. OCR, on the other hand, is not designed to alter the images, only to look at what exists and make it more functional. This is why the functions go hand in hand but are completely separate.

After fiddling with the settings for a while, I found that, contrary to what the layout suggests, the compression methods and the Size/Quality scale are not disabled when the "Adaptive Compression" checkbox is unchecked. In the case of this specific file, the issue seems to lie with the JPEG 2000 compression method. If you change the greyscale method to JPEG and move the quality slider down to minimum, like so:
[Attachment: image.png]
You should find that the output of this file is deskewed, without OCR, and the file size is actually reduced to below 200kb.
I have asked the Devs if we can move that checkbox to a different spot so that it does not confuse people quite as much as it is currently.

Kind regards,

Re: Enhance Scanned Pages

Posted: Sat Apr 13, 2019 12:09 am
by Arnold
Got it. I will check Enhance Scanned Pages some more in version 8. In version 7 I have always had Adaptive Compression turned off and the slider set all the way to Small Size, but I still got big files. So, based on your results, it may be that version 8 works better than version 7. Out of necessity I am still using version 7 on Windows XP for the most part.

Thanks.

Re: Enhance Scanned Pages

Posted: Sat Apr 13, 2019 5:05 am
by winstar88
To site admin,
As recommended, the size is indeed reduced to about 200kb if we change the greyscale method to JPEG and move the quality slider down to minimum. However, with these settings the output file has lost image quality compared with the original file.

Re: Enhance Scanned Pages

Posted: Sat Apr 13, 2019 12:06 pm
by Arnold
I think something must have changed/improved in version 8. I deskewed a 64 kb file in version 7 using Enhance Scanned Pages and it resulted in a 416 kb file. In version 8, using the same file and settings, I ended up with a 46 kb file. The version 8 file looks cleaner. The skewed file was scanned in at 300 dpi in B&W.
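
For a rough sense of scale, the sizes reported above can be turned into ratios. This is only a sketch of the arithmetic using the three figures from this post; the function name `size_ratio` is made up for illustration.

```python
def size_ratio(output_kb: float, input_kb: float) -> float:
    """Return the output size as a multiple of the input size."""
    return output_kb / input_kb


# Figures reported in the post above (in kb, as stated there).
original_kb = 64   # skewed source file
v7_kb = 416        # result of Enhance Scanned Pages in version 7
v8_kb = 46         # result with the same file and settings in version 8

v7_ratio = size_ratio(v7_kb, original_kb)
v8_ratio = size_ratio(v8_kb, original_kb)

print(f"version 7: {v7_ratio:.2f}x the original size")
print(f"version 8: {v8_ratio:.2f}x the original size")
```

In other words, version 7 grew the file to 6.5 times its original size, while version 8 shrank it to roughly 72% of the original.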

Thanks.

Re: Enhance Scanned Pages

Posted: Mon Apr 15, 2019 4:53 pm
by TrackerSupp-Daniel
Hello Winstar,

Glad to hear it helped. I would suggest experimenting with those settings. The slider likely does not need to be all the way at minimum to accomplish what you need, so the quality shouldn't need to take that much of a hit either.
This is news to me, but it does indeed appear that we made some changes to how the Enhance Scanned Pages function works in V8. Like I always say, we are constantly working on improving all aspects of the software. Sometimes you just don't see it because it happens gradually, and other times, like in this case, it is a bit more pronounced.

In most of my tests the quality of the image didn't take a noticeable hit with the settings at minimum. Even when testing with your sample file from earlier, it did not look any less presentable than the original. I am not sure if there is much we can do about it, as there are so many variables when it comes to images, but could you send us a before/after of the file in question? Perhaps include some annotations to show where you are seeing the issues?

Kind regards,

Re: Enhance Scanned Pages

Posted: Fri Apr 19, 2019 1:06 am
by Vasyl-Tracker Dev Team
Hi all.

Regarding the problem of the file size increasing when only the Deskew filter is used: in the upcoming build we will add a new option to turn off recompression of images that are only deskewed.

[Attachment: NoRecompressOpt.png]

This configuration of options will allow you to deskew page content without increasing the file size.

HTH.

Re: Enhance Scanned Pages

Posted: Wed Jun 19, 2019 8:18 pm
by Vasyl-Tracker Dev Team
In addition to my previous post, in the upcoming build (332):
1. We added a simple and fast "Deskew Pages" feature.
2. We fixed an issue with the Deskew feature in the context of EnhanceScans, where it sometimes might not work.

Re: Enhance Scanned Pages

Posted: Wed Jun 19, 2019 9:57 pm
by TrackerSupp-Daniel
Regarding the initial issues in this thread posted by Arnold: as discussed, the file size issue has been resolved in recent builds. The latter issue has been thoroughly investigated, and we found that in the original documents with the "white text" issues, the problem seems to stem from the PDFSharp software used to create them.
Arnold wrote: Thu Apr 19, 2018 4:32 am The 2nd file was not a scanned page, it was created with PDFsharp, a program I am not familiar with. If I deskew a page with a table manually, everything works fine. If I use the Enhance Scanned Pages feature I get a wild looking effect with the text (see before and after below).
Looking into the documents in question, the white text was present before the scan was performed; PDFSharp seems to have placed it behind the image and set its color to white. This is not dissimilar to how our old OCR engine worked, but they chose white as the fill color for the text instead of the transparent alternative. As such, we cannot "resolve" this issue directly (it is caused by the document, not by us). That said, simply changing the color of the text in this case to "none" will negate the issue.
You can force this text to the front by using the Edit > Text tool and right clicking to choose Arrange > Bring to front.

Kind regards,

Re: Enhance Scanned Pages

Posted: Thu Jun 20, 2019 1:57 am
by Ovg
Hi Vasyl!

Any news about the issue with the Russian language?

Re: Enhance Scanned Pages

Posted: Thu Jun 20, 2019 1:19 pm
by Tracker Supp-Stefan
Hello Ovg,

This topic has become a bit big, and there are several 'conversations' in it, so it's a bit hard to follow.
Would you please make a separate topic for the Russian language issue so that it can be followed on its own?

Regards,
Stefan

Re: Enhance Scanned Pages

Posted: Thu Jun 20, 2019 5:03 pm
by Ovg
Hello Stefan!

This topic already exists - viewtopic.php?f=62&t=32402&p=132614#p132582

Re: Enhance Scanned Pages

Posted: Fri Jun 21, 2019 8:11 am
by Will - Tracker Supp
Hi OVG,

Thanks for that. It is best to keep any queries on that issue in that topic from here on.

Thanks,

Re: Enhance Scanned Pages

Posted: Sat Feb 15, 2020 10:46 pm
by Timur Born
David.P wrote: Tue Apr 24, 2018 3:27 pm Shhh... Editing ClearScan-OCR'ed text is not possible with Ad*be Acr*bat itself (for understandable reasons of legal document security), and it probably should not become widely known that you can "freely edit the text of scanned documents" this way -- using PDF-XChange Editor :o
I just checked a trial of Acrobat again, and ClearScan OCR text can very much be edited (now). Here is a test of its current results, and it is astounding. Notice the AaBbCc text in the lower right box: I typed that in while Acrobat chose the font. When I entered the first X I got a short "OCR" message, where it seemed to obtain extra data not present in the original OCR run. I am attaching an Editor OCR result and the original scan for comparison.

Re: Enhance Scanned Pages

Posted: Sun Feb 16, 2020 11:33 am
by Timur Born
Worth mentioning: There is no X or Y present anywhere in the text! Acrobat still manages to create these as an editable vector font that fits the style of the other letters. This is likely why I saw the short progress message when I typed the first X. Maybe Acrobat looked up fitting fonts in its (online) font database?!

It did not do so well with letters that are present in the text but use a different font, like the "F".

Re: Enhance Scanned Pages

Posted: Sun Feb 16, 2020 11:55 am
by David.P
🤞🤞 that this doesn't come to the attention of any legislat*rs in the area of electronic document handling/archiving regulations

Re: Enhance Scanned Pages

Posted: Sun Feb 16, 2020 12:16 pm
by Timur Born
I checked again and all letters seem to be present in both fonts. So the "F" can be had with serifs as well (Times New Roman style).

Re: Enhance Scanned Pages

Posted: Sun Feb 16, 2020 7:31 pm
by David.P
Timur, what version number was that Adobe trial version that you used?

I just realized that either I was wrong stating:
David.P wrote: Tue Apr 24, 2018 3:27 pm Editing ClearScan-OCR'ed text is not possible with Ad*be Acr*bat itself (for understandable reasons of legal document security)
...two years ago -- or facts have changed in the meantime.

Acrobat XI v.11.0.23 now very well allows editing of its own ClearScan-OCR'ed text (as attached).

Also, I just realized that Acr0bat even fabricates PAPER TEXTURE behind the font letters :shock:

I believe that this almost qualifies for Clarke's law #3.