Enhance Scanned Pages

TrackerSupp-Daniel
Site Admin
Posts: 3032
Joined: Wed Jan 03, 2018 6:52 pm

Re: Enhance Scanned Pages

Post by TrackerSupp-Daniel » Fri Apr 12, 2019 11:29 pm

Hello Arnold,

That is an excellent question; I had to confer with our lead developer to answer it.

The Enhance Scanned Pages function is designed to do just that: enhance the scanned page, increasing fidelity while offering a few other functions. The key point is that it handles the entire page as a single object, so in cases like this file (where two images are actually present) the compression is forced to alter the content drastically.

When running OCR, on the other hand, we do not recompress the images in that way. Instead, we simply recognize what is present and then shuffle the content into a slightly more appropriate position. This is how similar tasks can produce different results.

The main takeaway is that the Enhance Scanned Pages function is specifically designed to enhance the content of a scanned page; it is not designed for optimizing or anything of the like. OCR, on the other hand, is not designed to alter the images; it simply looks at what exists and makes it more functional. This is why the functions go hand in hand but are completely separate.

After fiddling with the settings for a while, I found that, contrary to what the layout seems to indicate, the compression methods and the Size/Quality scale are not disabled when the "Adaptive Compression" checkbox is unchecked. In the case of this specific file, the issue seems to lie with the JPEG 2000 compression method. If you change the greyscale method to JPEG and move the quality slider down to minimum, like so:
image.png
you should find that the output of this file is deskewed, without OCR, and the file size is actually reduced to below 200 KB.
I have asked the Devs if we can move that checkbox to a different spot so that it is less confusing than it currently is.

Kind regards,
Daniel McIntyre
Support Technician
Tracker Software Products (Canada) LTD

Sales: +1 (250) 324-1621
Fax: +1 (250) 324-1623

Arnold
User
Posts: 717
Joined: Tue Jun 09, 2009 3:53 am
Location: Florida

Re: Enhance Scanned Pages

Post by Arnold » Sat Apr 13, 2019 12:09 am

Got it. I will check Enhance Scanned Pages some more in version 8. In version 7 I have always had Adaptive Compression turned off and the slider set all the way to Small Size, but still got big files. So, based on your results, version 8 may work better than version 7. Out of necessity I am still using version 7 on Windows XP for the most part.

Thanks.

winstar88
User
Posts: 11
Joined: Mon Apr 30, 2018 12:04 am

Re: Enhance Scanned Pages

Post by winstar88 » Sat Apr 13, 2019 5:05 am

To site admin,
As recommended, the size is indeed reduced to about 200 KB if we change the greyscale method to JPEG and move the quality slider down to minimum. However, with that setting the output file has lost image quality compared with the original file.

Arnold
User
Posts: 717
Joined: Tue Jun 09, 2009 3:53 am
Location: Florida

Re: Enhance Scanned Pages

Post by Arnold » Sat Apr 13, 2019 12:06 pm

I think something must have changed/improved in version 8. I deskewed a 64 KB file in version 7 and it resulted in a 416 KB file using Enhance Scanned Pages. In version 8, using the same file and settings, I ended up with a 46 KB file. The version 8 file looks cleaner. The skewed file was scanned in at 300 dpi in B&W.
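
For anyone who wants to put those numbers in perspective, here is a rough sketch of the arithmetic. It only assumes a US Letter page (the post gives just "300 dpi in B&W") and ignores row padding; whether version 7 actually re-encoded at a higher bit depth is not stated anywhere in this thread, so the bit-depth comparison merely shows how much room a recompression step has to inflate a file.

```python
# Illustrative arithmetic only. The US Letter page size is an assumption;
# the thread only says the page was scanned at 300 dpi in B&W.

def raw_raster_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Uncompressed size of a page raster in bytes (row padding ignored)."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bits_per_pixel // 8


def growth_factor(before_kb, after_kb):
    """How many times larger (or smaller) the output is than the input."""
    return after_kb / before_kb


bw_bytes = raw_raster_bytes(8.5, 11, 300, 1)    # 1 bit per pixel (B&W)
gray_bytes = raw_raster_bytes(8.5, 11, 300, 8)  # 8 bits per pixel (greyscale)

v7_growth = growth_factor(64, 416)  # sizes reported for version 7
v8_growth = growth_factor(64, 46)   # sizes reported for version 8

print(bw_bytes, gray_bytes)   # 1051875 8415000
print(v7_growth, v8_growth)   # 6.5 0.71875
```

So the version 7 result was 6.5 times the size of the input, while the version 8 result came out at roughly 72% of it.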

Thanks.

TrackerSupp-Daniel
Site Admin
Posts: 3032
Joined: Wed Jan 03, 2018 6:52 pm

Re: Enhance Scanned Pages

Post by TrackerSupp-Daniel » Mon Apr 15, 2019 4:53 pm

Hello Winstar,

Glad to hear it helped. I would suggest experimenting with those settings; the slider likely does not need to be all the way at minimum to accomplish what you need, so the quality shouldn't need to take that much of a hit either.
This is news to me, but it does indeed appear that we made some changes to how the enhance scanned pages functions work in V8. Like I always say, we are constantly working on improving all aspects of the software. Sometimes you just don't see it because it happens gradually, and other times, like this case, it is a bit more pronounced.

In most of my tests the quality of the image didn't take a noticeable hit with the settings at minimum; even when testing with your sample file from earlier, it did not look any less presentable than the original. I am not sure there is much we can do about it, as there are so many variables when it comes to images, but could you send us a before/after of the file in question? Perhaps include some annotations to show where you are seeing the issues?

Kind regards,
Daniel McIntyre
Support Technician
Tracker Software Products (Canada) LTD


Vasyl-Tracker Dev Team
Site Admin
Posts: 1962
Joined: Thu Jun 30, 2005 4:11 pm
Location: Canada

Re: Enhance Scanned Pages

Post by Vasyl-Tracker Dev Team » Fri Apr 19, 2019 1:06 am

Hi all.

Regarding the problem of the file size increasing when only the Deskew filter is used: in the upcoming build we will add a new option to turn off recompression of images that are only deskewed:

NoRecompressOpt.png

This configuration of options will allow you to deskew page content without increasing the file size.

HTH.
Vasyl Yaremyn
Tracker Software Products
Project Developer

Please archive any files posted to a ZIP, 7z or RAR file or they will be removed and not posted.

Vasyl-Tracker Dev Team
Site Admin
Posts: 1962
Joined: Thu Jun 30, 2005 4:11 pm
Location: Canada

Re: Enhance Scanned Pages

Post by Vasyl-Tracker Dev Team » Wed Jun 19, 2019 8:18 pm

In addition to my previous post, in the upcoming build (332):
1. We added a simple and fast "Deskew Pages" feature.
2. We fixed an issue with the Deskew feature in the context of EnhanceScans: sometimes it did not work.
Vasyl Yaremyn
Tracker Software Products
Project Developer


TrackerSupp-Daniel
Site Admin
Posts: 3032
Joined: Wed Jan 03, 2018 6:52 pm

Re: Enhance Scanned Pages

Post by TrackerSupp-Daniel » Wed Jun 19, 2019 9:57 pm

Regarding the initial issues in this thread posted by Arnold: as discussed, the file size issue has been resolved in recent builds. The latter issue has been thoroughly investigated, and we found that in the original documents with the "white text" issue, the problem seems to stem from the PDFsharp software used to create them.
Arnold wrote:
Thu Apr 19, 2018 4:32 am
The 2nd file was not a scanned page, it was created with PDFsharp, a program I am not familiar with. If I deskew a page with a table manually, everything works fine. If I use the Enhance Scanned Pages feature I get a wild looking effect with the text (see before and after below).
Looking into the documents in question, the white text is present before the scan is performed; however, PDFsharp seems to have placed it behind the image and set its color to white. This is not dissimilar to how our old OCR engine worked, but they chose white as the fill color for the text instead of the transparent alternative. As such, we cannot "resolve" this issue directly (it is caused by the document, not by us). That said, simply changing the color of the text to "none" in this case will negate the issue.
You can force this text to the front by using the Edit > Text tool and right-clicking to choose Arrange > Bring to front.
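
To illustrate the difference between white-filled text and text with no fill at all, here is a minimal sketch at the PDF content-stream level. In PDF, the `g` and `rg` operators set the fill color and `Tr` sets the text rendering mode, where mode 3 means the glyphs are neither filled nor stroked. That is one common way to build a hidden text layer; the thread does not say which mechanism either OCR engine used. `classify_text_runs` is a hypothetical helper, not part of PDF-XChange Editor or PDFsharp, and the sample stream is made up.

```python
import re

# Hypothetical helper for illustration; not an API of any product named here.
# Tokens: (string) | [ or ] | anything else delimited by whitespace.
TOKEN = re.compile(r"\((?:\\.|[^\\()])*\)|[\[\]]|[^\s\[\]()]+")


def classify_text_runs(content_stream):
    """Label each text-showing operator by how its glyphs would be painted.

    Simplified: handles only the g / rg fill operators, the Tr rendering
    mode and Tj / TJ. It ignores q / Q state save and restore, other color
    spaces, clipping modes and nested parentheses inside strings.
    """
    fill = (0.0, 0.0, 0.0)  # the default PDF fill color is black
    render_mode = 0         # 0 = fill glyphs (the default)
    numbers = []            # numeric operands seen since the last operator
    runs = []
    for tok in TOKEN.findall(content_stream):
        try:
            numbers.append(float(tok))
            continue
        except ValueError:
            pass
        if tok == "g" and numbers:
            fill = (numbers[-1],) * 3
        elif tok == "rg" and len(numbers) >= 3:
            fill = tuple(numbers[-3:])
        elif tok == "Tr" and numbers:
            render_mode = int(numbers[-1])
        elif tok in ("Tj", "TJ"):
            if render_mode == 3:
                runs.append("invisible")
            elif render_mode in (0, 2) and fill == (1.0, 1.0, 1.0):
                runs.append("white fill")
            else:
                runs.append("visible")
        # Strings, array brackets and names are operands; anything else is
        # an operator, which consumes the operands collected so far.
        if not tok.startswith(("(", "[", "]", "/")):
            numbers = []
    return runs


sample = """
BT /F1 12 Tf 1 1 1 rg (hidden behind image) Tj ET
BT /F1 12 Tf 3 Tr (ocr layer) Tj ET
BT 0 Tr 0 g [(Hel) -20 (lo)] TJ ET
"""
result = classify_text_runs(sample)
print(result)  # ['white fill', 'invisible', 'visible']
```

White-filled text stays hidden only while something covers it or the background is white, which is why it can show up once the page content is rearranged; text in rendering mode 3 is never painted regardless of its position.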

Kind regards,
Daniel McIntyre
Support Technician
Tracker Software Products (Canada) LTD


Ovg
User
Posts: 275
Joined: Tue Sep 05, 2017 4:56 pm
Location: Moscow

Re: Enhance Scanned Pages

Post by Ovg » Thu Jun 20, 2019 1:57 am

Hi Vasyl!

Any news about the issue with the Russian language?
It's impossible to lead us astray for we don't care even to choose the way.
PDF-XChange PRO, 8.0 (Build 336.0) / W7 x64 SP1

Tracker Supp-Stefan
Site Admin
Posts: 13868
Joined: Mon Jan 12, 2009 8:07 am
Location: London

Re: Enhance Scanned Pages

Post by Tracker Supp-Stefan » Thu Jun 20, 2019 1:19 pm

Hello Ovg,

This topic has become a bit big, and there are several 'conversations' in it, so it's a bit hard to follow.
Would you please make a separate topic for the Russian language issue so that it can be followed on its own?

Regards,
Stefan

Ovg
User
Posts: 275
Joined: Tue Sep 05, 2017 4:56 pm
Location: Moscow

Re: Enhance Scanned Pages

Post by Ovg » Thu Jun 20, 2019 5:03 pm

Hello Stefan!

This topic already exists: viewtopic.php?f=62&t=32402&p=132614#p132582

Will - Tracker Supp
Site Admin
Posts: 6855
Joined: Mon Oct 15, 2012 9:21 pm
Location: London, UK

Re: Enhance Scanned Pages

Post by Will - Tracker Supp » Fri Jun 21, 2019 8:11 am

Hi OVG,

Thanks for that. It is best to keep any queries about that issue in that topic from here on.

Thanks,
If posting files to this forum, you must archive the files to a ZIP, RAR or 7z file or they will not be uploaded.
Thank you.

Best regards

Will Travaglini
Tracker Support (Europe)
Tracker Software Products Ltd.
http://www.tracker-software.com

Timur Born
User
Posts: 689
Joined: Tue Jun 26, 2012 1:50 pm

Re: Enhance Scanned Pages

Post by Timur Born » Sat Feb 15, 2020 10:46 pm

David.P wrote:
Tue Apr 24, 2018 3:27 pm
Shhh... Editing ClearScan-OCR'ed text is not possible with Ad*be Acr*bat itself (for understandable reasons of legal document security), and it probably should not become widely known that you can "freely edit the text of scanned documents" this way -- using PDF-XChange Editor :o
I just checked a trial of Acrobat again, and ClearScan OCR text can very much be edited (now). Here is a test of its current results, and it is astounding. Notice the AaBbCc text in the lower right box; I typed that in while Acrobat chose the font. When I entered the first X I got a short "OCR" message, during which it seemed to obtain extra data not present in the original OCR run. I have attached an Editor OCR result and the original scan for comparison.
Attachments
Original_Scan.pdf (4.74 MiB)
Editable_detect_fix.pdf (3.5 MiB)
Acrobat_Editable_Fonts.pdf (1.19 MiB)

Timur Born
User
Posts: 689
Joined: Tue Jun 26, 2012 1:50 pm

Re: Enhance Scanned Pages

Post by Timur Born » Sun Feb 16, 2020 11:33 am

Worth mentioning: There is no X or Y present anywhere in the text! Acrobat still manages to create these as an editable vector font that fits the style of the other letters. This is likely why I saw the short progress message when I typed the first X; maybe Acrobat looked up fitting fonts in its (online) font database?!

It did not do so well with letters that are present in the text but use a different font, like the "F".

David.P
User
Posts: 883
Joined: Thu Feb 28, 2008 8:16 pm
Location: Germany

Re: Enhance Scanned Pages

Post by David.P » Sun Feb 16, 2020 11:55 am

🤞🤞 that this doesn't come to the attention of any legislat*rs in the area of electronic document handling/archiving regulations
David.P
PDF-XChange Pro

Timur Born
User
Posts: 689
Joined: Tue Jun 26, 2012 1:50 pm

Re: Enhance Scanned Pages

Post by Timur Born » Sun Feb 16, 2020 12:16 pm

I checked again and all letters seem to be present in both fonts. So the "F" can be had with serifs as well (Times New Roman style).

David.P
User
Posts: 883
Joined: Thu Feb 28, 2008 8:16 pm
Location: Germany

Re: Enhance Scanned Pages

Post by David.P » Sun Feb 16, 2020 7:31 pm

Timur, what version number was that Adobe trial version that you used?

I just realized that either I was wrong stating:
David.P wrote:
Tue Apr 24, 2018 3:27 pm
Editing ClearScan-OCR'ed text is not possible with Ad*be Acr*bat itself (for understandable reasons of legal document security)
...two years ago -- or facts have changed in the meantime.

Acrobat XI v.11.0.23 now very well allows editing of its own ClearScan-OCR'ed text (as attached).

Also, I just realized that Acr0bat even fabricates PAPER TEXTURE behind the font letters :shock:

I believe that this almost qualifies for Clarke's law #3.
Attachments
OCR Clearscan Acrobat XI v.11.0.23.pdf (540.64 KiB)
David.P
PDF-XChange Pro
