
Enhance Scanned Pages

Posted: Thu Apr 19, 2018 4:32 am
by Arnold
I was happy to see the new Enhance Scanned Pages feature, as I often use the Transform feature to Deskew scanned pages I receive. However, I had problems with the first 2 files I tried it with.

The first file was a 2-page scanned document. If I use the Transform feature and manually deskew the pages, the file size remains the same as the original at 1.03 megabytes. However, if I use the new feature to deskew the pages, the file size increases to 2.61 meg.

The 2nd file was not a scanned page, it was created with PDFsharp, a program I am not familiar with. If I deskew a page with a table manually, everything works fine. If I use the Enhance Scanned Pages feature I get a wild looking effect with the text (see before and after below).

The settings I am using are shown below. Is there something I am missing? Thanks.
2018-04-18 22_27_57-.png
2018-04-18 22_25_54-.png
2018-04-18 22_23_23-Enhance Scanned Pages.png

Re: Enhance Scanned Pages

Posted: Thu Apr 19, 2018 5:38 am
by Sasha - Tracker Dev Team
Hello Arnold,

Can we have a copy of this file/page, so that we can recreate this behavior?

Cheers,
Alex

Re: Enhance Scanned Pages

Posted: Thu Apr 19, 2018 7:24 am
by Arnold
I will email you both files, the one that doubles in size and the one where the text gets crazy looking.

Re: Enhance Scanned Pages

Posted: Thu Apr 19, 2018 7:50 am
by Sasha - Tracker Dev Team
Hello Arnold,

Please mail me directly at polaringu@tracker-software.com

Cheers,
Alex

Re: Enhance Scanned Pages

Posted: Thu Apr 19, 2018 8:01 am
by Arnold
I just emailed you the files directly with a note. I tried deskewing the file that doubled in size in Windows XP 32 bit on a Windows 7 64 bit machine, and the file quadrupled to 4.37 meg.

Re: Enhance Scanned Pages

Posted: Thu Apr 19, 2018 8:06 am
by Sasha - Tracker Dev Team
Hello Arnold,

Thanks, I will forward that to the developer who implemented this directly.

Cheers,
Alex

Re: Enhance Scanned Pages

Posted: Thu Apr 19, 2018 2:58 pm
by David.P
No offense meant, but this is one of the very few fields where the former market leader still excels with their sensational OCR function (which also perfectly deskews scanned pages):
Ad*be's Super Resolution Vector OCR

Re: Enhance Scanned Pages

Posted: Fri Apr 20, 2018 12:27 am
by Patrick-Tracker Supp
Hi David,
Thank you for illustrating this for us. We will, as always, endeavor to improve. As Sasha mentioned, the developer directly responsible has been made aware and is looking for a solution.

Re: Enhance Scanned Pages

Posted: Sun Apr 22, 2018 9:09 am
by Ovg
I have a similar issue:
 


       Before                      After
Untitled.png
Settings:
Capture.PNG
But the quality of the OCR is much better compared to the old method.

Re: Enhance Scanned Pages

Posted: Mon Apr 23, 2018 12:44 pm
by Tracker Supp-Stefan
Thanks for the sample OVG,

Passing it along to the devs working on the tool for consideration!

Regards,
Stefan

Re: Enhance Scanned Pages

Posted: Mon Apr 23, 2018 1:07 pm
by Ovg
Thank you, Stefan!
I can send the file in question if you need it.

Re: Enhance Scanned Pages

Posted: Mon Apr 23, 2018 1:42 pm
by Tracker Supp-Stefan
I thought the image you included above is of the original scan?
It is quite a high quality image so is the Enhancement necessary at all for that one?

Regards,
Stefan

Re: Enhance Scanned Pages

Posted: Mon Apr 23, 2018 1:55 pm
by Ovg
This isn't a scan - it is a screenshot of the PDF file before and after running the new OCR method.
The new OCR method is MUCH more accurate than the older one, so I tried running OCR on an existing PDF file.
Sorry for confusing you!

Re: Enhance Scanned Pages

Posted: Mon Apr 23, 2018 2:14 pm
by Sasha - Tracker Dev Team
Hello guys,

We've been investigating this matter a little and here are the investigation results:
1) As for Arnold's file: it already contains text, and unfortunately that text is white. In the original file the text sits below the image, but after the enhancement it ends up on the top level and becomes visible, which is why the page looks like that.
Also, in that file each word is already deskewed individually (each word on its own rather than the table as a whole).

2) As for OVG's file: the image is rasterized, and there is a cap on the rasterization DPI. We suspect that the DPI of the original image was much higher than that cap, which is why the text looks like that after the enhancement process.

Cheers,
Alex

Re: Enhance Scanned Pages

Posted: Mon Apr 23, 2018 2:25 pm
by Ovg

Re: Enhance Scanned Pages

Posted: Mon Apr 23, 2018 4:19 pm
by Tracker Supp-Stefan
Hi all,

And I've just created a ticket for the problem Arnold experienced at the start of this topic with his specific file:
#4332: Editor 325: Enhance Scanned Pages issues with a file that contains white text already
Please note that my colleagues told me that this will be a long term one, as there are more pressing matters to attend to first, and this one will need serious logic redesign that will take time!

Regards,
Stefan

Re: Enhance Scanned Pages

Posted: Mon Apr 23, 2018 5:18 pm
by Arnold
Please don't forget about the file I sent you where the file size doubled or quadrupled depending on the version of Windows. For me, that is more of a problem than the white text one. Many of the files that I deskew end up in post job submittal packages that I then email out to clients. So the size of documents that I need to include is always a concern.

Thanks.

Re: Enhance Scanned Pages

Posted: Mon Apr 23, 2018 8:08 pm
by TrackerSupp-Daniel
Hello Arnold,
Don't worry, we haven't forgotten you, the team is working on it.
Have a great day!

Re: Enhance Scanned Pages

Posted: Tue Apr 24, 2018 2:25 pm
by Timur Born
David.P wrote:No offense meant, but this is one of the very few fields where the former market leader still excels with their sensational OCR function (which also perfectly deskews scanned pages):
Even better, they turn scanned text into fonts, which means that you can edit a scanned document using the "original" font. Said market leader's OCR even can properly scan round stamps/seals. They are really in their own league, even compared to "specialized" competitors like Nuance. Still the monthly fee is too high for the single product.

Re: Enhance Scanned Pages

Posted: Tue Apr 24, 2018 3:27 pm
by David.P
Timur Born wrote:Even better, they turn scanned text into fonts...
Yeah, that's what I meant to say with "Vector OCR".

Timur Born wrote:They are really in their own league [OCR-wise], even compared to "specialized" competitors like Nuance
True. Particularly also note the Super Resolution part (i.e. how they make almost perfect, smooth font characters out of ugly skewed, pixelated faxes and the like).

Timur Born wrote:...which means that you can edit a scanned document using the "original" font.
Shhh... Editing ClearScan-OCR'ed text is not possible with Ad*be Acr*bat itself (for understandable reasons of legal document security), and it probably should not become widely known that you can "freely edit the text of scanned documents" this way -- using PDF-XChange Editor :o

Re: Enhance Scanned Pages

Posted: Tue Apr 24, 2018 6:47 pm
by Paul - Tracker Supp
Hi guys,

I don't think there is any legal reason for us to prevent editing these documents. They are a PDF and we offer a tool to Edit PDF's. We honour the security specifications of the format. I don't think there is anything wrong here.

:-)

Re: Enhance Scanned Pages

Posted: Mon Apr 30, 2018 12:23 am
by winstar88
I have the same problem. After using the Enhance Scanned Pages feature on a two-page scanned PDF file (264k) with only Deskew enabled and everything else off (i.e. without OCR), the size of the resulting file increased to 695k.
I would expect the file size to stay close to that of the original file.

Re: Enhance Scanned Pages

Posted: Mon Apr 30, 2018 4:50 pm
by TrackerSupp-Daniel
Hello winstar,
We are looking for any input on this process we can get. Could I ask you to send us a file this happens with (before and after), please? Thank you!
You can add them as attachments here, or if it is confidential you could also send it via email to us at support@pdf-xchange.com

Have a great day!

Re: Enhance Scanned Pages

Posted: Tue May 01, 2018 5:26 am
by winstar88
xx1_1.pdf is the original file, and xx1_3.pdf is the file after using Enhance Scanned Pages.

Re: Enhance Scanned Pages

Posted: Tue May 01, 2018 12:36 pm
by Tracker Supp-Stefan
Thanks for those files winstar88,

I've notified Dan you've shared them and he will take them and continue the investigation when he gets to the office a bit later today.

Cheers,
Stefan

Re: Enhance Scanned Pages

Posted: Wed May 02, 2018 5:37 pm
by David.P
Paul - Tracker Supp wrote:I don't think there is any legal reason for us to prevent editing these documents. They are a PDF and we offer a tool to Edit PDF's. We honour the security specifications of the format. I don't think there is anything wrong here.
Sorry -- I didn't mean to imply that it could be illegal to edit ClearScan OCR'ed files.

I was only going to say that as far as Ad○be Acr○bat goes, it is not possible to edit the text in such files. This is probably because Adobe has set some hidden or unofficial flag that prevents their own software from editing text that has been created by ClearScan OCR.

The reason why they decided to prevent editing of ClearScan-created text probably is that they wanted the document to look and behave exactly as a normal scanned (and OCR'ed) bitmap document -- although the former almost has nothing in common with the latter.

ClearScan OCR with its ability to create high-quality, real fonts from scanned text is so out of league that otherwise (had they not artificially prevented it from editing) it probably would have been banned by the authorities as a permitted format for document management and storage.

Keep up the great work
Best regards
David
:)

Re: Enhance Scanned Pages

Posted: Wed May 02, 2018 11:51 pm
by TrackerSupp-Daniel
Hi everyone,
As for Winstar88's files - We have reproduced the issue and created an internal development ticket about this, hopefully we will receive some feedback on it soon, but we can make no guarantees about timelines at the moment.

Thank you for your patience and understanding!

Re: Enhance Scanned Pages

Posted: Sat Jun 09, 2018 9:43 am
by Sasha - Tracker Dev Team
Hello winstar88,

The Enhance Scanned Pages feature is used for updating the image quality and visual representation either for OCR or overall better look. It does not lower the resulting file size. To lower the size of the file, use the Optimize PDF feature:
https://www.pdf-xchange.com/knowle ... cts-create
Or you can use the Recompress Image file to optimize only the currently selected image content item (via the Edit Content tool or by clicking on it with the Select Text tool).

Cheers,
Alex

Re: Enhance Scanned Pages

Posted: Wed Oct 03, 2018 7:29 am
by winstar88
I just tried the new version 7.327 on the file (see #24, posted before). The file saved after Deskew only, without OCR, is still significantly larger than the original file.
The problem is still not solved. I would expect the file size not to change much.

Re: Enhance Scanned Pages

Posted: Wed Oct 03, 2018 1:50 pm
by Tracker Supp-Stefan
Hello winstar88,

The "Deskew" element of the "Enhance Scanned Pages" method removes the original images from the page, and replaces them with a new one. During this process we do create a new image with different compression and settings, and unfortunately there's no way to turn this off currently.

If you manually select the image in the content pane, and then rotate it using the transform tools (I needed about -0.7°) - then this rotates the image already existing in the file, and does not recreate it - so as such the file size remains the same. I know this is not the ideal solution but is something you can try while I check with our devs if we can do any further improvements on the Deskew element of "Enhance scanned pages".
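For anyone curious why the manual rotation leaves the size alone, here is a minimal sketch of the general PDF mechanism, not of the Editor's actual code. In PDF, an image is placed on the page through a transformation matrix (the `cm` operator in the content stream), so a rotation can be expressed by changing six numbers while the compressed image data stays untouched. That Transform works exactly this way internally is an assumption, and the helper names `rotation_matrix` and `cm_operator` are made up for illustration.

```python
import math


def rotation_matrix(degrees):
    """Return the six PDF matrix entries (a, b, c, d, e, f) for a rotation about the origin."""
    t = math.radians(degrees)
    return (math.cos(t), math.sin(t), -math.sin(t), math.cos(t), 0.0, 0.0)


def cm_operator(matrix):
    """Format a matrix as a content-stream `cm` operator (hypothetical helper)."""
    return " ".join(f"{v:.6f}" for v in matrix) + " cm"


# The angle from the post above: about -0.7 degrees.
matrix = rotation_matrix(-0.7)
operator = cm_operator(matrix)
print(operator)
```

A real placement matrix would also carry the image's scale and position, and rotating about the image centre needs non-zero translation entries, but the point is the same: only the matrix changes, not the pixels.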

Regards,
Stefan

Re: Enhance Scanned Pages

Posted: Wed Oct 03, 2018 2:21 pm
by Arnold
The Transform works brilliantly but it sure is time consuming as it has to be done page by page AFAIK. The Enhanced Scanned Pages feature could be so, so useful if it could deskew pages in a pdf like the Transform feature (ie no file size increase). I regularly receive pdf's that are scanned very poorly.

Re: Enhance Scanned Pages

Posted: Wed Oct 03, 2018 3:46 pm
by Tracker Supp-Stefan
Hi Arnold,

Yes - Transforming each page's images one by one will be time consuming. I am offering it just as a temporary workaround while we discuss the options for the Enhance tool.

Regards,
Stefan

Re: Enhance Scanned Pages

Posted: Wed Oct 17, 2018 7:48 pm
by Arnold
In the last couple of days I noticed that Transform changed the look of a few skewed pages I received. This is something that I did not think happened. I scanned in the attached pdf using the Editor, and then used the Transform feature on it as a test. The text in the rotated document definitely looks degraded to me. Have I changed a setting somewhere that is causing this, or did I just not notice it before? Using 327.0.

Thanks.

Re: Enhance Scanned Pages

Posted: Thu Oct 18, 2018 12:15 am
by TrackerSupp-Daniel
Hi Arnold,
In cases like this, remember that a scanner captures the page pixel by pixel (at a given dpi).
This means that when the document arrives, each point should appear as a nearly perfect square:
DMKB18October523.png
Because of this, no smoothing is applied to the original scan, but once it is rotated, the edge of each letter becomes a series of slight slants instead of a clean cut.
If you look closely, you can see that at 1600% zoom the Transformed image is a bit fuzzier than the original scan.

You can turn on/off image smoothing from the Preferences, under Page Display > rendering.
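A rough way to see where the fuzziness comes from is the sketch below. It assumes plain linear interpolation, which may not be the filter the Editor's renderer really uses, and it treats a small rotation as a sideways displacement of each pixel row by a fraction of a pixel. The function name `resample_row` and the sample values are only illustrative.

```python
import math


def resample_row(row, shift):
    """Linearly interpolate a row of pixel values at positions x + shift (edges clamped)."""
    out = []
    last = len(row) - 1
    for x in range(len(row)):
        pos = x + shift
        i = math.floor(pos)
        frac = pos - i
        left = row[min(max(i, 0), last)]
        right = row[min(max(i + 1, 0), last)]
        out.append(round(left * (1 - frac) + right * frac))
    return out


# A hard black-to-white edge as the scanner delivers it (0 = black, 255 = white).
edge = [0, 0, 0, 0, 255, 255, 255, 255]

# For a small angle, row y is displaced sideways by roughly y * tan(angle).
slope = math.tan(math.radians(0.7))
rows = [resample_row(edge, y * slope) for y in (0, 20, 40)]
for r in rows:
    print(r)
```

Row 0 keeps the clean edge, while the lower rows pick up an in-between gray value at the edge pixel. With smoothing on, those grays are what reads as fuzziness; with smoothing off, each pixel snaps to black or white and the strokes look thinner and stair-stepped.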

Re: Enhance Scanned Pages

Posted: Thu Oct 18, 2018 1:52 am
by Arnold
I tried turning off smoothing, but it makes the Transformed pdf look very thin and unpleasant looking (see attached). It even makes the original scanned document look like that.

My thinking was that since a scan was basically a picture, then using Transform in this manner was simply rotating a picture. Obviously there is more to this than I figured, just like with the Text Hinting thing.

Thanks.

Re: Enhance Scanned Pages

Posted: Sat Nov 24, 2018 11:12 pm
by Eymann
Hello everybody

We have noticed a problem in the "Enhance Scanned Pages" tool.
It occurs when a scanned PDF document that contains some rotated pages is optimized and converted to the PDF/A standard.

Attached are the original PDF and the optimized PDF, 11 pages in total, some in portrait and some in landscape format.
After optimizing and converting to PDF/A-2b, the pages appear cropped. This is probably a bug that nobody has noticed yet?
Maybe something for the development department ...?

The problem is somewhere in the Tool "Enhance Scanned Pages"

Attached is a screenshot of the settings we use.

The optimized file "PDF Tools-2018-0103 Einsprache Jost Rüegg.pdf" and the original file "Original-2018-0103 Einsprache Jost Rüegg.pdf" are attached as well.

We are using version 7.0 Build 327.1.

Cheers Ueli

Re: Enhance Scanned Pages

Posted: Mon Nov 26, 2018 3:01 pm
by Sasha - Tracker Dev Team
Hello Ueli,

This is fixed now, and the fix will be available in the next build (328), which should be released in a month or so.

Cheers,
Alex

Re: Enhance Scanned Pages

Posted: Mon Nov 26, 2018 3:28 pm
by Eymann
Hello Alex
Thanks for the feedback, I will now inform my staff that it will be fixed (Build 328) at the next update.

I also hope that the other small problem with the validations of PDF standards will be resolved. --> viewtopic.php?t=31720

I stay now and wish you a successful day!
Thanks again for the quick help!

Cheers Ueli

Re: Enhance Scanned Pages

Posted: Mon Nov 26, 2018 6:46 pm
by TrackerSupp-Daniel
Hello Eymann,

I cannot make a promise for a timeline, but those fixes are certainly coming in the future.

Regards,

Re: Enhance Scanned Pages

Posted: Wed Mar 13, 2019 8:22 pm
by Arnold
Over here, using this feature still causes the file size to increase quite a bit, even if it is only deskewing the page. As I mentioned (I believe) in my 1st post, if I manually deskew a page using Transform, there is no increase in file size. Please see below. Do you guys get the same results over there?
2019-03-13 16_11_36-C__Documents and Settings_Arnold_My Documents_Downloads - FreeCommander XE.png
Thanks.

Re: Enhance Scanned Pages

Posted: Sat Mar 16, 2019 1:58 am
by winstar88
I also tested the latest version v7.0.328.2. The issue (size increased after deskew) is still there.

Re: Enhance Scanned Pages

Posted: Mon Mar 18, 2019 8:44 pm
by TrackerSupp-Daniel
Hello All,

I still see this issue as well and have brought it back to the Dev team for further action. Apologies for the delay in this.

With the New OCR engine coming out in Version 8 we are planning to make a number of changes to how this all works, so hopefully they can bundle it into this release, but I cannot make any promises at the moment.

Kind regards.

Re: Enhance Scanned Pages

Posted: Sat Apr 06, 2019 5:32 am
by winstar88
It seems that the latest version 8.0.330 does not solve the problem (the file still gets larger after Deskew).

Re: Enhance Scanned Pages

Posted: Mon Apr 08, 2019 12:21 pm
by Tracker Supp-Stefan
Hello winstar88,

Are you using the new enhanced OCR tool? If so, can we have a screenshot of your settings and the sample file on which you are trying to perform the OCR, so that we can try to replicate this on our end?

Regards,
Stefan

Re: Enhance Scanned Pages

Posted: Thu Apr 11, 2019 7:14 am
by winstar88
To site admin,
The file (Wastewater Engineering - Treatment and Resource Recovery _2014_1_1.pdf) in my previous post of this topic can be used for a test.

Re: Enhance Scanned Pages

Posted: Thu Apr 11, 2019 5:11 pm
by TrackerSupp-Daniel
Hello winstar,
In testing with your sample files above, I found that running the new Enhanced OCR reduced the file size significantly, dropping from 254kb to 35kb with the following settings enabled:
image.png
Can I ask you to please send us a screenshot of the exact settings you have selected in the OCR dialog to cause this increase in file size you speak of?

Kind regards.

Re: Enhance Scanned Pages

Posted: Fri Apr 12, 2019 12:22 am
by Arnold
I tried the same thing but changed the Output setting to Searchable Image which I prefer. Size stayed about the same (see below). So definite improvement over what happens with the Enhance Scanned Pages feature. It does not appear possible to use this Deskew feature without using OCR also though. Can an option be added to allow Deskew and Correct Rotation without OCR? If not, is there still a plan to fix the file bloat problem with Deskew in Enhance Scanned Pages?

Thanks.

2019-04-11 20_17_04-C__Documents and Settings_Arnold_My Documents_Downloads - FreeCommander XE.png

Re: Enhance Scanned Pages

Posted: Fri Apr 12, 2019 8:25 am
by winstar88
Please see the size of files (2014_1_1 original file 254kb; 2014_1_3 is the file after deskew 703kb).

Re: Enhance Scanned Pages

Posted: Fri Apr 12, 2019 9:06 pm
by TrackerSupp-Daniel
Hello Winstar88 and Arnold,

Thank you for the feedback, and my apologies. It had been a while since I looked at this, and somehow I forgot that this topic was centered on the Enhance Scanned Pages feature, not the regular OCR feature. After testing again I have reproduced this yet again and, to my surprise, with the same input file as was originally in the ticket, the output size is even larger than before!

I have once again brought this to the dev team to emphasize that this is an issue that needs to be addressed.
Previously, I received the reply that an increase in file size from the deskew process is expected: to deskew an image, we need to decompress it, rotate it, and then recompress it. Because of minute differences in the uncompressed image, the recompression can produce widely varying results - sometimes very nice space savings, other times not so much...
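To make that concrete, here is a small sketch of the decompress, rotate, recompress effect on synthetic data. It is not the Editor's pipeline: `zlib` stands in for whatever codec is actually used, the striped "scan" is invented, and the small rotation is approximated by shifting each row sideways with linear interpolation. All names in it are hypothetical.

```python
import math
import zlib

WIDTH, HEIGHT = 64, 64


def stripe_row():
    """One row of a clean bilevel 'scan': 8-pixel black and white stripes."""
    return [0 if (x // 8) % 2 == 0 else 255 for x in range(WIDTH)]


def shifted_row(row, shift):
    """Resample a row at x + shift (shift >= 0) with linear interpolation."""
    out = []
    last = len(row) - 1
    for x in range(len(row)):
        pos = x + shift
        i = math.floor(pos)
        frac = pos - i
        left = row[min(i, last)]
        right = row[min(i + 1, last)]
        out.append(round(left * (1 - frac) + right * frac))
    return out


base = stripe_row()

# "Decompressed" original: every row is identical, so it compresses extremely well.
original = bytes(v for _ in range(HEIGHT) for v in base)

# "Rotated" copy: each row is displaced by a slightly different sub-pixel amount.
slope = math.tan(math.radians(0.7))
deskewed = bytes(v for y in range(HEIGHT) for v in shifted_row(base, y * slope))

original_size = len(zlib.compress(original, 9))
deskewed_size = len(zlib.compress(deskewed, 9))
print(original_size, deskewed_size)
```

Both buffers hold the same number of pixels, yet the resampled one compresses worse, because its edge pixels now carry gray values that differ from row to row. Real scans and real codecs will behave differently in detail, which fits the widely varying results described above.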

With all that said, there should at the very least not be any difference between a few months ago and now when running the same process on the same file. So we should still be able to improve this.

Kind regards,

Re: Enhance Scanned Pages

Posted: Fri Apr 12, 2019 10:29 pm
by Arnold
I used version 8 to create a deskewed version of the original file and it was only 263 kb. See post dated 04/12/19. Winstar88 then used the same file and came up with file size of 703 kb using Enhance Scanned Pages. Wouldn't the deskew process in either feature need to decompress the image, rotate, and then recompress it?

The difference in file sizes is pretty large. If they are using different methods, can the Enhanced OCR feature have a setting to deskew and/or correct rotation without OCR then? I would use it instead of Enhance Scanned Pages. The truth of the matter is that I do not use Enhance Scanned Pages anyway due to the file size bloat. I use Transform and rotate the image manually to keep the file size about the same.

Thanks.