get chinese pages in conversion

This Forum is for the use of End Users requiring help and assistance for Tracker Software's PDF-Tools Version 4

sinecure
User
Posts: 15
Joined: Thu Nov 22, 2012 7:11 pm

get chinese pages in conversion

Post by sinecure » Tue Apr 07, 2015 4:59 pm

Here is a weird one. It may not be a Tracker issue at all, but I thought I would ask whether anyone has seen this before and whether there is a solution, or whether some software should be changed.
I scanned a 33-page document with an Epson WF-3640, which produced a 15 MB PDF. I then used PDF-Tools 4 (latest version) to convert it to a Word .doc file. The conversion seems to have worked, but it also inserted all the pages in Chinese. Chinese certainly does not appear anywhere in the original. I tried compressing the PDF using http://smallpdf.com/compress-pdf and converted it again, with the same result. I will have to go through and delete each of the Chinese pages to use the Word document.
Does this make any sense to anyone?
I have attached the smaller compressed version, as the original is 15 MB, as is, regrettably, the resulting Word document.
Attachments
HighpointCurrent.compressed.pdf
(4.95 MiB)

Will - Tracker Supp
Site Admin
Posts: 6816
Joined: Mon Oct 15, 2012 9:21 pm
Location: London, UK

Re: get chinese pages in conversion

Post by Will - Tracker Supp » Tue Apr 07, 2015 5:27 pm

Hi sinecure,

Thanks for the post. The document that you uploaded has an OCR layer (an invisible text layer) placed on top of the page, and this appears to be the source of the Chinese characters. Did you run our OCR on the document, or was it sent to you like this?
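As a rough way to confirm that a hidden text layer is the culprit, the text extracted from the PDF can be checked for Chinese characters. The sketch below is only an illustration, not anything PDF-Tools does internally: it assumes the text layer has already been pulled out to a plain string by some other tool, and the names `CJK_RANGES`, `cjk_ratio`, `looks_misrecognized` and the 0.3 threshold are all hypothetical choices.

```python
# Hypothetical helper: estimate how much of an extracted text layer is CJK.
# Assumes `text` was already extracted from the PDF by some other tool.

# A few common CJK ideograph blocks (not an exhaustive list).
CJK_RANGES = (
    (0x3400, 0x4DBF),  # CJK Unified Ideographs Extension A
    (0x4E00, 0x9FFF),  # CJK Unified Ideographs
    (0xF900, 0xFAFF),  # CJK Compatibility Ideographs
)


def is_cjk(ch: str) -> bool:
    """Return True if the character falls in one of the listed CJK blocks."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in CJK_RANGES)


def cjk_ratio(text: str) -> float:
    """Fraction of non-whitespace characters that are CJK ideographs."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    return sum(is_cjk(c) for c in chars) / len(chars)


def looks_misrecognized(text: str, threshold: float = 0.3) -> bool:
    """Flag text whose CJK share meets an arbitrary, assumed threshold."""
    return cjk_ratio(text) >= threshold
```

If a page of an English-only scan comes back with a high ratio, the text layer was most likely produced by OCR run with the wrong language setting.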

Cheers,
If posting files to this forum, you must archive the files to a ZIP, RAR or 7z file or they will not be uploaded.
Thank you.

Best regards

Will Travaglini
Tracker Support (Europe)
Tracker Software Products Ltd.
http://www.tracker-software.com

sinecure
User
Posts: 15
Joined: Thu Nov 22, 2012 7:11 pm

Re: get chinese pages in conversion

Post by sinecure » Tue Apr 07, 2015 5:52 pm

What I did was scan to PDF using the Epson Scan program, which produced the .pdf. I then used the convert-to-.doc tool, not the OCR tool.

I have now re-saved the Epson-produced PDF with Tracker to create a new PDF. I ran OCR on it using Tracker, and the result is a good, OCR'ed PDF. I then tried to convert that to a .doc with Tracker. It worked, but with all sorts of anomalies: a line or two of Chinese here and there, and some strange spacing.
I think you hit the nail on the head when you found a hidden layer of Chinese in the original PDF, which prevents Tracker from being able to do much.
I'm going to ask Epson, but if you have any more wisdom I would relish it.

Will - Tracker Supp
Site Admin
Posts: 6816
Joined: Mon Oct 15, 2012 9:21 pm
Location: London, UK

Re: get chinese pages in conversion

Post by Will - Tracker Supp » Tue Apr 07, 2015 6:27 pm

Hi sinecure,

If you're using the Epson Scan software to scan this, then it likely has a post-processing option that automatically runs OCR on scanned documents. I would try turning this off (I can't advise on how, as I'm not familiar with the software). Doing this should stop the Chinese characters from being added, though the Word document will then contain only images, as there would no longer be any text layer.

HTH!

sinecure
User
Posts: 15
Joined: Thu Nov 22, 2012 7:11 pm

Re: get chinese pages in conversion

Post by sinecure » Tue Apr 07, 2015 7:32 pm

Thanks for all this. I have now gone to Epson for their thoughts. If and when I get something worth reporting I will post it here, but for the moment forget about this dumb thing.

Will - Tracker Supp
Site Admin
Posts: 6816
Joined: Mon Oct 15, 2012 9:21 pm
Location: London, UK

Re: get chinese pages in conversion

Post by Will - Tracker Supp » Tue Apr 07, 2015 7:47 pm

Not a problem, do keep us posted!
