get chinese pages in conversion

Posted: Tue Apr 07, 2015 4:59 pm
by sinecure
Here is a weird one. It may not be a Tracker issue at all, but I thought I would ask whether anyone has seen such a thing before, and whether there is a solution or some software should be changed.
I scanned a 33-page document with an Epson WF-3640, which produced a 15 MB PDF. I then used PDF-Tools 4 (latest version) to convert it to a Word file (.doc). It seems to have done the job, but it also inserted all the pages in Chinese. Chinese certainly does not appear anywhere in the original. I tried compressing the PDF using http://smallpdf.com/compress-pdf and converted it again, with the same result. I will have to go through and delete each of the Chinese pages to use the Word doc.
Does this make any sense to anyone?
I have attached the smaller compressed version, as the original is 15 MB, and the resulting Word doc, which regrettably is also 15 MB.

Re: get chinese pages in conversion

Posted: Tue Apr 07, 2015 5:27 pm
by Will - Tracker Supp
Hi sinecure,

Thanks for the post - there is actually an OCR layer (an invisible text layer) placed on top of the page in the document that you uploaded, and it appears this is the source of the Chinese characters. Did you run our OCR on the document, or was it sent to you like this?
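
If it helps to check a file for this before converting, here is a minimal Python sketch (not anything built into our products) that flags pages whose extracted text is largely Chinese characters. The function names, the Unicode ranges chosen, and the 0.2 threshold are all illustrative assumptions. The optional `extract_page_texts` helper assumes the third-party pypdf package is installed and that its text extraction picks up the invisible OCR layer, which I would expect but have not verified on this file.

```python
# Code point ranges treated as "Chinese" for this check (an illustrative subset).
CJK_RANGES = (
    (0x3400, 0x4DBF),  # CJK Unified Ideographs Extension A
    (0x4E00, 0x9FFF),  # CJK Unified Ideographs
    (0xF900, 0xFAFF),  # CJK Compatibility Ideographs
)


def is_cjk(ch):
    """Return True if the character falls in one of the ranges above."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in CJK_RANGES)


def cjk_ratio(text):
    """Fraction of non-whitespace characters that are CJK ideographs."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    return sum(is_cjk(c) for c in chars) / len(chars)


def flag_pages(page_texts, threshold=0.2):
    """Return 1-based page numbers whose CJK ratio meets the threshold."""
    return [
        number
        for number, text in enumerate(page_texts, start=1)
        if cjk_ratio(text) >= threshold
    ]


def extract_page_texts(path):
    """Optional helper. Assumption: the pypdf package is installed."""
    from pypdf import PdfReader

    # extract_text() can return an empty result for image-only pages.
    return [page.extract_text() or "" for page in PdfReader(path).pages]
```

Calling `flag_pages(extract_page_texts("scan.pdf"))` (with a hypothetical file name) would list the pages worth a closer look. An English-only scan that reports many flagged pages probably carries a bad OCR layer.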

Cheers,

Re: get chinese pages in conversion

Posted: Tue Apr 07, 2015 5:52 pm
by sinecure
What I did was scan as PDF using the Epson scan program, which produced the .pdf. I then used the convert-to-.doc tool, not the OCR tool.

Now I have re-saved the Epson-produced PDF with Tracker to produce a new PDF. I OCR'ed it using Tracker, and the result is a good, OCR'ed PDF. I then tried to convert that to a .doc with Tracker, and it did so, but with all sorts of anomalies: a line or two of Chinese here and there, and some strange spacing.
I think you hit the nail on the head when you found there was a hidden layer of Chinese in the original PDF, which prevents Tracker from being able to do much.
I'm going to ask Epson but if you have any more wisdom I would relish it.

Re: get chinese pages in conversion

Posted: Tue Apr 07, 2015 6:27 pm
by Will - Tracker Supp
Hi sinecure,

If you're using the Epson Scan software to scan this, then it likely has a post-processing option to automatically OCR scanned documents. I would try turning this off (I can't advise on how, as I'm not familiar with the software). Doing this should stop the Chinese characters from being placed, though the Word document will then contain only images, as there would no longer be any text layer.

HTH!

Re: get chinese pages in conversion

Posted: Tue Apr 07, 2015 7:32 pm
by sinecure
Thanks for all this, but I have now gone to Epson for their thoughts. If and when I get something worth reporting I will post it here, but for the moment forget about this dumb thing.

Re: get chinese pages in conversion

Posted: Tue Apr 07, 2015 7:47 pm
by Will - Tracker Supp
Not a problem, do keep us posted!