PDF-XChange Editor OCR Image for Chinese is not working

Posted: Fri Jan 26, 2018 4:21 am
by skycats
PDF-XChange Editor
Version: 7.0.323.2
Download: Zip Installer (32/64 bit) and OCR Chinese Languages Pack

When I use OCR Image, it produces strange characters, like this:
1.png
This is the result when I copy the text:
2.png
Test image:
Test Image.png

Re: PDF-XChange Editor OCR Image for Chinese is not working

Posted: Fri Jan 26, 2018 1:34 pm
by Tracker Supp-Stefan
Hello skycats,

Thanks for your post and the sample file.
While the image quality is not ideal, I managed to get this result (for the first three lines of the main text):
本书的写柞缂起几年前找学习 WPFn 固为找是从 Windows Forms 开发转来做 wPF 开发的. 学习过程中
遇到帷多新概念 新特性, 其中包括 Data Binding. 路由鬟伴 命令~ 各种模板等- 我的工作风格足对于每个
新知识. 一定先把它理孵进伽 涮丑白再应用于项目中. 不然总感觉仗用起未不放心1 于是就对照已有的英

I can't quite tell if it's the correct result - but to me it looks almost as if it is.

I used "Medium" quality and "Preserve original content and add text layer on top".

Regards,
Stefan

Re: PDF-XChange Editor OCR Image for Chinese is not working

Posted: Fri Jan 26, 2018 6:51 pm
by skycats
Thank you for your reply.
After seeing your results, I ran some tests.
In the end, I found that this is a bug.

I created a PDF file by printing from another tool; see:
test.pdf
(The text is above; a screenshot of it is below.)

When I use OCR Pages, it works fine:
TIM截图20180127024630.png

OCR result:

我是测试文本0 我是测试文本O 我是测试文本〇
I am test text. I am test text. I am test text.

When I use OCR Image, it does not work:
TIM截图20180127024921.png

OCR result:

332%iflfliitfidio fi%ifilflifiilfio fiEifilflifiY$o
I am test text. I am test text. I am test text.

These strange characters look like a character encoding error.
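
One rough way to probe the encoding-error idea from outside the program is to check whether the garbled line can be reproduced by encoding the expected text one way and decoding it another. The Python sketch below is only an illustration: the helper name `mojibake_candidates` and the list of encodings are assumptions made for this test and say nothing about how PDF-XChange Editor works internally.

```python
# Expected and observed first lines, copied from the two OCR results above.
expected = "我是测试文本0 我是测试文本O 我是测试文本〇"
observed = "332%iflfliitfidio fi%ifilflifiilfio fiEifilflifiY$o"


def mojibake_candidates(text, garbled):
    """Return (encode_as, decode_as) pairs that turn `text` into `garbled`.

    Hypothetical helper: the encodings tried here are an assumed,
    non-exhaustive list of ones commonly involved in Chinese mojibake.
    """
    matches = []
    for enc in ("utf-8", "gbk", "gb18030", "utf-16-le", "utf-16-be"):
        raw = text.encode(enc, errors="replace")
        for wrong in ("latin-1", "cp1252"):
            # Decode the bytes with the "wrong" single-byte codec and compare.
            if raw.decode(wrong, errors="replace") == garbled:
                matches.append((enc, wrong))
    return matches


print(mojibake_candidates(expected, observed))  # prints []
print(observed.isascii())                       # prints True
```

None of the tried pairs reproduces the garbled line, and the garbled line is pure ASCII. UTF-8 and GBK both encode every Chinese character with at least one byte of 0x80 or above, so a plain mis-decoding of those bytes would normally leave non-ASCII characters behind. This does not rule out every encoding problem, but it suggests the cause may lie in the recognition step rather than in the character encoding.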

Re: PDF-XChange Editor OCR Image for Chinese is not working

Posted: Fri Jan 26, 2018 7:55 pm
by TrackerSupp-Daniel
By any chance, are you using English as the OCR language? If so, you can download additional OCR languages here:
https://www.pdf-xchange.com/pdf-xchange-viewer-ocr
I believe this is the most likely cause, as my test with English as the selected language produced results similar to yours:
xxasaq?wm?m?Fméiia wm Eli‘J-?iAk Windows Forms afii?usk WPF ?ié?. Ermi?'?
i?f'HtlifTM-?r. ??lk’i. ?q’?ie‘; Data Binding. 33-113344'. ¢¢~ ?-‘?'?i‘iyf- ??l?'m?x??f?“?
1154mm. —-2;i4wazmmsm mullai?r?m—?m??v. mma?mtmn?km?ru, fiit??????‘li
xvi—NH" MSDNi?-‘??ib?‘i?i??. ?-????. ?xixk?ni??ik?r?i. rk?k?i?éal??. Lit
w—+mm.wmmgm?n??iizék?m?m???mhu?.k?????n?giim#&m.
?jhi—$f¥3??. fi?Tskvufriél-Eiuk?. ii$4$tr?z$¢mt?7Mll??liia?gf¥~—«iii/xii
:1: WPF».

Conversely, with the Chinese language pack installed:
????????????? WPFn ????? wmdows FOnns????? wPF ???. ?????
??????? ???, ???? Data Binding. ???? ??~ ?????- ???????????
???. ????????? ????????? ???????????? ?????????
???? MSDN????????? ?-???, ???????????? ????????? ??
????? ?????? ?????????????????? ???????????????
???????? ???????????? ????????????????????((???
? WPF».
Significantly fewer errors.

Re: PDF-XChange Editor OCR Image for Chinese is not working

Posted: Sat Jan 27, 2018 5:36 pm
by skycats
Please see post #3.
I have already installed the Chinese language pack.
You can download test.pdf to test with.
Try following my steps:
action1.gif
action2.gif

Re: PDF-XChange Editor OCR Image for Chinese is not working

Posted: Mon Jan 29, 2018 1:55 pm
by Tracker Supp-Stefan
Hi skycats,

Thanks for the report.
I have now managed to reproduce the issue and have created a ticket in our internal system so that our developers can investigate and fix it:
#4218: Editor 323.2: OCR image does not work the same as Document -> OCR Pages...

In the meantime, please use Document -> OCR Pages as a workaround.

Regards,
Stefan

Re: PDF-XChange Editor OCR Image for Chinese is not working

Posted: Mon Jan 29, 2018 3:11 pm
by Sasha - Tracker Dev Team
This has been fixed, and the fix will be available in the next release.

Cheers,
Alex