Strange output in resulting pdf

PDF-X OCR SDK is a new product from us, intended to complement our existing PDF and imaging tools and to provide developers with an expanding set of professional tools for Optical Character Recognition tasks.

joost
User
Posts: 7
Joined: Mon Dec 05, 2011 9:16 pm

Strange output in resulting pdf

Post by joost »

Hi there,

When OCR-ing a PDF, the resulting (incorrect) text is as follows:

ľŏōŒŘœŝōŒ ŞŏŝŞŚŖŋŘ
ĶřŋŎ ŋŘŎ ĽŞŜŏŝŝ ľŏŝŞ Đ
ĽţŝŞŏŗ ijŘŞŏőŜŋŞœřŘ ľŏŝŞ
etc. etc.

This happens when I select Dutch as the language, and the result is not Dutch at all. If I use English, the resulting PDF is OK. I downloaded and am using the complete language pack. Can I solve this?
Walter-Tracker Supp
User
Posts: 381
Joined: Mon Jun 13, 2011 5:10 pm

Re: Strange output in resulting pdf

Post by Walter-Tracker Supp »

We will investigate this immediately. If it is not confidential, could you send us the PDF you are working with? You can send it to support@pdf-xchange.com, with "attention: Walter" in the subject. Otherwise we will do our best to reproduce with other PDF inputs.

-Walter
joost

Re: Strange output in resulting pdf

Post by joost »

Thanks for the quick response, Walter. I've mailed you an example PDF.
Walter-Tracker Supp

Re: Strange output in resulting pdf

Post by Walter-Tracker Supp »

I have been unable to reproduce the problem with the provided sample PDF. I wonder if it is a Unicode vs. ASCII issue? The text in the PDF should be UTF-8 encoded. It is conceivable that whatever method you are using to extract the text layer from the PDF is using an 8-bit (extended ASCII) encoding.
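
To illustrate the kind of mismatch suspected here, the following is a minimal Python sketch. It is a generic illustration only, not the SDK's or any viewer's extraction code, and the sample string is a made-up Dutch phrase. It shows what happens when UTF-8 bytes are read back with an 8-bit code page (Latin-1 is assumed here as a stand-in for "some 8-bit encoding").

```python
# Generic illustration of an encoding mismatch; not the SDK's extraction code.
original = "geïntegreerde één"  # hypothetical Dutch sample with accented letters

raw = original.encode("utf-8")   # bytes as a UTF-8 text layer would hold them
wrong = raw.decode("latin-1")    # misread with an 8-bit code page (assumed: Latin-1)
right = raw.decode("utf-8")      # read back with the matching encoding

print(wrong)  # accented letters turn into two-character sequences
print(right)  # identical to the original string

# Plain ASCII letters are encoded the same way in both, so they are unaffected.
ascii_roundtrip = "test".encode("utf-8").decode("latin-1")
print(ascii_roundtrip)
```

With this kind of mismatch only the accented letters are damaged, and unaccented words such as "test" come through unchanged, so it would not by itself turn every letter of a line into a different character.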
Walter-Tracker Supp

Re: Strange output in resulting pdf

Post by Walter-Tracker Supp »

Hi Joost,

We have diagnosed this and will provide a fix shortly. It had to do with a small bug in the PDF-level Unicode / character encodings, which caused problems with certain PDF viewers.

A new PDF-X OCR DLL that fixes this (DLL version 1.0.5) will be up by the end of the working week.

As always we really appreciate you bringing this to our attention!

-Walter
Walter-Tracker Supp

Re: Strange output in resulting pdf

Post by Walter-Tracker Supp »

Hi Joost,

The new version has been built and it resolves this issue with Dutch language files (and greatly improves memory consumption!). I understand that you are a Clarion developer, so you will need to wait for the Clarion build to be available, which will be very soon.

-Walter
joost

Re: Strange output in resulting pdf

Post by joost »

Hi Walter, I'm very pleased with how quickly you managed to resolve this issue. I'm not a Clarion developer. Can I already download this new version somewhere? The "live version" in the downloads is from last month.
Walter-Tracker Supp

Re: Strange output in resulting pdf

Post by Walter-Tracker Supp »

I will contact you via email shortly!

-Walter