What is this? When I copy this text, I don't get what I see on the page. I get seemingly random characters instead.
It is not an image, so I can't fix it with OCR. What should I do? How is it possible that the text displays correctly but is unreadable when I copy or edit it?
The Content pane shows that it really is text. How can I fix it? How can a file like this even exist?
Sample file: https://mo.ks.gov.ba/sites/mo.ks.gov.ba/files/zakon_o_osnovama_sigurnosti_saobracaja_na_putevima_u_bosni_i_hercegovini_glasnik_broj_6_06.pdf
For example, the word ZAKON, which is Bosnian for LAW and appears on the first page in large bold letters near the bottom left, is shown as yz{|}. All words come out as random characters in the Content pane or when copied, so search is useless. Yet somehow everything is displayed perfectly.
Text that is neither text nor image, but totally another text and symbols SOLVED
- TrackerSupp-Daniel
Re: Text that is neither text nor image, but totally another text and symbols SOLVED
Hello, asocialis
This indicates that the font embedded in the document was not embedded properly, so its glyph association data is corrupt. As a result, the font assigns character codes to the glyphs on screen (usually in order of appearance), so if you had a sentence that read:
"This is a test"
it may come out as:
"abcdecdefeagda"
This is not an issue within our software, but an issue within the document itself.
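To make the effect concrete, here is a minimal Python sketch of that "order of appearance" assignment. It is only an illustration, not how any particular PDF producer works: the function name `assign_codes`, the `fold_case` switch (needed so that "T" and "t" share one code, as in the example string above), and the starting code `"y"` (chosen to match the ZAKON sample from the first post) are all assumptions.

```python
def assign_codes(text, first_code="a", fold_case=False):
    """Give each distinct glyph the next free character code,
    in order of first appearance (hypothetical illustration)."""
    table = {}   # glyph -> assigned code
    out = []
    for ch in text:
        glyph = ch.lower() if fold_case else ch
        if glyph not in table:
            # Next unused code: first_code, then the one after it, and so on.
            table[glyph] = chr(ord(first_code) + len(table))
        out.append(table[glyph])
    return "".join(out), table


# The example sentence from this reply (assumes "T" and "t" share a code).
scrambled, _ = assign_codes("This is a test", fold_case=True)
print(scrambled)   # abcdecdefeagda

# The word from the sample file (assumes the codes start at "y").
zakon, zakon_table = assign_codes("ZAKON", first_code="y")
print(zakon)       # yz{|}

# With the table available, copied text can be mapped back to real letters.
reverse = {code: glyph for glyph, code in zakon_table.items()}
recovered = "".join(reverse[c] for c in zakon)
print(recovered)   # ZAKON
```

The viewer can still draw the right shapes because each code points at the right glyph outline. What is missing is the reverse table from codes back to real characters, which is why copy and search return the codes themselves.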
That said, OCR is certainly an option for resolving this. Open the OCR pages dialog and make sure you disable the option to ignore existing text. Then set the OCR to create editable text and images. It will scan the pages and replace the original text with properly formatted text that uses a real font.
Kind regards,
Dan McIntyre - Support Technician
Tracker Software Products (Canada) LTD
+++++++++++++++++++++++++++++++++++
Our Web site domain and email address has changed as of 26/10/2023.
https://www.pdf-xchange.com
Support@pdf-xchange.com