Does the PDF Format Support OpenType GSUB?
- Bhikkhu Pesala
- User
- Posts: 1776
- Joined: Tue May 29, 2007 9:29 am
- Location: East London
Does the PDF Format Support OpenType GSUB?
If I produce a PDF from Serif PagePlus X5 using an OpenType font with ligatures, a text stream such as f + f + i is replaced with the Alphabetic Presentation Forms character ﬃ (U+FB03). So if someone copies a word like "difficult" from the PDF file, what they get is the text string "diﬃcult", i.e. di(U+FB03)cult.
The OpenType font is embedded in the PDF file, so why isn't the text stream f+f+i used? Is this a limitation of the PDF format, or a limitation of PagePlus? How about PDF files with ligatures produced from InDesign or Word 2010?
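For anyone who wants to see what actually lands on the clipboard, here is a small Python check (the language is my choice; nothing in this thread uses code). It assumes the pasted string is exactly the one described above, with U+FB03 in place of f + f + i. The standard unicodedata module shows that this is a different, shorter string, and that Unicode records a compatibility decomposition of U+FB03 back to the letters f, f, i:

```python
import unicodedata

# Assumed clipboard contents after copying "difficult" from the PDF:
# the three letters f, f, i arrive as the single character U+FB03.
copied = "di\ufb03cult"
ligature = copied[2]

print(len(copied), len("difficult"))   # 7 versus 9 characters
print(f"U+{ord(ligature):04X}", unicodedata.name(ligature))
print(unicodedata.decomposition(ligature))  # <compat> 0066 0066 0069
print(copied == "difficult")           # the two strings are not equal
```

That compatibility mapping is what would make expanding the standard f-ligatures on copy feasible, whatever the PDF itself contains.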
Windows 10 Home 64-bit • AMD Ryzen 5 3400G, 8 Gb
Review: http://www.softerviews.org/PDF-XChange.html
- Vasyl-Tracker Dev Team
- Site Admin
- Posts: 2353
- Joined: Thu Jun 30, 2005 4:11 pm
- Location: Canada
Re: Does the PDF Format Support OpenType GSUB?
Hi, Bhikkhu.
A PDF can certainly contain ligatures. In future we could add a new option such as "Expand existing ligatures when copying the text"...
Is that what you are talking about?
Best
Regards.
Vasyl Yaremyn
Tracker Software Products
Project Developer
Please archive any files posted to a ZIP, 7z or RAR file or they will be removed and not posted.
- Bhikkhu Pesala
- User
- Posts: 1776
- Joined: Tue May 29, 2007 9:29 am
- Location: East London
Re: Does the PDF Format Support OpenType GSUB?
Vasyl-Tracker Dev Team wrote:
Hi, Bhikkhu.
A PDF can certainly contain ligatures. In future we could add a new option such as "Expand existing ligatures when copying the text"...
Is that what you are talking about?
Best
Regards.
That's not quite what I meant. I thought that, since the PDF contains the OpenType font, the original text string would already be used in the PDF. That doesn't seem to be the case, and perhaps it is not possible.
An option to expand the existing GSUBs back to their source text string when copying the text would indeed solve the problem. As it stands, my PDF files are not very useful for copying text from if I use ligatures or small capitals.
Ligatures: suﬃce
Small Capitals: E’ P
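A rough sketch of the requested "expand ligatures when copying" behaviour, in Python, assuming the copied text contains Unicode presentation-form ligatures such as U+FB03. The function name expand_ligatures is hypothetical and is not an existing PDF-XChange option. It applies NFKC normalization only to the Latin ligatures U+FB00 to U+FB06 and leaves every other character alone:

```python
import unicodedata

# Latin ligatures in the Alphabetic Presentation Forms block (U+FB00..U+FB06):
# ff, fi, fl, ffi, ffl, long s + t, st.
LATIN_LIGATURES = range(0xFB00, 0xFB07)


def expand_ligatures(text: str) -> str:
    """Hypothetical helper: replace presentation-form ligatures with their letters."""
    return "".join(
        # NFKC applies the compatibility decomposition, e.g. U+FB03 -> "ffi".
        unicodedata.normalize("NFKC", ch) if ord(ch) in LATIN_LIGATURES else ch
        for ch in text
    )


print(expand_ligatures("di\ufb03cult"))  # difficult
print(expand_ligatures("su\ufb03ce"))    # suffice
```

Normalizing the whole string with NFKC would also fold other compatibility characters, such as superscript digits or full-width letters, which is why the sketch limits itself to the ligature range. It would not help with the small-capital example, which is presumably not encoded as compatibility ligatures.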
- Vasyl-Tracker Dev Team
- Site Admin
- Posts: 2353
- Joined: Thu Jun 30, 2005 4:11 pm
- Location: Canada
Re: Does the PDF Format Support OpenType GSUB?
Hello, Bhikkhu.
Bhikkhu Pesala wrote:
I thought that, since the PDF contains the OpenType font, the original text string would already be used in the PDF.
It depends only on the program that creates the PDF. A PDF can contain either ligatures or clean (original) text. Note: the original source text can also contain ligatures.
Please send us your example file with E’ P for investigation. Thanks.
Best
Regards.
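One way a creating program can keep the original text alongside a ligature glyph is the /ActualText entry of a marked-content sequence (PDF 1.5 and later), which a viewer can use in place of the glyph's own Unicode mapping when extracting text. Whether PagePlus X5 writes it is not established here. A minimal Python sketch that builds such a content-stream fragment, assuming a Type0 font with Identity-H encoding (2-byte glyph IDs); the glyph ID 0x0012 and the helper name actual_text_span are hypothetical:

```python
from __future__ import annotations


def actual_text_span(glyph_ids: list[int], replacement: str) -> str:
    """Hypothetical helper: wrap a glyph run in a /Span sequence with /ActualText."""
    # Two-byte glyph IDs, as shown with a Type0 font using Identity-H encoding.
    # The fragment is assumed to sit inside BT ... ET with the font set via Tf.
    glyphs = "".join(f"{gid:04X}" for gid in glyph_ids)
    # ActualText is a PDF text string; UTF-16BE with a leading byte order
    # mark (FEFF) can carry any Unicode replacement text.
    text = "FEFF" + replacement.encode("utf-16-be").hex().upper()
    return f"/Span <</ActualText <{text}> >> BDC\n<{glyphs}> Tj\nEMC"


# Hypothetical glyph 0x0012 is the ffi ligature; extraction should yield "ffi".
print(actual_text_span([0x0012], "ffi"))
```

The glyph still draws as the ligature; only the text offered for copying and searching changes.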
- Bhikkhu Pesala
- User
- Posts: 1776
- Joined: Tue May 29, 2007 9:29 am
- Location: East London
Re: Does the PDF Format Support OpenType GSUB?
Just for good measure, here is a PDF with Small Capitals, Standard Ligatures, and Discretionary Ligatures.
- Attachments
- Ligatures.7z
- (9.56 KiB) Downloaded 82 times
- Tracker Supp-Stefan
- Site Admin
- Posts: 17910
- Joined: Mon Jan 12, 2009 8:07 am
- Location: London
Re: Does the PDF Format Support OpenType GSUB?
Thanks for the sample Bhikkhu,
Passed this to Victor who will have a look at it.
Best,
Stefan
- Lzcat - Tracker Supp
- Site Admin
- Posts: 677
- Joined: Thu Jun 28, 2007 8:42 am
Re: Does the PDF Format Support OpenType GSUB?
Hi, Bhikkhu.
Your PDF file does not use the original character codes; it uses glyph indexes (a glyph is a set of instructions that describes how to draw a character), so the original text can be obtained only through a "reverse" mapping.
This means that each font in this file carries additional information on how to translate a glyph index into Unicode, and this information is used for text copying and extraction. The problem lies in the codes for ligatures and small capitals: they are correct, but most fonts have no glyphs for these codes and/or no mapping from them to the corresponding glyphs, so when you copy such characters from the PDF you see boxes.
Formally everything is correct, but the result is not ideal (Adobe produces the same results, except perhaps for "decoding" ligatures).
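The "reverse" mechanism described above presumably corresponds to the font's ToUnicode CMap, which maps glyph indexes back to Unicode. Below is a simplified Python sketch that reads only bfchar entries (real CMaps may also use bfrange, which this ignores) and decodes a run of 2-byte glyph indexes. All glyph indexes in the sample CMap are invented. It shows that the copied text depends on what the creating program wrote: the same ligature glyph comes out as U+FB03 or as "ffi".

```python
from __future__ import annotations

import re

# Invented ToUnicode CMap fragment: glyph index -> UTF-16BE Unicode value.
# Glyph 0x0012 is the ffi ligature, mapped here to U+FB03.
CMAP = """
7 beginbfchar
<0010> <0064>
<0011> <0069>
<0012> <FB03>
<0013> <0063>
<0014> <0075>
<0015> <006C>
<0016> <0074>
endbfchar
"""


def parse_bfchar(cmap_text: str) -> dict[int, str]:
    """Collect glyph index -> Unicode string pairs from bfchar sections."""
    mapping: dict[int, str] = {}
    for section in re.findall(r"beginbfchar(.*?)endbfchar", cmap_text, re.S):
        for src, dst in re.findall(r"<([0-9A-Fa-f]+)>\s*<([0-9A-Fa-f]+)>", section):
            # The destination may hold several UTF-16 code units, so one
            # glyph can map to a multi-character string such as "ffi".
            mapping[int(src, 16)] = bytes.fromhex(dst).decode("utf-16-be")
    return mapping


def decode_glyphs(hex_run: str, mapping: dict[int, str]) -> str:
    """Decode a hex string of 2-byte glyph indexes; unmapped ones become U+FFFD."""
    data = bytes.fromhex(hex_run)
    gids = [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]
    return "".join(mapping.get(gid, "\ufffd") for gid in gids)


run = "0010001100120013001400150016"  # glyphs for d, i, ffi, c, u, l, t
print(repr(decode_glyphs(run, parse_bfchar(CMAP))))
# A creator could instead map the ligature glyph to the letters themselves:
expanded = CMAP.replace("<FB03>", "<006600660069>")
print(decode_glyphs(run, parse_bfchar(expanded)))  # difficult
```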
Victor
Tracker Software
Project manager
Please archive any files posted to a ZIP, 7z or RAR file or they will be removed and not posted.
- Bhikkhu Pesala
- User
- Posts: 1776
- Joined: Tue May 29, 2007 9:29 am
- Location: East London
Re: Does the PDF Format Support OpenType GSUB?
Decoding standard ligatures would certainly be useful — I would be pleasantly surprised if you could do any more than that when copying text.