Does the PDF Format Support OpenType GSUB?

The PDF-XChange Viewer for End Users

Bhikkhu Pesala
User
Posts: 1776
Joined: Tue May 29, 2007 9:29 am
Location: East London

Post by Bhikkhu Pesala »

If I produce a PDF from Serif PagePlus X5 using an OpenType font with ligatures, a text sequence such as f + f + i is replaced with the Alphabetic Presentation Form ﬃ (U+FB03). So if someone copies a word like "difficult" from the PDF file, what they get is the text stream diﬃcult, i.e. di(U+FB03)cult.

The OpenType font is embedded in the PDF file, so why isn't the text stream f+f+i used? Is this a limitation of the PDF format, or a limitation of PagePlus? How about PDF files with ligatures produced from InDesign or Word 2010?
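To make the symptom concrete, here is a minimal Python sketch that lists the code points in the copied word. It is not tied to PagePlus or to any PDF library; it simply assumes the clipboard text is di + U+FB03 + cult, as described above.

```python
import unicodedata

# Text as it would arrive on the clipboard, assuming the PDF maps the
# ligature glyph to the single code point U+FB03.
copied = "di\ufb03cult"

# Print each code point with its official Unicode name.
for ch in copied:
    print(f"U+{ord(ch):04X}  {unicodedata.name(ch)}")

# The copied word has 7 code points, while "difficult" has 9,
# so a plain comparison or search for "difficult" fails.
print(len(copied), len("difficult"), copied == "difficult")
```

The third code point is reported as LATIN SMALL LIGATURE FFI, which is why a search for "difficult" in the pasted text finds nothing.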
Windows 10 Home 64-bit • AMD Ryzen 5 3400G, 8 Gb
Review: http://www.softerviews.org/PDF-XChange.html
Vasyl-Tracker Dev Team
Site Admin
Posts: 2353
Joined: Thu Jun 30, 2005 4:11 pm
Location: Canada

Post by Vasyl-Tracker Dev Team »

Hi, Bhikkhu.

A PDF can certainly contain ligatures. In the future we could add a new option such as "Expand existing ligatures when copying the text".
Is that what you are asking about?

Best
Regards.
Vasyl Yaremyn
Tracker Software Products
Project Developer

Please archive any files posted to a ZIP, 7z or RAR file or they will be removed and not posted.

Post by Bhikkhu Pesala »

Vasyl-Tracker Dev Team wrote:
A PDF can certainly contain ligatures. In the future we could add a new option such as "Expand existing ligatures when copying the text".
Is that what you are asking about?

That's not quite what I meant. I thought that, since the PDF contains the OpenType font, the original text string would already be used in the PDF. That doesn't seem to be the case, and perhaps it is not possible.

An option to expand the existing GSUB substitutions back to their source text string when copying the text would indeed solve the problem. As it stands, my PDF files are not very useful as a source for copying text if I use ligatures or small capitals.

Ligatures: suﬃce

Small Capitals: E’ P
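
As a rough illustration of what "expanding ligatures on copy" could mean on the receiving side, the sketch below uses Unicode compatibility normalization (NFKC) on the Latin ligature block U+FB00 to U+FB06 only. The function name `expand_latin_ligatures` is made up for this example and is not a feature of PDF-XChange Viewer.

```python
import unicodedata

def expand_latin_ligatures(text: str) -> str:
    """Replace U+FB00..U+FB06 with their compatibility decompositions.

    Only the Latin ligatures in the Alphabetic Presentation Forms block
    are touched, so other characters pass through unchanged.
    """
    out = []
    for ch in text:
        if "\ufb00" <= ch <= "\ufb06":
            # NFKC turns e.g. U+FB03 into the three letters "ffi".
            out.append(unicodedata.normalize("NFKC", ch))
        else:
            out.append(ch)
    return "".join(out)

print(expand_latin_ligatures("di\ufb03cult"))
print(expand_latin_ligatures("su\ufb03ce"))
```

This only helps for ligatures that have a Unicode compatibility decomposition. Small capitals or discretionary ligatures that a PDF creator maps to other code points (for example Private Use Area values, which is an assumption here) have no such decomposition and would pass through unchanged.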

Post by Vasyl-Tracker Dev Team »

Hello, Bhikkhu.
"I thought that, since the PDF contains the OpenType font, the original text string would already be used in the PDF."
That depends only on the program that created the file. A PDF can contain either the ligatures or the plain (original) text. Note that the original source text can itself contain ligatures.
Please send us your example file with E’ P so that we can investigate. Thanks.

Best
Regards.
Vasyl Yaremyn
Tracker Software Products
Project Developer

Post by Bhikkhu Pesala »

Just for good measure, here is a PDF with Small Capitals, Standard Ligatures, and Discretionary Ligatures.
Attachment: Ligatures.7z (9.56 KiB)
Tracker Supp-Stefan
Site Admin
Posts: 17910
Joined: Mon Jan 12, 2009 8:07 am
Location: London

Post by Tracker Supp-Stefan »

Thanks for the sample, Bhikkhu.

I have passed this to Victor, who will have a look at it.

Best,
Stefan
Lzcat - Tracker Supp
Site Admin
Posts: 677
Joined: Thu Jun 28, 2007 8:42 am

Post by Lzcat - Tracker Supp »

Hi, Bhikkhu.
Your PDF file does not use the original character codes; it uses glyph indexes (a glyph is the set of instructions that describes how to draw a character), so the original text can be obtained only through a "reverse" mechanism.

This means that every font in this file carries additional information on how to translate a glyph index to Unicode, and this information is used for text copying and extraction. The problem lies in the codes for ligatures and small capital letters. They are correct, but most fonts have no glyphs for these codes, or no translation from these codes to the corresponding glyphs, so when you copy such characters from the PDF you see boxes.

Formally everything is correct, but the result is not very good (Adobe produces the same results, except perhaps for "decoding" ligatures).
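
The "reverse" mechanism can be pictured as a lookup table from glyph index to Unicode text. The sketch below is only a model of that idea: the glyph indexes, table names and `extract_text` function are hypothetical and are not read from the sample file. It also shows that such a table may map one glyph to a sequence of characters, which is what the PDF ToUnicode mechanism permits, so a creator could map the ligature glyph to "ffi" instead of U+FB03.

```python
# Hypothetical glyph indexes for an embedded font subset; real values
# depend on the font and on the program that created the PDF.
GLYPHS_ON_PAGE = [11, 12, 40, 13, 14, 15, 16]  # d, i, ffi ligature, c, u, l, t

# Mapping as described in the thread: the ligature glyph maps to U+FB03.
TO_UNICODE_PRESENTATION_FORM = {
    11: "d", 12: "i", 40: "\ufb03", 13: "c", 14: "u", 15: "l", 16: "t",
}

# Alternative a creator could write: the same glyph maps to the
# three-character sequence "ffi".
TO_UNICODE_EXPANDED = dict(TO_UNICODE_PRESENTATION_FORM)
TO_UNICODE_EXPANDED[40] = "ffi"

def extract_text(glyph_ids, to_unicode):
    """Translate glyph indexes to text; unmapped glyphs become U+FFFD."""
    return "".join(to_unicode.get(g, "\ufffd") for g in glyph_ids)

print(extract_text(GLYPHS_ON_PAGE, TO_UNICODE_PRESENTATION_FORM))
print(extract_text(GLYPHS_ON_PAGE, TO_UNICODE_EXPANDED))
```

With the first table the extracted word contains the single ligature code point; with the second it comes out as plain "difficult", without any change to how the page is drawn.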
Victor
Tracker Software
Project manager

Post by Bhikkhu Pesala »

Decoding standard ligatures would certainly be useful — I would be pleasantly surprised if you could do any more than that when copying text.