
Microsoft Typography | Developer information | Windows glyph processing
Introduction | Overview | Glyph processing in detail | Conclusion and notes


We have seen in this article how glyph processing extends the capabilities of fonts and text layout applications, and we have seen practical demonstrations of why this technology is needed to process the world's many complex writing systems and the languages that use them. We've also seen, for example in the use of localized glyph forms for Serbian, how this technology can respond to the typographical preferences of particular user communities even as it provides a universal text processing solution using the Unicode Standard for character encoding. Not least, we have seen how glyph processing can enable rich typographic features that can be applied to text without font switching, custom encodings or other primitive solutions that threaten the semantic integrity of the text.

Many people will look at these capabilities and identify complex script support as an obvious priority; after all, such support is necessary to allow users of these scripts to communicate at a basic level. Text processing for complex scripts is glyph processing, and the challenges of implementing OpenType Layout features simply cannot be avoided. It seems likely, then, that application developers deciding how to allocate resources for glyph processing support will view correct Arabic shaping as much more important than, for example, adding smallcaps support for the Latin script.

It is indeed difficult to overestimate the importance of Arabic shaping, but it is easy to underestimate the importance of Latin smallcaps. It is easy to think of typography as simply making text look nice, and so to treat many elements of typography as frills: luxuries that applications can get around to supporting when they have dealt with all the more important things.

I want to conclude this article by suggesting why support for sophisticated typography belongs among the important things, and should not be relegated to the status of a frill. The role of typography is not to prettify text, but to articulate it. That it does so in an aesthetic way (utilizing all the art it can draw from its own heritage, the heritage of manuscript tradition, and individual creative vision) should not disguise the expressive and organizational relationship of typography to text. A typographic culture, such as the one in which you engage as you read this article, is a system of visual indicators that helps readers navigate text and helps writers express their ideas. In the 550 years since Gutenberg developed metal type casting at Mainz, the printed Latin script has developed a particularly rich typographic culture, using romans, italics, bold type, smallcaps, ligatures, swash forms, etc., to organize and articulate the texts of hundreds of languages around the world. Other scripts have developed equally complex and adaptive systems, some more complex and sophisticated, while others are only just beginning their typographic journey. Typography is part of how the human race expresses itself, individually and collectively. Sadly, whenever support for a particular aspect of a typographic culture is limited, texts are inevitably rendered less expressive of the ideas they contain than they might be. As if we were forced to speak in a monotone, we cannot fully articulate what we need to say.

Software developers, of course, have to prioritize, to allocate resources carefully, and they need to ship a product in a marketplace that will not wait for them to do everything they might like to do. The Windows glyph processing model provides such developers with a powerful set of helper functions and system components to provide users with rich, expressive typographical controls. The OpenType format provides font developers with ways to add considerable typographic intelligence to their fonts: intelligence that, with proper application support, can help users articulate their ideas with the full register of their typographic voices.


1. The term 'complex script' refers to any writing system that requires some degree of character reordering and/or glyph processing to display, print or edit. In other words, scripts for which Unicode logical order and nominal glyph rendering of codepoints do not result in acceptable text. Such scripts, examples of which are Arabic and the numerous Indic scripts descended from the Brahmi writing system, are generally identifiable by their morphographic characteristics: the changing of the shape or position of glyphs as determined by their relationship to each other. It should be noted that such processing is not optional, but is essential to correctly rendering text in these scripts. Additional glyph processing to render appropriately sophisticated typography may be desirable beyond the minimum required to make the text readable.
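The gap between logical order and acceptable rendering can be seen in the character data itself. The following is a minimal Python sketch, not part of the Windows glyph processing model; the helper name `logical_order` and the two sample strings are my own illustrative choices. It uses the standard `unicodedata` module to list characters in the order in which they are stored.

```python
import unicodedata

def logical_order(text):
    """List (code point, general category, bidi class, name) in stored order."""
    return [
        (f"{ord(ch):04X}", unicodedata.category(ch),
         unicodedata.bidirectional(ch), unicodedata.name(ch))
        for ch in text
    ]

# Devanagari "ki": the consonant KA is stored first and VOWEL SIGN I second,
# although the vowel sign is drawn to the left of the consonant.
ki = "\u0915\u093F"

# Arabic BEH + ALEF: stored in reading order, but displayed right to left
# (bidi class "AL") and with joined, contextual letter forms.
ba = "\u0628\u0627"

if __name__ == "__main__":
    for sample in (ki, ba):
        for row in logical_order(sample):
            print(*row)
        print()
```

In both samples, drawing the nominal glyph for each code point from left to right in stored order would give unacceptable text, which is the sense in which these scripts are 'complex'.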

2. For more information about the Unicode Standard and the work of the Unicode Consortium and its technical committees, see the Unicode Consortium's website. I recommend that anyone developing Unicode applications or fonts invest in a copy of the published Unicode Standard Version 3.0 (ISBN 0-201-61633-5).

3. DLL is an abbreviation for Dynamic Link Library. A DLL is a collection of small programs that can be called by a larger program to perform a specific task or series of tasks. The principal advantage of DLLs is that they are only loaded into random access memory when called, otherwise keeping memory free for other processes.

4. Swash letters are stylised, flourished variants of their staid typographic cousins, generally found accompanying italic fonts but occasionally in romans. They are at home in the major European scripts—Latin, Cyrillic and Greek—but have relatives around the world. The term corsiva literally means cursive, a word that is often applied to any type style that displays some aspects of a handwritten model. In this instance it is used, as it was by renaissance scribes, to distinguish the informal flowing style of ascender from the more carefully formed formata seen in the first example.

5. This example is adapted from the chapter on Arabic typesetting in Théotiste Lefevre's Guide Pratique du Compositeur et de L'Imprimeur Typographes. The business of complex script and multilingual typography is not new, nor was it in the 1870s when this book was first published. If anything, digital typography is only just beginning to catch up with its analogue history.

6. API is an abbreviation for Application Programming Interface. An API is a function, or set of functions, that applications use to take advantage of system components. For example, almost all Windows applications that process plain text use the common TextOut or ExtTextOut system APIs to draw text.

7. The Aitreyopanishad or Aitareya Upanishad is one of the classical Hindu spiritual texts collectively known as the Upanishads. A meditation on the singular spirit that creates the universe out of His own being, the Aitreyopanishad was written around 550 BC. The sentence from which we have chosen the sample word comes from the end of the third and final section of this Upanishad. It was translated by F. Max Müller, the eminent 19th-century German Sanskritist, as: He (Vamadeva), having by this conscious self stepped forth from this world, and having obtained all desires in that heavenly world, became immortal, yea, he became immortal.

8. In The Unicode Standard Version 3.0, the Devanagari shaping rules are explained in §9.1, pp. 211-223. Devanagari provides the model for the shaping of other Indic scripts in Unicode, and this section should be read in conjunction with the descriptions of script shaping requirements for Bengali, Gurmukhi, Gujarati, etc. Note that Thai, although closely related to the Indic scripts, has unique requirements that justify a separate Thai shaping engine in Uniscribe (see Part Two). This may also be true of other Indic-related scripts that have not yet been included in Uniscribe, such as Burmese (Myanmar).

9. Here is a syllable-by-syllable—i.e. cluster-by-cluster—description of the Indic OpenType Layout features applied to our sample Sanskrit word. These are the features used to render this text string in the Mangal Devanagari UI font; other OpenType fonts may well employ more full conjunct form substitutions in place of the half forms used heavily in Mangal. The codepoint sequence of each cluster is presented in bold type, followed by a Latin transliteration of the syllable, followed by a detailed description of the applied OTL features and their effects.

092A 0930 094D : pra : The 'Below-Base Forms' feature is applied to substitute the 0930 (ra) and 094D (halant, vowel killer) with a default below-base form of 0930 (vattu). The 'Below-Base Substitutions' feature is then applied to 092A (pa) and the default below-base form (vattu) to render the required ligature.

091C 094D 091E 0947 : jñe : The 'Akhand' feature is applied to replace 091C (ja) 094D (halant) 091E (nya) with the necessary ligature. The 'Above-Base Mark Positioning' feature is applied to position 0947 (e matra, vowel sign) above the ligature.

0928 093E : na : No OpenType Layout features required.

0924 094D 092E : tma : The 'Half Forms' feature is applied to the combination 0924 (ta) 094D (halant) to substitute the half form of 0924 (t).

0928 093E : na : No OTL features required.

0938 094D 092E 093E : sma : The 'Half Forms' feature is applied to the combination 0938 (sa) 094D (halant) to substitute the half form of 0938 (s).

0932 094D 0932 094B : llo : The 'Half Forms' feature is applied to the combination 0932 (la) 094D (halant) to substitute the half form of 0932 (l).

0915 093E : ka : No OTL features required.

0926 0941 : du : The 'Below-Base Mark Positioning' feature is applied to position 0941 (u matra) below 0926 (da).

0924 094D 0915 0930 094D : tkra : The 'Half Forms' feature is applied to the combination 0924 (ta) 094D (halant) to get the half form of 0924 (t). The 'Below-Base Forms' feature is applied to 0930 (ra) and 094D (halant) to substitute them with a default below-base form of 0930 (vattu). The 'Below-Base Substitutions' feature is then applied to 0915 (ka) and the default below-base form (vattu) to render the required ligature.

092E 094D 092F 093E : mya : The 'Half Forms' feature is applied to the combination 092E (ma) 094D (halant) to substitute the half form of 092E (m).

092E 0941 : mu : No OTL features required.

093F 0937 094D 092E : smi : The 'Pre-Base Substitutions' feature is applied to get the desired glyph variant of 093F (i matra); this feature substitutes one of five different i matras in the Mangal font that are designed to fit over different widths of consonants and conjuncts.

0928 094D 0938 094D 0935 : nsva : The 'Half Forms' feature is applied to the combinations 0928 (na) 094D (halant) and 0938 (sa) 094D (halant) to substitute the respective half forms of 0928 (n) and 0938 (s).

0917 0947 0930 094D : rge : The 'Reph' feature is applied to substitute 0930 (ra) and 094D (halant) with a default glyph for the reph (the above-base form of ra). The 'Above-Base Substitutions' feature is then applied to 0947 (e matra) and the reph to substitute them with a composite. The 'Above-Base Mark Positioning' feature is applied to position the composite above 0917 (ga).
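For readers who want to inspect the cluster data above programmatically, here is a minimal Python sketch. It is an illustration only and does not perform any shaping: it decodes a few of the code point sequences listed above into Unicode character names, and maps the feature names used in this note to four-letter tags. The tags are not given in this note; they are taken from the OpenType feature registry. The names `FEATURE_TAGS`, `CLUSTERS`, `describe` and `feature_tags` are hypothetical helpers of my own. Note that the Unicode character name for the halant is VIRAMA.

```python
import unicodedata

# Feature names as used in this note, mapped to OpenType feature tags.
# Assumption: tags come from the OpenType feature registry, not from the note.
FEATURE_TAGS = {
    "Akhand": "akhn",
    "Reph": "rphf",
    "Below-Base Forms": "blwf",
    "Half Forms": "half",
    "Pre-Base Substitutions": "pres",
    "Below-Base Substitutions": "blws",
    "Above-Base Substitutions": "abvs",
    "Above-Base Mark Positioning": "abvm",
    "Below-Base Mark Positioning": "blwm",
}

# A few clusters copied from the list above:
# (code points, transliteration, features applied).
CLUSTERS = [
    ("092A 0930 094D", "pra", ["Below-Base Forms", "Below-Base Substitutions"]),
    ("091C 094D 091E 0947", "jñe", ["Akhand", "Above-Base Mark Positioning"]),
    ("0928 093E", "na", []),
    ("0924 094D 092E", "tma", ["Half Forms"]),
    ("0926 0941", "du", ["Below-Base Mark Positioning"]),
]

def describe(hex_sequence):
    """Return the Unicode character name of each code point in the sequence."""
    return [unicodedata.name(chr(int(cp, 16))) for cp in hex_sequence.split()]

def feature_tags(feature_names):
    """Translate the feature names used in this note into feature tags."""
    return [FEATURE_TAGS[name] for name in feature_names]

if __name__ == "__main__":
    for hex_sequence, translit, features in CLUSTERS:
        print(f"{hex_sequence} ({translit})")
        for name in describe(hex_sequence):
            print(f"    {name}")
        print(f"    features: {', '.join(feature_tags(features)) or 'none'}")
```

The remaining clusters can be added to `CLUSTERS` in the same form. Which lookups a font actually implements under each tag is font-specific, so this table says nothing about the glyphs Mangal or any other font will produce.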

I am indebted to Apurva Joshi of Microsoft Typography for providing the sample Sanskrit word and for assisting me in preparing this example of complex script rendering. Thanks are also due to Dr Anthony P. Stone for providing the Latin transliteration.

10. Both the English and Turkish samples are from the work of the Turkish poet Nazim Hikmet.

11. These are the opening lines of 'Belgrade, April 1944', by the Yugoslav poet Miodrag Pavlović. The opening of the poem is translated by Bernard Johnson as: The bombers | removed your house | and your room | and took the notebook | out of your hand....

12. CJKV is a common software development abbreviation for Chinese, Japanese, Korean and Vietnamese. It is also frequently encountered simply as CJK, since the Chinese ideographs are no longer part of the day-to-day writing system of most Vietnamese. The best source of information on Far Eastern text and digital typography is Ken Lunde's book CJKV Information Processing.

13. A kashida is a lengthening stroke that extends, often very dramatically, certain Arabic letters in traditional calligraphic styles and fine typography.

this page was last updated 7 November 2000
© 2000 Microsoft Corporation. All rights reserved. Terms of use.

