Microsoft Typography | Developer information | Specifications | OpenType font development
Tamil OpenType Specification | Terms | Shaping | Features | Other | Appendix

How the Tamil shaping engine works

The Tamil shaping engine of Uniscribe processes text in stages. The stages are:

  1. Analyzing the syllables.
  2. Reordering characters.
  3. Shaping (substituting) glyphs with OTLS (OpenType Library Services).
  4. Positioning glyphs with OTLS.

The descriptions which follow will help font developers understand the rationale for the Tamil feature encoding model, and help application developers better understand how layout clients can divide responsibilities with operating system functions.

Analyzing the syllables

The shaping engine receives each syllable as a sequence of Unicode characters in logical (storage) order. These characters are not necessarily in the order in which they appear once composed into a syllable. In addition, some of them may combine with priority within the syllable to form a graphically distinct unit (e.g. akhand KSSA).

First, the shaping engine determines syllable boundaries and isolates certain parts and properties of each syllable. To place syllable elements in their required positions and to determine whether they can combine to form graphically distinct units, the shaping engine analyzes the syllable and answers questions such as:

  • Is there a 'reph' formation in the syllable? A syllable starting with letter "Ra" + Halant may be flagged as a syllable containing a 'reph'.

  • Which consonants can form 'vattu' variants?

  • Given a consonant within the syllable, does it qualify as a pre-base/below-base/post-base form? Does it qualify as a 'halant' form?
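Two of these questions can be illustrated with a short Python sketch: the 'reph' flag, and which consonants are in halant form. The sketch is not Uniscribe's implementation. It makes the following assumptions:

- The syllable has already been isolated.
- Only the Tamil consonant range U+0B95 to U+0BB9 and the virama U+0BCD matter.
- A reph is flagged only when another consonant follows Ra + Halant.

The names `is_consonant` and `analyze_syllable` are hypothetical.

```python
HALANT = "\u0BCD"  # TAMIL SIGN VIRAMA (pulli)
RA = "\u0BB0"      # TAMIL LETTER RA


def is_consonant(ch: str) -> bool:
    """Tamil consonant letters occupy U+0B95..U+0BB9."""
    return "\u0B95" <= ch <= "\u0BB9"


def analyze_syllable(syllable: str) -> dict:
    """Answer two of the analysis questions for one isolated syllable."""
    # Flag a 'reph' when the syllable starts with Ra + Halant and another
    # consonant follows (assumption: a lone Ra + Halant is just a dead consonant).
    has_reph = (
        syllable[:2] == RA + HALANT
        and any(is_consonant(ch) for ch in syllable[2:])
    )
    # Indices of consonants immediately followed by a halant (halant forms).
    halant_forms = [
        i for i in range(len(syllable) - 1)
        if is_consonant(syllable[i]) and syllable[i + 1] == HALANT
    ]
    return {"reph": has_reph, "halant_forms": halant_forms}
```

For example, Ra + Halant + Ka is flagged as containing a 'reph', with the consonant at index 0 in halant form.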

Next, the base (full-form) consonant of the syllable is identified. All other elements are classified by their position relative to the base: pre-base, below-base, above-base and post-base.

Then the Tamil shaping engine splits matras that have components appearing on more than one side of the base glyph into the corresponding parts (pre-base, below-base, above-base or post-base parts).
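In Tamil, the matras with components on both sides of the base are the vowel signs O, OO and AU. The Python sketch below splits them using their Unicode canonical decompositions. Using `unicodedata` as the source of the split is an assumption of the sketch, not a statement of how Uniscribe does it, and `split_matras` is a hypothetical name.

```python
import unicodedata

# Two-part Tamil vowel signs; their Unicode canonical decompositions are
#   U+0BCA (O)  -> U+0BC6 (pre-base E)  + U+0BBE (post-base AA)
#   U+0BCB (OO) -> U+0BC7 (pre-base EE) + U+0BBE (post-base AA)
#   U+0BCC (AU) -> U+0BC6 (pre-base E)  + U+0BD7 (post-base AU length mark)
TWO_PART_MATRAS = {"\u0BCA", "\u0BCB", "\u0BCC"}


def split_matras(syllable: str) -> str:
    """Replace each two-part matra with its parts; leave other characters alone."""
    return "".join(
        unicodedata.normalize("NFD", ch) if ch in TWO_PART_MATRAS else ch
        for ch in syllable
    )
```

For example, KA + O (U+0B95 U+0BCA) becomes KA + E + AA (U+0B95 U+0BC6 U+0BBE). The sketch normalizes only the listed characters, so the rest of the syllable passes through unchanged.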

Reordering characters

Uniscribe creates and manages a buffer of appropriately reordered character codes, delineated as "clusters." Uniscribe reorders character codes within clusters according to several rules (described below). Then, Uniscribe obtains the corresponding glyph string by passing the reordered character string to the glyph substitution function of the OTL Services.

Because glyph strings are obtained from reordered character strings, the features in a Tamil font must be encoded to map reordered characters (and combinations of characters) to their corresponding glyphs. Since Uniscribe itself performs the standard character reordering operations, font developers are spared several layers of complexity when defining features.

The character reordering rules of the Uniscribe Tamil shaping engine are described below. None of these rules need to be encoded in an OpenType font, provided the font is used with Uniscribe (or another client that follows the Unicode Standard for character reordering). In fact, a font developer who tried to encode such reordering information in an OpenType font would need to add a very large number of many-to-many glyph mappings to reproduce the simple algorithms that Uniscribe uses.

Uniscribe always performs reordering operations in a specified order, as described below.

Starting with a syllable of one of the following forms:

{C + [Nukta] + H} + C + [M] + [VM] + [SM]

...or a syllable without vowels

{C + [Nukta] + H} + C + H

...or a syllable without consonants

VO + [VM] + [SM]
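These three forms can be approximated with a regular expression. The Python sketch below is a simplification with the following assumptions:

- It omits the optional [Nukta] and [SM] elements.
- It treats only U+0B82 (TAMIL SIGN ANUSVARA) as a vowel modifier.
- It skips characters that match none of the forms.

The names `SYLLABLE` and `syllables` are hypothetical.

```python
import re

# Character classes (assumptions for this sketch).
C = "[\u0B95-\u0BB9]"        # consonants
H = "\u0BCD"                 # halant (virama / pulli)
M = "[\u0BBE-\u0BCC\u0BD7]"  # dependent vowel signs (matras)
VM = "\u0B82"                # vowel modifier: TAMIL SIGN ANUSVARA
VO = "[\u0B85-\u0B94]"       # independent vowels

# {C + H} + C + H   or   {C + H} + C + [M] + [VM]   or   VO + [VM]
SYLLABLE = re.compile(
    f"(?:{C}{H})*{C}(?:{H}|{M}?{VM}?)"
    f"|{VO}{VM}?"
)


def syllables(text: str) -> list[str]:
    """Split Tamil text into syllables; characters outside the patterns are skipped."""
    return SYLLABLE.findall(text)
```

For example, அம்மா (U+0B85 U+0BAE U+0BCD U+0BAE U+0BBE) yields two units: the independent vowel அ and the consonant syllable ம்மா. The latter follows the {C + H} + C + [M] form.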

  1. The shaping engine finds the base consonant of the syllable using the following algorithm. Starting from the end of the syllable, it moves backwards until it finds a consonant that does not have a below-base or post-base form (post-base forms must follow below-base forms), or until it reaches the first consonant. The consonant at which the search stops is the base.

  2. If the base consonant is not the last one, Uniscribe moves the halant from the base consonant to the last one.

  3. If the syllable starts with Ra + H, Uniscribe moves this combination so that it follows the base consonant.

  4. Uniscribe splits two- or three-part matras into their parts. This splitting is a character-to-character operation. Then the left 'matra' part is moved to immediately precede the base glyph (* see the section Other Encoding Issues).

  5. Uniscribe classifies consonants and 'matra' parts as pre-base, above-base (Reph), below-base or post-base. This classification exists at the character code level and is language-dependent, not font-dependent.

  6. Uniscribe then groups elements of the syllable (consonants and 'matras') according to this classification. Pre-base elements will precede the base consonant. The above-base, below-base and post-base components will follow the base glyph.

    • Below-base Ra (vattu) will be positioned after the consonant it attaches to (either the base consonant or one of the pre-base consonants).

    • 'Halants' are moved with the consonants they affect.
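Steps 1 to 4 can be sketched in Python as follows. The sketch assumes:

- The input is a consonant syllable whose two-part matras have already been split.
- The caller supplies the set of consonants that have below-base or post-base forms. This is a hypothetical parameter, empty by default.
- Only the pre-base vowel signs E, EE and AI need to move before the base.

Grouping by classification (steps 5 and 6) is omitted. The sketch illustrates the rules above and is not Uniscribe's code.

```python
HALANT = "\u0BCD"                                 # TAMIL SIGN VIRAMA
RA = "\u0BB0"                                     # TAMIL LETTER RA
PRE_BASE_MATRAS = {"\u0BC6", "\u0BC7", "\u0BC8"}  # vowel signs E, EE, AI


def is_consonant(ch: str) -> bool:
    return "\u0B95" <= ch <= "\u0BB9"


def find_base(syl: list[str], below_or_post: frozenset[str]) -> int:
    """Step 1: scan backwards for a consonant without a below-/post-base form."""
    consonants = [i for i, ch in enumerate(syl) if is_consonant(ch)]
    for i in reversed(consonants):
        if syl[i] not in below_or_post:
            return i
    return consonants[0]  # reached the first consonant: it becomes the base


def reorder(syl: list[str], below_or_post: frozenset[str] = frozenset()) -> list[str]:
    syl = list(syl)
    base = find_base(syl, below_or_post)
    last = max(i for i, ch in enumerate(syl) if is_consonant(ch))

    # Step 2: move the halant that follows the base so it follows the last consonant.
    if base != last and syl[base + 1] == HALANT:
        del syl[base + 1]
        syl.insert(last, HALANT)  # the last consonant now sits at index last - 1

    # Step 3: a leading Ra + H moves to just after the base consonant.
    if base >= 2 and syl[:2] == [RA, HALANT]:
        del syl[:2]
        base -= 2
        syl[base + 1:base + 1] = [RA, HALANT]

    # Step 4 (reordering part): pre-base matra parts move to just before the base.
    for i in range(base + 1, len(syl)):
        if syl[i] in PRE_BASE_MATRAS:
            syl.insert(base, syl.pop(i))
            base += 1
    return syl
```

For example, Ra + H + Ka + E becomes E + Ka + Ra + H: the Ra + H combination now follows the base, and the left matra part precedes it.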

After performing the character reordering steps, the sequence of characters will have one of the following forms:

For Tamil:

{Cpre + H} + [Mpre]* + Cbase + [Mabove] + [Mpost] + [VMpost]

(Different combinations of Mpre, Mabove and Mpost can be present.)

* Mpre is moved to the start of the syllable for shaping.

In the absence of a vowel, the sequence has the form:

{Cpre + [Nukta] + H + [Ra + H]vattu} + Cbase + [Ra + H]vattu + H

Finally, a syllable with an independent vowel has the form:

VO + [VM1] + [VM2]

Shaping with OTLS

The first step Uniscribe takes in shaping the character string is to map all characters to their nominal form glyphs. Then, Uniscribe applies contextual shape features to the glyph string.
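Mapping characters to nominal glyphs is a lookup in the font's character-to-glyph mapping (the 'cmap' table). Below is a minimal Python sketch. It assumes the mapping has already been read into a dictionary. The function name and the glyph IDs in `example_cmap` are hypothetical.

```python
NOTDEF = 0  # glyph ID 0 is reserved for .notdef in OpenType fonts


def nominal_glyphs(chars: str, cmap: dict[int, int]) -> list[int]:
    """Map each character to its nominal glyph ID, falling back to .notdef."""
    return [cmap.get(ord(ch), NOTDEF) for ch in chars]


# Hypothetical cmap entries: TAMIL LETTER KA and TAMIL SIGN VIRAMA.
example_cmap = {0x0B95: 10, 0x0BCD: 42}
```

With this mapping, KA + VIRAMA maps to glyph IDs 10 and 42, and any unmapped character maps to .notdef.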

Next, Uniscribe calls the OTL Services Library to shape the Tamil syllable. All OTL processing is divided into a set of predefined features (described and illustrated in the Features section of this document). Uniscribe applies the features one at a time to the appropriate glyphs in the syllable, and OTLS processes each one. Uniscribe makes one call to the OTL Services per feature, which ensures that the features are executed in the desired order.
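Toy lookups show why one call per feature in a fixed order matters. The Python sketch below stands in for OTLS with a simplified ligature lookup. All glyph names and the `ligature` and `shape` helpers are hypothetical. Running the same two features in a different order gives a different result.

```python
from typing import Callable

Lookup = Callable[[list[str]], list[str]]


def ligature(components: tuple[str, ...], result: str) -> Lookup:
    """Build a toy lookup that replaces each run of `components` with `result`."""
    n = len(components)

    def lookup(glyphs: list[str]) -> list[str]:
        out, i = [], 0
        while i < len(glyphs):
            if tuple(glyphs[i:i + n]) == components:
                out.append(result)
                i += n
            else:
                out.append(glyphs[i])
                i += 1
        return out

    return lookup


def shape(glyphs: list[str], features: list[tuple[str, Lookup]]) -> list[str]:
    """Apply features one at a time, in list order (one call per feature)."""
    for _tag, lookup in features:
        glyphs = lookup(glyphs)
    return glyphs


# Hypothetical glyph names, for illustration only.
akhn = ("akhn", ligature(("ka", "virama", "ssa"), "kssa"))
psts = ("psts", ligature(("kssa", "aa"), "kssa_aa"))
```

Applying 'akhn' before 'psts' to ka, virama, ssa, aa yields the single glyph kssa_aa. In the reverse order, 'psts' finds nothing to match, and only the akhand ligature forms.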

The steps of the shaping process are outlined below.

Shaping features:

  1. Language forms
    1. Apply feature 'akhn' to get akhand ligatures.
    2. Apply feature 'half' to get half forms of pre-base consonants.

  2. Conjuncts and Typographical forms
    1. Apply feature 'pres' to get pre-base consonant conjuncts and pre-base matra conjuncts (i.e. consonant and matra conjuncts to the left of the base glyph).
    2. Apply feature 'abvs' to get above-base matra conjuncts, reph conjuncts, above-base vowel modifiers, and above-base stress and tone marks (i.e. reph and matra conjuncts, typographical forms and vowel modifier forms of above-base elements).
    3. Apply feature 'blws' to get below-base consonant conjuncts, below-base matra conjuncts, below-base vowel modifier forms, and below-base stress and tone mark forms (i.e. consonant and matra conjuncts, typographical forms, vowel modifier forms, and stress and tone mark forms of below-base elements).
    4. Apply feature 'psts' to get post-base consonant conjuncts, post-base matra conjuncts and post-base vowel modifiers (i.e. consonant and 'matra' conjuncts, typographical forms and vowel modifier forms of post-base elements).

  3. Halant form
    1. Apply feature 'haln' to put the base consonant in halant form (if the syllable ends with a halant).

      Note: The halant substitution is performed last to ensure that the base consonant is always in the full form during shaping.
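The ordering above, including the conditional 'haln' step, can be written down as data. Below is a minimal Python sketch. It assumes that the 'haln' decision is made per syllable from the syllable's last character, which is a simplification. The names are hypothetical.

```python
HALANT = "\u0BCD"  # TAMIL SIGN VIRAMA

# Shaping features in the order listed above.
SHAPING_FEATURES = ("akhn", "half", "pres", "abvs", "blws", "psts")


def shaping_features(syllable: str) -> list[str]:
    """Return the shaping features to apply to one syllable, in order."""
    features = list(SHAPING_FEATURES)
    if syllable.endswith(HALANT):
        # Applied last, so the base consonant stays in full form until now.
        features.append("haln")
    return features
```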

Positioning glyphs with OTLS

Uniscribe next applies features concerned with positioning, calling functions of OTLS to position glyphs.

Positioning features:

  1. Above-base marks
    1. Apply feature 'abvm' to position above-base forms, vowel modifiers and/or stress/tone marks (on the base glyph or a post-base matra).

  2. Below-base marks
    1. Apply feature 'blwm' to position below-base forms, vowel modifiers and/or stress/tone marks.

  3. Distances
    1. Apply feature 'dist' to adjust other distances (e.g. to provide kerning between post-base and pre-base elements and the base glyph).
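The text does not say which lookup types 'abvm' and 'blwm' use. Assuming they use anchor-based mark-to-base attachment (as in GPOS), the offset applied to a mark is simple arithmetic. The Python sketch below shows it. `Anchor` and `attach_mark` are hypothetical names, and the coordinates in the example are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Anchor:
    x: int  # font units, relative to the glyph's own origin
    y: int


def attach_mark(base_x: int, base_anchor: Anchor,
                mark_x: int, mark_anchor: Anchor) -> tuple[int, int]:
    """Return the (dx, dy) offset that puts the mark's anchor on the base's anchor.

    base_x and mark_x are the pen x-positions where the base and the mark
    would otherwise be drawn on the same baseline.
    """
    dx = (base_x + base_anchor.x) - (mark_x + mark_anchor.x)
    dy = base_anchor.y - mark_anchor.y
    return dx, dy
```

Take a base drawn at x = 0 with an advance of 600, so the mark would otherwise sit at x = 600. With a base anchor at (300, 700) and a mark anchor at (100, 0), the mark moves by (-400, 700).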

this page was last updated 20 March 2002
© 2002 Microsoft Corporation. All rights reserved. Terms of use.