Microsoft Typography | Developer information | Specifications | OpenType font development
Indic OpenType Specification | Terms | Shaping | Features | Other | Appendix

Other encoding issues

Interaction between below-base, post-base and above-base elements

Commonly, a feature must deal with the base glyph together with one of the below-base, post-base or above-base elements. Since it is not possible to reorder ALL of these elements next to the base glyph, the lookup has to skip over the elements that lie "in the middle" (reordering-wise).

The solution is to assign different mark attachment classes to the different elements of the syllable and their positional forms, and to have any given lookup work with one mark class only (by setting the MarkAttachmentType bits of its lookup flag). For example, above-base substitutions usually need to consider only above-base elements.

Generally, it is good practice to label as "mark" both the glyphs that the Unicode Standard denotes as marks and the below-base/above-base forms of consonants. Different attachment classes should then be assigned to these marks according to their position relative to the base.
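A minimal Python sketch of this skipping behaviour follows. The glyph names and class numbers are hypothetical (a real font keeps this data in its GDEF table); it shows a lookup whose flag selects mark attachment class 2 passing straight over the class 1 below-base marks to reach the above-base ones.

```python
from typing import List, Optional

BASE, MARK = 1, 3  # GDEF GlyphClassDef values for base glyphs and marks

# Hypothetical GDEF data for the glyphs of one syllable.
GLYPH_CLASS = {"ka": BASE, "rakar": MARK, "uMatra": MARK, "reph": MARK, "anusvara": MARK}
MARK_ATTACH_CLASS = {
    "rakar": 1, "uMatra": 1,   # class 1: below-base elements
    "reph": 2, "anusvara": 2,  # class 2: above-base elements
}

MARK_ATTACHMENT_TYPE = 0xFF00  # LookupFlag bits that select a mark attachment class


def is_skipped(glyph: str, lookup_flag: int) -> bool:
    """True if a lookup with this flag ignores `glyph`."""
    wanted = (lookup_flag & MARK_ATTACHMENT_TYPE) >> 8
    if wanted == 0 or GLYPH_CLASS.get(glyph) != MARK:
        return False  # no class filter, or not a mark: always visible
    return MARK_ATTACH_CLASS.get(glyph, 0) != wanted


def next_visible(glyphs: List[str], start: int, lookup_flag: int) -> Optional[int]:
    """Index of the next glyph after `start` that the lookup can match."""
    for i in range(start + 1, len(glyphs)):
        if not is_skipped(glyphs[i], lookup_flag):
            return i
    return None


syllable = ["ka", "rakar", "uMatra", "reph", "anusvara"]
above_base_flag = 2 << 8  # MarkAttachmentType = 2: only above-base marks are visible
print(syllable[next_visible(syllable, 0, above_base_flag)])  # reph
```

With the flag set to 0 the same lookup would see "rakar" immediately after the base, which is why each lookup is restricted to a single mark class.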

Left Matras in Malayalam and Tamil

In these languages the left (part of a) 'matra' is not placed in front of the whole syllable but immediately precedes the base glyph.

The problem is that, in the presence of (font-dependent) consonant conjuncts, it is impossible to predict where the 'matra' should be reordered so that consonant conjunct ligatures do not have to "skip over" it.

Although the Tamil script uses only one consonant conjunct (KSSA), conjuncts are in abundance in Malayalam.

To solve the problem, Uniscribe always places pre-base 'matras' at the beginning of the syllable for shaping. Then, for the scripts mentioned above, Uniscribe moves the 'matra' to immediately before the base glyph at the end of the script shaping routine, so that it is placed correctly.
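The two-step approach can be sketched in Python as follows. The glyph names, the toy ligature table and the rule that the base is the last glyph of the shaped syllable are simplifying assumptions, not Uniscribe's actual data structures. The Tamil KSSA example shows why the final 'matra' position depends on the font.

```python
from typing import Dict, List, Tuple


def matra_to_syllable_start(chars: List[str], matra_index: int) -> List[str]:
    """Step 1: place the pre-base matra at the start of the syllable for shaping."""
    out = list(chars)
    out.insert(0, out.pop(matra_index))
    return out


def apply_ligatures(glyphs: List[str], ligatures: Dict[Tuple[str, ...], str]) -> List[str]:
    """Toy conjunct formation: replace each listed glyph run with its ligature."""
    out: List[str] = []
    i = 0
    while i < len(glyphs):
        for seq, lig in ligatures.items():
            if tuple(glyphs[i:i + len(seq)]) == seq:
                out.append(lig)
                i += len(seq)
                break
        else:  # no ligature starts at position i
            out.append(glyphs[i])
            i += 1
    return out


def matra_before_base(glyphs: List[str], base_index: int) -> List[str]:
    """Step 2 (Malayalam, Tamil): move the matra from position 0 to just before the base."""
    out = list(glyphs)
    matra = out.pop(0)
    out.insert(base_index - 1, matra)  # the base shifted left by one after the pop
    return out


def shape_syllable(chars: List[str], matra_index: int,
                   ligatures: Dict[Tuple[str, ...], str]) -> List[str]:
    glyphs = matra_to_syllable_start(chars, matra_index)
    # The consonants are now contiguous, so conjunct ligatures need not skip the matra.
    glyphs = [glyphs[0]] + apply_ligatures(glyphs[1:], ligatures)
    # Simplification: the base is assumed to be the last glyph of the syllable.
    return matra_before_base(glyphs, len(glyphs) - 1)


kssa_e = ["ka", "virama", "ssa", "eMatra"]  # logical order, hypothetical glyph names
print(shape_syllable(kssa_e, 3, {("ka", "virama", "ssa"): "k_ssa"}))  # ['eMatra', 'k_ssa']
print(shape_syllable(kssa_e, 3, {}))  # ['ka', 'virama', 'eMatra', 'ssa']
```

With a KSSA ligature the 'matra' ends up in front of the whole conjunct; without one it lands between the visible halant and SSA. Deferring step 2 until after shaping avoids having to guess which case applies.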

Chillaksharams in Malayalam

Some consonants in Malayalam have more than one way of representing a consonant followed by halant (chandrakkala). These forms are distinct from the parent consonant and have no visible halant. They appear only at non-initial or final consonant positions in syllables. Known as chillaksharams, they are treated as halant forms by the shaping engine. The consonants identified as having a chillaksharam form are KA, NNA, NA, RA, LA and LLA; their respective chillaksharams are IK, INN, IN, IR, IL and ILL.
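As an illustration only, the Python sketch below maps a Malayalam consonant + VIRAMA (U+0D4D) pair to the chillaksharam glyph named above. The glyph names ("uniXXXX" and the chillaksharam names) are hypothetical, and for brevity the sketch substitutes only when the pair ends the word and is not at its start; a real shaping engine works on syllables and applies the font's halant-form lookups.

```python
from typing import Dict, List

VIRAMA = "\u0d4d"  # MALAYALAM SIGN VIRAMA (the halant)

# Consonant code point -> chillaksharam glyph name (names from the text above).
CHILLAKSHARAMS: Dict[str, str] = {
    "\u0d15": "IK",   # KA
    "\u0d23": "INN",  # NNA
    "\u0d28": "IN",   # NA
    "\u0d30": "IR",   # RA
    "\u0d32": "IL",   # LA
    "\u0d33": "ILL",  # LLA
}


def to_glyph_names(word: str) -> List[str]:
    """Map each character to a hypothetical 'uniXXXX' glyph name, then replace a
    word-final, non-initial consonant + virama pair by its chillaksharam."""
    names = [f"uni{ord(ch):04X}" for ch in word]
    if len(word) >= 3 and word[-1] == VIRAMA and word[-2] in CHILLAKSHARAMS:
        names[-2:] = [CHILLAKSHARAMS[word[-2]]]  # halant form: no visible virama
    return names


print(to_glyph_names("\u0d05\u0d35\u0d28\u0d4d"))  # ['uni0D05', 'uni0D35', 'IN']
```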

Handling invalid combining marks

Combining marks and signs that appear in text without a valid consonant base are considered invalid. Uniscribe displays these marks using the fallback rendering mechanism defined in the Unicode Standard 3.1 (section 5.12, 'Rendering Non-Spacing Marks'); that is, they are positioned on a dotted circle.

Please note that to render a sign standalone (in apparent isolation from any base), it should be applied to a space (see section 2.5, 'Combining Marks', of the Unicode Standard). Uniscribe requires a ZWJ between the space and the mark for them to combine into a standalone sign. For example, to get the shape of the I-matra without the dotted circle, type SPACE + ZWJ + I-matra.

For the fallback mechanism to work properly, an Indic OTL font should contain a glyph for the dotted circle (U+25CC). If this glyph is missing from the font, the invalid signs are displayed on the missing-glyph shape (a white box).
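The dotted-circle fallback can be approximated with the Python sketch below. It uses the standard `unicodedata` module and, as a simplification, treats any letter as a valid base; Uniscribe's real syllable validation is more detailed. A SPACE + ZWJ + mark sequence is left alone, and characters the font cannot map fall back to a hypothetical ".notdef" glyph.

```python
import unicodedata
from typing import List, Set

DOTTED_CIRCLE = "\u25cc"
ZWJ = "\u200d"


def is_mark(ch: str) -> bool:
    """Combining marks and signs (nonspacing or spacing combining)."""
    return unicodedata.category(ch) in ("Mn", "Mc")


def is_base(ch: str) -> bool:
    """Simplification: any letter (consonant or vowel) counts as a base."""
    return unicodedata.category(ch).startswith("L")


def insert_dotted_circles(text: str) -> str:
    """Put a dotted circle in front of every mark that has no valid base."""
    out: List[str] = []
    for i, ch in enumerate(text):
        if is_mark(ch):
            prev = text[i - 1] if i > 0 else ""
            standalone = text[max(0, i - 2):i] == " " + ZWJ  # SPACE + ZWJ + mark
            if not standalone and not (prev and (is_base(prev) or is_mark(prev))):
                out.append(DOTTED_CIRCLE)
        out.append(ch)
    return "".join(out)


def to_glyphs(text: str, cmap: Set[str]) -> List[str]:
    """Characters missing from the font are shown with the missing glyph."""
    return [ch if ch in cmap else ".notdef" for ch in text]


print([f"U+{ord(c):04X}" for c in insert_dotted_circles(" \u093f")])
# ['U+0020', 'U+25CC', 'U+093F']
print([f"U+{ord(c):04X}" for c in insert_dotted_circles(" \u200d\u093f")])
# ['U+0020', 'U+200D', 'U+093F']
```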

In addition to the dotted circle, other Unicode code points recommended for inclusion in any Indic font are the ZWJ (zero width joiner; U+200D), the ZWNJ (zero width non-joiner; U+200C) and the ZWSP (zero width space; U+200B).
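A font build could check for these code points with a sketch like the one below. `missing_recommended` is a hypothetical helper that takes the set of code points in the font's character map and reports which recommended ones are absent.

```python
import unicodedata
from typing import Iterable, List

# Code points recommended above for any Indic font.
RECOMMENDED = (0x25CC, 0x200D, 0x200C, 0x200B)  # dotted circle, ZWJ, ZWNJ, ZWSP


def missing_recommended(cmap_codepoints: Iterable[int]) -> List[str]:
    """List the recommended code points absent from a font's character map."""
    present = set(cmap_codepoints)
    return [f"U+{cp:04X} {unicodedata.name(chr(cp))}"
            for cp in sorted(RECOMMENDED) if cp not in present]


print(missing_recommended({0x25CC, 0x200D}))
# ['U+200B ZERO WIDTH SPACE', 'U+200C ZERO WIDTH NON-JOINER']
```

With fontTools, for instance, the code points could come from `TTFont(path).getBestCmap().keys()`.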

this page was last updated December 2001
© 2001 Microsoft Corporation. All rights reserved. Terms of use.