Microsoft Typography | Developer information | Specifications | OpenType font development
Indic OpenType Specification | Terms | Shaping | Features | Other | Appendix


Features for Indic scripts

The features listed below have been defined to create the basic forms for the scripts and languages that are supported on Indic systems. Regardless of the model an application chooses for supporting layout of complex scripts, Uniscribe requires a fixed order for executing features within a run of text to consistently obtain the proper basic form. This is achieved by calling features one-by-one in the standard order listed below.

The order of the lookups within each feature is also very important. For more information on lookups and defining features in OpenType fonts, see Encoding feature information in the OpenType font development section.

The standard order for applying Indic features encoded in OpenType fonts:
(Not all of the features listed below apply to all Indic script languages)

Feature   Feature function               Layout operation   Required

Language based forms:
nukt      Nukta form                     GSUB               X
akhn      Akhand ligature                GSUB               X
rphf      Reph form                      GSUB               X
blwf      Below-base form                GSUB               X
half      Half-form (pre-base form)      GSUB               X
pstf      Post-base form                 GSUB               X
vatu      Vattu variants                 GSUB               X

Conjuncts & typographical forms:
pres      Pre-base substitution          GSUB               X
blws      Below-base substitution        GSUB               X
abvs      Above-base substitution        GSUB               X
psts      Post-base substitution         GSUB               X

Halant forms:
haln      Halant form substitution       GSUB               X

Positioning features:
blwm      Below-base mark positioning    GPOS
abvm      Above-base mark positioning    GPOS
dist      Distances                      GPOS

[GSUB = glyph substitution, GPOS = glyph positioning]
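
As a rough illustration of the fixed order, the sketch below filters the feature tags a font provides into the sequence given in the table. The names INDIC_FEATURE_ORDER and ordered_features are hypothetical and are not part of Uniscribe or any OpenType API; tags outside the table are simply ignored here.

```python
# Standard application order, copied from the table above.
INDIC_FEATURE_ORDER = [
    "nukt", "akhn", "rphf", "blwf", "half", "pstf", "vatu",  # language based forms
    "pres", "blws", "abvs", "psts",                          # conjuncts and typographical forms
    "haln",                                                  # halant forms
    "blwm", "abvm", "dist",                                  # positioning
]


def ordered_features(font_features):
    """Return the Indic features a font provides, in the standard order."""
    available = set(font_features)
    return [tag for tag in INDIC_FEATURE_ORDER if tag in available]


# The order of the input list does not matter; the output order is fixed.
print(ordered_features(["abvm", "half", "nukt", "pres", "liga"]))
```

A font that omits a feature simply has that step skipped; the relative order of the remaining features does not change.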


Descriptions and examples of above features

Many of the registered features described and illustrated in this document are based on the OpenType font MANGAL, a Windows 2000 system font. MANGAL contains layout information and glyphs to support all of the required features for the Devanagari script and language systems supported. The MANGAL font is available for download in the Appendix of this document.

When the MANGAL font was used to produce illustrations, the "Devanagari" script was active and the Hindi language was chosen. Consequently, most feature illustrations show the basic form in Hindi when the feature is applied. Some illustrations based on other scripts (Gujarati, Gurmukhi, Kannada, Malayalam, Oriya, and Tamil) are also included, though these scripts are not part of the MANGAL font.

Applications may need to display shaped and positioned clusters for an entire string of text, or for a string of text as it is being typed. In order to demonstrate how the basic form is arrived at, most of the illustrations in this document show how displayed text changes incrementally as each new character is input, and the entire cluster is reshaped.


Nukta

Feature Tag: "nukt"

This feature takes nominal (full) forms of consonants and produces nukta forms:

Cf + Nukta -> Kf-nukta

The nukta alters the way a preceding consonant is pronounced. Many of the nukta forms have been defined as separate characters in Unicode, with their own code points.

All nukta forms must be based on an input context consisting of the full form of consonants. All consonants in a font must have an associated nukta form, and nukta forms must exist in the font for all glyphs with akhand forms as well.
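
As a small illustration of the point about nukta forms with their own code points, Python's standard unicodedata module can show that Unicode treats DEVANAGARI LETTER QA (U+0958) as canonically equivalent to KA (U+0915) followed by the nukta sign (U+093C). How a shaping engine maps either spelling to glyphs is outside the scope of this sketch.

```python
import unicodedata

QA = "\u0958"               # DEVANAGARI LETTER QA, a nukta form with its own code point
KA_NUKTA = "\u0915\u093C"   # DEVANAGARI LETTER KA followed by DEVANAGARI SIGN NUKTA

# Canonical decomposition (NFD) splits the single character into consonant + nukta.
decomposed = unicodedata.normalize("NFD", QA)

print([f"U+{ord(ch):04X}" for ch in decomposed])
print(decomposed == KA_NUKTA)
```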

As a user types each character, the text-processing application reshapes the displayed glyph or glyph cluster, as illustrated below. The example below uses the Devanagari script.

Nukta feature applied:


Akhand

Feature Tag: "akhn"

This feature creates an akhand ligature glyph from two consonants in nominal forms separated by a halant:

Cf + H + Cf -> Af

The input context for the akhand feature always consists of the full form of the consonant.
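
The substitution above can be sketched as a simple three-glyph ligature replacement over a list of glyph names. The glyph names and the AKHAND_LIGATURES table are hypothetical; a real font encodes this as a GSUB ligature lookup with its own glyph IDs.

```python
# Hypothetical glyph names for two Devanagari akhand ligatures.
AKHAND_LIGATURES = {
    ("ka", "halant", "ssa"): "k_ssa",
    ("ja", "halant", "nya"): "j_nya",
}


def apply_akhand(glyphs):
    """Replace each Cf + H + Cf sequence listed in the table with its akhand glyph."""
    out, i = [], 0
    while i < len(glyphs):
        key = tuple(glyphs[i:i + 3])
        if key in AKHAND_LIGATURES:
            out.append(AKHAND_LIGATURES[key])
            i += 3  # consume consonant, halant and consonant
        else:
            out.append(glyphs[i])
            i += 1
    return out


print(apply_akhand(["ka", "halant", "ssa", "aa_matra"]))
```

Sequences that are not in the table pass through unchanged and remain available to later features such as 'half'.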

The example below uses the Devanagari script.
Two examples of the Akhand feature applied:


Reph

Feature Tag: "rphf"

Applying this feature produces the reph glyph:

Ra + H -> Reph

If the first consonant of the cluster is the full form of Ra followed by Halant, this feature substitutes the combining-mark form of Reph. In addition, the glyph that represents the combining-mark form of Reph is repositioned in the glyph string: it is attached to the final base glyph of the consonant cluster.

The input context for the Reph feature always consists of the full form of Ra + Halant.
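
A minimal sketch of the substitution and repositioning described above, under simplifying assumptions: glyphs are plain names, the CONSONANTS set is a hypothetical and incomplete list, the last consonant in the cluster is taken to be the base, and matras and other marks are ignored. The function name apply_reph is illustrative only.

```python
CONSONANTS = {"ka", "ta", "ma", "ra", "ya"}  # hypothetical, not exhaustive


def apply_reph(glyphs):
    """Substitute a leading Ra + Halant with 'reph' and move it after the base."""
    if glyphs[:2] != ["ra", "halant"]:
        return list(glyphs)
    rest = glyphs[2:]
    consonant_positions = [i for i, g in enumerate(rest) if g in CONSONANTS]
    if not consonant_positions:
        # Nothing follows Ra + Halant, so Ra itself is the base: no reph is formed.
        return list(glyphs)
    base = consonant_positions[-1]  # assumption: last consonant is the base
    return rest[:base + 1] + ["reph"] + rest[base + 1:]


print(apply_reph(["ra", "halant", "ka"]))
print(apply_reph(["ra", "halant", "ta", "halant", "ya"]))
```

In the second call the reph ends up after the last consonant of the longer cluster, not after the consonant that immediately followed Ra.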

The example below uses the Devanagari script.
Mark glyph form substituted and repositioned:

In the following illustration, a longer cluster is formed, and the mark glyph form of the "RA" syllable is repositioned at the last consonant of the consonant cluster.

Reph feature applied with multiple consonants:


Below-base form

Feature Tag: "blwf"

Applying this feature produces below-base forms of consonants. These forms include Devanagari 'Ra'; Gujarati 'Ra'; Gurmukhi 'Ra', 'Ha' and 'Wa'; and most of the consonant forms in Oriya, Malayalam and Kannada/Telugu.

Kf + H -> Ks

The input context for the 'below-base form' feature must always consist of the full form of the consonant + Halant.

Note that for Devanagari and Gujarati the only consonant with a below-base form is Ra. Thus this feature produces the vattu glyph for those scripts.

The feature 'below-base form' is applied to consonants having below-base forms and following the base consonant. The exception is vattu, which may appear below half forms as well as below the base glyph. The feature 'below-base form' will be applied to all such occurrences of Ra as well.

If a ligature is required between the vattu glyph and the preceding consonant, it will be handled by the feature 'Vattu Variants'.

As a user types each character, the text-processing application reshapes the displayed glyph or glyph cluster, as illustrated below.

The example below uses the Devanagari script.

The example below uses the Oriya script.


Half form of consonant

Feature Tag: "half"

Applying this feature gives us half forms - forms of consonants used in pre-base position. Half forms must exist for all consonants in the font, and half forms of nukta consonants and Akhand consonants also must exist. Use the halant form for consonants that do not have distinct shapes for half forms.

Kf + H -> Kh

This feature is not applied to the base glyph even if the syllable ends with a halant.
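
The rule that half forms apply only to pre-base consonants, never to the base glyph, can be sketched as follows. The glyph names, the ".half" suffix and the CONSONANTS set are hypothetical, and the last consonant of the syllable is assumed to be the base.

```python
CONSONANTS = {"ka", "sa", "ta", "ya"}  # hypothetical, not exhaustive


def apply_half(glyphs):
    """Apply Kf + H -> Kh to every consonant that precedes the base consonant."""
    positions = [i for i, g in enumerate(glyphs) if g in CONSONANTS]
    if not positions:
        return list(glyphs)
    base = positions[-1]  # assumption: last consonant is the base
    out, i = [], 0
    while i < len(glyphs):
        g = glyphs[i]
        is_prebase = g in CONSONANTS and i < base
        if is_prebase and i + 1 < len(glyphs) and glyphs[i + 1] == "halant":
            out.append(g + ".half")
            i += 2  # consume the consonant and its halant
        else:
            out.append(g)
            i += 1
    return out


print(apply_half(["sa", "halant", "ta", "halant", "ya"]))
print(apply_half(["ka", "halant"]))
```

The second call leaves the base consonant and its halant untouched, which is the case the halant-form feature handles later.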

The example below uses the Devanagari script.
Example 1, "Half form of Consonant" feature applied (shaded box)

Example 2, "Half form of Consonant" feature applied (shaded boxes)

Example 3, "Half form of Consonant" feature applied (shaded boxes)


Post-base form of consonant

Feature Tag: "pstf"

Applying this feature gives us post-base forms. Examples include Bengali and Oriya 'Ya', and Malayalam 'Ya' and 'Va'.

Kf + H -> Kp

The example below uses the Oriya script.


Vattu variants

Feature Tag: "vatu"

Vattu variants are formed when combining consonants with the vattu mark. Vattu ligatures can be either half or full form, and fonts must contain both.

Kh + V -> Lh-vattu

Kf + V -> Lf-vattu

Vattu is a below-base form of Ra that can occur (and form ligatures) anywhere in the syllable and not just after the base glyph. Since it is an exception we treat it here as such.

The input context for the 'vattu variants' feature must always consist of a consonant (in full or half form) + vattu glyph.

Very often a specific, context-dependent shape is required when the vattu mark combines with other consonants. We advise font developers to form ligatures of the consonant with the vattu glyph in all such cases, with the lookup formats shown above. An additional example in Devanagari can be seen in the Below-base Form section.
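
Because vattu ligatures are expected in both full and half form, a font build script might check that each ligature has its counterpart. The sketch below is one way to do that; the "_vattu" and ".half" naming convention and the function name are assumptions made for illustration, not a convention defined by this specification.

```python
def missing_vattu_ligatures(consonants, glyph_set):
    """List the full-form and half-form vattu ligature names absent from glyph_set."""
    missing = []
    for consonant in consonants:
        full = f"{consonant}_vattu"          # hypothetical name for Lf-vattu
        half = f"{consonant}_vattu.half"     # hypothetical name for Lh-vattu
        for name in (full, half):
            if name not in glyph_set:
                missing.append(name)
    return missing


glyphs_in_font = {"pa_vattu", "pa_vattu.half", "ka_vattu"}
print(missing_vattu_ligatures(["pa", "ka"], glyphs_in_font))
```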

The example below uses the Gurmukhi script.

The example below uses the Devanagari script.
Vattu ligature feature applied (shaded boxes)


Conjuncts and typographical forms

Feature Tags: "pres", "blws", "abvs", "psts"

All previous features have dealt with language features only, dedicated to forming glyph shapes dictated by the languages. The remaining shaping features cover optional features. Although it is hard to imagine a Devanagari font without any consonant conjuncts encoded within it, almost none are, strictly speaking, required. In fact, different fonts may contain different subsets.

Thus the range of features covered here spans from those that will exist in every font to rare typographical ornaments. It is important to stress once more, however, that all features discussed here operate only within one orthographic syllable.

Since the language features do not limit typographical processing here, Uniscribe passes the entire syllable to the OTL Services library. Uniscribe does not strictly specify the format of lookup tables to use or their inputs, allowing for context-dependent processing of any of the conjuncts and forms below.

The OTL Services library processes the syllable "left to right", executing lookups in the order in which they are specified in the font. Pre-base substitutions are handled first, then below-base, above-base and post-base ones.

Thus a font developer should first take care of all ligatures to the left of the base glyph and then work to the right, substituting below-base, above-base and finally post-base elements. The lookups in the font should be ordered in the same way.

With every new element and feature, the following operations should be considered, as appropriate, in this order:

  • Ligatures with the base glyph
  • Ligatures with preceding (in the canonical syllable form below) elements, and
  • Contextual forms of the element

At every feature step, one should take into account all ligatures and forms that were produced by previous steps.

In general, at this point the syllable being shaped will have one of the following forms:


For Devanagari, Bengali and Gujarati:

[Mpre] + {Kh + [V]} + Kf +

[V] + [Mbelow] + [VMbelow] + [SMbelow] + [Mabove] +

[Mpost] + [Reph] + [VMabove] + [SMabove] + [VMpost]

(Only one of Mpre, Mbelow, Mabove or Mpost can be present)
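
The Devanagari, Bengali and Gujarati form above can be checked mechanically with a regular expression over a sequence of class names. This is only a sketch: it assumes that square brackets mean "optional" and braces mean "repeated zero or more times", and the function name is hypothetical. The "only one matra" restriction is checked separately from the pattern.

```python
import re

# One class name per token, each followed by a single space.
FORM = (
    r"(?:Mpre )?(?:Kh (?:V )?)*Kf (?:V )?"
    r"(?:Mbelow )?(?:VMbelow )?(?:SMbelow )?(?:Mabove )?"
    r"(?:Mpost )?(?:Reph )?(?:VMabove )?(?:SMabove )?(?:VMpost )?"
)
PATTERN = re.compile(FORM)
MATRAS = {"Mpre", "Mbelow", "Mabove", "Mpost"}


def matches_devanagari_form(classes):
    """Return True if the class sequence fits the syllable form given above."""
    text = "".join(name + " " for name in classes)
    if PATTERN.fullmatch(text) is None:
        return False
    # Only one of Mpre, Mbelow, Mabove or Mpost can be present.
    return sum(name in MATRAS for name in classes) <= 1


print(matches_devanagari_form(["Kh", "V", "Kf", "Mpost", "Reph"]))
print(matches_devanagari_form(["Kf", "Mbelow", "Mabove"]))
```

The second sequence is in the right order but contains two matras, so it is rejected by the separate count.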


For Gurmukhi:

[Mpre] + {Kh} + Kf +

[Ks] + [Mbelow] +

[Mabove] + [Kp] + [Mpost] + [VMabove]

(Only one of Mpre, Mbelow, Mabove or Mpost can be present)


For Oriya:

{Kh} + [Mpre] + Kf +

{Ks} + [Mbelow] +

[Mabove] + [Reph] + [VMabove] + [SMabove] +

[Kp] + [Mpost] + [VMpost]

(Out of Mpre, Mbelow, Mabove or Mpost at most two can be present)


For Reformed Malayalam:

{Kh} + [Mpre1]* + [Mpre2]* + [Rh]* + Kf +

{Ks} + [Mbelow] +

[Kp] + [Mpost] + [VMpost]

(Out of Mpre1, Mpre2, Mbelow or Mpost different combinations can be present)


For Traditional Malayalam:

{Kh} + [Mpre1]* + [Mpre2]* + [Rh]* + Kf +

{Ks} + [Mbelow] + [Reph] +

[Kp] + [Mpost] + [VMpost]

(Out of Mpre1, Mpre2, Mbelow or Mpost different combinations can be present)


For Tamil:

{Kh} + [Mpre]* + Kf + [Mabove] + [Mpost] + [VMpost]

(Out of Mpre, Mabove and Mpost different combinations can be present)


For Telugu:

{Kh} + Kf + [Mabove] + [Mbelow] + [Mpost] + {Ks} +

{Kp} + [LMpost] + [VMpost]

(Out of Mabove, Mbelow, Mpost and LMpost different combinations can be present)


For Kannada:

{Kh} + Kf + [Mabove] + [Mpost] + {Ks} +

{Kp} + [LMpost] + [Reph] + [VMpost]

(Out of Mabove, Mbelow, Mpost and LMpost different combinations can be present)

* Will be reordered at syllable start for shaping.

In the absence of a vowel we have

{Kh} + Kf + [V] + H

Finally, a syllable with independent vowel will look like

VO + [VM1] + [SM]


Pre-base substitutions

Feature Tag: "pres"

Pre-base consonant conjuncts

This feature produces conjuncts with half forms, the type most common in Devanagari. The examples below are common, but you may define lookups for other forms as well.

{Kh} + Kf -> Lf

{Kh} + Lf -> Lf

If forms of pre-base consonants need to be changed (e.g. changing a half form to a halant form in a certain context) it is handled with this feature as well.

The example below uses the Devanagari script.

The example below uses the Gujarati script.

Pre-base Matra conjuncts

This feature produces the correct shape of I-Matra (in Devanagari and similar scripts) and also may take care of pre-base matra ligatures like Tamil 'elephant trunk' shape of AI-Matra. For example,

Mpre + {Kh} + Kf -> correct form of Mpre or a ligature

Mpre + {Kh} + Lf -> correct form of Mpre or a ligature

The example below uses the Devanagari script.


Below-base substitutions

Feature Tag: "blws"

Below-base consonant conjuncts

This feature produces conjuncts of the base glyph with below-base consonants. For example,

Kf + {Ks} -> Lf

Specific context-dependent forms of below-base consonants are handled by this lookup as well.

The example below uses the Malayalam script.


Below-base Matra conjuncts

This feature produces matra ligatures with the base consonants. For example,

Kf + Msub -> Ligature

Lf + Msub -> Ligature

In the presence of below-base consonants, the below-base matra can be used for a ligature with them or change shape as well.

The example below uses the Devanagari script.


Below-base Stress and Tone Marks

This feature produces the correct form of signs like anudatta, depending on context.


Above-base substitutions

Feature Tag: "abvs"

Above-base Matra Conjuncts

This feature produces the correct typographic shape when an above-base matra forms a ligature with the base glyph.

The example below uses the Kannada script.


Reph conjuncts

This feature produces conjuncts of the base glyph or matra with Reph.

The example below uses the Devanagari script.


Above-base vowel modifiers

This feature produces ligatures and forms involving above-base vowel modifiers.

The example below uses the Devanagari script.

Above-Base Stress and Tone Marks

This feature produces the correct form of signs above the base glyph. These signs include the udatta, acute and grave depending on context.


Post-base substitutions

Feature Tag: "psts"

Post-base consonant conjuncts

This feature produces ligatures of the base glyph with post-base forms of consonants.

The example below uses the Malayalam script.


Post-base Matra conjuncts

This feature produces the correct typographic shape when a post-base matra forms a ligature with the base glyph (as Tamil Uu-Matra would do).

Note: This feature will be executed prior to any 'above-base' features in scripts that show similarity to Devanagari (because of different ordering).

The example below uses the Tamil script.


Post-base vowel modifiers

This feature produces different forms of post-base vowel modifiers, one of which is the visarga.

The example below uses the Devanagari script.


Halant form of consonants

Feature Tag: "haln"

This feature produces the halant form of the base glyph in syllables ending with a halant.

Kf + H -> Khalant

Lf + H -> Lhalant

One can also realize halant forms by positioning the halant as a below-base mark on the base glyph.

This feature is applied only on the base glyph and the following halant.
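
A minimal sketch of this rule, assuming glyphs are plain names and that the font's halant forms are given as a hypothetical mapping from base glyph to halant-form glyph. When the mapping has no entry for the base, the halant is left in place so that it can be positioned as a below-base mark, as noted above.

```python
def apply_haln(glyphs, halant_forms):
    """Replace a syllable-final base + halant with the base's halant form, if any."""
    ends_with_halant = len(glyphs) >= 2 and glyphs[-1] == "halant"
    if ends_with_halant and glyphs[-2] in halant_forms:
        return glyphs[:-2] + [halant_forms[glyphs[-2]]]
    # No halant form in the font: keep the halant glyph for mark positioning.
    return list(glyphs)


forms = {"ta": "ta.halant"}  # hypothetical glyph names
print(apply_haln(["ka.half", "ta", "halant"], forms))
print(apply_haln(["da", "halant"], forms))
```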

The example below uses the Devanagari script.
"Halant Form of Consonant" feature applied (shaded box)


In scripts like Malayalam, the halant form of certain consonants is represented by 'chillaksharams'. These can appear at any non-initial or final consonant location in a syllable.

The example below is the chillaksharam for consonant NNA:



Below-base marks

Feature Tag: "blwm"

This feature positions all below-base marks on the base glyph. The best method for encoding this feature in an OpenType font is to use a chaining context positioning lookup that triggers mark-to-base and mark-to-mark attachments for below-base marks.
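
The arithmetic behind mark-to-base and mark-to-mark attachment can be sketched as follows. The sketch assumes that every anchor is given in font units relative to its own glyph's origin and that all glyph origins start at the same pen position; the anchor values and names are invented for illustration and do not come from any font.

```python
def attach(parent_anchor, mark_anchor):
    """Offset that moves the mark so its anchor lands on the parent's anchor."""
    px, py = parent_anchor
    mx, my = mark_anchor
    return (px - mx, py - my)


def add(a, b):
    """Component-wise sum of two (x, y) offsets."""
    return (a[0] + b[0], a[1] + b[1])


# Invented anchor values, in font units.
base_below = (300, -20)      # below-base anchor on the base glyph
nukta_attach = (50, 0)       # attachment anchor on the nukta
nukta_below = (50, -120)     # anchor on the nukta for a further mark
u_matra_attach = (80, 0)     # attachment anchor on a below-base matra

# Mark-to-base: the nukta attaches to the base glyph.
nukta_offset = attach(base_below, nukta_attach)

# Mark-to-mark: the matra attaches to the nukta, which has already been moved.
u_matra_offset = add(nukta_offset, attach(nukta_below, u_matra_attach))

print(nukta_offset, u_matra_offset)
```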

The example below uses the Devanagari script.
"Below-base marks" feature applied (shaded box)


Above-base marks

Feature Tag: "abvm"

This feature positions all above-base marks on the base glyph or the post-base matra. The best method for encoding this feature in an OpenType font is to use a chaining context positioning lookup that triggers mark-to-base and mark-to-mark attachments for above-base marks.

The example below uses the Devanagari script.

"Above-base marks" feature applied (shaded box)


Distances

Feature Tag: "dist"

This feature covers all other positioning lookups defining various distances between glyphs, such as kerning between pre- and post-base elements (like Visarga) and the base glyph.
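
As an illustration of a distance adjustment, the sketch below lays out glyphs along the baseline and adds a pair-specific adjustment between neighbours, in the manner of kerning. The advance widths, the adjustment value and the function name are invented for the example.

```python
def layout(glyphs, advances, pair_adjust):
    """Return the x position of each glyph after applying pair adjustments."""
    x, positions = 0, []
    for i, glyph in enumerate(glyphs):
        positions.append(x)
        x += advances[glyph]
        if i + 1 < len(glyphs):
            # Extra (or reduced) distance between this glyph and the next one.
            x += pair_adjust.get((glyph, glyphs[i + 1]), 0)
    return positions


advances = {"ka": 600, "visarga": 250}       # invented advance widths
pair_adjust = {("ka", "visarga"): 40}        # invented distance adjustment
print(layout(["ka", "visarga"], advances, pair_adjust))
```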


More samples of Indic syllables

The illustrations in this section show the interaction between some of the components that together form an Indic syllable.

Complex syllables can be formed using the wide range of features available in OpenType. The formation of two such syllables is illustrated below.

The examples below use the Devanagari script.



this page was last updated December 2001
© 2001 Microsoft Corporation. All rights reserved. Terms of use.