Developing fonts > Specifications

Developing OpenType Fonts
for Bengali Script (2 of 3):
Shaping Engine

Please note: Microsoft's OpenType font specifications for Indic scripts have been revised. This version is for reference purposes only. An updated version of the Bengali specification can be found here.

The Bengali shaping engine of Uniscribe processes text in stages. The stages are:

  1. Analyze the syllables
  2. Reorder characters
  3. Shape (substitute) glyphs with OTLS (OpenType Library Services)
  4. Position glyphs with OTLS

The descriptions which follow will help font developers understand the rationale for the Bengali feature encoding model, and help application developers better understand how layout clients can divide responsibilities with operating system functions.


Analyze the syllables

The syllable unit that the shaping engine receives for the purpose of shaping is a string of Unicode characters, in a sequence. These are not necessarily positioned within the sequence as they appear when composed into a syllable. Also a few of these units could have a priority in combining within the syllable to form another graphically distinct unit (e.g. akhand KSSA).

First, the shaping engine determines syllable boundaries and isolates certain parts and properties of a syllable. To be able to place syllable elements in required positions and find out the possibility of them combining to form graphically distinct units, the shaping engine analyzes the syllable and gets answers to questions like:

  • Is there a 'reph' formation in the syllable? A syllable starting with letter "Ra" + Halant may be flagged as a syllable containing a 'reph'.

  • Which consonants can form 'vattu' variants?

  • Given a consonant within the syllable, does it qualify as a pre-base/below-base/post-base form? Does it qualify as a 'halant' form?

Next, the base (full-form) consonant of the syllable is identified. All other elements are classified by their position relative to the base: pre-base, below-base, above-base and post-base.

Then the Bengali shaping engine splits matras that have components appearing on more than one side of the base glyph into the corresponding parts (pre-base, below-base, above-base or post-base parts).


Reorder characters

Uniscribe creates and manages a buffer of appropriately reordered character codes, delineated as "clusters." Uniscribe reorders character codes within clusters according to several rules (described below). Then, Uniscribe obtains the corresponding glyph string by passing the reordered character string to the glyph substitution function of the OTL Services.

Because glyph strings are obtained from reordered character strings, the features in an Bengali font must be encoded to map reordered characters (and combinations of characters) to their corresponding glyphs. Consequently, font developers are relieved of several layers of complexity in defining features - allowing Uniscribe to perform standard character reordering operations.

The character reordering rules of the Uniscribe Bengali shaping engine are described below. None of the rules need to be encoded in an OpenType font, as long as the font is to be used with Uniscribe (or another client that follows the Unicode Standard for character reordering). In fact, if a font developer attempted to encode such reordering information in an OpenType font, they would need to add a huge number of many-to-many glyph mappings to cover the very simple algorithms that Uniscribe uses.

Uniscribe always performs reordering operations in a specified order, as described below.

Starting with a syllable of one of the following forms:

{C + [Nukta] + H} + C + [M] + [VM] + [SM]

...or a syllable without vowels

{C + [Nukta] + H} + C + H

...or a syllable without consonants

VO + [VM] + [SM]

  1. The shaping engine finds the base consonant of the syllable, using the following algorithm: starting from the end of the syllable, move backwards until a consonant is found that does not have a below-base or post-base form (post-base forms have to follow below-base forms), or arrive at the first consonant. The consonant stopped at will be the base.

    • If the syllable starts with Ra + H, Ra is excluded from candidates for base consonants.

  2. If the base consonant is not the last one, Uniscribe moves the halant from the base consonant to the last one.

  3. If the syllable starts with Ra + H, Uniscribe moves this combination so that it follows the base consonant.

  4. Uniscribe splits two-part matras into their parts. This splitting is a character-to-character operation). Then Uniscribe moves the left 'matra' part to the beginning of the syllable.

  5. Uniscribe classifies consonants and 'matra' parts as pre-base, above-base (Reph), below-base or post-base. This classification exists on the character code level and is language-dependent, not font-dependent.

  6. Uniscribe then groups elements of the syllable (consonants and 'matras') according to this classification. Pre-base elements will precede the base consonant. The above-base, below-base and post-base components will follow the base glyph.

    • If the syllable starts with Ra + H, Uniscribe moves this combination so that it follows the base consonant

      The Reph (Ra + H) and above base vowel modifiers will be positioned in the syllable before post base consonants and post based matras (if any); since these usually become marks on the base glyph.

    • Below-base Ra (vattu) will be positioned following the consonants on which it is placed (which could either be the base consonant or one of the pre-base consonants).

    • 'Halants' and 'nukta' marks are moved with the consonants they affect.

After performing the character reordering steps, the sequence of characters will have one of the following forms:

[Mpre] + {Cpre + [Nukta] + H} +Cbase +

{Cbelow + H} + [Mbelow] +

[Mabove] + [Ra + H]reph + [VMabove] +

[Cpost + H] + [Mpost] + [VMpost]

(Out of Mpre, Mbelow, Mabove or Mpost at most two can be present)


In the absence of a vowel, we'll have

{Cpre + [Nukta] + H + [Ra + H]vattu} + Cbase + [Ra + H]vattu + H

Finally, a syllable with independent vowel will look like

VO + [VM1] + [VM2]


Shape glyphs with OTLS

The first step Uniscribe takes in shaping the character string is to map all characters to their nominal form glyphs. Then, Uniscribe applies contextual shape features to the glyph string.

Next, Uniscribe calls the OTL Services Library to shape the Bengali syllable. All OTL processing is divided into a set of predefined features (described and illustrated in the Features section of this document). Each feature is applied, one by one, to the appropriate glyphs in the syllable and OTLS processes them. Uniscribe makes as many calls to the OTL Services as there are features. This ensures that the features are executed in the desired order.

The steps of the shaping process are outlined below. Not all of the features listed apply to all Bengali script languages.

Shaping features:

  1. Language forms
    1. Apply feature 'init' to get initial forms of consonants.
    2. Apply feature 'nukt' to get nukta forms of consonants.
    3. Apply feature 'akhn' to get akhand ligatures.
    4. Apply feature 'rphf' to Ra and the following halant to get the reph glyph.
    5. Apply feature 'blwf' to get below-base forms of consonants.
    6. Apply feature 'half' to get half forms of pre-base consonants.
    7. Apply feature 'pstf' to get post-base forms of consonants.
    8. Apply feature 'vatu' to get ligatures of the below-base form of 'Ra' and the preceding consonant.

  2. Conjuncts and Typographical forms
    1. Apply feature 'pres' to get pre-base consonant conjuncts and pre-base matra conjuncts. (ie. consonant and matra conjuncts to the left of the base glyph).
    2. Apply feature 'abvs' to get above-base matra conjuncts; reph conjuncts; above-base vowel modifiers; and above-base stress and tone marks. (ie. reph and matra conjuncts, typographical forms and vowel modifier forms of above-base elements).
    3. Apply feature 'blws' to get below-base consonant conjuncts; below-base matra conjuncts; below-base vowel modifier forms; and below-base stress and tone mark forms. (ie. consonant and matra conjuncts; typographical forms; vowel modifier forms; and stress and tone mark forms of below-base elements).
    4. Apply feature 'psts' to get post-base consonant conjuncts, post-base matra conjuncts and post-base vowel modifiers. (ie. consonant and 'matra' conjuncts, typographical forms and vowel modifier forms of post-base elements).

      Note: The post-base matra goes before the above-base part. Thus causing the 'Post-base Substitutions' feature to be executed before any above-base features.

  3. Halant form
    1. Apply feature 'haln' to put the base consonant in halant form (if the syllable ends with a halant).

      Note: The halant substitution is performed last to ensure that the base consonant is always in the full form during shaping.


Position glyphs with OTLS

Uniscribe next applies features concerned with positioning, calling functions of OTLS to position glyphs.

Positioning features:

  1. Above-base marks
    1. Apply feature 'abvm' to position above-base forms, vowel modifiers and or stress/tone marks (on base glyph or post-base matra).

  2. Below-base marks
    1. Apply feature 'blwm' to position below-base forms, vowel modifiers and or stress/tone marks.

  3. Distances
    1. Apply feature 'dist' to adjust other distances. (e.g. to provide kerning between post and pre-base elements and the base glyph).


Base elements

Commonly, a feature is required for dealing with the base glyph and one of the post-base, pre-base or above-base elements. Since it is not possible to reorder ALL of these elements next to the base glyph, we need to skip over the elements "in the middle" (reordering-wise).

The solution is to assign different mark attachment classes to different elements of the syllable and positional forms, and in any given lookup work with one mark type only. For example, in above-base substitutions we need only consider above-base elements most of the time.

Generally, it is good practice to label as "mark" glyphs that are denoted as marks in the Unicode Standard as well as below-base/above-base forms of consonants. Then, different attachment classes should be assigned to different marks depending on their position with respect to the base.


Invalid combining marks

Combining marks and signs that appear in text not in conjunction with a valid consonant base are considered invalid. Uniscribe displays these marks using the fallback rendering mechanism defined in the Unicode Standard (section 5.12, 'Rendering Non-Spacing Marks' of the Unicode Standard 3.1), i.e. positioned on a dotted circle.

Please note that to render a sign standalone (in apparent isolation from any base) one should apply it on a space (see section 2.5 'Combining Marks' of the Unicode Standard). Uniscribe requires a ZWJ to be placed between the space and a mark for them to combine into a standalone sign. (ie. to get a shape of I-matra without the dotted circle one should type + ZWJ + I-matra).

For the fallback mechanism to work properly, an Bengali OTL font should contain a glyph for the dotted circle (U+25CC). In case this glyph is missing from the font, the invalid signs will be displayed on the missing glyph shape (white box).

In addition to the 'dotted circle' other Unicode code points that are recommended for inclusion in any Bengali font are the ZWJ (zero width joiner; U+200C), the ZWNJ (zero width non-joiner; U+200D) and the ZWSP (zero width space; U+200B).

Next section:  Features

introduction | shaping engine | features | appendices

Top of page