Microsoft Typography | Developer information | Specifications | OpenType font development
Khmer OpenType Specification | Terms | Shaping | Features | Other | Appendix


How the Khmer shaping engine works

The Uniscribe Khmer shaping engine processes text in stages. The stages are:

  1. Analyzing the syllables and reordering characters.
  2. Shaping (substituting) glyphs with OTLS (OpenType Library Services).
  3. Positioning glyphs with OTLS.

The descriptions that follow will help font developers understand the rationale for the Khmer feature encoding model, and help application developers better understand how layout clients can divide responsibilities with operating system functions.


Analyzing the syllables and reordering characters

All Khmer syllables begin with a consonant, independent vowel, or number. The following should be considered as canonical ordering for Khmer Unicode input. The ordering is in the same order that the Khmer syllable is formed and produces the correct sort/search order. Any device using Khmer Unicode should use this input sequence order to correctly handle Khmer text. One complex and two simple constructs are elaborated below.

Canonical ordering
Although some formed sequences can be produced by entering individual glyphs, the Unicode characters that are input must be in a correctly defined and consistent order for sorting and searching mechanisms to work. For example, a person might try to enter a syllable with U+17C1 and U+17B6, thinking that the result looks the same as inserting U+17C4. However, the meaning is very different. Any device, such as a text-to-speech system, that requires correct characters for correct output would not consider these the same. More importantly, the user's attempt to use U+17C1 and U+17B6 in the same syllable would break the rule of having only one vowel character per syllable.
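The look-alike input described above can be detected mechanically. The following is a minimal sketch, not part of the shaping engine; the function name is hypothetical, and it only catches the case where U+17C1 and U+17B6 are directly adjacent.

```python
# The two-character sequence a user might type in place of U+17C4.
LOOKALIKE = "\u17C1\u17B6"


def find_lookalike_oo(text):
    """Return the index of every adjacent U+17C1 U+17B6 pair in text."""
    hits = []
    start = text.find(LOOKALIKE)
    while start != -1:
        hits.append(start)
        start = text.find(LOOKALIKE, start + 1)
    return hits
```

A caller could use the returned indexes to warn the user rather than silently rewriting the text, since the two encodings do not mean the same thing.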

Syllables beginning with consonants
Consonant based syllables are formed in the following order:

Cons + {COENG + (Cons | IndV)} + [PreV | BlwV] + [RegShift] + [AbvV] + {AbvS} + [PstV] + [PstS]
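The formula can be read as a regular expression over character classes. The sketch below assumes that {...} means "zero or more" and [...] means "optional"; it works on class names rather than code points, because the class tables are defined elsewhere in the specification. The single-letter codes and the function name are invented for this example.

```python
import re

# Single-letter codes for the classes named in the formula.
# The codes are arbitrary and exist only for this sketch.
CLASS_CODES = {
    "Cons": "C", "IndV": "I", "COENG": "J",
    "PreV": "P", "BlwV": "B", "RegShift": "R",
    "AbvV": "A", "AbvS": "S", "PstV": "V", "PstS": "T",
}

# Cons + {COENG + (Cons | IndV)} + [PreV | BlwV] + [RegShift]
#      + [AbvV] + {AbvS} + [PstV] + [PstS]
# Assumption: {...} is zero or more, [...] is optional.
SYLLABLE = re.compile(r"C(?:J[CI])*[PB]?R?A?S*V?T?")


def is_consonant_syllable(classes):
    """Return True if the class sequence matches the formula."""
    encoded = "".join(CLASS_CODES[name] for name in classes)
    return SYLLABLE.fullmatch(encoded) is not None
```

For example, `is_consonant_syllable(["Cons", "RegShift", "AbvV"])` returns True, while a sequence with the AbvV before the RegShift is rejected. The sketch does not enforce the limit on the number of subscripts mentioned later in this section.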

RegShift case: The RegShift glyphs are positioned automatically, based on whether an above-base vowel is present. Normally, the RegShift is rendered immediately above the base glyph. When the RegShift character precedes an AbvV, it is normally rendered as a vertical stroke at the lowest extreme of the syllable. In some cases the RegShift must be forced above the base glyph. To do this, a ZERO WIDTH NON-JOINER (ZWNJ) is inserted between the RegShift and the AbvV, which prevents the shaping engine's context rule from being applied.

U+179F U+17CA U+17B8 (for a child or animal 'to eat') is an example where the below base form of TRIISAP is used.

U+1784 U+17C9 U+17B7 U+1780 U+1784 U+17C9 U+1780 U+17CB ('sulky') is an example where the first MUUSIKATOAN is in a below base form and the second in an above base form.

U+17A2 U+200C (ZWNJ) U+17CA U+17B7 U+17A2 U+17BB U+17CA U+17C7 is an interesting case where the first TRIISAP needs to be escaped, but the second does not (as there is a below base vowel).

An overview of the logic used when analyzing and reordering characters in the shaping engine looks something like the following:

  1. Khmer shaping assumes that a syllable will begin with a Cons, IndV, or Number.

  2. When a COENG + (Cons | IndV) combination is found (and the subscript count is less than two), the character combination is handled according to the subscript type of the character following the COENG.

    1. Subscript Type 1: The COENG + (Cons | IndV) characters are assigned to have the blwf OpenType feature applied to them.
    2. Subscript Type 2: The COENG + RO characters are reordered to immediately before the base glyph. Then the COENG + RO characters are assigned to have the pref OpenType feature applied to them.
    3. Subscript Type 3: The COENG + Cons characters are assigned to have the pstf OpenType feature applied to them.

  3. When a RegShift character is followed by an AbvV character, the RegShift character is assigned to have the blwf OpenType feature applied to change the shape to the below base form of the RegShift glyph (like U+17BB).

  4. When an AbvV character with KHF_ABVSPLIT assigned is found, the pre-base vowel part (U+17C1) is prepended to the beginning of the cluster. The AbvV character is then assigned to have the abvf OpenType feature applied so the glyph form is changed to the shape of the above vowel (like U+17B8).

  5. When a PstV character with KHF_PSTSPLIT assigned is found, the pre-base vowel part (U+17C1) is prepended to the beginning of the cluster. The PstV character is then assigned to have the abvf OpenType feature applied so the glyph form is changed to the shape of the second half.
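The steps above can be sketched in code. This is an illustration under stated assumptions, not Uniscribe's implementation: the class sets below are assumed (the authoritative tables are in the Terms section), every subscript other than COENG + RO is treated as Type 1, and the subscript count limit is not enforced. The names Item and analyze are hypothetical.

```python
from dataclasses import dataclass, field

COENG = 0x17D2         # KHMER SIGN COENG
RO = 0x179A            # KHMER LETTER RO
PRE_VOWEL_E = 0x17C1   # pre-base vowel part named in steps 4 and 5

# Assumed class membership, for illustration only.
REG_SHIFT = {0x17C9, 0x17CA}                       # MUUSIKATOAN, TRIISAP
ABV_V = {0x17B7, 0x17B8, 0x17B9, 0x17BA, 0x17BE}
ABV_SPLIT = {0x17BE}                               # assumed KHF_ABVSPLIT
PST_SPLIT = {0x17BF, 0x17C0, 0x17C4, 0x17C5}       # assumed KHF_PSTSPLIT


@dataclass
class Item:
    cp: int
    features: set = field(default_factory=set)


def analyze(cluster):
    """Tag and reorder one cluster (a list of code points) per steps 2-5."""
    items = [Item(cp) for cp in cluster]
    prefix = []   # items that end up in front of the base
    out = []      # everything else, in input order
    i = 0
    while i < len(items):
        it = items[i]
        nxt = items[i + 1] if i + 1 < len(items) else None
        if it.cp == COENG and nxt is not None:
            if nxt.cp == RO:
                # Step 2, Type 2: move COENG + RO before the base, tag pref.
                it.features.add("pref")
                nxt.features.add("pref")
                prefix += [it, nxt]
            else:
                # Step 2, Type 1 (Type 3 / pstf needs the class tables).
                it.features.add("blwf")
                nxt.features.add("blwf")
                out += [it, nxt]
            i += 2
            continue
        if it.cp in REG_SHIFT and nxt is not None and nxt.cp in ABV_V:
            # Step 3: RegShift directly before an AbvV takes its below form.
            it.features.add("blwf")
        if it.cp in ABV_SPLIT or it.cp in PST_SPLIT:
            # Steps 4 and 5: prepend U+17C1 to the cluster and tag the
            # vowel with abvf, as the steps above state.
            prefix.insert(0, Item(PRE_VOWEL_E))
            it.features.add("abvf")
        out.append(it)
        i += 1
    return prefix + out
```

For the input U+1780 U+17D2 U+179A U+17C4, this sketch yields the order U+17C1, U+17D2, U+179A, U+1780, U+17C4, with pref on the COENG + RO pair and abvf on U+17C4. Because step 3 tests for direct adjacency, a ZWNJ placed between the RegShift and the AbvV leaves the RegShift untagged.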


Shaping with OTLS

The first step Uniscribe takes in shaping the reordered character string is to apply the assigned layout features to the glyph string during the shaping process. These features, described and illustrated later in this document, are always applied in the order in which they are listed below.

Next, Uniscribe calls OTLS to apply the features. All OTL processing is divided into a set of predefined features (described and illustrated in the Feature section of this document). Each feature is applied, one by one, to the appropriate glyphs in the syllable and OTLS processes them. Uniscribe makes as many calls to the OTL Services as there are features. This ensures that the features are executed in the desired order.

The steps of the shaping process are outlined below.

Shaping features:

  1. Language forms
    1. Apply feature 'pref' to get pre-base ligatures.
    2. Apply feature 'blwf' to get below-base ligatures or below-base RegShift.
    3. Apply feature 'abvf' to Ro and the following COENG to get the Robat glyph, or to the AbvV that has KHF_ABVSPLIT to get the above glyph.
    4. Apply feature 'pstf' to get post base ligatures.

  2. Conjuncts and Typographical forms
    1. Apply feature 'pres' to get pre-base substitutions on the COENG RO glyph when there is a subscript type 1 on the syllable.
    2. Apply feature 'blws' to get below base substitutions that might be required for typographical correctness.
    3. Apply feature 'abvs' to get above base substitutions that might be required for typographical correctness.
    4. Apply feature 'psts' to get post base substitutions that might be required for typographical correctness. For example, a subscript type 3 glyph that needs to have a lower descent when a subscript type 1 glyph is on the syllable.
    5. Apply feature 'clig' to form ligatures that are desired for typographical correctness. For example, a subscript type 3 glyph that is followed by the OO glyph (U+17C4.secondhalf).
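The one-call-per-feature approach described above can be sketched as a simple loop. This is an assumption-laden illustration: the lookups mapping stands in for whatever the font and OTLS provide, the function name is hypothetical, and the sketch ignores the per-glyph feature assignments made during analysis.

```python
# Feature tags in the order listed above.
SHAPING_FEATURES = [
    "pref", "blwf", "abvf", "pstf",          # language forms
    "pres", "blws", "abvs", "psts", "clig",  # conjuncts and typographical forms
]


def apply_in_order(glyphs, lookups, features=SHAPING_FEATURES):
    """Apply each feature in turn, making one pass over the run per feature.

    lookups maps a feature tag to a callable that takes and returns a
    list of glyphs (a hypothetical stand-in for a call into OTLS).
    """
    for tag in features:
        substitute = lookups.get(tag)
        if substitute is None:
            continue  # the font does not implement this feature
        glyphs = substitute(glyphs)
    return glyphs
```

Driving the loop from a fixed list, rather than from the order in which the font happens to declare its features, is what guarantees that the output of an earlier feature such as 'blwf' is available as input to a later one such as 'blws'.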


Positioning glyphs with OTLS

Uniscribe next applies features concerned with positioning, calling functions of OTLS to position glyphs.

Positioning features:

  1. Distances
    1. Apply feature 'dist' to adjust other distances, e.g. to provide kerning between post- and pre-base elements and the base glyph.

  2. Below-base marks
    1. Apply feature 'blwm' to position below-base forms, vowel modifiers and/or stress/tone marks on the base glyph.

  3. Above-base marks
    1. Apply feature 'abvm' to position above-base forms, vowel modifiers and/or stress/tone marks on the base glyph.

  4. Mark to mark
    1. Apply feature 'mkmk' to position AbvS glyphs above AbvV glyphs or BlwV glyphs below subscript glyphs.



this page was last updated 26 February 2002
© 2001 Microsoft Corporation. All rights reserved. Terms of use.