Microsoft Typography | Developer information | Specifications | OpenType font development
Khmer OpenType Specification | Terms | Shaping | Features | Other | Appendix


Terms

The following terms are useful for understanding the layout features and script rules discussed in this document.

Base Glyph – the one and only consonant, independent vowel, or number in the syllable that is written in its “full” (nominal) form. In Khmer, the first consonant or independent vowel of the syllable usually forms the base glyph. Layout operations are defined in terms of a base glyph, not a base character, since the results of the shaping process are a series of glyphs.

U+17D2 (COENG) – the code point before a consonant or independent vowel which causes the formation of the subscript form of that letter. The COENG is always tied to the letter following it and is always handled as a unit with the following letter. Note: the shape of the COENG is arbitrary and is not rendered.
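As a minimal sketch of treating COENG as a unit with the letter that follows it, the function below groups a string of code points into units. The helper names (`is_letter`, `group_coeng`) are hypothetical, and the letter ranges (consonants U+1780 to U+17A2, independent vowels U+17A3 to U+17B3) are an assumption taken from the Unicode Khmer block rather than from this document.

```python
from typing import List

COENG = "\u17D2"

def is_letter(ch: str) -> bool:
    """Assumed ranges: consonants U+1780..U+17A2, independent vowels U+17A3..U+17B3."""
    return "\u1780" <= ch <= "\u17B3"

def group_coeng(text: str) -> List[str]:
    """Split text into units, keeping each COENG together with the letter after it."""
    units = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == COENG and i + 1 < len(text) and is_letter(text[i + 1]):
            # COENG + letter is handled as one subscript unit.
            units.append(text[i:i + 2])
            i += 2
        else:
            # Any other character (including a stray COENG) stands alone.
            units.append(ch)
            i += 1
    return units
```

For example, `group_coeng("\u1780\u17D2\u1780")` yields two units: the consonant KA and the pair COENG + KA. A COENG with no letter after it is left as a single unit so that a caller can detect and reject it.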

Consonant – each consonant represents a single consonant sound. Consonants may exist in different contextual forms, and have an inherent vowel (usually, the long vowel “A”). For this reason, the consonants illustrated in the examples that follow are named, for example, “KA” and “TA,” rather than just “K” or “T.”

Consonant Shifters – characters (U+17C9, U+17CA) used to shift the base consonant between registers.

Khmer Syllable – the effective orthographic “unit” of Khmer writing systems. It consists of a consonant and a vowel core, optionally with one or two subscripts inserted between the two, and optionally followed by signs. Syllables are composed of consonant letters, independent vowels, dependent or inherent vowels, and signs. In a text sequence, these characters are stored in phonetic order (although they may not be displayed in phonetic order). Once a syllable is shaped, it is indivisible (but its characters may be deleted starting from the end). The cursor cannot be positioned within the syllable. Transformations discussed in this document do not cross syllable boundaries.

ROBAT – (U+17CC) the above-base or combining form of the letter RO that is used in most scripts if RO is the first consonant in the syllable and is not the base consonant. The ROBAT is entered in the order in which one would write the text. Thus, the word KARMA would be encoded as KA + MA + ROBAT.

Subscript Glyph – the subscript form of a consonant or independent vowel. An example of this is COENG KA. A subscript is formed by the COENG character followed by a consonant or independent vowel. COENG does not have a conventional visual form in Khmer, as it is a control character that causes the formation of a subscript. An implementer of Khmer should never allow the entry of the COENG character by itself.

There are three types of subscript glyphs. Subscript Type 1 is positioned below the base glyph. Subscript Type 2 is the form (currently only COENG RO) that has an “arm” that causes spacing on the left side of the base glyph. Subscript Type 3 is the subscript form that has an “arm” that causes spacing on the right side of the base glyph.

There may be up to two subscript glyphs per base glyph. The subscripts for a base may be of different subscript types. Subscripts must appear in the following order: Subscript Type 1 (may be doubled), Subscript Type 2, and then Subscript Type 3 (may be doubled). Subscripts in any other order are illegal and cause a new cluster to be formed that has the dotted circle glyph as its base glyph.
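The ordering rule above can be sketched as a small check over a list of subscript types. The function name `is_legal_subscript_order` is hypothetical, and the sketch assumes the caller has already classified each subscript as Type 1, 2, or 3. Because only Types 1 and 3 are described as doubling, a doubled Type 2 is treated as illegal here; that is a reading of the rule, not something the text states outright.

```python
from typing import Sequence

def is_legal_subscript_order(types: Sequence[int]) -> bool:
    """Check a base glyph's subscript types against the stated ordering rule.

    `types` lists the subscript types (1, 2 or 3) in backing-store order.
    """
    if any(t not in (1, 2, 3) for t in types):
        raise ValueError("subscript types must be 1, 2 or 3")
    if len(types) > 2:
        # At most two subscripts per base glyph.
        return False
    for prev, cur in zip(types, types[1:]):
        if cur < prev:
            # Types must not decrease: Type 1, then Type 2, then Type 3.
            return False
        if prev == cur == 2:
            # Assumption: only Types 1 and 3 may be doubled.
            return False
    return True
```

With this check, `[1, 2]` and `[3, 3]` are accepted, while `[2, 1]` is rejected. The sketch only reports legality; forming the new dotted-circle cluster for an illegal sequence is left to the caller.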

Vowel – a Khmer syllable is permitted to have only one vowel. The notation distinguishes four types of vowels, based on where the vowel is positioned when it is rendered. Five vowels (U+17BE, U+17BF, U+17C0, U+17C4, U+17C5) are composed of two glyph pieces. Although there are two glyph pieces, each of these is a single vowel in the backing store. The shaping engine takes care of prepending the glyph piece shaped like U+17C1 to the syllable.
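A minimal sketch of the prepending step for the five two-piece vowels follows. It works on code points rather than glyphs, and it assumes the input is a single, already-segmented syllable. The function name `prepend_pre_piece` is hypothetical, and leaving the original vowel code point in place to stand for the second piece is an assumption made for illustration; this section does not say how an engine represents that piece.

```python
# The five vowels listed above as being composed of two glyph pieces.
SPLIT_VOWELS = {"\u17BE", "\u17BF", "\u17C0", "\u17C4", "\u17C5"}
PRE_PIECE = "\u17C1"  # the piece shaped like U+17C1

def prepend_pre_piece(syllable: str) -> str:
    """Return the syllable with U+17C1 prepended if it contains a two-piece vowel."""
    if any(ch in SPLIT_VOWELS for ch in syllable):
        # Assumption: the original vowel stays where it is and represents
        # the remaining (above-base or post-base) piece.
        return PRE_PIECE + syllable
    return syllable
```

For example, a syllable made of U+1780 followed by U+17C4 becomes U+17C1, U+1780, U+17C4, while a syllable with the single-piece vowel U+17B6 is returned unchanged.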

Notation – The following notation is used in this document to illustrate layout operations:

Cons – A consonant character.

IndV – An independent vowel character.

COENG – The COENG code.

PreV – A vowel that is positioned before the base glyph. It is not possible to have both a PreV and a PstV in the same syllable. Those vowels that have both prebase and postbase glyphs (U+17BF, U+17C0, U+17C4, U+17C5) are classified as PstV. The shaping engine will take care of prepending the U+17C1 glyph to the syllable.

BlwV – A vowel that is positioned below the base glyph. It is not possible that a base glyph will have both a BlwV and an AbvV (the combination U+17BB + U+17C6 is of a vowel and a sign).

RegShift – A Triisap or Muusikatoan character that normally sits immediately above the base glyph, but often changes to an ambiguous glyph at the lowest below-base position when an above-base vowel or vowel-part glyph is present.

AbvV – A vowel that is positioned above the base glyph. It is not possible that a base glyph will have both a BlwV and an AbvV. Note as above: The combination U+17BB + U+17C6 is of a vowel and a sign. The vowel with pre and above glyphs (U+17BE) is considered an AbvV. In this case the shaping engine prepends the U+17C1 to the beginning of the syllable.

AbvS – A sign character that is positioned above the base glyph.

Robat – The Robat glyph.

PstV – A vowel that is positioned after the base glyph. In some cases the PstV has a part (U+17C1) that is prepended to the beginning of the syllable. Thus, it is not possible to have both a PreV and a PstV in the same syllable.

PstS – A sign character that is positioned after the base glyph.

Nikahit – A sign which on its own or in combination with vowel characters creates a constructed vowel. It adds an ‘m’ or ‘n’ sound. This is classified as an AbvS.

Reahmuk – A sign which on its own or in combination with vowel characters creates a constructed vowel. It adds an ‘h’ aspiration. This is classified as a PstS.

[ ] – Indicates 0 or 1 occurrence

{ } – Indicates 0 to 2 occurrences

| – Exclusive OR

+ – Cumulative AND
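The four operators above map naturally onto regular-expression constructs, which the following sketch illustrates. The one-letter class codes, the helper names (`seq`, `either`, `opt`, `upto2`, `matches`) and the expression itself, Cons + {COENG + Cons} + [PreV | BlwV], are hypothetical and chosen only to exercise each operator; this is not the syllable formula used for shaping.

```python
import re

# Hypothetical one-letter codes standing in for notation classes.
C, J, P, B = "C", "J", "P", "B"  # Cons, COENG, PreV, BlwV

def seq(*parts: str) -> str:
    """'+' (cumulative AND): every part, in the order given."""
    return "".join(parts)

def either(*parts: str) -> str:
    """'|' (exclusive OR): exactly one of the alternatives."""
    return "(?:" + "|".join(parts) + ")"

def opt(part: str) -> str:
    """'[ ]': 0 or 1 occurrence."""
    return "(?:" + part + ")?"

def upto2(part: str) -> str:
    """'{ }': 0 to 2 occurrences."""
    return "(?:" + part + "){0,2}"

# Illustrative expression only: Cons + {COENG + Cons} + [PreV | BlwV]
EXAMPLE = re.compile(seq(C, upto2(seq(J, C)), opt(either(P, B))))

def matches(classes: str) -> bool:
    """True if a string of class codes fits the illustrative expression."""
    return EXAMPLE.fullmatch(classes) is not None
```

Under this sketch, `matches("CJCJCB")` holds (a consonant, two subscript pairs, one below-base vowel), while `matches("CJCJCJC")` fails because of the third subscript pair and `matches("CPB")` fails because only one of the two alternatives may appear.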



this page was last updated 26 February 2002
© 2001 Microsoft Corporation. All rights reserved. Terms of use.