Developing OpenType Fonts
for Khmer Script (1 of 3):
Introduction

Microsoft Typography
February 2002

This document presents information that will help font developers create or support OpenType fonts for the Khmer script languages covered by the Unicode Standard.

This is a multi-page specification.

Introduction

In this specification, font developers will learn how to encode complex script features in their fonts, choose character sets, organize font information, and use existing tools to produce Khmer fonts. Registered features of the Khmer script are defined and illustrated, encodings are listed, and templates are included for compiling Khmer layout tables for OpenType fonts.

This document also presents information about the Khmer OpenType shaping engine of Uniscribe, the Windows component responsible for text layout.

In addition to being a primer and specification for the creation and support of Khmer fonts, this document is intended to more broadly illustrate the OpenType Layout architecture, feature schemes, and operating system support for shaping and positioning text.

Glossary

The following terms are useful for understanding the layout features and script rules discussed in this document.

Base Glyph – The one and only consonant, independent vowel, or number in the syllable that is written in its “full” (nominal) form. In Khmer, the first consonant or independent vowel of the syllable usually forms the base glyph. Layout operations are defined in terms of a base glyph, not a base character, since the results of the shaping process are a series of glyphs.

U+17D2 (COENG) – Code point before a consonant or independent vowel, which causes the formation of the subscript form of that letter. The COENG is always tied to the letter following it and is always handled as a unit with the following letter. NOTE: The shape of the COENG is arbitrary and is not rendered.
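
As a rough illustration of the "handled as a unit" rule, the following Python sketch groups each COENG with the character that follows it. The function name `group_coeng` is hypothetical and is not part of Uniscribe or any shaping engine. As a simplifying assumption, it accepts any following character and does not check that it is a consonant or independent vowel.

```python
COENG = "\u17D2"  # KHMER SIGN COENG

def group_coeng(text):
    """Split text into units, keeping each COENG with the character after it."""
    units = []
    i = 0
    while i < len(text):
        if text[i] == COENG and i + 1 < len(text):
            units.append(text[i:i + 2])  # COENG + following letter, one unit
            i += 2
        else:
            units.append(text[i])  # ordinary character, or a stray trailing COENG
            i += 1
    return units

# KA + COENG + KA: the second KA is tied to the COENG before it
units = group_coeng("\u1780\u17D2\u1780")
```

Here `units` holds two entries: the base letter KA, and the pair COENG + KA that would form the subscript.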

Consonant – Represents a single consonant sound. Consonants may exist in different contextual forms, and have an inherent vowel (usually, the long vowel “A”). Therefore, those illustrated in the examples to follow are named, for example, “KA” and “TA,” rather than just “K” or “T.”

Consonant Shifters – Used to shift the base consonant between registers (U+17C9, U+17CA).

Khmer Syllable – Effective orthographic “unit” of Khmer writing systems, consisting of a consonant and a vowel core, optionally with one or two subscripts inserted between the two, and followed by signs. Syllables are composed of consonant letters, independent vowels, dependent or inherent vowels, and signs. In a text sequence, these characters are stored in phonetic order, although they may not appear in phonetic order when displayed. Once a syllable is shaped, it is indivisible (although its characters may be deleted starting from the end). The cursor cannot be positioned within the syllable. Transformations discussed in this document do not cross syllable boundaries.

ROBAT (U+17CC) – Above-base or combining form of the letter RO, used in most scripts if RO is the first consonant in the syllable and is not the base consonant. It is ordered as one would write the text. For example, the word KARMA would be encoded as KA + MA + ROBAT.
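
The KARMA example can be written out as code points. This is a small sketch using Python's standard `unicodedata` module. The variable names are illustrative, and the code point chosen for "MA" (U+1798, which the Unicode Standard names KHMER LETTER MO) is an assumption taken from the Unicode code chart rather than from this document.

```python
import unicodedata

KA = "\u1780"     # KHMER LETTER KA
MA = "\u1798"     # named KHMER LETTER MO in Unicode; "MA" in this document
ROBAT = "\u17CC"  # KHMER SIGN ROBAT

# Stored in the order one would write the text: KA + MA + ROBAT
karma = KA + MA + ROBAT

for ch in karma:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
```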

Subscript Glyph – Subscript form of a consonant or independent vowel, for example COENG KA. A subscript is formed by the COENG character followed by a consonant or independent vowel. COENG has no conventional visual form in Khmer, because it is a control character that causes the formation of a subscript. An implementer of Khmer support should never allow the COENG character to be entered by itself.

There are three types of subscript glyphs: Type 1 is positioned below the base glyph; Type 2 is the form (currently only COENG RO) that has an “arm” for spacing the left side of the base glyph; and Type 3 is the subscript form with an “arm” for spacing the right side of the base glyph.

There may be up to two subscript glyphs per base glyph, and they may be of different subscript types. Subscripts must appear in this order: Type 1 (may be doubled), Type 2, and then Type 3 (may be doubled). Any other ordering is invalid and causes a new cluster to be formed with the dotted circle glyph as its base glyph.
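
The ordering rule can be sketched as a small check. This is an illustrative Python sketch, not the shaping engine's logic. The function `subscripts_valid` is hypothetical, and it takes subscript types (1, 2 or 3) as input, because assigning a type to each subscript letter is outside the scope of this glossary. It also assumes that Type 2 may not be doubled, since only Type 1 and Type 3 are marked "may be doubled".

```python
def subscripts_valid(types):
    """Return True if a sequence of subscript types (1, 2 or 3) is valid."""
    types = list(types)
    if len(types) > 2:
        return False  # at most two subscripts per base glyph
    if any(t not in (1, 2, 3) for t in types):
        return False  # unknown subscript type
    if types != sorted(types):
        return False  # must follow the order Type 1, Type 2, Type 3
    if types.count(2) > 1:
        return False  # assumption: only Type 1 and Type 3 may be doubled
    return True
```

For example, `subscripts_valid([1, 2])` and `subscripts_valid([3, 3])` are accepted, while `subscripts_valid([3, 1])` is rejected. In the rejected case, the text describes a new cluster being started on a dotted circle base.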

Vowel – A Khmer syllable is permitted to have only one vowel. In the notation, four different types are indicated, based on the vowel's position when rendered. Five vowels (U+17BE, U+17BF, U+17C0, U+17C4, U+17C5) are composed of two glyph pieces, although the two pieces are treated as one vowel in the backing store. For these vowels, the shaping engine takes care of prepending the glyph piece shaped like U+17C1 to the syllable.
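
To make the two-piece behavior concrete, here is a minimal Python sketch that lists, as text labels, the order in which pieces would appear once the prebase piece is prepended. The names `TWO_PART_VOWELS` and `glyph_order_sketch` are hypothetical. Real shaping operates on glyph IDs and font lookups, which this sketch does not model, and the label for the vowel itself stands in for whatever piece remains after the prebase part is split off.

```python
# The five vowels the text describes as having two glyph pieces
TWO_PART_VOWELS = {"\u17BE", "\u17BF", "\u17C0", "\u17C4", "\u17C5"}

def glyph_order_sketch(syllable):
    """Return text labels showing where the prebase piece would be placed."""
    labels = [f"U+{ord(ch):04X}" for ch in syllable]
    if any(ch in TWO_PART_VOWELS for ch in syllable):
        # The piece shaped like U+17C1 goes in front of the whole syllable
        labels.insert(0, "pre-piece(U+17C1)")
    return labels

# KA + vowel U+17C4: one vowel in the backing store, two pieces when rendered
order = glyph_order_sketch("\u1780\u17C4")
```

Here `order` is `["pre-piece(U+17C1)", "U+1780", "U+17C4"]`. A syllable without one of the five vowels is returned unchanged.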

Notation

The following notation is used in this document to illustrate layout operations:

Cons – Consonant character

IndV – Independent vowel character

COENG – The COENG code

PreV – Vowel that is positioned before the base glyph; it is not possible to have both a PreV and a PstV in the same syllable; vowels that have both prebase and postbase glyphs (U+17BF, U+17C0, U+17C4, U+17C5) are classified as PstV; the shaping engine will take care of prepending the U+17C1 glyph to the syllable

BlwV – Vowel that is positioned below the base glyph; a base glyph cannot have both a BlwV and an AbvV (the combination U+17BB + U+17C6 is of a vowel and a sign)

RegShift – Triisap or Muusikatoan character that is normally situated immediately above the Base glyph, but often changes to an ambiguous glyph at the extreme below position when there is an above-base vowel/vowel-part glyph

AbvV – Vowel that is positioned above the base glyph; a base glyph cannot have both a BlwV and an AbvV (as noted for BlwV, the combination U+17BB + U+17C6 is of a vowel and a sign); the vowel with pre and above glyphs (U+17BE) is considered an AbvV; in this case, the shaping engine prepends the U+17C1 glyph to the beginning of the syllable

AbvS – A sign character that is positioned above the base glyph

Robat – The Robat glyph

PstV – Vowel that is positioned after the base glyph; in some cases, the PstV has a part (U+17C1) that is prepended to the syllable; therefore, a PreV and a PstV cannot exist in the same syllable

PstS – Sign character that is positioned after the base glyph

Nikahit – Sign which on its own or in combination with vowel characters creates a constructed vowel; it adds an ‘m’ or ‘n’ sound; this is classified as an AbvS

Reahmuk – Sign which on its own or in combination with vowel characters creates a constructed vowel; it adds an ‘h’ aspiration; this is classified as a PstS

{ } – Indicates 0 to 2 occurrences

[ ] – Indicates 0 or 1 occurrence

| – Exclusive OR

+ – Cumulative AND
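
The notation classes can be pictured as a lookup from code point to class name. The following Python sketch is deliberately partial. It covers only code points named in this section, plus U+17C6 (Nikahit) and U+17C7 (Reahmuk), whose values are taken from the Unicode code chart rather than from this document. The names `NOTATION_CLASS` and `notation_class` are hypothetical, and a real implementation would also need the full Cons, IndV, PreV and other ranges.

```python
# Partial mapping; anything not listed is reported as "unclassified"
NOTATION_CLASS = {
    0x17D2: "COENG",
    0x17C9: "RegShift",  # Muusikatoan
    0x17CA: "RegShift",  # Triisap
    0x17CC: "Robat",
    0x17BB: "BlwV",
    0x17BE: "AbvV",      # pre + above pieces, classified as AbvV
    0x17BF: "PstV",      # pre + post pieces, classified as PstV
    0x17C0: "PstV",
    0x17C4: "PstV",
    0x17C5: "PstV",
    0x17C6: "AbvS",      # Nikahit (code point assumed from the Unicode chart)
    0x17C7: "PstS",      # Reahmuk (code point assumed from the Unicode chart)
}

def notation_class(ch):
    """Return the notation class for a character, if this sketch knows it."""
    return NOTATION_CLASS.get(ord(ch), "unclassified")

# U+17BB + U+17C6 is a vowel followed by a sign, not two vowels
pair = [notation_class(ch) for ch in "\u17BB\u17C6"]
```

Here `pair` evaluates to `["BlwV", "AbvS"]`, which matches the note that this combination is a vowel and a sign and therefore does not violate the rule against combining a BlwV with an AbvV.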
