Microsoft Typography | Developer information | Specifications | OpenType font development
Khmer OpenType Specification | Terms | Shaping | Features | Other | Appendix

Other encoding issues

Interaction between below-base, post-base and above-base elements

Commonly, a feature must handle the base glyph together with one of the post-base, below-base or above-base elements. Since it is not possible to reorder ALL of these elements so that each one sits next to the base glyph, a lookup needs to skip over the elements that end up "in the middle" of the reordered glyph sequence.

The solution is to assign different mark attachment classes to the different elements of the syllable and to the positional forms, and to have each lookup work with only one mark class. For example, above-base substitutions usually need to consider only above-base elements.
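The following minimal sketch illustrates the idea. The flag bit values and the use of the LookupFlag high byte for the mark attachment class follow the OpenType lookup table definition; the glyph names, the class numbers and the helper functions are hypothetical and chosen only for illustration. It filters a glyph run the way a lookup restricted to one mark attachment class would see it, so an above-base mark becomes adjacent to the base even though a below-base form sits between them.

```python
# OpenType LookupFlag bits; the high byte holds the mark attachment class.
IGNORE_BASE_GLYPHS = 0x0002
IGNORE_LIGATURES = 0x0004
IGNORE_MARKS = 0x0008
MARK_ATTACHMENT_TYPE_MASK = 0xFF00

# GDEF glyph classes (0 means "not classified").
BASE, LIGATURE, MARK = 1, 2, 3


def lookup_flag(mark_attach_class, extra_bits=0):
    """Build a LookupFlag value restricting a lookup to one mark class."""
    if not 0 <= mark_attach_class <= 0xFF:
        raise ValueError("mark attachment class must fit in one byte")
    return (mark_attach_class << 8) | extra_bits


def glyphs_seen_by_lookup(glyphs, glyph_class, mark_class, flag):
    """Return, in order, the glyphs a lookup with `flag` would process.

    glyph_class: glyph name -> GDEF glyph class
    mark_class:  mark glyph name -> mark attachment class
    """
    wanted = (flag & MARK_ATTACHMENT_TYPE_MASK) >> 8
    seen = []
    for g in glyphs:
        cls = glyph_class.get(g, 0)
        if cls == BASE and flag & IGNORE_BASE_GLYPHS:
            continue
        if cls == LIGATURE and flag & IGNORE_LIGATURES:
            continue
        if cls == MARK and flag & IGNORE_MARKS:
            continue
        # A non-zero attachment type skips marks of every other class.
        if cls == MARK and wanted and mark_class.get(g, 0) != wanted:
            continue
        seen.append(g)
    return seen


# Hypothetical run: base consonant, below-base form, above-base vowel.
ABOVE_CLASS, BELOW_CLASS = 1, 2
run = ["ka", "ko.below", "i.above"]
classes = {"ka": BASE, "ko.below": MARK, "i.above": MARK}
marks = {"ko.below": BELOW_CLASS, "i.above": ABOVE_CLASS}
print(glyphs_seen_by_lookup(run, classes, marks, lookup_flag(ABOVE_CLASS)))
```

With the flag restricted to the above-base class, the below-base form is invisible to the lookup, so a two-glyph context such as "ka i.above" can match.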

Generally, it is good practice to classify as "mark" glyphs both the characters that the Unicode standard designates as marks and the below-base/above-base forms of consonants. Different attachment classes should then be assigned to the different marks depending on their position relative to the base.
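As a rough sketch of that classification, the function below assigns a GDEF glyph class and a mark attachment class to a glyph. It assumes two hypothetical conventions: class numbers 1 (above-base) and 2 (below-base), and a ".below"/".above" glyph-name suffix for positional consonant forms, which have no code point of their own. The position table covers only a few Khmer dependent vowel signs and is illustrative, not complete.

```python
import unicodedata

BASE, MARK = 1, 3    # GDEF glyph classes
ABOVE, BELOW = 1, 2  # hypothetical mark attachment class numbers

# Illustrative subset only: positions of a few Khmer marks.
ABOVE_MARKS = {0x17B7, 0x17B8, 0x17B9, 0x17BA, 0x17C6}
BELOW_MARKS = {0x17BB, 0x17BC, 0x17BD}


def classify(glyph_name, codepoint=None):
    """Return (GDEF glyph class, mark attachment class) for one glyph.

    Positional consonant forms are recognised by a hypothetical
    '.below' / '.above' suffix in the glyph name.
    """
    if glyph_name.endswith(".below"):
        return MARK, BELOW
    if glyph_name.endswith(".above"):
        return MARK, ABOVE
    # Characters with a Unicode general category of M* are marks.
    if codepoint is not None and unicodedata.category(chr(codepoint)).startswith("M"):
        if codepoint in ABOVE_MARKS:
            return MARK, ABOVE
        if codepoint in BELOW_MARKS:
            return MARK, BELOW
        return MARK, 0  # a mark whose position this sketch does not cover
    return BASE, 0


print(classify("uni17B7", 0x17B7))  # KHMER VOWEL SIGN I, an above-base mark
```

A real font would build these tables from its full glyph inventory; the point here is only that the GDEF glyph class and the attachment class are decided separately.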

Handling invalid combining marks

Combining marks and signs that appear in text without a valid consonant base are considered invalid. Uniscribe displays such marks using the fallback rendering mechanism defined in the Unicode standard (section 5.12, 'Rendering Non-Spacing Marks', of Unicode Standard 3.1); that is, each invalid mark is positioned on a dotted circle.

Note that to render a sign on its own (in apparent isolation from any base), one should apply it to a space (see section 2.5, 'Combining Marks', of Unicode Standard 3.1).
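A minimal sketch of this fallback, operating on characters rather than glyphs: it returns a display copy of the text in which every combining mark without a base is placed on U+25CC. The validity rule is a simplifying assumption of this sketch (a mark is accepted after a Khmer consonant, after a space, or after another accepted mark); the function name is hypothetical.

```python
import unicodedata

DOTTED_CIRCLE = "\u25CC"


def is_khmer_consonant(ch):
    return "\u1780" <= ch <= "\u17A2"


def add_fallback_bases(text):
    """Return a display copy of `text` with U+25CC before baseless marks."""
    out = []
    prev_can_carry_mark = False
    for ch in text:
        if unicodedata.category(ch).startswith("M"):
            if not prev_can_carry_mark:
                out.append(DOTTED_CIRCLE)  # fallback base for an invalid mark
            prev_can_carry_mark = True     # later marks stack on the same base
        else:
            # A space is accepted as a base so signs can be shown standalone.
            prev_can_carry_mark = is_khmer_consonant(ch) or ch == " "
        out.append(ch)
    return "".join(out)


print(add_fallback_bases("\u17B7"))  # vowel sign I with no base
```

A mark that follows a consonant or a space is left unchanged, while one at the start of the text or after a digit gets the dotted circle.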

For the fallback mechanism to work properly, a Khmer OTL font should contain a glyph for the dotted circle (U+25CC). If this glyph is missing from the font, invalid signs are displayed on the missing-glyph shape (a white box).

In addition to the dotted circle, other Unicode code points recommended for inclusion in any Khmer font are the ZWJ (zero width joiner, U+200D), the ZWNJ (zero width non-joiner, U+200C) and the ZWSP (zero width space, U+200B), which can be used to mark word boundaries.
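A small sketch for checking a font against these recommendations. It takes a cmap as a plain dictionary from code point to glyph name, which is the shape returned by fontTools' `TTFont.getBestCmap()`; the function name and the font file name in the usage comment are hypothetical.

```python
# Code points recommended above for any Khmer font.
RECOMMENDED = {
    0x25CC: "DOTTED CIRCLE",
    0x200B: "ZERO WIDTH SPACE",
    0x200C: "ZERO WIDTH NON-JOINER",
    0x200D: "ZERO WIDTH JOINER",
}


def missing_recommended(cmap):
    """Return the names of recommended code points absent from `cmap`."""
    return [name for cp, name in sorted(RECOMMENDED.items()) if cp not in cmap]


# Usage with fontTools (not imported here, so the sketch runs without it):
#   from fontTools.ttLib import TTFont
#   print(missing_recommended(TTFont("KhmerFont.ttf").getBestCmap()))
print(missing_recommended({0x25CC: "dottedcircle"}))
```

An empty list means the font covers all four characters.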

If an invalid combination is found (more than one vowel character in a syllable, more than two subscripts on the same base character, or subscripts in the wrong order), a new cluster is formed with the dotted circle as its base glyph. Without this, the shaping engine for non-OpenType fonts would overstrike the invalid marks on top of one another; inserting the dotted circle as a base for them solves this problem. Note also that the dotted circle is not inserted into the application's backing store: it is inserted at run time into the glyph array returned by the ScriptShape function.
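The sketch below approximates this behavior on a character string; it is not Uniscribe's algorithm. It assumes simplified syllable rules: consonants, independent vowels and the space act as bases, COENG (U+17D2) followed by a consonant counts as one subscript, and U+17B6 to U+17C5 are dependent vowels. It opens a new dotted-circle cluster when a mark has no base, a syllable receives a second vowel, or a base receives a third subscript. Subscript ordering is not checked, and the input string (standing in for the backing store) is never modified. The function and helper names are hypothetical.

```python
import unicodedata

COENG = "\u17D2"          # KHMER SIGN COENG: COENG + consonant = subscript
DOTTED_CIRCLE = "\u25CC"


def is_consonant(ch):
    return "\u1780" <= ch <= "\u17A2"


def is_base(ch):
    # Simplification: consonants, independent vowels and the space.
    return "\u1780" <= ch <= "\u17B3" or ch == " "


def is_dependent_vowel(ch):
    return "\u17B6" <= ch <= "\u17C5"


def build_display_clusters(text):
    """Split `text` into display clusters, adding dotted-circle bases."""
    clusters, current = [], ""
    vowels = subscripts = 0
    i = 0
    while i < len(text):
        ch = text[i]
        nxt = text[i + 1] if i + 1 < len(text) else ""
        if ch == COENG and is_consonant(nxt):
            unit, kind = ch + nxt, "subscript"
        elif is_base(ch):
            unit, kind = ch, "base"
        elif is_dependent_vowel(ch):
            unit, kind = ch, "vowel"
        elif unicodedata.category(ch).startswith("M"):
            unit, kind = ch, "sign"
        else:
            unit, kind = ch, "other"  # digits, punctuation, ...
        i += len(unit)

        if kind in ("base", "other"):
            if current:
                clusters.append(current)
            if kind == "other":
                clusters.append(unit)  # stands alone, carries no marks
                current = ""
            else:
                current = unit
            vowels = subscripts = 0
            continue

        too_many = (kind == "vowel" and vowels >= 1) or (
            kind == "subscript" and subscripts >= 2)
        if not current or too_many:
            if current:
                clusters.append(current)
            current = DOTTED_CIRCLE  # run-time insertion in the display copy
            vowels = subscripts = 0
        current += unit
        if kind == "vowel":
            vowels += 1
        elif kind == "subscript":
            subscripts += 1

    if current:
        clusters.append(current)
    return clusters


print(build_display_clusters("\u1780\u17B7\u17BB"))  # two vowels on one base
```

In this example the second vowel (U+17BB) is moved into its own cluster on a dotted circle instead of overstriking the first.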

The invalid diacritic logic for Khmer is based on the classes listed below.

Class Description Code points
XXXX NEED CLASS INFO HERE and unicode points

this page was last updated 26 February 2002
© 2001 Microsoft Corporation. All rights reserved. Terms of use.