Microsoft Typography | Developer information | Specifications | OpenType font development
Lao OpenType Specification | Terms | Shaping | Features | Other | Appendix


Other encoding issues


Handling invalid combining marks

Combining marks and signs that appear in text not in conjunction with a valid consonant base are considered invalid. Uniscribe displays these marks using the fallback rendering mechanism defined in the Unicode Standard (section 5.12, 'Rendering Non-Spacing Marks' of the Unicode Standard 3.0), i.e. positioned on a dotted circle.

For the fallback mechanism to work properly, a Lao OTL font should contain a glyph for the dotted circle (U+25CC). In case this glyph is missing from the font, the invalid signs will be displayed on the missing glyph shape (white box).
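The fallback described above can be sketched as a simple lookup. This is a minimal illustration, not Uniscribe's implementation: it assumes the font's character map is available as a dictionary from code point to glyph name (for example, one read from the font's cmap table with a font library), and both the function name `fallback_base` and the glyph names used here are hypothetical.

```python
DOTTED_CIRCLE = 0x25CC  # U+25CC DOTTED CIRCLE


def fallback_base(cmap):
    """Return the glyph that an invalid mark would be displayed on.

    `cmap` is assumed to map Unicode code points (int) to glyph names.
    If the font has no dotted circle, the missing glyph (".notdef",
    usually drawn as a white box) is used instead.
    """
    return cmap.get(DOTTED_CIRCLE, ".notdef")
```

A font developer could run a check like this over a font's character map before release to confirm that invalid marks will not end up on the white box.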

In addition to the dotted circle, another Unicode code point recommended for inclusion in any Lao font is ZWSP (zero width space; U+200B). Lao words are not separated by spaces, so ZWSP can be used to mark word boundaries: it allows a line to wrap at that point. Some applications use a lexical lookup to do word wrapping and do not need ZWSP characters.
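To illustrate how ZWSP enables word wrapping, the following sketch breaks text into lines only at ZWSP positions. It is a simplification under stated assumptions: width is counted in code points rather than measured glyph advances (so combining marks are over-counted), and the function name `wrap_at_zwsp` is hypothetical.

```python
ZWSP = "\u200B"  # U+200B ZERO WIDTH SPACE


def wrap_at_zwsp(text, width):
    """Split `text` into lines no longer than `width`, breaking only at ZWSP.

    Assumption: width is counted in code points, not in rendered glyph
    widths. A single word longer than `width` is kept whole on its own line.
    ZWSP characters are dropped from the returned lines.
    """
    lines = []
    current = ""
    for word in text.split(ZWSP):
        if current and len(current) + len(word) > width:
            lines.append(current)  # the line is full, start a new one
            current = word
        else:
            current += word
    if current:
        lines.append(current)
    return lines
```

Text without any ZWSP is returned as a single line, which is the situation in which an application would need a lexical lookup instead.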

If an invalid combination is found, the diacritic that causes the invalid state is placed on a dotted circle to show the user that the combination is invalid. The shaping engine for non-OpenType fonts causes invalid mark combinations to overstrike; inserting a dotted circle as the base for the invalid mark solves this problem. Note that the dotted circle is not inserted into the application's backing store. It is a run-time insertion into the glyph array that is returned from the ScriptShape function.

The invalid diacritic logic for Lao is based on the classes listed below. The shaping engine checks that no more than one mark of any given class is placed on the same base.

Class    Description                   Code points
ABOVE1   Above mark closest to base    U+0EB1, U+0EB4, U+0EB5, U+0EB6, U+0EB7, U+0EBB, U+0ECD
ABOVE2   Second level above mark       U+0EC8, U+0EC9, U+0ECA, U+0ECB, U+0ECC
BELOW1   Below mark closest to base    U+0EBC
BELOW2   Second level below mark       U+0EB8, U+0EB9
AM       The AM character is decomposed into two glyphs (NIGGAHITA and AA). The NIGGAHITA is of class ABOVE1.    U+0EB3
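The class check can be sketched as follows. This is a minimal illustration under several assumptions, not the shaping engine's actual code: it works on code points rather than on the glyph array, it treats U+0E81..U+0EAE and U+0EDC..U+0EDF as the valid consonant bases, it counts AM as occupying the ABOVE1 slot without decomposing it, and the names `CLASS_OF`, `is_consonant` and `insert_dotted_circles` are hypothetical.

```python
DOTTED_CIRCLE = "\u25CC"

# Mark classes taken from the table above.
MARK_CLASSES = {
    "ABOVE1": "\u0EB1\u0EB4\u0EB5\u0EB6\u0EB7\u0EBB\u0ECD",
    "ABOVE2": "\u0EC8\u0EC9\u0ECA\u0ECB\u0ECC",
    "BELOW1": "\u0EBC",
    "BELOW2": "\u0EB8\u0EB9",
}
CLASS_OF = {ch: name for name, chars in MARK_CLASSES.items() for ch in chars}
CLASS_OF["\u0EB3"] = "ABOVE1"  # AM: its NIGGAHITA part is of class ABOVE1


def is_consonant(ch):
    # Assumption: these ranges stand in for "valid consonant base".
    return "\u0E81" <= ch <= "\u0EAE" or "\u0EDC" <= ch <= "\u0EDF"


def insert_dotted_circles(text):
    """Return a display copy of `text` with a dotted circle before each invalid mark.

    A mark is invalid if it has no consonant base, or if its class is
    already used on the current base. The input string is not modified,
    mirroring the fact that the backing store is left untouched.
    """
    out = []
    used = None  # classes used on the current base; None means no valid base
    for ch in text:
        cls = CLASS_OF.get(ch)
        if cls is None:
            out.append(ch)
            used = set() if is_consonant(ch) else None
            continue
        if used is None or cls in used:
            out.append(DOTTED_CIRCLE)  # the dotted circle becomes the new base
            used = set()
        used.add(cls)
        out.append(ch)
    return "".join(out)
```

For example, a consonant followed by two ABOVE1 marks (U+0E81 U+0EB4 U+0EB5) yields U+0E81 U+0EB4 U+25CC U+0EB5 in the display copy, while a consonant with one ABOVE1 and one ABOVE2 mark passes through unchanged.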



this page was last updated 5 April 2002
© 2002 Microsoft Corporation. All rights reserved. Terms of use.

 
