Microsoft Typography | Developer information | Specifications | OpenType font development
Arabic OpenType Specification | Terms | Shaping | Features | Other | Appendix

Other encoding issues

Handling invalid combining marks

Combining marks and signs that appear in text without a valid consonant base are considered invalid. Uniscribe displays these marks using the fallback rendering mechanism defined in section 5.12, 'Rendering Non-Spacing Marks', of the Unicode Standard 3.1, i.e. positioned on a dotted circle.

Please note that to render a sign standalone (in apparent isolation from any base), one should apply it to a space (see section 2.5, 'Combining Marks', of the Unicode Standard 3.1). Uniscribe requires a ZWJ to be placed between the space and the mark for them to combine into a standalone sign.
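The fallback behaviour described above can be sketched in a few lines of Python. This is only an illustration, not Uniscribe's actual algorithm: the function names are hypothetical, any Unicode letter is assumed to be a valid base, any character in a mark category is assumed to be a combining mark, and the sketch works on characters rather than glyphs.

```python
import unicodedata

DOTTED_CIRCLE = "\u25CC"
ZWJ = "\u200D"    # ZERO WIDTH JOINER
SPACE = "\u0020"

def is_mark(ch):
    # Assumption: any character in a mark category (Mn, Mc, Me) is a combining mark.
    return unicodedata.category(ch).startswith("M")

def is_base(ch):
    # Assumption: any letter (category L*) is a valid base; a real engine is stricter.
    return unicodedata.category(ch).startswith("L")

def display_sequence(text):
    """Return the characters to display, with a dotted circle before each orphaned mark."""
    out = []
    has_base = False
    for i, ch in enumerate(text):
        if is_mark(ch):
            if not has_base:
                out.append(DOTTED_CIRCLE)
                has_base = True   # the dotted circle now carries this mark
        elif ch == ZWJ and i > 0 and text[i - 1] == SPACE:
            has_base = True       # space + ZWJ: an intentional standalone sign
        else:
            has_base = is_base(ch)
        out.append(ch)
    return out
```

Under these assumptions, a fatha (U+064E) directly after a space gets a dotted circle, while the sequence space, ZWJ, fatha passes through unchanged. The input string itself is never modified; only the returned list differs.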

For the fallback mechanism to work properly, an Arabic OTL font should contain a glyph for the dotted circle (U+25CC). If this glyph is missing from the font, the invalid signs will be displayed on the missing glyph shape (white box).

In addition to the dotted circle, other Unicode code points recommended for inclusion in any Arabic font are: ZWNJ (zero width non-joiner; U+200C), ZWJ (zero width joiner; U+200D), LRM (left-to-right mark; U+200E), and RLM (right-to-left mark; U+200F). The ZWNJ can be used between two letters to prevent them from forming a cursive connection.
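A font's coverage of these code points is easy to check once its character map is available. The sketch below is a hypothetical helper, not part of any Microsoft tool; it assumes the character map is a plain mapping from code point to glyph name. The code points are looked up by their Unicode character names so that they cannot be mistyped.

```python
import unicodedata

# Unicode character names of the code points recommended for an Arabic font.
RECOMMENDED_NAMES = [
    "DOTTED CIRCLE",
    "ZERO WIDTH NON-JOINER",
    "ZERO WIDTH JOINER",
    "LEFT-TO-RIGHT MARK",
    "RIGHT-TO-LEFT MARK",
]

# Resolve each name to its code point, e.g. "DOTTED CIRCLE" -> 0x25CC.
RECOMMENDED = {ord(unicodedata.lookup(name)): name for name in RECOMMENDED_NAMES}

def missing_recommended(cmap):
    """Return the recommended code points absent from cmap (code point -> glyph name)."""
    return [cp for cp in sorted(RECOMMENDED) if cp not in cmap]

if __name__ == "__main__":
    # Hypothetical character map of a font that lacks ZWNJ, LRM and RLM.
    sample_cmap = {0x25CC: "uni25CC", 0x200D: "zwj", 0x0628: "beh"}
    for cp in missing_recommended(sample_cmap):
        print(f"missing U+{cp:04X} {RECOMMENDED[cp]}")
```

With the fontTools library, for instance, `TTFont(path).getBestCmap()` returns a mapping of this shape, though any source of code point coverage would do.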

If an invalid combination is found, such as two fathas on the same base character, the diacritic that causes the invalid state is placed on a dotted circle to show the user that the combination is invalid. The shaping engine for non-OpenType fonts causes invalid mark combinations to overstrike; inserting a dotted circle as the base for the offending mark avoids this. Note that the dotted circle is not inserted into the application's backing store: it is a run-time insertion into the glyph array returned by the ScriptShape function.

The invalid diacritic logic for Arabic is based on the classes listed below. A check ensures that no more than one mark of any given class is placed on the same base. Additionally, marks from the DIAC1 and DIAC2 classes should not be applied to the same base character.

Class   Description                     Code points
DIAC1   Arabic above diacritics         U+064B, U+064C, U+064E, U+064F, U+0652, U+0657, U+0658, U+06E1
DIAC2   Arabic below diacritics         U+064D, U+0650, U+0656
DIAC3   Arabic seat shadda              U+0651
DIAC4   Arabic Qur'anic marks above     U+0610 - U+0614, U+0659, U+06D6 - U+06DC, U+06DF, U+06E0, U+06E2, U+06E4, U+06E7, U+06E8, U+06EB, U+06EC
DIAC5   Arabic Qur'anic marks below     U+06E3, U+06EA, U+06ED
DIAC6   Arabic superscript alef         U+0670
DIAC7   Arabic madda                    U+0653
DIAC8   Arabic hamza above and below    U+0654, U+0655
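The class rules above can be illustrated with a minimal Python sketch. It is not the Uniscribe implementation: the function and helper names are hypothetical, it works on code points rather than glyph IDs, and it assumes that the inserted dotted circle acts as a fresh base, so that class tracking restarts after it. Marks that have no base at all are outside the scope of this sketch.

```python
DOTTED_CIRCLE = 0x25CC

def _expand(*items):
    """Build a set of code points from single values and inclusive (first, last) ranges."""
    cps = set()
    for item in items:
        if isinstance(item, tuple):
            cps.update(range(item[0], item[1] + 1))
        else:
            cps.add(item)
    return cps

# Class membership copied from the table above.
DIAC_CLASSES = {
    "DIAC1": _expand(0x064B, 0x064C, 0x064E, 0x064F, 0x0652, 0x0657, 0x0658, 0x06E1),
    "DIAC2": _expand(0x064D, 0x0650, 0x0656),
    "DIAC3": _expand(0x0651),
    "DIAC4": _expand((0x0610, 0x0614), 0x0659, (0x06D6, 0x06DC), 0x06DF, 0x06E0,
                     0x06E2, 0x06E4, 0x06E7, 0x06E8, 0x06EB, 0x06EC),
    "DIAC5": _expand(0x06E3, 0x06EA, 0x06ED),
    "DIAC6": _expand(0x0670),
    "DIAC7": _expand(0x0653),
    "DIAC8": _expand(0x0654, 0x0655),
}
CLASS_OF = {cp: name for name, cps in DIAC_CLASSES.items() for cp in cps}

# DIAC1 and DIAC2 marks should not share a base.
EXCLUSIVE = {"DIAC1": "DIAC2", "DIAC2": "DIAC1"}

def insert_dotted_circles(codepoints):
    """Return a new list with a dotted circle before each mark that breaks the class rules."""
    out = []
    seen = set()  # classes already applied to the current base
    for cp in codepoints:
        cls = CLASS_OF.get(cp)
        if cls is None:
            seen = set()          # not a classified mark: treat it as a new base
        else:
            if cls in seen or EXCLUSIVE.get(cls) in seen:
                out.append(DOTTED_CIRCLE)
                seen = set()      # assumption: the dotted circle is a fresh base
            seen.add(cls)
        out.append(cp)
    return out
```

For example, beh followed by two fathas (U+0628, U+064E, U+064E) comes back as U+0628, U+064E, U+25CC, U+064E, while beh, shadda, fatha is returned unchanged. The function builds a new list and leaves its input untouched, mirroring the point that the dotted circle never reaches the backing store.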

this page was last updated 13 August 2002
© 2001 Microsoft Corporation. All rights reserved. Terms of use.

