The Uniscribe Arabic shaping engine processes text in stages. The stages are:
The descriptions that follow will help font developers understand the rationale for the Arabic feature encoding model, and help application developers better understand how layout clients can divide responsibilities with operating system functions.
The unit that the shaping engine receives for the purpose of shaping is a string of Unicode characters, in a sequence. The contextual analysis engine determines the correct contextual form each character should take, based on the characters before and after it. The contextual shape maps to an OTL feature for that form (isol, init, medi, fina). Additionally, during the analysis process, the engine verifies valid diacritic combinations. For additional information, see the Handling Invalid Combining Marks section.
The first step Uniscribe takes in shaping the character string is to map all characters to their nominal form glyphs (e.g., the glyph for U+0627). Then, Uniscribe applies contextual shape features to the glyph string. Next, Uniscribe calls OTLS (OpenType Layout Services) to apply the features. All OTL processing is divided into a set of predefined features (described and illustrated in the Features section of this document). Each feature is applied, one by one, to the appropriate glyphs in the syllable, and OTLS processes them. Uniscribe makes as many calls to the OTL Services as there are features. This ensures that the features are executed in the desired order.
The steps of the shaping process are outlined below. Not all of the features listed apply to all Arabic script languages.
Shaping features:
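The contextual analysis described above (choosing isol, init, medi, or fina from a character's neighbors) can be illustrated with a minimal Python sketch. It is not Uniscribe's implementation. The names `JOINING_TYPE`, `_neighbor`, and `contextual_forms` are hypothetical, and the joining-type table is a deliberately tiny subset chosen for the example, with any unlisted character treated as non-joining.

```python
# Joining types for a handful of characters (a tiny illustrative subset;
# anything not listed is treated as non-joining in this sketch).
# D = dual-joining, R = right-joining, C = join-causing,
# U = non-joining, T = transparent (combining marks)
JOINING_TYPE = {
    "\u0627": "R",  # ALEF
    "\u0628": "D",  # BEH
    "\u062F": "R",  # DAL
    "\u0644": "D",  # LAM
    "\u0645": "D",  # MEEM
    "\u064E": "T",  # FATHA
    "\u200C": "U",  # ZERO WIDTH NON-JOINER
    "\u200D": "C",  # ZERO WIDTH JOINER
}


def _neighbor(types, i, step):
    """Return the joining type of the nearest non-transparent neighbor."""
    j = i + step
    while 0 <= j < len(types):
        if types[j] != "T":
            return types[j]
        j += step
    return None


def contextual_forms(text):
    """Return one feature tag per character, or None for non-letters."""
    types = [JOINING_TYPE.get(ch, "U") for ch in text]
    forms = []
    for i, t in enumerate(types):
        if t not in ("D", "R"):
            forms.append(None)  # marks, ZWJ, ZWNJ, etc. take no form feature
            continue
        prev_type = _neighbor(types, i, -1)
        next_type = _neighbor(types, i, +1)
        # The preceding character must be able to connect forward.
        joins_prev = prev_type in ("D", "C")
        # Only dual-joining letters connect to the following character.
        joins_next = t == "D" and next_type in ("D", "R", "C")
        if joins_prev and joins_next:
            forms.append("medi")
        elif joins_prev:
            forms.append("fina")
        elif joins_next:
            forms.append("init")
        else:
            forms.append("isol")
    return forms
```

For example, `contextual_forms("\u0628\u0644\u0645")` (BEH, LAM, MEEM) yields `["init", "medi", "fina"]`, and placing a ZWNJ between two dual-joining letters leaves both in their isolated form.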
Uniscribe next applies features concerned with positioning, calling functions of OTLS to position glyphs. Positioning features:
Handling Invalid Combining Marks
Combining marks and signs that appear in text without a valid consonant base are considered invalid. Uniscribe displays these marks using the fallback rendering mechanism defined in the Unicode Standard (section 5.12, 'Rendering Non-Spacing Marks' of the Unicode Standard 3.1), i.e., positioned on a dotted circle.
Note that to render a sign standalone (in apparent isolation from any base), one should apply it to a space (see section 2.5 'Combining Marks' of Unicode Standard 3.1). Uniscribe requires a ZWJ to be placed between the space and a mark for them to combine into a standalone sign.
For the fallback mechanism to work properly, an Arabic OTL font should contain a glyph for the dotted circle (U+25CC). If this glyph is missing from the font, the invalid signs will be displayed on the missing glyph shape (white box).
In addition to the 'dotted circle,' other Unicode code points that are recommended for inclusion in any Arabic font are: ZWJ (zero width joiner; U+200D), ZWNJ (zero width non-joiner; U+200C), LRM (left-to-right mark; U+200E), and RLM (right-to-left mark; U+200F). The ZWNJ can be used between two letters to prevent them from forming a cursive connection.
If an invalid combination is found, such as two fathas on the same base character, the diacritic that causes the invalid state is placed on a dotted circle to show the user that the combination is invalid. The shaping engine for non-OpenType fonts will cause invalid mark combinations to overstrike. This is the problem that inserting the dotted circle for the invalid base solves.
It should also be noted that the dotted circle is not inserted into the application's backing store. This is a run-time insertion into the glyph array that is returned from the ScriptShape function.
The invalid diacritic logic for Arabic is based on the classes listed below. The engine checks that no more than one mark of a given class is placed on the same base.
Additionally, DIAC1 and DIAC2 classes should not be applied on the same base character.
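The two rules above (at most one mark of a class per base, and no DIAC1 together with DIAC2 on one base) can be sketched in Python as follows. This is an illustration under stated assumptions, not the engine's actual logic. The `MARK_CLASS` assignments are made up for the example and do not reproduce the real class table, `is_base` uses a simplified range test, and the sketch works on characters where the real insertion happens in the glyph array. The function returns a new string and leaves its input untouched, much as the dotted circle never reaches the backing store.

```python
DOTTED_CIRCLE = "\u25CC"

# Illustrative class assignments only (an assumption for this sketch;
# the real class table is not reproduced here).
MARK_CLASS = {
    "\u064B": "DIAC1",  # FATHATAN
    "\u064E": "DIAC1",  # FATHA
    "\u064F": "DIAC1",  # DAMMA
    "\u064D": "DIAC2",  # KASRATAN
    "\u0650": "DIAC2",  # KASRA
    "\u0651": "DIAC3",  # SHADDA
}


def is_base(ch):
    """Simplified test (assumption): basic Arabic letters U+0621..U+064A."""
    return "\u0621" <= ch <= "\u064A"


def insert_dotted_circles(text):
    """Return a copy of text with U+25CC placed before each invalid mark."""
    out = []
    seen = None  # mark classes on the current base; None means no valid base
    for ch in text:
        cls = MARK_CLASS.get(ch)
        if cls is None:
            out.append(ch)
            # A letter starts a fresh cluster; anything else is not a base.
            seen = set() if is_base(ch) else None
            continue
        invalid = (
            seen is None                              # no valid base
            or cls in seen                            # class already used
            or (cls == "DIAC1" and "DIAC2" in seen)   # DIAC1/DIAC2 clash
            or (cls == "DIAC2" and "DIAC1" in seen)
        )
        if invalid:
            out.append(DOTTED_CIRCLE)  # the circle becomes the new base
            seen = set()
        out.append(ch)
        seen.add(cls)
    return "".join(out)
```

With these assumed classes, BEH followed by two fathas comes back as BEH, FATHA, DOTTED CIRCLE, FATHA: only the second fatha, the mark that causes the invalid state, is moved onto the dotted circle.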