Developing OpenType Fonts for Gurmukhi Script (1 of 3): Introduction

Microsoft Typography
March 2002

Please note: Microsoft's OpenType font specifications for Indic scripts have been revised. This version is for reference purposes only. An updated version of the Gurmukhi specification can be found here.

This document presents information that will help font developers create or support OpenType fonts for the Gurmukhi script covered by the Unicode Standard. Gurmukhi is closely related to Devanagari and is used to write the Punjabi language in the Punjab in India.

Introduction

In this specification, font developers will learn how to encode complex script features in their fonts, choose character sets, organize font information, and use existing tools to produce Gurmukhi fonts. Registered features of Gurmukhi script are defined and illustrated, encodings are listed, and templates are included for compiling Gurmukhi layout tables for OpenType fonts.

This document also presents information about the Gurmukhi OpenType shaping engine of Uniscribe, the Windows component responsible for text layout.

In addition to being a primer and specification for the creation and support of Gurmukhi fonts, this document is intended to more broadly illustrate the OpenType Layout architecture, feature schemes, and operating system support for shaping and positioning text.

Glossary

The following terms are useful for understanding the layout features and script rules discussed in this document.

Akhand ligatures - Required consonant ligatures that may appear anywhere in the syllable and may or may not involve the base glyph. Akhand ligatures have the highest priority and are formed first; some languages include them in their alphabets. An Akhand ligature may take a half form, a full form, or another form.

Base glyph - The only consonant or consonant conjunct in the syllable that is written in its "full" (nominal) form. The last consonant of the syllable (except for syllables ending with letter "Ra") usually forms the base glyph. In "degenerate" syllables that have no vowel (last letter of a word), the last consonant in halant form serves as the base consonant and is mapped as the base glyph. Layout operations are defined in terms of a base glyph, not a base character, since the base can often be a ligature.

Below-base form of consonants - The form in which consonants appear below the base glyph. Consonants in below-base form appear in Gurmukhi syllables after the ones that form the base glyph. Below-base forms are represented by non-spacing mark glyphs.

Consonant - Each represents a single consonant sound. Consonants may exist in different contextual forms, and have an inherent vowel (usually, the short vowel "a"). Therefore, those illustrated in the examples below are named, for example, "Ka" and "Ta", rather than just "K" or "T."

Consonant conjuncts (aka 'conjuncts') - Ligatures of two or more consonants. Consonant conjuncts may have both full and half forms, or only full forms.

Gurmukhi syllable - Effective orthographic "unit" of the Gurmukhi writing system, consisting of a consonant and a vowel core, optionally preceded by one or more consonants. Syllables are composed of consonant letters, independent vowels, and dependent vowels. In a text sequence, these characters are stored in phonetic order (although they may not be represented in phonetic order when displayed). Once a syllable is shaped, it is indivisible: the cursor cannot be positioned within the syllable. Transformations discussed in this document do not cross syllable boundaries.
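As a small illustration of stored (phonetic) order versus displayed order, the sketch below lists the code points of the syllable "ki". The consonant Ka is stored first and the dependent vowel sign I second, even though that vowel sign is drawn to the left of the consonant. The helper name `describe` is hypothetical; only Python's standard `unicodedata` module is used.

```python
import unicodedata

# The syllable "ki": consonant KA followed by the dependent vowel sign I.
# The vowel sign is stored after the consonant even though it is drawn
# to the left of the consonant when the syllable is displayed.
syllable = "\u0A15\u0A3F"


def describe(text):
    """Return (code point, Unicode name) pairs in stored order."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]


for code_point, name in describe(syllable):
    print(code_point, name)
```

Reordering the glyphs for display is the job of the shaping engine, not of the stored text.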

Halant (Virama) - The character used after a consonant to "strip" it of its inherent vowel. A Virama follows all but the last consonant in every Gurmukhi syllable.

NOTE: A syllable containing halant characters may be shaped with no visible halant signs by using different consonant forms or conjuncts instead.

Halant form of consonants - The form produced by adding the Virama to the nominal shape. The Halant form is used in syllables that have no vowel or as the half form when no distinct shape for the half form exists.

Half form of consonants (pre-base form) - The form in which a consonant appears to the left of the base consonant when it does not participate in a ligature. Consonants in their half form precede the ones forming the base glyph. Gurmukhi has distinctly shaped half forms for most of the consonants. If a consonant does not have a distinct shape for the half form and does not form any ligature, it is displayed with an explicit Virama; in this case, the half form and the halant form have the same shape.

Matra (Dependent Vowel) - Used to represent a vowel sound that is not inherent to the consonant. Dependent vowels are referred to as "matras" in Sanskrit. They are always depicted in combination with a single consonant, or with a consonant cluster. The greatest variation among different Indian scripts is found in the rules for attaching dependent vowels to base characters.

Nukta - Character that alters the way a preceding consonant is pronounced.

Post-base form of consonants - Form in which consonants appear to the right of the base glyph. Post-base forms are usually spacing glyphs.

Pre-base form of consonants - Form in which consonants appear to the left of the base glyph.

Vattu (Rakar) - Below-base form of the letter "Ra". Vattu requires exceptional treatment for two reasons: 1) it may become a below-base form to half form glyphs, as well as to full form glyphs; 2) it often assumes different shapes depending on what consonant it follows. Consonants form required ligatures with Vattu, producing vattu ligatures that can be in either half or full form.

Notation

The following notation is used in this document to illustrate layout operations:

C - Consonant character

K - Generalized consonant, including:

  • Akhand ligatures
  • Nukta forms of consonants (ligatures with Nukta)

Generalized consonants are produced by composing akhands and composing ligatures that contain the nukta sign.
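As a side illustration of nukta forms at the character level, Unicode encodes some Gurmukhi nukta consonants both as a single code point and as a base consonant followed by the Nukta sign. The sketch below uses canonical decomposition (NFD) to show the two spellings of the letter Za. This is a fact about Unicode normalization, not a description of how a font or shaping engine composes its nukta ligatures; the helper name `decompose` is hypothetical.

```python
import unicodedata

# GURMUKHI LETTER ZA as a single precomposed code point.
ZA = "\u0A5B"


def decompose(text):
    """Return the canonical decomposition (NFD) of the text."""
    return unicodedata.normalize("NFD", text)


for ch in decompose(ZA):
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))
```

A font that aims to render both spellings identically would presumably need to handle the base-plus-Nukta sequence as well as the precomposed letter.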

Ah, Af - Akhand half and full form respectively

V - Vattu glyph

Reph - Reph glyph

Cf, Kf - Glyph representing a full form of a consonant/generalized consonant

Ch, Kh - Half form of a (generalized) consonant

Cs, Ks - Below-base form of a (generalized) consonant

Cp, Kp - Post-base form of a (generalized) consonant

H - Halant character or glyph

M - Matra character or glyph

VO - Independent vowel

Lf - Consonant conjunct

Lh - Half form ligature (except Akhand), e.g. a vattu ligature

VM - Vowel modifier character or glyph

SM - Stress or tone mark (e.g. Udatta, Anudatta, Acute, Grave)

LM - Length mark

{ } - Indicates zero or more occurrences

[ ] - Indicates zero or one occurrence
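The notation above maps naturally onto regular expressions: { } corresponds to `(...)*` and [ ] corresponds to `(...)?`. The sketch below classifies Gurmukhi characters into notation symbols and tests them against one pattern, `{C [N] H} C [N] [M] [VM]`. Several things here are assumptions made for illustration: the pattern is not the syllable definition used by the shaping engine, the symbol "N" for the Nukta character is not part of the notation list, and the set of code points treated as vowel modifiers is a guess based on the Unicode Gurmukhi block.

```python
import re
import unicodedata

HALANT = 0x0A4D  # GURMUKHI SIGN VIRAMA
NUKTA = 0x0A3C   # GURMUKHI SIGN NUKTA
# Assumed vowel modifiers: Adak Bindi, Bindi, Visarga, Tippi.
VOWEL_MODIFIERS = {0x0A01, 0x0A02, 0x0A03, 0x0A70}


def classify(ch):
    """Map one character to a notation symbol ("X" means unclassified)."""
    cp = ord(ch)
    name = unicodedata.name(ch, "")
    if cp == HALANT:
        return "H"
    if cp == NUKTA:
        return "N"  # hypothetical symbol, not in the notation list
    if cp in VOWEL_MODIFIERS:
        return "VM"
    if name.startswith("GURMUKHI VOWEL SIGN"):
        return "M"
    if name.startswith("GURMUKHI LETTER"):
        # Independent vowels end at U+0A14; consonants start at U+0A15.
        return "VO" if cp <= 0x0A14 else "C"
    return "X"


def tokens(text):
    """Return the symbols for text, each followed by one space."""
    return "".join(classify(ch) + " " for ch in text)


# Illustrative pattern only: {C [N] H} C [N] [M] [VM]
SYLLABLE = re.compile(r"(?:C (?:N )?H )*C (?:N )?(?:M )?(?:VM )?")


def matches(text):
    """True if the whole text fits the illustrative pattern."""
    return SYLLABLE.fullmatch(tokens(text)) is not None


# Pa + Halant + Ra + vowel sign EE
print(tokens("\u0A2A\u0A4D\u0A30\u0A47"), matches("\u0A2A\u0A4D\u0A30\u0A47"))
```

Each symbol is followed by a space so that two-letter symbols such as VM and VO cannot be confused with the single-letter ones during matching.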
