Developing OpenType Fonts for Korean Hangul Script

Microsoft Typography
April 2003

This document presents information that will help font developers create or support OpenType fonts for the Korean Hangul script covered by the Unicode Standard.

Introduction

The Korean Hangul script is a 'syllabic' script. The syllables are formed by combining sequences of elemental, alphabetic consonants and vowels. The process of composing syllables is additive in nature and follows a set of predefined rules. The glyph elements of each composed syllable are shaped and positioned into a square display cell, often referred to as a 'syllable block', or 'syllable glyph'.

The Unicode Standard provides encodings for pre-composed Hangul syllables known as 'Modern Hangul', as well as encodings for individual Hangul alphabetic elements, called 'Jamo' and known as 'Old Hangul'. 'Modern Hangul' has 11,172 pre-composed characters, encoded at U+AC00 through U+D7A3 within the Hangul Syllables block (U+AC00 through U+D7AF). 'Old Hangul' syllables can be composed from the individual Hangul Jamos encoded in the Unicode Hangul Jamo block (U+1100 through U+11FF). Only the sequences of Jamo characters defined in Appendix B combine to form Old Hangul syllables. A sequence of character codes from the Hangul Jamo block that does not match any sequence pattern in Appendix B is treated as a sequence of individual non-Old Hangul characters.
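The count of 11,172 Modern Hangul syllables follows from the arithmetic composition defined in the Unicode Standard: 19 leading consonants, 21 vowels, and 28 trailing positions (27 trailing consonants plus "none"). The sketch below illustrates that arithmetic only; the function name `compose_modern` is hypothetical and is not part of Uniscribe or any font tool.

```python
# Constants from the Unicode Standard's Hangul syllable composition arithmetic.
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28  # T_COUNT includes "no trailing consonant"


def compose_modern(l, v, t=None):
    """Return the pre-composed syllable code point for modern Jamos l, v, [t]."""
    l_index = l - L_BASE
    v_index = v - V_BASE
    t_index = 0 if t is None else t - T_BASE
    valid_t = t is None or 1 <= t_index < T_COUNT
    if not (0 <= l_index < L_COUNT and 0 <= v_index < V_COUNT and valid_t):
        raise ValueError("not a Modern Hangul Jamo sequence")
    return S_BASE + (l_index * V_COUNT + v_index) * T_COUNT + t_index


print(L_COUNT * V_COUNT * T_COUNT)                    # 11172
print(hex(compose_modern(0x1112, 0x1161, 0x11AB)))    # 0xd55c
```

Jamo sequences outside these modern ranges raise `ValueError` here; those are the sequences that the Old Hangul shaping described in this document is concerned with.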

Hangul shaping

In this specification, font developers will learn how to address Old Hangul syllable formation, encode complex script features in their fonts, choose character sets, organize font information, and use existing tools to produce Old Hangul fonts. Registered features of the Korean Hangul script are defined and illustrated, encodings are listed, and templates are included for compiling Korean Hangul layout tables for OpenType fonts.

This document also presents information about the Korean OpenType shaping engine of Uniscribe, the Windows component responsible for text layout.

In addition to being a primer and specification for the creation and support of Hangul fonts, this document is intended to more broadly illustrate the OpenType Layout architecture, feature schemes, and operating system support for shaping and positioning text.

Glossary

The following terms are useful for understanding the layout features and script rules discussed in this document.

Jamo - An individual Hangul alphabetic element; the atomic unit of a syllable. Consonants and vowels are both known as Jamos.

Consonant - Represents a single consonant sound. Consonants are further divided into leading consonants and trailing consonants.

Vowel (Vowel Jamo) "Jungseong" - A phoneme; an independent unit in a syllable. It does not combine with any consonant to result in the transformation of any consonant-vowel combination.

Notation

The following notation is used in this document to illustrate layout operations:

L – Leading consonant

V – Vowel

T – Trailing consonant

S – Syllable

X – Non-Jamo character

{ } – Indicates 0, 1 or multiple occurrences

[ ] – Indicates 0 or 1 occurrence

() – Indicates 1 or multiple occurrences
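For readers who think in regular expressions, the three bracket notations correspond to the `*`, `?` and `+` quantifiers. The sketch below classifies code points into L, V, T and X using the code point ranges that appear in Appendix B, and translates a pattern written in this notation into a Python regular expression. The helper names and the sample pattern `(L)(V){T}` are illustrative assumptions, not patterns defined by this specification.

```python
import re

_OPEN = {"{": "(?:", "[": "(?:", "(": "(?:"}
_CLOSE = {"}": ")*", "]": ")?", ")": ")+"}


def jamo_class(cp):
    """Classify a code point using the ranges listed in Appendix B."""
    if 0x1100 <= cp <= 0x1159 or cp == 0x115F:   # leading consonants and Lf
        return "L"
    if 0x1160 <= cp <= 0x11A2:                   # Vf and vowels
        return "V"
    if 0x11A8 <= cp <= 0x11F9:                   # trailing consonants
        return "T"
    return "X"


def notation_to_regex(pattern):
    """Translate { }, [ ] and ( ) into the regex quantifiers *, ? and +."""
    return "".join(_OPEN.get(ch) or _CLOSE.get(ch) or ch for ch in pattern)


def matches(text, pattern):
    classes = "".join(jamo_class(ord(ch)) for ch in text)
    return re.fullmatch(notation_to_regex(pattern), classes) is not None


print(notation_to_regex("(L)(V){T}"))              # (?:L)+(?:V)+(?:T)*
print(matches("\u1107\u1109\u1110\u1169\u11AF", "(L)(V){T}"))  # True
print(matches("\u1100\u1161A", "(L)(V){T}"))       # False
```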



Shaping Engine

The Uniscribe Korean shaping engine processes text in stages. The stages are:

  1. Compose Old Hangul Jamo combinations
  2. Identify syllable boundaries with OTLS (OpenType Layout Services)
  3. Analyze the syllables
  4. Shape glyphs with OTLS

The descriptions which follow will help font developers understand the rationale for the Korean Hangul feature encoding model, and help application developers better understand how layout clients can divide responsibilities with operating system functions.

Compose Old Hangul Jamo combinations

The shaping engine receives a sequence of characters (a character run), which has been divided into sequences of leading consonant (L), vowel (V) and trailing consonant (T) Jamos. In each of these sequences, the shaping engine identifies the longest string of characters that can combine to form a registered Jamo. This is done according to the list of standard character combinations in Appendix B.

Next, it replaces that string with the corresponding Old Hangul Jamo. This process is repeated on the next longest string in the sequence. This process of identification and replacement is repeated for all sequences.

The result of this process is a string of registered Old Hangul Jamos like the example below:

V1L1L2L3V2V3T1T2T3L4L5V4T4V5V6L6V7
---> V1L1(L2L3)V2V3(T1T2T3)L4L5V4T4(V5V6)L6V7
---> V1L1(L23)V2V3(T123)L4L5V4T4(V56)L6V7
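The composition step can be sketched as a longest-first search within a run of same-class Jamos. The code below is a minimal illustration, not Uniscribe's implementation: `REGISTERED` holds only a handful of the Appendix B sequences, composed Jamos are represented as tuples of their component code points, and the function name `compose_run` is hypothetical.

```python
# A small excerpt of the Appendix B sequences, keyed by Jamo class.
REGISTERED = {
    "L": {(0x1107, 0x1109, 0x1110), (0x1105, 0x1100), (0x1105, 0x1100, 0x1100)},
    "V": {(0x1169, 0x1163, 0x1175), (0x1169, 0x1163)},
    "T": {(0x11AF, 0x11B7, 0x11C2), (0x11A8, 0x11AB)},
}


def compose_run(run, jamo_class):
    """Compose one run of same-class Jamo code points, longest sequences first."""
    items = [(cp,) for cp in run]             # each item is a tuple of code points
    table = REGISTERED[jamo_class]
    max_len = max(len(seq) for seq in table)
    for length in range(max_len, 1, -1):      # try the longest candidates first
        i = 0
        while i + length <= len(items):
            window = items[i:i + length]
            candidate = tuple(cp for item in window for cp in item)
            # Only merge items that are still single, uncomposed Jamos.
            if all(len(item) == 1 for item in window) and candidate in table:
                items[i:i + length] = [candidate]
            i += 1
    return items


def show(items):
    return [" + ".join(f"U+{cp:04X}" for cp in item) for item in items]


print(show(compose_run([0x1107, 0x1109, 0x1110], "L")))
# ['U+1107 + U+1109 + U+1110']
print(show(compose_run([0x1100, 0x1105, 0x1100, 0x1100], "L")))
# ['U+1100', 'U+1105 + U+1100 + U+1100']
```

In the second call the three-Jamo sequence is found before the shorter U+1105 + U+1100 pair is considered, which mirrors the "longest first, then next longest" order described above.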

Analyze the Syllables

The syllable unit that the shaping engine receives for the purpose of shaping is a sequence of Unicode characters. Since each Hangul syllable has the canonical format LVT, the fillers Lf and Vf are added to the registered Jamo sequence, where required, to convert each syllable to canonical form. The shaping engine then flags each of these for appropriate feature processing. OTLS will then be called to perform OpenType layout processing for each syllable in turn.

It is important to note that if any of the Jamo sequences being analyzed is capable of forming a Modern Hangul Syllable, the shaping engine does not apply OpenType features to shape them. Composition of Modern Hangul syllables is expected to be done using the pre-composed section (U+AC00 – U+D7AF), as described in the Unicode Standard.
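The two checks described in this section can be sketched as follows. The filler code points U+115F (Lf) and U+1160 (Vf) are the ones listed in Appendix B, and the modern Jamo ranges are those used by the Unicode Standard's syllable composition. The function names are hypothetical, and the sketch assumes each syllable has already been reduced to at most one L, one V and one T code point.

```python
L_FILLER, V_FILLER = 0x115F, 0x1160


def to_canonical(leading=None, vowel=None, trailing=None):
    """Return (L, V, T), substituting a filler for a missing L or V."""
    l = L_FILLER if leading is None else leading
    v = V_FILLER if vowel is None else vowel
    return (l, v, trailing)


def is_modern(l, v, t=None):
    """True if the Jamos can form a pre-composed Modern Hangul syllable."""
    return (0x1100 <= l <= 0x1112
            and 0x1161 <= v <= 0x1175
            and (t is None or 0x11A8 <= t <= 0x11C2))


syllable = to_canonical(vowel=0x1162)           # a vowel with no leading consonant
print([hex(cp) for cp in syllable if cp])       # ['0x115f', '0x1162']
print(is_modern(0x1100, 0x1161))                # True
print(is_modern(0x1113, 0x1162))                # False
```

A syllable for which `is_modern` returns true would bypass OpenType feature shaping, as the paragraph above explains; a syllable containing a filler never qualifies.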

Shaping with OTLS

The first step Uniscribe takes in shaping the character string is to map all characters to their nominal form glyphs.

Next, Uniscribe calls the OTL Services Library to shape the Old Hangul syllable. All OTL processing is divided into a set of predefined features (described and illustrated in the Features section of this document). Each feature is applied, one by one, to the appropriate glyphs in the syllable and OTLS processes them. Uniscribe makes as many calls to the OTL Services as there are features. This ensures that the features are executed in the desired order.

The steps of the shaping process are outlined below.

Shaping features:

  1. Language forms
    1. Apply feature 'ccmp' to preprocess any glyphs that require composition
    2. Apply feature 'ljmo' to get the leading consonant Jamo
    3. Apply feature 'vjmo' to get the vowel Jamo
    4. Apply feature 'tjmo' to get the trailing consonant Jamo
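The one-call-per-feature model above can be pictured with a small driver loop. This is only a sketch: `shape_syllable` and the callables in `font_features` are hypothetical stand-ins for the calls Uniscribe makes into OTLS, and the toy features merely record the order in which they run.

```python
FEATURE_ORDER = ("ccmp", "ljmo", "vjmo", "tjmo")


def shape_syllable(glyphs, font_features):
    """Apply each feature in the fixed order, one call per feature."""
    for tag in FEATURE_ORDER:
        apply = font_features.get(tag)   # a font may not implement every feature
        if apply is not None:
            glyphs = apply(glyphs)
    return glyphs


calls = []


def make_feature(tag):
    def apply(glyphs):
        calls.append(tag)                # record the call order for illustration
        return glyphs
    return apply


# The dictionary order is deliberately scrambled and 'ccmp' is omitted.
features = {tag: make_feature(tag) for tag in ("tjmo", "ljmo", "vjmo")}
shape_syllable(["L", "V", "T"], features)
print(calls)                             # ['ljmo', 'vjmo', 'tjmo']
```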

Handling Invalid Combining Marks

Combining marks and signs that appear in text not in conjunction with a valid consonant base are considered invalid. When an invalid combination of letters is encountered, Uniscribe simply starts a new syllable/cluster.

Please note that to render a sign standalone (in apparent isolation from any base) one should apply it on a space (see section 2.5 'Combining Marks' of Unicode Standard 3.1). Uniscribe requires a ZWJ to be placed between the space and a mark for them to combine into a standalone sign.

img

While not required for OpenType functionality, the ZWJ (zero width joiner; U+200D), the ZWNJ (zero width non-joiner; U+200C) and the ZWSP (zero width space; U+200B) are recommended for inclusion in Korean Hangul fonts.
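A font's character map can be checked for these three characters with a few lines of code. The sketch below assumes `cmap` is any mapping from code point to glyph name (for example, the kind of dictionary that fontTools' `TTFont.getBestCmap()` returns); the function names and sample glyph names are hypothetical. It also shows the space + ZWJ + mark sequence described above for rendering a standalone sign.

```python
RECOMMENDED = {0x200B: "ZWSP", 0x200C: "ZWNJ", 0x200D: "ZWJ"}


def missing_recommended(cmap):
    """Return the names of recommended characters absent from the cmap."""
    return [name for cp, name in sorted(RECOMMENDED.items()) if cp not in cmap]


def standalone_mark(mark):
    """Build the space + ZWJ + mark sequence for displaying a mark on its own."""
    return "\u0020\u200D" + mark


sample_cmap = {0x0020: "space", 0x200D: "zwj"}   # illustrative data only
print(missing_recommended(sample_cmap))           # ['ZWSP', 'ZWNJ']
print([f"U+{ord(ch):04X}" for ch in standalone_mark("\u302E")])
# ['U+0020', 'U+200D', 'U+302E']
```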



Features

The features listed below have been defined to create the basic forms for the languages that are supported on Korean Hangul systems. Regardless of the model an application chooses for supporting layout of complex scripts, Uniscribe requires a fixed order for executing features within a run of text to consistently obtain the proper basic form. This is achieved by calling features one-by-one in the standard order listed below.

The order of the lookups within each feature is also very important. For more information on lookups and defining features in OpenType fonts, see the Encoding section of the OpenType Development document.

The standard order for applying Korean Hangul features encoded in OpenType fonts:

Feature   Feature function                                   Layout operation   Required
Language based forms:
ccmp      Character composition/decomposition substitution   GSUB
ljmo      Leading consonant Jamo                             GSUB               X
vjmo      Vowel Jamo                                         GSUB               X
tjmo      Trailing consonant Jamo                            GSUB               X

[GSUB = glyph substitution, GPOS = glyph positioning]

Descriptions and examples of above features

Character composition (and decomposition)

Feature Tag: "ccmp"

The 'ccmp' feature is used to compose a number of glyphs into one glyph (GSUB lookup type 4). This feature is applied before any other feature because there may be times when a font vendor wants to control certain shaping of glyphs.

This feature permits the composition of Old Hangul Jamos corresponding to the sequences described in Appendix B. To compose Old Hangul syllables, these Jamo glyphs are then substituted with the appropriate forms using the 'ljmo', 'vjmo' and 'tjmo' features. The 'ccmp' feature should be applied before any other feature, so that these actions are given topmost priority. It is applicable to each of the leading, vowel and trailing Jamo sequences.

For example, the sequence of leading Jamos below (U+1107 + U+1109 + U+1110) is composed with the 'ccmp' feature.
img

Leading consonant Jamo

Feature Tag: "ljmo"

The 'ljmo' feature is used to substitute the correct shape of a leading consonant Jamo for a Hangul syllable. The shaping of leading consonant Jamos is context based and depends on whether the leading Jamo is followed by a vowel Jamo alone or a sequence of vowel and trailing Jamo.

For example, the leading Jamo (U+1113) is replaced by the correct leading form when followed by a vowel Jamo alone.
img

Vowel Jamo

Feature Tag: "vjmo"

The 'vjmo' feature is used to substitute the correct shape of a vowel Jamo for a Hangul syllable. The shaping of vowel Jamos is context based and depends on whether the vowel Jamo is preceded by a leading Jamo alone, or is preceded by a leading Jamo and followed by a trailing Jamo.

For example, the Hangul vowel Jungseong AE (U+1162) is replaced by the correct form when preceded by a leading Jamo alone.
img

Trailing consonant Jamo

Feature Tag: "tjmo"

The 'tjmo' feature is used to substitute the correct shape of a trailing consonant Jamo for a Hangul syllable. The shaping of trailing consonant Jamos is context based and depends on whether the trailing Jamo is preceded by a leading Jamo filler and vowel Jamo or by a leading Jamo and vowel Jamo.

For example, U+11C7 is replaced by the correct trailing consonant form when preceded by a leading Jamo and vowel Jamo.
img
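The contexts that the 'ljmo', 'vjmo' and 'tjmo' descriptions distinguish can be summarised in a small function. This is a sketch of the context test only; the labels "LV", "LVT" and "LfVT" and the name `select_contexts` are hypothetical, and a real font expresses these choices through contextual GSUB lookups rather than code.

```python
L_FILLER = 0x115F


def select_contexts(leading, vowel, trailing=None):
    """Return the shaping context each Jamo feature would see for a syllable."""
    has_trailing = trailing is not None
    contexts = {
        # Leading form depends on what follows: a vowel alone, or vowel + trailing.
        "ljmo": "LVT" if has_trailing else "LV",
        # Vowel form depends on whether a trailing Jamo follows it.
        "vjmo": "LVT" if has_trailing else "LV",
    }
    if has_trailing:
        # Trailing form depends on whether the leading slot holds the filler.
        contexts["tjmo"] = "LfVT" if leading == L_FILLER else "LVT"
    return contexts


print(select_contexts(0x1113, 0x1162))
# {'ljmo': 'LV', 'vjmo': 'LV'}
print(select_contexts(0x1101, 0x1161, 0x11C7))
# {'ljmo': 'LVT', 'vjmo': 'LVT', 'tjmo': 'LVT'}
```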

More Examples

1. Old Hangul Jamo containing leading consonants, vowels and trailing Jamos.

Input sequence: Choseong Pieup (U+1107), Choseong Sios (U+1109), Choseong Thieuth (U+1110), Jungseong O (U+1169), Jungseong Ya (U+1163), Jungseong I (U+1175), Jongseong Rieul (U+11AF), Jongseong Mieum (U+11B7), Jongseong Hieuh (U+11C2).
img

'ccmp' feature applied:
img

'ljmo', 'vjmo' and 'tjmo' features applied:
img

2. Leading consonant Jamo + vowel Jamo + trailing Jamo.

Input sequence: Choseong Ssangkiyeok (U+1101), Jungseong A (U+1161), Jongseong Nieun-Sios (U+11C7).
img

'ljmo', 'vjmo' and 'tjmo' features applied:
img

3. Leading consonant Jamo + vowel Jamo

Input sequence: Choseong Nieun-Kiyeok (U+1113), Jungseong Ae (U+1162).
img

'ljmo' and 'vjmo' features applied:
img



Appendices

Appendix A: Writing System Tags

Features are encoded according to both a designated script and language system. The language system tag specifies a typographic convention associated with a language or linguistic subgroup.

Currently, the Uniscribe engine only supports the "default" language for each script. However, font developers may want to build language specific features which are supported in other applications and will be supported in future Microsoft OpenType implementations.

* NOTE: It is strongly recommended to include the "dflt" language tag in all OpenType fonts because it defines the basic script handling for a font. The "dflt" language system is used as the default if no other language specific features are defined or if the application does not support that particular language. If the "dflt" tag is not present for the script being used, the font may not work in some applications.

The following tables list the registered tag names for scripts and language systems.

Registered tags for the Korean Hangul script

Script tag   Script
"hang"       Korean Hangul

Registered tags for Korean Hangul language systems

Language system tag   Language
"dflt"                *default script handling
"KOR "                Korean

Note: both the script and language tags are case sensitive (script tags should be lowercase, language tags are all caps) and must contain four characters (i.e., you must add a space to the three-character language tags).
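The four-character rule is easy to get wrong when tags are typed by hand. A minimal sketch of a helper that pads a tag to four bytes is shown below; `make_tag` is a hypothetical name, not a function from any OpenType tool.

```python
def make_tag(name):
    """Pad a script or language system tag with spaces to exactly four bytes."""
    if not 1 <= len(name) <= 4 or not name.isascii():
        raise ValueError("a tag must be 1 to 4 ASCII characters")
    return name.ljust(4).encode("ascii")


print(make_tag("hang"))   # b'hang'
print(make_tag("KOR"))    # b'KOR '
```

Case is preserved rather than normalised, since the tags are case sensitive.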

Appendix B: Standard Composition for Old Hangul Jamos

Leading Consonants
Code point Glyph   Code point Glyph   Code point Glyph
U+115F              
U+1100 img            
U+1101 img            
U+1102 img            
U+1113 img            
U+1114 img            
U+1115 img            
U+1116 img            
U+1102 img + U+1109 img      
U+1102 img + U+110C img      
U+1102 img + U+1112 img      
U+1103 img            
U+1117 img            
U+1104 img            
U+1103 img + U+1105 img      
U+1103 img + U+1106 img      
U+1103 img + U+1107 img      
U+1103 img + U+1109 img      
U+1103 img + U+110C img      
U+1105 img            
U+1105 img + U+1100 img      
U+1105 img + U+1100 img + U+1100 img
U+1118 img            
U+1105 img + U+1103 img      
U+1105 img + U+1103 img + U+1103 img
U+1119 img            
U+1105 img + U+1106 img      
U+1105 img + U+1107 img      
U+1105 img + U+1107 img + U+1107 img
U+1105 img + U+112B img      
U+1105 img + U+1109 img      
U+1105 img + U+110C img      
U+1105 img + U+110F img      
U+111A img            
U+111B img            
U+1106 img            
U+1106 img + U+1100 img      
U+1106 img + U+1103 img      
U+111C img            
U+1106 img + U+1109 img      
U+111D img            
U+1107 img            
U+111E img            
U+111F img            
U+1120 img            
U+1108 img            
U+1121 img            
U+1122 img            
U+1123 img            
U+1124 img            
U+1125 img            
U+1126 img            
U+1107 img + U+1109 img + U+1110 img
U+1127 img            
U+1128 img            
U+1107 img + U+110F img      
U+1129 img            
U+112A img            
U+1107 img + U+1112 img      
U+112B img            
U+112C img            
U+1109 img            
U+112D img            
U+112E img            
U+112F img            
U+1130 img            
U+1131 img            
U+1132 img            
U+1133 img            
U+110A img            
U+1109 img + U+1109 img + U+1107 img
U+1134 img            
U+1135 img            
U+1136 img            
U+1137 img            
U+1138 img            
U+1139 img            
U+113A img            
U+113B img            
U+113C img            
U+113D img            
U+113E img            
U+113F img            
U+1140 img            
U+110B img            
U+1141 img            
U+1142 img            
U+110B img + U+1105 img      
U+1143 img            
U+1144 img            
U+1145 img            
U+1146 img            
U+1147 img            
U+1148 img            
U+1149 img            
U+114A img            
U+114B img            
U+110B img + U+1112 img      
U+114C img            
U+110C img            
U+114D img            
U+110D img            
U+110C img + U+110C img + U+1112 img
U+114E img            
U+114F img            
U+1150 img            
U+1151 img            
U+110E img            
U+1152 img            
U+1153 img            
U+1154 img            
U+1155 img            
U+110F img            
U+1110 img            
U+1110 img + U+1110 img      
U+1111 img            
U+1156 img            
U+1111 img + U+1112 img      
U+1157 img            
U+1112 img            
U+1112 img + U+1109 img      
U+1158 img            
U+1159 img            
U+1159 img + U+1159 img      
Vowels
Code point Glyph   Code point Glyph   Code point Glyph
U+1160              
U+1161 img            
U+1176 img            
U+1177 img            
U+1161 img + U+1173 img      
U+1162 img            
U+1163 img            
U+1178 img            
U+1179 img            
U+1163 img + U+116E img      
U+1164 img            
U+1165 img            
U+117A img            
U+117B img            
U+117C img            
U+1166 img            
U+1167 img            
U+1167 img + U+1163 img      
U+117D img            
U+117E img            
U+1168 img            
U+1169 img            
U+116A img            
U+116B img            
U+1169 img + U+1163 img      
U+1169 img + U+1163 img + U+1175 img
U+117F img            
U+1180 img            
U+1169 img + U+1167 img      
U+1181 img            
U+1182 img            
U+1169 img + U+1169 img + U+1175 img
U+1183 img            
U+116C img            
U+116D img            
U+116D img + U+1161 img      
U+116D img + U+1161 img + U+1175 img
U+1184 img            
U+1185 img            
U+116D img + U+1165 img      
U+1186 img            
U+1187 img            
U+1188 img            
U+116E img            
U+1189 img            
U+118A img            
U+116F img            
U+118B img            
U+1170 img            
U+116E img + U+1167 img      
U+118C img            
U+118D img            
U+1171 img            
U+116E img + U+1175 img + U+1175 img
U+1172 img            
U+118E img            
U+1172 img + U+1161 img + U+1175 img
U+118F img            
U+1190 img            
U+1191 img            
U+1192 img            
U+1172 img + U+1169 img      
U+1193 img            
U+1194 img            
U+1173 img            
U+1173 img + U+1161 img      
U+1173 img + U+1165 img      
U+1173 img + U+1165 img + U+1175 img
U+1173 img + U+1169 img      
U+1195 img            
U+1196 img            
U+1174 img            
U+1197 img            
U+1175 img            
U+1198 img            
U+1199 img            
U+1175 img + U+1163 img + U+1169 img
U+1175 img + U+1163 img + U+1175 img
U+1175 img + U+1167 img      
U+1175 img + U+1167 img + U+1175 img
U+119A img            
U+1175 img + U+1169 img + U+1175 img
U+1175 img + U+116D img      
U+119B img            
U+1175 img + U+1172 img      
U+119C img            
U+1175 img + U+1175 img      
U+119D img            
U+119E img            
U+119E img + U+1161 img      
U+119F img            
U+119E img + U+1165 img + U+1175 img
U+11A0 img            
U+11A1 img            
U+11A2 img            
Trailing Consonants
Code point Glyph   Code point Glyph   Code point Glyph
U+11A8 img            
U+11A9 img            
U+11A8 img + U+11AB img      
U+11C3 img            
U+11A8 img + U+11B8 img      
U+11AA img            
U+11C4 img            
U+11A8 img + U+11BE img      
U+11A8 img + U+11BF img      
U+11A8 img + U+11C2 img      
U+11AB img            
U+11C5 img            
U+11AB img + U+11AB img      
U+11C6 img            
U+11AB img + U+11AF img      
U+11C7 img            
U+11C8 img            
U+11AC img            
U+11AB img + U+11BE img      
U+11C9 img            
U+11AD img            
U+11AE img            
U+11CA img            
U+11AE img + U+11AE img      
U+11AE img + U+11AE img + U+11B8 img
U+11CB img            
U+11AE img + U+11B8 img      
U+11AE img + U+11BA img      
U+11AE img + U+11BA img + U+11A8 img
U+11AE img + U+11BD img      
U+11AE img + U+11BE img      
U+11AE img + U+11C0 img      
U+11AF img            
U+11B0 img            
U+11AF img + U+11A8 img + U+11A8 img
U+11CC img            
U+11AF img + U+11A8 img + U+11C2 img
U+11CD img            
U+11CE img            
U+11CF img            
U+11D0 img            
U+11AF img + U+11AF img + U+11BF img
U+11B1 img            
U+11D1 img            
U+11D2 img            
U+11AF img + U+11B7 img + U+11C2 img
U+11B2 img            
U+11AF img + U+11B8 img + U+11AE img
U+11D3 img            
U+11AF img + U+11B8 img + U+11C1 img
U+11D4 img            
U+11D5 img            
U+11B3 img            
U+11D6 img            
U+11D7 img            
U+11AF img + U+11F0 img      
U+11D8 img            
U+11B4 img            
U+11B5 img            
U+11B6 img            
U+11D9 img            
U+11AF img + U+11F9 img + U+11C2 img
U+11AF img + U+11BC img      
U+11B7 img            
U+11DA img            
U+11B7 img + U+11AB img      
U+11B7 img + U+11AB img + U+11AB img
U+11DB img            
U+11B7 img + U+11B7 img      
U+11DC img            
U+11B7 img + U+11B8 img + U+11BA img
U+11DD img            
U+11DE img            
U+11DF img            
U+11B7 img + U+11BD img      
U+11E0 img            
U+11E1 img            
U+11E2 img            
U+11B8 img            
U+11B8 img + U+11AE img      
U+11E3 img            
U+11B8 img + U+11AF img + U+11C1 img
U+11B8 img + U+11B7 img      
U+11B8 img + U+11B8 img      
U+11B9 img            
U+11B8 img + U+11BA img + U+11AE img
U+11B8 img + U+11BD img      
U+11B8 img + U+11BE img      
U+11E4 img            
U+11E5 img            
U+11E6 img            
U+11BA img            
U+11E7 img            
U+11E8 img            
U+11E9 img            
U+11BA img + U+11B7 img      
U+11EA img            
U+11BA img + U+11E6 img      
U+11BB img            
U+11BA img + U+11BA img + U+11A8 img
U+11BA img + U+11BA img + U+11AE img
U+11BA img + U+11EB img      
U+11BA img + U+11BD img      
U+11BA img + U+11BE img      
U+11BA img + U+11C0 img      
U+11BA img + U+11C2 img      
U+11EB img            
U+11EB img + U+11B8 img      
U+11EB img + U+11E6 img      
U+11BC img            
U+11EC img            
U+11ED img            
U+11BC img + U+11B7 img      
U+11BC img + U+11BA img      
U+11EE img            
U+11EF img            
U+11BC img + U+11C2 img      
U+11F0 img            
U+11F0 img + U+11A8 img      
U+11F1 img            
U+11F2 img            
U+11F0 img + U+11BF img      
U+11F0 img + U+11C2 img      
U+11BD img            
U+11BD img + U+11B8 img      
U+11BD img + U+11B8 img + U+11B8 img
U+11BD img + U+11BD img      
U+11BE img            
U+11BF img            
U+11C0 img            
U+11C1 img            
U+11F3 img            
U+11C1 img + U+11BA img      
U+11C1 img + U+11C0 img      
U+11F4 img            
U+11C2 img            
U+11F5 img            
U+11F6 img            
U+11F7 img            
U+11F8 img            
U+11F9 img