Summary: This article discusses the ins and outs of creating a globalized and localizable Web site. The authors give tips about code page decisions, layered graphics, content mechanisms, and an understanding of how HTML and CSS technology can be used in globalization. (20 printed pages.)
Contents
| • | Introduction |
| • | Character Encoding |
| • | The Design |
| • | The Content |
| • | Summary |
| • | Links to Explore |
| • | About the Authors |
Introduction
If you've read our first article, The Localization Process: Globalizing Your Code and Localizing Your Site (http://www.microsoft.com/technet/archive/ittasks/plan/sysplan/glolocal.mspx), you should already understand some of the issues involved in creating an international Web site. We will continue by discussing some of the issues in the design of global Web sites and how you can avoid some of the standard errors we mentioned in the first article.
By definition, any site that is on the World Wide Web is international. However, we define a global site as one that can present information to users in a format they are accustomed to; this includes language, formatting conventions, market-specific data, and so on. For example, we once saw an Asian Web site that used a GIF of a pair of shoes instead of a house to represent the home page. For that market, a pair of shoes had more cultural resonance to the user than a house.
However, globalization cannot provide answers to innate cultural issues, such as how problem solving is approached. Although your own cultural issues may profoundly affect how site architecture is designed, these issues are difficult to define and address on a worldwide scale. In this article, we will concentrate on specific globalization issues.
Character Encoding
One of the first decisions you will need to make when designing a Web site is which character encoding to use. What follows is a short discussion on what character encoding is and some pros and cons of the types of character encoding.
The Evolution of Character Encoding
A computer uses numeric codes in its memory to represent characters so that text can be stored and reproduced. Each code (or code point) identifies which character occupies a given position in the text, without indicating anything about the character's shape, size, color, and so on.
Early computers structured their memory into units; one unit represented one character. In different computers, these units were different sizes. This led to two problems:
| • | Different computers could store different numbers of code points. |
| • | Different computers would store different characters at the same code point. |
This made it difficult to exchange information between computers, so early on the computer industry settled on a de facto standard for the length of a unit, or byte: the 7-bit byte. The 7-bit byte allowed 128 different characters to be encoded. This coding scheme handled uppercase English letters and extended to lowercase characters as well. It also included punctuation, numbers, and some control codes. The coding scheme came to be called the American Standard Code for Information Interchange, or more familiarly, ASCII.
ASCII was efficient when computing was mainly an English-language preserve and text-based computing was relatively unimportant. However, as computing advanced, the byte grew to 8 bits, leaving room to store 256 codes, and extended ASCII (often loosely called ANSI) was born. Internationalization and the concept of code pages also began at this time. Code pages are alternative sets of 256 characters that can be used in different parts of the world or for different languages. By then, the terms code page and character set had become virtually synonymous.
It became essential for computers to recognize and manipulate text in different languages. This was easily accomplished for single-language, single-machine use. However, when information had to be sent between computers, problems occurred: if the two computers used different character sets, a code point that represented one character on the sending computer could represent a different character on the receiving computer.
The problem intensified when mixing text in several languages and correctly representing and interfiling these languages was required. Another twist to the tale occurred because of the ideographic languages of the Far East, where the number of characters is large (the Chinese language has between 7,000 and 13,000 basic characters). These languages would obviously not fit into 128 code points, no matter how clever the programmer was. It was necessary to invent another encoding method.
The most severe problem with these encoding methods was the lack of sufficient code points in the code space provided by the 8-bit byte. Because of this problem, code pages were invented and different encoding standards were produced. The Chinese, Japanese, and Korean (CJK) character sets needed a lot of space, so 16-bit codes were invented. Unfortunately, at least four different encodings were invented. This meant that transferring text in, for example, Chinese could be accomplished only with massive lookup tables that converted the character encodings from "foreign" to local before the local system could display the message.
At this point, Unicode was invented to provide a unified encoding standard. The Unicode Consortium (http://www.unicode.org/) is made up of a number of computer companies, standards organizations, and bibliographic interests.
Unicode is a character encoding system for most of the world's characters. With Unicode, each character is encoded as a single code point, which allows computer systems to exchange text information unambiguously. However, Unicode is not:
| • | A sorting sequence. |
| • | A glyph definition. |
| • | A string comparison mechanism. |
| • | A text formatting control mechanism. |
| • | A language definition. |
| • | A character conversion mechanism. |
Each of the above functions is necessary for the complete handling of text documents and, as such, there are separate standards for each function. In many cases (as with character encoding), there are multiple standards that must be reconciled.
A commonly used encoding on the Web is UTF-8, which is an encoding form of Unicode rather than a separate code page. In UTF-8, each 16-bit Unicode character is encoded as a sequence of one, two, or three 8-bit bytes, depending on the value of the character. (Unicode characters beyond the 16-bit range take four bytes and are not covered here.) The following table shows the format of such UTF-8 byte sequences (where the "free bits" shown by x's in the table are combined in the order shown, and interpreted from most significant to least significant).
Binary format of bytes in sequence
| 1st Byte | 2nd Byte | 3rd Byte | Number of Free Bits | Maximum Expressible Unicode Value |
0xxxxxxx | | | 7 | 007F hex (127) |
110xxxxx | 10xxxxxx | | (5+6)=11 | 07FF hex (2047) |
1110xxxx | 10xxxxxx | 10xxxxxx | (4+6+6)=16 | FFFF hex (65535) |
The value of each individual byte indicates its UTF-8 function, as follows:
| • | 00 to 7F hex (0 to 127): first and only byte of a sequence. |
| • | 80 to BF hex (128 to 191): continuing byte in a multi-byte sequence. |
| • | C2 to DF hex (194 to 223): first byte of a two-byte sequence. |
| • | E0 to EF hex (224 to 239): first byte of a three-byte sequence. |
Note: UTF-8 remains a simple, single-byte, ASCII-compatible encoding method as long as no characters above 127 are directly present. This means that an HTML document declared as UTF-8 can remain a normal single-byte ASCII/ISO-8859-1 file. The document can remain so even though it may contain Unicode characters above 255, as long as all characters above 127 are referred to indirectly by ampersand entities.
Examples of encoded Unicode characters (in hexadecimal notation)
| 16-bit Unicode | UTF-8 Sequences |
0001 | 01 |
007F | 7F |
0080 | C2 80 |
07FF | DF BF |
0800 | E0 A0 80 |
FFFF | EF BF BF |
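The byte layouts in the tables above can be checked with a few lines of script. What follows is a minimal JavaScript sketch rather than a production encoder: the names encodeUtf8 and toHex are our own, it handles only the 16-bit range discussed in this article, and it does not reject reserved values.

```javascript
// Encode one 16-bit Unicode value (0000-FFFF hex) as UTF-8 bytes.
// encodeUtf8 and toHex are illustrative names, not part of any library.
function encodeUtf8(codePoint) {
  if (!Number.isInteger(codePoint) || codePoint < 0 || codePoint > 0xFFFF) {
    throw new RangeError("This sketch only handles values 0000-FFFF hex");
  }
  if (codePoint <= 0x7F) {
    // 0xxxxxxx: 7 free bits
    return [codePoint];
  }
  if (codePoint <= 0x7FF) {
    // 110xxxxx 10xxxxxx: 5 + 6 free bits
    return [0xC0 | (codePoint >> 6), 0x80 | (codePoint & 0x3F)];
  }
  // 1110xxxx 10xxxxxx 10xxxxxx: 4 + 6 + 6 free bits
  return [
    0xE0 | (codePoint >> 12),
    0x80 | ((codePoint >> 6) & 0x3F),
    0x80 | (codePoint & 0x3F),
  ];
}

// Show a byte array as two-digit uppercase hex values.
function toHex(bytes) {
  return bytes
    .map((b) => b.toString(16).toUpperCase().padStart(2, "0"))
    .join(" ");
}

console.log(toHex(encodeUtf8(0x0080))); // C2 80
console.log(toHex(encodeUtf8(0x0800))); // E0 A0 80
```

Each branch corresponds to one row of the binary format table; the first byte announces the length of the sequence, and every continuing byte carries six more free bits.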
Native Encoding vs. UTF-8
Two popular encoding schemes are native encoding and Unicode or UTF-8 encoding. Native encoding here refers to the individual standards for a given language or set of languages, such as ANSI for English. The choice of which encoding scheme to use is a very important one. While some applications are perfect candidates for Unicode, others are not. Choosing the wrong character encoding method could doom a project to failure from the start.
Ultimately, your decision will be based on your site architecture and what browsers you must support. Both encoding schemes have advantages and disadvantages, some of which are listed below.
| | Native Encoding | UTF-8 |
Support one encoding | | X |
Supported by 3.0 browsers | X | |
Small download for all languages | X | |
Small download for 1252 languages | X | X |
Works well with Netscape 4.0 | X | |
Universal system for all back-end architecture | | X |
New Web technologies easily and automatically support all Windows languages without conversion metrics | | X |
If native encoding is selected, your site architecture must support all markets in which you are planning to launch your site. In fact, it's a good idea to ensure that your site works for all code pages (with the possible exception of the bidirectional languages, which involve many other issues) so you do not have to fix major bugs if you later decide to launch in a new market.
Note: Bidirectional languages are languages that generally read from right to left. However, numbers and some embedded text (English words, for example) are read from left to right, just as in English; therefore, these languages are considered "bidirectional." They include Arabic and Hebrew.
Meta Character Set
If the author of an HTML page doesn't specify the character set (charset) information in the header of every Web page, the user may have to manually adjust the browser's encoding options to view the page correctly, and any forms contained in the page may not work properly (as you will see later). This is especially true for older Web browsers. Below are some examples of the charset tag.
Traditional Chinese (Big5)
<META HTTP-EQUIV="Content-Type" content="text/html; charset=Big5">
UTF-8 (Universal)
<META HTTP-EQUIV="Content-Type" content="text/html; charset=UTF-8">
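Because a missing charset declaration is easy to overlook, it can help to check pages automatically before they go to localization. The sketch below is one hedged way to do that in JavaScript; declaredCharset is a hypothetical helper of our own, and the regular expression is a simplification that only looks for a charset value inside a META tag.

```javascript
// Return the charset named in a page's META tag, or null if none is found.
// declaredCharset is an illustrative helper, not a browser or library API.
function declaredCharset(html) {
  const match = /<meta\b[^>]*\bcharset\s*=\s*["']?([\w.:-]+)/i.exec(html);
  return match ? match[1] : null;
}

const page =
  '<HTML><HEAD>' +
  '<META HTTP-EQUIV="Content-Type" content="text/html; charset=Big5">' +
  '</HEAD><BODY></BODY></HTML>';

console.log(declaredCharset(page));                         // Big5
console.log(declaredCharset("<HTML><HEAD></HEAD></HTML>")); // null
```

A page that returns null here is a candidate for the manual browser adjustments described above.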
Special Characters
HTML standards allow for named and numbered entities. These special codes let authors specify characters that are hard to type or that the page's encoding cannot represent directly. For example, "&aelig;" and "&#230;" both reference the same character, æ. Tables of these entities are readily available at http://www.w3.org/TR/WD-html40-970708/sgml/entities.html. However, we don't recommend using named entities, as they don't work consistently in all browsers, operating systems, or fonts.
There are a number of ways to enter extended characters (used generically to refer to any accented character, such as é or ñ or ö, and sometimes Asian characters):
| • | Use a language-specific keyboard setting. If you install a French operating system, for example, the default is to use the French keyboard that has "é" and other characters on it. You can also switch your keyboard settings in the Microsoft® Windows® control panel. |
| • | Hold down the ALT key while typing in a four-digit number between 0128 and 0255 on the keypad of your keyboard. For <Alt>+0255 the "ÿ" character will appear. (This works for Western European languages only. A table of extended characters can be found at http://www.htmlhelp.com/reference/charset. The ANSI number is what you type on the numeric keypad while holding down the ALT key.) |
| • | A tool can be installed during Windows Setup called Character Map. Windows Setup will place Character Map in your Accessories menu. This tool graphically shows you characters and allows you to cut and paste them from the tool. |
| • | Another way to enter double-byte character set (DBCS) characters is with an Input Method Editor (IME) called the Global IME (http://www.microsoft.com/windows/ie/downloads/recommended/ime/default.asp). This is a component designed specifically for Internet Explorer 4.0 (also available for Internet Explorer 5.0). The Global IME is activated in the same way as the traditional IME (ALT+~), but you must first left-click on the Task Tray in the bottom right corner of the screen to activate it. An "EN" will appear after it is installed, and Internet Explorer will be the active window. Unlike a traditional IME running on a Far East operating system, the Global IME can enter characters only into Internet Explorer. It cannot be used with Netscape, there is no equivalent for Netscape browsers, and it is not possible to enter DBCS characters in Netscape on a U.S. Microsoft Windows platform. |
If you must use an entity, numerical entities enjoy more widespread support, so it is generally safer to use the entity number if you cannot guarantee your viewers' browser version.
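If you do decide to use numbered entities, the conversion is mechanical. Here is a minimal JavaScript sketch, assuming the text is already available as a script string; toNumericEntities is our own name for the helper.

```javascript
// Replace every character above 127 with its numbered entity so the
// resulting markup is pure ASCII. toNumericEntities is an illustrative name.
function toNumericEntities(text) {
  let out = "";
  for (const ch of text) {
    // for...of walks the string one Unicode character at a time
    const code = ch.codePointAt(0);
    out += code > 127 ? "&#" + code + ";" : ch;
  }
  return out;
}

// \u00E9 is é, \u00C6 is Æ, \u00F8 is ø
console.log(toNumericEntities("Caf\u00E9 \u00C6r\u00F8"));
// Caf&#233; &#198;r&#248;
```

The output contains only characters below 128, so it can be saved as a plain ASCII file regardless of the encoding the page declares.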
The Design
Creating an HTML page is easy, but creating a nice-looking HTML page is a challenge. Making sure this nice-looking page is also localizable is an even bigger challenge. Why would you do this, anyway? What makes Web sites so powerful for corporations these days is that they can serve a broad international customer base. Take http://www.microsoft.com, for example: product information is available in every language into which the products are localized.
To make the work doable, it is best to use a template so that you don't need to redesign the page for each language. In the following sections, we will tell you some things to watch out for when creating a template for localized Web pages.
Graphics
Most localizable graphics consist of text on top of some sort of structured background. To localize the text, you need access to the text alone. In a .gif or .jpeg file, the text and the background are in the same layer, so changing the text means altering the background as well. If the background is just a plain color, you can easily replace the text; a structured background makes this much harder.
[Figure: example of a graphic with text on a structured background]
Localizable graphics are commonly handed off as Adobe Photoshop files (.psd files). Photoshop supports layers, which means localizable text can go into a separate layer and the other components of the file don't have to be touched.
[Figure: the layers of localizable text]
Instead of using text-embedded GIFs in your pages, you could use a GIF with just a background and position the text on top of it.
Here is an example of how to position text on top of an image:
<IMG SRC="image.gif" WIDTH=100 HEIGHT=100>
<SPAN STYLE="position:relative;top:-50px;left:-90px;width:80px">Text on Top</SPAN>
If you do have to include text-embedded files, the following guidelines apply when creating your graphics:
| • | Provide a well-documented, layered Photoshop source file. Document which font and colors are used. |
| • | Keep in mind that text within the graphic is typically longer in localized languages (about 30 percent on average) than in English, so allow room for the text to expand. |
Note: Because working with graphics requires extra skills, expect a higher cost for localizing graphics. These guidelines apply to both HTML and software projects.
Tables: To Wrap or Not to Wrap
Tables will almost certainly cause localization problems. Because localized text is on average 30 percent longer than English, text that fits in a table on a U.S. Web page might be too long to fit in the table on a localized page. The result is that the table is pushed wider and all text in the table conforms to the new width as well. There is no way to make the table stick to its defined size in this case. If the table (or one of its cells) uses a background image, the image will not stretch; instead, it repeats to fill the extra space.
[Figure: screenshot of a table (background repeated) that is stretched because one of the strings in it doesn't fit]
This code shows how the stretching problems become worse with nested tables:
<TABLE> <TR> <TD> <TABLE> <TR> <TD> </TD> <TD> </TD> <TD> </TD> </TR> <TR> <TD> <TABLE> <TR> <TD> </TD> <TD> </TD> </TR> </TABLE> </TD> <TD> </TD> <TD> </TD> </TR> </TABLE> </TD> </TR> </TABLE>
In this sample, it's hard to know how to change the code when something wraps (and we've even left out extra attributes and values!). Structuring the code might bring some relief:
<TABLE>
  <TR>
    <TD>
      <TABLE>
        <TR>
          Etc.
Even then, a nested structure can be hard to debug. There are a few things you can do to make life easier when creating your Web page.
| • | Include HTML comments to specify when a new section starts. Example: <!-- Start of toolbar --> |
| • | Keep your layout simple. If you use tables, don't nest into more than two layers. |
| • | Use DHTML technology. With DHTML, you can position areas with absolute or relative positioning. Make sure you leave enough space for localized text; specifying both start and end points and all dimensions (top, left, width) will make the area nonexpandable. |
Using DHTML for Layout
Above, you've learned that nested tables can be difficult to localize. However, tables themselves aren't difficult at all! You can use them in combination with DHTML elements to produce a layout in your Web page. This can help you to reduce the amount of code, while taking advantage of the powerful features of tables.
An example of code for three-color layout with nested tables follows:
<TABLE WIDTH="100%" CELLSPACING="0" CELLPADDING="0">
  <TR>
    <TD BGCOLOR="#FF0000">Who's afraid of</TD>
  </TR>
  <TR>
    <TD>
      <TABLE WIDTH="100%" CELLSPACING="0" CELLPADDING="0">
        <TR>
          <TD WIDTH="100" BGCOLOR="#FFFF00">these </TD>
          <TD BGCOLOR="#0000FF">colors?</TD>
        </TR>
      </TABLE>
    </TD>
  </TR>
</TABLE>
Here is an example of three-color layout with a combination of DHTML and tables:
<DIV STYLE="width:100%; background-color: red">Who's afraid of</DIV>
<TABLE WIDTH="100%" CELLSPACING="0" CELLPADDING="0">
  <TR>
    <TD WIDTH="100" BGCOLOR="#FFFF00">these </TD>
    <TD BGCOLOR="#0000FF">colors?</TD>
  </TR>
</TABLE>
Forms and Sort Order
Forms and sort orders are closely related because sort orders mostly apply to drop-down lists in forms. For both of them, character encoding can cause problems. If you want to post data and expect this data to be in a specific character format, you need to set the right encoding in the metatag of the page. So if you want to post UTF-8 characters to a database, you need to specify:
<META HTTP-EQUIV=Content-Type content="text/html; charset=UTF-8">
If you don't do this, the characters in the form elements (input boxes, for example) will not get converted to UTF-8; they will remain in the standard encoding as set in your browser (in Internet Explorer you can change this setting in the Options dialog box under Languages). The same mechanism needs to be applied when filling a form with data from a database. You need to set the encoding in the page to make sure all characters display correctly in the form.
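To see why the declared encoding matters for posted data, compare the bytes that the same name produces under two encodings. This is a simplified JavaScript sketch: percentEncodeLatin1 is a hypothetical helper that imitates a single-byte Western European page, while encodeURIComponent is the standard script function, which always uses UTF-8. Real form submission follows slightly different escaping rules (spaces become plus signs, for example), so treat the output as an illustration only.

```javascript
// Percent-encode text one byte per character, as a single-byte
// Latin-1 (ISO-8859-1) page would. Illustrative helper only.
function percentEncodeLatin1(text) {
  let out = "";
  for (const ch of text) {
    const code = ch.codePointAt(0);
    if (code > 0xFF) {
      throw new RangeError("Character cannot be represented in Latin-1");
    }
    if (/[A-Za-z0-9]/.test(ch)) {
      out += ch; // letters and digits are sent unchanged
    } else {
      out += "%" + code.toString(16).toUpperCase().padStart(2, "0");
    }
  }
  return out;
}

const name = "Jos\u00E9"; // José

console.log(percentEncodeLatin1(name)); // Jos%E9
console.log(encodeURIComponent(name));  // Jos%C3%A9
```

A back end that expects UTF-8 but receives the single byte E9 will store or display the wrong character, which is why the page and the database need to agree on one encoding.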
Sort order is not the same for all languages, particularly for languages that do not use the Western alphabet. In Swedish, for example, some extended characters are sorted after the letter Z. After localization, the first letter of a word might change as well, changing its position in the sorted list. "The Hague," for example, is called "Den Haag" in the Netherlands, but can also be called "'s-Gravenhage." Different spellings will change the position of the city in the sort order.
| Sort Order 1 | Localized Sort Order (Dutch) |
Brussels | Brussel |
Cologne | Den Haag (The Hague) |
Hanover | Hanover |
The Hague | Keulen (Cologne) |
To build a globalized Web site, you need to either find a way to automatically sort the items (this can be a very difficult task) or ensure that your localizers can change the sort order of the list while they are localizing the code. At the back end, sort orders in SQL are determined during SQL Server installation. Any changes in sort order require a rebuild of the database.
A possible way to sort data automatically is to pass it through a control that uses the correct sorting mechanism for the required language. This still requires you to specify a language for the control. Making this decision on the client side might not work, because the operating system and browser that the user is running can differ from what you'd expect for a specific market. Another drawback of using controls like this is that performance will decrease.
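As a rough illustration of language-aware sorting, the JavaScript sketch below sorts the city list from the table above by its localized names. It relies on Intl.Collator, a standard scripting API that is not mentioned in the original article and that assumes the runtime ships with data for the requested language. The cities array and the sortedNames function are our own inventions, and the language is passed in explicitly rather than detected from the user's system.

```javascript
// Hypothetical city list: a stable key plus the name shown in each language.
const cities = [
  { key: "brussels", en: "Brussels", nl: "Brussel" },
  { key: "cologne", en: "Cologne", nl: "Keulen" },
  { key: "hanover", en: "Hanover", nl: "Hanover" },
  { key: "the-hague", en: "The Hague", nl: "Den Haag" },
];

// Return the display names for one language, sorted by that language's rules.
function sortedNames(items, lang) {
  const collator = new Intl.Collator(lang);
  return items.map((item) => item[lang]).sort(collator.compare);
}

console.log(sortedNames(cities, "en"));
// [ 'Brussels', 'Cologne', 'Hanover', 'The Hague' ]
console.log(sortedNames(cities, "nl"));
// [ 'Brussel', 'Den Haag', 'Hanover', 'Keulen' ]
```

Keeping a language-neutral key alongside the display names means the code that handles a selection does not change when the visible order does.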
The fastest user experience can be achieved by manual interaction at the page localization phase. It would require the localizer to personally sort the list. This can be done, for example, in Microsoft Excel, or when more than one language is involved, in Microsoft Word. At the time of publication, we know of no tool that can do this in HTML.
Cascading Style Sheets
For localization, one of the best features of Cascading Style Sheets (CSS) is that globally changing the font of a Web page is very easy. As we noted in our first article, fonts are crucial to presenting your corporate image in a market. Most Web sites in the United States are designed using certain fonts (Arial, Times New Roman, and so forth). While these fonts can generally also be used for Western European languages, they do not contain the characters for Japanese, Korean, Traditional Chinese, Simplified Chinese, Russian, or others.
Here is a typical style definition:
Body{
font-family: Verdana;
font-size:9pt;}
Here is the same style after a Japanese localizer has changed the font. With just a simple change in one location, the site becomes legible for millions of users:
Body{
font-family: MS Pゴシック,MS P Gothic,MS Gothic;
font-size:9pt;}
Note: If you cannot see the Japanese text, it is because you do not have the appropriate Microsoft Internet Explorer multi-language support. You can download the language packs (http://www.microsoft.com/windows/ie/default.asp) directly from our Internet Explorer site.
Text Resizing
Just as your HTML tables need to be designed to potentially expand, so do all of your dialog boxes. For example, here is the original code for a dialog box the MSN.com team sent to us for localization:
<style>
INPUT{width:50px;margin:8px 3px;}
</style>
</HEAD>
<input type=button value=OK accesskey="o" onclick="go(true);">
<input type=button value=Cancel accesskey="c" onclick="go(false);">
During localization for Brazilian Portuguese, it was found that the "Cancelar" (Cancel) button was truncated. Through a simple edit of the CSS, the problem was easily fixed. Inline styles and style sheets are a great way to control size attributes:
<style>
INPUT{width:75px;margin:8px 3px;}
</style>
<input type=button value=OK accesskey="o" onclick="go(true);">
<input type=button value=Cancelar accesskey="c" onclick="go(false);">
The Content
Common Content
To keep a site consistent in look and feel, some elements are repeated on every page. Navigation, disclaimer text, and copyright text are examples of this. When you localize a site, you don't want to spend money localizing the same word or string multiple times. This is where include files come in. You can put the repeated content in a single file, and then include that file in all appropriate pages.
Below is an example of common content on a Web page:
Code (using ASP): <!--#include virtual="/common/footer.inc"-->
Code and Content Separation
In order to localize a site, a translator will have to open the files in a translating tool. Some of these tools can "lock" the HTML so a translator cannot touch the tags or scripting. However, such tools are often not available to every company or do not offer full safety functionality. In such a case, code/content separation is a good solution. By separating the code from the content, the translator will see only the text, and the functionality will be protected simply because it's not visible.
Before getting into examples of how to separate code from content, let's analyze a bit more what code and content actually are.
Web pages typically consist of three layers; these layers might not seem obvious even to the people who create the pages. Almost every page has a user interface (UI), defined as the portion of the program with which the user interacts. The UI is what the user sees, what makes the page "look great." Most pages also have code, defined as scripting (be it on the client or server side). Code makes the page interactive, and makes the site "cool." Content may be less obvious to define. Content is defined as a dynamically changing part of your page (headline news, for example), making your page "worthwhile to come back to."
The UI is often perceived as content, but it is different, because the UI changes infrequently. This is why it's easy to create a template for the UI. Content frequently changes; if it doesn't, a Web site gets boring pretty quickly. Content can be an editorial clip, stock quotes, headlines, and so on. Typically, there's no need to translate these.
Content can be broken down into two types, although the fine line between these two types of content is not always clear.
| • | Static content: content that changes somewhat frequently but is still manageable to do manually. |
| • | Dynamic content: content that updates on a very frequent basis and needs to be handled automatically. |
Three-layer site:
| Layer | Function |
Code | The engine. Also responsible for feeding the content. |
UI | The look and feel. Very powerful when using templates. |
Content | The information. Gets updated frequently to keep site interesting. |
By applying the three-layer principle, you can build a site that is more manageable for a global rollout. This is a bold statement, so let's look at some samples to give you an idea of how to actually do this.
Sample 1: Creating a Template for the Layout
| Code (default.asp): | Content (content.inc): |
<DIV ALIGN="center"> | Lorem ipsum dolor sit amet, consectetuer adipiscing elit, sed diem nonummy nibh euismod tincidunt ut lacreet dolore magna aliguam erat volutpat. Ut wisis enim ad minim veniam, quis nostrud exerci tution ullamcorper suscipit lobortis nisl ut aliquip ex ea commodo consequat. |
Sample 2: Scripting
Strings inside script function:
//Declare variables
var browser, version
Strings outside script function:
//Declare variables
var browser, version
//functional variables
var L_Browser1, L_Browser2
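To make the "strings outside script function" idea concrete, here is a small JavaScript sketch. The variable names L_Browser1 and L_Browser2 come from the sample above; the English text assigned to them and the describeBrowser function are invented for illustration and are not the authors' original code.

```javascript
// Localizable strings: the only lines a translator needs to touch.
// The English text is invented for this example.
var L_Browser1 = "You are using ";
var L_Browser2 = ", version ";

// Functional code: nothing below this line contains translatable text.
function describeBrowser(browser, version) {
  return L_Browser1 + browser + L_Browser2 + version;
}

console.log(describeBrowser("Internet Explorer", "5.0"));
// You are using Internet Explorer, version 5.0
```

With this layout, a translator can be handed just the two string declarations, and the function body never needs to be opened in a translation tool.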
A drawback of this layered approach is that the design of the site needs to be pretty well globalized, allowing strings to grow.
Natural Language Support Data
Many of the most common globalization errors are caused by using hard-coded Natural Language Support (NLS) data for your own market (commonly the U.S. market). You can easily avoid these errors by identifying NLS data in the design phase of your project and designing ways to make your Web pages locale-independent.
The best resource for market-specific NLS data is the classic book, Developing International Software (http://msdn.microsoft.com/library/books/devintl/s24aa.htm) by Nadine Kano; it is out of print, but is available at MSDN Online.
It is highly recommended that you take advantage of the Win32 NLS APIs for such functions as sorting, character typing, string mapping, date/time formatting and other functions in all programming languages that can access these functions. (Visual C++, Visual Basic, and Perl are some examples of programming languages that can use these APIs.)
To give you an idea of what exactly NLS data is, below are most NLS data fields for Italy.
Country/Region and language specifications
Country code: | 39 |
English name of country: | Italy |
Abbreviated country name (ISO Standard 3166): | ITA |
Native name of country: | Italia |
DOS code page: | 850 |
Main language: | Italian |
Native language name: | Italiano |
Abbreviated language name: | ITA |
General Numeric Status
Positive numeric pattern: | 1.000 (no sign string is used) |
Negative numeric pattern: | -1.000 |
Thousand separator: | period (.) |
Decimal separator: | comma (,) ex: 1,23 |
List separator: | semicolon (;) |
Numeric grouping: Sizes for each group of digits to the left of the decimal; sizes are separated by semicolons. If the last value is 0, the preceding value is repeated. To group thousands, specify "3;0". | 3;0 ex: 1.000.000 |
List of the ten native digits: | 0123456789 |
Position of sign string in positive values: | No sign string is used. |
Position of sign string in negative values: | Immediately precedes the monetary symbol. ex: -L. 1.234 |
Decimal leading zero: | yes |
Decimal digits: | 2 |
String used for positive sign: | no positive sign used |
String used for negative sign: | dash |
Metric measurement system: | yes |
Spacing before measurement abbreviation: | yes, single space |
Spacing before symbols %," etc: | no space |
Monetary/Currency Settings
Currency name: | Lira |
Currency field length: | 15 |
International monetary symbol: | ITL |
Positive currency format: | No sign string, currency symbol precedes, space after L. ex: L. 1.000 |
Negative currency format: | Sign string and currency symbol precede, space after L. ex: -L. 1.000 |
Monetary decimal separator: | none |
Currency decimal digits: | 0 |
Monetary thousands separator: | period (.) |
Monetary grouping: Sizes for each group of digits to the left of the decimal; sizes are separated by semicolons. If the last value is 0, the preceding value is repeated. To group thousands, specify "3;0". | 3;0 |
Currency symbol: | L. |
Date/Calendar Formatting Options
Date separator: | slash (/) |
Which week of the year is considered first: | Week number 1 contains the first Thursday of January. |
Which day of the week is considered first: | Monday |
Short date format: | dd/MM/yy |
Short date order: | DMY |
Long date format: | dddd d MMMM yyyy |
Long date order: | DMY |
Number of digits for century in short date format: | 2 |
Month leading zero in short date format: | yes |
Day leading zero in short date format: | yes |
Number of digits for century in long date format: | 4 |
Leading zeros in month fields for long date format: | n/a |
Leading zeros in day fields for long date format: | no |
Month Information
Month Names | 1. gennaio (January) |
Month names abbreviated: | 1. gen |
Day Information
Day names: | 1. lunedì (Monday) |
Day names abbreviated: | 1. lun |
Time Setting
Time format: | H.mm.ss |
Time separator: | period |
String for the A.M. designator: | none--24-hour system is used |
String for the P.M. designator: | none--24-hour system is used |
Hours leading zero: (whether to use leading zeros in time fields) | no |
24-hour format: | yes |
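The following JavaScript sketch shows what it means to drive formatting from NLS data instead of hard-coding U.S. conventions. The values in the italy object are copied from the tables above; the object layout and the function names (formatNumber, formatShortDate, formatTime) are our own assumptions. On Windows you would normally call the Win32 NLS APIs rather than write this by hand.

```javascript
// Locale data copied from the Italy tables; the object layout is hypothetical.
const italy = {
  thousandSeparator: ".",
  decimalSeparator: ",",
  decimalDigits: 2,
  dateSeparator: "/",
  timeSeparator: ".",
};

// Two-digit field with a leading zero.
function pad2(n) {
  return String(n).padStart(2, "0");
}

// Format a number with 3-digit grouping ("3;0") and the locale's separators.
function formatNumber(value, locale) {
  const sign = value < 0 ? "-" : "";
  const parts = Math.abs(value).toFixed(locale.decimalDigits).split(".");
  const grouped = parts[0].replace(/\B(?=(\d{3})+$)/g, locale.thousandSeparator);
  return sign + grouped + (parts[1] ? locale.decimalSeparator + parts[1] : "");
}

// Short date dd/MM/yy: leading zeros on day and month, two-digit year.
function formatShortDate(date, locale) {
  return [
    pad2(date.getDate()),
    pad2(date.getMonth() + 1),
    pad2(date.getFullYear() % 100),
  ].join(locale.dateSeparator);
}

// Time H.mm.ss: 24-hour clock, no leading zero on the hour.
function formatTime(date, locale) {
  return [
    date.getHours(),
    pad2(date.getMinutes()),
    pad2(date.getSeconds()),
  ].join(locale.timeSeparator);
}

const sample = new Date(1999, 0, 5, 9, 5, 3); // 5 January 1999, 09:05:03

console.log(formatNumber(1234567.891, italy)); // 1.234.567,89
console.log(formatShortDate(sample, italy));   // 05/01/99
console.log(formatTime(sample, italy));        // 9.05.03
```

Supporting another market then means supplying a different data object, not editing the formatting code.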
Summary
As you have seen, some of the key elements of designing a globalized Web site are:
| • | Code page decisions |
| • | Layered graphics |
| • | Content mechanisms |
| • | An understanding of how HTML and CSS technology can be used in globalization. |
Links to Explore
Language and Code Pages
| • | Internet Explorer Language Packs (http://www.microsoft.com/windows/ie/default.asp)--Download the language packs that let you view sites from around the world. |
| • | Unicode Consortium (http://www.unicode.org/)--The standards body for developing universal character encoding. |
| • | Multilingual World Wide Web Working Group (http://www.vicnet.net.au/)--Useful information about code pages, how to use them, and browsers. |
Globalization and Localization
| • | The classic book on software localization, Developing International Software (http://msdn.microsoft.com/library/books/devintl/s24aa.htm) by Nadine Kano. Out of print, but available online at MSDN. |
| • | Babel (http://babel.alis.com:8080/) contains articles and reference materials about building international Web sites. |
| • | MSDN: Going Global (http://www.microsoft.com/globaldev/default.mspx) |
About the Authors
Sjoert Ebben is a program manager on the Web Essentials team in Ireland and has been with Microsoft for nearly four years. For the last three years he's been in the Internet fast lane, which corresponds to his driving style. When he's not spending his time on Internet business, his wife Cathérine and daughter Lilith make sure to catch him in their web.
Gwyneth Marshall is a program manager specializing in the globalization and localization of Web sites; currently she is working on CarPoint (http://carpoint.msn.com/). She dabbles in high finance on a global scale via her investment club.