Dr. International usually mixes and matches the various questions he is asked, but a look over the latest batch shows a proliferation of questions about locale IDs, also known as LCIDs. Therefore, this column is devoted to the most common questions he has received about these useful numbers.
• Decimal versus hex
• What's in a locale?
• Why all the duplicate information?
• Do-it-yourself LCIDs
• Will the Real Locale Please Stand?
When I look at programs like Microsoft Office, the French files are kept in a directory named '1036', but Windows MUI uses a directory named '40C'. Why are they not consistent?
(from the Internet)
These are actually the same thing, believe it or not! One is the decimal representation and the other is the hexadecimal representation of the same number: 1036 in decimal is 40C in hexadecimal.
Dr. International has searched long and hard for an explanation. Clearly, the original creators of the LCID model were using the hexadecimal form and thought of US English as 0x0409, but at some point many products started using the decimal representation. Dr. International has had no luck, however, in determining when this first started. The Doctor does admit it can get very confusing at times.
But the important thing to keep in mind is that they mean the same thing. You will want to know which numeric format is being used so you can properly interpret the value. Windows, most development done with the Windows Platform SDK, and the actual constant definitions use hexadecimal. Several programs such as Office, Visual Basic, SQL Server, and VBA use the decimal representation. For your own work, you can use whichever makes you most comfortable.
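As a quick check of the equivalence, here is a minimal sketch in Python (chosen purely for illustration; the helper name `lcid_forms` is hypothetical and not part of any Windows or Office API) that produces both spellings of the same LCID:

```python
def lcid_forms(lcid):
    """Return the decimal and hexadecimal spellings of the same LCID."""
    return str(lcid), format(lcid, "X")


# French (France): the Office-style decimal name and the MUI-style hex name.
decimal_name, hex_name = lcid_forms(1036)
print(decimal_name, hex_name)

# Reading a directory name back only requires knowing which base it uses.
same_value = int("1036", 10) == int("40C", 16)
print(same_value)
```

The only thing a program has to know in advance is which base a given product uses; the number behind the two names is identical.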
In your first column you briefly mentioned LCIDs, but I am having a lot of trouble understanding just what is an LCID, anyway?
(from the Internet)
Simply put, an LCID is a locale identifier: a number that represents a language/region/sort combination that people at Microsoft have either partially or completely researched with regard to date, time, number, currency, and other formatting, calendar preferences, input methods, sorting preferences, and more.
There are many aspects to supporting a locale under Windows (or any operating system):
• display, e.g. in Internet Explorer, in Notepad, or in Word
• input, e.g. keyboards or input method editors (IMEs)
• regional options, e.g. choosing a locale in the control panel applet
• sorting/collation, e.g. the CompareString function in the Win32 API or StrComp in VB/VBA
• custom usage by other components and applications
The Doctor will take these one at a time.
• Display is mainly an issue with fonts. Even when you use a Unicode-enabled product such as Word 2000, you will not be able to see text in some languages unless a font is available to support the text. When no such font is available, you will only be able to store the information. Customers who request that support for Gaelic, Yiddish, languages of Africa or Native America, or others be added in future versions often do not realize that all of the support needed for display is already present for most languages!
• Input of text does get better with every version, though obviously that is no consolation if the language you read and write does not have an associated keyboard yet. This is an issue that the International division of Microsoft is constantly looking for ways to improve, including encouraging third parties to produce both keyboards and products that edit or create keyboards.
• Regional options are in most cases a simple convenience. Choosing one of the locales listed in the Regional Options dialog, either at setup time or afterwards in the control panel applet, changes all of the settings in the entire dialog at once. You can override any of these settings that you like, which is why these options are mainly there as a convenience. One of the few exceptions is providing the proper formatting for other locales on global web sites (a topic that Dr. International discussed in his first column in the browser sniffing sample).
• Sorting is in some ways one of the crucial issues behind an LCID. By specifying how an application compares two strings, it handles issues as simple as sorted lists and as complex as database indexes. Although many sort orders are added with each version, it would likely be impossible for Windows to support every possible sort for every dialect and language. In many locales, people have learned to live with existing sort orders. Because a sort order is tied to its LCID and cannot be changed for a given LCID, getting the proper sorting may mean choosing a different LCID and sacrificing the regional options discussed previously.
• Custom usage is when applications and components use LCIDs to help with their own unique problems of changing behavior based on locale. This is often used to provide entirely new functionality (such as Microsoft Office's proofing tools, which obviously must be at least language specific and often locale specific). Note that operating system support is not required for such an LCID to be used.
The locale ID itself is a 32-bit value with several parts. The lowest ten bits (bits 0 through 9) are the primary language ID, which identifies the language itself. The next six bits (bits 10 through 15) contain the sublanguage ID, which is often used to differentiate regions that share the same primary language ID. The next four bits (bits 16 through 19) represent the sort ID, which can differentiate between alternate sorting orders that might be used for the same language and region. The remaining 12 bits are reserved for future use and should always be zero.
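The bit layout just described can be sketched as follows. This is a minimal Python illustration rather than Windows code: the names `LcidParts` and `split_lcid` are hypothetical, and in C the Platform SDK macros such as PRIMARYLANGID, SUBLANGID, and SORTIDFROMLCID do the same job.

```python
from typing import NamedTuple


class LcidParts(NamedTuple):
    primary_language: int  # bits 0-9
    sublanguage: int       # bits 10-15
    sort_id: int           # bits 16-19
    reserved: int          # bits 20-31, expected to be zero


def split_lcid(lcid):
    """Break a 32-bit LCID into the four fields described in the text."""
    return LcidParts(
        primary_language=lcid & 0x3FF,
        sublanguage=(lcid >> 10) & 0x3F,
        sort_id=(lcid >> 16) & 0xF,
        reserved=(lcid >> 20) & 0xFFF,
    )


# French (France), the 0x040C value from the first question.
parts = split_lcid(0x040C)
print(f"primary=0x{parts.primary_language:03X} "
      f"sub=0x{parts.sublanguage:02X} sort=0x{parts.sort_id:X}")
```

For 0x040C the primary language ID comes out as 0x00C (French) and the sublanguage ID as 0x01, with a sort ID of zero.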

I am writing a program to return the supported LCID values and associated data with the GetLocaleInfo API. When I displayed the LOCALE_SISO639LANGNAME and LOCALE_SISO3166CTRYNAME combinations, I was surprised to find several duplicates. Why does this happen?
(Confused about seeing double)
Dear Confused:
Dr. International does have to blush a bit after waxing on in the previous question about the wonderful features in LCIDs. Luckily, you were able to remind the Doctor that the LCID system has a few limitations. Any time you try to capture all of the information that can be both language and region specific in one place, you will hit a fair number of places where information is duplicated. There are many possible causes:
• More than one language is spoken in the same region, such as in the case with Hindi and Konkani, both of which are spoken in India.
• There might be alternate sorting rules, such as in the case of Traditional and Modern Spanish.
• There may be different scripts used in the same region for a single language, such as in the case of Azeri (which has both Latin and Cyrillic forms).
In some of these cases, much of the other information stored in an LCID may also be duplicated. This is not a bug, just a limitation of trying to tie so many different functionalities into a single classification. It is an issue that is shared by many other products that use the ISO-639 language name and ISO-3166 country/region name combination to try to capture locale information, and one that Microsoft is aware of. Many of Dr. International's friends are looking into alternate ways of looking at the same information for future versions of Windows.
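To see how such duplicates surface, here is a small Python sketch that groups LCIDs by their language and country/region names. The table is hand-written sample data standing in for what a GetLocaleInfo enumeration might return (an assumption for illustration, not captured output), and `find_duplicate_names` is a hypothetical helper.

```python
from collections import defaultdict

# Sample rows: (LCID, ISO 639 language name, ISO 3166 country/region name).
# Hand-picked for illustration; a real enumeration returns many more rows.
SAMPLE_LOCALES = [
    (0x040A, "es", "ES"),  # Spanish (Spain), traditional sort
    (0x0C0A, "es", "ES"),  # Spanish (Spain), modern sort
    (0x042C, "az", "AZ"),  # Azeri, Latin script
    (0x082C, "az", "AZ"),  # Azeri, Cyrillic script
    (0x040C, "fr", "FR"),  # French (France), no duplicate in this sample
]


def find_duplicate_names(rows):
    """Return the language/country pairs shared by more than one LCID."""
    groups = defaultdict(list)
    for lcid, language, country in rows:
        groups[(language, country)].append(lcid)
    return {pair: lcids for pair, lcids in groups.items() if len(lcids) > 1}


for (language, country), lcids in find_duplicate_names(SAMPLE_LOCALES).items():
    print(f"{language}-{country}: " + ", ".join(f"0x{v:04X}" for v in lcids))
```

A program that needs a unique key per locale should therefore key on the LCID itself rather than on the name pair.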
I am using an LCID-based system for my localized content just like Microsoft Office and other applications use, but I have to support some languages that are not listed, such as Tagalog. How can I make my own locale IDs for these languages?
(from the Internet)
The Doctor had to dig a bit for this information, but it does indeed exist. Buried deep within Nadine Kano's Developing International Software for Windows 95 and Windows NT is the following text:
The range for customized primary language IDs is 0x200 through 0x3FF, and the range for customized sublanguage IDs is 0x20 through 0x3F. (These ranges correspond to setting the high bit on each predefined ID.) You can use custom language IDs to tag resources, but the national language support functions (described in Chapter 5) will not accept them as parameters, and you cannot add custom locale information to the system registry.
Whether you wish to use a customized primary language ID or sublanguage ID is an interesting question. Any time there is already a primary language ID, you should use it. Thus, if you have a specific dialect of French that you wish to use, you would combine LANG_FRENCH (0x00C) with the custom sublanguage ID 0x20. Because the sublanguage ID sits ten bits above the primary language ID, this makes an LCID of 0x800C. If you are covering an ancient language such as Latin that does not have a unique primary language ID assigned, you can combine the custom primary language ID 0x200 with SUBLANG_DEFAULT (0x01), making an LCID of 0x0600. In both cases, the Doctor would want you to remember that none of the locale-specific functionality in Windows will understand the custom locale IDs you create.
This leads to yet another question that people frequently ask: "How do I add my own LCIDs?" For that, Dr. International would recommend you look at his advice from a previous column on keyboard layouts. If you can provide the Doctor with all of the information that is needed to justify adding a new locale ID, then he can promise that it will be fairly evaluated!
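The arithmetic behind those two values can be checked with a short Python sketch. The function names `make_lcid` and `is_custom` are hypothetical; the constants are the ranges quoted from the book and the two Platform SDK values named in the text. In C, the MAKELANGID and MAKELCID macros perform the same packing.

```python
# Custom ranges quoted above: the high bit of each field is set.
CUSTOM_PRIMARY_MIN = 0x200   # primary language IDs 0x200 through 0x3FF
CUSTOM_SUBLANG_MIN = 0x20    # sublanguage IDs 0x20 through 0x3F

LANG_FRENCH = 0x0C
SUBLANG_DEFAULT = 0x01


def make_lcid(primary_language, sublanguage, sort_id=0):
    """Pack the three fields into an LCID, rejecting out-of-range values."""
    if not 0 <= primary_language <= 0x3FF:
        raise ValueError("primary language ID must fit in 10 bits")
    if not 0 <= sublanguage <= 0x3F:
        raise ValueError("sublanguage ID must fit in 6 bits")
    if not 0 <= sort_id <= 0xF:
        raise ValueError("sort ID must fit in 4 bits")
    return (sort_id << 16) | (sublanguage << 10) | primary_language


def is_custom(lcid):
    """True if either language field falls in the customized range."""
    primary = lcid & 0x3FF
    sublanguage = (lcid >> 10) & 0x3F
    return primary >= CUSTOM_PRIMARY_MIN or sublanguage >= CUSTOM_SUBLANG_MIN


french_dialect = make_lcid(LANG_FRENCH, CUSTOM_SUBLANG_MIN)
latin = make_lcid(CUSTOM_PRIMARY_MIN, SUBLANG_DEFAULT)
print(f"0x{french_dialect:04X} 0x{latin:04X}")
```

A check like `is_custom` is handy as a guard before handing an LCID to any national language support function, since those functions will not accept custom values.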
I am getting really confused about all these different locales. There is a user default, a system default, and now I am reading about even more for Windows 2000 and Office 2000. What is the difference between them all?
(from the Internet)
You are correct: there are many different locales, and it can be confusing to determine what each one means. A really good chart that compares and contrasts the various "locales" has been posted to the Global Software Development site, and it can be found here. It includes the definition of each term and important information such as whether it can be changed and how or whether it interacts with the MUI (Multilingual User Interface) edition of Windows 2000.
The only item not on that list is the Office UI language. New for Office 2000, this setting determines the UI language used by Office applications such as Word, Excel, Outlook, and Access. Like Windows 2000 MUI, it is a separate product that you must purchase, and the setting is specific to the currently logged-in user. There is no API to retrieve the language, but all Office applications can retrieve the UI language by using the following VBA code:
Application.LanguageSettings.LanguageID(msoLanguageIDUI)
If you have a localized product that runs on Windows 2000 or works with Office 2000, it is worth considering whether to integrate your product with some of these settings. Your users will definitely be expecting the UI to change when they change the UI language, and might be very pleased if it works.
See you next time!
Dr. International
Windows International Division