Global Development and Computing Portal

Ask Dr. International

Column #1

Welcome to the first column of Dr. International. This is where the good Dr. will post answers to questions he gets asked most often.

An aside on acronyms

Dear Dr. International,

What does i18n stand for?

(From the Internet)

Dr. International replies:

This was the very first question that Dr. International himself ever asked! It is an abbreviation for internationalization: the initial i, the final n, and the number 18 for the 18 characters between those two letters. People usually prefer a lowercase "i" in front because some fonts do not distinguish well between an uppercase I and a lowercase L. Dr. International has not noticed strong feelings about the case of the final N, so both i18n and i18N seem acceptable.

When it comes to Microsoft and especially the Windows International Division, i18n is usually broken down into two parts:

Globalization: Designing software for the input, display, and output of a defined set of supported language scripts and data relating to specific geographic areas. Coding areas include currency, date/time, number formatting, bidi awareness, mirroring, IME support, etc. (Bugs in this area are found with functional testing).

Localizability: Designing software code and resources such that said resources can be localized with no changes to the source code. Coding areas include no hard-coded strings, no composite messages, allowing for text expansion, no text in graphics, etc. (Bugs in this area are found during localization, whether pseudo or real.)

On a related note, L10n stands for localization, for much the same reasons as i18n (and the uppercase L is often preferred because, once again, many fonts do not distinguish well between the lowercase L and the uppercase I).
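
As a small illustration of the counting rule behind both abbreviations, here is a minimal VBScript sketch. The function name Numeronym is made up for this example; it is not part of VBScript or of any Microsoft library.

```vbscript
' Hypothetical helper: abbreviate a word as first letter,
' count of the letters in between, last letter.
Function Numeronym(ByVal stWord)
    If Len(stWord) <= 3 Then
        ' Too short to be worth abbreviating
        Numeronym = stWord
    Else
        Numeronym = Left(stWord, 1) & CStr(Len(stWord) - 2) & Right(stWord, 1)
    End If
End Function
```

Numeronym("internationalization") yields i18n (20 letters, 18 of them in the middle), and Numeronym("Localization") yields L10n. The function keeps whatever case the caller passes in.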

Dr. International is going to try to provide a little bit of terminology with each column he does.

If everyone supports Unicode, then why are my strings broken?

Dear Dr. International,

I am using SQL Server 7.0, and I am trying to run an UPDATE query in an ASP page via ADO. I am building the query on the fly and the syntax I am using is:

UPDATE Table1 SET Col1 = 'New Text' WHERE id = 1000

The column data type is NTEXT. This SQL works well for English data, but if I try to insert Japanese data on my US English Windows 2000 server, all of the characters inserted into the database are broken. I thought that ADO was supposed to be Unicode?

(From the Internet)

Dr. International replies:

Yes, you are right. This is a problem that many people run into (it also happens with SQL Server 2000). It seems like every component (SQL Server, ADO, ASP) supports Unicode, so how could strings get corrupted?

Luckily, Dr. International can assure you that you are not crazy. Incidentally, the problem is not merely restricted to ADO and web-based interfaces; it can also happen in stored procedures that build UPDATE SQL statements on the fly.

The answer to this question is actually in the SQL Server Books Online, hidden in a topic named Using Unicode Data. At almost the very bottom of the topic, there is a very innocuous-seeming statement: "Unicode constants are specified with a leading N: N'A Unicode string'."

Here is the root of the problem! Since SQL Server 7.0 and 2000 support both Unicode and non-Unicode data types, they require you to use the "N" prefix in front of Unicode string literals. Even though ADO sends the query as Unicode text (all COM automation components have Unicode interfaces), SQL Server treats the unprefixed string literal 'New Text' as a non-Unicode string on the default system code page (CP_ACP) and converts it to that code page. Then, upon inserting the text into an NTEXT column, SQL Server converts it back to Unicode using CP_ACP. The two extra conversions cause a slight performance hit for the query, but more importantly, any character that does not exist on your server's default system code page is replaced during the first conversion (typically with a question mark) and cannot be recovered by the second. On a US English server, that is exactly what happens to Japanese text.

Now that Dr. International has diagnosed the problem, he can prescribe the cure: simply place that "N" prefix in front of the string literal you are using:

UPDATE Table1 SET Col1 = N'New Text' WHERE id = 1000

This will allow your data to be inserted without corruption, and also slightly faster!
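
Since the query in question is built on the fly in ASP, the prefix has to be added by the code that assembles the SQL string. A minimal sketch follows, assuming a helper named ToUnicodeSqlLiteral (a made-up name, not an ADO or SQL Server function). It also doubles any apostrophes in the text, which is the standard way to embed an apostrophe in a SQL Server string literal.

```vbscript
<%
' Hypothetical helper: wrap text as a Unicode string literal
' for SQL Server, doubling embedded apostrophes.
Function ToUnicodeSqlLiteral(ByVal stText)
    ToUnicodeSqlLiteral = "N'" & Replace(stText, "'", "''") & "'"
End Function

' Example use. stNewText stands in for whatever text the page
' received, and the table and column names are those from the question.
Dim stNewText, stSql
stNewText = "New Text"
stSql = "UPDATE Table1 SET Col1 = " & _
    ToUnicodeSqlLiteral(stNewText) & " WHERE id = 1000"
' stSql now holds:
' UPDATE Table1 SET Col1 = N'New Text' WHERE id = 1000
%>
```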

Browser sniffing for i18N

Dear Dr. International,

I would like to be able to make changes to my ASP-based web site based on a client's language/locale. How can I do that?

(From the Internet)

Dr. International replies:

The technique of trying to get information from the client browser is usually referred to as "browser sniffing." Users can set their language in the browser. For example, in Internet Explorer you can choose Tools|Internet Options... and click on the Languages... button of the General tab to choose one or more preferred languages, as in the figure below.

Figure 1

Other browsers support the same functionality; Netscape 4.x and 6.x allow you to set this information in Edit|Preferences. If you are using IE, you can also change your Regional Settings and have the browser automatically pick up your new language choice. The browser sends this information in the Accept-Language HTTP request header, which ASP exposes as the server variable HTTP_ACCEPT_LANGUAGE. You can retrieve it in ASP with VBScript code such as the following:

Dim stLang
stLang = Request.ServerVariables("HTTP_ACCEPT_LANGUAGE")

This string, now sitting in the stLang variable, can be used in many different ways to control the content of your site. For example, you can:

use the IIS 5.0 Server.Transfer or Server.Execute methods to move to a localized page.

use the VBScript 5.0 SetLocale function to change the locale of the page, which will affect the display of date/time, number, percent, and currency formats.

query a database for content (assuming some or all of your site is database driven).

provide localized content through components.

provide localized content directly in your ASP code.

The SetLocale function is new to VBScript 5.0; in prior versions of ASP you could set the Session.LCID property to get the same results, but this would require you to map the string returned by HTTP_ACCEPT_LANGUAGE to a Locale ID (LCID). This mapping can be seen in the Microsoft Knowledge Base article Q229690.
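
For the Session.LCID route, the mapping can be sketched as a simple lookup. The function name LangToLcid is hypothetical and the table is only a small illustrative subset; the Knowledge Base article remains the place to look for the full list.

```vbscript
<%
' Hypothetical helper: map one language tag to a Locale ID.
' Returns 0 when the tag is not in this (deliberately short) table.
Function LangToLcid(ByVal stLang)
    Select Case LCase(Trim(stLang))
        Case "en-us"
            LangToLcid = 1033
        Case "en-gb"
            LangToLcid = 2057
        Case "ja"
            LangToLcid = 1041
        Case "ar-sa"
            LangToLcid = 1025
        Case Else
            LangToLcid = 0
    End Select
End Function

' Example use: only change the session locale when a mapping exists
Dim lcidWanted
lcidWanted = LangToLcid("en-gb")
If lcidWanted <> 0 Then Session.LCID = lcidWanted
%>
```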

Unfortunately, if you select multiple languages, your HTTP_ACCEPT_LANGUAGE string will look something like:

en-gb,es;q=0.7,ar-sa;q=0.3

If you try to call SetLocale on this string, you will find that it fails with a runtime error. The simplest way to handle this is to parse the string with some script. The calling code looks like the following:

<%

Dim stQueryString
Dim stLang
Dim stSetLocaleLang

stQueryString = Request.ServerVariables("QUERY_STRING")
If Len(stQueryString) > 0 Then
    ' If there is a query string, prepend it to the accept language
    stLang = stQueryString & "," & _
        Request.ServerVariables("HTTP_ACCEPT_LANGUAGE")
Else
    ' No query string, just use the accept language
    stLang = Request.ServerVariables("HTTP_ACCEPT_LANGUAGE")
End If
stSetLocaleLang = AcceptLanguageToValidSetLocale(stLang)

%>

The script also lets a visitor override the browser settings by choosing a language explicitly. It does this by prepending the query string, Request.ServerVariables("QUERY_STRING"), to the value of Request.ServerVariables("HTTP_ACCEPT_LANGUAGE"). Any valid locale (whether ISO string or LCID) can then be passed in to the page by optionally adding a ?<ISO-lang|LCID> to the URL of the page.

To validate the actual locale to be used, the AcceptLanguageToValidSetLocale procedure parses the HTTP_ACCEPT_LANGUAGE string to retrieve the first legal locale that can be used in a call to SetLocale. It tries the entries in the order they appear, ignoring any ;q= weight attached to them:

<%

Function AcceptLanguageToValidSetLocale(ByVal stLang)
    ' Take a raw HTTP_ACCEPT_LANGUAGE string and parse
    ' out the first locale that SetLocale will accept
    Dim ich
    Dim stMaybe

    Do Until Len(stLang) = 0
        ' Isolate the next comma-delimited entry
        ich = InStr(1, stLang & ",", _
            ",", vbBinaryCompare)
        stMaybe = Left(stLang, ich - 1) & ";"
        ' Drop any ";q=" weight and surrounding spaces
        stMaybe = Trim(Left(stMaybe, InStr(1, stMaybe, ";", _
            vbBinaryCompare) - 1))
        ' SetLocale raises a runtime error for a locale it
        ' does not accept, so trap the error and test for it
        On Error Resume Next
        Call SetLocale(stMaybe)
        If Err.Number = 0 Then Exit Do
        On Error Goto 0
        stMaybe = vbNullString
        stLang = Mid(stLang, ich + 1)
    Loop
    ' Restore normal error handling after a successful exit
    On Error Goto 0

    If Len(stMaybe) = 0 Then
        ' No valid language was found, so choose a
        ' sensible default
        stMaybe = "en-us"
        Call SetLocale(stMaybe)
    End If

    AcceptLanguageToValidSetLocale = stMaybe
End Function

%>
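
The q= values in the header are weights between 0 and 1 (an entry with no q= counts as 1), and the function above does not look at them. Browsers normally list languages from most to least preferred, so that is rarely a problem, but if you would rather honor the weights, something like the following sketch could pick the highest-weighted tag. Both function names are hypothetical, and the code does not check whether SetLocale accepts the result. The weight is parsed by hand because CDbl depends on the current locale's decimal separator.

```vbscript
' Hypothetical helper: turn a q-value such as "0.7" into an
' integer number of thousandths (700). Malformed input gives 0.
Function QValueToThousandths(ByVal stQ)
    Dim stDigits, i
    stDigits = Left(Replace(Trim(stQ), ".", "") & "000", 4)
    QValueToThousandths = 0
    For i = 1 To Len(stDigits)
        If InStr("0123456789", Mid(stDigits, i, 1)) = 0 Then Exit Function
    Next
    QValueToThousandths = CLng(stDigits)
End Function

' Hypothetical helper: return the language tag with the highest
' weight, or an empty string if the header has no entries.
Function HighestWeightedLanguage(ByVal stHeader)
    Dim rgEntries, rgParts, stParam
    Dim lngQ, lngBest, i, j

    HighestWeightedLanguage = vbNullString
    lngBest = -1
    rgEntries = Split(stHeader, ",")
    For i = 0 To UBound(rgEntries)
        rgParts = Split(rgEntries(i), ";")
        If UBound(rgParts) >= 0 Then
            lngQ = 1000   ' no q= parameter means q=1
            For j = 1 To UBound(rgParts)
                stParam = LCase(Trim(rgParts(j)))
                If Left(stParam, 2) = "q=" Then
                    lngQ = QValueToThousandths(Mid(stParam, 3))
                End If
            Next
            ' Strictly greater, so the earliest entry wins a tie
            If Len(Trim(rgParts(0))) > 0 And lngQ > lngBest Then
                lngBest = lngQ
                HighestWeightedLanguage = Trim(rgParts(0))
            End If
        End If
    Next
End Function
```

For example, HighestWeightedLanguage("es;q=0.7,en-gb,ar-sa;q=0.3") returns en-gb, because the unweighted entry counts as q=1.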

To put it all together, you can look at the br_sniff.asp sample page, which lets you try out everything discussed here. The page is in UTF-8 format because multiple languages on different code pages cannot be supported by any other means. The source ASP for the page is included in the sample for this column. Only Internet Information Server 5.0 can handle ASP pages that set @CODEPAGE and Session.CodePage to 65001 (UTF-8), so to run the sample on NT4 you will have to change the code page and delete the languages that the new code page cannot support.

If you are providing localized content, you may want to take this a step further and support a Select Case statement something like the one in lang_map.asp (you can simply remove or comment out the languages you do not support).
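
To give a rough idea of the shape such a statement can take, here is a minimal sketch. It is not the contents of lang_map.asp; the function name and the page names are hypothetical placeholders, and stSetLocaleLang is the value returned by AcceptLanguageToValidSetLocale earlier in this column.

```vbscript
<%
' Hypothetical helper: choose a localized page from the primary
' language of a locale string such as "es" or "ar-sa".
Function LocalizedPageFor(ByVal stLocale)
    Dim ich
    Dim stPrimary

    ' Keep only the part before the first hyphen ("ar-sa" -> "ar")
    ich = InStr(1, stLocale & "-", "-", vbBinaryCompare)
    stPrimary = LCase(Left(stLocale, ich - 1))

    Select Case stPrimary
        Case "es"
            LocalizedPageFor = "welcome_es.asp"
        Case "ar"
            LocalizedPageFor = "welcome_ar.asp"
        Case Else
            ' Anything unsupported falls back to English
            LocalizedPageFor = "welcome_en.asp"
    End Select
End Function

' Example use with the IIS 5.0 Server.Transfer method
Server.Transfer LocalizedPageFor(stSetLocaleLang)
%>
```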

You can download the source for two sample ASP files in the BrowserSniff.exe self-extracting executable.

See you next time!

Dr. International
Windows International Division
