Global Development and Computing Portal

Ask Dr. International

Column #19

After contributing to the redesign of the web site (which I hope you like), the Dr. now has time to answer another series of your questions.

On This Page
Resource File Encoding
The Yen, the Won, and the Reverse Solidus (aka, Backslash)
Entering Characters Using their Unicode Code Points
Switching to the Thai Keyboard Using the Grave Accent Key
Detecting a String's Character Set
Ask Dr. International, #18 | Ask Dr. International, #20

Resource File Encoding

Dear Dr. International,

I am fairly new to international programming, and I'm having a mental block trying to get my head around a resource compilation issue I can't find an explanation of in your excellent book, the web, or any online help. Please help!

I'm trying a very simple Unicode MFC dialog box application with three buttons, to show a message box saying "hello world" in either an English, Russian, or Greek character set. I'm using Visual Studio .NET's C++, on Windows XP. My system is in English US system locale, and that's what I used to generate the application.

As per p. 219 of your book, I've added a section to the resource file, changed the keyboard layout to Russian, and typed the Cyrillic characters into the editor. When I try to save it, I get a message box telling me some characters will be lost unless I change the encoding. The options for encoding require that I choose what looks like a code page. Isn't Unicode supposed to eliminate that, or do I have to specify a code page for every text file because, ultimately, they're just text?

How is it possible to save English, Russian, and Greek characters together in the same resource file? What are the #pragma codepage directives for in a resource file?

I really would appreciate any help you could provide. Thank you in advance.

Christopher

Dr. International replies:

Dear Christopher,

The main reason you are having problems is that the Visual Studio resource editor does not handle Unicode-encoded text. However, several workarounds exist:

1.

Build a Unicode (UTF-16) .rc file. You cannot edit it in the resource editor of Visual Studio, but you can start with the file created by Visual Studio.

In the IDE, create language-tagged copies of the localizable resources. The steps to insert those copies are described in Knowledge Base article 198846. Save the file in Unicode. To open this file, you need an editor that supports Unicode: the Visual Studio resource editor won't help, but you can use Notepad.

Use the source editor (not the resource editor), or any other text editor capable of saving plain text files in UTF-16, to translate the resources and update the file.

A sample can be found on MSDN: http://msdn.microsoft.com/library/default.asp?url=/library/en-us/vcsample/html/vcconUniresSampleDemonstratesUseOfUnicodeResourceFiles.asp. Note that the file can grow fairly long, and this can cause an error when the resources are compiled. The solution is to split the .rc file into parts, compile them separately, and join the resulting .res files with a binary copy command, as KB article 76714 suggests. The same solution may work with a set of non-Unicode .rc files, as outlined in point 2 below.

2.

Create a set of ANSI-encoded .rc files (english.rc, russian.rc, greek.rc, etc.). Each of them has to be edited under the corresponding system locale. Then you can take one of the following paths:

Compile the .rc files into separate .res files (english.res, russian.res, greek.res). Make sure that the code page and language ID are specified for each file, either in the file itself (for example, #pragma code_page(1251) followed by LANGUAGE LANG_RUSSIAN, SUBLANG_DEFAULT in russian.rc) or as a compilation flag. The set of .res files can be merged into one .res file (copy /b english.res + russian.res + greek.res result.res), which is linked with the project; or you can simply link multiple .res files with your project.

Use the #include directive to add multiple .rc files to the build process, as described in KB article 76714. In this configuration, both the code page and the language must be declared within each .rc file.

The solution with multiple rc-files is an important step towards the satellite resource DLLs, the recommended approach for dealing with multilingual resources.
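For option 1, "saving in Unicode" simply means storing every character as one or two 16-bit UTF-16 code units, preceded by a byte order mark, so no code page is involved and Russian and Greek text can live in the same file. The following C sketch only illustrates that byte layout; it assumes the text is available as UTF-8 and that little-endian UTF-16 with a BOM (what Notepad writes when you choose the "Unicode" encoding) is the target. The function names are hypothetical and are not part of any Windows API.

```c
#include <assert.h>
#include <stddef.h>

/* Decode one UTF-8 sequence starting at s. Stores the code point in *cp and
   returns the sequence length in bytes, or 0 if the bytes are malformed
   (only the bit patterns are checked; overlong forms are not rejected). */
static size_t utf8_decode(const unsigned char *s, unsigned long *cp)
{
    if (s[0] < 0x80) { *cp = s[0]; return 1; }
    if ((s[0] & 0xE0) == 0xC0 && (s[1] & 0xC0) == 0x80) {
        *cp = ((s[0] & 0x1Ful) << 6) | (s[1] & 0x3Ful);
        return 2;
    }
    if ((s[0] & 0xF0) == 0xE0 && (s[1] & 0xC0) == 0x80 && (s[2] & 0xC0) == 0x80) {
        *cp = ((s[0] & 0x0Ful) << 12) | ((s[1] & 0x3Ful) << 6) | (s[2] & 0x3Ful);
        return 3;
    }
    if ((s[0] & 0xF8) == 0xF0 && (s[1] & 0xC0) == 0x80 &&
        (s[2] & 0xC0) == 0x80 && (s[3] & 0xC0) == 0x80) {
        *cp = ((s[0] & 0x07ul) << 18) | ((s[1] & 0x3Ful) << 12) |
              ((s[2] & 0x3Ful) << 6) | (s[3] & 0x3Ful);
        return 4;
    }
    return 0;
}

/* Write utf8 to out as UTF-16LE preceded by the byte order mark FF FE.
   Returns the number of bytes written, or 0 if out is too small or the
   input is malformed. */
static size_t utf8_to_utf16le_bom(const char *utf8, unsigned char *out, size_t cap)
{
    assert(utf8 != NULL && out != NULL);
    const unsigned char *s = (const unsigned char *)utf8;
    size_t n = 0;
    if (cap < 2)
        return 0;
    out[n++] = 0xFF;                          /* U+FEFF stored low byte first */
    out[n++] = 0xFE;
    while (*s != '\0') {
        unsigned long cp;
        size_t len = utf8_decode(s, &cp);
        if (len == 0 || cp > 0x10FFFF)
            return 0;
        s += len;
        unsigned units[2] = { (unsigned)cp, 0 };
        int count = 1;
        if (cp >= 0x10000) {                  /* outside the BMP: surrogate pair */
            cp -= 0x10000;
            units[0] = 0xD800u + (unsigned)(cp >> 10);
            units[1] = 0xDC00u + (unsigned)(cp & 0x3FF);
            count = 2;
        }
        for (int i = 0; i < count; i++) {
            if (n + 2 > cap)
                return 0;
            out[n++] = (unsigned char)(units[i] & 0xFF);   /* low byte first */
            out[n++] = (unsigned char)(units[i] >> 8);     /* then high byte */
        }
    }
    return n;
}
```

Writing the returned bytes to disk (for example with fwrite) would give a UTF-16 .rc file in which a hypothetical string table entry such as IDS_HELLO "Привет" next to a Greek one keeps every character, because each is stored by its code point rather than through a code page.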


The Yen, the Won, and the Reverse Solidus (aka, Backslash)

Dear Dr. International,

I develop applications for a trading company. I have a question about the replacement of the reverse solidus (aka backslash) with the Yen (Japanese currency) sign on the Japanese localized versions of Windows 2000, and with the Won (Korean currency) sign on the Korean localized version. Why is this done? I find the change rather confusing. Does it violate the Unicode Standard, which assigns the Reverse Solidus to code point U+005C?

International Trader

Dr. International replies:

Hello Trader,

This is a very good question that many people have asked. Let me take you back to how it all started.

Legacy Japanese code page 932 and Korean code page 949 replaced the Reverse Solidus with the Yen and Won signs, respectively, at code point 0x5C.

On systems that used these legacy code pages (e.g., DOS, Win3.1, Unix, VMS), the Yen/Won sign appeared in place of the reverse solidus in paths.

This became the preferred appearance of paths on Japanese and Korean systems, regardless of code page.

To achieve the preferred appearance, Windows NT implemented a change in GDI that displays the Reverse Solidus as the Yen or Won sign if:

the font is one of those used for the Japanese or Korean UI (MS UI Gothic for Japanese, Gulim for Korean), and

the system locale is Japanese or Korean.

So that is why you see these symbols used.

Note that there is no Unicode conformance violation here: the font's cmap is accurate Unicode, and U+005C still maps to the reverse solidus on all NT-based Windows systems.
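The display rule above can be written as a small decision function. This is a hypothetical C model of the behavior as described here, not GDI's actual code. It takes a code point, a font name, and a system locale LCID (0x0411 is Japanese, 0x0412 is Korean) and reports which character's glyph the user sees. The description does not say what happens when a Japanese UI font is used under a Korean locale or vice versa, so this sketch simply lets the font decide.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical model of the display rule: returns the code point whose glyph
   is drawn for ch. The stored character itself is never modified. */
static unsigned long displayed_as(unsigned long ch, const char *font, unsigned lcid)
{
    assert(font != NULL);
    unsigned primary = lcid & 0x3FF;           /* primary language: 0x11 Japanese, 0x12 Korean */
    int cjk_locale = (primary == 0x11 || primary == 0x12);
    if (ch != 0x5C || !cjk_locale)
        return ch;                             /* rule does not apply */
    if (strcmp(font, "MS UI Gothic") == 0)
        return 0xA5;                           /* drawn as YEN SIGN */
    if (strcmp(font, "Gulim") == 0)
        return 0x20A9;                         /* drawn as WON SIGN */
    return ch;                                 /* other fonts show a backslash */
}
```

Because only the appearance changes, code that parses paths should keep testing for 0x5C (L'\\') rather than for the Yen or Won sign.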


Entering Characters Using their Unicode Code Points

Dear Dr. International,

How can I enter characters into Windows XP Notepad and Word XP if all I know is their Unicode encoding?

Numerically Challenged

Dr. International replies:

Dear Challenged,

Both Notepad and Word XP have new functionality that allows you to do this.

Notepad on Windows XP:

Use the new ALT + keypad plus (+) function in Notepad: while holding down ALT, press the plus (+) key on the numeric keypad, then type the character's hexadecimal Unicode value. Remember to keep ALT held down until you have typed the entire hexadecimal value.

For example, here is how to enter Latin Capital Letter AE (Æ), whose code point is U+00C6. While holding down ALT, press the numeric keypad's plus key, then type 0, 0, C, 6. When you release ALT, "Æ" appears in your Notepad file. (Leading zeros can be left out, so you can also enter "Æ" by pressing the keypad's plus key and then C, 6 while holding down ALT.) In summary, the key sequence is:

ALT + keypad plus (+) + <Unicode hex value>

Word 2002 (XP):

Since you know the Unicode (hexadecimal) value of a character, you can use the new ALT+X keyboard shortcut to enter the character directly in your document.

Type the Unicode (hexadecimal) value of the character. Note: The value string can also begin with U+.

Press ALT+X.

Microsoft Word 2002 replaces the string to the left of the insertion point with the character you specified.

So using our example above you would type "00c6" into your Word XP document then press ALT+X. This will replace the "00c6" with the "Æ" character.
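Word's behavior can be pictured as: find the run of hex digits that ends at the insertion point, drop an optional U+ prefix, and substitute the character. The following C sketch is a hypothetical model of that description, not Word's implementation. It edits a UTF-8 buffer in place, and the six-digit scanning limit is an assumption (six digits are enough for U+10FFFF).

```c
#include <assert.h>
#include <string.h>

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hex_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Encode code point cp as UTF-8 into out (room for 4 bytes); returns the length. */
static size_t utf8_encode(unsigned long cp, char *out)
{
    if (cp < 0x80) {
        out[0] = (char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (char)(0xC0 | (cp >> 6));
        out[1] = (char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (char)(0xE0 | (cp >> 12));
        out[1] = (char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (char)(0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (char)(0xF0 | (cp >> 18));
    out[1] = (char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (char)(0x80 | (cp & 0x3F));
    return 4;
}

/* Hypothetical ALT+X model: replace the hex digits ending at cursor (and an
   optional "U+" before them) with the character they name. Returns the new
   insertion point, or -1 if there is nothing valid to convert. */
static long alt_x(char *text, size_t cursor)
{
    assert(text != NULL && cursor <= strlen(text));
    size_t start = cursor;
    /* Assumption: scan back over at most 6 hex digits. */
    while (start > 0 && cursor - start < 6 && hex_value(text[start - 1]) >= 0)
        start--;
    if (start == cursor)
        return -1;
    unsigned long cp = 0;
    for (size_t i = start; i < cursor; i++)
        cp = cp * 16 + (unsigned long)hex_value(text[i]);
    if (cp == 0 || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return -1;
    if (start >= 2 && (text[start - 2] == 'U' || text[start - 2] == 'u') && text[start - 1] == '+')
        start -= 2;                           /* drop the optional "U+" prefix */
    char encoded[4];
    size_t len = utf8_encode(cp, encoded);
    /* The UTF-8 form is never longer than the hex digits it replaces. */
    memcpy(text + start, encoded, len);
    memmove(text + start + len, text + cursor, strlen(text + cursor) + 1);
    return (long)(start + len);
}
```

In this model, hex letters immediately before the code are swallowed too: a00c6 becomes U+A00C6 rather than "aÆ". Separating the code with a space or using the U+ prefix avoids that.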


Switching to the Thai Keyboard Using the Grave Accent Key

Dear Dr. International,

I am writing to you on behalf of the Windows users in Thailand (I love Windows XP, by the way ;-) ). Many of us want to know how to switch to the Thai keyboard layout using the Grave Accent key.

Joe of Bangkok

Dr. International replies:

Dear Joe,

How can I pass on helping thousands of fans and users?! I'm honored.

To enable this feature, you need to:

1.

Install Thai language support. Skip this step if you already have Thai support installed.

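In Windows XP/Server 2003, Thai support is installed together with the complex script and right-to-left languages (the "Install files for complex script and right-to-left languages (including Thai)" option). Click here for the steps.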

In Windows 2000, Thai has its own language group. Click here for the steps.

2.

Change the Language for non-Unicode programs (the system locale) to Thai.

For steps in Windows XP/Server 2003, click here.

In Windows 2000, click here.

3.

Load the Thai keyboard and change the switching hotkey to Grave Accent:

For Windows XP/Server 2003:

1.

Open Regional and Language Options in Control Panel

2.

Click on the Languages tab

3.

Under "Text services and input languages," click on the "Details..." button

[Screenshot: Text services]

4.

Under Installed Services, click "Add..."

[Screenshot: Installed Services]

5.

In the Add Input Language dialog box, select Thai from the "Input language" drop-down list and the Thai keyboard you want from the "Keyboard layout/IME" drop-down list. Click OK to confirm and exit the dialog.

6.

Click Apply on the Text Services and Input Languages page

7.

Click on "Key Settings" in Preferences. The Advanced Key Settings dialog appears

8.

Click on "Change Key Sequence". The Change Key Sequence dialog appears

9.

Click the Grave Accent radio button in this dialog, click OK three times to exit from Regional and Language Options

Here's the illustration for steps 7, 8, and 9:

[Screenshot: steps 7, 8, and 9]

For Windows 2000:

1.

Open Regional Options in Control Panel

2.

Click on the Input Locales tab

3.

On the Input Locales tab, click Add

[Screenshot: Input Locales tab]

4.

In the Add Input Language dialog box, select Thai from the "Input language" drop-down list and the Thai keyboard you want from the "Keyboard layout/IME" drop-down list. Click OK to confirm and exit.

5.

Click Apply on the Regional Options page

6.

Click on "Change Key Sequence". The Change Key Sequence dialog appears

7.

Click the Grave Accent radio button in this dialog, click OK twice to exit from Regional Options

Here's the illustration for steps 5, 6, and 7:

[Screenshot: steps 5, 6, and 7]


Detecting a String's Character Set

Dear Doc,

How can I make Visual Basic 6.0 recognize a string's character set? For instance, I have a VB program that retrieves data from a data warehouse and then performs some processing. The data is of string type and contains multilingual characters. So I must code the program to detect which character set each string uses (how do I get the LCID value?) and then perform the appropriate processing. Is there any VB function I can use to solve this problem?

Raj

Dr. International replies:

Dear Raj,

That's an interesting question! The answer would be much easier if you had to detect the script of the data while it was being entered. Then you could safely assume that the language of the data follows the input language the user is using to enter text.

In your case, I believe that you are given an existing text file and have no idea how to interpret its content. Unfortunately, there is no API that reliably detects the character set of a text string. But since VB6 is not fully Unicode aware, if the data is not Unicode, you can assume that the code page of the current system locale (GetSystemDefaultLCID) can be used to interpret the text file. If this solution is not good enough for you and you are doing your own black magic to provide a multilingual solution, then the only solution I see is a round-trip conversion of a text sample between ANSI and Unicode using different code pages. If the round trip reproduces the original string, that code page is a candidate for the right one.

Here is an example of a full round trip (ANSI to Unicode and then back to ANSI):

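// Buffers for the original ANSI string, its Unicode copy, and the round-tripped ANSI string
char szANSI1[MAX_STR], szANSI2[MAX_STR];
WCHAR szUni[MAX_STR];

// Load the sample string from the resources as ANSI (LoadStringA, even in a Unicode build)
LoadStringA(g_hInst, IDS_ENUMESTRTEST, szANSI1, MAX_STR);

// Convert to Unicode using code page 1256; -1 converts through the terminating null, and
// MB_ERR_INVALID_CHARS makes the call fail on bytes that are invalid in 1256
MultiByteToWideChar(1256, MB_ERR_INVALID_CHARS, szANSI1, -1, szUni, MAX_STR);

// Convert the Unicode string back to ANSI using code page 1256 again;
// WC_NO_BEST_FIT_CHARS prevents silent substitution of look-alike characters
WideCharToMultiByte(1256, WC_NO_BEST_FIT_CHARS, szUni, -1, szANSI2, MAX_STR, NULL, NULL);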

If both conversions succeed (return a nonzero value) and szANSI1 and szANSI2 are identical (strcmp returns 0), then 1256 is a code page that can represent this string.

You have to do this for every available code page; you can enumerate them by calling EnumSystemCodePages. Keep in mind that this is a very time-consuming operation, so minimize its use.
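Putting the pieces together, detection is a loop: try each candidate code page and keep those whose round trip reproduces the input. Because MultiByteToWideChar and EnumSystemCodePages are Windows-only, the following C sketch replaces them with hypothetical toy tables that model only ASCII plus the letter ranges of code pages 1251 (Cyrillic) and 1253 (Greek), and it treats every other byte above 0x7F as unmapped, which real tables do not. It also shows a limitation worth knowing: a string can round-trip under several code pages, so the result is a list of candidates rather than a single answer.

```c
#include <assert.h>
#include <stddef.h>

#define UNMAPPED 0xFFFFFFFFul

/* Hypothetical toy tables: ASCII plus only the letter ranges of code pages
   1251 and 1253. Every other byte above 0x7F is treated as unmapped. */
static unsigned long toy_decode(int codepage, unsigned char b)
{
    if (b < 0x80)
        return b;                                /* ASCII is shared */
    if (codepage == 1251 && b >= 0xC0)
        return 0x0410ul + (b - 0xC0);            /* 0xC0..0xFF -> U+0410..U+044F */
    if (codepage == 1253 && b >= 0xC0 && b <= 0xFE && b != 0xD2)
        return 0x0390ul + (b - 0xC0);            /* 0xC0..0xFE -> U+0390..U+03CE, 0xD2 unused */
    return UNMAPPED;
}

/* Reverse lookup in the toy table: find the byte that decodes to u. */
static int toy_encode(int codepage, unsigned long u, unsigned char *out)
{
    for (unsigned b = 1; b <= 0xFF; b++) {
        if (toy_decode(codepage, (unsigned char)b) == u) {
            *out = (unsigned char)b;
            return 1;
        }
    }
    return 0;
}

/* Round trip ANSI -> Unicode -> ANSI one byte at a time (sufficient for
   single-byte code pages); returns 1 if the original bytes come back. */
static int round_trips(int codepage, const char *ansi)
{
    for (const unsigned char *p = (const unsigned char *)ansi; *p != '\0'; p++) {
        unsigned long u = toy_decode(codepage, *p);
        unsigned char back;
        if (u == UNMAPPED || !toy_encode(codepage, u, &back) || back != *p)
            return 0;
    }
    return 1;
}

/* Store each code page whose round trip succeeds in found (at most max
   entries) and return how many were stored. */
static size_t candidate_code_pages(const char *ansi, const int *codepages, size_t count,
                                   int *found, size_t max)
{
    assert(ansi != NULL && codepages != NULL && found != NULL);
    size_t n = 0;
    for (size_t i = 0; i < count && n < max; i++)
        if (round_trips(codepages[i], ansi))
            found[n++] = codepages[i];
    return n;
}
```

With these toy tables, "ПТИ" encoded in 1251 (bytes CF D2 C8) contains 0xD2, which the toy 1253 table leaves unmapped, so only 1251 survives; "Γεια" encoded in 1253 (bytes C3 E5 E9 E1) is also valid 1251 text, so both code pages remain candidates.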

See you next time!

Dr. International
Windows Division
