After contributing to the redesign of the web site (which I hope you like), the Dr. now has time to answer another series of your questions.
I am fairly new to international programming, and I'm having a mental block trying to get my head around a resource compilation issue I can't find an explanation of in your excellent book, the web, or any online help. Please help!
I'm trying to build a very simple Unicode MFC dialog box application with three buttons, each of which shows a message box saying "hello world" in an English, Russian, or Greek character set. I'm using Visual Studio .NET's C++ on Windows XP. My system uses the English (US) system locale, and that's what I used to generate the application.
As per p. 219 of your book, I've added a section to the resource file, changed the keyboard layout to Russian, and typed the Cyrillic characters into the editor. When I try to save it, I get a message box telling me some characters will be lost unless I change the encoding. The options for encoding require that I choose what looks like a code page. Isn't Unicode supposed to eliminate that, or do I have to specify a code page for every text file because ultimately, they're just text?
How is it possible to save English, Russian, and Greek characters together in the same resource file? What are the #pragma codepage directives for in a resource file?
I really would appreciate any help you could provide. Thank you in advance.
Christopher
Dear Christopher,
The main reason you are having problems is that Visual Studio does not handle Unicode-encoded text. However, several workarounds do exist:
1. Build a Unicode (UTF-16) .rc file. You cannot edit it in the resource editor of Visual Studio, but you can start with the file created by Visual Studio.
2. Create a set of ANSI-encoded .rc files (english.rc, russian.rc, greek.rc, etc.). Each of them has to be edited under the corresponding system locale. Then you can take one of the following paths:
I develop applications for a trading company. I have a question about the replacement of the reverse solidus (aka backslash) with the Yen (Japanese currency) sign on the Japanese localized version of Windows 2000, and with the Won (Korean currency) sign on the Korean localized version. Why is this done? I find the change rather confusing. Does it violate the Unicode Standard, which assigns the Reverse Solidus to code point U+005C?
International Trader
Hello Trader,
This is a very good question that many people have asked. Let me take you back to how it all started.
• Legacy Japanese code page 932 and Korean code page 949 replaced the Reverse Solidus with the Yen and Won sign, respectively, at code point 0x5C.
• On systems that used these legacy code pages (e.g., DOS, Win3.1, Unix, VMS), the Yen/Won sign appeared in place of the reverse solidus in paths.
• This became the preferred appearance of paths on Japanese and Korean systems, regardless of code page.
• To achieve the preferred appearance, Windows NT implemented a change in GDI that changes the Reverse Solidus to Yen/Won if:
So that is why you see these symbols used.
Note that there is no Unicode conformance violation here: the font cmap is accurate Unicode. U+005C still points to the reverse solidus on all NT-based Windows systems.
How can I enter characters into Windows XP Notepad and Word XP if all I know is their Unicode encoding?
Numerically Challenged
Dear Challenged,
Both Notepad and Word XP have new functionality that allows you to do this.
Notepad on Windows XP:
Use the new <ALT> + <+> function for Notepad. While holding down the <ALT> key, press the <+> key on the numeric keypad, followed by the hexadecimal Unicode value of the character. (Remember to keep the <ALT> key held down until after you have typed the entire hexadecimal value.)
For example, here is how to input the Latin Capital Letter Æ, whose encoding is U+00C6. While holding down the <ALT> key, type the numeric keypad's <+>, then <0> <0> <C> <6>. After you let go of the <ALT> key, the "Æ" will appear in your Notepad file. (Note: Leading zeroes can be left out, so you can also input "Æ" by pressing only the keypad's <+>, then <C> <6>, while holding down the <ALT> key.) In summary, the key sequence is:
<ALT> + <+> + <Unicode Hex Value>
Word 2002 (XP):
Since you know the Unicode (hexadecimal) value of a character, you can use the new ALT+X keyboard shortcut to enter the character directly in your document.
• Type the Unicode (hexadecimal) value of the character. Note: The value string can also begin with U+.
• Press ALT+X.
Microsoft Word 2002 replaces the string to the left of the insertion point with the character you specified.
So using our example above you would type "00c6" into your Word XP document then press ALT+X. This will replace the "00c6" with the "Æ" character.
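If you ever need to do the same conversion in your own code, here is a minimal C sketch of the idea behind both shortcuts: read a hexadecimal value (with or without the U+ prefix and leading zeroes) and turn it into a character. The function names parse_code_point and encode_utf8 are my own illustrative names, not Notepad, Word, or Windows APIs, and producing UTF-8 output is an assumption made to keep the example portable.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Parse a hexadecimal code point such as "00c6", "C6", or "U+00C6".
   Returns the code point, or -1 if the text is not a valid Unicode
   scalar value. (Hypothetical helper, for illustration only.) */
long parse_code_point(const char *text)
{
    long value = 0;
    size_t digits = 0;

    assert(text != NULL);
    if ((text[0] == 'U' || text[0] == 'u') && text[1] == '+')
        text += 2;                          /* optional "U+" prefix */

    for (; *text != '\0'; ++text, ++digits) {
        unsigned char c = (unsigned char)*text;
        if (!isxdigit(c) || digits == 6)
            return -1;                      /* not hex, or too many digits */
        value = value * 16 + (isdigit(c) ? c - '0' : tolower(c) - 'a' + 10);
    }
    if (digits == 0 || value > 0x10FFFF || (value >= 0xD800 && value <= 0xDFFF))
        return -1;                          /* empty, out of range, or surrogate */
    return value;
}

/* Encode a valid code point as a null-terminated UTF-8 string.
   Returns the number of bytes written, not counting the terminator. */
size_t encode_utf8(long cp, char out[5])
{
    size_t n;

    assert(out != NULL);
    if (cp < 0x80) {
        out[0] = (char)cp;
        n = 1;
    } else if (cp < 0x800) {
        out[0] = (char)(0xC0 | (cp >> 6));
        out[1] = (char)(0x80 | (cp & 0x3F));
        n = 2;
    } else if (cp < 0x10000) {
        out[0] = (char)(0xE0 | (cp >> 12));
        out[1] = (char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (char)(0x80 | (cp & 0x3F));
        n = 3;
    } else {
        out[0] = (char)(0xF0 | (cp >> 18));
        out[1] = (char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (char)(0x80 | (cp & 0x3F));
        n = 4;
    }
    out[n] = '\0';
    return n;
}
```

With this sketch, "00c6", "c6", and "U+00C6" all parse to the same code point, which mirrors the way leading zeroes and the U+ prefix are optional in the shortcuts above.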
I am writing to you on behalf of the Windows users in Thailand (I love Windows XP, by the way ;-) ). Many of us want to know how to switch to the Thai keyboard layout using the Grave Accent key.
Joe of Bangkok
Dear Joe,
How can I pass on helping thousands of fans and users?! I'm honored.
To enable this feature, you need to:
1. Install Thai language support. Skip this step if you already have Thai support installed.
2. Change the Language for non-Unicode Programs/System Locale to Thai.
3. Load the Thai keyboard and change the switching hotkey to Grave Accent:
For Windows XP/Server 2003:
[Illustration of steps 7, 8, and 9]
For Windows 2000:
How can I make Visual Basic 6.0 recognize a string's character set? For instance, I have a VB program that retrieves data from a data warehouse and then performs some processing. The data is of string type and contains characters from multiple languages, so I must code the program to detect which character set each string uses (how do I get the LCID value?) and then perform the appropriate processing. Is there any VB function I can use to solve this problem?
Raj
Dear Raj,
That's an interesting question! The answer would have been much easier if you had to detect the script of the data while it was being entered. In that case, you could safely assume that the language of the data matches the input language the user is using to enter text.
In your case, I believe that you are given an existing text file and have no idea how to interpret its content. Unfortunately, there is no API to detect the charset of a text string. But since VB6 is not Unicode aware, if the data is not Unicode, you can always assume that the currently selected system locale (GetSystemDefaultLCID) can be used to interpret the text file. If this solution is not good enough for you and you are doing your own black magic to provide a multilingual solution, then the only solution I see is to do a round-trip conversion between ANSI and Unicode of a text sample using different code pages. If you get the original string back after a round trip, then you have the right code page.
Here is an example of a full round trip (ANSI to Unicode and then back to ANSI):
    // Load an ANSI string from resources
    LoadStringA(g_hInst, IDS_ENUMESTRTEST, szANSI1, MAX_STR);
    // Convert to Unicode using code page 1256 (-1 converts the terminating null too)
    MultiByteToWideChar(1256, 0, szANSI1, -1, szUni, MAX_STR);
    // Convert the Unicode string back to ANSI using code page 1256 again
    WideCharToMultiByte(1256, 0, szUni, -1, szANSI2, MAX_STR, NULL, NULL);
If szANSI1 and szANSI2 are identical, then 1256 is the right code page for this conversion.
You have to do this for all available code pages, which you can enumerate with a call to EnumSystemCodePages. Keep in mind that this is a very time-consuming operation, so you should use it as sparingly as possible.
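To make the round-trip idea concrete without depending on the Windows headers, here is a small portable C model of it. Treat it strictly as a sketch under my own assumptions: the ToyCodePage tables cover only the letter ranges of code pages 1251 (Cyrillic) and 1253 (Greek), every other high byte is treated as unmapped, and unmapped values become '?'. The names ToyCodePage, find_page, and round_trips are hypothetical. In a real program, MultiByteToWideChar and WideCharToMultiByte would do the conversions.

```c
#include <assert.h>
#include <stddef.h>

/* Toy single-byte code page (illustrative only): bytes below 0x80 are
   ASCII; bytes in [first, last] map to consecutive code points starting
   at base, except `hole`, which is unassigned (0 means no hole). */
typedef struct {
    unsigned id;
    unsigned char first, last, hole;
    unsigned short base;
} ToyCodePage;

static const ToyCodePage PAGES[] = {
    { 1251, 0xC0, 0xFF, 0x00, 0x0410 },  /* Cyrillic letters, no hole   */
    { 1253, 0xC0, 0xFE, 0xD2, 0x0390 },  /* Greek letters, 0xD2 unused  */
};

/* Look up one of the toy pages by number; returns NULL if unknown. */
const ToyCodePage *find_page(unsigned id)
{
    size_t i;
    for (i = 0; i < sizeof PAGES / sizeof PAGES[0]; ++i)
        if (PAGES[i].id == id)
            return &PAGES[i];
    return NULL;
}

/* Byte -> UTF-16 code unit; unmapped bytes become '?'. */
static unsigned short decode_byte(const ToyCodePage *cp, unsigned char b)
{
    if (b < 0x80)
        return b;
    if (b >= cp->first && b <= cp->last && b != cp->hole)
        return (unsigned short)(cp->base + (b - cp->first));
    return '?';
}

/* UTF-16 code unit -> byte; unmappable units become '?'. */
static unsigned char encode_unit(const ToyCodePage *cp, unsigned short u)
{
    if (u < 0x80)
        return (unsigned char)u;
    if (u >= cp->base && u <= cp->base + (cp->last - cp->first)) {
        unsigned char b = (unsigned char)(cp->first + (u - cp->base));
        if (b != cp->hole)
            return b;
    }
    return '?';
}

/* Returns 1 if every byte survives bytes -> Unicode -> bytes, else 0. */
int round_trips(const ToyCodePage *cp, const unsigned char *bytes, size_t len)
{
    size_t i;
    assert(cp != NULL && bytes != NULL);
    for (i = 0; i < len; ++i)
        if (encode_unit(cp, decode_byte(cp, bytes[i])) != bytes[i])
            return 0;
    return 1;
}
```

In this model, the bytes CF F0 E8 E2 E5 F2 ("Привет" in code page 1251) survive the round trip in both toy pages, while the byte D2 survives only in 1251. A failed round trip therefore rules a code page out, but a successful one can still leave you with more than one candidate, so it pays to test with a sample that is as long and as varied as possible.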
See you next time!
Dr. International
Windows Division