Microsoft

Comparing and Sorting Text for Arabic Cultures in Visual Studio .NET

Introduction

The .NET Framework provides a variety of classes for string manipulation. Alphabetical order and the conventions for comparing items vary from culture to culture. Arabic is a rich language that raises several considerations when comparing and searching strings, such as the Kashida, diacritics, and the hamza. This article discusses and demonstrates the extent to which the .NET Framework and Visual Studio .NET support Arabic.

Comparing Strings

The String class contains the String.Compare method. This method compares two specified String objects, ignoring or honoring their case, and honoring culture-specific information. It returns a negative integer if string1 is less than string2, zero (0) if string1 and string2 are equal, and a positive integer if string1 is greater than string2. The String.Compare method uses the information in the CultureInfo.CompareInfo property to compare strings. The CompareInfo class provides a set of methods you can use to perform culture-sensitive string comparisons.

The following code example illustrates how two strings can be evaluated differently by String.Compare and String.CompareOrdinal. First, a CultureInfo object is created for Arabic ("ar") and the strings "محمد" and "مــــحـــمـــــد" are compared using String.Compare with that culture. The Arabic culture treats the Kashida character as nonexistent. Therefore, both strings are equal for the Arabic culture and the return value is 0. Second, the same two strings are compared using String.CompareOrdinal. This method performs an ordinal (binary) comparison of the character values and does not take culture information into account. Therefore "مــــحـــمـــــد" is not equal to "محمد".

Imports System.Globalization
Imports System.Threading
      .
      .
      .
        Dim Str1 As String
        Dim Str2 As String
        ' Create a CultureInfo object for Arabic.
        Dim ArabicCult As CultureInfo = New CultureInfo("ar")
          Str1 = "محمد"
          Str2 = "مــــحـــمـــد"
        Dim Result As Integer
        ' Culture-sensitive, case-sensitive comparison using the Arabic culture.
          Result = String.Compare(Str1, Str2, False, ArabicCult)
        Dim ResultStr As String
          ResultStr = "The result of String.Compare "
          ResultStr &= Str1 & " and " & Str2 & " is : " & CStr(Result)
          MessageBox.Show(ResultStr)
        ' Use the ordinal (binary) comparison, which ignores culture information.
          Result = String.CompareOrdinal(Str1, Str2)
          ResultStr = "The result of String.CompareOrdinal "
          ResultStr &= Str1 & " and " & Str2 & " is : " & CStr(Result)
          MessageBox.Show(ResultStr)

If you execute this code, the output appears as follows:

Message box text:

The result of String.Compare محمد and مــــحـــمـــد is : 0

Followed by, message box text:

The result of String.CompareOrdinal محمد and مــــحـــمـــد is : -19

For more information on comparing strings, see String Class and Comparing Strings in the .NET Framework Class Library.
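
The documented contract of String.Compare and String.CompareOrdinal only distinguishes negative, zero, and positive results, so a value such as -19 should be read by its sign rather than its magnitude. The following is a minimal sketch of that idea. The module name ComparisonSign and the helper Describe are hypothetical names introduced here for illustration, and the sketch writes to the console instead of a message box so that it is self-contained.

```vb
Imports System

Module ComparisonSign
    ' Hypothetical helper: maps a comparison result to a readable relation.
    ' Only the sign of the result is used, never its magnitude.
    Function Describe(ByVal result As Integer) As String
        Select Case Math.Sign(result)
            Case -1
                Return "less than"
            Case 0
                Return "equal to"
            Case Else
                Return "greater than"
        End Select
    End Function

    Sub Main()
        Dim Str1 As String = "محمد"
        Dim Str2 As String = "مــــحـــمـــد"
        ' Ordinal comparison: the Kashida (U+0640) in Str2 has a higher
        ' code point than the second letter of Str1, so the result is negative.
        Dim Result As Integer = String.CompareOrdinal(Str1, Str2)
        Console.WriteLine("Str1 is " & Describe(Result) & " Str2 (ordinal)")
    End Sub
End Module
```

Testing the sign (for example, Result < 0) keeps calling code independent of the exact number a particular comparison happens to return.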


Advanced String Comparison

The CompareInfo.Compare method accepts a CompareOptions value that controls how the comparison is performed. The following options are the most relevant for Arabic:

Enumeration option    Description
IgnoreNonSpace        Indicates that the string comparison must ignore nonspacing combining characters. In the case of Arabic, this ignores the Alef-hamza.
IgnoreSymbols         Indicates that the string comparison must ignore symbols, such as white-space characters, punctuation, currency symbols, the percent sign, mathematical symbols, the ampersand, and so on. In the case of Arabic, it also ignores diacritics.
Ordinal               Indicates that the string comparison must be done using the Unicode values of each character, which is a fast comparison but is culture-insensitive. This is important if you want to disable any culture-specific behavior, such as ignoring the Kashida.

The main drawback of IgnoreSymbols is that you cannot ignore diacritics without also ignoring symbols.

The following code example compares two strings while ignoring diacritics and the hamza. First, a CultureInfo object is created for Arabic and the strings "أخْرَجْنَا" and "اخرجنا" are compared using CompareInfo.Compare with no additional flags. The strings are not equal and the return value is 1. Second, the same two strings are compared using CompareInfo.Compare with CompareOptions.IgnoreNonSpace combined with CompareOptions.IgnoreSymbols. This comparison ignores the diacritics and the hamza. Therefore "أخْرَجْنَا" is equal to "اخرجنا".

        Dim Str1 As String
        Dim Str2 As String
        ' Create a CultureInfo object for Arabic.
        Dim ArabicCult As CultureInfo = New CultureInfo("ar")
          Str1 = "أخْرَجْنَا"
          Str2 = "اخرجنا"
        Dim Result As Integer
        Dim ResultStr As String
        ' Compare with the default options of the Arabic culture.
          Result = ArabicCult.CompareInfo.Compare(Str1, Str2)
          ResultStr = "The result of comparing text without CompareOptions "
          ResultStr &= Str1 & " and " & Str2 & " is : " & CStr(Result)
          MessageBox.Show(ResultStr)
        ' Compare again, ignoring the hamza and the diacritics.
          Result = ArabicCult.CompareInfo.Compare(Str1, Str2, _
          CompareOptions.IgnoreNonSpace Or CompareOptions.IgnoreSymbols)
          ResultStr = "The result of comparing text with CompareOptions "
          ResultStr &= Str1 & " and " & Str2 & " is : " & CStr(Result)
          MessageBox.Show(ResultStr)
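
When only equality matters, the comparison above can be wrapped in a small helper, which also makes it easy to contrast the relaxed options with CompareOptions.Ordinal. The following is a sketch under stated assumptions rather than a definitive implementation: the module name ArabicCompareSketch and the helper AreEqual are hypothetical, and the result of the relaxed comparison depends on the collation data of the platform the code runs on, so no output is claimed for it. Note that CompareOptions.Ordinal must be used alone; it cannot be combined with other flags.

```vb
Imports System
Imports System.Globalization

Module ArabicCompareSketch
    ' Hypothetical helper: True when the two strings compare as equal
    ' under the given culture and comparison options.
    Function AreEqual(ByVal a As String, ByVal b As String, _
                      ByVal culture As CultureInfo, _
                      ByVal options As CompareOptions) As Boolean
        Return culture.CompareInfo.Compare(a, b, options) = 0
    End Function

    Sub Main()
        Dim ArabicCult As New CultureInfo("ar")
        Dim Str1 As String = "أخْرَجْنَا"
        Dim Str2 As String = "اخرجنا"
        Dim Relaxed As CompareOptions = _
            CompareOptions.IgnoreNonSpace Or CompareOptions.IgnoreSymbols

        ' Culture-sensitive comparison that ignores the hamza and diacritics.
        Console.WriteLine(AreEqual(Str1, Str2, ArabicCult, Relaxed))

        ' Ordinal comparison: the strings contain different code points,
        ' so this prints False regardless of culture.
        Console.WriteLine(AreEqual(Str1, Str2, ArabicCult, CompareOptions.Ordinal))
    End Sub
End Module
```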


Searching Strings

You can use the overloaded CompareInfo.IndexOf method to return the zero-based index of a character or substring within a specified string. The method returns -1 if the character or substring is not found in the specified string. You can still use CompareOptions.IgnoreNonSpace and CompareOptions.IgnoreSymbols as described earlier.

The following code example illustrates the use of the CompareInfo.IndexOf(String, String, CompareOptions) method. A CultureInfo object is created for "ar" (Arabic). Next, the CompareInfo.IndexOf method is used to search for the string "احمد" in the string "أحمدُ و عُمر".

        Dim Str1 As String
        Dim Str2 As String
        ' Create a CultureInfo object for Arabic.
        Dim ArabicCult As CultureInfo = New CultureInfo("ar")
          Str1 = "أحمدُ و عُمر"
          Str2 = "احمد"
        Dim Result As Integer
        Dim ResultStr As String
        ' Search for Str2 in Str1, ignoring the hamza and the diacritics.
          Result = ArabicCult.CompareInfo.IndexOf(Str1, Str2, _
          CompareOptions.IgnoreNonSpace Or CompareOptions.IgnoreSymbols)
          ResultStr = "The result of searching " & Str1
          ResultStr &= " for string " & Str2 & " is : " & CStr(Result)
          MessageBox.Show(ResultStr)
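
Because IndexOf signals "not found" with -1, calling code usually tests the result before using it as a position. The following is a minimal sketch of such a check; the module name ArabicSearchSketch and the helper ContainsText are hypothetical names introduced for illustration. It also shows the same search with CompareOptions.Ordinal, which fails here because the source string starts with Alef-hamza (U+0623) while the search string starts with a plain Alef (U+0627).

```vb
Imports System
Imports System.Globalization

Module ArabicSearchSketch
    ' Hypothetical helper: True when "value" occurs in "source"
    ' under the given culture and comparison options.
    Function ContainsText(ByVal source As String, ByVal value As String, _
                          ByVal culture As CultureInfo, _
                          ByVal options As CompareOptions) As Boolean
        ' IndexOf returns -1 when the substring is not found.
        Return culture.CompareInfo.IndexOf(source, value, options) >= 0
    End Function

    Sub Main()
        Dim ArabicCult As New CultureInfo("ar")
        Dim Str1 As String = "أحمدُ و عُمر"
        Dim Str2 As String = "احمد"

        ' Culture-sensitive search that ignores the hamza and diacritics.
        Console.WriteLine(ContainsText(Str1, Str2, ArabicCult, _
            CompareOptions.IgnoreNonSpace Or CompareOptions.IgnoreSymbols))

        ' Ordinal search: Str1 contains no plain Alef (U+0627),
        ' so the substring cannot be found.
        Console.WriteLine(ContainsText(Str1, Str2, ArabicCult, CompareOptions.Ordinal))
    End Sub
End Module
```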



Sorting Strings

The Array class provides an overloaded Array.Sort method. When it sorts strings without an explicit comparer, the sort order is based on the CultureInfo.CurrentCulture property.

In the following example, an array of five strings is created. The CurrentCulture is then set to "ar-EG" and the Array.Sort method is called. The resulting sort order is based on the sorting conventions for the "ar-EG" culture.


        ' Hypothetical Sub name; the original listing omits the Sub header.
        Public Shared Sub SortArabicStrings()
        Dim Str1 As [String] = "عمـر"
        Dim Str2 As [String] = "عُمر"
        Dim Str3 As [String] = "عُامر"
        Dim Str4 As [String] = "أحمد"
        Dim Str5 As [String] = "احمد"
        ' Create and initialize a new Array instance to store the strings.
        Dim stringArray As Array = Array.CreateInstance(GetType([String]), 5)
          stringArray.SetValue(Str1, 0)
          stringArray.SetValue(Str2, 1)
          stringArray.SetValue(Str3, 2)
          stringArray.SetValue(Str4, 3)
          stringArray.SetValue(Str5, 4)
        ' Display the values of the Array.
          PrintIndexAndValues("Before Sorting ", stringArray)
        ' Set the CurrentCulture to "ar-EG".
          Thread.CurrentThread.CurrentCulture = New CultureInfo("ar-EG")
        ' Sort the values of the Array using the current culture.
          Array.Sort(stringArray)
        ' Display the values of the Array.
          PrintIndexAndValues("After Sorting ", stringArray)
        End Sub
        ' Shows a label followed by each index and value of the array.
        Public Shared Sub PrintIndexAndValues(ByVal Str As String, _
          ByVal myArray As Array)
        Dim i As Integer
        Dim DispStr As String = Str
        For i = myArray.GetLowerBound(0) To myArray.GetUpperBound(0)
          DispStr &= CStr(i) & " : " & CStr(myArray.GetValue(i)) & " Next "
        Next i
          MessageBox.Show(DispStr)
        End Sub

If you execute this code, the output appears as follows:

Message box text:

Before Sorting 0 : عمـر Next 1 : عُمر Next 2 : عُامر Next 3 : أحمد Next 4 : احمد Next

Followed by, message box text:

After Sorting 0 : احمد Next 1 : أحمد Next 2 : عُامر Next 3 : عُمر Next 4 : عمـر Next
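
Changing Thread.CurrentThread.CurrentCulture affects every culture-sensitive operation on that thread. As an alternative, the culture can be passed to the sort explicitly through a comparer. The following is a sketch of that approach, assuming .NET Framework 2.0 or later, where StringComparer.Create is available. The module name ArabicSortSketch and the helper SortForCulture are hypothetical names introduced for illustration, and the resulting order depends on the collation data of the platform, so no output is claimed here.

```vb
Imports System
Imports System.Globalization

Module ArabicSortSketch
    ' Hypothetical helper: returns a sorted copy of "items" using the
    ' sorting rules of the named culture. The input array and the
    ' thread's CurrentCulture are left unchanged.
    Function SortForCulture(ByVal items As String(), _
                            ByVal cultureName As String) As String()
        Dim sorted As String() = CType(items.Clone(), String())
        ' Case-sensitive comparer bound to the requested culture.
        Dim comparer As StringComparer = _
            StringComparer.Create(New CultureInfo(cultureName), False)
        Array.Sort(Of String)(sorted, comparer)
        Return sorted
    End Function

    Sub Main()
        Dim names As String() = New String() _
            {"عمـر", "عُمر", "عُامر", "أحمد", "احمد"}
        For Each item As String In SortForCulture(names, "ar-EG")
            Console.WriteLine(item)
        Next
    End Sub
End Module
```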





©2016 Microsoft Corporation. All rights reserved.