Inside C#, Second Edition
Author Tom Archer, Andrew Whitechapel
Pages 912
Disk 1 Companion CD(s)
Level All Levels
Published 04/24/2002
ISBN 9780735616486
Price $49.99
Chapter 10: String Handling and Regular Expressions continued


Regular Expressions

The System.Text.RegularExpressions namespace provides a number of classes for regular expression processing. Regular expressions offer a powerful, flexible, and efficient strategy for processing text. The .NET Framework regular expressions have evolved from languages such as Perl and awk and are designed to be compatible with Perl 5 regular expressions. In addition, the .NET regular expressions include unique features such as right-to-left matching and on-the-fly compilation. Using regular expressions, you can quickly parse large amounts of text to find specific character patterns and then extract, edit, replace, or delete text substrings. These features are particularly useful for parsing HTML pages, HTTP headers, XML files, system log files, and so on.

The regular expression language includes two basic character types: literal (normal) text characters and metacharacters. Regular expression metacharacters are an evolved extension of the ? and * metacharacters used with the MS-DOS file system to represent any single character or group of characters. The most commonly used metacharacters in the regular expression pattern syntax are listed in Table 10-5.

Table 10-5 Common Regular Expression Metacharacters

Expression      Meaning
.               Matches any character except \n
[characters]    Matches a single character in the list
[^characters]   Matches a single character not in the list
[charX-charY]   Matches a single character in the specified range
\w              Matches a word character; for ASCII text, the same as [a-zA-Z_0-9]
\W              Matches a nonword character
\s              Matches a whitespace character; for ASCII text, the same as [ \f\n\r\t\v]
\S              Matches a nonwhitespace character
\d              Matches a decimal digit; for ASCII text, the same as [0-9]
\D              Matches a nondigit character
^               Beginning of the line
$               End of the line
\b              On a word boundary
\B              Not on a word boundary
*               Zero or more matches
+               One or more matches
?               Zero or one matches
{n}             Exactly n matches
{n,}            At least n matches
{n,m}           At least n but no more than m matches
( )             Capture matched substring
(?<name>)       Capture matched substring into group name
|               Logical OR

For example, when applied to a body of text, the regular expression \sFoo matches all occurrences of the string "Foo" that are preceded by any whitespace character (such as space, tab, or carriage return/linefeed).
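As a quick check of the \sFoo example, the following minimal sketch lists where the pattern matches in a sample string. The class name, the FindIndexes helper, and the sample text are hypothetical and chosen only for illustration; the sketch uses the static Regex.Matches method, which is covered later in this section. Note that each match includes the whitespace character itself, so the reported index is that of the whitespace, not of the "F".

```csharp
using System;
using System.Text.RegularExpressions;

static class WhitespaceFooSketch
{
    // Hypothetical helper: returns the start index of every match of \sFoo.
    public static int[] FindIndexes(string text)
    {
        MatchCollection mc = Regex.Matches(text, @"\sFoo");
        int[] result = new int[mc.Count];
        for (int i = 0; i < mc.Count; i++)
        {
            result[i] = mc[i].Index;
        }
        return result;
    }

    static void Main()
    {
        // The leading "Foo" has no whitespace in front of it, so it is skipped.
        foreach (int index in FindIndexes("Foo bar\tFoo baz Foo"))
        {
            Console.WriteLine("Match at position {0}", index);
        }
    }
}
```

With this sample text, the tab-preceded and space-preceded occurrences are found, but the occurrence at the very start of the string is not.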

As a simple introduction to regular expressions, let's revisit the String.Split method from the previous section, which splits a string into substrings according to specified separators. The Regex class in the System.Text.RegularExpressions namespace can be used in the same way. Instead of setting up an array of characters containing the delimiters, we pass a pattern that describes the delimiters to the Regex constructor:

using System;
using System.Text.RegularExpressions;

class SplitRegExApp
{
    static void Main(string[] args)
    {
        string s = "Once Upon A Time In America";
//        char[] seps = new char[]{' '};
        Regex r = new Regex(" ");
//        foreach (string ss in s.Split(seps))
        foreach (string ss in r.Split(s))
        {
            Console.WriteLine(ss);
        }
    }
}

Here's the output:

Once
Upon
A
Time
In
America

Let's continue our comparison of the String and Regex features for splitting strings. We can also split strings on the basis of multiple different delimiters:

Console.WriteLine(
    "\nMultiple different delimiters:");
string t = "Once,Upon:A/Time\\In\'America";
//char[] sep2 = new char[]{
//    ' ', ',', ':', '/', '\\', '\''};
Regex q = new Regex(@" |,|:|/|\\|\'");
//foreach (string ss in t.Split(sep2))
foreach (string ss in q.Split(t))
{
    Console.WriteLine(ss);
}

Note the way we specify multiple delimiters to Regex by using the alternation (|) metacharacter, which acts as a logical OR between patterns. Now that our delimiters include the backslash, which has a special meaning to Regex, we must escape it as \\ in the pattern, and we @-quote the string so that the C# compiler passes the backslashes through unchanged. Let's see what happens when we try our version of the string with multiple spaces:

Console.WriteLine(
    "\nMultiple spaces, using \" \"");
string u = "Once   Upon A Time In   America";
//char[] sep3 = new char[]{' '};
Regex p = new Regex(" ");
//foreach (string ss in u.Split(sep3))
foreach (string ss in p.Split(u))
{
    Console.WriteLine(ss);
}

As you can see from the output, this result isn't quite right:

Multiple spaces, using " "
Once
 

Upon
A
Time
In
 

America

What's going on here? Our regular expression search pattern is a single space, so the engine splits at every space it finds and discards the spaces themselves. When two spaces are adjacent, nothing lies between them, so we end up with an empty string. For instance, there are three spaces between "Once" and "Upon", so the sequence "Once   Upon" is split three times, once at each space. The text between the first and second spaces is an empty string, as is the text between the second and third. Therefore, we end up with four substrings: "Once", "", "", and "Upon". Can we fix this problem? Yes we can. The obvious solution is to test for empty strings upon output:

string u = "Once   Upon A Time In   America";
Regex p = new Regex(" ");
foreach (string ss in p.Split(u))
{
    if (ss.Length>0)
        Console.WriteLine(ss);
}

OK, it's working, but it seems a bit of a bodge, doesn't it? Isn't there a better way? The answer is yes. In the following example—which produces the same output—we specify \s as the pattern to match. Regex will interpret \s as any single whitespace character. We need to place the at sign (@) in front of the string, or the compiler will step in and complain about \s being an unrecognized escape sequence. Finally, we add a plus sign (+) to the end of the pattern to signify that we're happy to match multiple instances of the pattern—in this case, multiple instances of whitespace:

Console.WriteLine(
    "\nMultiple spaces, using \"[\\s]+\"");
string v = "Once   Upon A Time In   America";
Regex o = new Regex(@"[\s]+");
foreach (string ss in o.Split(v))
{
    Console.WriteLine(ss);
}

If we're concerned only with spaces and not other whitespace characters (such as tabs), we can reduce the expression to this:

Regex n = new Regex("[ ]+");
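The same square-bracket technique also covers the earlier multiple-delimiter case. The sketch below is an illustration under assumed names (CharClassSplitSketch and SplitWords are hypothetical): it puts all six delimiters into one character class and adds a plus sign, so that any run of delimiters counts as a single separator.

```csharp
using System;
using System.Text.RegularExpressions;

static class CharClassSplitSketch
{
    // Hypothetical helper: splits on one or more of space , : / \ '
    // The backslash is doubled because it is special to Regex, and the
    // string is @-quoted so that C# leaves both backslashes in place.
    public static string[] SplitWords(string text)
    {
        return Regex.Split(text, @"[ ,:/\\']+");
    }

    static void Main()
    {
        string t = "Once,Upon:A/Time\\In'America";
        foreach (string word in SplitWords(t))
        {
            Console.WriteLine(word);
        }
    }
}
```

Compared with the alternation form (" |,|:|/|\\|\'"), the character class is shorter, and the trailing plus sign avoids empty strings when two delimiters are adjacent.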

Finally, instead of using the square brackets to surround our search pattern, we can use parentheses—another of the metacharacters recognized by Regex. What difference do they make? Let's take our first (simplest) example:

Console.WriteLine(
    "\nSingle spaces, using ()");
string x = "Once Upon A Time In America";
Regex m = new Regex("( )");
foreach (string ss in m.Split(x))
{
    Console.WriteLine(ss);
}

Here's the output:

Single spaces, using ()
Once
 
Upon
 
A
 
Time
 
In
 
America

This time, we don't have empty strings between each substring. We now have 11 substrings, the six words plus the five delimiters, because the parentheses cause Regex to keep, or capture, the delimiters instead of discarding them. In a more sophisticated situation where we're not just splitting a string but performing some other modifications to it, we might want to keep the delimiters for other processing. The foregoing examples use regular expressions in a fairly simple manner, just to compare the Regex and String classes as closely as possible. We'll now see how to use regular expressions in a more powerful fashion.

Match and MatchCollection

The System.Text.RegularExpressions namespace also offers a Match class and a MatchCollection class. The Match class represents the results of a regular expression-matching operation. A Match object is immutable, and the Match class has no public constructor. Therefore, you can get a Match only from another class, such as Regex. In the following example, we use the Match method of the Regex class to return an object of type Match in order to find the first match in the input string. We also use the Match.Success property to indicate whether a match was indeed found.

class MatchingApp
{
    static void Main(string[] args)
    {
        Regex r = new Regex("in");
        Match m = r.Match("Matching");
        if (m.Success)
        {
            Console.WriteLine(
                "Found '{0}' at position {1}",
                m.Value, m.Index);
        }
    }
}

The output from this application is:

Found 'in' at position 5

Note that if we'd initialized the Regex object with "capturing" parentheses, the effect would be exactly the same:

Regex r = new Regex("(in)");

OK, but what if there are multiple occurrences of the pattern in the string? For this, we need to use the MatchCollection class. Like Match, this class is immutable and has no public constructor. In the following example, we use the same Regex object previously initialized to search for the pattern "in" and apply it to a longer string with multiple occurrences of the pattern. The results are returned in a MatchCollection object, which we can then iterate. We can also use the indexer to treat the collection as an array.

MatchCollection mc = r.Matches(
    "The King Was in His Counting House");
for (int i = 0; i < mc.Count; i++)
{
    Console.WriteLine(
        "Found '{0}' at position {1}",
        mc[i].Value, mc[i].Index);
}

The output from this new block of code is:

Found 'in' at position 5
Found 'in' at position 13
Found 'in' at position 25

The Match class stores and provides access to all the substrings extracted by the search. Match also remembers the string being searched and the regular expression being used, so it can use them to perform another search that starts where the last one ended. Therefore, we can also perform the previous search operation by using the following code—we find the first match, and as long as this succeeds, we continue searching with a call to Match.NextMatch:

string s2 = "The King Was in His Counting House";
Match m2; 
for (m2 = r.Match(s2); m2.Success; 
    m2 = m2.NextMatch())
{
    Console.WriteLine(
        "Found '{0}' at position {1}",
        m2.Value, m2.Index);
}

Suppose we want to search for the pattern "in" only as a word, that is, when "in" occurs after and before a space. This situation is almost too trivial to mention; just bear in mind that the regular expression classes can search for any pattern you care to imagine:

Regex q = new Regex(" in ");
MatchCollection mm = q.Matches(
    "The King Was in His Counting House");
for (int i = 0; i < mm.Count; i++)
{
    Console.WriteLine(
        "Found '{0}' at position {1}",
        mm[i].Value, mm[i].Index);
}

The output from this new block of code is:

Found ' in ' at position 12

Finally, suppose we want to match multiple instances of multiple patterns:

Regex p = new Regex("((an)|(in)|(on))");
MatchCollection mn = p.Matches(
    "The King Kong Band Wins Again");
for (int i = 0; i < mn.Count; i++)
{
    Console.WriteLine(
        "Found '{0}' at position {1}",
        mn[i].Value, mn[i].Index);
}

The output from this new block of code is:

Found 'in' at position 5
Found 'on' at position 10
Found 'an' at position 15
Found 'in' at position 20
Found 'in' at position 27

Note that we can alternatively write the regular expression just shown like this:

Regex p = new Regex("(a|i|o)n");

This alternative pattern matching can be extended to a technique named backtracking. Backtracking occurs when the regular expression-matching engine needs to back up to re-examine part of the string that it has already passed. For example, suppose we're looking for either spelling of the word "Gray": "Gray" or "Grey". Suppose that in a given string we have the substring "Grey". When the engine examines this string and finds the pattern "Gr", it must choose to compare the next character against the letter "a" or "e". Suppose it chooses to match "a". This comparison fails, so the engine must backtrack to try to match "e".

Regex n = new Regex("Gr(a|e)y");
MatchCollection mp = n.Matches(
    "Green, Grey, Granite, Gray");
for (int i = 0; i < mp.Count; i++)
{
    Console.WriteLine(
        "Found '{0}' at position {1}",
        mp[i].Value, mp[i].Index);
}

The output from this new block of code is:

Found 'Grey' at position 7
Found 'Gray' at position 22

Groups and Captures

The System.Text.RegularExpressions namespace also offers a Group class and a GroupCollection class. The Group class represents the results from a single regular expression-matching group. In the following example, the pattern defines two capturing groups, "in" and "n". Together with the overall match "ing", which is always available as group 0, that gives three groups when we search the string "Matching". As you can see, the Match class offers a Groups property that returns a GroupCollection object, and we can use an integer indexer into the GroupCollection to extract individual Group objects:

class GroupingApp
{
    static void Main(string[] args)
    {
        // Define groups 'ing', 'in', 'n'
        Regex r = new Regex("(i(n))g");
        Match m = r.Match("Matching");
        GroupCollection gc = m.Groups;
 
        Console.WriteLine(
            "Found {0} Groups", gc.Count);
        for (int i = 0; i < gc.Count; i++)
        {
            Group g = gc[i];
            Console.WriteLine(
                "Found '{0}' at position {1}",
                g.Value, g.Index);
        }
    }
}

The output from this application is:

Found 3 Groups
Found 'ing' at position 5
Found 'in' at position 5
Found 'n' at position 6

Note that the for loop just shown could've been written to use the Capture and CaptureCollection classes explicitly. The Capture class contains the results from a single subexpression capture, while the CaptureCollection class represents a sequence of substrings captured by a single capturing group:

for (int i = 0; i < gc.Count; i++) 
{
    CaptureCollection cc = gc[i].Captures; 
    for (int j = 0; j < cc.Count; j++) 
    {
        Capture cap = cc[j];
        Console.WriteLine(
            "Found '{0}' at position {1}",
            cap.Value, cap.Index); 
    }
}

The relationship between matches, groups, and captures is indicated in Figure 10-2.


Figure 10-2 Matches, groups, and captures.
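In the "Matching" example, each group captures exactly once, so the difference between a Group and its Captures is hard to see. The following sketch is an illustration only, with hypothetical names (CaptureSketch, CapturesOfGroup1) and a pattern that isn't from the chapter. It wraps a capturing group in a repeated, noncapturing (?: ) group, a construct that isn't listed in Table 10-5, so that the single group captures several times during one match.

```csharp
using System;
using System.Text.RegularExpressions;

static class CaptureSketch
{
    // Hypothetical helper: returns every substring captured by group 1.
    // (?: ... )+ repeats without capturing; (\w+) is the only capturing group.
    public static string[] CapturesOfGroup1(string text)
    {
        Match m = Regex.Match(text, @"(?:(\w+)\s?)+");
        CaptureCollection cc = m.Groups[1].Captures;
        string[] values = new string[cc.Count];
        for (int i = 0; i < cc.Count; i++)
        {
            values[i] = cc[i].Value;
        }
        return values;
    }

    static void Main()
    {
        string text = "one two three";
        Match m = Regex.Match(text, @"(?:(\w+)\s?)+");

        // Group.Value reports only the last capture made by the group.
        Console.WriteLine("Group 1 value: '{0}'", m.Groups[1].Value);

        // The Captures collection keeps every capture, in order.
        foreach (string value in CapturesOfGroup1(text))
        {
            Console.WriteLine("Capture: '{0}'", value);
        }
    }
}
```

One match therefore contains one or more groups, and each group contains one or more captures; Group.Value is simply a shortcut to the final capture.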

The Group class becomes much more powerful when used with named groups. You can make Regex put the captured substrings into Group objects with arbitrary names and then use these names via the GroupCollection string indexer:

Regex q = new Regex(
    "(?<something>\\w+):(?<another>\\w+)");
Match n = q.Match("Salary:123456");
Console.WriteLine(
    "{0} = {1}", 
    n.Groups["something"].Value,
    n.Groups["another"].Value);

The output from this new block of code is:

Salary = 123456

Table 10-6 shows how the regular expression you just saw breaks down.

Table 10-6 Breakdown of a Typical Regular Expression

Element           Description
?<something>      Capture the matched substring into a group named "something".
\                 Escape the following backslash for the C# compiler so that Regex receives \w, which has a special meaning to Regex.
\w                A pattern that matches any "word" character (in other words, any alphabetic, numeric, or underscore character); the same pattern as [a-zA-Z_0-9].
+                 Allow for one or more instances of this pattern (in this case, any "word" characters).
:                 Match a literal colon, which separates the two groups.
?<another>\\w+    Capture the matched substring into a group named "another", matching any "word" characters.
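Named groups pair naturally with Regex.Matches when the input holds several name:value items. The sketch below is a minimal illustration under assumptions: the class name, the ParsePairs helper, the group names "key" and "value", and the sample input are all hypothetical, and it uses the generic Dictionary class from System.Collections.Generic, which the chapter doesn't cover.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class NamedGroupSketch
{
    // Hypothetical helper: collects every word:word pair into a dictionary.
    public static Dictionary<string, string> ParsePairs(string text)
    {
        Dictionary<string, string> result = new Dictionary<string, string>();
        foreach (Match m in Regex.Matches(text, @"(?<key>\w+):(?<value>\w+)"))
        {
            // The string indexer looks up each group by the name in the pattern.
            result[m.Groups["key"].Value] = m.Groups["value"].Value;
        }
        return result;
    }

    static void Main()
    {
        foreach (KeyValuePair<string, string> pair in
            ParsePairs("Salary:123456 Grade:7"))
        {
            Console.WriteLine("{0} = {1}", pair.Key, pair.Value);
        }
    }
}
```

Looking groups up by name keeps the code readable if the pattern later gains or loses parentheses, which would renumber any integer indexes.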

String-Modifying Expressions

In addition to parsing strings to search for patterns by using methods such as Split, Match, and Matches, we can use methods in the Regex class for stripping out substrings, joining substrings, and generating modified strings. You can use Regex.Replace to perform common operations such as stripping leading and/or trailing whitespace, tokenizing or modifying pathnames, and splitting or joining lines of text. For example, to strip leading whitespace, we can initialize a Regex object with a regular expression that matches any number of whitespace characters at the beginning of a line (such as "^\s+") and then use Regex.Replace to replace all these characters with an empty string:

class RXmodifyingApp
{
    static void Main(string[] args)
    {
        string s = "     leading";
        string e = @"^\s+";
        Regex rx = new Regex(e);
        string r = rx.Replace(s,"");
        Console.WriteLine("Strip leading space: {0}", r);
    }
}

The output from this application is:

Strip leading space: leading

Table 10-7 breaks down the regular expression you just saw.

Table 10-7 Breakdown of Another Typical Regular Expression

Element   Description
@         "Escape" the \ in the pattern so that \s is treated as a single regular expression metacharacter.
^         At the beginning of the line...
\s        ...match any whitespace character (space, tab, and so on)...
+         ...and one or more of them.

The Regex class offers instance methods such as Split, Replace, and Match as well as static equivalents; therefore, you don't even have to instantiate a Regex object. This feature is particularly useful if you want to perform a series of different regular expression operations. Because a Regex object is immutable, you can't change its pattern once it's constructed, so each new pattern would otherwise need a new object, and it might be more convenient to use the static methods. The previous code can thus be rewritten like this:

//rx = new Regex(e);
//string r = rx.Replace(s,"");
string r = Regex.Replace(s, e, "");

By the same token—no pun intended—we can strip trailing spaces, modify pathnames, and convert date formats:

s = "trailing    ";
e = @"\s+$";
r = Regex.Replace(s, e, "");
Console.WriteLine("Strip trailing space: {0}",  r);
 
Console.WriteLine();
s = @"C:\Documents and Settings\user1\Desktop\";
r = Regex.Replace(s, @"\\user1\\", @"\user2\");
Console.WriteLine(
    "Modify path:\n\t{0}\n\t{1}", s, r);
 
Console.WriteLine();
s = @"c:\foo\bar\file.txt";
e = @"^.*\\";
r = Regex.Replace(s, e, "");
Console.WriteLine(
    "Strip path from filename: {0}", r);
 
Console.WriteLine();
s = "03/16/57";
e = 
    "(?<mm>\\d{1,2})/(?<dd>\\d{1,2})/(?<yy>\\d{2,4})";
string e2 = "${dd}-${mm}-${yy}";
r = Regex.Replace(s, e, e2);
Console.WriteLine(
    "Change date format from {0} to {1}", s, r);

The date-formatting regular expression just shown breaks down into three subpatterns, each with the same basic meaning, as shown in Table 10-8.

Table 10-8 Breakdown of a Date-Formatting Regular Expression

Element             Description
(?<mm>\\d{1,2})/    Capture the matched substring into a group named "mm", matching any decimal digit (of which there must be at least one and no more than two), followed by a forward slash.
${mm}-              Substitute the substring matched by (?<mm>) and follow it with a dash.

The output from these additional code blocks is:

Strip trailing space: trailing
 
Modify path:
        C:\Documents and Settings\user1\Desktop\
        C:\Documents and Settings\user2\Desktop\
 
Strip path from filename: file.txt
 
Change date format from 03/16/57 to 16-03-57

There's also a static version of Regex.Match, which can be used under similar circumstances. For example, to find the HREF link tags in some simple HTML:

Console.WriteLine();
s = @"<html>
    <a href=""first.htm"">first text</a>
    <br>loads of other stuff
    <a href=""second.htm"">second text</a>
    <p>more<a href=""third.htm"">third text</a>
    </html>";
e = @"<a[^>]*href\s*=\s*[""']?([^'"">]+)['""]?>";
MatchCollection mc = Regex.Matches(s, e); 
foreach (Match mm in mc)
    Console.WriteLine("HTML links: {0}", mm);

Table 10-9 shows how the regular expression you just saw breaks down.

Table 10-9 Breakdown of a Typical HTML Regular Expression

Element          Description
<a[^>]*href      Match the character substring "<a" followed by zero or more instances of any characters except the ">" character, followed by the string "href",
\s*=\s*          followed by zero or more instances of whitespace, followed by the "=" character, followed by zero or more instances of whitespace,
[""']?           followed by zero or one instance of single or double quotes,
([^'"">]+)       followed by one or more instances of any characters except the single or double quotes or the closing ">" (the parentheses capture this text),
['""]?>          followed by zero or one instance of either the single or double quotes, followed by the closing ">".

The output from this additional code is:

HTML links: <a href="first.htm">
HTML links: <a href="second.htm">
HTML links: <a href="third.htm">
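The pattern already wraps the link target in capturing parentheses, so the target itself is available as group 1 of each match. The following sketch reuses the chapter's pattern unchanged; the class name, the ExtractTargets helper, and the shortened sample HTML are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

static class HrefCaptureSketch
{
    // The pattern from the text; group 1 is the parenthesized link target.
    const string Pattern =
        @"<a[^>]*href\s*=\s*[""']?([^'"">]+)['""]?>";

    // Hypothetical helper: returns only the captured href values.
    public static string[] ExtractTargets(string html)
    {
        MatchCollection mc = Regex.Matches(html, Pattern);
        string[] targets = new string[mc.Count];
        for (int i = 0; i < mc.Count; i++)
        {
            // Groups[0] is the whole tag; Groups[1] is the first capture.
            targets[i] = mc[i].Groups[1].Value;
        }
        return targets;
    }

    static void Main()
    {
        string html =
            @"<a href=""first.htm"">first</a> <a href='second.htm'>second</a>";
        foreach (string target in ExtractTargets(html))
        {
            Console.WriteLine("Link target: {0}", target);
        }
    }
}
```

A pattern like this is adequate for simple, well-formed markup such as the sample; it makes no attempt to handle everything that real-world HTML can contain.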

Finally, remember how in the "Strings" section of the chapter we used a custom method to convert a string to proper case (initial caps on each word in the string)? Here's another version that achieves the same result by using regular expressions instead of string processing:

public class RXProperCaseApp
{
    static void Main(string[] args)
    {
        string s  = "the qUEEn wAs in HER parLOr";
        Console.WriteLine("Initial String:\t{0}", s);
 
        s = s.ToLower();
        string e = @"\w+|\W+";
        string sProper = "";
 
        foreach (Match m in Regex.Matches(s, e))
        {
            sProper += char.ToUpper(m.Value[0])
                + m.Value.Substring(1, m.Length - 1);
        }
        Console.WriteLine("ProperCase:\t{0}", sProper);
    }
}

This is the output:

Initial String: the qUEEn wAs in HER parLOr
ProperCase:     The Queen Was In Her Parlor
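A variation on the same idea, offered as a sketch and not as the authors' method, lets Regex.Replace do the looping. It uses the Regex.Replace overload that takes a MatchEvaluator delegate, which this chapter doesn't cover, and the pattern \b\w to pick out the first character of each word. The class and method names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

static class ProperCaseSketch
{
    // Hypothetical helper: lowercases the text, then uppercases the first
    // word character after each word boundary.
    public static string ToProperCase(string text)
    {
        return Regex.Replace(
            text.ToLower(),
            @"\b\w",
            delegate(Match m) { return m.Value.ToUpper(); });
    }

    static void Main()
    {
        string s = "the qUEEn wAs in HER parLOr";
        Console.WriteLine("Initial String:\t{0}", s);
        Console.WriteLine("ProperCase:\t{0}", ToProperCase(s));
    }
}
```

Because the delegate is called once per match and its return value replaces that match, no manual string concatenation is needed. Like the version above, this treats an apostrophe as a word break, so "don't" becomes "Don'T".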

Regular Expression Options

Suppose that in a string we want to match some alternative patterns in which only the letter case differs. For instance, suppose we want to find any instance of the word "in"—or "In" or "IN" or "iN". We could use this pattern:

class RXOptionsApp
{
    public static void PrintMatches(Regex r)
    {
        Console.WriteLine();
        string s = "The KING Was In His Counting House";
        MatchCollection mc = r.Matches(s);
        for (int i = 0; i < mc.Count; i++)
        {
            Console.WriteLine(
                "Found '{0}' at position {1}",
                mc[i].Value, mc[i].Index);
        }
    }
 
    static void Main(string[] args)
    {
        Regex r = new Regex("in|In|IN|iN");     // Same
        PrintMatches(r);
    }
}

Here's the output:

Found 'IN' at position 5
Found 'In' at position 13
Found 'in' at position 25

Alternatively, we could use an overloaded Regex constructor that takes a RegexOptions enumeration value as its second parameter. For example, to get the same results as we just saw, we could use the IgnoreCase option:

r = new Regex("in", RegexOptions.IgnoreCase);

Another potentially useful RegexOptions value is RightToLeft:

r = new Regex("in", 
    RegexOptions.IgnoreCase | 
    RegexOptions.RightToLeft);

Given the previous behavior, the output from this version should be obvious:

Found 'in' at position 25
Found 'In' at position 13
Found 'IN' at position 5
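To compare the three variants side by side, the sketch below runs each one over the same input and collects the match positions. The class name and the Indexes helper are hypothetical; the patterns, options, and input string are the ones used above.

```csharp
using System;
using System.Text.RegularExpressions;

static class OptionsSketch
{
    // Hypothetical helper: returns match positions in the order found.
    public static int[] Indexes(
        string text, string pattern, RegexOptions options)
    {
        Regex r = new Regex(pattern, options);
        MatchCollection mc = r.Matches(text);
        int[] result = new int[mc.Count];
        for (int i = 0; i < mc.Count; i++)
        {
            result[i] = mc[i].Index;
        }
        return result;
    }

    static void Print(string label, int[] indexes)
    {
        Console.Write(label);
        foreach (int index in indexes)
        {
            Console.Write(" {0}", index);
        }
        Console.WriteLine();
    }

    static void Main()
    {
        string s = "The KING Was In His Counting House";
        Print("Alternation:", Indexes(s, "in|In|IN|iN", RegexOptions.None));
        Print("IgnoreCase: ", Indexes(s, "in", RegexOptions.IgnoreCase));
        Print("RightToLeft:", Indexes(s, "in",
            RegexOptions.IgnoreCase | RegexOptions.RightToLeft));
    }
}
```

The first two calls report the same positions in the same order, and the RightToLeft call reports them in reverse, because the search starts at the end of the string.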

Another feature—which is very useful if you're building complex expressions—is the ability to embed comments into a pattern by using the # delimiter. Of course, this wouldn't be much use if the Regex object then included the comments as part of the pattern to be searched. Therefore, you can construct a Regex with the RegexOptions.IgnorePatternWhitespace option—this ignores both embedded comments and any whitespace that isn't explicitly escaped:

r = new Regex(
    @"in        # this is the first pattern to match
    |[aeiou]s    # or any vowel followed by 's'
    ", RegexOptions.IgnorePatternWhitespace);

The output follows:

Found 'as' at position 10
Found 'is' at position 17
Found 'in' at position 25
Found 'us' at position 31

Compiling Regular Expressions

One of the RegexOptions enumeration values is Compiled:

r = new Regex("in", RegexOptions.Compiled);

The default behavior of the Regex engine is to compile a regular expression to a sequence of internal instructions (not MSIL), which are interpreted upon execution. On the other hand, if you construct a Regex object with the RegexOptions.Compiled option, the engine compiles the regular expression to explicit MSIL. This option allows the .NET Framework's just-in-time compiler (JITter) to convert the expression to native machine code for higher performance.

For a complex expression that's used heavily, this conversion yields faster execution—of course, it also increases startup time. Also bear in mind that by using the Compiled option, you're effectively converting state data (which would be destroyed when the Regex object is garbage collected) into code (which is removed from memory only when the application terminates). So, choose when to use this option carefully.
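One way to judge whether the option pays off is to time both forms on your own patterns and data. The sketch below is a rough illustration only: the class name, the CountMatches helper, and the repetition count are hypothetical, it uses the Stopwatch class from System.Diagnostics, which this chapter doesn't cover, and the timings it prints will vary from machine to machine and run to run. A pattern as trivial as "in" may show little or no benefit.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

static class CompiledTimingSketch
{
    // Hypothetical helper: runs the same search repeatedly and totals the
    // number of matches, so that both variants do identical work.
    public static int CountMatches(Regex r, string text, int repetitions)
    {
        int total = 0;
        for (int i = 0; i < repetitions; i++)
        {
            total += r.Matches(text).Count;
        }
        return total;
    }

    static void Main()
    {
        string text = "The King Was in His Counting House";
        const int repetitions = 100000; // arbitrary figure for illustration

        // Construction is timed too, because Compiled shifts cost to startup.
        Stopwatch sw = Stopwatch.StartNew();
        Regex interpreted = new Regex("in");
        int a = CountMatches(interpreted, text, repetitions);
        long interpretedMs = sw.ElapsedMilliseconds;

        sw = Stopwatch.StartNew();
        Regex compiled = new Regex("in", RegexOptions.Compiled);
        int b = CountMatches(compiled, text, repetitions);
        long compiledMs = sw.ElapsedMilliseconds;

        Console.WriteLine("Interpreted: {0} matches in {1} ms", a, interpretedMs);
        Console.WriteLine("Compiled:    {0} matches in {1} ms", b, compiledMs);
    }
}
```

Both variants always find the same matches; only the cost profile differs, so a measurement like this is worth repeating with a realistic pattern and workload before committing to the option.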

A related feature is the ability to explicitly compile a regular expression to an assembly that's then persisted to disk by using the Regex.CompileToAssembly method. For example, suppose we have a lengthy regular expression such as one that parses an Internet Protocol (IP) address:

class RXassemblyApp
{
    static void Main(string[] args)
    {
        string s = "123.45.67.89";
        string e = 
            @"([01]?\d\d?|2[0-4]\d|25[0-5])\." +
            @"([01]?\d\d?|2[0-4]\d|25[0-5])\." +
            @"([01]?\d\d?|2[0-4]\d|25[0-5])\." +
            @"([01]?\d\d?|2[0-4]\d|25[0-5])";
        Match m = Regex.Match(s, e);
        Console.WriteLine("IP Address: {0}", m);
        for (int i = 1; i < m.Groups.Count; i++)
            Console.WriteLine(
                "\tGroup{0}={1}", i, m.Groups[i]);
    }
}

This regular expression breaks down into four identical groups. Table 10-10 shows how each of these four groups breaks down.

Table 10-10 Breakdown of an IP Address Regular Expression

Element          Description
@"([01]?\d\d?    An optional 0 or 1 followed by one or two digits,
|2[0-4]\d        or a 2 followed by any digit from 0 through 4, followed by any digit,
|25[0-5])        or the substring "25" followed by any digit from 0 through 5,
\.               followed by a "." character.

Here's the output:

IP Address: 123.45.67.89
        Group1=123
        Group2=45
        Group3=67
        Group4=89
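Note that the pattern as written finds an IP address anywhere inside a string. If the goal is instead to check that an entire string is a valid address, one possible approach, sketched here under assumed names (IpValidationSketch, IsIPv4), is to add the ^ and $ anchors from Table 10-5 and call the static Regex.IsMatch method, which isn't otherwise used in this chapter.

```csharp
using System;
using System.Text.RegularExpressions;

static class IpValidationSketch
{
    // One octet, exactly as in the chapter's pattern.
    const string Octet = @"([01]?\d\d?|2[0-4]\d|25[0-5])";

    // The chapter's pattern: four octets separated by literal dots.
    public const string Unanchored =
        Octet + @"\." + Octet + @"\." + Octet + @"\." + Octet;

    // The same pattern, required to span the whole input.
    public const string Anchored = "^" + Unanchored + "$";

    // Hypothetical helper: true only if the whole string is an address.
    public static bool IsIPv4(string s)
    {
        return Regex.IsMatch(s, Anchored);
    }

    static void Main()
    {
        string[] samples = { "123.45.67.89", "256.1.1.1" };
        foreach (string sample in samples)
        {
            Console.WriteLine(
                "{0}: unanchored={1}, anchored={2}",
                sample,
                Regex.IsMatch(sample, Unanchored),
                IsIPv4(sample));
        }
    }
}
```

Without the anchors, "256.1.1.1" still produces a match, because the engine can skip the leading "2" and match "56.1.1.1"; with the anchors, the string is rejected.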

We can explicitly compile this to a persistent assembly. First, set up an array of RegexCompilationInfo references. We need only one of these, but we have to have an array to pass to Regex.CompileToAssembly. Set up this one instance with the regular expression pattern, any RegexOptions flags, the name you want for the compiled Regex class, and any namespace you want to use for it. The final parameter is a Boolean value that indicates whether the regular expression should be public:

RegexCompilationInfo [] rci = 
    new RegexCompilationInfo[1];
rci[0] = new RegexCompilationInfo(
    e, RegexOptions.Compiled,
    "MyRegexAssembly", "MyNamespace", true);

Then set up an AssemblyName object. The assembly cache manager uses the object for binding and retrieving information about an assembly. We need to set only one property of this object: the filename for the assembly itself. The extension .dll is assumed and will be appended automatically. Finally, pass both the RegexCompilationInfo array and the AssemblyName reference to Regex.CompileToAssembly:

AssemblyName an = new AssemblyName();
an.Name = "MyAss";
Regex.CompileToAssembly(rci, an);

When you run this code, you'll find that a new file named MyAss.dll has been created in the same location as the target for this current assembly, which is normally ..\bin\debug. If we examine the metadata for this assembly, shown in Figure 10-3, we'll see that it contains three classes: MyRegexAssembly (via the third parameter to the RegexCompilationInfo constructor) derived from Regex, MyRegexAssemblyFactory derived from RegexRunnerFactory, and MyRegexAssemblyRunner derived from RegexRunner.


Figure 10-3 Metadata for compiled regular expression.

We could then use this customized derived MyRegexAssembly class in another project. In this example, I've added MyAss.dll as a reference to the new project:

using System;
using System.Text.RegularExpressions;
using MyNamespace;
 
    class TestAssemblyApp
    {
        static void Main(string[] args)
        {
            string s = "123.45.67.89";
            MyRegexAssembly r = new MyRegexAssembly();
            Match m = r.Match(s);
            Console.WriteLine("IP Address: {0}", m);
            for (int i = 1; i < m.Groups.Count; i++)
                Console.WriteLine(
                    "\tGroup{0}={1}", i, m.Groups[i]);
        }
    }

Summary

In this chapter, we examined two primary classes for processing strings, String and Regex, plus a range of ancillary classes that modify and support string operations. We explored the use of the String class methods for searching, sorting, splitting, joining, and otherwise returning modified strings. We also saw how many other classes in the .NET Framework support string processing—including Console, the basic numeric types, and DateTime—and how culture information and character encoding can affect string formatting. Finally, we saw how the system performs sneaky string interning to improve runtime efficiency.

In the second part of this chapter, we looked at Regex and its supporting classes—Match, Group, and Capture—for encapsulating regular expressions. We explored both pattern searching and string modifying through the set of Regex instance and static methods, and we examined the use of RegexOptions to modify the behavior of the operation. Finally, we saw how we can compile regular expressions to assemblies as a code management strategy.

Clearly, there's some overlap in functionality between strings and regular expressions. String-based code is probably simpler and easier to maintain, while Regex-based code will generally be much more flexible and powerful. In many situations, you'll find that a judicious mixture of both is the best approach.





Last Updated: April 8, 2002