Windows PowerShell: Writing Regular Expressions
Don Jones


192.168.4.5. \\Server57\Share. johnd@contoso.com. You no doubt recognize these three items as an IP address, a Universal Naming Convention (UNC) path, and an e-mail address. Your brain recognizes their formats. Four groupings of digits, backslashes, the @ symbol, and other cues indicate what types of data these strings of characters represent. With little thought, you can quickly recognize that 192.168 on its own isn't a valid IP address, that 7\\Server2\\Share isn't a valid UNC, and that joe@contoso isn't a valid e-mail address.
Computers, unfortunately, have to work a bit harder in order to "understand" complicated formats like these. That's where regular expressions come into play. A regular expression is a string, written using a special regular expression language, that helps a computer identify strings that are of a particular format, such as an IP address, a UNC, or an e-mail address. A well-written regular expression allows a Windows PowerShell script to accept data that conforms to the format you've specified and to reject data that does not.

Making a Simple Match
The Windows PowerShell -match operator compares a string to a regular expression, or regex, and then returns either True or False depending on whether the string matches the regex. A very simple regex doesn't even need to contain any special syntax; literal characters will suffice. For example:
"Microsoft" -match "soft"
"Software" -match "soft"
"Computers" -match "soft"
When run in Windows PowerShell, the first two expressions return True while the third returns False. In each, a string is followed by the -match operator, which is followed by a regex. By default, a regex will float across a string to find a match. The characters "soft" can be found within both Software and Microsoft, but at different positions. Also notice that the -match operator is case-insensitive by default: "soft" is found in "Software" despite its capital S.
But if necessary, a different operator, -cmatch, offers case-sensitive regex comparison, like so:
"Software" -cmatch "soft"
This expression returns False since the string "soft" doesn't match "Software" in a case-sensitive comparison. Note that the -imatch operator is also available as an explicit case-insensitive option, despite that being the default behavior of -match.
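As a minimal sketch of how the three operators compare, the following applies them to a hypothetical list of names (the sample data is made up for illustration). When the left side is an array rather than a single string, these operators return the elements that match instead of True or False.

```powershell
# Hypothetical sample data for illustration only
$names = "Microsoft", "Software", "Computers", "SOFTBALL"

# Case-insensitive (the default): every element containing "soft" in any case
$insensitive = $names -match "soft"

# Case-sensitive: only elements containing lowercase "soft"
$sensitive = $names -cmatch "soft"

# Explicitly case-insensitive; behaves the same as -match
$explicit = $names -imatch "soft"

$insensitive   # Microsoft, Software, SOFTBALL
$sensitive     # Microsoft
$explicit      # Microsoft, Software, SOFTBALL
```

Spelling out -imatch or -cmatch in a script can make the intended behavior obvious to the next person who reads it.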

Wildcards and Repeaters
A regex can contain a few wildcard characters. A period, for example, matches one instance of any character. A question mark is slightly different: rather than standing in for a character itself, it makes the character immediately before it optional, so that character can appear zero times or one time. Here are some examples to illustrate:
"Don" -match "D.n" (True)
"Dn" -match "D.n" (False)
"Don" -match "Do?n" (True)
"Dn" -match "Do?n" (True)
In the first expression, the period stands in for exactly one character, so the match is True. In the second expression, the period doesn't find the one character that it requires, and so the match is False. In the third and fourth expressions, the question mark makes the "o" optional. The third match is True because "D", "o", and "n" appear in sequence; the fourth is True because "D" and "n" are found with nothing between them. To allow any single character, or none, in that position, combine the two symbols, as in "D.?n".
A regex also recognizes the * and + symbols as repeaters. These need to follow some character. The * matches zero or more of the preceding character, while the + matches one or more of the preceding character. Here are some examples:
"DoDon" -match "Do*n" (True)
"Dn" -match "Do*n" (True)
"DoDon" -match "Do+n" (True)
"Dn" -match "Do+n" (False)
Notice that * and + apply only to the character immediately before them, in this case the "o", and not to "Do" as a whole. In "DoDon", both patterns float past the first "Do" and match the trailing "Don". To repeat a series of characters, enclose the series in parentheses: "(Do)+n" matches all of "DoDon".
What if you need to match the period, *, ?, or + symbols themselves? You simply precede them with a backslash, which is the regex escape character:
"D.n" -match "D\.n" (True)
Notice that this is different from the Windows PowerShell escape character (the backward apostrophe), but it follows industry-standard regex syntax.
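When the text to match comes from a variable, escaping each symbol by hand is error-prone. One possible approach, sketched here, is the .NET [regex]::Escape method, which adds the backslashes for you. The $literal value below is a made-up example, not something from a real script.

```powershell
# Hypothetical literal text that happens to contain regex symbols
$literal = "1+1=2?"

# [regex]::Escape puts a backslash in front of each special character
$pattern = [regex]::Escape($literal)
$pattern                           # 1\+1=2\?

"Is 1+1=2? true" -match $pattern   # True: the symbols are matched literally
"Is 11=2 true"   -match $literal   # True, but only because + and ? act as repeaters
```

The last line shows why escaping matters: used unescaped, the same text accepts a string that doesn't contain it at all.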

Character Classes
A character class is a broader form of wildcard, representing an entire group of characters. Windows PowerShell recognizes quite a few character classes. For instance:
  • \w matches any word character, meaning letters, numbers, and the underscore.
  • \s matches any white space character, such as tabs, spaces, and so forth.
  • \d matches any digit character.
There are also negative character classes: \W matches any non-word character, \S matches non-white space characters, and \D matches non-digits. These classes can be followed by * or + to indicate that multiple matches are acceptable. Here are some examples:
"Shell" -match "\w" (True)
"Shell" -match "\w*" (True)
Cmdlet of the Month
The Write-Debug cmdlet is very handy for writing objects (such as text strings) to the Debug pipeline. Trying this cmdlet in the shell can be somewhat disappointing, though, because it doesn't look like the cmdlet is doing anything.
The trick is that the Debug pipeline is shut off by default: the $DebugPreference variable is set to "SilentlyContinue." Set it to "Continue," however, and everything you send with Write-Debug will appear at the console in yellow text. This is a perfect way to add trace code to your scripts, allowing you to follow the execution of a complex script. The yellow color helps you distinguish between trace output and the script's normal output, and you can shut off the debug messages at any time without having to remove all the Write-Debug statements. Simply set $DebugPreference = "SilentlyContinue" again and the debug text will be suppressed.
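As a rough sketch of that workflow, the hypothetical Get-Total function below (not a built-in cmdlet) emits a trace message on every pass through its loop. The messages appear only while $DebugPreference is set to "Continue".

```powershell
# Hypothetical function used only to demonstrate trace messages
function Get-Total {
    param([int[]]$Numbers)
    $total = 0
    foreach ($n in $Numbers) {
        # Sent to the Debug pipeline; hidden unless $DebugPreference allows it
        Write-Debug "Adding $n (running total: $total)"
        $total += $n
    }
    $total
}

$DebugPreference = "SilentlyContinue"   # the default: trace messages are suppressed
$quiet = Get-Total -Numbers 1, 2, 3

$DebugPreference = "Continue"           # trace messages now appear at the console
$traced = Get-Total -Numbers 1, 2, 3

$DebugPreference = "SilentlyContinue"   # switch tracing back off
```

Both calls return the same value; only the amount of trace output at the console differs.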

Though "Shell" -match "\w" and "Shell" -match "\w*" both return True, they're matching significantly different things. Fortunately, there's a way to see what the -match operator is thinking under the hood: each time a match is made, a special variable called $matches is populated with the results of the match, that is, whatever characters in the string the operator matched against your regex. The $matches variable retains its results until another positive match is made using the -match operator. Figure 1 shows the difference between the two expressions. As you can see, \w matched the "S" in "Shell", while the repeating \w* matched the entire word.
Figure 1 What a difference a * can make
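The same comparison can be reproduced at the console. This is a minimal sketch; the variable names $single and $repeated are illustrative, and $matches[0] holds the text matched by the regex as a whole.

```powershell
"Shell" -match "\w"  | Out-Null   # discard the True/False result
$single = $matches[0]             # "S": a single word character

"Shell" -match "\w*" | Out-Null
$repeated = $matches[0]           # "Shell": as many word characters as possible

$single
$repeated
```

Checking $matches this way is a quick test of whether a regex is matching what you intended, and not merely returning True.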

Character Groups, Ranges, and Sizes
A regex can also contain groups or ranges of characters, enclosed in square brackets. For example, [aeiou] means that any one of the included characters (a, e, i, o, or u) is an acceptable match. [a-zA-Z] indicates that any letter in the range a-z or A-Z is acceptable (although if you're using the case-insensitive -match operator, just a-z or A-Z on its own would be sufficient). Here are two examples:
"Jeff" -match "J[aeiou]ff" (True)
"Jeeeeeeeeeeff" -match "J[aeiou]ff" (False)
You can also specify a minimum and maximum number of repetitions using curly braces. {3} indicates that you want exactly three of the preceding character, {3,} means that you want three or more, and {3,4} indicates that you want at least three but no more than four. This is an ideal way to create a regex for IP addresses:
"192.168.15.20" -match "\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}" (True)
This regex wants four groups of digits with one to three digits each, all of which are separated by a literal period. But consider this example:
"300.168.15.20" -match "\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}" (True)
This shows the limitations of a regex. While the formatting of this string looks like an IP address, it obviously isn't a valid IP address. A regex can't determine whether data is actually valid; it can only determine whether the data looks right.
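Because the regex only checks the shape of the string, a script usually needs a second step to check the values. One way to do that, sketched below with a hypothetical Test-IPv4Address function (not a built-in cmdlet), is to confirm the format first and then verify that each group is no larger than 255. The ^ and $ anchors, covered later in this column, keep the pattern from floating.

```powershell
# Hypothetical helper: format check with a regex, then a numeric range check
function Test-IPv4Address {
    param([string]$Address)

    # Step 1: the string must look like four dot-separated groups of 1-3 digits
    if (-not ($Address -match '^\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}$')) {
        return $false
    }

    # Step 2: every group must fall in the range 0-255
    foreach ($octet in $Address.Split(".")) {
        if ([int]$octet -gt 255) { return $false }
    }
    return $true
}

Test-IPv4Address "192.168.15.20"   # True
Test-IPv4Address "300.168.15.20"   # False: looks right, but 300 is out of range
Test-IPv4Address "192.168"         # False: wrong format
```

The regex does the work it is good at, recognizing the format, and ordinary comparison handles the part it cannot.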

Stop the Float
Troubleshooting a regex can be tricky. For example, here's a regex that tests for a UNC path in the format \\Server2\Share:
"\\Server2\Share" -match "\\\\\w+\\\w+" (True)
Here, the regex itself is difficult to read because every literal backslash I want to test for has to be escaped with a second backslash. Though this seems to work fine, it really doesn't:
"57\\Server2\Share" -match "\\\\\w+\\\w+" (True)
This second example is clearly (to you and me, at least) not a valid UNC path, but the regex gave it the all clear. Why? Remember that a regex will float by default. This regex merely looks for two backslashes, one or more letters and numbers, another backslash, and more letters and numbers. That pattern exists in the string—along with the extra digits at the start, which make it an invalid UNC. The trick is to tell the regex to start matching at the beginning of the string, without floating. I can do that like this:
"57\\Server2\Share" -match "^\\\\\w+\\\w+" (False)
The ^ character indicates that this is the location where the string begins. With that addition, the invalid UNC path fails because the regex is looking for the first two characters to be backslashes, and in this case they aren't.
Similarly, the $ symbol can be used to indicate the end of a string. This wouldn't be very useful in the case of a UNC path since a UNC path can contain additional path segments, such as \\Server2\Share\Folder\File, for example. However, I'm sure there are many cases where you would want to specify the end of a string.
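For a case where the end anchor does matter, consider a field that must hold only a server and share name, with nothing after it. This is a minimal sketch with made-up paths; single quotes are used so that Windows PowerShell passes the $ through to the regex untouched.

```powershell
# Pattern for exactly \\server\share, anchored at both ends
$pattern = '^\\\\\w+\\\w+$'

$shareOnly  = '\\Server2\Share'             -match $pattern   # True
$withFolder = '\\Server2\Share\Folder\File' -match $pattern   # False: extra segments follow the share
$withPrefix = '57\\Server2\Share'           -match $pattern   # False: characters precede the backslashes

$shareOnly
$withFolder
$withPrefix
```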

Help with Regular Expressions
In Windows PowerShell, the about_regular_expressions help topic provides basic syntax assistance for the regex language, but online resources can provide even more information. For instance, one of my favorite Web sites, www.RegExLib.com, offers a free library of regular expressions that have been written for various purposes and contributed to by the public. You can search on keywords, such as "e-mail" or "UNC," to quickly locate a regex that suits your need—or at least provides a good starting point. If you manage to create a great regex, you can contribute it to the library so others can make use of it, too.
I also like RegexBuddy (www.RegexBuddy.com). This is an inexpensive tool that provides a graphical regex editor. RegexBuddy makes it easier to assemble a complex regex and to test it to ensure that it properly accepts valid strings and rejects invalid ones. A number of other software developers have also created free, shareware, and commercial regex editors and testers that you may find useful.

Using Regular Expressions
You may be wondering why you would use a regex in real life. Imagine you're reading information from a CSV file and using the information to create new users in Active Directory®. If the CSV file is generated by someone else, you'll want to validate that the data in it looks right. A regex is perfect for this task. A simple anchored regex like ^\w+$, for example, can confirm that first and last names don't contain any special characters or symbols (without the anchors, \w+ would float and accept any string that contains at least one word character), while something more complicated can confirm that the e-mail addresses conform to your corporate standard. For example, you could use this:
"^[a-z]+\.[a-z]+@contoso\.com$"
This regex requires an e-mail address in the form don.jones@contoso.com, where the first and last names can only contain letters, and where they must be separated by a period. E-mail addresses, by the way, are the trickiest strings to write a regex for. If you can narrow your scope down to a specific corporate standard, you'll have an easier time of it.
Don't forget those start and end anchors (^ and $), which ensure that nothing follows contoso.com and also that nothing precedes the characters that make up the user's first name.
Actually using this regex in Windows PowerShell is pretty easy. Assuming the variable $email contains the e-mail address you read from the CSV file, something like this will check to see whether it's valid or not:
# The period in contoso.com is escaped so that it matches only a literal period
$regex = "^[a-z]+\.[a-z]+@contoso\.com$"
if ($email -notmatch $regex) {
  Write-Error "Invalid e-mail address $email"
}
This example also introduces a new operator: -notmatch returns True if the string doesn't match the provided regex. (There is also a -cnotmatch operator for case-sensitive comparisons.)
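Putting the pieces together, here is a rough sketch of the CSV scenario described above. The column names (FirstName, LastName, Email) and the sample rows are assumptions for illustration; a real script would read the file with Import-Csv in place of the inline data used here. The period in contoso.com is escaped so that it matches only a literal period.

```powershell
# Hypothetical CSV content; in practice, use Import-Csv with your own file
$csv = @"
FirstName,LastName,Email
Don,Jones,don.jones@contoso.com
Jane,Doe,jane_doe@contoso.com
"@

$nameRegex  = '^\w+$'
$emailRegex = '^[a-z]+\.[a-z]+@contoso\.com$'

# Collect every row in which any field fails its format check
$invalid = @()
foreach ($row in ($csv | ConvertFrom-Csv)) {
    if ($row.FirstName -notmatch $nameRegex -or
        $row.LastName  -notmatch $nameRegex -or
        $row.Email     -notmatch $emailRegex) {
        $invalid += $row
    }
}

$invalid | ForEach-Object { "Invalid row: $($_.Email)" }
```

Collecting the bad rows, as opposed to stopping at the first one, lets you report every problem in the file in a single pass.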
There's a lot more about regular expressions I haven't covered here—additional character classes, more advanced operations, and even an operator or two. And then there's the [regex] object type that Windows PowerShell supports. However, what I have covered in this quick overview of regex syntax should be enough to get you started. Feel free to visit me anytime at www.ScriptingAnswers.com if you need help puzzling through an especially tricky regex.

Don Jones is a contributing editor for TechNet Magazine and is the coauthor of Windows PowerShell: TFM (SAPIEN Press). He teaches Windows PowerShell (www.ScriptingTraining.com) and can be reached through the ScriptingAnswers.com Web site.
© 2008 Microsoft Corporation and CMP Media, LLC. All rights reserved; reproduction in part or in whole without permission is prohibited.