Windows PowerShell in Action: Working With Text and Files in Windows PowerShell (Part 3)

XML Processing in PowerShell



PowerShell is Microsoft's next-generation command line and scripting solution. It combines the interactive capabilities of traditional shells such as bash or zsh with the programmability of scripting languages such as Perl or Ruby. Because PowerShell is based on .NET, it's capable of doing things in a shell environment that were previously only possible in languages such as Visual Basic, VBScript, or C#.

As with any scripting language, one of the most important domains for PowerShell is the ability to work with strings and files (both text and binary). This series of articles is based on chapter 10 of Windows PowerShell in Action from Manning Publications. Chapter 10 examines how PowerShell handles text and file processing tasks, illustrating how to process and parse text using string objects and regular expressions. It also shows how to deal with paths and how to manipulate binary files. Another significant area covered in this chapter is how to work with XML. XML has become increasingly important both in the IT field and in software development. We show how to search, manipulate, and create XML documents using PowerShell.

Part 1 of this series looked at how to process, search, and manipulate strings and unstructured text using the .NET string object and regular expressions. In Part 2 author Bruce Payette, a technical lead on the Windows PowerShell team, focused on file processing.

Part 3 of this series looks at working with XML in PowerShell. It covers PowerShell’s XML object adapter, how to work with the .NET XmlDocument and XmlReader classes, and, finally, how to navigate through a document. It also looks at the commands for saving data to and retrieving data from XML files.

On This Page
Part 3: XML Processing in PowerShell
Using XML as Objects
Adding Elements to an XML Object
Loading and Saving XML Files
Using XML in a Pipeline
Using XPath
The Import-Clixml and Export-Clixml Cmdlets

Part 3: XML Processing in PowerShell

XML (eXtensible Markup Language) is becoming more and more important in the computing world. XML is being used for everything from configuration files to log files to databases. PowerShell itself uses XML for its type and configuration files as well as for the help files. Clearly, for PowerShell to be effective, it has to be able to deal with XML documents. Let's take a look at how XML is used and supported in PowerShell.

Note. This article assumes some basic knowledge of XML markup.

We'll look at the XML object type as well as the mechanism that .NET provides for searching XML documents.


Using XML as Objects

PowerShell supports XML documents as a primitive data type. This means that you can access the elements of an XML document as though they were properties on an object. For example, let's create a simple XML object. We'll start with a string that defines a top-level node called "top". This node contains three descendants "a", "b", and "c" each of which has a value. Let's turn this string into an object:

PS (1) > $d = [xml] "<top><a>one</a><b>two</b><c>3</c></top>"

The [xml] cast takes the string and converts it into an XML object of type System.Xml.XmlDocument. This object is then adapted by PowerShell so you can treat it like a regular object. Let's try this out. First we'll just display the object:

PS (2) > $d

top
---
top

As we expect, the object displays one top-level property corresponding to the top-level node in the document. Now let's see what properties this node contains:

PS (3) > $d.a
PS (4) > $d.top

a                    b                    c
-                    -                    -
one                  two                  3

We see three properties that correspond to the descendants of top. We can use conventional property notation to look at the value of an individual member:

PS (5) > $d.top.a
one

However, we can't assign an arbitrary value to this node. As the error message below shows, only strings can be used as values for XmlNode properties, so assigning an integer fails.

PS (6) > $d.top.a = 13
Cannot set "a" because only strings can be used as values to set
 XmlNode properties.
At line:1 char:8
+ $d.top.a <<<< = 13
PS (7) > $d.top.c
3

All of the normal type conversions apply, of course. The node c contains a string value that is a number.

PS (8) > $d.top.c.gettype().FullName
System.String

We can add this field to an integer, which will cause it to be converted into an integer.

PS (9) > 2 + $d.top.c
5

Now, since plain assignment can't add new elements to an XML document, we'll dig a little deeper into the [xml] object and see how we can add elements.


Adding Elements to an XML Object

Let's add an element "d" to this document. To do this, we need to use the methods on the XML document object. First we have to create the new element:

PS (10) > $elem = $d.CreateElement("d")

In text, what we've created looks like "<d></d>". The tags are there, but it's empty. Let's set the element text (the "inner text") and then append the new element to the top node.

PS (11) > $elem.set_InnerText("Hello"); $d.top.AppendChild($elem)

#text
-----
Hello

Notice that we're using the property setter method here. This is because the XML adapter hides the basic properties on the XmlNode object. The other way to set this would be to use the PSBase member like we did with the hashtable example earlier in this chapter.

PS (12) > $ne = $d.CreateElement("e")
PS (13) > $ne.psbase.InnerText = "World"
PS (14) > $d.top.AppendChild($ne)

#text
-----
World

Now let's look at the revised object.

PS (15) > $d.top

a : one
b : two
c : 3
d : Hello
e : World

We see that the document now has five members instead of the original three. But what does the string look like now? It would be nice if we could simply cast the document back to a string and see what it looks like:

PS (16) > [string] $d
System.Xml.XmlDocument

Unfortunately, as you can see, it isn't as simple as this. Instead, we'll save the document as a file and then display it:

PS (17) > $d.save("c:\temp\new.xml")
PS (18) > type c:\temp\new.xml
<top>
  <a>one</a>
  <b>two</b>
  <c>3</c>
  <d>Hello</d>
  <e>World</e>
</top>
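
If all we want is to look at the markup, a file isn't strictly required. The following is a minimal sketch, assuming a PowerShell version in which the adapter exposes the .NET OuterXml property directly (on older versions, going through psbase as shown above should be equivalent). The variable names are ours and are only for illustration.

```powershell
# Build a small document like the one used in the text.
$doc = [xml] "<top><a>one</a><b>two</b><c>3</c></top>"

# OuterXml returns the markup as one unindented string.
$flat = $doc.OuterXml

# For indented text without a temporary file, Save() can also
# write to a TextWriter such as a StringWriter.
$writer = New-Object System.IO.StringWriter
$doc.Save($writer)
$pretty = $writer.ToString()

$flat
$pretty
```

OuterXml is handy for a quick check at the prompt, while saving through a writer gives the same indented layout that we saw in the file.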

The result is a nicely readable text file. Now that we know how to add children to a node, how can we add attributes? The pattern is basically the same as with elements. First we create an attribute object.

PS (19) > $attr = $d.CreateAttribute("BuiltBy")

Next we set the value of the text for that object. Again we use the psbase member to bypass the adaptation layer.

PS (20) > $attr.psbase.Value = "Windows PowerShell"

And finally we add it to the top-level element of the document:

PS (21) > $d.psbase.DocumentElement.SetAttributeNode($attr)

#text
-----
Windows PowerShell

Let's look at the top node once again.

PS (22) > $d.top

BuiltBy : Windows PowerShell
a       : one
b       : two
c       : 3
d       : Hello
e       : World

Now we see that the attribute has been added. Let's save the document:

PS (23) > $d.save("c:\temp\new.xml")

Then retrieve the file. You can see how the attribute has been added to the top node in the document.

PS (24) > type c:\temp\new.xml
<top BuiltBy="Windows PowerShell">
  <a>one</a>
  <b>two</b>
  <c>3</c>
  <d>Hello</d>
  <e>World</e>
</top>
PS (25) >
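
The create, set, and append steps above can be wrapped up in a small helper. This is only a sketch: Add-XmlChild is a hypothetical function name of our own, not something PowerShell provides, and it assumes a PowerShell version where .NET properties such as OwnerDocument and DocumentElement are reachable without psbase.

```powershell
# Hypothetical helper (not part of PowerShell): create a child element
# with some text and append it to a parent node in one step.
function Add-XmlChild ($parent, [string] $name, [string] $text)
{
    $child = $parent.OwnerDocument.CreateElement($name)
    $child.set_InnerText($text)          # setter method, as in the text
    [void] $parent.AppendChild($child)   # discard the node AppendChild returns
    $child
}

$doc = [xml] "<top><a>one</a></top>"
$null = Add-XmlChild $doc.top "d" "Hello"

# SetAttribute() creates the attribute and sets its value in one call.
$doc.DocumentElement.SetAttribute("BuiltBy", "Windows PowerShell")
$doc.OuterXml
```

Casting the AppendChild() result to [void] keeps the function from emitting the appended node twice, since anything a PowerShell function doesn't capture becomes output.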

We constructed, edited, and saved XML documents, but we haven't loaded an existing document yet, so let's do that now.


Loading and Saving XML Files

At the end of the previous section, we saved an XML document to a file. Now let's read it back:

PS (1) > $nd = [xml] [string]::join("`n",
>> (gc -read 10kb c:\temp\new.xml))
>>

Here's what we're doing. We use the Get-Content cmdlet to read the file; however, it comes back as a collection of strings when what we really want is one single string. To merge the strings into one, we use the [string]::Join() method. Once we have the single string, we cast the whole thing into an XML document.

Performance Tip: By default, Get-Content reads one record at a time. This can be quite slow. When processing large files, you can use the -ReadCount parameter to specify larger sizes. This will cause blocks of records to be written to the pipeline instead of writing one record at a time.
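
Another way to get a document from a file is to let the XmlDocument class parse it directly with its Load() method, which avoids building the intermediate string. Here is a minimal sketch; the temporary file name and its contents are made up so that the example is self-contained.

```powershell
# Write a small sample file (the name and contents are illustrative only).
$path = Join-Path ([System.IO.Path]::GetTempPath()) "load-sketch.xml"
'<top BuiltBy="Windows PowerShell"><a>one</a><b>two</b></top>' | Set-Content $path

# Load() parses the file itself, so no joined string is needed.
$loaded = New-Object System.Xml.XmlDocument
$loaded.Load($path)

$loaded.top.BuiltBy
$loaded.top.a

Remove-Item $path
```

Note that Load() is a .NET method, so it is safest to hand it a full path (for example one produced by Resolve-Path) rather than a path relative to the current PowerShell location. The whole document still ends up in memory as a DOM.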

Let's verify that the document was read properly by dumping out the top-level node and then the child nodes.

PS (2) > $nd

top
---
top

PS (3) > $nd.top

BuiltBy : Windows PowerShell
a       : one
b       : two
c       : 3
d       : Hello
e       : World

All is as it should be; even the attribute is there.

While this is a simple approach and the one we'll use most often, it's not necessarily the most efficient one, because it requires loading the entire document into memory. For large documents, or collections of many documents, this may become a problem. In the next section, we'll look at some alternative approaches that, while more complex, are more memory efficient.

Example: The Dump-Doc Function

The previous method we looked at for loading an XML file is simple but not very efficient. It requires that you load the file into memory, make a copy of the file while turning it into a single string, and then create an XML document representing the entire file but with all of the overhead of the XML DOM format. A more space-efficient way to process XML documents is to use the XML reader class. This class streams through the document one element at a time instead of loading the whole thing into memory. We're going to write a function that will use the XML reader to stream through a document and output it properly indented. An XML pretty-printer if you will. Here's what we want the output of this function to look like when it dumps its built-in default document:

PS (1) > dump-doc
<top BuiltBy = "Windows PowerShell">
    <a>
       one
    </a>
    <b>
       two
    </b>
    <c>
       3
    </c>
    <d>
       Hello
    </d>
    <e>
       World
    </e>
</top>

Now let's test our function on a more complex document where there is more nesting and more attributes:

@'
<top BuiltBy = "Windows PowerShell">
    <a pronounced="eh">
       one
    </a>
    <b pronounced="bee">
        two
    </b>
    <c one="1" two="2" three="3">
       <one>
           1
       </one>
       <two>
           2
       </two>
       <three>
           3
       </three>
    </c>
    <d>
       Hello there world
    </d>
</top>
'@ > c:\temp\fancy.xml

When we run the function, we see

PS (2) > dump-doc c:\temp\fancy.xml
<top BuiltBy = "Windows PowerShell">
    <a pronounced = "eh">
       one
    </a>
    <b pronounced = "bee">
       two
    </b>
    <c one = "1"two = "2"three = "3">
       <one>
           1
       </one>
       <two>
           2
       </two>
       <three>
           3
       </three>
    </c>
    <d>
       Hello there world
    </d>
</top>

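This is pretty close to the original document. The most visible difference is that the function doesn't put a space between consecutive attributes. The code for the Dump-Doc function is shown below: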

function Dump-Doc ($doc="c:\temp\new.xml")
{
    $settings = new-object System.Xml.XmlReaderSettings
    # The reader needs an absolute path, so resolve it first.
    $doc = (resolve-path $doc).ProviderPath
    $reader = [xml.xmlreader]::create($doc, $settings)
    $indent=0
    # Local helper: prefix a string with the current indentation.
    function indent ($s) { " "*$indent+$s }
    while ($reader.Read())
    {
        if ($reader.NodeType -eq [Xml.XmlNodeType]::Element)
        {
            # Decide how the tag closes before moving onto the attributes.
            $close = $(if ($reader.IsEmptyElement) { "/>" } else { ">" })
            if ($reader.HasAttributes)
            {
                $s = indent "<$($reader.Name) "
                [void] $reader.MoveToFirstAttribute()
                do
                {
                    $s += "$($reader.Name) = `"$($reader.Value)`""
                } while ($reader.MoveToNextAttribute())
                "$s$close"
            }
            else
            {
                indent "<$($reader.Name)$close"
            }
            # Only a non-empty element opens a new nesting level.
            if ($close -ne '/>') {$indent++}
        }
        elseif ($reader.NodeType -eq [Xml.XmlNodeType]::EndElement )
        {
            $indent--
            indent "</$($reader.Name)>"
        }
        elseif ($reader.NodeType -eq [Xml.XmlNodeType]::Text)
        {
            indent $reader.Value
        }
    }
    # Release the underlying file handle.
    $reader.close()
}

This is a complex function, so let's go through it one piece at a time. We start with the basic function declaration, where it takes an optional argument that names a file. Next we'll create the settings object that we need to pass in when we create the XML reader object. We also need to resolve the path to the document, because the XML reader object requires an absolute path. Now we can create the XmlReader object itself. The XML reader will stream through the document, reading only as much as it needs, as opposed to reading the entire document into memory.

We want to indent the levels of the document so we'll initialize an indent level counter and a local function to display the indented string. Now we'll read through all of the nodes in the document. We'll choose different behavior based on the type of the node. An element node is the beginning of an XML element. If the element has attributes then we'll add them to the string to display. We'll use the MoveToFirstAttribute()/MoveToNextAttribute() methods to move through the attributes. If there were no attributes, just display the element name. At each new element, increase the indent level if it's not an empty element tag. If it's the end of an element, decrease the indent level and display the closing tag. If it's a text element, just display the value of the element. Finally close the reader. We always want to close a handle received from a .NET method. It will eventually be discarded during garbage collection, but it's possible to run out of handles before you run out of memory.

This example illustrates the basic techniques for using an XML reader object to walk through an arbitrary document. In the next section, we'll look at a more specialized application.
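
As a variation on the same technique, the reader can also be created over a string, and its Depth property can stand in for a hand-maintained indent counter. The following is a sketch under those assumptions, with a made-up input string; it also assumes a PowerShell version that supports try/finally.

```powershell
# Sample input (illustrative only).
$text = "<top><a>one</a><b><c>two</c></b></top>"

# XmlReader.Create() also accepts a TextReader, so a string can be
# streamed without first writing it to a file.
$reader = [System.Xml.XmlReader]::Create((New-Object System.IO.StringReader $text))
$outline = @()
try
{
    while ($reader.Read())
    {
        if ($reader.NodeType -eq [System.Xml.XmlNodeType]::Element)
        {
            # Depth is 0 for the root element and grows with nesting.
            $outline += ("  " * $reader.Depth) + $reader.Name
        }
    }
}
finally
{
    # Close the reader even if Read() throws on malformed input.
    $reader.Close()
}
$outline
```

Putting the Close() call in a finally block means the reader is released even when the document turns out not to be well formed.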

Example: The Select-Help Function

Now let's work with something more useful. The PowerShell help files are stored as XML documents. We want to write a function that scans through the command file, searching for a particular word in either the command name or the short help description. Here's what we want the output to look like:

PS (1) > select-help property
Clear-ItemProperty: Removes the property value from a property.
Copy-ItemProperty: Copies a property between locations or namespaces.
Get-ItemProperty: Retrieves the properties of an object.
Move-ItemProperty: Moves a property from one location to another.
New-ItemProperty: Sets a new property of an item at a location.
Remove-ItemProperty: Removes a property and its value from the location.
Rename-ItemProperty: Renames a property of an item.
Set-ItemProperty: Sets a property at the specified location to a
specified value.
PS (2) >

In the example, we're searching for the word property and we get a list of all of the cmdlets that work with properties. The output is a string that contains the cmdlet name and a description string. Next let's look at a fragment of the document we're going to process:

<command:details>
    <command:name>
        Add-Content
    </command:name>
    <maml:description>
        <maml:para>
            Adds to the content(s) of the specified item(s)
        </maml:para>
    </maml:description>
    <maml:copyright>
        <maml:para></maml:para>
    </maml:copyright>
    <command:verb>add</command:verb>
    <command:noun>content</command:noun>
    <dev:version></dev:version>
</command:details>

PowerShell help text is stored in MAML (Microsoft Assistance Markup Language) format. From simple examination of this fragment, we can see that the name of a command is stored in the command:name element and the description is stored in a maml:para element nested inside a maml:description element. The basic approach we'll use is to look for the command tag, extract and save the command name, and then capture the description in the description element that immediately follows the command name element. This means that we'll use a state-machine pattern to process the document. A state machine usually implies using the switch statement, so this example is also a good opportunity to use the control structures in the PowerShell language a bit more. Now let's look at the function:

function Select-Help ($pat = ".")
{
    # Locate the help file for the core management cmdlets.
    $cmdHelp = "Microsoft.PowerShell.Commands.Management.dll-Help.xml"
    $doc = "$PSHOME\$cmdHelp"
    $settings = new-object System.Xml.XmlReaderSettings
    $settings.ProhibitDTD = $false
    $reader = [xml.xmlreader]::create($doc, $settings)

    # State variables for the state machine.
    $name = $null
    $capture_name = $false
    $capture_description = $false
    $finish_line = $false

    while ($reader.Read())
    {
        switch ($reader.NodeType)
        {
            ([Xml.XmlNodeType]::Element) {
                switch ($reader.Name)
                {
                    "command:name" {
                        $capture_name = $true
                        break
                    }
                    "maml:description" {
                        $capture_description = $true
                        break
                    }
                    "maml:para" {
                        if ($capture_description)
                        {
                            $finish_line = $true
                        }
                    }
                }
                break
            }
            ([Xml.XmlNodeType]::EndElement) {
                if ($capture_name) { $capture_name = $false }
                # Leave the description state once its paragraph has ended.
                if ($finish_line)
                {
                    $finish_line = $false
                    $capture_description = $false
                }
                break
            }
            ([Xml.XmlNodeType]::Text) {
                if ($capture_name)
                {
                    $name = $reader.Value.Trim()
                }
                elseif ($finish_line -and $name)
                {
                    $msg = $name + ": " + $reader.Value.Trim()
                    if ($msg -match $pat)
                    {
                        $msg
                    }
                    $name = $null
                }
                break
            }
        }
    }
    $reader.close()
}

Once again, this is a long piece of code, so let's work through it a piece at a time. The $pat parameter will contain the pattern to search for. If no argument is supplied, the default argument will match everything. Next we set up the name of the document to search in the PowerShell installation directory. Then we create the XmlReader object as in the previous examples.

Since we're using a state machine, we need to set up some state variables. The $name variable will be used to hold the name of the cmdlet, and the others will hold the state of the processing. We'll read through the document, one node at a time, and switch on the node type. Unrecognized node types are ignored.

First we'll process the Element nodes. We'll use a nested switch statement to perform different actions based on the type of element. Finding a "command:name" element starts the matching process. When we see a "maml:description" element, we want to capture the beginning of the MAML description field, so we indicate that we want to capture the description. When we see the "maml:para" element, we need to handle the embedded paragraph in the description element. In the end tag of an element, we'll reset some of the state variables if they've been set. And finally, we need to extract the information we're interested in out of the Text nodes. We've captured the cmdlet name out of the element, but we want to remove any leading and trailing spaces, so we'll use the [string] Trim() method. Now we have both the cmdlet name and the description string. If it matches the pattern the caller specified, output it. Again, the last thing to do is to close the XML reader so we don't waste resources.
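
To see the state-machine idea in isolation, here is a reduced sketch that runs against an in-memory fragment instead of the real help file. It is not the function above: it uses a single $state variable rather than several flags, and the namespace URIs in the sample are placeholders that we made up so that the prefixes are declared.

```powershell
# Made-up sample in the shape of the MAML fragment shown earlier.
# The urn:example:* namespace URIs are placeholders, not the real ones.
$sample = @'
<helpItems xmlns:command="urn:example:command" xmlns:maml="urn:example:maml">
  <command:details>
    <command:name> Add-Content </command:name>
    <maml:description>
      <maml:para> Adds to the content(s) of the specified item(s) </maml:para>
    </maml:description>
  </command:details>
  <command:details>
    <command:name> Clear-Item </command:name>
    <maml:description>
      <maml:para> Removes the content of an item </maml:para>
    </maml:description>
  </command:details>
</helpItems>
'@

$reader = [System.Xml.XmlReader]::Create((New-Object System.IO.StringReader $sample))
$state = 'idle'      # one of: idle, name, description
$name  = $null
$found = @()
try
{
    while ($reader.Read())
    {
        switch ($reader.NodeType)
        {
            ([System.Xml.XmlNodeType]::Element) {
                switch ($reader.Name)
                {
                    'command:name' { $state = 'name' }
                    'maml:para'    { if ($name) { $state = 'description' } }
                }
            }
            ([System.Xml.XmlNodeType]::Text) {
                switch ($state)
                {
                    'name'        { $name = $reader.Value.Trim() }
                    'description' {
                        $found += "${name}: " + $reader.Value.Trim()
                        $name = $null
                    }
                }
                # Each text node is consumed once, then we wait for the next tag.
                $state = 'idle'
            }
        }
    }
}
finally
{
    $reader.Close()
}
$found
```

A single state variable makes the legal transitions easier to see, at the cost of being less flexible than separate flags when states can overlap.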

But where are the pipelines, we ask? Neither of the previous two examples has taken advantage of PowerShell's pipelining capability. In the next section we'll remedy this omission.


Using XML in a Pipeline

Pipelining is one of the signature characteristics of shell environments in general and PowerShell in particular. Since the previous examples didn't take advantage of this feature, we'll look at how it can be applied now. We're going to write a function that scans all of the PowerShell help files, both the text and the XML files. For example, let's search for all of the help topics that mention the word "scriptblock."

PS (1) > search-help scriptblock
about_Display
about_Types
Get-Process
Group-Object
Measure-Command
Select-Object
Trace-Command
ForEach-Object
Where-Object

This tool provides a simple, fast way to search for all of the help topics that contain a particular pattern:

function Search-Help
{
    param ($pattern = $(throw "you must specify a pattern"))
    select-string -list $pattern $PSHome\about*.txt | 
        %{$_.filename -replace '\..*$'} 

    dir $PShome\*dll-help.*xml | 
        %{ [xml] (get-content -read -1 $_) } | 
        %{$_.helpitems.command} | 
        ? {$_.get_Innertext() -match $pattern} | 
        %{$_.details.name.trim()} 
}

This function takes one parameter to use as the pattern to search for. We're using the throw keyword to generate an error if the parameter wasn't provided.

First we search all of the text files in the PowerShell installation directory and return one line for each matching file. Then we pipe this line into foreach-object (or its alias % in this case) to extract the base name of the file using the replace operator and a regular expression. This will list the file names in a form that you can plug back into get-help.

Now get a list of the XML help files and turn each file into an XML object. We specify a read count of -1, so the whole file is read at once. We extract the command elements from the XML document and then see whether the text of the command contains the pattern we're looking for. If so, emit the name of the command, trimming off unnecessary spaces.

As well as being a handy way to search help, this function is a nice illustration of using the divide-and-conquer strategy when writing scripts in PowerShell. Each step in the pipeline brings you incrementally closer to the final solution.
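
The same filter-then-project pipeline shape can be tried out on a small in-memory document, which makes it easy to experiment with each stage on its own. This is a sketch only; the element names and the sample content are invented for the illustration and don't match the real help file layout.

```powershell
# Invented sample data; not the real help file schema.
$doc = [xml] @'
<commands>
  <command><title> Get-Item </title><summary>Gets the item at a location.</summary></command>
  <command><title> Get-Process </title><summary>Gets the running processes.</summary></command>
  <command><title> Set-Item </title><summary>Changes the value of an item.</summary></command>
</commands>
'@

$hits = $doc.commands.command |
    Where-Object { $_.get_InnerText() -match "item" } |   # filter on all text in the element
    ForEach-Object { $_.title.Trim() }                    # project just the trimmed title
$hits
```

Running the pipeline one stage at a time at the prompt (first just the element list, then with the filter added) is a quick way to see what each step contributes.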

Now that we know how to manually navigate through an XML document, let's look at some of the ways that the .NET framework provides to make this easier and more efficient.


Using XPath

The support for XML in the .NET Framework is rich. We can’t cover all of it, but we will cover one other thing. XML is actually a set of standards. One of these standards defines a path mechanism for searching through a document. This mechanism is called (not surprisingly) XPath. By using the .NET Framework’s XPath support, we can more quickly retrieve data from a document.

Setting up the test document

We'll work through a couple of examples using XPath, but first we need something to process. The following fragment is a string we'll use for our examples. It's a fragment of a book store inventory database. Each record in the database has the genre, the book title, the name of the author, the price, and the number of books in stock. We'll save this string in a variable called $inventory as shown below:

$inventory = @"
    <bookstore>
        <book genre="Autobiography">
            <title>The Autobiography of Benjamin Franklin</title>
            <author>
                <first-name>Benjamin</first-name>
                <last-name>Franklin</last-name>
            </author>
            <price>8.99</price>
            <stock>3</stock>
        </book>
        <book genre="Novel">
            <title>Moby Dick</title>
            <author>
                <first-name>Herman</first-name>
                <last-name>Melville</last-name>
            </author>
            <price>11.99</price>
            <stock>10</stock>
        </book>
        <book genre="Philosophy">
            <title>Discourse on Method</title>
            <author>
                <first-name>Rene</first-name>
                <last-name>Descartes</last-name>
            </author>
            <price>9.99</price>
            <stock>1</stock>
        </book>
        <book genre="Computers">
            <title>Windows PowerShell in Action</title>
            <author>
                <first-name>Bruce</first-name>
                <last-name>Payette</last-name>
            </author>
            <price>39.99</price>
            <stock>5</stock>
        </book>
    </bookstore>
"@

Now that we have our test document created, let's look at what we can do with it.

The Get-Xpn Helper Function

To navigate through an XML document and extract information, we're going to need an object for XML document navigation. Here is the definition of a function that will create the object we need.

function get-xpn ($text)
{
    $rdr = [System.IO.StringReader] $text
    $trdr = [system.io.textreader]$rdr
    $xpdoc = [System.XML.XPath.XPathDocument] $trdr
    $xpdoc.CreateNavigator()
}

Unfortunately, we can't just convert a string directly into an XPath document. There is a constructor on this type that takes a string, but it uses that string as the name of a file to open. Consequently, the get-xpn function has to wrap the argument string in a StringReader object and then in a TextReader object, which can finally be used to create the XPathDocument. Once we have an instance of XPathDocument, we can use the CreateNavigator() method to get an instance of a navigator object.

$xb = get-xpn $inventory

And now we're ready to go. We can use this navigator instance to get information out of a document. First, let's get a list of all of the books that cost more than $9.

PS (1) > $expensive = "/bookstore/book/title[../price>9.00]"

We'll store the XPath query in the variable $expensive. Let's look at the actual query string for a minute. As you might expect from the name XPath, this query starts with a path into the document

/bookstore/book/title

This path will select all of the title nodes in the document. But, since we only want some of the titles, we extend the path with a qualification. In this case

[../price>9.00]

This says only match paths where the price element is greater than 9.00. Note that a path is used to access the price element. Since price is a sibling of the title element (that is, it's at the same level), we need to specify this as

../price

This should give you a basic idea of what the query is expressing, so we won't go into any more detail. Now let's run the query using the Select() method on the XPath navigator.

PS (2) > $xb.Select($expensive) | ft value

Value
-----
Moby Dick
Discourse on Method
Windows PowerShell in Action

We run the result of the query into format-table because we're only interested in the value of the element. (Remember that what we're extracting here is just the title element.) So this is pretty simple: we can search through the database and get the titles easily. What if we want to print both the title and price? Here's one way we can do it.

Extracting Multiple Elements

To extract multiple elements from the document, first we'll have to create a new query string. This time we need to get the whole book element, not just the title element, so we can also extract the price element. Here's the new query string:

PS (3) > $titleAndPrice = "/bookstore/book[price>9.00]"

Notice that this time, since we're getting the book instead of the title, we can filter on the price element without having to use ".." to go up a path. The problem now is how to get the pieces of data we want, the title and the price. The result of the query has a property called OuterXml. This property contains the XML fragment that represents the entire book element. We can take this element and cast it into an XML document as we saw earlier in this section. Once we have it in this form, we can use the normal property notation to extract the information. Here's what it looks like:

PS (4) > $xb.Select($titleAndPrice) | %{[xml] $_.OuterXml} |
>> ft -auto {$_.book.price},{$_.book.title}
>>

$_.book.price $_.book.title
------------- -------------
11.99         Moby Dick
9.99          Discourse on Method
39.99         Windows PowerShell in Action

The call to Select() is like what we saw earlier. Now we take each object and process it using the foreach-object cmdlet. First we take the current pipeline object, extract the OuterXml string, then cast that string into an XML document, and pass that object through to the format-table cmdlet. We use scriptblocks in the field specification to extract the information we want to display.
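
An alternative that avoids the round trip through OuterXml is to ask each navigator in the result for its child nodes directly, using the SelectSingleNode() method of XPathNavigator. This is a minimal sketch with a cut-down, made-up inventory; it builds the navigator inline rather than calling get-xpn so that it stands on its own.

```powershell
# Reduced sample inventory (illustrative data only).
$books = @'
<bookstore>
  <book><title>Moby Dick</title><price>11.99</price></book>
  <book><title>Walden</title><price>7.50</price></book>
</bookstore>
'@
$rdr = New-Object System.IO.StringReader $books
$nav = (New-Object System.Xml.XPath.XPathDocument $rdr).CreateNavigator()

$lines = $nav.Select("/bookstore/book[price>9.00]") | ForEach-Object {
    # Each pipeline object is a navigator positioned on one book element;
    # SelectSingleNode() evaluates a path relative to that position.
    $title = $_.SelectSingleNode("title").Value
    $price = $_.SelectSingleNode("price").Value
    "$price  $title"
}
$lines
```

This keeps everything inside the XPath data model, which may matter for large result sets, but the cast-to-[xml] approach shown above is usually more convenient at the prompt.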

Performing Calculations on Elements

Let's look at one final example. We want to get the total price of all of the books in the inventory, that is, each book's price multiplied by the number of copies in stock. This time we'll use a slightly different query.

descendant::book

This query selects all book elements that are descendants of the current node, which here is the root of the document. This is a more general way of selecting elements in the document. We'll pipe these elements into foreach-object. Here we'll specify scriptblocks for each of the begin, process, and end steps in the pipeline. In the begin scriptblock, we'll initialize $t to zero to hold the result. In the process scriptblock, we convert the current pipeline object into an [xml] object as we saw in the previous example. Then we get the price member, convert it into a [decimal] number, multiply it by the number of books in stock, and add the result to the total. The final step is to display the total in the end scriptblock. Here's what it looks like when it's run:

PS (5) > $xb.Select("descendant::book") | % {$t=0} `
>>     {
>>         $book = ([xml] $_.OuterXml).book
>>         $t += [decimal] $book.price * $book.stock
>>     } `
>>     {
>>         "Total price is: `$$t"
>>     }
>>
Total price is: $356.81
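
For simpler aggregates, the XPath language itself can do the arithmetic. The sketch below uses the XPath sum() function through the navigator's Evaluate() method to count the copies in stock; the two-book inventory is made up for the example.

```powershell
# Reduced sample inventory (illustrative data only).
$books = '<bookstore><book><stock>3</stock></book><book><stock>10</stock></book></bookstore>'
$rdr = New-Object System.IO.StringReader $books
$nav = (New-Object System.Xml.XPath.XPathDocument $rdr).CreateNavigator()

# Evaluate() returns a typed result instead of a node iterator;
# for a numeric XPath expression that result is a [double].
$count = $nav.Evaluate("sum(/bookstore/book/stock)")
$count
```

XPath 1.0's sum() only adds up the values of a node set, so a per-book product such as price times stock still needs a loop like the one above.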

Having looked at building an XML path navigator on a stream, can we use XPath on an XML document itself? The answer is yes. In fact, it can be much easier than what we've seen previously. First let's convert our inventory into an XML document.

PS (6) > $xi = [xml] $inventory

The variable $xi now holds an XML document representation of the bookstore inventory. Let's select the genre attribute from each book:

PS (7) > $xi.SelectNodes("descendant::book/@genre")

#text
-----
Autobiography
Novel
Philosophy
Computers

This query says "select the genre attribute (indicated by the @) from all descendant elements called book." Now let's revisit another example from earlier in this section and display the books and prices again.

PS (8) > $xi.SelectNodes("descendant::book") |
>> ft -auto price, title
>>

price title
----- -----
8.99  The Autobiography of Benjamin Franklin
11.99 Moby Dick
9.99  Discourse on Method
39.99 Windows PowerShell in Action

This is simpler than the earlier example because SelectNodes() on an XmlDocument returns XmlElement objects, which PowerShell adapts and presents as regular objects. With the XPathNavigator.Select() method, we're returning XPathNavigator nodes, which aren't adapted automatically. As we can see, working with the XmlDocument object is the easiest way to work with XML in PowerShell, but there may be times when you need to use the other mechanisms, either for efficiency reasons (XmlDocument loads the entire document into memory) or because you're adapting code from another language.
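
When only one node is wanted, XmlDocument also offers SelectSingleNode(), and the adapted result can be used with property notation right away. The following sketch filters on an attribute value; the shortened two-book document is ours and is only for illustration.

```powershell
# Reduced sample inventory (illustrative data only).
$shelf = [xml] '<bookstore><book genre="Novel"><title>Moby Dick</title></book><book genre="Philosophy"><title>Discourse on Method</title></book></bookstore>'

# The predicate [@genre='Novel'] keeps only books whose genre attribute matches.
$novel = $shelf.SelectSingleNode("/bookstore/book[@genre='Novel']")

$novel.title   # child element, via the XML adapter
$novel.genre   # attribute, via the same property notation
```

SelectSingleNode() returns nothing when no node matches, so it is worth checking the result before using its properties.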

In this section, we've demonstrated how you can use the XML facilities in the .NET framework to create and process XML documents. As the XML format is used more and more in the computer industry, these features will become critical. We've only scratched the surface of what is available in the .NET framework. We've only covered some of the XML classes and a little bit of the XPath query language. We haven't discussed how to use XSLT (Extensible Stylesheet Language Transformations), which is supported by the System.Xml.Xsl namespace. All of these tools are directly available from within the PowerShell environment. In fact, the interactive nature of the PowerShell environment makes it an ideal place to explore, experiment, and learn about XML.


The Import-Clixml and Export-Clixml Cmdlets

The last XML topic we're going to cover is the pair of cmdlets for exporting objects from PowerShell and importing them back. These cmdlets provide a way to save and restore collections of objects from the PowerShell environment. Let's take a look at how things are serialized.

Note. Serialization is the process of saving an object or objects to a file or a network stream. The components of the objects are stored as a series of pieces, hence "serialization." PowerShell uses a special type of "lossy" serialization where the basic shape of the objects is preserved but not all of the details. More on this in a minute.

First we'll create a collection of objects.

PS (1) > $data = @{a=1;b=2;c=3},"Hi there", 3.5

Now serialize them to a file using the Export-Clixml cmdlet:

PS (2) > $data | Export-Clixml out.xml

Let's see what the file looks like:

PS (3) > type out.xml
<Objs Version="1.1" xmlns="http://schemas.microsoft.com/powershe
ll/2004/04"><Obj RefId="RefId-0"><TN RefId="RefId-0"><T>System.C
ollections.Hashtable</T><T>System.Object</T></TN><DCT><En><S N="
Key">a</S><I32 N="Value">1</I32></En><En><S N="Key">b</S><I32 N=
"Value">2</I32></En><En><S N="Key">c</S><I32 N="Value">3</I32></
En></DCT></Obj><S>Hi there</S><Db>3.5</Db></Objs>

It's not very readable, so we'll use the Dump-Doc function from earlier in the chapter to display it:

PS (4) > dump-doc out.xml
<Objs Version = "1.1"xmlns = "http://schemas.microsoft.com/power
shell/2004/04">

This first part identifies the schema for the CLIXML object representation.

<Obj RefId = "RefId-0">
    <TN RefId = "RefId-0">
        <T>
            System.Collections.Hashtable
        </T>
        <T>
            System.Object
        </T>
    </TN>
    <DCT>
        <En>

Here are the key/value pair encodings:

            <S N = "Key">
                a
            </S>
            <I32 N = "Value">
                1
            </I32>
        </En>
        <En>
            <S N = "Key">
                b
            </S>
            <I32 N = "Value">
                2
            </I32>
        </En>
        <En>
            <S N = "Key">
                c
            </S>
            <I32 N = "Value">
                3
            </I32>
        </En>
    </DCT>
</Obj>

Next comes the encoding of the string element:

    <S>
        Hi there
    </S>

And finally the encoding of the double-precision number:

    <Db>
        3.5
    </Db>
</Objs>

Now let's import these objects back into the session.

PS (5) > $new_data = Import-Clixml out.xml

And compare the old and new collections.

PS (6) > $new_data

Name                 Value
----                 -----
a                    1
b                    2
c                    3
Hi there             3.5

PS (7) > $data

Name                 Value
----                 -----
a                    1
b                    2
c                    3
Hi there             3.5

And they match.
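The comparison above is done by eye. A minimal sketch of the same round trip checked in code follows; the temporary file and the variable names ($path, $restored, $typesMatch, $sameValues) are assumptions made for this example.

```powershell
# Write the same three objects to a temporary file and read them back
$path = [System.IO.Path]::GetTempFileName()
$data = @{a=1;b=2;c=3}, "Hi there", 3.5
$data | Export-Clixml $path
$restored = @(Import-Clixml $path)

# Compare the .NET type name of each original object with its restored counterpart
$typesMatch = $true
for ($i = 0; $i -lt $data.Count; $i++) {
    if ($data[$i].GetType().FullName -ne $restored[$i].GetType().FullName) {
        $typesMatch = $false
    }
}

# Spot-check the values: one hashtable entry, the string, and the double
$sameValues = ($restored[0]['b'] -eq 2) -and
              ($restored[1] -eq 'Hi there') -and
              ($restored[2] -eq 3.5)

Remove-Item $path
"Types match: $typesMatch; values match: $sameValues"
```

All three objects here are of types that the CLIXML format stores directly, which is why both the types and the values come back unchanged.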

These cmdlets provide a simple way to save and restore collections of objects, but they have limitations. They can only load and save a fixed set of primitive types. Any other type is "shredded", that is, broken apart into a property bag composed of these primitive types. This allows any type to be serialized, but with some loss of fidelity. In other words, objects can't always be restored to exactly the same type they were originally. This approach is necessary because there can be an infinite number of object types, not all of which may be available when the file is read back. Sometimes you don't have the original type definition, and other times there's no way to re-create the original object even with the type information because the type does not support this operation. By restricting the set of types that are serialized with fidelity, the CLIXML format can always recover objects regardless of the availability of the original type information.
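To make the loss of fidelity concrete, the following sketch round-trips a Process object, a type that is not one of the primitives. The temporary file and the variable names are assumptions for this example, and the exact type name reported for the restored object may vary between PowerShell versions.

```powershell
# Save the current PowerShell process object to a temporary file and restore it
$path = [System.IO.Path]::GetTempFileName()
$original = Get-Process -Id $PID
$original | Export-Clixml $path
$restored = Import-Clixml $path

# The restored object is a property bag, not a live System.Diagnostics.Process
$isLiveProcess = $restored -is [System.Diagnostics.Process]

# Primitive-valued properties such as Id survive the round trip
$sameId = $restored.Id -eq $original.Id

# The original type name is recorded, typically with a "Deserialized." prefix
$restored.PSObject.TypeNames[0]

Remove-Item $path
"Live process object: $isLiveProcess; same Id: $sameId"
```

The restored object still carries the data, but methods of the original type are not available on it because the type itself was never re-created.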

There is also another limitation on how objects are serialized. An object has properties. Those properties are also objects that have their own properties, and so on. The number of levels of this chain that the serializer follows is called the serialization depth. For some of the complex objects in the system, such as the Process object, serializing through all of the levels of the object results in a huge XML file. To constrain this, the serializer only traverses to a certain depth. The default depth is 2. This default can be overridden either on the command line using the -Depth parameter or by placing a <SerializationDepth> element in the type's description file. If you look at $PSHome/types.ps1xml, you can see some examples of where this has been done.
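One way to see the effect of the depth setting is to export the same object at two depths and compare the file sizes. This is only a sketch: the depths 1 and 3, the temporary files, and the variable names are assumptions chosen for illustration, and the actual sizes depend on the system and the PowerShell version.

```powershell
# Export the current process at a shallow and a deeper serialization depth
$shallowPath = [System.IO.Path]::GetTempFileName()
$deepPath = [System.IO.Path]::GetTempFileName()
$proc = Get-Process -Id $PID

$proc | Export-Clixml -Path $shallowPath -Depth 1
$proc | Export-Clixml -Path $deepPath -Depth 3

# A deeper traversal expands more nested properties, so the file is usually larger
$shallowSize = (Get-Item $shallowPath).Length
$deepSize = (Get-Item $deepPath).Length

Remove-Item $shallowPath, $deepPath
"Depth 1: $shallowSize bytes; depth 3: $deepSize bytes"
```

At the shallower depth, properties beyond the cutoff are stored in a flattened form rather than being expanded, which is what keeps the file small.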

This concludes our three-part series on text and file processing in PowerShell. In Part 1, we looked at basic string processing. In Part 2, we extended this to work with text and binary files. Finally, in Part 3, we switched away from processing unstructured text and looked at XML documents. This series gives you a solid foundation for processing text with PowerShell. But working with text, while important, is really only a small part of what you can do with PowerShell. In Windows PowerShell in Action we cover a much wider array of topics, including Windows automation with WMI and COM, and .NET programming, including some simple Windows Forms and graphics scripting.

This material was excerpted from the book Windows PowerShell in Action from Manning Publications.

Excerpt from Windows PowerShell in Action
ISBN 932394-90-7
Copyright 2007 Manning Publications
All rights reserved

