PowerShell is Microsoft's next-generation command line and scripting solution. It combines the interactive capabilities of traditional shells such as bash or zsh with the programmability of scripting languages such as Perl or Ruby. Because PowerShell is based on .NET, it's capable of doing things in a shell environment that were previously only possible in languages such as Visual Basic, VBScript, or C#.
As with any scripting language, one of the most important domains for PowerShell is the ability to work with strings and files (both text and binary). This series of articles is based on chapter 10 of Windows PowerShell in Action from Manning Publications. Chapter 10 examines how PowerShell handles text and file processing tasks, illustrating how to process and parse text using string objects and regular expressions. It also shows how to deal with paths and how to manipulate binary files. Another significant area covered in this chapter is how to work with XML. XML has become increasingly important both in the IT field and in software development. We show how to search, manipulate, and create XML documents using PowerShell.
Part 1 of this series looked at how to process, search, and manipulate strings and unstructured text using the .NET string object and regular expressions. In Part 2 author Bruce Payette, a technical lead on the Windows PowerShell team, focused on file processing.
Part 3 of this series looks at working with XML in PowerShell. It covers PowerShell’s XML object adapter, how to work with the .NET XmlDocument and XmlReader classes, and, finally, how to navigate through a document. It also looks at the commands for saving data to and retrieving data from XML files.
| Part 3: XML Processing in PowerShell | |
| Using XML as Objects | |
| Adding Elements to an XML Object | |
| Loading and Saving XML Files | |
| Using XML in a Pipeline | |
| Using XPath | |
| The Import-Clixml and Export-Clixml Cmdlets |
XML (eXtensible Markup Language) is becoming more and more important in the computing world. XML is being used for everything from configuration files to log files to databases. PowerShell itself uses XML for its type and configuration files as well as for the help files. Clearly, for PowerShell to be effective, it has to be able to deal with XML documents. Let's take a look at how XML is used and supported in PowerShell.
Note. This article assumes some basic knowledge of XML markup.
We'll look at the XML object type as well as the mechanism that .NET provides for searching XML documents.
PowerShell supports XML documents as a primitive data type. This means that you can access the elements of an XML document as though they were properties on an object. For example, let's create a simple XML object. We'll start with a string that defines a top-level node called "top". This node contains three descendants "a", "b", and "c" each of which has a value. Let's turn this string into an object:
PS (1) > $d = [xml] "<top><a>one</a><b>two</b><c>3</c></top>"
The [xml] cast takes the string and converts it into an XML object of type System.Xml.XmlDocument. This object is then adapted by PowerShell so you can treat it like a regular object. Let's try this out. First we'll just display the object:
PS (2) > $d
top
---
top
As we expect, the object displays one top-level property corresponding to the top-level node in the document. Now let's see what properties this node contains:
PS (3) > $d.a
PS (4) > $d.top
a    b    c
-    -    -
one  two  3
We see three properties that correspond to the descendants of top. We can use conventional property notation to look at the value of an individual member:
PS (5) > $d.top.a
one
However, we can't assign just any value to this node. The XML adapter only accepts strings when setting a node's value, so assigning an integer fails:
PS (6) > $d.top.a = 13
Cannot set "a" because only strings can be used as values to set XmlNode properties.
At line:1 char:8
+ $d.top.a <<<< = 13
PS (7) > $d.top.c
3
All of the normal type conversions apply, of course. The node c contains a string value that is a number.
PS (8) > $d.top.c.gettype().FullName
System.String
We can add this field to an integer, which will cause it to be converted into an integer.
PS (9) > 2 + $d.top.c
5
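The error message above hints at the rule: the adapter takes strings, and only strings, when a node's value is set. Here is a minimal sketch of that behavior. The variable name $doc is ours, and the sketch assumes a PowerShell version in which string assignment through the adapter is supported, as the error text suggests.

```powershell
# Build a small document with the [xml] cast, as in the text.
$doc = [xml] "<top><a>one</a><b>two</b><c>3</c></top>"

# A string is accepted as the new text of an existing node.
$doc.top.a = "thirteen"

# A non-string value has to be converted explicitly before assignment.
$doc.top.b = [string] 13

# Reading goes the other way: the node text is a string, and the
# integer on the left drives the conversion to a number.
$sum = 2 + $doc.top.c

"$($doc.top.a) $($doc.top.b) $sum"
```

The explicit [string] conversion is the design point here: the adapter leaves the choice of text representation to the caller instead of guessing one.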
Property notation can't add new elements to an XML document, so we'll dig a little deeper into the [xml] object and see how to do that.
Let's add an element "d" to this document. To do this, we need to use the methods on the XML document object. First we have to create the new element:
PS (10) > $elem = $d.CreateElement("d")
In text, what we've created looks like "<d></d>". The tags are there, but it's empty. Let's set the element text—the "inner text."
PS (11) > $elem.set_InnerText("Hello"); $d.top.AppendChild($elem)
#text
-----
Hello
Notice that we're using the property setter method here. This is because the XML adapter hides the basic properties on the XmlNode object. The other way to set this would be to use the PSBase member like we did with the hashtable example earlier in this chapter.
PS (12) > $ne = $d.CreateElement("e")
PS (13) > $ne.psbase.InnerText = "World"
PS (14) > $d.top.AppendChild($ne)
#text
-----
World
Now let's look at the revised object.
PS (15) > $d.top
a : one
b : two
c : 3
d : Hello
e : World
We see that the document now has five members instead of the original three. But what does the string look like now? It would be nice if we could simply cast the document back to a string and see what it looks like:
PS (16) > [string] $d
System.Xml.XmlDocument
Unfortunately, as you can see, it isn't as simple as this. Instead, we'll save the document as a file and then display it:
PS (17) > $d.save("c:\temp\new.xml")
PS (18) > type c:\temp\new.xml
<top>
<a>one</a>
<b>two</b>
<c>3</c>
<d>Hello</d>
<e>World</e>
</top>
The result is a nicely readable text file. Now that we know how to add children to a node, how can we add attributes? The pattern is basically the same as with elements. First we create an attribute object.
PS (19) > $attr = $d.CreateAttribute("BuiltBy")
Next we set the value of the text for that object. Again we use the psbase member to bypass the adaptation layer.
PS (20) > $attr.psbase.Value = "Windows PowerShell"
And finally we add it to the top-level node of the document:
PS (21) > $d.psbase.DocumentElement.SetAttributeNode($attr)
#text
-----
Windows PowerShell
Let's look at the top node once again.
PS (22) > $d.top
BuiltBy : Windows PowerShell
a       : one
b       : two
c       : 3
d       : Hello
e       : World
Now we see that the attribute has been added. Let's save the document:
PS (23) > $d.save("c:\temp\new.xml")
Then retrieve the file. You can see how the attribute has been added to the top node in the document.
PS (24) > type c:\temp\new.xml
<top BuiltBy="Windows PowerShell">
<a>one</a>
<b>two</b>
<c>3</c>
<d>Hello</d>
<e>World</e>
</top>
PS (25) >
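If all we want is to look at the markup, saving to a file is not the only option: the underlying XmlDocument exposes the serialized text through its OuterXml property. The sketch below repeats the element and attribute steps on a smaller document and reads the result back as a string. It reaches base properties through psbase, as we did above, and uses SetAttribute() as a shortcut for the CreateAttribute()/SetAttributeNode() pair; the variable names are illustrative only.

```powershell
$doc = [xml] "<top><a>one</a></top>"

# Create the element, set its text through psbase, and append it.
$elem = $doc.CreateElement("d")
$elem.psbase.InnerText = "Hello"
[void] $doc.psbase.DocumentElement.AppendChild($elem)

# SetAttribute() creates the attribute and sets its value in one call.
$doc.psbase.DocumentElement.SetAttribute("BuiltBy", "Windows PowerShell")

# OuterXml returns the markup of the whole document as a single string.
$markup = $doc.psbase.OuterXml
$markup
```

Unlike Save(), OuterXml does no indenting, so the markup comes back on one line.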
We constructed, edited, and saved XML documents, but we haven't loaded an existing document yet, so let's do that now.
At the end of the previous section, we saved an XML document to a file. Now let's read it back:
PS (1) > $nd = [xml] [string]::join("`n",
>> (gc -read 10kb c:\temp\new.xml))
>>
Here's what we're doing. We use the Get-Content cmdlet to read the file; however, it comes back as a collection of strings when what we really want is one single string. To merge the strings into one, we use the [string]::Join() method. Once we have the single string, we cast the whole thing into an XML document.
Performance Tip: By default, Get-Content reads one record at a time. This can be quite slow. When processing large files, you can use the -ReadCount parameter to specify larger sizes. This will cause blocks of records to be written to the pipeline instead of writing one record at a time.
Let's verify that the document was read properly by dumping out the top-level node and then the child nodes.
PS (2) > $nd
top
---
top
PS (3) > $nd.top
BuiltBy : Windows PowerShell
a       : one
b       : two
c       : 3
d       : Hello
e       : World
All is as it should be; even the attribute is there.
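Another way to load a file, offered here as an alternative sketch and not as the method used in this article, is to let the XmlDocument class read the file itself with its Load() method. .NET resolves relative paths against the process working directory, not the current PowerShell location, so the sketch passes a full path. The temporary file exists only to keep the example self-contained.

```powershell
# Write a small document to a temporary file for the demonstration.
$path = [System.IO.Path]::GetTempFileName()
Set-Content -Path $path -Value '<top BuiltBy="Windows PowerShell"><a>one</a></top>'

# Create an empty XmlDocument and have it parse the file directly.
$loaded = New-Object System.Xml.XmlDocument
$loaded.Load($path)

# The loaded document is adapted just like one created with the [xml] cast.
$loaded.top.a
$loaded.top.BuiltBy

Remove-Item -Path $path
```

This skips the intermediate array of lines and the joined string, although the complete document still ends up in memory.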
While this is a simple approach and the one we'll use most often, it's not necessarily the most efficient one, because it requires loading the entire document into memory. For large documents, or collections of many documents, this may become a problem. In the next section, we'll look at some alternative approaches that, while more complex, are more memory efficient.
Example: The Dump-Doc Function
The previous method we looked at for loading an XML file is simple but not very efficient. It requires that you load the file into memory, make a copy of the file while turning it into a single string, and then create an XML document representing the entire file but with all of the overhead of the XML DOM format. A more space-efficient way to process XML documents is to use the XML reader class. This class streams through the document one element at a time instead of loading the whole thing into memory. We're going to write a function that will use the XML reader to stream through a document and output it properly indented. An XML pretty-printer if you will. Here's what we want the output of this function to look like when it dumps its built-in default document:
PS (1) > dump-doc
<top BuiltBy = "Windows PowerShell">
 <a>
  one
 </a>
 <b>
  two
 </b>
 <c>
  3
 </c>
 <d>
  Hello
 </d>
 <e>
  World
 </e>
</top>
Now let's test our function on a more complex document where there is more nesting and more attributes:
@'
<top BuiltBy = "Windows PowerShell">
<a pronounced="eh">
one
</a>
<b pronounced="bee">
two
</b>
<c one="1" two="2" three="3">
<one>
1
</one>
<two>
2
</two>
<three>
3
</three>
</c>
<d>
Hello there world
</d>
</top>
'@ > c:\temp\fancy.xml
When we run the function, we see:
PS (2) > dump-doc c:\temp\fancy.xml
<top BuiltBy = "Windows PowerShell">
 <a pronounced = "eh">
  one
 </a>
 <b pronounced = "bee">
  two
 </b>
 <c one = "1"two = "2"three = "3">
  <one>
   1
  </one>
  <two>
   2
  </two>
  <three>
   3
  </three>
 </c>
 <d>
  Hello there world
 </d>
</top>
Which is pretty close to the original document. The code for the Dump-Doc function is shown below:
function Dump-Doc ($doc="c:\temp\new.xml")
{
    # Default reader settings are enough for this example.
    $settings = new-object System.Xml.XmlReaderSettings
    # The reader needs an absolute path, so resolve the argument first.
    $doc = (resolve-path $doc).ProviderPath
    $reader = [xml.xmlreader]::create($doc, $settings)
    $indent=0
    # Prefix a string with one space per nesting level.
    function indent ($s) { " "*$indent+$s }
    while ($reader.Read())
    {
        if ($reader.NodeType -eq [Xml.XmlNodeType]::Element)
        {
            # Work out the closing delimiter before moving to the attributes.
            $close = $(if ($reader.IsEmptyElement) { "/>" } else { ">" })
            if ($reader.HasAttributes)
            {
                $s = indent "<$($reader.Name) "
                [void] $reader.MoveToFirstAttribute()
                do
                {
                    $s += "$($reader.Name) = `"$($reader.Value)`""
                }
                while ($reader.MoveToNextAttribute())
                "$s$close"
            }
            else
            {
                indent "<$($reader.Name)$close"
            }
            # Only a non-empty element opens a new nesting level.
            if ($close -ne '/>') {$indent++}
        }
        elseif ($reader.NodeType -eq [Xml.XmlNodeType]::EndElement )
        {
            $indent--
            indent "</$($reader.Name)>"
        }
        elseif ($reader.NodeType -eq [Xml.XmlNodeType]::Text)
        {
            indent $reader.Value
        }
    }
    $reader.close()
}
This is a complex function, so let's go through it one piece at a time. We start with the basic function declaration, where it takes an optional argument that names a file. Next we'll create the settings object that we need to pass in when we create the XML reader object. We also need to resolve the path to the document, because the XML reader object requires an absolute path. Now we can create the XmlReader object itself. The XML reader will stream through the document, reading only as much as it needs, as opposed to reading the entire document into memory.
We want to indent the levels of the document so we'll initialize an indent level counter and a local function to display the indented string. Now we'll read through all of the nodes in the document. We'll choose different behavior based on the type of the node. An element node is the beginning of an XML element. If the element has attributes then we'll add them to the string to display. We'll use the MoveToFirstAttribute()/MoveToNextAttribute() methods to move through the attributes. If there were no attributes, just display the element name. At each new element, increase the indent level if it's not an empty element tag. If it's the end of an element, decrease the indent level and display the closing tag. If it's a text element, just display the value of the element. Finally close the reader. We always want to close a handle received from a .NET method. It will eventually be discarded during garbage collection, but it's possible to run out of handles before you run out of memory.
This example illustrates the basic techniques for using an XML reader object to walk through an arbitrary document. In the next section, we'll look at a more specialized application.
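The same streaming pattern works on in-memory text, which makes it easy to experiment with. The sketch below is a reduced version of the loop in Dump-Doc: it reads from a StringReader instead of a file and records each element's name together with its nesting depth. The try/finally block, which assumes PowerShell 2.0 or later, closes the reader even if reading fails. The sample XML and the variable names are made up for illustration.

```powershell
$text = '<top><a>one</a><b><c/></b></top>'

# XmlReader.Create() accepts a TextReader, so a StringReader lets us
# stream over a string without touching the file system.
$reader = [System.Xml.XmlReader]::Create([System.IO.StringReader] $text)
$names = @()
try
{
    while ($reader.Read())
    {
        if ($reader.NodeType -eq [System.Xml.XmlNodeType]::Element)
        {
            # Depth is 0 for the root element and grows with nesting.
            $names += ('{0}:{1}' -f $reader.Depth, $reader.Name)
        }
    }
}
finally
{
    $reader.Close()
}
$names -join ' '
```

Using the reader's Depth property removes the need for a hand-maintained counter when all we want is the nesting level.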
Example: The Select-Help Function
Now let's work with something more useful. The PowerShell help files are stored as XML documents. We want to write a function that scans through the command file, searching for a particular word in either the command name or the short help description. Here's what we want the output to look like:
PS (1) > select-help property
Clear-ItemProperty: Removes the property value from a property.
Copy-ItemProperty: Copies a property between locations or namespaces.
Get-ItemProperty: Retrieves the properties of an object.
Move-ItemProperty: Moves a property from one location to another.
New-ItemProperty: Sets a new property of an item at a location.
Remove-ItemProperty: Removes a property and its value from the location.
Rename-ItemProperty: Renames a property of an item.
Set-ItemProperty: Sets a property at the specified location to a specified value.
PS (2) >
In the example, we're searching for the word property and we get a list of all of the cmdlets that work with properties. The output is a string that contains the cmdlet name and a description string. Next let's look at a fragment of the document we're going to process:
<command:details>
<command:name>
Add-Content
</command:name>
<maml:description>
<maml:para>
Adds to the content(s) of the specified item(s)
</maml:para>
</maml:description>
<maml:copyright>
<maml:para></maml:para>
</maml:copyright>
<command:verb>add</command:verb>
<command:noun>content</command:noun>
<dev:version></dev:version>
</command:details>
PowerShell help text is stored in MAML (Microsoft Assistance Markup Language) format. From simple examination of this fragment, we can see that the name of a command is stored in the command:name element and the description is stored in a maml:para element nested inside a maml:description element. The basic approach we'll use is to look for the command tag, extract and save the command name, and then capture the description in the description element that immediately follows the command name element. This means that we'll use a state-machine pattern to process the document. A state machine usually implies using the switch statement, so this example is also a good opportunity to use the control structures in the PowerShell language a bit more. Now let's look at the function:
function Select-Help ($pat = ".")
{
    $cmdHelp = "Microsoft.PowerShell.Commands.Management.dll-Help.xml"
    $doc = "$PSHOME\$cmdHelp"
    $settings = new-object System.Xml.XmlReaderSettings
    $settings.ProhibitDTD = $false
    $reader = [xml.xmlreader]::create($doc, $settings)
    # State variables for the state machine
    $name = $null
    $capture_name = $false
    $capture_description = $false
    $finish_line = $false
    while ($reader.Read())
    {
        switch ($reader.NodeType)
        {
            ([Xml.XmlNodeType]::Element) {
                switch ($reader.Name)
                {
                    "command:name" {
                        $capture_name = $true
                        break
                    }
                    "maml:description" {
                        $capture_description = $true
                        break
                    }
                    "maml:para" {
                        if ($capture_description)
                        {
                            $finish_line = $true
                        }
                    }
                }
                break
            }
            ([Xml.XmlNodeType]::EndElement) {
                if ($capture_name) { $capture_name = $false }
                # Reset once the description paragraph has been seen.
                if ($finish_line)
                {
                    $finish_line = $false
                    $capture_description = $false
                }
                break
            }
            ([Xml.XmlNodeType]::Text) {
                if ($capture_name)
                {
                    $name = $reader.Value.Trim()
                }
                elseif ($finish_line -and $name)
                {
                    $msg = $name + ": " + $reader.Value.Trim()
                    if ($msg -match $pat)
                    {
                        $msg
                    }
                    $name = $null
                }
                break
            }
        }
    }
    $reader.close()
}
Once again, this is a long piece of code, so let's work through it a piece at a time. The $pat parameter will contain the pattern to search for. If no argument is supplied, the default argument will match everything. Next we set up the name of the document to search in the PowerShell installation directory. Then we create the XmlReader object as in the previous examples.
Since we're using a state machine, we need to set up some state variables. The $name variable will be used to hold the name of the cmdlet, and the others will hold the state of the processing. We'll read through the document, one node at a time, and switch on the node type. Unrecognized node types are ignored.
First we'll process the Element nodes. We'll use a nested switch statement to perform different actions based on the type of element. Finding a "command:name" element starts the matching process. When we see a "maml:description" element, we're at the beginning of the MAML description field, so we indicate that we want to capture the description. When we see the "maml:para" element, we need to handle the embedded paragraph in the description element. In the end tag of an element, we'll reset some of the state variables if they've been set. And finally, we need to extract the information we're interested in out of the Text nodes. We've captured the cmdlet name out of the element, but we want to remove any leading and trailing spaces, so we'll use the [string] Trim() method. Now we have both the cmdlet name and the description string. If the combined string matches the pattern the caller specified, we output it. Again, the last thing to do is to close the XML reader so we don't waste resources.
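To see the state-machine idea in isolation, here is a cut-down sketch that runs against an in-memory string. It is not the Select-Help function: the element names (item, name, para) are simplified stand-ins without namespaces, a single $state variable replaces the three flags, and try/finally assumes PowerShell 2.0 or later.

```powershell
$text = '<items>' +
    '<item><name> Add-Content </name><description><para>Adds content</para></description></item>' +
    '<item><name>Get-Item</name><description><para>Gets an item</para></description></item>' +
    '</items>'

$reader = [System.Xml.XmlReader]::Create([System.IO.StringReader] $text)
$state = 'idle'      # one of: idle, name, description
$name = $null
$results = @()
try
{
    while ($reader.Read())
    {
        switch ($reader.NodeType)
        {
            ([System.Xml.XmlNodeType]::Element) {
                # Opening tags decide what the next text node means.
                if ($reader.Name -eq 'name') { $state = 'name' }
                elseif ($reader.Name -eq 'para') { $state = 'description' }
            }
            ([System.Xml.XmlNodeType]::Text) {
                if ($state -eq 'name')
                {
                    $name = $reader.Value.Trim()
                }
                elseif ($state -eq 'description' -and $name)
                {
                    $results += "${name}: $($reader.Value.Trim())"
                    $name = $null
                }
            }
            ([System.Xml.XmlNodeType]::EndElement) {
                # Any closing tag returns the machine to its resting state.
                $state = 'idle'
            }
        }
    }
}
finally
{
    $reader.Close()
}
$results
```

A single state variable keeps the transitions easy to follow; separate flags, as in Select-Help, are more flexible when several conditions can be active at once.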
But where are the pipelines, we ask? Neither of the previous two examples has taken advantage of PowerShell's pipelining capability. In the next section we'll remedy this omission.
Pipelining is one of the signature characteristics of shell environments in general and PowerShell in particular. Since the previous examples didn't take advantage of this feature, we'll look at how it can be applied now. We're going to write a function that scans all of the PowerShell help files, both the text and the XML files. For example, let's search for all of the help topics that mention the word "scriptblock."
PS (1) > search-help scriptblock
about_Display
about_Types
Get-Process
Group-Object
Measure-Command
Select-Object
Trace-Command
ForEach-Object
Where-Object
This tool provides a simple, fast way to search for all of the help topics that contain a particular pattern:
function Search-Help
{
    param ($pattern = $(throw "you must specify a pattern"))
    select-string -list $pattern $PSHome\about*.txt |
        %{$_.filename -replace '\..*$'}
    dir $PShome\*dll-help.*xml |
        %{ [xml] (get-content -read -1 $_) } |
        %{$_.helpitems.command} |
        ? {$_.get_Innertext() -match $pattern} |
        %{$_.details.name.trim()}
}
This function takes one parameter to use as the pattern to search for. We're using the throw keyword to generate an error if the parameter wasn't provided.
First we search all of the text files in the PowerShell installation directory and return one line for each matching file. Then we pipe this line into foreach-object (or its alias % in this case) to extract the base name of the file using the replace operator and a regular expression. This will list the file names in a form that you can plug back into get-help.
Now get a list of the XML help files and turn each file into an XML object. We specify a read count of -1, so the whole file is read at once. We extract the command elements from the XML document and then see whether the text of the command contains the pattern we're looking for. If so, emit the name of the command, trimming off unnecessary spaces.
As well as being a handy way to search help, this function is a nice illustration of using the divide-and-conquer strategy when writing scripts in PowerShell. Each step in the pipeline brings you incrementally closer to the final solution.
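The shape of the second pipeline can be tried without any help files. In the sketch below, two in-memory strings stand in for the files; the topics, topic, and summary names are hypothetical and are not the MAML schema. Each stage does one small job: parse, select the items, filter, and project the name.

```powershell
# Two stand-in "files"; the element and attribute names are invented.
$files = @(
    '<topics><topic id="Get-Process"><summary>Lists running processes</summary></topic><topic id="Where-Object"><summary>Filters input with a scriptblock</summary></topic></topics>',
    '<topics><topic id="ForEach-Object"><summary>Runs a scriptblock per item</summary></topic></topics>'
)

$found = $files |
    ForEach-Object { [xml] $_ } |                          # text to XML document
    ForEach-Object { $_.topics.topic } |                   # document to topic elements
    Where-Object { $_.summary -match 'scriptblock' } |     # keep matching topics
    ForEach-Object { $_.id }                               # element to its id attribute

$found
```

Because each stage can be run on its own at the prompt, a pipeline like this can be built up and checked one step at a time.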
Now that we know how to manually navigate through an XML document, let's look at some of the ways that the .NET framework provides to make this easier and more efficient.
The support for XML in the .NET Framework is rich. We can’t cover all of it, but we will cover one other thing. XML is actually a set of standards. One of these standards defines a path mechanism for searching through a document. This mechanism is called (not surprisingly) XPath. By using the .NET Framework’s XPath support, we can more quickly retrieve data from a document.
Setting up the test document
We'll work through a couple of examples using XPath, but first we need something to process. The following fragment is a string we'll use for our examples. It's a fragment of a book store inventory database. Each record in the database has the name of the author, the book title, and the number of books in stock. We'll save this string in a variable called $inventory as shown below:
$inventory = @"
<bookstore>
<book genre="Autobiography">
<title>The Autobiography of Benjamin Franklin</title>
<author>
<first-name>Benjamin</first-name>
<last-name>Franklin</last-name>
</author>
<price>8.99</price>
<stock>3</stock>
</book>
<book genre="Novel">
<title>Moby Dick</title>
<author>
<first-name>Herman</first-name>
<last-name>Melville</last-name>
</author>
<price>11.99</price>
<stock>10</stock>
</book>
<book genre="Philosophy">
<title>Discourse on Method</title>
<author>
<first-name>Rene</first-name>
<last-name>Descartes</last-name>
</author>
<price>9.99</price>
<stock>1</stock>
</book>
<book genre="Computers">
<title>Windows PowerShell in Action</title>
<author>
<first-name>Bruce</first-name>
<last-name>Payette</last-name>
</author>
<price>39.99</price>
<stock>5</stock>
</book>
</bookstore>
"@
Now that we have our test document created, let's look at what we can do with it.
The Get-Xpn Helper Function
To navigate through an XML document and extract information, we're going to need an object for XML document navigation. Here is the definition of a function that will create the object we need.
function get-xpn ($text)
{
    $rdr = [System.IO.StringReader] $text
    $trdr = [system.io.textreader]$rdr
    $xpdoc = [System.XML.XPath.XPathDocument] $trdr
    $xpdoc.CreateNavigator()
}
Unfortunately, we can't just convert a string directly into an XPath document. There is a constructor on this type that takes a string, but it uses that string as the name of a file to open. Consequently, the get-xpn function has to wrap the argument string in a StringReader object and then in a TextReader object, which can finally be used to create the XPathDocument. Once we have an instance of XPathDocument, we can use the CreateNavigator() method to get an instance of a navigator object.
$xb = get-xpn $inventory
And now we're ready to go. We can use this navigator instance to get information out of a document. First, let's get a list of all of the books that cost more than $9.
PS (1) > $expensive = "/bookstore/book/title[../price>9.00]"
We'll store the XPath query in the variable $expensive. Let's look at the actual query string for a minute. As you might expect from the name XPath, this query starts with a path into the document
/bookstore/book/title
This path will select all of the title nodes in the document. But, since we only want some of the titles, we extend the path with a qualification. In this case
[../price>9.00]
This says only match paths where the price element is greater than 9.00. Note that a path is used to access the price element. Since price is a sibling of the title element (that is, at the same level), we need to specify this as
../price
This should give you a basic idea of what the query is expressing, so we won't go into any more detail. Now let's run the query using the Select() method on the XPath navigator.
PS (2) > $xb.Select($expensive) | ft value
Value
-----
Moby Dick
Discourse on Method
Windows PowerShell in Action
We pipe the result of the query into format-table because we're only interested in the value of the element. (Remember that what we're extracting here is just the title element.) So this is pretty simple: we can search through the database and get the titles pretty easily. How about if we want to print both the title and price? Here's one way we can do it.
Extracting Multiple Elements
To extract multiple elements from the document, first we'll have to create a new query string. This time we need to get the whole book element, not just the title element, so we can also extract the price element. Here's the new query string:
PS (3) > $titleAndPrice = "/bookstore/book[price>9.00]"
Notice that this time, since we're getting the book instead of the title, we can filter on the price element without having to use ".." to go up a path. The problem now is how do we get the pieces of data we want – the title and price? Well – the result of the query has a property called OuterXml. This property contains the XML fragment that represents the entire book element. We can take this element and cast it into an XML document as we saw earlier in this section. Once we have it in this form, we can use the normal property notation to extract the information. Here's what it looks like:
PS (4) > $xb.Select($titleAndPrice) | %{[xml] $_.OuterXml} |
>> ft -auto {$_.book.price},{$_.book.title}
>>
$_.book.price $_.book.title
------------- -------------
11.99 Moby Dick
9.99 Discourse on Method
39.99 Windows PowerShell in Action
The call to Select() is like what we saw earlier. Now we take each object and process it using the foreach-object cmdlet. First we take the current pipeline object, extract the OuterXml string, then cast that string into an XML document, and pass that object through to the format-table cmdlet. We use scriptblocks in the field specification to extract the information we want to display.
Performing Calculations on Elements
Let’s look at one final example. We want to get the total price of all of the books in the inventory. This time we'll use a slightly different query.
descendant::book
This query selects all of the book elements that are descendants of the current node, which for our navigator is the root of the document. This is a more general way of selecting elements in the document. We'll pipe these elements into foreach-object. Here we'll specify scriptblocks for each of the begin, process, and end steps in the pipeline. In the begin scriptblock, we'll initialize $t to zero to hold the result. In the process scriptblock, we convert the current pipeline object into an [xml] object as we saw in the previous example. Then we get the price member, convert it into a [decimal] number, multiply it by the number of books in stock, and add the result to the total. The final step is to display the total in the end scriptblock. Here's what it looks like when it's run:
PS (5) > $xb.Select("descendant::book") | % {$t=0} `
>> {
>> $book = ([xml] $_.OuterXml).book
>> $t += [decimal] $book.price * $book.stock
>> } `
>> {
>> "Total price is: `$$t"
>> }
>>
Total price is: $356.81
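Converting each node to an [xml] document is convenient, but the navigator can also be queried directly. The following sketch computes the same kind of total on a two-book inventory by calling SelectSingleNode() on each navigator in the pipeline. It is an alternative under stated assumptions, not the approach used above: the data is a shortened copy of the inventory, and the variable names are ours.

```powershell
$text = @'
<bookstore>
  <book><title>Moby Dick</title><price>11.99</price><stock>10</stock></book>
  <book><title>Discourse on Method</title><price>9.99</price><stock>1</stock></book>
</bookstore>
'@

# Wrap the string in a StringReader so that XPathDocument parses it as
# content and does not treat it as a file name.
$xpdoc = New-Object System.Xml.XPath.XPathDocument -ArgumentList ([System.IO.StringReader] $text)
$nav = $xpdoc.CreateNavigator()

$total = [decimal] 0
$nav.Select('/bookstore/book') | ForEach-Object {
    # Each pipeline item is a navigator positioned on one book element,
    # so relative paths such as 'price' are evaluated against that book.
    $price = [decimal] $_.SelectSingleNode('price').Value
    $stock = [int] $_.SelectSingleNode('stock').Value
    $total += $price * $stock
}
"Total price is: `$$total"
```

Each item is processed inside the pipeline as it arrives. Collecting the navigators into an array first may not behave the same way, since an iterator is free to reuse its current navigator.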
Having looked at building an XML path navigator on a stream, can we use XPath on an XML document itself? The answer is yes. In fact, it can be much easier than what we've seen previously. First let's convert our inventory into an XML document.
PS (6) > $xi = [xml] $inventory
The variable $xi now holds an XML document representation of the bookstore inventory. Let's select the genre attribute from each book:
PS (7) > $xi.SelectNodes("descendant::book/@genre")
#text
-----
Autobiography
Novel
Philosophy
Computers
This query says "select the genre attribute (indicated by the @) from all descendant elements called book." Now let's revisit another example from earlier in this section and display the books and prices again.
PS (8) > $xi.SelectNodes("descendant::book") |
>> ft -auto price, title
>>
price title
----- -----
8.99 The Autobiography of Benjamin Franklin
11.99 Moby Dick
9.99 Discourse on Method
39.99 Windows PowerShell in Action
This is simpler than the earlier example because SelectNodes() on an XmlDocument returns XmlElement objects, which PowerShell adapts and presents as regular objects. With the XPathNavigator.Select() method, we're returning XPathNavigator nodes, which aren't adapted automatically. As we can see, working with the XmlDocument object is the easiest way to work with XML in PowerShell, but there may be times when you need to use the other mechanisms, either for efficiency reasons (XmlDocument loads the entire document into memory) or because you're adapting code from another language.
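The predicate syntax from the navigator examples carries over unchanged to SelectNodes() and SelectSingleNode() on an XmlDocument. The sketch below uses a two-book inventory of our own to filter on price and to fetch a single attribute; it assumes the adapter exposes the Value property of an attribute node, as current PowerShell versions do.

```powershell
$xi = [xml] @'
<bookstore>
  <book genre="Autobiography"><title>The Autobiography of Benjamin Franklin</title><price>8.99</price></book>
  <book genre="Novel"><title>Moby Dick</title><price>11.99</price></book>
</bookstore>
'@

# Filter with a predicate, then use property notation on the adapted elements.
$titles = $xi.SelectNodes('/bookstore/book[price>9.00]') |
    ForEach-Object { $_.title }

# SelectSingleNode() returns only the first match, here an attribute node.
$genre = $xi.SelectSingleNode('/bookstore/book[title="Moby Dick"]/@genre').Value

$titles
$genre
```

SelectSingleNode() is the better fit when a query is expected to match exactly one node, because there is no list to unwrap.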
In this section, we've demonstrated how you can use the XML facilities in the .NET framework to create and process XML documents. As the XML format is used more and more in the computer industry, these features will become critical. We've only scratched the surface of what is available in the .NET framework. We've only covered some of the XML classes and a little bit of the XPath query language. We haven't discussed how to use XSLT (Extensible Stylesheet Language Transformations), which is supported by the System.Xml.Xsl namespace. All of these tools are directly available from within the PowerShell environment. In fact, the interactive nature of the PowerShell environment makes it an ideal place to explore, experiment, and learn about XML.
The last topic we're going to cover on XML is the set of cmdlets for importing and exporting objects from PowerShell. These cmdlets provide a way to save and restore collections of objects from the PowerShell environment. Let's take a look at how things are serialized.
Note. Serialization is the process of saving an object or objects to a file or a network stream. The components of the objects are stored as a series of pieces, hence "serialization." PowerShell uses a special type of "lossy" serialization where the basic shape of the objects is preserved but not all of the details. More on this in a minute.
First we'll create a collection of objects.
PS (1) > $data = @{a=1;b=2;c=3},"Hi there", 3.5
Now serialize them to a file using the Export-CliXml cmdlet:
PS (2) > $data | Export-Clixml out.xml
Let's see what the file looks like:
PS (3) > type out.xml
<Objs Version="1.1" xmlns="http://schemas.microsoft.com/powershe
ll/2004/04"><Obj RefId="RefId-0"><TN RefId="RefId-0"><T>System.C
ollections.Hashtable</T><T>System.Object</T></TN><DCT><En><S N="
Key">a</S><I32 N="Value">1</I32></En><En><S N="Key">b</S><I32 N=
"Value">2</I32></En><En><S N="Key">c</S><I32 N="Value">3</I32></
En></DCT></Obj><S>Hi there</S><Db>3.5</Db></Objs>
It's not very readable, so we'll use the Dump-Doc function from earlier in the chapter to display it:
PS (4) > dump-doc out.xml
<Objs Version = "1.1"xmlns = "http://schemas.microsoft.com/power
shell/2004/04">
This first part identifies the schema for the CLIXML object representation.
<Obj RefId = "RefId-0">
<TN RefId = "RefId-0">
<T>
System.Collections.Hashtable
</T>
<T>
System.Object
</T>
</TN>
<DCT>
<En>
Here are the key/value pair encodings:
<S N = "Key">
a
</S>
<I32 N = "Value">
1
</I32>
</En>
<En>
<S N = "Key">
b
</S>
<I32 N = "Value">
2
</I32>
</En>
<En>
<S N = "Key">
c
</S>
<I32 N = "Value">
3
</I32>
</En>
</DCT>
</Obj>
Next comes the encoding of the string element:
<S>
Hi there
</S>
And the double-precision number.
<Db>
3.5
</Db>
</Objs>
Now let's import these objects back into the session.
PS (5) > $new_data = Import-Clixml out.xml
And compare the old and new collections.
PS (6) > $new_data
Name                           Value
----                           -----
a                              1
b                              2
c                              3
Hi there
3.5
PS (7) > $data
Name                           Value
----                           -----
a                              1
b                              2
c                              3
Hi there
3.5
And they match.
These cmdlets provide a simple way to save and restore collections of objects, but they have limitations. They can only load and save a fixed number of primitive types. Any other type is "shredded" – that is, broken apart into a property bag composed of these primitive types. This allows any type to be serialized, but with some loss of fidelity. In other words, objects can't be restored to exactly the same type they were originally. This approach is necessary because there can be an infinite number of object types, not all of which may be available when the file is read back. Sometimes you don't have the original type definition, and other times there's no way to re-create the original object even with the type information because the type does not support this operation. By restricting the set of types that are serialized with fidelity, the CLIXML format can always recover objects regardless of the availability of the original type information.
There is also another limitation on how objects are serialized. An object has properties. Those properties are also objects that have their own properties, and so on. This chain of properties that have properties is called the serialization depth. For some of the complex objects in the system, such as the Process object, serializing through all of the levels of the object results in a huge XML file. To constrain this, the serializer only traverses to a certain depth. The default depth is 2. This default can be overridden either on the command line using the -depth parameter or by placing a <SerializationDepth> element in the type's description file. If you look at $PSHome/types.ps1xml, you can see some examples of where this has been done.
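Both limitations are easy to observe. The sketch below round-trips the collection from the earlier example, then does the same with a Process object at a depth of 1, and compares the type names that come back. The temporary file name is arbitrary, and the "Deserialized." prefix on the type name is how we expect a shredded object to be marked.

```powershell
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'clixml-sketch.xml'

# A hashtable, a string, and a double are stored with full fidelity.
$data = @{a=1; b=2; c=3}, "Hi there", 3.5
$data | Export-Clixml -Path $path
$restored = Import-Clixml -Path $path
$restoredTypes = $restored | ForEach-Object { $_.GetType().Name }

# A Process object is shredded into a property bag; -Depth 1 limits how
# far the serializer follows properties of properties.
Get-Process -Id $PID | Export-Clixml -Path $path -Depth 1
$shredded = Import-Clixml -Path $path
$shreddedType = $shredded.psobject.TypeNames[0]

Remove-Item -Path $path
$restoredTypes -join ', '
$shreddedType
```

The restored process object keeps property values such as Id, but it is a snapshot: the methods of the live Process type are not available on it.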
This concludes our three part series on text and file processing in PowerShell. In the first part, we looked at basic string processing. In part 2, we extended this to work with text and binary files. Finally in part three we switched away from processing unstructured text and looked at XML documents. This series gives you a solid foundation for processing text with PowerShell. But working with text, while important, is really only a small part of what you can do with PowerShell. In Windows PowerShell in Action we cover a much wider array of topics including Windows automation with WMI and COM, and .NET programming including some simple Windows Forms and graphics scripting.
This material was excerpted from the book Windows PowerShell in Action from Manning Publications.
Excerpt from Windows PowerShell in Action
ISBN 932394-90-7
Copyright 2007 Manning Publications
All rights reserved