About the Author
Richard Siddaway is Microsoft Practice Leader for Centiq Ltd, a Microsoft Gold Partner specializing in management, measurement, optimization and migration involving Microsoft technologies. The founder and president of the UK PowerShell User Group, Richard also writes Richard Siddaway's Blog of PowerShell and Other Things.
This is not the world’s first PowerShell-based chat-up line! This is the first event in the 2008 Scripting Games! So, what are we supposed to do for the first event?
Upon first reading through the instructions it seems we have to generate a word from a seven-digit telephone number. When I first looked at the Scripting Games programme information that was issued a few weeks ago I started looking at the phones in the house and started to plan the code. I then read through the full instructions and got to a bit that said “For example, the word you create for 732-3464 must start with the letter P, R, or S. Why? Because the phone number starts with the number 7, and, on the standard phone dial, those three letters are the only letters associated with the number 7.” I had to immediately revise my plans because a UK phone has the letter Q associated with the number 7. Likewise the number 9 has Z associated with it. I found out later that phones in the USA do not have the letters Q and Z on the dial. On reading the instructions again I saw that the table associating digits and letters was quite clear – I had skimmed that part first time through! The moral of this little side trip is: read the instructions.
Having got that bit out of the way what other requirements are there? Reading through the instructions I pulled the following requirements:
1. Derive a seven letter word from the phone number that is input via the command line or a message box – this is PowerShell so I will start with the command line.
2. It must be a single seven-letter word and it must be in the wordlist.txt file that is supplied to competitors – that means we have to read the file at some stage.
3. Can assume that the seven digits will be entered without a hyphen - so we do not need to write code to remove hyphens.
4. Only one solution should be displayed and it must be correct – note that it does not say the first correct solution.
5. The example shows the answer in upper case but the contents of wordlist.txt are in a mixture of case. No definite statement is made as to whether the answer should be returned in a particular case. I will return it in upper case as it looks more like the phone dial and matches the examples.
6. The wordlist.txt file will be in the c:\scripts folder when the scripts are tested.
I have a fairly direct approach to scripting. It needs to meet these criteria:
• It has to work
• It has to deliver the correct result
• It has to run in a reasonable time
• It has to still be understandable six months from now
• If necessary a brute force approach is acceptable
My introduction to scripting was from an administrator’s viewpoint, and I will usually put in several small steps rather than go for a very clever piece of code that I won’t understand in a few months. My view is that, in administrative scripting, getting the result is more important than elegant code.
How am I going to solve this puzzle then?
First thing I did was to investigate Wordlist.txt. Naturally I used PowerShell!
$words = Get-Content "C:\Scripts\wordlist.txt"
$words.count
32153
So we have 32153 possible words. That is a lot to search through. I decided to open up the file and have a look at the contents. This shows a great mixture of words. The variety comes from different cases being used; from spaces, hyphens and apostrophes being present in some words, and from different lengths of word.
My immediate thought was that this suggests we can reduce the number of words being considered as possible solutions.
$words7 = @()
$words | Where{$_.Length -eq 7} | Foreach {$words7 += $_}
$words7.length
4159
Our possible solution set has been reduced to a fraction of the starting position. This could make life a lot easier as at least some of the words would contain characters other than letters.
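The same filtering can be sketched a little more directly by letting the pipeline hand back the array. The sample list below is made up for illustration and stands in for wordlist.txt, and the second, letters-only filter is an extra step of my own rather than part of the script above.

```powershell
# Hypothetical sample standing in for the contents of wordlist.txt
$sample = @('reading', "don't", 'Seattle', 'ice-cap', 'abdomen', 'cat', 'o clocks')

# Keep the seven-character entries; @() forces an array even for 0 or 1 hits
$sample7 = @($sample | Where-Object { $_.Length -eq 7 })

# Narrow further to entries that are seven letters and nothing else
# (-match is case-insensitive, so 'Seattle' passes)
$letters7 = @($sample7 | Where-Object { $_ -match '^[a-z]{7}$' })

$sample7.Count     # 4
$letters7.Count    # 3
```

Wrapping the pipeline in @() avoids growing the array with += inside a Foreach, and it means .Count behaves the same whether there are no matches, one match or many.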
I thought of a number of possible solutions as I was looking at the contents of wordlist.txt.
1. Convert the number to a word and then compare to the file contents
2. Work through the word list converting the word to a number and comparing that to the number that was input on a digit-by-digit basis
3. Work through the word list converting the word to a number and then comparing that to the whole number that was input
4. Use regular expressions to compare the number to the words in the file
Option 1 would not be quick to run. Each digit translates to three possible letters, so for a seven-digit number we have 3x3x3x3x3x3x3 possible words to check, which comes to 2,187 possible combinations of letters. We would have to scan through the contents of wordlist.txt up to 2,187 times, which means a possible total of 70,318,611 comparisons. We should not need that many, but it seems like this approach could take a long time to run! Even if we used only the seven-letter words we may be looking at up to 9,095,733 comparisons. In addition, the code to create the words would be complicated. This option was quickly ruled out as it takes too long to run and it is too complicated to write, especially for the Scripting Games, where there are deadlines on delivering the solutions and the activity takes place outside of working hours.
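The arithmetic behind those worst-case figures is easy to check at the prompt. This is only a quick sanity check; the variable names are mine.

```powershell
$combinations = [math]::Pow(3, 7)    # three letters per digit, seven digits
$allWords     = 32153                # entries in wordlist.txt
$sevenOnly    = 4159                 # seven-character entries

$combinations                        # 2187
$combinations * $allWords            # 70318611
$combinations * $sevenOnly           # 9095733
```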
This suggests that starting with the word list might be a better approach. My next thought was that we could work through the word list and convert the words to numbers. This is an easier approach, because each letter maps to a single digit; that means a simple look-up will suffice. By working with the words we have to read through the file contents only once, which is a huge improvement. We can split this approach into two slightly different answers. Option 2 in the list works through the word and converts each letter to the corresponding digit. We then compare that digit to the digit in the same position in the phone number. For example if our word is abdomen and our phone number is 7435698 we convert the first letter (a) to the corresponding digit (2) and compare it to the first digit in the phone number. They do not match so we stop the comparison and move to the next word. The code to control looping through the letters becomes a bit involved and we need to either conduct an additional check on the word length or to first skim out the seven-letter words as shown earlier.
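I did not write option 2 up as a full script, but a minimal sketch of the digit-by-digit comparison might look like the following. The function name Test-WordAgainstNumber and the trick of working out the digit from the letter's position in a 24-letter string are illustrative assumptions, not something taken from the competition materials.

```powershell
# The 24 letters on a US dial in order (no Q or Z); every three letters share a digit
$dialLetters = 'ABCDEFGHIJKLMNOPRSTUVWXY'

# Hypothetical helper: $true only if every letter maps to the digit in the same position
function Test-WordAgainstNumber([string]$Word, [string]$Number) {
    if ($Word.Length -ne $Number.Length) { return $false }
    $upper = $Word.ToUpper()
    for ($i = 0; $i -lt $upper.Length; $i++) {
        $pos = $dialLetters.IndexOf($upper[$i])
        if ($pos -lt 0) { return $false }    # not a dial letter (space, hyphen, Q, Z ...)
        # positions 0-2 belong to digit 2, positions 3-5 to digit 3, and so on
        $digit = [string](($pos - ($pos % 3)) / 3 + 2)
        # stop at the first mismatch instead of converting the rest of the word
        if ($digit -ne $Number.Substring($i, 1)) { return $false }
    }
    return $true
}

Test-WordAgainstNumber 'abdomen' '7435698'   # False: a maps to 2, not 7
Test-WordAgainstNumber 'reading' '7323464'   # True
```

The early return is what distinguishes this from option 3: most words are rejected on their first or second letter, at the cost of slightly fiddlier loop control.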
Option 3 is similar but we convert all of the letters to digits and then compare. This is easier to code but takes slightly longer to run. The code looks like this.
## get the phone number
$num = Read-Host "Please enter 7 digit telephone number"
if ($num.Length -ne 7){
    Write-Host "Number must contain 7 digits but", $num, "contains", $num.Length
    Return
}
# create a lookup hash table
$code = @{"A"="2"; "B"="2"; "C"="2"; "D"="3"; "E"="3"; "F"="3"; "G"="4"; "H"="4"; "I"="4"; "J"="5"; `
    "K"="5"; "L"="5"; "M"="6"; "N"="6"; "O"="6"; "P"="7"; "R"="7"; "S"="7"; "T"="8"; "U"="8"; "V"="8"; `
    "W"="9"; "X"="9"; "Y"="9";}
## now we need to read the file
## loop through the words
Get-Content "C:\Scripts\wordlist.txt" | ForEach-Object {
    ## check length of word
    if ($_.Length -eq 7){
        ## convert to array of upper case
        $wry = $_.ToUpper().ToCharArray()
        $numchar = ""
        For ($i = 0; $i -lt 7; $i++) {
            ## need to check if actually a letter
            if ($wry[$i] -like "[A-Z]") {
                $x = $wry[$i]
                ## build the digital version of the word
                $numchar = $numchar + $code["$x"]
            }
        }
        if ($numchar -eq $num){
            Write-Host $_.ToUpper()
            exit ## stop after first hit
        }
    }
}
We start by using Read-Host to get the phone number. We were told hyphens would not be used, but to ensure that typos are caught I check the number of digits using the Length property. We need to perform a lookup on the letters, and the easiest way to do that is via a hash table. It is a pain to have to create it this way, and I suspect a better way will come to me the day after the event closes. Now we get to the grunt work of the script. Use Get-Content to read the word list. That cmdlet saves so much effort; it is one of the best ideas in the whole of PowerShell. We then go into a ForEach-Object loop.
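One possible "better way" to create the hash table, offered only as a sketch, is to type the eight digit groups once and let a loop invert them into the letter-to-digit table. The $groups variable is my own name; $code is built to hold the same 24 entries as the hand-typed version.

```powershell
# Digit-to-letters groups for a US dial; Q and Z are deliberately absent
$groups = @{ '2'='ABC'; '3'='DEF'; '4'='GHI'; '5'='JKL'; '6'='MNO'; '7'='PRS'; '8'='TUV'; '9'='WXY' }

# Invert the groups into the letter-to-digit table used for the lookups
$code = @{}
foreach ($digit in $groups.Keys) {
    foreach ($letter in $groups[$digit].ToCharArray()) {
        $code[[string]$letter] = $digit    # store the key as a string, not a [char]
    }
}

$code.Count    # 24
$code['S']     # 7
```

There is less to mistype this way, and the groups read exactly like the table in the event instructions.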
Each word coming into the loop is checked for length. If the length is seven we convert the word to an array of uppercase characters. This may not be necessary but it keeps everything consistent and removes a possible source of error. Note the chaining of the methods. This is perfectly permissible in .NET and saves a line of code (and typing). A variable, $numchar, is created to hold the digits and initialised as empty. A for loop is used to iterate through the letters of the word and, after checking that it is a letter, we look up the corresponding digit. Note that I set a variable to the letter and then use that in the look up. PowerShell did not like a construction of $code[$wry[$i]] and after attempting the obvious options I just went for the simple approach. Remember, if it works it’s good.
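A likely explanation for the failed $code[$wry[$i]] lookup (my assumption, as I did not dig into it at the time) is that ToCharArray() produces [char] values while the hash table keys are strings, and a [char] key is not treated as equal to a one-character string key. A small sketch shows the difference and why "$x" or a [string] cast gets round it.

```powershell
$code = @{ 'A' = '2'; 'B' = '2'; 'C' = '2' }
$wry  = 'cab'.ToUpper().ToCharArray()

$viaChar   = $code[$wry[0]]            # [char] key: no entry is found
$viaString = $code[[string]$wry[0]]    # cast to string: finds the entry

$null -eq $viaChar    # True
$viaString            # 2
```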
Having built up our set of digits we do a simple compare to the phone number input at the beginning of the script. If they match we write out the word in uppercase as it matches the phone dial. The exit statement causes our script to stop, ensuring we output only one answer.
This option gives a better result but it does not really show off the strengths of PowerShell. We could have produced solutions like this with any scripting language.
At this point I thought that option 3 would be the one I would probably use. A chance conversation (thanks James) raised the subject of regular expressions. I do not really like regular expressions and I have not used them that much, so they never really surface as a natural part of my scripting. However in this case they seemed like a very suitable answer, as I would be able to produce a short, easy-to-understand script that would solve the problem.
## get the phone number
$num = Read-Host "Please enter 7 digit telephone number"
if ($num.Length -ne 7){
    Write-Host "Number must contain 7 digits but", $num, "contains", $num.Length
    Return
}
## split number into array
$chary = $num.ToCharArray()
$rgx = ""
## now build the regular expression
## note that Q and Z don't appear on a US dial
For ($i = 0; $i -lt $chary.psbase.Length; $i++){
    switch ($chary[$i]){
        "2" {$r = "[ABC]"}
        "3" {$r = "[DEF]"}
        "4" {$r = "[GHI]"}
        "5" {$r = "[JKL]"}
        "6" {$r = "[MNO]"}
        "7" {$r = "[PRS]"}
        "8" {$r = "[TUV]"}
        "9" {$r = "[WXY]"}
    }
    $rgx = $rgx + $r
}
## now we need to read the file
## and find a word
$word = Get-Content "c:\scripts\wordlist.txt" | where {($_.Length -eq 7) -and ($_.ToUpper() -match $rgx)} | Select -First 1
if ($word) {Write-Host $word.ToUpper()} ## nothing to write if no word matched
As in the first script we get the phone number and check the number of digits. The number is split into an array of characters. We loop through the array of characters, building a regular expression to use in the matching. For example 7323464 will mean $rgx is equal to "[PRS][DEF][ABC][DEF][GHI][MNO][GHI]". Each letter in the word is checked against the equivalent character class for its position. Because the pattern contains seven character classes and the word must be exactly seven characters long, a match means every letter fits its digit. The whole comparison operation takes a single statement, so we begin to eliminate some of the loops. My initial attempt at this version used Get-Content and a Foreach loop as I had before. However, a little more thought made me realize that I could do everything on the pipeline. We set a variable to the output of the pipeline that starts with the Get-Content. A where filter (Where-Object) is used to check the length of the word and determine if the word matches our regular expression; now that is real code efficiency. To ensure we get only a single answer we use Select with -First 1 to pass only the first match. We then write out the answer. It would be possible to combine these last two lines but I think it is clearer this way.
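As a variation, the pattern building can be wrapped in a small function that looks each digit up in a hash table and anchors the result with ^ and $, which makes the separate length test unnecessary. This is a sketch only: ConvertTo-DialPattern is a hypothetical name, the three-word sample list stands in for wordlist.txt, and returning $null for the digits 0 and 1 (which the switch does not cover) is my own choice.

```powershell
# Hypothetical helper: turn a digit string into an anchored character-class pattern
function ConvertTo-DialPattern([string]$Number) {
    $classes = @{ '2'='[ABC]'; '3'='[DEF]'; '4'='[GHI]'; '5'='[JKL]';
                  '6'='[MNO]'; '7'='[PRS]'; '8'='[TUV]'; '9'='[WXY]' }
    $pattern = '^'
    foreach ($digit in $Number.ToCharArray()) {
        $class = $classes[[string]$digit]
        if (-not $class) { return $null }    # 0 and 1 carry no letters, so no word can match
        $pattern += $class
    }
    return $pattern + '$'
}

$rgx = ConvertTo-DialPattern '7323464'
$rgx    # ^[PRS][DEF][ABC][DEF][GHI][MNO][GHI]$

# Made-up sample list standing in for wordlist.txt
$sample = @('abdomen', 'reading', 'Seattle')
$word = $sample | Where-Object { $_ -match $rgx } | Select-Object -First 1
if ($word) { $word.ToUpper() } else { 'No matching word found' }    # READING
```

Since -match ignores case by default, the ToUpper() call is only needed when writing out the answer.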