Look, no hands
Author: Richard Costall
One of my favourite features of Windows Vista is the new set of speech recognition tools.
Speech recognition has been around for a while, not baked into the operating system
but available via third party products. With Windows Vista you get a truly integrated
experience. I decided to set myself a challenge and see if I could complete five tasks
without using the keyboard or the mouse. The majority of this document has been
generated using speech recognition. Sure, it gets the odd word wrong, like ‘sister’
for Vista, but generally it makes a pretty good job of translating speech into text.
For this article I’m working with Vista Release Candidate 1 (build 5600). I have
not trained the speech recognition and I have never used any speech recognition
products before.

To get speech recognition working, use the Vista Search to find it. The dialog is
fairly unobtrusive and gives you audio levels and textual feedback on your speech.
You can view a list of options for speech with “Show Speech options” and running
through the tutorial will make you familiar with the majority of the commands. A
Speech Recognition trainer application will help the system to better recognise
your voice.
If you are ever unsure about what you can say in Speech Recognition, “What can I
say” is invaluable.
Challenge one - notepad
So here we go with the challenges. First up is creating a Notepad document, saving
it and then reopening it.
“Run Notepad”
“Hello Speech Recognition”
“File”
“Save As”
“Test.txt”
“Save”
“Close Notepad”
“Run Notepad”
“File Open”
“Test”
“Open”
“Close”

fig1: Hello Speech Recognition
So Vista passed challenge one with flying colours. The commands are fairly obvious
and it got them all correct first time. It was slightly confused over the filename,
but it prompted me with a dialogue box and a series of numbered options. All I had
to do was say the number next to the filename I wanted.

fig2: Which did you want?
Challenge two - viewing pictures
What makes this challenge a lot easier is the new Vista feature which gives you
the ability to add tags to your images, pictures and media. Tagging pictures makes
it really easy to search for them from the Start menu. For this challenge I will
search for pictures tagged with a particular keyword, pick one, and then navigate
through the pictures.
“start”
“Oliver”
“2163”
Windows Vista will automatically search through all the metadata and filenames to
find items that match the entered text. Speaking the whole filename, “image_2163”,
didn’t get the match, but just saying the number “2163” found the file and ran Windows
Photo Gallery. It is now quite easy to navigate through the other pictures and indeed
change the orientation. In fact, in my example one of the following files was actually
a video, which was immediately played.

fig3: searching for pictures couldn’t be easier
“next”
“previous”
“next”
“actual size”
“fit to window”
“rotate clockwise – rotate counter clockwise”
Challenge two went without any problems. In fact I even started to play around with
the “rotate clockwise” and “fit to window” commands, both of which worked instantly.
So on to challenge three, time to open up the throttle and see what this baby can
do!
Challenge three – hello world application
As a developer, I spend my life in Visual Studio 2005. There has been a lot of noise
around its compatibility with Windows Vista, but how would the speech recognition
fare on the first challenge with an application not shipped with the operating system?
“run visual studio”
“file”
Visual Studio fired up and we were ready to go, so it was time to create a brand new
project. All of a sudden things were not looking good. The speech recognition didn’t
seem to recognize the File menu. The prompt asked me about the “Bluetooth File Transfer
Wizard” or the “Remote File Wizard”, neither of which was any good to me. Fortunately
you can force a click anywhere on the screen by using the Mousegrid command.

fig4: I want to create a new windows application, not do a Bluetooth file transfer
Mousegrid draws a three by three grid on the screen and you say the number of the
cell you want to zoom in on. Vista then draws a new three by three grid within that
cell, allowing you to home in on the item you require.
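To make the zooming idea concrete, here is a minimal Python sketch of how a spoken sequence of grid numbers could narrow down to a single point on the screen. It is only an illustration, not how Vista implements Mousegrid. The function names are made up for this example, and it assumes the cells are numbered 1 to 9, left to right and top to bottom.

```python
from typing import Iterable, Tuple

Rect = Tuple[float, float, float, float]  # (left, top, width, height)


def mousegrid_cell(screen: Rect, digits: Iterable[int]) -> Rect:
    """Narrow a rectangle by repeatedly picking one cell of a 3x3 grid.

    Assumption: cells are numbered 1-9, left to right, top to bottom.
    """
    left, top, width, height = screen
    for digit in digits:
        if not 1 <= digit <= 9:
            raise ValueError(f"grid cell must be 1-9, got {digit}")
        row, col = divmod(digit - 1, 3)
        # Each spoken digit shrinks the area to one ninth of its size.
        width /= 3
        height /= 3
        left += col * width
        top += row * height
    return (left, top, width, height)


def click_point(rect: Rect) -> Tuple[float, float]:
    """Return the centre of a rectangle, where a spoken "click" would land."""
    left, top, width, height = rect
    return (left + width / 2, top + height / 2)


# Hypothetical 900x900 screen: pick the top-left cell, then the middle
# cell of the new grid drawn inside it.
cell = mousegrid_cell((0, 0, 900, 900), [1, 5])
print(cell, click_point(cell))
```

Each extra number divides the target area by nine, which is why three or four numbers are usually enough to land on something as small as a toolbar button.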

fig5: Ultimate power with the MouseGrid command
“mousegrid 1-1-6 – click”
“down, down, enter”
“tab enter”
Visual Studio is an incredibly complex product, made all the harder by the fact I’ve
never used speech recognition before. There may be better ways to do this, but I have
to complete the challenge by whatever means. I am now looking at the form designer
with the next goal being to create a button.
“mousegrid 1-4-6-click”
“mousegrid 1-5-5-click”
We now have a button lying in the top left hand corner of the screen, but how do
we drag and drop it to get the positioning required?
“mousegrid 1-8-2 mark 4-6-6 click”
That was actually a lot easier than I was expecting. Using the Mousegrid command
to home in on the item and then saying “mark” selects the item to be dragged. Using
Mousegrid again to pick the destination and then saying “click” drops the control.
We’re not going to win any awards for user interface design but we now have a button
on the form in the place we wanted it. If you are a real perfectionist about layout,
you can then use the “down”, “right”, “left” and “up” commands for precise
positioning.

fig6: Drag and Drop speech style
“mouseGrid 4-6-6 double click”
In Visual Basic .NET, double clicking on the button creates the handler for the click
event. All we need to do now to complete the challenge is to show “hello world”
in a message box dialog when the user clicks on the button. Surely it cannot be
that hard.
The code view did not seem to recognize “message box” or “msgbox”. After many attempts
I discovered the “start typing” command. This command gives you a very granular
level of speech recognition, character by character. To end the typing mode you
simply say “stop typing”. We can now run the application by clicking on the toolbar;
this is going to be a precise click though...

fig7: Our “Hello World” code.
“mousegrid 2-1-9-1-3 click”
There we have it: up pops our application, and saying “enter” shows a message box.
We can even say “button one”, as it matches the text on the control. Our
new and shiny speech enabled application can be shut down by saying “close”.
So in just 10 minutes we have built an application in Visual Studio without touching
the keyboard whatsoever. As you get more and more used to speech recognition things
will happen a lot quicker for you.
Challenge four – solitaire
In my challenges I am trying to cover many of the important applications in Windows
Vista. Enter Solitaire. Using Vista Speech Recognition, can I play one game of solitaire
to its logical conclusion? Vista ships with two versions of solitaire, the standard
edition and a new spider solitaire. For this challenge we will be using the standard
solitaire that we all know and love.

fig8: Which Solitaire would you like to play?
“run solitaire”
“2 , Ok”
“Seven of Diamonds – Eight of Spades”
“Six of Spades – Seven of Diamonds”
“king of Clubs – Stack one”
“Ace of Clubs, Double Click”

fig9: Show Numbers, highlights all clickable controls.
It’s amazing how easy this is. If the recognition is unsure about which card you
are talking about, it highlights all those that match, for example all the hearts
if you mumble “something of hearts”. As with every game of solitaire you soon run
out of face up cards and have to resort to turning over from the deck.
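The “highlight everything that matches” behaviour can be pictured with a small Python sketch. This is a toy word matcher, not Vista’s recogniser. The `candidate_cards` function is hypothetical, and it assumes the recogniser passes on only the words it actually caught.

```python
from typing import Dict, List


def candidate_cards(heard: str, face_up: List[str]) -> Dict[int, str]:
    """Number every face-up card that contains all of the words heard."""
    words = heard.lower().split()
    matches = [
        card for card in face_up
        if all(word in card.lower().split() for word in words)
    ]
    # Numbering the candidates lets the player finish the choice by
    # saying a number, as with the numbered prompts described earlier.
    return {number: card for number, card in enumerate(matches, start=1)}


face_up = ["Seven of Diamonds", "Eight of Spades", "Two of Hearts", "Queen of Hearts"]
print(candidate_cards("of hearts", face_up))
```

A fully heard phrase such as “seven of diamonds” leaves a single candidate, so no prompt would be needed.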

fig10: Which hearts did you want?
“Deal Space” will highlight the face up card, or you can use “show numbers” which
highlights clickable objects on the window. There were a couple of occasions where
I needed to get the next face down card, and the recognition wouldn’t let me use
“deal space” or “show numbers” to access it. Once again it was “Mousegrid” to the
rescue and we were back on track.
“mousegrid 1 8 click”
“King of Diamonds – Stack two”
It did take a while to get used to a couple of minor quirks, but it soon became
a fun, and not frustrating, exercise. It was this feature alone that had my parents
asking about when Vista would be available and how much it would cost. I must admit
my ability to play solitaire broke before the speech recognition, and I’m glad the
challenge was to play the game to its conclusion and not complete it.

fig11: Help me anybody?
Challenge five - search for the NxtGenUG website and view an interview
The next big question I had was how the speech recognition would cope with web pages
and, more importantly, web technologies such as DHTML menus. So this time the challenge
is to navigate to the Next Generation User Group (NxtGenUG) website and then view
one of the speaker interviews.
“run internet explorer”
“search 2 ok”
“NxtGenUG – enter”
The built in search within Internet Explorer 7 made it really easy to search for
the NxtGenUG site. What I hadn’t done was to connect my wireless. So a quick detour
for the challenge...
“Start - Connect to”
“down – enter”
The wireless dialog fired up and we were soon connected. I could now see matches
for NxtGenUG, but how would you click on them?
“show numbers”
“37 – click”

fig12: How to click a hyperlink
The “show numbers” command identified and highlighted all the hyperlinks on the
page. This makes navigation straightforward. In addition to straight hyperlinks,
the NxtGenUG website uses a DHTML/JavaScript menu. So how would the speech recognition
cope with hovering over a menu option? Sometimes things are just too obvious.

fig13: Navigating around the NxtGenUG site.
“Show Numbers - 30 Hover”
“interviews”
“Mike Hall”

fig14: View the User Group interviews
The last part of that was truly impressive. The speech recognition parsed the text
that makes up the hyperlink.
“Dave and Rich join up with Microsoft's Mike Hall who works on the Windows Embedded
team, to find out what's happening in the embedded world.”
In this case it’s the full article abstract and not just the person’s name. For
example, “Voices for Innovation” or “Innovation” would have navigated to the Julie
Ray interview.
“Dave, armed with a 'turned off' phone, met up with Julie Ray - Program Manager
for Voices For Innovation (VFI) to find out what VFI is all about...”
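As a rough picture of why saying just “Innovation” is enough, the following Python sketch matches a spoken phrase against the full text of each hyperlink. It is only an assumption about the behaviour I observed, not a description of how Vista or Internet Explorer actually does it, and `links_matching` is a made-up name.

```python
from typing import List

# The two link abstracts quoted in this article.
LINKS = [
    "Dave and Rich join up with Microsoft's Mike Hall who works on the Windows "
    "Embedded team, to find out what's happening in the embedded world.",
    "Dave, armed with a 'turned off' phone, met up with Julie Ray - Program "
    "Manager for Voices For Innovation (VFI) to find out what VFI is all about...",
]


def links_matching(phrase: str, link_texts: List[str] = LINKS) -> List[str]:
    """Return every link whose text contains the spoken phrase, ignoring case."""
    needle = phrase.casefold()
    return [text for text in link_texts if needle in text.casefold()]


for phrase in ("Mike Hall", "Innovation", "Dave"):
    print(phrase, "->", len(links_matching(phrase)), "link(s)")
```

A phrase such as “Dave” appears in both abstracts, which is the kind of case where you would expect to be asked to choose between numbered options.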
The interview can then be viewed using “scroll down” to scroll down the page. From
this point you can see how easy it is to navigate around the web. I was also able
to listen to the latest podcast by navigating to the podcasts page and then selecting
the link to the podcast.

fig15: Listening to the NxtGenUG Podcast.
“start – Nxt – enter”
Vista Search not only trawls through and indexes images, RSS feeds, emails and media,
it also picks up your Internet Explorer favourites. I could have streamlined the
above challenge by firing up the Start menu, searching for “Nxt” and then saying
“enter”, which would find the favourite link to the NxtGenUG web site.
Using speech to surf the web could not have been easier.
Conclusion
Vista Speech is a very powerful feature. Sure, it can be temperamental and it really
needs to be utilised in the peace and tranquillity of the home office. The documentation
seems to be very light, so there’s a lot of trial and error involved. I found the
“Mousegrid” and “Show Numbers” commands to be invaluable, and with a combination of
the two you can cope with the majority of situations that you come across
in applications.
The above five challenges were just things I thought up. I am sure not all applications
will be as easy to use or even appropriate for speech recognition. I must admit
I found working with Word quite difficult, as documentation is sparse and there
are so many things you can be doing at any time, for example correcting spelling,
moving text, changing styles. Nothing that a little bit of background reading couldn’t
put right!
Right, that’s it, now for some unfinished business...
“run Solitaire”
About the Author
Richard Costall (MVP, MCSD.NET) has over 19 years’ development experience and works
for 1st Software, a Microsoft Gold Partner, and the UK's leading software solution
for Financial Advisers and Intermediaries, designing and implementing IFA applications
in the financial services sector. Previously specializing in VB, XML/XSLT, COM,
ASP and MSMQ, Richard now lives and breathes the awesome world of .NET and in particular
ASP.NET (including 2.0). Richard spent 5 1/2 years as the Midlands regional coordinator
for VBUG (Visual Basic User Group) before co-founding NxtGenUG, the innovative UK
user group for Microsoft Technologies.
Richard has written articles for publications such as ASP.NET Pro and International
Developer Magazines and also co-authored the Apress title Professional MSMQ. He
speaks at local user groups, Microsoft Conferences/Product launches, TechED Europe
and the hugely successful DeveloperDeveloperDeveloper Events.
When not in .NET land, Richard enjoys relaxing at home with his wife and two sons,
playing on the XBOX 360 or ultimately jetting off to Walt Disney World, Florida,
for a trip on the Tower of Terror.