MSDN
The Beta Experience

Look, no hands

Author: Richard Costall



One of my favourite features of Windows Vista is the new set of speech recognition tools. Speech recognition has been around for a while, but only through third-party products rather than baked into the operating system. With Windows Vista you get a truly integrated experience. I decided to set myself a challenge and see if I could complete five tasks without using the keyboard or the mouse. The majority of this document has been generated using speech recognition. Sure, it gets the odd word wrong, like ‘sister’ for Vista, but generally it makes a pretty good job of translating speech into text without my touching the keyboard.

For this article I’m working with Vista Release Candidate 1 (build 5600). I have not trained the speech recognition, and I have never used any speech recognition products before.

To get speech recognition working, use the Vista Search to find it. The dialog is fairly unobtrusive and gives you audio levels and textual feedback on your speech. You can view a list of options for speech with “Show Speech options” and running through the tutorial will make you familiar with the majority of the commands. A Speech Recognition trainer application will help the system to better recognise your voice.

If you are ever unsure about what you can say in Speech Recognition, “What can I say” is invaluable.

Challenge one - notepad

So here we go with the challenges. First up is creating a Notepad document, saving it and then reopening it.

“Run Notepad”

“Hello Speech Recognition”

“File”

“Save As”

“Test.txt”

“Save”

“Close Notepad”

“Run Notepad”

“File Open”

“Test”

“Open”

“Close”

fig1: Hello Speech Recognition

So Vista passed challenge one with flying colours. The commands are fairly obvious and it got them all correct first time. It was slightly confused over the filename, but it prompted me with a dialogue box and a series of numbered options. All I had to do was say the matching number of the file to save.

fig2: Which did you want?

Challenge two - viewing pictures

What makes this challenge a lot easier is the new Vista feature that lets you add tags to your images, pictures and media. Tagging pictures makes it really easy to search for them from the Start menu. For this challenge I will search for pictures tagged with a particular keyword, pick one, and then navigate through the pictures.

“start”

“Oliver”

“2163”

Windows Vista will automatically search through all the metadata and filenames to find items that match the entered text. Speaking the whole filename, “image_2163”, didn’t get the match, but just saying the number “2163” found the file and ran Windows Photo Gallery. It is now quite easy to navigate through the other pictures and indeed change the orientation. In fact, in my example one of the following files was actually a video, which was immediately played.

fig3: searching for pictures couldn’t be easier

“next”

“previous”

“next”

“actual size”

“fit to window”

“rotate clockwise – rotate counter clockwise”

Challenge two went without any problems. In fact I even started to play around with the “rotate clockwise” and “fit to window” commands, both of which worked instantly. So on to challenge three, time to open up the throttle and see what this baby can do!

Challenge three – hello world application

As a developer, I spend my life in Visual Studio 2005. There has been a lot of noise around its compatibility with Windows Vista, but how would the speech recognition fare on the first challenge with an application not shipped with the operating system?

“run visual studio”

“file”

Visual Studio fired up and we were ready to go, so it was time to create a brand new project. All of a sudden things were not looking good. The speech recognition didn’t seem to recognise the File menu. The prompt asked me about the “Bluetooth File Transfer Wizard” or the “Remote File Wizard”, neither of which was any good to me. Fortunately you can force a click anywhere on the screen by using the Mousegrid command.

fig4: I want to create a new windows application, not do a Bluetooth file transfer

Mousegrid draws a three by three grid on the screen and you say the number of the cell you want to zoom in on. Vista then draws a new three by three grid within that grid, allowing you to home in on the item you require.
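The zooming described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how Vista implements it: the function name `mousegrid_target` is made up, and it assumes the cells are numbered 1 to 9 from left to right and top to bottom, and that a click lands in the centre of the last cell chosen.

```python
def mousegrid_target(cells, width, height):
    """Return the (x, y) centre of the region reached by a sequence of cell numbers.

    Assumption: cells are numbered 1-9, left to right, top to bottom,
    and each number zooms into that cell of the current region.
    """
    left, top = 0.0, 0.0
    region_w, region_h = float(width), float(height)
    for cell in cells:
        if not 1 <= cell <= 9:
            raise ValueError(f"cell numbers must be 1-9, got {cell}")
        row, col = divmod(cell - 1, 3)   # 0-based row and column inside the 3x3 grid
        region_w /= 3                    # each step shrinks the region to one ninth
        region_h /= 3
        left += col * region_w
        top += row * region_h
    return (left + region_w / 2, top + region_h / 2)


# "mousegrid 1-1-6" on a hypothetical 2700x2700 screen (size chosen for round numbers)
print(mousegrid_target([1, 1, 6], 2700, 2700))  # (250.0, 150.0)
```

Each number spoken divides the target area by nine, so three numbers already narrow a screen down to 1/729 of its area, which is why a short sequence is enough to hit a menu or toolbar button.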

fig5: Ultimate power with the MouseGrid command

“mousegrid 1-1-6 – click”

“down, down, enter”

“tab enter”

Visual Studio is an incredibly complex product, made all the harder by the fact that I’ve never used speech before. There may be better ways to do this, but I have to complete the challenge by whatever means. I am now looking at the form designer, with the next goal being to create a button.

“mousegrid 1-4-6-click”

“mousegrid 1-5-5-click”

We now have a button lying in the top left hand corner of the screen, but how do we drag and drop it to get the positioning required?

“mousegrid 1-8-2 mark 4-6-6 click”

That was actually a lot easier than I was expecting. Use the Mousegrid command to home in on the item and say “mark” to pick it up for dragging, then use the Mousegrid again to home in on the destination and say “click” to drop the control. We’re not going to win any awards for user interface design, but we now have a button on the form in the place we wanted it. If you are a real perfectionist for layout, you can use the “down”, “right”, “left” and “up” commands for precise positioning.

fig6: Drag and Drop speech style

“mouseGrid 4-6-6 double click”

In Visual Basic .NET, double clicking on the button creates the handler for the click event. All we need to do now to complete the challenge is to show “hello world” in a message box dialog when the user clicks on the button. Surely it cannot be that hard.

The code view does not seem to recognize message box or msgbox. After many attempts I discovered the “start typing” command. This command gives you a very granular level of speech recognition, character by character. To end the typing mode you simply say “stop typing”. We can now run the application by clicking on the toolbar; this is going to be a precise click though...

fig7: Our “Hello World” code.

“mousegrid 2-1-9-1-3 click”

There we have it: up pops our application and, by saying “enter”, a message box is shown. We can even say “button one”, as it matches the text on the control. Our new and shiny speech enabled application can be shut down by saying “close”.

So in just 10 minutes we have built an application in Visual Studio without touching the keyboard whatsoever. As you get more and more used to speech recognition, things will happen a lot quicker for you.

Challenge four – solitaire

In my challenges I am trying to cover many of the important applications in Windows Vista. Enter Solitaire. Using Vista Speech Recognition, can I play one game of solitaire to its logical conclusion? Vista ships with two versions of solitaire, the standard edition and a new spider solitaire. For this challenge we will be using the standard solitaire that we all know and love.

fig8: Which Solitaire would you like to play?

“run solitaire”

“2 , Ok”

“Seven of Diamonds – Eight of Spades”

“Six of Spades – Seven of Diamonds”

“king of Clubs – Stack one”

“Ace of Clubs, Double Click”

fig9: Show Numbers, highlights all clickable controls.

It’s amazing how easy this is. If the recognition is unsure about which card you are talking about it highlights all those that it matches. For example, all hearts if you mumble “something of hearts”. As with every game of solitaire you soon run out of face up cards and have to resort to turning over from the deck.

fig10: Which hearts did you want?

“Deal Space” will highlight the face up card, or you can use “show numbers”, which highlights clickable objects on the window. There were a couple of occasions where I needed to get the next face down card, and the recognition wouldn’t let me use “deal space” or “show numbers” to access it. Once again it was “Mousegrid” to the rescue and we were back on track.

“mousegrid 1 8 click”

“King of Diamonds – Stack two”

It did take a while to get used to a couple of minor quirks, but it soon became a fun rather than frustrating exercise. It was this feature alone that had my parents asking about when Vista would be available and how much it would cost. I must admit my ability to play solitaire broke before the speech recognition did, and I’m glad the challenge was to play the game to its conclusion and not to complete it.

fig11: Help me anybody?

Challenge five – search for the NxtGenUG website and view an interview

The next big question I had was how the speech recognition would cope with web pages and more importantly, web technologies, such as DHTML menus? So this time the challenge is to navigate to the Next Generation User Group (NxtGenUG) website and then view one of the speaker interviews.

“run internet explorer”

“search 2 ok”

“NxtGenUG – enter”

The built in search within Internet Explorer 7 made it really easy to search for the NxtGenUG site. What I hadn’t done was to connect my wireless. So a quick detour for the challenge...

“Start - Connect to”

“down – enter”

The wireless dialog fired up and we were soon connected. I could now see matches for NxtGenUG, but how would you click on them?

“show numbers”

“37 – click”

fig12: How to click a hyperlink

The “show numbers” command identified and highlighted all the hyperlinks on the page. This makes navigation straightforward. In addition to straight hyperlinks, the NxtGenUG website uses a DHTML/JavaScript menu. So how would the speech recognition cope with hovering over a menu option? Sometimes things are just too obvious.

fig13: Navigating around the NxtGenUG site.

“Show Numbers - 30 Hover”

“interviews”

“Mike Hall”

fig14: View the User Group interviews

The last part of that was truly impressive. The speech recognition parsed the text that makes up the hyperlink.

“Dave and Rich join up with Microsoft's Mike Hall who works on the Windows Embedded team, to find out what's happening in the embedded world.”

In this case it’s the full article abstract and not just the person’s name. For example “Voices for Innovation”, or “Innovation” would have navigated to the Julie Ray interview.

“Dave, armed with a 'turned off' phone, met up with Julie Ray - Program Manager for Voices For Innovation (VFI) to find out what VFI is all about...”
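The behaviour seen here, where any phrase from a link's text selects that link, can be approximated with a small matcher. This is a sketch under assumptions, not Vista's actual algorithm: the helper names `words` and `match_links` are hypothetical, and matching is done on whole words, ignoring case and punctuation.

```python
import re


def words(text):
    """Lower-case the text and split it into words, dropping punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())


def match_links(spoken, link_texts):
    """Return the indexes of links whose text contains the spoken phrase.

    Assumption: the phrase must appear as a run of consecutive whole words.
    """
    target = words(spoken)
    if not target:
        return []
    n = len(target)
    matches = []
    for index, text in enumerate(link_texts):
        link_words = words(text)
        if any(link_words[i:i + n] == target for i in range(len(link_words) - n + 1)):
            matches.append(index)
    return matches


links = [
    "Dave and Rich join up with Microsoft's Mike Hall who works on the "
    "Windows Embedded team, to find out what's happening in the embedded world.",
    "Dave, armed with a 'turned off' phone, met up with Julie Ray - Program Manager "
    "for Voices For Innovation (VFI) to find out what VFI is all about...",
]

print(match_links("Mike Hall", links))   # [0]
print(match_links("Innovation", links))  # [1]
print(match_links("Dave", links))        # [0, 1]
```

A phrase such as “Dave” matches both abstracts, which is the kind of ambiguity a numbered list of choices would then have to resolve.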

The interview can then be viewed using “scroll down” to scroll down the page. From this point you can see how easy it is to navigate around the web. I was also able to listen to the latest podcast by navigating to the podcasts page and then selecting the link to the podcast.

fig15: Listening to the NxtGenUG Podcast.

“start – Nxt – enter”

Vista Search not only trawls through and indexes images, RSS feeds, emails and media, it also picks up your Internet Explorer favourites. I could have streamlined the above challenge by firing up the Start menu, searching for “Nxt” and then saying “enter”, which would find the favourite link to the NxtGenUG web site.

Using speech to surf the web could not have been easier.

Conclusion

Vista Speech is a very powerful feature. Sure, it can be temperamental, and it really needs to be used in the peace and tranquillity of the home office. The documentation seems to be very light, so there’s a lot of trial and error involved. I found “Mousegrid” and “Show Numbers” to be invaluable, and by combining the two you can cope with the majority of situations that you come across in applications.

The above five challenges were just things I thought up. I am sure not all applications will be as easy to use or even appropriate for speech recognition. I must admit I found working with Word quite difficult, as documentation is sparse and there are so many things you can be doing at any time, for example correcting spelling, moving text or changing styles. Nothing that a little bit of background reading couldn’t put right!

Right, that’s it, now for some unfinished business...

“run Solitaire”

About the Author

Richard Costall (MVP, MCSD.NET) has over 19 years’ development experience and works for 1st Software, a Microsoft Gold Partner and the UK's leading software solution for Financial Advisers and Intermediaries, designing and implementing IFA applications in the financial services sector. Previously specializing in VB, XML/XSLT, COM, ASP and MSMQ, Richard now lives and breathes the awesome world of .NET and in particular ASP.NET (including 2.0). Richard spent 5 1/2 years as the Midlands regional coordinator for VBUG (Visual Basic User Group) before co-founding NxtGenUG, the innovative UK user group for Microsoft Technologies.

Richard has written articles for publications such as ASP.NET Pro and International Developer Magazines and also co-authored the Apress title Professional MSMQ. He speaks at local user groups, Microsoft Conferences/Product launches, TechED Europe and the hugely successful DeveloperDeveloperDeveloper Events.

When not in .NET land, Richard enjoys relaxing at home with his wife and two sons, playing on the XBOX 360 or ultimately jetting off to Walt Disney World, Florida, for a trip on the Tower of Terror.


© 2012 Microsoft Corporation. All rights reserved.