{"id":306383,"date":"2009-11-04T19:00:13","date_gmt":"2009-11-05T03:00:13","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?p=306383"},"modified":"2016-10-16T18:42:53","modified_gmt":"2016-10-17T01:42:53","slug":"making-car-infotainment-simple-natural","status":"publish","type":"post","link":"https:\/\/www.microsoft.com\/en-us\/research\/blog\/making-car-infotainment-simple-natural\/","title":{"rendered":"Making Car Infotainment Simple, Natural"},"content":{"rendered":"<p><em>By Rob Knies, Managing Editor, Microsoft Research<\/em><\/p>\n<p>You\u2019re steering with your left hand while your right is punching car-stereo buttons in eager search of that amazing new Lady Gaga song. Your mobile phone rings, and as you adjust your headset\u2014hands-free, naturally\u2014the driver in front of you slams on his brakes \u2026<\/p>\n<p>Sound familiar? For drivers, such a scenario is almost commonplace. These days, the automobile is tricked out with all sorts of conveniences, designed to make driving a comfortable, media-rich experience. But there is a cognitive price to pay in operating these devices while keeping sufficient concentration on the road.<\/p>\n<p>Does it have to be that way, though? 
Researchers from <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/lab\/microsoft-research-redmond\/\" target=\"_blank\">Microsoft Research Redmond<\/a> aim to find out.<\/p>\n<div id=\"attachment_306389\" style=\"width: 380px\" class=\"wp-caption alignleft\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-306389\" class=\"size-full wp-image-306389\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2009\/11\/Commute-UX-driving-simulator.jpg\" alt=\"Commute UX driving simulator\" width=\"370\" height=\"278\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2009\/11\/Commute-UX-driving-simulator.jpg 370w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2009\/11\/Commute-UX-driving-simulator-300x225.jpg 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2009\/11\/Commute-UX-driving-simulator-80x60.jpg 80w\" sizes=\"auto, (max-width: 370px) 100vw, 370px\" \/><p id=\"caption-attachment-306389\" class=\"wp-caption-text\">Ivan Tashev (left), Yun-Cheng Ju, and Mike Seltzer (at wheel) demonstrate their Commute UX driving simulator.<\/p><\/div>\n<p><a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/ivantash\/\" target=\"_blank\">Ivan Tashev<\/a>, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/mseltzer\/\" target=\"_blank\">Mike Seltzer<\/a>, and <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/yuncj\/\" target=\"_blank\">Yun-Cheng Ju<\/a>, members of the <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/group\/speech-dialog-research-group\/\" target=\"_blank\">Speech Technology<\/a> group, are leading a research project called Commute UX, an interactive dialog system for in-car infotainment that makes finding a person to call or a song to play easy and efficient, using natural language input and a multimodal user interface.<\/p>\n<p>\u201cPeople are in their cars more and more,\u201d Seltzer says, 
\u201cand they\u2019re trying to do more and more while they\u2019re driving. We\u2019re trying to figure out how we can enable people to do at least some of the things they would like to do in a way that is safer and more natural.<\/p>\n<p>\u201cThose things are correlated. If you could just speak to the system as you would to a passenger, you wouldn\u2019t need to remember hundreds of commands and all the rules of how to use the system. You could keep your brainpower focused on the driving, keep your eyes on the road and your hands on the wheel, and, hopefully, you\u2019d be safer on the road.\u201d<\/p>\n<p>Alex Acero, research-area manager of the Speech Technology group, has a unique vantage point to assess the value of the Commute UX project.<\/p>\n<p>\u201cI\u2019ve been working on speech recognition for 25 years,\u201d he says, \u201cand have seen researchers get excited about speech in one application, only to see that the technology doesn\u2019t take off because users find other alternatives to accomplish their task. But for the car, we have not found an obvious safe alternative to speech, so I\u2019m very excited about the role of speech technology in automobiles\u2015and Commute UX, in particular.\u201d<\/p>\n<p>The hardware involved in the Commute UX project is simple and would not be unfamiliar to drivers of late-model cars already on the road: microphones, a touch screen, a cluster of buttons on the steering wheel. 
Simplicity is the key to minimizing driver distraction, the Commute UX researchers say, and that maxim is reflected in the project\u2019s guiding principles, which are focused on improving user satisfaction with such in-car systems:<\/p>\n<ul>\n<li><strong>Speech-enabled:<\/strong> With driving being an eyes-busy\/hands-busy activity, speech is the primary channel for interaction.<\/li>\n<li><strong>Multimodal interface:<\/strong> Speech input works best for browsing large lists, such as a music collection or a mobile-phone address book, while a touch screen or buttons are preferable for selecting from a short list. Transitioning from speech to touch must be a smooth experience, based on driving conditions and user preference.<\/li>\n<\/ul>\n<p>\u201cWhat we try to do is to integrate speech into a unified interface,\u201d Tashev explains. \u201cSpeech is very strong when you search within a list of 10,000 songs or 300 contacts in your address book, but not so efficient when you have three or four selections or options. Just glancing at the screen and touching to select or using buttons is more efficient. These two actions are very powerful in combination, and we are looking toward the least distracting user interface.\u201d<\/p>\n<ul>\n<li><strong>Situational awareness:<\/strong> When driving becomes precarious, such as passing or braking, auto passengers typically remain silent in deference to the driver. The Commute UX team hopes to mimic this passenger behavior, and factors such as speed, weather, or driving conditions can prompt the technology to switch off the user interface during challenging moments, eliminating the potential for distraction.<\/li>\n<\/ul>\n<p>\u201cWe want our computer to behave the same way as the passenger,\u201d Tashev confirms, \u201cnot to talk to us when we pass, when we change lanes. 
Usually, if you are braking hard, trying to keep an appropriate distance behind the car in front of you and frantically watching the car behind you in the rear-view mirror, passengers will not talk. In the near future, we expect our system to do the same.\u201d<\/p>\n<ul>\n<li><strong>Awareness of context and person:<\/strong> Cars usually have a small set of users, and Commute UX should be able to store defaults depending on the driving context and the driver\u2019s habits. Such defaults are easy to define, monitor, and store because of the small number of drivers involved.<\/li>\n<li><strong>Seamless integration of services based in the car and in the cloud:<\/strong> While standard controls are located in the auto itself, connection to Web-based services enables Commute UX to learn and to provide additional information and functionality.<\/li>\n<\/ul>\n<p>\u201cIn most tasks,\u201d Tashev says, \u201cthe onboard system should do the speech recognition and processing, but in many cases, we have an integrated system that can go to the cloud and ask for data such as traffic, weather, and gas prices. The speech recognition happens there. The driver doesn\u2019t care. We want to have a single face for the system: You ask for something, you get it.\u201d<\/p>\n<p>Infotainment systems for automobiles already are being offered by automakers such as Ford and Fiat. What Commute UX adds to the mix is simplicity.<\/p>\n<p>\u201cWe really tried to simplify the experience for the user,\u201d Seltzer says. \u201cOne of the biggest problems of the current systems is they\u2019re built on a model that says the user will always know exactly what to ask the system and will always ask it in the correct way. We\u2019ve found, by doing some empirical user studies and surveys, that this is not true at all.<\/p>\n<p>\u201cPeople often don\u2019t remember what to say. Even if they think they know, they\u2019re not saying the correct thing. 
A classic example we\u2019ve found is tracks on a music player. Often, what you think the song name is might be a fragment of the full name. This occurs in pop music all the time.\u201d<\/p>\n<p>As Ju explains, the project addresses the issue of users being unaware that they\u2019re making input errors.<\/p>\n<p>\u201cOne important aspect is called perceived accuracy,\u201d he says. \u201cWe know that users don\u2019t read manuals. We know that speech recognition is not perfect. But a lot of mistakes are caused by users. Developers usually don\u2019t consider user mistakes when they design and evaluate most systems. That creates a potential gap between the claimed accuracy and the accuracy perceived by users, because they don\u2019t know they are making mistakes and blame the computer.\u201d<\/p>\n<p>\u201cWe try to achieve a good balance so that we don\u2019t penalize the good user, but we can accommodate those users who occasionally make mistakes.\u201d<\/p>\n<div id=\"attachment_306392\" style=\"width: 380px\" class=\"wp-caption alignright\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-306392\" class=\"size-full wp-image-306392\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2009\/11\/Commute-UX-user-interface.jpg\" alt=\"The Commute UX user interface\" width=\"370\" height=\"231\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2009\/11\/Commute-UX-user-interface.jpg 370w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2009\/11\/Commute-UX-user-interface-300x187.jpg 300w\" sizes=\"auto, (max-width: 370px) 100vw, 370px\" \/><p id=\"caption-attachment-306392\" class=\"wp-caption-text\">The Commute UX user interface.<\/p><\/div>\n<p>Commute UX solves this problem by severely constraining the number of voice commands necessary to obtain results.<\/p>\n<p>\u201cWhat we\u2019ve done,\u201d Seltzer says, \u201cis taken an approach by which the user just needs to say 
things, very intuitively: \u2018Play,\u2019 if they want to listen to music, or \u2018Call\u2019 or \u2018Reply.\u2019 There\u2019s one trigger word into what you want to do, and everything else is very natural.<\/p>\n<p>\u201cI can just say, \u2018Play the Chili Peppers.\u2019 I don\u2019t have to say, \u2018Play artist Red Hot Chili Peppers.\u2019 I don\u2019t have to remember that it\u2019s an artist. I don\u2019t have to remember that the full name is Red Hot Chili Peppers. It\u2019s a problem to remember 500 commands. It\u2019s probably not a problem to remember three: play, call, reply.\u201d<\/p>\n<p>Cars, of course, offer a noise-laden auditory environment. Commute UX addresses this reality, as well, using state-of-the-art speech enhancement by capturing sound with an array of microphones.<\/p>\n<p>\u201cThe car is a very noisy place,\u201d Tashev says. \u201cMicrophones and speech-enhancement techniques need to precede the speech recognizer, and the robustness of the speech recognizer needs to adapt to that surrounding noise. The requirements are way harder than a quiet office environment where you use a speech recognizer for dictation.\u201d<\/p>\n<p>The issue was challenging, but for speech researchers, that provided an opportunity.<\/p>\n<p>\u201cWe tackled this problem in stages,\u201d Tashev says. \u201cThe first is the capturing part: Where are the microphones, and how many are there? We started to do studies on the best position for the microphone: on the dashboard, in the rear-view mirror, right in front of the driver\u2019s eyes? We designed a set of recommendations, which we shared with our partners, Microsoft\u2019s <a href=\"https:\/\/www.microsoft.com\/windowsembedded\/en-us\/windows-embedded-automotive-7.aspx\" target=\"_blank\">Automotive Business Unit<\/a>.\u201d<\/p>\n<p>The second stage, then, was speech enhancement.<\/p>\n<p>\u201cWe want to make it both human- and speech-recognition-friendly,\u201d Tashev adds. 
\u201cWhen we say that the human, not the speech recognizer, is the major source of mistakes, this is because of the proper design of the sound-capturing and speech-enhancement system.\u201d<\/p>\n<p>Another unique attribute of the Commute UX system is its ability to provide personalization.<\/p>\n<p>\u201cIt\u2019s very easy to upload the personal profile of the current driver,\u201d Tashev says, \u201cto know which telephone to pair. It includes a profile for the speech recognizer, adapted to the way you speak. In your set of messages, you\u2019ll apply your own specific style, your own points of interest for the navigation system, the most frequently played songs and playlists. This whole system uses extremely powerful prior information, which enables us to improve the way the speech recognizer and the information-retrieval system work.<\/p>\n<h2>Easy, Comfortable<\/h2>\n<p>\u201cIt increases usability and user comfort. Each time you open the car, it\u2019s all yours. When your wife opens the car, it\u2019s hers. We believe that personalization will play an important role in future systems.\u201d<\/p>\n<p>The Commute UX project got started during a group offsite in 2006, which prompted the team to schedule brainstorming sessions to determine which directions to pursue next. A direct result was an automated telephone information system for drivers, including traffic, weather, gas prices, gas-station locations, and stock prices. The system was designed to operate atop <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" href=\"http:\/\/www.bing.com\/partners\/developers#BingSpeechApis\" target=\"_blank\">Microsoft Speech Server<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>.<\/p>\n<p>\u201cWe deployed it in January 2007,\u201d Tashev says, \u201cand gained some experience. Our specific part of this story is location. 
The car is a moving object, so we deal with streets, points of interest\u2014all this geographical information.\u201d<\/p>\n<p>The use of global positioning systems had not yet become prevalent, and the need to have the telephone system deliver pertinent location information got the researchers thinking about an onboard system. They learned the importance of location and time understanding\u2014and the importance of delivering that information effectively. Call-data analysis and user studies improved the task-completion rate, and they became convinced that using prior information, such as user-provided names for locations visited most often, reduced the number of dialog turns, making it easier for users to do what they wanted to do.<\/p>\n<p>The timing was fortuitous. In November 2007, Ford launched its Sync system, built on top of the Microsoft Auto platform. Sync offered voice-enabled selection of songs and phone-call recipients. Suddenly, on-board infotainment systems were maturing from cool gadgets to integral automotive components.<\/p>\n<p>In the second phase of the Commute UX project, researchers built a prototype, featuring speech commands and queries, and avoiding a complex menu structure.<\/p>\n<p>\u201cUnlike many systems, there\u2019s no menu structure involved,\u201d Seltzer says. \u201cLet\u2019s say you say, \u2018Play track <em>Yellow Submarine<\/em>.\u2019 The system takes you implicitly into the music menu, and it assumes that everything you\u2019re going to say is about music. If you want to make a phone call, typically, you have to back out to the main menu and then make a phone call. In our case, we have what we call \u2018say anything at any time.\u2019 I can play a track, and the next command, I can make a phone call. 
As soon as the phone call is done, I can go back to playing more music.\u201d<\/p>\n<p>As Ju notes, Commute UX also can reply by voice to SMS messages.<\/p>\n<h2>Text Messaging, Too<\/h2>\n<p>\u201cWe know that drivers have the urge to write text messages even when they are trying to drive,\u201d he says. \u201cBut we obviously don\u2019t want them to do that. We want to use speech to help them do that.<\/p>\n<p>\u201cPreviously, we were thinking we could have a dictation-based system: You say something, we show you the results, you look at which words are recognized wrong and say them again or correct the list. It\u2019s a very straightforward approach. But we realized it was very demanding, too dangerous to even think about. Fortunately, we found a voice-search approach to provide the same service, but with much less cognitive and physical distraction.\u201d<\/p>\n<p>Others at Microsoft Research Redmond have played invaluable roles in creating the technology. <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/shamsi\/\" target=\"_blank\">Shamsi Iqbal<\/a> is doing user studies to test levels of distraction and determine opportune times to interrupt drivers with messages and phone calls. Piali Choudhury performed significant development work. Ye-Yi Wang has helped devise techniques for flexible speech recognition. And Acero served as devil\u2019s advocate to keep the project focused.<\/p>\n<p>Tashev, Seltzer, and Ju have more to do. They are still trying to refine the speech-based interface, to make the system easier and less distracting. They are investigating the capabilities a GPS could deliver. In-car cameras might be able to detect and assist a sleepy driver. 
Many of the technological solutions the team is pursuing could be broadly applicable in the larger context of mobile devices.<\/p>\n<p>Like most of us, the researchers also drive, so the work they\u2019re doing has a chance to improve their lives both personally and professionally.<\/p>\n<p>\u201cWe\u2019ve gotten a lot of good support for doing this kind of work,\u201d Seltzer says. \u201cThat\u2019s been really exciting. There\u2019s a good chance that this is not only interesting work, but it\u2019s going to be out there on the road, and it will, personally, make my life better.\u201d<\/p>\n<p>Ju identifies an additional benefit.<\/p>\n<p>\u201cI\u2019ve been working on speech recognition for 15 years,\u201d he smiles, \u201cand my wife told me that this is the most exciting project and that, when the product becomes available, she wants to buy it right away.\u201d<\/p>\n<p>Tashev elaborates.<\/p>\n<p>\u201cWe were able to put a lot of pieces of technology together, and, wow, it works! It works quite well. That was the most exciting part for me, building the end-to-end system, a system that integrates a lot of interesting technologies but is interesting in itself.\u201d<\/p>\n","protected":false},"excerpt":{"rendered":"<p>By Rob Knies, Managing Editor, Microsoft Research You\u2019re steering with your left hand while your right is punching car-stereo buttons in eager search of that amazing new Lady Gaga song. Your mobile phone rings, and as you adjust your headset\u2014hands-free, naturally\u2014the driver in front of you slams on his brakes \u2026 Sound familiar? 
For drivers, [&hellip;]<\/p>\n","protected":false},"author":39507,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","msr-author-ordering":[],"msr_hide_image_in_river":0,"footnotes":""},"categories":[194456,194462],"tags":[214745,214733,214739,214766,214742,214679,214772,214754,214751,214748,214760,214736,186598,214757,197281,214769,214763],"research-area":[13545],"msr-region":[],"msr-event-type":[],"msr-locale":[268875],"msr-post-option":[],"msr-impact-theme":[],"msr-promo-type":[],"msr-podcast-series":[],"class_list":["post-306383","post","type-post","status-publish","format-standard","hentry","category-natural-language-processing","category-speech-and-dialog","tag-automobile","tag-automotive-business-unit","tag-bing-speech-apis","tag-buttons","tag-car-stereo","tag-commute-ux","tag-driver-distraction","tag-in-car-infotainment","tag-interactive-dialog-system","tag-media-rich-experience","tag-microphones","tag-microsoft-speech-server","tag-multimodal-user-interface","tag-natural-language-input","tag-speech-recognition","tag-steering-wheel","tag-touch-screen","msr-research-area-human-language-technologies","msr-locale-en_us"],"msr_event_details":{"start":"","end":"","location":""},"podcast_url":"","podcast_episode":"","msr_research_lab":[199565],"msr_impact_theme":[],"related-publications":[],"related-downloads":[],"related-videos":[],"related-academic-programs":[],"related-groups":[],"related-projects":[],"related-events":[],"related-researchers":[],"msr_type":"Post","byline":"","formattedDate":"November 4, 2009","formattedExcerpt":"By Rob Knies, Managing Editor, Microsoft Research You\u2019re steering with your left hand while your right is punching car-stereo buttons in eager search of that amazing new Lady Gaga song. 
Your mobile phone rings, and as you adjust your headset\u2014hands-free, naturally\u2014the driver in front of&hellip;","locale":{"slug":"en_us","name":"English","native":"","english":"English"},"_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/306383","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/users\/39507"}],"replies":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/comments?post=306383"}],"version-history":[{"count":4,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/306383\/revisions"}],"predecessor-version":[{"id":306410,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/306383\/revisions\/306410"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=306383"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/categories?post=306383"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/tags?post=306383"},{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=306383"},{"taxonomy":"msr-region","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-region?post=306383"},{"taxonomy":"msr-event-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-event-type?post=306383"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=306383
"},{"taxonomy":"msr-post-option","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-post-option?post=306383"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=306383"},{"taxonomy":"msr-promo-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-promo-type?post=306383"},{"taxonomy":"msr-podcast-series","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-podcast-series?post=306383"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}