Remarks by Bill Gates
International Joint Conference on Artificial Intelligence
Seattle, Wash., August 7, 2001

(Applause.)

BILL GATES: Good morning and welcome to Seattle. It’s great to have you here, and I appreciate this opportunity to share with you some thinking about artificial intelligence and some of the ways that Microsoft is hoping to push the field ahead, and how we’ll use those advances to make software far more approachable.

Microsoft was founded about 25 years ago, and I can remember at the time thinking, "Well, if I go out and do this really commercial stuff, I’m going to miss these big advances in AI that will be coming very soon." (Laughter.) And so I come from the school of AI optimists. You know, I can remember being at Harvard and back then AI was the Greenblatt Chess Program and Macsyma and Eliza and people literally felt that within five to ten years some of these tough problems would be solved.

Part of the reason I think that AI is the most interesting field to be working in is that they were not solved. They’re very tough problems, and so 25 years later the dreams are very much the same.

I do think the last four or five years have been extremely valuable in laying the foundation for where these solutions will come from, things like the advances in Bayesian modeling, combining those with other systems; some of these approaches I think hold promise to even get us to those very lofty goals, which founded this entire field.

Now, for Microsoft it’s very important that these advances be made. For us, things like vision and speech understanding, handwriting recognition, they’re very key for the dream that the company pursues, and that is making the personal computer connected to the network an even more valuable tool for an incredible range of scenarios.

Today, we can celebrate the fact that the PC is out in the hands of over 500 million users. In fact, this week I’ll be attending a 20th anniversary event for the PC, and some of the pioneers will be getting together down in San Jose and talking about what has happened during the last 20 years, is that different than what we expected and how far have we come.

My view is that it’s definitely a glass half full. It’s fantastic that we’ve changed word processing. It’s fantastic that when I get up before you in a presentation like this I don’t have any slides. I don’t have to think about whether they’re upside down and backwards or whether they’re going to fog up. Some of you here are young enough that you can’t relate to why that was such a problem in its day -- (laughter) -- but believe me it was. And so some very neat things have happened.

The idea that computing is for the masses; that is, as an individual tool is very different than 25 years ago when it was more about big computers that companies would use to bill people and track information in ways that didn’t seem very attractive.

But we need to go a lot further. Certainly, today the amount of confusion and frustration that exists as people are trying to communicate, use their PC as a communication tool, try and use it as a creativity tool, there are serious problems there.

And so for us there’s a need for a much more natural interface. There’s a need to model what’s going on with the user. We have done some things where we’ve taken Bayesian models and used those in what we call troubleshooters, so if you can’t print or your system’s not doing anything at all, we go through and ask you a series of questions. And those have helped a lot, but we can see building a much richer model as being key to having the PC fulfill its promise.

So the vision of Microsoft is pretty simple. It changed a couple years ago. For the first 25 years of the company, it was a personal computer on every desk and in every home. And it was a very good vision; very rare for a company to be able to stick with something like that for 25 years. The reason we changed it was simply that it became acceptable. It wasn’t wild. It wasn’t this big claim, where people would say, "Are you kidding?" You know, they would kind of say, "Of course. What’s next? Is it some other company that’s going to drive the next revolution here?"

And so as we stepped back and looked at what we were trying to do with the programming model, turning the Internet into the fabric for distributed computing, getting your information to replicate in a very invisible way so that it was available to you everywhere, thinking of this programming model spanning all the different devices, we changed to the mission statement we have now, which is empowering people through great software anytime, any place and on any device.

And the scenarios that we have in mind here, if you look at them, many of which we won’t achieve probably for the rest of the decade, many of them assume that underneath there is a level of reasoning and modeling, and that basically requires that the work all of you do here advance pretty dramatically to make that possible.

Now, the software industry has gone through many different phases. You could say a very important phase started back in 1981 with MS-DOS, the IBM PC and the idea of building the software industry to critical mass around that. Before that, the software industry had been very small, mostly low volume packages, very high priced on mainframes. So that DOS era, which was essentially ’81 to about ’88, was a very important time.

Graphics interface came in with the shipment of the Macintosh, with the arrival of Windows on the PC itself and it was about another seven-year period there where that was helping flourish getting the idea of multiple applications, clipboards, standard interface, the graphics infrastructure, scalable fonts, things like that to just be commonplace and a wave of applications that came with that.

About six years ago, the Internet phase started, where it was really about browsing in HTML. The idea that you could connect up to any server was a very powerful thing and really caused people to dream about communications and commerce being done in a different way.

We say that we’re at the start of a new phase that builds on all the previous ones, just like has happened before, but it’s different in that there is a key set of standards and a key set of tools, a key set of applications that define this era. The standards around XML and the distributed computing protocol SOAP, and the related standards that come out of that, the idea that you can have the intelligence both be centralized and on the local device, that you can represent heterogeneous information, there’s a set of development tools around that. The key scenarios are e-commerce and very rich reformed communication and those are the things that will drive this to be even more popular than the PC itself has been up till now.

The scenarios that are very well established are really productivity and e-mail. Even in productivity we see new horizons. For example, if you think of the spreadsheet today, the way that information is represented there is extremely low-level. Yes, we understand the equation but we don’t understand the interrelationships of the various things. If things are variable size, they don’t get represented very well. If you imagine doing the forecast where you have to go out to different Web sites, it’s a very manual process.

And so we drive our new work in terms of scenarios. I won’t go into all the different ones here, but I just want to touch on a couple to give you a sense of how we start with the idea of this being a tool and then we look at what kind of technology will allow that to be possible.

Digital reading is a scenario we’ve believed in for a long time. Again, this is something where it seems like common sense, the fact that you could read off of a screen, you could annotate it to share it with other people, sort of the original hypertext vision, you could go back and search things that you’ve seen before, you can get it totally up to date, you can traverse a link to get in-depth information. Digital reading should be superior to reading off of paper. Well, of course, there are some roadblocks in terms of the size of the device and can you hold it in your hand and the resolution and just simply comfort with this approach that have not yet been overcome, and we see over the next two or three years that’s definitely going to happen, whether it’s the Tablet PC form factor, the wireless network, and so we see that there’s a lot that we can do to make digital reading a much richer experience than paper-based reading has been.

In the area of meetings, we think that facilitation of meetings both before the meeting takes place, during the meeting and after the meeting, there is dramatic opportunity. Some of this involves having a camera, which makes digital recordings of things that are going on, understands who’s speaking, ideally building a transcript through speech recognition and letting anybody who wasn’t present do the kind of searches they might be interested in, whether it’s just looking at the transcript or seeing sort of a sped-up video that fills them in on the parts that might be relevant to them.

In the living room, which is a tough frontier, we’re taking a product we call the Xbox, which brings a very reasonable level of graphics capability and hard-disk storage for the first time into that environment, so that the kind of online games we can have through the rich hardware and the Ethernet connection that’s built there, ought to push things up to a whole new level.

Now, behind those games, we need very rich models. We need things that people find entertaining and rich, so that they’ll go back week after week, month after month and make that something very real.

So every one of the scenarios here, whether it’s simply helping the user out or making it as rich as they would like it to be, involves some technology that would be thought of as artificial intelligence: looking at the text of mail messages to help somebody categorize them so that everybody doesn’t end up being a mail clerk setting up their own e-mail folders. Things like that are, we think, within reach and drawing on the rich work that people here are doing.

Now, one thing that is very exciting is we don’t see any limitation on the hardware side. That’s not to say that we’ve got infinitely free computing, but as we think about the processor speeds, the disk storage sizes, the kind of peripherals we’ll have in terms of still image and video capture, the kind of microphones that will be standard, built into the PC, the array microphones, we see the hardware as being there for these advanced scenarios. High-resolution LCDs, the 200 DPI LCD, which, when combined with our ClearType approach, gives you incredible readability; that is going to be a relatively cheap device within the next few years. The miniaturization to actually have the Tablet PC be under two pounds and something that you would take to a meeting and simply do your notes on, that’s absolutely within reach.

The only piece of the picture where the hardware and communications people are letting us down a little bit is broadband connectivity out to homes. To businesses it will be there, but to homes even five years from now, even in the United States we’re looking at something like a third of homes having the connection and two-thirds not having that connection. And so there is some compromise there in terms of what you can do when it’s not always on, when you have those bandwidth limitations.

But with that one footnote, a significant footnote, the same assumption we made when Microsoft was founded that we could focus on software and that there would be lots of new horizons that Moore’s Law-type advances in hardware would be opening up, that’s stayed true to form and certainly for the next decade that’s an important thing that we’ll be able to take advantage of.

Now, at the same time as I was an AI optimist and thinking about starting Microsoft, I certainly didn’t have the concept that I’d ever be in charge of a very large company. I had this basic view that maybe a hundred developers could write about all the software the world would ever need. So my naiveté extended well beyond my predictions for AI.

And so now I find myself with a very large software enterprise doing products that literally have millions of lines of code, and in their complexity are more similar to a moon shot or a 747 than the original BASIC interpreter that I and my friend Paul Allen personally wrote, which kicked off the company’s development.

Fortunately, the market for software allows us to pursue these projects despite the large commercial scale and the cost involved in that. If you take our R&D budget, which is heavily development oriented percentage-wise, it’s over $5 billion for this year. And when you have an image of that in your head, you don’t have to think about big fabs or any capital equipment. That money is programmers sitting in their offices writing code. It’s 100 percent personnel related. There’s nothing in a software factory except people actually doing that work.

One aspect of our R&D is the Microsoft Research, and that’s been a wonderful experience. You probably know we’re located in Beijing, Cambridge and here at our headquarters. We’ve been able to bring in an incredible range of great people and had those people reach out particularly to universities and other corporate labs and collaborate on some very advanced things.

And if people want to know what we believe in and where we’re going, they can just look at our Microsoft Research Web Site and they’ll see the kinds of frontiers that we think software is pursuing.

In the vast majority of that Microsoft Research work, areas that fit within AI are central to what we’re doing, whether it’s decision-making, learning, language, speech recognition; these are the classic goals of artificial intelligence. We are putting our money where our beliefs are that these things will become real and allow us to build far, far better software products than we have today; and not far better for small audiences. We’re talking about software products that many hundreds of millions, if not billions of people will be using and taking advantage of every day, things as simple as -- and we’ll get into a little demo of this -- if I want to communicate with someone, how do I make sure my time is being used in the best way. It’s a huge problem today and it’s simple to state but very hard to solve.

Well, let me show you a couple of the new things going on. I’ve got actually three demos. I’m going to start with a piece of work called Multi-Modal Interfaces, and I’d like to ask Derek Jacoby from Microsoft Research to come up and show you the work he and his group are doing on the new kinds of interfaces.

DEREK JACOBY: Great. Thank you, Bill.

Well, what I’m going to show you this morning is a project called MiPad which stands for Multi-Modal Interactive Notepad. What this is, is a research project looking at the use of speech recognition on wireless mobile devices, in this case a Compaq iPaq. Now, on these devices, speech recognition is really very beneficial, because your only other input mechanism is the software keyboard or handwriting recognition. It’s very difficult to input large amounts of data onto these devices. But at the same time, the device is not really powerful enough to do full continuous speech recognition on the device itself. And so what we’re doing is distributed speech recognition. We’re using the wireless network to send the audio back to a PC to do the speech recognition and it sends the results back to the device.

Let me show you an example.

"Send mail to Steve. This is a demonstration of Microsoft speech recognition technology, the result of research funding!" (Laughter.) Well, a little bit in error, but it got pretty close there.

So one of the things is for pure text input. The other thing that’s neat about MiPad is that it’s allowing people to interact with their devices very, very naturally; I just talk to the device. For instance, let me try scheduling a meeting. "Schedule a 30-minute meeting with Bill Gates to talk about review time." Very good. It went ahead and got that correct anyway.

So what we’re seeing here is it’s taking a natural language utterance and parsing it into the appropriate fields on a meeting notification, put Bill Gates as the attendee, the subject as review time, and so it’s taking a very natural sentence and filling out a form with it.

One of the UI techniques we’re experimenting with is called "tap and talk." So, for instance, if I wanted an hour-long meeting rather than a 30-minute meeting, I’d tap on the duration. "One hour." And it will go ahead and apply a field-specific grammar that allows it to be much more accurate with that field already known than it is in terms of a general utterance.
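The form filling and "tap and talk" behavior described above can be sketched roughly as follows. This is only an illustration operating on already-recognized text: the regular expressions, the `WORD_NUMBERS` table and both function names are assumptions made up for this example, not MiPad's actual grammars or API.

```python
import re

# Hypothetical vocabulary; MiPad's real grammars are not described in the talk.
WORD_NUMBERS = {"one": 1, "two": 2, "three": 3}

def parse_meeting(utterance):
    """General grammar: fill a meeting form from one free-form sentence."""
    form = {"duration_minutes": None, "attendee": None, "subject": None}
    duration = re.search(r"(\d+)[- ](minute|hour)", utterance)
    if duration:
        scale = 60 if duration.group(2) == "hour" else 1
        form["duration_minutes"] = int(duration.group(1)) * scale
    people = re.search(r"with (.+?) to talk about (.+?)\.?$", utterance)
    if people:
        form["attendee"] = people.group(1)
        form["subject"] = people.group(2)
    return form

def parse_duration_field(utterance):
    """Field-specific grammar: only '<number word> <unit>' is accepted."""
    words = utterance.lower().strip(" .").split()
    if len(words) != 2 or words[0] not in WORD_NUMBERS:
        return None
    unit = words[1].rstrip("s")
    if unit not in ("minute", "hour"):
        return None
    return WORD_NUMBERS[words[0]] * (60 if unit == "hour" else 1)

# Whole-sentence parse first, then the user taps the duration field and speaks.
form = parse_meeting(
    "Schedule a 30-minute meeting with Bill Gates to talk about review time."
)
form["duration_minutes"] = parse_duration_field("One hour.")
```

Because the second function only has to decide among a handful of duration phrases, it has far fewer ways to go wrong than the general parser, which is the intuition behind restricting the grammar once the field is known.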

Well, let me try one more text dictation. "If I get good recognition results, maybe it’s promotion time?" (Laughter.) I guess not this year. (Laughter.)

So that’s MiPad. We’re in the midst of productizing that with .NET, coming soon to a data center near you.

Thank you, Bill.

(Cheers, applause.)

BILL GATES: Part of the work there involves having a very rich model of the context, and of course that’s one of the surprising realizations is how important context is in things like these recognition systems. It’s not simply a matter of working at the speech level, but actually understanding what kinds of utterances might make sense in the different contexts.

Let me move to a different area, which is data mining. Data mining is very important to us in terms of delivering business tools to our customers. If they’re looking at sales patterns, if they’re looking at customer loyalty, they’re looking at their Web sites and how navigation is done there, it’s a huge amount of information and yet the value of insights into that information can be extremely large, and sometimes the relationships will often be surprising. So the basic goal is to both spot trends and be able to use those in a predictive way.

Likewise, the amount of information that we’re talking about is very dramatic. You know, just take documents alone; is it natural to think of your documents, the natural store of those documents being digital? Well, no, today still people do those things paper based.

As you get more and more information, like these click streams, will it be possible to actually navigate through those and mine all of the possibilities out of those things, or will it just be so prohibitive that retrieving those things, searching those things, you’d never want to do it?

So those are some tough questions and an important area for us, and let me ask David Heckerman and his group to come up and show you some of the progress we’re making there.

DAVID HECKERMAN: Thanks, Bill.

I’d like to show two data mining tools that Microsoft Research has developed in conjunction with our product group. The first one you can find in SQL Server 2000 and Commerce Server 2000 and it’s based on a statistical model called the dependency network, which is a close cousin of the Bayesian network.

So let’s use the dependency network to visualize and analyze some data that we’ve obtained from Nielsen. What you’ll see in a moment is a structural summary of TV watching behavior across 5,000 users from the month of February, 1995.

The nodes here, each node corresponds to whether or not a user watched a particular TV show, and there are a few nodes corresponding to demographics like age and occupation. And arcs, roughly speaking, correspond to statistical correlations between these nodes.

So we put a few things in this tool that make it easier to extract interesting patterns, interesting insights from this data. Over here we have a slider, which hides the weaker dependencies, so let me bring it down all the way so we’re just seeing the strongest dependencies now. And, for example, we see a strong link between whether or not you watch Friends and whether or not you watch Seinfeld. We see another strong link between whether you watch Murder She Wrote, 60 Minutes; another one between Wheel of Fortune and Jeopardy. These make sense.
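The slider behavior can be illustrated with a few lines of code. This is a minimal sketch, assuming each arc carries a numeric strength score; the scores below are invented for the example (the talk gives none), and `visible_edges` is a hypothetical name.

```python
# Hypothetical arc strengths between show nodes; not values from the Nielsen data.
EDGES = [
    ("Friends", "Seinfeld", 0.90),
    ("Wheel of Fortune", "Jeopardy", 0.85),
    ("Murder She Wrote", "60 Minutes", 0.80),
    ("Friends", "60 Minutes", 0.20),
]

def visible_edges(edges, slider):
    """Return the arcs whose strength is at least the slider threshold."""
    return [(a, b) for a, b, strength in edges if strength >= slider]

# A high threshold leaves only the strongest dependencies on screen.
strongest = visible_edges(EDGES, 0.80)
```

Setting the threshold to 0 shows every arc again.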

Another thing you may have noticed: when I first launched the program, the nodes were flying around. That was an automatic layout program that was bringing nodes together where there are a lot of links between those nodes, and so you find clusters of nodes and those clusters often reveal interesting patterns.

For example, down here we have Frasier, Seinfeld, Mad About You and Friends forming a cluster. That’s Must See TV. (Laughter.) We’ve got up here Fresh Prince of Bel Air, Ricki Lake, Married with Children, Coach, Roseanne and so forth. I’ll leave it to you to name that cluster. (Laughter.)

And finally, if you want to see the details of the statistical relationships, you can double-click on a node. So let me pick Oprah Winfrey. And what we see here is a set of rules for how to predict whether or not someone’s going to watch Oprah Winfrey. These rules lay out in a tree. Each path in the tree corresponds to a set of users, a set of watchers, and at the end of the tree we see the probability that those users or those viewers will watch Oprah encoded as a bar chart.

So, for example, if we trace this path, we see that people who watch General Hospital and people whose occupation is sales and clerical, well there’s about an 80 percent chance they’ll watch Oprah Winfrey.
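The tree of rules behind a node can be pictured as a nested structure that is walked from the root to a leaf. In this sketch only the 0.80 leaf reflects the figure quoted in the demo; the other probabilities, the dictionary layout and the `predict` function are assumptions for illustration.

```python
# Hypothetical tree for the Oprah Winfrey node. Only the 0.80 leaf comes from
# the demo; the 0.40 and 0.10 leaves are placeholders.
TREE = {
    "test": "watches General Hospital",
    "yes": {
        "test": "occupation is sales/clerical",
        "yes": {"p_watch": 0.80},
        "no": {"p_watch": 0.40},
    },
    "no": {"p_watch": 0.10},
}

def predict(tree, viewer):
    """Follow one path through the tree and return the leaf probability."""
    node = tree
    while "p_watch" not in node:
        branch = "yes" if viewer.get(node["test"], False) else "no"
        node = node[branch]
    return node["p_watch"]

viewer = {"watches General Hospital": True, "occupation is sales/clerical": True}
p = predict(TREE, viewer)
```

Each root-to-leaf path corresponds to one group of viewers, and the leaf value is what the tool draws as a bar.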

So that’s dependency networks in SQL Server and Commerce Server.

And now for something newer and for the future, this is a tool called Web Canvas. This is a useful data mining tool for analyzing Web use and in particular for looking at traffic, analyzing traffic on Web sites. So what you’re seeing here is essentially all the traffic that occurred on MSNBC.com on one particular day about two years ago.

Let me turn the color code on here. Hits are coded by color. Hits for the front page are red. Hits to news pages are yellow. Hits to technology pages are green. And each row that you see here corresponds to what a single user did on this site and the order in which they went through the site on this particular day.

So, for example, this row that I’m highlighting here corresponds to a user that first came onto the site via the front page -- that’s red -- and went to a news page, then the front page, then a news page, then the front page and then left the site for that day.

So what Web Canvas does is it takes the paths of all users that hit the site on this day, and there’s about two million of them in this case, and it clusters users together, putting users with similar paths into the same cluster. Then it shows each cluster in a window, and in particular inside the window you see examples of user activity, examples of users in that particular cluster.

So, in order to make the analysis easier, Web Canvas lays out the clusters by size, left to right and top to bottom, so the largest cluster, the cluster with the most users is shown up here in the upper left-hand corner. If you hover, you see how many users there are. In this case, there are about 50,000 users that came through the on-air page once, and that was the first page they hit on MSNBC, and then left the site for good. The next largest cluster corresponds to people who come to the front page once and then leave. The next largest cluster corresponds to people who are looking for weather information and so on.
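As a rough illustration of clustering by path and ordering the clusters by size, the sketch below groups users whose page-category sequences are identical. That is a deliberately crude stand-in: Web Canvas groups users with similar, not just identical, paths, and the talk does not say which clustering model it uses. The session data and the name `cluster_paths` are made up for the example.

```python
from collections import Counter

def cluster_paths(sessions):
    """Group sessions by identical category sequence, largest cluster first."""
    counts = Counter(tuple(path) for path in sessions.values())
    return counts.most_common()

# Hypothetical sessions: user id -> ordered page categories visited that day.
sessions = {
    "u1": ["on-air"],
    "u2": ["on-air"],
    "u3": ["front"],
    "u4": ["front", "news", "front", "news", "front"],
    "u5": ["on-air"],
}

clusters = cluster_paths(sessions)
largest_path, largest_size = clusters[0]
```

The first entry plays the role of the upper left-hand window in the demo, and its count is what hovering would show.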

Now, as Bill mentioned, data mining is about finding useful surprises, and that was certainly the case here. At the time this analysis was done about two years ago, MSNBC had just put a good deal of money into making their front page look very good and useful for navigation. Well, if you look here, many or most of the large clusters are not consistent with that behavior, and the folks at MSNBC were so surprised by this that they actually made me go back and prove to them that I had parsed the logs correctly. (Laughter.) And as you might guess, since I’m telling you the story, it turns out the parse was correct, the analysis was correct, and they made good use of this information.

So you’ve just seen two examples of where we can cope with large amounts of data. And another situation that you’re all familiar with, where we have information overload, is searching the Web. So now I’d like to introduce Sue Dumais who will get up and show a system that she’s developed that can cope with these large amounts of information in that situation.

SUE DUMAIS: The basic idea behind the system we call SWISH is to take a long list of search results that you get today with popular Web search engines and transform them into a nicely structured knowledge representation.

There are two key components to doing this. The first is back end type classification algorithms and the second is the user interface. So let me start with the classification algorithms.

To build SWISH, what we do is start with an existing knowledge structure. In this case, we started with MSN’s Web directory. You could pick any directory, domain specific or general, that’s relevant for your application. We then use machine learning techniques to build statistical models of each of these clusters or each of these categories. And finally, at run time we take search results coming from one or multiple search engines and classify them using these learned models.

So to recap, what SWISH does is it’s seeded with humanly generated categories and we use machine learning techniques to greatly extend the reach of this structured information.
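The train-then-classify flow described here can be sketched with a small text classifier. This example uses a naive Bayes model with add-one smoothing purely because it fits in a few lines; the talk does not say which learning algorithm SWISH uses, and the category names, training snippets and function names are all hypothetical.

```python
import math
from collections import Counter, defaultdict

def train(labeled_pages):
    """labeled_pages: (category, text) pairs taken from an existing directory."""
    word_counts = defaultdict(Counter)
    doc_counts = Counter()
    for category, text in labeled_pages:
        doc_counts[category] += 1
        word_counts[category].update(text.lower().split())
    vocab = {word for counts in word_counts.values() for word in counts}
    return word_counts, doc_counts, vocab

def classify(model, text):
    """Return the category with the highest smoothed log-probability."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best, best_score = None, -math.inf
    for category in doc_counts:
        counts = word_counts[category]
        denom = sum(counts.values()) + len(vocab)
        score = math.log(doc_counts[category] / total_docs)
        for word in text.lower().split():
            if word in vocab:  # words never seen in training are ignored
                score += math.log((counts[word] + 1) / denom)
        if score > best_score:
            best, best_score = category, score
    return best

# Hypothetical seed directory entries.
DIRECTORY = [
    ("Computers & Internet", "machine learning research neural networks software"),
    ("Entertainment", "spielberg movie film review trailer"),
]
model = train(DIRECTORY)
tag = classify(model, "Steven Spielberg film AI review")
```

At run time each search result would be passed through `classify` and displayed under the returned category.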

Let me show you an example of how this works. So if we ask a question about artificial intelligence, what you get with a typical search engine is a long list of search results like this. Many of them are very relevant, but you have several different meanings intermingled.

What happens with SWISH is we automatically tag each one of those and you see a nicely structured list of search results, perhaps. Uh-oh. (Laughter, applause.) We’ll give it another shot.

So what we did is take the list of search results that we got back and pull out the different meanings. What you can see immediately in this result list -- or sort of immediately -- is that there are several different meanings of artificial intelligence. Many have to do with computers, which is why you’re here. Many have to do with the movie. You also see some applications to health and fitness.

Users can interactively refine these categories, so if you want to see more details about computers and the Internet, what you’ll see is there are again several different meanings, some having to do with general computer science. I suspect many of you will see your sites there. There are also sites about tech transfer, some about software and downloads.

Now, we think that it’s important to empirically evaluate these artificial intelligence techniques, and we’ve done so in a series of user studies, using the SWISH interface. This just shows you an example of results from these kinds of studies. One of the things we find over and over again is that people searching through results like this are much faster using the SWISH groupings, about 40 percent faster than they are with a standard list result. They also much prefer this interface.

So I think SWISH is a nice example of a system that combines innovations in learning and in user interface design to allow users to quickly find the information they want.

DAVID HECKERMAN: Thanks, Sue.

Sometimes when you’re searching the Web, you’re looking for a particular document or a set of documents. Other times you’re looking to answer a specific question. I’d like to introduce Eric Brill, who will show you a system that he’s recently developed for doing just that. Eric.

ERIC BRILL: Thank you, David.

So I’m going to show you a quick demo of a question-answering system we’ve been developing at Microsoft Research. The basic idea behind this system is that given a question, we first go off on the Web and try to retrieve documents. In reality, most of the documents that we receive tend to be irrelevant, with just a small number of them actually being relevant. Then using a combination of statistical analysis and natural language processing, we take that large document set and from it we distill what is hopefully the correct answer to the question.

Now, rather than getting bogged down by the vast amount of information available on the Web, our system actually benefits from it as it makes use of information redundancy, and as such it should continue to improve as the Web grows.
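One simple way to picture "making use of information redundancy" is a voting scheme: collect candidate words from many retrieved snippets and prefer the one that shows up in the most snippets. The sketch below does exactly that with single words. It is an assumption-laden toy, not the system demonstrated here; the stopword list, the snippets and the `answer` function are invented for illustration.

```python
from collections import Counter

# Hypothetical stopword list, kept tiny for the example.
STOPWORDS = {"a", "an", "the", "was", "by", "in", "of", "is"}

def answer(question, snippets):
    """Return the candidate word that appears in the most snippets."""
    question_words = set(question.lower().rstrip("?").split())
    votes = Counter()
    for snippet in snippets:
        words = {w.strip(".,").lower() for w in snippet.split()}
        # Each snippet votes once per candidate so a single page cannot dominate.
        candidates = {
            w for w in words
            if w and w not in question_words and w not in STOPWORDS
        }
        votes.update(candidates)
    return votes.most_common(1)[0][0] if votes else None

# Hypothetical retrieved snippets.
snippets = [
    "AI was directed by Steven Spielberg.",
    "Spielberg directed AI in 2001.",
    "The movie AI, a Spielberg film.",
]
best = answer("Who directed the movie AI?", snippets)
```

More snippets containing the right answer simply add more votes, which is why a scheme like this would be expected to improve as the pool of documents grows.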

All right, so let me try a few questions to give you a feel for the system’s capabilities. Okay, so the first one actually came up a few weeks ago when my wife and I were fighting about what movie to rent. What is the name of the movie starring Tom Hanks where he was stranded on a deserted island? So again we’re sending off that question. We’re retrieving a bunch of documents and we’re sifting through those documents trying to find an answer. (Laughter.)

COMPUTER VOICE: That’s easy: Castaway.

ERIC BRILL: Okay. Next, since this is an artificial intelligence conference, let’s make sure it knows the answer to this question: What does AI stand for? (Laughter.)

COMPUTER VOICE: I know that: Artificial Intelligence.

ERIC BRILL: Okay, and presumably most people here have seen the movie AI, so let’s ask it a question about that: Who directed the movie AI? (Laughter.)

COMPUTER VOICE: That’s easy: Steven Spielberg.

ERIC BRILL: Okay, now just to show that the system is by no means perfect, let’s say I’m off to a meeting with Bill Gates and I want to make small talk about his family, but I can never remember his wife’s name, so let me ask: Who is Bill Gates married to? Let’s see what we get. (Laughter.)

COMPUTER VOICE: I’m not sure. The answer is either Melinda French or Microsoft. (Laughter, applause.)

DAVID HECKERMAN: Okay, Eric, I think we’d better leave it there. (Laughter.) These are two great examples of machine learning applied to search and information retrieval. And, Bill, back to you.

(Applause.)

BILL GATES: I’ve never seen those last two before. Those look great. (Laughter.)

One last area that we wanted to show you some of the work we’re doing is reasoning and adaptation. This to us is a very big deal because people talk about information overload. They talk about junk mail. They talk about having to go out and take a lot of steps to find the things they care about. And particularly as you’re going to have these wireless networks and always have either a pocket-sized device or a tablet device or perhaps a wrist-sized, watch-sized device, where you’ll always be in touch, the question of what’s interesting to you, what’s important to you, what’s your context that determines when those things are valuable, that is a critical application that if we don’t have it, these systems will bring more wasted time than they save people time.

And so let me ask Eric Horvitz to come up with his team and show us what kind of work we’re doing there.

ERIC HORVITZ: Hi, Bill. Hi. Thanks.

I'd like to briefly tell you about some work we've been doing on sensing and reasoning about a user's context and leveraging inferences about context to give users greater control of information and communications, so that users can get the right information at the right time, on the right device.

So first, here's a display generated by a system named Priorities. Priorities learns how to assign measures of urgency to incoming e-mail messages. It actually computes an expected cost of delayed review for each message.
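An expected cost of delayed review can be written as a probability-weighted sum: for each urgency class, multiply the probability that the message belongs to that class by the cost per minute of leaving such a message unread, then scale by the delay. The sketch below assumes a linear cost in time and two classes; the class names, probabilities and cost rates are hypothetical, and nothing here reflects how Priorities actually maps its result to scores such as 95.

```python
def expected_cost_of_delay(class_probs, cost_rates, delay_minutes):
    """Sum over classes of P(class) * cost-per-minute(class), times the delay."""
    rate = sum(p * cost_rates[c] for c, p in class_probs.items())
    return delay_minutes * rate

# Hypothetical classifier output for two messages.
urgent_looking = {"urgent": 0.9, "normal": 0.1}
junk_looking = {"urgent": 0.05, "normal": 0.95}

# Hypothetical cost, in arbitrary units per minute, of each class going unread.
COST_RATES = {"urgent": 1.0, "normal": 0.1}

cost_a = expected_cost_of_delay(urgent_looking, COST_RATES, delay_minutes=60)
cost_b = expected_cost_of_delay(junk_looking, COST_RATES, delay_minutes=60)
```

Sorting messages by this quantity puts the ones that are most expensive to ignore at the top of the display.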

Now, you see here at the top a message from Eric Xing, an intern from UC Berkeley working in our group this summer, that was assigned an urgency value of 95, and you'll see other urgency scores going all the way down here to a message that's trying to look important, but it really is something quite unimportant. You can just compare the content of that message to the kind of message that Eric Xing was sending me. As you can see here, he mentions that he was trying to find me and I was not in my office. I guess these days I'm getting harder to find in my office and I had some trouble that day coordinating with Eric.

So that shows you some of the functionality of the Priorities system. What's going on behind the scenes? To give you a little bit of background, we're actually inferring urgency from multiple facets of messages, looking at the structure of the header and body, patterns of text, using a linguistic analysis. We're also considering the communication history and sender/recipient relationships, accessing an online org chart to see what the relationship is between the sender and the recipient of the e-mail.

Priorities actually uses the inferred urgency, in conjunction with reasoning about my context, determined by listening to the ambient acoustics, looking at my calendar, and watching my activity, to figure out when to alert me on my desktop with important e-mail coming in, and when to send information over to a mobile device so I receive the most important information, given my setting, for example, considering whether I'm in a meeting.

Now, the Priorities system is just one piece of a larger project named the Notification Platform. There's a lot more in life than e-mail and appointment information. We also consider instant messaging and telephone calls, news and financial information, even the services provided by a set of agents that might come forward and help a user at the possible cost of disruption, and handling the error messages we might see once in a while on our computers.

The idea is to take these diverse messages and communications that are coming into a user's world and to port them through a general decision-theoretic notification manager that's checking the cost and benefits of that information, and to consider if, when, and how to route the information to different devices. We reason from multiple sources of information about users' context, allowing us to compute probability distributions over a user's location and attention. And we harness that information to infer the best time and way to interrupt users with the right information.

So let me show you what's going on behind the scenes. We've got a prototype running here, and that system has been looking at my computer desktop. It also examines my calendar. This computer is set up as my office machine right now, so it infers I'm at a meeting in my office from the calendar. It knows the meeting ends in less than 30 minutes. It sees that my computer activity is at the top level. It hears my voice right now. And using a Bayesian vision system, that's been up and running and looking for faces, it sees me in front of the computer and right now it notes that I am gazing at the display.

Given the information from those separate sources of evidence, the system is computing my attentional state, for example, whether it's high-focus solo or low-focus solo activity or conversation in the office right now, as well as my location, inferring that I am at a desktop system.
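[Editor's illustration] As a rough sketch of how separate evidence sources could be combined into a probability distribution over attentional state, the example below applies Bayes' rule and assumes the observations are conditionally independent given the state (a naive Bayes simplification; the talk does not say which model structure the prototype uses). The state names follow the talk; the evidence names and every probability are invented for illustration.

```python
def attention_posterior(prior, likelihoods, observations):
    """Posterior over attentional states given binary observations.

    prior: state -> prior probability.
    likelihoods: evidence name -> {state -> P(evidence is true | state)}.
    observations: evidence name -> bool (what the sensors report).
    Assumes observations are conditionally independent given the state.
    """
    unnormalized = {}
    for state, p in prior.items():
        for name, seen in observations.items():
            p_true = likelihoods[name][state]
            p *= p_true if seen else (1.0 - p_true)
        unnormalized[state] = p
    total = sum(unnormalized.values())
    return {state: p / total for state, p in unnormalized.items()}


# Hypothetical model: all numbers are made up for this sketch.
prior = {"high_focus_solo": 1 / 3, "low_focus_solo": 1 / 3, "conversation": 1 / 3}
likelihoods = {
    "voice_heard": {"high_focus_solo": 0.05, "low_focus_solo": 0.10, "conversation": 0.90},
    "gazing_at_display": {"high_focus_solo": 0.90, "low_focus_solo": 0.50, "conversation": 0.40},
}
observed = {"voice_heard": True, "gazing_at_display": True}
posterior = attention_posterior(prior, likelihoods, observed)
most_likely = max(posterior, key=posterior.get)
```

With these made-up numbers, hearing a voice outweighs the gaze evidence and the most probable state is conversation in the office.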

The idea is to use that contextual information to continue to reason about information that I might want to see. We have a "universal inbox" here that accepts information from multiple sources that I have subscribed to, all speaking the same metadata language to the Notification Manager. In this case, the system decides that it should relay an alert about information from John Platt about a meeting on the Attentional UI project later today.

As you can see in this list here in the universal inbox, we have some outputs from agents, some financial information, information from e-mail and news sources, including a story here on a variant of the Code Red virus spreading around today, and so on. Each item is assigned an actual dollar value that captures the net value of my seeing that information right now. We also dynamically compose the kind of alerting that might go on, given the setting; in this case, I'm most likely sitting in my office chatting with somebody, for example, and the system has decided to use a visual notification coupled with an audio herald.

Now, if I take the system offline for a second here and cut off the live information feed and bring in a scenario that says I have been away from the office for four hours, notice that now the best mode for several items at the top of this list is to send them via a pager, and those items will go out and I'll be paged with those items. But for the items with a dollar value "in the red," it's not worth bothering me with them right now, and that's the basic idea. We want to give users a means for getting the right information at just the right time and suppressing alerts about information that can be seen later.
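[Editor's illustration] The routing decision described here can be sketched as a small cost-benefit comparison: for each channel, estimate the net dollar value of alerting now, and defer the item if no channel comes out positive. The channel names, the probabilities that an alert is actually received, and the disruption costs below are hypothetical, and the simple "value times probability minus cost" formula is an assumption, not the Notification Platform's actual model.

```python
def best_action(item_value, channels):
    """Pick the channel with the highest positive net value, else defer.

    item_value: dollar value of the user seeing the item now.
    channels: channel name -> (probability the alert is received,
                               dollar cost of the disruption).
    Returns (action, net_value); action is "defer" when every channel's
    net value is zero or negative ("in the red").
    """
    best_name, best_net = "defer", 0.0
    for name, (p_received, disruption_cost) in channels.items():
        net = item_value * p_received - disruption_cost
        if net > best_net:
            best_name, best_net = name, net
    return best_name, best_net


# Hypothetical context: user has been away from the office for hours, so a
# desktop alert will not be seen, while a pager works but is more intrusive.
away_from_office = {"desktop": (0.0, 0.1), "pager": (0.9, 2.0)}
urgent_action, urgent_net = best_action(10.0, away_from_office)
minor_action, minor_net = best_action(1.0, away_from_office)
```

In this sketch the high-value item is sent to the pager, while the low-value item is deferred because every channel would cost more in disruption than the item is worth right now.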

These projects have already had impact on products. About a month ago, Microsoft released a product called Outlook Mobile Manager that is based on key components of Priorities. Mobile Manager lets users define a set of profiles about different aspects of their lives and then learns to guide the most urgent information to their mobile devices. But more importantly and more exciting for our team is having influence on the infrastructure and interfaces being used in the Microsoft .NET initiative. We've been working very closely with .NET teams to integrate key aspects of this technology into a .NET infrastructure that others, including the research community, can build upon.

Finally, I'd like to say that there are significant challenges beyond the core functionalities that I've described. These include the development of compelling user interfaces for assessing user preferences about information and communication, and, perhaps even more important, methods for ensuring privacy and security. When it comes to intelligence services that offer value in return for information about users' activities, we as a community need to come up with some really powerful and compelling tools that allow people to deeply trust these agents. We're working at that at Microsoft Research right now and it's a very interesting topic for the larger research community as well.

I'd like to now invite Nuria Oliver and Ashutosh Garg up to show you some more about what we've been doing with the use of machine learning for robust context assessment in different settings. Nuria is a Researcher on our team at Microsoft Research and Ashutosh Garg is an intern from the University of Illinois, working with our team this summer.

NURIA OLIVER: Thank you, Eric. Can you all hear me?

There are certainly very important intelligent problems when learning and building models to infer the context. In some of the systems that we build in our group, we have been tackling some of these problems. Among them I would emphasize the ability to combine information coming from very heterogeneous streams of data; in our case, it’s mostly vision, audio and also the computer activity. Another important problem is how to build representations at different levels of abstraction. And finally another important problem is how to build systems automatically that aren’t supervised.

What I’m going to show now is a prototype of the latest system that we have developed. Ashutosh, the intern who is working in our group, is going to show the demo with me. This system that you can see on the screen is a real time system, an actual model of a real time system for this situation.

The system is able to recognize in real time what kind of activity is happening in the office. We have these models of typical activities. Right now, for example, you can see that it’s detecting that there are two people in the room, Ashutosh and I, and we’re probably having a face-to-face conversation.

At the core of the system we have a hierarchical model that we have created. Let’s listen now to Ashutosh to see how he explains a little more about the system.

ASHUTOSH GARG: Actually now we consider the spectral analysis and the energy of the audio signal and the color and motion distribution of the visual scene. The session of the computer, the activity is also being taken into account. These features are then used to both characterize and localize the source of the sound and to infer the number of persons that are represented in the scene.

At the highest level, we use these in part to reason about the situations in which the user may be in and compute the likelihood, as are being plotted over there in this graph. And as we can see, it has just detected that I am giving a presentation over here. Thank you.

(Applause.)

BILL GATES: Well, so where do we go from here? Again, I’m optimistic that these advances are going to come absolutely at a very rapid pace. We need these advances to build the kind of software tools that really count. Software can’t be so low level that it doesn’t understand what the user is trying to do, that it isn’t able to look at text and help the user with that.

And so we’re proceeding full speed ahead with growing the R&D that we do, really believing in the advance in science that you all represent. We want to contribute to the community and we look forward to being able to take your work and deliver it out to hundreds of millions of users.

Thank you.

(Applause.)
