For over 20 years, Microsoft Research’s labs around the world have focused on research across a broad spectrum of topics in computer science. From the start, the organization has invested heavily in pioneering breakthroughs in machine intelligence, including efforts in machine learning and big data. In this interview, Distinguished Scientist Eric Horvitz talks about advances he sees on the horizon, the influence they will have on your daily life, and how insights from big data and developing more intelligent software and services will change the world.
At Microsoft Research labs around the world, some very deep thinkers are contemplating big data. This includes Eric Horvitz, distinguished scientist at Microsoft and co-director of Microsoft Research’s Redmond lab, who was recently elected to the National Academy of Engineering for his work in “computational mechanisms for decision making under uncertainty and with bounded resources.”
He sees a future where machines, fueled by large amounts of data, can become “empowering, lifelong digital companions” who know what you want or need (be it pizza or medicine), where you want to go (be it Hawaii or the most traffic-free route to the ball game) and generally work with a passion on your behalf.
Capturing data, storing it, interpreting it, and leveraging it can provide insights on small and large scales, and in high-tech and mainstream fields alike, Horvitz said.
“In today’s world, effective large-scale data analytics for predictive modeling, visualization, and discovery are becoming central for success in many areas.”
Microsoft News Center recently spoke to Horvitz about how Microsoft Research (MSR) is investing time and talent in the area of big data and machine intelligence, what breakthroughs MSR has made, and his vision for the future of these fields.
MNC: Why do you think there is such a buzz around big data right now?
Horvitz: Buzzwords arise for a variety of reasons. In this case, I believe a confluence of several factors led to the popular use of that catchy phrase. One is the data that’s being collected in unprecedented quantities now on a variety of fronts, and advances in computer science – in sensing, storage and networking. Large amounts of data are being collected in part because of the shift of many human activities to the Web – and that has made it easy to collect transactions and events of various kinds in stream with activities. This includes everything from e-commerce to cars driving over sensors in roads to smartphone services leveraging location data, to healthcare. In healthcare, the explosion of genomics and the increasing capture of clinical data in hospitals have brought gigabytes and terabytes of patient data into databases – and we are in the early days of biomedical informatics. Storage also has become very inexpensive compared to what it used to be. We used to talk about maybe one day having terabytes of data. Now terabytes are something your kids can carry on a small drive in their pocket as they go to middle school. On the computational side, there have been advances with computational procedures we use to harness data for multiple interesting uses – such as building predictive models from data. As examples, we can leverage data to make real-time predictions about a computer user’s changing intentions or interests and learn to recognize someone’s gestures. We can learn from patient data to predict the likelihood that a patient will be readmitted after their discharge from a hospital.
MNC: What sets Microsoft Research’s machine learning research apart from others in the field?
Horvitz: Microsoft Research is well known as an open research lab where we promote research freedom to publish on our results and advances. That has attracted the best and the brightest people. Folks at MSR are energized by a stream of interesting real-world challenges. They also have access to large data resources – and the tantalizing opportunity to get one’s best ideas into the hands of hundreds of millions of people. Our researchers investigating machine learning are very much part of the larger community of researchers worldwide pursuing studies in machine intelligence. Beyond machine learning, this research includes machine perception, automated reasoning and decision making. Machine learning runs deep in the DNA of Microsoft Research; the area of work was one of a few early critical priority areas that we invested in.
Today, people doing machine learning research across our labs are a substantial intellectual force. This includes teams of deep thinkers working on core principles as well as applications. We have teams of folks doing machine learning in Redmond, Cambridge, Beijing, Bangalore, Silicon Valley, New England and New York City. Together, these groups form one of the largest machine learning efforts in the world.
MNC: What are some ways that MSR machine learning research has found its way into Microsoft products?
Horvitz: Numerous efforts have found their way into Microsoft products and services. Many of these successes stem from very close collaborations between people at MSR and folks on the product teams. As one example, Microsoft Research did the core work on learning how to rank items. This work led to Bing’s core methods for ranking search results in response to user queries. MSR is also well known for its work in vision systems – machines that can see and recognize what they’re seeing – as well as speech recognition and translation. When you use Bing voice search or Bing translator, you’re leveraging core MSR machine learning efforts.
Our Cambridge team is well known for methods that learn to understand how to take an image and to segment and categorize it; this valuable and innovative work was a critical enabler for the Kinect, which can identify people and their gestures in a room.
MSR is also known for applying machine learning research in the field of biomedical informatics and other aspects of clinical healthcare. In the Redmond lab, we’ve had major efforts in harnessing and utilizing the large quantities of clinical data coming out of hospitals now to build predictive models for guiding decision-making in hospitals. These systems are at work in hospitals as I speak, enhancing healthcare. Other applications are Bing Maps and Bing Directions, which provide traffic-sensitive directions for 72 cities in North America. Bing Directions uses methods from MSR that showed how we can learn from histories of traffic data how to predict real-time flows on all streets in a greater city region. Machine learning even occurs deep in the Windows operating system. MSR teamed with Windows to develop a real-time prefetching system that runs in Windows 7 and Windows 8. Windows continues to learn from users about their patterns of activity and then makes predictions about next actions – making the operating system even faster.
MNC: What are some goals of this extensive machine intelligence research?
Horvitz: The directions and goals are broad, from explorations of the basic science of machine learning to understanding how to best solve particular classes of data and perform specific tasks. We also explore the development of more efficient and powerful tools to support the engineering practice of machine learning. On this front, we’ve been exploring the development of tools and methods that let non-experts or semi-experts do a great job with their own predictive modeling and data analytics. This is a very, very interesting challenge – to put the power in the hands of end users – typically, this kind of analytical power has only been in the hands of machine learning experts and statisticians.
MNC: That sounds like an immense challenge. Where do you start in trying to make machine intelligence available to the masses?
Horvitz: In machine learning, numerous algorithmic procedures have been developed; each typically comes with levers and knobs for tuning the methods to the data and task at hand. Questions arise about which method is best to use for a particular dataset and learning task. There are also challenges with cleaning, preparing and anonymizing raw data so it can be easily processed and analyzed. There are multiple danger zones in machine learning, and new kinds of tools can help people to specify what it is they want to learn and how to validate the accuracy of the predictions made by the models that they build. Then there’s decision making. This centers on how to guide actions and policies in the world based on predictions. We’re working to create new kinds of tools that guide data collection, analysis and testing – and that also provide end users with insights about visualization and decision making.
MNC: What are some of the other hurdles in the world of machine learning?
Horvitz: One challenge that we’ve been taking on is machines that can understand and even translate conversational speech. Sometimes small gains in accuracy have big implications for the competency of a system. Recently, (MSR Chief Research Officer) Rick Rashid demonstrated in front of a large audience in Tianjin, China, the ability to do real-time translation from English to Mandarin Chinese. He was talking freely and having his speech translated and then re-rendered in his own voice – he was speaking Mandarin in real time. That translation pipeline was enabled by several technologies, but in some ways the most salient innovation was a surprising increase in the accuracy of speech recognition for conversational speech. That’s just happened in the last couple of years, and was the result of research and experimentation at MSR on new directions in machine learning.
MNC: So what aspects of big data will Microsoft Research focus on?
Horvitz: There are so many fun and promising directions. I have to say, it’s really an exciting opportunity area – and we’re at an exciting time. Looking out at the longer-term future, I expect that machine learning, and machine intelligence more broadly, is going to provide us with foundational new tools for doing scientific research, and that many breakthroughs over the next few decades will come as a collaboration between people and the machine learning and reasoning tools. There are opportunities to learn new things from large amounts of data, including getting to the bottom of healthcare mysteries by going through data with automated learning tools – some of which can recognize causality, that A actually causes B.
Another direction is working to weave together a set of technologies – machine learning, speech recognition, natural language understanding, machine vision and decision making – to create systems that act like bright collaborators and that complement human intellect in new kinds of ways.
On another front, there’s a great deal of opportunity to do new kinds of search and retrieval on the Web. We’re also applying machine learning in new ways to pick out signals in large amounts of population data. For example, in recent work, we’ve developed a way to discover clues about medication side effects in anonymized search logs. I believe that data-centric methods will change the world in so many ways, with influences on health, education, science and commerce.
MNC: If you were to get a bit Jules Verne, what could all of this research mean for the future?
Horvitz: Looking out to the future, I believe that there’s an opportunity to build systems that really become empowering, lifelong digital companions that deeply understand what it is you want to do, where you want to go, what you want to learn, what you need to do to stay healthy, what you’re good and less good at, and that continue to work on your behalf to assist and to complement you. Work on several fronts is already providing some foreshadowing wisps of wider possibilities.
MNC: Why did you get into this field?
Horvitz: I have long been interested in understanding the human mind and my curiosity led me from biology to physics to the world of information and computation. Beyond that core pursuit, I’ve come to be excited over the years with applying principles of learning and decision making in real-world applications that provide value – while somehow being related to the big questions about thinking systems. I’ve had a blast working with and alongside fabulous colleagues on principles and applications. And at a place like Microsoft Research, we all have this tantalizing “lever” in mind – with a fulcrum at the horizon. Our next innovation or idea could really move the planet, via having an influence on Microsoft’s products and services.
MNC: All in a day’s work, huh?
Horvitz: [Laughing] Exactly. But I’m serious about this; we’re not kidding around.
MNC: The Harvard Business Review has declared the data scientist the new sexiest job.
Horvitz: That’s great. You might say that, in some ways, computer science and other engineering fields have suffered over the years in that people making career choices had been looking for “noble endeavors” – in fields like healthcare and law. I believe that the computational sciences are becoming the noble endeavors of our time, because computing enables so many other things from aerospace to healthcare to science to law to government.