Sumit Basu

Principal Researcher

About

I’m a Researcher in the Knowledge Tools Group at Microsoft Research, Redmond. My research focuses on developing interactive, machine-learning-based power tools that help users understand and extract answers from complex data. I work in a variety of domains, including computer systems, music and sound analysis/synthesis, interactive selection, and more.

Projects

Sho: the .NET Playground for Data

Established: November 8, 2010

Sho is an interactive environment for data analysis and scientific computing that lets you seamlessly connect scripts (in IronPython) with compiled code (in .NET) to enable fast and flexible prototyping. The environment includes powerful and…

BLEWS – what the blogosphere tells you about news

Established: February 18, 2008

While typical news-aggregation sites do a good job of clustering news stories according to topic, they leave the reader without information about which stories figure prominently in political discourse. BLEWS uses political blogs to categorize news stories according to their…

Background

I’m Sumit Basu, a Principal Researcher in the Medical Devices Group at Microsoft Research, Redmond. My research focuses on developing interactive, machine-learning-based power tools that help users understand and extract answers from complex data: physiological signals, teaching material and textbooks, computer systems, auditory signals such as speech or music, scientific data, document collections, or the web. These power tools sometimes work by observing users as they perform a task, then assisting them once the tool understands what’s going on; in other cases (as in teaching) they provide inputs to the user and adaptively refine their strategy based on what works best. The interactive aspect comes from having humans in a tight loop with the learning algorithm: instead of getting a big batch of labeled data, interactive learning tasks involve a delicate dance between the human and the algorithm to achieve sufficient performance with a minimum of operator effort.

I received my BS (1995), MEng (1997), and PhD (2002), all from MIT in Electrical Engineering and Computer Science. I did my graduate work at the Media Lab with Professor Alex Pentland. My doctoral thesis, “Conversational Scene Analysis,” examined how machine learning and signal processing techniques could be used to understand the structure of conversational interactions from auditory signals without recognizing words. The common thread through all of my work to date has been the combination of human interaction and machine learning; fortunately there is an endless array of application areas of this ilk, especially if one is flexible in one’s definition of interaction.

These days, I’m particularly interested in how we can use such technologies to detect, analyze, and derive insights from physiological signals with the goal of helping patients monitor and improve their cardiovascular health. This is a deep and complex area, involving problems in signal processing, signal quality estimation, real-time classification, and data mining, as well as fundamental aspects of cardiovascular physiology. If you’re a bright graduate student interested in such problems and curious about internship opportunities, drop me a line!

Spotlight

I recently joined the Medical Devices Group at Microsoft Research. We’ll have much more to say about the exciting project we’re working on very soon.

Our new paper, “Deep Questions without Deep Understanding”, on a new technique for generating high-level (deep) questions from large spans of text (i.e., entire Wikipedia sections, as opposed to individual sentences), will be appearing in July at ACL 2015.

Projects

Current Projects

ML for Physiological Signals: using machine learning to detect, analyze, and derive insights from physiological signals to help patients monitor and improve their cardiovascular health.

Earlier Projects

  • Teaching with Machine Learning: using machine learning to help students and teachers of all ages and all types of educational goals achieve their objectives more effectively and efficiently.
  • Sho: a powerful interactive environment for scientific computing and prototyping based on IronPython. Find out more and download it. Also check out this code for getting real-time skeleton data from Kinect in Sho.
  • Songsmith: a songwriting tool that takes melodies and helps develop accompaniments for them. Based on this research with Dan Morris, it’s now a product (with much help from the MSR Advanced Development Team). Check it out and download the trial here. It’s also now free to many educational institutions via MSDN Academic Alliance and the Innovative Teachers’ Network.
  • StickySorter: a tool for affinity diagramming and other flavors of information organization that I developed with Julie Guinn and Office Labs; you can download it here.
  • Music Analysis/Synthesis: using machine learning to help users understand, manipulate, and create music
  • Systems and Machine Learning: using machine learning to address problems in computer systems
  • Conversational Scene Analysis: seeking structure and content from conversational patterns

Activities

Community Activities