
Rich Caruana

Senior Researcher

About

Rich Caruana is a Senior Researcher at Microsoft Research. Before joining Microsoft, Rich was on the faculty in the Computer Science Department at Cornell University, at UCLA's Medical School, and at CMU's Center for Learning and Discovery. Rich received his Ph.D. from Carnegie Mellon University, where he worked with Tom Mitchell and Herb Simon. His thesis on Multi-Task Learning helped create interest in a new subfield of machine learning called Transfer Learning. Rich received an NSF CAREER Award in 2004 (for Meta Clustering) and best paper awards in 2005 (with Alex Niculescu-Mizil), 2007 (with Daria Sorokina), and 2014 (with Todd Kulesza, Saleema Amershi, Danyel Fisher, and Denis Charles). He co-chaired KDD in 2007 (with Xindong Wu) and serves as area chair for NIPS, ICML, and KDD. His current research focuses on learning for medical decision making, transparent modeling, deep learning, and computational ecology.

Projects

Intelligible, Interpretable, and Transparent Machine Learning

The importance of intelligibility and transparency in machine learning

Most real datasets have hidden biases. Being able to detect the impact of the bias in the data on the model, and then to repair the model, is critical if we are going to deploy machine learning in applications that affect people’s health, welfare, and social opportunities. This requires models that are intelligible. In machine learning, there is often a tradeoff between accuracy and intelligibility: the…

Machine Learning on the Edge

In a few years, the world will be filled with billions of small, connected, intelligent devices. Many of these devices will be embedded in our homes, our cities, our vehicles, and our factories. Some of these devices will be carried in our pockets or worn on our bodies. The proliferation of small computing devices will disrupt every industrial sector and play a key role in the next evolution of personal computing. Most of these devices…

Dual Embedding Space Model (DESM)

Established: January 21, 2016

The Dual Embedding Space Model (DESM) is an information retrieval model that uses two word embeddings, one for query words and one for document words. It takes into account the vector similarity between each query word vector and all document word vectors. A key challenge for information retrieval is to model document aboutness. The traditional approach uses term frequency, with more occurrences of a query word indicating that the document is more likely to be…
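The idea of scoring a document by comparing each query word vector against all document word vectors can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's implementation: the tiny two-dimensional embeddings are invented for the example, and the choice to summarize the document as the centroid of its unit-normalized word vectors and then average the per-query-word cosine similarities is one plausible way to aggregate that the description above does not spell out. The function and variable names are hypothetical.

```python
import math


def normalize(vec):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return list(vec) if norm == 0.0 else [x / norm for x in vec]


def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either has zero length)."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def dual_space_score(query_words, doc_words, query_emb, doc_emb):
    """Score a document against a query using two separate embeddings.

    Assumption: the document is summarized as the centroid of its
    unit-normalized word vectors, and the score is the mean cosine
    similarity between each query word vector and that centroid.
    Words missing from an embedding table are skipped.
    """
    query_vecs = [query_emb[w] for w in query_words if w in query_emb]
    doc_vecs = [normalize(doc_emb[w]) for w in doc_words if w in doc_emb]
    if not query_vecs or not doc_vecs:
        return 0.0
    dim = len(doc_vecs[0])
    centroid = [sum(v[i] for v in doc_vecs) / len(doc_vecs) for i in range(dim)]
    return sum(cosine(q, centroid) for q in query_vecs) / len(query_vecs)


# Invented toy embeddings: one table for query words, one for document words.
QUERY_EMB = {"cambridge": [1.0, 0.0], "university": [0.0, 1.0]}
DOC_EMB = {
    "cambridge": [0.9, 0.1],
    "college": [0.2, 0.8],
    "students": [0.1, 0.9],
    "giraffe": [-0.7, -0.3],
    "savanna": [-0.6, -0.4],
}

query = ["cambridge", "university"]
on_topic = ["cambridge", "college", "students"]
# Mentions a query word once, but is mostly about something else.
off_topic = ["cambridge", "giraffe", "savanna", "giraffe"]

on_score = dual_space_score(query, on_topic, QUERY_EMB, DOC_EMB)
off_score = dual_space_score(query, off_topic, QUERY_EMB, DOC_EMB)
print(f"on-topic: {on_score:.3f}  off-topic: {off_score:.3f}")
```

In this toy setup both documents contain the query word "cambridge", yet the on-topic document scores higher because its other words also lie close to the query words in the embedding space. That is the kind of evidence a pure term-count signal would miss.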

CodaLab

Established: March 21, 2014

CodaLab is an open-source web-based platform that enables researchers, developers, and data scientists to collaborate, with the goal of advancing research fields where machine learning and advanced computation are used. CodaLab helps solve many common problems in the arena of data-oriented research through its online community, where people share worksheets and participate in collaborative competitions.

Immutable experimentation that ensures reproducibility

Today’s data-driven research and development is stymied by an inability of scientists and their collaborators…

