I lead the research and development activities in the Cloud and Information Services Lab at Microsoft. My group comprises a mix of researchers and engineers working on various projects in the “Big data” space. Efforts range from rapid prototyping to building production-quality systems, releasing code as open source, and publishing papers in top Systems conferences.
Several of the projects initiated in the lab have been transferred to product teams. Over the years, we have worked closely with Microsoft’s Big Data teams and helped them embrace and contribute to open source software (OSS).
As an applied research lab, we work on a wide range of projects related to analytics at datacenter scale, with a practical bent. Our projects span areas such as cluster resource management, tiered storage, service analytics, query optimization, and stream processing. (Please see the Projects tab for details about our projects.) Several of the systems we have built have been deployed internally on datacenter-scale clusters (spanning hundreds of thousands of machines).
We contribute extensively to open source (in particular, Apache Hadoop). Team members are Hadoop committers and PMC members, and are well respected in the OSS community.
On a personal note, I am broadly interested in building storage and compute infrastructure for datacenter settings. I enjoy building and deploying systems in practice as well as releasing them as open source. In building these systems, my work leverages technology trends in datacenter computing.
Some of the systems I have previously built and released as open source projects are:
Kosmos distributed filesystem (KFS): I designed, implemented, and deployed KFS to manage petabytes of storage.
Sailfish: I also designed and implemented Sailfish, a compute infrastructure that improves the handling of intermediate data (i.e., the “shuffle” phase of a Map-Reduce computation). Our results show that Sailfish can improve job completion times at scale by 20% to 5x.
I also collaborate extensively with colleagues in MSR-Redmond.