The summer of data science

October 8, 2015 | Posted by Microsoft Research Blog

While we live and breathe data science year-round at Microsoft Research, this summer we offered young researchers a broad range of data science education opportunities. Participating in these events was extremely rewarding for students and organizers alike.

Students and Advisors at the National Water Center for the Summer Institute

DS3 announcement leads the way

The educational blitz really began in the spring, when we announced openings for the upcoming Data Science Summer School (DS3), an eight-week course (June 15 to August 7) held at the Microsoft Research New York City Laboratory. Limited to just eight top undergraduates, DS3 provided hands-on training and an in-depth understanding of data science. The students not only learned how to acquire, clean, and use the “messy” real-world data that is the raw material of today’s research, but were also introduced to problems in applied statistics and machine learning.

National Flood Interoperability Experiment Summer Institute 2015

While DS3 may have been the first program announced, the first student summer event to actually get under way was a Summer Institute held in association with the National Flood Interoperability Experiment (NFIE), which began on June 1 in Tuscaloosa, Alabama. Fifty students from around the world spent seven weeks learning about and analyzing US hydrology data that had been brought together for the first time. In the run-up to the event, my Microsoft colleagues and I helped with the data architecture and cyberinfrastructure on Microsoft Azure. During the event, we trained the students, and continued to mentor them, on how to use the cloud and Azure ML in their research projects.

NFIE Summer Institute 2015

The “flood institute” culminated at the 3rd CUAHSI Conference on Hydroinformatics at the University of Alabama, where students presented their group projects in talks and poster sessions. Several participants went on to submit papers and present their results at other events, including the NSF Data Science Workshop, hosted by the University of Washington and New York University from August 5 to 7. This Seattle-based event invited students to submit white papers on data science research, and the authors of the top 100 papers were invited to attend and present posters on their research. Several Microsoft employees, including two from Microsoft Research, took part as panelists, speakers, and mentors. What struck me most was that every poster at the NSF Data Science Workshop relied on multidisciplinary collaboration to drive its research project.

I participated in the National Flood Interoperability Experiment and the [NSF] Data Science workshop. I gained numerous new acquaintances, some of whom I now consider pals, and two projects currently underway that will lead on to publication… [I was impressed by the] diversity of ideas and curiosity to look outside my own little world of research problems.

—Solomon Vimal, visiting scholar, University of North Carolina at Chapel Hill

Heidelberg Laureate Forum

The summer's final data science outreach took place at the Heidelberg Laureate Forum, where 200 young researchers came to the Heidelberg Institute of Technology and Science to interact directly with Abel Prize, Fields Medal, Nevanlinna Prize, and Turing Award laureates. It was a once-in-a-lifetime opportunity for these students and new faculty to have direct access to the minds that have shaped computing and mathematics for our generation. It was an honor to present the outcomes of the National Flood Interoperability Experiment to the assembly, which included Turing Award winners from Microsoft Research (Butler Lampson, Leslie Lamport, and Tony Hoare) as well as Jennifer Chayes and Christian Borgs from the Microsoft Research New York and New England Laboratories.

The most exciting component of the Heidelberg Laureate Forum was the gathering of luminaries who have achieved the highest award in their respective fields. Many of these luminaries gave talks at the HLF that were full of insight for young researchers like myself, and all were enthusiastically involved in interacting with us and answering our questions.

—Mayank Kejriwal, PhD student, University of Texas at Austin

A commitment to growing the next generation

Kris Tolle presenting at the Heidelberg Laureate Forum

Whether they involve 8 or 200 young researchers, these events have the potential to shape the future of data science. Interacting with these young researchers and guiding them toward future success is one of the most rewarding aspects of my job. My advice to these young minds was to do something that really matters, and not to leave the science out of data science.

And while the 2015 summer of data science is behind us, we are jumping into autumn with equal vigor. Stay tuned for announcements on the Data Science webpage.

Kristin Tolle, Director, Data Science Initiative, Microsoft Research

