Microsoft Research Blog

Calling all aspiring women in Data Science

March 12, 2019 | By Vani Mandava, Director, Data Science Outreach

Datathon participants at the Microsoft New England Research and Development center. Photo credit: Dana J. Quigley; @DJQPhotography

Women in Data Science (WiDS) started as a one-day conference organized by Stanford University in 2015. It has since blossomed into a movement that brings together women data scientists and aspiring data scientists through a series of more than 150 virtual and in-person events worldwide, culminating in the main event at Stanford on March 4, 2019. Microsoft is a proud partner of WiDS; in addition to supporting the Datathon with a webinar, Microsoft provided Xboxes as prizes.

One of the main drivers of engagement is the WiDS Datathon, now in its second year. It kicks off in the weeks preceding the conference, and the winners are announced at Stanford during the conference. This year’s Datathon had participants working on a classic image classification problem using computer vision techniques. The challenge is an environmental one. Rampant deforestation caused by oil palm production (palm oil is a common ingredient in everyday products) has devastated the habitats of many animal and plant species. One way to get ahead of the problem is to identify where the deforestation is taking place. These are remote regions, and satellite imagery is an effective means of detection and intervention. Planet provided a set of high-resolution satellite images, and Figure8 helped annotate them and created training, testing and holdout datasets for the Datathon. The Datathon has led to workshops in several countries, with participants coming together to form teams to solve the challenge.

Datathon rules allow for teams of up to four people, with the requirement that at least half of each team be female or identify as female. Within weeks, the Datathon attracted over 200 teams. I took a shot at solving the problem using Microsoft Custom Vision, one of the Cognitive Services available on Azure. Using the Custom Vision UI, I was able to build a classifier with a handful of training images within minutes. Extending the classifier to include hundreds of images was easy using the Python SDK for Custom Vision. Such is the power of Cognitive Services in Azure: you can build a powerful image classifier based on transfer learning with fewer than 100 lines of code. The model improved as I simply continued to add images from the geo-images training dataset to the existing Custom Vision model, a simple and effective demonstration of how more training data leads to higher model accuracy.
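
The SDK workflow described above might look roughly like the following. This is a minimal sketch and not the code used for the Datathon entry: the folder layout (one subfolder per label), the project name, and the 64-images-per-call batch limit are assumptions, and the client constructor has changed between SDK versions, so check the signatures against the version you install.

```python
from pathlib import Path
from typing import Iterator, List, Sequence

# Assumption: the service accepts at most 64 images per upload call.
MAX_BATCH = 64


def chunked(items: Sequence, size: int = MAX_BATCH) -> Iterator[List]:
    """Yield consecutive slices of `items`, each at most `size` long."""
    if size < 1:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def upload_and_train(endpoint: str, training_key: str, image_dir: str) -> None:
    """Upload labelled images in batches, then start a training run.

    Hypothetical layout: image_dir/<label>/<file>.jpg, one folder per label.
    """
    # Imported lazily so the batching helper works without the SDK installed.
    from azure.cognitiveservices.vision.customvision.training import (
        CustomVisionTrainingClient,
    )
    from azure.cognitiveservices.vision.customvision.training.models import (
        ImageFileCreateBatch,
        ImageFileCreateEntry,
    )
    from msrest.authentication import ApiKeyCredentials

    credentials = ApiKeyCredentials(in_headers={"Training-key": training_key})
    trainer = CustomVisionTrainingClient(endpoint, credentials)
    project = trainer.create_project("oil-palm-demo")  # hypothetical name

    # One tag per label folder; every image in the folder gets that tag.
    entries = []
    for label_dir in sorted(p for p in Path(image_dir).iterdir() if p.is_dir()):
        tag = trainer.create_tag(project.id, label_dir.name)
        for image_path in sorted(label_dir.glob("*.jpg")):
            entries.append(
                ImageFileCreateEntry(
                    name=image_path.name,
                    contents=image_path.read_bytes(),
                    tag_ids=[tag.id],
                )
            )

    # Upload in service-sized batches, then kick off training.
    for batch in chunked(entries):
        trainer.create_images_from_files(
            project.id, ImageFileCreateBatch(images=batch)
        )
    trainer.train_project(project.id)
```

Adding more training data then amounts to dropping more images into the label folders and uploading them to the same project before retraining.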

Training images    Precision    Recall
60                 79.60%       79.60%
1,800              97.50%       97.10%
5,000              99.60%       99.10%
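
For readers new to the two metrics in the table, here is a minimal sketch of how precision and recall are computed for a binary classifier. The labels and predictions are made up for illustration, and Custom Vision may compute its reported figures differently (for example, per tag and at a chosen probability threshold).

```python
def precision_recall(y_true, y_pred, positive=1):
    """Return (precision, recall) for the `positive` class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Precision: of the images predicted positive, how many really are.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: of the truly positive images, how many were found.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical data: 1 = oil palm plantation present, 0 = absent.
y_true = [1, 1, 1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 0]
precision, recall = precision_recall(y_true, y_pred)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

A model can score high on one metric and low on the other, which is why the table reports both.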

 

We hosted a WiDS webinar that covered basic machine learning concepts and a tutorial with the custom vision solution. The webinar recording and slides are available for those who missed it.

This democratization of machine learning tools is an important factor in opening up the field of data science to a wide audience of data science students and practitioners. The other factor, especially relevant to attracting women to data science, is the focus on socially relevant datasets and problems, such as this year’s oil palm classification problem.

Data science for social good is an important subfield within the data science community, with efforts such as the annual Workshop on Social Impact at KDD and the Data Science for Social Good Summer Fellowship, which started at the University of Chicago and is now offered by the University of Washington, the University of British Columbia and other universities. The emphasis on leveraging data for altruistic goals is also evident in computer science departments across higher education that are currently pivoting to data science education. For example, the Data Science program offered at the University of California, Berkeley, based on real datasets, has been a great catalyst in getting women into computing in unprecedented numbers: half the enrolled students are women, in contrast to traditional computer science courses. Greater numbers of women skilled in data science will help to fill the data gap that has created a pervasive but invisible bias with a profound effect on women’s lives.

Participants at the UC Berkeley WiDS Datathon Collaboration Day. Photo credit: WiDS ambassadors Emily Liu and Mariah Rogers

Beyond data science, AI has a growing number of socially relevant subfields that appeal to a growing demographic of women technologists and students. These include eliminating bias in AI systems through fairness, accountability and transparency; secure machine learning; privacy; ethics; policy impact; and domain-specific machine learning.

This year, the WiDS Datathon has resulted in regional Datathon workshops around the globe, for example, the WiDS Data Collaboration Day at UC Berkeley, and a meetup at the Microsoft New England Research and Development center.

Congratulations to all participants – visit the WiDS Datathon page for the full list of winners. We look forward to continuing our engagement with the growing community of data scientists as they tackle challenges that will have a positive, lasting impact on research and technology!
