Women in Data Science (WiDS) started as a one-day conference organized by Stanford University in 2015 and has blossomed into a movement that brings together women data scientists and aspiring data scientists through a series of more than 150 virtual and in-person events worldwide, culminating in the March 4, 2019 main event at Stanford. Microsoft is a proud partner of WiDS; in addition to supporting the Datathon via a webinar, Microsoft provided Xboxes as prizes.
One of the main drivers of engagement is the WiDS Datathon, now in its second year, which kicks off in the weeks preceding the conference; the winners are announced at Stanford during the conference. This year’s Datathon had participants working on a classic image classification problem using computer vision techniques. The challenge is an environmental one. Rampant deforestation driven by oil palm production (palm oil is a common ingredient in everyday products) has devastated the habitats of many animal and plant species. One way to get ahead of the problem is to identify where the deforestation is taking place. Because these regions are remote, satellite imagery is an effective means of detection and intervention. Planet provided a set of high-resolution satellite images, and Figure8 helped annotate them and created training, testing and holdout datasets for the Datathon. The Datathon has led to workshops in several countries, with participants coming together to form teams to solve the challenge.
Datathon rules allow for teams of up to four people, with the requirement that at least half of each team be female or identify as female. Within weeks, the Datathon attracted over 200 teams. I took a shot at solving the problem using Microsoft Custom Vision, one of the Cognitive Services available on Azure. Using the Custom Vision UI, I was able to build a classifier with a handful of training images within minutes. Extending the classifier to include hundreds of images was easy using the Python SDK for Custom Vision. Such is the power of Cognitive Services in Azure: you can build a powerful image classifier based on transfer learning with fewer than 100 lines of code. The model improved as I simply continued to add images from the Datathon training dataset to the existing Custom Vision model, a simple and effective demonstration of how more training data leads to higher model accuracy.
| Training images count | Precision | Recall |
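For readers less familiar with the two metrics named in the table header, here is a minimal sketch of how precision and recall can be computed from a classifier's predictions. It is not the Custom Vision implementation; the function name `precision_recall` and the sample labels are made up for illustration, with 1 standing for "oil palm present" and 0 for "absent".

```python
from typing import Sequence, Tuple


def precision_recall(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (precision, recall) for the positive class (label 1)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # correctly flagged
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # flagged by mistake
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed

    # Precision: of everything flagged as positive, how much was right.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: of everything truly positive, how much was found.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical labels for eight image tiles (illustrative only, not Datathon data).
truth = [1, 1, 1, 0, 0, 0, 1, 0]
preds = [1, 0, 1, 0, 1, 0, 1, 0]
precision, recall = precision_recall(truth, preds)
print(f"precision={precision:.2f} recall={recall:.2f}")  # precision=0.75 recall=0.75
```

In this toy example the classifier finds three of the four plantation tiles (recall 0.75) and one of its four positive calls is wrong (precision 0.75). Tracking both numbers as training images are added shows whether extra data reduces false alarms, missed detections, or both.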
This democratization of machine learning tools is an important factor in opening up the field of data science to a wide audience of students and practitioners. The other factor, especially relevant to attracting women to data science, is the focus on socially relevant datasets and problems, such as this year’s oil palm classification problem.
Data science for social good is an important subfield within the data science community, with efforts such as the annual Workshop on Social Impact at KDD and the Data Science for Social Good Summer Fellowship, which started at the University of Chicago and is now offered by the University of Washington, the University of British Columbia and other universities. The emphasis on leveraging data for altruistic goals is also evident in computer science departments across higher education that are pivoting to data science education. For example, the Data Science program at the University of California, Berkeley, which is based on real datasets, has been a great catalyst in getting women into computing in unprecedented numbers: half the enrolled students are women, in contrast to traditional computer science courses. Greater numbers of women skilled in data science will help to fill the data gap that has created a pervasive but invisible bias with a profound effect on women’s lives.
Beyond data science, AI has a burgeoning set of socially relevant subfields that appeal to a growing demographic of women technologists and students. These include eliminating bias in AI systems through fairness, accountability and transparency; secure machine learning; privacy; ethics; policy; and domain-specific machine learning.
This year, the WiDS Datathon has resulted in regional Datathon workshops around the globe, including the WiDS Data Collaboration Day at UC Berkeley and a meetup at the Microsoft New England Research and Development Center.
Congratulations to all participants – visit the WiDS Datathon page for the full list of winners. We look forward to continuing our engagement with the growing community of data scientists as they tackle challenges that will have positive lasting impact on research and technology!