Microsoft Research Blog

Making better use of the crowd

December 1, 2016 | By Microsoft blog editor

By Jennifer Wortman Vaughan, Senior Researcher, Microsoft Research

Over the last decade, computer scientists have harnessed crowds of Internet users to solve tasks that are notoriously difficult to crack with computers alone, such as determining whether an image contains a tree, rating the relevance of websites, and verifying phone numbers.

The machine learning community was early to embrace so-called crowdsourcing as a way to quickly and inexpensively obtain the vast quantities of labeled data needed to train machine learning systems to classify images or recognize speech, for example. Labeled data are essentially sets of teaching examples, such as pictures of cats that are tagged with the word “cat.”

Usually this handoff of labeled data is where interaction with the crowd ends. Are there better ways to make use of the crowd?

On December 5, I will showcase innovative uses of crowdsourcing that go beyond data collection during a crowdsourcing tutorial at NIPS, the premier international machine learning conference, held this year in Barcelona. The tutorial will also dive into recent research aimed at understanding who crowdworkers are and how they behave, which could inform best practices for interacting with the crowd.

Innovative uses of crowdsourcing that go beyond the collection of data include:

  • Harnessing the crowd to improve machine learning models, for example by extracting the features of labeled data that are most relevant for model training and by evaluating learned models.
  • Leveraging the complementary strengths of humans and machines to achieve more than either can alone. Potential applications of these so-called hybrid-intelligence systems include real-time, on-demand closed captioning of day-to-day conversations and crowd-powered writing and editing.
  • Using crowdsourcing platforms to recruit large pools of subjects for experiments designed to study how people behave when reasoning about the performance of computer systems, which could lead to better-designed algorithms and systems.

Recent research, both qualitative and quantitative, has opened the black box of crowdsourcing to reveal that crowdworkers are not just independent contractors, but rather a network with a rich communication structure. Meanwhile, experiments have explored how to boost the quality of crowdwork using both well-designed monetary incentives (such as performance-based payments) and intrinsic motivation (such as piqued curiosity).

This research has much to teach us about how to most effectively interact with the crowd. (Hint: Be respectful, be responsive, be clear.)

Crowdsourcing has the potential for major impact on the way we design machine learning and AI systems, but to unleash this potential we need more creative minds exploring novel ways to use it.
