Error Rate Analysis Of Labeling By Crowdsourcing

  • Hongwei Li ,
  • Bin Yu ,
  • Denny Zhou

ICML '13 Workshop: Machine Learning Meets Crowdsourcing |

Crowdsourcing label generation has been a crucial component for many real-world machine learning applications. In this paper, we provide finite-sample exponential bounds on the error rate (in probability and in expectation) of hyperplane binary labeling rules for the Dawid-Skene (and Symmetric Dawid-Skene) crowdsourcing model. The bounds can be applied to analyze many commonly used prediction methods, including the majority voting, weighted majority voting and maximum a posteriori (MAP) rules. These bound results can be used to control the error rate and design better algorithms. In particular, under the Symmetric Dawid-Skene model we use simulation to demonstrate that the data-driven EM-MAP rule is a good approximation to the oracle MAP rule which approximately optimizes our upper bound on the mean error rate for any hyperplane binary labeling rule. Meanwhile, the average error rate of the EM-MAP rule is bounded well by the upper bound on the mean error rate of the oracle MAP rule in the simulation.