Combining Algorithms and Humans for Large-Scale Data Integration


April 18, 2017


Vasileios Verroios


Stanford University


Modern enterprises collect data from their operations and the web, and depend heavily on that data to make important decisions. To analyze the collected data, enterprises must first perform data integration, i.e., combine the data from multiple sources into a unified set. Data integration involves tasks that are still very hard for computer algorithms, such as tasks involving images, video, natural language, or the understanding of data semantics. Since humans may be more accurate on such tasks, crowdsourcing has been proposed and applied by large companies and research organizations in recent years. In crowdsourcing, humans complement computer algorithms by completing small tasks, such as classifying a forum comment as offensive or ironic. Crowdsourcing drastically improves the accuracy of the outcome compared to using computer algorithms alone; however, it does not scale, because of the large amount of time (and monetary compensation) that human workers require. In this talk, I will discuss how to make crowdsourcing scalable for data integration.