SOLOMON: Seeking the Truth Via Copying Detection

We live in the Information Era, with access to a huge amount of information from a variety of data sources. However, data sources vary in quality and often provide conflicting, out-of-date, and incomplete data. Sources can also easily copy, reformat, and modify data from other sources, propagating erroneous data. These issues make it non-trivial to identify high-quality information and sources.
In this talk we present the SOLOMON system, whose core is a module that detects copying between sources. We show how we can effectively detect copying relationships between data sources, leverage the results in various aspects of data integration, and provide a user-friendly interface that helps users identify the sources that best suit their information needs.

Speaker Details

Dr. Xin Luna Dong is a researcher at AT&T Labs-Research. She received a Ph.D. in Computer Science and Engineering from the University of Washington in 2007, a Master’s Degree in Computer Science from Peking University in China in 2001, and a Bachelor’s Degree in Computer Science from Nankai University in China in 1998. Her research interests include databases, information retrieval, and machine learning, with an emphasis on data integration, data cleaning, personal information management, and web search. She has led the Solomon project, whose goal is to detect copying between structured sources and to leverage the results in various aspects of data integration, and the Semex personal information management system, which received the Best Demo award (one of the top three) at SIGMOD ’05. She co-chaired WebDB ’10, has served as a group leader for the program committee of CIKM ’11, and has served on the program committees of SIGMOD, VLDB, WWW, ICDE, and others.

Xin Luna Dong
AT&T Research