Data Analytics: Integration and Privacy

  • Steven Whang | Stanford University

Data analytics has become an extremely important and challenging problem in disciplines like computer science, biology, and medicine. As massive amounts of data become available for analysis, scalable integration techniques become important. At the same time, new privacy issues arise because an individual’s sensitive information can easily be inferred from large amounts of data.

In my talk, I will first focus on the problem of entity resolution (ER), which identifies database records that refer to the same real-world entity. In practice, ER is not a one-time process; it is continually refined as the data, schema, and application become better understood. I will address the problem of keeping the ER result up to date when the ER logic “evolves” frequently. A naive approach that re-runs ER from scratch may be too expensive for large datasets. I will show when and how we can instead exploit previous “materialized” ER results to avoid redundant work under the evolved logic.
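To make the idea of reusing a materialized ER result concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not necessarily the method presented in the talk: ER is assumed to cluster records by the transitive closure of a pairwise match rule, and the evolved rule is assumed to be stricter than the old one (every pair it matches was also matched by the old rule). Under those assumptions, records that the old rule placed in different clusters can never match under the new rule, so it is enough to re-resolve each old cluster separately. The function names, the record layout, and the two example rules are all hypothetical.

```python
from itertools import combinations


def resolve(records, match):
    """Naive ER: cluster records by the transitive closure of pairwise matches."""
    parent = {r: r for r in records}  # union-find forest; records must be hashable

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in combinations(records, 2):  # compares every pair: O(n^2)
        if match(a, b):
            parent[find(a)] = find(b)

    clusters = {}
    for r in records:
        clusters.setdefault(find(r), []).append(r)
    return list(clusters.values())


def resolve_incrementally(old_clusters, stricter_match):
    """Reuse a materialized result: only compare records within each old cluster.

    Valid only under the assumption that stricter_match(a, b) implies the old
    rule also matched a and b.
    """
    new_clusters = []
    for cluster in old_clusters:
        new_clusters.extend(resolve(cluster, stricter_match))
    return new_clusters


# Hypothetical records: (id, name, zip code).
records = [
    (1, "J. Doe", "94305"),
    (2, "J. Doe", "94305"),
    (3, "J. Doe", "10001"),
    (4, "A. Roe", "60601"),
]


def same_name(a, b):  # old rule
    return a[1] == b[1]


def same_name_and_zip(a, b):  # evolved, stricter rule
    return a[1] == b[1] and a[2] == b[2]


old_clusters = resolve(records, same_name)  # materialized result
incremental = resolve_incrementally(old_clusters, same_name_and_zip)
from_scratch = resolve(records, same_name_and_zip)


def as_id_sets(clusters):
    """Order-independent view of a clustering, keyed by record id."""
    return {frozenset(r[0] for r in c) for c in clusters}
```

In this toy example the incremental run compares only the three pairs inside the "J. Doe" cluster, while the from-scratch run compares all six pairs, and both produce the same clustering. When the evolved rule is not stricter than the old one, this shortcut is not safe as written.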

Next, I will introduce my work on managing information leakage, where the goal is to prevent important pieces of information from being resolved by ER in order to preserve data privacy. As more of our sensitive data is exposed to a variety of merchants, health care providers, employers, social sites and so on, there is a higher chance that an adversary can “connect the dots” and piece together our information, leading to even more loss of privacy. I will explain our information leakage model and propose using disinformation as a tool for containing information leakage.

Speaker Details

Steven Whang is a computer science PhD candidate at Stanford University advised by Prof. Hector Garcia-Molina. His research interests include data integration and data privacy. He received his B.S. in computer science from the Korea Advanced Institute of Science and Technology (KAIST) in 2003 and his M.S. in computer science from Stanford University in 2007. He is a recipient of the IBM PhD Fellowship.