A Machine Learning Perspective on Managing Noisy Structured Data
- Theodoros Rekatsinas | University of Wisconsin-Madison
Modern analytics depend on high-effort tasks like data preparation and data cleaning to produce accurate results. This talk describes recent work on making routine data preparation tasks such as data cleaning dramatically easier. I will first introduce a formal probabilistic framework to describe the quality of structured data and demonstrate how this framework allows us to cast data cleaning as a statistical learning and inference problem. I will then show how this connection allows us to obtain formal guarantees on automated data cleaning and describe how it forms the basis of the HoloClean framework, a state-of-the-art ML-based solution for managing noisy structured data. I will close with additional examples of how a statistical learning view on managing noisy data can lead to new solutions to classical database problems such as the discovery of functional dependencies in structured data.
[Slides]
-
-
Yeye He
Senior Principal Researcher
-
-
Watch Next
-
Accelerating MRI image reconstruction with Tyger
- Karen Easterbrook,
- Ilyana Rosenberg
-
-
-
-
AI for Precision Health: Learning the language of nature and patients
- Hoifung Poon,
- Ava Amini,
- Lili Qiu
-
GenAI for Supply Chain Management: Present and Future
- Georg Glantschnig,
- Beibin Li,
- Konstantina Mellou
-
Using Optimization and LLMs to Enhance Cloud Supply Chain Operations
- Beibin Li,
- Konstantina Mellou,
- Ishai Menache
-
-
-