Managing Large-scale Probabilistic Databases
- Christopher Ré | University of Washington
For the next generation of data-management applications, such as sensor-based monitoring, data integration, and information extraction, data processing is the dominant cost. Often, the data driving these applications are uncertain, for example, due to missed, inconsistent, or imprecise sensor readings. Unfortunately, traditional data-management systems provide little or no support for managing uncertainty. To remedy this, my dissertation advocates an approach for data management in which uncertainty is modeled using probabilities. The cost of modeling imprecision using probabilities is that basic data-management tasks, such as querying, become theoretically and practically more difficult. Thus, the key challenge in managing large-scale probabilistic data is efficiency.
In this talk, I will discuss the fundamental techniques that I developed in my dissertation to build a probabilistic database capable of handling large, imprecise datasets: these techniques include top-k processing with probabilities, materialized views, approximate lineage, and extensional processing for complex analytic queries. This work resulted in two systems: Mystiq, the first system to support complex queries on gigabytes of probabilistic relational data, and Lahar, the first system to support rich event-style queries on large, probabilistic streams.
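To illustrate why querying probabilistic data costs more, here is a minimal sketch. It assumes the simplest model, a tuple-independent table in which each tuple is present independently with a given probability. The table name `readings` and its values are hypothetical. The sketch computes the probability that a Boolean query ("does any reading exist?") is true in two ways:

* The brute-force method sums over all 2^n possible worlds.
* The extensional method uses the independent-or formula 1 - ∏(1 - p), which runs in linear time for this particular query.

This shows the general idea behind extensional evaluation only. It is not Mystiq's implementation.

```python
from itertools import product
from math import isclose, prod

# Hypothetical tuple-independent table: (tuple id, probability the tuple is present).
readings = [("r1", 0.9), ("r2", 0.5), ("r3", 0.2)]


def prob_exists_brute_force(tuples):
    """P(at least one tuple present), summing over all 2^n possible worlds."""
    total = 0.0
    for world in product([True, False], repeat=len(tuples)):
        # Probability of this particular world under tuple independence.
        p_world = prod(p if present else 1 - p
                       for (_, p), present in zip(tuples, world))
        if any(world):  # the query is true in this world
            total += p_world
    return total


def prob_exists_extensional(tuples):
    """Same probability via the independent-or formula: 1 - prod(1 - p)."""
    return 1 - prod(1 - p for _, p in tuples)


# Both methods agree, but only the extensional one scales to large tables.
assert isclose(prob_exists_brute_force(readings), prob_exists_extensional(readings))
```

For the three hypothetical readings, both functions return approximately 0.96. The brute-force loop doubles its work with every added tuple. A closed-form (extensional) plan exists only for some queries, so the general problem remains hard.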
Speaker Details
Christopher (Chris) Ré is a PhD candidate in the Department of Computer Science and Engineering, advised by Professor Dan Suciu, and will receive his degree in the summer of 2009. His recent work is in the area of large-scale probabilistic data management, which is motivated by diverse applications including RFID data management, information extraction, social networking, and data cleaning. Chris has completed several industrial internships at Microsoft Research and IBM. In addition to his dissertation work, he has completed a wide range of projects: Dedupalog, a language for large-scale data cleaning; the algebraic compiler for Galax, an open-source XQuery processor; XQuery! (read: XQuery-Bang), an XML update language; and SilkRoute II, a comprehensive XQuery-to-SQL translation system. Chris obtained undergraduate degrees in Mathematics and Computer Science and an M.Eng. in Computer Science from Cornell University.
Jeff Running