Tutorial on Foundations of Probabilistic Answers to Queries

June 6, 2005
Dan Suciu and Nilesh Dalvi | University of Washington

Probabilistic query answering is a set of techniques used in several, very recent database applications: exploratory queries in databases, novel IR-style approaches to data integration, querying information extracted from the Web, queries over sensor networks, data acquisition, querying data sources that violate integrity constraints, controlling information disclosure in data exchange, and reasoning about privacy breaches. This is a surprisingly diverse range of applications, most of which have either emerged recently, or have seen a recent increased interest. They all share a common fundamental abstraction: that an item being a query answer is no longer a Boolean value, but a probabilistic event. This is a different paradigm to query answering, whose foundations lie in random graphs and probabilistic model theory. The results from these fields, and their relevance to the probabilistic query answering method, are little known in the database research community. This tutorial presents the applications, the foundations, and some processing techniques for probabilistic query answering.

Goals: First, the tutorial gives a brief overview of the class of applications mentioned above and to highlight their underlying theme: that the answer to a query is a probabilistic, rather than a deterministic event. Second, it presents a simple definition for probabilistic query answers that will hopefully help researchers in the field adopt a unified framework. Third, it presents in an accessible fashion some of the relevant results from random graphs and probabilistic finite models, and highlights what they mean in the context of probabilistic query answering. Fourth, it describes several techniques, systems, and results from probabilistic databases.

Target Audience: This tutorial is intended for both systems oriented and theoretical researchers interested in any of the applications mentioned above, or in exploring a new data management paradigm based on probabilities.

Speaker Details

Dan Suciu is an Associate Professor in Computer Science at the University of Washington. He is conducting research in data management, with an emphasis on topics that arise from sharing data on the Internet, such as management of semistructured and heterogeneous data, large scale message stream processing, security of data. He is a co-author of the book “Data on the Web: from Relations to Semistructured Data and XML”, holds six US patents, received the 2000 ACM SIGMOD Best Paper Award, and conducted three projects that lead to popular software tools in the public domain: XMill, XMLTK, and SilkRoute.

Nilesh Dalvi received his B.S. from Indian Institute of Technology, Mumbai in 2001. He is currently a Ph.D. candidate at the University of Washington, advised by Prof. Dan Suciu. His research interests lie in the area of Database management. His focus is on the management of uncertainty in databases, with applications to several domains: data integration, information retrieval, information extraction and data privacy.