Towards Expressive and Scalable Publish/Subscribe


October 5, 2005


Publish/subscribe (pub/sub) is a powerful paradigm that enables asynchronous interaction in large distributed applications ranging from Enterprise Application Integration (EAI) to Internet-scale news services. Today it is supported by virtually all major message-oriented middleware solutions. There is, however, an inherent tradeoff between pub/sub expressiveness and performance. Previous work has shown that by limiting subscriptions to simple filters on topics or content, one can achieve very high scalability in terms of the number of publishers and subscribers. At the opposite end of the spectrum are very expressive data stream processing systems like STREAM, for which it is not clear how to scale to large numbers of subscriptions. We attempt to find a sweet spot between expressiveness and performance. The main idea is to extend the functionality of simple pub/sub filters to enable stateful subscriptions, parameterization, and computation of aggregates, while still maintaining high scalability. Our main contributions are a novel algebra for expressing stateful subscriptions, a corresponding transformation of algebra expressions into simple automata, and effective methods for multi-query optimization that share processing between queries. Our query language is more expressive than recently proposed XML filtering approaches, enabling us to efficiently support a variety of interesting queries for emerging applications like RSS feeds. We have implemented our techniques in the Cayuga prototype system and show that, even with tens of thousands of concurrently active stateful subscriptions, we can maintain a throughput of thousands of messages per second on a standard PC.

This is joint work with Al Demers, Johannes Gehrke, Mingsheng Hong, and Walker White.


Mirek Riedewald

Mirek Riedewald is a Research Associate at Cornell University. In 2002 he obtained his Ph.D. from the University of California at Santa Barbara. Mirek’s current research interests are in the general area of database and information systems, in particular data stream processing, data management and analysis services for the sciences, data-driven Web applications, and data warehousing. By collaborating with scientists outside computer science, e.g., physicists and ornithologists at Cornell, Mirek strives to make the latest data management techniques available to other research communities. His work has been published in the proceedings of leading scientific conferences like VLDB and ACM SIGMOD, in journals like IEEE TKDE, and in books by Kluwer Academic Publishers, MIT/AAAI Press, and Idea Group Publishing.