Publish/subscribe (pub/sub) is a powerful paradigm that enables asynchronous interaction in large distributed applications, ranging from Enterprise Application Integration (EAI) to Internet-scale news services. Today it is supported by virtually all major message-oriented middleware solutions. There is, however, an inherent tradeoff between pub/sub expressiveness and performance. Previous work has shown that by limiting subscriptions to simple filters on topics or content, one can achieve very high scalability in terms of the number of publishers and subscribers. At the opposite end of the spectrum are very expressive data stream processing systems like STREAM, for which it remains unclear how to scale to large numbers of subscriptions. We attempt to find a sweet spot between expressiveness and performance. The main idea is to extend the functionality of simple pub/sub filters to enable stateful subscriptions, parameterization, and computation of aggregates, while still maintaining high scalability. Our main contributions are a novel algebra for expressing stateful subscriptions, a corresponding transformation of algebra expressions into simple automata, and effective methods for multi-query optimization that share processing between queries. Our query language is more expressive than recently proposed XML filtering approaches, enabling us to efficiently support a variety of interesting queries for emerging applications like RSS feeds. We have implemented our techniques in the Cayuga prototype system and show that, even with tens of thousands of concurrently active stateful subscriptions, we can maintain a throughput of thousands of messages per second on a standard PC.
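To make the distinction between a simple filter and a stateful, parameterized subscription concrete, the following minimal Python sketch may help. It is an illustration only and does not reflect Cayuga's algebra, automaton model, or API; the names `stateless_filter` and `RisingRun` and the event fields `topic`, `key`, and `value` are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict

Event = Dict[str, object]  # hypothetical event shape: {"topic", "key", "value"}


def stateless_filter(topic: str) -> Callable[[Event], bool]:
    """Simple pub/sub filter: the decision depends on the current event only."""
    return lambda event: event["topic"] == topic


@dataclass
class RisingRun:
    """Stateful subscription (illustrative): fire when the value reported for
    a key has risen n times in a row. State is kept per key, so the key acts
    as a parameter bound by the incoming events."""
    n: int
    last: Dict[str, float] = field(default_factory=dict)  # last value per key
    run: Dict[str, int] = field(default_factory=dict)     # current rise count

    def on_event(self, event: Event) -> bool:
        key, value = event["key"], event["value"]
        prev = self.last.get(key)
        self.last[key] = value
        if prev is not None and value > prev:
            self.run[key] = self.run.get(key, 0) + 1
        else:
            self.run[key] = 0          # run broken (or first event): restart
        if self.run[key] >= self.n:
            self.run[key] = 0          # notify once, then start a new run
            return True
        return False


if __name__ == "__main__":
    is_stock = stateless_filter("stocks")
    sub = RisingRun(n=2)
    prices = [("IBM", 10.0), ("IBM", 11.0), ("SUN", 5.0), ("IBM", 12.0)]
    for key, value in prices:
        event = {"topic": "stocks", "key": key, "value": value}
        if is_stock(event) and sub.on_event(event):
            print("notify:", key, value)
```

The filter can be evaluated on each message in isolation, whereas the stateful subscription has to remember earlier messages per key, which is what makes sharing work across many such subscriptions the harder problem.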
This is joint work with Al Demers, Johannes Gehrke, Mingsheng Hong, and Walker White.