Abstract

Many emerging applications, such as wide-area network management, need to query large, structured, highly distributed datasets. Seaweed is a scalable, distributed infrastructure for querying such datasets. In this paper we describe its architecture and design features, using the Anemone network management system as a motivating example. The main contribution is a design that supports accurate query planning and efficient query execution across a large number of unreliable endsystems. In contrast to prior work, Seaweed supports ad hoc querying in addition to continuous querying. The paper describes the solutions Seaweed adopts to achieve this: latency-based cost estimation, availability-based scheduling, and meta-data aggregation.