Distributed Segment Tree: A Unified Architecture to Support Range Query and Cover Query

  • Shipeng Li

MSR-TR-2007-30 |

Cover query, which is defined as to find all the intervals (or ranges) currently maintained in the system that cover a given value, arises from many real distributed file sharing or storage systems where the overall performance is critically determined by the effectiveness in handling the few specific objects (e.g., those updated or last few to be downloaded). Range query, defined as to find all the values in a certain range over the underlying structure, has also found applications in many critical real applications. While range query has attracted much attention, cover query has rarely been touched. Recognizing the duality between cover query and range query, we design the distributed segment tree (DST) that can support range query and cover query in a uniform way and equally efficiently. The key idea of DST is to overlay the segment tree data structure on top of DHT and implicitly reestablish the connection between the structural information of the segment tree and the underlying randomized (due to the hash operation) storage and routing substrate – DHT. We develop a segment splitting algorithm that achieves the computability feature of a segment tree, i.e., an arbitrary range can be uniquely decomposed into the union of a minimum number of element subranges. The performance of DST can thus be significantly enhanced by exploiting the parallelism among the resulting element subranges and achieve query latency close to a single DHT get for moderate query ranges with a small number of parallel network connections. Several load balancing techniques to handle storage and query traffic constraints are designed. DST can be easily extended to the multi-dimensional case. We systematically study various properties of DST, evaluate and compare its real performance for 1-D/2-D range/cover queries on PlanetLab for several important metrics.