Distributed hash tables for large-scale cooperative applications


April 14, 2005


Frank Dabek




Distributed hash tables (DHTs) are a way to organize the storage and network resources of many Internet hosts to create a single storage system. The promise of DHTs lies in their potential to create a distributed storage infrastructure that, in aggregate, is more robust and offers higher performance than any individual host. Because DHTs are likely to be deployed on potentially unreliable volunteer nodes spread around the globe, meeting this goal is challenging: nodes in the system may join or leave at any time, and latencies between nodes can be large.

DHash is a distributed hash table that stores data reliably and provides low-latency, high-throughput access to data. DHash uses an efficient distributed lookup service, Chord, to map each block to a node based on the block’s key. DHash replicates data blocks for availability and efficiently maintains the desired number of replicas as nodes join and leave the system. DHash minimizes the latency of block fetch operations by routing requests to nearby nodes in the network using a decentralized, synthetic coordinate system, Vivaldi, that accurately predicts network latency between hosts. A transport protocol, STP, allows DHash to efficiently and fairly use network resources while performing many parallel block downloads.
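The key-to-node mapping described above can be illustrated with a small, centralized sketch of a Chord-style identifier ring. It is not DHash's implementation: the class name Ring, the helper ident, the node names, and the choice to place replicas on the nodes that follow the key's successor are all assumptions made for illustration, and the real system resolves successors with a distributed lookup rather than a sorted list held in one process.

```python
import hashlib
from bisect import bisect_left


def ident(name: bytes) -> int:
    """Map a name onto a 160-bit identifier circle using SHA-1."""
    return int.from_bytes(hashlib.sha1(name).digest(), "big")


class Ring:
    """Hypothetical, centralized stand-in for a Chord-style ring."""

    def __init__(self, node_names):
        # Sort nodes by identifier so successors can be found by bisection.
        self.nodes = sorted((ident(n.encode()), n) for n in node_names)
        self.ids = [node_id for node_id, _ in self.nodes]

    def successors(self, key: int, count: int = 1):
        """Return up to `count` distinct nodes at or after `key` on the circle."""
        # First node whose identifier is >= key, wrapping past the top of the circle.
        start = bisect_left(self.ids, key) % len(self.ids)
        count = min(count, len(self.nodes))
        return [self.nodes[(start + k) % len(self.nodes)][1] for k in range(count)]


ring = Ring(["node-a", "node-b", "node-c", "node-d"])
block_key = ident(b"contents of some block")
# Assumption: the block's replicas live on the key's successor and the nodes after it.
replica_holders = ring.successors(block_key, count=3)
```

When a node joins or leaves, only the keys adjacent to it on the circle change owners, which is what keeps replica maintenance incremental in a design of this kind.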

DHash has proven to be a useful substrate for building a number of different cooperative applications including a file system for content distribution, a Usenet replacement, and a distributed backup and archival service.


Frank Dabek

Frank Dabek is a graduate student at MIT. His research interests include distributed systems, networking, and high-performance servers. He is advised by Frans Kaashoek and Robert Morris.