A substrate for peer-to-peer applications
Pastry is a generic, scalable and efficient substrate for peer-to-peer applications. Pastry nodes form a decentralized, self-organizing and fault-tolerant overlay network within the Internet. Pastry provides efficient request routing, deterministic object location, and load balancing in an application-independent manner. Furthermore, Pastry provides mechanisms that support and facilitate application-specific object replication, caching, and fault recovery.
Short overview of Pastry (please don’t cite) (opens in new tab)
Pastry provides the following capabilities. First, each node in the Pastry network has a unique, uniform random identifier (nodeId) in a circular 128-bit identifier space. When presented with a message and a numeric 128-bit key, a Pastry node efficiently routes the message to the node with a nodeId that is numerically closest to the key, among all currently live Pastry nodes. The expected number of forwarding steps in the Pastry overlay network is O(log N), while the size of the routing table maintained in each Pastry node is only O(log N) in size (where N is the number of live Pastry nodes in the overlay network). At each Pastry node along the route that a message takes, the application is notified and may perform application-specific computations related to the message.
Second, each Pastry node keeps track of its L immediate neighbors in the nodeId space (called the leaf set), and notifies applications of new node arrivals, node failures and node recoveries within the leaf set. Third, Pastry takes into account locality (proximity) in the underlying Internet; it seeks to minimize the distance messages travel, according to a scalar proximity metric like the ping delay. Pastry is completely decentralized, scalable, and self-organizing; it automatically adapts to the arrival, departure and failure of nodes. P2p applications built upon Pastry can utilize its capabilities in many ways, including:
- Mapping application objects to Pastry nodes: Application-specific objects are assigned unique, uniform random identifiers (objIds) and mapped to the k, (k >= 1) nodes with nodeIds numerically closest to the objId. The number k reflects the application’s desired degree of replication for the object.
- Inserting objects: Application-specific objects can be inserted by routing a Pastry message, using the objId as the key. When the message reaches a node with one of the k closest nodeIds to the objId, that node replicates the object among the other k-1 nodes with closest nodeIds (which are, by definition, in the same leaf set for k <= L/2).
- Accessing objects: Application-specific objects can be looked up, contacted, or retrieved by routing a Pastry message, using the objId as the key. By definition, the message is guaranteed to reach a node that maintains a replica of the requested object unless all k nodes with nodeIds closest to the objId have failed.
- Availability and persistence: Applications interested in availability and persistence of application-specific objects maintain the following invariant as nodes join, fail and recover: object replicas are maintained on the k nodes with numerically closest nodeIds to the objId, for k > 1. The fact that Pastry maintains leaf sets and notifies applications of changes in the set’s membership simplifies the task of maintaining this invariant.
- Diversity: The assignment of nodeIds is uniform random, and cannot be corrupted by an attacker. Thus, with high probability, nodes with adjacent nodeIds are diverse in geographic location, ownership, jurisdiction, network attachment, etc. Therefore, the probability that such a set of nodes is conspiring or suffers from correlated failures is low even for modest set sizes. This minimizes the probability of a simultaneous failure of all k nodes that maintain an object replica. Likewise, quorum-based protocols can be used to securely update and query the state of replicated objects, despite the presence of a limited number of malicious nodes in the system.
- Load balancing: Both nodeIds and objIds are randomly assigned and uniformly distributed in the 128-bit Pastry identifier space. Without requiring any global coordination, this results in a good first-order balance of storage requirements and query load among the Pastry nodes, as well as network load in the underlying Internet.
- Object caching: Applications can cache objects on the Pastry nodes encountered along the paths taken by insert and lookup messages. Subsequent lookup requests whose paths intersect are served the cached copy. (Pastry’s network locality properties make it likely that messages routed with the same key from nearby nodes converge early, thus lookups are likely to intercept nearby cached objects.) This distributed caching offloads the k nodes that hold the primary replicas of an object, and it minimizes client delays and network traffic by dynamically caching copies near interested clients.
- Efficient, scalable information dissemination: Applications can perform efficient multicast using reverse path forwarding along the tree formed by the routes from clients to the node with nodeId numerically closest to a given objId. Pastry’s network locality properties ensure that the resulting multicast trees are efficient; i.e., they result in efficient data delivery and resource usage in the underlying Internet.
Links to related projects
- CAN (opens in new tab)
- Chord (opens in new tab)
- Farsite (opens in new tab)
- Freenet (opens in new tab)
- Gnutella (opens in new tab)
- OceanStore/Tapestry (opens in new tab)
Pastry software distribution
Use the FreePastry distribution (found here (opens in new tab)).