PASTRY
Sources
Pastry paper: "Pastry: Scalable, decentralized object location and routing for large-scale peer-to-peer systems" by Antony Rowstron (Microsoft Research) and Peter Druschel (Rice University), IFIP/ACM International Conference on Distributed Systems Platforms (Middleware), Heidelberg, Germany, pages 329-350, November 2001
Pastry homepage: http://research.microsoft.com/en-us/um/people/antr/Pastry/default.htm
Related work
Chord [Sigcomm '01]
CAN [Sigcomm '01]
Tapestry [TR UCB/CSD-01-1141]
PNRP [unpub.]
Viceroy [PODC '02]
Kademlia [IPTPS '02]
Small World [Kleinberg '99, '00]
Plaxton Trees [Plaxton et al. '97]
Generalized Hypercube [Bhuyan et al. '84]
Pastry
Generic p2p location and routing substrate (DHT)
Self-organizing overlay network (join, departures, locality repair)
Consistent hashing
Lookup/insert object in < log_{2^b} N routing steps (expected)
O(log N) per-node state
Network locality heuristics
Scalable, fault resilient, self-organizing, locality aware, secure
Pastry: Object distribution
[Figure: circular id space running from 0 to 2^128 - 1, with nodeIds and an objId/key placed on the ring]
Consistent hashing
128-bit circular id space
nodeIds (uniform random)
objIds/keys (uniform random)
Invariant: the node with the numerically closest nodeId maintains the object
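The invariant can be illustrated with a minimal sketch. It assumes ids are plain Python integers and that "numerically closest" is measured around the ring (with wraparound); the names `ring_distance` and `responsible_node` are made up for this example and are not part of any Pastry API.

```python
ID_BITS = 128
ID_SPACE = 1 << ID_BITS  # ids live in [0, 2^128 - 1]


def ring_distance(a: int, b: int) -> int:
    """Shortest distance between two ids on the circular id space."""
    d = abs(a - b) % ID_SPACE
    return min(d, ID_SPACE - d)


def responsible_node(key: int, node_ids) -> int:
    """Return the nodeId numerically closest to key.

    Ties are broken in favour of the smaller nodeId (an arbitrary choice
    made for this sketch).
    """
    return min(node_ids, key=lambda n: (ring_distance(n, key), n))


nodes = [10, 2**127, 2**128 - 5]
owner_of_3 = responsible_node(3, nodes)  # 10 is 7 away, 2^128 - 5 is 8 away
owner_of_1 = responsible_node(1, nodes)  # wraps around: 2^128 - 5 is 6 away
```

The second lookup shows why the space is called circular: key 1 is closer to the node just below 2^128 than to node 10.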
Pastry: Object insertion/lookup
[Figure: circular id space from 0 to 2^128 - 1; Route(X) delivers a message with key X]
A message with key X is routed to the live node with nodeId closest to X
Problem: a complete routing table is not feasible
Pastry Node
Represented by a 128-bit randomly chosen nodeId (hash of IP address or public key)
NodeId is written in base 2^b (b is a configuration parameter; typical value 2 or 4)
NodeIds are evenly distributed along the circular namespace (0 to 2^128 - 1)
Routes a message to its destination in O(log N) steps (N: size of the network)
Node state contains:
Leaf Set (L)
Routing Table (R)
Neighborhood Set (M)
CMPT 880: P2P Systems - SFU
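As a rough illustration of how a nodeId might be derived and read as base-2^b digits, the sketch below hashes an address with SHA-1 and keeps the first 128 bits. The truncation step and the function names are assumptions made for this example; the slides only say that the nodeId is a hash of the IP address or public key.

```python
import hashlib

ID_BITS = 128


def node_id_from_address(address: str) -> int:
    """Hash an address and keep the first 128 bits (truncation is an assumption)."""
    digest = hashlib.sha1(address.encode("utf-8")).digest()  # 160-bit digest
    return int.from_bytes(digest[: ID_BITS // 8], "big")


def to_digits(node_id: int, b: int):
    """Split a 128-bit id into base-2^b digits, most significant digit first."""
    if ID_BITS % b != 0:
        raise ValueError("b must divide 128 in this sketch")
    count = ID_BITS // b
    mask = (1 << b) - 1
    return [(node_id >> (b * (count - 1 - i))) & mask for i in range(count)]


nid = node_id_from_address("192.0.2.1")  # example address, chosen arbitrarily
digits = to_digits(nid, 4)               # 32 hexadecimal digits when b = 4
```

With b = 4 the id has 128/4 = 32 digits; with b = 2 it has 64.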
Pastry node state
Leaf set: the L numerically closest nodes, L/2 on each side (L is a configuration parameter, typically 16 or 32)
Routing table (prefix based)
Neighborhood set: the M physically closest nodes
Pastry node state (Leaf Set)
Serves as a fallback for the routing table and contains:
L/2 numerically closest larger nodeIds
L/2 numerically closest smaller nodeIds
Size of L is typically 2^b or 2 x 2^b
Nodes in L are numerically close (but could be geographically diverse)
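A leaf set with these two halves could be assembled as in the sketch below. It assumes integer ids on the 128-bit ring and a hypothetical helper name `leaf_set`; in a very small network the same node can appear in both halves, which the sketch does not try to prevent.

```python
ID_SPACE = 1 << 128


def leaf_set(local_id: int, known_ids, size: int):
    """Return (smaller, larger): the size/2 closest ids on each side of local_id.

    "Larger" means reached first when walking clockwise (increasing ids,
    wrapping at 2^128); "smaller" means reached first walking the other way.
    """
    half = size // 2
    others = [n for n in set(known_ids) if n != local_id]
    larger = sorted(others, key=lambda n: (n - local_id) % ID_SPACE)[:half]
    smaller = sorted(others, key=lambda n: (local_id - n) % ID_SPACE)[:half]
    return smaller, larger


smaller, larger = leaf_set(100, [90, 95, 105, 110, 120, 10], size=4)
wrap_smaller, wrap_larger = leaf_set(5, [2**128 - 3, 20, 40], size=2)
```

The second call shows the wraparound case: for local id 5, the closest smaller neighbour is the node just below 2^128.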
Pastry node state (Neighborhood Set)
Contains the IP addresses and nodeIds of the |M| closest nodes according to the proximity metric
Size of |M| is typically 2^b or 2 x 2^b
Not used in routing, but instead for maintaining locality properties
Node state: Routing Table
Matrix of log_{2^b} N rows and 2^b - 1 columns (N is the number of nodes in the network)
Entries in row n match the first n digits of the current nodeId, and the column number is the digit that follows the matched digits
Format: matched digits - column number - rest of ID
log_{2^b} N rows populated on average
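Node 10233102 (b = 2, l = 8): routing table
Entries are written as matched digits-column number-rest of ID. "own" marks the column of the node's own digit in that row.

Row  Col 0        Col 1        Col 2        Col 3
0    -0-2212102   own (1)      -2-2301203   -3-1203203
1    own (0)      1-1-301233   1-2-230203   1-3-021022
2    10-0-31203   10-1-32102   own (2)      10-3-23302
3    102-0-0230   102-1-1302   102-2-2302   own (3)
4    1023-0-322   1023-1-000   1023-2-121   own (3)
5    10233-0-01   own (1)      10233-2-32   (empty)
6    own (0)      (empty)      102331-2-0   (empty)
7    (empty)      (empty)      own (2)      (empty)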
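The placement rule for the node 10233102 example can be sketched as follows, assuming ids are equal-length digit strings. The helper names `table_slot` and `format_entry` are invented for this illustration.

```python
def table_slot(local_id: str, other_id: str, base: int = 4):
    """Return (row, column) where other_id belongs in local_id's routing table.

    row    = number of leading digits the two ids share
    column = value of other_id's first differing digit
    Returns None when the ids are identical (a node does not list itself).
    """
    if len(local_id) != len(other_id):
        raise ValueError("ids must have the same number of digits")
    if local_id == other_id:
        return None
    row = 0
    while local_id[row] == other_id[row]:
        row += 1
    return row, int(other_id[row], base)


def format_entry(local_id: str, other_id: str):
    """Render an entry as matched digits-column number-rest of ID."""
    slot = table_slot(local_id, other_id)
    if slot is None:
        return None
    row, _ = slot
    return f"{other_id[:row]}-{other_id[row]}-{other_id[row + 1:]}"


slot = table_slot("10233102", "10200230")     # shares "102", next digit is 0
label = format_entry("10233102", "10200230")  # "102-0-0230"
```

Node 10200230 shares the three-digit prefix 102 with the local node and continues with digit 0, so it sits in row 3, column 0.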
Pastry: Routing
Tradeoff
O(log N) routing table size: 2^b * log_{2^b} N + 2l
O(log N) message forwarding steps
Prefix Routing
Node IDs and keys from randomized namespace (SHA-1)
Incremental routing towards destination ID
Each node has a small set of outgoing routes
log(n) neighbors per node, log(n) hops between any node pair
[Figure: a message addressed to ABCE is forwarded along A930, AB5F, ABC0 and reaches the node with ID ABCE, matching one more prefix digit at each hop]
Pastry: Routing table (#10233102)
L nodes in leaf set
log_{2^b} N rows (actually log_{2^b} 2^128 = 128/b)
2^b columns
L neighbors
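To make the table dimensions concrete, the sketch below evaluates them for a given network size. The function name and the returned dictionary are assumptions for illustration. Populated rows are taken as the smallest r with (2^b)^r >= N, and each row is counted as 2^b - 1 usable entries because one column corresponds to the node's own digit.

```python
def routing_table_shape(n_nodes: int, b: int, id_bits: int = 128):
    """Estimate routing table dimensions for n_nodes nodes and digit width b."""
    base = 1 << b
    populated = 0
    reach = 1
    # Integer loop instead of a floating-point logarithm to avoid rounding issues.
    while reach < n_nodes:
        reach *= base
        populated += 1
    return {
        "max_rows": id_bits // b,                # log_{2^b} 2^128 = 128/b
        "columns": base,                         # 2^b columns per row
        "expected_populated_rows": populated,    # about log_{2^b} N
        "expected_entries": populated * (base - 1),
    }


shape_b4 = routing_table_shape(10**6, 4)
shape_b2 = routing_table_shape(10**6, 2)
```

For a million nodes, b = 4 gives about 5 populated rows and 75 entries, while b = 2 gives about 10 rows and 30 entries: a larger b buys fewer hops at the cost of more state.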
Pastry: Routing procedure
(1) Node is in the leaf set
(2) Forward message to a closer node (better match)
(3) Forward towards a numerically closer node (not a better match)
D: message key
L_i: i-th closest nodeId in the leaf set
shl(A, B): length of the prefix shared by nodes A and B
R_i^j: (j, i)-th entry of the routing table
Pastry: Routing procedure
if (destination is within range of our leaf set)
    forward to numerically closest member
else
    let l = length of shared prefix
    let d = value of l-th digit in D's address
    if (R_l^d exists)
        forward to R_l^d
    else
        forward to a known node that
        (a) shares at least as long a prefix
        (b) is numerically closer than this node
Pastry: Routing procedure
If the message key D is within range of the leaf set, forward to the numerically closest leaf
Else forward to a node whose nodeId shares at least one more prefix digit with D than the current nodeId
If no such node exists, forward to a node that shares at least as many digits with D as the current nodeId but is numerically nearer to D than the current nodeId
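The three cases of the routing procedure can be sketched in a few lines. This is a simplified illustration, not the authors' implementation: ids are equal-length hex strings, the routing table is a dictionary keyed by (row, column), numeric distance ignores wraparound, and the names `shl` and `next_hop` are chosen for this example.

```python
from typing import Dict, List, Tuple


def shl(a: str, b: str) -> int:
    """Length of the prefix shared by ids a and b."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def next_hop(local: str, key: str, leaf_set: List[str],
             routing_table: Dict[Tuple[int, int], str], base: int = 16) -> str:
    """Return the id to forward to, or local if this node is the destination."""
    target = int(key, base)

    def dist(node: str) -> int:
        return abs(int(node, base) - target)  # no wraparound in this sketch

    # (1) Key is within the range covered by the leaf set.
    span = [int(n, base) for n in leaf_set] + [int(local, base)]
    if leaf_set and min(span) <= target <= max(span):
        return min(list(leaf_set) + [local], key=dist)

    # (2) Use the routing table entry that matches one more digit.
    l = shl(local, key)
    if l == len(key):
        return local
    d = int(key[l], base)
    entry = routing_table.get((l, d))
    if entry is not None:
        return entry

    # (3) Rare case: any known node with an equally long prefix that is closer.
    known = list(leaf_set) + list(routing_table.values())
    closer = [n for n in known if shl(n, key) >= l and dist(n) < dist(local)]
    return min(closer, key=dist) if closer else local


first_hop = next_hop("65a1fc", "d46a1c", ["65a0aa", "65a2bb"], {(0, 13): "d13da3"})
last_hop = next_hop("d462ba", "d46a1c", ["d45000", "d467c4", "d471f1"], {})
```

The two calls use made-up leaf sets around the node ids from the slides: the first resolves through the routing table (case 2), the second through the leaf set (case 1).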
Pastry: Routing
[Figure: lookup for key d46a1c issued at node 65a1fc; Route(d46a1c) is forwarded through d13da3, d4213f and d462ba to d467c4, with d471f1 also shown near the key]
Properties
log_{2^b} N steps
O(log N) state
Pastry: Locality properties
Assumption: scalar proximity metric
e.g. ping/RTT delay, number of IP hops (traceroute), subnet masks
A node can probe its distance to any other node
Proximity invariant: Each routing table entry refers to a node close to the local node (in the proximity space), among all nodes with the appropriate nodeId prefix.
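The proximity invariant amounts to a "closest among the eligible" choice, which the sketch below spells out. The function name, the hex-string ids and the RTT values are all made up for illustration; `distance` stands for whatever scalar proximity metric the node can probe.

```python
def pick_entry(prefix: str, candidates, distance):
    """Among candidates whose nodeId starts with prefix, return the nearest one.

    distance is a callable mapping a nodeId to its measured proximity
    (for example an RTT in milliseconds). Returns None if nothing matches.
    """
    matching = [n for n in candidates if n.startswith(prefix)]
    return min(matching, key=distance, default=None)


# Hypothetical RTT measurements in milliseconds.
rtt = {"d4213f": 30.0, "d462ba": 12.5, "d13da3": 80.0}
best_d4 = pick_entry("d4", list(rtt), rtt.get)
best_d = pick_entry("d", list(rtt), rtt.get)
```

Longer prefixes leave fewer eligible nodes, so the nearest eligible node tends to be farther away in lower rows of the table.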
Pastry: Geometric Routing in proximity space
[Figure: Route(d46a1c) from node 65a1fc through d13da3, d4213f and d462ba to d467c4 (d471f1 also shown), drawn both in the nodeId space and in the proximity space]
The proximity distance traveled by the message in each routing step increases exponentially (the entry in row l is chosen from a set of nodes of size N/2^{bl})
The distance traveled by the message from its source increases monotonically at each step (the message takes larger and larger strides)
Pastry: Locality properties
Each routing step is local, but there is no guarantee of a globally shortest path
Nevertheless, simulations show:
Expected distance traveled by a message in the proximity space is within a small constant of the minimum
Among the k nodes with nodeIds closest to the key, the message is likely to first reach the node closest to the source node
Pastry: Self-organization
Initializing and maintaining routing tables and leaf sets
Node addition
Node departure (failure)
The goal is to keep every routing table entry referring to a near node, among all live nodes with the appropriate prefix
Pastry: Node addition
New node X contacts a nearby node A
A routes a "join" message to X, which arrives at Z, the node closest to X
X obtains its leaf set from Z, and the i-th row of its routing table from the i-th node on the path from A to Z
X informs any nodes that need to be aware of its arrival
X also improves its table locality by requesting neighborhood sets from all nodes X knows
In practice: optimistic approach
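The table-copying step of the join can be sketched as below. It assumes each routing table is a dictionary mapping a row index (counted from 0, with A as node 0 on the path) to a dictionary of column entries; this layout and the function name are assumptions for illustration. Taking row i from the i-th node works because that node already shares an i-digit prefix with X.

```python
def initial_state(path_tables, z_leaf_set):
    """Build X's starting state from the nodes on the join path A ... Z.

    path_tables[i] is the routing table of the i-th node on the path,
    given as {row_index: {column_digit: nodeId}}.
    z_leaf_set is the leaf set of Z, the last node on the path.
    """
    table = {}
    for i, node_table in enumerate(path_tables):
        row = node_table.get(i)
        if row:
            table[i] = dict(row)  # copy so X does not share state with others
    return {"leaf_set": list(z_leaf_set), "routing_table": table}


# Hypothetical tables for a two-hop join path (A, then B).
table_a = {0: {"d": "d13da3"}, 1: {"5": "65a1fc"}}
table_b = {0: {"6": "65a1fc"}, 1: {"4": "d4213f"}}
state = initial_state([table_a, table_b], ["d467c4", "d471f1"])
```

After this step X would still announce itself and refine entries for locality, which the sketch does not cover.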
Pastry: Node addition
[Figure: new node X = d46a1c joins through its neighbor A = 65a1fc; Route(d46a1c) passes d13da3, d4213f and d462ba and arrives at Z = d467c4, with d471f1 also shown]
Pastry: Node addition
[Figure: Route(d46a1c) in the nodeId space and the proximity space, with nodes 65a1fc, d13da3, d4213f, d462ba, d467c4 and d471f1; new node X = d46a1c; B1 is the first row of B]
X is close to A, and B is close to B1. Why is X close to B1?
The expected distance from B to its row-one entries (B1) is much larger than the expected distance from A to B (entries are chosen from sets of exponentially decreasing size)
Node departure (failure)
Leaf set repair (eager – all the time):
Leaf set members exchange keep-alive messages
When a member fails, request the leaf set from the furthest live node in the set
Routing table repair (lazy – upon failure):
Get the table entry from peers in the same row; if not found, from peers in higher rows
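A lazy repair of one failed routing table slot might look like the sketch below. The table layout ({row: {column: nodeId}}) and the callables `fetch_entry` and `is_alive` are hypothetical stand-ins for remote calls; "higher rows" is read here as rows with a larger index.

```python
def repair_entry(table, row, col, fetch_entry, is_alive):
    """Try to refill table[row][col] after the node stored there has failed.

    fetch_entry(peer, row, col) asks a peer for its own entry in that slot
    and returns a nodeId or None. is_alive(node) reports liveness.
    Peers in the same row are asked first, then peers in later rows.
    """
    last_row = max(table, default=row)
    for r in range(row, last_row + 1):
        for c, peer in list(table.get(r, {}).items()):
            if (r, c) == (row, col) or not is_alive(peer):
                continue
            candidate = fetch_entry(peer, row, col)
            if candidate is not None and is_alive(candidate):
                table.setdefault(row, {})[col] = candidate
                return candidate
    return None


# Toy setup: the node in slot (2, 0) has failed; only peer "p2" knows a replacement.
table = {2: {0: "dead", 1: "p1"}, 3: {2: "p2"}}
alive = {"p1", "p2", "new"}
answers = {("p2", 2, 0): "new"}
found = repair_entry(table, 2, 0,
                     lambda peer, r, c: answers.get((peer, r, c)),
                     alive.__contains__)
```

Because the repair runs only when a failed entry is actually needed, no periodic traffic is spent on routing table slots that are never used.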
Neighborhood set repair (eager)
Pastry: Summary
Generic p2p overlay network
Scalable, fault resilient, self-organizing, secure
O(log N) routing steps (expected)
O(log N) routing table size
Network locality properties