Transcript Document

PASTRY


Sources

- Pastry paper: "Pastry: Scalable, decentralized object location and routing for large scale peer-to-peer systems" by Antony Rowstron (Microsoft Research) and Peter Druschel (Rice University), IFIP/ACM International Conference on Distributed Systems Platforms (Middleware), Heidelberg, Germany, pages 329-350, November 2001
- Pastry homepage: http://research.microsoft.com/en-us/um/people/antr/Pastry/default.htm


Related work

- Chord [Sigcomm '01]
- CAN [Sigcomm '01]
- Tapestry [TR UCB/CSD-01-1141]
- PNRP [unpub.]
- Viceroy [PODC '02]
- Kademlia [IPTPS '02]
- Small World [Kleinberg '99, '00]
- Plaxton Trees [Plaxton et al. '97]
- Generalized Hypercube [Bhuyan et al. '84]

Pastry

   

Generic p2p location and routing substrate (DHT)

- Self-organizing overlay network (join, departures, locality repair)
- Consistent hashing
- Lookup/insert object in < log_{2^b} N routing steps (expected)
- O(log N) per-node state
- Network locality heuristics

Scalable, fault resilient, self-organizing, locality aware, secure

Pastry: Object distribution

[Figure: circular id space running from 0 to 2^128 - 1, with nodeIds and an objId/key marked on the ring]

Consistent hashing

- 128-bit circular id space
- nodeIds (uniform random)
- objIds/keys (uniform random)

Invariant: the node with the numerically closest nodeId maintains the object
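
The invariant can be illustrated with a few lines of Python. This is only a sketch: the names `ring_distance` and `responsible_node` are made up for illustration, ids are plain integers, and "numerically closest" is assumed to wrap around the circular id space.

```python
ID_BITS = 128
RING = 1 << ID_BITS  # ids live in [0, 2^128 - 1]


def ring_distance(a: int, b: int) -> int:
    """Shortest distance between two ids on the circular id space (assumed to wrap)."""
    d = abs(a - b) % RING
    return min(d, RING - d)


def responsible_node(key: int, node_ids: list[int]) -> int:
    """Return the nodeId numerically closest to key; ties go to the smaller id."""
    return min(node_ids, key=lambda n: (ring_distance(n, key), n))


# Hypothetical nodeIds, chosen only to show the wrap-around case.
nodes = [10, 200, RING - 5]
print(responsible_node(150, nodes))              # 200 is nearest to 150
print(responsible_node(2, nodes) == RING - 5)    # the node just below 0 is nearer than 10
```

With real nodeIds the ids would come from a hash of the node's IP address or public key, as the later slides note; the sketch skips that step.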

Pastry: Object insertion/lookup

[Figure: Route(X) on the circular id space from 0 to 2^128 - 1, ending at the node closest to key X]

A message with key X is routed to the live node with nodeId closest to X.

Problem: a complete routing table is not feasible

Pastry Node

- Represented by a 128-bit randomly chosen nodeId (hash of IP address or public key)
- NodeId is written in base 2^b (b is a configuration parameter; typical value of b is 2 or 4)
- NodeIds are evenly distributed along the circular namespace (0 to 2^128 - 1)
- Routes a message in O(log N) steps to its destination
  - N: size of the network
- Node state contains:
  - Leaf Set (L)
  - Routing table (R)
  - Neighborhood Set (M)

CMPT 880: P2P Systems - SFU

Pastry node state

- Leaf set: L/2 numerically closest nodes on each side of the local nodeId (L is a configuration parameter, typically 16 or 32)
- Routing Table (prefix based)
- Neighborhood Set: M physically closest nodes

Pastry node state (Leaf Set)

- Serves as a fallback for the routing table and contains:
  - the L/2 numerically closest larger nodeIds
  - the L/2 numerically closest smaller nodeIds
- Size of L is typically 2^b or 2 x 2^b
- Nodes in L are numerically close (but could be geographically diverse)

Pastry node state: Neighborhood set (M)

- Contains the IP addresses and nodeIds of the closest nodes according to the proximity metric
- Size |M| is typically 2^b or 2 x 2^b
- Not used in routing, but for maintaining locality properties

Node state: Routing Table

- Matrix of log_{2^b} N rows and 2^b - 1 columns (N is the number of nodes in the network)
- Entries in row n match the first n digits of the current nodeId, and the column number is the digit that follows the matched digits
- Format: matched digits - column number - rest of ID
- log_{2^b} N rows are populated on average
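
The row and column rule can be checked mechanically. The sketch below is illustrative only: `shl` follows the shared-prefix-length notation used later in the deck, `table_slot` is a made-up helper name, and ids are written as strings of base-2^b digits (base 4 here, matching b = 2).

```python
def shl(a: str, b: str) -> int:
    """Length of the digit prefix shared by ids a and b."""
    n = 0
    while n < len(a) and n < len(b) and a[n] == b[n]:
        n += 1
    return n


def table_slot(local_id: str, other_id: str, base: int = 4):
    """Return (row, column) of other_id in local_id's routing table.

    Row = number of leading digits the two ids share; column = the next digit
    of other_id. Returns None when the ids are identical.
    """
    row = shl(local_id, other_id)
    if row == len(local_id):
        return None
    return row, int(other_id[row], base)


local = "10233102"                       # the example node used in this deck
print(table_slot(local, "02212102"))     # (0, 0): no shared digits, next digit 0
print(table_slot(local, "10031203"))     # (2, 0): shares "10", next digit 0
print(table_slot(local, "10233120"))     # (6, 2): shares "102331", next digit 2
```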

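Node 10233102 (b = 2, l = 8)

Routing table entries, written as matched digits - column number - rest of ID.
Blank cells are either empty or correspond to the node's own digit in that row.

Row  Column 0     Column 1     Column 2     Column 3
0    0-2212102                 2-2301203    3-1203203
1                 1-1-301233   1-2-230203   1-3-021022
2    10-0-31203   10-1-32102                10-3-23302
3    102-0-0230   102-1-1302   102-2-2302
4    1023-0-322   1023-1-000   1023-2-121
5    10233-0-01                10233-2-32
6                              102331-2-0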

Pastry: Routing

Tradeoff

- O(log N) routing table size: 2^b * log_{2^b} N + 2l
- O(log N) message forwarding steps

Prefix Routing

- Node IDs and keys from a randomized namespace (SHA-1)
- Incremental routing towards the destination ID
- Each node has a small set of outgoing routes: log(n) neighbors per node, log(n) hops between any node pair

[Figure: a message addressed to ABCE is forwarded through nodes A930, AB5F and ABC0 to the node with ID ABCE, matching one more digit at each hop]

Pastry: Routing table (# 10233102)

- L nodes in the leaf set
- log_{2^b} N rows (actually log_{2^b} 2^128 = 128/b)
- 2^b columns
- L neighbors
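
A quick worked calculation makes these sizes concrete. The helper names below are made up, the network sizes are arbitrary examples, and the total assumes the leaf set and the neighbor set each hold l entries, which is one reading of the counts above rather than a figure from the paper.

```python
def populated_rows(n_nodes: int, b: int) -> int:
    """Smallest r with (2^b)^r >= N, i.e. ceil(log_{2^b} N) without float rounding."""
    rows, reach = 0, 1
    while reach < n_nodes:
        reach *= 2 ** b
        rows += 1
    return rows


def max_rows(b: int, id_bits: int = 128) -> int:
    """Upper bound on rows: log_{2^b} 2^128 = 128 / b."""
    return id_bits // b


def state_entries(n_nodes: int, b: int, l: int) -> int:
    """2^b * log_{2^b} N routing entries plus l leaf set and l neighbor entries."""
    return (2 ** b) * populated_rows(n_nodes, b) + 2 * l


# Hypothetical network of one million nodes.
print(populated_rows(1_000_000, 4), max_rows(4), state_entries(1_000_000, 4, 16))
print(populated_rows(1_000_000, 2), max_rows(2), state_entries(1_000_000, 2, 8))
```

For one million nodes and b = 4 this gives 5 populated rows out of a possible 32, and 112 entries of state in total; with b = 2 and l = 8 it gives 10 rows out of 64 and 56 entries. A smaller b means less state per node but more rows, and therefore more hops.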

Pastry: Routing procedure

(1) Node is in the leaf set
(2) Forward message to a closer node (better match)
(3) Forward towards a numerically closer node (not a better match)

D: message key
L_i: i-th closest nodeId in the leaf set
shl(A, B): length of the prefix shared by nodes A and B
R_j^i: (j, i)-th entry of the routing table

Pastry: Routing procedure

if (destination is within range of our leaf set)
    forward to numerically closest member
else
    let l = length of shared prefix
    let d = value of l-th digit in D's address
    if (R_l^d exists)
        forward to R_l^d
    else
        forward to a known node that
        (a) shares at least as long a prefix
        (b) is numerically closer than this node

Pastry: Routing procedure

- If the message key D is within range of the leaf set, forward to the numerically closest leaf
- Else forward to a node whose nodeId shares at least one more digit with D in its prefix than the current nodeId
- If no such node exists, forward to a node that shares at least as many digits with D as the current nodeId but is numerically nearer to D than the current nodeId

Pastry: Routing

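[Figure: Route(d46a1c) issued at node 65a1fc. Nodes shown: 65a1fc, d13da3, d4213f, d462ba, d467c4, d471f1. Each hop reaches a node sharing a longer prefix with the key: d13da3 (d), d4213f (d4), d462ba (d46), ending at d467c4.]

Properties

- log_{2^b} N steps
- O(log N) state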
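
The three-case routing procedure can be tried out on the six nodeIds from the figure. What follows is a simplified sketch, not the paper's implementation: the `Node` class and `route` helper are hypothetical, every node builds its state from global knowledge, routing table entries are picked by id order rather than by proximity, the leaf set holds one neighbour per side, and wrap-around of the id space is ignored.

```python
BASE = 16  # the example ids use hex digits (b = 4) and are 6 digits long


def val(node_id: str) -> int:
    return int(node_id, BASE)


def shl(a: str, b: str) -> int:
    """Length of the digit prefix shared by ids a and b."""
    n = 0
    while n < len(a) and n < len(b) and a[n] == b[n]:
        n += 1
    return n


def closest(candidates, key):
    """Numerically closest id to key; ties go to the smaller id so all nodes agree."""
    return min(candidates, key=lambda n: (abs(val(n) - val(key)), val(n)))


class Node:
    def __init__(self, node_id, all_ids, leaf_half=1):
        self.id = node_id
        order = sorted(all_ids, key=val)
        i = order.index(node_id)
        # Leaf set: leaf_half neighbours on each side (no wrap-around here).
        self.leaf = order[max(0, i - leaf_half):i] + order[i + 1:i + 1 + leaf_half]
        # Routing table: (row, digit) -> a node sharing `row` digits with us.
        self.table = {}
        for other in order:
            if other != node_id:
                row = shl(node_id, other)
                self.table.setdefault((row, other[row]), other)

    def next_hop(self, key):
        members = self.leaf + [self.id]
        # (1) Key within leaf set range: go to the numerically closest member.
        if min(map(val, members)) <= val(key) <= max(map(val, members)):
            return closest(members, key)
        # (2) Routing table entry sharing at least one more digit with the key.
        row = shl(self.id, key)
        entry = self.table.get((row, key[row]))
        if entry is not None:
            return entry
        # (3) Any known node with an equally long prefix that is numerically closer.
        known = members + list(self.table.values())
        here = abs(val(self.id) - val(key))
        better = [n for n in known
                  if shl(n, key) >= row and abs(val(n) - val(key)) < here]
        return closest(better, key) if better else self.id


def route(nodes, start, key, max_hops=32):
    """Return the nodeIds visited while routing key from start."""
    path = [start]
    while len(path) <= max_hops:
        nxt = nodes[path[-1]].next_hop(key)
        if nxt == path[-1]:  # this node keeps the message: it is the destination
            break
        path.append(nxt)
    return path


ids = ["65a1fc", "d13da3", "d4213f", "d462ba", "d467c4", "d471f1"]
nodes = {i: Node(i, ids) for i in ids}
print(route(nodes, "65a1fc", "d46a1c"))
```

Under these assumptions the message visits 65a1fc, d13da3, d4213f, d462ba and d467c4. The first three hops use case (2), the hop from d462ba to d467c4 uses case (3) because no node has the prefix d46a, and d467c4 keeps the message under case (1).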

Pastry: Locality properties

Assumption: scalar proximity metric

- e.g. ping/RTT delay, # IP hops, traceroute, subnet masks
- a node can probe its distance to any other node

Proximity invariant: Each routing table entry refers to a node close to the local node (in the proximity space), among all nodes with the appropriate nodeId prefix.


Pastry: Geometric Routing in proximity space

[Figure: Route(d46a1c) from 65a1fc via d13da3, d4213f and d462ba to d467c4, drawn once in NodeId space and once in proximity space]

- The proximity distance traveled by the message in each routing step increases exponentially (the entry in row l is chosen from a set of nodes of size N/2^(bl))
- The distance traveled by the message from its source increases monotonically at each step (the message takes larger and larger strides)
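
The N/2^(bl) set sizes are easy to tabulate. The snippet below takes that expression at face value; the function name and the network size are illustrative assumptions only.

```python
def candidates_per_row(n_nodes: int, b: int, rows: int) -> list[float]:
    """Expected number of nodes eligible for an entry in row l, for l = 0 .. rows - 1."""
    return [n_nodes / 2 ** (b * l) for l in range(rows)]


# Hypothetical network of 2^20 nodes with b = 4.
for l, size in enumerate(candidates_per_row(2 ** 20, 4, 6)):
    print(f"row {l}: about {size:.0f} candidate nodes")
```

Each row has 2^b times fewer candidates than the one before. The nearest node in a smaller candidate set tends to be farther away, which is why later hops cover more distance in the proximity space.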

Pastry: Locality properties

- Each routing step is local, but there is no guarantee of a globally shortest path
- Nevertheless, simulations show:
  - The expected distance traveled by a message in the proximity space is within a small constant of the minimum
  - Among the k nodes with nodeIds closest to the key, the message is likely to first reach the node closest to the source node

Pastry: Self-organization

Initializing and maintaining routing tables and leaf sets

- Node addition
- Node departure (failure)

The goal is to keep every routing table entry referring to a near node, among all live nodes with the appropriate prefix

Pastry: Node addition

- New node X contacts a nearby node A
- A routes a "join" message to X, which arrives at Z, the node closest to X
- X obtains its leaf set from Z and the i-th row of its routing table from the i-th node on the path from A to Z
- X informs any nodes that need to be aware of its arrival
  - X also improves its table locality by requesting neighborhood sets from all nodes X knows
- In practice: optimistic approach
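
How X assembles its initial state can be sketched as follows. The function `build_join_state` and the small hand-written tables are hypothetical; rows are numbered from 0, so A is the 0-th node on the path. The check that a hop really shares at least i digits with X is an added safeguard for hops that did not lengthen the prefix, not something the slide states.

```python
def shl(a: str, b: str) -> int:
    """Length of the digit prefix shared by ids a and b."""
    n = 0
    while n < len(a) and n < len(b) and a[n] == b[n]:
        n += 1
    return n


def build_join_state(x_id, path, tables, leaf_sets):
    """Build X's initial routing table and leaf set from the join path.

    path:      nodeIds visited by the join message, A first and Z last
    tables:    nodeId -> {row: {digit: nodeId}}
    leaf_sets: nodeId -> list of nodeIds
    """
    table = {}
    for i, hop in enumerate(path):
        # Row i of the i-th hop is valid for X only if the hop shares i digits with X.
        if shl(hop, x_id) >= i:
            table[i] = dict(tables[hop].get(i, {}))
    leaf = list(leaf_sets[path[-1]])  # the leaf set comes from Z
    return table, leaf


# Hand-written state for the nodes of the routing example (illustrative only).
path = ["65a1fc", "d13da3", "d4213f", "d462ba", "d467c4"]
tables = {
    "65a1fc": {0: {"d": "d13da3"}},
    "d13da3": {1: {"4": "d4213f"}},
    "d4213f": {2: {"6": "d462ba", "7": "d471f1"}},
    "d462ba": {3: {"7": "d467c4"}},
    "d467c4": {},
}
leaf_sets = {"d467c4": ["d462ba", "d471f1"]}

table, leaf = build_join_state("d46a1c", path, tables, leaf_sets)
print(table)
print(leaf)
```

Every entry copied into row i shares at least i leading digits with X, so it is a usable routing table entry for X even before X has probed any distances.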

Pastry: Node addition

[Figure: new node X = d46a1c joins via its neighbor A = 65a1fc. Route(d46a1c) from A passes d13da3, d4213f and d462ba and arrives at Z = d467c4; d471f1 is also shown.]

Pastry: Node addition

[Figure: Route(d46a1c) from A = 65a1fc via d13da3, d4213f and d462ba to d467c4, drawn in NodeId space and in proximity space, with the new node X = d46a1c placed next to A. B1 is the first row of B.]

X is close to A, and B is close to B1. Why is X close to B1?

The expected distance from B to its row-one entries (B1) is much larger than the expected distance from A to B (entries are chosen from sets of exponentially decreasing size).

Node departure (failure)

Leaf set repair (eager, all the time):

- Leaf set members exchange keep-alive messages
- Request the leaf set from the furthest live node in the set

Routing table repair (lazy, upon failure):

- Get the table from peers in the same row; if not found, from higher rows

Neighborhood set repair (eager)
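
The leaf set repair step can be sketched for one side of the leaf set. Everything here is a simplification: `repair_side` is a made-up name, `fetch_leaf_set` is a hypothetical callback standing in for a network request, ids are small integers, and wrap-around is ignored.

```python
def repair_side(local_id, side, failed, fetch_leaf_set, half):
    """Repair one side of the leaf set after failures.

    side:           nodeIds on one side of local_id (all larger, or all smaller)
    failed:         set of nodeIds detected as dead via missed keep-alives
    fetch_leaf_set: callable returning the leaf set of a live node
    half:           target number of entries on this side (L/2)
    """
    live = [n for n in side if n not in failed]
    if not live:
        return live
    furthest = max(live, key=lambda n: abs(n - local_id))
    larger = furthest > local_id
    candidates = set(live)
    # The furthest live node knows nodes beyond our current range on this side.
    for n in fetch_leaf_set(furthest):
        if n not in failed and n != local_id and (n > local_id) == larger:
            candidates.add(n)
    return sorted(candidates, key=lambda n: abs(n - local_id))[:half]


# Node 100 loses neighbour 120; node 130 reports its own leaf set.
remote = {130: [110, 120, 140, 150, 160]}
print(repair_side(100, [110, 120, 130], {120}, remote.get, half=3))
```

In this example the failed node 120 is dropped and 140, learned from node 130, fills the gap, so the larger side becomes 110, 130, 140.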


Pastry: Summary

- Generic p2p overlay network
- Scalable, fault resilient, self-organizing, secure
- O(log N) routing steps (expected)
- O(log N) routing table size
- Network locality properties