Transcript Slide 1
Structured P2P Networks
Guo Shuqiao
Yao Zhen
Rakesh Kumar Gupta
CS6203 Advanced Topics in Database Systems
Introduction: P2P Networks
A peer-to-peer (P2P) network is a
distributed system in which peers employ
distributed resources to perform a critical
function in a decentralized fashion [LW2004]
Classification of P2P networks
Unstructured and Structured
Centralized and Decentralized
Hierarchical and Non-Hierarchical
Structured P2P network
Distributed hash table (DHT)
A DHT is a structured overlay that offers extreme scalability and a hash-table-like lookup interface (see the sketch below)
Examples: CAN, Chord, Pastry
Other techniques
Skip list: Skip graphs, SkipNet
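To make the "hash-table-like lookup interface" concrete, here is a minimal sketch; the key_of helper, the SHA-1 hash and the 128-bit identifier space are illustrative assumptions, not the definition used by any particular system.

import hashlib

def key_of(name, id_bits=128):
    """Hash an application-level name to a point in the DHT's identifier space."""
    digest = hashlib.sha1(name.encode()).digest()
    return int.from_bytes(digest, "big") % (2 ** id_bits)

# The DHT then behaves like a hash table; the overlay's routing decides
# which peer is responsible for each key:
#   dht.put(key_of("song.mp3"), value)
#   value = dht.get(key_of("song.mp3"))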
Outline
Hash-based techniques in P2P
Hash-based structured P2P systems
Pastry
P-Grid
Two important issues
Load balancing
Neighbor table consistency preserving
Comparison of DHT techniques
Skip-list based system
SkipNet
Conclusion
Pastry [RD2001]
Pastry is a P2P object location and
routing scheme
Hash-based
Properties
Completely decentralized
Scalable
Self-organized
Fault-resilient
Efficient search
Design of Pastry
nodeID: each node has a unique numeric
identifier (128 bit)
Assigned randomly
Nodes with adjacent nodeIDs are diverse in geography, ownership, etc.
Assumption: nodeIDs are uniformly distributed in the ID space
Presented as a sequence of digits with base 2^b (a sketch follows below)
b is a configuration parameter (typically 4)
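A minimal sketch (not from the paper) of viewing a 128-bit nodeID as a sequence of base-2^b digits:

def to_digits(node_id, b=4, id_bits=128):
    """Represent a nodeID as base-2**b digits, most significant first.
    With b = 4 a 128-bit ID becomes 32 hexadecimal digits."""
    n = id_bits // b
    mask = (1 << b) - 1
    return [(node_id >> (b * (n - 1 - i))) & mask for i in range(n)]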
Design of Pastry (cont’)
Each message/query has a numeric key of the same length as the nodeIDs
The key is presented as a sequence of digits with base 2^b
Route: a message is routed to the node whose nodeID is numerically closest to the key
Destination of Routing
Figure: a message with key 10 is delivered to the node whose nodeID is numerically closest to the key, among nodes 03, 12, 20, 23 and 31.
Pastry Scheme
Given a message of key k, a node A
forwards the message to a node whose
ID is numerically closest to k among all
nodes known to A
Each node maintains some routing
state
Pastry Node State
NodeID 10233102
A leaf set L
A routing table
A neighborhood set M
Figure: example state of node 10233102, showing its leaf set (smaller and larger nodeIDs), routing table and neighborhood set.
Meanings of ‘Close’
Nearest neighbor: closest according to a proximity metric (real network distance)
Node with the closest nodeID: closest according to the numerical meaning of the ID
Figure: for the same set of nodes (03, 12, 20, 23, 31), the two notions can pick different nodes.
Pastry Node State
A leaf set
|L| nodes with the closest nodeIDs: |L|/2 larger and |L|/2 smaller
Useful in message routing
A neighborhood set
|M| nearest neighbors (by the proximity metric)
Useful in maintaining locality properties
Leaf Set and Neighborhood Set
In this example, b = 2 and l = 8, so |L| = 2 × 2^b = 8 and |M| = 2 × 2^b = 8
Figure: the leaf set and neighborhood set of node 10233102, highlighted in its node state.
Routing Table
The routing table of a node has l rows and 2^b columns
The ith row holds nodes whose IDs share a prefix of length i with the present node's ID
The jth column holds a node whose next digit after that shared prefix is j
With b = 2 and l = 8: 8 rows and 4 columns
Figure: routing table of node A (10233102), with columns labelled by the next digit j.
Routing
Step 1: If k falls within the range of nodeIDs covered by A's leaf set, forward it to the node in the leaf set whose nodeID is closest to k
E.g. k = 10233022 falls in the range (10233000, 10233232), so forward it to node 10233021
If k is not covered by the leaf set, go to Step 2
Figure: node A (10233102) with its leaf set, routing table and neighborhood set.
Routing
Step 2: The routing table is used, and the message is forwarded to a node whose ID shares a longer prefix with k than A's nodeID does
E.g. k = 10223220: forward it to node 10222302
If the appropriate entry in the routing table is empty, go to Step 3
Figure: node A (10233102) with its leaf set, routing table and neighborhood set.
Routing
Step 3: The message is forwarded to a node in the leaf set whose ID shares the same prefix with k as A's does, but is numerically closer to k than A
E.g. k = 10233320: forward it to node 10233232
If such a node does not exist, A is the destination node
Figure: node A (10233102) with its leaf set, routing table and neighborhood set.
Routing
The routing procedure always converges, since each step chooses a node that either
shares a longer prefix with the key, or
shares the same prefix length but is numerically closer to the key
Routing performance
The expected number of routing steps is log_{2^b} N
Assumption: accurate routing tables and no recent node failures
A sketch of the full forwarding rule follows below.
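The three steps can be summarized in a short sketch. This is an illustration of the rule, not the paper's code: shared_prefix_len, digit and the node attributes are assumed helpers, and IDs are treated as integers for the "numerically closest" test.

def pastry_forward(key, node):
    """One forwarding decision at `node` for a message with `key`."""
    # Step 1: key within the leaf-set range -> numerically closest leaf.
    leaves = node.leaf_set + [node]
    if min(n.id for n in leaves) <= key <= max(n.id for n in leaves):
        return min(leaves, key=lambda n: abs(n.id - key))
    # Step 2: routing-table entry sharing a longer prefix with the key.
    p = shared_prefix_len(key, node.id)           # digits shared with our own ID
    entry = node.routing_table[p][digit(key, p)]  # row p, column = next digit of key
    if entry is not None:
        return entry
    # Step 3 (rare): any known node with at least as long a shared prefix
    # that is numerically closer to the key than we are.
    candidates = [n for n in node.known_nodes()
                  if shared_prefix_len(key, n.id) >= p
                  and abs(n.id - key) < abs(node.id - key)]
    return min(candidates, key=lambda n: abs(n.id - key), default=node)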
Performance
Figure: average number of routing hops versus the number of Pastry nodes (b = 4, |L| = 16, |M| = 32, 200,000 lookups).
Discussion of Pastry
Pastry's parameters make it flexible
b is the most important parameter and determines the power of the system
Trade-off between routing efficiency (log_{2^b} N hops) and routing table size (log_{2^b} N × 2^b entries)
Each node can choose its own |L| and |M| based on its own situation
Discussion of Pastry – routing scheme
Is each hop locally optimal?
E.g. k = 10233200, X's nodeID = 10233232, Y's nodeID = 10233133
Dis(k, X's ID) = Dis(10233200, 10233232) = 32
Dis(k, Y's ID) = Dis(10233200, 10233133) = 1
The locally optimal choice (numerically closest to k) is node Y
But Pastry forwards to node X, via the routing-table entry for the shared prefix 10233 and next digit 2
Figure: node A (10233102), whose leaf set contains Y = 10233133 and whose routing table contains X = 10233232.
P-Grid [Aberer2001]
P-Grid is a scalable access structure for P2P
Hash-based & virtual binary search tree
Randomized algorithms are used for constructing the access structure
Figure: a query for k = 100 is resolved over the virtual binary tree (prefixes 0/1 and 00/01/10/11) spanning peers 1–6; each peer keeps routing entries (e.g. '1 : 3', '01 : 2') that point to peers responsible for the complementary prefixes.
P-Grid (cont’)
Properties
Completely decentralized
Scalable with the total number of nodes and data items
Fault-resilient: search is robust against node failures
Efficient search
Discussion of Pastry and P-Grid
Both systems make a uniformity assumption
Pastry: uniform distribution of IDs in the ID space
P-Grid: uniform data distribution and peer behavior
If the data/message/query distribution is skewed, Pastry and P-Grid are not able to balance the load
Outline
Hash-based techniques in P2P
Hash-based structured P2P systems
Pastry
P-Grid
Two important issues
Load balancing
Neighbor table consistency preserving
Comparison of DHT techniques
Skip-list based system
SkipNet
Conclusion
Load Balancing
Consider a DHT P2P system with N nodes
Θ(log N) imbalance factor even if item IDs are uniformly distributed [SMKKB2001]
Even worse if applications associate semantics with the item IDs
IDs would then no longer be uniformly distributed
How to
Minimize the load imbalance?
Minimize the amount of load moved?
Load Balancing
Challenges
Data items are continuously inserted and deleted
Nodes join and depart continuously
The distribution of data-item IDs and item sizes can be skewed
Solution: [GLSKS2004]
Load Balancing
Virtual server
Represents a peer in the DHT, rather than a physical node
A physical node hosts one or more virtual servers
Total load of a node = sum of the loads of its virtual servers
E.g., in Chord, each virtual server owns its own position on the ring and its own finger table
Figure: a Chord ring with identifiers 0–7 in which one physical node hosts virtual servers with their own finger tables (FT1, FT3).
Load Balancing
Basic idea
Directories
Store load information of the peer nodes
Periodically schedule reassignments of virtual servers
The distributed load-balancing problem is thus reduced to a centralized problem at each directory
Load Balancing
Load balancing algorithm
Directory IDs are known to all nodes
After a delay of T time (the periodic case), or immediately when its utilization exceeds the emergency threshold K_e (emergency load balancing), a node randomly chooses a directory and sends it (1) the loads of all virtual servers it is responsible for and (2) its capacity
The directory receives this information from the nodes contacting it and computes a schedule of virtual-server transfers among them, in order to reduce their maximal utilization
Figure: flowchart of a node's periodic reporting cycle and the emergency path.
Load Balancing
Load balancing algorithm (cont.)
Computing the optimal reassignment is NP-complete
A greedy algorithm with cost O(m log m) is used instead (a sketch follows below):
For each heavily loaded node, move its least loaded virtual server to a pool
For each virtual server in the pool, from heaviest to lightest, assign it to the node n that minimizes the resulting load
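A simplified sketch of this greedy reassignment, assuming for illustration that every node has unit capacity so that "utilization" is just the sum of its virtual-server loads:

def greedy_reassign(nodes, threshold):
    """nodes: dict mapping a node name to the list of its virtual-server loads."""
    pool = []
    # Heavily loaded nodes shed their least loaded virtual servers into the pool.
    for servers in nodes.values():
        servers.sort()
        while servers and sum(servers) > threshold:
            pool.append(servers.pop(0))
    # Reassign pooled virtual servers, heaviest first, to the node
    # that minimizes the resulting load.
    for load in sorted(pool, reverse=True):
        target = min(nodes, key=lambda n: sum(nodes[n]) + load)
        nodes[target].append(load)
    return nodes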
Load Balancing
Performance
Tradeoff: load movement vs. load balance (measured as the maximum node utilization)
When T decreases, the maximum node utilization decreases, but the load movement increases
Effective in achieving load balance: with system utilization as high as 90%, only about 8% of the load that arrives in the system is transferred
Emergency load balancing is necessary
Consistency Preserving
Neighbor table
A table of neighbor pointers
Used for efficient routing in a P2P system
Challenge
How to maintain consistent neighbor tables in a dynamic network where nodes may join, leave and fail concurrently and frequently?
Consistency Preserving
Consistent network
For every entry in the neighbor tables: if there exists at least one qualified node in the network, then the entry stores at least one qualified node
A qualified node for an entry of a node's neighbor table is a node whose ID has the suffix required by that entry
Otherwise, the entry is empty
Consistency Preserving
K-consistent network
For every entry in the neighbor tables: if there exist H qualified nodes in the network, then the entry stores at least min(K, H) qualified nodes (a small check is sketched below)
Otherwise, the entry is empty
For K > 0, K-consistency implies consistency
1-consistency = consistency
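The definition can be checked entry by entry; a minimal sketch (illustrative, not the paper's code):

def entry_is_k_consistent(stored, qualified, k):
    """An entry is K-consistent if it stores at least min(K, H) qualified
    nodes, where H is the number of qualified nodes in the network."""
    h = len(qualified)
    return len(set(stored) & set(qualified)) >= min(k, h)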
Consistency Preserving
General strategy
Identify a consistent subnet that is as large as possible
Only replace a neighbor with a closer one if both of them belong to the subnet
Expand the consistent subnet after new nodes join
Maintain consistency of the subnet when nodes fail
Consistency Preserving
Approach of [LL2004b]
To design a join protocol such that
an initially K-consistent network remains K-consistent after a set of join processes terminates
the termination of a join implies the joined node belongs to the consistent subnet
To design a failure recovery protocol that
recovers K-consistency of the subnet by repairing the holes left by failed neighbors with qualified nodes from the subnet
The recovery protocol is presented in [LL2004a], but it is integrated with the join protocol in the experiments of this paper
Consistency Preserving
Join protocol
Each node has a status: copying, waiting, notifying, cset_waiting or in_system
S-node: a node in status in_system
T-node: any other node
All S-nodes form a consistent subnet
Consistency Preserving
Status transitions of a joining node x:
copying: copy neighbor information from S-nodes to fill in most entries of its table, level by level
waiting: when no qualified S-node can be found for a level i >= 1, try to find an S-node that shares at least the rightmost i-1 digits with x and stores x as a neighbor
notifying: when such a node y is found, seek and notify the nodes that share the rightmost j digits with x, where j is the lowest level at which x is stored in y's table
cset_waiting: when notifying finishes, wait for the nodes that are currently joining and are likely to be in the same consistent subnet
in_system: entered once it is confirmed that all such nodes have exited the notifying status
Consistency Preserving
Performance
p-ratio: if the primary neighbor stored in an entry of x's table is y while the true primary neighbor should be z, then p-ratio = (delay from x to y) / (delay from x to z)
K-consistency is maintained in all experiments
When K increases, the p-ratio decreases: more neighbor information is stored, at the cost of more messages
Even with massive joins and failures, the tables are still greatly optimized
Outline
Hash-based techniques in P2P
Hash-based structured P2P systems
Pastry
P-Grid
Two important issues
Load balancing
Neighbor table consistency preserving
Comparison of DHT techniques
Skip-list based system
SkipNet
Conclusion
Comparing DHTs [DGPR2003]
Each DHT algorithm has many details, which makes direct comparison difficult. We will use a component-based analysis approach:
Break the DHT design into independent components
Analyze the impact of each component choice separately
Two types of components
Routing-level: neighbor & route selection
System-level: caching, replication, querying policy, latency
Metrics Used
Metrics used in the comparison
Flexibility – options in choosing neighbors and routes
Resilience – does it still route when nodes go down?
Load balancing – is the content evenly distributed?
Proximity & latency – is the content stored nearby?
Aspects of a DHT
Geometry – the structure that inspires a DHT design
Distance function – the distance between two nodes
Algorithm – the rules for selecting neighbors and routes using the distance function
Algorithm & Geometry
What is routing algorithm & geometry ?
Routing Algorithm – refers to exact rules for selecting neighbors,
routes. (eg. Chord, CAN, PRR, Tapestry, Pastry)
Geometries – refers to the algorithms’ underlying structure
derived from the way in which neighbors and routes are chosen.
(Eg. Chord routes on a ring).
Why is geometry important ? Geometry capture flexibility
in selection of neighbors and routes.
Neighbor selection – Does the geometry choose neighbors
based on proximity ? Leads to shorter paths.
Route selection – Number of options for selecting next hops.
Leads to shorter, reliable paths.
DHT Algorithms Analysis
The table below summarizes the geometries and the algorithms that use them. We will examine the flexibility metric in two aspects: flexibility in neighbor selection and flexibility in route selection.
Geometry – Algorithm
Tree – PRR
Hypercube – CAN
Butterfly – Viceroy
Ring – Chord
XOR – Kademlia
Hybrid – Pastry
Figure: two example geometries – a binary tree over prefixes 00, 01, 10, 11 and a ring of eight nodes labelled 000–111.
Tree Geometry
PRR uses the tree geometry.
The distance between two nodes is the height of their smallest common subtree (log N in a well-balanced tree).
Neighbor selection flexibility: a node has 2^(i-1) options when choosing its neighbor at distance i.
No routing flexibility.
Figure: a binary tree of height 2 with leaves 00, 01, 10, 11.
Hypercube Geometry
CAN uses a d-torus (hypercube) geometry.
Each node has log n neighbors; neighbors differ in exactly one bit.
Routing proceeds greedily by correcting the bits of the destination in any order.
No flexibility in choosing neighbors.
When routing from a source to a destination at distance log n, the first hop has log n next-hop choices, the second hop has (log n) - 1 choices, and so on, giving (log n)! possible routes.
Figure: a 3-dimensional hypercube with nodes 000–111.
Butterfly Geometry
Viceroy uses the butterfly geometry.
Nodes are organized in a series of log n "stages", where all the nodes at stage i are capable of correcting the ith bit.
Routing consists of 3 phases and is done in O(log n) hops.
No flexibility in route selection or neighbor selection.
Ring Geometry
Chord uses the ring geometry.
Each node maintains log n neighbors and routes to an arbitrary destination in O(log n) hops.
Flexibility in neighbor selection: a node has 2^(i-1) possible options for picking its ith neighbor (a sketch of the candidate interval follows below), giving approximately n^((log n)/2) possible routing tables per node.
This yields (log n)! possible routes from a source to a destination at distance log n.
Figure: a ring of eight nodes labelled 0–7.
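One way to read the 2^(i-1) options is that the ith neighbor may be chosen anywhere within the ith interval of the ring rather than at one exact offset. A small sketch of that interpretation (illustrative only):

def ith_neighbor_interval(node_id, i, id_bits):
    """Candidate interval [lo, hi) on a ring of 2**id_bits identifiers for a
    node's i-th neighbor (i >= 1): any of the 2**(i-1) identifiers in
    [node_id + 2**(i-1), node_id + 2**i) may be picked."""
    space = 1 << id_bits
    lo = (node_id + (1 << (i - 1))) % space
    hi = (node_id + (1 << i)) % space
    return lo, hi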
Ring Geometry
Figure: a ring of eight nodes labelled 000–111.
To route from 000 to 110 there are two routes:
route to 100 and then to 110, or
route to 010 and then to 110.
XOR
Kademlia uses the XOR geometry.
The distance between two nodes is the XOR of their identifiers (computed below).
A node has 2^(i-1) options when choosing its neighbor at the ith distance, yielding approximately n^((log n)/2) possible routing tables per node.
Route flexibility comes from fixing lower-order bits before fixing the higher bits when an optimal path is not available. This may lengthen the route, since the lower-order bits fixed earlier need not be preserved by later hops.
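The XOR metric itself is trivial to compute; a minimal sketch (illustrative, not Kademlia's actual code):

def xor_distance(a, b):
    """Kademlia's distance between two identifiers: their bitwise XOR,
    read as an unsigned integer."""
    return a ^ b

# e.g. xor_distance(0b1010, 0b1000) == 0b0010: only the second-lowest
# bit still differs, so any neighbor that fixes that bit makes progress.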
Hybrid
Pastry is a hybrid: its nodes are regarded both as the leaves of a binary tree and as points on a one-dimensional circle.
The distance between two nodes is either the tree distance or the cyclic distance between them.
A node has 2^(i-1) options when choosing its neighbor at distance i, yielding approximately n^((log n)/2) possible routing tables per node.
Route selection freedom: a node is allowed to take hops along the ring, but these paths might not retain the O(log n) bound on route length.
Figure: the binary-tree view of the identifier space (root, prefixes 0/1 and 00/01/10/11).
Flexibility Overview
Property | Tree | Hypercube | Ring | Butterfly | XOR | Hybrid
Neighbor selection | n^((log n)/2) | 1 | n^((log n)/2) | 1 | n^((log n)/2) | n^((log n)/2)
Route selection (optimal) | 1 | c1(log n) | c1(log n) | 1 | 1 | 1
Natural support for sequential neighbors? | no | no | yes | no | no | default: no, fallback: yes
Ring & Hypercube have twice the routing flexibility of the Hybrid & XOR geometries.
Resilience
Two aspects of robust routing
Static resilience measures how well the algorithm can route in a dynamic environment before the recovery algorithms run.
Dynamic recovery measures how quickly routing state is recovered after failures.
With 30% of the nodes failed:
Tree – 90% of routes failed (no route selection flexibility)
Ring, Hypercube – 7% of routes failed (the most route selection flexibility)
Hybrid, XOR – 20% of routes failed (half the flexibility of the ring)
Route selection flexibility affects static resilience.
Figure: percentage of failed paths versus percentage of failed nodes (0–90%) for the Tree, Hypercube, XOR, Hybrid and Ring geometries.
Path Latency
The goal is to minimise the end-to-end latency of overlay paths. Two proximity methods are considered:
Proximity Neighbor Selection (PNS) – neighbors are chosen based on their proximity.
Proximity Route Selection (PRS) – routes are selected depending on the proximity of the neighbors.
PNS achieves an improvement over PRS, which in turn improves over the plain version.
Geometry does not affect the performance of PNS / PRS.
Thus it is important to choose a routing algorithm whose geometry accommodates PNS.
Local Convergence
Do messages sent from two nodes to the same destination converge at a node near the two sources?
Local convergence leads to low latencies in the following:
Overlay multicast
Caching
Server selection
Measured by the number of exit points in the network; in the best case, only one node sends a message off-domain.
Limitations & Findings
Limitations
The authors have not considered all geometries
Other factors and performance metrics are not considered
Findings
Routing geometry is important.
Flexibility improves resilience & proximity.
Why not the ring?
It has great flexibility to choose neighbors and routes, and can implement both proximity methods, PNS & PRS.
It has the highest performance in the resilience tests and is as good as the other geometries in path lengths and local convergence.
Outline
Hash-based techniques in P2P
Hash-based structured P2P systems
Pastry
P-Grid
Two important issues
Load balancing
Neighbor table consistency preserving
Comparison of DHT techniques
Skip-list based system
SkipNet
Conclusion
Skip List [PSL1990]
Skip lists are data structures that can be used in place of balanced trees. They use probabilistic balancing techniques, so their algorithms are simpler and faster.
A skip list can be described as a sorted linked list in which some nodes are supplemented with pointers that skip over many list elements.
Figure: a skip list over the keys 2, 5, 9, 16, 23, 25, 27, 29 with a header (HDR) and a NIL sentinel.
Perfect Skip List
A perfect skip list is one where the height of the ith node is the exponent of the largest power of two that divides i. Pointers at level h have length 2^h. A perfect skip list supports searches in O(log N).
Figure: a perfect skip list over the keys 2, 5, 9, 16, 23, 25, 27, 29; a level-2 pointer skips over 2^2 nodes, and node heights of 2 and 3 correspond to divisors 2^2 and 2^3.
Because insertions and deletions are expensive in a perfect skip list, a probabilistically balanced skip list is used instead: node heights are chosen by consulting a random number generator (a sketch follows below).
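A compact sketch of the probabilistic variant (illustrative; Pugh's original pseudocode uses the same ideas with slightly different bookkeeping):

import random

MAX_LEVEL = 16                               # enough levels for ~2**16 elements

class Node:
    def __init__(self, key, level):
        self.key = key
        self.forward = [None] * (level + 1)  # one forward pointer per level

def random_level(p=0.5):
    """Probabilistic balancing: each additional level is kept with probability p."""
    lvl = 0
    while random.random() < p and lvl < MAX_LEVEL:
        lvl += 1
    return lvl

class SkipList:
    def __init__(self):
        self.head = Node(None, MAX_LEVEL)
        self.level = 0                        # highest level currently in use

    def search(self, key):
        """Descend from the top level, moving right while the next key is smaller."""
        x = self.head
        for lvl in range(self.level, -1, -1):
            while x.forward[lvl] is not None and x.forward[lvl].key < key:
                x = x.forward[lvl]
        x = x.forward[0]
        return x is not None and x.key == key

    def insert(self, key):
        update = [self.head] * (MAX_LEVEL + 1)   # rightmost node visited per level
        x = self.head
        for lvl in range(self.level, -1, -1):
            while x.forward[lvl] is not None and x.forward[lvl].key < key:
                x = x.forward[lvl]
            update[lvl] = x
        lvl = random_level()
        self.level = max(self.level, lvl)
        node = Node(key, lvl)
        for i in range(lvl + 1):
            node.forward[i] = update[i].forward[i]
            update[i].forward[i] = node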
Examples
Starting from an empty list (HDR pointing to NIL), nodes are inserted with randomly chosen heights:
Add node 10 (height 1, chosen randomly)
Add node 5 (height 0, chosen randomly)
Add node 8 (height 2, chosen randomly)
Add node 12 (height 0, chosen randomly)
Add node 2 (height 0, chosen randomly)
Figure: the skip list after each insertion, ending with the keys 2, 5, 8, 10, 12.
Search Skip List
Figure: searching the skip list over the keys 2, 5, 9, 16, 23, 25, 27, 29.
Search for node 30: from HDR to node 29, then stop; the search fails (illustrated).
Search for node 23: from HDR to node 16; drop two levels, then from node 16 to node 23. Found.
Search for node 27: from HDR to node 16; drop one level, from node 16 to node 25; drop one level, from node 25 to node 27. Found.
Skip List
Worst-case performance occurs when the list is significantly unbalanced.
Space efficient: can use as few as 1.33 pointers per element.
Maintains O(log N) searches with high probability.
Comparison with AVL trees, recursive 2-3 trees and self-adjusting trees
Skip lists perform more comparisons than the other methods.
Skip lists are slightly slower than AVL trees for searches, but insertions and deletions in a skip list are faster.
Skip lists are faster than self-adjusting trees when a uniform access distribution is encountered, but slower for highly skewed distributions.
SkipNet Introduction [SNL2003]
In DHTs, we cannot control where data will be stored
Data might be stored far away from its administrative domain, making privileges hard to administer. Can we adapt?
This gives rise to denial-of-service attacks and traffic analysis.
Solution: SkipNet – a scalable overlay network that provides controlled data placement and guaranteed routing locality by organizing data by string names
Content can be placed on a pre-defined node or distributed uniformly across the nodes of a hierarchical naming subtree.
Motivation
Disadvantages of Chord, CAN, Tapestry, Pastry:
No content locality: they cannot explicitly place data on a specific overlay node or distribute it across the nodes in a specified domain.
No path locality: they cannot guarantee that the routing path between two overlay nodes in a domain does not leave the domain. Path locality adds security, since traffic is not passed through other domains, which could be competitors.
This leaves them prone to traffic analysis and denial-of-service attacks.
SkipNet provides both content & path locality.
How does SkipNet do it?
SkipNet employs both a string name space and a numeric ID space (a sketch follows below):
Node names and content identifier strings are mapped into name IDs.
Hashes of the node names and content identifiers are mapped into numeric IDs.
By arranging content in name ID order, rather than dispersing it, SkipNet achieves content & path locality.
Advantages of locality
Improved availability
Data stored within an organisation can still be searched even if the network becomes disjoint.
Resilience against Internet failures: nodes within a cluster gracefully survive failures that disconnect the cluster from the rest of the Internet (a useful property of SkipNet).
Performance
Searches are faster, as data is stored nearby.
Manageability
Locality facilitates control and maintenance within an administrative domain.
Security
Can deal with traffic analysis & denial-of-service attacks.
SkipNet Structure
SkipNet adapts the skip list structure, with some enhancements:
Traversals can start from any node.
State and processing costs should be the same for all nodes.
A ring (a doubly linked list) is used at every level.
Each node stores 2 log N pointers, rather than a highly variable number of pointers.
SkipNet rings
Perfect: pointers at level h point to the nodes that are exactly 2^h nodes to the left and to the right.
Probabilistic: a node probabilistically determines which ring at level h it belongs to.
SkipNet Structure
Figure: SkipNet nodes A, D, M, O, T, V, X, Z ordered by name ID, with the routing tables of nodes A and V shown.
SkipNet Structure
Figure: the full SkipNet routing infrastructure for an 8-node system (A, D, M, O, T, V, X, Z), including the ring labels – the root ring at level L=0, rings 0 and 1 at L=1, rings 00–11 at L=2, and so on.
Routing By Name ID
Similar to a search in a skip list
A message is routed along the highest-level pointer, in either the clockwise or the counter-clockwise direction, whose name ID does not lie past the destination value.
Routing terminates when the message arrives at the node whose name ID is closest to the destination.
Because nodes are doubly linked, the scheme routes using either left or right pointers, depending on the name IDs.
The number of hops is O(log N).
Example
Routing a message from node A to node V.
Path:
A (level 2, clockwise) to T, since "T" < "V"
T (level 2, clockwise): failed
T (level 1, clockwise): failed
T (level 0, clockwise) to V (the destination)
Figure: the SkipNet ring of nodes A–Z, with the routing tables of the nodes on the path shown.
Routing Algorithm
SendMsg(nameID, msg) {
    if (LongestPrefix(nameID, localNode.nameID) == 0)
        msg.dir = RandomDirection();
    else if (nameID < localNode.nameID)
        msg.dir = counterClockwise;
    else
        msg.dir = clockwise;
    msg.nameID = nameID;
    RouteByNameID(msg);
}

// Invoked at all nodes (including the source and
// destination nodes) along the routing path.
RouteByNameID(msg) {
    // Forward along the longest pointer
    // that is between us and msg.nameID.
    h = localNode.maxHeight;
    while (h >= 0) {
        nbr = localNode.RouteTable[msg.dir][h];
        if (LiesBetween(localNode.nameID, nbr.nameID,
                        msg.nameID, msg.dir)) {
            SendToNode(msg, nbr);
            return;
        }
        h = h - 1;
    }
    // h < 0 implies we are the closest node.
    DeliverMessage(msg.msg);
}
Routing By Numeric ID
Routing begins on the level-0 ring and proceeds until a node is found whose numeric ID matches the destination numeric ID in the first digit.
The message is then forwarded from a ring at level h, R_h, to a ring at level h+1, R_{h+1}, such that the nodes in R_{h+1} share h+1 digits with the destination numeric ID.
Routing terminates when
the message can be delivered to a node whose numeric ID equals the key, or
none of the nodes in R_h share h+1 digits with the destination numeric ID; in that case the message is delivered to the node whose numeric ID is closest to the destination's numeric ID.
The number of message hops is O(log N).
Routing By Numeric ID
Figure: the SkipNet ring hierarchy with 4-digit numeric IDs (e.g. A = 0000, D = 1100, Z = 1000, O = 1001).
E.g. let Z = 1000 and O = 1001, and route from A to the numeric ID 1011.
Path: A (0000) to D (1100, move up a level) to O (1001, move up a level) to Z (1000) to O (1001, the closest match for 1011), where the message is delivered.
Routing Algorithm
// Invoked at all nodes (including the source and destination nodes) along the routing path.
// Initially: msg.ringLvl = -1, msg.startNode = msg.bestNode = null & msg.finalDestination = false
RouteByNumericID(msg) {
    if (msg.numID == localNode.numID || msg.finalDestination) {
        DeliverMessage(msg.msg);
        return;
    }
    if (localNode == msg.startNode) { // Done traversing current ring.
        msg.finalDestination = true;
        SendToNode(msg.bestNode);
        return;
    }
    h = CommonPrefixLen(msg.numID, localNode.numID);
    if (h > msg.ringLvl) { // Found a higher ring.
        msg.ringLvl = h;
        msg.startNode = msg.bestNode = localNode;
    } else if (abs(localNode.numID - msg.numID) < abs(msg.bestNode.numID - msg.numID)) {
        // Found a better candidate for current ring.
        msg.bestNode = localNode;
    }
    // Forward along current ring.
    nbr = localNode.RouteTable[clockWise][msg.ringLvl];
    SendToNode(nbr);
}
Benefits
SkipNet supports routing over the same data structure by both
name ID
numeric ID
The bottom ring is sorted by name ID, and the upper rings are sorted by numeric ID.
For a given node, the SkipNet rings to which it belongs precisely form a skip list that is a doubly linked ring.
Node Joins & Departure
Node joins
A new node finds the top-level ring that matches its numeric ID.
It finds a neighbor in that top ring using a name ID search.
Starting from one of the neighbors, it searches for its name ID at the next lower level, and thus finds its neighbors at that level.
This is repeated until it reaches the root ring.
Existing nodes point to the new node only after it has joined the root ring.
Insertion traverses O(log N) hops with high probability.
Node departure
SkipNet can route correctly as long as the root-level ring is maintained.
The other levels are regarded as optimization hints, and upper-ring membership is maintained through a background repair process.
Example
Join: insert node O with numeric ID 101.
Search by numeric ID 101; the highest attainable level is 2.
O joins the ring containing Z at level 2.
Z forwards the join message to D at the next lower level, 1.
Proceeding by name ID searches at the lower levels, D and V become O's neighbors at level 1, and M and T become its neighbors at level 0.
Figure: the SkipNet ring hierarchy (root ring, rings 0/1, rings 00–11) as node O joins.
Properties of SkipNet
Content & Path Locality
Naming nodes like a DNS entry. Path locality for groups in which nodes
share a single DNS suffix.
Incorporating node name ID into content name gurantees that the content
will be hosted on that node.
E.g. reversing DNS names: john.microsoft.com becomes com.microsoft.john
E.g. com.microsoft.john/doc-name
Constrained Load Balancing
Stored using two parts – a CLB Domain and CLB suffix
Searching node
For example a doc using the name msn.com/DataCenter!TopStories.html.
Search for node in the CLB Domain using name ID search. Then search by
numeric ID for the hash of the CLB suffix constrained by domain ID.
Search is constrained by a nameID prefix, we use the double link list.
This type of search affect the performance by a factor of 2.
Performed over a naming subtree but not over arbitrary subset of nodes.
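A small sketch of the two naming conventions above (illustrative helpers, not SkipNet's API):

def reverse_dns(name):
    """Reverse a DNS-style name so that sharing a domain means sharing a
    name-ID prefix: john.microsoft.com -> com.microsoft.john."""
    return ".".join(reversed(name.split(".")))

def split_clb_name(name):
    """Split a constrained-load-balancing name into its CLB domain and
    CLB suffix at the '!' delimiter."""
    domain, _, suffix = name.partition("!")
    return domain, suffix

# split_clb_name("msn.com/DataCenter!TopStories.html")
#   -> ("msn.com/DataCenter", "TopStories.html")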
Properties of SkipNet
Fault tolerance
Only the neighbors at level 0 need to be kept correct.
Each node has 16 neighbors at level 0, so level 0 is repaired easily by contacting live nodes.
Background stabilization mechanisms are employed after failures.
A failure across organizational boundaries only segments the overlay, which gracefully survives.
Security
Nodes cannot create global names containing the suffix of registered domains.
Path locality avoids traffic analysis; however, outbound traffic is still easily prone to analysis.
Range queries
The ability to perform queries over contiguous ring segments.
Enhancements
Use Sparse & Dense Routing Table
Duplicate pointer elimination
Use a density parameter k & a non-binary random digit to the
base k for numeric ID.
Remove duplicate pointers in the routing table. 25%
improvements can be achieved.
Incorporate Network proximity for routing by name id
Introduce a P-table for proximity routing. The goal of P-table is to
maintain routing in O(log ) hops.
Ensures that each hop has low latency. Keeps track of the
network distance that are close to itself.
Enhancements
Incorporate network proximity for routing by numeric ID
Add a C-table to incorporate network proximity when searching by numeric ID.
It keeps track of nodes that are close by and within the CLB domain.
Design Alternative
IP routing & DNS
o
Content placement by routing using IP and DNS lookup.
Single Overlay Network
o
o
o
o
Content locality, we name node with the hash of the data’s
object’s name. Requires separate routing table for each object
Use 2 part naming scheme –content name consist of node
addresses concatenated with node-relative names. Does not
support guaranteed path locality
Add constraints to message to limit path locality. However
prevents routing from being consistent.
Use a 2 part segments, use numeric ID and name ID like
SkipNet. Result is a static form of constrained load balancing.
Design Alternative
Multiple overlay networks
Multiple overlays, each with its own membership, could be considered.
This requires that access to other overlays go through gateways.
Access to data is constrained and load balanced within a single overlay, and is not accessible to outside clients except via gateways.
SkipNet, in contrast, provides explicit content placement, allows clients to dynamically define new DHTs over any name-prefix scope, and guarantees path locality within a shared name prefix, all within a single infrastructure.
Experiments
The authors run experiments against the following systems:
Basic SkipNet, using only the R-table
Full SkipNet, using the R-table, P-table and C-table
Pastry
Chord
The following lookup performance metrics are used:
Relative Delay Penalty (RDP) – the latency of the overlay path compared to IP
Physical network hops – the length of the overlay path measured in IP hops
Number of failed lookups
Other experimental parameters (refer to the paper):
Format of node names (host-generated or organisation-generated)
Organisation size
Models for the distribution of nodes and data
Simulation of domain isolation by failing an organization's link
Experiment Results
Basic routing costs
Full SkipNet and Pastry are locality-aware, while basic SkipNet and Chord are not; hence the former perform better.
A non-uniform distribution of data does not affect performance.
Routing entries per node: Chord 16.3, basic SkipNet 41.7, full SkipNet 102.2, Pastry 63.2.
Locality of placement
Measured in physical network hops.
Chord and Pastry take a constant number of physical hops because they are oblivious to the locality of data, since they diffuse data throughout the network.
SkipNet shows performance improvements as the locality of the data references increases.
Experiment Results
Fault tolerance – when an organisation is disconnected
Locality improves fault tolerance.
Chord and Pastry fail totally for local lookups, as data is diffused throughout the network.
SkipNet keeps functioning and still serves local lookups.
Constrained load balancing (within a domain)
Studies the Relative Delay Penalty (RDP) as the number of nodes increases.
Basic CLB using the R-table causes higher delay penalties.
Full CLB causes intermediate delay penalties.
Pastry has low delay penalties.
Network proximity
Studies the effect on RDP of the density parameter k, which controls the number of P-table entries.
RDP levels off after k = 8, because of the increased number of pointers in the P-table.
SkipNet Summary
SkipNet is the first P2P system that achieves both path and content locality. It provides content locality at the desired degree and granularity.
Clustering node names allows SkipNet to behave gracefully in the face of link failures.
Its performance is similar to other P2P systems, such as Chord and Pastry, under uniform access patterns.
Under access patterns where intra-organisation traffic predominates, SkipNet performs better.
SkipNet is also more resilient to network partitions than other P2P systems.
Conclusion
We looked at hash-based techniques in P2P
Pastry
P-Grid
Two important issues
Load balancing
Neighbor table consistency preserving
Comparison of DHT techniques
SkipNet – a skip list adaptation
References
[CAN2001] Sylvia Ratnasamy, Paul Francis, Mark Handley, Richard Karp, Scott Shenker. A Scalable Content-Addressable Network. SIGCOMM '01, August 27-31, 2001.
[CPLS2001] Ion Stoica, Robert Morris, David Karger, M. Frans Kaashoek, Hari Balakrishnan. Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications. SIGCOMM '01, August 27-31, 2001.
[CSWH2000] I. Clarke, O. Sandberg, B. Wiley, and T. W. Hong. "Freenet: A distributed anonymous information storage and retrieval system". Proc. of ICSI Workshop on Design Issues in Anonymity and Unobservability, 2000.
[DGPR2003] K. Gummadi, R. Gummadi, S. Gribble, S. Ratnasamy, S. Shenker, I. Stoica. The Impact of DHT Routing Geometry on Resilience and Proximity. SIGCOMM '03, August 25-29, 2003.
[LL2004a] S. S. Lam and H. Liu. Failure recovery for structured P2P networks: Protocol design and performance evaluation. In Proc. of ACM SIGMETRICS, June 2004.
[LL2004b] H. Liu and S. S. Lam. Consistency-preserving Neighbor Table Optimization for P2P Networks. Technical Report TR-04-01, Dept. of CS, Univ. of Texas at Austin, January 2004.
References (cont.)
[GLSKS2004] B. Godfrey, K. Lakshminarayanan, S. Surana, R. Karp, I. Stoica. Load Balancing in Dynamic Structured P2P Systems. Proc. of IEEE INFOCOM, Portland, Oregon, USA, 2004.
[PSL1990] William Pugh. Skip lists: A probabilistic alternative to balanced trees. Communications of the ACM, June 1990.
[RD2001] A. Rowstron and P. Druschel. "Pastry: Scalable, decentralized object location and routing for large-scale peer-to-peer systems". In Proc. of the 18th IFIP/ACM International Conference on Distributed Systems Platforms, November 2001.
[SMKKB2001] I. Stoica, R. Morris, D. Karger, M. F. Kaashoek, H. Balakrishnan. Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications. Proc. of SIGCOMM '01, San Diego, California, USA, 2001.
[SML+2004] I. Stoica, R. Morris, D. Liben-Nowell, D. R. Karger, M. F. Kaashoek, F. Dabek, and H. Balakrishnan. "Chord: A scalable peer-to-peer lookup service for internet applications". Proc. of the 2001 ACM Annual Conference of the Special Interest Group on Data Communication (ACM SIGCOMM '01), 2001.
[SNL2003] Nicholas J. A. Harvey, Michael B. Jones, Stefan Saroiu, Marvin Theimer, Alec Wolman. SkipNet: A Scalable Overlay Network with Practical Locality Properties. Proceedings of the Fourth USENIX Symposium on Internet Technologies and Systems (USITS '03), Seattle, WA, March 2003.