The Dping Scalable Membership Service

Indranil Gupta
Ashish Motivala
Abhinandan Das
Cornell University
Group Membership Service
[Figure: a process group running over an asynchronous, lossy network. Processes pi and pj are members; processes can join, leave, or fail (marked X). pj's membership list includes pi.]
System Design
• Join, Leave, Failure: broadcast to all processes
• Need to detect a process failure at some process quickly (to be able to broadcast it)
• Failure Detector Protocol Specifications
– Detection Time (specified by application designer to Dping)
– Accuracy (specified by application designer to Dping)
– Load (optimized by Dping)
Dping Failure Detector Protocol
[Figure: message exchange among pi, pj, and K random processes during one protocol period (= T time units); failures are marked X.]
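The slide's labels (pi, pj, K random processes, protocol period) suggest a ping-based probe with indirect helpers. The following is a minimal sketch under that reading, not the authors' implementation: pi pings one randomly chosen pj, and if no ack arrives it asks up to K other random processes to ping pj on its behalf. The function name `protocol_period`, the `alive` map, and the `loss_prob` message-loss model are all hypothetical.

```python
import random

def protocol_period(pi, members, alive, k, loss_prob, rng):
    """Simulate one protocol period at pi; return (target, verdict)."""
    others = [m for m in members if m != pi]
    pj = rng.choice(others)  # assumed: target chosen uniformly at random

    def delivered():
        # Each message is lost independently with probability loss_prob.
        return rng.random() >= loss_prob

    def ping_acked(dst):
        # An ack needs a live destination plus a delivered ping and ack.
        return alive[dst] and delivered() and delivered()

    # Step 1: direct ping from pi to pj.
    if ping_acked(pj):
        return pj, "alive"

    # Step 2: ask up to K other random processes to ping pj indirectly.
    candidates = [m for m in others if m != pj]
    helpers = rng.sample(candidates, min(k, len(candidates)))
    for h in helpers:
        # Request pi -> h, probe h -> pj (with ack), then report h -> pi.
        if alive[h] and delivered() and ping_acked(pj) and delivered():
            return pj, "alive"

    # No ack reached pi within the period: pj is reported as failed.
    return pj, "failed"
```

With `loss_prob=0.0` the sketch never misreports a live process. Raising `loss_prob` shows why a larger K lowers the chance of a false detection.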
Properties
• Expected detection time = T · e/(e-1), i.e. e/(e-1) protocol periods
• Load: O(K) per process
– Inaccuracy probability falls exponentially in K
• Process failures detected
– in O(log N) protocol periods w.h.p.
– in O(N) protocol periods deterministically
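The e/(e-1) figure can be checked numerically. This is a minimal sketch under two assumptions that the slide does not state: each of the N-1 non-faulty processes picks its ping target uniformly at random in every period, and no messages are lost. A failed process is then pinged in a given period with probability 1 - (1 - 1/(N-1))^(N-1), which tends to 1 - 1/e as N grows. The function name `expected_detection_periods` is hypothetical.

```python
import math

def expected_detection_periods(n):
    """Expected periods until some live process pings the failed one."""
    # Probability that at least one of the n-1 live processes picks the
    # failed process among its n-1 possible targets in a given period.
    p_detect = 1.0 - (1.0 - 1.0 / (n - 1)) ** (n - 1)
    # The number of periods until first detection is geometric: mean 1/p.
    return 1.0 / p_detect

LIMIT = math.e / (math.e - 1)  # large-N limit, about 1.58 periods

if __name__ == "__main__":
    for n in (4, 16, 64, 1024):
        print(n, round(expected_detection_periods(n), 4), round(LIMIT, 4))
```

Under these assumptions the expected value depends only weakly on N, so the detection time stays close to constant as the group grows.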
Why not Heartbeating?
• Centralized: single point of failure
• All-to-all: O(N) load per process
• Logical ring: unpredictable behavior under multiple failures
LAN Scalability
[Figure: mean time to failure detection / RTT (y-axis) versus number of processes (x-axis, up to 22); two curves, Experimental and Expected.]
Win2000, 100 Base-T Ethernet LAN
Protocol Period = 3*RTT, RTT=10 ms, K=1
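As a rough check on the plotted units, the experiment parameters can be combined with the e/(e-1) expected detection time. This sketch assumes that the "Expected" curve corresponds to that property, which the slide does not state explicitly. The function name `expected_detection` is hypothetical.

```python
import math

def expected_detection(period_in_rtts, rtt_ms):
    """Return expected detection time as (multiples of RTT, milliseconds)."""
    periods = math.e / (math.e - 1)      # expected protocol periods
    in_rtts = period_in_rtts * periods   # y-axis unit of the plot
    return in_rtts, in_rtts * rtt_ms

if __name__ == "__main__":
    # Slide parameters: Protocol Period = 3*RTT, RTT = 10 ms.
    in_rtts, in_ms = expected_detection(3, 10)
    print(f"{in_rtts:.2f} RTTs, {in_ms:.1f} ms")
```

With the slide's parameters this gives roughly 4.75 RTTs, or about 47.5 ms.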
WAN Deployment
• Load on core routers
• No representatives per subnet/domain
• Broadcast ‘suspicion’ before ‘declaring’ process failure
• Piggyback broadcasts through ping messages
– Epidemic-style broadcast
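The last two bullets can be sketched together: a process is first marked as suspected, is declared failed only if the suspicion is not refuted within some timeout, and these updates travel on ping messages rather than in separate broadcasts. The code below is an illustrative sketch only. The class and field names, the timeout of 3 periods, and the retransmit and piggyback limits are hypothetical, and mechanisms a real deployment would need, such as ordering of conflicting updates, are omitted.

```python
from collections import deque
from dataclasses import dataclass, field

@dataclass
class Update:
    kind: str        # "suspect", "alive", or "failed"
    member: str
    sends_left: int  # remaining times this update may be piggybacked

@dataclass
class Node:
    name: str
    suspicion_periods: int = 3  # hypothetical suspicion timeout
    max_piggyback: int = 2      # hypothetical updates carried per ping
    retransmits: int = 3        # hypothetical sends per update
    status: dict = field(default_factory=dict)
    suspected_at: dict = field(default_factory=dict)
    outbox: deque = field(default_factory=deque)

    def record(self, kind, member, now):
        """Apply an update locally and queue it for piggybacking."""
        current = self.status.get(member)
        if current == "failed" or current == kind:
            return  # nothing new to learn or to spread
        self.status[member] = kind
        if kind == "suspect":
            self.suspected_at[member] = now
        else:
            self.suspected_at.pop(member, None)
        self.outbox.append(Update(kind, member, self.retransmits))

    def tick(self, now):
        """Declare failed any suspicion older than the timeout."""
        for member, since in list(self.suspected_at.items()):
            if now - since >= self.suspicion_periods:
                self.record("failed", member, now)

    def make_ping(self, target):
        """Build a ping that carries up to max_piggyback queued updates."""
        carried = []
        for _ in range(min(self.max_piggyback, len(self.outbox))):
            update = self.outbox.popleft()
            carried.append((update.kind, update.member))
            update.sends_left -= 1
            if update.sends_left > 0:
                self.outbox.append(update)  # requeue for later pings
        return {"type": "ping", "from": self.name, "to": target,
                "updates": carried}

    def on_ping(self, msg, now):
        """Learn piggybacked updates and queue them for further spreading."""
        for kind, member in msg["updates"]:
            self.record(kind, member, now)
```

Because every receiver requeues what it learns, an update spreads epidemic-style over the pings that the failure detector already sends, so it adds no extra messages.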
