The Dping Scalable Membership Service

Indranil Gupta
Ashish Motivala
Abhinandan Das
Cornell University
Group Membership Service
[Figure: a process group over an asynchronous, lossy network. Processes such as pi can join, leave, or fail; process pj keeps a membership list of the group.]
System Design
• Join, Leave, Failure: broadcast to all processes
• Need to detect a process failure at some process quickly (to be able to broadcast it)
• Failure Detector Protocol Specifications
– Detection Time (specified by application designer to Dping)
– Accuracy (specified by application designer to Dping)
– Load (optimized by Dping)
Dping Failure Detector Protocol
[Figure: process pi, target process pj, and K random processes; protocol period = T time units.]
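The slide shows only the figure labels, so the following is a minimal sketch under stated assumptions rather than the authors' implementation. It assumes that within one protocol period pi pings pj directly and, if no ack arrives, asks K random processes to ping pj on its behalf. The names `probe_once` and `false_positive_rate`, the per-message loss model, and the message counts per exchange are hypothetical.

```python
import random

def probe_once(target_alive, members, k, loss_prob, rng):
    """Return True if the target is judged alive within one protocol period.

    Assumed model (not taken from the slides): every message is lost
    independently with probability loss_prob.
    """
    def exchange_ok(messages):
        # The exchange succeeds only if every one of its messages survives.
        return all(rng.random() >= loss_prob for _ in range(messages))

    # Step 1: direct ping and ack (assumed 2 messages).
    if target_alive and exchange_ok(2):
        return True

    # Step 2: ask up to K random helpers to ping the target for us
    # (assumed 4 messages each: request, ping, ack, relayed ack).
    helpers = rng.sample(members, min(k, len(members)))
    for _ in helpers:
        if target_alive and exchange_ok(4):
            return True

    # No ack by the end of the period: the target is declared failed.
    return False

def false_positive_rate(k, loss_prob, trials, seed=0):
    """Fraction of periods in which a live target is wrongly declared failed."""
    rng = random.Random(seed)
    members = list(range(20))  # hypothetical helper ids
    wrong = sum(not probe_once(True, members, k, loss_prob, rng)
                for _ in range(trials))
    return wrong / trials
```

Calling `false_positive_rate` with increasing `k` at a fixed `loss_prob` gives a rough feel for how extra indirect probes reduce wrong declarations in this simplified model.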
Properties
• Expected Detection time = T · e/(e-1) time units, i.e. e/(e-1) protocol periods
• Load: O(K) per process
– Inaccuracy probability exponential in K
• Process failures detected
– in O(log N) protocol periods w.h.p.
– in O(N) protocol periods deterministically
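The e/(e-1) figure can be checked with a small simulation. This is a sketch under an assumption the slides do not spell out: in every protocol period each live process picks one ping target uniformly at random among the other members, and a failed process is detected in the first period in which some live process picks it. The function names are hypothetical.

```python
import math
import random

def periods_until_detected(n, rng):
    """Protocol periods until some live process picks the failed one.

    Assumes n >= 2 members, exactly one of which has failed.
    """
    periods = 0
    while True:
        periods += 1
        # Each of the n-1 live processes picks one of its n-1 peers;
        # index 0 stands for the failed process.
        for _ in range(n - 1):
            if rng.randrange(n - 1) == 0:
                return periods

def mean_detection_periods(n, trials, seed=0):
    """Monte Carlo estimate of the mean detection time, in protocol periods."""
    rng = random.Random(seed)
    total = sum(periods_until_detected(n, rng) for _ in range(trials))
    return total / trials

if __name__ == "__main__":
    print("simulated:", mean_detection_periods(100, 20000))
    print("e/(e-1):  ", math.e / (math.e - 1))
```

Under this model the per-period detection probability is 1 - (1 - 1/(N-1))^(N-1), which tends to 1 - 1/e for large N, so the mean tends to e/(e-1), roughly 1.58 protocol periods.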
Why not Heartbeating?
• Centralized: single failure point
• All-to-all: O(N) load per process
• Logical ring: unpredictability on multiple failures
LAN Scalability
[Figure: Mean Time to Failure Detection / RTT (y-axis, 1 to 7) versus Number of Processes (x-axis, 0 to 22); series: Experimental and Expected.]
Win2000, 100Base-T Ethernet LAN
Protocol Period = 3*RTT, RTT=10 ms, K=1
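For these settings, a rough closed-form estimate of the expected detection time can be worked out as below. It assumes each live process picks one ping target uniformly at random per protocol period; whether the plotted "Expected" series was computed this way is not stated in the slides. The function name is hypothetical.

```python
import math

def expected_detection_in_rtt(n, period_in_rtt=3.0):
    """Expected detection time in units of RTT for a group of n >= 2 processes.

    Assumed model: in each period the failed process is picked by at least
    one of the n-1 live processes with probability 1 - (1 - 1/(n-1))**(n-1).
    """
    p_detect = 1.0 - (1.0 - 1.0 / (n - 1)) ** (n - 1)
    return period_in_rtt / p_detect

if __name__ == "__main__":
    rtt_ms = 10.0  # RTT from the experiment description
    for n in (2, 4, 8, 16, 22):
        in_rtt = expected_detection_in_rtt(n)
        print(f"N={n:2d}: {in_rtt:.2f} RTT = {in_rtt * rtt_ms:.1f} ms")
    # Large-N limit: 3 * e/(e-1) RTT
    print("limit:", 3.0 * math.e / (math.e - 1))
```

With a protocol period of 3*RTT the estimate starts at 3 RTT for two processes and approaches 3 · e/(e-1), about 4.75 RTT, as the group grows.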
WAN Deployment
• Load on core routers
• No representatives per subnet/domain
• Broadcast ‘suspicion’ before ‘declaring’ process failure
• Piggyback broadcasts through ping messages
– Epidemic-style broadcast
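The piggybacking idea can be illustrated with a toy simulation. This is a sketch under assumptions, not the Dping wire protocol: every process sends one ping per round to a random peer, a process that already knows a membership update attaches it to its ping, and message loss and the suspicion step are not modeled. The function name `rounds_to_spread` is hypothetical.

```python
import random

def rounds_to_spread(n, seed=0):
    """Rounds until an update known to process 0 has reached all n processes."""
    rng = random.Random(seed)
    informed = {0}  # process 0 learns of the update first
    rounds = 0
    while len(informed) < n:
        rounds += 1
        newly_informed = set()
        for p in range(n):
            # Pick a ping target uniformly among the other n-1 processes.
            target = rng.randrange(n - 1)
            if target >= p:
                target += 1  # skip self
            # The update rides on the ping only if the sender already has it.
            if p in informed:
                newly_informed.add(target)
        informed |= newly_informed
    return rounds

if __name__ == "__main__":
    for n in (8, 64, 512):
        print(n, "processes:", rounds_to_spread(n), "rounds")
```

No dedicated broadcast messages are sent in this model; the update spreads only through pings that would be exchanged anyway, which is the point of piggybacking.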