Failure Detectors
Ali Ghodsi – UC Berkeley / KTH
alig(at)cs.berkeley.edu
11/7/2015

Modeling Timing Assumptions
Tedious to model eventual synchrony (partial synchrony)
Timing assumptions are mostly needed to detect failures
Heartbeats, timeouts, etc.
Use failure detectors to encapsulate timing assumptions
Black box giving suspicions regarding node failures
Accuracy of suspicions depends on model strength

Implementation of Failure Detectors
Typical implementation:
Periodically exchange heartbeat messages
Timeout based on some worst-case message round trip
If the timeout expires, suspect the node
If a message is received from a suspected node, revise the suspicion and increase the timeout

Completeness and Accuracy
Two important types of requirements
1. Completeness requirements
Requirements regarding actually crashed nodes: when do they have to be detected?
2. Accuracy requirements
Requirements regarding actually alive nodes: when are they allowed to be suspected?
How to trivially achieve either? [d]
Together they are impossible in an asynchronous system!
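
One way to answer the [d] question above, sketched in Python under a simplifying assumption: a detector's output at one instant is modeled as the set of nodes it suspects, and the hypothetical helpers `suspect_all`, `suspect_none`, `is_complete` and `is_accurate` check that output against a known set of crashed nodes. Suspecting everyone is trivially complete; suspecting no one is trivially accurate.

```python
# Sketch: a detector's output at one instant is the set of nodes it suspects.
# All names here are hypothetical illustrations, not part of any library.


def suspect_all(nodes: set[str]) -> set[str]:
    """Trivially complete: every crashed node is suspected, because every node is."""
    return set(nodes)


def suspect_none(nodes: set[str]) -> set[str]:
    """Trivially accurate: no alive node is suspected, because no node is."""
    return set()


def is_complete(suspected: set[str], crashed: set[str]) -> bool:
    # Completeness: every crashed node is suspected.
    return crashed <= suspected


def is_accurate(suspected: set[str], crashed: set[str]) -> bool:
    # Accuracy: only crashed nodes are suspected.
    return suspected <= crashed


nodes, crashed = {"p1", "p2", "p3"}, {"p3"}
for detector in (suspect_all, suspect_none):
    out = detector(nodes)
    print(detector.__name__, is_complete(out, crashed), is_accurate(out, crashed))
```

This prints `suspect_all True False` and then `suspect_none False True`: each trivial detector gives up one requirement entirely, and satisfying both at once needs timing information about which nodes really crashed.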

Formal Model of FD
Augment the formal model with failure detectors (FD)
A configuration consists of the state of each node and the FD-state of each node
The transition function on node i gets an extra parameter: the FD-state of node i
The FD-state is updated in comp(i) by another function, the FD-function
The FD-function is not modeled explicitly, but must satisfy some properties

Requirements: Completeness
Strong Completeness
Every crashed node is eventually detected by all correct nodes
There exists a time after which all crashed nodes are detected by all correct nodes
The book only studies detectors with this property
Is it realistic? [d]

Requirements: Completeness
Weak Completeness
Every crashed node is eventually detected by some correct node
There exists a time after which all crashed nodes are detected by some correct node
Possibly detected by different correct nodes

Requirements: Accuracy
Strong Accuracy
No correct node is ever suspected
For all nodes p and q, p does not suspect q unless q has crashed
Is it realistic? [d]
Strong assumption, requires synchrony
I.e., no premature timeouts

Requirements: Accuracy
Weak Accuracy
There exists a correct node which is never suspected by any node
There exists at least one correct node p such that no node ever suspects p
Still a strong assumption
One node is always “well-connected”

Requirements: Eventual Accuracy
After some time, the requirements are fulfilled
Prior to that, any behavior is possible!
Eventual Strong Accuracy: after some finite time the FD provides strong accuracy
Eventual Weak Accuracy: after some finite time the FD provides weak accuracy
Quite weak assumptions [d]
When can eventual weak accuracy be achieved?
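
To make "after some finite time" concrete, the sketch below checks a finite trace for the earliest step from which strong accuracy, or weak accuracy about one fixed correct node, holds until the end. The trace format (one snapshot per step, mapping each correct observer to the set of nodes it suspects) and every function name are hypothetical assumptions for illustration.

```python
from typing import Callable, Optional

# Hypothetical trace format: one snapshot per time step, mapping each correct
# observer to the set of nodes it suspects at that step.
Snapshot = dict[str, set[str]]


def strongly_accurate(snap: Snapshot, correct: set[str]) -> bool:
    # No correct node is suspected by any observer.
    return all(not (suspects & correct) for suspects in snap.values())


def unsuspected(snap: Snapshot, p: str) -> bool:
    # Node p is suspected by no observer.
    return all(p not in suspects for suspects in snap.values())


def stabilization_time(trace: list[Snapshot],
                       holds: Callable[[Snapshot], bool]) -> Optional[int]:
    """Earliest step t such that `holds` is true at every step from t to the end."""
    t = None
    for i in range(len(trace) - 1, -1, -1):
        if not holds(trace[i]):
            break
        t = i
    return t


def weak_stabilization_time(trace: list[Snapshot], correct: set[str]) -> Optional[int]:
    # Eventual weak accuracy: one single correct node is unsuspected from t on.
    times = [stabilization_time(trace, lambda s, p=p: unsuspected(s, p))
             for p in sorted(correct)]
    found = [t for t in times if t is not None]
    return min(found) if found else None


correct = {"p1", "p2"}  # p3 has crashed
trace = [
    {"p1": {"p2"}, "p2": set()},          # step 0: p1 wrongly suspects p2
    {"p1": {"p2", "p3"}, "p2": {"p1"}},   # step 1: wrong suspicions both ways
    {"p1": {"p3"}, "p2": {"p1", "p3"}},   # step 2: p2 still wrongly suspects p1
    {"p1": {"p3"}, "p2": {"p3"}},         # step 3: only the crashed p3 is suspected
]
print(stabilization_time(trace, lambda s: strongly_accurate(s, correct)))  # 3
print(weak_stabilization_time(trace, correct))                            # 2
```

On a finite trace this can only approximate an eventual property, which really constrains the infinite suffix of an execution; the example merely shows that weak accuracy (about p2) can stabilize earlier than strong accuracy.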

Four Main Established Detectors
Four detectors with strong completeness
Synchronous systems:
Perfect Detector (P): strong accuracy
Strong Detector (S): weak accuracy
Asynchronous systems:
Eventually Perfect Detector (◊P): eventual strong accuracy
Eventually Strong Detector (◊S or Ω): eventual weak accuracy

Four Less Interesting Detectors
Four detectors with weak completeness
Synchronous systems:
Detector Q: strong accuracy
Weak Detector (W): weak accuracy
Asynchronous systems:
Eventually Detector Q (◊Q): eventual strong accuracy
Eventually Weak Detector (◊W): eventual weak accuracy

Interface of Perfect Failure Detector
Module:
Name: PerfectFailureDetector, instance P
Events:
Indication: <P, Crash | pi>
Notifies that node pi has crashed
Properties:
PFD1 (strong completeness)
PFD2 (strong accuracy)

Properties of P
Safety or liveness?
PFD1 (strong completeness): eventually every node that crashes is permanently detected by every correct node (liveness)
PFD2 (strong accuracy): if a node p is detected by any node, then p has crashed (safety)

Implementing P in Synchrony
Assume a synchronous system
Max transmission delay between 0 and δ time units
Each node, every γ time units: send <heartbeat> to all nodes
Each node waits γ+δ time units
If it did not get <heartbeat> from pi: detect <P, Crash | pi>
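
A minimal simulation of the heartbeat scheme above, under stated assumptions: the observer monitors continuously and suspects as soon as γ+δ time units pass without a heartbeat, and each heartbeat gets an independent random delay in [0, δ]. `first_suspicion` is a hypothetical helper. It checks both properties empirically: a correct sender is never suspected, and a crashed one is suspected within γ+2δ of its crash.

```python
import random
from typing import Optional


def first_suspicion(gamma: float, delta: float, crash_time: float,
                    horizon: float, seed: int = 0) -> Optional[float]:
    """Hypothetical helper: time an observer first suspects a sender, or None.

    The sender emits <heartbeat> at times 0, gamma, 2*gamma, ... until it crashes;
    each heartbeat is delayed by a random amount in [0, delta]. The observer
    suspects the sender once gamma + delta time units pass without a heartbeat.
    """
    rng = random.Random(seed)
    timeout = gamma + delta
    sends = []
    k = 0
    while k * gamma <= horizon and k * gamma < crash_time:
        sends.append(k * gamma)
        k += 1
    arrivals = sorted(s + rng.uniform(0, delta) for s in sends)
    last = 0.0  # time of the most recent heartbeat (the run starts at 0)
    for a in arrivals:
        if a - last > timeout:
            return last + timeout
        last = a
    return last + timeout if horizon - last > timeout else None


# Strong accuracy: a correct sender (never crashes) is never suspected.
print(all(first_suspicion(1.0, 0.5, float("inf"), 50.0, s) is None
          for s in range(100)))  # True
# Strong completeness: a sender crashing at time 5 is detected soon after.
t = first_suspicion(1.0, 0.5, crash_time=5.0, horizon=20.0)
print(5.0 < t < 5.0 + 1.0 + 2 * 0.5)  # True
```

The accuracy result is not luck: consecutive heartbeats leave γ apart and each arrives at most δ late, so no gap between arrivals exceeds γ+δ.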

Correctness of P
PFD1 (strong completeness)
A crashed node doesn't send <heartbeat>
Eventually every node will notice the absence of <heartbeat>
PFD2 (strong accuracy)
Assuming local computation is negligible, the maximum time between 2 heartbeats from pi arriving at pj is γ+δ time units: consecutive heartbeats are sent γ apart, and each is delayed by at most δ
If pi is alive, all nodes will receive its heartbeat in time
No inaccuracy
[Figure: timeline of heartbeats from pi to pj, annotated with the max delay]

Interface of EPFD
Module:
Name: EventuallyPerfectFailureDetector, instance ep
Events:
Indication: <ep, Suspect | pi>
Notifies that node pi is suspected to have crashed
Indication: <ep, Restore | pi>
Notifies that node pi is not suspected anymore
Properties:
EPFD1 (strong completeness)
EPFD2 (eventual strong accuracy). Eventually, no correct node is suspected by any correct node.

Implementing ◊P
Assume a partially synchronous system
Eventually some bounds exist (unknown to the nodes)
Every γ time units at each node: send <heartbeat> to all nodes
Each node waits T time units
If it did not get <heartbeat> from pi:
Indicate <ep, Suspect | pi> if pi is not in suspected
Put pi in the suspected set
If it gets a HB from pi, and pi is in suspected:
Indicate <ep, Restore | pi> and remove pi from suspected
Increase timeout T
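
A small event-driven sketch of the ◊P algorithm above for a single monitored node "p". Assumptions not taken from the slides: heartbeats are sent every γ = 1 time unit, the timer fires every T time units, and T grows by a fixed increment on every restore. The hypothetical `delay` function models an unstable prefix (the first three heartbeats are delayed by 4 time units) followed by a bounded delay of 0.5; `IncreasingTimeoutFD`, `heartbeat_arrivals` and `simulate` are likewise illustrative names.

```python
class IncreasingTimeoutFD:
    """Hypothetical sketch of the slide's ◊P algorithm at one observer."""

    def __init__(self, nodes, initial_timeout, increment):
        self.nodes = set(nodes)
        self.timeout = initial_timeout   # T
        self.increment = increment
        self.heard = set()               # nodes heard from in the current period
        self.suspected = set()
        self.events = []                 # (time, "suspect" | "restore", node)

    def on_heartbeat(self, now, p):
        self.heard.add(p)
        if p in self.suspected:
            # HB from a suspected node: restore it and increase T.
            self.suspected.remove(p)
            self.events.append((now, "restore", p))
            self.timeout += self.increment

    def on_timer(self, now):
        # Every T time units: suspect each node we did not hear from.
        for p in sorted(self.nodes - self.heard):
            if p not in self.suspected:
                self.suspected.add(p)
                self.events.append((now, "suspect", p))
        self.heard = set()


def heartbeat_arrivals(gamma, delay_of, crash_time=float("inf"), horizon=30.0):
    # Heartbeat k is sent at k * gamma (while alive) and arrives delay_of(k) later.
    return [k * gamma + delay_of(k)
            for k in range(int(horizon / gamma) + 1) if k * gamma < crash_time]


def simulate(arrivals, initial_timeout, increment, horizon):
    fd = IncreasingTimeoutFD({"p"}, initial_timeout, increment)
    arrivals = sorted(arrivals)
    i, next_check = 0, initial_timeout
    while next_check <= horizon:
        # Deliver every heartbeat that arrives before the timer fires.
        while i < len(arrivals) and arrivals[i] <= next_check:
            fd.on_heartbeat(arrivals[i], "p")
            i += 1
        fd.on_timer(next_check)
        next_check += fd.timeout         # re-arm the timer with the current T
    return fd


def delay(k):
    return 4.0 if k < 3 else 0.5         # hypothetical: unstable prefix, then bounded


fd = simulate(heartbeat_arrivals(1.0, delay), 1.0, 1.0, 30.0)
print(fd.events)  # [(1.0, 'suspect', 'p'), (3.5, 'restore', 'p')]
crashed = simulate(heartbeat_arrivals(1.0, delay, crash_time=8.0), 1.0, 1.0, 30.0)
print(crashed.events[-1], crashed.suspected)  # (10.0, 'suspect', 'p') {'p'}
```

The correct node is wrongly suspected once during the unstable prefix; the restore raises T to 2, which exceeds the eventual gap between heartbeats (at most γ+δ = 1.5), so the suspicion never recurs. The crashed node is suspected and never restored.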

Correctness of ◊P
EPFD1 (strong completeness)
Same as before
EPFD2 (eventual strong accuracy)
Each time p is inaccurately suspected by a correct q, timeout T is increased at q
Eventually the system becomes synchronous, and T becomes larger than the unknown bound (T > γ+δ)
Then q will receive HB on time, and never suspect p again

Leader Election

Leader Election vs Failure Detection
Failure detection captures failure behavior: detect failed nodes
Leader election (LE) also captures failure behavior: detect correct nodes (a single one, the same for all)
Formally, leader election is an FD
Always suspects all nodes except one (the leader)
Ensures some properties regarding that node

Leader Election vs Failure Detection
We’ll define two leader election algorithms
Leader election (LE), which “matches” P
Eventual leader election (Ω), which “matches” ◊P

Matching LE and P
P’s properties
P always eventually detects failures (strong completeness)
P never suspects correct nodes (strong accuracy)
Completeness of LE
Informally: eventually ditch crashed leaders
Formally: eventually every correct node trusts some correct node
Accuracy of LE
Informally: never ditch a correct leader
Formally: no two correct nodes trust different correct nodes
Is this really accuracy? [d]
Yes! Assume two nodes trust different correct nodes
One of them must eventually switch, i.e., leave a correct node

LE desirable properties
LE always eventually detects failures: eventually every correct node trusts some correct node
LE is always accurate: no two correct nodes trust different correct nodes
But the above two permit the following:
[Figure: timeline of p1, p2 and p3 with their “elect” events (elect p3, elect p1, elect p2, ...), in which p1 switches away from a leader that is still correct]
But p1 is “inaccurately” leaving a correct leader

LE desirable properties
To avoid “inaccuracy” we add
Local Accuracy: if a node is elected leader by pi, all leaders previously elected by pi have crashed
[Figure: the same timeline of p1, p2 and p3; the switch is marked “Not allowed, as p1 is correct”]
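
Local accuracy can be checked mechanically on a log of election events. The sketch below assumes a hypothetical log format of `(time, node, leader)` tuples in time order and a `crash_time` map that omits correct nodes; `local_accuracy_violation` is an illustrative name, not part of the course material.

```python
def local_accuracy_violation(log, crash_time):
    """Hypothetical checker: first election that violates local accuracy, or None.

    log: (time, node, leader) "elect" events in time order.
    crash_time: node -> time of its crash; correct nodes are simply absent.
    """
    previous = {}  # node -> leaders it elected before
    for t, node, leader in log:
        for old in previous.get(node, set()) - {leader}:
            if crash_time.get(old, float("inf")) > t:
                return (t, node, leader)  # an earlier leader is still alive at time t
        previous.setdefault(node, set()).add(leader)
    return None


log = [(1, "p1", "p3"), (2, "p1", "p2"), (3, "p1", "p1")]
print(local_accuracy_violation(log, {"p3": 1.5, "p2": 2.5}))  # None
print(local_accuracy_violation(log, {"p3": 1.5}))             # (3, 'p1', 'p1')
```

The same sequence of switches is legal when every abandoned leader has already crashed, and illegal as soon as one of them (here p2) is still alive.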

Interface of Leader Election
Module:
Name: LeaderElection, instance le
Events:
Indication: <le, Leader | pi>
Indicates that the leader is node pi
Properties:
LE1 (eventual completeness). Eventually every correct node trusts some correct node
LE2 (agreement). No two correct nodes trust different correct nodes
LE3 (local accuracy). If a node is elected leader by pi, all previously elected leaders by pi have crashed

Implementing LE
Globally rank all nodes
E.g., rank ordering p1>p2>p3>p4
Represented by the function rank(process)=rank_int
E.g., rank(p1)=4, rank(p2)=3, rank(p3)=2, rank(p4)=1
maxrank(S) = {return the highest-ranked node in S}
$\mathrm{maxrank}(S) = \arg\max_{p \in S} \mathrm{rank}(p)$

Implementing LE (2)
Implements: LeaderElection, instance le.
Uses: PerfectFailureDetector, instance P.
(Π is the set of all nodes)
upon event <le, Init> do
  suspected := ∅; leader := ⊥
upon event <P, Crash | pi> do
  suspected := suspected ∪ {pi}
upon exists leader ≠ maxrank(Π − suspected) do
  leader := maxrank(Π − suspected)
  trigger <le, Leader | leader>
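
A runnable Python sketch of the LE algorithm above, assuming ranks are given as a dictionary (higher value means higher rank) and that P's crash indications arrive as `on_crash` calls; the class and method names are hypothetical. The guarded "upon exists" rule is re-evaluated after init and after every crash.

```python
class LeaderElection:
    """Hypothetical sketch of the slide's LE: trust maxrank(Π − suspected)."""

    def __init__(self, nodes, rank):
        self.nodes = set(nodes)   # Π
        self.rank = rank          # node -> int; a higher value means a higher rank
        self.suspected = set()
        self.leader = None        # ⊥
        self.indications = []     # every <le, Leader | p> triggered so far
        self._update_leader()     # the guarded rule fires right after init

    def maxrank(self, candidates):
        return max(candidates, key=lambda p: self.rank[p])

    def on_crash(self, p):
        # upon event <P, Crash | p>: P is perfect, so p really crashed.
        self.suspected.add(p)
        self._update_leader()

    def _update_leader(self):
        # upon exists leader ≠ maxrank(Π − suspected)
        alive = self.nodes - self.suspected
        if alive and self.leader != self.maxrank(alive):
            self.leader = self.maxrank(alive)
            self.indications.append(self.leader)


le = LeaderElection(["p1", "p2", "p3", "p4"], {"p1": 4, "p2": 3, "p3": 2, "p4": 1})
for crashed_node in ["p1", "p3", "p2"]:
    le.on_crash(crashed_node)
print(le.indications)  # ['p1', 'p2', 'p4']
```

The crash of p3 triggers no indication because p2 still outranks every other live node. Since P never reports a live node, every switch abandons only crashed leaders, which is exactly local accuracy.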

Matching Ω and ◊P
◊P weakens P by only providing eventual accuracy
Weaken LE to Ω by only guaranteeing eventual agreement
LE Properties:
LE1 (eventual completeness). Eventually every correct node trusts some correct node
LE2 (agreement, made eventual for Ω). No two correct nodes trust different correct nodes
LE3 (local accuracy). If a node is elected leader by pi, all previously elected leaders by pi have crashed

Interface of Eventual Leader Election
Module:
Name: EventualLeaderElection, instance Ω
Events:
Indication: <Ω, Leader | pi>
Notifies that pi is trusted to be leader
Properties:
ELD1 (eventual completeness). Eventually every correct node trusts some correct node
ELD2 (eventual agreement). Eventually no two correct nodes trust different correct nodes

Eventual Leader Detection Ω
In the crash-stop process abstraction, Ω is obtained directly from ◊P
Each node trusts the node with the highest id among all nodes not suspected by ◊P
Eventually, exactly one correct process will be trusted by all correct processes

Implementing Ω
Implements: EventualLeaderElection, instance Ω.
Uses: EventuallyPerfectFailureDetector, instance ep.
upon event <Ω, Init> do
  suspected := ∅; leader := pn;
  trigger <Ω, Leader | leader>
upon event <ep, Suspect | pi> do
  suspected := suspected ∪ {pi}
upon event <ep, Restore | pi> do
  suspected := suspected \ {pi}
upon exists leader ≠ maxrank(Π − suspected) do
  leader := maxrank(Π − suspected)
  trigger <Ω, Leader | leader>
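
A Python sketch of the Ω algorithm above, assuming ranks are a dictionary (higher is better) and ◊P's indications arrive as `on_suspect` / `on_restore` calls; the `Omega` class is a hypothetical name. Following the slide, the initial leader is pn, taken here as the lowest-ranked node.

```python
class Omega:
    """Hypothetical sketch of the slide's Ω on top of ◊P's suspect/restore events."""

    def __init__(self, nodes, rank):
        self.nodes = set(nodes)   # Π
        self.rank = rank          # node -> int; a higher value means a higher rank
        self.suspected = set()
        # Slide: leader := pn, i.e. the lowest-ranked node under p1 > ... > pn.
        self.leader = min(self.nodes, key=lambda p: rank[p])
        self.indications = [self.leader]
        self._update_leader()

    def on_suspect(self, p):      # upon event <ep, Suspect | p>
        self.suspected.add(p)
        self._update_leader()

    def on_restore(self, p):      # upon event <ep, Restore | p>
        self.suspected.discard(p)
        self._update_leader()

    def _update_leader(self):
        # upon exists leader ≠ maxrank(Π − suspected)
        alive = self.nodes - self.suspected
        if not alive:
            return
        best = max(alive, key=lambda p: self.rank[p])
        if best != self.leader:
            self.leader = best
            self.indications.append(best)


omega = Omega(["p1", "p2", "p3"], {"p1": 3, "p2": 2, "p3": 1})
omega.on_suspect("p1")   # ◊P wrongly suspects p1 (e.g. a slow link)
omega.on_restore("p1")   # ... and later revises the suspicion
print(omega.indications)  # ['p3', 'p1', 'p2', 'p1']
```

A restore can hand leadership back to a node that was abandoned, which is why Ω has no local-accuracy property; once ◊P stops making mistakes, the output stops changing.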

Ω for Crash Recovery
Can we elect a recovered node? [d]
Not if it keeps crash-recovering infinitely often!
Basic idea
Count the number of times you’ve crashed (epoch)
Distribute your epoch periodically to all nodes
Elect the leader with the lowest (epoch, node_id)
Implementation
Similar to ◊P and Ω for crash-stop
Piggyback epoch with heartbeats
Store and load leader upon crash
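
The epoch rule can be sketched in a few lines. Assumptions: a plain dictionary stands in for stable storage that survives crashes, node ids are compared as strings (fine for p1 to p9), and `RecoveringNode` and `elect` are hypothetical names. A node that keeps crashing keeps raising its epoch, so it eventually loses to any node that stops crashing.

```python
class RecoveringNode:
    """Hypothetical node whose crash counter (epoch) survives crashes."""

    def __init__(self, node_id, storage):
        self.node_id = node_id
        self.storage = storage             # stands in for stable storage
        storage.setdefault(node_id, 0)

    def recover(self):
        # Called after every crash-recovery: one more crash survived.
        self.storage[self.node_id] += 1

    def heartbeat(self):
        # Epoch piggybacked on the periodic heartbeat.
        return self.node_id, self.storage[self.node_id]


def elect(heard):
    """Leader = node with the lowest (epoch, node_id) among recent heartbeats."""
    return min((epoch, node) for node, epoch in heard.items())[1]


storage = {}
nodes = [RecoveringNode(n, storage) for n in ("p1", "p2", "p3")]
for _ in range(3):
    nodes[0].recover()  # p1 keeps crashing and recovering
nodes[1].recover()      # p2 crashed once
heard = dict(n.heartbeat() for n in nodes)
print(heard, elect(heard))  # {'p1': 3, 'p2': 1, 'p3': 0} p3
```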

Byzantine Leader Election
Processes observe the leader’s behavior
Resilience is f Byzantine nodes, N > 3f
Trigger <bld, Complain | p> if leader p doesn’t adhere to the specification, i.e., it is Byzantine faulty
Function leader(round) rotates through nodes
1→p1, 2→p2, …, N→pN, N+1→p1, N+2→p2, …

Interface of Byzantine ELE
Module:
Name: ByzantineLeaderElection, instance bld
Events:
Indication: <bld, Trust | p>
Notifies that p is trusted to be leader
Request: <bld, Complain | p>
Complains that leader p isn’t trusted
Properties:
Eventual Succession. If more than f correct nodes complain about p, then every correct node eventually trusts some leader other than p
Putsch Resistance. A correct node does not trust a new leader unless at least one correct node complained about the current one
Eventual Agreement. Eventually no two correct nodes trust different nodes

Rotating Byzantine LE
Implements: ByzantineLeaderElection, instance bld.
Uses: AuthPerfectPointToPointLinks, instance al.
(Π is the set of all nodes)
upon event <bld, Init> do
  round := 1; complList := [nil]*N; compl := false;
  trigger <bld, Trust | leader(round)>
upon event <bld, Complain | p> s.t. p = leader(round) and compl = false do
  // If p is suspected of being faulty, broadcast Complaint
  compl := true
  forall q ∈ Π do
    trigger <al, Send | q, [Complaint, round]>

Rotating Byzantine LE (2)
Implements: ByzantineLeaderElection, instance bld.
Uses: AuthPerfectPointToPointLinks, instance al.
(#(complList) counts the non-nil entries of complList)
upon event <al, Deliver | p, [Complaint, r]> s.t. r = round and complList[p] = nil do
  complList[p] := Complaint
  if #(complList) > f and compl = false then
    // If more than f complain, then broadcast Complaint
    compl := true
    forall q ∈ Π do
      trigger <al, Send | q, [Complaint, round]>
  else if #(complList) > 2f then
    // If more than 2f complain, then switch leader
    round := round + 1
    complList := [nil]*N
    compl := false
    trigger <bld, Trust | leader(round)>
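
A compact Python sketch of the rotating Byzantine leader election above. Assumptions: nodes are numbered 1..N, authenticated links are modeled by a hypothetical in-memory FIFO `Network` (messages addressed to the Byzantine node are dropped), and `complList` is modeled as a set of senders whose size plays the role of #(complList). The run uses N = 4 and f = 1 with node 4 Byzantine.

```python
from collections import deque


class RotatingBLE:
    """Hypothetical sketch of the rotating Byzantine LE at one correct node."""

    def __init__(self, me, n, f, send_all):
        self.me, self.n, self.f = me, n, f
        self.send_all = send_all        # authenticated broadcast to all N nodes
        self.round = 1
        self.compl_list = set()         # senders of Complaint in this round
        self.compl = False
        self.trusted = [self.leader(self.round)]   # every <bld, Trust | p> so far

    def leader(self, r):
        return (r - 1) % self.n + 1     # rounds 1..N -> p1..pN, then wrap around

    def complain(self, p):
        # upon <bld, Complain | p> s.t. p = leader(round) and compl = false
        if p == self.leader(self.round) and not self.compl:
            self.compl = True
            self.send_all(self.me, self.round)

    def deliver(self, sender, r):
        # upon <al, Deliver | sender, [Complaint, r]> s.t. r = round, first from sender
        if r != self.round or sender in self.compl_list:
            return
        self.compl_list.add(sender)
        if len(self.compl_list) > self.f and not self.compl:
            self.compl = True           # more than f complaints: relay
            self.send_all(self.me, self.round)
        elif len(self.compl_list) > 2 * self.f:
            self.round += 1             # more than 2f complaints: switch leader
            self.compl_list = set()
            self.compl = False
            self.trusted.append(self.leader(self.round))


class Network:
    """Hypothetical in-memory FIFO links; messages to Byzantine nodes are dropped."""

    def __init__(self, n):
        self.n, self.nodes, self.queue = n, {}, deque()

    def send(self, to, sender, r):
        self.queue.append((to, sender, r))

    def send_all(self, sender, r):
        for q in range(1, self.n + 1):
            self.send(q, sender, r)

    def run(self):
        while self.queue:
            to, sender, r = self.queue.popleft()
            if to in self.nodes:        # only correct nodes run the algorithm
                self.nodes[to].deliver(sender, r)


net = Network(4)                        # N = 4, f = 1, node 4 is Byzantine
for i in (1, 2, 3):
    net.nodes[i] = RotatingBLE(i, 4, 1, net.send_all)

net.send(1, 4, 1)                       # p4 complains about p1, but only to p1
net.run()
before = [net.nodes[i].trusted[-1] for i in (1, 2, 3)]
net.nodes[1].complain(1)                # two (> f) correct nodes complain
net.nodes[2].complain(1)
net.run()
after = [net.nodes[i].trusted[-1] for i in (1, 2, 3)]
print(before, after)                    # [1, 1, 1] [2, 2, 2]
```

The lone Byzantine complaint changes nothing (putsch resistance), while complaints from two correct nodes push node 3 past f, it relays, and every correct node then sees more than 2f complaints and moves to leader(2) (eventual succession).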

Correctness of Byzantine ELE
Why not switch leader as soon as more than f complaints are received? [d]
Byzantine nodes send complaints to only 1 node
Thus, one correct node receives f+1 complaints
All other correct nodes receive 1 complaint
Only one correct node would switch leader
Eventual succession
When a correct node has received f+1 complaints, it relays (broadcasts) its complaint
Every correct node will get > 2f complaints
A new leader will be elected

Correctness of Byzantine ELE (2)
Putsch resistance
f Byzantine nodes cannot generate the f+1 complaints required for relaying and for a leader switch to happen
Eventual agreement
Correct nodes move through rounds synchronously, as a node moves to a new round only when it has ensured that every other correct node will move too (by relaying)
Eventually all faulty (Byzantine) nodes are detected

Reductions

Reductions
We say X≼Y if X can be solved given a solution of Y
Read: X is reducible to Y
Informally, problem X is weaker than or as hard as Y

Preorders, partial orders…
A relation ~ is a preorder on a set A if for any x, y, z in A:
x~x (reflexivity)
x~y and y~z implies x~z (transitivity)
Difference between preorder and partial order
A partial order is a preorder with antisymmetry: x~y and y~x implies x=y
I.e., two different objects x and y cannot be symmetric: it isn’t possible that x~y and y~x for two different x and y

≼ is a preorder
Reflexivity. X≼X
X can be solved given a solution to X
Transitivity. X≼Y and Y≼Z implies X≼Z
Since Y≼Z, use the impl. of Z to impl. Y
Since X≼Y, use that impl. of Y to impl. X. Hence we impl. X from Z’s impl.
≼ is not antisymmetric, thus not a partial order
Two different X and Y can be equivalent
Distinct problems X and Y can each be solved from the other’s solution

Shortcut definitions
We write X≃Y if X≼Y and Y≼X
Problem X is equivalent to Y
We write X≺Y if X≼Y and not X≃Y, or equivalently, X≼Y and not Y≼X
Problem X is strictly weaker than Y, or
Problem Y is strictly stronger than X

Example
It is true that ◊P≼P
Given P, we can implement ◊P
We just return P’s suspicions
P always satisfies ◊P’s properties
In fact, ◊P≺P in the asynchronous model
Because “not P≼◊P” is true
Reductions are common in computability theory
If X≼Y, and if we know X is impossible to solve, then Y is impossible to solve too
If P≼◊P, and some problem Z can be solved with P, then Z can also be solved with ◊P
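
The reduction ◊P≼P can be made concrete as a wrapper. In the sketch below, ◊P's <ep, Suspect | p> indication is modeled as a hypothetical callback, and `EventuallyPerfectFromP` is an illustrative name: every P indication becomes a ◊P suspicion, and because P is never wrong, no restore is ever issued.

```python
class EventuallyPerfectFromP:
    """Hypothetical wrapper: ◊P built from P by passing crash indications through."""

    def __init__(self, indicate_suspect):
        self.indicate_suspect = indicate_suspect  # callback for <ep, Suspect | p>
        self.suspected = set()

    def on_crash(self, p):
        # upon <P, Crash | p>: P never suspects a live node, so the suspicion
        # is final and <ep, Restore | p> is never needed.
        if p not in self.suspected:
            self.suspected.add(p)
            self.indicate_suspect(p)


indications = []
ep = EventuallyPerfectFromP(indications.append)
ep.on_crash("p2")
ep.on_crash("p2")        # a repeated notification is filtered out
print(indications)        # ['p2']
```

No wrapper in the opposite direction can exist in the asynchronous model: ◊P may retract a suspicion at any time, so its output never lets a node conclude that a crash is permanent. That is the content of ◊P≺P.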

Weakest FD for a problem?
Often P is used to solve problem X
But P is not very practical (needs synchrony)
Is X a “practically” solvable problem?
Can we implement X with ◊P?
Sometimes an FD weaker than P will not solve X
Proven using reductions

Weakest FD for a problem
We might know that X≼P (X solvable with P)
Can we solve X with an FD weaker than P, say ◊P?
Or is it impossible, i.e., ◊P≺X?
Common proof to show P is the weakest FD for X
Prove that P≼X, i.e., P can be solved given X
If P≼X then ◊P≺X
Because we know ◊P≺P and P≃X, i.e., ◊P≺P≃X
If we could solve X with ◊P, then we could solve P with ◊P, which is a contradiction

How are the detectors related

Trivial Reductions
Strongly complete detectors
◊P≼P: P is always strongly accurate, thus also eventually strongly accurate
S≼P: P is always strongly accurate, thus also always weakly accurate
◊S≼S: S is always weakly accurate, thus also eventually weakly accurate
◊S≼◊P: ◊P is always eventually strongly accurate, thus also always eventually weakly accurate
[Figure: diagram of the reductions among P, S, ◊P and ◊S]

Trivial Reductions (2)
Weakly complete detectors
◊Q≼Q: Q is always strongly accurate, thus also eventually strongly accurate
W≼Q: Q is always strongly accurate, thus also always weakly accurate
◊W≼W: W is always weakly accurate, thus also eventually weakly accurate
◊W≼◊Q: ◊Q is always eventually strongly accurate, thus also always eventually weakly accurate
[Figure: diagram of the reductions among Q, W, ◊Q and ◊W]

Completeness “Irrelevant”
Weak completeness is trivially reducible to strong
Strong completeness is reducible to weak, i.e., we can get strong completeness from weak
P≼Q, S≼W, ◊P≼◊Q, ◊S≼◊W
They’re equivalent!
P≃Q, S≃W, ◊P≃◊Q, ◊S≃◊W

Completeness \ Accuracy | Strong | Weak | Eventual Strong | Eventual Weak
Strong                  | P      | S    | ◊P              | ◊S
Weak                    | Q      | W    | ◊Q              | ◊W

Proving Irrelevance of Completeness
Weak completeness ensures every crash is eventually detected by some correct node
Simple idea
Every node q broadcasts its suspicions Susp periodically (this also works like a heartbeat)
upon event receive <S, q> do
  Susp := (Susp ∪ S) − {q}
Every crash is eventually detected by all correct p
Can this violate some accuracy properties?
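
A Python sketch of the transformation above. One assumption goes beyond the slide: following the classic Chandra–Toueg reduction, what each node broadcasts is its underlying detector's current suspicion set S. `CompletenessBooster`, `gossip_round` and the `local` table are hypothetical names. After one round, p2 suspects the crashed p3 even though only p1's weakly complete detector noticed the crash.

```python
class CompletenessBooster:
    """Hypothetical sketch: strong completeness from a weakly complete detector."""

    def __init__(self, me, local_fd):
        self.me = me
        self.local_fd = local_fd   # callable returning the underlying detector's suspicions
        self.susp = set()          # output of the transformed detector

    def on_receive(self, s, q):
        # upon receive <S, q>: q just sent a message, so it is alive; drop it.
        self.susp = (self.susp | s) - {q}


def gossip_round(nodes, alive):
    # Every alive node broadcasts <local suspicions, itself> to every alive node.
    msgs = [(nodes[q].local_fd(), q) for q in sorted(alive)]
    for p in sorted(alive):
        for s, q in msgs:
            nodes[p].on_receive(s, q)


# Hypothetical weakly complete detector: p3 crashed, but only p1 detects it.
local = {"p1": {"p3"}, "p2": set()}
nodes = {p: CompletenessBooster(p, lambda p=p: local[p]) for p in local}
gossip_round(nodes, alive={"p1", "p2"})
print({p: sorted(n.susp) for p, n in nodes.items()})  # {'p1': ['p3'], 'p2': ['p3']}
```

The `- {q}` step is what keeps accuracy intact: any node that is still sending is removed from the suspicion set on every message it sends.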

Maintaining Accuracy
Strong and Weak Accuracy aren’t violated
Strong accuracy
No one is ever inaccurate
Our reduction never spreads inaccurate suspicions
Weak accuracy
Everyone is accurate about at least one node p
No one will spread inaccurate information about p

Maintaining Eventual Accuracy
Eventual Strong and Eventual Weak Accuracy aren’t violated
Proof is almost the same as on the previous slide
Eventually all faulty nodes crash
Inaccurate suspicions are undone
A node will get heartbeats (suspicion messages) from correct nodes and revise its suspicions (−{q})

Relation between FDs
[Figure: relations among Q, P, ◊Q, ◊P, W, S, ◊W and ◊S; legend: “equivalent”, “reducible to”]

Omega is also an FD
Can we implement ◊S with Ω? [d]
I.e., is it true that ◊S≼Ω?
Suspect all nodes except the leader given by Ω
Eventual completeness
All nodes are suspected except the leader (which is correct)
Eventual weak accuracy
Eventually, one correct node (the leader) is not suspected by anyone
Thus, ◊S≼Ω
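
The construction above fits in one function. In the sketch below, `suspicions_from_omega` is a hypothetical name and the per-step Ω outputs are invented for illustration. Every node other than the trusted leader is suspected, so crashed nodes are covered; once Ω settles on the same correct leader everywhere, that node is suspected by no one.

```python
def suspicions_from_omega(nodes: set[str], leader: str) -> set[str]:
    """Hypothetical sketch of ◊S from Ω: suspect every node except Ω's leader."""
    return nodes - {leader}


nodes = {"p1", "p2", "p3"}
# Invented Ω outputs at observers p1 and p2; from step 1 on they agree on p2.
leaders_over_time = [
    {"p1": "p3", "p2": "p1"},
    {"p1": "p2", "p2": "p2"},
    {"p1": "p2", "p2": "p2"},
]
for step, leaders in enumerate(leaders_over_time):
    out = {obs: sorted(suspicions_from_omega(nodes, ld)) for obs, ld in leaders.items()}
    print(step, out)
```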

Ω equivalent to ◊S (and ◊W)
We showed ◊S≼Ω; it turns out we also have Ω≼◊S
I.e., Ω≃◊S
Due to the famous CHT result
If consensus is implementable with detector D, then Omega can be implemented using D
I.e., if Consensus≼D, then Ω≼D
Since ◊S can be used to solve consensus, we have Ω≼◊S
Implies ◊W≃Ω≃◊S is the weakest detector to solve consensus

Relation between FDs (2)
[Figure: relations among Q, P, ◊Q, ◊P, W, S, ◊W and ◊S; legend: “equivalent”, “reducible to”]

Combining Abstractions
Fail-stop (synchronous)
Crash-stop process model
Perfect links + Perfect failure detector (P)
Fail-silent (asynchronous)
Crash-stop process model
Perfect links
Fail-noisy (partially synchronous)
Crash-stop process model
Perfect links + Eventually Perfect failure detector (◊P)
Fail-recovery
Crash-recovery process model
Stubborn links + …

The rest of book/course
Assume a crash-stop system with a perfect failure detector (fail-stop)
Give algorithms
Try to make a weaker assumption
Revisit the algorithms