Failure Detectors
Ali Ghodsi – UC Berkeley / KTH
alig(at)cs.berkeley.edu
Modeling Timing Assumptions

  Tedious to model eventual synchrony (partial synchrony)
  Timing assumptions mostly needed to detect failures
    Heartbeats, timeouts, etc.
  Use failure detectors to encapsulate timing assumptions
    Black box giving suspicions regarding node failures
    Accuracy of suspicions depends on model strength

11/7/2015

Implementation of Failure Detectors

  Typical Implementation
    Periodically exchange heartbeat messages
    Timeout based on some worst-case message round trip
    If timeout, then suspect node
    If a message is received from a suspected node, revise the suspicion and increase the timeout

Completeness and Accuracy

  Two important types of requirements
  1. Completeness requirements
    Requirements regarding actually crashed nodes
    When do they have to be detected?
  2. Accuracy requirements
    Requirements regarding actually alive nodes
    When are they allowed to be suspected?
  How to trivially achieve either? [d]
  Together they are impossible in an asynchronous system!

Formal Model of FD

  Augment formal model with failure detectors (FD)
  A configuration consists of
    State of each node
    FD-state of each node
  Transition function on node i gets extra parameter:
    FD-state of node i
  FD-state updated in comp(i) by another function
    FD-function
    Not modeled explicitly, but must satisfy some properties

Requirements: Completeness

  Strong Completeness
    Every crashed node is eventually detected by all correct nodes
    There exists a time after which all crashed nodes are detected by all correct nodes
  The book only studies detectors with this property
  Is it realistic? [d]

Requirements: Completeness

  Weak Completeness
    Every crashed node is eventually detected by some correct node
    There exists a time after which all crashed nodes are detected by some correct node
      Possibly detected by different correct nodes

Requirements: Accuracy

  Strong Accuracy
    No correct node is ever suspected
    For all nodes p and q,
      p does not suspect q, unless q has crashed
  Is it realistic? [d]
    Strong assumption, requires synchrony
    I.e. no premature timeouts

Requirements: Accuracy

  Weak Accuracy
    There exists a correct node which is never suspected by any node
    There exists at least one correct node p
      No node ever suspects p
  Still a strong assumption
    One node is always "well-connected"

Requirements: Eventual Accuracy

  Eventual Strong Accuracy
    After some finite time the FD provides strong accuracy
  Eventual Weak Accuracy
    After some finite time the detector provides weak accuracy
  After some time, the requirements are fulfilled
    Prior to that, any behavior is possible!
  Quite weak assumptions [d]
    When can eventual weak accuracy be achieved?

Four Main Established Detectors

  Four detectors with strong completeness
  Synchronous Systems
    Perfect Detector (P): Strong Accuracy
    Strong Detector (S): Weak Accuracy
  Asynchronous Systems
    Eventually Perfect Detector (◊P): Eventual Strong Accuracy
    Eventually Strong Detector (◊S or Ω): Eventual Weak Accuracy

Four Less Interesting Detectors

  Four detectors with weak completeness
  Synchronous Systems
    Detector Q: Strong Accuracy
    Weak Detector (W): Weak Accuracy
  Asynchronous Systems
    Eventually Detector Q (◊Q): Eventual Strong Accuracy
    Eventually Weak Detector (◊W): Eventual Weak Accuracy

Interface of Perfect Failure Detector

  Module:
    Name: PerfectFailureDetector, instance p
  Events:
    Indication: <p, crash | pi>
      Notifies that node pi has crashed
  Properties:
    PFD1 (strong completeness)
    PFD2 (strong accuracy)

Properties of P

  Properties:
    PFD1 (strong completeness)
      Eventually every node that crashes is permanently detected by every correct node (liveness)
    PFD2 (strong accuracy)
      If a node p is detected by any node, then p has crashed (safety)
  Safety or Liveness?

Implementing P in Synchrony

  Assume synchronous system
    Max transmission delay between 0 and δ time units
  Each node every γ time units
    Send <heartbeat> to all nodes
  Each node waits γ+δ time units
    If did not get <heartbeat> from pi
      Detect <p, crash | pi>

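
The detection rule on this slide can be sketched as a small function. This is only an illustration under assumptions: the name `perfect_fd_detect`, the dictionary of last heartbeat arrival times, and the numeric values are hypothetical, and the bounds are written as `gamma` (heartbeat period) and `delta` (max delay).

```python
def perfect_fd_detect(last_heartbeat, now, gamma, delta):
    """Return the nodes to report as crashed at local time `now`.

    last_heartbeat maps a node id to the time its latest heartbeat arrived.
    A node is detected once more than gamma + delta time units have passed
    without a heartbeat from it (hypothetical helper, not from the slides).
    """
    return {node for node, t in last_heartbeat.items() if now - t > gamma + delta}


# Assumed example: p1 is alive, p2 stopped sending after time 0.
gamma, delta = 2, 1
last_heartbeat = {"p1": 9, "p2": 0}
crashed = perfect_fd_detect(last_heartbeat, now=10, gamma=gamma, delta=delta)
```

With these assumed values only p2 exceeds the gamma + delta window, so only p2 is reported.
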
Correctness of P

  PFD1 (strong completeness)
    A crashed node doesn't send <heartbeat>
    Eventually every node will notice the absence of <heartbeat>
  PFD2 (strong accuracy)
    Assuming local computation is negligible
    Maximum time between 2 heartbeats
      γ+δ time units
    If alive, all nodes will recv hb in time
      No inaccuracy

  [Figure: timeline of heartbeats from pi to pj, with the max delay marked]

Interface of EPFD

  Module:
    Name: EventuallyPerfectFailureDetector, instance ep
  Events:
    Indication: <ep, suspect | pi>
      Notifies that node pi is suspected to have crashed
    Indication: <ep, restore | pi>
      Notifies that node pi is not suspected anymore
  Properties:
    EPFD1 (strong completeness)
    EPFD2 (eventual strong accuracy). Eventually, no correct node is suspected by any correct node.

Implementing ◊P

  Assume partially synchronous system
    Eventually some bounds exist
  Every γ time units at each node:
    Send <heartbeat> to all nodes
  Each node waits T time units
    If did not get <heartbeat> from pi
      Indicate <ep, suspect | pi> if pi is not in suspected
      Put pi in suspected set
    If get HB from pi, and pi is in suspected
      Indicate <ep, restore | pi> and remove pi from suspected
      Increase timeout T

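
The suspect/restore logic with a growing timeout can be sketched as follows. This is a minimal sketch, not the slides' implementation: the class name, the `increment` parameter, and the way heartbeats are collected between timeouts are assumptions, and real timers and messaging are left out.

```python
class EventuallyPerfectFD:
    """Sketch of an eventually perfect failure detector (names are hypothetical)."""

    def __init__(self, nodes, timeout, increment=1):
        self.nodes = set(nodes)
        self.timeout = timeout      # current timeout T
        self.increment = increment  # assumed amount by which T grows
        self.suspected = set()
        self.heard = set()          # nodes heard from since the last timeout
        self.events = []            # emitted (suspect | restore, node) indications

    def on_heartbeat(self, node):
        self.heard.add(node)

    def on_timeout(self):
        """Run once every T time units."""
        for node in sorted(self.nodes):
            if node not in self.heard and node not in self.suspected:
                self.suspected.add(node)
                self.events.append(("suspect", node))
            elif node in self.heard and node in self.suspected:
                # Premature suspicion: restore the node and be more patient.
                self.suspected.remove(node)
                self.events.append(("restore", node))
                self.timeout += self.increment
        self.heard.clear()


fd = EventuallyPerfectFD(["p1", "p2"], timeout=3)
fd.on_heartbeat("p1")
fd.on_timeout()            # p2 was silent, so it becomes suspected
fd.on_heartbeat("p1")
fd.on_heartbeat("p2")
fd.on_timeout()            # p2 was only slow: restore it and increase T
```
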
Correctness of ◊P

  EPFD1 (strong completeness)
    Same as before
  EPFD2 (eventual strong accuracy)
    Each time p is inaccurately suspected by a correct q
      timeout T is increased at q
    Eventually system becomes synchronous, and T becomes larger than the unknown bound (T > γ + δ)
    q will receive HB on time, and never suspect p again

Leader Election
Leader Election vs Failure Detection

  Failure detection captures failure behavior
    Detect failed nodes
  Leader election (LE) also captures failure behavior
    Detect correct nodes (a single & same for all)
  Formally, leader election is an FD
    Always suspects all nodes except one (leader)
    Ensures some properties regarding that node

Leader Election vs Failure Detection

  We'll define two leader election algorithms
    Leader election (LE) which "matches" P
    Eventual leader election (Ω) which "matches" ◊P

Matching LE and P

  P's properties
    P always eventually detects failures (strong completeness)
    P never suspects correct nodes (strong accuracy)
  Completeness of LE
    Informally: eventually ditch crashed leaders
    Formally: eventually every correct node trusts some correct node
  Accuracy of LE
    Informally: never ditch a correct leader
    Formally: No two correct nodes trust different correct nodes
      Is this really accuracy? [d]
      Yes! Assume two nodes trust different correct nodes
        One of them must eventually switch, i.e. leaving a correct node

LE desirable properties

  LE always eventually detects failures
    Eventually every correct node trusts some correct node
  LE is always accurate
    No two correct nodes trust different correct nodes
  But the above two permit the following

  [Figure: timelines of p1, p2 and p3 with a sequence of elect events (elect p3, elect p1, elect p2, elect p1, elect p3, elect p1, elect p3)]

  But p1 is "inaccurately" leaving a correct leader

LE desirable properties

  To avoid "inaccuracy" we add
    Local Accuracy:
      If a node is elected leader by pi, all previously elected leaders by pi have crashed

  [Figure: the same timelines of p1, p2 and p3 (elect p3, elect p1, elect p2, elect p1, elect p3, elect p1, elect p3), annotated "Not allowed, as p1 is correct"]

Interface of Leader Election

  Module:
    Name: LeaderElection, instance le
  Events:
    Indication: <le, Leader | pi>
      Indicate that the leader is node pi
  Properties:
    LE1 (eventual completeness). Eventually every correct node trusts some correct node
    LE2 (agreement). No two correct nodes trust different correct nodes
    LE3 (local accuracy). If a node is elected leader by pi, all previously elected leaders by pi have crashed

Implementing LE

  Globally rank all nodes
    e.g. rank ordering p1>p2>p3>p4
    represented by function rank(process)=rank_int
      e.g. rank(p1)=4, rank(p2)=3, rank(p3)=2, rank(p4)=1
  maxrank(S)={return highest ranked node in S}
    maxrank(S) = arg max_{p∈S} {rank(p)}

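
The rank function and maxrank from this slide translate almost directly into code. A small sketch, assuming node ids are strings and the ranking is stored in a dictionary (both choices are illustrative):

```python
# Rank ordering from the slide's example: p1 > p2 > p3 > p4.
rank = {"p1": 4, "p2": 3, "p3": 2, "p4": 1}


def maxrank(nodes):
    """Return the highest ranked node in the non-empty collection `nodes`."""
    return max(nodes, key=lambda p: rank[p])


best_of_all = maxrank(rank)                   # over all nodes
best_without_p1 = maxrank({"p2", "p3", "p4"})  # p1 excluded
```
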
Implementing LE (2)

  Implements: LeaderElection, instance le.
  Uses: PerfectFailureDetector, instance P.

  upon event <le, init> do
    suspected := ∅; leader := ⊥
  upon event <P, crash | pi> do
    suspected := suspected ∪ {pi}
  upon leader ≠ maxrank(Π \ suspected) do
    leader := maxrank(Π \ suspected)
    trigger <le, Leader | leader>

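
The event handlers above can be sketched as a small class that consumes crash indications. This is an illustrative sketch only: the class layout, the `elected` log standing in for the Leader indication, and the rank dictionary are assumptions, and the perfect failure detector is replaced by direct calls to `on_crash`.

```python
class LeaderElection:
    """Sketch of leader election driven by crash indications (hypothetical names)."""

    def __init__(self, rank):
        self.rank = dict(rank)   # node id -> rank, higher wins
        self.suspected = set()
        self.leader = None
        self.elected = []        # stands in for the Leader indications
        self._check()

    def on_crash(self, node):
        # Called when the perfect failure detector reports that `node` crashed.
        self.suspected.add(node)
        self._check()

    def _check(self):
        alive = set(self.rank) - self.suspected
        if not alive:
            return
        best = max(alive, key=self.rank.get)
        if best != self.leader:
            self.leader = best
            self.elected.append(best)


le = LeaderElection({"p1": 3, "p2": 2, "p3": 1})
le.on_crash("p1")   # leader moves from p1 to p2
le.on_crash("p3")   # p2 is still the highest ranked alive node
```

Because a leader is only replaced after the detector reports it as crashed, and the detector is assumed perfect, each new leader in this sketch follows the crash of all earlier ones.
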
Matching Ω and ◊P

  ◊P weakens P by only providing eventual accuracy
  Weaken LE to Ω by only guaranteeing eventual agreement
  LE Properties (slide annotation: "eventual" added to agreement):
    LE1 (eventual completeness). Eventually every correct node trusts some correct node
    LE2 (agreement). No two correct nodes trust different correct nodes
    LE3 (local accuracy). If a node is elected leader by pi, all previously elected leaders by pi have crashed

Interface of Eventual Leader Election

  Module:
    Name: EventualLeaderElection, instance Ω
  Events:
    Indication: <Ω, Leader | pi>
      Notify that pi is trusted to be leader
  Properties:
    ELD1 (eventual completeness). Eventually every correct node trusts some correct node
    ELD2 (eventual agreement). Eventually no two correct nodes trust different correct nodes

Eventual Leader Detection Ω

  In crash-stop process abstraction
    Ω is obtained directly from ◊P
      Each node trusts the node with highest id among all nodes not suspected by ◊P
      Eventually, exactly one correct process will be trusted by all correct processes

Implementing Ω

  Implements: EventualLeaderElection, instance Ω.
  Uses: EventuallyPerfectFailureDetector, instance ep.

  upon event <Ω, init> do
    suspected := ∅; leader := pn;
    trigger <Ω, Leader | leader>
  upon event <ep, suspect | pi> do
    suspected := suspected ∪ {pi}
  upon event <ep, restore | pi> do
    suspected := suspected \ {pi}
  upon leader ≠ maxrank(Π \ suspected) do
    leader := maxrank(Π \ suspected)
    trigger <Ω, Leader | leader>

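
A sketch of the same handlers in code, showing that a restore can make a node switch back to an earlier leader (which is why only eventual agreement is promised). The class and method names are hypothetical, and unlike the slide, which initializes the leader to pn, this sketch simply starts with the highest ranked node.

```python
class EventualLeaderElection:
    """Sketch of an eventual leader elector on top of suspect/restore events."""

    def __init__(self, rank):
        self.rank = dict(rank)   # node id -> rank, higher wins (assumed)
        self.suspected = set()
        self.leader = None
        self.trusted = []        # stands in for the Leader indications
        self._check()

    def on_suspect(self, node):
        self.suspected.add(node)
        self._check()

    def on_restore(self, node):
        self.suspected.discard(node)
        self._check()

    def _check(self):
        candidates = set(self.rank) - self.suspected
        if not candidates:
            return
        best = max(candidates, key=self.rank.get)
        if best != self.leader:
            self.leader = best
            self.trusted.append(best)


omega = EventualLeaderElection({"p1": 3, "p2": 2, "p3": 1})
omega.on_suspect("p1")   # premature suspicion: trust p2 for a while
omega.on_restore("p1")   # suspicion revised: trust p1 again
```
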
Ω for Crash Recovery

  Can we elect a recovered node? [d]
    Not if it keeps crash-recovering infinitely often!
  Basic idea
    Count number of times you've crashed (epoch)
    Distribute your epoch periodically to all nodes
    Elect leader with lowest (epoch, node_id)
  Implementation
    Similar to ◊P and Ω for crash-stop
    Piggyback epoch with heartbeats
    Store and load leader upon crash

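
The election rule "lowest (epoch, node_id)" can be sketched with tuple comparison. A minimal sketch under assumptions: the function name `elect`, string node ids, and the example epochs are all hypothetical.

```python
def elect(epochs, candidates):
    """Pick the candidate with the lowest (epoch, node_id) pair.

    epochs maps a node id to how many times that node has crashed and recovered.
    Ties on the epoch are broken by the node id.
    """
    return min(candidates, key=lambda node: (epochs[node], node))


# Assumed example: p1 has recovered twice, p2 and p3 have never crashed.
epochs = {"p1": 2, "p2": 0, "p3": 0}
leader_all = elect(epochs, {"p1", "p2", "p3"})
leader_without_p2 = elect(epochs, {"p1", "p3"})
```

A node that keeps crashing and recovering keeps increasing its epoch, so under this rule it eventually loses to any node that stays up.
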
Byzantine Leader Election

  Processes observe the leader's behavior
    Trigger <bld, Complain | p> if leader p doesn't adhere to specification, i.e. it's Byzantine faulty
  Resilience is f Byzantine nodes
    N>3f
  Function leader(round) rotates through nodes
    1→p1, 2→p2, …, N→pN, N+1→p1, N+2→p2, …

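
The rotation 1→p1, 2→p2, …, N→pN, N+1→p1 is a modulo computation over 1-based rounds. A small sketch, assuming the nodes are kept in a list in rotation order (the explicit `nodes` argument is an addition for illustration):

```python
def leader(round_number, nodes):
    """Return the leader of a 1-based round, rotating through `nodes` in order."""
    return nodes[(round_number - 1) % len(nodes)]


nodes = ["p1", "p2", "p3", "p4"]
first_six = [leader(r, nodes) for r in range(1, 7)]
```
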
Interface of Byzantine ELE

  Module:
    Name: ByzantineLeaderElection, instance bld
  Events:
    Indication: <bld, Trust | p>
      Notify that p is trusted to be leader
    Request: <bld, Complain | p>
      Complain that leader p isn't trusted
  Properties:
    Eventual Succession. If more than f correct nodes complain about p then every correct node eventually trusts some other leader than p
    Putsch Resistance. A correct node does not trust a new leader unless at least one correct node complained about the current one
    Eventual Agreement. Eventually no two correct nodes trust different nodes

Rotating Byzantine LE

  Implements: ByzantineLeaderElection, instance bld.
  Uses: AuthPerfectPointToPointLinks, instance al.

  upon event <bld, init> do
    round := 1; complList := [nil]*N; compl := False;
    trigger <bld, trust | leader(round)>
  upon event <bld, Complain | p> s.t. p=leader(round) and compl=False do
    compl := True
    forall q in Π do
      trigger <al, send | q, [Complaint, round]>

  Annotation: if p is suspected of being faulty, broadcast Complaint

Rotating Byzantine LE (2)

  Implements: ByzantineLeaderElection, instance bld.
  Uses: AuthPerfectPointToPointLinks, instance al.

  upon event <al, Deliver | p, [Complaint, r]> s.t. r=round and complList[p]=nil do
    complList[p] := Complaint
    if #(complList) > f and compl = False then
      compl := True
      forall q in Π do
        trigger <al, send | q, [Complaint, round]>
    else if #(complList) > 2f then
      round := round + 1
      complList := [nil]*N
      compl := False
      trigger <bld, trust | leader(round)>

  Annotations: if more than f complain, then broadcast Complaint; if more than 2f complain, then switch leader

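
The state machine of a single node can be sketched as below. This is a simplified illustration under assumptions: authenticated links are not modeled, sent messages are only recorded in a hypothetical `outbox` list, and the class and attribute names are not from the slides.

```python
class RotatingByzantineLE:
    """Sketch of one node's complaint handling (hypothetical names, no networking)."""

    def __init__(self, nodes, f):
        self.nodes = list(nodes)
        self.f = f
        self.round = 1
        self.complaints = set()   # senders of complaints for the current round
        self.complained = False   # whether this node already broadcast a complaint
        self.outbox = []          # (destination, round) complaint messages "sent"
        self.trusted = [self.leader(self.round)]

    def leader(self, r):
        return self.nodes[(r - 1) % len(self.nodes)]

    def _broadcast(self):
        self.complained = True
        self.outbox.extend((q, self.round) for q in self.nodes)

    def complain(self, p):
        # Local complaint about the current leader.
        if p == self.leader(self.round) and not self.complained:
            self._broadcast()

    def on_complaint(self, sender, r):
        if r != self.round or sender in self.complaints:
            return
        self.complaints.add(sender)
        if len(self.complaints) > self.f and not self.complained:
            self._broadcast()     # more than f complaints: relay
        elif len(self.complaints) > 2 * self.f:
            self.round += 1       # more than 2f complaints: switch leader
            self.complaints = set()
            self.complained = False
            self.trusted.append(self.leader(self.round))


node = RotatingByzantineLE(["p1", "p2", "p3", "p4"], f=1)
for sender in ["p2", "p3", "p4"]:
    node.on_complaint(sender, 1)
```

In this assumed run with N = 4 and f = 1, the second complaint triggers the relay and the third one moves the node to round 2.
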
Correctness of Byzantine ELE

  Why not switch leader as soon as more than f complaints are received? [d]
    Byzantine nodes send complaints to only 1 node.
    Thus, one correct node receives f+1 complaints
    All other correct nodes receive 1 complaint
    Only one correct node switches leader
  Eventual succession
    When f+1 complaints are received, correct nodes relay by broadcast
    Every correct node will get > 2f complaints
    New leader will be elected

Correctness of Byzantine ELE (2)

  Putsch resistance
    f Byzantine nodes cannot generate the f+1 complaints required for relaying and leader switch to happen
  Eventual agreement
    Correct nodes synchronously move through rounds, as a node moves to a new round only when it has ensured every other correct node will move too (by relaying)
    Eventually all faulty (Byzantine) nodes are detected

Reductions
Reductions

  We say X≼Y if
    X can be solved given a solution of Y
  Read: X is reducible to Y
  Informally, problem X is weaker than or as hard as Y

Preorders, partial orders…

  A relation ~ is a preorder on a set A if for any x,y,z in A
    x~x (reflexivity)
    x~y and y~z implies x~z (transitivity)
  Difference between preorder and partial order
    Partial order is a preorder with antisymmetry
      x~y and y~x implies x=y
    I.e. two different objects x and y cannot be symmetric
      i.e. it isn't possible that x~y and y~x for two different x and y

≼ is a preorder

  ≼ is a preorder
    Reflexivity. X≼X
      X can be solved given a solution to X
    Transitivity. X≼Y and Y≼Z implies X≼Z
      Since Y≼Z, use impl. of Z to impl. Y.
      Since X≼Y, use impl. of Y to impl. X. Hence we impl. X from Z's impl.
  ≼ is not antisymmetric, thus not a partial order
    Two different X and Y can be equivalent
      Distinct problems X and Y can each be solved from the other's solution

Shortcut definitions

  We write X≃Y if
    X≼Y and Y≼X
    Problem X is equivalent to Y
  We write X≺Y if
    X≼Y and not X≃Y
    or equivalently, X≼Y and not Y≼X
    Problem X is strictly weaker than Y, or
    Problem Y is strictly stronger than X

Example

  It is true that ◊P≼P
    Given P, we can implement ◊P
      We just return P's suspicions.
      P always satisfies ◊P's properties
  In fact, ◊P≺P in the asynchronous model
    Because not P≼◊P is true
  Reductions common in computability theory
    If X≼Y, and if we know X is impossible to solve
      Then Y is impossible to solve too
    If ◊P≼P, and some problem Z can be solved with ◊P
      Then Z can also be solved with P

Weakest FD for a problem?

  Often P is used to solve problem X
    But P is not very practical (needs synchrony)
  Is X a "practically" solvable problem?
    Can we implement X with ◊P?
    Sometimes a weaker FD than P will not solve X
    Proven using reductions

Weakest FD for a problem

  We might know that X≼P (X solvable with P)
    Can we solve X with a weaker FD than P, say ◊P?
    Or is it impossible, i.e. ◊P≺X
  Common proof to show P is weakest FD for X
    Prove that P≼X
    I.e. P can be solved given X
    If P≼X then ◊P≺X
      Because we know ◊P≺P and P≃X, i.e. ◊P≺P≃X
      If we can solve X with ◊P, then we can solve P with ◊P, which is a contradiction

How are the detectors related
Trivial Reductions

  Strongly complete
    ◊P≼P
      P is always strongly accurate, thus also eventually strongly accurate
    ◊S≼S
      S is always weakly accurate, thus also eventually weakly accurate
    S≼P
      P is always strongly accurate, thus also always weakly accurate
    ◊S≼◊P
      ◊P is always eventually strongly accurate, thus also always eventually weakly accurate

  [Diagram: P, S, ◊P and ◊S connected by the reductions above]

Trivial Reductions (2)

  Weakly complete
    ◊Q≼Q
      Q is always strongly accurate, thus also eventually strongly accurate
    ◊W≼W
      W is always weakly accurate, thus also eventually weakly accurate
    W≼Q
      Q is always strongly accurate, thus also always weakly accurate
    ◊W≼◊Q
      ◊Q is always eventually strongly accurate, thus also always eventually weakly accurate

  [Diagram: Q, W, ◊Q and ◊W connected by the reductions above]

Completeness "Irrelevant"

  Weak completeness trivially reducible to strong
  Strong completeness reducible to weak
    i.e. can get strong completeness from weak
    P≼Q, S≼W, ◊P≼◊Q, ◊S≼◊W
  They're equivalent!
    P≃Q, S≃W, ◊P≃◊Q, ◊S≃◊W

                  Accuracy
  Completeness    Strong    Weak    Eventual Strong    Eventual Weak
  Strong          P         S       ◊P                 ◊S
  Weak            Q         W       ◊Q                 ◊W

Proving Irrelevance of Completeness

  Weak completeness ensures
    every crash is eventually detected by some correct node
  Simple idea
    Every node q broadcasts suspicions Susp periodically
    upon event receive <S,q>
      Susp := (Susp ∪ S) \ {q}
      (the received message also works like a heartbeat from q)
    Every crash is eventually detected by all correct p
  Can this violate some accuracy properties?

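
The update applied on receipt of a suspicion set can be sketched as a pure function. The function and variable names are hypothetical; the example assumes the receiver currently (wrongly) suspects the sender p2.

```python
def on_receive(susp, received, sender):
    """Merge the sender's suspicions into ours, then clear the sender itself.

    Receiving a message from `sender` shows it is alive, so it is removed
    from the local suspicion set, like a heartbeat would do.
    """
    return (set(susp) | set(received)) - {sender}


# Assumed example: we suspect p2 and p3; p2 reports that it suspects p4.
local = on_receive({"p2", "p3"}, {"p4"}, sender="p2")
```
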
Maintaining Accuracy

  Strong and Weak Accuracy aren't violated
  Strong accuracy
    No one is ever inaccurate
    Our reduction never spreads inaccurate suspicions
  Weak accuracy
    Everyone is accurate about at least one node p
    No one will spread inaccurate information about p

Maintaining Eventual Accuracy

  Eventual Strong and Eventual Weak Accuracy aren't violated
  Proof is almost same as previous page
    Eventually all faulty nodes crash
    Inaccurate suspicions undone
      Will get heartbeat from correct nodes and revise (\ {q})

Relation between FDs

  [Diagram: the eight detectors Q, P, ◊Q, ◊P, W, S, ◊W, ◊S connected by "equivalent" and "reducible to" edges]

Omega also a FD

  Can we implement ◊S with Ω? [d]
    I.e. is it true that ◊S≼Ω
  Suspect all nodes except the leader given by Ω
  Eventual Completeness
    All nodes are suspected except the leader (which is correct)
  Eventual Weak Accuracy
    Eventually, one correct node (leader) is not suspected by anyone
  Thus, ◊S≼Ω

Ω equivalent to ◊S (and ◊W)

  We showed ◊S≼Ω, it turns out we also have Ω≼◊S
    I.e. Ω≃◊S
  Due to the famous CHT result
    If consensus implementable with detector D
    Then Omega can be implemented using D
      I.e. if Consensus≼D, then Ω≼D
    Since ◊S can be used to solve consensus, we have Ω≼◊S
  Implies ◊W≃Ω≃◊S is weakest detector to solve consensus

Relation between FDs (2)

  [Diagram: the detectors Q, P, ◊Q, ◊P, W, S, Ω, ◊W, ◊S connected by "equivalent" and "reducible to" edges]

Combining Abstractions

  Fail-stop (synchronous)
    Crash-stop process model
    Perfect links + Perfect failure detector (P)
  Fail-silent (asynchronous)
    Crash-stop process model
    Perfect links
  Fail-noisy (partially synchronous)
    Crash-stop process model
    Perfect links + Eventually Perfect failure detector (◊P)
  Fail-recovery
    Crash-recovery process model
    Stubborn links + …

The rest of book/course

  Assume crash-stop system with a perfect failure detector (fail-stop)
    Give algorithms
  Try to make a weaker assumption
    Revisit the algorithms