Causal Reliable Broadcast
Ali Ghodsi – UC Berkeley / KTH
alig(at)cs.berkeley.edu
Motivation

Assume we have a chat application
Whatever is written is reliably broadcast to the group
If you get the following chat output, is it OK?
[UserZ] Ok
[UserY] Can we push back 1h?
[UserX] Let’s meet at 2pm?

UserX’s message caused UserY’s message, and
UserY’s message caused UserZ’s message
11/6/2015
Motivation (2)


Does uniform reliable broadcast remedy this? [d]
Causal reliable broadcast solves this
Deliveries in causal order!
Causality is the same as Lamport's happened-before relation!
Causality Recalled

Let m1 and m2 be any two messages:
m1 → m2 (m1 causally precedes m2) if
  C1 (FIFO order). Some process pi broadcasts m1 before broadcasting m2
  C2 (Network order). Some process pi delivers m1 and later broadcasts m2
  C3 (Transitivity). There is a message m’ such that m1 → m’ and m’ → m2
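The three rules C1-C3 can be sketched as a small computation over a recorded trace. This is only an illustration: the trace format (a list of `(process, kind, message)` tuples) and the function name `causal_precedence` are assumptions made for this example, and messages are assumed to be unique.

```python
from itertools import product

def causal_precedence(trace):
    """Return the set of pairs (m1, m2) such that m1 -> m2.

    trace: list of (process, kind, message) tuples, where kind is "bcast" or
    "deliver". Each process's own events must appear in its local order.
    (Hypothetical format, chosen for this sketch.)
    """
    seen = {}      # process -> messages it has broadcast or delivered so far
    order = set()
    for proc, kind, msg in trace:
        before = seen.setdefault(proc, [])
        if kind == "bcast":
            # C1 and C2: anything proc broadcast or delivered earlier precedes msg
            order.update((m, msg) for m in before if m != msg)
        before.append(msg)
    # C3: close the relation under transitivity
    changed = True
    while changed:
        changed = False
        for (a, b), (c, d) in product(list(order), repeat=2):
            if b == c and a != d and (a, d) not in order:
                order.add((a, d))
                changed = True
    return order

# The chat example: X asks, Y answers after seeing X, Z answers after seeing Y.
chat = [
    ("X", "bcast", "m1"),
    ("Y", "deliver", "m1"), ("Y", "bcast", "m2"),
    ("Z", "deliver", "m2"), ("Z", "bcast", "m3"),
]
print(sorted(causal_precedence(chat)))
# [('m1', 'm2'), ('m1', 'm3'), ('m2', 'm3')]
```

The pair (m1, m3) appears only through C3, since UserZ never has to deliver m1 for it to be a cause of m3.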
Causal Broadcast Interface

Module:
  Name: CausalOrder, instance co
Events
  Request: ⟨co, Broadcast | m⟩
  Indication: ⟨co, Deliver | src, m⟩
Property:
  CB: If node pi delivers m1, then pi must have delivered every message causally preceding (→) m1 before m1
Is this useful? How can it be satisfied? [d]
  It is only safety. Satisfy it by never delivering!
Causality

C1 (FIFO order). Some process pi broadcasts m1 before broadcasting m2
[Figure: two space-time diagrams over processes p1, p2, p3 showing messages m1 and m2]
Causality (2)

C2 (Network order). Some process pi delivers m1 and later broadcasts m2
[Figure: two space-time diagrams over processes p1, p2, p3 showing messages m1 and m2]
Causality (3)

C3 (Transitivity). There is a message m’ such that m1 → m’ and m’ → m2
[Figure: two space-time diagrams over processes p1, p2, p3 showing messages m1, m2 and m3]
Different Causalities


Property:
  CB: If node pi delivers m1, then pi must deliver every message causally preceding (→) m1 before m1
  CB’: If pj delivers m1 and m2, and m1 → m2, then pj must deliver m1 before m2
What is the difference? [d]
[Figure: two space-time diagrams over p1, p2, p3 with messages m1, m2, m3; the first is labeled "Violates CB and CB’", the second "Violates CB, not CB’"]
Indeed, CB implies CB’
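The difference between CB and CB’ can be made concrete with two small checkers over one process's delivery log. This is a sketch: the function names, the log format (a list of messages in delivery order) and the example logs are assumptions for illustration, not taken from the slide's diagrams. The relation `order` is a set of pairs (m1, m2) meaning m1 → m2, already closed under transitivity.

```python
def satisfies_cb(log, order):
    """CB: every causal predecessor of a delivered message was delivered before it."""
    delivered = set()
    for m in log:
        for a, b in order:
            if b == m and a not in delivered:
                return False
        delivered.add(m)
    return True

def satisfies_cb_prime(log, order):
    """CB': if both m1 and m2 are delivered and m1 -> m2, m1 comes first."""
    pos = {m: i for i, m in enumerate(log)}
    return all(pos[a] < pos[b] for a, b in order if a in pos and b in pos)

# m1 -> m2 -> m3, including the transitive pair
order = {("m1", "m2"), ("m2", "m3"), ("m1", "m3")}

skipped = ["m1", "m3"]         # m2 is never delivered
swapped = ["m3", "m2", "m1"]   # everything delivered, in reverse order

print(satisfies_cb(skipped, order), satisfies_cb_prime(skipped, order))  # False True
print(satisfies_cb(swapped, order), satisfies_cb_prime(swapped, order))  # False False
```

CB’ only constrains pairs that are both delivered, so a log that skips a predecessor entirely can satisfy CB’ while violating CB.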
Reliable Causal Broadcast Interface

Module:
  Name: ReliableCausalOrder, instance rco
Events
  Request: ⟨rco, Broadcast | m⟩
  Indication: ⟨rco, Deliver | src, m⟩
Property:
  RB1-RB4 from regular reliable broadcast
  CB: If node pi delivers m, then pi must deliver every message causally preceding (→) m before m
Uniform Reliable Causal Broadcast

Module:
  Name: UniformReliableCausalOrder, instance urco
Events
  Request: ⟨urco, Broadcast | m⟩
  Indication: ⟨urco, Deliver | src, m⟩
Property:
  URB1-URB4 from uniform reliable broadcast
  CB: If node pi delivers m, then pi must deliver every message causally preceding (→) m before m
Reusing abstractions

Reuse RB for CB
  Use the reliable broadcast abstraction to implement reliable causal broadcast
  Use the uniform reliable broadcast abstraction to implement uniform causal broadcast
Towards an implementation

Main idea
  Each broadcast message carries a history
  Before delivery, ensure causality
First algorithm
  History is the set of all causally preceding messages
Fail-Silent No-Waiting Causal Bcast


Each message m carries an ordered list of causally preceding messages in past_m
Whenever a node rbDelivers m
  coDeliver the causally preceding messages in past_m
  coDeliver m
Avoid duplicates using delivered
Execution (direct override)
[Figure: space-time diagram over p1, p2, p3. p1: coB(m1), coD(m1), coB(m2), coD(m2); m2 is sent with past [m1]. p2: coD(m1), coD(m2). p3: rbD(m2) occurs first, after which p3 performs coD(m1) and then coD(m2).]
Execution (indirect override)
[Figure: space-time diagram over p1, p2, p3. p1: coB(m1), coD(m1), later coD(m2). p2: coD(m1), coB(m2), coD(m2); m2 is sent with past [m1]. p3: rbD(m2) occurs first, after which p3 performs coD(m1) and then coD(m2).]
Fail-silent Causal Broadcast Impl

Implements: ReliableCausalOrderBroadcast instance rco.
Uses: ReliableBroadcast instance rb.

upon event ⟨rco, Init⟩ do
  delivered := ∅; past := nil

upon event ⟨rco, Broadcast | m⟩ do
  trigger ⟨rb, Broadcast | (DATA, past, m)⟩
  past := append(past, <pi, m>)              // append this message to past history
Fail-silent Causal Broadcast Impl (2)

upon event rb,Deliver | pi,(DATA, pastm , m) do









11/6/2015
if mdelivered then
forall (sn,n)pastm do
if ndelivered then
trigger rco,Deliver|sn, n
delivered := delivered{n}
past := append(past, <sn,n>)
trigger rco,Deliver|pi,m
delivered := delivered{m}
past := append(past, <pi,m>)
Ali Ghodsi, alig(at)cs.berkeley.edu
in ascending order
deliver preceding
messages
append to history
deliver current
message
append to history
18
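The no-waiting algorithm can be sketched as a small Python class. It is a minimal illustration, not the slides' reference implementation: the class name `NoWaitingNode`, the packet tuple layout and the `log` attribute are assumptions, the reliable broadcast layer is replaced by passing packets by hand, and messages are assumed unique and hashable. The check that avoids appending a pair to `past` twice is an addition that the pseudocode above does not contain.

```python
class NoWaitingNode:
    def __init__(self, name):
        self.name = name
        self.delivered = set()   # messages already rco-delivered
        self.past = []           # ordered list of (sender, message)
        self.log = []            # rco deliveries in order, for inspection

    def broadcast(self, m):
        """rco-broadcast m: return the packet to hand to reliable broadcast."""
        packet = (self.name, list(self.past), m)   # snapshot of past travels with m
        self.past.append((self.name, m))
        return packet

    def rb_deliver(self, packet):
        """Handle an rb-delivery: deliver missing predecessors first, then m."""
        sender, past_m, m = packet
        if m in self.delivered:
            return
        for s, n in past_m:                        # in list order
            if n not in self.delivered:
                self._deliver(s, n)
        self._deliver(sender, m)

    def _deliver(self, sender, m):
        self.log.append((sender, m))
        self.delivered.add(m)
        if (sender, m) not in self.past:           # addition: avoid duplicate entries
            self.past.append((sender, m))

# Replay: p1 broadcasts m1 then m2, and p3 receives them in the opposite order.
p1, p3 = NoWaitingNode("p1"), NoWaitingNode("p3")
pk1 = p1.broadcast("m1")
p1.rb_deliver(pk1)
pk2 = p1.broadcast("m2")       # carries past [("p1", "m1")]
p3.rb_deliver(pk2)             # delivers m1 from the attached past, then m2
p3.rb_deliver(pk1)             # ignored: m1 was already delivered
print(p3.log)                  # [('p1', 'm1'), ('p1', 'm2')]
```

No message is ever held back here, which is why the algorithm is called no-waiting; the price is that every packet carries the sender's whole past.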
Correctness

RB1-RB4 follow from use of RB
  No creation and no duplication still satisfied
  Validity still satisfied
    Some messages might be delivered earlier, never later
  Agreement directly from RB
CB (causal order) by induction on prefixes of executions
  It is vacuously true for empty executions
  Assume it is true for all deliveries of a prefix
  Then it is true for any extension with one event
Improving the algorithm


Disadvantage of the algorithm is that the message size (bit complexity) grows
Useful idea
  Garbage collect old messages
Implementation of GC
  Ack receipt of every message m to all
  Use perfect failure detector P
    Determine with P when all correct nodes got message m
    Delete m from past when all correct nodes got m
GC implementation

Uses: ReliableBroadcast instance rb, PerfectFD instance P

upon event ⟨rco, Init⟩ do
  delivered := ∅; past := nil
  correct := Π
  forall m: ack[m] := ∅

upon event ⟨P, Crash | pi⟩ do
  correct := correct \ {pi}

upon event m ∈ delivered and self ∉ ack[m] do      // called upon coDeliver
  ack[m] := ack[m] ∪ {self}
  trigger ⟨rb, Broadcast | (ACK, m)⟩               // ack to all

upon event ⟨rb, Deliver | pi, (ACK, m)⟩ do
  ack[m] := ack[m] ∪ {pi}                          // book keeping of acks
  if correct ⊆ ack[m] then
    past := remove(past, <x, m>)                   // when received ack from all, GC m from any x
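The acknowledgement bookkeeping can be sketched in isolation. This is an illustrative sketch under stated assumptions: the class name `PastGC` and its method names are hypothetical, the failure detector and reliable broadcast are replaced by direct method calls, and the re-check of the garbage-collection condition after a crash notification is an addition that the handlers above do not spell out.

```python
class PastGC:
    def __init__(self, self_id, processes):
        self.self_id = self_id
        self.correct = set(processes)   # processes not yet reported crashed by P
        self.ack = {}                   # message -> set of processes that acked it
        self.past = []                  # list of (sender, message)

    def on_co_deliver(self, sender, m):
        """Record m in past, ack it locally, and return the ACK to rb-broadcast."""
        self.past.append((sender, m))
        self.ack.setdefault(m, set()).add(self.self_id)
        self._collect(m)
        return ("ACK", m)

    def on_ack(self, p, m):
        """Handle an ACK for m that was rb-delivered from process p."""
        self.ack.setdefault(m, set()).add(p)
        self._collect(m)

    def on_crash(self, p):
        """Handle a crash notification from the perfect failure detector."""
        self.correct.discard(p)
        for m in list(self.ack):        # addition: a crash can complete a pending GC
            self._collect(m)

    def _collect(self, m):
        # Drop m from past once every process still considered correct has acked it.
        if self.correct <= self.ack.get(m, set()):
            self.past = [(s, n) for (s, n) in self.past if n != m]

gc = PastGC("p1", ["p1", "p2", "p3"])
gc.on_co_deliver("p2", "m")
gc.on_ack("p2", "m")
print(gc.past)        # [('p2', 'm')]  p3 has not acked yet
gc.on_crash("p3")
print(gc.past)        # []  all processes still considered correct have acked m
```

Because P is a perfect failure detector, a process removed from `correct` has really crashed, so dropping m at that point cannot deprive a correct process of a message it still needs.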
Towards another implementation

Main idea
  Each broadcast message carries a history
  Before delivery, ensure causality
First algorithm
  History is the set of all causally preceding messages
Second algorithm [d]
  History is a vector timestamp
Fail-Silent Waiting Causal Broadcast

Represent past history by vector clock (VC)
Slightly modify the VC implementation
At node pi
  VC[i]: number of messages pi coBroadcasted
  VC[j], j≠i: number of messages pi coDelivered from pj
Fail-Silent Waiting Causal Broadcast

Upon CO broadcast m
  Piggyback VC and RB broadcast m
Upon RB delivery of m with attached VCm
  Compare VCm with local VCi
  Only deliver m once VCm precedes VCi (VCm ≤ VCi in every component)
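The delivery test is a componentwise comparison of two vectors. A minimal sketch, where the helper name `vc_leq` is an assumption made for this example:

```python
def vc_leq(a, b):
    """True if vector a is less than or equal to vector b in every component."""
    if len(a) != len(b):
        raise ValueError("vector clocks must have the same length")
    return all(x <= y for x, y in zip(a, b))

# A message stamped (1,0,0) arriving at a node whose clock is still (0,0,0)
print(vc_leq((1, 0, 0), (0, 0, 0)))   # False: hold the message
# The same message once the node's clock has reached (1,0,0)
print(vc_leq((1, 0, 0), (1, 0, 0)))   # True: the message can be delivered
```

Two vectors can also be incomparable, for example (1,0) and (0,1); in that case `vc_leq` is false in both directions.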
Execution
[Figure: space-time diagram over p1, p2, p3, each starting with VC (0,0,0). p1 broadcasts m1 with timestamp (0,0,0) and then m2 with timestamp (1,0,0). p3 receives m2 first and holds it; after d(m1) its clock is (1,0,0), and d(m2) then brings it to (2,0,0). All three processes end at (2,0,0).]
Fail-Silent Waiting Causal Impl.

Implements: ReliableCausalOrderBroadcast, instance rco
Uses: ReliableBroadcast, instance rb

upon event ⟨rco, Init⟩ do
  forall pi ∈ Π do VC[i] := 0
  pending := ∅

upon event ⟨rco, Broadcast | m⟩ do
  trigger ⟨rb, Broadcast | (DATA, VC, m)⟩    // send m with VC
  VC[self] := VC[self] + 1
  trigger ⟨rco, Deliver | self, m⟩           // VC has only increased, so RCO deliver
Fail-Silent Waiting Causal Impl. (2)

upon event rb,Deliver|pj, (DATA, VCm , m) do

if pj ≠ self then



pending := pending  (pj, (DATA, VCm, m))
deliver-pending()
procedure deliver-pending()

for every message
whose VC precedes
local VC
while exists x=(sm,(DATA,VCm,m))pending s.t. VCVCm do



11/6/2015
put on hold
pending := pending \ (sm, (DATA, VCm, m)
trigger rco,Deliver | sm, m
VC[ rank(sm) ] := VC[ rank(sm) ] + 1
Ali Ghodsi, alig(at)cs.berkeley.edu
Remove on hold
deliver and increase
local VC
27
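The waiting algorithm can be sketched as a Python class and replayed on the three-process execution shown earlier. This is an illustration under assumptions: the class name `WaitingNode`, the use of integer ranks as process identifiers, the packet tuple layout and the `log` attribute are all hypothetical, and reliable broadcast is replaced by handing packets over manually.

```python
class WaitingNode:
    def __init__(self, rank, n):
        self.rank = rank         # index of this process in 0..n-1
        self.vc = [0] * n        # vc[rank]: own broadcasts; vc[j]: deliveries from j
        self.pending = []        # packets on hold: (sender_rank, vc_m, m)
        self.log = []            # rco deliveries in order, for inspection

    def broadcast(self, m):
        """rco-broadcast m: stamp it with the current VC, then deliver locally."""
        packet = (self.rank, tuple(self.vc), m)
        self.vc[self.rank] += 1
        self.log.append((self.rank, m))
        return packet

    def rb_deliver(self, packet):
        sender, _, _ = packet
        if sender == self.rank:
            return                       # own message was delivered in broadcast()
        self.pending.append(packet)      # put on hold
        self._deliver_pending()

    def _deliver_pending(self):
        progress = True
        while progress:                  # repeat until no held packet is deliverable
            progress = False
            for packet in list(self.pending):
                sender, vc_m, m = packet
                if all(x <= y for x, y in zip(vc_m, self.vc)):
                    self.pending.remove(packet)
                    self.log.append((sender, m))
                    self.vc[sender] += 1
                    progress = True

p1, p3 = WaitingNode(0, 3), WaitingNode(2, 3)
pk1 = p1.broadcast("m1")     # stamped (0, 0, 0)
pk2 = p1.broadcast("m2")     # stamped (1, 0, 0)
p3.rb_deliver(pk2)           # held: (1,0,0) is not <= (0,0,0)
print(p3.log, p3.vc)         # [] [0, 0, 0]
p3.rb_deliver(pk1)           # delivers m1, which releases m2
print(p3.log, p3.vc)         # [(0, 'm1'), (0, 'm2')] [2, 0, 0]
```

Compared with the no-waiting algorithm, each packet carries only n counters instead of a growing list of messages, at the cost of possibly holding messages until their predecessors arrive.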
Possible execution?
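[Figure: space-time diagram over p1 and p2, both starting with VC (0,0). p1 broadcasts m1 with timestamp (0,0) and delivers it, moving to (1,0); p2 broadcasts m2 with timestamp (0,0) and delivers it, moving to (0,1). p1 then delivers m2 and p2 delivers m1; both end at (1,1).]
Delivery order isn’t the same!
What is wrong? [d] Nothing, there is no causality.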
Other possible orderings

Other common orderings
  Single-source FIFO order
  Total order
  Causal order
Single-Source FIFO order

Intuitively
  Msgs from same node delivered in order sent
For all messages m1 and m2 and all pi and pj,
  if pi broadcasts m1 before m2, and if pj delivers m1 and m2, then pj delivers m1 before m2
Caveat
  This formulation doesn’t require delivery of both messages
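The FIFO property above can be checked mechanically on recorded delivery logs. A minimal sketch, where the function name `is_fifo` and both dictionary formats are assumptions for illustration, and each log is assumed to contain no duplicates:

```python
def is_fifo(sent, logs):
    """Check single-source FIFO order.

    sent: sender -> list of messages in the order it broadcast them.
    logs: process -> list of messages in the order it delivered them.
    """
    for log in logs.values():
        position = {m: i for i, m in enumerate(log)}
        for messages in sent.values():
            # Positions of this sender's delivered messages, in broadcast order
            seen = [position[m] for m in messages if m in position]
            if seen != sorted(seen):
                return False
    return True

sent = {"p1": ["m1", "m2"]}
print(is_fifo(sent, {"p2": ["m1", "m2"], "p3": ["m2"]}))   # True
print(is_fifo(sent, {"p2": ["m2", "m1"]}))                 # False
```

The first call illustrates the caveat: p3 delivers m2 without ever delivering m1, and the property still holds.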
Total Order

Intuitively
  Everyone delivers everything in exact same order
For all messages m1 and m2 and all pi and pj,
  if both pi and pj deliver both messages, then they deliver them in the same order
Caveat
  This formulation doesn’t require delivery of both messages
  Everyone delivers same order, maybe not send order!
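The total order property compares every pair of processes on the messages both have delivered. A minimal sketch, where the function name `is_total_order` and the log format are assumptions for illustration, and each log is assumed to contain no duplicates:

```python
from itertools import combinations

def is_total_order(logs):
    """logs: process -> list of messages in the order it delivered them."""
    for a, b in combinations(logs.values(), 2):
        common = set(a) & set(b)
        # Restrict both logs to the messages they share and compare the orders
        if [m for m in a if m in common] != [m for m in b if m in common]:
            return False
    return True

print(is_total_order({"p1": ["m2", "m1"], "p2": ["m2", "m1"]}))   # True
print(is_total_order({"p1": ["m1", "m2"], "p2": ["m2", "m1"]}))   # False
print(is_total_order({"p1": ["m1"], "p2": ["m2"]}))               # True
```

The first call passes even if m1 was broadcast before m2 by the same sender, matching the caveat that the agreed order need not be the send order.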
Execution Example (1)
[Figure: space-time diagram of an execution with messages a and b]
single-source FIFO? yes
totally ordered? no
causally ordered? yes
Execution Example (2)
[Figure: space-time diagram of an execution with messages a and b]
single-source FIFO? no
totally ordered? yes
causally ordered? no
Execution Example (3)
[Figure: space-time diagram of an execution with messages a and b]
single-source FIFO? yes
totally ordered? no
causally ordered? no
Hierarchy of Orderings

Stronger implies weaker ordering (→)
[Figure: 3×3 grid of broadcast abstractions connected by implication arrows]
best-effort | FIFO best-effort | causal best-effort
reliable | reliable FIFO | reliable causal
uniform reliable | uniform reliable FIFO | uniform reliable causal