Formal Models of Distributed Systems (2)
Ali Ghodsi – UC Berkeley / KTH
alig(at)cs.berkeley.edu
Modeling a Node

- State machine of node i
  - Bunch of states Qi
- Each state consists of
  - 1 inbuffer set for each neighbor
  - 1 outbuffer set for each neighbor
  - Other data relevant to the algorithm
- Initial states
  - inbuf[j] is empty for all j

[Figure: node p2 with an inbuf/outbuf pair (inbuf[1]/outbuf[1], inbuf[2]/outbuf[2], inbuf[3]/outbuf[3]) for each link to its neighbors p1, p3, p4]

11/7/2015    Ali Ghodsi, alig(at)cs.berkeley.edu
Transition functions

- All of the state, except the outbufs, is called the accessible state of a node
- Transition function f
  - takes the accessible state and gives a new state, and
  - adds at most 1 new msg in each outbuf[i] of the new state
  - all inbuf[i] of the new state must be empty

[Figure: f maps (x=0, inbuf[1]={m0}, all other bufs empty) to (x=1, outbuf[1]={m1}, all inbufs empty)]
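The rules above can be sketched in code. This is a minimal illustration, not the slides' formal notation: the concrete state fields and the toy behavior of `f` (reply with "m1" to neighbor 1) are assumptions for the example.

```python
from dataclasses import dataclass, field

@dataclass
class State:
    x: int = 0                                   # algorithm-specific data
    inbuf: dict = field(default_factory=dict)    # neighbor -> list of messages
    outbuf: dict = field(default_factory=dict)   # neighbor -> list of messages

def f(state: State) -> State:
    """Toy transition function: reads only the accessible state, adds at
    most one new message per outbuf, and empties all inbufs in the new
    state -- the three constraints from the slide."""
    new = State(x=state.x,
                inbuf={j: [] for j in state.inbuf},                    # inbufs empty
                outbuf={j: list(m) for j, m in state.outbuf.items()})  # keep pending msgs
    for j, msgs in state.inbuf.items():
        if msgs:                          # consumed a message: local computation
            new.x += 1
            new.outbuf[1].append("m1")    # at most 1 new msg in outbuf[1]
    return new

s0 = State(x=0, inbuf={1: ["m0"], 2: [], 3: []}, outbuf={1: [], 2: [], 3: []})
s1 = f(s0)
# s1 matches the figure: x=1, outbuf[1]={m1}, all inbufs empty
```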
Single node perspective

- This is how computers in a distributed system work:
  1. Wait for a message
  2. When a message is received, do some local computation, send some messages
  3. Goto 1
- Is this a correct model? [D]
  - Determinism?
  - I/O?
  - Atomicity?
Single Node to a Distributed System

- A configuration is a snapshot of the state of all nodes
  - C = (q0, q1, …, qn-1) where qi is the state of pi
- An initial configuration is a configuration where each qi is an initial state

[Figure: a configuration of three nodes — p1 with x=1 and outbuf[1]={m1}; p2 with x=4 and outbuf[3]={m7}; p3 with x=11 and inbuf[3]={m3}; all other buffers empty]
Single Node to a Distributed System

- The system evolves through events
  - Computation event at node i, comp(i)
  - Delivery event of msg m from i to j, del(i,j,m)
- Computation event comp(i)
  - Apply transition function f on node i's state
- Delivery event del(i,j,m)
  - Move message m from the outbuf of pi to the inbuf of pj
Execution

- An execution is an infinite sequence of
  - config0, event1, config1, event2, config2, …
  - config0 is an initial configuration
- If eventk is comp(i)
  - configk-1 changes to configk by applying pi's transition function on i's state in configk-1
- If eventk is del(i,j,m)
  - configk-1 changes to configk by moving m from i's outbuf for link i↔j to j's inbuf for link i↔j
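The two event types can be sketched as functions from configuration to configuration. This is an illustrative toy, not the lecture's formalism: states are plain dicts, and the "transition function" inside `comp` (consume inbufs, add their count to x) is an assumption chosen so the numbers match the example execution on the next slide.

```python
import copy

def make_state(x):
    return {"x": x,
            "inbuf":  {j: [] for j in (1, 2, 3)},
            "outbuf": {j: [] for j in (1, 2, 3)}}

def comp(config, i):
    """comp(i): apply node i's (toy) transition function to its state."""
    c = copy.deepcopy(config)
    s = c[i]
    for j, msgs in s["inbuf"].items():
        s["x"] += len(msgs)    # toy local computation on consumed messages
        s["inbuf"][j] = []     # inbufs of the new state are empty
    return c

def deliver(config, i, j, m):
    """del(i,j,m): move m from i's outbuf for link i-j to j's inbuf."""
    c = copy.deepcopy(config)
    c[i]["outbuf"][j].remove(m)
    c[j]["inbuf"][i].append(m)
    return c

config0 = {1: make_state(1), 2: make_state(4), 3: make_state(11)}
config0[2]["outbuf"][3].append("m7")

config1 = deliver(config0, 2, 3, "m7")   # event1 = del(2,3,m7)
config2 = comp(config1, 3)               # event2 = comp(3): node 3 reaches x=12
```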
Single Node to a Distributed System

Example execution:

[Figure: config0 — p1: x=1, outbuf[1]={m1}; p2: x=4, outbuf[1]={m7}; p3: x=11, all bufs empty. event1 = del(2,3,m7) yields config1, where m7 has moved from p2's outbuf to p3's inbuf. event2 = comp(3) yields config2, where p3 has x=12 and outbuf[1]={m8}]
Some definitions for later use…

- Each comp(i) is associated with a transition
  - If f of process i maps state1 to state2, the triple (state1, state2, i) is called a transition
- Transition (s1,s2,j) is applicable in configuration c if
  - the accessible state of node j is s1 in c
- A del(i,j,m) is applicable in configuration c if
  - m is in the outbuf for link i↔j of node i in c
Single Node to a Distributed System

Example execution:

[Figure: event1 = comp(2) takes p2 from (x=11, inbuf[1]={m7}) to (x=12, outbuf[1]={m8}, inbuf[1]={}), and is associated with the transition ((x=11, inbuf[1]={m7}), (x=12, outbuf[1]={m8}, inbuf[1]={}), 2)]
Some definitions for later use… (2)

- If transition e=(s1,s2,i) is applicable to conf c
  - Then app(e,c) gives the new configuration after the event comp(i)
- If e=del(i,j,m) is applicable to conf c
  - Then app(e,c) gives the new configuration after the event del(i,j,m)
Schedules (Asynchronous Model)

- Our processes are deterministic
  - Given some message: update state, send some messages, and wait…
- Non-determinism comes from asynchrony
  - Messages take arbitrary time to be delivered
  - Processes execute at different speeds
- A schedule is the sequence of events
  - Message asynchrony determined by del(i,j,m)
  - Process speeds determined by comp(i)
  - All non-determinism embedded in the schedule!
Schedules (2)

- Given the initial configuration
  - The schedule determines the whole execution
- Not all schedules are allowed for an initial conf.
  - del(i,j,m) only allowed if m is in the outbuf of i in the previous configuration
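The applicability rule for delivery events is a one-line check. A small sketch (the dict layout of a configuration is an assumption for illustration):

```python
def deliverable(config, i, j, m):
    """del(i,j,m) is allowed only if m sits in i's outbuf for link i-j
    in the preceding configuration."""
    return m in config[i]["outbuf"][j]

# Node 2 has sent m7 toward node 3; node 3 has sent nothing.
config = {2: {"outbuf": {3: ["m7"]}},
          3: {"outbuf": {2: []}}}

ok  = deliverable(config, 2, 3, "m7")   # True: m7 is pending on the link
bad = deliverable(config, 3, 2, "m7")   # False: node 3 never sent m7
```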
Admissible executions (aka fairness)

- An execution is admissible if
  - each process has an infinite number of comp(i) events, and
  - every message m sent is eventually delivered by a del(i,j,m)
- Why infinity?
  - Executions are infinite
  - When the algorithm is finished, only make dummy transitions (same state)
Synchronous Systems

- Lockstep execution
  - Execution partitioned into non-overlapping rounds
- Informally, in each round
  - Every process can send a message to each neighbor
  - All messages are delivered
  - Every process computes based on the messages received
Synchronous Systems Formally

- Execution partitioned into disjoint rounds
- A round consists of
  - A delivery event for every message in all outbufs
  - One computation event on every process
- Every execution is admissible
  - Executions are by definition infinite
  - Processes take infinite steps
  - Every message is delivered
Time, clocks & order of events
Order of Events

- The following theorem shows an important result:
  - The order in which two applicable computation events or two applicable delivery events are executed is irrelevant!
- Theorem:
  - Let a and b be two different events applicable in C, where a and b are comp events; then
    - a is applicable to app(b, C)
    - b is applicable to app(a, C)
    - app(b, app(a, C)) = app(a, app(b, C))
Ordering Proof

- a and b are both transitions
  - a and b cannot be on the same node, since two different events cannot be applicable at the same time
  - a=(s1,s2,i) and b=(s3,s4,j) for i≠j
- Since transition b only changes the state of node j in C
  - The state of node i is still s1 in app(b, C)
  - Thus, a is applicable in app(b, C)
- Symmetrically, b will be applicable in app(a, C)
- app(a, app(b, C)) = app(b, app(a, C))
  - because a and b do not change the other one's state
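The commutativity in the theorem can be checked numerically. In this sketch a "configuration" is just a dict of node name to an integer, and the local transition (double and add one) is an arbitrary assumption; the point is only that events touching different nodes commute.

```python
def app_comp(config, i):
    """Toy comp(i): a deterministic transition on node i's state only."""
    c = dict(config)
    c[i] = c[i] * 2 + 1
    return c

C = {"p1": 3, "p2": 10}
a = lambda c: app_comp(c, "p1")   # event a = comp(1)
b = lambda c: app_comp(c, "p2")   # event b = comp(2)

left  = a(b(C))   # apply b, then a
right = b(a(C))   # apply a, then b
# left == right: since each event only changes its own node's state,
# app(a, app(b, C)) = app(b, app(a, C))
```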
Order sometimes matters

- The theorem says nothing in two cases
  - If both events are comp(i) on the same node i
  - If one delivers a message m, and the other outputs or consumes m through a comp(i)
- In the above cases both events cannot be applicable in C
- In the above cases the events are causally related
Causal Order

- The relation <H on the events of an execution (or schedule), called causal order, is defined as follows
  - If a occurs before b on the same process, then a <H b
  - If a produces (comp) m and b delivers m, then a <H b
  - If a delivers m and b consumes (comp) m, then a <H b
  - <H is transitive
    - I.e. if a <H b and b <H c then a <H c
- Two events a and b are concurrent, written a||b, if neither a <H b nor b <H a
Example of Causally Related Events

[Time-space diagram: processes p1, p2, p3 over time; some pairs of events are causally related (along process lines and message arrows), others are concurrent]
Similarity of executions

- The view of pi in E, denoted E|pi, is
  - the subsequence of execution E restricted to the events and state of pi
- Two executions E and F are similar w.r.t. pi if
  - E|pi = F|pi
- Two executions E and F are similar if
  - E and F are similar w.r.t. every node
Equivalence of Executions: Computations

- Computation Theorem:
  - Consider execution E=(c0,e1,c1,e2,c2,…) and its schedule of events V=(e1,e2,e3,…)
    - I.e. app(ei,ci-1)=ci
  - Let P be a permutation of V preserving causal order
    - P=(f1,f2,f3,…) preserves the causal order of V when for every pair of events, fi <H fj implies i<j
  - Then E is similar to the execution starting in c0 with schedule P
Equivalence of executions

- If two executions F and E have the same collection of events, and their causal order is preserved, F and E are said to be similar executions, written F~E
  - F and E could have different permutations of events, as long as causality is preserved!
Computations

- Similar executions form equivalence classes, where every execution in a class is similar to the other executions in the class
- I.e. the following always holds for executions:
  - ~ is reflexive
    - I.e. a~a for any execution a
  - ~ is symmetric
    - I.e. if a~b then b~a for any executions a and b
  - ~ is transitive
    - I.e. if a~b and b~c, then a~c, for any executions a, b, c
- Equivalence classes are called computations of executions
Example of similar executions

[Three time-space diagrams over p1, p2, p3; events of the same color are causally related. All three executions are part of the same computation, as causality is preserved]
Two important results (1)

- The computation theorem gives two important results
- Result 1: There is no algorithm that can observe the order of the sequence of events (that can "see" the time-space diagram) for all executions
- Proof:
  - Assume such an algorithm exists, and assume p knows the order in the final repeated configuration
  - Take two distinct similar executions of the algorithm preserving causality
  - The computation theorem says their final repeated configurations are the same; then the algorithm cannot have observed the actual order of events, as the orders differ
Two important results (2)

- Result 2: The computation theorem does not hold if the model is extended such that each process can read a local hardware clock
- Proof:
  - Assume a distributed algorithm in which each process reads the local clock each time a local event occurs
  - The final (repeated) configurations of different causality-preserving executions will have different clock values, which would contradict the computation theorem
Observing Causality

- So causality is all that matters…
- …how to locally tell if two events are causally related?
Lamport Clocks

- Each process has a local logical clock, kept in a variable t, initially t=0
- Node p piggybacks (t, p) on every sent message
- On each event, update t:
  - t = t + 1 for every transition
  - If receiving a message with timestamp (tq, q): t = max(t, tq)
Lamport Clocks (2)

- Comparing two timestamps (tp,p) and (tq,q)
  - (tp,p) < (tq,q) iff tp < tq, or (tp = tq and p < q)
  - i.e. break ties using node identifiers
  - e.g. (5,p5) < (7,p2), and (4,p2) < (4,p3)
- Lamport logical clocks guarantee that:
  - If a <H b, then t(a) < t(b), where t(a) is the Lamport clock of event a
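The two update rules plus the tie-breaking comparison fit in a few lines. A minimal sketch (the class and method names are illustrative, not from the slides); on receive, t is first lifted to max(t, tq) and then incremented by the transition itself:

```python
class LamportNode:
    def __init__(self, pid):
        self.pid, self.t = pid, 0

    def send(self):
        self.t += 1                       # t = t + 1 for every transition
        return (self.t, self.pid)         # timestamp piggybacked on the message

    def receive(self, stamp):
        tq, _ = stamp
        self.t = max(self.t, tq) + 1      # max rule, then the +1 of the transition
        return (self.t, self.pid)

p, q = LamportNode(1), LamportNode(2)
m = p.send()       # p's clock: t=1 -> stamp (1, 1)
e = q.receive(m)   # q's clock: max(0, 1) + 1 = 2 -> stamp (2, 2)

# Comparing stamps lexicographically as (t, pid) tuples gives the total
# order of the slide: ties on t are broken by node identifier.
# (5, 5) < (7, 2) and (4, 2) < (4, 3) both hold.
```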
Example of Lamport logical clocks

[Time-space diagram: p1, p2, p3 all start at 0; clock values tick along each process line (e.g. 1, 2, 3, 4 on p1) and jump forward after receives (e.g. to 4, then 5, 6)]
Vector Timestamps

- Each process p has a local vector vp of size n
  - Initially vp[i]=0 for all i
- Piggyback vp on every sent message
- For each transition, update the local vp by
  - vp[p] := vp[p] + 1
  - vp[i] := max( vp[i], vq[i] ) for all i, where vq is the clock in a message received from node q
Comparing Vector Clocks

- vp < vq iff
  - vp[i] ≤ vq[i] for all i, and vp ≠ vq
- vp and vq are concurrent (vp || vq) iff
  - not vp < vq, and not vq < vp
- Vector clocks guarantee
  - If v(a) < v(b) then a <H b, and
  - If a <H b, then v(a) < v(b)
  - where v(a) is the vector clock of event a
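The update and comparison rules can be sketched directly. The function names are illustrative; the strict comparison includes the `vp != vq` condition, since otherwise equal vectors would compare as ordered:

```python
def new_clock(n):
    return [0] * n

def tick(v, p):
    """Local transition at process p: vp[p] := vp[p] + 1."""
    v = list(v)
    v[p] += 1
    return v

def receive(v, p, vq):
    """Receive: component-wise max with the piggybacked vq, then tick."""
    merged = [max(a, b) for a, b in zip(v, vq)]
    return tick(merged, p)

def less(vp, vq):
    """vp < vq iff vp[i] <= vq[i] for all i, and the vectors differ."""
    return all(a <= b for a, b in zip(vp, vq)) and vp != vq

def concurrent(vp, vq):
    return not less(vp, vq) and not less(vq, vp)

a = tick(new_clock(3), 0)         # p1's first event: [1,0,0]
b = receive(new_clock(3), 1, a)   # p2 receives p1's message: [1,1,0]
c = tick(new_clock(3), 2)         # p3's own event: [0,0,1]
# less(a, b) holds, matching a <H b; a and c are concurrent
```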
Example of Vector Timestamps

[Time-space diagram: p1's events are stamped [1,0,0], [2,0,0], [3,0,0], [4,0,0]; p2 receives from p1 and moves through [3,1,0], [3,2,0]; p3 has [0,0,1] and, after a receive, [3,2,2]]

Great! But this cannot be done with smaller vectors than size n, for n nodes
Partial and Total Orders

- Only a partial order or a total order? [d]
  - the relation <H on events in executions
    - Partial: <H doesn't order concurrent events
  - the relation < on Lamport logical clocks
    - Total: any two distinct clock values are ordered
  - the relation < on vector timestamps
    - Partial: timestamps of concurrent events are not ordered
Logical clock vs. Vector clock

- Logical clock
  - If a <H b then t(a) < t(b)   (1)
- Vector clock
  - If a <H b then v(a) < v(b)   (1)
  - If v(a) < v(b) then a <H b   (2)
- Which of (1) and (2) is more useful? [d]
- What extra information do vector clocks give? [d]
Complexity

Complexity of Algorithms

- We care about
  - The number of messages used before terminating
  - The time it takes to terminate
- Termination
  - A subset of the states Qi are terminated states
  - The algorithm has terminated when
    - All states in a configuration are terminated
    - No messages are in the {in,out}bufs
Message Complexity

- The maximum number of messages until termination over all admissible executions
  - This is worst-case message complexity…
Time Complexity

- Basic idea of time complexity
  - Message delay is at most 1 time unit
  - Computation events take 0 time units
- Formally, a timed execution is an execution s.t.
  - A time is associated with each comp(i) event
  - The first event happens at time 0
  - Time can never decrease & strictly increases locally
  - The max time between the comp(i) sending m and the comp(j) consuming m is 1 time unit
- Time complexity is the maximum time until termination over all admissible timed executions
Time Complexity (2)

- Why at most 1?
  - Why not just assume every msg takes exactly 1 time unit?
- Would not model reality
  - Some algorithms would have misleading time complexity
At most is less or more than equal?

- Compare "at most" vs. "exactly" 1 time unit
  - How do they compare? [d]
Time Complexity: broadcasting

Init:
    parent = null
    n = number of nodes

Source:
    send <a> to all neighbors
    wait to receive n-1 <b>

Others:
    when receive <a> from p:
        if parent == null:
            parent := p
            forward <a> to all neighbors except parent
            send <b> to parent
    when receive <b>:
        send <b> to parent

- What is the time complexity if every message takes
  - At most 1 time unit?
  - Exactly 1 time unit?
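To build intuition for the "exactly 1 time unit" case of the flooding phase above: <a> reaches each node at a time equal to its hop distance from the source, so the flooding time equals the largest such distance (and the convergecast of <b> messages takes the same again on the way back). A small sketch; the graph and its adjacency list are illustrative assumptions, and the function simply computes BFS depth from the source:

```python
from collections import deque

def flood_time(adj, source):
    """Time units until every node has received <a>, assuming every
    message takes exactly 1 time unit: the BFS depth from the source."""
    dist = {source: 0}
    q = deque([source])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return max(dist.values())

# A line of 4 nodes: 1 - 2 - 3 - 4
adj = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3]}
t = flood_time(adj, 1)   # 3: node 4 is 3 hops from the source
# With "at most 1" some messages may arrive earlier, but the worst-case
# timed execution still matches this bound.
```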
"At most" can only increase complexity

- "at most" produces ≥ time complexity
  - Every timed execution with "exactly" 1 time unit is possible in the "at most" model
  - Time complexity considers the maximum time
  - So the time complexity of "at most" can only increase over "exactly"

[Figure: the set of "exactly 1" timed executions is contained in the set of "at most 1" timed executions]
Summary

- The total order of execution of events is not always important
  - Two different executions could yield the same "result"
- Causal order matters:
  - Order of two events on the same process
  - Order of two events where one is a send and the other the corresponding receive
  - Order of two events that are transitively related according to the above
- Executions which contain permutations of each other's events such that causality is preserved are called similar executions
- Similar executions form equivalence classes called computations
  - Every execution in a computation is similar to every other execution in its computation
- Vector timestamps can be used to determine causality
  - Cannot be done with smaller vectors than size n, for n processes