©
Distributed Computing
5. Snapshot
Shmuel Zaks
[email protected]
The snapshot algorithm (Chandy and Lamport)
Goal: design a snapshot (= global-state-detection) algorithm that:
will record a collection of states of all system components (which forms a global system state),
will not change the underlying computation,
will not freeze the underlying computation
A Process Can…
record its own state,
send and receive messages,
record messages it sends and receives,
cooperate with other processes
Processes do not share clocks or memory
Processes cannot record their state precisely at the same instant
Motivation
Many problems in distributed systems
can be stated in terms of the problem of
detecting global states:
Stable property detection problems :
termination detection,
deadlock detection
etc.
Checkpointing
Stable Property Detection Problem
D - distributed system
y - a predicate function defined on the set
of global states of D
S, S’ – global states of D
y is stable if y(S) implies y(S’) for all S’ reachable from S
Many distributed algorithms are structured as a sequence of phases
A phase: a transient part, then a stable part
Phase termination vs. computation termination
Our view of the problem:
i. detect the termination of a phase
ii. initiate a new phase
Notice that “the kth phase has terminated” is a stable property
Model
Distributed system D is a finite, labeled, directed graph.
[Figure: processes p and q connected by channels C1 and C2]
Channels have infinite buffers, are error-free, and preserve FIFO order
Message delay is bounded, but unknown
State of a Channel
[Figure: messages 1, 2, 3 sent along C1; message 1 already received by q]
[1, 2, 3] – sequence X of messages that were sent
[1] – sequence Y of received messages (a prefix of X)
[2, 3] – state of C1: X \ Y
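The channel-state definition above can be sketched in a few lines of Python. This is only an illustration: the function name `channel_state` and the list representation of X and Y are assumptions, not part of the slides.

```python
def channel_state(sent, received):
    """Return X \\ Y: the messages sent on a FIFO channel but not yet received.

    Hypothetical helper: `sent` is the sequence X, `received` is the sequence Y.
    On a FIFO channel Y must be a prefix of X.
    """
    if sent[:len(received)] != received:
        raise ValueError("received must be a prefix of sent on a FIFO channel")
    return sent[len(received):]


X = [1, 2, 3]   # messages sent along C1
Y = [1]         # messages already received
print(channel_state(X, Y))  # [2, 3]
```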
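Example: System
Distributed system: [Figure: processes p and q connected by channels C1 and C2]
State transitions (same for p and q): [Figure: states A and B, with transitions labeled send and receive]
Initial global state: [Figure: p in state B, q in state A, both channels empty (Ø)]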
Global state transition diagram
[Figure: global states of the example system, with transitions labeled p sends, q receives, q sends, p receives]
A computation corresponds to a path in the diagram
The system is deterministic
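Example: System
Distributed system: [Figure: processes p and q connected by channels C1 and C2]
State transition:
p: [Figure: states A and B, with transitions labeled send and receive]
q: [Figure: states C and D, with transitions labeled send and receive]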
Global state transition diagram
[Figure: global states of the example system, with transitions labeled p sends, q receives, q sends, p receives]
The system is non-deterministic
We look at the following sequence of events:
[Figure: global states along the sequence p sends, q sends, p receives]
In the snapshot algorithm:
Each process records its own state
p and q cooperate to record the state of C.
[Figure: channel C from p to q]
Example: System
[Figure: the single-token system, with the recording steps Record C, Record q, Record p]
Recorded state: No token
Example: System
[Figure: the single-token system, with the recording steps Record p, Record C, Record q]
Recorded state: Two tokens
[Figure: timeline for Record p, Record C, Record q: p’s state recorded, then p sends a message on C, then C’s state recorded]
[Figure: timeline for Record C, Record q, Record p: C’s state recorded, then p sends a message on C, then p’s state recorded]
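The two uncoordinated recordings above ("No token" and "Two tokens") can be reproduced with a small sketch. It assumes that p initially holds the single token and that the only event is "p sends" on channel C; the function name `recorded_tokens` and the token counters are illustrative, not taken from the slides.

```python
def recorded_tokens(before_send, after_send):
    """Count the tokens that appear in a naively recorded global state.

    before_send / after_send: the components ("p", "q", "C") recorded before
    and after the event "p sends". Assumption: p starts with the only token.
    """
    holder = {"p": 1, "q": 0}   # tokens held by each process
    in_transit = 0              # tokens currently on channel C (p -> q)
    recorded = {}

    def record(name):
        recorded[name] = in_transit if name == "C" else holder[name]

    for name in before_send:
        record(name)
    # Event "p sends": the token leaves p and is now in transit on C.
    holder["p"] -= 1
    in_transit += 1
    for name in after_send:
        record(name)
    return sum(recorded.values())


print(recorded_tokens(["p"], ["C", "q"]))   # 2: p recorded with the token, C recorded with it too
print(recorded_tokens(["C", "q"], ["p"]))   # 0: C recorded empty, p recorded after giving it up
print(recorded_tokens(["p", "C", "q"], [])) # 1: everything recorded at one instant
```

Only the last call, which is impossible without a shared clock, records the one token the system really has; this is the reason p and q must coordinate.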
[Figure: channel C from p to q]
q will record the state of C
p and q have to coordinate, using a special marker
q starts recording C after it records its own state
q stops when it receives the marker from p
But: how does q know when to record its state?
The snapshot algorithm
Who starts?
We assume one process.
Homework: extend the discussion and the proof to any number of starters.
(Intuition for the Algorithm)
[Figure: channel C from p to q]
Who will record the state of channel C? q
q starts recording after it records its state
How does q know when to stop recording?
p sends a marker right after it records its state, and before sending any other message
The snapshot algorithm
Channel recording
[Figure: channel C from p to q]
Starts when q records its own state
Ends when q receives the marker along C
Note: for any q ≠ p0, the channel along which the marker arrived first is recorded as empty (Ø)
The snapshot algorithm
State recording
p0 starts.
p0 records its state, and then broadcasts the marker.
Shout-algorithm = PI (Propagation-of-Information) = hot potato = …
When q receives the marker for the first time, it records its own state
The snapshot algorithm
Marker-Sending Rule for a process p
1. record the state of p
2. send the marker along each outgoing channel c before sending any other message along c
Marker-Receiving Rule for a process q
on receiving the marker along channel c:
if q’s state is not recorded:
1. record q’s state;
2. record c’s state = Ø;
else: c’s state is the sequence of messages received along c since q recorded its state
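The two marker rules can be sketched as a small single-threaded simulation. Everything concrete here is an assumption made for illustration: the `Process` class, the `MARKER` sentinel, integer token counts as process state, and the hand-picked delivery order. A process that records on its first marker also applies the marker-sending rule.

```python
from collections import deque

MARKER = "MARKER"  # hypothetical sentinel; application messages are ints here


class Process:
    def __init__(self, state):
        self.state = state          # application state: number of tokens held
        self.out = {}               # destination name -> FIFO channel (deque)
        self.incoming = []          # names of processes with a channel into this one
        self.recorded_state = None  # None until this process records its state
        self.recording = {}         # source -> messages recorded so far on that channel
        self.channel_state = {}     # source -> final recorded state of that channel

    def record_and_send_markers(self):
        # Marker-sending rule: record own state, then put a marker on every
        # outgoing channel before any other message is sent.
        self.recorded_state = self.state
        self.recording = {src: [] for src in self.incoming}
        for channel in self.out.values():
            channel.append(MARKER)

    def receive(self, src, msg):
        if msg == MARKER:
            # Marker-receiving rule.
            if self.recorded_state is None:
                self.record_and_send_markers()
            # For the first marker the list is still empty, so the channel is
            # recorded as empty; otherwise it holds what arrived since recording.
            self.channel_state[src] = self.recording.pop(src)
        else:
            self.state += msg                   # application step: absorb tokens
            if src in self.recording:           # channel src is still being recorded
                self.recording[src].append(msg)


def connect(procs, src, dst):
    procs[src].out[dst] = deque()
    procs[dst].incoming.append(src)


def send(procs, src, dst, tokens):
    procs[src].state -= tokens
    procs[src].out[dst].append(tokens)


def deliver(procs, src, dst):
    # Deliver the message at the head of the channel src -> dst.
    procs[dst].receive(src, procs[src].out[dst].popleft())


procs = {"p": Process(0), "q": Process(1)}
connect(procs, "p", "q")              # C1
connect(procs, "q", "p")              # C2

send(procs, "q", "p", 1)              # the token is in transit on C2
procs["p"].record_and_send_markers()  # p initiates the snapshot
deliver(procs, "q", "p")              # p receives the token while recording C2
deliver(procs, "p", "q")              # q gets its first marker: records, C1 = empty
deliver(procs, "q", "p")              # p gets the marker on C2: stops recording C2

for name, pr in procs.items():
    print(name, pr.recorded_state, pr.channel_state)
```

In this run neither recorded process state holds the token, but the recorded state of C2 does, so the recorded global state still contains exactly one token.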
Termination
Assumption: No marker remains forever in an input channel
Claim: If the graph is strongly connected and at least one process records its state, then all processes will record their state in finite time
Proof: by induction on the distance from a process that has recorded its state
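The induction behind the termination claim follows the way markers spread along channels. The sketch below is an illustration under assumptions (the adjacency-list representation and the name `marker_hops` are not from the slides): it computes, for each process, the number of channel hops a marker needs to reach it from the initiator. Actual arrival times depend on message delays; the hop count is only the induction parameter.

```python
from collections import deque


def marker_hops(channels, initiator):
    """Breadth-first marker propagation over a directed graph.

    channels: dict mapping a process to the processes it has a channel to.
    Returns a dict: process -> number of hops from the initiator.
    """
    hops = {initiator: 0}
    queue = deque([initiator])
    while queue:
        u = queue.popleft()
        for v in channels.get(u, []):
            if v not in hops:        # first marker to reach v: v records its state
                hops[v] = hops[u] + 1
                queue.append(v)
    return hops


# A strongly connected ring p -> q -> r -> p (hypothetical topology).
ring = {"p": ["q"], "q": ["r"], "r": ["p"]}
print(marker_hops(ring, "p"))  # {'p': 0, 'q': 1, 'r': 2}
```

If the graph is strongly connected, every process appears in the result, which corresponds to every process eventually recording its state.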
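The Recorded Global State
Example system:
[Figure: processes p and q connected by channels C1 and C2]
State transition:
p: [Figure: states A and B, with transitions labeled send and receive]
q: [Figure: states C and D, with transitions labeled send and receive]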
[Figure: global states along the sequence p sends, q sends, p receives]
What did we get?
Event
An event e in process p is an atomic action that can change the state of p and the state of at most one channel c incident on p (by sending/receiving message M along c)
e is defined by <p, s, s’, M, c>
e = <p, s, s’, M, c> may occur in global state S if
1. the state of p in S is s
2. if c is directed towards p, then c’s state has M at its head
Process State and Global State
A process: a set of states, an initial state, and a set of events
A global state S: a collection of process states and channel states
Initially, each process is in its initial state and all channels are empty
next(S, e) is the global state after event e is applied to global state S
Process State and Global State
seq = (e_i : i = 0…n) is a computation of the system iff
e_i may occur in S_i, 0 ≤ i ≤ n
S_{i+1} = next(S_i, e_i)
(S_0 is the initial global state)
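The definitions of an event, next(S, e), and a computation can be sketched as follows. This is a minimal illustration, not the slides' formalism verbatim: the extra `kind` field stands in for the direction of c relative to p, the message names M and M' are hypothetical, and the example assumes C1 runs from p to q and C2 from q to p.

```python
from typing import NamedTuple


class Event(NamedTuple):
    p: str      # process in which the event occurs
    s: str      # state of p before the event
    s2: str     # state of p after the event (s' in the text)
    M: str      # message sent or received
    c: str      # channel whose state changes
    kind: str   # "send" or "receive" (assumption: encodes the direction of c)


def may_occur(S, e):
    procs, chans = S
    if procs[e.p] != e.s:
        return False
    if e.kind == "receive":            # c is directed towards p: M must be at its head
        return len(chans[e.c]) > 0 and chans[e.c][0] == e.M
    return True


def next_state(S, e):
    procs, chans = S
    procs = {**procs, e.p: e.s2}
    chans = dict(chans)
    if e.kind == "send":
        chans[e.c] = chans[e.c] + (e.M,)   # append M at the tail
    else:
        chans[e.c] = chans[e.c][1:]        # remove M from the head
    return procs, chans


def is_computation(S0, seq):
    S = S0
    for e in seq:
        if not may_occur(S, e):
            return False
        S = next_state(S, e)
    return True


# Example system: p alternates A -send-> B -receive-> A, q alternates C -send-> D -receive-> C.
S0 = ({"p": "A", "q": "C"}, {"C1": (), "C2": ()})
p_sends = Event("p", "A", "B", "M", "C1", "send")
q_sends = Event("q", "C", "D", "M'", "C2", "send")
p_receives = Event("p", "B", "A", "M'", "C2", "receive")

print(is_computation(S0, [p_sends, q_sends, p_receives]))  # True
print(is_computation(S0, [p_receives]))                    # False: C2 is empty in S0
```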
The Recorded Global State
seq = (e_i : i ≥ 0) – a distributed computation
S_i – the state of the system right before e_i occurs
S_0 – the initial state of the system
S_t – the state of the system at the termination of the algorithm
S* – the recorded global state
Definition
Event e_j is called pre-recording if e_j is in a process p and p records its state after e_j in seq.
Event e_j is called post-recording if e_j is in a process p and p records its state before e_j in seq.
Assume that e_{j-1} is a post-recording event that comes right before a pre-recording event e_j in seq.
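Claim: The sequence obtained by interchanging e_{j-1} and e_j is a computation.
Proof: e_{j-1} occurs in p and e_j in q (other than p).
There cannot be a message sent at e_{j-1} and received at e_j: p sends the marker before any post-recording message, so by FIFO q would have received the marker, and recorded its state, before e_j.
Hence, event e_j can occur in global state S_{j-1}.
The state of process p is not altered by e_j, hence e_{j-1} can occur after e_j.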
Proof
Swap the events until all post-recording events appear after all pre-recording events.
The resulting computation is seq’.
All that is left to show: S* is the global state after all pre-recording events and before all post-recording events.
1. Process states
2. Channel states
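The swapping step of the proof can be sketched as repeated exchanges of adjacent (post-recording, pre-recording) pairs. The representation of events as (label, tag) pairs and the name `reorder` are assumptions for illustration; the sketch only rearranges labels and does not re-check that each intermediate sequence is a computation, which is what the claim above establishes.

```python
def reorder(seq, is_post):
    """Swap adjacent (post, pre) pairs until no post-recording event
    precedes a pre-recording one. Order within each group is preserved."""
    seq = list(seq)
    swapped = True
    while swapped:
        swapped = False
        for j in range(1, len(seq)):
            if is_post(seq[j - 1]) and not is_post(seq[j]):
                seq[j - 1], seq[j] = seq[j], seq[j - 1]
                swapped = True
    return seq


# Events tagged as in the example execution: p sends (post), q sends (pre), p receives (post).
seq = [("p sends", "post"), ("q sends", "pre"), ("p receives", "post")]
seq_prime = reorder(seq, lambda e: e[1] == "post")
print(seq_prime)
# [('q sends', 'pre'), ('p sends', 'post'), ('p receives', 'post')]
```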
Claim: The state of a channel in S* is
(sequence of messages corresponding to pre-recording sends) \ (sequence of messages corresponding to pre-recording receives)
Proof: The state of channel c from process p to process q recorded in S* is the sequence of messages received on c by q after q records its state and before q receives a marker on c.
The sequence of messages sent by p before the marker is the sequence corresponding to pre-recording sends on c.
[Figure: the sequence p sends (post), q sends (pre), p receives (post)]
(Another execution)
[Figure: the sequence q sends (pre), p sends (post), p receives (post)]
What did we get?
A configuration that could have happened
(The Recorded Global State)
seq = (e_i : i ≥ 0) – a distributed computation
S_i – the state of the system right before e_i occurs
S_0 – the initial state of the system
S_t – the state of the system at the termination of the algorithm
S* – the recorded global state
(The Recorded Global State)
Theorem: There exists a computation seq' = (e_j' : j ≥ 0), where
(here i and t index the global states in which the algorithm is initiated and terminates)
1. For all j < i or j ≥ t: e_j' = e_j
2. The subsequence (e_j' | i ≤ j < t) is a permutation of the subsequence (e_j | i ≤ j < t)
3. For all j < i or j ≥ t: S_j' = S_j
4. There exists k, i ≤ k ≤ t, such that S* = S_k'
Stable Detection
D - distributed system
y - a predicate function defined on the set of global states of D
S, S’ – global states of D
y is a stable property of D if y(S) implies y(S’) for all S’ reachable from S
Algorithm
Input: A stable property y
Output: a boolean value b with the property:
y(S_0) ⇒ b and b ⇒ y(S_t)
Algorithm:
begin
record a global state S*
b := y(S*)
end
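The detection algorithm is short enough to sketch directly. The snapshot representation (a dict of process states and channel contents), the predicate `terminated`, and the name `detect` are all assumptions for illustration. The predicate is stable only under the usual assumption that a passive process becomes active solely by receiving a message.

```python
def terminated(S):
    """Hypothetical stable predicate y: every process is passive and
    no message is in transit on any channel."""
    return (all(state == "passive" for state in S["processes"].values())
            and all(len(msgs) == 0 for msgs in S["channels"].values()))


def detect(y, record_global_state):
    """b := y(S*), where S* is whatever the snapshot algorithm recorded."""
    s_star = record_global_state()
    return y(s_star)


# Two recorded global states, written out by hand for the example.
still_working = {"processes": {"p": "passive", "q": "passive"},
                 "channels": {"C1": [], "C2": ["work"]}}
finished = {"processes": {"p": "passive", "q": "passive"},
            "channels": {"C1": [], "C2": []}}

print(detect(terminated, lambda: still_working))  # False: a message is still in C2
print(detect(terminated, lambda: finished))       # True
```

Including the recorded channel states in S* matters here: looking at process states alone would wrongly report termination in the first case.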
Correctness
1. S* is reachable from S_0
2. S_t is reachable from S*
3. y(S) ⇒ y(S’) for all S’ reachable from S
[Figure: S_0 → S* → S_t. If y(S*) = true then y(S_t) = true; if y(S*) = false then y(S_0) = false.]
References
K. M. Chandy and L. Lamport, Distributed Snapshots: Determining Global States of Distributed Systems
Slide 2
©
Distributed Computing
5. Snapshot
Shmuel Zaks
[email protected]
The snapshot algorithm (Candy and Lamport)
2
3
4
Goal: design a snapshot (=global-statedetection) algorithm that:
will record a collection of states of all system
components (which forms a global system
state),
will not change the underlying computation,
will not freeze the underlying computation
5
A Process Can…
record its own state,
send and receive messages,
record messages it sends and receives,
cooperate with other processes
Processes do not share clocks or memory
Processes cannot record their state
precisely at the same instant
6
Motivation
Many problems in distributed systems
can be stated in terms of the problem of
detecting global states:
Stable property detection problems :
termination detection,
deadlock detection
etc.
Checkpointing
7
Stable Property Detection Problem
D - distributed system
y - a predicate function defined on the set
of global states of D
S, S’ – global states of D
y is stable if y(S) implies y(S’) for all S’
reachable from S
8
many distributed algorithms are
structured as a sequence of phases
A phase: transient part, then a stable part
phase termination vs. computation
termination
our view on the problem:
i. detect the termination of a phase
ii. initiate a new phase
Notice that “the kth phase has terminated” is
a stable property
9
Model
Distributed system D is a
finite, labeled, directed
graph.
C1
p
q
C2
Channels have infinite buffers, are errorfree and preserve FIFO
Message delay is bounded, but unknown
10
State of a Channel
C1
p
C1
p
3
2
1
q
C2
q
1
[1, 2, 3] – sequence X of messages that
were sent
[1] – sequence Y of received messages
( prefix of X )
[2, 3] – state of C1: X \ Y
11
Example:
System
Distributed system:
C1
p
q
C2
State transitions
receive
A
B
(same for p and q):
send
Initial global state:
Ø
B
A
Ø
12
A
receive
C1
p
B
q
C2
send
Global state transition diagram
p
B
q
Ø
A
p sends
p
q
A
A
Ø
Ø
p receives
Ø
A
A
q receives
q sends
Ø
A
B
Ø
A computation corresponds to a path in the diagram
deterministic
13
Example:
System
p
Distributed system:
State transition:
p:
q:
C1
q
C2
A
C
send
receive
send
receive
B
D
14
A
send
B
receive
p
C1
q
C2
C
send
D
receive
Global state transition diagram
p
A
q
Ø
C
p sends
p
q
B
C
Ø
Ø
q receives
A
D
q sends
p receives
B
D
Ø
non-deterministic
15
A
send
B
receive
p
C1
q
C2
C
send
D
receive
We look at the following sequence of events:
p
A
q
Ø
C
p sends
p
q
B
C
Ø
Ø
q sends
A
D
p receives
B
D
Ø
16
in the snapshot algorithm:
Each process records its own state
p and q cooperate to record the state of C.
p
C
q
17
A
receive
B
p
C1
q
C2
send
Example:
System
p
B
p
q
q
Ø
Recorded state:
No token
C
A
A
Record C
Ø
A
A
A
Record q
Record p
18
A
receive
B
p
C1
q
C2
send
Example:
System
p
B
C1
p
q
q
Ø
A
A
Ø
Recorded state:
Two tokens
A
Ø
B
Record p
A
Record C
Record q
19
Record p
Record C
Record q
P’s state
recorded
P sends a
message on C
C’s state
recorded
time
C’s state
recorded
P sends a
message on C
P’s state
recorded
Record C
Record q
Record p
20
p
C
q
q will record the state of C
p and q have to coordinate ; using a special marker
q starts recording C after it records its state
q stops when receiving
from p
But: how does q know when to record its state?
21
The snapshot algorithm
Who starts?
We assume one process.
Hw: extend discussion + proof to any number
of startes.
22
(Intuition for the
Algorithm)
p
C
q
Who will record the state of channel C? q
q starts recording after it records its state
How q knows when to stop recording?
p sends
right after it records
its state, and before sending any other message
23
The snapshot algorithm
channel recording
C
p
q
Starts when q records itself
Ends when q receives
along C
Note : for any q p0, the channel along
which
arrived first is recorded as
24
The snapshot algorithm
State recording
p0 starts.
p0 recoreds its state,
and then broadcasts
.
Shout-algorithm =
PI (Propogation-of-information)=
hot potato = …
When q receives
for the first time, it
records its own state
25
The snapshot algorithm
Marker-Sending Rule for a process q
1. record the state of p
2. send
along c before sending
any other message
Marker-Receiving Rule for a process q
on receiving
along channel c:
if q’s state is not recorded:
1. record state;
2. record c’s state = ;
else: c’s state is the sequence of messages
received since q recorded its state
26
Termination
Assumption No marker remains forever in an
input channel
Claim: If the graph is strongly connected and
at least one process records its state,
then all processes will record their state
in finite time
Proof: by induction
27
The Recorded Global State
Ex:
System
C1
p
q
C2
State transition:
p:
q:
A
C
send
receive
send
receive
B
D
28
A
send
B
receive
p
C1
q
C2
C
send
D
receive
A
D
p
A
q
C
p sends
p
q
B
A
C
q sends
A
D
p receives
B
D
29
What did we get?
30
e in process p is an atomic action:
can change the state of p, and
a state of at most one channel c incident on p
(by sending/receiving message M along c )
e is defined by < p, s, s’, M, c >
e =
may occur in global state S
if 1. the state of p in S is s
2. if c is directed towards p then c’s state
has M in its head
Event
31
Process State and Global State
A process: set of states, an initial state
set of events
A global state S: collection of process states
and channel states
initially, each process is in its initial state and all
channels are empty
next(S, e) is the global state after event e in
applied to global state S
32
Process State and Global State
seq = (ei : i = 0…n) is a computation of
the system iff
ei may occur in Si , 0 i n
Si+1 = next(Si, ei)
(S0 is the initial global state)
33
The Recorded Global State
seq = (ei: i ≥ 0) a distributed computation
Si – the state of the system right before ei occurs
S0 – the initial state of the system
St – the state of the system at the termination of
the algorithm
S* - the recorded global state
34
Definition
Event ej is called pre-recording if ej is in a
process p and p records its state after ej in seq .
Event ej is called post-recording if ej is in a
process p and p records its state before ej in seq .
Assume that ej-1 is a post-recording event before
Pre-recording event ej in seq.
35
Claim: Sequence obtained by interchanging
ej-1 and ej is a computation.
Proof: ej-1 occurs in p and ej in q (other than p).
There cannot be a message sent at ej-1 and
received at ej.
Hence, event ej can occur in global state Sj-1.
The state of process p is not altered by ej,
hence ej-1 can occur after ej.
36
Proof
Swap the events till all post-recorded events
appear after all pre-recorded events.
The acquired computation is seq’.
All that is left to show: S* is a global state
after all prerecorded events and before all
postrecorded events.
1. Process states
2. Channel states
37
Claim: The state of a channel in S* is
(sequence of messages corresp. to pre-recorded
receives)(sequence of messages corresp. to
prerecorded sends)
Proof: The state of channel c from process
p to process q recorded in S* is the sequence
of messages received on c by q after q
records its state and before q receives a
marker on c.
The sequence of messages sent by p is the
sequence corres. to prerecording sends on c.
38
A
send
B
receive
p
C1
q
C2
C
send
D
receive
A
D
p
A
q
C
p sends
p
q
B
C
post
pre
q sends
A
D
p receives
post
B
D
39
A
send
B
receive
A
p
C1
q
C2
C
send
D
receive
D
p
A
(Another execution)
p
q
q
C
q sends
A
D
pre
post
p sends
A
D
p receives
post
A
B
D
40
What did we get?
A configuration that could have happened
41
(The Recorded Global State)
seq = (ei: i ≥ 0) a distributed computation
Si – the state of the system right before ei occurs
S0 – the initial state of the system
St – the state of the system at the termination of
the algorithm
S* - the recorded global state
42
(The Recorded Global State)
Theorem : There exists a computation
seq' = (ej' : j 0), where
1. For all j < i or j t : ej' = ej
2. The subsequence (ej' | i j < t) is a
permutation of the subsequence
(ej | i j < t)
3. For all j < i or j t : Sj' = Sj
4. There exists k, i k t, such that S * = Sk
Stable Detection
D - distributed system
y - a predicate function defined on the set of
global states of D
S, S’ – global states of D
y is a stable property of D if y(S) implies y(S’)
for all S’ reachable from S
44
Algorithm
Input: A stable property y
Output: a boolean value b with the
property:
y(S0)
b and b
y(St)
Algorithm:
begin
record a global state S*
b := y(S*)
end
45
Correctness
1. S* is reachable from S0
2. St is reachable from S*
3. y(S)
y(S’)
for all S’ reachable from S
y(S*)=true
S0
St
y(S0)=false
S*
y(S*)=false
y(St)=true
46
References
K. M. Chandy and L. Lamport,
Distributed Snapshots:
Determining Global States of Distributed
Systems
47
Slide 3
©
Distributed Computing
5. Snapshot
Shmuel Zaks
[email protected]
The snapshot algorithm (Candy and Lamport)
2
3
4
Goal: design a snapshot (=global-statedetection) algorithm that:
will record a collection of states of all system
components (which forms a global system
state),
will not change the underlying computation,
will not freeze the underlying computation
5
A Process Can…
record its own state,
send and receive messages,
record messages it sends and receives,
cooperate with other processes
Processes do not share clocks or memory
Processes cannot record their state
precisely at the same instant
6
Motivation
Many problems in distributed systems
can be stated in terms of the problem of
detecting global states:
Stable property detection problems :
termination detection,
deadlock detection
etc.
Checkpointing
7
Stable Property Detection Problem
D - distributed system
y - a predicate function defined on the set
of global states of D
S, S’ – global states of D
y is stable if y(S) implies y(S’) for all S’
reachable from S
8
many distributed algorithms are
structured as a sequence of phases
A phase: transient part, then a stable part
phase termination vs. computation
termination
our view on the problem:
i. detect the termination of a phase
ii. initiate a new phase
Notice that “the kth phase has terminated” is
a stable property
9
Model
Distributed system D is a
finite, labeled, directed
graph.
C1
p
q
C2
Channels have infinite buffers, are errorfree and preserve FIFO
Message delay is bounded, but unknown
10
State of a Channel
C1
p
C1
p
3
2
1
q
C2
q
1
[1, 2, 3] – sequence X of messages that
were sent
[1] – sequence Y of received messages
( prefix of X )
[2, 3] – state of C1: X \ Y
11
Example:
System
Distributed system:
C1
p
q
C2
State transitions
receive
A
B
(same for p and q):
send
Initial global state:
Ø
B
A
Ø
12
A
receive
C1
p
B
q
C2
send
Global state transition diagram
p
B
q
Ø
A
p sends
p
q
A
A
Ø
Ø
p receives
Ø
A
A
q receives
q sends
Ø
A
B
Ø
A computation corresponds to a path in the diagram
deterministic
13
Example:
System
p
Distributed system:
State transition:
p:
q:
C1
q
C2
A
C
send
receive
send
receive
B
D
14
A
send
B
receive
p
C1
q
C2
C
send
D
receive
Global state transition diagram
p
A
q
Ø
C
p sends
p
q
B
C
Ø
Ø
q receives
A
D
q sends
p receives
B
D
Ø
non-deterministic
15
A
send
B
receive
p
C1
q
C2
C
send
D
receive
We look at the following sequence of events:
p
A
q
Ø
C
p sends
p
q
B
C
Ø
Ø
q sends
A
D
p receives
B
D
Ø
16
in the snapshot algorithm:
Each process records its own state
p and q cooperate to record the state of C.
p
C
q
17
A
receive
B
p
C1
q
C2
send
Example:
System
p
B
p
q
q
Ø
Recorded state:
No token
C
A
A
Record C
Ø
A
A
A
Record q
Record p
18
A
receive
B
p
C1
q
C2
send
Example:
System
p
B
C1
p
q
q
Ø
A
A
Ø
Recorded state:
Two tokens
A
Ø
B
Record p
A
Record C
Record q
19
Record p
Record C
Record q
P’s state
recorded
P sends a
message on C
C’s state
recorded
time
C’s state
recorded
P sends a
message on C
P’s state
recorded
Record C
Record q
Record p
20
p
C
q
q will record the state of C
p and q have to coordinate ; using a special marker
q starts recording C after it records its state
q stops when receiving
from p
But: how does q know when to record its state?
21
The snapshot algorithm
Who starts?
We assume one process.
Hw: extend discussion + proof to any number
of startes.
22
(Intuition for the
Algorithm)
p
C
q
Who will record the state of channel C? q
q starts recording after it records its state
How q knows when to stop recording?
p sends
right after it records
its state, and before sending any other message
23
The snapshot algorithm
channel recording
C
p
q
Starts when q records itself
Ends when q receives
along C
Note : for any q p0, the channel along
which
arrived first is recorded as
24
The snapshot algorithm
State recording
p0 starts.
p0 recoreds its state,
and then broadcasts
.
Shout-algorithm =
PI (Propogation-of-information)=
hot potato = …
When q receives
for the first time, it
records its own state
25
The snapshot algorithm
Marker-Sending Rule for a process q
1. record the state of p
2. send
along c before sending
any other message
Marker-Receiving Rule for a process q
on receiving
along channel c:
if q’s state is not recorded:
1. record state;
2. record c’s state = ;
else: c’s state is the sequence of messages
received since q recorded its state
26
Termination
Assumption No marker remains forever in an
input channel
Claim: If the graph is strongly connected and
at least one process records its state,
then all processes will record their state
in finite time
Proof: by induction
27
The Recorded Global State
Ex:
System
C1
p
q
C2
State transition:
p:
q:
A
C
send
receive
send
receive
B
D
28
A
send
B
receive
p
C1
q
C2
C
send
D
receive
A
D
p
A
q
C
p sends
p
q
B
A
C
q sends
A
D
p receives
B
D
29
What did we get?
30
e in process p is an atomic action:
can change the state of p, and
a state of at most one channel c incident on p
(by sending/receiving message M along c )
e is defined by < p, s, s’, M, c >
e =
may occur in global state S
if 1. the state of p in S is s
2. if c is directed towards p then c’s state
has M in its head
Event
31
Process State and Global State
A process: set of states, an initial state
set of events
A global state S: collection of process states
and channel states
initially, each process is in its initial state and all
channels are empty
next(S, e) is the global state after event e in
applied to global state S
32
Process State and Global State
seq = (ei : i = 0…n) is a computation of
the system iff
ei may occur in Si , 0 i n
Si+1 = next(Si, ei)
(S0 is the initial global state)
33
The Recorded Global State
seq = (ei: i ≥ 0) a distributed computation
Si – the state of the system right before ei occurs
S0 – the initial state of the system
St – the state of the system at the termination of
the algorithm
S* - the recorded global state
34
Definition
Event ej is called pre-recording if ej is in a
process p and p records its state after ej in seq .
Event ej is called post-recording if ej is in a
process p and p records its state before ej in seq .
Assume that ej-1 is a post-recording event before
Pre-recording event ej in seq.
35
Claim: Sequence obtained by interchanging
ej-1 and ej is a computation.
Proof: ej-1 occurs in p and ej in q (other than p).
There cannot be a message sent at ej-1 and
received at ej.
Hence, event ej can occur in global state Sj-1.
The state of process p is not altered by ej,
hence ej-1 can occur after ej.
36
Proof
Swap the events till all post-recorded events
appear after all pre-recorded events.
The acquired computation is seq’.
All that is left to show: S* is a global state
after all prerecorded events and before all
postrecorded events.
1. Process states
2. Channel states
37
Claim: The state of a channel in S* is
(sequence of messages corresp. to pre-recorded
receives)(sequence of messages corresp. to
prerecorded sends)
Proof: The state of channel c from process
p to process q recorded in S* is the sequence
of messages received on c by q after q
records its state and before q receives a
marker on c.
The sequence of messages sent by p is the
sequence corres. to prerecording sends on c.
38
A
send
B
receive
p
C1
q
C2
C
send
D
receive
A
D
p
A
q
C
p sends
p
q
B
C
post
pre
q sends
A
D
p receives
post
B
D
39
A
send
B
receive
A
p
C1
q
C2
C
send
D
receive
D
p
A
(Another execution)
p
q
q
C
q sends
A
D
pre
post
p sends
A
D
p receives
post
A
B
D
40
What did we get?
A configuration that could have happened
41
(The Recorded Global State)
seq = (ei: i ≥ 0) a distributed computation
Si – the state of the system right before ei occurs
S0 – the initial state of the system
St – the state of the system at the termination of
the algorithm
S* - the recorded global state
42
(The Recorded Global State)
Theorem : There exists a computation
seq' = (ej' : j 0), where
1. For all j < i or j t : ej' = ej
2. The subsequence (ej' | i j < t) is a
permutation of the subsequence
(ej | i j < t)
3. For all j < i or j t : Sj' = Sj
4. There exists k, i k t, such that S * = Sk
Stable Detection
D - distributed system
y - a predicate function defined on the set of
global states of D
S, S’ – global states of D
y is a stable property of D if y(S) implies y(S’)
for all S’ reachable from S
44
Algorithm
Input: A stable property y
Output: a boolean value b with the
property:
y(S0)
b and b
y(St)
Algorithm:
begin
record a global state S*
b := y(S*)
end
45
Correctness
1. S* is reachable from S0
2. St is reachable from S*
3. y(S)
y(S’)
for all S’ reachable from S
y(S*)=true
S0
St
y(S0)=false
S*
y(S*)=false
y(St)=true
46
References
K. M. Chandy and L. Lamport,
Distributed Snapshots:
Determining Global States of Distributed
Systems
47
Slide 4
©
Distributed Computing
5. Snapshot
Shmuel Zaks
[email protected]
The snapshot algorithm (Candy and Lamport)
2
3
4
Goal: design a snapshot (=global-statedetection) algorithm that:
will record a collection of states of all system
components (which forms a global system
state),
will not change the underlying computation,
will not freeze the underlying computation
5
A Process Can…
record its own state,
send and receive messages,
record messages it sends and receives,
cooperate with other processes
Processes do not share clocks or memory
Processes cannot record their state
precisely at the same instant
6
Motivation
Many problems in distributed systems
can be stated in terms of the problem of
detecting global states:
Stable property detection problems :
termination detection,
deadlock detection
etc.
Checkpointing
7
Stable Property Detection Problem
D - distributed system
y - a predicate function defined on the set
of global states of D
S, S’ – global states of D
y is stable if y(S) implies y(S’) for all S’
reachable from S
8
many distributed algorithms are
structured as a sequence of phases
A phase: transient part, then a stable part
phase termination vs. computation
termination
our view on the problem:
i. detect the termination of a phase
ii. initiate a new phase
Notice that “the kth phase has terminated” is
a stable property
9
Model
Distributed system D is a
finite, labeled, directed
graph.
C1
p
q
C2
Channels have infinite buffers, are errorfree and preserve FIFO
Message delay is bounded, but unknown
10
State of a Channel
C1
p
C1
p
3
2
1
q
C2
q
1
[1, 2, 3] – sequence X of messages that
were sent
[1] – sequence Y of received messages
( prefix of X )
[2, 3] – state of C1: X \ Y
11
Example:
System
Distributed system:
C1
p
q
C2
State transitions
receive
A
B
(same for p and q):
send
Initial global state:
Ø
B
A
Ø
12
A
receive
C1
p
B
q
C2
send
Global state transition diagram
p
B
q
Ø
A
p sends
p
q
A
A
Ø
Ø
p receives
Ø
A
A
q receives
q sends
Ø
A
B
Ø
A computation corresponds to a path in the diagram
deterministic
13
Example:
System
p
Distributed system:
State transition:
p:
q:
C1
q
C2
A
C
send
receive
send
receive
B
D
14
A
send
B
receive
p
C1
q
C2
C
send
D
receive
Global state transition diagram
p
A
q
Ø
C
p sends
p
q
B
C
Ø
Ø
q receives
A
D
q sends
p receives
B
D
Ø
non-deterministic
15
A
send
B
receive
p
C1
q
C2
C
send
D
receive
We look at the following sequence of events:
p
A
q
Ø
C
p sends
p
q
B
C
Ø
Ø
q sends
A
D
p receives
B
D
Ø
16
in the snapshot algorithm:
Each process records its own state
p and q cooperate to record the state of C.
p
C
q
17
A
receive
B
p
C1
q
C2
send
Example:
System
p
B
p
q
q
Ø
Recorded state:
No token
C
A
A
Record C
Ø
A
A
A
Record q
Record p
18
A
receive
B
p
C1
q
C2
send
Example:
System
p
B
C1
p
q
q
Ø
A
A
Ø
Recorded state:
Two tokens
A
Ø
B
Record p
A
Record C
Record q
19
Record p
Record C
Record q
P’s state
recorded
P sends a
message on C
C’s state
recorded
time
C’s state
recorded
P sends a
message on C
P’s state
recorded
Record C
Record q
Record p
20
p
C
q
q will record the state of C
p and q have to coordinate ; using a special marker
q starts recording C after it records its state
q stops when receiving
from p
But: how does q know when to record its state?
21
The snapshot algorithm
Who starts?
We assume one process.
Hw: extend discussion + proof to any number
of startes.
22
(Intuition for the
Algorithm)
p
C
q
Who will record the state of channel C? q
q starts recording after it records its state
How q knows when to stop recording?
p sends
right after it records
its state, and before sending any other message
23
The snapshot algorithm
channel recording
C
p
q
Starts when q records itself
Ends when q receives
along C
Note : for any q p0, the channel along
which
arrived first is recorded as
24
The snapshot algorithm
State recording
p0 starts.
p0 recoreds its state,
and then broadcasts
.
Shout-algorithm =
PI (Propogation-of-information)=
hot potato = …
When q receives
for the first time, it
records its own state
25
The snapshot algorithm
Marker-Sending Rule for a process q
1. record the state of p
2. send
along c before sending
any other message
Marker-Receiving Rule for a process q
on receiving
along channel c:
if q’s state is not recorded:
1. record state;
2. record c’s state = ;
else: c’s state is the sequence of messages
received since q recorded its state
26
Termination
Assumption No marker remains forever in an
input channel
Claim: If the graph is strongly connected and
at least one process records its state,
then all processes will record their state
in finite time
Proof: by induction
27
The Recorded Global State
Ex:
System
C1
p
q
C2
State transition:
p:
q:
A
C
send
receive
send
receive
B
D
28
A
send
B
receive
p
C1
q
C2
C
send
D
receive
A
D
p
A
q
C
p sends
p
q
B
A
C
q sends
A
D
p receives
B
D
29
What did we get?
30
e in process p is an atomic action:
can change the state of p, and
a state of at most one channel c incident on p
(by sending/receiving message M along c )
e is defined by < p, s, s’, M, c >
e =
may occur in global state S
if 1. the state of p in S is s
2. if c is directed towards p then c’s state
has M in its head
Event
31
Process State and Global State
A process: a set of states, an initial state, and
a set of events
A global state S: a collection of process states
and channel states
Initially, each process is in its initial state and all
channels are empty
next(S, e) is the global state after event e is
applied to global state S
32
Process State and Global State
seq = (e_i : 0 ≤ i ≤ n) is a computation of
the system iff
e_i may occur in S_i, for 0 ≤ i ≤ n
S_{i+1} = next(S_i, e_i)
(S_0 is the initial global state)
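These definitions translate almost directly into code. The following is a sketch under assumptions: a global state is represented as a pair (process states, channel contents), channels are (sender, receiver) pairs, C1 is taken to run from p to q and C2 from q to p, and the message names M and M' are hypothetical.

```python
from collections import namedtuple

# An event <p, s, s', M, c>; c is a (sender, receiver) pair.
Event = namedtuple("Event", "p s s_next M c")


def may_occur(S, e):
    procs, chans = S
    if procs[e.p] != e.s:                      # condition 1: p must be in state s
        return False
    if e.c[1] == e.p:                          # condition 2: c is directed towards p
        return len(chans[e.c]) > 0 and chans[e.c][0] == e.M
    return True


def next_state(S, e):
    procs, chans = dict(S[0]), dict(S[1])
    procs[e.p] = e.s_next
    if e.c[1] == e.p:
        chans[e.c] = chans[e.c][1:]            # receive: remove M from the head of c
    else:
        chans[e.c] = chans[e.c] + (e.M,)       # send: append M to the tail of c
    return (procs, chans)


def run(S0, seq):
    """Return [S_0, ..., S_{n+1}] if seq is a computation from S0, otherwise None."""
    states = [S0]
    for e in seq:
        if not may_occur(states[-1], e):
            return None
        states.append(next_state(states[-1], e))
    return states


C1, C2 = ("p", "q"), ("q", "p")                # assumed channel directions
S0 = ({"p": "A", "q": "C"}, {C1: (), C2: ()})
p_sends = Event("p", "A", "B", "M", C1)
q_sends = Event("q", "C", "D", "M'", C2)
p_receives = Event("p", "B", "A", "M'", C2)

states = run(S0, [p_sends, q_sends, p_receives])
not_a_computation = run(S0, [p_receives])      # p is in A, not B, so this cannot occur
```

Here states is the path S_0, ..., S_3 through the global state transition diagram, and run returns None for a sequence whose first event may not occur in S_0.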
33
The Recorded Global State
seq = (e_i : i ≥ 0) - a distributed computation
S_i - the state of the system right before e_i occurs
S_0 - the initial state of the system
S_t - the state of the system at the termination of
the algorithm
S* - the recorded global state
34
Definition
Event e_j is called pre-recording if e_j is in a
process p and p records its state after e_j in seq.
Event e_j is called post-recording if e_j is in a
process p and p records its state before e_j in seq.
Assume that e_{j-1} is a post-recording event that immediately precedes a
pre-recording event e_j in seq.
35
Claim: The sequence obtained by interchanging
e_{j-1} and e_j is a computation.
Proof: e_{j-1} occurs in p and e_j in q (other than p).
There cannot be a message sent at e_{j-1} and
received at e_j: the channels are FIFO, so the marker that p sent
before e_{j-1} would reach q first, and e_j would be post-recording.
Hence, event e_j can occur in global state S_{j-1}.
The state of process p is not altered by e_j,
hence e_{j-1} can occur after e_j.
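Applied repeatedly, the interchange moves every pre-recording event ahead of every post-recording event. The sketch below shows only the reordering on event labels; it does not check that each intermediate sequence is a computation, which is what the claim guarantees. The function name move_pre_before_post is hypothetical, and the pre/post labels are those of the running two-process example.

```python
def move_pre_before_post(seq):
    """seq: list of (event name, kind) with kind in {"pre", "post"}.
    Interchange adjacent (post, pre) pairs until none remain."""
    seq = list(seq)
    swaps = 0
    changed = True
    while changed:
        changed = False
        for j in range(1, len(seq)):
            if seq[j - 1][1] == "post" and seq[j][1] == "pre":
                seq[j - 1], seq[j] = seq[j], seq[j - 1]
                swaps += 1
                changed = True
    return seq, swaps


seq = [("p sends", "post"), ("q sends", "pre"), ("p receives", "post")]
seq_prime, swaps = move_pre_before_post(seq)
```

For this sequence a single interchange is enough, and the events of each process keep their relative order.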
36
Proof
Swap events until all post-recording events
appear after all pre-recording events.
The resulting computation is seq’.
All that is left to show: S* is the global state of seq’
after all pre-recording events and before all
post-recording events.
1. Process states
2. Channel states
37
Claim: The state of a channel in S* is
(sequence of messages corresponding to pre-recording
sends) \ (sequence of messages corresponding to
pre-recording receives)
Proof: The state of channel c from process
p to process q recorded in S* is the sequence
of messages received on c by q after q
records its state and before q receives a
marker on c.
The sequence of messages sent by p on c before the marker is the
sequence corresponding to pre-recording sends on c.
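The channel-state claim is the same "sent minus received prefix" computation used earlier for the state of a channel. A small sketch, with the hypothetical helper name channel_state:

```python
def channel_state(sent, received):
    """Messages sent on a FIFO channel that have not been received yet.
    `received` must be a prefix of `sent`."""
    if list(sent[:len(received)]) != list(received):
        raise ValueError("on a FIFO channel, received must be a prefix of sent")
    return list(sent[len(received):])


# Sent [1, 2, 3], received [1]: the channel holds [2, 3].
in_transit = channel_state([1, 2, 3], [1])

# Pre-recording sends ["M'"] and no pre-recording receives: the recorded
# channel state is ["M'"]. The message name is hypothetical.
recorded_c2 = channel_state(["M'"], [])
```

With pre-recording sends as `sent` and pre-recording receives as `received`, the result is the state recorded for the channel in S*.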
38
[Figure: the system and state-transition diagrams of the example, and the computation "p sends" (post-recording), "q sends" (pre-recording), "p receives" (post-recording).]
39
(Another execution)
[Figure: the system and state-transition diagrams of the example, and the reordered computation "q sends" (pre-recording), "p sends" (post-recording), "p receives" (post-recording).]
40
What did we get?
A configuration that could have happened
41
(The Recorded Global State)
seq = (e_i : i ≥ 0) - a distributed computation
S_i - the state of the system right before e_i occurs
S_0 - the initial state of the system
S_t - the state of the system at the termination of
the algorithm
S* - the recorded global state
42
(The Recorded Global State)
Theorem: Let the snapshot algorithm be initiated in S_i and
terminate in S_t. There exists a computation
seq' = (e'_j : j ≥ 0), where
1. For all j < i or j ≥ t: e'_j = e_j
2. The subsequence (e'_j | i ≤ j < t) is a
permutation of the subsequence
(e_j | i ≤ j < t)
3. For all j < i or j ≥ t: S'_j = S_j
4. There exists k, i ≤ k ≤ t, such that S* = S'_k
Stable Detection
D - distributed system
y - a predicate function defined on the set of
global states of D
S, S’ – global states of D
y is a stable property of D if y(S) implies y(S’)
for all S’ reachable from S
44
Algorithm
Input: A stable property y
Output: a boolean value b with the
property:
y(S_0) ⇒ b and b ⇒ y(S_t)
Algorithm:
begin
record a global state S*
b := y(S*)
end
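A toy check of the output property, as a sketch only. It assumes a hypothetical global state summarized as (number of active processes, number of messages in transit), takes "terminated" as the stable property y, and simplifies by letting S* be any state on the path from S_0 to S_t, whereas in general S* lies on the reordered computation seq'.

```python
def implies(a, b):
    return (not a) or b


def terminated(S):
    """Stable property y: no active process and no message in transit."""
    active, in_transit = S
    return active == 0 and in_transit == 0


def is_stable_along(y, path):
    """Check that y(S) implies y(S') for every later state S' on the path."""
    return all(implies(y(path[i]), y(path[j]))
               for i in range(len(path)) for j in range(i, len(path)))


def detect(y, s_star):
    """The detection algorithm: b := y(S*)."""
    return y(s_star)


# Hypothetical computation S_0 ... S_t; receiving a message may reactivate a process.
path = [(2, 1), (1, 1), (2, 0), (1, 0), (0, 0), (0, 0)]
answers = [detect(terminated, s_star) for s_star in path]
property_holds = all(
    implies(terminated(path[0]), b) and implies(b, terminated(path[-1]))
    for b in answers
)
```

Whichever state on the path plays the role of S*, a true answer is never wrong about S_t, and a false answer is only possible if y did not yet hold in S_0.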
45
Correctness
1. S* is reachable from S_0
2. S_t is reachable from S*
3. y(S) ⇒ y(S’)
for all S’ reachable from S
[Figure: S_0 → S* → S_t. If y(S*) = true then y(S_t) = true; if y(S*) = false then y(S_0) = false.]
46
References
K. M. Chandy and L. Lamport, "Distributed Snapshots: Determining Global States of Distributed Systems," ACM Transactions on Computer Systems 3(1), 1985, pp. 63-75.
47
Slide 8
©
Distributed Computing
5. Snapshot
Shmuel Zaks
[email protected]
The snapshot algorithm (Candy and Lamport)
2
3
4
Goal: design a snapshot (=global-statedetection) algorithm that:
will record a collection of states of all system
components (which forms a global system
state),
will not change the underlying computation,
will not freeze the underlying computation
5
A Process Can…
record its own state,
send and receive messages,
record messages it sends and receives,
cooperate with other processes
Processes do not share clocks or memory
Processes cannot record their state
precisely at the same instant
6
Motivation
Many problems in distributed systems
can be stated in terms of the problem of
detecting global states:
Stable property detection problems :
termination detection,
deadlock detection
etc.
Checkpointing
7
Stable Property Detection Problem
D - distributed system
y - a predicate function defined on the set
of global states of D
S, S’ – global states of D
y is stable if y(S) implies y(S’) for all S’
reachable from S
8
many distributed algorithms are
structured as a sequence of phases
A phase: transient part, then a stable part
phase termination vs. computation
termination
our view on the problem:
i. detect the termination of a phase
ii. initiate a new phase
Notice that “the kth phase has terminated” is
a stable property
9
Model
Distributed system D is a
finite, labeled, directed
graph.
C1
p
q
C2
Channels have infinite buffers, are errorfree and preserve FIFO
Message delay is bounded, but unknown
10
State of a Channel
C1
p
C1
p
3
2
1
q
C2
q
1
[1, 2, 3] – sequence X of messages that
were sent
[1] – sequence Y of received messages
( prefix of X )
[2, 3] – state of C1: X \ Y
11
Example:
System
Distributed system:
C1
p
q
C2
State transitions
receive
A
B
(same for p and q):
send
Initial global state:
Ø
B
A
Ø
12
A
receive
C1
p
B
q
C2
send
Global state transition diagram
p
B
q
Ø
A
p sends
p
q
A
A
Ø
Ø
p receives
Ø
A
A
q receives
q sends
Ø
A
B
Ø
A computation corresponds to a path in the diagram
deterministic
13
Example:
System
p
Distributed system:
State transition:
p:
q:
C1
q
C2
A
C
send
receive
send
receive
B
D
14
A
send
B
receive
p
C1
q
C2
C
send
D
receive
Global state transition diagram
p
A
q
Ø
C
p sends
p
q
B
C
Ø
Ø
q receives
A
D
q sends
p receives
B
D
Ø
non-deterministic
15
A
send
B
receive
p
C1
q
C2
C
send
D
receive
We look at the following sequence of events:
p
A
q
Ø
C
p sends
p
q
B
C
Ø
Ø
q sends
A
D
p receives
B
D
Ø
16
In the snapshot algorithm:
Each process records its own state
p and q cooperate to record the state of C.
[Figure: channel C from p to q]
Example: System
[Figure: the single-token system, with the components recorded in the order C, q, p]
Recorded state: No token
Example: System
[Figure: the single-token system, with the components recorded in the order p, C, q]
Recorded state: Two tokens
[Figure: two timelines]
Record p, Record C, Record q: p’s state recorded, then p sends a message on C, then C’s state recorded
Record C, Record q, Record p: C’s state recorded, then p sends a message on C, then p’s state recorded
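The two uncoordinated recording orders can be replayed with a small Python sketch. It is only an illustration under simplifying assumptions: the single token is modeled as a count per component, p sends it on C right after the first component is recorded, and q has not yet received it when the remaining components are recorded. The function name `recorded_tokens` is hypothetical.

```python
def recorded_tokens(record_order):
    """Count the tokens in a global state recorded without coordination.

    record_order: the order in which the components "p", "C", "q" are recorded.
    """
    state = {"p": 1, "C": 0, "q": 0}  # p initially holds the single token
    recorded = {}
    first, *rest = record_order
    # The first component is recorded before p sends the token.
    recorded[first] = state[first]
    # p sends the token on channel C.
    state["p"], state["C"] = 0, 1
    # The remaining components are recorded after the send.
    for name in rest:
        recorded[name] = state[name]
    return sum(recorded.values())


print(recorded_tokens(["p", "C", "q"]))  # 2
print(recorded_tokens(["C", "q", "p"]))  # 0
```

Neither result is a state the real system can be in, since it always holds exactly one token. This is why the recording of p and of C has to be coordinated.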
[Figure: channel C from p to q]
q will record the state of C
p and q have to coordinate, using a special marker
q starts recording C after it records its state
q stops when it receives the marker from p
But: how does q know when to record its state?
The snapshot algorithm
Who starts?
We assume one process.
Homework: extend the discussion and the proof to any number
of starters.
(Intuition for the
Algorithm)
[Figure: channel C from p to q]
Who will record the state of channel C? q
q starts recording after it records its state
How does q know when to stop recording?
p sends a marker right after it records
its state, and before sending any other message
The snapshot algorithm
channel recording
[Figure: channel C from p to q]
Starts when q records itself
Ends when q receives the marker
along C
Note: for any q ≠ p0, the channel along
which the marker arrived first is recorded as empty (Ø)
The snapshot algorithm
State recording
p0 starts.
p0 records its state,
and then broadcasts the marker.
Shout-algorithm =
PI (Propagation-of-information) =
hot potato = …
When q receives the marker
for the first time, it
records its own state
The snapshot algorithm
Marker-Sending Rule for a process p
1. record the state of p
2. send a marker along each outgoing channel c before sending
any other message along c
Marker-Receiving Rule for a process q
on receiving a marker along channel c:
if q’s state is not recorded:
1. record q’s state (and apply the Marker-Sending Rule);
2. record c’s state = Ø (empty);
else: c’s state is the sequence of messages
received along c since q recorded its state
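The two marker rules can be tried out on a two-process system with the following Python sketch. It is a minimal simulation under stated assumptions, not the authors' implementation: the names `Process`, `connect`, `send` and `deliver` are hypothetical, the application state is a number, and each application message carries an amount that is subtracted at the sender and added at the receiver.

```python
from collections import deque

MARKER = object()  # special marker, distinct from every application message


class Process:
    def __init__(self, name, state):
        self.name = name
        self.state = state          # application state (here: a number)
        self.out = {}               # destination name -> outgoing FIFO channel
        self.inc = []               # names of processes with a channel into this one
        self.has_recorded = False
        self.recorded_state = None
        self.channel_state = {}     # source name -> messages recorded for that channel
        self.open = set()           # incoming channels still being recorded

    def record(self):
        """Marker-sending rule: record own state, then put a marker on every
        outgoing channel before any other message is sent."""
        self.recorded_state = self.state
        self.has_recorded = True
        self.open = set(self.inc)
        self.channel_state = {src: [] for src in self.inc}
        for channel in self.out.values():
            channel.append(MARKER)

    def receive(self, src, msg):
        if msg is MARKER:
            # Marker-receiving rule.
            if not self.has_recorded:
                self.record()        # the channel from src stays recorded as empty
            self.open.discard(src)   # stop recording the channel from src
        else:
            if self.has_recorded and src in self.open:
                self.channel_state[src].append(msg)
            self.state += msg        # application behavior: add the received amount


def connect(a, b):
    """Create a FIFO channel from process a to process b."""
    a.out[b.name] = deque()
    b.inc.append(a.name)


def send(src, dst, amount):
    src.state -= amount
    src.out[dst.name].append(amount)


def deliver(src, dst):
    """Deliver the message at the head of the channel from src to dst."""
    dst.receive(src.name, src.out[dst.name].popleft())


p, q = Process("p", 10), Process("q", 20)
connect(p, q)
connect(q, p)
send(p, q, 3)    # in transit on the channel p -> q
send(q, p, 5)    # in transit on the channel q -> p
p.record()       # p starts the snapshot
deliver(p, q)    # q receives 3 before the marker
deliver(p, q)    # q receives the marker and records its state
deliver(q, p)    # p receives 5 while recording the channel q -> p
deliver(q, p)    # p receives the marker and stops recording

total = (p.recorded_state + q.recorded_state
         + sum(p.channel_state["q"]) + sum(q.channel_state["p"]))
print(p.recorded_state, q.recorded_state, p.channel_state, q.channel_state, total)
```

In this run the recorded states are 7 for p and 18 for q, the channel from q to p is recorded as [5], and the channel from p to q as empty, so the recorded total of 30 equals the total in the real system even though no single instant of the run had exactly this state.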
Termination
Assumption: no marker remains forever in an
input channel
Claim: If the graph is strongly connected and
at least one process records its state,
then all processes will record their state
in finite time
Proof: by induction on the length of a path from a process that has recorded its state
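The Recorded Global State
Example: System
[Figure: processes p and q connected by channels C1 and C2]
State transitions:
p: [Figure: states A and B, with transitions labeled send and receive]
q: [Figure: states C and D, with transitions labeled send and receive]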
[Figure: the sequence of events p sends, q sends, p receives in the system above, with the global state after each event]
What did we get?
Event
An event e in process p is an atomic action that
can change the state of p, and
the state of at most one channel c incident on p
(by sending/receiving a message M along c)
e is defined by < p, s, s’, M, c >
e = < p, s, s’, M, c > may occur in global state S if
1. the state of p in S is s
2. if c is directed towards p then c’s state
has M at its head
Process State and Global State
A process: a set of states, an initial state, and a
set of events
A global state S: a collection of process states
and channel states
Initially, each process is in its initial state and all
channels are empty
next(S, e) is the global state after event e is
applied to global state S
Process State and Global State
seq = (e_i : i = 0…n) is a computation of
the system iff
e_i may occur in S_i, 0 ≤ i ≤ n
S_{i+1} = next(S_i, e_i)
(S_0 is the initial global state)
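The definitions of an event, of next(S, e) and of a computation can be written down directly in Python. The sketch below is an illustration under assumptions that are not in the slides: a global state is a pair (process states, channel contents), the topology table `CHANNELS` and the message names `M` and `M'` are hypothetical, and the example events follow the system in which p alternates between A and B and q between C and D.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical topology: channel name -> (sender, receiver)
CHANNELS = {"C1": ("p", "q"), "C2": ("q", "p")}


@dataclass(frozen=True)
class Event:
    p: str               # process in which the event occurs
    s: str               # state of p before the event
    s_next: str          # state of p after the event (s')
    M: Optional[str]     # message sent or received, or None
    c: Optional[str]     # channel affected, or None


def may_occur(S, e):
    """Check the two conditions under which e may occur in global state S."""
    procs, chans = S
    if procs[e.p] != e.s:
        return False
    if e.c is not None and CHANNELS[e.c][1] == e.p:  # c is directed towards p
        return len(chans[e.c]) > 0 and chans[e.c][0] == e.M
    return True


def next_state(S, e):
    """next(S, e): the global state after event e is applied to S."""
    procs, chans = S
    procs = {**procs, e.p: e.s_next}
    chans = {name: list(msgs) for name, msgs in chans.items()}
    if e.c is not None:
        if CHANNELS[e.c][1] == e.p:
            chans[e.c].pop(0)        # receive: remove M from the head of c
        else:
            chans[e.c].append(e.M)   # send: append M to the tail of c
    return procs, chans


def is_computation(S0, seq):
    """seq is a computation iff each event may occur in the state reached so far."""
    S = S0
    for e in seq:
        if not may_occur(S, e):
            return False
        S = next_state(S, e)
    return True


S0 = ({"p": "A", "q": "C"}, {"C1": [], "C2": []})
p_sends = Event("p", "A", "B", "M", "C1")
q_sends = Event("q", "C", "D", "M'", "C2")
p_receives = Event("p", "B", "A", "M'", "C2")

print(is_computation(S0, [p_sends, q_sends, p_receives]))  # True
print(is_computation(S0, [q_sends, p_sends, p_receives]))  # True
print(is_computation(S0, [p_receives]))                    # False
```

The first two sequences differ only in the order of two events in different processes, and both are computations. The third fails because p is not in state B and channel C2 is empty in the initial state.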
The Recorded Global State
seq = (e_i : i ≥ 0): a distributed computation
S_i: the state of the system right before e_i occurs
S_0: the initial state of the system
S_t: the state of the system at the termination of
the algorithm
S*: the recorded global state
Definition
Event e_j is called pre-recording if e_j is in a
process p and p records its state after e_j in seq.
Event e_j is called post-recording if e_j is in a
process p and p records its state before e_j in seq.
Assume that e_{j-1} is a post-recording event immediately before a
pre-recording event e_j in seq.
Claim: The sequence obtained by interchanging
e_{j-1} and e_j is a computation.
Proof: e_{j-1} occurs in p and e_j in q (other than p).
There cannot be a message sent at e_{j-1} and
received at e_j: the marker sent by p precedes that message on the FIFO channel, so q would have recorded its state before e_j.
Hence, event e_j can occur in global state S_{j-1}.
The state of process p is not altered by e_j,
hence e_{j-1} can occur after e_j.
Proof
Swap the events until all post-recording events
appear after all pre-recording events.
The resulting computation is seq’.
All that is left to show: S* is the global state
after all pre-recording events and before all
post-recording events.
1. Process states
2. Channel states
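The swapping step of the proof can be mimicked by a short Python sketch. It only reorders labels and does not check that the result is a computation (that is what the claim about interchanging adjacent events guarantees). The function name `reorder` and the event labels are hypothetical.

```python
def reorder(seq, is_pre):
    """Swap adjacent (post-recording, pre-recording) pairs until every
    pre-recording event precedes every post-recording event.

    Returns the reordered sequence and the number of swaps performed.
    """
    seq = list(seq)
    swaps = 0
    changed = True
    while changed:
        changed = False
        for j in range(1, len(seq)):
            # Only a post-recording event directly followed by a
            # pre-recording event is interchanged.
            if not is_pre(seq[j - 1]) and is_pre(seq[j]):
                seq[j - 1], seq[j] = seq[j], seq[j - 1]
                swaps += 1
                changed = True
    return seq, swaps


events = ["p sends", "q sends", "p receives"]
pre_recording = {"q sends"}  # assumed labeling: only "q sends" is pre-recording
print(reorder(events, lambda e: e in pre_recording))
# (['q sends', 'p sends', 'p receives'], 1)
```

Because only adjacent pairs are exchanged, the relative order of the pre-recording events among themselves, and of the post-recording events among themselves, is unchanged.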
Claim: The state of a channel in S* is
(sequence of messages corresponding to pre-recording
sends) \ (sequence of messages corresponding to
pre-recording receives)
Proof: The state of channel c from process
p to process q recorded in S* is the sequence
of messages received on c by q after q
records its state and before q receives a
marker on c.
The sequence of messages sent by p before the marker is the
sequence corresponding to pre-recording sends on c.
[Figure: the execution p sends, q sends, p receives in the example system; p sends is labeled post, q sends is labeled pre, p receives is labeled post]
(Another execution)
[Figure: the execution q sends, p sends, p receives in the example system; q sends is labeled pre, p sends is labeled post, p receives is labeled post]
What did we get?
A configuration that could have happened
(The Recorded Global State)
seq = (e_i : i ≥ 0): a distributed computation
S_i: the state of the system right before e_i occurs
S_0: the initial state of the system
S_t: the state of the system at the termination of
the algorithm
S*: the recorded global state
(The Recorded Global State)
Theorem: There exists a computation
seq' = (e'_j : j ≥ 0), where
1. For all j < i or j ≥ t: e'_j = e_j
2. The subsequence (e'_j | i ≤ j < t) is a
permutation of the subsequence
(e_j | i ≤ j < t)
3. For all j < i or j ≥ t: S'_j = S_j
4. There exists k, i ≤ k ≤ t, such that S* = S'_k
(Here S_i is the global state in which the algorithm is initiated and S_t the one in which it terminates.)
Stable Detection
D - distributed system
y - a predicate function defined on the set of
global states of D
S, S’ – global states of D
y is a stable property of D if y(S) implies y(S’)
for all S’ reachable from S
Algorithm
Input: A stable property y
Output: a boolean value b with the
property:
y(S_0) ⇒ b and b ⇒ y(S_t)
Algorithm:
begin
record a global state S*
b := y(S*)
end
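The detection algorithm is a thin wrapper around the snapshot, as the following Python sketch shows. It is an illustration only: `detect_stable` and `terminated` are hypothetical names, the snapshot is passed in as a function that returns a recorded global state, and the state is assumed to be a pair (process states, channel contents).

```python
def detect_stable(record_global_state, y):
    """Return b = y(S*), where S* is a recorded global state."""
    s_star = record_global_state()
    return y(s_star)


def terminated(S):
    """Hypothetical stable property: every process is idle and every channel is empty."""
    procs, chans = S
    return (all(state == "idle" for state in procs.values())
            and all(len(msgs) == 0 for msgs in chans.values()))


# Two hand-written recorded states standing in for the output of a snapshot.
quiet = ({"p": "idle", "q": "idle"}, {"C1": [], "C2": []})
busy = ({"p": "idle", "q": "idle"}, {"C1": ["M"], "C2": []})

print(detect_stable(lambda: quiet, terminated))  # True
print(detect_stable(lambda: busy, terminated))   # False
```

A message still in a channel keeps the predicate false even when both processes are idle, which is why the channel states have to be part of the recorded global state.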
Correctness
1. S* is reachable from S_0
2. S_t is reachable from S*
3. y(S) ⇒ y(S’)
for all S’ reachable from S
Hence y(S*) = true implies y(S_t) = true, and y(S*) = false implies y(S_0) = false.
[Figure: S_0, S* and S_t, each reachable from the previous one]
References
K. M. Chandy and L. Lamport, Distributed Snapshots: Determining Global States of Distributed Systems
Slide 12
©
Distributed Computing
5. Snapshot
Shmuel Zaks
[email protected]
The snapshot algorithm (Candy and Lamport)
2
3
4
Goal: design a snapshot (=global-statedetection) algorithm that:
will record a collection of states of all system
components (which forms a global system
state),
will not change the underlying computation,
will not freeze the underlying computation
5
A Process Can…
record its own state,
send and receive messages,
record messages it sends and receives,
cooperate with other processes
Processes do not share clocks or memory
Processes cannot record their state
precisely at the same instant
6
Motivation
Many problems in distributed systems
can be stated in terms of the problem of
detecting global states:
Stable property detection problems :
termination detection,
deadlock detection
etc.
Checkpointing
7
Stable Property Detection Problem
D - distributed system
y - a predicate function defined on the set
of global states of D
S, S’ – global states of D
y is stable if y(S) implies y(S’) for all S’
reachable from S
8
many distributed algorithms are
structured as a sequence of phases
A phase: transient part, then a stable part
phase termination vs. computation
termination
our view on the problem:
i. detect the termination of a phase
ii. initiate a new phase
Notice that “the kth phase has terminated” is
a stable property
9
Model
Distributed system D is a
finite, labeled, directed
graph.
C1
p
q
C2
Channels have infinite buffers, are errorfree and preserve FIFO
Message delay is bounded, but unknown
10
State of a Channel
C1
p
C1
p
3
2
1
q
C2
q
1
[1, 2, 3] – sequence X of messages that
were sent
[1] – sequence Y of received messages
( prefix of X )
[2, 3] – state of C1: X \ Y
11
Example:
System
Distributed system:
C1
p
q
C2
State transitions
receive
A
B
(same for p and q):
send
Initial global state:
Ø
B
A
Ø
12
A
receive
C1
p
B
q
C2
send
Global state transition diagram
p
B
q
Ø
A
p sends
p
q
A
A
Ø
Ø
p receives
Ø
A
A
q receives
q sends
Ø
A
B
Ø
A computation corresponds to a path in the diagram
deterministic
13
Example:
System
p
Distributed system:
State transition:
p:
q:
C1
q
C2
A
C
send
receive
send
receive
B
D
14
A
send
B
receive
p
C1
q
C2
C
send
D
receive
Global state transition diagram
p
A
q
Ø
C
p sends
p
q
B
C
Ø
Ø
q receives
A
D
q sends
p receives
B
D
Ø
non-deterministic
15
A
send
B
receive
p
C1
q
C2
C
send
D
receive
We look at the following sequence of events:
p
A
q
Ø
C
p sends
p
q
B
C
Ø
Ø
q sends
A
D
p receives
B
D
Ø
16
in the snapshot algorithm:
Each process records its own state
p and q cooperate to record the state of C.
p
C
q
17
A
receive
B
p
C1
q
C2
send
Example:
System
p
B
p
q
q
Ø
Recorded state:
No token
C
A
A
Record C
Ø
A
A
A
Record q
Record p
18
A
receive
B
p
C1
q
C2
send
Example:
System
p
B
C1
p
q
q
Ø
A
A
Ø
Recorded state:
Two tokens
A
Ø
B
Record p
A
Record C
Record q
19
Record p
Record C
Record q
P’s state
recorded
P sends a
message on C
C’s state
recorded
time
C’s state
recorded
P sends a
message on C
P’s state
recorded
Record C
Record q
Record p
20
p
C
q
q will record the state of C
p and q have to coordinate ; using a special marker
q starts recording C after it records its state
q stops when receiving
from p
But: how does q know when to record its state?
21
The snapshot algorithm
Who starts?
We assume one process.
Hw: extend discussion + proof to any number
of startes.
22
(Intuition for the
Algorithm)
p
C
q
Who will record the state of channel C? q
q starts recording after it records its state
How q knows when to stop recording?
p sends
right after it records
its state, and before sending any other message
23
The snapshot algorithm
channel recording
C
p
q
Starts when q records itself
Ends when q receives
along C
Note : for any q p0, the channel along
which
arrived first is recorded as
24
The snapshot algorithm
State recording
p0 starts.
p0 recoreds its state,
and then broadcasts
.
Shout-algorithm =
PI (Propogation-of-information)=
hot potato = …
When q receives
for the first time, it
records its own state
25
The snapshot algorithm
Marker-Sending Rule for a process q
1. record the state of p
2. send
along c before sending
any other message
Marker-Receiving Rule for a process q
on receiving
along channel c:
if q’s state is not recorded:
1. record state;
2. record c’s state = ;
else: c’s state is the sequence of messages
received since q recorded its state
26
Termination
Assumption No marker remains forever in an
input channel
Claim: If the graph is strongly connected and
at least one process records its state,
then all processes will record their state
in finite time
Proof: by induction
27
The Recorded Global State
Ex:
System
C1
p
q
C2
State transition:
p:
q:
A
C
send
receive
send
receive
B
D
28
A
send
B
receive
p
C1
q
C2
C
send
D
receive
A
D
p
A
q
C
p sends
p
q
B
A
C
q sends
A
D
p receives
B
D
29
What did we get?
30
e in process p is an atomic action:
can change the state of p, and
a state of at most one channel c incident on p
(by sending/receiving message M along c )
e is defined by < p, s, s’, M, c >
e =
may occur in global state S
if 1. the state of p in S is s
2. if c is directed towards p then c’s state
has M in its head
Event
31
Process State and Global State
A process: set of states, an initial state
set of events
A global state S: collection of process states
and channel states
initially, each process is in its initial state and all
channels are empty
next(S, e) is the global state after event e in
applied to global state S
32
Process State and Global State
seq = (ei : 0 ≤ i ≤ n) is a computation of
the system iff
ei may occur in Si , 0 ≤ i ≤ n
Si+1 = next(Si, ei)
(S0 is the initial global state)
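The definitions of event, next(S, e) and computation can be written out directly. The following is a minimal sketch, not the deck's own formalization: the `direction` field, the message names M and M', and the orientation of C1 (p to q) and C2 (q to p) are assumptions made for the example.

```python
from collections import namedtuple

# An event <p, s, s', M, c>. `direction` is an added field (an assumption of
# this sketch): "send" if c leaves p, "recv" if c enters p, None for an
# internal event that touches no channel.
Event = namedtuple("Event", "p s s_next M c direction")


def may_occur(S, e):
    """True if event e may occur in global state S = (process states, channel states)."""
    procs, chans = S
    if procs[e.p] != e.s:
        return False
    if e.direction == "recv":
        # c is directed towards p: M must be at the head of c.
        return len(chans[e.c]) > 0 and chans[e.c][0] == e.M
    return True


def next_state(S, e):
    """The global state next(S, e); S itself is left unchanged."""
    procs, chans = S
    procs = {**procs, e.p: e.s_next}
    chans = dict(chans)
    if e.direction == "send":
        chans[e.c] = chans[e.c] + (e.M,)   # append M at the tail of c
    elif e.direction == "recv":
        chans[e.c] = chans[e.c][1:]        # remove M from the head of c
    return procs, chans


def is_computation(S0, seq):
    """True if every event of seq may occur in the state reached so far."""
    S = S0
    for e in seq:
        if not may_occur(S, e):
            return False
        S = next_state(S, e)
    return True


# The two-process example: p has states A, B and q has states C, D.
S0 = ({"p": "A", "q": "C"}, {"C1": (), "C2": ()})
p_sends = Event("p", "A", "B", "M", "C1", "send")
q_sends = Event("q", "C", "D", "M'", "C2", "send")
p_receives = Event("p", "B", "A", "M'", "C2", "recv")
```

With these definitions both (p_sends, q_sends, p_receives) and (q_sends, p_sends, p_receives) pass is_computation, while a sequence that starts with p_receives does not, because p is not in state B and C2 is empty in S0.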
33
The Recorded Global State
seq = (ei: i ≥ 0) a distributed computation
Si – the state of the system right before ei occurs
S0 – the initial state of the system
St – the state of the system at the termination of
the algorithm
S* - the recorded global state
34
Definition
Event ej is called pre-recording if ej is in a
process p and p records its state after ej in seq.
Event ej is called post-recording if ej is in a
process p and p records its state before ej in seq.
Assume that ej-1 is a post-recording event that immediately
precedes a pre-recording event ej in seq.
35
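Claim: The sequence obtained by interchanging
ej-1 and ej is a computation.
Proof: ej-1 occurs in p and ej in q (other than p),
since within one process no post-recording event precedes a pre-recording event.
There cannot be a message sent at ej-1 and
received at ej: a message sent after p records its state follows p’s marker
on a FIFO channel, so q would have recorded its state before receiving it
and ej would be post-recording.
Hence, event ej can occur in global state Sj-1.
The state of process p is not altered by ej,
hence ej-1 can occur after ej.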
36
Proof
Swap the events until all post-recording events
appear after all pre-recording events.
The resulting computation is seq’.
All that is left to show: S* is the global state reached in seq’
after all pre-recording events and before all
post-recording events.
1. Process states
2. Channel states
37
Claim: The state of a channel in S* is
(sequence of messages corresponding to pre-recording sends) \
(sequence of messages corresponding to pre-recording receives)
Proof: The state of channel c from process
p to process q recorded in S* is the sequence
of messages received on c by q after q
records its state and before q receives a
marker on c.
The sequence of messages sent by p before the marker is the
sequence corresponding to pre-recording sends on c.
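One way to read the claim: on a FIFO channel the messages of pre-recording receives form a prefix of the messages of pre-recording sends, and the recorded channel state is whatever remains after that prefix. The helper below is a small sketch of this difference; the name channel_state and the list representation are assumptions for illustration.

```python
def channel_state(sent, received):
    """Messages sent on a FIFO channel that have not yet been received.

    `sent` lists the messages of pre-recording sends, in order;
    `received` lists the messages of pre-recording receives and must be
    a prefix of `sent`.
    """
    sent, received = list(sent), list(received)
    if sent[:len(received)] != received:
        raise ValueError("received is not a prefix of sent")
    return sent[len(received):]
```

For instance, with sent messages [1, 2, 3] and received messages [1], the channel state is [2, 3].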
38
[Figure: the same computation with its events marked as pre-recording (pre) or post-recording (post): "p sends", "q sends", "p receives"]
39
[Figure: (Another execution) the events reordered as "q sends", "p sends", "p receives", each marked pre or post]
40
What did we get?
A configuration that could have happened
41
(The Recorded Global State)
seq = (ei: i ≥ 0) a distributed computation
Si – the state of the system right before ei occurs
S0 – the initial state of the system
St – the state of the system at the termination of
the algorithm
S* - the recorded global state
42
(The Recorded Global State)
Theorem: There exists a computation
seq' = (ej' : j ≥ 0), where
1. For all j < i or j ≥ t : ej' = ej
2. The subsequence (ej' | i ≤ j < t) is a
permutation of the subsequence
(ej | i ≤ j < t)
3. For all j < i or j ≥ t : Sj' = Sj
4. There exists k, i ≤ k ≤ t, such that S* = Sk'
Stable Detection
D - distributed system
y - a predicate function defined on the set of
global states of D
S, S’ – global states of D
y is a stable property of D if y(S) implies y(S’)
for all S’ reachable from S
44
Algorithm
Input: A stable property y
Output: a boolean value b with the
property:
y(S0) ⇒ b and b ⇒ y(St)
Algorithm:
begin
record a global state S*
b := y(S*)
end
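The algorithm is short enough to sketch in code. Everything named here is an assumption for illustration: record_global_state stands for any routine that returns a recorded state S* (for example, one built from the marker rules), a global state is a pair (process states, channel states), and the predicate terminated assumes a model in which an idle process becomes active only by receiving a message, which is what makes the property stable.

```python
def detect_stable(record_global_state, y):
    """Record a global state S* and return b = y(S*)."""
    s_star = record_global_state()
    return y(s_star)


def terminated(state):
    """Example predicate: every process is idle and every channel is empty."""
    procs, chans = state
    all_idle = all(s == "idle" for s in procs.values())
    all_empty = all(len(msgs) == 0 for msgs in chans.values())
    return all_idle and all_empty


# Two hypothetical recorded states, standing in for real snapshots.
quiet = ({"p": "idle", "q": "idle"}, {"C1": [], "C2": []})
in_flight = ({"p": "idle", "q": "idle"}, {"C1": ["M"], "C2": []})
```

A result of True means the property holds at St; a result of False means only that it did not hold at S0, so a detector would typically record another snapshot later.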
45
Correctness
1. S* is reachable from S0
2. St is reachable from S*
3. y(S) ⇒ y(S’) for all S’ reachable from S
[Figure: S0 → S* → St, with y(S*) = true implying y(St) = true and y(S*) = false implying y(S0) = false]
46
References
K. M. Chandy and L. Lamport, Distributed Snapshots: Determining Global States of Distributed Systems
47
Slide 19
©
Distributed Computing
5. Snapshot
Shmuel Zaks
[email protected]
The snapshot algorithm (Candy and Lamport)
2
3
4
Goal: design a snapshot (=global-statedetection) algorithm that:
will record a collection of states of all system
components (which forms a global system
state),
will not change the underlying computation,
will not freeze the underlying computation
5
A Process Can…
record its own state,
send and receive messages,
record messages it sends and receives,
cooperate with other processes
Processes do not share clocks or memory
Processes cannot record their state
precisely at the same instant
6
Motivation
Many problems in distributed systems
can be stated in terms of the problem of
detecting global states:
Stable property detection problems :
termination detection,
deadlock detection
etc.
Checkpointing
7
Stable Property Detection Problem
D - distributed system
y - a predicate function defined on the set
of global states of D
S, S’ – global states of D
y is stable if y(S) implies y(S’) for all S’
reachable from S
8
many distributed algorithms are
structured as a sequence of phases
A phase: transient part, then a stable part
phase termination vs. computation
termination
our view on the problem:
i. detect the termination of a phase
ii. initiate a new phase
Notice that “the kth phase has terminated” is
a stable property
9
Model
Distributed system D is a
finite, labeled, directed
graph.
C1
p
q
C2
Channels have infinite buffers, are errorfree and preserve FIFO
Message delay is bounded, but unknown
10
State of a Channel
C1
p
C1
p
3
2
1
q
C2
q
1
[1, 2, 3] – sequence X of messages that
were sent
[1] – sequence Y of received messages
( prefix of X )
[2, 3] – state of C1: X \ Y
11
Example:
System
Distributed system:
C1
p
q
C2
State transitions
receive
A
B
(same for p and q):
send
Initial global state:
Ø
B
A
Ø
12
A
receive
C1
p
B
q
C2
send
Global state transition diagram
p
B
q
Ø
A
p sends
p
q
A
A
Ø
Ø
p receives
Ø
A
A
q receives
q sends
Ø
A
B
Ø
A computation corresponds to a path in the diagram
deterministic
13
Example:
System
p
Distributed system:
State transition:
p:
q:
C1
q
C2
A
C
send
receive
send
receive
B
D
14
A
send
B
receive
p
C1
q
C2
C
send
D
receive
Global state transition diagram
p
A
q
Ø
C
p sends
p
q
B
C
Ø
Ø
q receives
A
D
q sends
p receives
B
D
Ø
non-deterministic
15
A
send
B
receive
p
C1
q
C2
C
send
D
receive
We look at the following sequence of events:
p
A
q
Ø
C
p sends
p
q
B
C
Ø
Ø
q sends
A
D
p receives
B
D
Ø
16
in the snapshot algorithm:
Each process records its own state
p and q cooperate to record the state of C.
p
C
q
17
A
receive
B
p
C1
q
C2
send
Example:
System
p
B
p
q
q
Ø
Recorded state:
No token
C
A
A
Record C
Ø
A
A
A
Record q
Record p
18
Example: System
[Figure: the system with states A and B; recording order: Record p, then Record C, then Record q]
Recorded state: Two tokens
19
[Figure: two timelines of recording events]
Order Record p, Record C, Record q: P’s state is recorded, P sends a message on C, C’s state is recorded
Order Record C, Record q, Record p: C’s state is recorded, P sends a message on C, P’s state is recorded
20
[Figure: channel C from p to q]
q will record the state of C
p and q have to coordinate, using a special marker
q starts recording C after it records its state
q stops when receiving the marker from p
But: how does q know when to record its state?
21
The snapshot algorithm
Who starts?
We assume one process.
Homework: extend the discussion and proof to any number
of starters.
22
(Intuition for the Algorithm)
[Figure: channel C from p to q]
Who will record the state of channel C? q
q starts recording after it records its state
How does q know when to stop recording?
p sends a marker right after it records
its state, and before sending any other message
23
The snapshot algorithm
Channel recording
[Figure: channel C from p to q]
Starts when q records itself
Ends when q receives the marker along C
Note: for any q ≠ p0, the channel along which the marker
arrived first is recorded as empty (Ø)
The snapshot algorithm
State recording
p0 starts.
p0 records its state,
and then broadcasts the marker.
Shout-algorithm =
PI (Propagation-of-information) =
hot potato = …
When q receives the marker for the first time, it
records its own state
25
The snapshot algorithm
Marker-Sending Rule for a process p
1. record the state of p
2. on each outgoing channel c, send a marker
along c before sending any other message along c
Marker-Receiving Rule for a process q
on receiving a marker along channel c:
if q’s state is not recorded:
1. record q’s state (following the marker-sending rule);
2. record c’s state as empty (Ø);
else: c’s state is the sequence of messages
received along c since q recorded its state
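The two rules can be sketched as a small single-threaded simulation. This is only an illustration under stated assumptions: the `Process` class, the `MARKER` sentinel, the `connect` helper, and the token-passing application are hypothetical names, not taken from the slides.

```python
from collections import deque

MARKER = "MARKER"  # hypothetical sentinel value standing in for the marker


class Process:
    def __init__(self, name, tokens):
        self.name = name
        self.tokens = tokens        # application state: number of tokens held
        self.out = {}               # destination name -> outgoing FIFO channel
        self.inc = {}               # source name -> incoming FIFO channel
        self.recorded = False
        self.recorded_state = None
        self.open = {}              # source name -> messages recorded so far
        self.channel_state = {}     # source name -> final recorded state

    def record(self):
        """Marker-sending rule: record own state, then send a marker on
        every outgoing channel before any other message."""
        self.recorded = True
        self.recorded_state = self.tokens
        self.open = {src: [] for src in self.inc}
        for channel in self.out.values():
            channel.append(MARKER)

    def send(self, dst, amount):
        self.tokens -= amount
        self.out[dst].append(amount)

    def receive(self, src):
        msg = self.inc[src].popleft()
        if msg == MARKER:
            # Marker-receiving rule.
            if not self.recorded:
                self.record()
            # Empty if this marker triggered the recording; otherwise the
            # messages received on this channel since the state was recorded.
            self.channel_state[src] = self.open.pop(src)
        else:
            self.tokens += msg
            if src in self.open:
                self.open[src].append(msg)


def connect(a, b):
    """Create a FIFO channel from process a to process b."""
    channel = deque()
    a.out[b.name] = channel
    b.inc[a.name] = channel


p, q = Process("p", 1), Process("q", 0)
connect(p, q)
connect(q, p)

p.send("q", 1)      # the token is now in transit on the channel p -> q
q.record()          # q initiates the snapshot
q.receive("p")      # q receives the token; it is recorded as in transit
p.receive("q")      # p receives the marker, records, and sends its marker
q.receive("p")      # q receives the marker; recording of p -> q ends

print(p.recorded_state, q.recorded_state)   # 0 0
print(q.channel_state, p.channel_state)     # {'p': [1]} {'q': []}
```

In this run the recorded global state contains exactly one token, recorded as in transit on the channel from p to q.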
26
Termination
Assumption: No marker remains forever in an
input channel
Claim: If the graph is strongly connected and
at least one process records its state,
then all processes will record their state
in finite time
Proof: by induction on the distance from a process that has recorded its state
27
The Recorded Global State
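Example: System
[Figure: processes p and q, channels C1 and C2]
State transitions:
p: states A and B, with send and receive transitions
q: states C and D, with send and receive transitions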
28
[Figure: the system above executing the sequence of events: p sends, q sends, p receives]
What did we get?
30
Event
An event e in process p is an atomic action that:
can change the state of p, and
can change the state of at most one channel c incident on p
(by sending/receiving a message M along c)
e is defined by e = < p, s, s’, M, c >
e may occur in global state S if
1. the state of p in S is s, and
2. if c is directed towards p, then M is at the head
of c’s state in S
31
Process State and Global State
A process: a set of states, an initial state, and a
set of events
A global state S: a collection of process states
and channel states
Initially, each process is in its initial state and all
channels are empty
next(S, e) is the global state after event e is
applied to global state S
32
Process State and Global State
seq = (e_i : i = 0…n) is a computation of
the system iff
e_i may occur in S_i, 0 ≤ i ≤ n
S_{i+1} = next(S_i, e_i)
(S0 is the initial global state)
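The definitions of an event, of "may occur", of next(S, e), and of a computation can be sketched as follows. This is a sketch under assumptions: a channel is modeled as a hypothetical (sender, receiver) pair, a global state as a pair of dictionaries, and the message names "M1" and "M2" are invented for the example.

```python
from collections import namedtuple

# An event <p, s, s', M, c>; a channel c is a (sender, receiver) pair.
Event = namedtuple("Event", "p s s_next M c")


def may_occur(S, e):
    """Check the two conditions under which e may occur in S."""
    procs, chans = S
    if procs[e.p] != e.s:
        return False
    if e.c is not None and e.c[1] == e.p:   # c is directed towards p
        return len(chans[e.c]) > 0 and chans[e.c][0] == e.M
    return True


def next_state(S, e):
    """Return next(S, e) without modifying S."""
    procs, chans = S
    procs = {**procs, e.p: e.s_next}
    chans = dict(chans)
    if e.c is not None:
        if e.c[1] == e.p:                   # receive: remove M from the head
            chans[e.c] = chans[e.c][1:]
        else:                               # send: append M to the tail
            chans[e.c] = chans[e.c] + (e.M,)
    return procs, chans


def is_computation(S0, seq):
    """seq is a computation iff each e_i may occur in S_i."""
    S = S0
    for e in seq:
        if not may_occur(S, e):
            return False
        S = next_state(S, e)
    return True


C1, C2 = ("p", "q"), ("q", "p")
S0 = ({"p": "A", "q": "C"}, {C1: (), C2: ()})
p_sends = Event("p", "A", "B", "M1", C1)
q_sends = Event("q", "C", "D", "M2", C2)
p_receives = Event("p", "B", "A", "M2", C2)

print(is_computation(S0, [p_sends, q_sends, p_receives]))  # True
print(is_computation(S0, [p_receives, p_sends, q_sends]))  # False
```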
33
The Recorded Global State
seq = (e_i : i ≥ 0), a distributed computation
S_i – the state of the system right before e_i occurs
S0 – the initial state of the system
St – the state of the system at the termination of
the algorithm
S* - the recorded global state
34
Definition
Event e_j is called pre-recording if e_j is in a
process p and p records its state after e_j in seq.
Event e_j is called post-recording if e_j is in a
process p and p records its state before e_j in seq.
Assume that e_{j-1} is a post-recording event that immediately
precedes a pre-recording event e_j in seq.
35
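Claim: The sequence obtained by interchanging
e_{j-1} and e_j is a computation.
Proof: e_{j-1} occurs in p and e_j in q (other than p),
since within one process no post-recording event precedes a pre-recording event.
There cannot be a message sent at e_{j-1} and
received at e_j: p sent its marker before that message, so by FIFO
q would have recorded its state before e_j.
Hence, event e_j can occur in global state S_{j-1}.
The state of process p is not altered by e_j,
hence e_{j-1} can occur after e_j.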
36
Proof
Swap the events until all post-recording events
appear after all pre-recording events.
The resulting computation is seq’.
All that is left to show: S* is the global state
after all pre-recording events and before all
post-recording events.
1. Process states
2. Channel states
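The swapping step can be sketched as repeated exchanges of adjacent (post-recording, pre-recording) pairs. The function name `reorder` and the tuple encoding of events are hypothetical; the three-event example follows the sequence "p sends, q sends, p receives" used in these slides.

```python
def reorder(seq, is_pre):
    """Swap adjacent (post, pre) pairs until every pre-recording event
    precedes every post-recording event."""
    seq = list(seq)
    swapped = True
    while swapped:
        swapped = False
        for j in range(1, len(seq)):
            if not is_pre(seq[j - 1]) and is_pre(seq[j]):
                seq[j - 1], seq[j] = seq[j], seq[j - 1]
                swapped = True
    return seq


events = [("p sends", "post"), ("q sends", "pre"), ("p receives", "post")]
print(reorder(events, lambda e: e[1] == "pre"))
# [('q sends', 'pre'), ('p sends', 'post'), ('p receives', 'post')]
```

Only adjacent pairs are exchanged, so the relative order of the pre-recording events, and of the post-recording events, is preserved.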
37
Claim: The state of a channel c in S* is
(sequence of messages corresponding to pre-recording
sends on c) minus (sequence of messages corresponding to
pre-recording receives on c)
Proof: The state of channel c from process
p to process q recorded in S* is the sequence
of messages received on c by q after q
records its state and before q receives a
marker on c.
The sequence of messages sent by p before the marker is the
sequence corresponding to pre-recording sends on c.
38
[Figure: the same execution with events labeled: p sends (post-recording), q sends (pre-recording), p receives (post-recording)]
39
(Another execution)
[Figure: the reordered execution: q sends (pre-recording), p sends (post-recording), p receives (post-recording)]
40
What did we get?
A configuration that could have happened
41
(The Recorded Global State)
seq = (e_i : i ≥ 0), a distributed computation
S_i – the state of the system right before e_i occurs
S0 – the initial state of the system
St – the state of the system at the termination of
the algorithm
S* - the recorded global state
42
(The Recorded Global State)
Theorem: There exists a computation
seq' = (e_j' : j ≥ 0), where
(here i is the index of the state S_i in which the algorithm is initiated)
1. For all j < i or j ≥ t: e_j' = e_j
2. The subsequence (e_j' | i ≤ j < t) is a
permutation of the subsequence
(e_j | i ≤ j < t)
3. For all j < i or j ≥ t: S_j' = S_j
4. There exists k, i ≤ k ≤ t, such that S* = S_k'
Stable Detection
D - distributed system
y - a predicate function defined on the set of
global states of D
S, S’ – global states of D
y is a stable property of D if y(S) implies y(S’)
for all S’ reachable from S
44
Algorithm
Input: A stable property y
Output: a boolean value b with the
property:
y(S0) ⇒ b and b ⇒ y(St)
Algorithm:
begin
record a global state S*
b := y(S*)
end
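The algorithm above amounts to evaluating y on a recorded global state. In this minimal sketch the snapshot is supplied by a caller-provided function. The names `detect_stable` and `terminated`, and the dictionary layout of the snapshot, are assumptions made for illustration.

```python
def detect_stable(record_global_state, y):
    """Return b = y(S*), where S* is a recorded global state."""
    s_star = record_global_state()
    return y(s_star)


def terminated(state):
    # Hypothetical stable property: every process is passive and every
    # channel is empty.
    return (all(s == "passive" for s in state["processes"].values())
            and all(len(msgs) == 0 for msgs in state["channels"].values()))


snapshot = {
    "processes": {"p": "passive", "q": "passive"},
    "channels": {("p", "q"): [], ("q", "p"): []},
}
print(detect_stable(lambda: snapshot, terminated))  # True
```

A result of False does not mean the property will never hold; it only says y did not hold in S*, and therefore did not hold in S0 either.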
45
Correctness
1. S* is reachable from S0
2. St is reachable from S*
3. y(S) ⇒ y(S’) for all S’ reachable from S
[Figure: S0, then S*, then St. If y(S*) = true, then y(St) = true; if y(S*) = false, then y(S0) = false]
46
References
K. M. Chandy and L. Lamport,
Distributed Snapshots: Determining Global States of Distributed Systems,
ACM Transactions on Computer Systems 3(1), 1985, pp. 63–75
47
Slide 22
©
Distributed Computing
5. Snapshot
Shmuel Zaks
[email protected]
The snapshot algorithm (Candy and Lamport)
2
3
4
Goal: design a snapshot (=global-statedetection) algorithm that:
will record a collection of states of all system
components (which forms a global system
state),
will not change the underlying computation,
will not freeze the underlying computation
5
A Process Can…
record its own state,
send and receive messages,
record messages it sends and receives,
cooperate with other processes
Processes do not share clocks or memory
Processes cannot record their state
precisely at the same instant
6
Motivation
Many problems in distributed systems
can be stated in terms of the problem of
detecting global states:
Stable property detection problems :
termination detection,
deadlock detection
etc.
Checkpointing
7
Stable Property Detection Problem
D - distributed system
y - a predicate function defined on the set
of global states of D
S, S’ – global states of D
y is stable if y(S) implies y(S’) for all S’
reachable from S
8
many distributed algorithms are
structured as a sequence of phases
A phase: transient part, then a stable part
phase termination vs. computation
termination
our view on the problem:
i. detect the termination of a phase
ii. initiate a new phase
Notice that “the kth phase has terminated” is
a stable property
9
Model
Distributed system D is a
finite, labeled, directed
graph.
C1
p
q
C2
Channels have infinite buffers, are errorfree and preserve FIFO
Message delay is bounded, but unknown
10
State of a Channel
C1
p
C1
p
3
2
1
q
C2
q
1
[1, 2, 3] – sequence X of messages that
were sent
[1] – sequence Y of received messages
( prefix of X )
[2, 3] – state of C1: X \ Y
11
Example:
System
Distributed system:
C1
p
q
C2
State transitions
receive
A
B
(same for p and q):
send
Initial global state:
Ø
B
A
Ø
12
A
receive
C1
p
B
q
C2
send
Global state transition diagram
p
B
q
Ø
A
p sends
p
q
A
A
Ø
Ø
p receives
Ø
A
A
q receives
q sends
Ø
A
B
Ø
A computation corresponds to a path in the diagram
deterministic
13
Example:
System
p
Distributed system:
State transition:
p:
q:
C1
q
C2
A
C
send
receive
send
receive
B
D
14
A
send
B
receive
p
C1
q
C2
C
send
D
receive
Global state transition diagram
p
A
q
Ø
C
p sends
p
q
B
C
Ø
Ø
q receives
A
D
q sends
p receives
B
D
Ø
non-deterministic
15
A
send
B
receive
p
C1
q
C2
C
send
D
receive
We look at the following sequence of events:
p
A
q
Ø
C
p sends
p
q
B
C
Ø
Ø
q sends
A
D
p receives
B
D
Ø
16
in the snapshot algorithm:
Each process records its own state
p and q cooperate to record the state of C.
p
C
q
17
A
receive
B
p
C1
q
C2
send
Example:
System
p
B
p
q
q
Ø
Recorded state:
No token
C
A
A
Record C
Ø
A
A
A
Record q
Record p
18
A
receive
B
p
C1
q
C2
send
Example:
System
p
B
C1
p
q
q
Ø
A
A
Ø
Recorded state:
Two tokens
A
Ø
B
Record p
A
Record C
Record q
19
Record p
Record C
Record q
P’s state
recorded
P sends a
message on C
C’s state
recorded
time
C’s state
recorded
P sends a
message on C
P’s state
recorded
Record C
Record q
Record p
20
p
C
q
q will record the state of C
p and q have to coordinate ; using a special marker
q starts recording C after it records its state
q stops when receiving
from p
But: how does q know when to record its state?
21
The snapshot algorithm
Who starts?
We assume one process.
Hw: extend discussion + proof to any number
of startes.
22
(Intuition for the
Algorithm)
p
C
q
Who will record the state of channel C? q
q starts recording after it records its state
How q knows when to stop recording?
p sends
right after it records
its state, and before sending any other message
23
The snapshot algorithm
channel recording
C
p
q
Starts when q records itself
Ends when q receives
along C
Note : for any q p0, the channel along
which
arrived first is recorded as
24
The snapshot algorithm
State recording
p0 starts.
p0 recoreds its state,
and then broadcasts
.
Shout-algorithm =
PI (Propogation-of-information)=
hot potato = …
When q receives
for the first time, it
records its own state
25
The snapshot algorithm
Marker-Sending Rule for a process q
1. record the state of p
2. send
along c before sending
any other message
Marker-Receiving Rule for a process q
on receiving
along channel c:
if q’s state is not recorded:
1. record state;
2. record c’s state = ;
else: c’s state is the sequence of messages
received since q recorded its state
26
Termination
Assumption No marker remains forever in an
input channel
Claim: If the graph is strongly connected and
at least one process records its state,
then all processes will record their state
in finite time
Proof: by induction
27
The Recorded Global State
Ex:
System
C1
p
q
C2
State transition:
p:
q:
A
C
send
receive
send
receive
B
D
28
A
send
B
receive
p
C1
q
C2
C
send
D
receive
A
D
p
A
q
C
p sends
p
q
B
A
C
q sends
A
D
p receives
B
D
29
What did we get?
30
e in process p is an atomic action:
can change the state of p, and
a state of at most one channel c incident on p
(by sending/receiving message M along c )
e is defined by < p, s, s’, M, c >
e =
may occur in global state S
if 1. the state of p in S is s
2. if c is directed towards p then c’s state
has M in its head
Event
31
Process State and Global State
A process: set of states, an initial state
set of events
A global state S: collection of process states
and channel states
initially, each process is in its initial state and all
channels are empty
next(S, e) is the global state after event e in
applied to global state S
32
Process State and Global State
seq = (ei : i = 0…n) is a computation of
the system iff
ei may occur in Si , 0 i n
Si+1 = next(Si, ei)
(S0 is the initial global state)
33
The Recorded Global State
seq = (ei: i ≥ 0) a distributed computation
Si – the state of the system right before ei occurs
S0 – the initial state of the system
St – the state of the system at the termination of
the algorithm
S* - the recorded global state
34
Definition
Event ej is called pre-recording if ej is in a
process p and p records its state after ej in seq .
Event ej is called post-recording if ej is in a
process p and p records its state before ej in seq .
Assume that ej-1 is a post-recording event before
Pre-recording event ej in seq.
35
Claim: Sequence obtained by interchanging
ej-1 and ej is a computation.
Proof: ej-1 occurs in p and ej in q (other than p).
There cannot be a message sent at ej-1 and
received at ej.
Hence, event ej can occur in global state Sj-1.
The state of process p is not altered by ej,
hence ej-1 can occur after ej.
36
Proof
Swap the events till all post-recorded events
appear after all pre-recorded events.
The acquired computation is seq’.
All that is left to show: S* is a global state
after all prerecorded events and before all
postrecorded events.
1. Process states
2. Channel states
37
Claim: The state of a channel in S* is
(sequence of messages corresp. to pre-recorded
receives)(sequence of messages corresp. to
prerecorded sends)
Proof: The state of channel c from process
p to process q recorded in S* is the sequence
of messages received on c by q after q
records its state and before q receives a
marker on c.
The sequence of messages sent by p is the
sequence corres. to prerecording sends on c.
38
A
send
B
receive
p
C1
q
C2
C
send
D
receive
A
D
p
A
q
C
p sends
p
q
B
C
post
pre
q sends
A
D
p receives
post
B
D
39
A
send
B
receive
A
p
C1
q
C2
C
send
D
receive
D
p
A
(Another execution)
p
q
q
C
q sends
A
D
pre
post
p sends
A
D
p receives
post
A
B
D
40
What did we get?
A configuration that could have happened
41
(The Recorded Global State)
seq = (ei: i ≥ 0) a distributed computation
Si – the state of the system right before ei occurs
S0 – the initial state of the system
St – the state of the system at the termination of
the algorithm
S* - the recorded global state
42
(The Recorded Global State)
Theorem : There exists a computation
seq' = (ej' : j 0), where
1. For all j < i or j t : ej' = ej
2. The subsequence (ej' | i j < t) is a
permutation of the subsequence
(ej | i j < t)
3. For all j < i or j t : Sj' = Sj
4. There exists k, i k t, such that S * = Sk
Stable Detection
D - distributed system
y - a predicate function defined on the set of
global states of D
S, S’ – global states of D
y is a stable property of D if y(S) implies y(S’)
for all S’ reachable from S
44
Algorithm
Input: A stable property y
Output: a boolean value b with the
property:
y(S0)
b and b
y(St)
Algorithm:
begin
record a global state S*
b := y(S*)
end
45
Correctness
1. S* is reachable from S0
2. St is reachable from S*
3. y(S)
y(S’)
for all S’ reachable from S
y(S*)=true
S0
St
y(S0)=false
S*
y(S*)=false
y(St)=true
References
K. M. Chandy and L. Lamport, Distributed Snapshots: Determining Global States of Distributed Systems, ACM Transactions on Computer Systems 3(1), 1985, pp. 63-75.
Slide 26
©
Distributed Computing
5. Snapshot
Shmuel Zaks
[email protected]
The snapshot algorithm (Candy and Lamport)
2
3
4
Goal: design a snapshot (=global-statedetection) algorithm that:
will record a collection of states of all system
components (which forms a global system
state),
will not change the underlying computation,
will not freeze the underlying computation
5
A Process Can…
record its own state,
send and receive messages,
record messages it sends and receives,
cooperate with other processes
Processes do not share clocks or memory
Processes cannot record their state
precisely at the same instant
6
Motivation
Many problems in distributed systems
can be stated in terms of the problem of
detecting global states:
Stable property detection problems :
termination detection,
deadlock detection
etc.
Checkpointing
7
Stable Property Detection Problem
D - distributed system
y - a predicate function defined on the set
of global states of D
S, S’ – global states of D
y is stable if y(S) implies y(S’) for all S’
reachable from S
8
many distributed algorithms are
structured as a sequence of phases
A phase: transient part, then a stable part
phase termination vs. computation
termination
our view on the problem:
i. detect the termination of a phase
ii. initiate a new phase
Notice that “the kth phase has terminated” is
a stable property
9
Model
Distributed system D is a
finite, labeled, directed
graph.
C1
p
q
C2
Channels have infinite buffers, are errorfree and preserve FIFO
Message delay is bounded, but unknown
10
State of a Channel
C1
p
C1
p
3
2
1
q
C2
q
1
[1, 2, 3] – sequence X of messages that
were sent
[1] – sequence Y of received messages
( prefix of X )
[2, 3] – state of C1: X \ Y
11
Example:
System
Distributed system:
C1
p
q
C2
State transitions
receive
A
B
(same for p and q):
send
Initial global state:
Ø
B
A
Ø
12
A
receive
C1
p
B
q
C2
send
Global state transition diagram
p
B
q
Ø
A
p sends
p
q
A
A
Ø
Ø
p receives
Ø
A
A
q receives
q sends
Ø
A
B
Ø
A computation corresponds to a path in the diagram
deterministic
13
Example:
System
p
Distributed system:
State transition:
p:
q:
C1
q
C2
A
C
send
receive
send
receive
B
D
14
A
send
B
receive
p
C1
q
C2
C
send
D
receive
Global state transition diagram
p
A
q
Ø
C
p sends
p
q
B
C
Ø
Ø
q receives
A
D
q sends
p receives
B
D
Ø
non-deterministic
15
A
send
B
receive
p
C1
q
C2
C
send
D
receive
We look at the following sequence of events:
p
A
q
Ø
C
p sends
p
q
B
C
Ø
Ø
q sends
A
D
p receives
B
D
Ø
16
in the snapshot algorithm:
Each process records its own state
p and q cooperate to record the state of C.
p
C
q
17
A
receive
B
p
C1
q
C2
send
Example:
System
p
B
p
q
q
Ø
Recorded state:
No token
C
A
A
Record C
Ø
A
A
A
Record q
Record p
18
A
receive
B
p
C1
q
C2
send
Example:
System
p
B
C1
p
q
q
Ø
A
A
Ø
Recorded state:
Two tokens
A
Ø
B
Record p
A
Record C
Record q
19
Record p
Record C
Record q
P’s state
recorded
P sends a
message on C
C’s state
recorded
time
C’s state
recorded
P sends a
message on C
P’s state
recorded
Record C
Record q
Record p
20
p
C
q
q will record the state of C
p and q have to coordinate ; using a special marker
q starts recording C after it records its state
q stops when receiving
from p
But: how does q know when to record its state?
21
The snapshot algorithm
Who starts?
We assume one process.
Hw: extend discussion + proof to any number
of startes.
22
(Intuition for the
Algorithm)
p
C
q
Who will record the state of channel C? q
q starts recording after it records its state
How q knows when to stop recording?
p sends
right after it records
its state, and before sending any other message
23
The snapshot algorithm
channel recording
C
p
q
Starts when q records itself
Ends when q receives
along C
Note : for any q p0, the channel along
which
arrived first is recorded as
24
The snapshot algorithm
State recording
p0 starts.
p0 recoreds its state,
and then broadcasts
.
Shout-algorithm =
PI (Propogation-of-information)=
hot potato = …
When q receives
for the first time, it
records its own state
25
The snapshot algorithm
Marker-Sending Rule for a process q
1. record the state of p
2. send
along c before sending
any other message
Marker-Receiving Rule for a process q
on receiving
along channel c:
if q’s state is not recorded:
1. record state;
2. record c’s state = ;
else: c’s state is the sequence of messages
received since q recorded its state
26
Termination
Assumption No marker remains forever in an
input channel
Claim: If the graph is strongly connected and
at least one process records its state,
then all processes will record their state
in finite time
Proof: by induction
27
The Recorded Global State
Ex:
System
C1
p
q
C2
State transition:
p:
q:
A
C
send
receive
send
receive
B
D
28
A
send
B
receive
p
C1
q
C2
C
send
D
receive
A
D
p
A
q
C
p sends
p
q
B
A
C
q sends
A
D
p receives
B
D
29
What did we get?
30
e in process p is an atomic action:
can change the state of p, and
a state of at most one channel c incident on p
(by sending/receiving message M along c )
e is defined by < p, s, s’, M, c >
e =
may occur in global state S
if 1. the state of p in S is s
2. if c is directed towards p then c’s state
has M in its head
Event
31
Process State and Global State
A process: set of states, an initial state
set of events
A global state S: collection of process states
and channel states
initially, each process is in its initial state and all
channels are empty
next(S, e) is the global state after event e in
applied to global state S
32
Process State and Global State
seq = (ei : i = 0…n) is a computation of
the system iff
ei may occur in Si , 0 i n
Si+1 = next(Si, ei)
(S0 is the initial global state)
33
The Recorded Global State
seq = (ei: i ≥ 0) a distributed computation
Si – the state of the system right before ei occurs
S0 – the initial state of the system
St – the state of the system at the termination of
the algorithm
S* - the recorded global state
34
Definition
Event ej is called pre-recording if ej is in a
process p and p records its state after ej in seq.
Event ej is called post-recording if ej is in a
process p and p records its state before ej in seq.
Assume that ej-1 is a post-recording event immediately before
a pre-recording event ej in seq.
35
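Claim: The sequence obtained by interchanging
ej-1 and ej is a computation.
Proof: ej-1 occurs in p and ej in q (other than p),
since within one process no post-recording event precedes a pre-recording one.
There cannot be a message sent at ej-1 and
received at ej: on a FIFO channel the marker would arrive
before that message, making ej post-recording.
Hence, event ej can occur in global state Sj-1.
The state of process p is not altered by ej,
hence ej-1 can occur after ej.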
36
Proof
Swap the events until all post-recording events
appear after all pre-recording events.
The resulting computation is seq’.
All that is left to show: S* is the global state of seq’
after all pre-recording events and before all
post-recording events.
1. Process states
2. Channel states
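The swapping step can be sketched as repeated exchanges of an adjacent (post-recording, pre-recording) pair. The event labels and the `reorder` helper below are hypothetical; the sketch only rearranges labels and does not re-check that each intermediate sequence is a computation, which is what the claim above establishes.

```python
def reorder(seq, is_pre):
    """Swap adjacent (post, pre) pairs until every pre-recording event
    comes before every post-recording event."""
    seq = list(seq)
    swapped = True
    while swapped:
        swapped = False
        for j in range(1, len(seq)):
            if not is_pre(seq[j - 1]) and is_pre(seq[j]):
                seq[j - 1], seq[j] = seq[j], seq[j - 1]
                swapped = True
    return seq


# Illustrative labelling: (event, kind)
seq = [("p sends", "post"), ("q sends", "pre"), ("p receives", "post")]
seq_prime = reorder(seq, lambda e: e[1] == "pre")
print([name for name, _ in seq_prime])
```

Only adjacent pairs are exchanged, so the relative order of the pre-recording events among themselves, and of the post-recording events among themselves, is unchanged.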
37
Claim: The state of a channel in S* is
(sequence of messages corresponding to pre-recording
sends on the channel) excluding (sequence of messages
corresponding to pre-recording receives on the channel)
Proof: The state of channel c from process
p to process q recorded in S* is the sequence
of messages received on c by q after q
records its state and before q receives a
marker on c.
The sequence of messages sent by p before the marker is the
sequence corresponding to pre-recording sends on c.
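On a FIFO channel the received messages form a prefix of the sent messages, so the channel state is simply what remains after that prefix. A minimal sketch, with a hypothetical helper name and illustrative message values:

```python
def channel_state(sent, received):
    """Messages still in transit on a FIFO channel: sent but not yet received."""
    if sent[:len(received)] != received:
        raise ValueError("received must be a prefix of sent on a FIFO channel")
    return sent[len(received):]


print(channel_state([1, 2, 3], [1]))  # prints [2, 3]
```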
38
[Figure: the same execution with its events labelled: "p sends" (post-recording), "q sends" (pre-recording), "p receives" (post-recording).]
39
(Another execution)
[Figure: the reordered execution: "q sends" (pre-recording), "p sends" (post-recording), "p receives" (post-recording).]
40
What did we get?
A configuration that could have happened
41
(The Recorded Global State)
seq = (ei: i ≥ 0) a distributed computation
Si – the state of the system right before ei occurs
S0 – the initial state of the system
St – the state of the system at the termination of
the algorithm
S* - the recorded global state
42
(The Recorded Global State)
Theorem: There exists a computation
seq' = (ej' : j ≥ 0), where
1. For all j < i or j ≥ t : ej' = ej
2. The subsequence (ej' | i ≤ j < t) is a
permutation of the subsequence
(ej | i ≤ j < t)
3. For all j < i or j ≥ t : Sj' = Sj
4. There exists k, i ≤ k ≤ t, such that S* = Sk'
(i and t are the indices at which the algorithm is initiated and terminates)
Stable Detection
D - distributed system
y - a predicate function defined on the set of
global states of D
S, S’ – global states of D
y is a stable property of D if y(S) implies y(S’)
for all S’ reachable from S
44
Algorithm
Input: A stable property y
Output: a boolean value b with the
property:
y(S0) → b and b → y(St)
Algorithm:
begin
record a global state S*
b := y(S*)
end
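The algorithm is a one-liner once a snapshot procedure is available. In the sketch below, `detect`, `terminated`, and `fake_snapshot` are hypothetical names; `fake_snapshot` is a stand-in that returns a fixed recorded state rather than running a real snapshot algorithm, and `terminated` is stable only under the assumption that an idle process becomes active solely by receiving a message.

```python
def detect(y, record_global_state):
    """Return b = y(S*), where S* is a recorded global state."""
    S_star = record_global_state()
    return y(S_star)


def terminated(S):
    """Example property: every process is idle and no message is in transit."""
    process_states, channel_states = S
    return (all(s == "idle" for s in process_states.values())
            and all(len(msgs) == 0 for msgs in channel_states.values()))


def fake_snapshot():
    # Stand-in for a real snapshot algorithm: returns a fixed recorded state.
    return ({"p": "idle", "q": "idle"}, {"C1": [], "C2": []})


b = detect(terminated, fake_snapshot)
print(b)
```

If `b` is false, nothing is concluded about the final state; the detection is simply repeated later with a fresh snapshot.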
45
Correctness
1. S* is reachable from S0
2. St is reachable from S*
3. y(S) → y(S’)
for all S’ reachable from S
S0 → S* → St:
if y(S*) = true then y(St) = true
if y(S*) = false then y(S0) = false
46
References
K. M. Chandy and L. Lamport, "Distributed Snapshots: Determining Global States of Distributed Systems", ACM Transactions on Computer Systems 3(1), 1985, pp. 63-75.
47
Slide 30
©
Distributed Computing
5. Snapshot
Shmuel Zaks
[email protected]
The snapshot algorithm (Candy and Lamport)
2
3
4
Goal: design a snapshot (=global-statedetection) algorithm that:
will record a collection of states of all system
components (which forms a global system
state),
will not change the underlying computation,
will not freeze the underlying computation
5
A Process Can…
record its own state,
send and receive messages,
record messages it sends and receives,
cooperate with other processes
Processes do not share clocks or memory
Processes cannot record their state
precisely at the same instant
6
Motivation
Many problems in distributed systems
can be stated in terms of the problem of
detecting global states:
Stable property detection problems :
termination detection,
deadlock detection
etc.
Checkpointing
7
Stable Property Detection Problem
D - distributed system
y - a predicate function defined on the set
of global states of D
S, S’ – global states of D
y is stable if y(S) implies y(S’) for all S’
reachable from S
8
many distributed algorithms are
structured as a sequence of phases
A phase: transient part, then a stable part
phase termination vs. computation
termination
our view on the problem:
i. detect the termination of a phase
ii. initiate a new phase
Notice that “the kth phase has terminated” is
a stable property
9
Model
Distributed system D is a
finite, labeled, directed
graph.
C1
p
q
C2
Channels have infinite buffers, are errorfree and preserve FIFO
Message delay is bounded, but unknown
10
State of a Channel
C1
p
C1
p
3
2
1
q
C2
q
1
[1, 2, 3] – sequence X of messages that
were sent
[1] – sequence Y of received messages
( prefix of X )
[2, 3] – state of C1: X \ Y
11
Example:
System
Distributed system:
C1
p
q
C2
State transitions
receive
A
B
(same for p and q):
send
Initial global state:
Ø
B
A
Ø
12
A
receive
C1
p
B
q
C2
send
Global state transition diagram
p
B
q
Ø
A
p sends
p
q
A
A
Ø
Ø
p receives
Ø
A
A
q receives
q sends
Ø
A
B
Ø
A computation corresponds to a path in the diagram
deterministic
13
Example:
System
p
Distributed system:
State transition:
p:
q:
C1
q
C2
A
C
send
receive
send
receive
B
D
14
A
send
B
receive
p
C1
q
C2
C
send
D
receive
Global state transition diagram
p
A
q
Ø
C
p sends
p
q
B
C
Ø
Ø
q receives
A
D
q sends
p receives
B
D
Ø
non-deterministic
15
A
send
B
receive
p
C1
q
C2
C
send
D
receive
We look at the following sequence of events:
p
A
q
Ø
C
p sends
p
q
B
C
Ø
Ø
q sends
A
D
p receives
B
D
Ø
16
in the snapshot algorithm:
Each process records its own state
p and q cooperate to record the state of C.
p
C
q
17
A
receive
B
p
C1
q
C2
send
Example:
System
p
B
p
q
q
Ø
Recorded state:
No token
C
A
A
Record C
Ø
A
A
A
Record q
Record p
18
A
receive
B
p
C1
q
C2
send
Example:
System
p
B
C1
p
q
q
Ø
A
A
Ø
Recorded state:
Two tokens
A
Ø
B
Record p
A
Record C
Record q
19
[Figure: two timelines]
Record p, Record C, Record q: P’s state is recorded, then P sends a message on C, then C’s state is recorded.
Record C, Record q, Record p: C’s state is recorded, then P sends a message on C, then P’s state is recorded.
20
[Figure: channel C from p to q]
q will record the state of C
p and q have to coordinate, using a special marker
q starts recording C after it records its own state
q stops when it receives the marker from p
But: how does q know when to record its state?
21
The snapshot algorithm
Who starts?
We assume a single process starts the algorithm.
Homework: extend the discussion and proof to any number
of starters.
22
(Intuition for the Algorithm)
[Figure: channel C from p to q]
Who will record the state of channel C? q
q starts recording after it records its own state
How does q know when to stop recording?
p sends a marker right after it records
its state, and before sending any other message
23
The snapshot algorithm
Channel recording
[Figure: channel C from p to q]
Starts when q records its own state
Ends when q receives the marker along C
Note: for any q ≠ p0, the channel along
which the marker arrived first is recorded as empty (Ø)
24
The snapshot algorithm
State recording
p0 starts.
p0 records its state,
and then broadcasts the marker.
Shout-algorithm =
PI (Propagation-of-information) =
hot potato = …
When q receives the marker
for the first time, it
records its own state
25
The snapshot algorithm
Marker-Sending Rule for a process p
1. record the state of p
2. send a marker along each outgoing channel c
before sending any other message along c
Marker-Receiving Rule for a process q
on receiving a marker along channel c:
if q’s state is not recorded:
1. record q’s state (following the Marker-Sending Rule);
2. record c’s state = Ø;
else: c’s state is the sequence of messages
received along c since q recorded its state
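The two rules can be tried out on a small simulation. This is a minimal sketch, not the lecture's code: the class `Process`, the helpers `send_token` and `deliver`, the `MARKER` constant, and the token-passing application are all assumptions chosen for the illustration. Channels are FIFO queues keyed by (sender, receiver).

```python
from collections import deque

MARKER = "MARKER"  # stands for the special marker message

class Process:
    def __init__(self, name, tokens, incoming):
        self.name = name
        self.tokens = tokens            # application state: tokens held
        self.recorded_state = None      # process state in the snapshot
        self.channel_state = {}         # finished channel recordings
        self.recording = {}             # channels still being recorded
        self.incoming = incoming        # names of processes that send to us

    def record(self, net):
        """Marker-sending rule: record own state, then send markers."""
        self.recorded_state = self.tokens
        self.recording = {src: [] for src in self.incoming}
        for (src, dst), chan in net.items():
            if src == self.name:
                chan.append(MARKER)

    def receive(self, src, msg, net):
        if msg == MARKER:
            # Marker-receiving rule.
            if self.recorded_state is None:
                self.record(net)
            # Empty if this marker triggered the recording, otherwise the
            # messages received on this channel since the state was recorded.
            self.channel_state[src] = self.recording.pop(src)
        else:
            self.tokens += msg
            if src in self.recording:
                self.recording[src].append(msg)

def send_token(net, procs, src, dst):
    procs[src].tokens -= 1
    net[(src, dst)].append(1)

def deliver(net, procs, src, dst):
    procs[dst].receive(src, net[(src, dst)].popleft(), net)

net = {("p", "q"): deque(), ("q", "p"): deque()}
procs = {"p": Process("p", 1, ["q"]), "q": Process("q", 0, ["p"])}

procs["q"].record(net)            # q initiates the snapshot
send_token(net, procs, "p", "q")  # p sends the token before the marker arrives
deliver(net, procs, "q", "p")     # p receives the marker and records its state
deliver(net, procs, "p", "q")     # q receives the token (recorded as in transit)
deliver(net, procs, "p", "q")     # q receives p's marker and stops recording

snapshot_total = (
    sum(p.recorded_state for p in procs.values())
    + sum(sum(msgs) for p in procs.values() for msgs in p.channel_state.values())
)
print(snapshot_total)
```

In this run both processes are recorded without the token, and the token shows up in the recorded state of the channel from p to q, so the snapshot contains exactly one token.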
26
Termination
Assumption: No marker remains forever in an
input channel
Claim: If the graph is strongly connected and
at least one process records its state,
then all processes will record their state
in finite time
Proof: by induction on the distance from a process that has recorded its state
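The role of strong connectivity can be seen with a breadth-first traversal that follows markers along outgoing channels. This is only a sketch under stated assumptions: `recording_rounds` and the two example graphs are hypothetical, and the hop count is the induction parameter, not real time.

```python
from collections import deque

def recording_rounds(out_edges, initiator):
    """Return {process: hop count at which the first marker can reach it}."""
    rounds = {initiator: 0}
    queue = deque([initiator])
    while queue:
        u = queue.popleft()
        for v in out_edges.get(u, ()):
            if v not in rounds:          # first marker: v records its state
                rounds[v] = rounds[u] + 1
                queue.append(v)
    return rounds

ring = {"p0": ["p1"], "p1": ["p2"], "p2": ["p0"]}   # strongly connected
print(recording_rounds(ring, "p0"))

chain = {"p0": ["p1"], "p1": [], "p2": ["p1"]}      # not strongly connected
print(recording_rounds(chain, "p0"))
```

In the ring every process is reached; in the second graph p2 never receives a marker, so it never records its state.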
27
The Recorded Global State
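Example: System
[Figure: processes p and q connected by channels C1 and C2]
State transitions: [Figure: p has states A and B, q has states C and D; each process has one send transition and one receive transition]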
28
[Figure: the example system and the computation p sends, q sends, p receives]
29
What did we get?
30
Event
An event e in process p is an atomic action that
can change the state of p, and
the state of at most one channel c incident on p
(by sending or receiving a message M along c)
e is defined by <p, s, s’, M, c>
e = <p, s, s’, M, c> may occur in global state S if
1. the state of p in S is s
2. if c is directed towards p, then M is at the
head of c’s state
31
Process State and Global State
A process: a set of states, an initial state,
and a set of events
A global state S: a collection of process states
and channel states
Initially, each process is in its initial state and all
channels are empty
next(S, e) is the global state after event e is
applied to global state S
32
Process State and Global State
seq = (ei : 0 ≤ i ≤ n) is a computation of
the system iff, for all 0 ≤ i ≤ n:
ei may occur in Si
Si+1 = next(Si, ei)
(S0 is the initial global state)
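The definitions of an event, of next(S, e), and of a computation translate directly into code. The sketch below is an illustration only: the names `Event`, `may_occur`, `next_state`, and `is_computation`, the message names "m1" and "m2", and the channel directions (C1 from p to q, C2 from q to p) are assumptions.

```python
from collections import namedtuple

# e = <p, s, s', M, c>; c is a (sender, receiver) pair, or None for an
# internal event that touches no channel.
Event = namedtuple("Event", "p s s2 M c")

def may_occur(S, e):
    """S is (process_states, channel_states); both are dicts."""
    procs, chans = S
    if procs[e.p] != e.s:
        return False
    if e.c is not None and e.c[1] == e.p:      # c is directed towards p
        return len(chans[e.c]) > 0 and chans[e.c][0] == e.M
    return True

def next_state(S, e):
    procs, chans = dict(S[0]), dict(S[1])
    procs[e.p] = e.s2
    if e.c is not None:
        if e.c[1] == e.p:                      # receive: remove M from the head
            chans[e.c] = chans[e.c][1:]
        else:                                  # send: append M at the tail
            chans[e.c] = chans[e.c] + (e.M,)
    return procs, chans

def is_computation(S0, seq):
    S = S0
    for e in seq:
        if not may_occur(S, e):
            return False
        S = next_state(S, e)
    return True

C1, C2 = ("p", "q"), ("q", "p")
S0 = ({"p": "A", "q": "C"}, {C1: (), C2: ()})
p_sends = Event("p", "A", "B", "m1", C1)
q_sends = Event("q", "C", "D", "m2", C2)
p_receives = Event("p", "B", "A", "m2", C2)

print(is_computation(S0, [p_sends, q_sends, p_receives]))
print(is_computation(S0, [p_receives, p_sends, q_sends]))
```

The first sequence is a computation; the second is not, because p cannot receive while it is in state A and channel C2 is empty.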
33
The Recorded Global State
seq = (ei: i ≥ 0) a distributed computation
Si – the state of the system right before ei occurs
S0 – the initial state of the system
St – the state of the system at the termination of
the algorithm
S* - the recorded global state
34
Definition
Event ej is called pre-recording if ej is in a
process p and p records its state after ej in seq .
Event ej is called post-recording if ej is in a
process p and p records its state before ej in seq .
Assume that ej-1 is a post-recording event that immediately precedes
a pre-recording event ej in seq.
35
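Claim: The sequence obtained by interchanging
ej-1 and ej is a computation.
Proof: ej-1 occurs in p and ej in q (other than p),
since a process cannot have a post-recording event before a pre-recording one.
There cannot be a message sent at ej-1 and
received at ej: p sent its markers before ej-1, so by FIFO
q would have received the marker, and recorded its state, before ej.
Hence, event ej can occur in global state Sj-1.
The state of process p is not altered by ej,
hence ej-1 can occur after ej.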
36
Proof
Swap such adjacent events until all post-recording events
appear after all pre-recording events.
The resulting computation is seq’.
All that is left to show: S* is the global state
after all pre-recording events and before all
post-recording events.
1. Process states
2. Channel states
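The swapping step of the proof can be sketched as repeated adjacent exchanges. The function `reorder` and the (label, kind) encoding of events are assumptions made for this illustration; the code does not check that each swap is legal, since that is what the claim about interchanging ej-1 and ej guarantees.

```python
def reorder(seq):
    """Swap adjacent (post, pre) pairs until no such pair remains.

    Each element is (label, kind) with kind "pre" or "post".
    Only adjacent swaps are used, as in the proof.
    """
    seq = list(seq)
    swapped = True
    while swapped:
        swapped = False
        for j in range(1, len(seq)):
            if seq[j - 1][1] == "post" and seq[j][1] == "pre":
                seq[j - 1], seq[j] = seq[j], seq[j - 1]
                swapped = True
    return seq

seq = [("p sends", "post"), ("q sends", "pre"), ("p receives", "post")]
print(reorder(seq))
```

For this input the result is q sends, p sends, p receives: the pre-recording event moves to the front and the two post-recording events keep their relative order.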
37
Claim: The state of a channel in S* is
(sequence of messages corresponding to pre-recording
sends) \ (sequence of messages corresponding to
pre-recording receives)
Proof: The state of channel c from process
p to process q recorded in S* is the sequence
of messages received on c by q after q
records its state and before q receives a
marker on c.
The sequence of messages sent by p before the marker is the
sequence corresponding to pre-recording sends on c.
38
[Figure: the computation p sends (post), q sends (pre), p receives (post) on the example system]
39
(Another execution)
[Figure: the computation q sends (pre), p sends (post), p receives (post) on the example system]
40
What did we get?
A configuration that could have happened
41
(The Recorded Global State)
seq = (ei: i ≥ 0) a distributed computation
Si – the state of the system right before ei occurs
S0 – the initial state of the system
St – the state of the system at the termination of
the algorithm
S* - the recorded global state
42
(The Recorded Global State)
Theorem: There exists a computation
seq' = (ej' : j ≥ 0), where
1. For all j < i or j ≥ t: ej' = ej
2. The subsequence (ej' | i ≤ j < t) is a
permutation of the subsequence
(ej | i ≤ j < t)
3. For all j < i or j ≥ t: Sj' = Sj
4. There exists k, i ≤ k ≤ t, such that S* = Sk'
Here Si is the global state in which the algorithm is initiated and St the one in which it terminates.
Stable Detection
D - distributed system
y - a predicate function defined on the set of
global states of D
S, S’ – global states of D
y is a stable property of D if y(S) implies y(S’)
for all S’ reachable from S
44
Algorithm
Input: A stable property y
Output: a boolean value b with the
property:
y(S0) implies b, and b implies y(St)
Algorithm:
begin
record a global state S*
b := y(S*)
end
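The algorithm is short enough to sketch directly. In this illustration `detect`, `terminated`, and the hard-coded snapshot `fake_snapshot` are hypothetical; a real system would obtain S* from the snapshot algorithm rather than from a constant.

```python
def detect(record_global_state, y):
    """b := y(S*), where S* is the recorded global state."""
    s_star = record_global_state()
    return y(s_star)

def terminated(state):
    """Example stable property: all processes passive, all channels empty."""
    procs, chans = state
    return all(p == "passive" for p in procs.values()) and \
           all(len(msgs) == 0 for msgs in chans.values())

# Stand-in for a recorded global state S* (assumed, not computed here).
fake_snapshot = lambda: ({"p": "passive", "q": "passive"},
                         {("p", "q"): [], ("q", "p"): []})
print(detect(fake_snapshot, terminated))
```

Because the property is stable, a result of true on S* also holds in St, the state at the termination of the algorithm.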
45
Correctness
1. S* is reachable from S0
2. St is reachable from S*
3. y(S) implies y(S’) for all S’ reachable from S
[Figure: S0, S* and St in order of reachability; if y(S*)=true then y(St)=true, and if y(S*)=false then y(S0)=false]
46
References
K. M. Chandy and L. Lamport,
Distributed Snapshots:
Determining Global States of Distributed
Systems
47
Distributed Computing
5. Snapshot
Shmuel Zaks
The snapshot algorithm (Chandy and Lamport)
2
3
4
Goal: design a snapshot (= global-state-detection) algorithm that:
will record a collection of states of all system components (which forms a global system state),
will not change the underlying computation,
will not freeze the underlying computation
5
A Process Can…
record its own state,
send and receive messages,
record messages it sends and receives,
cooperate with other processes
Processes do not share clocks or memory
Processes cannot record their state
precisely at the same instant
6
Motivation
Many problems in distributed systems
can be stated in terms of the problem of
detecting global states:
Stable property detection problems :
termination detection,
deadlock detection
etc.
Checkpointing
7
Stable Property Detection Problem
D - distributed system
y - a predicate function defined on the set
of global states of D
S, S’ – global states of D
y is stable if y(S) implies y(S’) for all S’
reachable from S
8
Many distributed algorithms are structured as a sequence of phases
A phase: a transient part, then a stable part
Phase termination vs. computation termination
Our view on the problem:
i. detect the termination of a phase
ii. initiate a new phase
Notice that “the kth phase has terminated” is a stable property
9
Model
Distributed system D is a finite, labeled, directed graph.
[Figure: processes p and q joined by channels C1 and C2]
Channels have infinite buffers, are error-free and preserve FIFO order
Message delay is bounded, but unknown
10
State of a Channel
[Figure: messages 1, 2, 3 sent on channel C1 between p and q; message 1 already received]
[1, 2, 3] – sequence X of messages that were sent
[1] – sequence Y of received messages (a prefix of X)
[2, 3] – state of C1: X \ Y
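The definition X \ Y can be sketched in a few lines of Python. This is only an illustration: the function name `channel_state` and the list representation are assumptions, not part of the slides.

```python
def channel_state(sent, received):
    """Return the messages still in transit on a FIFO channel.

    sent     -- sequence X of messages sent on the channel, in order
    received -- sequence Y of messages already received, in order
    """
    # On a FIFO, error-free channel, Y must be a prefix of X.
    if list(sent[:len(received)]) != list(received):
        raise ValueError("received sequence is not a prefix of sent sequence")
    # The channel state is what remains of X after removing the prefix Y.
    return list(sent[len(received):])


in_transit = channel_state([1, 2, 3], [1])
print(in_transit)  # [2, 3]
```

The prefix check encodes the model assumption that channels are error-free and FIFO.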
11
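Example: System
Distributed system: [Figure: processes p and q joined by channels C1 and C2]
State transitions (same for p and q): [Figure: states A and B, with transitions labeled "send" and "receive"]
Initial global state: [Figure: process states B and A, both channels empty (Ø)]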
12
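[Figure: the system above: states A and B with transitions "send" and "receive"; processes p and q, channels C1 and C2]
Global state transition diagram
[Figure: global states linked by the events "p sends", "q receives", "q sends", "p receives"]
A computation corresponds to a path in the diagram
deterministic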
13
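Example: System
Distributed system: [Figure: processes p and q joined by channels C1 and C2]
State transition:
p: [Figure: states A and B, with transitions labeled "send" and "receive"]
q: [Figure: states C and D, with transitions labeled "send" and "receive"]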
14
[Figure: the system above: p with states A and B, q with states C and D, channels C1 and C2]
Global state transition diagram
[Figure: global states linked by the events "p sends", "q receives", "q sends", "p receives"]
non-deterministic
15
[Figure: the system above: p with states A and B, q with states C and D, channels C1 and C2]
We look at the following sequence of events:
[Figure: successive global states under the events "p sends", "q sends", "p receives"]
16
In the snapshot algorithm:
Each process records its own state
p and q cooperate to record the state of C.
[Figure: process p, channel C, process q]
17
Example: System
[Figure: an execution of the two-state (A, B) system with the recording points "Record C", "Record q", "Record p" marked]
Recorded state: No token
18
Example: System
[Figure: an execution of the two-state (A, B) system with the recording points "Record p", "Record C", "Record q" marked]
Recorded state: Two tokens
19
Order Record p, Record C, Record q (time runs left to right):
P’s state recorded, then P sends a message on C, then C’s state recorded
Order Record C, Record q, Record p:
C’s state recorded, then P sends a message on C, then P’s state recorded
20
[Figure: process p, channel C, process q]
q will record the state of C
p and q have to coordinate, using a special marker
q starts recording C after it records its state
q stops when receiving the marker from p
But: how does q know when to record its state?
21
The snapshot algorithm
Who starts?
We assume one process.
Homework: extend the discussion and the proof to any number of starters.
22
(Intuition for the Algorithm)
[Figure: process p, channel C, process q]
Who will record the state of channel C? q
q starts recording after it records its state
How does q know when to stop recording?
p sends a marker right after it records its state, and before sending any other message
23
The snapshot algorithm
channel recording
[Figure: channel C from p to q]
Starts when q records itself
Ends when q receives the marker along C
Note: for any q ≠ p0, the channel along which the marker arrived first is recorded as Ø (empty)
24
The snapshot algorithm
State recording
p0 starts.
p0 records its state, and then broadcasts the marker.
Shout-algorithm = PI (Propagation-of-information) = hot potato = …
When q receives the marker for the first time, it records its own state
25
The snapshot algorithm
Marker-Sending Rule for a process p
1. record the state of p
2. send the marker along each outgoing channel c before sending any other message along c
Marker-Receiving Rule for a process q
on receiving the marker along channel c:
if q’s state is not recorded:
1. record state;
2. record c’s state = Ø;
else: c’s state is the sequence of messages received along c since q recorded its state
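The two rules can be tried out on the single-token example with a small simulation. This is a minimal sketch under stated assumptions, not the authors' implementation: the `Process` class, the `MARKER` value, the helper `connect`, and the channel directions (C1 from p to q, C2 from q to p) are all hypothetical choices made for illustration.

```python
from collections import deque

MARKER = "MARKER"  # hypothetical marker value, assumed distinct from application messages


class Process:
    def __init__(self, name, has_token):
        self.name = name
        self.has_token = has_token
        self.out = {}            # outgoing channels: name -> FIFO queue
        self.inc = {}            # incoming channels: name -> FIFO queue
        self.recorded = None     # recorded process state (None = not yet recorded)
        self.open = {}           # incoming channels still being recorded
        self.channel_state = {}  # finished channel recordings

    def record(self):
        # Marker-sending rule: record own state, then put a marker on every
        # outgoing channel before any other message is sent on it.
        self.recorded = self.has_token
        for queue in self.out.values():
            queue.append(MARKER)
        self.open = {name: [] for name in self.inc}

    def send_token(self, cname):
        assert self.has_token
        self.has_token = False
        self.out[cname].append("token")

    def receive(self, cname):
        msg = self.inc[cname].popleft()
        if msg == MARKER:
            # Marker-receiving rule.
            if self.recorded is None:
                self.record()  # first marker: record state now
            # The channel's state is what arrived on it since the state was
            # recorded (empty if this marker triggered the recording).
            self.channel_state[cname] = self.open.pop(cname)
        else:
            self.has_token = True
            if cname in self.open:
                self.open[cname].append(msg)


def connect(sender, receiver, cname):
    queue = deque()
    sender.out[cname] = queue
    receiver.inc[cname] = queue


p, q = Process("p", has_token=True), Process("q", has_token=False)
connect(p, q, "C1")
connect(q, p, "C2")

p.send_token("C1")  # the token is now in transit on C1
q.record()          # q initiates the snapshot
q.receive("C1")     # token arrives after q recorded: logged as part of C1
p.receive("C2")     # p sees its first marker: records, C2 is empty
q.receive("C1")     # marker on C1 closes the recording of C1

snapshot = {
    "p": p.recorded, "q": q.recorded,
    "C1": q.channel_state["C1"], "C2": p.channel_state["C2"],
}
print(snapshot)
```

In this run neither process is recorded as holding the token, and the token appears exactly once, in the recorded state of C1.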
26
Termination
Assumption: no marker remains forever in an input channel
Claim: If the graph is strongly connected and at least one process records its state, then all processes will record their state in finite time
Proof: by induction on the distance from the process that recorded first
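The claim can be illustrated by following the markers through the graph. The sketch below assumes every marker is eventually delivered and only tracks which processes end up recording; the function name `processes_that_record` and the adjacency-list encoding are assumptions made for this example.

```python
from collections import deque


def processes_that_record(graph, initiator):
    """graph maps each process to the processes its outgoing channels lead to."""
    recorded = {initiator}
    frontier = deque([initiator])
    while frontier:
        u = frontier.popleft()
        for v in graph[u]:          # u sends a marker on every outgoing channel
            if v not in recorded:   # first marker received: v records its state
                recorded.add(v)
                frontier.append(v)
    return recorded


ring = {"p0": ["p1"], "p1": ["p2"], "p2": ["p0"]}  # strongly connected
line = {"a": ["b"], "b": []}                        # not strongly connected

print(processes_that_record(ring, "p0") == set(ring))  # True
print(processes_that_record(line, "b") == set(line))   # False
```

The second graph shows why strong connectivity is needed: a marker from b never reaches a.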
27
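The Recorded Global State
Ex: System
[Figure: processes p and q joined by channels C1 and C2]
State transition:
p: [Figure: states A and B, with transitions labeled "send" and "receive"]
q: [Figure: states C and D, with transitions labeled "send" and "receive"]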
28
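[Figure: the system above: p with states A and B, q with states C and D, channels C1 and C2]
[Figure: successive global states under the events "p sends", "q sends", "p receives", with the recorded states marked]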
29
What did we get?
30
Event
An event e in process p is an atomic action that
can change the state of p, and
the state of at most one channel c incident on p
(by sending/receiving a message M along c)
e is defined by <p, s, s’, M, c>
e = <p, s, s’, M, c> may occur in global state S if
1. the state of p in S is s
2. if c is directed towards p, then c’s state has M at its head
31
Process State and Global State
A process: a set of states, an initial state, and a set of events
A global state S: a collection of process states and channel states
Initially, each process is in its initial state and all channels are empty
next(S, e) is the global state after event e is applied to global state S
32
Process State and Global State
seq = (ei : i = 0…n) is a computation of the system iff
ei may occur in Si, 0 ≤ i ≤ n
Si+1 = next(Si, ei)
(S0 is the initial global state)
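The definitions of an event, "may occur", next(S, e) and a computation translate fairly directly into code. The sketch below is one possible encoding, not the slides' own: the `Event` tuple, the `CHANNELS` topology (C1 from p to q, C2 from q to p), and the reading of state B as "holds the token" are assumptions.

```python
from typing import NamedTuple, Optional


class Event(NamedTuple):
    p: str            # process in which the event occurs
    s: str            # state of p before the event
    s_next: str       # state of p after the event (s' on the slide)
    M: Optional[str]  # message sent or received, if any
    c: Optional[str]  # channel involved, if any


# Hypothetical topology: channel name -> (sender, receiver).
CHANNELS = {"C1": ("p", "q"), "C2": ("q", "p")}


def may_occur(S, e):
    procs, chans = S
    if procs[e.p] != e.s:
        return False
    if e.c is not None and CHANNELS[e.c][1] == e.p:
        # c is directed towards p: M must be at the head of c.
        return len(chans[e.c]) > 0 and chans[e.c][0] == e.M
    return True


def next_state(S, e):
    procs, chans = S
    procs = {**procs, e.p: e.s_next}
    chans = dict(chans)
    if e.c is not None:
        if CHANNELS[e.c][0] == e.p:
            chans[e.c] = chans[e.c] + (e.M,)  # send: append M to the tail
        else:
            chans[e.c] = chans[e.c][1:]       # receive: remove M from the head
    return procs, chans


def is_computation(S0, seq):
    S = S0
    for e in seq:
        if not may_occur(S, e):
            return False
        S = next_state(S, e)
    return True


S0 = ({"p": "B", "q": "A"}, {"C1": (), "C2": ()})
send = Event("p", "B", "A", "token", "C1")
receive = Event("q", "A", "B", "token", "C1")

print(is_computation(S0, [send, receive]))  # True
print(is_computation(S0, [receive, send]))  # False
```

The second sequence fails because the receive event cannot occur while C1 is empty.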
33
The Recorded Global State
seq = (ei: i ≥ 0) a distributed computation
Si – the state of the system right before ei occurs
S0 – the initial state of the system
St – the state of the system at the termination of
the algorithm
S* - the recorded global state
34
Definition
Event ej is called pre-recording if ej is in a process p and p records its state after ej in seq.
Event ej is called post-recording if ej is in a process p and p records its state before ej in seq.
Assume that ej-1 is a post-recording event that comes just before a pre-recording event ej in seq.
35
Claim: The sequence obtained by interchanging ej-1 and ej is a computation.
Proof: ej-1 occurs in p and ej in q (other than p).
There cannot be a message sent at ej-1 and received at ej: p sent its markers before ej-1, so by FIFO q would have received a marker, and recorded its state, before ej.
Hence, event ej can occur in global state Sj-1.
The state of process p is not altered by ej, hence ej-1 can occur after ej.
36
Proof
Swap events until all post-recording events appear after all pre-recording events.
The resulting computation is seq’.
All that is left to show: S* is the global state after all pre-recording events and before all post-recording events.
1. Process states
2. Channel states
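The swapping step of the proof can be sketched as repeated adjacent swaps. The code below works on labels only and assumes, as the claim on the previous slide guarantees, that every (post-recording, pre-recording) swap yields a valid computation. The event names e0…e4 and the function name `reorder` are hypothetical.

```python
def reorder(seq):
    """Swap adjacent (post, pre) pairs until no post-recording event
    precedes a pre-recording event."""
    seq = list(seq)
    swapped = True
    while swapped:
        swapped = False
        for j in range(1, len(seq)):
            if seq[j - 1][1] == "post" and seq[j][1] == "pre":
                seq[j - 1], seq[j] = seq[j], seq[j - 1]
                swapped = True
    return seq


seq = [("e0", "pre"), ("e1", "post"), ("e2", "pre"), ("e3", "post"), ("e4", "pre")]
seq_prime = reorder(seq)
print([name for name, _ in seq_prime])  # ['e0', 'e2', 'e4', 'e1', 'e3']
```

Only mixed pairs are swapped, so the relative order of the pre-recording events, and of the post-recording events, is unchanged.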
37
Claim: The state of a channel in S* is
(sequence of messages corresponding to pre-recording sends) minus its prefix (sequence of messages corresponding to pre-recording receives)
Proof: The state of channel c from process p to process q recorded in S* is the sequence of messages received on c by q after q records its state and before q receives a marker on c.
The sequence of messages sent by p before the marker is the sequence corresponding to pre-recording sends on c.
38
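[Figure: the system above: p with states A and B, q with states C and D, channels C1 and C2]
[Figure: the events "p sends", "q sends", "p receives" of the earlier execution, each labeled pre-recording or post-recording]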
39
[Figure: the system above: p with states A and B, q with states C and D, channels C1 and C2]
(Another execution)
[Figure: the events "q sends", "p sends", "p receives", each labeled pre-recording or post-recording]
40
What did we get?
A configuration that could have happened
41
(The Recorded Global State)
seq = (ei: i ≥ 0) a distributed computation
Si – the state of the system right before ei occurs
S0 – the initial state of the system
St – the state of the system at the termination of
the algorithm
S* - the recorded global state
42
(The Recorded Global State)
(the algorithm is initiated in Si and terminates in St)
Theorem: There exists a computation seq' = (ej' : j ≥ 0), where
1. For all j < i or j ≥ t: ej' = ej
2. The subsequence (ej' | i ≤ j < t) is a permutation of the subsequence (ej | i ≤ j < t)
3. For all j < i or j ≥ t: Sj' = Sj
4. There exists k, i ≤ k ≤ t, such that S* = Sk'
Stable Detection
D - distributed system
y - a predicate function defined on the set of
global states of D
S, S’ – global states of D
y is a stable property of D if y(S) implies y(S’)
for all S’ reachable from S
44
Algorithm
Input: A stable property y
Output: a boolean value b with the property:
y(S0) → b and b → y(St)
Algorithm:
begin
record a global state S*
b := y(S*)
end
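The detection algorithm is short enough to write out. In the sketch below, `record_global_state` is a hypothetical callable standing in for the snapshot algorithm, and `terminated` is an assumed example of a stable predicate (all processes passive and all channels empty). Neither name comes from the slides.

```python
def detect_stable(record_global_state, y):
    """Return b = y(S*), where S* is a recorded global state."""
    s_star = record_global_state()
    return y(s_star)


def terminated(S):
    # Example stable property: every process is passive and no message is in transit.
    procs, chans = S
    return (all(state == "passive" for state in procs.values())
            and all(len(msgs) == 0 for msgs in chans.values()))


quiet = ({"p": "passive", "q": "passive"}, {"C1": [], "C2": []})
busy = ({"p": "passive", "q": "passive"}, {"C1": ["work"], "C2": []})

print(detect_stable(lambda: quiet, terminated))  # True
print(detect_stable(lambda: busy, terminated))   # False
```

A True answer is reliable because the property is stable and St is reachable from S*. A False answer only says the property did not yet hold in S0.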
45
Correctness
1. S* is reachable from S0
2. St is reachable from S*
3. y(S) → y(S’) for all S’ reachable from S
[Figure: S0, S* and St in order; if y(S*) = true then y(St) = true, and if y(S*) = false then y(S0) = false]
46
References
K. M. Chandy and L. Lamport, Distributed Snapshots: Determining Global States of Distributed Systems
47
Slide 37
©
Distributed Computing
5. Snapshot
Shmuel Zaks
[email protected]
The snapshot algorithm (Candy and Lamport)
2
3
4
Goal: design a snapshot (=global-statedetection) algorithm that:
will record a collection of states of all system
components (which forms a global system
state),
will not change the underlying computation,
will not freeze the underlying computation
5
A Process Can…
record its own state,
send and receive messages,
record messages it sends and receives,
cooperate with other processes
Processes do not share clocks or memory
Processes cannot record their state
precisely at the same instant
6
Motivation
Many problems in distributed systems
can be stated in terms of the problem of
detecting global states:
Stable property detection problems :
termination detection,
deadlock detection
etc.
Checkpointing
7
Stable Property Detection Problem
D - distributed system
y - a predicate function defined on the set
of global states of D
S, S’ – global states of D
y is stable if y(S) implies y(S’) for all S’
reachable from S
8
many distributed algorithms are
structured as a sequence of phases
A phase: transient part, then a stable part
phase termination vs. computation
termination
our view on the problem:
i. detect the termination of a phase
ii. initiate a new phase
Notice that “the kth phase has terminated” is
a stable property
9
Model
Distributed system D is a
finite, labeled, directed
graph.
C1
p
q
C2
Channels have infinite buffers, are errorfree and preserve FIFO
Message delay is bounded, but unknown
10
State of a Channel
C1
p
C1
p
3
2
1
q
C2
q
1
[1, 2, 3] – sequence X of messages that
were sent
[1] – sequence Y of received messages
( prefix of X )
[2, 3] – state of C1: X \ Y
11
Example:
System
Distributed system:
C1
p
q
C2
State transitions
receive
A
B
(same for p and q):
send
Initial global state:
Ø
B
A
Ø
12
A
receive
C1
p
B
q
C2
send
Global state transition diagram
p
B
q
Ø
A
p sends
p
q
A
A
Ø
Ø
p receives
Ø
A
A
q receives
q sends
Ø
A
B
Ø
A computation corresponds to a path in the diagram
deterministic
13
Example:
System
p
Distributed system:
State transition:
p:
q:
C1
q
C2
A
C
send
receive
send
receive
B
D
14
A
send
B
receive
p
C1
q
C2
C
send
D
receive
Global state transition diagram
p
A
q
Ø
C
p sends
p
q
B
C
Ø
Ø
q receives
A
D
q sends
p receives
B
D
Ø
non-deterministic
15
A
send
B
receive
p
C1
q
C2
C
send
D
receive
We look at the following sequence of events:
p
A
q
Ø
C
p sends
p
q
B
C
Ø
Ø
q sends
A
D
p receives
B
D
Ø
16
in the snapshot algorithm:
Each process records its own state
p and q cooperate to record the state of C.
p
C
q
17
A
receive
B
p
C1
q
C2
send
Example:
System
p
B
p
q
q
Ø
Recorded state:
No token
C
A
A
Record C
Ø
A
A
A
Record q
Record p
18
A
receive
B
p
C1
q
C2
send
Example:
System
p
B
C1
p
q
q
Ø
A
A
Ø
Recorded state:
Two tokens
A
Ø
B
Record p
A
Record C
Record q
19
Record p
Record C
Record q
P’s state
recorded
P sends a
message on C
C’s state
recorded
time
C’s state
recorded
P sends a
message on C
P’s state
recorded
Record C
Record q
Record p
20
p
C
q
q will record the state of C
p and q have to coordinate ; using a special marker
q starts recording C after it records its state
q stops when receiving
from p
But: how does q know when to record its state?
21
The snapshot algorithm
Who starts?
We assume one process.
Hw: extend discussion + proof to any number
of startes.
22
(Intuition for the
Algorithm)
p
C
q
Who will record the state of channel C? q
q starts recording after it records its state
How q knows when to stop recording?
p sends
right after it records
its state, and before sending any other message
23
The snapshot algorithm
channel recording
C
p
q
Starts when q records itself
Ends when q receives
along C
Note : for any q p0, the channel along
which
arrived first is recorded as
24
The snapshot algorithm
State recording
p0 starts.
p0 recoreds its state,
and then broadcasts
.
Shout-algorithm =
PI (Propogation-of-information)=
hot potato = …
When q receives
for the first time, it
records its own state
25
The snapshot algorithm
Marker-Sending Rule for a process q
1. record the state of p
2. send
along c before sending
any other message
Marker-Receiving Rule for a process q
on receiving
along channel c:
if q’s state is not recorded:
1. record state;
2. record c’s state = ;
else: c’s state is the sequence of messages
received since q recorded its state
26
Termination
Assumption No marker remains forever in an
input channel
Claim: If the graph is strongly connected and
at least one process records its state,
then all processes will record their state
in finite time
Proof: by induction
27
The Recorded Global State
Ex:
System
C1
p
q
C2
State transition:
p:
q:
A
C
send
receive
send
receive
B
D
28
A
send
B
receive
p
C1
q
C2
C
send
D
receive
A
D
p
A
q
C
p sends
p
q
B
A
C
q sends
A
D
p receives
B
D
29
What did we get?
30
e in process p is an atomic action:
can change the state of p, and
a state of at most one channel c incident on p
(by sending/receiving message M along c )
e is defined by < p, s, s’, M, c >
e =
may occur in global state S
if 1. the state of p in S is s
2. if c is directed towards p then c’s state
has M in its head
Event
31
Process State and Global State
A process: set of states, an initial state
set of events
A global state S: collection of process states
and channel states
initially, each process is in its initial state and all
channels are empty
next(S, e) is the global state after event e in
applied to global state S
32
Process State and Global State
seq = (ei : i = 0…n) is a computation of
the system iff
ei may occur in Si , 0 i n
Si+1 = next(Si, ei)
(S0 is the initial global state)
33
The Recorded Global State
seq = (ei: i ≥ 0) a distributed computation
Si – the state of the system right before ei occurs
S0 – the initial state of the system
St – the state of the system at the termination of
the algorithm
S* - the recorded global state
34
Definition
Event e_j is called pre-recording if e_j is in a
process p and p records its state after e_j in seq.
Event e_j is called post-recording if e_j is in a
process p and p records its state before e_j in seq.
Assume that e_{j-1} is a post-recording event that immediately precedes a
pre-recording event e_j in seq.
35
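Claim: The sequence obtained by interchanging
e_{j-1} and e_j is a computation.
Proof: e_{j-1} occurs in a process p and e_j in a process q other than p
(within one process, a post-recording event cannot precede a pre-recording event).
There cannot be a message sent at e_{j-1} and
received at e_j: since e_{j-1} is post-recording, p sent a marker on the channel
before that message, so by FIFO q would have received the marker, and recorded
its state, before e_j, making e_j post-recording.
Hence, event e_j can occur in global state S_{j-1}.
The state of process p is not altered by e_j,
hence e_{j-1} can occur after e_j.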
36
Proof
Repeatedly swap adjacent events until all post-recording events
appear after all pre-recording events.
The resulting computation is seq’.
All that is left to show: S* is the global state of seq’
after all pre-recording events and before all
post-recording events.
1. Process states
2. Channel states
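The swapping step can be sketched as a simple loop over adjacent pairs. This is a hedged illustration only: the function name `reorder` and the representation of an event as a (name, kind) pair are assumptions, and the sketch moves labels around without checking that each swap yields a valid computation (that is what the claim above establishes).

```python
def reorder(events):
    """Swap adjacent (post, pre) pairs until none remain.

    events: list of (name, kind) pairs with kind in {"pre", "post"}.
    The relative order of events of the same kind is preserved.
    """
    seq = list(events)
    swapped = True
    while swapped:
        swapped = False
        for j in range(1, len(seq)):
            if seq[j - 1][1] == "post" and seq[j][1] == "pre":
                seq[j - 1], seq[j] = seq[j], seq[j - 1]
                swapped = True
    return seq


seq = [("p sends", "post"), ("q sends", "pre"), ("p receives", "post")]
seq_prime = reorder(seq)
```

For the three-event example, a single swap moves "q sends" in front of "p sends", which gives the order q sends, p sends, p receives.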
37
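Claim: The state of a channel c in S* is
(sequence of messages corresponding to pre-recording sends on c) \
(sequence of messages corresponding to pre-recording receives on c),
that is, the messages sent on c before the snapshot and not received before it.
Proof: The state of channel c from process
p to process q recorded in S* is the sequence
of messages received on c by q after q
records its state and before q receives a
marker on c.
By FIFO, the messages sent by p on c before the marker are exactly those
corresponding to pre-recording sends on c; removing the ones q received before
recording its state (the pre-recording receives) leaves the recorded sequence.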
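As a small worked illustration of the "sent minus received" view of a channel state, the following sketch computes X \ Y for a FIFO channel, where Y must be a prefix of X. The function name `channel_state` and the use of Python lists are assumptions made for the example.

```python
def channel_state(sent, received):
    """Messages sent on a FIFO channel that have not been received yet."""
    if sent[:len(received)] != received:
        raise ValueError("on a FIFO channel, received must be a prefix of sent")
    return sent[len(received):]


in_transit = channel_state([1, 2, 3], [1])
```

With X = [1, 2, 3] sent and Y = [1] received, the channel holds [2, 3], matching the "State of a Channel" slide earlier in the deck.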
38
[Figure: the same execution with each event labelled: p sends (post), q sends (pre), p receives (post)]
39
[Figure: (Another execution) of the same system, with each event labelled: q sends (pre), p sends (post), p receives (post)]
40
What did we get?
A configuration that could have happened
41
(The Recorded Global State)
seq = (e_i : i ≥ 0) - a distributed computation
S_i - the state of the system right before e_i occurs
S_0 - the initial state of the system
S_t - the state of the system at the termination of
the algorithm
S* - the recorded global state
42
(The Recorded Global State)
Theorem: Let the algorithm be initiated in S_i and terminate in S_t.
There exists a computation
seq' = (e'_j : j ≥ 0), where
1. For all j < i or j ≥ t: e'_j = e_j
2. The subsequence (e'_j | i ≤ j < t) is a
permutation of the subsequence
(e_j | i ≤ j < t)
3. For all j < i or j ≥ t: S'_j = S_j
4. There exists k, i ≤ k ≤ t, such that S* = S'_k
Stable Detection
D - distributed system
y - a predicate function defined on the set of
global states of D
S, S’ – global states of D
y is a stable property of D if y(S) implies y(S’)
for all S’ reachable from S
44
Algorithm
Input: A stable property y
Output: a boolean value b with the
property:
(y(S0) ⇒ b) and (b ⇒ y(St))
Algorithm:
begin
record a global state S*
b := y(S*)
end
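The detection step itself is just an evaluation of y on the recorded state. The sketch below is a hedged illustration: the predicate `terminated`, the state representation, and the three sample states are hypothetical, and `terminated` is stable only under the assumed model in which a passive process becomes active solely by receiving a message.

```python
def terminated(state):
    """Hypothetical stable predicate: all processes passive, all channels empty."""
    procs, chans = state
    return (all(s == "passive" for s in procs.values())
            and all(len(msgs) == 0 for msgs in chans.values()))


def detect(recorded_state, y):
    """b := y(S*), where S* is a recorded global state."""
    return y(recorded_state)


# Assumed sample states, with S* reachable from S0 and St reachable from S*.
s0 = ({"p": "active", "q": "passive"}, {"C1": ["work"], "C2": []})
s_star = ({"p": "passive", "q": "passive"}, {"C1": [], "C2": []})
s_t = s_star  # once terminated, the state no longer changes

b = detect(s_star, terminated)
```

On these sample states b is true, and both required implications hold: y(S0) implies b (y(S0) is false here) and b implies y(St).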
45
Correctness
1. S* is reachable from S0
2. St is reachable from S*
3. y(S) ⇒ y(S’)
for all S’ reachable from S
Hence, along S0 → S* → St:
if y(S*) = true then y(St) = true
if y(S*) = false then y(S0) = false
46
References
K. M. Chandy and L. Lamport, Distributed Snapshots: Determining Global States of Distributed Systems, ACM Transactions on Computer Systems 3(1), 1985, pp. 63-75
47
Slide 41
©
Distributed Computing
5. Snapshot
Shmuel Zaks
[email protected]
The snapshot algorithm (Candy and Lamport)
2
3
4
Goal: design a snapshot (=global-statedetection) algorithm that:
will record a collection of states of all system
components (which forms a global system
state),
will not change the underlying computation,
will not freeze the underlying computation
5
A Process Can…
record its own state,
send and receive messages,
record messages it sends and receives,
cooperate with other processes
Processes do not share clocks or memory
Processes cannot record their state
precisely at the same instant
6
Motivation
Many problems in distributed systems
can be stated in terms of the problem of
detecting global states:
Stable property detection problems :
termination detection,
deadlock detection
etc.
Checkpointing
7
Stable Property Detection Problem
D - distributed system
y - a predicate function defined on the set
of global states of D
S, S’ – global states of D
y is stable if y(S) implies y(S’) for all S’
reachable from S
8
many distributed algorithms are
structured as a sequence of phases
A phase: transient part, then a stable part
phase termination vs. computation
termination
our view on the problem:
i. detect the termination of a phase
ii. initiate a new phase
Notice that “the kth phase has terminated” is
a stable property
9
Model
Distributed system D is a
finite, labeled, directed
graph.
C1
p
q
C2
Channels have infinite buffers, are errorfree and preserve FIFO
Message delay is bounded, but unknown
10
State of a Channel
C1
p
C1
p
3
2
1
q
C2
q
1
[1, 2, 3] – sequence X of messages that
were sent
[1] – sequence Y of received messages
( prefix of X )
[2, 3] – state of C1: X \ Y
11
Example:
System
Distributed system:
C1
p
q
C2
State transitions
receive
A
B
(same for p and q):
send
Initial global state:
Ø
B
A
Ø
12
A
receive
C1
p
B
q
C2
send
Global state transition diagram
p
B
q
Ø
A
p sends
p
q
A
A
Ø
Ø
p receives
Ø
A
A
q receives
q sends
Ø
A
B
Ø
A computation corresponds to a path in the diagram
deterministic
13
Example:
System
p
Distributed system:
State transition:
p:
q:
C1
q
C2
A
C
send
receive
send
receive
B
D
14
A
send
B
receive
p
C1
q
C2
C
send
D
receive
Global state transition diagram
p
A
q
Ø
C
p sends
p
q
B
C
Ø
Ø
q receives
A
D
q sends
p receives
B
D
Ø
non-deterministic
15
A
send
B
receive
p
C1
q
C2
C
send
D
receive
We look at the following sequence of events:
p
A
q
Ø
C
p sends
p
q
B
C
Ø
Ø
q sends
A
D
p receives
B
D
Ø
16
in the snapshot algorithm:
Each process records its own state
p and q cooperate to record the state of C.
p
C
q
17
A
receive
B
p
C1
q
C2
send
Example:
System
p
B
p
q
q
Ø
Recorded state:
No token
C
A
A
Record C
Ø
A
A
A
Record q
Record p
18
A
receive
B
p
C1
q
C2
send
Example:
System
p
B
C1
p
q
q
Ø
A
A
Ø
Recorded state:
Two tokens
A
Ø
B
Record p
A
Record C
Record q
19
Record p
Record C
Record q
P’s state
recorded
P sends a
message on C
C’s state
recorded
time
C’s state
recorded
P sends a
message on C
P’s state
recorded
Record C
Record q
Record p
[Figure: channel C from p to q]
q will record the state of C
p and q have to coordinate, using a special marker
q starts recording C after it records its state
q stops when receiving the marker from p
But: how does q know when to record its state?
The snapshot algorithm
Who starts?
We assume one process.
Hw: extend the discussion and proof to any number of starters.
(Intuition for the Algorithm)
[Figure: channel C from p to q]
Who will record the state of channel C? q
q starts recording after it records its state
How does q know when to stop recording?
p sends a marker right after it records its state, and before sending any other message
The snapshot algorithm
channel recording
[Figure: channel C from p to q]
Starts when q records itself
Ends when q receives the marker along C
Note: for any q ≠ p0, the channel along which the marker arrived first is recorded as empty (Ø)
The snapshot algorithm
State recording
p0 starts.
p0 records its state, and then broadcasts the marker.
Shout-algorithm = PI (Propagation-of-information) = hot potato = …
When q receives the marker for the first time, it records its own state
The snapshot algorithm
Marker-Sending Rule for a process p
1. record the state of p
2. send the marker along each outgoing channel c before sending any other message along c
Marker-Receiving Rule for a process q
on receiving the marker along channel c:
if q’s state is not recorded:
1. record state;
2. record c’s state = Ø (the empty sequence);
else: c’s state is the sequence of messages received along c since q recorded its state
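The two marker rules can be sketched as a small single-threaded simulation. This is a minimal illustration, not the authors' implementation: the `Process` class, the `MARKER` sentinel, the helpers `connect` and `deliver`, and the use of an integer token count as process state are all assumptions made for the example. A process that records its state on its first marker also applies the marker-sending rule.

```python
from collections import deque

MARKER = "MARKER"  # hypothetical sentinel standing in for the marker


class Process:
    def __init__(self, name, tokens):
        self.name = name
        self.tokens = tokens        # application state: tokens held
        self.out = {}               # destination name -> FIFO channel (deque)
        self.sources = []           # names of processes with a channel into us
        self.recorded_state = None  # None until the state is recorded
        self.open = {}              # source name -> messages recorded so far
        self.channel_state = {}     # source name -> finished channel recording

    def send(self, dst, tokens):
        """Application event: move tokens onto the channel self -> dst."""
        self.tokens -= tokens
        self.out[dst].append(tokens)

    def record(self):
        """Marker-sending rule: record own state, then put a marker on every
        outgoing channel before any later message."""
        self.recorded_state = self.tokens
        self.open = {src: [] for src in self.sources}
        for channel in self.out.values():
            channel.append(MARKER)

    def receive(self, src, message):
        """Marker-receiving rule plus ordinary message handling."""
        if message == MARKER:
            if self.recorded_state is None:
                self.record()
            # On the first marker open[src] is still empty, so that channel is
            # recorded as empty; otherwise this closes an ongoing recording.
            self.channel_state[src] = self.open.pop(src)
        else:
            if src in self.open:    # recording started, marker not yet seen
                self.open[src].append(message)
            self.tokens += message


def connect(a, b):
    """Create the directed FIFO channel a -> b."""
    a.out[b.name] = deque()
    b.sources.append(a.name)


def deliver(src, dst):
    """Deliver the message at the head of the channel src -> dst."""
    dst.receive(src.name, src.out[dst.name].popleft())


p, q = Process("p", 1), Process("q", 0)
connect(p, q)
connect(q, p)

p.send("q", 1)   # the token travels p -> q
deliver(p, q)
q.send("p", 1)   # q sends the token back; it is now in transit
p.record()       # p initiates the snapshot
deliver(q, p)    # p receives the token while recording channel q -> p
deliver(p, q)    # q receives the marker and records its state
deliver(q, p)    # p receives q's marker and closes channel q -> p

snapshot = {
    "p": p.recorded_state,
    "q": q.recorded_state,
    "p->q": q.channel_state["p"],
    "q->p": p.channel_state["q"],
}
print(snapshot)
```

In this run the recorded state holds exactly one token, found in the channel from q to p, even though no single instant of the execution was frozen.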
Termination
Assumption: No marker remains forever in an input channel
Claim: If the graph is strongly connected and at least one process records its state, then all processes will record their state in finite time
Proof: by induction on the distance from a process that has recorded its state
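The Recorded Global State
Ex: System
[Figure: processes p and q connected by channels C1 and C2]
State transition: [Figure: p has states A and B; q has states C and D; in each process the transitions are labeled send and receive]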
[Figure: global states of the system after each of the events p sends, q sends, p receives]
What did we get?
Event
An event e in process p is an atomic action:
it can change the state of p, and
the state of at most one channel c incident on p
(by sending/receiving a message M along c)
e is defined by < p, s, s’, M, c >
e = < p, s, s’, M, c > may occur in global state S if
1. the state of p in S is s
2. if c is directed towards p, then c’s state has M at its head
Process State and Global State
A process: a set of states, an initial state, and a set of events
A global state S: a collection of process states and channel states
initially, each process is in its initial state and all channels are empty
next(S, e) is the global state after event e is applied to global state S
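The event tuple, the "may occur" conditions, and next(S, e) can be sketched as follows. The representation is an assumption for illustration: a global state is a pair (process states, channel contents), a channel is named by its (sender, receiver) pair, and C1 is taken to run from p to q and C2 from q to p. The names `Event`, `may_occur`, and `next_state` are hypothetical.

```python
from collections import namedtuple

# An event <p, s, s', M, c>; channel c is a (sender, receiver) pair.
Event = namedtuple("Event", "p s s_next M c")


def may_occur(S, e):
    """Check the two enabling conditions for event e in global state S."""
    procs, chans = S
    if procs[e.p] != e.s:
        return False
    if e.c is not None and e.c[1] == e.p:   # c is directed towards p
        return bool(chans[e.c]) and chans[e.c][0] == e.M
    return True


def next_state(S, e):
    """Return next(S, e) as a new global state, leaving S unchanged."""
    if not may_occur(S, e):
        raise ValueError("event may not occur in this global state")
    procs, chans = S
    new_procs = {**procs, e.p: e.s_next}
    new_chans = {c: list(msgs) for c, msgs in chans.items()}
    if e.c is not None:
        if e.c[1] == e.p:
            new_chans[e.c].pop(0)       # receive: remove M from the head of c
        else:
            new_chans[e.c].append(e.M)  # send: append M to the tail of c
    return new_procs, new_chans


C1, C2 = ("p", "q"), ("q", "p")   # assumed channel directions
S0 = ({"p": "A", "q": "C"}, {C1: [], C2: []})
p_sends = Event("p", "A", "B", "M", C1)
q_sends = Event("q", "C", "D", "M'", C2)
p_receives = Event("p", "B", "A", "M'", C2)

S = S0
for e in (p_sends, q_sends, p_receives):
    S = next_state(S, e)
print(S)
```

The loop applies the sequence p sends, q sends, p receives; the event p_receives is not enabled in S0 because p is in state A and C2 is empty.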
Process State and Global State
seq = (ei : i = 0…n) is a computation of the system iff
ei may occur in Si , 0 ≤ i ≤ n
Si+1 = next(Si, ei)
(S0 is the initial global state)
The Recorded Global State
seq = (ei: i ≥ 0) a distributed computation
Si – the state of the system right before ei occurs
S0 – the initial state of the system
St – the state of the system at the termination of
the algorithm
S* - the recorded global state
Definition
Event ej is called pre-recording if ej is in a process p and p records its state after ej in seq.
Event ej is called post-recording if ej is in a process p and p records its state before ej in seq.
Assume that ej-1 is a post-recording event that immediately precedes a pre-recording event ej in seq.
Claim: The sequence obtained by interchanging ej-1 and ej is a computation.
Proof: ej-1 occurs in p and ej in q (other than p).
There cannot be a message sent at ej-1 and received at ej: p sends its markers before any post-recording message, so by FIFO q would have received the marker, and recorded its state, before ej.
Hence, event ej can occur in global state Sj-1.
The state of process p is not altered by ej, hence ej-1 can occur after ej.
Proof
Swap the events until all post-recording events appear after all pre-recording events.
The resulting computation is seq’.
All that is left to show: S* is the global state after all pre-recording events and before all post-recording events.
1. Process states
2. Channel states
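The swapping step can be sketched as repeated adjacent exchanges. This is only an illustration: the function name `reorder` and the (label, kind) encoding of events are assumptions, and the code does not check that each swap yields a computation, since that is what the claim above guarantees.

```python
def reorder(seq, is_post):
    """Swap adjacent (post-recording, pre-recording) pairs until none remain."""
    seq = list(seq)
    swapped = True
    while swapped:
        swapped = False
        for j in range(1, len(seq)):
            if is_post(seq[j - 1]) and not is_post(seq[j]):
                seq[j - 1], seq[j] = seq[j], seq[j - 1]
                swapped = True
    return seq


# Events of the example execution, tagged as pre- or post-recording.
seq = [("p sends", "post"), ("q sends", "pre"), ("p receives", "post")]
seq_prime = reorder(seq, lambda e: e[1] == "post")
print(seq_prime)
```

Because only adjacent pairs are exchanged, the relative order within the pre-recording events and within the post-recording events is preserved.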
Claim: The state of a channel in S* is
(sequence of messages corresponding to pre-recording sends) \ (sequence of messages corresponding to pre-recording receives)
Proof: The state of channel c from process p to process q recorded in S* is the sequence of messages received on c by q after q records its state and before q receives a marker on c.
The sequence of messages sent by p before the marker is the sequence corresponding to pre-recording sends on c.
[Figure: the execution p sends, q sends, p receives; p sends is post-recording, q sends is pre-recording, p receives is post-recording]
(Another execution)
[Figure: the execution q sends, p sends, p receives; q sends is pre-recording, p sends and p receives are post-recording]
What did we get?
A configuration that could have happened
(The Recorded Global State)
seq = (ei: i ≥ 0) a distributed computation
Si – the state of the system right before ei occurs
S0 – the initial state of the system
St – the state of the system at the termination of
the algorithm
S* - the recorded global state
(The Recorded Global State)
Theorem: There exists a computation seq' = (ej' : j ≥ 0), where
1. For all j < i or j ≥ t : ej' = ej
2. The subsequence (ej' | i ≤ j < t) is a permutation of the subsequence (ej | i ≤ j < t)
3. For all j < i or j ≥ t : Sj' = Sj
4. There exists k, i ≤ k ≤ t, such that S* = Sk'
Stable Detection
D - distributed system
y - a predicate function defined on the set of
global states of D
S, S’ – global states of D
y is a stable property of D if y(S) implies y(S’)
for all S’ reachable from S
Algorithm
Input: A stable property y
Output: a boolean value b with the property:
y(S0) ⇒ b and b ⇒ y(St)
Algorithm:
begin
record a global state S*
b := y(S*)
end
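The detection algorithm can be sketched with termination as the stable property. Everything named here is an assumption for illustration: the snapshot format (process states plus channel contents), the predicate `terminated`, and the stub `record_global_state`, which stands in for a real run of the snapshot algorithm.

```python
def terminated(snapshot):
    """Stable property y: every process is passive and every channel is empty."""
    procs, chans = snapshot
    all_passive = all(state == "passive" for state in procs.values())
    all_empty = all(len(msgs) == 0 for msgs in chans.values())
    return all_passive and all_empty


def detect(record_global_state, y):
    """b := y(S*), where S* is the recorded global state."""
    return y(record_global_state())


def record_global_state():
    # Stub: a hand-written recorded state with one message still in transit.
    return ({"p": "passive", "q": "passive"}, {("p", "q"): ["work"], ("q", "p"): []})


b = detect(record_global_state, terminated)
print(b)
```

Here b is false because a message is still in transit in the recorded state. A false answer says nothing about St; only a true answer implies that y holds at St.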
Correctness
1. S* is reachable from S0
2. St is reachable from S*
3. y(S) ⇒ y(S’) for all S’ reachable from S
[Figure: S0, S*, and St in order of reachability, with two cases: y(S*)=true gives y(St)=true, and y(S*)=false gives y(S0)=false]
References
K. M. Chandy and L. Lamport, Distributed Snapshots: Determining Global States of Distributed Systems
Ø
C
p sends
p
q
B
C
Ø
Ø
q receives
A
D
q sends
p receives
B
D
Ø
non-deterministic
15
A
send
B
receive
p
C1
q
C2
C
send
D
receive
We look at the following sequence of events:
p
A
q
Ø
C
p sends
p
q
B
C
Ø
Ø
q sends
A
D
p receives
B
D
Ø
16
in the snapshot algorithm:
Each process records its own state
p and q cooperate to record the state of C.
p
C
q
17
A
receive
B
p
C1
q
C2
send
Example:
System
p
B
p
q
q
Ø
Recorded state:
No token
C
A
A
Record C
Ø
A
A
A
Record q
Record p
18
A
receive
B
p
C1
q
C2
send
Example:
System
p
B
C1
p
q
q
Ø
A
A
Ø
Recorded state:
Two tokens
A
Ø
B
Record p
A
Record C
Record q
19
Record p
Record C
Record q
P’s state
recorded
P sends a
message on C
C’s state
recorded
time
C’s state
recorded
P sends a
message on C
P’s state
recorded
Record C
Record q
Record p
20
p
C
q
q will record the state of C
p and q have to coordinate ; using a special marker
q starts recording C after it records its state
q stops when receiving
from p
But: how does q know when to record its state?
21
The snapshot algorithm
Who starts?
We assume one process.
Hw: extend discussion + proof to any number
of startes.
22
(Intuition for the
Algorithm)
p
C
q
Who will record the state of channel C? q
q starts recording after it records its state
How q knows when to stop recording?
p sends
right after it records
its state, and before sending any other message
23
The snapshot algorithm
channel recording
C
p
q
Starts when q records itself
Ends when q receives
along C
Note : for any q p0, the channel along
which
arrived first is recorded as
24
The snapshot algorithm
State recording
p0 starts.
p0 recoreds its state,
and then broadcasts
.
Shout-algorithm =
PI (Propogation-of-information)=
hot potato = …
When q receives
for the first time, it
records its own state
25
The snapshot algorithm
Marker-Sending Rule for a process q
1. record the state of p
2. send
along c before sending
any other message
Marker-Receiving Rule for a process q
on receiving
along channel c:
if q’s state is not recorded:
1. record state;
2. record c’s state = ;
else: c’s state is the sequence of messages
received since q recorded its state
26
Termination
Assumption No marker remains forever in an
input channel
Claim: If the graph is strongly connected and
at least one process records its state,
then all processes will record their state
in finite time
Proof: by induction
27
The Recorded Global State
Ex:
System
C1
p
q
C2
State transition:
p:
q:
A
C
send
receive
send
receive
B
D
28
A
send
B
receive
p
C1
q
C2
C
send
D
receive
A
D
p
A
q
C
p sends
p
q
B
A
C
q sends
A
D
p receives
B
D
29
What did we get?
30
e in process p is an atomic action:
can change the state of p, and
a state of at most one channel c incident on p
(by sending/receiving message M along c )
e is defined by < p, s, s’, M, c >
e =
may occur in global state S
if 1. the state of p in S is s
2. if c is directed towards p then c’s state
has M in its head
Event
31
Process State and Global State
A process: set of states, an initial state
set of events
A global state S: collection of process states
and channel states
initially, each process is in its initial state and all
channels are empty
next(S, e) is the global state after event e in
applied to global state S
32
Process State and Global State
seq = (ei : i = 0…n) is a computation of
the system iff
ei may occur in Si , 0 i n
Si+1 = next(Si, ei)
(S0 is the initial global state)
33
The Recorded Global State
seq = (ei: i ≥ 0) a distributed computation
Si – the state of the system right before ei occurs
S0 – the initial state of the system
St – the state of the system at the termination of
the algorithm
S* - the recorded global state
34
Definition
Event ej is called pre-recording if ej is in a
process p and p records its state after ej in seq .
Event ej is called post-recording if ej is in a
process p and p records its state before ej in seq .
Assume that ej-1 is a post-recording event before
Pre-recording event ej in seq.
35
Claim: Sequence obtained by interchanging
ej-1 and ej is a computation.
Proof: ej-1 occurs in p and ej in q (other than p).
There cannot be a message sent at ej-1 and
received at ej.
Hence, event ej can occur in global state Sj-1.
The state of process p is not altered by ej,
hence ej-1 can occur after ej.
36
Proof
Swap the events till all post-recorded events
appear after all pre-recorded events.
The acquired computation is seq’.
All that is left to show: S* is a global state
after all prerecorded events and before all
postrecorded events.
1. Process states
2. Channel states
37
Claim: The state of a channel in S* is
(sequence of messages corresp. to pre-recorded
receives)(sequence of messages corresp. to
prerecorded sends)
Proof: The state of channel c from process
p to process q recorded in S* is the sequence
of messages received on c by q after q
records its state and before q receives a
marker on c.
The sequence of messages sent by p is the
sequence corres. to prerecording sends on c.
38
A
send
B
receive
p
C1
q
C2
C
send
D
receive
A
D
p
A
q
C
p sends
p
q
B
C
post
pre
q sends
A
D
p receives
post
B
D
39
A
send
B
receive
A
p
C1
q
C2
C
send
D
receive
D
p
A
(Another execution)
p
q
q
C
q sends
A
D
pre
post
p sends
A
D
p receives
post
A
B
D
40
What did we get?
A configuration that could have happened
41
(The Recorded Global State)
seq = (ei: i ≥ 0) a distributed computation
Si – the state of the system right before ei occurs
S0 – the initial state of the system
St – the state of the system at the termination of
the algorithm
S* - the recorded global state
42
(The Recorded Global State)
Theorem : There exists a computation
seq' = (ej' : j 0), where
1. For all j < i or j t : ej' = ej
2. The subsequence (ej' | i j < t) is a
permutation of the subsequence
(ej | i j < t)
3. For all j < i or j t : Sj' = Sj
4. There exists k, i k t, such that S * = Sk
Stable Detection
D - distributed system
y - a predicate function defined on the set of
global states of D
S, S’ – global states of D
y is a stable property of D if y(S) implies y(S’)
for all S’ reachable from S
44
Algorithm
Input: A stable property y
Output: a boolean value b with the
property:
y(S0)
b and b
y(St)
Algorithm:
begin
record a global state S*
b := y(S*)
end
45
Correctness
1. S* is reachable from S0
2. St is reachable from S*
3. y(S)
y(S’)
for all S’ reachable from S
y(S*)=true
S0
St
y(S0)=false
S*
y(S*)=false
y(St)=true
46
References
K. M. Chandy and L. Lamport,
Distributed Snapshots:
Determining Global States of Distributed
Systems
47