© Distributed Computing
5. Snapshot
Shmuel Zaks
zaks@cs.technion.ac.il


The snapshot algorithm (Chandy and Lamport)

Goal: design a snapshot (= global-state-detection) algorithm that:

• will record a collection of states of all system
components (which forms a global system
state),
• will not change the underlying computation,
• will not freeze the underlying computation
5

A Process Can…
• record its own state,
• send and receive messages,
• record messages it sends and receives,
• cooperate with other processes

Processes do not share clocks or memory
⇒ Processes cannot record their state
precisely at the same instant

6

Motivation

• Many problems in distributed systems
can be stated in terms of the problem of
detecting global states:
  Stable property detection problems:
  termination detection,
  deadlock detection,
  etc.
• Checkpointing
7

Stable Property Detection Problem
D - distributed system
y - a predicate function defined on the set
of global states of D
S, S’ – global states of D
y is stable if y(S) implies y(S’) for all S’
reachable from S
8

• Many distributed algorithms are
structured as a sequence of phases
• A phase: a transient part, then a stable part
(phase termination vs. computation
termination)
• Our view of the problem:
i. detect the termination of a phase
ii. initiate a new phase
Notice that “the kth phase has terminated” is
a stable property
9

Model
• Distributed system D is a
finite, labeled, directed
graph.

[Figure: processes p and q connected by channels C1 and C2]

• Channels have infinite buffers, are error-free and preserve FIFO order
• Message delay is bounded, but unknown
10

State of a Channel

[Figure: channel C1 between p and q carrying messages 3, 2, 1; after q receives message 1, messages 3 and 2 remain in C1]

• [1, 2, 3] – sequence X of messages that
were sent
• [1] – sequence Y of received messages
(a prefix of X)
• [2, 3] – state of C1: X \ Y

11

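Example:

[Figure: distributed system with processes p and q and channels C1 and C2. State transitions (same for p and q): states A and B, with transitions labeled send and receive. Initial global state: one process in state B, the other in state A, both channels empty (Ø).]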
12

[Figure: global state transition diagram for this system. Transitions shown: p sends, q receives, q sends, p receives.]

A computation corresponds to a path in the diagram.
The system is deterministic.

13

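Example:

[Figure: distributed system with processes p and q and channels C1 and C2. State transitions: p has states A and B, q has states C and D; in each process the transitions are labeled send and receive.]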
14

[Figure: global state transition diagram for this system, starting with p in state A, q in state C, and both channels empty (Ø). Transitions shown: p sends, q receives, q sends, p receives.]

The system is non-deterministic.

15

We look at the following sequence of events:

[Figure: global states along the sequence p sends, q sends, p receives, starting with p in state A, q in state C, and both channels empty (Ø)]
16

In the snapshot algorithm:
Each process records its own state
p and q cooperate to record the state of C.

[Figure: p connected to q by channel C]

17

Example:

[Figure: the token system with states A and B. Recording steps shown: Record C, Record q, Record p.]

Recorded state: No token

18

Example:

[Figure: the token system with states A and B. Recording steps shown: Record p, Record C, Record q.]

Recorded state: Two tokens

19

[Figure: two timelines]

Record p, Record C, Record q:
P’s state recorded, then P sends a message on C, then C’s state recorded

Record C, Record q, Record p:
C’s state recorded, then P sends a message on C, then P’s state recorded
20

[Figure: p connected to q by channel C]

q will record the state of C
p and q have to coordinate, using a special marker
q starts recording C after it records its own state
q stops when receiving the marker from p

But: how does q know when to record its state?
21

The snapshot algorithm

Who starts?
We assume one process.

HW: extend the discussion and the proof to any number
of starters.

22

(Intuition for the
Algorithm)

[Figure: p connected to q by channel C]

• Who will record the state of channel C? q
• q starts recording after it records its state

• How does q know when to stop recording?
• p sends a marker right after it records
its state, and before sending any other message
23

The snapshot algorithm
channel recording

[Figure: p connected to q by channel C]

Starts when q records itself
Ends when q receives the marker along C

Note: for any q ≠ p0, the channel along
which the marker arrived first is recorded as ∅

24

The snapshot algorithm
State recording
p0 starts.
p0 records its state,
and then broadcasts the marker.

Shout-algorithm =
PI (Propagation-of-information) =
hot potato = …
When q receives the marker for the first time, it
records its own state
25

The snapshot algorithm
Marker-Sending Rule for a process p
1. record the state of p
2. send the marker along each outgoing channel c
before sending any other message along c
Marker-Receiving Rule for a process q
on receiving the marker along channel c:
if q’s state is not recorded:
1. record q’s state, following the Marker-Sending Rule;
2. record c’s state = ∅;
else: c’s state is the sequence of messages
received along c since q recorded its state
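
The two rules can be exercised on a two-process token system. What follows is only a minimal sketch under stated assumptions, not the lecture's own implementation: the `Process` class, the `MARKER` sentinel, the integer token counts, and the channel directions (C1 from p to q, C2 from q to p) are all chosen here for illustration.

```python
from collections import deque

MARKER = "MARKER"  # hypothetical sentinel standing in for the marker message


class Process:
    def __init__(self, name, state):
        self.name = name
        self.state = state          # application state (here: number of tokens held)
        self.out = {}               # outgoing channels: name -> deque (FIFO)
        self.inc = {}               # incoming channels: name -> deque (FIFO)
        self.recorded = False       # has this process recorded its state?
        self.recorded_state = None
        self.channel_state = {}     # incoming channel name -> recorded messages
        self.recording = set()      # incoming channels still being recorded

    def record_and_send_markers(self):
        """Marker-sending rule: record own state, then send a marker on every
        outgoing channel before any other message."""
        self.recorded, self.recorded_state = True, self.state
        self.recording = set(self.inc)
        self.channel_state = {name: [] for name in self.inc}
        for chan in self.out.values():
            chan.append(MARKER)

    def send(self, chan_name, msg):
        self.out[chan_name].append(msg)

    def receive(self, chan_name):
        """Deliver the message at the head of an incoming channel."""
        msg = self.inc[chan_name].popleft()
        if msg == MARKER:
            # Marker-receiving rule.
            if not self.recorded:
                self.record_and_send_markers()  # this channel stays recorded as empty
            self.recording.discard(chan_name)   # stop recording this channel
        else:
            if chan_name in self.recording:
                self.channel_state[chan_name].append(msg)
            self.state += msg                   # application step: absorb the tokens
        return msg


c1, c2 = deque(), deque()           # assumed directions: C1 is p -> q, C2 is q -> p
p, q = Process("p", 1), Process("q", 0)
p.out["C1"], q.inc["C1"] = c1, c1
q.out["C2"], p.inc["C2"] = c2, c2

p.state -= 1
p.send("C1", 1)                     # p sends its token along C1
q.record_and_send_markers()         # q initiates the snapshot
q.receive("C1")                     # token arrives and is recorded as part of C1
p.receive("C2")                     # marker arrives: p records its state
q.receive("C1")                     # marker arrives: q stops recording C1

snapshot = {"p": p.recorded_state, "q": q.recorded_state,
            "C1": q.channel_state["C1"], "C2": p.channel_state["C2"]}
print(snapshot)  # {'p': 0, 'q': 0, 'C1': [1], 'C2': []}
```

In this run the single token is recorded exactly once, as a message in transit on C1, so the recorded state contains neither zero nor two tokens.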
26

Termination
Assumption: No marker remains forever in an
input channel

Claim: If the graph is strongly connected and
at least one process records its state,
then all processes will record their state
in finite time
Proof: by induction on the distance from a process that has recorded its state
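
The termination claim can be illustrated by following only the markers. This is a rough sketch, assuming a hypothetical adjacency-list encoding of the graph and a single global delivery queue (one possible schedule in which no marker is delayed forever); the function name `recording_order` is not from the slides.

```python
from collections import deque


def recording_order(graph, initiator):
    """Propagate markers from `initiator`; return processes in the order they record.

    `graph` maps each process to the list of processes its outgoing channels lead to.
    """
    recorded = [initiator]
    seen = {initiator}
    # Markers in transit, as (sender, receiver) pairs.
    pending = deque((initiator, nxt) for nxt in graph[initiator])
    while pending:
        _, dst = pending.popleft()      # deliver one marker
        if dst not in seen:             # first marker: record state, send markers on
            seen.add(dst)
            recorded.append(dst)
            pending.extend((dst, nxt) for nxt in graph[dst])
    return recorded


ring = {"p0": ["p1"], "p1": ["p2"], "p2": ["p0"]}       # strongly connected
print(recording_order(ring, "p0"))                      # ['p0', 'p1', 'p2']

broken = {"a": ["b"], "b": [], "c": ["a"]}              # not strongly connected
print(recording_order(broken, "a"))                     # ['a', 'b']
```

In the second graph no channel leads to c, so c never receives a marker; this is why the claim requires strong connectivity.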
27

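The Recorded Global State

Ex:

[Figure: distributed system with processes p and q and channels C1 and C2. State transitions: p has states A and B, q has states C and D; in each process the transitions are labeled send and receive.]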
28

[Figure: the example system and a sequence of global states under the events p sends, q sends, p receives]
29

What did we get?

30

Event
• An event e in process p is an atomic action:
it can change the state of p, and
the state of at most one channel c incident on p
(by sending/receiving a message M along c)
• e is defined by < p, s, s’, M, c >
• e = < p, s, s’, M, c > may occur in global state S
if 1. the state of p in S is s
2. if c is directed towards p, then c’s state
has M at its head

31

Process State and Global State
• A process: a set of states, an initial state,
and a set of events
• A global state S: a collection of process states
and channel states
• Initially, each process is in its initial state and all
channels are empty

• next(S, e) is the global state after event e is
applied to global state S

32

Process State and Global State

• seq = (ei : i = 0…n) is a computation of
the system iff

ei may occur in Si, 0 ≤ i ≤ n

Si+1 = next(Si, ei)
(S0 is the initial global state)
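
The definitions of an event, of next(S, e), and of a computation can be written down as a small sketch. The encoding is an assumption made here for illustration: a global state is a pair (process states, channel contents), and the tuple < p, s, s’, M, c > is extended with a hypothetical `direction` field saying whether p sends or receives M on c. The concrete transitions (p: A to B on send, B to A on receive; q: C to D on send) are likewise assumed.

```python
from collections import namedtuple

# Hypothetical encoding of an event <p, s, s', M, c> plus a send/recv direction.
Event = namedtuple("Event", "p s s_next M c direction")


def may_occur(S, e):
    """Check whether event e may occur in global state S."""
    procs, chans = S
    if procs[e.p] != e.s:
        return False                      # p is not in state s
    if e.direction == "recv":
        # c is directed towards p: M must be at the head of c.
        return len(chans[e.c]) > 0 and chans[e.c][0] == e.M
    return True


def next_state(S, e):
    """Return next(S, e) without modifying S."""
    procs, chans = S
    procs = {**procs, e.p: e.s_next}
    if e.direction == "send":
        chans = {**chans, e.c: chans[e.c] + (e.M,)}
    elif e.direction == "recv":
        chans = {**chans, e.c: chans[e.c][1:]}
    return (procs, chans)


def is_computation(S0, seq):
    """seq is a computation iff each ei may occur in Si, with Si+1 = next(Si, ei)."""
    S = S0
    for e in seq:
        if not may_occur(S, e):
            return False
        S = next_state(S, e)
    return True


S0 = ({"p": "A", "q": "C"}, {"C1": (), "C2": ()})
seq = [
    Event("p", "A", "B", "M", "C1", "send"),    # p sends
    Event("q", "C", "D", "M2", "C2", "send"),   # q sends
    Event("p", "B", "A", "M2", "C2", "recv"),   # p receives
]
print(is_computation(S0, seq))                       # True
print(is_computation(S0, [seq[1], seq[0], seq[2]]))  # True
print(is_computation(S0, seq[::-1]))                 # False
```

The second call swaps two adjacent events that occur in different processes and do not form a send/receive pair, and the result is still a computation; the reversed sequence is not, because p cannot receive before anything was sent.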

33

The Recorded Global State

seq = (ei: i ≥ 0) a distributed computation
Si – the state of the system right before ei occurs
S0 – the initial state of the system
St – the state of the system at the termination of
the algorithm
S* - the recorded global state

34

Definition
Event ej is called pre-recording if ej is in a
process p and p records its state after ej in seq.
Event ej is called post-recording if ej is in a
process p and p records its state before ej in seq.
Assume that ej-1 is a post-recording event that immediately precedes a
pre-recording event ej in seq.

35

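Claim: The sequence obtained by interchanging
ej-1 and ej is a computation.
Proof: ej-1 occurs in p and ej in q (other than p): within a single process, a post-recording event cannot precede a pre-recording event.
There cannot be a message sent at ej-1 and
received at ej: such a message is sent after p’s marker, so by FIFO q would receive the marker (and record its state) first, making ej post-recording.
Hence, event ej can occur in global state Sj-1.
The state of process p is not altered by ej,
hence ej-1 can occur after ej.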
36

Proof
Swap such events until all post-recording events
appear after all pre-recording events.
The resulting computation is seq’.
All that is left to show: S* is the global state
after all pre-recording events and before all
post-recording events.

1. Process states
2. Channel states

37

Claim: The state of a channel in S* is
(sequence of messages corresponding to pre-recording
sends) minus (sequence of messages corresponding to
pre-recording receives)
Proof: The state of channel c from process
p to process q recorded in S* is the sequence
of messages received on c by q after q
records its state and before q receives a
marker on c.
The sequence of messages sent by p before the marker is the
sequence corresponding to pre-recording sends on c.

38

[Figure: the same execution with each event labeled: p sends (post), q sends (pre), p receives (post)]
39

(Another execution)

[Figure: the same events reordered and labeled: q sends (pre), p sends (post), p receives (post)]
40

What did we get?
A configuration that could have happened

41

(The Recorded Global State)

seq = (ei: i ≥ 0) a distributed computation
Si – the state of the system right before ei occurs
S0 – the initial state of the system
St – the state of the system at the termination of
the algorithm
S* - the recorded global state

42

(The Recorded Global State)

Theorem: There exists a computation
seq' = (ej' : j ≥ 0), where
(i is the index at which the snapshot algorithm starts, t the index at which it terminates)
1. For all j < i or j ≥ t : ej' = ej
2. The subsequence (ej' | i ≤ j < t) is a
permutation of the subsequence
(ej | i ≤ j < t)
3. For all j < i or j ≥ t : Sj' = Sj
4. There exists k, i ≤ k ≤ t, such that S* = Sk'

Stable Detection

D - distributed system
y - a predicate function defined on the set of
global states of D
S, S’ – global states of D
y is a stable property of D if y(S) implies y(S’)
for all S’ reachable from S
44

Algorithm
Input: A stable property y
Output: a boolean value b with the
property:
y(S0) ⇒ b and b ⇒ y(St)
Algorithm:
begin
record a global state S*
b := y(S*)
end
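
The detection algorithm is short enough to sketch directly. The names below are illustrative assumptions: `record_global_state` stands for any procedure that returns a recorded global state S* (for example a snapshot run), and `terminated` is one possible predicate, which is stable only under the usual assumption that a passive process becomes active solely by receiving a message.

```python
def detect_stable(record_global_state, y):
    """Stable-property detection: b := y(S*), where S* is a recorded global state."""
    s_star = record_global_state()
    return y(s_star)


def terminated(S):
    """Example predicate: every process is passive and every channel is empty."""
    procs, chans = S
    return (all(state == "passive" for state in procs.values())
            and all(len(msgs) == 0 for msgs in chans.values()))


# Two hypothetical recorded global states (process states, channel states).
snap_running = ({"p": "passive", "q": "passive"}, {"C1": ["work"], "C2": []})
snap_done = ({"p": "passive", "q": "passive"}, {"C1": [], "C2": []})

print(detect_stable(lambda: snap_running, terminated))  # False
print(detect_stable(lambda: snap_done, terminated))     # True
```

A result of True means the property holds at termination of the algorithm; a result of False only means it did not yet hold when the algorithm started.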
45

Correctness
1. S* is reachable from S0
2. St is reachable from S*
3. y(S) ⇒ y(S’)
for all S’ reachable from S

[Figure: S0, S*, St in order of reachability]
y(S*) = true ⇒ y(St) = true
y(S*) = false ⇒ y(S0) = false

46

References
K. M. Chandy and L. Lamport,
Distributed Snapshots: Determining Global States of Distributed Systems

47

Slide 4

©

Distributed Computing
5. Snapshot
Shmuel Zaks
[email protected]

The snapshot algorithm (Candy and Lamport)

2

3

4

Goal: design a snapshot (=global-statedetection) algorithm that:

 will record a collection of states of all system
components (which forms a global system
state),
 will not change the underlying computation,
 will not freeze the underlying computation
5

A Process Can…
 record its own state,
 send and receive messages,
 record messages it sends and receives,
 cooperate with other processes

Processes do not share clocks or memory
 Processes cannot record their state
precisely at the same instant

6

Motivation

 Many problems in distributed systems
can be stated in terms of the problem of
detecting global states:
Stable property detection problems :
termination detection,
deadlock detection
etc.
 Checkpointing
7

Stable Property Detection Problem
D - distributed system
y - a predicate function defined on the set
of global states of D
S, S’ – global states of D
y is stable if y(S) implies y(S’) for all S’
reachable from S
8

 many distributed algorithms are
structured as a sequence of phases
 A phase: transient part, then a stable part
phase termination vs. computation
termination
 our view on the problem:
i. detect the termination of a phase
ii. initiate a new phase
Notice that “the kth phase has terminated” is
a stable property
9

Model
 Distributed system D is a
finite, labeled, directed
graph.

C1

p

q
C2

 Channels have infinite buffers, are errorfree and preserve FIFO
 Message delay is bounded, but unknown
10

State of a Channel

C1
p

C1
p

3

2

1

q

C2
q

1

 [1, 2, 3] – sequence X of messages that
were sent
 [1] – sequence Y of received messages
( prefix of X )
 [2, 3] – state of C1: X \ Y

11

Example:

System

Distributed system:

C1
p

q
C2

State transitions

receive

A

B

(same for p and q):

send

Initial global state:

Ø

B

A
Ø
12

A

receive

C1

p

B

q

C2

send

Global state transition diagram

p

B

q

Ø
A

p sends

p

q

A

A

Ø

Ø

p receives

Ø
A

A

q receives
q sends

Ø
A

B

Ø
A computation corresponds to a path in the diagram
deterministic

13

Example:

System

p

Distributed system:
State transition:

p:

q:

C1

q

C2

A

C

send
receive
send

receive

B

D
14

A

send

B

receive

p

C1

q

C2

C

send

D

receive

Global state transition diagram

p
A

q

Ø

C

p sends

p

q

B

C

Ø

Ø
q receives

A

D

q sends
p receives

B

D

Ø

non-deterministic

15

A

send

B

receive

p

C1

q

C2

C

send

D

receive

We look at the following sequence of events:

p

A

q

Ø

C

p sends

p

q

B

C

Ø

Ø
q sends

A

D

p receives

B

D

Ø
16

in the snapshot algorithm:
Each process records its own state
p and q cooperate to record the state of C.

p

C

q

17

A

receive
B

p

C1

q

C2

send

Example:

System

p
B

p

q

q

Ø

Recorded state:

No token

C

A

A

Record C

Ø
A

A

A

Record q
Record p

18

A

receive
B

p

C1

q

C2

send

Example:

System

p
B

C1

p

q

q

Ø
A

A
Ø

Recorded state:

Two tokens

A

Ø

B

Record p
A

Record C
Record q

19

Record p
Record C
Record q
P’s state
recorded

P sends a
message on C

C’s state
recorded

time

C’s state
recorded

P sends a
message on C

P’s state
recorded

Record C
Record q
Record p
20

p

C

q

q will record the state of C
p and q have to coordinate ; using a special marker
q starts recording C after it records its state
q stops when receiving

from p

But: how does q know when to record its state?
21

The snapshot algorithm

Who starts?
We assume one process.

Hw: extend discussion + proof to any number
of startes.

22

(Intuition for the Algorithm)

[Figure: channel C from p to q]

• Who will record the state of channel C? q
• q starts recording after it records its state
• How does q know when to stop recording?
• p sends a marker right after it records its state, and before sending any other message
23

The snapshot algorithm
channel recording

[Figure: channel C from p to q]

Starts when q records its own state.
Ends when q receives the marker along C.

Note: for any q ≠ p0, the channel along which the marker arrived first is recorded as Ø.

24

The snapshot algorithm
State recording
p0 starts.
p0 records its state, and then broadcasts the marker.

Shout-algorithm = PI (Propagation-of-information) = hot potato = …
When q receives the marker for the first time, it records its own state.
25

The snapshot algorithm
Marker-Sending Rule for a process p
1. record the state of p
2. send the marker along each outgoing channel c before sending any other message along c
Marker-Receiving Rule for a process q
on receiving the marker along channel c:
if q’s state is not recorded:
1. record q’s state (following the Marker-Sending Rule);
2. record c’s state = Ø;
else: c’s state is the sequence of messages received along c since q recorded its state
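
The two marker rules can be tried out on the two-process example with a small simulation. The following is a minimal sketch, not the authors' implementation: the class `Process`, the helper `connect`, the sentinel `MARKER` and the message names `M` and `M'` are all assumptions made for illustration, and C1 is assumed to run from p to q and C2 from q to p.

```python
from collections import deque

MARKER = object()  # hypothetical sentinel standing in for the marker


class Process:
    def __init__(self, name, state):
        self.name, self.state = name, state
        self.incoming = {}    # channel name -> FIFO queue this process reads
        self.outgoing = {}    # channel name -> FIFO queue this process writes
        self.recorded = None  # recorded process state (None = not recorded yet)
        self.channels = {}    # incoming channel name -> recorded messages
        self.open = set()     # incoming channels still being recorded

    def record(self):
        """Marker-sending rule: record own state, then send the marker
        on every outgoing channel before any other message."""
        self.recorded = self.state
        self.channels = {c: [] for c in self.incoming}
        self.open = set(self.incoming)
        for queue in self.outgoing.values():
            queue.append(MARKER)

    def send(self, channel, message, new_state):
        self.outgoing[channel].append(message)
        self.state = new_state

    def receive(self, channel, new_state=None):
        message = self.incoming[channel].popleft()
        if message is MARKER:
            # Marker-receiving rule.
            if self.recorded is None:
                self.record()           # first marker: this channel stays empty
            self.open.discard(channel)  # stop recording this channel
        else:
            if channel in self.open:
                self.channels[channel].append(message)
            self.state = new_state
        return message


def connect(src, dst, name):
    queue = deque()
    src.outgoing[name] = queue
    dst.incoming[name] = queue


p, q = Process("p", "A"), Process("q", "C")
connect(p, q, "C1")
connect(q, p, "C2")

p.record()               # p0 = p starts: records A, marker goes onto C1
p.send("C1", "M", "B")   # p sends
q.send("C2", "M'", "D")  # q sends
p.receive("C2", "A")     # p receives M' while it is recording C2
q.receive("C1")          # q gets the marker: records D, C1 recorded as empty
q.receive("C1", "C")     # q receives M after recording its state
p.receive("C2")          # p gets the marker: recording of C2 stops

print(p.recorded, q.recorded, p.channels, q.channels)
```

In this run the recorded global state has p in A, q in D, C1 empty and M' in C2, a combination that never occurs during the run itself.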
26

Termination
Assumption: No marker remains forever in an input channel.

Claim: If the graph is strongly connected and at least one process records its state, then all processes will record their state in finite time.
Proof: by induction on the distance from a process that has recorded its state.
27

The Recorded Global State

Ex:

System: processes p and q; channels C1 and C2.

State transition:
p: send takes A to B, receive takes B to A.
q: send takes C to D, receive takes D to C.
28

[Figure: the computation p sends, q sends, p receives, starting with p in A and q in C, with the snapshot algorithm running concurrently. The recorded process states are A for p and D for q.]
29

What did we get?

30

• Event e in process p is an atomic action:
it can change the state of p, and
the state of at most one channel c incident on p
(by sending/receiving message M along c)
• e is defined by <p, s, s’, M, c>
• e = <p, s, s’, M, c> may occur in global state S
if 1. the state of p in S is s
2. if c is directed towards p then c’s state has M at its head

31

Process State and Global State
• A process: a set of states, an initial state, a set of events
• A global state S: collection of process states and channel states
• initially, each process is in its initial state and all channels are empty

• next(S, e) is the global state after event e is applied to global state S

32

Process State and Global State

• seq = (e_i : i = 0…n) is a computation of the system iff

e_i may occur in S_i, 0 ≤ i ≤ n

S_{i+1} = next(S_i, e_i)
(S_0 is the initial global state)
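
The definitions of an event, of next(S, e) and of a computation can be made concrete as follows. This is a sketch under assumptions: the names `Event`, `CHANNELS`, `may_occur`, `next_state` and `is_computation` are hypothetical, a global state is represented as a pair (process states, channel contents), and the topology is the two-process example with C1 from p to q and C2 from q to p.

```python
from typing import NamedTuple, Optional


class Event(NamedTuple):
    p: str            # process in which the event occurs
    s: str            # state of p before the event
    s2: str           # state of p after the event (s' on the slide)
    M: Optional[str]  # message sent or received, if any
    c: Optional[str]  # channel altered by the event, if any


# Hypothetical topology: channel name -> (sender, receiver)
CHANNELS = {"C1": ("p", "q"), "C2": ("q", "p")}


def may_occur(S, e):
    procs, chans = S
    if procs[e.p] != e.s:
        return False
    if e.c is not None and CHANNELS[e.c][1] == e.p:
        # c is directed towards p: M must be at the head of c.
        return len(chans[e.c]) > 0 and chans[e.c][0] == e.M
    return True


def next_state(S, e):
    procs, chans = S
    procs = {**procs, e.p: e.s2}
    chans = dict(chans)
    if e.c is not None:
        if CHANNELS[e.c][1] == e.p:
            chans[e.c] = chans[e.c][1:]       # receive: remove M from the head
        else:
            chans[e.c] = chans[e.c] + (e.M,)  # send: append M at the tail
    return procs, chans


def is_computation(S0, seq):
    S = S0
    for e in seq:
        if not may_occur(S, e):
            return False
        S = next_state(S, e)
    return True


S0 = ({"p": "A", "q": "C"}, {"C1": (), "C2": ()})
p_sends = Event("p", "A", "B", "M", "C1")
q_sends = Event("q", "C", "D", "M'", "C2")
p_receives = Event("p", "B", "A", "M'", "C2")

print(is_computation(S0, [p_sends, q_sends, p_receives]))  # True
print(is_computation(S0, [p_receives, p_sends, q_sends]))  # False
```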

33

The Recorded Global State

seq = (e_i : i ≥ 0) a distributed computation
S_i – the state of the system right before e_i occurs
S_0 – the initial state of the system
S_t – the state of the system at the termination of the algorithm
S* – the recorded global state

34

Definition
Event e_j is called pre-recording if e_j is in a process p and p records its state after e_j in seq.
Event e_j is called post-recording if e_j is in a process p and p records its state before e_j in seq.
Assume that e_{j-1} is a post-recording event before a pre-recording event e_j in seq.

35

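Claim: The sequence obtained by interchanging e_{j-1} and e_j is a computation.
Proof: e_{j-1} occurs in p and e_j in q (other than p).
There cannot be a message sent at e_{j-1} and received at e_j: p sent the marker before that message, so by FIFO q would have recorded its state before receiving it, and e_j would be post-recording.
Hence, event e_j can occur in global state S_{j-1}.
The state of process p is not altered by e_j, hence e_{j-1} can occur after e_j.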
36

Proof
Swap events until all post-recording events appear after all pre-recording events.
The resulting computation is seq'.
All that is left to show: S* is the global state after all pre-recording events and before all post-recording events.

1. Process states
2. Channel states

37

Claim: The state of a channel in S* is
(sequence of messages corresponding to pre-recording sends) \ (sequence of messages corresponding to pre-recording receives)
Proof: The state of channel c from process p to process q recorded in S* is the sequence of messages received on c by q after q records its state and before q receives a marker on c.
The sequence of messages sent by p before the marker is the sequence corresponding to pre-recording sends on c.
By FIFO, removing from it the messages q received before recording its state (the pre-recording receives) leaves exactly the recorded sequence.

38

[Figure: the computation p sends, q sends, p receives, with each event classified: p sends is post-recording, q sends is pre-recording, p receives is post-recording.]
39

(Another execution)

[Figure: the same events reordered as q sends (pre-recording), p sends (post-recording), p receives (post-recording).]
40

What did we get?
A configuration that could have happened

41

(The Recorded Global State)

seq = (e_i : i ≥ 0) a distributed computation
S_i – the state of the system right before e_i occurs
S_0 – the initial state of the system
S_t – the state of the system at the termination of the algorithm
S* – the recorded global state

42

(The Recorded Global State)

Theorem: Let i be the index at which the algorithm is initiated and t the index at which it terminates. There exists a computation
seq' = (e_j' : j ≥ 0), where
1. For all j < i or j ≥ t: e_j' = e_j
2. The subsequence (e_j' | i ≤ j < t) is a permutation of the subsequence (e_j | i ≤ j < t)
3. For all j < i or j ≥ t: S_j' = S_j
4. There exists k, i ≤ k ≤ t, such that S* = S_k'

Stable Detection

D - distributed system
y - a predicate function defined on the set of
global states of D
S, S’ – global states of D
y is a stable property of D if y(S) implies y(S’)
for all S’ reachable from S
44

Algorithm
Input: A stable property y
Output: a boolean value b with the property:
y(S_0) ⇒ b and b ⇒ y(S_t)
Algorithm:
begin
record a global state S*
b := y(S*)
end
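
The detection algorithm is a single evaluation of y on the recorded state. Below is a minimal sketch assuming the snapshot is available as a pair (process states, channel contents). The predicate `terminated`, the function `detect` and the state label "passive" are hypothetical names chosen for illustration.

```python
def terminated(S):
    """Hypothetical stable predicate: every process is passive
    and every channel is empty."""
    process_states, channel_states = S
    return (all(state == "passive" for state in process_states.values())
            and all(len(msgs) == 0 for msgs in channel_states.values()))


def detect(y, record_global_state):
    """Record a global state S* and return b := y(S*)."""
    s_star = record_global_state()
    return y(s_star)


# Two hand-written snapshots standing in for the output of the snapshot algorithm.
quiet = ({"p": "passive", "q": "passive"}, {"C1": [], "C2": []})
busy = ({"p": "passive", "q": "passive"}, {"C1": ["M"], "C2": []})

print(detect(terminated, lambda: quiet))  # True
print(detect(terminated, lambda: busy))   # False
```

A message still in transit makes the predicate false even when both processes are passive, which is why the recorded channel states are part of S*.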
45

Correctness
1. S* is reachable from S_0
2. S_t is reachable from S*
3. y(S) ⇒ y(S’) for all S’ reachable from S

[Figure: S_0, S* and S_t along a computation. If y(S*) = true then y(S_t) = true; if y(S*) = false then y(S_0) = false.]

46

References
K. M. Chandy and L. Lamport, Distributed Snapshots: Determining Global States of Distributed Systems

47

Slide 7

©

Distributed Computing
5. Snapshot
Shmuel Zaks
[email protected]

The snapshot algorithm (Candy and Lamport)

2

3

4

Goal: design a snapshot (=global-statedetection) algorithm that:

 will record a collection of states of all system
components (which forms a global system
state),
 will not change the underlying computation,
 will not freeze the underlying computation
5

A Process Can…
 record its own state,
 send and receive messages,
 record messages it sends and receives,
 cooperate with other processes

Processes do not share clocks or memory
 Processes cannot record their state
precisely at the same instant

6

Motivation

 Many problems in distributed systems
can be stated in terms of the problem of
detecting global states:
Stable property detection problems :
termination detection,
deadlock detection
etc.
 Checkpointing
7

Stable Property Detection Problem
D - distributed system
y - a predicate function defined on the set
of global states of D
S, S’ – global states of D
y is stable if y(S) implies y(S’) for all S’
reachable from S
8

 many distributed algorithms are
structured as a sequence of phases
 A phase: transient part, then a stable part
phase termination vs. computation
termination
 our view on the problem:
i. detect the termination of a phase
ii. initiate a new phase
Notice that “the kth phase has terminated” is
a stable property
9

Model
 Distributed system D is a
finite, labeled, directed
graph.

C1

p

q
C2

 Channels have infinite buffers, are errorfree and preserve FIFO
 Message delay is bounded, but unknown
10

State of a Channel

C1
p

C1
p

3

2

1

q

C2
q

1

 [1, 2, 3] – sequence X of messages that
were sent
 [1] – sequence Y of received messages
( prefix of X )
 [2, 3] – state of C1: X \ Y

11

Example:

System

Distributed system:

C1
p

q
C2

State transitions

receive

A

B

(same for p and q):

send

Initial global state:

Ø

B

A
Ø
12

A

receive

C1

p

B

q

C2

send

Global state transition diagram

p

B

q

Ø
A

p sends

p

q

A

A

Ø

Ø

p receives

Ø
A

A

q receives
q sends

Ø
A

B

Ø
A computation corresponds to a path in the diagram
deterministic

13

Example:

System

p

Distributed system:
State transition:

p:

q:

C1

q

C2

A

C

send
receive
send

receive

B

D
14

A

send

B

receive

p

C1

q

C2

C

send

D

receive

Global state transition diagram

p
A

q

Ø

C

p sends

p

q

B

C

Ø

Ø
q receives

A

D

q sends
p receives

B

D

Ø

non-deterministic

15

A

send

B

receive

p

C1

q

C2

C

send

D

receive

We look at the following sequence of events:

p

A

q

Ø

C

p sends

p

q

B

C

Ø

Ø
q sends

A

D

p receives

B

D

Ø
16

in the snapshot algorithm:
Each process records its own state
p and q cooperate to record the state of C.

p

C

q

17

A

receive
B

p

C1

q

C2

send

Example:

System

p
B

p

q

q

Ø

Recorded state:

No token

C

A

A

Record C

Ø
A

A

A

Record q
Record p

18

A

receive
B

p

C1

q

C2

send

Example:

System

p
B

C1

p

q

q

Ø
A

A
Ø

Recorded state:

Two tokens

A

Ø

B

Record p
A

Record C
Record q

19

Record p
Record C
Record q
P’s state
recorded

P sends a
message on C

C’s state
recorded

time

C’s state
recorded

P sends a
message on C

P’s state
recorded

Record C
Record q
Record p
20

p

C

q

q will record the state of C
p and q have to coordinate ; using a special marker
q starts recording C after it records its state
q stops when receiving

from p

But: how does q know when to record its state?
21

The snapshot algorithm

Who starts?
We assume one process.

Hw: extend discussion + proof to any number
of startes.

22

(Intuition for the
Algorithm)
p

C

q

 Who will record the state of channel C? q
 q starts recording after it records its state

 How q knows when to stop recording?
 p sends
right after it records
its state, and before sending any other message
23

The snapshot algorithm
channel recording

C

p

q

Starts when q records itself
Ends when q receives

along C

Note : for any q  p0, the channel along
which
arrived first is recorded as 

24

The snapshot algorithm
State recording
p0 starts.
p0 recoreds its state,
and then broadcasts

.

Shout-algorithm =
PI (Propogation-of-information)=
hot potato = …
When q receives
for the first time, it
records its own state
25

The snapshot algorithm
Marker-Sending Rule for a process q
1. record the state of p
2. send
along c before sending
any other message
Marker-Receiving Rule for a process q
on receiving
along channel c:
if q’s state is not recorded:
1. record state;
2. record c’s state = ;
else: c’s state is the sequence of messages
received since q recorded its state
26

Termination
Assumption No marker remains forever in an
input channel

Claim: If the graph is strongly connected and
at least one process records its state,
then all processes will record their state
in finite time
Proof: by induction
27

The Recorded Global State

Ex:

System

C1

p

q

C2
State transition:

p:

q:

A

C

send
receive
send

receive

B

D
28

A

send

B

receive

p

C1

q

C2

C

send

D

receive


A

D

p
A

q



C

p sends



p

q

B
A

C


q sends

A

D



p receives

B

D
29

What did we get?

30

e in process p is an atomic action:
can change the state of p, and
a state of at most one channel c incident on p
(by sending/receiving message M along c )
 e is defined by < p, s, s’, M, c >
 e = may occur in global state S
if 1. the state of p in S is s
2. if c is directed towards p then c’s state
has M in its head
 Event

31

Process State and Global State
 A process: set of states, an initial state
set of events
 A global state S: collection of process states
and channel states
 initially, each process is in its initial state and all
channels are empty

 next(S, e) is the global state after event e in
applied to global state S

32

Process State and Global State

 seq = (ei : i = 0…n) is a computation of
the system iff

ei may occur in Si , 0  i  n

Si+1 = next(Si, ei)
(S0 is the initial global state)

33

The Recorded Global State

seq = (ei: i ≥ 0) a distributed computation
Si – the state of the system right before ei occurs
S0 – the initial state of the system
St – the state of the system at the termination of
the algorithm
S* - the recorded global state

34

Definition
Event ej is called pre-recording if ej is in a
process p and p records its state after ej in seq .
Event ej is called post-recording if ej is in a
process p and p records its state before ej in seq .
Assume that ej-1 is a post-recording event before
Pre-recording event ej in seq.

35

Claim: Sequence obtained by interchanging

ej-1 and ej is a computation.
Proof: ej-1 occurs in p and ej in q (other than p).
There cannot be a message sent at ej-1 and
received at ej.
Hence, event ej can occur in global state Sj-1.
The state of process p is not altered by ej,
hence ej-1 can occur after ej.
36

Proof
Swap the events till all post-recorded events
appear after all pre-recorded events.
The acquired computation is seq’.
All that is left to show: S* is a global state
after all prerecorded events and before all
postrecorded events.

1. Process states
2. Channel states

37

Claim: The state of a channel in S* is
(sequence of messages corresp. to pre-recorded
receives)(sequence of messages corresp. to
prerecorded sends)
Proof: The state of channel c from process
p to process q recorded in S* is the sequence
of messages received on c by q after q
records its state and before q receives a
marker on c.
The sequence of messages sent by p is the
sequence corres. to prerecording sends on c.

38

A

send

B

receive

p

C1

q

C2

C

send

D

receive


A

D

p
A

q



C



p sends

p

q

B

C

post
pre

q sends

A

D



p receives

post

B

D
39

A

send

B

receive

A

p

C1

q

C2

C

send

D

receive


D
p
A

(Another execution)
p
q

q



C



q sends

A



D

pre
post

p sends

A



D

p receives

post

A
B

D
40

What did we get?
A configuration that could have happened

(The Recorded Global State)

seq = (e_i : i ≥ 0) – a distributed computation
S_i – the state of the system right before e_i occurs
S_0 – the initial state of the system
S_t – the state of the system at the termination of the algorithm
S* – the recorded global state

(The Recorded Global State)

Theorem: Let the snapshot algorithm be initiated in global state S_i and terminate in S_t. There exists a computation
seq' = (e'_j : j ≥ 0), where
1. For all j < i or j ≥ t: e'_j = e_j
2. The subsequence (e'_j | i ≤ j < t) is a permutation of the subsequence (e_j | i ≤ j < t)
3. For all j < i or j ≥ t: S'_j = S_j
4. There exists k, i ≤ k ≤ t, such that S* = S'_k
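The theorem can be checked by hand on the two-process example. The sketch below is a toy simulation under stated assumptions: the transition tables (p: A sends and moves to B, B receives and moves to A; q: C sends and moves to D, D receives and moves to C), the message names M and M', and the recorded state S* (p in A, q in D, C1 empty, C2 holding M') are read off the example figures and are not given explicitly in the text.

```python
# Assumed transition tables: state -> (action, next state).
P = {"A": ("send", "B"), "B": ("receive", "A")}
Q = {"C": ("send", "D"), "D": ("receive", "C")}


def step(state, event):
    """Apply one event to a global state (p, q, C1, C2).
    C1 carries messages from p to q, C2 from q to p."""
    p, q, c1, c2 = state
    proc, action = event
    table, local = (P, p) if proc == "p" else (Q, q)
    expected, nxt = table[local]
    if expected != action:
        raise ValueError(f"{proc} cannot {action} in state {local}")
    if proc == "p":
        if action == "send":
            return (nxt, q, c1 + ("M",), c2)
        if not c2:
            raise ValueError("C2 is empty")
        return (nxt, q, c1, c2[1:])
    if action == "send":
        return (p, nxt, c1, c2 + ("M'",))
    if not c1:
        raise ValueError("C1 is empty")
    return (p, nxt, c1[1:], c2)


def run(events, initial=("A", "C", (), ())):
    """Return the list of global states S_0, S_1, ... along events."""
    states = [initial]
    for e in events:
        states.append(step(states[-1], e))
    return states


seq = [("p", "send"), ("q", "send"), ("p", "receive")]
seq_prime = [("q", "send"), ("p", "send"), ("p", "receive")]
S_star = ("A", "D", (), ("M'",))  # assumed recorded state

states = run(seq)
states_prime = run(seq_prime)
print(S_star in states)             # S* does not occur in seq
print(S_star == states_prime[1])    # S* = S'_1
print(states[3] == states_prime[3]) # both computations end in the same state
```

Here i = 0 and t = 3: the recorded state never occurs in seq, but it is the state S'_1 of the permuted computation, and the two computations agree from S_t onward.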

Stable Detection

D - distributed system
y - a predicate function defined on the set of
global states of D
S, S’ – global states of D
y is a stable property of D if y(S) implies y(S’)
for all S’ reachable from S
Algorithm
Input: A stable property y
Output: a boolean value b with the property:
y(S_0) ⇒ b and b ⇒ y(S_t)
Algorithm:
begin
record a global state S*
b := y(S*)
end
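A minimal Python rendering of this algorithm is shown below. The names detect_stable and terminated, and the dictionary layout of a global state, are assumptions made for illustration; recording the global state is passed in as a function, standing in for a run of the snapshot algorithm.

```python
def detect_stable(y, record_global_state):
    """Evaluate the stable property y on a recorded global state S*."""
    s_star = record_global_state()
    return y(s_star)


def terminated(state):
    """Example stable property: every process is idle and every
    channel is empty (hypothetical state layout)."""
    all_idle = all(s == "idle" for s in state["processes"].values())
    all_empty = all(len(msgs) == 0 for msgs in state["channels"].values())
    return all_idle and all_empty


quiet = {"processes": {"p": "idle", "q": "idle"},
         "channels": {"C1": [], "C2": []}}
in_transit = {"processes": {"p": "idle", "q": "idle"},
              "channels": {"C1": ["M"], "C2": []}}

print(detect_stable(terminated, lambda: quiet))
print(detect_stable(terminated, lambda: in_transit))
```

A result of true means the property holds in S_t; a result of false only says that the property did not hold in S_0, so the detection may have to be repeated later.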
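Correctness
1. S* is reachable from S_0
2. S_t is reachable from S*
3. y(S) ⇒ y(S') for all S' reachable from S

By 1 and 3, y(S_0) ⇒ y(S*); by 2 and 3, y(S*) ⇒ y(S_t). Since b = y(S*), this gives y(S_0) ⇒ b and b ⇒ y(S_t).

[Figure: S_0 → S* → S_t. If y(S*) = true then y(S_t) = true; if y(S*) = false then y(S_0) = false.]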

References
K. M. Chandy and L. Lamport, "Distributed Snapshots: Determining Global States of Distributed Systems"

Slide 11

©

Distributed Computing
5. Snapshot
Shmuel Zaks
[email protected]

The snapshot algorithm (Candy and Lamport)

2

3

4

Goal: design a snapshot (=global-statedetection) algorithm that:

 will record a collection of states of all system
components (which forms a global system
state),
 will not change the underlying computation,
 will not freeze the underlying computation
5

A Process Can…
 record its own state,
 send and receive messages,
 record messages it sends and receives,
 cooperate with other processes

Processes do not share clocks or memory
 Processes cannot record their state
precisely at the same instant

6

Motivation

 Many problems in distributed systems
can be stated in terms of the problem of
detecting global states:
Stable property detection problems :
termination detection,
deadlock detection
etc.
 Checkpointing
7

Stable Property Detection Problem
D - distributed system
y - a predicate function defined on the set
of global states of D
S, S’ – global states of D
y is stable if y(S) implies y(S’) for all S’
reachable from S
8

 many distributed algorithms are
structured as a sequence of phases
 A phase: transient part, then a stable part
phase termination vs. computation
termination
 our view on the problem:
i. detect the termination of a phase
ii. initiate a new phase
Notice that “the kth phase has terminated” is
a stable property
9

Model
 Distributed system D is a
finite, labeled, directed
graph.

C1

p

q
C2

 Channels have infinite buffers, are errorfree and preserve FIFO
 Message delay is bounded, but unknown
10

State of a Channel

C1
p

C1
p

3

2

1

q

C2
q

1

 [1, 2, 3] – sequence X of messages that
were sent
 [1] – sequence Y of received messages
( prefix of X )
 [2, 3] – state of C1: X \ Y

11

Example:

System

Distributed system:

C1
p

q
C2

State transitions

receive

A

B

(same for p and q):

send

Initial global state:

Ø

B

A
Ø
12

A

receive

C1

p

B

q

C2

send

Global state transition diagram

p

B

q

Ø
A

p sends

p

q

A

A

Ø

Ø

p receives

Ø
A

A

q receives
q sends

Ø
A

B

Ø
A computation corresponds to a path in the diagram
deterministic

13

Example:

System

p

Distributed system:
State transition:

p:

q:

C1

q

C2

A

C

send
receive
send

receive

B

D
14

A

send

B

receive

p

C1

q

C2

C

send

D

receive

Global state transition diagram

p
A

q

Ø

C

p sends

p

q

B

C

Ø

Ø
q receives

A

D

q sends
p receives

B

D

Ø

non-deterministic

15

A

send

B

receive

p

C1

q

C2

C

send

D

receive

We look at the following sequence of events:

p

A

q

Ø

C

p sends

p

q

B

C

Ø

Ø
q sends

A

D

p receives

B

D

Ø
16

in the snapshot algorithm:
Each process records its own state
p and q cooperate to record the state of C.

p

C

q

17

A

receive
B

p

C1

q

C2

send

Example:

System

p
B

p

q

q

Ø

Recorded state:

No token

C

A

A

Record C

Ø
A

A

A

Record q
Record p

18

Example: System

[Figure: in the single-token system, the state of p is recorded first, and the states of channel C1 and q are recorded later]

Recorded state: two tokens

[Figure: two timelines]

Order "record p, record C, record q":
P’s state recorded → P sends a message on C → C’s state recorded

Order "record C, record q, record p":
C’s state recorded → P sends a message on C → P’s state recorded

[Figure: process p connected to process q by channel C]

• q will record the state of C.
• p and q have to coordinate, using a special marker.
• q starts recording C after it records its own state.
• q stops recording when it receives the marker from p.

But: how does q know when to record its state?

The snapshot algorithm

Who starts?
We assume one process.

Homework: extend the discussion and the proof to any number of starters.

(Intuition for the Algorithm)

[Figure: process p connected to process q by channel C]

• Who will record the state of channel C? q.
• q starts recording after it records its own state.
• How does q know when to stop recording?
• p sends a marker right after it records its state, and before sending any other message.

The snapshot algorithm: channel recording

[Figure: process p connected to process q by channel C]

Recording of C starts when q records its own state.
It ends when q receives the marker along C.

Note: for any q ≠ p0, the channel along which the marker arrived first is recorded as Ø.

The snapshot algorithm: state recording

p0 starts: p0 records its state, and then broadcasts the marker.
(Shout algorithm = PI (propagation of information) = hot potato = …)
When q receives the marker for the first time, it records its own state.

The snapshot algorithm

Marker-Sending Rule for a process p:
1. record the state of p
2. send a marker along each outgoing channel c before sending
   any other message along c

Marker-Receiving Rule for a process q,
on receiving a marker along channel c:
if q’s state is not recorded:
    1. record q’s state (following the Marker-Sending Rule);
    2. record c’s state = Ø;
else: c’s state is the sequence of messages
    received along c since q recorded its state
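
The two rules can be tried out on the single-token system with a small Python simulation. This is a sketch under stated assumptions, not the authors' implementation: the `Process` class, the `MARKER` sentinel, and the use of a `deque` per channel are all hypothetical choices, and the application state is simply the number of tokens a process holds.

```python
from collections import deque

MARKER = "MARKER"  # hypothetical sentinel standing in for the marker message


class Process:
    def __init__(self, name, tokens):
        self.name = name
        self.state = tokens          # application state: number of tokens held
        self.out = {}                # outgoing channels: name -> deque (FIFO)
        self.inc = {}                # incoming channels: name -> deque (FIFO)
        self.recorded_state = None   # stays None until the process records itself
        self.channel_state = {}      # recorded state of each incoming channel
        self.recording = set()       # incoming channels still being recorded

    def record_and_send_markers(self):
        """Marker-sending rule: record own state, then send a marker on
        every outgoing channel before any other message."""
        self.recorded_state = self.state
        self.recording = set(self.inc)
        self.channel_state = {c: [] for c in self.inc}
        for ch in self.out.values():
            ch.append(MARKER)

    def send_token(self, channel):
        assert self.state > 0, "no token to send"
        self.state -= 1
        self.out[channel].append("token")

    def receive(self, channel):
        msg = self.inc[channel].popleft()
        if msg == MARKER:
            # Marker-receiving rule.
            if self.recorded_state is None:
                self.record_and_send_markers()  # this channel is recorded as empty
            self.recording.discard(channel)      # stop recording this channel
        else:
            self.state += 1
            if channel in self.recording:
                self.channel_state[channel].append(msg)


# Assumed topology: C1 goes from p to q, C2 goes from q to p.
c1, c2 = deque(), deque()
p, q = Process("p", 1), Process("q", 0)
p.out["C1"], q.inc["C1"] = c1, c1
q.out["C2"], p.inc["C2"] = c2, c2

p.send_token("C1")           # the token is in transit on C1
q.record_and_send_markers()  # q initiates the snapshot
q.receive("C1")              # token arrives after q recorded: part of C1's state
p.receive("C2")              # first marker at p: p records itself, C2 is empty
q.receive("C1")              # marker on C1 ends the recording of C1

tokens = (p.recorded_state + q.recorded_state
          + sum(len(m) for m in p.channel_state.values())
          + sum(len(m) for m in q.channel_state.values()))
print(tokens)  # 1
```

In this run both processes are recorded without a token and the token is recorded in channel C1, so the recorded global state contains exactly one token, unlike the uncoordinated recordings shown earlier.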

Termination
Assumption: no marker remains forever in an input channel.

Claim: If the graph is strongly connected and at least one process records its state, then all processes will record their state in finite time.
Proof: by induction on the length of a path from the process that recorded its state first.
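
The induction can be illustrated with a breadth-first traversal: a process at distance d from the initiator receives a marker once some process at distance d-1 has recorded its state. The Python sketch below is only an illustration; the function name `recording_hops` and the dictionary representation of the graph are assumptions, and hop counts say nothing about real delays, which are unknown.

```python
from collections import deque


def recording_hops(graph, initiator):
    """graph maps each process to the processes its outgoing channels lead to.
    Returns, for every process reached by a marker, the minimum number of
    channels a marker crosses before that process records its state."""
    hops = {initiator: 0}
    queue = deque([initiator])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in hops:          # first marker: v records its state
                hops[v] = hops[u] + 1
                queue.append(v)
    return hops


# A strongly connected ring p0 -> p1 -> p2 -> p0 (hypothetical example).
ring = {"p0": ["p1"], "p1": ["p2"], "p2": ["p0"]}
print(recording_hops(ring, "p0"))  # {'p0': 0, 'p1': 1, 'p2': 2}
```

If the graph were not strongly connected, some process could be missing from the result, which is why the claim needs that assumption.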
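
The Recorded Global State

Example: System

[Figure: distributed system with processes p and q connected by channels C1 and C2]

State transitions:
p: states A and B; "send" takes A to B and "receive" takes B to A
q: states C and D; "send" takes C to D and "receive" takes D to C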

[Figure: a run of the snapshot algorithm on the computation "p sends", "q sends", "p receives" in the two-process system; the states A (for p) and D (for q) are shown next to the system diagram]

What did we get?

Event
• An event e in process p is an atomic action: it can change the state of p, and the state of at most one channel c incident on p (by sending or receiving a message M along c).
• e is defined by <p, s, s’, M, c>.
• e = <p, s, s’, M, c> may occur in global state S if
  1. the state of p in S is s, and
  2. if c is directed towards p, then c’s state has M at its head.

Process State and Global State
• A process: a set of states, an initial state, and a set of events.
• A global state S: a collection of process states and channel states.
• Initially, each process is in its initial state and all channels are empty.
• next(S, e) is the global state after event e is applied to global state S.

Process State and Global State

• seq = (ei : i = 0…n) is a computation of the system iff
  ei may occur in Si, for 0 ≤ i ≤ n, and
  Si+1 = next(Si, ei)
  (S0 is the initial global state).
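
The definitions of event, next(S, e), and computation can be made concrete with a short Python sketch. Everything named here is an assumption for illustration: a global state is a pair (process states, channel states), channels are tuples, and the extra `kind` field records whether c is directed away from p ("send") or towards p ("receive"). C1 is assumed to go from p to q and C2 from q to p.

```python
from collections import namedtuple

# e = <p, s, s', M, c>, plus a hypothetical 'kind' field giving c's direction.
Event = namedtuple("Event", "p s s_next M c kind")


def may_occur(S, e):
    procs, chans = S
    if procs[e.p] != e.s:                     # 1. p must be in state s
        return False
    if e.kind == "receive":                   # 2. M must be at the head of c
        return len(chans[e.c]) > 0 and chans[e.c][0] == e.M
    return True


def next_state(S, e):
    procs, chans = S
    procs = {**procs, e.p: e.s_next}
    chans = dict(chans)
    if e.kind == "send":
        chans[e.c] = chans[e.c] + (e.M,)      # append M to the tail of c
    else:
        chans[e.c] = chans[e.c][1:]           # remove M from the head of c
    return procs, chans


def is_computation(S0, seq):
    S = S0
    for e in seq:
        if not may_occur(S, e):
            return False
        S = next_state(S, e)
    return True


S0 = ({"p": "A", "q": "C"}, {"C1": (), "C2": ()})
p_sends = Event("p", "A", "B", "M", "C1", "send")
q_sends = Event("q", "C", "D", "M'", "C2", "send")
p_receives = Event("p", "B", "A", "M'", "C2", "receive")

print(is_computation(S0, [p_sends, q_sends, p_receives]))  # True
print(is_computation(S0, [p_receives, p_sends, q_sends]))  # False
```

The second sequence fails because p is not in state B in S0, so the receive event may not occur there.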

The Recorded Global State

seq = (ei : i ≥ 0) – a distributed computation
Si – the state of the system right before ei occurs
S0 – the initial state of the system
St – the state of the system at the termination of the algorithm
S* – the recorded global state

Definition
Event ej is called pre-recording if ej is in a process p and p records its state after ej in seq.
Event ej is called post-recording if ej is in a process p and p records its state before ej in seq.
Assume that ej-1 is a post-recording event that immediately precedes a pre-recording event ej in seq.

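Claim: The sequence obtained by interchanging ej-1 and ej is a computation.
Proof: ej-1 occurs in a process p and ej in a process q (other than p).
There cannot be a message sent at ej-1 and received at ej: p sent a marker on that channel before the message, so by FIFO q would have recorded its state before receiving the message, and ej would be post-recording.
Hence, event ej can occur in global state Sj-1.
The state of process p is not altered by ej, hence ej-1 can occur after ej.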

Proof
Swap events until all post-recording events appear after all pre-recording events.
The resulting computation is seq’.
All that is left to show: S* is the global state after all pre-recording events and before all post-recording events, for
1. process states
2. channel states
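
The swapping step can be sketched in Python as repeated interchanges of an adjacent (post-recording, pre-recording) pair. This is an illustration only: the function name `reorder` and the encoding of events as (description, label) pairs are assumptions, and the sketch does not check that each intermediate sequence is a computation, which is what the claim above establishes.

```python
def reorder(seq, is_post):
    """Move every pre-recording event in front of all post-recording events,
    using only interchanges of an adjacent (post, pre) pair."""
    seq = list(seq)
    swapped = True
    while swapped:
        swapped = False
        for j in range(1, len(seq)):
            if is_post(seq[j - 1]) and not is_post(seq[j]):
                seq[j - 1], seq[j] = seq[j], seq[j - 1]
                swapped = True
    return seq


# Hypothetical encoding of the example: (event, label).
seq = [("p sends", "post"), ("q sends", "pre"), ("p receives", "post")]
seq_prime = reorder(seq, lambda e: e[1] == "post")
print(seq_prime)
# [('q sends', 'pre'), ('p sends', 'post'), ('p receives', 'post')]
```

Because only (post, pre) pairs are interchanged, the relative order of the pre-recording events, and of the post-recording events, is unchanged.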

Claim: The state of a channel in S* is
(the sequence of messages corresponding to pre-recording sends) \ (the sequence of messages corresponding to pre-recording receives).
Proof: The state of channel c from process p to process q recorded in S* is the sequence of messages received on c by q after q records its state and before q receives a marker on c.
The sequence of messages sent by p before it sends the marker is the sequence corresponding to pre-recording sends on c.

[Figure: the computation "p sends" (post-recording), "q sends" (pre-recording), "p receives" (post-recording) in the two-process system]

(Another execution)

[Figure: the reordered computation "q sends" (pre-recording), "p sends" (post-recording), "p receives" (post-recording) in the two-process system]

What did we get?
A configuration that could have happened

(The Recorded Global State)

Theorem: Let Si and St be the global states in which the algorithm is initiated and terminates. There exists a computation
seq' = (ej' : j ≥ 0), where
1. For all j < i or j ≥ t: ej' = ej
2. The subsequence (ej' | i ≤ j < t) is a permutation of the subsequence (ej | i ≤ j < t)
3. For all j < i or j ≥ t: Sj' = Sj
4. There exists k, i ≤ k ≤ t, such that S* = Sk'

Stable Detection

D - distributed system
y - a predicate function defined on the set of
global states of D
S, S’ – global states of D
y is a stable property of D if y(S) implies y(S’)
for all S’ reachable from S

Algorithm
Input: a stable property y
Output: a boolean value b with the property:
    y(S0) ⇒ b and b ⇒ y(St)
Algorithm:
begin
    record a global state S*
    b := y(S*)
end
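
The algorithm above is short enough to sketch directly in Python. The names `detect` and `terminated`, the callback `record_global_state`, and the representation of a global state as a pair (process states, channel states) are assumptions made for illustration; termination is used as the stable property y.

```python
def detect(record_global_state, y):
    """b := y(S*), where S* is obtained from a snapshot procedure
    supplied by the caller (hypothetical callback)."""
    s_star = record_global_state()
    return y(s_star)


def terminated(S):
    """Example stable property: every process is passive and every
    channel is empty."""
    procs, chans = S
    return (all(state == "passive" for state in procs.values())
            and all(len(msgs) == 0 for msgs in chans.values()))


# Two hand-written snapshots standing in for recorded global states.
running = ({"p": "passive", "q": "passive"}, {"C1": ["work"], "C2": []})
done = ({"p": "passive", "q": "passive"}, {"C1": [], "C2": []})

print(detect(lambda: running, terminated))  # False
print(detect(lambda: done, terminated))     # True
```

A False answer only says that y did not hold in S*; y may have become true by the time the algorithm terminates, so the detection is typically repeated.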

Correctness
1. S* is reachable from S0
2. St is reachable from S*
3. y(S) ⇒ y(S’) for all S’ reachable from S

[Figure: S0 → S* → St. If y(S*) = true, then y(St) = true; if y(S*) = false, then y(S0) = false.]

References
K. M. Chandy and L. Lamport, "Distributed Snapshots: Determining Global States of Distributed Systems", ACM Transactions on Computer Systems 3(1), 1985.

Slide 14

©

Distributed Computing
5. Snapshot
Shmuel Zaks
[email protected]

The snapshot algorithm (Candy and Lamport)

2

3

4

Goal: design a snapshot (=global-statedetection) algorithm that:

 will record a collection of states of all system
components (which forms a global system
state),
 will not change the underlying computation,
 will not freeze the underlying computation
5

A Process Can…
 record its own state,
 send and receive messages,
 record messages it sends and receives,
 cooperate with other processes

Processes do not share clocks or memory
 Processes cannot record their state
precisely at the same instant

6

Motivation

 Many problems in distributed systems
can be stated in terms of the problem of
detecting global states:
Stable property detection problems :
termination detection,
deadlock detection
etc.
 Checkpointing
7

Stable Property Detection Problem
D - distributed system
y - a predicate function defined on the set
of global states of D
S, S’ – global states of D
y is stable if y(S) implies y(S’) for all S’
reachable from S
8

 many distributed algorithms are
structured as a sequence of phases
 A phase: transient part, then a stable part
phase termination vs. computation
termination
 our view on the problem:
i. detect the termination of a phase
ii. initiate a new phase
Notice that “the kth phase has terminated” is
a stable property
9

Model
 Distributed system D is a
finite, labeled, directed
graph.

C1

p

q
C2

 Channels have infinite buffers, are errorfree and preserve FIFO
 Message delay is bounded, but unknown
10

State of a Channel

C1
p

C1
p

3

2

1

q

C2
q

1

 [1, 2, 3] – sequence X of messages that
were sent
 [1] – sequence Y of received messages
( prefix of X )
 [2, 3] – state of C1: X \ Y

11

Example:

System

Distributed system:

C1
p

q
C2

State transitions

receive

A

B

(same for p and q):

send

Initial global state:

Ø

B

A
Ø
12

A

receive

C1

p

B

q

C2

send

Global state transition diagram

p

B

q

Ø
A

p sends

p

q

A

A

Ø

Ø

p receives

Ø
A

A

q receives
q sends

Ø
A

B

Ø
A computation corresponds to a path in the diagram
deterministic

13

Example:

System

p

Distributed system:
State transition:

p:

q:

C1

q

C2

A

C

send
receive
send

receive

B

D
14

A

send

B

receive

p

C1

q

C2

C

send

D

receive

Global state transition diagram

p
A

q

Ø

C

p sends

p

q

B

C

Ø

Ø
q receives

A

D

q sends
p receives

B

D

Ø

non-deterministic

15

A

send

B

receive

p

C1

q

C2

C

send

D

receive

We look at the following sequence of events:

p

A

q

Ø

C

p sends

p

q

B

C

Ø

Ø
q sends

A

D

p receives

B

D

Ø
16

in the snapshot algorithm:
Each process records its own state
p and q cooperate to record the state of C.

p

C

q

17

A

receive
B

p

C1

q

C2

send

Example:

System

p
B

p

q

q

Ø

Recorded state:

No token

C

A

A

Record C

Ø
A

A

A

Record q
Record p

18

A

receive
B

p

C1

q

C2

send

Example:

System

p
B

C1

p

q

q

Ø
A

A
Ø

Recorded state:

Two tokens

A

Ø

B

Record p
A

Record C
Record q

19

Record p
Record C
Record q
P’s state
recorded

P sends a
message on C

C’s state
recorded

time

C’s state
recorded

P sends a
message on C

P’s state
recorded

Record C
Record q
Record p
20

p

C

q

q will record the state of C
p and q have to coordinate ; using a special marker
q starts recording C after it records its state
q stops when receiving

from p

But: how does q know when to record its state?
21

The snapshot algorithm

Who starts?
We assume one process.

Hw: extend discussion + proof to any number
of startes.

22

(Intuition for the
Algorithm)
p

C

q

 Who will record the state of channel C? q
 q starts recording after it records its state

 How q knows when to stop recording?
 p sends
right after it records
its state, and before sending any other message
23

The snapshot algorithm
channel recording

C

p

q

Starts when q records itself
Ends when q receives

along C

Note : for any q  p0, the channel along
which
arrived first is recorded as 

24

The snapshot algorithm
State recording
p0 starts.
p0 recoreds its state,
and then broadcasts

.

Shout-algorithm =
PI (Propogation-of-information)=
hot potato = …
When q receives
for the first time, it
records its own state
25

The snapshot algorithm
Marker-Sending Rule for a process q
1. record the state of p
2. send
along c before sending
any other message
Marker-Receiving Rule for a process q
on receiving
along channel c:
if q’s state is not recorded:
1. record state;
2. record c’s state = ;
else: c’s state is the sequence of messages
received since q recorded its state
26

Termination
Assumption No marker remains forever in an
input channel

Claim: If the graph is strongly connected and
at least one process records its state,
then all processes will record their state
in finite time
Proof: by induction
27

The Recorded Global State

Ex:

System

C1

p

q

C2
State transition:

p:

q:

A

C

send
receive
send

receive

B

D
28

A

send

B

receive

p

C1

q

C2

C

send

D

receive


A

D

p
A

q



C

p sends



p

q

B
A

C


q sends

A

D



p receives

B

D
29

What did we get?

30

e in process p is an atomic action:
can change the state of p, and
a state of at most one channel c incident on p
(by sending/receiving message M along c )
 e is defined by < p, s, s’, M, c >
 e = may occur in global state S
if 1. the state of p in S is s
2. if c is directed towards p then c’s state
has M in its head
 Event

31

Process State and Global State
 A process: set of states, an initial state
set of events
 A global state S: collection of process states
and channel states
 initially, each process is in its initial state and all
channels are empty

 next(S, e) is the global state after event e in
applied to global state S

32

Process State and Global State

 seq = (ei : i = 0…n) is a computation of
the system iff

ei may occur in Si , 0  i  n

Si+1 = next(Si, ei)
(S0 is the initial global state)

33

The Recorded Global State

seq = (ei: i ≥ 0) a distributed computation
Si – the state of the system right before ei occurs
S0 – the initial state of the system
St – the state of the system at the termination of
the algorithm
S* - the recorded global state

34

Definition
Event ej is called pre-recording if ej is in a
process p and p records its state after ej in seq .
Event ej is called post-recording if ej is in a
process p and p records its state before ej in seq .
Assume that ej-1 is a post-recording event before
Pre-recording event ej in seq.

35

Claim: Sequence obtained by interchanging

ej-1 and ej is a computation.
Proof: ej-1 occurs in p and ej in q (other than p).
There cannot be a message sent at ej-1 and
received at ej.
Hence, event ej can occur in global state Sj-1.
The state of process p is not altered by ej,
hence ej-1 can occur after ej.
36

Proof
Swap the events till all post-recorded events
appear after all pre-recorded events.
The acquired computation is seq’.
All that is left to show: S* is a global state
after all prerecorded events and before all
postrecorded events.

1. Process states
2. Channel states

37

Claim: The state of a channel in S* is
(sequence of messages corresponding to pre-recording sends) \ (sequence of messages corresponding to pre-recording receives),
that is, the messages sent before the sender recorded its state and not received before the receiver recorded its state.
Proof: The state of channel c from process p to process q recorded in S* is the sequence of messages received on c by q after q records its state and before q receives a marker on c.
The sequence of messages sent by p before the marker is the sequence corresponding to pre-recording sends on c.
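A small sketch of the channel-state computation in the claim: the recorded state is what remains of the pre-recording sends once the pre-recording receives, a prefix on a FIFO channel, are removed. The function name and the sample values are assumptions for illustration.

```python
from typing import List, Sequence


def channel_state(pre_sends: Sequence, pre_receives: Sequence) -> List:
    """Messages sent before the sender recorded its state and not yet
    received when the receiver recorded its state."""
    n = len(pre_receives)
    if list(pre_sends[:n]) != list(pre_receives):
        # On a FIFO, error-free channel the received messages must be
        # a prefix of the sent messages.
        raise ValueError("received messages are not a prefix of sent messages")
    return list(pre_sends[n:])
```

With sends [1, 2, 3] and receives [1], the recorded channel state is [2, 3].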

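[Figure: the example system (p with states A and B, q with states C and D, channels C1 and C2) and the computation p sends, q sends, p receives, with the events labelled post-recording, pre-recording and post-recording respectively.]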

(Another execution)
[Figure: the same system with the permuted computation q sends, p sends, p receives, with the events labelled pre-recording, post-recording and post-recording respectively.]

What did we get?
A configuration that could have happened


(The Recorded Global State)

seq = (e_i : i ≥ 0) – a distributed computation
S_i – the state of the system right before e_i occurs
S_0 – the initial state of the system
S_t – the state of the system at the termination of the algorithm
S* – the recorded global state

(The Recorded Global State)

Theorem: Let the snapshot algorithm be initiated in S_i and terminate in S_t. There exists a computation seq' = (e'_j : j ≥ 0), where
1. For all j < i or j ≥ t: e'_j = e_j
2. The subsequence (e'_j | i ≤ j < t) is a permutation of the subsequence (e_j | i ≤ j < t)
3. For all j < i or j ≥ t: S'_j = S_j
4. There exists k, i ≤ k ≤ t, such that S* = S'_k

Stable Detection

D – a distributed system
y – a predicate function defined on the set of global states of D
S, S' – global states of D
y is a stable property of D if y(S) implies y(S') for all S' reachable from S

Algorithm
Input: A stable property y
Output: a boolean value b with the property:
  y(S_0) ⇒ b and b ⇒ y(S_t)
Algorithm:
begin
  record a global state S*
  b := y(S*)
end
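The two-line algorithm can be sketched as follows, using termination detection as the stable property. The Snapshot type, the "idle" state name and the record_global_state callback, which stands in for a run of the snapshot algorithm, are assumptions for illustration.

```python
from typing import Callable, Dict, List, NamedTuple


class Snapshot(NamedTuple):
    procs: Dict[str, str]        # recorded process states
    chans: Dict[str, List[str]]  # recorded channel states


def terminated(s: Snapshot) -> bool:
    """Stable property y: every process is idle and no message is in transit."""
    all_idle = all(state == "idle" for state in s.procs.values())
    all_empty = all(len(msgs) == 0 for msgs in s.chans.values())
    return all_idle and all_empty


def detect(record_global_state: Callable[[], Snapshot],
           y: Callable[[Snapshot], bool]) -> bool:
    """b := y(S*), where S* is the recorded global state."""
    s_star = record_global_state()
    return y(s_star)
```

Because y is evaluated on S* rather than on an instantaneous state, a False answer says only that the property did not hold when the recording started; the detection has to be repeated to catch it later.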

Correctness
1. S* is reachable from S_0
2. S_t is reachable from S*
3. y(S) ⇒ y(S') for all S' reachable from S

Hence, along S_0 → S* → S_t:
• y(S*) = true implies y(S_t) = true
• y(S*) = false implies y(S_0) = false
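The correctness argument can be checked on a small path S_0 → ... → S_t. In this sketch the path is given as the list of values y(S_j), and k is the position of S* on the path; both function names are hypothetical.

```python
from typing import List


def is_stable_along(y_values: List[bool]) -> bool:
    """y(S) implies y(S') for each consecutive pair of states on the path."""
    return all((not a) or b for a, b in zip(y_values, y_values[1:]))


def guarantee_holds(y_values: List[bool], k: int) -> bool:
    """Check y(S_0) => b and b => y(S_t), with b = y(S*) and S* = S_k."""
    b = y_values[k]
    y0, yt = y_values[0], y_values[-1]
    return ((not y0) or b) and ((not b) or yt)
```

For a stable path such as [False, False, True, True], the guarantee holds for every choice of k. For an unstable path such as [False, True, False] it can fail, which shows why stability is required.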

References
K. M. Chandy and L. Lamport, "Distributed Snapshots: Determining Global States of Distributed Systems", ACM Transactions on Computer Systems 3(1), 1985.

Slide 18

©

Distributed Computing
5. Snapshot
Shmuel Zaks
[email protected]

The snapshot algorithm (Candy and Lamport)

2

3

4

Goal: design a snapshot (=global-statedetection) algorithm that:

 will record a collection of states of all system
components (which forms a global system
state),
 will not change the underlying computation,
 will not freeze the underlying computation
5

A Process Can…
 record its own state,
 send and receive messages,
 record messages it sends and receives,
 cooperate with other processes

Processes do not share clocks or memory
 Processes cannot record their state
precisely at the same instant

6

Motivation

 Many problems in distributed systems
can be stated in terms of the problem of
detecting global states:
Stable property detection problems :
termination detection,
deadlock detection
etc.
 Checkpointing
7

Stable Property Detection Problem
D - distributed system
y - a predicate function defined on the set
of global states of D
S, S’ – global states of D
y is stable if y(S) implies y(S’) for all S’
reachable from S
8

 many distributed algorithms are
structured as a sequence of phases
 A phase: transient part, then a stable part
phase termination vs. computation
termination
 our view on the problem:
i. detect the termination of a phase
ii. initiate a new phase
Notice that “the kth phase has terminated” is
a stable property
9

Model
 Distributed system D is a
finite, labeled, directed
graph.

C1

p

q
C2

 Channels have infinite buffers, are errorfree and preserve FIFO
 Message delay is bounded, but unknown
10

State of a Channel

C1
p

C1
p

3

2

1

q

C2
q

1

 [1, 2, 3] – sequence X of messages that
were sent
 [1] – sequence Y of received messages
( prefix of X )
 [2, 3] – state of C1: X \ Y

11

Example:

System

Distributed system:

C1
p

q
C2

State transitions

receive

A

B

(same for p and q):

send

Initial global state:

Ø

B

A
Ø
12

A

receive

C1

p

B

q

C2

send

Global state transition diagram

p

B

q

Ø
A

p sends

p

q

A

A

Ø

Ø

p receives

Ø
A

A

q receives
q sends

Ø
A

B

Ø
A computation corresponds to a path in the diagram
deterministic

13

Example:

System

p

Distributed system:
State transition:

p:

q:

C1

q

C2

A

C

send
receive
send

receive

B

D
14

Global state transition diagram

[Figure: global state transition diagram for this system, starting with p in state A, q in state C, and both channels empty; transitions are labeled "p sends", "q sends", "q receives", "p receives".]

This system is non-deterministic.

We look at the following sequence of events:

[Figure: starting with p in state A, q in state C, and both channels empty, the events "p sends", "q sends", "p receives" occur in that order.]

In the snapshot algorithm:
• each process records its own state,
• p and q cooperate to record the state of C.

[Figure: p → C → q]

Example:

[Figure: the token-passing system (states A and B), with the steps "Record C", "Record q", "Record p" placed at different points of the computation.]

Recorded state: no token.

Example:

[Figure: the same system, with the recording steps in the order "Record p", "Record C", "Record q".]

Recorded state: two tokens.

[Figure: two timelines.]

Record p, Record C, Record q:
p’s state recorded, then p sends a message on C, then C’s state recorded.

Record C, Record q, Record p:
C’s state recorded, then p sends a message on C, then p’s state recorded.

[Figure: p → C → q]

• q will record the state of C.
• p and q have to coordinate, using a special marker.
• q starts recording C after it records its own state.
• q stops recording when it receives the marker from p.

But: how does q know when to record its state?

The snapshot algorithm

Who starts?
We assume one process.

Hw: extend the discussion and the proof to any number
of starters.

(Intuition for the Algorithm)

[Figure: p → C → q]

• Who will record the state of channel C? q.
• q starts recording after it records its own state.
• How does q know when to stop recording?
• p sends a marker right after it records its state, and before sending any other message.

The snapshot algorithm
channel recording

[Figure: p → C → q]

Starts when q records its own state.
Ends when q receives the marker along C.

Note: for any q ≠ p0, the channel along which the marker arrived first is recorded as Ø (empty).

The snapshot algorithm
State recording
p0 starts.
p0 records its state, and then broadcasts the marker.

Shout-algorithm = PI (Propagation of Information) = hot potato = …
When q receives the marker for the first time, it records its own state.

The snapshot algorithm
Marker-Sending Rule for a process p
1. record the state of p
2. send a marker along each outgoing channel c before sending
   any other message along c
Marker-Receiving Rule for a process q
on receiving a marker along channel c:
if q’s state is not recorded:
1. record q’s state (following the marker-sending rule);
2. record c’s state = Ø;
else: c’s state is the sequence of messages
received along c since q recorded its state
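The two marker rules can be sketched on the two-process token system. This is a minimal illustration, not the slides' code: the `Process` class, the `MARKER` constant, the use of `deque` objects as FIFO channels, and the convention that state B holds the token are all assumptions made for the example.

```python
from collections import deque

MARKER = "MARKER"  # hypothetical stand-in for the marker symbol

class Process:
    def __init__(self, name, state, incoming, outgoing):
        self.name = name
        self.state = state
        self.incoming = incoming    # channel name -> deque this process reads
        self.outgoing = outgoing    # channel name -> deque this process writes
        self.recorded_state = None
        self.recording = {}         # channels still being recorded
        self.channel_state = {}     # final recorded channel states

    def record_and_send_markers(self):
        """Marker-sending rule: record own state, then send a marker on
        every outgoing channel before any other message."""
        self.recorded_state = self.state
        self.recording = {c: [] for c in self.incoming}
        for queue in self.outgoing.values():
            queue.append(MARKER)

    def receive(self, channel):
        """Take the head of an incoming channel; apply the marker-receiving
        rule to markers and record ordinary messages where needed."""
        message = self.incoming[channel].popleft()
        if message == MARKER:
            if self.recorded_state is None:
                self.record_and_send_markers()
            # First marker: the list is empty. Later markers: it holds the
            # messages received on this channel since the state was recorded.
            self.channel_state[channel] = self.recording.pop(channel)
        else:
            if channel in self.recording:
                self.recording[channel].append(message)
            self.state = "B"        # underlying computation: token received
        return message

    def send_token(self, channel):
        """Underlying computation: pass the token on."""
        self.state = "A"
        self.outgoing[channel].append("token")

c1, c2 = deque(), deque()
p = Process("p", "B", incoming={"C2": c2}, outgoing={"C1": c1})
q = Process("q", "A", incoming={"C1": c1}, outgoing={"C2": c2})

p.send_token("C1")            # the token is in transit on C1
q.record_and_send_markers()   # q starts the snapshot
p.receive("C2")               # p gets the marker and records its state
q.receive("C1")               # q receives the token, recorded as part of C1
q.receive("C1")               # q receives p's marker, which closes C1

snapshot = {
    "p": p.recorded_state, "q": q.recorded_state,
    "C1": q.channel_state["C1"], "C2": p.channel_state["C2"],
}
print(snapshot)
```

In this run the recorded state contains exactly one token, located in channel C1, even though no single instant of the run was observed globally.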

Termination
Assumption: no marker remains forever in an input channel.

Claim: If the graph is strongly connected and
at least one process records its state,
then all processes will record their state
in finite time.
Proof: by induction (for example, on the length of a shortest path
from a process that has already recorded its state).
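The termination claim can be illustrated by flooding markers over a channel graph. The sketch below is an assumption-laden toy: `marker_flood`, `RING` and `CHAIN` are hypothetical names, and marker delivery is modeled as a simple queue rather than real channels.

```python
from collections import deque

def marker_flood(channels, initiator):
    """Return the order in which processes record their state when markers
    are flooded from the initiator. `channels` maps each process to the
    processes its outgoing channels lead to."""
    recorded = [initiator]
    in_flight = deque((initiator, dst) for dst in channels[initiator])
    while in_flight:
        _, dst = in_flight.popleft()       # deliver one marker
        if dst not in recorded:
            recorded.append(dst)           # first marker: record state
            in_flight.extend((dst, nxt) for nxt in channels[dst])
    return recorded

RING = {"p0": ["p1"], "p1": ["p2"], "p2": ["p0"]}    # strongly connected
CHAIN = {"p0": ["p1"], "p1": ["p2"], "p2": []}       # not strongly connected

print(marker_flood(RING, "p0"))
print(marker_flood(CHAIN, "p2"))
```

On the ring every process records its state; on the chain, a snapshot started at p2 never reaches p0 or p1, which shows why strong connectivity is assumed.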

The Recorded Global State

Ex:

[Figure: system with processes p and q, joined by channels C1 and C2.]

State transitions:
p: A --send--> B, B --receive--> A
q: C --send--> D, D --receive--> C

[Figure: an execution of the snapshot algorithm on this system, with markers shown in the channels; the underlying events are "p sends", "q sends", "p receives".]

What did we get?


Event
• An event e in process p is an atomic action:
  it can change the state of p, and
  the state of at most one channel c incident on p
  (by sending/receiving a message M along c).
• e is defined by <p, s, s’, M, c>.
• e = <p, s, s’, M, c> may occur in global state S if
  1. the state of p in S is s, and
  2. if c is directed towards p, then c’s state
     has M at its head.

Process State and Global State
• A process: a set of states, an initial state, and a set of events.
• A global state S: a collection of process states and channel states.
• Initially, each process is in its initial state and all channels are empty.
• next(S, e) is the global state after event e is applied to global state S.

Process State and Global State

• seq = (ei : 0 ≤ i ≤ n) is a computation of the system iff
  ei may occur in Si, 0 ≤ i ≤ n, and
  Si+1 = next(Si, ei)
  (S0 is the initial global state).
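The definitions of an event, of "may occur", of next(S, e), and of a computation translate directly into code. The sketch below is one possible encoding, not the slides': the `Event` tuple, the `CHANNELS` table giving each channel's source and destination, and the message names M and M' are assumptions made for illustration.

```python
from typing import NamedTuple, Optional

class Event(NamedTuple):
    p: str               # process in which the event occurs
    s: str               # state of p before the event
    s_next: str          # state of p after the event (s' on the slide)
    M: Optional[str]     # message sent or received, if any
    c: Optional[str]     # channel affected, if any

# Hypothetical encoding: channel name -> (source, destination).
CHANNELS = {"C1": ("p", "q"), "C2": ("q", "p")}

def may_occur(e, S):
    """S is a pair (process states, channel states as tuples)."""
    procs, chans = S
    if procs[e.p] != e.s:
        return False
    if e.c is not None and CHANNELS[e.c][1] == e.p:   # c directed towards p
        return len(chans[e.c]) > 0 and chans[e.c][0] == e.M
    return True

def next_state(S, e):
    """The global state after event e is applied to S."""
    procs, chans = S
    procs = {**procs, e.p: e.s_next}
    chans = dict(chans)
    if e.c is not None:
        if CHANNELS[e.c][1] == e.p:
            chans[e.c] = chans[e.c][1:]          # receive: remove the head
        else:
            chans[e.c] = chans[e.c] + (e.M,)     # send: append to the tail
    return procs, chans

def is_computation(seq, S0):
    """Check that each event may occur in the state reached so far."""
    S = S0
    for e in seq:
        if not may_occur(e, S):
            return False
        S = next_state(S, e)
    return True

S0 = ({"p": "A", "q": "C"}, {"C1": (), "C2": ()})
seq = [
    Event("p", "A", "B", "M", "C1"),     # p sends
    Event("q", "C", "D", "M'", "C2"),    # q sends
    Event("p", "B", "A", "M'", "C2"),    # p receives
]
print(is_computation(seq, S0))
```

The sequence checked here is the "p sends, q sends, p receives" run from the earlier example; reordering it so that p receives first is rejected because C2 is still empty.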

The Recorded Global State

seq = (ei: i ≥ 0) a distributed computation
Si – the state of the system right before ei occurs
S0 – the initial state of the system
St – the state of the system at the termination of
the algorithm
S* - the recorded global state


Definition
Event ej is called pre-recording if ej is in a
process p and p records its state after ej in seq.
Event ej is called post-recording if ej is in a
process p and p records its state before ej in seq.
Assume that ej-1 is a post-recording event that immediately
precedes a pre-recording event ej in seq.

Claim: The sequence obtained by interchanging
ej-1 and ej is a computation.
Proof: ej-1 occurs in p and ej in q (other than p).
There cannot be a message sent at ej-1 and
received at ej: such a message would follow p’s marker on a FIFO
channel, so q would have recorded its state before receiving it,
and ej would be post-recording.
Hence, event ej can occur in global state Sj-1.
The state of process p is not altered by ej,
hence ej-1 can occur after ej.

Proof
Swap events until all post-recording events
appear after all pre-recording events.
The resulting computation is seq’.
All that is left to show: S* is the global state in seq’
after all pre-recording events and before all
post-recording events.

1. Process states
2. Channel states
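The swapping argument amounts to repeatedly interchanging an adjacent (post-recording, pre-recording) pair. The sketch below shows only the reordering, under the assumption that each event is already tagged "pre" or "post"; the function name `reorder` and the tuple encoding are illustrative.

```python
def reorder(seq, is_post):
    """Move post-recording events behind pre-recording ones by swapping
    adjacent (post, pre) pairs; return the new sequence and the swap count."""
    seq = list(seq)
    swaps = 0
    changed = True
    while changed:
        changed = False
        for j in range(1, len(seq)):
            if is_post(seq[j - 1]) and not is_post(seq[j]):
                seq[j - 1], seq[j] = seq[j], seq[j - 1]
                swaps += 1
                changed = True
    return seq, swaps

# Hypothetical tagging of the example run: p recorded its state before
# sending, q recorded its state after sending.
events = [("p sends", "post"), ("q sends", "pre"), ("p receives", "post")]
seq_prime, swaps = reorder(events, lambda e: e[1] == "post")
print(seq_prime, swaps)
```

Swapping only adjacent pairs keeps the relative order of events within each group, which is what lets the claim about a single interchange be applied step by step.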

Claim: The state of a channel c in S* is
(sequence of messages corresponding to pre-recording sends on c) \
(sequence of messages corresponding to pre-recording receives on c).
Proof: The state of channel c from process
p to process q recorded in S* is the sequence
of messages received on c by q after q
records its state and before q receives a
marker on c.
The sequence of messages sent by p before its marker is the
sequence corresponding to pre-recording sends on c.

[Figure: the execution above with each event labeled: "p sends" is post-recording, "q sends" is pre-recording, "p receives" is post-recording.]

(Another execution)

[Figure: the same events in the order "q sends" (pre-recording), "p sends" (post-recording), "p receives" (post-recording).]

What did we get?
A configuration that could have happened


(The Recorded Global State)

seq = (ei: i ≥ 0) a distributed computation
Si – the state of the system right before ei occurs
S0 – the initial state of the system
St – the state of the system at the termination of
the algorithm
S* - the recorded global state


(The Recorded Global State)

Theorem: There exists a computation
seq' = (ej' : j ≥ 0), where
1. For all j < i or j ≥ t: ej' = ej
2. The subsequence (ej' | i ≤ j < t) is a
   permutation of the subsequence
   (ej | i ≤ j < t)
3. For all j < i or j ≥ t: Sj' = Sj
4. There exists k, i ≤ k ≤ t, such that S* = Sk'
(Here Si and St are the global states in which the algorithm is
initiated and terminates.)

Stable Detection

D – a distributed system
y – a predicate defined on the set of global states of D
S, S’ – global states of D
y is a stable property of D if y(S) implies y(S’)
for all S’ reachable from S

Algorithm
Input: a stable property y
Output: a boolean value b with the property:
(y(S0) → b) and (b → y(St))
Algorithm:
begin
record a global state S*
b := y(S*)
end
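The detection algorithm is a one-liner once a snapshot is available. The sketch below assumes a snapshot is a dictionary with "processes" and "channels" entries and uses termination ("all processes passive, all channels empty") as the stable property; `detect`, `terminated`, and the two sample snapshots are hypothetical names and data chosen for illustration.

```python
def terminated(snapshot):
    """Hypothetical stable property y: every process is passive and
    every channel is empty."""
    return (
        all(state == "passive" for state in snapshot["processes"].values())
        and all(len(msgs) == 0 for msgs in snapshot["channels"].values())
    )

def detect(y, record_global_state):
    """b := y(S*), where S* comes from running the snapshot algorithm."""
    s_star = record_global_state()
    return y(s_star)

# Stand-ins for the snapshot algorithm: fixed recorded states.
def finished_run():
    return {"processes": {"p": "passive", "q": "passive"},
            "channels": {"C1": [], "C2": []}}

def message_in_transit():
    return {"processes": {"p": "passive", "q": "passive"},
            "channels": {"C1": ["work"], "C2": []}}

print(detect(terminated, finished_run))
print(detect(terminated, message_in_transit))
```

A result of true is definite because y is stable and St is reachable from S*; a result of false only says that y did not hold when the algorithm started.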

Correctness
1. S* is reachable from S0
2. St is reachable from S*
3. y(S) → y(S’) for all S’ reachable from S

[Figure: S0 → S* → St]
y(S*) = true ⇒ y(St) = true
y(S*) = false ⇒ y(S0) = false

References
K. M. Chandy and L. Lamport, Distributed Snapshots: Determining Global States of Distributed Systems



Slide 21

©

Distributed Computing
5. Snapshot
Shmuel Zaks
[email protected]

The snapshot algorithm (Candy and Lamport)

2

3

4

Goal: design a snapshot (=global-statedetection) algorithm that:

 will record a collection of states of all system
components (which forms a global system
state),
 will not change the underlying computation,
 will not freeze the underlying computation
5

A Process Can…
 record its own state,
 send and receive messages,
 record messages it sends and receives,
 cooperate with other processes

Processes do not share clocks or memory
 Processes cannot record their state
precisely at the same instant

6

Motivation

 Many problems in distributed systems
can be stated in terms of the problem of
detecting global states:
Stable property detection problems :
termination detection,
deadlock detection
etc.
 Checkpointing
7

Stable Property Detection Problem
D - distributed system
y - a predicate function defined on the set
of global states of D
S, S’ – global states of D
y is stable if y(S) implies y(S’) for all S’
reachable from S
8

 many distributed algorithms are
structured as a sequence of phases
 A phase: transient part, then a stable part
phase termination vs. computation
termination
 our view on the problem:
i. detect the termination of a phase
ii. initiate a new phase
Notice that “the kth phase has terminated” is
a stable property
9

Model
 Distributed system D is a
finite, labeled, directed
graph.

C1

p

q
C2

 Channels have infinite buffers, are errorfree and preserve FIFO
 Message delay is bounded, but unknown
10

State of a Channel

C1
p

C1
p

3

2

1

q

C2
q

1

 [1, 2, 3] – sequence X of messages that
were sent
 [1] – sequence Y of received messages
( prefix of X )
 [2, 3] – state of C1: X \ Y

11

Example:

System

Distributed system:

C1
p

q
C2

State transitions

receive

A

B

(same for p and q):

send

Initial global state:

Ø

B

A
Ø
12

A

receive

C1

p

B

q

C2

send

Global state transition diagram

p

B

q

Ø
A

p sends

p

q

A

A

Ø

Ø

p receives

Ø
A

A

q receives
q sends

Ø
A

B

Ø
A computation corresponds to a path in the diagram
deterministic

13

Example:

System

p

Distributed system:
State transition:

p:

q:

C1

q

C2

A

C

send
receive
send

receive

B

D
14

A

send

B

receive

p

C1

q

C2

C

send

D

receive

Global state transition diagram

p
A

q

Ø

C

p sends

p

q

B

C

Ø

Ø
q receives

A

D

q sends
p receives

B

D

Ø

non-deterministic

15

A

send

B

receive

p

C1

q

C2

C

send

D

receive

We look at the following sequence of events:

p

A

q

Ø

C

p sends

p

q

B

C

Ø

Ø
q sends

A

D

p receives

B

D

Ø
16

in the snapshot algorithm:
Each process records its own state
p and q cooperate to record the state of C.

p

C

q

17

A

receive
B

p

C1

q

C2

send

Example:

System

p
B

p

q

q

Ø

Recorded state:

No token

C

A

A

Record C

Ø
A

A

A

Record q
Record p

18

A

receive
B

p

C1

q

C2

send

Example:

System

p
B

C1

p

q

q

Ø
A

A
Ø

Recorded state:

Two tokens

A

Ø

B

Record p
A

Record C
Record q

19

Record p
Record C
Record q
P’s state
recorded

P sends a
message on C

C’s state
recorded

time

C’s state
recorded

P sends a
message on C

P’s state
recorded

Record C
Record q
Record p
20

p

C

q

q will record the state of C
p and q have to coordinate ; using a special marker
q starts recording C after it records its state
q stops when receiving

from p

But: how does q know when to record its state?
21

The snapshot algorithm

Who starts?
We assume one process.

Hw: extend discussion + proof to any number
of startes.

22

(Intuition for the
Algorithm)
p

C

q

 Who will record the state of channel C? q
 q starts recording after it records its state

 How q knows when to stop recording?
 p sends
right after it records
its state, and before sending any other message
23

The snapshot algorithm
channel recording

C

p

q

Starts when q records itself
Ends when q receives

along C

Note : for any q  p0, the channel along
which
arrived first is recorded as 

24

The snapshot algorithm
State recording
p0 starts.
p0 recoreds its state,
and then broadcasts

.

Shout-algorithm =
PI (Propogation-of-information)=
hot potato = …
When q receives
for the first time, it
records its own state
25

The snapshot algorithm
Marker-Sending Rule for a process q
1. record the state of p
2. send
along c before sending
any other message
Marker-Receiving Rule for a process q
on receiving
along channel c:
if q’s state is not recorded:
1. record state;
2. record c’s state = ;
else: c’s state is the sequence of messages
received since q recorded its state
26

Termination
Assumption No marker remains forever in an
input channel

Claim: If the graph is strongly connected and
at least one process records its state,
then all processes will record their state
in finite time
Proof: by induction
27

The Recorded Global State

Ex:

System

C1

p

q

C2
State transition:

p:

q:

A

C

send
receive
send

receive

B

D
28

A

send

B

receive

p

C1

q

C2

C

send

D

receive


A

D

p
A

q



C

p sends



p

q

B
A

C


q sends

A

D



p receives

B

D
29

What did we get?

30

e in process p is an atomic action:
can change the state of p, and
a state of at most one channel c incident on p
(by sending/receiving message M along c )
 e is defined by < p, s, s’, M, c >
 e = may occur in global state S
if 1. the state of p in S is s
2. if c is directed towards p then c’s state
has M in its head
 Event

31

Process State and Global State
 A process: set of states, an initial state
set of events
 A global state S: collection of process states
and channel states
 initially, each process is in its initial state and all
channels are empty

 next(S, e) is the global state after event e in
applied to global state S

32

Process State and Global State

 seq = (ei : i = 0…n) is a computation of
the system iff

ei may occur in Si , 0  i  n

Si+1 = next(Si, ei)
(S0 is the initial global state)

33

The Recorded Global State

seq = (ei: i ≥ 0) a distributed computation
Si – the state of the system right before ei occurs
S0 – the initial state of the system
St – the state of the system at the termination of
the algorithm
S* - the recorded global state

34

Definition
Event ej is called pre-recording if ej is in a
process p and p records its state after ej in seq .
Event ej is called post-recording if ej is in a
process p and p records its state before ej in seq .
Assume that ej-1 is a post-recording event before
Pre-recording event ej in seq.

35

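Claim: The sequence obtained by interchanging
e_{j-1} and e_j is a computation.
Proof: e_{j-1} occurs in a process p and e_j in a different process q.
There cannot be a message sent at e_{j-1} and
received at e_j: a message sent after p recorded its state
follows p's marker on the channel (FIFO), so q would have
recorded its state before receiving it, and e_j would be
post-recording.
Hence, event e_j can occur in global state S_{j-1}.
The state of process p is not altered by e_j,
hence e_{j-1} can occur after e_j.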
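The claim can be checked on the two-process example. This is a minimal sketch under assumptions: events are plain tuples, C1 carries messages from p to q and C2 from q to p, and the pre/post labels in the comments follow the example slides; the helper `run` is hypothetical.

```python
def run(seq, state):
    """Apply events to (process states, channel contents).

    Returns the final global state, or None if some event cannot occur.
    """
    procs = dict(state[0])
    chans = {c: list(m) for c, m in state[1].items()}
    for proc, before, after, kind, chan, msg in seq:
        if procs[proc] != before:
            return None
        if kind == "recv":
            if not chans[chan] or chans[chan][0] != msg:
                return None
            chans[chan].pop(0)
        else:
            chans[chan].append(msg)
        procs[proc] = after
    return procs, chans


S0 = ({"p": "A", "q": "C"}, {"C1": [], "C2": []})
p_sends = ("p", "A", "B", "send", "C1", "M")      # e_{j-1}, post-recording
q_sends = ("q", "C", "D", "send", "C2", "M'")     # e_j, pre-recording
p_receives = ("p", "B", "A", "recv", "C2", "M'")

final = run([p_sends, q_sends, p_receives], S0)
final_swapped = run([q_sends, p_sends, p_receives], S0)
print(final is not None and final == final_swapped)  # True

# Swapping a send with the receive of the same message is not allowed:
print(run([p_sends, p_receives, q_sends], S0))  # None
```

The swap is harmless here because the two events are in different processes and no message links them; the last line shows what goes wrong when a message does link the swapped events.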

Proof
Swap such adjacent pairs of events until all post-recording events
appear after all pre-recording events.
The resulting computation is seq'.
All that is left to show: S* is the global state of seq'
after all pre-recording events and before all
post-recording events.

1. Process states
2. Channel states
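The swapping procedure can be sketched as repeated adjacent swaps. The function name `reorder`, the string event names and the `labels` table are assumptions made for this illustration; the labels follow the example slides.

```python
def reorder(seq, is_pre):
    """Swap adjacent (post, pre) pairs until every pre-recording event
    precedes every post-recording event. The relative order inside
    each group is preserved."""
    seq = list(seq)
    swapped = True
    while swapped:
        swapped = False
        for j in range(1, len(seq)):
            if not is_pre(seq[j - 1]) and is_pre(seq[j]):
                seq[j - 1], seq[j] = seq[j], seq[j - 1]
                swapped = True
    return seq


labels = {"p sends": "post", "q sends": "pre", "p receives": "post"}


def is_pre(event):
    return labels[event] == "pre"


seq = ["p sends", "q sends", "p receives"]
print(reorder(seq, is_pre))  # ['q sends', 'p sends', 'p receives']
```

Each swap exchanges a post-recording event with the pre-recording event right after it, which is exactly the situation covered by the interchange claim.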

Claim: The state of a channel c in S* is
(sequence of messages corresponding to pre-recording
sends on c) minus its prefix (sequence of messages
corresponding to pre-recording receives on c).
Proof: The state of channel c from process
p to process q recorded in S* is the sequence
of messages received on c by q after q
records its state and before q receives a
marker on c.
The messages p sends on c before the marker are exactly
those corresponding to pre-recording sends on c, and those
q receives before recording its state are the
pre-recording receives on c.
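Both views of the recorded channel state can be written down directly. The sketch below is an illustration only: the function names, the `MARKER` constant and the list encoding of a channel are assumptions, and the numbers reuse the [1, 2, 3] example from the "State of a Channel" slide.

```python
MARKER = "marker"


def recorded_channel_state(pre_recording_sends, pre_recording_receives):
    """Pre-recording sends on c minus the prefix that q received
    before recording its state."""
    k = len(pre_recording_receives)
    # FIFO: what q has received is a prefix of what p has sent.
    assert pre_recording_sends[:k] == pre_recording_receives
    return pre_recording_sends[k:]


def record_channel(arrivals, received_before_recording):
    """What q actually records: the messages arriving on c after q
    recorded its state and before the marker arrives on c.

    arrivals lists everything q receives on c in FIFO order,
    including MARKER.
    """
    stop = arrivals.index(MARKER)
    return arrivals[received_before_recording:stop]


print(recorded_channel_state([1, 2, 3], [1]))   # [2, 3]
print(record_channel([1, 2, 3, MARKER], 1))     # [2, 3]
```

If q records its state only when the marker on c itself arrives, `received_before_recording` equals the marker position and the recorded state is empty, matching the marker-receiving rule.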

[Figure: the example execution with its events labeled: p sends (post-recording), q sends (pre-recording), p receives (post-recording).]

(Another execution)
[Figure: the same events reordered so that the pre-recording event comes first: q sends (pre-recording), p sends (post-recording), p receives (post-recording).]

What did we get?
A configuration that could have happened

(The Recorded Global State)

seq = (e_i : i ≥ 0) - a distributed computation
S_i - the state of the system right before e_i occurs
S_0 - the initial state of the system
S_t - the state of the system at the termination of
the algorithm
S* - the recorded global state

(The Recorded Global State)

Theorem: There exists a computation
seq' = (e_j' : j ≥ 0), where
1. For all j < i or j ≥ t: e_j' = e_j
2. The subsequence (e_j' | i ≤ j < t) is a
permutation of the subsequence
(e_j | i ≤ j < t)
3. For all j ≤ i or j ≥ t: S_j' = S_j
4. There exists k, i ≤ k ≤ t, such that S* = S_k'
Here S_i is the state in which the snapshot algorithm is
initiated and S_t the state in which it terminates.

Stable Detection

D - distributed system
y - a predicate function defined on the set of
global states of D
S, S’ – global states of D
y is a stable property of D if y(S) implies y(S’)
for all S’ reachable from S

Algorithm
Input: A stable property y
Output: a boolean value b with the
property:
y(S_0) → b and b → y(S_t)
Algorithm:
begin
record a global state S*
b := y(S*)
end
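The detection algorithm is short enough to sketch in full. The toy system below is an assumption made for illustration: the global state is a counter that never decreases, so "counter >= 3" is a stable property, and the snapshot is stood in for by a function that returns some state between S_0 and S_t.

```python
def detect(record_global_state, y):
    """The detection algorithm: b := y(S*)."""
    s_star = record_global_state()
    return y(s_star)


def y(s):
    # Stable in this toy system because the counter never decreases.
    return s >= 3


def implies(a, b):
    return (not a) or b


computation = [0, 1, 2, 3, 4, 5]   # S_0 ... S_t

for s_star in computation:          # any state between S_0 and S_t
    b = detect(lambda: s_star, y)
    assert implies(y(computation[0]), b)
    assert implies(b, y(computation[-1]))
print("output property holds for every candidate S*")
```

In a real system `record_global_state` would run the snapshot algorithm; the check only relies on S* being reachable from S_0 and S_t being reachable from S*.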

Correctness
1. S* is reachable from S_0
2. S_t is reachable from S*
3. y(S) → y(S') for all S' reachable from S

Hence:
y(S*) = true implies y(S_t) = true
y(S*) = false implies y(S_0) = false

References
K. M. Chandy and L. Lamport, "Distributed Snapshots: Determining Global States of Distributed Systems", ACM Transactions on Computer Systems 3(1), 1985, pp. 63-75.

Slide 25

©

Distributed Computing
5. Snapshot
Shmuel Zaks
[email protected]

The snapshot algorithm (Candy and Lamport)

2

3

4

Goal: design a snapshot (=global-statedetection) algorithm that:

 will record a collection of states of all system
components (which forms a global system
state),
 will not change the underlying computation,
 will not freeze the underlying computation
5

A Process Can…
 record its own state,
 send and receive messages,
 record messages it sends and receives,
 cooperate with other processes

Processes do not share clocks or memory
 Processes cannot record their state
precisely at the same instant

6

Motivation

 Many problems in distributed systems
can be stated in terms of the problem of
detecting global states:
Stable property detection problems :
termination detection,
deadlock detection
etc.
 Checkpointing
7

Stable Property Detection Problem
D - distributed system
y - a predicate function defined on the set
of global states of D
S, S’ – global states of D
y is stable if y(S) implies y(S’) for all S’
reachable from S
8

 many distributed algorithms are
structured as a sequence of phases
 A phase: transient part, then a stable part
phase termination vs. computation
termination
 our view on the problem:
i. detect the termination of a phase
ii. initiate a new phase
Notice that “the kth phase has terminated” is
a stable property
9

Model
 Distributed system D is a
finite, labeled, directed
graph.

C1

p

q
C2

 Channels have infinite buffers, are errorfree and preserve FIFO
 Message delay is bounded, but unknown
10

State of a Channel

C1
p

C1
p

3

2

1

q

C2
q

1

 [1, 2, 3] – sequence X of messages that
were sent
 [1] – sequence Y of received messages
( prefix of X )
 [2, 3] – state of C1: X \ Y

11

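Example: System

Distributed system: processes p and q connected by channels C1 and C2.
[Figure: state-transition diagram, the same for p and q, with states A and B and transitions labeled "send" and "receive"]
Initial global state: p is in state B, q is in state A, and both channels are empty (Ø).

Global state transition diagram
[Figure: the global states of the system, connected by the transitions "p sends", "q receives", "q sends", "p receives"]
A computation corresponds to a path in the diagram.
This system is deterministic.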

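Example: System

Distributed system: processes p and q connected by channels C1 and C2.
[Figure: state-transition diagrams. p has states A and B; q has states C and D; the transitions are labeled "send" and "receive"]

Global state transition diagram
[Figure: the global states reachable from p = A, q = C with empty channels; the transitions are labeled "p sends", "q sends", "q receives", "p receives"]
This system is non-deterministic.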

We look at the following sequence of events: p sends, q sends, p receives.
[Figure: the global state after each of these events, starting from p = A, q = C with empty channels]

In the snapshot algorithm:
• each process records its own state;
• p and q cooperate to record the state of channel C.

[Figure: channel C from p to q]

Example: System (states A and B, transitions "send" and "receive")

[Figure: recording order Record C, Record q, Record p. Recorded state: no token]

[Figure: recording order Record p, Record C, Record q. Recorded state: two tokens]

Timeline of the two recordings:
• Record p, Record C, Record q: p’s state is recorded, then p sends a message on C, then C’s state is recorded.
• Record C, Record q, Record p: C’s state is recorded, then p sends a message on C, then p’s state is recorded.

[Figure: channel C from p to q]

• q will record the state of C.
• p and q have to coordinate, using a special marker.
• q starts recording C after it records its own state.
• q stops when it receives the marker from p.

But: how does q know when to record its state?

The snapshot algorithm

Who starts?
We assume one process.

Homework: extend the discussion and the proof to any number of starters.

(Intuition for the Algorithm)

[Figure: channel C from p to q]

• Who will record the state of channel C? q.
• q starts recording after it records its state.
• How does q know when to stop recording?
• p sends a marker right after it records its state, and before sending any other message.

The snapshot algorithm: channel recording

[Figure: channel C from p to q]

Recording of C starts when q records its own state.
It ends when q receives the marker along C.

Note: for any q ≠ p0, the channel along which the marker arrived first is recorded as empty (Ø).

The snapshot algorithm: state recording
p0 starts.
p0 records its state, and then broadcasts the marker.

Shout algorithm = PI (propagation of information) = hot potato = …
When q receives the marker for the first time, it records its own state.

The snapshot algorithm
Marker-Sending Rule for a process p
1. record the state of p;
2. send a marker along each outgoing channel c before sending any other message along c.
Marker-Receiving Rule for a process q
On receiving a marker along channel c:
if q’s state is not recorded:
1. record q’s state;
2. record c’s state as empty (Ø);
else: c’s state is the sequence of messages received along c since q recorded its state.
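The two marker rules can be sketched as a small single-threaded simulation. This is an illustrative sketch, not the lecture's code: the `Process` class, the `MARKER` value, the integer "amounts" used as application state, and the hand-written schedule at the bottom are all assumptions made for the example. Each channel is a FIFO deque, matching the model's FIFO assumption.

```python
from collections import deque

MARKER = "MARKER"  # hypothetical marker value; application messages are ints


class Process:
    def __init__(self, name, state, network):
        self.name = name
        self.state = state           # application state: a running total
        self.network = network       # dict (src, dst) -> deque, one FIFO per channel
        self.recorded_state = None   # this process's state in the snapshot
        self.channel_state = {}      # src -> messages recorded for channel (src, self)
        self.open = set()            # incoming channels still being recorded

    def incoming(self):
        return [src for (src, dst) in self.network if dst == self.name]

    def outgoing(self):
        return [dst for (src, dst) in self.network if src == self.name]

    def send(self, dst, amount):
        self.state -= amount
        self.network[(self.name, dst)].append(amount)

    def record(self):
        """Marker-sending rule: record own state, then send a marker on every outgoing channel."""
        self.recorded_state = self.state
        self.open = set(self.incoming())
        self.channel_state = {src: [] for src in self.open}
        for dst in self.outgoing():
            self.network[(self.name, dst)].append(MARKER)

    def receive(self, src):
        """Deliver the message at the head of channel (src, self)."""
        msg = self.network[(src, self.name)].popleft()
        if msg == MARKER:
            if self.recorded_state is None:
                self.record()        # first marker: record state; this channel stays empty
            self.open.discard(src)   # marker-receiving rule: stop recording this channel
        else:
            self.state += msg
            if src in self.open:     # arrived after recording, before the marker
                self.channel_state[src].append(msg)


net = {("p", "q"): deque(), ("q", "p"): deque()}
p, q = Process("p", 10, net), Process("q", 10, net)

p.send("q", 3)    # sent before p records
p.record()        # p initiates the snapshot
p.send("q", 2)    # sent after the marker
q.send("p", 4)    # q has not seen the marker yet
q.receive("p")    # q gets 3
q.receive("p")    # q gets the marker and records its state
q.receive("p")    # q gets 2
p.receive("q")    # p gets 4 while recording channel q -> p
p.receive("q")    # p gets q's marker; recording is complete

print(p.recorded_state, q.recorded_state, p.channel_state, q.channel_state)
# 7 9 {'q': [4]} {'p': []}
```

In this run the recorded total is 7 + 9 + 4 = 20, the same as the total in the real system, even though no actual global state of the run had p = 7 and q = 9 at the same moment.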

Termination
Assumption: no marker remains forever in an input channel.

Claim: If the graph is strongly connected and at least one process records its state, then all processes will record their states in finite time.
Proof: by induction.
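The intuition behind the termination claim can be sketched as a breadth-first flood over the channel graph: a process records on its first marker and forwards markers on all outgoing channels. The function name `recording_order` and the three-process ring are assumptions made for this illustration; the hop count plays the role of the induction parameter.

```python
from collections import deque


def recording_order(channels, initiator):
    """Return {process: number of hops after which a first marker can reach it}.

    channels  -- iterable of directed channels (src, dst)
    initiator -- the process that records its state spontaneously
    """
    out = {}
    for src, dst in channels:
        out.setdefault(src, []).append(dst)
        out.setdefault(dst, [])
    hops = {initiator: 0}
    queue = deque([initiator])
    while queue:
        p = queue.popleft()
        for q in out.get(p, []):
            if q not in hops:      # q records on its first marker only
                hops[q] = hops[p] + 1
                queue.append(q)
    return hops


ring = [("p0", "p1"), ("p1", "p2"), ("p2", "p0")]
print(recording_order(ring, "p0"))  # {'p0': 0, 'p1': 1, 'p2': 2}
```

If the graph is not strongly connected, some processes may be missing from the result, which is why the claim needs that hypothesis.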

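The Recorded Global State

Example: the system with processes p and q connected by channels C1 and C2.
[Figure: state-transition diagrams. p has states A and B; q has states C and D; the transitions are labeled "send" and "receive"]

[Figure: a run of the snapshot algorithm over the sequence of events "p sends", "q sends", "p receives", showing the markers and the recorded states]

What did we get?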

Event
• An event e in process p is an atomic action that can change the state of p and the state of at most one channel c incident on p (by sending or receiving a message M along c).
• e is defined by <p, s, s’, M, c>.
• e = <p, s, s’, M, c> may occur in global state S if
  1. the state of p in S is s, and
  2. if c is directed towards p, then c’s state has M at its head.

Process State and Global State
• A process: a set of states, an initial state, and a set of events.
• A global state S: a collection of process states and channel states.
• Initially, each process is in its initial state and all channels are empty.
• next(S, e) is the global state after event e is applied to global state S.

• seq = (e_i : i = 0…n) is a computation of the system iff
  e_i may occur in S_i, 0 ≤ i ≤ n, and
  S_{i+1} = next(S_i, e_i)
  (S_0 is the initial global state).
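The definitions of "may occur", next(S, e) and "computation" can be sketched directly. The names `Event`, `CHANNELS`, `may_occur`, `next_state` and `is_computation`, the direction of C1 (from p to q) and the token example at the bottom are assumptions made for illustration only.

```python
from collections import namedtuple

# <p, s, s', M, c>; c is None for an internal event
Event = namedtuple("Event", "p s s_next M c")

# Assumed channel directions: channel -> (source, destination)
CHANNELS = {"C1": ("p", "q"), "C2": ("q", "p")}


def may_occur(e, S):
    """Check the two conditions under which event e may occur in global state S."""
    procs, chans = S
    if procs[e.p] != e.s:
        return False                                  # condition 1: p must be in state s
    if e.c is not None and CHANNELS[e.c][1] == e.p:
        return len(chans[e.c]) > 0 and chans[e.c][0] == e.M   # condition 2: M at the head
    return True


def next_state(S, e):
    """Return next(S, e) without modifying S."""
    procs, chans = S
    procs = {**procs, e.p: e.s_next}
    chans = dict(chans)
    if e.c is not None:
        if CHANNELS[e.c][1] == e.p:
            chans[e.c] = chans[e.c][1:]               # receive: remove M from the head
        else:
            chans[e.c] = chans[e.c] + (e.M,)          # send: append M to the tail
    return procs, chans


def is_computation(seq, S0):
    """seq is a computation iff each event may occur in the state reached so far."""
    S = S0
    for e in seq:
        if not may_occur(e, S):
            return False
        S = next_state(S, e)
    return True


S0 = ({"p": "B", "q": "A"}, {"C1": (), "C2": ()})
send = Event("p", "B", "A", "token", "C1")
recv = Event("q", "A", "B", "token", "C1")
print(is_computation([send, recv], S0))  # True
print(is_computation([recv, send], S0))  # False
```

The second sequence fails because the receive is attempted while C1 is still empty.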

The Recorded Global State

seq = (e_i : i ≥ 0) – a distributed computation
S_i – the state of the system right before e_i occurs
S_0 – the initial state of the system
S_t – the state of the system at the termination of the algorithm
S* – the recorded global state

Definition
An event e_j is called pre-recording if e_j is in a process p and p records its state after e_j in seq.
An event e_j is called post-recording if e_j is in a process p and p records its state before e_j in seq.
Assume that e_{j-1} is a post-recording event that immediately precedes a pre-recording event e_j in seq.

Claim: The sequence obtained by interchanging e_{j-1} and e_j is a computation.
Proof: e_{j-1} occurs in p and e_j occurs in q (other than p).
There cannot be a message sent at e_{j-1} and received at e_j: a message sent after p recorded its state follows p’s marker on the FIFO channel, so q would have recorded its state before receiving it, and e_j would be post-recording.
Hence, event e_j can occur in global state S_{j-1}.
The state of process p is not altered by e_j, hence e_{j-1} can occur after e_j.

Proof
Swap events until all post-recording events appear after all pre-recording events.
The resulting computation is seq’.
All that is left to show: S* is the global state after all pre-recording events and before all post-recording events.

1. Process states
2. Channel states

Claim: The state of a channel in S* is
(sequence of messages corresponding to pre-recording sends) \ (sequence of messages corresponding to pre-recording receives).
Proof: The state of channel c from process p to process q recorded in S* is the sequence of messages received on c by q after q records its state and before q receives a marker on c.
The sequence of messages sent by p before the marker is the sequence corresponding to pre-recording sends on c.

[Figure: the execution from the example with its events classified: "p sends" (post-recording), "q sends" (pre-recording), "p receives" (post-recording)]

[Figure: another execution, obtained by swapping: "q sends" (pre-recording), "p sends" (post-recording), "p receives" (post-recording)]

What did we get?
A configuration that could have happened.

(The Recorded Global State)

seq = (e_i : i ≥ 0) – a distributed computation
S_i – the state of the system right before e_i occurs
S_0 – the initial state of the system
S_t – the state of the system at the termination of the algorithm
S* – the recorded global state

(The Recorded Global State)

Theorem: There exists a computation seq’ = (e’_j : j ≥ 0), where
1. for all j < i or j ≥ t: e’_j = e_j;
2. the subsequence (e’_j | i ≤ j < t) is a permutation of the subsequence (e_j | i ≤ j < t);
3. for all j < i or j ≥ t: S’_j = S_j;
4. there exists k, i ≤ k ≤ t, such that S* = S’_k.

Stable Detection

D – a distributed system
y – a predicate defined on the set of global states of D
S, S’ – global states of D
y is a stable property of D if y(S) implies y(S’) for all S’ reachable from S

Algorithm
Input: a stable property y
Output: a boolean value b with the property:
y(S0) ⇒ b and b ⇒ y(St)
Algorithm:
begin
record a global state S*
b := y(S*)
end

Correctness
1. S* is reachable from S0
2. St is reachable from S*
3. y(S) ⇒ y(S’) for all S’ reachable from S

[Figure: S0, S* and St along a computation. If y(S*) = true then y(St) = true; if y(S*) = false then y(S0) = false]
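The stable-detection algorithm is short enough to sketch as a function. The names `detect_stable` and `terminated`, the representation of a recorded global state as a pair (process states, channel states), and the sample recorded states are assumptions made for this illustration; `terminated` is stable only under the usual assumption that a passive process stays passive until it receives a message.

```python
def detect_stable(record_global_state, y):
    """Return b := y(S*), where S* is obtained from the supplied snapshot routine."""
    s_star = record_global_state()
    return y(s_star)


def terminated(S):
    """Example predicate: every process is passive and every channel is empty."""
    procs, chans = S
    all_passive = all(state == "passive" for state in procs.values())
    all_empty = all(len(msgs) == 0 for msgs in chans.values())
    return all_passive and all_empty


# Two hypothetical recorded global states S*
quiet = ({"p": "passive", "q": "passive"}, {"C1": [], "C2": []})
busy = ({"p": "passive", "q": "passive"}, {"C1": ["work"], "C2": []})

print(detect_stable(lambda: quiet, terminated))  # True
print(detect_stable(lambda: busy, terminated))   # False
```

A result of True guarantees that the property holds at St; a result of False only says that it did not yet hold at S0, so the detection is typically repeated.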

References
K. M. Chandy and L. Lamport, "Distributed Snapshots: Determining Global States of Distributed Systems", ACM Transactions on Computer Systems, 3(1):63–75, 1985.

Slide 28

©

Distributed Computing
5. Snapshot
Shmuel Zaks
[email protected]

The snapshot algorithm (Candy and Lamport)

2

3

4

Goal: design a snapshot (=global-statedetection) algorithm that:

 will record a collection of states of all system
components (which forms a global system
state),
 will not change the underlying computation,
 will not freeze the underlying computation
5

A Process Can…
 record its own state,
 send and receive messages,
 record messages it sends and receives,
 cooperate with other processes

Processes do not share clocks or memory
 Processes cannot record their state
precisely at the same instant

6

Motivation

 Many problems in distributed systems
can be stated in terms of the problem of
detecting global states:
Stable property detection problems :
termination detection,
deadlock detection
etc.
 Checkpointing
7

Stable Property Detection Problem
D - distributed system
y - a predicate function defined on the set
of global states of D
S, S’ – global states of D
y is stable if y(S) implies y(S’) for all S’
reachable from S
8

 many distributed algorithms are
structured as a sequence of phases
 A phase: transient part, then a stable part
phase termination vs. computation
termination
 our view on the problem:
i. detect the termination of a phase
ii. initiate a new phase
Notice that “the kth phase has terminated” is
a stable property
9

Model
 Distributed system D is a
finite, labeled, directed
graph.

C1

p

q
C2

 Channels have infinite buffers, are errorfree and preserve FIFO
 Message delay is bounded, but unknown
10

State of a Channel

C1
p

C1
p

3

2

1

q

C2
q

1

 [1, 2, 3] – sequence X of messages that
were sent
 [1] – sequence Y of received messages
( prefix of X )
 [2, 3] – state of C1: X \ Y

11

Example:

System

Distributed system:

C1
p

q
C2

State transitions

receive

A

B

(same for p and q):

send

Initial global state:

Ø

B

A
Ø
12

A

receive

C1

p

B

q

C2

send

Global state transition diagram

p

B

q

Ø
A

p sends

p

q

A

A

Ø

Ø

p receives

Ø
A

A

q receives
q sends

Ø
A

B

Ø
A computation corresponds to a path in the diagram
deterministic

13

Example:

System

p

Distributed system:
State transition:

p:

q:

C1

q

C2

A

C

send
receive
send

receive

B

D
14

A

send

B

receive

p

C1

q

C2

C

send

D

receive

Global state transition diagram

p
A

q

Ø

C

p sends

p

q

B

C

Ø

Ø
q receives

A

D

q sends
p receives

B

D

Ø

non-deterministic

15

A

send

B

receive

p

C1

q

C2

C

send

D

receive

We look at the following sequence of events:

p

A

q

Ø

C

p sends

p

q

B

C

Ø

Ø
q sends

A

D

p receives

B

D

Ø
16

in the snapshot algorithm:
Each process records its own state
p and q cooperate to record the state of C.

p

C

q

17

A

receive
B

p

C1

q

C2

send

Example:

System

p
B

p

q

q

Ø

Recorded state:

No token

C

A

A

Record C

Ø
A

A

A

Record q
Record p

18

A

receive
B

p

C1

q

C2

send

Example:

System

p
B

C1

p

q

q

Ø
A

A
Ø

Recorded state:

Two tokens

A

Ø

B

Record p
A

Record C
Record q

19

Record p
Record C
Record q
P’s state
recorded

P sends a
message on C

C’s state
recorded

time

C’s state
recorded

P sends a
message on C

P’s state
recorded

Record C
Record q
Record p
20

p

C

q

q will record the state of C
p and q have to coordinate ; using a special marker
q starts recording C after it records its state
q stops when receiving

from p

But: how does q know when to record its state?
21

The snapshot algorithm

Who starts?
We assume one process.

Hw: extend discussion + proof to any number
of startes.

22

(Intuition for the
Algorithm)
p

C

q

 Who will record the state of channel C? q
 q starts recording after it records its state

 How q knows when to stop recording?
 p sends
right after it records
its state, and before sending any other message
23

The snapshot algorithm
channel recording

C

p

q

Starts when q records itself
Ends when q receives

along C

Note : for any q  p0, the channel along
which
arrived first is recorded as 

24

The snapshot algorithm
State recording
p0 starts.
p0 recoreds its state,
and then broadcasts

.

Shout-algorithm =
PI (Propogation-of-information)=
hot potato = …
When q receives
for the first time, it
records its own state
25

The snapshot algorithm
Marker-Sending Rule for a process q
1. record the state of p
2. send
along c before sending
any other message
Marker-Receiving Rule for a process q
on receiving
along channel c:
if q’s state is not recorded:
1. record state;
2. record c’s state = ;
else: c’s state is the sequence of messages
received since q recorded its state
26

Termination
Assumption No marker remains forever in an
input channel

Claim: If the graph is strongly connected and
at least one process records its state,
then all processes will record their state
in finite time
Proof: by induction
27

The Recorded Global State

Ex:

System

C1

p

q

C2
State transition:

p:

q:

A

C

send
receive
send

receive

B

D
28

A

send

B

receive

p

C1

q

C2

C

send

D

receive


A

D

p
A

q



C

p sends



p

q

B
A

C


q sends

A

D



p receives

B

D
29

What did we get?

30

e in process p is an atomic action:
can change the state of p, and
a state of at most one channel c incident on p
(by sending/receiving message M along c )
 e is defined by < p, s, s’, M, c >
 e = may occur in global state S
if 1. the state of p in S is s
2. if c is directed towards p then c’s state
has M in its head
 Event

31

Process State and Global State
 A process: set of states, an initial state
set of events
 A global state S: collection of process states
and channel states
 initially, each process is in its initial state and all
channels are empty

 next(S, e) is the global state after event e in
applied to global state S

32

Process State and Global State

 seq = (ei : i = 0…n) is a computation of
the system iff

ei may occur in Si , 0  i  n

Si+1 = next(Si, ei)
(S0 is the initial global state)

33

The Recorded Global State

seq = $(e_i : i \ge 0)$ – a distributed computation
$S_i$ – the state of the system right before $e_i$ occurs
$S_0$ – the initial state of the system
$S_t$ – the state of the system at the termination of
the algorithm
$S^*$ – the recorded global state

Definition
Event $e_j$ is called pre-recording if $e_j$ is in a
process p and p records its state after $e_j$ in seq.
Event $e_j$ is called post-recording if $e_j$ is in a
process p and p records its state before $e_j$ in seq.
Assume that $e_{j-1}$ is a post-recording event that comes
immediately before a pre-recording event $e_j$ in seq.

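Claim: The sequence obtained by interchanging
$e_{j-1}$ and $e_j$ is a computation.
Proof: $e_{j-1}$ occurs in p and $e_j$ in q (other than p),
since within one process no post-recording event precedes
a pre-recording event.
There cannot be a message sent at $e_{j-1}$ and
received at $e_j$: p sent a marker on that channel before
$e_{j-1}$, so by FIFO q would receive the marker first and
$e_j$ would be post-recording.
Hence, event $e_j$ can occur in global state $S_{j-1}$.
The state of process p is not altered by $e_j$,
hence $e_{j-1}$ can occur after $e_j$.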
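Repeating this interchange moves every pre-recording event in front of every post-recording event. A minimal sketch of that reordering follows; the function name `separate` and the tuple layout are hypothetical, and the pre/post tags of the three sample events are taken as an assumption.

```python
def separate(seq):
    """seq: list of (event name, process, kind) with kind in {"pre", "post"}.

    Repeatedly interchange an adjacent (post, pre) pair until every
    pre-recording event precedes every post-recording event.
    """
    seq = list(seq)
    swapped = True
    while swapped:
        swapped = False
        for j in range(1, len(seq)):
            left, right = seq[j - 1], seq[j]
            if left[2] == "post" and right[2] == "pre":
                # By the claim, the two events are in different processes.
                assert left[1] != right[1]
                seq[j - 1], seq[j] = right, left
                swapped = True
    return seq


seq = [("p sends", "p", "post"),
       ("q sends", "q", "pre"),
       ("p receives", "p", "post")]
seq_prime = separate(seq)
order = [name for name, _, _ in seq_prime]
```

The result is a permutation of the original events in which the relative order of events inside each process is unchanged.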

Proof
Swap adjacent events until all post-recording events
appear after all pre-recording events.
The resulting computation is seq’.
All that is left to show: S* is the global state
after all pre-recording events and before all
post-recording events.

1. Process states
2. Channel states

Claim: The state of a channel in S* is
(sequence of messages corresponding to pre-recording
sends) \ (sequence of messages corresponding to
pre-recording receives)
Proof: The state of channel c from process
p to process q recorded in S* is the sequence
of messages received on c by q after q
records its state and before q receives a
marker on c.
The sequence of messages sent by p before the marker is the
sequence corresponding to pre-recording sends on c.

[Figure: the example execution with each event labelled as pre- or post-recording: "p sends" (post), "q sends" (pre), "p receives" (post)]

(Another execution)
[Figure: the same events reordered: "q sends" (pre), "p sends" (post), "p receives" (post)]

What did we get?
A configuration that could have happened


(The Recorded Global State)

seq = $(e_i : i \ge 0)$ – a distributed computation
$S_i$ – the state of the system right before $e_i$ occurs
$S_0$ – the initial state of the system
$S_t$ – the state of the system at the termination of
the algorithm
$S^*$ – the recorded global state

(The Recorded Global State)

Theorem: Let the algorithm be initiated in $S_i$ and terminate in $S_t$.
There exists a computation
seq' = $(e'_j : j \ge 0)$, where
1. For all $j < i$ or $j \ge t$: $e'_j = e_j$
2. The subsequence $(e'_j : i \le j < t)$ is a
permutation of the subsequence
$(e_j : i \le j < t)$
3. For all $j < i$ or $j \ge t$: $S'_j = S_j$
4. There exists $k$, $i \le k \le t$, such that $S^* = S'_k$

Stable Detection

D - distributed system
y - a predicate function defined on the set of
global states of D
S, S’ – global states of D
y is a stable property of D if y(S) implies y(S’)
for all S’ reachable from S

Algorithm
Input: A stable property y
Output: a boolean value b with the
property:
$y(S_0) \Rightarrow b$ and $b \Rightarrow y(S_t)$
Algorithm:
begin
record a global state S*
b := y(S*)
end
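The algorithm is short enough to sketch directly. Everything named here is hypothetical: `detect_stable`, the predicate `terminated` and the stand-in `fake_snapshot`, which returns a fixed recorded state instead of running a real snapshot. The predicate is stable only under the assumption that an idle process becomes active solely by receiving a message.

```python
def detect_stable(y, record_global_state):
    """b := y(S*), where S* comes from a snapshot routine supplied by the caller."""
    s_star = record_global_state()
    return y(s_star)


def terminated(S):
    """Hypothetical predicate: every process is idle and every channel is empty."""
    procs, chans = S
    all_idle = all(state == "idle" for state in procs.values())
    all_empty = all(len(msgs) == 0 for msgs in chans.values())
    return all_idle and all_empty


def fake_snapshot():
    # Stand-in for a real snapshot: a fixed recorded state S*.
    return ({"p": "idle", "q": "idle"}, {"C1": [], "C2": []})


def busy_snapshot():
    # Another stand-in: a message is still in transit on C1.
    return ({"p": "idle", "q": "idle"}, {"C1": ["M"], "C2": []})


b = detect_stable(terminated, fake_snapshot)
b_busy = detect_stable(terminated, busy_snapshot)
```

When `b` is true the property also holds in the final state, because that state is reachable from the recorded one; when it is false, the property did not hold in the initial state.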

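Correctness
1. $S^*$ is reachable from $S_0$
2. $S_t$ is reachable from $S^*$
3. $y(S) \Rightarrow y(S')$
for all S’ reachable from S

[Diagram: $S_0 \to S^* \to S_t$]
$y(S^*) = \text{true} \Rightarrow y(S_t) = \text{true}$
$y(S^*) = \text{false} \Rightarrow y(S_0) = \text{false}$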

References
K. M. Chandy and L. Lamport,
Distributed Snapshots:
Determining Global States of Distributed
Systems,
ACM Transactions on Computer Systems 3(1), 1985, pp. 63–75.


Model
 Distributed system D is a
finite, labeled, directed
graph.
[Figure: processes p and q connected by channels C1 and C2]
 Channels have infinite buffers, are error-free and preserve FIFO order
 Message delay is bounded, but unknown

State of a Channel
[Figure: messages 1, 2, 3 were sent along channel C1 between p and q; message 1 has been received]
 [1, 2, 3] – sequence X of messages that
were sent
 [1] – sequence Y of received messages
(a prefix of X)
 [2, 3] – state of C1: X \ Y
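The channel state X \ Y can be computed by dropping the received prefix from the sent sequence. A minimal sketch, with the hypothetical helper name `channel_state`:

```python
def channel_state(sent, received):
    """Return the messages sent along a FIFO channel but not yet received."""
    sent, received = list(sent), list(received)
    if sent[:len(received)] != received:
        raise ValueError("received messages must be a prefix of the sent messages")
    return sent[len(received):]


state = channel_state([1, 2, 3], [1])
```

The prefix check reflects the FIFO assumption: messages can only be received in the order in which they were sent.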

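Example:
[Figure: distributed system with processes p and q and channels C1, C2. State transitions (same for p and q): states A and B, with send and receive transitions. Initial global state: p in state B, q in state A, both channels empty (Ø).]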

Global state transition diagram
[Figure: the global states of this system, linked by the events "p sends", "q receives", "q sends" and "p receives"]
A computation corresponds to a path in the diagram
deterministic

Example:
[Figure: distributed system with processes p and q and channels C1, C2. State transitions: p has states A and B, q has states C and D, each with send and receive transitions.]

Global state transition diagram
[Figure: the global states of this system, starting with p in state A, q in state C and both channels empty (Ø), linked by the events "p sends", "q sends", "q receives" and "p receives"]
non-deterministic

We look at the following sequence of events:
[Figure: starting with p in state A, q in state C and both channels empty (Ø), the events "p sends", "q sends" and "p receives" occur in this order]

In the snapshot algorithm:
Each process records its own state
p and q cooperate to record the state of C.
[Figure: channel C from p to q]

Example:
[Figure: the token-passing system of the earlier example. C is recorded before p sends; q and p are recorded after p sends.]
Recorded state: no token

Example:
[Figure: the same system. p is recorded before p sends; C and q are recorded after p sends.]
Recorded state: two tokens

Record p, Record C, Record q:
P’s state recorded → P sends a message on C → C’s state recorded

Record C, Record q, Record p:
C’s state recorded → P sends a message on C → P’s state recorded

(time runs from left to right)
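The two uncoordinated recording orders can be replayed in a few lines of Python. This is a sketch under assumptions: the dictionaries `before_send` and `after_send` are a hypothetical encoding of the global states just before and just after the event "p sends", and `naive_snapshot` simply mixes components from the two.

```python
# Hypothetical encoding of the two global states around the event "p sends".
before_send = {"p": "token", "C": "empty", "q": "empty"}
after_send = {"p": "empty", "C": "token", "q": "empty"}


def naive_snapshot(recorded_before, recorded_after):
    """Record some components before p sends and the rest after, with no coordination."""
    snap = {name: before_send[name] for name in recorded_before}
    snap.update({name: after_send[name] for name in recorded_after})
    return snap


def count_tokens(recorded):
    """Count the tokens that appear in a recorded (p, C, q) state."""
    return sum(1 for part in recorded.values() if part == "token")


two_tokens = naive_snapshot(recorded_before=["p"], recorded_after=["C", "q"])
no_token = naive_snapshot(recorded_before=["C"], recorded_after=["q", "p"])
```

Neither recorded state could have occurred in a system that always holds exactly one token, which is why p and q have to coordinate.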

[Figure: channel C from p to q]
q will record the state of C
p and q have to coordinate, using a special marker
q starts recording C after it records its state
q stops when receiving a marker from p

But: how does q know when to record its state?

The snapshot algorithm

Who starts?
We assume one process.

HW: extend the discussion and proof to any number
of starters.

(Intuition for the
Algorithm)
[Figure: channel C from p to q]
 Who will record the state of channel C? q
 q starts recording after it records its state

 How does q know when to stop recording?
 p sends a marker
right after it records
its state, and before sending any other message

The snapshot algorithm
channel recording
[Figure: channel C from p to q]
Starts when q records itself
Ends when q receives a marker
along C

Note: for any q ≠ p0, the channel along
which a marker
arrived first is recorded as Ø

The snapshot algorithm
State recording
p0 starts.
p0 records its state,
and then broadcasts a marker.

Shout-algorithm =
PI (Propagation-of-information) =
hot potato = …
When q receives a marker
for the first time, it
records its own state




Slide 35

©

Distributed Computing
5. Snapshot
Shmuel Zaks
[email protected]

The snapshot algorithm (Candy and Lamport)

2

3

4

Goal: design a snapshot (=global-statedetection) algorithm that:

 will record a collection of states of all system
components (which forms a global system
state),
 will not change the underlying computation,
 will not freeze the underlying computation
5

A Process Can…
 record its own state,
 send and receive messages,
 record messages it sends and receives,
 cooperate with other processes

Processes do not share clocks or memory
 Processes cannot record their state
precisely at the same instant

6

Motivation

 Many problems in distributed systems
can be stated in terms of the problem of
detecting global states:
Stable property detection problems :
termination detection,
deadlock detection
etc.
 Checkpointing
7

Stable Property Detection Problem
D - distributed system
y - a predicate function defined on the set
of global states of D
S, S’ – global states of D
y is stable if y(S) implies y(S’) for all S’
reachable from S
8

 many distributed algorithms are
structured as a sequence of phases
 A phase: transient part, then a stable part
phase termination vs. computation
termination
 our view on the problem:
i. detect the termination of a phase
ii. initiate a new phase
Notice that “the kth phase has terminated” is
a stable property
9

Model
 Distributed system D is a
finite, labeled, directed
graph.

C1

p

q
C2

 Channels have infinite buffers, are errorfree and preserve FIFO
 Message delay is bounded, but unknown
10

State of a Channel

C1
p

C1
p

3

2

1

q

C2
q

1

 [1, 2, 3] – sequence X of messages that
were sent
 [1] – sequence Y of received messages
( prefix of X )
 [2, 3] – state of C1: X \ Y

11

Example:

System

Distributed system:

C1
p

q
C2

State transitions

receive

A

B

(same for p and q):

send

Initial global state:

Ø

B

A
Ø
12

A

receive

C1

p

B

q

C2

send

Global state transition diagram

p

B

q

Ø
A

p sends

p

q

A

A

Ø

Ø

p receives

Ø
A

A

q receives
q sends

Ø
A

B

Ø
A computation corresponds to a path in the diagram
deterministic

13

Example:

System

p

Distributed system:
State transition:

p:

q:

C1

q

C2

A

C

send
receive
send

receive

B

D
14

A

send

B

receive

p

C1

q

C2

C

send

D

receive

Global state transition diagram

p
A

q

Ø

C

p sends

p

q

B

C

Ø

Ø
q receives

A

D

q sends
p receives

B

D

Ø

non-deterministic

15

A

send

B

receive

p

C1

q

C2

C

send

D

receive

We look at the following sequence of events:

p

A

q

Ø

C

p sends

p

q

B

C

Ø

Ø
q sends

A

D

p receives

B

D

Ø
16

in the snapshot algorithm:
Each process records its own state
p and q cooperate to record the state of C.

p

C

q

17

A

receive
B

p

C1

q

C2

send

Example:

System

p
B

p

q

q

Ø

Recorded state:

No token

C

A

A

Record C

Ø
A

A

A

Record q
Record p

18

A

receive
B

p

C1

q

C2

send

Example:

System

p
B

C1

p

q

q

Ø
A

A
Ø

Recorded state:

Two tokens

A

Ø

B

Record p
A

Record C
Record q

19

Record p
Record C
Record q
P’s state
recorded

P sends a
message on C

C’s state
recorded

time

C’s state
recorded

P sends a
message on C

P’s state
recorded

Record C
Record q
Record p
20

p

C

q

q will record the state of C
p and q have to coordinate ; using a special marker
q starts recording C after it records its state
q stops when receiving

from p

But: how does q know when to record its state?
21

The snapshot algorithm

Who starts?
We assume one process.

Hw: extend discussion + proof to any number
of startes.

22

(Intuition for the
Algorithm)
p

C

q

 Who will record the state of channel C? q
 q starts recording after it records its state

 How q knows when to stop recording?
 p sends
right after it records
its state, and before sending any other message
23

The snapshot algorithm
channel recording

C

p

q

Starts when q records itself
Ends when q receives

along C

Note : for any q  p0, the channel along
which
arrived first is recorded as 

24

The snapshot algorithm
State recording
p0 starts.
p0 recoreds its state,
and then broadcasts

.

Shout-algorithm =
PI (Propogation-of-information)=
hot potato = …
When q receives
for the first time, it
records its own state
25

The snapshot algorithm
Marker-Sending Rule for a process q
1. record the state of p
2. send
along c before sending
any other message
Marker-Receiving Rule for a process q
on receiving
along channel c:
if q’s state is not recorded:
1. record state;
2. record c’s state = ;
else: c’s state is the sequence of messages
received since q recorded its state
26

Termination
Assumption No marker remains forever in an
input channel

Claim: If the graph is strongly connected and
at least one process records its state,
then all processes will record their state
in finite time
Proof: by induction
27

The Recorded Global State

Ex:

System

C1

p

q

C2
State transition:

p:

q:

A

C

send
receive
send

receive

B

D
28

A

send

B

receive

p

C1

q

C2

C

send

D

receive


A

D

p
A

q



C

p sends



p

q

B
A

C


q sends

A

D



p receives

B

D
29

What did we get?

30

e in process p is an atomic action:
can change the state of p, and
a state of at most one channel c incident on p
(by sending/receiving message M along c )
 e is defined by < p, s, s’, M, c >
 e = may occur in global state S
if 1. the state of p in S is s
2. if c is directed towards p then c’s state
has M in its head
 Event

31

Process State and Global State
• A process: a set of states, an initial state, and a
set of events
• A global state S: a collection of process states
and channel states
• initially, each process is in its initial state and all
channels are empty

• next(S, e) is the global state after event e is
applied to global state S

32

Process State and Global State

• seq = (e_i : i = 0…n) is a computation of
the system iff
e_i may occur in S_i, 0 ≤ i ≤ n, and
S_{i+1} = next(S_i, e_i)
(S_0 is the initial global state)
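These definitions translate almost directly into code. The following is a sketch under stated assumptions: the `Event` dataclass, the `CHANNELS` table, and the representation of a global state as a pair (process states, channel contents) are choices made for this example, and `s2` stands for s’.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Event:
    p: str              # process in which the event occurs
    s: str              # state of p before the event
    s2: str             # state of p after the event (s' in the slides)
    M: Optional[str]    # message sent or received, if any
    c: Optional[str]    # channel whose state changes, if any


# Hypothetical topology: channel name -> (sender, receiver)
CHANNELS = {"C1": ("p", "q"), "C2": ("q", "p")}


def may_occur(e, S):
    states, chans = S
    if states[e.p] != e.s:
        return False
    if e.c is not None and CHANNELS[e.c][1] == e.p:   # c directed towards p
        return len(chans[e.c]) > 0 and chans[e.c][0] == e.M
    return True


def next_state(S, e):
    states, chans = S
    states = {**states, e.p: e.s2}
    chans = {name: list(msgs) for name, msgs in chans.items()}
    if e.c is not None:
        if CHANNELS[e.c][1] == e.p:
            chans[e.c].pop(0)        # receive: remove M from the head
        else:
            chans[e.c].append(e.M)   # send: append M to the tail
    return (states, chans)


def is_computation(seq, S0):
    S = S0
    for e in seq:
        if not may_occur(e, S):
            return False
        S = next_state(S, e)
    return True


S0 = ({"p": "A", "q": "C"}, {"C1": [], "C2": []})
p_sends = Event("p", "A", "B", "M", "C1")
q_sends = Event("q", "C", "D", "M'", "C2")
p_receives = Event("p", "B", "A", "M'", "C2")
seq = [p_sends, q_sends, p_receives]

final = S0
for e in seq:
    final = next_state(final, e)
```

Here `is_computation(seq, S0)` holds for the sequence p sends, q sends, p receives, while `[p_receives]` alone is rejected because p is not in state B and C2 is empty in `S0`.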

33

The Recorded Global State

seq = (e_i : i ≥ 0) – a distributed computation
S_i – the state of the system right before e_i occurs
S_0 – the initial state of the system
S_t – the state of the system at the termination of
the algorithm
S* – the recorded global state

34

Definition
Event e_j is called pre-recording if e_j is in a
process p and p records its state after e_j in seq.
Event e_j is called post-recording if e_j is in a
process p and p records its state before e_j in seq.
Assume that e_{j-1} is a post-recording event that immediately precedes a
pre-recording event e_j in seq.

35

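Claim: The sequence obtained by interchanging
e_{j-1} and e_j is a computation.
Proof: e_{j-1} occurs in p and e_j in q (other than p).
There cannot be a message sent at e_{j-1} and
received at e_j: p sent a marker before e_{j-1}, so by FIFO q would have
received that marker, and recorded its state, before e_j,
contradicting that e_j is pre-recording.
Hence, event e_j can occur in global state S_{j-1}.
The state of process p is not altered by e_j,
hence e_{j-1} can occur after e_j.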
36

Proof
Swap events until all post-recording events
appear after all pre-recording events.
The resulting computation is seq’.
All that is left to show: S* is the global state of seq’
after all pre-recording events and before all
post-recording events.

1. Process states
2. Channel states
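The swapping step can be sketched as a bubble pass over the event sequence. The function `reorder` and the (label, kind) encoding are hypothetical, and the pre/post labels on the three events are read off the deck's example figures rather than computed.

```python
def reorder(seq):
    """seq is a list of (label, kind) pairs, kind being "pre" or "post".
    Swap adjacent (post, pre) pairs until none is left."""
    seq = list(seq)
    swapped = True
    while swapped:
        swapped = False
        for j in range(1, len(seq)):
            if seq[j - 1][1] == "post" and seq[j][1] == "pre":
                seq[j - 1], seq[j] = seq[j], seq[j - 1]
                swapped = True
    return seq


seq = [("p sends", "post"), ("q sends", "pre"), ("p receives", "post")]
seq_prime = reorder(seq)
```

Only adjacent (post, pre) pairs are exchanged, so the relative order of the pre-recording events, and of the post-recording events, is unchanged. In this example `seq_prime` is q sends, p sends, p receives.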

37

Claim: The state of a channel in S* is
(sequence of messages corresponding to pre-recording
sends) \ (sequence of messages corresponding to
pre-recording receives)
Proof: The state of channel c from process
p to process q recorded in S* is the sequence
of messages received on c by q after q
records its state and before q receives a
marker on c.
The sequence of messages sent by p before the marker is the
sequence corresponding to pre-recording sends on c.
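On a FIFO channel the received messages form a prefix of the sent messages, so the difference in the claim is a plain suffix. A small sketch, with `channel_state` as a hypothetical helper name:

```python
def channel_state(sent, received):
    """Messages sent on a FIFO channel that have not been received yet.
    `received` must be a prefix of `sent`."""
    if sent[:len(received)] != received:
        raise ValueError("received must be a prefix of sent on a FIFO channel")
    return sent[len(received):]


# The [1, 2, 3] / [1] example from the "State of a Channel" slide.
in_transit = channel_state([1, 2, 3], [1])

# The two-process example: q's send of M' is pre-recording and p's receive of
# it is post-recording, so C2 holds M'; p's send on C1 is post-recording.
c2_recorded = channel_state(["M'"], [])
c1_recorded = channel_state([], [])
```

The last two lines agree with what the marker rules record in that example: C1 is empty and C2 contains M'.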

38

[Figure: the same execution with each event labeled: "p sends" (post-recording), "q sends" (pre-recording), "p receives" (post-recording)]
39

(Another execution)

[Figure: the reordered execution "q sends" (pre-recording), "p sends" (post-recording), "p receives" (post-recording); after "q sends", p is in state A and q is in state D]
40

What did we get?
A configuration that could have happened

41

(The Recorded Global State)

seq = (e_i : i ≥ 0) – a distributed computation
S_i – the state of the system right before e_i occurs
S_0 – the initial state of the system
S_t – the state of the system at the termination of
the algorithm
S* – the recorded global state

42

(The Recorded Global State)

Theorem: There exists a computation
seq' = (e'_j : j ≥ 0), where
1. For all j < i or j ≥ t: e'_j = e_j
2. The subsequence (e'_j | i ≤ j < t) is a
permutation of the subsequence
(e_j | i ≤ j < t)
3. For all j < i or j ≥ t: S'_j = S_j
4. There exists k, i ≤ k ≤ t, such that S* = S'_k
(Here S_i is the global state in which the snapshot algorithm is initiated, and S_t the state in which it terminates.)

Stable Detection

D - distributed system
y - a predicate function defined on the set of
global states of D
S, S’ – global states of D
y is a stable property of D if y(S) implies y(S’)
for all S’ reachable from S
44

Algorithm
Input: A stable property y
Output: a boolean value b with the
property:
y(S_0) ⇒ b and b ⇒ y(S_t)
Algorithm:
begin
record a global state S*
b := y(S*)
end
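As a sketch, the detection algorithm is a single call to the predicate on the recorded state. Everything named here is an assumption made for illustration: `detect`, the `terminated` predicate (all processes passive and all channels empty), and the hard-coded snapshots that stand in for a real run of the snapshot algorithm.

```python
def detect(y, record_global_state):
    """Record a global state S* and evaluate the stable property y on it."""
    s_star = record_global_state()
    return y(s_star)


def terminated(S):
    """Example stable property: every process is passive and no message is
    in transit."""
    states, chans = S
    return (all(s == "passive" for s in states.values())
            and all(len(msgs) == 0 for msgs in chans.values()))


quiet = ({"p": "passive", "q": "passive"}, {"C1": [], "C2": []})
busy = ({"p": "passive", "q": "passive"}, {"C1": ["M"], "C2": []})

b_quiet = detect(terminated, lambda: quiet)
b_busy = detect(terminated, lambda: busy)
```

A message still in a channel makes `b_busy` false even though both processes are passive, which is why the channel states are part of the recorded global state.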
45

Correctness
1. S* is reachable from S_0
2. S_t is reachable from S*
3. y(S) ⇒ y(S’)
for all S’ reachable from S

y(S*) = true ⇒ y(S_t) = true
y(S*) = false ⇒ y(S_0) = false

[Figure: S_0 → S* → S_t]

46

References
K. M. Chandy and L. Lamport,
Distributed Snapshots:
Determining Global States of Distributed
Systems

47

Slide 39

©

Distributed Computing
5. Snapshot
Shmuel Zaks
[email protected]

The snapshot algorithm (Candy and Lamport)

2

3

4

Goal: design a snapshot (=global-statedetection) algorithm that:

 will record a collection of states of all system
components (which forms a global system
state),
 will not change the underlying computation,
 will not freeze the underlying computation
5

A Process Can…
• record its own state,
• send and receive messages,
• record messages it sends and receives,
• cooperate with other processes.

Processes do not share clocks or memory, so they cannot record their states at precisely the same instant.

Motivation

• Many problems in distributed systems can be stated in terms of the problem of detecting global states:
  stable property detection problems (termination detection, deadlock detection, etc.)
• Checkpointing

Stable Property Detection Problem
D – a distributed system
y – a predicate defined on the set of global states of D
S, S' – global states of D
y is stable if y(S) implies y(S') for all S' reachable from S

• Many distributed algorithms are structured as a sequence of phases.
• A phase consists of a transient part followed by a stable part.
• Phase termination is distinct from computation termination.
• Our view of the problem:
  i. detect the termination of a phase,
  ii. initiate a new phase.
Notice that "the k-th phase has terminated" is a stable property.

Model
• A distributed system D is a finite, labeled, directed graph.
[Figure: processes p and q connected by channels C1 and C2]
• Channels have infinite buffers, are error-free, and preserve FIFO order.
• Message delay is bounded but unknown.

State of a Channel

[Figure: p has sent messages 1, 2, 3 along C1; q has received message 1]

• [1, 2, 3] – the sequence X of messages that were sent
• [1] – the sequence Y of received messages (a prefix of X)
• [2, 3] – the state of C1: X \ Y
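The definition of a channel state as X \ Y can be sketched in a few lines of Python. This is only an illustration: the function name `channel_state` and the list representation of messages are assumptions, not part of the slides.

```python
def channel_state(sent, received):
    """Return the messages in transit: the sent sequence minus its received prefix."""
    # With FIFO channels, the received messages must form a prefix of the sent ones.
    if sent[:len(received)] != received:
        raise ValueError("received is not a prefix of sent (FIFO violated)")
    return sent[len(received):]


X = [1, 2, 3]  # messages p sent along C1
Y = [1]        # messages q has received along C1
print(channel_state(X, Y))  # [2, 3]
```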

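Example: System

[Figure: distributed system with processes p and q connected by channels C1 and C2]
[Figure: state transitions (the same for p and q) between states A and B via send and receive]
Initial global state: p in B, q in A, both channels empty (Ø).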

Global state transition diagram

[Figure: global state transition diagram of the system, with transitions labeled p sends, q receives, q sends, p receives]
A computation corresponds to a path in the diagram.
The system is deterministic.

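Example: System

[Figure: distributed system with processes p and q connected by channels C1 and C2]
[Figure: state transitions of p between states A and B, and of q between states C and D, each via send and receive]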

Global state transition diagram

[Figure: global state transition diagram starting from p in A, q in C with empty channels (Ø), with transitions labeled p sends, q sends, q receives, p receives]
The system is non-deterministic.

We look at the following sequence of events: p sends, q sends, p receives.

[Figure: the global states along this sequence, starting from p in A, q in C with empty channels (Ø)]

In the snapshot algorithm:
• each process records its own state,
• p and q cooperate to record the state of C.

[Figure: channel C between p and q]

Example: System

[Figure: the system with states A and B; the state of channel C is recorded first, then the states of q and p]
Recorded state: no token

Example: System

[Figure: the system with states A and B; the state of p is recorded first, then the states of channel C and q]
Recorded state: two tokens

Order "record p, record C, record q" (time runs left to right):
  p's state recorded -> p sends a message on C -> C's state recorded

Order "record C, record q, record p" (time runs left to right):
  C's state recorded -> p sends a message on C -> p's state recorded
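The two recording orders can be replayed in a small Python sketch to see why uncoordinated recording gives inconsistent results. The token counts, the function name `run` and the step encoding are illustrative assumptions; only the channel C from p to q is modeled.

```python
def run(order):
    """Replay `order` and return the total number of tokens in the recorded state.

    Each step is either "send" (p puts its token on channel C) or
    ("record", component), which records that component's current token count.
    """
    state = {"p": 1, "C": 0, "q": 0}  # assumed start: p holds the single token
    recorded = {}
    for step in order:
        if step == "send":
            state["p"] -= 1
            state["C"] += 1
        else:
            _, component = step
            recorded[component] = state[component]
    return sum(recorded.values())


# p is recorded before it sends, C after: the token is counted twice.
two_tokens = run([("record", "p"), "send", ("record", "C"), ("record", "q")])
# C is recorded before p sends, p after: the token is never counted.
no_token = run([("record", "C"), "send", ("record", "q"), ("record", "p")])
print(two_tokens, no_token)  # 2 0
```

Neither result is a global state the system can actually reach, since it always contains exactly one token.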

[Figure: channel C from p to q]

• q will record the state of C.
• p and q have to coordinate, using a special marker message.
• q starts recording C after it records its own state.
• q stops recording when it receives the marker from p.

But how does q know when to record its state?

The snapshot algorithm

Who starts?
We assume one process.

Homework: extend the discussion and the proof to any number of starters.

(Intuition for the Algorithm)

[Figure: channel C from p to q]

• Who will record the state of channel C? q.
• q starts recording after it records its own state.
• How does q know when to stop recording?
• p sends a marker right after it records its state, and before sending any other message.

The snapshot algorithm: channel recording

[Figure: channel C from p to q]

Recording of C starts when q records its own state.
It ends when q receives the marker along C.

Note: for any q ≠ p0, the channel along which the marker arrived first is recorded as empty (Ø).

The snapshot algorithm: state recording

p0 starts.
p0 records its state, and then broadcasts the marker.

Shout algorithm = PI (propagation of information) = hot potato = …

When q receives the marker for the first time, it records its own state.

The snapshot algorithm

Marker-Sending Rule for a process p
1. record the state of p;
2. send the marker along each outgoing channel c before sending any other message along c.

Marker-Receiving Rule for a process q
On receiving the marker along channel c:
if q's state is not recorded:
    1. record q's state (this triggers the Marker-Sending Rule for q);
    2. record c's state as empty (Ø);
else: c's state is the sequence of messages received along c since q recorded its state.
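The two marker rules can be tried out in a single-threaded Python simulation. This is a minimal sketch under stated assumptions, not the deck's own code: process states are integer balances, messages are integer amounts, and the names `Process`, `connect` and `MARKER` are made up for the illustration.

```python
from collections import deque

MARKER = "MARKER"  # assumed marker representation; ordinary messages are integers


class Process:
    def __init__(self, name, state):
        self.name, self.state = name, state
        self.out = {}         # destination name -> FIFO channel (deque)
        self.inc = {}         # source name -> FIFO channel (deque)
        self.recorded = None  # recorded local state
        self.chan = {}        # source name -> recorded state of that incoming channel
        self.open = set()     # incoming channels still being recorded

    def record(self):
        """Marker-sending rule: record own state, then send a marker on every outgoing channel."""
        self.recorded = self.state
        self.open = set(self.inc)
        self.chan = {src: [] for src in self.inc}
        for channel in self.out.values():
            channel.append(MARKER)

    def send(self, dst, amount):
        self.state -= amount
        self.out[dst].append(amount)

    def receive(self, src):
        """Deliver the head of the channel from src, applying the marker-receiving rule."""
        msg = self.inc[src].popleft()
        if msg == MARKER:
            if self.recorded is None:
                self.record()        # first marker: record state; this channel stays empty
            self.open.discard(src)   # stop recording the channel the marker came on
        else:
            self.state += msg
            if src in self.open:     # arrived after recording, before the marker
                self.chan[src].append(msg)


def connect(a, b):
    """Create a FIFO channel from a to b."""
    channel = deque()
    a.out[b.name] = channel
    b.inc[a.name] = channel


p, q = Process("p", 10), Process("q", 5)
connect(p, q)
connect(q, p)

p.record()       # p initiates the snapshot
p.send("q", 3)   # sent after the marker, so not part of the snapshot
q.send("p", 2)   # q has not recorded yet
p.receive("q")   # p gets 2 while still recording channel q -> p
q.receive("p")   # q gets the marker: records its state; channel p -> q is empty
q.receive("p")   # q gets 3 after it has recorded
p.receive("q")   # p gets q's marker: recording of channel q -> p ends

print(p.recorded, q.recorded, p.chan, q.chan)  # 10 3 {'q': [2]} {'p': []}
```

The recorded balances plus the recorded in-transit amount add up to 15, the same total the system holds at every instant, even though no single moment of the run had p at 10 and q at 3.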

Termination
Assumption: no marker remains forever in an input channel.

Claim: if the graph is strongly connected and at least one process records its state, then all processes will record their states in finite time.
Proof: by induction on the length of a path from a process that has recorded its state.

The Recorded Global State

Example: System

[Figure: processes p and q connected by channels C1 and C2; state transitions of p between A and B and of q between C and D, each via send and receive]

[Figure: the sequence p sends, q sends, p receives, with the snapshot algorithm running]

What did we get?


• An event e in process p is an atomic action: it can change the state of p and the state of at most one channel c incident on p (by sending or receiving a message M along c).
• e is defined by <p, s, s', M, c>.
• e = <p, s, s', M, c> may occur in global state S if
  1. the state of p in S is s, and
  2. if c is directed towards p, then c's state has M at its head.

Process State and Global State
• A process: a set of states, an initial state, and a set of events.
• A global state S: a collection of process states and channel states.
• Initially, each process is in its initial state and all channels are empty.
• next(S, e) is the global state after event e is applied to global state S.

Process State and Global State

• seq = (e_i : 0 ≤ i ≤ n) is a computation of the system iff
    e_i may occur in S_i for 0 ≤ i ≤ n, and
    S_{i+1} = next(S_i, e_i)
  (S_0 is the initial global state).
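The definitions of an event, of "may occur", of next(S, e) and of a computation can be written down directly. The Python sketch below is an illustration under assumptions: the extra `direction` field, the tuple encoding of channel contents, the channel directions and the message names "M" and "M'" are not in the slides.

```python
from collections import namedtuple

# An event <p, s, s', M, c>. `direction` ("send" or "recv") is an added field that
# states how c is used; the tuple in the text leaves this implicit.
Event = namedtuple("Event", "p s s_next M c direction")


def may_occur(S, e):
    procs, chans = S
    if procs[e.p] != e.s:          # condition 1: p must be in state s
        return False
    if e.direction == "recv":      # condition 2: M must be at the head of c
        return len(chans[e.c]) > 0 and chans[e.c][0] == e.M
    return True


def next_state(S, e):
    procs, chans = S
    procs = dict(procs, **{e.p: e.s_next})
    chans = dict(chans)
    if e.direction == "send":
        chans[e.c] = chans[e.c] + (e.M,)   # append M at the tail of c
    elif e.direction == "recv":
        chans[e.c] = chans[e.c][1:]        # remove M from the head of c
    return procs, chans


def is_computation(S0, seq):
    S = S0
    for e in seq:
        if not may_occur(S, e):
            return False
        S = next_state(S, e)
    return True


# Assumed encoding of the two-process example: C1 carries p's messages, C2 carries q's.
S0 = ({"p": "A", "q": "C"}, {"C1": (), "C2": ()})
p_sends = Event("p", "A", "B", "M", "C1", "send")
q_sends = Event("q", "C", "D", "M'", "C2", "send")
p_receives = Event("p", "B", "A", "M'", "C2", "recv")

print(is_computation(S0, [p_sends, q_sends, p_receives]))  # True
print(is_computation(S0, [p_receives, p_sends, q_sends]))  # False
```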

The Recorded Global State

seq = (e_i : i ≥ 0) – a distributed computation
S_i – the state of the system right before e_i occurs
S_0 – the initial state of the system
S_t – the state of the system at the termination of the algorithm
S* – the recorded global state

Definition
Event e_j is called pre-recording if e_j is in a process p and p records its state after e_j in seq.
Event e_j is called post-recording if e_j is in a process p and p records its state before e_j in seq.
Assume that e_{j-1} is a post-recording event that immediately precedes a pre-recording event e_j in seq.

Claim: the sequence obtained by interchanging e_{j-1} and e_j is a computation.
Proof: e_{j-1} occurs in p and e_j in q (other than p).
There cannot be a message sent at e_{j-1} and received at e_j: p sends its marker before any post-recording message, so by FIFO q would receive the marker, and record its state, before e_j.
Hence, event e_j can occur in global state S_{j-1}.
The state of process p is not altered by e_j, hence e_{j-1} can occur after e_j.

Proof
Swap events until all post-recording events appear after all pre-recording events.
The resulting computation is seq'.
All that is left to show: S* is the global state after all pre-recording events and before all post-recording events.

1. Process states
2. Channel states
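The swapping step of the proof can be mimicked with a short Python sketch. It only reorders labels and does not check that each intermediate sequence is a computation; the function name `reorder` and the (description, label) encoding of events are assumptions for illustration.

```python
def reorder(seq, is_post):
    """Swap adjacent (post-recording, pre-recording) pairs until none remains."""
    seq = list(seq)
    swapped = True
    while swapped:
        swapped = False
        for j in range(1, len(seq)):
            if is_post(seq[j - 1]) and not is_post(seq[j]):
                seq[j - 1], seq[j] = seq[j], seq[j - 1]
                swapped = True
    return seq


# The three-event example from the slides, labeled pre/post as in the figure.
events = [("p sends", "post"), ("q sends", "pre"), ("p receives", "post")]
result = reorder(events, lambda e: e[1] == "post")
print([name for name, _ in result])  # ['q sends', 'p sends', 'p receives']
```

Only adjacent post/pre pairs are exchanged, so the relative order among pre-recording events, and among post-recording events, is unchanged.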

Claim: the state of a channel in S* is
(the sequence of messages corresponding to pre-recording sends) \ (the sequence of messages corresponding to pre-recording receives).
Proof: the state of channel c from process p to process q recorded in S* is the sequence of messages received on c by q after q records its state and before q receives a marker on c.
The sequence of messages sent by p before it sends the marker along c is the sequence corresponding to pre-recording sends on c.

[Figure: the sequence p sends (post-recording), q sends (pre-recording), p receives (post-recording)]

(Another execution)

[Figure: the sequence q sends (pre-recording), p sends (post-recording), p receives (post-recording)]

What did we get?
A configuration that could have happened


(The Recorded Global State)

seq = (e_i : i ≥ 0) – a distributed computation
S_i – the state of the system right before e_i occurs
S_0 – the initial state of the system
S_t – the state of the system at the termination of the algorithm
S* – the recorded global state

(The Recorded Global State)

Theorem: suppose the snapshot algorithm is initiated in state S_i and terminates in state S_t. Then there exists a computation seq' = (e'_j : j ≥ 0), where
1. for all j < i or j ≥ t: e'_j = e_j;
2. the subsequence (e'_j : i ≤ j < t) is a permutation of the subsequence (e_j : i ≤ j < t);
3. for all j < i or j ≥ t: S'_j = S_j;
4. there exists k, i ≤ k ≤ t, such that S* = S'_k.

Stable Detection

D – a distributed system
y – a predicate defined on the set of global states of D
S, S' – global states of D
y is a stable property of D if y(S) implies y(S') for all S' reachable from S

Algorithm
Input: a stable property y
Output: a boolean value b with the property:
    y(S_0) ⇒ b   and   b ⇒ y(S_t)
Algorithm:
begin
    record a global state S*
    b := y(S*)
end
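The detection algorithm is just "record, then evaluate". A minimal Python sketch follows, assuming a termination predicate as the stable property; the names `detect`, `terminated` and the shape of the recorded state (process states plus channel contents) are illustrative choices, and the snapshot itself is a hand-written stand-in.

```python
def detect(record_global_state, y):
    """b := y(S*), where S* is the global state returned by the snapshot."""
    s_star = record_global_state()
    return y(s_star)


def terminated(S):
    """Assumed stable property: every process is idle and every channel is empty."""
    processes, channels = S
    all_idle = all(state == "idle" for state in processes.values())
    all_empty = all(len(messages) == 0 for messages in channels.values())
    return all_idle and all_empty


# Hypothetical recorded states standing in for the output of a snapshot run.
quiet = ({"p": "idle", "q": "idle"}, {"C1": [], "C2": []})
busy = ({"p": "idle", "q": "idle"}, {"C1": ["work"], "C2": []})

print(detect(lambda: quiet, terminated))  # True
print(detect(lambda: busy, terminated))   # False
```

A `True` answer remains valid in every later state because the property is stable; a `False` answer only says the property did not yet hold when the snapshot started.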

Correctness
1. S* is reachable from S_0.
2. S_t is reachable from S*.
3. y(S) ⇒ y(S') for all S' reachable from S.

[Figure: S_0, S* and S_t in order of reachability; if y(S*) = true then y(S_t) = true, and if y(S*) = false then y(S_0) = false]

References
K. M. Chandy and L. Lamport, "Distributed Snapshots: Determining Global States of Distributed Systems", ACM Transactions on Computer Systems 3(1), 1985.

appear after all pre-recorded events.
The acquired computation is seq’.
All that is left to show: S* is a global state
after all prerecorded events and before all
postrecorded events.

1. Process states
2. Channel states

37

Claim: The state of a channel in S* is
(sequence of messages corresp. to pre-recorded
receives)(sequence of messages corresp. to
prerecorded sends)
Proof: The state of channel c from process
p to process q recorded in S* is the sequence
of messages received on c by q after q
records its state and before q receives a
marker on c.
The sequence of messages sent by p is the
sequence corres. to prerecording sends on c.

38

A

send

B

receive

p

C1

q

C2

C

send

D

receive


A

D

p
A

q



C



p sends

p

q

B

C

post
pre

q sends

A

D



p receives

post

B

D
39

A

send

B

receive

A

p

C1

q

C2

C

send

D

receive


D
p
A

(Another execution)
p
q

q



C



q sends

A



D

pre
post

p sends

A



D

p receives

post

A
B

D
40

What did we get?
A configuration that could have happened

41

(The Recorded Global State)

seq = (ei: i ≥ 0) a distributed computation
Si – the state of the system right before ei occurs
S0 – the initial state of the system
St – the state of the system at the termination of
the algorithm
S* - the recorded global state

42

(The Recorded Global State)

Theorem : There exists a computation
seq' = (ej' : j  0), where
1. For all j < i or j  t : ej' = ej
2. The subsequence (ej' | i  j < t) is a
permutation of the subsequence
(ej | i  j < t)
3. For all j < i or j  t : Sj' = Sj
4. There exists k, i  k  t, such that S * = Sk

Stable Detection

D - distributed system
y - a predicate function defined on the set of
global states of D
S, S’ – global states of D
y is a stable property of D if y(S) implies y(S’)
for all S’ reachable from S
44

Algorithm
Input: A stable property y
Output: a boolean value b with the
property:
y(S0)
b and b
y(St)
Algorithm:
begin
record a global state S*
b := y(S*)
end
45

Correctness
1. S* is reachable from S0
2. St is reachable from S*
3. y(S)
y(S’)
for all S’ reachable from S

y(S*)=true



S0
St
y(S0)=false

S*
y(S*)=false



y(St)=true

46

References
K. M. Chandy and L. Lamport,
Distributed Snapshots:
Determining Global States of Distributed
Systems

47


Distributed Computing
5. Snapshot
Shmuel Zaks
zaks@cs.technion.ac.il

The snapshot algorithm (Chandy and Lamport)

2

3

4

Goal: design a snapshot (= global-state-detection) algorithm that:

• will record a collection of states of all system
components (which forms a global system
state),
• will not change the underlying computation,
• will not freeze the underlying computation
5

A Process Can…
• record its own state,
• send and receive messages,
• record messages it sends and receives,
• cooperate with other processes

Processes do not share clocks or memory
⇒ Processes cannot record their state
precisely at the same instant

6

Motivation

• Many problems in distributed systems
can be stated in terms of the problem of
detecting global states:
Stable property detection problems:
termination detection,
deadlock detection,
etc.
• Checkpointing
7

Stable Property Detection Problem
D - distributed system
y - a predicate function defined on the set
of global states of D
S, S’ – global states of D
y is stable if y(S) implies y(S’) for all S’
reachable from S
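
The definition of a stable predicate can be checked mechanically on a small, finite state graph. The following is a minimal sketch, assuming a hypothetical two-state system ("running", "done") whose transition table is invented purely for illustration; the names `reachable` and `is_stable` are not from the slides.

```python
from collections import deque

def reachable(transitions, start):
    """Return every global state reachable from `start` (including `start`)."""
    seen = {start}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        for nxt in transitions.get(state, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

def is_stable(y, transitions, states):
    """y is stable iff y(S) implies y(S') for all S' reachable from S."""
    return all(
        y(s2)
        for s in states if y(s)
        for s2 in reachable(transitions, s)
    )

# Hypothetical toy system: global states are just labels.
transitions = {"running": ["running", "done"], "done": ["done"]}
states = ["running", "done"]

def terminated(s):
    return s == "done"        # once true, it stays true

def still_running(s):
    return s == "running"     # can become false later

print(is_stable(terminated, transitions, states))     # True
print(is_stable(still_running, transitions, states))  # False
```

"Terminated" is stable because no transition leaves "done", whereas "still running" is not, since "done" is reachable from "running".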
8

• Many distributed algorithms are
structured as a sequence of phases
• A phase: a transient part, then a stable part
Phase termination vs. computation
termination
• Our view on the problem:
i. detect the termination of a phase
ii. initiate a new phase
Notice that “the kth phase has terminated” is
a stable property
9

Model
• Distributed system D is a
finite, labeled, directed
graph.

[Figure: processes p and q connected by channels C1 and C2]

• Channels have infinite buffers, are error-free and preserve FIFO order
• Message delay is bounded, but unknown
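
The model above can be sketched in a few lines of code. This is only an illustration, assuming that C1 runs from p to q and C2 from q to p (the slides do not state the directions in text); the class names `Channel` and `System` are hypothetical.

```python
from collections import deque

class Channel:
    """A directed, error-free FIFO channel with an unbounded buffer."""
    def __init__(self, src, dst):
        self.src, self.dst = src, dst
        self.buffer = deque()

    def send(self, message):
        self.buffer.append(message)       # enqueue at the tail

    def receive(self):
        return self.buffer.popleft()      # dequeue from the head (FIFO order)

class System:
    """A finite, labeled, directed graph: processes are nodes, channels are edges."""
    def __init__(self, processes, channels):
        self.processes = set(processes)
        self.channels = {name: Channel(src, dst)
                         for name, (src, dst) in channels.items()}

    def outgoing(self, p):
        return sorted(n for n, ch in self.channels.items() if ch.src == p)

    def incoming(self, p):
        return sorted(n for n, ch in self.channels.items() if ch.dst == p)

# Assumed directions: C1 from p to q, C2 from q to p.
system = System(["p", "q"], {"C1": ("p", "q"), "C2": ("q", "p")})
for m in (1, 2, 3):
    system.channels["C1"].send(m)

first = system.channels["C1"].receive()
print(first)                        # 1
print(system.outgoing("p"))         # ['C1']
print(system.incoming("p"))         # ['C2']
```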
10

State of a Channel

[Figure: channel C1 between p and q; messages 1, 2, 3 have been sent along C1 and message 1 has already been received]

• [1, 2, 3] – sequence X of messages that
were sent
• [1] – sequence Y of received messages
(prefix of X)
• [2, 3] – state of C1: X \ Y
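
As a small worked example, the channel state X \ Y can be computed by dropping the received prefix from the sent sequence. The function name `channel_state` is an assumption made for this sketch.

```python
def channel_state(sent, received):
    """Return X \\ Y: the messages sent but not yet received, in FIFO order."""
    if sent[:len(received)] != received:
        raise ValueError("received messages must be a prefix of the sent messages")
    return sent[len(received):]

state_c1 = channel_state([1, 2, 3], [1])
print(state_c1)  # [2, 3]
```

Because the channel is FIFO, Y is always a prefix of X, so the state is simply the remaining suffix.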

11

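Example: System

Distributed system: processes p and q, channels C1 and C2

[Figure: state transition diagram with states A and B and the transitions send and receive (same for p and q)]

Initial global state: p is in state B, q is in state A, both channels are empty (Ø)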
12

Global state transition diagram

[Figure: global state transition diagram of the example system, starting with p in state B, q in state A and empty channels; the transitions are "p sends", "q receives", "q sends", "p receives"]

A computation corresponds to a path in the diagram
deterministic
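
The deterministic diagram can be reproduced by enumerating the events that may occur in each global state. This is a sketch under assumptions that the slides show only graphically: B is taken to be the state that holds the token (send leads from B to A, receive from A to B), C1 is taken to run from p to q, and C2 from q to p.

```python
def successors(state):
    """Yield (event, next_state) pairs for the single-token example system."""
    p, q, c1, c2 = state                 # c1: p -> q, c2: q -> p (assumed directions)
    if p == "B":                         # p holds the token and sends it
        yield "p sends", ("A", q, c1 + ("token",), c2)
    if q == "B":                         # q holds the token and sends it
        yield "q sends", (p, "A", c1, c2 + ("token",))
    if p == "A" and c2:                  # a token is at the head of C2
        yield "p receives", ("B", q, c1, c2[1:])
    if q == "A" and c1:                  # a token is at the head of C1
        yield "q receives", (p, "B", c1[1:], c2)

def run(state, steps):
    """Follow the unique path through the diagram for `steps` events."""
    trace = []
    for _ in range(steps):
        options = list(successors(state))
        assert len(options) == 1         # deterministic: exactly one enabled event
        event, state = options[0]
        trace.append(event)
    return trace, state

initial = ("B", "A", (), ())             # p in B, q in A, both channels empty
trace, final = run(initial, 4)
print(trace)             # ['p sends', 'q receives', 'q sends', 'p receives']
print(final == initial)  # True
```

Each global state enables exactly one event, so every computation is a prefix of the same cyclic path.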

13

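Example: System

Distributed system: processes p and q, channels C1 and C2

[Figure: state transition diagrams; p has states A and B, q has states C and D, each process with a send and a receive transition]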
14

Global state transition diagram

[Figure: global state transition diagram of the second example system, starting with p in state A, q in state C and empty channels; the transitions are "p sends", "q sends", "p receives", "q receives"]

non-deterministic

15

We look at the following sequence of events:

[Figure: sequence of global states of the second example system for the events "p sends", "q sends", "p receives"]
16

In the snapshot algorithm:
Each process records its own state
p and q cooperate to record the state of C.

[Figure: process p, channel C, process q]

17

Example: System

[Figure: the token-passing example system with recording steps in the order Record C, Record q, Record p]

Recorded state: No token

18

Example: System

[Figure: the token-passing example system with recording steps in the order Record p, Record C, Record q]

Recorded state: Two tokens

19

[Figure: two timelines, time running left to right]

Record p, Record C, Record q:
P’s state recorded → P sends a message on C → C’s state recorded

Record C, Record q, Record p:
C’s state recorded → P sends a message on C → P’s state recorded
20

[Figure: process p, channel C, process q]

q will record the state of C
p and q have to coordinate, using a special marker
q starts recording C after it records its state
q stops when receiving the marker from p

But: how does q know when to record its state?
21

The snapshot algorithm

Who starts?
We assume one process.

Homework: extend the discussion and proof to any number
of starters.

22

(Intuition for the
Algorithm)

[Figure: process p, channel C, process q]

• Who will record the state of channel C? q
→ q starts recording after it records its state

• How does q know when to stop recording?
→ p sends a marker right after it records
its state, and before sending any other message
23

The snapshot algorithm
channel recording

[Figure: process p, channel C, process q]

Starts when q records itself
Ends when q receives a marker along C

Note: for any q ≠ p0, the channel along
which the marker arrived first is recorded as ∅

24

The snapshot algorithm
State recording
p0 starts.
p0 records its state,
and then broadcasts a marker.

Shout-algorithm =
PI (Propagation-of-information) =
hot potato = …
When q receives a marker
for the first time, it
records its own state
25

The snapshot algorithm
Marker-Sending Rule for a process p
1. record the state of p
2. send a marker along each outgoing channel c
before sending any other message along c
Marker-Receiving Rule for a process q
on receiving a marker along channel c:
if q’s state is not recorded:
1. record q’s state;
2. record c’s state = ∅;
else: c’s state is the sequence of messages
received along c since q recorded its state
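
The two rules can be exercised in a small single-threaded simulation. The sketch below is not the authors' implementation: the classes `Process` and `Network`, the integer token counts used as process state, and the channel directions (C1 from p to q, C2 from q to p) are all assumptions made for illustration.

```python
from collections import deque

MARKER = "MARKER"

class Process:
    def __init__(self, name, state):
        self.name = name
        self.state = state            # application state (here: number of tokens held)
        self.recorded_state = None    # set once the process records itself
        self.recording = {}           # incoming channel -> messages recorded so far
        self.channel_state = {}       # incoming channel -> final recorded state

class Network:
    def __init__(self, processes, channels):
        # channels: name -> (src, dst); each channel is a FIFO queue
        self.procs = {p.name: p for p in processes}
        self.ends = channels
        self.queues = {c: deque() for c in channels}

    def send(self, c, message):
        self.queues[c].append(message)

    def record(self, name):
        """Marker-sending rule: record own state, then send a marker on every outgoing channel."""
        p = self.procs[name]
        p.recorded_state = p.state
        for c, (src, dst) in self.ends.items():
            if dst == name:
                p.recording[c] = []          # start recording each incoming channel
            if src == name:
                self.send(c, MARKER)         # queued before any later message on c

    def deliver(self, c):
        """Deliver the message at the head of channel c to its destination."""
        q = self.procs[self.ends[c][1]]
        message = self.queues[c].popleft()
        if message == MARKER:                # marker-receiving rule
            if q.recorded_state is None:
                self.record(q.name)
            # empty if this marker triggered the recording, else the messages seen since
            q.channel_state[c] = q.recording.pop(c)
        else:
            q.state += message               # application step: absorb the tokens
            if c in q.recording:
                q.recording[c].append(message)

p, q = Process("p", 1), Process("q", 0)
net = Network([p, q], {"C1": ("p", "q"), "C2": ("q", "p")})

p.state -= 1
net.send("C1", 1)      # p sends its token along C1
net.record("q")        # q initiates the snapshot
net.deliver("C1")      # the token reaches q and is recorded as in transit on C1
net.deliver("C2")      # the marker reaches p, which records itself
net.deliver("C1")      # the marker reaches q, which stops recording C1

recorded_tokens = (p.recorded_state + q.recorded_state
                   + sum(q.channel_state["C1"]) + sum(p.channel_state["C2"]))
print(p.recorded_state, q.recorded_state)            # 0 0
print(q.channel_state["C1"], p.channel_state["C2"])  # [1] []
print(recorded_tokens)                               # 1
```

In this run the recorded global state contains exactly one token (in transit on C1), even though the two processes recorded their states at different moments.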
26

Termination
Assumption: No marker remains forever in an
input channel

Claim: If the graph is strongly connected and
at least one process records its state,
then all processes will record their state
in finite time
Proof: by induction on the distance from a process
that has recorded its state
27

The Recorded Global State

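Ex: System

[Figure: processes p and q connected by channels C1 and C2; state transition diagrams with states A and B for p and states C and D for q, each with a send and a receive transition]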
28

[Figure: an execution of the example system with the events "p sends", "q sends", "p receives", during which the global state is recorded]
29

What did we get?

30

• Event e in process p is an atomic action:
it can change the state of p, and
the state of at most one channel c incident on p
(by sending/receiving message M along c)
• e is defined by <p, s, s’, M, c>
• e = <p, s, s’, M, c> may occur in global state S if
1. the state of p in S is s
2. if c is directed towards p, then c’s state
has M at its head

31

Process State and Global State
• A process: a set of states, an initial state, and
a set of events
• A global state S: collection of process states
and channel states
• Initially, each process is in its initial state and all
channels are empty

• next(S, e) is the global state after event e is
applied to global state S

32

Process State and Global State

• seq = (ei : i = 0…n) is a computation of
the system iff

ei may occur in Si, 0 ≤ i ≤ n, where

Si+1 = next(Si, ei)
(S0 is the initial global state)
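
The definitions of events, next(S, e) and computations translate directly into code. The sketch below is hypothetical: the `direction` field is an addition (the slides encode the direction in the channel itself), the message names M and M' are invented, and the channel directions C1 from p to q and C2 from q to p are assumed.

```python
from collections import namedtuple

# e = <p, s, s', M, c>; `direction` is an added field saying whether c is
# incoming ("in") or outgoing ("out") for p.
Event = namedtuple("Event", "p s s_next M c direction")

def may_occur(S, e):
    """e may occur in S iff p is in state s and, for a receive, M is at the head of c."""
    procs, chans = S
    if procs[e.p] != e.s:
        return False
    if e.direction == "in":
        return len(chans[e.c]) > 0 and chans[e.c][0] == e.M
    return True

def next_state(S, e):
    """next(S, e): the global state after event e is applied to S."""
    procs, chans = S
    procs = {**procs, e.p: e.s_next}
    if e.direction == "in":
        chans = {**chans, e.c: chans[e.c][1:]}       # remove M from the head
    elif e.direction == "out":
        chans = {**chans, e.c: chans[e.c] + (e.M,)}  # append M at the tail
    return procs, chans

def is_computation(S0, seq):
    """seq is a computation iff each event may occur in the state reached so far."""
    S = S0
    for e in seq:
        if not may_occur(S, e):
            return False
        S = next_state(S, e)
    return True

S0 = ({"p": "A", "q": "C"}, {"C1": (), "C2": ()})
p_sends = Event("p", "A", "B", "M", "C1", "out")
q_sends = Event("q", "C", "D", "M'", "C2", "out")
p_receives = Event("p", "B", "A", "M'", "C2", "in")

print(is_computation(S0, [p_sends, q_sends, p_receives]))  # True
print(is_computation(S0, [p_receives, p_sends, q_sends]))  # False
```

The second sequence fails at its first event: p is in state A, not B, and C2 is still empty, so "p receives" may not occur in S0.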

33

The Recorded Global State

seq = (ei: i ≥ 0) a distributed computation
Si – the state of the system right before ei occurs
S0 – the initial state of the system
St – the state of the system at the termination of
the algorithm
S* - the recorded global state

34

Definition
Event ej is called pre-recording if ej is in a
process p and p records its state after ej in seq.
Event ej is called post-recording if ej is in a
process p and p records its state before ej in seq.
Assume that ej-1 is a post-recording event that immediately precedes a
pre-recording event ej in seq.

35

Claim: The sequence obtained by interchanging
ej-1 and ej is a computation.
Proof: ej-1 occurs in p and ej in q (other than p).
There cannot be a message sent at ej-1 and
received at ej: p sent a marker before that message (ej-1 is post-recording),
so by FIFO q would have received the marker and recorded its state before ej.
Hence, event ej can occur in global state Sj-1.
The state of process p is not altered by ej,
hence ej-1 can occur after ej.
36

Proof
Swap events until all post-recording events
appear after all pre-recording events.
The resulting computation is seq’.
All that is left to show: S* is the global state
after all pre-recording events and before all
post-recording events.

1. Process states
2. Channel states

37

Claim: The state of a channel in S* is
(sequence of messages corresponding to pre-recording
sends) \ (sequence of messages corresponding to
pre-recording receives)
Proof: The state of channel c from process
p to process q recorded in S* is the sequence
of messages received on c by q after q
records its state and before q receives a
marker on c.
The sequence of messages sent by p before the marker is the
sequence corresponding to pre-recording sends on c.

38

[Figure: the execution "p sends" (post-recording), "q sends" (pre-recording), "p receives" (post-recording) of the example system]
39

(Another execution)

[Figure: the reordered execution "q sends" (pre-recording), "p sends" (post-recording), "p receives" (post-recording) of the example system]
40

What did we get?
A configuration that could have happened

41

(The Recorded Global State)

seq = (ei: i ≥ 0) a distributed computation
Si – the state of the system right before ei occurs
S0 – the initial state of the system
St – the state of the system at the termination of
the algorithm
S* - the recorded global state

42

(The Recorded Global State)

Theorem: There exists a computation
seq' = (ej' : j ≥ 0), where
(i and t are the indices at which the algorithm is initiated and terminates)
1. For all j < i or j ≥ t: ej' = ej
2. The subsequence (ej' | i ≤ j < t) is a
permutation of the subsequence
(ej | i ≤ j < t)
3. For all j < i or j ≥ t: Sj' = Sj
4. There exists k, i ≤ k ≤ t, such that S* = Sk'

Stable Detection

D - distributed system
y - a predicate function defined on the set of
global states of D
S, S’ – global states of D
y is a stable property of D if y(S) implies y(S’)
for all S’ reachable from S
44

Algorithm
Input: A stable property y
Output: a boolean value b with the
property:
y(S0) ⇒ b and b ⇒ y(St)
Algorithm:
begin
record a global state S*
b := y(S*)
end
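
The detection algorithm amounts to evaluating the predicate on the recorded state. The following is a minimal sketch, assuming a hypothetical termination predicate ("every process is passive and every channel is empty") and hand-written recorded states in place of a real snapshot; `detect` and `terminated` are illustrative names.

```python
def detect(y, record_global_state):
    """b := y(S*), where S* is the recorded global state."""
    s_star = record_global_state()
    return y(s_star)

def terminated(S):
    """Hypothetical stable predicate: all processes passive, all channels empty."""
    procs, chans = S
    return (all(state == "passive" for state in procs.values())
            and all(len(msgs) == 0 for msgs in chans.values()))

# Two hand-written recorded states standing in for the output of the snapshot algorithm.
snapshot_a = ({"p": "passive", "q": "passive"}, {"C1": [], "C2": []})
snapshot_b = ({"p": "passive", "q": "passive"}, {"C1": ["work"], "C2": []})

print(detect(terminated, lambda: snapshot_a))  # True
print(detect(terminated, lambda: snapshot_b))  # False
```

A result of False does not mean that the property never holds, only that it did not hold in the recorded state, so the detection may have to be repeated.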
45

Correctness
1. S* is reachable from S0
2. St is reachable from S*
3. y(S) ⇒ y(S’)
for all S’ reachable from S

[Figure: S0 → S* → St]

y(S*) = true ⇒ y(St) = true
y(S*) = false ⇒ y(S0) = false
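
The correctness argument can be written out as a short chain of implications. This is a sketch in the slide's notation, with b = y(S*):

```latex
\begin{align*}
y(S_0) &\Rightarrow y(S^*) && \text{($S^*$ is reachable from $S_0$ and $y$ is stable)} \\
y(S^*) &\Rightarrow y(S_t) && \text{($S_t$ is reachable from $S^*$ and $y$ is stable)} \\
b = y(S^*) &\text{ gives } y(S_0) \Rightarrow b \text{ and } b \Rightarrow y(S_t)
\end{align*}
```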

46

References
K. M. Chandy and L. Lamport, "Distributed Snapshots: Determining Global States of Distributed Systems", ACM Transactions on Computer Systems, 3(1), 1985, pp. 63–75.

47



Distributed Computing
5. Snapshot
Shmuel Zaks
zaks@cs.technion.ac.il

The snapshot algorithm (Chandy and Lamport)

Goal: design a snapshot (= global-state detection) algorithm that:

• will record a collection of states of all system
components (which forms a global system
state),
• will not change the underlying computation,
• will not freeze the underlying computation

A Process Can…
• record its own state,
• send and receive messages,
• record messages it sends and receives,
• cooperate with other processes

Processes do not share clocks or memory
⇒ Processes cannot record their state
precisely at the same instant

Motivation

• Many problems in distributed systems
can be stated in terms of the problem of
detecting global states:
  Stable property detection problems:
  termination detection,
  deadlock detection,
  etc.
• Checkpointing

Stable Property Detection Problem
D - distributed system
y - a predicate function defined on the set
of global states of D
S, S’ – global states of D
y is stable if y(S) implies y(S’) for all S’
reachable from S

• Many distributed algorithms are
structured as a sequence of phases
• A phase: a transient part, then a stable part
phase termination vs. computation
termination
• Our view of the problem:
  i. detect the termination of a phase
  ii. initiate a new phase
Notice that “the kth phase has terminated” is
a stable property

Model
• Distributed system D is a
finite, labeled, directed
graph.

[Figure: processes p and q connected by channels C1 and C2]

• Channels have infinite buffers, are error-free, and preserve FIFO order
• Message delay is bounded, but unknown

State of a Channel

[Figure: messages 1, 2, 3 in transit on channel C1 between p and q; message 1 is then received]

• [1, 2, 3] – sequence X of messages that
were sent
• [1] – sequence Y of received messages
(a prefix of X)
• [2, 3] – state of C1: X \ Y
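The definition of a channel state can be checked mechanically. The following is a minimal sketch in Python, assuming messages are kept in plain lists; the helper name `channel_state` is hypothetical and does not come from the slides.

```python
def channel_state(sent, received):
    """Return the messages sent on a FIFO channel but not yet received.

    This is the difference "X minus Y" from the slide. Because the channel
    preserves FIFO order, `received` must be a prefix of `sent`.
    """
    if received != sent[:len(received)]:
        raise ValueError("received sequence is not a prefix of sent sequence")
    return sent[len(received):]


X = [1, 2, 3]  # messages sent on C1
Y = [1]        # messages already received from C1
state = channel_state(X, Y)
print(state)  # [2, 3]
```

The prefix check is where the FIFO assumption of the model shows up: without it, the channel state could not be described by two sequences alone.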

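Example: System

Distributed system: processes p and q connected by channels C1 and C2.

[Figure: state transitions (same for p and q): states A and B, with transitions labeled send and receive]

[Figure: initial global state: one process in state A, the other in state B, both channels empty (Ø)]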

Global state transition diagram

[Figure: global states of the example system, linked by the events p sends, q receives, q sends, p receives]

A computation corresponds to a path in the diagram
(deterministic)

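Example: System

Distributed system: processes p and q connected by channels C1 and C2.

[Figure: state transitions: p has states A and B, q has states C and D, each with transitions labeled send and receive]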

Global state transition diagram

[Figure: global states of the second example system, linked by the events p sends, q sends, q receives, p receives]

(non-deterministic)

We look at the following sequence of events:

[Figure: the computation p sends, q sends, p receives, shown as a sequence of global states]

In the snapshot algorithm:
• Each process records its own state
• p and q cooperate to record the state of C

[Figure: channel C from p to q]

Example: System

[Figure: the single-token system, with the recording steps taken in the order Record C, Record q, Record p]

Recorded state: no token

Example: System

[Figure: the single-token system, with the recording steps taken in the order Record p, Record C, Record q]

Recorded state: two tokens

Record p, Record C, Record q (in time order):
P’s state recorded → P sends a message on C → C’s state recorded

Record C, Record q, Record p (in time order):
C’s state recorded → P sends a message on C → P’s state recorded
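The two timelines can be replayed in a few lines. This is a small sketch in Python, assuming a single token that starts at p and is sent once on C; the function `run` and the step names are hypothetical, chosen only for this illustration.

```python
def run(order):
    """Replay 'p sends the token on C' with uncoordinated recording steps.

    `order` lists the steps in time order; 'send' is the moment p sends.
    Returns the number of tokens that appear in the recorded global state.
    """
    system = {"p": 1, "C": 0, "q": 0}  # the single token starts at p
    recorded = {}
    for step in order:
        if step == "send":
            system["p"] -= 1
            system["C"] += 1
        else:  # step names a component to record: 'p', 'C' or 'q'
            recorded[step] = system[step]
    return sum(recorded.values())


two_tokens = run(["p", "send", "C", "q"])  # Record p, then C, then q
no_token = run(["C", "q", "send", "p"])    # Record C, then q, then p
print(two_tokens, no_token)  # 2 0
```

Both recorded states are impossible in a system that always holds exactly one token, which is why the recording steps have to be coordinated.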

[Figure: channel C from p to q]

• q will record the state of C
• p and q have to coordinate, using a special marker
• q starts recording C after it records its state
• q stops when receiving the marker from p

But: how does q know when to record its state?

The snapshot algorithm

Who starts?
We assume one process.

Homework: extend the discussion and proof to any number
of starters.

(Intuition for the Algorithm)

[Figure: channel C from p to q]

• Who will record the state of channel C? q
• q starts recording after it records its state

• How does q know when to stop recording?
• p sends a marker right after it records
its state, and before sending any other message

The snapshot algorithm
Channel recording

[Figure: channel C from p to q]

Starts when q records itself
Ends when q receives the marker along C

Note: for any q ≠ p0, the channel along
which the marker arrived first is recorded as Ø

The snapshot algorithm
State recording
p0 starts.
p0 records its state,
and then broadcasts the marker.

Shout-algorithm =
PI (Propagation of Information) =
hot potato = …
When q receives the marker for the first time, it
records its own state

The snapshot algorithm
Marker-Sending Rule for a process p
1. record the state of p
2. send the marker along each outgoing channel c before sending
any other message along c
Marker-Receiving Rule for a process q
on receiving the marker along channel c:
if q’s state is not recorded:
1. record q’s state (as in the Marker-Sending Rule);
2. record c’s state = Ø;
else: c’s state is the sequence of messages
received along c since q recorded its state
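The two rules can be sketched as a small simulation. This is an illustrative Python sketch, not the authors' implementation: the classes `Process` and `Network`, the token-counting application, and the hand-written delivery schedule are all assumptions made for the example.

```python
from collections import deque

MARKER = "MARKER"


class Process:
    def __init__(self, state):
        self.state = state       # application state (here: tokens held)
        self.recorded = None     # recorded state; None until recorded
        self.chan_state = {}     # recorded state of each incoming channel
        self.recording = set()   # incoming channels still being recorded


class Network:
    def __init__(self, states, edges):
        self.procs = {name: Process(s) for name, s in states.items()}
        self.chans = {edge: deque() for edge in edges}  # (src, dst) -> FIFO

    def transfer(self, src, dst, tokens):
        """Application event: src sends `tokens` to dst."""
        self.procs[src].state -= tokens
        self.chans[(src, dst)].append(tokens)

    def record(self, name):
        """Marker-sending rule: record own state, then send markers."""
        p = self.procs[name]
        p.recorded = p.state
        for (src, dst), queue in self.chans.items():
            if dst == name:          # start recording this incoming channel
                p.chan_state[(src, dst)] = []
                p.recording.add((src, dst))
            if src == name:          # marker goes out before anything else
                queue.append(MARKER)

    def deliver(self, src, dst):
        """Marker-receiving rule, applied to the head of channel (src, dst)."""
        q = self.procs[dst]
        msg = self.chans[(src, dst)].popleft()
        if msg == MARKER:
            if q.recorded is None:
                self.record(dst)                 # first marker seen by dst
            q.recording.discard((src, dst))      # channel state is now final
        else:
            q.state += msg
            if (src, dst) in q.recording:
                q.chan_state[(src, dst)].append(msg)

    def recorded_tokens(self):
        in_procs = sum(p.recorded for p in self.procs.values())
        in_chans = sum(sum(msgs) for p in self.procs.values()
                       for msgs in p.chan_state.values())
        return in_procs + in_chans


net = Network({"p": 0, "q": 1}, [("p", "q"), ("q", "p")])
net.record("p")            # p initiates: records 0 tokens, marker on (p, q)
net.transfer("q", "p", 1)  # q has not seen the marker yet and sends the token
net.deliver("q", "p")      # p receives the token while recording (q, p)
net.deliver("p", "q")      # q gets the marker, records 0 tokens, sends marker
net.deliver("q", "p")      # p gets the marker and stops recording (q, p)
print(net.recorded_tokens())  # 1
```

In this schedule neither recorded process state holds the token; it is captured in the recorded state of channel (q, p), so the recorded global state still contains exactly one token.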

Termination
Assumption: No marker remains forever in an
input channel

Claim: If the graph is strongly connected and
at least one process records its state,
then all processes will record their state
in finite time
Proof: by induction
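The termination claim amounts to saying that markers flood the graph. A rough Python sketch, assuming every marker is eventually delivered and using breadth-first order as one possible delivery order; the function `recording_order` and the example graphs are hypothetical.

```python
from collections import deque


def recording_order(graph, initiator):
    """Return the processes in one order in which markers can reach them.

    `graph` maps each process to the processes it has outgoing channels to.
    A process records its state on its first marker and then sends markers
    on all its outgoing channels.
    """
    recorded = [initiator]
    seen = {initiator}
    frontier = deque([initiator])
    while frontier:
        p = frontier.popleft()
        for q in graph[p]:
            if q not in seen:  # first marker to arrive at q
                seen.add(q)
                recorded.append(q)
                frontier.append(q)
    return recorded


ring = {"p0": ["p1"], "p1": ["p2"], "p2": ["p0"]}    # strongly connected
broken = {"p0": ["p1"], "p1": [], "p2": ["p0"]}      # p2 is unreachable
print(recording_order(ring, "p0"))    # ['p0', 'p1', 'p2']
print(recording_order(broken, "p0"))  # ['p0', 'p1']
```

In the second graph no channel leads to p2, so p2 never records its state; this is the situation that strong connectivity rules out.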

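The Recorded Global State

Example: System

Distributed system: processes p and q connected by channels C1 and C2.

[Figure: state transitions: p has states A and B, q has states C and D, each with transitions labeled send and receive]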

[Figure: the computation p sends, q sends, p receives on this system, with the steps of the snapshot algorithm marked on it]

What did we get?

Event
• An event e in process p is an atomic action: it
can change the state of p, and
the state of at most one channel c incident on p
(by sending/receiving message M along c)
• e is defined by <p, s, s’, M, c>
• e = <p, s, s’, M, c> may occur in global state S if
  1. the state of p in S is s
  2. if c is directed towards p, then c’s state
  has M at its head

Process State and Global State
• A process: a set of states, an initial state, and
a set of events
• A global state S: a collection of process states
and channel states
• Initially, each process is in its initial state and all
channels are empty

• next(S, e) is the global state after event e is
applied to global state S

Process State and Global State

• seq = (ei : i = 0…n) is a computation of
the system iff, for all 0 ≤ i ≤ n,

ei may occur in Si, and

Si+1 = next(Si, ei)
(S0 is the initial global state)
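The definitions of an event, next(S, e), and a computation translate almost directly into code. A minimal Python sketch under stated assumptions: the extra `kind` field (send or receive) is added for the example, global states are a pair of dictionaries, and the two sample events on channel C1 are hypothetical.

```python
from collections import namedtuple

# e = <p, s, s', M, c>; `kind` is an added field saying whether p sends or
# receives M on c (the slides express this through the direction of c).
Event = namedtuple("Event", "p s s_next M c kind")


def may_occur(S, e):
    """Check the two conditions under which e may occur in global state S."""
    procs, chans = S
    if procs[e.p] != e.s:
        return False
    if e.kind == "recv":  # c is directed towards p: M must be at its head
        return len(chans[e.c]) > 0 and chans[e.c][0] == e.M
    return True


def next_state(S, e):
    """Return next(S, e) without modifying S."""
    procs, chans = S
    procs = {**procs, e.p: e.s_next}
    chans = dict(chans)
    if e.kind == "send":
        chans[e.c] = chans[e.c] + (e.M,)
    elif e.kind == "recv":
        chans[e.c] = chans[e.c][1:]
    return procs, chans


def is_computation(S0, seq):
    """seq is a computation iff each event may occur in the state before it."""
    S = S0
    for e in seq:
        if not may_occur(S, e):
            return False
        S = next_state(S, e)
    return True


S0 = ({"p": "A", "q": "C"}, {"C1": (), "C2": ()})
p_sends = Event("p", "A", "B", "m", "C1", "send")
q_receives = Event("q", "C", "D", "m", "C1", "recv")
print(is_computation(S0, [p_sends, q_receives]))  # True
print(is_computation(S0, [q_receives, p_sends]))  # False
```

The second sequence fails because C1 is empty when q tries to receive, which is exactly condition 2 of the event definition.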

The Recorded Global State

seq = (ei: i ≥ 0) a distributed computation
Si – the state of the system right before ei occurs
S0 – the initial state of the system
St – the state of the system at the termination of
the algorithm
S* - the recorded global state


Definition
Event ej is called pre-recording if ej is in a
process p and p records its state after ej in seq.
Event ej is called post-recording if ej is in a
process p and p records its state before ej in seq.
Assume that ej-1 is a post-recording event that comes
immediately before a pre-recording event ej in seq.

Claim: The sequence obtained by interchanging
ej-1 and ej is a computation.
Proof: ej-1 occurs in p and ej in q (other than p).
There cannot be a message sent at ej-1 and
received at ej: p sent its marker before ej-1, so by FIFO
q would have received the marker, and recorded its state, before ej.
Hence, event ej can occur in global state Sj-1.
The state of process p is not altered by ej,
hence ej-1 can occur after ej.

Proof
Swap the events until all post-recording events
appear after all pre-recording events.
The resulting computation is seq’.
All that is left to show: S* is the global state
after all pre-recording events and before all
post-recording events.

1. Process states
2. Channel states

Claim: The state of a channel c in S* is
(sequence of messages corresponding to pre-recording
sends on c) \ (sequence of messages corresponding to
pre-recording receives on c)
Proof: The state of channel c from process
p to process q recorded in S* is the sequence
of messages received on c by q after q
records its state and before q receives a
marker on c.
The sequence of messages sent by p before the marker is the
sequence corresponding to pre-recording sends on c.

[Figure: the computation p sends, q sends, p receives, with each event labeled pre-recording or post-recording (p sends: post, q sends: pre, p receives: post)]

(Another execution)

[Figure: the permuted computation q sends, p sends, p receives, labeled pre, post, post, so that the pre-recording event comes first]

What did we get?
A configuration that could have happened


(The Recorded Global State)

seq = (ei: i ≥ 0) a distributed computation
Si – the state of the system right before ei occurs
S0 – the initial state of the system
St – the state of the system at the termination of
the algorithm
S* - the recorded global state


(The Recorded Global State)

Theorem: There exists a computation
seq' = (ej' : j ≥ 0), where
1. For all j < i or j ≥ t: ej' = ej
2. The subsequence (ej' | i ≤ j < t) is a
permutation of the subsequence
(ej | i ≤ j < t)
3. For all j < i or j ≥ t: Sj' = Sj
4. There exists k, i ≤ k ≤ t, such that S* = Sk'

Stable Detection

D - distributed system
y - a predicate function defined on the set of
global states of D
S, S’ – global states of D
y is a stable property of D if y(S) implies y(S’)
for all S’ reachable from S

Algorithm
Input: A stable property y
Output: a boolean value b with the
property:
y(S0) ⇒ b and b ⇒ y(St)
Algorithm:
begin
record a global state S*
b := y(S*)
end
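The detection algorithm is short enough to write out. A sketch in Python, assuming a callback `record_global_state` that stands in for the snapshot algorithm and a termination predicate as the stable property; both names and the sample global states are hypothetical.

```python
def detect_stable(record_global_state, y):
    """b := y(S*), where S* is a recorded global state."""
    s_star = record_global_state()
    return y(s_star)


def terminated(S):
    """Stable property: all processes are passive and all channels are empty."""
    procs, chans = S
    all_passive = all(status == "passive" for status in procs.values())
    all_empty = all(len(msgs) == 0 for msgs in chans.values())
    return all_passive and all_empty


# Three global states that a snapshot might record (made up for the example).
still_running = ({"p": "passive", "q": "active"}, {"C1": [], "C2": []})
work_in_flight = ({"p": "passive", "q": "passive"}, {"C1": ["work"], "C2": []})
finished = ({"p": "passive", "q": "passive"}, {"C1": [], "C2": []})

results = [detect_stable(lambda S=S: S, terminated)
           for S in (still_running, work_in_flight, finished)]
print(results)  # [False, False, True]
```

The middle case shows why channel states belong to the recorded global state: both processes look passive, yet a message in transit can still reactivate one of them.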

Correctness
1. S* is reachable from S0
2. St is reachable from S*
3. y(S) ⇒ y(S’)
for all S’ reachable from S

Along S0 → S* → St:
y(S*) = true ⇒ y(St) = true
y(S*) = false ⇒ y(S0) = false

References
K. M. Chandy and L. Lamport,
"Distributed Snapshots:
Determining Global States of Distributed
Systems", ACM Transactions on Computer Systems 3(1), 1985, pp. 63-75.

