Ch13 Checkpointing and Recovery

Outline
Introduction
What?
Why?
Where?
Problems in Rollback
Incarnation numbers
Taxonomy of solution techniques
Uncoordinated checkpoint
Coordinated checkpoint
Synchronous Logging
Asynchronous Logging
Adaptive Logging
Introduction
During a computation, a node might fail and later be repaired.
After a failed processor has been repaired,
how can the system be taken to a consistent global state?
If every processor periodically:
records its local state on stable storage,
records the messages it receives on stable storage,
then
the system can be taken to a consistent global state by rolling it back
to a previously recorded global state.
Terminology
checkpointing: recording the local state on stable storage
message logging: recording received messages on stable storage
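The two terms can be made concrete with a minimal sketch. Everything in it is an assumption made for illustration: the `Processor` class, the one-file-per-processor layout, and the toy state (a counter) are hypothetical, and ordinary files stand in for stable storage.

```python
import json
import os
import tempfile


class Processor:
    """Hypothetical processor that checkpoints its state and logs received messages."""

    def __init__(self, name, directory):
        self.state = {"counter": 0}
        self.ckpt_path = os.path.join(directory, f"{name}.ckpt")
        self.log_path = os.path.join(directory, f"{name}.log")

    def checkpoint(self):
        # Write to a temporary file, then rename it, so that a crash never
        # leaves a half-written checkpoint behind.
        tmp = self.ckpt_path + ".tmp"
        with open(tmp, "w") as f:
            json.dump(self.state, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, self.ckpt_path)
        # Messages logged so far are already reflected in the checkpoint.
        open(self.log_path, "w").close()

    def receive(self, msg):
        # Log the message on stable storage, then apply it to the state.
        with open(self.log_path, "a") as f:
            f.write(json.dumps(msg) + "\n")
            f.flush()
            os.fsync(f.fileno())
        self.apply(msg)

    def apply(self, msg):
        self.state["counter"] += msg

    def recover(self):
        # Restore the last checkpoint, then replay the logged messages.
        with open(self.ckpt_path) as f:
            self.state = json.load(f)
        with open(self.log_path) as f:
            for line in f:
                self.apply(json.loads(line))
```

In this sketch, recovery assumes that at least one checkpoint has been taken; a processor that fails before its first checkpoint would restart from its initial state instead.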
Recovery line
A set C of local checkpoints forms a consistent state
(also called recovery line) if the following conditions are
satisfied:
1) there are no lost messages in C
2) there are no orphan messages in C
3) C contains exactly one checkpoint for each processor
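The three conditions can be checked mechanically. The sketch below assumes the usual definitions, which this section does not spell out: a message is lost when its send is recorded in C but its receipt is not, and it is an orphan when its receipt is recorded but its send is not. The `Message` record and the use of local event indices are hypothetical modelling choices.

```python
from typing import NamedTuple


class Message(NamedTuple):
    name: str
    sender: str
    send_at: int    # local event index of the send at the sender
    receiver: str
    recv_at: int    # local event index of the receipt at the receiver


def classify(checkpoints, messages):
    """checkpoints maps each processor to the local event index of its checkpoint.

    Using a dict enforces condition 3: exactly one checkpoint per processor.
    """
    lost, orphan = [], []
    for m in messages:
        sent = m.send_at <= checkpoints[m.sender]
        received = m.recv_at <= checkpoints[m.receiver]
        if sent and not received:
            lost.append(m.name)      # violates condition 1
        elif received and not sent:
            orphan.append(m.name)    # violates condition 2
    return lost, orphan


def is_recovery_line(checkpoints, messages):
    lost, orphan = classify(checkpoints, messages)
    return not lost and not orphan
```

For example, a message sent after the sender's checkpoint but received before the receiver's checkpoint is reported as an orphan.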
Problems in rollback
The goal of rollback is to return the system to a consistent state
Some precautions have to be taken for this to work properly
For simplicity, we do not consider channel state for the rollback
To see the problem, assume:
1) processors checkpoint from time to time
2) checkpoints are established independently, without any
coordination between processors
[Figure: processors p1, p2, p3 with checkpoints c1, c2, c3 and messages m1, m2, m3]
The global state formed by c1, c2, c3 is inconsistent. It contains:
lost messages: m2, m3
orphan messages: m1
Problems in rollback: cascading rollbacks
[Figure: processors p, q, r with checkpoints p1 to p3, q1 to q4, r1 to r4 and messages m1 to m5]
“p rolls back to p3” requires, because of message m1, that “r rolls back to r4”
...
{p2,q3,r3} is a recovery line
A rollback by a processor can cause
an avalanche of rollbacks
How to avoid this ?
Problems in rollback: I/O stuttering
[Figure: processors p, q, r; p performs an I/O event after its checkpoint pi]
Rolling back processor p to pi
requires that the I/O event be
re-executed: I/O stuttering
How can we avoid this ?
Log inputs: avoid input stuttering
Output commit: avoid output stuttering
Problems in rollback: message duplication
[Figure: p sends m to q after checkpoint pi and q receives it, r(m); after Rollback(p) and recovery, p sends m again and q receives it a second time]
Processor p rolls back to pi
No need for q to roll back
After recovery, processor p sends m again.
Processor q should recognize that message m is a
duplicate message
Incarnation numbers: handling duplicate messages
Every processor:
maintains an incarnation number on stable storage
stores a guess of the incarnation number of every other processor
On every recovery from failure or rollback,
the incarnation number is incremented
Each message carries the incarnation number of the sender
The evolution of a processor is organized into periods.
Incarnation numbers serve to identify these periods
[Figure: timeline of a processor: period 0 (incarnation 0) ends with a recovery from failure, period 1 (incarnation 1) ends with a rollback, then incarnation 2 begins]
When processor p receives a message m from processor q, p behaves as follows:
if m.incarnation < incarnation[q]: m is a duplicate, discard it
if m.incarnation = incarnation[q]: deliver m
if m.incarnation > incarnation[q]: m belongs to an incarnation that p does not know yet, so block
the delivery of m until m.incarnation = incarnation[q]
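The three cases can be sketched as a small receiver-side filter. The class and method names are hypothetical, and so is `on_incarnation_update`: the slides do not say how p learns that q has started a new incarnation, so the sketch simply assumes some notification arrives.

```python
from collections import defaultdict


class Receiver:
    def __init__(self):
        self.incarnation = defaultdict(int)   # guess of each sender's incarnation
        self.blocked = defaultdict(list)      # messages from incarnations not yet known
        self.delivered = []

    def on_message(self, sender, inc, payload):
        known = self.incarnation[sender]
        if inc < known:
            return "discarded"                # duplicate from an old incarnation
        if inc == known:
            self.delivered.append(payload)
            return "delivered"
        self.blocked[sender].append((inc, payload))
        return "blocked"                      # wait until the incarnation is known

    def on_incarnation_update(self, sender, inc):
        # Assumed notification that `sender` is now in incarnation `inc`.
        self.incarnation[sender] = max(self.incarnation[sender], inc)
        current = self.incarnation[sender]
        still_blocked = []
        for m_inc, payload in self.blocked[sender]:
            if m_inc == current:
                self.delivered.append(payload)
            elif m_inc > current:
                still_blocked.append((m_inc, payload))
            # m_inc < current: the message has become a duplicate and is dropped
        self.blocked[sender] = still_blocked
```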
Choices to be made to implement a recovery scheme
To log or not to log messages?
Log messages:
+ : increases flexibility at recovery time
- : expensive (space)
- : processes must be deterministic (which is often not the case)
To coordinate or not to coordinate the recording of state?
Uncoordinated checkpoints
Sufficient information (we’ll see later) must be kept for rollback
+ : keeps the cost of establishing checkpoints low
- : the amount of rollback may be unbounded
Coordinated checkpoints
The set of checkpoints together form a recovery line
+ : limits the amount of rollback
- : increases the cost of establishing checkpoints
Uncoordinated checkpointing
Assumptions
1. Processors asynchronously checkpoint from time to time
2. No coordination between processors for establishment of
checkpoints
3. No log of messages
Goal
find the maximal (latest) recovery line,
i.e. the one that comes after every other possible recovery line
Checkpoint interval algorithm (progressive rollback)
Notations
Ci,j : the jth checkpoint at processor pi
Ii,j : the interval ] Ci,j ; Ci,j+1[, processing interval of pi between
Ci,j and Ci,j+1
Definition
Ik,l depends on Ii,j iff there is a message m sent in Ii,j and
received in Ik,l
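[Figure: message m sent by pi in interval Ii,j (between Ci,j and Ci,j+1) and received by pk in interval Ik,l (between Ck,l and Ck,l+1)]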
Idea of the algorithm
When a processor pi fails and then is repaired
1. Processor pi initiates recovery by restoring its last
checkpoint, say Ci,j
2. Every processor pk in Ik,l such that Ik,l depends on Ii,j rolls
back (but to which checkpoint ? We’ll see later)
3. This process continues recursively (transitively) until a
recovery line is determined
To support recovery, the information about interval dependence
must be recorded (This is the sufficient information !)
Interval dependence graph: to capture rollback requirements
GI is a directed graph in which
VI: the vertices are the checkpoint intervals that exist when recovery starts
EI: the directed edges are such that
1) for every processor pi, (Ii,j , Ii,j+1) is in EI
2) if Ik,l depends on Ii,j then (Ii,j , Ik,l) is in EI
[Figure: the two edge rules: an edge from Ii,j to Ii,j+1, and an edge from Ii,j to Ik,l when Ik,l depends on Ii,j]
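The two edge rules translate directly into an adjacency structure. This is a minimal sketch under assumed inputs: `num_intervals` and the `(sent_in, received_in)` message pairs are a hypothetical encoding of the dependence information that each processor is supposed to record.

```python
def build_interval_graph(num_intervals, messages):
    """Build GI as {interval: set of successor intervals}.

    num_intervals: {processor: number of checkpoint intervals}, numbered from 1
    messages: iterable of ((i, j), (k, l)) pairs, meaning a message was sent
              in interval I(i,j) and received in interval I(k,l)
    """
    graph = {(p, j): set()
             for p, n in num_intervals.items()
             for j in range(1, n + 1)}
    for p, n in num_intervals.items():
        for j in range(1, n):
            graph[(p, j)].add((p, j + 1))   # rule 1: consecutive intervals
    for sent_in, received_in in messages:
        graph[sent_in].add(received_in)     # rule 2: interval dependence
    return graph
```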
Intuition behind the interval dependence graph:
If processor pi rolls back to Ci,j and Ik,l depends on Ii,j,
then processor pk must roll back to Ck,l
This avoids orphan messages
[Figure: message m sent by pi in Ii,j (after Ci,j) and received by pk in Ik,l (after Ck,l); because of m, pk must roll back to Ck,l]
Interval dependence graph illustrated:
[Figure, left: message passing and checkpointing among p1, p2, p3 with intervals I1,1 to I1,4, I2,1 to I2,3, I3,1 to I3,4 and messages m1 to m5]
[Figure, right: the corresponding interval dependence graph on nodes 1,1 to 1,4, 2,1 to 2,3, 3,1 to 3,4]
The checkpoint interval algorithm (progressive rollback)
When a processor pi fails and is then repaired, pi performs:
Step 1. Compute GI
Step 2. Mark the node of GI corresponding to its last checkpoint
interval; let Ii,j be that node.
Mark all the nodes of GI that are reachable from Ii,j
Step 3. For each processor pk, define the “best checkpoint” of pk
w.r.t. the recovery of pi to be Ck,l such that
l = min {j | Ik,j is marked}
Every processor rolls back to its “best checkpoint”
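Steps 2 and 3 amount to a graph reachability search followed by a per-processor minimum. The sketch below assumes GI is already available as an adjacency dictionary (Step 1); the function name and the data layout are hypothetical.

```python
from collections import deque


def best_checkpoints(graph, last_interval):
    """Return {processor: l}, meaning the processor rolls back to checkpoint C(processor, l).

    graph: {(processor, j): set of successor intervals}
    last_interval: (i, j), the last checkpoint interval of the failed processor
    """
    # Step 2: mark every interval reachable from the failed processor's last interval.
    marked = {last_interval}
    queue = deque([last_interval])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in marked:
                marked.add(nxt)
                queue.append(nxt)
    # Step 3: the best checkpoint is the smallest marked interval index per processor.
    best = {}
    for proc, j in marked:
        best[proc] = min(j, best.get(proc, j))
    return best
```

A processor that does not appear in the result has no marked interval, so in this sketch it does not need to roll back.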
The algorithm illustrated: assume that p2 fails and then is repaired
Step 1.
p2 computes GI
[Figure: interval dependence graph on nodes 1,1 to 1,4, 2,1 to 2,3, 3,1 to 3,4]
Step 2.
p2 marks all the nodes of GI reachable from its last checkpoint interval
[Figure: interval dependence graph with the nodes reachable from the last checkpoint interval of p2 marked]
Step 3.
Each processor rolls back to its “best checkpoint” w.r.t. the recovery of p2
Recall: for each processor k, the “best checkpoint” of k w.r.t. the
recovery of p2 is Ck,l such that
l = min {j | Ik,j is marked}
[Figure: message passing and checkpointing among p1, p2, p3 (intervals I1,1 to I1,4, I2,1 to I2,3, I3,1 to I3,4, messages m1 to m5) with the recovery line determined]
Some comments about the checkpoint interval algorithm
Rollback can take the system all the way back to its initial state
The algorithm presented is centralized
it can be implemented on a recovery manager that directs all the
participants to restart, each from its “best checkpoint”
For a distributed version,
recovery control messages must be used to communicate
parts of GI
Coordinated checkpointing
Idea:
Processors coordinate the checkpointing of their local states
to ensure that the checkpoints taken by the different processors
form a recovery line
This avoids cascading rollbacks
Method used:
Similar to that used for computing a “global snapshot”
However, there are some differences
Subtleties:
1. Only processor states are recorded (save space)
2. Failures during checkpointing are handled
3. Store the minimum number of checkpoints (save space)
4. Lost messages are handled by the communication protocol
(a consistent set of checkpoints may now contain lost messages)
5. No orphan messages in the computed set of checkpoints
6. Only a minimum number of processors must checkpoint
idea:
old checkpoints together with new checkpoints of some
processors may form a “consistent set” of checkpoints
Koo & Toueg 87 (the original algorithm):
Uses a two-phase protocol to ensure that either all processors
checkpoint or none do
Two types of checkpoints are used for that
“tentative checkpoint” :
established when global state recording is ongoing
“permanent checkpoint” :
if the recorded state is consistent, tentative checkpoints
become permanent checkpoints
Basic idea
Phase 1
Initiator q:
1. q takes a tentative checkpoint;
2. q requests all other processors to take tentative checkpoints
Non-initiator p:
on receiving this request
1. p decides whether or not to establish a tentative checkpoint;
2. p sends its decision to the initiator;
3. p waits for the final decision from q
(i.e. p refrains from communicating with any other processor until the second
phase is over)
Phase 2 :
Initiator q:
1. Processor q collects the decisions of all other processors
2. If all other processors have taken tentative checkpoints
then q makes its tentative checkpoint permanent;
else q undoes its tentative checkpoint;
3. q requests all other processors to apply the same final decision
Non-initiator p:
on receiving this final decision,
processor p carries it out
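The all-or-nothing behaviour of the two phases can be sketched in a few lines. This is a simplified, failure-free model, not the full protocol: direct method calls replace messages, there are no timeouts, and the `Participant` class with its `willing` flag is a hypothetical stand-in for a processor's decision to establish a tentative checkpoint.

```python
class Participant:
    def __init__(self, name, willing=True):
        self.name = name
        self.willing = willing     # hypothetical: can this processor checkpoint now?
        self.state = 0
        self.tentative = None
        self.permanent = None

    def request_tentative(self):
        # Phase 1: establish (or refuse) a tentative checkpoint and report the decision.
        if self.willing:
            self.tentative = self.state
        return self.willing

    def decide(self, commit):
        # Phase 2: make the tentative checkpoint permanent, or undo it.
        if commit and self.tentative is not None:
            self.permanent = self.tentative
        self.tentative = None


def coordinated_checkpoint(initiator, others):
    initiator.tentative = initiator.state             # initiator checkpoints first
    votes = [p.request_tentative() for p in others]   # Phase 1: collect decisions
    commit = all(votes)
    for p in [initiator, *others]:                    # Phase 2: same decision everywhere
        p.decide(commit)
    return commit
```

If a single participant refuses, every tentative checkpoint is undone and the previous permanent checkpoints remain in force.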
The Basic idea ensures that there are no orphan messages
Why?
Answer:
no communication is allowed until the second phase is over
It is not necessary that all processors record their state during
checkpointing
Why ?
[Figure: processors p1, p2, p3 with checkpoints C1,1, C2,1, C3,1 followed by C1,2, C2,2, C3,2]
p1 initiates checkpointing by establishing c1,2
then p1 contacts p2 and p3 by sending request messages (shown in red in the figure)
assume that everything goes fine and p2, p3 establish
c2,2 and c3,2 respectively as new checkpoints
{c1,2 , c2,2 , c3,2} forms a consistent set of checkpoints
However, {c1,2 , c2,1 , c3,2} also forms a consistent set
of checkpoints (i.e. no orphan messages)
Hence, processor p2 need not take a new checkpoint
Ensuring a minimum number of checkpoints:
Every processor assigns monotonically increasing sequence
numbers to each message it sends
Each processor p uses:
p.last_rec[1..M] an array of sequence numbers
p.last_rec[i] = sequence number of the last message that processor p received
from processor pi since p’s last checkpoint
p.first_sent[1..M] an array of sequence numbers
p.first_sent[i] = sequence number of the first message that processor p sent to
processor pi since p’s last checkpoint
When an initiator processor q requests a processor p to take a
tentative checkpoint,
processor q appends q.last_rec[p] to its request
On receiving this request from q,
processor p takes the tentative checkpoint only if
(p.first_sent[q] ≤ q.last_rec[p])
[Figure: between its last checkpoint and its current checkpoint, q receives a message that p sent after p's last checkpoint]
p takes a new checkpoint only in this case
⇒ this avoids orphan messages
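The test performed by p compares the sequence number of the first message p sent to q since p's last checkpoint with the sequence number of the last message q received from p: if the former does not exceed the latter, q's new checkpoint records the receipt of a message whose send p's last checkpoint does not record, which would be an orphan. A minimal sketch follows; the function name is hypothetical, and using `None` for "no such message since the last checkpoint" is an assumption the slides do not state.

```python
def must_checkpoint(first_sent_to_q, last_rec_from_p):
    """Decide whether p has to take a tentative checkpoint for initiator q.

    first_sent_to_q: p.first_sent[q], or None if p sent nothing to q
                     since p's last checkpoint (assumed convention)
    last_rec_from_p: q.last_rec[p] carried in the request, or None if q
                     received nothing from p since q's last checkpoint
    """
    if first_sent_to_q is None or last_rec_from_p is None:
        return False   # no message from p can become an orphan
    return first_sent_to_q <= last_rec_from_p
```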
Only processors that have sent messages to the initiator processor q
since q’s last checkpoint need to consider establishing the
new checkpoint requested by q
⇒ an initiator processor q should send requests only to those
processors p from which q has received a message since its last checkpoint
[Figure: a message from p received by q between q's last checkpoint and q's current checkpoint]
Every processor q maintains:
q.checkpoint_cohort : the set of processors from
which q has received some message
since q’s last checkpoint
[Figure: a message from p received by q between q's last checkpoint and q's current checkpoint]
The algorithm
Phase 1
Initiator processor q:
1. Take tentative checkpoint;
2. for every processor p in q.checkpoint_cohort do
send (Request_tentative_chkp; q.last_rec[p]) to p;
Phase 1, non-initiator processor p:
On receiving (Request_tentative_chkp; q.last_rec[p]) from q
if (ready to perform tentative checkpoint) and (p.first_sent[q] ≤ q.last_rec[p]) then
take tentative checkpoint;
for every processor r in p.checkpoint_cohort do
send (Request_tentative_chkp; p.last_rec[r]) to r;
p.replies := empty;
for every processor r in p.checkpoint_cohort do
wait until r sends “OK” or “KO”, timeout = T;
on “OK” : add r to p.replies; /* set of replies */
if p.replies ≠ p.checkpoint_cohort then
send “KO” to q
else
send “OK” to q
Phase 2
Initiator processor q:
1. q.replies := empty;
2. for every processor p in q.checkpoint_cohort do
wait until p sends “OK” or “KO” , Timeout=T;
on “OK” : add p to q.replies; /* set of replies */
if q.replies ≠ q.checkpoint_cohort then
undo tentative;
send “undo tentative checkpoint” to
every processor in q.checkpoint_cohort
else
permanent := tentative;
send “make tentative checkpoint permanent” to
every processor in q.checkpoint_cohort
Non-initiator processor p:
wait until q sends “undo …” or “make … permanent”; timeout = T
on “undo …” do
undo tentative checkpoint
end
on “make … permanent” do
checkpoint := tentative_checkpoint
end
if no timeout then
m := message received;
for every processor r in p.checkpoint_cohort do
send m to r;
Handling failures
idea:
Failures are detected by timeouts;
On recovery,
if the recovering processor was the initiator,
it undoes its tentative checkpoint and sends this decision to
the other processors
else
the recovered processor consults the initiator or some other
processor to find the final decision
Logging
Idea:
Processors record incoming messages
Purpose:
avoid the need for “resending”
reduce the amount of rollback (idea of a virtual checkpoint)
[Figure labels: log messages, virtual checkpoint]
+ : flexibility
- : expensive
Synchronous Logging
Idea
Each message must be logged before it can be delivered
During recovery,
logged messages are replayed until the recovering processor
is up to date
(guarantee of replay after all sends that can cause subsequent
rollback)
Problem :
expensive
Asynchronous Logging
Idea
Each message must be logged, but not necessarily before it is
delivered
Messages can first be saved in main memory
Idle periods are exploited to log messages
Several messages can be packed together and then logged
at once (efficient use of I/O devices)
Problem: some messages may be lost (a failure can occur before they are logged)
⇒ replay is not always possible
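The trade-off can be seen in a toy model. In this sketch a Python list stands in for stable storage and another for the in-memory buffer; the `AsyncLogger` class and its `crash` method are hypothetical and serve only to show which messages survive a failure.

```python
class AsyncLogger:
    def __init__(self):
        self.stable_log = []   # stands in for stable storage
        self.buffer = []       # volatile main memory

    def deliver(self, msg):
        # The message is delivered immediately and logged later.
        self.buffer.append(msg)
        return msg

    def flush(self):
        # Called during an idle period: one batched write for several messages.
        self.stable_log.extend(self.buffer)
        self.buffer.clear()

    def crash(self):
        # A failure wipes main memory; return the messages that cannot be replayed.
        lost = list(self.buffer)
        self.buffer.clear()
        return lost
```

Only the messages in `stable_log` can be replayed after the crash; anything delivered since the last flush is gone, which is the price paid for taking the log write off the delivery path.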