Causal logging: Manetho


Causal Logging : Manetho
Rohit C Fernandes
10/25/01
Manetho System Model

Nondeterministic events
Message Receive
Internal event (kernel call)
Creation of a new process
Output Commit
Stable Storage + Volatile Memory
Manetho properties



Tolerates any number of simultaneous failures
Low failure-free overhead
Only failed processes roll back
Example Manetho Execution
Causal Logging : Intuition



Piggyback the determinant of each nondeterministic event on outgoing messages
Determinant?
Piggyback Antecedence Graphs
Antecedence Graph



Directed acyclic graph
Nodes: state intervals
Edges: immediate happened-before relations
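
A minimal sketch of this structure, assuming each state interval is named by a hypothetical (process id, index) pair and the graph is stored as a map from a node to its immediate predecessors. The class and method names are illustrative and are not taken from Manetho itself.

```python
from collections import defaultdict


class AntecedenceGraph:
    """DAG: nodes are state intervals, edges are immediate
    happened-before relations (predecessor -> successor)."""

    def __init__(self):
        # Maps each node to the set of its immediate predecessors.
        self.preds = defaultdict(set)

    def add_edge(self, before, after):
        self.preds[before]            # touching the key registers the node
        self.preds[after].add(before)

    def ancestors(self, node):
        """Every state interval that happened before `node`."""
        seen, stack = set(), list(self.preds.get(node, ()))
        while stack:
            cur = stack.pop()
            if cur not in seen:
                seen.add(cur)
                stack.extend(self.preds.get(cur, ()))
        return seen


ag = AntecedenceGraph()
ag.add_edge(("q", 2), ("q", 3))
ag.add_edge(("p", 0), ("p", 1))   # p's previous state interval
ag.add_edge(("q", 3), ("p", 1))   # interval in which q sent the message
```

Here `ag.ancestors(("p", 1))` walks back through both p's own history and the history of the sender q.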
Antecedence Graph
Receive Node


Two incoming edges
Fields




Receiver ID
Sender ID
Index of created state interval
Unique identifier of message
Internal Event Node


One incoming edge
Fields


Type of event
Replay information
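
The two node types could be represented roughly as follows. The field names mirror the slide; the concrete Python types and the helper `incoming_edges` are assumptions made for illustration. The reading that a receive node's two edges come from the receiver's previous interval and the sender's interval is also an assumption about what the slide's figure showed.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class ReceiveNode:
    receiver_id: str
    sender_id: str
    interval_index: int   # index of the state interval the receive creates
    message_id: int       # unique identifier of the message


@dataclass(frozen=True)
class InternalEventNode:
    event_type: str       # type of event (encoding is hypothetical)
    replay_info: bytes    # whatever is needed to replay the event


Node = Union[ReceiveNode, InternalEventNode]


def incoming_edges(node: Node) -> int:
    """Receive nodes have two incoming edges, internal event nodes one."""
    return 2 if isinstance(node, ReceiveNode) else 1


recv = ReceiveNode(receiver_id="p", sender_id="q", interval_index=4, message_id=17)
internal = InternalEventNode(event_type="kernel_call", replay_info=b"\x2a")
```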
Failure Free Operation

Each process maintains



AG of its current state interval
Log containing the data and ID of each message sent
Message send: piggyback the AG of the current state interval
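
A rough sketch of failure-free operation without any optimization: every send logs the message and piggybacks the whole AG, and every receive merges the piggybacked AG and starts a new state interval. The `Process` class, the dict-based message layout, and the (process id, index) node naming are all hypothetical.

```python
class Process:
    def __init__(self, pid):
        self.pid = pid
        self.index = 0                    # current state interval index
        self.ag = {(pid, 0): set()}       # node -> immediate predecessors
        self.send_log = {}                # message ID -> data (volatile)
        self._next_id = 0

    def send(self, data):
        msg_id = (self.pid, self._next_id)
        self._next_id += 1
        self.send_log[msg_id] = data
        # Piggyback a copy of the whole AG of the current state interval.
        ag_copy = {n: set(p) for n, p in self.ag.items()}
        return {"id": msg_id, "data": data,
                "sent_in": (self.pid, self.index), "ag": ag_copy}

    def receive(self, msg):
        # Merge the piggybacked AG into the local one.
        for node, preds in msg["ag"].items():
            self.ag.setdefault(node, set()).update(preds)
        # The receive starts a new state interval with two incoming edges.
        prev = (self.pid, self.index)
        self.index += 1
        self.ag[(self.pid, self.index)] = {prev, msg["sent_in"]}
        return msg["data"]


p, q, r = Process("p"), Process("q"), Process("r")
p.receive(q.send("hello"))
r.receive(p.send("world"))
```

After these two messages, r's AG contains q's interval even though q never talked to r directly, which is the point of causal logging.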
Optimization





Need not send the complete AG
Incremental piggybacking
AG(i_p) is a proper subgraph of AG((i+1)_p), where i_p is state interval i of process p
Process q communicates to p the max j such that j_p is in q's AG
p sends AG(i_p) - AG(j_p)
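
The difference AG(i_p) - AG(j_p) can be sketched as a set difference over reachable nodes. This assumes the AG is a dict from node to immediate predecessors and that nodes are (process id, index) pairs; `closure` and `increment` are hypothetical helper names.

```python
def closure(preds, node):
    """`node` plus every node reachable through predecessor edges."""
    seen, stack = set(), [node]
    while stack:
        cur = stack.pop()
        if cur not in seen:
            seen.add(cur)
            stack.extend(preds.get(cur, ()))
    return seen


def increment(preds, p, i, j=None):
    """Part of AG(i_p) the receiver lacks, given that it already holds
    AG(j_p); j=None means the receiver knows nothing about p."""
    known = closure(preds, (p, j)) if j is not None else set()
    return {n: set(preds.get(n, ())) for n in closure(preds, (p, i)) - known}


# p's AG: interval 1 was created by a receive from q, interval 2 by one from r.
preds = {
    ("p", 0): set(), ("q", 0): set(), ("r", 0): set(),
    ("p", 1): {("p", 0), ("q", 0)},
    ("p", 2): {("p", 1), ("r", 0)},
}
delta = increment(preds, "p", 2, j=1)   # q reported j = 1
```

Only the nodes added since interval 1 of p are piggybacked; the edges of those nodes still name their predecessors, so the receiver can attach them to what it already holds.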
Information on Stable Storage



Checkpoints
AG (saved asynchronously): need not piggyback the part of the AG that is already on disk
Output commit: save the AG to disk
Incarnation Numbers




Each process starts a new incarnation after recovery
Incarnation number: integer kept in stable storage
Tagged on outgoing messages
Messages from old incarnations are discarded
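
A small sketch of the incarnation check, assuming a dict stands in for stable storage and each receiver keeps the highest incarnation number it has learned per process. `IncarnationTable` and `start_new_incarnation` are hypothetical names.

```python
class IncarnationTable:
    def __init__(self):
        self.incvec = {}   # process id -> highest incarnation seen

    def note(self, pid, incnum):
        self.incvec[pid] = max(incnum, self.incvec.get(pid, 0))

    def accept(self, sender, incnum):
        """Return False for messages tagged with an old incarnation."""
        return incnum >= self.incvec.get(sender, 0)


def start_new_incarnation(stable):
    """`stable` is a dict standing in for stable storage."""
    stable["INCNUM"] = stable.get("INCNUM", 0) + 1
    return stable["INCNUM"]


stable_p = {"INCNUM": 1}
table_q = IncarnationTable()
table_q.note("p", start_new_incarnation(stable_p))   # p recovers
```

Once q has learned p's new incarnation number, a late message that p sent before failing no longer passes `table_q.accept`.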
Recovery Protocol


Recover(p,c,INCNUM,S)
Step 1



INCNUM ← INCNUM + 1; save INCNUM
INCVEC[p] ← INCNUM
G ← AG(p_c) // from stable storage
Recovery Protocol

Step 2

For all q ∈ S, q ≠ p
  (INQ, AGQ) ← remote call at q: GET_AG(p)
  G ← G ∪ AGQ
  INCVEC[q] ← INQ
For all q ∈ S, q ≠ p
  Remote call at q: SEND_INC(p, INCVEC)
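
Steps 1 and 2 can be sketched as below. This is an illustration under assumptions: stable storage is a dict, remote calls are plain method calls on a hypothetical `Peer` stub, and GET_AG is taken to return q's incarnation number together with its AG. None of these details come from the slides.

```python
class Peer:
    """Hypothetical stand-in for a live process q reached by remote call."""

    def __init__(self, incnum, ag):
        self.incnum, self.ag, self.incvec = incnum, ag, {}

    def get_ag(self, p):
        # Assumption: q returns its incarnation number and its AG.
        return self.incnum, self.ag

    def send_inc(self, p, incvec):
        self.incvec.update(incvec)


def recover_steps_1_2(p, c, incnum, S, stable, peers):
    # Step 1
    incnum += 1
    stable["INCNUM"] = incnum                             # save INCNUM
    incvec = {p: incnum}
    G = {n: set(e) for n, e in stable["AG"][c].items()}   # AG(p_c)
    # Step 2
    for q in S:
        if q != p:
            inq, agq = peers[q].get_ag(p)
            for node, preds in agq.items():               # G <- G U AGQ
                G.setdefault(node, set()).update(preds)
            incvec[q] = inq
    for q in S:
        if q != p:
            peers[q].send_inc(p, incvec)
    return incnum, incvec, G


stable = {"INCNUM": 1, "AG": {0: {("p", 0): set()}}}
agq = {("p", 0): set(), ("p", 1): {("p", 0)},
       ("q", 0): set(), ("q", 1): {("q", 0), ("p", 1)}}
peers = {"q": Peer(1, agq)}
incnum, incvec, G = recover_steps_1_2("p", 0, 1, ["p", "q"], stable, peers)
```

In this run p's checkpoint only covers interval 0, but q's AG still holds the node for p's interval 1, so G ends up containing it.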
Recovery Protocol

Step 3


m ← max j such that p_j ∈ G
Recover up to p_m
Don't send out application messages, but log them
For a receive, request the message from the sender's log
For an internal event, replay it
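
Step 3 might look like the following sketch. It assumes G is a set of (process id, index) nodes, that a determinant is available for every interval of p between the checkpoint and p_m, and that the application is driven through a hypothetical `step` callback that returns the messages it would send. The dict layout of determinants is invented for the example.

```python
def recover_to_latest(p, c, G, determinants, sender_logs, step):
    """Replay p from checkpoint index c up to p_m.

    G            : set of AG nodes, each a (process, index) pair
    determinants : node -> determinant (hypothetical dict layout)
    sender_logs  : sender id -> {message id: data}
    step         : application handler, returns [(msg_id, data), ...]
    """
    m = max(j for (proc, j) in G if proc == p)
    send_log = {}
    for j in range(c + 1, m + 1):
        det = determinants[(p, j)]
        if det["kind"] == "receive":
            # Request the message from the sender's log.
            event = ("receive", sender_logs[det["sender"]][det["msg_id"]])
        else:
            event = ("internal", det["replay_info"])
        # Messages produced during replay are logged, not sent.
        for msg_id, data in step(event):
            send_log[msg_id] = data
    return m, send_log


G = {("p", 0), ("p", 1), ("p", 2), ("q", 0)}
determinants = {
    ("p", 1): {"kind": "receive", "sender": "q", "msg_id": ("q", 0)},
    ("p", 2): {"kind": "internal", "replay_info": "time=42"},
}
sender_logs = {"q": {("q", 0): "hello"}}
seen = []


def step(event):
    seen.append(event)
    # The toy application acknowledges every receive.
    return [(("p", len(seen)), "ack")] if event[0] == "receive" else []


m, send_log = recover_to_latest("p", 0, G, determinants, sender_logs, step)
```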
Recovery Example
Available Antecedence Graphs
Application Characteristics
Performance Overhead
Coordinated vs. Uncoordinated