Transcript Document

Fast Checkpoint Recovery for
Applications with Frequent Points of
Consistency
Marcos Vaz Salles
Cornell University
(joint work with Tuan Cao, Benjamin Sowell, Yao
Yue, Alan Demers, Johannes Gehrke, and Walker
White)
SIGMOD 2011
Athens, Greece
Data in Main Memory
Large main memories have become affordable
• Over 50 GB of main memory per server no
longer uncommon
– At Amazon EC2, server with 68 GB of main
memory: ~$0.68 / hour
• Many apps now hold all data in-memory
Examples of Applications
Completely Run in Main Memory
• MMOs
• Virtual Worlds
• In-Memory Data Warehouses
• In-Memory Search Engines
• Behavioral Simulations
• In-Memory OLTP Systems
Examples of Applications
Completely Run in Main Memory
• What they gain
+ Low latency
+ High performance
• The challenge they face
– Durability
Observation: Frequent Points of Consistency
• State often consistent in main memory
– Every few seconds or less
• Point of consistency: Moment of execution in which the state, if captured, can be used to restart the application in case of failure
• Examples
– At the end of every time step in a game
– At the end of every batch of user operations in an OLTP system such as H-Store
Research Challenge: Can we leverage points of consistency to achieve durability with minimal overhead and latency?
Talk Outline
• Motivation
• Approaches to Durability
• New Checkpointing Algorithms
– Wait-Free Zigzag
– Wait-Free Ping-Pong
• Experiments
• Conclusions
Classic Approaches to Durability
• General Approaches
– Checkpointing (+ logging)
– Hot stand-bys / replication
• Hot stand-bys / In-memory replication
– Faster recovery
– Higher resource utilization
– Problems with correlated failures (e.g. power loss)
• Checkpointing
– Uses fewer resources, can share warm stand-bys
– Longer recovery time
How to Design a Checkpointing
Mechanism for FC Applications?
• Approach: Take Consistent
Checkpoints frequently
– Leverage frequent points of consistency
– Bound recovery time by checkpointing
often
• Small overhead and low latency
critical
– Keep benefits of main memory
– Allow for low-overhead logical logging
Foundations: Consistent Main-Memory Database Checkpointing
• Multiple methods studied in the past
for main memory databases
• Benchmark of main memory
checkpointing for MMOs [VLDB
2009]
Anatomy of Checkpointing
Algorithms
• Two main threads
– Mutator: applies updates to
application state
– Asynchronous Writer: writes
checkpoints of state to disk
• Checkpoint period: Time between
start and end of a checkpoint
• A third thread, the Logger,
overlaps logical log writing with
Mutator processing
[Figure: timeline showing points of consistency, checkpoint preparation (Prep Ckpt), and the checkpoint period]
Existing Consistent Checkpointing
Algorithms
• Naive Snapshot (NS)
– Pauses application to make in-memory copy
– Good for very high update rates
• Copy on Update (COU)
– Uses copy-on-write snapshots
– Introduces locking overhead
– Good for low to moderate update rates (see the sketch below)
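For concreteness, below is a minimal C++ sketch of Copy on Update; the struct and method names (CopyOnUpdate, snapshotWord, endCheckpointPeriod) and the single coarse mutex are illustrative assumptions, not the implementation evaluated in the paper. Naive Snapshot, by contrast, simply pauses the mutator and copies the entire state before the Asynchronous Writer starts.

#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <vector>

struct CopyOnUpdate {
    std::vector<int64_t> as;      // live application state
    std::vector<int64_t> shadow;  // old values preserved on first update in a period
    std::vector<uint8_t> dirty;   // 1 if the word was updated during this checkpoint period
    std::mutex lock;              // source of the locking overhead noted above

    explicit CopyOnUpdate(size_t nWords)
        : as(nWords, 0), shadow(nWords, 0), dirty(nWords, 0) {}

    // Mutator: before the first overwrite of a word in a period, save its consistent value.
    void write(size_t i, int64_t v) {
        std::lock_guard<std::mutex> g(lock);
        if (!dirty[i]) { shadow[i] = as[i]; dirty[i] = 1; }
        as[i] = v;
    }

    // Asynchronous Writer: the checkpoint value of word i is the shadow copy if dirty,
    // otherwise the live word.
    int64_t snapshotWord(size_t i) {
        std::lock_guard<std::mutex> g(lock);
        return dirty[i] ? shadow[i] : as[i];
    }

    // End of checkpoint period: once the writer has finished, reset the dirty bits.
    void endCheckpointPeriod() {
        std::lock_guard<std::mutex> g(lock);
        std::fill(dirty.begin(), dirty.end(), 0);
    }
};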
Naive Snapshot
Beginning of checkpoint period
[Figure: application state AS copied in bulk to AScopy]
Naive Snapshot: Latency
Expected Latency Profile
Copy On Update
Beginning of checkpoint period
[Figure: application state AS, shadow copy ASshadow, and dirty bit array]
Copy On Update
Updates during first checkpoint period
[Figure: updates to values 42 and 43; their old values are preserved in ASshadow and the dirty bits are set]
Copy On Update
End of checkpoint period
[Figure: AS, ASshadow, and dirty bits at the end of the checkpoint period]
Copy On Update: Latency
Expected Latency Profile
Foundations: Consistent Main-Memory Database Checkpointing
• Multiple methods studied in the past for main memory databases
• Pros
+ Conceptual simplicity
+ Cheaper than hot stand-bys
• Cons
– Latency / synchronization
– Locking (COU)
• In the following, we will
– Abstract their anatomy
– Review the best methods for FC applications
– Build on this literature and propose new methods
Talk Outline
• Motivation
• Approaches to Durability
• New Checkpointing Algorithms
– Wait-Free Zigzag
– Wait-Free Ping-Pong
• Experiments
• Conclusions
New Algorithms
• Two new algorithms: Wait-Free Zigzag (ZZ)
and Wait-Free Ping-Pong (PP)
• Designed to eliminate latency and locking
– Track updates at word granularity
• No bulk copies
– Wait-Free within checkpoint period
• No locking
– Make bulk bit manipulation asynchronous (PP)
• Reduced synchronous overhead
Wait-Free Zigzag
Initialization
[Figure: two state copies AS0 and AS1 with per-word bit arrays MR and MW]
Wait-Free Zigzag
Updates during a checkpoint period
[Figure: updates to values 42 and 43 written to the copy selected by MW, with MR bits updated]
Wait-Free Zigzag
End of checkpoint period
[Figure: AS0, AS1, MR, and MW at the end of the checkpoint period]
Wait-Free Zigzag: Latency
Implementation Goal: Minimize the spike
Expected Latency Profile
• Implementation: Optimized bulk bit array
negation
Wait-Free Ping-Pong
Initialization
[Figure: application state AS with Odd (BO) and Even (BE) update buffers; current points to the Odd buffer]
Wait-Free Ping-Pong
Updates during first checkpoint period
[Figure: updates to values 42, 43, and 44 written both to AS and to the current (Odd) buffer]
Wait-Free Ping-Pong
End of checkpoint period
[Figure: at the end of the period, the updated values 42 and 44 are present in both AS and the Odd buffer]
Wait-Free Ping-Pong
New updates in next checkpoint period
[Figure: new updates 45 and 46 written to AS and to the Even buffer, which is now current]
Wait-Free Ping-Pong: Latency
Implementation Goal: Get the line as low as possible
Expected Latency Profile
• Implementation: Cache-aware interleaving of
application state
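One plausible way to realize this interleaving, sketched below with illustrative names: keep each state word adjacent to its two buffer slots and their bits, so that a write touches a single cache line instead of four separate arrays. This is a sketch of the idea, not the exact layout used in the paper.

#include <cstddef>
#include <cstdint>
#include <vector>

// All data the Mutator touches for one word sits together in memory.
struct InterleavedWord {
    int64_t state;   // live value read by the application
    int64_t buf[2];  // odd/even update buffer slots for this word
    uint8_t bit[2];  // per-buffer "updated this period" flags
};

// current selects which buffer a write goes to in this checkpoint period (0 or 1).
inline void handleWrite(std::vector<InterleavedWord>& words, int current,
                        size_t i, int64_t v) {
    InterleavedWord& w = words[i];
    w.state = v;
    w.buf[current] = v;
    w.bit[current] = 1;
}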
What about Conceptual Simplicity?
• Flat latency is great, but we want conceptual
simplicity too
– While allowing for implementation trickery 
• See the pseudocode for Ping-Pong:
Mutator::HandleWrite(index, newValue) {
  // write value twice
  state[index] = newValue;
  current[index] = newValue;
  currentBit[index] = 1;
}

Mutator::PrepareNextCkpt() {
  // swap pointers
  swap(previous, current);
  swap(previousBit, currentBit);
}
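Putting the two routines above together with the Asynchronous Writer side, here is a self-contained C++ sketch; the names (PingPong, writeCheckpoint, writeWordToDisk) are illustrative assumptions rather than the paper's code. The key point is that the writer, not the Mutator, clears the bit array of the previous buffer.

#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

struct PingPong {
    std::vector<int64_t> state;            // live application state (AS)
    std::vector<int64_t> odd, even;        // the two update buffers (BO, BE)
    std::vector<uint8_t> oddBit, evenBit;  // per-word "updated this period" bits
    std::vector<int64_t> *current, *previous;
    std::vector<uint8_t> *currentBit, *previousBit;

    explicit PingPong(size_t n)
        : state(n, 0), odd(n, 0), even(n, 0), oddBit(n, 0), evenBit(n, 0),
          current(&odd), previous(&even), currentBit(&oddBit), previousBit(&evenBit) {}

    // Mutator: every update is written twice, to the state and to the current buffer.
    void handleWrite(size_t i, int64_t v) {
        state[i] = v;
        (*current)[i] = v;
        (*currentBit)[i] = 1;
    }

    // At a point of consistency: swap the buffer pointers (constant time, no copying).
    void prepareNextCkpt() {
        std::swap(previous, current);
        std::swap(previousBit, currentBit);
    }

    // Asynchronous Writer: merge updated words into the on-disk checkpoint and
    // clear the bits so the buffer is ready to become current again.
    template <typename WriteWordFn>
    void writeCheckpoint(WriteWordFn writeWordToDisk) {
        for (size_t i = 0; i < state.size(); ++i) {
            if ((*previousBit)[i]) {
                writeWordToDisk(i, (*previous)[i]);
                (*previousBit)[i] = 0;
            }
        }
    }
};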
Talk Outline
• Motivation
• Approaches to Durability
• New Checkpointing Algorithms
– Wait-Free Zigzag
– Wait-Free Ping-Pong
• Experiments
• Conclusions
Experimental Setup
• Several workloads: Synthetic and TPC-C
application
• Evaluated
– Overhead
– Latency Distribution over Time
• All methods write checkpoints of the whole
state
Tuning the algorithms
• Optimization and profiling with VTune
– Details in the paper
• Main memory optimizations
– Memory alignment → NS
– Bit array packing for negation → BACOU, BAZZ
– State interleaving for cache performance → IPP
• Disk writing optimizations
– Keep extra copy at Async Writer → IPP-Copy
– Merge with previous checkpoint → IPP-Merge
Experimental Setup for Synthetic
Workloads
• Zipfian and MMO synthetic workloads
– Will show Zipfian only
• Turned off Asynchronous Writer and Logger
• Checkpoint interval fixed
• Varied update rates and state size
• Hardware: Intel Xeon 5500 2.66 GHz with 12
GB RAM and 4 cores running CentOS
Overhead vs. Update Rate: Zipfian
Workload, 200MB application state
• Naive Snapshot (NS) insensitive to update rate
• Ping-Pong (IPP) has lower overhead for a wide range of update rates
Latency Distribution: Zipfian
Workload, 1.28M updates/sec
• Detailed view of latency over time for 1.28M updates/sec
• Ping-Pong (IPP) has flat and very low latency
Overhead vs. Application State Size:
Zipfian Workload
0.08% updates/sec
2.56% updates/sec
• Similar trends: Ping-Pong (IPP) and Naive Snapshot (NS) are the best methods; the difference depends on how much of the state is updated per second
TPC-C Application Experimental
Setup
• Took the methods that performed best at high update rates
– Naive Snapshot & Ping-Pong
• Implemented Optimized ARIES variant
– Fuzzy checkpointing + physical redo logging
• Turned on Asynchronous Writer and Logger
– Checkpoint interval proportional to application
state size (# of warehouses)
• Hardware: Amazon Cluster Compute
Quadruple Extra Large instance + RAIDs of
EBS volumes
Application Throughput: Single-Threaded, Main-Memory TPC-C
• Checkpoints taken as frequently as possible, Amazon EC2
• Ping-Pong (IPP-Copy) exhibits less than half the overhead of
Optimized ARIES
Application Latency: Single-Threaded, Main-Memory TPC-C
• Naive Snapshot still exhibits the largest latency spikes
• Logging introduces additional spikes, worst with Optimized
ARIES
BRRL: Big Red Recovery Library
• Application-level library for fast checkpointing
of FC applications
• Schema-aware API to the programmer
Come see our demo!
Summary of Checkpoint-Recovery
for FC Applications
• Revisited main-memory checkpointing
– Memory large and abundant
– FC applications completely in-memory
• New consistent checkpointing algorithms:
Wait-Free Zigzag and Wait-Free Ping-Pong
– Allow for, but do not require, use of logical logging
– Main Results
• Nearly constant latency → ZZ and PP
• Up to order-of-magnitude lower overhead than existing methods → PP
Thank you!
Backup Slides
Wait-Free Zigzag: What have we
achieved?
• No bulk copies of application state
– State distributed over two copies, word-level
granularity
• No locking
– One copy is never touched during checkpoint
period
• Bulk bit manipulation
– Still necessary: negation of MR into MW (see the sketch below)
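A minimal C++ sketch of these mechanics, with illustrative names (Zigzag, prepareNextCheckpoint, snapshotWord); the real implementation performs the MR-to-MW negation as an optimized bulk bit-array operation rather than a word-by-word loop.

#include <cstddef>
#include <cstdint>
#include <vector>

struct Zigzag {
    std::vector<int64_t> as[2];   // two full copies of the application state (AS0, AS1)
    std::vector<uint8_t> mr, mw;  // per-word bits: which copy to read from / write to

    explicit Zigzag(size_t nWords) : mr(nWords, 0), mw(nWords, 1) {
        as[0].assign(nWords, 0);
        as[1].assign(nWords, 0);
    }

    // Mutator: reads and writes never block on the Asynchronous Writer.
    int64_t read(size_t i) const { return as[mr[i]][i]; }
    void write(size_t i, int64_t v) {
        as[mw[i]][i] = v;  // write only to the copy not being checkpointed
        mr[i] = mw[i];     // subsequent reads see the new value
    }

    // Asynchronous Writer: word i of the consistent snapshot lives in the copy
    // the mutator is currently not writing to.
    int64_t snapshotWord(size_t i) const { return as[1 - mw[i]][i]; }

    // At a point of consistency, after the writer has finished, flip the write
    // targets in bulk: this is the synchronous negation of MR into MW.
    void prepareNextCheckpoint() {
        for (size_t i = 0; i < mw.size(); ++i) mw[i] = 1 - mr[i];
    }
};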
Wait-Free Ping-Pong: What have we
achieved?
• No bulk copies of application state
– Write all updates twice, word-level granularity
• No locking
– Mutator itself writes all new values to buffers
• No (synchronous) bulk bit manipulation
– Clearing of buffer bit arrays can be done by
Asynchronous Writer
Wait-Free Zigzag
New updates in next checkpoint period
[Figure: AS0, AS1, MR, and MW after new updates 45 and 46 in the next checkpoint period]