Recovery in Main Memory Databases
- Le Gruenwald, Jing Huang, Margaret H. Dunham, et al., Engineering Intelligent Systems, Vol. 4, No. 3, September 1996. 이 인선, 97/08/21
Introduction
General MMDB Architecture
– Main Memory (MM): the database resides in RAM
– Stable Memory (SM): optional nonvolatile memory used to hold the log buffers (log tail)
  avoids I/O when transactions commit
  essential to performance
– Archive Memory (AM): holds a backup of the entire database
Focus on logging, checkpointing, and reloading
MMDB Logging(1)
– physical logging
  the database state modified by an operation is logged
  recommended for MMDB systems
– logical logging
  the log contains descriptions of higher-level operations and records the state transitions of the database
  the idempotent property does not hold
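A minimal sketch of why idempotence matters here, assuming a toy dictionary as the database and a hypothetical increment as the logical operation. Replaying a physical record twice leaves the same state; replaying a logical record twice does not.

```python
# Hypothetical in-memory "database": item name -> value.
def redo_physical(db, record):
    """Physical redo: install the logged after-image."""
    item, after_image = record
    db[item] = after_image

def redo_logical(db, record):
    """Logical redo: re-execute the logged operation (here, an increment)."""
    item, delta = record
    db[item] += delta

physical_db = {"x": 10}
logical_db = {"x": 10}

# The same update (x: 10 -> 15) is logged both ways and replayed twice,
# as could happen if recovery cannot tell whether the update already
# reached the backup copy.
for _ in range(2):
    redo_physical(physical_db, ("x", 15))
    redo_logical(logical_db, ("x", 5))

print(physical_db["x"])  # 15
print(logical_db["x"])   # 20
```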
MMDB Logging(2)
Logging rules
– Write-Ahead Logging (WAL) rule
  undo-log data must be written to nonvolatile memory before the database is updated
– Commit rule
  before a DBMS allows a transaction to commit, its redo-log data must be in nonvolatile storage
– Logging After Writing (LAW)
  the after-image of an updated item is written to the log after the corresponding update is propagated to the database
  simplifies log processing in an MMDB with fuzzy checkpointing
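The ordering imposed by the three rules can be sketched as follows. This is an illustration only: `StableLog` and `MiniMMDB` are hypothetical names, and a plain Python list stands in for the nonvolatile log buffer.

```python
class StableLog:
    """Stand-in for a nonvolatile log buffer (assumption: a plain list)."""
    def __init__(self):
        self.records = []

    def append(self, record):
        self.records.append(record)


class MiniMMDB:
    def __init__(self):
        self.data = {}
        self.log = StableLog()
        self.committed = set()

    def update(self, txn, item, new_value):
        # Write-ahead rule: the undo record (before-image) reaches the
        # nonvolatile log before the database itself is changed.
        self.log.append(("undo", txn, item, self.data.get(item)))
        self.data[item] = new_value
        # LAW: the after-image is logged only after the update is applied.
        self.log.append(("redo", txn, item, new_value))

    def commit(self, txn):
        # Commit rule: the redo records of txn are already in the stable
        # log, so the commit can be recorded and acknowledged.
        self.log.append(("commit", txn))
        self.committed.add(txn)


db = MiniMMDB()
db.update("T1", "x", 42)
db.commit("T1")
kinds = [record[0] for record in db.log.records]
print(kinds)  # ['undo', 'redo', 'commit']
```

Because the log buffer is assumed to be nonvolatile, none of these appends requires disk I/O before the commit is acknowledged.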
MMDB Logging(3)
MMDB logging differs from DRDB logging in three ways
– a nonvolatile log buffer should be used to satisfy WAL without requiring I/O prior to transaction commit
– physical logging is recommended, as it is easier to use with fuzzy checkpointing
– to reduce the amount of log needed to redo transactions after a system failure, the LAW policy should be followed
Checkpointing DRDB
Commit-consistent checkpointing
– periodically stop processing transactions
– flush all dirty cache slots and mark the log
Cache-consistent checkpointing
Fuzzy checkpointing
– flushes only those dirty slots that have not been flushed since before the previous checkpoint
– normal replacement activity will flush most cache slots that were dirty since before the previous checkpoint
– the checkpoint won’t have much flushing to do and won’t delay active transactions for very long
Checkpointing MMDBs(1)
Focuses on low interference with normal transactions and on supporting efficient recovery
Fuzzy checkpointing
– Hagmann first suggested using fuzzy checkpointing for MMDBs: “A crash recovery scheme for a memory-resident database system”, IEEE Transactions on Computers, Vol. C-35, No. 9, September 1986
  the checkpointer does not need to obtain locks on the data items to be checkpointed
  the database is dumped in sections
  after dumping a section, the checkpointer writes a log record to the log
  a section must not overwrite its previous image (sliding monoplexed backups)
  LAW with fuzzy checkpointing
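The section-by-section dump can be sketched roughly as below. This is not Hagmann's implementation: the function name, the dictionary-based backup, and the log record layout are all assumptions made for illustration.

```python
def fuzzy_checkpoint(memory, sections, backup, log, checkpoint_no):
    """Dump the database one section at a time without taking locks.

    memory:   dict item -> value (the live main-memory database)
    sections: list of lists of item names (hypothetical partitioning)
    backup:   dict (checkpoint_no, section_index) -> copied section
    log:      list used as the log
    """
    for index, items in enumerate(sections):
        # No locks are taken: in a real system concurrent transactions may
        # change items mid-copy, so the image can be inconsistent ("fuzzy").
        image = {item: memory[item] for item in items}
        # Write to a new location so the previous image is not overwritten.
        backup[(checkpoint_no, index)] = image
        # After dumping a section, record that fact in the log.
        log.append(("section_dumped", checkpoint_no, index))


memory = {"a": 1, "b": 2, "c": 3, "d": 4}
sections = [["a", "b"], ["c", "d"]]
backup, log = {}, []
fuzzy_checkpoint(memory, sections, backup, log, checkpoint_no=1)
memory["a"] = 100
fuzzy_checkpoint(memory, sections, backup, log, checkpoint_no=2)
print(backup[(1, 0)])  # {'a': 1, 'b': 2}
print(backup[(2, 0)])  # {'a': 100, 'b': 2}
```

Keying the backup by checkpoint number is one simple way to keep a section from overwriting its previous image; a real system would alternate or slide disk locations instead.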
Checkpointing MMDBs(2)
– Salem and Garcia-Molina, “Checkpointing memory-resident databases” (’89)
  compared the fuzzy checkpointing scheme with two non-fuzzy checkpointing schemes
  fuzzy checkpointing is the most efficient one
  ping-pong scheme: each dirty page is flushed twice
– Lin and Dunham, “Segmented fuzzy checkpointing for main memory databases” (’94)
  checkpoints one segment at a time in a round-robin fashion
  automatically changes the segment boundaries based on the distribution of update operations
Checkpointing MMDBs(3)
[Figure: redo log size in segmented fuzzy checkpointing]
– Li et al., “Checkpointing and recovery in partitioned main memory databases” (’95)
  the database is divided into partitions, each of which has its own log disks
  the time to recover from a system failure is reduced
Checkpointing MMDBs(4)
Non-fuzzy checkpointing
– overhead comes from locking the checkpointed objects to ensure transaction consistency or action consistency
– Lehman and Carey, “A recovery algorithm for a high-performance memory-resident database system” (’87)
  transaction-consistent scheme (at relation level)
  no need to maintain undo-log records in nonvolatile storage
  checkpointing increases data contention with normal transactions
Checkpointing MMDBs(5)
– Salem and Garcia-Molina, “Checkpointing memory-resident databases” (’89), discuss two non-fuzzy checkpointing approaches
  the first (black-and-white) aborts some update transactions
  the second (copy-on-update) requires some update transactions to store the original values of the data items to be updated
  both have a severe impact on system performance
– Jagadish et al., “Recovering from main-memory lapses” (’93), propose an action-consistent checkpointing scheme
  the undo logs of active transactions are first written to the log, and then dirty pages are flushed to disk
  during normal processing, the redo logs of committed transactions are written to the log
  ping-pong update
  this approach was originally used in Dali
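The copy-on-update idea can be illustrated with a single-threaded toy. It is a sketch under stated assumptions, not the scheme as published: the class and method names are hypothetical, items are assumed to exist before they are updated, and real concurrency control is omitted.

```python
class CopyOnUpdateDB:
    def __init__(self, data):
        self.data = dict(data)
        self.checkpoint_active = False
        self.old_values = {}   # original values saved by updaters
        self.dumped = set()    # items the checkpointer has already written

    def begin_checkpoint(self):
        self.checkpoint_active = True
        self.old_values.clear()
        self.dumped.clear()

    def update(self, item, value):
        # While a checkpoint is running, the first update to an item that
        # has not been dumped yet must save the item's original value.
        if (self.checkpoint_active and item not in self.dumped
                and item not in self.old_values):
            self.old_values[item] = self.data[item]
        self.data[item] = value

    def dump_item(self, item, backup):
        # The checkpointer prefers the saved original, so the backup
        # reflects the state at begin_checkpoint().
        backup[item] = self.old_values.get(item, self.data[item])
        self.dumped.add(item)

    def end_checkpoint(self):
        self.checkpoint_active = False
        self.old_values.clear()


db = CopyOnUpdateDB({"x": 1, "y": 2})
backup = {}
db.begin_checkpoint()
db.dump_item("x", backup)
db.update("x", 10)   # already dumped: no copy needed
db.update("y", 20)   # not yet dumped: original value 2 is saved
db.dump_item("y", backup)
db.end_checkpoint()
print(backup)   # {'x': 1, 'y': 2}
print(db.data)  # {'x': 10, 'y': 20}
```

The extra copy made inside `update` is the cost that the slide refers to: update transactions pay for the checkpoint's consistency.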
Checkpointing MMDBs(6)
Log-driven checkpointing
– applies the log to a previous dump to generate a new dump
– originally used to generate a remote backup of the database
– adopted in “Incremental recovery in main memory database systems” (’92)
– with the high transaction processing rate of MMDBs, the size of the log can increase rapidly
– quite inefficient compared to fuzzy checkpointing
MMDB Reloading(1)
Issues
– occurrence frequency of the reload process
  on average, a system failure occurs once every few weeks
  media failure, MM page faults
– when the system should resume its execution after a failure
  28.43 minutes are needed to recover a 1-gigabyte DB [?]
  if the system is not available at all during recovery, many transactions will be backlogged
– reload prioritization
  reload priority can be determined based on access frequency, transaction deadline (“MMDB reload algorithms”), or temporal data interval from real-time applications [?]
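As an illustration of prioritizing by access frequency, the sketch below orders pages for reload with a heap. The page ids and access counts are made-up values, and `reload_order` is a hypothetical helper, not a function from the cited work.

```python
import heapq


def reload_order(access_counts):
    """Return page ids ordered by descending access frequency."""
    # heapq is a min-heap, so negate the count; the page id breaks ties.
    heap = [(-count, page) for page, count in access_counts.items()]
    heapq.heapify(heap)
    order = []
    while heap:
        _, page = heapq.heappop(heap)
        order.append(page)
    return order


access_counts = {"p1": 3, "p2": 50, "p3": 17}  # hypothetical statistics
print(reload_order(access_counts))  # ['p2', 'p3', 'p1']
```

A deadline-based variant would use the earliest transaction deadline as the heap key instead of the negated access count.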
MMDB Reloading(2)
Existing reload schemes
– simple reloading
  the system cannot be brought online until the entire database is memory-resident
– concurrent reloading
  Gruenwald, “MMDB reload algorithms” (’91)
  two processors (RP & DP), nonvolatile shadow memory (SM), and a dual address translation mechanism in the MARS system
  ordered reload with prioritization / smart reload / frequency reload
  the differences lie in the structure of AM, utilization of data access frequency, reload prioritization, and reload granularity
  frequency reload yields the best transaction response time and system throughput
MMDB Reloading(3)
Lehman, “A recovery algorithm for a high-performance”
– regular transaction processing is allowed to resume after the system catalogs and their indices are reloaded
Levy and Silberschatz, “Incremental recovery in main memory database systems” (’92)
– resumes transaction processing immediately after a system failure and recovers pages individually according to the demand of post-crash transactions
– stale/fresh marking technique
– to implement page-based recovery, log records must be grouped together on a page basis during normal operation
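A rough sketch of demand-driven, page-based recovery with stale/fresh marking follows. The class name, the dictionary layout of the backup, and the per-page log lists are assumptions for illustration; the published scheme is considerably more involved.

```python
class IncrementalRecovery:
    def __init__(self, backup_pages, page_logs):
        self.backup_pages = backup_pages  # page id -> page image in AM
        self.page_logs = page_logs        # page id -> [(item, after_image)]
        self.memory = {}                  # pages recovered so far
        self.fresh = set()                # pages marked fresh; others are stale

    def access(self, page_id):
        # A post-crash transaction touching a stale page triggers recovery
        # of that page only.
        if page_id not in self.fresh:
            page = dict(self.backup_pages[page_id])
            # Log records are grouped per page, so only this page's
            # records need to be replayed.
            for item, after_image in self.page_logs.get(page_id, []):
                page[item] = after_image
            self.memory[page_id] = page
            self.fresh.add(page_id)
        return self.memory[page_id]


backup_pages = {"p1": {"x": 1}, "p2": {"y": 2}}
page_logs = {"p1": [("x", 5), ("x", 7)]}
rec = IncrementalRecovery(backup_pages, page_logs)
print(rec.access("p1"))   # {'x': 7}
print(sorted(rec.fresh))  # ['p1']
```

Page `p2` stays stale until some transaction asks for it, which is what lets processing resume immediately after the failure.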
Recovery with Existing MMDB Systems(1)
Dali from AT&T
– the original recovery manager was implemented according to “Recovering from main-memory lapses” (’93)
  only redo records are logged during normal execution
  segment-level action-consistent checkpoints
  the checkpointer writes the relevant parts of the undo log to disk
  recovery makes only a single pass over the log
  requires no special hardware to preserve the data
– tests led to a restructuring of its recovery manager: “Multi-level recovery in the Dali storage manager” (’95)
  multi-level logging, post-commit actions, dirty page detection, and fuzzy checkpoints
Recovery with Existing MMDB Systems(2)
Fast Path
– supports both memory-resident data and disk-resident data
– performs updates to memory-resident data at commit time
– no undo operations are required when a failure occurs
– group commit is adopted
– a transaction-consistent backup copy of the database is refreshed during system shutdown or at infrequent checkpoints
– two backup databases with ping-pong backups
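Group commit in general (not Fast Path's actual implementation) can be sketched as batching commit records so that one log write covers several transactions. `GroupCommitLog` and its `group_size` parameter are hypothetical.

```python
class GroupCommitLog:
    def __init__(self, group_size):
        self.group_size = group_size  # hypothetical tuning parameter
        self.pending = []             # commit records waiting in memory
        self.disk = []                # stand-in for the log disk
        self.flushes = 0

    def commit(self, txn):
        self.pending.append(("commit", txn))
        if len(self.pending) >= self.group_size:
            self.flush()

    def flush(self):
        if self.pending:
            # One I/O covers every transaction in the group.
            self.disk.extend(self.pending)
            self.pending.clear()
            self.flushes += 1


log = GroupCommitLog(group_size=3)
for txn in ["T1", "T2", "T3", "T4"]:
    log.commit(txn)
print(log.flushes, len(log.disk), len(log.pending))  # 1 3 1
```

A real system would also flush on a timer so that a partially filled group, like `T4` here, does not wait indefinitely.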
Recovery with Existing MMDB Systems(3)
Two real-time system examples: NEC Real-Time DBMS and Stone RTDB
– NEC RTDBMS has several features to ensure high throughput and accurate predictability
  no page faults
  the in-memory log buffer is nonvolatile
  physical logging using deferred update
  fuzzy checkpointing
  no real-time characteristics such as transaction deadline and criticalness are utilized in the recovery components
Summary and Conclusion
– discussed three logging rules
  a nonvolatile log buffer should be used to satisfy WAL without requiring I/O prior to transaction commit
  LAW should be followed to reduce the amount of log needed to redo transactions after a system failure
– described three groups of checkpointing
– identified three issues about reloading
  data should be prioritized for reload purposes
– future research
  investigate how real-time requirements such as transaction deadlines and temporal data intervals can be incorporated into MMDB recovery
a crash recovery scheme for a memory-resident database system
Robert B. Hagmann, IEEE Transactions on Computers, Vol. C-35, No. 9, September 1986
overview
Presents a method of doing recovery that uses the existing techniques of fuzzy dumps and log compression
Design requirements
– small system example
  2 pages/transaction × 100 transactions/s × 3600 s/h × 8 h = 5,760,000 pages written to the log
– transactions must be short
– checkpointed periodically, every five minutes
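The log-volume figure can be checked with a few lines of arithmetic. The second quantity, the number of log pages written between two checkpoints, is not stated on the slide; it is derived here from the same rates and the five-minute interval as an illustration.

```python
pages_per_transaction = 2
transactions_per_second = 100
seconds_per_hour = 3600
hours = 8

# Pages written to the log over an eight-hour period.
pages_logged = (pages_per_transaction * transactions_per_second
                * seconds_per_hour * hours)
print(pages_logged)  # 5760000

# Derived illustration: log pages produced between two checkpoints
# taken five minutes apart, at the same rates.
checkpoint_interval_s = 5 * 60
pages_between_checkpoints = (pages_per_transaction * transactions_per_second
                             * checkpoint_interval_s)
print(pages_between_checkpoints)  # 60000
```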
Overview(2)
– the principal requirement of the system is “fast” recovery from a system crash
  critical factor: the transfer rate of the disk
  can be improved by using several parallel processors
Design overview
– fuzzy dump
  simply a copy of the database taken without any synchronization
– if the DBMS uses nonvolatile storage, some log compression can occur
– otherwise, precommitting and group commits can be used to increase performance
Design details