Recovery in Main Memory Databases

Le Gruenwald, Jing Huang, Margaret H. Dunham, et al., Engineering Intelligent Systems, Vol. 4, No. 3, September 1996
이 인선, 97/08/21

Introduction

• General MMDB architecture
  – Main Memory (MM): resides in RAM
  – Stable Memory (SM)
    • optional nonvolatile memory
    • used to hold log buffers (the log tail)
    • avoids I/O when transactions commit
    • essential to performance
  – Archive Memory (AM): holds a backup of the entire database
• Focus: logging, checkpointing, and reloading

MMDB Logging(1)

– Physical logging
  • the database state modified by an operation is logged
  • recommended for MMDB systems
– Logical logging
  • contains descriptions of higher-level operations and records the state transitions of the database
  • the idempotent property does not hold
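A minimal sketch of why the idempotent property matters, assuming a toy key-value database. The record layouts and the function names `redo_physical` and `redo_logical` are hypothetical and only serve the illustration.

```python
# Replaying a physical (after-image) record twice is harmless;
# replaying a logical (operation) record twice is not.

def redo_physical(db, record):
    """Physical record: (key, after_image). Redo installs the after image."""
    key, after_image = record
    db[key] = after_image

def redo_logical(db, record):
    """Logical record: (key, delta). Redo re-executes the operation."""
    key, delta = record
    db[key] = db.get(key, 0) + delta

physical_db = {"x": 10}
logical_db = {"x": 10}

# The original update was "add 5 to x". Recovery applies each record twice,
# as can happen when it is unknown whether the update already reached the backup.
for _ in range(2):
    redo_physical(physical_db, ("x", 15))
    redo_logical(logical_db, ("x", 5))

print(physical_db["x"])  # 15
print(logical_db["x"])   # 20
```

The physical log reaches the intended state however often it is replayed, which is one reason it combines well with fuzzy checkpoints.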

MMDB Logging(2)

• Logging rules
  – Write-ahead rule (WAL)
    • undo-log data must be written to nonvolatile memory before the update is applied to the database
  – Commit rule
    • a DBMS may let a transaction commit only once its redo-log data is safely in nonvolatile storage
  – Logging After Writing (LAW)
    • the after image of an updated item is written to the log after the corresponding update has been propagated to the database
    • simplifies log processing in an MMDB with fuzzy checkpointing
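The ordering imposed by the three rules can be sketched as follows. This is an illustration under simplifying assumptions: `TinyLog` is a hypothetical stand-in for a nonvolatile log buffer whose appends are treated as durable, and the record tuples are invented for the example.

```python
class TinyLog:
    """Stands in for nonvolatile storage; every append is assumed durable."""
    def __init__(self):
        self.records = []

    def append(self, record):
        self.records.append(record)

def update(db, log, txn, key, new_value):
    # Write-ahead rule: the undo (before-image) record reaches the log first.
    log.append(("undo", txn, key, db.get(key)))
    db[key] = new_value
    # LAW: the after image is logged only after the update is in the database.
    log.append(("redo", txn, key, new_value))

def commit(log, txn):
    # Commit rule: the redo records are already durable, so commit may be recorded.
    log.append(("commit", txn))

db, log = {"a": 1}, TinyLog()
update(db, log, "T1", "a", 2)
commit(log, "T1")
print([r[0] for r in log.records])  # ['undo', 'redo', 'commit']
```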

MMDB Logging(3)

• MMDB logging differs from logging in a disk-resident database (DRDB) in three ways
  – a nonvolatile log buffer should be used to satisfy WAL without requiring I/O before transaction commit
  – physical logging is recommended because it is easier to use with fuzzy checkpointing
  – the LAW policy should be followed to reduce the amount of log needed to redo transactions after a system failure

Checkpointing DRDB

• Commit-consistent checkpointing
  – periodically stop processing transactions
  – flush all dirty cache slots and mark the log
• Cache-consistent checkpointing
• Fuzzy checkpointing
  – flushes only those dirty slots that have not been flushed since before the previous checkpoint
  – normal replacement activity will already have flushed most cache slots that were dirty since before the previous checkpoint
  – so the checkpoint has little flushing to do and does not delay active transactions for long
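The fuzzy flushing rule can be sketched as below, assuming the cache tracks, for each slot, the time at which it became dirty. The function name, the `cache` layout, and the integer timestamps are hypothetical.

```python
def fuzzy_checkpoint(cache, prev_ckpt_time, now, flush):
    """cache maps slot id -> time the slot became dirty (None if clean).

    Only slots that have stayed dirty since before the previous checkpoint
    are flushed; more recently dirtied slots wait for the next checkpoint.
    """
    for slot, dirty_since in cache.items():
        if dirty_since is not None and dirty_since < prev_ckpt_time:
            flush(slot)
            cache[slot] = None  # slot is clean again
    return now  # becomes prev_ckpt_time for the next checkpoint

cache = {"s1": 3, "s2": 12, "s3": None}
flushed = []
prev_ckpt_time = fuzzy_checkpoint(cache, prev_ckpt_time=10, now=20,
                                  flush=flushed.append)
print(flushed)  # ['s1']
```

Slot `s2` became dirty after the previous checkpoint, so it is left to normal replacement activity or to the next checkpoint.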

Checkpointing MMDBs(1)

• Focuses on low interference with normal transactions and on supporting efficient recovery
• Fuzzy checkpointing
  – Hagmann
    • first suggested using fuzzy checkpointing for MMDBs
    • “a crash recovery scheme for a memory-resident database system”, IEEE Transactions on Computers, Vol. C-35, No. 9, September 1986
    • the checkpointer does not need to obtain locks on the data items being checkpointed
    • the database is dumped in sections
    • after dumping a section, the checkpointer writes a log record to the log
    • a section must not overwrite its previous image (sliding monoplexed backups)
• LAW with fuzzy checkpointing
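A rough sketch of a sectioned fuzzy dump, assuming the database is a dictionary of sections. To keep a section from overwriting its previous image, this example simply alternates between two backup images; that is a simplification chosen for the illustration and does not model the sliding monoplexed layout named on the slide. All names are hypothetical.

```python
def fuzzy_dump(sections, backups, current, log):
    """Dump every section into the backup image that is NOT the current one.

    No locks are taken: each section is copied as it happens to look at
    that moment, and a log record is written after each section.
    """
    target = 1 - current  # never overwrite the most recent complete image
    for sec_id, data in sections.items():
        backups[target][sec_id] = dict(data)          # unsynchronized copy
        log.append(("section_dumped", target, sec_id))
    log.append(("checkpoint_end", target))
    return target

sections = {0: {"a": 1}, 1: {"b": 2}}
backups = [{}, {}]
log = []
current = fuzzy_dump(sections, backups, current=0, log=log)
print(current)   # 1
print(len(log))  # 3
```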

Checkpointing MMDBs(2)

– Salem and Garcia-Molina
  • “checkpointing memory-resident databases” (‘89)
  • compared the fuzzy checkpointing scheme with two non-fuzzy checkpointing schemes
  • fuzzy checkpointing is the most efficient
  • ping-pong scheme: each dirty page is flushed twice
– Lin and Dunham
  • “segmented fuzzy checkpointing for main memory databases” (‘94)
  • checkpoints one segment at a time in round-robin fashion
  • automatically changes the segment boundaries based on the distribution of update operations
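The round-robin part of segmented checkpointing can be sketched with a generator. This assumes fixed segments held in a dictionary; the automatic adjustment of segment boundaries is not modelled, and the names are hypothetical.

```python
from itertools import cycle

def round_robin_checkpointer(segments, flush):
    """Generator: each next() checkpoints exactly one segment, in turn."""
    for seg_id in cycle(sorted(segments)):
        flush(seg_id, dict(segments[seg_id]))  # copy taken without locks
        yield seg_id

segments = {0: {"a": 1}, 1: {"b": 2}, 2: {"c": 3}}
flushed = []
ckpt = round_robin_checkpointer(segments,
                                lambda seg_id, image: flushed.append(seg_id))
order = [next(ckpt) for _ in range(5)]
print(order)  # [0, 1, 2, 0, 1]
```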

Checkpointing MMDBs(3)

[Figure: redo log size in segmented fuzzy checkpointing]

– Li et al.
  • “checkpointing and recovery in partitioned main memory databases” (‘95)
  • the database is divided into partitions, each with its own log disks
  • the time to recover from a system failure is reduced

Checkpointing MMDBs(4)

• Non-fuzzy checkpointing
  – the overhead comes from locking the checkpointed objects to ensure transaction consistency or action consistency
  – Lehman and Carey
    • “a recovery algorithm for a high-performance memory resident database system” (‘87)
    • transaction-consistent scheme (at the relation level)
    • no need to maintain undo-log records in nonvolatile storage
    • checkpointing increases data contention with normal transactions

Checkpointing MMDBs(5)

– Salem and Garcia-Molina
  • “checkpointing memory-resident databases” (‘89)
  • discuss two non-fuzzy checkpointing approaches
    – the first (black and white) aborts some update transactions
    – the second (copy-on-update) requires some update transactions to store the original values of the data items they update
    – both have a severe impact on system performance
– Jagadish et al.
  • “recovering from main-memory lapses” (‘93)
  • propose an action-consistent checkpointing scheme
  • the undo logs of active transactions are first written to the log, and then the dirty pages are flushed to disk
  • during normal processing, the redo logs of committed transactions are written to the log
  • ping-pong update
  • this approach was originally used in Dali
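The copy-on-update idea can be sketched as follows, assuming a dictionary-based database in which every updated key already exists. The class `CopyOnUpdateDB` and its methods are hypothetical names for this illustration only.

```python
class CopyOnUpdateDB:
    def __init__(self, data):
        self.data = dict(data)
        self.old_values = {}       # originals saved while a checkpoint runs
        self.checkpointing = False

    def begin_checkpoint(self):
        self.checkpointing = True
        self.old_values.clear()

    def update(self, key, value):
        # The updating transaction pays the cost of saving the original value,
        # but only on its first update to the item during the checkpoint.
        if self.checkpointing and key not in self.old_values:
            self.old_values[key] = self.data[key]
        self.data[key] = value

    def end_checkpoint(self):
        # Checkpoint image = current data, overridden by the saved originals,
        # i.e. the state as of begin_checkpoint().
        snapshot = {**self.data, **self.old_values}
        self.checkpointing = False
        self.old_values.clear()
        return snapshot

db = CopyOnUpdateDB({"a": 1, "b": 2})
db.begin_checkpoint()
db.update("a", 10)
db.update("a", 11)
snapshot = db.end_checkpoint()
print(snapshot)  # {'a': 1, 'b': 2}
print(db.data)   # {'a': 11, 'b': 2}
```

The extra copy on the update path is the performance impact the slide refers to.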

Checkpointing MMDBs(6)

• Log-driven checkpointing
  – applies the log to a previous dump to generate a new dump
  – originally used to generate a remote backup of the database
  – adopted in “incremental recovery in main memory database systems” (‘92)
  – with the high transaction processing rates of MMDBs, the log can grow rapidly
  – quite inefficient compared to fuzzy checkpointing
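Log-driven checkpointing touches only the previous dump and the log, never the live database. A minimal sketch, assuming a physical redo log of `(key, after_image)` pairs; the function name is hypothetical.

```python
def log_driven_checkpoint(previous_dump, redo_log):
    """Build a new dump by replaying the redo log over the previous dump."""
    new_dump = dict(previous_dump)
    for key, after_image in redo_log:
        new_dump[key] = after_image  # later records overwrite earlier ones
    return new_dump

previous_dump = {"a": 1, "b": 2}
redo_log = [("a", 5), ("c", 7), ("a", 9)]
new_dump = log_driven_checkpoint(previous_dump, redo_log)
print(new_dump)  # {'a': 9, 'b': 2, 'c': 7}
```

Every logged update must be replayed, so the work grows with the log, which is the inefficiency noted above.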

MMDB Reloading(1)

• Issues
  – how often the reload process occurs
    • on average, a system failure occurs once every few weeks
    • media failure, MM page faults
  – when the system should resume execution after a failure
    • 28.43 minutes are needed to recover a 1-gigabyte database [?]
    • if the system is unavailable throughout recovery, many transactions will be backlogged
  – reload prioritization
    • reload priority can be based on access frequency, transaction deadline (“MMDB reload algorithms”), or temporal data intervals from real-time applications [?]
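Prioritizing by access frequency can be sketched with a heap. The page names and access counts are made up for the example, and `reload_order` is a hypothetical helper, not part of any of the cited algorithms.

```python
import heapq

def reload_order(access_counts):
    """Return page ids ordered from most to least frequently accessed."""
    # heapq is a min-heap, so counts are negated to pop the hottest page first.
    heap = [(-count, page) for page, count in access_counts.items()]
    heapq.heapify(heap)
    order = []
    while heap:
        _, page = heapq.heappop(heap)
        order.append(page)
    return order

order = reload_order({"p1": 5, "p2": 40, "p3": 12})
print(order)  # ['p2', 'p3', 'p1']
```

A deadline-based policy could use the same structure with the transaction deadline as the heap key.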

MMDB Reloading(2)

• Existing reload schemes
  – simple reloading
    • the system cannot be brought online until the entire database is memory-resident
  – concurrent reloading
    • Gruenwald
      – “MMDB reload algorithms” (‘91)
      – two processors (RP and DP), nonvolatile shadow memory (SM), and a dual address translation mechanism in the MARS system
      – ordered reload with prioritization / smart reload / frequency reload
      – the differences lie in the structure of AM, the use of data access frequency, reload prioritization, and reload granularity
      – frequency reload yields the best transaction response time and system throughput

MMDB Reloading(3)

• Lehman
  – “a recovery algorithm for a high-performance”
  – regular transaction processing may resume once the system catalogs and their indices have been reloaded
• Levy and Silberschatz
  – “incremental recovery in main memory database systems” (‘92)
  – resumes transaction processing immediately after a system failure and recovers pages individually, on demand from post-crash transactions
  – stale/fresh marking technique
  – to implement page-based recovery, log records must be grouped by page during normal operation
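On-demand, page-based recovery can be sketched as below. The sketch assumes that the archive copy and the per-page redo logs are available as dictionaries; the class name and record layout are hypothetical and do not reproduce the published algorithm.

```python
class IncrementalRecovery:
    def __init__(self, backup, page_logs):
        self.backup = backup        # page id -> page contents in the archive copy
        self.page_logs = page_logs  # page id -> list of (key, after_image)
        self.memory = {}            # pages recovered so far ("fresh")

    def read(self, page_id, key):
        if page_id not in self.memory:              # page is still stale
            page = dict(self.backup.get(page_id, {}))
            for k, after_image in self.page_logs.get(page_id, []):
                page[k] = after_image               # replay this page's log only
            self.memory[page_id] = page             # mark the page fresh
        return self.memory[page_id][key]

backup = {1: {"x": 1}, 2: {"y": 2}}
page_logs = {1: [("x", 5)]}
rec = IncrementalRecovery(backup, page_logs)
value = rec.read(1, "x")
print(value)             # 5
print(list(rec.memory))  # [1]
```

Because the log is grouped by page, recovering page 1 never scans the records of page 2, which stays stale until a transaction asks for it.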



Recovery with Existing MMDB Systems(1)

• Dali from AT&T
  – the original recovery manager was implemented according to “recovering from main-memory lapses” (‘93)
    • logs only redo records during normal execution
    • segment-level action-consistent checkpoints
    • the checkpointer writes the relevant parts of the undo log to disk
    • recovery needs only a single pass over the log
    • requires no special hardware to preserve the data
  – testing led to a restructuring of its recovery manager
    • “multi-level recovery in the Dali storage manager” (‘95)
    • multi-level logging, post-commit actions, dirty page detection, and fuzzy checkpoints

Recovery with Existing MMDB Systems(2)

• Fast Path
  – supports both memory-resident and disk-resident data
  – performs updates to memory-resident data at commit time
  – no undo operations are required when a failure occurs
  – group commit is adopted
  – a transaction-consistent backup copy of the database is refreshed during system shutdown or at infrequent checkpoints
  – two backup databases with ping-pong backups
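Why commit-time updates make undo unnecessary can be seen in a small sketch. This is a generic deferred-update illustration with hypothetical names, not Fast Path's actual interface.

```python
class DeferredUpdateTxn:
    def __init__(self, db):
        self.db = db
        self.pending = {}  # private write set, invisible to the database

    def write(self, key, value):
        self.pending[key] = value  # the database itself is untouched

    def read(self, key):
        return self.pending.get(key, self.db.get(key))

    def commit(self, redo_log):
        redo_log.extend(self.pending.items())  # redo records first
        self.db.update(self.pending)           # then install the updates
        self.pending = {}

    def abort(self):
        self.pending = {}  # nothing was installed, so nothing to undo

db, redo_log = {"a": 1}, []
t1 = DeferredUpdateTxn(db)
t1.write("a", 2)
t1.abort()
t2 = DeferredUpdateTxn(db)
t2.write("a", 3)
t2.commit(redo_log)
print(db)        # {'a': 3}
print(redo_log)  # [('a', 3)]
```

A crash before `commit` has the same effect as `abort`: the database never saw the uncommitted value.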



Recovery with Existing MMDB Systems(3)

• Two real-time system examples
  – NEC Real-Time DBMS
  – Stone RTDB
• NEC RTDBMS has several features to ensure high throughput and accurate predictability
  – no page faults
  – the in-memory log buffer is nonvolatile
  – physical logging using deferred update
  – fuzzy checkpointing
  – no real-time characteristics, such as transaction deadline and criticalness, are used in the recovery components

Summary and Conclusion

– Discussed 3 logging rules
  • a nonvolatile log buffer should be used to satisfy WAL without requiring I/O before transaction commit
  • LAW should be followed to reduce the amount of log needed to redo transactions after a system failure
– Described three groups of checkpointing
– Identified 3 issues about reloading
  • data should be prioritized for reload purposes
– Future research
  • investigate how real-time requirements such as transaction deadlines and temporal data intervals can be incorporated into MMDB recovery

A crash recovery scheme for a memory-resident database system

Robert B. Hagmann, IEEE Transactions on Computers, Vol. C-35, No. 9, September 1986

Overview(1)

• Presents a recovery method that uses the existing techniques of fuzzy dumps and log compression
• Design requirements
  – small system example
    • 2 pages/transaction × 100 transactions/s × 3600 s/h × 8 h = 5,760,000 pages written to the log
  – transactions must be short
  – the database is checkpointed periodically, every five minutes
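The log-volume figure can be checked with a few lines of arithmetic. The per-interval number at the end is derived here from the same rates and the five-minute checkpoint interval; it is not stated on the slide.

```python
pages_per_txn = 2
txns_per_second = 100
seconds_per_hour = 3600
hours = 8

# Pages written to the log over an eight-hour day.
log_pages_per_day = pages_per_txn * txns_per_second * seconds_per_hour * hours
print(log_pages_per_day)  # 5760000

# Pages written to the log between two checkpoints taken five minutes apart.
checkpoint_interval_s = 5 * 60
log_pages_per_interval = pages_per_txn * txns_per_second * checkpoint_interval_s
print(log_pages_per_interval)  # 60000
```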

Overview(2)

• The principal requirement of the system is “fast” recovery from a system crash
  – critical factor: the transfer rate of the disk
  – can be improved by using several parallel processors
• Design overview
  – fuzzy dump
    • simply a copy of the database taken without any synchronization
  – if the DBMS uses nonvolatile storage, some log compression can occur
  – otherwise, precommitting and group commits can be used to increase performance
• Design details
