Transcript Document
Log File Management in DB2 for Linux, UNIX, and Windows Ron Castelletto IBM Canada Lab [email protected] March 1, 2009 © 2009 IBM Corporation Information Management – DB2 Agenda How does DB2 ensures no data loss on crash ? Log file management Log file archiving Removing single points of failure 2 © 2009 IBM Corporation Information Management – DB2 Database Fundamentals The ACID properties – Atomicity – all actions in the TXN happen or none happen – Consistency – if each TXN is consistent, and the DB starts as consistent, the DB ends up consistent – Isolation – execution of one TXN is isolated from that of other TXNs – Durability – if a TXN commit, its effects persists The Recovery Manager ensure Atomicity and Durability 3 © 2009 IBM Corporation Information Management – DB2 Atomicity and Durability Atomicity – Transactions may abort (rollback) Durability – What if DB2 stops running (eg, power failure) ? After system crashes – T1, T2, T3 should be durable – T4 and T5 should abort 4 © 2009 IBM Corporation Information Management – DB2 Properties of Database Transactions Concurrency control is in effect – Share (read) and exclusive (write) locking Updates happen in place – Data is overwritten on disk A simple solution for atomicity and durability ? – Use a “force” and “no steal” buffer pool 5 © 2009 IBM Corporation Information Management – DB2 The Buffer Pool “Force” write of data pages to disk at commit – Provides durability – But poor performance “No stealing” of pages from uncommited Txns – However, “no steal” implies poor performance – But, if 'steal' allowed, how to ensure atomicity ? No Steal Steal Force No Force 6 © 2009 IBM Corporation Information Management – DB2 Forcing and Stealing Pages No Force (why Durability is hard) – Reminder: “Force” is writing all modified pages for this TXN at commit time – What is system crashes before all pages written ? – Need to know what changes were lost before I/O completed so can redo them. Steal (why enforcing Atomicity is hard) – Reminder: “Steal” is allowing a new TXN to steal a slot in the buffer pool, ie: write a modified page to disk before the TXN commits – What is the TXN aborts ? Need to know what modification were written to disk so can undo the change. Solution: write log records to support redo/undo of changes 7 © 2009 IBM Corporation Information Management – DB2 Logging DB2 records all modifications for every update in log records – Log records are written sequentially to log files(s) – Should be placed on a different disk – Log record should contain as little possible (old/new data) – Multiples log records fit on a single log page – Log records represent an ordered list of update TXNs 8 © 2009 IBM Corporation Information Management – DB2 Write Ahead Logging Write Ahead Logging Protocol – 1) Log records for an update must be written to disk, before the corresponding data page is written – 2) write all log records for a TXN when it commits 1) guarantees Atomicity 2) guarantees Durability 9 © 2009 IBM Corporation Information Management – DB2 Bufferpool and Log buffer – all log records flushed Read Only Committed Update Uncommitted Update Min Buff LSN Low Tran LSN Log Buffer DB2 Agent 200 new 300 new 200 old new 300 old old new new 300 301 old new 201 old new Buffer pools DB2 Agent 201 new 301 new 100 101 I/O Servers SQLOGCTL.LFH.1 SQLOGCTL.LFH.2 commit Page Cleaners Log Control File Logger Table space Containers S0000020.log 100 300 old 101 301 old 201 new 200 new 200 old new 300 old new S0000021.log 301 old new 201 old new commit Low Tran LSN = Oldest Uncommited Log Record Min Buff LSN = Oldest Changed Page in Buffer pool 10 Active Log Directory © 2009 IBM Corporation Information Management – DB2 Bufferpool and Log buffer – some logrecs flushed Read Only Committed Update Uncommitted Update Min Buff LSN Low Tran LSN Log Buffer DB2 Agent 200 new 300 new 200 old new Buffer pools DB2 Agent 201 new 301 new 100 101 I/O Servers 301 old new 300 old old new new 300 commit SQLOGCTL.LFH.1 SQLOGCTL.LFH.2 201 old new Page Cleaners Log Control File Logger Table space Containers S0000020.log 100 300 old 101 301 old 201 old 200 old 200 old new 300 old new S0000021.log 301 old new commit Low Tran LSN = Oldest Uncommited Log Record Min Buff LSN = Oldest Changed Page in Buffer pool 11 Active Log Directory © 2009 IBM Corporation Information Management – DB2 ERROR - commit not flushed Read Only Committed Update Uncommitted Update Min Buff LSN Low Tran LSN Log Buffer DB2 Agent 200 new 300 new 200 old new 300 old old new new 300 301 old new 201 old new Buffer pools DB2 Agent 201 new 301 new 100 101 I/O Servers SQLOGCTL.LFH.1 SQLOGCTL.LFH.2 commit Page Cleaners Log Control File Logger Table space Containers S0000020.log 100 300 new 101 301 new 201 new 200 new 200 old new 300 old new S0000021.log 301 old new Low Tran LSN = Oldest Uncommited Log Record Min Buff LSN = Oldest Changed Page in Buffer pool 12 Active Log Directory © 2009 IBM Corporation Information Management – DB2 ERROR – page flushed before log record (row 201) Read Only Committed Update Uncommitted Update Min Buff LSN Low Tran LSN Log Buffer DB2 Agent 200 new 300 new 200 old new Buffer pools DB2 Agent 201 new 301 new 100 101 I/O Servers 301 old new 300 old old new new 300 commit SQLOGCTL.LFH.1 SQLOGCTL.LFH.2 201 old new Page Cleaners Log Control File Logger Table space Containers S0000020.log 100 300 new 101 301 new 201 new 200 new 200 old new 300 old new S0000021.log 301 old new commit Low Tran LSN = Oldest Uncommited Log Record Min Buff LSN = Oldest Changed Page in Buffer pool 13 Active Log Directory © 2009 IBM Corporation Information Management – DB2 Log Space Reservation S0000020.log 1000 S0000021.log 1001 1000 S0000023.log S0000024.log S0000022.log 1000 S0000025.log 1000 Active log directory Other Transactions (committed) Transaction 1000 14 1001 © 2009 IBM Corporation Information Management – DB2 Out Of Log Space S0000020.log 1000 1000 S0000023.log S0000021.log 1001 1000 S0000024.log 1000 S0000022.log Reserved S0000024.log Reserved Active log directory Other Transactions (committed) Transaction 1000 15 1001 Transaction 1000 and 1001 (reserved) © 2009 IBM Corporation Information Management – DB2 MAX_LOG database configuration parameter S0000020.log S0000021.log S0000022.log 1000 1000 S0000023.log 1000 1000 S0000024.log S0000024.log 1000 Active log directory Other Transactions Transaction 1000 12 MB Limit -964 sqlcode max_log * (logfilsiz * 4096 *logprimary) / 100 50 * (1000 * 4096 * 6 ) / 100 = 12 MB 16 Database CFG logfilsiz logprimary max_log 1000 6 50 © 2009 IBM Corporation Information Management – DB2 NUM_LOG_SPAN database cfg parameter S0000020.log S0000022.log S0000021.log S0000023.log 1000 1000 num_log_span = 2 Active log directory db2 Force application -964 SQLCODE Other Transactions Transaction 1000 Database CFG logfilsiz 1000 logprimary 4 num_log_span 2 17 © 2009 IBM Corporation Information Management – DB2 Infinite active logs - LOGSECOND = -1 Database Configuration Log CTL File LOGSECOND = -1 LOGPRIMARY= 6 LOGARCHMETH1=DISK:/db/archive OVERFLOWLOGPATH = /db/tp1/ologs FIRST ACTIVE LOG = S0000100.LOG Low Tran LSN Active log directory LOG 121 LOG 122 LOG 123 LOG 124 LOG 125 LOG 126 Log Manager Active Logs Archived logs 18 Archived logs LOG 116 LOG 117 LOG 118 LOG 119 LOG 120 LOG 121 LOG 110 LOG 111 LOG 112 LOG 113 LOG 114 LOG 115 LOG 104 LOG 105 LOG 106 LOG 107 LOG 108 LOG 109 LOG 98 LOG 99 LOG 100 LOG 101 LOG 102 LOG 103 © 2009 IBM Corporation Information Management – DB2 Controlling Minbuff - SOFTMAX Specifies the percentage of the logfilsiz when a soft checkpoint will: – Write the log control file to disk (Sqlogctl.lfh) – Call an asynchronous page cleaner (Log space page cleaners) Range: 1 to 100 * number of primary logs. Default = 100 (soft checkpoint in every logfile). Lower value will reduce the time required to restart a database during crash recovery. The smaller the number, the greater overhead of normal database logging activity due to page clean activity. 19 © 2009 IBM Corporation Information Management – DB2 Database crash recovery - Softmax = 200 1010 1008 1009 1004 1007 1005 1006 Page Cleaner 1003 Page Cleaner 1002 Tablespace Containers 1002 Buffer Pools 1000 1001 S0000020.log 1002 1003 Change Written to Disk xxxx Change NOT Written to Disk 200% * ( 1000 * 4K ) = 8MB LOGFILSIZ = 1000 1001 xxxx 1004 S0000021.log 1005 1006 1007 1008 S0000022.log 1009 1010 Active log directory SOFTMAX = 200 20 Current Log CTL File Updates to Log CTL 1001 Min Buff LSN © 2009 IBM Corporation Information Management – DB2 Database crash recovery - Softmax = 50 1010 1008 Page Cleaner 1009 Page Cleaner 1008 Table space Containers Buffer Pools 1007 1004 1002 1001 1006 1005 1003 1000 xxxx Change Written to Disk xxxx Change NOT Written to Disk LOGFILSIZ = 1000 50% * ( 1000 * 4K ) = 2MB S0000020.log 1001 1002 1003 1004 S0000021.log 1005 1006 1007 1008 S0000022.log 1009 1010 Active log directory SOFTMAX = 50 21 Current Updates to Log CTL Log CTL File 1007 Min Buff LSN © 2009 IBM Corporation Information Management – DB2 DB2 log archive processing S0000021.log S0000020.log 1000 1001 1000 1001 1002 S0000022.log 1003 1002 1003 1003 Active log directory 1 2 3 transaction 1003 transaction 1000 transaction 1001 4 transaction 1002 DB2 Log Manager Point 1 Activity Action Taken Log S0000020 full Log Manager archives Log 20. Transaction 1000 is oldest current transaction (lowtran lsn). Cannot rename log 20. Point 2 Transaction 1000 commits Transaction 1000 Commits Log S0000021 full Point 3 Point 4 22 Transaction 1002 commits Log 20 no longer an active log, can be renamed to Log 23. Log Manager archives Log 21. Transaction 1002 is oldest current transaction (lowtran lsn). Cannot rename log 21. Log S0000021 can be renamed to Log 24 © 2009 IBM Corporation Information Management – DB2 DB2 log archive processing - Backlog S0000021.log S0000020.log 1000 1001 1000 1001 1002 S0000022.log 1003 1000 1003 1003 Active log directory 1 2 3 4 transaction 1000 transaction 1001 transaction 1002 transaction 1003 DB2 Log Manager 23 Activity Action Taken Point 1 Log S0000020 full Log Manager begins to Archive Log 20 but FAILS. Transaction 1000 is oldest current transaction (lowtran lsn) Point 2 Log S0000021 full Log Manager does not Archive log 21 until 20 is done. Point 3 Transaction 1000 Commits Log 20 not active log and not archived. Txn 1003 is lowtran lsn. Log 21 is first active log. Log 23 is created Point 4 Active log dir contains 4 logs Log 20 is not active, and unarchived. Logs 21-23 are active logs. © 2009 IBM Corporation Information Management – DB2 What if Archiving does not work for a long time? Eventually, the filesystem becomes full .... DB2 cannot pre-allocate log files to maintain LOGPRIMARY ... Later, run out of disk space provided by preallocated log files. This scenario is 'out of disk space', distinct from 'out of log space' Default behaviour, database comes down with severe error How to avoid this ? 2 solutions ... 24 © 2009 IBM Corporation Information Management – DB2 Log disk full condition handling ..... BLK_LOG_DSK_FUL ..... = YES If NO (default) CFG DB2 Database Log Buffer DISK FULL! Database comes down with severe error (out of disk space) If YES Active Log Files Active log path Update transactions wait for new log Read only transactions continue Write message to db2diag.log Wait 5 minutes and retry creating new log Check for completed log archive and rename log 25 Inactive Unarchived Logs DB2 Log Manager Offline Archive Log Files © 2009 IBM Corporation Information Management – DB2 DB2 log archive processing - FAILARCHPATH S0000020.log S0000021.log S0000022.log 1000 1001 1001 1000 1002 1003 1000 1003 1003 Active log directory 1 S0000020.log 1000 1001 1001 2 4 transaction 1000 transaction 1001 transaction 1002 FAILARCH directory 26 3 transaction 1003 DB2 Log Manager Activity Action Taken Point 1 Log S0000020 full Log Manager begins to archive Log 20 but FAILS. Transaction 1000 is oldest current transaction (lowtran lsn) Point 2 Log S0000021 full Log Manager does not archive log 21 until 20 is done. Point 3 Transaction 1000 Commits Point 4 Archiving works again. Log 20 no longer an active log, but is not archived. Txn 1003 is lowtran lsn. Log 21 is first active log. Log 20 is moved to FAILPATH. Log 23 created. Log 20 is archived from FAILPATH, Log 21 is archived © 2009 IBM Corporation from active log dir. Information Management – DB2 Protecting the DB2 logs DB2 Database Recovery from Log media failure: – 1. Log Buffer Operating System facilities - Software RAID and Disk Mirrors Operating System Mirroring or Hardware RAID RAID Hardware: RAID 1 - Mirroring RAID 5 - Parity Based DB2 Logs Mirror Copy Primary Disk Mirror Disk No protection from accidental deletion of active logs. 27 © 2009 IBM Corporation Information Management – DB2 Protecting the DB2 Active logs - Log mirroring CFG DB2 Database ..... LOGPATH = /database/tp1/logs MIRRORLOGPATH = /database/tp1/logmir ..... Log Buffer /database/tp1/logs /database/tp1/logmir DB2 Active Logs copy 1 DB2 Active Logs copy 2 Protection from accidental deletion of active logs. 28 © 2009 IBM Corporation Information Management – DB2 Archiving logs to Local and Remote disks db2 update db cfg for salesdb using logarchmeth1 disk:/dbarchlcl db2 update db cfg for salesdb using logarchmeth2 disk:/dbarchdr Path Includes: – Instance Name – Database Name – Database Partition (NODExxxx) salesdb Database Database Directory Table Spaces DB CFG Active Logs Recovery History SYS1 db2 Log Manager SYS2 Remote Archive logs /dbarchdr/INST1/SALESDB/NODE0000 Local Archive logs /dbarchlcl/INST1/SALESDB/NODE0000 29 © 2009 IBM Corporation Log File Management in DB2 for Linux, UNIX, and Windows THE END Ron Castelletto IBM Canada Lab [email protected] March 1, 2009 © 2009 IBM Corporation