Transcript Document

Log File Management in DB2
for Linux, UNIX, and Windows
Ron Castelletto
IBM Canada Lab
[email protected]
March 1, 2009
© 2009 IBM Corporation
Information Management – DB2
Agenda
 How does DB2 ensures no data loss on crash ?
 Log file management
 Log file archiving
 Removing single points of failure
2
© 2009 IBM Corporation
Information Management – DB2
Database Fundamentals
 The ACID properties
– Atomicity – all actions in the TXN happen or none happen
– Consistency – if each TXN is consistent, and the DB starts
as consistent, the DB ends up consistent
– Isolation – execution of one TXN is isolated from that of
other TXNs
– Durability – if a TXN commit, its effects persists
 The Recovery Manager ensure Atomicity and
Durability
3
© 2009 IBM Corporation
Information Management – DB2
Atomicity and Durability
 Atomicity
– Transactions may abort (rollback)
 Durability
– What if DB2 stops running (eg, power failure) ?
 After system crashes
– T1, T2, T3 should be durable
– T4 and T5 should abort
4
© 2009 IBM Corporation
Information Management – DB2
Properties of Database Transactions
 Concurrency control is in effect
– Share (read) and exclusive (write) locking
 Updates happen in place
– Data is overwritten on disk
 A simple solution for atomicity and durability ?
– Use a “force” and “no steal” buffer pool
5
© 2009 IBM Corporation
Information Management – DB2
The Buffer Pool
 “Force” write of data pages to disk at
commit
– Provides durability
– But poor performance
 “No stealing” of pages from uncommited
Txns
– However, “no steal” implies poor performance
– But, if 'steal' allowed, how to ensure atomicity ?
No Steal
Steal
Force
No Force
6
© 2009 IBM Corporation
Information Management – DB2
Forcing and Stealing Pages
 No Force (why Durability is hard)
– Reminder: “Force” is writing all modified pages for this
TXN at commit time
– What is system crashes before all pages written ?
– Need to know what changes were lost before I/O
completed so can redo them.
 Steal (why enforcing Atomicity is hard)
– Reminder: “Steal” is allowing a new TXN to steal a slot in
the buffer pool, ie: write a modified page to disk before the
TXN commits
– What is the TXN aborts ? Need to know what modification
were written to disk so can undo the change.
 Solution: write log records to support redo/undo of
changes
7
© 2009 IBM Corporation
Information Management – DB2
Logging
 DB2 records all modifications for every update in
log records
– Log records are written sequentially to log files(s)
– Should be placed on a different disk
– Log record should contain as little possible (old/new data)
– Multiples log records fit on a single log page
– Log records represent an ordered list of update TXNs
8
© 2009 IBM Corporation
Information Management – DB2
Write Ahead Logging
 Write Ahead Logging Protocol
– 1) Log records for an update must be written to disk,
before the corresponding data page is written
– 2) write all log records for a TXN when it commits
 1) guarantees Atomicity
 2) guarantees Durability
9
© 2009 IBM Corporation
Information Management – DB2
Bufferpool and Log buffer – all log records flushed
Read Only
Committed Update
Uncommitted Update
Min Buff LSN
Low Tran LSN
Log Buffer
DB2 Agent
200 new
300 new
200 old new
300 old
old new
new
300
301 old new
201 old new
Buffer pools
DB2 Agent
201 new
301 new
100
101
I/O Servers
SQLOGCTL.LFH.1
SQLOGCTL.LFH.2
commit
Page Cleaners
Log Control
File
Logger
Table space Containers
S0000020.log
100
300 old
101
301 old
201 new
200 new
200 old new
300 old new
S0000021.log
301 old new
201 old new
commit
Low Tran LSN = Oldest Uncommited Log Record
Min Buff LSN = Oldest Changed Page in Buffer pool
10
Active Log Directory
© 2009 IBM Corporation
Information Management – DB2
Bufferpool and Log buffer – some logrecs flushed
Read Only
Committed Update
Uncommitted Update
Min Buff LSN
Low Tran LSN
Log Buffer
DB2 Agent
200 new
300 new
200 old new
Buffer pools
DB2 Agent
201 new
301 new
100
101
I/O Servers
301 old new
300 old
old new
new
300
commit
SQLOGCTL.LFH.1
SQLOGCTL.LFH.2
201 old new
Page Cleaners
Log Control
File
Logger
Table space Containers
S0000020.log
100
300 old
101
301 old
201 old
200 old
200 old new
300 old new
S0000021.log
301 old new
commit
Low Tran LSN = Oldest Uncommited Log Record
Min Buff LSN = Oldest Changed Page in Buffer pool
11
Active Log Directory
© 2009 IBM Corporation
Information Management – DB2
ERROR - commit not flushed
Read Only
Committed Update
Uncommitted Update
Min Buff LSN
Low Tran LSN
Log Buffer
DB2 Agent
200 new
300 new
200 old new
300 old
old new
new
300
301 old new
201 old new
Buffer pools
DB2 Agent
201 new
301 new
100
101
I/O Servers
SQLOGCTL.LFH.1
SQLOGCTL.LFH.2
commit
Page Cleaners
Log Control
File
Logger
Table space Containers
S0000020.log
100
300 new
101
301 new
201 new
200 new
200 old new
300 old new
S0000021.log
301 old new
Low Tran LSN = Oldest Uncommited Log Record
Min Buff LSN = Oldest Changed Page in Buffer pool
12
Active Log Directory
© 2009 IBM Corporation
Information Management – DB2
ERROR – page flushed before log record (row 201)
Read Only
Committed Update
Uncommitted Update
Min Buff LSN
Low Tran LSN
Log Buffer
DB2 Agent
200 new
300 new
200 old new
Buffer pools
DB2 Agent
201 new
301 new
100
101
I/O Servers
301 old new
300 old
old new
new
300
commit
SQLOGCTL.LFH.1
SQLOGCTL.LFH.2
201 old new
Page Cleaners
Log Control
File
Logger
Table space Containers
S0000020.log
100
300 new
101
301 new
201 new
200 new
200 old new
300 old new
S0000021.log
301 old new
commit
Low Tran LSN = Oldest Uncommited Log Record
Min Buff LSN = Oldest Changed Page in Buffer pool
13
Active Log Directory
© 2009 IBM Corporation
Information Management – DB2
Log Space Reservation
S0000020.log
1000
S0000021.log
1001
1000
S0000023.log
S0000024.log
S0000022.log
1000
S0000025.log
1000
Active log directory
Other Transactions (committed)
Transaction 1000
14
1001
© 2009 IBM Corporation
Information Management – DB2
Out Of Log Space
S0000020.log
1000
1000
S0000023.log
S0000021.log
1001
1000
S0000024.log
1000
S0000022.log
Reserved
S0000024.log
Reserved
Active log directory
Other Transactions (committed)
Transaction 1000
15
1001
Transaction 1000 and 1001 (reserved)
© 2009 IBM Corporation
Information Management – DB2
MAX_LOG database configuration parameter
S0000020.log
S0000021.log
S0000022.log
1000
1000
S0000023.log
1000
1000
S0000024.log
S0000024.log
1000
Active log directory
Other Transactions
Transaction 1000 12 MB Limit
-964
sqlcode
max_log * (logfilsiz * 4096 *logprimary) / 100
50 * (1000 * 4096 * 6 ) / 100 = 12 MB
16
Database CFG
logfilsiz
logprimary
max_log
1000
6
50
© 2009 IBM Corporation
Information Management – DB2
NUM_LOG_SPAN database cfg parameter
S0000020.log
S0000022.log
S0000021.log
S0000023.log
1000
1000
num_log_span = 2
Active log directory
db2 Force
application
-964
SQLCODE
Other Transactions
Transaction
1000
Database CFG
logfilsiz
1000
logprimary
4
num_log_span
2
17
© 2009 IBM Corporation
Information Management – DB2
Infinite active logs - LOGSECOND = -1
Database Configuration
Log CTL File
LOGSECOND = -1
LOGPRIMARY= 6
LOGARCHMETH1=DISK:/db/archive
OVERFLOWLOGPATH = /db/tp1/ologs
FIRST ACTIVE LOG = S0000100.LOG
Low Tran LSN
Active log directory
LOG 121
LOG 122
LOG 123
LOG 124
LOG 125
LOG 126
Log
Manager
Active Logs
Archived logs
18
Archived logs
LOG 116
LOG 117
LOG 118
LOG 119
LOG 120
LOG 121
LOG 110
LOG 111
LOG 112
LOG 113
LOG 114
LOG 115
LOG 104
LOG 105
LOG 106
LOG 107
LOG 108
LOG 109
LOG 98
LOG 99
LOG 100
LOG 101
LOG 102
LOG 103
© 2009 IBM Corporation
Information Management – DB2
Controlling Minbuff - SOFTMAX
 Specifies the percentage of the logfilsiz when a soft
checkpoint will:
– Write the log control file to disk (Sqlogctl.lfh)
– Call an asynchronous page cleaner (Log space page cleaners)
 Range: 1 to 100 * number of primary logs.
 Default = 100 (soft checkpoint in every logfile).
 Lower value will reduce the time required to restart a
database during crash recovery.
 The smaller the number, the greater overhead of normal
database logging activity due to page clean activity.
19
© 2009 IBM Corporation
Information Management – DB2
Database crash recovery - Softmax = 200
1010
1008
1009
1004
1007
1005
1006
Page Cleaner
1003
Page Cleaner
1002
Tablespace
Containers
1002
Buffer Pools
1000
1001
S0000020.log
1002
1003
Change Written to Disk
xxxx
Change NOT Written to Disk
200% * ( 1000 * 4K ) = 8MB
LOGFILSIZ = 1000
1001
xxxx
1004
S0000021.log
1005
1006
1007
1008
S0000022.log
1009
1010
Active log directory
SOFTMAX = 200
20
Current
Log CTL File
Updates to Log CTL
1001
Min Buff LSN
© 2009 IBM Corporation
Information Management – DB2
Database crash recovery - Softmax = 50
1010
1008
Page Cleaner
1009
Page Cleaner
1008
Table space
Containers
Buffer Pools
1007
1004
1002
1001
1006
1005
1003
1000
xxxx
Change Written to Disk
xxxx
Change NOT Written to Disk
LOGFILSIZ = 1000
50% * ( 1000 * 4K ) = 2MB
S0000020.log
1001
1002
1003
1004
S0000021.log
1005
1006
1007
1008
S0000022.log
1009
1010
Active log directory
SOFTMAX = 50
21
Current
Updates to Log CTL
Log CTL File
1007
Min Buff LSN
© 2009 IBM Corporation
Information Management – DB2
DB2 log archive processing
S0000021.log
S0000020.log
1000
1001
1000
1001
1002
S0000022.log
1003
1002
1003
1003
Active log directory
1
2
3
transaction 1003
transaction 1000
transaction 1001
4
transaction 1002
DB2 Log Manager
Point 1
Activity
Action Taken
Log S0000020 full
Log Manager archives Log 20. Transaction 1000 is oldest
current transaction (lowtran lsn). Cannot rename log 20.
Point 2
Transaction 1000 commits
Transaction 1000 Commits
Log S0000021 full
Point 3
Point 4
22
Transaction 1002 commits
Log 20 no longer an active log, can be renamed to Log 23.
Log Manager archives Log 21. Transaction 1002 is oldest
current transaction (lowtran lsn). Cannot rename log 21.
Log S0000021 can be renamed to Log 24
© 2009 IBM Corporation
Information Management – DB2
DB2 log archive processing - Backlog
S0000021.log
S0000020.log
1000
1001
1000
1001
1002
S0000022.log
1003
1000
1003
1003
Active log directory
1
2
3
4
transaction 1000
transaction 1001
transaction 1002
transaction 1003
DB2 Log Manager
23
Activity
Action Taken
Point 1
Log S0000020 full
Log Manager begins to Archive Log 20 but FAILS.
Transaction 1000 is oldest current transaction (lowtran lsn)
Point 2
Log S0000021 full
Log Manager does not Archive log 21 until 20 is done.
Point 3
Transaction 1000 Commits
Log 20 not active log and not archived. Txn 1003 is lowtran
lsn. Log 21 is first active log. Log 23 is created
Point 4
Active log dir contains 4 logs Log 20 is not active, and unarchived. Logs 21-23 are
active logs.
© 2009 IBM Corporation
Information Management – DB2
What if Archiving does not work for a long time?
 Eventually, the filesystem becomes full ....
 DB2 cannot pre-allocate log files to maintain
LOGPRIMARY ...
 Later, run out of disk space provided by preallocated log files.
 This scenario is 'out of disk space', distinct from
'out of log space'
 Default behaviour, database comes down with
severe error
 How to avoid this ? 2 solutions ...
24
© 2009 IBM Corporation
Information Management – DB2
Log disk full condition handling
.....
BLK_LOG_DSK_FUL
.....
= YES
If NO (default)
CFG
DB2 Database
Log Buffer
DISK FULL!
Database comes down with severe
error (out of disk space)
If YES
Active Log
Files
Active log path
Update transactions wait for new log
Read only transactions continue
Write message to db2diag.log
Wait 5 minutes and retry creating new log
Check for completed log archive and rename log
25
Inactive
Unarchived
Logs
DB2 Log Manager
Offline
Archive Log
Files
© 2009 IBM Corporation
Information Management – DB2
DB2 log archive processing - FAILARCHPATH
S0000020.log
S0000021.log
S0000022.log
1000 1001 1001
1000 1002 1003
1000 1003 1003
Active log directory
1
S0000020.log
1000 1001 1001
2
4
transaction 1000
transaction 1001
transaction
1002
FAILARCH directory
26
3
transaction 1003
DB2 Log
Manager
Activity
Action Taken
Point 1
Log S0000020 full
Log Manager begins to archive Log 20 but FAILS.
Transaction 1000 is oldest current transaction (lowtran lsn)
Point 2
Log S0000021 full
Log Manager does not archive log 21 until 20 is done.
Point 3
Transaction 1000 Commits
Point 4
Archiving works again.
Log 20 no longer an active log, but is not archived. Txn
1003 is lowtran lsn. Log 21 is first active log. Log 20 is
moved to FAILPATH. Log 23 created.
Log 20 is archived from FAILPATH, Log 21 is archived
© 2009 IBM Corporation
from active log dir.
Information Management – DB2
Protecting the DB2 logs
DB2 Database
Recovery from Log media failure:
–
1.
Log Buffer
Operating System facilities
- Software RAID and Disk Mirrors
Operating System Mirroring or
Hardware RAID
RAID Hardware:
RAID 1 - Mirroring
RAID 5 - Parity Based
DB2 Logs
Mirror Copy
Primary Disk
Mirror Disk
No protection from accidental deletion of active logs.
27
© 2009 IBM Corporation
Information Management – DB2
Protecting the DB2 Active logs - Log mirroring
CFG
DB2 Database
.....
LOGPATH = /database/tp1/logs
MIRRORLOGPATH = /database/tp1/logmir
.....
Log Buffer
/database/tp1/logs
/database/tp1/logmir
DB2 Active Logs
copy 1
DB2 Active Logs
copy 2
Protection from accidental deletion of active logs.
28
© 2009 IBM Corporation
Information Management – DB2
Archiving logs to Local and Remote disks
db2 update db cfg for salesdb using logarchmeth1 disk:/dbarchlcl
db2 update db cfg for salesdb using logarchmeth2 disk:/dbarchdr
Path Includes:
– Instance Name
– Database Name
– Database Partition (NODExxxx)
salesdb
Database
Database
Directory
Table
Spaces
DB CFG
Active Logs
Recovery
History
SYS1
db2 Log
Manager
SYS2
Remote
Archive
logs
/dbarchdr/INST1/SALESDB/NODE0000
Local
Archive
logs
/dbarchlcl/INST1/SALESDB/NODE0000
29
© 2009 IBM Corporation
Log File Management in DB2
for Linux, UNIX, and Windows
THE END
Ron Castelletto
IBM Canada Lab
[email protected]
March 1, 2009
© 2009 IBM Corporation