DB2 War Stories and Scary Tales (Part 1)

Download Report

Transcript DB2 War Stories and Scary Tales (Part 1)

Session: A11
DB2 War Stories and Scary Tales
(Part 1)
Robert Goodman
Certified DB2 Professional
Certified Business Continuity Planner
May 9, 2007 3:00 p.m. – 4:00 p.m.
Platform: DB2 for z/OS
DB2 War Stories & Scary Tales
Major Areas
Database Foundation Stones
SQL Exposures
Questionable Strategies
DDL Exposures
Operational Exposures
Page: 2
DB2 on z/OS – A World Class Act!
Page: 3
Database Foundation Stones
Page: 4
Where Shall We Begin?
DB2 War Stories
And Scary Tales
Chapters
Fear of Commitment
By
Trigger Happy
DATABASE BOB
Rotate Roulette
Extents Wide Open
Think Tank Publications
Page: 5
Chapter 1
Fear of
Commitment
Page: 6
Cycling DB2
Start
DB2
Apps
Up
Apps
Up
DB2
Stop
Crash
When DB2 Does Lots of Recovery
Start:
IRLM
DM
DDF
Recover:
Unresolved URs
Page: 7
Once Upon A Time
In the kingdom of DB2, the time came for a systems
downtime. Many tasks had been busy during the
week. It came time to cycle DB2. It was cancelled, but
was naughty and wouldn’t come down. After two
hours, the operator forcefully killed DB2. Maintenance
was applied and the system was brought up. DB2
lingered and wouldn’t wake up. After three hours, the
operator killed DB2 for a second time. We tried once
again to wake up DB2. This time we asked the wizards
at IBM how to slay the problem. They declared that we
should just let it run. Twelve hours later DB2 came up.
Why did it take so long? And can this happen even
today?
Page: 8
What Happened and Why?
Force
DB2
DB2
Up
Sparse Updating Long URs
X X
XX
X
X
X
DAY3
DAY2
DAY1
DB2 Restart:
Recover Incomplete URs
Log1
Log2
A1
A2
Log3 Log1 Log2
- DB2 Log Records 1) Do Records
2) Undo Records
Page: 9
DB2 Restart
Start DB2
A1
Log Log
Forward Recovery
Backward Recovery
App App App App
All Recoveries Must Complete
Restart Complete
Without Deferred Restart
Deferred Restart
Forward Recovery
X
App App App App
Page: 10
Log Log
Backward
Recovery
Restart Complete
Start DB2
A1
Applications
Access DB
(Quicker)
X
App RECP App App
Applications
Access DB
App App App App
Turning On Deferred Restart
 LIMIT BACKOUT Parameter
 AUTO – Automatically recovers once DB2 is up
 YES – RECOVER POSTPONED command resumes recovery
 NO – Process all inflight and inabort URs
 BACKOUT DURATION Parameter
 log records during backward recovery before deferring
Page: 11
DB2 Restart Questions
• Does deferred restart always
work?
• In rare cases it fails
• Won’t deferred restart fix all my
problems?
• Normally
• Deferred pagesets still need recovery
• What is status of pagesets after
restart?
• Most pagesets are available
• Deferred pagesets are unavailable
• What is the exposure?
• Applications which use deferred
pagesets will fail
Page: 12
• How can we detect long
running URs?
• DB2 log message - DSNR035I
UNCOMMITTED UR AFTER ###
CHECKPOINTS
• Can I automatically cancel
long running URs?
• Netview can be used to do this
Chapter 2
Trigger Happy
Page: 13
The Trigger Concept
Program1
Program1
SQL1
SQL1
SQL
Master
Create
Trigger
Page: 14
Database
Invokes
Trigger
SQL
After
Trigger
Trigger Facts:
• Programs are unaware
• Synchronous
• Part of UR
• Firing cost of a FETCH
• Add in trigger SQL
• Plus cost of trigger work
• Adds to SQL elapsed
Once Upon A Time
The king declared that dashboards would help rule
the kingdom. This was a daunting task for many
programs had to be changed. Triggers came to the
rescue, they were quick and easy. Since 1 trigger
was good, many were even better. They multiplied
like rabbits and soon the whole kingdom was full
of triggers. The word came down from on high that
things were slow. Many programs were dragging
but no changes had been made. They noticed that
when triggers were added, darkness descended
upon the kingdom. What had gone wrong? And
how could it be fixed?
Page: 15
Multiple Triggers
Multiple Triggers
• All are synchronous
• All in UR
• Multiple trigger’s SQL
• Fire one after another
• Fired in timestamp order
• Serially add to elapsed time
Program
SQL
SQL
Master
Database
Invokes Before
Trigger
Trigger
After After
Trigger Trigger
SQL Elapsed Time
Page: 16
Triggers & Stored Procedure
Program
SQL
DB2MSTR SQL
Database
Invokes
Trigger
Triggers & Stored Procedures
Invoke
DDF
After
Trigger
Stored
Procedure
SQL Elapsed Time
Page: 17
• Synchronous
• SP program load time
• SP execution time
• Can make calls outside DB2
• Greatly extends total times
Stored Procedure w/Transition Tables
Using Transition Tables
• Synchronous
• Transition tables
• Create table time
• Use table time
• Delete time
• SP program load time
• SP execution time
• Calls outside DB2 time
• Adds significantly to times
Program
SQL
MSTR
Database
Invokes
Trigger
After
Trigger
Invoke
DDF
SQL
Stored
Procedure
DM
SQL Elapsed Time
Page: 18
How Expensive Are Triggers?
Fire Trigger – Cost of a FETCH
Statement Triggers – Cheapest
Row Triggers – Cheap
SP Triggers – Expensive
SP Triggers w/Trans Vars - Priceless
+ Trigger SQL – Cost of SQL
+ WHEN – Invoked every time trigger event happens
+ Transition Variables – Cost of transition table
+ Invoke Stored Procedure – DDF, Start thread
+ Resident Stored Procedure – work in SP
+ Non-Resident Stored Procedure –start SP + work in SP
Page: 19
Generally Poor Reasons to Use Triggers
1)
2)
3)
4)
5)
6)
7)
8)
9)
Page: 20
Just because they’re quick
Lazy man’s solution
Easier than changing programs
Temp fixes that become permanent
For data replication
To populate summary tables
To enforce simple value constraints
To enforce RI constraints
To maintain dashboards (oops!)
Modifying Triggers
Update TRIGGER
Refresh TRIGGER
DROP TRIGGER
CREATE TRIGGER
DROP & CREATE TRIGGER
How Triggers Are Maintained
Page: 21
Trigger – Firing Sequence
SYSIBM.SYSTRIGGERS
CREATEDTS
“It is also used to order the
execution of multiple triggers.”
Firing
Order
1
2
3
Page: 22
Firing
Order
Trigger 1
2007-01-01
Trigger 2
2007-01-15
Trigger 3
2007-01-30
DROP &
CREATE
1
Trigger 1
2
Trigger 3
3
Trigger 2
On 2007-02-15
2007-01-01
2007-01-30
2007-02-15
Who Is Aware of Triggers?
Source Awareness
 Yes - DB2
 Should Be - DBAs
 Maybe – Programmers
 NO – DB2 Utilities
Triggering Awareness
 NO – Applications
 NO – SQL
 NO – DB2 Optimizer
 NO – Explain
 NO – Resource Limit Facility
 NO – Constraints
Page: 23
Triggers - A Run-Time Event
Invisible Program Dependencies
Programs A B C D . . .
Invisible Causes of Breakage
SQL
MSTR
SQL
Trigger
DDF
Page: 24
AT
TT
AT
AT
AT
Calls Outside DB2
DBM1
Stored Procedure
 Trigger
 Trigger SQL
 DDF
 Transition tables
 Stored procedure
 Application tables
 Calls outside DB2
 Unavailable resources
 RI & check constraints
 Utilities
 Deadlocks & timeouts
Any Break Causes
All Triggering SQL to Fail!
When DDF reloaded,
the resident SP-X the
transition variables
no longer matched
the trigger and SP-X.
The trigger had to be
dropped & recreated
to correct this. SP-X
had to be changed to
include new column
in TableA
Trigger Traps
The Scenario
1) Update trigger on TableA
Starts resident stored procedure(SP-X)
Inserts before image into log – TableB
2) DBA adds column to TableA
3) Days Later - SQL updating TableA starts failing
Corrective Actions
1) Drop the trigger (may require down-time)
2) Drop the stored procedure
3) Add column to parameter lists & SQL
4) Recreate the stored procedure
5) Recreate the trigger
Page: 25
DBA
Alters
Triggers
Chapter 3
Rotate
Roulette
Page: 26
The ROTATE Concept
ROTATE DDL Command
1) Delete Oldest Partition Rows
2) Reuse Oldest Partition
1
Newest
Oldest
Limitkey A
E
Delete Old Data
Page: 27
2
3
4
Old
New
Newer
Limitkey B
Limitkey C
Limitkey D
New Last
Part Limitkey
Rotating Partitions
ALTER TABLE table
ROTATE PARTITION FIRST TO LAST
ENDING AT (limitkeys) RESET;
Page: 28
ROTATE In Action
ROTATE DDL
LP 1
catg.DSNDB.db.ts.I0001.A001
LP 4
LP 2
catg.DSNDB.db.ts.I0001.A002
LP 1
LP 3
catg.DSNDB.db.ts.I0001.A003
LP 2
LP 4
catg.DSNDB.db.ts.I0001.A004
LP 3
Logical Partition
SYSTABLEPART(V8)
Page: 29
Physical Partition
Dataset
Logical Partition
After The ROTATE
A Series of ROTATEs
1st Rotate
2nd Rotate
3rd Rotate
4th Rotate
Limitkey
P1
P1
P1 1
catg.DSNDB.db.ts.I0001.A00
P1
P1
‘A’
‘E’
P2
P2
P2 2
catg.DSNDB.db.ts.I0001.A00
P2
P2
‘B’
‘F’
P3
P3
P3 3
catg.DSNDB.db.ts.I0001.A00
P3
P3
‘C’
‘G’
P4
P4
P4 4
catg.DSNDB.db.ts.I0001.A00
P4
P4
‘D’
‘H’
Page: 30
Once Upon A Time
Version 8 was up and running well. The call came to
convert to table based partitioning and reuse the oldest
partition. The ROTATE command was chosen to do this
non-disruptive deed. Suddenly the phone began to ring
and thick darkness covered the database cubicles. A
quick check revealed that two parts were mired in
REORP status. User processing ground to a halt their
workloads were in peril. The database guardians
countered with concurrent REORG to fix the problem.
This crashed and burned. Share level NONE REORG
was called upon. When it finished, the sun came out
and life was good again. What happened to disrupt the
peace of this database kingdom?
Page: 31
Why Did ROTATE Set REORP?
Maximum Limitkey in
last partition is not
enforced by “index
based partitioning”
INSERT INTO tableX
(D_YEAR)
VALUES (‘2008’)
Table_IBP
Last Part
Limitkey
(‘2007’)
Index Based
Partitioning
Table_TBP
Last Part
Limitkey
(‘2007’)
Table Based
Partitioning
Page: 32
The 1st time only,
ROTATE converts
Indexed Based
partitioning to
Table Based
partitioning.
Because limitkey
is not enforced in
Index Based, the
1st and last parts
have to be put in
REORP status to
eliminate this
potential issue.
Converting to Table-Controlled Partitioning
ALTER INDEX clustering_index NOT CLUSTER;
(conversion to table-controlled partitioning)
COMMIT WORK;
ALTER INDEX clustering_index CLUSTER;
(clustering index reestablished)
It’s Simple to Do This Before Rotate
Page: 33
Lowering Limitkeys in Last Part
REORG TABLESPACE tablespace
SCOPE PENDING SHRLEVEL NONE STATISTICS COPYDDN
(Data keys beyond limitkey, discarded during REORG!)
Page: 34
Data Outage
ALTER TABLE table
ALTER PARTITION # ENDING AT (limitkeys);
(This & next partition put in REORP, data outage!)
Which Partition Number is Used?
-DIS DB(db) SPACENAM(ts)
NAME TYPE PART STATUS
SRG9700 TS 0002 RW
-THRU 0004
SRG9700 TS 0001 RW
Page: 35
Cmd/DDL/Utility Physical Part
-DISPLAY DB
X
ALTER ROTATE
Logical Part
Order of Display
X
Other ALTERs
X
Image Copy
X
Unload
X
Load
X
Reorg
X
Recover/Rebuild
X
DB2 Datasets
X
X - Rebalance
Rotating Logical Partitions
1
1st LP Logical Part = ?PP Physical Part
SELECT PARTITION, LOGICAL_PART
FROM SYSIBM.SYSTABLEPART
WHERE DBNAME
= ‘db'
AND TSNAME
= ‘ts'
AND LOGICAL_PART = 1
UNLOAD TABLESPACE tablespace PART ?PP
2
LOAD TABLESPACE tablespace PART ?PP
REPLACE using dummy SYSREC
Data Outage
Advisable
3
?PP
4
-START DB(db) SPACE (ts) PART(?PP) ACCESS FORCE
5
ALTER TABLE table
ROTATE PARTITION FIRST TO LAST
ENDING AT (limitkeys) RESET; (1st LP data deleted, becomes last LP)
Advisable
6
Page: 36
COPY TABLESPACE tablespace DSNUM ?PP SHRLEVEL CHANGE
(rotated partition now recoverable)
Mitigating Rotate Issues
 Avoiding Data Outages
 New Tables – Use “table based partitioning”
 Last Partition – Don’t set max/min limitkey (may cause -327 SQLCODEs)
 Converting from “Indexed Based Partitioning”
 Know Logical Partitions Prior to Rotate
 Query SYSIBM.SYSTABLEPART
Page: 37
Outage
 Don’t convert to table based partitioning with ROTATE
 Use …ALTER INDEX index NOT CLUSTER
 Then …ALTER INDEX index CLUSTER
 Plan for Outage on 1st ROTATE
 Query for values beyond limitkey before reorg
 ALTER ASC/DESC limitkey from max/min value
 Downtime REORG to remove REORP status
Rotate Dangers
1)
Knowing which physical part is 1st logical part
2)
Long running DELETEs to empty 1st logical part
(42 secs to ROTATE / delete 1,000,000 row partition)
3)
ROTATE can cause an outage (REORP status)
(convert to table based partitioning or ALTER limitkeys)
4)
Which part # to use for Commands / DDL / Utilities
5)
Mistakenly rotating the wrong table
(DDL reuse or finger fault)
6)
Adding partitions to ROTATEd tables
(Confusion factor on first & last parts)
7)
With ascending keys, trying to insert null key inserts
8)
Recoverability after a ROTATE
9)
Attempting to REBALANCE ROTATEd tables
Page: 38
The Rotate Questions
•
Which partition will rotate next?
•
•
•
•
•
•
Set MAXVALUE in limitkey of last partition
Show we use rotate to Convert to Table
Based Partitioning?
•
•
•
Yes – if indexed based partitioning (convert
to table based, last 2 partitions in REORP)
•
Yes – If last part limitkey is altered
No – table based partitioning & limitkey
doesn’t need to be altered
Can rotate be blocked?
Not advisable (last 2 partitions in REORP)
Does rotate behave differently with
Index Based Partitioning?
•
Page: 39
Yes - the first time
Does rotate rename datasets?
•
Does rotate interrupt availability?
•
•
1st Logical partition - query
SYSIBM.SYSTABLEPART
•
No – does SQL deletes instead
Does rotate delete SYSCOPY
entries?
•
No – puts a rotate row in SYSCOPY
If accidentally rotate can I
recover?
•
No – rotate row in SYSCOPY
prevents recovery prior to ROTATE
Chapter 4
Extents
Wide Open
Page: 40
The Extent Concept
VOL001
1st Extent
catg.DSNDB.db.ts.I0001.A001
VOL002
2nd Extent
Extent
Consolidation
DB2 requests space
from z/OS which
finds blocks of space
and updates the
VTOC and Catalog.
Page: 41
VOL003
3rd Extent
Once Upon A Time
Life in the database kingdom was good. Autonomic
features had eliminated many servant duties. One
day a troubled user called. A table was broken and
needed to be recovered. Luckily it had only had five
million rows. We proclaimed that it would be back
in merely moments. Tapes were mounted, disks
were spinning and the clock was ticking. Five
minutes turned to ten and then to fifteen. After
having many discussions with management, the
recovery finally finished. Why did a small table
take so long to recover?
Page: 42
Extent Evolution
20th
Century
DASD
z/OS
Rules
z/OS Rules
DASD
Reality
21st
DASD
Century
DASD Subsystem
Page: 43
Extent Limits
• Logical Extent Limits
• Max Extents / Dataset
• 255 extents z/OS 1.6
• 7,257 extents z/OS 1.7
• Max Extents / Volume
• 123 extents / volume
• Large DASD increase this issue
(Mod 27s & 54s)
• Extent Rule
• 5 pieces primary extent
• Whatever can get on secondary
Page: 44
Solving Systemic Extent Issues
• IBM Strategies
• Tolerate More Extents
• z/OS 1.7 – 7,257 extents
• Make It Harder to Hit Limits
• SMS Extent Consolidation
• Automate Extent Management
• V8 Sliding Secondaries
Page: 45
How Extents Affect Utilities (V8)
300
250
Table 1.2 M Rows
and 1 Index
200
150
118
100
63
60
Re
or
g
Re
co
ve
r
Re
bu
ild
Un
l
y
Co
p
115
25
16
Lo
ad
50
0
Page: 46
305
276
256
oa
d
Elapsed Seconds
350
1 Extent
85 Extents
Avoiding Extent Issues
DASD Fragmentation
STOGROUPs by size
Standardized allocations
z/OS Slow Allocations
Fewer Extents
Automated Method
Use sliding secondaries
Can cause fragmentation
Managed Method
STOGROUPs by size
Standardized allocations
From z/OS We Need
Faster Allocation Search
Faster Cataloging z/OS
Page: 47
In Closing
• Beware
DB2 War Stories
And Scary Tales
• Be Knowledgeable
By
DATABASE BOB
• Be Careful
• Do Excellent Work
Think Tank Publications
Page: 48
Session: A11
DB2 War Stories and Scary Tales (Part 1)
Robert Goodman
[email protected]
Page: 49