Transcript Document

Monitoring and Tuning Oracle for z/OS and
Oracle for z/Linux
Thomas Niewel
Oracle Deutschland GmbH
[email protected]
Agenda
•
•
•
•
•
Tuning Why ?
Reasons for bad Response Time
Statspack
Diagnosing reasons for bad response Times
SQL Tuning
– TKPROF
– Explain Plan
•
Page 3
WLM
Why do we need to tune ?
• Users report „bad“ response times because of
– CPU Time + Wait Time
– Poor performing queries
– SQL-Tuning
– „bad“ Database parameters
– Bottlenecks in „System“
(Operating System, WLM, IO/Subsystem etc.)
Page 4
What can be the reasons for “bad”
Response Time
• High CPU Usage
Page 5
•
High I/O Usage
•
Memory Usage
•
Network problems
•
„idle“ System
•
Operating System (WLM, VM)
Diagnose from the Oracle point of view
Statspack
A short overview
Page 6
Statspack – a short overview
 spcreate.sql
once)
- installs Statspack (run only
 statspack.snap
- data capture (procedure)
 spreport.sql
- reporting
 spdoc.txt
- user documentation
 sppurge.sql
- delete Statspack data
7
Pagespdrop.sql
- drop Statspack
Capturing data
•
Prerequisite: timed_statistics=true
•
Use stored procedure statspack.snap
SQL> execute statspack.snap;
Page 8
Capturing data
•
Get a baseline for future comparisons
•
Capture snapshots
•
–
–
across peak load
across batch window
–
The time between snapshots should be <= 30 minutes
Capture can be automated
– Use OS utility e.g. cron
–
Page 9
Use dbms_job
– spauto.sql shipped as example
Reporting with Statspack
•
All data is held in an Oracle database
•
Report between two or more snapshots
–
•
Page 10
cannot report across instance startup
Spreport.sql creates a report
Reporting with Statspack
SQL>
@spreport
DB Id DB Name
Instance# Instance
----------- ---------- ---------- ---------1361567071 DB21
1
MAIL
Completed Snapshots
Instance
DB Name
SnapId
Snap Started
Snap Level
---------- ---------- ------ ---------------------- ---------DB21
DB21
1 17 Aug 2003 10:00:16
5
2 17 Aug 2003 10:30:28
5
Enter beginning Snap Id: 1
Enter ending
Snap Id: 2
Enter name of output file [sp_1_2] : <enter name or return>
Page 11
Analyzing a Statspack report
•
Top down analysis
•
Summary page
–
–
–
–
–
•
Page 12
Enviroment
Load profile
Instance efficiency
Shared pool usage
Top 5 Timed Events
Top SQL
Environment section
STATSPACK report for
DB Name
DB Id
Instance
Inst Num Release
Cluster Host
------------ ----------- ------------ -------- ----------- ------- -----------RECONPRD
1403107896 RECONPRD
Snap Id
1 9.2.0.2.0
Snap Time
NO
lin390t1
Sessions Curs/Sess Comment
------- ------------------ -------- --------- ------------------Begin Snap:
2 03-Mar-03 11:28:01
10
5.1
End Snap:
31 04-Mar-03 11:58:04
17
5.5
Elapsed:
30.05 (mins)
Cache Sizes (end)
~~~~~~~~~~~~~~~~~
Page 13
Buffer Cache:
256M
Std Block Size:
16K
Shared Pool Size:
48M
Log Buffer:
128K
Load profile
•
Contains a number of common ratios
•
Allows characterisation of the application
•
Can point to problems
– high hard parse rate
–
–
Page 14
high IO rate
high login rate
Load profile
•
Useful if you have a comparable baseline
•
What has changed?
Page 15
–
txn/sec change implies changed workload
–
redo size/txn implies changed transaction mix
–
physical reads/txn implies changed SQL or plan
Load profile
Load Profile
~~~~~~~~~~~~
Redo size:
Logical reads:
Block changes:
Physical reads:
Physical writes:
User calls:
Parses:
Hard parses:
Sorts:
Logons:
Executes:
Transactions:
% Blocks changed per Read:
Rollback per transaction %:
Page 16
Per Second
--------------19,057.68
2,408.15
98.64
990.47
6.92
76.40
7.08
0.02
29.22
24.73
63.79
0.91
4.10
36.52
Recursive Call %:
Rows per Sort:
Per Transaction
--------------20,937.67
2,645.70
108.37
1,088.18
7.61
83.93
7.78
0.02
32.10
27.17
70.08
72.76
153.46
Instance Efficiency
•
Gives an overview of how the instance is performing
•
Can also be used with a comparable baseline
•
Shared pool Statistics allow quick identification of cursor
sharing problems
Page 17
Instance Efficiency
Instance Efficiency Percentages (Target 100%)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Buffer Nowait %:
99.99
Redo NoWait %:
Buffer Hit
%:
59.00
In-memory Sort %:
Library Hit
%:
99.94
Soft Parse %:
Execute to Parse %:
88.89
Latch Hit %:
Parse CPU to Parse Elapsd %:
56.55
% Non-Parse CPU:
Shared Pool Statistics
Memory Usage %:
% SQL with executions>1:
% Memory for SQL w/exec>1:
Page 18
Begin
-----38.86
43.41
39.28
End
-----66.81
87.22
80.21
99.97
99.99
99.69
99.98
99.93
Top 5 Timed Events
•
•
•
•
•
CPU time – real work
Shows where Oracle sessions are waiting
Compare Wait Time to elapsed time
% Total Wait Time shows potential benefits
Use as basis for directed drilldown
% Total
Event
Waits
Time (s) Ela Time
------------------------------- ------------ ----------- -------CPU time
78,588
50.24
enqueue
1,560,523
59,961
38.33
db file sequential read
1,635,253
6,324
4.04
db file scattered read
14,620,725
5,907
3.78
control file parallel write
32,816
1,396
.89
Page 19
Top 5 Timed Events
•
Sample drilldowns
–
CPU Time „on CPU“
–
enqueue
e.g TX Enqueue
–
db file sequential read
Index Access
–
Page 20
db file scattered read
Scan Operations
control file parallel write
Top SQL
•
Helps to find problem statements
– SQL ordered by Gets
– SQL ordered by Reads
– SQL ordered by Executions
– SQL ordered by Parse Calls
Page 21
Top SQL
CPU
Buffer Gets
Executions
Gets per Exec
Elapsd
%Total Time (s)
Time (s) Hash Value
--------------- ------------ -------------- ------ -------- --------- ---------79,562,398
8,114
9,805.6
34.6 27182.71 28127.71 1525844323
Module: SQL*Plus
SELECT MAX(STMT_BKG_DATE_CLOSE)
TMT_ACCT_ID = :b1
AND ((:b2 = :b3
STMT_MSG_TYPE != :b5
AND ((:b8 IS NULL
LL
FROM GAH_T_STATEMENTS
AND STMT_CARRIER != :b4
AND (:b6 IS NULL
AND
OR :b6 = STMT_CARRIER )
AND STMT_MSG_TYPE != :b9 ) OR (:b8 IS NOT NU
AND :b8 = STMT_MSG_TYPE ))) OR (:b2 = :b13
Page 22
WHERE S
AND STMT_CARRIER
I/O Statistics
•
Help to find I/O Problems
– Tablespace IO Stats
– File IO Stats
Page 23
I/O Statistics
Tablespace
-----------------------------Av
Av
Av
Reads Reads/s Rd(ms) Blks/Rd
Av
Buffer Av Buf
Writes Writes/s
Waits Wt(ms)
-------------- ------- ------ ------- ------------ -------- ---------- ------
GAH_TS00_DT_MEDIUM
15,242,896
160
0.4
6.1
41,066
0
22,468
18.4
2
11.2
1.0
130,299
1
9
15.6
2
6.9
1.0
86,699
1
39
43.8
2
1.7
1.6
101,560
1
0
0.0
2
8.4
1.0
34,867
0
1
0.0
GAH_TS00_IX_ITEM
210,346
GAH_TS00_IX_MEDIUM
207,433
RECONPRD_TS00_TEMP
185,865
GAH_TS00_IX_ITEM_REF
155,027
Page 24
Diagnosing high CPU usage
• High CPU Usage
Page 25
•
High I/O utilization
•
Memory Usage
•
Network problems
•
„idle“ System
•
Operating System (WLM, VM)
Diagnosing high CPU usage
-Operating System-
• Linux/390
– sar -u 3 3333
– iostat -x 3
– vmstat 3
– top
– Etc.
•
Z/OS
– SDSF
– RMF
– Omegamon
– etc.
Page 26
Diagnosing high CPU usage
•
What can be the reason for „high CPU“ Usage ?
– Shared_Pool / SQL-Cache
– db_file_multiblock_read_count
– Buffer_Cache/ Buffer_Pool
– How can Statements with a great # of buffergets be
seperated ?
– Statspack
– SQL Script
Page 27
Diagnosing high CPU usage
CPU
Buffer Gets
Executions
Gets per Exec
%Total Time (s)
Elapsd
Time (s) Hash Value
--------------- ------------ -------------- ------ -------- --------- ---------4,494,662
SELECT *
155
FROM GAH_T_STATEMENTS
((:b2 = :b3
28,997.8
OR :b6 = STMT_CARRIER ) AND ((:b8 IS NULL
TYPE ))) OR (:b2 = :b13
Page 28
2414.11 3961361411
AND
AND STMT_MSG_TYPE != :b5
STMT_MSG_TYPE != :b9 ) OR (:b8 IS NOT NULL
Module: SQL*Plus
1049.63
WHERE STMT_ACCT_ID = :b1
AND STMT_CARRIER != :b4
AND (:b6 IS NULL
2.0
AND
AND :b8 = STMT_MSG_
AND STMT_CARRIER = :b14
AND STMT_MSG_T
Diagnosing high CPU usage
spool cpu_users.lst
select buffer_gets,disk_reads,executions,
ratio_to_report(buffer_gets) over () * 100
buffer_ratio,
ratio_to_report(disk_reads) over () * 100
disk_ratio,
sql_text
from v$sqlarea
order by buffer_ratio desc;
spool off
Page 29
Diagnosing high CPU usage
BUFFER_GETS DISK_READS EXECUTIONS BUFFER_RATIO DISK_RATIO
----------- ---------- ---------- ------------ ---------SQL_TEXT
---------------------------------------------------------------------------------------19564429
154
46908
65.9945773 5.40350877
select t.schema, t.name, t.flags, q.name from system.aq$_queue_tables t,
ys.aq$_queue_table_affinities aft, system.aq$_queues q where aft.table_objno = t.objno
and aft.owner_instance = :1 and
q.table_objno = t.objno and q.usage = 0 and
bitand(t.flags, 4+16+32+64+128+256) = 0 for update of t.name, aft.table_objno skip
locked
Page 30
SQL Tuning
•
Check Object Statsitics
– Use DBMS_STATS
•
Analyze Execution Plan
– Explain Query / V$SQL_PLAN
– Optimize Query
– Optimize Indexes
– Index Only Access, Function Based Indexes
Page 31
Diagnose
Page 32
•
High CPU Usage
•
High I/O utilization
•
Memory Usage
•
Network problems
•
„idle“ System
•
Operating System (WLM, VM)
High I/O utilization
•
Linux/390
– sar -d 3 33333
– iostat -x 3
– vmstat 3
• Z/OS
– RMF
– Omegamon etc
Page 33
High I/O utilization
•
Disk I/O
– Disk access is slower than memory access (Factor
5000 to 100000)
– One physical disk is able to perform 100-150 I/O´s per
Second
– Disk Reponse Times (Read operations)
– 2ms (Read from disk cache)
– 10ms – 15ms (Physical Reads)
Page 34
High I/O utilization
•
Reasons for High I/O utilization
– Database Cache too small (DB_CACHE_SIZE)
– Sortarea too small (sort_area_size)
– Hasharea too small (hash_area_size)
– Too many Checkpoints
– Ineffective Execution Plans (e.g. Full-TableScans which are not necessary)
Page 35
High I/O utilization
•
Increase Cache Size
– Reduces physical I/O Operations
– Z/OS
– Limited by 31 Bit Arcitecture
– Multiple Adress Spaces help to improve the
Memory management
Page 36
High I/O utilization
• An Oracle server instance has a single SGA regardless
of the number of address spaces or regions configured.
• The user context is distributed across all AS
AS1
AS2
AS3
ASn
Single Shared SGA Across Address Spaces
Page 37
High I/O utilization
•
Linux/390
– The default maximum SGA size on Linux/390 is 750
MB without changing the base adress
– the maximum SGA size to 1 GB by changing the SGA
base address
Page 38
High I/O utilization
Top 5 Timed Events
% Total
Event
Waits
Time (s)
Wt Time
-------------------------------------------- ------------ ------------ ------db file sequential read
89,086,819
11,009
93.13
9,875,076
776
6.56
file open
505,227
23
.19
log file sync
440,409
8
.07
11,042,510
3
.03
db file scattered read
latch free
Page 39
High I/O utilization
Tablespace I/O Stats:
Tablespace
Av
Av
Av
Av
Reads Reads/s Rd(ms) Blks/Rd
Buffer Av Buf
Writes Writes/s
Waits Wt(ms)
-------------- ------- ------ ------- ------------ -------- ---------- -----RECEIVABLE_T_01
18,398,460
213
12.0
1.6
59,325
1
4,892,686
0.0
79
13.2
1.6
27,462
0
4,506
0.0
62
9.0
1.3
18,388
0
35,935
0.0
1400
21.7
1.8
72,563
1
217,799
0.0
SO_T_03
6,827,475
SO_I_01
5,356,393
PO_I_01
4,641,732
Page 40
High I/O utilization
RMF Report (Monitor 1; RMF Postprocessor)
D I R E C T
STORAGE
GROUP
DBORACLE
DBORACLE
DBORACLE
DBORACLE
DBORACLE
DBORACLE
DBORACLE
DBORACLE
DBORACLE
DBORACLE
DBORACLE
DBORACLE
DBORACLE
DBORACLE
DBORACLE
DEV
NUM
7651
7652
7653
7654
7655
7656
7657
7658
7659
765A
765B
765C
765D
765E
765F
DEVICE
TYPE
33903
33903
33903
33903
33903
33903
33903
33903
33903
33903
33903
33903
33903
33903
33903
Page 41
VOLUME
SERIAL
LEOR00
LEOR01
LEOR02
LEOR03
LEOR04
LEOR05
LEOR06
LEOR07
LEOR08
LEOR09
LEOR0A
LEOR0B
LEOR0C
LEOR0D
LEOR0E
LCU
008F
008F
008F
008F
008F
008F
008F
008F
008F
008F
008F
008F
008F
008F
008F
A C C E S S
DEVICE
AVG AVG
ACTIVITY RESP IOSQ
RATE
TIME TIME
0.817
4
0
0.878
9
0
0.502
2
0
108.968
56
52
0.828
3
0
98.779
50
48
2.768
2
0
0.943
3
0
1.003
4
0
0.945
3
0
0.217
3
0
0.833
4
0
0.963
4
0
0.013
3
0
0.935
4
0
D E V I C E
AVG AVG AVG
DPB CUB DB
DLY DLY DLY
0.0 0.0 0.0
0.0 0.0 0.0
0.0 0.0 0.0
0.0 0.0 0.0
0.0 0.0 0.0
0.0 0.0 0.0
0.0 0.0 0.0
0.0 0.0 0.0
0.0 0.0 0.0
0.0 0.0 0.0
0.0 0.0 0.0
0.0 0.0 0.0
0.0 0.0 0.0
0.0 0.0 0.0
0.0 0.0 0.0
A C T I V I T Y
AVG AVG AVG
PEND DISC CONN
TIME TIME TIME
0.2 2.6 0.8
0.2 0.3 8.7
0.2 0.0 1.5
0.2 2.4 0.8
0.2 2.3 0.8
0.2 1.7 0.8
0.3 1.3 0.7
0.2 2.3 0.7
0.2 3.5 0.8
0.2 2.2 0.8
0.2 2.2 0.8
0.2 2.5 0.8
0.2 2.7 0.9
0.2 2.6 0.5
0.2 3.0 0.8
%
DEV
CONN
0.06
0.76
0.08
0.08
0.06
0.13
0.20
0.07
0.08
0.07
0.02
0.06
0.09
0.00
0.07
%
DEV
UTIL
0.28
0.79
0.08
0.32
0.25
0.42
0.56
0.28
0.43
0.28
0.06
0.28
0.35
0.00
0.35
%
DEV
RESV
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
AVG
NUMBER
ALLOC
1.0
3.0
6.0
1.0
1.0
1.0
1.0
1.0
1.0
1.0
1.0
2.0
1.0
1.0
1.0
%
ANY
ALLOC
100.0
100.0
100.0
100.0
100.0
100.0
100.0
100.0
100.0
100.0
100.0
100.0
100.0
100.0
100.0
%
MT
PEND
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
High I/O utilization
•
RMF Report – Explanations
– IOSQ TIME = UCB Queueing time
– Avg Pend Time = ms, all Path´s to logical volume are
busy
– AVG Resp Time = Connect Time + Dicsonnect Time +
Pending Time + IOSQ
Page 42
SQL Tuning
•
Check Object Statsitics
– Use DBMS_STATS
• Analyze Execution Plan
– Explain Query
– Optimize Query
– Optimize Indexes
– Index Only Access, Function Based Indexes
Page 43
Diagnose
Page 44
•
High CPU Usage
•
High I/O utilization
•
Memory Usage
•
Network problems
•
„idle“ System
•
Operating System (WLM, VM)
Memory Problems
•
How to determine Paging/Swapping
– Linux/390
– VMSTAT
– Z/OS
– RMF
– OMEGAMON
•
Page 45
Reasons for Paging/Swapping
– Too many processes/users
– Database Parameters which are too generously
– DB_CACHE_SIZE
– HASH_SIZE
– SQL_CACHE
Diagnosing high CPU usage
-Operating System-
• High CPU Usage
Page 46
•
High I/O utilization
•
Memory Usage
•
Network problems
•
„idle“ System
•
Operating System (WLM, VM)
Diagnosing Network problems
•
Latency
– LAN: < 1ms
– WAN: < 10ms - 500ms
– ISDN: < 50ms
– VPN: 100-500 ms
•
•
Badwidth
–
11-18 Mbit (Copper)
–
100 Mbit (Copper, fibre)
–
1 Gbit (fibre)
Great number of small packets
– tcp_nodelay
– SDU, TDU-Parameters (not available on z/os)
Page 47
Diagnosing high CPU usage
-Operating System-
• High CPU Usage
Page 48
•
High I/O utilization
•
Memory Usage
•
Network problems
•
„idle“ System
•
Operating System (WLM, VM)
Idle System
• One CPU is 100% used – All other CPU´s are
idle
– Reason
– dedicated Server
– Only one process is running
– Solution
– Parallel Query
– Not useful for OLTP Aplications
– Split work - run more Processes
Page 49
Idle System
• Latch Contentions
– Use Statspack to diagnose
•
Enqueue Waits
– Use Statspack to diagnose
– Often Block Contentions because of too small
initrans, Freelist, Freelist goup settings
• Parsing because the use of Literals
– Use Statspack to diagnose
– Use CURSOR SHARING
– Use Bind Variables
Page 50
Idle System
Top 5 Timed Events
~~~~~~~~~~~~~~~~~~
Event
% Total
Waits
Time (s) Ela Time
-------------------------------------------- ------------ ----------- -------enqueue
1,560,523
78,588
50.24
59,961
38.33
1,635,253
6,324
4.04
14,620,725
5,907
3.78
32,816
1,396
.89
CPU time
db file sequential read
db file scattered read
control file parallel write
-------------------------------------------------------------
Page 51
Idle System
Enqueue activity for DB: RECONPRD
Instance: RECONPRD
Snaps: 2 -31
-> Enqueue stats gathered prior to 9i should not be compared with 9i data
-> ordered by Wait Time desc, Waits desc
Avg Wt
Eq
Requests
Succ Gets Failed Gets
Waits
Wait
Time (ms)
Time (s)
-- ------------ ------------ ----------- ----------- ------------- -----------TX
438,961
438,941
20
114
512,902.49
58,471
TC
34,530
34,530
0
6,904
369.61
2,552
PS
9,526,323
9,386,524
139,799
1,517,315
.25
381
CF
42,761
42,751
10
23
897.57
21
CI
55,594
55,594
0
12
6.17
0
HW
11,356
11,356
0
8
.13
0
-------------------------------------------------------------
Page 52
SQL-Tuning
•
•
Prerequisites
–
Use Cost based optimizer
–
DBMS_STATS (important)
Explain Query
–
•
Create Plan Table: UTLXPLAN
Visualize Execution Plan
–
UTLXPLS
–
UTLXPLP
Note: Scripts are located in xxxxxxxx.yyyyyyyy.SQL library (z/OS)
$ORACLE_HOME/rdbms/admin (Linux/Unix)
Page 53
SQL-Tuning
SQL> explain plan for select a.* from scott.emp a, scott.dept b where a.deptno=b.deptno;
Explained.
SQL> save explain
Created file explain.sql
SQL> @?/rdbms/admin/utlxpls
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------------------| Id
| Operation
|
Name
| Rows
| Bytes | Cost
|
-------------------------------------------------------------------|
0 | SELECT STATEMENT
|
|
14 |
560 |
2 |
|
1 |
|
|
14 |
560 |
2 |
|
2 |
TABLE ACCESS FULL
| EMP
|
14 |
518 |
2 |
|*
3 |
INDEX UNIQUE SCAN
| PK_DEPT
|
1 |
3 |
|
NESTED LOOPS
-------------------------------------------------------------------Predicate Information (identified by operation id):
PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------3 - access("A"."DEPTNO"="B"."DEPTNO")
Page 54
SQL-Tuning
•
Optimizer features which help to improve
execution plans
–
Function based indexes (very important)
– SELECT * From emp where upper(ename) =
´SMITH´
–
Bitmap indexes (Useful in case of Read Only)
– Useful for Low Cardinality columns
–
Parameter: Optimizer_index_cost_adj
– Optimizer access path selection can be
adjusted to be more index friendly
Page 55
SQL-Tuning
•
SQLTRACE
–
Prerequisite: timed_statistics=true
–
Activate
– Alter Session set SQL_trace=true
– dbms_system.set_sql_trace_in_session
–
Use TKPROF to show execution statistics
– sys=no,explain=uid/pw
Page 56
z/OS WLM
•
•
Everything works fine without peaks (e.g.CPU 30%)
Common Problems we had with WLM(during peak
periods)
–
The „Everything is important syndrom“
– User didn´t classify any discretionary goals
– Everything had the same importance
–
Enclave(Sess) with response time goals
– Enclave goes to last period (which was discretionary)
shortly after Logon
–
No default service class for OSDI
– Mistake in classification rules will result in SYSOTHER
being used – discretionary goal
Page 57
Oracle for Linux /390
•
•
We had tuning work
–
Linux on an LPAR
–
Linux under VM
We did not have any VM related problems
• The reasons for performance bottlenecks were
Page 58
–
Execution plan of a few SQL Queries
–
I/O Subsystem
Oracle for z/OS
•
The reasons for performance bottlenecks were
–
WLM configuration
–
Execution plan of a few SQL Queries
–
I/O Subsystem
– Variances in disc response time
Page 59
?
Page 60