Standard Presentation Template

Download Report

Transcript Standard Presentation Template

Explain
DB2 EXPLAIN
Explanations
Introduction
> Who are you ?
 Job function – Application Developer / DBA
 DB2 experience
> Who am I ?
 Steen Rasmussen, CA inc.
 Principal Technical Specialist DB2 Tools
 22 years marriage with DB2 since V1R1M0
 Developer, DBA, Presales, Technical Manager, Presenter
2
July 21, 2015
DISCLAIMER
> DB2 Performance is a huge topic
 You will not become performance specialists today

> This presentation is to be considered an appetizer for
more detailed presentations
 Understand the fundamentals of Explain
 A “quick start” to decrypt Explain output
 What to focus on
3
July 21, 2015
Agenda – the BIG one
(if you like the agenda and Roberta – I will be back)
> What is EXPLAIN and why performance is important
> What influences EXPLAIN output (Access Path Selection)
> EXPLAIN prerequisites

Explain table(s)

How to create the tables

History and backward compatibility
> How can EXPLAIN be executed
> Access Path – introduction

What is Explain

OPTIMIZER overview
> Explain Output – Explain EXPLAIN
4
July 21, 2015
What can you do to save DB2 CPU
(and improve performance)
> How to build SQL statements
> Static <> Dynamic SQL differences
> What if DB2 decides the “wrong” Access Path
 OPTHINT
 REOPT
 RUNSTATS – overview
> Help to the Optimizer
 Filter Factors
5
July 21, 2015
Hvad kan du gore for at spare CPU
(og forbedre performance)
> DB2 Locking
 Lock types
 Recommendations
> SQL & Index design
 STAGE1 and STAGE2 predicates – does it matter and what
is it in reality
> How do you find tuning potential – the “bad” SQL’s
> Benchmarking
 What is it
 How can we learn from our experiences
6
July 21, 2015
Some terms to understand when
“digging” into DB2 Performance issues
> RI
> Buffer Pool (BP)
> RID pool – List Prefetch
> Parallelism – DEGREE()
> Access Path (AP)
> Sargable – STAGE1 STAGE2
> Getpage - GETP
> Optimizer
> Explain
7
July 21, 2015
> Sequential Prefetch
(detection)
> Deadlock / Timeout
> Lock Avoidance
> Lock, Latch and Claim
> Constraints
> Filter Factor & Cardinality
> Correlated query
> Firstkeycardf og Fullkeycardf
What is
EXPLAIN
&
Optimizer
overview
What is EXPLAIN
> Illustrates how DB2’s Optimizer will execute a SQL statement, a
Package or a Plan (depending on how EXPLAIN is executed)
> Why is EXPLAIN a necessity

9
Can’t we simply look at the SQL-statement and estimate if it has
been coded “all right”
–
What if it’s a 12 tabeller joines
–
Or the statement is 2 MB (or “just” 24 KB)
Let’s see a cool example

Does performance mean anything – if a statement executes in
half a second or 5 minutes ?

Predict WHAT will happen when SQL changes or at a Package
REBIND after a reorganization and Runstats

The latest example – how will upgrading to DB2 V8 impact
Access Path (compare DB2 V7 Optimizer with DB2 V8)

Do you always know whether a JOIN or SUBSELECT/EXISTS
provides the best performance ??
July 21, 2015
What is EXPLAIN
> Why is Performance so important
 Bad performing SQL costs $$$$
 One benchmark illustrates:
– It costs 30$ to correct “bad” SQL in test
– It costs 1000$ to correct in production
 If response times are not optimal
– Fewer transactions will go through the pipe
– Each end user will be less productive
– Other SQL-statements will suffer due to sharing of
resources

Buffer Pools, I/O channels, Locking conflicts, contention in
shared pools like SORT area, RID pool, . . . .
 Hardware upgrade to conform to SLA
10
July 21, 2015
What’s influencing the EXPLAIN output
> Some factors to consider when comparing Explain
between two different enviroments
 Optimizer looks at Hardware type
 Optimizer looks at number of processors
 Optimizer looks at Buffer Pools
> Host variables will be replaced by “Parameter Markers”
when doing dynamic Explain – this could be a major
problem in earlier DB2 versions (pre DB2 V8) if host
variables (or column predicate) was defined differently
than the column defined in the DB2 catalog
11
July 21, 2015
What’s influencing the EXPLAIN output
> Table size (and compression)
> Column cardinality and Filter Factor
> Indexes present and the columns

FIRSTKEYCARDF and FULLKEYCARDF (details later)

Different RUNSTATS methods to collect statistics
> Clustering (Cluster Ratio) as well as clustered indicator
> Number of Index Levels (NLEVELS)
> SQL predicates (predicate analysis)
> ORDER BY and the ability to eliminate sorts
> ..........
12
July 21, 2015
EXPLAIN – OPTIMIZER overview
> What is the purpose of the Optimizer
 Decides how the “database navigates”
 “Parsing” SQL statements to check tables and columns
 Investigates statistics from the DB2 catalog (which can be
updated by RUNSTATS utility or manual)
 Decides what is the “LEAST expensive” access path
 DB2’s Optimizer is COST BASED (opposite from Oracle)
– Finds statistics from the DB2 Catalog
– Calculates Filter Factor(s) (estimated qualified #rows)
– Finds a potential number of Access Path’s
– Calculates the cost based on CPU and I/O cost
13
July 21, 2015
EXPLAIN – OPTIMIZER overview
> Cost based Optimizer – CPU and I/O
 CPU costs
– Looks at predicates (STAGE-1 or STAGE-2)
(this can be covered in another session – next page is an appetizer )
–
How many PAGE’s to look at
–
SORT – any kind
 I/O costs
– DB2 Catalog statistics (like Clusterratio which also can be
covered in a separate session)
– Buffer Pool size
– What does it cost to allocate temporary files etc.
 A lot of activity spent on looking at which indexes exist
and how these are matching the predicates
14
July 21, 2015
EXPLAIN – OPTIMIZER overview
Result Set
SQL
statement
Relational
Data Manager
Apply stage 2
predicates
and sort data
(can be expensive)
Optimized
SQL
Read Buffer
or get data (I/O)
15
July 21, 2015
Data
Manager
Buffer
Manager
Apply stage 1
predicates
Data
EXPLAIN Pre-Requisites
> EXPLAIN tables
 creator.PLAN_TABLE (minimum to do explain)
– Records Optimizers choice of Access Path
– Not immediately easy to decrypt – many “codes”
(see next slide)
 creator.DSN_STATEMNT_TABLE (optional)
– Shows COST estimates (this is HUGE in my opinion)
 creator.DSN_FUNCTION_TABLE (optional)
– Only used if UDF (User Defined Function) needs to be
explained
 Creator.DSN_STATEMENT_CACHE_TABLE (new in DB2 V8)
– Used to explain DB2 Dynamic Statement Cache
or parts of this
16
July 21, 2015
EXPLAIN TABLE content (example)
QUERYNO QBLOCKNO APPLNAME PROGNAME PLANNO METHOD CREATOR TNAME
1411
1
RQATD
1
0 SYSIBM
TABNO
SYSTABLES
1
ACCESSTYPE MATCHCOLS ACCESSCREATOR ACCESSNAME INDEXONLY
I
0
SYSIBM
DSNDTX02
N
SORTN_UNIQ SORTN_JOIN SORTN_ORDERBY SORTN_GROUPBY SORTC_UNIQ SORTC_JOIN
N
N
N
N
N
SORTC_ORDERBY SORTC_GROUPBY TSLOCKMODE TIMESTAMP
N
N
IS
REMARKS
2007073116080531
PREFETCH COLUMN_FN_EVAL MIXOPSEQ VERSION
0
N
COLLID
CAD72_2004-08-31-18.24.46
RQPAR110_ALL
COLLID
ACCESS_DEGREE ACCESS_PGROUP_ID JOIN_DEGREE JOIN_PGROUP_ID
RQPAR110_ALL
------
------
------
------
SORTC_PGROUP_ID SORTN_PGROUP_ID PARALLELISM_MODE MERGE_JOIN_COLS
------
------
CORRELATION_NAME
-
------
PAGE_RANGE JOIN_TYPE GROUP_MEMBER
A
YÄ.½T æ
WHEN_OPTIMIZE QBLOCK_TYPE BIND_TIME
SELECT
2007-07-31-16.08.03.615184
PRIMARY_ACCESSTYPE PARENT_QBLOCKNO TABLE_TYPE
17
July 21, 2015
IBM_SERVICE_DATA
0
T
OPTHINT
³ INITUK15
HINT_USED
EXPLAIN Pre-Requisites
> How to create the tables for EXPLAIN ?
 Every new DB2 version adds additional columns
–
DB2 V1R0 had no EXPLAIN (afair)
–
DB2 V1R2 had 25 columns in PLAN_TABLE
–
.........
–
DB2 V7 has 51 columns in PLAN_TABLE
–
DB2 V8 has 58 columns in PLAN_TABLE
–
DB2 V9 has 59 columns in PLAN_TABLE
 Backward and forward compatible !!!
> CREATE TABLE steen.plan_table like . . . . . . .
 Usually the Systems Programmer creates one as part of
the IVP
 Look into IBM SDSNSAMP
18
July 21, 2015
What EXPLAIN DOESN’T show
> Explain only shows SELECT, DELETE, UPDATE, INSERT
> Important not to forget the issues below when doing
performance / tuning
 RI Definitions
 TRIGGERs executed as part of the SQL-statement
 UDF’s
 Table- and Column Check Constraints
> Not always a guarantee DB2 will USE the illustrated AP
 Prefetch activities can be disabled depending on BP status
 Parallelism is decided at the execution time
 RID pool shortage
19
July 21, 2015
How to execute EXPLAIN
> EXPLAIN
 Explain Dynamic Statement cache will not be covered in
detail. Normally used during DBA Performance/Tuning
 Manual type-in via SPUFI
 BIND eller REBIND PACKAGE med EXPLAIN(YES)
20
July 21, 2015
How to execute EXPLAIN
> Manual “type in”
IQPSQLE1 --- (CAPS ON)
COMMAND ===>
--- SQL Editor --- (NULLS ON)
--- 2007/08/03 13:25:54
SCROLL ===> CSR
Edit the SQL statements that you wish to test below. When you are done
editing the SQL, hit PF3/15 or enter 'END' to return to the RC/SQL control
panel.
Host variables will be parsed out, so that you may test your SQL
repeatedly with various host values.
___ explain plan set queryno = 190 for
___ select * from sysibm.systables
___ where name = ?
___ and creator = 'PTI'
___ with UR ;
___
___ commit;
___
___ select * from plan_table where queryno=190;
___
******************************* BOTTOM OF DATA ********************************
21
July 21, 2015
How to execute EXPLAIN
> BIND / REBIND and use EXPLAIN(YES)
> Note – you will have to select from PLAN_TABLE to view
Access path information.
To Rebind
Or Not To Rebind
….
That is the QUESTION !
 Consequences can be “less than optimal” – or even
“career limiting decisions”
22
July 21, 2015
How to execute EXPLAIN
> Edit BIND or REBIND
COMMAND ===>
SCROLL ===> PAGE
********************************* TOP OF DATA *********************************
.CALL DSN PARM(S81A)
.DATA
REBIND PACKAGE(RQPAR110_ALL.RQATD.(CAD72_2004-08-31-18.24.46))
OWNER(RASST02)
QUALIFIER(RASST02)
CURRENTDATA(NO )
VALIDATE(RUN )
EXPLAIN(yes)
ISOLATION(CS)
RELEASE(COMMIT
)
DEGREE(1 )
REOPT(NONE )
KEEPDYNAMIC(NO )
DBPROTOCOL(PRIVATE)
ENCODING(EBCDIC )
IMMEDWRITE(NO )
FLAG(I);
.ENDDATA
******************************** BOTTOM OF DATA *******************************
23
July 21, 2015
BIND / REBIND with EXPLAIN warning
> Manual EXPLAIN
 This is a “What if” analysis
 Illustrates WHICH Access Path will be chosen if a
BIND or REBIND is executed
 You can check the implications / consequences
> BIND / REBIND with EXPLAIN(YES)
 DB2 will generate a NEW Access path !!!!!
 Is this a good idea ? Maybe !
 What if the DB2 catalog statistics ISN’T optimal
– No RUNSTATS executed  statistics columns in the catalog
have -1  Optimizer has NO clue about statistics for
tables, indexes, columns
24
July 21, 2015
Versioning of Explain
> How
 One extra step at Program promotion
 Can be integrated with your existing Change Management
process
– Execute EXPLAIN
– Save Explain output in a table
– Each Explain will generate a new VERSION
– Save xx versions
– Create a SQL statement which will flag packages where the
cost difference is greater than xx %

SUM(PROCMS) and SUM(PROCSU) per Explained Package
version and compare to the costs from previous Explain
– Also consider saving key values from the DB2 catalog used
by the Optimizer
25
July 21, 2015
Versioning of Explain
> Why
 The primary reason is quickly to identify WHEN and WHY
the performance of a program increased/decreased
 It is always possible to verify if performance has changes
– Primarily due to COST or AP changes
 You can see when the change happened (every Explain
has a unique timestamp)
 You can see what the DB2 Catalog statistics were at the
Explain time
26
July 21, 2015
Access Path – what is good / bad AP
> No definitive answer exists for this question – except for:
IT DEPENDS !
> Let’s look at some issues in order to find the “correct”
answer
> One example : One SQL-statement costs 0,050 CPU-sec
while another one costs 2 CPU-minutes – which one do
you want to spend time tuning ?
27
July 21, 2015
Access Path – what is good / bad AP
> The “cheap” SQL-statement is executed 100,000 times per
hour in an online transaction, while the other is executed
once every day in a batch job
 One weeks consumption:
 Online SQL: (0,050 x 100000 x 24 x 7) =840000 CPU-sec.
 Batch SQL : (2 x 60 x 7)
28
July 21, 2015
=
840 CPU-sec.
Explain
EXPLAIN
Explain EXPLAIN
> It will take hours to describe every column currently
existing in PLAN_TABLE – here are a few to focus on:
> Let’s use a real SQL-statement which has been explained
to “decrypt” the PLAN_TABLE columns.
> Description of all the PLAN_TABLE columns can be viewed
in this manual:
IBM DB2 SQL REFERENCE GUIDE (SC18-7426-04) :
http://publib.boulder.ibm.com/epubs/pdf/dsnsqj14.pdf
30
July 21, 2015
Explain EXPLAIN
EXPLAIN PLAN SET QUERYNO=75 for
SELECT
A.NAME , A.CREATOR , A.TBNAME , A.TBCREATOR , A.CREATEDBY ,
B.COLNAME , B.COLSEQ , B.ORDERING , A.CLUSTERING , A.UNIQUERULE
FROM
SYSIBM.SYSINDEXES A , SYSIBM.SYSKEYS B
WHERE ( A.CREATOR = B.IXCREATOR AND
A.NAME = B.IXNAME )
AND A.CREATOR = :hostvar1
ORDER BY A.NAME , A.CREATOR , B.COLSEQ ;
COMMIT;
SELECT * FROM PLAN_TABLE WHERE QUERYNO=75;
QUERYNO QBLOCKNO PROGNAME PLANNO METHOD CREATOR TNAME
75
1
BPASQL8
1
0 SYSIBM SYSINDEXES
75
1
BPASQL8
2
1 SYSIBM SYSKEYS
75
1
BPASQL8
3
3
ACCESSTYPE MATCHCOLS
I
1
I
2
0
ACCESSCREATOR ACCESSNAME
SYSIBM
DSNDXX01
SYSIBM
DSNDKX01
SORTN_ORDERBY
N
N
N
SORTC_UNIQ
N
N
N
31
July 21, 2015
SORTC_JOIN
N
N
N
INDEXONLY SORTN_UNIQ
N
N
N
N
N
N
SORTC_ORDERBY
N
N
Y
SORTN_JOIN
N
N
N
SORTC_GROUPBY
N
N
N
PREFETCH
L
TSLOCKMODE
IS
IS
SORTN_GROUPBY
N
N
N
Explain EXPLAIN
> QUERYNO
– The number specified doing manual Explain – or the statement number from the package/plan which
was explained.
– For programs – this number is used to identify a SQLstatement in the source code.
– In this scenario, the QUERYNO=75 which also was specified
in the manual explain.
– Did you know you can assign a fixed QUERYNO to the
SQL statement in your program so it’s easier to
compare statements when the program is changed
???!!!
32
July 21, 2015
Explain EXPLAIN
> QBLOCKNO
– A number identifying every query block inside the SQL
statement.
– A typical example is if a UNION or SUBSELECT is present
where DB2 will have to execute several statements within
one SQL statement.
– In this example the SQL-statement is a simply JOIN, which
is why we only see QBLOCKNO=1
33
July 21, 2015
Explain EXPLAIN
> PROGNAME
– Name of the program which holds the SQL-statement being
explained
– Package name when REBIND / BIND of a package is
executed using EXPLAIN(YES)
– Using dynamic explain – this column will hold the name of
the program executing the dynamic explain
– In this scenario a dynamic explain was executed using
program BPASQL8.
34
July 21, 2015
Explain EXPLAIN
> PLANNO
– Specifies which sequence the individual components are
executed inside a QBLOCKNO.
– Is this really interesting and do we need to worry about it ?
– This will be covered in the section dealing with
FILTER FACTOR – this is one of the most important
informations to pay attention to
– In this scenario we can see SYSINDEXES is accessed first
and then SYSKEYS, and finally a third component is
performed (sorting)
35
July 21, 2015
Explain EXPLAIN
> METHOD
– Describes which form of JOIN being used.

0 : First table accessed in this step (QBLOCKNO)

1 : NESTED LOOP JOIN

2 : MERGE SCAN JOIN

3 : Sorting (of some kind)
–

(ORDER, GROUP, DISTINCT, UNION)
4 : HYBRID JOIN
– We can see SYSINDEXES is accessed first, and then a
NESTED LOOP JOIN with SYSKEYS and finally a SORT is
performed to satisfy the ORDER BY
36
July 21, 2015
Explain EXPLAIN
> CREATOR – TNAME
– Table name and creator for the table accessed
– If METHOD=3 , these columns will be spaces
– In this scenario two tables are accessed:
37
July 21, 2015

SYSIBM.SYSINDEXES

SYSIBM.SYSKEYS
Explain EXPLAIN
> ACCESSTYPE
– Describes how the table mentioned is accessed.
– This is a very important information which in many
“performance exercises” can be a “first shop stop”
(why will be explained)
– In DB2 9 you can find 17 different methods – the most
important ones will be covered here:
–
38
July 21, 2015
–
I : Index Access
–
I1: One-Fetch Index scan
–
M : Superceeded by MX, MI, MU – Multiple Index Access (can be
excellent or very bad – at least two indexes from the same table
are used)
–
N : Index Scan where IN is used
–
R : Tablespace Scan – if many pages exist for this tablespace, this
can be extremely expensive
In this scenario both table accessed have ACCESSTYPE=I, so we know
indexe(s) are used – and we don’t face a tablespace scan  
Explain EXPLAIN
> MATCHCOLS
– Illustrates how many columns being used in the index listed
under ACCESSNAME.
– Used for ACCESSTYPE : I , I1 , N and MX
– The value can be:
39
July 21, 2015

o
: The entire index is scanned – warning !!!!!

>0 : The number of columns being used from the index listed
prior to Tablespace access (unless Index Only).

If the index has 4 columns and MATCHCOLS = 1 , this can lead
to non-optimal response times – depending on column
cardinality.
–
Assume the table has 1.000.000 rows
–
The index has 4 columns
–
The first column has 2 distinct values (cardinality=0.5)
–
DB2 will “guess” half the table rows qualify  TS-scan
Explain EXPLAIN
> ACCESSNAME and ACCESSCREATOR
– If an index is used to access the table, these columns
illustrates which index(es) being used.
– Use your common sense and think about the index listed –
is it the best one based on the predicates used in the
WHERE component of the SQL-statement.
– Also think about this information in conjunction with
MATCHCOLS – if more columns exist in the index compared
to what is described in MATCHCOLS – maybe an additional
pedicate can improve performance.
– In our scenario an index is used to access both tables
specified in our JOIN statement.
40
July 21, 2015
Explain EXPLAIN
> INDEXONLY
– Describes if DB2 is satisfied by ONLY scanning the index
listed under ACCESSNAME – without looking into the data
piece (tablespace).
– The value can be Y(es) or N(o)
– Often it may pay off to have an additional column in the
index to avoid the tablespace access (indexonly), and
maybe one I/O can be spared at in every statement (the
most expensive in the DB2 world). Before making this
decision – please have a closer look at
“index complications – advantages and disadvantages”
– Our JOIN scenario is using two indexes, and both have
INDEXONLY=N , meaning what is being selected can NOT
be satisfied by the columns in the 2 indexes used.
41
July 21, 2015
Explain EXPLAIN
> SORTN_xxxxxx hvor xxxxxx :
42
July 21, 2015
–
UNIQUE
: Must the “internal” table ne sorted to remove
duplicates
–
JOIN
: Is it necessary to sort due to METHOD = 2 or 4 ?
–
ORDERBY : Must the “internal” table be sorted due to ORDER BY
–
GROUPBY : Must the “internal” table be sorted due to GROUP BY
Explain EXPLAIN
> SORTC_xxxxxx hvor xxxxxx :

43
–
UNIQUE
: Must the “composite” table be sorted to remove
duplicates ?
–
JOIN
: Sorting due to METHOD = 2 or 4 ?
–
ORDERBY : Must the “composite” table be sorted due to
ORDER BY
–
GROUPBY : Must the “composite” table be sorted due to
GROUP BY
Our scenario requires the composite result table (from the
JOIN) to be sorted due to ORDER BY in the SQL-statement.
July 21, 2015
Explain EXPLAIN
> PREFETCH
– Describes which PREFETCH method MIGHT be used.

L : List Prefetch – DB2 sorts RID’s from index(es) to avoid
reading the same data page more than once.

D : Dynamic Prefetch – DB2 will start to read blocks of data
into the buffer pool asynchronously (if it pays off)

S : Sequential Prefetch – DB2 will read all “needed” pages
into the buffer pool asynchronously to save time doing I/O
– This scenario illustrates index DSNDXX01 will be used
where CREATOR fullfill the WHERE clause. Since the table
also has to be accessed, the RID’s are being sorted so DB2
only needs to read the same page once. If this index was
the CLUSTERING index – maybe LIST PREFETCH could have
been avoided.
44
July 21, 2015
Explain EXPLAIN
> TSLOCKMODE
– Describes what kind of LOCK DB2 will use for the
tablespace being accessed. Beside the TS-locks, DB2 will
do table or row locks (covered in a separate section)
– Different LOCK types will be described later, but in general
X og IX are not considered “nice” since these can prohibit
concurrent access and ultimately lead to TIMEOUTs and
DEADLOCKs.
 In our scenario we have IS (Intent Share) for both tables
being accessed, which is expected since it’s a simple
SELECT statement.
45
July 21, 2015
What can YOU do
to save DB2 CPU
and improve
performance
----Until next time

Thank You