All about Joins Tony Hasler UKOUG TECH14 December 2014 L Win a copy of the book - SQL QUIZ http://tonyhasler.wordpress.com.

Download Report

Transcript All about Joins Tony Hasler UKOUG TECH14 December 2014 L Win a copy of the book - SQL QUIZ http://tonyhasler.wordpress.com.

All about Joins
Tony Hasler
UKOUG TECH14 December 2014
L
Win a copy of the book - SQL QUIZ
http://tonyhasler.wordpress.com
Agenda
What is a join?
ANSI join syntax and outer joins
Left lateral joins
Join methods
Hash join input swapping
Warning and disclaimer!
Some statements made in the forthcoming
slides are not true when hash join input
swapping is considered!
We will consider hash join input swapping later in the
presentation
What is a join?
What is a join in SQL?
A join is an operation on an ordered pair of row sources
I will refer to the first element of the pair as the driving row source
and the second element the probed row source
Row sources can be elements in the FROM clause of an SQL
query block
− Tables
− Data dictionary views (unmerged)
− Inline views (unmerged or created by CBO transformations)
− Factored subqueries
− Results from the TABLE and XMLTABLE “operators”
Or unnested subqueries ( EXISTS, NOT EXISTS, IN, NOT IN, etc.)
Or intermediate results generated from other joins
Tables to work with
The simplest possible join
SELECT * FROM T1, T2;
If there are M rows in T1 and N rows in T2 there are M x N rows in
the result set.
In this case the result set will have 5 x 5 = 25 rows
A more complex case
SELECT * FROM T1, T2, T3, T4
WHERE T1.C1 > 1
AND T1.C1=T2.C2
AND T2.C2 = T3.C3
AND T3.C3 > T4.C4;
The number of joins is always one less than the number of row
sources (in this case 3 joins for 4 row sources)
One possible join tree can be depicted as:
(((T1  T2)  T3)  T4)
Possible join trees
There are120 possible join trees for four tables
The CBO will only consider the 24 join trees where the driving row
source of the second and subsequent joins is the intermediate
result from previous joins.
− The CBO may consider (((T3  T4)  T1)  T2)
− The CBO will not consider ((T3  T4)  (T1  T2))
Using the restricted set of join trees considered by the CBO we can
consider the join order as fully specifying the join tree.
The original vision for joins in the SQL language
SQL is a declarative language. You specify what you want to do
not how to do it.
Join order is not a programmers concern
All predicates are equal under Codd!
− Logically, all predicates in the WHERE clause of a query block
can be evaluated after all the joins have completed.
The comma-separated syntax of the FROM clause and the
separation of the FROM and WHERE clauses reflects that vision.
ANSI join syntax and outer joins
ANSI join syntax
SELECT *
FROM T1
JOIN T2 ON T1.C1 = T2.C2
CROSS JOIN T3 JOIN T4 ON T3.C3 > T4.C4
WHERE T1.C1 > T4.C3
ANSI syntax includes join predicates in the FROM clause and
selection predicates in the WHERE clause.
A join order is explicitly specified
In the above SQL statement, the CBO will ignore the distinction
between predicates and ignore the specified join order!
Heresy?
Outer Joins
An outer join has a preserved row source and an optional row
source
Any rows in the preserved rows source that do not match rows in
the optional row source are included in the join results
These extra rows have no value (NULL) for columns from the
optional row source
Implications
− The operands of an outer join are semantically different
− The preserved row source must precede the optional row source
in the join order.
− There is now a distinction between join predicates and
selection predicates
SELECT *
FROM t1
LEFT JOIN t2 ON t1.c1 = t2.c2
LEFT JOIN t3 ON t1.c1 = t3.c3
WHERE t1.c1 != 3
ORDER BY t1.c1;
C1
C2
C3
1
2
2
4
4
4
5
5
5
SELECT *
FROM t1
LEFT JOIN t2 ON t1.c1 = t2.c2
LEFT JOIN t3 ON t1.c1 = t3.c3 and t1.c1 != 3
ORDER BY t1.c1;
C1
C2
C3
1
2
2
3
3
4
4
4
5
5
5
Only legal join orders are ((T1  T2)  T3) and ((T1  T3)  T2)
SELECT *
FROM t1 LEFT JOIN
(t2 LEFT JOIN t3 ON t2.c2 = t3.c3)
ON t1.c1 = t2.c2
ORDER BY t1.c1;
C1
C2
C3
1
2
2
3
3
3
4
4
4
5
5
5
SELECT c1,c2,c3
FROM t2
LEFT JOIN t3 ON t2.c2 = t3.c3
RIGHT JOIN t1 ON t1.c1 = t2.c2
ORDER BY t1.c1;
C1
C2
C3
1
2
2
3
3
3
4
4
4
5
5
5
Theoretically no legal join order! The CBO has to transform this
query.
WITH q1
AS (SELECT *
FROM t2 LEFT JOIN t3 ON t2.c2 = t3.c3)
SELECT *
FROM t1 LEFT JOIN q1 ON t1.c1 = q1.c2
ORDER BY t1.c1; -------------------------------------| Id | Operation
| Name |
-------------------------------------|
0 | SELECT STATEMENT
|
|
|
1 | SORT ORDER BY
|
|
|* 2 |
HASH JOIN OUTER
|
|
|
3 |
TABLE ACCESS FULL | T1
|
|
4 |
VIEW
|
|
|* 5 |
HASH JOIN OUTER
|
|
|
6 |
TABLE ACCESS FULL| T2
|
|
7 |
TABLE ACCESS FULL| T3
|
--------------------------------------
Outer joins with extensions to traditional syntax
SELECT *
FROM t1, t2, t3
WHERE t1.c1 = t2.c2(+) AND t3.c3 = t2.c2(+)
ORDER BY t1.c1;
SELECT *
FROM t1
LEFT JOIN t2 ON t1.c1 = t2.c2
RIGHT JOIN t3 ON t3.c3 = t2.c2;
SELECT *
FROM t1 CROSS JOIN t3 LEFT JOIN t2
ON t1.c1 = t2.c2 AND t2.c2 = t3.c3;
Lateral Joins
Lateral joins with the TABLE operator
SELECT p.*
FROM v$session s,
TABLE (DBMS_XPLAN.display_cursor
(s.sql_id, s.sql_child_number)) p
WHERE sid in (123,456);
LEGAL ENTITY, department or author (Click Insert | Header & Footer)
Month Day, Year
Lateral joins in 12c
SELECT *
FROM t1
,LATERAL (
SELECT t2.c2, MEDIAN (t2.c2) OVER () med
FROM t2
WHERE t2.c2 BETWEEN t1.c1 - 3 AND t1.c1) v
WHERE t1.c1 = v.c2
ORDER BY t1.c1;
In ANSI syntax use the keywords CROSS APPLY (for inner lateral
joins) and OUTER APPLY (for outer lateral joins)
Other types of join
Full outer joins (two preserved row sources)
Partitioned outer joins (potentially multiple rows in the result set
from one preserved row).
Anti joins (standard and null aware in 11g onwards)
Semi joins (standard and null accepting in 12c onwards)
NOTE: A Partial Join is a 12c optimizer transformation not a
special type of join
Join methods
There are 3½ join methods
Nested loops join
Hash join
Merge join
Merge join Cartesian (a variation on a merge join)
Variations on a theme:
Full and partial partition-wise joins
Buffered joins for parallel queries
Bloom filtering can be applied, primarily in parallel queries
Nested loops can be pre-fetched or batched
Nested loops join
For every row in the driving row source, find corresponding rows in
the probed row source.
SELECT /*+ leading(t1) use_nl(t2) */ *
FROM t1, t2
----------------------------------WHERE t1.c1 = t2.c2;
| Id | Operation
| Name |
----------------------------------|
0 | SELECT STATEMENT
|
|
|
1 | NESTED LOOPS
|
|
|
2 |
TABLE ACCESS FULL| T1
|
|* 3 |
TABLE ACCESS FULL| T2
|
-----------------------------------
Nested loops join with a hash cluster (1)
CREATE CLUSTER cluster_hash
(
ck
)
HASHKEYS 3
HASH IS ck;
CREATE TABLE tch1
(
ck
INTEGER
,c1
INTEGER
)
CLUSTER cluster_hash ( ck );
INTEGER
Nested loops join with a hash cluster (2)
SELECT /*+ leading(t1) use_nl(tch1) */ *
FROM t1, tch1
WHERE t1.c1 = tch1.ck;
----------------------------------| Id | Operation
| Name |
----------------------------------|
0 | SELECT STATEMENT
|
|
|
1 | NESTED LOOPS
|
|
|
2 |
TABLE ACCESS FULL| T1
|
|* 3 |
TABLE ACCESS HASH| TCH1 |
-----------------------------------
Nested loops join with an index
CREATE INDEX t2_i1
ON t2 (c2);
SELECT /*+ leading(t1) use_nl_with_index(t2 (c2)) */ *
FROM t1, t2
WHERE t1.c1 = t2.c2; -----------------------------------| Id | Operation
| Name |
-----------------------------------|
0 | SELECT STATEMENT
|
|
|
1 | NESTED LOOPS
|
|
|
2 |
TABLE ACCESS FULL| T1
|
|* 3 |
INDEX RANGE SCAN | T2_I1 |
------------------------------------
Pros and cons of nested loops joins
Advantages
− Assuming a suitable index is available, nested loops joins scale
linearly
− Only blocks containing rows to be selected are accessed
− The only join method that support lateral joins
− Supports theta joins (e.g. t1.c1 > t2.c2)
− No in-memory workarea
Disadvantages
− Usually works poorly without an index and indexes are
expensive!
− Indexes access a table with single block reads
− Index range scans may access the same block multiple times
Interview sound bite: Used with two small tables
Hash join
Create an in-memory hash cluster from the contents of the driving
row source. Use an upside-down nested loops join from the probe
row source into the in-memory hash cluster.
SELECT /*+ leading(t1) use_hash(t2) no_swap_join_inputs(t2) */ *
FROM t1, t2
-----------------------------------WHERE t1.c1 = t2.c2;
| Id | Operation
| Name |
-----------------------------------|
0 | SELECT STATEMENT
|
|
|* 1 | HASH JOIN
|
|
|
2 |
TABLE ACCESS FULL| T1
|
|
3 |
INDEX FULL SCAN | T2_I1 |
------------------------------------
Pros and cons of hash joins
Advantages
− No index on the join column is needed!
− When full table scans are used blocks are only visited once
(excluding visits to the workarea)
− When full table scans are used multi-block reads are possible
− More tolerant of cardinality errors that nested loops joins
− Supports hash join input swapping
Disadvantages
− Requires a workarea that limits scalability
− Only equijoins are supported (e.g. t1.c1 = t2.c2)
− May visit blocks in the probed row source that contain no rows in
the result set.
Interview sound bite: used when joining a small table with a large
table
Hash join input swapping
Hash join input swapping
Exchange the two operands of the hash join. According to the hints
the join order is not affected by the swap!
SELECT /*+ leading(t2 t1) use_hash(t1) swap_join_inputs(t1) */ *
FROM t1, t2
WHERE t1.c1 = t2.c2;
-----------------------------------| Id | Operation
| Name |
-----------------------------------|
0 | SELECT STATEMENT
|
|
|* 1 | HASH JOIN
|
|
|
2 |
TABLE ACCESS FULL| T1
|
|
3 |
INDEX FULL SCAN | T2_I1 |
------------------------------------
The three benefits of hash join input swapping
The intermediate result of a join operation can be the probed row
source of a hash join.
The optional row source in an outer join can be the driving row
source of a hash join.
The subquery in a semi-join or an anti-join can be the driving row
source of a hash join.
However, hash join input swapping does not allow the intermediate
result set from one join to be the optional row source in a later
outer join.
Furthermore, hash join input swapping doesn’t facilitate bushy
joins like the one shown earlier: ((T3  T4)  (T1  T2))
SELECT /*+ leading(t3 t4 t1 t2)
use_hash(t4) use_hash(t1) use_hash(t2)
swap_join_inputs(t4) swap_join_inputs(t1)
swap_join_inputs(t2) */ *
FROM t1 LEFT JOIN t2 ON t1.c1 = t2.c2
JOIN t3 ON t1.c1 = t3.c3 JOIN t4 ON t3.c3 = t4.c4;
--------------------------------------| Id | Operation
| Name |
--------------------------------------|
0 | SELECT STATEMENT
|
|
|* 1 | HASH JOIN RIGHT OUTER|
|
|
2 |
INDEX FULL SCAN
| T2_I1 |
((T2 
|* 3 |
HASH JOIN
|
|
|
4 |
TABLE ACCESS FULL | T1
| T3)))
|* 5 |
HASH JOIN
|
|
|
6 |
TABLE ACCESS FULL | T4
|
|
7 |
TABLE ACCESS FULL | T3
|
---------------------------------------
(T1  (T4 
Questions?
http://tonyhasler.wordpress.com