SQL on Fire! Part 1

Download Report

Transcript SQL on Fire! Part 1

SQL on Fire!
Part 1
Tips and Tricks around SQL
Agenda
Easy
 Part







I
SQL vs. SQL PL example
Error handling
Tables out of nowhere
Pivoting
Aggregation
Deleting duplicate rows
Alternatives to Min and Max
 Part




Difficult


II
Mass deletes
Order processing
Moving rows between tables
Recursion
Merge
Calculating nesting levels
Motivation – The OLTP Mantra
Reduce Codepath
Reduce Logical and Physical I/O
Reduce Network Traffic
Minimize Lock Contention
Avoid Deadlocks
High performance starts with the application.
SQL vs. SQL PL - Example
 DDL
CREATE TABLE emp(empID
salary
name
deptID
INTEGER NOT NULL
INTEGER,
VARCHAR(20),
INTEGER);
CREATE UNIQUE INDEX empIdxPK
ON emp(empID) INCLUDE (deptID);
ALTER TABLE emp ADD PRIMARY KEY (empID);
SQL vs. SQL PL
EmpID
Salary
Name
DeptID
1
10000
Jones
100
2
12000
Wang
200
3
11000
Zhang
100
4
20000
Smith
300
5
15000
Kim
200
6
13000
Chen
400
7
14000
Jou
300
8
11000
Yung
100
9
16000
Hall
400
10
14000
Zhu
100
11
13000
Xia
200
SQL vs. SQL PL
 Business request:
Give everyone in deptID 100 a 5% raise.
SQL vs. SQL PL – First Attempt
DECLARE empcur CURSOR FOR
SELECT * FROM emp FOR UPDATE;
OPEN empcur;
emploop: LOOP
Read whole table
FETCH empcur
INTO vempid, vsalary, vname, vdeptid;
IF SQLSTATE = ‘02000’ THEN LEAVE emploop; END IF;
IF deptid = 100 THEN
SET vsalary = vsalary * 1.05;
UPDATE emp SET salary = vsalary
WHERE CURRENT OF empcur;
END IF;
END LOOP emploop;
Only need few rows
CLOSE empcur;
SQL vs. SQL PL - Second Attempt
DECLARE
SELECT
FROM
WHERE
empcur CURSOR FOR
*
Retrieve all columns
emp
deptID = 100 FOR UPDATE;
OPEN empcur;
emploop: LOOP
FETCH empcur
INTO vempid, vsalary, vname, vdeptid;
SET vsalary = vsalary * 1.05;
IF SQLSTATE = ‘02000’ THEN LEAVE emploop; END IF;
UPDATE emp SET salary = vsalary
WHERE CURRENT OF empcur;
END LOOP emploop;
Use only one column
CLOSE empcur;
SQL vs. SQL PL - Third Attempt
DECLARE
SELECT
FROM
WHERE
empcur CURSOR FOR
salary
Use cursor
emp
deptID = 100 FOR UPDATE;
OPEN empcur;
emploop: LOOP
FETCH empcur INTO vsalary;
SET vsalary = vsalary * 1.05;
IF SQLSTATE = ‘02000’ THEN LEAVE emploop; END IF;
UPDATE emp SET salary = vsalary
WHERE CURRENT OF empcur;
END LOOP emploop;
CLOSE empcur;
Trivial logic
SQL vs. SQL PL - Final Attempt
UPDATE emp
SET salary = salary * 1.05
WHERE deptID = 100;
Tips:
WHERE deptID IN (100, 200)
WHERE deptID
IN (SELECT deptID FROM dept
WHERE deptlocation = ‘Shenzhen’)
SET salary = salary * CASE WHEN name = ‘Zhang’
THEN 1.05 ELSE 1.03
END
SQL vs. SQL PL – Aggregation
 Return all departments with more than 5 employees:
SELECT deptID FROM emp
GROUP BY deptID
HAVING COUNT(*) > 5
 Return the salary of the most highly paid employee by
department for departments with more than 5 employees:
SELECT deptID, MAX(salary) FROM emp
GROUP BY deptID
HAVING COUNT(*) > 5
Create table if it does not exist
BEGIN
DECLARE CONTINUE HANDLER FOR SQLSTATE '42710'
BEGIN /* Already exists? Ignore */ END;
CREATE TABLE T(c1 INT);
END
Tip 1:
There is no need to avoid SQL errors
as long as they are handled…
Tip 2:
.. except when statement rollback implies
undo operation from the log
SQL PL – DDL cleanup
BEGIN
DECLARE CONTINUE HANDLER FOR SQLSTATE '42704'
BEGIN /* Does not exist? Ignore */ END;
DROP TABLE T1;
DROP TABLE T2;
DROP SEQUENCE S1;
END
Tip:
Lost errors are very hard to find.
So be specific and never ignore generic SQLEXCEPTION.
SQL PL – Is this table empty?
IF EXISTS(SELECT 1 FROM emp WHERE dept = 100)
THEN .. END IF;
Instead of:
IF EXISTS(SELECT 1 FROM emp WHERE empid = 5)
THEN
UPDATE emp SET salary = salary * 1.05
WHERE empID = 5;
END IF;
Use:
UPDATE emp SET salary = salary * 1.05
WHERE empID = 5;
Tip: Don’t be afraid of NOTFOUND
(SQLSTATE ‘02000’). NOT FOUND not an error
Tables from nowhere
 Create tables without INSERT:
VALUES (1, 2),
or
VALUES (CAST(?
CAST(?
(?, ?),
(3, 4), …. (10, 11)
AS INTEGER),
AS DOUBLE)),
… (?, ?)
 Use anywhere like
INSERT INTO T
VALUES (1, 2), (3, 4), (5, 6)
VALUES for multi row INSERT
CREATE TABLE T(c1 INT, c2 VARCHAR(10));
INSERT INTO T
VALUES (?, ?),
(?, ?),
(?, ?),
…
(?, ?);
Tip:
For mass inserts prepare INSERT statements with:
1, 10, 100, 1000, 10000 rows of parameter markers each.
Execute the biggest that fits the remaining load in a loop.
PIVOT and UNPIVOT
CREATE TABLE Sales(Year
INTEGER,
Quarter INTEGER,
Results INTEGER);
Year
Quarter
Results
2004
1
20
2004
2
30
2004
3
15
2004
4
10
2005
1
18
2005
2
40
2005
3
12
2005
4
27
PIVOT and UNPIVOT
CREATE TABLE SalesAgg(year
q1
q2
q3
q4
INTEGER,
INTEGER,
INTEGER,
INTEGER,
INTEGER);
Sales
Q1
Q2
Q3
Q4
2004
20
30
15
10
2005
18
40
12
27
PIVOT
SELECT Year,
MAX(CASE WHEN
THEN
MAX(CASE WHEN
THEN
MAX(CASE WHEN
THEN
MAX(CASE WHEN
THEN
FROM Sales
GROUP BY Year
Quarter
Results
Quarter
Results
Quarter
Results
Quarter
Results
= 1
END)
= 2
END)
= 3
END)
= 4
END)
AS Q1,
AS Q2,
AS Q3,
AS Q4
Tip:
Use MAX() because it is supported for all comparable types
including strings
PIVOT
Access Plan:
----------RETURN
(
1)
|
GRPBY
(
2)
|
FETCH
(
3)
/----+---\
IXSCAN
TABLE: SALES
(
4)
|
INDEX: SALESIDX
UNPIVOT
SELECT Year, Quarter, Results
FROM SalesAgg AS S,
LATERAL(VALUES(1, S.q1),
(2, S.q2),
(3, S.q3),
(4, S.q4))
AS Q(Quarter, Results);
Tip:
Use LATERAL (or TABLE) to allow correlation of
S.q* to the left.
UNPIVOT
Access Plan:
----------RETURN
(
1)
|
NLJOIN
(
2)
/------+-----\
TBSCAN
TBSCAN
(
3)
(
4)
|
|
TABLE: SALESAGG
TABFNC: GENROW
Aggregation
 Problem:
DB2 does not support user defined aggregates
 Thoughts:
– XMLAGG() provides aggregation without loss of data
– Use mathematical rules
 Examples:
– Aggregate concatenation
– Aggregate multiplication
XML functions – a primer
 XMLAGG()
– Aggregates XML values into an XML sequence
 XMLELEMENT()
– Tags a scalar value and returns XML
e.g. XMLELEMENT(NAME ‘x’ 5) => <x>5</x>
 XMLSERIALIZE()
– Casts an XML value into a string
Aggregate concatenation
CREATE TABLE Employee(name VARCHAR(15),
dept VARCHAR(15));
Name
Miso
John
Serge
Lee
Mark
Jack
Lily
Berni
Dept
Solutions
Development
Solutions
L3
ID
L3
Quality
Solutions
Aggregate concatenation
Dept
Names
Solutions
Berni, Miso, Serge
Development
John
L3
Jack, Lee
ID
Mark
Quality
Lily
Aggregate concatenation
Strip last comma
SELECT Dept,
SUBSTR(Names, 1, LENGTH(names) -1)
FROM
Replace end tags with commas
(SELECT
Strip start tags
Dept,
REPLACE
XML to VARCHAR
(REPLACE
Aggregate in order of names
(XMLSERIALIZE
(CONTENT XMLAGG(XMLELEMENT(NAME a, name)
ORDER BY name)
AS VARCHAR(60)), '<A>', ''),
'</A>', ',') AS Names
FROM Employee
GROUP BY Dept) AS X;
Tag a name to become XML
Aggregate multiplication
CREATE TABLE probabilities( model VARCHAR(10),
event VARCHAR(10),
percent FLOAT);
Model
Boing 737
Airbus 320
Boing 737
Airbus 320
Airbus 320
Boing 737
Event
Engine 1
Engine 1
Engine 2
Engine 2
Fin
Fin
Percent
0.001
0.0009
0.001
0.0009
0.002
0.0018
Aggregate multiplication
 a = EXP(LOG(a))
X* Y = EXP(LOG(X * Y))
X * Y = EXP(LOG(X) + LOG(Y))
 PROD(Xi)i=1..n = EXP(SUM(LOG(Xi)i=1..n)
Aggregate multiplication
SELECT
model,
DEC(EXP(SUM(LOG(percent))), 8, 7) AS percent
FROM probabilities
WHERE event IN (‘Engine 1’, ‘Fin’)
GROUP BY model
Model
Percent
Boing 737
0.000018
Airbus 320
0.000018
Retrieving MAXimum row
CREATE TABLE emp(name
VARCHAR(10),
dept
VARCHAR(10),
salary INTEGER);
CREATE INDEX emp_ind ON
emp(dept, salary DESC);
 Standard using selfjoin:
SELECT name, salary FROM emp
WHERE salary = (SELECT MAX(salary) FROM emp
WHERE dept = ‘SQL Compiler’
AND dept = ‘SQL Compiler’);
Retrieving MAXimum row
SELECT name, salary FROM emp
WHERE dept = 'SQL Compiler'
ORDER BY salary DESC
FETCH FIRST ROW ONLY;
Name
Dept
Salary
Frank
Rewrite
20000
Harry
SQL Compiler
18000
Janet
SQL Compiler
19000
Gwen
SQL Compiler
22000
Jason
Optimizer
21000
Retrieving MAXimum row
Access Plan:
-----------RETURN
(
1)
|
FETCH
(
2)
/----+---\
IXSCAN
TABLE: EMP
(
3)
|
INDEX: EMP_IND
Delete duplicate rows
CREATE TABLE Inventory(Item
INTEGER,
Quantity INTEGER,
InvDate DATE);
CREATE UNIQUE INDEX InvIdx
ON Inventory(Item ASC, InvDate DESC);
Delete duplicate rows
Item
1
1
2
3
3
3
4
4
5
6
Quantity
30
25
100
4
8
6
12
0
28
1
InvDate
04 Oct 2003
01 Nov 2003
01 Nov 2003
04 Oct 2003
01 Nov 2003
18 Dec 2003
04 Oct 2003
01 Nov 2003
18 Dec 2003
18 Dec 2003
Delete duplicate rows
Item
1
2
3
4
5
6
Quantity
25
100
6
0
28
1
InvDate
01 Nov 2003
01 Nov 2003
18 Dec 2003
01 Nov 2003
18 Dec 2003
18 Dec 2003
Delete duplicate rows - classic
DELETE FROM Inventory AS D
WHERE (Item, InvDate)
NOT IN (SELECT Item, InvDate
FROM Inventory I
WHERE D.Item = I.Item
ORDER BY InvDate DESC
FETCH FIRST ROW ONLY);
Delete duplicate rows - improved
 Using OLAP
DELETE FROM
(SELECT row_number()
OVER(PARTITION BY Item
ORDER BY InvDate DESC) AS rn
FROM Inventory)
WHERE rn > 1;
Tip:
Remove ORDER BY clause if it doesn’t matter
which duplicates get eliminated.
Delete duplicate rows
Access Plan:
-----------RETURN
(
1)
|
DELETE
(
2)
/---+---\
FETCH
TABLE: INVENTORY
(
3)
/---+---\
FILTER
TABLE: INVENTORY
(
4)
|
IXSCAN
(
5)
|
INDEX: INVIDX
Conclusion
 Exploit SQL to:
Increase concurrency
Reduce I/O
Reduce code-path
Make the application more readable
 SQL provides powerful support
Tip:
Part II with even meaner examples after the break
SQL on Fire! Part 1
Serge Rielau
IBM
[email protected]