No Slide Title

Download Report

Transcript No Slide Title

Chapter 6
How to code subqueries
Murach’s SQL Server 2008, C6
© 2008, Mike Murach & Associates, Inc.
Slide 1
Objectives
Applied
 Code SELECT statements that require subqueries.
 Code SELECT statements that use common table expressions
(CTEs) to define the subqueries.
Knowledge
 Describe the way subqueries can be used in the WHERE,
HAVING, FROM and SELECT clauses of a SELECT statement.
 Describe the difference between a correlated subquery and a
noncorrelated subquery.
 Describe the use of common table expressions (CTEs).
Murach’s SQL Server 2008, C6
© 2008, Mike Murach & Associates, Inc.
Slide 2
Objectives (continued)
 Explain the difference between a correlated subquery and a
noncorrelated subquery.
 Given a SELECT statement that uses a subquery, explain how the
result set of the subquery will affect the final result set.
Murach’s SQL Server 2008, C6
© 2008, Mike Murach & Associates, Inc.
Slide 3
How to use subqueries
 A subquery is a SELECT statement that’s coded within another
SQL statement.
 A subquery can return a single value, a result set that contains a
single column, or a result set that contains one or more columns.
 A subquery that returns a single value can be coded, or introduced,
anywhere an expression is allowed.
 A subquery that returns a single column can be introduced in place
of a list of values, such as the values for an IN phrase.
 A subquery that returns one or more columns can be introduced in
place of a table in the FROM clause.
 A subquery that’s used in a WHERE or HAVING clause is called
a subquery search condition or a subquery predicate. This is the
most common use for a subquery.
Murach’s SQL Server 2008, C6
© 2008, Mike Murach & Associates, Inc.
Slide 4
How to use subqueries (continued)
 The syntax for a subquery is the same as for a standard SELECT
statement. However, a subquery doesn’t typically include the
GROUP BY or HAVING clause, and it can’t include an
ORDER BY clause unless the TOP phrase is used.
 Subqueries can be nested within other subqueries. However,
subqueries that are nested more than two or three levels deep
can be difficult to read and can result in poor performance.
Murach’s SQL Server 2008, C6
© 2008, Mike Murach & Associates, Inc.
Slide 5
Four ways to introduce a subquery in a
SELECT statement
1.
2.
3.
4.
In a WHERE clause as a search condition
In a HAVING clause as a search condition
In the FROM clause as a table specification
In the SELECT clause as a column specification
Murach’s SQL Server 2008, C6
© 2008, Mike Murach & Associates, Inc.
Slide 6
A SELECT statement that uses a subquery in the
WHERE clause
SELECT InvoiceNumber, InvoiceDate, InvoiceTotal
FROM Invoices
WHERE InvoiceTotal >
(SELECT AVG(InvoiceTotal)
FROM Invoices)
ORDER BY InvoiceTotal
The value returned by the subquery
1879.7413
The result set
(21 rows)
Murach’s SQL Server 2008, C6
© 2008, Mike Murach & Associates, Inc.
Slide 7
How subqueries compare to joins
 Like a join, a subquery can be used to code queries that work with
two or more tables.
 Most subqueries can be restated as joins and most joins can be
restated as subqueries.
Advantages of joins
 The result set of a join can include columns from both tables. The
result set of a query with a subquery can only include columns
from the table named in the outer query, not in the subquery.
 A join tends to be more intuitive when it uses an existing
relationship between the two tables, such as a primary key to
foreign key relationship.
 A query with a join typically performs faster than the same query
with a subquery, especially if the query uses only inner joins.
Murach’s SQL Server 2008, C6
© 2008, Mike Murach & Associates, Inc.
Slide 8
Advantages of subqueries
 You can use a subquery to pass an aggregate value to the outer
query.
 A subquery tends to be more intuitive when it uses an ad hoc
relationship between the two tables.
 Long, complex queries can sometimes be easier to code using
subqueries.
Murach’s SQL Server 2008, C6
© 2008, Mike Murach & Associates, Inc.
Slide 9
A query that uses an inner join
SELECT InvoiceNumber, InvoiceDate, InvoiceTotal
FROM Invoices JOIN Vendors
ON Invoices.VendorID = Vendors.VendorID
WHERE VendorState = 'CA'
ORDER BY InvoiceDate
The same query restated with a subquery
SELECT InvoiceNumber, InvoiceDate, InvoiceTotal
FROM Invoices
WHERE VendorID IN
(SELECT VendorID
FROM Vendors
WHERE VendorState = 'CA')
ORDER BY InvoiceDate
The result set returned by both queries
(40 rows)
Murach’s SQL Server 2008, C6
© 2008, Mike Murach & Associates, Inc.
Slide 10
The syntax of a WHERE clause that uses an IN
phrase with a subquery
WHERE test_expression [NOT] IN (subquery)
How to use subqueries with the IN operator
 You can introduce a subquery with the IN operator to provide
the list of values that are tested against the test expression.
 When you use the IN operator, the subquery must return a
single column of values.
 A query that uses the NOT IN operator with a subquery can
typically be restated using an outer join.
Murach’s SQL Server 2008, C6
© 2008, Mike Murach & Associates, Inc.
Slide 11
A query that returns vendors without invoices
SELECT VendorID, VendorName, VendorState
FROM Vendors
WHERE VendorID NOT IN
(SELECT DISTINCT VendorID
FROM Invoices)
The result of the subquery
(34 rows)
Murach’s SQL Server 2008, C6
© 2008, Mike Murach & Associates, Inc.
Slide 12
The result set of vendors without invoices
(88 rows)
The query restated without a subquery
SELECT Vendors.VendorID, VendorName, VendorState
FROM Vendors LEFT JOIN Invoices
ON Vendors.VendorID = Invoices.VendorID
WHERE Invoices.VendorID IS NULL
Murach’s SQL Server 2008, C6
© 2008, Mike Murach & Associates, Inc.
Slide 13
The syntax of a WHERE clause that compares an
expression with the value returned by a subquery
WHERE expression comparison_operator [SOME|ANY|ALL]
(subquery)
How to compare the result of a subquery with an
expression
 You can use a comparison operator in a search condition to
compare an expression with the results of a subquery.
 If you code a search condition without the SOME, ANY, and ALL
keywords, the subquery must return a single value.
Murach’s SQL Server 2008, C6
© 2008, Mike Murach & Associates, Inc.
Slide 14
A query that returns invoices with a balance due
less than the average
SELECT InvoiceNumber, InvoiceDate, InvoiceTotal,
InvoiceTotal - PaymentTotal - CreditTotal
AS BalanceDue
FROM Invoices
WHERE InvoiceTotal - PaymentTotal - CreditTotal > 0
AND InvoiceTotal - PaymentTotal - CreditTotal <
(SELECT AVG(InvoiceTotal - PaymentTotal –
CreditTotal)
FROM Invoices
WHERE InvoiceTotal - PaymentTotal - CreditTotal > 0)
ORDER BY InvoiceTotal DESC
Murach’s SQL Server 2008, C6
© 2008, Mike Murach & Associates, Inc.
Slide 15
The result of the subquery (the average balance
due)
2910.9472
The result set
(9 rows)
Murach’s SQL Server 2008, C6
© 2008, Mike Murach & Associates, Inc.
Slide 16
How the ALL keyword works
Condition
Equivalent
expression
x > ALL (1, 2)
x > 2
x < ALL (1, 2)
x < 1
x = ALL (1, 2)
(x = 1) AND
(x = 2)
x <> ALL (1, 2)
(x <> 1) AND
(x <> 2)
Murach’s SQL Server 2008, C6
Description
x must be greater than the
maximum value returned by
the subquery.
x must be less than the
minimum value returned by
the subquery.
This condition evaluates to
True only if the subquery
returns a single value or if all
the values returned by the
subquery are the same.
This condition is equivalent to:
x NOT IN (1, 2)
© 2008, Mike Murach & Associates, Inc.
Slide 17
How the ALL keyword works (continued)
 The ALL keyword tests that a comparison condition is true for all
of the values returned by a subquery.
 If no rows are returned by the subquery, a comparison that uses
ALL is always true.
 If all of the rows returned by the subquery contain a null value, a
comparison that uses ALL is always false.
Murach’s SQL Server 2008, C6
© 2008, Mike Murach & Associates, Inc.
Slide 18
A query that returns invoices that are larger than
the largest invoice for vendor 34
SELECT VendorName, InvoiceNumber, InvoiceTotal
FROM Invoices JOIN Vendors ON Invoices.VendorID =
Vendors.VendorID
WHERE InvoiceTotal > ALL
(SELECT InvoiceTotal
FROM Invoices
WHERE VendorID = 34)
ORDER BY VendorName
Murach’s SQL Server 2008, C6
© 2008, Mike Murach & Associates, Inc.
Slide 19
The result of the subquery (the invoice totals for
vendor 34)
The result set
(25 rows)
Murach’s SQL Server 2008, C6
© 2008, Mike Murach & Associates, Inc.
Slide 20
How the ANY and SOME keywords work
Condition
Equivalent
expression
x > ANY (1, 2)
x > 1
x < ANY (1, 2)
x < 2
x = ANY (1, 2)
(x = 1) OR
(x = 2)
x <> ANY (1, 2)
Murach’s SQL Server 2008, C6
(x <> 1) OR
(x <> 2)
Description
x must be greater than the
minimum value returned by the
subquery.
x must be less than the maximum
value returned by the subquery.
This condition is equivalent to:
x IN (1, 2)
This condition will evaluate to
True for any non-empty result set
containing at least one non-null
value that isn’t equal to x.
© 2008, Mike Murach & Associates, Inc.
Slide 21
How the ANY and SOME keywords work
(continued)
 The ANY or SOME keyword tests that a condition is true for at
least one of the values returned by a subquery. SOME is the ANSIstandard keyword, but ANY is more commonly used.
 If no rows are returned by the subquery or all of the rows returned
by the subquery contain a null value, a comparison that uses ANY
or SOME is always false.
Murach’s SQL Server 2008, C6
© 2008, Mike Murach & Associates, Inc.
Slide 22
A query that returns invoices smaller than the
largest invoice for vendor 115
SELECT VendorName, InvoiceNumber, InvoiceTotal
FROM Vendors JOIN Invoices ON Vendors.VendorID =
Invoices.InvoiceID
WHERE InvoiceTotal < ANY
(SELECT InvoiceTotal
FROM Invoices
WHERE VendorID = 115)
Murach’s SQL Server 2008, C6
© 2008, Mike Murach & Associates, Inc.
Slide 23
The result of the subquery (the invoice totals for
vendor 115)
The result set
(17 rows)
Murach’s SQL Server 2008, C6
© 2008, Mike Murach & Associates, Inc.
Slide 24
How to code correlated subqueries
 A correlated subquery is a subquery that is executed once for each
row processed by the outer query. A noncorrelated subquery is
executed only once.
 A correlated subquery refers to the value of a column in the outer
query. Because that value varies depending on the row that’s being
processed, each execution of the subquery returns a different result.
 To refer to a value in the outer query, a correlated subquery uses a
qualified column name that includes the table name from the outer
query.
 If the subquery uses the same table as the outer query, an alias, or
correlation name, must be assigned to one of the tables to remove
ambiguity.
Murach’s SQL Server 2008, C6
© 2008, Mike Murach & Associates, Inc.
Slide 25
A query that uses a correlated subquery to
calculate the average invoice total for each vendor
SELECT VendorID, InvoiceNumber, InvoiceTotal
FROM Invoices AS Inv_Main
WHERE InvoiceTotal >
(SELECT AVG(InvoiceTotal)
FROM Invoices AS Inv_Sub
WHERE Inv_Sub.VendorID = Inv_Main.VendorID)
ORDER BY VendorID, InvoiceTotal
The value returned by the subquery for vendor 95
28.5016
Murach’s SQL Server 2008, C6
© 2008, Mike Murach & Associates, Inc.
Slide 26
The result set for the query with the correlated
subquery
(36 rows)
Murach’s SQL Server 2008, C6
© 2008, Mike Murach & Associates, Inc.
Slide 27
The syntax of a subquery that uses the EXISTS
operator
WHERE [NOT] EXISTS (subquery)
How to use the EXISTS operator
 You can use the EXISTS operator to test that one or more rows
are returned by the subquery.
 You can use NOT along with the EXISTS operator to test that no
rows are returned by the subquery.
 A subquery with the EXISTS operator doesn’t return any rows.
It returns an indication of whether any rows meet the condition.
 Because no rows are returned by the subquery, it doesn’t matter
what columns you specify in the SELECT clause.
 The EXISTS operator is used most often with correlated
subqueries.
Murach’s SQL Server 2008, C6
© 2008, Mike Murach & Associates, Inc.
Slide 28
A query that returns vendors without invoices
SELECT VendorID, VendorName, VendorState
FROM Vendors
WHERE NOT EXISTS
(SELECT *
FROM Invoices
WHERE Invoices.VendorID = Vendors.VendorID)
The result set
(88 rows)
Murach’s SQL Server 2008, C6
© 2008, Mike Murach & Associates, Inc.
Slide 29
How to code subqueries in the FROM clause
 A subquery that’s coded in the FROM clause returns a result
set called a derived table.
 When you create a derived table, you must assign an alias to
it. Then, you can use the derived table within the outer query
just as you would any other table.
 When you code a subquery in the FROM clause, you must
assign names to any calculated values in the result set.
 Derived tables are most useful when you need to further
summarize the results of a summary query.
Murach’s SQL Server 2008, C6
© 2008, Mike Murach & Associates, Inc.
Slide 30
A query that uses a derived table to retrieve the
top 5 vendors by average invoice total
SELECT Invoices.VendorID, MAX(InvoiceDate) AS LatestInv,
AVG(InvoiceTotal) AS AvgInvoice
FROM Invoices JOIN
(SELECT TOP 5 VendorID, AVG(InvoiceTotal)
AS AvgInvoice
FROM Invoices
GROUP BY VendorID
ORDER BY AvgInvoice DESC) AS TopVendor
ON Invoices.VendorID = TopVendor.VendorID
GROUP BY Invoices.VendorID
ORDER BY LatestInv DESC
Murach’s SQL Server 2008, C6
© 2008, Mike Murach & Associates, Inc.
Slide 31
The derived table generated by the subquery that
retrieves the top 5 vendors
The result set
Murach’s SQL Server 2008, C6
© 2008, Mike Murach & Associates, Inc.
Slide 32
How to code subqueries in the SELECT clause
 When you code a subquery for a column specification in the
SELECT clause, the subquery must return a single value.
 A subquery that’s coded within a SELECT clause is usually a
correlated subquery.
 Subqueries are seldom coded in the SELECT clause. Joins are
used instead because they’re generally faster and more
readable.
Murach’s SQL Server 2008, C6
© 2008, Mike Murach & Associates, Inc.
Slide 33
A query that uses a correlated subquery in its SELECT
to retrieve the most recent invoice for each vendor
SELECT DISTINCT VendorName,
(SELECT MAX(InvoiceDate) FROM Invoices
WHERE Invoices.VendorID = Vendors.VendorID)
AS LatestInv
FROM Vendors
ORDER BY LatestInv DESC
The result set
(122 rows)
Murach’s SQL Server 2008, C6
© 2008, Mike Murach & Associates, Inc.
Slide 34
The same query using a join instead of a
subquery in the SELECT clause
SELECT VendorName, MAX(InvoiceDate) AS LatestInv
FROM Vendors JOIN Invoices
ON Vendors.VendorID = Invoices.VendorID
GROUP BY VendorName
ORDER BY LatestInv DESC
Murach’s SQL Server 2008, C6
© 2008, Mike Murach & Associates, Inc.
Slide 35
A procedure for building complex queries
1. State the problem to be solved by the query in English.
2. Use pseudocode to outline the query. The pseudocode should
identify the subqueries used by the query and the data they return.
It should also include aliases used for any derived tables.
3. If necessary, use pseudocode to outline each subquery.
4. Code the subqueries and test them to be sure that they return the
correct data.
5. Code and test the final query.
The problem to be solved
 Which vendor in each state has the largest invoice total?
Murach’s SQL Server 2008, C6
© 2008, Mike Murach & Associates, Inc.
Slide 36
Pseudocode for the query
SELECT Summary1.VendorState, Summary1.VendorName,
TopInState.SumOfInvoices
FROM (Derived table returning VendorState, VendorName,
SumOfInvoices) AS Summary1
JOIN (Derived table returning VendorState,
MAX(SumOfInvoices)) AS TopInState
ON Summary1.VendorState = TopInState.VendorState AND
Summary1.SumOfInvoices = TopInState.SumOfInvoices
ORDER BY Summary1.VendorState
Murach’s SQL Server 2008, C6
© 2008, Mike Murach & Associates, Inc.
Slide 37
Pseudocode for the TopInState subquery
SELECT Summary2.VendorState, MAX(Summary2.SumOfInvoices)
FROM (Derived table returning VendorState, VendorName,
SumOfInvoices) AS Summary2
GROUP BY Summary2.VendorState
The code for the Summary1 and Summary2
subqueries
SELECT V_Sub.VendorState, V_Sub.VendorName,
SUM(I_Sub.InvoiceTotal) AS SumOfInvoices
FROM Invoices AS I_Sub JOIN Vendors AS V_Sub
ON I_Sub.VendorID = V_Sub.VendorID
GROUP BY V_Sub.VendorState, V_Sub.VendorName
Murach’s SQL Server 2008, C6
© 2008, Mike Murach & Associates, Inc.
Slide 38
The result of the Summary1 and Summary2
subqueries
(34 rows)
The result of the TopInState subquery
(10 rows)
Murach’s SQL Server 2008, C6
© 2008, Mike Murach & Associates, Inc.
Slide 39
All of the code for the query with three subqueries
SELECT Summary1.VendorState, Summary1.VendorName,
TopInState.SumOfInvoices
FROM
(SELECT V_Sub.VendorState, V_Sub.VendorName,
SUM(I_Sub.InvoiceTotal) AS SumOfInvoices
FROM Invoices AS I_Sub JOIN Vendors AS V_Sub
ON I_Sub.VendorID = V_Sub.VendorID
GROUP BY V_Sub.VendorState, V_Sub.VendorName) AS Summary1
JOIN
(SELECT Summary2.VendorState,
MAX(Summary2.SumOfInvoices) AS SumOfInvoices
FROM
(SELECT V_Sub.VendorState, V_Sub.VendorName,
SUM(I_Sub.InvoiceTotal) AS SumOfInvoices
FROM Invoices AS I_Sub JOIN Vendors AS V_Sub
ON I_Sub.VendorID = V_Sub.VendorID
GROUP BY V_Sub.VendorState, V_Sub.VendorName)
AS Summary2
GROUP BY Summary2.VendorState) AS TopInState
ON Summary1.VendorState = TopInState.VendorState AND
Summary1.SumOfInvoices = TopInState.SumOfInvoices
ORDER BY Summary1.VendorState
Murach’s SQL Server 2008, C6
© 2008, Mike Murach & Associates, Inc.
Slide 40
The result set of the query with three subqueries
(10 rows)
Murach’s SQL Server 2008, C6
© 2008, Mike Murach & Associates, Inc.
Slide 41
The syntax of a CTE
WITH cte_name1 AS (query_definition1)
[, cte_name2 AS (query_definition2)]
[...]
sql_statement
How to code a CTE
 A common table expression (CTE) is an expression that creates
one or more temporary tables that can be used by the following
query.
 To use a CTE with a query, you code the WITH keyword
followed by the definition of the CTE. Then, immediately after the
CTE, you code the statement that uses it.
 Separate multiple CTEs with commas. Each CTE can refer to itself
and any previously defined CTEs in the same WITH clause.
Murach’s SQL Server 2008, C6
© 2008, Mike Murach & Associates, Inc.
Slide 42
Two CTEs and a query that uses them
WITH Summary AS
(
SELECT VendorState, VendorName,
SUM(InvoiceTotal) AS SumOfInvoices
FROM Invoices
JOIN Vendors ON Invoices.VendorID = Vendors.VendorID
GROUP BY VendorState, VendorName
),
TopInState AS
(
SELECT VendorState, MAX(SumOfInvoices) AS SumOfInvoices
FROM Summary
GROUP BY VendorState
)
SELECT Summary.VendorState, Summary.VendorName,
TopInState.SumOfInvoices
FROM Summary JOIN TopInState
ON Summary.VendorState = TopInState.VendorState AND
Summary.SumOfInvoices = TopInState.SumOfInvoices
ORDER BY Summary.VendorState
Murach’s SQL Server 2008, C6
© 2008, Mike Murach & Associates, Inc.
Slide 43
The result set
(10 rows)
Murach’s SQL Server 2008, C6
© 2008, Mike Murach & Associates, Inc.
Slide 44
How to code a recursive CTE
 A recursive query is a query that is able to loop through a result
set and perform processing to return a final result set. A recursive
CTE can be used to create a recursive query.
 A recursive CTE must contain at least two query definitions, an
anchor member and a recursive member, and these members must
be connected by the UNION ALL operator.
Murach’s SQL Server 2008, C6
© 2008, Mike Murach & Associates, Inc.
Slide 45
The Employees table
Murach’s SQL Server 2008, C6
© 2008, Mike Murach & Associates, Inc.
Slide 46
A recursive CTE that returns hierarchical data
WITH EmployeesCTE AS
(
-- Anchor member
SELECT EmployeeID,
FirstName + ' ' + LastName As EmployeeName,
1 As Rank
FROM Employees
WHERE ManagerID IS NULL
UNION ALL
-- Recursive member
SELECT Employees.EmployeeID,
FirstName + ' ' + LastName,
Rank + 1
FROM Employees
JOIN EmployeesCTE
ON Employees.ManagerID =
EmployeesCTE.EmployeeID
)
SELECT *
FROM EmployeesCTE
ORDER BY Rank, EmployeeID
Murach’s SQL Server 2008, C6
© 2008, Mike Murach & Associates, Inc.
Slide 47
The final result set
Murach’s SQL Server 2008, C6
© 2008, Mike Murach & Associates, Inc.
Slide 48