Transcript Slide 1

SQL Unit 9
Correlated Subqueries
Kirk Scott
1
2
3
•
•
•
•
9.1
9.2
9.3
9.4
Correlated Subqueries
EXISTS
NOT EXISTS
Double NOT EXISTS
4
9.1 Correlated Subqueries
5
• 1. Recall how subqueries with IN work.
• There is an inner query and an outer query.
• The inner query is run first, producing a set of
results.
• The outer query is run second, and the results
it produces depend a row-by-row, yes or no
test, of whether a field value is in the results
of the inner query.
6
• A row would be included in the outer query
results only if the value of one of its fields
matched a value in the results of the inner
query.
• The connection between the two queries
depends on the matching of the field in the
WHERE clause of the outer query with the
field in the SELECT clause of the inner query.
7
• Here is an example that finds the names of
salespeople who sold cars:
•
• SELECT name
• FROM Salesperson
• WHERE spno IN
• (SELECT spno
• FROM Carsale)
8
• 2. The following example illustrates the syntax
of a correlated subquery.
• There is a join in the inner query that involves
a field from the table of the inner query and a
field from the table of the outer query.
• This is what is meant by correlation:
9
•
•
•
•
•
•
SELECT name
FROM Salesperson
WHERE spno IN
(SELECT spno
FROM Carsale
WHERE Salesperson.spno = Carsale.spno)
10
• You would never actually write a query like
this intentionally, because the correlation in
this case is redundant.
• This query gives exactly the same results as
the previous example.
• However, the query is syntactically correct and
it illustrates some of the important aspects of
correlation.
11
• Consider the join.
• Recall that for inline views, the outer query
may make use of fields defined in the inner
query.
• On the other hand, for queries with IN, the
outer query can’t make use of fields defined in
the inner query.
12
• In a correlated subquery, the fields of the
outer table are available to the inner query.
• In this example, the inner query can make use
of the field Salesperson.spno in the join.
• The only thing to keep in mind is that if you
are joining on fields that have the same name,
then those names have to be qualified with
the names of their tables in order to tell them
apart.
13
• There is another thing to notice about correlated
subqueries.
• The term correlated means that the inner and
outer queries are not independent of each other.
• Unlike other nested queries, the inner query does
not run to completion first, followed by the outer
query.
• The inner query cannot run to completion
independently of the outer query because the
inner query contains a join which involves the
outer table.
14
• 3. The next example illustrates a correlated
subquery which does not have the redundancy of
the previous example.
• This is a type of subquery with an operator which
also includes correlation.
• In words, the query will find the names of
salespeople who sold at least two cars.
• This query works by comparing the results of a
correlated subquery using COUNT with the
constant value 2:
15
•
•
•
•
•
•
SELECT name
FROM Salesperson
WHERE
(SELECT COUNT(*)
FROM Carsale
WHERE Salesperson.spno = Carsale.spno) >=
2
16
• In order to understand what the previous
query accomplishes, it might be useful to look
at a query that does approximately the same
thing using syntax learned earlier.
• The following query finds the names of
salespeople who sold more than two cars, and
also finds the counts of the numbers of cars
they sold by doing a join and using GROUP BY
with HAVING:
17
•
•
•
•
SELECT name, COUNT(*)
FROM Salesperson, Carsale
WHERE Salesperson.spno = Carsale.spno
GROUP BY spno HAVING COUNT(*) >= 2
18
• Here is an alternative that does approximately the
same thing using an inline view.
• It only makes use of the Carsale table, so it gives spno
instead of name:
•
• SELECT spno, salescount
• FROM
• (SELECT spno, COUNT(*) AS salescount
• FROM Carsale
• GROUP BY spno)
• WHERE salescount >= 2
19
• The inline view plays the role of the table in
the FROM clause of the outer query.
• The fields of this inline view are available to
the outer query.
• Since the view is unnamed, a new field, like
the COUNT, has to be given an alias if it is to
be referred to elsewhere in the query.
20
• The examples get a little messy.
• It’s clear that there may be more than one way to
write a query that accomplishes something more
complicated than “SELECT * FROM Table”.
• Keep in mind that the point of this section is that
one of the alternative ways involves a correlated
subquery.
• The other ways which we’ve seen before are now
just sideshows.
21
9.2 EXISTS
22
• 1. The most common use of correlated
subqueries is with the keyword EXISTS.
• This is the general form of an EXISTS query:
•
• SELECT blah
• FROM blah
• WHERE EXISTS
• (A correlated subquery goes here.
• A join is needed to connect the inner and outer
queries.)
23
• It may be helpful to compare an EXISTS query
with an IN query.
• The keyword EXISTS immediately follows
WHERE in the query.
• No field in the outer query table is specified.
• The link between the inner and outer queries
arises from the join condition in the inner
query alone.
24
• 2. Suppose you would like to find the names of
salespeople where there exist sales records for those
salespeople in the Carsale table.
• This would be the solution using the keyword EXISTS:
•
• SELECT name
• FROM Salesperson
• WHERE EXISTS
• (SELECT *
• FROM Carsale
• WHERE Salesperson.spno = Carsale.spno)
25
• The join in the inner query is the only
connection between the inner and the outer
query.
• It is essential.
• The WHERE clause of the outer query doesn't
specify a field from the outer table.
• It goes directly to EXISTS.
26
• The inner query does a SELECT *.
• There is no need to match fields from the
inner and outer table in the SELECT, so you
can select anything you want to. SELECT * is
used because you only have to type the one
symbol, *.
• It suggests whole records from the Carsale
table, but keep in mind that none of the fields
are actually needed in the query.
27
• 3. Not surprisingly, what you can do with an EXISTS
query you can also do with an IN query or a join query.
• Here is the IN query that corresponds to the foregoing
EXISTS query:
•
• SELECT name
• FROM Salesperson
• WHERE spno IN
• (SELECT spno
• FROM Carsale)
28
•
•
•
•
•
Here is the corresponding join query:
SELECT DISTINCT name
FROM Salesperson, Carsale
WHERE Salesperson.spno = Carsale.spno
29
• None of the alternatives is either better or
worse than the others.
• It's personal choice whether to use EXISTS, IN,
or a join.
• Remember that with the subqueries it's not
possible to select fields from the inner table to
appear as results in the outer query.
• If you want to do that, then you have to use a
join.
30
• 4. Here is another straightforward example of
an EXISTS query.
• Like the earlier example, it could also be done
with an IN or a join query.
• Those alternatives are not shown.
• This example is given without further
explanation.
• You should be able to figure out what it does:
31
•
•
•
•
•
•
SELECT vin, make, model, year
FROM Car
WHERE EXISTS
(SELECT *
FROM Carsale
WHERE Car.vin = Carsale.vin)
32
• 5. The following query does not contain a
syntactical error, but it is wrong.
• It is an EXISTS query without correlation:
•
• SELECT name
• FROM Salesperson
• WHERE EXISTS
• (SELECT *
• FROM Carsale)
33
• Recall that IN was a set operator.
• This example highlights the fact that EXISTS is
a kind of logical operator.
• Either something exists or it doesn't, either
"EXISTS" is true or it's false.
• Without the correlation, either Carsale
contains records or it doesn't.
34
• If it does, then the inner query is true, and the
outer query will simply select the names of all
salespeople.
• If, by chance, the Carsale table contained no
records, then the inner query would return
false, and the outer query would return
nothing, even though there may be
salespeople in the Salesperson table.
35
• Remember that a join query without a joining
condition gives drastically wrong results
• —namely a Cartesian product.
• An EXISTS query without a joining condition,
that is, without correlation, does not give such
drastically wrong results.
• But if written without a join it is almost
certainly not the query that was intended.
36
• 6. Just as it was possible to join a table with
itself, it's also possible to write a correlated
subquery with EXISTS where the table of the
inner query is the same as the table of the
outer query.
• This example will be based on the Salesperson
table.
• Here are its contents again, as a reminder:
37
spno
111
222
333
444
name
Fred Flintstone
Wile E. Coyote
Bugs Bunny
Rocky the Squirrel
Salesperson
addr
city
Anchorage
456 Karluk
Anchorage
789 Otis
Anchorage
345 Tudor
Anchorage
555
Yosemite Sam
678 Muldoon Anchorage
state
AK
AK
AK
AK
phone
723-6666
724-7777
725-8888
727-3333
bossno commrate
333
0.03
333
0.05
333
0.05
AK
525-2222 333
0.03
38
• Suppose you'd like to find all of the
salespeople who are bosses.
• You can translate this verbally into an EXISTS
query as follows:
• Find all salespeople where there exists some
other salesperson whose bossno matches the
first salesperson's spno.
39
• If you're going to open the same table more
than once, then you'll need to give it an alias
each time.
• In this case the aliases will correspond to
opening the table as a salesperson who is a
boss and opening the table as a salesperson
who has a boss.
40
• In the query below, the alias A corresponds to
opening the table as a salesperson who is a
boss.
• This is who you're trying to find information
about, and it appears in the outer query.
• The alias B corresponds to opening the table
as an employee.
• This appears in the inner query.
41
• The joining condition determines which role
the A and B play in the query.
• Matching on A.spno = B.bossno means that A
is playing the role of boss.
• Where A and B appear in the query depends
on whether you are trying to pull information
out of A or information out of B.
42
•
•
•
•
•
•
SELECT *
FROM Salesperson AS A
WHERE EXISTS
(SELECT *
FROM Salesperson AS B
WHERE A.spno = B.bossno)
43
• Looking back at the table contents, it's easy to
see that the only salesperson who is a boss is
Bugs Bunny.
• Looking at the logic of the query, spno 333, Bugs
Bunny's spno, is the only spno where there are
other salesperson records in the table that have
this value as their bossno.
• As pointed out with an earlier example, in
general, it would also be possible to do this query
using IN or a join.
44
• 7. The last example of a plain EXISTS query is
slightly less realistic than some of the others,
but it does give a chance to use a non-equijoin
as the correlating condition.
• Verbally, this query could be described as
finding those cars that are so cheap that their
stickerprices are lower than the dealercosts of
some other cars in the Car table.
45
• This is another example where one table,
namely the Car table, is opened twice and has
to be aliased.
• One opening of the table, in the outer query,
corresponds to the very cheap cars.
• The opening in the inner query corresponds to
the ones that are not so cheap:
46
•
•
•
•
•
•
SELECT *
FROM Car AS A
WHERE EXISTS
(SELECT *
FROM Car AS B
WHERE A.stickerprice < B.dealercost)
47
9.3 NOT EXISTS
48
• 1. NOT EXISTS is the negation of EXISTS and
its meaning is straightforward.
• If you think back to NOT IN, if IN produced a
certain set of results, then NOT IN would
produce the complement of that set.
• Similarly, if EXISTS would hold true for a
certain set of tuples, then NOT EXISTS would
hold false for that set.
49
• Put another way, NOT EXISTS would specify
those tuples for which the condition did not
hold true.
• If you wanted to find the names of
salespeople who didn't sell a car, if there were
any that didn't sell a car, you could do it this
way:
50
•
•
•
•
•
•
SELECT name
FROM Salesperson
WHERE NOT EXISTS
(SELECT *
FROM Carsale
WHERE Salesperson.spno = Carsale.spno)
51
• This produces the same results as this query,
which you've seen before:
•
• SELECT name
• FROM Salesperson
• WHERE spno NOT IN
• (SELECT spno
• FROM Carsale)
52
• This means that once again, the user has a
choice of what kind of query to use in order to
find the desired results.
• Remember that the kinds of queries with
negation shown above cannot be
accomplished with an inequality join.
53
• The join query shown below is not equivalent and
it is unlikely that a situation would arise where
you would need the results generated by this
query.
• They are similar to a Cartesian product:
• SELECT name
• FROM Salesperson, Carsale
• WHERE Salesperson.spno <> Carsale.spno
54
• 2. Here is another straightforward example of
a NOT EXISTS query.
• An equivalent NOT IN query could be written,
but an inequality join would not be
equivalent.
• Those alternatives are not shown for this
example.
• You should be able to figure out what this
query does:
55
•
•
•
•
•
•
SELECT vin, make, model, year
FROM Car
WHERE NOT EXISTS
(SELECT *
FROM Carsale
WHERE Car.vin = Carsale.vin)
56
9.4 Double NOT EXISTS
57
• In the previous unit, all combinations of
nesting and negating IN queries were
examined:
• IN/IN
• IN/NOT IN
• NOT IN/IN
• NOT IN/NOT IN
58
• All of these combinations are also possible
with nested, negated EXISTS queries.
• However, double NOT EXISTS is the
quintessential case, and it will be considered
directly.
• If you can understand double NOT EXISTS,
then understanding the other cases should
not be a problem.
59
• 1. Queries that are doubly nested with NOT
EXISTS are the most logically complex queries
that will be dealt with here.
• In theory you could nest even deeper than that,
but it is difficult to understand beyond double
nesting, and it's hard to think of a use for such
complicated queries.
• Although not simple, doubly nested NOT EXISTS
queries do have a specific and useful meaning.
60
• The keyword EXISTS in SQL corresponds to
what is known as the existential quantifier in
mathematical logic.
• In a logical proposition, EXISTS is signified by
the phrase "there exists".
• It addresses the question of whether or not
there exists at least one case where a
particular condition holds.
61
• There is another quantifier in mathematical
logic, and it is known as the universal
quantifier.
• The phrase that signifies this quantifier in a
logical proposition is "for all".
• It addresses the question of whether or not a
particular condition holds for all cases.
62
• There is no keyword or keyword phrase in SQL
such as "FOR ALL".
• In other words, SQL does not have a universal
quantifier.
• It turns out that double NOT EXISTS is the
logical equivalent of universal quantification.
63
• For better or worse, doubly nested NOT
EXISTS queries are the classic way to
accomplish universal quantification in SQL.
• You may recall that in analyzing the results of
nested IN queries, it frequently came down to
a question of, “Are all included or are some
included?”
• This means that universal quantification can
also be accomplished with IN/NOT IN queries
64
• The reality is that doubly nested NOT EXISTS may
be the most straightforward way of dealing with
the question of all vs. not all.
• If you prefer to try and deal with this using IN
queries, that’s OK.
• You will need to know how to write both IN and
EXISTS queries.
• On assignments or tests, if a question is
specifically about the concept of “for all”, it is
highly likely that the clearest route to a solution
will be double NOT EXISTS.
65
• The logical equivalence of "for all" and double
NOT EXISTS will not be proven.
• Here is a verbal explanation of the equivalence:
• Suppose I am interested in this proposition:
• "Condition X holds in all cases."
• I claim that this proposition is equivalent:
• "There are no cases in which condition X does not
hold."
66
• Both in math and the English language "two
negatives yield a positive".
• На русском языке это не совсем так.
• You're warned in English class not to write
sentences with double negatives because they
are hard to understand.
• They are also hard to understand in SQL, but
you have no choice.
67
• If you can master the meaning of a double
NOT EXISTS query, you can do universal
quantification successfully.
• Even if the verbal explanation is not very clear,
the examples which follow will empirically
demonstrate how double NOT EXISTS is
equivalent to "for all".
68
• 2. In greater detail, this is the structure of the
kinds of queries needed in order to
accomplish what is under discussion here:
• A. The overall query consists of three smaller
queries:
• An outer query, a query inside that one, and
another inside the second one.
• This is double nesting.
69
• B. The two nested inner queries are
introduced by NOT EXISTS.
• C. All three queries are correlated.
– In other words, the overall query is hooked
together like a three-way join.
70
• D. In a simple example, the table of the outer
query is joined to the table of the innermost
query, and the table of the middle query is
also joined to the table of the innermost
query.
– Needless to say, if this plan is to be realized, there
have to be corresponding fields in the respective
tables.
71
• E. In a simple example the joining conditions
both appear in the innermost query.
72
• 3. Before giving a doubly nested NOT EXISTS query, a
reminder of the relationships between the Customer,
Carsale, and Salesperson tables is given here in the
form of a straightforward three-way join:
•
• SELECT Customer.name, salesprice, Salesperson.name
• FROM Customer, Carsale, Salesperson
• WHERE Customer.custno = Carsale.custno
• AND Carsale.spno = Salesperson.spno
73
• 4. Suppose I'm interested in a query that
could be described in this way:
• Find the customers who bought cars from all
of the salespeople.
• The "from all" in the verbal description signals
universal quantification.
74
• If the first description is ambiguous, maybe
this description is more specific:
• Find the customers who have bought at least
one car from each of the salespeople.
75
• Finally, it might be helpful in trying to write
the query to describe it in the following way.
• This is tortured English, but it closely parallels
the logic of the query:
• Find the customers, such that there does not
exist a salesperson that they haven't bought a
car from.
76
• Put another way:
• You're interested in the customers where
there is no salesperson that the customers
haven't bought a car from.
• Enough word twisting.
• The query follows.
77
• SELECT *
• FROM Customer
• WHERE NOT EXISTS
• (SELECT *
• FROM Salesperson
• WHERE NOT EXISTS
•
(SELECT *
•
FROM Carsale
•
WHERE Customer.custno = Carsale.custno
•
AND Carsale.spno = Salesperson.spno))
78
• This query illustrates all of the points made
earlier.
• Three tables are opened.
• It is doubly nested with NOT EXISTS.
79
• The two joining conditions that link the three
tables together on their corresponding fields
are in the innermost query.
• The logic of the query as seen when reading
its parts in order follows the logic of the
doubly negated expression of the meaning of
the query in the English language.
80
• There is one more thing worth noting about
this example query and universal
quantification queries in general:
• Quite often they identify exceptional cases.
• How likely is it that there are any customers
who have bought cars from all salespeople?
• It is pretty unlikely.
81
• If there are none, then this query will produce
no results.
• This is not a sign that the query is wrong.
• It simply means that you have asked for an
exceptional case, and there is no such case.
• When writing and testing queries of this kind,
do not be surprised if the query returns no
results.
82
• There is another approach to analyzing and
understanding a query like this.
• In general, a doubly nested query can involve
three tables.
• Those three tables are likely to consist of one
entity in a many-to-many relationship with
another, with a table in the middle.
• An E-R diagram of this situation follows:
83
Entity1
Table in the middle
Entity2
84
• The important thing to notice about the structure
of the double NOT EXISTS query is this:
• Reading from left to right in the diagram, you
have Entity 1, Table in the middle, Entity 2.
• The order that the tables appear in the query is
Entity 1, Entity 2, Table in the middle.
• The two joining conditions, which both involve
the Table in the middle, occur in the innermost
subquery, along with the Table in the middle.
85
• 5. It is easy to make another example by turning
the previous query around.
• Suppose you'd like to find the salespeople who
have sold cars to all customers.
• Again, there are various ways to rephrase it.
• One way which is close to the logic of the SQL
query is:
• Find the salespeople where there does not exist a
customer that the salesperson hasn't sold a car
to.
86
• It should be apparent that the order of tables
in the query makes a big difference.
• Turning the query around logically results in
the tables appearing in this query in the
opposite order to their appearance in the
previous one.
• The joining conditions remain the same.
• Here is the SQL for this query:
87
• SELECT *
• FROM Salesperson
• WHERE NOT EXISTS
• (SELECT *
• FROM Customer
• WHERE NOT EXISTS
•
(SELECT *
•
FROM Carsale
•
WHERE Salesperson.spno = Carsale.spno
•
AND Carsale.custno = Customer.custno))
88
• If anything, this query may be even more
unlikely to have a result than the previous
example.
• Even if the query doesn't return a result, that
doesn't mean that it's wrong.
89
• 6. The parts of the example can be rearranged
again, resulting in a different query, which
illustrates other things.
• Here is a query where the Customer table
comes first, the Carsale table comes second,
and the Salesperson table comes third:
90
• SELECT *
• FROM Customer
• WHERE NOT EXISTS
• (SELECT *
• FROM Carsale
• WHERE Customer.custno = Carsale.custno
• AND NOT EXISTS
•
(SELECT *
•
FROM Salesperson
•
WHERE Carsale.spno = Salesperson.spno))
91
• There are several points to be made about this
query.
• The first point is syntactical.
• Unlike the previous examples, one of the joining
conditions is in the first subquery and the other is
in the second subquery.
• This query won't work in Microsoft Access SQL if
the condition in the first subquery is put together
with the condition in the second subquery.
92
• Look at the tables involved in the first
condition:
• They are Customer and Carsale.
• If this condition is moved into the second
subquery, it will be inside a nest where neither
of these tables exist.
• This doesn't cause an error message to be
generated, but this condition will be ignored.
93
• The second point has to do with the meaning
of the query.
• In words, it might be translated as:
• Find the customers where there is no carsale
to that customer where there isn't a
salesperson for the carsale.
• In other words, find the customers where all
of the purchases they made were made from
salespeople.
94
• Unlike the previous examples, you would expect
to get results from this query since you would
expect most customers to have made their
purchases from salespeople.
• Depending on the data in the database, there
may be some customer who at some time has
bought a car without a salesperson being
recorded for the sale.
• If so, that customer would not be in the results of
the query.
95
• If you look at the E-R diagram again, you see
that the order of the tables in the query
agrees with the order of the table, reading
form left to right in the diagram.
96
Entity1
Table in the middle
Entity2
97
• Keep in mind that if the tables are ordered in
this way, the meaning of the query is
drastically changed.
• Also keep in mind that the placement of the
joining conditions in the query is also
changed.
98
• That leads to the third point.
• It is possible to write queries with both EXISTS
and NOT EXISTS.
• Suppose you wanted to find any customers
who had made at least one purchase without
a salesperson
• Suppose you also wanted to exclude those
customers who only made purchases from
salespeople
99
• Then you could rewrite the query above using
EXISTS for the first subquery and NOT EXISTS
for the second.
• It is worth noting that such a query would not
be restricted to only those customers who
made all of their purchases without
salespeople.
• The query is shown on the next overhead.
100
• SELECT *
• FROM Customer
• WHERE EXISTS
• (SELECT *
• FROM Carsale
• WHERE Customer.custno = Carsale.custno
• AND NOT EXISTS
•
(SELECT *
•
FROM Salesperson
•
WHERE Carsale.spno = Salesperson.spno))
101
• In general, these queries can be arbitrarily
complex.
• There can be both EXISTS and NOT EXISTS;
• the joining conditions may be equality or not
equality;
• you can nest more than two levels, and so on.
• The challenge comes from understanding
what these arbitrarily complex queries mean.
102
• 7. Here is one last example.
• It is of less practical consequence than the
previous examples, but it illustrates some
useful points.
• If you understand how it works, it is a sign that
you understand the relationships between the
different subqueries of a double NOT EXISTS
query.
103
• Suppose you are interested in finding out
information similar to one of the previous
queries, such as customers who bought from all
salespeople.
• Suppose however, that you don't need the names
of those customers in the results, just their
custno's.
• All of the information needed for the query is in
one table, the Carsale table, but it would have to
be opened three times, with a different alias for
each of its three appearances in the query.
104
• One time it would be opened because it
contained customer information, another time
because it contained salesperson information,
and the last time because it contained
information about car sales.
• The only way you can tell what role an opening of
a table is playing in the query is by which of its
fields is used in the joining condition.
• Here is an example:
105
• SELECT custno
• FROM Carsale AS A
• WHERE NOT EXISTS
• (SELECT *
• FROM Carsale AS B
• WHERE NOT EXISTS
•
(SELECT *
•
FROM Carsale AS C
•
WHERE A.custno = C.custno
•
AND B.spno = C.spno))
106
• Carsale opened as A is playing the role of the
Customer table.
• You know this not just because custno is
selected from it, but because of the joining
condition A.custno = C.custno.
• The joining field from A is the field which
relates to customers, custno.
107
• Carsale opened as B is playing the role of the
Salesperson table.
• You know this because the joining field from B
is the field which relates to salespeople, spno.
108
• Carsale opened as C is playing the role of itself,
namely the table in the middle of the many-tomany relationship between customers and
salespeople.
• You know this because C appears in both of the
joining conditions, once on custno and once on
spno.
• This query finds the custno's of customers who
have bought a car from all of the salespeople.
109
• There is one last point to be made about this
example.
• It is not exactly equivalent to the version of
the query where the Customer and
Salesperson tables were actually used.
• This query finds those queries who bought a
car from every salesperson who actually sold a
car.
110
• The previous version found customers who
literally bought a car from all salespeople.
• If, by chance, there was a salesperson in the
Salesperson table who hadn’t even sold one
car, then the previous version of the query
would not produce any results.
111
The End
112