CS-220 Select Nested Queries

Download Report

Transcript CS-220 Select Nested Queries

Nested queries
• Again, sometimes the information we want to
retrieve is in 2 (or more tables)
– What is the name of the customer who placed order
101?
– What is the description of all the products that are in
order 103?
• Another strategy is to nest select queries into one
another
Nested queries
• What is the name and address of the
customer who placed order 108?
•  2 tables involved: customer and order
• Could start by using 2 simple queries
– Find out customerId of customers who placed
order 108
– From there, find out name and address of
customer
Nested queries
• What is the name and address of the
customer who placed order 108?
•  combine the 2 simple queries into one
• Select name,address from customer where
customerId = ( select customerId from order
where orderId = 108 );
•  equivalent to what query using a join?
Nested queries
• Select name,address from customer where
customerId = ( select customerId from order
where orderId = 108 );
• Inner query in yellow
• Outer query in red
Nested queries
• Could also use IN instead of =
• In inner query, orderId is the primary key of
the order table  can only yield ONE row
max in the resulting table  can use =
• If your selection criteria in the inner query
is not a primary key value  inner query
can yield more than one row  use IN
Examples
• What customers have placed orders?
• Select name from customer where
customerId IN (select DISTINCT
customerId from order );
Examples
• What customers have NOT placed orders?
• Select name from customer where customerId
NOT IN (select DISTINCT customerId from order
);  issue because of NULL value
• Select name from customer where customerId
NOT IN (select DISTINCT customerId from order
where customerId IS NOT NULL);
More Examples
• What customers have NOT placed orders
for computer desks?
• Select name from customer where
customerId NOT IN ( /*… customerIds of
customers who have placed orders for
computer desks .. */ );
• Complete query page 301
• Note: could use nested queries all the way
Customers who have not placed
orders from computer desk
• Using nested queries all the way
• Select name from customer where
customerId NOT IN ( select customerId
from order where orderId in ( select orderId
from orderLine where productId in (select
productId from product where description =
‘computer desk’ ) ) and customerId is not
null );
Exists, not exists
• If we are only interested in whether a
subquery returns at least one row, then we
can use EXISTS instead of IN
• If the subquery returns 1 or more rows 
the where exists ( subquery ) condition will
evaluate to TRUE
Exists, not exists
• If the subquery returns 0 row  the where
exists ( subquery ) condition will evaluate to
FALSE
• To invert the value of the condition, use
NOT EXISTS instead of EXISTS
What are the order ids for all orders
that have included furniture finished
in natural ash?
• Select distinct orderId from orderLine
where EXISTS ( select * from product
where productId = orderLine.productId and
finish = ‘natural ash’ );
• The inner query uses * because we do not
care which columns are selected; we only
care if it returns 0 or 1 or more rows
What are the order ids for all orders
that have included furniture finished
in natural ash?
• In the inner query, we select from the
product table only but the where clause uses
also the orderLine table, which we use in
the outer query
• The outer and inner queries are correlated
What are the order ids for all orders
that have included furniture finished
in natural ash?
• For each order line, the subquery checks to
see if the finish for the product is natural
ash; if it is  the orderId is included in the
result
• The outer query checks this for every row
of the orderLine table
•  be careful that your SQL queries scale as
data increases; this one will require testing
What are the order ids for all orders
that have included furniture finished
in natural ash?
• Could use this instead
• Select distinct orderId from orderLine
where productId IN (select productId from
product where finish = ‘natural ash’ ) ;
What are the order ids for all orders
that have included furniture finished
in natural ash?
• Be careful how you use EXISTS
• Select distinct orderId from orderLine where
EXISTS ( select * from product where finish =
‘natural ash’ );
•  all orderIds will be selected because EXISTS
clause is ALWAYS true  logic error
JOINS vs SUBQUERIES
• Whatever you use, you should:
• Test that it produces the correct result
• Test that it can scale if you expect the
quantity of data in your tables to increase
siginificantly
JOINS vs SUBQUERIES
• With a JOIN, we build a big table before
selecting it
• With NESTED QUERIES, we select from a
regular table and then keep selecting from a
smaller and smaller table
•  If you can, use nested queries
•  If you need to return data from both
tables, you may have to use a JOIN
More Examples
• What is our most expensive product?
• Could use only one query
• Select id, description, finish, price from
product order by price desc limit 0, 1;
• However, with the above query, if 2 or
more products have the max price, we will
only get one of them
More Examples
• What is our most expensive product?
• All the information is in just one table, the
product table
• But we may have to extract the max price
first, then select products based on their
price equal to the max price
More Examples
• Inner query uses same table as outer query
• Select productId, description, finish, price
from product where price = ( select max(
price ) from product );
More Examples
• What are our most popular products, i.e. the
products with the highest quantity of units
sold, listed in descending order?
• Let’s proceed one step at a time
• First compute quantity of units sold per
product
More Examples
• Sum quantities on order per product id
• Select productId, sum( quantity ) from
orderLine group by productId;
More Examples
• That gives a table (the result of a select
query is a table)
•  we can join that table with the product
table to put all the information together and
order by sum(quantity) in descending order
More Examples
• Select description, finish, totalQty from
product, ( Select productId, sum( quantity )
totalQty from orderLine group by productId
) tempTable where tempTable.productId =
product.productId order by totalQty desc ;
Renaming tables and columns
• We renamed the inner query resulting table
because we needed a name for it to perform
the join with the product table (its name is
in the where clause)
• We also renamed sum( quantity ) totalQty
so we can select it
More Examples
• What is the most popular product, i.e. the product
with the highest quantity of units sold?
• Could do:
• Select productId, sum( quantity ) from orderLine
group by productId order by sum(quantity) desc
limit 0,1;
• But that only gives us one (other products could
be tied for first place) and no information on the
product(s)
What is the most popular product,
i.e. the product with the highest
quantity of units sold?
• If we want all the products tied for 1st place
as well as product information
• We could take the max from the previously
computed table and select based on the max
What is the most popular product,
i.e. the product with the highest
quantity of units sold?
Select description, finish, totalQty from
product, ( Select productId, sum( quantity )
totalQty from orderLine group by productId
) tempTable where tempTable.productId =
product.productId AND
totalQty = ( select max( totalQty2 ) from
( Select sum( quantity ) totalQty2 from
orderLine group by productId ) tempTable2
);
Derived tables need to be renamed
We renamed the table resulting from the last
select statement because every derived table
must have its own alias, even though we are
not using its name anywhere else in the
query
Who are our biggest customers, in
terms of # of orders placed
• Info will be in customer and order tables
• Go one step at a time to build the final
query
• Find out customer ids vs # of orders placed
• Select customerId, count( orderId ) from
order group by customerId
Who are our biggest customers, in
terms of # of orders placed
• The previous query generates a table
• Join it with the customer table to get all the
information needed
Who are our biggest customers, in
terms of # of orders placed
• Select name, city, state, orders from
customer, ( select customerId, count(
orderId ) orders from order group by
customerId ) tempTable where
tempTable.customerId =
customer.customerId;
Who are our biggest customers, in
terms of $ bought
•
•
•
•
•
We will need to get data from all 4 tables
Customer names .. are in the customer table
..
Product prices are in the product table
 need all 4 tables
Who are our biggest customers, in
terms of $ bought
• Let’s go step by step
• $ amount involves price and quantity
bought per product
•  2 tables are involved to compute price *
quantity
•  need to join product and orderLine tables
(via productId)
Who are our biggest customers, in
terms of $ bought
• Let’s go step by step
• For each order, get the $ amount
• Select orderId, quantity * price from
orderLine, product where product.productId
= orderLine.productId;
Who are our biggest customers, in
terms of $ bought
• In previous query, order ids appear several
times
•  need to aggregate that data, sum it and
group it by orderId
Who are our biggest customers, in
terms of $ bought
• Select orderId, sum( quantity * price ) from
orderLine, product where product.productId
= orderLine.productId group by orderId;
•  now we have total $ per order id
Who are our biggest customers, in
terms of $ bought
• Now we need to join the result of the
previous table with the order table to get
aggregate data per customer id
Who are our biggest customers, in
terms of $ bought
• Select customerId, sum( dollars ) from
order, ( select orderId, sum( quantity * price
) dollars from orderLine, product where
product.productId = orderLine.productId
group by orderId ) tempTable where
tempTable.orderId = order.orderId group by
customerId;
Who are our biggest customers, in
terms of $ bought
• Now join that table with the customer table
and aggregate the dollars to answer the
question
Who are our biggest customers, in
terms of $ bought
• Select name, totalDollars from customer, ( select
customerId, sum( dollars ) totalDollars from order,
( select orderId, sum( quantity * price ) dollars
from orderLine, product where product.productId
= orderLine.productId group by orderId )
tempTable where tempTable.orderId =
order.orderId group by customerId ) temptable2
where tempTable2.customerId =
customer.customerId order by totalDollars desc;
Correlated subquery
• A subquery in which processing the inner
query depends on data from the outer query
• Note: we have already seen one in the
EXISTS example
• Inner query is computed for each outer row
List details about the product with
the highest unit price (page 303)
• Highest unit price  price is higher than
prices of ALL OTHER products
• For each product, is its price higher than the
prices of ALL the other products?
List details about the product with
the highest unit price
• Select description, finish, price from
product pA where price > ALL ( select
price from product pB where pB.productId
!= pA.productId );
• Table pA shows in both inner and outer
queries, even though we are not selecting
from pA in inner query
•  queries are correlated
List details about the product with
the highest unit price
• The inner query will be executed once for
each product to be sure that no other
product has a higher price
• We are comparing rows in a table to
themselves, we are able to do that by giving
the same table 2 aliases
List details about the product with
the highest unit price
• Problem with the previous query
• If 2 different products have the highest price
 query returns an empty set
•  change to >=
List details about the product with
the highest unit price
• Could achieve same results with nested
queries
• Select description, finish, price from
product where price = ( select max(price)
from product );
•  much more simple and more efficient
Combining queries
• We can combine the output of 2 or more
queries, i.e. UNION them
• Constraints: both queries must output the
same number of columns and the data types
of these columns must be UNION
compatible (this may vary depending on
RDBMS used)
Combining queries
• Which customers have either purchased the
largest quantity of furniture or the smallest
quantity of furniture as part of an order?
•  can code 2 separate queries and then
UNION the 2 results
Customers having purchased largest
quantity of furniture
• Select c1.customerId, name, quantity,
‘largest’ from customer c1, order o1,
orderLine ol1 where c1.customerId =
o1.customerId and o1.orderId = ol1.orderId
and quantity = ( select max( quantity) from
orderLine );
Customers having purchased
smallest quantity of furniture
• Select c1.customerId, name, quantity,
‘smallest’ from customer c1, order o1,
orderLine ol1 where c1.customerId =
o1.customerId and o1.orderId = ol1.orderId
and quantity = ( select min( quantity) from
orderLine );
Customers having purchased largest
quantity of furniture
• Select c1.customerId, name, quantity, ‘largest’
from customer c1, order o1, orderLine ol1 where
c1.customerId = o1.customerId and o1.orderId =
ol1.orderId and quantity = ( select max( quantity)
from orderLine ) UNION Select c1.customerId,
name, quantity, ‘smallest’ from customer c1, order
o1, orderLine ol1 where c1.customerId =
o1.customerId and o1.orderId = ol1.orderId and
quantity = ( select min( quantity) from orderLine )
We can cast a column data type
• To make a column data type compatible
with another column, we can cast it to
another data type
• This could be useful when doing a union
• Select cast(orderDate as char) from order;
• Note: cast is a function  No space
between cast