CS 122: Week 7 Subqueries

Download Report

Transcript CS 122: Week 7 Subqueries

Lecture 7: Subqueries

Tarik Booker California State University, Los Angeles

What we will cover…

• • • • • • • • • Subqueries • What, Why, How Where to use • Subqueries in the WHERE clause Subqueries in the HAVING clause IN / NOT IN ALL / ANY Subqueries in the SELECT clause Nested Subqueries Correlated Subqueries EXISTS with Subqueries Subqueries vs. Joins

Subqueries

• • There might be cases where certain results can’t be done with only one query!

• Ex: Find the name of the youngest friend of “Helen X. World.” • I can find the youngest age (max(birthdate)), but not name of that person.

In this case, use a subquery!

• What is a subquery?

• A query within a query!

• Also called an inner query • These must be surrounded by parentheses!

• The query that contains a subquery is called an outer query • We will come back to the above problem (later in the lecture).

Subqueries (2): Example

• We can also use subqueries to simplify longer queries: • Ex: List the name of all salespeople that don’t represent any members • What we’ve done so far: SELECT s.firstname, s.lastname

FROM Salespeople s LEFT JOIN Members m USING (SalesID) WHERE m.memberid is NULL; • Using a Subquery: SELECT firstname, lastname FROM Salespeople WHERE salesID NOT IN (SELECT DISTINCT salesID FROM Members);

Subqueries (3): When to use

• When to use a subquery?

• • • • Impossible or very difficult to solve the problem using a single query When a subquery runs faster than a non-subquery solution • Not as necessary with MySQL When it is easier to understand than an alternate solution • (Outer join, union, might be harder) When you want to use an aggregate function in a WHERE clause • Mostly used this way • Subquery executes separately

Types of Subqueries

• Single Value Subqueries • The subquery returns a single value • Data is in one column, one row • List Subqueries • Subquery returns a list • One column, multiple rows • Table Subqueries • Subquery returns an entire table • Multiple columns, multiple rows

Subqueries in the WHERE Clause

WHERE Clause subqueries are most common • Filters out values from the outer query using values (or a list of values) from the inner query Outer Query WHERE ( Inner Query ) • We can now apply the results of aggregate functions in the WHERE clause!!!

• Note: This is the only way of doing this (to get the right result)!

WHERE Clause Subqueries (2)

• HOW to do this?

• Ex: List all tracks with runtime greater than the average runtime of all tracks.

• WRONG Answer: • SELECT Tracktitle, LengthSeconds • FROM Tracks t • WHERE LengthSeconds > AVG(t.Lengthseconds) • Why? The WHERE Clause processes before the aggregate function • AVG () cannot be processed.

• • That order is part of the design of SQL We have to process the aggregate function before the WHERE clause • How? Use a subquery!

WHERE Clause Subqueries (3)

• Ex: List all tracks with runtime greater than the average runtime of all tracks.

• • CORRECT Answer: • SELECT Tracktitle, LengthSeconds • FROM Tracks • WHERE LengthSeconds > (SELECT AVG(LengthSeconds) FROM Tracks); Note: • The aggregate function is processed before the WHERE clause, because it’s in an entirely different query (a subquery) • The inner query doesn’t actually filter anything out, it just supplies a value (in this case the average) to the WHERE statement in the outer query • The inner query runs first, returns a value, then the outer query runs last.

WHERE Clause Subqueries (4)

• The inner query runs first, the outer query runs last.

• Inner query’s result(s) produces outer query’s values Outer Query Inner Query • Note: Inner query can be any legal query!

WHERE Clause Subqueries (5)

• • Expanded from the midterm review DB: • Ex: Find the name of the youngest friend (s) of “Helen X. World.” We can only do this with a subquery: • First: Find the query to get the date of Helen’s youngest friend: • SELECT MAX(birthday) • FROM Users JOIN XrefUsersChats USING(chatID) • JOIN Chats USING(ChatID) • WHERE Firstname = ‘Helen’ AND Lastname = ‘World’ AND MI = ‘X’; • Next: Use the resulting (date) value to compare names with • SELECT Firstname, Lastname, MI • FROM Users • WHERE Birthday = (Resulting_Value); • Last: Substitute the inner query for the resulting value:

WHERE Clause Subqueries (6)

• • • Inner query: SELECT MAX(birthday) FROM Users JOIN XrefUsersChats USING(chatID) JOIN Chats USING(ChatID) WHERE Firstname = ‘Helen’ AND Lastname = ‘World’ AND MI = ‘X’; Outer query: SELECT Firstname, Lastname, MI FROM Users WHERE Birthday = (Resulting_Value); Complete Query: SELECT Firstname, Lastname, MI FROM Users WHERE Birthday = ( SELECT MAX(birthday) FROM Users JOIN XrefUsersChats USING(chatID) JOIN Chats USING(ChatID) WHERE Firstname = ‘Helen’ AND Lastname = ‘World’ AND MI = ‘X’); • Note: This looks hard, but we’ve just combined two simple queries.

Where Clause Subquery Example

Using the Lyric database: • List all titles recorded at MakeTrax or LoneStar Recording. Do not use a join and do not hard-code company ID’s.

• We will use subqueries: • Outer query: • SELECT Title FROM Titles • WHERE StudioID = (X) OR StudioID = (Y); • Inner Query X (Collecting the StudioID for MakeTrax): • SELECT StudioID FROM Studios • WHERE studioname = ‘MakeTrax’; • Inner Query Y (Collecting the StudioID for LoneStar): • SELECT StudioID FROM Studios • WHERE studioname = ‘Lone Star Recording’;

WHERE Clause Subquery Example (2)

Complete Query: SELECT Title FROM Titles WHERE StudioID = ( SELECT StudioID FROM Studios WHERE Studioname = ‘MakeTrax’) OR StudioID = ( SELECT StudioID FROM Studios WHERE Studioname = ‘Lone Star Recording’); • Note: Indenting helps specify subqueries, but is not required.

SELECT Title FROM Titles WHERE StudioID = (SELECT StudioID FROM Studios WHERE Studioname = ‘MakeTrax’) OR StudioID = (SELECT StudioID FROM Studios WHERE Studioname = ‘Lone Star Recording’); • This is okay. • What’s another way to do this problem (without joins)?

Tips for WHERE Clause Subqueries

Do’s: • Do remember to remove semicolons from inner queries.

• Do separate inner and outer queries if you are having trouble.

• Do check that each query works and gives the correct result before combining them.

• Don’ts: • • Don’t forget the ending parentheses for subqueries!

Don’t forget this is the only way to use aggregate functions in

the WHERE Clause!

IN / NOT IN

• IN (Keyword) • Tests if an expression matches any items in a list • List = one column, many rows • Typically used in the result of a (list) subquery.

• NOT IN • If the expression is not in the list • Syntax: • • expression IN (list_subquery) expression NOT IN (list subquery)

IN / NOT IN Example

• Ex: List the names of salespeople that represent Members in the USA without using a join.

• • How to solve? • Get the USA Members’ salesid’s, then compare with salespeople • Members info is in Members Table • Salespeople information is in SalesPeople Table How to structure the subquery?

• Listing salespeople names is the final result, so that should be an

outer query.

• Therefore, getting member associated salesIDs should be an inner query • Test if the salespeople names are IN the group returned by the member salesID’s

IN / NOT IN Example (2)

• Outer query: Get names of a particular salesID.

• SELECT SalesID • • FROM Salespeople WHERE SalesID IN (salesID_values); • I’m using IN (instead of “=“) because there are (possibly) multiple values.

• Note: No single quotes in the salesID_values (between the parentheses) when using a subquery.

• Inner query: Get SalesID’s of Members from the USA.

• SELECT SalesID • FROM Members • WHERE Country = ‘USA’; • When you combine, insert the inner query directly between the parentheses.

IN / NOT IN Example (3)

• Final Result : SELECT Firstname, Lastname FROM Salespeople WHERE SalesID IN ( SELECT SalesID FROM Members WHERE Country = ‘USA’); • • Or (no indentation): SELECT Firstname, Lastname FROM Salespeople WHERE SalesID IN ( SELECT SalesID FROM Members WHERE Country = ‘USA’); Note: I didn’t need to use a table alias for my outer query.

• Why?

IN / NOT IN Example(4)

• Keep in mind: • IN is NOT A JOIN!!!

• IN simply compares a group of values • Whatever is in the results of the subquery • (In the previous case, salesids) • You can compare salesid’s anywhere without having to chain together tables • What does this query do?

SELECT Firstname, Lastname FROM Salespeople WHERE SalesID IN ( SELECT SalesID FROM Studios WHERE Country = ‘USA’);

ALL and ANY

• • ALL • Condition must hold true for all elements in the list • Syntax: • expression operator ALL ANY • Condition must hold true for any element in the list • Syntax: • expression operator ANY

ALL and ANY Example

• Ex: List the names of all members whose birthdays are later than those of all members from CA or OH • • How to solve?

• Get birthdays of members from CA and OH, then compare with other member birthdays How to structure?

• • • Listing later birthdays is final result, so this is outer query Birthdays from CA or OH is inner query All members, so ALL

ALL and ANY Example (2)

• Outer query: SELECT LastName, FirstName FROM Members WHERE Birthday > ALL (X) AND Birthday > ALL (Y); • • Inner query (X): SELECT birthday FROM Members WHERE Region = ‘CA’; Inner query (Y): SELECT birthday FROM Members WHERE Region = ‘OH’; • Note: What’s another way to do this query?

ALL and ANY Example (3)

• Combined Query: SELECT LastName, FirstName FROM Members WHERE Birthday > ALL ( SELECT birthday FROM Members WHERE Region = ‘CA’ ) AND Birthday > ALL ( SELECT birthday FROM Members WHERE Region = ‘OH’ ); • What’s the other way to do this?

• Instead of two subqueries…

Subqueries in the HAVING Clause

You can also have subqueries in the HAVING clause!

• You can substitute each subquery as well • Ex: List the number of members in each region that has more members than California.

• • Outer Query = Number of members in each region Inner Query = Number of members in California • Since we want number of members in each region, we also must group results (in outer query) by region.

• Outer Query: • SELECT Region, COUNT(*) • FROM Members • GROUP BY Region • HAVING COUNT(*) > (number_of_members_in_ca); • Inner Query (Number of members in CA): • SELECT COUNT( *) • FROM Members • WHERE Region = ‘CA’;

Subqueries in the HAVING Clause (2)

Solution: • SELECT Region, COUNT(*) • FROM Members • • GROUP BY Region HAVING COUNT(*) > (SELECT COUNT(*) FROM Members WHERE Region = ‘CA’); • Note: We don’t need DISTINCT because we are working from the Members table (all rows are unique)

Subqueries in the SELECT Clause

If you put a subquery in the SELECT clause, the subquery must

return a single value

• NOT a list, or table!

• Examples: • SELECT (SELECT 1) + (SELECT 2); • Ans: 3 • SELECT (SELECT COUNT(*) FROM Tracks); • Ans: 50 • SELECT (SELECT * From Tracks); • • Ans: Error!

Why?

Subqueries in the SELECT Clause (2)

SELECT Clause subqueries are useful when computing single value calculations • Percentages • Ex: What percentage of members are male?

• How to solve?

• Percentage = 100 * (total number of male members) / (total number of members) • • Outer Query = 100 * (x) / (y); Inner Query (x): Total number of male members • SELECT COUNT(*) FROM Members WHERE Gender = ‘M’; • Inner Query (y): Total number of members • SELECT COUNT(*) FROM Members;

Subqueries in the SELECT Clause(3)

Complete Query: • SELECT 100 * (SELECT COUNT(*) FROM Members WHERE Gender = ‘M’) / (SELECT COUNT(*) FROM Members); • Remember, only single-valued subqueries in the SELECT Clause • When in doubt, test each query individually!

Nested Subqueries

• Subqueries within Subqueries!

• Do the same techniques, but with more layers • • Ex: List the birthdays of all members who belong to artists which have recorded titles that include the word “the”. Do

not use any joins.

How to solve?

• • • Outermost query: Birthdays of all members Inner query : Members who belong to artists with titles Innermost query: • Titles that include the word “the”.

Note: We want the word “the”, not everything with the letters t-h-e in it.

Nested Subqueries (2)

• • • • Outermost query: • SELECT Birthday FROM Members • WHERE Memberid IN (group_of_memberids); Inner query: • SELECT memberid FROM XrefArtistsMembers • WHERE ArtistID IN (group_of_artistids); Innermost query: • • SELECT ArtistID FROM Titles WHERE Title LIKE ‘% the %’ OR Title LIKE ‘% the’ OR Title LIKE ‘the %’ ; Complete Query: SELECT Birthday FROM Members WHERE Memberid IN ( SELECT memberid FROM XrefArtistsMembers WHERE ArtistID IN ( SELECT ArtistID FROM Titles WHERE Title LIKE ‘% the %’ OR Title LIKE ‘% the’ OR Title LIKE ‘the %’ ) );

Nested Subqueries (3)

• SQL -> English: • SELECT A.artistName FROM artists A WHERE (A.artistID IN (select artistID FROM titles WHERE (titles.studioID IN (select studioID FROM studios P WHERE P.salesID IN (select salesID FROM salespeople WHERE base > 100)))));

Nested Subqueries (4)

• Find all artists who have recorded titles at studios which are represented by salespeople whose base salaries are greater than $100

Correlated Subqueries

• • Our Previous subqueries have been non-correlated.

• Non-correlated?

• • This means no “dependencies.” The inner query could be run separately We will now deal with correlated subqueries: • • You can’t run the inner query separately The result of the inner query “depends on” data given it by the outer query • The correlated subquery is executed for each row returned by an outer query • • The WHERE clause of the subquery is joined to the outer query Note: Correlated subqueries cannot be debugged (checked) like independent subqueries.

Correlated Subqueries (2)

• • • • Ex: List the first track of each title with its length in seconds and the total length in seconds of all tracks for that title: Display: First Track, Length(s) of First Track, Total length of all tracks • SELECT TrackTitle, LengthSeconds As Sec, • (SELECT SUM(LengthSeconds) FROM Tracks SC • WHERE SC.TitleID = T.TitleID) AS TotSec • FROM Tracks T WHERE TrackNum = 1; Note: Third field is a subquery in the SELECT Clause, but its TitleID is tied into the Tracks table of the outer query This will make the inner query operate on the same rows as the outer query (at the same time) • Note: A correlated subquery in the SELECT clause can only return

one value!

Correlated Subqueries (3)

• Use an alias for the results of the outer query to make the results more readable. Aliases aren’t required in correlated subqueries, but required if using the same table.

• Ex: Find the titles of all tracks that are less than the mean (average) lengths of tracks for the titles on which they occur: • SELECT tr.TrackTitle, tr.Lengthseconds FROM Tracks tr • WHERE tr.lengthseconds < ( SELECT AVG(LengthSeconds) FROM Tracks WHERE titleid = tr.titleid); • Note: Because the inner query’s titleid is linked to the outer query’s titleid, this becomes a correlated subquery • Inner query runs once per row in the outer query

Using EXISTS with Subqueries

• EXISTS (Keyword) • Checks if data exists in a subquery • If there is data, returns true • If not, returns false • Ex: List the name of artists who have recorded at least one title • SELECT artistname FROM Artists A • WHERE EXISTS (SELECT ArtistID FROM Titles T WHERE T.ArtistID = A.ArtistID);

More on Subqueries

• Ex: Find all artists that have members from Georgia (GA).

• We can do this different ways: • Without subqueries: • SELECT DISTINCT Artistname FROM Artists A INNER JOIN XRefArtistsMembers X USING(ArtistID) • INNER JOIN Members M USING(MemberID) • WHERE M.Region = ‘GA’; • With one subquery (and two joins): • SELECT DISTINCT Artistname FROM Artists A INNER JOIN XRefArtistsMembers X USING(ArtistID) INNER JOIN (SELECT MemberID FROM Members WHERE Region = ‘GA’) M USING(MemberID); • How can you do this without joins at all?

More on Subqueries (2)

• Use multiple subqueries!

• Two subqueries: SELECT DISTINCT Artistname FROM Artists A WHERE ArtistID IN (SELECT ArtistID FROM XrefArtistsMembers X WHERE X.memberid IN (SELECT MemberID FROM Members WHERE Region = ‘GA’));

Subqueries vs. Joins

• Joins construct Cartesian Products • Then filter (remove) data • Subqueries select matching records • Subqueries tend to be much faster

Updating Records

• EXISTS with a subquery is faster than a join. Why?

• With an EXISTS sub-query, SQL does not have to perform a full row by row join, building the Cartesian product and then tossing out unmatched rows.

• It simply runs the sub-query for each row of the outer query. • It may not even have to run the entire sub-query, since as soon as it finds one good record it knows that at least some data exists.

Tips on Solving Subquery Problems

When solving subquery problems: • Think substitution!

• Analyze the question, looking for subqueries within the question • Replace subqueries in the original question with substitution variables such as X, Y, and Z • Write queries for your substitution variables • Write a query to that solves the original question using your substitution variables • Replace substitution variables with your subqueries

Tips on Solving Subquery Problems (2)

• If you find yourself unsure of a problem: • Try to solve the problem using a single query, and when you get stuck, write a subquery for the part you get stuck on!

• Good Luck!