CS186: Introduction to Database Systems

Download Report

Transcript CS186: Introduction to Database Systems

CS 405G: Introduction to
Database Systems
Lecture 11: SQL III
Instructor: Chen Qian
Summary of SQL features
covered so far

SELECT-FROM-WHERE statements

Ordering
Set and bag operations
Aggregation and grouping
Table expressions, subqueries




Next: NULL’s, outerjoins, data modification, constraints,
…
7/18/2015
Chen Qian @ University of Kentucky
2
Today’s Topic



NULL value
Outer join
Nested queries
7/18/2015
Chen Qian @ University of Kentucky
3
Incomplete information


Example: Student (SID, name, age, GPA)
Value unknown


Value not applicable


We do not know Nelson’s age
Nelson has not taken any classes yet; what is his GPA?
How to express these values in SQL language?
7/18/2015
Chen Qian @ University of Kentucky
4
Solution 1

A dedicated special value for each domain (type)


GPA cannot be –1, so use –1 as a special value to indicate
a missing or invalid GPA
Leads to incorrect answers if not careful


Complicates applications


SELECT AVG(GPA) FROM Student;
SELECT AVG(GPA) FROM Student
WHERE GPA <> -1;
Remember the Y2K bug?

7/18/2015
“00” was used as a missing or invalid year value
Chen Qian @ University of Kentucky
5
Solution 2

A valid-bit for every column


Student (SID, name, name_is_valid,
age, age_is_valid,
GPA, GPA_is_valid)
Complicates schema and queries

7/18/2015
SELECT AVG(GPA) FROM Student
WHERE GPA_is_valid;
Chen Qian @ University of Kentucky
6
Solution 3?

Decompose the table; missing row = missing value




StudentName (SID, name)
StudentAge (SID, age)
StudentGPA (SID, GPA)
StudentID (SID)
Conceptually the cleanest solution
Still complicates schema and queries


7/18/2015
How to get all information about a student in a table?
Would natural join work?
Chen Qian @ University of Kentucky
7
SQL’s solution

A special value NULL



For every domain
Special rules for dealing with NULL’s
Example: Student (SID, name, age, GPA)

< 789, “Nelson”, NULL, NULL >
7/18/2015
Chen Qian @ University of Kentucky
8
Three-valued logic





TRUE = 1, FALSE = 0, UNKNOWN = 0.5
x AND y = min(x, y)
x OR y = max(x, y)
NOT x = 1 – x
AND
True
True
True
False
False
NULL
UNK
False
NULL
False False
UNK False
AND
False
UNK
OR
True
True
True
False
False
NULL
TRUE
False False False
NULL TRUE UNK
OR
UNK
UNK
WHERE and HAVING clauses only select rows for output if the
condition evaluates to TRUE

UNKNOWN is not enough
7/18/2015
Chen Qian @ University of Kentucky
9
Computing with NULL’s



(Arithmetic operation) when we operate on a NULL and
another value (including another NULL) using +, –, etc.,
the result is NULL
Aggregate functions ignore NULL, except COUNT(*)
(since it counts rows)
When we compare a NULL with another value
(including another NULL) using =, >, etc., the result is
UNKNOWN
7/18/2015
Chen Qian @ University of Kentucky
10
Unfortunate consequences

SELECT AVG(GPA) FROM Student


SELECT SUM(GPA)/COUNT(*) FROM Student;



3.4
2.72
SELECT * FROM Student;
SELECT * FROM Student WHERE GPA = GPA

Not equivalent
7/18/2015
sid
name
age
gpa
1234
John Smith
21
3.5
1123
Mary Carter
19
3.8
1011
Bob Lee
22
NULL
1204
Susan Wong
22
3.4
1306
Kevin Kim
18
2.9
Chen Qian @ University of Kentucky
11
Another problem

Example: Who has NULL GPA values?

SELECT * FROM Student WHERE GPA =
NULL;


(SELECT * FROM Student)
EXCEPT ALL
(SELECT * FROM Student WHERE GPA =
GPA)


Does not work; never returns anything
Works, but ugly
Introduced built-in predicates IS NULL and IS NOT
NULL

7/18/2015
SELECT * FROM Student WHERE GPA IS NULL;
Chen Qian @ University of Kentucky
12
Outerjoin motivation

Example: a master class list

SELECT c.CID, s.SID
FROM Enroll e, Student s, Course c
WHERE e.SID = s.SID and c.CID = e.CID;

What if a student take no classes


What if a course with no student enrolled yet?


For these students, CID column should be NULL
For these courses, SID should be NULL
Therefore the above query does not return the complete
list!
7/18/2015
Chen Qian @ University of Kentucky
13
Outerjoin

SELECT * FROM R FULL OUTER JOIN S ON p;

A full outer join between R and S (denoted R
includes all rows in the result of R pS, plus


S)
“Dangling” R rows (those that do not join with any S
rows) padded with NULL’s for S’s columns
“Dangling” S rows (those that do not join with any R
rows) padded with NULL’s for R’s columns
7/18/2015
Chen Qian @ University of Kentucky
14
Outerjoin (II)

SELECT * FROM R LEFT

SELECT * FROM R RIGHT OUTER JOIN S ON p;

A left outer join includes rows in R S plus dangling R rows
padded with NULL’s
A right outer join includes rows in R S plus dangling S rows
padded with NULL’s

7/18/2015
OUTER JOIN S ON p;;
Chen Qian @ University of Kentucky
15
Outerjoin examples
SELECT * FROM Employee LEFT OUTER
JOIN Department ON Eid = Mid
Employee
Eid
Name
Eid
Name
Did
Mid
Dname
1123
John Smith
1123
John Smith
4
1123
Research
1234
Mary Carter
1234
Mary Carter
5
1234
Finance
1311
Bob Lee
1311
Bob Lee
NULL
NULL
NULL
SELECT * FROM Employee FULL
RIGHTOUTER
OUTER
JOIN Department ON Eid = Mid
Department
Did
Mid
Dname
Eid
Name
Did
Mid
Dname
4
1123
Research
1123
John Smith
4
1123
Research
5
1234
Finance
1234
Mary Carter
5
1234
Finance
6
1312
HR
NULL
1311
Bob
NULL
Lee
NULL
6
NULL
1312
NULL
HR
NULL
NULL
6
1312
HR
7/18/2015
Chen Qian @ University of Kentucky
16
Summary

Query







SELECT-FROM-WHERE statements
Ordering
Set and bag operations
Aggregation and grouping
Table expressions, subqueries
NULL
Outerjoins
7/18/2015
Chen Qian @ University of Kentucky
17
Exercise


Goal: Get the Name of one’s supervisor
EMPLOYEE (Eid, Mid, Name)
SQL Statement:
SELECT EID, Name FROM EMPLOYEE WHERE MId
=Eid;
SELECT E1.EID, E2.Name FROM EMPLOYEE E1,
EMPLOYEE E2 WHERE E1.MId = E2.EId;
Eid
Mid
Name
Eid
Name
1123
1234
John Smith
1123
Mary Carter
1234
1311
Mary Carter
1234
Bob Lee
1311
1611
Bob Lee
1311
Jack Snow
1455
1611
Lisa Wang
1455
Jack Snow
1611
1611
Jack Snow
1611
Jack Snow
7/18/2015
Chen Qian @ University of Kentucky
18

SELECT DISTINCT E2.Name FROM EMPLOYEE
E1, EMPLOYEE E2 WHERE E1.MId = E2.EId;
Name
Mary Carter
Bob Lee
Jack Snow
7/18/2015
Chen Qian @ University of Kentucky
19
More on Nested Queries: Scoping

To find out which table a column belongs to



Start with the immediately surrounding query
If not found, look in the one surrounding that; repeat if
necessary
Use table_name.column_name notation and AS
(renaming) to avoid confusion
7/18/2015
Chen Qian @ University of Kentucky
20
An Example
SELECT * FROM Student s
WHERE EXISTS
(SELECT * FROM Enroll e
WHERE SID = s.SID
AND EXISTS
(SELECT * FROM Enroll
WHERE SID = s.SID
AND CID <> e.CID));
Students who are taking at least two courses
7/18/2015
Chen Qian @ University of Kentucky
21
Quantified subqueries


A quantified subquery can be used as a value in a WHERE
condition
Universal quantification (for all):
… WHERE x op ALL (subquery) …


True iff for all t in the result of subquery, x op t
Existential quantification (exists):
… WHERE x op ANY (subquery) …


True iff there exists some t in the result of subquery such that x op t
Beware


7/18/2015
In common parlance, “any” and “all” seem to be synonyms
In SQL, ANY really means “some”
Chen Qian @ University of Kentucky
22
Examples of quantified subqueries

Which employees have the highest salary?
Employee (Sid, Name, Salary)
 SELECT *
FROM Employee
WHERE Salary >= ALL
(SELECT Salary FROM Employee);

How about the lowest salary?

SELECT *
FROM Employee
WHERE Salary <= ALL
(SELECT salary FROM Employee);
7/18/2015
Chen Qian @ University of Kentucky
23
More ways of getting the highest Salary

Who has the highest Salary?


SELECT * FROM Employee e
WHERE NOT EXISTS
(SELECT * FROM Employee
WHERE Salary > e.Salary);
SELECT * FROM Employee
WHERE Eid NOT IN
(SELECT e1.SID
FROM Employee e1, Employee e2
WHERE e1.Salary < e2.Salary);
7/18/2015
Chen Qian @ University of Kentucky
24
Nested Queries

Nested queries do not add expression power to SQL



For convenient
Write intuitive SQL queries
Can always use SQL queries without nesting to complete
the same task (though sometime it is hard)
7/18/2015
Chen Qian @ University of Kentucky
25
More Exercise
Sailors (sid: INTEGER, sname: string, rating: INTEGER, age:
REAL)
Boats (bid: INTEGER, bname: string, color: string)
Reserves (sid: INTEGER, bid: INTEGER)
Sailors
Boats
sid
1
sname
Fred
2
3
Jim
2
Nancy 8
Reserves
7/18/2015
rating
7
sid
bid
1
2
102
102
age
22
bid
101
bname
Nina
color
red
39
27
102
103
Pinta
Santa
Maria
blue
red
Chen Qian @ University of Kentucky
26
Exercise I



Find sid’s of sailors who’ve reserved a red AND a green boat
SELECT R1.sid
FROM Boats B1, Reserves R1, Boats B2, Reserves R2
WHERE B1.color=‘red’ AND B2.color=‘green’
AND R1.bid=B1.bid AND R2.bid=B2.bid
AND R1.sid=R2.sid
SELECT sid
FROM Boats, Reserves
Where B.color = ‘red’
INTERSECT
(SELECT sid
FROM Boats, Reserves
Where B.color = ‘green’)
7/18/2015
Chen Qian @ University of Kentucky
27
Exercise II

Find sid’s of sailors who have not reserved a boat
SELECT sid
Non-monotonic operation!
FROM Sailors
EXCEPT
(SELECT S.sid
FROM Sailors S, Reserves R
WHERE S.sid=R.sid)
7/18/2015
Chen Qian @ University of Kentucky
28
Exercise III (a tough one)

Find sailors who’ve reserved all boats
Non-monotonic operation!
SELECT Sid
FROM Sailor
#All sailors
EXCEPT
#Those who do not reserve all
SELECT Sid
boats
FROM
#All possible combinations
(SELECT bid, sid
between sid and bid
FROM Boat, Sailor
EXCEPT
#Existing reservations
Reserves)
7/18/2015
Chen Qian @ University of Kentucky
29
Summary of SQL features
covered so far

SELECT-FROM-WHERE statements


Set and bag operations
Ordering
Aggregation and grouping
Table expressions, subqueries
NULL’s and outerjoins

Next: data modification statements, constraints



7/18/2015
Chen Qian @ University of Kentucky
30
Review: Create Table in SQL

An SQL relation is defined using the create table
command:
CREATE TABLE r (A1 D1, A2 D2, ..., An Dn,
(integrity-constraint1),
...,
(integrity-constraintk))



r is the name of the relation
each Ai is an attribute name in the schema of relation r
Di is the data type of values in the domain of attribute Ai
7/18/2015
Chen Qian @ Univ. of Kentucky
31
Constrains in SQL

Attribute constraint



Key constraint



NOT NULL
DEFAULT <value>
PRIMARY KEY
UNIQUE
Referential Integrity

FOREIGN KEY
7/18/2015
Chen Qian @ Univ. of Kentucky
32
Constraints in Table Declaration

Restrictions on allowable data in a database




In addition to the simple structure and type restrictions
imposed by the table definitions
Declared as part of the schema
Enforced by the DBMS
Why use constraints?


Protect data integrity (catch errors)
Tell the DBMS about the data (so it can optimize better)
7/18/2015
Chen Qian @ Univ. of Kentucky
33
Types of SQL constraints



Key constraint
NOT NULL
Referential integrity (foreign key)
7/18/2015
Chen Qian @ Univ. of Kentucky
34
Key declaration

At most one PRIMARY KEY per table



Typically implies a primary index
Rows are stored inside the index, typically sorted by the
primary key value ( best speedup for queries)
Any number of UNIQUE keys per table


Typically implies a secondary index
To speedup queries
7/18/2015
Chen Qian @ Univ. of Kentucky
35
NOT NULL

Indicates that the value of an attribute can not be NULL
7/18/2015
Chen Qian @ Univ. of Kentucky
36
Examples



CREATE TABLE Student
(SID INTEGER NOT NULL PRIMARY KEY,
name VARCHAR(30) NOT NULL,
Used for query
email VARCHAR(30) UNIQUE,
age INTEGER,
processing
GPA FLOAT);
CREATE TABLE Course
(CID CHAR(10) NOT NULL PRIMARY KEY,
title VARCHAR(100) NOT NULL);
CREATE TABLE Enroll
(SID INTEGER NOT NULL,
CID CHAR(10) NOT NULL,
PRIMARY KEY(SID, CID));
This form is required for multi-attribute keys
7/18/2015
Chen Qian @ Univ. of Kentucky
37
Referential integrity example



Enroll.SID references Student.SID
 If an SID appears in Enroll, it must appear in Student
Enroll.CID references Course.CID
 If a CID appears in Enroll, it must appear in Course
That is, no “dangling pointers”
7/18/2015
Chen Qian @ Univ. of Kentucky
38
Referential integrity in SQL

Referenced column(s) must be PRIMARY KEY
Referencing column(s) form a FOREIGN KEY

Example


CREATE TABLE Enroll
(SID INTEGER NOT NULL,
CID CHAR(10) NOT NULL,
PRIMARY KEY(SID, CID),
FOREIGN KEY CID REFERENCES
Course(CID)
FOREIGN KEY SID REFERENCES
Student(SID));
7/18/2015
Chen Qian @ Univ. of Kentucky
39
Enforcing referential integrity
Example: Enroll.SID references Student.SID
 Insert or update an Enroll row so it refers to a nonexistent SID


Reject
Delete or update a Student row whose SID is referenced
by some Enroll row





Reject
Cascade: ripple changes to all referring rows
Set NULL: set all references to NULL
Set DEFAULT: use the default value of the attribute
All four options can be specified in SQL
7/18/2015
Chen Qian @ Univ. of Kentucky
40

Quiz and homework due at the beginning of next class!
7/18/2015
Chen Qian @ University of Kentucky
41