CS186: Introduction to Database Systems

Download Report

Transcript CS186: Introduction to Database Systems

EECS 647: Introduction to
Database Systems
Instructor: Luke Huan
Spring 2007
Administrative

Final project is posted at the class website.

Go through it and we will discuss the final project at next
class meeting
7/20/2015
Luke Huan Univ. of Kansas
2
Review







SFW statement:
Sort the outputs:
USE bag operation:
Enforce set operation:
Output average:
Partition rows in a table:
Select certain groups:
7/20/2015
ORDER BY
Default choice
DISTINCT
AVG (aggregation function)
GROUP BY
HAVING
Luke Huan Univ. of Kansas
3
Review

Nested queries




Treat as a table
Treat as a scalar
IN
EXIST
7/20/2015
Luke Huan Univ. of Kansas
4
Today’s Topic




NULL value
Outer join
Nested queries
Table creation and constraints
7/20/2015
Luke Huan Univ. of Kansas
5
Incomplete information


Example: Student (SID, name, age, GPA)
Value unknown


We do not know Nelson’s age
Value not applicable

Nelson has not taken any classes yet; what is his GPA?
7/20/2015
Luke Huan Univ. of Kansas
6
Solution 1

A dedicated special value for each domain (type)


GPA cannot be –1, so use –1 as a special value to indicate
a missing or invalid GPA
Leads to incorrect answers if not careful


Complicates applications


SELECT AVG(GPA) FROM Student;
SELECT AVG(GPA) FROM Student
WHERE GPA <> -1;
Remember the Y2K bug?

7/20/2015
“00” was used as a missing or invalid year value
Luke Huan Univ. of Kansas
7
Solution 2

A valid-bit for every column


Student (SID, name, name_is_valid,
age, age_is_valid,
GPA, GPA_is_valid)
Complicates schema and queries

7/20/2015
SELECT AVG(GPA) FROM Student
WHERE GPA_is_valid;
Luke Huan Univ. of Kansas
8
Solution 3?

Decompose the table; missing row = missing value




StudentName (SID, name)
StudentAge (SID, age)
StudentGPA (SID, GPA)
StudentID (SID)
Conceptually the cleanest solution
Still complicates schema and queries


7/20/2015
How to get all information about a student in a table?
Would natural join work?
Luke Huan Univ. of Kansas
9
SQL’s solution

A special value NULL



For every domain
Special rules for dealing with NULL’s
Example: Student (SID, name, age, GPA)

h 789, “Nelson”, NULL, NULL i
7/20/2015
Luke Huan Univ. of Kansas
10
Three-valued logic





TRUE = 1, FALSE = 0, UNKNOWN = 0.5
x AND y = min(x, y)
x OR y = max(x, y)
NOT x = 1 – x
AND
True
True
True
False
False
NULL
UNK
False
NULL
False False
UNK False
AND
False
UNK
OR
True
True
True
False
False
NULL
TRUE
False False False
NULL TRUE UNK
OR
UNK
UNK
WHERE and HAVING clauses only select rows for output if the
condition evaluates to TRUE

UNKNOWN is not enough
7/20/2015
Luke Huan Univ. of Kansas
11
Computing with NULL’s



(Arithmetic operation) when we operate on a NULL and
another value (including another NULL) using +, –, etc.,
the result is NULL
Aggregate functions ignore NULL, except COUNT(*)
(since it counts rows)
When we compare a NULL with another value
(including another NULL) using =, >, etc., the result is
UNKNOWN
7/20/2015
Luke Huan Univ. of Kansas
12
Unfortunate consequences

SELECT AVG(GPA) FROM Student


SELECT SUM(GPA)/COUNT(*) FROM Student;



3.4
2.72
SELECT * FROM Student;
SELECT * FROM Student WHERE GPA = GPA

Not equivalent
7/20/2015
sid
name
age
gpa
1234
John Smith
21
3.5
1123
Mary Carter
19
3.8
1011
Bob Lee
22
NULL
1204
Susan Wong
22
3.4
1306
Kevin Kim
18
2.9
Luke Huan Univ. of Kansas
13
Another problem

Example: Who has NULL GPA values?

SELECT * FROM Student WHERE GPA =
NULL;


(SELECT * FROM Student)
EXCEPT ALL
(SELECT * FROM Student WHERE GPA =
GPA)


Does not work; never returns anything
Works, but ugly
Introduced built-in predicates IS NULL and IS NOT
NULL

7/20/2015
SELECT * FROM Student WHERE GPA IS NULL;
Luke Huan Univ. of Kansas
14
Outerjoin motivation

Example: a master class list

SELECT c.CID, s.SID
FROM Enroll e, Student s, Course c
WHERE e.SID = s.SID and c.CID = e.CID;

What if a student take no classes


For these students, CID column should be NULL
What if a course with no student enrolled yet?

7/20/2015
For these courses, SID should be NULL
Luke Huan Univ. of Kansas
15
Outerjoin

SELECT * FROM R FULL OUTER JOIN S ON p;

A full outer join between R and S (denoted R !
includes all rows in the result of R !pS, plus


S)
“Dangling” R rows (those that do not join with any S
rows) padded with NULL’s for S’s columns
“Dangling” S rows (those that do not join with any R
rows) padded with NULL’s for R’s columns
7/20/2015
Luke Huan Univ. of Kansas
16
Outerjoin (II)

SELECT * FROM R LEFT

SELECT * FROM R RIGHT OUTER JOIN S ON p;

A left outer join (R ! S) includes rows in R ! S plus dangling
R rows padded with NULL’s
A right outer join (R ! S) includes rows in R ! S plus dangling
S rows padded with NULL’s

7/20/2015
OUTER JOIN S ON p;;
Luke Huan Univ. of Kansas
17
Outerjoin examples
SELECT * FROM Employee LEFT OUTER
JOIN Department ON Eid = Mid
Employee
Eid
Name
Eid
Name
Did
Mid
Dname
1123
John Smith
1123
John Smith
4
1123
Research
1234
Mary Carter
1234
Mary Carter
5
1234
Finance
1311
Bob Lee
1311
Bob Lee
NULL
NULL
NULL
SELECT * FROM Employee FULL
RIGHTOUTER
OUTER
JOIN Department ON Eid = Mid
Department
Did
Mid
Dname
Eid
Name
Did
Mid
Dname
4
1123
Research
1123
John Smith
4
1123
Research
5
1234
Finance
1234
Mary Carter
5
1234
Finance
6
1312
HR
NULL
1311
Bob
NULL
Lee
NULL
6
NULL
1312
NULL
HR
NULL
NULL
6
1312
HR
7/20/2015
Luke Huan Univ. of Kansas
18
Summary

Query







SELECT-FROM-WHERE statements
Ordering
Set and bag operations
Aggregation and grouping
Table expressions, subqueries
NULL
Outerjoins
7/20/2015
Luke Huan Univ. of Kansas
19
Exercise


Goal: Get the Name of one’s supervisor
EMPLOYEE (Eid, Mid, Name)
SQL Statement:
SELECT EID, Name FROM EMPLOYEE WHERE MId
=Eid;
SELECT E1.EID, E2.Name FROM EMPLOYEE E1,
EMPLOYEE E2 WHERE E1.MId = E2.EId;
Eid
Mid
Name
Eid
Name
1123
1234
John Smith
1123
Mary Carter
1234
1311
Mary Carter
1234
Bob Lee
1311
1611
Bob Lee
1311
Jack Snow
1455
1611
Lisa Wang
1455
Jack Snow
1611
1611
Jack Snow
1611
Jack Snow
7/20/2015
Luke Huan Univ. of Kansas
20
Exercise





Goal: group employees according to their department, for each
department, list EIds of its employees and list the head count
EMPLOYEE (Eid, Name), DEPARTMENT(Eid, Did)
SQL Statement:
SELECT Did, Eid, COUNT(Eid)
FROM EMPLOYEE e, DEPARTMENT d
WHERE e.Eid = d.Eid
GROUP BY Did
There is something wrong!
Wishful thinking (list a group of Eid after grouping) won’t work
Solution: produce (1) a summary table listing Did and head count,
and (2) sort Department according to Did.
7/20/2015
Luke Huan Univ. of Kansas
21
More on Nested Queries: Scoping

To find out which table a column belongs to



Start with the immediately surrounding query
If not found, look in the one surrounding that; repeat if
necessary
Use table_name.column_name notation and AS
(renaming) to avoid confusion
7/20/2015
Luke Huan Univ. of Kansas
22
An Example
SELECT * FROM Student s
WHERE EXISTS
(SELECT * FROM Enroll e
WHERE SID = s.SID
AND EXISTS
(SELECT * FROM Enroll
WHERE SID = s.SID
AND CID <> e.CID));
Students who are taking at least two courses
7/20/2015
Luke Huan Univ. of Kansas
23
Quantified subqueries


A quantified subquery can be used as a value in a WHERE
condition
Universal quantification (for all):
… WHERE x op ALL (subquery) …


True iff for all t in the result of subquery, x op t
Existential quantification (exists):
… WHERE x op ANY (subquery) …


True iff there exists some t in the result of subquery such that x op t
Beware


7/20/2015
In common parlance, “any” and “all” seem to be synonyms
In SQL, ANY really means “some”
Luke Huan Univ. of Kansas
24
Examples of quantified subqueries

Which employees have the highest salary?
Employee (Sid, Name, Salary)
 SELECT *
FROM Employee
WHERE Salary >= ALL
(SELECT Salary FROM Employee);

How about the lowest salary?

SELECT *
FROM Employee
WHERE Salary <= ALL
(SELECT salary FROM Employee);
7/20/2015
Luke Huan Univ. of Kansas
25
More ways of getting the highest Salary

Who has the highest Salary?


SELECT * FROM Employee e
WHERE NOT EXISTS
(SELECT * FROM Employee
WHERE Salary > e.Salary);
SELECT * FROM Employee
WHERE Eid NOT IN
(SELECT e1.SID
FROM Employee e1, Employee e2
WHERE e1.Salary < e2.Salary);
7/20/2015
Luke Huan Univ. of Kansas
26
Nested Queries

Nested queries do not add expression power to SQL



For convenient
Write intuitive SQL queries
Can always use SQL queries without nesting to complete
the same task (though sometime it is hard)
7/20/2015
Luke Huan Univ. of Kansas
27
More Exercise
Sailors (sid: INTEGER, sname: string, rating: INTEGER, age:
REAL)
Boats (bid: INTEGER, bname: string, color: string)
Reserves (sid: INTEGER, bid: INTEGER)
Sailors
Boats
sid
1
sname
Fred
2
3
Jim
2
Nancy 8
Reserves
7/20/2015
rating
7
sid
bid
1
2
102
102
age
22
bid
101
bname
Nina
color
red
39
27
102
103
Pinta
Santa
Maria
blue
red
Luke Huan Univ. of Kansas
28
Exercise I



Find sid’s of sailors who’ve reserved a red AND a green boat
SELECT R1.sid
FROM Boats B1, Reserves R1, Boats B2, Reserves R2
WHERE B1.color=‘red’ AND B2.color=‘green’
AND R1.bid=B1.bid AND R2.bid=B2.bid
AND R1.sid=R2.sid
SELECT sid
FROM Boats, Reserves
Where B.color = ‘red’
INTERSECT
(SELECT sid
FROM Boats, Reserves
Where B.color = ‘green’)
7/20/2015
Luke Huan Univ. of Kansas
29
Exercise II

Find sid’s of sailors who have not reserved a boat
SELECT sid
Non-monotonic operation!
FROM Sailors
EXCEPT
(SELECT S.sid
FROM Sailors S, Reserves R
WHERE S.sid=R.sid)
7/20/2015
Luke Huan Univ. of Kansas
30
Exercise III (a tough one)

Find sailors who’ve reserved all boats
Non-monotonic operation!
SELECT Sid
FROM Sailor
#All sailors
EXCEPT
#Those who do not reserve all
SELECT Sid
boats
FROM
#All possible combinations
(SELECT bid, sid
between sid and bid
FROM Boat, Sailor
EXCEPT
#Existing reservations
Reserves)
7/20/2015
Luke Huan Univ. of Kansas
31
Summary of SQL features
covered so far

SELECT-FROM-WHERE statements


Set and bag operations
Ordering
Aggregation and grouping
Table expressions, subqueries
NULL’s and outerjoins

Next: data modification statements, constraints



7/20/2015
Luke Huan Univ. of Kansas
32
Relational Query Languages

Two sublanguages:
 DDL – Data Definition Language


DML – Data Manipulation Language



Define and modify schema
Queries can be written intuitively.
Insert, delete, update rows in a table
DBMS is responsible for efficient evaluation.



The key: precise semantics for relational queries.
Optimizer can re-order operations, without affecting
query answer.
Choices driven by “cost model”
Constraints in Table Declaration

Restrictions on allowable data in a database




In addition to the simple structure and type restrictions
imposed by the table definitions
Declared as part of the schema
Enforced by the DBMS
Why use constraints?


Protect data integrity (catch errors)
Tell the DBMS about the data (so it can optimize better)
7/20/2015
Luke Huan Univ. of Kansas
34
Types of SQL constraints



Key constraint
NOT NULL
Referential integrity (foreign key)
7/20/2015
Luke Huan Univ. of Kansas
35
Key declaration

At most one PRIMARY KEY per table



Typically implies a primary index
Rows are stored inside the index, typically sorted by the
primary key value ) best speedup for queries
Any number of UNIQUE keys per table


Typically implies a secondary index
To speedup queries
7/20/2015
Luke Huan Univ. of Kansas
36
NOT NULL

Indicates that the value of an attribute can not be NULL
7/20/2015
Luke Huan Univ. of Kansas
37
Examples



CREATE TABLE Student
(SID INTEGER NOT NULL PRIMARY KEY,
name VARCHAR(30) NOT NULL,
Used for query
email VARCHAR(30) UNIQUE,
age INTEGER,
processing
GPA FLOAT);
CREATE TABLE Course
(CID CHAR(10) NOT NULL PRIMARY KEY,
title VARCHAR(100) NOT NULL);
CREATE TABLE Enroll
(SID INTEGER NOT NULL,
CID CHAR(10) NOT NULL,
PRIMARY KEY(SID, CID));
This form is required for multi-attribute keys
7/20/2015
Luke Huan Univ. of Kansas
38
Referential integrity example



Enroll.SID references Student.SID
 If an SID appears in Enroll, it must appear in Student
Enroll.CID references Course.CID
 If a CID appears in Enroll, it must appear in Course
That is, no “dangling pointers”
7/20/2015
Luke Huan Univ. of Kansas
39
Referential integrity in SQL

Referenced column(s) must be PRIMARY KEY
Referencing column(s) form a FOREIGN KEY

Example


CREATE TABLE Enroll
(SID INTEGER NOT NULL,
CID CHAR(10) NOT NULL,
PRIMARY KEY(SID, CID),
FOREIGN KEY CID REFERENCES
Course(CID)
FOREIGN KEY SID REFERENCES
Student(SID));
7/20/2015
Luke Huan Univ. of Kansas
40
Enforcing referential integrity
Example: Enroll.SID references Student.SID
 Insert or update an Enroll row so it refers to a nonexistent SID


Reject
Delete or update a Student row whose SID is referenced
by some Enroll row





Reject
Cascade: ripple changes to all referring rows
Set NULL: set all references to NULL
Set DEFAULT: use the default value of the attribute
All four options can be specified in SQL
7/20/2015
Luke Huan Univ. of Kansas
41
SQL Example

CREATE TABLE Enroll
(SID INTEGER NOT NULL,
CID CHAR(10) NOT NULL,
PRIMARY KEY(SID, CID),
FOREIGN KEY CID REFERENCES
Course(CID) CASCADE ON DELETE
FOREIGN KEY SID REFERENCES
Student(SID) CASCADE ON DELETE);
Delete all student enrollments for a canceled class
Delete all student enrollments for a dropped student
7/20/2015
Luke Huan Univ. of Kansas
42
Database Modification Operations



INSERT
DELETE
UPDATE
7/20/2015
Luke Huan Univ. of Kansas
43
INSERT

Insert one row

INSERT INTO Enroll VALUES (456,
’EECS647’);


Student 456 takes EECS647
Insert the result of a query

INSERT INTO Enroll
(SELECT SID, ’EECS647’ FROM Student
WHERE SID NOT IN (SELECT SID FROM
Enroll
WHERE CID = ’EECS647’));

7/20/2015
Luke Huan
Univ. of Kansas
Force everybody to take
EECS647!
44
DELETE

Delete everything


DELETE FROM Enroll;
Delete according to a WHERE condition
Example: Student 456 drops CPS116

DELETE FROM Enroll
WHERE SID = 456 AND CID = ’EECS647’;
Example: Drop students from all EECS classes with GPA lower
than 1.0

DELETE FROM Enroll
WHERE SID IN (SELECT SID FROM Student
WHERE GPA < 1.0)
AND CID LIKE ’EECS%’;
7/20/2015
Luke Huan Univ. of Kansas
45
UPDATE

Example: Student 142 changes name to “Barney”

UPDATE Student
SET name = ’Barney’
WHERE SID = 142;
7/20/2015
Luke Huan Univ. of Kansas
46
Summary of SQL features
covered so far

Query








Constraints
Modification


SELECT-FROM-WHERE statements
Set and bag operations
Table expressions, subqueries
Aggregation and grouping
Ordering
Outerjoins
INSERT/DELETE/UPDATE
Next: triggers, views, indexes
7/20/2015
Luke Huan Univ. of Kansas
47