CS186: Introduction to Database Systems

Download Report

Transcript CS186: Introduction to Database Systems

EECS 647: Introduction to
Database Systems
Instructor: Luke Huan
Spring 2007
p
Administrative

Take home background survey is assigned.



Due Feb 5th at the beginning of class meeting time.
Be on time.
Your PostgreSQL passwd has been sent to you by email.


Do not change your passwd.
Go though the on-line tutorial (which will be your first
homework)
7/21/2015
Luke Huan Univ. of Kansas
2
Review
Informal Terms
Formal Terms
Table
Relation
Column
Attribute/Domain
Row
Tuple
Values in a column
Domain
Table Definition
Schema of a Relation
Populated Table
Extension
7/21/2015
Luke Huan Univ. of Kansas
3
Foreign Keys

Foreign key : Set of fields in one relation that is used to

`refer’ to a tuple in another relation. (Must correspond
to primary key of the second relation.) Like a `logical
pointer’.
E.g. sid is a foreign key referring to Students:


Student(sid: string, name: string, gpa: float)
Enrolled(sid: string, cid: string, grade: string)
Referential Integrity

Referential integrity

The value of a foreign key in a tuple can either be




A existing value from the corresponding primary key
Or , NULL
If all foreign key constraints are enforced, referential
integrity is achieved, i.e., no dangling references.
Can you name a data model w/o referential integrity?

Links in HTML!
7/21/2015
Luke Huan Univ. of Kansas
5
Example of Referential Integrity
Enrolled
sid
53666
53666
53650
53666
cid
grade
Carnatic101
C
Reggae203
B
Topology112
A
History105
B
Students
sid
53666
53688
53650
name
login
Jones jones@cs
Smith smith@eecs
Smith smith@math
age
18
18
19
gpa
3.4
3.2
3.8
11111 English102 A
7/21/2015
Luke Huan Univ. of Kansas
6
How to Identify Foreign Keys

Step 1: list the schemas:



Student(sid: string, name: string, gpa: float)
Enrolled(sid: string, cid: string, grade: string)
Step 2: list the primary keys


Student(sid: string, name: string, gpa: float)
Enrolled(sid: string, cid: string, grade: string)
How to Identify Foreign Keys (cont.)

Step 3: Identify as foreign key where a primary key in
relation A is used in another relation B




A is a referenced relation
B is the referencing relation
The primary key in A is the foreign key in B
Step 4: Use directed line to connect B to A

Student(sid: string, name: string, gpa: float)

Enrolled(sid: string, cid: string, grade: string)

7/21/2015
Sid in Enrolled is the foreign key in Student
Luke Huan Univ. of Kansas
8
In-Class Exercise
(Taken from Exercise 5.16)
Consider the following relations for a database that keeps track of
student enrollment in courses and the books adopted for each
course:
STUDENT(SSN, Name, Major, Bdate)
COURSE(Course#, Cname, Dept)
ENROLL(SSN, Course#, Quarter, Grade)
BOOK_ADOPTION(Course#, Quarter, Book_ISBN)
TEXT(Book_ISBN, Book_Title, Publisher, Author)
Draw a relational schema diagram specifying the foreign keys
for
this schema.
7/21/2015
Luke Huan Univ. of Kansas
9
Constrains Summary

Attributes value must come from its domain


Every relation mush have a primary key


Key constraint
The primary key value in a tuple can not be NULL


Domain constraint
Entity integrity
Referential integrity

The foreign key value in a referenced tuple must exist or
the foreign key value in the referencing tuple is NULL
7/21/2015
Luke Huan Univ. of Kansas
10
Additional Types of Constraints

Semantic Integrity Constraints:




7/21/2015
based on application semantics and cannot be
expressed by the model per se
e.g., “the max. no. of hours per employee for all
projects he or she works on is 56 hrs per week”
A constraint specification language may have to be
used to express these
SQL-99 allows triggers and ASSERTIONS to allow
for some of these
Luke Huan Univ. of Kansas
11
Update Operations on Relations

Update operations




INSERT a tuple.
DELETE a tuple.
MODIFY a tuple.
Constraints should not be violated in updates
7/21/2015
Luke Huan Univ. of Kansas
12
Example

We have the following relational schemas





Student(sid: string, name: string, gpa: float)
Course(cid: string, department: string)
Enrolled(sid: string, cid: string, grade: character)
We have the following sequence of database update
operations. (assume all tables are empty before we apply
any operations)
INSERT<‘1234’, ‘John Smith’, ‘3.5> into Student
sid
1234
7/21/2015
name
John Smith
Luke Huan Univ. of Kansas
gpa
3.5
13
Example (Cont.)




INSERT<‘647’,
‘EECS’> into Courses
INSERT<‘1234’,
‘647’, ‘B’> into
Enrolled
UPDATE the grade in
the Enrolled tuple with
sid = 1234 and cid =
647 to ‘A’.
DELETE the Enrolled
tuple with sid 1234
and cid 647
7/21/2015
sid
1234
name
John Smith
cid
647
department
EECS
sid
cid
grade
1234
647
A
B
Luke Huan Univ. of Kansas
gpa
3.5
14
Exercise



INSERT<‘108’,
‘MATH’> into
Courses
INSERT<‘1234’,
‘108’, ‘B’> into
Enrolled
INSERT<‘1123’,
‘Mary Carter’, ‘3.8’>
into Student
7/21/2015
sid
1234
name
John Smith
gpa
3.5
1123
Mary Carter
3.8
cid
647
108
department
EECS
MATH
sid
1234
cid
108
Luke Huan Univ. of Kansas
grade
B
15
Exercise (cont.)


A little bit tricky
INSERT<‘1125’, ‘Bob
Lee’, ‘good’> into
Student


INSERT<‘1123’,
NULL, ‘B’> into
Enrolled


Fail due to domain
constraint
name
John Smith
gpa
3.5
1123
Mary Carter
3.8
cid
647
department
EECS
108
MATH
sid
1234
cid
108
Fail due to entity
integrity
INSERT
<‘1233’,’647’, ‘A’>
into Enrolled

sid
1234
grade
B
Failed due to
referential integrity
7/21/2015
Luke Huan Univ. of Kansas
16
Exercise (cont.)


A more tricky one
UPDATE the cid in the
tuple from Course
where cid = 108 to
109
7/21/2015
sid
1234
name
John Smith
gpa
3.5
1123
Mary Carter
3.8
cid
647
department
EECS
108
109
MATH
sid
1234
cid
108
109
Luke Huan Univ. of Kansas
grade
B
17
Update Operations on Relations

In case of integrity violation, several actions can
be taken:




7/21/2015
Cancel the operation that causes the violation
(REJECT option)
Perform the operation but inform the user of the
violation
Trigger additional updates so the violation is
corrected (CASCADE option, SET NULL option)
Execute a user-specified error-correction routine
Luke Huan Univ. of Kansas
18
Relational Query Languages


Query languages: Allow manipulation and retrieval of
data from a database.
Relational model supports simple, powerful QLs:



Strong formal foundation based on logic.
Allows for much optimization.
Query Languages != programming languages!


QLs not intended to be used for complex calculations
and inference (e.g. logical reasoning)
QLs support easy, efficient access to large data sets.
Formal Relational Query Languages
Two mathematical Query Languages form the basis for
“real” languages (e.g. SQL), and for implementation:
Relational Algebra: More operational, very useful for
representing execution plans.
Relational Calculus: Lets users describe what they
want, rather than how to compute it. (Nonprocedural, declarative.)
* Understanding Algebra & Calculus is key to
understanding SQL, query processing!
Relational algebra
A language for querying relational databases based on operators:
RelOp
RelOp

Core set of operators:


Additional, derived operators:


Selection, projection, cross product, union, difference, and renaming
Join, natural join, intersection, etc.
Compose operators to make complex queries
7/21/2015
Luke Huan Univ. of Kansas
21
Selection


Input: a table R
Notation: p R



p is called a selection condition/predicate
Purpose: filter rows according to some criteria
Output: same columns as R, but only rows of R that
satisfy p
7/21/2015
Luke Huan Univ. of Kansas
22
Selection example
Students with GPA higher than 3.0
GPA > 3.0 Student

sid
name
age
gpa
sid
name
age
gpa
1234
John Smith
21
3.5
1234
John Smith
21
3.5
1123
Mary Carter
22
3.8
1123
Mary Carter
22
3.8
1011
Bob Lee
22
2.6
1011
Bob Lee
22
2.6
1204
Susan Wong
22
3.4
1204
Susan Wong
22
3.4
1306
Kevin Kim
21
2.9
1306
Kevin Kim
21
2.9
7/21/2015
GPA > 3.0
Luke Huan Univ. of Kansas
23
More on selection

Selection predicate in general can include any column of
R, constants, comparisons (=, ·, etc.), and Boolean
connectives (: and, : or, and :: not)


Example: straight A students under 18 or over 21
GPA ¸ 4.0 Æ (age < 18 Ç age > 21) Student
But you must be able to evaluate the predicate over a
single row of the input table

Example: student with the highest GPA
GPA ¸ all GPA in Student table Student
7/21/2015
Luke Huan Univ. of Kansas
24
Projection


Input: a table R
Notation: ¼L R



L is a list of columns in R
Purpose: select columns to output
Output: same rows, but only the columns in L


Order of the rows is preserved
Number of rows may be less (depends on where we have
duplicates or not)
7/21/2015
Luke Huan Univ. of Kansas
25
Projection example

ID’s and names of all students
¼SID, name Student
sid
name
age
gpa
sid
name
1234
John Smith
21
3.5
1234
John Smith
1123
Mary Carter
22
3.8
1123
Mary Carter
1011
Bob Lee
22
2.6
1011
Bob Lee
1204
Susan Wong
22
3.4
1204
Susan Wong
1306
Kevin Kim
21
2.9
1306
Kevin Kim
7/21/2015
¼SID, name
Luke Huan Univ. of Kansas
26
More on projection

Duplicate output rows are removed (by definition)

Example: student ages
¼age Student
sid
name
age
gpa
age
1234
John Smith
21
3.5
21
1123
Mary Carter
22
3.8
22
1011
Bob Lee
22
2.6
1204
Susan Wong
22
3.4
1306
Kevin Kim
21
2.9
7/21/2015
¼age
Luke Huan Univ. of Kansas
22
22
21
27
Cross product




Input: two tables R and S
Notation: R £ S
Purpose: pairs rows from two tables
Output: for each row r in R and each row s in S, output a
row rs (concatenation of r and s)
7/21/2015
Luke Huan Univ. of Kansas
28
Cross product example

Student £ Enroll
sid
name
age
gpa
sid
cid
grade
1234
John Smith
21
3.5
1234
647
A
1123
Mary Carter
22
3.8
1123
108
A
1011
Bob Lee
22
2.6
7/21/2015
×
sid
name
age
gpa
sid
cid
grade
1234
John Smith
21
3.5
1234
647
A
1123
Mary Carter
22
3.8
1234
647
A
1011
Bob Lee
22
2.6
1234
647
A
1234
John Smith
21
3.5
1123
108
A
1123
Mary Carter
22
3.8
1123
108
A
1011
Bob Lee
22
2.6
1123
108
A
Luke Huan Univ. of Kansas
29
A note on column ordering


The ordering of columns in a table is considered
unimportant (as is the ordering of rows)
sid
name
age
gpa
sid
name
gpa
age
1234
John Smith
21
3.5
1234
John Smith
3.5
21
1123
Mary Carter
22
3.8
1123
Mary Carter
3.8
22
1011
Bob Lee
22
2.6
1011
Bob Lee
2.6
22
=
That means cross product is commutative, i.e.,
R £ S = S £ R for any R and S
7/21/2015
Luke Huan Univ. of Kansas
30
Derived operator: join


Input: two tables R and S
Notation: R !p S




p is called a join condition/predicate
Purpose: relate rows from two tables according to some
criteria
Output: for each row r in R and each row s in S, output a
row rs if r and s satisfy p
Shorthand for ¾p ( R £ S )
7/21/2015
Luke Huan Univ. of Kansas
31
Join example

Info about students, plus CID’s of their courses
Student !Student.SID = Enroll.SID Enroll
sid
name
age
gpa
sid
cid
grade
1234
John Smith
21
3.5
647
A
1123
Mary Carter
22
3.8
!
1234
1123
108
A
1011
Bob Lee
22
2.6
Student.SID =
Enroll.SID
Use table_name. column_name syntax
to disambiguate
sid
name
identically named 1234
John Smith
columns from
1123
Mary Carter
different input
1011
Bob Lee
tables
1234
John Smith
7/21/2015
age
gpa
sid
cid
grade
21
3.5
1234
647
A
22
3.8
1234
647
A
22
2.6
1234
647
A
21
3.5
1123
108
A
1123
Mary Carter
22
3.8
1123
108
A
1011
Bob Lee
22
2.6
1123
108
A
Luke Huan Univ. of Kansas
32
Derived operator: natural join



Input: two tables R and S
Notation: R S
Purpose: relate rows from two tables, and



Enforce equality on all common attributes
Eliminate one copy of common attributes
Shorthand for ¼L ( R !p S ), where


p equates all attributes common to R and S
L is the union of all attributes from R and S, with duplicate
attributes removed
7/21/2015
Luke Huan Univ. of Kansas
33
Natural join example

Student  Enroll = ¼L ( Student !p Enroll )
= ¼SID, name, age, GPA, CID ( Student !Student.SID = Enroll.SID Enroll )
sid
name
age
gpa
sid
cid
grade
1234
John Smith
21
3.5
1234
647
A
1123
Mary Carter
22
3.8
1123
108
A
1011
Bob Lee
22
2.6
7/21/2015

sid
name
age
gpa
sid
cid
grade
1234
John Smith
21
3.5
1234
647
A
1123
Mary Carter
22
3.8
1234
647
A
1011
Bob Lee
22
2.6
1234
647
A
1234
John Smith
21
3.5
1123
108
A
1123
Mary Carter
22
3.8
1123
108
A
1011
Bob Lee
22
2.6
1123
108
A
Luke Huan Univ. of Kansas
34
Union


Input: two tables R and S
Notation: R [ S


R and S must have identical schema
Output:


Has the same schema as R and S
Contains all rows in R and all rows in S, with duplicate
rows eliminated
7/21/2015
Luke Huan Univ. of Kansas
35
Difference


Input: two tables R and S
Notation: R ¡ S


R and S must have identical schema
Output:


Has the same schema as R and S
Contains all rows in R that are not found in S
7/21/2015
Luke Huan Univ. of Kansas
36
Derived operator: intersection


Input: two tables R and S
Notation: R \ S


Output:





R and S must have identical schema
Has the same schema as R and S
Contains all rows that are in both R and S
Shorthand for R ¡ ( R ¡ S )
Also equivalent to S ¡ ( S ¡ R )
And to R  S
7/21/2015
Luke Huan Univ. of Kansas
37
Renaming





Input: a table R
Notation: ½S R, ½(A1, A2, …) R or ½S(A1, A2, …) R
Purpose: rename a table and/or its columns
Output: a renamed table with the same rows as R
Used to


Avoid confusion caused by identical column names
Create identical columns names for natural joins
7/21/2015
Luke Huan Univ. of Kansas
38
Renaming Example

Enroll1(SID1, CID1,Grade1) Enroll
sid
cid
grade
1234
647
A
1123
108
A
7/21/2015
sid1
cid1
grade1
Enroll1(SID1,
1234
647
A
CID1,Grade1)
1123
108
A
Luke Huan Univ. of Kansas
39
Review: Summary of core operators






Selection:
Projection:
Cross product:
Union:
Difference:
Renaming:

¾p R
¼L R
R£S
R[S
R¡S
½ S(A1, A2, …) R
Does not really add
“processing” power
7/21/2015
Luke Huan Univ. of Kansas
40
Review Summary of derived operators

Join:
Natural join:
Intersection:

Many more




R !p S
RS
R\S
Outer join, Division,
Semijoin, anti-semijoin, …
7/21/2015
Luke Huan Univ. of Kansas
41