CS 157B Database Systems

Download Report

Transcript CS 157B Database Systems

CS 157B
Database Systems
Dr. T Y Lin
Delete old p37 and change old
p40.
CS 157B
Database Systems
Review Relational Algebra
2.4 An Algebraic Query Language
• The set of real numbers has a natural
algebraic structure.
• The set of relations (tables) has a natural
algebraic structure, called Relational
algebra
• For now, we will not look at it as a query
language
2.4 An Algebraic Query Language
•
•
•
•
•
•
•
•
•
•
•
•
•
•
2.4.1 Why Do We Need a Special Query Language?
2.4.2 What is an Algebra?
2.4.3 Overview of Relational Algebra
2.4.4 Set Operations on Relations
2.4.5 Projection
2.4.6 Selection
2.4.7 Cartesian Product
2.4.8 Natural Joins
2.4.9 Theta-Joins
2.4.10 Combining Operations to Form Queries
2.4.11 Naming and Renaming
2.4.12 Relationships Among Operations
2.4.13 A Linear Notation for Algebraic Expressions
2.4.14 Exercises for Section 2.4
2.4.1 Why Do We Need a Special
Query Language?
2.4.2 What is an Algebra?
2.4.3 Overview of Relational
Algebra
• A sample From CS157A Exam:
Choose appropriate names for the following SQL
SELECT
TITLE,LENGTH
FROM
Movie
Special Relational Operators
• Projection
• Selection
• Natural Join
Traditional Set Operators
4. Cartesian Product
5. Union
6. Intersect
7. Minus
2.4.3 Overview of Relational
Algebra
• NUMERICAL ALGEBRA handles
operations involving numbers:
• RELATIONAL ALGEBRA handles
operations among relations.
2.4.3 Overview of Relational
Algebra
+
*
/
Projection
Selection
Natural Join
Cartesian
Product
Union
Intersect
Minus
2.4.3 Overview of Relational
Algebra
• Both numerical algebra and relational
algebra are defined by (respective sets of)
• Algebraic Laws.
2.4.3 Overview of Relational
Algebra
Numerical Algebra is defined by
• Algebraic laws
that we have learned in calculus, algebra
and etc since high school days
2.4.3 Overview of Relational
Algebra
• Algebraic laws
of relational algebra are
•
“New” to most of us
QUERY OPTIMIZATION
• In query optimization the query is
transformed by compiler into such a form
that can solve the problem “fastest ”
QUERY OPTIMIZATION
•
•
Illustration: 2 * 3 + 5 * 3
From the laws of Numerical algebra, it
can be computed as follows:.
Method I - It computes in three operations
• 2 * 3 + 5 * 3 = 6 (first) + 15(second)
= 21 (third)
QUERY OPTIMIZATION
•
•
Illustration: 2 * 3 + 5 * 3
From the laws of Numerical algebra, it
can Also be computed as follows:.
Method II- transform into equivalent one.
= (2 + 5) * 3 = 7 (first) * 3 = 21 (second)
• It computes in two operations
So compiler choose Method II
Roles of Relational Algebra
• QUERY OPTIMIZATION
• Similar Idea is used in Relational Algebra
• So we need to know the
• Algebraic Laws
of Relational Algebra well
• That is the main goal for Ch5 and part of
Ch16
2.4.4 Set Operations on Relations
1. Recall Venn Diagram
Y
X
a
b
d
c
e
2.4.4 Set Operations on Relations
Name
Address
Gender
Birthdate
Carrie Fisher
123 Maple st., Hollywood
F
9/9/99
Mark hamill
456 Oak road., Brentwood
M
8/8/88
Relation R
Name
Address
Gender
Birthdate
Carrie Fisher
123 Maple st., Hollywood
F
9/9/99
Harrison Ford
789 Palm Dr., Beverly Hills
M
7/7/77
Relation S
Figure 2.12 Two Relations
Example 2.8
• Relation R and S
• UNION (R  S) is
Name
Address
Gender
Birthdate
Carrie Fisher
123 Maple st., Hollywood
F
9/9/99
Mark Hamill
456 oak Rd., Brentwood
M
8/8/88
Harrison Ford
789 Palm Dr., Beverly Hills
M
7/7/77
• Note: the two tuples for Carrie Fisher from
two relations appear only once in result
2.4.4 Set Operations on Relations
• The Intersection (RS)
Name
Address
Gender
Birthdate
Carrie Fisher
123 Maple st.,
Hollywood
F
9/9/99
• Now, Only the Carrie Fisher tuple appears, because only
it is in both relations. The Difference is R-S is
Name
Address
Gender
Birthdate
Mark
Hamill
456 oak Rd.,
Brentwood
M
8/8/88
• Fisher and Hamill tuples appear in R and thus are
candidates for R-S. thus Fisher appear in S.
• so is not R-S.
2.4.5 Projection
Projection
Title
Year
Length
Genre
Studioname
producerC#
Star Wars
1977
124
SciFi
Fox
12345
Galaxy
1999
104
Comedy
DreamWorks
67890
Wayne’s
World
1992
95
Comedy
Paramount
99999
Figure 2.13 the Relation Movies
Example 2.9
Title,year,length (Movies)
Title
Year
Length
Star Wars
1977
124
Galaxy Quest
1999
104
Wayne’s World
1992
95
 genre (Movies)
Genre
SciFi
Comedy
Comedy
2.4.6 Selection
Example 2.10
• The relation Movies be as in Fig 2.13. then
the value of expression
length >= 100(Movie) is
Title
Year
Length
Genre
StudioName
producerC#
Star Wars
1977
124
SciFi
Fox
12345
Galaxy
1999
104
Comedy DreamWorks
67890
Example 2.11
• Set tuples in the relation movies that
represent Fox Movies at least 100 minutes
long.
• You can use AND for more than one
condition.
 Length >= 100 AND studioName = ‘Fox’ (Movies)
Title
Year
Length
Genre
StudioName
producerC#
Star Wars
1977
124
SciFi
Fox
12345
Is the only one in the resulting relation.
2.4.7 Cartesian Product
2.4.7 Cartesian Product
A
B
1
2
3
4
Relation R X S
Relation R
A
R.B
S.B
C
D
1
2
2
5
6
1
2
4
7
8
B
C
D
2
5
6
1
2
9
10
11
4
7
8
3
4
2
5
6
9
10
11
3
4
4
7
8
3
4
9
10
11
Relation S
2.4.8 Natural Joins
2.4.8 Natural Joins
A
B
1
2
3
4
Relation R
Relation
RS
A
R.B
S.B
C
D
1
2
2
5
6
1
2
4
7
8
B
C
D
2
5
6
1
2
9
10
11
4
7
8
3
4
2
5
6
9
10
11
3
4
4
7
8
3
4
9
10
11
Relation S
Example 2.13
The natural join is:
A
B
C
D
1
2
5
6
3
4
7
8
2.4.9 Theta-Joins
2.4.9 Theta-Joins
U A<D V
A
B
C
1
2
3
6
7
8
9
7
8
A
B
C
D
(a)
1
2
3
4
1
2
3
5
Relation U
B
C
D
6
7
8
10
2
3
4
9
7
8
10
2
3
5
7
8
10
Relation V
(b)
Figure 2.16 Natural join of relations
Example 2.15
• Consider all nine pairs of tuples, one from
each relation.
• Whether A component from U-tuple is less
than the D Component of the V-tuple.
A
U.B
U.C
V.B
V.C
D
1
2
3
2
3
4
1
2
3
2
3
5
1
2
3
7
8
10
6
7
8
7
8
10
9
7
8
7
8
10
Figure 2.17 Result of U  A < D V
Example 2.16
• U and V that has more complex condition :
U A < D AND (U.B != V.B) V
We require for successful pairing not only that the A component of
U-tuple be less than D component of the V-tuple, but that the two
tuples disagree on their respective B components
A
U.B
U.C
V.B
V.C
D
1
2
3
7
8
10
is the only one to satisfy both conditions, so this relation is the result
of the theta-join.
2.4.10 Combining Operations to
Form Queries
Combining Operations
• Let us find “the title and year of movies
made by fox that are at least 100 minutes
long.
Title
Year
Length
InColor
StudioNa
m
Producer
C#
Star Wars
1977
124
true
fox
12345
Might
Ducks
1991
104
true
Disey
Paramoun
t
67890
Wayne’s
world
1992
95
true
99999
Steps to create expression tree
1. note that all data is in one table.
2. select those Movie tuples that have
length>=100
3. Select those Movie tuples that have
studioName=‘Fox’
4. Find the intersection of steps 2 ad 3. we
want 2 AND 3.
5. Project the relation form step 4 onto
attributes title and year.
Expression Tree
 Title,year (σ length >=100 (Movies) ∩ σ StudioName =‘Fox’ (Movies)
 Title,year (σ length >=100 AND StudioName =‘Fox’ (Movies)
 title, year
∩
σlength>=100
σStudioName=‘Fox
’
movie
Movie
Now the same expression
represented linearly
• πtitle,
year(σlength>=100(movie)∩σstudioName=‘fox’(movie
))
• Note that every parentheses represents
relation.
• Movie is a relation after the intersection we
get a relation that comply with our
condones and finally we use projection to
get our two attributes relation.
• Get the social security number & last
name of each employee who works in
department #5.
•
SSN, LNAME (DNO = 5(Employee) )
• Get the snum who locate in London.
•
SNUM (CITY = London(Suppliers))
2.4.11 Naming and Renaming
Example 2.18
A
B
X
C
D
1
2
2
5
6
1
2
4
7
8
1
2
9
10
11
3
4
2
5
6
3
4
4
7
8
3
4
9
10
11
Figure 2.19 R X ρS(X,C,D) (S)
2.4.12 Relationships Among
Operations
• RS= L (C(R S))
• RS= A,U.B, U.C, D(U.B=V.B AND U.C=V.B (UV))
• A <D AND U.BV.B (UV) =
A <D (UV)  U.BV.B (UV)
2.4.12 Relationships Among
Operations
• Many of these are obvious; we will pick
them up in illustrating examples
Please prepare a cheat sheet for the
“algebraic laws”
• RS = R –(R –S)
• RCS= C(R S)
2.4.13 A Linear Notation for
Algebraic Expressions
2.4.14 Exercises for Section 2.4
• Build the four relations in 2.4.14 using the
data in Figure 2.20-2.21
• Submit the “create-populate files”