Document 7663029

Download Report

Transcript Document 7663029

Lecture 07:
Relational Algebra
1
Outline
• Relational Algebra (Section 6.1)
2
Relational Algebra
• Formalism for creating new relations from
existing ones
• Its place in the big picture:
Declarative
query
language
SQL,
relational calculus
Algebra
Implementation
Relational algebra
3
Relational Algebra
• Five operators:
–
–
–
–
–
Union: 
Difference: Selection: s
Projection: P
Cartesian Product: 
• Derived or auxiliary operators:
– Intersection, complement
– Joins (natural,equi-join, theta join, semi-join)
– Renaming: r
4
1. Union and 2. Difference
• R1  R2
Example:
– ActiveEmployees  RetiredEmployees
• R1 – R2
Example:
– AllEmployees − RetiredEmployees
5
What about Intersection ?
• It is a derived operator
R1  R2 = R1 – (R1 – R2)
• Also expressed as a join (will see later)
Example
– UnionizedEmployees  RetiredEmployees
6
3. Selection
• Returns all tuples which satisfy a condition
• Notation: sc(R)
• Examples
– sSalary > 40000 (Employee)
– sname = “Smith” (Employee)
• The condition c can be =, <, , >, , <>
[in SQL:
SELECT * FROM Employee
WHERE Salary > 40000]
7
Selection Example
Employee
SSN
999999999
777777777
888888888
Name
John
Tony
Alice
DepartmentID
1
1
2
Salary
30,000
32,000
45,000
Find all employees with salary more than $40,000.
s Salary > 40000 (Employee)
SSN
Name
888888888 Alice
DepartmentID
2
Salary
45,000
8
4. Projection
• Eliminates columns, then removes duplicates
• Notation: P A1,…,An (R)
• Example: project to social-security number
and names:
– P SSN, Name (Employee)
– Output schema: Answer(SSN, Name)
[In SQL: SELECT DISTINCT SSN, Name FROM Employee]
9
Projection Example
Employee
SSN
999999999
777777777
888888888
Name
John
Tony
Alice
DepartmentID
1
1
2
Salary
30,000
32,000
45,000
P SSN, Name (Employee)
SSN
999999999
777777777
888888888
Name
John
Tony
Alice
10
5. Cartesian Product
• Combine each tuple in R1 with each tuple
in R2
• Notation: R1  R2
• Example:
– Employee  Dependents
• Very rare in practice; mainly used to express
joins
[In SQL: SELECT * FROM R1, R2]
11
Cartesian Product Example
Employee
Name
John
Tony
SSN
999999999
777777777
Dependents
EmployeeSSN
999999999
777777777
Dname
Emily
Joe
Employee x Dependents
Name
SSN
EmployeeSSN
John
999999999 999999999
John
999999999 777777777
Tony
777777777 999999999
Tony
777777777 777777777
Dname
Emily
Joe
Emily
Joe
12
Relational Algebra
• Five operators:
–
–
–
–
–
Union: 
Difference: Selection: s
Projection: P
Cartesian Product: 
• Derived or auxiliary operators:
– Intersection, complement
– Joins (natural,equi-join, theta join, semi-join)
– Renaming: r
13
Renaming
•
•
•
•
Changes the schema, not the instance
Schema: R(A1, …, An )
Notation: r B1,…,Bn (R)
Example:
– rLastName, SocSocNo (Employee)
– Output schema: Answer(LastName, SocSocNo)
[in SQL: SELECT Name AS LastName, SSN AS SocSocNo
FROM Employee]
14
Renaming Example
Employee
Name
John
Tony
SSN
999999999
777777777
rLastName, SocSocNo (Employee)
LastName
John
Tony
SocSocNo
999999999
777777777
15
Natural Join
• Notation: R1 ⋈ R2
• Meaning: R1 ⋈ R2 = PA(sC(R1  R2))
• Where:
– The selection sC checks equality of all common
attributes
– The projection eliminates the duplicate common
attributes
[in SQL:
R2
SELECT DISTINCT R1.A, R1. B, R2.C FROM R1,
WHERE R1.B = R2.B
Schema: R1(A,B), R2(B,C)]
16
Natural Join Example
Employee
Name
John
Tony
SSN
999999999
777777777
Dependents
SSN
999999999
777777777
Dname
Emily
Joe
Employee
Dependents =
PName, SSN, Dname(s SSN=SSN2(Employee x rSSN2, Dname(Dependents))
Name
John
Tony
SSN
999999999
777777777
Dname
Emily
Joe
17
Natural Join
• R=
A
B
X
S=
B
C
Y
Z
U
X
Z
V
W
Y
Z
Z
V
Z
V
• R ⋈ S=
A
B
C
X
Z
U
X
Z
V
Y
Z
U
Y
Z
V
Z
V
W
18
Natural Join
• Given the schemas R(A, B, C, D), S(A, C, E),
what is the schema of R ⋈ S ?
• Given R(A, B, C), S(D, E), what is R ⋈ S ?
• Given R(A, B), S(A, B), what is R ⋈ S ?
19
Theta Join
• A join that involves a predicate
• R1 ⋈ q R2 = s q (R1  R2)
• Here q can be any condition
20
Eq-join
• A theta join where q is an equality
R1 ⋈A=B R2 = s A=B (R1  R2)
• Example:
– Employee ⋈SSN=SSN Dependents
• Most useful join in practice
(difference to natural join?)
21
Semijoin
• R ⋉ S = P A1,…,An (R ⋈ S)
• Where A1, …, An are the attributes in R
• Example:
– Employee ⋉ Dependents
22
Semijoins in Distributed
Databases
• Semijoins are used in distributed databases
Dependents
Employee
SSN
Name
...
...
SSN
...
Dname
Age
...
network
Employee ⋈ssn=ssn (s age>71 (Dependents))
R = Employee ⋉ T
T = P SSN s age>71 (Dependents)
23
Answer = R ⋈ Dependents
Complex RA Expressions
P name
buyer-ssn=ssn
pid=pid
seller-ssn=ssn
P ssn
Person
Purchase
P pid
sname=fred
sname=gizmo
Person
Product 24
Application:
Query Rewriting for Optimization
sname
sname
bid=100
rating > 5
sid=sid
(Scan;
write to bid=100
temp T1)
sid=sid
Reserves
Sailors
Reserves
rating > 5
(Scan;
write to
temp T2)
Sailors
The earlier we process selections, less tuples we need to manipulate
higher up in the tree (predicate pushdown)
Disadvantages?
25
Algebraic Laws (Examples)
• Commutative and Associative Laws
– R ∩ S = S ∩ R, R ∩ (S ∩ T) = (R ∩ S) ∩ T
– RS = SR, R(S  T) = (R S) T
• Laws involving selection
– s C AND C’(R) = s C(s C’(R)) = s C(R) ∩ s C’(R)
– s C (R  S) = s C (R)  S
• When C involves only attributes of R
• Laws involving projections
– PM(PN(R)) = PM,N(R)
26
Operations on Bags
A bag = a set with repeated elements
All operations need to be defined carefully on bags
• {a,b,b,c}{a,b,b,b,e,f,f}={a,a,b,b,b,b,b,c,e,f,f}
• {a,b,b,b,c,c} – {b,c,c,c,d} = {a,b,b,d}
• sC(R): preserve the number of occurrences
• PA(R): no duplicate elimination
• Cartesian product, join: no duplicate elimination
Important ! Relational Engines work on bags, not sets !
27
Finally: RA has Limitations !
• Cannot compute “transitive closure”
Name1
Name2
Relationship
Fred
Mary
Father
Mary
Joe
Cousin
Mary
Bill
Spouse
Nancy
Lou
Sister
• Find all direct and indirect relatives of Fred
• Cannot express in RA !!! Need to write C program
28
Formulating queries in RA
• Consider a database for student enrollment
for courses, and books used in the courses
–
–
–
–
–
STUDENT (SSN, Name, Major, Bdate)
COURSE (Course#, Cname, Dept)
ENROLL (SSN, Course#, Quarter, Grade)
BOOK_ADOPTION (Course#, Quarter, Book_ISBN)
TEXT (Book_ISBN, Book_Title, Publisher, Author)
29
Formulating queries in RA
• Specify the following queries in relational
algebra
– List the number of courses (Course#) taken by all
students named ‘John Smith’ in Winter 1999 (i.e.,
Quarter = W99)
– List any department which has all its adopted books
published by ‘BC Publishing’
30
Formulating Queries in RA
 PCourse# (s Quarter=W99 ((s Name= ‘John Smith’ (STUDENT) ⋈
ENROLL))
 OtherDept = PDept ((s Publisher <> ‘PS Publishers’
(BOOK_ADOPTION ⋈ TEXT)) ⋈ COURSE)
AllDept = PDept (BOOK_ADOPTION ⋈ COURSE) WHY?
Answer = AllDept - OtherDept
 And how will you express it in SQL?
31