No Slide Title

Download Report

Transcript No Slide Title

CM036: Advanced Database
Lecture 3
Relational Algebra and SQL
Content
1 Models, Languages and their Use in
databases
2 The Relational Algebra (RA)
3 Simulating RA operations in SQL
CM036: Advanced Databases
Lecture 3: Relational Languages
2
1.1 Relational and other data models



Relational models (relational algebra, relational calculus) –
most of the contemporary RDBMS are based on them
Tree models (hierarchical, object-relational) – both legacy
systems and new systems use them
Object-oriented models (ODMG) – recent development,
still not widely employed
Note: XML native databases have some similarities with the
hierarchical database systems (legacy systems), but they
have more elaborated model and query languages, which
are close to OQL (the standard query language of objectoriented databases)
CM036: Advanced Databases
Lecture 3: Relational Languages
3
1.2 Relational Languages and their Use



Data Manipulation Language (DML)
 Use: Populates, updates, and queries relational DB
 Example: relational algebra, SQL DML
Data Definition Language (DDL)
 Use: Specifies the data structures and defines the relational
schema
 Example: domain calculus, SQL DDL
Data Control Language (DCL)
 Use: Specifies operation permissions, resource access discipline
and user profiles
 Example: SQL DCL, LDAP
Note: Contemporary relational languages often incorporate some object-
relational features of the model – e.g. Oracle 8i SQL has types, Oracle
9i SQL has type inheritance
CM036: Advanced Databases
Lecture 3: Relational Languages
4
1.3 Can DB live without formal model?

The answer is NO for several reasons:



as we will see, SQL has ambiguities, while the relational
algebra is unambiguous – so it can provide semantic
interpretation for SQL;
Moreover, because of the same reason, SQL cannot be
executed directly, it needs to be translated into a
realistic structure of operations first, which can then be
interpreted;
Finally, if we want to control the execution of the SQL
statements, we need to know how it works.
CM036: Advanced Databases
Lecture 3: Relational Languages
5
2 The Relational Algebra



Proposed by Codd in 1970 as a formal data model. Describes the
relations and the operations to manipulate relations
Relational operations in relational algebra transform either a single
relation (unary operation), or a pair (binary operation) into another relation
Can also be used to specify retrieval requests (queries). Query result is
also in the form of a relation.
Relational Operations:
 RESTRICT (s) and PROJECT () unary operations.

Set operations: UNION (), INTERSECTION (), DIFFERENCE (—),
CARTESIAN PRODUCT ().

JOIN operations (⋈) are binary.

Other relational operations: DIVISION, OUTER JOIN,
AGGREGATE FUNCTIONS.
CM036: Advanced Databases
Lecture 3: Relational Languages
6
CM036: Advanced Databases
Lecture 3: Relational Languages
7
2.1 RESTRICT s

RESTRICT operation (called also SELECT- denoted bys ):


Selects the tuples (rows) from a relation R that satisfy a certain
selection condition c on the attributes of R : s c(R)
Resulting relation includes each tuple in R whose attribute values
satisfy c, i.e. it has the same attributes as R
Examples:
s DNO=4(EMPLOYEE)
s SALARY>30000(EMPLOYEE)
s(DNO=4 AND SALARY>25000) OR DNO=5(EMPLOYEE)
CM036: Advanced Databases
Lecture 3: Relational Languages
8
2.2 PROJECT P

PROJECT operation (denoted by ):
 Keeps only certain attributes (columns) from a relation R specified in an
attribute list L:  L(R)
 Resulting relation has only those attributes of R specified in L
Example:  FNAME,LNAME,SALARY(EMPLOYEE)
 The PROJECT operation eliminates duplicate tuples in the resulting
relation so that it remains a true set (no duplicate elements).
Example:  SEX,SALARY(EMPLOYEE)
 If several male employees have salary 30000, only a single tuple
<M, 30000> is kept in the resulting relation.
CM036: Advanced Databases
Lecture 3: Relational Languages
9
CM036: Advanced Databases
Lecture 3: Relational Languages
10
2.3 Combining Operations

Because of closure, several operations can be combined to form a
relational algebra expression. For example, the names and salaries
of employees in Department 4:
FNAME,LNAME,SALARY(sDNO=4(EMPLOYEE))

Alternatively, we could specify explicit intermediate relations for each
step:
TEMP  sDNO=4(EMPLOYEE)
R  FNAME,LNAME,SALARY(TEMP)

Attributes can optionally be renamed in a left-hand-side relation (this
may be required for some operations that will be presented later),
e.g.
R (FIRSTNAME,LASTNAME,SALARY)  FNAME,LNAME,SALARY(TEMP)
CM036: Advanced Databases
Lecture 3: Relational Languages
11
CM036: Advanced Databases
Lecture 3: Relational Languages
12
2.4 Set Operations

Binary operations from set theory: UNION: R1  R2,INTERSECTION: R1  R2,
DIFFERENCE: R1 — R2,


For , , —, the operand relations R1(A1, ..., An) and R2(B1, ..., Bn) must
have the same number of attributes, and the domains of attributes must be
compatible; that is, dom(Ai)=dom(Bi) for i=1, 2, ..., n. This condition is called
union compatibility.
The resulting relation for , , or — has the same attribute names as the
first operand relation R1 (by convention).
CM036: Advanced Databases
Lecture 3: Relational Languages
13
CM036: Advanced Databases
Lecture 3: Relational Languages
14
More Set Operations

CARTESIAN PRODUCT
R(A1, A2, ..., Am, B1, ..., Bn)  R1(A1, A2, ..., Am)  R2 (B1, ..., Bn)

A tuple t exists in R for each combination of tuples t1 from R1 and t2
from R2 such that:
t [A1, A2, ..., Am] = t1 and t [B1, B2, ..., Bn] = t2
3 attributes
+
2 attributes = 5 attributes
R1 A
1
2
3


B
2
3
4
X
R2 D E
a b
b c
=
R A B C D E
1 2 3 a b
1 2 3 b c
2 3 4 a b
2 3 4 b c
3 tuples * 2 tuples
= 6 tuples 3 4 5 a b
3 4 5 b c
The resulting relation R has m1 + m2 columns
If R1 has n1 tuples and R2 has n2 tuples, then R will have n1*n2 tuples.
CM036: Advanced Databases
C
3
4
5
Lecture 3: Relational Languages
15
CARTESIAN PRODUCT – continued

Obviously, CARTESIAN PRODUCT is useless if alone, since it
generates all possible combinations. It can combine related tuples
from two relations in a more informative way if followed by the
appropriate RESTRICT operation
Example: Retrieve a list of the names of dependents for each
female employee
FEMALE_EMPS  s SEX=‘F’(EMPLOYEE)
EMPNAMES   FNAME,LNAME,SSN(FEMALE_EMPS)
EMP_DEPENDENTS  EMPNAMES  DEPENDENT
ACTUAL_DEPENDENTS  s SSN=ESSN(EMP_DEPENDENTS)
RESULT   FNAME,LNAME,DEPENDENT_NAME(ACTUAL_DEPENDENTS)
CM036: Advanced Databases
Lecture 3: Relational Languages
16
CM036: Advanced Databases
Lecture 3: Relational Languages
17
2.5 JOIN Operations

THETA JOIN: Similar to a CARTESIAN PRODUCT followed by a
RESTRICT. The condition c is called a join condition.
R(A1, A2, ..., Am, B1, B2, ..., Bn) 
R1(A1, A2, ..., Am) ⋈ c R2 (B1, B2, ..., Bn)

EQUIJOIN: The join condition c includes equality comparisons
involving attributes from R1 and R2. That is, c is of the form:
(Ai=Bj) AND ... AND (Ah=Bk); 1<i,h<m, 1<j,k<n
In the above EQUIJOIN operation:
 Ai, ..., Ah are called the join attributes of R1
 Bj, ..., Bk are called the join attributes of R2
Example: Retrieve each department's name and its manager's name:
T  DEPARTMENT ⋈MGRSSN=SSN EMPLOYEE
RESULT   DNAME,FNAME,LNAME(T)
CM036: Advanced Databases
Lecture 3: Relational Languages
18
DEPT_MGR   DNAME,…,MGRSSN,…LNAME,…,SSN… (DEPARTMENT ⋈ MGRSSN=SSN EMPLOYEE)
CM036: Advanced Databases
Lecture 3: Relational Languages
19
Natural Join

In an EQUIJOIN R  R1 ⋈c R2, the join attribute of R2 appear
redundantly in the result relation R. In a NATURAL JOIN, the join
attributes of R2 are eliminated from R. The equality is implied and
there is no need to specify it. The form of the operator is
R  R1 * R2
Example: Retrieve each project's details along with the details of its
department:
Step 1: (Rename DNUMBER to DNUM)
DEPT (DNAME, DNUM, MGRSSN, MGRSTARTDATE) 
 DNAME, DNUMBER, MGRSSN, MGRSTARTDATE (DEPARTMENT)
Step 2: (Now both DEPT and PROJECT have DNUM)
PROJ_DEPT  PROJECT * DEPT
CM036: Advanced Databases
Lecture 3: Relational Languages
20
CM036: Advanced Databases
Lecture 3: Relational Languages
21
Multiple Join
Example: Retrieve each employee’s name and the name of the
department he/she works for:
JOIN ATTRIBUTES
RELATIONSHIP
EMPLOYEE.SSN =
DEPARTMENT.MGRSSN
EMPLOYEE manages
the DEPARTMENT
EMPLOYEE.DNO =
DEPARTMENT.DNUMBER
EMPLOYEE works in
the DEPARTMENT
T  EMPLOYEE ⋈ DNO=DNUMBER DEPARTMENT
RESULT   FNAME,LNAME,DNAME(T)
CM036: Advanced Databases
Lecture 3: Relational Languages
22
Self Join

A relation can have a set of join attributes to join it with itself :
JOIN ATTRIBUTES
EMPLOYEE(1).SUPERSSN=
EMPLOYEE(2).SSN


RELATIONSHIP
EMPLOYEE(2) supervises
EMPLOYEE(1)
One can think of this as joining two distinct copies of the
relation, although only one relation actually exists
In this case, renaming can be useful
Example: Retrieve each employee’s name and the name of his/her
supervisor:
SUPERVISOR(SSSN,SFN,SLN)   SSN,FNAME,LNAME(EMPLOYEE)
T  EMPLOYEE ⋈SUPERSSN = SSSNSUPERVISOR
RESULT   FNAME,LNAME,SFN,SLN(T)
CM036: Advanced Databases
Lecture 3: Relational Languages
23
2.6 Complete Set of Operations



All the operations discussed so far can be described as a sequence
of only the operations RESTRICT, PROJECT, UNION, SET
DIFFERENCE, and CARTESIAN PRODUCT.
Hence, the set {s , , ,—, } is called a complete set of relational
algebra operations. Any query language equivalent to these
operations is called relationally complete.
For database applications, additional operations are needed that
were not part of the original relational algebra. These include:
 1. Aggregate functions and grouping.
 2. OUTER JOIN and OUTER UNION.
CM036: Advanced Databases
Lecture 3: Relational Languages
24
2.7 More Relational Operations

OUTER JOIN


In a regular EQUIJOIN or NATURAL JOIN operation, tuples in R1 or
R2 that do not have matching tuples in the other relation do not
appear in the result. Some queries require all tuples in R1 (or R2 or
both) to appear in the result
When no matching tuples are found, nulls are placed for the missing
attributes
 LEFT OUTER JOIN: R1
R2 lets every tuple in R1 appear in the
result
 RIGHT OUTER JOIN: R1
R2 lets every tuple in R2 appear in
the result
 FULL OUTER JOIN: R1
R2 lets every tuple in both R1 and R2
appear in the result
CM036: Advanced Databases
Lecture 3: Relational Languages
25
TEMP EMPLOYEE
SSN=MGRSSNDEPARTMENT
RESULT   FNAME,MINIT,LNAME,DNAME(TEMP)
CM036: Advanced Databases
Lecture 3: Relational Languages
26
3 Simulating RA operations

All relational algebra operations can be simulated in SQL using
SELECT statements only. For this purpose the following variations of
different parameters in the SELECT clauses can be used:





The list of attributes to be selected in the SELECT clause
The number of tables to be looked into in the FROM clause
The conditions specified in the WHERE clause
The resulting relation printed out or transferred row-by-row to some
variables if INTO clause is present (e.g. using PL/SQL).
There is no structural correspondence between the expressions of
relational algebra and the SQL expressions; in a single SQL statement
the following combinations are possible:




single unary operator applied to one single relation
single binary operator applied to pair of relations
sequence of unary operators applied inner side out to one relation and all
the intermediate results
combinations of binary and unary operators applied to several relations
and the intermediate results according nesting rules
CM036: Advanced Databases
Lecture 3: Relational Languages
27
3.1 Summary of RA vs. SQL
CM036: Advanced Databases
Lecture 3: Relational Languages
28