CHAPTER 4 RELATIONAL ALGEBRA AND CALCULUS 1

Download Report

Transcript CHAPTER 4 RELATIONAL ALGEBRA AND CALCULUS 1

CHAPTER 4
RELATIONAL ALGEBRA
AND CALCULUS
1
Introduction
- We discuss here two mathematical formalisms which
can be used as the basis for stating and obeying
queries in some specific user-oriented language.
 SQL is based on relational algebra.
- QBE (not included in the new edition of the text) and
Datalog (see chapter 24) are based on relational
calculus.
- The mathematical formalisms introduce various
mathematical operators whose application yields
answers to the queries.
2
PRELIMINARIES
- The inputs and outputs of a query are relational instances.
- We shall present the mathematical operators as operating on
relations (sets of lists). However, the cost of eliminating
duplicate rows has also led to the practical usage of “multisets”
(also called bags) rather than sets. Therefore we shall also
present the necessary mathematical operations as operating on
relations which consist of multisets of rows.
- We shall always illustrate the mathematical operations through
examples. To that effect we shall use a sample database defined
by the schema (shown on p. 101) and the instances shown in
figures 4.1, 4.2, 4.3. In some cases we shall use larger instances
under the same schema.
-It will be noted that queries can always be expressed by several
different expressions. We shall often look at these various
approaches.
-We shall give unique identifying numbers to our illustrative
queries.
3
SAMPLE SCHEMAS AND
INSTANCES
The Schemas:
Sailors(sid: integer, sname: string, rating: integer, age: real)
Boats(bid: integer, bname: string, color: string)
Reserves(sid: integer, bid: integer, day: date)
The Instances:
4
SEMANTICS OF THE SAMPLE
RELATIONS
• Sailors: Entity set; lists the relevant properties of sailors.
• Boats: Entity set; lists the relevant properties of boats.
• Reserves: Relationship set: links sailors and boats by describing
the boat number and date for which a sailor made a reservation.
Example of the declarative sentences for which rows stand:
Row 1: “Sailor ’22’ reserved boat number ‘101’ on 10/10/98”.
N.B. The declarative sentence is obvious in this case, but this may
not always be the case, especially for relations which express
complex relationship sets with complicated constraints. For
example we might add the constraint that a sailor can only reserve
one boat on any given date; or, worse yet that a sailor can only
reserve up to two boats on any given date.
5
Relational Algebra
6
Selection and Projection
condition
• Selection Operator: σrating>8 (S2)
Retrieves from the current instance of relation named S2 those rows
where the value of the attribute ‘rating’ is greater than 8.
Applying the above selection operator to the sample instance of S2
shown in figure 4.2 yields the relational instance on figure 4.4 as
shown below:
• π
7
Projection Operator
πsname,rating(S2)
Retrieves from the current instance of the relation named S2 those
columns whose names are ‘sname’ and ‘rating’.
Applying the above operator to the sample instance of S2 shown in
figure 4.2 yields the relational instance on figure 4.5 as shown
below:
N. B.: Note that the projection operator can produce
duplicate rows in the resulting instance.
8
Remark Concerning Duplicate Rows
• Duplicate rows are not permitted in
relational algebra and calculus.
• Duplicate rows can occur in SQL, though
they may be controlled by explicit keywords.
9
- Projection Operator (cont’d)
Similarly
πage(S2)
yields the following relational instance
10
FUNCTIONAL APPLICATION OF SELECTION
AND PROJECTION OPERATORS
The selection and projection operators can be applied successively
as many times as desired in the usual functional denotation as
illustrated below.
Thus the expression
πsname,rating(σrating>8(S2))
yields the following relational instance
11
SET OPERATIONS
The following set operations are also available in relational algebra:
• Union*
• Intersection*
• Set-difference*
• Cross-product
N.B.
(1) The asterisks indicate operations whose operand relations must be unioncompatible. Two relation instances are said to be union-compatible if:
- they have the same number of fields,
- corresponding fields have the same domains.
- they have the same semantics.
(2) The results of set operations on sets and multisets are different, therefore we
shall examine both of these separately.
12
EXAMPLES OF SET
OPERATIONS ON RELATIONS
• Union: Given the sample instances S1 and S2
The union of S1 and S2, i.e. S1 ∪ S2 is shown below
13
• Given the two sample relational instances
We can form:
Intersection: S1∩S2
Set-Difference: S1 – S2
14
Given the two relational samples S1 and S2
We can form the Cross-product S1 R1:
15
MULTISET OPERATIONS
Examples: Given two multisets
M1 = {a, a, b, b, c, d, f}
M2 = {a, b, b, d, e, e}
We can form
MU = M1 ∪ M2 = {a, a, a, b, b, b, b, c, d, d, d, f, e, e}
MI = M1 ∩ M2 = {a, b, b, d}
MD = M1 – M2 = {a, c, f}
MC= M1  M2 = {<a, a>, <a, b>, <a, b>, <a, d>, <a, e>, <a, e>,
<a, a>, <a, b>, <a, b>, <a, d>, <a, e>, <a, e>,
<b, a>, <b, b>, <b, b>, <b, d>, <b, e>,<b, e>,
<b, a>, <b, b>, <b, b>, <b, d>, <b, e>,<b, e>,
<c, a>, <c, b>, <c, b>, <c, d>, <c, e>, <c, e>,
<d, a>, <d, b>, <d, b>, <e, d>, <d, e>, <d, e>,
<f, a>, <f, b>, <f, b>, <f, d>, <f, e>, <f, e>}
16
RENAMING OPERATOR, 
Purpose: to avoid name conflicts and to permit naming
anonymous relational instances (like query answers).
Example: the expression
Instance being renamed
Name of new relation
Attributes being renamed
(1st and 5th)
(can be a name or an operation)
(C(1  sid1, 5  sid2), S1  R1)
Returns a relational instance that contains the tuples
shown in figure 4.11 and has the schema
C(sid1: integer, sname: string, rating: integer, age: real,
sid2: integer, bid: integer, day: date)
N.B. The second operand may be a simple relational
instance name, thereby renaming that relational
instance.
17
SPECIAL RELATIONAL OPERATORS
The following operators are peculiar to relations:
- Join operators
There are several kind of join operators. We only consider three of
these here (others will be considered when we discuss null values –
see section 5.6.4):
- (1) Condition Joins
- (2) Equijoins
- (3) Natural Joins
- Division
18
JOIN OPERATORS
Condition Joins:
- Defined as a cross-product followed by a selection:
R ⋈c S = σc(R  S)
(⋈ is called the bow-tie)
where c is the condition.
- Example:
Given the sample relational instances S1 and R1
The condition join S ⋈S1.sid<R1.sid R1 yields
19
Equijoin:
Special case of the condition join where the join condition consists solely of
equalities between two fields in R and S connected by the logical AND operator
(∧).
Example: Given the two sample relational instances S1 and R1
The operator S1
R.sid=Ssid
R1
yields
20
Natural Join
- Special case of equijoin where equalities are implicitly specified on
all fields having the same name in R and S.
- The condition c is now left out, so that the “bow tie” operator by
itself signifies a natural join.
- N. B. If the two relations have no attributes in common, the natural
join is simply the cross-product.
21
DIVISION
- The division operator is used for queries which involve the ‘all’
qualifier such as “Find the names of sailors who have reserved all
boats”.
- The division operator is a bit tricky to explain, and perhaps best
approached through examples as will be done here.
22
EXAMPLES OF DIVISION
Consider the relational instances
A, B1, B2, and B3.
The division operation A/Bi yields
the three relational instances on
the right, which are constructed
as follows: search among A for all
those rows whose A.pno are the
same as all the B.pno, and whose
A.sno is the same. The corresponding A.sno then belong in the
answer relation.
Figure 4.14 Examples Illustrating Division
Thus, B1 contains only p2, so we find in A rows 2, 6, 7, 8 for which A.pno = p2. Therefore
we must include in the answer (A/B1) the A.sno values in rows 2, 6, 7, 8.
Next, B2 contains p2, p4. So we locate in A rows 2 and 4 which contain respectively p2 and
p4 and which have the same value of A.sno (namely s1); then we locate in A rows 8, 9 which
again contain respectively p2 and p4 with the same value of A.sno (namely s4); so we place
in the answer s1 and s4.
Next, B3 contains p1, p2, p4; so we search in A for three rows with the same sno, containing
respectively p1, p2, p4. We find only rows 1, 2, 4 for which A.sno is s1. So s1 goes in the answer.
23
DIVISION
Interpretation of the division operation A/B:
- Divide the attributes of A into 2 sets: A1 and A2.
- Divide the attributes of B into 2 sets: B2 and B3.
- Where the sets A2 and B2 have the same attributes.
- For each set of values in B2:
- Search in A2 for the sets of rows (having the same A1 values)
whose A2 values (taken together) form a set which is the same as
the set of B2’s.
- For all the set of rows in A which satisfy the above search, pick
out their A1 values and put them in the answer.
24
DIVISION
Example: Find the names of sailors who have reserved all boats:
(1) A = sid,bid(Reserves). A1 = sid(Reserves) A2 = bid(Reserves)
(2) B2 = bid(Boats) B3 is the rest of B.
Thus, B2 ={101, 102, 103, 104}
(3) Find the rows of A such that their A.sid is the same and their combined A.bid is
the set B2.
Thus we find A1 = {22}
(4) Get the set of A2 corresponding to A1: A2 = {Dustin}
25
FORMAL DEFINITION OF DIVISION
The formal definition of division is as follows:
A/B = x(A) - x((x(A)  B) – A)
26
EXAMPLES OF ALGEBRA QUERIES
In the rest of this chapter we shall illustrate queries using the
following new instances S3 of sailors, R2 of Reserves and B1 of
boats.
27
QUERY Q1
Given the relational instances:
(Q1) Find the names of sailors who have reserved boat 103
sname((σbid=103 Reserves) ⋈ Sailors)
The answer is thus the following relational instance
{<Dustin>, <Lubber>, <Horatio>}
28
QUERY Q1 (cont’d)
There are of course several ways to express Q1 in relational algebra.
Here is another:
Sailors))
sname(σbid=103(Reserves⋈ Sailors))
Which of these expressions should we use?
That is a question of optimization. Indeed, when we describe
how to state queries in SQL, we can leave it to the optimizer
in the DBMS to select the nest approach.
29
QUERY Q2
(Q2) Find the names of sailors who have reserved a red boat.
sname((σcolor=‘red’Boats) ⋈ Reserves ⋈ Sailors
30
QUERY Q3
(Q3) Find the colors of boats reserved by Lubber.
color((σsname=‘Lubber’Sailors) ⋈ Reserves ⋈ Boats
31
QUERY Q4
(Q4) Find the names of Sailors who have reserved at least one boat
sname(Sailors ⋈ Reserves)
32
QUERY Q5
(Q5) Find the names of sailors who have reserved a red or a green boat.
(Tempboats, (σcolor=‘red’) ∪ (σcolor=‘green’Boats))
sname(Tempboats ⋈ Reserves ⋈ Sailors)
33
QUERY Q6
(Q6) Find the names of Sailors who have reserved a red and a green
boat.
It seems tempting to use the expression used in Q5, replacing simply
 by . However, this won’t work, for such an expression is
requesting the names of sailors who have requested a boat that is
both red and green! The correct expression is as follows:
(Tempred, sid((σcolor=‘red’Boats) ⋈ Reserves))
(Tempgreen, sid((σcolor=‘green’Boats) ⋈ Reserves))
sname ((Tempred ∩ Tempgreen) ⋈ Sailors)
34
QUERY Q7
(Q7) Find the names of sailors who have reserved at least two boats.
(Reservations, sid,sname,bid(Sailors ⋈ Reserves))
(Reservationpairs(1sid1, 2sname, 3bid1, 4sid2,
5sname, 6bid2), ReservationsReservations)
sname1σ(sid1=sid2)(bid1bid2)Reservationpairs)
35
QUERY 8
(Q8) Find the sids of sailors with age over 20 who have not reserved a
red boat.
sid(σage>20Sailors) - sid((σcolor=‘red’Boats) ⋈ Reserves ⋈ Sailors)
36
QUERY 9
(Q) Find the names of sailors who have reserved all boats.
(Tempsids, (sid,bidReserves)/(bidBoats))
sname(Tempsids ⋈ Sailors
37
QUERY Q10
(Q10) Find the names of sailors who have reserved all boats called
Interlake.
(Tempsids, (sid,bidReserves)/(bid(σbname=‘Interlake’Boats)))
sname(Tempsids ⋈ Sailors)
38
RELATIONAL CALCULUS
39
Introduction
- Relational algebra is procedural: it specifies the procedure to be
followed in order to get the answer to the query.
- Relational calculus is declarative: it describes (declares) the answer
to the query without specifying how to get it.
- Relational calculus strongly resembles First Order Predicate Logic,
or simply first order logic.
- There are two variants of relational calculus:
- Tuple relational calculus (TRC)
- Domain relational calculus (DRC)
40
TUPLE RELATIONAL CALCULUS
- A query statement in TRC is a set declaration having the form:
{ P first-order logic formula}
- This is to be read as ‘the set of tuple variables, P, for which the
specified first order logic formula is true’.
- Thus a TRC query is a request (to the DBMS) to produce a set of
tuples corresponding to the tuples of the relational answer in
SQL.
- Example
Given the following query:
(Q11) Find all sailors with a rating above 7.
The TRC statement of this query is
{S S  Sailors  S.rating > 7}.
41
SYNTAX AND SEMANTICS OF TRC
• The syntax and semantics of TRC is that
of first-order logic. It is stated quite
precisely in the text and there is no need
to repeat it here. Instead we shall
examine a few query applications.
42
QUERY Q12
(Q12) Find the names and ages of sailors with a rating above 7.
{P ∃S  Sailors (S.rating > 7  P.name = S.name  P.age = S.age)}
Remarks
1. The fact that the tuple variable P occurs with two attributes (using
the dot notation) means that solely these two attributes are
required in the answer relation.
2. The symbols used are the usual first-order logic symbols:
∀: for all
∃: there exists
⋀ : and
⋁: or
¬ : not
⇒ : implies
43
QUERIES 1,2,7,9,14
The TRC statements for these queries are
pretty well self explanatory, especially
with the added English statements of how
to read them.
44
DOMAIN RELATIONAL
CALCULUS (1)
- The form of a DRC query is as follows:
{<X1, X2, … , Xn> logical DRC formula}
signifying that the system must construct (and output) a set of all
the tuples which satisfy the stated logical DRC formula in terms
of the n attributes X1, X2, … ,Xn. Thus, the answer is a relational
instance with attributes X1, X2, … , Xn, these attributes
corresponding to those of some of the relations in the database.
- Again, the approach used by the system is left unspecified.
- The Syntax and the semantics of the DRC are explicitly and
precisely described in the text.
45
DOMAIN RELATIONAL
CALCULUS (2)
Example:
(Q11) Find all sailors with a rating above 7.
{ < I, N, T, A > <I, N, T, A > ∈ Sailors ⋀ T > 7 }
Other queries are illustrated and described in the text with all
necessary explanation.
46
EXPRESSIVE POWER OF
ALGEBRA AND CALCULUS( 1)
Safety
- Certain queries stated in the relational calculus may lead to
answers which contain an infinite number of tuples (or at least as
many as the system can handle).
Example:
Consider the TRC query {S ¬(S ∈ Sailors)}. Since there is a quasiinfinite number of tuples that can be created with the attributes of
sailors, the answer is (quasi)-infinite.
- A query which yields a (quasi)-infinite answer is said to be unsafe,
and, of course, should not be allowed by the system.
- It is possible to define a safe formula in TRC (see text, section 4.4).
47
EXPRESSIVE POWER OF
ALGEBRA AND CALCULUS (2)
• A query language is said to be relationally
complete if it can express all the queries that can
be expressed in relational algebra.
• SQL is relationally complete.
• Every query that can be expressed using a safe
relational calculus query can be also be expressed
as a relational algebra query.
• SQL provides additional expressive power
beyond relational algebra.
48
QUERY Q1
49