No Slide Title

Download Report

Transcript No Slide Title

Relational Algebra
Relational Query Languages
 Query = “retrieval program”
 Language examples:
 Theoretical:
1. Relational Algebra
2. Relational Calculus
a. tuple relational calculus (TRC)
b. domain relational calculus (DRC)
 Practical
1. SQL (SEQUEL from System R)
2. QUEL (Ingres)
3. Datalog (Prolog-like)
 Theoretical QL’s:
 give semantics to practical QL’s
 key to understand query optimization in relational DBMSs
Relational Algebra
 Basic operators
 select ( )
 project (p )
 union ( )
 set difference ( - )
 cartesian product ( x )
 rename ( )
 The operators take one or two relations as inputs and give a
new relation as a result.
relation
relation
relational
operator
relation
Example Instances
bid
101
102
103
104
bname
Interlake
Interlake
Clipper
Marine
color
blue
red
green
red
Boats
Schema:
Boats(bid, bname, color)
Sailors(sid, sname, rating, age)
Reserves( sid, bid, day)
R1 sid
22
58
bid
101
103
day
10/10/96
11/12/96
sid
S1 22
31
58
sname
dustin
lubber
rusty
rating
7
8
10
age
45.0
55.5
35.0
sid
S2 28
31
44
58
sname
yuppy
lubber
guppy
rusty
rating
9
8
5
10
age
35.0
55.5
35.0
35.0
Projection
 Examples:
p age(S2)
;
p sname,rating(S2)
 Retains only attributes that are in the “projection list”.
 Schema of result:
 exactly the columns in the projection list, with the same names that they
had in the input relation.
 Projection operator has to eliminate duplicates
(How do they
arise? Why remove them?)
 Note: real systems typically don’t do duplicate elimination unless the
user explicitly asks for it. (Why not?)
Projection
sid
28
31
44
58
sname rating age
yuppy
9
35.0
lubber
8
55.5
guppy
5
35.0
rusty
10 35.0
S2
sname
rating
yuppy
lubber
guppy
rusty
9
8
5
10
p sname,rating (S 2)
age
35.0
55.5
p age(S2)
Selection ()
 Selects rows that satisfy selection condition.
 Result is a relation.
Schema of result is same as that of the input relation.
 Do we need to do duplicate elimination?
sid
28
31
44
58
sname rating age
yuppy
9
35.0
lubber
8
55.5
guppy
5
35.0
rusty
10 35.0
 rating 8(S2)
sname
yuppy
rusty
rating
9
10
p sname,rating( rating 8(S2))
Selection
 Notation:
 p(r)
 p is called the selection predicate , r can be the name of a table,
or another query
 Predicate:
1. Simple
 attr1 = attr2
 Attr = constant value
 (also, <, > , etc)
2. Complex
 predicate1 AND predicate2
 predicate1 OR predicate2
 NOT (predicate)
Union and Set-Difference
 All of these operations take two input relations, which
must be union-compatible:
 Same number of columns (attributes).
 `Corresponding’ columns have the same type.
 For which, if any, is duplicate elimination required?
Union
sid
22
31
58
sname rating age
dustin
7
45.0
lubber
8
55.5
rusty
10 35.0
S1
sid
28
31
44
58
sname rating age
yuppy
9
35.0
lubber
8
55.5
guppy
5
35.0
rusty
10 35.0
S2
sid sname rating age
22
31
58
44
28
dustin
lubber
rusty
guppy
yuppy
7
8
10
5
9
S1S2
45.0
55.5
35.0
35.0
35.0
Set Difference
sid
22
31
58
sname rating age
dustin
7
45.0
lubber
8
55.5
rusty
10 35.0
sid sname rating age
22 dustin 7
45.0
S1 S2
S1
sid
28
31
44
58
sname rating age
yuppy
9
35.0
lubber
8
55.5
guppy
5
35.0
rusty
10 35.0
S2
sid sname rating age
28 yuppy
9
35.0
44 guppy
5
35.0
S2 – S1
Cartesian-Product
 S1  R1: Each row of S1 paired with each row of R1.
Like the c.p for mathematical relations: every tuple of S1 “appended” to every
tuple of R1
 Q: How many rows in the result?
 Result schema has one field per field of S1 and R1, with field
names `inherited’ if possible.
 May have a naming conflict: Both S1 and R1 have a field with the
same name.
 In this case, can use the renaming operator…
Cartesian Product Example
sid
22
31
58
sname
dustin
lubber
rusty
rating
7
8
10
age
45.0
55.5
35.0
sid bid
day
22 101 10/10/96
58 103 11/12/96
R1
S1
(sid) sname rating age
R1 X S1 =
(sid) bid day
22
dustin
7
45.0
22
101 10/10/96
22
dustin
7
45.0
58
103 11/12/96
31
lubber
8
55.5
22
101 10/10/96
31
lubber
8
55.5
58
103 11/12/96
58
rusty
10
35.0
22
101 10/10/96
58
rusty
10
35.0
58
103 11/12/96
Rename ( )
 Allows us to refer to a relation by more than one name
and to rename conflicting names
Example:
 x (E)
returns the expression E under the name X
 If a relational-algebra expression E has arity n, then
x (A1, A2, …, An) (E)
returns the result of expression E under the name X, and with the
attributes renamed to A1, A2, …., An.
Ex.
temp1(sid1,sname,rating, age, sid2, bid, day) (R1 x S1)
Compound Operator: Intersection
 In addition to the 6 basic operators, there are
several additional “Compound Operators”
 These add no computational power to the language, but are
useful shorthands.
 Can be expressed solely with the basic ops.
 Intersection takes two input relations, which must be
union-compatible.
 Q: How to express it using basic operators?
R  S = R  (R  S)
Intersection
sid
22
31
58
sname rating age
dustin
7
45.0
lubber
8
55.5
rusty
10 35.0
S1
sid
28
31
44
58
sname rating age
yuppy
9
35.0
lubber
8
55.5
guppy
5
35.0
rusty
10 35.0
S2
sid sname rating age
31 lubber 8
55.5
58 rusty
10
35.0
S1 S2
Compound Operator: Join
 Joins are compound operators involving cross product,
selection, and (sometimes) projection.
 Most common type of join is a “natural join” (often
just called “join”). R
S conceptually is:
 Compute R  S
 Select rows where attributes that appear in both relations have equal values
 Project all unique atttributes and one copy of each of the common ones.
 Note: Usually done much more efficiently than this.
 Useful for putting “normalized” relations back
together.
Natural Join Example
sid
22
58
bid
101
103
day
10/10/96
11/12/96
sid
22
31
58
sname rating age
dustin
7
45.0
lubber
8
55.5
rusty
10
35.0
R1
R1
S1
S1 =
sid
22
58
sname
dustin
rusty
rating
7
10
age
45.0
35.0
bid
101
103
day
10/10/96
11/12/96
Other Types of Joins
 Condition Join (or “theta-join”):
R  c S   c (R  S)
(sid)
22
31
sname
dustin
lubber
rating
7
8
S1
age
45.0
55.5
(sid)
58
58
S1.sid  R1.sid
bid
103
103
day
11/12/96
11/12/96
R1
 Result schema same as that of cross-product.
 May have fewer tuples than cross-product.
 Equi-join: special case: condition c contains only conjunction of
equalities.
Compound Operator: Division
 Useful for expressing “for all” queries like:
Find sids of sailors who have reserved all boats.
 For A/B attributes of B are subset of attrs of A.
 May need to “project” to make this happen.
 E.g., let A have 2 fields, x and y; B have only field y:
AB 
x  y

 B( x, y  A)
A/B contains all tuples (x) such that for every y tuple in B, there is an
xy tuple in A.
Examples of Division A/B
sno
s1
s1
s1
s1
s2
s2
s3
s4
s4
pno
p1
p2
p3
p4
p1
p2
p2
p2
p4
A
pno
p2
pno
p2
p4
B2
pno
p1
p2
p4
sno
s1
s2
s3
s4
sno
s1
s4
A/B1
A/B2
sno
s1
B1
B3
A/B3
Expressing A/B Using Basic Operators
 Division is not essential op; just a useful shorthand.
 (Also true of joins, but joins are so common that systems implement joins
specially.)
 Idea: For A/B, compute all x values that are not `disqualified’
by some y value in B.
 x value is disqualified if by attaching y value from B, we obtain an xy tuple
that is not in A.
Disqualified x values =
p x((p x( A)B) A)
A/B = p x ( A)  Disqualified x values
Banking Example
branch (branch-name, branch-city, assets)
customer (customer-name, customer-street, customer-only)
account (account-number, branch-name, balance)
loan (loan-number, branch-name, amount)
depositor (customer-name, account-number)
borrower (customer-name, loan-number)
Example Queries
loan (loan-number, branch-name, amount)
 Find all loans of over $1200
amount >1200 (loan)
 Find the loan number for each loan of an amount greater than
$1200
ploan-number (amount > 1200 (loan))
Example Queries
depositor (customer-name, account-number)
borrower (customer-name, loan-number)
 Find the names of all customers who have a loan, an
depositor account, or both, from the bank
pcustomer-name (borrower)  pcustomer-name (depositor)
 Find the names of all customers who have a loan and
a depositor account at bank.
pcustomer-name (borrower)  pcustomer-name (depositor)
Example Queries
loan (loan-number, branch-name, amount)
depositor (customer-name, account-number)
borrower (customer-name, loan-number)
 Find the names of all customers who have a loan at the Perryridge
branch but do not have an depositor account at any branch of the bank.
pcustomer-name (branch-name = “Perryridge” (borrower
– pcustomer-name (depositor)
loan))
Example Queries
account (account-number, branch-name, balance)
Find the largest account balance
 Rename account relation as d
 The query is:
p balance(account) -
paccount.balance( account.balance < d.balance (account x d (account)))
Example Queries
account (account-number, branch-name, balance)
depositor (customer-name, account-number)
 Find all customers who have an account from at least
the “Downtown” and the Uptown” branches.
 Query 1
pCN(BN=“Downtown”(depositor
account)) 
pCN(BN=“Uptown”(depositor
account))
where CN denotes customer-name and BN denotes
branch-name.
 Query 2
customer-name, branch-name (depositor
account)
 temp(branch-name) ({(“Downtown”), (“Uptown”)})
Example Queries
 Find all customers who have an account at all
branches located in Boston.
pcustomer-name, branch-name (depositor account)
 pbranch-name (branch-city = “Boston” (branch))
Extended Relational Operations
 Additional Operators that extend the power of the
language
 Based on SQL… make the language less clean
 Generalized projections
 Outer Joins
 Update
General Projection
Notation:
p e1, e2, …, en (Relation)
ei: can include any arithmetic operation – not only attributes
Example:
credit =
cname
Jones
Turner
Smith
limit
5000
3000
4000
balance
2000
2500
3000
Then:
p cname, limit – balance =
cname limit -balance
Jones
3000
Turner
500
Smith
1000
Outer Joins
Motivation:
loan
loan
bname
Downtown
Redwood
Perry
borrower =
lno
L-170
L-230
L-260
amt
3000
4000
1700
bname
lno
Downtown L-170
Redwood L-230
Join result loses:
•any record of Perry
•any record of Smith
cname
Jones
Turner
Smith
lno
L-170
L-230
L-155
borrower
amt
3000
4000
cname
Jones
Turner
Outer Join (
 Left outer Join (
)
)
preserves all tuples in left relation
loan
borrower =
bname
Downtown
Redwood
Perry
lno
L-170
L-230
L-260
amt
3000
4000
1700
bname
 Right outer Join (
)
Downtown
preserves all tuples in right relation Redwood
NULL
cname
Jones
Turner
NULL
lno
L-170
L-230
L-155
amt
3000
4000
NULL
cname
Jones
Turner
Smith
Outer Join (cont)
 Full Outer Join
(
)
 preserves all tuples in both relations
bname
Downtown
Redwood
Perry
NULL
lno
L-170
L-230
L-260
L-155
amt
3000
4000
1700
NULL
cname
Jones
Turner
NULL
Smith
Update ()
1) Deletion: r  r – s
 account  account –  bname = Perry (account)
2) Insertion: r  r  s
 branch  branch  {( BU, Boston, 9M)}
3) Update: r 
 account 
pe1, e2, …, en (r)
p bname, acct_no, bal * 1.05 (account)