Lecture 12: Further relational algebra, further SQL www.cl.cam.ac.uk/Teaching/current/Databases/ Today’s lecture • Where does SQL differ from relational model? • What are some other features of SQL? •

Download Report

Transcript Lecture 12: Further relational algebra, further SQL www.cl.cam.ac.uk/Teaching/current/Databases/ Today’s lecture • Where does SQL differ from relational model? • What are some other features of SQL? •

Lecture 12:
Further relational algebra,
further SQL
www.cl.cam.ac.uk/Teaching/current/Databases/
1
Today’s lecture
• Where does SQL differ from relational
model?
• What are some other features of SQL?
• How can we extend the relational algebra
to match more closely SQL?
2
Duplicate rows
• Consider our relation instances from
lecture 6, Reserves, Sailors and Boats
• Consider
SELECT rating,age
FROM
Sailors;
• We get a relation that doesn’t satisfy our
definition of a relation!
• RECALL: We have the keyword
DISTINCT to remove duplicates
3
Multiset semantics
• A relation in SQL is really a multiset or
bag, rather than a set as in the relational
model
– A multiset has no order (unlike a list), but
allows duplicates
– E.g. {1,2,1,3} is a bag
– select, project and join work for bags as well
as sets
• Just work on a tuple-by-tuple basis
4
Bag operations
• Bag union:
– Sum the number of times that an element appears in
the two bags, e.g.
• {1,2,1}{1,2,3} = {1,1,1,2,2,3}
• Bag intersection:
– Take the minimum of the number of occurrences in
each bag, e.g.
• {1,2,1}{1,2,3,3} = {1,2}
• Bag difference:
– Proper-subtract the number of occurrences in the two
bags, e.g.
• {1,2,1}-{1,2,3,3} = {1}
5
Laws for bags
• Note that whilst some of the familiar (settheoretic) laws continue to hold, some of
them do not
• Example: R(ST) = (RS)(RT) ??
6
Extended relational
algebra
Add features needed for SQL
1.
2.
3.
4.
5.
Bag semantics
Duplicate elimination operator, 
Sorting operator, 
Grouping and aggregation operator, 
Outerjoin operators, oV, Vo, oVo
7
Duplicate-elimination
operator
• (R) = relation R with any duplicated
tuples removed
• R= A B
(R)= A B
1
3
1
2
4
2
1
3
2
4
• This is used to model the DISTINCT
feature of SQL
8
Sorting
• L1,… Ln(R) returns a list of tuples of R, ordered according
to the attributes L1, …, Ln
• Note:  does not return a relation
• R=
B(R)= [(5,2),(1,3),(3,4)]
A
B
1
3
3
4
5
2
• ORDER BY in SQL, e.g.
SELECT *
FROM Sailors
WHERE rating>7
ORDER BY age, sname;
9
Extended projection
• SQL allows us to use arithmetic operators
SELECT age*5
FROM
Sailors;
• We extend the projection operator to allow the
columns in the projection to be functions of one or
more columns in the argument relation, e.g.
• R=
A+B,A,A(R)=
A
B
A+B A.1 A.2
1
3
2
4
3
7
1
3
1
3
10
Arithmetic
• Arithmetic (and other expressions) can not
be used at the top level
– i.e. 2+2 is not a valid SQL query
• How would you get SQL to compute 2+2?
11
Aggregation
• SQL provides us with operations to summarise a
column in some way, e.g.
SELECT COUNT(rating)
FROM
Sailors;
SELECT COUNT(DISTINCT rating)
FROM
Sailors;
SELECT COUNT(*)
FROM
Sailors WHERE rating>7;
• We also have SUM, AVG, MIN and MAX
12
Grouping
• These aggregation operators have been
applied to all qualifying tuples. Sometimes
we want to apply them to each of several
groups of tuples, e.g.
– For each rating, find the average age of the
sailors
– For each rating, find the age of the youngest
sailor
13
GROUP BY in SQL
SELECT [DISTINCT] target-list
FROM
relation-list
WHERE
qualification
GROUP BY grouping-list;
• The target-list contains
1. List of column names
2. Aggregate terms
– NOTE: The variables in target-list must be
contained in grouping-list
14
GROUP BY cont.
For each rating, find the average age of the sailors
SELECT
rating,AVG(age)
FROM
Sailors
GROUP BY rating;
For each rating find the age of the youngest sailor
SELECT
rating,MIN(age)
FROM
Sailors
GROUP BY rating;
15
Grouping and aggregation
• L(R) where L is a list of elements that are
either
– Individual column names (“Grouping
attributes”), or
– Of the form (A), where  is an aggregation
operator (MIN, SUM, …) and A is the column
it is applied to
• For example,
rating,AVG(age)(Sailors)
16
Semantics
• Group R according to the grouping
attributes
• Within each group, compute (A)
• Result is the relation consisting of one
tuple for each group. The components of
that tuple are the values associated with
each element of L for that group
17
Example
• Let R=
bar
beer
price
Anchor
Anchor
Mill
6X
2.50
Adnam’s 2.40
6X
2.60
Mill
Eagle
Fosters
Fosters
2.80
2.90
• Compute beer,AVG(price)(R)
18
Example cont.
1. Group according to the grouping attribute,
beer: bar
beer
price
Anchor
6X
2.50
Mill
6X
2.60
Anchor
Adnam’s 2.40
Mill
Fosters
2.80
Eagle
Fosters
2.90
beer
2. Compute average
6X
of price within groups:
price
2.55
Adnam’s
2.40
Fosters
2.85
19
NULL values
• Sometimes field values are unknown (e.g.
rating not known yet), or inapplicable (e.g. no
spouse name)
• SQL provides a special value, NULL, for both
these situations
• This complicates several issues
–
–
–
–
Special operators needed to check for NULL
Is NULL>8? Is (NULL OR TRUE)=TRUE?
We need a three-valued logic
Need to carefully re-define semantics
20
NULL values
• Consider
INSERT INTO Sailors (sid,sname)
VALUES (101,”Julia”);
SELECT * FROM Sailors;
SELECT rating FROM Sailors;
SELECT sname
FROM
Sailors
WHERE rating>0;
21
Entity integrity constraint
• An entity integrity constraint states that
no primary key value can be NULL
22
Outer join
• Note that with the usual join, a tuple that
doesn’t ‘join’ with any from the other
relation is removed from the resulting
relation
• Instead, we can ‘pad out’ the columns with
NULLs
• This operator is called an full outer join,
written oVo
23
Example of full outer join
• Let R=
A
B
1
3
2
4
• Then RVS =
• But RoVoS =
Let S=
A
B
C
3
4
5
B
C
4
5
6
7
A
B
C
1
3
2
4
NULL
NULL 6
5
7
24
Outer joins in SQL
• SQL/92 has three variants:
– LEFT OUTER JOIN (algebra: oV)
– RIGHT OUTER JOIN (algebra: Vo)
– FULL OUTER JOIN (algebra: oVo)
• For example:
SELECT *
FROM
Reserves r LEFT OUTER JOIN
Sailors s
ON
r.sid=s.sid;
25
Views
• A view is a query with a name that can be used
in further SELECT statements, e.g.
CREATE VIEW
ExpertSailors(sid,sname,age)
AS SELECT sid,sname,age
FROM
Sailors
WHERE rating>9;
• Note that ExpertSailors is not a stored
relation
• (WARNING: mysql does not support views )
26
Querying views
• So an example query
SELECT sname
FROM
ExpertSailors
WHERE age>27;
• is translated by the system to the
following:
SELECT sname
FROM
Sailors
WHERE rating>9 AND age>27;
27
Summary
You should now understand:
• Multi-set semantics
• Conditions
• Aggregation
• GROUP BY
• NULLs, entity ICs and outer joins
• Views
• Extensions to the core relational algebra
28