Document 7771484

Download Report

Transcript Document 7771484

Midterm 3 Revision
Prof. Sin-Min Lee
Department of Computer Science
Relational Model
• Understand the logical structure of the relational
model
– Tables
– Attributes
– Tuples
•
•
•
•
•
Primary keys and entity integrity
Foreign keys and referential Integrity
Normalization and data model quality
Mapping E-R models to the relational model
Exercises
6/13/2016
Structure of the Relational Model
• Table (also called a relation)
– A table of n columns and m rows
– A relational schema typically consists of several
interrelated tables
– No explicit linkage between tables
• Give each table (relation) a meaningful name
– STUDENT, EMPLOYEE, CAR, COMPANY
• Note the distinction between a relation in the
relational model and a relationship in the E-R
model
6/13/2016
Attributes
• Attributes (columns)
– Data we want to know about every instance in the table
• For a student, we may be interested in Major, GPA, Academic
Standing, Work Experience, etc.
• For each attribute, the appropriate domain must be
defined
– Acceptable Majors, Range of acceptable values for
GPA
• Only attributes from the same domain should be
compared
6/13/2016
Tuples
• Tuples (rows)
–
–
–
–
A row represents data about a specific instance
Each row should be uniquely identifiable
No two rows can be identical
The attribute values reflect the characteristics of the
specific instance
• One Representation Approach
– Table_Name(Attribute[1], Attribute[2], …. Attribute[n])
– STUDENT(SSNO, Major, GPA, Work Experience)
– CAR(VIN#, Model, Make, Year)
6/13/2016
Primary Keys and Entity
Integrity
• Basis to ensure entity integrity
– A primary key uniquely identifies every possible
instance (row) in a relational table
– It can be defined by one or multiple attributes
– No component of the primary key of a relational table
can be null
• Useful to have semantically meaningful primary
keys
– STUDENT(SSNO, Major, GPA, Work Experience)
– Course(Course# , Name, Credit Hours)
– REGISTRATION(Student#, Course# , Grade)
6/13/2016
• The primary key must be clearly indicated for
Foreign Keys
• Foreign key is defined as “an attribute of a
relational table that is the primary key of another
relational table.”
• Becomes the basis to logically link tables
• Can be composite
• Consider the all popular (by now!), studentcourse-registration scenario
– STUDENT(SSNO, major, GPA, work experience)
– COURSE(Course#, Name, Credit Hours)
– REGISTRATION(SSNO, Course#,Quarter Taken, Grade)
6/13/2016
Referential Integrity
• A database must not contain any unmatched
foreign key values
• For every foreign key value there is a
corresponding primary key value
• Implications
– A student that is not in the student table cannot register for a course
– A course that is not in the course table cannot be registered for by
students
• A critical aspect of database design and essential
to maintain data integrity
6/13/2016
Normalization
• A process to design good relations
• Not a requirement of the relational model or any
other model
• Attempts to reflect the semantics of the data. This
meaning is determined by the organization.
– What does salary mean? Profits? Stock Price?
• Relations can be in 1 NF, 2 NF, 3 NF, Boyce-Codd
NF, 4 NF, and 5 NF
• We will concern ourselves with 1 NF, 2 NF, and 3
NF.
6/13/2016
First Normal Form
• All attributes must be single-valued
– Each row can assume only one value for a given
attribute
• Consider employees working for a given
department at a given salary level
– EMPLOYEE(EMPNO, department, salary)
• Now consider employees who works for a given
department and receive a raise every year. We are
interested in maintaining data about each
employee’s yearly salary.
•6/13/2016
EMPLOYEE(EMPNO, YEAR, department,
salary)
Second Normal Form
• Consider the following:
– REGISTRATION (SSNO, CNO, quarter taken, grade,
student_major, work experience)
– What’s wrong with this table? What problems can this
create?
• Quarter taken and grade are fully functionally dependent
on the primary key, which is SSNO+CNO
• Student-major and work experience are dependent on part
of the primary key (SSNO), but not on CNO
• Decompose the table into two tables
– Student(SSNO, student_major, work experience)
– Registration(SSNO, CNO, quarter taken, grade)
6/13/2016
2 NF - The Rule
• First, ensure that you are in 1 NF.
• Check if there are any partial functional
dependencies.
• Engage in lossless decomposition to multiple
tables. We use the term “lossless” to emphasize
that no meaning (semantic content) is lost as a
result of such decomposition.
• Notice that these tables can be “joined” back using
the linking information
6/13/2016
Third Normal Form
• Consider the following: A large urban university offers
several types of scholarships. Let’s assume that there are
four types and we have a predefined code for each. The
type of scholarship that a student is eligible for is based on
their major. An analyst sets up the following database:
• (SSNO, name, major, GPA, scholarship type)
– What problems are we likely to encounter? Why?
• Note that scholarship type is not dependent on (determined
by) SSNO. Rather, it is determined by major, which is
determined by SSNO.
• This represents a transitive dependency.
6/13/2016
Third Normal Form
• Decompose the relation into two relations
– Student(SSNO, name, major, GPA)
– Scholarship_Eligibility(Major, Scholarship_Type)
• The 3 NF rule
– Once you are in 2 NF, examine if there are any
transitive dependencies
– Engage in lossless decomposition so as to remove these
transitive dependencies
– You are now in 3 NF!
6/13/2016
Mapping E-R to the Relational
Model
• 1: 1 relationship
– Each entity is defined as a table. Use the entity name as
the table name. The entity identifier becomes the
primary key for the table. The attributes are examined
in terms of their domains and domain constraints are
specified.
– Include the primary key of one of the tables in the
second one. This becomes the foreign key and enables
linking the two tables.
6/13/2016
Mapping E-R to the Relational
Model
• 1:M relationship
– Each entity is defined as a table. Use the entity name as
the table name. The entity identifier becomes the
primary key for the table. The attributes are examined
in terms of their domains and domain constraints are
specified.
– Add the identifier of the first entity (one-side) as an
attribute of the second table (many-side). This becomes
the foreign key and enables linking the two tables.
– Note here that you should not add the identifier of the
second entity (many-side) as an attribute of the first
table (one-side).
6/13/2016
Mapping E-R to the Relational
Model
• M:N relationship
– Each entity is defined as a table. Use the entity name as
the table name. The entity identifier becomes the
primary key for the table. The attributes are examined
in terms of their domains and domain constraints are
specified.
– Create a table with the attributes at the “intersection” of
the two entities - the relationship. The primary key of
this table is a concatenation of the primary keys of the
entities participating in the relationship.
– What are the foreign keys in this case?
6/13/2016
Exercise
• For each of the E-R problems, develop a
relational schema
• Identify the primary keys, foreign keys and
attribute domains
• Comment on the data quality of each of the
relational schema that you have developed
6/13/2016
Overview
•
•
•
•
It is possible to decompose any
relational schema into a set of
relational schemas with the following
properties:
1) Attribute Preserving
2) FDs preserving
3) Lossless Join
The Decomposition Algorithm
•
•
Input: A relational schema R and a set
of FDs F.
Output: A set of relational schemas
R1, R2, ..., Rm
The Decomposition Algorithm
•
•
•
Step 1. Find a minimal cover G for F
Step 2. For each left-hand side X that
appears in G, create a relation schema with
attributes
{X  A1  A2, ... ,  Am}
where X  A1, X  A2, ... , X  Am are all
dependencies in G with X as left-hand side.
Step 3. If none of the relation schemas
contains a key of R, create one more
relation schema that contains attributes that
form a key for R.
Example
•
•
•
•
•
Consider R(A, B, C, D, E) and
F={AB  C, A  BE, C E}
Step 1. minimal cover
Fmin={AC, AB, CE}
Step 2. R1(A,B,C), R2(C,E)
Step 3. Key: {A,D}
we have R3(A,D)
Final Result:
R1(A, B,C), R2(C,E), and R3(A,D)
BCNF Decomposition
•
•
Property LJ1:
A decomposition D={R1, R2} of R has
the lossless join property with respect
to a set of functional dependencies F
on R if and only if either
– the FD ((R1  R2)  (R1 - R2)) is in F+,
or
– the FD ((R1  R2)  (R2 - R1)) is in F+
BCNF Decomposition
•
•
Property LJ2:
If a decomposition D={R1, R2, ..., Rm} of R
has the lossless join property with respect
to a set of functional dependencies F on R,
and if a decomposition D1={Q1, Q2, ... ,Qk}
of Ri has the lossless join property with
respect to the projection of F on Ri, then the
decomposition
D2={R1, R2, ... Ri-1, Q1, Q2, ..., Qk, Ri+1,
..., Rm} of R has the lossless join property
with respect to F
BCNF Decomposition Algorithm
1. Set D  {R}
2. While there is a relation schema Q in D that
is not in BCNF do
begin
choose a relation schema Q in D that is
not in BCNF;
find a functional dependency X  Y in Q
that violates BCNF;
replace Q in D by two schemas
(Q-Y) and (X  Y)
end;
BCNF Decomposition Example
Consider R(A,B,C,D) and
F={A  B, B  C, D  B}
Decompose R into BCNF relations.
Step 1.D={R(A,B,C,D)}
Step 2. Loop 1.R is not in BCNF because A 
B and A is not a superkey
decompose R into R1(A, C, D), and R2(A,
B)
Loop 2. R1 is not in BCNF because A 
C and A is not a superkey
decompose R1 into R11(A, D) and R12(A,
Questions
•
•
•
•
Is the decomposition always unique?
Is a decomposition still useful if it does not
preserve the lossless property?
Is a decomposition still useful if it does not
preserve the dependency preservation
condition?
Is a decomposition still useful if it does not
preserve the attribute preservation
condition?
Review
•
•
•
•
Given R(A, B, C) and F={A->B}
Is R in 2NF?
Is R in 3NF?
Is R in BCNF
Review
•
•
•
•
Given R(A, B, C) and F={A->B, B->C}
Is R in 2NF?
Is R in 3NF?
Is R in BCNF
Review
•
•
•
•
Given R(A, B, C) and F={A->B, B->C}
Is R in 2NF?
Is R in 3NF?
Is R in BCNF
Review
•
•
•
•
•
Given R(A, B, C) and
F={}
Is R in 2NF?
Is R in 3NF?
Is R in BCNF
which is in 3NF but not in BCNF.
Overview
•
•
Given a relation schema R(A1, A2, ... , An).
If R is not in the third normal form (3NF), we
wan to decompose it into a set of relation
schema
D= { R1, R2, ... ,Rm }, where
each Ri is in 3NF, such that the following
conditions are held:
1.
Attribute preservation condition.
2.
Dependency preservation condition
3.
Lossless join condition
Attribute Preservation Condition
•
•
Attribute preservation condition states
that the union of attributes of Ri equal
to the set of attributes of R.
For example: Given R(A, B, C, D)
and the decomposition
D1={ R1(A,B), R2(B,C) and
R3(A,C,D)}. D1 satisfies the attribute
preservation condition.
Attribute Preservation Condition
•
•
Given R(A, B, C, D) and the
decomposition
D2={R1(A, B), R2(B,C), R3(A, C)},
The attribute preservation condition is
violated because D is missing (not
preserved in the decomposition).
Dependency Preservation Condition
•
•
We say that a decomposition D={R1, R2, ...
, Rm} of R is dependency preserving with
respect to F if the union of the projections of
F on each Ri in D is equivalent to F. That is:
((F(R1)  ...  F(Rm))+ = F+
Given a set of dependencies F on R, the
projection of F on Ri, denoted by F(Ri)
where Ri is a subset of R, is the set of
dependencies X  Y in F+ such that the
attributes in X  Y are all contained in Ri.
Dependency Preservation Condition
•
•
Given R(A, B, C, D) and F = { A  B,
B  C, C  D}
Let D1={R1(A,B), R2(B,C), R3(C,D)}
F(R1)={A  B}
F(R2)={B  C}
F(R3)={C  D}
FDs are preserved.
Dependency Preservation Condition
•
•
Given R(A, B, C, D) and F = { A  B,
B  C, C  D}
Let D2={R1(A,B}, R2(B,C), R3(A, D)},
then FDs are not preserved.
Dependency Preservation Condition
•
•
Given R(A, B, C, D) and F = { A  B,
B  C, C  D}
Let D2={R1(A,B}, R2(B,C), R3(A, D)},
then FDs are not preserved.
Dependency Preservation Condition
•
We want to preserve the
dependencies because each
dependency in F represents a
constraint on a database.
Dependency Preservation Condition
•
If one of the dependencies is not
represented by the dependencies on some
individual relation Ri of the decomposition,
we will not be able to enforce this constraint
by looking only at an individual relation,
instead, to enforce the constraint, we will
have to join two or more of the relations in
the decomposition and then check that
functional dependency hold in the result of
the join operation. This is very inefficient.
Lossless Join Property
•
•
A decomposition D = {R1, R2, ..., Rm}
of R has the lossless join property with
respect to the set of dependencies F
on R if for every relation instance r of
R that satisfies F, the following holds:
*(<R1>(r), ... , <Rm>(r)) = r
Lossless Join Property
•
•
•
Consider R(A, B, C, D) and F= {AB C}
Let D1={R1(A, B, C), R2(C, D)}
Let r be one of the legal instance
A
B
C
D
---------------------------a1 b1 c1
d1
a2 b1 c1
d2
a2 b2 c2
d2
Lossless Join Property
•
<R1>:
AB
C
------------------a1
b1 c1
a2
b1 c1
a2
b2 c2
<R2>:
C
D
----------c1
d1
c1
d2
c2
d2
Lossless Join Property
• <R1> * <R2> :
A
B
C
D
-------------------------------a1 b1 c1
d1
a1 b1 c1
d2 <------- Spurious tuple
a2 b1 c1
d1 <-------- Spurious tuple
a2 b1 c1
d2
a2 b2 c2
d2
R1 and R2 are not a lossless decomposition.
Lossless Join Property
•
•
Informally, a decomposition is lossless
if no spurious tuples appear when the
relations in the decomposition are
JOINed.
Thus, decomposition D1 is not
lossless
Testing for the lossless join
property
•
•
Step1: create a matrix S with one row i
for each relation Ri in the
decomposition D, and one column j for
each attribute Aj in R;
Step 2: set S(i,j):=bij for all matrix
entries;
Testing for the lossless join
property
•
Step 3.
for each row i representing relation schema
Ri
for each column j representing attribute Aj
if Ri includes attribute Aj
then set S(i,j):=aj
Testing for the lossless join
property
•
Step 4.
repeat the following until a loop execution
results
in no changes to S
for each functional dependency X  Y in F
for all rows in S which have the same
symbols in the columns
corresponding to
attributes in
X
Testing for the lossless join
property
•
Step 4 – continued .
make the symbols in each column that correspond
to an attribute in Y be the same in all these rows as
follows:
– if any of the rows has an "a" symbol for the
column, set the other rows to the same "a"
symbol in the column
– if no "a" symbol exists for the attribute in any of
the rows choose one of the "b" symbols that
appear in one of the rows for the attribute and
set the other rows to the "b" symbols in the
column;
Testing for the lossless join
property
•
Step 5
If a row is made up entirely of "a" symbols,
then the decomposition has the lossless join
property-otherwise, it does not;
Testing for the lossless join
property
Consider R(A,B,C) and F={AB, B  C}
D = {R1(A,B), R2(B,C)}
A
B
C
A
B
C
------------------------------------------R1 a1
a2 b13 => a1 a2 a3
R2: b21
a2 a3
b21 a2 a3
•
•
Thus, D is a lossless decomposition.
Testing for the lossless join
property
•
•
•
Consider R(A, B, C, D, E) and
F={AB  C, B  D}
D={R1(A, B, C), R2(B,D)}
Is D a lossless decomposition?
Multivalued Dependencies and
Fourth Normal Form
• Formal Definition of Multivalued
Dependency
• Inference Rules for Functional and
Multivalued Dependencies
• Fourth Normal Form
• Lossless (Nonadditive) Join Decomposition
into 4NF Relations
Fourth Normal Form
SSN
PhoneN ChildSSN
111111 123-4444 222222
111111 123-4444 333333
222222 987-6666 444444
222222 555-5555 444444
Person
• Relation has redundant data
• Relation is in BCNF (since there are no non-trivial
FDs)
• Redundancy due to the inability to handle set
values, not to FDs
Multi-Valued Dependency
• Problem: multi-valued (or binary join) dependency
– Definition: If every instance of schema R can be (losslessly)
decomposed using attribute sets (X, Y) such that:
r =  X (r)
 Y (r)
then a multi-valued dependency exists
Ex: Person= SSN,PhoneN(Person)
 SSN,ChildSSN(Person)
Fourth Normal Form
• A schema is in fourth normal form if for
every non-trivial multi-valued dependency:
R=X
Y
either:
- X  Y or Y  X (trivial case) or
- X  Y is a superkey of R
(i.e., X  Y R )
4th Normal Form
• No multivalued dependencies and BCNF
• Create separate tables for each separate
functional dependency
Example
SalesForce (State, SalesPerson)
Delivery (State, Delivery)
State
Sales
State
Delivery
PA
George
PA
UPS
PA
Sue
PA
RPS
NJ
Mike
NJ
UPS
NJ
Valerie
NJ
Truck
Beyond 4th Normal Form
• 5th Normal Form
– Project-Join Normal Form
• Domain Key Normal Form (DKNF)
Assume the relation R contains the following two tuples
R(A B C D)
(12 34)…
(15 67)…
What other tuples must R contain so that A ->-> B and A ->-> C
hold for R ?
Answer: The tuples that must be included due to the two
multi-valued dependency are:
(1 2 6 7)
(1 5 3 4)
(1 2 6 4)
(1 5 3 7)
(1 2 3 7) second round
(1 5 6 4) second round