Transcript ppt
CS 728
Advanced Database Systems
Chapter 15
Database Design Theory: Normalization
Algorithms
1
Chapter Outline
0. Designing a Set of Relations
1. Properties of Relational Decompositions
2. Algorithms for Relational Database Schema
3. Multivalued Dependencies and 4th Normal Form
4. Join Dependencies and 5th Normal Form
5. Inclusion Dependencies
6. Other Dependencies and Normal Forms
2
Designing a Set of Relations (1)
The 1st approach is (Chapter 14) a Top-Down Design
(Relational Design by Analysis):
1. Designing a conceptual schema in a high-level
data model, such as the EER model
2. Mapping the conceptual schema into a set of
relations using mapping procedures.
3. Each of the relations is analyzed based on the
functional dependencies and assigned primary keys,
by applying the normalization procedure to remove
partial and transitive dependencies if any remain.
3
Designing a Set of Relations (2)
The 2nd approach is (Chapter 15) a Bottom-Up Design
(Relational Design by Synthesis - )التأليف:
1. First constructs a minimal set of FDs
Assumes that all possible functional dependencies are
known.
2. a normalization algorithm is applied to
construct a target set of 3NF or BCNF relations.
start by one large relation schema, called the
universal relation, which includes all the database
attributes.
then repeatedly perform decomposition until it is no
longer feasible or no longer desirable, based on the
functional and other dependencies specified by the
database designer.
4
Designing a Set of Relations (3)
Additional criteria may be needed to ensure the set
of relations in a relational database are satisfactory.
Two desirable properties of decompositions:
The dependency preservation property and
The lossless (or nonadditive) join property
5
Designing a Set of Relations (4)
When we decompose a relation schema R with a set of
functional dependencies F into R1, R2, …, Rn we want
Dependency preservation:
Otherwise, checking updates for violation of
functional dependencies may require computing
joins, which is expensive.
Nonadditive (Lossless) join decomposition:
Otherwise decomposition would result in
information loss.
No redundancy:
The relations Ri preferably should be in either BoyceCodd Normal Form or 3NF.
6
Designing a Set of Relations (5)
F = {AB, BC}
Can be decomposed in two different ways:
R1 = (A, B),
R2 = (B, C)
Example: R = (A, B, C)
Lossless-join decomposition:
– R1 R2 = {B} and B BC
Dependency preserving
R1 =
(A, B),
R2 = (A, C)
Lossless-join decomposition:
– R1 R2 = {A} and A AB
Not dependency preserving
– cannot check B C without computing R1 R2
7
Properties of Relational Decompositions (1)
Relation Decomposition and Insufficiency of
Normal Forms:
Universal Relation Schema:
a relation schema R = {A1, A2, …, An} that includes
all the attributes of the database.
Universal relation assumption:
every attribute name is unique.
Decomposition:
The process of decomposing the universal relation
schema R into a set of relation schemas
D = {R1, R2, …, Rm} that will become the relational
database schema by using the functional
dependencies.
8
Properties of Relational Decompositions (2)
Relation Decomposition and Insufficiency of
Normal Forms:
Attribute preservation condition:
Each attribute in R will appear in at least one relation
schema Ri in the decomposition so that no attributes
are “lost”.
Another goal of decomposition is to have each
individual relation Ri in the decomposition D be
in BCNF or 3NF.
Additional properties of decomposition are
needed to prevent from generating spurious
( )المزوّ رtuples.
9
Properties of Relational Decompositions (3)
Example of spurious tuples.
Decomposition of R = (A, B)
R1 = (A)
R2 = (B)
A B
A
B
1
2
A(r)
B(r)
1
2
1
r
A(r)
B(r)
A
B
1
2
1
2
10
Properties of Relational Decompositions (4)
Dependency Preservation Property of a
Decomposition:
It would be useful if each functional dependency X Y
specified in F either
appeared directly in one of the relation schemas in
the decomposition D or
could be inferred from the dependencies that appear
in some Ri .
Informally, this is the
dependency preservation condition.
11
Properties of Relational Decompositions (5)
Dependency Preservation Property of a
Decomposition:
We want to preserve the dependencies because each
dependency in F represents a constraint on the database.
If one of the dependencies is not represented in some
individual relation of the decomposition, we cannot
enforce this constraint by dealing with an individual
relation; instead,
we have to join two or more of the relations in the
decomposition and then check that the functional
dependency holds in the result of the join operation.
This is clearly an inefficient and impractical procedure.
12
Properties of Relational Decompositions (6)
Dependency Preservation Property of a
Decomposition:
It is not necessary that the exact dependencies specified
in F appear themselves in individual relations of the
decomposition D.
It is sufficient that the union of the dependencies that
hold on the individual relations in D be equivalent to F.
We now define these concepts more formally.
13
Properties of Relational Decompositions (7)
Dependency Preservation Property of a
Decomposition:
Definition:
Given a set of dependencies F on R, the
projection of F on Ri, denoted by Ri(F) where
Ri is a subset of R, is the set of dependencies X
Y in F+ such that the attributes in X Y are all
contained in Ri.
Hence, the projection of F on each relation
schema Ri in the decomposition D is the set of
functional dependencies in F+, the closure of F,
such that all their left- and right-hand-side
attributes are in Ri.
14
Properties of Relational Decompositions (8)
Dependency Preservation Property of a
Decomposition:
A decomposition D = {R1, R2, ..., Rm} of R is
dependency-preserving with respect to F if
the union of the projections of F on each Ri
in D is equivalent to F; that is,
((R1(F)) … (Rm(F)))+ = F+
Claim 1: It is always possible to find a
dependency-preserving decomposition D with
respect to F such that each relation Ri in D is in
3NF.
15
Properties of Relational Decompositions (9)
Consider a decomposition of R = (R, F) into
R1 = (R1, F1) and R2 = (R2, F2)
How to compute the projections F1 and F2?
Fi is the projection of FDs in F+ over Ri
Example: R=ABC and F = A→B, B→C, C→A
Let R1=AB and R2=BC
Not enough to let F1 = A→B and F2 = B→C
Consider FDs in F+: B→A and C→B
So F1 = A→B, B→A and F2 = B→C, C→B
Now F and F1 F2 are equivalent
16
Properties of Relational Decompositions (10)
Lossless (Non-additive) Join Property of a
Decomposition:
This property ensures that no spurious tuples are
generated when a NATURAL JOIN operation is applied
to the relations in the decomposition.
Lossless join property: a decomposition D = {R1, R2, ...,
Rm} of R has the lossless (nonadditive) join property
with respect to the set of dependencies F on R if, for
every relation state r of R that satisfies F, the following
holds, where * is the natural join of all the relations in D:
* (R1(r), ..., Rm(r)) = r
r r 1 * r2 * … * rm
and
r1 * r 2 * … * rm r
17
Properties of Relational Decompositions (11)
Consider
Id#
124
789
Name Address C#
Description Grade
Jones Phila
Phil7 Plato
A
Brown Boston Math8 Topology
C
What happens if we decompose on
(Id#, Name, Address) and
(C#, Description, Grade)?
Spurious tuples will be generated
18
Properties of Relational Decompositions (12)
Lossy Decomposition:
Problem: Name is not a key
SSN
1111
2222
3333
Name
Joe
Alice
Alice
r
Address
1 Pine
2 Oak
3 Pine
SSN Name
1111 Joe
2222 Alice
3333 Alice
r1
19
Name
Joe
Alice
Alice
Address
1 Pine
2 Oak
3 Pine
r2
Properties of Relational Decompositions (13)
Lossless (Non-additive) Join Property of a
Decomposition:
Note: The word loss in lossless refers to loss of
information, not to loss of tuples. In fact, for “loss of
information” a better term is “addition of spurious
information”.
Algorithm 15.3: Testing for Lossless Join Property
Input: A universal relation R, a decomposition D =
{R1, R2, ..., Rm} of R, and a set F of functional
dependencies.
20
Properties of Relational Decompositions (14)
1. Create an initial matrix S with one row i for each
relation Ri in D, and one column j for each attribute
Aj in R.
2. Set S(i, j) = bij for all matrix entries.
// each bij is a symbol associated with indices (i, j)
3. For each row i representing relation schema Ri
For each column j representing attribute Aj
if (relation Ri includes attribute Aj) then S(i, j) = aj
// each aj is a symbol associated with index j
21
Properties of Relational Decompositions (15)
4. Repeat until a complete loop execution results in no
changes to S
for each functional dependency X Y in F
for all rows in S which have the same symbols in the
columns corresponding to attributes in X
make the symbols in each column that correspond
to an attribute in Y be the same in all these rows as
follows:
If any of the rows has an “a” symbol for the
column, set the other rows to that same “a”
symbol in the column.
22
Properties of Relational Decompositions (16)
If no “a” symbol exists for the attribute in any
of the rows, choose one of the “b” symbols that
appear in one of the rows for the attribute and
set the other rows to that same “b” symbol in the
column
5.
If a row is made up entirely of “a” symbols, then the
decomposition has the lossless join property; otherwise it
does not.
23
Properties of Relational Decompositions (17)
Lossless (non-additive) join test for n-ary
decompositions.
(a) Case 1: Decomposition of EMP_PROJ
into EMP_PROJ1 and EMP_LOCS fails test.
(b) A decomposition of EMP_PROJ that has
the lossless join property.
(c) Case 2: Decomposition of EMP_PROJ
into EMP, PROJECT, and WORKS_ON
satisfies test.
24
Properties of Relational Decompositions (18)
25
Properties of Relational Decompositions (19)
26
Properties of Relational Decompositions (20)
Binary Decomposition:
decomposition of a relation R into two relations.
Non-additive (Lossless) Join Test for Binary
decompositions (NJB):
A decomposition D = {R1, R2} of R has the
lossless join property with respect to a set of
functional dependencies F on R if and only if
either
The FD ((R1 ∩ R2) (R1- R2)) is in F+, or
The FD ((R1 ∩ R2) (R2 - R1)) is in F+.
27
Properties of Relational Decompositions (21)
Intuition for Test for Losslessness
Suppose R1 R2 R2. Then a row of r1 can
combine with exactly one row of r2 in the
natural join (since in r2 a particular set of
values for the shared attributes defines a
unique row), i.e.,
R1 R2 is a superkey of R2
R1R2
…………. a
………… a
………… b
………… c
r1
28
R1R2
a ………...
b ………….
c ………….
r2
Properties of Relational Decompositions (22)
Schema (R, F) where
R = {SSN, Name, Address, Hobby}
F = {SSN Name, Address}
can be decomposed into
R1 = {SSN, Name, Address}
F1 = {SSN Name, Address}
and
R2 = {SSN, Hobby}
F2 = { }
Since R1 R2 = SSN and
SSN R1- R2
SSN Name, Address is in F+, then
the decomposition is lossless
29
Properties of Relational Decompositions (23)
Example: WRT the FD set
Id# Name, Address
C# Description
Id#, C# Grade
Is
(Id#, Name, Address) and
(Id#, C#, Description, Grade)
a lossless decomposition?
30
Properties of Relational Decompositions (24)
A relation scheme
{Sname, Sadd, City, Zip, Item, Price}
The FD set
Sname Sadd, City
Sadd, City Zip
Sname, Item Price
Consider the decomposition
{Sname, Sadd, City, Zip} and
{Sname, Item, Price}
Is it lossless?
Is it dependency preserving?
What if we replaced the first FD by
Sname, Sadd City?
31
Properties of Relational Decompositions (25)
The scheme:
{Student, Teacher, Subject}
The FD set:
Teacher Subject
Student, Subject Teacher
The decomposition:
{Student, Teacher} and
{Teacher, Subject}
Is it lossless?
Is it dependency preserving?
32
Properties of Relational Decompositions (26)
Claim 2 (Preservation of non-additivity in successive
decompositions):
If a decomposition D = {R1, R2, ..., Rm} of R has the
lossless (non-additive) join property with respect to
a set of functional dependencies F on R, and
if a decomposition Di = {Q1, Q2, ..., Qk} of Ri has the
lossless (non-additive) join property with respect to
the projection of F on Ri,
then the decomposition D2 = {R1, R2, ..., Ri-1, Q1, Q2,
..., Qk, Ri+1, ..., Rm} of R has the lossless (nonadditive) join property with respect to F.
33
Algorithms for RDB Schema Design (1)
Algorithm 15.4: Relational Synthesis into 3NF with
Dependency Preservation
Input: A universal relation R and a set of functional
dependencies F on the attributes of R.
1.
Find a minimal cover G for F;
2.
For each left-hand-side X of a functional dependency that
appears in G, create a relation schema in D with attributes
{X {A1} {A2} ... {Ak}}, where X A1, X A2, ...,
X Ak are the only dependencies in G with X as lefthand-side (X is the key of this relation) ;
3.
Place any remaining attributes (that have not been placed
in any relation) in a single relation schema to ensure the
attribute preservation property.
34
Algorithms for RDB Schema Design (2)
A set of FDs F is minimal if it satisfies the following
conditions:
Every FD in F is of the form XA, where A is a
single attribute,
We cannot remove any dependency from F and have a
set of dependencies that is equivalent to F.
We cannot replace any dependency XA in F with a
dependency YA, where Y is a proper-subset of X (Y
subset-of X) and still have a set of dependencies that
is equivalent to F.
For no XA in F is F-{XA} equivalent to F.
For no XA in F and YX is F-{XA}{YA} equivalent
to F.
35
Algorithms for RDB Schema Design (3)
Examples
{AC, AB} is a minimal cover for
{ABC, AB}
What about {ABC, B AB, DBC}?
Every set of FDs has an equivalent minimal set
There can be several equivalent minimal sets
There is no simple algorithm for computing a
minimal set of FDs that is equivalent to a set F of
FDs
To synthesize a set of relations, we assume that
we start with a set of dependencies that is a
minimal set.
36
Algorithms for RDB Schema Design (4)
Two sets of FDs F and G are equivalent if:
Every FD in F can be inferred from G, and
Every FD in G can be inferred from F
Hence, F and G are equivalent if F+ = G+
Definition (Covers):
F covers G if every FD in G can be inferred from F
(i.e., if G+ is subset-of F+)
F and G are equivalent if F covers G and G covers F
There is an algorithm for checking equivalence of sets of
FDs
37
Algorithms for RDB Schema Design (5)
Algorithm 15.2 Finding a minimal cover G for F
1. Set G := F.
2. Replace each FD X{A1, A2, ..., Ak} in G by the n
functional dependencies XA1, XA2 , …, XAk.
3. For each FD XA in G
For each attribute B that is an element of X
if ((G -{XA}) {(X-{B})A}) is equivalent to G,
then replace XA with (X-{B})A in G.
4. For each remaining FD XA in G,
if (G-{XA}) is equivalent to G, then remove XA
from G.
38
Algorithms for RDB Schema Design (6)
Example:
{A→B, ABCD→E, EF→GH, ACDF→EG}
Make RHS a single attribute:
{A→B, ABCD→E, EF→G, EF→H, ACDF→E,
ACDF→G}
Minimize LHS: ACD→E instead of ABCD→E
Eliminate redundant FDs
Can ACDF→G be removed?
Can ACDF→E be removed?
Final answer: {A→B, ACD→E, EF→G, EF→H}
39
Algorithms for RDB Schema Design (7)
Minimal Cover Exercise
Compute the minimal cover of the following set
of functional dependencies:
{ABC DE, BD DE, E CF, EG F}
ABC D
ABC E
//
BD D
// reflexive
BD E
EC
EF
EG F
// augmentation
The minimal cover is:
{ABC D, BD E, E C, E F}
40
Algorithms for RDB Schema Design (8)
Example of Algorithm 15.4: (3NF Decomposition)
Consider
the relation R = CSJDPQV
FDs F = C→CSJDPQV, SD→P, JP→C,J→S
Find minimal cover:
{C→J, C→D, C→Q, C→V, SD→P, JP→C,
J→S}
New relations:
R1=CJDQV,
R3=JS,
R2=JPC,
R4=SDP
41
Algorithms for RDB Schema Design (9)
Algorithm 15.5: Relational Decomposition into BCNF with
Lossless (non-additive) join property
Input: A universal relation R and a set of functional
dependencies F on the attributes of R.
1.
2.
Set D = {R}
While there is a relation schema Q in D that is not in BCNF
do {
choose a relation schema Q in D that is not in BCNF;
find a FD XY in Q that violates BCNF;
replace Q in D by two relation schemas (Q-Y) & (XY)
}
42
Algorithms for RDB Schema Design (10)
Example of Algorithm 15.5 (BCNF Decomposition)
R = (branch-name, branch-city, assets, customer-name,
loan-number, amount)
F = {branch-name branch-city, assets
loan-number branch-name, amount}
Key = {loan-number, customer-name}
Decomposition
R1 = (branch-name, branch-city, assets)
R2 = (branch-name, customer-name, loan-number, amount)
R3 = (loan-number, branch-name, amount)
R4 = (loan-number, customer-name)
Final decomposition
R1, R3, R4
43
Algorithms for RDB Schema Design (11)
Example of Algorithm 15.5 (BCNF Decomposition)
R = (A, B, C)
F = {A B, B C}
Key = {A}
R is not in BCNF
Decomposition R1 = (A, B),
R2 = (B, C)
R1 and R2 in BCNF
Lossless-join decomposition
Dependency preserving
44
Algorithms for RDB Schema Design (12)
Algorithm 15.6 Relational Synthesis into 3NF with Dependency
Preservation and Lossless (Non-Additive) Join Property
Input: A universal relation R and a set of functional dependencies F on
the attributes of R.
1.
Find a minimal cover G for F (Use Algorithm 15.2).
2.
For each left-hand-side X of a functional dependency that appears in
G, create a relation schema in D with attributes {Xυ{A1}υ{A2}...υ
{Ak}}, where XA1, XA2, ..., XAk are the only dependencies in G
with X as left-hand-side (X is the key of this relation).
3.
If none of the relation schemas in D contains a key of R, then create
one more relation schema in D that contains attributes that form a
key of R.
4.
Eliminate redundant relations from the resulting set of relations. A
relation T is considered redundant if T is a projection of another
relation S.
45
Algorithms for RDB Schema Design (13)
Algorithm 15.2a Finding a Key K for R Given a
set F of Functional Dependencies
Input: A universal relation R and a set of functional
dependencies F on the attributes of R.
1. Set K = R
2. For each attribute A in K
compute (K - A)+ with respect to F;
If (K - A)+ contains all the attributes in R,
then set K = K - {A}
46
Algorithms for RDB Schema Design (14)
Discussion of Normalization Algorithms:
Problems:
The database designer must first specify all the
relevant functional dependencies among the database
attributes.
These algorithms are not deterministic in general.
It is not always possible to find a decomposition into
relation schemas that preserves dependencies and
allows each relation schema in the decomposition to
be in BCNF (instead of 3NF).
47
Algorithms for RDB Schema Design (15)
Table 15.1 Summary of some of the algorithms discussed
Algorit
hm
Input
Output
Properties/Purpos
e
Remarks
15.3
A decomposition D
of R and a set F of
functional
dependencies
Boolean result:
yes or no for
lossless join
property
Testing for nonadditive join
decomposition
See a simpler test in
Section 11.1.4 for
binary
decompositions
15.4
Set of functional
dependencies F
A set of relations
in 3NF
Dependency
preservation
No guarantee of
satisfying lossless
join property
15.5
Set of functional
dependencies F
A set of relations
in BCNF
Lossless join
decomposition
No guarantee of
dependency
preservation
15.6
Set of functional
dependencies F
A set of relations
in 3NF
Lossless join and
dependency
preserving
decomposition
May not achieve
BCNF
15.2a
Relation schema R
with a set of
functional
dependencies F
Key K of R
To find a key K
(which is a subset
of R)
The entire relation R
is always a default
superkey
48
Multivalued Dependencies and 4th Normal Form (1)
Beyond BCNF:
CustService (State, SalesPerson, Delivery)
State
Sales
Delivery
State
Sales
Delivery
PA
George
UPS
NJ
Mike
UPS
PA
George
RPS
NJ
Mike
Truck
PA
Sue
UPS
NJ
Valerie
UPS
PA
Sue
RPS
NJ
Valerie
Truck
Is this BCNF?
49
Multivalued Dependencies and 4th Normal Form (2)
Everything is in the key -- must be BCNF
Still problems with duplication
Multivalued Dependencies
50
Multivalued Dependencies and 4th Normal Form (3)
At least three attributes (A, B, C)
A B and A C
B and C are independent of each other (they
really shouldn’t be in the same table)
51
Multivalued Dependency and 4NF
Multivalued dependency (MVD)
Consequence of first normal form (1NF)
52
Multivalued Dependency and 4NF
Relations containing nontrivial MVDs
All-key relations
Fourth normal form (4NF)
Violated when a relation has undesirable
multivalued dependencies
53
Multivalued Dependencies and 4th Normal Form (4)
Definition:
A multivalued dependency (MVD) X —>> Y specified on relation
schema R, where X and Y are both subsets of R, specifies the
following constraint on any relation state r of R:
If two tuples t1 and t2 exist in r such that t1[X] = t2[X], then two
tuples t3 and t4 should also exist in r with the following
properties, where we use Z to denote (R - (X υ Y)):
t3[X] = t4[X] = t1[X] = t2[X]
t3[Y] = t1[Y] and t4[Y] = t2[Y].
t3[Z] = t2[Z] and t4[Z] = t1[Z].
An MVD X —>> Y in R is called a trivial MVD if
(a) Y is a subset of X,
or
(b) X υ Y = R
54
Multivalued Dependencies and 4th Normal Form (5)
4th Normal Form
BCNF with no multivalued dependencies
Create separate tables for each separate
functional dependency
55
Multivalued Dependencies and 4th Normal Form (6)
(a) The EMP relation with two MVDs: ENAME —>> PNAME and
ENAME —>> DNAME. (b) Decomposing the EMP relation into
two 4NF relations EMP_PROJECTS and EMP_DEPENDENTS.
56
Multivalued Dependencies and 4th Normal Form (7)
SalesForce (State, SalesPerson)
Delivery (State, Delivery)
State
Sales
State
Delivery
PA
George
PA
UPS
PA
Sue
PA
RPS
NJ
Mike
NJ
UPS
NJ
Valerie
NJ
Truck
57
Multivalued Dependencies and 4th Normal Form (8)
Inference Rules for Functional and Multivalued
Dependencies:
IR1 (reflexive rule for FDs):
If X Y, then X –> Y.
IR2 (augmentation rule for FDs):
IR3 (transitive rule for FDs):
IR4 (complementation rule for MVDs):
IR5 (augmentation rule for MVDs):
IR6 (transitive rule for MVDs):
IR7 (replication rule for FD to MVD):
{X –> Y} XZ –> YZ.
{X –> Y, Y –>Z} X –> Z.
{X —>> Y} X —>> (R – (X Y)).
If X —>> Y and W Z then WX —>> YZ.
{X —>> Y, Y —>> Z} X —>> (Z - Y).
{X –> Y} X —>> Y.
IR8 (coalescence ( )اإلتحادrule for FDs and MVDs):
If X —>> Y and there exists W with the properties that (a)
W Y is empty, (b) W –> Z, and (c) Y Z, then X –> Z.
58
Multivalued Dependencies and 4th Normal Form (9)
A relation schema R is in 4NF with respect to a
set of dependencies F (that includes functional
dependencies and multivalued dependencies), at
least one of the following hold:
X —>> Y is trivial (i.e., Y X or X Y = R)
X is a superkey for schema R
If a relation is in 4NF it is in BCNF
59
Multivalued Dependencies and 4NF (10)
Decomposing a relation state of EMP that is not in 4NF. (a) EMP
relation with additional tuples. (b) Two corresponding 4NF relations
EMP_PROJECTS and EMP_DEPENDENTS.
60
Multivalued Dependencies and 4NF (11)
Algorithm 15.7: Relational decomposition into
4NF relations with non-additive join property
Input: A universal relation R and a set of functional and
multivalued dependencies F.
1.
Set D := { R };
2.
While there is a relation schema Q in D that is not in 4NF
do
{ choose a relation schema Q in D that is not in 4NF;
find a nontrivial MVD X —>> Y in Q that violates
4NF;
replace Q in D by two relation schemas (Q - Y) and
(X υ Y);
};
61
Multivalued Dependencies and 4NF (12)
R =(A, B, C, G, H, I)
F ={ A —>> B, B —>> HI, CG —>> H }
R is not in 4NF since A —>> B and A is not a
superkey for R
Decomposition
R1 = (A, B)
(R1 is in 4NF)
R2 = (A, C, G, H, I)
(R2 is not in 4NF)
R3 = (C, G, H)
(R3 is in 4NF)
R4 = (A, C, G, I)
(R4 is not in 4NF)
Since A —>> B and B —>> HI, A —>> HI, A —>> I
R5 = (A, I)
(R5 is in 4NF)
R6 = (A, C, G)
(R6 is in 4NF)
62