Transcript notes

Schema Refinement and
Normal Forms
•Given a design, how do we know
it is good or not?
Conceptual design
•What is the best design?
•Can a bad design be
transformed into a good one?
Schemas
ICs
Normalization
A relation is said to be in a particular
normal form if it satisfies a certain set of
constraints.
•
If a relation is in a certain normal form (BCNF,
3NF etc.), we know what problems it has and
what problems it does not have
Each normal eliminates or minimizes certain kinds of
problems
•
•
Given a relation, the process of making it to be
in certain normal form is called normalization
•Typically
this is done by breaking up the relation into a
set of smaller relations that possess desirable
properties.
Boyce-Codd Normal Form (BCNF)
A relation R is in BCNF if whenever a FD XA holds in R,
one of the following statements is true.
•XA is a trivial FD.
•X is a superkey.
key
•
•
nonkey attr_1
A trivial FD
XY where Y
nonkey attr_2
…
 X
nonkey attr_n
Example 1:
−
Scheme: Hourly_Emps (ssn, name, lot, rating, hrly_wages, hrs_worked)
−
Constraints: ssn is the primary key and Rating  hrly_wages
Example 2:
−
Schema: R(A, B, C, D)
−
Constraints: A is the primary key, B is a candidate key, is R a BCNF?
BCNF is the most desirable form
A BCNF relation does not allow redundancy
•
Every field of every tuple records a piece of
information that cannot be inferred from the
values in all other non-key fields
Normalization
If a relation is not in BCNF, can we make it BCNF?
Example
Relation R(SNLRWH) has FDs SSNLRWH and RW
• Second FD causes violation of BCNF
–
•
consequence: W values repeatedly associated with R values
We decompose SNLRWH into SNLRH and RW
Normalization through
Decomposition
•
A decomposition of a relation schema R
•
•
The replacement of the schema R by two or more
relation schemas, each contains a subset of R and
together include all attributes of R.
A decomposition must ensure two properties:
•
•
Lossless join
Dependency preservation
Lossless Join Decompositions
• Decomposition of R into X and Y is a losslessjoin decomposition w.r.t. a set of FDs F if, for
every instance r that satisfies F:
–
 X (r)   Y (r) = r
• It is always true that r   X (r)   Y (r)
–
In general, the other direction does not hold! If it
does, the decomposition is lossless-join.
• It is essential that all decompositions used to
deal with redundancy be lossless!
•A decomposition of D={R1,R2,…,Rm} of Relation R
has the lossless join property with respect to the
set of FDs F on R if for every relation instance r(R)
that satisfies F, the following holds:
NATURAL_JOIN(  R1 (r ),..., Rm (r ) ) =r
Lossless Join Decomposition: Property 1
Property 1: A decomposition D={R1,R2} of R has the lossless join
property with respect to a set of FDs F of R if and only if either
R1  R 2  R1 is in F+ or
+.
R1  R 2  R 2 is in F
Case 1:
R1
A
The common attribute must be
a super key for either R1 or R2.
B
R2
B
C
R2
B
C
A foreign key
Case 2: R1
A
B
A foreign key
Lossless Join Decomposition: Property 2
If (i) a decomposition D={R1,…,Rm} of R has the lossless join
property with respect to a set of FDs F on R, and
(ii) a decomposition Di={Q1,…,Q2} of Ri has the lossless join
property with respect to the projection of F on Ri.
then the decomposition D’={R1,R2,…, Ri-1,Q1,…,Qn,Ri+1,…,Rm} of R
has the lossless join property with respect to F.
Decomposition D
Decomposition D3
R
R3
R1 R 2 R3 … R m
Q1 Q2 Q3 … Qn
Decomposition D’
R
R1 R2 Q1 Q2 … Qn R4 … Rm
Lossless Join Decomposition into BCNF relations
Algorithm:
1 Set D{R}
2While there is a relation schema Q in D that is not in BCNF do
begin
Choose a relation schema Q in D that is not in BCNF;
Find a functional dependency XY in Q that violates BCNF;
Replace Q in D by two schemas (Q-Y) and (XUY)
end;
We have (Q  Y )  ( XUY )  X
( XUY )  (Q  Y )  Y
Since XY is in F, (Q  Y )  ( XUY)  ( XUY)  (Q  Y )
D={(Q-Y),(X  Y)} has the lossless join property.
Exercise
Determine whether D={R1,R2, R3} of R(S,E,P,N,L,H) is a
lossless-join decomposition.
R1={S,E}
F={SE, SPH, PNL}
R2={P,N,L}
R3={S,P,H}
DEPENDENCY PRESERVATION
R with a set of FDs F
projection
R1
R2
FR1
FR2
D={R1,…,Rn} is a decomposition of R.
Rn
FRn
The projection of F on Ri (FRi) is defined as:
FRi  {X  Y | X  Y  F  , X  Y  Ri}
D={R1,…,Rn} of R is dependency preserving with respect to F
if (n FRi )  F
.  , meaning that (n FRi )
i 1
is equivalent to F.
i 1
DEPENDENCY PRESERVATION
•We want to preserve the dependencies because each FD in F
represents a constraint on the database.
•We want each original FD to be represented by some
individual relation Ri so we can check the constraint without
joining two or more relations.
•Otherwise, each update would require to do join
operations
Contracts(contractid, supplierid, projectid, deptid, partid, qty, value)
Example: Contracts (C S J D P Q V)
CCSJDPQV
JPC
Is it BCNF?
SDP
JS
DEPENDENCY PRESERVATION
Example: Contracts (C S J D P Q V)
SDP
SDP
CSDJQV
JS
JS
CJDQV
Loss-less join decomposition?
CCSJDPQV
JPC
SDP
JS
DEPENDENCY PRESERVATION
Example: Contracts (C S J D P Q V)
SDP
SDP
CCSJDPQV
JPC
SDP
JS
CSDJQV
JS
JS
CJDQV
Where JPC is in the result of the decomposition?
• To enforce JPC, we need to join the three relations for
each update
An alternative decomposition
Example: Contracts (C S J D P Q V)
J->S
JS
CCSJDPQV
JPC
SDP
JS
CJDPQV
Is this decomposition lossless join and dependency
preserving?
Question
Can any relation be decomposed into BCNF while
ensuring lossless join and dependency preservation?
In general, there may not be a dependency preserving
decomposition into BCNF.
–e.g.,
CSZ, CS  Z, Z  C
–what
–Let’s
consider a decomposition D={ZC,SZ}.
–what
–Is
NF?
NF? Is lossless-join decomposition?
CSZ preserved?
–CS+={C,S,Z};
–R1(ZC);
–R2(SZ);
SZ+= {S,Z,C}; ZC+={Z,C}
FR1={ZC}
FR2={SZZ, SZS}
( FR1  FR 2)= {ZC,SZZ,SZS}+
Is CSZ in
( FR1  FR 2) ?