Normalization

Download Report

Transcript Normalization

Chapter 7
Normalization
Chapter 14 & 15 in Textbook
Database Design
Steps in building a database for an application:
Real-world
domain
Normalization
Conceptual
model
DBMS data
model
Create
Schema
(DDL)
Modify
data (DML)
2
How to produce a good relation
schema?
1. Start with a set of relation.
2. Define the functional dependencies for the relation to specify the PK.
3. Transform relations to normal form.
Normalization
3
Data Redundancy
STAFFBRANCH
StaffNo
FName
LName position
Salary BrnNo
Address
City
SL21
John
White
Manager
30000 B005
22 Deer Rd
London
SG37
Ann
Beech Assistant
12000 B003
163
163Main
MainSt
St
Glasgow
SG14
David
Ford
Supervisor 18000 B003
163
163 Main
Main St
St
Glasgow
SA9
Mary
Howe
Assistant
9000
16 Arglly St
Aberdeen
SG5
Susan
Brand
Manager
B003
24000 B003
163 Main
Main St
St
163
Glasgow
Glasgow
SL41
Julie
Lee
Assistant
9000
B007
B005
22 Deer Rd
London
Relations that have redundant data may have update anomalies (insert, modify, delete)
Normalization
4
STAFF
StaffNo
FName
LName position
Salary
BrnNo
SL21
John
White
Manager
30000
B005
SG37
Ann
Beech Assistant
12000
B003
SG14
David
Ford
Supervisor 18000
B003
SA9
Mary
Howe
Assistant
9000
B007
SG5
Susan
Brand
Manager
24000
B003
SL41
Julie
Lee
Assistant
9000
B005
BRANCH
BrnNo
Address
City
B005
22 Deer Rd
London
B003
163 Main St
Glasgow
B007
16 Arglly St
Aberdeen
Normalization
5
Relation Decomposition
Normalization process involve decomposing a relation.
Decomposition require to be reversible.
Functional dependencies guarantee decomposition to be reversible.
While normalization, two important properties associated with decomposition:
1. Lossless-join
2. Dependency preservation
Normalization
6
STAFF
StaffNo
FName
City
LName position
Salary
30000 London
SL21
John
White
SG37
Ann
Beech Assistant
SG14
David
Ford
Supervisor 18000 Glasgow
SA9
Mary
Howe
Assistant
9000
SG5
Susan
Brand
Manager
24000 Glasgow
SL41
Julie
Lee
Assistant
9000
Manager
12000 Glasgow
London
London
BRANCH
BrnNo
Address
City
B005
22 Deer Rd
London
B003
163 Main St
Glasgow
B007
16 Arglly St
London
Normalization
7
Data Redundancy
STAFFBRANCH
StaffNo
FName
LName position
Salary BrnNo
30000 B005
30000 B007
12000 B003
SL21
SL21
SG37
John
John
Ann
White Manager
White Manager
Beech Assistant
SG14
David
Ford
Supervisor 18000 B003
SA9
Mary
Howe
Assistant
9000
SA9
Mary
9000
SG5
Susan
Howe Assistant
Brand Manager
SL41
Julie
Lee
Assistant
9000
SL41
Julie
Lee
Assistant
9000
Normalization
Address
22 Deer Rd
16 Arglly St
City
163 Main St
London
London
Glasgow
163 Main St
Glasgow
B005
22 Deer Rd
London
B007
24000 B003
16 Arglly St
163 Main St
London
Glasgow
B005
22 Deer Rd
London
B007
16 Arglly St
London
8
Functional Dependencies
Describes the relationship between attributes in a relation.
If A and B are attributes of relation R,
B is functionally dependent on A, denoted by A
B, if each value of A is associated
with exactly one value of B. B may have several values of A.
A
Determinant
B is functionally
dependent on A
B
Dependent
• Functional dependency is identifies between attributes in a relation at different times
(all time functional dependency).
Normalization
9
Functional Dependencies
If t & u agree here
A
Then they must agree here
B
t
u
A
B
whenever two tuples t & u agree on all attributes of A, then they must agree on
attribute B.
Normalization
10
Functional Dependencies
Example
1:1 or M:1
relationship
between
attributes in a
relation
1:M
relationship
between
attributes in a
relation
StaffNo
SL21
StaffNo is NOT functionally
position
dependent on position
Manager
Normalization
Position is functionally
dependent on Staffno
position
Manager
StaffNo
SL21
SG5
11
Trivial Functional Dependencies
A
B is trivial if B  A
StaffNo, Sname
SName
StaffNo, SName
StaffNo
We are not interested in trivial functional dependencies as it provides no genuine
integrity constraints on the value held by these attributes.
Normalization
12
StaffBranch Example
Functional dependencies on StaffBranch relation:
StaffNo
BranchNo
FName, Lname, position, salary, brnNo, Address, city
Address, city
Address, city
BranchNo, position
Address, city, position
BranchNo
salary
salary
Determinants:
StaffNo, BranchNo, (Address, city), (branchNo, position), and (address, city, position)
Normalization
13
Identifying the PK
Purpose of functional dependency, specify the set of integrity constraints that must
hold on a relation.
The determinant attribute(s) are candidate of the relation, if:
• 1:1 relationship between determinant & dependent.
• No subset of determinant attribute(s) is a determinant. (nontrivial)
If (A, B)
C, then NOT A
B, and NOT B
A
• All attributes that are not part of the CK should be functionally dependent on the
key:
CK
all attributes of R
• Hold for all time.
PK is the candidate attribute(s) with the minimal set of functional dependency.
Normalization
14
Closure
Closure (inferred from) X+: The set of functional dependencies that are implied by
a given set of functional dependencies X.
If t & u agree here
CB
X
A
A
Then they must agree here
B
C
B
t
X+
A
C
u
So surely they will agree here
Normalization
15
Closure Example
S
S+
Normalization
BranchNo
BranchNo
BranchNo
(Address, city)
Address
city
Implied by
16
Inference Rules for Functional
Dependencies
Armstrong’s aximos (inference rules): The set of inference rules specifies how
functional dependencies can be inferred from given one.
Inference rules:
Reflexivity
If B  A, then A
Augmentation
If A
B, then A,C
Transitivity
If A
B and B
Self-Determination
A
Decomposition
If A
B,C, then A
Union
If A
B and A
Normalization
B
B,C
C, then A
C
A
B and A
C, then A
C
B,C
17
Minimal Sets of Functional
Dependencies
• Complete set of functional dependencies for a relation can be very large.
• We need to reduce the set to a manageable size, by applying the inference rules
repeatedly until they stop producing new FDs.
Assume S1 & S2 are set of dependencies:
S1  S2, then
(S2 is a cover for S1)
OR
(S1 is covered by S2)
if S2 is a cover for S1
S1 equivalent to S2
& S1 is a cover for S2
Normalization
18
Minimal Sets of Functional
Dependencies
A set of functional dependencies X is minimal if it satisfies the following:
1. Every dependency in X has a single attribute for its right-hand side.
2. Can’t replace any dependency A
B in X with C
a set of dependencies equivalent to X.
B , where C  A, & still have
3. Can’t remove any dependency from X and still have a set of dependencies that is
equivalent to X.
Normalization
19
Minimal Sets of Functional
Dependencies
1. For each X
{A1, A2, .. An}, create X
2. A, B
C is equivalent to B
3. X - {A
B} equivalent to X, then remove A
Normalization
A1, X
A2, …., X
C, then replace A, B
C with B
An.
C.
B.
20
Question
Find the minimal set of the following FDs:
Fd1: B  A
Fd2: D  A
Fd3: A,B  D
Normalization
21
Question
Find FDs of the relation shown below that lists dentist/patient appointment data;
known that:
• A patient is given an appointment at a specific time and date with a dentist located at a
particular surgery.
• On each day of patient appointments, a dentist is allocated to a specific surgery for that
day.
Dentist-patient (staffNo, dentistName, aDate, aTime, patNo, patName, surgeryNo)
Normalization
22
The Purpose of Normalization
Normalization is a bottom-up approach to database design that begins by examining
the relationships between attributes. It is performed as a series of tests on a relation to
determine whether it satisfies or violates the requirements of a given normal form.
Purpose:
Guarantees no redundancy due to FDs
Guarantees no update anomalies
Normal Forms:
First Normal Form (1NF)
Second Normal Form (2NF)
Third Normal Form (3NF)
Boyce-Codd Normal Form (BCNF)
Fourth Normal Form (4NF)
Fifth Normal Form (5NF)
23
The Process of Normalization
Normalization is a technique for analyzing relations based on their CK & FD.
1NF
2NF
3NF
BCNF
4NF
5NF
Higher Normal
Form
Normalization
24
First Normal Form (1NF)
Unnormalized form (UNF): A relation that contains one or more repeating groups.
First normal form (1NF): A relation in which the intersection of each row and
column contains one & only one value.
CLIENT_PROPERTY
ClientNo
Name
PropertyNo
CR76
John Key
PG4
PG16
CR56
Aline Stewart
PG4
PG36
PG16
Unnormalized relation
Normalization
25
UNF
1NF
Approach 1
Expand the key so that there will be a separate tuple in the original relation for
each repeated attribute(s). Primary key becomes the combination of primary key
and redundant value.
CLIENT_PROPERTY
ClientNo
Name
PropertyNo
CR76
CR76
John Key
John Key
PG4
PG16
CR56
Aline Stewart
CR56
CR56
Aline Stewart
Aline Stewart
PG4
PG36
PG16
1NF relation
Disadvantage: introduce redundancy in the relation.
Normalization
26
UNF
1NF
Approach 2
If the maximum number of values is known for the attribute, replace repeated
attribute (PropertyNo) with a number of atomic attributes (PropertyNo1,
PropertyNo2, PropertyNo3).
CLIENT_PROPERTY
ClientNo
CR76
CR56
Name
John Key
Aline Stewart
PropertyNo1 PropertyNo2 PropertyNo3
PG4
PG16
NULL
PG4
PG36
PG16
1NF relation
Disadvantage: introduce NULL values in the relation.
Normalization
27
UNF
1NF
Approach 3
Remove the attribute that violates the 1NF and place it in a separate relation along
with a copy of the primary key.
PROPERTY
CLIENT
ClientNo
Name
CR76
John Key
CR56
Aline Stewart
1NF relation
ClientNo PropertyNo
CR76
CR76
PG4
PG16
CR56
PG4
PG36
CR56
CR56
PG16
1NF relation
Normalization
28
Full Functional Dependency
If A and B are attributes of a relation.
B is fully functionally dependent on A if B is functionally dependent on A, but not
on any proper subset of A.
B is partial functional dependent on A if some attributes can be removed from A
& the dependency still holds.
StaffNo, Sname
BranchNo
Partial dependency
ClientNo, PropertyNo
RentDate
Full dependency
Normalization
29
Second Normal Form (2NF)
Second normal form (2NF): A 1NF relation in which every attribute is fully
nontrivial functionally dependent on the PK. (non-prime attributes fully dependent
on PK.)
Applies to relations with composite primary keys & partial dependencies.
CLIENT_RENTAL
ClientNo PropertyNo cName pAddress RentStart RentFinish Rent OwnerNo OName
1NF relation
Normalization
30
1NF
2NF
1. Start with 1NF relation.
2. Find the FDs of a relation.
3. Test the FDs whose determinant attribute is part of the PK.
Normalization
31
1NF
2NF
CLIENT_RENTAL
ClientNo PropertyNo cName pAddress RentStart RentFinish Rent OwnerNo OName
(ClientNo, PropertyNo)
PK
ClientNo, PropertyNo
RentStart, RentFinish
Full Dependency
ClientNo
CName
Partial Dependency
PropertyNo
Paddress, Rent, OwnerNo, Oname
Partial Dependency
OwnerNo
OName
ClientNo, RentStart
PropertyNo, pAddress, RentFinish, Rent, OwnerNo, Oname
PropertyNo, RentStart
ClientNo, cName, RentFinish
Normalization
32
1NF
2NF
4. Remove partial dependencies by placing the functionally dependent attributes in
a new relation along with a copy of their determinants.
CLIENT
ClientNo
RENTAL
cName
ClientNo PropertyNo RentStart RentFinish
2NF relation
2NF relation
PROPERTY_OWNER
PropertyNo pAddress
Rent OwnerNo OName
2NF relation
Normalization
33
Transitive Dependency
A, B, C are attributes of a relation, such that:
If A
B and B
C, then C is transitively dependent on A via B.
Provided A is NOT functionally dependent on B or C (nontrivial FD).
Example:
StaffNo
BranchNo ,
StaffNo
Address
Normalization
BranchNo
Address
34
Third Normal Form (3NF)
Third normal form (3NF): A 2NF relation in which NO non-prime attribute is
transitively dependent on the PK.
CLIENT
ClientNo
RENTAL
cName
ClientNo PropertyNo RentStart RentFinish
3NF relation
3NF relation
PROPERTY_OWNER
PropertyNo pAddress
Rent OwnerNo OName
2NF relation
Normalization
35
2NF
3NF
1. Identify the PK in the 2NF relation.
2. Identify FDs in this relation.
3. If transitive dependencies exist, place transitively dependent attributes in a new
relation along with a copy of their determinants.
OWNER
OwnerNo OName
3NF relation
Normalization
PROPERTY_FOR_RENT
PropertyNo
pAddress rent OwnerNo
3NF relation
36
Review of Decompositions
CLIENT_RENTAL
1NF
2NF
CLIENT
RENTAL
3NF
CLIENT
RENTAL
Normalization
PROPERTY_OWNER
OWNER
PROPERTY_FOR_RENT
37
General Definition of 2NF & 3NF
Second normal form (2NF): A 1NF relation in which every non-primary-key attribute
is fully functionally dependent on the CK.
Third normal form (3NF): A 2NF relation in which NO non-primary-key attribute in a
nontrivial FD is transitively dependent on the CK.
Normalization
38
Boyce-Codd Normal Form
(BCNF)
Boyce-Codd normal form (BCNF): A 3NF relation in which every determinant in a
nontrivial FD is a CK.
Difference between 3NF & BCNF:
A
B
• 3NF allows A NOT CK.
• BCNF insists on A is a CK.
Potential to violate BCNF may occur in a relation that:
•
Contain two (or more) composite CKs.
•
CKs overlap. (at least one attribute in common).
Normalization
39
Boyce-Codd Normal Form
(BCNF)
A
B
C
D
3NF but not BCNF
Normalization
40
3NF
BCNF
1. Examine FDs for a relation.
2. If determinant is NOT a CK, decompose relation into 2 relations.
CLIENT_INTERVIEW
ClientNo Int_Date
ClientNo, Int_Date
RoomNo, Int_Date, Int_Time
Normalization
StaffNo
RoomNo
Int_Time, StaffNo, RoomNo
StaffNo, Int_Date, Int_Time
StaffNo, Int_Date
Int_Time
ClientNo
StaffNo, ClientNo
RoomNo
41
3NF
BCNF
3. Remove non-CK dependencies by placing the functionally dependent attributes
in a new relation along with a copy of their determinants.
STAFF_ROOM
StaffNo
Int_Date
INTERVIEW
RoomNo
BCNF relation
Normalization
ClientNo Int_date
Int_time
StaffNo
BCNF relation
42
Review Example
STAFF_PROPERTY_INSPECTION
pAddress
Pno
PG4
iDate
iTime
comments
StaffNo
sName
CarReg
Lawrence St, 18-Oct-00 10:00
Replace crockery
SG37
Ann
M23JGR
Glasgow
22-Apr-01 09:00
Good order
SG14
David
M53HDR
1-Oct-01
Damp rot
SG14
David
N72HFR
Replace carpet
SG14
David
M53HDR
Good condition
SG37
Ann
N72HFR
12:00
PG16 5 Novar Dr., 22-Apr-01 13:00
Glasgow
24-Oct-01 14:00
Unnormalized relation
Normalization
43
UNF
1NF
STAFF_PROPERTY_INSPECTION
StaffNo
sName
CarReg
PG4 Lawrence St, Glasgow 18-Oct-00 10:00 Replace crockery
SG37
Ann
M23JGR
PG4 Lawrence St,Glasgow
22-Apr-01 09:00 Good order
SG14
David
M53HDR
PG4 5 Novar Dr., Glasgow
1-Oct-01
SG14
David
N72HFR
PG16 5 Novar Dr., Glasgow
22-Apr-01 13:00 Replace carpet
SG14
David
M53HDR
PG16 5 Novar Dr., Glasgow
24-Oct-01 14:00 Good condition
SG37
Ann
N72HFR
Pno
pAddress
iDate
iTime
comments
12:00 Damp rot
1NF
Normalization
44
1NF
2NF
STAFF_PROPERTY_INSPECTION
Pno
iDate
pAddress
iTime
comments
StaffNo
sName
CarReg
Pno, iDate
iTime, comments, StaffNo, sName, carReg
Pno
pAddress
Partial Dependency
StaffNo
Sname
iDate, StaffNo
CarReg
iDate, iTime, CarReg
Pno, pAddress, comments, StaffNo, Sname
iDate, iTime, StaffNo
Pno, pAddress, Comments
45
1NF
2NF
PROPERTY
Pno
pAddress
Pno
pAddress
2NF
PROPERTY_INSPECTION
Pno
iDate
iTime
comments
StaffNo
sName
CarReg
2NF
Pno, iDate
iTime, comments, StaffNo, Sname, CarReg
StaffNo
Sname
Transitive Dependency
iDate, StaffNo
CarReg
iDate, iTime, CarReg
Pno, comments, StaffNo, Sname
iDate, iTime, StaffNo
Pno, comments
Normalization
46
2NF
3NF
PROPERTY
pAddress
Pno
3NF
STAFF
sName
StaffNo
3NF
PROPERTY_INSPECTION
Pno
iDate
iTime
comments
StaffNo
CarReg
3NF
PROPERTY(Pno, pAddres)
STAFF(StaffNo, sName)
PROPERTY_INSPECT(Pno, iDate, iTime, comments, staffNo, CarReg)
Normalization
47
3NF
BCNF
PROPERTY_INSPECTION
Pno
iDate
iTime
comments
StaffNo
CarReg
3NF
Pno, iDate
iTime, comments, staffNo, CarReg
StaffNo, iDate
carReg
CarReg, iDate, iTime
pno, comments, staffNo
StaffNo, iDate, iTime
pno, comments
PROPERTY(Pno, pAddres)
STAFF(StaffNo, sName)
STAFF_CAR(StaffNo, iDate, CarReg)
PROPERTY_INSPECT(pno, iDate, iTime, comments, StaffNo)
Normalization
48
Multi-Valued Dependency
(MVD)
Represents a dependency between attributes A, B, C in a relation, such that for
each value of A, there is a set of values for B and a set of values for C. However,
the set of values for B & C are independent of each others.
Denoted by:
Example:
A
B,
A
C
BRANCH_STAFF_OWNER
BranchNo SName
B003
B003
B003
B003
BranchNo
Normalization
Ann
David
Ann
David
SName, BranchNo
OName
Carol
Carol
Tina
Tina
OName
49
Trivial MVD
A
B trivial MVD if:
BA
OR
AB=R
Normalization
50
Fourth Normal Form (4NF)
Fourth normal form (4NF): A BCNF relation with NO nontrivial MVD.
BRANCH_STAFF_OWNER
BranchNo
B003
B003
B003
B003
SName
OName
Ann
David
Ann
David
Carol
Carol
Tina
Tina
BCNF relation
Normalization
51
BCNF
4NF
1. Start with a BCNF relation.
2. Examine FDs for a relation.
3. If nontrivial MVD exists, remove the MVD by placing the attributes in a new
relation along with a copy of their determinant.
BRANCH_STAFF
BRANCH_OWNER
BranchNo
BranchNo
SName
Ann
David
B003
B003
4NF
Normalization
OName
Carol
Tina
B003
B003
4NF
52
Lossless-Join Dependency
A property of decompostion, which ensures that no spurious tuples are generated
when relations are reunited through a natural join operation.
Objectives:
Preserve all the data in the original relation
Does not result in the creation of additional spurious tuples
Normalization
53
Join Dependency
A, B, .., Z attributes in relation R satisfies join dependency if
Every legal value of R is equal to the join of its projections on A, B, .., Z
Normalization
54
Fifth Normal Form (5NF)
Fifth normal form (5NF): A 4NF relation with NO join dependency.
PROPERTY_ITEM_SUPPLIER
PropertyNo I_Description SupplierNo
PG4
PG4
PG16
Normalization
Bed
Chair
Bed
S1
S2
S2
Illegal State
55
4NF
PROPERTY_ITEM
PropertyNo I_Description
PG4
PG4
PG16
Bed
Chair
Bed
5NF
ITEM_SUPPLIER
I_Description
Bed
Chair
Bed
PROPERTY_ITEM
SupplierNo
PropertyNo
S1
S2
S2
PG4
PG4
PG16
SupplierNo
S1
S2
S2
PROPERTY_ITEM_SUPPLIER
PropertyNo I_Description SupplierNo
PG4
PG4
PG16
PG4
Bed
Chair
Bed
Bed
S1
S2
S2
S2
Legal State
56
Question
Given the following Dentist-patient database schema:
Dentist-patient (staffNo, dentistName, aDate, aTime, patNo, patName, surgeryNo)
Normalize the above relation, showing appropriate dependency diagrams to justify
decomposition.
Normalization
57