lu10_data_normalization

Download Report

Transcript lu10_data_normalization

Data Normalization
Normal is not something to aspire to, it's
something to get away from.
~ Jodie Foster ~
Activity: Sample User Report
FudgeCo Livery Driver Permissions Report
Driver Driver
Driver Driver
ID # Name
Chg/Hr Territories
101 Bill Melator $100.00 West, North, Central
102 Willie Work
$75.00 South, East
103 Sal Ladd
$75.00 Central, East, North
104 Carol Ling
$100.00 Central, West
Vehicle
Lic Plate
PPF673
PXK3D7T
445GH2
59DLLK
PXK3D7T
663ETMP
667GM8
445GH2
59DLLK
667GM8
PPF673
59DLLK
Vehicle
Make
Cadillac
Chevy
Lincon
Lincon
Chevy
Chevy
Audi
Lincon
Lincon
Audi
Cadillac
Lincon
Vehicle
Model
Escalde
Tahoe
Towncar
Continental
Tahoe
Suburban
A8
Towncar
Continental
A8
Escalde
Continental
Vehicle
Size
M
L
S
S
L
L
M
S
S
M
M
S
Vehicle Permission
Chg/Hr Exp. Date
$ 100.00 12/31/2004
$ 120.00 12/31/2004
$ 80.00
1/30/2005
$ 80.00
4/30/2005
$ 120.00
1/15/2005
$ 120.00
4/1/2005
$ 100.00
9/1/2005
$ 80.00
9/1/2005
$ 80.00 12/31/2004
$ 100.00
7/31/2004
$ 100.00
7/31/2004
$ 80.00
10/1/2004
Can you build the underlying data model from this?
How many tables? What are the relationships?
Understanding Functional Dependence
For attributes A and B, B is functionally
dependent on A means each value in
column A determines one and only one
value in column B.
SSN
Written: A  B
123-45-6789
A determines B
123-46-9987
B is the determinant 123-02-0902
123-02-0993
Ex: SSN  Name
(Name is functionally
dependent on SSN)
Name
George Foreman
Georgeina Forman
George Foreman
George Foreman
Normalization Lingo
Prime attribute = Any attribute which is a
primary key, or in the case of a
composite key is part of a PK
Non-Prime Attribute = Any attribute
which is not part of the PK.
Key Attribute = Prime Attribute
Non-Key Attribute = Non-Prime Attribute
Normalization and FD
 Technically, normalization is just the
analysis of Functional Dependencies of all
columns with respect to the primary key.
 There are three “levels” of analysis:
1. Functional Dependence – any non-prime attributes
which as FD on the PK.
2. Partial Functional Dependence – any non-key
attributes which are FD on part of the PK.
3. Transitive Functional Dependence – any non-key
attributes which are FD on some other non-key
attribute(s).
Activity: IYCDTYCN!
FudgeCo Livery Driver Permissions Report
Driver Driver
Driver Driver
ID # Name
Chg/Hr Territories
101 Bill Melator $100.00 West, North, Central
102 Willie Work
$75.00 South, East
103 Sal Ladd
$75.00 Central, East, North
104 Carol Ling
$100.00 Central, West
Vehicle
Lic Plate
PPF673
PXK3D7T
445GH2
59DLLK
PXK3D7T
663ETMP
667GM8
445GH2
59DLLK
667GM8
PPF673
59DLLK
Vehicle
Make
Cadillac
Chevy
Lincon
Lincon
Chevy
Chevy
Audi
Lincon
Lincon
Audi
Cadillac
Lincon
Vehicle
Model
Escalde
Tahoe
Towncar
Continental
Tahoe
Suburban
A8
Towncar
Continental
A8
Escalde
Continental
Vehicle
Size
M
L
S
S
L
L
M
S
S
M
M
S
Vehicle Permission
Chg/Hr Exp. Date
$ 100.00 12/31/2004
$ 120.00 12/31/2004
$ 80.00
1/30/2005
$ 80.00
4/30/2005
$ 120.00
1/15/2005
$ 120.00
4/1/2005
$ 100.00
9/1/2005
$ 80.00
9/1/2005
$ 80.00 12/31/2004
$ 100.00
7/31/2004
$ 100.00
7/31/2004
$ 80.00
10/1/2004
Identify the:
Identify the:
 Functional Dependencies (WRT the PK)
 Primary Key?
 Partial Functional Dependencies (WRT part of the PK)
 Prime Attributes?
 Non-Prime Attributes  Transitive Functional Dependencies (WRT some non-prime attribute)
The Dependency Diagram
The Dependency Diagram is a Very
Useful Tool. It depicts the dependencies
which exist among the attributes.
Primary Key
Partial Dependency
Driver Driver Driver Driver
Vehicle Vehicle Vehicle Vehicle Vehicle Permission
ID #
Name Chg/Hr Territories Lic Plate Make
Model Size
Chg/Hr Exp. Date
Partial Dependency
Transitive Dependency
Not FD (Multi-Valued)
Normal Forms
 A Normal Form represents the current “state”
of the data model.
 There are 4 basic normal forms:
Zero Normal Form (0NF)
 Non-key attributes exist which are not FD on PK.
First Normal Form (1NF)
 All non-key attributes FD on entire PK.
Second Normal Form (2NF)
 In 1NF and
 No partial functional dependencies exist.
Third Normal Form (3NF)
 In 2NF and
 No transitive functional dependencies exist.
First Normal Form (1NF)
 Definition:
 All non-key attributes must be FD on the entire PK.
(There must be PKFD for all attributes.)
 Rule:
 Move each non-key FD column into its own new table.
 How to Apply the Rule:
For each non-key FD column:
1. Place non-FD column into a new table
2. Copy the PK (or part of it) from the original table into the
new table. This will be a FK in the new table.
3. Assign a PK to the new table (typically a composite key
of the original Non-FD column and the FK.)
1NF: Example 1/2
My Friends & Their Hobbies
FID
101
102
103
104
105
Email
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
Name
Seymour Ofu
Isabelle Gunnering
Pete Moss
Erin Dutyres
Chuck Itupp
Hobby1
Hobby2 Hobby3 Hobby4
Basketball Golf
Skiing Hiking
Golf
Skiing
Skiing
Basketball
Basketball Golf
 What’s wrong with this data model?
What should be PK be? Why?
Is there an attribute not FD on the PK?
Is it in 1NF already?
What if Erin takes up bass fishing?
I’m planning a ski trip, whom should I contact?
 (How do I know Hobby3, skiing and not Hobby1)?
1NF: Example 2/2
Friends
FID
101
102
103
104
105
Email
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
Name
Seymour Ofu
Isabelle Gunnering
Pete Moss
Erin Dutyres
Chuck Itupp
 What was done:
Hobbies
FID
101
101
101
101
102
102
103
104
105
105
Hobby
Basketball
Golf
Skiing
Hiking
Golf
Skiing
Skiing
Basketball
Basketball
Golf
 Hobbies table created. Contains the originally non FD
column, “hobby”
 The PK (FID) was copied into the hobbies table.
 The PK of the Hobbies table is the combination of FID and
Hobby.
 Questions:
 Is this in 1NF?
 Can you reproduce the previous data model from this one?
 Who likes skiing? Basketball?
Second Normal Form (2NF)
 Definition:
 The data model must be in 1NF AND
 No partial functional dependencies can exist.
 Rule:
 Move each partially FD non-key column into its own new
table.
 How to Apply the Rule:
For each partial dependency:
1. Move all partially FD columns into a new table
2. Copy the determinant into the new table.
3. Make the determinant of the partial dependency:
1. The PK for the new table, FK to the existing table.
2NF: Example 1/2
Software Installed on IST Servers
Serial Num
VNK334
VNK334
VNK334
ASD44P
ASD44P
81HLV3
81HLV3
Server Name
www.ist
www.ist
www.ist
iststudents
iststudents
istwebct
istwebct
Make
HP
HP
HP
Dell
Dell
Dell
Dell
Model
SWID SW Title
Date Installed
Netserver LH4
101 Windows 2000 Server
8/1/2003
Netserver LH4
201 MS SQL Server 2000
8/1/2003
Netserver LH4
302 Segate BackupExec 7
10/1/2003
Poweredge 2550
101 Windows 2000 Server
4/15/2002
Poweredge 2550
202 MS SQL Server 2000
4/17/2003
Poweredge 4400
111 Red Hat Linux 7.3
12/4/2003
Poweredge 4400
301 Webct 4.1
12/12/2003
 What’s wrong with this data model?
What should be PK be? Why?
Do any partial dependencies exist?
 Where?
 What is the determinant for each, if any?
Is it in 1NF already? 2NF?
I made a mistake, 81HLV3 is a Power edge 5500, not a
4400?
2NF: Example 2/2
Servers
Serial Num
VNK334
ASD44P
81HLV3
Server Name
www.ist
iststudents
istwebct
Make
HP
Dell
Dell
Model
Netserver LH4
Poweredge 2550
Poweredge 4400
SWInstallation
Serial Num SWID Date Installed
VNK334
101
8/1/2003
VNK334
201
8/1/2003
VNK334
302
10/1/2003
ASD44P
101
4/15/2002
ASD44P
202
4/17/2003
81HLV3
111
12/4/2003
81HLV3
301
12/12/2003
Software
SWID
101
201
302
111
301
SW Title
Windows 2000 Server
MS SQL Server 2000
Segate BackupExec 7
Red Hat Linux 7.3
Webct 4.1
 What was done:
 Serial Num + SWID is the primary key.
 Servers, Software tables created from partial dependencies,
where Serial Num,SWID are the determinants.
 Serial Num, is the PK for Servers, SWID is the PK for
Software, each are also FK’s for the SWInstallation table
 Questions:
 Is this in 2NF?
 Can you reproduce the previous data model from this one?
Third Normal Form (3NF)
 Definition:
 The data model must be in 2NF AND
 No transitive functional dependencies can exist.
 Rule:
 Move each transitive FD non-key column into its own
new table.
 How to Apply the Rule:
For each transitive dependency:
1. Move all transitive FD columns into a new table.
2. Copy the determinant column into the new table.
3. Make the determinant of the transitive dependency:
 The the PK for the new table.
 The FK for the original table.
3NF: Example 1/2
Fudgebar - Beer Inventory /Distribution Report
Beer ID
101
105
102
106
107
108
Beer Name Keg Qty Distrib ID
Mikealobe
3
501
Dudweiser
0
501
Meisterchau
2
601
Mil's Beast
4
601
Koors
3
701
Koors Lite
1
701
Distributor
Anhoser-Busch
Anhoser-Busch
Millor
Millor
Koors
Koors
Dist. Phone
555-4901
555-4901
555-6691
555-6691
555-4904
555-4904
 What’s wrong with this data model?
What should be PK be? Why?
Do any transitive dependencies exist?
 Where?
 What is the determinant for each, if any?
Is it in 1NF already? 2NF? 3NF?
I made a mistake, Koors phone number is 4905?
 What’s wrong?
3NF: Example 2/2
Distributors
Beers
Beer ID
101
105
102
106
107
108
Beer Name Keg Qty Distrib ID
Mikealobe
3
501
Dudweiser
0
501
Meisterchau
2
601
Mil's Beast
4
601
Koors
3
701
Koors Lite
1
701
Distrib ID
Distributor Dist. Phone
501 Anhoser-Busch
555-4901
601 Millor
555-6691
701 Koors
555-4904
 What was done:
 Beer ID is the PK.
 All transitive dependencies moved into a new table,
Distributors.
 Distrib ID is the determinant. PK of Distributors table, FK in
original Beer table.
 Questions:
 Is this in 3NF?
 Can you reproduce the previous data model from this one?
Higher Normal Forms
Yes, there IS more…
… and it will blow your mind. 
Boyce-Codd Normal Form (BCNF)
Rule: Eliminate key-transitive dependencies
A table in BCNF Means:
The table is in 3NF
It includes no Non-Key attribute which
determines a key attribute, or part of a key
attribute.
BCNF: An Example
Fourth Normal Form (4NF)
RULE: Eliminate multiple sets of multivalued dependencies.
A table in 4NF Means:
The table is in 3NF
It includes no sets of attributes which contain
multi-valued dependencies.
4NF: An Example
Figure 4.15
Set of Tables in 4NF
Figure 4.14
Multivalued Dependencies
How “far” should one Normalize?
 For relational databases:
 1NF is required, at minimum for practical RDBMS
implementations.
 The majority of the time data models are normalized to 3NF.
 Sometimes certain tables are left in 1NF or 2NF, for performance
or practical reasons.
 Higher normal forms BCNF, 4NF are rare.
 In General, the Higher the NF of your DM:
 The more complicated the internal DM
 The more “programming” required to reproduce the external DM.
 But, the lesser the chance for data anomalies!!
 It’s a total trade-off:
 Database complexity vs. data anomalies.
Mike’s “Road To 3NF”
To normalize correctly, follow this process
for each table in the data model:
Designate a
candidate key
Any partial
dependencies?
PKFD for all
attributes?
n
Apply
1NF Rule
y
1NF
y
Apply
2NF Rule
n
Party Hard !
2NF
Any transitive
dependencies?
y
Apply
3NF Rule
n
3NF
Normalization Summary Cheat Sheet
0NF
O
1NF
O
2NF
O
1NF
O
(Resolve non FD)
N
2NF (Resolve Partial FD)
N1
O
N2
3NF (Resolve Transitive FD)
O
N
Data Normalization
Questions?