Transcript Slide 1

1
NORMALIZATION
Part 1: The Concept
Objectives
2



How to undertake the normalization process.
How normalization uses functional dependencies to group
attributes into relations that are in a known normal form.
How to identify the most commonly used normal forms:





First Normal Form (1NF)
Second Normal Form (2NF)
Third Normal Form (3NF)
The problems associated with relations that break the rules of
1NF, 2NF, or 3NF.
How to represent attributes shown on a form as 3NF relations
using normalization.
Introduction
3



2 strategies in creating data model:
We have adopted a top-down approach to database design
that begins by identifying the entities and relationship.
Normalization is a bottom-up approach to database design
that begins by examining the relationships between
attributes.
Database Design Strategies
4

Two classical approaches/strategies to dB design:
Top-down
Bottom-up
Conceptual
Model
{ Normalization }
Entity
Attribute
Entity
Attribute
Attribute
Attribute
Table & Relational Schema
5
staff
branch
staffNO
sName
position
S21
Johan
Manager
S37
Ana
S14
salary
branchNO
branchNO
bAddress
3000
B005
B005
123, Kepong
Assistant
1200
B003
B007
456, Nilai
Daud
Supervisor
1800
B003
B003
789, PTP
S9
Mary
Assistant
900
B007
S5
Siti
Manager
2400
B003
S41
Jani
Assistant
900
B005
branch (branchNO, bAddress)
staff (staffNO, sName, position, salary, *branchNO)
Normalization
6


When we design a database, the main objective is to create
an accurate representation of data, relationship between the
data, and constraints on the data that is relevant.
To achieve this objective, we have to identify suitable set of
relations (table) by creating good table structure/process of
assigning attributes to entities. The process is known as
Normalization.
 Process for evaluating and correcting table structures to
minimize/control data redundancies {reduces data
anomalies}.
 Works through a series of stages called normal forms.
Normalization
7




The most commonly used normal forms:
 First Normal Form
(1NF)
 Second Normal Form
(2NF)
 Third Normal Form
(3NF)
1NF < 2NF < 3NF
Highest normalization is not always desirable

More JOINS are required

Affect data retrieval performance/high response time
For most business database design purposes, 3NF is as high as
we need to go in normalization process
Normalization
8


Every normal form is based on functional dependencies between
attributes in a relationship.
Each relationship can be normalized into a specific form to
avoid anomalies.
 Anomalies?
 Anomaly = abnormality
 Ideally a field value change, should be made only in a
single place.
 Data redundancy, promotes an abnormal condition by
forcing field value changes in many different locations.
 Insertion anomalies
 Deletion anomalies
 Modification/Update anomalies
Functional Dependencies
9


An important concept associated with normalization is
functional dependency which describes the relationship
between attributes.
In this section, you will learn about functional dependency and
then focus on the particular characteristics of functional
dependency that are useful for normalization.
Functional Dependencies
10

Functional dependency can be divided into two
types:
 Full
functional dependency/Partial dependency (PD)
 Will
be used to transform 1NF  2NF
 Transitive
 Will
dependency (TD)
be used to transform 2NF  3NF
Functional Dependencies
11

Multivalued Attributes (or repeating groups): non-key attributes
or groups of non-key attributes the values of which are not
uniquely identified by (directly or indirectly) (not functionally
dependent on) the value of the Primary Key (or its part).
1st row
2nd row
Relational Schema
STUDENT(Stud_ID, Name, (Course_ID, Units))
Functional Dependencies
12

Partial Dependency – when an non-key attribute is determined
by a part, but not the whole, of a COMPOSITE primary key
(The Primary Key must be a Composite Key).
Cust_ID → Name
Functional Dependencies
13

Transitive Dependency – when a non-key attribute determines
another non-key attribute.
Dept_ID → Dept_Name
Functional Dependencies
14

Consider a relation with attributes A and B, where attribute B is
functionally depends on attribute A. Let say an A is a PK of R.
R(A,B)
AB
A

B is functionally
depends on A
B
To describe the relationship between attributes A and B is to say
that “A functionally determines B”.
Functional Dependencies
15

When a functional dependency exist, the attribute or group of
attributes on the left-handed side of the arrow is called
determinant.
Determinant:
Refers to the attributes, or a group of attributes, on the
left handed side of the arrow of a functional dependency.
A
A functionally
determines B
B
Functional Dependencies
16
staff
staffNO sName
position
salary branchNo
S21
Johan
Manager
3000 B005
S37
Ana
Assistant
1200 B003
S14
Daud
Supervisor
1800 B003
S9
Mary
Assistant
900 B007
S5
Siti
Manager
2400 B003
S41
Jani
Assistant
900 B005
branch
Determinant
branchNO
bAddress
B005
123, Kepong
B007
456, Nilai
B003
789, PTP
Functional Dependencies
17



Consider the attributes staffNO and position of the staff
relation.
For a specific staffNO (S21), we can determine the position of
that member of staff as Manager.
staffNO functionally determines position.
position is functionally
staffNO
Staff number (S21)
depends on staffNO
position
Position (manager)
Functional Dependencies
18


However the next figure illustrate that the opposite is not true,
as position does not functionally determines staffNO.
A member of staff holds one position; however, they maybe
several members of staff with the same position.
staffNO does not functionally
position
depends on position
staffNO
staff number (S21)
Position(manager)
staff number (S5)
Functional Dependencies
Partial Dependencies:
 Full functional dependency indicates that if A and B are attributes
of a relation, B is fully functionally dependent on A, if B is
functionally dependent on A, but not on any proper subset of A.
staff(staffNO,sName,position,salary,branchNO)
staffNO, staffName  branchNO
True!!! each value of (staffNO, sName) is associated with a
single value of branchNO.
however, branchNO is also functionally dependent on
staffNO.
19
Functional Dependencies
Transitive Dependencies:
staff(staffNO,sName,position,salary,*branchNO)
branch(branchNO,bAddress)
staffNO  sName,position,salary,branchNO,bAddress
branchNO  bAddress
True for transitive dependency!!!  branchNO → bAddress
exists on staffNO via branchNO
20
Normalization Process
21


Formal technique for analyzing relations based on their Primary
Key (or candidate keys) and functional dependencies.
The technique executed as a series of steps (stage). Each step
corresponds to a specific normal form, that have specific
characteristic.
Data Redundancies
0NF/UNF
1NF
2NF
3NF

As normalization proceeds, the relations become progressively
more restricted (stronger) in format and also less vulnerable to
anomalies.
Normalization Process
22

Relationship between Normalize Form
Denormalization
Figure 1: Diagrammatic illustration of the
relationship between the normal forms
Normalization
Normalization Process
23
Data Sources
Users
Users’
requirements
specification
Forms/reports that
are used or generated
by the enterprise
Sources describing the
enterprise such as data
dictionary and corporate
data model
Transfer attributes into table format
Unnormalized Form (UNF)
Remove repeating group
First Normal Form (1NF)
Remove partial dependencies
Second Normal Form (2NF)
Remove transitive dependencies
Third Normal Form (3NF)
Normalization Process
24
Normalization Process
UNF
(1 Table)
1)Repeat Group
2)PK is not defined
1NF
Relation/Table Format
-Have repeating group
-PK not defined
1)Remove Repeat Group (1 or 2 Tables)
2)Defined PK  composite PK consist of attributes -No repeating group
-PK defined
-Test partial dependency
Test for partial dependency
If (exist)
(a b …. TD) 1
(a  ……. TD) 2
(b  ….… TD) 3
(a, b  x, y)
(a c, d)
(b  z)
(c  d)
2NF
(2 or 3 Tables)
(more then 1 table)
Test for transitive dependency
If (exist)
3NF
(3 or 4 Tables)
-No repeating group
-PK defined
-No partial dependency
-Test transitive dependency
-No repeating group
-PK defined
-No partial dependency
-No transitive dependency
Normalization Process
25





Remember this!!!
Unnormalized (UNF): There are multivalued attributes or
repeating groups, not CK (or PKs)
1NF: NO multivalued attributes or repeating groups, has CK
(or PKs)
2NF: 1NF with NO Partial Dependencies (PD)
3NF: 2NF with NO Transitive Dependencies (TD)
Unnormalized Form (UNF)
26
Unnormalized Form
UNF/0NF

= A table that contains one or more repeating
groups
To create an unnormalized table
 Transform the data from the information source (e.g. form)
into table format with columns and rows.
First Normal Form (1NF)
27
First Normal Form
1NF

= A relation in which the intersection of each row
and column contains one and only one value.
A relation is in 1NF if every attribute for every tuple have a
value and domain for each attribute can not be simplified
anymore.
Transforming UNF to 1NF
28
STUDENT
Stud_ID
1 101
2 125
Name
Course_ID
Units
Lennon
MSI 250
3.00
MSI 415
3.00
Johnson
MSI 331
3.00
Name
Course_ID
Units
Lennon
MSI 250
3.00
Lennon
MSI 415
3.00
Johnson
MSI 331
3.00
STUDENT
Stud_ID
1 101
2 101
3 125
UNF
To-do list:
1. Remove repeating
groups
2. Identify
Composite Key
1NF
Example 1: Determine UNF
29
Example 1: Determine UNF
30
Example 1: Determine UNF
31
Example 1: Determine UNF
32
Example 1: Determine UNF
33
Example 2: Determine UNF
34
Example 2: Determine UNF
35
Example 2: Determine UNF
36
Example 2: Determine UNF
37
Example 3: Determine UNF
38
Example 3: Determine UNF
39
UNF to 1NF
40





Nominate an attribute or group of attributes to act as the key
for the unnormalized table.
Identify the repeating group(s) in the unnormalized table which
repeats for the key attribute(s).
Step 1: Eliminate the Repeating Groups
 Eliminate nulls: each repeating group attribute contains an
appropriate data value
Step 2: Identify the Primary Key
 Must uniquely identify attribute value
 New key must be composed
Step 3: Identify All Dependencies
 Dependencies are depicted with a diagram
UNF to 1NF
41
Relational Schema
STUDENT(Stud_ID, Name, (Course_ID, Units))
UNF to 1NF
42
Key Attribute: Stud_ID, Course_ID
Repeating Group: (Course_ID, Units)
UNF to 1NF
43


Ensure a single value at the intersection of each row and column
Enter appropriate student data in each row
Relational Schema
STUDENT(Stud_ID, Name, Course_ID, Units)
UNF to 1NF
44
Key Attribute: Stud_ID, Course_ID
Repeating Group: (Course_ID, Units)
UNF to 1NF
45
UNF to 1NF
46

Dependency diagram:
 Depicts
all dependencies found within given table
structure
 Helpful in getting bird’s-eye view of all relationships
among table’s attributes
 Makes it less likely that you will overlook an important
dependency
UNF to 1NF
47

Remove the repeating group by
 Entering appropriate data into the empty columns of rows
containing the repeating data
 Fill the blanks by duplicating the non repeating data,
where required.
 This approach is commonly referred to as
”flattening table”.
 This approach will produce redundancy in a relationship,
but it can be eliminated in higher normalization process.
UNF to 1NF
48



Example: DreamHome Case Study
A collection of DreamHome leases (rent) form is shown in Figure
2. The lease on top is for a client called Rannia who is leasing
a property in Skudai, Johor, which is owned by Dollah. For this
worked example, we assume that a client rents a given
property only once and cannot rent more than one property at
any one time.
Sample data is taken from two leases for two different clients
called Rannia and Ahmad and is transformed into table
format with rows and columns, as shown in Figure 3. This is an
example of unnormalized table.
UNF to 1NF
49
Figure 2 : Collection of Dream Home leases (rent) form
Page 2
DREAMHOUSE LEASE
Client Rental Information
Date : 28/02/2007
Client Name : Ahmad
House
No
Client Number : CR56
Page 1
Alamat
Rumah
Mula
Sewa
Tamat
Sewa
DREAMHOUSE LEASE
No
ClientHarga
Rental Information
Sewa
Date : 28/02/2007
Nama
Pemilik
Pemilik
Client Name : Rannia
PG04
Subang Jaya,
Selangor. House
House
Client Number : CR76
1/7/93
House
PG16
Address
No
Address1/9/00
Pasir Gudang,
Johor.
31/8/00
750
C040
Karim Fendi
Rent
Rent
Start
Start1/9/01
Rent
Rent
Finish
Finish
850
Monthly
Monthly
Rent
Rent C093
Owner
Owner
No Kasim Selamat
Name
PG04
Skudai,
Johor
1/7/93
31/8/00
750
C040
Dolah
PG16
Ampang,
Selangor
1/9/00
1/9/01
850
C093
Abdullah
UNF to 1NF
50
Figure 3 : ClientRental UNF
clientNo
cName
houseNo
houseAdd
rentStart
rentFinish
rent
ownerNo
oName
CR76
Rannia
PG04
Skudai,
Johor.
1/7/93
31/8/00
750
C040
Dolah
PG16
Ampang,
Selangor
1/9/00
1/9/01
850
C093
Abdullah
PG04
Skudai,
Johor.
20/3/90
19/6/93
750
C040
Dolah
PG 36
Kuantan,
Pahang.
21/6/93
3/1/00
1000
C093
Abdullah
PG 16
Ampang,
Selangor.
25/1/00
30/8/00
850
C093
Abdullah
CR56
Ahmad
Key attribute: clientNo
Repeating group in the unnormalized table as the property rented details, which repeat for each client:
Repeating Group: ( houseNo, houseAdd, rentStart, rentFinish, rent, ownerNo, oName)
UNF to 1NF
51
Key attribute: clientNo
Repeating group in the unnormalized table as the
property rented details, which repeat for each client:
Repeating Group: (houseNo, houseAdd, rentStart, rentFinish, rent, ownerNo, oName)





As a consequences, there are multiple value at the intersection of certain rows and
columns.
For examples, there are two value for houseNo (PG4 and PG16) for the client
Rannia.
To transform an unnormalized table into 1NF, we ensure that there is a single value
at the intersection of each row and column. This is achieved by removing the
repeating group.
With this approach, remove the repeating group (house rented details) by
entering the appropriate client data into each row.
The resulting First Normalize Form (1NF) ClientRental relation is shown in Figure 4.
UNF to 1NF
52
Figure 4 : 1NF ClientRental relation
clientNo
houseNo
cName
houseAdd
rentStart
rentFinish
rent
ownerNo
oName
CR76
PG04
Rannia
Skudai,
Johor.
1/7/93
31/8/00
750
C040
Dolah
CR76
PG16
Rannia
Ampang,
Selangor
1/9/00
1/9/01
850
C093
Abdullah
CR56
PG04
Ahmad
Skudai,
Johor.
20/3/90
19/6/93
750
C040
Dolah
CR56
PG 36
Ahmad
Kuantan,
Pahang.
21/6/93
3/1/00
1000
C093
Abdullah
CR56
PG 16
Ahmad
Ampang,
Selangor.
25/1/00
30/8/00
850
C093
Abdullah
ClientRental (clientNo, houseNo, cName, houseAdd, rentStart, rentFinish, rent, ownerNo, oName)
Primary Key for the ClientRent relation is a composite key that are clientNo and houseNo
ClientRent relation is in 1NF as there is a single value at the intersection of each row and columns.
UNF to 1NF
53


Relationship in Figure 4 contains data that describes client, house for rent
and owner of the house, which is repeated for several times.
As a consequences, the ClientRental relation contains significant data
redundancy.

If implemented, the 1NF relation would be subject to the update anomalies.

To remove some of these, transform 1NF  2NF
54
Borrow(bID, bDATE,mID,mNAME,mADD,ISBN,title,rDATE)
Functional dependencies:
bID, ISBN  bDATE
(PK)
bID
 mID,mNAME,mADD,rDATE (PD)
ISBN
 title
(PD)
mID
 mNAME,mADD
(TD)
55
INSURANCE(cNO,cNAME,cADD,pNO,pTYPE,
pDATE,purchaseDATE,pPRICE,aNO,aNAME,aTEL)
Functional Dependencies:
cNO,pNO,aNO  purcahseDATE
(PK)
cNO  cNAME,cADD
(PD)
pNO  pTYPE,pDATE,pPRICE
(PD)
aNO  aNAME,aTEL
(PD)
56
RECEIPT(rNO,date,tNO,sID,sNAME,iNO,mNO,mDES,qty,price,amt,total,pay,balance)
Functional Dependencies:
rNO,mNO
 qty, iNO, amt
(PK)
rNO
 date,tNO,sID,sNAME,total,pay,balance
(PD)
mNO
 mDES,price
(PD)
sID sNAME
(TD)
Second Normal Form (2NF)
57
Second Normal Form = A relation that is in 1NF and every non-PK
2NF
attribute is fully functionally depends on the PK.

Based on the concept of partial dependency.



Dependencies based on only a part of composite primary key
2NF applies to relations with composite keys, that is, relations
with PK composed of two or more attributes.
A relation with a single-attribute PK is automatically in at
least 2NF.
58
CONTINUE TO NEXT
PRESENTATION
TRANSFORMING UNF TO 2NF