Normalisation

Download Report

Transcript Normalisation

Normalisation
The theory of Relational Database
Design
Introduction
• Normalisation is a theory for designing
relational schema that “make sense” and
work well.
• Well-normalised tables avoid redundancy
and thereby reduce inconsistencies.
• Redundancy is unnecessary duplication.
• In well-normalised DBs semantic
dependencies are maintained by primary
key uniqueness.
Goals of Normalisation
•
•
•
•
Eliminate certain kinds of redundancy
avoid certain update anomalies
good reresentation of real world
simplify enforcement of DB integrity
Update anomalies
• Undesirable side-effects that occur when
performaing insertion, modification or
deletion operations on badly designed
relational DBs.
SSN
987
654
333
321
678
467
Name
J Smith
M Burke
A Dolan
K Doyle
O O’Neill
R McKay
Dept
1
2
1
1
3
2
DeptMgr Dept Name
321
…
467
...
321
321
678
467
Representing
Department info
in the Employee
table causes
problems.
Sample anomalies
• Modification – when the manager of a dept changes we have to
change many values.
– If we are not careful the DB will contain
inconsistencies.
– There is no easy way to get the DB to ensure
that a department has only one manager and
only one name.
Anomalies continued
• Deletion – if O O’Neill leaves we delete his tuple and lose
• the fact that there is a department 3
• the name of dept 3
• who is the manager of dept. 3
• Insertion
– how would we create a new department before
any employees are assigned to it ?
Better design
• Separate entities are represented in separate
tables.
SSN
987
654
333
321
678
467
Name
J Smith
M Burke
A Dolan
K Doyle
O O’Neill
R McKay
Dept
1
2
1
1
3
2
Dept
1
2
3
DeptMgr Dept Name
321
…
467
...
678
Note that mapping from an ER model following the steps given
will give a well-normalised DB.
Boyce-Codd Normal Form
• After a lot of other approaches Boyce and
Codd noticed a simple rule for ensuring
tables are well-normalised. Tables which
obey the rule are in BCNF (Boyce Codd
Normal Form).
• BCNF rule:
Every determinant in a table must be a
candidate key for that table.
Determinants
• A is a determinant of B if each value of A
has precisely one (possibly null) associated
value of B.
Said another way • A is a determinant of B if and only if
whenever two tuples agree on their A value
they agree on their B value.
A
B
Determinants
• Note that determinancy depends on
semantics of data
– cannot be decided from individual table
occurences.
• Alternative terminology
– if A (functionally) determines B then
– B is (functionally) dependent on A
Example determinants
•
•
•
•
SSN determines employee name
SSN determines employee department
Dept. No. determines Dept. Name
Dept. Name determines Dept. No.
– assuming Dept. names are also unique
• Emp. Name does not determine Emp. Dept
– two John Smiths could be in difft. Depts.
• Emp. Name does not determine SSN.
Determinancy Diagram
Name
SSN
Department
Dept. Name
Dept. Mgr
In general key attributes of an entity determine all the
single-valued attributes of the entity.
Composite Determinants
• (SSN, Project#) together
determine the hours that the
employee works on the
project.
Name
SSN
hours
Project#
PName
• Suppose packsize of a part
depends on the supplier.
S#
packsize
P#
PName
Superfluous Attrbiutes
• Superfluous attributes
– If SSN determines name, so does (SSN, Dept)
and (SSN, Dept, salary), etc.
– Always remove superfluous attributes from
determinants.
Transitive Dependencies
• SSN actually determines
DeptMgr
• but only because
DeptNo
SSN
– SSN determines DeptNo and
– DeptNo determines DeptMgr.
• Be careful to remove
transitive dependencies.
– They mess up normalisation.
Dept. Mgr
Candidate keys
• candidate key = any attribute or set of
attributes which will be unique for a table
(set of attributes).
– As well as the primary key there may be other
candidate keys.
– E.g. DNUMBER and DNAME are both
candidate keys for the Department table.
• Key = row identifier
• Candidate key = candidate identifier
Finding candidate keys
• Every key is by definition a determinant of
all other attributes in a relation.
– So in a diagram, any attribute (or composite)
from which all other attributes are reachable is
a candidate key.
Name
(SSN, Project#) is a
(composite) candidate
key for a table
containing these five
attributes.
SSN
hours
Project#
PName
What are the candidate keys ?
G
B
H
J
student
teacher
E
subject
D
F
P
L
F
G
S
M
W
T
N
D
R
Q
V
K
E
U
Z
H
X
Y
A
B
C
Problems occur when ...
• Redundancy and anomalies occur when
there are determinants which are not
candidate keys.
SSN
Name
• SSN is the only key for a table
containing these attributes
DeptNo
Dept. Name
– all attributes are reachable from SSN.
• SSN, DeptNo and DeptName are
determinants
– they have arrows coming out of them.
Dept. Mgr
BCNF rule
• In well-normalised relations (Boyce-Codd
normal form)
every determinant is a candidate key.
SSN
Name
DeptNo
DeptNo
Dept. Name
Dept. Mgr
The employee/dept table decomposed to BCNF.
Note that both DeptNo and DeptName are candidate keys of
the second table.
Transformation to BCNF
• Create new tables such that each
non-key determinant is a candidate
key in a new table.
• The new table contains the
attributes which are directly
determined by the new candidate
key.
V
W
Z
X
V
B
C
Z
A
W
X
Y
A
B
C
V
W
A
V
Y
BCNF tables :
(V, X)
(A, B, C)
(V, W, Z, A)
(V, W, Y)
Other Normal Forms
• First NF - no multi-valued attributes
– all relational DBs are 1NF
• 2NF - every non-key attribute is fully
dependent on the primary key
G
H
J
• 3NF - eliminate functional
is in 2NF
dependencies between non-key Table
but not 3NF
attributes
– all dependencies can then be enforced
by uniqueness of keys.
BCNF vs. 3NF
• BCNF goes further than 3NF, some say too far.
• A 3NF table that has no overlapping composite keys is in
BCNF.
A teacher teaches only one subject.
For a given subject a given student has only one teacher.
student
teacher
student
teacher
subject
3NF, not BCNF
keys: (student, subject)
(student, teacher)
teacher is a determinant
teacher
subject
BCNF
but tables are not independent
4NF : Multi-valued dependencies
• If a course can have multiple teachers and
multiple texts, blind mapping to 1NF will
give
Subject
Physics
Physics
Physics
Physics
Maths
Maths
Maths
Teacher
Green
Brown
Green
Brown
Green
Green
Green
Text
Basic Mechanis
Basic Mechanics
Principles of Optics
Principles of Optics
Basic Mechanics
Vector Analysis
Trigonometry
which clearly
has
redundancy.
Fully-normalised
• BCNF relations are well-normalised
• Fully-normalised relations are those with no
multi-valued dependencies (4NF) and no
join dependencies (5NF).