Module 4: Part 1

download report

Transcript Module 4: Part 1

Database Management
System
Module 4: Schema Refinement
Pitfalls in relational database
design

Relational database design requires that
we find a “good” collection of relation
schemas. A bad design may lead to

Repetition of Information.

Inability to represent certain information.
Repetition of Information
(Redundancy)

Storing the same information
redundantly, that is in more than one
place within a database, can lead to
several problems:


Redundant storage: Some information
stored repeatedly.
Update anomalies: If one copy of such
repeated data is updated inconsistency is
created unless all copies are similarly
updated.
Redundancy Problems


Insertion anomalies: It may not be
possible to store some informstion
unless some other infromation is stored
as well.
Deletion anomalies: it may not be
possible to delete some information
without losing some other information
as well.
Example of Redundancy
Problem


Consider a relation instance of Hourly_Emps
below
The key for Hourly_Emps is ssn. In addition,
suppose the hourly_wages attribute is
determined by the rating attribute.
Example of Redundancy
Problem



That is for given rating value, there is
oly one permissible hourly_wages value.
This IC is an example of a functional
dependency.
This lead to redundancy in the relation
Hourly_Emps as illustrated above.
Consequence

Some information is stored multiple
times. E.g. the rating value 8 with
hourly_wage 10 repeated three times.


Wasting space
Inconsistency; a hourly_wages in the first
tuple could be updated without making a
similar change in the second tuple (Update
Anomaly)
Example 2

Consider the relation schema:
Lending-schema = (branch-name, branch-city,
assets, customer-name, loan-number, amount)

.
Example 2

Redundancy:




Data for branch-name, branch-city, assets are
repeated for each loan that a branch makes
Wastes space
Complicates updating, introducing possibility of
inconsistency of assets value
Null values


Cannot store information about a branch if no
loans exist
Can use null values, but they are difficult to
handle
Problem Solution

Design Goals:

Avoid redundant data



Can be easily identified with Functional
Dependency
Ensure that relationships among attributes
are represented
Facilitate the checking of updates for
violation of database integrity constraints.
Problem Solution

There are various ways of solving
redundancy problem in your
design:
Use Decomposition
 Normalisation with the aid of
Functional Dependencies

Use Decomposition



Problems arising from redundancy can
be addressed by replacing a relation
with collection of smaller relations.
Each of the smaller relations contains a
subset of the attributes of the original
relation.
This process is called Decomposition of
the larger relation to smaller relation.
Decomposition
Example we can deal with the
redundancy in Hourly_Emps by
decomposing it into two relations.
Hourly_Emps2 (ssn, name, lot, rating,

hours,_worked)
Wages (rating, hourly_wages)
Decomposition

The Instances Hourly_Emps2 and
Wages are shown below:
Decomposition

Decompose the relation schema Lendingschema into:
Branch-schema = (branch-name, branchcity,assets)
Loan-info-schema = (customer-name, loannumber, branch-name, amount)
 All attributes of an original schema (R) must
appear in the decomposition (R1, R2):
R = R 1  R2
Lossless-Join Decomposition

Let R be a relation schema and let F be a set
of FDs over R. A decomposition of R into two
schemas with attribute set X and Y is said to
be lossless-join decomposition with
respect to F if for every instance r of R that
satisfies the dependencies in F,
x(r )  Y (r )  r
Example


Below are instances illustrating Lossy
Decomposition
All decomposition used to eliminate
redundancy must be lossless which it is not in
our case above.
Functional Dependence:
Revisited

We use functional dependencies to:

test relations to see if they are legal under
a given set of functional dependencies.

If a relation r is legal under a set F of
functional dependencies, we say that r satisfies
F.

specify constraints on the set of legal
relations

We say that F holds on R if all legal relations on
R satisfy the set of functional dependencies F.
Functional Dependence:
Revisited

Note: A specific instance of a relation
schema may satisfy a functional
dependency even if the functional
dependency does not hold on all legal
instances.

For example, a specific instance of Loanschema may, by chance, satisfy
loan-number  customer-name.
Closure of a Set of Functional
Dependencies

Given a set F set of functional
dependencies, there are certain other
functional dependencies that are
logically implied by F.



E.g. If A  B and B  C, then we can
infer that A  C
The set of all functional dependencies
logically implied by F is the closure of F.
We denote the closure of F by F+.
Closure of a Set of Functional
Dependencies

We can find all of F+ by applying Armstrong’s
Axioms:




if   , then   
(reflexivity)
if   , then     
(augmentation)
if   , and   , then    (transitivity)
These rules are


sound (generate only functional dependencies that
actually hold) and
complete (generate all functional dependencies
that hold).
Goal — Devise a Theory for
the Following


Decide whether a particular relation R is in
“good” form.
In the case that a relation R is not in “good”
form, decompose it into a set of relations {R1,
R2, ..., Rn} such that



each relation is in good form
the decomposition is a lossless-join decomposition
Our theory is based on:

functional dependencies
Normal Forms and
Normalisation



In a given relation schema, we need to
decide whether it is a good design or
whether it requires a decomposition
into smaller relations.
Such a decision must be guided by an
understanding of what problems from
current schema.
Normal forms are used for such
guidance.
Normal Forms

Normal forms based on FDs are:





First normal form (1NF)
Second normal form (2NF)
Third normal form (3NF)
Boyce-Codd normal form (BCNF)
The process of using normal forms to
improve database design by breaking
up relations into smaller relation is
called Normalization
Normal Forms

These forms have increasingly resrictive
requirements:



Every relation in BCNF is also in 3NF
Every relation in 3NF is also in 2NF
Every relation in 2NF is also in 1NF
Unnormalised Data-Set to First
normal form (1NF)


A relation is in first normal forms
(1NF ) if every field contains only
atomic values, that is, not lists or sets.
Consider the data set below
Unnormalised Data-Set to First
normal form (1NF)


In the dataset above suppose we
choose the data item moduleName as
the key of this data-set.
There are some cells in the table such
as studentNo, studentName, assGrade
and assType contains multiple values.

All repeat with respect to moduleName
First Normal Form



The attribute studentNo, studentName,
assGrade and assType are not
functionally dependent on our chosen
primary key moduleName.
The attributes staffNo and staffName
are dependent to key.
Form two tables; one for the
functionally dependent attributes and
other for non-functionally dependent.
1NF Tables
1NF to 2NF


Remove part key dependencies
Involves examining those tables that
have a compound key and for each
non-key data item in the table asking
the question: can the data-item be
uniquely identified by part of the
compound key?
1NF to 2NF


A relation is in second normal form if
and only if it is in first normal form and
every non-key attribute is fully
functionally dependent on the primary
key.
E.g table Assessments have three-part
compound key moduleName, studentNo
and assType
1NF to 2NF




We observed that ModuleName has no
influence on the studentName.
StudentNo alone determine
studentName.
Hence we have to break out the
determinant and dependent data-items
into their own table.
This lead to decomposition of the tables
as follows:
2NF Tables
2NF to 3NF



To move from second normal form to third
normal form we remove inter-data
dependencies.
To do this examine every table and ask of
each pair of non-key data-items; is the value
of data item A dependent on the value of
data-item B, or vice versa?
If the answer is yes, split off the relevant
data-items into a separate table
2NF to 3NF



A relation is in third normal form if and only if
it is in second normal form and every-key
attribute is non-transitively dependent on the
primary key.
The only place where this is relevant is in the
table called Modules.
Here staffNo determines staffName.
staffName hence is transitively dependent on
moduleName
2NF to 3NF


StaffNo is therefore asking to be a
primary key.
Hence, we create a separate table to be
called Lecturers with staffNo as the
primary key as illustrated in the below
tables:
3NF Tables