Normalization

Download Report

Transcript Normalization

Normalization
(Codd, 1972)
Practical Information
For Real World Database Design
Requirements for Relational DB
• Table format
• Supports Boolean Algebra
– Selects, joins, projects + 5 other operations to define
queries
• Supports mathematical, relational, and logical
operators (And-Or-Not)
• Codd’s Twelve Rules (abstracted)
–
–
–
–
–
Null values can be present except in primary key
All data represented in tables
Must be able to update views
Access data with table name, field name, or value
Data and programs should be independent
– Should enforce integrity and validity constraints
Normalization Defined
• Normalization
– Purpose is to avoid potential update problems called
anomalies
– Assigned attributes to entities using 1NF, 2NF, 3NF,
4NF, 5NF
• Denormalization
– Moving back a level to gain better performance in the
real-world database; in practice 3NF is most common,
however, to gain efficiency and speed, minor changes
may need to be made
Why Normalization Is Important
• If not done, updates are less efficient (larger
tables, possibly more than one update per
data item change)
• If not done, indexing is more cumbersome –
impractical to build large databases
• If not done, no simple strategies for creating
views required by users
Design Rules
• Determine Business Rules
– A company manages many different projects
– Each project requires the services of many employees
– Employees may be assigned to work on more than one
project
– Each employee has a job classification
– Many employees have the same job classification
• Translate business rules to validity constraints and
relationships
Design Rules, Continued
• Analyze documents, interview key users,
etc. to develop a field list
• Determine entities to be used (see next slide
for definition)
• Determine relationships between entities
• Assign the attributes to the entities
• Identify primary and foreign keys
• Check for 1NF, 2NF, and 3 NF
Definitions
• Entity – the subject to be modeled by the database
file (table or relation)
• Primary Key – the field value that uniquely
identifies the entity entry (row, tuple, record); all
other attributes are functionally dependent on it;
can’t be null
• Foreign Key – the field (attribute, column) that
relates the table to a pre-existing table
• Functional Dependence - determines or depends
on, e.g. advisor name depends on advisor ID
Definitions, Continued
• Views
– Selected group of records (select)
– Selected group of fields (project)
– Selected group of records and fields from two
or more tables (join)
– A query
– A report
– A set of labels
Definitions, Continued
• Determinant
– Determines value of another attribute; e.g. primary key
• Indexes
– Tables that contain record numbers only arranged in an
order based on some field value
• Entity Integrity
– Every table must have a field to uniquely identity each
record and there must be a field value for every record
• Referential Integrity
– If a record has a value in a foreign key field, it must
match an exiting value in the original table to which it
is linked
1 Normal Form
• Table does not contain repeating groups
– To put it another way, each record has at least one field
that differentiates it from every other record in the file;
e.g. a unique primary key
Examples:
Faculty ID is primary key and the same faculty id is associated
with two or more courses
Solve by creating a course file
Faculty ID is primary key and the same faculty id is associated
with two or more offices
Solve by redesigning database to include offices as a
separate table
2 Normal Form
• Table must be in 1 Normal Form
• No non-key attribute is dependent on only
part of the concatenated key
– Concatenated key (two or more fields taken
together represent primary key)
• In course table, concatenated key is faculty ID and
Catalog No – every field in table must be dependent
on both faculty ID and catalog number
Anomalies Avoided By 2 NF
• Only have one data item to change when update is
made
• Avoids “loose” data when deletes are made
– When a part number is deleted, could lose reference to
invoice
– How do you add a new course when there is no
associated faculty ID?
– A new office with no assigned faculty?
• Avoids inconsistent data
3 Normal Form
• The only determinants are candidate keys
– Candidate keys in student file are social
security number and patron ID (both are
unique)
• To put it another way, there are no transitive
dependencies
– If student file contains Dept ID (foreign key)
and department name, this is a transitive
dependency
4 and 5 Normal Form
• 4 Normal Form
– There are no multivalued dependencies – is like
Boyce Codd
• 5 Normal Form
– Holds only theoretical interest