Normalisation

Download Report

Transcript Normalisation

Normalisation
NoKats Dog Club
• A club formed since 1973.
• Keeps records of members and their dogs on index
cards.
• Cards are managed by the secretary and used for
reference e.g. to inform members of shows.
• Excellent system in the early days as there were only
100 or so members!
• Now membership has grown to well over 5000. This
is proving a real headache for the secretary.
Customer Record for the NoKats Dog Club
Record Number:
Initial:
Surname:
Title:
Sex:
Postcode:
Tel No:
Woof!
Dog Details:
Name
Sex
DofB
Breed
Origen of
Breed
Breed life
expectancy
Problems for the secretary
• Takes a long time to find an individual member’s record as there are now
5000 of them.
– Especially true if you use multiple search criteria.
• Takes even longer to find a dogs record.
– Especially if the member’s details are unknown for some reason.
• Finding all dogs born after a certain date.
• Sometimes records need to be sorted into a different order.
• When records are filled in some details are recorded over and over again.
• Details for letters have to be manually transferred from the cards (i.e. no
mail merging!!!)
• New records sometimes contain spelling mistakes or errors in details
about the dog, such as life expectancy etc.
Why would a computerised system help?
• New records can be monitored for errors using
validation rules.
• All records can be easily sorted by different fields
and multiple fields.
• Easy to search and find records.
• Reports can be generated showing search results.
• Mail merge is possible thus making letter writing a lot
easier.
The flat file db
ID
Initial
Last
Name
Title
Post
code
Tel
Dog
Name
Gen
DofB
Breed
Origin
L/E
Years
1
A
Fish
Mrs
CV35QW
02476111111
Bongo
M
21/08/99
Poodle
China
5
1
A
Fish
Mrs
CV35QW
02476111111
Hiccup
F
08/08/98
Poodle
China
5
1
A
Fish
Mrs
CV35QW
02476111111
Rizla
F
09/09/00
Poodle
China
5
1
A
Fish
Mrs
CV35QW
02476111111
Gov
F
11/01/01
Alsatian
Germany
10
2
C
Here
Mrs
CV27RF
01788222222
3
D
Lapidated
Mr
CV14RR
02476333333
Manic
M
11/01/01
Poodle
China
5
3
D
Lapidated
Mr
CV14RR
02476333333
Blip
F
02/02/01
Spaniel
France
7
4
X
Ray
Ms
CV12YY
02476444444
5
Y
Nott
Mr
CV24TT
01788555555
Ruff
M
08/08/00
Poodle
China
5
5
Y
Nott
Mr
CV24TT
01788555555
Addi
M
10/02/00
Poodle
China
5
5
Y
Nott
Mr
CV24TT
01788555555
Catnip
F
10/03/99
Poodle
China
5
5
Y
Nott
Mr
CV24TT
01788555555
Emmi
F
11/03/01
Poodle
China
5
5
Y
Nott
Mr
CV24TT
01788555555
Gov
F
11/01/01
Alsatian
Germany
10
…
…
…
…
…
…
…
…
…
…
…
…
…
…
…
…
…
…
…
…
…
…
…
…
…
…
…
…
…
…
…
…
…
…
…
…
Good points about flat files
• Quick and easy to set up.
• Ideal for smaller databases.
• They provide many of the searching and
sorting tools required by most users of
databases.
Problems with flat files
• Lots of repeated data makes the database larger.
• Searching becomes inefficient if using repeated data.
• Dogs don’t have their unique ID.
– Can’t add a new breed until a member introduces one to the club.
• Lots of details are held over and over again.
• If one of the members changed their name ALL their records
would need to be changed.
• If a member joined with lots of dogs their records would have
to be added over and over again.
– Adding an anomaly.
Improving the flat file design
• Having two smaller tables is no different to
having one large one…as long as we link
them.
A table for members’
details
owns
A table for dogs’
details
is owned by
• Each member can own many dogs.
• Each dog is owned by one member.
Member
ID
Initial
Last Name
Title
Post code
Tel
1
A
Fish
Mrs
CV35QW
02476111111
2
C
Here
Mrs
CV27RF
01788222222
3
D
Lapidated
Mr
CV14RR
02476333333
4
X
Ray
Ms
CV12YY
02476444444
5
Y
Nott
Mr
CV24TT
01788555555
Dog
DogID
Dog Name
Gen
DofB
Breed
Origin
L/E Years
ID
1
Bongo
M
21/08/99
Poodle
China
5
1
2
Hiccup
F
08/08/98
Poodle
China
5
1
3
Rizla
F
09/09/00
Poodle
China
5
1
4
Gov
F
11/01/01
Alsatian
Germany
10
1
5
Manic
M
11/01/01
Poodle
China
5
3
6
Blip
F
02/02/01
Spaniel
France
7
5
7
Ruff
M
08/08/00
Poodle
China
5
5
8
Addi
M
10/02/00
Poodle
China
5
5
9
Catnip
F
10/03/99
Poodle
China
5
5
10
Emmi
F
11/03/01
Poodle
China
5
5
11
Gov
F
11/01/01
Alsatian
Germany
10
5
How has it improved?
• Each member’s details are now only stored once.
• Each dog is now identified by their own unique
identity number.
• A new member now has their details entered only
once into the database.
• Changes to their record only need to happen once.
• Records that need to be deleted only need to be
deleted once.
Can we improve further?
• At present, we still need to record dog details
repeatedly.
• We still can’t add a breed unless a member
introduces one to the club.
• We can try splitting up the dog table into two…
Like this…
A table for members’
details
owns
is owned by
A table for dogs’
details
• Each member can own many dogs
• Each dog can be owned by only one owner
• Each dog can only be one particular breed
• Each breed can appear in many different dogs
is a particular
appears in
A table for breeds’
details
What we have now
ID
Initial
Last Name
Title
Post code
Tel
1
A
Fish
Mrs
CV35QW
02476111111
2
C
Here
Mrs
CV27RF
01788222222
3
D
Lapidated
Mr
CV14RR
02476333333
4
X
Ray
Ms
CV12YY
02476444444
5
Y
Nott
Mr
CV24TT
01788555555
BreedID
Breed
Origin
L/E Years
DogID
Dog Name
Gen
DofB
ID
BreedID
1
Poodle
China
5
1
Bongo
M
21/08/99
1
1
2
Alsatian
Germany
10
2
Hiccup
F
08/08/98
1
1
3
Spaniel
France
7
3
Rizla
F
09/09/00
1
1
4
Gov
F
11/01/01
1
2
5
Manic
M
11/01/01
3
1
6
Blip
F
02/02/01
3
3
7
Ruff
M
08/08/00
5
1
8
Addi
M
10/02/00
5
1
9
Catnip
F
10/03/99
5
1
10
Emmi
F
11/03/01
5
1
11
Gov
F
11/01/01
5
2
Even better improvements
• Each member’s details are now only stored once.
• Each dog is now identified by their own unique identity number.
• A new member now has their details entered only once into the database.
• Changes to their record only need to happen once.
• Records that need to be deleted only need to be deleted once.
And now…
• We can store information about each breed only once.
• We can add a new breed to the database without needing a member to
introduce a new breed to the club.
Question
• Define a flat file.
A flat file is…
A flat file is a file where all the data in the
database is stored Click
in only
one
table.
The
rows
to reveal
in the table correspond to records whilst the
columns correspond to fields.
ER Diagrams
• Entity Relationship Diagrams
• They give a quick idea of how relationships
are formed within a database system.
• They can show:
– One-to-One
– One-to-Many
– Many-to-Many
One-to-One
Fish_Info
Product
CD_Info
• A shop may sell CD’s and Fish.
• The product table will store details of the items they
sell.
• The _Info tables give information about the product.
• A shop can sell one type of COD and that particular
type of COD can only have one description and
price.
One-to-Many
Owner
owns
Pet
is owned by
• A owner can have many dogs
• A dog can be owned by only one owner
Many-to-Many
Student
studies
A-Level
is studied by
• A student can study many A-Levels
• An A-Level subject can have many students
studying it.
Fixing a Many-to-Many
• This:
Primary Key = X
Primary Key = Y
Entity A
Entity B
• Converts to this:
Primary Key = X
Entity A
XY
New Table
Primary Key = Y
Entity B
• By adding a linking table which uses both original
primary keys as its primary key.
– Known as a compound primary key.
Our previous example of Many-to-Many
Pupil
Pupil_A-Level
A-Level
Pupil_ID
Name
Pupil_ID
A-Level_ID
A-Level_ID
Code
1
Smith
1
1
1
Maths
2
Jones
1
4
2
ICT
3
Ali
1
5
3
Eng
4
Potts
2
2
4
Graphics
5
Kanu
2
3
5
Bus_Stud
6
Stay
2
4
6
History
…
…
…
…
…
…
• Many-to-Many Problem Resolved!
• Each record in the Pupil_A-Level table relates to only one ALevel and one pupil. There are no duplicating records!
Normalisation
• There are 3 stages of normalisation that you need to
know about:
– First Normal Form (1NF)
– Second Normal Form (2NF)
– Third Normal Form (3NF)
• Minimise redundant data
• Minimise the chance of making data inconsistent.
– Think about flat files and how you have to change all the
records if a member’s name changes…you may miss one
in the flat file…but not in a relational database!!
The Analysis Table
• Create a table with 5 columns.
UNF
1NF
2NF
3NF
Name
Here you write
down all the
attributes which are
contained in the flat
file database.
Write down the
attributes as they
would appear in
first normal form.
Write down the
attributes as they
would appear in
second normal
form.
Write down the
attributes as they
would appear in
third normal form.
Here you put the
names of the
tables that will be
in your 3NF
relational
database.
• This table will be used to help you work
through the normalisation process.
Example – DVD NOW!
Member’s Record
ID number:
241
Surname:
Smith
Address 1:
23 Jones Rd.
Address 2:
Walforth
Phone:
123456
Joined:
10/01/99
DVD_ID
DVDName
DateDueBack
Cert
CertDescription
323
Roboteacher
11/08/04
18
18 and over only
6512
Harrytron
12/09/04
PG
Parental Guidance
441
The Brainbox
30/09/04
PG
Parental Guidance
UNF
• Un-normal form
• First stage…write down all the attributes that
you need to store in your database in the first
column of your analysis table.
• Underline the possible primary keys
• Put the repeating data together within
brackets.
UNF
UNF
1NF
2NF
3NF
Name
MemID
Surname
Address1
Address2
Phone
Join
(DVD_ID
DVDName
Due
Cert
CertDescription)
• Now we know what attributes we need to
store…repeating attributes are in (brackets).
• Time to move onto first normal form!
Converting to 1NF
• A table is in 1NF if it contains:
– No repeating groups.
• You need to copy the primary key of the nonrepeating group into the table of the repeating
group.
– You will need to underline this foreign key as it will
form part of a compound primary key.
• Copy the attributes into their own tables under
the 1NF column.
1NF
UNF
1NF
MemID
Surname
Address1
Address2
Phone
Join
MemID
Surname
Address1
Address2
Phone
Join
(DVD_ID
DVDName
Due
Cert
CertDescription)
MemID
DVD_ID
DVDName
Due
Cert
CertDescription
2NF
3NF
Name
• You now have two groups of attributes. Not just the one (like in UNF).
• Basically, you have created two tables that are linked by the MemID.
Converting to 2NF
• A database is in 2NF if it is in 1NF and the non-key attributes
depend ENTIRELY on the primary key.
• If any table in 1NF has only one primary key then it is already
in 2NF and can be copied to the 2NF column as it is.
• You will need to identify those attributes that are not related to
the compound key.
– E.g. If I wanted to find out the title of a DVD and I had the DVD_ID I
would be able to find the DVD title. But If I gave you the Mem_ID I
wouldn’t be able to find out the DVD title.
– Again, giving the DVD_ID I would be able to find the certificate rating
for it.
• The only attribute in our example that requires some thought
is ‘Due’. A DVD can only be due if a member has it, which
means we need both Mem_ID and DVD_ID to find this out.
Converting to 2NF
• What do you do now?
• Copy the non-key attributes that are dependant on
DVD and put them in their own table.
• Copy part of the primary key that those non-key
attributes depend upon from 1NF into the new table
you just created in 2NF.
– E.g. Copy DVD_ID attribute across. Underline this to show
that it is the primary key of the new group.
• Finally, copy across any attributes left over from the
old group in 1NF into 2NF, into their own table.
2NF
UNF
1NF
2NF
MemID
Surname
Address1
Address2
Phone
Join
MemID
Surname
Address1
Address2
Phone
Join
MemID
Surname
Address1
Address2
Phone
Join
(DVD_ID
DVDName
Due
Cert
CertDescription)
MemID
DVD_ID
DVDName
Due
Cert
CertDescription
MemID
DVD_ID
Due
3NF
Name
You have fixed the manyto-many problem!
DVD_ID
DVDName
Cert
CertDescription
• You now have two groups of attributes. Not just the one (like in UNF).
• Basically, you have created two tables that are linked by the MemID.
Converting to 3NF
• Needs to be in 2NF first.
• Time to remove non-key dependencies
– Attributes which are dependent on another
attribute that isn’t a primary key.
• For example:
– The certificate description is dependent on the
certificate ID…but the certificate ID is not a
primary key…it is an attribute of the DVD.
3NF
UNF
1NF
2NF
3NF
Name
MemID
Surname
Address1
Address2
Phone
Join
MemID
Surname
Address1
Address2
Phone
Join
MemID
Surname
Address1
Address2
Phone
Join
MemID
Surname
Address1
Address2
Phone
Join
MEMBER
(DVD_ID
DVDName
Due
Cert
CertDescription)
MemID
DVD_ID
DVDName
Due
Cert
CertDescription
MemID
DVD_ID
Due
MemID
DVD_ID
Due
MEMBER_DVD
DVD_ID
DVDName
Cert
CertDescription
DVD_ID
DVDName
CertID*
DVD
CertID
CertDescription
CERTIFICATE
• This is now in 3NF because by definition it is in 2NF and doesn’t contain
any non-key dependencies.
What have we ended up with?
Member
Member_DVD
•
•
Member can rent many DVDs.
DVD can only be rented by a member at any one time.
•
•
DVD can be rented many times.
A rental can only be for a particular DVD.
•
•
Certificate can be on many DVDs.
DVD can only have one certificate.
DVD
Certificate
Rules to remember
• 1NF
– No repeating attributes and primary key of non repeating
data must be placed in table of repeating data.
• 2NF
– Look for non-key attributes that depend upon all the
attributes in the compound primary keys i.e. does the
attribute depend on both primary keys?
• 3NF
– Look for non-key attributes that depend upon other nonkey attributes.
Try for yourself…
• Normalise the following examples…
• The answers will be posted later in the VLE.
1. Bank Account Record
Customer ID:
Surname:
Postcode:
Telephone:
43543
Jones
BA3 6TH
654321
Account Number
Account Type
Description
Balance
22442546
Gold
Savings Account
£10.00
44342322
Platinum
Current Account
£100.00
77164645
Silver
Internet Account
£500.00
Normalise this data using the Analysis Table.
Some of it has been done for you on the next slide.
1. Bank Account Record
UNF
1NF
2NF
3NF
Name
Cust-ID
Surname
Postcode
Telephone
Cust-ID
Surname
Postcode
Telephone
Cust-ID
Surname
Postcode
Telephone
Cust-ID
Surname
Postcode
Telephone
CUSTOMER
(AccountNumber
AccountType
Description
Balance)
Cust-ID
AccountNumber
AccountType
Description
Balance
CUST_ACCOUNT
AcountType
Description
ACCOUNT_DESC
2. Student Record
Student ID:
Surname:
Initial:
DofB:
1232223
Patel
A
09/06/86
CourseNo CourseName
TeacherID TeacherSurname
TeacherInitial
1
Maths
23
Harris
P
2
ICT
18
Dolinski
A
3
Drama
23
Morse
B
4
Business
Studies
11
Woodhall
S
Normalise this data using the Analysis Table.
Some of it has been done for you on the next slide.
2. Student Record
UNF
1NF
2NF
3NF
Name
STUDENT
STUDENT_COURSE
CourseNumber
CourseName
TeacherID*
COURSE
TEACHER
3. Camera Parts
•
A database is to be kept that records all the parts that make up a particular model
of digital camera.
•
Following an analysis of the problem, it is found that these details are to be kept
about each individual camera:
– The camera model (which is unique for every camera)
– The date the camera was released for sale.
•
For each camera, the following details about every part needed to build it will be
kept:
–
–
–
–
A part number.
A part name.
The ID of the manufacturer of the part.
The name of the manufacturer of the part.
•
Your job is to produce a normalised design for the database.
•
HINT: If you are unsure of how to start why not try sketching out a typical record
card for one camera and then fill it in with some made-up data. It will help you
visualise the problem.
3. Camera Parts
UNF
1NF
2NF
3NF
Name