Chapters 4-7

Download Report

Transcript Chapters 4-7

Relational Database Systems
Higher Information Systems
The Relational Model


data is grouped into entities which are
related, in order to minimise data
duplication and achieve data integrity
many-to-many relationships between
entities are removed and replaced with
one-to-many relationships
Entity-Occurrence Modelling
Entity-Occurrence Modelling



Lines indicate how
the instances of
each entity are
linked
E.g. Member 1034 has rented DVDs 002
and 015
DVD 003 has been rented by members
1012 1056
Entity-Occurrence Modelling



Each DVD can be
rented by many
Members
Each Member can
rent many DVDs
So there is a many-to-many relationship
between Member and DVD
Entity-Occurrence Modelling


This method is
only as good as
the available data
Make up “dummy”
data if necessary to
fill in the gaps
More about keys

An atomic key consists of one attribute


A compound key consists of two or
more attributes


MEMBER(Member Number, Name, Telephone Number)
MEMBER(Member Number, Name, Telephone Number)
A surrogate key is a made up attribute
designed to uniquely identify a record

Member Number is a surrogate key
Surrogate Key
RENTAL
(RegNo
Make
Model
HirerID
HirerName
DateOfHire)
RENTAL
(RegNo
Make
Model
HirerID
HirerName
DateOfHire)
RENTAL
(RentalNo
RegNo
Make
Model
HirerID
HirerName
DateOfHire)
Choosing a key





An atomic key is better than a compound key
A numeric attribute is better than a text
attribute
KISS = Keep It Short and Simple
A key must have a value—it cannot be blank
(or “null”)
A key should not change over time


Andrew Smith class 1B reg teacher C Walker
PupilCode = AS1BCW
The flat file revisited…
DVD
Code
Title
Cost
Date Out
Date Due
Member
Number
002
Finding Nemo
£2.50
03/09/04
04/09/04
1034
John Silver
142536
003
American Pie
£2.50
27/08/04
28/08/04
1056
Fred Flintstone
817263
003
American Pie
£2.50
01/09/04
02/09/04
1012
Isobel Ringer
293847
008
The Pianist
£2.50
04/09/04
06/09/04
1097
Annette Kirton
384756

What is a suitable key?



DVD Code?
Member Number?
(DVD Code, Member Number)?
Name
Telephone
Number
Update Anomalies
DVD
Code
Title
Cost
Date Out
Date Due
Member
Number
002
Finding Nemo
£2.50
03/09/04
04/09/04
1034
John Silver
142536
003
American Pie
£2.50
27/08/04
28/08/04
1056
Fred Flintstone
817263
003
American Pie
£2.50
01/09/04
02/09/04
1012
Isobel Ringer
293847
008
The Pianist
£2.50
04/09/04
06/09/04
1097
Annette Kirton
384756



Name
There is no way of storing the details of a member
who hasn’t rented any DVDs
A value must be provided for both DVD Code and
Member Number for the key
This is called an insertion anomaly
Telephone
Number
Update Anomalies
DVD
Code
Title
Cost
Date Out
Date Due
Member
Number
002
Finding Nemo
£2.50
03/09/04
04/09/04
1034
John Silver
142536
003
American Pie
£2.50
27/08/04
28/08/04
1056
Fred Flintstone
817263
003
American Pie
£2.50
01/09/04
02/09/04
1012
Isobel Ringer
293847
008
The Pianist
£2.50
04/09/04
06/09/04
1097
Annette Kirton
384756



Name
Telephone
Number
If a member’s details have to be amended, this must
be done in each record with those details
This can lead to data inconsistency if there is an error
or omission in making the change
This is called a modification anomaly
Update Anomalies
DVD
Code
Title
Cost
Date Out
Date Due
Member
Number
002
Finding Nemo
£2.50
03/09/04
04/09/04
1034
John Silver
142536
003
American Pie
£2.50
27/08/04
28/08/04
1056
Fred Flintstone
817263
003
American Pie
£2.50
01/09/04
02/09/04
1012
Isobel Ringer
293847
008
The Pianist
£2.50
04/09/04
06/09/04
1097
Annette Kirton
384756


Name
If a DVD is removed from the database, then it may
also remove the only record of a member’s details
This is called a deletion anomaly
Telephone
Number
Update Anomalies






Insertion anomalies
Modification anomalies
Deletion anomalies
These are characteristics of poorly designed
databases
The solution is to use a relational database
We use normalisation to help work out what
tables are required and which data items
should be stored in each table
Normalisation
Un-normalised Form (UNF)



Identify an entity
List all the attributes
Identify a key
ORDER (Order Number
Order Date
Customer Number
Customer Name
Address
Post Code
Telephone Number
Item Code
Description
Unit Cost
Quantity)
Un-normalised Form (UNF)

Identify repeating data items
Un-normalised Form (UNF)

Identify repeating
data items
ORDER (Order Number
Order Date
Customer Number
Customer Name
Address
Post Code
Telephone Number
Item Code
Description
Unit Cost
Quantity)
First Normal Form (1NF)


Remove repeating
data items to
form a new entity
Take the key with
you!
ORDER (Order Number
Order Date
Customer Number
Customer Name
Address
Post Code
Telephone Number
Item Code
Description
Unit Cost
Quantity)
First Normal Form (1NF)


Remove repeating
data items to
form a new entity
Take a copy of
the key with you!
ORDER (Order Number
Order Date
Customer Number
Customer Name
Address
Post Code
Telephone Number)
ORDER_ITEM (Item Code
Description
Unit Cost
Quantity
Order Number)
First Normal Form (1NF)


Identify a key for the
new entity
It will be a
compound key
ORDER (Order Number
Order Date
Customer Number
Customer Name
Address
Post Code
Telephone Number)
ORDER_ITEM (Item Code
Description
Unit Cost
Quantity
Order Number)
First Normal Form (1NF)




ORDER
Identify a key for the
new entity
It will be a
compound key
Label the foreign key
Order Number is
ORDER_ITEM
both part of the
compound primary
key and also a
foreign key.
(Order Number
Order Date
Customer Number
Customer Name
Address
Post Code
Telephone Number)
(Item Code
Description
Unit Cost
Quantity
Order Number*)
First Normal Form (1NF)

A data model is in
1NF if it has no
multi-valued
attributes
ORDER (Order Number
Order Date
Customer Number
Customer Name
Address
Post Code
Telephone Number)
ORDER_ITEM (Item Code
Description
Unit Cost
Quantity
Order Number*)
First Normal Form (1NF)
ORDER
Order
Order
Number Date
Customer
Number
Customer Name
654321
234567
1/9/04
ORDER-ITEM
Order
Item
Number Code
654321 1170
654321 5499
654321 7937
Mrs Joanna Bloggs
Address
12 High Street
Auchenshoogle
Inverness-shire
Description
Medium widgets (blue)
Large deluxe widget (red)
Small economy widget (green)
Post Telephone
Code Number
IV99
2QW
Unit
Cost
£2.25
£9.99
£0.59
01999
123456
Quantity
3
1
5
First Normal Form (1NF)

Order
Number
654321
975310
864208

But what if there were lots of orders for
large deluxe red widgets…?
Item
Code
5499
5499
5499
Description
Large deluxe widget (red)
Large deluxe widget (red)
Large deluxe widget (red)
Unit Cost
£9.99
£9.99
£9.99
There are still update anomalies
Quantity
1
2
3
Update Anomalies



Item Code, Description and Unit Cost values are
duplicated in each entry. If the price of item 5499
were to change, this would have to be updated three
times, which is both inefficient and could result in
data inconsistency. This is a modification anomaly.
it is not possible to enter details for an item which
has not yet been ordered (because an order number
is required as part of the compound key). This is an
insertion anomaly.
if an order is deleted (perhaps because it is
cancelled), this could remove the only record
containing an item’s details. This is a deletion
anomaly.
Dependencies



An attribute A is dependent on another attribute B if
there is only one possible value for A given any value
for B.
For example, in the mail order system, consider the
relationship between Order Number and Order Date.
Order Date is dependent on Order Number, because
for any given Order Number, there is only one
possible date (i.e. the date the order was placed).
However, the reverse is not true. Order Number is
not dependent on Order Date, because for any given
Order Date, there may be more than Order Number
(because many orders may have been placed on that
date).
Dependencies
Airport
National
JFK
LaGuardia
Gatwick
Heathrow
City
Washington, DC
New York
New York
London
London
City is dependant on Airport
The City can be determined if the Airport is known.
There is only one possible City for a given Airport.
Airport is not dependant on City.
Knowing the City does not determine the Airport.
A given City could have more than one Airport.
Second Normal Form (2NF)



ORDER (Order Number
Examine any entity with
Order Date
a compound key (in this
Customer Number
case ORDER_ITEM)
Customer Name
See if any attributes are
Address
Post Code
dependent on just one
Telephone Number)
part of the compound
key
ORDER_ITEM (Item Code
These are called partial
Description
Unit Cost
dependencies
Quantity
Order Number*)
Second Normal Form (2NF)





Order Number is part of
ORDER (Order Number
the key
Order Date
Item Code is part of the
Customer Number
key
Customer Name
Address
Description is
Post Code
dependent on the Item
Telephone Number)
Code
Unit Cost is dependent
ORDER_ITEM (Item Code
on the Item Code
Description
Quantity is dependent
Unit Cost
on both Order Number
Quantity
and Item Code.
Order Number*)
Second Normal Form (2NF)




Description and Unit
ORDER (Order Number
Order Date
Cost are partial
Customer Number
dependencies
Customer Name
They are dependent on
Address
Item Code
Post Code
Remove all these
Telephone Number)
attributes to a new
entity
ORDER_ITEM (Item Code
Description
Leave a copy of the
Unit Cost
attribute they are
Quantity
dependent on as the FK
Order Number*)
Second Normal Form (2NF)


Item Code becomes
the key of the new
entity
And becomes a
foreign key in
ORDER-ITEM
ORDER (Order Number
Order Date
Customer Number
Customer Name
Address
Post Code
Telephone Number)
ORDER_ITEM (Item Code*
Quantity
Order Number*)
ITEM (Item Code
Description
Unit Cost)
Second Normal Form (2NF)

A data model is in
2NF if it is in 1NF
and there are no
partial
dependencies
ORDER (Order Number
Order Date
Customer Number
Customer Name
Address
Post Code
Telephone Number)
ORDER_ITEM (Item Code*
Quantity
Order Number*)
ITEM (Item Code
Description
Unit Cost)
Second Normal Form (2NF)
ORDER-ITEM
Order
Item
Number Code Quantity
654321
5499
1
975310
5499
2
864208
5499
3



ITEM
Item
Code
1170
5499
9737
Description
Medium widget (blue)
Large deluxe widget (red)
Small economy widget (green)
Unit
Cost
£2.25
£9.99
£0.59
We can add an item to the Item table without it
having to be on an order
We can delete an order in the Order table without
deleting details of the items on the order
We can update item details once in the Item table
without affecting the orders for that item in the
Order-Item table
Second Normal Form (2NF)
654321 1/9/04
Mrs Joanna
234567
Bloggs
135790 15/9/04 234567
Mrs Joanna
Bloggs
246801 28/9/04 234567
Mrs Joanna
Bloggs
12 High Street
Auchenshoogle IV99 2QW
Inverness-shire
12 High Street
Auchenshoogle IV99 2QW
Inverness-shire
12 High Street
Auchenshoogle IV99 2QW
Inverness-shire
ORDER (Order Number
Order Date
Customer Number
Customer Name
Address
Post Code
Telephone Number)
Telephone
Number
Post Code
Address
Customer
Name
But there are still
update anomalies
with the Order entity
Customer
Number
Order Date
Order
Number

01999 123456
01999 123456
01999 123456
Anomalies



Clearly, the customer details are duplicated in each
entry. If a customer’s address or telephone number
were to change, this would have to be updated three
times, which is both inefficient and presents the
possibility of data inconsistency. This is a
modification anomaly.
It is not possible to enter details for a customer
unless they have placed an order (because Order
Number is required as a key). This is an insertion
anomaly.
If an order is deleted, this could remove the only
record containing a customer’s details. This is a
deletion anomaly.
Third Normal Form (3NF)



Examine all the entities
produced so far
See if there are any
non-key attributes
which are dependent on
any other non-key
attributes
These are called nonkey dependencies
ORDER (Order Number
Order Date
Customer Number
Customer Name
Address
Post Code
Telephone Number)
ORDER_ITEM (Item Code*
Quantity
Order Number*)
ITEM (Item Code
Description
Unit Cost)
Third Normal Form (3NF)

In the ORDER entity,
Customer Name,
Address, Post Code
and Telephone
Number are all
dependent on
Customer Number
ORDER (Order Number
Order Date
Customer Number
Customer Name
Address
Post Code
Telephone Number)
ORDER_ITEM (Item Code*
Quantity
Order Number*)
ITEM (Item Code
Description
Unit Cost)
Third Normal Form (3NF)

Remove these
attributes to a new
entity
ORDER (Order Number
Order Date
Customer Number
Customer Name
Address
Post Code
Telephone Number)
ORDER_ITEM (Item Code*
Quantity
Order Number*)
ITEM (Item Code
Description
Unit Cost)
Third Normal Form (3NF)



Remove these
attributes to a new
entity
Customer Number is
the key of the new
entity
Leave Customer
Number behind as a
foreign key
ORDER (Order Number
Order Date
*Customer Number)
CUSTOMER (Customer Number
Customer Name
Address
Post Code
Telephone Number)
ORDER_ITEM (Item Code*
Quantity
Order Number*)
ITEM (Item Code
Description
Unit Cost)
Third Normal Form (3NF)

A data model is in
3NF if it is in 2NF
and there are no
non-key
dependencies
ORDER (Order Number
Order Date
*Customer Number)
CUSTOMER (Customer Number
Customer Name
Address
Post Code
Telephone Number)
ORDER_ITEM (Item Code*
Quantity
Order Number*)
ITEM (Item Code
Description
Unit Cost)
Third Normal Form (3NF)
ORDER
Order
Number
654321
135790
246801
Order
Date
1/9/04
15/9/04
28/9/04



Customer
Number
234567
234567
234567
CUSTOMER
Customer Customer
Number
Name
Mrs
234567
Joanna
Bloggs
Address
12 High Street
Auchenshoogle
Inverness-shire
Post Tel
Code Number
IV99
2QW
01999
123456
We can add a customer to the Customer table
without the customer having to place an order
We can delete an order in the Order table without
deleting details of the customer who placed the order
We can update a customer’s details once in the
Customer table without affecting the orders placed by
that customer in the Order table
Memory Aid




In 3NF, each attribute is dependent on
the key
the whole key
and nothing but the key
Normalisation
A data model is in
 1NF if it has no multi-valued
attributes
 2NF if it is in 1NF and there are no
partial dependencies
 3NF if it is in 2NF and there are no
non-key dependencies
Entity-Relationship Diagram
ORDER (Order Number
Order Date
*Customer Number)
CUSTOMER (Customer Number
Customer Name
Address
Post Code
Telephone Number)
ORDER_ITEM (*Order Number
*Item Code
Quantity)
ITEM (Item Code
Description
Unit Cost)
CUSTOMER
ITEM
ORDER
ORDER_ITEM
Entity-Relationship Diagram
ORDER
CUSTOMER
ORDER_ITEM
ITEM

(Order Number
Order Date
*Customer Number)
(Customer Number
Customer Name
Address
Post Code
Telephone Number)
CUSTOMER
ORDER
ITEM
(*Order Number
*Item Code
Quantity)
(Item Code
Description
Unit Cost)
ORDER_ITEM
The foreign key is always at the “many”
end of the relationship
Source documents
Source documents

List all the
attributes which
must be stored
in the database
DVD_RENTAL (Member Number
Title
Forename
Surname
Telephone No
DVD Code
Title
Cost
Date Hired
Date Due
Member Number
Name)
Source documents


DVD_RENTAL (Member Number
List all the
Title
attributes which
Forename
must be stored
Surname
in the database
Telephone No
Identify a key
DVD Code
Title
Cost
Date Hired
Date Due
Member Number
Name)
Source documents

There are two DVD_RENTAL (Member Number
Title
attributes called
Forename
Title
Surname
Telephone No
DVD Code
Title
Cost
Date Hired
Date Due
Member Number
Name)
Source documents


There are two DVD_RENTAL (Member Number
Title
attributes called
Forename
Title
Surname
Member
Telephone No
Number is
DVD Code
Title
stored twice
Cost
Date Hired
Date Due
Member Number
Name)
Source documents



There are two DVD_RENTAL
attributes called
Title
Member Number
is the same as
Member Number
Name is already
stored as
Forename and
Surname
(Member Number
Title
Forename
Surname
Telephone No
DVD Code
Title
Cost
Date Hired
Date Due
Member Number
Name)
Source documents




There are two
DVD_RENTAL (Member Number
attributes called
Title
Title
Forename
Member Number is
Surname
the same as
Telephone No
Member Number
DVD Code
Name is already
Title
stored as
Cost
Forename and
Date Hired
Surname
Date Due
Number or No?
Member Number
Name)
Source documents


Tidy up UNF
Carry on as
before to 3NF
DVD_RENTAL (Member Number
Title
Forename
Surname
Telephone Number
DVD Code
DVD Title
Cost
Date Hired
Date Due)
Database Design

For each attribute you must decide



its name
its data type
its properties
Database Design

For each attribute you must decide

its name
 Choose sensible and meaningful field
names
 Be consistent!
 e.g. Number/Num/No/#
Database Design

For each attribute you must decide


its name
its data type






text (alphanumeric, string)
numeric (integer, real, currency)
date or time
Boolean (yes or no)
link
object (e.g. picture, sound, file)
Data Types

Text

Alphabetic



Alphanumeric




“Smith”
“John Smith”
“IV99 9ZZ”
“01234 567890”
“10 Downing Street”
Free text: “The cat sat on the mat, etc…”
Data Types

Numeric



Integer: 3, 1246, 0, -5
Real/floating point: 3.14, 1246.0, 0, -5.2
Currency: 3.14, 1246.00, 0.00, -5.20

Note that the currency symbol is not stored
Data Types

Date

“Short” date: 20/09/2010


“Medium” date: 20 Sep 10


dd mmmm yyyy
Custom Monday, 20 September 2010


dd mmm yyyy
“Long” date: 20 September 2010


dd/mm/yyyy
dddd dd mmm yyyy
Watch out for US dates: mm/dd/yyyy
Database Design

Names are usually stored as 3 or 4 fields




Title (Mr/Mrs/Miss/Ms)
Forename
Initials/Other Names
Surname
Database Design

Addresses are usually stored as 3 or 4
fields





Address1 (Street Address)
Address2 (Town)
Address3 (District)
Post Code
Sometimes the house number is stored
separately from the Street Name
Database Design


Telephone Numbers are always text
Other “numbers” are also stored as text




ISBNs
Vehicle Registration “numbers”
Serial numbers
Use integers for whole numbers
Database Design

For each attribute you must decide



its name
its data type
its properties





Primary key/foreign key PK/FK
Validation (presence, range, restricted choice)
Default value
Format
Store this information in a Data Dictionary
3NF
Member

MemberNumber

Title

Forename

Surname

Address1

Address2

Address3

PostCode

TelephoneNumber
Film

FilmCode

Title
Loan

MemberNumber*

DVDCode*

DateHired

DateDue
DVD

DVDCode

FilmCode*

Cost
Data Dictionary
Entity
Attribute
Key
Data Type
Required
Unique
DVD
DVD Code
PK
Film Code
FK
Integer
Y
Y
0000
Integer
Y
N
0000
Currency
Y
N
Integer
Y
Y
Title
Text (4)
Y
N
Forename
Text (15)
Y
N
Surname
Text (20)
Y
N
Address 1
Text (20)
Y
N
Address 2
Text (20)
N
N
Address 3
Text (20)
N
N
Post Code
Text (7)
Y
N
L?09 0LL
Telephone
Number
Text (11)
N
N
(99999) 000000
Integer
Y
Y
0000
Text (30)
Y
N
Cost
MEMBER
FILM
Member Number
Film Code
PK
PK
Title
LOAN
Format
Validation
Lookup value from FILM table
>=1 and <=3
0000
Choice of Mr/Mrs/Miss/Ms/Dr
Member Number
PK/FK Integer
Y
N
0000
Lookup value from MEMBER table
DVD Code
PKFK
Integer
Y
N
0000
Lookup value from DVD table
Date Hired
PK
Date
Y
N
dd/mm/yy
Date
Y
N
dd/mm/yy
Date Due
Data Dictionary
Entity
Attribute
Key
Data Type
DVD
DVD Code
PK
Film Code
FK
Cost
Required
Unique
Format
Validation
Integer
Y
Y
Integer
Y
N
Lookup value from FILM table
Currency
Y
N
>=1 and <=3
Data Dictionary
Entity
Attribute
Key
Data Type
Required
Unique
MEMBER
Member Number
PK
Format
Integer
Y
Y
Title
Text
Y
N
Forename
Text (15)
Y
N
Surname
Text (20)
Y
N
Address 1
Text (20)
Y
N
Address 2
Text (20)
N
N
Address 3
Text (20)
N
N
Post Code
Text (7)
Y
N
A?09 0AA
Telephone Number
Text (11)
N
N
(99999) 000000
Validation
Choice of Mr/Mrs/Miss/Ms/Dr
Data Dictionary
Entity
Attribute
Key
Data Type
FILM
Film Code
PK
Title
Required
Unique
Integer
Y
Y
Text (30)
Y
N
Format
Validation
Data Dictionary
Entity
Attribute
Key
Data Type
LOAN
Member Number
PK
DVD Code
Date Hired
Date Due
Required
Unique
Format
Validation
Integer
Y
N
Lookup value from MEMBER table
PK
Integer
Y
N
Lookup value from DVD table
PK
Date
Y
N
dd/mm/yy
Date
Y
N
dd/mm/yy