File and Database Design

Download Report

Transcript File and Database Design

File and Database Design
SYS364
Today’s Agenda




WHTSA DBMS, RDBMS, SQL
A place for everything and everything in its
place.
Entity Relationship Diagrams
to figure it out
Normalization
to make it work
Why DB?





Data & Information is lifeblood of most
systems, esp. business systems
faster and fewer programs
OLTP for accurate and efficient data
easy retrieval for EIS, DSS, OLAP and
data warehouse creation
easy interface to heterogeneous systems
via SQL, ODBC, JDBC
Why else?
DB
analysts make
more money than
programmers.
Data Terminology and Concepts
It’s a flat (sequential) file if it needs
fscanf. Application programs parse
the byte stream into data.
 DBs use Entities, fields, records, files
and keys so programs do not have to
know about physical storage of data,
only logical

Flat file data
Quote,
comma delimited bytes interpreted by
program into data
222222222,"Evans","Gil",234.56,"CPO","3","
arrpiano"333333333,“Young",“Dave",34.56,
"CPA","4","bass"111111111,"Ferguson","Ma
ynard",123.45,"CPAC","5","trumpet"444444
444,“Bickert",“Ed",456.78,"CPA","6",“guita
r"666666666,"Koffman","Moe",678.94,"CPAC
","5","flute"555555555,"Clarke","Terry",
567.88,"CPA","5","drums"888888888,"Krall
","Diana",876.54,"CPAC","5","vocalfem"77
7777777,"Eikhart","Shirley",789.01,"CPAC",
"5","composer"999999999,"Peterson","Osca
r",-9.87,"CPA","6","piano"
DB file

DB knows about the data
StdNo
111111111
222222222
333333333
444444444
555555555
666666666
777777777
888888888
999999999
Lname
Ferguson
Evans
Young
Bickert
Clarke
Koffman
Eikhart
Krall
Peterson
Fname Balance Pgm Semester email
Maynard 123.45 CPAC
5
trumpet
Gil
234.56 CPO
3
arrpiano
Dave
34.56 CPA
4
bass
Ed
456.78 CPA
6
guitar
Terry
567.88 CPA
5
drums
Moe
678.94 CPAC
5
flute
Shirley
789.01 CPAC
5
composer
Diana
876.54 CPAC
5
vocal
Oscar
-9.87 CPA
6
piano
Relational Model Advantages




logical and physical characteristics of the
DB are separated. e.g. order of rows and
columns is immaterial
much more easily understood by humans
powerful operators available, enable
complex operations with brief commands.
sound framework for the design of DBs
DBMS library or Relational Database
or SQL collection or E-R Diagram:
DBMS: file
Rel: relation
SQL: table
E-R: Entity
DBMS
record
SQL
row
Relational
tuple
E-R
instance
DBMS
field
SQL
column
Relational
E-R
attribute attribute
key
key
key
identifier
Key Fields

May be one or two or more fields
(combination / multivalued / composite keys)





Primary keys are unique and minimal
Candidate keys (possible Primaries)
Secondary keys (alternate key, views)
Foreign keys (field(s) in one file is a
primary key in another file - a relationship!
Referential Integrity - checks relationship
Data Relationships and EntityRelationship Diagrams
 ERD
 Graphical
model that shows the
relationships among system entities
 Entities
are drawn as rectangles
 Labeled
with singular nouns
 Relationships
 Labeled
are drawn as diamonds
as active verbs
Types of Relationships



One to One (1:1)
One to Many (1:M)
Many to Many (M:N)
Cardinality


how many relationships among entities
Analysts need to understand cardinality to
design files and databases that reflect
accurately all relationships among entities




0, 1 or many relationships between Entities
Entity can be mandatory
Entity can be optional
Crows Foot Notation
Creating an ERD




Identify the Entities
Determine all significant events or
activities for two or more entities
Analyze the nature of the interaction
Draw the ERD
Try it out yourself

Seneca has courses, sections, professors,
and students
Normalization


A process by which you identify and
correct inherent problems in record design
Involves three stages



First Normal Form
Second Normal Form
Third Normal Form
First Normal Form


1NF = no repeating attributes (columns)
Is this déjà vu all over again?


If so, reorganize the data into separate
relations (tables)
You are unique – just like everyone else


Establish a minimal & unique primary key
Include a key to identify the repeating
columns now in their own table
Second Normal Form



1NF with no partial dependencies, i.e.
where the value of a non-key column is
dependent on only part of the key (usually
on one column of a multi-column key)
1NF record with a primary key that is a
single column is automatically in 2NF
See next slide for translation
2NF

Who does this belong to?



Each field belongs to the whole primary key
If not, split the row into new tables
When does it matter? (not in the textbooks)


The Item Master file contains a price field. If only the
current price matters, this is where the price belongs.
An Invoice Item file has its own copy of the price field
because the price is time dependent – it belongs with
the invoice at the time of sale.
Third Normal Form



2NF with no transitive dependencies, i.e. where
a non-key column is dependent on another nonkey column
To go from 2NF to 3NF, remove the dependent
column
Do we know this already?


If have Unit-Price & Qty, do not need Extended-Price
If Postal Code, what don’t you need?
Summary

Entity Relationship Diagrams



Who is in our model?
How do they relate?
Normalization


Confirms ERD
Organizes data:
Data is stored only once
 Keys uniquely and minimally identify relations
