Databases and Database Design William A. Yasnoff, MD, PhD Oregon Health Division 1
Download
Report
Transcript Databases and Database Design William A. Yasnoff, MD, PhD Oregon Health Division 1
Databases and Database Design
William A. Yasnoff, MD, PhD
Oregon Health Division
1
Databases & Database Design
Introduction
Relational
Databases
Database Design: Data Models
2
Introduction
Database
= large collection of
information
Stored on hard disk
Retrieval
– 100,000 items X 10 ms = 17 minutes
– Need to organize data to improve retrieval
speed
– How do you organize your paper
files?
3
Introduction (continued)
Data
Organization
– sorting data: phonebook in alpha v. random
order
– “tabs” at key points: dictionary with tabs
for each letter
– direct pointers to information = index
» table of key element value and address of other
data
» e.g. name & mailbox number --> letters
in mailbox
4
Relational Database Model
5
Relational Database Model
Developed
by EF Codd, CJ Date (70s)
Table = Entity = Relation
Table row = tuple = instance
Table column = attribute
Table linkage by values
Entity-Relationship Model
6
Attributes
Key
– uniquely identify row/tuple
– may be one or more attributes
Non-key
– other properties of instance
– dependent on key
7
Retrieval in RDMS
SQL
– Select ...
– From ...
» optional: can be computed
– Where ...
Query
by example
– Fill in the blanks
– WONDER
8
Advantages of RDMS
Very
strong theoretical basis
– storage
– retrieval
Easy
to implement
Conflicts and anomalies can be avoided
Intuitive appeal
Easy retrieval
9
Relational Model: Summary
popular
method of organizing data
strong theoretical properties facilitate
retrieval
relation = table
attribute = column
tuple = row
key = {attributes} that uniquely
identify each row
10
Relational Model: Summary 2
Allows
complex data relationships with
multiple tables:
– {patient id, patient demographics}
– {patient id, patient visit date, blood
pressure}
Structured
Query Language (SQL)
retrieval
» Select patient _name where blood_lead_level > 10
11
Database Design: Data Models
12
Data Models
Definition: A representation
of the data
and and data relationships of an activity
– data: case report of serious E. Coli illness
– data: E. Coli serotype O157
– relationship: case report “contains” serotype
(semantic relationship)
Database
Design = Development of a
quality data model
13
Data Model Quality
Correctness
– Conceptual (concepts
represented properly)
» real world
representation
– Syntactic
(relationships
represented properly)
» real world language
14
Completeness
(wholeness)
– Conceptual (all
concepts represented)
– Syntactic (all
relationships
represented)
Entity
“Any
distinguishable object that is to be
represented in a database” [C.J. Date]
Properties
–
–
–
–
within scope of model
single concept
a set of distinguishable “instances”
satisfies normalization rules
Also
15
called a relation
Entity Guidelines
Relationship
with at least one other
entity
Unique, descriptive name
Class or set of things (not just one)
Single meaning (no homonyms)
No synonyms (two entities describing
same class)
16
Attribute
“a
type or characteristic of an entity”
(e.g. “gender” is an attribute of the entity
“patient”)
Domain = the set of values from which an
attribute may be selected (e.g. the
domain of the attribute “gender” is
[male, female])
An entity typically has many attributes
17
Attribute Characteristics
Key
– value uniquely identifies entity
» e.g. “Lab test ID” is key attribute of “specimen”
Non-Key
– value does not uniquely identify entity
» e.g. “author” does not uniquely identify
“publication”
18
Attribute Characteristics
Atomic
= individual data value (one and
only one fact)
Description = complete and clear
definition
– e.g. “professional privilege date” = The date
on which a health care professional is
granted privilege to practice in a particular
health care facility, establishing the
provider’s eligibility for patient care
assignments and liability coverage
19
Attribute Guidelines
Unique
name
– no plurals, possessives, articles,
conjunctions, verbs, or prepositions
Clear,
complete, unambiguous
description
Atomic (no positional information)
Domain with 2 or more values
Originates in only one entity
20
Primary Key
Attribute
of an entity whose values
uniquely determine its occurrences
– {birth certificate number, person name, birth
date, mother name, physician name}
– {facility, patient name, physician name, date,
temperature, pulse, blood pressure}
21
Primary Key Characteristics
Stable:
does not change over time
Minimal: fewest attributes necessary
Factless: no hidden information
Definitive: value always exists
Accessible: available when data created
Unique: absolutely no duplicates
22
Relationship
Semantic:
“contains”, “is part of”,
“belongs to”
One-to-one
– serotype of an organism
– immunization status of a child
One-to-many
– antibiotic resistances of an organism
– vaccines administered to a child
23
Relationship Guidelines
circular references (e.g. “health plans
-> markets -> products -> health plans”
should be “health plans -> health plan
markets -> market products <- health
plan products <- health plans”)
Single relationship between two entities
No recursive relationships
No
24
Single Relationship of Entities
Problem:
employee <--> job assignment
<--> job [double circular]
Solution: person --> job assignment <-job [“person” includes “status” as
employee or contractor]
25
Recursive Relationships
Problem:
supervisor <--> employee [may
be circular]
Solution: person {person ID } -->
employee-supervisor relationship
{employee ID, supervisor ID}
– multiple supervisory roles for each person
26
Normalization
Formalization
of common sense rules of
information organization
An attribute is functionally dependent on
X only if each of its values is determined
by the value of X (X may be composite)
Example: DOB is functionally dependent
on Driver’s License number
27
Key Normalization Concept
Functional
dependence of each entity
must be
– based on entire primary key
– NOT based on any other attributes
28
Benefits of Normalization
Aids
in database design, integration
Ensures precise capture of business logic
Minimizes redundancy
Minimizes need for null values
Prevents
– information loss
– unintentional results
29
Summary of Normalization
One
Fact in
One Place
30
Common DB Design Errors
Multiple
instances in same row of table,
e.g. first_value, second_value,
third_value [Problem: what to do with
4th value?]
Same data item repeated in multiple
places, e.g. address appears in two
different tables [Problem: how to keep
two values synchronized?]
31
Data Model Quality
Correctness
– Conceptual (concepts
represented properly)
» real world
representation
– Syntactic
(relationships
represented properly)
» real world language
32
Completeness
(wholeness)
– Conceptual (all
concepts represented)
– Syntactic (all
relationships
represented)
Database Design Pearls
Accommodate
all data needed
Correct relationships between data items
No duplicate representation
Anticipated retrievals use indexes
Meet confidentiality requirements
33
References - 1
Reingruber
MC & Gregory WW: The
Data Modeling Handbook (New York:
John Wiley & Sons, 1994)
Montgomery SL: Object-Oriented
Information Engineering (Boston: AP
Professional/Harcourt Brace, 1994)
34
References - 2
Codd
EF: The Relational Model for
Database Management (Reading, MA:
Addison-Wesley, 1990)
Date CJ: An Introduction to Database
Systems, 5th ed. (Reading, MA: AddisonWesley, 1990)
Duncan KA: Health Information and
Health Reform (San Francisco: JosseyBass, 1994)
35