Databases and Database Design William A. Yasnoff, MD, PhD Oregon Health Division 1

Download Report

Transcript Databases and Database Design William A. Yasnoff, MD, PhD Oregon Health Division 1

Databases and Database Design
William A. Yasnoff, MD, PhD
Oregon Health Division
1
Databases & Database Design
 Introduction
 Relational
Databases
 Database Design: Data Models
2
Introduction
 Database
= large collection of
information
 Stored on hard disk
 Retrieval
– 100,000 items X 10 ms = 17 minutes
– Need to organize data to improve retrieval
speed
– How do you organize your paper
files?
3
Introduction (continued)
 Data
Organization
– sorting data: phonebook in alpha v. random
order
– “tabs” at key points: dictionary with tabs
for each letter
– direct pointers to information = index
» table of key element value and address of other
data
» e.g. name & mailbox number --> letters
in mailbox
4
Relational Database Model
5
Relational Database Model
 Developed
by EF Codd, CJ Date (70s)
 Table = Entity = Relation
 Table row = tuple = instance
 Table column = attribute
 Table linkage by values
 Entity-Relationship Model
6
Attributes
 Key
– uniquely identify row/tuple
– may be one or more attributes
 Non-key
– other properties of instance
– dependent on key
7
Retrieval in RDMS
 SQL
– Select ...
– From ...
» optional: can be computed
– Where ...
 Query
by example
– Fill in the blanks
– WONDER
8
Advantages of RDMS
 Very
strong theoretical basis
– storage
– retrieval
 Easy
to implement
 Conflicts and anomalies can be avoided
 Intuitive appeal
 Easy retrieval
9
Relational Model: Summary
 popular
method of organizing data
 strong theoretical properties facilitate
retrieval
 relation = table
 attribute = column
 tuple = row
 key = {attributes} that uniquely
identify each row
10
Relational Model: Summary 2
 Allows
complex data relationships with
multiple tables:
– {patient id, patient demographics}
– {patient id, patient visit date, blood
pressure}
 Structured
Query Language (SQL)
retrieval
» Select patient _name where blood_lead_level > 10
11
Database Design: Data Models
12
Data Models
 Definition: A representation
of the data
and and data relationships of an activity
– data: case report of serious E. Coli illness
– data: E. Coli serotype O157
– relationship: case report “contains” serotype
(semantic relationship)
 Database
Design = Development of a
quality data model
13
Data Model Quality

Correctness
– Conceptual (concepts
represented properly)
» real world
representation
– Syntactic
(relationships
represented properly)
» real world language
14

Completeness
(wholeness)
– Conceptual (all
concepts represented)
– Syntactic (all
relationships
represented)
Entity
 “Any
distinguishable object that is to be
represented in a database” [C.J. Date]
 Properties
–
–
–
–
within scope of model
single concept
a set of distinguishable “instances”
satisfies normalization rules
 Also
15
called a relation
Entity Guidelines
 Relationship
with at least one other
entity
 Unique, descriptive name
 Class or set of things (not just one)
 Single meaning (no homonyms)
 No synonyms (two entities describing
same class)
16
Attribute
 “a
type or characteristic of an entity”
(e.g. “gender” is an attribute of the entity
“patient”)
 Domain = the set of values from which an
attribute may be selected (e.g. the
domain of the attribute “gender” is
[male, female])
 An entity typically has many attributes
17
Attribute Characteristics
 Key
– value uniquely identifies entity
» e.g. “Lab test ID” is key attribute of “specimen”
 Non-Key
– value does not uniquely identify entity
» e.g. “author” does not uniquely identify
“publication”
18
Attribute Characteristics
 Atomic
= individual data value (one and
only one fact)
 Description = complete and clear
definition
– e.g. “professional privilege date” = The date
on which a health care professional is
granted privilege to practice in a particular
health care facility, establishing the
provider’s eligibility for patient care
assignments and liability coverage
19
Attribute Guidelines
 Unique
name
– no plurals, possessives, articles,
conjunctions, verbs, or prepositions
 Clear,
complete, unambiguous
description
 Atomic (no positional information)
 Domain with 2 or more values
 Originates in only one entity
20
Primary Key
 Attribute
of an entity whose values
uniquely determine its occurrences
– {birth certificate number, person name, birth
date, mother name, physician name}
– {facility, patient name, physician name, date,
temperature, pulse, blood pressure}
21
Primary Key Characteristics
 Stable:
does not change over time
 Minimal: fewest attributes necessary
 Factless: no hidden information
 Definitive: value always exists
 Accessible: available when data created
 Unique: absolutely no duplicates
22
Relationship
 Semantic:
“contains”, “is part of”,
“belongs to”
 One-to-one
– serotype of an organism
– immunization status of a child
 One-to-many
– antibiotic resistances of an organism
– vaccines administered to a child
23
Relationship Guidelines
circular references (e.g. “health plans
-> markets -> products -> health plans”
should be “health plans -> health plan
markets -> market products <- health
plan products <- health plans”)
 Single relationship between two entities
 No recursive relationships
 No
24
Single Relationship of Entities
 Problem:
employee <--> job assignment
<--> job [double circular]
 Solution: person --> job assignment <-job [“person” includes “status” as
employee or contractor]
25
Recursive Relationships
 Problem:
supervisor <--> employee [may
be circular]
 Solution: person {person ID } -->
employee-supervisor relationship
{employee ID, supervisor ID}
– multiple supervisory roles for each person
26
Normalization
 Formalization
of common sense rules of
information organization
 An attribute is functionally dependent on
X only if each of its values is determined
by the value of X (X may be composite)
 Example: DOB is functionally dependent
on Driver’s License number
27
Key Normalization Concept
 Functional
dependence of each entity
must be
– based on entire primary key
– NOT based on any other attributes
28
Benefits of Normalization
 Aids
in database design, integration
 Ensures precise capture of business logic
 Minimizes redundancy
 Minimizes need for null values
 Prevents
– information loss
– unintentional results
29
Summary of Normalization
One
Fact in
One Place
30
Common DB Design Errors
 Multiple
instances in same row of table,
e.g. first_value, second_value,
third_value [Problem: what to do with
4th value?]
 Same data item repeated in multiple
places, e.g. address appears in two
different tables [Problem: how to keep
two values synchronized?]
31
Data Model Quality

Correctness
– Conceptual (concepts
represented properly)
» real world
representation
– Syntactic
(relationships
represented properly)
» real world language
32

Completeness
(wholeness)
– Conceptual (all
concepts represented)
– Syntactic (all
relationships
represented)
Database Design Pearls
 Accommodate
all data needed
 Correct relationships between data items
 No duplicate representation
 Anticipated retrievals use indexes
 Meet confidentiality requirements
33
References - 1
 Reingruber
MC & Gregory WW: The
Data Modeling Handbook (New York:
John Wiley & Sons, 1994)
 Montgomery SL: Object-Oriented
Information Engineering (Boston: AP
Professional/Harcourt Brace, 1994)
34
References - 2
 Codd
EF: The Relational Model for
Database Management (Reading, MA:
Addison-Wesley, 1990)
 Date CJ: An Introduction to Database
Systems, 5th ed. (Reading, MA: AddisonWesley, 1990)
 Duncan KA: Health Information and
Health Reform (San Francisco: JosseyBass, 1994)
35