Introduction to Database Systems Conceptual Modeling

Download Report

Transcript Introduction to Database Systems Conceptual Modeling

Introduction to Database Systems
Conceptual Modeling
Irvanizam Zamanhuri, M.Sc
Computer Science Study Program
Syiah Kuala University
http://www.informatika.unsyiah.ac.id
Email: [email protected]
Database Design Goes Through Several Stages
Problem
Requirements Analysis
Data Requirements
Data Analysis, Conceptual Design
Conceptual Schema
Logical Database Design
Logical Schema
Physical Database Design
Physical Schema
Conceptual and Logical Design
name
Conceptual Model:
Product
price
Relational Model:
(plus functional
dependencies)
Normalization:
Person
buys
name
ssn
University Data Requirements

A student has a name, which consists of a given name and a family
name, and a student ID. Each student is uniquely identified by his/her
student ID.

A course has a subject and a course ID. For each course, we want to
record the number of students taking that course and the type of
equipment being used for the course. A course is uniquely identified by
its course ID.

A student can be enrolled in an arbitrary number of courses, and an
arbitrary number of students can be enrolled in a course. For each
course in which they are enrolled students receive a lab mark and an
exam mark.

A course cannot exist if there is no student enrolled in it.

A school is distinguished by the honour's degree that it awards. We
also want to record to which faculty a school belongs. A student is
registered with at most one school, while a school can have an arbitrary
number of students.
University Data Requirements (cntd.)

A student is also registered for a year of study. An year of study is
identified by a number between 1 and 4. A student is registered for only
one year of study, but each cohort can have many students.

For each member of staff we want to record their name and their room
number. A member of staff is identified by the combination of these two
pieces of data. Staff are appraised by other staff. A member of staff has
no more than one appraiser.

Students can be allocated to a member of staff as their tutor. A student
can have no more than one tutor. The tutor and the student agree upon
a time slot for regular meetings.

For each year of study, there is one member of staff who acts as the
year tutor. A member of staff can only be responsible for one year of
study. Students can be registered for a year of study.

Courses are taught by members of staff. A course can have several
teachers, and a staff member can teach several courses.
Conceptual Design with the ER Model
The questions to ask:
 What are the entities (= objects, individuals) in the
organisation?
 Which relationships exist among the entities?
 What information do we want to store about these
entities and relationships?
 What are the business rules of the organisation?
 Which integrity constraints do arise from them?
The answers are represented in an Entity Relationship
Diagram (ER diagram)
Entities and Entity Sets/Types
Entity: An object distinguishable from other objects
(e.g., an employee)
 An entity is described by a set of attributes.
Examples of entities? Examples of things that are not entities?
Entity Set/Entity Type: A collection of similar entities
(e.g., all employees)
 All entities in an entity set have the same set of attributes.
 Each attribute has a domain.
 Each entity set has a key
(i.e., one or more attributes whose values uniquely identify an entity)
Graphical Representation of Entity Sets
given
family
name
STUDENT
studno

no. of
students
courseno
equip
COURSE
subject
Entity Sets are drawn as rectangles
 Attributes are drawn using ovals
 Simple vs. composite attributes
 Derived attributes are indicated by dashed lines
 The attributes making up the key are underlined
Composite Keys
name
STAFF
roomno

Some entities cannot be uniquely identified by the
values of a single attribute …
 … but may be identified by the combination of two or
more attribute values
 several attributes together make up a compound key
Relationships and Relationship Sets/Types
Relationship: An association between two or more entities
(e.g., “Joe Smith” is “enrolled” in “CS123”)
 Relationships may have attributes
Examples of relationships?
Relationship Set/Type: A collection of similar relationships
 An n-ary relationship type relates n entity types E1,…,En
 Each relationship involves n entities e1E1,…,en En
Examples of relationship sets?
An Instance of a Relationship Type
Student
Course
Enrol
r1
ST1
ST2
ST3
C1
r2
C2
C3
r3
ST4
r4
ST5
….
r5
ST6
….
….
Enrol
Student
r6
Course
Graphical Representation of Relationship Types
given
family
labmark
name
STUDENT
studno

ENROL
exammark
no. of
students
courseno
equip
COURSE
subject
Relationship sets are drawn as diamonds
How many labmarks can a student have?
Roles and Recursive Relationships
An entity type can

participate in several
relationship types
name
and
STAFF
roomno

participate more than once in
one relationship set
appraiser
appraisee
APPRAISAL
(taking on different “roles”)
Which are other examples of recursive relationships?
Multiplicity (cardinality) of
Relationship Types

one-one:
a
b
c
d
1
2
3

1
1
many-one:
1
2
3

1
a
b
c
d
many-many:
1
2
3
a
b
c
d
Sometimes letters m, n are used
to indicate the “many” side of
relationships.
Participation Constraints




Participation constraints specify whether or not an
entity must participate in a relationship set
When there is no participation constraint, it is
possible that an entity will not participate in a
relationship set
When there is a participation constraint, the entity
must participate at least once
Participation constraints are drawn using a
double line from the entity set to the relationship set
Mandatory Participation
Staff
Department
Works_for
r1
ST1
ST2
ST3
D1
r2
D2
D3
r3
ST4
r4
ST5
….
r5
ST6
….
….
Works-for
Staff
r6
1
Department
Optional and Mandatory Participation
Staff
Department
Manages
r1
ST1
ST2
ST3
D1
D2
D3
r2
ST4
r3
ST5
….
ST6
….
….
1
Staff
manages
1
Department
Many:many Relationship Type with Optional
and Mandatory Participation
Staff
Course
Teaches
r1
ST1
ST2
C1
r2
ST3
C2
C3
r3
ST4
C4
r4
ST5
….
….
r5
….
Teaches
Staff
r6
Course
Recursive Relationship Type with Optional
Participation
Staff
ST1
unmanaged ST2
ST3
ST4
ST5
ST6
….
r1 Manages
1
1
2
r2
2
2
2
1
2
r3
1
r4
r5
….
1
1. Manager
2. Employee
Staff
Manager
Employee
Manages
Summary: Properties of Relationship Types

Degree


Cardinality ratios


The number of participating entity types
The number of instances of each of the
participating entity types which can partake in a
single instance of the relationship type 1:1,
1:many, many:1, many:many
Participation (optionality)


Whether an entity instance has to participate in a
relationship instance
Represented with a double line
Attributes in ER Modelling

For every attribute we define




Domain or data type
Format, i.e., composite or atomic
whether it is derived
Every entity type must have as key an attribute or
a set of attributes
given
family
labmark
name
STUDENT
studno
ENROL
exammark
no. of
students
courseno
equip
COURSE
subject
ER Model of the University Requirements
given
family
1
studno
hons
m
STUDENT
faculty
m
m
n
SCHOOL
REG
name
YEARREG
year
1
labmark
YEAR
ENROL
TUTOR
1
exammark
slot
YEARTUTOR
courseno
m
COURSE
subject
equip
m
n
TEACH
name
roomno
1
1
STAFF
1
appraiser
m
appraisee
APPRAISAL
Exercise: Supervision of PhD Students
A database needs to be developed that keeps track of PhD students:

For each student store the name and matriculation number.
Matriculation numbers are unique.

Each student has exactly one address. An address consists of street,
town and post code, and is uniquely identified by this information.

For each lecturer store the name, staff ID and office number. Staff ID's
are unique.

Each student has exactly one supervisor. A staff member may
supervise a number of students.

The date when supervision began also needs to be stored.
Exercise: Supervision of PhD Students

For each research topic store the title and a short description. Titles
are unique.

Each student can be supervised in only one research topic, though
topics that are currently not assigned also need to be stored in the
database.
Task:
Design an entity relationship diagram that covers the requirements
above. Do not forget to include cardinality and participation
constraints.
Roles in Non-recursive Relationships

Also in nonrecursive relationships we may annotate
relationship links with the roles that entities play in
the relationship
given
1
m
STAFF
m STUDENT
1
roomno
name
SUPERVISE
name
family
EXAMINE
studno
What would be appropriate roles in this example?
Multiway (non-binary) Relationship
Relationships can involve more than two entity types…
roomno
name
STAFF
STAFF
p
given
family
courseno
equip
name
STUDENT
m
TUTORS
n
COURSE
slot
subject
studno
Strong and Weak Entities
Dominating and subordinate types are modeled as
 Entities
(also “strong”, or “identifying” entities)
and
 Weak entities
Identifying
entity
Supporting, or
identifying
relationship
Weak entity
customer
CUSTOMER
1
address
CUST-ORDER
m
ORDER
orderid
date
Strong and Weak Entities (Identifier Dependency)



A strong entity type has an identifying primary key
A weak entity’s key comes not (completely) from its
own attributes, but from the keys of one or more
entities to which it is linked by a supporting many-one
relationship.
A weak entity type does not have a primary key but
does have a discriminator
Identifying
entity
customer
CUSTOMER
1
address
CUST-ORDER
m
ORDER
orderid
date
Weak Entities May Depend on
Other Weak Entities
• Strong Entity
• Identifying entity for
ORDER
• Identifying entity for
ORDER_LINES
• Weak entity
• Identifying entity for
ORDER_LINES
c-name
CUSTOMER
address
1
CUST-ORDER
m
ORDER
orderid
date
1
ORDER-MAKEUP
m
• Weak entity
ORDER_
ORDER
LINES
LINE
lineno
quantity
Turning Relationships into Entities
Relationship types are less natural if
 the relationships have many attributes, or
 we want to model
a relationship with that relationship type
Example:
labmark
STUDENT
m
exammark
ENROL
TUTOR
STAFF
n
COURSE
Association Entity Types
An entity type that represents a relationship type:
given
family
courseno
equip
name
m
STUDENT
COURSE
1
subject
studno
m
STUD_ENROL
labmark




m
ENROL
1
COURSE_ENROL
exammark
The association entity type is a subordinate type
The participating entity types become dominating types
Attributes of the relationship become attributes of the entity
Relationships with the dominating entity types are many-one and
mandatory
How Does This Solve Our Problem?
STAFF
given
family
courseno
equip
name
m
TUTOR
STUDENT
COURSE
1
subject
studno
m
STUD_ENROL
m
ENROL
labmark
1
COURSE_ENROL
exammark
Association entity types can participate in any
relationship type
Constraints: Definition
A
constraint is an assertion about the
database that must be true at all times
 Constraints are part of the database
schema
Modeling Constraints
Finding constraints is part of the modeling process. They reflect
facts that hold in the world or business rules of an organisation.
Examples:
Keys: codice fiscale uniquely identifies a person
Single-value constraints: a person can have only one father
Referential integrity constraints: if you work for a company,
it must exist in the database
Domain constraints: peoples’ ages are between 0 and 150
Cardinality constraints: at most 100 students enroll in a course
Keys
A key is a set of attributes that uniquely identify an object or
entity:
Person: social-security-number (U.S.)
national insurance number (U.K.)
codice fiscale (Italy)
name
name + address
name + address + dob
(Why not “age”?)
Perfect keys are often hard to find, so organisations
usually invent something.
Do invented keys (e.g., course id) have drawbacks?
Variants of Keys

Multi-attribute (composite) keys:


E.g. name + address
Multiple keys:

E.g social-security-number, name + address
Functional Dependencies
in Multiway Relationships
“Every invoice, issued by a video store for some person, refers
only to one movie.”
Invoice
VideoStore
Rental
Movie
Person
The fact that the movie is functionally dependent on video-store,
person, and invoice is expressed with an arrow.
Functional Dependencies
in Multiway Relationships (cntd.)
Each arrow denotes a separate functional dependency:
Invoice
VideoStore
Rental
Person
Movie
Functional Dependencies
in Multiway Relationships (cntd.)
How can we say,
“The invoice determines the store”?
Invoice
VideoStore
Rental
Person
What about this?
Movie
Existence Constraints
Sometimes, the existence of an entity of type X
depends on the existence of an entity of type Y:
Examples:
 Book chapters presume the existence of a book
 Tracks on a CD presume the existence of the CD
 Orders depend on the existence of a customer
We call Y the dominating entity type and
X the subordinate type
Design Principles for EER Modeling
There are usually several ways to model a real world concept,
e.g.:
 entity vs. attribute
 entity vs. relationship
 binary vs. ternary relationships, etc.
Design choices can have an impact on
 redundancies among the data that we store
 integrity constraints captured by the database structure
Avoid Redundancy!
“Don’t say the same thing more than once.”
Redundancy wastes space and encourages inconsistency
Example:
name
name
Beer
name
ManfBy
Manf
Which is good design,
and which is bad?
manf
Beer
1
addr
Manf
addr
Why?
And What About This?
name
manf
Beer
name
ManfBy
1
addr
Manf
Entities Vs. Attributes
Sometime it is not clear
 which concepts are worthy of being entities, and
 which are handled more simply as attributes
Example:
Which are the pros and cons of each of
the two designs below?
name
Beer
name
ManfBy
1
Manf
name
manf
Beer
Entity Vs. Attribute: Rules of Thumb
Make an entity only if either:
1.
It is more than a name of something; i.e.,
it has non-key attributes or relationships
with a number of different entities, or
2.
It is the “many” in a many-one relationship
Entity Vs. Attribute: Example
The following design illustrates both points:
name
Beer


name
ManfBy
1
addr
Manf
Manfs deserves to be an entity because we record
addr, a non-key attribute
Beers deserves to be an entity because it is at the
“many” end
If not, we would have to make “set of beers”
an attribute of Manfs
Use the Schema to Enforce Constraints


The design schema should enforce as many
constraints as possible
Don't rely on future data to follow assumptions
Example:
 If the registrar wants to associate only one instructor
with a course,
 don't allow sets of instructors and
 don’t count on departments to enter only one
instructor per course
Hints for EER Modelling

Identify entity types by searching for nouns and noun
phrases
 Assume all entities are strong and check for weak ones
on a later pass
 You need an identifier for each strong entity
 Assume all relationships have optional participation
and check for mandatory (total) ones on a later pass
 Expect to keep changing your mind about whether
things are entities, relationships or attributes
 Keep the level of detail relevant and consistent
(for example leave out attributes at first)
 Approach diagram through different views …
… and merge them
Exercise: A Record Company Database



A record company wishes to use a computer database to help with
its operations regarding its performers, recordings and song
catalogue.
Songs have a unique song number, a non-unique title and a
composition date. A song can be written by a number of
composers; the composer’s full name is required. Songs are
recorded by recording artists (bands or solo performers). A song is
recorded as a track of a CD. A CD has many songs on it, called
tracks. CDs have a unique record catalogue number, a title and
must have a producer (the full name of the producer is required).
Each track must have the recording date and the track number of
the CD.
A song can appear on many (or no) CDs, and be recorded by many
different recording artists. The same recording artist might re-record
the same song on different CDs. A CD must have only 1 recording
artist appearing on it. CDs can be released a number of times, and
each time the release date and associated number of sales is
required.
References
In preparing these slides I have used several sources.
The main ones are the following:
Books:
 A First Course in Database Systems, by J. Ullman and J.
Widom
 Fundamentals of Database Systems, by R. Elmasri and S.
Navathe
Slides from Database courses held by the following people:
 Enrico Franconi (Free University of Bozen-Bolzano)
 Carol Goble and Ian Horrocks (University of Manchester)
 Warner Nutt (Free University of Bozen-Bolzano)