Document 7222296


Data Modeling
Introduction

The presentation will address the following questions:
■ What is systems modeling, and what is the difference between logical and physical system models?
■ What is data modeling, and what are its benefits?
■ Can you recognize and understand the basic concepts and constructs of a data model?
■ Can you read and interpret an entity relationship data model?
■ When in a project are data models constructed, and where are they stored?
■ Can you discover entities and relationships?
■ Can you construct an entity relationship context diagram?
■ Can you discover or invent keys for entities?
■ Can you construct a fully attributed entity relationship diagram and describe all data structures and attributes in the repository or encyclopedia?

An Introduction to Systems Modeling

Systems Modeling

One way to structure unstructured problems is to draw models.
■ A model is a representation of reality. Just as a picture is worth a thousand words, most system models are pictorial representations of reality.
Models can be built for existing systems as a way to better understand those systems, or for proposed systems as a way to document business requirements or technical designs.
What are Logical Models?
■ Logical models show what a system ‘is’ or ‘does’. They are implementation-independent; that is, they depict the system independent of any technical implementation. As such, logical models illustrate the essence of the system.
What are Physical Models?
■ Physical models show not only what a system ‘is’ or ‘does’, but also how the system is physically and technically implemented. They are implementation-dependent because they reflect technology choices and the limitations of those technology choices.
Systems analysts use logical system models to depict business requirements, and physical system models to depict technical designs.
Systems analysis activities tend to focus on logical system models for the following reasons:
■ Logical models remove biases that result from the way the current system is implemented or the way that any one person thinks the system might be implemented.
■ Logical models reduce the risk of missing business requirements through preoccupation with technical details.
■ Logical models allow us to communicate with end-users in non-technical or less technical language.
Data modeling is a technique for defining business requirements for a database.
■ Data modeling is a technique for organizing and documenting a system’s DATA. It is sometimes called database modeling because a data model is usually implemented as a database, and it is sometimes called information modeling.
Many experts consider data modeling to be the most important of the modeling techniques.
Why is data modeling considered crucial?
■ Data is viewed as a resource to be shared by as many processes as possible. As a result, data must be organized in a way that is flexible and adaptable to unanticipated business requirements, and that is the purpose of data modeling.
Why is data modeling considered crucial? (continued)
■ Data structures and properties are reasonably permanent, certainly far more stable than the processes that use the data. Often the data model of a current system is nearly identical to that of the desired system.
■ Data models are much smaller than process and object models and can be constructed more rapidly.
■ The process of constructing data models helps analysts and users quickly reach consensus on business terminology and rules.

Figure: A sample entity relationship data model.

CUSTOMER
Customer Number (PK)
Customer Name
Shipping Address
Billing Address
Balance Due

ORDER
Order Number (PK)
Order Date
Order Total Cost
Customer Number (FK)

ORDERED PRODUCT
Ordered Product ID (PK)
. Order Number (FK)
. Product Number (FK)
Quantity Ordered
Unit Price at Time of Order

INVENTORY PRODUCT
Product Number (PK)
Product Name
Product Unit of Measure
Product Unit Price

Relationships: CUSTOMER has placed ORDER; ORDER sold ORDERED PRODUCT; INVENTORY PRODUCT sold as ORDERED PRODUCT.
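As a hedged illustration of how this logical model might later become a physical one, the sketch below declares it as SQLite tables through Python's standard `sqlite3` module. The table and column names, the column types, and the `balance_due` default of 0 are assumptions for illustration; ORDER is renamed `customer_order` because ORDER is an SQL keyword. The concatenated key of ORDERED PRODUCT becomes a composite primary key whose parts are also foreign keys.

```python
import sqlite3

# Physical sketch of the sample CUSTOMER / ORDER / ORDERED PRODUCT / INVENTORY PRODUCT model.
# Column types and defaults are assumptions; the logical model does not specify them.
SCHEMA = """
CREATE TABLE customer (
    customer_number  INTEGER PRIMARY KEY,
    customer_name    TEXT NOT NULL,
    shipping_address TEXT,
    billing_address  TEXT,
    balance_due      REAL NOT NULL DEFAULT 0
);
-- ORDER is an SQL keyword, so the table is named customer_order.
CREATE TABLE customer_order (
    order_number     INTEGER PRIMARY KEY,
    order_date       TEXT NOT NULL,
    order_total_cost REAL NOT NULL DEFAULT 0,
    customer_number  INTEGER NOT NULL REFERENCES customer (customer_number)
);
CREATE TABLE inventory_product (
    product_number          INTEGER PRIMARY KEY,
    product_name            TEXT NOT NULL,
    product_unit_of_measure TEXT,
    product_unit_price      REAL NOT NULL
);
-- Concatenated primary key: each part is also a foreign key.
CREATE TABLE ordered_product (
    order_number                INTEGER NOT NULL REFERENCES customer_order (order_number),
    product_number              INTEGER NOT NULL REFERENCES inventory_product (product_number),
    quantity_ordered            INTEGER NOT NULL,
    unit_price_at_time_of_order REAL NOT NULL,
    PRIMARY KEY (order_number, product_number)
);
"""

def build_order_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
    conn.executescript(SCHEMA)
    return conn

conn = build_order_db()
conn.execute("INSERT INTO customer (customer_number, customer_name) VALUES (1, 'Ada')")
conn.execute("INSERT INTO customer_order (order_number, order_date, customer_number) "
             "VALUES (100, '01152024', 1)")
```

With foreign keys enabled, SQLite rejects an order whose customer number matches no customer, which is one way a physical model can enforce the relationship drawn in the figure.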

System Concepts for Data Modeling

System Concepts

Most systems analysis techniques are strongly rooted in systems thinking.
■ Systems thinking is the application of formal systems theory and concepts to systems problem solving.
There are several notations for data modeling, but the actual model is frequently called an entity relationship diagram (ERD).
■ An ERD depicts data in terms of the entities and relationships described by the data.

Entities

Figure: An entity (STUDENT).

All systems contain data.
Data describes ‘things’.
A concept to abstractly represent all instances of a group of similar ‘things’ is called an entity.
■ An entity is something about which we want to store data. Synonyms include entity type and entity class.
■ An entity is a class of persons, places, objects, events, or concepts about which we need to capture and store data.
■ An entity instance is a single occurrence of an entity.

Attributes

Figure: Attributes and compound attributes.
STUDENT
Name
. Last Name
. First Name
. Middle Initial
Address
. Street Address
. City
. State or Province
. Country
. Postal Code
Phone Number
. Area Code
. Exchange Number
. Number Within Exchange
Date of Birth
Gender
Race
Major
Grade Point Average

The pieces of data that we want to store about each instance of a given entity are called attributes.
■ An attribute is a descriptive property or characteristic of an entity. Synonyms include element, property, and field.
Some attributes can be logically grouped into super-attributes called compound attributes.
■ A compound attribute is one that actually consists of more primitive attributes. Synonyms in different data modeling languages are numerous: concatenated attribute, composite attribute, and data structure.

Domains:
■ The values for each attribute are defined in terms of three properties: data type, domain, and default.
• The data type for an attribute defines what class of data can be stored in that attribute.
• For purposes of systems analysis and business requirements definition, it is useful to declare logical (non-technical) data types for our business attributes.
• An attribute’s data type determines its domain.
– The domain of an attribute defines what values the attribute can legitimately take on.
• Every attribute should have a logical default value.
– The default value for an attribute is the value that will be recorded if the user does not specify one.

Logical Data Type | Logical Business Meaning
NUMBER | Any number, real or integer.
TEXT | A string of characters, inclusive of numbers. When numbers are included in a TEXT attribute, it means we do not expect to perform arithmetic or comparisons with those numbers.
MEMO | Same as TEXT but of an indeterminate size. Some business systems require the ability to attach potentially lengthy notes to a given database record.
DATE | Any date in any format.
TIME | Any time in any format.
YES/NO | An attribute that can assume only one of two values: yes or no.
VALUE SET | A finite set of values. In most cases, a coding scheme would be established (e.g., FR=freshman, SO=sophomore, JR=junior, SR=senior, etc.).
IMAGE | Any picture or image.

Data Type | Domain | Examples
NUMBER | For integers, specify the range: {minimum - maximum}. For real numbers, specify the range and precision: {minimum.precision - maximum.precision}. | {10 - 99}; {1.000 - 799.999}
TEXT | TEXT (maximum size of attribute). Actual values are usually infinite; however, users may specify certain narrative restrictions. | TEXT (30)
MEMO | Not applicable. There are no restrictions on size or content. | Not applicable.
DATE | Variation on the MMDDYYYY format. To accommodate the year 2000, do not abbreviate the year to YY. Formatting characters are rarely stored; therefore, do not include hyphens or slashes. | MMDDYYYY; MMYYYY; YYYY
TIME | For AM/PM times: HHMMT, or else HHMM. | HHMMT; HHMM

Default Value | Interpretation | Examples
A legal value from the domain (as described above) | For an instance of the attribute, if the user does not specify a value, then use this value. | 0; 1.00; FR
NONE or NULL | For an instance of the attribute, if the user does not specify a value, then leave it blank. | NONE; NULL
REQUIRED or NOT NULL | For an instance of the attribute, require the user to enter a legal value from the domain. (This is used when no value in the domain is common enough to be a default, but some value must be entered.) | REQUIRED; NOT NULL
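The three properties can be sketched in code. The Python example below is illustrative rather than part of any particular method or tool: it treats a domain as a predicate over candidate values and distinguishes the three kinds of default from the table above, namely a legal domain value, NONE/NULL (represented as `None`), and REQUIRED/NOT NULL (represented by a hypothetical `REQUIRED` sentinel). The attribute names and domains are assumptions.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional

REQUIRED = object()  # hypothetical sentinel: REQUIRED / NOT NULL, no default allowed

@dataclass(frozen=True)
class AttributeSpec:
    """Logical definition of one attribute: its domain and its default."""
    name: str
    in_domain: Callable[[Any], bool]  # predicate: is this a legitimate value?
    default: Any = None               # a legal domain value, None (NONE/NULL), or REQUIRED

    def resolve(self, supplied: Optional[Any] = None) -> Any:
        """Return the value to record; `supplied` is None when the user gave no value."""
        if supplied is None:
            if self.default is REQUIRED:
                raise ValueError(f"{self.name}: a value from the domain is required")
            return self.default  # the default value, or None to leave it blank
        if not self.in_domain(supplied):
            raise ValueError(f"{self.name}: {supplied!r} is outside the domain")
        return supplied

# Hypothetical attributes, one for each kind of default.
class_standing = AttributeSpec("Class Standing",                      # VALUE SET, default FR
                               lambda v: v in {"FR", "SO", "JR", "SR"}, default="FR")
middle_initial = AttributeSpec("Middle Initial",                      # TEXT (1), default NONE
                               lambda v: isinstance(v, str) and len(v) <= 1)
enrollment_cap = AttributeSpec("Enrollment Cap",                      # NUMBER {10 - 99}, REQUIRED
                               lambda v: isinstance(v, int) and 10 <= v <= 99, default=REQUIRED)

print(class_standing.resolve(), middle_initial.resolve(), enrollment_cap.resolve(42))  # FR None 42
```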

Identification:
■ An entity typically has many instances, perhaps thousands or millions, and there is a need to uniquely identify each instance based on the data value of one or more attributes.
■ Every entity must have an identifier or key.
• A key is an attribute, or a group of attributes, that assumes a unique value for each entity instance. It is sometimes called an identifier.
Sometimes more than one attribute is required to uniquely identify an instance of an entity.
• A group of attributes that uniquely identifies an instance of an entity is called a concatenated key. Synonyms include composite key and compound key.
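Whether an attribute group can act as a key can be tested against a sample of instances. The sketch below uses a hypothetical helper, `is_unique_identifier`, that checks whether the group assumes a unique, non-null value for every instance. The course data is invented; it shows that neither Subject Abbreviation nor Course Number alone identifies a course, while the two together form a concatenated key.

```python
from typing import Iterable, Mapping, Sequence

def is_unique_identifier(rows: Iterable[Mapping[str, object]], attrs: Sequence[str]) -> bool:
    """True if the attribute group has a unique, non-null value for every instance."""
    seen = set()
    for row in rows:
        value = tuple(row.get(attr) for attr in attrs)
        if any(part is None for part in value) or value in seen:
            return False
        seen.add(value)
    return True

courses = [
    {"subject_abbreviation": "CS", "course_number": 101, "course_title": "Intro to Programming"},
    {"subject_abbreviation": "CS", "course_number": 201, "course_title": "Data Structures"},
    {"subject_abbreviation": "MATH", "course_number": 101, "course_title": "Calculus I"},
]

print(is_unique_identifier(courses, ["subject_abbreviation"]))                   # False
print(is_unique_identifier(courses, ["course_number"]))                          # False
print(is_unique_identifier(courses, ["subject_abbreviation", "course_number"]))  # True
```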
■ Frequently, an entity has more than one key.
■ Each of these keys is called a candidate key.
• A candidate key is a ‘candidate to become the primary identifier’ of instances of an entity. It is sometimes called a candidate identifier. (Note: A candidate key may be a single attribute or a concatenated key.)
• A primary key is the candidate key that will most commonly be used to uniquely identify a single entity instance.
• Any candidate key that is not selected to become the primary key is called an alternate key.
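Candidate keys can be discovered, at least tentatively, by searching for minimal attribute groups whose values are unique across sample instances. The brute-force Python sketch below does this with `itertools.combinations`; the `candidate_keys` helper, the sample students, and the choice of the first key found as the primary key are assumptions for illustration. Note that (last_name, major) comes out as a candidate key only because the sample is small: uniqueness in sample data is necessary but not sufficient, so the business rules must confirm every candidate.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

def candidate_keys(rows: List[Dict[str, object]], attributes: Sequence[str],
                   max_size: int = 2) -> List[Tuple[str, ...]]:
    """Return minimal attribute groups whose values are unique across the sample rows."""
    def unique(group: Tuple[str, ...]) -> bool:
        return len({tuple(row[a] for a in group) for row in rows}) == len(rows)

    keys: List[Tuple[str, ...]] = []
    for size in range(1, max_size + 1):
        for group in combinations(attributes, size):
            if any(set(key) <= set(group) for key in keys):
                continue  # a superset of a known key is not minimal
            if unique(group):
                keys.append(group)
    return keys

students = [
    {"student_number": 1001, "ssn": "000-00-0001", "last_name": "Lee", "major": "CS"},
    {"student_number": 1002, "ssn": "000-00-0002", "last_name": "Kim", "major": "CS"},
    {"student_number": 1003, "ssn": "000-00-0003", "last_name": "Lee", "major": "Art"},
]

keys = candidate_keys(students, ["student_number", "ssn", "last_name", "major"])
primary_key, alternate_keys = keys[0], keys[1:]  # assumption: first candidate becomes primary
print(keys)  # [('student_number',), ('ssn',), ('last_name', 'major')]
```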

Figure: Keys and subsetting criteria.
STUDENT
Student Number (Primary Key 1)
Name (Alternate Key 1)
. Last Name
. First Name
. Middle Initial
Address
. Street Address
. City
. State or Province
. Country
. Postal Code
Phone Number
. Area Code
. Exchange Number
. Number Within Exchange
Date of Birth
Gender (Subsetting Criteria 1)
Race (Subsetting Criteria 2)
Major (Subsetting Criteria 3)
Grade Point Average

Identification: (continued)
■ Sometimes it is also necessary to identify a subset of entity instances, as opposed to a single instance.
• For example, we may require a simple way to identify all male students and all female students.
• A subsetting criterion is an attribute (or concatenated attribute) whose finite values divide all entity instances into useful subsets. Some methods call this an inversion entry.
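A subsetting criterion can be sketched as a grouping operation. In the Python example below, the hypothetical `subsets_by` helper divides sample STUDENT instances by Major (Subsetting Criteria 3 in the figure); the data is invented. In a relational database, a secondary index on the criterion attribute often serves a similar purpose.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List, Mapping

def subsets_by(rows: Iterable[Mapping[str, Hashable]], criterion: str,
               key: str) -> Dict[Hashable, List[Hashable]]:
    """Divide entity instances into subsets by the values of a subsetting criterion.

    Returns {criterion value: [key of each instance in that subset]}.
    """
    groups: Dict[Hashable, List[Hashable]] = defaultdict(list)
    for row in rows:
        groups[row[criterion]].append(row[key])
    return dict(groups)

students = [
    {"student_number": 1001, "major": "History"},
    {"student_number": 1002, "major": "Biology"},
    {"student_number": 1003, "major": "History"},
]

print(subsets_by(students, "major", "student_number"))
# {'History': [1001, 1003], 'Biology': [1002]}
```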

Relationships

Conceptually, entities and attributes do not exist in isolation. Entities interact with, and impact, one another via relationships to support the business mission.
■ A relationship is a natural business association that exists between one or more entities. The relationship may represent an event that links the entities, or merely a logical affinity that exists between the entities.
■ A connecting line between two entities on an ERD represents a relationship.
■ A verb phrase describes the relationship.
• All relationships are implicitly bidirectional, meaning that they can be interpreted in both directions.

Figure: A relationship. STUDENT is enrolled in CURRICULUM; CURRICULUM is being studied by STUDENT.

Cardinality:
■ Each relationship on an ERD also depicts the complexity or degree of the relationship; this is called cardinality.
• Cardinality defines the minimum and maximum number of occurrences of one entity for a single occurrence of the related entity. Because all relationships are bidirectional, cardinality must be defined in both directions for every relationship.

Figure 5.3
Cardinality Interpretation | Minimum Instances | Maximum Instances
Exactly one | 1 | 1
Zero or one | 0 | 1
One or more | 1 | many (> 1)
Zero, one, or more | 0 | many (> 1)
More than one | > 1 | > 1
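Because cardinality is defined by a minimum and a maximum in each direction, it can be checked mechanically against sample data. In the Python sketch below, `MANY` (no upper bound) and the `cardinality_violations` helper are hypothetical names; the example applies assumed cardinalities to CUSTOMER has placed ORDER: zero, one, or more orders per customer, and exactly one customer per order.

```python
from collections import Counter
from typing import Hashable, Iterable, List, Optional, Set, Tuple

MANY = None                              # "many": no upper bound
Cardinality = Tuple[int, Optional[int]]  # (minimum, maximum)

def within(count: int, cardinality: Cardinality) -> bool:
    """True if `count` lies between the minimum and maximum (inclusive)."""
    minimum, maximum = cardinality
    return count >= minimum and (maximum is MANY or count <= maximum)

def cardinality_violations(left_ids: Iterable[Hashable], right_ids: Iterable[Hashable],
                           links: Set[Tuple[Hashable, Hashable]],
                           left_to_right: Cardinality,
                           right_to_left: Cardinality) -> List[Tuple[str, Hashable]]:
    """Check a binary relationship in both directions.

    left_to_right: how many right instances each left instance may relate to.
    right_to_left: how many left instances each right instance may relate to.
    """
    per_left = Counter(left for left, _ in links)
    per_right = Counter(right for _, right in links)
    violations = [("left", i) for i in left_ids if not within(per_left[i], left_to_right)]
    violations += [("right", i) for i in right_ids if not within(per_right[i], right_to_left)]
    return violations

customers = [1, 2]
orders = [100, 101, 102]
placed = {(1, 100), (1, 101)}  # (customer_number, order_number)

print(cardinality_violations(customers, orders, placed, (0, MANY), (1, 1)))
# [('right', 102)]  -> order 102 was placed by no customer
```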

Degree:
■ The degree of a relationship is the number of entities that participate in the relationship.
• A binary relationship has a degree of 2, because two different entities participate in the relationship.
Relationships may also exist between different instances of the same entity.
• This is called a recursive relationship (sometimes called a unary relationship; degree = 1).

Figure: A recursive relationship.
COURSE
Course Id (Primary Key)
. Subject Abbreviation
. Course Number
Course Title
Course Credit
Relationship: COURSE is a prerequisite for COURSE; COURSE has as a prerequisite COURSE.
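A recursive relationship can be sketched in SQLite via Python's `sqlite3` module. Because a course can have several prerequisites and be a prerequisite for several courses, this sketch assumes the relationship is stored in a separate `course_prerequisite` table whose two foreign keys both reference COURSE; the course data is invented, and Course Id is simplified to a single text column. A recursive common table expression then follows the ‘has as a prerequisite’ direction transitively.

```python
import sqlite3
from typing import List

def build_catalog() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
    conn.executescript("""
        CREATE TABLE course (
            course_id    TEXT PRIMARY KEY,  -- simplified: subject abbreviation + course number
            course_title TEXT NOT NULL
        );
        -- Recursive relationship: both foreign keys reference COURSE.
        CREATE TABLE course_prerequisite (
            course_id       TEXT NOT NULL REFERENCES course (course_id),
            prerequisite_id TEXT NOT NULL REFERENCES course (course_id),
            PRIMARY KEY (course_id, prerequisite_id)
        );
        INSERT INTO course VALUES
            ('CS101', 'Intro to Programming'),
            ('CS201', 'Data Structures'),
            ('MATH150', 'Discrete Mathematics'),
            ('CS301', 'Database Systems');
        INSERT INTO course_prerequisite VALUES
            ('CS201', 'CS101'),
            ('CS301', 'CS201'),
            ('CS301', 'MATH150');
    """)
    return conn

def all_prerequisites(conn: sqlite3.Connection, course_id: str) -> List[str]:
    """Follow 'has as a prerequisite' transitively with a recursive CTE."""
    rows = conn.execute("""
        WITH RECURSIVE chain(id) AS (
            SELECT prerequisite_id FROM course_prerequisite WHERE course_id = ?
            UNION  -- UNION (not UNION ALL) also stops on accidental cycles
            SELECT p.prerequisite_id
            FROM course_prerequisite AS p JOIN chain AS c ON p.course_id = c.id
        )
        SELECT id FROM chain ORDER BY id
    """, (course_id,)).fetchall()
    return [row[0] for row in rows]

conn = build_catalog()
print(all_prerequisites(conn, "CS301"))  # ['CS101', 'CS201', 'MATH150']
```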

Degree: (continued)
■ Relationships can also exist between more than two different entities.
• These are sometimes called N-ary relationships.
• A relationship existing among three entities is called a 3-ary or ternary relationship.
• An N-ary relationship may be associated with an associative entity.
– An associative entity is an entity that inherits its primary key from more than one other entity (parents). Each part of that concatenated key points to one and only one instance of each of the connecting entities.

Figure: An associative entity.

INSTRUCTOR
Instructor ID Code (Primary Key)
Instructor Name
. Last Name
. First Name
. Middle Initial

COURSE
Course ID (Primary Key)
. Subject Abbreviation
. Course Number
Course Title
Credit

ROOM
Classroom ID
. Building Abbreviation
. Room Number
Number of Seats

SCHEDULED CLASS
Scheduled Class ID (Primary Key)
. Course ID
. Instructor ID
. Room ID
Division Number
Days of Week
Start Time
End Time

Relationships: COURSE meets as SCHEDULED CLASS; INSTRUCTOR is assigned to SCHEDULED CLASS; ROOM is assigned to SCHEDULED CLASS.
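The associative entity above might be declared in SQLite as sketched below (Python `sqlite3`). The primary key of `scheduled_class` is inherited from its three parents, and each part is a foreign key, so each part points to exactly one COURSE, INSTRUCTOR, or ROOM instance. Table and column names are assumptions, the compound Course ID and Classroom ID are simplified to single text columns, and the sample rows are invented.

```python
import sqlite3
from typing import List, Tuple

def build_schedule_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
    conn.executescript("""
        CREATE TABLE course     (course_id     TEXT PRIMARY KEY, course_title    TEXT NOT NULL);
        CREATE TABLE instructor (instructor_id TEXT PRIMARY KEY, last_name       TEXT NOT NULL);
        CREATE TABLE room       (room_id       TEXT PRIMARY KEY, number_of_seats INTEGER NOT NULL);
        -- Associative entity: its primary key is inherited from three parents,
        -- and each part of the key is a foreign key to one parent.
        CREATE TABLE scheduled_class (
            course_id       TEXT NOT NULL REFERENCES course (course_id),
            instructor_id   TEXT NOT NULL REFERENCES instructor (instructor_id),
            room_id         TEXT NOT NULL REFERENCES room (room_id),
            division_number INTEGER,
            days_of_week    TEXT,
            start_time      TEXT,   -- HHMM
            end_time        TEXT,   -- HHMM
            PRIMARY KEY (course_id, instructor_id, room_id)
        );
    """)
    return conn

def schedule_details(conn: sqlite3.Connection) -> List[Tuple]:
    """Join each scheduled class back to the one course, instructor, and room it points to."""
    return conn.execute("""
        SELECT c.course_title, i.last_name, r.room_id, r.number_of_seats
        FROM scheduled_class AS s
        JOIN course     AS c ON c.course_id     = s.course_id
        JOIN instructor AS i ON i.instructor_id = s.instructor_id
        JOIN room       AS r ON r.room_id       = s.room_id
    """).fetchall()

conn = build_schedule_db()
conn.execute("INSERT INTO course VALUES ('CS201', 'Data Structures')")
conn.execute("INSERT INTO instructor VALUES ('I07', 'Garcia')")
conn.execute("INSERT INTO room VALUES ('SCI110', 40)")
conn.execute("INSERT INTO scheduled_class VALUES ('CS201', 'I07', 'SCI110', 1, 'MWF', '0900', '0950')")
print(schedule_details(conn))  # [('Data Structures', 'Garcia', 'SCI110', 40)]
```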

Foreign Keys:
■ A relationship implies that instances of one entity are related to instances of another entity.
■ To be able to identify those instances for any given entity, the primary key of one entity must be migrated into the other entity as a foreign key.
• A foreign key is a primary key of one entity that is contributed to (duplicated in) another entity for the purpose of identifying instances of a relationship. A foreign key (always in a child entity) always matches the primary key (in a parent entity).

Figure: A foreign key.

CURRICULUM
Program of Study Code (Primary Key)
Title of Program
Type of Degree Awarded (Subsetting Criteria 1)
Department Number (Foreign Key)

DEPARTMENT
Department Number (Primary Key)
Department Name

Relationship: DEPARTMENT offers CURRICULUM; CURRICULUM is offered by DEPARTMENT.
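Because a foreign key must always match a primary key in the parent entity, one simple check is to look for child instances whose foreign key value matches no parent. The Python sketch below uses a hypothetical `orphaned_children` helper and invented DEPARTMENT and CURRICULUM instances; a DBMS with declared foreign keys performs the equivalent check automatically.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]

def orphaned_children(children: List[Row], fk: str, parents: List[Row], pk: str) -> List[Row]:
    """Child instances whose foreign key value matches no parent primary key."""
    parent_keys = {parent[pk] for parent in parents}
    return [child for child in children if child[fk] not in parent_keys]

departments = [
    {"department_number": 10, "department_name": "Computer Science"},
    {"department_number": 20, "department_name": "History"},
]
curricula = [
    {"program_of_study_code": "BSCS", "department_number": 10},
    {"program_of_study_code": "BAHI", "department_number": 20},
    {"program_of_study_code": "BAMU", "department_number": 30},  # no DEPARTMENT 30
]

orphans = orphaned_children(curricula, "department_number", departments, "department_number")
print([c["program_of_study_code"] for c in orphans])  # ['BAMU']
```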

Foreign Keys: (continued)
■ A relationship in which you cannot differentiate between parent and child is called a non-specific relationship.
• A non-specific relationship (or many-to-many relationship) is one in which many instances of one entity are associated with many instances of another entity. Such relationships are suitable only for preliminary data models, and should be resolved as quickly as possible.
• All non-specific relationships can be resolved into a pair of one-to-many relationships by inserting an associative entity between the two original entities.

FIGURE (a) A non-specific (many-to-many) relationship.

STUDENT
Student Number (Primary Key 1)
Name (Alternate Key 1)
. Last Name
. First Name
. Middle Initial
Address
. Street Address
. City
. State or Province
. Country
. Postal Code
Phone Number
. Area Code
. Exchange Number
. Number Within Exchange
Date of Birth
Gender (Subsetting Criteria 1)
Race (Subsetting Criteria 2)
Grade Point Average

CURRICULUM
Program of Study Code (Primary Key)
Title of Program
Type of Degree Awarded (Subsetting Criteria 1)

Relationship verb phrases: is enrolled in; applies to.

FIGURE (b) The relationship resolved with the associative entity MAJOR.

STUDENT
(same attributes as in Figure (a))

MAJOR
Major ID (Primary Key)
. Student Number (Foreign Key)
. Program of Study Code (Foreign Key)
Date Enrolled
Current Candidate for Degree?

CURRICULUM
Program of Study Code (Primary Key)
Title of Program
Type of Degree Awarded (Subsetting Criteria 1)

Relationship verb phrases: has declared; is being studied by.
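Resolving the non-specific relationship with MAJOR can be sketched in SQLite through Python's `sqlite3`. The associative `major` table inherits its concatenated primary key from STUDENT and CURRICULUM, so the relationship can be navigated in either direction through two one-to-many relationships. Column choices (including storing Current Candidate for Degree? as 1/0) and the sample rows are assumptions.

```python
import sqlite3
from typing import List

def build_enrollment_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
    conn.executescript("""
        CREATE TABLE student (
            student_number INTEGER PRIMARY KEY,
            last_name      TEXT NOT NULL
        );
        CREATE TABLE curriculum (
            program_of_study_code TEXT PRIMARY KEY,
            title_of_program      TEXT NOT NULL
        );
        -- Associative entity MAJOR: one STUDENT has many MAJORs,
        -- and one CURRICULUM has many MAJORs.
        CREATE TABLE major (
            student_number        INTEGER NOT NULL REFERENCES student (student_number),
            program_of_study_code TEXT    NOT NULL REFERENCES curriculum (program_of_study_code),
            date_enrolled         TEXT,   -- MMDDYYYY
            current_candidate_for_degree INTEGER NOT NULL DEFAULT 1,  -- YES/NO stored as 1/0
            PRIMARY KEY (student_number, program_of_study_code)
        );
    """)
    return conn

def programs_of(conn: sqlite3.Connection, student_number: int) -> List[str]:
    """STUDENT -> MAJOR -> CURRICULUM."""
    rows = conn.execute("SELECT program_of_study_code FROM major "
                        "WHERE student_number = ? ORDER BY 1", (student_number,))
    return [row[0] for row in rows]

def students_in(conn: sqlite3.Connection, program_code: str) -> List[int]:
    """CURRICULUM -> MAJOR -> STUDENT."""
    rows = conn.execute("SELECT student_number FROM major "
                        "WHERE program_of_study_code = ? ORDER BY 1", (program_code,))
    return [row[0] for row in rows]

conn = build_enrollment_db()
conn.executemany("INSERT INTO student VALUES (?, ?)", [(1001, "Lopez"), (1002, "Chen")])
conn.executemany("INSERT INTO curriculum VALUES (?, ?)",
                 [("BSCS", "Computer Science"), ("BAHI", "History")])
conn.executemany("INSERT INTO major (student_number, program_of_study_code, date_enrolled) "
                 "VALUES (?, ?, ?)",
                 [(1001, "BSCS", "09012023"), (1001, "BAHI", "01152024"), (1002, "BSCS", "09012023")])
print(programs_of(conn, 1001))    # ['BAHI', 'BSCS']
print(students_in(conn, "BSCS"))  # [1001, 1002]
```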

Generalization:
■ Generalization is an approach that seeks to discover and exploit the commonalities between entities.
• Generalization is a technique wherein the attributes that are common to several types of an entity are grouped into their own entity, called a supertype.
• An entity supertype is an entity whose instances store attributes that are common to one or more entity subtypes.
– The entity supertype will have one or more one-to-one relationships to entity subtypes. These relationships are sometimes called IS A relationships (or WAS A, or COULD BE A) because each instance of the supertype ‘is also an’ instance of one or more subtypes.

Generalization: (continued)
• An entity subtype is an entity whose instances inherit some common attributes from an entity supertype, and then add other attributes that are unique to instances of the subtype.
An entity can be both a supertype and a subtype.
■ Through inheritance, the concept of generalization in data models permits the reduction of the number of attributes through the careful sharing of common attributes.
• The subtypes inherit not only the attributes, but also the data types, domains, and defaults of those attributes.
• In addition to inheriting attributes, subtypes also inherit relationships to other entities.

Figure: A generalization hierarchy.

PERSON
Personal ID Number (Primary Key)
Name
. Last Name
. First Name
. Middle Initial
Gender (Subsetting Criteria 1)
Race (Subsetting Criteria 2)
Marital Status (Subsetting Criteria 3)

EMPLOYEE (subtype of PERSON: is a)
Personal ID Number = Social Security Number (Primary Key)
all attributes from PERSON plus
Pension Plan Code
Life Insurance Plan Code
Medical Insurance Plan Code
Vacation Days Accumulated
Sick Days Accumulated

STUDENT (subtype of PERSON: is a)
Personal ID Number = Student Number (Primary Key)
all attributes from PERSON

PROSPECT (subtype of STUDENT: is a)
all attributes from PERSON and STUDENT plus
First Contact Date
Last Contact Date
Has Visited Campus?

CURRENT STUDENT (subtype of STUDENT: is a)
all attributes from PERSON and STUDENT plus
Number of Credits Earned
Grade Point Average
Encumbrance Status
Financial Aid Eligibility Status

FORMER STUDENT (subtype of STUDENT: could be a)
all attributes from PERSON and STUDENT plus
Reason for Withdrawal
Plans to Return?

ALUMNUS (subtype of STUDENT: could be a)
all attributes from PERSON and STUDENT plus
Member of Alumni Association?
Job in Field of Study?
Last Known Salary

Other relationships shown: PERSON can be contacted at ADDRESS; ‘is bound by’ CONTRACT; ‘has earned’ AWARDED DEGREE.
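Class inheritance in an object-oriented language offers a loose analogy for generalization. In the Python sketch below, dataclass subclasses inherit their supertype's attributes, and `dataclasses.fields` lists the full inherited attribute set; only a few attributes from the figure are included, and the sample values are invented. This illustrates the logical concept only; it says nothing about how a database would physically store supertypes and subtypes.

```python
from dataclasses import dataclass, fields
from typing import List

@dataclass
class Person:            # supertype: attributes common to all subtypes
    personal_id_number: str
    last_name: str
    first_name: str
    gender: str

@dataclass
class Student(Person):   # subtype: a STUDENT is a PERSON
    pass                 # adds no attributes of its own

@dataclass
class Employee(Person):  # subtype: an EMPLOYEE is a PERSON
    pension_plan_code: str
    vacation_days_accumulated: int

@dataclass
class CurrentStudent(Student):  # subtype of STUDENT, which is itself a subtype of PERSON
    number_of_credits_earned: int
    grade_point_average: float

def attribute_names(entity: type) -> List[str]:
    """All attributes of an entity, inherited ones first."""
    return [f.name for f in fields(entity)]

print(attribute_names(CurrentStudent))
# ['personal_id_number', 'last_name', 'first_name', 'gender',
#  'number_of_credits_earned', 'grade_point_average']
```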

The Process of Logical Data Modeling

Strategic Data Modeling

Many organizations select application development projects based on strategic information system plans.
Strategic planning is a separate project.
■ This project produces an information systems strategy plan that defines an overall vision and architecture for information systems.
• Almost always, the architecture includes an enterprise data model.
An enterprise data model typically identifies only the most fundamental of entities.
■ The entities are typically defined (as in a dictionary), but they are not described in terms of keys or attributes.
The enterprise data model may or may not include relationships (depending on the planning methodology’s standards and the level of detail desired by executive management).
■ If relationships are included, many of them will be non-specific.
The enterprise data model is usually stored in a corporate repository.

Data Modeling During Systems Analysis

The data model for a single system or application is usually called an application data model.
Logical data models have a DATA focus and a SYSTEM USER perspective.
Logical data models are typically constructed as deliverables of the study and definition phases of a project.
Because logical data models are not concerned with implementation details or technology, they may be constructed (through reverse engineering) from existing databases.
Data models are rarely constructed during the survey phase of systems analysis.

Figure: Data modeling in the FAST methodology’s information systems framework. The framework’s columns focus on system data, processes, interfaces, and geography; its rows are system owners (scope), system users (requirements), system designers (specification), and system builders (components). Phases shown: survey phase (establish scope and project plan), yielding business subjects (e.g., Customers order zero, one, or more products. Products may be ordered by zero, one, or more customers.); study phase (establish system improvement objectives), yielding entities and definitions; definition phase (establish and prioritize business system requirements), yielding data requirements as data models (e.g., PRODUCT, CUSTOMER, and ORDER with their attributes). Reverse engineering (optional) draws on existing databases and technology.

Data modeling is rarely associated with the study phase of systems analysis. Most analysts prefer to draw process models to document the current system.
■ Many analysts report that data models are far superior for the following reasons:
• Data models help analysts to quickly identify business vocabulary more completely than process models.
• Data models are almost always built more quickly than process models.
• A complete data model can fit on a single sheet of paper. Process models often require dozens of sheets of paper.
• Process modelers too easily get hung up on unnecessary detail.
Many analysts report that data models are far superior for the following reasons: (continued)
• Data models for existing and proposed systems are far more similar than process models for existing and proposed systems. Consequently, there is less work to throw away as you move into later phases.
A study phase model should include only entities and relationships, but no attributes: a context data model.
■ The intent is to refine the understanding of scope, not to get into details about the entities and business rules.
The definition phase data model will be constructed in at least two stages:
1. A key-based data model will be drawn.
• This model will eliminate non-specific relationships, add associative entities, and include primary, alternate, and foreign keys, plus precise cardinalities and any generalization hierarchies.
2. A fully attributed data model will be constructed.
• The fully attributed model includes all remaining descriptive attributes and subsetting criteria.
– Each attribute is defined in the repository with data types, domains, and defaults.
The completed data model represents all of the business requirements for a system’s database.

Looking Ahead to Systems Configuration and Design

The logical data model from systems analysis describes business data requirements, not technical solutions.
The purpose of the configuration phase is to determine the best way to implement those requirements with database technology.
During system design, the logical data model will be transformed into a physical data model (called a database schema) for the chosen database management system.
■ This model will reflect the technical capabilities and limitations of that database technology, as well as the performance tuning requirements suggested by the database administrator.
■ The physical data model will also be analyzed for adaptability and flexibility through a process called normalization.

Fact-Finding and Information Gathering for Data Modeling

Data models cannot be constructed without appropriate facts and information as supplied by the user community.
■ These facts can be collected by a number of techniques, such as sampling of existing forms and files; research of similar systems; surveys of users and management; and interviews of users and management.
■ The fastest method of collecting facts and information, and simultaneously constructing and verifying the data models, is Joint Application Development (JAD).

Purpose | Candidate Questions
Discover the system entities | What are the subjects of the business? In other words, what types of persons, organizations, organizational units, places, things, materials, or events are used in, or interact with, this system, about which data must be captured or maintained? How many instances of each subject exist?
Discover the entity keys | What unique characteristic (or characteristics) distinguishes an instance of each subject from other instances of the same subject? Are there any plans to change this identification scheme in the future?
Discover entity subsetting criteria | Are there any characteristics of a subject that divide all instances of the subject into useful subsets? Are there any subsets of the above subjects for which you have no convenient way to group instances?
Discover attributes and domains | What characteristics describe each subject? For each of these characteristics: (1) what type of data is stored? (2) who is responsible for defining legitimate values for the data? (3) what are the legitimate values for the data? (4) is a value required? and (5) is there any default value that should be assigned if you don’t specify otherwise?
Discover security and control needs | Are there any restrictions on who can see or use the data? Who is allowed to create the data? Who is allowed to update the data? Who is allowed to delete the data?
Discover data timing needs | How often does the data change? Over what period of time is the data of value to the business? How long should we keep the data? Do you need historical data or trends? If a characteristic changes, must you know the former values?
Discover generalization hierarchies | Are all instances of each subject the same? That is, are there special types of each subject that are described or handled differently? Can any of the data be consolidated for sharing?
Discover relationships and degrees | What events occur that imply associations between subjects? What business activities or transactions involve handling or changing data about several different subjects of the same or a different type?
Discover cardinalities | Is each business activity or event handled the same way, or are there special circumstances? Can an event occur with only some of the associated subjects, or must all the subjects be involved?

Computer-Aided Systems Engineering (CASE) for Data Modeling

Data models are stored in the repository.
■ In a sense, the data model is metadata, that is, data about the business’s data.
Computer-aided systems engineering (CASE) technology provides the repository for storing the data model and its detailed descriptions.
Using a CASE product, you can easily create professional, readable data models without the use of paper, pencil, erasers, and templates.
■ The models can be easily modified to reflect corrections and changes suggested by end-users.
■ Most CASE products provide powerful analytical tools that can check your models for mechanical errors, completeness, and consistency.
Not all data model conventions are supported by all CASE products.
■ It is likely that any given CASE product will force the company to adapt its methodology’s data modeling symbols or approach so that they are workable within the limitations of that CASE tool.

How to Construct Data Models

1st Step - Entity Discovery

The first task in data modeling is to discover those fundamental entities in the system that are or might be described by data.
There are several techniques that may be used to identify entities.
■ During interviews or JAD sessions with system owners and users, pay attention to key words in their discussion.
■ During interviews or JAD sessions, specifically ask the system owners and users to identify things about which they would like to capture, store, and produce information.
■ Study existing forms and files.
■ Some CASE tools can reverse engineer existing files and databases into physical data models.
A true entity has multiple instances: dozens, hundreds, thousands, or more!
Entities should be named with nouns that describe the person, event, place, or tangible thing about which we want to store data.
■ Try not to abbreviate or use acronyms.
■ Names should be singular so as to distinguish the logical concept of the entity from the actual instances of the entity.
Define each entity in business terms.
■ Don’t define the entity in technical terms, and don’t define it as ‘data about …’.
■ Your entity names and definitions should establish an initial glossary of business terminology that will serve both you and future analysts and users for years to come.

Entity Name | Business Definition
AGREEMENT | A contract whereby a member agrees to purchase a certain number of products within a certain time. After fulfilling that agreement, the member becomes eligible for bonus credits that are redeemable for free or discounted products. Note: A major system improvement objective is to make agreements more flexible with respect to other clubs. Currently, only purchases within the club that issued an agreement count toward credits. Another system improvement objective would award bonus credits for each purchase leading up to fulfillment of the agreement, with accelerated bonuses after fulfillment of the agreement.
CLUB | A SoundStage membership group to which members can belong. Clubs tend to be organized according to product interests such as music versus movies versus games, or specialized media interests such as Digital Video Disks (DVD) or Nintendo. Note: Cross-club interaction is a desired objective for the new system.
MEMBER | An active member of one or more clubs. Note: A target system objective is to re-enroll inactive members as opposed to deleting them.
MEMBER ORDER | An order generated for a member as part of a monthly promotion, or an order initiated by a member. Note: The current system only supports orders generated from promotions; however, customer-initiated orders have been given a high priority as an added option in the proposed system.
PRODUCT | An inventoried product available for promotion and sale to members. Note: System improvement objectives include (1) compatibility with the new bar code system being developed for the warehouse, and (2) adaptability to a rapidly changing mix of products.
PROMOTION | A monthly or quarterly event whereby dated orders are generated for all members in a club. Members then have some period of time to cancel or accelerate fulfillment of that order, after which the order is automatically filled.
How to Construct Data Models
2nd Step - The Context Data Model
The second task in data modeling is to construct the context data
model.
 The context data model includes the fundamental or
independent entities that were previously discovered.
• An independent entity is one that exists regardless of the
existence of any other entity. Its primary key contains no attributes
that would make it dependent on the existence of another entity.
• Independent entities are almost always the first entities discovered
in your conversations with the users.
 Relationships should be named with verb phrases that, when
combined with the entity names, form simple business
sentences or assertions.
• Always name the relationship from parent to child.
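The two rules above can be expressed as a small Python sketch. It assumes an entity is described only by its primary-key attributes, each flagged as a foreign key or not; the class names are illustrative, and the MEMBER places MEMBER ORDER pairing is an assumed example rather than a reading of the diagram.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class KeyAttribute:
    name: str
    is_foreign_key: bool = False


@dataclass
class Entity:
    name: str
    primary_key: list[KeyAttribute] = field(default_factory=list)

    def is_independent(self) -> bool:
        # An independent entity borrows no primary-key attribute from
        # another entity, so its key contains no foreign keys.
        return not any(attr.is_foreign_key for attr in self.primary_key)


@dataclass(frozen=True)
class Relationship:
    parent: str
    verb_phrase: str
    child: str

    def as_sentence(self) -> str:
        # Reading parent-to-child turns the relationship into a business assertion.
        return f"{self.parent} {self.verb_phrase} {self.child}"


member = Entity("MEMBER", [KeyAttribute("Member-Number")])
club_membership = Entity(
    "CLUB MEMBERSHIP",
    [
        KeyAttribute("Club-Name", True),
        KeyAttribute("Member-Number", True),
        KeyAttribute("Agreement-Number", True),
    ],
)
print(member.is_independent())           # True
print(club_membership.is_independent())  # False
print(Relationship("MEMBER", "places", "MEMBER ORDER").as_sentence())
```

CLUB MEMBERSHIP is used as the counterexample because every part of its key is borrowed from another entity, which is why it does not belong in the context model.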
[Figure: Context data model for SoundStage. Entities: CLUB, MEMBER, MEMBER ORDER, PRODUCT, PROMOTION, AGREEMENT. Relationship labels shown: establishes, sponsors, belongs to, binds, places, generates, responds to, sells, is featured in.]
How to Construct Data Models
3rd Step - The Key-Based Data Model
The third task is to identify the keys of each entity.
The following guidelines are suggested for keys:
 The value of a key should not change over the lifetime of each
entity instance.
 The value of a key cannot be null.
 Controls must be installed to ensure that the value of a key is
valid.
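One way to enforce these three guidelines in application code is an immutable key type that validates itself on creation. This is a Python sketch; the seven-digit format for Member-Number is a hypothetical rule chosen only to illustrate a validity control.

```python
import re
from dataclasses import dataclass

# Hypothetical validity rule: a Member-Number is exactly seven digits.
MEMBER_NUMBER_FORMAT = re.compile(r"\d{7}")


@dataclass(frozen=True)  # frozen: the key value cannot change after creation
class MemberKey:
    member_number: str

    def __post_init__(self) -> None:
        # A key value cannot be null (or empty).
        if not self.member_number:
            raise ValueError("a key value cannot be null")
        # Controls ensure that the key value is valid.
        if not MEMBER_NUMBER_FORMAT.fullmatch(self.member_number):
            raise ValueError(f"invalid Member-Number: {self.member_number!r}")


key = MemberKey("0012345")
try:
    key.member_number = "0099999"  # rejected: frozen dataclasses are immutable
except AttributeError as err:
    print("immutable:", type(err).__name__)
```

In a relational database the same guidelines are usually declared rather than coded, as NOT NULL, PRIMARY KEY, and CHECK constraints.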
 Some experts suggest that you avoid intelligent keys because
the key may change over the lifetime of the entity instance.
• An intelligent key is a business code whose structure
communicates data about an entity instance (such as its
classification, size, or other properties).
• A code is a group of characters and/or digits that identifies and
describes something in the business system.
 Other experts recommend intelligent keys, because business codes
can return value to the organization: they can be processed quickly
by humans without the assistance of a computer.
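The risk behind the first view can be shown with a hypothetical intelligent product code whose segments encode media type and category. The MEDIA-CATEGORY-SERIAL layout below is invented for illustration; the point is that reclassifying a product changes its key value even though the product itself is unchanged.

```python
from typing import NamedTuple


class ProductCode(NamedTuple):
    media: str     # hypothetical segment, e.g. "CD"
    category: str  # hypothetical segment, e.g. "ROCK"
    serial: int


def parse_code(code: str) -> ProductCode:
    # A person can read the media and category straight from the code.
    media, category, serial = code.split("-")
    return ProductCode(media, category, int(serial))


def format_code(parts: ProductCode) -> str:
    return f"{parts.media}-{parts.category}-{parts.serial:05d}"


def reclassify(code: str, new_category: str) -> str:
    # The category is part of the key, so a new classification
    # forces a new key value for the same product.
    return format_code(parse_code(code)._replace(category=new_category))


old_key = "CD-ROCK-00042"
new_key = reclassify(old_key, "POP")
print(parse_code(old_key))
print(old_key, "->", new_key)  # CD-ROCK-00042 -> CD-POP-00042
```

The same parsing function also shows the benefit claimed by the second view: the code is readable without looking anything up.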
 Consider inventing a surrogate key to substitute for large
concatenated keys of independent entities.
• This suggestion is not practical for associative entities, because
each part of the concatenated key is a foreign key that must
precisely match its parent entity’s primary key.
 If you cannot define keys for an entity, it may be that the entity
doesn’t really exist; that is, multiple occurrences of the so-called
entity do not exist.
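A surrogate key can be issued from a counter while the original concatenated key is kept as a unique lookup. The Python sketch below uses PRODUCT, whose key in this model concatenates Product-Number and Universal-Product-Code; the class name and sample values are hypothetical.

```python
from itertools import count


class SurrogateKeyRegistry:
    """Issues one integer surrogate key per distinct concatenated natural key."""

    def __init__(self) -> None:
        self._next_id = count(1)
        self._ids: dict[tuple[str, ...], int] = {}

    def key_for(self, *natural_key: str) -> int:
        # Reusing the existing surrogate keeps the natural key unique even
        # though it is no longer the primary key.
        if natural_key not in self._ids:
            self._ids[natural_key] = next(self._next_id)
        return self._ids[natural_key]


products = SurrogateKeyRegistry()
# Hypothetical (Product-Number, Universal-Product-Code) pairs.
print(products.key_for("P1001", "012345678905"))  # 1
print(products.key_for("P1002", "036000291452"))  # 2
print(products.key_for("P1001", "012345678905"))  # 1
```

In a database this role is typically played by a generated identity column plus a UNIQUE constraint on the original key columns.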
Business Codes
 There are several types of codes and they can be combined to
form effective means for entity instance identification.
• Serial codes assign sequentially generated numbers to entity
instances.
– Many database management systems can generate serial codes and
constrain them to a business’s requirements.
• Block codes are similar to serial codes except that serial numbers
are divided into groups that have some business meaning.
• Alphabetic codes use finite combinations of letters (and possibly
numbers) to describe entity instances.
– Alphabetic codes must usually be combined with serial or
block codes in order to uniquely identify instances of most
entities.
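The three code types above can be sketched in Python as follows. The block ranges for AUDIO, VIDEO, and GAME and the two-letter prefix in the alphabetic example are hypothetical, chosen only to show how each scheme assigns values.

```python
from itertools import count


def serial_codes(start: int = 1):
    """Serial code: the next instance simply gets the next number."""
    return count(start)


# Block code: serial ranges reserved for groups with business meaning.
# These ranges are hypothetical.
BLOCKS = {
    "AUDIO": range(100_000, 200_000),
    "VIDEO": range(200_000, 300_000),
    "GAME": range(300_000, 400_000),
}


class BlockCodeAllocator:
    def __init__(self, blocks: dict[str, range]) -> None:
        # One independent serial sequence per block.
        self._sequences = {group: iter(r) for group, r in blocks.items()}

    def next_code(self, group: str) -> int:
        try:
            return next(self._sequences[group])
        except StopIteration:
            raise ValueError(f"block {group!r} is exhausted") from None


def alphabetic_code(letters: str, serial: int) -> str:
    # Letters alone rarely identify an instance uniquely, so a serial
    # number is appended to make the full code unique.
    return f"{letters}{serial:04d}"


serial = serial_codes()
print(next(serial), next(serial))  # 1 2
blocks = BlockCodeAllocator(BLOCKS)
print(blocks.next_code("VIDEO"))   # 200000
print(alphabetic_code("SS", 7))    # SS0007
```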
• In significant position codes, each digit or group of digits
describes a measurable or identifiable characteristic of the entity
instance.
– Significant position codes are frequently used to code inventory
items.
• Hierarchical codes provide a top-down interpretation for an entity
instance.
– Every item coded is factored into groups, subgroups, and so
forth.
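Both schemes above can be decoded mechanically. In the Python sketch below, the eight-character inventory layout (two characters of media type, two digits of units per package, four digits of serial number) and the dotted hierarchy are hypothetical examples, not codes defined in this model.

```python
from typing import NamedTuple


class InventoryCode(NamedTuple):
    media: str
    units_in_package: int
    serial: int


def parse_significant_position(code: str) -> InventoryCode:
    # Hypothetical layout: positions 1-2 media type, 3-4 units per
    # package, 5-8 serial number. Each position group has a fixed meaning.
    if len(code) != 8:
        raise ValueError("expected an 8-character code")
    return InventoryCode(code[0:2], int(code[2:4]), int(code[4:8]))


def hierarchy_path(code: str, sep: str = ".") -> list[str]:
    # Hierarchical code: read top-down, each level narrows the group
    # named by the level above it.
    parts = code.split(sep)
    return [sep.join(parts[: i + 1]) for i in range(len(parts))]


print(parse_significant_position("CD020042"))
print(hierarchy_path("3.2.1"))  # ['3', '3.2', '3.2.1']
```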
 The following guidelines are suggested when creating a
business coding scheme:
• Codes should be expandable to accommodate growth.
• The full code must result in a unique value for each entity instance.
• Codes should be large enough to describe the distinguishing
characteristics, but small enough to be interpreted by people
without a computer.
• Codes should be convenient. A new instance should be easy to
create.
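The first two guidelines can be checked with simple arithmetic: a fixed-width code of width w over an alphabet of k symbols has k^w possible values. The Python sketch below assumes a constant annual growth rate, a simplification made only for illustration.

```python
def code_capacity(width: int, alphabet_size: int = 10) -> int:
    """Distinct values available to a fixed-width code."""
    return alphabet_size ** width


def has_room_to_grow(width: int, current: int, annual_growth: float,
                     years: int, alphabet_size: int = 10) -> bool:
    # Expandability: the projected number of instances must still fit.
    projected = current * (1 + annual_growth) ** years
    return projected <= code_capacity(width, alphabet_size)


def all_unique(codes: list[str]) -> bool:
    # The full code must yield a unique value for each entity instance.
    return len(codes) == len(set(codes))


print(code_capacity(5))                      # 100000
print(has_room_to_grow(5, 60_000, 0.10, 5))  # True  (about 96,631)
print(has_room_to_grow(5, 60_000, 0.10, 6))  # False (about 106,294)
```

A five-digit numeric code that looks comfortable today can run out within a few years of steady growth, which is why expandability is listed first.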
[Figure: Key-based data model for SoundStage. Key attributes recovered from the diagram:
CLUB: Club-Name [PK1]
MEMBER: Member-Number [PK1]
AGREEMENT: Agreement-Number [PK1], Club-Name [PK2] [FK]
CLUB MEMBERSHIP: Member-Number [PK2] [FK], Club-Name [PK3] [FK]
MEMBER ORDER: Order-Number [PK1], Member-Number [PK2] [FK]
PRODUCT ON ORDER: Order-Number [PK1] [FK], Member-Number [PK2] [FK], Product-Number [PK3] [FK], Universal-Product-Code [PK4] [FK]
PRODUCT: Product-Number [PK1], Universal-Product-Code [PK2]
PROMOTION: Club-Name [PK1] [FK], Product-Number [PK2] [FK], Universal-Product-Code [PK3] [FK]
Relationship labels shown: establishes, sponsors, enrolls in, binds, places, generates, responds to, sells, sold as, is featured in.]
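How to Construct Data Models
4th Step - Generalization Hierarchies
 The fourth task is to identify any generalization hierarchies in the
business problem, that is, attributes common to several types of an
entity grouped into a supertype, with the distinguishing attributes
kept in subtypes.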
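A generalization hierarchy maps naturally onto class inheritance, which is one way to picture the idea; in the data model itself, each subtype instead carries the supertype’s key (Product-Number and Universal-Product-Code) as a foreign key. The Python sketch below mirrors the PRODUCT hierarchy in the SoundStage model, giving each subtype one attribute from the fully attributed model; the class names and sample values are illustrative.

```python
from dataclasses import dataclass


@dataclass
class Product:
    # Supertype: attributes common to every kind of product.
    product_number: str
    universal_product_code: str


@dataclass
class Merchandise(Product):  # MERCHANDISE is a PRODUCT
    merchandise_name: str


@dataclass
class Title(Product):        # TITLE is a PRODUCT
    title_of_work: str


@dataclass
class AudioTitle(Title):     # AUDIO TITLE is a TITLE
    artist: str


@dataclass
class VideoTitle(Title):     # VIDEO TITLE is a TITLE
    director: str


@dataclass
class GameTitle(Title):      # GAME TITLE is a TITLE
    game_platform: str


album = AudioTitle("P1001", "012345678905", "Example Album", "Example Artist")
print(isinstance(album, Title), isinstance(album, Product))  # True True
```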
[Figure: SoundStage data model with generalization hierarchies. Key attributes recovered from the diagram:
CLUB: Club-Name [PK1]
MEMBER: Member-Number [PK1]
AGREEMENT: Agreement-Number [PK1], Club-Name [PK2] [FK]
CLUB MEMBERSHIP: Club-Name [PK1] [FK], Member-Number [PK2] [FK], Agreement-Number [PK3] [FK]
MEMBER ORDER: Order-Number [PK1]
PRODUCT ON AN ORDER: Order-Number [PK1] [FK], Product-Number [PK2] [FK], Universal-Product-Code [PK3] [FK]
PROMOTION: Club-Name [PK1] [FK]
PRODUCT: Product-Number [PK1], Universal-Product-Code [PK2]
MERCHANDISE, TITLE (each "is a" PRODUCT): Product-Number [PK1] [FK], Universal-Product-Code [PK2] [FK]
AUDIO TITLE, VIDEO TITLE, GAME TITLE (each "is a" TITLE): Product-Number [PK1] [FK], Universal-Product-Code [PK2] [FK]
Relationship labels shown: establishes, sponsors, enrolls in, binds, placed, generates, responds to, sells, sold as, is a.]
How to Construct Data Models
5th Step - The Fully Attributed Data Model
The fifth task is to identify the remaining data attributes.
 The following guidelines are offered for attribution.
• Many organizations have naming standards and approved
abbreviations.
– The data or repository administrator usually maintains such
standards.
• Many attributes share common base names such as NAME,
ADDRESS, DATE.
– Unless the attributes can be generalized into a supertype, it is
best to give each variation a unique name such as:
CUSTOMER NAME vs SUPPLIER NAME
– Names must be distinguishable across projects.
• Logical attribute names should not be abbreviated.
• For attributes that have only YES or NO values, name them as
questions.
– For example, CANDIDATE FOR A DEGREE?
• Each attribute should be mapped to only one entity.
– Foreign keys are the exception – they identify associated
instances of related entities.
• An attribute’s domain should not be based on logic.
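Several of these guidelines are mechanical enough to check in a repository script. This Python sketch is a hypothetical lint, not a feature of any particular CASE tool: it flags unqualified base names and YES/NO attributes not phrased as questions, and finds non-foreign-key attributes that appear in more than one entity.

```python
from collections import Counter

# Base names that need a qualifier, e.g. CUSTOMER NAME vs SUPPLIER NAME.
BARE_BASE_NAMES = {"NAME", "ADDRESS", "DATE"}


def naming_problems(attribute: str, is_yes_no: bool = False) -> list[str]:
    problems = []
    if attribute.upper() in BARE_BASE_NAMES:
        problems.append("qualify the base name")
    if is_yes_no and not attribute.endswith("?"):
        problems.append("name a YES/NO attribute as a question")
    return problems


def attributes_in_many_entities(model: dict[str, dict[str, bool]]) -> set[str]:
    # model maps entity -> {attribute: is_foreign_key}. Foreign keys are
    # the allowed exception, so only non-foreign-key attributes are counted.
    counts = Counter(
        attr
        for attrs in model.values()
        for attr, is_fk in attrs.items()
        if not is_fk
    )
    return {attr for attr, n in counts.items() if n > 1}


print(naming_problems("NAME"))
print(naming_problems("CANDIDATE FOR A DEGREE?", is_yes_no=True))  # []
model = {
    "MEMBER": {"Member-Number": False, "Member-Name": False},
    "MEMBER ORDER": {"Order-Number": False, "Member-Number": True},
}
print(attributes_in_many_entities(model))  # set()
```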
[Figure: Fully attributed data model for SoundStage. Attributes recovered from the diagram, by entity:
CLUB
  Key Data: Club-Name [PK1]
  Non-Key Data: Club-Description, Club-Charter-Date
MEMBER
  Key Data: Member-Number [PK1]
  Non-Key Data: Member-Name (Last-Name, First-Name, Middle-Initial), Member-Status, Member-Street-Address, Member-Post-Office-Box, Member-City, Member-State, Member-Zip-Code, Member-Daytime-Phone-Number (Area-Code, Phone-Number, Extension), Member-Date-of-Last-Order, Member-Balance, Member-Credit-Card-Type, Member-Credit-Card-Number, Member-Credit-Card-Expire-Date, Member-Bonus-Balance
AGREEMENT
  Key Data: Agreement-Number [PK1], Club-Name [PK2] [FK]
  Non-Key Data: Agreement-Active-Date, Agreement-Expire-Date, Fulfillment-Period, Required-Number-of-Credits
CLUB MEMBERSHIP
  Key Data: Club-Name [PK1] [FK], Member-Number [PK2] [FK], Agreement-Number [PK3] [FK]
  Non-Key Data: Date-Enrolled, Expiration-Date, Number-of-Credits-Required, Number-of-Credits-Earned
PROMOTION
  Key Data: Club-Name [PK1] [FK]
  Non-Key Data: Promotion-Number, Promotion-Release-Date, Promotion-Status, Promotion-Type, Automatic-Fill-Delay, Product-Number [FK], Universal-Product-Code [FK]
MEMBER ORDER
  Key Data: Order-Number [PK1]
  Non-Key Data: Order-Creation-Date, Order-Fill-Date, Shipping-Address-Name, Shipping-Street-Address, Shipping-City, Shipping-State, Shipping-Zip, Shipping-Instructions, Order-Sub-Total, Order-Sales-Tax, Order-Shipping-Method, Order-Shipping-&-Handling-Cost, Order-Status, Order-Prepaid-Amount, Order-Prepayment-Method, Member-Number [FK], Club-Name [FK], Promotion-Number
PRODUCT ON AN ORDER
  Key Data: Order-Number [PK1] [FK], Product-Number [PK2] [FK], Universal-Product-Code [PK3] [FK]
  Non-Key Data: Quantity-Ordered, Quantity-Shipped, Quantity-Backordered, Purchase-Unit-Price, Credits-Earned
PRODUCT
  Key Data: Product-Number [PK1], Universal-Product-Code [PK2]
  Non-Key Data: Product-Quantity-in-Stock, Product-Type, Manf-Suggested-Price, Club-Default-Price, Special-Price, Units-Sold-Month-to-Date, Units-Sold-Year-to-Date, Units-Sold-Lifetime
MERCHANDISE ("is a" PRODUCT)
  Key Data: Product-Number [PK1] [FK], Universal-Product-Code [PK2] [FK]
  Non-Key Data: Merchandise-Name, Merchandise-Description, Merchandise-Type, Unit-of-Measure
TITLE ("is a" PRODUCT)
  Key Data: Product-Number [PK1] [FK], Universal-Product-Code [PK2] [FK]
  Non-Key Data: Title-of-Work, Title-Cover, Catalog-Description, Copyright-Date, Entertainment-Category, Credit-Value
AUDIO TITLE ("is a" TITLE)
  Key Data: Product-Number [PK1] [FK], Universal-Product-Code [PK2] [FK]
  Non-Key Data: Artist, Audio-Category, Audio-Sub-Category, Number-of-Units-in-Package, Audio-Media-Code, Content-Advisory-Code
VIDEO TITLE ("is a" TITLE)
  Key Data: Product-Number [PK1] [FK], Universal-Product-Code [PK2] [FK]
  Non-Key Data: Producer, Director, Video-Category, Video-Sub-Category, Closed-Captioned, Language, Running-Time, Video-Media-Type, Video-Encoding, Screen-Aspect, MPA-Rating-Code
GAME TITLE ("is a" TITLE)
  Key Data: Product-Number [PK1] [FK], Universal-Product-Code [PK2] [FK]
  Non-Key Data: Manufacturer, Game-Category, Game-Sub-Category, Game-Platform, Game-Media-Type, Number-of-Players, Parent-Advisory-Code
Relationship labels shown: establishes, sponsors, enrolls in, binds, placed, generates, responds to, sells, sold as, is a.]
How to Construct Data Models
6th Step - The Fully Described Model
The last task is to fully describe the data model.
 This task is the most time-consuming.
 This task can be started in parallel with the key-based model or
fully attributed model, but it is usually the last data modeling
task completed.
 At this time the descriptions for the attributes are still
incomplete – they require domains.
• Most CASE tools provide extensive facilities for describing the
data types, domains, and defaults for all attributes to the repository.
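A repository entry for an attribute pairs its name with a data type, a domain of legal values, and a default. The Python sketch below is a hypothetical stand-in for what a CASE repository records; the domains chosen for Member-Status and Quantity-Ordered are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass(frozen=True)
class AttributeDescription:
    name: str
    data_type: type
    domain: Callable[[Any], bool]  # predicate: is a value legal?
    default: Optional[Any] = None

    def accepts(self, value: Any) -> bool:
        # A value must have the right type and fall inside the domain.
        return isinstance(value, self.data_type) and self.domain(value)


# Hypothetical domains, for illustration only.
MEMBER_STATUS = AttributeDescription(
    "Member-Status", str, lambda v: v in {"ACTIVE", "INACTIVE"}, default="ACTIVE"
)
QUANTITY_ORDERED = AttributeDescription(
    "Quantity-Ordered", int, lambda v: v >= 1, default=1
)

print(MEMBER_STATUS.accepts("INACTIVE"))  # True
print(QUANTITY_ORDERED.accepts(0))        # False
```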
Additional descriptive properties may be recorded for attributes
such as:
• Who should be able to create, delete, update, and access each
attribute?
• How long should each attribute (or entity) be kept before the data
is deleted or archived?
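These two properties can be recorded per attribute as access rights by role and a retention period. In this Python sketch the role names, the rights, and the two-year retention period are hypothetical values, not recommendations.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta


@dataclass
class AttributePolicy:
    attribute: str
    # Role -> set of permitted actions among create, read, update, delete.
    rights: dict[str, set[str]] = field(default_factory=dict)
    retention: timedelta = timedelta(days=365)

    def allowed(self, role: str, action: str) -> bool:
        return action in self.rights.get(role, set())

    def due_for_archive(self, last_updated: date, today: date) -> bool:
        # Data older than the retention period should be deleted or archived.
        return today - last_updated > self.retention


card_number = AttributePolicy(
    "Member-Credit-Card-Number",
    rights={"BILLING": {"create", "read", "update", "delete"},
            "CUSTOMER SERVICE": {"read"}},
    retention=timedelta(days=730),
)
print(card_number.allowed("CUSTOMER SERVICE", "update"))                 # False
print(card_number.due_for_archive(date(2020, 1, 1), date(2022, 1, 2)))   # True
```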
The Next Generation
 Data modeling should remain a value-added skill for many
years.
 The demand for data modeling as a skill depends on two
factors:
 (1) the need for databases, and
 (2) the use of relational database management system
technology to implement those databases.
• There is some belief that relational database technology will
eventually be replaced by object technology.
• If that were to happen, data modeling would be replaced by object
modeling techniques.
• Even as object database technology becomes available, we expect
the relational database industry to add object features and
technologies to their product lines.
 CASE technology will continue to improve.
 Today’s better CASE tools provide a two-way synchronization
between the logical data models and their database designs.
 This synchronization will likely extend as CASE vendors enable
their tools to directly communicate and interoperate with database
management systems and working databases.
Summary
 Introduction
 An Introduction to Systems Modeling
 System Concepts for Data Modeling
 The Process of Logical Data Modeling
 How to Construct Data Models
 The Next Generation