Software Project Management
Introduction to Information
Systems Analysis
Data, Process, and Network Modeling
INFO 503
Glenn Booker
INFO 503
Lecture #4
1
Data Modeling
• Data modeling (or database or information
modeling) is a way of organizing and
describing the data in a system
• It is a logical model to describe the specific
data fields (elements) we wish to capture,
and how they are related to each other
Where to start?
• Data modeling starts with thinking about
the things involved in your system
• These things are formally called “entities” –
nouns, if you will
• Start by identifying all of the places,
people, events, and ideas which are
affected by your system
Permanent vs. Transient Data
• A key for relational data modeling is that
we are primarily concerned with data we
need to keep permanently
• Data which is only needed briefly isn’t
modeled in an ERD
– Major difference between relational and
object-oriented analysis
Characterize Entities
• Then examine each entity and determine the
attributes which you are interested in – what
do you need to know in order to describe
one such entity meaningfully?
• Consider if some attributes can be readily
grouped together, thereby forming
compound attributes (e.g. name)
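As a rough illustration of an entity, its attributes, and a compound attribute, here is a minimal Python sketch. The Customer entity and its fields are hypothetical examples, not taken from any particular system.

```python
from dataclasses import dataclass


@dataclass
class Name:
    """Compound attribute: simple attributes that are readily grouped together."""
    first: str
    last: str


@dataclass
class Customer:
    """Hypothetical entity; each field is one attribute describing it."""
    customer_id: int
    name: Name       # compound attribute
    phone: str


fred = Customer(customer_id=1, name=Name("Fred", "Jones"), phone="555-0100")
print(fred.name.last)
```

Grouping first and last name into one Name attribute keeps the entity readable while each part stays individually accessible.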
Characterize Entities
• Entities are generally one of two types:
– A set of data you want to keep permanently
(customer orders, product information, etc.), or
– A lookup list or table (types of status codes,
shipping rates, tax rates, etc.)
• Data which is transient is generally kept in
local variables, and doesn’t appear in an
ERD (e.g. change of address info)
Keep it or not?
• In trying to decide if data needs to be kept,
consider whether someone might want
to analyze that data in the future
• For example, to look for sales patterns,
trace relocation history, or keep a record of
data changes (who modified what data
and when?)
• When in doubt, keep it for now
Characterize Attributes
• For each attribute, define its data type:
– Text (“Fred”) [and the character set (Latin)]
– Number (real (3.56) or integer (124))
– Date and/or time
– Yes/No (a.k.a. T/F, binary, or Boolean)
– A fixed set of possible values (e.g. grades)
– Multimedia: photos, drawings, movies, sounds
Relevant Data Type Standards
• Character sets
– ISO/IEC 8859
– Unicode
• Representation of dates and times
– ISO 8601
Characterize Attributes
• Identify the domain of each attribute – the
range of allowable values
• Determine if there is a default value for
each attribute
• Is each attribute mandatory (required) for
each entity? (Avoid many mandatory fields)
• Is an attribute uniquely suited to be a key?
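One way to record these attribute characteristics is sketched below in Python. The Enrollment entity, its fields, and the grade set are assumptions made up for illustration. The sketch shows a data type per attribute, a default value, an optional attribute, and a domain check.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Domain: a fixed set of allowable values (hypothetical grade scale).
ALLOWED_GRADES = {"A", "B", "C", "D", "F"}


@dataclass
class Enrollment:
    student_id: int               # mandatory, integer
    course_code: str              # mandatory, text
    enrolled_on: date             # mandatory, date
    audited: bool = False         # Yes/No attribute with a default value
    grade: Optional[str] = None   # optional until the course ends

    def __post_init__(self):
        # Reject values outside the attribute's domain.
        if self.grade is not None and self.grade not in ALLOWED_GRADES:
            raise ValueError(f"grade {self.grade!r} is outside the domain")


# date.fromisoformat parses an ISO 8601 calendar date.
e = Enrollment(1, "C101", date.fromisoformat("2024-09-03"))
```

Only three attributes are mandatory here, in keeping with the advice to avoid many mandatory fields.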
Key Attributes
• An attribute or group of attributes may be
a unique identifier, or key, for each entity
– Examples are Social Security Number,
driver’s license number, ISBN, Student ID
• If a group of attributes is used, it is
a concatenated (a.k.a. composite or
compound) key
Many Keys Possible
• There might be more than one key for
an entity
• Each possible key is called a candidate key
• One candidate key is selected as the primary key
• All others are alternate keys
– Example: the electric company may use a
customer ID or account # as primary key, and
your phone number as an alternate key
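The electric company example might look like the following sketch, using Python's built-in sqlite3 module. The table and column names are assumptions for illustration. The primary key and the alternate key are both enforced as unique.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,   -- candidate key chosen as primary key
        phone       TEXT NOT NULL UNIQUE,  -- candidate key kept as alternate key
        name        TEXT NOT NULL
    )
""")
conn.execute("INSERT INTO customer VALUES (1, '555-0100', 'Fred')")


def insert_is_rejected(row):
    """Return True if the database refuses the row as a key violation."""
    try:
        conn.execute("INSERT INTO customer VALUES (?, ?, ?)", row)
        return False
    except sqlite3.IntegrityError:
        return True


dup_pk = insert_is_rejected((1, "555-0199", "Wilma"))     # repeats the primary key
dup_phone = insert_is_rejected((2, "555-0100", "Wilma"))  # repeats the alternate key
```

Both inserts are refused, since every candidate key must identify a record uniquely.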
Primary Key may be Meaningless
• A primary key may correspond to some
important piece of information
– SSN, student ID, ISBN, etc.
• Or it may be completely meaningless
– A sequential number, called Order_ID
• As long as the primary key is unique
for every record, either is acceptable
Relationships
• Entities affect each other by means
of relationships
• Relationships are described by a verb
phrase, e.g. “is a member of”, “is part of”,
“is a prerequisite for”, etc.
• A different verb phrase may be used for
each direction between two entities, “is
enrolled in” versus “is being studied by”
Cardinality and Relationships
p. 299 (180)
Here we are using the Martin notation; many others exist
• Relationships are described by how many
records of each entity may be related: 0
(shown by a ‘0’), 1 (shown with a single or
double line), or many (shown by a trident)
• Cardinality of zero means the relationship
is optional in that direction
• One-to-one is a unique relationship
Cardinality and Relationships
• Cardinality conveys the minimum and maximum
number of relationships, and must be defined in
both directions for all relationships:
– Only one
– Zero or one
– One or many (more)
– Zero, (one), or many
– Many (only >1)
Cardinality and Relationships
• To determine cardinality, ask “for one
record in A, how many possible records
could exist in B?”
[Diagram: entities A and B]
• Consider extreme cases; a Customer may
have no Orders briefly, before their first
order is completed
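The question "for one record in A, how many records could exist in B?" can be explored against sample data, as in this Python sketch. The customers and orders are made-up sample records. Observed data only hints at cardinality; the business rules have the final say.

```python
from collections import Counter

# Hypothetical sample data: C3 is a new customer with no orders yet.
customers = ["C1", "C2", "C3"]
orders = [("O1", "C1"), ("O2", "C1"), ("O3", "C2")]  # (order_id, customer_id)

orders_per_customer = Counter(cust for _order, cust in orders)
counts = [orders_per_customer.get(c, 0) for c in customers]


def describe(minimum, maximum):
    """Name the cardinality implied by a minimum and maximum count."""
    if maximum <= 1:
        return "zero or one" if minimum == 0 else "only one"
    return "zero, one, or many" if minimum == 0 else "one or many"


customer_to_order = describe(min(counts), max(counts))
```

The customer with no orders is the extreme case that makes the relationship optional in that direction.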
Degree of Relationships
• The degree of a relationship is the number
of entities involved
• Most relationships are binary (two entities)
• Recursive (unary) relationships involve one
entity, e.g. list of employees and managers
• N-ary relationships (e.g. 3-ary, or ternary)
involve more than two entities
Foreign Keys
• A foreign key (FK) is an attribute in one
entity which is the primary key (PK) of
another entity; it establishes the
relationship between the two entities
– Primary key must be unique for each record,
but a foreign key value may appear many times
– Only one PK-FK connection is required for the
relationship to exist
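A minimal sketch of a PK-FK connection, using Python's built-in sqlite3 module. The table and column names are hypothetical. Note that SQLite only enforces foreign keys once the foreign_keys pragma is switched on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.execute("CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("""
    CREATE TABLE customer_order (
        order_id    INTEGER PRIMARY KEY,                            -- its own PK
        customer_id INTEGER NOT NULL REFERENCES customer(customer_id)  -- FK
    )
""")
conn.execute("INSERT INTO customer VALUES (1, 'Fred')")
# The same FK value may appear many times: one customer, two orders.
conn.executemany("INSERT INTO customer_order VALUES (?, ?)", [(10, 1), (11, 1)])

try:
    conn.execute("INSERT INTO customer_order VALUES (12, 99)")  # no customer 99
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

order_count = conn.execute(
    "SELECT COUNT(*) FROM customer_order WHERE customer_id = 1"
).fetchone()[0]
```

The order pointing at a nonexistent customer is refused, while the repeated FK value is accepted.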
Other Relationships
– Entity with FK generally has a PK of its own
• A PK may also be a FK
– Especially for 1:1 relationships or
when generalization is used
• An associative entity builds a concatenated
primary key from more than one entity
– Uses a diamond shape inside the normal box
to show its special nature
p. 301 (182)
Other Relationships
• A many-to-many (non-specific) relationship
implies a lot of one-to-many relationships
– Often use an associative entity to bridge
between them
• An identifying relationship is when a parent
entity’s PK is used as part of the PK for a
child entity
– Child entity is then considered “weak”
because it depends on the parent
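A many-to-many relationship bridged by an associative entity could be sketched as follows with Python's sqlite3 module. The student, course, and enrollment tables and their contents are assumptions for illustration. The associative entity's primary key concatenates the primary keys of its two parents.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE course  (course_id  TEXT PRIMARY KEY, title TEXT);
    -- Associative entity: one row per (student, course) pair.
    CREATE TABLE enrollment (
        student_id INTEGER REFERENCES student(student_id),
        course_id  TEXT    REFERENCES course(course_id),
        PRIMARY KEY (student_id, course_id)
    );
    INSERT INTO student VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO course  VALUES ('C101', 'First course'), ('C102', 'Second course');
    INSERT INTO enrollment VALUES (1, 'C101'), (1, 'C102'), (2, 'C101');
""")


def courses_for(student_id):
    """One student to many enrollment rows."""
    rows = conn.execute(
        "SELECT course_id FROM enrollment WHERE student_id = ? ORDER BY course_id",
        (student_id,),
    )
    return [row[0] for row in rows]


def students_in(course_id):
    """One course to many enrollment rows."""
    rows = conn.execute(
        "SELECT student_id FROM enrollment WHERE course_id = ? ORDER BY student_id",
        (course_id,),
    )
    return [row[0] for row in rows]
```

Each side now has a plain one-to-many relationship with enrollment, which replaces the original many-to-many relationship.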
Supertype
• A supertype is the result of generalizing
similar characteristics of several entities
– E.g. Students and Faculty are both People
– Also used as basis for object modeling
– Also known as an “is a”, “was a”, or “could
be a” relationship
– Uses one-to-one relationships
Subtype
• The subtype inherits some characteristics
from the supertype, and adds other specific
characteristics (attributes) to each entity
• The same entity can be both supertype and
subtype from different perspectives
– Kind of like you could be a child and a parent
at the same time
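Since generalization is also the basis for object modeling, the Students and Faculty example can be sketched with Python classes. The attribute names are hypothetical. Each subtype inherits the supertype's attributes and adds its own.

```python
from dataclasses import dataclass


@dataclass
class Person:              # supertype: characteristics shared by all people
    person_id: int
    name: str


@dataclass
class Student(Person):     # subtype: "is a" Person, plus a specific attribute
    major: str


@dataclass
class Faculty(Person):     # subtype: "is a" Person, plus a specific attribute
    department: str


ann = Student(person_id=1, name="Ann", major="Information Systems")
bob = Faculty(person_id=2, name="Bob", department="Computing")
```

In a relational design the same idea usually appears as one table per type, with the subtype's primary key also serving as a foreign key to the supertype.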
Data Modeling Process
• Data models evolve throughout the life
of the system
• An organization may plan on a large scale
using strategic data modeling to create an
enterprise data model
• This is refined for each system with an
application data model
Data Modeling Process
• To start the model, look for nouns which
are frequently used during fact finding;
consider each a possible entity
• Note that entities should each appear lots
of times; if it’s rarer than that, it may not
be an entity
• Give entities a singular name, not plural
– Customer, not Customers
Data Modeling Process
• Independent entities exist without any other
entities, and are often found first
• Don’t be afraid to reconsider the structure
of each entity, or remove useless ones
– This is an iterative process!
• Then name each relationship and define
its cardinality
Data Modeling Process
• Identify keys for each entity; keep them as
simple as possible (PK, FK)
• Look for supertypes and subtypes
• Describe all data elements for each entity
– Identify what type of data they will contain
– Identify default values and whether they
are mandatory
Data Modeling Process
• The bottom line for keys is:
– Each entity must have exactly one PK
(which may be concatenated)
– Alternate keys are completely optional
– Each entity may have from zero to many FKs
– Each FK is a PK in another, related entity
– Only one PK-FK relationship is needed to
relate two entities
– Some keys are not inherently meaningful data
Data Normalization
• Analysis of a data model for
implementation is done using
data normalization
– Normalization organizes data attributes to
form simple, non-redundant, flexible,
adaptive entities
• There are five levels of data normalization,
of which three are generally used
First Normal Form (1NF)
• An entity is in first normal form if there are
no attributes which can have more than one
value for each instance (record) of the entity
• Attributes which could have more than one
value for a given entity belong to a different
kind of entity
• In other words, every attribute appears only
once for each record
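As a small illustration of reaching first normal form, the Python sketch below moves a multi-valued attribute into its own kind of entity. The customer records and phone numbers are made-up sample data.

```python
# Not in 1NF: "phones" can hold more than one value per record.
unnormalized = [
    {"customer_id": 1, "name": "Fred", "phones": ["555-0100", "555-0101"]},
    {"customer_id": 2, "name": "Wilma", "phones": ["555-0200"]},
]

# In 1NF: the customer entity keeps only single-valued attributes...
customers = [
    {"customer_id": r["customer_id"], "name": r["name"]} for r in unnormalized
]

# ...and each phone number becomes a record in a separate entity,
# related back to its customer by customer_id.
customer_phones = [
    {"customer_id": r["customer_id"], "phone": p}
    for r in unnormalized
    for p in r["phones"]
]
```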
Second Normal Form (2NF)
Look at concatenated keys only!
• Must be first normal form, and:
• Each non-primary-key attribute is uniquely
determined by the entire primary key
• Non-primary-key attributes may not be
dependent on only part of the primary key
– If any are, move them to another table which
uses only that part of the primary key
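A sketch of the second normal form fix, assuming a hypothetical order line entity whose concatenated key is (order_id, product_id). The product name depends only on product_id, which is part of the key, so it moves to a table keyed by that part alone.

```python
# Key is (order_id, product_id); product_name depends on product_id only.
order_lines = [
    {"order_id": 1, "product_id": "P1", "product_name": "Widget", "quantity": 2},
    {"order_id": 1, "product_id": "P2", "product_name": "Gadget", "quantity": 1},
    {"order_id": 2, "product_id": "P1", "product_name": "Widget", "quantity": 5},
]

# New table keyed by the part of the key that product_name depends on.
products = {r["product_id"]: r["product_name"] for r in order_lines}

# Remaining attributes depend on the entire concatenated key.
order_lines_2nf = [
    {k: r[k] for k in ("order_id", "product_id", "quantity")} for r in order_lines
]
```

After the split, "Widget" is stored once rather than once per order line.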
Third Normal Form (3NF)
• Must be second normal form, and:
• The value of each non-primary-key attribute
is not dependent upon any other
non-primary-key attribute
– Everything depends only on the primary key
• The two ways to look for this are derived
attributes and transitive dependencies...
Third Normal Form (3NF)
• Derived attributes (data) are fields
calculated or logically derived from
other fields
– Exception: OK to keep attribute if multiple
entities are involved in deriving an attribute
• Transitive dependencies may exist even in
tables with non-concatenated keys; one occurs
when a non-key attribute depends on another
non-key attribute
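Both third normal form problems can be seen in one hypothetical order entity, sketched below in Python with made-up data. The total is derived from quantity and unit price, and the customer's city depends on customer_id rather than on the order's key.

```python
orders = [
    {"order_id": 1, "customer_id": 7, "customer_city": "Philadelphia",
     "quantity": 2, "unit_price": 4.0, "total": 8.0},
    {"order_id": 2, "customer_id": 7, "customer_city": "Philadelphia",
     "quantity": 1, "unit_price": 3.0, "total": 3.0},
]

# Transitive dependency removed: city moves to an entity keyed by customer_id.
customers = {r["customer_id"]: {"city": r["customer_city"]} for r in orders}

# Derived attribute removed: "total" is no longer stored.
orders_3nf = [
    {k: r[k] for k in ("order_id", "customer_id", "quantity", "unit_price")}
    for r in orders
]


def order_total(order):
    """Recompute the derived value when it is needed."""
    return order["quantity"] * order["unit_price"]
```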
Third Normal Form (3NF)
• Or in brief, for third normal form…
An entity is in third normal form if every
non-primary key attribute is dependent on
the primary key, the whole primary key, and
nothing but the primary key
(as in, “Do you swear to tell the truth…”)
Further Normalization
• Additional improvement in data structure
is possible through “Simplification by
Inspection” - look for other redundancies
or simplifications possible
• Many CASE tools can also inspect for first
level normalization, but generally no further
• Just for the record, here are the 4th and 5th
normal forms…
Fourth and Fifth Normal Forms
INFO 605 text, pp. 351-354
• Fourth normal form (4NF) involves
removing multivalued dependencies
– If a pair of records has two matching attributes,
decompose the data structure to remove that
• Fifth normal form (5NF) involves removing
join dependencies (nearly impossible to do)
– This is when business rules define a connection
among many entities (e.g. if you replace a tire,
you must also replace the valve stem)
Process Modeling
• Process modeling describes the way data
flows throughout an organization or system
• A context diagram is a special process
model which shows interfaces
• Data flow diagrams (DFDs) (a.k.a. bubble
chart or transformation graph) are the most
common process model
Data Flow Diagrams
p. 346 (213)
• Notation has three shapes
– Processes are in rounded-corner rectangles
– External systems and users are in squares
– Open-ended boxes are data storage
files (may be more general than a single entity)
• Arrows show how data flows from one
shape to another
This is the Gane and Sarson notation
DFD is not a Program Flowchart
• Data Flow Diagram
– Abstract
– Can have parallel (simultaneous) activities
– Shows all possible paths of data
– Has no time scale, no decisions or logic
• Program Flowchart
– Precise
– Shows one activity at a time
– Must show loops and branches (decisions)
– Often must recognize time dependencies
Data Flow Diagrams
• Popular for supporting BPR
• Processes respond to business events
and conditions
• Processes transform data into information
• A system embodies a set of processes
Rules for Data Flow Diagrams
• A user or external system can only connect
to one or more process boxes
• Each process will connect to at least one
user or external system, and one data store
– Each process may send data to a data store,
and/or get data from a data store
– Processes rarely connect to other processes
– Each process needs data flowing in and out of it
DFD Cleanup
• Every data store needs data flowing both in
and out (no black hole = inputs but no
output, or miracle = outputs without input)
• Fix processes which have logically
incomplete inputs and outputs
• Leave in processes which calculate
something, make decisions, manipulate
data, or organize data
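The black hole and miracle checks are mechanical enough to sketch in Python. The DFD below is a hypothetical example, represented simply as a list of (source, destination) data flows.

```python
# Hypothetical DFD: each tuple is one data flow arrow.
flows = [
    ("Customer", "Take Order"),    # external entity -> process
    ("Take Order", "Orders"),      # process -> data store
    ("Orders", "Ship Order"),      # data store -> process
    ("Ship Order", "Customer"),    # process -> external entity
    ("Take Order", "Audit Log"),   # written to, but never read
]
data_stores = {"Orders", "Audit Log"}


def check_stores(flows, data_stores):
    """Flag data stores that lack an inflow or an outflow."""
    problems = {}
    for store in sorted(data_stores):
        has_input = any(dst == store for _src, dst in flows)
        has_output = any(src == store for src, _dst in flows)
        if has_input and not has_output:
            problems[store] = "black hole"   # inputs but no output
        elif has_output and not has_input:
            problems[store] = "miracle"      # outputs without input
    return problems


problems = check_stores(flows, data_stores)
```

Here the Audit Log store is flagged, which prompts the question of which process should read it.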
Process Decomposition
• A process transforms or responds to
incoming data or events
– Focus on what is done, and by whom
– Ignore how it is accomplished
• Process decomposition breaks a system
down into smaller subsystems and
processes, until each is readily understood
Decomposition Diagram
p. 350 (243)
• A decomposition diagram uses an
organization chart structure to show
how a system is broken down logically
into smaller pieces or functions
– Car: start, go faster, slow down, turn, stop
– University: admissions, registration, take
courses, grading, graduation
Other Processes
• Functions are related ongoing activities
• Events (transactions) are units of work
performed at a certain time
– Events tend to activate various functions
• Elementary (primitive) processes are the
lowest level of detail in a process model;
should have a strong action verb
Process Logic
• Then identify the logic involved in
processes using Structured English
• Use simple declarative sentences to describe
– Sequences of actions
– Conditional actions (if…then)
– Decision tables
– Iterations
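To show how Structured English constructs map onto program logic, here is a small Python sketch in which each Structured English sentence appears as a comment above the code that carries it out. The order approval rule and the discount table are invented for illustration.

```python
# Decision table: (is a member?, total at least 100?) -> discount rate
DISCOUNT = {
    (True, True): 0.15,
    (True, False): 0.05,
    (False, True): 0.10,
    (False, False): 0.0,
}


def process_order(line_items, credit_limit):
    # Sequence: set the order total to zero.
    total = 0.0
    # Iteration: for each line item, add quantity times unit price to the total.
    for quantity, unit_price in line_items:
        total += quantity * unit_price
    # Conditional: if the total exceeds the credit limit, then hold the order.
    if total > credit_limit:
        return "hold"
    # Otherwise approve the order.
    return "approve"


result = process_order([(2, 10.0), (1, 5.0)], credit_limit=20.0)
```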
Data Packets
• Think of data between shapes as packets
of information, regardless of their actual
contents or form (e.g. drive up window air
tube at bank)
• It may help to start at a very high level,
then decompose each step into more
detailed processes; a composite data flow
Other Considerations
• Different types of data may be distinguished
at a junction
• A control flow represents an event which
triggers a process (end of month, etc.)
• More detailed process modeling can be
performed ad nauseam
Network Modeling
• Network modeling describes a system in
terms of its business locations
• These locations may cover suppliers,
customers, and various aspects within
the system
• A location connectivity diagram may be
used to show the network model
Network Hardware
• For more information on the physical parts
of a network, try the Cisco tutorials, such as
for educational or small business networks
Model Synchronization
• It is important to make sure that the
data, network, interface, and process
models agree
• Map Data to Process, and Data to Location
using a CRUD matrix
• Optionally, map Process to Location
CRUD matrix
• A CRUD matrix maps two system models
to ensure complete coverage and
coordination of requirements
• CRUD refers to the possible activities
– Create new data
– Read existing data
– Update or change existing data
– Delete existing data
CRUD matrix
• The CRUD matrix shows each element
from two different models, and identifies
which properties (permissions) exist
for communication
• A blank indicates those two elements are
not related for those models
• Other properties can be defined as needed
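A Data-to-Process CRUD matrix can be held as a simple lookup, as in this Python sketch. The entities, processes, and permissions are hypothetical. A missing pair plays the role of a blank cell, and the helper reports which activities no process performs on an entity.

```python
# Hypothetical matrix: (entity, process) -> letters from "CRUD".
crud = {
    ("Customer", "Take Order"): "CR",
    ("Customer", "Ship Order"): "R",
    ("Order", "Take Order"): "C",
    ("Order", "Ship Order"): "RU",
}


def missing_actions(crud, entity):
    """List the CRUD activities that no process performs on the entity."""
    covered = set()
    for (ent, _process), actions in crud.items():
        if ent == entity:
            covered.update(actions)
    return sorted(set("CRUD") - covered)


customer_gaps = missing_actions(crud, "Customer")
order_gaps = missing_actions(crud, "Order")
```

A gap is not always an error, but it is worth asking whether, for example, orders are truly never deleted or a process is missing from the model.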
Process-Location Association
• Similarly, each process can be mapped to
the locations from which it is performed
Requirements Traceability
• Similar matrices can be done to map
between the system requirements and
the major functions
• This proves where each requirement is
implemented in the system
• Tedious to generate, but invaluable!