Logical Data Modelling - DCU School of Computing

Download Report

Transcript Logical Data Modelling - DCU School of Computing

Logical Data Modelling
 The sole purpose of an Information System is to support or
automate business activities by storing and processing
relevant business information or data.
 It is therefore critical to the success of any IS development
that the meaning, structure and business rules of the required
data are fully analysed, understood and modelled.
 During the Investigation phase we are concerned with
understanding the underlying (i.e. logical) data requirement
rather than making decisions about its physical
implementation.
Current and Required Data
 Most of the data required for the future system will be the
same in content or meaning as that used currently.
 The other form of data analysis concerns requirements for
new business data.
 The approach of starting with an analysis of existing data (or
even re-using existing data models) will provide the most
rigorous and efficient approach.
 This approach will also help in driving out restrictions in data
support arising from existing technical constraints.
Physical vs Logical Data Structures



An organisation’s data will be physically stored in many
different places, e.g. paper files, computer files.
This data will almost inevitably contain duplications and
compromises due to the physical restrictions of storage,
processing or practicality.
Example:
 A physical purchase order form will hold information about products
(product name, product number, product price), suppliers (supplier name,
supplier address), the order’s heading (purchase order number, purchase
order date) as well as the quantity of each product ordered (quantity
ordered).
 While we may have a single physical grouping of data on one purchase
order form, what we actually have is information about several different
things - products, suppliers, and purchase orders.
 In other words the underlying logical view is of a number of separate data
groupings, each describing a different business concept or object.
 We will also find that information on, for example, products is physically
held in many other places, such as on customer orders, invoices and
despatch notes.
 This all leads to a confusing mess of duplication and interconnecting
information, which in turn leads to problems in maintaining data
consistency and integrity.
The LDM
 In SSADM the vehicle for analysing the logical structure
of an organisation’s information is the Logical Data Model
(LDM).
 A Logical Data Model is a way of graphically representing
what that information is really all about, how it relates to
other information and business concepts, and how
business rules are applied to its use in the system.
 The LDM is possibly the most important and ultimately
the most rigorous product of an entire SSADM project.
 Logical Data Models consist of two parts:
 a diagram called the Logical Data Structure (LDS);
 a set of associated textual descriptions that explain each part of
the diagram.
Entities
 Any object or concept about which a system needs to
hold information is known as an Entity Type (or entity for
short).
 To be a valid entity we must wish to hold information on
more than one occurrence of it. Entity occurrences are
real world instances of an entity type.
 For example the entity type Supplier will have
occurrences such as:
Supplier No
3621
2327
Supplier Name
Off Beat Recordings
Bella Sonic
Supplier Address
12 High Street etc.
Lake Industrial Estate etc.
Entities (continued)
 The symbol for an entity in an LDS is a round cornered
rectangle containing the entity’s name (which must be
unique):
Supplier
unique name
 An entity must have a number of properties to qualify as
such:
- There must be more than one occurrence of the entity.
- Each occurrence should be uniquely identifiable.
- There must be data that we want to hold about the entity.
- It should be of direct interest to the system.
Attributes
 Each item of information (or data) that we hold
about an entity is known as an attribute or data
item.
 Examples of attributes for Supplier might be
supplier number, supplier name, supplier address,
and supplier telephone no.
 The detail of an entity’s attributes is not formally
included on the LDS itself. This is held in separate
textual descriptions, which will be discussed later.
Relationships
 Entities do not exist in isolation, but are related to other
entities.
 In physical data structures these relationships are
signified by physical links such as pointers or placement
in the same file or document.
 In logical models relationships represent business
associations or rules and not physical links.
 Any entities that are related are linked by a line on the
LDS.
 The line is labelled with the name of the relationship, and
is named in both directions.
supplier for
Supplier
?
?
placed with
Purchase
Order
Degree
 The number of occurrences of each entity type participating in a
given relationship is denoted by the degree or cardinality of that
relationship, and illustrated on the LDS by adding ‘crow’s feet’ to the
relationship’s line.
A
B
A
B
A
B
m:n
1:m
1:1
 There are three types of degree:
 Many to Many (m:n). This tells us that each occurrence of A is related to one or more
occurrences of B, and each occurrence of B is related to one or more occurrences of
A.
 One to Many (1:m). This tells us that each occurrence of A is related to one or more
occurrences of B, but each occurrence of B is related to only one occurrence of A.
 One to One (1:1). This tells us that each occurrence of A is related to only one
occurrence of B, and each occurrence of B is related to only one occurrence of A.
Optionality
 Each relationship is further annotated to show if it must
exist for all occurrences of the participating entity types.
 If there can be occurrences of one entity that are not
related to at least one occurrence of the other, then the
relationship is said to be optional for that entity.
 The relationship line is then converted to a dashed line at
its optional end (which could mean both ends if both
entities are optional participants).
Supplier
supplier for
placed with
Purchase
Order
Developing the LDS
 To start with we are only interested in producing a high
level model of the current system’s underlying data
structure.
 Due to its largely conceptual nature Logical Data
Modelling can be one of the most intense activities of an
SSADM project.
 In many projects development of the LDM is started by
holding brainstorming sessions with small groups of
analysts and users.
 With a little practice analysts often find that the best
method of data modelling is to draw up possible LDSs
almost instinctively
 Relationships are added as each entity is identified and
then checked with users on the spot.
 This approach has a lot to recommend it, particularly at
this level of detail or for small systems, as diagrams are
produced and verified quickly.
Identifying Entities
 To identify entities in the current environment we can
begin by looking at our physical data stores to find out
exactly what it is that they hold information about.
 If we take the customer order file and discuss it with
users, we find that it not only contains details of each
individual order, but of the customers themselves,
 i.e. customer address, customer telephone number etc., and so
encompasses at least two entities, namely Customer and
Customer Order.
Supplier
Purchase Order
Customer
Delivery (from a supplier)
Product
Stock
Customer Order
Despatch (to customers)
Verification
 Once the list has been drawn up we should verify it with
key users during preliminary scoping interviews.
 The key questions to ask of each entity are:
 Are any of the candidates merely attributes of another entity?
 Do any of the candidates represent a subset of occurrences of
another entity?
 Do all of the entities have a unique identifier?
 During this process we may discover new entities, merge
existing entities or discard candidates as being outside
the area of investigation.
Note: There will often be relationships between entities that exist in the
real world, but which are not of relevance to the system under discussion.
E.g a customer of ZigZag may well be employed by one of its suppliers.
This is NOT something that ZigZag will be interested in recording!
Adding Relationships
 We now examine each entity to see if it is directly related,
in a way that is of interest to the system, to any of the
other entities.
 The best way to do this is in discussion with users, either
taking each entity in turn, or starting with a key entity and
moving around the LDS “network” as the relationships
are identified.
 Having identified where we think relationships exist, we
now consider their degree, optionality and names.
 We do this by identifying the business rules that apply to
each entity pairing.
 The basic process is the same for all pairings, so we will
look at just one example.
Stock - Delivery
 We first consider the relationship from the
Stock perspective:
 Each Stock occurrence will consist of a quantity of a
single product, all of which was delivered on the
same delivery.
 If within the depot we have a quantity of a given
product, some of which was delivered in one delivery
and some in another, then we will have more than
one Stock.
 This is an example of one of ZigZag’s business rules,
and one that will continue in the new system.
 Thus each Stock occurrence is related to just
one Delivery.
 Each delivery may contain a number of different
products, each of which will be stored as a separate
stock (remember that each Stock occurrence is a
quantity of a single product).
 Thus each Delivery is related to one or more
Stock occurrences.
Delivery
Stock
Stock – Delivery (continued)
 We now consider the optionality of the relationship:
 Each Stock must have been delivered by a Delivery.
 So the relationship at the Stock end is mandatory.
 However a Delivery could be rejected for quality reasons
by the depot, in which case the delivery would be recorded
but would not be related to any subsequent Stock
occurrences.
 So the relationship is optional at the Delivery end.
 Choosing a name is often the hardest part of the
procedure.
 It is important to name a relationship in both directions
as it forces us to examine the true nature of the
relationship, sometimes leading to the discovery of
additional relationships or even entities.
 We should always try to choose phrases that
accurately reflect the users’ view of the relationship. In
our example it is not too difficult to find reasonable
names: delivery of and delivered by.
Delivery
Stock
Delivery
delivery
of
delivered
by
Stock
Overview LDS

Continuing this process for all of the relationships identified on the
matrix gives us a first-cut overview LDS for the current system:
supplier
of
Supplier
supplier
for
supplier of
Despatch
despatch
of
Delivered
by
placed
with
Purchase
Order
ordered
by
despatched
in
supplied by
ordered by
Product
customer
order for
order for
results in
held as
holding of
result of
delivered by
Delivery
Stock
delivery of
Customer
orderer of
ordered by
Customer
Order
Drilling Down…….
 The overview LDS provides us with a good basis for
building a more complete model of current data.
 We begin the process of creating a detailed model by
looking at this model and discussing it with users to
check our understanding of the scope of current data and
to uncover lower level entities which can be added
immediately.
Product
Type
Product
Depot
Zone
Supplier
Invoice
Delivery
Masters and Details
 Most relationships are 1:m.
 The entity at the ‘1’ end is known as the
master and the entity at the ‘m’ end as the
detail.
Master
Detail
Supplier
 The terms master and detail refer only to an
entity’s role in a particular relationship.
 It is quite possible for an entity to be the
master in one relationship and the detail in
another.
Product
Stock
Keys
 We should be able to select at least one identifier for each
entity type,
 i.e. an attribute that enables each occurrence of an entity to be
uniquely identified,
 e.g. for Customer we could use customer number.
 Any attribute or set of attributes which together uniquely
identify an entity is known as a candidate key.
 One of these candidates (there will often only be one)
should be selected as the primary key.
 Whenever we require direct access to an entity, the
primary key is used to identify which occurrence we are
interested in.
 For example, if we needed to access the Supplier entity to
find out a supplier’s address, we would use the primary
key of supplier number to identify the correct occurrence.
Foreign Keys
 If we have a relationship between two entities we need to
be able to associate the occurrences at one end with the
related occurrences at the other.
 In a relational model (such as the LDM) we do this by
including the primary key of the master in the set of
attributes of the detail.
 The copy of the master’s primary key in the detail entity is
known as a foreign key.
Key Navigation

Supplier attributes:
 Supplier No. (Primary Key), e.g 271
 Supplier Address etc.

Purchase Order attributes:
 P.O. Number (Primary Key), e.g 5001
 P.O. Date etc.
 Supplier Number (Foreign Key), e.g 271


Supplier
To access all purchase orders placed with
supplier number 271, we look for all
occurrences of Purchase Order with a
supplier number attribute value of 271.
Coming in the opposite direction, to access
the supplier for purchase order 5001, we
look for the single occurrence of the
Supplier entity whose primary key is equal
to the supplier number given in the foreign
key of purchase order number 5001, i.e.
supplier number 271.
supplier for
placed with
Purchase
Order
Types and Notation

Primary Keys belong to one of three types:
1. A Simple Key, consisting of a single attribute;
2. A Compound Key, consisting of two or more foreign keys;
3. A Hierarchic or Composite Key, consisting of one or more
foreign keys and a qualifying non-foreign key attribute.

Notation

The primary key is underlined and the foreign key preceded by
an asterisk to show the contents of each entity:
Supplier (supplier number, supplier address, supplier tel. no.)
Purchase Order (P.O. number, P.O. date, *supplier number)
Resolving Many-to-Many Relationships
 Many design techniques can only be carried out on
hierarchical (i.e. master-detail) relationships which are
hidden by m:n relationships.
 m:n relationships make navigation around the model very
difficult or even impossible (and, although we are not
really concerned with technical issues at this point, they
cannot be implemented).
 m:n relationships very often hide information about the
participating entities or the relationships themselves.
Resolving Many-to-Many Relationships - example
 Each Product may be ordered by one or more Purchase
Orders.
 Each Purchase Order must be an order for one or more
Products.
 So where do we place the quantity ordered?
Resolving Many-to-Many Relationships - example
Purchase Order Number:
0021113
Purchase Order Date:
Supplier:
Delivery Address:
2327
Depot 1
Bella Sonic
Harrow Way
Lake Industrial Estate
Harrow
Unit 5
HA4 3NB
4/3/01
NE3 7AJ
Qty
Your
Product
Ref
Our
Product
Description
Format
Unit
Price
Ref
100
BJB001
884690
The Best of Johnnie Boy
CD
6.99
500
3485VHS/3
993201
Unbranded Blank 3hr Video Tapes
BV
0.53
 If we look at a sample
purchase order of
ZigZag, we will
discover that details
of quantities and
products are held in
individual purchase
order lines.
Resolving Many-to-Many Relationships - example
 So in this case we can
choose a natural link
entity, which we will call
Purchase Order Item.
 Purchase Order Line
sounds a bit too similar to
the physical printed line
on the order form.
 The key for Purchase
Order Item will be
Purchase Order Number
plus Product Number – a
compound key.
Purchase
Order
Product
contains
ordered by
order for
contained in
Purchase
Order
Item
Relationships in M:N Resolutions
 Whenever we introduce a link entity we need to ensure
that the relationships we recorded previously with its
master entities are still valid.
 For example, in our overview LDS we recorded a many to
many relationship between Despatch and Customer
Order.
 This may at a high level appear reasonable as it is
common for some items in an order to go into one van
load (Despatch) and some into another.
 However, the contents of each item within the order is
always despatched in its entirety in the same van load
(i.e. if 3 copies of Puccini’s Tosca are ordered within a
single customer order, they will all be delivered together).
 Therefore, each Despatch is actually related to many
Customer Order Items, rather than to whole Customer
Orders.
Link Entities




Depot Zone and Product Type provide another more complex
illustration of many to many relationships:
Each Depot Zone may store one or more Product Types.
Each Product Type must be storable in one or more Depot Zones.
The attributes that make up Depot Zone are Depot Zone Number,
Shelf Height, and Depot Zone Description etc. Depot Zone
Number is a unique identifier that is assigned to each Depot
Zone, and is the label attached to the end of each row of shelving
in the zone. Depot Zone Description would include values such
as CD and DVD, Videos and Books, and Tape etc, which describe
the sorts of products that the shelving in each zone can
accommodate.
The attributes of Product Type include Product type code and
Product type name, where the Product type code is an
abbreviation of the Product type name, e.g. ‘BV’ for ‘Blank
Video’, ‘DVD’ for ‘DVD’ etc.
So, for example, we might have the following cases:
Depot Zones 101 and 105 store ‘DVD’ and ‘CD’ product types;
Depot Zone 102 stores ‘VHS’, ‘BV’ and ‘SPB’ (small paperback book) product
types.
Link Entities
 To make these associations we would have to set up lists
of foreign keys in both entities, of arbitrary length.
 Significant maintenance overhead
 Navigation around the model very difficult
 Against the rules of relational data modelling
 Solution: A link entity
 Each occurrence will store a valid association or pairing of a
Depot Zone occurrence with a Product Type occurrence, such
as:
Depot Zone
Product Type
Product
101
DVD
Depot Zone
Type
101
CD
105
DVD
allocated by
stored
in
105
CD
allocation for
allocates
102
VHS
storage for
102
BV
102
SPB
Depot Zone
Allocation
Pigs Ear
substitute for
Product
sub with
substituted by
sub by
Product
Substitute
Product
sub of
sub for
Resolving One-to-One Relationships
 The problems associated with 1:1 relationships are less
clear-cut than with m:n relationships:
 1:1 relationships often obscure an underlying single entity.
 There may be a missing link entity.
 Later design techniques may require all relationships to be
master-detail.
 In the ZigZag overview LDS there are two 1:1
relationships - between Delivery and Purchase Order and
between Supplier Invoice and Delivery .
 Deliveries are identified by the purchase order they are satisfying
 The only information currently held about them details which
parts of the purchase order they have successfully delivered.
 It is quite easy in this case to view Delivery as a logical extension
(or conclusion) of a Purchase Order, so we will merge the two
entities and transfer all of Delivery’s relationships to Purchase
Order.
 To do this successfully, Purchase Order will contain attributes
delivery date and supplier’s delivery reference while Purchase
Order Item will contain quantity delivered.
Resolving One-to-One Relationships (continued)
Supplier
Invoice
Purchase
Order
Invoice
Item
Purchase
Order
Item
Supplier
Invoice
Purchase
Order
Purchase
Order
Item
Resolving One-to-One Relationships (continued)
Supplier
Invoice
Purchase
Order
Purchase
Order
Item
Purchase
Order
Supplier
Invoice
Purchase
Order
Item
Removing Redundant Relationships
 One of our aims when drawing up an LDS should be to
include only the minimum number of relationships needed
to apply all of the business rules relating to data.
 Any unnecessary relationships are termed ‘redundant’, and
will involve us in a maintenance overhead if implemented.
Customer
Customer
Order
Customer
Order
Item
Is this
relationship
Redundant?
 The major difference
between relationships
and a route map is
that each relationship
carries with it a
meaning, and so
different ‘routes’
between entities will
often have different
meanings, or enforce
different rules.
Removing Redundant Relationships

Purchase
Order

Is this
relationship
Redundant?
Supplier
Invoice


Purchase
Order
Item
Each Purchase Order may
be related to a number of
Supplier Invoice, each of
which is related to a PO
Item.
Each PO Item may relate to
just one Supplier Invoice,
which relates to just one
Purchase Order.
HOWEVER – Each Purchase
Order MUST contain at least
one PO Item.
If the Invoice is not present
then removing the direct
relationship would mean that
a relationship could not be
established between
Purchase Order and PO
Item.
Product
Type
Supplier
Depot
Zone
Depot Zone
Allocation
Product
Substitute
Product
Purchase
Order
Despatch
Supplier
Invoice
Customer
Order
Purchase
Order
Item
Customer
Order
Item
Stock
Customer
Selected ZigZag Entities and Attributes
PRODUCT
Product Number
*Product Type Code
Product Name
Product Description
Release Date
Sell-by Date (special
promotional products)
Sell-from Date
Standard Purchase Price
Standard Selling Price
PRODUCT SUBSTITUTE
*Product Number [substitute]
*Product Number [substituted]
PRODUCT TYPE
Product Type Code
Product Type Name
Product Type Description
DEPOT ZONE
Depot Zone Number
Depot Zone Description
PURCHASE ORDER
Purchase Order Number
*Supplier Number
Supplier’s Delivery Reference
Purchase Order Date
Purchase Order Status
Delivery Date
Delivery Start Time
Delivery End Time
SUPPLIER
Supplier Number
Supplier Name
Supplier Address
Supplier Tel. No.
Supplier Contact Name
SUPPLIER INVOICE
*Purchase Order Number
Supplier’s Invoice Number
Invoice Date
PURCHASE ORDER ITEM
*Purchase Order Number
*Product Number
*Invoice Number
Quantity Required
Quantity Confirmed
Quantity Delivered
Quantity Accepted
Invoiced Quantity
Agreed Unit Price
Required-By Date
Required-By Time-Period
STOCK
Stock Id
*Purchase Order Number
*Product Number
*Zone Code
Quantity Stocked
Quantity Stocked
Quantity Reserved
DEPOT ZONE ALLOCATION
*Depot Zone Number
*Product Type Code
Completing the Documentation
Entity Description
Purchase Order
Entity Name
Description
A request for purchase and delivery of goods
from a single supplier.
Attribute
Primary
Key
Purchase Order Number
Yes
Foreign
Key
Mandatory/
Optional
M
Purchase Order Date
M
Supplier Number
Yes
M
Purchase Order Status
M
Delivery Date
O
Delivery Start Time
O
Delivery End Time
O
must/may be
either
Link Phrase
one & only one
/or
Entity Name
/one or more
must be
placed with
one & only one
Supplier
may
result in
one or more
Supplier
Invoice
must
contain
one or more
P.O. Item
Entity Volumes:
Max.
15000
Min.
6000
Average
10000
User
Access
P.O. Clerk
Read, Create, Delete, Modify
Despatch Scheduler
Read
Purchaser
Read, Create
Growth Rate: 15% per year
Purchase Orders should be archived to tape six months
after the last related line has been delivered or cancelled.
Archiving
Small Projects
Entity Name
Short Description or
Comments (optional)
Min
Volume
Max
Volume
Ave
Volume
Growth
Rate
Supplier
Product
Depot Zone
Numbers rise at Xmas
400
10000
26
750
100000
40
500
25000
32
5%
25%
20%
1500
90000
16000
420000
10000
200000
15%
25%
Purchase Order
Purchase Order
Item
Storage Area or Aisle
within Depot
Numbers rise at Xmas
Attribute Name
Short Description or
Comments (optional)
Domain
Length
Purchase Order
Number
Purchase Order
Date
Purchase Order
Status
Automatically
generated by system
Date order placed
Integer
9
DDMMYYYY
8
P(Provisional) V(Placed)
C(Confirmed) D(Delivered)
1
Entity Description Table and Data Catalogue Table for Small Projects
Validating the LDM
 we need to check that the LDM can provide access to all of
the data items required by each update or enquiry process.
 Most processes will need to access a number of data items,
which will be specified by some selection criteria.
 These items will often be represented by the attributes of
more than one entity.
 navigate around the relationships of the LDS, applying the
selection criteria to filter out the entity occurrences we need
to provide all of the necessary data. These navigations are
called ‘Access Paths’.
Validating the LDM



For example, when
allocating a zone in which to
store the stock of a
particular product received
in a delivery (a process
called ‘Allocate Stock Zone’),
we will need to find out
which depot zones have
been designated for the
storage of that type of
product.
The entry point to the LDS is
via the product number in
the entity Product.
We can then access its
product type, and then the
possible zones in which this
product can be stored by
reading through all the
occurrences of Depot Zone
Allocation for that Product
Type.
Product
Type
Product
Depot Zone
Allocation
Depot Zone
Product Type
Supplier
Depot Zone
Allocation
Product
Substitute
Product
Purchase Order
Customer
Despatch
Supplier
Invoice
Customer Order
Purchase
Order
Item
Customer
Order
Item
Stock