Transcript CHAPTER

CHAPTER 4
DATABASES AND DATA
WAREHOUSES
A Gold Mine of Information
4-2
Introduction
Today, Organizations Need...




Information to compete effectively
Information just to stay alive in the information age
Information organized in such a way that you can
easily and quickly get to it
Information-processing tools that help you work with
information
Using Databases and Data
Warehouses Instead of
Shopping Carts
Mervyn抯
 Data warehouse and data mining tools

4-3
Introduction
YOUR FOCUS IN THIS CHAPTER
The Difference Between Logical and
Physical Views of Information
 Databases and Database Management
Systems
 How You Can Develop Database
Applications
 Data Warehouses and Data Mining Tools

4-4
Information Revisited
THREE THINGS
ORGANIZATIONS DO WITH
INFORMATION
1.Process information in the form of
transactions
2.Use information to make a decision
3.Manage information while it抯 used
4-5
Information Revisited
PROCESSING INFORMATION IN
THE FORM OF TRANSACTIONS



Such as payroll processing, order processing, and
handling your registration requests for classes.
This is called ONLINE TRANSACTION
PROCESSING (OLTP) - the gathering of input
information, processing that information, and updating
existing information to reflect the gathered and
processed information.
Operational databases support OLTP.
4-6
Information
Revisited
USING INFORMATION TO MAKE
A DECISION
For answering such questions as, 揌ow many
senior-level marketing majors have not taken
statistics?
 This is called ONLINE ANALYTICAL
PROCESSING (OLAP) - the manipulation of
information to support decision making.
 Data warehouses support OLAP.

4-7
Information Revisited
MANAGING INFORMATION
WHILE IT扴 USED
Determining who can view or use information
 Specifying how to back up information
 Identifying what storage technologies to use

Most importantly, managing information includes
organizing it so that people can logically use it
without having to know anything about its
physical structure. The difference between
logical and physical is key.
4-8
Information
Revisited
 In managing information, physical deals
with the

structure of information as it resides on various
storage media.
Logical deals with how knowledge workers view their
information needs, and includes such terms as:
–
–
–
–
–
–
CHARACTER - our smallest unit of information.
FIELD - group of related characters.
RECORD - group of related fields.
FILE - group of related records.
DATABASE - group of logically associated files.
DATA WAREHOUSE - information from many databases.
4-9
Databases
DATABASE
a collection of information that you organize and
access according to the logical structure of that
information.
A database is actually composed of two parts:
1. the information itself
–
the files that are logically associated
2. the logical structure of the information
–
called the data dictionary
4-10
Databases
A Database Is a Collection of
Information
Most databases contain two or more files with
related information.
 The Inventory database (Figure 4.4, page 125)
contains two files - Part and Facility.
 These two files are logically related because
parts are stored in facilities and because you
would use both of these files to manage your
inventory.

4-11
Databases
A Database Contains a Logical
Structure




You organize and access a database by its logical
structure, not its physical position.
DATA DICTIONARY - contains the logical structure of
information in a database.
The data dictionary contains the logical properties that
describe information in a database.
See Figure 4.5 (page 126) for the data dictionary of
the Percentage Markup field in the Inventory
database.
4-12
Databases
A Database Has Logical Ties Among
the Information



A PRIMARY KEY is a field in a database file that
uniquely describes each record.
A FOREIGN KEY is a primary key of one file that also
appears in another file. So, foreign keys specify how
files are logically related.
For example, the Part and Facility files are logically
related. So, in Figure 4.4 you can see that Facility
Number (the primary key for the Facility file) exists in
the Part file (where it抯 a foreign key).
4-13
Databases
A Database Contains Built-in
Integrity Constraints



An INTEGRITY CONSTRAINT is a rule that helps
assure the quality of the information in a database.
A registration database at your school includes
integrity constraints concerning prerequisites for
certain classes.
Our Inventory database includes an integrity
constraint that says a part in the Part file cannot be
assigned to a facility that does not exist in the Facility
file.
4-14
Database Management Systems
DATABASE MANAGEMENT
SYSTEM (DBMS)
the software you use to specify the logical
organization for a database and access it.
A DBMS contains 5 software components:
1. DBMS engine
2. Data definition subsystem
3. Data manipulation subsystem
4. Application generation subsystem
5. Data administration subsystem
4-15
DBMSs
DBMS ENGINE
accepts logical requests from the various other
DBMS subsystems, converts them to their
physical equivalent, and actually accesses the
database and data dictionary as they exist on a
storage device.
Recall that:


PHYSICAL VIEW deals with how information is
physically arranged, stored, and accessed on some
type of secondary storage device.
LOGICAL VIEW focuses on how you need to arrange
and access information to meet your particular
business needs.
4-16
DBMSs
DATA DEFINITION SUBSYSTEM
helps you create and maintain the data
dictionary and define the structure of the files in
a database.



You use this subsystem to define the information
logical structure when you first create a database.
Once you抳e created a database, you use this
subsystem to define new fields, delete fields, or
change field properties.
Figure 4.5 (page 126) contains this subsystem screen
for the Part file.
4-17
DBMSs
DATA MANIPULATION SUBSYSTEM
helps you add, change, and delete information in
a database and mine it for valuable information.


This subsystem is most often the primary interface
between you as a user and the information contained
in a database.
Tools in this subsystem include views, report
generators, query-by-example tools, and structured
query language.
4-18
DBMSs
DATA MANIPULATION TOOLS


VIEW - allows you to see the content of a database
file, make whatever changes you want, perform
simple sorting, and query to find the location of
specific information. See Figure 4.7 page 129.
REPORT GENERATOR - helps you quickly define
formats of reports and what information you want to
see in a report. See Figures 4.8 and 4.9 page 130.
4-19
DBMSs
DATA MANIPULATION TOOLS


QUERY-BY-EXAMPLE (QBE) TOOL - helps you
graphically design the answer to a question. Figure
4.10 (page 130) shows the QBE for displaying the
names and phone numbers of facility managers in
charge of parts that cost more than $10.
STRUCTURED QUERY LANGUAGE (SQL) - a
standardized fourth-generation language found in
most database environments. SQL is the same as
QBE, except that you perform a query by creating a
statement instead of pointing, clicking, dragging.
4-20
DBMSs
APPLICATION GENERATION
SUBSYSTEM
contains facilities to help you develop
transaction-intensive applications. This
subsystem includes:



Tools for creating data entry screens (See Figure 4.12
page 131 for an example)
Programming languages specific to a particular
DBMS
Interfaces to commonly used programming languages
that are independent of any DBMS.
4-21
DBMSs
DATA ADMINISTRATION
SUBSYSTEM
helps you manage the overall database
environment by providing facilities for:






Backup and recovery
Security management
Query optimization
Reorganization
Concurrency control
Change management
4-22
Database Models
THE RELATIONAL DATABASE
MODEL
a database model that uses a series of twodimensional tables or files to store information.



This is the most popular model.
Each table is called a RELATION.
A relation contains information about a particular
ENTITY CLASS (a concept - people, places, or things
- about which you wish to store information and that
you can identify with a unique key).
4-23





Database Models
Figure 4.14 (page 136) shows a relational database
for a video rental store.
The entity classes are Customer, Video, Video Rental,
and Distributor.
Notice how these tables are related to each other
through the use of foreign keys.
In the Video Rental relation, you抣l find a primary key
that uses more than one one field to create a unique
description. This is called a COMPOSITE PRIMARY
KEY.
A primary key that uses only one field is called an
ATOMIC PRIMARY KEY.
4-24
Database Models
THE OBJECT-ORIENTED (O-O)
DATABASE MODEL
a database model that brings together, stores,
and allows you to work with both information and
procedures that act on the information.

An OBJECT-ORIENTED DATABASE
MANAGEMENT SYSTEM (O-O DBMS) is the DBMS
software that allows you to develop and work with an
O-O database.
4-25




Database Models
This model takes advantage of the concept of an
OBJECT - a software module containing information
that describes an entity class along with a list of
procedures that can act on the information
describing the entity class.
Figure 4.15 (page 138) shows the same video rental
store using the O-O database model.
Notice that the objects (entity classes) - which
include Customer, Video Rental, Video, and
Distributor - contain both information and
procedures for working with that information.
See Appendix C for more on objects.
4-26


Developing Databases
DEVELOPING YOUR OWN
DATABASE
Being able to develop your own database is a part of
knowledge worker computing.
Building a database for your personal needs includes
the following 4 steps:
1. Defining entity classes and primary keys
2. Defining relationships among entity classes
3. Defining information (fields) for each relation
4. Using a data definition language to create the database

Follow along as we build the database to support the
report in Figure 4.16 on page 140.


You own a small business and are interested in
tracking employees by the department in which
they work, job assignment, and the number of
hours assigned.
Each of your employees can be assigned to only
one department, but a department may have many
employees (a department, however, may not have
any employees assigned to it). Each employee can
be assigned to any number of jobs and a job can
have many employees assigned to it, but it抯 not
necessary that any employees be assigned to a
certain job.
4-27
Developing Databases
#1 - DEFINING ENTITY CLASSES
AND PRIMARY KEYS
From the report in Figure 4.16, you can identify
the entity classes as Employee, Department,
and Job.
 Now, for each entity class, you must define a
primary key that provides a unique description.
These include:

•
•
•
Employee entity class - Emp ID (e.g., 2345 for Smith)
Department entity class - Dept (e.g., 15)
Job entity class - Job (e.g., 14 for Acct)
4-28
Developing Databases
#2 - DEFINING RELATIONSHIPS
AMONG ENTITY CLASSES

For this step, use an ENTITY-RELATIONSHIP
(E-R) DIAGRAM, a graphical method of
representing entity classes and their
relationships.

See Figure 4.17 (page 140) for the initial E-R
diagram of our database and a listing of E-R
diagram symbols.
4-29
EMPLOYEE




Developing Databases
M:1
DEPARTMENT
An Employee must be assigned to a Department.
An Employee cannot be assigned to more than one
Department.
A Department may have many Employees assigned
to it.
A Department is not required to have any Employees
assigned to it.
4-30



Developing Databases
After building the initial E-R diagram, you must follow
the process of normalization.
NORMALIZATION is a process of assuring that a
relational database structure can be implemented as
a series of two-dimensional tables.
Normalization includes the following 3 steps:
1.Eliminate repeating groups or M:M relationships
2.Assure that each field in a relation depends only on the primary
key of that relation
3.Remove all derived fields from the relations.
4-31





Developing Databases
The first rule of normalization states that no M:M
relationships can exist.
There is an M:M between Employee and Job.
You eliminate this by creating an INTERSECTION
RELATION - a relation you create to eliminate a
repeating group.
An intersection relation will have a composite primary
key that consists of the primary key fields from the
two intersecting relations.
In Figure 4.18 (page 142), we created an intersection
relation called Employee-Job to eliminate the M:M
relationship.
4-32
Developing Databases
#3 - DEFINING INFORMATION
(FIELDS) FOR EACH RELATION
In this step, you follow rules #2 and #3 of
normalization.
 Your goal here is two-fold:

1.Make sure that the information in each relation is indeed
in the correct relation
2.Make sure that the information cannot be derived from
other information.
4-33




Developing Databases
To determine if information is in the correct relation,
ask:
揇oes this piece of information depend only on the
primary key for this relation?
If the answer is yes, the information is in the correct
relation.
In the Employee relation (Figure 4.20 page 144), we
currently store Dept Sup. Does Dept Sup depend on
Emp ID?
The answer is no - Dept Sup depends on Dept, so it
should be in the Department relation.
4-34




Developing Databases
Derived information - information that can be
mathematically determined from other information should not be stored in your database.
For example, # Emp is a field in the Department
relation.
However, we can simply count the number of
occurrences of each Dept in the Employee relation
and determine the number of employees.
So, we remove # Emp from the database.
4-35
Developing Databases
#4 - USING A DATA DEFINITION
LANGUAGE TO CREATE THE
DATABASE
The final step is to actually create the relations
you identified in steps 1-3.
 You do this with a data definition language.
 This step includes:

–
–
–
Developing a data dictionary
Defining the various relations
Defining primary keys and relationships
字段名
Emp Id
Name
Dept
字段名
Job
Job Name
Employee Relation
类型
长度
4
字符型
20
字符型
2
字符型
Job Relation
类型
长度
2
字符型
10
字符型
说明
员工的代码
说明
工作的代码
Department Relation
字段名
类型
长度
说明
Department 字符型
2
部门的代码
Dept Sup
20
字符型
字段名
Emp Id
Job
Hours
Employee Relation
类型
长度
4
字符型
10
字符型
数字型
单精度
说明
员工的代码
4-36
Data Warehouses
DATA WAREHOUSE
a logical collection of information - gathered from
many different operational databases - that
supports business analysis activities and
decision-making tasks. Data warehouses...



are a logical extension of databases
support OLAP
are among the newest and hottest buzz words and
concepts in the IT field.
4-37
Data Warehouses
DATA WAREHOUSE FEATURES

Data warehouses combine information from different
databases
–

Data warehouses are multi-dimensional
–
–

Making them a true repository of all an organization抯
information
As opposed to 2 dimensions in the relational model
Often called hypercubes (See Figure 4.23 page 148)
Data warehouses support decision making
–
While databases support OLTP, data warehouses support OLAP
4-38
Data Warehouses
DATA MINING TOOLS
the software tools you use to query information in
a data warehouse.



QUERY-AND-REPORTING TOOLS - QBE tools,
SQL, and report generators.
INTELLIGENT AGENTS - various artificial
intelligence tools that form the basis for 搃nformation
discovery?in OLAP.
MULTIDIMENSIONAL ANALYSIS (MDA) TOOLS slice-and-dice techniques that allow you to view
multidimensional information from different
perspectives.
4-39
Data Warehouses
IMPORTANT CONSIDERATIONS
IN USING A DATA WAREHOUSE
Do you need a data warehouse?
 Do you already have a data warehouse?
 Who will the users be?
 How up-to-date must the information be?
 What data mining tools do you need?

How will changes in technology affect
organizing and managing information
 What types of database model and
databases are most appropriate

4-40
Managing Information
MANAGING THE INFORMATION
RESOURCE
How will changes in technology affect
organizing and managing information?
 What types of database models and databases
are most appropriate?
 Who should oversee the organization抯
information?

4-41
Managing Information
OVERSEEING YOUR
ORGANIZATION扴 INFORMATION



CHIEF INFORMATION OFFICER (CIO) is the IT
manager who directs all IT systems and personnel
while communicating directly with the highest levels of
the organization.
DATA ADMINISTRATION plans for, oversees the
development of, and monitors the information
resource.
DATABASE ADMINISTRATION is responsible for the
more technical and operational aspects of managing
information in databases.
4-42
Managing Information
MANAGING THE INFORMATION
RESOURCE
Is information ownership a consideration?
 What are the ethics involved in organizing and
managing information?
 How should databases and database
applications be developed and maintained?

4-43
TO SUMMARIZE

How we view information:
–
–


The physical view of information deals with how information is
physically arranged, stored, and accessed on some type of
secondary storage device.
The logical view of information focuses on how you need to
arrange and access information to meet your particular business
needs.
A database is a collection of information that you
organize and access according to the logical
structure of that information.
The data dictionary contains the logical structure of
information in a database.
4-44
TO SUMMARIZE



A database management system is the software you
use to specify the logical organization for a database
and access it.
Popular database models include the relational
model and the object-oriented model.
The four steps of developing a personal database
application include:
1. Define entity classes and primary keys
2. Define relationships among entity classes
3. Define information (fields) for each relation
4. Use a data definition language to create the database
4-45
TO SUMMARIZE

Data warehouses are a logical collection of
information - gathered from many different
operational databases - that supports business
analysis activities and decision-making tasks.

Data mining tools - the software tools you use to
query information in a data warehouse - include
query-and-reporting tools, intelligent agents, and
multidimensional analysis (MDA) tools.