Document 7382271
Download
Report
Transcript Document 7382271
1
Module
3
The concept of data
processing
Major issues in database
management
2
Learning Objectives
Explain
the importance of implementing data
resource management processes and
technologies in an organization.
Understand
the advantages of a database
management approach to managing the data
resources of a business.
3
Learning Objectives (continued)
Explain
how database management software helps
business professionals and supports the operations
and management of a business.
Illustrate each of the following concepts:
Major types of databases
Data warehouses and data mining
Logical data elements
Fundamental database structures
Database access methods
Database development
4
Section I
Managing Data Resources
5
Data Resource Management
A
managerial activity
Applies information systems technology to
managing data resources to meet needs of
business stakeholders.
6
Foundation Data Concepts
Levels
of data
Character
Single alphabetical, numeric, or other
symbol
Field
Groupings of characters
Represents an attribute of some entity
7
Foundation Data Concepts (continued)
Records
Related
fields of data
Collection of attributes that describe an
entity
Fixed-length or variable-length
8
Foundation Data Concepts (continued)
Files
(table)
A group of related records
Classified by
Primary use
Type of data
permanence
9
Foundation Data Concepts (continued)
Database
Integrated
collection of logically related
data elements
Consolidates records into a common pool
of data elements
Data is independent of the application
program using them and type of storage
device
10
Foundation Data Concepts (continued)
Logical
Data Elements
11
Types of Databases
Operational
Supports
business processes and operations
Also called subject-area databases,
transaction databases, and production
databases
12
Types of Databases (continued)
Distributed
Replicated
and distributed copies or parts of
databases on network servers at a variety of
sites.
Done to improve database performance and
security
13
Types of Databases (continued)
External
Available
for a fee from commercial sources
or with or without charge on the Internet or
World Wide Web
Hypermedia
Hyperlinked
pages of multimedia
14
Data Warehouses and Data Mining
Data
warehouse
Stores data extracted from operational,
external, or other databases of an
organization
Central source of “structured” data
May be subdivided into data marts
15
Data Warehouses and Data Mining (continued)
Data
mining
A major use of data warehouse databases
Data is analyzed to reveal hidden
correlations, patterns, and trends
16
Database Management Approach
Consolidates
data records and objects into
databases that can be accessed by many
different application programs
17
Database Management Approach (continued)
Database
Management System
Software interface between users and
databases
Controls creation, maintenance, and use of
the database
18
Database Management Approach (continued)
19
Database Management Approach (continued)
Database
Interrogation
Query
Supports
ad hoc requests
Tells the software how you want to
organize the data
SQL queries
Graphical (GUI) & natural queries
20
Database Management Approach (continued)
Report
Generator
Turns results of query into a useable
report
Database
Maintenance
Updating and correcting data
21
Database Management Approach (continued)
Application
Development
Data manipulation language
Data entry screens, forms, reports, or web
pages
22
Implementing Data Resource Management
Database
Administration
Develop and maintain the data dictionary
Design and monitor performance of
databases
Enforce database use and security standards
23
Implementing Data Resource Management (continued)
Data
Planning
Corporate planning and analysis function
Developing the overall data architecture
24
Implementing Data Resource Management (continued)
Data
Administration
Standardize collection, storage, and
dissemination of data to end users
Focused on supporting business processes
and strategic business objectives
May include developing policy and setting
standards
25
Implementing Data Resource Management (continued)
Challenges
Technologically
complex
Vast amounts of data
Vulnerability to fraud, errors, and failures
26
Section II
Technical Foundations of Database
Management
27
Database Structures
Hierarchical
Treelike
One-to-many
relationship
Used for structured, routine types of
transaction processing
28
Database Structures (continued)
Network
More
complex
Many-to-many relationship
More flexible but doesn’t support ad hoc
requests well
29
Database Structures (continued)
Relational
Data
elements stored in simple tables
Can link data elements from various tables
Very supportive of ad hoc requests but
slower at processing large amounts of data
than hierarchical or network models
30
Database Structures (continued)
Multi-Dimensional
A
variation of the relational model
Cubes of data and cubes within cubes
Popular for online analytical processing
(OLAP) applications
31
Database Structures (continued)
32
Database Structures (continued)
Object-oriented
Key
technology of multimedia web-based
applications
Good for complex, high-volume applications
33
Database Structures (continued)
34
Accessing Databases
Key
fields (primary key)
A field unique to each record so it can be
distinguished from all other records in a
table
35
Accessing Databases (continued)
Sequential
access
Data is stored and accessed in a sequence
according to a key field
Good for periodic processing of a large
volume of data, but updating with new
transactions can be troublesome
36
Accessing Databases (continued)
Direct
access
Methods
Key transformation
Index
Indexed sequential access
37
Database Development
Data
dictionary
Directory containing metadata (data about
data)
Structure
Data elements
Interrelationships
Information regarding access and use
Maintenance & security issues
38
Database Development (continued)
Data
Planning & Database Design
Planning & Design Process
Enterprise model
Entity relationship diagrams (ERDs)
Data modeling
Develop logical framework for the
physical design
39
Discussion Questions
How
should an e-business enterprise store,
access, and distribute data & information
about their internal operations & external
environment?
What
roles do database management, data
administration, and data planning play in
managing data as a business resource?
40
Discussion Questions (continued)
What
are the advantages of a database
management approach to organizing,
accessing, and managing an organization’s
data resources?
What
is the role of a database management
system in an e-business information system?
41
Discussion Questions (continued)
Databases
of information about a firm’s
internal operations were formerly the only
databases that were considered to be
important to a business. What other kinds of
databases are important for a business today?
What
are the benefits and limitations of the
relational database model for business
applications?
42
Discussion Questions (continued)
Why
is the object-oriented database model
gaining acceptance for developing applications
and managing the hypermedia databases at
business websites?
How
have the Internet, intranets, extranets,
and the World Wide Web affected the types
and uses of data resources available to
business end users?
43
Real World Case 1 – IBM versus Oracle
What
key business strategies did Janet Perna
implement to help IBM catch up to Oracle in
the database management software market?
What
is the business case for both IBM’s and
Oracle’s product strategy for their database
software?
44
Real World Case 1 (continued)
Which
approach would you recommend to a
company seeking a database system today?
What
do you see as the key factor to IBM’s
success?
45
Real World Case 1 (continued)
The
case states that “database software has
become more of a commodity.” Do you agree?
46
Real World Case 2 – Experian Automotive
How
do the database software tools discussed
in this case help companies exploit their data
resources?
What
is the business value of the automotive
database created by Experian?
47
Real World Case 2 (continued)
What
other business opportunities could you
recommend to Experian that would capitalize
on their automotive database?
The
case states that Experian’s automotive
database “has raised the hackles of privacy
advocates.” What legitimate privacy concerns
and safeguard suggestions might be raised
about this database and its use?
48
Real World Case 3 – Shell Exploration
Why
do companies still have problems with
the quality of the data resources stored in their
business information systems?
What
is a “data silo?”
49
Real World Case 3 (continued)
How
do data warehouse approaches help
companies like Shell and OshKosh meet their
data resource management challenges?
What
business benefits can companies derive
from a data warehouse approach?
50
Real World Case 4 – BlueCross BlueShield & Warner Bros.
What
is a storage area network? Why are so
many companies installing SANs?
What
are the reasons for the quick payback on
SAN investments?
51
Real World Case 4 (continued)
What
are the challenges and alternatives to
SANs as a data storage technology?
What
are some advantages of SANs?
52
Real World Case 5 – Sherwin-Williams & Krispy Kreme
Tips
for Managing External Data
Purchase external data from a reliable
source that will do most of the refining for
you and will work with you on contingency
plans.
Run
a test load first. A load of test data can
pave the way for accurate production loads.
53
Real World Case 5 (continued)
Managing
external data (continued)
Don’t collect data until business and IT staff
have agreed on the amount, frequency,
format, and content of the data you need.
Don’t
acquire more data or use more data
sources than you really need.
54
Real World Case 5 (continued)
Managing
external data (continued)
Don’t mingle external and homegrown data
without adding unique identifiers to each
record, in case you need to pull it out.
Don’t
overestimate the data’s integrity.
Nothing beats direct customer contact and
tactical details behind the data.
55
Real World Case 5 (continued)
What
challenges in acquiring and using data
from external sources are identified in this
case?
Do
you prefer the Sherwin-Williams or Krispy
Kreme approach to acquiring external data?
56
Real World Case 5 (continued)
What
other sources of external data might a
business use to gain valuable marketing and
competitive intelligence?
57
CS 317 - Data Management and
Information Processing
58
What Is a Database System?
Database:
a very
large, integrated collection of data.
Models a real-world enterprise
Entities (e.g., teams, games)
Relationships
(e.g., The
Forty-Niners are playing in The Superbowl)
More recently, also includes active components , often called
“business logic”. (e.g., the BCS ranking system)
A Database Management System (DBMS) is a software system
designed to store, manage, and facilitate access to databases.
59
Database Systems: Then
60
Database Systems: Today
From Friendster.com on-line
61
Other Ways Databases Make Life Better?
“Players
could finally
sign up for the Star
Wars Galaxies game
last week as Sony
opened up registration
to the public.”
“Once
players got in to
the game they found
that the game servers
were offline because of database
problems.”
62
Other databases you may use
63
=
Is the WWW a DBMS?
Fairly sophisticated search available
crawler indexes pages on the web
Keyword-based search for pages
But, currently
data is mostly unstructured and untyped
search only:
can’t modify the data
can’t get summaries, complex combinations of data
few guarantees provided for freshness of data, consistency across
data items, fault tolerance, …
Web sites typically have a DBMS in the background to provide
these functions.
The picture is changing
New standards e.g., XML, Semantic Web can help data modeling
Research groups (e.g., at Berkeley) are working on providing some
of this functionality across multiple web sites.
64
“Search” vs. Query
What
if you
wanted to find
out which actors
donated to John
Kerry’s
presidential
campaign?
“actors
donated to john
kerry” in your
favorite search
Try
65
A “Database Query” Approach
66
Why Study Databases??
Shift from computation to information
always true for corporate computing
Web made this point for personal computing
more and more true for scientific computing
Need for DBMS has exploded in the last years
Corporate: retail swipe/clickstreams, “customer relationship
mgmt”, “supply chain mgmt”, “data warehouses”, etc.
Scientific: digital libraries, Human Genome project, NASA
Mission to Planet Earth, physical sensors, grid physics network
DBMS encompasses much of CS in a practical discipline
OS, languages, theory, AI, multimedia, logic
Yet traditional focus on real-world apps
?
67
What’s the intellectual content?
representing
data
information
modeling
languages
and systems for querying data
complex
queries with real semantics*
over massive data sets
concurrency
control for data
manipulation
controlling
concurrent access
ensuring transactional semantics
reliable
data storage
maintain
data semantics even if you pull the
68
Describing Data: Data Models
A
data model is a collection of concepts
for describing data.
A
schema is a description of a particular
collection of data, using a given data
model.
The
relational model of data is the most
widely used model today.
Main concept: relation, basically a table
with rows and columns.
69
Levels of Abstraction
Users
Views
describe how
users see the data.
View 1
Conceptual
schema
defines logical
structure
View 2
Conceptual Schema
Physical Schema
DB
Physical
schema
describes the files and
View 3
70
Example: University Database
Conceptual
View 1
View 2
View 3
schema:
Conceptual Schema
Students(sid: string, name: string,
login: string, age: integer, gpa:real)
Physical Schema
Courses(cid: string, cname:string,
credits:integer)
DB
Enrolled(sid:string, cid:string,
grade:string)
External Schema (View):
Course_info(cid:string,enrollment:intege
r)
71
Data Independence
Applications
insulated
from how data is
structured and stored.
Logical data
independence:
Protection from changes
in logical structure of
data.
Physical
data
View 1
View 2
View 3
Conceptual Schema
Physical Schema
DB
72
Queries, Query Plans, and Operators
SELECT
SELECT eid,
E.loc,
ename,
AVG(E.sal)
title
COUNT
DISTINCT
(E.eid)
FROM
Emp
E
FROM
Emp
Proj
P, Asgn A
GROUP
BYE,E.loc
WHERE
E.sal
> $50K
WHERE E.eid = A.eid
HAVING Count(*) > 5
AND P.pid = A.pid
AND E.loc <> P.loc
Count
Having
distinct
Group(agg)
Join
Select
Join
Emp
System handles query plan
generation & optimization;
ensures correct execution.
Proj
Emp
Emp
Asgn
Employees
Projects
Assignments
• Issues: view reconciliation, operator ordering, physical operator
choice, memory management, access path (index) use, …
73
Concurrency Control
Concurrent execution of user programs: key to good DBMS performance.
Disk accesses frequent, pretty slow
Keep the CPU working on several programs concurrently.
Interleaving actions of different programs: trouble!
e.g., account-transfer & print statement at same time
DBMS ensures such problems don’t arise.
Users/programmers can pretend they are using a single-user system.
(called “Isolation”)
Thank goodness! Don’t have to program “very, very carefully”.
74
Transactions: ACID Properties
Key concept is a transaction: a sequence of database actions
(reads/writes).
DBMS ensures atomicity (all-or-nothing property) even if system crashes
in the middle of a Xact.
Each transaction, executed completely, must take the DB between
consistent states or must not run at all.
DBMS ensures that concurrent transactions appear to run in isolation.
DBMS ensures durability of committed Xacts even if system crashes.
Note: can specify simple integrity constraints on the data. The DBMS
enforces these.
Beyond this, the DBMS does not understand the semantics of the data.
Ensuring that a single transaction (run alone) preserves consistency is
largely the user’s responsibility!
77
Structure of a
A
typical DBMS has a
layered architecture.
The figure does not
show the concurrency
control and recovery
components.
Each database system
has its own variations.
These layers
DBMSmust consider
concurrency
control and
recovery
Query Optimization
and Execution
Relational Operators
Files and Access Methods
Buffer Management
Disk Space Management
DB
78
Advantages of a DBMS
Data independence
Efficient data access
Data integrity & security
Data administration
Concurrent access, crash recovery
Reduced application development time
So why not use them always?
Expensive/complicated to set up & maintain
This cost & complexity must be offset by need
General-purpose, not suited for special-purpose tasks (e.g. text search!)
79
Databases make these folks happy ...
DBMS vendors, programmers
Oracle, IBM, MS, Sybase, …
End users in many fields
Business, education, science, …
DB application programmers
Build enterprise applications on top of DBMSs
Build web services that run off DBMSs
Database administrators (DBAs)
Design logical/physical schemas
Handle security and authorization
Data availability, crash recovery
Database tuning as needs evolve
…must understand how a DBMS works
80
Summary (part 1)
DBMS used to maintain, query large
datasets.
can manipulate data and exploit
semantics
Other benefits include:
recovery from system crashes,
concurrent access,
quick application development,
data integrity and security.
Levels of abstraction provide data
81
Summary, cont.
DBAs,
DB developers the
bedrock of the information
economy
•
DBMS R&D represents a broad,
fundamental branch of the science
of computation