Document 7382271

Download Report

Transcript Document 7382271

1
Module
3
The concept of data
processing
Major issues in database
management
2
Learning Objectives
Explain
the importance of implementing data
resource management processes and
technologies in an organization.
Understand
the advantages of a database
management approach to managing the data
resources of a business.
3
Learning Objectives (continued)
 Explain
how database management software helps
business professionals and supports the operations
and management of a business.
 Illustrate each of the following concepts:
 Major types of databases
 Data warehouses and data mining
 Logical data elements
 Fundamental database structures
 Database access methods
 Database development
4
Section I
Managing Data Resources
5
Data Resource Management
A
managerial activity
Applies information systems technology to
managing data resources to meet needs of
business stakeholders.
6
Foundation Data Concepts
Levels
of data
Character
Single alphabetical, numeric, or other
symbol
Field
Groupings of characters
Represents an attribute of some entity
7
Foundation Data Concepts (continued)
Records
Related
fields of data
Collection of attributes that describe an
entity
Fixed-length or variable-length
8
Foundation Data Concepts (continued)
Files
(table)
A group of related records
Classified by
Primary use
Type of data
permanence
9
Foundation Data Concepts (continued)
Database
Integrated
collection of logically related
data elements
Consolidates records into a common pool
of data elements
Data is independent of the application
program using them and type of storage
device
10
Foundation Data Concepts (continued)
 Logical
Data Elements
11
Types of Databases
Operational
Supports
business processes and operations
Also called subject-area databases,
transaction databases, and production
databases
12
Types of Databases (continued)
Distributed
Replicated
and distributed copies or parts of
databases on network servers at a variety of
sites.
Done to improve database performance and
security
13
Types of Databases (continued)
External
Available
for a fee from commercial sources
or with or without charge on the Internet or
World Wide Web
Hypermedia
Hyperlinked
pages of multimedia
14
Data Warehouses and Data Mining
Data
warehouse
Stores data extracted from operational,
external, or other databases of an
organization
Central source of “structured” data
May be subdivided into data marts
15
Data Warehouses and Data Mining (continued)
Data
mining
A major use of data warehouse databases
Data is analyzed to reveal hidden
correlations, patterns, and trends
16
Database Management Approach
Consolidates
data records and objects into
databases that can be accessed by many
different application programs
17
Database Management Approach (continued)
Database
Management System
Software interface between users and
databases
Controls creation, maintenance, and use of
the database
18
Database Management Approach (continued)
19
Database Management Approach (continued)
Database
Interrogation
Query
Supports
ad hoc requests
Tells the software how you want to
organize the data
SQL queries
Graphical (GUI) & natural queries
20
Database Management Approach (continued)
Report
Generator
Turns results of query into a useable
report
Database
Maintenance
Updating and correcting data
21
Database Management Approach (continued)
Application
Development
Data manipulation language
Data entry screens, forms, reports, or web
pages
22
Implementing Data Resource Management
Database
Administration
Develop and maintain the data dictionary
Design and monitor performance of
databases
Enforce database use and security standards
23
Implementing Data Resource Management (continued)
Data
Planning
Corporate planning and analysis function
Developing the overall data architecture
24
Implementing Data Resource Management (continued)
Data
Administration
Standardize collection, storage, and
dissemination of data to end users
Focused on supporting business processes
and strategic business objectives
May include developing policy and setting
standards
25
Implementing Data Resource Management (continued)
Challenges
Technologically
complex
Vast amounts of data
Vulnerability to fraud, errors, and failures
26
Section II
Technical Foundations of Database
Management
27
Database Structures
Hierarchical
Treelike
One-to-many
relationship
Used for structured, routine types of
transaction processing
28
Database Structures (continued)
Network
More
complex
Many-to-many relationship
More flexible but doesn’t support ad hoc
requests well
29
Database Structures (continued)
Relational
Data
elements stored in simple tables
Can link data elements from various tables
Very supportive of ad hoc requests but
slower at processing large amounts of data
than hierarchical or network models
30
Database Structures (continued)
Multi-Dimensional
A
variation of the relational model
Cubes of data and cubes within cubes
Popular for online analytical processing
(OLAP) applications
31
Database Structures (continued)
32
Database Structures (continued)
Object-oriented
Key
technology of multimedia web-based
applications
Good for complex, high-volume applications
33
Database Structures (continued)
34
Accessing Databases
Key
fields (primary key)
A field unique to each record so it can be
distinguished from all other records in a
table
35
Accessing Databases (continued)
Sequential
access
Data is stored and accessed in a sequence
according to a key field
Good for periodic processing of a large
volume of data, but updating with new
transactions can be troublesome
36
Accessing Databases (continued)
Direct
access
Methods
Key transformation
Index
Indexed sequential access
37
Database Development
Data
dictionary
Directory containing metadata (data about
data)
Structure
Data elements
Interrelationships
Information regarding access and use
Maintenance & security issues
38
Database Development (continued)
Data
Planning & Database Design
Planning & Design Process
Enterprise model
Entity relationship diagrams (ERDs)
Data modeling
Develop logical framework for the
physical design
39
Discussion Questions
How
should an e-business enterprise store,
access, and distribute data & information
about their internal operations & external
environment?
What
roles do database management, data
administration, and data planning play in
managing data as a business resource?
40
Discussion Questions (continued)
What
are the advantages of a database
management approach to organizing,
accessing, and managing an organization’s
data resources?
What
is the role of a database management
system in an e-business information system?
41
Discussion Questions (continued)
Databases
of information about a firm’s
internal operations were formerly the only
databases that were considered to be
important to a business. What other kinds of
databases are important for a business today?
What
are the benefits and limitations of the
relational database model for business
applications?
42
Discussion Questions (continued)
Why
is the object-oriented database model
gaining acceptance for developing applications
and managing the hypermedia databases at
business websites?
How
have the Internet, intranets, extranets,
and the World Wide Web affected the types
and uses of data resources available to
business end users?
43
Real World Case 1 – IBM versus Oracle
What
key business strategies did Janet Perna
implement to help IBM catch up to Oracle in
the database management software market?
What
is the business case for both IBM’s and
Oracle’s product strategy for their database
software?
44
Real World Case 1 (continued)
Which
approach would you recommend to a
company seeking a database system today?
What
do you see as the key factor to IBM’s
success?
45
Real World Case 1 (continued)
The
case states that “database software has
become more of a commodity.” Do you agree?
46
Real World Case 2 – Experian Automotive
How
do the database software tools discussed
in this case help companies exploit their data
resources?
What
is the business value of the automotive
database created by Experian?
47
Real World Case 2 (continued)
What
other business opportunities could you
recommend to Experian that would capitalize
on their automotive database?
The
case states that Experian’s automotive
database “has raised the hackles of privacy
advocates.” What legitimate privacy concerns
and safeguard suggestions might be raised
about this database and its use?
48
Real World Case 3 – Shell Exploration
Why
do companies still have problems with
the quality of the data resources stored in their
business information systems?
What
is a “data silo?”
49
Real World Case 3 (continued)
How
do data warehouse approaches help
companies like Shell and OshKosh meet their
data resource management challenges?
What
business benefits can companies derive
from a data warehouse approach?
50
Real World Case 4 – BlueCross BlueShield & Warner Bros.
What
is a storage area network? Why are so
many companies installing SANs?
What
are the reasons for the quick payback on
SAN investments?
51
Real World Case 4 (continued)
What
are the challenges and alternatives to
SANs as a data storage technology?
What
are some advantages of SANs?
52
Real World Case 5 – Sherwin-Williams & Krispy Kreme
Tips
for Managing External Data
Purchase external data from a reliable
source that will do most of the refining for
you and will work with you on contingency
plans.
Run
a test load first. A load of test data can
pave the way for accurate production loads.
53
Real World Case 5 (continued)
Managing
external data (continued)
Don’t collect data until business and IT staff
have agreed on the amount, frequency,
format, and content of the data you need.
Don’t
acquire more data or use more data
sources than you really need.
54
Real World Case 5 (continued)
Managing
external data (continued)
Don’t mingle external and homegrown data
without adding unique identifiers to each
record, in case you need to pull it out.
Don’t
overestimate the data’s integrity.
Nothing beats direct customer contact and
tactical details behind the data.
55
Real World Case 5 (continued)
What
challenges in acquiring and using data
from external sources are identified in this
case?
Do
you prefer the Sherwin-Williams or Krispy
Kreme approach to acquiring external data?
56
Real World Case 5 (continued)
What
other sources of external data might a
business use to gain valuable marketing and
competitive intelligence?
57
CS 317 - Data Management and
Information Processing
58
What Is a Database System?
Database:
a very
large, integrated collection of data.
 Models a real-world enterprise
 Entities (e.g., teams, games)
 Relationships
(e.g., The
Forty-Niners are playing in The Superbowl)
 More recently, also includes active components , often called
“business logic”. (e.g., the BCS ranking system)


A Database Management System (DBMS) is a software system
designed to store, manage, and facilitate access to databases.
59
Database Systems: Then
60
Database Systems: Today
From Friendster.com on-line
61
Other Ways Databases Make Life Better?
“Players
could finally
sign up for the Star
Wars Galaxies game
last week as Sony
opened up registration
to the public.”
“Once
players got in to
the game they found
that the game servers
were offline because of database
problems.”
62
Other databases you may use
63
=
Is the WWW a DBMS?
Fairly sophisticated search available
 crawler indexes pages on the web
 Keyword-based search for pages
 But, currently
 data is mostly unstructured and untyped
 search only:



can’t modify the data
can’t get summaries, complex combinations of data
few guarantees provided for freshness of data, consistency across
data items, fault tolerance, …
 Web sites typically have a DBMS in the background to provide
these functions.
 The picture is changing
 New standards e.g., XML, Semantic Web can help data modeling
 Research groups (e.g., at Berkeley) are working on providing some
of this functionality across multiple web sites.

64
“Search” vs. Query
What
if you
wanted to find
out which actors
donated to John
Kerry’s
presidential
campaign?
“actors
donated to john
kerry” in your
favorite search
Try
65
A “Database Query” Approach
66
Why Study Databases??
Shift from computation to information
 always true for corporate computing
 Web made this point for personal computing
 more and more true for scientific computing
 Need for DBMS has exploded in the last years
 Corporate: retail swipe/clickstreams, “customer relationship
mgmt”, “supply chain mgmt”, “data warehouses”, etc.
 Scientific: digital libraries, Human Genome project, NASA
Mission to Planet Earth, physical sensors, grid physics network
 DBMS encompasses much of CS in a practical discipline
 OS, languages, theory, AI, multimedia, logic
 Yet traditional focus on real-world apps

?
67
What’s the intellectual content?
representing
 data
information
modeling
languages
and systems for querying data
 complex
queries with real semantics*
 over massive data sets
concurrency
control for data
manipulation
 controlling
concurrent access
 ensuring transactional semantics
reliable
data storage
 maintain
data semantics even if you pull the
68
Describing Data: Data Models
A
data model is a collection of concepts
for describing data.
A
schema is a description of a particular
collection of data, using a given data
model.
The
relational model of data is the most
widely used model today.
Main concept: relation, basically a table
with rows and columns.
69
Levels of Abstraction
Users
Views
describe how
users see the data.
View 1
Conceptual
schema
defines logical
structure
View 2
Conceptual Schema
Physical Schema
DB
Physical
schema
describes the files and
View 3
70
Example: University Database
Conceptual
View 1
View 2
View 3
schema:
Conceptual Schema
 Students(sid: string, name: string,
login: string, age: integer, gpa:real)
Physical Schema
 Courses(cid: string, cname:string,
credits:integer)
DB
 Enrolled(sid:string, cid:string,
grade:string)
External Schema (View):
Course_info(cid:string,enrollment:intege
r)
71
Data Independence
Applications
insulated
from how data is
structured and stored.
Logical data
independence:
Protection from changes
in logical structure of
data.
Physical
data
View 1
View 2
View 3
Conceptual Schema
Physical Schema
DB
72
Queries, Query Plans, and Operators
SELECT
SELECT eid,
E.loc,
ename,
AVG(E.sal)
title
COUNT
DISTINCT
(E.eid)
FROM
Emp
E
FROM
Emp
Proj
P, Asgn A
GROUP
BYE,E.loc
WHERE
E.sal
> $50K
WHERE E.eid = A.eid
HAVING Count(*) > 5
AND P.pid = A.pid
AND E.loc <> P.loc

Count
Having
distinct

Group(agg)
Join
Select

Join
Emp

System handles query plan
generation & optimization;
ensures correct execution.
Proj
Emp
Emp
Asgn
Employees
Projects
Assignments
• Issues: view reconciliation, operator ordering, physical operator
choice, memory management, access path (index) use, …
73
Concurrency Control
Concurrent execution of user programs: key to good DBMS performance.
 Disk accesses frequent, pretty slow
 Keep the CPU working on several programs concurrently.
 Interleaving actions of different programs: trouble!
 e.g., account-transfer & print statement at same time
 DBMS ensures such problems don’t arise.
 Users/programmers can pretend they are using a single-user system.
(called “Isolation”)
 Thank goodness! Don’t have to program “very, very carefully”.

74
Transactions: ACID Properties

Key concept is a transaction: a sequence of database actions
(reads/writes).
DBMS ensures atomicity (all-or-nothing property) even if system crashes
in the middle of a Xact.
 Each transaction, executed completely, must take the DB between
consistent states or must not run at all.
 DBMS ensures that concurrent transactions appear to run in isolation.
 DBMS ensures durability of committed Xacts even if system crashes.


Note: can specify simple integrity constraints on the data. The DBMS
enforces these.
 Beyond this, the DBMS does not understand the semantics of the data.
 Ensuring that a single transaction (run alone) preserves consistency is
largely the user’s responsibility!
77
Structure of a
A
typical DBMS has a
layered architecture.
The figure does not
show the concurrency
control and recovery
components.
Each database system
has its own variations.
These layers
DBMSmust consider
concurrency
control and
recovery
Query Optimization
and Execution
Relational Operators
Files and Access Methods
Buffer Management
Disk Space Management
DB
78
Advantages of a DBMS
Data independence
 Efficient data access
 Data integrity & security
 Data administration
 Concurrent access, crash recovery
 Reduced application development time
 So why not use them always?
 Expensive/complicated to set up & maintain
 This cost & complexity must be offset by need
 General-purpose, not suited for special-purpose tasks (e.g. text search!)

79
Databases make these folks happy ...
DBMS vendors, programmers
 Oracle, IBM, MS, Sybase, …
 End users in many fields
 Business, education, science, …
 DB application programmers
 Build enterprise applications on top of DBMSs
 Build web services that run off DBMSs
 Database administrators (DBAs)
 Design logical/physical schemas
 Handle security and authorization
 Data availability, crash recovery
 Database tuning as needs evolve

…must understand how a DBMS works
80
Summary (part 1)



DBMS used to maintain, query large
datasets.
 can manipulate data and exploit
semantics
Other benefits include:
 recovery from system crashes,
 concurrent access,
 quick application development,
 data integrity and security.
Levels of abstraction provide data
81
Summary, cont.
 DBAs,
DB developers the
bedrock of the information
economy
•
DBMS R&D represents a broad,
fundamental branch of the science
of computation