Topics for Today (10/13/99)

Download Report

Transcript Topics for Today (10/13/99)

Categories of data
Operational and very short-term decision making data
Current, short-term decision making, related to financial
transactions, detailed data are stored, not structured for decision
making.
Historical and long-term decision making data
Saved for a pre-determined period of time, usually related to longterm decision making, structured for decision making.
Contains data that will support decisions of strategic importance.
Referred to as a “data warehouse”.
Archival data
Saved for a pre-determined period of time, used to track
transactions for audit, not structured for decision making.
1
Webflix data storage requirements
Operational needs.
What are examples of questions management needs to be able to
answer to handle daily operations effectively?
Decision support needs.
What are examples of questions management needs to be able to
answer to manage the organization effectively on a short and
long-term basis?
Governmental, legal or auditing needs.
What types of questions might be relevant for this type of
organization?
2
Operational data
Includes:
Master data (also called reference data): Customer,
employee, video, distribution center, critic, keyword.
Transaction data: Queue, Copy, Customer Contract.
Must store both master and transaction data.
Must store changes to both master and
transaction data.
3
Problems with operational data
May not be integrated.
May not be of good quality:
Incomplete.
Not accurate.
Inconsistent.
The meaning of the data is not fully defined and/or
understood by all stakeholders.
4
Archival data
Examples of archived data:
Emergency dispatch calls.
Credit card transactions.
Accounts payable transactions.
Tax-related data.
Does not usually have to be accessed quickly.
Must have procedures for extracting, transforming and loading
(ETL) data as necessary.
Archive database design is usually a copy of the transaction
database design.
5
Topics about Data Warehouses
What is a data warehouse?
How does a data warehouse differ from a transaction
processing database?
What are the characteristics of a data warehouse?
What are the components of a data warehousing
system?
How is a data warehouse created?
How is a data warehouse accessed?
6
Compare and Contrast TPS and DSS
Issue
TPS/MIS
DSS
Definition
Systems to
support day-today operations.
Systems to support
ad-hoc decision
making.
Users
clerks, data entry, managers, analysts,
low-level
support staff,
supervisors.
researchers.
Design goal
Performance.
Flexibility, ease of
use, ease of access.
Transaction
Type
Updates.
Queries.
Query
Activity
low; few joins.
high; many joins.
We use data to answer management questions
TPS Questions
How many customers currently
have “Skyfall” in the queue?
Data Warehouse Questions
How long does a customer
usually keep a video?
How many copies of “Skyfall”
are in inventory in
Sacramento?
Which customers return videos
within 2 days of receiving
them?
How many customers do we
have in Nevada City?
Which city has the most
customers who return videos
within 2 days of receiving
them?
When is “Cloud Atlas” going to
be released?
What is the most popular
genre for customers in Reno?
8
Operational vs. Data Warehouse databases
Issue
Content
Operational
database
Data
Warehouse
Internal data,
Internal and
process-oriented. external data.
Subject-oriented.
Data
currency
Real time.
Batch.
Current.
Historical.
Volatile.
Non-volatile.
Summary
level
Details of
transactions; no
(or very little)
derived data.
Summarized;
many
aggregation
levels.
Volume
Megabytes to
gigabytes.
Gigabytes to
terabytes.
Design
Normalized to
prevent
anomalies.
Denormalized to
enhance query
performance.
So, can one database support both transaction
processing and decision support applications?
Yes??
No??
11
Historical
Data
12
A Business Intelligence “System”
A business intelligence system encompasses all
processes, hardware and software necessary to
extract data, transform it, integrate it, store it,
and provide information. The information is then
made effective and accessible to users to support
decision making.
Sounds like just another information system...
So what makes it different?
13
Big
Data!
14
No data marts option
Applications
(Visualization)
Data
Sources
Access
ETL
Process
Metadata
Extract
POS
Transform
Enterprise
Data warehouse
Integrate
Other
OLTP/wEB
Data mart
(Finance)
Load
Replication
External
data
Data mart
(Engineering)
Data mart
(...)
/ Middleware
Data mart
(Marketing)
Select
Legacy
Routine
Business
Reporting
API
ERP
Data/text
mining
OLAP,
Dashboard,
Web
Custom built
applications
15
Components of a business intelligence/data
warehousing system
Data store.
Extraction/transformation/loading processes.
Analysis tools – both end-user and IT professional.
Visualization tools – primarily end-user.
16
What is a data warehouse (data store)?
A data warehouse is a database designed to support a
decision support system.
A data warehouse is:
Integrated: It is a centralized, consolidated database integrating data
from an entire organization.
Subject-oriented: Data warehouse data are organized around key
subjects. The data are usually arranged by topic, such as customers,
products, suppliers, etc.
Time-variant: Data in the warehouse contain a time dimension so
that they may be used as a historical aggregation.
Non-volatile: Once data enter, they seldom leave. Data are
appended rather than overwritten. Data are updated in batches.
Issues in creating a data warehouse
How to get accurate and complete data?
How to consolidate data?
Differing data meanings.
Differing storage mechanisms.
Differing data formats.
18
Customer
Transaction
Database
Order
Transaction
Database
Product
Transaction
Database
Data
Scrubbing
Data
Scrubbing
Data
Scrubbing
Data
Extraction
Data
Extraction
Data
Extraction
Data
Integration
Creating a
Data
Warehouse
Sales Data
Warehouse
Data mart extraction data warehouse
Data mart
Operational
database
Extract,
Transform
and Load
Processes
User
departments
Data mart
Operational
database
External
data source
Data mart
20
Two-tier data warehouse architecture
D ata warehouse
serv er
Operational
database
Transf orm ation
proces s
Operational
database
Sum marized
data
External
data source
ED M
Data warehouse
U ser departments
Three-tier data warehouse architecture
D ata warehouse
serv er
Operational
database
Transf orm ation
proces s
D ata m art tier
Extraction
proces s
Data mart
Operational
database
Sum marized
data
External
data sourc e
ED M
Data warehouse
Data mart
U ser
departm ents
Issues in designing a data warehouse
Must have a predefined subject focus.
Has the potential to be very large – must define
the “grain” or granularity level of storage.
Will always have a dimension of time.
May contain derived data.
May be a summary of data, rather than each
detailed transaction.
Does not always adhere to standard
normalization rules.
23
Analysis tools
Standard old queries
Online Analytical Processing
Data Mining
24
Online analytical processing
Provides multi-dimensional data analysis techniques.
Works primarily with data aggregation.
Provides advanced statistical analysis.
Supports access to very large databases.
Provides enhanced query optimization algorithms.
Lots of acronyms: OLAP, ROLAP, MOLAP, HOLAP.
Can be add-ons to existing products, example is Excel.
Can have their own user interfaces.
25
OLAP vs. Data Mining questions
OLAP
Which customers spent
the most with us in the
past year?
Data Mining
Which types of customers
are likely to spend the
most with us in the coming
year?
How much did the bank
What are the
lose from loan defaulters characteristics of the
within the past two years? customers most likely to
default on their loans
before the year is over?
What were the highest
selling fashion items in
our London stores?
What additional products
are most likely to be sold
to customers who buy
shorts?
Which store/location
made the highest sales in
the past year?
In which area whould we
open a new store next
year?
Data mining
Data mining tools:
analyze the data;
uncover patterns hidden in the data;
form computer models based on the findings; and
use the models to predict business behavior.
Proactive tools.
Based on artificial intelligence software such as
decision trees, neural networks, fuzzy logic systems,
inductive nets and classification networking.
27
Visualization tools
Graphical.
Spreadsheet format - usually Excel look-and-feel.
Beyond the spreadsheet using discovery tools.
Example: http://www.gapminder.org/
Dashboard. Examples:
http://www.dundas.com/dashboard/onlineexamples/
Web-based.
28