Database - McMaster University

Download Report

Transcript Database - McMaster University

Final Exam




Thursday Dec. 9, 19:30, BSB B154
Lecture notes
20 True/False and 20 multiple Choice
questions: 40 points
4 Short answer questions (application
and conceptual): 40 points
Project Presentation
Email me Power Point presentation the day before
 Presentation by all members: 15 minutes
 Question and answer: 5 minutes
 Peer evaluation:
1) Objective and the value of the project
2) Data modeling
3) Functions of the application (with demo)
4) Quality of presentation
5) Overall

Data Warehouse
Data Warehouse


The idea of a data warehouse is to put a wide
range of operational data from internal and
external sources into one place so it can be
better utilized by executives, line of business
managers and other business analysts.
Once the information is gathered, OLAP (online analytical processing ) software comes
into play by providing the desktop analysis
tools for querying, manipulating and reporting
the data from the data warehouse.
Data Warehouse environment




the source systems from which data is
extracted
the tools used to extract data for
loading the data warehouse
the data warehouse database itself
where the data is stored
the desktop query and reporting tools
used for decision support
The Data Warehouse

The Data Warehouse is an integrated,
subject-oriented, time-variant, nonvolatile database that provides support
for decision making.
Creating A Data Warehouse
Figure 13.3
Operational Vs. Multidimensional View Of Sales
The Data Warehouse

Integrated


The Data Warehouse is a centralized,
consolidated database that integrates data
retrieved from the entire organization.
Subject-Oriented

The Data Warehouse data is arranged and
optimized to provide answers to questions
coming from diverse functional areas
within a company.
The Data Warehouse

Time Variant
 The
Warehouse data represent the flow of data
through time. It can even contain projected
data.

Non-Volatile
 Once
data enter the Data Warehouse, they are
never removed.
 The Data Warehouse is always growing.
Operational Database vs.
Data warehouse
Data Warehouse
Operational DB
 Unified view of all
 Similar data can have
data elements
different representations
 Subject orientation
or meanings
for decision support
 Functional or process
 Historical information
orientation
with time dimension
 Current transaction
 Data are added
 Frequent updating
without change
Data Mart

A data mart is a small, single-subject
data warehouse subset that provides
decision support to a small group of
people.
Data Mart


Data Marts can serve as a test vehicle
for companies exploring the potential
benefits of Data Warehouses.
Data Marts address local or
departmental problems, while a Data
Warehouse involves a company-wide
effort to support decision making at all
levels in the organization.
Star Schema


The star schema is a data modeling technique
used to map multidimensional decision
support into a relational database.
Star schemas yield an easily implemented
model for multidimensional data analysis
while still preserving the relational structure
of the operational database.
Star Schema

Four Components:
Facts
 Dimensions
 Attributes
 Attribute hierarchies

Figure 13.14 A Three-Dimensional View of Sales
Figure 13.17 Attribute Hierarchies in Multidimensional Analysis
Figure 13.17 Star Schema For Sales
Star Schema Representation



Facts and dimensions are normally
represented by physical tables in the data
warehouse database.
The fact table is related to each dimension
table in a many-to-one (M:1) relationship.
Fact and dimension tables are related by
foreign keys and are subject to the
primary/foreign key constraints.
Figure 13.18 Orders Star Schema
Star Schema

Performance-Improving Techniques




Normalization of dimensional tables
Multiple fact tables representing different
aggregation levels
Denormalization of fact tables
Table partitioning and replication
Figure 13.19 Normalized Dimension Tables
Multiple Fact Tables
Data Warehouse
Implementation




The Data Warehouse as an Active
Decision Support Network
A Company-Wide Effort that Requires
User Involvement and Commitment at
All Levels
Satisfy the Trilogy: Data, Analysis, and
Users
Apply Database Design Procedures
Data Warehouse Implementation Road Map
On-Line Analytical Processing


On-Line Analytical Processing (OLAP) is an
advanced data analysis environment that
supports decision making, business modeling,
and operations research activities.
Four Main Characteristics of OLAP




Use multidimensional data analysis techniques.
Provide advanced database support.
Provide easy-to-use end user interfaces.
Support client/server architecture.
Figure 13.7 OLAP Server Arrangement
http://www.dwinfocenter.org/
Data Mining


The data warehouse that enterprises
are building until now have largely
ignored
Factors make data mining feasible
organizations are gathering more data
from on-line TPS with lower storage cost
 high computation power allows using
complex data mining algorithm

Data Mining

With data mining, it is possible to better
manage product warranties, predict
purchases of retail stock, unearth fraud,
determine credit risk, and define new
products and services.
Data-Mining Phases
Four Phases of Data Mining
1. Data Preparation


Identify and cleanse data sets.
Data Warehouse is usually used for data
mining operations.
2. Data Analysis and Classification

Identify common data characteristics or
patterns using



Data groupings, classifications, clusters, or
sequences.
Data dependencies, links, or relationships.
Data patterns, trends, and deviations.
Four Phases of Data Mining
3. Knowledge Acquisition


Select the appropriate modeling or knowledge
acquisition algorithms.
Examples: neural networks, decision trees, rules
induction, genetic algorithms, classification and
regression tree, memory-based reasoning, or
nearest neighbor and data visualization).
4. Prognosis

Predict future behavior and forecast business
outcomes using the data mining findings.
Data Mining

Data mining yields five basic type of
information:



Association - occurrences are linked to a single
event. “beer purchasers also buy peanuts 70% of
the time”
Sequences - events are linked over time. “a new
carpet purchase linked to new curtains”
Classification - patterns are recognized that describe
the characteristics of a group, such as customers
who cancel credit cards
Data Mining

Clustering - discovers undiscovered
groupings ``Buyers of expensive sport
cars are typically young urban professionals
whereas luxury sedans are bought by elderly
wealthy persons.''

Forecasting - estimates future value
such as inventory turnover
Database Marketing


It seems a lot of companies are taking a
friendly interest in your life these days
Companies are collecting mountains of
information about you, crunching it to
predict how likely you are to buy a
products, and using that knowledge to
craft a marketing message precisely
calibrated to get you to do so.
Database Marketing

The trend:




Mass marketing
Marketing segmentation
Individual marketing
Nothing is more powerful than
knowledge about customers’
individual practice and preferences.
Database Marketing





Gathering massive quantity of data about
consumers from multiple sources
Data are combined and analyzed using
powerful tools
Has a primary goal of better understanding
current and potential customers in order to
boost sales and build customer loyalty
American Express
Reader’s Digest
Pioneers of New Marketing


General Motors surveys 12 million GM
Card holders on their car preferences
Blockbuster has a database of 36
million households and 2 million daily
transactions. It is testing a system that
will recommend movies based on a
customer’s past rentals
Pioneers of New Marketing



Kraft amassed a list of 30 million users from
coupons and survey questions. It regularly
send them tips on nutrition and recipes, as
well as coupons for specific brands
56% of manufacturers and retailers are
currently building a database for marketing
85% believe they will do database marketing
in 2000.
Some concerns



Private intelligence-gathering gives
some people the creeps
Targeted marketing efforts are intrusive
and annoying
The collection, manipulation, and
combination of lists of personal
information amount to an ominous
invasion of privacy
A Sample Of Current Data Warehousing And
Data Mining Vendors
Table 13.10
http://www.irmac.ca/
http://www.almaden.ibm.com
/cs/quest/TECH.html
Deploying Data Mining for
Competitive Advantage


The act of building data-mining models
does not, by itself, guarantee any
business value
To be used as competitive weapon,
data mining must be part of a larger
process that ensures that the
information learned by data mining is
transformed into actionable results
A process of deploying data mining
for competitive advantage





Problem definition
Discovery
Implementation
Taking action
Monitoring the results
Anderson 2001 survey: Data
mining in retailer industry

Using data mining: 52.5% in total





75% of very large retailers (>$500 million)
46.4% of large ($200-499 million)
34.8% of medium ($50-199 million)
20% of small (<$50 million)
Effect (contribution to the bottom line)




52.5% said "no contribution"
19.8% said "very little."
17.8% said "somewhat”
8.9% said "very much."
Hunt for terrorists


Banks probe credit, debit card records in hunt
for terrorists: 'Data mining': Any suspicious
transactions handed over to RCMP
Canada's major financial institutions are
reviewing thousands of their customers'
confidential transactions as part of a probe
into terrorist funding that has gone beyond a
list of 27 suspects provided by U.S. law
enforcement agencies.
Source: Financial Post (National Post)
September 27, 2001