Chapter 15 - Personal.kent.edu

Download Report

Transcript Chapter 15 - Personal.kent.edu

Database Processing

Chapter 15

Business Intelligence & Data Warehousing David M. Kroenke DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall 15-1

Business Intelligence (BI) Systems

Business Intelligence (BI)

systems are information systems that assist managers and other professionals: – To analyze current and past activities, and – To predict future events.

• Two broad categories: –

Reporting

Data mining

DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall 15-2

The Relationship of Operational and BI Applications

DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall 15-3

Reporting Applications

Reporting system

applications: – Filter, Sort, Group, Simple Calculations using SQL – Classify entities (customers, products, employees, etc.) •

RFM Analysis

– Deal with critical report delivery DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall 15-4

Data Mining Applications

Data mining

applications are used to: – Perform what-if analysis – Make predictions – Facilitate decision making • Data mining applications use sophisticated statistical and mathematical techniques.

• Report delivery is not as critical.

DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall 15-5

Need for Data Warehousing

• Integrated, company-wide view of high-quality information (from disparate databases) • Separation of

operational

and

informational

and data (for improved performance) systems

Comparison of Operational and Informational Systems

DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall 15-6

Definitions

Data Warehouse

: – A subject-oriented, integrated, time-variant, non-updatable collection of data used in support of management decision making processes –

Subject-oriented:

products e.g. customers, patients, students, –

Integrated:

Consistent naming conventions, formats, encoding structures; from multiple data sources – –

Time-variant:

Can study trends and changes

Non-updatable:

Read-only, periodically refreshed •

Data Mart

: – A data warehouse that is limited in scope DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall 15-7

Data Warehouse vs. Data Mart

DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall 15-8

Components of a Data Warehouse

DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall 15-9

Data Warehouse and Data Marts

DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall 15-10

Data Warehouses and Data Marts:

Problems of Using Transaction Data for BI DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall 15-11

The ETL Process

• Extract/Capture – Static vs. Incremental • Transform – Scrub or data cleansing – Data selection, joining, aggregation • Load and Index – Refresh vs. Update DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall 15-12

Data Warehouse ETL Sequence

DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall 15-13

Components of a Star Schema

DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall 15-14

Star Schema Example

DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall 15-15

Issues Regarding Star Schema

• Dimension table keys should be

surrogate

: – Keys may change over time – Length/format consistency • Granularity of Fact Table – what level of detail?

– Transactional grain – finest level – Aggregated grain – more summarized – Finer grain: better

market basket analysis

capability, but much more data (more dimension tables, more rows in fact table) • Duration of the database – how much history should be kept?

– Natural duration – 13 months or 5 quarters – Financial institutions may need longer duration – Older data is more difficult to source and cleanse DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall 15-16

Data Warehousing at Wal-Mart

• As of 2000 (Foote and Krishnamurthi, 2001) – Held 7.5 TB, with plans to reach 24 TB (1TB = 250M pages of text) – Kept 65 weeks of data – Had invested $4 Billion – Power users generated $12,000/query • As of 2005 (Wall Street Journal, December 3-4, 2005) – Held 570 TB (more than Internet’s fixed pages) – Predicted Hurricane Ivan would spur demand for easy breakfasts • Stocked Florida stores with Pop-Tarts DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall 15-17

Reporting Systems:

RFM Analysis •

RFM Analysis

according to purchasing patterns: –

R

analyzes and ranks customers = Recent (most recent order) – –

F

= Frequent (how often an order is made)

M

= Money (dollar amount of orders) • Customers are sorted into five groups, each containing – –

1 2, 3, 4 20%

= Top 20% of the customers.

• Each group is given a numerical value: = Each 20% in between top and bottom 20% –

5

= Bottom 20% DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall 15-18

Reporting Systems:

RFM Analysis (Continued) Ajax ordered recently (1), orders often (1) but does not order the most expensive items (3) – Try to sell Ajax more expensive goods!

Bloominghams has not ordered recently (5), but has ordered often (1) and purchased the most expensive items (1).

This customer may be looking for a different vendor – better call!

DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall 15-19

Reporting Systems:

OnLine Analytical Processing [OLAP] • • An

OLAP report

– – has measures and dimensions:

Measure

— A data item of interest.

Dimension

— A characteristic of a measure.

OLAP cube

— A presentation of a measure with associated dimensions.

– An OLAP cube can have

any

number of axes.

– The terms

OLAP cube

synonymous.

and

OLAP report

are • OLAP allows

drill-down

— a further division of the data into more detail.

DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall 15-20

Reporting Systems:

OLAP Drill Down: Product Family by Store Type DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall 15-21

Reporting Systems:

OLAP Drill Down: Product Family and Store Location by Store Type DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall 15-22

Reporting Systems:

OLAP Drill Down: Store Location and Product Family by Store Type DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall 15-23

Data Mining Applications

Data mining

– • applications use sophisticated statistical and mathematical techniques to find patterns and relationships that can be used to classify and predict.

Unsupervised data mining

are used to identify groups of entities with similar characteristics.

Cluster Analysis

— Statistical techniques –

Supervised data mining

: • A model is developed.

• Statistical techniques are used to estimate parameter values of the model.

Regression analysis

DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall 15-24

Data Mining Applications:

The Convergence of the Disciplines DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall 15-25

Data Mining Applications:

Three Popular Data Mining Techniques • • •

Decision tree analysis

— Classifies entities into groups based on past history.

Logistic regression

— Produces equations that offer probabilities that certain events will occur.

Neural Networks

— Complex statistical prediction techniques DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall 15-26

Data Mining Applications:

Market Basket Analysis •

Market Basket Analysis

— Determines patterns of associated buying behavior.

Support

— The probability that two items will be purchased together.

– –

Confidence

— The probability that an item will be purchased given the fact that the customer has already purchased another particular item.

Lift

— the ratio of confidence to the basic probability that a particular item will be purchased.

DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall 15-27

Data Mining Applications:

Market Basket Analysis Example DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall 15-28