Data Warehouse

Download Report

Transcript Data Warehouse

Data Warehouse:
Methodology and Tools
Concepts, Architectures and Products
FORWISS - Bavarian Research Centre for Knowledge Based Systems
© 1999 FORWISS
2
Overview

The Process of Planning and Building a Warehouse

Data Warehouse Architecture (revisited)

Classification of Tools

Focus: OLAP Tools
– Multidimensional Data Modeling
– OLAP Architectures
– OLAP query languages

Tool Demonstration

Summary
DW: Tools and Projects
© 2001 FORWISS,Carsten Sapia - [email protected]
3
OLAP Design Cycle
Using the Data
Warehouse
Requirement
Analysis
Conceptual Design
(Implementation
Independent)
Implementation
Logical + Physical Design
(e.g. Product specific)
DW: Tools and Projects
© 2001 FORWISS,Carsten Sapia - [email protected]
4
Data Warehouse Architecture
Data
Analysis
Data
Storage
Data-Migration
Reporting, OLAP,
Data Mining
Repository
Middleware (Populations-Tools)
Operational
Data Sources
DW: Tools and Projects
© 2001 FORWISS,Carsten Sapia - [email protected]
5
Classification of Tools

Frontend Tools

Data Storage Tools (Databases)

ETL Tools
– Extraction
– Transformation
– Loading

Repository Systems
– Metadata Storage
DW: Tools and Projects
© 2001 FORWISS,Carsten Sapia - [email protected]
6
Repository Systems

Manage Different Kinds of Metadata
– Business Metadata
» E.g. How is revenue computed
– Technical Metadata
» When was data last loaded from which system
» Data model for OLTP and OLAP databases

Functionality
– Communication ‚hub‘ for different tools
– Guides user exploration
– Guides development process
– Impact analysis

E.g. Viasoft Rochade, Softlab Enabler,...
DW: Tools and Projects
© 2001 FORWISS,Carsten Sapia - [email protected]
7
ETL Tools

Extraction: Range of Supported Data Sources
– Mainframe legacy databases
– COBOL Files
– Relational Databases
– Filebased data storage (Excel, Word,XML,...)

Transformation
– (Graphical) Specification of Transformation Rules (Expressive Power)

Loading
– Ability to use database features (e.g. bulk loading)

Process Management
– Scheduling, Monitoring, Error Handling

Informatica PowerMart, Hummingbird Genio, Acta...
DW: Tools and Projects
© 2001 FORWISS,Carsten Sapia - [email protected]
8
Databases for DW

Special Indexing Techniques
– Multidimensional Indexes
– Bitmap Indexes
– Foreign Column Indexes

Support for Materialized Views (Preaggregation)

Special Analytical Capabilities (e.g. SQL Extensions)
– Top N
– Ranking

Bulk Loading Capabilities
– Offline, No concurrency control
DW: Tools and Projects
© 2001 FORWISS,Carsten Sapia - [email protected]
9
Frontend Tools
Reporting
Why did
it happen?
Interactive OLAP
Ad hoc-Queries
What will
happen?
Additional Benefit
DW: Tools and Projects
What
happened?
What happened
why and how?
Data Mining
Number of Users
© 2001 FORWISS,Carsten Sapia - [email protected]
10
The User‘s view (OLAP Tool)
DW: Tools and Projects
© 2001 FORWISS,Carsten Sapia - [email protected]
11
Multidimensional OLAP (MOLAP)
Frontend
Tool

specialized database technology

multidimensional storage structures

E.g. Hyperion Essbase, Oracle Express,
Cognos PowerPlay (Server)
Multidim.
Database
+
Query Performance
+
Powerful MD Model
+
write access

Database Features
multiuser access/ backup and recovery

DW: Tools and Projects
Sparsity Handling -> DB Explosion
© 2001 FORWISS,Carsten Sapia - [email protected]
12
Relational OLAP (ROLAP)
Frontend
Tool
MDInterface
ROLAPEngine
SQL
Relational DB
DW: Tools and Projects

idea: use relational data storage

star (snowflake) schema

E.g. Microstrategy, SAP BW
+
advantages of RDBMS
+ scalability, reliability, security etc.
Meta Data
+
Sparsity handling

Query Performance

Data Model Complexity

no write access
© 2001 FORWISS,Carsten Sapia - [email protected]
13
Client (Desktop) OLAP
ClientOLAP
DW: Tools and Projects

proprietary data structure on the client

data stored as file

mostly RAM based architectures

E.g. Business Objects, Cognos PowerPlay
+
mobile user
+
ease of installation and use

data volume

no multiuser capabilites
© 2001 FORWISS,Carsten Sapia - [email protected]
14
DW Integration
MOLAP
ROLAP
ClientOLAP
ROLAPEngine
Multidim.
Database
DW-DB (mostly relational)
DW: Tools and Projects
© 2001 FORWISS,Carsten Sapia - [email protected]
15
Combining Architectures I
Drill through
 highly
aggregated data
Multidim.
Database
 dense
data
Relational
Database
 detailed
DW: Tools and Projects
 95%
 5%
of the analysis requirements
data (sparse)
of the requirements
© 2001 FORWISS,Carsten Sapia - [email protected]
16
Combining Architectures II
 equal
Hybrid OLAP (HOLAP)
treatment of MD and Rel
Data
 Storage
type at the discretion of
the administrator
 Cube
Partitioning
HOLAP System
Meta Data
Multidim. Storage
DW: Tools and Projects
Relational Storage
© 2001 FORWISS,Carsten Sapia - [email protected]
17
OLAP Standards

Idea: define interface between client and server

Benefit: Component oriented architectures

Proposal 1: OLAP Council
– union of OLAP Tool producers
– not implemented so far (even by the council members)

Proposal 2: Microsoft - OLEDB for OLAP (shot ODBO)
– standardizes a data model and an MD query language (MDX)
– specification contains lots of optional functionality
– all major vendors committed themselves to the standard
– will be the de facto standard
DW: Tools and Projects
© 2001 FORWISS,Carsten Sapia - [email protected]
Practical Case Study
Building a Warehouse
artwork copyright Intersystems GmbH
© 1999 FORWISS
19
Conceptual Design
Using the Data
Warehouse
Requirement
Analysis
Conceptual Design
(Implementation
Independent)
Implementation
Logical + Physical Design
(tool specific)
DW: Tools and Projects
© 2001 FORWISS,Carsten Sapia - [email protected]
20
The Modeling Process

Which business process is being modeled?

What is the subject of analysis (fact) and what is being
measured?

On what granularity level is active analysis being done?

Which properties (dimensions) determine the measures?

Which different levels of aggregation are meaningful?

What additional information is needed for the different levels?

What is the variability and the cardinality of the dimensions?
DW: Tools and Projects
© 2001 FORWISS,Carsten Sapia - [email protected]
21
Facts

Fact = Subject of Analysis
Sales

Measures = Attributes describing facts
Quantity, Price

Derived Measures
Profit

Additivity of Measures

globally additiv
additiv for some dimensions
Quantity
Items in stock
additiv resp. to plants/
not additiv w.r.t. time

not additiv at all
profit margin

DW: Tools and Projects
© 2001 FORWISS,Carsten Sapia - [email protected]
22
Dimensions

Dimensions = static structure of business information

Used for navigating the data space

Choosing the necessary granularity

Dimension Members = Instances of a dimension
– e.g. 8.12.1997 and Juli 1997 are members of dimension “time”

Structuring Dimension
– using different dimension levels (hierarchies)
– using descriptive attributes
DW: Tools and Projects
© 2001 FORWISS,Carsten Sapia - [email protected]
23
Simple Hierarchies
Month
Quarter
1/2 Year
Period
Year
Januar 99
Februar 99
1. Quartal 99
März 99
1. Halbjahr 99
April 99
Mai 99
Dimension
Level
2. Quartal 99
Juni 99
1999
Juli 99
August 99
3. Quartal 99
2. Halbjahr 99
Sept. 99
............
DW: Tools and Projects
© 2001 FORWISS,Carsten Sapia - [email protected]
24
Unbalanced Hierarchies
Plant/Site
Business
Unit
Business
Division
Enterprise
Plant1
Div A
...
Great
Outdoors
Plant 1
Bu 1
...
Div B
Plant 0815
Bu 2
...
DW: Tools and Projects
© 2001 FORWISS,Carsten Sapia - [email protected]
25
Alternative Hierarchies
Customer
Geogr. Region
Country
Bavaria
Customer 01
Customer 02
Hessen
Germany
Hamburg
Customer 03
Customer 04
Customer 05
Customer 06
Partner
Retailer
Consumer
Customer Group
DW: Tools and Projects
© 2001 FORWISS,Carsten Sapia - [email protected]
26
Alternative Pathes
Ort
Munich 01
Munich 02
Munich 03
Geogr. Region
Bayern
Hessen
Hamburg
Germany
Würzburg 01
Würzburg 02
Frankfurt 01
Country
Germany (South)
Germany (West)
Germany (North)
Sales Region
DW: Tools and Projects
© 2001 FORWISS,Carsten Sapia - [email protected]
27
Criteria for a ‘good’ MD Design

dimensions should be independent

dimensionality of a cube should be max. 7-8 dimensions
– interpretation of results is difficult for a large number of dimensions

hierarchies should have a fan-out of max. 30
– long drill-down times
– large drill-down results
– insert additional levels for structuring purposes (e.g. insert state
between city and country)
DW: Tools and Projects
© 2001 FORWISS,Carsten Sapia - [email protected]
28
Graphical Notation (ME/R)
Fact-Name
Measure 1
...
Measure n
A Fact and its measures
.. is characterized by dimensions
Level-Name
Attribute 1
...
Attribute n
A Dimension Level with attributes
..can be classified according to...
DW: Tools and Projects
© 2001 FORWISS,Carsten Sapia - [email protected]
29
Example Data Model
Year
Month
Region
Day
Country
Sale
Line
Prod.
Type
Product
Revenue
Cost
Order Qty
Customer
Sales Rep
Name
Code
Branch
Margin
Range
Customer
Type
DW: Tools and Projects
© 2001 FORWISS,Carsten Sapia - [email protected]
30
Cognos PowerPlay- Architecture
Client
PowerPlay Client
PowerCube
(Proprietary,
Compressed)
Server
Transformer
DW: Tools and Projects
Impromptu
OLEDB for OLAP
PowerPlay Server
OLEDB Provider
e.g. MS OLAP
Services,
SAP BW,…
© 2001 FORWISS,Carsten Sapia - [email protected]
31
Logical+Physical Design
Using the Data
Warehouse
Requirement
Analysis
Conceptual Design
(Implementation
Independent)
Implementation
Logical + Physical Design
(tool specific)
DW: Tools and Projects
© 2001 FORWISS,Carsten Sapia - [email protected]
32
Practical Demonstration
DW: Tools and Projects
© 2001 FORWISS,Carsten Sapia - [email protected]
33
Summary and Conclusions

Multidimensional modeling is performed on different levels
– conceptual model (tool independent level) following requirement analysis
– logical and physical design before implementation

Distinction between two types of data
– quantifying data: measures, cells of the cube, fact table
– qualifying data: properties, dimensions, dimension tables

Hierarchical structures of dimensions can be complex

ME/R notation can be used to document conceptual models

Several ways to map an MD model to a relational DB
DW: Tools and Projects
© 2001 FORWISS,Carsten Sapia - [email protected]
34
Canonical Query (I)
Restriction Element
Result
Measures
A
m1 m2
B
A
B
Query Result
Result Granularity
DW: Tools and Projects
© 2001 FORWISS,Carsten Sapia - [email protected]
35
Canonical Query (II)
Canonical Query Definition
…
Result Measures
m1
mk
Restriction Elements
r1
r2
…
rn
Result Granularity
g1
g2
…
gn
SELECT g1,...,gn, aggr(m1),..., aggr(mk)
FROM
FactName, Dim1,..., Dimn
WHERE
Dim1.level(r1) = r1 AND ... AND Dimn.level(rn) = rn
AND Dim1.d1=FactName.d1 AND ... AND Dimn.dn=FactName.dn
GROUP BY g1,...,gn
DW: Tools and Projects
© 2001 FORWISS,Carsten Sapia - [email protected]