Aggregating Knowledge in a Data Warehouse and

Download Report

Transcript Aggregating Knowledge in a Data Warehouse and

The IT Perspective:
Data Warehousing, Management,
and Analytical Structures
Rafal Lukawiecki
Strategic Consultant, Project Botticelli Ltd
[email protected]
1
1
Objectives
Explain the basics of:
1.
2.
3.
4.
Master Data Management
Data Warehousing
ETL
OLAP/Multidimensional Data
This seminar is based on a number of sources including a few dozen of
Microsoft-owned presentations, used with permission. Thank you to
Chris Dial, Tara Seppa, Aydin Gencler, Ivan Kosyakov, Bryan Bredehoeft,
Marin Bezic, and Donald Farmer with his entire team for all the support.
The information herein is for informational purposes only and represents the opinions and views of Project Botticelli and/or Rafal Lukawiecki. The
material presented is not certain and may vary based on several factors. Microsoft makes no warranties, express, implied or statutory, as to the
information in this presentation.
Portions © 2010 Project Botticelli Ltd & entire material © 2010 Microsoft Corp. Some slides contain quotations from copyrighted materials by
other authors, as individually attributed or as already covered by Microsoft Copyright ownerships. All rights reserved. Microsoft, Windows,
Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The
information herein is for informational purposes only and represents the current view of Project Botticelli Ltd as of the date of this
presentation. Because Project Botticelli & Microsoft must respond to changing market conditions, it should not be interpreted to be a
commitment on the part of Microsoft, and Microsoft and Project Botticelli cannot guarantee the accuracy of any information provided after the
date of this presentation. Project Botticelli makes no warranties, express, implied or statutory, as to the information in this presentation. E&OE.
2
2
3
3
SQL Services – Why?
Install only the ones you need
Which?
Integration Services
Get your data from the world outside (ETL)
Analysis Services
Cubes, Data Mining, support for PowerPivot on SharePoint
Reporting Services
DIY Report Builder and traditional “big” reports
Master Data Services
Quality of critical master data (cities, colours, customers)
Database Engine
Data warehouse and OLTP relational storage
4
4
Master Data Management
5
MDM
Ensures consistency of data across all
organisational uses
Impacts overall data quality
Processes and tools for:
Collection, aggregation, matching, distribution, and
persistence of master data
Consistently
Related to Federated Data Management
Key to MDM: Modelling
6
6
Why MDM?
It’s About Evolution of Enterprise Architecture
7
7
MDM Processes
• Batched Acquisition from
Staging Tables
• Members, Attributes,
Parent-Child Relationships
• SQL Integration Services
Import &
Integration
8
Modeling
• Versioning Changes
• Auditing
• Compliance
• Tracking of Instances
• Subscription Views
• Export to:
• Operational Systems
• Data Warehouses
• BI Analytics
• Reporting Tools
Export &
Subscription
8
Microsoft Master Data Services
SQL 2008 R2 Enterprise, Datacenter, Developer
Tools:
Master Data Manager
Primary tool for managing
your master data
MDS Configuration
Manager
IT Pro tool
MDS Web Service
For developers wanting to
extend MDS
9
Concepts:
Models
Entities
Attributes
Members
Hierarchies
Collections
Versions
Database
9
Modelling Master Data
Model organises data at highest level
Allowing versioning of changes to data
There are typically four categories of models:
People (Customers, Staff)
Places (Geographies, Cities, Countries)
Things (Products)
Concepts (Accounts, Behaviours, Transactions)
10
10
Example: Product MDM Model
Product
(model)
Product
(entity)
Name (freeform attr)
Code (freeform attr)
Subcategory
(domainbased attr)
StandardCost
(free-form
attr)
Name (freeform attr)
Code (freeform attr)
Category
(domainbased attr)
Name (freeform attr)
11
ListPrice
(free-form
attr)
Photo (file
attr)
Code (freeform attr)
11
1. Reviewing a Data Model Using Master
Data Services
12
Data Warehouse
13
Rich Connectivity
Data Providers
ODBC
SQL Server
SAP
NetWeaver BI
SQL Server
Integration Services
DB2
MySAP
SQL Server
Report Server Models
Teradata
XML
OLE DB
SQL Server
Data Mining Models
Oracle
SQL Server
Analysis Services
14
Hyperion Essbase
14
Star Schema
15
15
Star Schema Benefits
Simple, not-so-normalized model
High-performance queries
Especially with Star Join Query Optimization
Mature and widely supported
Low-maintenance
16
16
Snowflake Dimension Tables
Define hierarchies using multiple dimension tables
Support fact tables with varying granularity
Simplify consolidation of heterogeneous data
Potential for slower query performance in relational reporting
No difference in performance in Analysis Services database
17
17
Slowly Changing Dimensions
Maintain historical context as dimension data
changes
Three common ways (there are more):
Type 1: Overwrite the existing dimension record
Type 2: Insert a new ‘versioned’ dimension record
Type 3: Track limited history with attributes
23
23
Integration and ETL
27
Let’s do ETL with SSIS
SQL Server Integration
service
SSIS object model
Two distinct runtime
Services (SSIS)
engines:
Control flow
Data flow
32-bit and 64-bit
28
editions
28
The Package
The basic unit of work, deployment, and execution
An organized collection of:
Connection managers
Control flow components
Data flow components
Variables
Event handlers
Configurations
Can be designed graphically or built programmatically
Saved in XML format to the file system or SQL Server
29
29
Control Flow
Control flow is a process-oriented workflow
engine
A package contains a single control flow
Control flow elements
Containers
Tasks
Precedence constraints
Variables
30
30
Data Flow
The Data Flow Task
Performs traditional ETL and more
Fast and scalable
Data Flow Components
Extract data from Sources
Load data into Destinations
Modify data with Transformations
Service Paths
Connect data flow components
Create the pipeline
31
31
1. Using SQL Server Integration Services
for Splitting Data
32
OLAP/Multidimensional Data
33
Cube = Unified Dimensional Model
Multidimensional data
Combination of measures and dimensions as
one conceptual model
Measures are sourced from fact tables
Dimensions are sourced from dimension tables
34
34
Hierarchies
Benefits
View of data at different levels of summarization
Path to drill down or drill up
Implementation
Denormalized star
schema dimension
Normalized snowflake
dimension
Self-referencing
relationship
36
36
Hierarchy
Defined in Analysis Services
Ordered collection of attributes into levels
Navigation path through dimensional space
Very important to get right!
Customers by Geography
Customers by Demographics
Country
Marital
State
Gender
City
Customer
Customer
37
37
Measure Group
Group of measures with same dimensionality
Analogous to a fact table
Cube can contain more than one measure
group
E.g. Sales, Inventory, Finance
Defined by dimension relationships
38
38
Measure Group
Measure Group
Dimension
Sales
39
Inventory
Customers
X
Products
X
X
Time
X
X
Promotions
X
Warehouse
Finance
X
X
Department
X
Account
X
Scenario
X
39
1. Using BIDS to Review Dimension
Design
2. Cube Design and Functionality
42
Summary
As a platform for enterprise Business Intelligence you
should consider four services:
Data Warehouse (can be relational)
Process for Data Management (MDS)
Process for Data Integration (ETL)
Analysis (OLAP, Data Mining, Columnar)
= SQL Server 2008 R2
43
43
© 2010 Microsoft Corporation & Project Botticelli Ltd. All rights reserved.
The information herein is for informational purposes only and represents the opinions and views of Project Botticelli and/or Rafal Lukawiecki. The material presented is
not certain and may vary based on several factors. Microsoft makes no warranties, express, implied or statutory, as to the information in this presentation.
Portions © 2010 Project Botticelli Ltd & entire material © 2010 Microsoft Corp. Some slides contain quotations from copyrighted materials by other authors, as
individually attributed or as already covered by Microsoft Copyright ownerships. All rights reserved. Microsoft, Windows, Windows Vista and other product names are
or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the
current view of Project Botticelli Ltd as of the date of this presentation. Because Project Botticelli & Microsoft must respond to changing market conditions, it should
not be interpreted to be a commitment on the part of Microsoft, and Microsoft and Project Botticelli cannot guarantee the accuracy of any information provided after
the date of this presentation. Project Botticelli makes no warranties, express, implied or statutory, as to the information in this presentation. E&OE.
44
44