Data Warehouse Architecture

Download Report

Transcript Data Warehouse Architecture

Data Warehouse Architecture

By Slavko Stemberger Copyright © Starsoft Inc, 2000 1

Some Acronyms/Terms

• OLTP – On-Line Transaction Processing (operational system) • OLAP – On-line Analytical Processing • ROLAP – Relational OLAP Copyright © Starsoft Inc, 2000 2

Some Acronyms/Terms

• MOLAP – Multidimensional OLAP • Metadata – Data about data (data dictionary) • Source System – An operational system that provides data for the data warehouse Copyright © Starsoft Inc, 2000 3

Some Acronyms/Terms

• ETL – Extract, Transform and Load • Data Warehouse – A queryable source of data • Data Mart – A logical subset of a data warehouse • Data Staging Area – An intermediate storage location used for ETL Copyright © Starsoft Inc, 2000 4

Data Structures/Databases

• Flat Files • Hierarchical DB • Network DB • Relational DB • O-O DB • Dimensional DB Copyright © Starsoft Inc, 2000 5

Modeling Methods

• Entity-Relationship (E-R) • Dimensional • Object Oriented (O-O) Copyright © Starsoft Inc, 2000 6

Entity-Relationship Modeling

• Used for operational system • Instantaneous snapshot of the business • Removed data redundancy (eliminates update anomalies) • Shows detail relationships • Complex network of entities can be difficult for end-users to understand Copyright © Starsoft Inc, 2000 7

Dimensional Modeling

• Used in data warehouses • Data duplication is allowed (in the dimensions) • Query based • Easier for users to understand – Not as much detail shows as in E-R Copyright © Starsoft Inc, 2000 8

Dimensional Models

• The “Cube” • Star Schema • Snowflake Schema Copyright © Starsoft Inc, 2000 9

The “Cube”

• Logical structure of ALL data warehouses • Can be implemented physically in an RDB like Oracle • Some view this as limited to data marts Copyright © Starsoft Inc, 2000 10

Star Schema

• Easy to understand • Flexible in type of questions that can be asked • Supports very large data warehouses • There is data redundancy (in the dimensions) Copyright © Starsoft Inc, 2000 11

Snowflake Schema

• “Normalized” star schema • More complex than the star schema - harder to understand and work with • Solves some problems that cannot be done with star schema Copyright © Starsoft Inc, 2000 12

Dimension Tables

• A set of independent variables that affect an observation • Each variable has a set of known, relatively small, set of values • 4 - 20 dimensions per data warehouse/data mart is the norm Copyright © Starsoft Inc, 2000 13

Dimension Tables

(cont…) • Columns are descriptive and usually textual • Some numeric values are descriptive – Numeric descriptive values should be suspect of being facts e.g. standard product price may be a fact because it can change and one can ask “what was the average standard price of the product over the last 12 months” Copyright © Starsoft Inc, 2000 14

Dimension Tables

(cont…) • Dimension keys should be meaningless surrogate keys • Time dimension keys may be/should be assigned in the order of the dates in the fact table - this allows physical partitioning • In general avoid “smart” keys - they should be meaningless • Avoid production keys Copyright © Starsoft Inc, 2000 15

Dimension Tables - Granularity

• Definition: The level of detail of the data • Keep the grain of the data as small as possible (as detail as possible) – This makes the warehouse more resistant to change – It is easier to add attributes to existing dimensions – superior results in data mining operations Copyright © Starsoft Inc, 2000 16

Dimension Tables - “Types”

• Time • Degenerate • “Junk” • Other Copyright © Starsoft Inc, 2000 17

Dimension Tables - Time

• All data marts and warehouses have at least one time dimension • Must be consistent across all fact tables • Create partial attributes year, month and day and their concatenations (year + month, year + month + day, year + week, …) – Without the concatenations, it is difficult to ask for time ranges Copyright © Starsoft Inc, 2000 18

Dimension Tables - Degenerate

• Dimensions with only one attribute • Usually a control document id such as order number, invoice number, etc • No value in creating a physical table • Put the id into the fact table Copyright © Starsoft Inc, 2000 19

Dimension Tables - “Junk”

• Given: Leftover flags and text attributes • Possible Actions: – Put the these flags into the fact table – Make each one into a dimension – Drop them from the design – Create one dimension with all combinations of these flags Copyright © Starsoft Inc, 2000 20

Fact Tables

• Dimension keys • Degenerate dimension keys (if they exist) • Facts – Additive – Semi-additive – Non-additive – None (factless tables) Copyright © Starsoft Inc, 2000 21

Facts - Additive

• These are measures of activity • Can be added across all combination of dimensions • Examples: sales in dollars or units Copyright © Starsoft Inc, 2000 22

Facts - Semi-additive/non additive

• These are measures of intensity • Some may be added across some dimensions but not others – e.g. Bank Balance • Some may not be added at all – e.g. Temperature Copyright © Starsoft Inc, 2000 23

Closing

• Other things to look at – Mutating dimensions – Hierarchical data (e.g. product structures) – Security – Data Loading – Cleansing – etc.

Copyright © Starsoft Inc, 2000 24