Transcript Data Warehouse Architecture
Data Warehouse Architecture
By Slavko Stemberger Copyright © Starsoft Inc, 2000 1
Some Acronyms/Terms
• OLTP – On-Line Transaction Processing (operational system) • OLAP – On-line Analytical Processing • ROLAP – Relational OLAP Copyright © Starsoft Inc, 2000 2
Some Acronyms/Terms
• MOLAP – Multidimensional OLAP • Metadata – Data about data (data dictionary) • Source System – An operational system that provides data for the data warehouse Copyright © Starsoft Inc, 2000 3
Some Acronyms/Terms
• ETL – Extract, Transform and Load • Data Warehouse – A queryable source of data • Data Mart – A logical subset of a data warehouse • Data Staging Area – An intermediate storage location used for ETL Copyright © Starsoft Inc, 2000 4
Data Structures/Databases
• Flat Files • Hierarchical DB • Network DB • Relational DB • O-O DB • Dimensional DB Copyright © Starsoft Inc, 2000 5
Modeling Methods
• Entity-Relationship (E-R) • Dimensional • Object Oriented (O-O) Copyright © Starsoft Inc, 2000 6
Entity-Relationship Modeling
• Used for operational system • Instantaneous snapshot of the business • Removed data redundancy (eliminates update anomalies) • Shows detail relationships • Complex network of entities can be difficult for end-users to understand Copyright © Starsoft Inc, 2000 7
Dimensional Modeling
• Used in data warehouses • Data duplication is allowed (in the dimensions) • Query based • Easier for users to understand – Not as much detail shows as in E-R Copyright © Starsoft Inc, 2000 8
Dimensional Models
• The “Cube” • Star Schema • Snowflake Schema Copyright © Starsoft Inc, 2000 9
The “Cube”
• Logical structure of ALL data warehouses • Can be implemented physically in an RDB like Oracle • Some view this as limited to data marts Copyright © Starsoft Inc, 2000 10
Star Schema
• Easy to understand • Flexible in type of questions that can be asked • Supports very large data warehouses • There is data redundancy (in the dimensions) Copyright © Starsoft Inc, 2000 11
Snowflake Schema
• “Normalized” star schema • More complex than the star schema - harder to understand and work with • Solves some problems that cannot be done with star schema Copyright © Starsoft Inc, 2000 12
Dimension Tables
• A set of independent variables that affect an observation • Each variable has a set of known, relatively small, set of values • 4 - 20 dimensions per data warehouse/data mart is the norm Copyright © Starsoft Inc, 2000 13
Dimension Tables
(cont…) • Columns are descriptive and usually textual • Some numeric values are descriptive – Numeric descriptive values should be suspect of being facts e.g. standard product price may be a fact because it can change and one can ask “what was the average standard price of the product over the last 12 months” Copyright © Starsoft Inc, 2000 14
Dimension Tables
(cont…) • Dimension keys should be meaningless surrogate keys • Time dimension keys may be/should be assigned in the order of the dates in the fact table - this allows physical partitioning • In general avoid “smart” keys - they should be meaningless • Avoid production keys Copyright © Starsoft Inc, 2000 15
Dimension Tables - Granularity
• Definition: The level of detail of the data • Keep the grain of the data as small as possible (as detail as possible) – This makes the warehouse more resistant to change – It is easier to add attributes to existing dimensions – superior results in data mining operations Copyright © Starsoft Inc, 2000 16
Dimension Tables - “Types”
• Time • Degenerate • “Junk” • Other Copyright © Starsoft Inc, 2000 17
Dimension Tables - Time
• All data marts and warehouses have at least one time dimension • Must be consistent across all fact tables • Create partial attributes year, month and day and their concatenations (year + month, year + month + day, year + week, …) – Without the concatenations, it is difficult to ask for time ranges Copyright © Starsoft Inc, 2000 18
Dimension Tables - Degenerate
• Dimensions with only one attribute • Usually a control document id such as order number, invoice number, etc • No value in creating a physical table • Put the id into the fact table Copyright © Starsoft Inc, 2000 19
Dimension Tables - “Junk”
• Given: Leftover flags and text attributes • Possible Actions: – Put the these flags into the fact table – Make each one into a dimension – Drop them from the design – Create one dimension with all combinations of these flags Copyright © Starsoft Inc, 2000 20
Fact Tables
• Dimension keys • Degenerate dimension keys (if they exist) • Facts – Additive – Semi-additive – Non-additive – None (factless tables) Copyright © Starsoft Inc, 2000 21
Facts - Additive
• These are measures of activity • Can be added across all combination of dimensions • Examples: sales in dollars or units Copyright © Starsoft Inc, 2000 22
Facts - Semi-additive/non additive
• These are measures of intensity • Some may be added across some dimensions but not others – e.g. Bank Balance • Some may not be added at all – e.g. Temperature Copyright © Starsoft Inc, 2000 23
Closing
• Other things to look at – Mutating dimensions – Hierarchical data (e.g. product structures) – Security – Data Loading – Cleansing – etc.
Copyright © Starsoft Inc, 2000 24