An Introduction to Dimensional Data Warehousing Design

Download Report

Transcript An Introduction to Dimensional Data Warehousing Design

An Introduction to Dimensional Data
Warehouse Design
Presented by
Joseph J. Sarna Jr.
JJS Systems, LLC
Relational Normal Form
•
Most relational databases are set to 3rd normal form

1st Normal form – Tables have unique keys and no repeating
groups or multi-value fields

2nd Normal form – Every attribute is dependent ont the entire
key of the table

3rd Normal form – Attributes are dependent only on the key.
No derived elements
Northwind Database Model – Relational Format
Territories
Region
PK
PK
RegionID
RegionDescription
FK1
EmployeeTerritories
Employees
TerritoryID
TerritoryDescription
RegionID
FK2
FK1
TerritoryID
EmployeeID
PK
EmployeeID
I1
LastName
FirstName
Title
TitleOfCourtesy
BirthDate
HireDate
Address
City
Region
PostalCode
Country
HomePhone
Extension
Photo
Notes
ReportsTo
PhotoPath
I2
FK1
Categories
PK
CategoryID
I1
CategoryName
Description
Picture
Suppliers
PK
SupplierID
I1
CompanyName
ContactName
ContactTitle
Address
City
Region
PostalCode
Country
Phone
Fax
HomePage
I2
Orders
Products
PK
ProductID
I3
ProductName
FK2,I4,I5 SupplierID
FK1,I2,I1 CategoryID
QuantityPerUnit
UnitPrice
UnitsInStock
UnitsOnOrder
ReorderLevel
Discontinued
Order Details
PK,FK1,I2,I1
PK,FK2,I4,I3
OrderID
ProductID
UnitPrice
Quantity
Discount
PK
OrderID
FK1,I1,I2 CustomerID
FK2,I4,I3 EmployeeID
I5
OrderDate
RequiredDate
I6
ShippedDate
FK3,I7
ShipVia
Freight
ShipName
ShipAddress
ShipCity
ShipRegion
I8
ShipPostalCode
ShipCountry
Customers
PK
CustomerID
I2
CompanyName
ContactName
ContactTitle
Address
City
Region
PostalCode
Country
Phone
Fax
I1
I4
I3
CustomerCustomerDemo
PK,FK2
PK,FK1
CustomerID
CustomerTypeID
CustomerDemographics
PK
CustomerTypeID
CustomerDesc
Shippers
PK
ShipperID
CompanyName
Phone
Why Dimensional Data Warehouses?
•
•
Business needs to analyze data so that it can:
–
Understand trends
–
Predict future behavior and needs
–
Personalize contact with customers
–
Be competitive
All of this in a speedy manner, with the ability to do
“What if’s”
Drawbacks to Relational Data Structures
•
Data is not structured for analytical usage
•
Multiple Joins are resource intensive
•
Missing data from external sources, context history, not
operational sources
What Is a Dimensional Data Warehouse?
“A structured repository of validated and integrated
historical information accessible to business people to
provide the basis for both tactical and strategic
business decisions.”
•
•
•
•
Centralized extract and staging
Separate from operational system
Structured for analysis
Historically contexted
Dimensional Data Warehouse Architecture
Relational Data
External Data
Enterprise Data
Data Distribution
Acquisition, Staging, Cleaning,
Transformation
Data Warehouse
Storage
Analytical
Applications
Levels of Design
•
Detail Level
–
–
•
Analytical Level
–
•
Dimensional Normal form
Value and feasibility
Structured for the required analyses
Summary Level
–
–
Summaries for user requirements
Better response time
Dimensional Normal Form
•
Normalized for maintainability
•
De-normalized for performance, based on rules
•
2 level structure, therefore only one level of joins
required for queries
Northwind Database Model – Dimensional Format
Employees
PK
Territories
Region
PK
RegionKey
RegionID
RegionDescription
EmployeeTerritories
PK
TerritoryKey
PK
EmployeeTerritoryKey
FK1
RegionKey
TerritoryID
TerritoryDescription
FK1
FK2
EmployeeKey
TerritoryKey
TerritoryID
TerritoryDescription
RegionKey
RegionID
RegionDescription
FK1
Order Details
Products
Categories
PK
PK
CategoryKey
CategoryID
CategoryName
Description
Picture
Suppliers
PK
FK1
SupplierKey
SupplierID
CompanyName
ContactName
ContactTitle
Address
City
Region
PostalCode
Country
Phone
Fax
HomePage
FK2
ProductKey
PK
OrderDetailKey
ProductID
ProductName
QuantityPerUnit
UnitPrice
UnitsInStock
UnitsOnOrder
ReorderLevel
Discontinued
CategoryKey
CategoryID
CategoryName
Description
SupplierKey
SupplierID
CompanyName
ContactName
ContactTitle
Address
City
Region
PostalCode
Country
Phone
Fax
HomePage
FK2
OrderKey
OrderID
UnitPrice
Quantity
Discount
ExtendedPrice
ProductKey
ProductID
ProductName
QuantityPerUnit
UnitPrice
UnitsInStock
UnitsOnOrder
ReorderLevel
Discontinued
CategoryKey
CategoryID
CategoryName
Description
SupplierKey
SupplierID
CompanyName
ContactName
ContactTitle
Address
City
Region
PostalCode
Country
Phone
Fax
HomePage
FK1
Orders
PK
FK1
FK2
FK3
OrderKey
OrderID
CustomerKey
EmployeeID
ShipperKey
OrderDate
RequiredDate
ShippedDate
Freight
ShipName
ShipAddress
ShipCity
ShipRegion
ShipPostalCode
ShipCountry
Shippers
PK
ShipperKey
ShipperID
CompanyName
Phone
EmployeeKey
EmployeeID
LastName
FirstName
Title
TitleOfCourtesy
BirthDate
HireDate
Address
City
Region
PostalCode
Country
HomePhone
Extension
Photo
Notes
ReportsTo
PhotoPath
EmployeeTerritoryKey
TerritoryKey
TerritoryID
TerritoryDescription
RegionKey
RegionID
RegionDescription
CustomerCustomerDemo
Customers
PK
CustomerKey
FK1
CustomerID
CompanyName
ContactName
ContactTitle
Address
City
Region
PostalCode
Country
Phone
Fax
CustomerTypeID
CustomerDesc
CustomerDemographicsKey
PK
CustomerDemographicsKey
FK1
CustomerKey
CustomerTypeKey
CustomerDesc
CustomerDemographics
PK
CustomerTypeKey
CustomerTypeID
CustomerDesc
Dimensional Schema
•
Fact Tables
–
–
–
–
–
•
contain related measures
Usually the largest tables
Usually appended to
Can contain detail or summary data
Measures are usually additive
Dimension Tables
–
–
–
–
Contain descriptors
Utilize business terminology
Textual and discrete data
Attributes through which the table measures are analyzed
Northwind Database – Star Schema – Orders
dimDate
PK
dimOrderDetails
PK
DateKey
DayDate
DayDate_YYYYMMDD
DayOfWeekName
DayOfWeekNameAbbrv
DayNumberInWeek
DayNumberInMonth
DayNumberInQuarte
DayNumberInYear
WeekDayIndicator
WeekEndIndicator
Week_YYYYWW
WeekNumberInYear
Month_YYYYMM
MonthName
MonthNameAbbrv
MonthNumberInYear
Quarter_YYYYQ
QuarterName
QuarterNameAbrv
QuarterNumberInYear
Year
ProductKey
OrderID
UnitPrice
Quantity
Discount
ExtendedPrice
ProductID
ProductName
QuantityPerUnit
UnitPrice
UnitsInStock
UnitsOnOrder
ReorderLevel
Discontinued
CategoryID
CategoryName
Description
SupplierID
CompanyName
ContactName
ContactTitle
Address
City
Region
PostalCode
Country
Phone
Fax
HomePage
fctOrders
PK
OrderKey
FK3
FK2
FK1
FK4
FK6
FK5
FK7
ProductKey
EmployeeKey
CustomerKey
ShipperKey
OrderDateKey
RequiredDateKey
ShippedDateKey
OrderID
ShipVia
Freight
ShipName
ShipAddress
ShipCity
ShipRegion
ShipPostalCode
ShipCountry
dimShippers
PK
ShipperKey
ShipperID
CompanyName
Phone
dimEmployees
PK
EmployeeKey
EmployeeID
LastName
FirstName
Title
TitleOfCourtesy
BirthDate
HireDate
Address
City
Region
PostalCode
Country
HomePhone
Extension
Photo
Notes
ReportsTo
PhotoPath
TerritoryID
TerritoryDescription
RegionID
RegionDescription
dimCustomers
PK
CustomerKey
CustomerID
CompanyName
ContactName
ContactTitle
Address
City
Region
PostalCode
Country
Phone
Fax
CustomerTypeID
CustomerDesc
Resources
•
Books
–
–
–
•
The Data Warehouse Toolkit, Ralph Kimball
The Data Warehouse Lifecycle Toolkit, Ralph Kimball, et al
Data Warehouse Design Solutions, Adamson / Venerable
Websites
–
–
–
http://www.ralphkimball.com/
http://www.atre.com
http://www.microsoft.com/sql/evaluation/overview/datawa
re.asp