Programming Patterns with BISM Tabular

Download Report

Transcript Programming Patterns with BISM Tabular

Programming Patterns with
BISM Tabular
Alberto Ferrari
Senior Consultant at SQLBI.COM
[email protected]
Who’s Speaking?
•
•
•
•
Independent consultant
Fond of BI… I love it!
Book author
Founder of SQLBI.COM
Traditional Cube Development
User Requirements Change…
•
•
•
•
•
•
I need a simple report
I want to analyze Sales
Now I need Time Intelligence
Well, how many new customers each month?
Oh, am I losing or gaining market share?
Listen… are my customers switching behavior?
Model Complexity Increasing
Data
Model
Complexity
Custom
Modeling
Dimensional
Modeling
User Requirements
Why Dimensional Modeling?
• Users understand it
• Fast data model
• Consumed by Excel
• Custom Dimensional Modeling?
Vertipaq: a new kid in town
OLAP
BISM Multidimensional
• Dimensional Modeling
• Facts, Dimensions
• Complex Relationships
• MDX
• Script
• Powerful, complex
Vertipaq
BISM Tabular
• Relational Modeling
• Tables
• Basic Relationships (1:N)
• DAX
• Calculated Columns
• Measures
Vertipaq
•
•
•
•
Simpler data model
Less dimensional tools
Amazingly fast
Relational modeling
Speed  Different Modeling Options?
Tabular Data Models… this way?
Or this way?
Agenda
• Scenarios
–
–
–
–
Warehouse Stock Analysis
Customer Analysis
Transition Matrix
Pareto / ABC Analysis
NO DEMOS
Think in DAX, forget dimensional modeling
Calculate Warehouse Availability from transactions
WAREHOUSE STOCK ANALYSIS
Warehouse Data Model
Dim_Date
PK
Fact_Movements
ID_Date
PK
ID_Movement
Date
Year
Month
Day
FK1
FK2
ID_Date
ID_Product
Quantity
Dim_Product
PK
ID_Product
Product
MDX Query
WITH MEMBER MEASURES.Stock AS
SUM (
NULL : [Date Order].[Calendar].CURRENTMEMBER,
[Measures].[Quantity]
)
SELECT
Stock ON 0,
NON EMPTY [Product].[Product].[Product].MEMBERS
* [Date Order].[Calendar].[Month].MEMBERS
ON 1
FROM [Movements]
Snapshot Table
Dim_Date
PK
Fact_Movements
ID_Date
PK
ID_Movement
Date
Year
Month
Day
FK1
FK2
ID_Date
ID_Product
Quantity
Snapshot
PK
ID_Snapshot
FK1
FK2
ID_Date
ID_Product
Quantity
Dim_Product
PK
ID_Product
Product
SQL Query for Snapshot
SELECT
t.TimeKey,
t.FullDateAlternateKey,
s1.*
FROM
dbo.DimTime t
CROSS APPLY (SELECT
s.ProductKey,
SUM (s.Quantity) AS Stock
FROM
dbo.FactMovements s
WHERE
s.OrderDateKey <= t.TimeKey
GROUP BY
s.ProductKey) s1
WHERE
DayNumberOfMonth = 1
SQL Query Plan
Each date  Full Fact Table Scan
DAX Query
Stock :=
CALCULATE(
SUM (FactMovements[OrderQuantity]),
FILTER(
ALL (DimTime),
DimTime[TimeKey] <= MAX(DimTime[TimeKey])
)
)
Vertipaq Solution
• No Snapshot Table
• Leverages Only Movements
• Amazingly Fast
Speed  Different Modeling Options
Count new and lost customers
CUSTOMER ANALYSIS
Customer Analysis
Dim_Date
PK
ID_Date
Date
Year
Month
Day
Fact_Sales
PK
ID_Sale
Dim_Customer
PK
FK1
FK2
ID_Date
ID_Customer
Sales
ID_Customer
Customer
Customer Analysis
• How many new customers we had this month?
• Dimensional model
– Complex and slow query (either SQL or MDX)
• Slowness is caused by looking for the lack of an event in a
dimensional model
– Optimization: monthly snapshot of new customers
• Possible shortcut: saving date of first and last sale for each
customer
• Extraction logic embedded in ETL – not flexible
Snapshot Required
Dim_Date
PK
ID_Date
Date
Year
Month
Day
Fact_Sales
PK
ID_Sale
Dim_Customer
PK
FK1
FK2
ID_Date
ID_Customer
Sales
Snapshot
PK
ID_Snapshot
FK1
FK2
ID_Date
ID_Customer
ID_Customer
Customer
Customer Analysis
Fact_Sales
PK
ID_Sale
FK2
FK1
ID_Date
ID_Customer
Sales
Dim_Date
PK
ID_Date
Date
Year
Month
Day
Dim_Customer
PK
ID_Customer
FK1
FK2
Customer
ID_FirstSaleDate
ID_LastSaleDate
MDX Query
WITH
MEMBER MEASURES.TotalCustomers AS
AGGREGATE( NULL : [Date].[Calendar].CURRENTMEMBER,
[Measures].[Customer Count])
MEMBER MEASURES.NewCustomers AS
MEASURES.TotalCustomers
- AGGREGATE( NULL : [Date].[Calendar].PREVMEMBER,
[Measures].[Customer Count])
MEMBER MEASURES.ReturningCustomers AS
MEASURES.[Customer Count] - MEASURES.NewCustomers
SELECT
{ [Measures].[NewCustomers],
[Measures].[Customer Count],
[Measures].[ReturningCustomers]
} ON COLUMNS,
Date.Calendar.Month.MEMBERS ON ROWS
FROM [Adventure Works]
MDX Query
WITH
MEMBER MEASURES.TotalCustomers AS
AGGREGATE( NULL : [Date].[Calendar].CURRENTMEMBER,
[Measures].[Customer Count])
MEMBER MEASURES.NewCustomers AS
MEASURES.TotalCustomers
- AGGREGATE( NULL : [Date].[Calendar].PREVMEMBER,
[Measures].[Customer Count])
MEMBER MEASURES.ReturningCustomers AS
MEASURES.[Customer Count] - MEASURES.NewCustomers
SELECT
{ [Measures].[NewCustomers],
[Measures].[Customer Count],
[Measures].[ReturningCustomers]
} ON COLUMNS,
Date.Calendar.Month.MEMBERS ON ROWS
FROM [Adventure Works]
DISTINCT COUNT
Measures…
Distinct Count in OLAP
•
•
•
•
Distinct Count is very expensive
Separate measure group
Expensive processing (ORDER BY required)
Different partitioning for query optimization
Vertipaq Solution
•
•
•
•
BISM Tabular
Query in DAX is fast enough
Extraction logic in DAX, not in ETL
No special ETL and data modeling required
DAX Formula
NewCustomers :=
CALCULATE(
DISTINCTCOUNT (FactInternetSales[CustomerKey]),
FILTER (
ALL (DimTime),
DimTime[TimeKey] <= MAX (DimTime[TimeKey])
)
)
CALCULATE(
DISTINCTCOUNT (FactInternetSales[CustomerKey]),
FILTER(
ALL (DimTime),
DimTime[TimeKey] < MIN (DimTime[TimeKey])
)
)
Vertipaq Solution
• Simpler data model
• Very fast
• Simpler formulas
Speed  Different Modeling Options
Transition of an attribute over time in SCD2
TRANSITION MATRIX
Relational Data Model
Dim_Date
PK
ID_Date
Date
Year
Month
Day
Dim_Customer
Fact_Sales
PK
ID_Sale
FK1
FK2
ID_Date
ID_Customer
Amount
PK
ID_Customer
Customer
ScdStartDate
ScdEndDate
Rating
Transition Matrix
• How many customers moved
– From Rating AAA
– To rating AAB
– In the last period / month?
• Complex SQL query
• Complex OLAP model
Data Model for OLAP
Dim_Date
PK
Fact_RatingValues
Dim_Rating
ID_Date
PK
ID_RatingValues
PK
ID_Rating
Date
Year
Month
Day
FK1
FK2
ID_Rating
ID_Date
AmountUsed
AmountAuthorized
FK1
ID_Customer
Rating
Classification
Segmentation
Rating Snapshot
SCD 2
Dim_Customer
PK
ID_Customer
CODE_Customer
Customers
SCD 1
The DSV Data Model
SSAS Dimension Usage
• 2 Dimensions and 2 Measure Groups for Rating A and B
• Complex Dimension Usage
The Final Result in OLAP
Start Rating
Start Month
(January)
End Month
(Jan-June)
Number of
Customers
Transition Matrix in Vertipaq
•
•
•
•
•
Simpler data model
Facts table are not duplicated
Complexity in DAX measures
Easier to adapt to existing relational models
Calculated columns to avoid snapshot tables
Transition Matrix – Tabular
• Dim_Customer is still
SCD Type 2
• Rating_Snapshot is a
snapshot of ratings
measured monthly
Many To Many Structure
Dim_Date
PK
ID_Date
Date
Year
Month
Day
RatingSnapshot
PK,FK1
PK,FK2
PK
ID_Date
Customer
Rating
Dim_Customer
Fact_Sales
PK
ID_Sale
FK2
FK1
ID_Date
ID_Customer
Amount
PK
ID_Customer
Customer
scdStartDate
scdEndDate
Rating
Dim_DateSnapshot
PK
ID_Date
Date
Year
Month
Day
Transition Matrix - Tabular
NumOfCustomers :=
CALCULATE (
DISTINCTCOUNT (Dim_Customers[Customer]),
FILTER (
Dim_Customers,
CALCULATE (
COUNTROWS (RatingSnapshot),
RatingSnapshot[CustomerSnapshot] = EARLIER (Dim_Customers[Customer])
) > 0
&& (Dim_Customers[scdStartDate] <= MAX (Dim_Date[ID_Date]) )
&&
(
Dim_Customers[scdEndDate] > MIN (Dim_Date[ID_Date])
|| ISBLANK (Dim_Customers[scdEndDate])
)
)
)
Vertipaq Solution
• Much simpler data model
• Very fast
• Formulas slightly more complex
Speed  Different Modeling Options
From two loops to single calculation
ABC CLASSIFICATION
Pareto Principle
80% of effects comes from 20% of the causes
In other words
80% of revenues comes from
• 20% of customers
• 20% of products
ABC Analysis
• ABC Analysis
– A items for 70% of total
– B items for 20% of total
– C items for 10% of total
ABC Analysis in Excel
=IF(
H6<=0,7; "A";
IF(
H6<=0,9; "B";
"C“
)
)
ABC Used to Slice Data
ABC Possible Solutions
• Heavy Calculation in ETL
• Could be very long in SQL
• Faster in MDX
– Requires double cube processing to store
ABC value in a dimension attribute
ABC in SQL
SUM( Amount)
GROUP BY Product
INTO #temp
SQL
UPDATE #temp
SET Percentage = RunningTotal on Product Amount / SUM(
Amount )
SQL
UPDATE product
SET Class = A/B/C (from #temp)
SQL
ABC in MDX
Process Sales cube
MDX query
on Sales cube
to extract ABC Class
OLAP
OLAP
Process Product
Dimension
UPDATE product
SET Class = A/B/C
• Update Sales aggregations
(ABC flexible attribute)
• from MDX query
• Requires ETL or Linked Server
OLAP
SQL
BISM Tabular
• Calculated columns
• Computed during processing
• No need for double processing
Intermediate Measures
SalesAmountProduct :=
CALCULATE( SUM( SalesOrderDetail[LineTotal] ) )
CumulatedProduct :=
SUMX(
FILTER (Product,
Product[SalesAmountProduct]
>= EARLIER (Product[SalesAmountProduct])),
Product[SalesAmountProduct]
)
Class Computation
SortedWeightProduct =
Product[CumulatedProduct]
/ SUM( Product[SalesAmountProduct] )
[ABC Class Product] =
IF( Product[SortedWeightProduct] <= 0.7, "A",
IF( Product[SortedWeightProduct] <= 0.9, "B", "C" ) )
Vertipaq Solution
• No need for ETL
• Computed at Process Time
• Very fast
Speed  Different Modeling Options
Time to go to an end…
CONCLUSIONS
Data Modeling Considerations
• Dimensional Modeling
– Not the only option
– Not easy for complex scenarios
• Vertipaq: game changer
– Simpler modeling
– More natural
– Vertipaq speed leads to simpler data models
What Next?
•
•
•
•
•
•
Consider simpler models for Vertipaq
Learn DAX
Seriously… learn it!
BISM Tabular side-by-side with Multidimensional
Complex Scenarios with Tabular
Think in DAX, it’s not easy…
THANK YOU!
For attending this session and
PASS SQLRally Nordic 2011, Stockholm