Advantages using DAX in Tabular data model
Download
Report
Transcript Advantages using DAX in Tabular data model
SQL Server
as a Data Warehousing Platform
Sponsors
Gold Sponsors:
Bronze Sponsors:
Swag Sponsors:
2 |
About me
DW / BI Consultant @
7 years professional experience
ASP.NET, C# Web Development
ETL and Business Intelligence
Contacts
[email protected]
www.linkedin.com/in/stakvorian
3 |
Agenda
4 |
Overview
Basic Data Warehouse Architectures
Dimensional Modeling
SQL Server DW Best Practices
SSIS Performance Boost
A quick glance at Microsoft BI Tools
OVERVIEW
5 |
Why Do I Need a Data Warehouse?
Where is my data?
Physically stored across multiple platforms
(potentially heterogeneous)
Over different subsets of the organization’s data
When did it changed?
Online applications keep a very limited set of historical data
How can I access my data?
Online systems do not allow power
users to access it
How long would it take?
Transactional applications are
modeled to optimize the speed of
inserts and updates.
6 |
What is a Data Warehouse?
Definition: A centralized store of business data that can be used for
reporting and analysis to inform business decisions.
7 |
Subject-orientated
Integrated
Time variant
Non-volatile
In support of
management’s
decisions
Online Applications vs. the EDW
Aspect
Online Applications
EDW environment
User
Function
Unit of work
DB Design
Data
Usage
Access
# of Records
# of Users
DB size
Tools
Front-office Personnel
Back-office Personnel
Day to Day Operations
Decision Support
Short Transactions
Complex Queries
App-oriented, Normalized
Subject-oriented, De-normalized
Current, Detailed
Historical, Summarized
Real Time
Structured and Ad-hoc
Read and Write
Read Only
Tens
Millions
Thousands
Hundreds
100 MB to 1 GB
100 GB to 10 TB
Applications specific
Various reporting products.
8 |
Data Mart
Definition
Set of processes and structures that support business requirements
that are specific to a business unit.
“Hello, I am a
data warehouse.”
9 |
“And I am a
data mart.”
BASIC ARCHITECTURES
10 |
Hub and Spoke
‘You can catch all the minnows in the ocean and stack them
together and they still do not make a whale.’
Bill Inmon
Normalized, relational data warehouse, typically in 3NF(atomic data)
Dependent data marts (mostly summarized data)
Source
Systems
11 |
Data
Acquisition
Data
Integration
Data
Warehouse
Data
Marts
End-user access
and applications
Data Warehouse Bus
“The data warehouse is nothing more than the union
of all the data marts”
Ralph Kimball
Dimensional data marts linked by conformed dimensions
One mart is created for a single business process
DW
Bus
Source
Systems
12 |
Data
Acquisition
Data
Integration
Data
Marts
End-user access
and applications
Independent Data Marts
Source
Systems
13 |
Data marts are loaded independently
Data marts are not consolidated via DW bus
Data
Acquisition
Data
Integration
Data
Marts
End-user access
and applications
How to Decide?
Inmon’s Approach
Enterprise view
Easy maintenance
High initial cost
Longer start-up time
Kimball’s Approach
Low initial cost
Shorter start-up time
Low reusability
Difficult maintenance
14 |
Independent DM
Shortest start-up time
Low cost
Incompatibility issues
Low reusability
No enterprise view
DIMENSIONAL MODELING
15 |
Dimensional Model
Dimensional Model
Denormalized. Organized for understandability and ease of reporting
rather than update
Dimensions store properties for a specific entity
Facts describe the business events regarding the entities
Time
Business questions
Product Line
Customer
Cost by product
Order quantity by product
Sales revenue by customer
Profit by region
Cost
Quantity
Revenue
Profit
Region
Product
Salesperson
16 |
Star Schema
Fact table in the center
surrounded by dimensional
tables (in the shape of star)
Denormalized dimensions
Simple queries
Effortless design
Optimized for fast
aggregations
Surrogate keys
are used
17 |
Snowflake Schema
Fact tables and dimensions are related to further dimensions
Normalized dimension tables
Complex levels of
relationship
Complex queries
Optimized for
less disk space usage
Surrogate keys
are used
18 |
Surrogate Keys
Definition
Unique identifier that is used as a substitute for a natural key
Preferably simple integer
Advantages
Eliminates the need for composite keys in dimension tables
Optimizes the JOIN process
Limits the impact caused by changes in natural key format
19 |
Dimensions
Definition
Dimensions store properties for a specific entity
No records are deleted. Instead, they are expired
Keep data at its lowest level of granularity
Keys
The primary key is a surrogate key
The business key is also retained
Attributes
Typically textual fields used for filtering
and query result set labeling
Examples
Product, Employee, Geography, Date
20 |
Slowly Changing Dimensions (SCD)
Definition
It is an optimization technique which leverages the knowledge
that data changes very infrequently and stores data over time in
a very efficient way
Types:
0, 1, 2, 3, 4 and hybrids
21 |
SCD Types (0-2)
Type 0: No effort has been made to deal with the issues
Type 1: Keeps the latest version of any record. Changes overwrite the
previous instance.
Type 2: Keeps track of historical data by creating multiple records with
different keys
22 |
Example: Change of postal code from M3T 8L0 to M3T 8L1
SCD Types (3-6)
Type 3: Changes are tracked by using separate columns. The number
of changes is limited to the number of columns designated to store
historical data.
Type 4: This method uses two tables; one for the current records and
the other for all or some of the changes
Type 6: Uses a combination of 1, 2 and 3
23 |
Example: Change of postal code from M3T 8L0 to M3T 8L1
Dimension Types
24 |
Regular dimension
Conformed dimensions
Time dimensions
Degenerate dimensions
Parent-child dimension
Snowflake dimension
Junk dimension
Role-playing dimensions
Mini dimensions
Inferred dimensions
Monster dimensions
Static dimension
Multi value dimension
Shrunk dimension
Conformed Dimensions
Definition
Dimensions that are shared
by two or more fact tables.
Two dimensions are conformed
when they are exactly the
same, or one is a perfect
subset of the other - Kimball
Advantages
Saving time and effort
Fact tables sharing
conformed dimensions
can be joined together.
Data consistency between
data marts is ensured
25 |
Conformed Dimensions
Fact1
Fact2
Subject Area 1
Subject Area 2
Time Dimensions
Definition
Provide consistent granularity
for temporal analysis
and reporting
Fiscal Year
Calendar Year
Fiscal Quarter
Calendar
Quarter
Fiscal Month
Calendar
Month
Temporal hierarchies
Year > Quarter > Month > Week
Business-specific attributes
Fiscal Periods
Public Holidays
26 |
Fiscal Week
Calendar
Week
Type of Day
Day of Week
Day
Holiday
Facts and Measures
Definition
Facts describe the business events regarding
the entities
Keys
FKs from all the dimensional tables in the star
The PK is usually a composite
Additive
key that contains dimension FK
Numeric measures
Non-additive
Additive
Non-additive
Semi-additive
Degenerate Dimensions
Factless Fact
27 |
Semi-additive
FactOrders
CustomerKey
SalesPersonKey
TimeKey
Quantity
Cost
Profit
DiscountRate
FactAccountTran
CustomerKey
AccountTypeKey
CreditDebitAmount
AccountBalance
Grain (Granularity)
The grain represents the level of detail of a fact table
The grain of the fact table should always represent the lowest
level for each corresponding dimension.
Create multiple fact tables if multiple grains are required
Example:
28 |
SQL SERVER DW
BEST PRACTICES
29 |
Indexing
Dimension table indexing
Business key
Clustered index
Surrogate key
Non-clustered index
Frequently searched columns
Non-clustered index
Large dimensions
Columnstore indexes
Fact table indexing
DISABLE before load, then REBUILD
30 |
Identity column
Time dimension key
Combination of all the dimension FKs
Clustered index
Frequently searched dimension keys
Non-clustered indexes
Fact tables
Columnstore indexes
Columnstore index
Columnstore index
Column-based in-memory data storage
High compression rates
Improves query performance
Must be partition-aligned (on partitioned tables)
Non-clustered Columnstore Index
Read-Only (DISABLE and REBUILD or partition swap is required)
Can be combined with other indexes on the table
Clustered Columnstore Index (introduced in SQL Server 2014)
31 |
Updatable
The only index on the table
Best for bulk-loaded tables
Cannot be created on tables with XML, varchar(max) columns, etc...
Partitioning
Definition
Splitting the data in a single table or index across multiple filegroups
Best practices
Avoid partitioning dimension tables
Partition large fact tables (on a date key)
Use partition-aligned indexed views
Filter on the partitioning key (WHERE Date = ….)
Benefits
32 |
Improved query performance (achieved by parallelism)
Faster data loading and deletion (~ by sliding window approach)
Increased backup and restore flexibility (~ by multiple filegroups)
Manage indexes at the partition level
Efficient Initial Data Load
Recovery Model
Simple
Bulk-logged
Populate the staging tables in parallel
Use multiple BULK INSERT, BCP or SSIS tasks.
Boost your INSERT INTO … SELECT performance
Use TABLOCK or TABLOCKX query hints
Avoid enforcing foreign key relationships
Create foreign key constraints with NOCHECK option
33 |
SSIS
PERFORMANCE BOOST
34 |
SQL Server Integration Services
SSIS
Feature of SQL Server
Used for ETL (Extract Transform Load)
Data Flow Engine
Defines movement and transformation of data
Control Flow Engine
35 |
Controls the execution of the Data Flow tasks
Executing SQL scripts
Error Handling
Looping
etc…
SSIS Performance Tuning Tips
Optimize queries
Select only the columns that you need
Choose better performing components when possible
LOOKUP components are faster than MERGE JOIN
Use a SQL Server Destination instead of an OLE DB Destination
Don’t use the SCD Transformation for large scale data
Fine tune the components if applicable
Set the IsSorted property (avoid unnecessary sorting)
Use caching in your LOOKUP components
Use FastLoad data access mode in OLE DB Destination
Use Parallel Processing
36 |
SSIS Parallel Processing
Control Flow Parallelism
‘MaxConcurrentExecutables’ package property (default -1)
Data Flow Parallelism
‘EngineThreads’ property (default 10)
37 |
Choosing between T-SQL and SSIS
Performance
T-SQL is processed within the SQL engine (faster)
SSIS tasks are processed in the SSIS memory space
T-SQL MERGE is faster than SCD task in SSIS
Multiple Heterogeneous Sources and Destinations
SSIS is designed to work with different type of sources (easier)
Features
Some features only exist in either T-SQL or SSIS
Skill Set
Which one are you more familiar with?
Ease of Maintenance
SSIS is graphical
38 |
A QUICK GLANCE AT
MICROSOFT BI TOOLS
39 |
Microsoft BI Reporting Tools
High Level Reporting. Dashboards.
Target audience: Executives
Performance Point
High Volume Static reporting
Target audience: End-users
(daily operations). Managers.
Reporting Services (SSRS)
Self-Service Business Intelligence. Ad-hoc reporting
Target audience: Business analysts and casual users
Pivot Table (Part of Excel)
Power Pivot (Excel add-in and SharePoint integration)
Power View (Excel add-in and SharePoint integration)
Power BI (Part of Office 365, Stand-Alone)
Happy Customers
THANK YOU
FOR
YOUR ATTENTION!
42 |
Reference list
Books
Implementing a Data Warehouse with Microsoft SQL Server 2012,
D. Sarka, M. Lah, G. Jerkič , © 2012 by SolidQuality Europe GmbH
The Data Warehouse Toolkit Third Ed., © 2013 by R. Kimball, M. Ross
Websites
http://msdn.com
http://www.sql-server-pro.com
http://www.kimballgroup.com/
Articles
https://ramprapanna.wordpress.com/tag/dwbi-generic/
http://www.dynamics101.com/2013/10/ssis-parallel-processing/
http://blogs.msdn.com/b/sqlcat/archive/2013/09/16/top-10-bestpractices-for-building-a-large-scale-relational-data-warehous.aspx
Sponsors
Gold Sponsors:
Bronze Sponsors:
Swag Sponsors: