Advantages using DAX in Tabular data model

Transcript Advantages using DAX in Tabular data model

SQL Server
as a Data Warehousing Platform
Sponsors
Gold Sponsors:
Bronze Sponsors:
Swag Sponsors:
2 |
About me
 DW / BI Consultant @
 7 years professional experience
 ASP.NET, C# Web Development
 ETL and Business Intelligence
 Contacts
[email protected]
www.linkedin.com/in/stakvorian
3 |
Agenda






4 |
Overview
Basic Data Warehouse Architectures
Dimensional Modeling
SQL Server DW Best Practices
SSIS Performance Boost
A quick glance at Microsoft BI Tools
OVERVIEW
5 |
Why Do I Need a Data Warehouse?
 Where is my data?
 Physically stored across multiple platforms
(potentially heterogeneous)
 Over different subsets of the organization’s data
 When did it changed?
 Online applications keep a very limited set of historical data
 How can I access my data?
 Online systems do not allow power
users to access it
 How long would it take?
 Transactional applications are
modeled to optimize the speed of
inserts and updates.
6 |
What is a Data Warehouse?
 Definition: A centralized store of business data that can be used for
reporting and analysis to inform business decisions.





7 |
Subject-orientated
Integrated
Time variant
Non-volatile
In support of
management’s
decisions
Online Applications vs. the EDW
Aspect
Online Applications
EDW environment
User
Function
Unit of work
DB Design
Data
Usage
Access
# of Records
# of Users
DB size
Tools
Front-office Personnel
Back-office Personnel
Day to Day Operations
Decision Support
Short Transactions
Complex Queries
App-oriented, Normalized
Subject-oriented, De-normalized
Current, Detailed
Historical, Summarized
Real Time
Structured and Ad-hoc
Read and Write
Read Only
Tens
Millions
Thousands
Hundreds
100 MB to 1 GB
100 GB to 10 TB
Applications specific
Various reporting products.
8 |
Data Mart
 Definition
 Set of processes and structures that support business requirements
that are specific to a business unit.
“Hello, I am a
data warehouse.”
9 |
“And I am a
data mart.”
BASIC ARCHITECTURES
10 |
Hub and Spoke
‘You can catch all the minnows in the ocean and stack them
together and they still do not make a whale.’
Bill Inmon


Normalized, relational data warehouse, typically in 3NF(atomic data)
Dependent data marts (mostly summarized data)
Source
Systems
11 |
Data
Acquisition
Data
Integration
Data
Warehouse
Data
Marts
End-user access
and applications
Data Warehouse Bus
“The data warehouse is nothing more than the union
of all the data marts”
Ralph Kimball


Dimensional data marts linked by conformed dimensions
One mart is created for a single business process
DW
Bus
Source
Systems
12 |
Data
Acquisition
Data
Integration
Data
Marts
End-user access
and applications
Independent Data Marts


Source
Systems
13 |
Data marts are loaded independently
Data marts are not consolidated via DW bus
Data
Acquisition
Data
Integration
Data
Marts
End-user access
and applications
How to Decide?

Inmon’s Approach
Enterprise view
Easy maintenance
High initial cost
Longer start-up time

Kimball’s Approach
Low initial cost
Shorter start-up time
Low reusability
Difficult maintenance
14 |

Independent DM
Shortest start-up time
Low cost
Incompatibility issues
Low reusability
No enterprise view
DIMENSIONAL MODELING
15 |
Dimensional Model
 Dimensional Model
 Denormalized. Organized for understandability and ease of reporting
rather than update
 Dimensions store properties for a specific entity
 Facts describe the business events regarding the entities
Time

Business questions




Product Line
Customer
Cost by product
Order quantity by product
Sales revenue by customer
Profit by region
Cost
Quantity
Revenue
Profit
Region
Product
Salesperson
16 |
Star Schema
 Fact table in the center
surrounded by dimensional
tables (in the shape of star)
 Denormalized dimensions
 Simple queries
 Effortless design
 Optimized for fast
aggregations
 Surrogate keys
are used
17 |
Snowflake Schema
 Fact tables and dimensions are related to further dimensions
 Normalized dimension tables
 Complex levels of
relationship
 Complex queries
 Optimized for
less disk space usage
 Surrogate keys
are used
18 |
Surrogate Keys
 Definition
 Unique identifier that is used as a substitute for a natural key
 Preferably simple integer
 Advantages
 Eliminates the need for composite keys in dimension tables
 Optimizes the JOIN process
 Limits the impact caused by changes in natural key format
19 |
Dimensions
 Definition
 Dimensions store properties for a specific entity
 No records are deleted. Instead, they are expired
 Keep data at its lowest level of granularity
 Keys
 The primary key is a surrogate key
 The business key is also retained
 Attributes
 Typically textual fields used for filtering
and query result set labeling
 Examples
 Product, Employee, Geography, Date
20 |
Slowly Changing Dimensions (SCD)
 Definition
 It is an optimization technique which leverages the knowledge
that data changes very infrequently and stores data over time in
a very efficient way
 Types:
 0, 1, 2, 3, 4 and hybrids
21 |
SCD Types (0-2)
 Type 0: No effort has been made to deal with the issues
 Type 1: Keeps the latest version of any record. Changes overwrite the
previous instance.
 Type 2: Keeps track of historical data by creating multiple records with
different keys
22 |
Example: Change of postal code from M3T 8L0 to M3T 8L1
SCD Types (3-6)
 Type 3: Changes are tracked by using separate columns. The number
of changes is limited to the number of columns designated to store
historical data.
 Type 4: This method uses two tables; one for the current records and
the other for all or some of the changes
 Type 6: Uses a combination of 1, 2 and 3
23 |
Example: Change of postal code from M3T 8L0 to M3T 8L1
Dimension Types










24 |
Regular dimension
Conformed dimensions
Time dimensions
Degenerate dimensions
Parent-child dimension
Snowflake dimension
Junk dimension
Role-playing dimensions
Mini dimensions
Inferred dimensions
 Monster dimensions
 Static dimension
 Multi value dimension
 Shrunk dimension
Conformed Dimensions


Definition
 Dimensions that are shared
by two or more fact tables.
 Two dimensions are conformed
when they are exactly the
same, or one is a perfect
subset of the other - Kimball
Advantages
 Saving time and effort
 Fact tables sharing
conformed dimensions
can be joined together.
 Data consistency between
data marts is ensured
25 |
Conformed Dimensions
Fact1
Fact2
Subject Area 1
Subject Area 2
Time Dimensions
 Definition
 Provide consistent granularity
for temporal analysis
and reporting
Fiscal Year
Calendar Year
Fiscal Quarter
Calendar
Quarter
Fiscal Month
Calendar
Month
 Temporal hierarchies
 Year > Quarter > Month > Week
 Business-specific attributes
 Fiscal Periods
 Public Holidays
26 |
Fiscal Week
Calendar
Week
Type of Day
Day of Week
Day
Holiday
Facts and Measures
 Definition
 Facts describe the business events regarding
the entities
 Keys
 FKs from all the dimensional tables in the star
 The PK is usually a composite
Additive
key that contains dimension FK
 Numeric measures
Non-additive
 Additive
 Non-additive
 Semi-additive
 Degenerate Dimensions
 Factless Fact
27 |
Semi-additive
FactOrders
CustomerKey
SalesPersonKey
TimeKey
Quantity
Cost
Profit
DiscountRate
FactAccountTran
CustomerKey
AccountTypeKey
CreditDebitAmount
AccountBalance
Grain (Granularity)
 The grain represents the level of detail of a fact table
 The grain of the fact table should always represent the lowest
level for each corresponding dimension.
 Create multiple fact tables if multiple grains are required
 Example:
28 |
SQL SERVER DW
BEST PRACTICES
29 |
Indexing
 Dimension table indexing
Business key
Clustered index
Surrogate key
Non-clustered index
Frequently searched columns
Non-clustered index
Large dimensions
Columnstore indexes
 Fact table indexing
 DISABLE before load, then REBUILD
30 |
Identity column
Time dimension key
Combination of all the dimension FKs
Clustered index
Frequently searched dimension keys
Non-clustered indexes
Fact tables
Columnstore indexes
Columnstore index
 Columnstore index




Column-based in-memory data storage
High compression rates
Improves query performance
Must be partition-aligned (on partitioned tables)
 Non-clustered Columnstore Index
 Read-Only (DISABLE and REBUILD or partition swap is required)
 Can be combined with other indexes on the table
 Clustered Columnstore Index (introduced in SQL Server 2014)




31 |
Updatable
The only index on the table
Best for bulk-loaded tables
Cannot be created on tables with XML, varchar(max) columns, etc...
Partitioning
 Definition
 Splitting the data in a single table or index across multiple filegroups
 Best practices




Avoid partitioning dimension tables
Partition large fact tables (on a date key)
Use partition-aligned indexed views
Filter on the partitioning key (WHERE Date = ….)
 Benefits




32 |
Improved query performance (achieved by parallelism)
Faster data loading and deletion (~ by sliding window approach)
Increased backup and restore flexibility (~ by multiple filegroups)
Manage indexes at the partition level
Efficient Initial Data Load
 Recovery Model
 Simple
 Bulk-logged
 Populate the staging tables in parallel
 Use multiple BULK INSERT, BCP or SSIS tasks.
 Boost your INSERT INTO … SELECT performance
 Use TABLOCK or TABLOCKX query hints
 Avoid enforcing foreign key relationships
 Create foreign key constraints with NOCHECK option
33 |
SSIS
PERFORMANCE BOOST
34 |
SQL Server Integration Services
 SSIS
 Feature of SQL Server
 Used for ETL (Extract Transform Load)
 Data Flow Engine
 Defines movement and transformation of data
 Control Flow Engine





35 |
Controls the execution of the Data Flow tasks
Executing SQL scripts
Error Handling
Looping
etc…
SSIS Performance Tuning Tips
 Optimize queries
 Select only the columns that you need
 Choose better performing components when possible
 LOOKUP components are faster than MERGE JOIN
 Use a SQL Server Destination instead of an OLE DB Destination
 Don’t use the SCD Transformation for large scale data
 Fine tune the components if applicable
 Set the IsSorted property (avoid unnecessary sorting)
 Use caching in your LOOKUP components
 Use FastLoad data access mode in OLE DB Destination
 Use Parallel Processing
36 |
SSIS Parallel Processing
 Control Flow Parallelism
 ‘MaxConcurrentExecutables’ package property (default -1)
 Data Flow Parallelism
 ‘EngineThreads’ property (default 10)
37 |
Choosing between T-SQL and SSIS
 Performance
 T-SQL is processed within the SQL engine (faster)
 SSIS tasks are processed in the SSIS memory space
 T-SQL MERGE is faster than SCD task in SSIS
 Multiple Heterogeneous Sources and Destinations
 SSIS is designed to work with different type of sources (easier)
 Features
 Some features only exist in either T-SQL or SSIS
 Skill Set
 Which one are you more familiar with?
 Ease of Maintenance
 SSIS is graphical
38 |
A QUICK GLANCE AT
MICROSOFT BI TOOLS
39 |
Microsoft BI Reporting Tools
 High Level Reporting. Dashboards.
 Target audience: Executives
 Performance Point
 High Volume Static reporting
 Target audience: End-users
(daily operations). Managers.
 Reporting Services (SSRS)
 Self-Service Business Intelligence. Ad-hoc reporting





Target audience: Business analysts and casual users
Pivot Table (Part of Excel)
Power Pivot (Excel add-in and SharePoint integration)
Power View (Excel add-in and SharePoint integration)
Power BI (Part of Office 365, Stand-Alone)
Happy Customers
THANK YOU
FOR
YOUR ATTENTION!
42 |
Reference list
 Books
 Implementing a Data Warehouse with Microsoft SQL Server 2012,
D. Sarka, M. Lah, G. Jerkič , © 2012 by SolidQuality Europe GmbH
 The Data Warehouse Toolkit Third Ed., © 2013 by R. Kimball, M. Ross
 Websites
 http://msdn.com
 http://www.sql-server-pro.com
 http://www.kimballgroup.com/
 Articles
 https://ramprapanna.wordpress.com/tag/dwbi-generic/
 http://www.dynamics101.com/2013/10/ssis-parallel-processing/
 http://blogs.msdn.com/b/sqlcat/archive/2013/09/16/top-10-bestpractices-for-building-a-large-scale-relational-data-warehous.aspx
Sponsors
Gold Sponsors:
Bronze Sponsors:
Swag Sponsors:

Advantages using DAX in Tabular data model

Transcript Advantages using DAX in Tabular data model

Directory