Performance Tuning Cubes and Queries in Analysis Services 2008 Chris Webb [email protected] www.crossjoin.co.uk Who am I? • Chris Webb ([email protected]) • Independent consultant specialising in Analysis Services.

Download Report

Transcript Performance Tuning Cubes and Queries in Analysis Services 2008 Chris Webb [email protected] www.crossjoin.co.uk Who am I? • Chris Webb ([email protected]) • Independent consultant specialising in Analysis Services.

Performance Tuning Cubes and
Queries in Analysis Services 2008
Chris Webb
[email protected]
www.crossjoin.co.uk
Who am I?
• Chris Webb ([email protected])
• Independent consultant specialising in
Analysis Services and MDX:
http://www.crossjoin.co.uk
• Blogger: http://cwebbbi.spaces.live.com
• Author:
– MDX Solutions 2nd Edition
– Expert Cube Development With Analysis Services
2008
Agenda
•
•
•
•
•
•
Designing for Performance
How Analysis Services Answers Queries
Partitions
Aggregations
Formula Engine bottlenecks
Caching
Designing for Performance
• Some features give increased flexibility at a cost
to performance
– Eg parent/child hierarchies, referenced relationships,
unnatural hierarchies, many-to-many, distinct count,
semi-additive measures...
• Mistakes made in modelling the data warehouse
are often ‘papered over’ using SSAS features
– Eg calculations being done in MDX when they should
have been done in the ETL
• Design decisions should always be made with
query performance in mind
Attribute Relationships
• Attribute relationships describe the one-tomany relationships between attributes in your
dimensions
• Will probably not be optimal after you’ve run
the Dimension wizard – only knows about
foreign key relationships
• Are very important for query performance
Parent/Child Hierarchies
• Parent/child hierarchies are very flexible, but can
perform badly sometimes
– Especially where the hierarchy contains thousands of
members
• If you know the maximum depth of the hierarchy,
use a ragged hierarchy instead
– Although ragged hierarchies can also perform badly
• The ‘parent/child hierarchy naturaliser’ in BIDS
Helper can do this automatically
SSAS Query Processing
MDX Query In
Cellset Out
Cache
Formula Engine
works out what data is
needed for each query,
and requests it from the
Storage Engine
Query Subcube
Requests
Cache
Disk
Storage Engine
handles retrieval of
raw data from disk,
and any aggregation
required
Tuning the Storage Engine
• Tuning the Storage Engine is all about:
– Reducing the time taken to read data from disk
– Reducing the time taken to aggregate that data to
the required granularity
• As a result, disk and CPU have an important
role to play in Storage Engine performance
• However, throwing hardware at the problem is
rarely the whole answer
Partitions
• Measure groups can be divided up into multiple
partitions
– Enterprise Edition only
• You should partition your measure groups to reflect the
slices that users are likely to use in their queries
– Usually by the Time dimension
• Partitioning improves query performance in two ways:
– It reduces overall IO because AS should only scan the
partitions containing the data requested
– It increases parallelism because AS can scan more than
one partition simultaneously
Partitions
• You should be partitioning so that a single
partition contains between 2 million rows and
30 million rows of data
• Although SSAS should auto-detect the slice of
a partition, it is good practice to set the Slice
property manually, even on MOLAP partitions
• In Profiler, Progress Report Begin/End events
show which partitions are being read
Aggregations
• Aggregations are pre-calculated sets of summary
values
– Similar to what is returned by a GROUP BY query in SQL
• Building aggregations improves query performance by
reducing the amount of calculation done at query time
• Aggregations start to show an effect on regular Sum
measures where partition size is greater than a few
million rows
• A Measure Group can have multiple Aggregation
Designs
• A Partition can be associated with one Aggregation
Design – aggregations are built on a per-partition basis
Aggregation Design Methodology
1. Make your dimension design as ‘clean’ as
possible – delete or disable unnecessary
attributes, set attribute relationships
2. Set the AggregationUsage property
appropriately on each Cube dimension
3. Run the Aggregation Design Wizard
1. First using the ‘I Click Stop’ option
2. Then, if this creates too many aggregations, using
‘Performance Gain’ of 30%
Aggregation Design Methodology
4. Set up Query Logging and run Usage-Based
Optimisation
5. If necessary, design aggregations manually
using BIDS Helper or the Aggregation Design
tab
6. Check aggregations are being used by looking
for the Get Data From Aggregation event in
Profiler
The Formula Engine
• If Storage Engine requests make up a negligible
percentage of the time taken by the query, then the
Formula Engine is the problem
– Profiler and MDX Studio can be used to find out if this is
the case
– MDX Script Performance Analyser can help to work out
which, if any, calculation on the cube is the problem
• Another giveaway sign is that when a query runs on a
multi-processor server, only one CPU shows activity
– The Formula Engine is single-threaded
• Rewriting MDX calculations can result in massive gains
in performance
Writing Efficient MDX
• As with all code, using efficient algorithms is very important
– It’s important to avoid doing expensive operations more than is
necessary
– Named sets can store the result of expensive set operations
– Calculated members can cache the result of expensive numeric
calculations
• In the implementation of these algorithms, it’s important to
know which MDX functions perform best
– MDX Studio can tell you which functions to avoid
– Will vary by version and SP
• Calculations that execute in ‘bulk mode’ can perform many
times better than those that execute in ‘cell-by-cell’ mode
Caching
• Caching can take place in the Storage Engine
and the Formula Engine
• Caching is extremely important to query
performance
• SE caching is very reliable, and caches can be
shared between all users
• FE caching is much more temperamental...
Formula Engine Caching
• There are three different ‘scopes’ or lifetimes of a FE
cache:
– Query – for calculations defined in the WITH clause of a
query, the FE values can only be cached for the lifetime of
the query
– Session – for calculations defined for a session, using the
CREATE MEMBER statement executed on the client, FE
values can only be cached for the lifetime of a session
– Global – for calculations defined in the cube’s MDX Script,
FE values can be cached usually until processing takes
place
• Global scope is best from a performance point of view!
Restrictions on FE Caching
• Any calculations defined in the WITH clause of
a query will force query scope for the FE cache
• As will the use of cell security
• As will the use of non-deterministic functions
like NOW()
• With dimension security, calculation caches
can only be shared between users who have
the same permissions
Exploiting FE Caching
• Caching numeric values, so that expensive
calculations are only performed once, can be
very important for calculation performance
• Use of certain MDX functions and constructs
can force Query scope for FE caching, and hurt
performance
Cache Warming
• Data can be loaded into the SE cache by
executing CREATE CACHE statements
• The FE cache can only be warmed by running
MDX queries
• Cache warming involves automatically loading
data into the cache to improve query
performance for real users
– SSIS packages can be used to do this
Thanks!
Links
• http://www.packtpub.com/article/queryperformance-tuning-microsoft-analysisservices-part1
http://www.packtpub.com/article/queryperformance-tuning-microsoft-analysisservices-part2