SQLBits - Perf Session


Performance Investigations with Analysis Services 2012
Akshai Mirchandani
[email protected]
Principal Software Design Engineer
Session Overview
• Tabular versus Multidimensional: a very brief overview
• Performance investigation tools and techniques
• Formula engine performance
• Frequently asked questions/issues
• Common pitfalls for processing performance
• Tuning Analysis Services for high end hardware
• Q&A
Tabular versus Multidimensional
• Common
  • Configuring overall server memory
  • Heap Type
  • Tuning query thread pools
  • Data source performance
  • Tuning MDX queries
Tabular versus Multidimensional
• Multidimensional
  • Optimizing the disk system
  • Processing performance
  • Parallel processing of partitions
  • Performance of building aggregations
  • Performance of calculated members, scope assignments, unary operators, and other advanced calculations
  • Tuning I/O thread pools (SSAS 2012)
  • Multi-user settings (CoordinatorQueryBalancingFactor and CoordinatorQueryBoostPriorityLevel)
Tabular versus Multidimensional
• Tabular
  • Memory, memory, memory…
  • DAX query performance
  • No parallel processing of partitions (yet)
  • Query performance versus processing performance?
    • Adjusting compression levels lets you favor processing time versus query time, and vice versa
  • Hardware: CPU and memory performance
    • Much more noticeable for an in-memory system – just as disk performance is more of an issue for MOLAP
    • Often the smaller systems show significantly better performance: no NUMA issues, and the newest, fastest CPUs take longer to become available in server-grade hardware
Performance Investigation Tools
• Profiler
• XEvents
  • Lighter weight than Profiler
  • You can merge the events with a Windows ETW trace and look at CPU and I/O together with SSAS events
  • Currently the set of XEvents is the same as the Profiler events – but in the future newer XEvents should show up
• MDX
  • Use MDX queries and MDX calculation modifications to identify/isolate performance problems
• DMVs
  • System DMVs like DISCOVER_PARTITION_STAT, DISCOVER_MEMORYUSAGE, and DISCOVER_OBJECT_ACTIVITY
• ETW (Event Tracing for Windows)
  • A Windows capability to efficiently gather fine-grained information about CPU, I/O, wait states, etc.
  • This can show which partition/aggregation files are “hot”, and perhaps you can tune them to be on fast disks
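
The DMVs above can be queried directly from an MDX window in SSMS. A minimal sketch using one of the listed rowsets (the others follow the same pattern, though some, such as DISCOVER_PARTITION_STAT, additionally require restrictions to be supplied):

    SELECT * FROM $SYSTEM.DISCOVER_OBJECT_ACTIVITY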
Demo
• XEvents
• Profiler
• SSAS 2012 Calculation Events
• Sample tool
Identifying Query Problems
• Identify the queries that take a long time to run
• Test each query in isolation
• Determine if the issue is primarily in the Formula Engine (FE) or the Storage Engine (SE)
• Test the parts of the query
  • Be sure to test on both a cold and a warm cache
  • Be sure to test after clearing the cache (see the XMLA sketch below)
• After identifying individual query performance, make sure you also run concurrent-query tests
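
To get a true cold-cache measurement, clear the cache explicitly between runs. A minimal XMLA sketch (the DatabaseID is a placeholder; the Object can also be scoped down to a single cube):

    <Batch xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
      <ClearCache>
        <Object>
          <DatabaseID>Adventure Works DW</DatabaseID>
        </Object>
      </ClearCache>
    </Batch>

Note that this clears the Analysis Services caches but not the Windows file system cache, so disk-level warmth can still differ between runs.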
Identifying the Long Duration Queries
• Run a trace that saves to a database table
• SELECT the TOP X rows
• Filter by the Query End event (EventClass ID = 10)
• Order by Duration
• Include the TextData field
• Include the CurrentTime field
• To identify shorter-running queries that run a lot and burn CPU (see the sketch below):
  • GROUP BY the TextData field
  • SUM the Duration
  • Order by the summed Duration, descending
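
A minimal T-SQL sketch against a trace saved to a table (dbo.SSASTrace is a placeholder name; the columns follow the standard trace-table schema):

    -- Longest-running queries
    SELECT TOP 50 TextData, Duration, CurrentTime
    FROM dbo.SSASTrace
    WHERE EventClass = 10              -- Query End
    ORDER BY Duration DESC;

    -- Short queries that run often and burn CPU in aggregate
    -- (TextData is ntext in trace tables, so convert before grouping)
    SELECT CONVERT(nvarchar(max), TextData) AS QueryText,
           COUNT(*)      AS Executions,
           SUM(Duration) AS TotalDuration
    FROM dbo.SSASTrace
    WHERE EventClass = 10
    GROUP BY CONVERT(nvarchar(max), TextData)
    ORDER BY TotalDuration DESC;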
Determining Storage Engine or Formula Engine
• Run a trace while testing the query
• Sum the Duration field of the Query Subcube events (EventClass = 11)
  • This is the amount of time the query spends in the Storage Engine
• Subtract that sum from the Duration of the Query End event (EventClass = 10)
  • This is the amount of time the query spends in the Formula Engine
  • Also look at the count of subcube events; many events suggest an FE issue
• Do this both with a cold cache and a warm cache
• Focus on Storage Engine or Formula Engine issues depending on the split:
  • If less than 30% of the query duration is SE, focus on the FE
  • If 30% or more of the query duration is SE, first rule out the SE before focusing on the FE
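
A sketch of the arithmetic against the same trace table, assuming the trace covers a single test query (dbo.SSASTrace is the placeholder from before):

    DECLARE @se bigint =
        (SELECT SUM(Duration) FROM dbo.SSASTrace WHERE EventClass = 11);  -- Query Subcube
    DECLARE @total bigint =
        (SELECT MAX(Duration) FROM dbo.SSASTrace WHERE EventClass = 10);  -- Query End

    SELECT @se                               AS StorageEngineTime,
           @total - @se                      AS FormulaEngineTime,
           100.0 * @se / NULLIF(@total, 0)   AS PercentSE;  -- >= 30% SE: rule out SE first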
Testing the Parts of the Query (Continued)
• Comment out all script commands in the MDX script (see the sketch below)
  • If it's slow with the full MDX script but fast with an empty MDX script, use MDX Script Performance Analyzer to identify the problem MDX script assignment: http://mdxscriptperf.codeplex.com/
• It may be necessary to start from a stripped-down cube, with all of the following removed:
  • The MDX script, except for the Calculate command
  • All unary operators
  • All measure expressions
  • All custom member formulas
  • All semi-additive measures
  • All custom rollup properties
• If on SQL Server 2008 R2, run Profiler, because new trace events track resource usage and locks
  • http://support.microsoft.com/kb/2458438
• If on SQL Server 2012, take a look at the new advanced calculation trace events
  • Note that they are expensive, so avoid turning them on in production!
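
A sketch of the stripped-down MDX script for isolation testing: keep only the Calculate command and comment out every assignment, then re-enable them one at a time (the SCOPE shown is a hypothetical example, not from the session):

    CALCULATE;   // leave only this so measures still aggregate

    /* Re-enable assignments one by one to find the slow one:
    SCOPE ([Measures].[Sales Amount], [Date].[Fiscal Year].&[2011]);
        THIS = [Measures].[Sales Amount] * 1.1;
    END SCOPE;
    */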
Checking on Partitions and Aggregations
• If the query is spending the majority of its time in the Storage Engine, examine the partitions and aggregations
• Look in the trace for the partitions that the query is hitting, to ensure that no unnecessary partitions are being queried
  • For example, if you are querying only 2011 data, the query should not also be hitting the 2009 and 2010 partitions
  • In that case, is there something wrong with the MDX? Or is there something wrong with the slice or the data within the partition?
  • Consider using the Query Subcube Verbose event to understand more details
• Look in the trace to see if any aggregations are being used by the query
• If no aggregations are being used, consider adding appropriate aggregations for the query
Common Formula Engine Slowdowns
• Block mode – won’t go into this
  • Lots of discussion in the performance guide
  • Stop by the SQL Clinic if you have questions
• Rewriting crossjoins to put hierarchies from the same dimension together
• Non Empty
• Removing arbitrary sets
• Not overwriting an empty cell with a non-null value
Putting Hierarchies From the Same Dimension Together
• Sometimes you see:
  − [Product Category] * [Year] * [Product Subcategory] * [Product]
• Why not do this instead? (full queries sketched after this list)
  − [Product Category] * [Product Subcategory] * [Product] * [Year]
• Advantages?
  • Auto-exists can optimize the latter pattern by querying each dimension independently
  • If the server has to do “cross-dimensional auto-exists”, there is a huge performance hit
    • A different algorithm, which can be much more expensive
    • Particularly if there is an arbitrary shape in the subselect
  • Performance optimizations in R2 and 2012 improve this, but reordering still helps
• Usually it’s a matter of educating Excel users to “first drag all the product attributes, and then drag Years”
• Or better, try to hide the attribute hierarchies and only give them a user hierarchy
• On occasion it’s unavoidable
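
A sketch of the two orderings as full queries, using hypothetical Adventure Works-style names:

    -- Slower: the Product hierarchies are separated by Year, which can
    -- force cross-dimensional auto-exists
    SELECT [Measures].[Sales Amount] ON COLUMNS,
           [Product].[Category].[Category].Members
         * [Date].[Calendar Year].[Calendar Year].Members
         * [Product].[Subcategory].[Subcategory].Members
         * [Product].[Product].[Product].Members ON ROWS
    FROM [Adventure Works]

    -- Faster: hierarchies from the Product dimension are adjacent, so
    -- auto-exists can run within the dimension first
    SELECT [Measures].[Sales Amount] ON COLUMNS,
           [Product].[Category].[Category].Members
         * [Product].[Subcategory].[Subcategory].Members
         * [Product].[Product].[Product].Members
         * [Date].[Calendar Year].[Calendar Year].Members ON ROWS
    FROM [Adventure Works]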
Non Empty
• Related to auto-exists
• Two algorithms:
  • Naïve and Express
  • Trace events show which algorithm is used
• Issues:
  • Non Empty can be expensive
  • Not using Non Empty is not feasible
• Watch out for this query pattern (sketched below):
  • Non Empty
  • Arbitrary shapes in the subselect
  • Calculations
  • The combination causes Non Empty to first apply auto-exists, and then apply Non Empty
    • Applying auto-exists can be *very* expensive
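
A sketch of that pattern with hypothetical cube and member names: NON EMPTY over a calculated measure, with an arbitrary-shaped set in the subselect:

    SELECT [Measures].[Profit Margin] ON COLUMNS,   -- a calculated measure
           NON EMPTY [Customer].[Customer].[Customer].Members ON ROWS
    FROM (SELECT {([Product].[Category].&[Food],  [Geography].[Country].&[USA]),
                  ([Product].[Category].&[Drink], [Geography].[Country].&[Canada])} ON COLUMNS
          FROM [Sales])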
Arbitrary Shapes
• An arbitrary shape is any set that cannot be expressed as a crossjoin of members from the same level of an attribute hierarchy
• Arbitrary shapes can only use the query cache if they are used in a subselect, in the WHERE clause, or in a calculated member
• Arbitrary set examples:
  • {(Food, USA), (Drink, Canada)} – not a full crossjoin of the two sets of members
  • {customer.geography.USA, customer.geography.[British Columbia]} – members from different levels of the same hierarchy
• Note that an arbitrary shape on the query axis does not limit the use of any cache
Processing Pitfalls (Dimensions)
• SELECT DISTINCT … (sketched below)
  • Do you have appropriate indexes in the relational database?
• Are you using the fastest provider?
  • Wherever possible, use a native provider (except…)
• Have you optimized data access?
  • E.g., Network Packet Size?
• Attribute relationships?
  • The “key” attribute is the most expensive part of dimension processing
  • Can you optimize it?
• Consider/test processing with ByTable (versus ByAttribute)
  • Note that I don’t actually recommend it, but it can make a positive difference in some scenarios
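
For context on the first bullet: processing each attribute sends a SELECT DISTINCT to the relational source, so an index covering the attribute's key and related columns can help. A sketch with hypothetical table and column names:

    -- The shape of query dimension processing issues per attribute
    SELECT DISTINCT ProductKey, ProductName, SubcategoryKey
    FROM dbo.DimProduct;

    -- A covering index gives the engine a narrower scan than the base table
    CREATE NONCLUSTERED INDEX IX_DimProduct_Processing
        ON dbo.DimProduct (ProductKey)
        INCLUDE (ProductName, SubcategoryKey);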
Processing Pitfalls (Dimensions)
• Why is ProcessUpdate so slow?!
  • It has to fetch *all* the data in the dimension
    • Check which members already exist and update them (tracking whether they changed)
    • If a new member shows up, add it
  • It has to invalidate indexes/aggregations in partitions that reference the dimension
  • It has to rebuild those indexes/aggregations
• Optimizations?
  • Rigid attribute relationships help, but hurt flexibility
  • Discussion…
Processing Pitfalls (Partitions)
• Processing in parallel (see the XMLA sketch below)
  − MaxActiveConnections
  − Enough threads?
• Speed of reading data
  − How fast is your data source?
  − Speed of looking up dimension keys – too many dimensions?
• Building indexes and aggregations
  − Quota limitations
  − DataStorePageSize?
  − NUMA?
  − Reduce aggregations? Number and size…
  − Reduce indexes?
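
A sketch of parallel partition processing via an XMLA batch (the object IDs are placeholders; MaxParallel caps how many jobs run concurrently):

    <Batch xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
      <Parallel MaxParallel="4">
        <Process>
          <Object>
            <DatabaseID>Sales DW</DatabaseID>
            <CubeID>Sales</CubeID>
            <MeasureGroupID>Internet Sales</MeasureGroupID>
            <PartitionID>Internet_Sales_2011</PartitionID>
          </Object>
          <Type>ProcessData</Type>
        </Process>
        <!-- additional <Process> elements, one per partition -->
      </Parallel>
    </Batch>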
Tuning AS for High End Hardware (1)
• What is your scenario?
  • Throughput?
  • Processing window?
  • Query response time?
  • All of the above? Sigh…
• Lots of effort went into this area for SSAS 2012
  • Fruits of it also show up in the 2008 R2 Operations/Performance guides
• Windows Heap instead of AS Heap (see the configuration sketch below)
  • For machines with lots of cores and active threads, the Windows heap gives better performance
  • The AS heap is faster for lower-concurrency scenarios
  • The Windows heap is the new default in SSAS 2012
• DataStorePageSize/DataStoreHashPageSize
  • Pages allocated from the 8 KB page pool show contention
  • For large results (e.g., processing aggregations/indexes), this can hurt significantly
  • Using 64 KB avoids this contention
  • In SSAS 2012, this size is auto-selected – it picks 64 KB for storage engine caches
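
For reference, the heap choice is exposed in msmdsrv.ini. A minimal sketch (the element exists in the 2012 configuration schema; the value shown is illustrative, and on SSAS 2012 the default already selects the Windows heap):

    <Memory>
      <!-- 1 = AS heap, 2 = Windows heap (assumed mapping; verify against
           your version's documentation before changing it) -->
      <MemoryHeapType>2</MemoryHeapType>
    </Memory>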
Tuning AS for High End Hardware (2)
• NUMA (Non-Uniform Memory Access)
  • High-end hardware with lots of cores is usually NUMA
  • Causes slowdowns in performance
    • “I upgraded from 16 cores to 32 cores and my performance is about the same (or worse)”
• Some work was already done in 2005
  • An attempt was made to return pages from “nearby memory”
  • Some more performance work has been done to tune this in SSAS 2012
• The Windows file system cache is affected by the same problem
  • If a partition file is first scanned on NUMA node 0, the cache pages are allocated from NUMA node 0 memory
  • Later accesses on any other NUMA node pay a greater price to access this “distant memory”
  • SSAS 2012 added a new I/O Process thread pool
    • The I/O Process thread pool handles all segment scans
    • In fact, under the covers multiple I/O Process thread pools are created – one per NUMA node (**)
    • A partition chooses one of those thread pools and tries to stick with it – thus attempting to avoid the problem
• ** Caveats exist – the default is to split the thread pools only if there are >= 4 NUMA nodes
Tuning AS for High End Hardware (3)
• Other bottlenecks (both settings sketched below)
• You can turn off resource monitoring with a configuration setting
  • The results of some DMVs will become invalid (no error is raised)
• You can configure Analysis Services to open files in RANDOM mode
  • This changes the way Windows handles the pages in the file system cache
  • It can generate some good performance gains on high-end hardware
  • Risk: only do it if the machine has enough memory – the setting causes pages to stay in memory longer, and the system can start choking
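
Both settings live in msmdsrv.ini. A sketch (these element names exist in the SSAS configuration schema, but treat the values as illustrative and test before using them in production):

    <!-- Disables per-object resource accounting; some DMV results go stale -->
    <ResourceMonitoringEnabled>0</ResourceMonitoringEnabled>

    <!-- Opens data files with random-access hints to the file system cache -->
    <RandomFileAccessMode>1</RandomFileAccessMode>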
Additional Resources
• Analysis Services Maestros training
• Analysis Services Performance Guide
• Analysis Services Operations Guide
• SQL CAT team
• http://blogs.msdn.com/b/psssql/archive/2012/01/31/analysis-services-thread-pool-changes-in-sql-server-2012.aspx