Transcript Document

Doing fast!
Optimizing Query performance
with ColumnStore Indexes in
SQL Server 2012
Margarita Naumova | SQL Master Academy
Agenda
•
•
•
•
What is a columnstore index
How it works
Limits and restrictions
When and how to use it
About me
• Working with SQL Server from v6.5
• MCT from 1998
• SQL Server Trainer and Consultant with over 50 projects
in Bulgaria, Finland, Germany, UK, Greece..
• 5 years Senior Consultant in Microsoft, member of
Microsoft Worldwide Technical Leadership Team
• Microsoft Certified Master SQL Server 2008
• BG SQL UG Leader
• MVP SQL Server
• Now teaching and presenting SQL Server,
running SQL Master Academy Training Program
www.SQLMasterAcademy.com
What is a Columnstore Index
A new column store
in SQL Server Database
Engine
Advanced Query
processing techniques
Accelerates specific
workloads (DW)
Performance Example of CSI
1TB DW Database, 1.44 billion rows in fact table, 32 logical procs
with 256GB RAM
DEMO
CSI Performance Example
A columnstore Index Structure
• Stores data column-wise
• Better compression
– Uses VertiPaq compression
technology
• Less IO
Only needed cols are
fetched
– Better Buffer hit rates
• Pure Column Store
• Fetches only needed
columns from disk
Segment (values for 1
col for a set of about
1mnl of rows)
Advanced Query processing techniques
• Processing mode
– Batch mode
– Row mode
• Best performance - when processes column data
in batches using highly efficient algorithms
• Optimizer makes a cost based decision
• Segment elimination is an additional optimization
• Optimizer allows you to use hints
DEMO
CSI Internals
DO’s for using CSI Effectively
• Put CSI on large tables only
– Large fact tables and very large dimension table
• Include every column of the table in the columnstore
index
• Use in read mostly workload, where most updates are
appending new data
• Workflow permits using partitioning (or drop rebuild
index) to handle new data
• Structure your queries as star joins with grouping and
aggregation as much as possible or entail scanning and
aggregating large amounts of data
ColumnStore Index Optimizations Rules
• DOP>2 and enough memory
• On ColumnStore indexed tables avoid using:
–
–
–
–
joins and filters on string columns
OUTER JOIN
NOT IN (<subquery>)
UNION ALL to directly combine columnstore-indexed
tables with other tables
Loading data
• Table with columnstore index can be read, not
updated
– INSERT, UPDATE, DELETE and MERGE is not
allowed
– Partition switching is allowed
• There are three possible methods for loading
data
– Disabling index
– Partition switching
– UNION ALL (tickle update)
Loading Method 1
• Disable (or drop) the index
– ALTER INDEX ….DISABLE
• Update the table
• Rebuild the columnstore index
• ALTER INDEX … REBUILD
Loading Method 2
• Load new data into a staging table
• Build a columnstore index on the staging
table
– CREATE NONCLUSTERED COLUMNSTORE
INDEX myindex ON Staging(……)
Switch
5/11 Staging table
• Switch the partition into the main table
– ALTER TABLE Staging SWITCH TO
PARTITION 5
Load
DEMO
Loading and
updating data
in CSI
Loading Method 3
•
•
•
•
Build CSI index on primary table
Create staging/delta table with no CSI
Insert new data into (row-based) delta table
Query both tables using UNION ALL to combine the
results
– Perform local-global aggregation using CTE
• aggregate the data from the columnstore and from the delta table in
separate queries
• combine the results with a second level of grouping and aggregation
– Do not make a view with UNION ALL
• Keep delta table small
Other Usage and limitations
• Must have a base table structure – heap or Clustered
Index
• CSI is not appropriate for lookups, seeks and in OLTP as
a general workload type
• Supported data types should be <8 Bytes
• The following data types cannot be included in a
columnstore index:
–
–
–
–
–
–
binary and varbinary, ntext, text, and image
varchar(max) and nvarchar(max)
Uniqueidentifier, rowversion (and timestamp)
decimal (and numeric) with precision greater than 18 digits
datetimeoffset with scale greater than 2
CLR types (hierarchyid and spatial types), xml
Good news
• Most things just work with columnstore indexes
–
–
–
–
–
–
Backup and restore
Mirroring
Log Shipping
SSMS
Administration tools
TDE
CSI Summary
• Benefits
– Interactive experience with data
• Near instant response times
• Ad hoc queries
• Large data sets
– Easy to set up
• No need for summary tables, indexed views
• Fewer indexes to design, create and maintain
• Reduces need to manually tune queries
• Consider limitation in V1 and overcome with
– Good coding practices to achieve batch processing
– Respective method of loading data
Thank you!
www.SQLMasterAcademy.com
[email protected]
Margarita Naumova| SQL Master Academy