Transcript Document
--A Gem of SQL Server 2012, particularly for Data Warehousing-- Present By Steven Wang Steven Wang • Senior DBA/Senior BI Specialist at BNZ • MCITP/MCTS: BI Developer, Database Developer and Database Administrator • Blog: www.msbicoe.com • Email: [email protected] Agenda • • • • • • • • • The Nitty-gritty of Columnstore Indexes (Demo) Columnstore Indexes Under the Hood (Demo) Columnstore Indexes Performance Tuning (Demo) The Limitations of the Columnstore Indexes How to load data for a colunstore indexed table The Columnstore Indexes Best Practises The Impact of Columnstore Indexes on BI Resources Q&A The Nitty-gritty of Columnstore Indexes • What is columnstore? The Nitty-gritty of Columnstore Indexes • What is columnstore? Row-Store --Serialize all of the attribute values in a row together, then the values in the next row, and so on. ProductKey …………... DateKey …………... ResellerKey …………... Quantity …………... Price …………... Amount …………... Columns-Store --Serializes all of the values of an attribute (column) together The Nitty-gritty of Columnstore Indexes • Why columnstore? --It’s All about I/O; --It’s all about compression efficiency; --In columnstore, compression algorithms are working better and a higher compression ratio can be achieved; In Memory --Columnstore compression improves CPU performance; --The maximum memory is limited by OS,1TB? 2TB?; --Get only columns which are queried; The Nitty-gritty of Columnstore Indexes • SQL Server 2012 Columnstore Indexes For the current release, SQL server 11 RTM,only nonclustered columnstore indexe is allowed A nonclustered index covers all possible columns and stores data in column-wise fashion. Data is highly compressed by using xVelocity Engine (aka, “Vertipaq”) A new “Batch mode” execution model is introduced A columstore index breaks each column into 1 million row chunks called segment. Delivers order-of-magnitude gains for DW queries Base table is not able to INSERT, UPDATE, DELETE while a nonclustered columnstore index exists The Nitty-gritty of Columnstore Indexes • How to create a columnstore index? CREATE NONCLUSTERED COLUMNSTORE INDEX [ci_FactResellerSales] ON [dbo].[FactResellerSales] ( [ProductKey], [OrderDateKey], [DueDateKey], [ShipDateKey], [ResellerKey], [UnitPrice], [ExtendedAmount], [UnitPriceDiscountPct], [DiscountAmount], [ProductStandardCost], [ShipDate] ); The Nitty-gritty of Columnstore Indexes • Demo Lots of fundamental of columnstore indexes Performance difference between B-Tree and CI And more… Columnstore Indexes Under the Hood • Data is highly compressed. Dramatically reduced IO. More Data can fit into memory. Row Store Column Store Column Store, Compressed Columnstore Indexes Under the Hood • Data is highly compressed. Dramatically reduced IO. More data can fit into memory. Row Store Column Store, Compressed Columnstore Indexes Under the Hood Run-length Encoding How data is compressed? RLE Columnstore Indexes Under the Hood Dictionary Encoding How data is compressed? Columnstore Indexes Under the Hood • A vector-based query execution method called “Batch Mode” processing is implemented. Dramatically reduced CPU consumption time. A Batch is an object that contains about 1000 rows Columnstore Indexes Under the Hood • Segment elimination can skip large chunks of data to speed up scans. A segment a 1 million of rows Each segment stores min and max value If no rows qualify, then the entire segment is skipped for scan Columnstore Indexes Under the Hood • Demo Columnstore Indexes Performance Tuning • Maximizing the use of “Batch Mode” processing The Limitations of the Columnstore Indexes • Base table is not able to insert, delete and update data • For current release, no clustered columnstore index • Columnstore Indexes are not designed for needle-in-the-haystack kind of queries. No Seek. Data Type not allowed: Other restrictions: No Sparse column No unique columnstore index No customer sort option for columns No replication No change tracking No change data capture No filestream column Binary and varbinary Varchar(max) and nvarchar(max) Decimal/Numeric precision > 18 Ntext, text, and image Uniqueidentifier Rowversion Sql-variant Datetimeoffset(>2) CLR data type XML How to load data for a colunstore indexed table • There is really no simple way that loads data into a columnstore indexed table • Columnstore is not born for data updating • 3 possible ways: Drop columnstore index, load the data, recreate. (or disable, load the data, rebuild) Partition the columnstore indexed table and using partition switching. Seems the best way. Using t 2 tables, one with historical data with columnstore index, another is just a normal table. Complicated, lots of maintenance The Columnstore Indexes Best Practises • Include all columns in the columnstore index whenever possible • Put columnstore indexes on large tables only • Consider to create a clustered index on columns which are frequently used, like date column • Structure your queries as star joins with grouping and aggregation as much as possible • Avoid joins and string filters directly on columns of columnstore indexed tables. • Whenever possible, avoid constructing queries with outer join, Union all, and not in directly on columnstore indexed • Using integer whenever possilbe • Using table partitions The Impact of Columnstore Indexes on BI • • • • • Data type consideration for Data warehouse designing The columns chosen for creating a clustered index Partition clustering ROLAP vs MOLAP DirectQuery for the Tabular BI semantic model Resources • Eric Hanson: SQL Server Columnstore Performance Tuning http://social.technet.microsoft.com/wiki/contents/articles/4995.sqlserver-columnstore-performance-tuning.aspx • BOL: Columnstore Indexes http://msdn.microsoft.com/en-us/library/gg492088.aspx • Joe Sack: Exploring Columnstore Index Metadata, Segment Distribution and Elimination Behaviors http://www.sqlskills.com/blogs/joe/post/Exploring-Columnstore-IndexMetadata-Segment-Distribution-and-Elimination-Behaviors.aspx • Benjamin Nevarez: Improve the Performance of Data Warehouse Queries with Columnstore Indexes http://www.sqlmag.com/article/sqlserverdenali/data-warehousequeries-columnstore-indexes-141712 • Stavros Harizopoulos, Daniel Abadi: Column-Oriented Database system Q&A