Transcript BDBI

BDBI

Radio

B

aker’s

D

ozen

B

usiness

I

ntelligence Webcast Radio

13 Weekly tips on Microsoft SQL Server/BI/SharePoint Technologies http://www.BDBIRadio.com

Week of 02/24/2013: 13 topics for the SQL Server 2012 Columnstore Index

BDBI Radio

• Hosted by Kevin S. Goff, Microsoft SQL Server MVP and author of CoDe Magazine “Baker’s Dozen” Productivity Series • Weekly webcast - Sundays, 11 AM to 12:30 PM EST • Site: http://www.BDBIRadio.com

• Overall blog/site: http://www.KevinSGoff.net

• Program format: – First 60 minutes - 13 quick tips on different SQL/BI/SharePoint topics • In any week, tips might cover one topic (13 features in PowerPivot or 13 SQL Interview topics), or 13 random tips on anything from data warehousing to MDX/DAX programming, to maybe even a few .NET tips for scenarios when the .NET and BI worlds collide – Second 30 minutes , open Q/A (email me at [email protected]

) – Webcast recorded, recordings will be available on website • Once in a while, a special guest interview 5/1/2020 13 topics for Columnstore Index 2

BDBI Radio

Upcoming community events (confirmed): – BI User Group in NYC on March 11 • http://msbigdatanyc.com/ • Topic: Differences between SSAS OLAP and SSAS Tabular – SQL User Group in NYC on April 25 • http://nycsqlusergroup.com/ • Topic TBD – Northern Virginia SQL User Group on May 20 • http://www.novasql.com

• Topic TBD • Other News: – SQL Server 2012 Service Pack 1 available • http://blogs.msdn.com/b/analysisservices/archive/2012/11/07/announcing microsoft-sql-server-2012-service-pack-1-sp1.aspx

5/1/2020 13 topics for Columnstore Index 3

BDBI Radio

• Today: 13 pieces of information for the Columnstore index • New index in SQL Server 2012 • More than just an index • A real game-changer, one of the biggest features in the SQL database engine of all time • Some companies upgraded to SQL 2012 just because of this feature • Represents another example where MS is devoting serious attention to the underlying database engine – Earlier versions of SQL Server (2005) focused largely on language and developer enhancements – SQL 2008 and 2012 have seen underlying database management/engine changes (Change Data Capture and Columnstore Index) 5/1/2020 13 topics for Columnstore Index 4

BDBI Radio

• 13 pieces of information for the SQL 2012 Columnstore index 1.

Quick demonstration 2.

3.

4.

5.

6.

Introduction to the Columnstore Index Characteristics of the Columnstore Index Who benefits from this?

Columnstore indexes vs Rowstore Index Execution plan using the Columnstore index 7.

8.

Batch Mode Processing – new processing mode for the Columnstore index Where the Columnstore index can’t directly be used 9.

Selective vs non-Selective queries – where Columnstore index isn’t used 10. General Usage rules 11. Restriction rules on the Columnstore index 12. Overall Performance Benchmarks 13. New database engine features in SQL v.Next

5/1/2020 13 topics for Columnstore Index 5

BDBI Radio:

1 – Quick Code Example 5/1/2020 13 topics for Columnstore Index 6

BDBI Radio

2 - Introduction to the Columnstore Index • New relational, xVelocity memory-optimized database index in SQL Server 2012, “baked in” to the database engine • More and more functionality in DB engine (xVelocity, CDC) • Potentially Significant performance enhancements for data warehousing and data mart scenarios – a real game changer – (not really for OLTP databases, we’ll see why later) • Best for queries that scan/aggregate large sets of data • My opinion? One of the coolest things ever in SQL Server • In a regular index, indexed data from each row kept together on single page-data in each column spread across all pages of index • In a columnstore index, data from each column is kept together so each data page contains data only from a single column (compressed, more fits in memory, more efficient IO) 5/1/2020 13 topics for Columnstore Index 7

BDBI Radio

3 - Characteristics of the Columnstore Index • Highly compressed - Exploits similarity of data within column • IO Statistics - dramatically reduces # of logical reads!!!

• Not stored in standard buffer pools, but rather in a new optimized cache • Smart IO and caching using read-ahead reads • Part of Microsoft’s xVelocity technology – compression is factor of 8 (and twice as efficient as page compression) • Once posted, only READONLY • Best for data warehouse/mart queries that scan/aggregate large amounts of data–might lower need for OLAP aggregation • Some queries might run at least 10x faster (or more) 5/1/2020 13 topics for Columnstore Index 8

BDBI Radio

4 - Who benefits from this? • Queries and reports against Data Warehouses/Data Marts (works best with Fact/Dimension tables modeled in a star schema) • Load from Data Warehouses/Marts into OLAP Cubes • SSAS OLAP Databases that use the ROLAP methodology or pass through mode • New Analysis Services Tabular Model uses xVelocity engine • Some companies took the release candidate and put into production, simply for this feature (some case studies show queries that went from 17 minutes to 3 seconds!) 5/1/2020 13 topics for Columnstore Index 9

BDBI Radio

5 - Columnstore indexes vs Rowstore Index • Columnstore index stores each column in separate set of pages (vs. storing multiple data rows per page) • Only columns needed are fetched • Easier to compress redundant column data • Uses xVelocity found in PowerPivot • Improved IO scan/buffer hit rates • Segment elimination: each partition is broken into million row segments with metadata for min/max values – segment is not read if query scope does not include min/max values • Query will only fetch necessary columns • In reality, not “really” an index – more like a compressed “cube” 5/1/2020 13 topics for Columnstore Index 10

BDBI Radio

6 - Execution plan using the Columnstore indexColumnstore index

was 5% of the batch

Clustered index was

65% of the batch

Covering index (which

would have been the best approach prior to SQL Server 2012) was 35% of the batch

Time Statistics, 12x

faster than covering index, 20x faster than clustered index

5/1/2020 13 topics for Columnstore Index 11

BDBI Radio

• 7 - Batch Mode Processing – new processing mode for the Columnstore index • Certain execution plan operators (Hash Match in particular) use new Batch execution mode • Reads rows in blocks of 1,000 • Big performance benefit over row-based execution 5/1/2020 13 topics for Columnstore Index 12

BDBI Radio

8 – Where the Columnstore index can’t directly be used • Issue w/OUTER JOIN: can’t use directly against table • Will “work”, but will use slower row execution mode • Must pre-aggregate separately and then do OUTER JOIN (will use batch mode) 5/1/2020 13 topics for Columnstore Index 13

BDBI Radio

9 - Selective vs non-Selective queries – where Columnstore index

isn’t used

5/1/2020 13 topics for Columnstore Index 14

BDBI Radio

10 – General Usage Syntax and Rules • Syntax is simple: use new COLUMNSTORE keyword • 1 Columnstore index per table: cannot be clustered • Order of columns does not matter • Include all columns from table • No INCLUDE statement, No ASC/DESC • General MS recommendation: if queries will frequently use a certainly column on the predicate, create a clustered index on that column and then create the columnstore index. – Even though column store index isn’t “ordered” itself, you’ll get better segment elimination CREATE NONCLUSTERED COLUMNSTORE INDEX [IX_BPO_ColumnStore] ON [BigPurchaseOrderHeader] (PurchaseOrderID, VendorID, OrderDate, ShipMethodID, Freight, TotalDue) 5/1/2020 13 topics for Columnstore Index 15

BDBI Radio • 11 – Restrictions and Rules

• Cannot be clustered, cannot be created against a view • Cannot act as a PK or FK, cannot include sparse columns • Can’t work on tables with Change Data Capture/Change Tracking or FileStream, can’t participate in replication, nor when page/row compression exists • Cannot be used with certain data types, such as binary, text/image, rowversion/timestamp, CLR data types (hierarchyID/spatial), nor with data types created with MAX keyword…e.g. varchar(max • Cannot be modified with ALTER – must be dropped and recreated • It’s a read-only index - cannot insert rows and expect columnstore index to be maintained. 5/1/2020 13 topics for Columnstore Index 16

BDBI Radio

11 – Restrictions and Rules (continued) • Note: range partitioning is supported….(use partitioning to load a table, index it with a columnstore index, and switch it in as newest partition. ) – Partition by day, split the last partition – Load data into staging table and then create columnstore index – Switch it in – SQL Server 2012 permits 15,000 partitions per table) • Not optimized for certain statements (OUTER JOIN, UNION, NOT IN ) • Not optimized for certain scenarios (high selectivity, queries lacking any joining, aggregating) • Not optimized for a JOIN statement on a composite set of columns • Best practice – always use integer keys for FKs 5/1/2020 13 topics for Columnstore Index 17

12 - Columnstore index: Performance Index

Clustered index Non-clustered covering index Column Store index 30000

CPU time (ms)

4337 2246 140

Total Time (ms)

3899 2393 199

Logical Reads

27631 21334 4180

Read-ahead Reads

0 8 12652 25000 20000 15000 10000 5000 0 Clustered Index Non-clustered covering index Column Store Index Total Time (ms) Logical Reads

BDBI Radio

13 - New database engine features in SQL v.Next • Columnstore indexes will be updatable!!! – No longer a readonly index • Will be able to create a columnstore index as a clustered index • Not related directly to columnstore index, but next major version of SQL Server will have a new in-memory row store for OLTP databases (codename “Hekaton”) 19 5/1/2020 13 topics for Columnstore Index