Transcript Slide 1

Columnstore Indexes in SQL Server
2012
Conor Cunningham
Principal Architect, Microsoft SQL Server
[email protected]
Representing Microsoft Development Center Serbia
What This Talk Covers
• SQL Server’s upcoming “Denali” release contains
a new feature for Data Warehouses to speed up
Data Warehouse queries
• This talk provides an overview of the new surface
area and some details about how it works
Who am I?
• I’ve worked at Microsoft for the SQL Core Engine
team as an Architect for many years
• I work mostly on Query Processors
• I wrote the SQL 2008 Internals book on how the
Query Optimizer works
• I blog at “Conor vs. SQL”
• I like to talk to customers about how they use the
product so that I improve things in future releases
Agenda
•
•
•
•
Data Warehouse Introduction
New Feature and Demo
How the Feature Works
Restrictions in this release
Data Warehouse Introduction
• Data Warehouses support reporting and business
intelligence operations in organizations
• Store facts that can be aggregated over different
dimensions
• They often store lots and lots of facts (rows)
– This leads to a design pattern called a star schema
where fact tables are “over”-normalized to reduce row
width
– Dimension tables are frequently joined
– Results are very often aggregated
• Example: Show me the sales totals for each department
by month for the past 3 years
6
Data Warehouse Challenges
• These kinds of databases become difficult once
they get big.
–
–
–
–
–
–
Query latency
ETL load times
Backup time and size
Index rebuilding
Finding time to load new data
Query plan selection issues/limitations
Opportunity
• What If…
– We made DW queries 10+ times faster?
• Example – Business Analyst does ROLAP
reports against SQL Server 2008
–
–
–
–
Click to drill down into a report
Go get some coffee
Click again
Go get more coffee
• We aim to make that experience interactive
– (However, coffee shop profits may plunge!)
8
Demo
9
How Does It Work?
• New Index Type – ColumnStore
• New Query Execution Algorithms – “Batch” mode
• Specifically Target Star Join Queries
▫
▫
Not all queries are faster in the initial release
Customers will want to consider this in their
application design
Supported Pattern:
SELECT SUM(…), cols FROM FactTbl JOIN DimTbl1
JOIN DimTbl2 … WHERE … GROUP BY cols
Index Storage Design
•
Column-Orientation
– Store data vertically instead of per-row
– String Dictionaries for variable-length data
•
•
Segment data into groups (1 million rows/group)
Benefits
–
–
–
–
DW queries usually pick only a subset of columns
You can do the IO only for those columns
We can also compress that data effectively since it often has lots of duplicates
Space savings of 1.5x-2x vs. a row-based page-compressed equivalent
IO Patterns for
(CI Scan, Column-based scan of 3 cols, Column-based
w/Compression)
11
Speedup from the Index
• If the IO required is cut in half…
– We don’t get to 10 times faster (yet)
– We need to improve the memory utilization and CPU
utilization to get the rest of the speedup
• So how do we improve Query Execution 10x???
12
What takes time in a CPU?
• Memory IO takes time
– Cache Misses stall the CPU
– L2 cache misses stall the CPU even more
– So we reduced cache misses
• Instructions take time
– Instructions also go through the caches
– So we reduced instructions
• Disk Access takes time
– So we biased the memory policies for this index to work
best when in memory
• Over time, CPU speed has increased faster than
memory speed, making all of these worse
13
Query Execution Row Mode Changes
• Each operator calls child for each
row
• This works fine for smaller numbers
of rows, poorly for batches
• In bigger queries, CPU cycles
instructions in and out of the CPU
(L2 cache misses)
• So this model suffers in DW with
too many instructions, too many
cache misses
Batch Format
• Column-Oriented
• Sized to fit within L2 cache
• Multiple Operators work on a
batch sequentially
• Goal: Reduce avg. per-tuple cost
– Compression
– Reducing L2 data and instruction
cache misses
– Probabilistic data representations
– Probabilistic operator execution
algorithms
• This gets us to 10x faster (avg)
15
SQL 2012 Restrictions
• Create index:
– Only on common business data types
Yes
int, real, string, money, datetime, decimal <= 18 digits
No
decimal > 18 digits, binary, BLOB, CLR, (n)varchar(max),
uniqueidentifier, datetimeoffset with precision > 2
• Maintain table: limited operations
– Can read but not update the data
– However: One can switch partitions in and out
• Process queries: all read-only T-SQL queries run
– Some queries are accelerated more than others
16
Using Apollo: Loading new data
• Table with columnstore index can be read, not
updated
– Partition switching is allowed
– INSERT, UPDATE, DELETE, and MERGE not allowed
• Three possible methods for loading data
– Disable, update, rebuild
– Partition switching
– UNION ALL between large table with columnstore and
smaller updateable table
17
Query performance issues
• Not all operators are batch-mode enabled
– Scan, Filter, Project
– Local hash partial aggregation
– Hash inner join, hash table build
• Only parallel queries can use batch mode
• If hash tables don’t fit into memory, fall back to
row-mode processing
– Memory grant request depends on cardinality est.
– Falling back to row-mode is slow
18
Revisiting Our Example Scenario
• For SQL Server 2012, our customer will be able
to:
– Have specific queries go very fast (with less coffee)
– DW Application developers
• Must design their code to load/unload data online
• Can use hints to control user experience for the fast and slow
cases
– Hint index – if it fails to get a plan, then you can present UI to the
user to “maybe go get coffee” and then run in row mode
• This story will continue to improve as we add
more capabilities to Batch processing
19
Summary
• New Index and Execution Algorithms for DW
• Significant speedup for conforming applications
• Opportunities for customers who can build their
code to leverage the benefits
20
Thank You!
• Questions?
Openness and Interoperability @Microsoft
Microsoft and
Open Source
gateway for deeper
exploration of open
source engagements
Port25
Codeplex
blogs from the platform
community and the
OSS Lab teams
resources for
developers and
consumers of open
source projects
http://Port25.technet.com
http://www.codeplex.com
http://www.microsoft.com/
openness
Interoperability
Bridges
technical collaborative
works
http://www.interoperability
bridges.com
Open Up
Shared Source
cross-Industry
Interoperability and
Standards activities
portal for
programmatically
sharing code
http://www.microsoft.com/
interop/openup
http://www.microsoft.com/
sharedsource
OData
Open Spec
BizSpark
open source starter kit
for Internet publishing of
Government datasets
using the Open Data
protocols, file formats,
standards, technical
specifications
Program for Start-Up
companies from both
commercial and open
source backgrounds
http://ogdisdk.cloudapp.net
http://www.microsoft.com/
openspecifications
http://www.microsoft.com/
bizspark
How can I receive up-to-date Openness announcements from Microsoft?
In addition to the websites above, you can receive regular updates to Microsoft’s
openness, interoperability and standards efforts via the following channels:
• http://blogs.technet.com/b/openness/
• http://blogs.msdn.com/b/interoperability/
• http://twitter.com/OpenAtMicrosoft
• http://port25.technet.com
• http://channel9.msdn.com/Blogs/Interoperability
Please rate this lecture
and WIN HTC MOZART!
Help us choose the best Sinergija
lecturer! Telekom Srbija and
Microsoft will award you – at the
conference end, we’ll give one HTC
Mozart WP7 phone to someone
from the audience – randomly.
Go to www.mssinergija.net, log in
and cast your votes.
You can rate only lectures that you were present at, just once. More lectures you rate, more chances you have.
Please use computers at the front of this room, or rate lecture from your phone or home computer, at Sinergija portal.
This prize contest will end at Thursday, October 20th at 9 PM. Winner will be announced at the official Sinergija web portal,
www.mssinergija.net
is a friend of Sinergija 2011
Conference and Imagine Cup
student competition in Serbia.