Agenda Trends in the Data Warehousing Space • Scale more Approximate data volume managed by DW Less than 1TB 21% 18% 1 - 3 TB 17% More than 10

Download Report

Transcript Agenda Trends in the Data Warehousing Space • Scale more Approximate data volume managed by DW Less than 1TB 21% 18% 1 - 3 TB 17% More than 10

Agenda
3
Trends in the Data Warehousing Space
• Scale more
Approximate data volume managed
by DW
Less than 1TB
21%
18%
1 - 3 TB
17%
More than 10 TB
Today
In 3
years
19%
25%
3 - 10 TB
Don't Know
41%
17%
10s of TBs, to 100s of TB,
to PBs
• Performance at scale
interactive
34%
query response
• Data warehousing for masses
2%
6%
0% 10% 20% 30% 40% 50%
Source: TDWI Report – Next Generation DW
Columnstore designed to address above needs
4
In-Memory
Technologies
In-Memory OLTP
•
5-20X performance gain for
OLTP integrated into SQL
Server
In-Memory DW
•
5-25X performance gain
and high data compression
•
Updatable and clustered
SSD Bufferpool
Extension
•
4-10X of RAM and up to 3X
performance gain
transparently for apps
ColumnStore - How is it different ?
Data stored as columns
Data stored as rows
C1
…
6
C2
C3
C4
C5
Columnstore Index Terminology
• Row Group
 Set of rows (typically 1 million)
Row Group
• Column Segment
Column Segment
C1
C2
C3
C4
C5
 Contains values from one
column from the row group
C6
• Segments are compressed
• Each segment stored
separately
• Segment is unit of transfer
between disk and memory
7
ColumnStore Index - Example
OrderDateKey
ProductKey
StoreKey
RegionKey
Quantity
SalesAmount
20101107
106
01
1
6
30.00
20101107
103
04
2
1
17.00
20101107
109
04
2
2
20.00
20101107
103
03
2
1
17.00
20101107
106
05
3
4
20.00
20101108
106
02
1
5
25.00
20101108
102
02
1
1
14.00
20101108
106
03
2
5
25.00
20101108
109
01
1
1
10.00
20101109
106
04
2
4
20.00
20101109
106
04
2
5
25.00
20101109
103
01
1
1
17.00
8
Step-1: Horizontally Partition (create Row Groups)
OrderDateKey
ProductKey
StoreKey
RegionKey
Quantity
SalesAmount
20101107
106
01
1
6
30.00
20101107
103
04
2
1
17.00
20101107
109
04
2
2
20.00
20101107
103
03
2
1
17.00
20101107
106
05
3
4
20.00
20101108
106
02
1
5
25.00
~1M rows
OrderDateKey
ProductKey
StoreKey
RegionKey
Quantity
SalesAmount
20101108
102
02
1
1
14.00
20101108
106
03
2
5
25.00
20101108
109
01
1
1
10.00
20101109
106
04
2
4
20.00
20101109
106
04
2
5
25.00
20101109
103
01
1
1
17.00
9
OrderDateKey
ProductKey
StoreKey
RegionKey
Quantity
SalesAmount
20101107
106
01
1
6
30.00
20101107
103
04
2
1
17.00
20101107
109
04
2
2
20.00
20101107
103
03
2
1
17.00
20101107
106
05
3
4
20.00
20101108
106
02
1
5
25.00
OrderDateKey
ProductKey
StoreKey
RegionKey
Quantity
SalesAmount
20101108
102
02
1
1
14.00
20101108
106
03
2
5
25.00
20101108
109
01
1
1
10.00
20101109
106
04
2
4
20.00
20101109
106
04
2
5
25.00
20101109
103
01
1
1
17.00
OrderDateKey
20101107
20101107
20101107
20101107
20101107
20101108
ProductKey
106
103
109
103
106
106
OrderDateKey
RegionKey
Quantity
SalesAmount
01
1
6
30.00
1
17.00
04
2
03
3
1
1
4
05
02
20101108
106
03
20101109
109
20101109
106
106
103
5
02
102
20101109
2
2
StoreKey
20101108
2
04
ProductKey
20101108
11/7/2015
StoreKey
01
04
04
01
20.00
17.00
20.00
25.00
RegionKey
Quantity
SalesAmount
1
1
14.00
2
1
2
5
1
2
4
1
5
1
25.00
10.00
20.00
25.00
17.00
Some segments will compress more than others
*Encoding and reordering not shown
11
SELECT ProductKey, SUM (SalesAmount)
FROM SalesTable
WHERE OrderDateKey < 20101108
ProductKey
StoreKey
RegionKey
Quantity
SalesAmount
20101107
106
01
1
6
30.00
20101107
103
04
2
1
109
04
2
2
103
03
106
05
20101107
20101107
20101107
20101108
106
2
3
1
02
RegionKey
20101108
ProductKey
StoreKey
1
20101108
102
02
2
03
1
01
2
106
20101109
109
20101109
106
20101109
106
103
04
04
4
5
20.00
17.00
20.00
25.00
OrderDateKey
20101108
1
17.00
2
1
Quantity
1
5
1
4
5
1
SalesAmount
14.00
25.00
10.00
20.00
25.00
01
17.00
Column Elimination
Segment
Elimination
OrderDateKey
• Motivation:
Batch object
• Column store significantly reduces i/o required.
• Next bottleneck is CPU usage.
• Batch processing addresses CPU usage.
bitmap of qualifying rows
Column vectors
• Functionality:
C1
C2
C3
•
•
•
•
•
Batch = columnar format + filter vector.
Moving “set of rows” - batch (~900 rows).
Batch moved between iterators.
Near-zero data copying with slight batch updates.
# of function calls reduced orders of magnitude.
13
Trends In Data Warehousing Space
How Does Columnstore Work?
What’s New In Columnstore?
Demo
In Summary
11/7/2015
14
Clustered Columnstore Index
Space Used in GB (101 million row table)
20.0
15.0

Columnstore
PREFERRED
storage engine for DW scenarios
10.0
5.0
91%
savings
0.0
** Space Used = Table space + Index space
16
C1
C1
C2
C3
C4
C5
C2
C3
C4
C5
C6
C6
tuple mover
Column
Store
Delta (row)
store
Updatable
17
Improved Query Performance
•
•

•



Note:
Distinct aggregates and UNION operators continue to
be executed in row mode.
18
223.9
1000
100
1.3
7.5
1.5
1.6
Query 15
-1.0
Query 14
10.3
Query 13
3.2
Query 12
1.7
Query 4.5
1.7
Query 4
22.7 31.0
Query 3.5
92.1
Query 3
10
3.9
Row Store
Column Store
Query 17
Query 16
Query 10
Query 9
Query 8
Query 5
Query 2
1
345
295
245
195
145
95
45
-5
Improvement Factor
302.4
Query 1
Response Time (s)
10000
Improvement
Row Store vs. Column Store
5.7
PowerMetric
Power
Metric
Total Execution Time
19
2
Competitive Compression
Table compression options:
DATA_COMPRESSION = { NONE | ROW | PAGE | COLUMNSTORE | COLUMNSTORE_ARCHIVE }
1. COLUMNSTORE Compression
• Default compression when creating a table with Clustered Columnstore Index
• Typical customer workloads gets 5-7x compression ratios
2. ARCHIVAL Compression
TPCH
TPCDS
Customer 1
Customer 2
3.1X
2.8X
8X
5.5X
** compression measured against raw data file
• Enables additional 30% compression for whole table and/or chosen partitions, with CPU overhead.
• Going back and forth between columnstore and columnstore_archive compressions.
• sys.partitions exposes compression info (3 – columnstore, 4 – columnstore_archive)
20
Index Build:
Creates clustered columnstore index.
from CI
CREATE CLUSTERED COLUMNSTORE INDEX …
// from HEAP
CREATE CLUSTERED COLUMNSTORE INDEX … WITH (DROP_EXISTING = ON)
//
Index Rebuild:
Re-creates clustered columnstore index completely.
ALTER TABLE … REBUILD
ALTER INDEX … REBUILD
CREATE CLUSTERED COLUMNSTORE INDEX … WITH (DROP_EXISTING = ON)
Reorganize:
Forces delta store operation.
ALTER INDEX … REORGANIZE
row groups
… REORGANIZE WITH (COMPRESS_ALL_ROW_GROUPS = ON)
21
// compresses closed
// compresses all row groups
Columnstore Index: DMVs
22
Columnstore Index: Data Load
• Loading performance comparable to loading into CI (actually, load is a bit faster to CCI )
• Load data directly into CCI (presort data file if possible)
23
11/7/2015
25
http://www.trySQLSever.com
http://www.powerbi.com
http://microsoft.com/bigdata
http://channel9.msdn.com/Events/TechEd
www.microsoft.com/learning
http://microsoft.com/technet
http://microsoft.com/msdn