Transcript Document
Boosting DWH-Performance with
SQL Server 2014
ColumnStore Index,
In-Memory Tables &
Natively Compiled Stored Procedures
Introduction
Markus Ehrenmüller-Jensen
Business Intelligence Architect
@MEhrenmueller
[email protected]
runtastic
Pluskaufstraße 7
4061 Pasching
Austria, Europe
http://www.runtastic.com
SQL Server 2005 - 2014
BI Developer
Database Developer
Database Admin
Agenda
Introduction
Disk vs. In-Memory
Column Store
Columnar Storage
In-Memory OLTP
In Memory but durable
Native Compiled Stored Procedures
Procedures as DLL
Turbo-DWH
Case Study
Sneak Preview
In-Memory in SQL Server
Cache
Buffer Pool
Pin-to-Memory
DBCC PINTABLE
ColumnStore
(Non-)Clustered Index
SSAS Tabular
Power Pivot
In-Memory OLTP
Memory Optimized Table
Natively Compiled Stored Procedure
Ordinary Report
http://www.flickr.com/photos/marts-pics/3554153954/sizes/l/in/photostream/
ColumnStore Index
http://commons.wikimedia.org/wiki/User:Malene
Improve Query Performance with ColumnStore Index
DEMO
How?
xVelocity (VertiPaq, PowerPivot, BISM Tabular)
Compression (7x)
Column-Elimination
Segment-Elimination
Parallel Read Operations
Query Processor in Batch-Mode
Typically 10x faster
RowStore vs. ColumnStore
Page 1 of
row store
Page 2 of
row store
Key AlternateKey Name
1 AR-5381
Adjustable Race
2 BA-8327
Bearing Ball
3 BE-2349
Ball Bearing Cage
4 BE-2908
Ball Bearing Grease
5 BL-2036
Blade
6 CA-5965
LL Crankarm
7 CA-6738
ML Crankarm
Segment
for
column 1
Segment
for
column 2
Segment
for
column 3
Stock
1000
1000
800
800
800
500
500
Segment
for
column 4
Use Cases
> 1 Mio rows
Aggregations, groupings & filters (DWH/OLAP)
Write once, read multiple times
Less distinctive values per column
Sweet spot
Design (eg. matching data type, no functions), star &
snowflake schema, inner joins
Can substitute datamarts/aggregation-tables
ROLAP & Tabular Model DirectQuery
Query Optimizer includes CS Index
Restrictions
Data types:
ntext, text, and image, vardecimal, varchar(max) and
nvarchar(max), rowversion (and timestamp), sql_variant, CLR types
(hierarchyid and spatial types), xml, uniqueidentifier
Page/Row compression
Replication
Change Tracking, Change Data Capture
Filestream
Enterprise Edition only
ColumnStore Index
Column Store
Power Pivot
Analysis
Services
SQL Server
Non clustered
Index
SQL 2012+
Clustered
Index
SQL 2014+
Nonclustered ColumnStore Index (NCI)
May combined with other indices
Decide which column to include
Only one NCI per table
Reduntant storage
No sort order
SQL Server 2012+
NCI Restrictions
No constraints allowed (unique)
Indexed table is read-only
NCI Best Practice
Memory, Memory, Memory
Include all columns
MAXDOP > 1
Fact-tables and big dimension tables
Update/Insert
Disable/enable index
Partitioning
View/Union all
Clustered ColumnStore Index (CCI)
Physical columnare storage (instead of row based)
No reduntant storage
All columns included automatically
No other index allowed
UPDATE-able
Supports more data types
Swichting between ROW-mode and BATCH-mode allowed
SQL Server 2014+
CCI: Structure
CREATE CLUSTERED
COLUMNSTORE INDEX
Deleted Bitmap
Columnstore
BULK INSERT
INSERT
DELETE
REORGANIZE
Deltastore(s)
REBUILD
CCI: Deltastore
Ordinary Rowstore
OPEN / CLOSED / (COMPRESSED)
Compression is expensive operation
Tuple-Mover (every 5 minutes)
CCI: Restrictions
No Constraints (unique, primary, foreign key)
No Triggers
Not all datatypes supported
No ISOLATION LEVEL SNAPSHOT
Enterprise Edition only
ColumnStore Index
Non-clustered
Clustered
SQL Server 2012+
Additional index (redundancy)
Read-only
Subset of columns
Max. one NCI per table
SQL Server 2014+
Master
Update-able
All columns
No additional index allowed
In-Memory Myth Buster
In-Memory OLTP ia a response to competitors
Microsoft worked on this technology since 2009
In-Memory OLTP is like DBCC PINTABLE
PINTABLE bounds pages to buffer pool; In-Memory OLTP
does neither have a buffer-pool nor pages
In-Memory OLTP is a separated product
In-Memory OLTP is fully integrated in SQL Server
In-Memory OLTP can be used without app-changes
Small (schema) changes are necessary
In-Memory tables are nor durable or high available
Schema and data are durable; AlwaysOn is supported
Disk-based storage (not In-memory)
Page based
„Latch contention“ during concurrent access
Random IO when writing on disk
Transactionslog
Every change to the data and all indices
UNDO with rollback
Memory Optimized Tables
Based on Filestream
Table converted into a DLL
Data files
16MB (<=16GB memory) or 128MB
256KB chunks
Inserted rows only
Chronological row versions
Delta files
1MB (<=16GB memory) or 8 MB
4KB chunks
IDs of deleted rows
In-Memory Rows
New row format
Optimized for in-memory
No data page container
Versioning
No direct UPDATE
But DELETE & INSERT instead
Non-blocking
SELECT, UPDATE, …
Database
CREATE DATABASE MyDatabase
GO
ALTER DATABASE MyDatabase
ADD FILEGROUP MyIMFilegroup
CONTAINS MEMORY_OPTIMIZED_DATA
GO
ALTER DATABASE MyDatabase
ADD FILE
(name='MyIMFile',filename='C:\DATA\MyIMFile')
TO FILEGROUP MyIMFilegroup
GO
Memory Optimized Table Filegroup
Durability by default
Index is never durable
Neither in data file nor in transaction-log
Write-access sequentially only
No Random IO
Write
Transaction-log: synchronous
Datafile: asynchronous
Option: SCHEMA_ONLY
No writes to transaction-log not durable
ETL, temporary tables, …
Optimized for In-Memory
No latches, no locks
No pages
Direct access to rows
No Heaps
All indecis are covering
Multiversion optimistic concurrency
Durability
Durability by defulat
Deactivate explicitly for eg. ETL
Succesfully (comitted) transactions in log only
No UNDO necessary (only sequentially)
SSD recommended
In-Memory Filegroup
Sequential access only (appending)
SSD recommended
Create multiple files (container) recovery
Indexing
In-Memory only
Never on disk
Rebuild during server restart & database restore
No fragmentation
All indices are covering
Direct pointer to in-memory row
Not supported
Filtered Index, nullable key, …
Max. 8 indices per table
(N)VARCHAR must use _BIN2-collation
Case sensitive
Accent sensitive
Hash or Range
Index
Hash
Range
Multi-column index: All
columns used for hashing
(no scan for single columns
possible)
Index is no sorted
Size determined during
creation through
BUCKET_COUNT (rounded
to binary value)
Multi-column index: Similar
to non-clustered indices of
disk-based tables
Sort order is crucial
Index can grow
Unequal & sorted scans
Multiversion
Adress
4711
Adress
4711
4712
Timestamp Timestamp Link to Content
From
Until
next
100
∞
null
Hello!
Timestamp
From
100
200
Timestamp
Until
200
∞
Link to Content
next
4712
Hello!
null
Hello World!
Multi-Version Optimistic Concurrency
Multi-Version
Timestamp
As long as a version is needed
Pointer to next version
Optimistic
No locks
Synchronous UPDATE ROLLBACK
READ is never blocked
Garbage Collection
Old row version
Slows down access (xcan)
Occupy memory
Automatically
During transaction
Separate system job
Lowest active timestamp
Every transaction has a timestamp
Restrictions
Whole table has to fit into memory
Max. 8060 Bytes per row (size of disk page)
Max 250GB per table
One-stop CREATE TABLE
Columns, primary key, indices
No ALTER TABLE, no CREATE INDEX
Not supported
Varchar(max), SPARSE column, DBCC CHECKTABLE, XML, CLR,
…
Workarounds for
foreign key constraints, constraints, triggers, …, SQL_VARIANT,
LOB, …, ALTER TABLE/INDEX, CREATE INDEX
Must have
Primary Key, Index on Primary Key
Use Cases
Performance-critical OLTP
Orders, stock exchange, live bets
High Data-Input Rate
Shock-absorber
Staging for ETL
Relational Cache
Session State
No Use Case
Schema-changes not allowed
Full DWH
Reporting
ColumnStore
XML/Fulltext Search
Parallel queries
Best Practice
Sufficient Memory
Ca. 4x the size of the table
Try out with productive environment
Though not on productive system
Multiple data files
Can be read in parallel (restart/restore)
IO spread on several disks
Statistics
Are not maintained automatically
UPDATE STATISTICS <tbl> WITH FULLSCAN, NORECOMPUTE
Use VIEWS for access
ALTER VIEW is allowed
Natively Compiled Stored Procedures
T-SQL C DLL
Execution Plan is created during compilation
Won‘t be updated later automatically
DROP/CREATE necessary
Lives in SQL Server Process
Performance >10x
Use Case
CPU is bottleneck
Latence is problem
Query/DML NCSP helpful
Validation/COMMIT IO-System to be optimized
instead
You can live with the restrictions
Native Compilation Advisor
Use NCSP only for parts of your implementation
Restriction
Memory optimized tables only
SCHEMABINDING mandatory
T-SQL
DISTINCT, CTE, RANK, EXISTS, IN, Sub-Queries,
OUTER-JOIN, …, ROLLUP, EXTERNAL, …
Built-in functions
EXECUTE AS CALLER
No parallel operation
Workarounds
SELECT INTO, CASE, GOTO, …
Cursors, table-valued functions, UNION, OR, NOT, …
Best Practice
Avoid mixture of compiled & non-compiled
precedures
Low efficiency
Might be less performant then non-compiled at all
As much logic into NCSP as possible
Single-Statement precedures do not benefit
ColumnStore vs. In-Memory OLTP
DWH vs. OLTP
Aggregation vs. Single Rows
Disk vs. Memory
Compressed vs. uncompressed
Peformance-Sample
Staging-Load
After TRUNCATE/DELETE
Initial-Load DWH
DWH-Load without changes
DWH-Load after changed column in all rows
DWH-Load with empty staging table
DWH-typical query
Performance-Measurement
InMemory has no IO
Elapsed time instead
AdventureWorksDW2012..FactResellerSales
Blown up to 6 Mio rows
Challange for ETL
Non-Clustered ColumnStore Index
Read only, no UPDATE/DELETE
Memory Optimized Tables
No TRUNCATE
Collation *_BIN2 only
No automatic UPDATE of STATISTICS
No UPDATE on Primary Key (eg. SCD2/ValidUntil)
Natively Compiled Procedures
Memory Optimized Tables only
Restriction in WHERE: no OR, no IN, …
MEhrenmueller‘s Assumptions
Staging
Memory Optimized Table
SCD1
Clustered Column Store Index
SCD2
Clustered Column Store Index
Answer of Puzzle
Disk Heap
Disk CCI
Disk PK (SSIS)
Disk Heap
Disk Heap
Disk Heap
Disk Heap
Disk Heap
MOT
Disk PK NCI
Disk Heap
SQL Server v2016
UPDATE-able Nonclustered ColumnStoreIndex
In-Memory ColumnStore Index
Call to Action
Try
Non-Clustered Column Store Index
Clustered Column Store Index
In-Memory Table
Natively Compiled Stored Procedure
On
Staging, DWH, Datamarts
Wrap up
Introduction
Disk vs. In-Memory
Column Store
Columnar Storage
In-Memory OLTP
In Memory but durable
Native Compiled Stored Procedures
Procedures as DLL
Turbo-DWH
Case Study
Questions?
Markus Ehrenmüller-Jensen
Business Intelligence Architect
@MEhrenmueller
[email protected]
runtastic
Pluskaufstraße 7
4061 Pasching
Austria, Europe
http://www.runtastic.com
SQL Server 2005 - 2014
BI Developer
Database Developer
Database Admin