Transcript SQL Server

SQL Server Parallel Data Warehouse:
Supporting Large Scale Analytics
José Blakeley, Software Architect
Database Systems Group, Microsoft Corporation
SQL Server PDW Overview
2
JHU DIR March 2011
3/18/2011
Workload Types
Online Transaction Processing (OLTP)







Balanced read-update ratio (60%-40%)
Fine-grained inserts and updates
High transaction throughput e.g., 10s K/s
Usually very short transactions e.g., 1-3 tables
Sometimes multi-step e.g., financial
Relatively small data sizes e.g., few TBs
Day-to-day
business
Data Warehousing and Business Analysis (DW)








3
Read-mostly (90%-10%)
Few updates in place, high-volume bulk inserts
Concurrent query throughput e.g., 10s K / hr
Per query response time < 2 s
Snowflake, star schemas are common e.g., 5-10 tables
Complex queries (filter, join, group-by, aggregation)
Very large data sizes e.g., 10s TB - PB
JHU DIR March 2011
Analysis over
historical data
3/18/2011
SQL Server Parallel Data Warehouse
Shared-nothing, distributed, parallel DBMS



Built-in data and query partitioning
Provides single system view over a cluster of SQL Servers
Appliance concept



Software + hardware solution
Choice of hardware vendors (e.g., HP, Dell, NEC)
Optimized for DW workloads



Bulk loads (1.2 – 2.0 TB/hr)
Sequential scans (700 TB in 3hr)
Scale from 10 Terabytes to Petabytes



4
1 rack manages ~40 TB
1 PB will need ~25 racks
JHU DIR March 2011
3/18/2011
Hardware Architecture
Compute Nodes
Control Nodes
SQL
Active / Passive
SQL
SQL
SQL
SQL
SQL
SQL
ETL Load
Interface
Corporate
Backup
Solution
5
SQL
Dual Fiber Channel
Data Center
Monitoring
SQL
Dual Infiniband
Client Drivers
(ODBC, OLEDB, ADO.NET)
Spare Compute Node
2 Rack Appliance
JHU DIR March 2011
3/18/2011
Software Architecture
Query
Tool
MS BI
(AS, RS)
DWSQL
Internet
Explorer
3rd Party
Tools
IIS
Data Access
(OLEDB, ODBC, ADO.NET, JDBC)
Admin
Console
Compute Node
Compute Nodes
Compute Nodes
Data Movement Service
Data
Movement
Service
PDW Engine
User Data
SQL Server
Landing Zone Node
DW
Authentication
DW
Configuration
DW
Schema
TempDB
Data Movement Service
SQL Server
Control
Node
6
JHU DIR March 2011
3/18/2011
Key Software Functionality
PDW Engine









Provides single system image
SQL compilation
Global metadata and appliance
configuration
Global query optimization and plan
generation
Global query execution
coordination
Global transaction coordination
Authentication and authorization
Supportability (HW and SW status
info via DMVs)



Parallel Loader

Runs from the Landing Zone

SSIS or command line tool
Parallel Database Copy

High performance data export

Enables Hub-Spoke scenarios
Parallel Backup/Restore

Backup files stored on Backup
Nodes

Backup files may be archived into
external device/system
Data Movement Service



7
Data movement across the
appliance
Distributed query execution
operators
JHU DIR March 2011
3/18/2011
Query Processing
SQL statement compilation


Parsing, validation, optimization
Builds an MPP execution plan




A sequence of discrete parallel QE “steps”
Steps involve SQL queries to be executed by SQL Server at
each compute node
As well as data movement steps
Executes the plan



Coordinates workflow among steps
Assembles the result set
Returns result set to client

8
JHU DIR March 2011
3/18/2011
18,000,048,306 rows
Example DW Schema
SELECT TOP 10
L_ORDERKEY,
SUM(L_EXTENDEDPRICE*(1-L_DISCOUNT))
AS REVENUE,
O_ORDERDATE,
O_SHIPPRIORITY
FROM
CUSTOMER,
ORDERS,
LINEITEM
30,000,000
WHERE C_MKTSEGMENT = 'BUILDING' AND
C_CUSTKEY = O_CUSTKEY AND
L_ORDERKEY = O_ORDERKEY AND
O_ORDERDATE < ‘2010-03-05' AND
L_SHIPDATE > ‘2010-03-05'
GROUP BY L_ORDERKEY,
O_ORDERDATE,
O_SHIPPRIORITY
ORDER BY REVENUE DESC,
O_ORDERDATE
rows
4,500,000,000 rows
600,000,000 rows
2,400,000,000 rows
25 rows
9
5 rows
JHU DIR March
2011
3/18/2011
7/17/2015
450,000,000 rows
Example – Schema TPCH
----------------------------------------------------------------------- Customer Table
-- distributed on c_custkey
---------------------------------------------------------------------CREATE TABLE customer
( c_custkey
bigint,
c_name
varchar(25),
c_address
varchar(40),
c_nationkey integer,
c_phone
char(15),
c_acctbal
decimal(15,2),
c_mktsegment char(10),
c_comment
varchar(117))
WITH (distribution=hash(c_custkey)) ;
----------------------------------------------------------------------- Orders Table
---------------------------------------------------------------------CREATE TABLE orders
( o_orderkey
bigint,
o_custkey
bigint,
o_orderstatus
char(1),
o_totalprice
decimal(15,2),
o_orderdate
date,
o_orderpriority char(15),
o_clerk
char(15),
o_shippriority integer,
o_comment
varchar(79))
WITH (distribution=hash(o_orderkey)) ;
10
----------------------------------------------------------------------- LineItem Table
-- distributed on l_orderkey
---------------------------------------------------------------------CREATE TABLE lineitem
( l_orderkey
bigint,
l_partkey
bigint,
l_suppkey
bigint,
l_linenumber bigint,
l_quantity
decimal(15,2),
l_extendedprice decimal(15,2),
l_discount
decimal(15,2),
l_tax
decimal(15,2),
l_returnflag char(1),
l_linestatus char(1),
l_shipdate
date,
l_commitdate date,
l_receiptdate date,
l_shipinstruct char(25),
l_shipmode
char(10),
l_comment
varchar(44))
WITH (distribution=hash(l_orderkey)) ;
JHU DIR March 2011
3/18/2011
Example - Query
Ten largest “building” orders shipped since March 5, 2010
SELECT TOP 10 L_ORDERKEY,
SUM(L_EXTENDEDPRICE*(1-L_DISCOUNT)) AS REVENUE,
O_ORDERDATE,
O_SHIPPRIORITY
FROM CUSTOMER, ORDERS, LINEITEM
WHERE C_MKTSEGMENT = 'BUILDING' AND
C_CUSTKEY = O_CUSTKEY AND
L_ORDERKEY = O_ORDERKEY AND
O_ORDERDATE < ‘2010-03-05' AND
L_SHIPDATE > ‘2010-03-05'
GROUP BY L_ORDERKEY,
O_ORDERDATE,
O_SHIPPRIORITY
ORDER BY REVENUE DESC,
O_ORDERDATE
11
JHU DIR March 2011
3/18/2011
Example – Execution Plan
------------------------------- Step 1: create temp table at control node
-----------------------------CREATE TABLE [tempdb].[dbo].[Q_[TEMP_ID_664]]
( [l_orderkey] BIGINT, [REVENUE] DECIMAL(38, 4),
[o_orderdate] DATE, [o_shippriority] INTEGER );
------------------------------- Step 2: create temp tables at all compute nodes
-----------------------------CREATE TABLE
[tempdb].[dbo].[Q_[TEMP_ID_665]_[PARTITION_ID]]
( [l_orderkey] BIGINT, [l_extendedprice] DECIMAL(15, 2),
[l_discount] DECIMAL(15, 2), [o_orderdate] DATE,
[o_shippriority] INTEGER, [o_custkey] BIGINT,
[o_orderkey] BIGINT )
WITH ( DISTRIBUTION = HASH([o_custkey]) );
-------------------------------- Step 3: SHUFFLE_MOVE
-------------------------------SELECT [l_orderkey], [l_extendedprice], [l_discount],
[o_orderdate], [o_shippriority], [o_custkey], [o_orderkey]
FROM [dwsys].[dbo].[orders] JOIN [dwsys].[dbo].[lineitem]
ON ([l_orderkey] = [o_orderkey])
WHERE ([o_orderdate] < ‘2010-03-05'
AND [o_orderdate] >= ‘2010-09-15 00:00:00.000')
INTO Q_[TEMP_ID_665]_[PARTITION_ID]
SHUFFLE ON (o_custkey);
12
------------------------------- Step 4: PARTITION_MOVE
-----------------------------SELECT [l_orderkey],
sum(([l_extendedprice] * (1 - [l_discount]))) AS REVENUE,
[o_orderdate], [o_shippriority]
FROM [dwsys].[dbo].[customer] JOIN
tempdb.Q_[TEMP_ID_665]_[PARTITION_ID]
ON ([c_custkey] = [o_custkey])
WHERE [c_mktsegment] = 'BUILDING'
GROUP BY [l_orderkey], [o_orderdate], [o_shippriority]
INTO Q_[TEMP_ID_664];
------------------------------- Step 5: Drop temp tables at all compute nodes
-----------------------------DROP TABLE tempdb.Q_[TEMP_ID_665]_[PARTITION_ID];
-------------------------------- Step 6: RETURN result to client
-------------------------------SELECT TOP 10 [l_orderkey], sum([REVENUE]) AS REVENUE,
[o_orderdate], [o_shippriority]
FROM tempdb.Q_[TEMP_ID_664]
GROUP BY [l_orderkey], [o_orderdate], [o_shippriority]
ORDER BY [REVENUE] DESC, [o_orderdate] ;
-------------------------------- Step 7: Drop temp table at control node
-------------------------------DROP TABLE tempdb.Q_[TEMP_ID_664];
JHU DIR March 2011
3/18/2011
Microsoft Column-store
Technology
VertiPaq and VertiScan
In-memory BI (IMBI)
Slides by Amir Netz
15
JHU DIR March 2011
3/18/2011
In-Memory BI Technology


Developed by SQL Analysis Services (OLAP) team
Column-based storage and processing


Compression (VertiPaq)


Only touch the columns needed for the query
Columnar data is more compressible than row data
Fast in-memory processing (VertiScan)

16
Filter, grouping, aggregation, sorting
JHU DIR March 2011
3/18/2011
How VertiPaq Compression Works
Read Raw Data
Organize by Columns
Phase I: Encoding
Convert to uniform representation
(Integer Vectors)
Encoding is per column
Dictionary
Encoding
2x – 10x size reduction
Value Encoding
1x – 2x size reduction
VertiPaq
Compression
Compression
Analysis
Hybrid RLE
Phase II: Compression
Minimize storage space
Compression is per 8M row segments
17
75%-95% of data
Run Length
Encoding (RLE)
JHU size
DIRreduction
March 2011
~100x
5%-25% of data
Bit Packing
2x – 4x size reduction
3/18/2011
436,892,631 rows
TECSPURSL00 (dbo) (Read-only)
APPX – 1TB
FTxlatOrgId
CTxlatOrgId
TRCreditedSubsidiaryId
BillingCurrencyID
Star Join
Schema
34 rows
Region (dbo) (Read-only)
ProductId
DataSourceId
SalesDateId
AreaId
BillingMonthSalesDateID
RegionCode
ActualQuantityCnt
RegionName
ActualLicenseCnt
SecondaryLicenseCnt
ActualRevenueAmt
AdjustedGrossRevenueAmt
JointVentureRevenueAmt
LicenseTransactionItemId
GeographyId
JointVentureActualQuantityCnt
JointVentureActualLicenseCnt
JointVentureSecondaryLicenseCnt
BillDocTypeID
SalesOfficeID
ReasonID
BillingStatusID
TopParentProductID
ParentProductID
BundleFlagID
ExtractListID
ServiceRevenueAmt
ServiceActualQuantityCnt
ServiceAdjustedRevenueAmt
FiscalYearName
SalesDate (dbo) (Read-only)
SalesDateId
FiscalWeekID
FiscalMonthID
FiscalQuarterID
FiscalYearID
FiscalQuarterName
FiscalMonthName
FiscalYearName
CalendarMonthName
FWBeginDate
FWEndDate
CalendarDate
CalendarDateName
MonthRelativeID
FMBeginDate
FMEndDate
TransactionDataPopulated
FiscalSemesterID
FiscalSemesterName
AggregatedSalesDateID
RevenueCommitmentMultiplier
LicenseCommitmentMultiplier
CommitmentPeriod
DataSourcePlanningFlag
AdvisorID
JHU DIR March 2011
FiscalYearId
RecordTypeId
AdjustedLicenseCnt
18
FiscalYear (dbo) (Read-only)
ManagementReportingId
RegionId
SELECT FE0.RegionName,
FL0.FiscalYearName,
SUM (A.ActualRevenueAmt)
UpperGeography (dbo) (Read-only)
CreditedSubsidiaryID
FROM TECSPURSL00 A
CreditedSubRegionID
JOIN SalesDate L
CreditedRegionID
ON A.SalesDateID = L.SalesDateID
CreditedAreaID
CreditedBigAreaID
JOIN UpperGeography UG
ON A.TRCreditedSubsidiaryId =
UG.CreditedSubsidiaryID
118 rows
JOIN Region FE0
ON UG.CreditedRegionID =
FE0.RegionID
JOIN FiscalYear FL0
ON L.FiscalYearID = FL0.FiscalYearID
GROUP BY FE0.RegionName, FL0.FiscalYearName
41 rows
OrderStatusID
13,517 rows
3/18/2011
7/17/2015
Column-Store on APPX
Time in seconds
SQL Server 2008 vs. CS index Comparison
9.00
8.00
7.00
6.00
5.00
4.00
3.00
2.00
1.00
0.00
Q1
SQL 0.34
IMBI 1.54


Q2
0.51
1.15
Q4
0.36
0.87
Q4
2.67
0.73
Q5
2.05
0.85
Q6
0.27
0.84
Q7
0.74
0.89
Q8
2.64
1.13
Q9
8.52
5.41
Q10
3.15
0.89
Q11
2.12
1.68
Q12
2.29
1.31
Response time < 2s common
Smaller variance in response time  more predictable query
performance
19
JHU DIR March 2011
3/18/2011
THANKS!
20
JHU DIR March 2011
3/18/2011