Transcript SQL Server

SQL Server: A Data Platform for
Large-Scale Applications
José Blakeley, Software Architect
Database Systems Group, Microsoft Corporation
SQL Server Design Philosophy
To solve 90+% of the problems automatically






Self Tuning
Dynamic Optimization
Self Configuration
Self Management
Self Healing
The remaining percentage takes higher touch



Requires careful design for scalability
Good knowledge of how the RDBMS platform works
This talk describes how we are codifying our solution for
large-scale DW into SQL Server

2
UCI ISG Seminar
1/8/2010
SQL Data Platform
Any Place, Any Type, Any Scale
Enterprise Data Platform
Beyond Relational
Dynamic Development
Pervasive Insight
4
UCI ISG Seminar
1/8/2010
Outline
 SQL
Server Data Platform
 Basic concepts – common ground


Assumptions
Workloads
SQL Server Parallel DW DBMS




Philosophy
System Architecture
Software Architecture
Summary

5
UCI ISG Seminar
1/8/2010
Workload Types
Online Transaction Processing (OLTP)







Balanced read-update ratio (60%-40%)
Fine-grained inserts and updates
OLTP =throughput
Day-to-day e.g., 10s K/s
High transaction
business
Usually very short transactions e.g., 1-3 tables
Sometimes multi-step e.g., financial
Relatively small data sizes e.g., few TBs
Data Warehousing and Business Analysis (DW)








7
Read-mostly (90%-10%)
Few updates in place, high-volume bulk inserts
Concurrent query throughput e.g., 10sDW
K / hr
= Analysis over
Per query response time < 2 s
recorded data
Snowflake, star schemas are common e.g., 5-10 tables
Complex queries (filter, join, group-by, aggregation)
Very large data sizes e.g., 10s TB - PB
UCI ISG Seminar
1/8/2010
436,892,631 rows
Star Join Schema
TECSPURSL00 (dbo) (Read-only)
FTxlatOrgId
CTxlatOrgId
TRCreditedSubsidiaryId
BillingCurrencyID
34 rows
Region (dbo) (Read-only)
ProductId
DataSourceId
FiscalYear (dbo) (Read-only)
FiscalYearId
FiscalYearName
ManagementReportingId
RecordTypeId
RegionId
SalesDateId
AreaId
BillingMonthSalesDateID
RegionCode
SELECT FE0.RegionName,
RegionName
FL0.FiscalYearName,
SUM (A.ActualRevenueAmt)
FROM TECSPURSL00 A
JOIN SalesDate L
ON A.SalesDateID = L.SalesDateID
UpperGeography (dbo) (Read-only)
JOIN UpperGeography UG
CreditedSubsidiaryID
CreditedSubRegionID
ON A.TRCreditedSubsidiaryId =
CreditedRegionID
UG.CreditedSubsidiaryID
CreditedAreaID
CreditedBigAreaID
JOIN Region FE0
ON UG.CreditedRegionID =
FE0.RegionID
118 rows
JOIN FiscalYear FL0
ON L.FiscalYearID = FL0.FiscalYearID
GROUP BY FE0.RegionName, FL0.FiscalYearName
41 rows
ActualQuantityCnt
ActualLicenseCnt
SecondaryLicenseCnt
AdjustedLicenseCnt
ActualRevenueAmt
AdjustedGrossRevenueAmt
JointVentureRevenueAmt
LicenseTransactionItemId
GeographyId
JointVentureActualQuantityCnt
JointVentureActualLicenseCnt
JointVentureSecondaryLicenseCnt
BillDocTypeID
SalesOfficeID
ReasonID
BillingStatusID
TopParentProductID
ParentProductID
BundleFlagID
ExtractListID
ServiceRevenueAmt
ServiceActualQuantityCnt
ServiceAdjustedRevenueAmt
SalesDate (dbo) (Read-only)
SalesDateId
FiscalWeekID
FiscalMonthID
FiscalQuarterID
FiscalYearID
FiscalQuarterName
FiscalMonthName
FiscalYearName
CalendarMonthName
FWBeginDate
FWEndDate
CalendarDate
CalendarDateName
MonthRelativeID
FMBeginDate
FMEndDate
TransactionDataPopulated
FiscalSemesterID
FiscalSemesterName
AggregatedSalesDateID
RevenueCommitmentMultiplier
LicenseCommitmentMultiplier
CommitmentPeriod
DataSourcePlanningFlag
AdvisorID
8
UCI ISG Seminar
OrderStatusID
13,517 rows
1/8/2010
7/7/2015
Why look at workloads?
$B

8
6
OLTP
Mixed

DW
4
2

1995
9
2000
2005
2010

DBMS architectures of 30
years ago designed for
mixed workloads
OLTP and DW workloads
have grown into big market
segments
Workload-specific engines
can be profitable
Innovations are driven by
deeper understanding of
workloads
UCI ISG Seminar
1/8/2010
Changing TPC-H Landscape


68x perf
12% cost
New DW startups disrupting TPC-H
Column-store, compression, and scale-out
10
UCI ISG Seminar
1/8/2010
Data Warehousing
SQL Server 2008

Provides out-of-the-box SMP scale for ~10 TB






Many successful apps for 10-100 TB running today via
custom scale-out



Data compression – row and page
Star join, bitmap filters, partitioned table parallelism
Minimally Logged INSERT
MERGE
Resource governor – CPU and memory
E.g., Danske Bank, Clalit Health, US Dep. of Agriculture, PanSTARRS
Building out-of-the-box scale-out for 10s-100s of TB
This is the focus of this talk
11
UCI ISG Seminar
1/8/2010
Outline
 SQL
Server Data Platform
 Basic concepts – common ground


Assumptions
Workloads
 SQL




Server Parallel DW DBMS
Design Philosophy
Hardware Architecture
Software Architecture
Summary
12
UCI ISG Seminar
1/8/2010
Design Philosophy

Shared-nothing, distributed, parallel DBMS


Appliance concept



Implicit data and function partitioning
Software + hardware solution
Optimized for DW workloads
Scalable to Petabytes
13
UCI ISG Seminar
1/8/2010
Appliance Concept
Parallel DW
DBMS
System
Software
INDUSTRY STANDARD
SERVERS
Reference
Hardware
Platforms
INDUSTRY STANDARD
NETWORKING
INDUSTRY STANDARD
STORAGE
14
UCI ISG Seminar
1/8/2010
Hardware Architecture
Compute Nodes
Control Nodes
SQL
Active / Passive
SQL
SQL
SQL
SQL
SQL
SQL
ETL Load
Interface
Corporate
Backup
Solution
15
SQL
Dual Fiber Channel
Data Center
Monitoring
SQL
Dual Infiniband
Client Drivers
(ODBC, OLEDB, ADO.NET)
Spare Compute Node
1 Rack Server
Appliance
UCI ISG Seminar
1/8/2010
Node Types

Control node:




Where clients apps connect
Parallel engine runs here
Contains system metadata


Store user data
Perform query execution
Not accessible to outside world
Landing Zone:



Staging place for data loading
Accessible to outside world
Can be augmented with 3rd
party HW and SW
Backup node:



Compute nodes:




Backup file storage
Accessible to outside world
Can be augmented with 3rd
party HW and SW
Management node:



Windows domain controller
(Active Directory)
SW upgrades staging place
Holds SW images in case a
node needs reimaging
Software Architecture
Query
Tool
MS BI
(AS, RS)
DWSQL
Other
3rd Party
Tools
Internet
Explorer
IIS
Data Access
(OLEDB, ODBC, ADO.NET, JDBC)
Admin
Console
Compute Node
Compute Nodes
Compute Nodes
Data Movement Service
Data
Movement
Service
MPP Engine Coordinator
User Data
SQL Server
Landing Zone Node
DW
Authentication
DW
Configuration
DW
Schema
TempDB
Data Movement Service
SQL Server
Control
17 Node
UCI ISG Seminar
1/8/2010
Key Components

MPP Engine Coordinator









Provides single system image
SQL compilation
Global metadata and appliance
configuration
Global query optimization and plan
generation
Global query execution
coordination
Global transaction coordination
Authentication and authorization
Supportability (HW and SW status
info)



Parallel Loader

Run from the Landing Zone

SSIS or command line tool
Parallel Database Copy

High performance data export

Enables Hub-Spoke scenarios
Parallel Backup/Restore

Backup files stored on Backup
Nodes

Backup files may be archived into
external device/system
Data Movement Service


18
Data movement across the
appliance
Distributed query execution
operators
UCI ISG Seminar
1/8/2010
Query Processing

SQL statement compilation


Builds an MPP execution plan


A set of parallel QE steps
Executes the plan



Parsing, validation, optimization
Coordinates workflow among steps
Assembles the resultset
Returns resultset to client
Example – Schema TPCH
--------------------------------------------------------------------------------- Customer Table
-- distributiond on c_custkey
-------------------------------------------------------------------------------CREATE TABLE customer
( c_custkey
bigint,
c_name
varchar(25),
c_address
varchar(40),
c_nationkey integer,
c_phone
char(15),
c_acctbal
decimal(15,2),
c_mktsegment char(10),
c_comment
varchar(117))
WITH (distribution=hash(c_custkey)) ;
--------------------------------------------------------------------------------- Orders Table
-------------------------------------------------------------------------------CREATE TABLE orders
( o_orderkey
bigint,
o_custkey
bigint,
o_orderstatus
char(1),
o_totalprice
decimal(15,2),
o_orderdate
date,
o_orderpriority char(15),
o_clerk
char(15),
o_shippriority integer,
o_comment
varchar(79))
WITH (distribution=hash(o_orderkey)) ;
20
--------------------------------------------------------------------------------- LineItem Table
-- distributiond on l_orderkey
-------------------------------------------------------------------------------CREATE TABLE lineitem
( l_orderkey
bigint,
l_partkey
bigint,
l_suppkey
bigint,
l_linenumber bigint,
l_quantity
decimal(15,2),
l_extendedprice decimal(15,2),
l_discount
decimal(15,2),
l_tax
decimal(15,2),
l_returnflag char(1),
l_linestatus char(1),
l_shipdate
date,
l_commitdate date,
l_receiptdate date,
l_shipinstruct char(25),
l_shipmode
char(10),
l_comment
varchar(44))
WITH (distribution=hash(l_orderkey)) ;
UCI ISG Seminar
1/8/2010
Example - Query
SELECT
FROM
WHERE
GROUP
ORDER
TOP 10
L_ORDERKEY,
SUM(L_EXTENDEDPRICE*(1-L_DISCOUNT))
AS REVENUE,
O_ORDERDATE,
O_SHIPPRIORITY
CUSTOMER,
ORDERS,
LINEITEM
C_MKTSEGMENT = 'BUILDING' AND
C_CUSTKEY = O_CUSTKEY AND
L_ORDERKEY = O_ORDERKEY AND
O_ORDERDATE < '1995-03-05' AND
O_ORDERDATE >= '1994-09-15 00:00:00.000'
BY
L_ORDERKEY,
O_ORDERDATE,
O_SHIPPRIORITY
BY
REVENUE DESC,
O_ORDERDATE
;
21
UCI ISG Seminar
1/8/2010
Example – Execution Plan
------------------------------- Step 1: create temp table at control node
-----------------------------CREATE TABLE [tempdb].[dbo].[Q_[TEMP_ID_664]]
( [l_orderkey] BIGINT, [REVENUE] DECIMAL(38, 4),
[o_orderdate] DATE, [o_shippriority] INTEGER );
------------------------------- Step 2: create temp tables at all compute nodes
-----------------------------CREATE TABLE
[tempdb].[dbo].[Q_[TEMP_ID_665]_[PARTITION_ID]]
( [l_orderkey] BIGINT, [l_extendedprice] DECIMAL(15, 2),
[l_discount] DECIMAL(15, 2), [o_orderdate] DATE,
[o_shippriority] INTEGER, [o_custkey] BIGINT,
[o_orderkey] BIGINT )
WITH ( DISTRIBUTION = HASH([o_custkey]) );
-------------------------------- Step 3: SHUFFLE_MOVE
-------------------------------SELECT [l_orderkey], [l_extendedprice], [l_discount],
[o_orderdate], [o_shippriority], [o_custkey], [o_orderkey]
FROM [dwsys].[dbo].[orders] JOIN [dwsys].[dbo].[lineitem]
ON ([l_orderkey] = [o_orderkey])
WHERE ([o_orderdate] < '1995-03-05'
AND [o_orderdate] >= '1994-09-15 00:00:00.000')
INTO Q_[TEMP_ID_665]_[PARTITION_ID]
SHUFFLE ON (o_custkey);
22
------------------------------- Step 4: PARTITION_MOVE
-----------------------------SELECT [l_orderkey],
sum(([l_extendedprice] * (1 - [l_discount]))) AS REVENUE,
[o_orderdate], [o_shippriority]
FROM [dwsys].[dbo].[customer] JOIN
tempdb.Q_[TEMP_ID_665]_[PARTITION_ID]
ON ([c_custkey] = [o_custkey])
WHERE [c_mktsegment] = 'BUILDING'
GROUP BY [l_orderkey], [o_orderdate], [o_shippriority]
INTO Q_[TEMP_ID_664];
------------------------------- Step 5: Drop temp tables at all compute nodes
-----------------------------DROP TABLE tempdb.Q_[TEMP_ID_665]_[PARTITION_ID];
-------------------------------- Step 6: RETURN result to client
-------------------------------SELECT TOP 10 [l_orderkey], sum([REVENUE]) AS REVENUE,
[o_orderdate], [o_shippriority]
FROM tempdb.Q_[TEMP_ID_664]
GROUP BY [l_orderkey], [o_orderdate], [o_shippriority]
ORDER BY [REVENUE] DESC, [o_orderdate] ;
-------------------------------- Step 7: Drop temp table at control node
-------------------------------DROP TABLE tempdb.Q_[TEMP_ID_664];
UCI ISG Seminar
1/8/2010
Other Important Functionality

Fault tolerance

All HW components have redundancy:




CPUs, Disks, networks, power, storage processors
Control and Compute nodes use failover clustering
Management nodes have active & standby
Integration with SQL Server BI tools




23
SS Integration Services (ETL) has PDW as a destination
SS Analysis Services (OLAP) has PDW as a source
SS Reporting Services
Excel
UCI ISG Seminar
1/8/2010
Future Challenges

Richer DW analysis



Richer hub-and-spoke configurations



Execution strategies (DW)
Optimization techniques (DW)
Integration with other data services


Isolation levels, deadlocks, logs
Distributed, parallel query processing


MPP to MPP
Distributed transactions


Map-reduce-like functionality inside the cluster
Data mining, embedded analytics
Streaming
Increased HW architecture choices

24
Low-power clusters
UCI ISG Seminar
1/8/2010
Summary



SQL Server Data Platform overview
Workload types
SQL Server Parallel DW DBMS





Design Philosophy
System Architecture
Software Architecture
Query Example
Future challenges
25
UCI ISG Seminar
1/8/2010
Resources

Microsoft SQL Server 2008 case studies


SQL Server best practices



http://technet.microsoft.com/en-us/sqlserver/bb671430.aspx
Dave Salch, Eric Kraemer, Umair Waheed, Paul Dyke, Fast Track Data
Warehouse 2.0 Architecture, SQL Server Technical Article, MSDN,
Nov. 2009.
SQL Server Customer Advisory Team


http://www.microsoft.com/sqlserver/2008/en/us/case-studies.aspx
http://sqlcat.com/Default.aspx
Research on balanced HW architectures


26
Yogesh Simmhan, et. al. GrayWulf: Scalable Software Architecture for
Data Intensive Computing, HICSS pp.1-10, 42nd Hawaii International
Conference on System Sciences, 2009
Alexander S. Szalay, et al., Low Power Amdahl-Balanced Blades for
Data Intensive Computing, SC 2009.
UCI ISG Seminar
1/8/2010
THANKS!
27
UCI ISG Seminar
1/8/2010