Transcript SQL Server
SQL Server: A Data Platform for
Large-Scale Applications
José Blakeley, Software Architect
Database Systems Group, Microsoft Corporation
SQL Server Design Philosophy
To solve 90+% of the problems automatically
Self Tuning
Dynamic Optimization
Self Configuration
Self Management
Self Healing
The remaining percentage takes higher touch
Requires careful design for scalability
Good knowledge of how the RDBMS platform works
This talk describes how we are codifying our solution for
large-scale DW into SQL Server
2
UCI ISG Seminar
1/8/2010
SQL Data Platform
Any Place, Any Type, Any Scale
Enterprise Data Platform
Beyond Relational
Dynamic Development
Pervasive Insight
4
UCI ISG Seminar
1/8/2010
Outline
SQL
Server Data Platform
Basic concepts – common ground
Assumptions
Workloads
SQL Server Parallel DW DBMS
Philosophy
System Architecture
Software Architecture
Summary
5
UCI ISG Seminar
1/8/2010
Workload Types
Online Transaction Processing (OLTP)
Balanced read-update ratio (60%-40%)
Fine-grained inserts and updates
OLTP =throughput
Day-to-day e.g., 10s K/s
High transaction
business
Usually very short transactions e.g., 1-3 tables
Sometimes multi-step e.g., financial
Relatively small data sizes e.g., few TBs
Data Warehousing and Business Analysis (DW)
7
Read-mostly (90%-10%)
Few updates in place, high-volume bulk inserts
Concurrent query throughput e.g., 10sDW
K / hr
= Analysis over
Per query response time < 2 s
recorded data
Snowflake, star schemas are common e.g., 5-10 tables
Complex queries (filter, join, group-by, aggregation)
Very large data sizes e.g., 10s TB - PB
UCI ISG Seminar
1/8/2010
436,892,631 rows
Star Join Schema
TECSPURSL00 (dbo) (Read-only)
FTxlatOrgId
CTxlatOrgId
TRCreditedSubsidiaryId
BillingCurrencyID
34 rows
Region (dbo) (Read-only)
ProductId
DataSourceId
FiscalYear (dbo) (Read-only)
FiscalYearId
FiscalYearName
ManagementReportingId
RecordTypeId
RegionId
SalesDateId
AreaId
BillingMonthSalesDateID
RegionCode
SELECT FE0.RegionName,
RegionName
FL0.FiscalYearName,
SUM (A.ActualRevenueAmt)
FROM TECSPURSL00 A
JOIN SalesDate L
ON A.SalesDateID = L.SalesDateID
UpperGeography (dbo) (Read-only)
JOIN UpperGeography UG
CreditedSubsidiaryID
CreditedSubRegionID
ON A.TRCreditedSubsidiaryId =
CreditedRegionID
UG.CreditedSubsidiaryID
CreditedAreaID
CreditedBigAreaID
JOIN Region FE0
ON UG.CreditedRegionID =
FE0.RegionID
118 rows
JOIN FiscalYear FL0
ON L.FiscalYearID = FL0.FiscalYearID
GROUP BY FE0.RegionName, FL0.FiscalYearName
41 rows
ActualQuantityCnt
ActualLicenseCnt
SecondaryLicenseCnt
AdjustedLicenseCnt
ActualRevenueAmt
AdjustedGrossRevenueAmt
JointVentureRevenueAmt
LicenseTransactionItemId
GeographyId
JointVentureActualQuantityCnt
JointVentureActualLicenseCnt
JointVentureSecondaryLicenseCnt
BillDocTypeID
SalesOfficeID
ReasonID
BillingStatusID
TopParentProductID
ParentProductID
BundleFlagID
ExtractListID
ServiceRevenueAmt
ServiceActualQuantityCnt
ServiceAdjustedRevenueAmt
SalesDate (dbo) (Read-only)
SalesDateId
FiscalWeekID
FiscalMonthID
FiscalQuarterID
FiscalYearID
FiscalQuarterName
FiscalMonthName
FiscalYearName
CalendarMonthName
FWBeginDate
FWEndDate
CalendarDate
CalendarDateName
MonthRelativeID
FMBeginDate
FMEndDate
TransactionDataPopulated
FiscalSemesterID
FiscalSemesterName
AggregatedSalesDateID
RevenueCommitmentMultiplier
LicenseCommitmentMultiplier
CommitmentPeriod
DataSourcePlanningFlag
AdvisorID
8
UCI ISG Seminar
OrderStatusID
13,517 rows
1/8/2010
7/7/2015
Why look at workloads?
$B
8
6
OLTP
Mixed
DW
4
2
1995
9
2000
2005
2010
DBMS architectures of 30
years ago designed for
mixed workloads
OLTP and DW workloads
have grown into big market
segments
Workload-specific engines
can be profitable
Innovations are driven by
deeper understanding of
workloads
UCI ISG Seminar
1/8/2010
Changing TPC-H Landscape
68x perf
12% cost
New DW startups disrupting TPC-H
Column-store, compression, and scale-out
10
UCI ISG Seminar
1/8/2010
Data Warehousing
SQL Server 2008
Provides out-of-the-box SMP scale for ~10 TB
Many successful apps for 10-100 TB running today via
custom scale-out
Data compression – row and page
Star join, bitmap filters, partitioned table parallelism
Minimally Logged INSERT
MERGE
Resource governor – CPU and memory
E.g., Danske Bank, Clalit Health, US Dep. of Agriculture, PanSTARRS
Building out-of-the-box scale-out for 10s-100s of TB
This is the focus of this talk
11
UCI ISG Seminar
1/8/2010
Outline
SQL
Server Data Platform
Basic concepts – common ground
Assumptions
Workloads
SQL
Server Parallel DW DBMS
Design Philosophy
Hardware Architecture
Software Architecture
Summary
12
UCI ISG Seminar
1/8/2010
Design Philosophy
Shared-nothing, distributed, parallel DBMS
Appliance concept
Implicit data and function partitioning
Software + hardware solution
Optimized for DW workloads
Scalable to Petabytes
13
UCI ISG Seminar
1/8/2010
Appliance Concept
Parallel DW
DBMS
System
Software
INDUSTRY STANDARD
SERVERS
Reference
Hardware
Platforms
INDUSTRY STANDARD
NETWORKING
INDUSTRY STANDARD
STORAGE
14
UCI ISG Seminar
1/8/2010
Hardware Architecture
Compute Nodes
Control Nodes
SQL
Active / Passive
SQL
SQL
SQL
SQL
SQL
SQL
ETL Load
Interface
Corporate
Backup
Solution
15
SQL
Dual Fiber Channel
Data Center
Monitoring
SQL
Dual Infiniband
Client Drivers
(ODBC, OLEDB, ADO.NET)
Spare Compute Node
1 Rack Server
Appliance
UCI ISG Seminar
1/8/2010
Node Types
Control node:
Where clients apps connect
Parallel engine runs here
Contains system metadata
Store user data
Perform query execution
Not accessible to outside world
Landing Zone:
Staging place for data loading
Accessible to outside world
Can be augmented with 3rd
party HW and SW
Backup node:
Compute nodes:
Backup file storage
Accessible to outside world
Can be augmented with 3rd
party HW and SW
Management node:
Windows domain controller
(Active Directory)
SW upgrades staging place
Holds SW images in case a
node needs reimaging
Software Architecture
Query
Tool
MS BI
(AS, RS)
DWSQL
Other
3rd Party
Tools
Internet
Explorer
IIS
Data Access
(OLEDB, ODBC, ADO.NET, JDBC)
Admin
Console
Compute Node
Compute Nodes
Compute Nodes
Data Movement Service
Data
Movement
Service
MPP Engine Coordinator
User Data
SQL Server
Landing Zone Node
DW
Authentication
DW
Configuration
DW
Schema
TempDB
Data Movement Service
SQL Server
Control
17 Node
UCI ISG Seminar
1/8/2010
Key Components
MPP Engine Coordinator
Provides single system image
SQL compilation
Global metadata and appliance
configuration
Global query optimization and plan
generation
Global query execution
coordination
Global transaction coordination
Authentication and authorization
Supportability (HW and SW status
info)
Parallel Loader
Run from the Landing Zone
SSIS or command line tool
Parallel Database Copy
High performance data export
Enables Hub-Spoke scenarios
Parallel Backup/Restore
Backup files stored on Backup
Nodes
Backup files may be archived into
external device/system
Data Movement Service
18
Data movement across the
appliance
Distributed query execution
operators
UCI ISG Seminar
1/8/2010
Query Processing
SQL statement compilation
Builds an MPP execution plan
A set of parallel QE steps
Executes the plan
Parsing, validation, optimization
Coordinates workflow among steps
Assembles the resultset
Returns resultset to client
Example – Schema TPCH
--------------------------------------------------------------------------------- Customer Table
-- distributiond on c_custkey
-------------------------------------------------------------------------------CREATE TABLE customer
( c_custkey
bigint,
c_name
varchar(25),
c_address
varchar(40),
c_nationkey integer,
c_phone
char(15),
c_acctbal
decimal(15,2),
c_mktsegment char(10),
c_comment
varchar(117))
WITH (distribution=hash(c_custkey)) ;
--------------------------------------------------------------------------------- Orders Table
-------------------------------------------------------------------------------CREATE TABLE orders
( o_orderkey
bigint,
o_custkey
bigint,
o_orderstatus
char(1),
o_totalprice
decimal(15,2),
o_orderdate
date,
o_orderpriority char(15),
o_clerk
char(15),
o_shippriority integer,
o_comment
varchar(79))
WITH (distribution=hash(o_orderkey)) ;
20
--------------------------------------------------------------------------------- LineItem Table
-- distributiond on l_orderkey
-------------------------------------------------------------------------------CREATE TABLE lineitem
( l_orderkey
bigint,
l_partkey
bigint,
l_suppkey
bigint,
l_linenumber bigint,
l_quantity
decimal(15,2),
l_extendedprice decimal(15,2),
l_discount
decimal(15,2),
l_tax
decimal(15,2),
l_returnflag char(1),
l_linestatus char(1),
l_shipdate
date,
l_commitdate date,
l_receiptdate date,
l_shipinstruct char(25),
l_shipmode
char(10),
l_comment
varchar(44))
WITH (distribution=hash(l_orderkey)) ;
UCI ISG Seminar
1/8/2010
Example - Query
SELECT
FROM
WHERE
GROUP
ORDER
TOP 10
L_ORDERKEY,
SUM(L_EXTENDEDPRICE*(1-L_DISCOUNT))
AS REVENUE,
O_ORDERDATE,
O_SHIPPRIORITY
CUSTOMER,
ORDERS,
LINEITEM
C_MKTSEGMENT = 'BUILDING' AND
C_CUSTKEY = O_CUSTKEY AND
L_ORDERKEY = O_ORDERKEY AND
O_ORDERDATE < '1995-03-05' AND
O_ORDERDATE >= '1994-09-15 00:00:00.000'
BY
L_ORDERKEY,
O_ORDERDATE,
O_SHIPPRIORITY
BY
REVENUE DESC,
O_ORDERDATE
;
21
UCI ISG Seminar
1/8/2010
Example – Execution Plan
------------------------------- Step 1: create temp table at control node
-----------------------------CREATE TABLE [tempdb].[dbo].[Q_[TEMP_ID_664]]
( [l_orderkey] BIGINT, [REVENUE] DECIMAL(38, 4),
[o_orderdate] DATE, [o_shippriority] INTEGER );
------------------------------- Step 2: create temp tables at all compute nodes
-----------------------------CREATE TABLE
[tempdb].[dbo].[Q_[TEMP_ID_665]_[PARTITION_ID]]
( [l_orderkey] BIGINT, [l_extendedprice] DECIMAL(15, 2),
[l_discount] DECIMAL(15, 2), [o_orderdate] DATE,
[o_shippriority] INTEGER, [o_custkey] BIGINT,
[o_orderkey] BIGINT )
WITH ( DISTRIBUTION = HASH([o_custkey]) );
-------------------------------- Step 3: SHUFFLE_MOVE
-------------------------------SELECT [l_orderkey], [l_extendedprice], [l_discount],
[o_orderdate], [o_shippriority], [o_custkey], [o_orderkey]
FROM [dwsys].[dbo].[orders] JOIN [dwsys].[dbo].[lineitem]
ON ([l_orderkey] = [o_orderkey])
WHERE ([o_orderdate] < '1995-03-05'
AND [o_orderdate] >= '1994-09-15 00:00:00.000')
INTO Q_[TEMP_ID_665]_[PARTITION_ID]
SHUFFLE ON (o_custkey);
22
------------------------------- Step 4: PARTITION_MOVE
-----------------------------SELECT [l_orderkey],
sum(([l_extendedprice] * (1 - [l_discount]))) AS REVENUE,
[o_orderdate], [o_shippriority]
FROM [dwsys].[dbo].[customer] JOIN
tempdb.Q_[TEMP_ID_665]_[PARTITION_ID]
ON ([c_custkey] = [o_custkey])
WHERE [c_mktsegment] = 'BUILDING'
GROUP BY [l_orderkey], [o_orderdate], [o_shippriority]
INTO Q_[TEMP_ID_664];
------------------------------- Step 5: Drop temp tables at all compute nodes
-----------------------------DROP TABLE tempdb.Q_[TEMP_ID_665]_[PARTITION_ID];
-------------------------------- Step 6: RETURN result to client
-------------------------------SELECT TOP 10 [l_orderkey], sum([REVENUE]) AS REVENUE,
[o_orderdate], [o_shippriority]
FROM tempdb.Q_[TEMP_ID_664]
GROUP BY [l_orderkey], [o_orderdate], [o_shippriority]
ORDER BY [REVENUE] DESC, [o_orderdate] ;
-------------------------------- Step 7: Drop temp table at control node
-------------------------------DROP TABLE tempdb.Q_[TEMP_ID_664];
UCI ISG Seminar
1/8/2010
Other Important Functionality
Fault tolerance
All HW components have redundancy:
CPUs, Disks, networks, power, storage processors
Control and Compute nodes use failover clustering
Management nodes have active & standby
Integration with SQL Server BI tools
23
SS Integration Services (ETL) has PDW as a destination
SS Analysis Services (OLAP) has PDW as a source
SS Reporting Services
Excel
UCI ISG Seminar
1/8/2010
Future Challenges
Richer DW analysis
Richer hub-and-spoke configurations
Execution strategies (DW)
Optimization techniques (DW)
Integration with other data services
Isolation levels, deadlocks, logs
Distributed, parallel query processing
MPP to MPP
Distributed transactions
Map-reduce-like functionality inside the cluster
Data mining, embedded analytics
Streaming
Increased HW architecture choices
24
Low-power clusters
UCI ISG Seminar
1/8/2010
Summary
SQL Server Data Platform overview
Workload types
SQL Server Parallel DW DBMS
Design Philosophy
System Architecture
Software Architecture
Query Example
Future challenges
25
UCI ISG Seminar
1/8/2010
Resources
Microsoft SQL Server 2008 case studies
SQL Server best practices
http://technet.microsoft.com/en-us/sqlserver/bb671430.aspx
Dave Salch, Eric Kraemer, Umair Waheed, Paul Dyke, Fast Track Data
Warehouse 2.0 Architecture, SQL Server Technical Article, MSDN,
Nov. 2009.
SQL Server Customer Advisory Team
http://www.microsoft.com/sqlserver/2008/en/us/case-studies.aspx
http://sqlcat.com/Default.aspx
Research on balanced HW architectures
26
Yogesh Simmhan, et. al. GrayWulf: Scalable Software Architecture for
Data Intensive Computing, HICSS pp.1-10, 42nd Hawaii International
Conference on System Sciences, 2009
Alexander S. Szalay, et al., Low Power Amdahl-Balanced Blades for
Data Intensive Computing, SC 2009.
UCI ISG Seminar
1/8/2010
THANKS!
27
UCI ISG Seminar
1/8/2010