Transcript SQL Server
SQL Server Parallel Data Warehouse: Supporting Large Scale Analytics José Blakeley, Software Architect Database Systems Group, Microsoft Corporation SQL Server PDW Overview 2 JHU DIR March 2011 3/18/2011 Workload Types Online Transaction Processing (OLTP) Balanced read-update ratio (60%-40%) Fine-grained inserts and updates High transaction throughput e.g., 10s K/s Usually very short transactions e.g., 1-3 tables Sometimes multi-step e.g., financial Relatively small data sizes e.g., few TBs Day-to-day business Data Warehousing and Business Analysis (DW) 3 Read-mostly (90%-10%) Few updates in place, high-volume bulk inserts Concurrent query throughput e.g., 10s K / hr Per query response time < 2 s Snowflake, star schemas are common e.g., 5-10 tables Complex queries (filter, join, group-by, aggregation) Very large data sizes e.g., 10s TB - PB JHU DIR March 2011 Analysis over historical data 3/18/2011 SQL Server Parallel Data Warehouse Shared-nothing, distributed, parallel DBMS Built-in data and query partitioning Provides single system view over a cluster of SQL Servers Appliance concept Software + hardware solution Choice of hardware vendors (e.g., HP, Dell, NEC) Optimized for DW workloads Bulk loads (1.2 – 2.0 TB/hr) Sequential scans (700 TB in 3hr) Scale from 10 Terabytes to Petabytes 4 1 rack manages ~40 TB 1 PB will need ~25 racks JHU DIR March 2011 3/18/2011 Hardware Architecture Compute Nodes Control Nodes SQL Active / Passive SQL SQL SQL SQL SQL SQL ETL Load Interface Corporate Backup Solution 5 SQL Dual Fiber Channel Data Center Monitoring SQL Dual Infiniband Client Drivers (ODBC, OLEDB, ADO.NET) Spare Compute Node 2 Rack Appliance JHU DIR March 2011 3/18/2011 Software Architecture Query Tool MS BI (AS, RS) DWSQL Internet Explorer 3rd Party Tools IIS Data Access (OLEDB, ODBC, ADO.NET, JDBC) Admin Console Compute Node Compute Nodes Compute Nodes Data Movement Service Data Movement Service PDW Engine User Data SQL Server Landing Zone Node DW Authentication DW Configuration DW Schema TempDB Data Movement Service SQL Server Control Node 6 JHU DIR March 2011 3/18/2011 Key Software Functionality PDW Engine Provides single system image SQL compilation Global metadata and appliance configuration Global query optimization and plan generation Global query execution coordination Global transaction coordination Authentication and authorization Supportability (HW and SW status info via DMVs) Parallel Loader Runs from the Landing Zone SSIS or command line tool Parallel Database Copy High performance data export Enables Hub-Spoke scenarios Parallel Backup/Restore Backup files stored on Backup Nodes Backup files may be archived into external device/system Data Movement Service 7 Data movement across the appliance Distributed query execution operators JHU DIR March 2011 3/18/2011 Query Processing SQL statement compilation Parsing, validation, optimization Builds an MPP execution plan A sequence of discrete parallel QE “steps” Steps involve SQL queries to be executed by SQL Server at each compute node As well as data movement steps Executes the plan Coordinates workflow among steps Assembles the result set Returns result set to client 8 JHU DIR March 2011 3/18/2011 18,000,048,306 rows Example DW Schema SELECT TOP 10 L_ORDERKEY, SUM(L_EXTENDEDPRICE*(1-L_DISCOUNT)) AS REVENUE, O_ORDERDATE, O_SHIPPRIORITY FROM CUSTOMER, ORDERS, LINEITEM 30,000,000 WHERE C_MKTSEGMENT = 'BUILDING' AND C_CUSTKEY = O_CUSTKEY AND L_ORDERKEY = O_ORDERKEY AND O_ORDERDATE < ‘2010-03-05' AND L_SHIPDATE > ‘2010-03-05' GROUP BY L_ORDERKEY, O_ORDERDATE, O_SHIPPRIORITY ORDER BY REVENUE DESC, O_ORDERDATE rows 4,500,000,000 rows 600,000,000 rows 2,400,000,000 rows 25 rows 9 5 rows JHU DIR March 2011 3/18/2011 7/17/2015 450,000,000 rows Example – Schema TPCH ----------------------------------------------------------------------- Customer Table -- distributed on c_custkey ---------------------------------------------------------------------CREATE TABLE customer ( c_custkey bigint, c_name varchar(25), c_address varchar(40), c_nationkey integer, c_phone char(15), c_acctbal decimal(15,2), c_mktsegment char(10), c_comment varchar(117)) WITH (distribution=hash(c_custkey)) ; ----------------------------------------------------------------------- Orders Table ---------------------------------------------------------------------CREATE TABLE orders ( o_orderkey bigint, o_custkey bigint, o_orderstatus char(1), o_totalprice decimal(15,2), o_orderdate date, o_orderpriority char(15), o_clerk char(15), o_shippriority integer, o_comment varchar(79)) WITH (distribution=hash(o_orderkey)) ; 10 ----------------------------------------------------------------------- LineItem Table -- distributed on l_orderkey ---------------------------------------------------------------------CREATE TABLE lineitem ( l_orderkey bigint, l_partkey bigint, l_suppkey bigint, l_linenumber bigint, l_quantity decimal(15,2), l_extendedprice decimal(15,2), l_discount decimal(15,2), l_tax decimal(15,2), l_returnflag char(1), l_linestatus char(1), l_shipdate date, l_commitdate date, l_receiptdate date, l_shipinstruct char(25), l_shipmode char(10), l_comment varchar(44)) WITH (distribution=hash(l_orderkey)) ; JHU DIR March 2011 3/18/2011 Example - Query Ten largest “building” orders shipped since March 5, 2010 SELECT TOP 10 L_ORDERKEY, SUM(L_EXTENDEDPRICE*(1-L_DISCOUNT)) AS REVENUE, O_ORDERDATE, O_SHIPPRIORITY FROM CUSTOMER, ORDERS, LINEITEM WHERE C_MKTSEGMENT = 'BUILDING' AND C_CUSTKEY = O_CUSTKEY AND L_ORDERKEY = O_ORDERKEY AND O_ORDERDATE < ‘2010-03-05' AND L_SHIPDATE > ‘2010-03-05' GROUP BY L_ORDERKEY, O_ORDERDATE, O_SHIPPRIORITY ORDER BY REVENUE DESC, O_ORDERDATE 11 JHU DIR March 2011 3/18/2011 Example – Execution Plan ------------------------------- Step 1: create temp table at control node -----------------------------CREATE TABLE [tempdb].[dbo].[Q_[TEMP_ID_664]] ( [l_orderkey] BIGINT, [REVENUE] DECIMAL(38, 4), [o_orderdate] DATE, [o_shippriority] INTEGER ); ------------------------------- Step 2: create temp tables at all compute nodes -----------------------------CREATE TABLE [tempdb].[dbo].[Q_[TEMP_ID_665]_[PARTITION_ID]] ( [l_orderkey] BIGINT, [l_extendedprice] DECIMAL(15, 2), [l_discount] DECIMAL(15, 2), [o_orderdate] DATE, [o_shippriority] INTEGER, [o_custkey] BIGINT, [o_orderkey] BIGINT ) WITH ( DISTRIBUTION = HASH([o_custkey]) ); -------------------------------- Step 3: SHUFFLE_MOVE -------------------------------SELECT [l_orderkey], [l_extendedprice], [l_discount], [o_orderdate], [o_shippriority], [o_custkey], [o_orderkey] FROM [dwsys].[dbo].[orders] JOIN [dwsys].[dbo].[lineitem] ON ([l_orderkey] = [o_orderkey]) WHERE ([o_orderdate] < ‘2010-03-05' AND [o_orderdate] >= ‘2010-09-15 00:00:00.000') INTO Q_[TEMP_ID_665]_[PARTITION_ID] SHUFFLE ON (o_custkey); 12 ------------------------------- Step 4: PARTITION_MOVE -----------------------------SELECT [l_orderkey], sum(([l_extendedprice] * (1 - [l_discount]))) AS REVENUE, [o_orderdate], [o_shippriority] FROM [dwsys].[dbo].[customer] JOIN tempdb.Q_[TEMP_ID_665]_[PARTITION_ID] ON ([c_custkey] = [o_custkey]) WHERE [c_mktsegment] = 'BUILDING' GROUP BY [l_orderkey], [o_orderdate], [o_shippriority] INTO Q_[TEMP_ID_664]; ------------------------------- Step 5: Drop temp tables at all compute nodes -----------------------------DROP TABLE tempdb.Q_[TEMP_ID_665]_[PARTITION_ID]; -------------------------------- Step 6: RETURN result to client -------------------------------SELECT TOP 10 [l_orderkey], sum([REVENUE]) AS REVENUE, [o_orderdate], [o_shippriority] FROM tempdb.Q_[TEMP_ID_664] GROUP BY [l_orderkey], [o_orderdate], [o_shippriority] ORDER BY [REVENUE] DESC, [o_orderdate] ; -------------------------------- Step 7: Drop temp table at control node -------------------------------DROP TABLE tempdb.Q_[TEMP_ID_664]; JHU DIR March 2011 3/18/2011 Microsoft Column-store Technology VertiPaq and VertiScan In-memory BI (IMBI) Slides by Amir Netz 15 JHU DIR March 2011 3/18/2011 In-Memory BI Technology Developed by SQL Analysis Services (OLAP) team Column-based storage and processing Compression (VertiPaq) Only touch the columns needed for the query Columnar data is more compressible than row data Fast in-memory processing (VertiScan) 16 Filter, grouping, aggregation, sorting JHU DIR March 2011 3/18/2011 How VertiPaq Compression Works Read Raw Data Organize by Columns Phase I: Encoding Convert to uniform representation (Integer Vectors) Encoding is per column Dictionary Encoding 2x – 10x size reduction Value Encoding 1x – 2x size reduction VertiPaq Compression Compression Analysis Hybrid RLE Phase II: Compression Minimize storage space Compression is per 8M row segments 17 75%-95% of data Run Length Encoding (RLE) JHU size DIRreduction March 2011 ~100x 5%-25% of data Bit Packing 2x – 4x size reduction 3/18/2011 436,892,631 rows TECSPURSL00 (dbo) (Read-only) APPX – 1TB FTxlatOrgId CTxlatOrgId TRCreditedSubsidiaryId BillingCurrencyID Star Join Schema 34 rows Region (dbo) (Read-only) ProductId DataSourceId SalesDateId AreaId BillingMonthSalesDateID RegionCode ActualQuantityCnt RegionName ActualLicenseCnt SecondaryLicenseCnt ActualRevenueAmt AdjustedGrossRevenueAmt JointVentureRevenueAmt LicenseTransactionItemId GeographyId JointVentureActualQuantityCnt JointVentureActualLicenseCnt JointVentureSecondaryLicenseCnt BillDocTypeID SalesOfficeID ReasonID BillingStatusID TopParentProductID ParentProductID BundleFlagID ExtractListID ServiceRevenueAmt ServiceActualQuantityCnt ServiceAdjustedRevenueAmt FiscalYearName SalesDate (dbo) (Read-only) SalesDateId FiscalWeekID FiscalMonthID FiscalQuarterID FiscalYearID FiscalQuarterName FiscalMonthName FiscalYearName CalendarMonthName FWBeginDate FWEndDate CalendarDate CalendarDateName MonthRelativeID FMBeginDate FMEndDate TransactionDataPopulated FiscalSemesterID FiscalSemesterName AggregatedSalesDateID RevenueCommitmentMultiplier LicenseCommitmentMultiplier CommitmentPeriod DataSourcePlanningFlag AdvisorID JHU DIR March 2011 FiscalYearId RecordTypeId AdjustedLicenseCnt 18 FiscalYear (dbo) (Read-only) ManagementReportingId RegionId SELECT FE0.RegionName, FL0.FiscalYearName, SUM (A.ActualRevenueAmt) UpperGeography (dbo) (Read-only) CreditedSubsidiaryID FROM TECSPURSL00 A CreditedSubRegionID JOIN SalesDate L CreditedRegionID ON A.SalesDateID = L.SalesDateID CreditedAreaID CreditedBigAreaID JOIN UpperGeography UG ON A.TRCreditedSubsidiaryId = UG.CreditedSubsidiaryID 118 rows JOIN Region FE0 ON UG.CreditedRegionID = FE0.RegionID JOIN FiscalYear FL0 ON L.FiscalYearID = FL0.FiscalYearID GROUP BY FE0.RegionName, FL0.FiscalYearName 41 rows OrderStatusID 13,517 rows 3/18/2011 7/17/2015 Column-Store on APPX Time in seconds SQL Server 2008 vs. CS index Comparison 9.00 8.00 7.00 6.00 5.00 4.00 3.00 2.00 1.00 0.00 Q1 SQL 0.34 IMBI 1.54 Q2 0.51 1.15 Q4 0.36 0.87 Q4 2.67 0.73 Q5 2.05 0.85 Q6 0.27 0.84 Q7 0.74 0.89 Q8 2.64 1.13 Q9 8.52 5.41 Q10 3.15 0.89 Q11 2.12 1.68 Q12 2.29 1.31 Response time < 2s common Smaller variance in response time more predictable query performance 19 JHU DIR March 2011 3/18/2011 THANKS! 20 JHU DIR March 2011 3/18/2011