Trends in the DW space Understanding the Opportunity Approximate data volume managed by DW Less than 1TB 41% 17% Performance at scale: ability to analyze.
Download ReportTranscript Trends in the DW space Understanding the Opportunity Approximate data volume managed by DW Less than 1TB 41% 17% Performance at scale: ability to analyze.
Trends in the DW space Understanding the Opportunity Approximate data volume managed by DW Less than 1TB 41% 17% Performance at scale: ability to analyze massive amounts of data 21% 18% 1 - 3 TB 19% 3 - 10 TB 25% 17% More than 10 TB 2% Don't Know 0% scalability 10s of TBs, to 100s of TB, to PBs 34% 6% 10% Today 20% 30% In 3 years Source: TDWI Report – Next Generation DW 40% 50% from Understanding the Opportunity DW Software License Revenue US$ Billions CAGR 14 Share(‘15) 12 Public Cloud 7.1% 4.6% 10 Private Cloud 7.1% 5.0% Appliances/RA 26.2% 30.0% Traditional -0.3% 60.4% 8 1.1 1.5 1.9 2.4 3 3.8 6 4 7.9 8 8.2 8.2 8.1 7.7 FY11 FY12 FY13 FY14 FY15 2 0 FY10 Appliances are the key trend in the next 4 years (4 Billion market by ‘15) Source: MS internal analysis, DBSMIT Cloud Market Opportunity Forecast Trends in the DW space How does SQL Server PDW fit in? SQL Server Data Warehousing in Appliance Model Scale out Control Rack Compute Racks (1/2 to 4) Storage Nodes Compute Nodes Control Nodes Active / Passive Data Center Monitoring Dual Infiniband Management Nodes Dual Fiber Channel Client Drivers Landing Zone ETL Load Interface Backup Node Corporate Backup Solution Spare Compute Node Corporate Network Appliance Network PDW Compute Nodes Time Dim Product Dim Date Dim ID Calendar Year Calendar Qtr Calendar Mo Calendar Day Prod Dim ID Prod Category Prod Sub Cat Prod Desc SQL Sales Facts Date Dim ID Store Dim ID Prod Dim ID Mktg Camp Id Qty Sold Dollars Sold Store Dim Store Dim ID Store Name Store Mgr Store Size SQL SQL Mktg Campaign Dim Mktg Camp ID Camp Name Camp Mgr Camp Start Camp End SQL Time Dim Product Dim Date Dim ID Calendar Year Calendar Qtr Calendar Mo Calendar Day Prod Dim ID Prod Category Prod Sub Cat Prod Desc SQL Sales Facts Date Dim ID Store Dim ID Prod Dim ID Mktg Camp Id Qty Sold Dollars Sold Store Dim Store Dim ID Store Name Store Mgr Store Size Smaller Dimension Tables are Replicated on Every Compute Node SQL SQL Mktg Campaign Dim Mktg Camp ID Camp Name Camp Mgr Camp Start Camp End SQL Time Dim Product Dim Date Dim ID Calendar Year Calendar Qtr Calendar Mo Calendar Day Prod Dim ID Prod Category Prod Sub Cat Prod Desc SQL Sales Facts Date Dim ID Store Dim ID Prod Dim ID Mktg Camp Id Qty Sold Dollars Sold Store Dim Store Dim ID Store Name Store Mgr Store Size Larger Fact Table is Hash Distributed Across All Compute Nodes SQL SQL Mktg Campaign Dim Mktg Camp ID Camp Name Camp Mgr Camp Start Camp End SQL A quick look at MPP query execution SQL Server PDW Appliance connects to ‘the appliance’ like he would to a ‘normal’ SQL Server steps are executed on each compute node shared nothing MPP system generates a distributed execution plan . Data Movement Service Compute Node 2 Compute Node 1 Distributed Table 1 Red 5 Red 5 3 Blue 11 Red 8 5 Red 12 Red 12 7 Green 7 Green 7 2 Red 8 Blue 11 4 Blue 10 Blue 10 6 Yellow 12 Yellow 12 Blue 21 Green 7 Red 25 Yellow 12 Overall Architecture Control Rack Data Rack (up to 4) Compute Node 1 Control Node Client Interface (JDBC, ODBC, OLE-DB, ADO.NET) PDW Engine DMS Manager PDW Agent DMS Core PDW Agent Compute Node 2 DMS Core PDW Agent Landing Zone Node … ETL Interface Bulk Data Loader PDW Agent Management Node Legend: PDW service PDW = Parallel Data Warehouse DMS = Data Movement Service Active Directory PDW Agent Compute Node 10 DMS Core PDW Agent Trends in the DW space How does SQL Server PDW fit in? SQL Server PDW AU3 – What’s new? Release Themes How did it work before? Control Node focus on MPP related challenges PDW AU3 Architecture with Shell Appliance and Cost-Based Query Optimizer foo SELECT Shell Appliance (SQL Server) Control Node SELECT Engine Service Plan Steps foo Plan Steps Plan Steps Compute Node (SQL Server) Compute Node (SQL Server) Compute Node (SQL Server) foo foo PDW AU3 Architecture with Shell Appliance and Cost-Based Query Optimizer Control Node SELECT SELECT Shell Appliance (SQL Server) MEMO Return Engine Service Plan Steps Plan Steps Plan Steps Compute Node (SQL Server) Compute Node (SQL Server) Compute Node (SQL Server) Optimizer lifecycle… 1. Simplification and space exploration Query standardization and simplification (e.g. column reduction, predicates push-down) Logical space exploration (e.g. join re-ordering, local/global aggregation) Space expansion (e.g. bushy trees – dealing with intermediate resultsets) Physical space exploration Serializing MEMO into binary XML (logical plans) De-serializing binary XML into PDW Memo 2. Parallel optimization and pruning Injecting data move operations (expansion) Costing different alternatives Pruning and selecting lowest cost distributed plan 3. SQL Generation Generating SQL Statements to be executed … And Cost Model Details AU2 to AU3 80 70 60 50 AU2 40 AU3 30 20 10 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 5x improvement in terms of total elapsed time out of the box Zero data conversions in data movement DMS CPU Utilization - TPCH 60 Functionality Using ODBC instead of ADO.NET for reading and writing data Minimizing appliance resource utilization for data moves Benefits Better resource, CPU, utilization 6x or more faster move operations Increased concurrency Mixed workload (loads + queries) 40 20 0 Q1 Q2 Q3 Q4 Q5 Q6 Q7 Q8 Q9 Q10 Q11 Q12 Q13 Q14 Q15 Q16 Q17 Q18 Q19 Q20 Q21 Q22 Eliminate CPU utilization spent on data conversions Further parallelize operations during data moves CPU (%) Goal AU2 AU3 Throughput improvement for data movements Repl Table… Shuffle Replicate Trim Broadcast 0% 200% 400% 600% SQL Server Security and Metadata Security SQL Server security syntax and semantics Supporting user, roles and logins Fixed database roles Allows script re-use Allows well-known security methods Metadata PDW metadata stored in SQL Server Existing SQL Server metadata tables/views (e.g. security views) PDW distribution info as extended properties in SQL Server metadata Existing means and technology for persisting metadata Improved 3rd party tool compatibility (BI, ETL) Support for SQL Server (Native) Client Goal ‘Look’ just like a normal SQL Server Better integration with other BI tools Functionality Use existing SQL Server drivers to connect to SQL Server PDW Implement SQL Server TDS protocol Named Parameter support SQLCMD connectivity to PDW Benefits Use known tools and proven technology stack Existing SQL Server ’eco-system’ 2x performance improvement for return operations 5x reduction of connection time Server: 10.217.165.13, 17001 SQL Server Clients TDS (ADO.NET, ODBC, OLE-DB, JDBC) SequeLink SQL PDW Clients (ODBC, OLE-DB, ADO.NET) Server: 10.217.165.13, 17000 Stored Procedure Support (Subset) Goal Support common scenarios of code encapsulation and reuse in Reporting and ETL Functionality System and user-defined stored procedures Invocation using RPC or EXECUTE Control flow logic, input parameters Benefits Enables common logic re-use Big impact for Reporting Services scenarios Allows porting existing scripts Increases compatibility with SQL Server Collations Goal Support local and international data Functionality Fixed server level collation User-defined column level collation Supporting all Windows collations Allow COLLATE clauses in Queries and DML Benefits Store all the data in PDW w/ additional querying flexibility Existing T-SQL DDL and Query scripts SQL Server alignment and functionality SQL Server PDW Connectors Connector for Hadoop Bi-directional (import/export) interface between MSFT Hadoop and PDW Delimited file support Adapter uses existing PDW tools (bulk loader, dwsql) Low cost solution that handles all the data: structured and unstructured Additional agility, flexibility and choice Connector for Informatica Connector providing PDW source and target (mappings, transformations) Informatica uses PDW bulk loader for fast loads Leverage existing toolset and knowledge Connector for Business Objects Trends in the DW space How does SQL Server PDW fit in? SQL Server PDW AU3 – What’s new? Building BI Solutions with SQL Server PDW Customer Successes Original Customer SMP solution vs. PDW AU3 (with cost-based query optimizer) 1600 1400 1200 1000 Old SMP 800 POS ODS AU3 600 400 200 0 Q1 Q2 Q3 Q4 Q5 Q6 Q7 How are customers using PDW & BI ? Data Volume 80 TB data Portal Requirement hourly loads BI Integration 300GB/day AU3 Feedback T-SQL support painless Enabled queries Reports ETL Operational DB’s Dashboards PDW Scorecards How are customers using PDW & BI ? Data Volume 36 TB data warehouse 7 year data window ClickStream Nielsen Requirements Scalability - growing data volume does not affect performance Performance and ad-hoc analysis for interactive querying by users BI Integration with Microsoft BI stack - SSAS and SSRS AU3 Feedback SSAS cubes worked ‘out-of-box’ ~30x PDW SSAS OLTP system Trends in the DW space How does SQL Server PDW fit in? SQL Server PDW AU3 – What’s new? Building BI Solutions with SQL Server PDW Customer Successes Using SQL Server PDW with Microsoft BI solutions Fast parallel Infiniband Aggregation abilities avoids ETL overhead in existing systems No need indexes indexed/materialized views Infiniband GBit link Understanding the differences compared to ‘SMP world’ foreign key constraints data design retrieval planning MOLAP & ROLAP limits include required data Trends in the DW space How does SQL Server PDW fit in? SQL Server PDW AU3 – What’s new? Building BI Solutions with SQL Server PDW Customer Successes Using SQL Server PDW with third party BI solutions CURRENT_TIMESTAMP , @@DATEFIRST, SET OPTION … Trends in the DW space How does SQL Server PDW fit in? SQL Server PDW AU3 – What’s new? Building BI Solutions with SQL Server PDW Customer Successes Using SQL Server PDW with Microsoft BI solutions BI solutions leveraging Hadoop integration HADOOP Sensor/ RFID Data Blogs, Docs Web Data In the context of ETL , BI , and DW Fast ETL processing CostOptimal storage HADOOP Fast Refinery Active Archive Application Programmers DBMS Admin Power BI Users SQOOP HADOOP Sensor/ Blogs, Web RFID Data Docs Data SQL Server PDW Interactive BI/Data Visualization SQOOP export with source (HDFS path) & target (PDW DB & table) 2. Read HDFS data via mappers 1. 3. Server PDW Hadoop Connector 5. … 4. Telnet Server … HDFS Copies incoming data on Landing Zone FTP Invokes ‘DWLoader’ Compute Node 1 Landing Zone PDWconfiguration file Linux/ Windows/ Hadoop PDW Control Compute Nodes Node Compute Node 8 Trends in the DW space How does SQL Server PDW fit in? SQL Server PDW AU3 – What’s new? Building BI Solutions with SQL Server PDW What’s coming next in SQL Server PDW? What is coming next? CALENDAR YEAR 2011 Q1 Shipped Appliance Update 1 • Improved node manageability • Better performance and reduced overhead • OEM requests Q3 Q2 Shipped CALENDAR YEAR 2012 Q4 Appliance Update 2 • Programmability • Batches • Control flow • Variables • Temp tables • QDR infiniband switch • Onboard Dell Q1 Shipped Q2 Q3 Q4 V-Next Appliance Update 3 • Cost based optimizer • Native SQL Server drivers, including JDBC • Collations • More expressive query language • Data Movement Services performance • SCOM pack • Stored procedures (subset) • Half-rack • 3rd party integration (Informatica, MicroStrategy, Business Objects, HADOOP) • • • • • • • • • Columnar store index Stored procedures Integrated Authentication PowerView integration Workload management LZ/BU redundancy Windows 8 SQL Server 2012 Hardware refresh SQL Server Appliance BI solutions Expect at least 5x performance mva http://northamerica.msteched.com www.microsoft.com/learning http://microsoft.com/technet http://microsoft.com/msdn