Trends in the DW space Understanding the Opportunity Approximate data volume managed by DW Less than 1TB 41% 17% Performance at scale: ability to analyze.

Download Report

Transcript Trends in the DW space Understanding the Opportunity Approximate data volume managed by DW Less than 1TB 41% 17% Performance at scale: ability to analyze.

Trends in the DW space
Understanding the Opportunity
Approximate data volume managed by DW
Less than 1TB
41%
17%
Performance at scale: ability to analyze massive
amounts of data
21%
18%
1 - 3 TB
19%
3 - 10 TB
25%
17%
More than 10 TB
2%
Don't Know
0%
scalability
10s of TBs, to 100s of TB, to PBs
34%
6%
10%
Today
20%
30%
In 3 years
Source: TDWI Report – Next Generation DW
40%
50%
from
Understanding the Opportunity
DW Software License Revenue
US$ Billions
CAGR
14
Share(‘15)
12
Public Cloud
7.1%
4.6%
10
Private Cloud
7.1%
5.0%
Appliances/RA
26.2%
30.0%
Traditional
-0.3%
60.4%
8 1.1
1.5
1.9
2.4
3
3.8
6
4 7.9
8
8.2
8.2
8.1
7.7
FY11
FY12
FY13
FY14
FY15
2
0
FY10
Appliances are the key trend in the next 4 years (4 Billion market by ‘15)
Source: MS internal analysis, DBSMIT Cloud Market Opportunity Forecast
Trends in the DW space
How does SQL Server PDW fit in?
SQL Server Data Warehousing in Appliance Model
Scale out
Control Rack
Compute Racks (1/2 to 4)
Storage Nodes
Compute Nodes
Control Nodes
Active / Passive
Data Center Monitoring
Dual Infiniband
Management Nodes
Dual Fiber Channel
Client Drivers
Landing Zone
ETL Load Interface
Backup Node
Corporate Backup Solution
Spare Compute Node
Corporate Network
Appliance Network
PDW Compute Nodes
Time Dim
Product Dim
Date Dim ID
Calendar Year
Calendar Qtr
Calendar Mo
Calendar Day
Prod Dim ID
Prod Category
Prod Sub Cat
Prod Desc
SQL
Sales Facts
Date Dim ID
Store Dim ID
Prod Dim ID
Mktg Camp Id
Qty Sold
Dollars Sold
Store Dim
Store Dim ID
Store Name
Store Mgr
Store Size
SQL
SQL
Mktg
Campaign Dim
Mktg Camp ID
Camp Name
Camp Mgr
Camp Start
Camp End
SQL
Time Dim
Product Dim
Date Dim ID
Calendar Year
Calendar Qtr
Calendar Mo
Calendar Day
Prod Dim ID
Prod Category
Prod Sub Cat
Prod Desc
SQL
Sales Facts
Date Dim ID
Store Dim ID
Prod Dim ID
Mktg Camp Id
Qty Sold
Dollars Sold
Store Dim
Store Dim ID
Store Name
Store Mgr
Store Size
Smaller Dimension Tables are
Replicated on Every Compute
Node
SQL
SQL
Mktg
Campaign Dim
Mktg Camp ID
Camp Name
Camp Mgr
Camp Start
Camp End
SQL
Time Dim
Product Dim
Date Dim ID
Calendar Year
Calendar Qtr
Calendar Mo
Calendar Day
Prod Dim ID
Prod Category
Prod Sub Cat
Prod Desc
SQL
Sales Facts
Date Dim ID
Store Dim ID
Prod Dim ID
Mktg Camp Id
Qty Sold
Dollars Sold
Store Dim
Store Dim ID
Store Name
Store Mgr
Store Size
Larger Fact Table is Hash
Distributed Across All
Compute Nodes
SQL
SQL
Mktg
Campaign Dim
Mktg Camp ID
Camp Name
Camp Mgr
Camp Start
Camp End
SQL
A quick look at MPP query execution
SQL Server PDW Appliance
connects to
‘the appliance’ like he
would to a ‘normal’ SQL
Server
steps
are executed
on each compute node
shared nothing MPP
system
generates a distributed
execution plan
. Data
Movement Service
Compute Node 2
Compute Node 1
Distributed Table
1
Red
5
Red
5
3
Blue
11
Red
8
5
Red
12
Red
12
7
Green
7
Green
7
2
Red
8
Blue
11
4
Blue
10
Blue
10
6
Yellow
12
Yellow
12
Blue
21
Green
7
Red
25
Yellow
12
Overall Architecture
Control Rack
Data Rack (up to 4)
Compute Node 1
Control Node
Client Interface
(JDBC, ODBC,
OLE-DB, ADO.NET)
PDW Engine
DMS Manager
PDW Agent
DMS Core
PDW Agent
Compute Node 2
DMS Core
PDW Agent
Landing Zone Node
…
ETL
Interface
Bulk Data Loader
PDW Agent
Management Node
Legend:
PDW service
PDW = Parallel Data Warehouse
DMS = Data Movement Service
Active Directory
PDW Agent
Compute Node 10
DMS Core
PDW Agent
Trends in the DW space
How does SQL Server PDW fit in?
SQL Server PDW AU3 – What’s new?
Release Themes
How did it work before?
Control Node
focus on MPP related challenges
PDW AU3 Architecture with Shell Appliance and Cost-Based Query Optimizer
foo
SELECT
Shell Appliance
(SQL Server)
Control Node
SELECT
Engine Service
Plan
Steps
foo
Plan
Steps
Plan
Steps
Compute Node
(SQL Server)
Compute Node
(SQL Server)
Compute Node
(SQL Server)
foo
foo
PDW AU3 Architecture with Shell Appliance and Cost-Based Query Optimizer
Control Node
SELECT
SELECT
Shell Appliance
(SQL Server)
MEMO
Return
Engine Service
Plan
Steps
Plan
Steps
Plan
Steps
Compute Node
(SQL Server)
Compute Node
(SQL Server)
Compute Node
(SQL Server)
Optimizer lifecycle…
1. Simplification and space exploration
Query standardization and simplification (e.g. column reduction, predicates push-down)
Logical space exploration (e.g. join re-ordering, local/global aggregation)
Space expansion (e.g. bushy trees – dealing with intermediate resultsets)
Physical space exploration
Serializing MEMO into binary XML (logical plans)
De-serializing binary XML into PDW Memo
2. Parallel optimization and pruning
Injecting data move operations (expansion)
Costing different alternatives
Pruning and selecting lowest cost distributed plan
3. SQL Generation
Generating SQL Statements to be executed
… And Cost Model Details
AU2 to AU3
80
70
60
50
AU2
40
AU3
30
20
10
0
1
2
3
4
5
6
7
8
9
10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39
5x improvement in terms of total elapsed time out of the box
Zero data conversions in data movement
DMS CPU Utilization - TPCH
60
Functionality
Using ODBC instead of ADO.NET for reading
and writing data
Minimizing appliance resource utilization
for data moves
Benefits
Better resource, CPU, utilization
6x or more faster move operations
Increased concurrency
Mixed workload (loads + queries)
40
20
0
Q1
Q2
Q3
Q4
Q5
Q6
Q7
Q8
Q9
Q10
Q11
Q12
Q13
Q14
Q15
Q16
Q17
Q18
Q19
Q20
Q21
Q22
Eliminate CPU utilization spent on data
conversions
Further parallelize operations during data
moves
CPU (%)
Goal
AU2
AU3
Throughput improvement for data movements
Repl Table…
Shuffle
Replicate
Trim
Broadcast
0%
200%
400%
600%
SQL Server Security and Metadata
Security
SQL Server security syntax and semantics
Supporting user, roles and logins
Fixed database roles
Allows script re-use
Allows well-known security methods
Metadata
PDW metadata stored in SQL Server
Existing SQL Server metadata tables/views (e.g. security views)
PDW distribution info as extended properties in SQL Server metadata
Existing means and technology for persisting metadata
Improved 3rd party tool compatibility (BI, ETL)
Support for SQL Server (Native) Client
Goal
‘Look’ just like a normal SQL Server
Better integration with other BI tools
Functionality
Use existing SQL Server drivers to connect to
SQL Server PDW
Implement SQL Server TDS protocol
Named Parameter support
SQLCMD connectivity to PDW
Benefits
Use known tools and proven technology stack
Existing SQL Server ’eco-system’
2x performance improvement for return
operations
5x reduction of connection time
Server: 10.217.165.13, 17001
SQL Server Clients
TDS
(ADO.NET, ODBC,
OLE-DB, JDBC)
SequeLink
SQL PDW Clients
(ODBC, OLE-DB, ADO.NET)
Server: 10.217.165.13, 17000
Stored Procedure Support (Subset)
Goal
Support common scenarios of code encapsulation
and reuse in Reporting and ETL
Functionality
System and user-defined stored procedures
Invocation using RPC or EXECUTE
Control flow logic, input parameters
Benefits
Enables common logic re-use
Big impact for Reporting Services scenarios
Allows porting existing scripts
Increases compatibility with SQL Server
Collations
Goal
Support local and international data
Functionality
Fixed server level collation
User-defined column level collation
Supporting all Windows collations
Allow COLLATE clauses in Queries and DML
Benefits
Store all the data in PDW w/ additional querying
flexibility
Existing T-SQL DDL and Query scripts
SQL Server alignment and functionality
SQL Server PDW Connectors
Connector for Hadoop
Bi-directional (import/export) interface between MSFT Hadoop and PDW
Delimited file support
Adapter uses existing PDW tools (bulk loader, dwsql)
Low cost solution that handles all the data: structured and unstructured
Additional agility, flexibility and choice
Connector for Informatica
Connector providing PDW source and target (mappings, transformations)
Informatica uses PDW bulk loader for fast loads
Leverage existing toolset and knowledge
Connector for Business Objects
Trends in the DW space
How does SQL Server PDW fit in?
SQL Server PDW AU3 – What’s new?
Building BI Solutions with SQL Server PDW
Customer Successes
Original Customer SMP solution vs. PDW AU3 (with cost-based query optimizer)
1600
1400
1200
1000
Old SMP
800
POS ODS AU3
600
400
200
0
Q1
Q2
Q3
Q4
Q5
Q6
Q7
How are customers using PDW & BI ?
Data Volume
80 TB data
Portal
Requirement
hourly loads
BI Integration
300GB/day
AU3 Feedback
T-SQL support
painless
Enabled queries
Reports
ETL
Operational
DB’s
Dashboards
PDW
Scorecards
How are customers using PDW & BI ?
Data Volume
36 TB data warehouse
7 year data window
ClickStream
Nielsen
Requirements
Scalability - growing data volume does not affect performance
Performance and ad-hoc analysis for interactive querying by users
BI Integration with Microsoft BI stack - SSAS and SSRS
AU3 Feedback
SSAS cubes worked ‘out-of-box’
~30x
PDW
SSAS
OLTP
system
Trends in the DW space
How does SQL Server PDW fit in?
SQL Server PDW AU3 – What’s new?
Building BI Solutions with SQL Server PDW
Customer Successes
Using SQL Server PDW with Microsoft BI solutions
Fast
parallel
Infiniband
Aggregation abilities avoids ETL overhead in existing systems
No need
indexes
indexed/materialized views
Infiniband
GBit
link
Understanding the differences compared to ‘SMP world’
foreign key constraints
data design
retrieval planning
MOLAP & ROLAP
limits
include required data
Trends in the DW space
How does SQL Server PDW fit in?
SQL Server PDW AU3 – What’s new?
Building BI Solutions with SQL Server PDW
Customer Successes
Using SQL Server PDW with third party BI solutions
CURRENT_TIMESTAMP , @@DATEFIRST, SET OPTION …
Trends in the DW space
How does SQL Server PDW fit in?
SQL Server PDW AU3 – What’s new?
Building BI Solutions with SQL Server PDW
Customer Successes
Using SQL Server PDW with Microsoft BI solutions
BI solutions leveraging Hadoop integration
HADOOP
Sensor/
RFID Data
Blogs,
Docs
Web
Data
In the context of ETL , BI , and DW
Fast ETL
processing
CostOptimal
storage
HADOOP
Fast
Refinery
Active
Archive
Application
Programmers
DBMS
Admin
Power
BI Users
SQOOP
HADOOP
Sensor/ Blogs, Web
RFID Data Docs Data
SQL Server PDW
Interactive BI/Data
Visualization
SQOOP export with
source (HDFS path) &
target (PDW DB & table)
2.
Read HDFS
data via mappers
1.
3.
Server
PDW
Hadoop
Connector
5.
…
4.
Telnet
Server
…
HDFS
Copies incoming data
on Landing Zone
FTP
Invokes
‘DWLoader’
Compute
Node 1
Landing Zone
PDWconfiguration file
Linux/ Windows/
Hadoop PDW
Control Compute
Nodes
Node
Compute
Node 8
Trends in the DW space
How does SQL Server PDW fit in?
SQL Server PDW AU3 – What’s new?
Building BI Solutions with SQL Server PDW
What’s coming next in SQL Server PDW?
What is coming next?
CALENDAR YEAR 2011
Q1
Shipped
Appliance Update 1
• Improved node
manageability
• Better performance and
reduced overhead
• OEM requests
Q3
Q2
Shipped
CALENDAR YEAR 2012
Q4
Appliance Update 2
• Programmability
• Batches
• Control flow
• Variables
• Temp tables
• QDR infiniband switch
• Onboard Dell
Q1
Shipped
Q2
Q3
Q4
V-Next
Appliance Update 3
• Cost based optimizer
• Native SQL Server drivers,
including JDBC
• Collations
• More expressive query language
• Data Movement Services
performance
• SCOM pack
• Stored procedures (subset)
• Half-rack
• 3rd party integration (Informatica,
MicroStrategy, Business Objects,
HADOOP)
•
•
•
•
•
•
•
•
•
Columnar store index
Stored procedures
Integrated Authentication
PowerView integration
Workload management
LZ/BU redundancy
Windows 8
SQL Server 2012
Hardware refresh
SQL Server
Appliance
BI solutions
Expect at least 5x performance
mva
http://northamerica.msteched.com
www.microsoft.com/learning
http://microsoft.com/technet
http://microsoft.com/msdn