Transcript Content

Building a smarter planet: Financial Services
Exploding Demands for Big Data, Analytics, Risk
Management, Ultra-low Latency and Compute Power
Requires Optimized HPC Infrastructures
Robert Brinkman
Infrastructure Architect for Banking and Financial Markets
IBM Banking Center of Excellence
Let’s build a smarter planet
Panel
Vikram Mehta, Vice President, IBM System Networking, IBM Corp.
Emile Werr, VP, Global Data Services; Global Head of Enterprise Data Architecture, NYSE Euronext
Dino Vitale, Director, Cross Technology Services, Morgan Stanley
Nick Werstiuk, Product Line Executive, IBM Platform Computing
2009 IBM Corporation
© 2012
Workload Optimized Stacks
[Figure: workload-optimized stacks; the alignment between stacks and workload labels was lost in extraction.]
Layers: Applications (IBM Provided, ISVs, Partners, Custom); Messaging and Security
Stacks: Cloud Stack; Transaction Stack; Appliances & Packages Stack; Low Latency Stack; Grid Stack
Workload labels: Discrete Components or Applications; High Transaction Rates; Specialized Workloads; High Message Rates; Big Data; Variable Workload; Complex Data Models; Packaged Hardware and Software; Data Value Decay; Big Compute
Financial Markets Industry Imperatives
• Re-engineer for profitable growth: renewed focus on the customer, near-real-time analytics
• Improve the trade life cycle: cloud and business process outsourcing
• Optimize enterprise risk management: data-driven transformation and common industry services
Dino Vitale
Director, Cross Technology Services
Morgan Stanley
Morgan Stanley: Road to Compute As a Service
Trends
• Maximize efficiency of compute infrastructure
• Cost / run-rate
• Utilization: more with less, linear scale, sharing
• Operational normalization
Challenges
• Phasing
• Dynamic provisioning and on-demand scaling of resources to applications according to varying business needs and SLA
• Multi-tenant workload protection
• Application design and dependency management
• Utility charge-back model options: pay-per-use, fixed allocation, hybrid approach
• Sharing resources based on workload supply and demand
• BCP
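The three charge-back options named above (pay-per-use, fixed allocation, hybrid) can be compared with a small cost function. This is only an illustrative sketch: the function name `monthly_charge_cents`, the rates, and the hybrid rule (a fixed fee covering a reserved allocation, plus pay-per-use for any overage) are assumptions and do not describe Morgan Stanley's actual model.

```python
def monthly_charge_cents(model, used_core_hours, reserved_core_hours=0,
                         rate_cents_per_core_hour=5, fixed_fee_cents=0):
    """Charge for one tenant under a hypothetical charge-back model.

    All amounts are integer cents so the arithmetic stays exact.
    """
    if model == "pay-per-use":
        # Tenant pays only for what it consumed.
        return used_core_hours * rate_cents_per_core_hour
    if model == "fixed":
        # Tenant pays a flat fee for its allocation, used or not.
        return fixed_fee_cents
    if model == "hybrid":
        # Assumed rule: flat fee covers the reservation, overage is metered.
        overage = max(0, used_core_hours - reserved_core_hours)
        return fixed_fee_cents + overage * rate_cents_per_core_hour
    raise ValueError(f"unknown charge-back model: {model}")


# Example: a tenant that reserved 1,000 core-hours but used 1,200.
hybrid = monthly_charge_cents("hybrid", 1200, reserved_core_hours=1000,
                              fixed_fee_cents=4000)
```

Under these assumed numbers the hybrid tenant pays the flat fee plus 200 metered core-hours, which is the trade-off the hybrid option is meant to capture: predictable baseline cost with an incentive not to over-reserve.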
Convergence opportunities with “Big Data”
• Increasing data volumes
• Adaptive/real-time scheduling
• Resource management
• Metrics / data mining
ON-DEMAND DATA IN HIGH PERFORMANCE ENVIRONMENT
Emile Werr, VP, Global Data Services
Global Head of Enterprise Data Architecture & Identity Management
Technology Challenges
• Big Data (billions of transactions and multiple terabytes captured daily)
• Speed and business agility are essential to our business
• Different viewpoints and data patterns need to be analyzed
• Data coming out of a Trading Plant is not user-friendly
• Correlating disparate data and integrating it
• Moving large data around is expensive and complex
• System capacity needs to handle 5x our average daily volume efficiently
• Data spikes: the day after the Flash Crash, volume peaked at over 18.4 Bn transactions for the NYSE Classic matching engine (this excludes Options and other markets such as Arca, Amex, Liffe, Euronext, etc.)
• Transaction volume growth sustained year-over-year
• Data needs to be readily available for a minimum of 7 years for compliance
• It is too expensive to keep it all online
• Change is constant
Global Data Services
USE CASE: Market Reconstruction for Trading Surveillance
The Electronic Book (NYSE DBK) and Market Depth need to be reconstructed and made accessible via a fast database.
Who traded ahead, or was there interpositioning? This can be answered by a database query.
Price level for calculating Shares Ahead
& Shares Available
Data Architecture Practice
Financial Services, Regulatory & Compliance Expertise
Order arrived:
BUY 10 @ 20.09
• Trading systems generate vast transaction volumes at high speeds
• The GRID is used to transform, normalize and enrich the time-series data using massively parallel computing. This is done as EOD or intra-day batch processing.
• Date-level table scans (queries) also need massively parallel processing (MPP)
• Appropriate technologies need to be used (10 Gb network, virtualized CPU/MEM, appliance databases, scalable storage pools)
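The "shares ahead" question from this use case can be phrased as a query over the reconstructed book. The sketch below uses an in-memory SQLite table purely for illustration; the table name `book`, its columns, the sample resting orders, and the price-time priority rule are all assumptions, not the NYSE schema or matching logic. Only the incoming order (BUY 10 @ 20.09) comes from the slide.

```python
import sqlite3


def shares_ahead(conn, side, price_cents, arrival_seq):
    """Sum the quantity of same-side orders resting at the same price level
    that arrived before the given sequence number (assumed time priority)."""
    row = conn.execute(
        "SELECT COALESCE(SUM(qty), 0) FROM book "
        "WHERE side = ? AND price_cents = ? AND arrival_seq < ?",
        (side, price_cents, arrival_seq),
    ).fetchone()
    return row[0]


# Hypothetical reconstructed book; prices are stored in cents to stay exact.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE book (order_id TEXT, side TEXT, price_cents INTEGER, "
    "qty INTEGER, arrival_seq INTEGER)"
)
conn.executemany(
    "INSERT INTO book VALUES (?, ?, ?, ?, ?)",
    [
        ("A", "BUY", 2009, 30, 1),
        ("B", "BUY", 2009, 20, 2),
        ("C", "BUY", 2008, 50, 3),
        ("D", "SELL", 2010, 40, 4),
    ],
)

# Incoming order from the slide: BUY 10 @ 20.09, assumed to arrive as sequence 5.
ahead = shares_ahead(conn, "BUY", 2009, 5)
```

With this sample data, only orders A and B rest at the 20.09 price level ahead of the incoming order. At production volumes the same predicate becomes a date-level table scan, which is why the slide calls for MPP databases.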
Data Lifecycle Management Methodology
[Figure: data lifecycle flow; the alignment between stages and activities was lost in extraction.]
Data sources: Trading Data; Market Data; Ref Data; User Generated Data
Stages: Data Capture (Enterprise Systems); Data Transformation & Archive; On-Demand Data (ODD); User Analytics (“Business Intelligence”); End-User Workflow
Activities:
• Transform, Normalize, Enrich
• Partition, compress and archive in storage pools
• Create Metadata (mappings)
• Utilize MPP Databases & HDFS
• Integrate Reporting Tools
• Facilitate User Collaboration
• Capture Knowledge (KM)
• Automate Data Archive & Purge
• Secure Data Access & Navigation
• Load, Extract, Stream, Filter, Transform, Purge
• User-driven Data Mart Provisioning (“Sandboxing”)
• Schema Change Capture (“Data Structure Lineage”)
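The "partition, compress and archive" and "create metadata (mappings)" steps can be sketched in a few lines. This is a minimal in-memory illustration, assuming date-based partitions, gzip compression, and JSON records; the function names, the `trades/<date>.json.gz` key layout, and the metadata fields are hypothetical and do not describe the NYSE implementation.

```python
import gzip
import json
from collections import defaultdict


def archive_by_date(records):
    """Partition records by their 'date' field, gzip each partition, and
    build a metadata mapping from date to archive key and row count."""
    partitions = defaultdict(list)
    for rec in records:
        partitions[rec["date"]].append(rec)

    archive, metadata = {}, {}
    for date, rows in sorted(partitions.items()):
        key = f"trades/{date}.json.gz"  # hypothetical storage-pool key
        raw = json.dumps(rows).encode("utf-8")
        archive[key] = gzip.compress(raw)
        metadata[date] = {"key": key, "rows": len(rows), "raw_bytes": len(raw)}
    return archive, metadata


def load_partition(archive, metadata, date):
    """On-demand retrieval: resolve the date through metadata, then decompress."""
    return json.loads(gzip.decompress(archive[metadata[date]["key"]]))


sample = [
    {"date": "2012-01-03", "symbol": "IBM", "qty": 10},
    {"date": "2012-01-03", "symbol": "MS", "qty": 5},
    {"date": "2012-01-04", "symbol": "IBM", "qty": 7},
]
archive, metadata = archive_by_date(sample)
```

Because consumers go through the metadata mapping rather than the storage layout, old partitions can be moved to cheaper storage or purged without changing how users request data.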
Managed Data Services & Data Flow Automation
[Figure: managed data services architecture; the alignment between layers and components was lost in extraction.]
Layers: Data Capture; Transformation & Archive; Data Virtualization & Abstraction; Business Demand
Components: Message Bus; Feed Handler; files; Data Pump; Continuous Flow (Trickle Batch); Scale-Out Grid Fabric (distributed CPU/MEM); Storage Pools; Data Services; Hadoop; Netezza; Analytics Data Warehouse; Data Provisioning; Data Tools
Consumers: Apps; Admins; Analysts; Data Scientists; Researchers
Benefits:
• Fast Processing & Data Movement
• Simplified Access & Administration
• Common Secured Access
• Scalable
• File & Database Virtualization
• Automation & Workflow
• Reliable
• Standardization & Consistency
• Agile Framework (Metadata Driven)
• Metering, Monitoring & Tracking
Vikram Mehta
Vice President, IBM System Networking
IBM Corporation
Nick Werstiuk
Product Line Executive
IBM Platform Computing
Convergence of Compute and Data
[Figure: matrix relating workload type to data type, use case, timing and infrastructure; cell alignment was lost in extraction, so values are listed per row label.]
Workload: Compute Intensive; Data Intensive; Compute and Data Intensive
Data type: Structured (RDBMS, fixed records); Unstructured (video, e-mail, web); All (Structured + Unstructured)
Application use case: CEP, Streaming, Trading; Risk Analytics, Gaming Simulation, Pricing; Sentiment Analysis/CRM, Genomics, AML/Fraud; BI, Reporting, ETL
Characteristics: “Real Time”; Intraday; Daily; Monthly; Quarterly
Infrastructure: Dedicated servers, Appliances, FPGAs; Compute grid, Data caches, In-memory grid, Shared services CPU + GPU; Commodity processors + storage; Disk & Tape, SMP & Mainframe, SAN/NAS Infrastructure, Data Warehouses
Support for Diverse Workloads & Platforms
[Figure: four data-intensive applications (A to D) placed by a Workload Manager onto a shared grid of resource slots, with Resource Orchestration underneath. The mapping of letters to application labels was lost in extraction.]
Application labels: Geo-spatial integration; Name classification; Signal processing; Metadata generation, File classification, Batch analysis; Search, Analysis, Concept Recognition
Layers: Data Intensive Apps; Workload Manager; Resource Orchestration
Why IBM Platform Symphony is faster and more scalable
[Figure: latency versus scale, Symphony compared with other grid servers.]
Other grid servers: Inefficient scheduling, a polling model and heavy-weight transport protocols limit scalability.
Symphony: With a zero-wait-time “push model” and efficient binary protocols, Symphony scales until the “wire” is saturated.
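The difference between a polling model and a push model can be illustrated with an idealized dispatch-delay calculation. This is a toy model under stated assumptions, not a description of how Symphony or any other grid server is implemented: workers poll at fixed ticks, a pushed task is dispatched the moment it arrives, and transport and scheduling costs are ignored. The function names are hypothetical.

```python
def polling_waits(arrivals_ms, poll_interval_ms):
    """Delay before each task is picked up when a worker polls at fixed
    ticks (0, interval, 2*interval, ...). Integer math keeps it exact."""
    waits = []
    for t in arrivals_ms:
        # Next tick at or after arrival: ceiling division on integers.
        next_tick = -(-t // poll_interval_ms) * poll_interval_ms
        waits.append(next_tick - t)
    return waits


def push_waits(arrivals_ms):
    """In the idealized push model a task is handed over on arrival."""
    return [0 for _ in arrivals_ms]


arrivals = [3, 10, 25]                 # hypothetical task arrival times in ms
polled = polling_waits(arrivals, 10)   # worker polls every 10 ms
pushed = push_waits(arrivals)
```

In this model the polling delay is bounded by the poll interval, so shortening the interval lowers latency at the cost of more polling traffic; the push model avoids that trade-off.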
HPC Cloud – Multiple Approaches and Paths to Value
Infrastructure Management
• Cluster consolidation into an HPC Cloud
• Self-service cluster provisioning and management
• Workload-driven dynamic cluster
Build out a more dynamic HPC infrastructure as their HPC Cloud
HPC “In the Cloud”
• ‘Bursting’ to Cloud Providers
• Hosted HPC in the cloud
• Enable HPC Cloud Service Providers
Leverage the public cloud opportunity, either to tap into additional resources, or offer their own HPC cloud services
Building a smarter planet: Financial Services
Questions