IBM Big Data Platform Overview


Big Data Technologies for Civil and Defense Aviation
Bruce Brown, BigData Technical Specialist
July 16, 2015
© 2012 IBM Corporation
“Data is the new oil.” (Clive Humby)
In its raw form, oil has little value. Once processed and refined, it helps power the world.
“Big Data has arrived at Seton Health Care Family, fortunately accompanied by an analytics tool that will help deal with the complexity of more than two million patient contacts a year…”
“At the World Economic Forum last month in Davos, Switzerland, Big Data was a marquee topic. A report by the forum, ‘Big Data, Big Impact,’ declared data a new class of economic asset, like currency or gold.”
“Increasingly, businesses are applying analytics to social media such as Facebook and Twitter, as well as to product review websites, to try to ‘understand where customers are, what makes them tick and what they want’, says Deepak Advani, who heads IBM’s predictive analytics group.”
“Companies are being inundated with data, from information on customer-buying habits to supply-chain efficiency. But many managers struggle to make sense of the numbers.”
“…now Watson is being put to work digesting millions of pages of research, incorporating the best clinical practices and monitoring the outcomes to assist physicians in treating cancer patients.”
“The Oscar Senti-meter, a tool developed by the L.A. Times, IBM and the USC Annenberg Innovation Lab, analyzes opinions about the Academy Awards race shared in millions of public messages on Twitter.”
The Characteristics of Big Data
• Volume: cost-efficiently processing the growing volume (50x growth from 2010 to 2020; 35 ZB)
• Velocity: responding to the increasing velocity (30 billion RFID sensors and counting)
• Variety: collectively analyzing the broadening variety (80% of the world’s data is unstructured)
• Veracity: establishing the veracity of big data sources (1 in 3 business leaders don’t trust the information they use to make decisions)
Vestas optimizes capital investments based on 2.5 Petabytes of information.
• Model the weather to optimize placement of turbines, maximizing power generation and longevity.
• Reduce the time required to identify the placement of a turbine from weeks to hours.
• Incorporate 2.5 PB of structured and semi-structured information flows. Data volume is expected to grow to 6 PB.
University of Ontario Institute of Technology (UOIT) Detects Neonatal Patient Symptoms Sooner
Capabilities Utilized: Stream Computing
• Performing real-time analytics using physiological data from neonatal babies
• Continuously correlates data from medical monitors to detect subtle changes and alert hospital staff sooner
• Early warning gives caregivers the ability to proactively deal with complications
Significant benefits:
• Helps detect life-threatening conditions up to 24 hours sooner
• Lower morbidity and improved patient care
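The idea of continuously watching a monitor feed for subtle changes can be sketched in a few lines. This is only an illustrative sketch, not the UOIT or IBM implementation: the function name `detect_deviations`, the window size, the threshold and the sample readings are all assumptions made for the example. It flags any reading that departs from the recent sliding window by more than a chosen number of standard deviations.

```python
from collections import deque
from statistics import mean, stdev


def detect_deviations(readings, window=5, threshold=3.0):
    """Yield (index, value) for readings far outside the recent window.

    Hypothetical helper: window and threshold are illustrative defaults.
    """
    recent = deque(maxlen=window)
    for i, value in enumerate(readings):
        if len(recent) == window:
            mu = mean(recent)
            sigma = stdev(recent)
            # Skip perfectly flat windows, where sigma is zero.
            if sigma > 0 and abs(value - mu) > threshold * sigma:
                yield i, value
        # The reading joins the window after it has been checked.
        recent.append(value)


if __name__ == "__main__":
    # Made-up heart-rate samples with one abrupt change.
    heart_rate = [120, 121, 119, 120, 122, 121, 120, 160, 121, 120]
    for index, value in detect_deviations(heart_rate):
        print(f"alert: reading {index} = {value}")
```

Because the function is a generator, it can consume an unbounded feed and raise each alert as soon as the reading arrives, which is the property the slide attributes to stream computing.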
TerraEchos Turns to IBM Big Data for Low Latency Surveillance Data Analysis
Capabilities Utilized: Stream Computing
• Deployed a security surveillance system to detect, classify, locate, and track potential threats at a highly sensitive national lab
• Stream computing collects and analyzes acoustic data from fiber-optic sensor arrays
• Analyzed acoustic data is fed into the TerraEchos intelligence platform for threat detection, classification, prediction & communication
“Identifies and classifies potential security threats – miles away”
Significant benefits:
• Enables the TerraEchos solution to analyze and classify streaming acoustic data in real time
• Provides lab & security staff with a holistic view of potential threats & non-issues
• Enables a faster and more intelligent response to any threat
Pacific Northwest Smart Grid Demonstration Project
Capabilities:
Stream Computing – real-time control system
Data Warehouse Appliance – analyze massive data sets
Demonstrates scalability from 100 to 500K homes while retaining 10 years’ historical data
60k metered customers in 5 states
Accommodates ad hoc analysis of price fluctuation, energy consumption profiles, risk, fraud detection, grid health, etc.
In Order to Realize New Opportunities, You Need to Think Beyond Traditional Sources of Data
• Transactional and Application Data: Volume, Structured, Throughput
• Machine Data: Velocity, Semi-structured, Ingestion
• Social Data: Variety, Highly unstructured, Veracity
• Enterprise Content: Variety, Highly unstructured, Volume
Leveraging Big Data Requires Multiple Platform Capabilities
• Understand and navigate federated big data sources: Federated Discovery and Navigation
• Manage & store huge volume of any data: Hadoop File System, MapReduce
• Structure and control data: Data Warehousing
• Manage streaming data: Stream Computing
• Analyze unstructured data: Text Analytics Engine
• Integrate and govern all data sources: Integration, Data Quality, Security, Lifecycle Management, MDM
Business-centric Big Data enables you to start with a critical business pain and expand the foundation for future requirements
• “Big data” isn’t just a technology; it’s a business strategy for capitalizing on information resources
• Getting started is crucial
• Success at each entry point is accelerated by products within the Big Data platform
• Build the foundation for future requirements by expanding further into the big data platform
Expand with the Big Data Platform for future needs
1 – Unlock Big Data
• Customer Need
– Understand existing data sources
– Expose the data within existing content management and file systems for new uses, without copying the data to a central location
– Search and navigate big data from federated sources
• Value Statement
– Get up and running quickly and discover and retrieve relevant big data
– Use big data sources in new information-centric applications
• Customer examples
– Procter & Gamble – Connect employees with a 360° view of big data sources
• Get started with: IBM Vivisimo Velocity
2 – Analyze Raw Data
• Customer Need
– Ingest data as-is into Hadoop and derive insight from it
– Process large volumes of diverse data within Hadoop
– Combine insights with the data warehouse
– Low-cost ad hoc analysis with Hadoop to test new hypotheses
• Value Statement
– Gain new insights from a variety and combination of data sources
– Overcome the prohibitively high cost of converting unstructured data sources to a structured format
– Extend the value of the data warehouse by bringing in new types of data and driving new types of analysis
– Experiment with analysis of different data combinations to modify the analytic models in the data warehouse
• Customer examples
– Financial Services Regulatory Org – managed additional data types and integrated with their existing data warehouse
• Get started with: InfoSphere BigInsights
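To make "ingest data as-is and derive insight from it" concrete, the map, shuffle and reduce steps that Hadoop MapReduce distributes across a cluster can be imitated in a single Python process. This is a toy sketch and does not use Hadoop or BigInsights APIs: the function names and the sample log lines are assumptions for illustration, and a real job would read its input from HDFS rather than from a list.

```python
from collections import defaultdict


def map_phase(line):
    """Emit (word, 1) pairs from one raw, unparsed line of text."""
    for word in line.lower().split():
        yield word.strip(".,;:!?\"'"), 1


def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        if key:  # drop tokens that were pure punctuation
            groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Collapse all values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce over an iterable of raw lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    # Made-up maintenance log lines, ingested without any schema.
    logs = ["Engine fault reported", "engine check complete, fault cleared"]
    print(word_count(logs))
```

No schema is imposed on the input before analysis; structure is extracted inside the map step, which is how the slide's point about avoiding up-front conversion of unstructured data plays out in practice.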
3 – Simplify your Warehouse
• Customer Need
– Business users are hampered by the poor performance of analytics on a general-purpose enterprise warehouse – queries take hours to run
– Enterprise data warehouse is encumbered by too much data for too many purposes
– Need to ingest huge volumes of structured data and run multiple concurrent deep analytic queries against it
– IT needs to reduce the cost of maintaining the data warehouse
• Value Statement
– Speed and simplicity for deep analytics (Netezza)
– 100s to 1000s of users/second for operational analytics (IBM Smart Analytics System)
• Customer examples
– Catalina Marketing – executing 10x the amount of predictive workloads with the same staff
• Get started with: IBM Warehouse Solutions
4 – Reduce costs with Hadoop
• Customer Need
– Reduce the overall cost to maintain data in the warehouse – often it’s seldom used and kept ‘just in case’
– Lower costs as data grows within the data warehouse
– Reduce expensive infrastructure used for processing and transformations
• Value Statement
– Support existing and new workloads on the most cost-effective alternative, while preserving existing access and queries
– Lower storage costs
– Reduce processing costs by pushing processing onto commodity hardware and the parallel processing of Hadoop
• Customer examples
– Financial Services Firm – move processing of applications and reports to Hadoop HBase while preserving existing queries
• Get started with: IBM InfoSphere BigInsights
5 – Analyze Streaming Data
• Customer Need
– Harness and process streaming data sources
– Select valuable data and insights to be stored for further processing
– Quickly process and analyze perishable data, and take timely action
• Value Statement
– Significantly reduced processing time and cost – process and then store what’s valuable
– React in real-time to capture opportunities before they expire
[Diagram: Streaming Data Sources, Streams Computing, ACTION]
• Customer examples
– Ufone – Telco Call Detail Record (CDR) analytics for customer churn prevention
• Get started with: InfoSphere Streams
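The "process and then store what's valuable" pattern can be sketched as a single pass over incoming records that keeps only the small subset worth acting on. This is a hypothetical illustration, not InfoSphere Streams code or the Ufone solution: the record fields `caller` and `dropped`, the `max_dropped` threshold and the function name are all assumptions chosen for the example.

```python
from collections import Counter


def process_then_store(records, max_dropped=2):
    """Scan call records once and keep only at-risk callers.

    Hypothetical rule: a caller becomes worth storing once their
    dropped-call count exceeds max_dropped. Everything else is
    processed and discarded rather than written to storage.
    """
    dropped = Counter()
    store = {}
    for record in records:
        if record["dropped"]:
            caller = record["caller"]
            dropped[caller] += 1
            if dropped[caller] > max_dropped:
                # Only this summary is retained for later analysis.
                store[caller] = dropped[caller]
    return store


if __name__ == "__main__":
    # Made-up call detail records.
    cdrs = [
        {"caller": "A", "dropped": True},
        {"caller": "A", "dropped": True},
        {"caller": "B", "dropped": True},
        {"caller": "A", "dropped": True},
        {"caller": "B", "dropped": False},
    ]
    print(process_then_store(cdrs))
```

The stored result is far smaller than the input, which is where the slide's claimed reduction in processing and storage cost would come from.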
Entry Points are Accelerated by Products Within the Big Data Platform
[Diagram: the five entry points mapped to products in the platform]
1 – Unlock Big Data: IBM Vivisimo
2 – Analyze Raw Data: InfoSphere BigInsights
3 – Simplify your warehouse: IBM Warehouse Solutions
4 – Reduce costs with Hadoop: InfoSphere BigInsights
5 – Analyze Streaming Data: InfoSphere Streams
Analytic Applications: BI / Reporting, Exploration / Visualization, Functional App, Industry App, Predictive Analytics, Content Analytics
IBM Big Data Platform: Visualization & Discovery, Application Development, Systems Management, Accelerators, Hadoop System, Stream Computing, Data Warehouse, Information Integration & Governance
Big Data Platform Video/Imagery Analytics
[Architecture diagram]
Outputs: Real-time Events; Tracking and Linking (Actionable Intelligence); Historical View; Cognos/i2/BigSheets/Browser Visualization
Sources: Broadcast Video, User-Generated Content Sites, Video Blogs
Analytics: Visual Semantic Classification, Machine Learning, Bootstrap and Enrich
Real-Time Video Analytics: Transport, System S Data Fabric, Operating System, running on X86 Box, X86 Blades, FPGA Blade, Cell Blade
Offline Video Analytics: InfoSphere BigInsights
Automatic Semantic Classification of Virat Data
http://www.viratdata.org/
http://ibm64c.watson.ibm.com/imars/virat/
The Grand Challenge: Analyze a Large Volume and Variety of Streaming and Static Data to Produce Actionable Intelligence
“Find the relevant dots, connect them, tell me what I don’t know, keep it up to date.”
[Diagram]
Data sources: Social Networks, Open Source News, Video, Historical Data, Cell Phone
Capabilities:
• Patterns of Life and Behavior Modeling
• Complex Event Processing and Intercorrelated effects to other aspects of ABA
• Entity Relationships and Contextual Relevance
• System of Reference: Social, Political Weather, etc., influences and constraints on Observation Space
• Activity Detection and Tracking
• Anomaly Detection
• Predictive Modeling and Cognitive Awareness
The IBM Big Data Platform Enables Complex ABA Architectures!
[Architecture diagram]
Data sources:
• Volumes of raw data (structured and unstructured) in file systems (often highly distributed)
• Real-time streaming data (structured and unstructured)
• Traditional data sources (ERP, CRM, databases, etc.)
Components: InfoSphere BigInsights, InfoSphere Streams, InfoSphere Information Server, InfoSphere Identity Insights, Global Name Recognition, Data Warehouse, Operational Data Store
Capabilities:
• Real-Time Analytics, Event Detection, Situational Awareness
• Data Mining, Data Exploration, and Predictive Analytics
• Relationship Resolution, Entity Analytics, Sensemaking (Future)
• Text Analytics & Natural Language Processing
Slide marking: IBM Confidential
SMARC Big Data Solution Architecture
[Architecture diagram]
Online Flow: Data-in-motion analysis (InfoSphere Streams)
Stages: Social Media Data; Data Ingest & prep.; Text Analytics: Timely Insights; Entity Analytics: Profile Resolution; Predictive Analytics: Action Determination; Timely Decisions
Offline Flow: Data-at-rest analysis (InfoSphere BigInsights)
Stages: Social Media Data; Text Analytics; Entity Analytics and Integration; Comprehensive Social Media Customer profiles; Predictive Analytics; Customer Models
• Large-scale data-at-rest analysis using InfoSphere BigInsights
• Large-scale data-in-motion analysis using InfoSphere Streams
• Advanced text analysis, entity integration, and predictive modeling using common analytics infrastructure across Streams and BigInsights
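The value of a common analytics layer across both flows is that one piece of analytic logic serves the online and the offline path. A minimal sketch of that design, under stated assumptions, follows. It is not Streams or BigInsights code: the watch list `PRODUCTS`, the function names and the sample messages are hypothetical, and simple word matching stands in for real text analytics.

```python
from collections import Counter

PRODUCTS = ("phone", "tablet")  # hypothetical watch list


def extract_mentions(message):
    """Shared analytic: which watched products does this message mention?"""
    words = set(message.lower().split())
    return [p for p in PRODUCTS if p in words]


def analyze_in_motion(stream):
    """Online flow: emit a result per message as it arrives."""
    for message in stream:
        mentions = extract_mentions(message)
        if mentions:
            yield message, mentions


def analyze_at_rest(archive):
    """Offline flow: aggregate the same analytic over stored messages."""
    counts = Counter()
    for message in archive:
        counts.update(extract_mentions(message))
    return counts


if __name__ == "__main__":
    # Made-up social media messages.
    posts = [
        "Love my new phone",
        "The tablet and phone both froze",
        "Nice weather today",
    ]
    for post, found in analyze_in_motion(posts):
        print("online:", post, found)
    print("offline:", analyze_at_rest(posts))
```

Because both flows call `extract_mentions`, a change to the analytic takes effect in the timely-decision path and the customer-model path together, so the two cannot drift apart.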
Flight Data Solution
[Architecture diagram]
Visualization: Cognos Visualization and Intelligence
Data sources: ARINC 717 Flight Data; Pilot unstructured Data (e.g. Emails, Text Files)
Functions: Realtime Flight Data Monitoring, High Performance Analysis, Data Mining, Text Analytics, Analytics, Data Archival (PB+)
Real-time stack: Transport, System S Data Fabric, Operating System, running on X86 Box, X86 Blades, FPGA Blade, Cell Blade
Data at rest: InfoSphere BigInsights
THINK