Building Big Data Operational Intelligence platform


BUILDING BIG DATA OPERATIONAL INTELLIGENCE PLATFORM WITH APACHE SPARK
Eric Carr (VP Core Systems Group)
Spark Summit 2014
Guavus Confidential – Do Not Distribute
© 2014 Guavus, Inc. All rights reserved.
Communication Service Providers & Big Data Analytics
Market & Technology Imperatives
Industry Context for Communication Service Providers (CSPs)
Big Data is at the heart of two core strategies for CSPs:
• Improve current revenue sources through greater operational efficiencies
• Create new revenue source with the 4th wave
Source: Chetan Consulting
CSPs - Industry Value Chain Shift
Source: asymco
CSPs - A High Bar for Operational Intelligence
• Exponential Data Growth: Petabytes of data per day; billions of records per day
• Distributed Network: Dozens of locations for capturing data, scattered around a vast territory
• Data Diversity: Hundreds of sources from different equipment types and vendors
• Timely Insights: Automated reactions triggered in seconds
• High Availability: No data loss; no down time
CSPs require solutions engineered to meet very stringent requirements
Different Platforms target Different Questions
[Figure: platforms positioned by Type of Data (Data Streams, Data Lake) and Type of Analysis (Decisioning, Discovery). Platform labels: Stream Analytics + Operational Intelligence; Data Warehouses + Business Intelligence; Search Centric.]
Streaming Analytics & Machine Learning to Action
Driving Streaming Analytics to Action
[Figure: streaming analytics use cases. Labels: Network Flow Analytics; NetFlow, Routing; Planning; Usage Awareness; Operational Interactions; Real-Time Actions; Layer 7 Visibility; Policy Profile Triggers; Small Cell / RAN / Backhaul Differentiation; Operational Intelligence; Care & Experience Mgmt; SON / SDN / Virtualization; Content & CDN Analytics; New Service Creation & Monetization; Content Optimization.]
Reflex 1.0 Pipeline – Timely Cube Reporting
Collector → Hadoop Compute → Analytics Store → RAM Cache → UI
Reflex 1.5 Pipeline – Spark / YARN
Collector → Spark / YARN → Analytics Store → RAM Cache → UI
Reflex 2.0 Pipeline – Spark Streaming core
Components: Collector, Msg Queue, Streams, ML, SQL, Store, Cache, Spark / YARN / HDFS2, UI
Stream Engine - Operational Intelligence Analytics
Stream Engine
• Inputs: Data Streams, Contextual Data Stores
• Feature Engine: Data fusion, Metrics creation, Variables selection, Causality inference
• Anomaly Engine: Multivariate analysis, Outlier Detection
• RCA Engine: Statistical learning, Clustering, Pattern identification, Item set mining
• Output: Targeted Actions
Optimized algorithmic support for common stream data processing & fusions
• Detect unusual events occurring in stream(s) of data. Once detected, isolate root cause
• Anomaly / outlier detection, commonalities / root cause forensics, prediction / forecasting, actions / alerts / notifications
• Record enrichment capabilities – e.g. URL categorization, device id, etc.
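To illustrate the outlier detection step named above, here is a minimal Python sketch of a streaming z-score detector. It is not the Stream Engine's algorithm, which the slides do not specify. The class name `RunningZScore`, the threshold of 3 standard deviations, and the sample values are assumptions made for this example.

```python
from math import sqrt


class RunningZScore:
    """Flags values far from the running mean (illustrative sketch only)."""

    def __init__(self, threshold=3.0):
        self.threshold = threshold  # assumed cut-off, in standard deviations
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations (Welford's method)

    def update(self, x):
        """Return True if x is an outlier versus earlier values, then absorb x."""
        is_outlier = False
        if self.n >= 2:
            std = sqrt(self.m2 / (self.n - 1))
            if std > 0 and abs(x - self.mean) / std > self.threshold:
                is_outlier = True
        # Welford's online update of mean and squared deviations.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        return is_outlier


detector = RunningZScore()
values = [10, 11, 9, 10, 12, 10, 11, 50, 10]  # hypothetical metric samples
flags = [detector.update(v) for v in values]
```

Only the jump to 50 is flagged here. A single-pass update like this keeps constant memory per metric, which is one reason running statistics suit stream processing.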
Example - State of the art causality analysis
TIME DEPENDENCY vs. RELATIONSHIP TYPE:
                                         Linear               Linear + Non linear
Stochastic processes (Time dependent)    Granger Causality    Transfer Entropy
Random variables (Time independent)      Correlation          Maximal Information Coefficient
Ranking of metrics: 1) Transfer Entropy 2) Maximal Information Coefficient, Granger Causality 3) Correlation
Example - Causality Techniques
Transfer entropy from a process X to another process Y is the amount of uncertainty reduced in future values of Y by knowing the past values of X given past values of Y.
PROS
• Model free, information theory based approach
• Most generic estimation of causality between two random processes
CONS
• Challenging joint probability estimation
• Large amount of data needed for calculation
• Choice of time lags
Maximal Information Coefficient is a measure of the strength of the relationship between two random variables, based on mutual information. It is estimated empirically by maximizing the mutual information over a set of grids.
PROS
• Model free, information theory based approach
• Can find linear and non-linear relationships
• Estimation possible with smaller dataset
CONS
• No time information
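The transfer entropy definition above can be made concrete with a small Python sketch. It uses a plug-in (frequency count) estimator for discrete series with a history length of one step. The function name `transfer_entropy`, the single-lag history, and the synthetic test series are assumptions for illustration, not the estimator used in the talk.

```python
import random
from collections import Counter
from math import log2


def transfer_entropy(x, y, lag=1):
    """Plug-in estimate (in bits) of transfer entropy from x to y.

    Assumes discrete-valued series and a history of one sample at `lag`.
    """
    if len(x) != len(y):
        raise ValueError("series must have the same length")
    # Each triple is (future of y, past of y, past of x).
    triples = [(y[t], y[t - lag], x[t - lag]) for t in range(lag, len(y))]
    n = len(triples)
    c_full = Counter(triples)
    c_past = Counter((yp, xp) for _, yp, xp in triples)
    c_self = Counter((yn, yp) for yn, yp, _ in triples)
    c_ypast = Counter(yp for _, yp, _ in triples)
    te = 0.0
    for (yn, yp, xp), count in c_full.items():
        p_joint = count / n
        p_given_both = count / c_past[(yp, xp)]       # p(y_next | y_past, x_past)
        p_given_self = c_self[(yn, yp)] / c_ypast[yp]  # p(y_next | y_past)
        te += p_joint * log2(p_given_both / p_given_self)
    return te


rng = random.Random(0)
x = [rng.randint(0, 1) for _ in range(2000)]  # synthetic binary driver
y = [0] + x[:-1]                              # y copies x one step later
te_xy = transfer_entropy(x, y)
te_yx = transfer_entropy(y, x)
```

In this synthetic case `te_xy` comes out close to 1 bit and `te_yx` close to 0, reflecting that x drives y and not the reverse. The joint counts also show why the slide lists probability estimation and data volume as drawbacks: the number of cells grows quickly with alphabet size and history length.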
Network Operations / Care Example
Identifying Commonalities, Anomalies, RCA
• Anomaly Detection
• Event Drivers
• Event Chaining
• Root-Cause Analysis
BinStream Details
Use Case
• Use IP traffic records to calculate bandwidth usage as a time series (continuously …). This cannot be done based on the time the records are received by Spark.
– In general, for any record that has a timestamp, it is important to analyze based on the time of the event rather than the time the event record is received.
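A minimal Python sketch of the point above: usage is keyed by the bin of the event timestamp, so the order in which records arrive does not affect the result. The record layout `(event_ts, nbytes)`, the 5-minute bin size, and the function names are assumptions made for this example.

```python
from collections import defaultdict

BIN_SECONDS = 300  # assumed 5-minute bins


def bin_start(ts, bin_seconds=BIN_SECONDS):
    """Start of the time bin that contains timestamp ts (in seconds)."""
    return ts - ts % bin_seconds


def usage_by_event_time(records, bin_seconds=BIN_SECONDS):
    """Sum bytes per bin, keyed by event time rather than arrival order."""
    totals = defaultdict(int)
    for event_ts, nbytes in records:
        totals[bin_start(event_ts, bin_seconds)] += nbytes
    return dict(totals)


# Hypothetical records listed in arrival order; event times are out of order.
records = [(610, 100), (10, 50), (299, 25), (300, 5)]
usage = usage_by_event_time(records)
```

The late records with event times 10 and 299 still land in the first bin, which is what an arrival-time batch would get wrong.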
Challenges With
• For a single dataset: make the timestamp part of the key.
• For continuously streamed datasets:
– You do not know whether you have received all data for a particular time slot, because of event delay or event duration.
– An event could span multiple time slots, because of event duration.
Data Processing – Timing (Map-Reduce world)
[Figure: Events → Collector/Adapter → BINS (Closed, Past, Current, Future) → Collector/Writer (write on bin interval or size) → HDFS Data Files → MR Jobs/Cube Generator → Cubes (HDFS) → MR Job/Cube exporter → STORE (Columnar Storage).]
• Bin interval: 5 mins
• Job duration: 45-55 mins
• Jobs are delayed by 2x the bin interval (10 mins) to allow the last bin to be closed
• Timeline: kth hour, (k+1)th hour, (k+2)th hour, with "Available for visualization" markers
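The timing on this slide can be worked through as simple arithmetic. The Python sketch below assumes that the job duration reads as 45 to 55 minutes, that one job processes one hour of data, and that it starts once the hour has ended plus the 2x bin interval delay. The function name `availability_window` and the example date are hypothetical.

```python
from datetime import datetime, timedelta

BIN_INTERVAL = timedelta(minutes=5)
CLOSE_DELAY = 2 * BIN_INTERVAL  # wait 10 minutes so the last bin can close


def availability_window(hour_start,
                        min_job=timedelta(minutes=45),
                        max_job=timedelta(minutes=55)):
    """Earliest and latest time at which one hour of data is queryable.

    Assumes the job starts at the end of the hour plus CLOSE_DELAY.
    """
    hour_end = hour_start + timedelta(hours=1)
    job_start = hour_end + CLOSE_DELAY
    return job_start + min_job, job_start + max_job


earliest, latest = availability_window(datetime(2014, 7, 1, 9, 0))
```

Under these assumptions, data for the 09:00 hour becomes available between 10:55 and 11:05, roughly one to two hours after the events occurred. That latency is the motivation for moving from batch jobs to a streaming core.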
Proposed Binning & Proration Solution
[Figure: events plotted against the Spark clock and the source clock, with tick marks T1, T1’, T1’’, T2 to T10 and a Bin View.]
• The x-axis is the Spark clock.
• The y-axis (reversed) is the source clock. Events are timestamped by this clock.
• The diagonal represents the time in both clocks at a particular instant.
• The bars represent events, with a start time [white tip] and an end time [black tip]. A bar's x-value represents the receive time; its length indicates the duration of the event.
• The red area (in fact, the area under the diagonal) represents the region where events cannot fall.
• Green boxes represent Spark batches.
• Legend: Current Batch - Current Event (part thereof); Current Batch - 1 Batch Older Event (part thereof); Current Batch - 2 Batch Older Event (part thereof).
• In some domains, the Bin View (i.e. the events, or parts of events, that happened in that bin) is more important, e.g. network bandwidth usage. One can either wait and report a complete bin, albeit delayed, or compute and send updates.
Solution (contd.)
• The typical solutions are:
– For event delay: wait (buffer).
– For event duration: prorate events across time slots.
• Introduce the concept of a BinStream, an abstraction over the DStream, which needs:
– A function to extract the time fields from the records.
– A function to prorate the records.
Note that this can be trivially achieved by using the ‘window’ functionality, by having the batch interval equal to the time series interval and the window size equal to the maximum possible delay.
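The two functions a BinStream needs can be sketched in plain Python, without Spark. This is not the BinStream implementation, which the slides do not show. The names `prorate` and `bin_batch`, the record fields, and the choice to split an event's value in proportion to its time overlap with each bin are all assumptions for illustration.

```python
from collections import defaultdict


def prorate(start, end, value, bin_seconds):
    """Split value across bins in proportion to overlap with [start, end)."""
    if end <= start:
        # Zero-duration event: everything goes to the bin containing start.
        return {start - start % bin_seconds: float(value)}
    shares = {}
    bin_start = start - start % bin_seconds
    while bin_start < end:
        overlap = min(end, bin_start + bin_seconds) - max(start, bin_start)
        shares[bin_start] = value * overlap / (end - start)
        bin_start += bin_seconds
    return shares


def bin_batch(records, extract_times, prorate_record, bin_seconds):
    """Apply the two user-supplied functions to one batch of records."""
    totals = defaultdict(float)
    for record in records:
        start, end = extract_times(record)
        shares = prorate_record(record, start, end, bin_seconds)
        for bin_start, share in shares.items():
            totals[bin_start] += share
    return dict(totals)


# Hypothetical flow records: the first spans three 5-minute bins.
records = [
    {"start": 250, "end": 650, "bytes": 400},
    {"start": 310, "end": 320, "bytes": 30},
]
usage = bin_batch(
    records,
    extract_times=lambda r: (r["start"], r["end"]),
    prorate_record=lambda r, s, e, b: prorate(s, e, r["bytes"], b),
    bin_seconds=300,
)
```

Passing the time extractor and the proration rule as functions keeps the binning logic independent of the record format, which matches the abstraction described above.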
Problems / Solutions
• window == wait & buffer.
– This has two issues:
A. It needs memory for buffering.
B. Downstream needs to wait for the result (or any part of it).
• BinStream provides two additional options:
– It removes the delay in getting partial results by regularly sending the latest snapshots for the old time slots. This does not solve the memory problem, and it increases the processing load.
– If the client can handle partial results, i.e. if it can aggregate partial results, it can receive updates to the old bins. This reduces the memory needed by the Spark Streaming application.
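The difference between the two options can be seen from the client's side. The Python sketch below assumes updates arrive as dictionaries mapping a bin start to a value; the class name `BinClient` and both method names are hypothetical.

```python
from collections import defaultdict


class BinClient:
    """Downstream consumer of per-bin results (illustrative sketch only)."""

    def __init__(self):
        self.totals = defaultdict(float)

    def apply_snapshot(self, snapshot):
        """Option 1: each message carries the latest full value of a bin."""
        for bin_start, total in snapshot.items():
            self.totals[bin_start] = total

    def apply_delta(self, delta):
        """Option 2: each message carries only the new contribution to a bin."""
        for bin_start, amount in delta.items():
            self.totals[bin_start] += amount


# Hypothetical output of three consecutive batches, in both styles.
snapshots = [{0: 50.0, 300: 10.0}, {0: 55.0}, {300: 30.0, 600: 1.0}]
deltas = [{0: 50.0, 300: 10.0}, {0: 5.0}, {300: 20.0, 600: 1.0}]

snapshot_client = BinClient()
for message in snapshots:
    snapshot_client.apply_snapshot(message)

delta_client = BinClient()
for message in deltas:
    delta_client.apply_delta(message)
```

Both clients end with the same totals. With snapshots the streaming side must keep the running value of every open bin, while with deltas that state moves to the client, which is where the memory saving comes from.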
Limitations.
• The number of time series slots for which the updates can
be generated is fixed, basically governed by the event
delay characteristics.
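One way to read this limitation: if events arrive at most some known delay after they occur, only a fixed number of recent bins can still receive updates. The Python sketch below assumes a known maximum delay and ignores event duration; the function name `open_bins` and the numbers used are hypothetical.

```python
def open_bins(current_bin, max_delay_seconds, bin_seconds):
    """Bins that can still receive updates while current_bin is being filled.

    Assumes every event arrives within max_delay_seconds of its timestamp.
    """
    older = -(-max_delay_seconds // bin_seconds)  # ceiling division
    return [current_bin - i * bin_seconds for i in range(older + 1)]


# With 5-minute bins and a 10-minute maximum delay, three bins stay open.
updatable = open_bins(current_bin=900, max_delay_seconds=600, bin_seconds=300)
```

An event older than the oldest open bin would have no slot to update, so the maximum delay has to be chosen from the observed delay characteristics of the sources.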
THANK YOU!