
Introducing Spring XD
Mark Pollack, Sr. Software Engineer, Pivotal
© 2014 Pivotal
Spring XD
XD = eXtreme Data
What is a Big Data Application?
Big Data Architecture
[Diagram: data sources (Files, Social, Sensors, Mobile); Spring XD handling Ingest, Stream Processing, Analytics, Workflow Orchestration, Export and Predictive Modeling; a Master Dataset; Batch Views and Realtime Views; Spring Boot applications.]
Lambda Architecture
[Diagram: the same components as the previous diagram, grouped into a Batch Layer (Master Dataset, Batch Views), a Speed Layer (Realtime Views) and a Serving Layer.]
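The Lambda Architecture's query path can be sketched in a few lines of plain Java. This is a minimal illustration under assumptions, not part of Spring XD: counting events per key stands in for any view computation, and in-memory maps stand in for the batch, realtime and serving stores.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch: batch view recomputed from the master dataset, realtime view updated
// per event, and a serving-layer query that merges both. Names are hypothetical.
class LambdaSketch {

    // Batch layer: recompute the whole view from the immutable master dataset.
    static Map<String, Long> batchView(List<String> masterDataset) {
        Map<String, Long> view = new HashMap<>();
        for (String key : masterDataset) {
            view.merge(key, 1L, Long::sum);
        }
        return view;
    }

    // Speed layer: incrementally update the realtime view as each event arrives.
    static void onEvent(Map<String, Long> realtimeView, String key) {
        realtimeView.merge(key, 1L, Long::sum);
    }

    // Serving layer: answer a query by combining the batch and realtime views.
    static long query(Map<String, Long> batch, Map<String, Long> realtime, String key) {
        return batch.getOrDefault(key, 0L) + realtime.getOrDefault(key, 0L);
    }

    public static void main(String[] args) {
        Map<String, Long> batch = batchView(List.of("#java", "#spring", "#java"));
        Map<String, Long> realtime = new HashMap<>();
        onEvent(realtime, "#java"); // arrived after the last batch run
        System.out.println(query(batch, realtime, "#java")); // 3
    }
}
```

A full implementation would also discard the realtime entries once a fresh batch view covers them; the sketch leaves that bookkeeping out.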
GemFire XD
[Diagram: the Lambda Architecture diagram again, with GemFire XD added.]
Spring IO Platform
Spring XD 10,000 ft view
[Diagram: Spring XD with Files, Sensors, Social and Mobile data sources.]
Streams
▪ Sources: HTTP, Tail, File, Mail, Twitter, GemFire, Syslog, TCP, UDP, JMS, RabbitMQ, MQTT, Trigger, Reactor TCP/UDP
▪ Processors: Filter, Transformer, Object-to-JSON, JSON-to-Tuple, Splitter, Aggregator, HTTP Client, Groovy Scripts, Java Code, JPMML Evaluator
▪ Sinks: File, HDFS, JDBC, TCP, Log, Mail, RabbitMQ, GemFire, Splunk, MQTT, Dynamic Router, Counters
Streams
How can we make this easier?
http | filter | file
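The pipe-style definition above reads like a Unix pipeline: messages from an `http` source pass through a `filter` processor and land in a `file` sink. The plain-Java sketch below mimics that flow to show the idea. It is not Spring XD code: the `run` helper is hypothetical, and in-memory lists stand in for both the HTTP input and the file.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Predicate;

// Sketch of "source | processor | sink" composition with java.util.function types.
class PipeSketch {

    // Hypothetical helper: push each payload from the source through the filter into the sink.
    static void run(List<String> source, Predicate<String> filter, Consumer<String> sink) {
        for (String payload : source) {
            if (filter.test(payload)) {
                sink.accept(payload); // only payloads that pass the filter reach the sink
            }
        }
    }

    public static void main(String[] args) {
        List<String> received = List.of("{\"level\":\"INFO\"}", "{\"level\":\"ERROR\"}"); // stands in for http
        List<String> fileLines = new ArrayList<>();                                       // stands in for file
        run(received, p -> p.contains("ERROR"), fileLines::add);
        System.out.println(fileLines); // [{"level":"ERROR"}]
    }
}
```

Each stage only sees a message and hands it on, which is what lets a DSL line like the one above stay so short.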
Taps
▪ “Listen” to data on another stream
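A tap can be pictured as a wiretap on a stream: the stream's own pipeline is unchanged, and each tap receives a copy of every message. The sketch below shows that idea in plain Java. The `Stream` class and `addTap` method are hypothetical and not part of Spring XD, which expresses taps in its stream DSL instead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Sketch of a tap: taps get copies of messages without changing the primary flow.
class TapSketch {

    static final class Stream {
        private final Consumer<String> sink;
        private final List<Consumer<String>> taps = new ArrayList<>();

        Stream(Consumer<String> sink) {
            this.sink = sink;
        }

        // A tap subscribes to the stream without altering its definition.
        void addTap(Consumer<String> tap) {
            taps.add(tap);
        }

        void send(String message) {
            sink.accept(message);             // primary pipeline
            for (Consumer<String> tap : taps) {
                tap.accept(message);          // copy for each tap
            }
        }
    }

    public static void main(String[] args) {
        List<String> hdfs = new ArrayList<>();
        int[] count = {0};
        Stream tweets = new Stream(hdfs::add);
        tweets.addTap(m -> count[0]++);       // e.g. a counter fed by a tap
        tweets.send("tweet-1");
        tweets.send("tweet-2");
        System.out.println(hdfs.size() + " stored, " + count[0] + " counted"); // 2 stored, 2 counted
    }
}
```

A tap added later only sees messages sent after it subscribes; earlier messages are not replayed.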
Analytics
▪ Counters and Gauges
  • Simple & Field Value Counter
    - How many tweets for #java?
  • Aggregate Counter
    - How many tweets for #java per week/day/hour?
  • Gauge & Rich Gauge
    - How many requests per minute?
▪ Abstract API, with implementations for:
  • In-Memory
  • Redis
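The counter and gauge types above can be sketched with in-memory data structures, in the spirit of the In-Memory implementation. The class and method names below are hypothetical and do not reproduce Spring XD's API. Hourly buckets are an assumed granularity for the aggregate counter, and in this sketch the rich gauge keeps the last value plus min, max and mean.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;
import java.util.HashMap;
import java.util.Map;

// In-memory sketches of the analytics types; names are hypothetical.
class AnalyticsSketch {

    // Simple counter: one running total.
    static final class Counter {
        private long value;
        void increment() { value++; }
        long value() { return value; }
    }

    // Field value counter: one total per distinct field value (e.g. a hashtag).
    static final class FieldValueCounter {
        private final Map<String, Long> counts = new HashMap<>();
        void increment(String fieldValue) { counts.merge(fieldValue, 1L, Long::sum); }
        long count(String fieldValue) { return counts.getOrDefault(fieldValue, 0L); }
    }

    // Aggregate counter: totals bucketed by hour; day or week totals would sum hour buckets.
    static final class AggregateCounter {
        private final Map<Instant, Long> hourly = new HashMap<>();
        void increment(Instant when) {
            hourly.merge(when.truncatedTo(ChronoUnit.HOURS), 1L, Long::sum);
        }
        long countInHour(Instant when) {
            return hourly.getOrDefault(when.truncatedTo(ChronoUnit.HOURS), 0L);
        }
    }

    // Rich gauge: last value plus running min, max and mean.
    static final class RichGauge {
        private double last, min = Double.POSITIVE_INFINITY, max = Double.NEGATIVE_INFINITY, sum;
        private long samples;
        void set(double v) {
            last = v;
            min = Math.min(min, v);
            max = Math.max(max, v);
            sum += v;
            samples++;
        }
        double last() { return last; }
        double min() { return min; }
        double max() { return max; }
        double mean() { return samples == 0 ? 0.0 : sum / samples; }
    }

    public static void main(String[] args) {
        FieldValueCounter hashtags = new FieldValueCounter();
        hashtags.increment("#java");
        hashtags.increment("#java");
        hashtags.increment("#spring");
        System.out.println(hashtags.count("#java")); // 2

        RichGauge requestsPerMinute = new RichGauge();
        requestsPerMinute.set(10);
        requestsPerMinute.set(30);
        System.out.println(requestsPerMinute.mean()); // 20.0
    }
}
```

Because every type hides behind a small increment/read surface, the same interface could be backed by Redis instead of maps, which is the point of the abstract API.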
▪ Predictive Models
  • Is this transaction fraudulent?
▪ Based on the JPMML Evaluator
  • Wide range of model types
▪ Interoperable with R, Rattle, KNIME, RapidMiner
Jobs
CSV to JDBC
FTP to HDFS
JDBC to HDFS
HDFS to JDBC
HDFS to MongoDB
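Spring XD jobs run on Spring Batch, whose chunk-oriented steps read items one at a time, optionally process them, and write them in chunks. The sketch below mimics that loop for a CSV-to-JDBC style job in plain Java. It does not use Spring Batch's `ItemReader`/`ItemWriter` types, the `name,age` schema is hypothetical, and a list of chunks stands in for JDBC batch inserts.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Sketch of a chunk-oriented read -> process -> write loop; names are hypothetical.
class CsvToTableJob {

    // One parsed CSV record (hypothetical schema: name,age).
    static final class Person {
        final String name;
        final int age;
        Person(String name, int age) { this.name = name; this.age = age; }
    }

    // Returns the number of items written; the writer receives one chunk at a time.
    static int run(Iterable<String> csvLines, int chunkSize, Consumer<List<Person>> writer) {
        if (chunkSize < 1) throw new IllegalArgumentException("chunkSize must be >= 1");
        int written = 0;
        List<Person> chunk = new ArrayList<>();
        for (String line : csvLines) {
            if (line.trim().isEmpty()) continue;                  // reader: skip blank lines
            String[] cols = line.split(",");
            chunk.add(new Person(cols[0].trim(), Integer.parseInt(cols[1].trim()))); // processor
            if (chunk.size() == chunkSize) {
                writer.accept(chunk);                              // writer: one chunk per "transaction"
                written += chunk.size();
                chunk = new ArrayList<>();
            }
        }
        if (!chunk.isEmpty()) {                                    // flush the last, partial chunk
            writer.accept(chunk);
            written += chunk.size();
        }
        return written;
    }

    public static void main(String[] args) {
        List<String> csv = List.of("alice,30", "bob,25", "carol,41");
        List<List<Person>> table = new ArrayList<>();              // stands in for JDBC inserts
        int count = run(csv, 2, table::add);
        System.out.println(count + " rows in " + table.size() + " chunks"); // 3 rows in 2 chunks
    }
}
```

Writing in chunks rather than per row is what lets a batch framework commit, and restart, at chunk boundaries.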
Spring XD Runtime
[Diagram, three builds: the XD Shell sends HTTP POST /streams/aStream “M1 | M2” to the XD Admin tier (three XD Admins, one of them the leader), which coordinates through ZooKeeper, where container state is also kept. Two XD Containers start out empty; module M1 is then deployed into a Spring application context in one container, and module M2 into the other. The containers exchange messages over the Data Transport.]
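The leader admin's job in these diagrams, turning “M1 | M2” into module deployments on containers, can be sketched in plain Java. This is an illustration under assumptions: the round-robin policy and the class and method names are hypothetical, a map stands in for the deployment state kept in ZooKeeper, and module names within one definition are assumed to be unique.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Sketch: split a stream definition into modules and assign each to a container.
class DeploySketch {

    // "M1 | M2" -> [M1, M2]
    static List<String> modules(String definition) {
        return Arrays.stream(definition.split("\\|"))
                .map(String::trim)
                .filter(s -> !s.isEmpty())
                .collect(Collectors.toList());
    }

    // Round-robin assignment; returns module -> container (assumes unique module names).
    static Map<String, String> assign(List<String> modules, List<String> containers) {
        if (containers.isEmpty()) {
            throw new IllegalStateException("no containers available");
        }
        Map<String, String> assignment = new LinkedHashMap<>();
        for (int i = 0; i < modules.size(); i++) {
            assignment.put(modules.get(i), containers.get(i % containers.size()));
        }
        return assignment;
    }

    public static void main(String[] args) {
        List<String> containers = List.of("container-1", "container-2");
        System.out.println(assign(modules("M1 | M2"), containers));
        // {M1=container-1, M2=container-2}
    }
}
```

Keeping the assignment in a shared store, rather than in the admin's memory, is what allows another admin to take over as leader.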
Predictive Models
Concepts
▪ Model
  • A parameterized algorithm
▪ Model Building
  • Derive a parameterized algorithm (the model) from the data
  • Slow: done offline, as a batch process, because of the amount of data involved
▪ Model Scoring
  • Use the model to predict new information
  • Fast: can be done as part of stream processing
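The split between model building and model scoring can be illustrated with logistic regression as the parameterized algorithm. This is a minimal sketch under assumptions: the single feature (transaction amount in thousands), the training data and the gradient-descent settings are made up. Spring XD itself scores models through the JPMML Evaluator rather than code like this.

```java
// Sketch: building = many passes over all data (slow, batch); scoring = one dot product (fast).
class ModelSketch {

    // The model: a parameterized algorithm (weights and bias of a logistic function).
    static final class Model {
        final double[] weights;
        final double bias;

        Model(double[] weights, double bias) {
            this.weights = weights;
            this.bias = bias;
        }

        // Model scoring: cheap per-record arithmetic, suitable inside a stream.
        double score(double[] features) {
            double z = bias;
            for (int i = 0; i < weights.length; i++) {
                z += weights[i] * features[i];
            }
            return 1.0 / (1.0 + Math.exp(-z)); // probability of the positive class
        }
    }

    // Model building: batch gradient descent on log-loss over the whole labeled dataset.
    static Model build(double[][] xs, int[] labels, int epochs, double learningRate) {
        int n = xs.length;
        int d = xs[0].length;
        double[] w = new double[d];
        double b = 0.0;
        for (int epoch = 0; epoch < epochs; epoch++) {
            Model current = new Model(w.clone(), b);
            double[] gradW = new double[d];
            double gradB = 0.0;
            for (int i = 0; i < n; i++) {
                double error = current.score(xs[i]) - labels[i];
                for (int j = 0; j < d; j++) {
                    gradW[j] += error * xs[i][j];
                }
                gradB += error;
            }
            for (int j = 0; j < d; j++) {
                w[j] -= learningRate * gradW[j] / n;
            }
            b -= learningRate * gradB / n;
        }
        return new Model(w, b);
    }

    public static void main(String[] args) {
        // Hypothetical training data: amount in thousands; label 1 = fraudulent.
        double[][] xs = {{0.1}, {0.2}, {0.3}, {0.8}, {0.9}, {1.0}};
        int[] fraud = {0, 0, 0, 1, 1, 1};
        Model model = build(xs, fraud, 5000, 1.0);   // offline, batch
        double p = model.score(new double[] {0.95}); // per event, in-stream
        System.out.printf("P(fraudulent) = %.3f%n", p);
    }
}
```

Building touches every record thousands of times; scoring touches one record once, which is why the two steps land in different parts of the architecture.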
PMML
▪ Predictive Model Markup Language
▪ XML interchange format for analytical models
▪ From the Data Mining Group, http://www.dmg.org
▪ Describes both data pre-processing and the models themselves
▪ Supported by statistics and data mining tools
  • R/Rattle, SAS Enterprise Miner, SPSS, Weka
▪ Java Evaluator API
  • JPMML-Evaluator project
  • Provides model scoring
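To show what a PMML document carries, the sketch below embeds a tiny, made-up linear `RegressionModel` and scores it with the JDK's built-in DOM parser as the intercept plus the sum of coefficient × input. This hand-rolled evaluator is for illustration only. It is not the JPMML-Evaluator API, which handles many more model types and the full specification.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

// Hand-rolled scoring of a linear PMML RegressionModel; the model itself is hypothetical.
class PmmlSketch {

    static final String PMML = String.join("\n",
        "<PMML version=\"4.1\" xmlns=\"http://www.dmg.org/PMML-4_1\">",
        "  <Header description=\"hypothetical example\"/>",
        "  <DataDictionary numberOfFields=\"2\">",
        "    <DataField name=\"amount\" optype=\"continuous\" dataType=\"double\"/>",
        "    <DataField name=\"risk\" optype=\"continuous\" dataType=\"double\"/>",
        "  </DataDictionary>",
        "  <RegressionModel functionName=\"regression\">",
        "    <MiningSchema>",
        "      <MiningField name=\"amount\"/>",
        "      <MiningField name=\"risk\" usageType=\"predicted\"/>",
        "    </MiningSchema>",
        "    <RegressionTable intercept=\"0.5\">",
        "      <NumericPredictor name=\"amount\" coefficient=\"0.25\"/>",
        "    </RegressionTable>",
        "  </RegressionModel>",
        "</PMML>");

    static double score(String pmml, Map<String, Double> inputs) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(pmml.getBytes(StandardCharsets.UTF_8)));
            Element table = (Element) doc.getElementsByTagName("RegressionTable").item(0);
            if (table == null) throw new IllegalArgumentException("no RegressionTable");
            double result = Double.parseDouble(table.getAttribute("intercept"));
            NodeList predictors = table.getElementsByTagName("NumericPredictor");
            for (int i = 0; i < predictors.getLength(); i++) {
                Element p = (Element) predictors.item(i);
                Double value = inputs.get(p.getAttribute("name"));
                if (value == null) throw new IllegalArgumentException("missing input " + p.getAttribute("name"));
                result += Double.parseDouble(p.getAttribute("coefficient")) * value;
            }
            return result;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("invalid PMML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(score(PMML, Map.of("amount", 4.0))); // 1.5
    }
}
```

Because the model travels as XML, a tool such as R/Rattle can build it offline and a Java process can score it without sharing any code.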
Distributed, Fault-Tolerant Runtime
Spring XD – Runtime – Fault Tolerance
[Diagram sequence: starting from the deployed stream “M1 | M2” (M1 and M2 in two XD Containers, three XD Admins with one leader, container state in ZooKeeper), the container running M1 fails and M1 is redeployed to the surviving container alongside M2. One XD Admin then goes away while one of the remaining two stays leader. New XD Containers join until there are three, and a third XD Admin is added again. Finally, HTTP POST /streams/aStream “M3 | M4” deploys modules M3 and M4 across the containers, next to M1 and M2.]
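The container failover in this sequence can be sketched in plain Java. When a container goes away, the leader moves that container's modules to a surviving one, and modules with nowhere to go wait until a container joins. This is an illustration under assumptions: plain maps stand in for the container state kept in ZooKeeper, and the least-loaded placement policy and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch of leader-side failover bookkeeping; policy and names are hypothetical.
class FailoverSketch {

    final Map<String, List<String>> deployments = new LinkedHashMap<>(); // container -> modules
    final List<String> pending = new ArrayList<>();                      // modules awaiting a container

    void containerJoined(String container) {
        deployments.put(container, new ArrayList<>());
        List<String> waiting = new ArrayList<>(pending);
        pending.clear();
        waiting.forEach(this::deploy); // place any modules that had nowhere to run
    }

    void containerDeparted(String container) {
        List<String> orphaned = deployments.remove(container);
        if (orphaned != null) {
            orphaned.forEach(this::deploy); // redeploy onto the surviving containers
        }
    }

    // Place a module on the container currently running the fewest modules.
    void deploy(String module) {
        String target = null;
        for (Map.Entry<String, List<String>> e : deployments.entrySet()) {
            if (target == null || e.getValue().size() < deployments.get(target).size()) {
                target = e.getKey();
            }
        }
        if (target == null) {
            pending.add(module);
        } else {
            deployments.get(target).add(module);
        }
    }

    public static void main(String[] args) {
        FailoverSketch admin = new FailoverSketch();
        admin.containerJoined("c1");
        admin.containerJoined("c2");
        admin.deploy("M1");
        admin.deploy("M2");
        System.out.println(admin.deployments); // {c1=[M1], c2=[M2]}
        admin.containerDeparted("c1");
        System.out.println(admin.deployments); // {c2=[M2, M1]}
    }
}
```

The second printout matches the slides: after the container running M1 fails, the surviving container runs both M1 and M2. Containers that join later receive only new or waiting modules; the sketch does not rebalance existing ones.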