Slides PDF - Spark Summit

tuplejump
The data engineering platform
A startup with a vision to
simplify data engineering and
empower the next generation
of data powered miracles!
Rohit, Founder and CEO
Satya, Founder and CTO
What do we do?
• The Tuplejump Platform provides ready-to-use, fully integrated, end-to-end data pipeline components to bring your idea to life fast!
• Most startups spend a lot of time studying and integrating various open source software (OSS). We have done this for you and assembled a system incorporating best-of-breed components.
• Our service engineers can assist you, or develop anything from your PoCs to entire solutions in record time.
The Data Pipeline
Stages: COLLECT, TRANSFORM, STORE, EXPLORE, PREDICT, VISUALIZE, plus OpsCenter
The Tuplejump Platform | COLLECT
Hydra
The tentacled framework for gathering high-volume, high-velocity data from
push sources (devices, page alerts, forms, etc.) and pull sources (web scraping,
blogs, social networks, etc.). Powered by Akka, it reacts to events on demand
and streams data to Spark for batch processing.
The Tuplejump Platform | TRANSFORM
Spark + Calliope
Use the friendly Spark API, with added features to easily read data from
and load data into Cassandra-powered storage.
Transform structured and unstructured data and join it with other data sets,
the simplest of them using drag and drop.
Join delta transformations on real-time feeds with existing data using
Spark Streaming.
The Tuplejump Platform | STORE
DStore - Cassandra++
Cassandra, enriched with our custom components to provide a single
storage mechanism for files, (un)structured data, generic data formats
like XML and JSON, etc.
Stargate
Stargate, a Lucene-powered indexing mechanism built right into C* to
allow for advanced indexing and searching of data.
SnackFS
SnackFS provides an HDFS-compatible, fat-driver distributed file system
over Cassandra.
The Tuplejump Platform | EXPLORE
Shark + Calliope
The Shark analytical engine shines at exploring large structured and
unstructured data sets.
With Calliope, you can have the most comprehensive reporting on data
from Cassandra in seconds or minutes, not hours.
Using Stargate indexes, you can filter large amounts of data in Cassandra,
saving those agonizing hours of batch jobs.
UberCube
Our patent-pending UberCube™ technology is a distributed OLAP
cube engine designed from the ground up for interactive exploration of
very large datasets.
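
To make the OLAP cube idea concrete, here is a minimal single-machine sketch in plain Python. It is not UberCube's implementation or API (the slides do not describe either); the names `build_cube` and `query` are hypothetical. It only illustrates the general technique of pre-aggregating a measure for every combination of dimensions so that a later lookup is a dictionary access rather than a scan.

```python
from collections import defaultdict
from itertools import combinations


def build_cube(rows, dimensions, measure):
    """Pre-aggregate `measure` for every subset of `dimensions`.

    Hypothetical helper for illustration only. Each cell is keyed by a
    frozenset of (dimension, value) pairs; the empty set is the grand total.
    """
    cube = defaultdict(float)
    for row in rows:
        for size in range(len(dimensions) + 1):
            for dims in combinations(dimensions, size):
                key = frozenset((d, row[d]) for d in dims)
                cube[key] += row[measure]
    return dict(cube)


def query(cube, **filters):
    """Look up the pre-computed total for the given dimension values."""
    return cube.get(frozenset(filters.items()), 0.0)


# Made-up sample events, purely for demonstration.
events = [
    {"device": "a", "region": "eu", "count": 3},
    {"device": "a", "region": "us", "count": 2},
    {"device": "b", "region": "eu", "count": 5},
]
cube = build_cube(events, ["device", "region"], "count")
print(query(cube))                 # grand total
print(query(cube, device="a"))     # roll-up over region
print(query(cube, region="eu"))    # roll-up over device
```

The trade-off in this sketch is the usual one for cubes: the build step does work proportional to the number of dimension subsets, in exchange for constant-time lookups afterwards. A distributed engine would partition both steps across nodes.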
The Tuplejump Platform | PREDICT
MinerBot
Building on Spark's ML framework.
EA and ANN/DL frameworks to take ML to the next level.
Drag-and-drop machine learning coming soon!
The Tuplejump Platform | VISUALIZE
Pissaro
A modern, game-changing data frontend providing highly
interactive and reactive visualization.
Not just reports!
The Tuplejump Platform | OpsCenter
OpsCenter
A deployment, monitoring and management framework built
specifically for deploying, maintaining and scaling our
platform without touching your server.
Click to cluster
One-click deployment to take your application from
development to cluster.
BigData PaaS
A PaaS is coming soon, so you can focus on your idea and let us
worry about the rest.
Tuplejump Advantage
• All the advantages of Spark + all the advantages of Cassandra + much more!
• Over 500x faster than traditional Hadoop solutions (much more in the case of filtered data)
• Shark + C* provide superfast ad hoc querying
• UberCube empowers sub-millisecond responses on very large cubes
• MinerBot provides ready-to-use ML algorithms, plus the possibility of much more complex algorithms and mechanisms than just MapReduce
• Ready to use, no integration required
• Easy to develop, deploy, monitor and scale
Case Study I - IoT
• Hydra was designed for IoT in the first place. It supports MQTT for messaging to and from devices/sensors and for communication between devices.
• Use message processing to raise alerts
• Use batch processing for advanced data analytics
• DStore provides highly scalable, write-optimized distributed storage for events and messages
• MinerBot powers anomaly detection and automation based on event analysis and patterns
• Build a multidimensional analytics cube on the event features with UberCube
• Visualize and understand the events in charts with Pissaro
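
As a rough illustration of the two processing styles in this case study, the plain-Python sketch below raises an alert per incoming message and separately computes a batch summary. It does not use Hydra, MinerBot or any MQTT client; the `Reading` type, the field names and the 80 °C limit are all assumptions made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Reading:
    """Hypothetical sensor message; field names are assumptions."""
    device_id: str
    temperature_c: float


def alerts(readings, limit_c=80.0):
    """Message processing: yield one alert per reading above the limit."""
    for r in readings:
        if r.temperature_c > limit_c:
            yield (f"ALERT {r.device_id}: {r.temperature_c:.1f} C "
                   f"exceeds {limit_c:.1f} C")


def mean_by_device(readings):
    """Batch processing: average temperature per device over all readings."""
    totals = {}
    for r in readings:
        total, count = totals.get(r.device_id, (0.0, 0))
        totals[r.device_id] = (total + r.temperature_c, count + 1)
    return {dev: total / count for dev, (total, count) in totals.items()}


# Made-up sample data, purely for demonstration.
sample = [Reading("s1", 70.0), Reading("s2", 85.5), Reading("s1", 90.0)]
for message in alerts(sample):
    print(message)
print(mean_by_device(sample))
```

In a real deployment the per-message path would run as events arrive, while the summary would run over stored history; here both operate on an in-memory list to keep the sketch self-contained.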
Case Study II - Advertising
• Hydra empowers high-volume, high-velocity data collection to gather page clicks, user events, user behaviour, etc.
• Event processing to trigger/handle real-time bidding (RTB)
• MinerBot to optimize ad-user matching based on previous success/failure records
• Pissaro to empower the advertiser dashboard and reports
Let's talk!