The Convergence of Cloud, Big Data and Mobile: What could happen in 5 years? Cloud Analytics Panel (IWCA Workshop) Judy Qiu Indiana University.


The Convergence of Cloud, Big Data and
Mobile: What could happen in 5 years?
Cloud Analytics Panel (IWCA Workshop)
Judy Qiu
Indiana University
Outline
• Motivations
  – 50 billion devices by 2020.
  – Academia and industry need advanced analytics on the data they have already collected.
• The role of Analytics in Cloud, Big Data and Mobile
  – A distributed runtime environment needs to integrate with community infrastructure that supports interoperable, sustainable and high-performance data analytics.
  – One solution is to converge the Apache Big Data Stack (ABDS) from industry with High Performance Cyberinfrastructure into well-defined and implemented common building blocks, providing sufficient richness in capabilities and productivity. HPC-ABDS has about 300 packages and aims to provide them in library form so that they can be reused by higher-level applications and tuned for a specific domain problem, such as Machine Learning.
Layer: Big Data ABDS | HPC, Cluster
Orchestration: Crunch, Tez, Cloud Dataflow | Kepler, Pegasus
Libraries: Mllib/Mahout, R, Python | Matlab, Eclipse, Apps
High Level Programming: Pig, Hive, Drill | Domain-specific Languages
Platform as a Service: App Engine, BlueMix, Elastic Beanstalk | XSEDE Software Stack
Languages: Java, Erlang, SQL, SparQL | Fortran, C/C++
Streaming: Storm, Kafka, Kinesis | (none listed)
Parallel Runtime: MapReduce | MPI/OpenMP/OpenCL
Coordination: Zookeeper | (none listed)
Caching: Memcached | (none listed)
Data Management: Hbase, Neo4J, MySQL | iRODS
Data Transfer: Sqoop | GridFTP
Scheduling: Yarn | Slurm
File Systems: HDFS, Object Stores | Lustre
Formats: Thrift, Protobuf | FITS, HDF
Virtualization: Openstack | Docker, SR-IOV
Infrastructure: CLOUDS | SUPERCOMPUTERS
Center label: HPC-ABDS Integrated Software
Other label: Data, Information, Knowledge and Wisdom
Comparison of current Data Analytics stack from Cloud and HPC infrastructure
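As a rough illustration of how the layer-by-layer comparison above could be used, the sketch below encodes a few of the layers as a lookup table and finds the counterpart tools on the other side of the stack. The names STACK_LAYERS and counterparts are hypothetical and are not part of HPC-ABDS; the pairings are an assumption read off the figure's layout.

```python
# Hypothetical encoding of a few layers from the comparison figure.
# Each entry maps a layer name to (Big Data ABDS tools, HPC/Cluster tools).
# The pairings are inferred from the figure layout, not from an official list.
STACK_LAYERS = {
    "Orchestration": (["Crunch", "Tez", "Cloud Dataflow"], ["Kepler", "Pegasus"]),
    "Parallel Runtime": (["MapReduce"], ["MPI/OpenMP/OpenCL"]),
    "Data Transfer": (["Sqoop"], ["GridFTP"]),
    "Scheduling": (["Yarn"], ["Slurm"]),
    "File Systems": (["HDFS", "Object Stores"], ["Lustre"]),
    "Streaming": (["Storm", "Kafka", "Kinesis"], []),  # no HPC entry in the figure
}


def counterparts(tool):
    """Return (layer, tools on the opposite side) for a tool, or None if unknown."""
    for layer, (abds, hpc) in STACK_LAYERS.items():
        if tool in abds:
            return layer, hpc
        if tool in hpc:
            return layer, abds
    return None


if __name__ == "__main__":
    for name in ("Yarn", "Lustre", "Kafka"):
        print(name, "->", counterparts(name))
```

A lookup like this only reflects which tools share a layer in the figure; it says nothing about whether two tools are drop-in replacements for each other.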