PreDatA -- Preparatory Data Analytics on Peta-Scale Machines
Fang Zheng
Hasan Abbasi
Jianting Cao
Jai Dayal
Jay Lofstead
Karsten Schwan
Matthew Wolf
CERCS Center
Georgia Tech
Qing Liu
Scott Klasky
Norbert Podhorszki
Oak Ridge National Laboratory
Ciprian Docan
Manish Parashar
Rutgers University
Background
“Big Data” problem for peta-scale scientific applications
Scientists desire:
Faster I/O
Faster data analysis
Application | Current runs (K cores) | Length of one “short” simulation (hours) | Checkpoint data size (TB) | Total analysis data size (TB)
GTC | 140 | 100 | 16 | 56
GTS | 40 | 72 | 2 | 0.6
S3D | 90 | 120 | 1 | 90
XGC-1 | 220 | 80 | 8 | 1
Preparatory Data Analytics
Simulation output data needs to be prepared or pre-analyzed:
Indexing, annotation, reduction, sorting, layout re-organization, etc., to speed up future analysis and visualization
Latent data characterization for validation and monitoring
Preparatory data analytics can be critical to the end-to-end performance of computational science discovery
Needle hasn’t grown as fast as the haystack!
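Latent data characterization and reduction often come down to computing small summaries that can be combined across nodes. As a minimal sketch (the `Summary` class and the sample chunks are hypothetical, not part of PreDatA), a mergeable count/sum/min/max summary lets each chunk be reduced where it lands and the partial results be merged later in any order:

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Summary:
    """Mergeable summary of one variable: count, sum, min, max."""
    count: int = 0
    total: float = 0.0
    minimum: float = float("inf")
    maximum: float = float("-inf")

    def add(self, values: Iterable[float]) -> "Summary":
        # Fold one chunk of raw values into this summary.
        for v in values:
            self.count += 1
            self.total += v
            self.minimum = min(self.minimum, v)
            self.maximum = max(self.maximum, v)
        return self

    def merge(self, other: "Summary") -> "Summary":
        # Merging two summaries matches summarizing the concatenated data
        # (up to floating-point rounding in the sum).
        return Summary(
            self.count + other.count,
            self.total + other.total,
            min(self.minimum, other.minimum),
            max(self.maximum, other.maximum),
        )

    @property
    def mean(self) -> float:
        return self.total / self.count if self.count else float("nan")


# Hypothetical output chunks from three compute nodes.
chunks = [[1.0, 2.0], [3.0], [-4.0, 10.0]]
overall = Summary()
for chunk in chunks:
    overall = overall.merge(Summary().add(chunk))
print(overall.count, overall.minimum, overall.maximum, overall.mean)  # 5 -4.0 10.0 2.4
```

Only four numbers per variable travel between nodes, which is the point of reducing before moving data.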
Problem
How can preparatory data analytics be performed in a way that is:
Scalable
Efficient
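Conventional approaches: in-compute-node vs. offline
[Figure: in the in-compute-node approach, the simulation (S) and the pre-analytics (F) run together on each compute node (CN) before data reaches storage; in the offline approach, the simulation writes raw output to storage and the pre-analytics run later on compute nodes that read it back. Legend: CN = compute node, S = simulation, F = pre-analytics.]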
PreDatA Middleware
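[Figure: the simulation (S) runs on compute nodes (CN) and sends its output to a separate staging area, where pre-analytics (F) run on staging nodes before the data is written to storage.]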
PreDatA Architecture
Asynchronous data movement with Datatap/EVPath
Pluggable preparatory data analytics
User-defined operations
Higher-level data services
Operations are integrated through ADIOS, keeping them separate from application code
[Figure: PreDatA architecture. Compute node: Application and ADIOS. Staging node: data operations, high-level data services, a high-level abstraction, buffer management, task execution, data extraction, data movement, and data shuffling.]
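To make “asynchronous data movement” plus “pluggable, user-defined operations” concrete, here is a single-process analogy. It is a sketch under loud assumptions, not the Datatap/EVPath or ADIOS API: a background thread and a `queue.Queue` stand in for the staging node, and `StagingArea`, `register`, and `write` are hypothetical names.

```python
import queue
import threading
from typing import Callable, Dict, List

# A user-defined operation takes one staged buffer and returns any result.
Operation = Callable[[List[float]], object]


class StagingArea:
    """Hypothetical stand-in for a staging node (not a real PreDatA API)."""

    def __init__(self) -> None:
        self._queue: "queue.Queue[object]" = queue.Queue()
        self._ops: Dict[str, Operation] = {}
        self.results: Dict[str, list] = {}
        self._worker = threading.Thread(target=self._run, daemon=True)

    def register(self, name: str, op: Operation) -> None:
        """Plug in an operation; call before start()."""
        self._ops[name] = op
        self.results[name] = []

    def start(self) -> None:
        self._worker.start()

    def write(self, buffer: List[float]) -> None:
        """Called by the 'simulation'; returns without waiting for analytics."""
        self._queue.put(list(buffer))  # copy so the caller may reuse its buffer

    def close(self) -> None:
        """Signal end of output and wait for staged work to drain."""
        self._queue.put(None)
        self._worker.join()

    def _run(self) -> None:
        # Background worker: apply every registered operation to each buffer.
        while True:
            item = self._queue.get()
            if item is None:
                break
            for name, op in self._ops.items():
                self.results[name].append(op(item))


staging = StagingArea()
staging.register("max", max)
staging.register("sorted", sorted)
staging.start()
for step in range(3):
    staging.write([float(step), 2.0 * step, -1.0])
staging.close()
print(staging.results["max"])  # [0.0, 2.0, 4.0]
```

The design point being illustrated is that `write` only enqueues, so the producer never blocks on analysis; in the real system the queue would be a network transfer to separate nodes.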
Driver Applications
GTC (Gyrokinetic Toroidal Code)
Output: 16,384 cores write 260 GB every 120 seconds
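The GTC output figure translates into a sustained I/O rate. A back-of-the-envelope check, assuming decimal units (1 GB = 10^9 bytes) and output spread evenly across cores:

```python
CORES = 16_384
OUTPUT_BYTES = 260e9      # 260 GB per output step (decimal GB assumed)
INTERVAL_S = 120          # one output step every 120 seconds

aggregate_rate = OUTPUT_BYTES / INTERVAL_S   # bytes per second, whole machine
per_core = OUTPUT_BYTES / CORES              # average bytes per core per step

print(f"aggregate: {aggregate_rate / 1e9:.2f} GB/s")  # 2.17 GB/s
print(f"per core:  {per_core / 1e6:.1f} MB per step")  # 15.9 MB per step
```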
Pre-analytics (figure): particle array -> sort -> sorted array -> BP writer -> BP file; bitmap indexing -> index file; histogram -> plotter; 2D histogram -> plotter
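A toy version of the GTC pre-analytics chain (sort, bitmap indexing, 1D and 2D histograms) fits in plain Python. This is a sketch under stated assumptions: particles are hypothetical `(position, velocity)` tuples, bins are equal-width, and each bitmap is an uncompressed Python integer used as a bitset; production bitmap indexes typically use compressed bitmaps instead.

```python
from typing import List, Sequence, Tuple

Particle = Tuple[float, float]  # hypothetical layout: (position, velocity)


def bin_of(value: float, lo: float, hi: float, nbins: int) -> int:
    """Equal-width bin index; out-of-range values are clamped to edge bins."""
    b = int((value - lo) / (hi - lo) * nbins)
    return min(max(b, 0), nbins - 1)


def bitmap_index(values: Sequence[float], lo: float, hi: float, nbins: int) -> List[int]:
    """One bitmap per bin: bit r is set when row r falls in that bin."""
    bitmaps = [0] * nbins
    for row, v in enumerate(values):
        bitmaps[bin_of(v, lo, hi, nbins)] |= 1 << row
    return bitmaps


def query(bitmaps: List[int], first_bin: int, last_bin: int) -> List[int]:
    """Rows whose value lies in bins first_bin..last_bin (inclusive)."""
    mask = 0
    for b in range(first_bin, last_bin + 1):
        mask |= bitmaps[b]
    return [r for r in range(mask.bit_length()) if (mask >> r) & 1]


def histogram(values: Sequence[float], lo: float, hi: float, nbins: int) -> List[int]:
    counts = [0] * nbins
    for v in values:
        counts[bin_of(v, lo, hi, nbins)] += 1
    return counts


def histogram_2d(xs: Sequence[float], ys: Sequence[float],
                 xr: Tuple[float, float], yr: Tuple[float, float],
                 nx: int, ny: int) -> List[List[int]]:
    grid = [[0] * ny for _ in range(nx)]
    for x, y in zip(xs, ys):
        grid[bin_of(x, xr[0], xr[1], nx)][bin_of(y, yr[0], yr[1], ny)] += 1
    return grid


particles: List[Particle] = [(0.9, -0.2), (0.1, 0.5), (0.5, 0.6), (0.3, -0.7)]
sorted_particles = sorted(particles, key=lambda p: p[0])  # sort by position
positions = [p[0] for p in sorted_particles]
velocities = [p[1] for p in sorted_particles]

index = bitmap_index(positions, 0.0, 1.0, 4)
print(query(index, 2, 3))                       # rows with position in [0.5, 1.0): [2, 3]
print(histogram(velocities, -1.0, 1.0, 4))      # [1, 1, 0, 2]
print(histogram_2d(positions, velocities, (0.0, 1.0), (-1.0, 1.0), 2, 2))
```

Sorting first means rows that share a position bin are adjacent, which is what makes the index useful for later range reads.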
Driver Applications (Cont.)
GTC@JaguarPF
[Figure: performance and cost improvement (%) vs. number of compute cores (512 to 16,384), plotting performance improvement and CPU seconds.]
98 CPU hours saved in a 30-minute run
1,716,960 CPU hours saved in a year!
CPU seconds = total simulation time × total number of cores used
1.2% to 3% improvement in cost (CPU seconds)
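The 98-CPU-hour and 1,716,960-CPU-hour figures are consistent with simple extrapolation. Assuming 30-minute runs back to back for a 365-day year, the per-run saving scales to the yearly figure; and if (an assumption) the 98 hours refer to the 16,384-core run, they are about 1.2% of that run's cost under the CPU-seconds formula, the low end of the quoted range.

```python
HOURS_PER_YEAR = 365 * 24     # 8,760 hours (assumes a 365-day year)
RUN_HOURS = 0.5               # one 30-minute run
SAVED_PER_RUN = 98            # CPU hours saved per run, from the chart
CORES = 16_384                # assumption: the run the 98 h refers to

runs_per_year = HOURS_PER_YEAR / RUN_HOURS      # back-to-back runs
saved_per_year = SAVED_PER_RUN * runs_per_year

# Cost = total simulation time x total number of cores, expressed in CPU hours.
cost_per_run = RUN_HOURS * CORES
saving_pct = 100 * SAVED_PER_RUN / cost_per_run

print(f"{saved_per_year:,.0f} CPU hours saved per year")  # 1,716,960 CPU hours saved per year
print(f"{saving_pct:.1f}% of one run's cost")              # 1.2% of one run's cost
```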
Driver Applications (Cont.)
Pixie3D (3D MHD code)
Output: 16,384 cores write 32 GB every 100 seconds
3D domain decomposition
Pre-analytics: diagnostics + layout re-organization
[Figure: Pixie3D pre-analytics. Output data feeds diagnostics (particle, toroidal flux, …, momentum, velocity divergence, energy, growth rate, current, maximum velocity) used for visualization, and a layout re-organization step whose result goes through the BP writer to a BP file.]
10x read performance improvement through layout re-organization
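The 10x read improvement is the authors' measurement; the sketch below only illustrates the idea of layout re-organization under a simplifying assumption: each process in a 3D domain decomposition writes its block as a separate contiguous chunk, and re-organization merges the chunks into one global array in row-major order, so a reader that wants a whole plane issues one contiguous read. The helper names, sizes, and test values are hypothetical, and this is not the ADIOS BP format.

```python
from itertools import product
from typing import Dict, List, Tuple

Shape = Tuple[int, int, int]
Blocks = Dict[Tuple[int, int, int], List[float]]


def make_blocks(proc_grid: Shape, local: Shape) -> Blocks:
    """Hypothetical per-process output, each block a flat row-major list.

    The value at global point (gi, gj, gk) is gi*100 + gj*10 + gk,
    which makes it easy to check where each value ends up.
    """
    lx, ly, lz = local
    blocks: Blocks = {}
    for bi, bj, bk in product(*(range(p) for p in proc_grid)):
        blocks[(bi, bj, bk)] = [
            float((bi * lx + i) * 100 + (bj * ly + j) * 10 + (bk * lz + k))
            for i, j, k in product(range(lx), range(ly), range(lz))
        ]
    return blocks


def reorganize(blocks: Blocks, proc_grid: Shape, local: Shape) -> List[float]:
    """Merge per-process blocks into one global array in row-major order."""
    px, py, pz = proc_grid
    lx, ly, lz = local
    gy, gz = py * ly, pz * lz
    out = [0.0] * (px * lx * gy * gz)
    for (bi, bj, bk), data in blocks.items():
        for i, j, k in product(range(lx), range(ly), range(lz)):
            gi, gj, gk = bi * lx + i, bj * ly + j, bk * lz + k
            out[(gi * gy + gj) * gz + gk] = data[(i * ly + j) * lz + k]
    return out


proc_grid, local = (2, 2, 2), (2, 3, 4)   # global array is 4 x 6 x 8
global_array = reorganize(make_blocks(proc_grid, local), proc_grid, local)

# After re-organization the plane gi = 3 is one contiguous slice,
# instead of four pieces spread across the 2 x 2 blocks that own it.
gy, gz = 6, 8
plane = global_array[3 * gy * gz:(3 + 1) * gy * gz]
print(plane[:3], len(plane))  # [300.0, 301.0, 302.0] 48
```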
Current Work
Programming interface and runtime system to enable in-situ workflows in the staging area
A collection of analysis operations organized as a workflow
Use ADIOS as the coupling interface
Treat analysis operations as black boxes
Runtime system:
Workflow scheduling
Data movement
Layout re-distribution
Fault tolerance
Integration with deep analysis tools (Hadoop, ParaView/VisIt)
Work with real-world applications
Pixie3D, GTC, GTS, LAMMPS, S3D
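As an illustration of scheduling black-box analysis operations as a workflow, the sketch below uses Python's standard `graphlib.TopologicalSorter` (Python 3.9+) to run operations in dependency order. The operation names, the `run_workflow` helper, and the "source" convention are hypothetical; the sketch is sequential and leaves out the data movement, layout re-distribution, and fault tolerance the runtime system would also handle.

```python
from graphlib import TopologicalSorter  # Python 3.9+
from typing import Callable, Dict, List


def run_workflow(
    ops: Dict[str, Callable[..., object]],
    deps: Dict[str, List[str]],
    source: object,
) -> Dict[str, object]:
    """Run black-box operations in dependency order.

    `deps` maps each operation name to the names whose outputs it consumes;
    the special name "source" stands for the data arriving from the simulation.
    """
    results: Dict[str, object] = {"source": source}
    for name in TopologicalSorter(deps).static_order():
        if name == "source":
            continue
        inputs = [results[d] for d in deps.get(name, [])]
        results[name] = ops[name](*inputs)
    return results


# Hypothetical operations; the scheduler only knows their names and inputs.
ops = {
    "sort": sorted,
    "index": lambda xs: {v: i for i, v in enumerate(xs)},
    "histogram": lambda xs: [sum(1 for x in xs if x < 0), sum(1 for x in xs if x >= 0)],
    "write": lambda xs, idx: f"wrote {len(xs)} values and {len(idx)} index entries",
}
deps = {
    "sort": ["source"],
    "index": ["sort"],
    "histogram": ["source"],
    "write": ["sort", "index"],
}
results = run_workflow(ops, deps, [3.0, -1.0, 2.0, -5.0])
print(results["write"])      # wrote 4 values and 4 index entries
print(results["histogram"])  # [2, 2]
```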
Thank you!