PreDatA -- Preparatory Data Analytics on Peta-Scale Machines


1
PreDatA -- Preparatory Data Analytics on Peta-Scale Machines
Fang Zheng
Hasan Abbasi
Jianting Cao
Jai Dayal
Jay Lofstead
Karsten Schwan
Matthew Wolf
CERCS Center
Georgia Tech
Qing Liu
Scott Klasky
Norbert Podhorszki
Oak Ridge National Laboratory
Ciprian Docan
Manish Parashar
Rutgers University
2
Background
• “Big Data” problem for peta-scale scientific applications
• Scientists desire:
  • Faster I/O
  • Faster data analysis
Application   Current runs   Length of one “short”   Checkpoint       Total analysis
              (K cores)      simulation (hours)      data size (TB)   data size (TB)
GTC           140            100                     16               56
GTS           40             72                      2                0.6
S3D           90             120                     1                90
XGC-1         220            80                      8                1
3
Preparatory Data Analytics
• Simulation output data needs to be prepared/pre-analyzed:
  • Indexing, annotation, reduction, sorting, layout re-organization, etc., to speed up future analysis and visualization
  • Latent data characterization for validation and monitoring
• Preparatory data analytics can be critical for the end-to-end performance of computational science discoveries
• Needle hasn’t grown as fast as the haystack!
Big Data
4
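Problem
• How to do preparatory data analytics?
  • Scalable
  • Efficient
• Conventional approaches: in-compute-node vs. offline
[Figure: the two conventional approaches. Legend: CN = compute node, S = simulation, F = pre-analytics. In-compute-node: S and F share the same compute nodes, in front of Storage. Offline: S runs on one set of compute nodes and F on another, connected through Storage.]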
5
PreDatA Middleware
[Figure: PreDatA middleware. Compute nodes (CN) run the simulation (S); nodes in the Staging Area run the pre-analytics (F) between the simulation and Storage.]
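The staging idea on this slide can be illustrated with a small, self-contained Python sketch. It is a toy model and not the PreDatA implementation: threads stand in for simulation ranks, a `queue.Queue` stands in for the staging-area buffer, a list stands in for storage, and the names `run_staged` and `pre_analytics` are hypothetical.

```python
import queue
import threading


def run_staged(num_sim_ranks, steps, pre_analytics):
    """Toy model: simulation ranks hand output to one staging worker."""
    staging = queue.Queue()   # stands in for the staging-area buffer
    storage = []              # stands in for the file system

    def simulation(rank):
        for step in range(steps):
            # Fake, unsorted output for this rank and time step.
            data = [(rank * 7 + step * 3 + i * 5) % 11 for i in range(4)]
            staging.put((rank, step, data))   # hand off, then keep computing

    def staging_worker():
        while True:
            item = staging.get()
            if item is None:                  # sentinel: all ranks are done
                break
            rank, step, data = item
            # Pre-analytics runs here, off the simulation's critical path.
            storage.append((rank, step, pre_analytics(data)))

    worker = threading.Thread(target=staging_worker)
    worker.start()
    sims = [threading.Thread(target=simulation, args=(r,))
            for r in range(num_sim_ranks)]
    for t in sims:
        t.start()
    for t in sims:
        t.join()
    staging.put(None)
    worker.join()
    return storage


results = run_staged(num_sim_ranks=3, steps=2, pre_analytics=sorted)
```

The simulation threads only pay for the `put` call; the sort happens in the staging worker, which is the property the staging area is meant to provide.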
6
PreDatA Architecture
• Asynchronous data movement with Datatap/EVPath
• Pluggable pre-data analytics
• User-defined operations
• Higher-level Data Services
• Integrated operations, separated from application codes with ADIOS
[Figure: PreDatA architecture. Compute node: Application, ADIOS. Staging node: Data Operation, High Level Data Service. Layer labels: High-level Abstraction, Data Operation, Buffer Management, Task Execution, Data Extraction, Data Movement, Data Shuffling.]
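A minimal sketch of what "pluggable, user-defined operations" could look like, assuming a staging node that keeps a registry of named callables and applies each one to every incoming data chunk. The class `StagingNode` and its methods are hypothetical illustrations, not the PreDatA or ADIOS API.

```python
from typing import Callable, Dict, List

# A user-defined operation takes one chunk of data and returns any result.
Operation = Callable[[List[float]], object]


class StagingNode:
    """Hypothetical staging node holding pluggable operations."""

    def __init__(self) -> None:
        self._ops: Dict[str, Operation] = {}

    def register(self, name: str, op: Operation) -> None:
        """Plug in a user-defined operation under a name."""
        self._ops[name] = op

    def process(self, chunk: List[float]) -> Dict[str, object]:
        """Apply every registered operation to one incoming chunk."""
        return {name: op(chunk) for name, op in self._ops.items()}


node = StagingNode()
node.register("min_max", lambda c: (min(c), max(c)))
node.register("sorted", sorted)
out = node.process([3.0, 1.0, 2.0])
```

The application code never sees the registered operations, which mirrors the slide's point that the operations are separated from the application.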
7
Driver Applications
• GTC (Gyrokinetic Toroidal Code)
  • Output: 16,384 cores output 260 GB / 120 seconds
  • Pre-analytics:
[Figure: GTC pre-analytics operations on the particle array: Sort (sorted array, BP writer, BP file), Bitmap Indexing (index file), Histogram (plotter), 2D Histogram (plotter).]
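The three kinds of operation named on this slide (sort, histogram, bitmap indexing) can be sketched in plain Python. This is an illustrative toy on four made-up particles, assuming equal-width bins and values inside `[lo, hi]`; the function names and the particle fields are hypothetical and do not come from GTC or PreDatA.

```python
def bin_of(value, lo, hi, bins):
    """Equal-width bin number; value == hi is clamped into the last bin."""
    width = (hi - lo) / bins
    return min(int((value - lo) / width), bins - 1)


def sort_particles(particles, key):
    """Sort particle records by one attribute."""
    return sorted(particles, key=lambda p: p[key])


def histogram(values, lo, hi, bins):
    """Count how many values fall into each bin."""
    counts = [0] * bins
    for v in values:
        counts[bin_of(v, lo, hi, bins)] += 1
    return counts


def bitmap_index(values, lo, hi, bins):
    """One bitmap per bin (a Python int); bit i is set if row i is in the bin."""
    bitmaps = [0] * bins
    for row, v in enumerate(values):
        bitmaps[bin_of(v, lo, hi, bins)] |= 1 << row
    return bitmaps


def rows_in_bins(bitmaps, bin_ids):
    """Answer a range query by OR-ing the bitmaps of the selected bins."""
    mask = 0
    for b in bin_ids:
        mask |= bitmaps[b]
    return [row for row in range(mask.bit_length()) if (mask >> row) & 1]


particles = [
    {"id": 3, "energy": 0.9},
    {"id": 1, "energy": 0.2},
    {"id": 2, "energy": 0.55},
    {"id": 0, "energy": 0.1},
]
ordered = sort_particles(particles, "id")
energies = [p["energy"] for p in ordered]
counts = histogram(energies, 0.0, 1.0, 4)
index = bitmap_index(energies, 0.0, 1.0, 4)
high_energy_rows = rows_in_bins(index, [2, 3])   # rows with energy >= 0.5
```

Once the index exists, a later query such as "all particles with energy of at least 0.5" reads two bitmaps instead of scanning the particle array.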
8
Driver Applications (Cont.)
• GTC@JaguarPF
• Performance & Cost
[Figure: Improvement (%), on a 0 to 6 scale, versus Number of Compute Cores (512, 1024, 2048, 4096, 8192, 16384). Series: Performance Improvement, CPU Seconds. Annotations: 98 CPU hours saved in a 30min run; 1,716,960 CPU hours saved in a year!]
CPU Seconds = Total Simulation Time x Total Number of Cores Used
1.2~3% improvement in cost (CPU seconds)
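The savings figures on this slide can be cross-checked with a few lines of arithmetic. The sketch below assumes that the yearly figure extrapolates the 30-minute saving to a machine running around the clock for 365 days, and that the 98 CPU hours refer to the 16,384-core run; neither assumption is stated on the slide. It works in CPU hours rather than CPU seconds, which only changes the unit.

```python
HOURS_PER_YEAR = 24 * 365


def cpu_hours(run_hours, cores):
    """Cost as defined on the slide: total run time times cores used."""
    return run_hours * cores


saved_per_run = 98      # CPU hours saved in one run (from the slide)
run_hours = 0.5         # a 30-minute run (from the slide)

# Assumption: back-to-back 30-minute runs for a whole year.
saved_per_year = saved_per_run / run_hours * HOURS_PER_YEAR

# Assumption: the 98 CPU hours were measured at 16,384 cores.
total_cost = cpu_hours(run_hours, 16384)
saved_percent = 100 * saved_per_run / total_cost
```

Under these assumptions `saved_per_year` comes out at 1,716,960 CPU hours, matching the slide, and `saved_percent` is about 1.2, the lower end of the quoted range.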
9
Driver Applications (Cont.)
• Pixie3D (3D MHD code)
  • Output: 16,384 cores, 32 GB / 100 seconds
  • 3D domain decomposition
  • Pre-analytics: diagnostics + layout re-organization
[Figure: Pixie3D pre-analytics. Output Data feeds Diagnostics (Particle, Toroidal flux, …, Momentum, Velocity divergence, Energy, Growth rate, Current, Maximum velocity) and Visualization, and Layout Re-organization with the BP writer and BP file.]
10x read performance improvement through layout re-organization
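Layout re-organization can be pictured as merging the per-process blocks of a domain decomposition into one contiguous global array, so that a later reader fetches a region with a few large reads instead of many small ones. The sketch below is a simplified illustration in 2D (the slide describes a 3D decomposition) with a hypothetical `reorganize` function; it is not the Pixie3D or PreDatA code and makes no claim about the measured 10x figure.

```python
def reorganize(blocks, grid, block_shape):
    """Merge per-process blocks into one row-major global array.

    blocks[(pi, pj)] is the row-major block owned by the process at
    process-grid coordinates (pi, pj).
    """
    (grid_rows, grid_cols), (b_rows, b_cols) = grid, block_shape
    n_cols = grid_cols * b_cols
    global_array = [None] * (grid_rows * b_rows * n_cols)
    for (pi, pj), block in blocks.items():
        for r in range(b_rows):
            # Offset of this block row inside the global row-major array.
            start = (pi * b_rows + r) * n_cols + pj * b_cols
            global_array[start:start + b_cols] = block[r * b_cols:(r + 1) * b_cols]
    return global_array


grid, block_shape = (2, 2), (2, 3)
# Each element encodes its global position as row * 10 + column.
blocks = {
    (pi, pj): [(pi * 2 + r) * 10 + (pj * 3 + c)
               for r in range(2) for c in range(3)]
    for pi in range(2) for pj in range(2)
}
merged = reorganize(blocks, grid, block_shape)
global_row_1 = merged[6:12]   # one contiguous read after re-organization
```

Before re-organization, reading global row 1 touches one segment in each of the two blocks along that row; afterwards it is a single contiguous slice.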
10
Current Work
• Programming interface/runtime system to enable in-situ workflow in the staging area
  • A collection of analysis operations organized as a workflow
  • Use ADIOS as the coupling interface
  • Treat analysis operations as black boxes
  • Runtime system:
    • Workflow scheduling
    • Data movement
    • Layout re-distribution
    • Fault tolerance
• Integration with deep analysis tools (Hadoop, ParaView/VisIt)
• Work with real-world applications
  • Pixie3D, GTC, GTS, LAMMPS, S3D
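As a rough illustration of "analysis operations organized as a workflow" and "workflow scheduling", the sketch below runs black-box operations in dependency order using Python's standard `graphlib` module (Python 3.9 or later). The function `run_workflow`, the operation names, and the reserved `"source"` node are hypothetical; the slide does not describe the actual runtime interface.

```python
from graphlib import TopologicalSorter


def run_workflow(operations, dependencies, source):
    """Run black-box operations in an order that respects dependencies.

    operations:   name -> callable taking a dict of upstream results
    dependencies: name -> set of upstream names ("source" is the raw data)
    """
    results = {"source": source}
    for name in TopologicalSorter(dependencies).static_order():
        if name == "source":
            continue
        inputs = {dep: results[dep] for dep in dependencies.get(name, ())}
        results[name] = operations[name](inputs)   # operation is opaque here
    return results


operations = {
    "sort": lambda inp: sorted(inp["source"]),
    "reduce": lambda inp: sum(inp["sort"]),
    "extremes": lambda inp: (inp["sort"][0], inp["sort"][-1]),
}
dependencies = {
    "sort": {"source"},
    "reduce": {"sort"},
    "extremes": {"sort"},
}
results = run_workflow(operations, dependencies, [4, 1, 3])
```

The scheduler only needs the dependency graph; it never inspects what an operation does, which is the sense in which the operations are treated as black boxes.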
11
Thank you!