Switching to High Gear
Opportunities for Grand-scale
Real-time Parallel Simulations
Kalyan S. Perumalla, Ph.D.
Senior Research Staff Member
Oak Ridge National Laboratory
Adjunct Professor
Georgia Institute of Technology
IEEE DS-RT, Singapore
Oct 26, 2009
Main Theme
“Think Big… Really Big”
Computational Power…unprecedented potential…exploit
Simulation Scale…stretch imagination…new scopes
Confluence of Opportunities, Needs
• High-end computing power: Yes
• Large-scale scientific questions: Yes
• Scalable simulation methods: ???
Parallel Computing Power: It’s Coming
High-end computing…
Coming soon to a center near you!
Access to 1000s of cores…
for every parallel simulation researcher…
in just 2-3 years from now
Evidence of Growth in 10³-Core Systems
Now, all Top 500 are 10³-core or more!
Switching Gears
Gear   Decade   Processors
1      1980     10¹
2      1990     10²
3      2000     10³
4      2010     10⁴
5      2010     10⁵–10⁶
R      2020     …
Potential Areas for Discrete Event Execution on 10⁵–10⁶ Scale
(initial models scaling to 10³–10⁴ cores)
• Cyber infrastructure simulations
  – Internet protocols, peer-to-peer designs, …
• Epidemiological simulations
  – Disease spread models, mitigation strategies, …
• Social dynamics simulations
  – Pre- and post-operations campaigns, foreign policy, …
• Vehicular mobility simulations
  – Regional- or nation-scale, …
• Agent-based simulations
  – Behavioral exploration, complex compositions, …
• Sensor network simulations
  – Wide area monitoring, situational awareness, …
• Organization simulations
  – Command and control, business processes, …
• Logistics simulations
  – Supply chain processes, contingency analyses, …
If only we look harder…
• Many nation-scale and world-scale questions are becoming relevant
• New methods and methodologies are waiting to be discovered
Slippery Slopes
[Diagram: a slippery slope leading from the starting point of an experimental study down into full “gory detail”]
Abstractions
How do we abstract immense complexity?
Answer: It is very difficult until we experiment with the system at scale.
What do we mean by Gory Detail?
Cyber Security Example
• Network at large
– Topologies, bandwidths, latencies, link types, MAC protocols, TCP/IP, BGP, …
• Core systems
– Routers, databases, service level agreements, inter-AS relationships, …
• End systems
– Processor traits, disk traits, OS instances, daemons, services, S/W bugs, …
• “Heavy” applications and traffic
– Video (YouTube, …), VOIP, live streams; foreground, background
• Behavioral infusion
– Social nets (topologies, dynamics, agencies, advertisers), peer-to-peer
Example: Epidemiology or Computer Worm Propagation
• Typical dynamics model
  – Multiple variants exist, but qualitatively similar (a numerical sketch follows below):
    dI/dt = α I (S − I)
• Excellent fit, but post-facto (!)
  – Plot of collected data fits the curve
• Difficult as a predictive model
  – A great amount of detail is buried in α
• Gory detail needed for better predictive power
  – Interaction topology
  – Resource limitations
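To make the post-facto nature concrete, here is a minimal numerical sketch of the aggregate model; the parameter values are assumptions for illustration, and the point is that everything below the population level is collapsed into the single fitted rate α:

    /* Euler integration of dI/dt = alpha*I*(S - I): an illustrative
       sketch with assumed parameters, not a validated model. All the
       "gory detail" (topology, resources) hides inside alpha. */
    #include <stdio.h>

    int main(void) {
        double S = 1.0e6;     /* susceptible pool size (assumed) */
        double I = 1.0;       /* initially infected */
        double alpha = 1e-6;  /* fitted rate: hides all mechanism */
        double dt = 0.01;     /* time step */
        for (int step = 0; step <= 2000; step++) {
            if (step % 400 == 0)
                printf("t=%5.1f  I=%12.0f\n", step * dt, I);
            I += dt * alpha * I * (S - I);
        }
        return 0;
    }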
Slippery Slope: Cost and Time
• Cost to realize experimentation capability
• Time to reach experimentation capability
Our Research Organization in Discrete Event Runtimes and Applications
[Diagram: layered research organization]
• Decision-support solutions: Evacuation Decision Support; Automated Detection/Tracking Design & Analysis; Comm. Effects Design & Analysis; …
  – Customization, scenario generation, experimentation, visualization, core models
• Application simulations: Transportation Network, Sensor Network, Vehicular, Communication Network, Logistics, Enterprise, Social Network, Asynchronous Scientific, Multi-Scale, … Simulations
  – Feasibility demonstration, extensible frameworks, novel modeling methods, trade-offs (memory-computation, speed-accuracy), “enabling”, scalability
• Parallel/Distributed Discrete Event Simulation Engines
  – Model execution, synchronization, data integration, interoperability, …
  – Efficiency, correctness, robustness, usability, extensibility, integration
• Platforms: supercomputers, clusters, multi-cores, GPGPUs, …, PDAs
A Few of Our Current Areas, Projects
• State-level mobility: multi-million intersections and links
  – GARFIELD-EVAC: 10⁶–10⁷-link scenarios of FL, LA, …
• Epidemiological analyses: detailed, billion-entity dynamics
  – RCREDIF: 10⁹-individual infection scenarios
• Wireless radio signal estimation: multi-million-cell cluttered terrains
  – RCTLM: 3-D 10⁷-cell models simulated on 10⁴ cores
• Supercomputer design: designing next architectures by simulating on current ones
  – µΠ: performance prediction of 10⁶-core MPI programs on 10⁴ cores
• Internet security, protocol design: as-is instantiation of nodes and routers
  – NetWarp: high-fidelity Internet test-bed
• Populace’s cognitive behaviors: large-population cognition with connectionist networks
Scalable Experimentation for Cyber Security
NetWarp is our novel test-bed technology for highly scalable, detailed, rapid experimentation with cyber security and cyber infrastructures.
Cyber Experimentation Approaches
[Chart: system fidelity vs. scalability (10² to 10⁸) for cyber experimentation approaches. Hardware testbeds, emulation systems, and fully virtualized systems offer the highest fidelity at the lowest scalability; packet-level simulation, mixed-abstraction simulation, and aggregate models trade fidelity for scale. Execution speed ranges from sequential real-time (or faster) to parallel as-fast-as-possible. NetWarp targets high fidelity at high scalability.]
NetWarp Architecture
[Diagram: NetWarp architecture]
DOE-Sponsored Institute for Advanced Architectures and Algorithms
Need highly scalable simulation methods and methodologies to simulate next-generation architectures and algorithms on future supercomputing platforms…
“…catalyst for the co-design and development of architectures, algorithms, and applications to create synergy in their respective evolutions…”
μπ (MUPI) Performance Investigation System
• μπ = micro parallel performance investigator
  – Performance prediction for MPI, Portals, and other parallel applications
  – Actual application code is executed on the real hardware
  – The platform is simulated at large virtual scale
  – Timing is customized by a user-defined machine model
• Scale is the key differentiator
  – Target: 150,000 virtual cores
  – E.g., 150,000 virtual MPI ranks in a simulated scenario
• Based on µsik (micro simulator kernel)
  – Scalable PDES engine
  – TCP- or MPI-connected simulation kernels
Example: MPI application over μπ
• Modify the MPI include and recompile
  – Change #include <mpi.h> to #include <mupi.h>
• Relink to the mupi library
  – Instead of -lmpi, use -lmupi
• Run the modified MPI application (now a μπ simulation)
  – mpirun -np 4 test -nvp 32
    runs test with 32 virtual MPI ranks; the simulation uses 4 real cores
• μπ itself uses multiple real cores to run in parallel (a complete example is sketched below)
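For concreteness, here is a minimal MPI program rebuilt the way the slide describes; the program itself is ordinary MPI, while the build and run lines are illustrative assumptions (exact compiler flags depend on the local toolchain):

    /* hello_ranks.c: an ordinary MPI program rebuilt against mupi.
       Per the slide, only the include and the link flag change. */
    #include <stdio.h>
    #include <mupi.h>   /* was: #include <mpi.h> */

    int main(int argc, char **argv) {
        int rank, size;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* virtual rank under muPi */
        MPI_Comm_size(MPI_COMM_WORLD, &size);  /* e.g., 32 virtual ranks */
        printf("virtual rank %d of %d\n", rank, size);
        MPI_Finalize();
        return 0;
    }

Build and run (illustrative):

    cc hello_ranks.c -lmupi -o test    # relink: -lmupi instead of -lmpi
    mpirun -np 4 ./test -nvp 32        # 4 real cores, 32 virtual MPI ranks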
Epidemic Disease Propagation
• Can be an extremely challenging simulation problem
• Asymptotic behaviors are relatively well understood
• Transients are poorly understood, hard to predict well
• Defined and characterized by many interlinked processes
• “Gory Detail” necessary
Epidemic Disease Propagation
• Reaction-diffusion processes
  – Probability based on interaction times, vulnerabilities, thresholds
  – Short- and long-distance mobility, sojourn times
  – Probabilistic state transitions, infections, recoveries
• A Supercomputing ’08 model reported scalability only to 400 cores
  – Synchronization costs become prohibitive
  – Synchronous execution is our prime suspect
• Our discrete event execution relieves synchronization costs
  – Scales to tens of thousands of cores
  – Up to 1 billion affected entities
[Image from psc.edu]
PDES Scaling Needs
• Anticipate impending opportunities in multiple application areas of grand-scale PDES scenarios
• Prepare to capitalize on increasing computational power (300K+ cores)
• Aim to achieve the computational capability to enable new PDES-based scientific solutions
Jaguar Petascale System [Cray XT5]
Jaguar: NCCS’ Cray XT5*
* Data and images from http://nccs.gov
Technological Upgrade: 10⁵-Scalable PDES Frameworks
To realize scale with any of the PDES models and applications, we need the core frameworks themselves to scale.
Recent Attempts at 10⁵-Core PDES Frameworks
• Degradation beyond 64K cores observed by us as well as others
  – Bauer et al. (Jun ’09) on Blue Gene/P (Argonne)
  – Perumalla & Tipparaju (Jan ’09) on Cray XT5 (ORNL)
• Degradation observed in more than one metric (rollback efficiency, speedup)
Implications for Discrete Event Execution on High Performance Computing Platforms
Discrete event execution style is vastly different from most traditional supercomputing-based simulations. This translates to:
• Different optimizations
• Different communication patterns
• Different latency needs
• Different bandwidth needs
• Different buffering requirements
• Different scheduling needs
• Different synchronization requirements
• Different flow control schemes
Overall, it needs a different RTI
• A qualitatively different runtime infrastructure, designed, built, optimized, and tuned for discrete event applications
Some of Our Objectives
Fill the technological gap by achieving the highest scaling capabilities of parallel discrete event simulations
• Scale from 10⁴ cores (current) to 10⁵–10⁶ cores (new)
• Realize very large-scale scenarios (multi-billion entities)
• Cyber infrastructures, social computing, epidemiology, logistics
• Aid projects in simulation-based design of future-generation supercomputers
Ultimately, enable the formulation of grand-scale solutions with non-traditional supercomputing simulations.
Electromagnetic (EM) Wave Propagation
• Predict receiver signal
• Account for reflectivity, transmissivity, multi-path effects
• Power level (voltage) modeled per face of grid cell
PHOLD Benchmark
• Relatively fine grained
  – ~5 microseconds of computation per event
• 10 “juggler” entities per processor core
  – Analogous to grid cells, road intersections, or such
• Total of 1000 “juggling balls” per core
  – Analogous to state updates exchanged among cells
• Upon receipt of a ball event, a juggler throws it a random (exponentially distributed) time into the future to a random juggler
  – 1 in every 1000 juggling exchanges is constrained to be intra-core; the rest are inter-core (see the sketch below)
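A rough sketch of the juggler logic just described; the event-queue plumbing is omitted and the helper names are hypothetical, but the timestamp and destination choices follow the slide:

    /* PHOLD "juggler" sketch: on receiving a ball, throw it an
       exponentially distributed time into the future to a random
       juggler; roughly 1 in 1000 throws is forced intra-core. */
    #include <stdlib.h>
    #include <math.h>

    #define JUGGLERS_PER_CORE 10

    static double exp_rand(double mean) {          /* exponential sample */
        double u = (rand() + 1.0) / ((double)RAND_MAX + 2.0);
        return -mean * log(u);
    }

    typedef struct { double ts; int dst_core, dst_juggler; } Ball;

    Ball handle_ball(const Ball *b, int ncores, int mycore) {
        Ball next;
        next.ts = b->ts + exp_rand(1.0);           /* future timestamp */
        next.dst_core = (rand() % 1000 == 0)       /* 1-in-1000 intra-core */
                      ? mycore
                      : rand() % ncores;           /* otherwise random core */
        next.dst_juggler = rand() % JUGGLERS_PER_CORE;
        return next;
    }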
Radio Propagation: Speedup on Cray XT4
[Plots: speedup on Cray XT4]
Radio Propagation: Runtime Costs on Cray XT4
[Plot: runtime costs on Cray XT4]
Epidemic Propagation: Performance on Cray XT5
[Plot: performance on Cray XT5]
Epidemic Propagation: Parallel Runtime on Cray XT5
[Plot: runtime in seconds (roughly 500–750) vs. number of cores (0 to 65,536), comparing optimistic and conservative execution]
PHOLD: Performance on Cray XT5
[Plot: PHOLD performance on Cray XT5]
Scalability: Observations
• Scalability problems with current approaches were not evident previously
  – Fine up to 10⁴ cores, but poor thereafter
• Even with discrete event, implementation is key
  – Semi-asynchronous execution scales poorly
  – Fully asynchronous execution is needed
Algorithm Design and Development for Scalable Discrete Event Execution
Design algorithms optimized for Cray XT5 and Blue Gene P/Q
• Design a new virtual-time synchronization algorithm
• Design novel rollback control schemes
• Design discrete event-specific flow control
[Diagram: the current synchronization algorithm. LBTS (lower bound on timestamp) computation for band d proceeds through trials 0 … r−1, r; a trial (d, r) with Δ > 0 is followed by trial (d, r+1); the band ends when Δ == 0, and a new band d+1 is started (bands d, d+1, d+2, …). An MPI sketch of one trial follows below.]
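As a reading aid, here is a hedged sketch (not the actual µsik algorithm) of what one LBTS trial can look like in MPI terms: each core contributes its minimum local timestamp and its sent/received message counts, and the band ends once the global difference Δ reaches zero:

    /* One LBTS (lower bound on timestamp) trial, sketched with MPI
       collectives; illustrative only. Callers repeat trials (draining
       in-flight messages between them) while *delta > 0. */
    #include <mpi.h>

    typedef struct { double min_ts; long sent, recvd; } LocalState;

    double lbts_trial(const LocalState *s, long *delta) {
        double in_ts = s->min_ts, lbts;
        long in[2] = { s->sent, s->recvd }, out[2];
        MPI_Allreduce(&in_ts, &lbts, 1, MPI_DOUBLE, MPI_MIN,
                      MPI_COMM_WORLD);
        MPI_Allreduce(in, out, 2, MPI_LONG, MPI_SUM, MPI_COMM_WORLD);
        *delta = out[0] - out[1];   /* messages still in flight */
        return lbts;                /* valid once *delta == 0 */
    }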
Additional Important Algorithmic Aspects
• Novel separation of event communication from synchronization
  – Prioritization support in our communication layer
  – “QoS” support for fast synchronization
• Efficient flow control
• Novel timestamp-aware buffering (see the sketch after this list)
  – Highly unstructured inter-processor communication
  – Exploit near vs. far timestamps
• Optimized rollback dynamics
  – Coordinated with virtual-time synchronization
  – Stability and throttling mechanisms
  – Cancel-back protocols
[Diagram: example of the “transient event” problem. A message from CoreD is still in transit (in wallclock time) to CoreA while CoreA–CoreD advance from past to future.]
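A minimal sketch of the timestamp-aware buffering idea, under assumed policies: events with near timestamps are sent eagerly (other cores may need them to make progress), while far-timestamp events are batched; the threshold, batch size, and send helpers are all illustrative, not the actual runtime’s design:

    /* Timestamp-aware send buffering (illustrative policy). */
    #include <stdio.h>

    typedef struct { double ts; int dst; } Event;

    #define BATCH 64
    static Event far_buf[BATCH];
    static int far_n;

    static void send_now(const Event *e)            /* low-latency path */
    { printf("eager send ts=%.3f\n", e->ts); }
    static void send_batch(const Event *v, int n)   /* bandwidth path */
    { printf("batched send of %d events\n", n); (void)v; }

    void submit(const Event *e, double lbts, double horizon) {
        if (e->ts < lbts + horizon) {   /* near: others may need it soon */
            send_now(e);
        } else {                        /* far: safe to accumulate */
            far_buf[far_n++] = *e;
            if (far_n == BATCH) { send_batch(far_buf, far_n); far_n = 0; }
        }
    }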
Data Integration Interface Development
Application Programming Interface (API) to
– Incorporate streaming input into discrete event execution
– Achieve runtime efficiency as an important consideration
Novel concepts supporting latency hiding (a hypothetical API sketch follows below)
– Permit maximal concurrency without violating time-ordering between the live simulation and real-time inputs
– Reuse optimistic synchronization to hide the latency of unpredictable data input from external sources
[Diagram: LPs (logical processes, each with its own timeline) are grouped into simulator processes, which are mapped onto processor cores within machines connected by interconnection network(s)]
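One hypothetical shape for such an API, sketched only to show the intended semantics; none of these names (des_inject, des_on_input, etc.) are from the actual software. An external input arriving with a timestamp in an LP’s virtual past triggers the same rollback machinery used for optimistic synchronization:

    /* Streaming-input API sketch (all names are illustrative). */
    #include <stdio.h>

    typedef struct { double ts; const char *data; } InputMsg;
    typedef void (*InputHandler)(const InputMsg *m);

    static InputHandler handler;
    static double lvt;   /* receiving LP's local virtual time */

    void des_on_input(InputHandler h) { handler = h; }

    void des_inject(InputMsg m) {
        if (m.ts < lvt)                  /* arrived in the virtual past */
            printf("rollback to t=%.3f\n", m.ts);  /* reuse optimism */
        if (m.ts > lvt) lvt = m.ts;
        handler(&m);
    }

    static void on_input(const InputMsg *m)
    { printf("input t=%.3f: %s\n", m->ts, m->data); }

    int main(void) {
        des_on_input(on_input);
        des_inject((InputMsg){ 1.0, "sensor reading" });
        des_inject((InputMsg){ 0.5, "late reading" });  /* rollback case */
        return 0;
    }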
Software Implementation
Runtime algorithms and data integration interfaces realized in software
– Primarily in C/C++
– Building on current software (scales to 10⁴ cores)
– Optimized for performance on Cray XT5 and Blue Gene/P
Communication to be structured flexibly
– Use MPI or Portals or a combination
– Will explore potentially new layers: non-blocking collectives (MPI-3), the Chapel language
[Diagram: our current scalable data structures. A micro-kernel keeps kernel processes (KPs) and user LPs in priority queues keyed by earliest timestamps: EPTS (processable), ECTS (committable), and EETS (emittable). Each LP maintains a future event list (FEL) and a processed event list (PEL), ordered by time t, along with its local virtual time (LVT). Kernel queues are updated when an LP is added or deleted, executes an event, or receives an event. A rough C rendering follows below.]
[Diagram: our existing layered software. µsik processes run over the libSynk layers (TM Red, TM Null; RM Bar; FM Myr, FM ShM, FM TCP, FM MPI) on the OS/hardware and network; X → Y implies X uses Y.]
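For orientation, here is a rough C rendering of the per-LP bookkeeping named in the diagram (FEL, PEL, LVT); the list representation is an assumption, not µsik’s actual layout:

    /* Per-LP state sketch: future/processed event lists and LVT. */
    #include <math.h>

    typedef struct Event Event;
    struct Event { double ts; Event *next; /* payload omitted */ };

    typedef struct {
        double lvt;   /* local virtual time of last executed event */
        Event *fel;   /* future event list, ordered by timestamp */
        Event *pel;   /* processed events, kept for rollback/commit */
    } LP;

    /* Earliest processable timestamp: the key by which kernel-level
       queues (EPTS/EETS/ECTS) order their LPs. */
    double epts(const LP *lp) { return lp->fel ? lp->fel->ts : INFINITY; }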
Performance Metrics
Efficiency and speedup are measured using event rates (helpers below make the arithmetic explicit)
Event rate ≡ number of events processed per wall-clock second
• Weak scaling: ideal ≡ events/second/processor remains invariant with the number of processors
• Strong scaling: ideal ≡ aggregate events/second increases linearly with the number of processors
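The definitions reduce to simple arithmetic; the baseline rate_1 below is the single-processor event rate:

    /* Event-rate and scaling-efficiency helpers per the definitions. */
    double event_rate(long events, double wallclock_sec)
    { return events / wallclock_sec; }

    /* Weak scaling: events/sec/processor should stay invariant. */
    double weak_efficiency(double rate_p, int p, double rate_1)
    { return (rate_p / p) / rate_1; }

    /* Strong scaling: aggregate events/sec should grow linearly. */
    double strong_efficiency(double rate_p, int p, double rate_1)
    { return rate_p / (p * rate_1); }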
Application Benchmarking and Demonstration
Entire runtime and data integration frameworks to be exercised
– Instantiate scenarios scaled up from smaller-scale scenarios in the literature
– Experiment with strong scaling as well as weak scaling, as appropriate for each application area
At-scale simulation from each area:
– Epidemiological simulations
– Human behavioral simulations
– Cyber infrastructure simulations
– Logistics simulations
Example: probability of infection in an epidemiological model, over example inter-entity networks (transcribed into code below):
    pᵢ = 1 − exp( Σ_{r∈R} Nᵣ · ln(1 − r·sᵢ) )
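A direct transcription of the probability formula into code, with parameter meanings assumed from the slide’s notation (Nᵣ contacts of type r, susceptibility sᵢ):

    /* p_i = 1 - exp( sum over r in R of N_r * ln(1 - r*s_i) ).
       Parameter semantics are assumptions from the slide's notation. */
    #include <math.h>

    double infection_prob(const double *r, const long *N, int nR,
                          double s_i) {
        double sum = 0.0;
        for (int k = 0; k < nR; k++)
            sum += N[k] * log(1.0 - r[k] * s_i);  /* each term <= 0 */
        return 1.0 - exp(sum);
    }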
Status
We have shown preliminary evidence that PDES is
– Feasible even at the largest core counts
– Adequately scalable to over 100,000 cores
– But should be improved much, much more
Applications can now move beyond “if” and begin to contemplate “how” to use petascale discrete event execution.
Methodological Alternatives
Sometimes, new modeling formulations may better suit scaling needs!
– Redefine and refine the model to suit the computing platform
Example
– Ultra-scale vehicular mobility simulations on GPUs…
Example: Ultra-scale Vehicular Mobility Simulations
E.g., National Evacuation Conference
• www.nationalevacuationconference.org
Our GARFIELD Simulation & Visualization System
[Diagram: GPU pipeline, with texture memory feeding arrays of fragment processors (FP = fragment processor)]

Demo results:

State   Nodes       Links       Texture (X×X)   Evac Time (Hours)   Run Time (Sec)
DC      9,559       14,884      1,048,576       35.20               54.90
LA      413,574     988,458     4,194,304       65.07               409.59
TN      583,484     1,335,586   3,211,264       157.91              353.89
FL      1,048,506   2,629,268   4,194,304       179.20              611.83
TX      2,073,870   5,116,492   3,211,264       217.60              777.65
Marketing
Textbook definition of marketing: “creating the need”
The simulation community’s responsibility:
– Identify potential and benefits
– Invent new methods, methodologies, capabilities
– Educate about the need, potential, and benefit
Lighter Vein or Reality?
• David Nicol once noted:
  “PADS research tends to scratch where it doesn’t itch”
• Now, it is probably time to ponder:
  “Have we been tolerating some (very bothersome) itches for lack of a long scratching stick?”
(PADS = Parallel and Distributed Simulation)
Perspective and Action
“Perfect opportunity to expand our outlook in simulation-based methods and methodologies”
• 10⁵–10⁶ cores nearly a reality
  – Million-core computers impending (www.exascale.org)
• Nation-scale, world-scale questions of increasing interest
  – Compositional dynamics of millions to billions of processes and individuals
• Assume immense computing power
• Conceive large simulation-enabled solutions
Thank you!
Questions? Comments?