Parallel Simulation of Continuous Systems: A Brief Introduction
Oct. 19, 2005
CS6236 Lecture
Background
Computer simulations
Discrete models
Continuous models
Sample applications of continuous systems
Civil engineering: building construction
Aerospace engineering: aircraft design
Mechanical engineering: machining
Systems biology: heart simulations
Computer engineering: semiconductor simulations
Outline
Mathematical models and methods
Parallel algorithm methodology
Some active research areas
Mathematical Models
Ordinary/partial differential equations
Laplace equation: $\nabla^2 u = 0$
Heat (diffusion) equation: $u_t = c\,\nabla^2 u$
Steady-state vs. time-dependent
Convert into a discrete problem through numerical discretization
Finite difference methods: structured grids
Finite element methods: local basis functions
Spectral methods: global basis functions
Finite volume methods: conservation
Example: 1-D Laplace Equation
Laplace equation in one dimension: $u'' = 0$ on $(0, 1)$
with boundary conditions $u(0) = a$, $u(1) = b$
Finite difference approximation: $\dfrac{y_{i-1} - 2y_i + y_{i+1}}{h^2} = 0$, $i = 1, \ldots, n$
with $h = 1/(n+1)$, $y_0 = a$, $y_{n+1} = b$
Jacobi iteration: $y_i^{(k+1)} = \left(y_{i-1}^{(k)} + y_{i+1}^{(k)}\right)/2$
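To make the update concrete, here is a minimal serial sketch in C (not from the slides; the grid size, boundary values, and fixed iteration count are illustrative assumptions):

/* Jacobi iteration for the 1-D Laplace equation u'' = 0 on (0,1),
   with assumed boundary values u(0) = 1 and u(1) = 2.
   The iterates converge to the linear interpolant between a and b. */
#include <stdio.h>

#define N     100      /* interior grid points (assumed) */
#define STEPS 20000    /* fixed iteration count, for brevity */

int main(void) {
    double y[N + 2] = {0.0}, z[N + 2] = {0.0};
    y[0] = 1.0;          /* boundary value a (assumed) */
    y[N + 1] = 2.0;      /* boundary value b (assumed) */

    for (int k = 0; k < STEPS; k++) {
        for (int i = 1; i <= N; i++)      /* Jacobi update into z */
            z[i] = 0.5 * (y[i - 1] + y[i + 1]);
        for (int i = 1; i <= N; i++)      /* copy new iterate back */
            y[i] = z[i];
    }
    /* midpoint of the exact solution is (a + b)/2 = 1.5 */
    printf("y[N/2] = %f\n", y[N / 2]);
    return 0;
}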
Example: 2-D Laplace Equation
Laplace equation in two dimensions: $u_{xx} + u_{yy} = 0$
with boundary conditions on all four sides
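A matching serial sketch for the 2-D case (again with assumed grid size, boundary values, and iteration count); each sweep applies the standard five-point Jacobi update:

/* Jacobi iteration for the 2-D Laplace equation on the unit square,
   with an assumed boundary value of 1 on all four sides. */
#include <stdio.h>

#define N 64   /* interior points per dimension (assumed) */

int main(void) {
    static double y[N + 2][N + 2], z[N + 2][N + 2];

    /* boundary conditions on the four sides (assumed constant) */
    for (int i = 0; i <= N + 1; i++)
        y[i][0] = y[i][N + 1] = y[0][i] = y[N + 1][i] = 1.0;

    for (int k = 0; k < 5000; k++) {
        for (int i = 1; i <= N; i++)      /* five-point stencil update */
            for (int j = 1; j <= N; j++)
                z[i][j] = 0.25 * (y[i - 1][j] + y[i + 1][j]
                                + y[i][j - 1] + y[i][j + 1]);
        for (int i = 1; i <= N; i++)
            for (int j = 1; j <= N; j++)
                y[i][j] = z[i][j];
    }
    printf("center value = %f\n", y[N / 2][N / 2]);  /* approaches 1 */
    return 0;
}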
Parallel Programming Model
Parallel computation: two or more tasks executing concurrently
Task encapsulates a sequential program and local memory
Tasks can be mapped to processors in various ways, including multiple tasks per processor
Performance Considerations
Load balance: work divided evenly
Concurrency: work done simultaneously
Overhead: work not present in the serial computation
Communication
Synchronization
Redundant work
Speculative work
Example: 1-D Laplace Equation
Define n tasks, one for each y_i
Program for task i, i = 1, …, n:
    initialize y_i
    for k = 1, …
        if i > 1, send y_i to task i-1
        if i < n, send y_i to task i+1
        if i < n, recv y_{i+1} from task i+1
        if i > 1, recv y_{i-1} from task i-1
        y_i = (y_{i-1} + y_{i+1}) / 2
    end
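An MPI rendering of this per-point program, offered as a sketch rather than the lecture's own code (boundary values 0 and 1 and the iteration count are assumptions). MPI_Sendrecv pairs each send with a receive, avoiding the deadlock that a naive blocking send/recv ordering can cause, and MPI_PROC_NULL turns the edge cases into no-ops:

/* One grid point per MPI rank: rank i holds y_{i+1}. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, p;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &p);

    int lnbr = (rank > 0)     ? rank - 1 : MPI_PROC_NULL;
    int rnbr = (rank < p - 1) ? rank + 1 : MPI_PROC_NULL;

    /* assumed physical boundary values: 0 on the left, 1 on the right */
    double y = 0.0;
    double left = 0.0, right = (rank == p - 1) ? 1.0 : 0.0;

    for (int k = 0; k < 1000; k++) {
        /* send y left while receiving the right neighbor's value */
        MPI_Sendrecv(&y, 1, MPI_DOUBLE, lnbr, 0,
                     &right, 1, MPI_DOUBLE, rnbr, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        /* send y right while receiving the left neighbor's value */
        MPI_Sendrecv(&y, 1, MPI_DOUBLE, rnbr, 1,
                     &left, 1, MPI_DOUBLE, lnbr, 1,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        /* receives from MPI_PROC_NULL leave left/right unchanged,
           so boundary tasks keep the physical boundary values */
        y = 0.5 * (left + right);   /* Jacobi update */
    }
    printf("task %d: y = %f\n", rank, y);
    MPI_Finalize();
    return 0;
}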
Design Methodology
Partition (Decomposition): decompose the problem into fine-grained tasks to maximize potential parallelism
Communication: determine the communication pattern among tasks
Agglomeration: combine into coarser-grained tasks, if necessary, to reduce communication requirements or other costs
Mapping: assign tasks to processors, subject to a tradeoff between communication cost and concurrency
Types of Partitioning
Domain decomposition: partition data
Example: grid points in 1-, 2-, or 3-D mesh
Functional decomposition: partition computation
Example: components in a climate model (atmosphere, ocean, land, etc.)
Example: Domain Decomposition
A 3-D mesh can be partitioned along any combination of one, two, or all three of its dimensions
Partitioning Checklist
Identify at least an order of magnitude more tasks than processors in the target parallel system
Avoid redundant computation or storage
Make tasks reasonably uniform in size
The number of tasks, rather than the size of each task, should grow as the problem size increases
Communication Issues
Latency and bandwidth
Routing and switching
Contention, flow control, and aggregate bandwidth
Collective communication
One-to-many: broadcast, scatter
Many-to-one: gather, reduction, scan
All-to-all
Barrier
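A minimal MPI sketch (illustrative, not from the slides) showing a broadcast, a reduction, and a barrier:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, p;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &p);

    double dt = 0.0;
    if (rank == 0) dt = 0.01;             /* root chooses the time step */
    MPI_Bcast(&dt, 1, MPI_DOUBLE, 0, MPI_COMM_WORLD);    /* one-to-many */

    double local = rank + 1.0, total = 0.0;
    MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0,
               MPI_COMM_WORLD);                          /* many-to-one */

    MPI_Barrier(MPI_COMM_WORLD);          /* synchronize all tasks */
    if (rank == 0)
        printf("dt = %g, sum over %d tasks = %g\n", dt, p, total);
    MPI_Finalize();
    return 0;
}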
Communication Checklist
Communication should be reasonably uniform across tasks in frequency and volume
As localized as possible
Concurrent
Overlapped with computation, if possible
Not inhibiting concurrent execution of tasks
Agglomeration
Communication is proportional to the surface area of a subdomain, whereas computation is proportional to its volume
Higher-dimensional decompositions have a more favorable communication-to-computation ratio
Increasing task sizes reduces communication but also reduces potential concurrency and flexibility
Surface-to-Volume Ratio
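As a worked example (assuming an $n \times n$ grid divided among $p$ tasks, with $p$ a perfect square): a 1-D strip decomposition gives each task $n^2/p$ points of computation but $2n$ points of communication, while a 2-D block decomposition communicates only $4n/\sqrt{p}$ points:

$$\left.\frac{\text{comm}}{\text{comp}}\right|_{\text{1-D}} = \frac{2n}{n^2/p} = \frac{2p}{n},
\qquad
\left.\frac{\text{comm}}{\text{comp}}\right|_{\text{2-D}} = \frac{4n/\sqrt{p}}{n^2/p} = \frac{4\sqrt{p}}{n}.$$

For $p > 4$ the 2-D ratio is smaller, and its advantage grows with $p$.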
Example: Agglomeration
Define p tasks, each with n/p of the y_i's
Program for task j, j = 1, …, p:
    initialize y_l, …, y_h
    for k = 1, …
        if j > 1, send y_l to task j-1
        if j < p, send y_h to task j+1
        if j < p, recv y_{h+1} from task j+1
        if j > 1, recv y_{l-1} from task j-1
        for i = l to h
            z_i = (y_{i-1} + y_{i+1}) / 2
        end
        y = z
    end
Example: Overlap Comm/Comp
Program for task j, j = 1, …, p:
    initialize y_l, …, y_h
    for k = 1, …
        if j > 1, send y_l to task j-1
        if j < p, send y_h to task j+1
        for i = l+1 to h-1
            z_i = (y_{i-1} + y_{i+1}) / 2
        end
        if j < p, recv y_{h+1} from task j+1
        z_h = (y_{h-1} + y_{h+1}) / 2
        if j > 1, recv y_{l-1} from task j-1
        z_l = (y_{l-1} + y_{l+1}) / 2
        y = z
    end
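A hedged MPI sketch of this pattern (the local size M, boundary values, and iteration count are assumptions): nonblocking MPI_Isend/MPI_Irecv start the halo exchange, the interior is updated while the messages are in flight, and MPI_Waitall precedes the two edge updates:

#include <mpi.h>
#include <stdio.h>
#include <string.h>

#define M 1000   /* local points per task (assumed n/p) */

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, p;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &p);
    int lnbr = (rank > 0)     ? rank - 1 : MPI_PROC_NULL;
    int rnbr = (rank < p - 1) ? rank + 1 : MPI_PROC_NULL;

    static double y[M + 2], z[M + 2];     /* y[0], y[M+1] are halo cells */
    if (rank == p - 1) y[M + 1] = 1.0;    /* assumed right boundary value */

    for (int k = 0; k < 2000; k++) {
        MPI_Request req[4];
        /* start the halo exchange; PROC_NULL recvs leave boundaries intact */
        MPI_Irecv(&y[0],     1, MPI_DOUBLE, lnbr, 0, MPI_COMM_WORLD, &req[0]);
        MPI_Irecv(&y[M + 1], 1, MPI_DOUBLE, rnbr, 1, MPI_COMM_WORLD, &req[1]);
        MPI_Isend(&y[1],     1, MPI_DOUBLE, lnbr, 1, MPI_COMM_WORLD, &req[2]);
        MPI_Isend(&y[M],     1, MPI_DOUBLE, rnbr, 0, MPI_COMM_WORLD, &req[3]);

        for (int i = 2; i <= M - 1; i++)  /* interior points need no halo */
            z[i] = 0.5 * (y[i - 1] + y[i + 1]);

        MPI_Waitall(4, req, MPI_STATUSES_IGNORE);

        z[1] = 0.5 * (y[0] + y[2]);       /* edge points use the halos */
        z[M] = 0.5 * (y[M - 1] + y[M + 1]);
        memcpy(&y[1], &z[1], M * sizeof(double));
    }
    printf("task %d: y[1] = %f\n", rank, y[1]);
    MPI_Finalize();
    return 0;
}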
Mapping
Two basic strategies for assigning tasks to processors:
Place tasks that can execute concurrently on different processors
Place tasks that communicate frequently on the same processor
Problem: these two strategies often conflict
In general, finding an optimal solution to this tradeoff is NP-complete, so heuristics are used to find a reasonable compromise
Dynamic vs. static strategies
Mapping Issues
Partitioning
Granularity
Mapping
Scheduling
Load balancing
Particularly challenging for irregular problems
Some software tools: Metis, Chaco, Zoltan, etc.
Example: Atmosphere Model
Partitioning
grid points in 3-D finite difference model
Typically yields $10^5$ to $10^7$ tasks
Communication
9-point stencil horizontally and 3-point stencil vertically
Physics computations in vertical columns
Global operations to compute total mass
Other Equations
Heat (diffusion) equation: $u_t = c\,u_{xx}$
Laplace equation: $u_{xx} + u_{yy} = 0$
Advection equation: $u_t + a\,u_x = 0$
Wave equation: $u_{tt} = c^2\,u_{xx}$
Classification of second-order equations
Parabolic, hyperbolic, and elliptic
Methods for time-dependent equations
Explicit vs. implicit
Finite-difference, finite-volume, and finite-element
CFL Condition for Stability
A necessary condition named after Courant, Friedrichs, and Lewy
The computational domain of dependence must contain the physical domain of dependence
For the advection equation with speed $a$, this implies the time step must satisfy $|a|\,\Delta t / \Delta x \le 1$
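For comparison (a standard result stated here as context, not taken from the slides), the explicit finite-difference scheme for the 1-D heat equation $u_t = c\,u_{xx}$ has the analogous stability requirement
$$\Delta t \le \frac{(\Delta x)^2}{2c},$$
which is far more restrictive on fine grids and is one motivation for implicit methods.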
Active Research Areas
Discrete-event simulation (DES) of continuous systems
Coupling of different physics
Different mathematical models
Continuous vs. discrete techniques
Load balancing
Manager-worker model
Irregular/unstructured problems
Dynamic load balancing
Summary
Mathematical models for continuous systems
Ordinary and partial differential equations
Finite difference, finite volume, and finite element
Parallel algorithm design
Partitioning
Communication
Agglomeration
Mapping
Active research areas
References
I. T. Foster, Designing and Building Parallel Programs, Addison-Wesley, 1995
A. Grama, A. Gupta, G. Karypis, and V. Kumar, Introduction to Parallel Computing, 2nd ed., Addison-Wesley, 2003
M. J. Quinn, Parallel Computing: Theory and Practice, McGraw-Hill, 1994
K. M. Chandy and J. Misra, Parallel Program Design: A Foundation, Addison-Wesley, 1988