Parallel Simulation of Continuous Systems: A Brief Introduction
Oct. 19, 2005
CS6236 Lecture

Background

- Computer simulations
  - Discrete models
  - Continuous models

Sample applications of continuous systems

- Civil engineering: building construction
- Aerospace engineering: aircraft design
- Mechanical engineering: machining
- Systems biology: heart simulations
- Computer engineering: semiconductor simulations

Outline

- Mathematical models and methods
- Parallel algorithm design methodology
- Some active research areas

Mathematical Models

- Ordinary/partial differential equations
  - Laplace equation: ∇²u = 0
  - Heat (diffusion) equation: ∂u/∂t = c ∇²u
  - Steady-state vs. time-dependent
- Convert into a discrete problem through numerical discretization
  - Finite difference methods: structured grids
  - Finite element methods: local basis functions
  - Spectral methods: global basis functions
  - Finite volume methods: conservation

Example: 1-D Laplace Equation

- Laplace equation in one dimension: d²y/dx² = 0 on (0,1), with boundary values y(0) and y(1) given
- Finite difference approximation: (y_{i-1} - 2y_i + y_{i+1}) / h² = 0, with y_i ≈ y(ih) and mesh spacing h = 1/(n+1), i = 1, ..., n
- Jacobi iteration: y_i^{(k+1)} = (y_{i-1}^{(k)} + y_{i+1}^{(k)}) / 2
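
For concreteness, here is a minimal serial sketch of the Jacobi iteration above in C; the grid size, boundary values (0 and 1), and fixed iteration count are illustrative choices, not taken from the slides.

#include <stdio.h>

#define N     8      /* number of interior grid points (illustrative) */
#define STEPS 1000   /* fixed iteration count; a real code would test convergence */

int main(void) {
    double y[N + 2], z[N + 2];

    /* boundary values (illustrative): y(0) = 0, y(1) = 1 */
    y[0] = 0.0;
    y[N + 1] = 1.0;
    for (int i = 1; i <= N; i++)
        y[i] = 0.0;                       /* initial guess for interior points */
    z[0] = y[0];
    z[N + 1] = y[N + 1];

    /* Jacobi iteration: each new value is the average of its two neighbors */
    for (int k = 0; k < STEPS; k++) {
        for (int i = 1; i <= N; i++)
            z[i] = 0.5 * (y[i - 1] + y[i + 1]);
        for (int i = 1; i <= N; i++)
            y[i] = z[i];
    }

    for (int i = 0; i <= N + 1; i++)
        printf("y[%d] = %f\n", i, y[i]);
    return 0;
}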

Example: 2-D Laplace Equation

- Laplace equation in two dimensions: ∂²u/∂x² + ∂²u/∂y² = 0, with boundary conditions given at the four sides
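
On a uniform grid, the analogous finite difference approximation gives the standard five-point stencil; the Jacobi update below is the usual textbook form, shown here for reference rather than taken from the slides:

u_{i,j}^{(k+1)} = \frac{1}{4}\left( u_{i-1,j}^{(k)} + u_{i+1,j}^{(k)} + u_{i,j-1}^{(k)} + u_{i,j+1}^{(k)} \right)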

Parallel Programming Model

- Parallel computation: two or more tasks executing concurrently
- Task encapsulates sequential program and local memory
- Tasks can be mapped to processors in various ways, including multiple tasks per processor

Performance Considerations

- Load balance: work divided evenly
- Concurrency: work done simultaneously
- Overhead: work not present in serial computation
  - Communication
  - Synchronization
  - Redundant work
  - Speculative work

Example: 1-D Laplace Equation

- Define n tasks, one for each y[i]
- Program for task i, i = 1, ..., n:

  initialize y[i]
  for k = 1, ...
      if i > 1, send y[i] to task i-1
      if i < n, send y[i] to task i+1
      if i < n, recv y[i+1] from task i+1
      if i > 1, recv y[i-1] from task i-1
      y[i] = (y[i-1] + y[i+1]) / 2
  end

Design Methodology

- Partition (decomposition): decompose problem into fine-grained tasks to maximize potential parallelism
- Communication: determine communication pattern among tasks
- Agglomeration: combine into coarser-grained tasks, if necessary, to reduce communication requirements or other costs
- Mapping: assign tasks to processors, subject to tradeoff between communication cost and concurrency

Types of Partitioning

- Domain decomposition: partition data
  - Example: grid points in 1-, 2-, or 3-D mesh
- Functional decomposition: partition computation
  - Example: components in climate model (atmosphere, ocean, land, etc.)

Example: Domain Decomposition

- 3-D mesh can be partitioned along any combination of one, two, or all three of its dimensions

Partitioning Checklist

- Identify at least an order of magnitude more tasks than processors in target parallel system
- Avoid redundant computation or storage
- Make tasks reasonably uniform in size
- Number of tasks, rather than size of each task, should grow as problem size increases

Communication Issues

- Latency and bandwidth
- Routing and switching
- Contention, flow control, and aggregate bandwidth
- Collective communication
  - One-to-many: broadcast, scatter
  - Many-to-one: gather, reduction, scan
  - All-to-all
  - Barrier
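
To make these patterns concrete (my own illustration, not part of the original slides), the corresponding MPI collectives look roughly like this; the values, root rank, and buffer sizes are arbitrary:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* one-to-many: broadcast a single value from rank 0 to all ranks */
    double x = (rank == 0) ? 3.14 : 0.0;
    MPI_Bcast(&x, 1, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    /* many-to-one: sum one value per rank onto rank 0 */
    double local = rank + 1.0, total = 0.0;
    MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    /* scan: prefix sum over ranks 0..rank */
    double prefix = 0.0;
    MPI_Scan(&local, &prefix, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

    /* barrier: synchronize all ranks */
    MPI_Barrier(MPI_COMM_WORLD);

    if (rank == 0)
        printf("broadcast=%g sum=%g\n", x, total);
    printf("rank %d: prefix sum=%g\n", rank, prefix);

    MPI_Finalize();
    return 0;
}

Analogous calls exist for the remaining patterns, e.g. MPI_Scatter, MPI_Gather, and MPI_Alltoall.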

Communication Checklist

- Communication should be reasonably uniform across tasks in frequency and volume
- As localized as possible
- Concurrent
- Overlapped with computation, if possible
- Not inhibiting concurrent execution of tasks

Agglomeration

- Communication is proportional to surface area of subdomain, whereas computation is proportional to volume of subdomain
- Higher-dimensional decompositions have more favorable communication-to-computation ratio
- Increasing task sizes reduces communication but also reduces potential concurrency and flexibility

Surface-to-Volume Ratio
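
To quantify the surface-to-volume effect, here is a rough back-of-the-envelope comparison (my own illustration, assuming an n × n × n grid divided evenly among p tasks):

\text{1-D slabs:}\quad \frac{\text{comm}}{\text{comp}} \sim \frac{2n^2}{n^3/p} = \frac{2p}{n}
\qquad
\text{3-D cubes:}\quad \frac{\text{comm}}{\text{comp}} \sim \frac{6\,(n/p^{1/3})^2}{n^3/p} = \frac{6\,p^{1/3}}{n}

The per-task communication-to-computation ratio grows only like p^{1/3} for the 3-D decomposition rather than like p, which is why higher-dimensional decompositions scale better.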

Example: Agglomeration

- Define p tasks, each owning n/p of the y[i]'s (indices l through h)
- Program for task j, j = 1, ..., p:

  initialize y[l], ..., y[h]
  for k = 1, ...
      if j > 1, send y[l] to task j-1
      if j < p, send y[h] to task j+1
      if j < p, recv y[h+1] from task j+1
      if j > 1, recv y[l-1] from task j-1
      for i = l to h
          z[i] = (y[i-1] + y[i+1]) / 2
      end
      y = z
  end
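
The following is a minimal MPI rendering of this agglomerated scheme, offered as a sketch rather than the lecture's own code; it uses MPI_Sendrecv for the neighbor exchange (instead of the separate sends and receives above) to avoid ordering concerns, and the grid size, boundary values, and iteration count are illustrative.

/* Halo-exchange Jacobi for the 1-D Laplace equation: each rank owns a
   contiguous block of interior points and trades one boundary value with
   each neighbor per iteration.
   Build: mpicc jacobi1d.c    Run: mpiexec -n 4 ./a.out */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define NTOTAL 64     /* total interior points (illustrative; assume p divides it) */
#define STEPS  500    /* fixed iteration count (illustrative) */

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, p;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &p);

    int nloc  = NTOTAL / p;
    int left  = (rank > 0)     ? rank - 1 : MPI_PROC_NULL;
    int right = (rank < p - 1) ? rank + 1 : MPI_PROC_NULL;

    /* y[0] and y[nloc+1] are ghost cells; global boundary values are 0 and 1 */
    double *y = calloc(nloc + 2, sizeof(double));
    double *z = calloc(nloc + 2, sizeof(double));
    if (rank == p - 1) y[nloc + 1] = 1.0;

    for (int k = 0; k < STEPS; k++) {
        /* exchange boundary values with neighbors (MPI_PROC_NULL is a no-op) */
        MPI_Sendrecv(&y[1], 1, MPI_DOUBLE, left, 0,
                     &y[nloc + 1], 1, MPI_DOUBLE, right, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        MPI_Sendrecv(&y[nloc], 1, MPI_DOUBLE, right, 1,
                     &y[0], 1, MPI_DOUBLE, left, 1,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);

        for (int i = 1; i <= nloc; i++)
            z[i] = 0.5 * (y[i - 1] + y[i + 1]);
        for (int i = 1; i <= nloc; i++)
            y[i] = z[i];
    }

    printf("rank %d: y[1] = %f\n", rank, y[1]);
    free(y); free(z);
    MPI_Finalize();
    return 0;
}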

Example: Overlap Comm/Comp

- Program for task j, j = 1, ..., p:

  initialize y[l], ..., y[h]
  for k = 1, ...
      if j > 1, send y[l] to task j-1
      if j < p, send y[h] to task j+1
      for i = l+1 to h-1
          z[i] = (y[i-1] + y[i+1]) / 2
      end
      if j < p, recv y[h+1] from task j+1
      z[h] = (y[h-1] + y[h+1]) / 2
      if j > 1, recv y[l-1] from task j-1
      z[l] = (y[l-1] + y[l+1]) / 2
      y = z
  end
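
In MPI, this overlap is typically expressed with nonblocking operations. The sketch below (my own, not from the slides) is a drop-in replacement for the iteration body of the previous MPI example: it posts the halo exchange, updates the interior points that need no ghost data, then waits and finishes the two boundary points.

#include <mpi.h>

/* One Jacobi sweep with communication overlapped by computation. */
static void jacobi_sweep_overlap(double *y, double *z, int nloc,
                                 int left, int right, MPI_Comm comm) {
    MPI_Request req[4];

    /* post receives into the ghost cells and sends of the boundary values */
    MPI_Irecv(&y[0],        1, MPI_DOUBLE, left,  0, comm, &req[0]);
    MPI_Irecv(&y[nloc + 1], 1, MPI_DOUBLE, right, 1, comm, &req[1]);
    MPI_Isend(&y[1],        1, MPI_DOUBLE, left,  1, comm, &req[2]);
    MPI_Isend(&y[nloc],     1, MPI_DOUBLE, right, 0, comm, &req[3]);

    /* interior points 2..nloc-1 need no ghost data: compute while messages fly */
    for (int i = 2; i <= nloc - 1; i++)
        z[i] = 0.5 * (y[i - 1] + y[i + 1]);

    /* wait for the halo exchange, then update the two boundary points */
    MPI_Waitall(4, req, MPI_STATUSES_IGNORE);
    z[1]    = 0.5 * (y[0] + y[2]);
    z[nloc] = 0.5 * (y[nloc - 1] + y[nloc + 1]);

    for (int i = 1; i <= nloc; i++)
        y[i] = z[i];
}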

Mapping

- Two basic strategies for assigning tasks to processors:
  - Place tasks that can execute concurrently on different processors
  - Place tasks that communicate frequently on same processor
- Problem: these two strategies often conflict
- In general, finding optimal solution to this tradeoff is NP-complete, so heuristics are used to find reasonable compromise
- Dynamic vs. static strategies

Mapping Issues

- Partitioning
- Granularity
- Mapping
- Scheduling
- Load balancing
- Particularly challenging for irregular problems
- Some software tools: Metis, Chaco, Zoltan, etc.

Example: Atmosphere Model

- Partitioning
  - Grid points in 3-D finite difference model
  - Typically yields 10^5 to 10^7 tasks
- Communication
  - 9-point stencil horizontally and 3-point stencil vertically
  - Physics computations in vertical columns
  - Global operations to compute total mass

Other Equations

- Heat (diffusion) equation: u_t = c u_xx
- Laplace equation: u_xx + u_yy = 0
- Advection equation: u_t = -a u_x
- Wave equation: u_tt = c u_xx
- Classification of second-order equations
  - Parabolic, hyperbolic, and elliptic
- Methods for time-dependent equations
  - Explicit vs. implicit
  - Finite-difference, finite-volume, finite-element
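
As a concrete illustration (not from the slides), explicit and implicit time discretizations of the one-dimensional heat equation u_t = c u_xx look as follows, with Δt the time step and Δx the mesh spacing:

u_i^{k+1} = u_i^k + \frac{c\,\Delta t}{\Delta x^2}\left(u_{i+1}^k - 2u_i^k + u_{i-1}^k\right)   \quad\text{(explicit: forward Euler in time)}

u_i^{k+1} - \frac{c\,\Delta t}{\Delta x^2}\left(u_{i+1}^{k+1} - 2u_i^{k+1} + u_{i-1}^{k+1}\right) = u_i^k   \quad\text{(implicit: backward Euler)}

The explicit update is cheap per step and parallelizes with a simple stencil exchange, but it is stable only for sufficiently small Δt; the implicit update is unconditionally stable but requires solving a tridiagonal system in parallel at every step.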

CFL Condition for Stability

- Necessary condition named after Courant, Friedrichs, and Lewy
- Computational domain of dependence must contain physical domain of dependence
- Implies an upper bound on the time step relative to the spatial mesh size
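
As a standard example (not taken from the slides), for an explicit scheme applied to the advection equation u_t = -a u_x, the CFL condition requires the time step to satisfy

\frac{|a|\,\Delta t}{\Delta x} \le 1
\qquad\Longleftrightarrow\qquad
\Delta t \le \frac{\Delta x}{|a|}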

Active Research Areas

- Discrete event simulation (DES) of continuous systems
- Coupling of different physics
  - Different mathematical models
  - Continuous vs. discrete techniques
- Load balancing
  - Manager-worker model
  - Irregular/unstructured problems
  - Dynamic load balancing

Summary

- Mathematical models for continuous systems
  - Ordinary and partial differential equations
  - Finite difference, finite volume, and finite element
- Parallel algorithm design
  - Partitioning
  - Communication
  - Agglomeration
  - Mapping
- Active research areas

References

- I. T. Foster, Designing and Building Parallel Programs, Addison-Wesley, 1995
- A. Grama, A. Gupta, G. Karypis, and V. Kumar, Introduction to Parallel Computing, 2nd ed., Addison-Wesley, 2003
- M. J. Quinn, Parallel Computing: Theory and Practice, McGraw-Hill, 1994
- K. M. Chandy and J. Misra, Parallel Program Design: A Foundation, Addison-Wesley, 1988