Parallelization


Partial Differential Equations
• Introduction, Adam Zornes
• Discretizations and Iterative Solvers,
Chenfang Chen
• Parallelization, Dr. Danny Thorne
What do You Stand For?
• A PDE is a Partial Differential Equation
• That is, an equation containing partial derivatives of an unknown function with respect to at least two independent variables.
• In general, partial differential equations are
much more difficult to solve analytically
than are ordinary differential equations
What Does a PDE Look Like?
• Let u be a function of x and y. There are
several ways to write a PDE, e.g.,
– u_x + u_y = 0
– ∂u/∂x + ∂u/∂y = 0
The Baskin-Robbins-esque
Characterization of PDEs
• The order is determined by the highest-order derivative appearing in any term.
• Linear/Nonlinear
– A PDE is nonlinear if it contains a term in which the solution multiplies one of its partial derivatives, or in which a partial derivative is raised to a power
• Elliptic/Parabolic/Hyperbolic
Six One Way
• Say we have the following:
A u_xx + 2B u_xy + C u_yy + D u_x + E u_y + F = 0.
• Look at B² − AC:
– < 0: elliptic
– = 0: parabolic
– > 0: hyperbolic
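• A quick check with the standard model equations (worked here as an illustration, not taken from the original slides):
– Laplace u_xx + u_yy = 0: A = C = 1, B = 0, so B² − AC = −1 < 0 (elliptic)
– Heat u_t − u_xx = 0: only u_xx is second order, so B² − AC = 0 (parabolic)
– Wave u_tt − u_xx = 0: A and C have opposite signs, so B² − AC > 0 (hyperbolic)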
Or Half a Dozen Another
• A general linear PDE of order 2 in n variables has principal part Σ_i Σ_j a_ij u_{x_i x_j}.
• Assume symmetry in the coefficients so that A = [a_ij] is symmetric; then the eigenvalues of A are real. Let P and Z denote the number of positive and zero eigenvalues of A.
– Elliptic: Z = 0 and P = n, or Z = 0 and P = 0.
– Parabolic: Z > 0 (det(A) = 0).
– Hyperbolic: Z = 0 and P = 1, or Z = 0 and P = n − 1.
– Ultrahyperbolic: Z = 0 and 1 < P < n − 1.
Elliptic, Not Just For Exercise
Anymore
• Elliptic partial differential equations have
applications in almost all areas of
mathematics, from harmonic analysis to
geometry to Lie theory, as well as
numerous applications in physics.
• The basic example of an elliptic partial
differential equation is Laplace's equation
– u_xx + u_yy = 0
The Others
• The heat equation is the basic parabolic equation
– u_t − u_xx − u_yy = 0
• The wave equation is the basic hyperbolic equation
– u_tt − u_xx − u_yy = 0
• The first-order advection equation u_t − u_x − u_y = 0 is also hyperbolic
• Theoretically, every second-order linear problem can be reduced to one of these canonical types
What Happens Where You
Can’t Tell What Will Happen
• Types of boundary conditions
– Dirichlet: specify the value of the function on a
surface
– Neumann: specify the normal derivative of the
function on a surface
– Robin: a linear combination of both
• Initial Conditions
Is It Worth the Effort?
• Basically, is it well-posed?
– A solution to the problem exists.
– The solution is unique.
– The solution depends continuously on the
problem data.
• In practice, this usually involves correctly
specifying the boundary conditions
Why Should You Stay Awake for
the Remainder of the Talk?
• PDEs have enormous application in computational science, reaching into almost every nook and cranny of the field, including, but not limited to, physics, chemistry, etc.
Example
• Laplace’s equation involves a steady state
in systems of electric or magnetic fields in
a vacuum or the steady flow of
incompressible non-viscous fluids
• Poisson’s equation is a variation of
Laplace when an outside force is applied
to the system
Poisson Equation in 2D
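The equation on this slide appears only as an image in the original; the standard 2-D Poisson equation is

u_xx + u_yy = f(x, y),   i.e.  ∇²u = f,

which reduces to Laplace's equation when f = 0.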
Example: CFD
Computational Fluid Dynamics
• CFD can be defined narrowly as the computation of aerodynamic flow around vehicles, but it can be generalized to include such areas as weather and climate simulation, flow of pollutants in the earth, and flow of liquids in oil fields (reservoir modelling).
• These problems involve huge systems of PDEs.
• Computational science offers the only realistic route to solving them.
Links
• http://www.npac.syr.edu/users/gcf/cps713overI94/
• http://www.cse.uiuc.edu/~rjhartma/pdesong.html
• http://www.maths.soton.ac.uk/teaching/units/ma274/node2.html
• http://www.npac.syr.edu/projects/csep/pde/pde.html
• http://mathworld.wolfram.com/PartialDifferentialEquation.html
Discretization and
Iterative Method for PDEs
Chunfang Chen
Danny Thorne
Adam Zornes
Classification of PDEs
Different mathematical and physical
behaviors:
– Elliptic type
– Parabolic type
– Hyperbolic type
System of coupled equations for several variables:
– Time: first derivative (second derivative for the wave equation)
– Space: first and second derivatives
Classification of PDEs (cont.)
General form of second-order PDEs (2 variables)
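The formula itself is an image in the original slides; the standard general form in two independent variables is

a u_xx + b u_xy + c u_yy + d u_x + e u_y + f u + g = 0,

with the type determined by the second-order (principal) coefficients as described above.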
PDE Model Problems
– Hyperbolic (propagation)
• Advection equation (first-order linear)
• Wave equation (second-order linear)
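The equations are images in the original slides; their standard one-dimensional forms are

u_t + c u_x = 0        (advection)
u_tt = c² u_xx          (wave)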
PDE Model Problems (cont.)
– Parabolic (time- or space-marching)
• Burgers' equation (second-order nonlinear; diffusion/dispersion)
• Fourier equation (second-order linear)
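Again the equations are images in the original; the standard one-dimensional forms are

u_t + u u_x = ν u_xx    (Burgers')
T_t = α T_xx            (Fourier / heat conduction)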
PDE Model Problems (cont.)
– Elliptic (diffusion, equilibrium problems)
• Laplace/Poisson equation (second-order linear)
• Helmholtz equation (second-order linear)
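Standard forms (the originals are images):

∇²u = 0         (Laplace)
∇²u = f         (Poisson)
∇²u + k²u = f   (Helmholtz)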
PDE Model Problems (cont.)
– System of coupled PDEs
• Navier-Stokes equations
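The equations shown on the original slide are images; for reference, the incompressible Navier-Stokes equations are

∇·u = 0
∂u/∂t + (u·∇)u = −(1/ρ)∇p + ν ∇²u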
Well-Posed Problem
Numerically well-posed:
– Discretization equations
– Auxiliary conditions (discretized approximations)
• the computational solution exists (existence)
• the computational solution is unique (uniqueness)
• the computational solution depends continuously on the approximate auxiliary data
• the algorithm should also be well-posed (stable)
Boundary and Initial
Conditions
Initial conditions: starting
point for propagation problems
Boundary conditions: specified on the domain boundaries so that the interior solution in the computational domain can be determined
[Figure: computational domain R with conditions imposed along its boundary; n and s denote the directions normal and tangential to the boundary]
Numerical Methods
– Complex geometry
– Complex equations (nonlinear, coupled)
– Complex initial/boundary conditions
– No analytic solutions
– Numerical methods needed!
Numerical Methods
Objective: speed and accuracy at minimum cost
– Numerical accuracy (error analysis)
– Numerical stability (stability analysis)
– Numerical efficiency (minimize cost)
– Validation (model/prototype data, field data, analytic solutions, theory, asymptotic solutions)
– Reliability and flexibility (reduce preparation and debugging time)
– Flow visualization (graphics and animations)
Computational Solution Procedure
[Flowchart: the governing equations with their ICs/BCs have continuous (exact) solutions; discretization (finite-difference, finite-volume, finite-element, spectral, boundary-element, hybrid, DAE) turns them into a system of algebraic equations; an equation (matrix) solver (tridiagonal, ADI, SOR, Gauss-Seidel, Krylov, multigrid) produces discrete nodal values, i.e. an approximate solution for u_i(x,y,z,t), p(x,y,z,t), T(x,y,z,t), φ(x,y,z,t)]
Discretization
– Time derivatives: almost exclusively by finite-difference methods
– Spatial derivatives:
• Finite-difference: Taylor-series expansion
• Finite-element: low-order shape and interpolation functions, continuous within each element
• Finite-volume: integral form of the PDE in each control volume
• There are also other methods, e.g. collocation, spectral methods, spectral elements, panel methods, boundary-element methods
Finite Difference
– Taylor series
– Truncation error
– How to reduce truncation errors?
• Reduce the grid spacing, i.e. use a smaller Δx = x − x₀
• Increase the order of accuracy, i.e. use a larger n
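The expansion and error terms are images in the original slides; the standard statements are

f(x₀ + Δx) = f(x₀) + Δx f′(x₀) + (Δx²/2!) f″(x₀) + … + (Δxⁿ/n!) f⁽ⁿ⁾(x₀) + …

so that, for example, f′(x₀) = [f(x₀ + Δx) − f(x₀)]/Δx + O(Δx), where O(Δx) is the truncation error.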
Finite Difference Scheme
– Forward difference
– Backward difference
– Central difference
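The formulas are images in the original slides; the standard first-derivative approximations are

Forward:  f′(x_i) ≈ [f(x_{i+1}) − f(x_i)] / Δx          (error O(Δx))
Backward: f′(x_i) ≈ [f(x_i) − f(x_{i−1})] / Δx          (error O(Δx))
Central:  f′(x_i) ≈ [f(x_{i+1}) − f(x_{i−1})] / (2Δx)   (error O(Δx²))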
Example: Poisson Equation
[Figure: square domain −1 ≤ x ≤ 1, −1 ≤ y ≤ 1 with corners (−1,−1), (1,−1), (−1,1), (1,1)]
Example (cont.)
Replacing the second derivatives by central differences gives the five-point stencil

(U_{i+1,j} − 2U_{i,j} + U_{i−1,j}) / Δx²  +  (U_{i,j+1} − 2U_{i,j} + U_{i,j−1}) / Δy²  =  f_{i,j}

[Figure: the stencil couples node (i,j) to its four neighbors (i+1,j), (i−1,j), (i,j+1), (i,j−1)]
Rectangular Grid
After we discretize the Poisson equation on a rectangular domain, we are left with a finite number of grid points. The boundary values of the equation are the only known grid points.
[Figure: 5 × 5 grid of points (i, j), i, j = 0, …, 4; the boundary points carry the known values and the interior points are the unknowns]
What to solve?
Discretization produces a linear system of equations, A u = b.
The A matrix is a pentadiagonal banded matrix of a standard form: 4 on the main diagonal and 1 (with signs depending on the convention used for the Laplacian) on the two adjacent off-diagonals and on the two off-diagonals offset by the number of unknowns per grid line.
A solution method must then be applied to solve this system.
Matrix Storage
We could try to take advantage of the banded nature of the system, but a more general solution is to adopt a sparse matrix storage strategy.
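A sketch of this idea (assuming scipy, which the slides do not prescribe): the pentadiagonal matrix for an n × n interior grid can be built directly in a sparse format, here with the common sign convention of 4 on the diagonal and −1 on the neighbor positions.

```python
import scipy.sparse as sp

def poisson_matrix(n):
    """5-point Laplacian matrix for an n x n grid of interior unknowns,
    stored in compressed sparse row (CSR) format."""
    I = sp.identity(n)
    T = sp.diags([-1.0, 4.0, -1.0], [-1, 0, 1], shape=(n, n))  # coupling along one grid line
    A = sp.kron(I, T) + sp.kron(sp.diags([-1.0, -1.0], [-1, 1], shape=(n, n)), I)
    return A.tocsr()                    # only the ~5*n*n nonzeros are stored

A = poisson_matrix(3)                   # 9 x 9 matrix for a 3 x 3 interior grid
print(A.nnz, "nonzeros out of", 9 * 9, "entries")
```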
Limitations of Finite
Differences
– Unfortunately, it is not easy to use finite differences in complex geometries.
– While it is possible to formulate curvilinear finite difference methods, the resulting equations are usually pretty nasty.
Finite Element Method
– The finite element method, while more complicated than finite difference methods, easily extends to complex geometries.
– A simple (and short) description of the finite element method is not easy to give.
PDE → Weak Form → Matrix System
Finite Element Method
(Variational Formulations)
– Find u in the test space H such that a(u, v) = f(v) for all v in H, where a is a bilinear form and f is a linear functional.
– Expand the approximation in basis functions φ_j and minimize the resulting quadratic functional:
V(x, y) = Σ_j V_j φ_j(x, y),   j = 1, …, n
I(V) = ½ Σ_i Σ_j A_ij V_i V_j − Σ_i b_i V_i
A_ij = a(φ_i, φ_j)
b_i = f(φ_i)
– The coefficients V_j are computed, and the function V(x, y) can then be evaluated anywhere a value is needed.
– The basis functions should have local support (i.e., be nonzero only on a limited region).
Time Stepping Methods
– Standard methods are common:
• Forward Euler (explicit)
• Backward Euler (implicit)
• Crank-Nicolson (implicit)
T jn 1  T jn
t
  LxxT
LxxT j 
n 1
j
 (1   ) LxxT
T j 1  2T j  T j 1
x 2
n
j
θ = 0, Fully-Explicit
θ = 1, Fully-Implicit
θ = ½, Crank-Nicolson
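A minimal sketch of the θ-scheme above for the 1-D diffusion equation T_t = T_xx (my own illustration; it assumes zero Dirichlet boundary values and uses a dense solve for brevity):

```python
import numpy as np

def theta_step(T, dt, dx, theta=0.5):
    """One step of the theta-scheme: theta = 0 explicit (forward Euler),
    theta = 1 implicit (backward Euler), theta = 0.5 Crank-Nicolson."""
    n, r = len(T), dt / dx**2
    A = np.eye(n) * (1.0 + 2.0 * theta * r)          # left-hand operator (I - theta*dt*Lxx)
    B = np.eye(n) * (1.0 - 2.0 * (1.0 - theta) * r)  # right-hand operator (I + (1-theta)*dt*Lxx)
    for i in range(n - 1):
        A[i, i + 1] = A[i + 1, i] = -theta * r
        B[i, i + 1] = B[i + 1, i] = (1.0 - theta) * r
    return np.linalg.solve(A, B @ T)

# Example: diffuse an initial spike on 49 interior points of the unit interval.
T = np.zeros(49); T[24] = 1.0
for _ in range(100):
    T = theta_step(T, dt=1e-4, dx=1.0 / 50, theta=0.5)   # Crank-Nicolson
```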
Time Stepping Methods (cont.)
– Variable-length time stepping
• Most common in Method of Lines (MOL) codes or Differential Algebraic Equation (DAE) solvers
Solving the System
– The system may be solved using simple iterative methods: Jacobi, Gauss-Seidel, SOR, etc.
• Some advantages:
– No explicit storage of the matrix is required
– The methods are fairly robust and reliable
• Some disadvantages:
– Really slow (Gauss-Seidel)
– Really slow (Jacobi)
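A sketch of the "no explicit matrix storage" point (my own illustration, assuming the five-point Poisson stencil from earlier): Jacobi iteration works directly on the grid values.

```python
import numpy as np

def jacobi(U, f, h, sweeps):
    """Jacobi iteration for the discrete Poisson problem -lap(U) = f on a
    square grid; the boundary entries of U hold fixed Dirichlet values."""
    for _ in range(sweeps):
        Unew = U.copy()
        Unew[1:-1, 1:-1] = 0.25 * (U[:-2, 1:-1] + U[2:, 1:-1]
                                   + U[1:-1, :-2] + U[1:-1, 2:]
                                   + h * h * f[1:-1, 1:-1])
        U = Unew                      # every update uses only old values (Jacobi)
    return U
```

Gauss-Seidel is the same update applied in place, so new values are used as soon as they are available.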
Solving the System
Advanced iterative methods (CG, GMRES)
CG is a much more powerful way to solve the
problem.
• Some advantages:
– Easy to program (compared to other advanced
methods)
– Fast (theoretical convergence in N steps for an N
by N system)
• Some disadvantages:
– Explicit representation of the matrix is probably
necessary
– Applies only to SPD matrices
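For reference, a compact (unpreconditioned) conjugate gradient sketch for symmetric positive definite systems; this is a generic textbook formulation, not code from the slides.

```python
import numpy as np

def conjugate_gradient(A, b, tol=1e-10):
    """Solve A x = b for symmetric positive definite A."""
    x = np.zeros_like(b)
    r = b - A @ x                       # initial residual
    p = r.copy()
    rs = r @ r
    for _ in range(len(b)):             # at most N steps in exact arithmetic
        Ap = A @ p
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p       # new search direction
        rs = rs_new
    return x
```

Note that only matrix-vector products with A are needed, so A can be kept in any sparse format.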
Multigrid Algorithm:
Components
– Residual: compute the error of the approximation
– Iterative method / smoothing operator: Gauss-Seidel iteration
– Restriction: obtain a 'coarse grid'
– Prolongation: from the 'coarse grid' back to the original grid
Residual Vector
– The equation we are to solve is defined as A u = b
– So the residual is defined to be r = b − A u_q
– where u_q is a vector approximation for u
– As the approximation u_q becomes better, the components of the residual vector r move toward zero
Multigrid Algorithm:
Components
– Residual: compute the error of the approximation
– Iterative method / smoothing operator: Gauss-Seidel iteration
– Restriction: obtain a 'coarse grid'
– Prolongation: from the 'coarse grid' back to the original grid
The Restriction Operator
– Defined as 'half-weighted' restriction.
– Each new point in the coarser grid depends on its neighboring points from the fine grid.
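A sketch of half-weighted restriction in 2-D (my own illustration of the stencil usually meant by "half-weighting": ½ on the matching fine-grid point and ⅛ on its four nearest neighbors):

```python
import numpy as np

def restrict_half_weighted(fine):
    """Map a (2N+1) x (2N+1) fine-grid array to an (N+1) x (N+1) coarse grid."""
    nc = (fine.shape[0] - 1) // 2 + 1
    coarse = np.zeros((nc, nc))
    for I in range(1, nc - 1):
        for J in range(1, nc - 1):
            i, j = 2 * I, 2 * J                     # matching fine-grid point
            coarse[I, J] = (0.5 * fine[i, j]
                            + 0.125 * (fine[i - 1, j] + fine[i + 1, j]
                                       + fine[i, j - 1] + fine[i, j + 1]))
    return coarse
```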
Multigrid Algorithm:
Components
– Residual: compute the error of the approximation
– Iterative method / smoothing operator: Gauss-Seidel iteration
– Restriction: obtain a 'coarse grid'
– Prolongation: from the 'coarse grid' back to the original grid
The Prolongation Operator
– The grid change is exactly the opposite of restriction
[Figure: the points (i, j), i, j = 0, 1, 2, of the coarse 3 × 3 grid map onto the even-indexed points (2i, 2j) of the fine 5 × 5 grid; the remaining fine-grid values are filled in by interpolation]
Prolongation vs. Restriction
– The most efficient multigrid algorithms use prolongation and restriction operators that are directly related to each other. In the one-dimensional case, the relation between prolongation and restriction is as follows:
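The relation itself is an image in the original slides. For the usual 1-D operators (linear-interpolation prolongation P and full-weighting restriction R) the standard relationship is R = ½ Pᵀ, i.e. the restriction matrix is a constant multiple of the transpose of the prolongation matrix.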
Full Multigrid Algorithm
1. Smooth the initial U vector to obtain a new approximation U_q
2. Form the residual vector: R_q = b − A U_q
3. Restrict R_q to the next coarser grid → R_{q−1}
4. Smooth A e = R_{q−1}, starting with e = 0, to obtain e_{q−1}
5. Form a new residual vector using: R_{q−1} = R_{q−1} − A e_{q−1}
6. Restrict R_2 (on the 5 × 5 grid) down to R_1 (on the 3 × 3 grid)
7. Solve A e = R_1 exactly to obtain e_1
8. Prolongate e_1 → e_2 and add it to the e_2 obtained from restriction
9. Smooth A e = R_2, starting from this e_2, to obtain a new e_2
10. Prolongate e_{q−1} to e_q and add it to U_q
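A compact two-grid sketch of these steps for the 1-D model problem −u″ = f (my own illustration; it uses full weighting and linear interpolation rather than the 2-D half-weighted operators above, and it "solves" the coarse problem by heavy smoothing):

```python
import numpy as np

def gauss_seidel(u, f, h, sweeps=3):
    for _ in range(sweeps):                      # smoother for -u'' = f
        for i in range(1, len(u) - 1):
            u[i] = 0.5 * (u[i - 1] + u[i + 1] + h * h * f[i])
    return u

def residual(u, f, h):
    r = np.zeros_like(u)
    r[1:-1] = f[1:-1] - (2 * u[1:-1] - u[:-2] - u[2:]) / h**2
    return r

def restrict(r):                                 # full weighting, fine -> coarse
    return np.concatenate(([0.0],
        0.25 * r[1:-2:2] + 0.5 * r[2:-1:2] + 0.25 * r[3::2], [0.0]))

def prolong(e):                                  # linear interpolation, coarse -> fine
    fine = np.zeros(2 * (len(e) - 1) + 1)
    fine[::2] = e
    fine[1::2] = 0.5 * (e[:-1] + e[1:])
    return fine

def two_grid(u, f, h):
    u = gauss_seidel(u, f, h)                    # 1. pre-smooth
    rc = restrict(residual(u, f, h))             # 2-3. residual, restrict
    ec = gauss_seidel(np.zeros_like(rc), rc, 2 * h, sweeps=50)   # coarse "solve"
    u = u + prolong(ec)                          # prolongate and correct
    return gauss_seidel(u, f, h)                 # post-smooth

# Example: one two-grid cycle on a 65-point grid (h = 1/64).
n = 65; h = 1.0 / (n - 1)
u = two_grid(np.zeros(n), np.ones(n), h)
```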
Reference
http://csep1.phy.ornl.gov/CSEP/PDE/PDE.html
www.mgnet.org/
www.ceprofs.tamu.edu/hchen/
www.cs.cmu.edu/~ph/859B/www/notes/multigrid.pdf
www.cs.ucsd.edu/users/carter/260
www.cs.uh.edu/~chapman/teachpubs/slides04-methods.ppt
http://www.ccs.uky.edu/~douglas/Classes/cs521-s01/index.html
HPC Issues in PDEs, Part 3
Parallelization
Danny Thorne, CS521, University of Kentucky, April 9, 2002
Parallel Computation
• Serious calculations today are mostly done on a parallel
computer.
• The domain is partitioned into subdomains that may or may
not overlap slightly.
• Goal is to calculate as many things in parallel as possible
even if some things have to be calculated on several
processors in order to avoid communication.
• “Communication is the Darth Vader of parallel computing.”
Example: Original Mesh
• Consider solving a PDE on this
grid.
• Suppose that only half of the
nodes fit on a processor.
Example: Mesh on Two Processors
• Divide into two connected
subsets and renumber within
the subdomains.
• Communication occurs
between neighbors that cross
the processor boundary.
• Ghost points (or overlap) can
reduce communication
sometimes at the expense of
extra computation.
• Computation is O(1)–O(1000) per word of communication.
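A minimal sketch of ghost-point exchange for a 1-D decomposition using mpi4py (an assumption; the slides do not name a particular library):

```python
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

n_local = 1000                       # interior points owned by this process
u = np.zeros(n_local + 2)            # plus one ghost point at each end

left = rank - 1 if rank > 0 else MPI.PROC_NULL
right = rank + 1 if rank < size - 1 else MPI.PROC_NULL

# Refresh the ghost points before each stencil sweep.
comm.Sendrecv(sendbuf=u[1:2], dest=left, recvbuf=u[-1:], source=right)
comm.Sendrecv(sendbuf=u[-2:-1], dest=right, recvbuf=u[0:1], source=left)

# u[0] and u[-1] now hold copies of the neighbors' edge values, so the
# update of u[1:-1] needs no further communication this iteration.
```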
Mesh Decomposition
• Goals are to maximize interior while minimizing connections
between subdomains. That is, minimize communication.
• Such decomposition problems have been studied in load
balancing for parallel computation.
• Lots of choices:
• METIS, ParMETIS -- University of Minnesota.
• PARTI -- University of Maryland,
• CHACO -- Sandia National Laboratories,
• JOSTLE -- University of Greenwich,
• PARTY -- University of Paderborn,
• SCOTCH -- Université Bordeaux,
• TOP/DOMDEC -- NAS at NASA Ames Research Center.
Mesh Decomposition
• Quality of mesh decomposition has a highly significant effect
on performance.
• Arriving at a “good” decomposition is a complex task.
• “Good” may be problem/architecture dependent.
• A wide variety of well-established methods exist.
• Several packages/libraries provide implementations of
many of these methods.
• Major practical difficulty is differences in file formats.
http://www.epcc.ed.ac.uk/epcc-tec/documents/meshdecomp-slides/MeshDecomp-11.html
Decomposition Goals
• Load balance.
• Distribute elements evenly across processors.
• Each processor should have equal share of work.
• Communication costs should be minimized.
• Minimize sub-domain boundary elements.
• Minimize number of neighboring domains.
• Distribution should reflect machine architecture.
• Communication versus calculation.
• Bandwidth versus latency.
• Note that optimizing load balance and communication
cost simultaneously is an NP-hard problem.
http://www.epcc.ed.ac.uk/epcc-tec/documents/meshdecomp-slides/MeshDecomp-13.html
Static Mesh versus Dynamic Mesh
• Static mesh
• Decomposition need only be carried out once.
• Static decomposition may therefore be carried out as a
preprocessing step, often done in serial.
• Dynamic mesh
• Decomposition must be adapted as underlying mesh
changes.
• Dynamic decomposition therefore becomes part of the
calculation itself and cannot be carried out solely as a preprocessing step.
http://www.epcc.ed.ac.uk/epcc-tec/documents/meshdecomp-slides/MeshDecomp-14.html
Rod Impact
• Notice that the mesh
changes dynamically.
http://www.amtec.com/plotgallery/animation/movies/t-rod.gif
Frisbee
• Need a dynamic mesh
for this?
http://www.llnl.gov/CASC/Overture/henshaw/figures/pib.mpg
Dynamic Decomposition
• To redistribute a mesh
• A new decomposition must be found and required
changes made (shuffling of data between processors).
• The time taken may outweigh the benefit gained so an
assessment of whether it is worthwhile is needed.
• The decomposition must run in parallel and be fast.
• Take into account the old decomposition.
• Make minimal changes.
http://www.epcc.ed.ac.uk/epcc-tec/documents/meshdecomp-slides/MeshDecomp-15.html
Dual Graphs
• To make a decomposition we need
• A representation of the basic entities being distributed
(e.g. the elements, in the case of a FE mesh),
• An idea of how communication takes place between them.
• A dual graph, based on the mesh, fills this role.
• Vertices in the graph represent the entities (elements).
• Edges in the graph represent communication.
• Graph depends on how data is transferred.
• For finite elements it could be via nodes, edges or faces.
• So a single mesh can have more than one dual graph.
• Edges (communications) or vertices (calculations) may be
weighted.
http://www.epcc.ed.ac.uk/epcc-tec/documents/meshdecomp-slides/MeshDecomp-16.html
http://www.epcc.ed.ac.uk/epcc-tec/documents/meshdecomp-slides/MeshDecomp-17.html
2D Example
Graph Partitioning
• This problem occurs in several applications.
• VLSI circuit layout design,
• Efficient orderings for parallel sparse matrix factorization,
• Mesh decomposition.
• This problem has been extensively studied.
• A large body of literature already exists.
• Much work is still being produced in the field.
http://www.epcc.ed.ac.uk/epcc-tec/documents/meshdecomp-slides/MeshDecomp-18.html
Graph Partitioning
• Given a graph G(V , E ) we want to partition it into k parts
such that:
• Each part has roughly the same number of vertices,
• Number of edges that cross partitions is minimized.
http://www-users.cs.umn.edu/~karypis/publications/Talks/multi-constraint/sld002.htm
Mesh Decomposition
• View mesh decomposition as having two aspects:
• Partitioning a graph,
• Mapping the sub-domains to processors.
• Algorithms may
• Address both issues,
• Or simplify the problem by ignoring the latter.
• Whether this is an issue depends on target machine.
• With very fast communications, e.g. the T3D, it's probably
less important.
• On a cluster of workstations it may be absolutely critical.
http://www.epcc.ed.ac.uk/epcc-tec/documents/meshdecomp-slides/MeshDecomp-22.html
Generalization
Decomposition Algorithms
• Global methods
• Direct k-way partitioning.
• Recursive application of some two-way technique, e.g.
bisection.
• Local refinement techniques
• Incrementally improve quality of an existing
decomposition.
• Multi-level variants
• Work with coarse approximation of graph to increase
performance.
• Hybrid techniques
• Using various combinations of above.
http://www.epcc.ed.ac.uk/epcc-tec/documents/meshdecomp-slides/MeshDecomp-24.html
Examples
• Simple, direct k-way algorithms
• Random, Scattered & Linear Bandwidth Reduction
• The Greedy Algorithm
• Optimization algorithms
• Simulated Annealing
• Chained Local Optimization
• Genetic Algorithms
• Geometry based recursive algorithms
• Coordinate Partitioning
• Inertial Partitioning
http://www.epcc.ed.ac.uk/epcc-tec/documents/meshdecomp-slides/MeshDecomp-25.html
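A sketch of geometry-based recursive (coordinate) bisection from the list above, using only the node coordinates (my own illustration):

```python
import numpy as np

def coordinate_bisection(points, levels):
    """Recursively split along the coordinate direction with the largest
    extent, cutting at the median, giving 2**levels roughly equal parts."""
    part = np.zeros(len(points), dtype=int)

    def split(idx, level, label):
        if level == 0:
            part[idx] = label
            return
        pts = points[idx]
        axis = int(np.argmax(pts.max(axis=0) - pts.min(axis=0)))  # longest direction
        median = np.median(pts[:, axis])
        split(idx[pts[:, axis] <= median], level - 1, 2 * label)
        split(idx[pts[:, axis] > median], level - 1, 2 * label + 1)

    split(np.arange(len(points)), levels, 0)
    return part

# Example: partition 1000 random 2-D node coordinates into 4 sub-domains.
parts = coordinate_bisection(np.random.rand(1000, 2), levels=2)
```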
Examples
• Graph based recursive algorithms
• Layered Partitioning
• Spectral Partitioning
• Local refinement algorithms
• Kernighan and Lin
• Jostle
• Multi-level and hybrid variants
• Multi-level Kernighan and Lin Partitioning
• Multi-level Spectral RQL / Symmlq Partitioning
http://www.epcc.ed.ac.uk/epcc-tec/documents/meshdecomp-slides/MeshDecomp-26.html
Space Filling Curves
• Locality preserving.
• Each point lies a
unique distance along
the curve.
Space Filling Curves
• Optimal load balance.
• Subdomain boundaries
are sub-optimal.
• Recall: Optimizing load
and comm is NP-hard.
Space Filling Curves
The main advantages of this
partition method are:
• It is fast compared to graph
partitioning heuristics,
• It runs in parallel,
• It requires no
administration and no
storage of processor
neighborhoods.
• The knowledge of the
separators is enough to
compute where to find a
node and which processor to
ask for it.
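A sketch of the idea using a Morton (Z-order) curve, a particular space-filling curve chosen here for illustration (the slides do not prescribe one). Each point gets a key, its "distance" along the curve, and cutting the sorted key list into equal chunks gives the optimal load balance mentioned above.

```python
import numpy as np

def morton_key(ix, iy, bits=16):
    """Interleave the bits of integer grid coordinates to get the Z-order key."""
    key = 0
    for b in range(bits):
        key |= ((int(ix) >> b) & 1) << (2 * b)
        key |= ((int(iy) >> b) & 1) << (2 * b + 1)
    return key

def sfc_partition(coords, nproc, bits=16):
    """Assign each point in the unit square to one of nproc processors by
    sorting along the curve and cutting into equal-sized chunks."""
    scaled = (coords * ((1 << bits) - 1)).astype(int)
    keys = np.array([morton_key(x, y, bits) for x, y in scaled])
    order = np.argsort(keys)
    owner = np.empty(len(coords), dtype=int)
    owner[order] = np.arange(len(coords)) * nproc // len(coords)
    return owner

# Example: distribute 10000 mesh nodes over 8 processors.
owner = sfc_partition(np.random.rand(10000, 2), nproc=8)
```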
Parallel PDE Tools
• Parallel ELLPACK -- Purdue University
• POOMA -- Los Alamos National Laboratory
• Overture -- Lawrence Livermore National Laboratory
• PADRE – LLNL, LANL
• PETSc -- Argonne National Laboratory
• Unstructured Grids -- University of Heidelberg
Links
• A Software Framework for Easy Parallelization of PDE Solvers -- http://www.ifi.uio.no/~xingca/PCFD2000/Slides/pcfd2000/
• Numerical Objects Online -- http://www.nobjects.com/Diffpack/
• Xing Cai's home page at University of Oslo -- http://www.ifi.uio.no/~xingca/
• Related Grid Decomposition Sites -- http://www.erc.msstate.edu/~bilderba/research/other_sites.html
• Domain Decomposition Methods -- http://www.ddm.org/
• People working on Domain Decomposition -- http://www.ddm.org/people.html
• Unstructured Mesh Decomposition -- http://www.epcc.ed.ac.uk/epcc-tec/documents/meshdecomp-slides/
• HPCI Seminar: Parallel Sparse Matrix Solvers -- http://www.epcc.ed.ac.uk/computing/research_activities/HPCI/sparse_seminar.html
• METIS: Family of Multilevel Partitioning Algorithms -- http://www-users.cs.umn.edu/~karypis/metis/index.html
• Multilevel Algorithms for Multi-Constraint Graph Partitioning -- http://www-users.cs.umn.edu/~karypis/publications/Talks/multi-constraint/sld001.htm
• Structured Versus Unstructured Meshes -- http://www.epcc.ed.ac.uk/epcc-tec/courses/images/Meshes.gif
• Domain Decomposition Methods for elliptic PDEs -- http://www.cs.purdue.edu/homes/mav/projects/dom_dec.html
• Multiblock Parti library -- http://www.cs.umd.edu/projects/hpsl/compilers/base_mblock.html
• Chaco: Software for Partitioning Graphs -- http://www.cs.sandia.gov/~bahendr/chaco.html
• A Portable and Efficient Parallel Code for Astrophysical Fluid Dynamics -- http://astro.uchicago.edu/Computing/On_Line/cfd95/camelse.html
• Links for Graph Partitioning -- http://rtm.science.unitn.it/intertools/graph-partitioning/links.html
• A Multilevel Algorithm for Partitioning Graphs -- http://www.chg.ru/SC95PROC/509_BHEN/SC95.HTM
• Multigrid Workbench: The Algorithm -- http://www.mgnet.org/mgnet/tutorials/xwb/mgframe.html
• Domain Decomposition: Parallelization of Mesh Based Applications -- http://www.hlrs.de/organization/par/par_prog_ws/online/DomainDecomposition/index.htm
• Parallel ELLPACK Elliptic PDE Solvers -- http://www.cs.purdue.edu/homes/markus/research/pubs/pde-solvers/paper.html
• POOMA -- http://www.acl.lanl.gov/pooma/
• Overture -- http://www.llnl.gov/CASC/Overture/
• PADRE: A Parallel Asynchronous Data Routing Environment -- http://www.c3.lanl.gov:80/~kei/iscope97.1/iscope97.1.html
• PETSc: The Portable, Extensible Toolkit for Scientific Computation -- http://www-fp.mcs.anl.gov/petsc/
• UG Home Page -- http://cox.iwr.uni-heidelberg.de/~ug/
• Space-filling Curves -- http://mrl.nyu.edu/~dzorin/sig99/zorin3/sld036.htm
• Separate IMAGE for Basic foil 30 Two Space Filling Curves -- http://www.npac.syr.edu/users/gcf/cps615gemoct98/foilsepimagedir/030IMAGE.html
• Using Space-filling Curves for Multi-dimensional Indexing -- http://www.dcs.bbk.ac.uk/~jkl/BNCOD2000/slides.html
• The origins of fractals -- http://plus.maths.org/issue6/turner1/
• Adaptive Parallel Multigrid Methods -- http://wissrech.iam.uni-bonn.de/research/projects/zumbusch/hash.html