Technologies and Tools for High
Download
Report
Transcript Technologies and Tools for High
Canonical Problems and
Scalable Solvers for
Numerical PDE
Robert C. Kirby
Department of Computer Science
The University of Chicago
Outline
Canonical problems in Scientific Computing
Algebraic equations
Differential equations
Optimization
Overview of SciDAC/TOPS
Hard problems
Cutting edge research
Funding opportunities
CMSC34000 Lecture 2, 6 Jan 2005
Canonical Problems
Problem
Ax b
Linear systems
Ax Bx
Eigenvalue
f ( x, x, t , p) 0
Ordinary DE
F(x) 0
Nonlinear systems
Constrained
optimization
min ( x, u )
u
s.t.
F ( x, u) 0
CMSC34000 Lecture 2, 6 Jan 2005
Linear Systems
Ax = b
PDE --> Linear System
Nonlinear system --> Sequence of linear steps
Millions of unknowns
Sparse
Ill-conditioned k(A) = || A || || A^-1 ||
Nonsymmetric
CMSC34000 Lecture 2, 6 Jan 2005
Solution techniques
Direct (Gaussian elimination & variants)
Iterative (Sequence of approximations)
Parallelism
Jacobi / Gauss Seidel
SOR
Krylov-subspace methods (sequence of clever matrix-vector
products builds up good approximation
Multigrid (algebraic / geometric)
Each process owns its own rows
Matrix action --> communication
Euler methods (forward, backward)
Runge-Kutta methods
CMSC34000 Lecture 2, 6 Jan 2005
Ordinary differential equations
Time rate of change of system of quantities)
Chemical/nuclear reactions
N-body problems (celestial mechanics)
Time-dependent PDE (after spatial discretization)
Algebraic equations at each time step
“Stiffness”
Wide range of time scales requires special treatment
Fastest modes require smallest time steps
Multistep methods
CMSC34000 Lecture 2, 6 Jan 2005
ODE Techniques
Euler methods (forward, backward)
Runge-Kutta methods
Multi-adaptive Galerkin (Anders Logg)
Parallelism?
Depends on coupling of components
Naturally arises from PDE
Parallelism in time is more difficult…
CMSC34000 Lecture 2, 6 Jan 2005
PDE
Multiple dependent variables
u u u p 0
u 0
Algebraic equations
If time dependent, get ODE or DAE at each
time step
CMSC34000 Lecture 2, 6 Jan 2005
Solution strategies
Finite differences (FDM)
Replace derivatives at points with difference quotient
Naturally leads to equations for each grid point
Irregular, adaptive grids require special treatment
Finite elements (FEM)
Replace “strong form” with “weak form”
Weak form over subspaces leads to algebraic system
Generalizes FDM (geometry, order)
Finite volumes
Discrete conservation laws over each control volume
Often “low order” but “stable”
CMSC34000 Lecture 2, 6 Jan 2005
Nonlinear equations
Arise from nonlinear PDE, ODE, etc
Require Newton’s method
F(u) F(uc ) F'(uc )u 0
u uc u
Jacobian matrix (differentiation)
Nonlinear Preconditioning
CMSC34000 Lecture 2, 6 Jan 2005
Optimization
Find the minimum of some function subject to
some constraints
Constaints may themselves be PDE
CMSC34000 Lecture 2, 6 Jan 2005
Motivation
Solver performance is a major concern for parallel
simulations based on PDE formulations
… including many of those of the U.S. DOE Scientific Discovery
through Advanced Computing (SciDAC) program
For target applications, implicit solvers may require 50% to
95% of execution time
… at least, before “expert” overhaul for algorithmic optimality and
implementation performance
Even after a “best manual practice” overhaul, the solver may
still require 20% to 50% of execution time
The solver may hit up against both the processor scalability
limit and the memory bandwidth limitation of a PDE-based
application, before any other part of the code
… the first of these is not fundamental, though the second one is
CMSC34000 Lecture 2, 6 Jan 2005
Presentation plan
Overview of the SciDAC initiative
Overview of the Terascale Optimal PDE Simulations
project (TOPS)
Brief review of scalable implicit methods (domain
decomposed multilevel iterative methods)
algorithms
software components: PETSc, Hypre, etc.
Three “war stories” from the SciDAC magnetically
confined fusion energy portfolio
Some advanced research directions
physics-based preconditioning
nonlinear Schwarz
On the horizon
CMSC34000 Lecture 2, 6 Jan 2005
SciDAC apps and infrastructure
4 projects
in high
energy and
nuclear
physics
14 projects in
biological and
environmental
research
18 projects in
scientific
software and
network
infrastructure
5 projects
in fusion
energy
science
10 projects
will in
basic
energy
sciences
CMSC34000 Lecture 2, 6 Jan 2005
“Enabling technologies” groups to develop
reusable software and partner with application
groups
From 2001 start-up, 51 projects share $57M/year
Approximately one-third for applications
A third for “integrated software infrastructure
centers”
A third for grid infrastructure and collaboratories
Plus, multi-Tflop/s IBM SP machines at NERSC
and ORNL available for SciDAC researchers
CMSC34000 Lecture 2, 6 Jan 2005
Unclassified resources for DOE science
IBM Power3+ SMP
16 procs per node
6656 procs total
10 Tflop/s
“Seaborg”
Berkeley
IBM Power4 Regatta
32 procs per node
864 procs total
4.5 Tflop/s
“Cheetah”
Oak Ridge
CMSC34000 Lecture 2, 6 Jan 2005
Designing a simulation code
(from 2001 SciDAC report)
V&V
loop
Performance
loop
CMSC34000 Lecture 2, 6 Jan 2005
A “perfect storm” for simulation
(dates are symbolic)
Hardware Infrastructure
Applications
1686
A
R
C
H
I
T
E
C
T
U
R
E
S
scientific models
1947
numerical algorithms
1976
computer architecture
scientific software engineering
“Computational science is undergoing a phase transition.” – D. Hitchcock, DOE
CMSC34000 Lecture 2, 6 Jan 2005
Imperative: multiple-scale applications
Multiple spatial scales
interfaces, fronts, layers
thin relative to domain
size, << L
Multiple temporal scales
fast waves
small transit times
relative to convection or
diffusion, << T
Richtmeyer-Meshkov instability, c/o A. Mirin, LLNL
Analyst must isolate dynamics of interest and model the rest in a
system that can be discretized over more modest range of scales
May lead to infinitely “stiff” subsystem requiring special
treatment by the solution method
CMSC34000 Lecture 2, 6 Jan 2005
Examples: multiple-scale applications
Biopolymers, nanotechnology
1012 range in time, from 10-15
sec (quantum fluctuation) to 10-3
sec (molecular folding time)
typical computational model
ignores smallest scales, works on
classical dynamics only, but
scientists increasingly want both
Galaxy formation
1020 range in space from binary
star interactions to diameter of
universe
heroic computational model
handles all scales with localized
adaptive meshing
Supernova simulation, c/o A. Mezzacappa, ORNL
Supernovae simulation
massive ranges in time and
space scales for radiation,
turbulent convection,
diffusion, chemical
reaction, nuclear reaction
CMSC34000 Lecture 2, 6 Jan 2005
SciDAC portfolio characteristics
Multiple temporal scales
Multiple spatial scales
Linear ill conditioning
Complex geometry and severe anisotropy
Coupled physics, with essential nonlinearities
Ambition for uncertainty quantification,
parameter estimation, and design
Need toolkit of portable, extensible, tunable
implicit solvers, not “one-size fits all”
CMSC34000 Lecture 2, 6 Jan 2005
TOPS starting point codes
PETSc (ANL)
Hypre (LLNL)
Sundials (LLNL)
SuperLU (LBNL)
PARPACK (LBNL*)
TAO (ANL)
Veltisto (CMU)
Many interoperability connections between these
packages that predated SciDAC
Many application collaborators that predated SciDAC
CMSC34000 Lecture 2, 6 Jan 2005
TOPS participants
CU
CMU
ANL
UC-B/LBNL
NYU
CU-B
LLNL
ODU
UT-K
TOPS lab (3)
TOPS university (7)
CMSC34000 Lecture 2, 6 Jan 2005
In the old days, see “Templates” guides …
www.netlib.org/templates
124 pp.
www.netlib.org/etemplates
410 pp.
… these are good starts, but not adequate for SciDAC scales!
CMSC34000 Lecture 2, 6 Jan 2005
“integrated software
infrastructure centers”
34 applications
groups
adaptive
gridding
discretization
solvers (TOPS)
Ax b
Ax Bx
f ( x, x, t , p) 0
F ( x, p) 0
7 ISIC groups
(4 CS, 3 Math)
10 grid, data
collaboratory
groups
systems
software
component
architecture
performance
engineering
data
management
min ( x, u )
u
s.t.
F ( x, u) 0
software
integration
performance
optimization
CMSC34000 Lecture 2, 6 Jan 2005
Keyword: “Optimal”
Convergence rate nearly
independent of discretization
parameters
Convergence rate as
independent as possible of
physical parameters
multilevel schemes for rapid linear
convergence of linear problems
Newton-like schemes for quadratic
convergence of nonlinear problems
200
Time to Solution
150
100
50
scalable
0
1
10
100
1000
Problem Size (increasing with number of
processors)
continuation schemes
physics-based preconditioning
Optimal convergence plus
scalable loop body yields
scalable solver
CMSC34000 Lecture 2, 6 Jan 2005
But where to go past O(N) ?
Since O(N) is already optimal, there is nowhere further
“upward” to go in efficiency, but one must extend optimality
“outward,” to more general problems
Hence, for instance, algebraic multigrid (AMG) to seek to
obtain O(N) in indefinite, anisotropic, or inhomogeneous
problems on irregular grids
R
AMG Framework
n
error easily
damped by
pointwise
relaxation
algebraically
smooth error
Choose coarse grids, transfer
operators, and smoothers to
eliminate these “bad”
components within a smaller
dimensional space, and recur
CMSC34000 Lecture 2, 6 Jan 2005
Toolchain for PDE solvers in TOPS project
Design and implementation of “solvers”
Time integrators
(w/ sens. anal.)
Nonlinear solvers
(w/ sens. anal.)
Constrained optimizers
f ( x, x, t , p) 0
F ( x, p) 0
min ( x, u ) s.t. F ( x, u ) 0, u 0
u
Linear solvers
Eigensolvers
Software integration
Performance optimization
Ax b
Ax Bx
Optimizer
Sens. Analyzer
Time
integrator
Nonlinear
solver
Eigensolver
Linear
solver
Indicates
dependence
CMSC34000 Lecture 2, 6 Jan 2005
Dominant data structures are grid-based
finite differences
finite
elements
finite volumes
node i
All lead to problems with sparse
Jacobian matrices; many tasks can
leverage off an efficient set of tools
for manipulating distributed sparse
data structures
row i
J=
CMSC34000 Lecture 2, 6 Jan 2005
Newton-Krylov-Schwarz:
a PDE applications “workhorse”
F (u) F (uc ) F ' (uc )u 0
u uc u
u
M 1 Ju M 1F
Ju F
arg min {Jx F} M 1 i RiT ( Ri JRiT )1 Ri
xV { F , JF , J 2 F ,}
Newton
Krylov
Schwarz
nonlinear solver
asymptotically quadratic
accelerator
spectrally adaptive
preconditioner
parallelizable
CMSC34000 Lecture 2, 6 Jan 2005
SPMD parallelism w/domain decomposition
3
2
1
rows assigned
to proc “2”
A21
A22
A23
Partitioning of the grid
induces block structure
on the Jacobian
CMSC34000 Lecture 2, 6 Jan 2005
Time-implicit Newton-Krylov-Schwarz
For accommodation of unsteady problems, and nonlinear robustness in
steady ones, NKS iteration is wrapped in time-stepping:
Pseudotime
loop
NKS
loop
for (l = 0; l < n_time; l++) {
select time step
for (k = 0; k < n_Newton; k++) {
compute nonlinear residual and Jacobian
for (j = 0; j < n_Krylov; j++) {
forall (i = 0; i < n_Precon ; i++) {
solve subdomain problems concurrently
} // End of loop over subdomains
perform Jacobian-vector product
enforce Krylov basis conditions
update optimal coefficients
check linear convergence
} // End of linear solver
perform DAXPY update
check nonlinear convergence
} // End of nonlinear loop
} // End of time-step loop
CMSC34000 Lecture 2, 6 Jan 2005
(N)KS kernel in parallel
Krylov
iteration
P1:
…
P2:
Pn:
local
scatter
Jac-vec
multiply
Bulk synchronous model leads to easy
scalability analyses and projections.
Each phase can be considered
separately. What happens if, for
instance, in this (schematicized)
iteration, arithmetic speed is doubled,
scalar all-gather is quartered, and local
scatter is cut by one-third?
precond
sweep
inner daxpy
product
P1:
P2:
…
Pn:
CMSC34000 Lecture 2, 6 Jan 2005
Estimating scalability of stencil computations
Given complexity estimates of the leading terms of:
And a bulk synchronous model of the architecture including:
the concurrent computation (per iteration phase)
the concurrent communication
the synchronization frequency
internode communication (network topology and protocol reflecting horizontal
memory structure)
on-node computation (effective performance parameters including vertical
memory structure)
One can estimate optimal concurrency and optimal execution
time
on per-iteration basis, or overall (by taking into account any granularitydependent convergence rate), based on problem size N and concurrency P
simply differentiate time estimate in terms of (N,P) with respect to P, equate to
zero and solve for P in terms of N
CMSC34000 Lecture 2, 6 Jan 2005
Scalability results for DD stencil computations
With tree-based (logarithmic) global reductions and
scalable nearest neighbor hardware:
With 3D torus-based global reductions and scalable
nearest neighbor hardware:
optimal number of processors scales linearly with problem
size
optimal number of processors scales as three-fourths power
of problem size (almost “scalable”)
With common network bus (heavy contention):
optimal number of processors scales as one-fourth power
of problem size (not “scalable”)
bad news for conventional Beowulf clusters, but see 2000
Bell Prize “price-performance awards”, for multiple NICs
CMSC34000 Lecture 2, 6 Jan 2005
NKS efficiently implemented in PETSc’s
MPI-based distributed data structures
Main Routine
Timestepping Solvers (TS)
Nonlinear Solvers (SNES)
Linear Solvers (SLES)
PETSc
PC
KSP
Application
Initialization
Function
Evaluation
User
code
Jacobian
Evaluation
PostProcessing
PETSc code
CMSC34000 Lecture 2, 6 Jan 2005
User Code/PETSc library interactions
Main Routine
Timestepping Solvers (TS)
Nonlinear Solvers (SNES)
Linear Solvers (SLES)
PETSc
PC
KSP
Application
Initialization
Function
Evaluation
User
code
Jacobian
Evaluation
PETSc code
PostProcessing
Can be AD code
CMSC34000 Lecture 2, 6 Jan 2005
1999 Bell Prize for unstructured grid
computational aerodynamics
Transonic “Lambda” Shock, Mach contours on surfaces
mesh c/o D. Mavriplis,
ICASE
Implemented in PETSc
www.mcs.anl.gov/petsc
CMSC34000 Lecture 2, 6 Jan 2005
Fixed-size parallel scaling results
128 nodes
43min
Four orders
of magnitude
in 13 years
3072 nodes
2.5min,
226Gf/s
11M unknowns
70% efficient
c/o K. Anderson, W. Gropp,
D. Kaushik, D. Keyes and
B. Smith
CMSC34000 Lecture 2, 6 Jan 2005
Three “war stories” from magnetic fusion
energy applications in SciDAC
Physical models based on fluid-like magnetohydrodynamics (MHD)
B
E divb B
t
E V B J
0 J B
n
nV 0 Dn
t
V
V V J B p V
t
n T
V T p V n || bˆ bˆ I T Q
1 t
CMSC34000 Lecture 2, 6 Jan 2005
Challenges in magnetic fusion
• Conditions of interest possess two properties that pose
great challenges to numerical approaches—anisotropy and
stiffness.
• Anisotropy produces subtle balances of large forces,
and vastly different parallel and perpendicular
transport properties.
• Stiffness reflects the vast range of time-scales in the
system: targeted physics is slow (~transport scale)
compared to waves
CMSC34000 Lecture 2, 6 Jan 2005
Tokamak/stellerator simulations
Center for Extended MHD Modeling (based at
Princeton Plasma Physics Lab) M3D code
Realistic toroidal geom., unstructured mesh,
hybrid FE/FD discretization
Fields expanded in scalar potentials, and
streamfunctions
Operator-split, linearized, w/11 potential
solves in each poloidal cross-plane/step (90%
exe. time)
Parallelized w/PETSc (Tang et al., SIAM
PP01, Chen et al., SIAM AN02, Jardin et al.,
SIAM CSE03)
Want from TOPS:
Now: scalable linear implicit solver for much
higher resolution (and for AMR)
Later: fully nonlinearly implicit solvers and
coupling to other codes
CMSC34000 Lecture 2, 6 Jan 2005
Provided new solvers across existing interfaces
Hypre in PETSc
SuperLU_DIST in PETSc
codes with PETSc interface (like CEMM’s M3D) can now
invoke Hypre routines as solvers or preconditioners with
command-line switch
as above, with SuperLU_DIST
Hypre in AMR Chombo code
so far, Hypre is level-solver only; its AMG will ultimately
be useful as a bottom-solver, since it can be coarsened
indefinitely without attention to loss of nested geometric
structure; also FAC is being developed for AMR uses, like
Chombo
CMSC34000 Lecture 2, 6 Jan 2005
Hypre: multilevel preconditioning
smoother
A Multigrid V-cycle
Finest Grid
Restriction
transfer from
fine to coarse
grid
coarser grid has fewer cells
(less work & storage)
First Coarse
Grid
Prolongation
transfer from coarse
to fine grid
Recursively apply this
idea until we have an
easy problem to solve
CMSC34000 Lecture 2, 6 Jan 2005
Hypre’s AMG in M3D
PETSc-based PPPL code M3D has been retrofit with Hypre’s algebraic
MG solver of Ruge-Steuben type
Iteration count results below are averaged over 19 different PETSc
SLESSolve calls in initialization and one timestep loop for this operator
split unsteady code, abcissa is number of procs in scaled problem; problem
size ranges from 12K to 303K unknowns (approx 4K per processor)
700
600
500
400
ASM-GMRES
AMG-FMGRES
300
200
100
0
3
12
27
48
75
CMSC34000 Lecture 2, 6 Jan 2005
Hypre’s AMG in M3D
Scaled speedup timing results below are summed over 19 different PETSc
SLESSolve calls in initialization and one timestep loop for this operator split
unsteady code
Majority of AMG cost is coarse-grid formation (preprocessing) which does not
scale as well as the inner loop V-cycle phase; in production, these coarse hierarchies
will be saved for reuse (same linear systems are called in each timestep loop),
making AMG much less expensive and more scalable
60
50
40
ASM-GMRES
AMG-FMGRES
AMG inner (est)
30
20
10
0
3
12
27
48
75
CMSC34000 Lecture 2, 6 Jan 2005
Hypre’s “Conceptual Interfaces”
Linear System Interfaces
Linear Solvers
GMG, ...
FAC, ...
Hybrid, ...
AMGe, ...
ILU, ...
unstruc
CSR
Data Layout
structured
composite
block-struc
(Slide c/o E. Chow, LLNL)
CMSC34000 Lecture 2, 6 Jan 2005
SuperLU in NIMROD
NIMROD is another MHD code in the CEMM collaboration
employs high-order elements on unstructured grids
very poor convergence with default Krylov solver on 2D poloidal
crossplane linear solves
TOPS wired in SuperLU, just to try a sparse direct solver
Speedup of more than 10 in serial, and about 8 on a
modest parallel cluster (24 procs)
PI Dalton Schnack (General Atomics) thought he entered a
time machine
SuperLU is not a “final answer”, but a sanity check
Parallel ILU under Krylov should be superior
CMSC34000 Lecture 2, 6 Jan 2005
2D Hall MHD sawtooth instability
(PETSc examples /snes/ex29.c and /sles/ex31.c)
(Porcelli et al., 1993, 1999)
Model equations:
Vorticity, early time
Vorticity, later time
zoom
Equilibrium:
CMSC34000 Lecture 2, 6 Jan 2005
(figures c/o A. Bhattacharjee, CMRS)
PETSc’s DMMG in Hall MR application
Implicit code (snes/ex29.c)
versus explicit code (sles/ex31.c),
both with second-order
integration in time
Implicit code (snes/ex29.c) with
first- and second-order integration
in time
CMSC34000 Lecture 2, 6 Jan 2005
Abstract Gantt Chart for TOPS
Each color module represents an algorithmic research idea on its way to becoming part of a supported
community software tool. At any moment (vertical time slice), TOPS has work underway at multiple levels.
While some codes are in applications already, they are being improved in functionality and performance as
part of the TOPS research agenda.
Dissemination
Applications Integration
e.g.,PETSc
Hardened Codes
e.g.,TOPSLib
Research Implementations
e.g., ASPIN
Algorithmic Development
time
CMSC34000 Lecture 2, 6 Jan 2005
Jacobian-free Newton-Krylov
In the Jacobian-Free Newton-Krylov (JFNK) method, a
Krylov method solves the linear Newton correction
equation, requiring Jacobian-vector products
These are approximated by the Fréchet derivatives
1
J (u )v [ F (u v) F (u )]
(where
is chosen with a fine balance between
approximation and floating point rounding error) or
automatic differentiation, so that the actual Jacobian
elements are never explicitly needed
One builds the Krylov space on a true F’(u) (to within
numerical approximation)
CMSC34000 Lecture 2, 6 Jan 2005
Philosophy of Jacobian-free NK
To evaluate the linear residual, we use the true F’(u) , giving a
true Newton step and asymptotic quadratic Newton
convergence
To precondition the linear residual, we do anything convenient
that uses understanding of the dominant physics/mathematics
in the system and respects the limitations of the parallel
computer architecture and the cost of various operations:
Jacobian of lower-order discretization
Jacobian with “lagged” values for expensive terms
Jacobian stored in lower precision
Jacobian blocks decomposed for parallelism
Jacobian of related discretization
operator-split Jacobians
physics-based preconditioning
CMSC34000 Lecture 2, 6 Jan 2005
Recall idea of preconditioning
Krylov iteration is expensive in memory and in
function evaluations, so subspace dimension k must be
kept small in practice, through preconditioning the
Jacobian with an approximate inverse, so that the
product matrix has low condition number in
1
1
( B A) x B b
1
B
Given the ability to apply the action of
to a
vector, preconditioning can be done on either the left,
as above, or the right, as in, e.g., for matrix-free:
1
JB v [ F (u B v) F (u )]
1
1
CMSC34000 Lecture 2, 6 Jan 2005
Physics-based preconditioning
In Newton iteration, one seeks to obtain a correction
(“delta”) to solution, by inverting the Jacobian
matrix on (the negative of) the nonlinear residual:
u [ J (u )] F (u )
k
k
1
A typical operator-split code also derives a “delta” to
the solution, by some implicitly defined means,
through a series of implicit and explicit substeps
F (u ) u
k
k
k
This implicitly defined mapping from residual to
“delta” is a natural preconditioner
Software must accommodate this!
CMSC34000 Lecture 2, 6 Jan 2005
Physics-based preconditioning
We consider a standard “dynamical
core,” the shallow-water wave splitting
algorithm, as a solver
Leaves a first-order in time splitting
error
In the Jacobian-free Newton-Krylov
framework, this solver, which maps a
residual into a correction, can be
regarded as a preconditioner
The true Jacobian is never formed yet
the time-implicit nonlinear residual at
each time step can be made as small as
needed for nonlinear consistency in
long time integrations
CMSC34000 Lecture 2, 6 Jan 2005
Example: shallow water equations
(u )
0
t
x
Continuity (*)
Momentum (**)
(u ) (u 2 )
g
0
t
x
x
x
These equations admit a fast gravity wave, as can be
seen by cross differentiating, e.g., (*) by t and (**) by
x, and subtracting:
t
2
2
g
otherterm s
2
2
t
x
CMSC34000 Lecture 2, 6 Jan 2005
1D shallow water equations, cont.
Wave equation for geopotential:
2
2
g
otherterm s
2
2
t
x
Gravity wave speed
g
g u , but stability restrictions
Typically
would require timesteps based on the CourantFriedrichs-Levy (CFL) criterion for the fastest wave,
for an explicit method
One can solve fully implicitly, or one can filter out the
gravity wave by solving semi-implicitly
CMSC34000 Lecture 2, 6 Jan 2005
1D shallow water equations, cont.
Continuity (*)
Momentum (**)
n 1 n (u ) n 1
0
x
(u ) n 1 (u ) n
n 1
(u 2 ) n
g n
0
x
x
Solving (**) for (u ) n 1 and substituting into (*),
where
n 1
g
S
n
n
(
)
x
x
x
n 1
2
Sn
n
2
n
(
u
)
(u ) n
x
CMSC34000 Lecture 2, 6 Jan 2005
1D shallow water equations, cont.
After the parabolic equation is spatially discretized and
solved for n 1, then (u ) n 1can be found from
n 1
(u ) n 1 g n
Sn
x
One scalar parabolic solve and one scalar explicit update
replace an implicit hyperbolic system
This semi-implicit operator splitting is foundational to
multiple scales problems in geophysical modeling
Similar tricks are employed in aerodynamics (sound
waves), MHD (multiple Alfvén waves), reacting flows
(fast kinetics), etc.
Temporal truncation error remains due to the lagging of
To be dealt with shortly
the advection in (**)
CMSC34000 Lecture 2, 6 Jan 2005
1D Shallow water preconditioning
Define continuity residual for each timestep:
n 1 n (u ) n 1
R_
x
Define momentum residual for each timestep:
R_u
(u ) n 1 (u ) n
n 1
(u 2 ) n
g n
x
x
Continuity delta-form (*):
[ (u )]
R_
x
Momentum delta form (**):
(u )
n [ ]
g
R_u
x
CMSC34000 Lecture 2, 6 Jan 2005
1D Shallow water preconditioning, cont.
Solving (**) for
(u ) and substituting into (*),
n [ ]
2
g
(
) R_
( R_u )
x
x
x
After this parabolic equation is solved for , we have
2
(u ) g
n
[ ]
R _u
x
This completes the application of the preconditioner to one
Newton-Krylov iteration at one timestep
Of course, the parabolic solve need not be done exactly;
one sweep of multigrid can be used
See paper by Mousseau et al. (2002) for impressive results
for longtime weather integration
CMSC34000 Lecture 2, 6 Jan 2005
Physics-based preconditioning update
So far, physics-based preconditioning has been
applied to several codes at Los Alamos, in an effort
led by D. Knoll
Summarized in new J. Comp. Phys. paper by Knoll
& Keyes (Jan 2004)
PETSc’s “shell preconditioner” is designed for
inserting physics-based preconditioners, and
PETSc’s solvers underneath are building blocks
CMSC34000 Lecture 2, 6 Jan 2005
Nonlinear Schwarz preconditioning
Nonlinear Schwarz has Newton both inside and
outside and is fundamentally Jacobian-free
It replaces F (u ) 0 with a new nonlinear system
possessing the same root, (u ) 0
Define a correction i (u ) to the i th partition (e.g.,
subdomain) of the solution vector by solving the
following local nonlinear system:
Ri F (u i (u)) 0
where i (u) n is nonzero only in the
components of the i th partition
Then sum the corrections: (u) i i (u) to get
an implicit function of u
CMSC34000 Lecture 2, 6 Jan 2005
Nonlinear Schwarz – picture
F(u)
1
0
1
0
1
1
Ri
Riu RiF
u
CMSC34000 Lecture 2, 6 Jan 2005
Nonlinear Schwarz – picture
F(u)
1
0
1
0
1
1
Ri
1
1
0
0
1
Riu RiF
1
Rj
Rju RjF
u
CMSC34000 Lecture 2, 6 Jan 2005
Nonlinear Schwarz – picture
F(u)
1
0
1
0
1
1
Ri
1
1
0
0
1
Riu RiF
1
Rj
Rju RjF
Fi’(ui)
u
δiu+δju
CMSC34000 Lecture 2, 6 Jan 2005
Nonlinear Schwarz, cont.
It is simple to prove that if the Jacobian of F(u) is
nonsingular in a neighborhood of the desired root then
and (u ) 0have the
F (same
u ) 0unique root
To lead to a Jacobian-free Newton-Krylov algorithm
we need to be able to evaluate for any
:
The residual
u, v n
(u) product
i i (u)
The Jacobian-vector
Remarkably, (Cai-Keyes, 2000)
(u )'itv can be shown that
i ( R J Ri ) Jv
where (u)vand
'
T
J
F
(
u
)
J
R
JR
All required actions are available
ini terms of
i
i
'
T
i
1
i
!
F (u )
CMSC34000 Lecture 2, 6 Jan 2005
Experimental example of nonlinear Schwarz
Stagnation
beyond
critical Re
Difficulty at
critical Re
Newton’s method
Convergence
for all Re
Additive Schwarz Preconditioned Inexact Newton
(ASPIN)
CMSC34000 Lecture 2, 6 Jan 2005
The 2003 SCaLeS initiative
Workshop on a Science-based Case for Large-scale Simulation
Arlington, VA
24-25 June 2003
CMSC34000 Lecture 2, 6 Jan 2005
Charge (April 2003, W. Polansky, DOE):
“Identify rich and fruitful directions for the
computational sciences from the perspective of
scientific and engineering applications”
Build a “strong science case for an ultra-scale
computing capability for the Office of Science”
“Address major opportunities and challenges
facing computational sciences in areas of
strategic importance to the Office of Science”
“Report by July 30, 2003”
CMSC34000 Lecture 2, 6 Jan 2005
Chapter 1. Introduction
Chapter 2. Scientific Discovery
through Advanced Computing: a
Successful Pilot Program
Chapter 3. Anatomy of a Largescale Simulation
Chapter 4. Opportunities at the
Scientific Horizon
First
fruits!
Chapter 5. Enabling Mathematics
and Computer Science Tools
Chapter 6. Recommendations and
Discussion
Volume 2 (due out early 2004):
11 chapters on applications
8 chapters on mathematical
methods
8 chapters on computer science and
infrastructure
“There will be opened a gateway and a road to a
large and excellent science
into which minds more piercing than mine shall
penetrate to recesses still deeper.”
Galileo (1564-1642)
(on ‘experimental mathematical analysis of nature’
appropriated here for ‘simulation science’)
CMSC34000 Lecture 2, 6 Jan 2005
Related URLs
TOPS project
http://www.tops-scidac.org
SciDAC initiative
http://www.science.doe.gov/scidac
SCaLeS report
http://www.pnl.gov/scales
CMSC34000 Lecture 2, 6 Jan 2005