
Canonical Problems and
Scalable Solvers for
Numerical PDE
Robert C. Kirby
Department of Computer Science
The University of Chicago
Outline


Canonical problems in Scientific Computing

Algebraic equations

Differential equations

Optimization
Overview of SciDAC/TOPS

Hard problems

Cutting edge research

Funding opportunities
CMSC34000 Lecture 2, 6 Jan 2005
Canonical Problems
Problem and formulation:

Linear systems:  Ax = b

Eigenvalue problems:  Ax = λBx

Ordinary differential equations:  f(dx/dt, x, t, p) = 0

Nonlinear systems:  F(x) = 0

Constrained optimization:  min_u φ(x, u)  s.t.  F(x, u) = 0
CMSC34000 Lecture 2, 6 Jan 2005
Linear Systems

Ax = b

PDE --> Linear System

Nonlinear system --> Sequence of linear steps

Millions of unknowns

Sparse

Ill-conditioned: κ(A) = ||A|| ||A^-1||

Nonsymmetric
CMSC34000 Lecture 2, 6 Jan 2005
Solution techniques

Direct (Gaussian elimination & variants)

Iterative (sequence of approximations)

  Jacobi / Gauss-Seidel

  SOR

  Krylov-subspace methods (a sequence of clever matrix-vector products builds up a good approximation; a minimal sketch follows below)

  Multigrid (algebraic / geometric)

Parallelism

  Each process owns its own rows

  Matrix action --> communication
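To make "a sequence of clever matrix-vector products" concrete, here is a minimal conjugate-gradient sketch in Python/NumPy (my illustration, not part of the lecture), valid for a symmetric positive definite A; nonsymmetric problems call for Krylov methods such as GMRES.

import numpy as np

# Illustrative sketch (not from the lecture): conjugate gradients for SPD A.
def conjugate_gradient(A, b, tol=1e-10, max_iter=500):
    """Solve Ax = b using only matrix-vector products with A."""
    x = np.zeros_like(b)
    r = b - A @ x            # residual
    p = r.copy()             # search direction
    rs_old = r @ r
    for _ in range(max_iter):
        Ap = A @ p                        # the only use of A: a mat-vec
        alpha = rs_old / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs_old) * p     # new direction, conjugate to the old ones
        rs_old = rs_new
    return x

# Small test: 1D Laplacian (tridiagonal, SPD)
n = 100
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
b = np.ones(n)
x = conjugate_gradient(A, b)
print("residual norm:", np.linalg.norm(b - A @ x))

The only way A enters is through the product A @ p, which is exactly what makes such methods attractive for sparse, distributed matrices where "matrix action --> communication".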
CMSC34000 Lecture 2, 6 Jan 2005
Ordinary differential equations

Time rate of change of a system of quantities

Chemical/nuclear reactions

N-body problems (celestial mechanics)

Time-dependent PDE (after spatial discretization)

Algebraic equations at each time step

“Stiffness”


Wide range of time scales requires special treatment

Fastest modes require smallest time steps
Multistep methods
CMSC34000 Lecture 2, 6 Jan 2005
ODE Techniques

Euler methods (forward, backward)

Runge-Kutta methods

Multi-adaptive Galerkin (Anders Logg)

Parallelism?

Depends on coupling of components

Naturally arises from PDE

Parallelism in time is more difficult…
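A hedged illustration (mine, not from the slides) of the stiffness point and of the forward/backward Euler bullet above: for the scalar test problem y' = -λy with large λ, the explicit method is stable only for Δt < 2/λ, while the implicit method is not so restricted.

import numpy as np

# Illustrative sketch (assumed test problem y' = -lam*y, exact solution exp(-lam*t))
lam = 1000.0
dt = 0.01               # deliberately larger than the explicit limit 2/lam = 0.002
n_steps = 50

y_fe = 1.0              # forward Euler:  y_{n+1} = y_n + dt*f(y_n)
y_be = 1.0              # backward Euler: y_{n+1} = y_n + dt*f(y_{n+1})
for _ in range(n_steps):
    y_fe = y_fe + dt * (-lam * y_fe)       # explicit update
    y_be = y_be / (1.0 + lam * dt)         # implicit update, solvable in closed form here

print("forward Euler :", y_fe)   # blows up, since |1 - lam*dt| > 1
print("backward Euler:", y_be)   # decays, like the true solution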
CMSC34000 Lecture 2, 6 Jan 2005
PDE

Multiple dependent variables, e.g., velocity u and pressure p coupled as in the incompressible Navier-Stokes equations:

  ∂u/∂t + u·∇u - ν∇²u + ∇p = 0
  ∇·u = 0

Algebraic equations

If time dependent, get an ODE or DAE at each time step
CMSC34000 Lecture 2, 6 Jan 2005
Solution strategies



Finite differences (FDM)

Replace derivatives at points with difference quotient

Naturally leads to equations for each grid point

Irregular, adaptive grids require special treatment
Finite elements (FEM)

Replace “strong form” with “weak form”

Weak form over subspaces leads to algebraic system

Generalizes FDM (geometry, order)
Finite volumes

Discrete conservation laws over each control volume

Often “low order” but “stable”
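A minimal sketch (mine, assuming a uniform grid and homogeneous Dirichlet boundary conditions) of the FDM bullet above: replacing derivatives in -u'' = f on (0, 1) by difference quotients yields one algebraic equation per grid point.

import numpy as np

# Illustrative sketch (not from the lecture): centered differences for -u'' = f.
def assemble_poisson_1d(n, f):
    """n interior grid points, u(0) = u(1) = 0."""
    h = 1.0 / (n + 1)
    x = np.linspace(h, 1 - h, n)           # interior points
    A = (2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2
    b = f(x)
    return A, b, x

A, b, x = assemble_poisson_1d(99, lambda x: np.pi**2 * np.sin(np.pi * x))
u = np.linalg.solve(A, b)                   # or hand A, b to an iterative solver
print("max error vs exact sin(pi*x):", np.max(np.abs(u - np.sin(np.pi * x))))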
CMSC34000 Lecture 2, 6 Jan 2005
Nonlinear equations

Arise from nonlinear PDE, ODE, etc.

Require Newton's method (a small sketch follows below):

  F(u) ≈ F(u_c) + F'(u_c) δu = 0
  u = u_c + δu

Jacobian matrix (differentiation)

Nonlinear Preconditioning
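A small Newton iteration for a toy system (my sketch, with an analytic Jacobian); PDE-scale codes approximate or avoid forming F'(u), as discussed later under Newton-Krylov-Schwarz.

import numpy as np

# Illustrative sketch (not from the lecture): Newton's method for F(u) = 0.
def newton(F, J, u0, tol=1e-12, max_iter=20):
    """Linearize F(u) ~ F(u_c) + F'(u_c) du = 0, solve for du, update, repeat."""
    u = u0.astype(float)
    for _ in range(max_iter):
        r = F(u)
        if np.linalg.norm(r) < tol:
            break
        du = np.linalg.solve(J(u), -r)   # Newton correction
        u = u + du
    return u

# Example: intersection of a circle and a parabola (assumed toy problem)
F = lambda u: np.array([u[0]**2 + u[1]**2 - 4.0, u[1] - u[0]**2])
J = lambda u: np.array([[2*u[0], 2*u[1]], [-2*u[0], 1.0]])
print(newton(F, J, np.array([1.0, 1.0])))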

CMSC34000 Lecture 2, 6 Jan 2005
Optimization

Find the minimum of some function subject to
some constraints

Constraints may themselves be PDE
CMSC34000 Lecture 2, 6 Jan 2005
Motivation

Solver performance is a major concern for parallel
simulations based on PDE formulations
… including many of those of the U.S. DOE Scientific Discovery
through Advanced Computing (SciDAC) program

For target applications, implicit solvers may require 50% to
95% of execution time
… at least, before “expert” overhaul for algorithmic optimality and
implementation performance


Even after a “best manual practice” overhaul, the solver may
still require 20% to 50% of execution time
The solver may hit up against both the processor scalability
limit and the memory bandwidth limitation of a PDE-based
application, before any other part of the code
… the first of these is not fundamental, though the second one is
CMSC34000 Lecture 2, 6 Jan 2005
Presentation plan

Overview of the SciDAC initiative

Overview of the Terascale Optimal PDE Simulations project (TOPS)

Brief review of scalable implicit methods (domain-decomposed multilevel iterative methods)

  algorithms

  software components: PETSc, Hypre, etc.

Three "war stories" from the SciDAC magnetically confined fusion energy portfolio

Some advanced research directions

  physics-based preconditioning

  nonlinear Schwarz

On the horizon
CMSC34000 Lecture 2, 6 Jan 2005
SciDAC apps and infrastructure
4 projects in high energy and nuclear physics

14 projects in biological and environmental research

18 projects in scientific software and network infrastructure

5 projects in fusion energy science

10 projects in basic energy sciences
CMSC34000 Lecture 2, 6 Jan 2005

“Enabling technologies” groups to develop
reusable software and partner with application
groups

From 2001 start-up, 51 projects share $57M/year


Approximately one-third for applications

A third for “integrated software infrastructure
centers”

A third for grid infrastructure and collaboratories
Plus, multi-Tflop/s IBM SP machines at NERSC
and ORNL available for SciDAC researchers
CMSC34000 Lecture 2, 6 Jan 2005
Unclassified resources for DOE science

"Seaborg" (Berkeley): IBM Power3+ SMP, 16 procs per node, 6656 procs total, 10 Tflop/s

"Cheetah" (Oak Ridge): IBM Power4 Regatta, 32 procs per node, 864 procs total, 4.5 Tflop/s
CMSC34000 Lecture 2, 6 Jan 2005
Designing a simulation code
(from 2001 SciDAC report)
[Figure: design cycle for a simulation code, with a V&V loop and a performance loop]
CMSC34000 Lecture 2, 6 Jan 2005
A “perfect storm” for simulation
(dates are symbolic)
[Figure: applications enabled by the convergence ("architectures") of scientific models (1686), numerical algorithms (1947), computer architecture (1976), scientific software engineering, and hardware infrastructure]
“Computational science is undergoing a phase transition.” – D. Hitchcock, DOE
CMSC34000 Lecture 2, 6 Jan 2005
Imperative: multiple-scale applications

Multiple spatial scales

  interfaces, fronts, layers

  thin relative to domain size, δ << L

Multiple temporal scales

  fast waves

  small transit times relative to convection or diffusion, τ << T

[Figure: Richtmyer-Meshkov instability, c/o A. Mirin, LLNL]

Analyst must isolate the dynamics of interest and model the rest in a system that can be discretized over a more modest range of scales

May lead to an infinitely "stiff" subsystem requiring special treatment by the solution method
CMSC34000 Lecture 2, 6 Jan 2005
Examples: multiple-scale applications

Biopolymers, nanotechnology

  10^12 range in time, from 10^-15 sec (quantum fluctuation) to 10^-3 sec (molecular folding time)

  typical computational model ignores the smallest scales, works on classical dynamics only, but scientists increasingly want both

Galaxy formation

  10^20 range in space, from binary star interactions to the diameter of the universe

  heroic computational model handles all scales with localized adaptive meshing

Supernovae simulation

  massive ranges in time and space scales for radiation, turbulent convection, diffusion, chemical reaction, nuclear reaction

[Figure: supernova simulation, c/o A. Mezzacappa, ORNL]
CMSC34000 Lecture 2, 6 Jan 2005
SciDAC portfolio characteristics

Multiple temporal scales

Multiple spatial scales

Linear ill conditioning

Complex geometry and severe anisotropy

Coupled physics, with essential nonlinearities

Ambition for uncertainty quantification,
parameter estimation, and design
Need a toolkit of portable, extensible, tunable implicit solvers, not "one-size-fits-all"
CMSC34000 Lecture 2, 6 Jan 2005
TOPS starting point codes

PETSc (ANL)

Hypre (LLNL)

Sundials (LLNL)

SuperLU (LBNL)

PARPACK (LBNL*)

TAO (ANL)

Veltisto (CMU)

Many interoperability connections between these
packages that predated SciDAC

Many application collaborators that predated SciDAC
CMSC34000 Lecture 2, 6 Jan 2005
TOPS participants
[Map: TOPS participants. CU, CMU, ANL, UC-B/LBNL, NYU, CU-B, LLNL, ODU, UT-K; 3 TOPS labs and 7 TOPS universities]
CMSC34000 Lecture 2, 6 Jan 2005
In the old days, see “Templates” guides …
www.netlib.org/templates
124 pp.
www.netlib.org/etemplates
410 pp.
… these are good starts, but not adequate for SciDAC scales!
CMSC34000 Lecture 2, 6 Jan 2005
"Integrated software infrastructure centers"

[Diagram: 34 applications groups supported by 7 ISIC groups (4 CS, 3 Math) and 10 grid/data collaboratory groups. The math ISICs cover adaptive gridding, discretization, and solvers (TOPS): Ax = b, Ax = λBx, f(dx/dt, x, t, p) = 0, F(x, p) = 0, and min_u φ(x, u) s.t. F(x, u) = 0. The CS ISICs cover systems software, component architecture, performance engineering, and data management, tied together by software integration and performance optimization.]
CMSC34000 Lecture 2, 6 Jan 2005
Keyword: “Optimal”
Convergence rate nearly independent of discretization parameters

  multilevel schemes for rapid linear convergence of linear problems

  Newton-like schemes for quadratic convergence of nonlinear problems

Convergence rate as independent as possible of physical parameters

  continuation schemes

  physics-based preconditioning

[Plot: time to solution vs. problem size (increasing with the number of processors, 1 to 1000); the curve labeled "scalable" stays flat]

Optimal convergence plus scalable loop body yields scalable solver
CMSC34000 Lecture 2, 6 Jan 2005
But where to go past O(N) ?

Since O(N) is already optimal, there is nowhere further
“upward” to go in efficiency, but one must extend optimality
“outward,” to more general problems

Hence, for instance, algebraic multigrid (AMG) seeks to obtain O(N) in indefinite, anisotropic, or inhomogeneous problems on irregular grids
AMG framework: in R^n, error that is easily damped by pointwise relaxation is distinguished from "algebraically smooth" error. Choose coarse grids, transfer operators, and smoothers to eliminate these "bad" components within a smaller-dimensional space, and recur.
CMSC34000 Lecture 2, 6 Jan 2005
Toolchain for PDE solvers in TOPS project



Design and implementation of "solvers"

  Time integrators (w/ sens. anal.):  f(dx/dt, x, t, p) = 0

  Nonlinear solvers (w/ sens. anal.):  F(x, p) = 0

  Constrained optimizers:  min_u φ(x, u)  s.t.  F(x, u) = 0,  u ≥ 0

  Linear solvers:  Ax = b

  Eigensolvers:  Ax = λBx

Software integration

Performance optimization

[Diagram: dependences among the Optimizer, Sens. Analyzer, Time integrator, Nonlinear solver, Eigensolver, and Linear solver]
CMSC34000 Lecture 2, 6 Jan 2005
Dominant data structures are grid-based
[Figure: grid stencils for finite differences, finite elements, and finite volumes around node i, and the corresponding sparse row i of the Jacobian J]

All lead to problems with sparse Jacobian matrices; many tasks can leverage off an efficient set of tools for manipulating distributed sparse data structures.
CMSC34000 Lecture 2, 6 Jan 2005
Newton-Krylov-Schwarz:
a PDE applications “workhorse”
Newton (nonlinear solver, asymptotically quadratic):

  F(u) ≈ F(u_c) + F'(u_c) δu = 0,   u = u_c + λ δu

Krylov (accelerator, spectrally adaptive):

  J δu = -F,   δu = argmin{ ||Jx + F|| : x ∈ V },   V = span{F, JF, J²F, ...}

Schwarz (preconditioner, parallelizable):

  M^-1 J δu = -M^-1 F,   M^-1 = Σ_i R_i^T (R_i J R_i^T)^-1 R_i
CMSC34000 Lecture 2, 6 Jan 2005
SPMD parallelism w/domain decomposition
3
2
1
rows assigned
to proc “2”
A21
A22
A23
Partitioning of the grid
induces block structure
on the Jacobian
CMSC34000 Lecture 2, 6 Jan 2005
Time-implicit Newton-Krylov-Schwarz
For accommodation of unsteady problems, and nonlinear robustness in
steady ones, NKS iteration is wrapped in time-stepping:
for (l = 0; l < n_time; l++) {                // pseudotime loop
  select time step
  for (k = 0; k < n_Newton; k++) {            // NKS loop
    compute nonlinear residual and Jacobian
    for (j = 0; j < n_Krylov; j++) {
      forall (i = 0; i < n_Precon; i++) {
        solve subdomain problems concurrently
      } // End of loop over subdomains
      perform Jacobian-vector product
      enforce Krylov basis conditions
      update optimal coefficients
      check linear convergence
    } // End of linear solver
    perform DAXPY update
    check nonlinear convergence
  } // End of nonlinear loop
} // End of time-step loop
CMSC34000 Lecture 2, 6 Jan 2005
(N)KS kernel in parallel
[Schematic: one Krylov iteration across processes P1, P2, ..., Pn, broken into phases: local scatter, Jacobian-vector multiply, preconditioner sweep, DAXPY, and inner product (all-gather)]

Bulk synchronous model leads to easy scalability analyses and projections. Each phase can be considered separately. What happens if, for instance, in this (schematicized) iteration, arithmetic speed is doubled, scalar all-gather is quartered, and local scatter is cut by one-third? (A back-of-the-envelope sketch follows below.)
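One way to answer that question is simply to plug assumed per-iteration phase times into the bulk synchronous model; all numbers below are illustrative assumptions of mine, not measurements from the lecture.

# Illustrative per-iteration phase times (arbitrary units, assumed values)
phases = {"local scatter": 1.0, "jac-vec multiply": 4.0,
          "precond sweep": 3.0, "daxpy": 0.5, "all-gather": 1.5}

before = sum(phases.values())

# Apply the hypothetical changes posed on the slide
after = (phases["local scatter"] * (2.0 / 3.0)   # local scatter cut by one-third
         + phases["jac-vec multiply"] / 2.0      # arithmetic speed doubled
         + phases["precond sweep"] / 2.0         # (also arithmetic)
         + phases["daxpy"] / 2.0                 # (also arithmetic)
         + phases["all-gather"] / 4.0)           # all-gather quartered

print(f"time per iteration: {before:.2f} -> {after:.2f} "
      f"(speedup {before / after:.2f}x)")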
CMSC34000 Lecture 2, 6 Jan 2005
Estimating scalability of stencil computations

Given complexity estimates of the leading terms of:

  the concurrent computation (per iteration phase)

  the concurrent communication

  the synchronization frequency

And a bulk synchronous model of the architecture including:

  internode communication (network topology and protocol reflecting horizontal memory structure)

  on-node computation (effective performance parameters including vertical memory structure)

One can estimate optimal concurrency and optimal execution time

  on a per-iteration basis, or overall (by taking into account any granularity-dependent convergence rate), based on problem size N and concurrency P

  simply differentiate the time estimate in terms of (N, P) with respect to P, equate to zero, and solve for P in terms of N (see the sketch below)
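A hedged worked example of that last step (mine), for an assumed bulk synchronous model T(N, P) = a·N/P + b·log P (concurrent work plus a tree-depth reduction); the resulting P_opt = a·N/b matches the "scales linearly with problem size" case on the next slide.

import sympy as sp

# Assumed model (illustrative, not the lecture's): work a*N/P plus log-depth reduction b*log(P)
N, P, a, b = sp.symbols("N P a b", positive=True)
T = a * N / P + b * sp.log(P)

# Differentiate with respect to P, set to zero, solve for the optimal concurrency
P_opt = sp.solve(sp.Eq(sp.diff(T, P), 0), P)[0]
print("optimal concurrency:", P_opt)                 # a*N/b, i.e., linear in N
print("time at optimum   :", sp.simplify(T.subs(P, P_opt)))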
CMSC34000 Lecture 2, 6 Jan 2005
Scalability results for DD stencil computations

With tree-based (logarithmic) global reductions and scalable nearest-neighbor hardware:

  optimal number of processors scales linearly with problem size

With 3D torus-based global reductions and scalable nearest-neighbor hardware:

  optimal number of processors scales as the three-fourths power of problem size (almost "scalable")

With common network bus (heavy contention):

  optimal number of processors scales as the one-fourth power of problem size (not "scalable")

  bad news for conventional Beowulf clusters, but see the 2000 Bell Prize "price-performance awards" for multiple NICs
CMSC34000 Lecture 2, 6 Jan 2005
NKS efficiently implemented in PETSc’s
MPI-based distributed data structures
[Diagram: PETSc layers. The Main Routine calls Timestepping Solvers (TS), Nonlinear Solvers (SNES), and Linear Solvers (SLES, comprising KSP and PC); user code supplies Application Initialization, Function Evaluation, Jacobian Evaluation, and Post-Processing]
CMSC34000 Lecture 2, 6 Jan 2005
User Code/PETSc library interactions
[Diagram: the same user code/PETSc interaction as above, noting that the Jacobian Evaluation routine can be AD (automatic differentiation) code]
CMSC34000 Lecture 2, 6 Jan 2005
1999 Bell Prize for unstructured grid
computational aerodynamics
Transonic “Lambda” Shock, Mach contours on surfaces
mesh c/o D. Mavriplis,
ICASE
Implemented in PETSc
www.mcs.anl.gov/petsc
CMSC34000 Lecture 2, 6 Jan 2005
Fixed-size parallel scaling results
128 nodes: 43 min

3072 nodes: 2.5 min, 226 Gflop/s

11M unknowns, 70% efficient

Four orders of magnitude in 13 years

c/o K. Anderson, W. Gropp, D. Kaushik, D. Keyes and B. Smith
CMSC34000 Lecture 2, 6 Jan 2005
Three “war stories” from magnetic fusion
energy applications in SciDAC
Physical models based on fluid-like magnetohydrodynamics (MHD):

  ∂B/∂t = -∇×E + κ_divB ∇(∇·B)

  E = -V×B + ηJ

  μ_0 J = ∇×B

  ∂n/∂t + ∇·(nV) = ∇·(D∇n)

  ρ(∂V/∂t + V·∇V) = J×B - ∇p + ∇·(ρν∇V)

  (n/(γ-1)) (∂T/∂t + V·∇T) = -p∇·V + ∇·[ n(κ_∥ - κ_⊥) b̂b̂·∇T + nκ_⊥∇T ] + Q
CMSC34000 Lecture 2, 6 Jan 2005
Challenges in magnetic fusion
• Conditions of interest possess two properties that pose
great challenges to numerical approaches—anisotropy and
stiffness.
• Anisotropy produces subtle balances of large forces,
and vastly different parallel and perpendicular
transport properties.
• Stiffness reflects the vast range of time-scales in the
system: targeted physics is slow (~transport scale)
compared to waves
CMSC34000 Lecture 2, 6 Jan 2005
Tokamak/stellarator simulations

Center for Extended MHD Modeling (based at Princeton Plasma Physics Lab) M3D code

  Realistic toroidal geometry, unstructured mesh, hybrid FE/FD discretization

  Fields expanded in scalar potentials and streamfunctions

  Operator-split, linearized, w/ 11 potential solves in each poloidal cross-plane/step (90% of execution time)

  Parallelized w/ PETSc (Tang et al., SIAM PP01; Chen et al., SIAM AN02; Jardin et al., SIAM CSE03)

Want from TOPS:

  Now: scalable linear implicit solver for much higher resolution (and for AMR)

  Later: fully nonlinearly implicit solvers and coupling to other codes
CMSC34000 Lecture 2, 6 Jan 2005
Provided new solvers across existing interfaces

Hypre in PETSc

  codes with a PETSc interface (like CEMM's M3D) can now invoke Hypre routines as solvers or preconditioners with a command-line switch

SuperLU_DIST in PETSc

  as above, with SuperLU_DIST

Hypre in AMR Chombo code

  so far, Hypre is a level-solver only; its AMG will ultimately be useful as a bottom-solver, since it can be coarsened indefinitely without attention to loss of nested geometric structure; also, FAC is being developed for AMR uses, like Chombo
CMSC34000 Lecture 2, 6 Jan 2005
Hypre: multilevel preconditioning
[Figure: a multigrid V-cycle. Apply the smoother on the finest grid; restriction transfers the residual from the fine to the coarse grid (the coarser grid has fewer cells, hence less work and storage); smooth on the first coarse grid; prolongation transfers the correction from the coarse grid back to the fine grid. Recursively apply this idea until we have an easy problem to solve. A two-grid sketch follows below.]
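A minimal two-grid sketch of that idea for the 1D Poisson problem (my illustration; multilevel solvers such as Hypre's recurse over many levels and use far more sophisticated coarsening and smoothers): damped-Jacobi smoothing, residual restriction, an exact coarse solve, prolongation, and correction.

import numpy as np

# Illustrative sketch (not from the lecture): two-grid cycle for 1D Poisson.
def laplacian(n):
    h = 1.0 / (n + 1)
    return (2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2

def jacobi(A, b, u, sweeps=3, omega=2.0/3.0):
    D = np.diag(A)
    for _ in range(sweeps):
        u = u + omega * (b - A @ u) / D          # damped Jacobi: damps oscillatory error
    return u

def two_grid(A_f, b, u, n_f):
    n_c = (n_f - 1) // 2                         # coarse grid: every other interior point
    R = np.zeros((n_c, n_f))                     # full-weighting restriction
    for i in range(n_c):
        R[i, 2*i:2*i+3] = [0.25, 0.5, 0.25]
    P = 2 * R.T                                  # linear-interpolation prolongation
    u = jacobi(A_f, b, u)                        # pre-smooth
    r_c = R @ (b - A_f @ u)                      # restrict residual to the coarse grid
    A_c = R @ A_f @ P                            # Galerkin coarse operator
    e_c = np.linalg.solve(A_c, r_c)              # "easy problem": exact coarse solve
    u = u + P @ e_c                              # prolongate and correct
    return jacobi(A_f, b, u)                     # post-smooth

n = 63
A, b = laplacian(n), np.ones(n)
u = np.zeros(n)
for k in range(5):
    u = two_grid(A, b, u, n)
    print(k, np.linalg.norm(b - A @ u))          # residual drops sharply each cycle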
CMSC34000 Lecture 2, 6 Jan 2005
Hypre’s AMG in M3D

PETSc-based PPPL code M3D has been retrofit with Hypre's algebraic MG solver of Ruge-Stüben type

Iteration count results below are averaged over 19 different PETSc SLESSolve calls in the initialization and one timestep loop for this operator-split unsteady code; the abscissa is the number of procs in a scaled problem; problem size ranges from 12K to 303K unknowns (approx. 4K per processor)

[Bar chart: average iteration counts for ASM-GMRES vs. AMG-FGMRES on 3, 12, 27, 48, and 75 processors; vertical axis 0 to 700]
CMSC34000 Lecture 2, 6 Jan 2005
Hypre’s AMG in M3D


Scaled speedup timing results below are summed over 19 different PETSc SLESSolve calls in the initialization and one timestep loop for this operator-split unsteady code

The majority of the AMG cost is coarse-grid formation (preprocessing), which does not scale as well as the inner-loop V-cycle phase; in production, these coarse hierarchies will be saved for reuse (the same linear systems are called in each timestep loop), making AMG much less expensive and more scalable

[Bar chart: summed solve times for ASM-GMRES, AMG-FGMRES, and AMG inner (est.) on 3, 12, 27, 48, and 75 processors; vertical axis 0 to 60]
CMSC34000 Lecture 2, 6 Jan 2005
Hypre’s “Conceptual Interfaces”
[Diagram: Hypre's conceptual linear-system interfaces. Problem descriptions enter through structured, composite, block-structured, unstructured, or CSR data layouts and are routed to linear solvers such as GMG, FAC, Hybrid, AMGe, ILU, ...]

(Slide c/o E. Chow, LLNL)
CMSC34000 Lecture 2, 6 Jan 2005
SuperLU in NIMROD

NIMROD is another MHD code in the CEMM collaboration







employs high-order elements on unstructured grids
very poor convergence with default Krylov solver on 2D poloidal
crossplane linear solves
TOPS wired in SuperLU, just to try a sparse direct solver
Speedup of more than 10 in serial, and about 8 on a
modest parallel cluster (24 procs)
PI Dalton Schnack (General Atomics) thought he entered a
time machine
SuperLU is not a “final answer”, but a sanity check
Parallel ILU under Krylov should be superior
CMSC34000 Lecture 2, 6 Jan 2005
2D Hall MHD sawtooth instability
(PETSc examples /snes/ex29.c and /sles/ex31.c)
(Porcelli et al., 1993, 1999)
[Figures: model equations and equilibrium; vorticity at an early time and at a later time, with zoom]
CMSC34000 Lecture 2, 6 Jan 2005
(figures c/o A. Bhattacharjee, CMRS)
PETSc’s DMMG in Hall MR application

Implicit code (snes/ex29.c)
versus explicit code (sles/ex31.c),
both with second-order
integration in time

Implicit code (snes/ex29.c) with
first- and second-order integration
in time
CMSC34000 Lecture 2, 6 Jan 2005
Abstract Gantt Chart for TOPS
Each color module represents an algorithmic research idea on its way to becoming part of a supported
community software tool. At any moment (vertical time slice), TOPS has work underway at multiple levels.
While some codes are in applications already, they are being improved in functionality and performance as
part of the TOPS research agenda.
[Gantt chart levels, from bottom to top, plotted against time: Algorithmic Development; Research Implementations (e.g., ASPIN); Hardened Codes (e.g., TOPSLib); Applications Integration; Dissemination (e.g., PETSc)]
CMSC34000 Lecture 2, 6 Jan 2005
Jacobian-free Newton-Krylov

In the Jacobian-Free Newton-Krylov (JFNK) method, a
Krylov method solves the linear Newton correction
equation, requiring Jacobian-vector products

These are approximated by the Fréchet derivative

  J(u)v ≈ [F(u + εv) - F(u)] / ε

(where ε is chosen with a fine balance between approximation and floating-point rounding error; see the sketch below) or by automatic differentiation, so that the actual Jacobian elements are never explicitly needed

One builds the Krylov space on a true F’(u) (to within
numerical approximation)
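A minimal sketch (mine) of that matrix-free product, using a common heuristic for ε based on the square root of machine epsilon; production JFNK implementations (e.g., PETSc's SNES) choose ε more carefully.

import numpy as np

# Illustrative sketch (not from the lecture): matrix-free Jacobian-vector product.
def jfnk_matvec(F, u, v, Fu=None):
    """Approximate J(u) v ~ [F(u + eps*v) - F(u)] / eps without forming J."""
    if Fu is None:
        Fu = F(u)
    norm_v = np.linalg.norm(v)
    if norm_v == 0.0:
        return np.zeros_like(u)
    # Simple assumed heuristic: balance truncation error against rounding error
    eps = np.sqrt(np.finfo(float).eps) * (1.0 + np.linalg.norm(u)) / norm_v
    return (F(u + eps * v) - Fu) / eps

# Check against an analytic Jacobian for a toy F(u) = (u0^2 + u1, sin(u1))
F = lambda u: np.array([u[0]**2 + u[1], np.sin(u[1])])
J = lambda u: np.array([[2*u[0], 1.0], [0.0, np.cos(u[1])]])
u, v = np.array([1.0, 2.0]), np.array([0.3, -0.7])
print(jfnk_matvec(F, u, v), "vs", J(u) @ v)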
CMSC34000 Lecture 2, 6 Jan 2005
Philosophy of Jacobian-free NK


To evaluate the linear residual, we use the true F’(u) , giving a
true Newton step and asymptotic quadratic Newton
convergence
To precondition the linear residual, we do anything convenient
that uses understanding of the dominant physics/mathematics
in the system and respects the limitations of the parallel
computer architecture and the cost of various operations:

Jacobian of lower-order discretization

Jacobian with “lagged” values for expensive terms

Jacobian stored in lower precision

Jacobian blocks decomposed for parallelism

Jacobian of related discretization

operator-split Jacobians

physics-based preconditioning
CMSC34000 Lecture 2, 6 Jan 2005
Recall idea of preconditioning

Krylov iteration is expensive in memory and in function evaluations, so the subspace dimension k must be kept small in practice, through preconditioning the Jacobian with an approximate inverse, so that the product matrix has a low condition number in

  (B^-1 A) x = B^-1 b

Given the ability to apply the action of B^-1 to a vector, preconditioning can be done on either the left, as above, or the right, as in, e.g., for matrix-free:

  J B^-1 v ≈ [F(u + ε B^-1 v) - F(u)] / ε
CMSC34000 Lecture 2, 6 Jan 2005
Physics-based preconditioning

In Newton iteration, one seeks to obtain a correction ("delta") to the solution by inverting the Jacobian matrix on (the negative of) the nonlinear residual:

  δu^k = -[J(u^k)]^-1 F(u^k)

A typical operator-split code also derives a "delta" to the solution, by some implicitly defined means, through a series of implicit and explicit substeps:

  F(u^k) -> δu^k

This implicitly defined mapping from residual to "delta" is a natural preconditioner

Software must accommodate this!
CMSC34000 Lecture 2, 6 Jan 2005
Physics-based preconditioning




We consider a standard “dynamical
core,” the shallow-water wave splitting
algorithm, as a solver
Leaves a first-order in time splitting
error
In the Jacobian-free Newton-Krylov
framework, this solver, which maps a
residual into a correction, can be
regarded as a preconditioner
The true Jacobian is never formed yet
the time-implicit nonlinear residual at
each time step can be made as small as
needed for nonlinear consistency in
long time integrations
CMSC34000 Lecture 2, 6 Jan 2005
Example: shallow water equations

Continuity (*):

  ∂φ/∂t + ∂(φu)/∂x = 0

Momentum (**):

  ∂(φu)/∂t + ∂(φu²)/∂x + gφ ∂φ/∂x = 0

These equations admit a fast gravity wave, as can be seen by cross-differentiating, e.g., (*) by t and (**) by x, and subtracting:

  ∂²φ/∂t² = gφ ∂²φ/∂x² + other terms
CMSC34000 Lecture 2, 6 Jan 2005
1D shallow water equations, cont.

Wave equation for geopotential:

  ∂²φ/∂t² = gφ ∂²φ/∂x² + other terms

Gravity wave speed:  √(gφ)

Typically √(gφ) >> u, but stability restrictions would require timesteps based on the Courant-Friedrichs-Lewy (CFL) criterion for the fastest wave, for an explicit method

One can solve fully implicitly, or one can filter out the gravity wave by solving semi-implicitly
CMSC34000 Lecture 2, 6 Jan 2005
1D shallow water equations, cont.

Continuity (*):

  (φ^(n+1) - φ^n)/Δt + ∂(φu)^(n+1)/∂x = 0

Momentum (**):

  ((φu)^(n+1) - (φu)^n)/Δt + ∂(φu²)^n/∂x + gφ^n ∂φ^(n+1)/∂x = 0

Solving (**) for (φu)^(n+1) and substituting into (*):

  (φ^(n+1) - φ^n)/Δt - Δt ∂/∂x( gφ^n ∂φ^(n+1)/∂x ) = -∂S^n/∂x

where

  S^n = (φu)^n - Δt ∂(φu²)^n/∂x
CMSC34000 Lecture 2, 6 Jan 2005
1D shallow water equations, cont.


After the parabolic equation is spatially discretized and solved for φ^(n+1), then (φu)^(n+1) can be found from

  (φu)^(n+1) = -Δt gφ^n ∂φ^(n+1)/∂x + S^n

One scalar parabolic solve and one scalar explicit update replace an implicit hyperbolic system (a discrete sketch follows below)

This semi-implicit operator splitting is foundational to multiple-scales problems in geophysical modeling

Similar tricks are employed in aerodynamics (sound waves), MHD (multiple Alfvén waves), reacting flows (fast kinetics), etc.

Temporal truncation error remains due to the lagging of the advection in (**); to be dealt with shortly
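A discrete sketch of one such semi-implicit step (my discretization: centered differences on a periodic grid, with assumed initial data), following the splitting above: an implicit variable-coefficient parabolic-type solve for φ^(n+1), then an explicit update of (φu)^(n+1).

import numpy as np

# Illustrative sketch (not from the lecture): one semi-implicit shallow-water step.
def semi_implicit_step(phi, phiu, dt, dx, g=9.81):
    n = phi.size

    def ddx(f):                                   # centered derivative, periodic
        return (np.roll(f, -1) - np.roll(f, 1)) / (2 * dx)

    # S^n = (phi*u)^n - dt * d/dx (phi*u^2)^n   (lagged advective part)
    S = phiu - dt * ddx(phiu**2 / phi)

    # Assemble phi^{n+1}/dt - dt*d/dx(g*phi^n * d/dx phi^{n+1}) = phi^n/dt - dS/dx
    c = g * phi
    c_half = 0.5 * (c + np.roll(c, -1))           # coefficient at i+1/2
    A = np.zeros((n, n))
    for i in range(n):
        ip, im = (i + 1) % n, (i - 1) % n
        A[i, i] = 1.0 / dt + dt * (c_half[i] + c_half[im]) / dx**2
        A[i, ip] = -dt * c_half[i] / dx**2
        A[i, im] = -dt * c_half[im] / dx**2
    phi_new = np.linalg.solve(A, phi / dt - ddx(S))

    # (phi*u)^{n+1} = S^n - dt * g*phi^n * d/dx phi^{n+1}
    phiu_new = S - dt * g * phi * ddx(phi_new)
    return phi_new, phiu_new

# Small test: gentle bump in geopotential, fluid initially at rest (assumed data)
n, L = 200, 1.0
x = np.linspace(0, L, n, endpoint=False)
phi = 10.0 + 0.1 * np.exp(-200 * (x - 0.5) ** 2)
phiu = np.zeros(n)
phi0 = phi.copy()
for _ in range(50):
    phi, phiu = semi_implicit_step(phi, phiu, dt=5e-4, dx=L / n)
print("mass drift:", abs(phi.mean() - phi0.mean()))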
CMSC34000 Lecture 2, 6 Jan 2005
1D Shallow water preconditioning


Define the continuity residual for each timestep:

  R_φ = (φ^(n+1) - φ^n)/Δt + ∂(φu)^(n+1)/∂x

Define the momentum residual for each timestep:

  R_φu = ((φu)^(n+1) - (φu)^n)/Δt + ∂(φu²)^n/∂x + gφ^n ∂φ^(n+1)/∂x

Continuity delta form (*):

  δφ/Δt + ∂[δ(φu)]/∂x = -R_φ

Momentum delta form (**):

  δ(φu)/Δt + gφ^n ∂[δφ]/∂x = -R_φu
CMSC34000 Lecture 2, 6 Jan 2005
1D Shallow water preconditioning, cont.

Solving (**) for δ(φu) and substituting into (*):

  δφ/Δt - Δt ∂/∂x( gφ^n ∂[δφ]/∂x ) = -R_φ + Δt ∂(R_φu)/∂x

After this parabolic equation is solved for δφ, we have

  δ(φu) = -Δt ( gφ^n ∂[δφ]/∂x + R_φu )

This completes the application of the preconditioner to one Newton-Krylov iteration at one timestep

Of course, the parabolic solve need not be done exactly; one sweep of multigrid can be used

See the paper by Mousseau et al. (2002) for impressive results for long-time weather integration
CMSC34000 Lecture 2, 6 Jan 2005
Physics-based preconditioning update

So far, physics-based preconditioning has been
applied to several codes at Los Alamos, in an effort
led by D. Knoll

Summarized in new J. Comp. Phys. paper by Knoll
& Keyes (Jan 2004)

PETSc’s “shell preconditioner” is designed for
inserting physics-based preconditioners, and
PETSc’s solvers underneath are building blocks
CMSC34000 Lecture 2, 6 Jan 2005
Nonlinear Schwarz preconditioning




Nonlinear Schwarz has Newton both inside and outside and is fundamentally Jacobian-free

It replaces F(u) = 0 with a new nonlinear system possessing the same root, Φ(u) = 0

Define a correction δ_i(u) to the i-th partition (e.g., subdomain) of the solution vector by solving the following local nonlinear system:

  R_i F(u + δ_i(u)) = 0

where δ_i(u) ∈ R^n is nonzero only in the components of the i-th partition

Then sum the corrections, Φ(u) = Σ_i δ_i(u), to get an implicit function of u (a small sketch follows below)
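A small sketch (mine) of this construction for a 1D nonlinear problem with two nonoverlapping partitions (overlap is the usual practice): each δ_i(u) comes from a local Newton solve, and the corrections are summed into Φ(u). Here Φ is simply driven to zero by a naive fixed-point update, which converges slowly; ASPIN instead applies Newton-Krylov to Φ(u) = 0, as described on the following slides.

import numpy as np

# Illustrative sketch (not from the lecture): nonlinear Schwarz residual Phi(u).
def F(u, h, f):
    """Global residual of -u'' + u^3 = f on (0,1), u(0)=u(1)=0, centered differences."""
    up = np.concatenate(([0.0], u, [0.0]))
    return (-up[:-2] + 2.0 * up[1:-1] - up[2:]) / h**2 + u**3 - f

def fd_jacobian(G, x, eps=1e-7):
    """Dense finite-difference Jacobian of a small local residual G."""
    Gx = G(x)
    J = np.zeros((Gx.size, x.size))
    for j in range(x.size):
        xp = x.copy()
        xp[j] += eps
        J[:, j] = (G(xp) - Gx) / eps
    return J

def local_correction(u, part, h, f, iters=10):
    """delta_i(u): supported on `part`, makes the restricted residual vanish."""
    d_loc = np.zeros(part.size)
    def G(d):
        v = u.copy()
        v[part] += d
        return F(v, h, f)[part]
    for _ in range(iters):                    # local Newton solve
        step = np.linalg.solve(fd_jacobian(G, d_loc), -G(d_loc))
        d_loc += step
        if np.linalg.norm(step) < 1e-10:
            break
    delta = np.zeros_like(u)
    delta[part] = d_loc
    return delta

def nonlinear_schwarz_residual(u, parts, h, f):
    """Phi(u) = sum_i delta_i(u)."""
    return sum(local_correction(u, p, h, f) for p in parts)

# Assumed small test problem and two nonoverlapping partitions
n = 32
h = 1.0 / (n + 1)
x = np.linspace(h, 1.0 - h, n)
f = 10.0 * np.sin(np.pi * x)
parts = [np.arange(0, n // 2), np.arange(n // 2, n)]

u = np.zeros(n)
for k in range(8):                            # naive fixed point: u <- u + Phi(u)
    u = u + nonlinear_schwarz_residual(u, parts, h, f)
    print(k, np.linalg.norm(F(u, h, f)))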
CMSC34000 Lecture 2, 6 Jan 2005
Nonlinear Schwarz – picture
[Schematic: F(u) with a restriction operator R_i (a 0/1 diagonal selecting the components of partition i), and the restricted quantities R_i u and R_i F]
CMSC34000 Lecture 2, 6 Jan 2005
Nonlinear Schwarz – picture
[Schematic: as above, with a second partition j, its restriction R_j, and the restricted quantities R_j u and R_j F]
CMSC34000 Lecture 2, 6 Jan 2005
Nonlinear Schwarz – picture
[Schematic: as above, additionally showing the local Jacobian F_i'(u_i) and the summed corrections δ_i u + δ_j u]
CMSC34000 Lecture 2, 6 Jan 2005
Nonlinear Schwarz, cont.

It is simple to prove that if the Jacobian of F(u) is nonsingular in a neighborhood of the desired root, then F(u) = 0 and Φ(u) = 0 have the same unique root

To lead to a Jacobian-free Newton-Krylov algorithm, we need to be able to evaluate, for any u, v ∈ R^n:

  the residual Φ(u) = Σ_i δ_i(u)

  the Jacobian-vector product Φ'(u) v

Remarkably, it can be shown (Cai-Keyes, 2000) that

  Φ'(u) v ≈ Σ_i (R_i^T J_i^-1 R_i) J v

where J = F'(u) and J_i = R_i J R_i^T

All required actions are available in terms of F(u)!
CMSC34000 Lecture 2, 6 Jan 2005
Experimental example of nonlinear Schwarz
[Figure: convergence comparison. Newton's method shows difficulty at the critical Re and stagnation beyond the critical Re, while Additive Schwarz Preconditioned Inexact Newton (ASPIN) shows convergence for all Re]
CMSC34000 Lecture 2, 6 Jan 2005
The 2003 SCaLeS initiative
Workshop on a Science-based Case for Large-scale Simulation
Arlington, VA
24-25 June 2003
CMSC34000 Lecture 2, 6 Jan 2005
Charge (April 2003, W. Polansky, DOE):




“Identify rich and fruitful directions for the
computational sciences from the perspective of
scientific and engineering applications”
Build a “strong science case for an ultra-scale
computing capability for the Office of Science”
“Address major opportunities and challenges
facing computational sciences in areas of
strategic importance to the Office of Science”
“Report by July 30, 2003”
CMSC34000 Lecture 2, 6 Jan 2005

Chapter 1. Introduction

Chapter 2. Scientific Discovery through Advanced Computing: a Successful Pilot Program

Chapter 3. Anatomy of a Large-scale Simulation

Chapter 4. Opportunities at the Scientific Horizon

Chapter 5. Enabling Mathematics and Computer Science Tools

Chapter 6. Recommendations and Discussion

First fruits!

Volume 2 (due out early 2004):

  11 chapters on applications

  8 chapters on mathematical methods

  8 chapters on computer science and infrastructure

"There will be opened a gateway and a road to a large and excellent science into which minds more piercing than mine shall penetrate to recesses still deeper."
Galileo (1564-1642)
(on 'experimental mathematical analysis of nature', appropriated here for 'simulation science')
CMSC34000 Lecture 2, 6 Jan 2005
Related URLs

TOPS project
http://www.tops-scidac.org

SciDAC initiative
http://www.science.doe.gov/scidac

SCaLeS report
http://www.pnl.gov/scales
CMSC34000 Lecture 2, 6 Jan 2005