VizSchema, the FACETS project and interactions with SWIM

Transcript

Applied Math Issues in FACETS
FACETS SciDAC Review
May 14, 2009
Speaker:
Core:
– John Cary, Johan Carlsson, Tech-X: Core solver
– Srinath Vadlamani, Tech-X: Turbulent flux computation via FMCFM
– Ammar Hakim, Mahmood Miah, Tech-X: FACETS infrastructure
– Allen Malony, Alan Morris, Sameer Shende, Paratools: Performance analysis
– Greg Hammett, PPPL: Suggesting stable time stepping schemes
– Alexei Pankin, Lehigh University: Providing core transport benchmark against ASTRA
Edge:
– Tom Rognlien, LLNL
– John Cary et al., Tech-X: FACETS integration
– Ron Cohen, LLNL: Edge physics, scripting
– Hong Zhang, ANL: Nonlinear solvers
– Satish Balay, ANL: Portability and systems issues via TOPS, no FACETS funding
– Maxim Umansky, LLNL: BOUT physics, no FACETS funding
– Sean Farley, LSU: Math grad student, summer 2008 at ANL + ongoing: BOUT/PETSc interface
– Mike McCourt, IIT and Cornell: Applied math grad student, summer 2007 at ANL: UEDGE/PETSc interface
Coupling:
– Lois Curfman McInnes, ANL
– Alexander Pletzer, Tech-X
– Don Estep, CSU
– Du Pham, Simon Tavener, CSU: Analysis of stability and accuracy issues in coupling
– Ron Cohen, Tom Rognlien, LLNL: Physics issues in coupling
Nonlinear PDEs pervade FACETS components
• Initial focus: Fully implicit Newton methods in
– Core (via new core solver, Tech-X)
– Edge (via UEDGE and BOUT, LLNL)
• Discussion emphasizes
– PDE representation of physics
– Parallelization and performance analysis
– Stability and accuracy issues in coupling
– Collaborations with SciDAC CETs and Institutes
• Future work
– Core-edge coupling as we move to implicit coupling
– Possibly kinetic models in edge physics via Edge Simulation Laboratory (ESL)
– Possibly wall and sources components
TOPS provides enabling technology to FACETS; FACETS motivates enhancements to TOPS
TOPS Overview
– TOPS develops, demonstrates, and disseminates robust, quality-engineered solver software for high-performance computers
– TOPS institutions: ANL, LBNL, LLNL, SNL, Columbia U, Southern Methodist U, U of California - Berkeley, U of Colorado - Boulder, U of Texas - Austin
Towards Optimal Petascale Simulations
PI: David Keyes, Columbia Univ.
www.scidac.gov/math/TOPS.html
[Diagram: TOPS shown at the intersection of Applications, Math, and CS, linked to the FACETS fusion application]
Overall scope of TOPS
• Design and implementation of “solvers”
– Linear solvers: Ax = b
– Eigensolvers: Ax = λBx
– Nonlinear solvers (with sensitivity analysis): F(x, p) = 0
– Time integrators (with sensitivity analysis): f(ẋ, x, t, p) = 0
– Optimizers: min over u of φ(x, u) s.t. F(x, u) = 0, u ≥ 0
• Software integration
• Performance optimization
[Diagram: primary emphasis of TOPS numerical software, with dependences indicated — Linear solver (hypre, PETSc, SuperLU, Trilinos), Eigensolver (PARPACK, SuperLU, Trilinos), Nonlinear solver (PETSc, Trilinos), Time integrator (SUNDIALS, Trilinos), Sensitivity analyzer (PETSc, SUNDIALS, Trilinos), Optimizer (TAO, Trilinos)]
Nonlinear PDEs in Core and Edge Components

Core: 1D conservation laws:
∂q/∂t + ∇·F = s
where q = {plasma density, electron energy density, ion energy density}
F = fluxes, including neoclassical diffusion, electron/ion temperature gradient induced turbulence, etc.
s = particle and heating sources and sinks
Challenges: highly nonlinear fluxes

Edge: 2D conservation laws: continuity, momentum, and thermal energy equations for electrons and ions:
∂n_{e,i}/∂t + ∇·(n_{e,i} v_{e,i}) = S^p_{e,i}, where n_{e,i} and v_{e,i} are electron and ion densities and mean velocities
m_{e,i} n_{e,i} ∂v_{e,i}/∂t + m_{e,i} n_{e,i} (v_{e,i}·∇)v_{e,i} = −∇p_{e,i} + q n_{e,i}(E + v_{e,i}×B/c) − ∇·Π_{e,i} + R_{e,i} + S^m_{e,i}
where m_{e,i}, p_{e,i}, T_{e,i} are masses, pressures, temperatures; q, E, B are particle charge, electric & magnetic fields; Π_{e,i}, R_{e,i}, S^m_{e,i} are viscous tensors, thermal forces, and sources
(3/2) n ∂T_{e,i}/∂t + (3/2) n v_{e,i}·∇T_{e,i} = −p_{e,i} ∇·v_{e,i} − ∇·q_{e,i} − Π_{e,i}:∇v_{e,i} + Q_{e,i}
where q_{e,i}, Q_{e,i} are heat fluxes & volume heating terms
Also a neutral gas equation
Challenges: extremely anisotropic transport, extremely strong nonlinearities, large range of spatial and temporal scales

The dominant computation of each can be expressed as a nonlinear PDE: solve F(u) = 0, where u represents the fully coupled vector of unknowns.
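To make the F(u) = 0 viewpoint concrete, here is a minimal sketch (not the FACETS core solver) of what a backward-Euler, finite-volume residual for the 1D core system ∂q/∂t + ∇·F = s could look like; `flux` and `source` are hypothetical stand-ins for the neoclassical/turbulent flux and source modules.

```python
import numpy as np

def core_residual(q_new, q_old, dt, dx, flux, source):
    """Backward-Euler finite-volume residual F(q_new) = 0 for dq/dt + dF/dx = s.

    q_new, q_old : cell-averaged unknowns, shape (n_cells,)
    flux(q)      : placeholder flux model returning face fluxes, shape (n_cells + 1,)
    source(q)    : placeholder source model returning cell sources, shape (n_cells,)
    """
    F = flux(q_new)                        # highly nonlinear in the real problem
    div_F = (F[1:] - F[:-1]) / dx          # discrete divergence in each cell
    return (q_new - q_old) / dt + div_F - source(q_new)

# Example placeholder flux: simple diffusion -D dq/dx with zero-flux boundaries
def diffusive_flux(q, D=1.0, dx=0.1):
    F = np.zeros(q.size + 1)
    F[1:-1] = -D * (q[1:] - q[:-1]) / dx
    return F
```

Handing such a residual (with all variables coupled) to a Newton solver is exactly the fully implicit strategy discussed on the following slides.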
FACETS/TOPS collaboration focuses on nonlinearly implicit methods
• Popular nonlinear solution approaches
– Explicit Methods
• Splitting of coupled variables
– Often by equation or by coordinate direction
– Motivated by desire to solve complicated problems with limited computer resources
– Semi-Implicit Methods
• Maintain some variable couplings
– Fully Implicit Methods
• Maintain all variable couplings
• For example, preconditioned Newton-Krylov methods
• Implicit algorithms have demonstrated efficient and scalable solution for many magnetic fusion energy problems
Newton-Krylov methods are efficient and robust
• Newton: Solve F′(u^(l−1)) δu^l = −F(u^(l−1)); update u^l = u^(l−1) + δu^l
• Krylov: Projection methods for solving linear systems, Ax = b, using the Krylov subspace
K_j = span(r_0, Ar_0, A²r_0, …, A^(j−1) r_0)
– Popular methods: GMRES, TFQMR, BiCGStab, CG, etc.
• Preconditioning: In practice, typically needed
– Transform Ax = b into an equivalent form: B⁻¹Ax = B⁻¹b or (AB⁻¹)(Bx) = b, where the inverse action of B approximates that of A, but at a smaller cost
• Matrix-free: Newton-like convergence without the cost of computing/storing the true Jacobian, F′(u)
– Krylov: Compute only Jacobian-vector products, F′(u)·v
– Preconditioning: Typically use a ‘cheaper’ approximation to F′(u) or its inverse action
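The matrix-free idea can be illustrated in a few lines of NumPy/SciPy (a sketch under simplifying assumptions, not the PETSc implementation used in FACETS): the Jacobian is never formed, each product F′(u)·v is approximated by a finite difference of F, and the resulting linear operator is handed to GMRES.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, gmres

def newton_krylov_mf(F, u0, tol=1e-8, max_newton=20, fd_eps=1e-7):
    """Matrix-free Newton-GMRES: J*v is approximated by (F(u + eps*v) - F(u)) / eps."""
    u = u0.copy()
    for _ in range(max_newton):
        r = F(u)
        if np.linalg.norm(r) < tol:
            break
        def jv(v, u=u, r=r):
            eps = fd_eps * (1.0 + np.linalg.norm(u)) / max(np.linalg.norm(v), 1e-30)
            return (F(u + eps * v) - F(u)) / eps
        J = LinearOperator((u.size, u.size), matvec=jv)
        du, _ = gmres(J, -r)      # inexact linear solve is typical in practice
        u = u + du                # (a production code would add a line search)
    return u

# Example: solve u_i**3 - 2 = 0 componentwise
# u = newton_krylov_mf(lambda u: u**3 - 2.0, np.ones(50))
```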
PETSc provides parallel Newton-Krylov solvers via SNES
• PETSc: Portable, Extensible Toolkit for Scientific Computation
– www.mcs.anl.gov/petsc
– Targets parallel solution of large-scale PDE-based problems
• SNES: Scalable Nonlinear Equations Solvers
– Emphasizes Newton-Krylov methods
– Uses high-level abstractions for matrices, vectors, linear solvers
• Easy to customize and extend
• Supports matrix-free methods
• Facilitates algorithmic experimentation
– Jacobians available via application, finite differences (FD), and automatic differentiation (AD)
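As a minimal illustration of the SNES calling pattern, here is a sketch via petsc4py (PETSc's Python binding); the toy residual and the specific parameter choices are assumptions for illustration, not taken from the slides. One registers a residual callback, optionally turns on matrix-free Jacobian-vector products, and lets runtime options control the rest.

```python
from petsc4py import PETSc

def form_function(snes, x, f):
    """Residual F(u) = 0 for a toy problem: u_i**3 - 2 = 0."""
    u = x.getArray()          # view of the current iterate
    r = f.getArray()          # writable view of the residual vector
    r[:] = u**3 - 2.0

n = 100
x = PETSc.Vec().createSeq(n)          # solution vector
f = x.duplicate()                     # residual work vector

snes = PETSc.SNES().create(PETSc.COMM_SELF)
snes.setFunction(form_function, f)
snes.setUseMF(True)                   # matrix-free Jacobian-vector products via FD
snes.getKSP().setType('gmres')        # Krylov method for each Newton step
snes.getKSP().getPC().setType('none') # no preconditioner for this toy problem
snes.setFromOptions()                 # allow runtime overrides, e.g. -snes_monitor

x.set(1.0)                            # initial guess
snes.solve(None, x)
```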
Core and Edge components use PETSc flexibility via SNES
Solve F(u) = 0: Fully implicit matrix-free Newton-Krylov methods
F′(u^(l−1)) δu^l = −F(u^(l−1)),  u^l = u^(l−1) + δu^l
UEDGE + Core Solver Drivers (+ Timestepping + Parallel Partitioning)
– Can choose from among a variety of algorithms and parallel data structures (see the runtime-options sketch below)
– UEDGE now has access to many more parallel solver options
[Diagram: application code (initialization, function evaluation, Jacobian evaluation, post-processing) driving PETSc code — Nonlinear Solvers (SNES); Krylov Solvers (GMRES, TFQMR, BCGS, CGS, BCG, others); Preconditioners (SSOR, ILU, Block Jacobi, ASM, Multigrid, others); Matrices (AIJ, Block AIJ, Diagonal, Dense, Matrix-free, others), sequential and parallel; Vectors. Options originally used by UEDGE are highlighted; the Jacobian comes from the application or from PETSc (finite differencing).]
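For instance (an illustrative sketch, not FACETS input or source code), the algorithm choices in the diagram can be switched at runtime through PETSc's options database; the option names below are standard PETSc options, and in practice they are more often given on the command line (e.g., -ksp_type tfqmr -pc_type asm).

```python
from petsc4py import PETSc

# Select among Krylov methods and preconditioners without touching application code.
opts = PETSc.Options()
opts.setValue('ksp_type', 'tfqmr')    # or 'gmres', 'bcgs', 'cgs', ...
opts.setValue('pc_type', 'asm')       # or 'bjacobi', 'ilu', 'ssor', 'mg', ...
opts.setValue('sub_pc_type', 'ilu')   # subdomain solver used inside bjacobi/asm

# Any SNES/KSP configured afterwards with setFromOptions() picks these up;
# flag options such as -snes_mf or -snes_monitor work the same way.
```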
Challenges in nonlinear solvers for core
• Plasma core is the region well inside the separatrix
• Transport along field lines >> perpendicular transport, leading to homogenization in poloidal direction
• Core satisfies set of 1D conservation laws: ∂q/∂t + ∇·F = s
q = {plasma density, electron energy density, ion energy density}
F = highly nonlinear fluxes including neoclassical diffusion, electron/ion temperature gradient induced turbulence, etc.
s = particle and heating sources and sinks
– New FACETS capability: get s from NUBEAM via core/sources coupling
[Figure: hot plasma core inside the separatrix]
Implicit core solver applies nested iteration with parallel flux computation
• Extremely nonlinear fluxes lead to stiff profiles (can be numerically challenging)
– Implicit time stepping for stability
– Coarse-grain solution easier to find
– Nested iteration used to obtain fine-grain solution (see the sketch below)
– Flux computation typically very expensive, but problem dimension relatively small
– Parallelization of flux computation across “workers”; the “manager” solves the nonlinear equations on 1 proc using PETSc/SNES
• Runtime flexibility in assembly of the time integrator (including any diagonally implicit Runge-Kutta scheme) for improved accuracy
[Figure: manager/worker layout of the nonlinear solve with parallel flux computation]
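A toy version of the nested-iteration (grid-sequencing) idea, under my own simplifying assumptions, with scipy.optimize.fsolve standing in for the PETSc/SNES solve: the nonlinear system is solved on a coarse radial grid, the result is interpolated to the next finer grid as the initial guess, and the process repeats. `make_residual(n)` is a hypothetical factory for the discretized equations on an n-point grid.

```python
import numpy as np
from scipy.optimize import fsolve   # stand-in for the PETSc/SNES Newton solve

def nested_iteration(make_residual, n_coarse=16, n_levels=4):
    """Coarse-to-fine continuation: each level's solution seeds the next level."""
    n = n_coarse
    grid = np.linspace(0.0, 1.0, n)
    u = np.ones(n)                                # crude guess on the coarsest grid
    for level in range(n_levels):
        u = fsolve(make_residual(n), u)           # nonlinear solve on current grid
        if level < n_levels - 1:
            fine_grid = np.linspace(0.0, 1.0, 2 * n)
            u = np.interp(fine_grid, grid, u)     # interpolate as next initial guess
            grid, n = fine_grid, 2 * n
    return grid, u

# Hypothetical residual factory: steady nonlinear diffusion u'' = u**3 - 1 on [0, 1]
def make_residual(n):
    dx = 1.0 / (n - 1)
    def F(u):
        r = np.empty_like(u)
        r[0], r[-1] = u[0] - 2.0, u[-1] - 1.0     # Dirichlet boundary values
        r[1:-1] = (u[2:] - 2 * u[1:-1] + u[:-2]) / dx**2 - (u[1:-1]**3 - 1.0)
        return r
    return F
```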
Flexibility of FACETS framework allows users to explore time stepping schemes with no change to source code
• Explicit method is unstable
• Crank-Nicolson is marginally stable
• Use BDF1 for stability and accuracy (a toy comparison is sketched below)
• Other schemes, e.g., various IMEX SSP schemes, can be coded at runtime
• Nested iteration improves robustness of the nonlinear solver
[Figure: ion temperature vs. radial coordinate; the explicit scheme is unstable and develops a sharp kink; the case is stable to ETG modes]
Ref: A. Pletzer et al., “Benchmarking the parallel FACETS core solver,” poster presented at the 50th Annual Meeting of the Division of Plasma Physics, Dallas, TX, November 17-21, 2008.
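The stability ordering reported here can be reproduced on a toy stiff diffusion problem (my own sketch, not the FACETS transport model): with a time step far above the explicit stability limit, forward Euler blows up, while both implicit schemes stay bounded and BDF1 (backward Euler) damps the stiff modes much more strongly than the marginally stable Crank-Nicolson scheme.

```python
import numpy as np

n, D, nsteps, dt = 64, 1.0, 50, 1e-2
dx = 1.0 / (n + 1)                      # interior nodes, homogeneous Dirichlet BCs
A = (np.diag(-2.0 * np.ones(n)) + np.diag(np.ones(n - 1), 1)
     + np.diag(np.ones(n - 1), -1)) * D / dx**2
I = np.eye(n)
rng = np.random.default_rng(0)
x = np.linspace(dx, 1.0 - dx, n)
u0 = np.sin(np.pi * x) + 0.5 * rng.random(n)     # smooth profile plus stiff noise

steppers = {
    'explicit (forward Euler)': lambda u: u + dt * (A @ u),
    'Crank-Nicolson':           lambda u: np.linalg.solve(I - 0.5 * dt * A,
                                                          u + 0.5 * dt * (A @ u)),
    'BDF1 (backward Euler)':    lambda u: np.linalg.solve(I - dt * A, u),
}
for name, step in steppers.items():
    u = u0.copy()
    for _ in range(nsteps):
        u = step(u)
    print(f'{name:26s} max|u| after {nsteps} steps: {np.max(np.abs(u)):.3e}')
```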
Participation of Paratools identified performance bottleneck in core solver
• Load imbalance responsible for lack of scalability at high processor count (128)
• Also, careful profiling identifies redundant flux computation at low processor count (8)
Paratools (A. Malony et al.), affiliated with the SciDAC Performance Engineering Research Institute (PERI)
Challenges in nonlinear solvers for edge
•UEDGE Features
– Multispecies plasma; variables ni,e, u||i,e, Ti,e for
particle density, parallel momentum, and energy
balances
– Reduced Navier-Stokes or Monte Carlo neutrals
– Multi-step ionization and recombination
– Finite volume discretization; non-orthogonal mesh
– Steady-state or time dependent
•UEDGE Issues
– Strong nonlinearities
– Parallel Jacobian
computations
14
More complete parallel Jacobian data enables robust solution for problems with strong nonlinearities
• New capability: Computing parallel Jacobian matrix using matrix coloring for finite differences (a serial sketch of the coloring idea follows below)
– More complete parallel Jacobian data enables more robust parallel preconditioners
• Impact
– Enables inclusion of neutral gas equation (difficult for highly anisotropic mesh, not possible in prior parallel UEDGE approach)
– Useful for cross-field drift cases
[Figure: UEDGE parallel partitioning (poloidal distance) and Jacobian sparsity for 5 equations (ion density, ion velocity, gas density diffusion, electron temperature, ion temperature); the previous parallel UEDGE Jacobian (block Jacobi only) had missing Jacobian elements, while recent progress provides complete parallel Jacobian data]
8 proc: matrix-free Newton with GMRES: block Jacobi stagnates; complete Jacobian data enables convergence
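A serial sketch of the coloring idea behind sparse finite-difference Jacobian estimation (illustrative only; the FACETS/UEDGE work uses PETSc's parallel coloring infrastructure): structurally orthogonal columns, i.e. columns that share no rows in the sparsity pattern, are perturbed together, so the number of residual evaluations scales with the number of colors rather than the number of unknowns. For a tridiagonal pattern this means 3 residual evaluations instead of n.

```python
import numpy as np

def greedy_coloring(sparsity):
    """Greedy column coloring: columns that touch a common row get different colors.
    sparsity is a boolean (n x n) structural Jacobian pattern."""
    n = sparsity.shape[1]
    colors = -np.ones(n, dtype=int)
    for j in range(n):
        conflict = {colors[k] for k in range(j)
                    if np.any(sparsity[:, j] & sparsity[:, k])}
        c = 0
        while c in conflict:
            c += 1
        colors[j] = c
    return colors

def fd_jacobian_colored(F, u, sparsity, eps=1e-7):
    """Estimate J = dF/du using one residual evaluation per color."""
    n = u.size
    colors = greedy_coloring(sparsity)
    J = np.zeros((n, n))
    F0 = F(u)
    for c in range(colors.max() + 1):
        cols = np.where(colors == c)[0]
        du = np.zeros(n)
        du[cols] = eps
        dF = (F(u + du) - F0) / eps
        for j in cols:                        # rows hit by column j get its entries
            rows = np.where(sparsity[:, j])[0]
            J[rows, j] = dF[rows]
    return J
```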
Computational experiments explore efficient and robust edge solvers
Matrix-free Newton with GMRES, 8-proc case for LU preconditioner:
• 57% of time: UEDGE parallel setup (17 sec)
• 43% of time: parallel nonlinear solver (13 sec)
– 8%: Create Jacobian data structure, determine parallel coloring, scaling
– 6%: Compute Jacobian: FD approximation via coloring, including 40 f(u) computations
– 4%: Compute f(u) for RHS + line search
– 25%: Linearized Newton solve: GMRES/LU via MUMPS (hold Jacobian/PC fixed for 5 Newton iterations)
Problem size: 24,576 unknowns (128x64 mesh with 3 unknowns per mesh point)
Computational environment: Jazz @ ANL: Myrinet2000 network, 2.4 GHz Pentium Xeon processors with 1-2 GB of RAM
New work with BOUT uses both SUNDIALS (integrators) and PETSc (preconditioners)
BOUT (BOUndary Turbulence), LLNL
• Motivation and physics
– Radial transport driven by plasma turbulence; BOUT (C++) to provide fundamental edge model
• 2D UEDGE approximates turbulent diffusion
• 3D BOUT models turbulence in detail
– Ion and electron fluids; electromagnetic
– Full tokamak edge cross-section
• Numerics and tools
– Finite-difference; 2D parallel partitioning
– Time dependent; implicit PVODE/CVODE
– Can couple turbulent fluxes to UEDGE
• Current status within FACETS
– Parallel BOUT/PETSc/SUNDIALS verified against original BOUT
– Transitioning to BOUT++
– Experimenting with preconditioners
[Figure: BOUT edge density turbulence, δn_i/n_i]
Preliminary investigation of model problems reveals stability issues arising from coupling
Simple model problem with explicit coupling:
• Implicit Euler for each component solve
• “Nonoverlapping” coupling strategy
• 512 cells in each component
Weak instability:
• There is a weak instability for equal diffusion constants (a toy version of this kind of coupled test is sketched below)
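The sketch below sets up a toy version of this kind of test: two 1D heat equations, backward (implicit) Euler inside each component, and an explicitly lagged interface exchange. The particular interface treatment used here (each side takes a simple average of the two interface-adjacent values from the previous step as a Dirichlet condition) is my own assumption for illustration; it is not necessarily the “nonoverlapping” strategy analyzed by the CSU team, and no stability conclusion should be read off from it.

```python
import numpy as np

n, D1, D2 = 512, 1.0, 1.0            # cells per component, equal diffusivities
dx, dt, nsteps = 1.0 / n, 1e-3, 1000

def be_matrix(n, dt, dx, D):
    """Backward-Euler matrix (I - dt*D*Laplacian) on interior nodes, Dirichlet ends."""
    lap = (np.diag(-2.0 * np.ones(n)) + np.diag(np.ones(n - 1), 1)
           + np.diag(np.ones(n - 1), -1)) / dx**2
    return np.eye(n) - dt * D * lap

A1, A2 = be_matrix(n, dt, dx, D1), be_matrix(n, dt, dx, D2)
u1, u2 = np.ones(n), np.zeros(n)     # initial temperature jump across the interface

for _ in range(nsteps):
    # Explicit coupling: each component sees a *lagged* interface value as a
    # Dirichlet boundary condition, then takes one implicit Euler step inside.
    g = 0.5 * (u1[-1] + u2[0])                       # lagged interface value (assumed form)
    b1 = u1.copy(); b1[-1] += dt * D1 / dx**2 * g    # interface BC for component 1
    b2 = u2.copy(); b2[0]  += dt * D2 / dx**2 * g    # interface BC for component 2
    u1, u2 = np.linalg.solve(A1, b1), np.linalg.solve(A2, b2)

print('interface values:', u1[-1], u2[0])            # monitor for drift/oscillation
```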
Numerical analysis tasks over the next year
• Devise and analyze a sequence of model problems
– The model problems should have increasing complexity
• Two coupled heat equations in one dimension with various coupling strategies
• Coupled one-dimension – two-dimension heat equations
• Add strong inhomogeneous behavior “parallel” to the interface boundary
• Add complications: rapid changes in diffusion in the interior, nonlinear diffusion, multirate time integration
– Conduct numerical studies using “manufactured” solutions with realistic behavior for various coupling strategies (see the sketch below)
– Carry out rigorous stability analysis for various coupling strategies and general solutions
– Carry out analogous tests for FACETS codes
• Extend a posteriori analysis techniques to finite volume methods for coupled problems
– Apply to nonlinear problems with realistic discretizations by computing stability
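As a reminder of how a “manufactured” solution is built (a generic sketch, not one of the CSU model problems): pick an exact solution, substitute it into the PDE to derive a forcing term symbolically, add that forcing to the code under test, and check that the computed solution converges to the chosen exact solution at the expected rate.

```python
import sympy as sp

x, t, D = sp.symbols('x t D')
u_exact = sp.exp(-t) * sp.sin(sp.pi * x)              # chosen manufactured solution
# Forcing that makes u_exact satisfy u_t = D*u_xx + f exactly:
f = sp.diff(u_exact, t) - D * sp.diff(u_exact, x, 2)
print(sp.simplify(f))                                 # e.g. (D*pi**2 - 1)*exp(-t)*sin(pi*x)
```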
FACETS motivates new PETSc capabilities that benefit the general community
• New features included in Dec 2008 release of PETSc-3.0
– SNES: limit Newton updates based on application-defined criteria for maximum allowable step
• Needed by UEDGE
– MatFD: parallel interface to matrix coloring for sparse finite difference Jacobian estimation
• Needed by UEDGE
• New research: FACETS core-edge coupling inspires support for strong coupling between models in nonlinear solvers
– Multi-model algebraic system specification
– Multi-model algebraic system solution
FACETS/TOPS work inspires new research for SciDAC CS/math teams
• General Challenge: How to make sound choices during runtime among available implementations and parameters, suitably compromising among
– accuracy, performance, algorithmic robustness, etc.
• FACETS Challenge: How to select and parameterize preconditioned Newton-Krylov algorithms at runtime based on problem instance and computational environment?
• Research in Computational Quality of Service (CQoS)
– Goal: Develop general-purpose infrastructure for dynamic component adaptivity, i.e., composing, substituting, and reconfiguring running component applications in response to changing conditions
– Collaboration among SciDAC math/CS teams
• Center for Technology for Advanced Scientific Component Software (TASCS), Paratools, Performance Engineering Research Institute (PERI), and TOPS
– FACETS-specific capabilities can leverage this infrastructure
FACETS collaborations on ‘solvers’ with SciDAC math/CS teams & CSU are essential
• TOPS, CSU, Paratools, PERI, and TASCS provide enabling technology to FACETS
– TOPS: Parallel solvers via numerical libraries
– CSU: Insights to stability/accuracy in coupling
– PERI/Paratools: Performance analysis and tuning
– TASCS: Component technology (ref: T. Epperly)
• FACETS motivates new work by CSU, TOPS, Paratools, PERI, and TASCS
– New CSU research on stability & accuracy issues
– New TOPS library features + algorithmic research
– New capabilities in TASCS/PERI/Paratools for CQoS