
Progress in Unstructured Mesh
Techniques
Dimitri J. Mavriplis
Department of Mechanical Engineering
University of Wyoming
and
Scientific Simulations
Laramie, WY
Overview
• NSU3D Unstructured Multigrid Navier-Stokes Solver
– 2nd order finite-volume discretization
– Fast steady state solutions
• (~100M pts in 15 minutes on the NASA Columbia Supercomputer)
– Extension to Design Optimization
– Extension to Aeroelasticity
• Enabling techniques: Accuracy and Efficiency
• High-Order Discontinuous Galerkin Methods (Longer term)
– High accuracy discretizations through increased p order
– Fast combined h-p multigrid solver
– Steady-State (2-D and 3-D Euler)
– Unsteady Time-Implicit (2-D Euler)
NSU3D Discretization
• Vertex based unstructured meshes
– Finite volume / finite element
• Arbitrary Elements
– Single edge-based data structure
• Central Difference with matrix dissipation
• Roe solver with MUSCL reconstruction
NSU3D Spatial Discretization
• Mixed Element Meshes
– Tetrahedra, Prisms, Pyramids, Hexahedra
• Control Volume Based on Median Duals
– Fluxes based on edges
– Single edge-based data-structure represents all element
types
Mixed-Element Discretizations
• Edge-based data structure
– Building block for all element types
– Reduces memory requirements
– Minimizes indirect addressing / gather-scatter
– Graph of grid = Discretization stencil
• Implications for solvers, Partitioners
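To make the edge-based structure concrete, here is a minimal sketch of a flux accumulation loop over edges. This is not the NSU3D code; the array names and the `flux_fn` callback are illustrative assumptions. The point is that a single loop over the edge graph serves every element type.

```python
import numpy as np

def accumulate_residual(edges, normals, u, flux_fn):
    """Edge-based residual accumulation (gather-scatter sketch).

    edges   : (nedge, 2) int array, the two vertices of each edge
    normals : (nedge, 3) array, dual-face normal associated with each edge
    u       : (nvert, nvar) array of conserved variables at vertices
    flux_fn : numerical flux (e.g. a Roe-type solver), flux_fn(uL, uR, n) -> (nvar,)
    """
    nvert, nvar = u.shape
    res = np.zeros((nvert, nvar))
    for e, (i, j) in enumerate(edges):
        f = flux_fn(u[i], u[j], normals[e])  # one flux evaluation per edge
        res[i] += f                          # scatter to both endpoints
        res[j] -= f                          # with opposite signs
    return res
```

Because only the edge graph and the dual-face normals appear, mixed tetrahedra/prisms/pyramids/hexahedra meshes require no element-specific code in this loop.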
NSU3D Convergence Acceleration
Methods for Steady-State (and
Unsteady) Problems
• Multigrid Methods
– Fully automated agglomeration techniques
– Provides convergence rates independent of grid
size (usually < 500 MG Cycles)
• Implicit Line Solver
– Used on each MG Level
– Reduces stiffness due to grid anisotropy in the
boundary layer
• No wall functions
Multigrid Methods
• High-frequency (local) error rapidly reduced by explicit
methods
• Low-frequency (global) error converges slowly
• On coarser grid:
– Low-frequency viewed as high frequency
Multigrid Correction Scheme
(Linear Problems)
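The correction scheme on this slide can be stated in a few lines for a linear problem. Below is a minimal two-level sketch under stated assumptions: generic restriction R, prolongation P, and a user-supplied smoother; it is not the agglomeration implementation itself, and the coarse problem is solved directly for clarity.

```python
import numpy as np

def two_grid_cycle(A_f, A_c, R, P, b, u, smooth, nu1=2, nu2=1):
    """One multigrid correction-scheme cycle for the linear system A_f u = b.
    R restricts fine-level residuals, P prolongates coarse-level corrections,
    'smooth' is any relaxation (e.g. Jacobi). Helper names are hypothetical."""
    for _ in range(nu1):
        u = smooth(A_f, b, u)            # pre-smoothing: damps high-frequency error
    r_c = R @ (b - A_f @ u)              # restrict residual to the coarse level
    e_c = np.linalg.solve(A_c, r_c)      # coarse-level error equation A_c e = r
    u = u + P @ e_c                      # prolongate and apply the correction
    for _ in range(nu2):
        u = smooth(A_f, b, u)            # post-smoothing
    return u
```

Replacing the direct coarse solve with a recursive call to the same routine gives the usual multilevel V-cycle.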
Coarse Level Construction
• Agglomeration Multigrid solvers for unstructured meshes
– Coarse level meshes constructed by agglomerating fine grid
cells/equations
Anisotropy Induced Stiffness
• Convergence rates for RANS (viscous)
problems much slower than for inviscid
flows
– Mainly due to grid stretching
– Thin boundary and wake regions
– Mixed element (prism-tet) grids
• Use directional solver to relieve stiffness
– Line solver in anisotropic regions
Directional Solver for Navier-Stokes Problems
• Line Solvers for Anisotropic Problems
– Lines Constructed in Mesh using weighted graph algorithm
– Strong Connections Assigned Large Graph Weight
– (Block) Tridiagonal Line Solver similar to structured grids
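For reference, the tridiagonal solve along one line is the classical Thomas algorithm. The sketch below is the scalar version for clarity; the actual line solver works with blocks (one block per mesh point along the line), but the forward-elimination / back-substitution structure is the same.

```python
import numpy as np

def line_solve_tridiagonal(a, b, c, d):
    """Thomas algorithm for one implicit line: sub-diagonal a, diagonal b,
    super-diagonal c, right-hand side d (a[0] and c[-1] are unused)."""
    n = len(d)
    cp, dp = np.empty(n), np.empty(n)
    cp[0], dp[0] = c[0] / b[0], d[0] / b[0]
    for i in range(1, n):                      # forward elimination
        m = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / m
        dp[i] = (d[i] - a[i] * dp[i - 1]) / m
    x = np.empty(n)
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):             # back substitution
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x
```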
Multigrid Line Solver Convergence
• DLR-F4 wing-body, Mach=0.75, incidence = 1°, Re=3M
– Baseline Mesh: 1.65M pts
Parallelization through Domain
Decomposition
• Intersected edges resolved by ghost vertices
• Generates communication between original and ghost vertex
– Handled using MPI and/or OpenMP (Hybrid implementation)
– Local reordering within partition for cache-locality
• Multigrid levels partitioned independently
– Match levels using greedy algorithm
– Optimize intra-grid communication vs inter-grid communication
Partitioning
• (Block) Tridiagonal Lines solver inherently sequential
• Contract graph along implicit lines
• Weight edges and vertices
• Partition contracted graph
• Decontract graph
– Guaranteed lines never broken
– Possible small increase in imbalance/cut edges
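A minimal sketch of the contraction/decontraction step described above (the partitioner itself, e.g. METIS, is assumed external; names here are illustrative):

```python
def contract_lines(num_vertices, lines):
    """Contract each implicit line into one weighted super-vertex so a
    standard graph partitioner cannot break it."""
    v2super, weights = {}, []
    for line in lines:                       # one super-vertex per line
        s = len(weights)
        weights.append(len(line))            # vertex weight = line length
        for v in line:
            v2super[v] = s
    for v in range(num_vertices):            # points not on any line map 1:1
        if v not in v2super:
            v2super[v] = len(weights)
            weights.append(1)
    return v2super, weights

def decontract(v2super, super_part):
    """Give each original vertex the partition of its super-vertex,
    guaranteeing lines are never split across processors."""
    return {v: super_part[s] for v, s in v2super.items()}
```

The vertex weights let the partitioner balance work even though a contracted line counts as a single graph node, at the cost of a possible small increase in imbalance or cut edges.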
NASA Columbia Supercluster
• 20 SGI Altix Nodes
– 512 Itanium2 cpus each
– 1 TByte memory each
– 1.5 GHz / 1.6 GHz
– Total 10,240 cpus
• 3 Interconnects
– SGI NUMAlink (shared
memory in node)
– Infiniband (across nodes)
– 10Gig Ethernet (File I/O)
• Subsystems:
– 8 Nodes: Double density Altix
3700BX2
– 4 Nodes: NUMAlink4
interconnect between nodes
• BX2 Nodes, 1.6GHz cpus
NSU3D TEST CASE
• Wing-Body Configuration
• 72 million grid points
• Transonic Flow
• Mach=0.75, Incidence = 0 degrees, Reynolds number = 3,000,000
NSU3D Scalability
• 72M pt grid
– Assume perfect speedup
on 128 cpus
• Good scalability up to
2008 using NUMAlink
– Superlinear !
• Multigrid slowdown due
to coarse grid
communication
• ~3TFlops on 2008 cpus
Single Grid Performance up to 4016 cpus
First real world
application on Columbia
using > 2048 cpus
• 1 OMP possible for IB on 2008 (8 hosts)
• 2 OMP required for IB on 4016 (8 hosts)
• Good scalability up to 4016
• 5.2 TFlops at 4016
Unstructured NS Solver/NASA
Columbia Supercomputer
• ~100M pt solutions in 15 minutes
• 10^9 pt solutions can become routine
– Ease other bottlenecks (I/O for 10^9 pts = 400 GB)
• High resolution MDO
• High resolution Aeroelasticity
Enabling Techniques
• Design Optimization
– Robust Mesh Deformation (Linear elasticity)
– Discrete Adjoint for Flow equations
– Discrete Adjoint for Mesh Motion Equations
• Mesh sensitivities (Park and Nielsen)
– Line-Implicit Agglomeration Multigrid Solver
• Flow, flow adjoint, mesh motion, mesh adjoint
– Duality preserving formulation
• Adjoint discretization requires almost no additional memory over
first order-Jacobian used for implicit solver
• Modular (subroutine) construction for adjoint and mesh
sensitivities
– dR/dx = dR/d(edge) · d(edge)/dx
• Similar convergence rates for tangent and adjoint problems
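Spelled out, the modular sensitivity assembly in the bullets above corresponds to the standard discrete-adjoint identities below; the objective L and adjoint variable λ are notation added here, not taken from the slides.

```latex
\left(\frac{\partial R}{\partial U}\right)^{\!T}\!\lambda
  = \left(\frac{\partial L}{\partial U}\right)^{\!T},
\qquad
\frac{dL}{dx} = \frac{\partial L}{\partial x} - \lambda^{T}\,\frac{dR}{dx},
\qquad
\frac{dR}{dx} = \frac{dR}{d(\mathrm{edge})}\cdot\frac{d(\mathrm{edge})}{dx}
```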
Enabling Techniques
• Aeroelasticity
– Robust Mesh Deformation (Linear elasticity)
– Line-Implicit Agglomeration Multigrid Solver
• Flow (implicit time step), mesh motion
• Linear multigrid formulation
– High-order temporal discretization
• Backwards Difference (up to 3rd order)
• Implicit Runge-Kutta (up to fourth order)
• Formulation of Geometric Conservation Law for high-order time-stepping
– Necessary for non-linear stability
Mesh Motion
• Developed for MDO and Aeroelasticity Problems
• Emphasis on Robustness
– Spring Analogy
– Truss Analogy, Beam Analogy
– Linear Elasticity: Variable Modulus
• Emphasis on Efficiency
– Edge based formulation
– Gauss-Seidel Line Solver with Agglomeration
Multigrid
– Fully integrated into flow solver
Formulation
• Mesh motion strategies
– Tension spring analogy
$$\sum_j k_{ij}\left((x_j)_m - (x_i)_m\right) = 0
\;\;\Rightarrow\;\;
(x_i)_m = \frac{\sum_j k_{ij}\,(x_j)_m}{\sum_j k_{ij}},
\qquad m = 1,2,3 \quad\text{and}\quad k_{ij} = \frac{1}{L_{ij}^{\,p}}$$
Laplace equation, maximum principle,
incapable of reproducing solid body rotation
– Truss analogy (Farhat et al, 1998)
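A minimal sketch of the tension spring analogy above, iterated with Jacobi sweeps: each interior vertex moves to the k-weighted average of its neighbours. Stiffnesses are recomputed from the current coordinates each sweep and boundary displacements are held fixed; this is illustrative, not the production scheme.

```python
import numpy as np

def spring_smooth(x, edges, bnd, p=2, iters=200):
    """x: (nvert, dim) coordinates; edges: (nedge, 2) vertex pairs;
    bnd: boolean mask of vertices whose motion is prescribed."""
    for _ in range(iters):
        num = np.zeros_like(x)
        den = np.zeros(len(x))
        for i, j in edges:
            k = 1.0 / np.linalg.norm(x[j] - x[i]) ** p   # spring stiffness k_ij
            num[i] += k * x[j]; den[i] += k
            num[j] += k * x[i]; den[j] += k
        interior = ~bnd
        x[interior] = num[interior] / den[interior][:, None]
    return x
```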
Formulation
• Linear Elasticity Equations
$$\frac{\partial \sigma_{ij}}{\partial x_j} = f,
\qquad \{\sigma\} = [D]\{\varepsilon\},
\qquad \{\varepsilon\} = [A]\{U\}$$
Applying a standard Galerkin method:
$$[K]\{U\} = \{F\},
\qquad\text{where}\qquad
[K] = \int_{\Omega} [AN]^{T}\,[D]\,[AN]\; dV$$
• Prescription of E very important
– Reproduces solid body translation/rotation for stiff E regions
– Prescribe large E in critical regions
– Relegates deformation to less critical regions of mesh
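One common recipe for prescribing E, stated here as an assumption rather than necessarily the one used in NSU3D, is to make the small cells near the body stiff so that deformation is relegated to the large cells far from critical regions:

```python
import numpy as np

def prescribe_modulus(cell_volume, p=1.0):
    """Per-element elastic modulus inversely proportional to cell volume:
    E_e = 1 / V_e**p (hypothetical variable-E recipe)."""
    return 1.0 / np.asarray(cell_volume) ** p
```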
Results and Discussion
• Mesh motion strategies for 2D viscous mesh
[Figure: deformed 2D viscous mesh for four strategies: spring analogy, truss analogy, linear elasticity (LE) with constant E, and LE with variable E]
IMPORTANT DIFFERENCES (FUN3D)
• Navier equations for displacement:
$$\nabla^2 \mathbf{u} + \frac{1}{1-2\nu}\,\nabla\left(\nabla\cdot\mathbf{u}\right) = 0$$
• Derived assuming constant E:
$$\rho\,\frac{\partial^2 u_i}{\partial t^2} = \sigma_{ij,j} + X_i,
\qquad
\sigma_{ij} = \frac{E}{1+\nu}\left(\frac{\nu}{1-2\nu}\,e_{kk}\,\delta_{ij} + e_{ij}\right)$$
$$\sigma_{ij,j} = 0,
\qquad
e_{ij} = \frac{1}{2}\left(u_{i,j} + u_{j,i}\right)
\;\;\Rightarrow\;\;
u_{i,jj} + \frac{1}{1-2\nu}\,u_{j,ji} = 0$$
• Variations only in Poisson ratio
Method of Solution
• Linear Elasticity Equations can be difficult to solve
• Apply same techniques as for flow solver
– Linear agglomeration multigrid (LMG) method
– Line-implicit solver
– Using same line/AMG structures
Method of Solution
• Agglomeration multigrid
Method of Solution
• Line-implicit solver
– Lines placed along directions of strong coupling
Results and Discussion
• Line solver + MG4, first 10 iterations
Viscous mesh, linear elasticity with variable E
3D Dynamic Meshes (NS mesh)
DLR wing-body
configuration,
473,025 vertices
Results and Discussion
• Convergence rates for different iterative methods
2D viscous mesh, linear elasticity
3D viscous mesh, linear elasticity
Unsteady Flow Solver Formulation
• Flow governing equations in
Arbitrary-Lagrangian-Eulerian (ALE) form:
$$\frac{\partial U}{\partial t} + \nabla\cdot\big(F(U) - G(U)\big) = 0$$
$$\frac{\partial}{\partial t}\int_{\Omega(t)} U\,dV
+ \oint_{\partial\Omega(t)} \big(F(U) - \dot{x}\,U\big)\cdot n\,dS
- \oint_{\partial\Omega(t)} G(U)\cdot n\,dS = 0$$
• After discretization (in space):
$$\frac{\partial (VU)}{\partial t}
+ R\big(U,\,\dot{x}(t),\,n(t)\big) - S\big(U,\,n(t)\big) = 0$$
Unsteady Flow Solver Formulation
• Flow governing equations in
Arbitrary-Lagrangian-Eulerian (ALE) form, as above
• GCL: Maintain Uniform Flow Exactly (discrete solution):
$$\frac{\partial V}{\partial t} = R\big(\dot{x}(t),\,n(t)\big)$$
Implicit Runge-Kutta Schemes
• Dahlquist barrier: no complete A- and L-stability
above BDF2
– BDF3 often works…. But…
– For higher order: Implicit Runge Kutta Schemes
• Backwards Difference (BDF2, BDF3) and Implicit
Runge Kutta (up to 4th order in time) previously
compared for unsteady flows with static grids
• For moving grids, must obey Geometric
Conservation Law GCL
– 2nd and 3rd order BDF-GCL relatively straightforward
– How to construct high-order Runge-Kutta GCL
schemes?
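For context, the BDF2 scheme referred to above takes the standard moving-grid form, keeping the cell volume inside the time derivative (textbook form, stated here for reference; viscous and deformation source terms are folded into R for brevity):

```latex
\frac{3\,(VU)^{n+1} - 4\,(VU)^{n} + (VU)^{n-1}}{2\,\Delta t}
+ R\!\left(U^{n+1},\,\dot{x}^{\,n+1},\,n^{\,n+1}\right) = 0
```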
New Approach to GCL
• Use APPROXIMATE face velocities evaluated at
the RK quadrature points to respect the GCL, but
still maintain design accuracy
– i.e. for low-order schemes: dx/dt = (x^{n+1} - x^n)/Δt
– For high-order RK: solve the system at each time
step given by the DGCL:
$$\Delta t
\begin{bmatrix}
\alpha_{11} & 0 & 0 & 0 \\
\alpha_{21} & \alpha_{22} & 0 & 0 \\
\alpha_{31} & \alpha_{32} & \alpha_{33} & 0 \\
\alpha_{41} & \alpha_{42} & \alpha_{43} & \alpha_{44}
\end{bmatrix}
\begin{Bmatrix} \dot{X}^1 \\ \dot{X}^2 \\ \dot{X}^3 \\ \dot{X}^4 \end{Bmatrix}
=
\begin{Bmatrix} V^1 - V^n \\ V^2 - V^n \\ V^3 - V^n \\ V^4 - V^n \end{Bmatrix}$$
(α_ij: implicit Runge-Kutta coefficients; V^k: cell volume at stage k)
2D Example
• Periodic Pitching NACA0012 (exaggerated)
• Mach=0.755, AoA = 0.016° ± 2.51°
• RK accurate with large time steps
2D Pitching Airfoil
• Error measured as RMS difference
in all flow variables between
solution integrated from t=0 to t=54
with reference solution at t=54
– Reference solution: RK64 with 256
time steps/period
• Slope of accuracy curves:
– BDF2: 1.9
– RK64: 3.5
3D Example: Twisting OneraM6 Wing
• Mach=0.755, AoA = 0.016° ± 2.51°
• Reduced frequency=0.1628
3D Validation (RK64 +GCL)
• Twisting ONERA M6 Wing
• Same error measure as in 2D (ref. solution = 128 steps/period)
• IRK64 enables huge time steps
• Slopes of error curves: BDF2 = 2.0, RK64 = 3.3
AGARD WING
Aeroelastic Test Case
Modal Analysis
1st Mode
2nd Mode
AGARD WING
Aeroelastic Test Case
Modal Analysis
3rd Mode
4th Mode
AGARD WING
Aeroelastic Test Case
• First 4 structural modes
• Coarse Euler Simulation
– 45,000 points, 250K cells
• Linear Elasticity Mesh Motion
– Multigrid solver
• 2nd order BDF Time stepping
– Multigrid solver
• Flow/Structure solved fully coupled at
each implicit time step
• 2 hours on 1 cpu per analysis run
Flutter Boundary Prediction
Flutter Boundary
Generalized Displacements
Current and Future Work
• Investigate benefits of Implicit Runge-Kutta
– 4th order temporal accuracy
• Investigate optimal time-step size and convergence criteria
• Develop automated temporal-error control scheme
• Viscous simulations, Finer Meshes
– 5M pt Unsteady Navier-Stokes solutions :
• 2-4 hours on 128 cpus of Columbia
• Adjoint for unsteady problems
– Time domain
– Frequency domain
Higher-Order Methods
• Simple asymptotic arguments indicate benefit of
higher-order discretizations
• Most beneficial for:
– High accuracy requirements
– Smooth functions
Motivation
• Higher-order methods successes
– Acoustics
– Large Eddy Simulation (structured grids)
– Other areas
• High-order methods not demonstrated in:
– Aerodynamics, Hydrodynamics
– Unstructured mesh LES
– Industrial CFD
– Cost effectiveness not demonstrated:
• Cost of discretization
• Efficient solution of complex discrete equations
Motivation
• Discretizations well developed
– Spectral Methods, Spectral Elements
– Streamline Upwind Petrov-Galerkin (SUPG)
– Discontinuous Galerkin
• Most implementations employ explicit or semi-implicit time stepping
– e.g. Multi-Stage Runge-Kutta (Δt ∝ Δx)
• Need efficient solvers for:
– Steady-State Problems
– Time-Implicit Problems (Δt >> Δx)
Multigrid Solver for Euler Equations
• Develop efficient solvers (O(N)) for steady-state
and time-implicit high-order spatial discretizations
• Discontinuous Galerkin
– Well suited for hyperbolic problems
– Compact element-based stencil
– Use of Riemann solver at inter-element boundaries
– Reduces to 1st order finite-volume at p=0
• Natural extension of FV unstructured mesh techniques
• Closely related to spectral element methods
Discontinuous Galerkin (DG)
$$M_{ij}\,\frac{\partial u_j}{\partial t} + K_{ij}\,u_j = 0,
\qquad i = 1,2,\dots,N$$
M_ij = Mass matrix
K_ij = E_ij + F1_ij + F2_ij + F3_ij = Spatial (convective or stiffness) matrix
E_ij = Element-based matrix
FK_ij = Element-boundary (edge) matrices
Steady-State Solver
• K_ij u_j = 0 (ignore Mass matrix)
• Block form of Kij:
– Eij = Block Diagonals (coupling of
all modes within an element)
– Fij = 3 Block Off-Diagonals
(coupling between neighboring
elements)
Solve iteratively as:
E_ij (u_j^{n+1} - u_j^n) = -K_ij u_j^n
Steady-State Solver: Element Jacobi
Solve iteratively as:
E_ij (u_j^{n+1} - u_j^n) = -K_ij u_j^n
u^{n+1} = u^n - E_ij^{-1} K_ij u^n
Obtain E_ij^{-1} by Gaussian Elimination
(LU Decomposition)
10x10 blocks for p=3 on triangles
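A minimal sketch of this element Jacobi iteration, assuming the block-diagonal element matrices are LU-factored once up front; `apply_K` (which evaluates K u from the element and edge matrices) and the array layout are assumptions, not the actual implementation.

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

def element_jacobi(E_blocks, apply_K, u, iters):
    """Element Jacobi for K u = 0: invert only the element blocks E
    (e.g. 10x10 for p=3 on triangles) and lag the neighbour coupling.

    E_blocks : list of (nmodes, nmodes) element matrices
    apply_K  : callable, apply_K(u) -> residual K u, shape (nelem, nmodes)
    u        : (nelem, nmodes) array of modal coefficients
    """
    lus = [lu_factor(E) for E in E_blocks]      # one LU per element, reused
    for _ in range(iters):
        r = apply_K(u)                          # full residual K u^n
        for e, lu in enumerate(lus):            # u^{n+1} = u^n - E^{-1} K u^n
            u[e] -= lu_solve(lu, r[e])
    return u
```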
DG for Euler Equations
• Mach = 0.5 over 10% sin bump
• Cubic basis functions (p=3), 4406 elements
Entropy as Measure of Error
• S = 0.0 for the exact solution
• S is smaller for higher order accuracy
Single Grid: Accuracy
• P - approximation order
• N - number of elements
Element Jacobi Convergence
• P-Independent Convergence
• H-dependence
Improving Convergence
H-Dependence
• Requires implicitness between grid
elements
• Multigrid methods based on use of coarser
meshes for accelerating solution on fine
mesh
Spectral Multigrid
• Form coarse “grids” by reducing order of
approximation on same grid
– Simple implementation using hierarchical basis
functions
• When reach 1st order, agglomerate
(h-coarsen) grid levels
• Perform element Jacobi on each MG level
Hierarchical Basis Functions
• Low order basis functions are subset of higher order
basis functions
• Low order expansion (linear in 2D):
– U= a1F1 + a2F2 + a3F3
• Higher order (quadratic in 2D)
– U=a1F1 + a2F2 + a3F3 + a4F4 + a5F5 + a6F6
• To project high order solution onto low order space:
– Set a4=0, a5=0, a6=0
Hierarchical Basis on Triangles
• Linear (p=1): F1 = z1, F2 = z2, F3 = z3
• Quadratic (p=2):
F4 = 0.5·z1z2, F5 = 0.5·z2z3, F6 = 0.5·z3z1
• Cubic (p=3):
F7 = 1/12·z1z2(z1 - z2), F8 = 1/12·z2z3(z2 - z3),
F9 = 1/12·z3z1(z3 - z1), F10 = z1z2z3
Spectral Multigrid
• Fine/Coarse Grids contain same elements
• Transfer operators almost trivial for hierarchical basis
functions
• Restriction: Fine to Coarse
– Transfer low order (resolvable) modes to coarse level exactly
– Omit higher order modes
• Prolongation: Coarse to Fine
– Transfer low order modes exactly
– Zero out higher order modes
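With hierarchical bases the transfer operators really are this simple: restriction truncates the coefficient vector and prolongation zero-pads it. A minimal sketch operating on per-element coefficient arrays (layout assumed, matching the a1..a6 example above):

```python
import numpy as np

def restrict(u_fine, n_low):
    """Fine to coarse: keep the first n_low (low-order) coefficients,
    omit the higher-order modes."""
    return u_fine[..., :n_low].copy()

def prolongate(u_coarse, n_high):
    """Coarse to fine: inject the low-order coefficients exactly and
    zero out the higher-order modes (e.g. a4 = a5 = a6 = 0)."""
    u_fine = np.zeros(u_coarse.shape[:-1] + (n_high,))
    u_fine[..., :u_coarse.shape[-1]] = u_coarse
    return u_fine
```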
Element Jacobi Convergence
• P-Independent Convergence
• H-dependence
Multigrid Convergence
• Nearly h-independent
4-Element Airfoil (Euler Solution)
4-Element Airfoil (Entropy)
Agglomeration Multigrid for p=0
Four Element Airfoil (Inviscid)
• Mach=0.3
• hp-Multigrid
– p=1…4
– V-cycle(5,0)
– Smoother (EGS)
• Mesh size
– N=1539
– N=3055
– N=5918
• AMG
– 3-Levels
– 4-Levels
– 5-Levels
Four Element Airfoil: p- and h-dependence
N = 3055
P=4
• Improved convergence for higher
orders
• Slight h-dependence
Four Element Airfoil: Linear (CGC)
hp-multigrid
• N=3055, P=4
• Newton Scheme: Quadratic
convergence
– Driven by linear MG scheme
• Linear hp-multigrid between the
non-linear updates
• Exit strategy (“k” iteration)
– machine epsilon (non-optimized)
– optimization criterion:
$$\|r_{cgc}^{\,k}\|_{L_2} \le \|R^n\|_{L_2}^{\,2}$$
Linear (CGC) vs. non-linear
(FAS) hp-Multigrid
• FAS - non-linear multigrid
• CGC - linear multigrid
• Linear MG most efficient
– Expense of non-linear residual
• NQ = 16 (p=4)
• NQ = 25 (over-integration)
Preliminary 3D DG Results
(steady-state)
• 3D biconvex airfoil mesh
– 7,000 tetrahedral elements
3D Steady State Euler DG
• P-multigrid convergence
3D Steady-State Euler DG
p=1 (2nd order)
p=4 (5th order)
• Curved boundaries under development
Unsteady DG (2D)
• Implicit time-stepping motivated by low reduced-frequency
problems and the small explicit time-step restriction of
high-order schemes
• Balance Temporal/Spatial Discretization Errors
– p=3(4th order in space)
– BDF1, BDF2, IRK4
• Runge-Kutta is equivalent to DG in time
• Use h-p multigrid to solve non-linear problem at each
implicit time step
Unsteady Euler DG
• Convection of vortex
• p=3: Fourth order spatial accuracy
• BDF1, BDF2, IRK64
• Time step = 0.2, CFL_cell = 2
Unsteady Euler DG
• p=3: Fourth order spatial accuracy
• BDF1: 1st order temporal accuracy
• Time step = 0.2, CFL_cell = 2
• 10 pMG cycles per time step
Unsteady Euler DG
• p=3: Fourth order spatial accuracy
• BDF2: 2nd order temporal accuracy
• Time step = 0.2, CFL_cell = 2
• 10 pMG cycles per time step
Unsteady Euler DG
• p=3: Fourth order spatial accuracy
• IRK4: 4th order temporal accuracy
• Time step = 0.2, CFL_cell = 2
• 5 pMG cycles/stage, 20 pMG cycles/time step
Unsteady Euler DG
• Vortex convection problem
Unsteady Euler DG
• IRK 4th order best for high accuracy
Future Work
• Higher order schemes still costly in terms of cpu time
compared to 2nd order schemes
– Will these become viable for industrial calculations?
• H-P Adaptivity
– Flexible approach to use higher order where beneficial
– Incorporate hp-Multigrid with hp Adaptivity
• Extend to:
– 3D Viscous
– Unsteady
– Dynamic Meshes