CHARMM-G, a GPU-based MD Simulation
code with PME and Reaction Force Field for
Studying Large Membrane Regions
Narayan Ganesan1, Sandeep Patel2, and Michela Taufer1
Computer and Info. Sciences Dept.1
Chemistry and Biochemistry Dept.2
University of Delaware
Outline
• Overview of forces in molecular dynamics
• Data structures and methodology
• PME for long distance electrostatic interactions
• Steps involved in PME calculations
• Performance and profiling of large membranes
• Related work and conclusions
Classical Forces
[Diagram: classical forces split into bonded interactions (bonds, angles, dihedrals) and non-bonded interactions (Van der Waals and electrostatic; the electrostatic part is treated with the Reaction Field (RF) or PME)]
Bond Interactions
• Bond forces: act only within pairs of atoms
• Angle forces: act only within a triad of atoms
• Torsion or dihedral forces: act only within a quartet of atoms
Non-bond Interactions:
Van der Waals Potential
• Van der Waals or Lennard-Jones potential: Decays rapidly with
distance
$$E_{VDW} = 4\varepsilon\left[\left(\frac{\sigma}{r}\right)^{12} - \left(\frac{\sigma}{r}\right)^{6}\right]$$
• A cutoff of ~10 Å accurately captures the effect of the Van der Waals potential
Non-Bond Interactions:
Electrostatic Potential
• Coulomb interaction: the force follows an inverse-square law
$$U_{coulomb} = \frac{1}{4\pi\varepsilon_0}\,\frac{q_1 q_2}{|\mathbf{r}_1 - \mathbf{r}_2|} \qquad
F_{coulomb} = \frac{1}{4\pi\varepsilon_0}\,\frac{q_1 q_2}{|\mathbf{r}_1 - \mathbf{r}_2|^2}$$
• Decays as 1/r with distance
• Since 1/r decays rather slowly,
the potential can act over long
distances
• Choosing a cutoff for the electrostatic force/potential introduces computational errors and inaccuracies
• Our solutions to sum long distance electrostatic forces:
Reaction Force Field (RF)
Ewald summation / Particle Mesh Ewald (PME)
GPU Implementation: Data Structures
• A single thread is assigned to each atom
• For each atom a set of lists is maintained:
Bond list stores list of bonds the atom belongs to
Angle list stores list of angles the atom belongs to
Dihedral list stores list of dihedrals the atom belongs to
Nonbond list stores non-bond interactions with atoms within cutoff
[Figure: a 3×3 grid of charges q1–q9; the nonbond list for q5 holds (q2, r2), (q6, r6), (q8, r8), (q9, r9)]
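The layout below is a minimal sketch of such per-atom lists in CUDA; the fixed capacities and field names are illustrative assumptions, not the actual CHARMM-G identifiers.

```cuda
// Per-atom lists with fixed-width rows so each thread can index its own
// entries directly (capacities are assumptions chosen for illustration).
#define MAX_BONDS_PER_ATOM     4
#define MAX_ANGLES_PER_ATOM    8
#define MAX_DIHEDRALS_PER_ATOM 16
#define MAX_NEIGHBORS          256   // nonbond-list capacity within the cutoff

struct AtomLists {
    int *bondList;       // [nAtoms * MAX_BONDS_PER_ATOM]     bond indices per atom
    int *angleList;      // [nAtoms * MAX_ANGLES_PER_ATOM]    angle indices per atom
    int *dihedralList;   // [nAtoms * MAX_DIHEDRALS_PER_ATOM] dihedral indices per atom
    int *nonbondList;    // [nAtoms * MAX_NEIGHBORS]          neighbor atom indices
    int *nonbondCount;   // [nAtoms] number of valid entries in nonbondList
};
```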
MD Simulation
• MD simulations are iterative executions of MD steps
• Each iteration computes forces on each particle due to:
Bonds – Bond List
Angles – Angle List
Dihedrals – Dihedral List
Electrostatic and Van der Waals – Nonbond List
Bond, angle, and dihedral lists are unchanged for each atom throughout the simulation
The nonbond list is updated based on a cutoff buffer
• If Ewald summation is used, an additional component is added:
Long-distance interaction using the PME method
(A sketch of one MD step follows below.)
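The host loop below is a hedged sketch of how these per-step kernels could be strung together; the kernel names, launch shape, and rebuild interval are placeholders for illustration, not the actual CHARMM-G entry points.

```cuda
#include <cuda_runtime.h>

// Stub kernels standing in for the force, integration, and list-rebuild work.
__global__ void computeBonded(const float4 *pos, float4 *force) { /* bonds, angles, dihedrals */ }
__global__ void computeNonbond(const float4 *pos, float4 *force) { /* VdW + short-range electrostatics */ }
__global__ void integrate(float4 *pos, float4 *vel, const float4 *force, float dt) { /* position/velocity update */ }
__global__ void rebuildNonbondList(const float4 *pos) { /* cutoff + buffer search */ }

void mdLoop(float4 *pos, float4 *vel, float4 *force, int nAtoms, int nSteps, float dt)
{
    int threads = 256, blocks = (nAtoms + threads - 1) / threads;
    for (int step = 0; step < nSteps; ++step) {
        computeBonded<<<blocks, threads>>>(pos, force);
        computeNonbond<<<blocks, threads>>>(pos, force);
        integrate<<<blocks, threads>>>(pos, vel, force, dt);
        if (step % 10 == 0)                               // illustrative refresh interval
            rebuildNonbondList<<<blocks, threads>>>(pos); // cutoff-buffer based update
    }
    cudaDeviceSynchronize();
}
```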
Ways to Update Nonbond List
• Global neighbor list
Each thread can iterate through the
global list of atoms to build the
nonbond list
• Cell-based neighbor list
Divide the domain into equal cells of
size = cutoff
Search only in current cell and
adjacent cells for neighboring atoms
There are 26 adjacent cells and 1
current cell in 3-dimensions
• The cell-based list is computationally very efficient but also needs regular cell updates (a kernel sketch follows below)
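The kernel below is a hedged sketch of the cell-based search: one thread per atom scans its own cell plus the 26 adjacent cells, wrapping indices for the periodic box. The identifiers and the flat cell-list layout are assumptions for illustration, not the actual CHARMM-G data structures.

```cuda
__global__ void buildNonbondList(const float4 *pos,           // x, y, z, charge (wrapped into the box)
                                 const int *cellList,         // [nCellsTotal * maxPerCell] atom ids
                                 const int *cellCount,        // atoms currently in each cell
                                 int *nbList, int *nbCount,   // per-atom nonbond lists (output)
                                 int nAtoms, int3 nc, float cellSize,
                                 float cutoff2, int maxPerCell, int maxNbrs)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= nAtoms) return;
    float4 pi = pos[i];
    float3 box = make_float3(nc.x * cellSize, nc.y * cellSize, nc.z * cellSize);
    int cx0 = (int)floorf(pi.x / cellSize);
    int cy0 = (int)floorf(pi.y / cellSize);
    int cz0 = (int)floorf(pi.z / cellSize);
    int count = 0;
    for (int dz = -1; dz <= 1; ++dz)
    for (int dy = -1; dy <= 1; ++dy)
    for (int dx = -1; dx <= 1; ++dx) {
        // wrap the neighboring cell index for the periodic box
        int cell = (((cz0 + dz + nc.z) % nc.z) * nc.y + ((cy0 + dy + nc.y) % nc.y)) * nc.x
                 +  ((cx0 + dx + nc.x) % nc.x);
        for (int k = 0; k < cellCount[cell]; ++k) {
            int j = cellList[cell * maxPerCell + k];
            if (j == i) continue;
            // minimum-image displacement to the candidate neighbor
            float rx = pi.x - pos[j].x;  rx -= box.x * rintf(rx / box.x);
            float ry = pi.y - pos[j].y;  ry -= box.y * rintf(ry / box.y);
            float rz = pi.z - pos[j].z;  rz -= box.z * rintf(rz / box.z);
            if (rx * rx + ry * ry + rz * rz < cutoff2 && count < maxNbrs)
                nbList[i * maxNbrs + count++] = j;
        }
    }
    nbCount[i] = count;
}
```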
Cell Updates
• A single thread manages a single cell or a set of cells
• Each cell is managed by a list of the atoms in the cell, called the ‘CellList’
• When an atom ‘i’ moves from Cell A to Cell B, the thread responsible for Cell A updates the list of Cell B via thread-safe integer atomic intrinsics (sketched below)
• Invalid atoms are removed from the cell lists by the ‘CellClean’ kernel
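The kernel below is a hedged sketch of this update pattern, assuming the flat cell-list layout from the earlier sketch; the invalid-entry marker and the compaction left to a separate ‘CellClean’ pass follow the description above, but the names are illustrative.

```cuda
__global__ void cellUpdate(const float4 *pos, int *cellList, int *cellCount,
                           int3 nc, float cellSize, int maxPerCell)
{
    int cellA = blockIdx.x * blockDim.x + threadIdx.x;   // one thread per cell
    int totalCells = nc.x * nc.y * nc.z;
    if (cellA >= totalCells) return;
    int n = cellCount[cellA];                            // entries present at kernel start
    for (int k = 0; k < n; ++k) {
        int i = cellList[cellA * maxPerCell + k];
        if (i < 0) continue;                             // already invalidated
        float4 p = pos[i];
        int cx = (int)floorf(p.x / cellSize);
        int cy = (int)floorf(p.y / cellSize);
        int cz = (int)floorf(p.z / cellSize);
        int cellB = (cz * nc.y + cy) * nc.x + cx;
        if (cellB != cellA) {
            int slot = atomicAdd(&cellCount[cellB], 1);  // thread-safe append to Cell B
            if (slot < maxPerCell)
                cellList[cellB * maxPerCell + slot] = i;
            cellList[cellA * maxPerCell + k] = -1;       // mark stale; CellClean compacts later
        }
    }
}
```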
Periodic Boundary Condition
[Figure: the simulation cell of charges q1–q9 replicated periodically in every direction; the cell of interest and its region of influence extend into the neighboring images along the edge vectors ax, ay]
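The periodic tiling above implies the minimum-image convention: each displacement is folded back so an atom interacts with the nearest periodic image of its neighbor. A minimal sketch, with the box edge lengths passed in explicitly (names are illustrative):

```cuda
// Fold a displacement vector into the primary cell (minimum-image convention).
__device__ float3 minimumImage(float3 d, float3 box)
{
    d.x -= box.x * rintf(d.x / box.x);   // shift by a whole box length if |d| > box/2
    d.y -= box.y * rintf(d.y / box.y);
    d.z -= box.z * rintf(d.z / box.z);
    return d;
}
```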
Reaction Force Field
• Any molecule is surrounded by a spherical cavity of finite radius
Within the radius, electrostatic interactions are calculated explicitly
Outside the cavity, the system is treated as a dielectric continuum
• This model allows the replacement of the infinite Coulomb sum by a finite sum plus the reaction field
The Coulomb potential with the reaction-field correction becomes
$$U_c = \sum_i \sum_j \frac{q_i q_j}{4\pi\varepsilon_0}\left[\frac{1}{r_{ij}} + \frac{B_0\, r_{ij}^2}{2 R_c^3}\right]$$
where the second term is the reaction field correction and Rc is the radius of the cavity
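A single pair term of this formula could be evaluated as in the hedged sketch below; coulombConst stands for 1/(4πε₀) in the code's unit system and B0 is the reaction-field constant, both assumed to be supplied by the caller (this is not the actual CHARMM-G routine).

```cuda
__device__ float reactionFieldPair(float qi, float qj, float rij,
                                   float B0, float Rc, float coulombConst)
{
    // explicit Coulomb term plus the reaction-field correction inside the cavity
    float u = 1.0f / rij + B0 * rij * rij / (2.0f * Rc * Rc * Rc);
    return coulombConst * qi * qj * u;
}
```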
Ewald Summation Method (I)
• Proposed by Paul Peter Ewald in 1921 for crystallographic systems
• Has found applications in molecular, astrophysical, and crystallographic systems
• Used to efficiently sum inverse-distance potentials over long distances, e.g., gravity and the Coulomb potential
• Started to be used in numerical simulations in the late 70s
• O(N log N) instead of O(N×N)
Ewald Summation Method
• Three contributions to the total energy, depending on the
distance of the interaction:
Direct space (Edir)
Reciprocal space (Erec)
Self energy (Eself )
Ewald Summation Method (II)
• Divide interactions into short range (Direct Space) and long
range (Reciprocal Space)
Short range – direct space, computed over the nonbond list:
$$E_{dir} = \sum_{i<j} \frac{q_i q_j\,\mathrm{erfc}(\beta\,|\mathbf{r}_i - \mathbf{r}_j|)}{4\pi\varepsilon_0\,|\mathbf{r}_i - \mathbf{r}_j|}$$
Long range – reciprocal (Fourier) space:
$$E_{rec} = \frac{1}{2\pi V} \sum_{\mathbf{m} \neq 0} \frac{\exp(-\pi^2 m^2/\beta^2)}{m^2}\, S(\mathbf{m})\, S(-\mathbf{m})$$
where V is the volume of the simulation region and S(m) are the structure parameters.
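As a concrete illustration of the direct-space term, the device function below evaluates one pair contribution; it is a hedged sketch in which beta (the Ewald splitting parameter) and coulombConst (1/(4πε₀) in the code's units) are assumed to be provided by the caller.

```cuda
// One direct-space pair contribution to Edir, summed over the nonbond list.
__device__ float ewaldDirectPair(float qi, float qj, float rij,
                                 float beta, float coulombConst)
{
    return coulombConst * qi * qj * erfcf(beta * rij) / rij;
}
```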
Steps in SPME
1. Put the charges on the grid
2. FFT of the charge grid
3. Multiply with the structure constants
4. FFT back: the convolution yields the potential at the grid points, which has to be summed
5. Compute the force on atom i by calculating the negative gradient $-\partial U/\partial \mathbf{r}_i$
Charge Spreading
• Each charge is spread over 4×4×4 = 64 grid points in 3-D
Grid spacing of 1 Å, using a cardinal B-spline of order 4
• Create a 3-dimensional charge matrix "Q"
Mesh-based charge density
Approximated by the sum of charges at each grid point
Multiple charges can influence a single lattice point
(Essmann et al., J. Chem. Phys. 1995)
$$Q(k_1, k_2, k_3) = \sum_{i=1}^{n} q_i\, M_4(x_i - k_1)\, M_4(y_i - k_2)\, M_4(z_i - k_3)$$
where (xi, yi, zi) is the position of the i-th charge and (k1, k2, k3) is the index of the lattice point.
Cardinal B-Spline of Order 4
• The B-spline has a region of influence of 4 units
Each unit = 1 Å
• During charge spreading, the B-spline affects the neighboring 4×4×4 cells in 3 dimensions (the order-4 recursion is sketched below)
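A minimal sketch of the order-4 cardinal B-spline built from the standard recursion M_n(u) = u/(n-1)·M_{n-1}(u) + (n-u)/(n-1)·M_{n-1}(u-1), with M_2(u) = 1 - |u - 1| on [0, 2]; this is the generic textbook construction, not necessarily the exact routine used in CHARMM-G.

```cuda
__host__ __device__ float M2(float u)
{
    return (u > 0.0f && u < 2.0f) ? 1.0f - fabsf(u - 1.0f) : 0.0f;
}
__host__ __device__ float M3(float u)
{
    return 0.5f * (u * M2(u) + (3.0f - u) * M2(u - 1.0f));
}
__host__ __device__ float M4(float u)   // nonzero only for 0 < u < 4
{
    return (u * M3(u) + (4.0f - u) * M3(u - 1.0f)) / 3.0f;
}
```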
CPU vs. GPU Charge Spreading
• Charge spreading by a cardinal B-spline of order 4:
[Figure: a charge spread onto the surrounding unit-cell grid points]
• The CPU implementation is straightforward
Computation time: Natoms × 4 × 4 × 4 steps
• The GPU implementation is hard to parallelize
Can lead to race conditions – needs floating-point atomic writes
The current version of CUDA supports atomic writes for integers only
Charges need to be converted to fixed point in order to use this functionality (see the sketch below)
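The snippet below is a hedged sketch of that fixed-point workaround: the spread charge fraction is scaled to an integer so the grid can be accumulated with the integer atomicAdd available in that CUDA generation. The scale factor and names are illustrative assumptions.

```cuda
#define FIXED_POINT_SCALE (1 << 20)   // illustrative charge resolution (assumption)

__device__ void accumulateCharge(int *chargeGridFixed, int gridIndex, float q)
{
    int qFixed = __float2int_rn(q * FIXED_POINT_SCALE);   // round to fixed point
    atomicAdd(&chargeGridFixed[gridIndex], qFixed);       // integer atomic add is race-free
}
// A later conversion kernel divides by FIXED_POINT_SCALE to recover floats
// before the FFT of the charge grid.
```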
CPU vs. GPU Charge Spreading
[Figure: CPU spreading of charges vs. GPU gathering of charges by a cardinal B-spline of order 4, with each thread assigned to a lattice point]
• Charge spreading on the GPU can be parallelized easily over the grid points instead of the atoms
• Each thread works on a single or a set of grid points
• Needs O(ax × ay × az) threads, with each thread iterating over all atoms within its 4×4×4 neighborhood → O(N)
(A gather-kernel sketch follows below.)
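The kernel below is a hedged sketch of this gather formulation: one thread per lattice point sums the B-spline-weighted contributions of the atoms registered in that point's neighbor list. It reuses the M4() helper sketched earlier; all identifiers and the list layout are illustrative assumptions.

```cuda
__global__ void gatherCharges(const float4 *pos,            // x, y, z, charge per atom
                              const int *latticeNbrList,    // [nPoints * maxNbrs] atom ids
                              const int *latticeNbrCount,   // atoms near each lattice point
                              float *Q,                      // charge grid of size ax*ay*az
                              int3 grid, int maxNbrs)
{
    int p = blockIdx.x * blockDim.x + threadIdx.x;
    int nPoints = grid.x * grid.y * grid.z;
    if (p >= nPoints) return;
    // recover the 3-D lattice index (k1, k2, k3) of this thread's point
    int k1 = p % grid.x, k2 = (p / grid.x) % grid.y, k3 = p / (grid.x * grid.y);
    float q = 0.0f;
    for (int n = 0; n < latticeNbrCount[p]; ++n) {
        float4 a = pos[latticeNbrList[p * maxNbrs + n]];
        // with a 1 A grid spacing, coordinates map directly to grid units
        q += a.w * M4(a.x - k1) * M4(a.y - k2) * M4(a.z - k3);
    }
    Q[p] = q;   // no atomics needed: this thread owns lattice point p
}
```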
GPU Charge Spreading (I)
• Each lattice point maintains a list of the atoms within its 4×4×4 neighborhood for charge gathering
[Figure: the effects of charges 1, 2, and 3 are gathered at a lattice point whose neighbor list holds (q1, r1), (q2, r2), (q3, r3)]
GPU Charge Spreading (II)
• When a charge moves (e.g., from position 2 to 2'), several lattice points need to be updated
The charge is added to the neighbor lists of the lattice points in dark gray
The charge is removed from the neighbor lists of the lattice points in light gray
Lattice points in white are not affected
• Since there are an equal number of light gray and dark gray lattice points, a 1-to-1 mapping was devised
• The threads for the lattice points in light gray update the lists of the lattice points in dark gray in a 1-to-1 fashion
GPU Charge Spreading (III)
[Figure: charges 1 and 2 move to positions 1' and 2']
• When a single lattice point is updated by multiple threads, thread-safe integer atomic intrinsics are used to update the cell lists
Fast Fourier Transform
• CUFFT provides library functions to compute the FFT and inverse FFT
The 3D FFT is implemented as a series of 1D FFTs and transpositions
• CUFFTExec can be optimized by choosing proper FFT dimensions
Powers of 2
(A minimal cuFFT sketch follows below.)
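The snippet below is a minimal sketch of the 3D FFT round-trip with cuFFT, where ax, ay, az are the charge-grid dimensions; the function name and in-place transform are illustrative choices, not the actual CHARMM-G code.

```cuda
#include <cufft.h>

void pmeFFTRoundTrip(cufftComplex *d_Q, int ax, int ay, int az)
{
    cufftHandle plan;
    cufftPlan3d(&plan, ax, ay, az, CUFFT_C2C);       // in practice, plan once and reuse every step
    cufftExecC2C(plan, d_Q, d_Q, CUFFT_FORWARD);     // charge grid -> reciprocal space
    // ... multiply by the structure constants in reciprocal space here ...
    cufftExecC2C(plan, d_Q, d_Q, CUFFT_INVERSE);     // back-transform to the potential grid
    cufftDestroy(plan);
}
```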
Scientific Challenge
• One-third of the human genome codes for membrane-bound proteins
• Pharmaceuticals target membrane-bound protein receptors
e.g., G-protein coupled receptors
Importance of these systems to human health and the understanding of dysfunction
• State-of-the-art simulations only consider small regions (or patches) of physiological membranes
• Heterogeneity of the membrane spans length scales much larger than those included in these smaller model systems
• Our goal: apply large-scale GPU-enabled computations to the study of large membrane regions
DMPC
• DiMyristoylPhosphatidylCholine (DMPC) lipid bilayers
Small system: 17 004 atoms, 46.8 Å × 46.8 Å × 76.0 Å
Large system: 68 484 atoms, 93.6 Å × 93.6 Å × 152.0 Å
Explicit solvent, i.e., water
[Figure: simulated membrane patch (~92 Å × 92 Å cross-section, 152 Å height) with explicit solvent]
Performance
[Figures: simulation performance for the small membrane (17 004 atoms) and the large membrane (68 484 atoms)]
Case studies: global neighbor list with RF (I), cell-based list with RF (II), global neighbor list with PME (III), and cell-based neighbor list with PME (IV)
Kernel Profiling (I)
• Large membrane – RF method
[Figures: kernel-time profiles with the global neighbor list and with the cell-based neighbor list]
Kernel Profiling (II)
• Large membrane – PME method
[Figures: kernel-time profiles with the global neighbor list and with the cell-based neighbor list]
Related Work
• Other MD codes including the PME method:
M. J. Harvey and G. De Fabritiis, J. Chem. Theory Comput., 2009
• Our implementation is different in terms of:
Charge spreading algorithm
Force field methods, including RF
Conclusions and Future Work
• CHARMM-G is a flexible MD code based on the CHARMM force field, integrating:
Ewald summation
Reaction force field
• The code supports explicit solvent representations and enables fast simulations of large membrane regions
• Improvements to the CUDA FFT will further improve the performance presented in the paper
• Future work includes:
Code optimizations and parallelization across multiple GPUs
Scientific characterization of large membranes
Acknowledgements
GCL Members:
Trilce Estrada
Boyu Zhang
Abel Licon
Narayan Ganesan
Lifan Xu
Philip Saponaro
Maria Ruiz
Michela Taufer
Collaborators:
Sandeep Patel, Brad A. Bauer, Joseph E.
Davis (Dept. of Chemistry, UD)
Related work:
Bauer et al., JCC 2010 (in press)
Davis et al., BICoB 2009
[Photo: GCL members in Spring 2010]
Sponsors:
More questions: [email protected]