Approximate Molecular Dynamics:

Download Report

Transcript Approximate Molecular Dynamics:

An improved hybrid Monte Carlo
method for conformational
sampling of proteins
Jesús A. Izaguirre and Scott Hampton
Department of Computer Science and Engineering
University of Notre Dame
March 5, 2003
This work is partially supported by two NSF grants (CAREER and BIOCOMPLEXITY)
and two grants from University of Notre Dame
1
Overview
1. Motivation: sampling
conformational space of
proteins
3. Evaluation of new
Shadow HMC
2. Methods for sampling
(MD, HMC)
4. Future applications
2
Protein: The Machinery of Life
NH2-Val-His-Leu-Thr-Pro-Glu-GluLys-Ser-Ala-Val-Thr-Ala-Leu-TrpGly-Lys-Val-Asn-Val-Asp-Glu-ValGly-Gly-Glu-…..
3
Protein Structure
4
Why protein folding?

Huge gap: sequence data and 3D structure data




EMBL/GENBANK, DNA (nucleotide) sequences
15 million sequence, 15,000 million base pairs
SWISSPROT, protein sequences
120,000 entries
PDB, 3D protein structures
20,000 entries
Bridging the gap through prediction

Aim of structural genomics:


“Structurally characterize most of the protein sequences by an efficient
combination of experiment and prediction,” Baker and Sali (2001)
Thermodynamics hypothesis:
Native state is at the global free energy minimum
Anfinsen (1973)
5
Questions related to folding I

Long time kinetics:
dynamics of folding




only statistical
correctness possible
ensemble dynamics
e.g., folding@home
Short time kinetics


strong correctness
possible
e.g., transport
properties, diffusion
coefficients
6
Questions related to folding II

Sampling


Compute equilibrium
averages by visiting all
(most) of “important”
conformations
Examples:



Equilibrium
distribution of solvent
molecules in vacancies
Free energies
Characteristic
conformations
(misfolded and folded
states)
7
Overview
1. Motivation: sampling
conformational space of
proteins
3. Evaluation of new
Shadow HMC
2. Methods for sampling
(MD, HMC)
4. Future applications
8
Classical molecular dynamics




Newton’s equations
of motion: Mq ''  U (q)  F(q). - - - (1)
Atoms
Molecules
CHARMM force
field
(Chemistry at Harvard
Molecular Mechanics)
Bonds, angles and torsions
9
What is a Forcefield?
In molecular dynamics a
molecule is described as a
series of charged points
(atoms) linked by springs
(bonds).
To describe the time evolution of bond lengths, bond angles and
torsions, also the nonbond van der Waals and elecrostatic
interactions between atoms, one uses a forcefield.
The forcefield is a collection of equations and associated constants
designed to reproduce molecular geometry and selected properties
of tested structures.
10
Energy Terms Described in the
CHARMm forcefield
Bond
Dihedral
Angle
Improper
11
Energy Functions
Ubond = oscillations about the equilibrium bond length
Uangle = oscillations of 3 atoms about an equilibrium angle
Udihedral = torsional rotation of 4 atoms about a central bond
Unonbond = non-bonded energy terms (electrostatics and Lennard-Jones)
12
Molecular Dynamics –
what does it mean?
MD = change in conformation over time using a forcefield
Energy
Energy supplied to the minimized
system at the start of the simulation
Conformation impossible
to access through MD
Conformational change
13
MD, MC, and HMC in sampling



HMC
Molecular Dynamics takes long steps in phase
space, but it may get trapped
Monte Carlo makes a random walk (short steps),
it may escape minima due to randomness
Can we combine these two methods?
MD
MC
14
Hybrid Monte Carlo

We can sample from a distribution with
density p(x) by simulating a Markov chain
with the following transitions:


From the current state, x, a candidate state x’
is drawn from a proposal distribution S(x,x’).
The proposed state is accepted with prob.
min[1,(p(x’) S(x’,x)) / (p(x) S(x,x’))]
If the proposal distribution is symmetric,
S(x’,x)) = S(x,x’)), then the acceptance prob.
only depends on p(x’) / p(x)
15
Hybrid Monte Carlo II




Proposal functions must be reversible:
if x’ = s(x), then x = s(x’)
Proposal functions must preserve volume
Jacobian must have absolute value one
Valid proposal: x’ = -x
Invalid proposals:


x’ = 1 / x (Jacobian not 1)
x’ = x + 5 (not reversible)
16
Hybrid Monte Carlo III





Hamiltonian dynamics preserve volume in phase
space
Hamiltonian dynamics conserve the Hamiltonian
H(q,p)
Reversible symplectic integrators for Hamiltonian
systems preserve volume in phase space
Conservation of the Hamiltonian depends on the
accuracy of the integrator
Hybrid Monte Carlo: Use reversible symplectic
integrator for MD to generate the next proposal
in MC
17
HMC Algorithm
Perform the following steps:
1. Draw random values for the momenta p from
normal distribution; use given positions q
2. Perform cyclelength steps of MD, using a
symplectic reversible integrator with timestep t,
generating (q’,p’)
3. Compute change in total energy

H = H(q’,p’) - H(q,p)
4. Accept new state based on exp(- H )
18
Hybrid Monte Carlo IV
Advantages of HMC:
 HMC can propose and accept distant points
in phase space, provided the accuracy of
the MD integrator is high enough
 HMC can move in a biased way, rather than
in a random walk (distance k vs sqrt(k))
 HMC can quickly change the probability
density
19
Hybrid Monte Carlo V


As the number of atoms
increases, the total error
in the H(q,p) increases. The
error is related to the time
step used in MD
Analysis of N replicas of
multivariate Gaussian
distributions shows that
HMC takes O(N5/4 ) with
time step t = O(N-1/4)
Kennedy & Pendleton, 91
System size Max t
N
66
0.5
423
0.25
868
0.1
5143
0.05
20
Hybrid Monte Carlo VI




The key problem in scaling is the accuracy
of the MD integrator
More accurate methods could help scaling
Creutz and Gocksch 89 proposed higher
order symplectic methods for HMC
In MD, however, these methods are more
expensive than the scaling gain. They need
more force evaluations per step
21
Overview
1. Motivation: sampling
conformational space of
proteins
3. Evaluation of new
Shadow HMC
2. Methods for sampling
(MD, HMC)
4. Future applications
22
Improved HMC


Symplectic integrators conserve exactly
(within roundoff error) a modified
Hamiltonian that for short MD simulations
(such as in HMC) stays close to the true
Hamiltonian Sanz-Serna & Calvo 94
Our idea is to use highly accurate
approximations to the modified
Hamiltonian in order to improve the scaling
of HMC
23
Shadow Hamiltonian
Work by Skeel and Hardy, 2001, shows how to
compute an arbitrarily accurate approximation to
the modified Hamiltonian, called the Shadow
Hamiltonian
T -1
 Hamiltonian: H=1/2p M p + U(q)
 Modified Hamiltonian: HM = H + O(t p)
2p)
 Shadow Hamiltonian: SH2p = HM + O(t




Arbitrary accuracy
Easy to compute
Stable energy graph
Example, SH4 = H – f( qn-1, qn-2, pn-1, pn-2 ,βn-1 ,βn-2)
24

See comparison of SHADOW and ENERGY
25
Shadow HMC

Replace total energy H with shadow energy

SH2m = SH2m (q’,p’) – SH2m (q,p)
Nearly linear scalability of sampling rate
Computational cost SHMC, N(1+1/2m), where m
is accuracy order of integrator
 Extra storage (m copies of q and p)
 Moderate overhead (25% for small
proteins)

26
Example Shadow Hamiltonian
27
ProtoMol: a framework for MD
Matthey, et al, ACM Tran. Math. Software (TOMS), submitted
Front-end
Middle layer
back-end
libfrontend
libintegrators
libbase, libtopology
libparallel, libforces
Modular design of ProtoMol (Prototyping Molecular dynamics).
Available at http://www.cse.nd.edu/~lcls/protomol
28
SHMC implementation


Shadow Hamiltonian
requires propagation
of β
Can work for any
integrator
29
Systems tested
30
Sampling Metric 1




Generate a plot of dihedral angle vs.
energy for each angle
Find local maxima
Label ‘bins’ between maxima
For each dihedral angle, print the label of
the energy bin that it is currently in
31
Sampling Metric 2


Round each dihedral angle to the nearest
degree
Print label according to degree
32
Acceptance Rates
33
More Acceptance Rates
34
Sampling rate for decalanine
(dt = 2 fs)
35
Sampling rate for 2mlt
36
Sampling rate comparison

Cost per conformation is total simulation
time divided by number of new
conformations discovered
(2mlt, dt = 0.5 fs)




HMC
122 s/conformation
SHMC
16 s/conformation
HMC discovered 270 conformations in 33000
seconds
SHMC discovered 2340 conformations in 38000
seconds
37
Conclusions




SHMC has a much higher acceptance rate,
particularly as system size and timestep
increase
SHMC discovers new conformations more
quickly
SHMC requires extra storage and
moderate overhead.
SHMC works best at relatively large
timesteps
38
Future work

Multiscale problems for rugged energy surface




System size


Multiple time stepping algorithms plus constraining
Temperature tempering and multicanonical ensemble
Potential smoothing
Parallel Multigrid O(N) electrostatics
Applications



Free energy estimation for drug design
Folding and metastable conformations
Average estimation
39
Acknowledgments



Dr. Thierry Matthey, lead developer of
ProtoMol, University of Bergen, Norway
Graduate students: Qun Ma, Alice Ko, Yao
Wang, Trevor Cickovski
Dr. Robert Skeel, Dr. Ruhong Zhou, and Dr.
Christoph Schutte for valuable discussions
40
Multiple time stepping

Fast/slow force splitting

Bonded: “fast”


Long range nonbonded: “slow”


Small periods
Large characteristic time
Evaluate slow forces less frequently


Fast forces cheap
Slow force evaluation expensive
41
The Impulse integrator
Grubmüller,Heller, Windemuth and Schulten, 1991
Tuckerman, Berne and Martyna, 1992

The impulse “train”
Fast impulses, t
Time, t
Slow impulses, t
How far apart can we stretch the impulse train?
42
Stretching slow impulses

t ~ 100 fs if accuracy does not degenerate

1/10 of the characteristic time
MaIz, SIAM J. Multiscale Modeling and Simulation, 2003 (submitted)

Resonances (let  be the shortest period)


Natural: t = n , n = 1, 2, 3, …
Numerical:


Linear: t = /2
Nonlinear: t = /3
MaIS_a, SIAM J. on Sci. Comp. (SISC), 2002 (in press)
MaIS_b, 2003 ACM Symp. App. Comp. (SAC’03), 2002 (in press)
Impulse far from being multiscale!
43
3rd order nonlinear resonance of
Impulse
MaIS_a, SISC, 2002; MaIS_b, ACM SAC’03, 2002
Fig. 1: Left: flexible water system. Right: Energy drift from 500ps MD simulation of flexible
water at room temperature revealing 3:1 and 4:1 nonlinear resonance (3.3363 and 2.4 fs)
44
Overview

Introduction





Nonlinear instabilities of Impulse integrator
Approximate MD integrators






Molecular dynamics (MD) in action
Classical MD
Protein folding
Targeted MOLLY
MUSICO
Applications
Summary
Future work
Acknowledgements
45
Objective statement

Design multiscale integrators that are
not limited by nonlinear and linear
instabilities



Allowing larger time steps
Better sequential performance
Better scaling
46
Overview

Introduction





Nonlinear instabilities of Impulse integrator
Approximate MD integrators






Molecular dynamics (MD) in action
Classical MD
Protein folding
Targeted MOLLY
MUSICO
Applications
Summary
Future work
Acknowledgements
47
Targeted MOLLY (TM)
MaIz, 2003 ACM Symp. App. Comp. (SAC’03), 2002 (in press)
MaIz, SIAM J. Multiscale Modeling and Simulation, 2003 (submitted)


TM = MOLLY + targeted Langevin coupling
Mollified Impulse (MOLLY) to overcome
linear instabilities
Izaguirre, Reich and Skeel, 1999

Stochasticity to stabilize MOLLY
Izaguirre, Catarello, et al, 2001
48
Mollified Impulse (MOLLY)

MOLLY (mollified Impulse)





Slow potential at time averaged positions, A(x)
Averaging using only fastest forces
Mollified slow force = Ax(x) F(A(x))
Equilibrium and B-spline
B-spline MOLLY



Averaging over certain time interval
Needs analytical Hessians
Step sizes up to 6 fs (50~100% speedup)
49
Introducing stochasticity

Langevin stabilization of MOLLY (LM)
Izaguirre, Catarello, et al, 2001


12 fs for flexible waters with correct dynamics
Dissipative particle dynamics (DPD):
Pagonabarraga, Hagen and Frenkel, 1998


Pair-wise Langevin force on “particles”
Time reversible if self-consistent
Vi
FRi, FDi
Vj
FRj = - FRi
FDj = - FDi
50
Targeted Langevin coupling

Targeted at “trouble-making” pairs



Stabilizing MOLLY


Bonds, angles
Hydrogen bonds
Slow forces evaluated much less frequently
Recovering correct dynamics

Coupling coefficient small
51
TM: main results


16 fs for flexible waters
Correct dynamics

Self-diffusion coefficient, D.



leapfrog w/ 1fs,
D = 3.69+/-0.01
TM w/ (16 fs, 2fs), D = 3.68+/-0.01
Correct structure

Radial distribution function (r.d.f.)
52
TM: correct r.d.f.
Fig. 4. Radial distribution function of O-H (left) and H-H (right) in flexible waters.
53
TM: discussion

Much larger time steps (~100 fs)

Better force splitting
Zhou, Harder, etc. 2001


Conformational transition rate



More stable MOLLY integrators
folding@home: ensemble dynamics
Celestial systems
Graphics rendering in gaming applications
54
Overview

Introduction





Nonlinear instabilities of Impulse integrator
Approximate MD integrators






Molecular dynamics (MD) in action
Classical MD
Protein folding
Targeted MOLLY
MUSICO
Applications
Summary
Future work
Acknowledgements
55
Lengthening time scales: SRP
1 msec simulation of an enzyme using stochastic reaction path (SRP) method.
Courtesy of R. Elber at Cornell Univ.. SRP requires initial and final configuration.
56
What do we learn from SRP?


Cons: final configuration not always
available
Pros: large time steps/long simulation


Motions whose char. time is less than the step
size filtered out
Essential (interesting) motions remain
What are other approaches for
filtering out fast motions?
57
Lengthening time scales: MUSICO

Ideas:





Splitting into nearly linear and nonlinear parts
Implicit integration of linear part with
constraining of internal d.o.f.
Explicit treatment of highly nonlinear part
Optional pairwise stochasticity for stability
MUSICO (in progress)

Multi-Scale Implicit-explicit Constrained
molecular dynamics
58
MUSICO: the anticipations

Larger step sizes: ps, ns

Implicit methods generally more stable
Schlick, et al (1997)

Greater scalability



Hessian computations mostly local
Low in communication
Real speedup


Linear parts cheap
100 fold speedup sequentially?
59
MUSICO: the implicit part

Implicit equations for nearly linear part
1. x n 1/ 2  x n  t p n / 2,
2. Fn 1/ 2  U ( x n 1/ 2   t 2 Fn 1/ 2 ),
3. p n 1  p n  t Fn 1/ 2 ,
4. x n 1  x n 1/ 2  t p n 1 / 2

Newton-Raphson method
F( f 0 ,f1 ,...,f N 1 )  Fn1/ 2  U (xn1/ 2  t 2Fn1/ 2 )  0,
J  f  F.


Require Hessians
Krylov subspace method
60
Overview

Introduction





Nonlinear instabilities of Impulse integrator
Approximate MD integrators






Molecular dynamics (MD) in action
Classical MD
Protein folding
Targeted MOLLY
MUSICO
Applications
Summary
Future work
Acknowledgements
61
Applications of approximate MD

Combining w/ other algorithms


Steered/interactive MD
Monte Carlo (MC)




Escaping from local minima
Quicker exploring conformational space
Coarsening
Large system and/or long simulations



Protein simulations: Folding/Unfolding/Equilibrium
Drug design
Other
62
Approx. MD in drug design


Designing drugs that regulate proteins
Examples: drugs that bind to




Enzymes
Receptors
Ion channels and transporter systems
Evaluation of binding affinity using
approximate MD

Very long simulations and better sampling
63
Approx. MD in drug design (cont.)
64
ProtoMol: a framework for MD
Matthey, et al, ACM Tran. Math. Software (TOMS), submitted
Front-end
Middle layer
back-end
libfrontend
libintegrators
libbase, libtopology
libparallel, libforces
Modular design of ProtoMol (Prototyping Molecular dynamics).
Available at http://www.cse.nd.edu/~lcls/protomol
65
Overview

Introduction





Nonlinear instabilities of Impulse integrator
Approximate MD integrators






Molecular dynamics (MD) in action
Classical MD
Protein folding
Targeted MOLLY
MUSICO
Applications
Summary
Future work
Acknowledgements
66
Summary



Introduction
Nonlinear instabilities (original)
New algorithms

Targeted MOLLY (original)




Larger time steps and speedup
Correct dynamics and structures
MUSICO (original, in progress)
Applications of approx. MD (in progress)
67
Overview

Introduction





Nonlinear instabilities of Impulse integrator
Approximate MD integrators






Molecular dynamics (MD) in action
Classical MD
Protein folding
Targeted MOLLY
MUSICO
Applications
Summary
Future work
Acknowledgements
68
Future work

Short term



Mid term




MUSICO / Krylov solver
Simulation of small proteins in vacuum
Optimization of MUSICO / Krylov solver
Simulation of large proteins in water
Binding affinity in drug design
Long term


Distributed protein folding and drug design
Infrastructure for heterogeneous distributed
computing – similar to folding@home
69
Future work (cont.)

Funding opportunities




NSF
NIH
DOE
Pharmaceutical industry
70
Key references
[1] Izaguirre, Ma, et al. Overcoming instabilities in Verlet-I/r-RESPA with the
mollified impulse method. Vol. 24 of Lecture Notes in Comput. Sci. & Eng.,
pages 146-174, Springer-Verlag, Berlin, New York, 2002
[2] Ma, Izaguirre and Skeel. Verlet-I/r-RESPA/Impulse is limited by nonlinear
instability. SIAM J. Scientific Computing, 2002 (in press).
[3] Ma and Izaguirre. Targeted mollified impulse method for molecular
dynamics. SIAM J. Multiscale Modeling and Simulation, submitted
[4] Matthey, Cickovski, Hampton, Ko, Ma, Slabach and Izaguirre. PROTOMOL,
an object-oriented framework for prototyping novel applications of molecular
dynamics. ACM Tran. Math. Software (TOMS), submitted
[5] Ma, Izaguirre and Skeel. Nonlinear instability in multiple time stepping
molecular dynamics. 2003 ACM Symposium on Applied Computing (SAC’03).
Melborne, Florida. 2002 (in press)
[6] Ma and Izaguirre. Long time step molecular dynamics using targeted
Langevin Stabilization. Accepted by the 2003 ACM SAC’03. Melborne,
Florida. 2002 (in press)
71
Overview

Introduction





Nonlinear instabilities of Impulse integrator
Approximate MD integrators






Molecular dynamics (MD) in action
Classical MD
Protein folding
Targeted MOLLY
MUSICO
Applications
Summary
Future work
Acknowledgements
72
Acknowledgements

People




Resources



Dr. Jesus Izaguirre
Dr. Robert Skeel, Univ. of Illinois at Urbana-Champaign
Dr. Thierry Matthey, University of Bergen, Norway
Hydra and BOB clusters at ND
Norwegian Supercomputing Center, Bergen, Norway
Funding


NSF BIOCOMPLEXITY-IBN-0083653, and
NSF CAREER Award ACI-0135195
73
THE END.
THANKS!
74
Key references
[1] J. A. Izaguirre, Q. Ma, T. Matthey, et al. Overcoming instabilities in Verlet-I/r-RESPA
with the mollified impulse method. In T. Schlick and H. H. Gan, editors, Proceedings of
the 3rd International Workshop on Algorithms for Macromolecular Modeling, Vol. 24
of Lecture Notes in Computational Science and Engineering, pages 146-174, SpringerVerlag, Berlin, New York, 2002
[2] Q. Ma, J. A. Izaguirre, and R. D. Skeel. Verlet-I/r-RESPA/Impulse is limited by
nonlinear instability. Accepted by the SIAM Journal on Scientific Computing, 2002.
Available at http://www.nd.edu/~qma1/publication_h.html.
[3] Q. Ma and J. A. Izaguirre. Targeted mollified impulse method for molecular dynamics.
Submitted to the SIAM Journal on Multiscale Modeling and Simulation, 2003.
[4] T. Matthey, T. Cickovski, S. Hampton, A. Ko, Q. Ma, T. Slabach and J. Izaguirre.
PROTOMOL, an object-oriented framework for prototyping novel applications of
molecular dynamics. Submitted to the ACM Transactions on Mathematical Software
(TOMS), 2003.
[5] Q. Ma, J. A. Izaguirre, and R. D. Skeel. Nonlinear instability in multiple time stepping
molecular dynamics. Accepted by the 2003 ACM Symposium on Applied Computing
(SAC’03). Melborne, Florida. March 2003
[6] Q. Ma and J. A. Izaguirre. Long time step molecular dynamics using targeted Langevin
Stabilization. Accepted by the 2003 ACM Symposium on Applied Computing (SAC’03).
Melborne, Florida. March 2003
[7] M. Zhang and R. D. Skeel. Cheap implicit symplectic integrators. Appl. Num. Math.,
25:297-302, 1997
75
Other references
[8] J. A. Izaguirre, Justin M. Wozniak, Daniel P. Catarello, and Robert D. Skeel. Langevin
Stabilization of Molecular Dynamics, J. Chem. Phys., 114(5):2090-2098, Feb. 1, 2001.
[9] T. Matthey and J. A. Izaguirre, ProtoMol: A Molecular Dynamics Framework with Incremental
Parallelization, in Proc. of the Tenth SIAM Conf. on Parallel Processing for Scientific
Computing, 2001.
[10] H. Grubmuller, H. Heller, A. Windemuth, and K. Schulten, Generalized Verlet algorithm for
efficient molecular dynamics simulations with long range interactions, Molecular Simulations 6
(1991), 121-142.
[11] M. Tuckerman, B. J. Berne, and G. J. Martyna, Reversible multiple time scale molecular
dynamics, J. Chem. Phys 97 (1992), no. 3, 1990-2001
[12] J. A. Izaguirre, S. Reich, and R. D. Skeel. Longer time steps for molecular dynamics. J. Chem.
Phys., 110(19):9853–9864, May 15, 1999.
[13] L. Kale, R. Skeel, M. Bhandarkar, R. Brunner, A. Gursoy, N. Krawetz, J. Phillips, A. Shinozaki,
K. Varadarajan, and K. Schulten. NAMD2: Greater scalability for parallel molecular dynamics.
J. Comp. Phys., 151:283–312, 1999.
[14] R. D. Skeel. Integration schemes for molecular dynamics and related applications. In M.
Ainsworth, J. Levesley, and M. Marletta, editors, The Graduate Student’s Guide to Numerical
Analysis, SSCM, pages 119-176. Springer-Verlag, Berlin, 1999
[15] R. Zhou, , E. Harder, H. Xu, and B. J. Berne. Efficient multiple time step method for use with
ewald and partical mesh ewald for large biomolecular systems. J. Chem. Phys., 115(5):2348–
2358, August 1 2001.
76
Other references (cont.)
[16] E. Barth and T. Schlick. Extrapolation versus impulse in multiple-time-stepping schemes: Linear
analysis and applications to Newtonian and Langevin dynamics. J. Chem. Phys., 1997.
[17] I. Pagonabarraga, M. H. J. Hagen and D. Frenkel. Self-consistent dissipative particle dynamics
algorithm. Europhys Lett., 42 (4), pp. 377-382 (1998).
[18] G. Besold, I. Vattulainen, M. Kartunnen, and J. M. Polson. Towards better integrators for
dissipative particle dynamics simulations. Physical Review E, 62(6):R7611–R7614, Dec. 2000.
[19] R. D. Groot and P. B. Warren. Dissipative particle dynamics: Bridging the gap between atomistic
and mesoscopic simulation. J. Chem. Phys., 107(11):4423–4435, Sep 15 1997.
[20] I. Pagonabarraga and D. Frenkel. Dissipative particle dynamics for interacting systems. J. Chem.
Phys., 115(11):5015–5026, September 15 2001.
[21] Y. Duan and P. A. Kollman. Pathways to a protein folding intermediate observed in a 1microsecond simulation in aqueous solution. Science, 282:740-744, 1998.
[22] M. Levitt. The molecular dynamics of hydrogen bonds in bovine pancreatic tripsin inhibitor
protein, Nature, 294, 379-380, 1981
[23] M. P. Allen and D. J. Tildesley, Computer Simulation of Liquids, Oxford University Press, New
York, 1987.
[24] E. Hairer, C. Lubich and G. Wanner. Geometric numerical integration: structure-preserving
algorithms for ordinary differential equations. Springer, 2002
[25] C. B. Anfinsen. Principles that govern the folding of protein chains. Science, 181, 223-230, 1973
77
Flood of data – genes and proteins
http://www.ncbi.nlm.nih.gov/Genbank/genbankstats.html
78
Flood of data (cont.)
http://us.expasy.org/sprot/relnotes/relstat.html
79
Flood of data (cont.)
PDB Content Growth
http://www.rcsb.org/pdb/holdings.html
80
Flood of data (cont.)
http://us.expasy.org/sprot/relnotes/relstat.html
81
Pseudo-code of an MD simulation
MD Simulation:
(1) Pre-processing: Construct initial configuration of x0, v0 and F0;
(2) loop 1 to number of steps
(a) Update velocities (by a half step)
(b) Update positions (by a full step)
(c) Evaluate forces on each atom
(d) Update velocities (by a half step)
(3) Post-processing
Algorithm 1: Pseudo-code of an MD simulation.
82
Why protein folding?

Huge gap: sequence data and 3D structure data




EMBL/GENBANK, DNA (nucleotide) sequences
15 million sequence, 15,000 million base pairs
SWISSPROT, protein sequences
120,000 entries
PDB, 3D protein structures
20,000 entries
Bridging the gap through prediction

Aim of structural genomics:


“Structurally characterize most of the protein sequences by an efficient
combination of experiment and prediction,” Baker and Sali (2001)
Thermodynamics hypothesis:
Native state is at the global free energy minimum
Anfinsen (1973)
83
Folding: challenging time scale gap

Longest fully solvated:



Duan and Kollman, 1998
1 s single trajectory/100 days Cray T3E
IBM’s Blue Gene project: 1999 -- 2005

Petaflop (1015 flops per second) computer for
folding proteins by 2005
84
Stömer/Verlet/Leapfrog

Discretization of Eq. (1) as
qn1  2qn  qn1  h2 f (qn )


Stömer (astronomy, its higher order variants,
1907, aurora borealis), Verlet (molecular dynamics,
1967), Leapfrog (partial differential equations)
An explicit one-step method:
Vh : (qn , vn )  (qn1 , vn 1 ) :
vn1/ 2  vn 

h
h
f (qn ), qn 1  qn  hvn 1/ 2 , vn 1/ 2  vn 1/ 2  f (qn 1 )
2
2
Second order accurate, symplectic, and time
reversible
85
Verlet-I/r-RESPA/Impulse
Grubmüller,Heller, Windemuth and Schulten, 1991
Tuckerman, Berne and Martyna, 1992



The state-of-the-art MTS integrator
Splitting via switching functions
2nd order accurate, time reversible
half kick: vn 1/ 2  vn  t f slow (qn ) / 2
oscillate: update positions and momentum
using leapfrog (t/2, much smaller time steps)
Algorithm 1. Half step discretization of Impulse integrator
86
Nonlinear resonance of Impulse
Ma, Izaguirre and Skeel (2003)


Approach

Analytical: Stability conditions using only model
parameters for a simple nonlinear problem

Numerical: Long simulations differing only in outer time
steps; correlation between step size and instability
Results: energy growth occurs unless
longest t < 1/3 shortest period.

Unconditionally unstable 3rd order resonance

Flexible waters: outer time step less than 3~3.3fs

Proteins w/ SHAKE: time step less than 4~5fs
87
Nonlinear instability: analytical

Approach:


1-D nonlinear model problem, in the neighborhood of
stable equilibrium
MTS splitting of potential:
U (q)  (2q2 / 2)oscillate  ( Aq2 / 2  Bq3 / 3  Cq4 / 4)kick  O(q5 ).



Analyze the reversible symplectic map
Express stability condition in terms of problem
parameters
Result:



3rd order resonance stable only if “equality” met
4th order resonance stable only if “inequality” met
Impulse unstable at 3rd order resonance in practice
88
Nonlinear: analytical (cont.)

Main result. Let
   2 , where     i , where
  1  hs ' A /(2c ') if c '  0,   0 if c '  0, and
  1  hc ' A /(2s ') if s '/   0,   0 if s '/   0, and
s '  sin(h / 2) and c '  cos(h / 2).
1. (3rd order) Map stable at equilibrium if
B  0, C  0, and unstable if B  0.

Impulse is unstable in practice.

May be stable at the 4th order resonance.
2. (4th order) Map stable if C  0 or C  2hB2 s ' c '/ ,
and unstable if 0  C  2hB2 s ' c '/ .
89
Nonlinear: numerical (cont.)
Fig. 2: Left: flexible melittin protein (PDB entry 2mlt). Right: Energy drift from 10ns
MD simulation of flexible 2mlt at room temperature revealing 3:1 nonlinear resonance
(at 3, 3.27 and 3.78 fs).
90
Nonlinear: numerical (cont.)
Energy drift from 500ps SHAKE-constrained MD simulation of explicitly solvated
2mlt at room temperature revealing combined 4:1 and 3:1 nonlinear resonance.
91
B-spline MOLLY (cont.)
Total energy(Kcal/mol) vs. time (fs)
Relative drift of toal energy:
< 1.0% for t = 5.0 fs,
< 3.5% for t = 6.0 fs.
92
TM: background (cont.)

Dissipative particle dynamics:



Coarsening
Pair-wise random/dissipative force on elements
(thus momentum preserving)
Time reversible if using self-consistent
scheme (self-consistent dissipative leapfrog,
SCD-leapfrog)
Vi
FRi, FDi
Vj
FRj = - FRi
FDj = - FDi
93
Self-consistent dissipative leapfrog
Pagonabarraga, Hagen and Frenkel, 1998
1. 1/2 kick: vi  vi   t ( Fi C  Fi D  Fi R ) /(2m),
2. Oscillate: ri  ri  vi t
3. Evaluate: Fi C (rj ), Fi D (rj , v j )   (vij  eij )eij , Fi R (rj )   eijij /  t

4. 1/2 kick (a): vi  vi   t ( Fi C  Fi R ) /(2m)

5. 1/2 kick (b): vi  vi   t Fi D /(2m)
6. Evaluate: Fi D (rj , v j )
Iterate steps 5 and 6 k times to obtain consistency of Fi D and vij .
Algorithm 2. One step of the self-consistent dissipative leapfrog discretization
94
Binding effectiveness computation

Structure based drug design



First, study molecular details of a function or
condition
Then, find a protein target for a disease
Third, design the drug that binds to the
protein target



Determine protein structure (experiments,
prediction)
Search drug databases – screening
Effectiveness of binding using molecular dynamics
95
Small proteins
Beta hairpin protein
Villin head piece protein
http://www.stanford.edu/group/pandegroup/Cosm/phase2.html
96
Targeted MOLLY discretization
One step of Targeted MOLLY:
(1) Update velocities w/ mollified “slow” forces (by a half long-step)
(2) Propagate positions and velocities w/ “fast” forces and pairwise
targeted Langevin coupling (by a full long-step)
(3) Do a time-averaging of positions using “fastest” forces and
evaluate the mollified “slow” force
(4) Update velocities w/ mollified “slow” forces (by a half long-step)
Algorithm 1: Pseudo-code of one step of Targeted MOLLY
97
TM: one step of discretization
1
MOLLIFIED KICK: pn1  pn 1  t F slow / 2,
2
OSCILLATE: Propagate qn 1 and pn1 by integrating
q  M 1 p, p  F fast (q)  F R (q)  F D (q)
for an interval t to get qn and p -n using SCD-leapfrog.
_
A TIME AVERAGING: Calculate time-averaged positions q n and
_
a Jacobian matrix, J n   q q n using only the fastest forces, F reduced (q).
_
1

slow
MOLLIFIED KICK: pn  pn  t F (q n ) / 2.
2
Algorithm 2. One step of the Targeted MOLLY (TM) discretization
98
TM: canonical ensemble sampling
H ( x, v)  v2 / 2  2 x2 / 2  ( x 1)4 / 4, with x(0)  0, v(0)  1, 2  100.
Fig. 3: Phase diagram (left) and velocity distribution (right) of the FPU problem [Hairer, C. Lubich
and G. Wanner 2002] using TM with Langevin coupling coefficient of 0.01 showing TM recovers
canonical ensemble after 2E07 steps with initial conditions x(0) = 0, v(0) = 1.
99
MD in action: coalescence
Coalescence of water droplets w/ surfactants in gas phase.
http://www.cse.nd.edu/~lcls/protomol
100
ProtoMol: object oriented framework
for MD Matthey and Izaguirre (2001), Matthey, et al (2003)
Kale, Skeel, et al (1999)
http://www.cse.nd.edu/~lcls/protomol
101
ProtoMol (cont.)
102
Approx. MD: ion channel gating
KcsA potassium channel, courtesy of the Theoretical and Computational Biophysics
Group at UIUC, http://www.ks.uiuc.edu/Research/smd_imd/kcsa/
103
MD: Verlet Method
Energy function:
used to determine the force on each atom:
Newton’s equation represents a set of N second order differential
equations which are solved numerically at discrete time steps to
determine the trajectory of each atom.
Advantage of the Verlet Method:
requires only one force evaluation per timestep
104
What is a Forcefield?
In molecular dynamics a
molecule is described as a
series of charged points
(atoms) linked by springs
(bonds).
To describe the time evolution of bond lengths, bond angles and
torsions, also the nonbond van der Waals and elecrostatic
interactions between atoms, one uses a forcefield.
The forcefield is a collection of equations and associated constants
designed to reproduce molecular geometry and selected properties
of tested structures.
105
Energy Terms Described in the
CHARMm forcefield
Bond
Dihedral
Angle
Improper
106
Energy Functions
Ubond = oscillations about the equilibrium bond length
Uangle = oscillations of 3 atoms about an equilibrium angle
Udihedral = torsional rotation of 4 atoms about a central bond
Unonbond = non-bonded energy terms (electrostatics and Lenard-Jones)
107
Multigrid I
108
Multigrid II
109
Results (10-4 rPE)
110
Simulation Results for Melittin



PME requires about 3% of the CPU time (17
days 20 hours) when measured against
Ewald
MG in pbc requires only about 1%
MG is about 66% faster than PME
111