free_energy_methods

Download Report

Transcript free_energy_methods

Free energy calculations
General methods
Free energy is the most important quantity that
characterizes a dynamical process.
Two types of free energy calculations:
1. Path independent methods for calculation of relative
binding free energies (e.g. free energy perturbation
(FEP) , thermodynamic integration(TI).
2. Path dependent methods for calculation of absolute
binding free energies ,e.g. umbrella sampling (US) with
weighted histogram analysis method (WHAM), steered
MD with Jarzynski’s equation (JE)
1
Example : Binding of a K+ ion to gramicidin A (gA)
Initial state (A): K+ ion in bulk, a water molecule at the binding site.
Final state (B): K+ ion at the binding site, water in place of the ion.
In the path independent method, we calculate the binding energy of the
K+ ion from the free energy difference between the two states:
W
gA
K+
+ K+
gA
(bulk)
A
+ W
(bulk)
B
G ( A  B )  G ( B )  G ( A)
2
In the path dependent method, we first choose a continuous path
from the binding site to bulk water (reaction coordinate). In the case
of gramicidin, the channel axis is the obvious choice for the reaction
coordinate. The free energy profile of the K+ ion along this path is
calculated using a method such as umbrella sampling. The free energy
of binding is given by the difference in free energy between the
binding site and the bulk.
K+
K+
gramicidin
B
bulk
A
G ( A  B )  G ( B )  G ( A)
3
Free energy perturbation (FEP)
Free energy differences can be calculated relatively easily and several
methods have been developed for this purpose. The starting point for
most approaches is Zwanzig’s perturbation formula for the free energy
difference between two states A and B:
 G ( A  B )  G B  G A   kT ln exp
(H B  H
 G ( B  A )  G A  G B   kT ln exp
(H
A)/
kT
A
AH B
) / kT
B
G ( A  B )   G ( B  A)
The equality should hold if there is sufficient sampling.
However, if the two states are not similar enough, this is difficult to
achieve and there will be a large hysteresis effect (i.e. the forward and
backward results will be very different).
4
Derivation of the perturbation formula
From statistical mechanics, the Helmholtz free energy is given by
(we will assume it is the same as Gibss free energy and use G for it)
G  kT ln
(Z: partition function)
1
Z
 kT ln
e
H / kT
e
e
 H / kT
 H / kT
dpdq
 kT ln e
H / kT
dpdq
G ( A  B )  G B  G A
 kT ln e
H B / kT
 kT ln e
H
B
  kT ln exp
(H B  H
A
/ kT
A
A)/
kT
A
Where it is assumed that the states A and B are similar
5
FEP with alchemical transformation
To obtain accurate results with the perturbation formula, the energy
difference between the states should be < 2 kT, which is not satisfied
for most biomolecular processes. To deal with this problem, one
introduces a hybrid Hamiltonian
H ( l )  (1  l ) H A  l H B
and performs the transformation from A to B gradually by changing
the parameter l from 0 to 1 in small steps. That is, one divides [0,1]
into n subintervals with {li, i = 0, n}, and for each li value, calculates
the free energy difference from the ensemble average
 G ( l i  l i 1 )   kT ln exp[  ( H ( l i 1 )  H ( l i ))] / kT
li
6
The total free energy change is then obtained by summing the
contributions from each subinterval
n 1
 G ( 0  1) 
  G ( l i  l i 1 )
i0
The number of subintervals is chosen such that the free energy
change at each step is < 2 kT, otherwise the method may lose its
validity. Points to be aware of:
1.
Most codes use equal subintervals for li. But the changes in Gi
are usually highly non-linear. One should try to choose li such
that Gi remains around 1-2 kT for all values.
2. The simulation times (equilibration + production) have to be
chosen carefully. It is not possible to extend them in case of
non-convergence (have to start over).
7
Thermodynamic integration (TI)
Another way to obtain the free energy difference is to integrate
the derivative of the hybrid Hamiltonian H(l):
H
dG
dl

 l
e
e
 H / kT
 H / kT
dpdq

dpdq
1
H
l
G 
l

0
H (l )
l
dl
l
This integral is evaluated most efficiently using a Gaussian quadrature.
In typical calculations for ions, 7-point quadrature is sufficient.
(But check that 9-point quadrature gives the same result for others)
The advantage of TI over FEP is that the production run can be
extended as long as necessary and the convergence of the free energy
can be monitored (when the cumulative G flattens, it has converged).
8
Example: Free energy change in mutation of a ligand
A very common question is how a mutation in a ligand (or protein)
changes the free energy of the protein-ligand complex.
 G B   G A   G bs ( A  B )   G bulk ( A  B )
+
A
GA
Gbulk(AB)
+
GB
Thermodynamic cycle
Gbs(AB)
B
9
Applications
1.
Ion selectivity of potassium channels
 G sel ( K




 Na )   G b ( Na )   G b ( K )
  G bs ( K


 Na )   G bulk ( K


 Na )
2. Selectivity of amino acid transporters (e.g. glutamate transporter)
 G sel ( Asp  Glu )   G b ( Glu )   G b ( Asp )
  G bs ( Asp  Glu )   G bulk ( Asp  Glu )
3. Free energy change when a sidechain is mutated in a bound ligand.
Similar calculation as above. Important in developing drug leads
from peptides.
10
2. Path dependent methods
Consider the previous example of binding of a K+ ion to the gramicidin
channel. In the path dependent method, K+ ion is moved from bulk to
the binding site in small steps and the free energy profile, W(z)
(also called potential of mean force or PMF), is constructed.
The relative binding free energy is given by
 G ( bulk  bs )  W ( bs )  W ( bulk )
The binding constant and the absolute binding free energy are
determined from the PMF by invoking a 1D approximation
K eq 
e
Vol
 W ( r ) kT
bs
d r  R
3
2
e
 W ( z ) kT
dz
bulk
 G bind   kT ln  K eq C 0 )
11
Calculation of PMF from umbrella sampling
One samples the ligand position along a reaction coordinate and
determines the potential of mean force (PMF) from the Boltzmann eq.
 ( z)   ( z0 ) e
 [W ( z )  W ( z 0 )] / kT
  (z)
 W ( z )  W ( z 0 )  kT ln 
  ( z0 )




Here z0 is a reference point, e.g. a point in bulk where W vanishes.
In general, a particle cannot be adequately sampled at high-energy
points. To counter that, one introduces harmonic potentials, which
restrain the particle at desired points, and then unbias its effect.
For convenience, one introduces umbrella potentials at regular intervals
along the reaction coordinate (e.g. ~0.5 Å). The PMF’s obtained in each
interval are unbiased and optimally combined using the Weighted
Histogram Analysis Method (WHAM).
12
Points to consider in umbrella sampling
Two main parameters in umbrella sampling are the force constant, k
and the distance between windows, d. In bulk, the position of the
ligand will have a Gaussian distribution given by
P(z) 
1
2 
e
 ( z  z0 )
2
2
2
,
z  z0 ,
 
k BT / k
The overlap between two Gaussian distributions separated by d
% overlap  1  erf ( d / 8 )
The parameters should be chosen such that 10% > % overlap > 5%
If the overlap is too small, PMF will have discontinuities
If it is too large, simulations are not very efficient.
Steered MD (SMD) simulations and Jarzynski’s equation
Steered MD is a more recent method where a harmonic force is
applied to an atom on a peptide and the reference point of this force
is pulled with a constant velocity. It has been used to study unfolding
of proteins and binding of ligands. The discovery of Jarzynski’s
equation in 1997 enabled determination of PMF from SMD, which has
boosted its applications.
Jarzynski’s equation:
Work done by
the harmonic force
e
  F / kT
 e
 W / kT
f
W 
 F .ds ,
F  k [ r  ( r0  v t )]
i
This method seems to work well in simple systems and when G is large
but beware of its applications in complex systems!
14
Beware of: 1) Problems with force fields
The force fields that are commonly used in MD simulations (e.g.,
CHARMM, AMBER, GROMOS) neglect the polarization interaction.
While the effects of induced polarization have been included in a mean
field sense by boosting the partial charges, such an approximation is
expected to work only in the environment where the force field has
been optimized but not in a different situation.
The most relevant case is the force fields for proteins, which are
optimized for bulk water. One has to be wary of using the same
force fields for membrane proteins because lipid molecules have a very
different polarization characteristic compared to water (dielectric
constants are 2 and 80, respectively)
Other cases that require caution are: interfaces and highly charged
15
ions.
2) Problems with sampling
At zero temperature, the potential function U is sufficient to
characterize the system completely. At room temperature, the
fundamental quantity is the free energy, F = U  TS, which creates the
sampling problem. Example: F=  24, U=  41, and TS=  17 (kJ/mol)
for liquid water at STP.
Statistical weight:
P ( x) ~ e
 U ( x ) / kT
But if S2 >> S1
we may have
F2 < F1
16
Examples from gramicidin A channel
• Dimer formed by two righthanded β helices
• Each monomer consists of 16
amino acid residues
• Pore is 26 Å long, 4 Å in diamet.
• Structure is stabilized by
hydrogen bonds
• Occupied by a single-file water
chain (~7)
• Water dipoles are aligned with
the channel axis
• Conducts monovalent cations at
diffusion rates (divalent ions bind
and block)
17
1. Potential energy profile for a K+ ion in gramicidin A
BD simulations – inverting data gives | MD simulations – Pot. mean force
Uw = 8 kT, Ub = 5 kT,
Uw = 5 kT, Ub = 22 kT
18
Free energy calculations
Free energy differences are calculated using the thermodynamic
integration (TI) and free energy perturbation (FEP) methods.
e.g. a K+ ion in bulk is translocated to the gA center while the water
molecule at that position is translocated to the ion’s position.
Two step process (via a neutral water, W0) to minimize fluctuations:
W  W0  K+ (gA center)
K+  W0  W (bulk)
FE (kT) G+
To check hysteresis effects, free energy TI
differences are calculated both in
forward (G+) and backward (-G-)
directions.
FEP
PMF
G_
Gav
11.2
12.2
11.7
13.2
14.0
13.6
13.0
19
Free energy of translocating a K+ ion to the gA center
Running average
for 700 ps
Solid: forward
(bulk to gA)
Dashed: backward
1. Convergence:
The free energy plot
should become flat
2. No hysteresis:
The two results should
agree within 1 kcal/mol
Ab initio simulations in
gramicidin show the
importance of polarization int.
Distribution of water dipole
moments in bulk and in gramicidin
In the presence of a K+ ion,
the dipole moment of
hydration waters decreases
in bulk but increases in the
gramicidin A channel.
21
Electrostatic energy of a K+ ion + 6 waters
22
PMF results for K+, Ca2+ and Cl ions
Each window is
simulated for 400 ps
Well depths:
Ub(K) ~ 7 kT
Ub(Ca) ~ 2 kT
Ca2+ binding to gA and
blocking of K+ ions
cannot be explained.
*** Problems with
divalent ions ***
23
Lessons from the gramicidin simulations
1.
Current force fields which ignore polarization are not expected to
work in narrow pores where water and ions form a single file.
Ab initio MD calculations indicate that hydration waters of a K+ ion
are more polarized in gA than in bulk.
2. Hydration waters around a divalent ion are more polarized than
those of a monovalent ion.
Example: dipole moment of water from ab initio calculations:
Bulk water: 3.0 Debye
Hydration shell of K+ ion: 2.8 Debye
Hydration shell of Ca2+ ion: 3.4 Debye
Thus the current force fields, which are optimised for monovalent
ions, cannot work well for divalent ions.
24
2. Sampling problem in a simple vs complex system:
Test of Jarzynski’s Equation
Carbon nanotube
Gramicidin A channel
25
Comparison of K+ ion PMF’s obtained from umbrella sampling & WHAM
and from Jarzynski’s equality using steered MD simulations
Carbon nanotube
Gramicidin A channel
v(A/ns)
26
Sampling is more difficult in non-equilibrium methods
1.
In a carbon nanotube, interaction of the K+ ion (and the hydration
waters) with the C atoms on the wall are short range, hence
equilibration of the system is quite fast.
In such a situation, Jarzynski’s Equation works as well as umbrella
sampling, and because it is simpler to implement, it would be the
method of choice
2. In the gramicidin channel, the K+ ion (and the hydration waters)
interact with the charged atoms on the protein wall. Because
Coulomb interaction is long range, equilibration takes more time.
In such cases, Jarzynski’s Equation is not very reliable, and umbrella
sampling should be preferred for accurate results.
27
Equilibration and convergence issues in PMF calculations
•
Finite resources means we need to make optimal choices for
equilibration and production times in free energy calculations.
•
Equilibration is the initial simulation data, where the system is
still evolving (not equilibrated yet) and must be thrown away.
Choosing it too short will blemish the result and too long will
waste computing time.
•
During production, the system is fluctuating around equilibrium.
It must be run long enough to allow the system to sample all
energetically important states. Otherwise the calculations will
not be accurate. Convergence tests can be used for this purpose
but note that there are no absolute criteria that one can use
(running longer is the only choice if you are in doubt).
28
Example: ab initio calculation of PMF’s for Na-Cl and Ca-Cl
The ion-ion potentials in force fields are determined from combination
rules with no direct experimental input . This is not satisfactory and
any guidance from ab inito calculations would be very useful.
In the examples below the PMF’s for the dissociation of Na-Cl and
Ca-Cl ion pairs are calculated from ab initio MD (Car-Parrinello MD)
simulations using the constraint-force method (faster than umbr. samp).
The average force needed to keep the ions at a fixed distance, r, is
calculated for a range of r values at 0.1 - 0.2 A intervals and these are
integrated to determine the PMF.
Note that ion-water dynamics is fast which makes these picosecond
ab initio calculations feasible. They would not be feasible for proteins.
Example: PMF for dissociation of Na-Cl
Total run is 6 ps. The data is divided into 1 ps blocks to check equilibration
Here 2 ps of data are dropped and the PMF is obtained from the last 4 ps.
1-2 ps equilibration
3-6 ps production
(black line)
Another way to check the equilibration is to drop successively more data
for equilibration and see if the result changes.
r = 3.1 A
r = 3.9 A
r = 4.7 A
Comparison of ab initio and classical PMF’s for Na-Cl
None of the classical force fields can match the ab initio PMF. In
particular AMBER has a deep contact min. which leads to crystallization
Accelerated MD for speeding up convergence
Using biasing potentials in the low energy regions of the potential
energy surface, barriers can be lowered, leading to faster convergence.
For the Na-Cl PMF considered here, this leads to ~4-fold speed up.
Example: PMF for dissociation of Ca-Cl
Convergence could not be obtained after 23 ps of ab initio simulation.
Inspection of the forces shows large variations according n(Ca).
Ca hydration
numbers
n(Ca) = 5 r<3.7
n(Ca) = 5 or 6
for r 3.7 – 4.9
n(Ca) = 6 r>4.9
Ca-Cl PMF and its dependence on n(Ca)
The PMF with n(Ca) = 5 in the intermediate region is unphysical
Switching to n(Ca) = 6 yields a reasonable PMF .
Comparison of ab initio and classical PMF’s for Na—Cl
The CHARMM force field does a better job than that of Dang & Smith
but it still needs to be improved.
Example: PMF for a K+ ion in the Kv1.2 potassium channel
The trigger for permeation of K+
ions is the entry of the K+ ion at
cavity to the S4 binding site.
To find out whether the K+ ion can
bind to S4 while S1 – S3 are
occupied, or they have to move to
S0 - S2 to enable binding, two
PMF’s are constructed with the
final states S1 – S3 – S4 and
S0 – S2 – S4.
The first check in PMF
calculations is whether there
are sufficient overlaps
between the neighbouring
windows.
This can be achieved by
visually checking the density
plots for all the windows (top)
or by directly calculating the
overlaps and plotting them in
a bar graph (bottom).
Next decide on the equilibration
time and start collecting density
data. How long do we run?
An efficient way to decide is to
run in small blocks and check for
convergence in the accumulated
data.
In the example here, the total
run is 600 ps and 100 ps is
dropped for equilibration.
To show convergence, data are
added in 100 ps blocks and a PMF
is constructed from the
accumulated data at every 100 ps
One acumulates a great deal of
trajectory data during the PMF
calculations, and it would be pity
if all one extracts from it is the
reaction coordinate of the ion.
A detailed picture of the
reaction process can be
obtained using visualisation
methods (e.g. making a video!)
Here is an example showing that
in the S1 – S3 – S4 PMF, the
cavity ion does not trap the
water at S4 as it moves in.
The K+ ions in the filter move
together, e.g., from S1 – S3 to
S0 – S2 or S2 – S4.
This can be studied by
constructing the PMF for the
center of mass of the ions.
But how do we know that this is
a reliable reaction coordinate?
A simple way to show this is to
plot the distribution of ion-ion
distances and show that it
remains Gaussian as the pair
moves across the filter (as if
they are connected by a spring)
Example: PMF for binding of charybdotoxin to K+ channel
From the previous examples, we have seen that ions equilibrate quite
fast (~100 ps) and < 1 ns production run is sufficient for PMF.
For complex ligands, the situation
is obviously more complicated.
For one thing, the ligand may be
distorted, which will lead to
erroneous results.
One also requires much longer
equlibration of the system
(typically > 1 ns), and longer
production runs ( > 1 ns).
Convergence of the toxin PMF
Force constant
k=20 kcal/mol/A2
Umbrella windows
at 0.5 A
Each color
represents 400 ps
of sampling.
The first 1.2 ns is
dropped for
equilibration and
PMF is obtained
from the last 2 ns
(black line)
Equilibration and convergence issues in FEP & TI
1. FEP calculations
•
In FEP, one has to decide on the number of windows and the
equilibration time in advance. The windows are created serially,
so if the equilibration time is inadequate, it has to be repeated
using longer equilibration time and the initial data are wasted.
•
A second potential problem in FEP calculations is the requirement
that Gi remains around 1-2 kT for all windows. Because the
change in the free energy is nonlinear, it is very difficult to guess
the number of windows one should use. For the same reason,
using fixed intervals is not optimal. Exponentially spaced
intervals would reduce the required number of windows by half.
44
Example: Na+ binding energy in glutamate transporter
Window G(Na+; b.s. bulk)
40 eq.
22.9
60 eq.
26.3
65 exp.
27.1
Free energy change G at each step of FEP calculation
Exponential versus equal spacing for l
The interval [0, 0.5] is mapped to an exponential for 40 windows.
(Fold it over to get the interval [0.5, 1] )
exp.
equal
2. TI calculations
•
In TI , one only need to specify the number of windows in
advance. The data can be divided into equilibration and
production parts later. Moreover, one can continue accumulating
data if there is a problem with convergence, thus there is no
wastage of data.
•
Convergence can be monitored by plotting the running average of
the free energy. Flattening out of the curve is usually taken as a
sign for convergence.
•
Because small number of windows are used in TI, equilibration may
prove difficult in some systems. An initial FEP calculation with
large number of windows can resolve this problem (choose the TI
windows from the nearest FEP window).
48
Example: Na+ and Asp binding energies in glut. transporter
TI calculation of the
binding free energy of
Na+ ion to the binding
site 1 in Gltph.
Integration is done using
Gaussian quadrature
with 7 points.
Thick lines show the
running averages, which
flatten out as the data
accumulate. Thin lines
show averages over 50
ps blocks of data.
Asp binding energy in glutamate transporter
TI calculation of the
binding free energy of
Asp to the binding site
in Gltph.
Asp is substituted with
5 water molecules.
First 400 ps data
account for equilibration
and the 1 ns of data are
used in the production.