Solving NMR Structures II: Calculation and evaluation

• What NMR-based (solution) structures look like
– the NMR ensemble
– inclusion of hydrogen coordinates
• Methods for calculating structures
– distance geometry, restrained molecular dynamics, simulated annealing
• Evaluating the quality of NMR structures
– resolution, stereochemical quality, restraint violations, etc.
NMR data do not uniquely define a 3D protein structure (single set of
coordinates)
• Restraints are ranges of allowed distances, angles, etc. rather than
single values, reflecting the fact that the experimental data contain
uncertainties both in measurement and interpretation.
• Only a limited number of the possible restraints are observable
experimentally, due to peak overlap/chemical shift degeneracy, lack of
stereospecific assignments, etc.
• The view of protein structure as a single set of atomic coordinates may
itself be physically unrealistic: proteins are dynamic molecules.
The NMR Ensemble
• NMR methods do not calculate a single structure, but rather repeat a
structure calculation many times to generate an ensemble of structures.
• The structure calculations are designed to thoroughly explore all regions of
conformational space that satisfy the experimentally derived restraints.
• At the same time, they often impose some physical reasonableness on the
system, such as sensible bond angles and distances and proper stereochemistry.
• The ideal result is an ensemble which
A. satisfies all the experimental restraints (minimizes violations)
B. at the same time accurately represents the full permissible conformational
space under the restraints (maximizes RMSD between ensemble members)
C. looks like a real protein
The NMR Ensemble
The fact that NMR structures are reported as ensembles gives them a “fuzzy”
appearance which is both informative and sometimes annoying.
[Figure: an ensemble of 25 structures for Syrian hamster prion protein (only
the backbone is shown). Liu et al. Biochemistry (1999) 38, 5362.]
NMR structures include hydrogen coordinates
• X-ray structures do not generally include hydrogen atoms in atomic
coordinate files, because the heavy atoms dominate the diffraction
pattern and the hydrogen atoms are not explicitly seen.
• By contrast, NMR restraints such as NOE distance restraints and
hydrogen bond restraints often explicitly include the positions of
hydrogen atoms. Therefore, these positions are reported in the PDB
coordinate files.
Methods for structure calculation
• distance geometry (DG)
• restrained molecular dynamics (rMD)
• simulated annealing (SA)
• hybrid methods
Starting points for calculations
• To get the most unbiased, representative ensemble, it is wise to start
the calculations from a set of randomly generated starting structures.
• Alternatively, in some methods the same initial structure is used for
each trial structure calculation, but the calculation trajectory is pushed
in a different initial direction each time using a random-number
generator.
DG--Distance geometry
• In distance geometry, one uses the NOE-derived distance restraints to
generate a distance matrix, which one then uses as a guide in
calculating a structure.
• Structures calculated from distance geometry will generally have the
correct overall fold but usually have poor local geometry (e.g. improper
bond angles, distances).
• Hence distance geometry must be combined with some extensive
energy minimization method to generate physically reasonable
structures.
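To make the idea of a restraint-derived distance matrix a little more concrete, here is a minimal sketch of one step that distance geometry programs commonly perform before building coordinates: tightening the upper-bound matrix with the triangle inequality ("bound smoothing"). The slide does not describe this step, so treat it as an illustrative assumption; the function name, the atom indices and the bound values are all hypothetical.

```python
def smooth_upper_bounds(n_atoms, upper_bounds):
    """Fill in and tighten upper distance bounds using the triangle inequality.

    upper_bounds maps (i, j) atom-index pairs to an upper bound in angstroms.
    Returns a symmetric n_atoms x n_atoms matrix as a list of lists.
    Pairs with no restraint start at infinity (no information).
    """
    inf = float("inf")
    u = [[0.0 if i == j else inf for j in range(n_atoms)] for i in range(n_atoms)]
    for (i, j), bound in upper_bounds.items():
        u[i][j] = u[j][i] = min(u[i][j], bound)

    # Floyd-Warshall style pass: d(i, j) can never exceed d(i, k) + d(k, j).
    for k in range(n_atoms):
        for i in range(n_atoms):
            for j in range(n_atoms):
                via_k = u[i][k] + u[k][j]
                if via_k < u[i][j]:
                    u[i][j] = via_k
    return u


# Hypothetical restraints (illustration only): three upper bounds among four atoms.
bounds = {(0, 1): 3.0, (1, 2): 4.0, (0, 3): 5.0}
smoothed = smooth_upper_bounds(4, bounds)
print(smoothed[0][2])  # 7.0, implied by 0-1 and 1-2
print(smoothed[2][3])  # 12.0, implied by 2-1, 1-0 and 0-3
```

Even a few restraints put limits on atom pairs that were never observed directly, which is why a sparse restraint list can still guide the overall fold.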
rMD--Restrained molecular dynamics
• Molecular dynamics involves computing the potential energy V as a
function of the atomic coordinates. Usually this is defined as
the sum of a number of terms:
$V_{\mathrm{total}} = V_{\mathrm{bond}} + V_{\mathrm{angle}} + V_{\mathrm{dihedr}} + V_{\mathrm{vdW}} + V_{\mathrm{coulomb}} + V_{\mathrm{NMR}}$
• The first five terms here are “real” energy terms corresponding to
such forces as van der Waals and electrostatic repulsions and
attractions, and the cost of deforming bond lengths and angles. These
come from some standard molecular force field like CHARMM
or AMBER.
• The NMR restraints are incorporated into the $V_{\mathrm{NMR}}$ term, which is
a “pseudoenergy” or “pseudopotential” term included to
represent the cost of violating the restraints.
Pseudo-energy potentials for rMD
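• Generate “fake” energy potentials representing the cost of violating the
distance or angle restraints. Here’s an example of a distance restraint
potential:
$$
V_{\mathrm{NOE}} =
\begin{cases}
K_{\mathrm{NOE}}\,(r_{ij} - r_{ij}^{u})^{2} & \text{if } r_{ij} > r_{ij}^{u} \\
0 & \text{if } r_{ij}^{l} \le r_{ij} \le r_{ij}^{u} \\
K_{\mathrm{NOE}}\,(r_{ij} - r_{ij}^{l})^{2} & \text{if } r_{ij} < r_{ij}^{l}
\end{cases}
$$
where $r_{ij}^{l}$ and $r_{ij}^{u}$ are the lower and upper bounds
of our distance restraint, and $K_{\mathrm{NOE}}$ is some
chosen force constant, typically ~250 kcal mol$^{-1}$ nm$^{-2}$.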
So it’s somewhat permissible to violate restraints but it raises V
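The distance restraint potential above can be written as a short function. This is only a sketch of the flat-bottomed form given on the slide; the function name and the example bounds (0.18 to 0.50 nm) are made up for illustration, and the default force constant is the "typical" value quoted above.

```python
def noe_pseudo_energy(r, lower, upper, k_noe=250.0):
    """Flat-bottomed harmonic pseudo-energy for a single distance restraint.

    r, lower and upper are in nm; k_noe is in kcal mol^-1 nm^-2.
    Returns 0 inside the allowed range and a quadratic penalty outside it.
    """
    if r > upper:
        return k_noe * (r - upper) ** 2
    if r < lower:
        return k_noe * (r - lower) ** 2
    return 0.0


# Hypothetical restraint with bounds 0.18-0.50 nm (illustration only).
print(noe_pseudo_energy(0.35, 0.18, 0.50))  # inside the bounds: 0.0
print(noe_pseudo_energy(0.60, 0.18, 0.50))  # 0.1 nm too long: about 2.5 kcal/mol
```

Because the penalty grows with the square of the violation, small violations cost little while large ones are strongly disfavored.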
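Example of NOE pseudopotential
[Figure: plot of $V_{\mathrm{NOE}}$ against $r_{ij}$. The potential is 0 between
$r_{ij}^{l}$ and $r_{ij}^{u}$ and rises steeply with the degree of violation
outside those bounds.]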
SA--Simulated annealing
• SA is essentially a special implementation of rMD and uses
similar potentials, but it raises the temperature of the
system and then cools it slowly so as not to get trapped in local
energy minima.
• SA is very efficient at locating the global minimum of the target
function.
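The heat-then-cool idea can be illustrated with a toy example. Note the substitution: the sketch below uses Metropolis Monte Carlo moves on a made-up one-dimensional double-well function rather than molecular dynamics on a real force field, so it shows only the annealing schedule, not how structure calculation programs implement SA. All names and parameter values are assumptions chosen for illustration.

```python
import math
import random


def anneal(energy, x0, t_start=5.0, t_end=0.01, cooling=0.95,
           steps_per_temp=50, step_size=0.5, seed=0):
    """Toy simulated annealing: start hot, cool slowly, keep the best point seen."""
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    best_x, best_e = x, e
    t = t_start
    while t > t_end:
        for _ in range(steps_per_temp):
            trial = x + rng.uniform(-step_size, step_size)
            e_trial = energy(trial)
            # Metropolis rule: always accept downhill moves; accept uphill
            # moves with a probability that shrinks as the temperature drops.
            if e_trial <= e or rng.random() < math.exp(-(e_trial - e) / t):
                x, e = trial, e_trial
                if e < best_e:
                    best_x, best_e = x, e
        t *= cooling  # slow cooling
    return best_x, best_e


def toy_energy(x):
    """Double well: a local minimum near x = +1 and a deeper one near x = -1."""
    return (x * x - 1.0) ** 2 + 0.3 * x


start = 1.0  # begin in the shallower well
best_x, best_e = anneal(toy_energy, start)
print(best_x, best_e)
```

At high temperature the walker can cross the barrier between the two wells; as the temperature falls, uphill moves become rare and the search settles into a minimum.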
Dealing with ambiguous restraints
• It is often not possible to tell which atoms are involved in a NOESY
crosspeak, either because of a lack of stereospecific assignments or
because multiple protons have the same chemical shift.
• Sometimes an ambiguous restraint is included but is expressed
ambiguously in the restraint file, e.g. 3 HA --> 6 HB#, where the #
wildcard indicates that the beta protons of residue 6 are not
stereospecifically assigned. This is quite commonly done for
stereochemical ambiguities.
• It is also possible to leave ambiguous restraints out and then try to
resolve them iteratively using multiple cycles of calculation. This is
often done for restraints that involve more complicated ambiguities, e.g.
3 HA --> 10 HN, 43 HN, or 57 HN, where three amides all have the same
shift.
• One can also make stereospecific assignments iteratively using what are
called floating chirality methods.
Example of resolving an ambiguity during structure calculation
[Figure: atom A (9.52 ppm) and atoms B and C (both 4.34 ppm), labeled with
the range of interatomic distances observed in the trial ensemble: 9-11 Å
for A to B and 3-4 Å for A to C.]
• Due to resonance overlap between atoms B and C, an NOE crosspeak between
9.52 ppm and 4.34 ppm could be A to C or A to B, so this restraint is
ambiguous.
• But if an ensemble generated with this ambiguous restraint left out shows
that A is never close to B, then the restraint must be A to C.
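The reasoning in this example can be sketched as a simple filter over the trial ensemble. This is a hedged illustration, not how any particular program does it: the function name is hypothetical, the distances are invented to match the ranges in the example, and the 5 Å cutoff for "close enough to give an NOE" is an assumption.

```python
def resolve_ambiguous_restraint(candidate_distances, noe_cutoff=5.0):
    """Return the candidate atoms that could plausibly explain the crosspeak.

    candidate_distances maps each candidate atom label to the list of
    distances (in angstroms) from the reference atom, one per trial structure.
    A candidate is kept if it comes within noe_cutoff in at least one structure.
    """
    return [name for name, dists in candidate_distances.items()
            if min(dists) <= noe_cutoff]


# Distances from atom A in a hypothetical three-member trial ensemble.
ensemble_distances = {
    "B": [9.0, 10.2, 11.0],  # never close to A
    "C": [3.0, 3.6, 4.0],    # always close to A
}
print(resolve_ambiguous_restraint(ensemble_distances))  # ['C']
```

If exactly one candidate survives, the restraint can be assigned and added to the next round of calculation; if several survive, it stays ambiguous.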
Iterative structure calculation with assignment of ambiguous restraints
• Start with some set of unambiguous NOEs and calculate an ensemble.
• There are programs such as ARIA, with automatic routines for iterative
assignment of ambiguous restraints. The key to success is to make
absolutely sure the restraints you start with are right!
source: http://www.pasteur.fr/recherche/unites/Binfs/aria/
Acceptance criteria: choosing structures for an ensemble
• It is typical to generate 50 or more trial structures, but not all will converge
to a final structure that is physically reasonable or consistent with
the experimentally derived NMR restraints. We want to throw such
structures away rather than include them in our reported ensemble.
• These are typical acceptance criteria for including calculated structures
in the ensemble:
– no more than 1 NOE distance restraint violation greater than 0.4 Å
– no dihedral angle restraint violations greater than 5°
– no gross violations of reasonable molecular geometry
• Sometimes structures are rejected on other grounds as well:
– too many residues with backbone angles in disfavored regions of
Ramachandran space
– too high a final potential energy in the rMD calculation
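The first two acceptance criteria are easy to express as a filter. The sketch below assumes each trial structure comes with a list of its NOE violations (in Å) and dihedral violations (in degrees); the function name, the model names and the numbers are hypothetical, and the geometry, Ramachandran and energy checks are left out.

```python
def accept_structure(noe_violations, dihedral_violations,
                     noe_tol=0.4, max_noe_over_tol=1, dihedral_tol=5.0):
    """Apply the typical restraint-violation criteria to one trial structure.

    noe_violations: distance restraint violations in angstroms.
    dihedral_violations: dihedral restraint violations in degrees.
    """
    n_noe = sum(1 for v in noe_violations if v > noe_tol)
    n_dihedral = sum(1 for v in dihedral_violations if v > dihedral_tol)
    return n_noe <= max_noe_over_tol and n_dihedral == 0


# Hypothetical trial structures: (NOE violations, dihedral violations).
trial_structures = {
    "model_01": ([0.1, 0.45, 0.0], [2.0, 4.9]),  # one NOE violation > 0.4 A
    "model_02": ([0.5, 0.6], [1.0]),             # two NOE violations > 0.4 A
    "model_03": ([0.2], [7.5]),                  # one dihedral violation > 5 deg
}
accepted = [name for name, (noe, dih) in trial_structures.items()
            if accept_structure(noe, dih)]
print(accepted)  # ['model_01']
```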
Precision of NMR Structures (Resolution)
• Precision is judged by the RMSD of the superimposed ensemble of accepted
structures.
• RMSDs for both backbone (Cα, N, carbonyl C) and all heavy atoms (i.e.
everything except hydrogen) are typically reported, e.g.
backbone 0.6 Å, heavy atoms 1.4 Å
• Sometimes only the more ordered regions are included in the reported
RMSD, e.g. for a 58-residue protein you will see RMSD (residues 5-58)
if residues 1-4 are completely disordered.
Reporting ensemble RMSD
• There are two major ways of calculating the RMSD of the ensemble:
– pairwise: compute RMSDs for all possible pairs of structures in the
ensemble, and calculate the mean of these RMSDs
– from mean: calculate a mean structure from the ensemble and
measure RMSD of each ensemble structure from it, then calculate
the mean of these RMSDs
• Pairwise will generally give a slightly higher number, so be aware
that these two ways of reporting RMSD are not completely equal.
Usually the Materials and Methods, or a footnote somewhere in the
paper, will indicate which is being used.
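The two conventions can be compared directly on a toy ensemble. This sketch assumes the structures have already been superimposed (no fitting step is included) and uses made-up two-atom "structures"; the function names are hypothetical.

```python
import math
from itertools import combinations


def rmsd(a, b):
    """RMSD between two equal-length lists of (x, y, z) coordinates."""
    sq = sum((p - q) ** 2 for u, v in zip(a, b) for p, q in zip(u, v))
    return math.sqrt(sq / len(a))


def pairwise_rmsd(ensemble):
    """Mean RMSD over all distinct pairs of structures."""
    pairs = list(combinations(ensemble, 2))
    return sum(rmsd(a, b) for a, b in pairs) / len(pairs)


def rmsd_from_mean(ensemble):
    """Mean RMSD of each structure from the coordinate-averaged structure."""
    n = len(ensemble)
    n_atoms = len(ensemble[0])
    mean = [tuple(sum(s[i][k] for s in ensemble) / n for k in range(3))
            for i in range(n_atoms)]
    return sum(rmsd(s, mean) for s in ensemble) / n


# Toy pre-superimposed ensemble: two structures of two atoms each.
ensemble = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (1.0, 2.0, 0.0)],
]
print(pairwise_rmsd(ensemble))   # about 1.41
print(rmsd_from_mean(ensemble))  # about 0.71
```

With only two structures the gap between the two numbers is at its largest (a factor of two); it narrows as the ensemble grows, but the pairwise value stays the higher of the two.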
“Minimized average” structure
• A minimized average is just that: a mean structure is calculated from
the ensemble and then subjected to energy minimization to restore
reasonable geometry, which is often lost in the calculation of a mean.
• This is NMR’s way of generating a single representative structure from
the data. It is much easier to visualize structural features from a
minimized average than from the ensemble.
• For highly disordered regions a minimized average will not be
informative and may even be misleading. Such regions are sometimes
left out of the minimized average.
• Sometimes when an NMR structure is deposited in the PDB, there will
be separate entries for both the ensemble and the minimized average.
It is nice when people do this. Alternatively, a member of the ensemble
may be identified which is considered the most representative (often
the one closest to the mean).
How many restraints do we need to get a high-resolution NMR structure?
• Usually ~15-20 NOE distance restraints per residue, but the total number is
not as important as how many long-range restraints you have, meaning
long-range in the sequence: |i-j| > 5, where i and j are the two residues
involved.
• Good NMR structures usually have ≥ ~3.5 long-range distance
restraints per residue in the structured regions.
• To get a very good quality structure, it is usually also necessary to have
some stereospecific assignments, e.g. β hydrogens; Leu, Val methyls.
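Counting long-range restraints with the |i-j| > 5 definition is straightforward. The sketch below assumes each restraint is stored as a pair of residue numbers; the function names and the toy restraint list are hypothetical.

```python
def count_long_range(restraints, min_separation=5):
    """Count restraints whose residues are more than min_separation apart."""
    return sum(1 for i, j in restraints if abs(i - j) > min_separation)


def long_range_per_residue(restraints, n_structured_residues, min_separation=5):
    """Long-range restraints per residue in the structured region."""
    return count_long_range(restraints, min_separation) / n_structured_residues


# Toy restraint list as (residue i, residue j) pairs, for illustration only.
restraints = [(3, 4), (3, 10), (5, 40), (12, 14), (20, 27), (8, 13)]
print(count_long_range(restraints))           # 3: (3, 10), (5, 40), (20, 27)
print(long_range_per_residue(restraints, 4))  # 0.75
```

Note that (8, 13) is not counted: a separation of exactly 5 does not satisfy |i-j| > 5.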
Assessing Structure Quality
• NMR spectroscopists usually run their ensemble through the program
PROCHECK-NMR to assess its quality.
• A high-resolution structure will have backbone RMSD ≤ ~0.8 Å and heavy
atom RMSD ≤ ~1.5 Å.
• It will have a low RMS deviation from restraints (good agreement with
restraints).
• It will have good stereochemical quality:
– ideally >90% of residues in core (most favorable) regions of
Ramachandran plot
– very few “unusual” side chain angles and rotamers (as judged by
those commonly found in crystal structures)
– low deviations from idealized covalent geometry
Structural Statistics Tables
A structural statistics table typically reports:
• list of restraints, # and type
• calculated energies
• agreement of ensemble structures with restraints (RMS)
• precision of structure (RMSD)
Sometimes one also sees listings of Ramachandran statistics, deviations
from ideal covalent geometry, etc.