Know the Limitations of your Data – X-ray, NMR, EM
PHAR 201/Bioinformatics I
Philip E. Bourne
SSPPS, UCSD
Prerequisite Reading: Structural Bioinformatics Chapters 4-6
PHAR 201 Lecture 3 2012
When You Grab a PDB File, What Are You Starting With?
Data Views
• Depositor/Annotator
• Type of experiment: X-ray, NMR, EM
• Type of molecule: protein, nucleic acid, or protein-nucleic acid complex
[Figure: PDB deposition and annotation workflow. Labels: Depositor; Deposit, Annotate, Validate; PDB ID; Validation Report; Corrections; Depositor Approval; Steps 1-4; Archival Data; PDB Entry; Core DB; Distribution Site.]
Annotation
• Resolve nomenclature and format problems
• Add missing required data items
• Add higher level classifications
• Review validation report and summary letter to the depositor
• Produce and check final mmCIF and PDB files
• Update status and load database
• Check data consistency across archive
Annotation – More Specifics
• Make sure entry is complete (mandatory items from mmCIF dictionary)
• Format exchange
– Converts between PDB and mmCIF formats
– Recognizes most variants of PDB format
• Check nomenclature
– Residue
– Polymer atoms
– Hydrogen atoms
– Ligand atoms
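Format exchange between PDB and mmCIF is largely a matter of mapping the fixed-width columns of the PDB format onto named mmCIF items. The Python sketch below does this for a single ATOM/HETATM record. It is not the annotation software the PDB uses. It assumes the standard PDB ATOM column layout and maps the author-provided fields onto the `auth_*` items of the mmCIF `_atom_site` category. Entity and `label_*` numbering, multiple models, and other converter duties are deliberately left out. The function name `parse_atom_record` is hypothetical.

```python
"""Sketch: map one fixed-column PDB ATOM/HETATM record onto mmCIF _atom_site item names.
Only the columns listed are handled; a real converter does much more."""

# (mmCIF item name, 0-based start, end-exclusive) following the PDB ATOM column layout
PDB_COLUMNS = [
    ("group_PDB", 0, 6),
    ("id", 6, 11),
    ("auth_atom_id", 12, 16),
    ("label_alt_id", 16, 17),
    ("auth_comp_id", 17, 20),
    ("auth_asym_id", 21, 22),
    ("auth_seq_id", 22, 26),
    ("pdbx_PDB_ins_code", 26, 27),
    ("Cartn_x", 30, 38),
    ("Cartn_y", 38, 46),
    ("Cartn_z", 46, 54),
    ("occupancy", 54, 60),
    ("B_iso_or_equiv", 60, 66),
    ("type_symbol", 76, 78),
]
FLOAT_ITEMS = {"Cartn_x", "Cartn_y", "Cartn_z", "occupancy", "B_iso_or_equiv"}
INT_ITEMS = {"id", "auth_seq_id"}


def parse_atom_record(line: str) -> dict:
    """Return a dict of mmCIF-style items for one ATOM/HETATM line (hypothetical helper)."""
    line = line.rstrip("\n").ljust(80)  # pad short lines so every slice is valid
    if line[0:6] not in ("ATOM  ", "HETATM"):
        raise ValueError(f"not an ATOM/HETATM record: {line[0:6]!r}")
    row = {}
    for item, start, end in PDB_COLUMNS:
        raw = line[start:end].strip()
        if item in FLOAT_ITEMS:
            row[item] = float(raw)
        elif item in INT_ITEMS:
            row[item] = int(raw)
        else:
            row[item] = raw
    # mmCIF convention: "." means not applicable, "?" means unknown/absent
    row["label_alt_id"] = row["label_alt_id"] or "."
    row["pdbx_PDB_ins_code"] = row["pdbx_PDB_ins_code"] or "?"
    return row
```

For example, `parse_atom_record(line)["Cartn_x"]` returns the x coordinate as a float, and a blank alternate-location column comes back as ".", following the mmCIF convention for "not applicable".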
Validation
• Covalent geometry
– Comparison with standard values (Engh and Huber [1]; Clowney et al. [2]; Gelbin et al. [3])
– Identify outliers
• Stereochemistry – check chiral centers
• Close contacts in asymmetric unit and unit cell
• Occupancy
• Consistency between the SEQRES sequence and the coordinates
• Distant waters
• Experimental (SFCHECK [4])
1. R.A. Engh and R. Huber. Acta Cryst. A47 (1991):392-400.
2. L. Clowney et al. J. Am. Chem. Soc. 118 (1996):509-518.
3. A. Gelbin et al. J. Am. Chem. Soc. 118 (1996):519-529.
4. A.A. Vaguine, J. Richelle, and S.J. Wodak. Acta Cryst. D55 (1999):191-205.
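The covalent-geometry check comes down to comparing each measured bond with a standard value and flagging large deviations. The sketch below does this with Z-scores for three backbone bonds. The target lengths and standard deviations are approximate Engh and Huber style values included only for illustration. The cutoff of 4 standard deviations and the helper name `bond_outliers` are hypothetical choices, not the thresholds of any particular validation pipeline.

```python
import math

# Approximate backbone targets (ideal length, sigma) in Å, Engh & Huber style.
# Illustrative only: take production values from the published tables.
IDEAL_BONDS = {
    ("N", "CA"): (1.458, 0.019),
    ("CA", "C"): (1.525, 0.021),
    ("C", "O"): (1.231, 0.020),
}


def bond_outliers(residue, z_cutoff=4.0):
    """residue: atom name -> (x, y, z). Return (bond, length, z) where |z| > z_cutoff.

    z_cutoff is a hypothetical threshold chosen for this sketch.
    """
    flagged = []
    for (a, b), (ideal, sigma) in IDEAL_BONDS.items():
        if a in residue and b in residue:
            length = math.dist(residue[a], residue[b])
            z = (length - ideal) / sigma  # deviation in units of the standard deviation
            if abs(z) > z_cutoff:
                flagged.append(((a, b), round(length, 3), round(z, 1)))
    return flagged
```

A residue with ideal geometry returns an empty list. Stretching the CA-C bond to 1.8 Å gives a Z-score of about 13, so that bond is flagged.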
The process by which biological data in a database are annotated and validated changes over time – this introduces a temporal inconsistency
Summary Thus Far
• The biocurators (annotators) are the unsung heroes of modern biology
P.E. Bourne and J. McEntyre (2006) Biocurators: Contributors to the World of Science. PLoS Comp. Biol. (Editorial) 2(10): e142
– International Society for Biocuration
• As a resource developer, get it right from the start and data remediation will be less likely to be needed in years to come
• As a resource user, be aware of the process used to provide the data and hence the limitations of the data you are using
The quality of the data you use in a bioinformatics experiment is a function of the method used to collect these data – understand the method
[Figure: PDB holdings by experimental method as of Oct 5, 2011; the only recoverable label is EM: 254]
X-ray Crystallography
• Oldest technique
• Majority of the depositions
• A number of Nobel prizes
• International Union of Crystallography (IUCr) .. Acta ..
• Method based on scattering from electrons – hydrogen atoms usually not seen (sometimes modeled in)
• In fact, modeling them in is itself an issue
• Atoms with similar numbers of electrons are not readily distinguishable, e.g., O, N, C
• Influence of crystal packing, e.g., malate dehydrogenase (4MDH)
• Environment in the crystal is highly aqueous
• Produces similar structures to NMR, e.g., thioredoxin (3TRX vs 1SRX)
The X-ray Crystallography Pipeline
Basic Steps
• Target Selection
• Crystallomics
– Isolation
– Expression
– Purification
– Crystallization
• Data Collection
• Structure Solution
• Structure Refinement
• Functional Annotation
• Publish
Limitations - Crystallization
• Crystallization:
– Non-soluble
– Twinning
– Microheterogeneity
– Disorder
Limitations – Data Collection
Limitations - Refinement
Limitations – Map Fitting
• In an intricate study, the only way to be sure that the work is correct is to make your own judgment from the electron density – this is never done.
• It can be done at http://eds.bmc.uu.se/eds/
• It requires that the experimental data (the structure factors, e.g., for 100d) be available
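One common way to make that judgment quantitative is a real-space correlation coefficient. It is computed between the observed density and the density calculated from the model, around the residue of interest. The sketch assumes both maps have already been computed on the same grid; that is the hard part, and it needs the structure factors. `mask` is a hypothetical boolean array selecting grid points near the residue. Values near 1 mean the model explains the density well, and low values point to regions worth inspecting by eye.

```python
import numpy as np


def real_space_cc(rho_obs, rho_calc, mask=None):
    """Pearson correlation between observed and model density sampled on the same grid.

    mask (hypothetical): optional boolean array picking the grid points around a residue.
    """
    obs = np.asarray(rho_obs, dtype=float)
    calc = np.asarray(rho_calc, dtype=float)
    if mask is not None:
        obs, calc = obs[mask], calc[mask]
    obs = obs.ravel() - obs.mean()     # remove the mean so only the shape of the density counts
    calc = calc.ravel() - calc.mean()
    return float(np.dot(obs, calc) / np.sqrt(np.dot(obs, obs) * np.dot(calc, calc)))
```

Because the coefficient removes the mean and normalizes by the spread of each map, it is insensitive to an overall offset or scale difference between the two maps.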
Limitations – Non-crystallographic Symmetry (NCS)
Limitations – Refinement
• Introduces restraints/constraints that may or may not be realistic
• Water molecules have sometimes been added unnecessarily
• Resolution has sometimes been quoted wrongly
• Standards have helped
• See, for example: H. Weissig and P.E. Bourne (1999) An Analysis of the Protein Data Bank in Search of Temporal and Global Trends. Bioinformatics 15(10):807-831
Limitations – Interpretation of the Biologically Active Molecule
1QQP
http://www.pdb.org/pdb/101/static101.do?p=education_discussion/Looking-at-Structures/bioassembly_tutorial.html
Limitations – Functional Annotation
• Functional annotation is ONLY in the publication, NOT in the PDB entry
• Attempts to address this with GO assignments
• Attempts to address this with literature integration
• Structural genomics – function often unknown
• One structure – one to many functions (power law); functions may go unrecognized since the PDB is relatively static
• Many efforts at functional annotation
Why Is Understanding Limitations Important?
• Later we will study reductionism – a key
process in the use of biological data
• As a result of reductionism you will need to
choose a representative structure for the
task at hand
• Understanding the limitations of the
experiment will help us do this
Summary of Important Features in Using Structure Data Determined by X-ray Crystallography
• Resolution is a key indicator – think about it relative to atomic resolution, i.e., 1.54 Å for a C-C single bond
• Disorder (i.e., undetermined or alternative atomic coordinates) is a natural part of many structures
• R factor (all) describes the agreement of the model with the experimental data. It should be below 0.20 (Rfree below 0.26)
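The R factor compares observed structure-factor amplitudes with those calculated from the model, R = Σ||Fobs| - k|Fcalc|| / Σ|Fobs|. Rfree is the same quantity computed on a small set of reflections held out of refinement. In the sketch below, the least-squares scale k is fitted on the working reflections and reused for the free set; that choice is an assumption. The 0.20/0.26 thresholds are the rule of thumb from the slide, and the function names are hypothetical.

```python
import numpy as np


def r_factor(f_obs, f_calc, scale=None):
    """R = sum(| |Fobs| - k|Fcalc| |) / sum(|Fobs|)."""
    fo = np.abs(np.asarray(f_obs, dtype=float))
    fc = np.abs(np.asarray(f_calc, dtype=float))
    if scale is None:
        # least-squares scale factor k putting |Fcalc| on the |Fobs| scale
        scale = np.sum(fo * fc) / np.sum(fc ** 2)
    return float(np.sum(np.abs(fo - scale * fc)) / np.sum(fo))


def r_work_r_free(f_obs, f_calc, free_flags):
    """Return (Rwork, Rfree); free_flags marks reflections excluded from refinement."""
    fo = np.abs(np.asarray(f_obs, dtype=float))
    fc = np.abs(np.asarray(f_calc, dtype=float))
    free = np.asarray(free_flags, dtype=bool)
    work = ~free
    # one scale for both sets, fitted on the working reflections only (assumption)
    k = np.sum(fo[work] * fc[work]) / np.sum(fc[work] ** 2)
    return r_factor(fo[work], fc[work], k), r_factor(fo[free], fc[free], k)


def looks_reasonable(r_work, r_free, work_max=0.20, free_max=0.26):
    """Rule-of-thumb thresholds quoted on the slide."""
    return r_work < work_max and r_free < free_max
```

A large gap between Rwork and Rfree is itself a warning sign. It suggests the model may have been fitted to noise in the working reflections.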
Summary of Important Features in Using Structure Data Determined by X-ray Crystallography Cont.
• B (aka temperature) factors indicate both the accuracy of a structure and its most mobile regions
[Figure: 5EBX drawn with QuickPDB]
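The link between B factors and motion is B = 8π²<u²>, where <u²> is the mean-square displacement of the atom along one direction. The sketch converts B to an RMS displacement and ranks residues by mean B. B also absorbs static disorder and model error, so a high value marks a region that is either mobile or poorly determined. The input layout `(chain, res_seq, b_factor)` and the function names are hypothetical.

```python
import math
from collections import defaultdict


def rms_displacement(b_factor):
    """Isotropic B (Å^2) -> RMS displacement along one direction (Å), from B = 8*pi^2*<u^2>."""
    return math.sqrt(b_factor / (8.0 * math.pi ** 2))


def most_mobile_residues(atoms, top_n=3):
    """atoms: iterable of (chain, res_seq, b_factor); return residues with the highest mean B."""
    per_residue = defaultdict(list)
    for chain, res_seq, b in atoms:
        per_residue[(chain, res_seq)].append(b)
    mean_b = {res: sum(bs) / len(bs) for res, bs in per_residue.items()}
    return sorted(mean_b, key=mean_b.get, reverse=True)[:top_n]
```

For a sense of scale, a B factor of 20 Å² corresponds to an RMS displacement of roughly 0.5 Å.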
NMR
Features of NMR
• Limited in size (25-100 kDa), provided labeled samples are obtainable
• Selected information on proteins up to ~150 kDa
• Solution study – small sample needed for soluble proteins
• Only a few solid-state studies
• Reveals hydrogen positions
• Leads to an ensemble of dynamical structures – these are rarely used in bioinformatics studies
• Useful in high-throughput screens to determine protein-ligand interactions
• Used for phasing of X-ray structures, i.e., the methods are synergistic
• Until recently, not applicable to membrane proteins
NMR - Methodology
• Molecules are tumbling and vibrating with thermal motion
• Samples are usually labeled with 13C and 15N; nuclei such as 1H, 13C, 15N and 31P, in an external magnetic field, have two spin states – one aligned with and one opposed to the external magnetic field
• Detects and assigns chemical shifts of atomic nuclei with non-zero spin
• The shifts depend on the nuclei's electronic environments, i.e., the identities and distances of nearby atoms
• The system can be tuned to look at specific features of the characteristic spin moments
• 1H-1H interactions provide NOE constraints
• Better resolution is obtained when the molecule tumbles fast – larger size slows tumbling, which is partly offset by higher magnetic field strengths
• Protein must be soluble at high concentration and stable without aggregation – high-throughput screening can show this, and whether the protein is folded or unfolded, very quickly
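NOE intensity falls off roughly as the inverse sixth power of the 1H-1H distance. A reference pair of known separation can therefore be used to convert intensities into distance constraints (the isolated-spin-pair approximation), and a model can then be checked for violated upper bounds. This is a minimal sketch; real protocols are considerably more involved. The 0.5 Å tolerance, the atom labels and the function names below are hypothetical.

```python
import math


def noe_distance(intensity, ref_intensity, ref_distance):
    """Isolated-spin-pair approximation: I ~ r**-6, so r = r_ref * (I_ref / I)**(1/6)."""
    if intensity <= 0 or ref_intensity <= 0:
        raise ValueError("intensities must be positive")
    return ref_distance * (ref_intensity / intensity) ** (1.0 / 6.0)


def restraint_violations(restraints, coords, tolerance=0.5):
    """restraints: (atom_a, atom_b, upper_bound_A); coords: atom label -> (x, y, z).

    Returns (atom_a, atom_b, excess) for model distances exceeding the bound by > tolerance.
    tolerance is a hypothetical value for this sketch.
    """
    violations = []
    for a, b, upper in restraints:
        excess = math.dist(coords[a], coords[b]) - upper
        if excess > tolerance:
            violations.append((a, b, round(excess, 2)))
    return violations
```

Because of the sixth-power dependence, a 64-fold weaker NOE only doubles the estimated distance. This is why NOEs constrain short distances well and fade quickly beyond about 5 Å.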
NMR – Methodology cont.
• Result is a set of distance constraints between pairs of
atoms either bonded or non-bonded
• If there are sufficient constraints then an ensemble of
possibilities results
• Often this ensemble is averaged and constraints adjusted to
conform to normal bond lengths and distances
• Usually left with 15-30 members of the ensemble
• Ideally less than 1 Å RMSD between models (backbone only)
• Portions of the molecule with high motion have tell-tale signals, e.g., apo calmodulin
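The "less than 1 Å backbone RMSD" criterion can be checked by superposing each pair of models and averaging the RMSD. The sketch uses the Kabsch (SVD) superposition. It assumes each model is an (N, 3) array of backbone coordinates (e.g., N, CA, C) with atoms in the same order in every model. The function names are hypothetical.

```python
from itertools import combinations

import numpy as np


def kabsch_rmsd(P, Q):
    """RMSD (Å) between matched (N, 3) coordinate sets after optimal rigid superposition."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    P = P - P.mean(axis=0)            # remove translation
    Q = Q - Q.mean(axis=0)
    H = P.T @ Q                       # 3x3 cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    # flip the last axis if needed so the result is a proper rotation, not a reflection
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    diff = P @ R.T - Q                # rotate every row of P and compare with Q
    return float(np.sqrt((diff ** 2).sum() / len(P)))


def mean_pairwise_rmsd(models):
    """Average RMSD over all pairs of ensemble members (same atoms, same order)."""
    pairs = list(combinations(range(len(models)), 2))
    return sum(kabsch_rmsd(models[i], models[j]) for i, j in pairs) / len(pairs)
```

The reflection guard matters: without it, a mirror-image arrangement could be "superposed" onto its enantiomer and report a misleadingly small RMSD.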
BMRB - http://www.bmrb.wisc.edu/
NMR Terms
• COSY/NOESY spectra: allow through-bond (COSY) and through-space (NOESY) interactions between atoms to be measured, from which a 3D structure of the protein can be generated (what we have discussed)
• TROSY (Transverse Relaxation Optimized Spectroscopy): invented about 1997 and first described by Professor Kurt Wüthrich. Useful for analyzing larger protein systems: TROSY gives sharper peaks for large proteins and works best at higher fields. If the aim is to study a large complex, or a chemical shift perturbation when a protein binds to a receptor, a 900 MHz machine is better than a more standard lower-field machine
• Solid-state NMR: requires wider-bore magnets (63 or even 89 mm diameter) than solution-state NMR. The higher stored energy of these wide-bore magnets makes them significantly more difficult to build, so high-field solid-state NMR lags behind liquid-state NMR in available field strength
• Multidimensional (three- and four-dimensional) NMR: introduced about 12-15 years ago. This technology has the advantage of resolving the severe overlap in 2D spectra
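The "900 MHz" label is the 1H resonance frequency, which fixes the field strength through the proton gyromagnetic ratio (about 42.577 MHz per tesla). Chemical shifts in ppm do not depend on the field, but their separation in Hz scales with the spectrometer frequency. This is one reason higher fields help resolve crowded spectra. A small sketch of both conversions follows; the function names are hypothetical.

```python
PROTON_MHZ_PER_TESLA = 42.577  # 1H gyromagnetic ratio / (2*pi), approximately


def field_strength_tesla(proton_freq_mhz):
    """Magnetic field B0 (T) for a spectrometer labeled by its 1H frequency in MHz."""
    return proton_freq_mhz / PROTON_MHZ_PER_TESLA


def ppm_to_hz(delta_ppm, spectrometer_mhz):
    """Separation in Hz of two peaks delta_ppm apart (ppm x MHz = Hz)."""
    return delta_ppm * spectrometer_mhz
```

For example, a 900 MHz spectrometer corresponds to about 21.1 T. Two peaks 0.02 ppm apart are 12 Hz apart at 600 MHz but 18 Hz apart at 900 MHz, so, all else being equal, they overlap less at the higher field.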
In both X-ray crystallography and NMR there is the danger that the final structure reflects the model it was computed against
Additional Validation Checks
• Stereochemical quality
– Ramachandran plot outliers
– Dihedrals, bond lengths and angles
– Fold Deviation Score (FDS)
– Validation Server
http://deposit.rcsb.org/validate/
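Ramachandran outliers are judged from the backbone torsion angles φ (C(i-1)-N-CA-C) and ψ (N-CA-C-N(i+1)). The sketch computes a torsion angle from four atom positions using the standard atan2 formulation, signed according to the IUPAC convention. Deciding whether a (φ, ψ) pair is an outlier needs reference distributions from a validation tool, which this sketch does not attempt. The function names are hypothetical.

```python
import math


def _sub(a, b):
    return [x - y for x, y in zip(a, b)]


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def dihedral(p0, p1, p2, p3):
    """Torsion angle in degrees (-180..180) about the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    length = math.sqrt(_dot(b1, b1))
    b1 = [t / length for t in b1]                      # unit vector along the central bond
    # components of the outer bonds perpendicular to the central bond
    v = _sub(b0, [_dot(b0, b1) * t for t in b1])
    w = _sub(b2, [_dot(b2, b1) * t for t in b1])
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


def phi_psi(prev_c, n, ca, c, next_n):
    """Backbone phi (C(i-1)-N-CA-C) and psi (N-CA-C-N(i+1)) for residue i."""
    return dihedral(prev_c, n, ca, c), dihedral(n, ca, c, next_n)
```

Using atan2 rather than an arccosine keeps the sign of the angle, which is essential on a Ramachandran plot where, for example, right- and left-handed helical regions differ only in sign.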
Use the PDB Geometry Data
Electron Microscopy
[Figure: 1KVP, structural analysis of the Spiroplasma virus, SPV4: implications for evolutionary variation to obtain host diversity among the Microviridae]
• Able to look at large molecular assemblies
• Resolution now ranges from 30 Å to below 4 Å
• Cryo-EM preserves the aqueous environment (no staining)
• Experimentally more tractable
• Can work from images (direct measurement of phases) or from diffraction patterns
• Can provide a 3D volumetric reconstruction
• Suitable for the study of membrane proteins, e.g., bacteriorhodopsin (1990)
[Figure: 1P85, real-space refined coordinates of the 50S subunit fitted into the low-resolution cryo-EM map of the EF-G.GTP state of the E. coli 70S ribosome]
• Single particle reconstruction – multiple
orientations of the same particle found in
the specimen (viruses, ribosome…)
• Electron tomography – 3D reconstruction of
a single particle (organelles, whole cells)
Example EM Result
• Example of a hybrid study that combines elements of electron crystallography and helical reconstruction with homology modeling and molecular docking approaches in order to elucidate the structure of an actin-fimbrin crosslink (Volkmann et al., 2001b). Fimbrin is a member of a large superfamily of actin-binding proteins and is responsible for crosslinking of actin filaments into ordered, tightly packed networks such as actin bundles in microvilli or stereocilia of the inner ear. The diffraction patterns of ordered paracrystalline actin-fimbrin arrays (background) were used to deduce the spatial relationship between the actin filaments (white surface representation) and the various domains of the crosslinker (the two actin-binding domains of fimbrin are pink and blue, the regulatory domain cyan). Combination of these data with homology modeling and data from docking the crystal structure of fimbrin’s N-terminal actin-binding domain into helical reconstructions (Hanein et al., 1998) allowed us to build a complete atomic model of the crosslinking molecule (foreground, color scheme as in surface representation of the array).
From Structural Bioinformatics 2005 p124
Example EM Result
• Example of a combination of high-resolution structural information from X-ray crystallography and medium-resolution information from electron cryomicroscopy (here 2.1 nm). Actin and myosin were docked into helical reconstructions of actin decorated with smooth-muscle myosin (Volkmann et al., 2000). Interaction of myosin with filamentous actin has been implicated in a variety of biological activities including muscle contraction, cytokinesis, cell movement, membrane transport, and certain signal transduction pathways. Attempts to crystallize actomyosin failed due to the tendency of actin to polymerize. Docking was performed using a global search with a density correlation measure (Volkmann and Hanein, 1999). The estimated accuracy of the fit is 0.22 nm in the myosin portion and 0.18 nm in the actin portion. One actin molecule is shown on the left as a molecular surface representation. The yellow area denotes the largest hydrophobic patch on the exposed surface of the filament, a region expected to participate in actomyosin interactions. The fitted atomic model of myosin is shown on the right. The transparent envelope represents the density corresponding to myosin in the 3D reconstruction. The solution set concept (see text) was used to evaluate the results and to assign probabilities for residues to take part in the interaction. The tone of red on the myosin model is proportional to this statistically evaluated probability (the more red, the higher the probability).
From Structural Bioinformatics 2005 p127
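The idea behind a global search with a density correlation measure can be illustrated with a brute-force translational scan. A probe density is moved over the target map, and the placement with the highest correlation is kept. This is only a toy illustration of the principle, not the Volkmann and Hanein procedure. It assumes both maps share one grid, considers integer-voxel translations only (no rotations), and treats the box as periodic via `np.roll`. The function names and `max_shift` are hypothetical.

```python
import itertools

import numpy as np


def correlation(a, b):
    """Pearson correlation between two density grids of the same shape."""
    a = a.ravel() - a.mean()
    b = b.ravel() - b.mean()
    return float(a @ b / np.sqrt((a @ a) * (b @ b)))


def translational_search(target_map, probe_map, max_shift=3):
    """Try every integer shift in [-max_shift, max_shift]^3; return (best_shift, best_score).

    Periodic boundaries via np.roll and translation-only search are simplifying assumptions.
    """
    best_shift, best_score = None, -np.inf
    shifts = range(-max_shift, max_shift + 1)
    for shift in itertools.product(shifts, repeat=3):
        moved = np.roll(probe_map, shift, axis=(0, 1, 2))
        score = correlation(target_map, moved)
        if score > best_score:
            best_shift, best_score = shift, score
    return best_shift, best_score
```

An exhaustive scan is slow once rotations and finer steps are added, which is why practical fitting tools use faster search strategies. The scoring idea, however, stays the same.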
Small-angle X-ray Scattering SAXS
http://en.wikipedia.org/wiki/Small-angle_X-ray_scattering
• Reveals shape and size of macromolecules in the range 5-25 nm
• Handles partially ordered systems
• No need for crystalline sample; larger molecules than NMR, but at lower resolution
• Leading to hybrid techniques
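SAXS size information is usually summarized by the radius of gyration, Rg, obtained from the low-angle Guinier approximation ln I(q) ≈ ln I(0) - q²Rg²/3. The sketch fits that straight line in q². For comparison with an atomic or hybrid model, it also computes an unweighted Rg from coordinates. It assumes q in Å⁻¹ (q = 4π sinθ/λ) and uses the commonly quoted q·Rg ≤ 1.3 validity limit. It ignores electron weighting and the hydration shell, so the two Rg values are only roughly comparable. The function names are hypothetical.

```python
import numpy as np


def guinier_fit(q, intensity, qrg_limit=1.3):
    """Fit ln I(q) = ln I(0) - (q*Rg)**2 / 3 over the supplied low-q points.

    Returns (Rg, I0, within_limit); q in 1/Å gives Rg in Å.
    """
    q = np.asarray(q, dtype=float)
    ln_i = np.log(np.asarray(intensity, dtype=float))
    slope, intercept = np.polyfit(q ** 2, ln_i, 1)  # straight line in q^2
    if slope >= 0:
        raise ValueError("no Guinier decay in this q range")
    rg = float(np.sqrt(-3.0 * slope))
    i0 = float(np.exp(intercept))
    return rg, i0, bool(q.max() * rg <= qrg_limit)


def rg_from_coordinates(coords):
    """Unweighted radius of gyration (Å) of a set of atom positions."""
    xyz = np.asarray(coords, dtype=float)
    centered = xyz - xyz.mean(axis=0)
    return float(np.sqrt((centered ** 2).sum(axis=1).mean()))
```

If the fitted range extends past the q·Rg limit, the third return value is False. The fit should then be repeated on fewer low-q points.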
Summary Regarding Data Limitations
• Pay attention to the method, its pluses and minuses
• Be aware of models
• Be aware of the general limitations of each method
• For NMR, be aware that the result is an ensemble of structures
• Be aware of hybrid models
• For all methods, be aware of the parameters that govern the accuracy
• You will need to know these limitations for just about any bioinformatics study, since it will be necessary to choose a non-redundant (NR) set – we will visit ASTRAL and PISCES, which are tools for defining an NR set
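Choosing a non-redundant set can be sketched as greedy culling. Structures are ranked by quality indicators such as resolution and R factor, and each is kept only if it is not too similar in sequence to anything already kept. This is loosely in the spirit of culling servers such as PISCES but is not their algorithm. The `identity` function (fractional sequence identity, e.g., from a prior alignment), the 0.25 cutoff and the entry fields are hypothetical inputs supplied by the caller.

```python
def cull_non_redundant(entries, identity, max_identity=0.25):
    """Greedy culling sketch: best-quality structures first, reject close sequence relatives.

    entries: list of dicts with hypothetical keys 'id', 'resolution' (Å), 'r_factor'.
    identity: caller-supplied function (id_a, id_b) -> fractional sequence identity.
    """
    # smaller resolution value and smaller R factor rank first
    ranked = sorted(entries, key=lambda e: (e["resolution"], e["r_factor"]))
    kept = []
    for entry in ranked:
        if all(identity(entry["id"], other["id"]) < max_identity for other in kept):
            kept.append(entry)
    return [e["id"] for e in kept]
```

Ranking before culling means that, within each family of related sequences, the structure that survives is the best-determined one. This is exactly where knowing the limitations of each method pays off.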