Bioinformatics 2 -- lecture 2 Where do protein structures come from? NMR
Bioinformatics 2 -- lecture 2
Where do protein structures come from?
X-ray crystallography
What does "resolution" mean?
What is a "temperature factor"?
NMR
What is an "ensemble"?
Why so many isotopes?
The Protein Data Bank
MOE
X-ray
X-ray Crystallography
[Figure: diffraction pattern and electron density map]
The wavelength of X-rays is about
the size of an atom.
Wavelength of X-rays used in crystallography: λ = 1.54 Å
Frequency = c/λ ≈ 2×10¹⁸ s⁻¹
e- oscillates in an electric field...
•e- oscillation is at the same frequency as the X-rays.
•Atomic nuclei don’t oscillate much.
[Figure: electrons (e-) oscillating in the electric field E]
Electron motion is slow compared to X-ray oscillation
[Figure: a 1 Å atom (hydrogen) compared with the amount an electron moves in one X-ray cycle]
In other words, X-rays see electrons as if they were standing still.
Diffractometer setup
[Figure: X-ray source, X-ray beam, crystal, beamstop, goniostat]
A protein crystal (usually frozen) is immersed in a powerful
beam of monochromatic X-rays.
An X-ray diffractometer
All points in a reflection plane scatter in phase
Angle from beam to detector is 2θ
Scattering from any of these points has the same
pathlength, therefore, all e- in the reflection plane
scatter in phase.
Note: we are now inside the crystal
Parallel reflection planes separated by d
scatter in phase
Bragg’s Law:
nλ = 2d sin θ
The two red lines (photon paths) differ in length
by a whole number of waves.
Therefore, one spot on the detector
represents a set of Bragg planes.
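Bragg's law can be tried out numerically. The following is a minimal sketch, assuming a first-order reflection (n = 1) and the 1.54 Å wavelength quoted earlier as defaults; the function names are illustrative and not part of any crystallography package.

```python
import math

def bragg_spacing(two_theta_deg, wavelength=1.54, n=1):
    """Plane spacing d (in Å) from n*λ = 2*d*sin(θ), given the detector angle 2θ."""
    theta = math.radians(two_theta_deg / 2.0)
    return n * wavelength / (2.0 * math.sin(theta))

def bragg_two_theta(d, wavelength=1.54, n=1):
    """Detector angle 2θ (degrees) for spacing d, or None if sin(θ) would exceed 1."""
    s = n * wavelength / (2.0 * d)
    if s > 1.0:
        return None  # planes closer than λ/2 cannot satisfy Bragg's law
    return 2.0 * math.degrees(math.asin(s))

# Spots far from the beam (large 2θ) correspond to closely spaced planes.
for angle in (10.0, 30.0, 60.0):
    print(angle, round(bragg_spacing(angle), 2))
```

Larger scattering angles map to smaller d, which is why the outermost spots on the detector carry the finest detail.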
The image of the molecule is reconstructed
from the Bragg planes.
Can you see the image of Bragg?
Fourier Transformation: A closely related transform (the discrete cosine transform) is the principle behind JPEG compression.
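The reconstruction step can be illustrated in one dimension. The sketch below is a toy example, not crystallographic software: the 16-point `density` list and the cutoff of 3 are made-up values, and the function names are illustrative. It computes the Fourier coefficients of the toy density, then rebuilds it once from all coefficients and once from only the low-frequency ones.

```python
import cmath

def dft(values):
    """Discrete Fourier transform of a list of real numbers."""
    n = len(values)
    return [sum(v * cmath.exp(-2j * cmath.pi * k * j / n) for j, v in enumerate(values))
            for k in range(n)]

def inverse_dft(coeffs):
    """Rebuild the real-valued signal from its Fourier coefficients."""
    n = len(coeffs)
    return [sum(c * cmath.exp(2j * cmath.pi * k * j / n) for k, c in enumerate(coeffs)).real / n
            for j in range(n)]

def low_pass(coeffs, kmax):
    """Zero every coefficient whose frequency index exceeds kmax."""
    n = len(coeffs)
    return [c if min(k, n - k) <= kmax else 0 for k, c in enumerate(coeffs)]

density = [0, 0, 5, 0, 0, 0, 0, 8, 0, 0, 0, 0, 0, 0, 0, 0]  # two toy "atoms"
coeffs = dft(density)
full = inverse_dft(coeffs)                  # all terms: the original density returns
blurred = inverse_dft(low_pass(coeffs, 3))  # few terms: peaks become lower and wider
```

Dropping the high-frequency terms is a rough analogue of collecting data only to limited resolution: the peaks are still there, but they are smeared out.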
Fitting the model to the density
3D electron density map = electron
density at every point in space.
Visualized by drawing 3D contours.
Since we know the amino acid
sequence and we know what the
amino acids should look like, we
can "fit" a model to the density.
Coordinate refinement
Each atom is moved in X, Y and Z until:
(1) good stereochemistry is achieved,
(2) there is a good match between the atoms and the density (summarized by the R-factor).
Each atom is assigned a B-factor or "temperature-factor", to
better fit the density.
4 parameters are refined for each atom: x, y, z and B.
[Figure: density profiles of a high-B atom and a low-B atom]
Refined coordinates are deposited in the Protein Data Bank: www.rcsb.org
How to view a PDB file
In Netscape, go to www.rcsb.org (make a bookmark!)
Search for the protein with PDB code "1CA2"
Download it.
Use the "jot" editor (or "vi" if you prefer) to view the file.
("jot" and "vi" are Unix commands)
Find the lines on the following slide as they are discussed.
What's in a PDB file
•HEADER, COMPND, REMARK lines contain reference
information, including quality measures.
•HET, FORMUL Any attached ligands are defined here.
•HELIX, SHEET, TURN The locations of secondary
structures in the chain are defined here.
•ATOM lines contain information about the atoms (more later)
•HETATM “hetero” atoms. Lines are like ATOM lines, but the atom
names and group names are defined in the HETNAM lines.
•There is no direct information about what atoms are bonded to
what. (This is determined by distances or atom names.)
•No direct information about the formal or partial charges on
atoms. (Partial charges may be calculated using quantum mechanics.)
Anatomy of a PDB file: the ATOM line
ATOM      1  N   VAL A 101B      0.616  -1.613  20.826  1.00 68.81      8DFR 152
Columns:
1-6 keyword ATOM
7-11 atom number
13-16 atom name
17 altloc indicator
18-20 residue name
21 not used
22 chain identifier (optional)
23-26 residue number*
27 insertion code (optional)
28-30 not used
31-38 X-coordinate**
39-46 Y-coordinate
47-54 Z-coordinate
55-60 Occupancy factor
61-66 Temperature factor
67-80 footnotes and labels
* Usually, but not always, residues are numbered sequentially 1,2,3 etc. Often the numbering starts from a number other than 1.
** Coordinates are in orthogonal angstroms by convention. May be converted to crystallographic coordinates using CRYST lines.
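Because the ATOM line is fixed-width, it is read by column position rather than by splitting on spaces. Here is a minimal sketch of such a reader, using the column ranges listed above; the function name and dictionary keys are illustrative choices, and the example line is assembled one field at a time so the column widths are explicit.

```python
def parse_atom_line(line):
    """Slice one ATOM/HETATM line by column (PDB columns are 1-based, slices 0-based)."""
    line = line.ljust(80)
    return {
        "record": line[0:6].strip(),      # columns 1-6
        "serial": int(line[6:11]),        # columns 7-11
        "name": line[12:16].strip(),      # columns 13-16
        "altloc": line[16].strip(),       # column 17
        "resname": line[17:20].strip(),   # columns 18-20
        "chain": line[21].strip(),        # column 22
        "resseq": int(line[22:26]),       # columns 23-26
        "icode": line[26].strip(),        # column 27
        "x": float(line[30:38]),          # columns 31-38
        "y": float(line[38:46]),          # columns 39-46
        "z": float(line[46:54]),          # columns 47-54
        "occupancy": float(line[54:60]),  # columns 55-60
        "bfactor": float(line[60:66]),    # columns 61-66
    }

# The example ATOM line from the slide, one string per field.
EXAMPLE = (
    "ATOM  " "    1" " " " N  " " " "VAL" " " "A" " 101" "B" "   "
    "   0.616" "  -1.613" "  20.826" "  1.00" " 68.81" "      " "8DFR" " 152"
)

atom = parse_atom_line(EXAMPLE)
print(atom["resname"], atom["resseq"], atom["icode"], atom["x"], atom["bfactor"])
```

Splitting on whitespace would fail on lines where adjacent fields touch (for example a residue number followed directly by an insertion code, as in `101B`), which is why slicing by column is the safer habit.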
Exploring carbonic anhydrase, a
crystal structure
Try these Rasmol commands :
cartoons
color structure
restrict sheet
restrict helix
cartoons
restrict not helix and not sheet
cartoons
How much of each secondary structure
is there?
[Figure: mouse controls for rotate, translate and scale, some used with the shift key]
Exploring carbonic anhydrase, a
crystal structure
More Rasmol commands :
select all
cartoons off
select protein
wireframe
spacefill 80
set picking distance
now pick atoms (left mouse button)
What is the average distance between bonded atoms?
atoms separated by two bonds? three bonds?
What is the distance between neighboring alpha-carbons?
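The distances picked by hand in Rasmol can also be computed from the coordinates in the file. The sketch below assumes the atoms have already been read into a list of (atom name, (x, y, z)) pairs in chain order; that input layout and the function names are assumptions made for illustration, and the coordinates shown are placeholders rather than values from 1CA2.

```python
import math

def distance(a, b):
    """Euclidean distance between two (x, y, z) points, in the units of the input (Å)."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def neighbor_ca_distances(atoms):
    """Distances between consecutive alpha-carbons; atoms is [(name, (x, y, z)), ...]."""
    ca = [xyz for name, xyz in atoms if name == "CA"]
    return [distance(a, b) for a, b in zip(ca, ca[1:])]

# Placeholder coordinates, only to show the call pattern.
atoms = [
    ("N", (0.0, 0.0, 0.0)),
    ("CA", (1.0, 0.0, 0.0)),
    ("C", (2.0, 1.0, 0.0)),
    ("N", (3.0, 1.0, 0.5)),
    ("CA", (4.0, 2.0, 1.0)),
]
print(neighbor_ca_distances(atoms))
```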
Rasmol: viewing temperature factors
Temperature factors (also called “B-factors”) are the results
from crystallography that measure the disorder of each atom.
Find the B-factors in the file 1ca2.pdb. What is the
minimum and maximum B?
View 1ca2 in Rasmol. Color the atoms by temperature either
by selecting the menu item (colours->Temperature), or by
typing at the prompt:
color temperature
You can select atoms based on their B-factors using (for example)
select temperature > 2000
(Note: you must use 100xB to select)
Mean square displacement <u²> is proportional to B: <u²> = B/(8π²)
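The relation between B and displacement, and the 100xB convention for Rasmol selections, are easy to check with a few lines of code. This is a minimal sketch; the function names are illustrative, and `b_factor_range` assumes it is given the lines of a PDB file with B in columns 61-66.

```python
import math

def mean_square_displacement(b):
    """<u^2> in Å^2 from a B-factor in Å^2, using <u^2> = B / (8 * pi^2)."""
    return b / (8.0 * math.pi ** 2)

def rms_displacement(b):
    """Root-mean-square displacement in Å."""
    return math.sqrt(mean_square_displacement(b))

def rasmol_temperature(b):
    """Value to type in a Rasmol 'select temperature' expression (100 x B)."""
    return round(100 * b)

def b_factor_range(pdb_lines):
    """(minimum, maximum) B over all ATOM and HETATM lines."""
    values = [float(line[60:66]) for line in pdb_lines
              if line.startswith(("ATOM", "HETATM"))]
    return min(values), max(values)

# B = 20 corresponds to the threshold 2000 used in the select example.
print(rasmol_temperature(20), round(rms_displacement(20), 2))
```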
NMR
Nuclear Magnetic Resonance
Isotopes that have nuclear spin = 1/2 (1H, 13C, 15N and 31P) can adopt two orientations in a magnetic field (H).
At equilibrium slightly more spins are aligned with the
field than against it.
Flipping from up to down absorbs radio waves of a frequency that depends
on its precession rate, which depends on its environment.
[Figure: spin "up" and "down" orientations in the field H]
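How slight the excess of aligned spins is can be estimated from the Boltzmann distribution. The sketch below rests on assumptions that are not in the slides: a 1H resonance frequency of 500 MHz and a temperature of 298 K, chosen only as plausible example values. The function name is illustrative.

```python
import math

PLANCK = 6.62607015e-34   # J s
BOLTZMANN = 1.380649e-23  # J / K

def spin_population_excess(frequency_hz, temperature_k):
    """Fractional excess (N_up - N_down) / (N_up + N_down) for spin-1/2 nuclei."""
    # Boltzmann ratio N_down / N_up for an energy gap of h * frequency.
    ratio = math.exp(-PLANCK * frequency_hz / (BOLTZMANN * temperature_k))
    return (1.0 - ratio) / (1.0 + ratio)

# Assumed example values: 500 MHz 1H frequency, 298 K.
excess = spin_population_excess(500e6, 298.0)
print(f"{excess:.1e}")
```

Under these assumptions the excess is only a few spins per hundred thousand, which is one reason NMR needs concentrated samples.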
Steps in Protein NMR
Overview…
1. Grow protein in 13C and/or 15N enriched media.
2. Purify and concentrate protein.
3. Collect NMR spectra (2, 3 or 4 dimensions).
4. Assign the peaks (TOCSY/COSY).
5. Assign distance constraints (NOESY)
6. Solve the distance geometry problem.
Collecting NMR data
Pulse sequence defines which atoms are broadcasting
Absorption spectrum indicates which atoms are receiving.
Short pulses "ring" through a spin system --> TOCSY
(used to assign peaks to atoms)
Long pulses “resonate” through space --> NOESY
(used to assign distances between atoms)
Assigning resonances
A TOCSY experiment finds cross-talk between 1H in a
"spin system." Characteristic sets of resonances allow the
easy identification of amino acids.
A COSY experiment finds cross-peaks between 1H that
are separated by 2 or 3 bonds.
[Figure: 1H atoms of the spin system. Label: This 1H is not part of the spin system.]
Chemical shifts for ILE (ppm):
NH 8.19
αH 4.23
βH 1.90
γCH3 0.97, 0.94
γCH2 1.48, 1.19
δCH3 0.89
Isoleucine spin system
TOCSY/COSY for Ile
TOCSY peaks: red diamond
COSY peaks: blue circle
NOESY: nonlocal neighbor 1H
NOESY spectra tell us which 1H are physically close in space,
causing the Nuclear Overhauser Effect (NOE). 1H with NOEs
show absorption at one H's frequency when pulsed with radio
waves at the frequency of the other. NOE pairs are constrained
to a defined distance.
Distance geometry is the
problem of solving for the
atomic positions that satisfy
the constraint distances and
the stereochemistry,
simultaneously.
Molecular dynamics is used to
refine the solution(s).
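A full distance geometry solver is beyond a short example, but the test it must pass is simple to state in code. The sketch below only scores a candidate set of coordinates against a list of constraints; it does not solve for the positions. Treating each constraint as an upper bound is an assumption made here for illustration, and the atom labels, coordinates and bounds are invented.

```python
import math

def violations(coords, constraints, tolerance=0.0):
    """Return (atom_a, atom_b, excess) for every pair farther apart than its bound.

    coords: {atom_id: (x, y, z)}
    constraints: [(atom_a, atom_b, upper_bound_in_angstroms), ...]
    """
    result = []
    for a, b, upper in constraints:
        d = math.dist(coords[a], coords[b])
        if d > upper + tolerance:
            result.append((a, b, d - upper))
    return result

# Invented example: three protons and two 5 Å upper bounds.
coords = {"HA": (0.0, 0.0, 0.0), "HB": (3.0, 0.0, 0.0), "HC": (0.0, 6.0, 0.0)}
constraints = [("HA", "HB", 5.0), ("HA", "HC", 5.0)]
print(violations(coords, constraints))
```

A solver would move the atoms until this list is empty and the stereochemistry is still acceptable.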
NMR result: an ensemble of
structures
Some regions have fewer constraints than others,
due to fewer observed NOEs
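Poorly constrained regions show up as atoms whose positions differ widely from one model to the next. The sketch below measures that spread per atom. It assumes every model lists the same atoms in the same order and that the models are already superimposed; the input layout and function name are illustrative.

```python
import math

def per_atom_spread(models):
    """RMS deviation of each atom from its mean position across the ensemble.

    models: list of models, each a list of (x, y, z) tuples in the same atom order.
    """
    n_models = len(models)
    spreads = []
    for positions in zip(*models):  # the same atom taken from every model
        mean = [sum(axis) / n_models for axis in zip(*positions)]
        msd = sum(math.dist(p, mean) ** 2 for p in positions) / n_models
        spreads.append(math.sqrt(msd))
    return spreads

# Two toy models of two atoms: the first atom is fixed, the second moves.
models = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0)],
]
print(per_atom_spread(models))
```

Atoms with a large spread are candidates for the disordered regions of the ensemble.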
Other types of experiments
Additional information about the conformation may be gained by
•H/D-exchange
Deuterium (2H) is invisible to 1H NMR. Disappearing 1H's tell us which
ones are exposed to solvent. Especially amide NH's.
•Temperature sensitivity of resonances.
Chemical shift of 1H changes less with T if H-bonded.
•HSQC
Direct coupling of 15N to 1H through a single bond.
Xray versus NMR
 | X-ray | NMR
% of structures in the PDB | >80% | 16%
Coordinates | Cartesian. One set per molecule. | Internal, converted to ensemble of Cartesian.
Requires | Pure protein, large perfect crystals | Pure protein, high conc., no aggregation
MW limits | >500kD sometimes possible | <30kD
Advantages | More precise. Temperature factors. | Fast, does not require crystals. H/D-exchange.
Disadvantages | Ambiguous sidechains, difficult and time consuming, limited to well-ordered molecules. | Imprecise, limited to smaller macromolecules.
Download an NMR structure
From the PDB, find and download the structure with PDB code
"1I2V" (defensin). Name it "1I2V.pdb"
Open Rasmol with this structure: "rasmol 1I2V.pdb"
From the menu: Display-->backbone
Identify the disordered regions:
set picking ident
Click to get the sequence numbers.
Frequently asked questions with PDB files
What do I do about....
Missing sidechain atoms: If needed, add the atoms and energy
minimize them. Use model building tools. Name the atoms
correctly.
Missing backbone atoms: If small, model using a loop search
and energy minimization. Large missing sections cannot be
cured.
Multiple occupancy (occ < 1.00): Remove all but one of the
multiple copies, using a text editor.
Solvent atoms: (HOH, usually) In general, don’t display them.
Useful only when doing all-atom molecular dynamics.
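The text-editor cleanup for multiple occupancy and solvent can also be scripted. The sketch below works on a list of PDB lines. Keeping the copy labeled "A" (plus atoms with a blank altloc) is an assumed convention, the occupancy values are left unchanged, and the function names are illustrative.

```python
def keep_one_altloc(lines, keep=("", "A")):
    """Keep ATOM/HETATM lines whose altloc (column 17) is blank or 'A'.

    The altloc column of every kept atom line is blanked; other lines pass through.
    """
    kept = []
    for line in lines:
        if line.startswith(("ATOM", "HETATM")):
            if line[16:17].strip() not in keep:
                continue  # drop the other alternate copies
            line = line[:16] + " " + line[17:]
        kept.append(line)
    return kept

def drop_waters(lines):
    """Remove atom lines whose residue name (columns 18-20) is HOH."""
    return [line for line in lines
            if not (line.startswith(("ATOM", "HETATM")) and line[17:20] == "HOH")]
```

A typical use would read the file with `open(...).read().splitlines()`, apply one or both functions, and write the result to a new file so the original download stays intact.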
Frequently asked questions with PDB files
What do I do about....
Ligands: Useful for docking studies. Should be unselected
before drawing surfaces. May be saved as a separate model.
Multiple chains: Unless the molecule is a multimer, use only
one chain. Use a text editor to remove all but one. Avoid this
problem by downloading the PDB “Biological Unit” file.
Multiple (NMR) models: Display all models. Find the
disordered parts. Then edit the file keeping only one model and
removing the disordered parts.
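Keeping one chain or one NMR model can be done the same way as in a text editor, line by line. The sketch below assumes the file marks models with MODEL and ENDMDL records and stores the chain identifier in column 22; the function names and the default chain "A" are illustrative. It does not remove disordered parts, which still has to be done by inspection.

```python
def first_model_only(lines):
    """Keep everything up to the first ENDMDL, plus any final END line."""
    kept, done = [], False
    for line in lines:
        if not done:
            kept.append(line)
            done = line.startswith("ENDMDL")
        elif line.startswith("END") and not line.startswith("ENDMDL"):
            kept.append(line)  # the closing END record of the file
    return kept

def one_chain_only(lines, chain="A"):
    """Keep non-atom lines and the atom lines of a single chain (column 22)."""
    return [line for line in lines
            if not line.startswith(("ATOM", "HETATM")) or line[21:22] == chain]
```

A file without MODEL records passes through `first_model_only` unchanged, so the function is safe to apply to crystal structures as well.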
Introduction to the Industrial Strength
molecular modeling program: MOE
Start moe (on SGI use ‘moe -gfxvisual 0x31’)
Familiarize yourself with the mouse: Help->Mouse...
Familiarize yourself with the graphical user interface (GUI):
Help-->Tutorials-->Getting Started..., click on GUI
Run first part of MoeTour (Help-->Tutorials-->Getting Started)