Overview of the Phase Problem
Protein → Crystal → Data → Phases → Structure
John Rose
ACA Summer School 2006
Reorganized by Andy Howard, Biology 555, Spring 2008
Remember
We can measure reflection intensities
We can calculate structure-factor amplitudes from the intensities
We can calculate the structure factors from atomic positions
We need phase information to generate the image
14 Feb 2008
Biology 555
Crystallographic Phasing I
p. 1 of 42
What is the Phase Problem?
X-ray Diffraction Experiment
All phase information is lost
ρ(x,y,z) [Real Space]
Fhkl [Reciprocal Space]
In the X-ray diffraction experiment photons are reflected from the
crystal lattice (planes) in different directions giving rise to the
diffraction pattern.
Using a variety of detectors (film, image plates, CCD area
detectors) we can estimate intensities but we lose any
information about the relative phase for different
reflections.
Phases
• Let’s define a phase φj associated with a specific plane
[hkl] for an individual atom:
φj = 2π(hxj + kyj + lzj)
• Atom at xj=0.40, yj=0.05, zj=0.10 for plane [213]:
φj = 2π(2*0.40 + 1*0.05 + 3*0.10) = 2π(1.15)
• If we examine a 2-dimensional case like k=0, then
φj = 2π(hxj + lzj)
• Thus for [201] (a two-dimensional case):
φj = 2π(2*0.40 + 0*0.05 + 1*0.10) = 2π(0.90)
• Now, to understand what this means:
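[Figure: the a-c projection of the unit cell with the 201 planes drawn in. Successive planes correspond to phases 0, 2π (360°), 4π (720°), and 6π (1080°). The atom at (0.4, y, 0.1) and reference points A through I are marked.]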
For the atom at (0.40, y, 0.10): φ = 2π[2•(0.40) + 1•(0.10)] = 2π(0.90)
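The phase arithmetic for a single atom can be checked with a few lines of Python. This is only a sketch: the function name `phase_cycles` is made up for illustration, and the phase is reported in units of 2π (cycles) so that it can be compared directly with the hand calculation.

```python
import math

def phase_cycles(hkl, xyz):
    """Return h*x + k*y + l*z, i.e. the phase in units of 2*pi (cycles)."""
    h, k, l = hkl
    x, y, z = xyz
    return h * x + k * y + l * z

# Fractional coordinates of the example atom used in the lecture.
atom = (0.40, 0.05, 0.10)

for hkl in [(2, 1, 3), (2, 0, 1)]:
    cycles = phase_cycles(hkl, atom)
    radians = 2.0 * math.pi * cycles
    # Phases are periodic, so only the fractional part of a cycle matters.
    wrapped_deg = 360.0 * (cycles % 1.0)
    print(hkl, f"phi = 2*pi*{cycles:.2f} = {radians:.3f} rad,",
          f"{wrapped_deg:.0f} deg after wrapping")
```

Because only the fractional part of a cycle matters, a phase of 2π(1.15) describes the same wave offset as 2π(0.15).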
In General for Any Atom (x, y, z)
[Figure: successive hkl planes with spacing dhkl, labeled with phases 0, 2π, 4π, and 6π; atom j at (x, y, z) lies between planes and has phase φ.]
Remember:
We express
(1) any position in the cell in fractional coordinates: pxyz = xja + yjb + zjc
(2) any diffraction vector as the sum of integral multiples of the reciprocal axes:
hkl = ha* + kb* + lc*
Diffraction vector for a Bragg spot
• We set up the diffraction vector hkl associated
with a specific diffraction direction hkl:
hkl = ha* + kb* + lc*
• The magnitude of this diffraction vector is the
reciprocal of our Bragg-law plane spacing dhkl:
|hkl| = 1/ dhkl
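As a quick numerical illustration of |hkl| = 1/dhkl, here is a sketch for the special case of an orthogonal cell, where a* = 1/a, b* = 1/b, c* = 1/c and the reciprocal axes are mutually perpendicular. The cell edges below are hypothetical values chosen only for the example.

```python
import math

def d_spacing_orthogonal(hkl, a, b, c):
    """Plane spacing d_hkl (same units as a, b, c) for a cell with 90-degree angles."""
    h, k, l = hkl
    # Length of the diffraction vector h a* + k b* + l c* when the
    # reciprocal axes are perpendicular and have lengths 1/a, 1/b, 1/c.
    length = math.sqrt((h / a) ** 2 + (k / b) ** 2 + (l / c) ** 2)
    return 1.0 / length

cell = (50.0, 60.0, 70.0)  # hypothetical cell edges in Angstroms

for hkl in [(1, 0, 0), (2, 0, 1), (2, 1, 3)]:
    print(hkl, f"d = {d_spacing_orthogonal(hkl, *cell):.2f} A")
```

Higher-index reflections have longer diffraction vectors and therefore smaller d, which is why they carry the high-resolution information.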
Phase angle for a spot
• The phase angle φj associated with our atom is 2π
times the projection of the displacement vector pj
onto hkl: φj = 2π hkl • pj
• But that displacement vector pj is related to the
real-space coordinates of the atom at position j:
pj = xja + yjb + zjc
where the fractional coordinates of our atom
within the unit cell are (xj, yj, zj)
• Thus φj = 2π (ha* + kb* + lc*) • (xja + yjb + zjc)
Real-space and reciprocal space
• But these real-space and reciprocal-space
unit cell vectors (a,b,c) and (a*,b*,c*) are
duals of one another; that is, they obey:
a•a* = 1, a•b* = 0, a•c* =0
b•a* = 0, b•b* = 1, b•c* =0
c•a* = 0, c•b* = 0, c•c* = 1
• … even when the unit cell isn’t all full of
90-degree angles!
Matrix formulation of this duality
• If we construct the 3x3 reciprocal-space
unit cell matrix A whose rows are a*, b*, c*
• And the 3x3 real-space unit cell matrix
R whose columns are a, b, c
for a specific position of the sample, then
• A and R obey the simple relationship
A = R⁻¹, i.e. AR = I
• Where I is a 3x3 identity matrix
How to use this in getting phases
• φj = 2π (ha* + kb* + lc*) • (xja + yjb + zjc)
• But using those dual relationships,
e.g. a*•a = 1, b*•c = 0, we get
φj = 2π (hxj + kyj + lzj)
• Note that this is true even if our unit cell
angles aren’t 90°!
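To see that the result really does not depend on 90-degree angles, the sketch below builds a triclinic cell, forms the reciprocal axes from cross products (a* = b×c / V, and cyclic permutations), and compares the vector dot product with the fractional-coordinate formula. The cell parameters are hypothetical, and the Cartesian setting (a along x, b in the xy plane) is just one common convention assumed here.

```python
import math

def dot(u, v):
    return sum(p * q for p, q in zip(u, v))

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def combine(coeffs, vectors):
    """Linear combination c0*v0 + c1*v1 + c2*v2 of three 3-vectors."""
    return tuple(sum(c * v[i] for c, v in zip(coeffs, vectors)) for i in range(3))

def real_axes(a, b, c, alpha, beta, gamma):
    """Cartesian a, b, c vectors; lengths in Angstroms, angles in degrees."""
    al, be, ga = (math.radians(t) for t in (alpha, beta, gamma))
    cx = c * math.cos(be)
    cy = c * (math.cos(al) - math.cos(be) * math.cos(ga)) / math.sin(ga)
    cz = math.sqrt(c * c - cx * cx - cy * cy)
    return ((a, 0.0, 0.0), (b * math.cos(ga), b * math.sin(ga), 0.0), (cx, cy, cz))

def reciprocal_axes(axes):
    """a*, b*, c* from cross products divided by the cell volume."""
    va, vb, vc = axes
    volume = dot(va, cross(vb, vc))
    pairs = ((vb, vc), (vc, va), (va, vb))
    return tuple(tuple(x / volume for x in cross(u, v)) for u, v in pairs)

axes = real_axes(40.0, 50.0, 60.0, 80.0, 95.0, 105.0)  # hypothetical triclinic cell
star = reciprocal_axes(axes)

# Duality table: rows a, b, c against columns a*, b*, c*.
for r in axes:
    print([round(dot(r, s), 10) for s in star])

hkl, xyz = (2, 1, 3), (0.40, 0.05, 0.10)
via_vectors = dot(combine(hkl, star), combine(xyz, axes))
via_fractions = sum(i * x for i, x in zip(hkl, xyz))
print(via_vectors, via_fractions)  # both are the phase in units of 2*pi
```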
Why Do We Need the Phase?
Fourier transform
Inverse Fourier transform
Structure Factor
Electron Density
• In order to reconstruct the molecular image
(electron density) from its diffraction pattern both
the intensity and phase, which can assume any
value from 0 to 2π, of each of the thousands of
measured reflections must be known.
Importance of Phases
[Figure: four images computed from mixed amplitudes and phases. Panels: Hauptman amplitudes with Hauptman phases; Karle amplitudes with Karle phases; Hauptman amplitudes with Karle phases; Karle amplitudes with Hauptman phases.]
Phases dominate the image!
Phase estimates need to be accurate
Understanding the Phase Problem
• The phase problem can be best understood from a simple
mathematical construct.
• The structure factors (Fhkl) are treated in diffraction theory
as complex quantities, i.e., they consist of a real part
(Ahkl) and an imaginary part (Bhkl).
• If the phases, hkl, were available, the values of Ahkl and
Bhkl could be calculated from very simple trigonometry:
• Ahkl = |Fhkl| cos (hkl)
• Bhkl = |Fhkl| sin (hkl)
• This leads to the relationship: (Ahkl)2 + (Bhkl)2 = |Fhkl|2 = Ihkl
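A small numerical sketch of these relations, using Python's built-in complex numbers. The amplitude and the two phases are made-up values; the point is that two structure factors with different phases give exactly the same intensity, so the intensity alone cannot tell them apart.

```python
import cmath
import math

amplitude = 120.0  # hypothetical |F_hkl| in electrons

def components(amplitude, phase_deg):
    """Return (A, B, I) for a structure factor of given amplitude and phase."""
    f = cmath.rect(amplitude, math.radians(phase_deg))  # |F| * exp(i*phi)
    a, b = f.real, f.imag
    return a, b, a * a + b * b

a1, b1, i1 = components(amplitude, 135.0)  # hypothetical phase
a2, b2, i2 = components(amplitude, 310.0)  # a different hypothetical phase

print(f"phi=135: A={a1:8.2f} B={b1:8.2f} I={i1:.1f}")
print(f"phi=310: A={a2:8.2f} B={b2:8.2f} I={i2:.1f}")

# Going the other way: the amplitude is recoverable from I, the phase is not.
print(math.sqrt(i1), math.sqrt(i2))
```

The signs of A and B change with the quadrant of the phase, while I stays fixed at |F|².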
Argand Diagram
(Ahkl)² + (Bhkl)² = |Fhkl|² = Ihkl
The above relationships are often
illustrated using an Argand diagram.
From the Argand diagram, it is
obvious that Ahkl and Bhkl may
be either positive or negative,
depending on the value of the
phase angle, φhkl.
[Figure 3. An Argand diagram of structure factor Fhkl with phase φhkl. The real (Ahkl) and imaginary (Bhkl) components are also shown.]
Note: the units of Ahkl, Bhkl and Fhkl are in electrons.
Fhkl = Ahkl + iBhkl
φhkl = tan⁻¹(Bhkl / Ahkl)
The Structure Factor
Atomic scattering factors
$$F_{hkl} = \sum_{j=1}^{N} f_j \, e^{2\pi i (h x_j + k y_j + l z_j)}$$
Here fj is the atomic scattering factor
[Figure: atomic scattering factor f0 plotted against sinθ/λ.]
• The scattering factor for each
atom type in the structure is
evaluated at the correct sinθ/λ.
That value is the scattering
ability for that atom.
• Remember sinθ/λ = 1/(2dhkl)
• We now have an atomic
scattering factor with
magnitude f0 and direction φj
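The structure-factor sum is easy to sketch in code. The three-atom "structure" below is invented for illustration, and the scattering factors are held constant at roughly the atomic number; in a real calculation each fj would be evaluated at the sinθ/λ of the reflection, as described above.

```python
import cmath
import math

def structure_factor(hkl, atoms):
    """Sum f_j * exp(2*pi*i*(h*x_j + k*y_j + l*z_j)) over all atoms."""
    h, k, l = hkl
    total = 0j
    for f, (x, y, z) in atoms:
        total += f * cmath.exp(2j * math.pi * (h * x + k * y + l * z))
    return total

# (f_j, (x_j, y_j, z_j)): hypothetical atoms with constant scattering factors.
atoms = [
    (6.0, (0.40, 0.05, 0.10)),
    (8.0, (0.15, 0.30, 0.75)),
    (7.0, (0.70, 0.60, 0.20)),
]

for hkl in [(0, 0, 0), (2, 0, 1), (2, 1, 3)]:
    f = structure_factor(hkl, atoms)
    phase_deg = math.degrees(cmath.phase(f)) % 360.0
    print(hkl, f"A={f.real:7.2f} B={f.imag:7.2f} |F|={abs(f):6.2f} phi={phase_deg:6.1f} deg")
```

For hkl = (0, 0, 0) every atom contributes with zero phase, so F is simply the total number of electrons in the model.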
The Structure Factor
Sum of all individual atom contributions
[Figure: Argand diagram in which the individual atom contributions fj add head to tail to give the resultant Fhkl, with real component Ahkl and imaginary component Bhkl.]
φj = 2π(hxj + kyj + lzj)
$$F_{hkl} = \sum_{j=1}^{N} f_j \, e^{2\pi i (h x_j + k y_j + l z_j)} = \sum_{j=1}^{N} f_j \, e^{i\varphi_j}$$
Electron Density
• Remember the electron density (image of the molecule) is
the Fourier transform of the structure factor Fhkl. Thus
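$$\rho(x,y,z) = \frac{1}{V}\sum_{hkl} F_{hkl}\, e^{-2\pi i [hx + ky + lz]} = \frac{1}{V}\sum_{hkl} F_{hkl}\, e^{-i\varphi}, \qquad \varphi = 2\pi(hx + ky + lz)$$
Here V is the volume of the unit cell
$$e^{i\varphi} = \cos\varphi + i\sin\varphi, \qquad F_{hkl} = A_{hkl} + iB_{hkl}$$
$$\rho(x,y,z) = \frac{1}{V}\sum_{hkl} \left[ A_{hkl}\cos\varphi + B_{hkl}\sin\varphi \right]$$
$$\rho(x,y,z) = \frac{1}{V}\sum_{hkl} \left\{ A_{hkl}\cos[2\pi(hx + ky + lz)] + B_{hkl}\sin[2\pi(hx + ky + lz)] \right\}$$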
How to calculate ρ(x,y,z)
• In practice, the electron density for one
three-dimensional unit cell is calculated
by starting at x, y, z = (0, 0, 0) and
stepping incrementally along each axis,
summing the terms as shown in the
equation above for all hkl (as limited by
the resolution of the data) at each point in
space.
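The stepping procedure can be sketched in one dimension, which keeps the loop short. Everything specific here is an assumption made for the example: two point-like atoms with constant scattering factors, a unit cell of length 1, a resolution cutoff of |h| ≤ 15, and a grid of 100 points. The summation uses the A cos + B sin form of the density equation.

```python
import cmath
import math

atoms = [(6.0, 0.20), (8.0, 0.65)]  # hypothetical (f_j, x_j) pairs
H_MAX = 15                          # resolution limit of the "data"
CELL_LENGTH = 1.0                   # plays the role of V in one dimension

def structure_factor(h):
    return sum(f * cmath.exp(2j * math.pi * h * x) for f, x in atoms)

# "Measured" reflections, here computed from the model with phases included.
reflections = {h: structure_factor(h) for h in range(-H_MAX, H_MAX + 1)}

def density(x):
    """Fourier summation of A*cos + B*sin over all reflections at point x."""
    total = 0.0
    for h, f in reflections.items():
        angle = 2.0 * math.pi * h * x
        total += f.real * math.cos(angle) + f.imag * math.sin(angle)
    return total / CELL_LENGTH

# Step along the axis from x = 0 and evaluate the sum at each grid point.
grid = [i / 100 for i in range(100)]
rho = [density(x) for x in grid]
highest = grid[max(range(len(grid)), key=rho.__getitem__)]
print("highest density at x =", highest)
```

With correct phases the density peaks fall at the atomic positions, and the heavier atom gives the higher peak. Truncating the sum at H_MAX produces small ripples between the atoms.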
Solving the Phase Problem
• Small molecules
• Direct Methods
• Patterson Methods
• Molecular Replacement
• Macromolecules
• Multiple Isomorphous Replacement (MIR)
• Multi-wavelength Anomalous Dispersion (MAD)
• Single Isomorphous Replacement (SIR)
• Single-wavelength Anomalous Scattering (SAS)
• Molecular Replacement
• Direct Methods (special cases)
Solving the Phase Problem
SMALL MOLECULES:
• The use of Direct Methods has essentially solved the
phase problem for well diffracting small molecule
crystals.
MACROMOLECULES:
• Today, anomalous scattering techniques such as MAD
or SAS are the most common techniques used for de
novo structure determination of macromolecules. Both
techniques require the presence of one or more
anomalous scatterers in the crystal.
Direct methods
• Karle, Hauptman, David Sayre, and
others determined algebraic
relationships among phase angles of
groups of reflections.
• The simplest are triplet relationships:
For three reflections
h1=(h1,k1,l1), h2=(h2,k2,l2), h3=(h3,k3,l3),
they showed that if h3 = -h1 - h2, then
• φ1 + φ2 + φ3 ≈ 0
• Thus if φ1 and φ2 are known then we
can estimate that φ3 ≈ -φ1 - φ2
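A minimal sketch of how one triplet is used: given two reflections and their phases, find the index of the third reflection and the estimate of its phase. The function names and the example indices and phases are invented; the estimate is only as good as the "≈" in the triplet relation.

```python
import math

TWO_PI = 2.0 * math.pi

def third_index(h1, h2):
    """Index of the reflection that closes the triplet: h3 = -h1 - h2."""
    return tuple(-(p + q) for p, q in zip(h1, h2))

def estimate_third_phase(phi1, phi2):
    """Triplet estimate phi3 = -phi1 - phi2, wrapped into [0, 2*pi)."""
    return (-phi1 - phi2) % TWO_PI

h1, h2 = (2, 1, 3), (1, -4, 0)             # hypothetical strong reflections
phi1, phi2 = TWO_PI * 0.42, TWO_PI * 0.15  # hypothetical known phases

h3 = third_index(h1, h2)
phi3 = estimate_third_phase(phi1, phi2)
print("h3 =", h3, " phi3 = 2*pi*%.2f" % (phi3 / TWO_PI))
```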
When do triplet relations hold?
• Note the approximately zero value in that
relationship φ1 + φ2 + φ3 ≈ 0.
• The stronger the Bragg reflections are, the
closer this condition is to being exact.
• For very strong Bragg reflections that sum
will be very close to zero
• For weaker ones it may differ significantly
from zero
Phase probabilities
• This notion of relationships among phases
obliges us to think of phases probabilistically
rather than deterministically. This is a key to
the direct-methods approach and has a huge
influence on how we think about phase
determination.
• I’m introducing all of this mostly to get you
accustomed to the notion of phase
probability distributions!
Phase probabilities
• Any phase has a value between 0 and 2π
(or 0 and 360°, if we’re using degrees)
• If we know it’s close to 2π*0.42, then:
• If it’s 2π*(0.42 ± 0.01), it’s a sharp phase
probability distribution
• If it’s 2π*(0.42 ± 0.32), it’s a much broader
phase probability distribution
Plots of phase probability
• Integral of probability must
be 1, since every phase has
to have some value.
[Figure: P(φ) plotted against φ from 0 to 2π, showing a sharp distribution and a broad distribution.]
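To make "sharp" and "broad" concrete, here is a sketch that tabulates two phase probability distributions centered at 2π*0.42 and normalizes each so that it integrates to 1. The von Mises (circular normal) shape and the two concentration values `kappa` are assumptions made for illustration; the lecture does not specify a functional form.

```python
import math

N = 3600
STEP = 2.0 * math.pi / N  # grid spacing in radians

def phase_distribution(center, kappa):
    """Tabulate exp(kappa*cos(phi - center)) on [0, 2*pi), normalized to unit area."""
    # Subtracting 1 inside the exponent only rescales the curve and avoids overflow.
    raw = [math.exp(kappa * (math.cos(i * STEP - center) - 1.0)) for i in range(N)]
    area = sum(raw) * STEP
    return [v / area for v in raw]

center = 2.0 * math.pi * 0.42
sharp = phase_distribution(center, kappa=250.0)  # assumed: tightly known phase
broad = phase_distribution(center, kappa=1.0)    # assumed: poorly known phase

for name, curve in (("sharp", sharp), ("broad", broad)):
    integral = sum(curve) * STEP
    print(f"{name}: integral = {integral:.6f}, peak height = {max(curve):.3f}")
```

Both curves enclose the same area, so the sharp one must be much taller at its peak.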
How can we use this?
• Obviously if we don’t know φ1 + φ2, we can’t use
this to calculate φ3, even if the intensities of all
three are large.
• But we could guess what φ1 and φ2 are and use this
to compute φ3.
• Then we guess φ4 and use the triplet relationship to
compute φ5 and φ6,
where h5 = -h1 - h4 and h6 = -h2 - h4 …
assuming that reflections 5 and 6 are strong, too!
Can we make this work?
• We start with guessed phases for 10-100 strong
reflections and use the triplet relationships to
determine the phases for another 1000 reflections
• Any particular calculated phase can be determined
by several different triplet relationships, so if
they’re self-consistent, the initial guessed 10-100
are correct; if they aren’t self-consistent, the guess
was wrong!
• In the latter case, we try a different set of guesses
for our 10-100 starting phases and keep going
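One simple way to put a number on "self-consistent" is to treat each triplet-derived estimate of the same phase as a unit vector and look at the length of their average. This particular score is an illustrative choice, not the figure of merit used by any specific direct-methods program, and the estimates below are invented.

```python
import cmath
import math

def consistency(estimates):
    """Return (R, mean_phase) for several estimates of one phase, in radians.

    R is the length of the averaged unit vectors: near 1 when the estimates
    agree, near 0 when they point in unrelated directions.
    """
    resultant = sum(cmath.exp(1j * p) for p in estimates) / len(estimates)
    return abs(resultant), cmath.phase(resultant) % (2.0 * math.pi)

# Hypothetical estimates of one phase from three different triplets.
good_guess = [1.00, 1.05, 0.95]  # starting phases that lead to agreement
bad_guess = [0.00, 2.10, 4.20]   # starting phases that lead to disagreement

for label, estimates in (("good", good_guess), ("bad", bad_guess)):
    r, mean_phase = consistency(estimates)
    print(f"{label}: R = {r:.3f}, combined phase = {mean_phase:.3f} rad")
```

A trial set of starting phases would be kept when most reflections score close to 1 and discarded otherwise.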
This actually works, provided:
• The data are correctly measured
• The data are strong enough that we can pick 1000
strong reflections to use in this process
• The data extend to high enough resolution that
atomicity (separable atoms) is really found
• There are ways to do direct methods without
assuming atomicity, but they’re more complicated
Is this relevant to
macromolecules?
• Not directly:
– Atomicity rarely present
– Systematic errors in data
• Indirectly yes, because it can be used
in conjunction with other methods
for locating heavy atoms in the SIR,
MIR, and SAS methods
• It also helps introduce the notion of
phase probability distributions
(sneaky!)
SIR and SAS Methods
1. Need a heavy atom (lots of electrons) or an anomalous
scatterer (large anomalous scattering signal) in the
crystal.
• SIR - heavy atoms usually soaked in.
• SAS - anomalous scatterers usually engineered in
as selenomethionine labels. Can also be soaked.
2. SIR: collect a native and a derivative data set (2 sets
total). SAS: collect one highly redundant data set and
keep anomalous pairs separate during processing.
• SAS - may want to choose a scatterer or
wavelength that enhances the anomalous signal.
3. Must find the heavy atoms or anomalous scatterers
• can use Patterson analysis or direct methods.
4. Must resolve the bimodal ambiguity.
• use solvent flattening or similar technique
What’s the bimodal ambiguity?
• As we’ll show next time, a single
isomorphous derivative or anomalous
scatterer enables us to measure each phase
apart from an ambiguity
• That is, for each phase we get two answers
(e.g. 2π*0.12 and 2π*0.55), and we can’t
pick one out
• A second scatterer will resolve that
Phase probabilities with no error
• A single derivative with no
error gives a phase
probability like this:
[Figure: P(φ) plotted against φ from 0 to 2π.]
2 derivatives, no error
• The two distributions
overlap at the correct
answer, not at the
wrong answer
[Figure: P(φ) plotted against φ from 0 to 2π for two derivatives. Both distributions peak at the correct phase; the wrong estimate derived from derivative 1 and the wrong estimate derived from derivative 2 fall at different phases.]
Errors spread this out
• Each phase estimate is not really that sharp
• Lack of isomorphism (see below) makes
each distribution spread out
• Joint probability distribution from 2 or more
experiments is the product of the probability
distributions of the individual experiments
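The product rule can be tried out numerically. The sketch below uses the peak positions quoted in this lecture's joint-probability plot (0.32 and 0.558 for the first derivative, 0.315 and 0.815 for the second, in units of phase/2π). The peak shape and the width parameter `KAPPA` are assumptions made for the example.

```python
import math

N = 1000      # grid points across one full cycle of phase
KAPPA = 20.0  # assumed sharpness of each peak

def bimodal(peaks):
    """Unnormalized two-peaked distribution on the grid phase/2pi = i/N."""
    return [sum(math.exp(KAPPA * (math.cos(2.0 * math.pi * (i / N - p)) - 1.0))
                for p in peaks)
            for i in range(N)]

def normalize(curve):
    total = sum(curve)
    return [v / total for v in curve]

p1 = normalize(bimodal((0.32, 0.558)))   # first derivative
p2 = normalize(bimodal((0.315, 0.815)))  # second derivative
joint = normalize([a * b for a, b in zip(p1, p2)])

best = max(range(N), key=joint.__getitem__) / N
print("most probable phase/2pi =", best)
print("joint probability near the wrong answers:",
      joint[round(0.558 * N)], joint[round(0.815 * N)])
```

Each derivative alone cannot choose between its two peaks, but only the peaks that coincide survive the multiplication.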
Realistic probability distributions
• Joint probability
distribution = product
of individual ones
[Figure: P(φ) plotted against φ from 0 to 2π.]
Joint probability distribution
[Figure: phase probability P(phase) plotted against phase/2π from 0 to 1, with curves norm(P1), norm(P2), and norm(P1*P2). P1(φ) for the first derivative has peaks at 0.32 and 0.558; P2(φ) for the second derivative has peaks at 0.315 and 0.815; the joint probability distribution is the normalized product P1(φ) * P2(φ).]
Heavy Atom Derivatives
Heavy atom derivatives MUST be
isomorphous
• Heavy atom derivatives are generally prepared by soaking
crystals in dilute (2 - 20 mM) solutions of heavy atom salts
(see Table II below for some examples).
• Crystal cracking is generally a good indication that the
heavy atom is interacting with the crystal lattice, and
suggests that a good derivative can be obtained by soaking
the crystal in a more dilute solution.
Is the derivative worth using?
• Once derivative data has been collected, the
merging R factor (Rmerge) between the native and
derivative data sets can be used to check for heavy
atom incorporation and isomorphism. Rmerge
values for isomorphous derivatives range from
0.05 to 0.15. Values below 0.05 indicate that there
is little heavy atom incorporation. Values above
0.15 indicate a lack of isomorphism between the
two crystals.
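The screening rule above can be sketched as a small function. The lecture does not spell out the formula, so the definition used here is an assumption: the summed absolute difference between derivative and native amplitudes divided by the summed native amplitudes, for reflections already placed on a common scale. The amplitude lists are invented.

```python
def r_between(native, derivative):
    """Assumed definition: sum(|F_deriv - F_native|) / sum(F_native)."""
    numerator = sum(abs(d - n) for n, d in zip(native, derivative))
    return numerator / sum(native)

def assess(r):
    """Apply the 0.05 and 0.15 thresholds quoted in the lecture."""
    if r < 0.05:
        return "little heavy atom incorporation"
    if r > 0.15:
        return "lack of isomorphism"
    return "isomorphous derivative, worth using"

# Hypothetical scaled amplitudes for the same four reflections.
native = [100.0, 250.0, 80.0, 160.0]
derivative = [110.0, 230.0, 88.0, 171.0]

r = r_between(native, derivative)
print(f"R = {r:.3f}: {assess(r)}")
```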
What is isomorphism?
• Isomorphism for derivatives means that the
structure of the derivatized macromolecule
is identical to the structure of the
underivatized molecule except at the site
where the derivative compound has been
introduced.
What is lack of isomorphism?
• A derivative may be nonisomorphous if:
– It alters the unit cell lengths or angles
significantly (>0.2%?)
– It rotates or translates the entire macromolecule
within the unit cell
– It alters significantly the conformation of a
large segment (> 8 amino acids or 4
nucleotides?) of the macromolecule
Derivative compounds
Table II. Protein Residues and Their Affinities for Heavy Metals
Residue                     Affinity for                      Conditions
Histidine                   K2PtCl4, NaAuCl4, EtHgPO4H2       pH>6
Tryptophan                  Hg(OAc)2, EtHgPO4H2
Glutamic, Aspartic Acids    UO2(NO3)2, rare earth cations     pH>5
Cysteine                    Hg, Ir, Pt, Pd, Au cations        pH>7
Methionine                  PtCl4²⁻ anion
Finding the Heavy Atoms
or Anomalous Scatterers
The Patterson function
- a Fourier transform of |F|² with all phases φ = 0
- vector map (u,v,w instead of x,y,z)
- maps all inter-atomic vectors
- get N² vectors!!
(where N = number of atoms)
$$P(u,v,w) = \frac{1}{V}\sum_{hkl} |F_{hkl}|^2 \cos 2\pi(hu + kv + lw)$$
From Glusker, Lewis and Rossi
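A one-dimensional sketch shows what a Patterson map looks like. The two-atom structure, the unit scattering factors, the cutoff |h| ≤ 20, and the unit cell length are all assumptions made for the example. Only |F|² enters the sum, so no phases are needed.

```python
import cmath
import math

atom_positions = [0.10, 0.40]  # hypothetical fractional coordinates, f_j = 1
H_MAX = 20                     # resolution limit of the "data"

def intensity(h):
    """|F_h|^2 for the model; this is all the experiment provides."""
    f = sum(cmath.exp(2j * math.pi * h * x) for x in atom_positions)
    return abs(f) ** 2

def patterson(u):
    """P(u) = sum over h of |F_h|^2 * cos(2*pi*h*u), with unit cell length."""
    return sum(intensity(h) * math.cos(2.0 * math.pi * h * u)
               for h in range(-H_MAX, H_MAX + 1))

for u in (0.0, 0.15, 0.30, 0.50, 0.70):
    print(f"u = {u:.2f}  P(u) = {patterson(u):7.2f}")
```

With N = 2 atoms there are N² = 4 vectors: two zero-length self-vectors that pile up at the origin, and the vectors +0.30 and -0.30 (which appears at u = 0.70) between the atoms. The map shows the separation of the atoms but not where either one sits in the cell.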