Macromolecular Crystallography and Structural Genomics – Recent Trends


Prof. D. Velmurugan

Department of Crystallography and Biophysics, University of Madras, Guindy Campus, Chennai – 25.

Structural genomics aims to identify as many new folds as possible.

This requires faster ways of determining three-dimensional structures, as there are many sequences for which structural information is not yet available.

Although the Molecular Replacement technique is still used in crystallography for solving homologous structures, the method fails if the percentage of homology is not sufficient.

Multiwavelength Anomalous Diffraction (MAD) techniques have taken over from the conventional Multiple Isomorphous Replacement (MIR) technique.

With the advent of high-energy synchrotron sources, powerful detectors for the diffracted intensities, and developments in the methodologies of macromolecular structure determination, there has been a steep increase in the number of macromolecular structures determined. On average, eight new structures are deposited in the PDB every day, and the total number of entries in the PDB is now around 29,000.

Instead of the three-wavelength strategy used in MAD experiments, single-wavelength anomalous diffraction using sulphur anomalous scattering has recently been proposed. This will reduce the data collection time to one third.

Also, the judicious use of radiation damage during redundant data measurements at second-generation synchrotron sources, and during regular data collection at third-generation synchrotron sources, has been pointed out recently (RIP and RIPAS).

Protein Structure Determination

• X-ray crystallography
• NMR spectroscopy
• Neutron diffraction
• Electron microscopy
• Atomic force microscopy

As the number of available amino acid sequences far exceeds the number of available three-dimensional structures, high throughput is essential in every aspect of X-ray crystallography.

Procedure

Protein Crystal

[Figure: The 14 Bravais lattices, grouped by crystal system: 1 Triclinic, 2 Monoclinic, 3 Orthorhombic, 4 Rhombohedral, 5 Tetragonal, 6 Hexagonal, 7 Cubic. Blue numbers in the figure correspond to the crystal system.]

Synchrotron radiation

More intense X-rays at shorter wavelengths mean higher resolution and much quicker data collection.

Diffraction Apparatus

Diffraction Principles

$$n\lambda = 2d\sin\theta$$
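As a small worked example of Bragg's law, the sketch below computes the diffraction angle for a given lattice spacing and, conversely, the spacing for a given angle. The function names and the Cu Kα wavelength of 1.5418 Å are illustrative assumptions and do not come from the talk.

```python
import math

def bragg_angle_deg(d_spacing, wavelength, order=1):
    """Return theta in degrees such that n*lambda = 2*d*sin(theta)."""
    s = order * wavelength / (2.0 * d_spacing)
    if not 0.0 < s <= 1.0:
        raise ValueError("n*lambda/(2d) must lie in (0, 1] for diffraction")
    return math.degrees(math.asin(s))

def d_spacing_from_angle(theta_deg, wavelength, order=1):
    """Return the spacing d (same units as the wavelength) for a Bragg angle."""
    return order * wavelength / (2.0 * math.sin(math.radians(theta_deg)))

CU_K_ALPHA = 1.5418  # Angstrom; assumed laboratory wavelength for illustration

theta = bragg_angle_deg(2.0, CU_K_ALPHA)
print(f"d = 2.0 A diffracts at theta = {theta:.2f} degrees")
print(f"round trip: d = {d_spacing_from_angle(theta, CU_K_ALPHA):.3f} A")
```

A shorter wavelength gives a smaller angle for the same spacing, which is why more reflections (higher resolution) fit on the detector at a synchrotron.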

The diffraction experiment

• Atomic scattering factor: the ratio of the amplitude of the wave scattered by an atom to that scattered by a single electron.
• Structure factor $F_{hkl}$: the ratio of the amplitude of the wave scattered by all the atoms in a unit cell to that scattered by a single electron. It is a vector (amplitude and phase) representing the overall scattering from a particular set of Bragg planes.

The structure factor magnitude $|F(hkl)|$ is represented by the length of a vector in the complex plane.

The phase angle $\alpha(hkl)$ is the angle, measured counterclockwise, between the positive real axis and the vector $F$.

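$$F(h,k,l) = V \int_{x=0}^{1}\int_{y=0}^{1}\int_{z=0}^{1} \rho(x,y,z)\,\exp[2\pi i(hx+ky+lz)]\,dx\,dy\,dz$$

Here $F(h,k,l)$ is a reflection, $\rho(x,y,z)$ is the electron density, and the integral runs over the unit cell.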

$V$ = the volume of the unit cell

$|F_{hkl}|$ = the structure-factor amplitude (proportional to the square root of the reflection intensity)

$\alpha_{hkl}$ = the phase associated with the structure-factor amplitude $|F_{hkl}|$

We can measure the amplitudes, but the phases are lost in the experiment.

This is the phase problem.

The Fourier transform requires both structure-factor amplitudes and phases

Electron density calculation

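$$\rho(x,y,z) = \frac{1}{V}\sum_h\sum_k\sum_l |F_{hkl}|\,\exp[-2\pi i(hx+ky+lz) + i\alpha_{hkl}]$$

The phases $\alpha_{hkl}$ are unknown.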
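To make the role of the unknown phases concrete, here is a minimal one-dimensional sketch using NumPy's discrete Fourier transform as a stand-in for the crystallographic one. The toy "unit cell" of 32 grid points with two atoms is an assumption made purely for illustration. NumPy's forward transform uses the opposite sign in the exponent to the structure-factor formula in these slides, which does not matter as long as the forward and inverse transforms are used as a pair.

```python
import numpy as np

n = 32                      # grid points in the toy 1D unit cell (assumed)
rho = np.zeros(n)
rho[3] = 2.0                # a heavier atom
rho[10] = 1.0               # a lighter atom

F = np.fft.fft(rho)         # complex structure factors
amplitudes = np.abs(F)      # what the experiment measures (sqrt of intensity)
phases = np.angle(F)        # what the experiment loses

# Fourier synthesis with amplitudes AND phases recovers the density.
rho_with_phases = np.fft.ifft(amplitudes * np.exp(1j * phases)).real

# Fourier synthesis with amplitudes only (all phases set to zero) does not.
rho_without_phases = np.fft.ifft(amplitudes).real

print("with phases   :", np.allclose(rho_with_phases, rho))
print("without phases:", np.allclose(rho_without_phases, rho))
```

With all phases set to zero the synthesis piles its largest peak at the origin instead of at the atomic positions.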

Patterson function

• Patterson space has the same dimensions as the real-space unit cell.
• The peaks in the Patterson map are expressed in fractional coordinates.
• To avoid confusion, the x, y and z dimensions of Patterson vector-space are called (u, v, w).

What does the Patterson function represent?

• It represents a density map of the vectors between scattering atoms in the cell.
• Patterson density is proportional to the product of the scattering power of the two atoms involved; therefore the electron-rich (i.e. heavy) atoms contribute more to the Patterson map than the light atoms.

Patterson function – no phase info required

Consider the phaseless term $(h, k, l, F^2)$:

$$P(u,v,w) = \frac{1}{V}\sum_h\sum_k\sum_l |F_{hkl}|^2\,\exp[-2\pi i(hu+kv+lw)]$$

There is no phase term.
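The following sketch illustrates, in one dimension, that a Fourier synthesis using only intensities gives a map of interatomic vectors. The 32-point toy cell and the two atom positions are assumptions chosen for illustration; NumPy's FFT stands in for the crystallographic transform.

```python
import numpy as np

n = 32                      # grid points in the toy 1D unit cell (assumed)
rho = np.zeros(n)
rho[3] = 2.0                # heavy atom at x = 3
rho[10] = 1.0               # light atom at x = 10

# Intensities |F|^2: all phase information is discarded here.
intensities = np.abs(np.fft.fft(rho)) ** 2

# Patterson function: Fourier synthesis with |F|^2 as coefficients.
patterson = np.fft.ifft(intensities).real

peaks = [int(u) for u in np.flatnonzero(patterson > 1e-9)]
print(peaks)                # peaks at u = 0 (origin), 7 and 25 (= -7 mod 32)
print(patterson[0], patterson[7])
```

The origin peak collects every atom's vector to itself (2² + 1² = 5), while the peaks at ±7 have a height given by the product of the two atomic weights (2 × 1), which is why heavy atoms dominate the map.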

Density and position

Direct space, density:

$$\rho(\mathbf{r}) = \sum_{hkl} F(\mathbf{S})\,\exp(-2\pi i\,\mathbf{r}\cdot\mathbf{S})$$

Direct space, Patterson map:

$$P(\mathbf{u}) = \int_{cell} \rho(\mathbf{r})\,\rho(\mathbf{r}+\mathbf{u})\,d^3r = \sum_{hkl} I(\mathbf{S})\,\exp(-2\pi i\,\mathbf{u}\cdot\mathbf{S})$$

Reciprocal space, amplitudes and phases:

$$F(\mathbf{S}) = \int_{cell} \rho(\mathbf{r})\,\exp(2\pi i\,\mathbf{r}\cdot\mathbf{S})\,d^3r$$

Reciprocal space, intensities:

$$I(\mathbf{S}) = F^*(\mathbf{S})\,F(\mathbf{S}) = |F(\mathbf{S})|^2$$

Patterson map with symmetry

Patterson map symmetry

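Space group $P2_1$, equivalent positions: $x, y, z$ and $-x, y+1/2, -z$

Harker vectors: $u, v, w = 2x, 1/2, 2z$

$$P(\mathbf{u}) = \int_{cell} \rho(\mathbf{r})\,\rho(\mathbf{r}+\mathbf{u})\,d^3r$$

$$P(\mathbf{u}) = \sum_{hkl} I(\mathbf{S})\,\exp(-2\pi i\,\mathbf{u}\cdot\mathbf{S})$$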
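As a worked example of the $P2_1$ Harker vector, the sketch below takes an atom at fractional coordinates (x, y, z), generates its symmetry mate (-x, y+1/2, -z) and forms the vector between them. The function names and the sample coordinates are hypothetical.

```python
def p21_mate(x, y, z):
    """Symmetry-equivalent position (-x, y + 1/2, -z), wrapped into [0, 1)."""
    return (-x % 1.0, (y + 0.5) % 1.0, -z % 1.0)

def harker_vector(x, y, z):
    """Vector from the symmetry mate to the atom, wrapped into [0, 1)."""
    xm, ym, zm = p21_mate(x, y, z)
    return ((x - xm) % 1.0, (y - ym) % 1.0, (z - zm) % 1.0)

# Hypothetical heavy-atom position in fractional coordinates.
u, v, w = harker_vector(0.10, 0.30, 0.20)
print(round(u, 3), round(v, 3), round(w, 3))
```

The result has the form (2x, 1/2, 2z) whatever the value of y, so these peaks all lie on the Harker section v = 1/2. Reading u and w off that section gives x and z of the heavy atom; y is arbitrary for the first atom in $P2_1$.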

Diffracting a Cat

[Figure panels: diffraction data with phase information; real diffraction data]

Reconstructing a Cat

[Figure panels: FT, easy; FT, hard]

The importance of phases

Phasing Methods

all assume some prior knowledge of the electron density or structure

The Phase Problem

• Diffraction data only record intensity, not phase information (half the information is missing).
• To reconstruct the image properly you need to have the phases (even approximately):
– Guess the phases (molecular replacement)
– Search phase space (direct methods)
– Bootstrap phases (isomorphous replacement)
– Use differing wavelengths (anomalous dispersion)

Acronyms for phasing techniques

• MR
• SIR
• MIR
• SIRAS
• MIRAS
• MAD
• SAD

Direct methods

• Based on the positivity and atomicity of electron density that leads to phase relationships between the (normalized) structure factors (E).

• Used to solve small-molecule structures
• Proteins up to ~1000 atoms, at resolution better than 1.2 Å
• Used in computer programs (SnB, SHELXD, SHARP) to find the heavy-atom substructure

Jerome Karle and Herbert A. Hauptman: Nobel Prize in Chemistry, 1985

Density modification procedures (e.g. solvent flattening and averaging) can be carried out as part of a cyclic process

DM cycle

[Figure: the density modification cycle, with steps labelled phase combination, Fourier transformation, map modification and inverse Fourier transformation.]

Molecular Replacement (MR)

Used when there is a homology model available (sequence identity > 25%).

1. Orientation of the model in the new unit cell (rotation function)
2. Translation

Molecular Replacement (MR)

• MR works because the Fourier transform works in both directions.
– Reflections ↔ model (density)
• Have to be careful of model bias

[Figure labels: new protein; coordinates in PDB]

MR solution

Isomorphous replacement

• Why isomorphous replacement (making heavy-atom derivatives)?
– Phase determination
• Calculating $F_H$: $F_H = F_{PH} - F_P$. If the heavy-atom (HA) position is known, $F_H$ can be calculated from $\rho(x_H, y_H, z_H)$ by inverse FT.
• HA position determination
– Patterson function
• The HA shifts $F_P$ by $F_H$.

Isomorphous Replacement (SIR, MIR)

– Collect data on native crystals (no metals).
– Soak heavy-metal compounds (e.g. Hg, Pt, Au compounds) into the crystals; they go to specific sites in the unit cell.
– The unit cell must remain isomorphous.
– Collect data on the derivatives.
– As a result, only the intensities of the reflections change, not the indices.
– Measure the reflection intensity differences between native and derivative data sets.
– Find the positions of the heavy atoms in the unit cell from the intensity differences: generate vector maps (Patterson maps).

$$|F_{P+HA}| - |F_P| \approx |F_{HA}|$$

• Must have at least two heavy-atom derivatives.
• The main limitations in obtaining accurate phasing from MIR are non-isomorphism and incomplete incorporation (low occupancy) of the heavy-atom compound.

[Figure: native and heavy-atom derivative diffraction patterns, superimposed and shifted vertically. Panels: native crystal; HA derivative crystal.]

Note the intensity differences for certain reflections.

Note the identical unit cell (reflection positions); this suggests isomorphism.

Isomorphous HA derivatives change only the intensities of the reflections, not their indices.

Once we have a heavy-atom structure $\rho_H(\mathbf{r})$, we can use this to calculate $F_H(\mathbf{S})$. In turn, this allows us to calculate phases for $F_P$ and $F_{PH}$ for each reflection.

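Harker diagram

Harker construction for SIR

[Figure: Harker construction for SIR, with the vectors $F_P$, $-F_H$ and $F_{PH}$ and the protein phase $\varphi_P$ labelled.]

The phase probability distribution shows that SIR results in a phase ambiguity.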

We can use a second derivative to resolve the phase ambiguity
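The ambiguity, and its resolution by a second derivative, can be sketched numerically. Since $F_{PH} = F_P + F_H$, the cosine rule gives $\cos(\varphi_P - \varphi_H) = (|F_{PH}|^2 - |F_P|^2 - |F_H|^2)/(2|F_P||F_H|)$, which has two solutions for $\varphi_P$. The function names and all numerical values below are invented for illustration and assume error-free data.

```python
import cmath
import math

def sir_phase_candidates(fp_amp, fph_amp, f_h):
    """Two possible protein phases (degrees) from |F_P|, |F_PH| and complex F_H."""
    fh_amp, phi_h = abs(f_h), cmath.phase(f_h)
    cos_delta = (fph_amp**2 - fp_amp**2 - fh_amp**2) / (2.0 * fp_amp * fh_amp)
    cos_delta = max(-1.0, min(1.0, cos_delta))   # guard against rounding
    delta = math.acos(cos_delta)
    return [math.degrees(phi_h + sign * delta) % 360.0 for sign in (+1, -1)]

def angular_diff(a, b):
    """Smallest absolute difference between two angles in degrees."""
    return abs((a - b + 180.0) % 360.0 - 180.0)

# Invented 'true' values, used only to simulate the measured amplitudes.
f_p = cmath.rect(10.0, math.radians(50.0))    # protein, phase 50 degrees
f_h1 = cmath.rect(3.0, math.radians(20.0))    # heavy atoms, derivative 1
f_h2 = cmath.rect(4.0, math.radians(120.0))   # heavy atoms, derivative 2

c1 = sir_phase_candidates(abs(f_p), abs(f_p + f_h1), f_h1)
c2 = sir_phase_candidates(abs(f_p), abs(f_p + f_h2), f_h2)
common = [a for a in c1 for b in c2 if angular_diff(a, b) < 1.0]

print("derivative 1:", [round(c, 1) for c in c1])
print("derivative 2:", [round(c, 1) for c in c2])
print("consistent  :", [round(c, 1) for c in common])
```

Each derivative alone leaves two candidate phases; only one candidate is shared by both, which is the geometric content of the MIR Harker construction. With real data the circles do not intersect exactly, which is why probabilistic treatments are used instead.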

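MIR

Harker construction for multiple isomorphous replacement (MIR)

[Figure: Harker construction with two derivatives (vectors $F_P$, $-F_H$, $-F_{H2}$, $F_{PH}$, $F_{PH2}$) and the phase probability distributions $P_H(\varphi_P)$, $P_{H2}(\varphi_P)$ and their product $P_H(\varphi_P)\cdot P_{H2}(\varphi_P)$.]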

AS

Anomalous scattering leads to a breakdown of Friedel's law.

Anomalous scattering data can also be used to resolve the phase ambiguity.

[Figure: Harker construction for an anomalous derivative, with the vectors $F_P^+$, $F_{PH}^+$, $F_{PH}^{-*}$, $-F_H'^{+}$, $-F_H''^{+}$ and $-F_H''^{-*}$, and the phase probability distributions $P_+(\varphi_P)$, $P_-(\varphi_P)$ and their product.]

Note that the anomalous differences are very small; thus very accurate data are necessary.

Of course, there are errors in the data, in the determination of heavy atom positions, etc.

Blow and Crick developed a model in which all errors are associated with $|F_{PH}|_{obs}$. The triangle formed by $F_P$, $F_{PH}$ and $F_H$ fails to close.

The 'lack of closure error' $\epsilon$ is a function of the calculated phase angle $\varphi_P$:

$$P(\varphi_P) \propto \exp\left[-\frac{\epsilon^2(\varphi_P)}{2E^2}\right]$$

where $E$ is the estimated r.m.s. lack-of-closure error.

The resulting phases have a minimum error when the best phase $\varphi_{best}$, i.e. the centroid of the phase distribution,

$$\varphi_{best} = \int_0^{2\pi} \varphi_P\,P(\varphi_P)\,d\varphi_P$$

is used instead of the most probable phase.

The quality of the phases is indicated by the figure of merit $m$:

$$m = \frac{\left|\int_0^{2\pi} P(\varphi_P)\exp(i\varphi_P)\,d\varphi_P\right|}{\int_0^{2\pi} P(\varphi_P)\,d\varphi_P}$$

• $m = 1$: 0° phase error
• $m = 0.5$: ~60° phase error
• $m = 0$: all phases equally probable
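A numerical sketch of the figure of merit for a sampled phase probability distribution follows. The test distributions (sharp peaks built from $\exp[\kappa(\cos\Delta\varphi - 1)]$ with an arbitrary $\kappa$) are invented for illustration, and the best phase is taken here as the direction of the complex centroid of the distribution, which is an assumption of this sketch.

```python
import numpy as np

def best_phase_and_fom(phi_deg, prob):
    """Return (best phase in degrees, figure of merit m) for sampled P(phi)."""
    phi = np.radians(phi_deg)
    centroid = np.sum(prob * np.exp(1j * phi)) / np.sum(prob)
    return float(np.degrees(np.angle(centroid)) % 360.0), float(abs(centroid))

phi_deg = np.arange(0.0, 360.0, 1.0)          # 1-degree sampling of the phase circle

def peak(center_deg, kappa=50.0):
    """Sharp, unnormalized peak centred at center_deg (illustrative shape)."""
    return np.exp(kappa * (np.cos(np.radians(phi_deg - center_deg)) - 1.0))

unimodal = peak(50.0)                          # one well-defined phase
bimodal = peak(350.0) + peak(110.0)            # SIR-like doublet, 60 deg either side of 50
uniform = np.ones_like(phi_deg)                # no phase information

for name, p in [("unimodal", unimodal), ("bimodal", bimodal), ("uniform", uniform)]:
    phase, m = best_phase_and_fom(phi_deg, p)
    print(f"{name:9s} m = {m:.3f}")
```

The bimodal case gives m close to 0.5: the centroid points midway between the two peaks, 60° away from each, in line with the rule of thumb quoted above.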

Steps in MAD

• Introduce an anomalous scatterer
– Incorporate SeMet in place of Met
– Incorporate a heavy atom (HA), e.g. Hg, Pt, etc.
• Atomic scattering factor: 3 terms ($f = f_0 + f' + if''$)
• Take your crystals to a synchrotron beamline (tunable wavelength).
• Collect data sets at 3 separate wavelengths: the Se (or other HA) absorption peak, the edge, and a wavelength distant from the peak.
• Measure the differences between Friedel mates to get an estimate of the phases for the Se atoms.
– These differences are quite small, so one needs to collect a lot of data (completeness, redundancy) to get a good estimate of the error associated with each measurement.
• Use the Se positions to obtain phase estimates for the protein atoms.

Advantages of MAD

• All data are collected from one crystal
– Perfect isomorphism
• Fast
• Easily interpretable electron density maps obtained right away

SAD

Single-wavelength anomalous diffraction (SAD) phasing has become increasingly popular in protein crystallography.

Two main steps: 1) obtaining the initial phases; 2) improving the electron density map calculated with the initial phases.

• The essential point is to break the intrinsic phase ambiguity.
• Two kinds of phase information enable the discrimination of phase doublets from SAD data prior to density modification:
– from heavy atoms (expressed by the Sim distribution)
– from direct-methods phase relationships (expressed by the Cochran distribution)

Breaking the OAS phase ambiguity

$$\varphi_{\mathbf{h}} = \varphi''_{\mathbf{h}} + \Delta\varphi_{\mathbf{h}}$$

where $\varphi_{\mathbf{h}}$ is the phase of the reflection $\mathbf{h}$ and $\varphi''_{\mathbf{h}}$ is the phase of $F''_{\mathbf{h}}$.

[Equation: probability for the sign of $\Delta\varphi_{\mathbf{h}}$, combining the Cochran distribution ($P_{Cochran}$) with the Sim distribution.]

[Figure: electron density maps for rusticyanin. Panel labels: Mlphare; Oasis. Method labels: Sim distribution; Cochran distribution; solvent flattening.]

Rusticyanin: MW 16.8 kDa; space group $P2_1$; a = 32.43, b = 60.68, c = 38.01 Å; β = 107.82°; anomalous scatterer: Cu

Radiation damage Induced Phasing (RIP)

• Radiation damage has been a curse of macromolecular crystallography from its early days.
• X-ray radiation damage of crystals can be caused by the breakage of covalent bonds as an immediate consequence of the absorption of an X-ray quantum (a primary effect), or by the destructive effect of the propagation of radicals throughout the crystal (a secondary effect).
• Total dose and dose rate play a role in the amount of radiation damage inflicted on a protein crystal.
• The most pronounced structural changes observed were disulphide-bond breakage and associated main-chain and side-chain movements, as well as decarboxylation of aspartate and glutamate residues.
• The structural changes induced on the sulphur atoms were successfully used to obtain high-quality phase estimates through an RIP (Radiation damage Induced Phasing) procedure.

Radiation damage Induced Phasing with Anomalous Scattering (RIPAS)

Substructure solution and phasing procedure using a combination of anomalous scattering and radiation damage induced isomorphous differences.

The RIPAS strategy is beneficial for both locating the substructure and subsequent phasing.

[Figure: experimental electron density before solvent flattening with SAD (left), RIP (middle) and RIPAS (right) phases for (a) the CS thaumatin data (thaumatin crystal soaked in a diluted N-iodosuccinimide solution) and (b) the IC thaumatin data (iodinated crystallized thaumatin).]

Methods of phase improvement

It is not always (!) possible to recognise features in a first electron density map. There are, however, ways of improving the map (phases):

• Solvent flattening
• Histogram matching
• Non-crystallographic symmetry (NCS) averaging

These methods can result in dramatic improvements in the clarity of the electron density map.

1. Solvent flattening.

Protein crystals contain large amounts of solvent; this is in general disordered, and so does not contribute to the crystal diffraction. Knowing the protein content of the crystal, it is therefore possible to determine a threshold density below which the map is treated as noise; points with density below the threshold are set to a suitable average value. This is particularly useful for locating molecular boundaries.

2. Averaging.

If the asymmetric unit possesses more than one molecule, the equivalencing of the various copies can lead to dramatic improvement in the map and the phases.
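The thresholding idea behind solvent flattening can be sketched as follows. This is a deliberately simplified illustration that thresholds the raw map point by point; practical programs first smooth the map to define a solvent envelope. The function name and the toy map are hypothetical.

```python
import numpy as np

def flatten_solvent(rho, solvent_fraction):
    """Set the lowest-density fraction of grid points to their mean value.

    rho              : array of map values on a grid
    solvent_fraction : fraction of the cell assumed to be solvent (0..1)
    Returns the flattened map and the boolean solvent mask.
    """
    threshold = np.quantile(rho, solvent_fraction)
    solvent = rho < threshold            # points treated as solvent
    flattened = rho.copy()
    if solvent.any():
        flattened[solvent] = rho[solvent].mean()
    return flattened, solvent

# Toy 'map' with ten grid points and an assumed 30% solvent content.
toy_map = np.arange(10, dtype=float)
flat, mask = flatten_solvent(toy_map, 0.3)
print(flat)
print(mask)
```

In a real density modification cycle the flattened map is back-transformed to give new phases, which are combined with the experimental ones before the next cycle.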

Improvement in electron density after solvent flattening and histogram matching

[Figure panels: before; after. Green = solvent envelope.]

Interpretation of the Electron Density (Building the Model)

• Lots of fun!

• Trace the main-chain.
• Try to recognize the amino acid sequence in the density.
• Programs: XtalView, O

The effect of resolution on the quality of the electron density map

[Figure panels: 2.0 Å; 1.5 Å; 1.2 Å]

• 5.0 Å: see shape of molecule
• 3.0 Å: see main-chain and some side chains
• 2.5 Å: see main-chain carbonyls
• 1.5 Å: ~ atomic resolution

Resolution

[Figure panels: 1.2 Å; 2 Å; 3 Å]

Atomic resolution

Fitting side chains, adding waters

• If the density is good enough you can recognize alternate conformations for side-chains.
• Hydrogens are not seen in the density, except in ultra-high resolution structures (< 1.0 Å).
• Ordered waters are seen on the surface and occasionally in the interior of the protein. At 2.0 Å resolution or better, expect ~1 water per residue. Water molecules play a big role in protein stability and enzyme catalysis.
• Because the density depends on experimental phases, which have errors associated with them, the first model can have many errors.
• Therefore it is essential to refine the atomic positions and their thermal parameters.

Chain Tracing

[Figure panels: electron density; chain trace; final model]

Map coefficients used to minimize model bias

• $2F_o - F_c$: most common map seen in papers.
• $F_o - F_c$: (difference map) used with the above map to detect errors.

$$\rho(x,y,z) = \frac{1}{V}\sum_{hkl} |F_{hkl}|\,\exp[-2\pi i(hx+ky+lz) + i\alpha_{hkl}]$$
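As a minimal sketch of how such coefficients are formed, the function below combines observed amplitudes with calculated amplitudes and phases. It assumes the two sets are already on the same scale and ignores the weighting schemes used by real programs; the function name and the sample values are hypothetical.

```python
import numpy as np

def map_coefficients(f_obs, f_calc):
    """Return (2Fo-Fc, Fo-Fc) coefficients using the model phases.

    f_obs  : observed amplitudes |Fo| (real, non-negative)
    f_calc : complex structure factors calculated from the model
    """
    f_obs = np.asarray(f_obs, dtype=float)
    f_calc = np.asarray(f_calc, dtype=complex)
    model_phase = np.exp(1j * np.angle(f_calc))       # exp(i * alpha_calc)
    two_fo_fc = (2.0 * f_obs - np.abs(f_calc)) * model_phase
    fo_fc = (f_obs - np.abs(f_calc)) * model_phase
    return two_fo_fc, fo_fc

# Two made-up reflections: the first agrees with the model, the second does not.
two_fo_fc, fo_fc = map_coefficients([5.0, 3.0], [3 + 4j, -2j])
print(two_fo_fc)
print(fo_fc)
```

Where |Fo| equals |Fc| the difference coefficient vanishes and the 2Fo-Fc coefficient reduces to Fc; disagreement between data and model is what produces features in the difference map.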

Refinement Cycle

Refinement: improving the agreement between the model and the experimental density.

• Compare $F_{obs}$ (from reflection intensities) to $F_{calc}$ (calculated from the model)
• Least squares minimization
• Simulated annealing / molecular dynamics
• R factor = numerical indicator to follow the progress of refinement (agreement between data and model)

$$R = \frac{\sum \big|\,|F_o| - |F_c|\,\big|}{\sum |F_o|}$$

$F_c$ = calculated structure factor; $F_o$ = observed structure factor

[Figure: refinement cycle linking data, calculate map, fit model, refine and model; plot of R against the number of refinement iterations.]
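The R factor is a one-line computation once observed and calculated amplitudes are on a common scale. The sketch below assumes that scaling has already been done; the function name and the three sample reflections are invented for illustration.

```python
import numpy as np

def r_factor(f_obs, f_calc):
    """R = sum(| |Fo| - |Fc| |) / sum(|Fo|), assuming a common scale."""
    f_obs = np.abs(np.asarray(f_obs, dtype=float))
    f_calc = np.abs(np.asarray(f_calc))     # accepts complex Fc as well
    return float(np.sum(np.abs(f_obs - f_calc)) / np.sum(f_obs))

# Three made-up reflections.
print(round(r_factor([10.0, 20.0, 30.0], [12.0, 18.0, 30.0]), 4))
```

Only amplitudes enter the formula, so R measures agreement with the measured data and says nothing directly about the phases.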

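The best Fourier is calculated from

$$\rho_{best}(\mathbf{r}) = \frac{1}{V}\sum_{\mathbf{S}} m\,|F_P(\mathbf{S})|\,\exp[-2\pi i\,\mathbf{r}\cdot\mathbf{S} + i\varphi_{P,best}(\mathbf{S})]$$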

Protein Data Bank growth

• Molecular biology: cloning of genes / over-expression of proteins
• Synchrotron radiation: MAD phasing, smaller crystals
• Cryo-cooling of crystals: collect data from 1 crystal, increase order
• Instrumentation and software improvements
• Increase in the number of labs using the technique

• Due to the advent of synchrotron radiation and the seleno-methionine derivatization technique, the total number of protein structures deposited in the PDB from 1980 onwards has increased dramatically.
• The MAD technique played a major role in this. At present nearly 100 new structures are deposited every week.

THANK YOU