PowerPoint Presentation - Bioinformatics 2 -

Short fast history of protein design
Site-directed mutagenesis -- protein engineering (J. Wells, 1980's)
Coiled coils, helix bundles (W. DeGrado, 1980's-90's)
Extreme protein stabilization (S. Mayo, 1990's)
Binding pocket design (H. Hellinga, 2000)
New fold design (B. Kuhlman, 2002-4)
Protein-protein interface design (J. Gray, 2004)
Experimental (non-computational) approaches:
• in vitro evolution
• phage display
Other names in protein design: Hill, Vriend, Regan, D. Baker, Richardson,
Dunbrack, Choma, and several more.
The goal of sequence design
Given a desired structure, find an amino acid sequence that
folds to that structure.
[Figure: sequences (MIKYGTKIYRINSDNSG, KJHGCKAHNEEEGHA) and a structure, linked by arrows labeled "folding" and "design".]
To do this, we must assign an energy to each possible
sequence.
Theoretical complexity of
sequence design
To design THE OPTIMAL sequence, we need the best amino
acid, and its best rotamer, at every position. We can treat each
position as one of 193 possible rotamers: 191 rotamers
in the Richardson library, plus Gly and Ala (which have no
rotamers).
How many possible sets of rotamers are there for a protein of
length 100?
$193^{100} \approx 3.6 \times 10^{228}$
DEE reduces the complexity of sequence design to about
$(193L)^2 \approx 3.7 \times 10^{8}$ (for L = 100)
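As a quick check on the arithmetic, both numbers can be worked out with logarithms. This is only a sketch of the calculation, taking L = 100 as in the example.

```latex
\begin{align*}
\log_{10}\left(193^{100}\right) &= 100 \log_{10} 193 \approx 100 \times 2.2856 = 228.56 \\
193^{100} &\approx 10^{0.56} \times 10^{228} \approx 3.6 \times 10^{228} \\
(193 L)^2 &= (193 \times 100)^2 = 19300^2 = 372{,}490{,}000 \approx 3.7 \times 10^{8}
\end{align*}
```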
Good news for protein designers
Sequence space maps to structure space
..as many-to-one.
[Figure: many sequence families map to one fold.]
This means that there is a lot of potential for "slop" in a sequence
design. Moderately big sequence changes are possible, and the
sequence can still fold to the same general structure.
reminder
Dead end elimination theorem
$E(i_r) + \sum_{j \ne i} \min_s E(i_r, j_s) > E(i_t) + \sum_{j \ne i} \max_s E(i_t, j_s)$
This can be translated into plain English as follows:
If the "worst case scenario" for t is better than the
"best case scenario" for r, then t always beats r, and r can be eliminated.
DEE algorithm
[Figure: DEE worksheet for three residues (1, 2, 3), each with rotamers a, b and c. It tabulates the rotamer-template energies E(r1) and E(r2) and the matrix of pairwise rotamer-rotamer energies E(r1,r2).]
Find two columns (rotamers) within the same residue, where one is
always better than the other. Eliminate the rotamer that can always be
beaten. Repeat until only one rotamer per residue remains.
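The elimination loop can be sketched in a few lines of Python. This is a minimal illustration, not the worksheet from the slide: the energies are made-up values, and the names `self_e`, `pair_e`, `pair` and `dee` are assumptions introduced here.

```python
# Hypothetical energies for 3 residues with 2 rotamers each (not the slide's worksheet).
self_e = {1: {"a": -1, "b": 3}, 2: {"a": 0, "b": 1}, 3: {"a": 2, "b": -2}}
# pair_e[(i, r, j, s)] with i < j is the energy between rotamer r of i and s of j.
pair_e = {
    (1, "a", 2, "a"): 0, (1, "a", 2, "b"): 1, (1, "b", 2, "a"): 0, (1, "b", 2, "b"): 0,
    (1, "a", 3, "a"): 0, (1, "a", 3, "b"): -1, (1, "b", 3, "a"): 1, (1, "b", 3, "b"): 0,
    (2, "a", 3, "a"): 0, (2, "a", 3, "b"): 0, (2, "b", 3, "a"): 5, (2, "b", 3, "b"): 1,
}


def pair(i, r, j, s):
    """Look up a pairwise energy regardless of residue order."""
    return pair_e[(i, r, j, s)] if i < j else pair_e[(j, s, i, r)]


def dee(self_e):
    """Repeat single-rotamer eliminations until no rotamer can be removed."""
    alive = {i: set(rots) for i, rots in self_e.items()}

    def bound(i, r, pick):
        # E(i_r) plus, for each other residue j, the min (or max) pair energy
        # over the rotamers of j that are still alive.
        return self_e[i][r] + sum(
            pick(pair(i, r, j, s) for s in alive[j]) for j in alive if j != i
        )

    changed = True
    while changed:
        changed = False
        for i in alive:
            for r in sorted(alive[i]):
                # r is a dead end if some t is better even in t's worst case.
                if any(bound(i, r, min) > bound(i, t, max) for t in alive[i] - {r}):
                    alive[i].discard(r)
                    changed = True
    return alive


survivors = dee(self_e)
print(survivors)
```

For these made-up numbers the loop leaves one rotamer per residue (1a, 2a and 3b). In general DEE is not guaranteed to reduce every residue to a single rotamer, in which case the survivors still have to be searched.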
DEE with alternative sequences
[Figure: DEE worksheet extended to alternative sequences. The rotamer columns for a position may belong to different amino acids; here position 3 carries rotamers of both Asp and Leu. Tables: E(r1), E(r2), and pairwise E(r1,r2).]
"Rotamers" within the DEE framework can have different atoms, i.e. they can
be different amino acids. Using DEE, we choose the best set of rotamers, which
gives us the sequence of the lowest energy structure. In the example, position 3
can be Asp (D) or Leu (L).
Sequence design using DEE
• Selected residues (or all) are chosen for mutating.
• Selected (or all) amino acids are allowed at those
positions.
• For the selected amino acids, all rotamers are
considered.
Now "rotamer" comes to mean the amino acid identity
and its conformation.
Since there are as many as 193 rotamers in the
rotamer library for all amino acids, each selected
position can have as many as 193 "rotamers."
If "fine grained" rotamers are used, this number may
be much larger.
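To make the extended meaning of "rotamer" concrete, the sketch below builds the list of choices at one design position as (amino acid, conformation index) pairs. The rotamer counts are illustrative placeholders, not values taken from a real library (only the single conformation for Gly and Ala follows the slides), and `ROTAMER_COUNTS` and `design_choices` are hypothetical names.

```python
# Placeholder rotamer counts per amino acid; NOT real library values.
# Gly and Ala get one entry each because they have no sidechain rotamers.
ROTAMER_COUNTS = {"GLY": 1, "ALA": 1, "ASP": 3, "LEU": 4}


def design_choices(allowed_amino_acids):
    """Return every (amino acid, rotamer index) pair allowed at one position."""
    return [
        (aa, k)
        for aa in allowed_amino_acids
        for k in range(ROTAMER_COUNTS[aa])
    ]


# A position that may be either Asp or Leu.
choices = design_choices(["ASP", "LEU"])
print(len(choices), choices[:2])
```

Each pair then occupies one row and column of the DEE matrix, exactly like an ordinary rotamer.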
DEE with alternative sequences and ligands
Ligand conformers.
[Figure: the DEE worksheet with an added position L for the ligand, whose conformers fill extra rows and columns. Position 3 again carries Asp and Leu rotamers. Tables: E(r1), E(r2), and pairwise E(r1,r2).]
Ligands can have multiple conformations and locations within the active site.
In DEE, each position of the ligand is another “rotamer”, i.e. another row and
column in the DEE matrix.
Sidechain modeling
Given a backbone conformation and the sequence,
can we predict the sidechain conformations?
Energy calculations are sensitive to small changes. So
the wrong sidechain conformation will give the wrong
energy.
Goal of sidechain modeling
Given the sequence and
only the backbone atom
coordinates, accurately
model the positions of the
sidechains.
fine lines = true structure
thick lines = sidechain predictions
using the method of Desmet et al.
Desmet et al, Nature v.356, pp339-342 (1992)
Sidechain space is discrete, almost
A random sampling of phenylalanine sidechains, when
superimposed, falls into three classes: rotamers.
This simplifies the problem of sidechain modeling.
All we have to do is select the right rotamers and we're close to the
right answer.
What determines rotamers
3-bond or 1-4 interactions define the preferred angles, but these may
differ greatly in energy depending on the atom groups involved.
[Figure: three staggered conformations about the CA-CB bond (atoms N, CA, CB, CG, H, O=C), labeled:]
"m": -60° gauche
"t": 180° anti/trans
"p": +60° gauche
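The three labels can be assigned to a measured chi angle by picking the nearest ideal value on the circle. This is a minimal sketch; `classify_chi` and `angular_distance` are hypothetical helpers, and real sidechains deviate from the ideal angles.

```python
IDEAL_ANGLES = {"m": -60.0, "t": 180.0, "p": 60.0}  # degrees, as labeled on the slide


def angular_distance(a, b):
    """Smallest separation between two angles in degrees (0 to 180)."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def classify_chi(angle_deg):
    """Return the label (m, t or p) whose ideal angle is closest."""
    return min(
        IDEAL_ANGLES,
        key=lambda label: angular_distance(angle_deg, IDEAL_ANGLES[label]),
    )


for angle in (-65.0, 175.0, -170.0, 62.0):
    print(angle, classify_chi(angle))
```

Measuring distance on the circle matters for the trans class, where -170° and +175° belong to the same rotamer.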
Rotamer Libraries
Rotamer libraries have been compiled by clustering the
sidechains of each amino acid over the whole database. Each
cluster is a representative conformation (or rotamer), and is
represented in the library by the best sidechain angles (chi
angles), the "centroid" angles, for that cluster.
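One simple way to get a representative angle for a cluster is a circular mean, which handles the wrap-around at ±180°. This is an assumption for illustration only; the published libraries may define their representative angles differently. `circular_mean_deg` is a hypothetical helper.

```python
import math


def circular_mean_deg(angles_deg):
    """Mean direction of a set of angles, in degrees within [-180, 180]."""
    s = sum(math.sin(math.radians(a)) for a in angles_deg)
    c = sum(math.cos(math.radians(a)) for a in angles_deg)
    return math.degrees(math.atan2(s, c))


# A trans cluster straddling the +/-180 boundary: a plain average would
# give 60 degrees, far from every member of the cluster.
trans_cluster = [170.0, -170.0, 180.0]
print(circular_mean_deg(trans_cluster))
print(circular_mean_deg([-70.0, -60.0, -50.0]))
```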
Two commonly used rotamer libraries:
*Jane & David Richardson:
http://kinemage.biochem.duke.edu/databases/rotamer.php
Roland Dunbrack: http://dunbrack.fccc.edu/bbdep/index.php
*rotamers of W on the previous page are from the Richardson library.
Dead end elimination theorem
• There is a global minimum energy conformation (GMEC),
where each residue has a unique rotamer.
In other words: the GMEC is the set of rotamers that has the
lowest energy.
• Energy is pairwise additive: the total energy can be broken down
into pairwise interactions. Each atom is either fixed (backbone)
or movable (sidechain).
fixed-fixed: E is a constant, = E_template
fixed-movable: E depends on the rotamer, but is independent of other rotamers
movable-movable: E depends on the rotamer, and depends on the surrounding rotamers
Theoretical complexity of
sidechain modeling
The Global Minimum Energy Configuration (GMEC) is one,
unique set of rotamers.
How many possible sets of rotamers are there?
$n_1 \times n_2 \times n_3 \times n_4 \times n_5 \times \dots \times n_L$
where $n_1$ is the number of rotamers for residue 1, and so on.
Estimated complexity for a protein of 100 residues, with an
average of 5 rotamers per position: $5^{100} \approx 8 \times 10^{69}$
DEE reduces the complexity of the problem from $5^L$ to
approximately $(5L)^2$
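The count of rotamer combinations is just the product of the per-residue rotamer numbers. The sketch below evaluates it for the uniform case of 5 rotamers per position and for a made-up, non-uniform set of counts; the list `counts` is hypothetical.

```python
import math

L = 100
uniform = [5] * L                      # 5 rotamers at every position
total = math.prod(uniform)             # n1 * n2 * ... * nL = 5**100
print(f"5^100 is about 10^{math.log10(total):.1f}")
print(f"(5L)^2 = {(5 * L) ** 2}")

# Hypothetical non-uniform counts for a 6-residue peptide.
counts = [1, 3, 9, 1, 4, 27]
print(math.prod(counts))
```

Residues with a single rotamer (such as Gly or Ala) contribute a factor of 1, so they do not add to the search space.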
Dead end elimination theorem
• Each residue is numbered (i or j) and each residue has a set of
rotamers (r, s or t). So, the notation i_r means "choose rotamer r
for position i".
• The total energy is the sum of the three components
(fixed-fixed, fixed-movable, movable-movable):
$E_{global} = E_{template} + \sum_i E(i_r) + \sum_i \sum_{j>i} E(i_r, j_s)$
where r and s are any choice of rotamers, and each pair of residues is counted once.
NOTE:
$E_{global} \ge E_{GMEC}$
for any choice of rotamers.
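The energy sum can be written out directly. In this sketch the template energy, the rotamer energies and the data layout (`self_e`, `pair_e`) are made-up values and names used only for illustration.

```python
E_TEMPLATE = -10.0  # fixed-fixed term: constant for every rotamer choice (hypothetical)

# Fixed-movable terms E(i_r): rotamer r of residue i against the template.
self_e = {1: {"a": -1.0, "b": 3.0}, 2: {"a": 0.0, "b": 1.0}}
# Movable-movable terms E(i_r, j_s), stored once per residue pair (i < j).
pair_e = {(1, "a", 2, "a"): 0.0, (1, "a", 2, "b"): 1.0,
          (1, "b", 2, "a"): 0.0, (1, "b", 2, "b"): -2.0}


def global_energy(choice):
    """E_global for a dict mapping residue number -> chosen rotamer."""
    residues = sorted(choice)
    fixed_movable = sum(self_e[i][choice[i]] for i in residues)
    movable_movable = sum(
        pair_e[(i, choice[i], j, choice[j])]
        for n, i in enumerate(residues)
        for j in residues[n + 1:]
    )
    return E_TEMPLATE + fixed_movable + movable_movable


energies = {(r, s): global_energy({1: r, 2: s}) for r in "ab" for s in "ab"}
print(energies)
print("lowest:", min(energies, key=energies.get))
```

Because the template term is the same for every choice, it shifts all energies equally and never changes which set of rotamers is lowest.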
Dead end elimination theorem
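• If i_g is in the GMEC and i_t is not, then we can separate the
terms that contain i_g or i_t and write out the inequality E_GMEC < E_notGMEC.
$E_{GMEC} = E_{template} + E(i_g) + \sum_{j \ne i} E(i_g, j_g) + \sum_{j \ne i} E(j_g) + \sum_{j \ne i} \sum_{k > j,\, k \ne i} E(j_g, k_g)$
...is less than...
$E_{notGMEC} = E_{template} + E(i_t) + \sum_{j \ne i} E(i_t, j_g) + \sum_{j \ne i} E(j_g) + \sum_{j \ne i} \sum_{k > j,\, k \ne i} E(j_g, k_g)$
Canceling the terms that appear on both sides, we get:
$E(i_t) + \sum_{j \ne i} E(i_t, j_g) > E(i_g) + \sum_{j \ne i} E(i_g, j_g)$
We do not know the GMEC rotamers j_g in advance, but the minimum and maximum
over all rotamers s of each other residue bound these sums.
So, if we find two rotamers i_r and i_t, and:
$E(i_r) + \sum_{j \ne i} \min_s E(i_r, j_s) > E(i_t) + \sum_{j \ne i} \max_s E(i_t, j_s)$
Then i_r cannot possibly be in the GMEC.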
Dead end elimination theorem
$E(i_r) + \sum_{j \ne i} \min_s E(i_r, j_s) > E(i_t) + \sum_{j \ne i} \max_s E(i_t, j_s)$
This can be translated into plain English as follows:
If the "worst case scenario" for rotamer t is better than
the "best case scenario" for rotamer r, then you can
eliminate r.
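A small worked example may help. Suppose, purely for illustration, that residue i has two neighboring residues and that the (made-up) energies are E(i_r) = 3 with lowest pair energies 0 and 0, and E(i_t) = -1 with highest pair energies 1 and 0.

```latex
\begin{align*}
\text{best case for } r &: \; E(i_r) + \sum_{j \ne i} \min_s E(i_r, j_s) = 3 + 0 + 0 = 3 \\
\text{worst case for } t &: \; E(i_t) + \sum_{j \ne i} \max_s E(i_t, j_s) = -1 + 1 + 0 = 0 \\
3 &> 0 \;\Rightarrow\; i_r \text{ is eliminated}
\end{align*}
```

Whatever rotamers the neighbors adopt, swapping r for t lowers the energy by at least 3, so r can never be part of the lowest-energy set.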
Exercise: Dead End Elimination
Using the DEE worksheet:
(1) Find a rotamer that satisfies the DEE theorem.
(2) Eliminate it.
(3) Repeat until each residue has only one rotamer.
What is the final GMEC energy?
DEE exercise
Three sidechains, each with three rotamers. Therefore, there are
3x3x3=27 ways to arrange the sidechains.
• Each rotamer has an energy E(r), which is the non-bonded energy
between sidechain and template.
• Each pair of rotamers has an interaction energy E(r1,r2),
which is the non-bonded energy between sidechains.
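With only 27 combinations, the exercise can also be checked by brute force. The sketch below enumerates every combination for a made-up set of energies (these are not the worksheet values, and pairs not listed are taken as 0); `self_e`, `pair_e` and `total_energy` are names assumed here.

```python
from itertools import product

# Hypothetical E(r): rotamer-template energies for residues 1-3, rotamers a-c.
self_e = {
    1: {"a": 0, "b": 2, "c": 1},
    2: {"a": 1, "b": -1, "c": 3},
    3: {"a": 2, "b": 0, "c": -2},
}
# Hypothetical E(r1, r2), keyed (i, r, j, s) with i < j; missing pairs count as 0.
pair_e = {
    (1, "a", 2, "b"): 5,
    (1, "c", 2, "b"): -1,
    (1, "a", 3, "c"): -1,
    (2, "b", 3, "c"): 4,
}


def total_energy(choice):
    """Sum of E(r) and E(r1, r2) terms for a dict residue -> rotamer."""
    residues = sorted(choice)
    energy = sum(self_e[i][choice[i]] for i in residues)
    for n, i in enumerate(residues):
        for j in residues[n + 1:]:
            energy += pair_e.get((i, choice[i], j, choice[j]), 0.0)
    return energy


residues = sorted(self_e)
combos = [
    dict(zip(residues, rots))
    for rots in product(*(sorted(self_e[i]) for i in residues))
]
gmec = min(combos, key=total_energy)
print(len(combos), gmec, total_energy(gmec))
```

In this made-up case the lowest-energy set is 1a, 2a, 3c with energy -2, even though 2b has the lowest energy against the template: the pairwise terms decide the outcome. Enumeration only works for tiny problems, which is why DEE is needed for real proteins.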
DEE exercise
[Figure: DEE exercise worksheet for residues 1, 2 and 3 with rotamers a, b and c: rotamer-template energies E(r1), E(r2) and the pairwise energy matrix E(r1,r2).]
DEE exercise: instructions
If the "best case scenario" for one rotamer is worse than the "worst case
scenario" for another rotamer of the same residue, you can eliminate the first.
(1) The best (worst) energies are found using the worksheet:
add E(r1) to the sum, over the other residues, of the lowest (highest)
E(r1,r2) among the rotamers r2 that have not been previously eliminated.
(2) There are 9 possible DEE comparisons to make: 1a versus
1b, 1a versus 1c, 1b versus 1c, 2a versus 2b, etc. For each
comparison, find the minimum and maximum energy
choices of the other rotamers. If the maximum energy of one rotamer
is less than the minimum energy of the other, eliminate the other.
(3) Scratch out the eliminated rotamer and repeat until one
rotamer per position remains.
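The nine comparisons can be listed mechanically. This is a small sketch using the exercise's labels (residues 1 to 3, rotamers a to c); the variable names are assumptions.

```python
from itertools import combinations

rotamers = {1: "abc", 2: "abc", 3: "abc"}

# Unordered pairs of rotamers within the same residue.
comparisons = [
    (f"{i}{r}", f"{i}{t}")
    for i, rots in rotamers.items()
    for r, t in combinations(rots, 2)
]
print(len(comparisons), comparisons[:3])
```

Each pair has to be tested in both directions, since either rotamer might be the one that is eliminated, and the list shrinks as rotamers are scratched out.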