Protein Secondary Structures
Assignment and prediction
Use of secondary structure
• Classification of protein structures
• Definition of loops (active sites)
• Use in fold recognition methods
• Improvements of alignments
• Definition of domain boundaries
Classification of secondary structure
• Defining features
– Dihedral angles
– Hydrogen bonds
– Geometry
• Assigned manually by crystallographers, or
• Assigned automatically
– DSSP (Kabsch & Sander, 1983)
– STRIDE (Frishman & Argos, 1995)
– DSSPcont (Andersen et al., 2002)
Dihedral Angles
phi   - dihedral angle about the N-Calpha bond
psi   - dihedral angle about the Calpha-C bond
omega - dihedral angle about the C-N (peptide) bond
From http://www.imb-jena.de
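As a minimal sketch of how such an angle can be computed from atomic coordinates, the function below returns the signed dihedral about the bond between the two middle points. The function name and the tuple-based coordinates are illustrative assumptions, not part of the slides. With the usual backbone atoms, phi would use C(i-1), N(i), Calpha(i), C(i); psi would use N(i), Calpha(i), C(i), N(i+1); omega would use Calpha(i), C(i), N(i+1), Calpha(i+1).

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle (degrees) about the p1-p2 bond.

    Hypothetical helper: each point is an (x, y, z) tuple.
    """
    b0 = _sub(p1, p0)          # bond p0 -> p1
    b1 = _sub(p2, p1)          # central bond p1 -> p2
    b2 = _sub(p3, p2)          # bond p2 -> p3
    n1 = _cross(b0, b1)        # normal of the plane (p0, p1, p2)
    n2 = _cross(b1, b2)        # normal of the plane (p1, p2, p3)
    x = _dot(n1, n2)
    y = math.sqrt(_dot(b1, b1)) * _dot(b0, n2)
    return math.degrees(math.atan2(y, x))

# cis arrangement gives 0 degrees, trans gives 180 degrees
cis = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1))
trans = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1))
quarter = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
```

The sign follows the common convention that a clockwise rotation of the far bond, seen along the central bond, is positive.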
Helices
                           phi(deg)  psi(deg)  H-bond pattern
----------------------------------------------------------------
right-handed alpha-helix    -57.8     -47.0     i+4
pi-helix                    -57.1     -69.7     i+5
310-helix                   -74.0      -4.0     i+3
(omega is 180 deg in all cases)
----------------------------------------------------------------
From http://www.imb-jena.de
Beta Strands
               phi(deg)  psi(deg)  omega(deg)
-----------------------------------------------------------------
beta strand     -120      120       180
-----------------------------------------------------------------
Hydrogen bond patterns in beta sheets. Here a four-stranded
beta sheet is drawn schematically which contains three
antiparallel and one parallel strand. Hydrogen bonds are
indicated with red lines (antiparallel strands) and green lines
(parallel strands) connecting the hydrogen and receptor oxygen.
From http://broccoli.mfn.ki.se/pps_course_96/
Secondary Structure Elements
ß-strand
Helix
Bend
Turn
Helix formation is local
Thyroid hormone receptor (2nll), residues i and i+3 labelled
β-sheet formation is NOT local
Erabutoxin (3ebx)
Secondary Structure Type Descriptions
* H = alpha helix
* G = 310-helix
* I = 5 helix (pi helix)
* E = extended strand, participates in beta ladder
* B = residue in isolated beta-bridge
* T = hydrogen bonded turn
* S = bend
* C = coil
Automatic assignment programs
• DSSP ( http://www.cmbi.kun.nl/gv/dssp/ )
• STRIDE ( http://www.hgmp.mrc.ac.uk/Registered/Option/stride.html )
• DSSPcont ( http://cubic.bioc.columbia.edu/services/DSSPcont/ )
DSSP
  #  RESIDUE AA STRUCTURE BP1 BP2  ACC     TCO  KAPPA   ALPHA     PHI     PSI  X-CA  Y-CA  Z-CA
  1     4 A   E             0   0   205   0.000  360.0   360.0   360.0   113.5   5.7  42.2  25.1
  2     5 A   H             0   0   127  -0.987  360.0  -152.8  -149.1   154.0   9.4  41.3  24.7
  3     6 A   V             0   0    66  -0.995    4.6  -170.2  -134.3   126.3  11.5  38.4  23.5
  4     7 A   I   E        23   0A  106  -0.976   13.9  -170.8  -114.8   126.6  15.0  37.6  24.5
  5     8 A   I   E        22   0A   74  -0.972   20.8  -158.4  -125.4   129.1  16.6  34.9  22.4
  6     9 A   Q   E        21   0A   86  -0.910   29.5  -170.4   -98.9   106.4  19.9  33.0  23.0
  7    10 A   A   E        20   0A   18  -0.852   11.5   172.8  -108.1   141.7  20.7  31.8  19.5
  8    11 A   E   E        19   0A   63  -0.933    4.4   175.4  -139.1   156.9  23.4  29.4  18.4
  9    12 A   F   E        18   0A   31  -0.967   13.3  -160.9  -160.6   151.3  24.4  27.6  15.3
 10    13 A   Y   E        17   0A   36  -0.994   16.5  -156.0  -136.8   132.1  27.2  25.3  14.1
 11    14 A   L   E        16   0A   24  -0.929   11.7  -122.6  -120.0   133.5  28.0  24.8  10.4
 12    15 A   N   T         0   0    54  -0.884   84.3     9.0  -113.8   150.9  29.7  22.0   8.6
 13    16 A   P   T         0   0   114  -0.963  125.4    60.5   -86.5     8.5  32.0  21.6   6.8
 14    17 A   D   T         0   0    66   0.752   89.3  -146.2   -64.6   -23.0  33.0  25.2   7.6
 15    18 A   Q   T         0   0   132   0.936   51.1   134.1    52.9    50.0  33.3  24.2  11.2
 16    19 A   S   E        11   0A   44  -0.877   28.9   174.9  -124.8   156.8  32.1  27.7  12.3
 17    20 A   G   E        10   0A   28  -0.893   15.9  -146.5  -151.0  -178.9  29.6  28.7  14.8
 18    21 A   E   E         9   0A   14  -0.979    5.0  -169.6  -158.6   146.0  28.0  31.5  16.7
 19    22 A   F   E         8   0A    3  -0.982   27.8   149.2  -139.1   120.3  26.5  32.2  20.1
 20    23 A   M   E         7  30A    0  -0.983   39.7  -127.8  -152.1   161.6  24.5  35.4  20.6
 21    24 A   F   E         6  29A   45  -0.934   23.9  -164.1  -112.5   137.7  21.7  37.0  22.6
 22    25 A   D   E         5  27A    6  -0.948    6.9  -165.0  -123.7   138.3  18.9  38.9  20.8
 23    26 A   F   E         4  26A   76  -0.947   78.4   -27.2  -127.3   111.5  16.4  41.3  22.3
 24    27 A   D   T         0   0    74   0.904  128.9   -46.6    50.4    45.0  13.4  42.1  20.2
 25    28 A   G   T         0   0    20   0.291  118.8   109.3    84.7   -11.1  15.4  41.4  17.0
 26    29 A   D   E        23   0A  114  -0.822   71.8  -114.7  -103.1   140.3  18.4  43.4  18.1
 27    30 A   E   E        22   0A    8  -0.525   24.9  -177.7   -74.1   127.5  21.8  41.8  19.1
[Not shown: the turn, bend, chirality and bridge-label flags of the STRUCTURE field, and the four hydrogen-bond columns N-H-->O, O-->H-N, N-H-->O, O-->H-N.]
Prediction of protein secondary structure
• What to predict?
• How to predict?
• How good are the best?
Secondary Structure Prediction
• What to predict?
– All 8 types or pool types into groups
DSSP
* H = alpha helix          -> H
* G = 310-helix            -> H
* I = 5 helix (pi helix)   -> H
* E = extended strand      -> E
* B = beta-bridge          -> E
* T = hydrogen bonded turn -> C
* S = bend                 -> C
* C = coil                 -> C
Secondary Structure Prediction
• What to predict?
– All 8 types or pool types into groups
Straight HEC
* H = alpha helix          -> H
* E = extended strand      -> E
* T = hydrogen bonded turn -> C
* S = bend                 -> C
* C = coil                 -> C
* G = 310-helix            -> C
* I = 5 helix (pi helix)   -> C
* B = beta-bridge          -> C
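The two pooling schemes above can be written as lookup tables. This is a small sketch: the names `DSSP_POOLING`, `STRAIGHT_HEC` and `pool` are illustrative, and treating blank or "." entries as coil is an assumption. The test string is the DSSP line from the Example slide.

```python
# 8-state DSSP symbols pooled into 3 states (H, E, C).
DSSP_POOLING = {"H": "H", "G": "H", "I": "H",
                "E": "E", "B": "E",
                "T": "C", "S": "C", "C": "C"}

# "Straight HEC": only H and E are kept, everything else becomes C.
STRAIGHT_HEC = {"H": "H", "E": "E",
                "G": "C", "I": "C", "B": "C",
                "T": "C", "S": "C", "C": "C"}

def pool(states, table):
    """Map a string of 8-state symbols to 3 states.

    Assumption: any symbol missing from the table (blank, '.') is coil.
    """
    return "".join(table.get(s, "C") for s in states)

dssp = "HHHHHHHHHHHHTGGG."
pooled_dssp = pool(dssp, DSSP_POOLING)      # G counts as helix
pooled_straight = pool(dssp, STRAIGHT_HEC)  # G counts as coil
```

The choice of scheme matters when comparing accuracies: the same prediction can score differently depending on whether G, I and B are counted as helix/strand or as coil.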
Secondary Structure Prediction
• Simple alignments
– Align to a close homolog for which the structure has been experimentally solved.
• Heuristic methods (e.g., Chou-Fasman, 1974)
– Apply scores for each amino acid and sum up over a window.
• Neural networks (different inputs)
– Raw sequence (late 80's)
– Blosum matrix (e.g., PHD, early 90's)
– Position-specific alignment profiles (e.g., PsiPred, late 90's)
– Multiple networks balloting, probability conversion, output expansion (Petersen et al., 2000).
The pessimistic point of view
Prediction by alignment
[Figure labels: HoMo, 1D, FoRc]
… the art of being humble
Secondary structure predictions of 1st and 2nd generation
• single residues (1st generation), 1957-70/80
– Chou-Fasman, GOR: 50-55% accuracy
• segments (2nd generation), 1986-92
– GORIII: 55-60% accuracy
• problems
– < 100% (they said: 65% max)
– < 40% (they said: strand non-local)
– short segments
Improvement of accuracy
1974 Chou & Fasman      ~50-53%
1978 Garnier            63%
1987 Zvelebil           66%
1988 Qian & Sejnowski   64.3%
1993 Rost & Sander      70.8-72.0%
1997 Frishman & Argos   <75%
1999 Cuff & Barton      72.9%
1999 Jones              76.5%
2000 Petersen et al.    77.9%
Simple Alignments
• Solved structure of a homolog to query is needed
• Homologous proteins have ~88% identical (3 state) secondary structure
• If no close homolog can be identified, alignments will give almost random results
Amino acid preferences in α-helix
Amino acid preferences in β-strand
Amino acid preferences in coil
Chou-Fasman
Name  P(a)  P(b)  P(turn)  f(i)   f(i+1)  f(i+2)  f(i+3)
Ala   142    83     66     0.060  0.076   0.035   0.058
Arg    98    93     95     0.070  0.106   0.099   0.085
Asp   101    54    146     0.147  0.110   0.179   0.081
Asn    67    89    156     0.161  0.083   0.191   0.091
Cys    70   119    119     0.149  0.050   0.117   0.128
Glu   151    37     74     0.056  0.060   0.077   0.064
Gln   111   110     98     0.074  0.098   0.037   0.098
Gly    57    75    156     0.102  0.085   0.190   0.152
His   100    87     95     0.140  0.047   0.093   0.054
Ile   108   160     47     0.043  0.034   0.013   0.056
Leu   121   130     59     0.061  0.025   0.036   0.070
Lys   114    74    101     0.055  0.115   0.072   0.095
Met   145   105     60     0.068  0.082   0.014   0.055
Phe   113   138     60     0.059  0.041   0.065   0.065
Pro    57    55    152     0.102  0.301   0.034   0.068
Ser    77    75    143     0.120  0.139   0.125   0.106
Thr    83   119     96     0.086  0.108   0.065   0.079
Trp   108   137     96     0.077  0.013   0.064   0.167
Tyr    69   147    114     0.082  0.065   0.114   0.125
Val   106   170     50     0.062  0.048   0.028   0.053
Chou-Fasman
1. Assign all of the residues in the peptide the appropriate set of parameters.
2. Scan through the peptide and identify regions where 4 out of 6 contiguous residues have P(a-helix) > 100. That region is declared an alpha-helix. Extend the helix in both directions until a set of four contiguous residues that have an average P(a-helix) < 100 is reached. That is declared the end of the helix. If the segment defined by this procedure is longer than 5 residues and the average P(a-helix) > P(b-sheet) for that segment, the segment can be assigned as a helix.
3. Repeat this procedure to locate all of the helical regions in the sequence.
4. Scan through the peptide and identify a region where 3 out of 5 of the residues have a value of P(b-sheet) > 100. That region is declared as a beta-sheet. Extend the sheet in both directions until a set of four contiguous residues that have an average P(b-sheet) < 100 is reached. That is declared the end of the beta-sheet. Any segment of the region located by this procedure is assigned as a beta-sheet if the average P(b-sheet) > 105 and the average P(b-sheet) > P(a-helix) for that region.
5. Any region containing overlapping alpha-helical and beta-sheet assignments is taken to be helical if the average P(a-helix) > P(b-sheet) for that region. It is a beta-sheet if the average P(b-sheet) > P(a-helix) for that region.
6. To identify a bend at residue number j, calculate the following value:
p(t) = f(j)f(j+1)f(j+2)f(j+3)
where the f(j) value for residue j is used, the f(j+1) value for the j+1 residue is used, the f(j+2) value for the j+2 residue is used and the f(j+3) value for the j+3 residue is used. If: (1) p(t) > 0.000075; (2) the average value for P(turn) > 100 in the tetra-peptide (on the scale of the parameter table, where 100 is neutral); and (3) the averages for the tetra-peptide obey the inequality P(a-helix) < P(turn) > P(b-sheet), then a beta-turn is predicted at that location.
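A partial sketch of two of these steps, helix nucleation (step 2, without the extension rule) and the turn test (step 6), is given below. The parameter values are copied from the Chou-Fasman table; the function names `helix_nuclei` and `turn_predicted` are illustrative, and the turn threshold on average P(turn) is taken as 100 to match the scale of that table.

```python
# residue -> (P(a), P(b), P(turn), f(i), f(i+1), f(i+2), f(i+3))
CF = {
    "A": (142, 83, 66, 0.060, 0.076, 0.035, 0.058),
    "R": (98, 93, 95, 0.070, 0.106, 0.099, 0.085),
    "D": (101, 54, 146, 0.147, 0.110, 0.179, 0.081),
    "N": (67, 89, 156, 0.161, 0.083, 0.191, 0.091),
    "C": (70, 119, 119, 0.149, 0.050, 0.117, 0.128),
    "E": (151, 37, 74, 0.056, 0.060, 0.077, 0.064),
    "Q": (111, 110, 98, 0.074, 0.098, 0.037, 0.098),
    "G": (57, 75, 156, 0.102, 0.085, 0.190, 0.152),
    "H": (100, 87, 95, 0.140, 0.047, 0.093, 0.054),
    "I": (108, 160, 47, 0.043, 0.034, 0.013, 0.056),
    "L": (121, 130, 59, 0.061, 0.025, 0.036, 0.070),
    "K": (114, 74, 101, 0.055, 0.115, 0.072, 0.095),
    "M": (145, 105, 60, 0.068, 0.082, 0.014, 0.055),
    "F": (113, 138, 60, 0.059, 0.041, 0.065, 0.065),
    "P": (57, 55, 152, 0.102, 0.301, 0.034, 0.068),
    "S": (77, 75, 143, 0.120, 0.139, 0.125, 0.106),
    "T": (83, 119, 96, 0.086, 0.108, 0.065, 0.079),
    "W": (108, 137, 96, 0.077, 0.013, 0.064, 0.167),
    "Y": (69, 147, 114, 0.082, 0.065, 0.114, 0.125),
    "V": (106, 170, 50, 0.062, 0.048, 0.028, 0.053),
}

def helix_nuclei(seq, window=6, needed=4, cutoff=100):
    """Start indices of windows in which at least `needed` of `window`
    contiguous residues have P(a) > cutoff (step 2, nucleation only)."""
    starts = []
    for i in range(len(seq) - window + 1):
        hits = sum(CF[aa][0] > cutoff for aa in seq[i:i + window])
        if hits >= needed:
            starts.append(i)
    return starts

def turn_predicted(seq, j):
    """Apply the three conditions of step 6 to the tetrapeptide at j."""
    tetra = seq[j:j + 4]
    if len(tetra) < 4:
        return False
    p_t = 1.0
    for k, aa in enumerate(tetra):
        p_t *= CF[aa][3 + k]          # f(j), f(j+1), f(j+2), f(j+3)

    def avg(col):
        return sum(CF[aa][col] for aa in tetra) / 4

    return p_t > 0.000075 and avg(2) > 100 and avg(0) < avg(2) > avg(1)

nuclei = helix_nuclei("PITKEVEVEYLLRRLEE")
```

Helix extension, the length and average checks, strand detection and overlap resolution (steps 2 to 5) are left out of this sketch.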
Chou-Fasman
• Generally applicable
• Works for sequences with no solved homologs
• But the accuracy is low!
– 50%
PHD method (Rost and Sander)
• Combine neural networks with sequence profiles
– 6-8 percentage points increase in prediction accuracy over standard neural networks (63% -> 71%)
• Use second layer "Structure to structure" network to filter predictions
• Jury of predictors
• Set up as mail server
Neural Networks
• Benefits
– Generally applicable
– Can capture higher order correlations
– Inputs other than sequence information
• Drawbacks
– Needs a lot of data (different solved structures).
– However, these do exist today (nearly 2500 solved structures with low sequence identity/high resolution.)
– Complex method with several pitfalls
How is it done
• One network (SEQ2STR) takes sequence (profiles) as input and predicts secondary structure
– Cannot deal with SS elements as a whole, e.g. that helices are normally formed by at least 5 consecutive amino acids
• Second network (STR2STR) takes predictions of first network and predicts secondary structure
– Can correct for errors in SS elements, e.g. remove single-residue helix predictions and mixtures of strand and helix predictions
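Architecture
[Figure: sequence-to-structure network. A window (IKEEHVIIQAE) slides along the sequence IKEEHVIIQAEFYLNPDQSGEF….. and feeds the input layer; weights connect the input layer to the hidden layer and the hidden layer to the output layer (H, E, C).]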
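To make the window idea concrete, here is a minimal sketch of the input encoding and a forward pass through one hidden layer. Everything in it is an assumption for illustration: the 11-residue window, the 21-symbol one-hot code (20 amino acids plus padding), the hidden layer size, and above all the weights, which are random and untrained, so the output string carries no predictive value.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
SYMBOLS = AMINO_ACIDS + "-"   # 21st symbol: padding beyond the chain ends
CLASSES = "HEC"

def encode_window(seq, center, half_width):
    """One-hot encode residues center-half_width .. center+half_width."""
    vec = []
    for pos in range(center - half_width, center + half_width + 1):
        aa = seq[pos] if 0 <= pos < len(seq) else "-"
        vec.extend(1.0 if aa == s else 0.0 for s in SYMBOLS)
    return vec

def make_layer(n_in, n_out, rng):
    """Random (untrained) weight matrix with n_out rows of n_in weights."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
            for _ in range(n_out)]

def forward(weights, inputs):
    """Sigmoid units, one output per row of the weight matrix."""
    return [1.0 / (1.0 + math.exp(-sum(w * x for w, x in zip(row, inputs))))
            for row in weights]

def predict(seq, half_width=5, n_hidden=10, seed=0):
    """Slide the window over seq and pick the maximal output unit."""
    rng = random.Random(seed)
    n_in = (2 * half_width + 1) * len(SYMBOLS)
    w_hidden = make_layer(n_in, n_hidden, rng)
    w_out = make_layer(n_hidden, len(CLASSES), rng)
    labels = []
    for i in range(len(seq)):
        hidden = forward(w_hidden, encode_window(seq, i, half_width))
        scores = forward(w_out, hidden)
        labels.append(CLASSES[scores.index(max(scores))])
    return "".join(labels)

seq = "IKEEHVIIQAEFYLNPDQSGEF"
first_window = encode_window(seq, 0, 5)
labels = predict(seq)
```

A real SEQ2STR network would learn the weights from solved structures and, in the profile-based methods, would replace the one-hot code by per-position alignment frequencies.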
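Secondary networks (Structure-to-Structure)
[Figure: structure-to-structure network. A window over the H/E/C outputs of the first network for the sequence IKEEHVIIQAEFYLNPDQSGEF….. feeds the input layer; weights connect the input layer to the hidden layer and the hidden layer to the output layer.]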
Example
PITKEVEVEYLLRRLEE (Sequence)
HHHHHHHHHHHHTGGG. (DSSP)
ECCCHEEHHHHHHHCCC (SEQ2STR)
CCCCHHHHHHHHHHCCC (STR2STR)
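The effect of the second network can be quantified with the per-residue accuracy Q3, the percentage of residues whose predicted state matches the observed one. The sketch below applies it to the example strings, assuming the "straight HEC" reduction of the DSSP line (G, T and "." all count as coil); the helper names are illustrative.

```python
def to_three_state(dssp):
    """Straight HEC reduction: keep H and E, map everything else to C."""
    return "".join(s if s in "HE" else "C" for s in dssp)

def q3(observed, predicted):
    """Percentage of positions where the 3-state labels agree."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have equal length")
    correct = sum(o == p for o, p in zip(observed, predicted))
    return 100.0 * correct / len(observed)

observed = to_three_state("HHHHHHHHHHHHTGGG.")
q3_seq2str = q3(observed, "ECCCHEEHHHHHHHCCC")
q3_str2str = q3(observed, "CCCCHHHHHHHHHHCCC")
```

On this 17-residue fragment the first network gets 9 residues right (about 52.9%) and the filtered prediction 11 (about 64.7%). Pooling G into helix instead would change both numbers.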
Sequence profiles
           1                                                    50
fyn_human  VTLFVALYDY EARTEDDLSF HKGEKFQILN SSEGDWWEAR SLTTGETGYI
yrk_chick  VTLFIALYDY EARTEDDLSF QKGEKFHIIN NTEGDWWEAR SLSSGATGYI
fgr_human  VTLFIALYDY EARTEDDLTF TKGEKFHILN NTEGDWWEAR SLSSGKTGCI
yes_chick  VTVFVALYDY EARTTDDLSF KKGERFQIIN NTEGDWWEAR SIATGKTGYI
src_avis2  VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYI
src_aviss  VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYI
src_avisr  VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYI
src_chick  VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYI
stk_hydat  VTIFVALYDY EARISEDLSF KKGERLQIIN TADGDWWYAR SLITNSEGYI
src_rsvpa  .......... ESRIETDLSF KKRERLQIVN NTEGTWWLAH SLTTGQTGYI
hck_human  ..IVVALYDY EAIHHEDLSF QKGDQMVVLE ES.GEWWKAR SLATRKEGYI
blk_mouse  ..FVVALFDY AAVNDRDLQV LKGEKLQVLR .STGDWWLAR SLVTGREGYV
hck_mouse  .TIVVALYDY EAIHREDLSF QKGDQMVVLE .EAGEWWKAR SLATKKEGYI
lyn_human  ..IVVALYPY DGIHPDDLSF KKGEKMKVLE .EHGEWWKAK SLLTKKEGFI
lck_human  ..LVIALHSY EPSHDGDLGF EKGEQLRILE QS.GEWWKAQ SLTTGQEGFI
ss81_yeast .....ALYPY DADDDdeISF EQNEILQVSD .IEGRWWKAR R.ANGETGII
abl_mouse  ..LFVALYDF VASGDNTLSI TKGEKLRVLG YnnGEWCEAQ ..TKNGQGWV
abl1_human ..LFVALYDF VASGDNTLSI TKGEKLRVLG YnnGEWCEAQ ..TKNGQGWV
src1_drome ..VVVSLYDY KSRDESDLSF MKGDRMEVID DTESDWWRVV NLTTRQEGLI
mysd_dicdi .....ALYDF DAESSMELSF KEGDILTVLD QSSGDWWDAE L..KGRRGKV
yfj4_yeast ....VALYSF AGEESGDLPF RKGDVITILK ksQNDWWTGR V..NGREGIF
abl2_human ..LFVALYDF VASGDNTLSI TKGEKLRVLG YNQNGEWSEV RSKNG.QGWV
tec_human  .EIVVAMYDF QAAEGHDLRL ERGQEYLILE KNDVHWWRAR D.KYGNEGYI
abl1_caeel ..LFVALYDF HGVGEEQLSL RKGDQVRILG YNKNNEWCEA RlrLGEIGWV
txk_human  .....ALYDF LPREPCNLAL RRAEEYLILE KYNPHWWKAR D.RLGNEGLI
yha2_yeast VRRVRALYDL TTNEPDELSF RKGDVITVLE QVYRDWWKGA L..RGNMGIF
abp1_sacex .....AEYDY EAGEDNELTF AENDKIINIE FVDDDWWLGE LETTGQKGLF
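A sequence profile summarises such an alignment column by column. The sketch below computes plain residue frequencies per column for four of the aligned sequences (first ten columns only); it is a simplified stand-in, since the profiles used by PHD or PSIPRED also involve sequence weighting and, for PSI-BLAST, log-odds scores. The function name and the handling of gaps are assumptions.

```python
from collections import Counter

def column_profile(alignment):
    """Per-column residue frequencies of an alignment.

    Assumptions: all rows have equal length, gap characters '.' and '-'
    are ignored, and lower-case residues are counted as upper-case.
    """
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned sequences must have the same length")
    profile = []
    for col in range(length):
        counts = Counter(row[col].upper() for row in alignment
                         if row[col] not in ".-")
        total = sum(counts.values())
        profile.append({aa: n / total for aa, n in counts.items()}
                       if total else {})
    return profile

rows = [
    "VTLFVALYDY",   # fyn_human
    "VTLFIALYDY",   # yrk_chick
    "..IVVALYDY",   # hck_human
    "..LVIALHSY",   # lck_human
]
profile = column_profile(rows)
```

Conserved columns (for example the A and L at columns 6 and 7) give a single residue with frequency 1.0, while variable columns spread the frequency over several residues. This per-position information is what the network receives instead of a single one-hot residue.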
Protein alignments and profile table
[Figure: a multiple alignment is converted into a profile table that lists, for each alignment position, how often each of the 20 amino acid types occurs (column groups GSAPD, NTEKQ, CVHIR, LMYFW). The highlighted part corresponds to the 21*3 bits coding for the profile of one residue.]
Slide courtesy by B. Rost 2004
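[Figure: feed-forward network. The input layer s0 is connected by weights J1 to the first or hidden layer s1, which is connected by weights J2 to the second or output layer s2. Pick maximal unit => current prediction.]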
PHDsec
input local in sequence: local alignment, 13 adjacent residues
       A    C    L    I    G    S    V  ins  del  cons
AAA  100    0    0    0    0    0    0    0    0  1.17
AA.  100    0    0    0    0    0    0   33    0  0.42
LLL    0    0  100    0    0    0    0    0   33  0.92
LII    0    0   33   66    0    0    0    0    0  0.74
AAG   66    0    0    0   33    0    0    0    0  1.17
CCS    0   66    0    0    0   33    0    0    0  0.74
GVV    0    0    0   33    0    0   66    0    0  0.48
input global in sequence: global statistics over the whole protein
– percentage of each amino acid in protein (20 units)
– length of protein (≤60, ≤120, ≤240, >240) (4 units)
– distance: centre, N-term (≤40, ≤30, ≤20, ≤10) (4 units)
– distance: centre, C-term (≤40, ≤30, ≤20, ≤10) (4 units)
first level: sequence-to-structure (input layer with 21+3 units per residue, hidden layer, output layer H, E, L)
second level: structure-to-structure (input layer with 4+1 units per residue, hidden layer, output layer; example output H 0.5, E 0.1, L 0.4)
Slide courtesy by B. Rost 2004
Prediction accuracy PHD
[Histogram: number of protein chains against per-residue accuracy (Q3, 0-100). <Q3>=72.3% ; sigma=10.5%. Labelled chains: 1bct, 1stu, 3ifm, 1psm, 1spf.]
Slide courtesy by B. Rost 2004
Stronger predictions more accurate!
[Left panel: histogram of the number of protein chains against per-residue accuracy (Q3). <Q3>=72.3% ; sigma=10.5%. Labelled chains: 1bct, 1stu, 3ifm, 1psm, 1spf.]
[Right panel: Q3 per protein against reliability index averaged over protein (3-9). fit: Q3 fit = 21 + 8.7 * Q3]
PSI-Pred (Jones)
• Use alignments from iterative sequence searches (PSI-Blast) as input to a neural network (just like PHDsec)
• Better predictions due to better sequence profiles
• Available as stand-alone program and via the web
Petersen et al. 2000
• SEQ2STR (>70 networks)
– Not one single network architecture is best for all sequences
• STR2STR (>70 networks)
• => 4900 network predictions
– (wisdom of the crowd!!!)
– Others have 1
Why so many networks?
Why not select the best?
Prediction accuracy (Q3=81.2%). 2006.
(Petersen et al. 2000)
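One simple way to combine many networks is to average their per-residue class probabilities and pick the class with the highest mean. The sketch below does only that, as a plain stand-in: it is not the balloting and probability-conversion procedure of Petersen et al., and the three "networks" and their numbers are made up for illustration.

```python
def average_ballot(predictions):
    """Combine network outputs by averaging class probabilities.

    predictions: one list per network, each holding one (pH, pE, pC)
    tuple per residue. Returns the 3-state string of the mean outputs.
    """
    n_nets = len(predictions)
    n_res = len(predictions[0])
    labels = []
    for i in range(n_res):
        mean = [sum(net[i][k] for net in predictions) / n_nets
                for k in range(3)]
        labels.append("HEC"[mean.index(max(mean))])
    return "".join(labels)

# Hypothetical outputs of three networks for a two-residue fragment.
net1 = [(0.6, 0.3, 0.1), (0.4, 0.5, 0.1)]
net2 = [(0.5, 0.2, 0.3), (0.2, 0.3, 0.5)]
net3 = [(0.2, 0.7, 0.1), (0.3, 0.3, 0.4)]
combined = average_ballot([net1, net2, net3])
```

For the second residue a majority vote over the individual winners would give C (two of three networks), while averaging gives E, because the one network preferring E is more confident than the two preferring C. This is one reason to combine outputs rather than select a single best network.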
Spectrin SH3 (Src homology 3) domain
HEADER  CYTOSKELETON
COMPND  ALPHA SPECTRIN (SH3 DOMAIN)
SOURCE  CHICKEN (GALLUS GALLUS) BRAIN
AUTHOR  M.NOBLE,R.PAUPTIT,A.MUSACCHIO,M.SARASTE
CEEEEEEECCCCCCCCCCCCCCCCEEEEEECCCCCEEEEEECCCEEEECCCCCEECC (prediction)
.EEEEESS.B...STTB..B.TT.EEEEEE..SSSEEEEEETTEEEEEEGGGEEE.. (DSSP)
[Accuracy labels in figure: 59%, 65%, 72%, 93%]
Benchmarking secondary structure predictions
• CASP
– Critical Assessment of Structure Predictions
– Sequences from about-to-be-deposited structures are given to groups who submit their predictions before the structure is published
– Every second year
• EVA
– Newly solved structures are sent to prediction servers.
– Every week
EVA results (Rost et al., 2001)
• PROFphd     77.0%
• PSIPRED     76.8%
• SAM-T99sec  76.1%
• SSpro       76.0%
• Jpred2      75.5%
• PHD         71.7%
– Cubic.columbia.edu/eva
EVA: secondary structure
Method    Q3    SOV  Info  CorrH  CorrE  CorrL  Class  BAD
PROF      76.0  72   0.35  0.67   0.63   0.55   82     2.7
PSIPRED   76.0  72   0.36  0.65   0.62   0.55   78     2.8
SSpro     76.0  71   0.35  0.67   0.63   0.56   83     2.8
JPred2    75.0  69   0.34  0.65   0.60   0.54   76     2.6
PHDpsi    75.0  71   0.33  0.65   0.60   0.54   81     3.0
PHD       71.4  68   0.28  0.59   0.58   0.49   77     4.3
[Entries whose row assignment is not recoverable (Q3 Claim column and further rows): 76%; 76.4; 71.6; Copenhagen 78; 77.8; 76.5-78.3; 76; 53; Wang/Yuan]
Petersen et al. Proteins 2000
Prediction of protein secondary structure
• 1980: 55%   simple
• 1990: 60%   less simple
• 1993: 70%   evolution
• 2000: 76%   more evolution
• 2006: 80%   more evolution
• 2008: >80%  more evolution
Links to servers
• Database of links
http://mmtsb.scripps.edu/cgi-bin/renderrelres?protmodel
• ProfPHD
http://www.predictprotein.org/
• PSIPRED
http://bioinf.cs.ucl.ac.uk/psipred/
• JPred
http://www.compbio.dundee.ac.uk/~www-jpred/
Conclusions
• The big breakthrough in SS prediction came due to sequence profiles
– Rost et al.
• Prediction of secondary structure has not changed in the last 5 years
– More protein sequences => higher prediction accuracy
– No new theoretical breakthrough
• Accuracy is close to 80% for globular proteins
• If you need a secondary structure prediction, use one of the profile-based methods:
– ProfPHD, PSIPRED, and JPred
• And not one of the older ones such as:
– Chou-Fasman
– Garnier