Protein Secondary Structures
Assignment and prediction

Use of secondary structure
• Classification of protein structures
• Definition of loops (active sites)
• Use in fold recognition methods
• Improvements of alignments
• Definition of domain boundaries

Classification of secondary structure
• Defining features
  – Dihedral angles
  – Hydrogen bonds
  – Geometry
• Assigned manually by crystallographers, or
• Automatic
  – DSSP (Kabsch & Sander, 1983)
  – STRIDE (Frishman & Argos, 1995)
  – DSSPcont (Andersen et al., 2002)

Dihedral Angles
phi   - dihedral angle about the N-Calpha bond
psi   - dihedral angle about the Calpha-C bond
omega - dihedral angle about the C-N (peptide) bond
From http://www.imb-jena.de

Helices
                            phi(deg)   psi(deg)   H-bond pattern
----------------------------------------------------------------
right-handed alpha-helix     -57.8      -47.0      i+4
pi-helix                     -57.1      -69.7      i+5
3₁₀-helix                    -74.0       -4.0      i+3
(omega is 180 deg in all cases)
----------------------------------------------------------------
From http://www.imb-jena.de

Beta Strands
                            phi(deg)   psi(deg)   omega(deg)
----------------------------------------------------------------
beta strand                  -120       120        180
----------------------------------------------------------------

Hydrogen bond patterns in beta sheets. Here a four-stranded beta sheet is drawn schematically, containing three antiparallel strands and one parallel strand. Hydrogen bonds are indicated with red lines (antiparallel strands) and green lines (parallel strands) connecting the hydrogen and the acceptor oxygen.
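The phi, psi and omega angles defined above can be computed from atomic coordinates. The following is a minimal sketch, assuming each atom is given as an (x, y, z) tuple; the function name `dihedral` and its helpers are illustrative and not taken from DSSP or STRIDE. It uses the usual atan2 formulation, which gives a signed angle in the IUPAC convention (cis = 0, trans = 180).

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees (-180..180) about the p1-p2 bond."""
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)  # normal of the plane through p0, p1, p2
    n2 = _cross(b2, b3)  # normal of the plane through p1, p2, p3
    x = _dot(n1, n2)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    return math.degrees(math.atan2(y, x))


# Backbone atoms entering each angle of residue i (standard definitions):
#   phi(i)   = dihedral(C[i-1], N[i],  CA[i],  C[i])
#   psi(i)   = dihedral(N[i],   CA[i], C[i],   N[i+1])
#   omega(i) = dihedral(CA[i],  C[i],  N[i+1], CA[i+1])
```

With made-up test coordinates, `dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1))` describes a trans arrangement, the geometry that the omega value of 180 deg in the tables corresponds to.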
Beta-sheet figure from http://broccoli.mfn.ki.se/pps_course_96/

Secondary Structure Elements
[Figure labels: ß-strand, Helix, Bend, Turn]

Helix formation is local
[Figure: thyroid hormone receptor (2nll), residues i and i+3 marked]

ß-sheet formation is NOT local
[Figure: erabutoxin (3ebx)]

Secondary Structure Type Descriptions
* H = alpha helix
* G = 3₁₀-helix
* I = 5 helix (pi helix)
* E = extended strand, participates in beta ladder
* B = residue in isolated beta-bridge
* T = hydrogen bonded turn
* S = bend
* C = coil

Automatic assignment programs
• DSSP ( http://www.cmbi.kun.nl/gv/dssp/ )
• STRIDE ( http://www.hgmp.mrc.ac.uk/Registered/Option/stride.html )
• DSSPcont ( http://cubic.bioc.columbia.edu/services/DSSPcont/ )

DSSP
[Example DSSP listing, residues 4-30 of chain A. Original columns: # RESIDUE AA STRUCTURE BP1 BP2 ACC N-H-->O O-->H-N N-H-->O O-->H-N TCO KAPPA ALPHA PHI PSI X-CA Y-CA Z-CA. The BP1, BP2 and hydrogen-bond columns are not reproduced. Symbols of the STRUCTURE column in residue order, blank entries not shown: EEEEEEEEE TTTT EEEEEEEE TT EE]

  #  RESIDUE AA  ACC     TCO  KAPPA  ALPHA    PHI    PSI  X-CA  Y-CA  Z-CA
  1    4 A   E   205   0.000  360.0  360.0  360.0  113.5   5.7  42.2  25.1
  2    5 A   H   127  -0.987  360.0 -152.8 -149.1  154.0   9.4  41.3  24.7
  3    6 A   V    66  -0.995    4.6 -170.2 -134.3  126.3  11.5  38.4  23.5
  4    7 A   I   106  -0.976   13.9 -170.8 -114.8  126.6  15.0  37.6  24.5
  5    8 A   I    74  -0.972   20.8 -158.4 -125.4  129.1  16.6  34.9  22.4
  6    9 A   Q    86  -0.910   29.5 -170.4  -98.9  106.4  19.9  33.0  23.0
  7   10 A   A    18  -0.852   11.5  172.8 -108.1  141.7  20.7  31.8  19.5
  8   11 A   E    63  -0.933    4.4  175.4 -139.1  156.9  23.4  29.4  18.4
  9   12 A   F    31  -0.967   13.3 -160.9 -160.6  151.3  24.4  27.6  15.3
 10   13 A   Y    36  -0.994   16.5 -156.0 -136.8  132.1  27.2  25.3  14.1
 11   14 A   L    24  -0.929   11.7 -122.6 -120.0  133.5  28.0  24.8  10.4
 12   15 A   N    54  -0.884   84.3    9.0 -113.8  150.9  29.7  22.0   8.6
 13   16 A   P   114  -0.963  125.4   60.5  -86.5    8.5  32.0  21.6   6.8
 14   17 A   D    66   0.752   89.3 -146.2  -64.6  -23.0  33.0  25.2   7.6
 15   18 A   Q   132   0.936   51.1  134.1   52.9   50.0  33.3  24.2  11.2
 16   19 A   S    44  -0.877   28.9  174.9 -124.8  156.8  32.1  27.7  12.3
 17   20 A   G    28  -0.893   15.9 -146.5 -151.0 -178.9  29.6  28.7  14.8
 18   21 A   E    14  -0.979    5.0 -169.6 -158.6  146.0  28.0  31.5  16.7
 19   22 A   F     3  -0.982   27.8  149.2 -139.1  120.3  26.5  32.2  20.1
 20   23 A   M     0  -0.983   39.7 -127.8 -152.1  161.6  24.5  35.4  20.6
 21   24 A   F    45  -0.934   23.9 -164.1 -112.5  137.7  21.7  37.0  22.6
 22   25 A   D     6  -0.948    6.9 -165.0 -123.7  138.3  18.9  38.9  20.8
 23   26 A   F    76  -0.947   78.4  -27.2 -127.3  111.5  16.4  41.3  22.3
 24   27 A   D    74   0.904  128.9  -46.6   50.4   45.0  13.4  42.1  20.2
 25   28 A   G    20   0.291  118.8  109.3   84.7  -11.1  15.4  41.4  17.0
 26   29 A   D   114  -0.822   71.8 -114.7 -103.1  140.3  18.4  43.4  18.1
 27   30 A   E     8  -0.525   24.9 -177.7  -74.1  127.5  21.8  41.8  19.1

Prediction of protein secondary structure
• What to predict?
• How to predict?
• How good are the best?

Secondary Structure Prediction
• What to predict?
  – All 8 types or pool types into groups
    DSSP pooling:
    * H = alpha helix, G = 3₁₀-helix, I = 5 helix (pi helix)   → H
    * E = extended strand, B = beta-bridge                     → E
    * T = hydrogen bonded turn, S = bend, C = coil             → C
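The DSSP pooling on the slide above amounts to a character-by-character lookup. Here is a small sketch, assuming the assignment is available as a string of DSSP symbols; the names `DSSP_POOLING` and `reduce_states` are hypothetical, and treating any unlisted symbol (including a blank or ".") as coil is an assumption of this sketch.

```python
# Pooling of the eight DSSP types into three classes, as listed on the slide:
# H, G, I -> H; E, B -> E; everything else (T, S, C, blank, ".") -> C.
DSSP_POOLING = {"H": "H", "G": "H", "I": "H", "E": "E", "B": "E"}


def reduce_states(dssp_string, pooling=DSSP_POOLING):
    """Map a string of DSSP symbols to three states (H, E, C).

    Symbols missing from `pooling` are treated as coil, which is an
    assumption of this sketch rather than part of DSSP itself.
    """
    return "".join(pooling.get(symbol, "C") for symbol in dssp_string)
```

A different grouping can be tried by passing another dictionary as `pooling`, for example one that keeps only H and E and sends all other symbols to C.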
  – All 8 types or pool types into groups
    Straight HEC:
    * H = alpha helix                                          → H
    * E = extended strand                                      → E
    * T = hydrogen bonded turn, S = bend, C = coil,
      G = 3₁₀-helix, I = 5 helix (pi helix), B = beta-bridge   → C

Secondary Structure Prediction
• Simple alignments
  – Align to a close homolog for which the structure has been experimentally solved.
• Heuristic methods (e.g., Chou-Fasman, 1974)
  – Apply scores for each amino acid and sum up over a window.
• Neural networks (different inputs)
  – Raw sequence (late 80's)
  – Blosum matrix (e.g., PHD, early 90's)
  – Position-specific alignment profiles (e.g., PsiPred, late 90's)
  – Multiple networks balloting, probability conversion, output expansion (Petersen et al., 2000).

The pessimistic point of view
Prediction by alignment
HoMo 1D FoRc
….the art of being humble

Secondary structure predictions of 1st and 2nd generation
• single residues (1st generation), 1957-70/80
  – Chou-Fasman, GOR
  – 50-55% accuracy
• segments (2nd generation), 1986-92
  – GORIII
  – 55-60% accuracy
• problems
  – < 100% (they said: 65% max)
  – < 40% (they said: strand non-local)
  – short segments

Improvement of accuracy
1974  Chou & Fasman       ~50-53%
1978  Garnier             63%
1987  Zvelebil            66%
1988  Qian & Sejnowski    64.3%
1993  Rost & Sander       70.8-72.0%
1997  Frishman & Argos    <75%
1999  Cuff & Barton       72.9%
1999  Jones               76.5%
2000  Petersen et al.     77.9%

Simple Alignments
• Solved structure of a homolog to the query is needed
• Homologous proteins have ~88% identical (3 state) secondary structure
• If no close homologue can be identified, alignments will give almost random results

[Figures: amino acid preferences in alpha-helix, in beta-strand, and in coil]

Chou-Fasman
Name  P(a)  P(b)  P(turn)  f(i)   f(i+1)  f(i+2)  f(i+3)
Ala   142    83     66     0.060  0.076   0.035   0.058
Arg    98    93     95     0.070  0.106   0.099   0.085
Asp   101    54    146     0.147  0.110   0.179   0.081
Asn    67    89    156     0.161  0.083   0.191   0.091
Cys    70   119    119     0.149  0.050   0.117   0.128
Glu   151    37     74     0.056  0.060   0.077   0.064
Gln   111   110     98     0.074  0.098   0.037   0.098
Gly    57    75    156     0.102  0.085   0.190   0.152
His   100    87     95     0.140  0.047   0.093   0.054
Ile   108   160     47     0.043  0.034   0.013   0.056
Leu   121   130     59     0.061  0.025   0.036   0.070
Lys   114    74    101     0.055  0.115   0.072   0.095
Met   145   105     60     0.068  0.082   0.014   0.055
Phe   113   138     60     0.059  0.041   0.065   0.065
Pro    57    55    152     0.102  0.301   0.034   0.068
Ser    77    75    143     0.120  0.139   0.125   0.106
Thr    83   119     96     0.086  0.108   0.065   0.079
Trp   108   137     96     0.077  0.013   0.064   0.167
Tyr    69   147    114     0.082  0.065   0.114   0.125
Val   106   170     50     0.062  0.048   0.028   0.053

Chou-Fasman
1. Assign all of the residues in the peptide the appropriate set of parameters.
2. Scan through the peptide and identify regions where 4 out of 6 contiguous residues have P(a-helix) > 100. That region is declared an alpha-helix. Extend the helix in both directions until a set of four contiguous residues that have an average P(a-helix) < 100 is reached. That is declared the end of the helix. If the segment defined by this procedure is longer than 5 residues and the average P(a-helix) > P(b-sheet) for that segment, the segment can be assigned as a helix.
3. Repeat this procedure to locate all of the helical regions in the sequence.
4. Scan through the peptide and identify a region where 3 out of 5 of the residues have a value of P(b-sheet) > 100. That region is declared a beta-sheet. Extend the sheet in both directions until a set of four contiguous residues that have an average P(b-sheet) < 100 is reached. That is declared the end of the beta-sheet. Any segment of the region located by this procedure is assigned as a beta-sheet if the average P(b-sheet) > 105 and the average P(b-sheet) > P(a-helix) for that region.
5. Any region containing overlapping alpha-helical and beta-sheet assignments is taken to be helical if the average P(a-helix) > P(b-sheet) for that region. It is a beta-sheet if the average P(b-sheet) > P(a-helix) for that region.
6. To identify a bend at residue number j, calculate the following value:
   p(t) = f(i)[j] × f(i+1)[j+1] × f(i+2)[j+2] × f(i+3)[j+3]
   where the f(i) value is used for residue j, the f(i+1) value for residue j+1, the f(i+2) value for residue j+2 and the f(i+3) value for residue j+3. If: (1) p(t) > 0.000075; (2) the average value for P(turn) > 100 in the tetrapeptide; and (3) the averages for the tetrapeptide obey the inequality P(a-helix) < P(turn) > P(b-sheet), then a beta-turn is predicted at that location.

Chou-Fasman
• Generally applicable
• Works for sequences with no solved homologs
• But the accuracy is low!
  – 50%

PHD method (Rost and Sander)
• Combine neural networks with sequence profiles
  – 6-8 percentage points increase in prediction accuracy over standard neural networks (63% -> 71%)
• Use second layer "Structure to structure" network to filter predictions
• Jury of predictors
• Set up as mail server

Neural Networks
• Benefits
  – Generally applicable
  – Can capture higher order correlations
  – Inputs other than sequence information
• Drawbacks
  – Needs a lot of data (different solved structures).
  – However, these do exist today (nearly 2500 solved structures with low sequence identity/high resolution).
  – Complex method with several pitfalls

How is it done
• One network (SEQ2STR) takes sequence (profiles) as input and predicts secondary structure
  – Cannot deal with SS elements, i.e. helices are normally formed by at least 5 consecutive amino acids
• Second network (STR2STR) takes predictions of first network and predicts secondary structure
  – Can correct for errors in SS elements, i.e. remove single helix prediction, mixture of strand and helix predictions

Architecture
[Figure: sequence-to-structure network. A window slides over the sequence IKEEHVIIQAEFYLNPDQSGEF…..; input layer, weights, hidden layer, output layer (H, E, C)]

Secondary networks (Structure-to-Structure)
[Figure: structure-to-structure network. A window slides over the H/E/C predictions for the sequence IKEEHVIIQAEFYLNPDQSGEF…..; input layer, weights, hidden layer, output layer (H, E, C)]

Example
PITKEVEVEYLLRRLEE   (Sequence)
HHHHHHHHHHHHTGGG.   (DSSP)
ECCCHEEHHHHHHHCCC   (SEQ2STR)
CCCCHHHHHHHHHHCCC   (STR2STR)

Sequence profiles
            1                                                   50
fyn_human   VTLFVALYDY EARTEDDLSF HKGEKFQILN SSEGDWWEAR SLTTGETGYI
yrk_chick   VTLFIALYDY EARTEDDLSF QKGEKFHIIN NTEGDWWEAR SLSSGATGYI
fgr_human   VTLFIALYDY EARTEDDLTF TKGEKFHILN NTEGDWWEAR SLSSGKTGCI
yes_chick   VTVFVALYDY EARTTDDLSF KKGERFQIIN NTEGDWWEAR SIATGKTGYI
src_avis2   VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYI
src_aviss   VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYI
src_avisr   VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYI
src_chick   VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYI
stk_hydat   VTIFVALYDY EARISEDLSF KKGERLQIIN TADGDWWYAR SLITNSEGYI
src_rsvpa   .......... ESRIETDLSF KKRERLQIVN NTEGTWWLAH SLTTGQTGYI
hck_human   ..IVVALYDY EAIHHEDLSF QKGDQMVVLE ES.GEWWKAR SLATRKEGYI
blk_mouse   ..FVVALFDY AAVNDRDLQV LKGEKLQVLR .STGDWWLAR SLVTGREGYV
hck_mouse   .TIVVALYDY EAIHREDLSF QKGDQMVVLE .EAGEWWKAR SLATKKEGYI
lyn_human   ..IVVALYPY DGIHPDDLSF KKGEKMKVLE .EHGEWWKAK SLLTKKEGFI
lck_human   ..LVIALHSY EPSHDGDLGF EKGEQLRILE QS.GEWWKAQ SLTTGQEGFI
ss81_yeast  .....ALYPY DADDDdeISF EQNEILQVSD .IEGRWWKAR R.ANGETGII
abl_mouse   ..LFVALYDF VASGDNTLSI TKGEKLRVLG YnnGEWCEAQ ..TKNGQGWV
abl1_human  ..LFVALYDF VASGDNTLSI TKGEKLRVLG YnnGEWCEAQ ..TKNGQGWV
src1_drome  ..VVVSLYDY KSRDESDLSF MKGDRMEVID DTESDWWRVV NLTTRQEGLI
mysd_dicdi  .....ALYDF DAESSMELSF KEGDILTVLD QSSGDWWDAE L..KGRRGKV
yfj4_yeast  ....VALYSF AGEESGDLPF RKGDVITILK ksQNDWWTGR V..NGREGIF
abl2_human  ..LFVALYDF VASGDNTLSI TKGEKLRVLG YNQNGEWSEV RSKNG.QGWV
tec_human   .EIVVAMYDF QAAEGHDLRL ERGQEYLILE KNDVHWWRAR D.KYGNEGYI
abl1_caeel  ..LFVALYDF HGVGEEQLSL RKGDQVRILG YNKNNEWCEA RlrLGEIGWV
txk_human   .....ALYDF LPREPCNLAL RRAEEYLILE KYNPHWWKAR D.RLGNEGLI
yha2_yeast  VRRVRALYDL TTNEPDELSF RKGDVITVLE QVYRDWWKGA L..RGNMGIF
abp1_sacex  .....AEYDY EAGEDNELTF AENDKIINIE FVDDDWWLGE LETTGQKGLF

[Figure: protein alignments converted to a profile table with amino acid columns grouped as GSAPD, NTEKQ, CVHIR, LMYFW. Each row of the table corresponds to the 21*3 bits coding for the profile of one residue. Network layers: input layer (s0), first or hidden layer (s1), second or output layer (s2); pick maximal unit => current prediction. Slide courtesy of B. Rost, 2004]

PHDsec
• input local in sequence: local alignment, 13 adjacent residues
• global statistics: whole protein (%AA, length, distance to N-term, distance to C-term)

Alignment    A    C    L    I    G    S    V   ins  del  cons
AAA         100    0    0    0    0    0    0    0    0  1.17
AA.         100    0    0    0    0    0    0   33    0  0.42
LLL           0    0  100    0    0    0    0    0   33  0.92
LII           0    0   33   66    0    0    0    0    0  0.74
AAG          66    0    0    0   33    0    0    0    0  1.17
CCS           0   66    0    0    0   33    0    0    0  0.74
GVV           0    0    0   33    0    0   66    0    0  0.48

• input global in sequence
  – percentage of each amino acid in protein
  – length of protein (≤60, ≤120, ≤240, >240)
  – distance: centre, N-term (≤40, ≤30, ≤20, ≤10)
  – distance: centre, C-term (≤40, ≤30, ≤20, ≤10)

[Figure: two-level network. First level, sequence-to-structure: input layer with 21+3 units per window position plus 20, 4, 4, 4 global units; hidden layer; output layer H, E, L. Second level, structure-to-structure: input layer with 4+1 units per window position plus 20, 4, 4, 4 global units; example output H 0.5, E 0.1, L 0.4. Slide courtesy of B. Rost, 2004]

Prediction accuracy PHD
[Figure: histogram of the number of protein chains against per-residue accuracy (Q3); <Q3> = 72.3%, sigma = 10.5%; chains labelled at the low end: 1bct, 1stu, 3ifm, 1psm, 1spf. Slide courtesy of B. Rost, 2004]

Stronger predictions more accurate!
[Figure: the same Q3 histogram (<Q3> = 72.3%, sigma = 10.5%) next to a plot of Q3 per protein against the reliability index averaged over protein (range 3 to 9), with a linear fit of intercept 21 and slope 8.7]

PSI-Pred (Jones)
• Use alignments from iterative sequence searches (PSI-Blast) as input to a neural network (just like PHDsec)
• Better predictions due to better sequence profiles
• Available as stand-alone program and via the web

Petersen et al. 2000
• SEQ2STR (>70 networks)
  – Not one single network architecture is best for all sequences
• STR2STR (>70 networks)
• => 4900 network predictions
  – (wisdom of the crowd!!!)
  – Others have 1
Why so many networks? Why not select the best?

Prediction accuracy (Q3=81.2%). 2006. (Petersen et al. 2000)

Spectrin SH3 domain (Src homology 3)
HEADER  CYTOSKELETON
COMPND  ALPHA SPECTRIN (SH3 DOMAIN)
SOURCE  CHICKEN (GALLUS GALLUS) BRAIN
AUTHOR  M.NOBLE,R.PAUPTIT,A.MUSACCHIO,M.SARASTE

[Figure: predictions for this domain, with accuracies 59%, 65%, 72% and 93%. Rows that survive in the transcript:]
CEEEEEEECCCCCCCCCCCCCCCCEEEEEECCCCCEEEEEECCCEEEECCCCCEECC   (3-state prediction)
.EEEEESS.B...STTB..B.TT.EEEEEE..SSSEEEEEETTEEEEEEGGGEEE..   (DSSP)

Benchmarking secondary structure predictions
• CASP
  – Critical Assessment of Structure Predictions
  – Sequences from about-to-be-deposited structures are given to groups who submit their predictions before the structure is published
  – Every 2nd year
• EVA
  – Newly solved structures are sent to prediction servers.
  – Every week

EVA results (Rost et al., 2001)
• PROFphd      77.0%
• PSIPRED      76.8%
• SAM-T99sec   76.1%
• SSpro        76.0%
• Jpred2       75.5%
• PHD          71.7%
  – Cubic.columbia.edu/eva

EVA: secondary structure 76%
Method    Q3     Q3 Claim   SOV   Info   CorrH   CorrE   CorrL   Class   BAD
PROF      76.0              72    0.35   0.67    0.63    0.55    82      2.7
PSIPRED   76.0              72    0.36   0.65    0.62    0.55    78      2.8
SSpro     76.0              71    0.35   0.67    0.63    0.56    83      2.8
JPred2    75.0   76.4       69    0.34   0.65    0.60    0.54    76      2.6
PHDpsi    75.0              71    0.33   0.65    0.60    0.54    81      3.0
PHD       71.4   71.6       68    0.28   0.59    0.58    0.49    77      4.3
[Further entries whose column layout is not recoverable: Copenhagen (Petersen et al., Proteins 2000) and Wang/Yuan, with the values 78, 77.8, 76.5-78.3, 76, 53]

Prediction of protein secondary structure
• 1980: 55%    simple
• 1990: 60%    less simple
• 1993: 70%    evolution
• 2000: 76%    more evolution
• 2006: 80%    more evolution
• 2008: >80%   more evolution

Links to servers
• Database of links: http://mmtsb.scripps.edu/cgi-bin/renderrelres?protmodel
• ProfPHD: http://www.predictprotein.org/
• PSIPRED: http://bioinf.cs.ucl.ac.uk/psipred/
• JPred: http://www.compbio.dundee.ac.uk/~www-jpred/

Conclusions
• The big breakthrough in SS prediction came due to sequence profiles
  – Rost et al.
• Prediction of secondary structure has not changed in the last 5 years
  – More protein sequences => higher prediction accuracy
  – No new theoretical breakthrough
• Accuracy is close to 80% for globular proteins
• If you need a secondary structure prediction, use one of the profile-based methods:
  – ProfPHD, PSIPRED, and JPred
• And not one of the older ones such as:
  – Chou-Fasman
  – Garnier
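The accuracies compared in these slides are three-state per-residue accuracies (Q3): the percentage of residues whose predicted state (H, E or C) equals the observed one. A minimal sketch follows, assuming both strings are already reduced to three states and have equal length; the function name `q3` is illustrative. The example strings are the DSSP, SEQ2STR and STR2STR rows of the 17-residue example earlier in the slides, with the DSSP row pooled as H, G, I → H; E, B → E; rest → C.

```python
def q3(observed, predicted):
    """Percentage of positions where the predicted state equals the observed one."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    if not observed:
        raise ValueError("empty input")
    matches = sum(o == p for o, p in zip(observed, predicted))
    return 100.0 * matches / len(observed)


# DSSP row "HHHHHHHHHHHHTGGG." of the slide example, pooled to three states.
observed = "H" * 12 + "C" + "H" * 3 + "C"
# SEQ2STR row "ECCCHEEHHHHHHHCCC" and STR2STR row "CCCCHHHHHHHHHHCCC".
seq2str = "ECCC" + "HEE" + "H" * 7 + "CCC"
str2str = "C" * 4 + "H" * 10 + "C" * 3

print(f"SEQ2STR Q3 = {q3(observed, seq2str):.1f}%")  # 8 of 17 residues agree
print(f"STR2STR Q3 = {q3(observed, str2str):.1f}%")  # 10 of 17 residues agree
```

On this short example the second-level (STR2STR) output agrees with the observed states at more positions than the first-level output. A single 17-residue peptide says little about a method, which is why benchmarks such as EVA average Q3 over many chains.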