Protein Structure, Classification and Prediction
BMI 730
Victor Jin
Department of Biomedical Informatics
Ohio State University
Protein Structure
Protein Structure Determination
Protein Structure Classification
- SCOP
- CATH
Secondary Structure Prediction
Tertiary Prediction
Structure Prediction Evaluation
- CASP
Chemistry
Proteins are linear hetero-polymers of amino acids
twenty different amino acids (building blocks)
3-letter code: VAL-ARG-LYS-ILE-GLU-PRO-ARG-GLU
1-letter code: VRKIEPRE
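Peptide bond
• The peptide bond has partial double-bond character, so it is planar.
• Of the three backbone torsion angles per residue, two (φ and ψ) are rotatable and one (ω, around the peptide bond itself) is fixed.
• Peptide: ~2-10 amino acids; polypeptide: ~10-50 amino acids; protein: ~50 or more amino acids.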
http://www.imb-jena.de/~rake/Bioinformatics_WEB/basics_peptide_bond.html
Amino acids
Side chain properties
• Size
• Charge
• Polarity
http://www.ch.cam.ac.uk/SGTL/Structures/amino/
Hierarchical nature of protein structure
Primary structure (Amino acid sequence)
↓
Secondary structure (local conformations: α-helix, β-sheet, and
reverse turn and loop)
↓
Tertiary structure (global conformation: the three-dimensional structure that results from packing secondary structures together)
↓
Quaternary structure (structure formed by more than one polypeptide chain)
Basic structural units of proteins:
Secondary structure
α-helix
β-sheet
The α-helix and β-sheet secondary structures have regular hydrogen-bonding patterns.
Tertiary structure
• In globular proteins such as enzymes, the long chain of amino acids folds into a three-dimensional functional shape, the tertiary structure. This shape is stabilized by interactions between the side chains (R groups) of residues in the same chain: covalent disulfide (S-S) bonds, formed between cysteine residues through their sulfhydryl (SH) groups, together with hydrogen bonds, ionic bonds, and hydrophobic interactions.
A few examples of tertiary structure
Dihydrofolate reductase
Myoglobin
Quaternary structure
• Non-covalent interactions bind multiple polypeptides into a single, larger protein. Hemoglobin, for example, has quaternary structure because it is an assembly of two alpha-globin and two beta-globin polypeptide chains.
Structure-Stabilizing Interactions
Non-covalent
• Van der Waals forces (transient, weak electrical attraction of one atom for another)
• Hydrophobic interactions (clustering of nonpolar groups)
• Hydrogen bonding
Covalent
• Disulfide bonds
Protein structure determination
• Protein expression
---- membrane proteins
---- aggregation
• X-ray crystallography
• NMR (nuclear magnetic resonance)
• Cryo-EM (electron microscopy)
Protein Structure Classification - SCOP
• Structural Classification of Proteins database
• http://scop.mrc-lmb.cam.ac.uk/scop/
• Hierarchical clustering
• Family – clear evolutionary relationship
• Superfamily – probable common evolutionary origin
• Fold – major structural similarity
• Boundaries between levels are more or less subjective
• The conservative evolutionary classification leads to many new divisions at the family and superfamily levels, so it is recommended to focus first on the higher levels of the classification tree.
SCOP classes:
• α/α (all-α)
• β/β (all-β)
• α/β
• α+β
• Misc, e.g. HIV protease complexed with pepstatin; T-cell receptor/MHC/CD8 complex
SCOP Classification Statistics
SCOP: Structural Classification of Proteins, 1.69 release
25973 PDB entries (1 Oct 2004), 70859 domains, 1 literature reference
(excluding nucleic acids and theoretical models)

Class | Number of folds | Number of superfamilies | Number of families
All alpha proteins | 218 | 376 | 608
All beta proteins | 144 | 290 | 560
Alpha and beta proteins (α/β) | 136 | 222 | 629
Alpha and beta proteins (α+β) | 279 | 409 | 717
Multi-domain proteins | 46 | 46 | 61
Membrane and cell surface proteins | 47 | 88 | 99
Small proteins | 75 | 108 | 171
Total | 945 | 1539 | 2845
Protein Structure Classification - CATH
• CATH Protein Structure Classification
• http://www.cathdb.info/latest/index.html
• CATH is a hierarchical classification of protein domain structures, which clusters proteins at four major levels: Class (C), Architecture (A), Topology (T) and Homologous superfamily (H).
• Class, derived from secondary structure content, is assigned automatically for more than 90% of protein structures.
• Architecture, which describes the gross orientation of secondary structures independent of connectivities, is currently assigned manually.
• The topology level clusters structures into fold groups according to their topological connections and numbers of secondary structures.
• The homologous superfamilies cluster proteins with highly similar structures and functions. The assignments of structures to fold groups and homologous superfamilies are made by sequence and structure comparisons.
Protein Structure Classification - CATH
http://www.cathdb.info/cgi-bin/cath/GotoCath.pl?link=cath_info.html
Only crystal structures solved to a resolution better than 4.0 Å are considered, together with NMR structures. All non-proteins, models, and structures that are more than 30% "C-alpha only" are excluded from CATH.
The boundaries and assignments for each protein domain are determined using a combination of automated and manual procedures. These include computational techniques, empirical and statistical evidence, literature review and expert analysis.
Domains within each H-level are subclustered into sequence families using multi-linkage clustering at the following levels:

Level | Sequence identity | Overlap
S | 35% | 80%
O | 60% | 80%
L | 95% | 80%
I | 100% | 80%
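The subclustering step can be sketched as follows. This is a minimal illustration, not CATH's pipeline: it assumes pairwise sequence identity and overlap percentages have already been computed for every domain pair (the example values are made up), and it substitutes plain single-linkage clustering (union-find over qualifying pairs) for CATH's multi-linkage procedure, which may behave differently.

```python
from typing import Dict, List, Tuple

# Identity thresholds (%) for the S, O, L and I levels in the table above.
LEVELS = {"S": 35.0, "O": 60.0, "L": 95.0, "I": 100.0}
MIN_OVERLAP = 80.0  # every level in the table uses an 80% overlap requirement

# Hypothetical input: (domain_a, domain_b) -> (sequence identity %, overlap %)
PairData = Dict[Tuple[str, str], Tuple[float, float]]


def cluster_level(domains: List[str], pairs: PairData, level: str) -> List[List[str]]:
    """Single-linkage clusters at one level: domains are joined whenever a
    pair passes both the identity and the overlap threshold."""
    min_identity = LEVELS[level]
    parent = {d: d for d in domains}

    def find(x: str) -> str:
        # Walk up to the cluster representative, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), (identity, overlap) in pairs.items():
        if identity >= min_identity and overlap >= MIN_OVERLAP:
            parent[find(a)] = find(b)  # merge the two clusters

    groups: Dict[str, List[str]] = {}
    for d in domains:
        groups.setdefault(find(d), []).append(d)
    return sorted(sorted(g) for g in groups.values())


if __name__ == "__main__":
    domains = ["d1", "d2", "d3", "d4"]
    pairs = {("d1", "d2"): (40.0, 90.0),
             ("d2", "d3"): (96.0, 85.0),
             ("d3", "d4"): (50.0, 70.0)}
    for level in LEVELS:
        print(level, cluster_level(domains, pairs, level))
```

With these made-up values, d1, d2 and d3 form one S-level family, only d2 and d3 stay together at the O and L levels, and d4 never joins because its only link fails the 80% overlap requirement.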
Figure: CATH vs. SCOP
Secondary Structure Prediction
AGADIR - An algorithm to predict the helical content of peptides
APSSP - Advanced Protein Secondary Structure Prediction Server
GOR - Garnier et al., 1996
HNN - Hierarchical Neural Network method (Guermeur, 1997)
Jpred - A consensus method for protein secondary structure prediction
at University of Dundee
JUFO - Protein secondary structure prediction from sequence (neural
network)
nnPredict - University of California at San Francisco (UCSF)
Porter - University College Dublin
PredictProtein - PHDsec, PHDacc, PHDhtm, PHDtopology, PHDthreader,
MaxHom, EvalSec from Columbia University
Prof - Cascaded Multiple Classifiers for Secondary Structure Prediction
PSA - BioMolecular Engineering Research Center (BMERC) / Boston
PSIpred - Various protein structure prediction methods at Brunel
University
SOPMA - Geourjon and Deléage, 1995
SSpro - Secondary structure prediction using bidirectional recurrent
neural networks at University of California
DLP - Domain linker prediction at RIKEN
http://us.expasy.org/tools/#secondary
Determining the Residue Environment
• Six basic environment classes (E, P1, P2, B1, B2 and B3)
The environment of each residue in the three-dimensional structure is first classified according to the area of the side chain that is buried in the protein.
---- A residue is considered exposed to solvent (environment class E) if the area buried is less than 40 Å².
---- It is considered partially buried (class P) if the area buried is between 40 and 114 Å².
---- It is considered buried (class B) if the area buried is greater than 114 Å².
• The buried and partially buried classes are further subdivided according to the fraction of the side chain area that is exposed to polar atoms ("fraction polar", denoted f).
---- For this purpose, polar atoms are defined as those of the solvent and the oxygen and nitrogen atoms of the protein.
---- The buried class is subdivided into classes B1 (f < 0.45), B2 (0.45 ≤ f < 0.58) and B3 (f ≥ 0.58).
---- The partially buried class is subdivided into classes P1 (f < 0.67) and P2 (f ≥ 0.67).
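The classification rules above translate directly into a small function. This is a minimal sketch. The text does not say which class a residue with exactly 40 or exactly 114 Å² buried falls into, so treating both boundary values as partially buried is an assumption.

```python
def environment_class(buried_area: float, fraction_polar: float) -> str:
    """Assign one of the six environment classes E, P1, P2, B1, B2, B3.

    buried_area    -- side-chain area buried in the protein, in square angstroms
    fraction_polar -- f, fraction of the side-chain area exposed to polar atoms
    """
    if buried_area < 40.0:
        return "E"  # exposed to solvent
    if buried_area <= 114.0:
        # partially buried (40-114 square angstroms), split by fraction polar
        return "P1" if fraction_polar < 0.67 else "P2"
    # buried (more than 114 square angstroms), split by fraction polar
    if fraction_polar < 0.45:
        return "B1"
    if fraction_polar < 0.58:
        return "B2"
    return "B3"


if __name__ == "__main__":
    examples = [(25.0, 0.9), (80.0, 0.3), (80.0, 0.8),
                (150.0, 0.2), (150.0, 0.5), (150.0, 0.7)]
    for area, f in examples:
        print(area, f, environment_class(area, f))
```

Note that f is ignored for exposed residues, matching the rule that only the buried and partially buried classes are subdivided.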
Structural environments (rows) vs. sequence residue and predicted secondary structure classes (columns)

Env     rcC   rcH   rcS   rwC   rwH   rwS   rbC   rbH   rbS   raC   raH   raS   rhC   rhH   rhS   rsC   rsH   rsS   rpC   rpH   rpS
rcC_E   3.3   2.4   0.8   0.5  −9.0  −9.0  −0.6  −9.0  −1.2  −0.1  −1.5  −0.8  −0.1  −2.1  −1.0   0.1  −2.3  −0.5   0.6  −1.9  −0.9
rcC_B   3.7  −9.0  −9.0  −9.0  −9.0  −9.0  −0.7  −9.0   0.1   0.2  −9.0  −9.0   0.7  −0.9   0.0   0.1  −9.0  −1.2   0.1  −9.0  −9.0
rcH_E   1.7   3.1  −9.0   1.2   1.3  −9.0  −9.0   1.4  −9.0  −0.3   1.0  −9.0  −1.1   1.0  −9.0  −1.5   0.7  −9.0  −9.0   0.8  −9.0
rcH_B   2.5   3.7  −9.0  −9.0  −9.0  −9.0  −9.0  −0.5  −9.0  −9.0   0.0  −9.0  −1.1   1.3  −9.0  −2.1   0.9  −9.0  −9.0   0.0  −9.0
rcS_E   0.4  −9.0   3.9  −9.0  −9.0   1.5  −1.2  −9.0   1.5  −0.2  −0.7   1.6  −1.1  −2.0   0.6  −0.5  −9.0   0.8  −0.8  −9.0   1.5
rcS_B   0.7  −9.0   4.0  −9.0  −9.0  −9.0  −0.2  −9.0   0.9  −0.7  −9.0  −0.5  −9.0  −1.8   1.0  −0.9  −9.0   1.0   0.0  −9.0   1.3
Secondary Structure Prediction - HNN
• http://npsa-pbil.ibcp.fr/cgi-bin/secpred_hnn.pl
>gi|78099986|sp|P0ABK2|CYDB_ECOLI Cytochrome d ubiquinol oxidase subunit 2 (Cytochrome d ubiquinol oxidase subunit II) (Cytochrome bd-I oxidase subunit II)
MIDYEVLRFIWWLLVGVLLIGFAVTDGFDMGVGMLTRFLGRNDTERRIMINSIAPHWDGNQVWLITAGGA
LFAAWPMVYAAAFSGFYVAMILVLASLFFRPVGFDYRSKIEETRWRNMWDWGIFIGSFVPPLVIGVAFGN
LLQGVPFNVDEYLRLYYTGNFFQLLNPFGLLAGVVSVGMIITQGATYLQMRTVGELHLRTRATAQVAALV
TLVCFALAGVWVMYGIDGYVVKSTMDHYAASNPLNKEVVREAGAWLVNFNNTPILWAIPALGVVLPLLTI
LTARMDKAAWAFVFSSLTLACIILTAGIAMFPFVMPSSTMMNASLTMWDATSSQLTLNVMTWVAVVLVPI
ILLYTAWCYWKMFGRITKEDIERNTHSLY
Secondary Structure Prediction - HNN
Sequence length : 379
HNN :
Alpha helix (Hh) : 209 is 55.15%
3₁₀ helix (Gg) : 0 is 0.00%
Pi helix (Ii) : 0 is 0.00%
Beta bridge (Bb) : 0 is 0.00%
Extended strand (Ee) : 55 is 14.51%
Beta turn (Tt) : 0 is 0.00%
Bend region (Ss) : 0 is 0.00%
Random coil (Cc) : 115 is 30.34%
Ambiguous states (?) : 0 is 0.00%
Other states : 0 is 0.00%
MIDYEVLRFIWWLLVGVLLIGFAVTDGFDMGVGMLTRFLGRNDTERRIMINSIAPHWDGNQVWLITAGGA
ccchhhhhhhhhhhhhhheeeeehccchhcchhhhhheecccccceeeeeeccccccccceeeeeeccch
LFAAWPMVYAAAFSGFYVAMILVLASLFFRPVGFDYRSKIEETRWRNMWDWGIFIGSFVPPLVIGVAFGN
hhhhhhhhhhhhhhhhhhhhhhhhhhhhhcccccccccchhhhhhhhhhcceeehccchccheehhhhhc
LLQGVPFNVDEYLRLYYTGNFFQLLNPFGLLAGVVSVGMIITQGATYLQMRTVGELHLRTRATAQVAALV
hhcccccchhhhheeeeccchhhhhcchceccceeeeeeeeeccchhhhhhhchhhhhhchhhhhhhhhh
TLVCFALAGVWVMYGIDGYVVKSTMDHYAASNPLNKEVVREAGAWLVNFNNTPILWAIPALGVVLPLLTI
hhhhhhccceeeeeeccceeeeeccccccccccchhhhhhhhhhhheeccccceeeeccchhhhhhhhhh
LTARMDKAAWAFVFSSLTLACIILTAGIAMFPFVMPSSTMMNASLTMWDATSSQLTLNVMTWVAVVLVPI
hhhhhhhhhhhhhhhhhhhhhhhhhcchhhcccccccchhhccccchhcccchhhhhhhhhhhhhhhhhh
ILLYTAWCYWKMFGRITKEDIERNTHSLY
hhhhhhhhhhhhhhhcchhhhhhhccccc
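The percentages in the HNN summary are per-state counts divided by the sequence length. A minimal sketch, assuming the prediction is available as one lowercase state letter per residue, as in the prediction lines above (h = helix, e = extended strand, c = coil):

```python
from collections import Counter
from typing import Dict, Tuple

# Names for the three states that appear in the prediction lines above.
STATE_NAMES = {"h": "Alpha helix", "e": "Extended strand", "c": "Random coil"}


def composition(prediction: str) -> Dict[str, Tuple[int, float]]:
    """Map each predicted state letter to (count, percentage of residues)."""
    counts = Counter(prediction)
    total = len(prediction)
    return {state: (n, round(100.0 * n / total, 2)) for state, n in counts.items()}


if __name__ == "__main__":
    # Synthetic string with the same state counts as the HNN summary above.
    prediction = "h" * 209 + "e" * 55 + "c" * 115
    for state, (n, pct) in sorted(composition(prediction).items()):
        print(f"{STATE_NAMES.get(state, state)}: {n} is {pct:.2f}%")
```

For 209 helix, 55 strand and 115 coil residues out of 379, this reproduces the 55.15%, 14.51% and 30.34% figures in the summary.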
Secondary Structure Prediction - PHD
• PHDsec predicts secondary structure from multiple sequence alignments. Secondary structure is predicted by a system of neural networks with an expected average accuracy of more than 72% for the three states helix, strand and loop (Rost & Sander, PNAS, 1993, 90, 7558-7562; Rost & Sander, JMB, 1993, 232, 584-599; and Rost & Sander, Proteins, 1994, 19, 55-72).
• Evaluated on the same data set, PHDsec achieves a three-state accuracy ten percentage points higher than methods using only single-sequence information, and more than six percentage points higher than, e.g., a method using alignment information based on statistics (Levin, Pascarella, Argos & Garnier, Prot. Engng., 6, 849-54, 1993).
• PHDsec predictions have three main features:
• improved accuracy through evolutionary information from multiple
sequence alignments
• improved beta-strand prediction through a balanced training procedure
• more accurate prediction of secondary structure segments by using a
multi-level system
• Rost B, Sander C. Prediction of protein secondary structure at better than 70% accuracy. J. Mol. Biol. 1993.
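Three-state accuracy (often called Q3) is the percentage of residues whose predicted state agrees with the observed one. A minimal sketch, assuming predicted and observed states have already been reduced to the three letters H (helix), E (strand) and L (loop) and are aligned residue by residue; reducing finer state sets, such as those in the HNN output, to three states is not shown.

```python
def q3(predicted: str, observed: str) -> float:
    """Three-state accuracy: percentage of residues whose predicted state
    (H = helix, E = strand, L = loop) matches the observed state."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("empty prediction")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


if __name__ == "__main__":
    print(q3("HHHELL", "HHEELL"))  # 5 of 6 residues agree
```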
Motifs Readily Identified from Sequence
• Zinc finger – characteristic order and spacing of cysteine and histidine residues.
• Leucine zippers – two parallel α-helices held together by interactions between hydrophobic leucine residues at every seventh position in each helix.
• Coiled coils – two or three helices coiled around each other in a left-handed supercoil (3.5 residues per turn instead of 3.6, i.e. 7 residues per two turns); within each heptad the first and fourth positions are typically hydrophobic and the others hydrophilic; usually 5-10 heptads.
• Transmembrane-spanning proteins – α-helices of typically 20-30 residues, composed of amino acids with hydrophobic side chains.
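Two of these motifs can be scanned for directly in a sequence. The sketch below makes two assumptions. First, the zinc-finger check uses the PROSITE-style C2H2 pattern C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H, written as a regular expression. Second, the transmembrane check averages the Kyte-Doolittle hydropathy scale over a sliding window, with a 19-residue window and a 1.6 threshold that are commonly quoted defaults rather than values from this text.

```python
import re
from typing import List, Tuple

# Kyte-Doolittle hydropathy scale (assumed choice of hydrophobicity scale).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# C2H2 zinc-finger pattern C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H as a regex.
ZINC_FINGER = re.compile(r"C.{2,4}C.{3}[LIVMFYWC].{8}H.{3,5}H")


def zinc_fingers(seq: str) -> List[Tuple[int, int]]:
    """(start, end) of non-overlapping matches; 0-based, end exclusive."""
    return [(m.start(), m.end()) for m in ZINC_FINGER.finditer(seq)]


def tm_candidates(seq: str, window: int = 19, threshold: float = 1.6) -> List[int]:
    """Start positions of windows whose mean hydropathy exceeds the threshold.
    Unknown letters score 0.0."""
    hits = []
    for i in range(len(seq) - window + 1):
        mean = sum(KD.get(aa, 0.0) for aa in seq[i:i + window]) / window
        if mean > threshold:
            hits.append(i)
    return hits


if __name__ == "__main__":
    print(zinc_fingers("CAACAAALAAAAAAAAHAAAH"))
    print(tm_candidates("K" * 10 + "L" * 20 + "K" * 10))
```

A window-averaged hydropathy plot only flags candidate segments; the leucine-zipper and coiled-coil heptad patterns would need a separate periodicity check.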
Tertiary Structure Prediction
Comparative modeling
SWISS-MODEL - An automated knowledge-based protein modelling server
3Djigsaw - Three-dimensional models for proteins based on homologues of
known structure
CPHmodels - Automated neural-network based protein modelling server
ESyPred3D - Automated homology modeling program using neural networks
Geno3d - Automatic modeling of protein three-dimensional structure
SDSC1 - Protein Structure Homology Modeling Server
Threading
3D-PSSM - Protein fold recognition using 1D and 3D sequence profiles
coupled with secondary structure information (Foldfit)
Fugue - Sequence-structure homology recognition
HHpred - Protein homology detection and structure prediction by HMM-HMM
comparison
Libellula - Neural network approach to evaluate fold recognition results
LOOPP - Sequence to sequence, sequence to structure, and structure to
structure alignment
SAM-T02 - HMM-based Protein Structure Prediction
Threader - Protein fold recognition
ProSup - Protein structure superimposition
SWEET - Constructing 3D models of saccharides from their sequences
Ab initio
HMMSTR/Rosetta - Prediction of protein structure from sequence
http://us.expasy.org/tools
Tertiary Structure Prediction – Comparative Modeling
Example: 3Djigsaw – three-dimensional models for proteins based on homologues of known structure
Contreras-Moreira, B., Bates, P.A. (2002) Domain Fishing: a first step in protein comparative modelling. Bioinformatics 18: 1141-1142.
3D Protein Sequence Profiles
• A 3D profile is based on a 3D structure-specific scoring matrix.
• A 3D scoring matrix is similar to the 1D scoring matrices discussed in the multiple sequence alignment lectures, with the additional attribute of the structural environment of the amino acid side chain.
• There are 6 basic environment classes (E, P1, P2, B1, B2 and B3), differing in the area of the side chain that is buried and in the fraction of the side chain that is exposed to polar atoms.
• Since each residue can be in one of 3 secondary structure states, there are 3 × 6 = 18 different environment classes.
• The log-odds score of each amino acid in each environment class, calculated from a database of protein structures, gives the values of the 3D-1D scoring matrix.
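The log-odds construction can be sketched as below, using the usual form score(a, env) = ln(P(a | env) / P(a)). Assumptions: the structure database has already been reduced to a list of (amino acid, environment class) observations; environment labels such as "B1" stand in for the full 18-class labels; and probabilities are plain count ratios with no pseudocounts, so pairs that never occur simply get no score.

```python
import math
from collections import Counter
from typing import Dict, Iterable, Tuple


def log_odds_matrix(observations: Iterable[Tuple[str, str]]) -> Dict[Tuple[str, str], float]:
    """3D-1D scores from (amino_acid, environment_class) observations.

    score(a, env) = ln( P(a | env) / P(a) ), with both probabilities
    estimated by counting over the observations.
    """
    pair_counts = Counter(observations)
    aa_counts: Counter = Counter()
    env_counts: Counter = Counter()
    for (aa, env), n in pair_counts.items():
        aa_counts[aa] += n
        env_counts[env] += n
    total = sum(pair_counts.values())

    scores = {}
    for (aa, env), n in pair_counts.items():
        p_aa_given_env = n / env_counts[env]  # how often aa occurs in this environment
        p_aa = aa_counts[aa] / total          # how often aa occurs overall
        scores[(aa, env)] = math.log(p_aa_given_env / p_aa)
    return scores


if __name__ == "__main__":
    obs = [("L", "B1")] * 3 + [("K", "E")] * 3 + [("L", "E"), ("K", "B1")]
    for key, value in sorted(log_odds_matrix(obs).items()):
        print(key, round(value, 3))
```

A positive score means the amino acid is over-represented in that environment (here leucine in the buried class B1); a negative score means it is under-represented.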
Using 3D Profiles in Structure Prediction
• Aligning an amino acid sequence with a 3D profile yields an overall 3D-1D score, which measures how compatible the sequence is with the structure described by the profile.
• Given an amino acid sequence, find compatible structures
---- Useful for finding homologous structures when doing homology modeling
• Given a preliminary or model structure, test its validity
---- Useful for the final phase of homology modeling
• Given a structure, find compatible sequences
---- Useful for analyzing evolutionary relationships among proteins
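Scoring a sequence against a 3D profile is an alignment problem. A minimal sketch follows. Several things here are assumptions: the profile is stored as one dictionary of amino-acid scores per structure position (for example, the 3D-1D matrix row for that position's environment class), amino acids missing from a position's dictionary score 0.0, and a single linear gap penalty is used. Practical profile methods typically use more elaborate gap penalties.

```python
from typing import Dict, List

# One entry per structure position: amino acid -> 3D-1D score at that position.
Profile = List[Dict[str, float]]


def profile_alignment_score(seq: str, profile: Profile, gap: float = -1.0) -> float:
    """Best global (Needleman-Wunsch style) alignment score of seq against profile."""
    n, m = len(seq), len(profile)
    # dp[i][j] = best score for aligning seq[:i] with profile[:j]
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = dp[i - 1][0] + gap  # residues left unaligned
    for j in range(1, m + 1):
        dp[0][j] = dp[0][j - 1] + gap  # profile positions left empty
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            match = dp[i - 1][j - 1] + profile[j - 1].get(seq[i - 1], 0.0)
            dp[i][j] = max(match, dp[i - 1][j] + gap, dp[i][j - 1] + gap)
    return dp[n][m]


if __name__ == "__main__":
    # Toy 3-position profile (buried, exposed, buried); values are made up.
    toy = [{"L": 2.0, "K": -1.0}, {"K": 1.5, "L": -0.5}, {"L": 2.0, "K": -1.0}]
    print(profile_alignment_score("LKL", toy))  # 5.5
```

Running the same function over many profiles gives the "find compatible structures" use; running many sequences against one profile gives the "find compatible sequences" use.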
Homology Modeling
• Definition: predicting the tertiary structure of an unknown protein using the known 3D structure(s) of protein(s) with homologous sequences
• Based on the assumption that structure is more conserved than sequence
• Important to use homologous proteins whose structures were determined experimentally, by X-ray crystallography or NMR
• Homology modeling is an important method because the number of distinct protein folds (unique structures) is much smaller than the number of different proteins
• Homologous protein sequences are likely to share a common protein fold
Some of the material from this section is from:
http://www.cs.wright.edu/~mraymer/cs790/Homology_Modeling.ppt
Homology Modeling Procedure
• Search databases for homologous protein sequences
---- The Protein Data Bank (PDB) is a good choice, since every sequence in the PDB has a solved 3D structure
• Align the homologous protein sequence(s) with the sequence of interest
---- Pairwise or multiple sequence alignment can be used
• Build a model of the structure of the protein of interest using the known structures of homologous proteins. Possible methods include:
1. Modeling by rigid body assembly
2. Modeling by segment matching or coordinate reconstruction
3. Modeling by satisfaction of spatial constraints
• Evaluate and refine the model structure
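Choosing among several candidate templates after the search and alignment steps is often guided by sequence identity. A minimal sketch, assuming pairwise alignments are already available as gapped strings ('-' for gaps) and that identity is measured over columns where neither sequence has a gap. Other normalizations, such as dividing by the query length, are also in use. The template IDs in the example are hypothetical.

```python
from typing import List, Tuple


def percent_identity(aligned_query: str, aligned_template: str) -> float:
    """Identity (%) over alignment columns where neither sequence has a gap."""
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")
    columns = [(q, t) for q, t in zip(aligned_query, aligned_template)
               if q != "-" and t != "-"]
    if not columns:
        return 0.0
    return 100.0 * sum(q == t for q, t in columns) / len(columns)


def rank_templates(alignments: List[Tuple[str, str, str]]) -> List[Tuple[str, float]]:
    """Sort (template_id, aligned_query, aligned_template) entries by identity, best first."""
    scored = [(tid, percent_identity(q, t)) for tid, q, t in alignments]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    candidates = [("tmplA", "AC-DE", "ACGDF"), ("tmplB", "ACDE", "ACDE")]
    print(rank_templates(candidates))
```

Identity alone ignores alignment coverage and the experimental quality of the template, both of which matter when choosing a template in practice.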
Tertiary Structure Prediction - Threading
• The term was first coined by Jones, Taylor and Thornton in 1992, originally for fold recognition.
• Today, the terms threading and fold recognition are frequently (though somewhat incorrectly) used interchangeably.
• The basic idea is that the target sequence (whose structure is to be predicted) is threaded through the backbone structures of template proteins (known as the fold library), and a “goodness of fit” score is calculated for each template (usually derived in terms of an empirical energy function).
• Threading methods share some of the characteristics of both
comparative modelling methods (the sequence alignment
aspect) and ab initio prediction methods (predicting structure
based on identifying low-energy conformations of the target
protein).
http://en.wikipedia.org/wiki/Threading_%28protein_sequence%29
Protein Threading
• Generalization of the homology modeling method
---- Homology modeling: align sequence to sequence
---- Threading: align sequence to structure (templates)
• Rationale:
---- There is a limited number of basic folds found in nature
---- Amino acid preferences for different structural environments provide sufficient information to choose the best-fitting protein fold (structure)
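Threading a sequence through a fold library can be illustrated with a deliberately simplified sketch. Each template is reduced to a string of burial labels (B = buried, E = exposed), each residue is reduced to hydrophobic or polar, and the sequence is slid along each template without gaps. The preference table, residue grouping and templates are all hypothetical stand-ins for a real 3D-1D matrix or empirical energy function, and real threading methods allow gaps.

```python
from typing import Dict, List, Tuple

# Hypothetical environment preferences: (residue class, environment) -> score.
# "h" = hydrophobic residue, "p" = polar residue; "B" = buried, "E" = exposed.
PREF: Dict[Tuple[str, str], float] = {
    ("h", "B"): 1.0, ("h", "E"): -1.0,
    ("p", "B"): -1.0, ("p", "E"): 0.5,
}
HYDROPHOBIC = set("AVILMFWC")  # assumed grouping


def residue_class(aa: str) -> str:
    return "h" if aa in HYDROPHOBIC else "p"


def thread(seq: str, template_env: str) -> Tuple[float, int]:
    """Best ungapped placement of seq on a template's environment string.
    Returns (score, offset); the template must be at least as long as seq."""
    best = (float("-inf"), -1)
    for offset in range(len(template_env) - len(seq) + 1):
        score = sum(PREF[(residue_class(aa), env)]
                    for aa, env in zip(seq, template_env[offset:]))
        best = max(best, (score, offset))
    return best


def best_fold(seq: str, library: Dict[str, str]) -> List[Tuple[str, float]]:
    """Rank the templates of a fold library by threading score, best first."""
    results = [(name, thread(seq, env)[0])
               for name, env in library.items() if len(env) >= len(seq)]
    return sorted(results, key=lambda r: r[1], reverse=True)


if __name__ == "__main__":
    library = {"fold1": "BBEE", "fold2": "EEBB"}
    print(best_fold("LLKK", library))
```

Templates shorter than the query are skipped here; a gapped threading algorithm would not need that restriction.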
Tertiary Structure Prediction
Ab initio (de novo)
• From scratch – using physical properties instead of known structures
• Mimic the folding process – minimize an energy function, e.g. by stochastic search such as simulated annealing
• Computationally expensive – requires large clusters, large machines (e.g., IBM Blue Gene) or distributed computing; currently works only for small peptides
• Big potential in the future – understanding folding dynamics, improving accuracy, and applications in drug development
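The idea of minimizing an energy function by stochastic search can be sketched with generic simulated annealing. Everything in this sketch is a stand-in: the conformation is reduced to a vector of backbone angles, each move perturbs one angle, the cooling schedule is geometric, and `toy_energy` is a hypothetical smooth function, not a physical force field.

```python
import math
import random
from typing import Callable, List, Tuple


def simulated_annealing(energy: Callable[[List[float]], float],
                        start: List[float],
                        steps: int = 5000,
                        t_start: float = 2.0,
                        t_end: float = 0.01,
                        step_size: float = 0.3,
                        seed: int = 0) -> Tuple[List[float], float]:
    """Minimize `energy` over a vector of angles (radians); returns the best state seen."""
    rng = random.Random(seed)
    current = list(start)
    e_current = energy(current)
    best, e_best = list(current), e_current
    for k in range(steps):
        # Geometric cooling from t_start down to t_end.
        t = t_start * (t_end / t_start) ** (k / max(steps - 1, 1))
        # Move: perturb one randomly chosen angle.
        trial = list(current)
        i = rng.randrange(len(trial))
        trial[i] += rng.uniform(-step_size, step_size)
        e_trial = energy(trial)
        # Metropolis criterion: always accept downhill, sometimes accept uphill.
        if e_trial <= e_current or rng.random() < math.exp((e_current - e_trial) / t):
            current, e_current = trial, e_trial
            if e_current < e_best:
                best, e_best = list(current), e_current
    return best, e_best


def toy_energy(angles: List[float]) -> float:
    """Hypothetical energy: each angle prefers -1.0 rad, plus a weak coupling
    term between neighbouring angles. The minimum value is 0."""
    local = sum(1.0 - math.cos(a + 1.0) for a in angles)
    coupling = sum(0.1 * (1.0 - math.cos(a - b)) for a, b in zip(angles, angles[1:]))
    return local + coupling


if __name__ == "__main__":
    best, e = simulated_annealing(toy_energy, [0.0] * 5)
    print([round(a, 2) for a in best], round(e, 4))
```

Accepting some uphill moves at high temperature lets the search escape local minima; on a real energy landscape this is what makes the method expensive, since many evaluations are needed.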
Prediction Scoring with Rosetta
Rosetta uses a scoring function to judge different conformations. The process consists of making 'moves' (changing the backbone torsion angles of a particular group of amino acids) and then scoring the new conformation.
The Rosetta score is a weighted sum of component scores, where each component score judges a different aspect of protein structure.
Environment score: Here, hydrophobic residues are represented as orange stars, so the left conformation is good (all the hydrophobics together) while the rightmost conformation is bad (with the hydrophobic amino acids not touching).
Pair-score: Two conformations of a polypeptide are shown, one (top) where the chain is folded back on itself, bringing two cysteines together (yellow + yellow = possible disulphide bond) and forming a salt bridge (blue + red = opposites attract). The conformation at bottom does not make these pairings, so the pair-score would favor the top conformation.
http://www.grid.org/projects/hpf/howitworks_scoring.htm
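The weighted-sum idea can be sketched on a toy 2D conformation. This is not Rosetta's energy function. The residue groupings, the contact cutoff, the weights, and the convention that lower scores are better are all assumptions, chosen to mirror the environment and pair components described above.

```python
import math
from typing import Dict, List, Tuple

Coord = Tuple[float, float]

HYDROPHOBIC = set("AVILMFW")  # assumed hydrophobic grouping
SALT_BRIDGES = [{"K", "D"}, {"K", "E"}, {"R", "D"}, {"R", "E"}]  # opposite charges


def contacts(coords: List[Coord], cutoff: float) -> List[Tuple[int, int]]:
    """Residue pairs within `cutoff`, skipping chain neighbours (i, i+1)."""
    return [(i, j)
            for i in range(len(coords))
            for j in range(i + 2, len(coords))
            if math.dist(coords[i], coords[j]) <= cutoff]


def environment_score(seq: str, coords: List[Coord], cutoff: float = 1.0) -> float:
    """Lower is better: -1 for every hydrophobic-hydrophobic contact."""
    return -float(sum(1 for i, j in contacts(coords, cutoff)
                      if seq[i] in HYDROPHOBIC and seq[j] in HYDROPHOBIC))


def pair_score(seq: str, coords: List[Coord], cutoff: float = 1.0) -> float:
    """Lower is better: -1 per Cys-Cys contact (possible disulfide) or salt bridge."""
    score = 0.0
    for i, j in contacts(coords, cutoff):
        if (seq[i] == "C" and seq[j] == "C") or {seq[i], seq[j]} in SALT_BRIDGES:
            score -= 1.0
    return score


def total_score(seq: str, coords: List[Coord], weights: Dict[str, float]) -> float:
    """Weighted sum of the component scores."""
    terms = {"env": environment_score(seq, coords), "pair": pair_score(seq, coords)}
    return sum(weights[name] * value for name, value in terms.items())


if __name__ == "__main__":
    folded = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]    # ends touch
    extended = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]  # no contacts
    weights = {"env": 1.0, "pair": 2.0}
    for name, coords in [("folded", folded), ("extended", extended)]:
        print(name, total_score("CAAC", coords, weights))
```

As in the pair-score description, the folded conformation that brings the two cysteines together scores better (lower) than the extended one.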
Evaluation - CASP
CASP (Critical Assessment of Techniques for Protein Structure Prediction) is a community-wide experiment (though it is commonly referred to as a competition) for protein structure prediction that has taken place every two years since 1994 (http://predictioncenter.org/).
The main goal of CASP is to obtain an in-depth and objective assessment of our current abilities and inabilities in the area of protein structure prediction. To this end, participants will predict as much as possible about a set of soon-to-be-known structures. These will be true predictions, not ‘post-dictions’ made on already known structures. CASP7 will particularly address the following questions:
1. Are the models produced similar to the corresponding experimental
structure?
2. Is the mapping of the target sequence onto the proposed structure (i.e. the
alignment) correct?
3. Have similar structures that a model can be based on been identified?
4. Are comparative models more accurate than can be obtained by simply
copying the best template?
5. Has there been progress from the earlier CASPs?
6. What methods are most effective?
7. Where can future effort be most productively focused?
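Question 1 (how similar is a model to the experimental structure?) is answered with numerical measures. The sketch below computes C-alpha RMSD and a simplified GDT_TS-style score: the mean, over cutoffs of 1, 2, 4 and 8 Å, of the percentage of residues lying within the cutoff of their experimental positions. It assumes the model is already superimposed on the experimental structure with a one-to-one residue correspondence. The official GDT procedure also searches over superpositions, which is omitted here.

```python
import math
from typing import List, Tuple

Coord = Tuple[float, float, float]


def _check(model: List[Coord], target: List[Coord]) -> None:
    if len(model) != len(target) or not model:
        raise ValueError("model and target need the same, non-zero number of residues")


def rmsd(model: List[Coord], target: List[Coord]) -> float:
    """Root-mean-square deviation of corresponding C-alpha positions."""
    _check(model, target)
    squared = sum((a - b) ** 2 for m, t in zip(model, target) for a, b in zip(m, t))
    return math.sqrt(squared / len(model))


def gdt_ts_fixed(model: List[Coord], target: List[Coord]) -> float:
    """GDT_TS-like score for an already-superimposed model (0-100)."""
    _check(model, target)
    dists = [math.dist(m, t) for m, t in zip(model, target)]
    fractions = [sum(d <= c for d in dists) / len(dists) for c in (1.0, 2.0, 4.0, 8.0)]
    return 100.0 * sum(fractions) / 4


if __name__ == "__main__":
    target = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
    model = [(0.5, 0.0, 0.0), (1.0, 1.5, 0.0), (2.0, 3.0, 0.0), (3.0, 10.0, 0.0)]
    print(round(rmsd(model, target), 3), gdt_ts_fixed(model, target))
```

A single badly placed residue (the last one, 10 Å off) dominates the RMSD but costs GDT_TS only a bounded amount, which is one reason GDT-style scores are preferred for comparing models.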
Evaluation - CASP
Evaluation of the results is carried out in the following prediction categories:
• tertiary structure prediction (all CASPs)
• secondary structure prediction (dropped after CASP5)
• prediction of structure complexes (CASP2 only; a separate experiment, CAPRI, carries on this subject)
• residue-residue contact prediction (starting CASP4)
• disordered regions prediction (starting CASP5)
• domain boundary prediction (starting CASP6)
• function prediction (starting CASP6)
• model quality assessment (starting CASP7)
• model refinement (starting CASP7)
The tertiary structure prediction category was further subdivided into:
• homology modelling
• fold recognition (also called protein threading; note that this is incorrect, as threading is a method)
• de novo structure prediction, now referred to as 'New Fold' because many methods apply evaluation (scoring) functions that are biased by knowledge of native protein structures; an artificial neural network is one such example.
Number of human expert groups registered: 207
Number of targets released: 104
Number of prediction servers registered: 98
Targets canceled: 4
Valid targets: 100
Refinement targets: 9

Prediction format | Number of groups contributing | Number of models designated as 1 | Total number of models
3D coordinates | 180 | 12393 | 48339
Alignments to PDB structures | 15 | 966 | 3896
Residue-residue contacts | 17 | 1473 | 1561
Structural domain assignments | 27 | 2258 | 2515
Disordered regions | 19 | 1801 | 1801
Function prediction | 22 | 1317 | 1930
Quality assessment | 29 | 2326 | 3228
Model refinement | 26 | 136 | 447
All | 255 (unique) | 22670 | 63717
Summary
• Proteins are key players in our living systems.
• Proteins are polymers consisting of 20 kinds of amino acids.
• Each protein folds into a unique three-dimensional structure defined by its amino acid sequence.
• Protein structure has a hierarchical nature.
• Protein structure prediction is a grand challenge of computational biology.