
Protein Structure Prediction
• What’s the big deal?
• Why is it important?
• Who is working on it?
• Different methods
Methods:
• Comparative modeling
• Fold recognition or threading
• ab initio folding
• Genetic algorithms
Why do we need structure prediction?
3D structure gives clues to function:
• active sites, binding sites, conformational changes...
• structure and function conserved more than sequence
3D structure determination is difficult, slow and expensive
Intellectual challenge, Nobel prizes etc...
Engineering new proteins
IEEE Computer July 2002 page 27
Computational Biology’s Holy Grail
“When asked what the Holy Grail of computational biology is,
most researchers would answer that it is either
sequence-structure-function prediction or
Computing the genotype-phenotype map.”
Comparison of Different methods
Structure prediction
Summary of the four main approaches to structure prediction. Note that
there are overlaps between nearly all categories.
Method: Comparative Modelling (Homology modelling)
• Knowledge: proteins of known structure
• Approach: identify a related structure with sequence methods, copy the 3D coordinates and modify where necessary
• Difficulty: relatively easy
• Usefulness: very, if sequence identity > 40%; drug design
Method: Fold Recognition
• Knowledge: proteins of known structure
• Approach: same as above, but use more sophisticated methods to find a related structure
• Difficulty: medium
• Usefulness: limited due to poor models
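The 40% sequence-identity figure quoted for comparative modelling can be computed from a pairwise alignment. The following is a minimal sketch, not taken from the slides: the function names are hypothetical, gaps are assumed to be written as "-", and identity is assumed to be measured over the columns where neither sequence has a gap (other conventions exist).

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # skip gapped columns (an assumption of this sketch)
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


def above_threshold(identity: float, threshold: float = 40.0) -> bool:
    """True if identity exceeds the threshold quoted in the comparison (40%)."""
    return identity > threshold


if __name__ == "__main__":
    identity = percent_identity("ACDE-F", "ACDQGF")
    print(identity, above_threshold(identity))
```

The threshold is kept as a parameter because the slides give it only as a rule of thumb for when a comparative model is likely to be useful.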
Comparison of Different methods
Structure prediction
Summary of the four main approaches to structure prediction. Note that
there are overlaps between nearly all categories.
Method: Secondary structure prediction
• Knowledge: sequence-structure statistics
• Approach: forget 3D arrangement and predict where the helices/strands are
• Difficulty: medium
• Usefulness: can improve alignments, fold recognition, ab initio
Method: ab initio tertiary structure prediction
• Knowledge: energy functions, statistics
• Approach: simulate folding, or generate lots of structures and try to pick the correct one
• Difficulty: very hard
• Usefulness: not really
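The "generate lots of structures and try to pick the correct one" approach listed for ab initio prediction can be illustrated with a toy example. The sketch below is an assumption-laden stand-in, not a method from the slides: it uses a 2D lattice with a two-letter hydrophobic/polar (H/P) alphabet and an energy of -1 per non-bonded H-H contact, and all function names are hypothetical. Real energy functions and conformational searches are far more elaborate.

```python
import random

MOVES = [(1, 0), (-1, 0), (0, 1), (0, -1)]


def random_conformation(length, rng):
    """Grow a self-avoiding walk on a 2D lattice; return None if it gets stuck."""
    path = [(0, 0)]
    occupied = {(0, 0)}
    for _ in range(length - 1):
        x, y = path[-1]
        options = [(x + dx, y + dy) for dx, dy in MOVES
                   if (x + dx, y + dy) not in occupied]
        if not options:
            return None
        nxt = rng.choice(options)
        path.append(nxt)
        occupied.add(nxt)
    return path


def hp_energy(sequence, path):
    """Toy energy: -1 for each pair of H residues that are lattice
    neighbours without being consecutive in the chain."""
    position = {p: i for i, p in enumerate(path)}
    energy = 0
    for i, (x, y) in enumerate(path):
        if sequence[i] != "H":
            continue
        for dx, dy in MOVES:
            j = position.get((x + dx, y + dy))
            # j > i + 1 counts each non-bonded contact exactly once
            if j is not None and j > i + 1 and sequence[j] == "H":
                energy -= 1
    return energy


def best_of_random(sequence, n_samples=2000, seed=0):
    """Generate many random conformations and keep the lowest-energy one."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_samples):
        path = random_conformation(len(sequence), rng)
        if path is None:
            continue
        energy = hp_energy(sequence, path)
        if best is None or energy < best[0]:
            best = (energy, path)
    return best


if __name__ == "__main__":
    energy, path = best_of_random("HPHPPHHPHH")
    print(energy, path)
```

Even in this toy setting the number of conformations grows very quickly with chain length, which hints at why the slides rate the approach "very hard".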
Protein Structure Prediction
Instrumentation methods for determining a
protein's structure
• X-ray crystallography
• NMR spectroscopy
A Guide to Structure Prediction (version 2.1)
EMBL
Meyerhofstrasse, 1
D-69117 Heidelberg
Germany
speedy.embl-heidelberg.de/gtsp/
Experimental Data
Much experimental data can aid the structure prediction
process. Some of these are:
• Disulphide bonds, which provide tight restraints on the location
of cysteines in space.
• Spectroscopic data, which can give you an idea as to the
secondary structure content of your protein.
• Site directed mutagenesis studies, which can give insights as
to residues involved in active or binding sites.
• Knowledge of proteolytic cleavage sites and post-translational
modifications, such as phosphorylation or glycosylation,
which can suggest residues that must be accessible.
Remember to keep all of the available data in mind when
doing predictive work. Always ask yourself whether a
prediction agrees with the results of experiments. If not,
then it may be necessary to modify what you've done.
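As an illustration of checking a prediction against experiment, the sketch below tests whether cysteine pairs known to form disulphide bonds are actually close together in a model. It is a hedged example, not part of the original guide: the function name, the input layout (residue number mapped to the cysteine sulphur coordinates in angstroms) and the 2.5 angstrom cutoff are assumptions; a real S-S bond is roughly 2.05 angstroms long.

```python
import math


def violated_disulphides(sg_coords, bonded_pairs, max_distance=2.5):
    """Return (res_i, res_j, distance) for each experimentally known
    disulphide whose sulphur atoms are too far apart in the model.

    sg_coords: dict mapping residue number -> (x, y, z) of the Cys sulphur
    bonded_pairs: iterable of (res_i, res_j) known from experiment
    max_distance: cutoff in angstroms (an assumed tolerance, not a standard)
    """
    violations = []
    for i, j in bonded_pairs:
        distance = math.dist(sg_coords[i], sg_coords[j])
        if distance > max_distance:
            violations.append((i, j, distance))
    return violations


if __name__ == "__main__":
    # Hypothetical model coordinates, for illustration only.
    model = {3: (0.0, 0.0, 0.0), 40: (2.0, 0.0, 0.0),
             55: (10.0, 0.0, 0.0), 70: (0.0, 0.0, 12.0)}
    for i, j, d in violated_disulphides(model, [(3, 40), (55, 70)]):
        print(f"Cys{i}-Cys{j} are {d:.1f} A apart: model disagrees with experiment")
```

Any violation reported here is a prompt to revisit the alignment or the model, in the spirit of the advice above.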
The PSA Protein Structure Prediction Server
bmerc-www.bu.edu/psa/
The Protein Sequence Analysis (PSA) server predicts
probable secondary structures and folding classes for
a given amino acid sequence.
Used for proteins of unknown structure and for which no
homologous sequences are known
Developed at:
The BioMolecular Engineering Research Center
(BMERC) of Boston University in Boston, Massachusetts,
and TASC, Inc. in Reading, Massachusetts.
Submissions are accepted by email or web page
Results are returned in PDF or PS format
NNPREDICT
Protein Secondary Structure Prediction
www.cmpharm.ucsf.edu/~nomi/nnpredict.html
nnpredict is a program that predicts the secondary structure
type for each residue in an amino acid sequence. The basis
of the prediction is a two-layer, feed-forward neural network.
The network weights were determined by a separate program
-- a modification of the Parallel Distributed Processing suite
of McClelland and Rumelhart (1).
Input a sequence consisting of one-letter amino acid codes
(A C D E F G H I K L M N P Q R S T V W Y)
(NOTE: B and Z are not recognized as valid amino acid codes)
or three-letter amino acid codes separated by spaces
(ALA CYS ASP GLU PHE GLY HIS ILE LYS LEU MET ASN
PRO GLN ARG SER THR VAL TRP TYR).
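The input rules above can be checked before submitting a sequence. The following is a minimal sketch of such a pre-check, not nnpredict's own code: the function name is hypothetical, and the rule used to tell the two formats apart (every whitespace-separated token is a known three-letter code) is an assumption.

```python
THREE_TO_ONE = {
    "ALA": "A", "CYS": "C", "ASP": "D", "GLU": "E", "PHE": "F",
    "GLY": "G", "HIS": "H", "ILE": "I", "LYS": "K", "LEU": "L",
    "MET": "M", "ASN": "N", "PRO": "P", "GLN": "Q", "ARG": "R",
    "SER": "S", "THR": "T", "VAL": "V", "TRP": "W", "TYR": "Y",
}
VALID_ONE_LETTER = set(THREE_TO_ONE.values())


def normalize_sequence(text: str) -> str:
    """Return the sequence as upper-case one-letter codes, or raise ValueError.

    Assumption: input is treated as three-letter codes only when every
    whitespace-separated token is a known three-letter code; otherwise it
    is read as one-letter codes with whitespace ignored.
    """
    tokens = text.upper().split()
    if not tokens:
        raise ValueError("empty sequence")
    if all(token in THREE_TO_ONE for token in tokens):
        return "".join(THREE_TO_ONE[token] for token in tokens)
    sequence = "".join(tokens)
    invalid = sorted(set(sequence) - VALID_ONE_LETTER)
    if invalid:
        # B and Z (and any other non-standard letters) are rejected
        raise ValueError(f"invalid amino acid codes: {', '.join(invalid)}")
    return sequence


if __name__ == "__main__":
    print(normalize_sequence("ALA CYS ASP GLU"))
    print(normalize_sequence("vyetdcrwdgcsqefdsqeqlvhhinsehihgerk"))
```

Note that a short one-letter input such as "ALA" is ambiguous under this rule and would be read as a single alanine.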
Other Sources of Protein Structure Prediction
123d.ncifcrf.gov/sarf2.html
Common SARFs in protein structures
SARF stands for Spatial ARrangement of backbone Fragments.
Small alpha helix 1aca
123d.ncifcrf.gov/run123D+.html
http://www.sbg.bio.ic.ac.uk/~3dpssm/
A Fast, Web-based Method for Protein Fold Recognition
using 1D and 3D Sequence Profiles coupled with Secondary
Structure and Solvation Potential Information.
UCSC HMM Applications
2GLIA. Chain A, Five-Finger [gi:2392684]
TITLE Crystal structure of a five-finger GLI-DNA
complex: new perspectives on zinc fingers
SOURCE
Homo sapiens (human)
See graphics (Secondary structure of 2gli) and 2GLIx500
Amino acid sequence:
>vyetdcrwdgcsqefdsqeqlvhhinsehihgerkefvchwggcsrelrpfk
aqymlvvhmrrhtgekphkctfegcrksysrlenlkthlrshtgekpymcehe
gcskafsnasdrakhqnrthsnekpyvcklpgctkrytdpsslrkhvktvhgpda
Testing The UCSC HMM Application
Using a known protein
five-finger GLI on DNA
LIBRA I
Structure Prediction by Threading:
Forward Folding Protocol
www.ddbj.nig.ac.jp/E-mail/libra/LIBRA_I.html
Structures compatible with a target sequence are sought in
a structural library chosen from the Protein Data Bank (PDB).
The target sequence and each 3D profile are aligned by simple
dynamic programming. According to the alignment, the sequence
is re-mounted on the structure and its fitness is evaluated
with a pseudo-energy potential.
Scores are sorted from the best match down and shown together
with their alignments.
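The forward-folding protocol (align the sequence to each 3D profile by dynamic programming, then rank the templates by score) can be sketched as follows. This is an illustrative sketch only: the profile layout (one dictionary of residue scores per structural position), the gap and mismatch values, the template names and the function names are all assumptions, and LIBRA I's actual profiles and pseudo-energy potential are not reproduced here.

```python
def align_to_profile(sequence, profile, gap=-1.0, mismatch=-1.0):
    """Global (Needleman-Wunsch style) alignment score of a sequence
    against a 3D profile, using a linear gap penalty.

    profile: list with one dict per structural position, mapping a
    one-letter residue code to its fitness score at that position.
    Residues missing from a position's dict receive `mismatch`.
    """
    n, m = len(sequence), len(profile)
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            fit = profile[j - 1].get(sequence[i - 1], mismatch)
            score[i][j] = max(
                score[i - 1][j - 1] + fit,  # residue i placed at position j
                score[i - 1][j] + gap,      # residue i left unaligned
                score[i][j - 1] + gap,      # position j left empty
            )
    return score[n][m]


def rank_templates(sequence, library):
    """Score the sequence against every template; best match first."""
    scored = [(align_to_profile(sequence, profile), name)
              for name, profile in library.items()]
    return sorted(scored, reverse=True)


if __name__ == "__main__":
    # Toy library with made-up scores, for illustration only.
    library = {
        "template_A": [{"A": 2.0}, {"L": 2.0}, {"K": 2.0}],
        "template_B": [{"G": 2.0}, {"P": 2.0}, {"G": 2.0}],
    }
    for score, name in rank_templates("ALK", library):
        print(name, score)
```

The sketch returns scores only; recovering the alignment itself would require a traceback through the score matrix.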
LIBRA I
Sequence Homology Search
by Threading: Inverse Folding
Sequences compatible with a target structure are sought in
the sequence database (Swiss-Prot). Scores are sorted from
the best match down and shown together with their alignments.
A recent study showed that, for this search, the 3D-1D
alignment score itself is suitable as the compatibility
score, in place of the sequence re-mounting score.
The problem of predicting protein structure from sequence
remains fundamentally unsolved despite more than three
decades of intensive research effort.
The search has been driven by the belief that the 3D structure
of a protein is determined by its amino acid sequence
(Anfinsen, 1973). While it is now known that chaperones
often play a role in the folding pathway, and in correcting
misfolds (Corrales and Fersht, 1996; Hartl et al., 1994),
it is believed that the final structure is at the free-energy
minimum. Thus, all information needed to predict the
native structure of a protein is contained in the amino
acid sequence, plus a knowledge of its native solution
environment.
Ab initio prediction of protein structure from sequence: not yet
Given only the amino acid sequence, it should be possible in
principle to directly predict protein structure from physicochemical principles using, for example, molecular dynamics
methods (Levitt and Warshel, 1975). In practice, however,
such approaches are frustrated by the enormous complexity
of the calculation (requiring many orders of magnitude more
computing time than is currently feasible) and by inaccuracies
in the experimental determination of basic parameters
(van Gunsteren, 1993; Shortle et al., 1996). Thus, the most
successful structure prediction tools are knowledge-based,
using a combination of statistical theory and empirical rules.
Odyssey of evolution teaches us structure prediction
It appears that for most proteins, almost all residues can be
changed without affecting the structure (Rost et al., 1996b);
however, a single, randomly chosen mutation is more likely
to destabilize than to maintain a particular structure. Thus,
the precise pattern of amino acid exchanges observed in a
multiple sequence alignment of a protein family is highly
indicative of the particular structure. These patterns constitute
a fossil record of mutations preserving protein structure and
function. The importance of such evolutionary information for
structure prediction was realized very early and has long been
exploited in exceptional cases by experts, as well as in
automatic and systematic ways. More recently, the use of
evolutionary information has grown in importance. This was
made particularly clear when it was shown that the accuracy
of secondary structure prediction improved to over 70%
through the use of evolutionary information.
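One simple way to summarise the exchange patterns in a multiple sequence alignment is a per-column variability measure. The sketch below computes the Shannon entropy of each column; it is an illustration under stated assumptions, not the method used in the work quoted above. The function name is hypothetical, the sequences are assumed to be pre-aligned to equal length, and gaps (written "-") are ignored.

```python
import math
from collections import Counter


def column_entropy(alignment):
    """Shannon entropy (in bits) of each column of an alignment.

    Low entropy marks a conserved position; high entropy marks a
    position that tolerates many exchanges. Gaps ("-") are ignored.
    """
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("need a non-empty alignment of equal-length sequences")
    entropies = []
    for column in zip(*alignment):
        residues = [r for r in column if r != "-"]
        total = len(residues)
        if total == 0:
            entropies.append(0.0)  # all-gap column: no information
            continue
        counts = Counter(residues)
        entropy = -sum((c / total) * math.log2(c / total)
                       for c in counts.values())
        entropies.append(entropy)
    return entropies


if __name__ == "__main__":
    # Toy alignment, for illustration only.
    alignment = ["ACD", "ACE", "AGD", "ACE"]
    for position, h in enumerate(column_entropy(alignment), start=1):
        print(position, round(h, 3))
```

Profiles of this kind, rather than a single sequence, are the sort of evolutionary information the passage credits with the gain in secondary structure prediction accuracy.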
Genetic programming for protein structure prediction
S. Sun, Reduced representation model of protein structure
prediction: statistical potential and genetic algorithms,
Protein Science, vol 2, no 5, pp. 762-785, 1993.
Lamont, Gary B., Charles Kaiser, George Gates, Laurence
Merkle, and Ruth Pachter, Real-Valued Genetic Algorithm
Case Studies in Protein Structure Prediction, Proceedings
of the SIAM Conference on Parallel Applications, March 1997.
Natalio Krasnogor, Daniel H. Marcos, David Pelta, and Walter A.
Risi, Protein Structure Prediction as a Complex Adaptive System,
Frontiers in Evolutionary Algorithms (FEA98), 1998.
Natalio Krasnogor, Bill Hart, Jim Smith, and David Pelta,
Protein Structure Prediction With Evolutionary Algorithms,
Proceedings of the 1999 International Genetic and Evolutionary
Computation Conference (GECCO99).
Some source pages
http://www.sbc.su.se/~maccallr/
http://scpd.stanford.edu/SOL/courses/proEd/RACMB/vidList.htm