EXPLORING PROTEIN STRUCTURE

Protein Structure Databases

Databases of the three-dimensional structures of proteins whose structure has been solved using X-ray crystallography or nuclear magnetic resonance (NMR) techniques.

Protein databases:
PDB (Protein Data Bank): 3D structures
Swiss-Prot: annotated protein sequences
PIR (Protein Information Resource): protein sequences
SCOP (Structural Classification of Proteins): classification of structures
Fibrous proteins have a structural role
• Collagen is the most abundant protein in vertebrates. Collagen fibers are a major component of tendons, bone and skin. Three collagen chains (called α chains, although they are not α-helices) wind around one another to form a triple helix, which makes collagen tough yet flexible.
• Fibroin fibers make the silk spun by spiders and silkworms stronger, weight for weight, than steel! Silk's soft and flexible properties come from its beta-sheet structure.
• Keratin is a tough, insoluble protein that makes up echidna quills, your hair and nails, and the rattle of a rattlesnake. Its structure comes from alpha helices that are cross-linked by disulfide bonds.
Source: http://www.prideofindia.net/images/nails.jpg
http://opbs.okstate.edu/~petracek/2002%20protein%20structure%20function/CH06/Fig%2006-12.GIF
http://my.webmd.com/hw/health_guide_atoz/zm2662.asp?printing=true
The globular proteins
The globular proteins have a number of biologically important roles. They
include:
Cell motility – proteins link together to form filaments which make movement
possible.
Organic catalysts in biochemical reactions – enzymes
Regulatory proteins – hormones, transcription factors
Membrane proteins – MHC markers, protein channels, gap junctions
Defense against pathogens – poisons/toxins, antibodies, complement
Transport and storage – hemoglobin and myoglobin
Proteins for cell motility
Above: Myosin (red) and actin filaments (green) in coordinated muscle contraction.
Right: Actin bound to its binding site on myosin (the groove in the red part of the myosin protein).
Energy from ATP hydrolysis makes the myosin head change shape and move, pulling the actin filament with it.
Source: http://www.ebsa.org/npbsn41/maf_home.html
http://sun0.mpimf-
Proteins in the Cell Cytoskeleton
Eukaryotic cells have a cytoskeleton that includes straight, hollow cylinders called microtubules (bottom left).
(Figure label: tubulin forms helical filaments.)
Microtubules help cells maintain their shape, act like conveyor belts moving organelles around the cytoplasm, and form the spindle fibres used in cell division.
Microtubules are composed of filaments of the protein tubulin (top left). Microtubules change length by adding or removing tubulin subunits at their ends, which allows them to grow and shrink.
Thirteen of these filaments attach side to side, a little like the slats in a barrel, to form a microtubule. This barrel-shaped structure gives the microtubule its strength.
Source:
heidelberg.mpg.de/shared/docs/staff/user/0001/24.php3?department=01&LANG=en
http://www.fz-juelich.de/ibi/ibi-1/Cellular_signaling/
http://cpmcnet.columbia.edu/dept/gsas/anatomy/Faculty/Gundersen/main.html
Proteins speed up reactions – enzymes
Catalase speeds up the breakdown of hydrogen peroxide (H2O2), a toxic by-product of metabolic reactions, into the harmless substances water and oxygen:
$$2\,\mathrm{H_2O_2} \rightarrow 2\,\mathrm{H_2O} + \mathrm{O_2}$$
The reaction is extremely rapid because the enzyme lowers the energy needed to kick-start the reaction (the activation energy).
(Figure: energy vs. progress of reaction, from substrate to product. Activation energy without a catalyst: 71 kJ/mol; with catalase: 8 kJ/mol.)
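To see why lowering the activation energy from 71 to 8 kJ/mol matters so much, the Arrhenius equation, k = A·exp(-Ea/RT), gives the ratio of the catalysed to the uncatalysed rate constant. This sketch assumes the pre-exponential factor A is the same in both cases and a temperature of 298 K; both are simplifying assumptions, not values from the slide.

```python
import math

R = 8.314  # gas constant, J/(mol*K)

def rate_enhancement(ea_uncat_kj: float, ea_cat_kj: float, temp_k: float = 298.0) -> float:
    """Ratio k_cat / k_uncat from the Arrhenius equation, assuming equal pre-exponential factors."""
    delta_j = (ea_uncat_kj - ea_cat_kj) * 1000.0  # convert kJ/mol to J/mol
    return math.exp(delta_j / (R * temp_k))

ratio = rate_enhancement(71.0, 8.0)
print(f"Catalase speeds the reaction up by a factor of about {ratio:.1e}")  # about 1.1e+11
```

Under these assumptions the reaction runs roughly 10^11 times faster, which is why the slide calls it "extremely rapid".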
Proteins can regulate metabolism – hormones
When your body detects an increase in blood sugar after a meal, the hormone insulin is released from beta cells in the pancreas.
Insulin binds to receptors on cell membranes, and this triggers the cells to absorb glucose, either for immediate use or for storage as glycogen (for example, in the liver).
Proteins span membranes – protein channels
The CFTR membrane protein is an ion channel that regulates the flow of chloride ions.
In people with cystic fibrosis, not enough functional CFTR protein is inserted into cell membranes. As a result, secretions are poorly hydrated and become thick, and the lungs and secretory ducts become blocked.
Source: http://www.biology.arizona.edu/biochemistry/tutorials/chemistry/page2.html
http://www.cbp.pitt.edu/bradbury/projects.htm
Proteins defend us against pathogens – antibodies
Left: Antibodies such as IgG, found in humans, recognise and bind to specific regions (epitopes) of molecules found on foreign invaders.
Right: The binding site of an antibody protein (left) interacting with the epitope of a foreign antigen (green).
Source: http://www.biology.arizona.edu/immunology/tutorials/antibody/FR.html
http://tutor.lscf.ucsb.edu/instdev/sears/immunology/info/sears-ab.htm
http://www.spilya.com/research/
http://www.umass.edu/microbio/chime/
Making Proteins
How is such a diverse range of proteins possible? The code for making a protein is found in your genes (in your DNA). This genetic code is copied into a messenger RNA (mRNA) molecule. Ribosomes read the mRNA code in groups of three bases (codons) and join amino acids together to form a polypeptide. This process is known as gene expression.
Source: http://genetics.nbii.gov/Basic1.html
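Reading mRNA three bases at a time can be sketched in Python with the standard genetic code. The function name `translate`, starting at the first base and stopping at a stop codon or ignoring a trailing incomplete codon are assumptions for illustration; the test sequence is the mRNA shown in the gene-expression figure.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code: one-letter amino acids for codons in UUU, UUC, UUA, UUG, UCU, ... GGG order; "*" = stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def translate(mrna: str) -> str:
    """Translate mRNA codon by codon from the first base until a stop codon or the end of the sequence."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):  # step through complete codons only
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)

print(translate("AUGAGUAAAGGAGAAGAACUUUUCACUGGAUA"))  # MSKGEELFTG
```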
Gene Expression
(Figure: cell → nucleus → chromosome → gene → DNA.)
1) The order of bases in DNA is a code for making proteins. The code is read in groups of three.
2) Cell machinery copies the code, making an mRNA molecule. This moves into the cytoplasm.
3) Ribosomes read the code (for example, AUGAGUAAAGGAGAAGAACUUUUCACUGGAUA) and accurately join amino acids together to make a protein.
4) The protein folds to form its working shape.
The building blocks
The amino acids for making new proteins come from the proteins that you eat and digest. Every time you eat a burger (veggie or beef), you break its proteins down into single amino acids, ready for use in building new proteins. And yes, the job of digesting proteins is done by proteins: enzymes known as proteases.
There are only 20 different amino acids, but they can be joined together in many different combinations to form the diverse range of proteins that exist on this planet.
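The scale of "many different combinations" is easy to check: with 20 choices at each position, a chain of n amino acids has 20^n possible sequences. This minimal sketch only evaluates that count (the function name is illustrative); it ignores chemistry and whether a sequence would actually fold.

```python
def possible_sequences(length: int, alphabet_size: int = 20) -> int:
    """Number of distinct sequences of the given length (Python ints have arbitrary precision)."""
    return alphabet_size ** length

for n in (2, 3, 100):
    count = possible_sequences(n)
    print(f"{n:>3} residues: {count:.3e} possible sequences")
```

Even a modest 100-residue protein has about 1.3 × 10^130 possible sequences.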
Amino Acids
An amino acid is a relatively small molecule with characteristic groups of
atoms that determine its chemical behaviour.
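The structural formula of an amino acid is shown at the end of the animation below: a central carbon atom bonded to an amino group (H2N-), an acid (carboxyl) group (-COOH), a hydrogen atom and an R group. The R group is the only part that differs between the 20 amino acids.
(Figure: general structure H2N-CH(R)-COOH, with example amino acids phenylalanine, cysteine, alanine, glycine and valine.)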
The 20 Amino Acids
The amino acids each have their own shape and charge
due to their specific R group.
View the molecular shapes of the amino acids at the link below:
http://sosnick.uchicago.edu/amino_acids.html
Would the shape of a protein be affected if the wrong amino acid were added to a growing protein chain?
Making a Polypeptide
(Figure: amino acids joined one at a time by peptide bonds as the polypeptide grows.)
Polypeptide production is a condensation reaction: each peptide bond forms between the acid (carboxyl) group of one amino acid and the amino group of the next, releasing one molecule of water.
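Because each peptide bond releases one water molecule, a chain of n amino acids loses n - 1 waters. A minimal sketch of that bookkeeping, assuming approximate average molecular masses (g/mol) for a handful of free amino acids; the `AA_MASS` table is deliberately incomplete and `peptide_mass` is a hypothetical helper name.

```python
# Approximate average molecular masses (g/mol) of free amino acids, by one-letter code (subset only).
AA_MASS = {"G": 75.07, "A": 89.09, "V": 117.15, "C": 121.16, "F": 165.19}
WATER = 18.02  # mass of the water molecule released by each condensation

def peptide_mass(sequence: str) -> float:
    """Mass of a linear peptide: sum of amino acid masses minus one water per peptide bond."""
    if not sequence:
        raise ValueError("sequence must contain at least one amino acid")
    peptide_bonds = len(sequence) - 1
    return sum(AA_MASS[aa] for aa in sequence) - peptide_bonds * WATER

print(round(peptide_mass("GG"), 2))  # 132.12 (glycylglycine)
```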
Protein structure
Why Investigate Protein Structure?
Proteins are complex molecules whose
structure can be discussed in terms of:
primary structure
secondary structure
tertiary structure
quaternary structure
The structure of a protein is important because its shape allows it to perform its particular role or function.
Four levels of protein structure
Protein Primary Structure
The primary structure is the sequence of amino acids linked together by peptide bonds. This linear chain is called a polypeptide.
http://www.mywiseowl.com/articles/Image:Protein-primary-structure.png
Protein Secondary Structure
The secondary structure of proteins consists of:
alpha helices
beta sheets
random coils (loops), which often form the binding and active sites of proteins
Source: http://www.rothamsted.bbsrc.ac.uk/notebook/courses/guide/prot.htm#I
Protein Tertiary Structure
Involves the way the random coils, alpha helices and beta sheets fold with respect to each other.
This shape is held in place by bonds such as:
• weak hydrogen bonds between amino acids that lie close to each other,
• strong ionic bonds between R groups with positive and negative charges, and
• disulfide bridges (strong covalent S-S bonds).
Amino acids that were distant in the primary structure may become very close to each other after folding has taken place.
Source: io.uwinnipeg.ca/~simmons/cm1503/proteins.htm
The subunit of a more complex protein has
now been formed. It may be globular or
fibrous. It now has its functional shape or
conformation.
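Residues that are far apart in sequence but close in space are exactly what a contact map records. This sketch lists residue pairs whose representative atoms (for example Cα) lie within a distance cutoff; the 8 Å cutoff, the minimum sequence separation of 3 and the toy coordinates are illustrative assumptions, not data from a real protein.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]

def long_range_contacts(coords: Sequence[Point], cutoff: float = 8.0,
                        min_separation: int = 3) -> List[Tuple[int, int]]:
    """Pairs (i, j) at least min_separation apart in sequence but within cutoff (angstroms) in space."""
    contacts = []
    for i in range(len(coords)):
        for j in range(i + min_separation, len(coords)):
            if math.dist(coords[i], coords[j]) <= cutoff:
                contacts.append((i, j))
    return contacts

# Hypothetical 6-residue chain (angstroms) that folds back on itself like a hairpin.
toy_chain = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0),
             (11.4, 0.0, 0.0), (7.6, 3.8, 0.0), (3.8, 4.0, 0.0)]
print(long_range_contacts(toy_chain))  # [(0, 5), (1, 4), (1, 5), (2, 5)]
```

Residues 0 and 5 are five positions apart in the chain, yet only about 5.5 Å apart once it has folded.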
Protein Quaternary Structure
This is the packing of the protein subunits to form the final protein complex. For example, the human hemoglobin molecule is a tetramer made up of two alpha and two beta polypeptide chains (right).
Source: www.ibri.org/Books/Pun_Evolution/Chapter2/2.6.htm
Proteins may also associate with non-protein groups. For example, carbohydrates can be added to form a glycoprotein.
Source: www.cem.msu.edu/~parrill/movies/neuram.GIF
Protein Structure Prediction
Why?
Types of protein structure prediction:
• Secondary structure prediction
• Homology modelling
• Fold recognition
• Ab initio
Secondary structure prediction
• Why
• History
• Performance
• Usefulness
Why do we need structure prediction?
• 3D structure gives clues to function: active sites, binding sites, conformational changes...
• Structure and function are more conserved than sequence
• 3D structure determination is difficult, slow and expensive
• Intellectual challenge, Nobel prizes, etc.
• Engineering new proteins
The Use of Structure
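It's not that simple...
• The amino acid sequence contains all the information needed for the 3D structure (experiments of Anfinsen, 1960s)
• But there are thousands of atoms, rotatable bonds, solvent and other molecules to deal with...
• Levinthal's paradox: a protein could never find its native structure by randomly trying every possible conformation, yet real proteins fold quickly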
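A back-of-the-envelope version of Levinthal's argument: suppose each residue could adopt only 3 backbone conformations and the chain sampled 10^13 conformations per second. These numbers, and the 100-residue length, are the usual illustrative assumptions rather than measurements.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600

def levinthal_search_years(n_residues: int, states_per_residue: int = 3,
                           samples_per_second: float = 1e13) -> float:
    """Years needed to visit every conformation once, under the stated toy assumptions."""
    conformations = states_per_residue ** n_residues
    return conformations / samples_per_second / SECONDS_PER_YEAR

print(f"{levinthal_search_years(100):.1e} years")  # 1.6e+27 years
```

That is vastly longer than the age of the universe (about 1.4 × 10^10 years), so folding cannot be an exhaustive random search.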
Structure prediction
Summary of the four main approaches to structure prediction. Note that there are overlaps between nearly all categories.

Method | Knowledge | Approach | Difficulty | Usefulness
Comparative modelling (homology modelling) | Proteins of known structure | Identify related structure with sequence methods, copy 3D coordinates and modify where necessary | Relatively easy | Very, if sequence identity …; drug design
Fold recognition | Proteins of known structure | Same as above, but use more sophisticated methods to find related structure | Medium | Limited due to poor models
Secondary structure prediction | Sequence-structure statistics | Forget the 3D arrangement and predict where the helices/strands are | Medium | Can improve alignments, fold recognition, ab initio
Ab initio tertiary structure prediction | Energy functions, statistics | Simulate folding, or generate lots of structures and try to pick the correct one | Very hard | Not really
Secondary structure – helix
Secondary structure – sheet
Secondary structure – turns
Secondary Structure Predictions
Some highlights in performance:
1974 Chou and Fasman: 50%
1978 Garnier: 62%
1993 PhD: 72%
2000 PsiPred: 76%
Secondary structure prediction: 1st generation methods
Class | Helix | Strand
Strong former | E A L | M V I
Former | H M Q W V F | C Y F Q L T W
Weak former | K I | A
Indifferent | D T S R C | R G D
Breaker | N Y | K S H N P
Strong breaker | P G | E

Chou and Fasman
1) Assign all residues the appropriate set of parameters.
2) Scan through the peptide and identify helical regions.
3) Repeat this procedure to locate all of the helical regions in the sequence.
4) Scan through the peptide and identify sheet regions.
5) Resolve conflicts between helical and sheet assignments.
6) Identify turns.
Accuracy of around 70-80% was claimed, but the actual accuracy is about 50-60%.
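Step 2 (finding helical regions) can be sketched using the Chou-Fasman helix classes listed above. This covers the nucleation test only, under assumed thresholds modelled on the usual rule: a window of six residues, at least four helix formers (weak formers counting half) and at most one breaker. The real method also uses numeric propensities, extends each nucleus, and handles sheets and turns. `helix_nuclei` is a hypothetical name.

```python
from typing import Dict, List

# Assumed weights: strong formers and formers count 1, weak formers 0.5, everything else 0.
HELIX_FORMER_WEIGHT: Dict[str, float] = {
    **{aa: 1.0 for aa in "EAL"},     # strong formers
    **{aa: 1.0 for aa in "HMQWVF"},  # formers
    **{aa: 0.5 for aa in "KI"},      # weak formers
}
HELIX_BREAKERS = set("NYPG")         # breakers and strong breakers

def helix_nuclei(seq: str, window: int = 6, min_formers: float = 4.0,
                 max_breakers: int = 1) -> List[int]:
    """Start positions of windows that pass the simplified helix nucleation test."""
    starts = []
    for i in range(len(seq) - window + 1):
        segment = seq[i:i + window]
        formers = sum(HELIX_FORMER_WEIGHT.get(aa, 0.0) for aa in segment)
        breakers = sum(1 for aa in segment if aa in HELIX_BREAKERS)
        if formers >= min_formers and breakers <= max_breakers:
            starts.append(i)
    return starts

print(helix_nuclei("GPGEALEKGPG"))  # [2, 3]
```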
GOR III
Garnier, Osguthorpe, Robson, 1990
• Secondary structure depends on amino acid propensities, as in Chou-Fasman
• It is also influenced by neighboring residues (helix capping, turns, etc.)
• How can more distant information be included?
• Performance approximately 67%
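With a window of 17 residues (the residue being predicted plus 8 on each side), the helix propensity table has 20x17 entries: one value per amino acid per window position.
Assign each residue the state with the highest propensity.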
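The GOR idea of summing values over a 17-residue window and picking the highest-scoring state can be sketched as follows. The table layout, the tie-break toward coil and every number in `TOY_TABLE` are hypothetical illustrations, not the published GOR parameters.

```python
from typing import Dict

STATES = ("C", "H", "E")  # coil listed first so that ties default to coil
HALF_WINDOW = 8           # positions i-8 .. i+8: a 17-residue window

# table[state][offset][residue] -> information value; missing entries count as 0.
Table = Dict[str, Dict[int, Dict[str, float]]]

def gor_predict(seq: str, table: Table) -> str:
    """Assign each residue the state whose summed window values are highest (GOR-style sketch)."""
    prediction = []
    for i in range(len(seq)):
        scores = {}
        for state in STATES:
            total = 0.0
            for offset in range(-HALF_WINDOW, HALF_WINDOW + 1):
                j = i + offset
                if 0 <= j < len(seq):
                    total += table.get(state, {}).get(offset, {}).get(seq[j], 0.0)
            scores[state] = total
        prediction.append(max(STATES, key=scores.__getitem__))
    return "".join(prediction)

# Hypothetical toy values, NOT real GOR parameters.
TOY_TABLE: Table = {
    "H": {0: {"A": 1.0, "E": 1.0, "L": 1.0}, 1: {"L": 0.6}},
    "E": {0: {"V": 1.0, "I": 1.0}},
    "C": {0: {"G": 1.0, "P": 1.0}},
}
print(gor_predict("AVGK", TOY_TABLE))  # HECC
```

The offset-1 entry shows how a neighbour contributes: in "KL", residue K is pushed toward helix by the L that follows it.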
Status of predictions in 1990
• Predicted secondary structure segments are too short
• About 65% accuracy
• Worse for beta-strands
Example:
Secondary structure prediction: 2nd generation methods
• Sequence-to-structure relationship modelled using more complex statistics, e.g. artificial neural networks (NNs) or hidden Markov models (HMMs)
• Evolutionary information included (profiles)
• Prediction accuracy >70% (PhD, Rost 1993)
PhD predictions
Secondary structure "prediction" by homology
If a sequence of unknown secondary structure has a homologue of known structure, it is more accurate to align the two sequences and copy the known secondary structure over to the unknown sequence than to do "ab initio" secondary structure prediction.
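Copying secondary structure through an alignment can be sketched as below. The inputs are assumed to be the two gapped rows ('-' for gaps) of an existing pairwise alignment plus one state per ungapped template residue; marking query residues opposite a gap with '?' is a design choice for this sketch, since there is nothing to copy there.

```python
def transfer_secondary_structure(query_aln: str, template_aln: str, template_ss: str) -> str:
    """Copy per-residue secondary structure from an aligned template onto the query sequence."""
    if len(query_aln) != len(template_aln):
        raise ValueError("aligned rows must have equal length")
    result = []
    t = 0  # index into template_ss (ungapped template positions)
    for q, tc in zip(query_aln, template_aln):
        if tc == "-":
            state = "?"  # query residue in an insertion: no template residue to copy from
        else:
            state = template_ss[t]
            t += 1
        if q != "-":
            result.append(state)
    return "".join(result)

print(transfer_secondary_structure("AC-DEF", "ACGD-F", "HHHCE"))  # HHC?E
```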
3rd generation methods
• Enhanced evolutionary sequence information (PSI-BLAST profiles) and larger sequence databases take Q3 (the percentage of residues correctly assigned to one of three states: helix, strand or coil) to >75%
• PHD and PSIPRED are the best-known methods
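Q3 itself is simple to compute once predicted and observed states are written as strings over H, E and C. This sketch assumes the observed structure has already been reduced to those three states; how that reduction is done is not covered here.

```python
def q3(predicted: str, observed: str) -> float:
    """Percentage of residues whose predicted state (H, E or C) matches the observed state."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("need two non-empty strings of equal length")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)

print(round(q3("HHHEECC", "HHHECCC"), 1))  # 85.7
```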
PSIPRED
• Similar to PhD
• Uses PSI-BLAST to detect more remote homologs
• Only two layers
• SVM or NN gives similar performance
Alignment of Protein Structure
• Compare the 3D structure of one protein against the 3D structure of a second protein
• Compare positions of atoms in three-dimensional structures
• Look for positions of secondary structural elements (helices and strands) within a protein domain
• Examine distances between carbon atoms to determine the degree to which the structures may be superimposed
• Side chain information can be incorporated (e.g. whether residues are buried or exposed)
• Structural similarity between proteins does not necessarily mean an evolutionary relationship
Structure alignment
Simple case: two closely related proteins with the same number of amino acids.
Find a transformation T to achieve the best superposition.
Types of Structure Comparison
• Sequence-dependent vs. sequence-independent structural alignment
• Global vs. local structural alignment
• Pairwise vs. multiple structural alignment
Sequence-dependent Structure
Comparison
1234567
ASCRKLE
¦¦¦¦¦¦¦
ASCRKLE
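(Figure: the two structures, with residues numbered 1-7.)
Minimize the RMSD of the distances 1-1, ..., 7-7:
$$\mathrm{rmsd} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\lVert x(i) - y(i)\rVert^{2}}$$
where x(i) and y(i) are the 3D positions of residue i in the two structures and N is the number of residues.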
Sequence-dependent Structure Comparison
• With the residue correspondence fixed, the optimal superposition can be computed in O(n) time.
• Useful for comparing structures of the same protein solved by different methods, in different conformations, or through dynamics.
• Evaluating protein structure predictions.
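Minimizing the RMSD over rotations and translations for a fixed correspondence has a closed-form solution; one standard choice is the Kabsch algorithm, which the source does not name, so treat this NumPy sketch as one possible implementation. It assumes `x` and `y` are N×3 arrays of corresponding atom positions (for example Cα coordinates); its cost is linear in N apart from a fixed-size 3×3 SVD.

```python
import numpy as np

def kabsch_rmsd(x: np.ndarray, y: np.ndarray) -> float:
    """RMSD between corresponding points x[i], y[i] (shape (N, 3)) after optimal rigid superposition."""
    if x.shape != y.shape or x.ndim != 2 or x.shape[1] != 3:
        raise ValueError("x and y must both have shape (N, 3)")
    # Remove translation by centring both point sets on their centroids.
    xc = x - x.mean(axis=0)
    yc = y - y.mean(axis=0)
    # SVD of the 3x3 covariance matrix gives the optimal rotation.
    u, _s, vt = np.linalg.svd(xc.T @ yc)
    # Flip the last axis if needed so the result is a proper rotation (det = +1), not a reflection.
    d = 1.0 if np.linalg.det(vt.T @ u.T) >= 0 else -1.0
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    diff = xc @ rotation.T - yc
    return float(np.sqrt((diff ** 2).sum() / len(x)))

# Usage: a rotated and translated copy of a structure should superimpose perfectly.
x = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 3.0]])
c, s = np.cos(np.pi / 6), np.sin(np.pi / 6)
rot_z = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
y = x @ rot_z.T + np.array([5.0, -2.0, 1.0])
print(f"{kabsch_rmsd(x, y):.3f}")  # 0.000
```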
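Sequence-independent Structure Comparison
Given two configurations of points in three-dimensional space, find the transformation T that produces the “largest” superimposition of corresponding 3D points, i.e. the greatest number of point pairs that lie close together after transformation.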
Evaluating Structural Alignments
1. Number of amino acid correspondences created
2. RMSD of corresponding amino acids
3. Percent identity in aligned residues
4. Number of gaps introduced
5. Size of the two proteins
6. Conservation of known active site environments
7. …
There are no universally agreed-upon criteria; the choice depends on what you are using the alignment for.
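Criteria 1, 3 and 4 can be read directly off a structural alignment written as two gapped rows; criterion 2 needs the coordinates as well. Counting gap openings (runs of '-') rather than individual gap positions is an assumption of this sketch, and both function names are hypothetical.

```python
from typing import Tuple

def gap_openings(row: str) -> int:
    """Count runs of consecutive '-' characters in one alignment row."""
    return sum(1 for i, c in enumerate(row) if c == "-" and (i == 0 or row[i - 1] != "-"))

def alignment_summary(a: str, b: str) -> Tuple[int, float, int]:
    """(correspondences, percent identity over aligned pairs, gap openings) for a pairwise alignment."""
    if len(a) != len(b):
        raise ValueError("alignment rows must have equal length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    identity = 100.0 * sum(x == y for x, y in pairs) / len(pairs) if pairs else 0.0
    return len(pairs), identity, gap_openings(a) + gap_openings(b)

print(alignment_summary("ASCRKLE", "ASCRKLE"))  # (7, 100.0, 0)
```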