Protein structure Anne Mølgaard, Center for Biological Sequence Analysis

“Could the search for ultimate truth really have revealed so hideous and visceral-looking an object?”
Max Perutz, 1964, on protein structure
[Figure: John Kendrew, 1959, with myoglobin model]
Holdings of the Protein Data Bank (PDB):
            X-ray    NMR   theoretical   total
Sep. 2001   13116   2451           338   15905
Feb. 2005   25350   4383             0   29733
Methods for structure determination
X-ray crystallography
Nuclear Magnetic Resonance (NMR)
Modeling techniques
Modeling
Only applicable to ~50% of sequences
Fast
Accuracy is poor at low sequence identity
• There is still a need for experimental structure determination!
Structural Genomics Consortium (SGC)
• The SGC deposited its 275th structure into the Protein Data Bank in August 2006
• currently operating at a pace of 170 structures per year
• at a cost of USD 125,000 per structure
• Scientific highlights include:
  – several (> 1!!) novel structures of protein kinases
  – completing the structural descriptions of the human 14-3-3, adenylate kinase and cytosolic sulfotransferase protein families
  – human chromatin modifying enzymes; human inositol phosphate signaling
  – a significant number of structures from human parasites
Amino acids
http://www.ch.cam.ac.uk/magnus/molecules/amino/
Amino acids
Livingstone & Barton, CABIOS, 9, 745-756, 1993
A – Ala
C – Cys
D – Asp
E – Glu
F – Phe
G – Gly
H – His
I – Ile
K – Lys
L – Leu
M – Met
N – Asn
P – Pro
Q – Gln
R – Arg
S – Ser
T – Thr
V – Val
W – Trp
Y – Tyr
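The one-letter codes above map naturally onto a lookup table. A minimal Python sketch, assuming nothing beyond the twenty codes listed; the names `ONE_TO_THREE` and `to_three_letter` are made up for this illustration:

```python
# One-letter to three-letter amino acid codes, exactly as listed above.
ONE_TO_THREE = {
    "A": "Ala", "C": "Cys", "D": "Asp", "E": "Glu", "F": "Phe",
    "G": "Gly", "H": "His", "I": "Ile", "K": "Lys", "L": "Leu",
    "M": "Met", "N": "Asn", "P": "Pro", "Q": "Gln", "R": "Arg",
    "S": "Ser", "T": "Thr", "V": "Val", "W": "Trp", "Y": "Tyr",
}


def to_three_letter(sequence):
    """Translate a one-letter sequence into a list of three-letter codes.

    Raises KeyError for any character that is not one of the 20 codes.
    """
    return [ONE_TO_THREE[residue] for residue in sequence.upper()]


# The first six residues of the example sequence on the primary structure slide.
print("-".join(to_three_letter("MKTAAL")))  # Met-Lys-Thr-Ala-Ala-Leu
```

A plain dictionary is enough here because the alphabet is fixed; non-standard residues or ambiguity codes would need extra entries.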
Levels of protein structure
• Primary
• Secondary
• Tertiary
• Quaternary
Primary structure
MKTAALAPLFFLPSALATTVYLA
GDSTMAKNGGGSGTNGWGEYL
ASYLSATVVNDAVAGRSAR…(etc)
Ramachandran plot
left-handed α-helix
β-sheet
α-helix
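Each point in a Ramachandran plot is the pair of backbone dihedral angles (φ, ψ) of one residue. As a rough sketch of how one such angle can be computed from four atom positions, the Python below uses the standard atan2 formulation; the helper names and the toy coordinates are illustrative assumptions, not taken from the slides:

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees around the p1-p2 bond.

    For phi, pass C(i-1), N(i), CA(i), C(i);
    for psi, pass N(i), CA(i), C(i), N(i+1).
    """
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)  # normal of the plane through p0, p1, p2
    n2 = _cross(b2, b3)  # normal of the plane through p1, p2, p3
    x = _dot(n1, n2)
    y = _dot(_cross(n1, n2), b2) / math.sqrt(_dot(b2, b2))
    return math.degrees(math.atan2(y, x))


# Toy coordinates (not real atoms): a quarter turn around the z axis.
angle = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
print(round(angle, 1))  # about 90 degrees
```

Using atan2 rather than acos keeps the sign of the angle, which is what separates, for example, the right-handed and left-handed helix regions of the plot.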
Hydrophobic core
Hydrophobic side chains go into the core of
the molecule – but the main chain is highly
polar
The polar groups (C=O and NH) are
neutralized through formation of H-bonds
Secondary structure
α-helix: C=O(n)…HN(n+4)
β-sheet (anti-parallel)
… and all the rest
3₁₀ helices (C=O(n)…HN(n+3)), π-helices (C=O(n)…HN(n+5))
β-turns and loops (in old textbooks sometimes referred to as random coil)
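The three helix types differ only in how far along the chain the hydrogen-bond donor sits from the acceptor. A small Python sketch of that bookkeeping follows; the function names are hypothetical, and real assignment programs also check geometry and energy, which is not attempted here:

```python
# Offset between the C=O acceptor (residue n) and the N-H donor (residue n+k).
HELIX_BY_OFFSET = {3: "3-10 helix", 4: "alpha-helix", 5: "pi-helix"}


def helix_type(acceptor_residue, donor_residue):
    """Name the helix type implied by one C=O(n)...HN(n+k) hydrogen bond.

    Returns None when the offset is not 3, 4 or 5.
    """
    return HELIX_BY_OFFSET.get(donor_residue - acceptor_residue)


def helix_hbonds(first, last, offset=4):
    """List the (acceptor, donor) residue pairs in an ideal helix
    spanning residues first..last, for the given offset."""
    return [(n, n + offset) for n in range(first, last - offset + 1)]


print(helix_type(10, 14))   # alpha-helix
print(helix_hbonds(1, 8))   # [(1, 5), (2, 6), (3, 7), (4, 8)]
```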
The -helix has a dipole moment
+
N
C
-
Two types of -sheet:
anti-parallel
parallel
Tertiary structure (domains, modules)
Rhamnogalacturonan lyase (1nkg)
Rhamnogalacturonan acetylesterase (1k7c)
Quaternary structure
B.caldolyticus UPRTase (1i5e)
B.subtilis PRPP synthase (1dkr)
Classification schemes
SCOP
– Manual classification (A. Murzin)
CATH
– Semi manual classification (C. Orengo)
FSSP
– Automatic classification (L. Holm)
Levels in SCOP
Class                                 # Folds   # Superfamilies   # Families
All alpha proteins                        202               342          550
All beta proteins                         141               280          529
Alpha and beta proteins (a/b)             130               213          593
Alpha and beta proteins (a+b)             260               386          650
Multi-domain proteins                      40                40           55
Membrane and cell surface proteins         42                82           91
Small proteins                             72               104          162
Total                                     887              1447         2630
http://scop.berkeley.edu/count.html#scop-1.67
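The SCOP counts can be held in a small table and the totals row checked by summing each column. This Python sketch uses only the numbers quoted above; the variable and function names are illustrative:

```python
# (folds, superfamilies, families) per class, as quoted for SCOP 1.67.
SCOP_1_67 = {
    "All alpha proteins": (202, 342, 550),
    "All beta proteins": (141, 280, 529),
    "Alpha and beta proteins (a/b)": (130, 213, 593),
    "Alpha and beta proteins (a+b)": (260, 386, 650),
    "Multi-domain proteins": (40, 40, 55),
    "Membrane and cell surface proteins": (42, 82, 91),
    "Small proteins": (72, 104, 162),
}

REPORTED_TOTAL = (887, 1447, 2630)


def column_totals(counts):
    """Sum folds, superfamilies and families over all classes."""
    return tuple(sum(column) for column in zip(*counts.values()))


print(column_totals(SCOP_1_67))                    # (887, 1447, 2630)
print(column_totals(SCOP_1_67) == REPORTED_TOTAL)  # True
```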
Major classes in SCOP
Classes
– All alpha proteins
– Alpha and beta proteins (a/b)
– Alpha and beta proteins (a+b)
– Multi-domain proteins
– Membrane and cell surface proteins
– Small proteins
All α: Hemoglobin (1bab)
All β: Immunoglobulin (8fab)
α/β: Triose phosphate isomerase (1hti)
α+β: Lysozyme (1jsf)
Folds*
• Proteins which have >~50% of their secondary structure elements arranged in the same order in the protein chain and in three dimensions are classified as having the same fold
• No evolutionary relationship between the proteins is implied
*confusingly also called fold classes
Superfamilies
Proteins which are (remotely) evolutionarily related
– Sequence similarity low
– Share function
– Share special structural features
Relationships between members of a
superfamily may not be readily recognizable
from the sequence alone
Families
Proteins whose evolutionary relationship is readily recognizable from the sequence (>~25% sequence identity)
Families are further subdivided into Proteins
Proteins are divided into Species
– The same protein may be found in several species
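The ~25% figure refers to sequence identity between aligned sequences. A minimal Python sketch of that measure is given below, assuming two sequences that are already aligned and counting identity over the columns where neither has a gap (other conventions exist). The threshold constant simply restates the rule of thumb above; SCOP itself is curated manually, so this is not how SCOP assigns families, and the second example sequence is made up:

```python
FAMILY_IDENTITY_THRESHOLD = 25.0  # percent; the ~25% rule of thumb above


def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of identical residues over the gap-free aligned columns."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip columns where either sequence has a gap
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


def recognizable_from_sequence(identity):
    """True when identity exceeds the rule-of-thumb family threshold."""
    return identity > FAMILY_IDENTITY_THRESHOLD


# First ten residues of the example sequence vs. a hypothetical variant.
identity = percent_identity("MKTAALAPLF", "MKTGALSPLF")
print(identity)                              # 80.0
print(recognizable_from_sequence(identity))  # True
```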
Links
PDB (protein structure database)
– www.rcsb.org/pdb/
SCOP (protein classification database)
– scop.berkeley.edu
CATH (protein classification database)
– www.biochem.ucl.ac.uk/bsm/cath
FSSP (protein classification database)
– www.ebi.ac.uk/dali/fssp/fssp.html
Why are protein structures so interesting?
They provide a detailed picture of interesting
biological features, such as active site, substrate
specificity, allosteric regulation etc.
They aid in rational drug design and protein
engineering
They can elucidate evolutionary relationships
undetectable by sequence comparisons
Inferring biological features from the structure
[Figure: 1deo; labels: NH2, COOH, Asp, His, Ser, topological switchpoint]
Inferring biological features
from the structure
Active site
Triose phosphate isomerase (1ag1)
(Verlinde et al. (1991) Eur.J.Biochem. 198, 53)
Engineering thermostability in serpins
Overpacking
Buried polar groups
Cavities
Im, Ryu & Yu (2004) Engineering thermostability in serine protease inhibitors. PEDS, 17, 325-331.
Evolution...
Structure is conserved longer than
both sequence and function
Rhamnogalacturonan acetylesterase (A. aculeatus) (1k7c)
Platelet activating factor acetylhydrolase (Bos taurus) (1wab)
Serine esterase (S. scabies) (1esc)
Mølgaard, Kauppinen & Larsen (2000) Structure, 8, 373-383.
"We wish to suggest a structure for the salt of
deoxyribose nucleic acid (D.N.A.). This structure
has novel features which are of considerable
biological interest….
…It has not escaped our notice that the specific
pairing we have postulated immediately suggests a
possible copying mechanism for the genetic
material."
J.D. Watson & F.H.C. Crick (1953) Nature, 171, 737.