The Evolution of Protein Structure and Function as Studied through Structural Bioinformatics Philip E.

The Evolution of Protein Structure and
Function as Studied through Structural
Bioinformatics
Philip E. Bourne
Skaggs School of Pharmacy and Pharmaceutical
Sciences
University of California San Diego
[email protected]
1
Agenda
• What is structural bioinformatics and how do YOU
drive it?
• Prerequisites: the sequence-structure-function
relationship
• Some exciting developments
– Using protein structure to study evolution
– Functional prediction, pathway mapping and the RCSB PDB
response
• Unsolved problems
– Structure comparison
– Domain definition
• What more could be done to drive the field forward?
Personal Definition
• Improving our
understanding of living
systems through the
study of
macromolecular
structure en masse
2nd Edition J. Gu and P.E.
Bourne (Eds.) John Wiley and
Sons NJ
What is Structural Bioinformatics?
• Each structure is a data
point in an effort to gain
broader understanding
4
A Field Driven by Your Activity
Number of released entries
Depositions to the PDB by decade
Year:
What is Structural Bioinformatics?
5
[Figure: proportion of enzyme classes relative to total enzyme structures (percent). Classes shown: Ligases, Isomerases, Lyases, Hydrolases, Transferases, Oxidoreductases]
Lysozyme: Blake, Koenig, Mair, North, Phillips, Sarma (1965) Nature 206, 757
Ribonuclease: Kartha, Bello, Harker (1967) Nature 213, 862-865; Wyckoff, Hardman, Allewell, Inagami, Johnson, Richards (1967) J. Biol. Chem. 242, 3753-3757.
Enzymes
A Field Subject to Some Bias
Decade:
RNA-containing structures
[Figure: categories shown are Protein/RNA complexes, RNA only, DNA/RNA hybrid, Protein/DNA/RNA complexes]
tRNA: J.L. Sussman, S.-H. Kim (1976) Biochem Biophys Res Commun. 68:89-96; J.D. Robertus, J.E. Ladner, J.T. Finch, D. Rhodes, R.S. Brown, B.F.C. Clark, & A. Klug (1974) Nature 250: 546-551.
What is Structural Bioinformatics?
6
Decade:
A Field Subject to Some Bias
PDB vs Human Genome
EC – Hydrolases – Begins to Illustrate the Bias in the
PDB
PDB
2.5 Transferring alkyl or aryl groups
over represented in PDB
2.4 Glycosyltransferases
under represented in PDB
Ensembl
Human
Genome
Annotation
What is Structural Bioinformatics?
7
Xie and Bourne 2005 PLoS Comp. Biol. 1(3) e31
http://sg.rcsb.org
Agenda
• What is structural bioinformatics and how do YOU
drive it?
• Prerequisites: the sequence-structure-function
relationship
• Some exciting developments
– Using protein structure to study evolution
– Functional prediction, pathway mapping and the RCSB PDB
response
• Unsolved problems
– Structure comparison
– Domain definition
• What more could be done to drive the field forward?
8
Sequence vs Structure
Twilight Zone
Midnight Zone
The classic HSSP curve from Sander and Schneider (1991)
Proteins 9:56-68
The Sequence Structure Function Relationship
9
There Are No Absolute Rules - Similar Sequences
– Different Structures
1PIV:1
Viral Capsid Protein
1HMP:A
Glycosyltransferase
10
80 Residue Stretch (Yellow) with Over 40% Sequence Identity
The Sequence Structure Function Relationship
Structure vs Function Follows a
Power Law Distribution
• Some folds are
promiscuous and
adopt many different
functions - superfolds
Qian J, Luscombe NM, Gerstein M. JMB 2001 313(4):673-81
11
The Sequence Structure Function Relationship
Examples of Superfolds..
12
The Sequence Structure Function Relationship
Structure Is Highly Redundant
Structure Alignments using CE with z>4.0
The Russian Doll Effect
Homology
modeling
is used here
Pharm 201 Lecture 09, 2009
The Sequence Structure Function Relationship
13
I.N. Shindyalov and P.E. Bourne 2000
Proteins 38(3), 247-260
How Can we Utilize these Seemingly
Complex Relationships?
14
Agenda
• What is structural bioinformatics and how do YOU
drive it?
• Prerequisites: the sequence-structure-function
relationship
• Some exciting developments
– Using protein structure to study evolution
– Functional prediction, pathway mapping and the RCSB PDB
response
• Unsolved problems
– Structure comparison
– Domain definition
• What more could be done to drive the field forward?
15
Nature’s Reductionism
There are ~ 20^300 possible proteins
>>>> all the atoms in the Universe
9.5M protein sequences from
UniProt/TrEMBL (10/09)
38,221 protein structures
Yield 1195 folds, 1962 superfamilies,
3902 families (SCOP 1.75)
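The size comparison on this slide can be checked with a few lines of arithmetic. The sketch below assumes a 20-letter amino acid alphabet and a chain length of 300, as on the slide; the figure of roughly 10^80 atoms in the observable universe is a commonly quoted order-of-magnitude estimate and is an assumption here, not a number from the talk.

```python
import math

AMINO_ACIDS = 20               # standard amino acid alphabet
CHAIN_LENGTH = 300             # chain length used on the slide
ATOMS_IN_UNIVERSE_LOG10 = 80   # assumption: common order-of-magnitude estimate


def log10_sequence_space(alphabet: int, length: int) -> float:
    """Return log10 of the number of distinct sequences of the given length."""
    return length * math.log10(alphabet)


log10_sequences = log10_sequence_space(AMINO_ACIDS, CHAIN_LENGTH)
excess = log10_sequences - ATOMS_IN_UNIVERSE_LOG10

print(f"20^300 is about 10^{log10_sequences:.1f}")
print(f"That exceeds the atom estimate by a factor of about 10^{excess:.1f}")
```

Working in log space keeps the numbers readable; Python integers could hold 20**300 exactly, but the exponent is the quantity of interest.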
Using Protein Structure to Study Evolution
Consider First the Evolutionary
History of One Superfamily – the
Protein Kinase-like Superfamily
E. Scheeff and P.E. Bourne 2005 PLoS Comp. Biol. 1(5): e49.
17
Using Protein Structure to Study Evolution
The Protein Kinase-like Superfamily
• A large family important
to signal transduction in
eukaryotes and many
bacteria.
• Phosphotransferases:
transfer phosphate group
from ATP to Ser/Thr or Tyr
residue on target protein,
producing a range of
downstream signaling
effects.
• PKA: an example of a
typical protein kinase
(TPK) fold, shown in
“open book” format
PSB 2007
Using Protein Structure to Study Evolution
18
The Protein Kinase-Like Superfamily
• A range of different
families, all
phosphotransferases
• A variety of different
targets
• All possess a core
cassette of elements
shared with the TPKs:
• ATP binding
• Catalysis
• Structures can be
highly variable,
particularly in the
substrate binding
regions
Family | Structural Representative | Phosphorylates | Biological result
Typical Protein Kinases (TPKs) | Protein Kinase A (PKA) | Ser/Thr or Tyr residues of proteins | Range of signaling effects
Alpha kinases | Channel Kinase (ChaK) | Ser/Thr residues in alpha-helices | Range of signaling effects
Actin-Fragmin Kinase (AFK) | Actin-Fragmin Kinase (AFK) | Thr residue of actin | Control of actin polymerization
Phosphatidylinositol 3- and 4-kinases | Phosphatidylinositol 3-kinase (PI3K) | Phosphatidylinositol (PI), PI-phosphates, PI-bisphosphates | Range of second-messenger signaling effects
Phosphatidylinositol phosphate kinases | Phosphatidylinositol phosphate kinase (PIPK) | PI-phosphates | Range of second-messenger signaling effects
Choline/ethanolamine kinases | Choline Kinase (CK) | Choline | Part of pathway that eventually produces phosphatidylcholine, an important constituent of membranes
Aminoglycoside Kinases | Aminoglycoside Kinases (AK) | Aminoglycoside antibiotics | Antibiotic resistance
19
Using Protein Structure to Study Evolution
Method
• Begin with a multiple structure alignment using CEMC (NAR 2004) of 30 “comparable” TPKs and APKs
and manually correct in a pair-wise manner over a
period of 1-2 person years
• Review the literature on each structure
• Review the associated sequence alignments derived
from structure
E. Scheeff and P.E. Bourne 2005 PLoS Comp. Biol. 1(5): e49.
20
Using Protein Structure to Study Evolution
Let Us Side Track for One Minute on Structural
Bioinformatics Methodology
Biological vs Geometric Alignments Plastocyanin versus Azurin
(from Godzik 1996)
Maintain 9 of 10 interactions
RMSD 1.5 Å
Maintain 5 of 10 interactions
RMSD 0.5 Å
Pharm 201 Lecture 10, 2009
Structural Bioinformatics Unsolved Problems
21
Phosphoinositide-3 Kinase
(D) and Actin-Fragmin
Kinase (E)
PKA
ChaK (“Channel Kinase”)
22
Using Protein Structure to Study Evolution
Can We Propose an Evolutionary History for the
Protein Kinase-Like Superfamily?
• Bayesian inference of phylogeny (MrBayes)
• Manual structure alignment produces a very high-quality sequence alignment of diverse homologues
• But sequence information is too degraded to produce branching with sufficient support (i.e. a high posterior probability)
• Addition of a matrix of structural characteristics (similar to morphological characteristics) produces a well-supported combined model
• Neither sequence nor structural characteristics alone are sufficient to produce a resolved tree; they must be used in combination.
Example columns:
1) Ion pair analogous to K72-E91 in PKA
2) α-Helix B present
3) State of α-Helix C (0: kinked, 1: straight)
4) State of Strand 4 (0: kinked, 1: straight)
5) α-Helix D present
PDB ID | Group | 1 | 2 | 3 | 4 | 5
1BO1 | Atypical | 0 | 0 | 0 | 0 | 1
1IA9 | Atypical | 1 | 1 | 1 | 1 | 0
1E8X | Atypical | 1 | 0 | 1 | 1 | 1
1CJA | Atypical | 1 | 0 | 1 | 1 | 1
1NW1 | Atypical | 1 | 0 | 1 | 0 | 0
1J7U | Atypical | 1 | 0 | 1 | 0 | 1
1CDK | AGC | 1 | 1 | 1 | 0 | 1
1O6L | AGC | 1 | 1 | 1 | 0 | 1
1OMW | AGC | 1 | 1 | 1 | 0 | 1
1H1W | AGC | 1 | 1 | 1 | 0 | 1
1MUO | Other | 1 | 1 | 1 | 0 | 1
1TKI | CAMK | 1 | 0 | 1 | 0 | 1
1JKL | CAMK | 1 | 0 | 1 | 0 | 1
1A06 | CAMK | 1 | 0 | 1 | 0 | 1
1PHK | CAMK | 1 | 0 | 1 | 0 | 1
1KWP | CAMK | 1 | 0 | 1 | 0 | 1
1IA8 | CAMK | 1 | 0 | 1 | 0 | 0
1GNG | CMGC | 1 | 0 | 1 | 0 | 1
1HCK | CMGC | 1 | 0 | 1 | 0 | 1
1JNK | CMGC | 1 | 0 | 1 | 0 | 1
1HOW | CMGC | 1 | 0 | 1 | 0 | 1
1LP4 | Other | 1 | 0 | 1 | 0 | 1
1F3M | STE | 1 | 0 | 1 | 0 | 1
1O6Y | Other | 1 | 0 | 1 | 0 | 1
1CSN | CK1 | 1 | 0 | 1 | 0 | 1
1B6C | TKL | 1 | 0 | 1 | 0 | 1
2SRC | TK | 1 | 0 | 1 | 0 | 1
1LUF | TK | 1 | 0 | 1 | 0 | 1
1IR3 | TK | 1 | 0 | 1 | 0 | 1
1M14 | TK | 1 | 0 | 1 | 0 | 1
1GJO | TK | 1 | 0 | 1 | 0 | 1
PSB 2007
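The structural-character matrix above can be handled like a morphological character matrix. The following is only an illustrative sketch, not the authors' pipeline: it takes a handful of rows from the slide, groups structures that share an identical five-character profile, and reports which characters separate two entries. The function names and the shortened character labels are made up for this example.

```python
from collections import defaultdict

# Shortened labels for the five structural characters listed on the slide.
CHARACTERS = (
    "K72-E91-like ion pair",
    "alpha-helix B present",
    "alpha-helix C straight",
    "strand 4 straight",
    "alpha-helix D present",
)

# A few rows of the matrix: PDB ID -> (kinase group, character states).
profiles = {
    "1BO1": ("Atypical", (0, 0, 0, 0, 1)),
    "1IA9": ("Atypical", (1, 1, 1, 1, 0)),
    "1CDK": ("AGC", (1, 1, 1, 0, 1)),
    "1O6L": ("AGC", (1, 1, 1, 0, 1)),
    "1TKI": ("CAMK", (1, 0, 1, 0, 1)),
    "2SRC": ("TK", (1, 0, 1, 0, 1)),
}


def group_by_profile(entries):
    """Map each distinct character-state tuple to the PDB IDs that share it."""
    groups = defaultdict(list)
    for pdb_id, (_group, states) in entries.items():
        groups[states].append(pdb_id)
    return dict(groups)


def differing_characters(states_a, states_b):
    """Name the characters whose states differ between two structures."""
    return [name for name, a, b in zip(CHARACTERS, states_a, states_b) if a != b]


groups = group_by_profile(profiles)
agc_vs_camk = differing_characters(profiles["1CDK"][1], profiles["1TKI"][1])

for states, ids in groups.items():
    print(states, ids)
print("1CDK vs 1TKI differ in:", agc_vs_camk)
```

In this subset, typical kinases from different groups (CAMK and TK) share one profile, which is consistent with the slide's point that structural characters must be combined with sequence to resolve the tree.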
23
Proposed Evolutionary History for the Protein
Kinase-Like Superfamily
• Suggests distinctive history for atypical kinases, as opposed to intermittent divergence from the typical protein kinases (TPKs)
• TPK portion of tree shows high degree of agreement with Manning tree
• Branching is supported by species representation of kinase families
[Figure: proposed phylogenetic tree. Labels shown: APH, AGC, CK, AFK, CAMK, CMGC, TKL, PI3K, CK1, TK, PIPKIIβ, ChaK. Branch values shown: 0.64, 0.97, 1.0, 0.85, 0.78]
• Atypical kinase families: Blue
• Typical protein kinase groups (subfamilies): Red
• Branch labels: posterior probability of branch
PSB 2007
Using Protein Structure to Study Evolution
What Happens if We Use
Structure to Look Across
Superfamilies?
Yang, Doolittle & Bourne (2005) PNAS 102(2) 373-8
25
Using Protein Structure to Study Evolution
To Answer this Question We Only
Need to Make Use of Existing
Resources!
• SCOP – Further catalogs Nature’s reductionism into
structural domains, folds, families and superfamilies
• SUPERFAMILY assigns the above to fully sequenced
proteomes
26
Using Protein Structure to Study Evolution
Use of SCOP Superfamilies
• How do you distinguish convergent versus
divergent evolution?
• The SCOP notion of SUPERFAMILY with
evidence of weak sequence relationships can
be used to discount convergence.
27
Using Protein Structure to Study Evolution
Structure Provides an Evolutionary
Fingerprint
Distribution among the three kingdoms, as taken from SUPERFAMILY
• Superfamily distributions would seem to be related to the complexity of life
• Update of the work of Caetano-Anollés and Caetano-Anollés (2003) Genome Research 13:1563
[Figure: Venn diagram of SCOP folds (765 total) across Eukaryota (650), Archaea (416) and Bacteria (564). Legend: any genome / all genomes. Region values as extracted: 135, 153/14, 10, 118, 21/2, 310/0, 387, 645/49, 9/1, 12, 17, 29/0, 42, 68/0]
28
Using Protein Structure to Study Evolution
The Unique Superfamily in Archaea – d.17.6
• Archaeosine tRNA-guanine transglycosylase
(tgt), C2 domain
• First step in the
biosynthesis of an
archaea-specific modified
base, archaeosine (7-formamidino-7-deazaguanosine)
• Found in tRNAs
• Was found exclusively in
Archaea.
Reference: Interpro IPR004804
29
Using Protein Structure to Study Evolution
Method – Distance Determination
Presence/Absence Data Matrix (SCOP superfamilies (FSF) by organism, from SUPERFAMILY)
FSF | C. intestinalis | C. briggsae | F. rubripes
a.1.1 | 1 | 1 | 1
a.1.2 | 1 | 1 | 1
a.10.1 | 0 | 0 | 1
a.100.1 | 1 | 1 | 1
a.101.1 | 0 | 0 | 0
a.102.1 | 0 | 1 | 1
a.102.2 | 1 | 1 | 1
Distance Matrix
 | C. intestinalis | C. briggsae | F. rubripes
C. intestinalis | 0 | 101 | 109
C. briggsae | | 0 | 144
F. rubripes | | | 0
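The step from a presence/absence matrix to a distance matrix can be sketched as follows. This is a minimal illustration that assumes a plain Hamming distance (the number of superfamilies present in one proteome but not the other); the distance actually used in the study may differ. Only the seven superfamilies excerpted on the slide are included, so the toy distances are far smaller than the 101, 109 and 144 reported for the full data.

```python
from itertools import combinations

# Presence (1) / absence (0) of SCOP superfamilies, copied from the slide excerpt.
presence = {
    "C. intestinalis": {"a.1.1": 1, "a.1.2": 1, "a.10.1": 0, "a.100.1": 1,
                        "a.101.1": 0, "a.102.1": 0, "a.102.2": 1},
    "C. briggsae":     {"a.1.1": 1, "a.1.2": 1, "a.10.1": 0, "a.100.1": 1,
                        "a.101.1": 0, "a.102.1": 1, "a.102.2": 1},
    "F. rubripes":     {"a.1.1": 1, "a.1.2": 1, "a.10.1": 1, "a.100.1": 1,
                        "a.101.1": 0, "a.102.1": 1, "a.102.2": 1},
}


def hamming_distance(profile_a, profile_b):
    """Count superfamilies whose presence differs between two proteomes."""
    superfamilies = set(profile_a) | set(profile_b)
    return sum(profile_a.get(sf, 0) != profile_b.get(sf, 0) for sf in superfamilies)


def distance_matrix(profiles):
    """Build a symmetric nested-dict distance matrix over all organisms."""
    matrix = {name: {name: 0} for name in profiles}
    for a, b in combinations(profiles, 2):
        d = hamming_distance(profiles[a], profiles[b])
        matrix[a][b] = d
        matrix[b][a] = d
    return matrix


distances = distance_matrix(presence)
for organism, row in distances.items():
    print(organism, row)
```

A matrix of this kind can then be passed to any distance-based tree-building method.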
30
Using Protein Structure to Study Evolution
Is Structure a Useful
Discriminator of Species? - Yes
Archaea
Bacteria
Eukaryota
The method cleanly placed all species in their
correct superkingdoms
Yang, Doolittle & Bourne (2005) PNAS 102(2) 373-8
31
Using Protein Structure to Study Evolution
If Structure is so Conserved
is it a Useful Tool in the Study of Evolution?
The Answer Would Appear to be Yes
• It is possible to
generate a
reasonable tree of
life from merely the
presence or
absence of
superfamilies
(FSFs) within a
given proteome
Yang, Doolittle & Bourne (2005) PNAS 102(2) 373-8
32
Using Protein Structure to Study Evolution
The Influence of Environment on Life
Chris Dupont
Scripps Institute of Oceanography
UCSD
DuPont, Yang, Palenik, Bourne. 2006 PNAS 103(47) 17822-17827
33
Using Protein Structure to Study Evolution
Consider the Distribution of Disulfide
Bonds among Folds
• Disulfides are only stable under
oxidizing conditions
• Oxygen content gradually
accumulated during the earth’s
evolution
• The divergence of the three
kingdoms occurred 1.8-2.2 billion
years ago
• Oxygen began to accumulate ~ 2.0
billion years ago
• Logical deduction – disulfides more
prevalent in folds (organisms) that
evolved later
• This would seem to hold true
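[Figure: Venn diagram of SCOP folds (708 total) across Eukaryota, Archaea and Bacteria, showing the disulfide-containing share of folds in each region. Values as extracted: Eukaryota 31.9% (43/135); 0% (0/10); 0% (0/2); 1; 4.7% (18/387); 14.4% (17/118); 5.9% (1/17); 16.7% (7/42)]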
• Can we take this further?
34
Using Protein Structure to Study Evolution
Evolution of the Earth
• 4.5 billion years of change
• 300 ± 50 K
• 1-5 atmospheres
• Constant photoenergy
• Chemical and geological
changes
• Life has evolved in this
time
• The ocean was the
“cradle” for 90% of
evolution
35
Using Protein Structure to Study Evolution
Theoretical Levels of Trace Metals and Oxygen in
the Deep Ocean Through Earth’s History
[Figure: theoretical deep-ocean levels of oxygen, zinc, iron, cobalt and manganese plotted against billions of years before present (4.5 to 0). Concentration axis: O2 in arbitrary units, Zn and Fe in moles L-1 (log scale). Symbols for Bacteria, Archaea and Eukarya appear at the top of the figure.]
• Whether the deep ocean
became oxic or euxinic
following the rise in
atmospheric oxygen (~2.3
Gya) is debated, therefore both
are shown (oxic ocean-solid
lines, euxinic ocean-dashed
lines).
• The phylogenetic tree symbols
at the top of the figure show
one idea as to the theoretical
periods of diversification for
each Superkingdom.
Replotted from Saito et al, 2003
Inorganica Chimica Acta 356: 308-318
36
Using Protein Structure to Study Evolution
The Gaia Hypothesis
Gaia (pronounced /'geɪ.ə/ or /'gaɪ.ə/) "land" or "earth", from the
Greek Γαῖα; is a Greek goddess personifying the Earth
Gaia - a complex entity involving the Earth's biosphere,
atmosphere, oceans, and soil; the totality
constituting a feedback system which seeks an
optimal physical and chemical environment for life
on this planet.
James Lovelock
37
Using Protein Structure to Study Evolution
The Question
• Have the emergent properties of an
organism as judged by its protein content
been influenced by the environment?
• Will do this by consideration of the
metallomes of a broad range of species
• The metallomes can only be deduced by
consideration of the protein structures to
which the metal is covalently bound
• Will hypothesize that these emergent
properties in turn influenced the
environment
38
Using Protein Structure to Study Evolution
Making the Metallome of Each Species – Can Only
be Done from Structure and Requires Human Effort
1. Start with SCOP
2. Each {super}family level assignment was checked manually for metal binding
3. All the structures representing the family had to bind the metal for it to be considered unambiguous
4. The literature was consulted to resolve ambiguities
5. Superfamily database used to map to proteomes
6. 23 Archaea, 233 Bacteria, 57 Eukaryota
7. Cu, Ni, Mo ignored (<0.3% of proteome)
39
Using Protein Structure to Study Evolution
Levels of Ambiguity
• An ambiguous superfamily binds different metals or
has members that are not known to bind metals
• Ditto families
• Approx 50% of superfamilies and 10% of families are
ambiguous
• Only unambiguous families used in this study
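The ambiguity rule described above can be written down as a small check. This is a hedged sketch under stated assumptions: each family is represented by the set of metals bound by each of its representative structures, and a family counts as unambiguous only when every structure binds the same single metal. The function name and the example inputs are hypothetical, and in the study itself this step was done manually with help from the literature.

```python
def classify_family(structure_metals):
    """Classify a family from the metals bound by each representative structure.

    structure_metals: list of sets, one set of metal symbols per structure.
    Returns ("unambiguous", metal) or ("ambiguous", None).
    """
    if not structure_metals:
        return "ambiguous", None
    first = structure_metals[0]
    # Every structure must bind exactly one metal, and it must be the same one.
    same_single_metal = len(first) == 1 and all(m == first for m in structure_metals)
    if same_single_metal:
        (metal,) = first
        return "unambiguous", metal
    return "ambiguous", None


# Hypothetical inputs, for illustration only.
all_iron = [{"Fe"}, {"Fe"}, {"Fe"}]
mixed_metals = [{"Fe"}, {"Zn"}]
some_without_metal = [{"Zn"}, set()]

print(classify_family(all_iron))
print(classify_family(mixed_metals))
print(classify_family(some_without_metal))
```

Treating a structure with no bound metal as a source of ambiguity mirrors the slide's definition, where members not known to bind metals make a family ambiguous.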
40
Using Protein Structure to Study Evolution
Superfamily Distribution As Well
As Overall Content Has Changed
[Figure: two panels, "Bacteria Fe superfamilies" and "Eukaryotic Fe superfamilies", each labeled with the same set of Fe-binding SCOP superfamilies:]
a.1.1, a.1.2, a.104.1, a.110.1, a.119.1, a.138.1, a.2.11, a.24.3, a.24.4, a.25.1, a.3.1, a.39.3, a.56.1, a.93.1, b.1.13, b.2.6, b.3.6, b.33.1, b.70.2, b.82.2, c.56.6, c.83.1, c.96.1, d.134.1, d.15.4, d.174.1, d.178.1, d.35.1, d.44.1, d.58.1, e.18.1, e.19.1, e.26.1, e.5.1, f.21.1, f.21.2, f.24.1, f.26.1, g.35.1, g.36.1, g.41.5
41
Using Protein Structure to Study Evolution
Fe Containing Proteins in Bacteria
[Figure: quantile plot over unique Fe-binding fold families (108 total). Axes: (x) percent of Bacterial proteomes in which a fold family occurs (0 to 100); (♦) average copy number (0 to 14)]
• A quantile plot showing the
percent of Bacterial
proteomes each Fe-binding
fold family occurs in (x).
• This plot also shows the
average copy number of that
fold family in the proteomes
where it occurs (♦).
• Few Fe-binding folds are in
most proteomes.
• Widespread Fe-binding folds
are not necessarily
abundant.
• Similar trends are observed
for Zn, Mn, and Co in all
three Superkingdoms.
42
Using Protein Structure to Study Evolution
Metal Binding Proteins are Not
Consistent Across Superkingdoms
[Figure, panel A: total Zn-binding domains in a proteome versus total domains in a proteome (logarithmic axes) for Archaea, Bacteria and Eukarya. Panel B: slope of fitted power law (0 to 2) for Zn, Fe, Mn and Co in each superkingdom]
Since these data are derived from current species they are independent of
evolutionary events such as duplication, gene loss, horizontal transfer and
endosymbiosis
43
Using Protein Structure to Study Evolution
Power Laws: Fundamental
Constants in the Evolution of
Proteomes
A slope of 1 indicates that a group of structural
domains is in equilibrium with genome
growth, while a slope > 1 indicates that the
group of domains is being preferentially
duplicated (or retained in the case of genome
reductions).
van Nimwegen E (2006) in: Koonin EV, Wolf YI, Karev GP, (Ed.).
Power laws, scale-free networks, and genome biology
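The slope discussed here is the exponent of a power law relating the number of domains in a group to the total number of domains in a proteome, which becomes a straight line on log-log axes. The sketch below is an assumption-laden illustration, not the analysis from the paper: it fits the slope by ordinary least squares on log10-transformed counts, and the input numbers are synthetic values generated with a known exponent.

```python
import math


def fit_power_law_slope(total_domains, group_domains):
    """Least-squares fit of log10(group) = slope * log10(total) + intercept."""
    xs = [math.log10(x) for x in total_domains]
    ys = [math.log10(y) for y in group_domains]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Synthetic proteome sizes and metal-binding domain counts (exponent 1.2 by construction).
totals = [1_000, 3_000, 10_000, 30_000]
metal_binding = [0.02 * t ** 1.2 for t in totals]

slope, intercept = fit_power_law_slope(totals, metal_binding)
print(f"fitted slope = {slope:.3f}")
```

With real proteome counts the points scatter around the line, and the fitted slope is read as on the slide: near 1 for domains that track genome growth, above 1 for domains that are preferentially duplicated or retained.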
Using Protein Structure to Study Evolution
Using Protein Structure to Study Evolution
Why are the Power Laws Different
for Each Superkingdom?
• Power laws are likely influenced by selective pressure.
Qualitatively, the differences in the power law slopes
describing Eukarya and Prokarya are correlated to the
shifts in trace metal geochemistry that occur with the rise
in oceanic oxygen
• We hypothesize that proteomes contain an imprint of the
environment at the time of the last common ancestor in
each Superkingdom
• This suggests that Eukarya evolved in an oxic
environment, whereas the Prokarya evolved in anoxic
environments
46
Using Protein Structure to Study Evolution
Do the Metallomes Contain Further
Support for this Hypothesis?
Superkingdom | Fold Family | % | Fe-binding | O2
Eukarya | Cytochrome P450 | 0.44 ± 0.48 | heme | yes
Eukarya | Cytochrome c3-like | 0.13 ± 0.3 | heme | no
Eukarya | Cytochrome b5 | 0.12 ± 0.09 | heme | no
Eukarya | Purple acid phosphatase | 0.11 ± 0.08 | amino | no
Eukarya | Penicillin synthase-like | 0.07 ± 0.1 | amino | yes
Eukarya | Hypoxia-inducible factor | 0.07 ± 0.04 | amino | yes
Eukarya | Di-heme elbow motif | 0.06 ± 0.01 | heme | no
Archaea | 4Fe-4S ferredoxins | 1.80 ± 0.7 | Fe-S | no
Archaea | MoCo biosynthesis proteins | 1.60 ± 0.3 | Fe-S | no
Archaea | Heme-binding PAS domain | 1.10 ± 1.0 | heme | 1
Archaea | HemN | 0.80 ± 0.20 | Fe-S | no
Archaea | α-helical ferredoxin | 0.60 ± 0.16 | Fe-S | no
Archaea | Biotin synthase | 0.55 ± 0.1 | Fe-S | no
Archaea | ROO N-terminal domain-like | 0.5 ± 0.1 | amino | 2
Bacteria | High potential iron protein | 0.38 ± 0.25 | Fe-S | no
Bacteria | Heme-binding PAS domain | 0.3 ± 0.4 | heme | 1
Bacteria | MoCo biosynthesis proteins | 0.21 ± 0.15 | Fe-S | no
Bacteria | HemN | 0.2 ± 0.15 | Fe-S | no
Bacteria | 4Fe-4S ferredoxins | 0.2 ± 0.2 | Fe-S | no
Bacteria | Cytochrome c | 0.14 ± 0.2 | heme | no
Bacteria | α-helical ferredoxin | 0.12 ± 0.09 | Fe-S | no
Overall percent of Fe bound by
Superkingdom | Fe-S | heme | amino
Eukarya | 21 ± 9 | 47 ± 19 | 32 ± 12
Archaea | 68 ± 12 | 13 ± 14 | 19 ± 6
Bacteria | 47 ± 11 | 22 ± 12 | 31 ± 16
1. Some, but not all, PAS domains actually sense oxygen
2. The Rubredoxin oxygen:oxidoreductase (ROO) protein does not contact oxygen, but catalyzes an oxygen reduction pathway
47
Using Protein Structure to Study Evolution
e- Transfer Proteins
Same Broad Function, Same Metal, Different Chemistry
Induced by the Environment?
Fe-S clusters
Cytochromes
Fe bound by S
Fe bound by heme (and
amino-acids)
Cluster held in place by
Cys
Generally negative
reduction potentials
Generally positive
reduction potentials
Less susceptible to
oxidation
Very susceptible to
oxidation
48
Using Protein Structure to Study Evolution
Hypothesis
• Emergence of cyanobacteria changed oxygen
concentrations
• Impacted relative metal ion concentrations in the ocean
• Organisms evolved to use these metals in new ways to
evolve new biological processes, e.g. complex signaling
• This in turn further impacted the environment
• Only protein structures could reveal such dependencies
49
Using Protein Structure to Study Evolution
Agenda
• What is structural bioinformatics and how do YOU
drive it?
• Prerequisites: the sequence-structure-function
relationship
• Some exciting developments
– Using protein structure to study evolution
– Functional prediction, pathway mapping and the RCSB PDB
response
• Unsolved problems
– Structure comparison
– Domain definition
• What more could be done to drive the field forward?
50
Our Methods are Still Not Good Enough: The 3D Domain Assignment Problem
A domain is a fundamental structural, functional and
evolutionary unit of a protein:
• Compact
• Stable
• Has a hydrophobic core
• Folds independently
• Performs a specific function
• Can be re-shuffled and put together in
different combinations
Evolution works at the level of the domain
Unsolved Problems – 3D Domain Definition
Evaluation of automatic domain assignment
methods
Structures with issues (all/most methods)
Large structures, complex architectures
1dcea
Very small simple domains: difficult to
separate. Issues: minimum domain size,
low contact density
1ubdc
Experts: 3
NCBI method, PDP,
DomainParser : 5
PUU: 6
1bxrc
Experts: 4
NCBI method: 4
DomainParser: 2
PDP, PUU: 2
1e88a
Experts: 3
PUU: 1
PDP: 2
Experts: 6
DomainParser: 5
PUU: 2
PDP: 2
NCBI: 2
Unsolved Problems – 3D Domain Definition
NCBI methods: 8
Manual vs. Automatic Consensus
• Chains with manual consensus: 375 (80% of entire dataset)
• Chains with automatic consensus: 374 (80% of entire dataset)
• Chains with consensus (automatic or manual): 424 (90.6% of entire dataset)
• Automatic consensus only: 46 chains (10.9% of chains with consensus)
• Manual consensus only: 47 chains (11.1% of chains with consensus)
• Manual and automatic consensus agree: 328 chains (77.3% of chains with consensus)
• Automatic consensus and manual consensus disagree: 3 chains (0.7% of chains with consensus)
Unsolved Problems – 3D Domain Definition
JMB 2004 339(3), 647-678
Natalie Dawson
Unpublished
http://itol.embl.de/
Natalie
55
Agenda
• What is structural bioinformatics and how do YOU
drive it?
• Prerequisites: the sequence-structure-function
relationship
• Some exciting developments
– Using protein structure to study evolution
– Functional prediction, pathway mapping and the RCSB PDB
response
• Unsolved problems
– Structure comparison
– Domain definition
• What more could be done to drive the field forward?
56
Structure
determination or
modeling of whole
metabolic network
What are the implications of this?
• Biochemical reactions, pathways, and networks
can now be described in the context of entire
cells
• Enables more realistic simulations of the
behavior of metabolic networks
• Better understanding of evolution - compare
pathways between organisms
• Predict effects of mutations and drugs
• Synthetic Biology
Pathway
Agenda
• What is structural bioinformatics and how do YOU
drive it?
• Prerequisites: the sequence-structure-function
relationship
• Some exciting developments
– Using protein structure to study evolution
– Functional prediction, pathway mapping and the RCSB PDB
response
• Unsolved problems
– Structure comparison
– Domain definition
• What more could be done to drive the field forward?
62
Better Interoperability Between the Data
and the Literature Upon Which it is Based
63
What More Could be Done to Drive the Field Forward?
Data
Database
Knowledge
Knowledgebase
Data Only
Wikis
Datapacks
Journals
Annotation
Data +
Annotation
Data + Some
Annotation
Data + Some
Annotation
+
Some
Integration
PLoS
iStructure
The Database View
www.rcsb.org/pdb/explore/literature.do?structureId=1TIM
Context
What More Could be Done to Drive the Field Forward?
The Literature View – Web 3.0?
http://betastaging.rcsb.org
What More Could be Done to Drive the Field Forward?
Acknowledgements
• Protein-protein Interactions
– JoLan Chung & Wei Wang
• Functional Flexibility
– Jenny Gu & Michael
Gribskov
• Multipolar Representation
– Apostol Gramada
• Funding, NSF, NIH
67