Using comparative genomics to study the mitochondrial proteome


Turning genomics data into Biology
Martijn Huynen
Nijmegen Center for Molecular Life Sciences,
Centre for Molecular and Biomolecular Informatics
Comparative genomics
The (somewhat) intelligent comparative genomics meat grinder
Method development
Prediction of protein function, pathways
Evolution of biosystems
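A phosphomannomutase (pmm) is predicted to have acquired a phosphoribomutase (deoB) function
[Figure: deoxyribonucleoside catabolism. Cdd converts deoxycytidine to deoxyuridine; DeoA (deoxyuridine, deoxythymidine) and DeoD (purine deoxyribonucleosides) yield deoxyribose-1-P; deoB converts deoxyribose-1-P to deoxyribose-5-P; deoC splits deoxyribose-5-P into glyceraldehyde-3-P and acetaldehyde. The genes deoD, deoB (?), deoC, deoA, cdd and pmm are compared between M. genitalium and M. tuberculosis.]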
Predicting functional relations between genes using (conserved) genomic context
http://string.embl.de
Genomic Context Types:
Conserved Neighborhood
Co-occurrence
Gene Fusion
Dandekar et al., 1998; Marcotte et al., 1999; Huynen and Bork, 1998; Overbeek et al., 1999; Enright et al., 1999; Pellegrini et al., 1999
Snel et al., NAR 1999
von Mering et al., NAR 2002
von Mering et al., NAR 2005
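Of the three context types, conserved neighborhood is the simplest to illustrate. The Python sketch below only counts in how many genomes two gene families are direct neighbors; the gene orders and family labels are hypothetical, and a production method would likely also weigh strand, intergenic distance and the relatedness of the genomes compared, all of which this sketch ignores.

```python
from typing import Dict, List


def adjacent(order: List[str], a: str, b: str) -> bool:
    """True if gene families a and b are direct neighbors in one genome's gene order."""
    return any({x, y} == {a, b} for x, y in zip(order, order[1:]))


def neighborhood_count(genomes: Dict[str, List[str]], a: str, b: str) -> int:
    """Number of genomes in which families a and b are encoded next to each other."""
    return sum(adjacent(order, a, b) for order in genomes.values())


# Hypothetical gene orders (family labels along the chromosome), not real data.
genomes = {
    "genome1": ["famA", "famB", "famC", "famD"],
    "genome2": ["famC", "famB", "famA"],
    "genome3": ["famA", "famD", "famB"],
}
print(neighborhood_count(genomes, "famA", "famB"))  # 2
print(neighborhood_count(genomes, "famA", "famD"))  # 1
```

A pair that stays adjacent across many distantly related genomes is a stronger candidate for a functional link than one adjacent in a single genome.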
Gene fission in the evolution of carbamoyl phosphate synthase B (carB)
[Figure: CarB/PyrAB homologs YJR109C, D2085.1, YJL130C, Rv1384, sll0370, AF1274, HP0919 and EC0033, together with the gene pairs AQ2101 & AQ1172, MTH997 & MTH996 and MJ1378 & MJ1381]
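Gene fusion and fission rest on the same observation: two families that form one protein in some genomes and separate proteins in others are likely to interact functionally (the "Rosetta stone" idea of Marcotte et al., 1999). A minimal Python sketch under simplifying assumptions: each protein is represented as a tuple of hypothetical family labels, and the direction of the event (fusion, or fission as inferred here for carB) is not considered.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple

Protein = Tuple[str, ...]  # family composition of one protein (hypothetical labels)


def fusion_links(proteomes: Dict[str, List[Protein]]) -> Set[Tuple[str, str]]:
    """Family pairs that are fused into one protein in some genome and encoded
    as separate single-family proteins in some genome."""
    fused: Set[Tuple[str, str]] = set()
    separate: Set[Tuple[str, str]] = set()
    for proteins in proteomes.values():
        for protein in proteins:
            # every pair of distinct families inside one protein counts as fused
            fused.update(combinations(sorted(set(protein)), 2))
        # every pair of single-family proteins in this genome counts as separate
        singles = sorted({p[0] for p in proteins if len(p) == 1})
        separate.update(combinations(singles, 2))
    return fused & separate


# Hypothetical proteomes, not real data.
proteomes = {
    "genome1": [("famA", "famB"), ("famC",)],
    "genome2": [("famA",), ("famB",), ("famC",)],
    "genome3": [("famA",), ("famC",)],
}
print(fusion_links(proteomes))  # {('famA', 'famB')}
```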
Predicting functional interactions between proteins by the co-occurrence of their genes in genomes.
Distribution of four M. genitalium genes among 25 genomes (1 = present, 0 = absent):
MG299 (pta)    0001100001101011000101111
MG357 (ackA)   0001100001101011000101111
MG019 (dnaJ)   0011111101111011100111111
MG305 (dnaK)   0011111101111011100111111
Using the mutual information between genes as a scoring heuristic for their co-occurrence:
M(pta, ackA) = 0.69 (phosphotransacetylase, acetate kinase)
M(dnaJ, dnaK) = 0.55 (heat shock proteins)
M(dnaJ, ackA) = 0.19
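The three M values can be reproduced from the profiles above if the mutual information is computed with the natural logarithm (units of nats) and the profiles are taken in the order of the gene names; the slide does not state the logarithm base, so that is an assumption, checked only against these numbers. For identical profiles the mutual information equals the entropy of a single profile, which is why pta/ackA (present in 12 of 25 genomes) scores higher than dnaJ/dnaK (present in 19 of 25).

```python
from collections import Counter
from math import log


def mutual_information(x: str, y: str) -> float:
    """Mutual information in nats between two presence/absence profiles,
    written as '0'/'1' strings with one character per genome."""
    if len(x) != len(y):
        raise ValueError("profiles must cover the same genomes")
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    # sum over observed joint states of p(a,b) * ln(p(a,b) / (p(a) * p(b)))
    return sum(
        (c / n) * log((c / n) / ((px[a] / n) * (py[b] / n)))
        for (a, b), c in pxy.items()
    )


pta = ackA = "0001100001101011000101111"
dnaJ = dnaK = "0011111101111011100111111"
print(round(mutual_information(pta, ackA), 2))   # 0.69
print(round(mutual_information(dnaJ, dnaK), 2))  # 0.55
print(round(mutual_information(dnaJ, ackA), 2))  # 0.19
```

Mutual information rewards any consistent relationship, so perfectly complementary profiles (one gene present exactly where the other is absent) score as high as identical ones, which is relevant for complementary distributions.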
Evolutionary conservation of genomic context increases the likelihood of functional interaction
[Figure: curves for fusion, gene order and co-occurrence; y-axis from 0 to 1, x-axis evolutionary conservation score from 0 to 1]
Correlation between the strength of the genomic and functional associations
[Figure: average metabolic distance (0 to 6) and number of COGs (log scale, 1 to 10000) plotted against the number of co-occurrences in operons (0 to 30)]
Genomic associations correlate with a wide array of functional interactions
[Figure: pie charts for gene order conservation, co-occurrence in genomes and gene fusion, showing how the associations divide over physical interaction, complex, metabolic pathway, non-metabolic pathway, process, hypothetical and unknown interaction]
Huynen et al., Genome Research 2000
Combining homology information with genomic association for function prediction
Repeated occurrence of MG009, a phosphohydrolase, with thymidylate kinase (tmk) suggests a role of MG009 in pyrimidine metabolism.
Conservation of gene order of the hypothetical gene MG134 with dnaX and RecR suggests physical interaction between their gene products.
Phylogenomics for protein function prediction
An ancient paralog of N7BM has been lost in the same lineages as N7BM itself, suggesting a possible role in Complex I
Gabaldon et al. (2005) J. Mol. Biol.
Experimental confirmation of a role of the N7BM paralog in Complex I
J. Clin. Invest. (2005)
Verified function predictions: Making predictions is easy, testing them is another matter.
Protein   Context                          Type of interaction    Function                                    Ref.
Mt-Ku     gene order                       physical interaction   double-stranded DNA repair                  [56]
GlnK      gene order                       physical interaction   signal transduction for ammonium transport  [57,58]
PH0272    gene order                       metabolic pathway      methylmalonyl-CoA racemase                  [59]
PrpD      gene order                       metabolic pathway      2-methylcitrate dehydratase                 [22,60]
AroK      gene order                       metabolic pathway      shikimate kinase                            [61]
ComB      gene order                       metabolic pathway      2-phosphosulfolactate phosphatase           [62]
KynB      gene order                       metabolic pathway      kynurenine formamidase                      [63]
PvlArgDC  gene order                       metabolic pathway      arginine decarboxylase                      [64]
FabK      gene order                       metabolic pathway      enoyl-ACP reductase                         [65]
FabM      gene order                       metabolic pathway      trans-2-decenoyl-ACP isomerase              [66]
COG0042   gene order                       tRNA modification      tRNA-dihydrouridine synthase                [67]
Yfh1      co-occurrence                    process                iron-sulfur cluster assembly                [68,69]
YchB      co-occurrence                    metabolic pathway      terpenoid synthesis                         [70]
SmpB      co-occurrence                    process                trans-translation                           [5,71]
ThyX      complementary distribution       enzymatic activity     thymidylate synthase                        [14,72]
ThiN      complementary distribution       enzymatic activity     thiamine phosphate synthase                 [73,74]
ThiE      complementary distribution       enzymatic activity     thiamine phosphate synthase                 [74]
Prx       fusion                           pathway                peroxiredoxin                               [75]
YgbB      fusion/gene order                metabolic pathway      terpenoid synthesis                         [76]
SelR      fusion/gene order/co-occurrence  enzymatic activity     methionine sulfoxide reductase              [14,22,77]
FadE      regulatory sequence              metabolic pathway      acyl-CoA dehydrogenase                      [78,79]
TogMNAB   regulatory sequence              metabolic pathway      oligogalacturonide transport                [80,81]
MetD      regulatory sequence              metabolic pathway      methionine transport                        [82]
Huynen et al., Curr. Opin. Cell Biol. 2003
Experimentally confirmed protein functions, predicted with various types of context:
gene order: 13
gene fusion: 3
regulatory element: 3
complementary distribution: 4
co-occurrence: 4
Predicting gene function by conservation of co-expression
Evolutionary conservation of co-expression increases the likelihood of functional interaction
Low but significant levels of conservation of co-expression (see Teichmann et al., TIBS 2002; Stuart et al., Science 2003)
Species  Total # of pairs   # of pairs > 0.6   Observed fraction > 0.6   Expected fraction > 0.6   Observed/Expected
Gene pairs with an orthologous gene pair > 0.6:
Worm     18161              803                0.0442*                   0.00379                   12
Yeast    36548              1215               0.0332*                   0.00216                   15
Gene pairs with a paralogous gene pair > 0.6:
Worm     207214             29031              0.1401*                   0.00379                   37
Yeast    38253              2167               0.0566*                   0.00216                   26
van Noort et al., TIG 2003
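The Observed/Expected column is the observed fraction (pairs > 0.6 divided by total pairs) divided by the expected fraction. Because the expected fraction is identical for both rows of a species, it is presumably the background fraction of all gene pairs in that species above 0.6; that reading is an assumption. The arithmetic in Python:

```python
from typing import Dict, Tuple

# (total pairs, pairs > 0.6, expected fraction > 0.6), copied from the table above
table: Dict[Tuple[str, str], Tuple[int, int, float]] = {
    ("orthologous", "worm"): (18161, 803, 0.00379),
    ("orthologous", "yeast"): (36548, 1215, 0.00216),
    ("paralogous", "worm"): (207214, 29031, 0.00379),
    ("paralogous", "yeast"): (38253, 2167, 0.00216),
}


def enrichment(total: int, above: int, expected: float) -> Tuple[float, float]:
    """Observed fraction above the threshold and its ratio to the expected fraction."""
    observed = above / total
    return observed, observed / expected


for (relation, species), (total, above, expected) in table.items():
    observed, ratio = enrichment(total, above, expected)
    print(f"{relation:12s} {species:6s} observed={observed:.4f} obs/exp={ratio:.0f}")
```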
Conservation of protein-protein interaction measured by yeast two-hybrid (Y2H) increases the likelihood of interaction
Comparison of the Y2H interactions of Giot (fly) with those of Ito and Uetz (yeast)
The fraction of hypothetical proteins in conserved Y2H interactions is relatively low:
Hypotheticals in conserved interactions: 13 (5%)
Hypotheticals in the complete genome: ~1600 (27%)
A “new”, conserved interaction: GTPase XAB1/CG3704 with the hypothetical GTPase YOR262/CG10222
XAB1 interacts with the DNA repair protein XPA1 and is inferred to be required for XPA1’s import into the nucleus.
Conservation of protein-protein interaction between species
Dataset      Comparison        Testable     Conserved   Fraction conserved   Average fraction conserved
Ito / Uetz   Yeast vs. yeast   858 / 697    201         23.4% / 28.8%        26.1%
Ito / Giot   Yeast vs. fly     229 / 394    45          19.6% / 11.4%        15.5%
Uetz / Giot  Yeast vs. fly     120 / 168    33          27.5% / 19.6%        23.5%
Testable: protein interactions for which both proteins are present in the other dataset.
Physical interaction is reasonably well conserved between species (compared to the “conservation” within a species)
Huynen et al., TIG 2004
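The fractions in the table follow from dividing the number of conserved interactions by the number of testable interactions in each dataset (both proteins present in the other dataset); the last column is the mean of the two. A short Python check. Computed this way, the values agree with the slide to within 0.1 percentage point (for example 45/229 = 19.65%, shown as 19.6%), so the slide appears to truncate rather than round.

```python
from typing import Tuple


def fraction_conserved(n_first: int, n_second: int, conserved: int) -> Tuple[float, float, float]:
    """Fraction of each dataset's testable interactions that is also found in
    the other dataset, plus the mean of the two fractions."""
    f_first = conserved / n_first
    f_second = conserved / n_second
    return f_first, f_second, (f_first + f_second) / 2


comparisons = {
    "Ito / Uetz (yeast vs. yeast)": (858, 697, 201),
    "Ito / Giot (yeast vs. fly)": (229, 394, 45),
    "Uetz / Giot (yeast vs. fly)": (120, 168, 33),
}
for name, counts in comparisons.items():
    f1, f2, mean = fraction_conserved(*counts)
    print(f"{name}: {f1:.1%} / {f2:.1%}, average {mean:.1%}")
```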
Is the low level of conservation of co-expression between S. cerevisiae and C. elegans (< 5%) “real”, reflecting evolution and species-specific interactions, or are we just comparing noisy datasets?
Species-specific (idiosyncratic) co-regulation:
“Efficient expression of the Saccharomyces cerevisiae glycolytic
gene ADH1 is dependent upon a cis-acting regulatory element
UASRPG found initially in genes encoding ribosomal proteins.”
Tornow and Santangelo, Gene, 1990
Noisy genomics data
Low (but significant) correlation between ChIP-on-chip data (sharing transcription factor binding sites, TFBS) and expression data in S. cerevisiae
Filtering out the noise by combining ChIP-on-chip and co-expression in yeast
Correlation of co-regulation with functional interactions
Data set of gene pairs          Percent same pathway   Number of gene pairs
r > 0.5                         43                     169,768
r > 0.6                         52                     65,430
r > 0.7                         51                     22,459
Sharing ≥ 1 TFBS                50                     356,947
Sharing ≥ 2 TFBS                77                     39,818
Sharing ≥ 1 TFBS and r > 0.3    86                     19,386
Sharing ≥ 1 TFBS and r > 0.4    88                     11,434
Sharing ≥ 1 TFBS and r > 0.5    90                     6,687
Sharing ≥ 1 TFBS and r > 0.6    90                     3,382
Sharing ≥ 1 TFBS and r > 0.7    86                     1,156
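The table suggests that requiring two independent, noisy signals at once (a shared transcription factor binding site in the ChIP-on-chip data and co-expression above a threshold r) enriches for gene pairs in the same pathway. A minimal Python sketch of such a filter, assuming r is the Pearson correlation of expression profiles; the genes, transcription factors and expression values are hypothetical.

```python
from math import sqrt
from typing import Dict, List, Set, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two expression profiles (assumes non-constant profiles)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def coregulated_pairs(
    expression: Dict[str, List[float]],
    binding: Dict[str, Set[str]],
    r_min: float = 0.5,
    min_shared: int = 1,
) -> List[Tuple[str, str, float, int]]:
    """Gene pairs sharing >= min_shared bound TFs AND co-expressed with r > r_min."""
    genes = sorted(expression)
    hits = []
    for i, g1 in enumerate(genes):
        for g2 in genes[i + 1:]:
            shared = len(binding.get(g1, set()) & binding.get(g2, set()))
            if shared < min_shared:
                continue  # no shared binding evidence: skip the correlation
            r = pearson(expression[g1], expression[g2])
            if r > r_min:
                hits.append((g1, g2, r, shared))
    return hits


# Hypothetical data: expression over four conditions, TFs bound per gene.
expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.1],
    "geneC": [4.0, 3.0, 2.0, 1.0],
}
binding = {"geneA": {"TF1"}, "geneB": {"TF1", "TF2"}, "geneC": {"TF1"}}

for g1, g2, r, shared in coregulated_pairs(expression, binding):
    print(g1, g2, round(r, 3), shared)
```

Here geneC shares a binding site with both other genes but is anti-correlated with them, so only the geneA/geneB pair passes both criteria.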
High level of conservation of co-regulation after speciation
[Figure: frequency distribution of the co-expression correlation (r, from -0.5 to 0.9) for worm orthologous gene pairs of yeast gene pairs with r > 0.6 that share a TFBS, compared with all worm gene pairs; 76% marked]
Comparing co-regulation in Bacteria indicates a level of conservation of 80% (operons in B. subtilis versus regulons in E. coli)
NB:
1) Based on operon conservation alone, the level of conservation is only 50%
2) Cases of gene loss are disregarded
Noisy genomics data lead to drastic underestimations of conservation of interactions
Conclusions: conservation of co-regulation
• Gene co-regulation tends to be conserved in eukaryotes (76%) and in prokaryotes (80%)
• In the case of gene duplication, one gene tends to maintain the co-regulatory link → there appears to be one functionally equivalent ortholog
Snel et al., Nucleic Acids Res. 2004
Exploiting genomics data to predict the function of a hypothetical protein: BolA
An interaction of BolA with a monothiol glutaredoxin? (STRING)
[Figure: STRING output for BolA]
BolA and Grx occur as neighbors in a number of genomes
BolA and Grx have an (almost) identical phylogenetic distribution
BolA and Grx have been shown to interact in Y2H screens in S. cerevisiae and D. melanogaster, and by FLAG-tag affinity purification in S. cerevisiae
[Figure: BolA phylogeny, with labels “Cell division / Cell wall” and “(oxidative) stress”]
BolA does have (predicted) interactions with cell-division / cell-wall proteins, but these appear secondary to the link with Grx.
→ Genomic context analyses have achieved a higher resolution in function prediction than phenotypic analyses.
BolA is homologous to the peroxide reductase OsmC, suggesting a similar function
BolA is, relative to other class II KH folds and sequences, most similar to OsmC
3D similarity to BolA: DALI Z-scores; sequence profile similarity to BolA: COMPASS SW-scores (E-value)
Protein family (PDB entry)          DALI Z-score      COMPASS SW-score (E-value)
OsmC (1ml8A/1lqlA) / Ohr (1n2fA)    5.8 / 5.5 / 5.2   73 (2.4E-5)
KH 1 (1hnxC)                        5.3               46 (9.4E-3)
DUF150 (1ib8A)                      3.7               44 (4.2E-2)
GMP synthase C (1gpmA)              2.9               57 (7.0E-4)
KH 2 (1egaB)                        3.8               35 (2.7E-1)
RBFA (1kkgA)                        4.2               40 (9.6E-2)
OsmC uses the thiol groups of two evolutionarily conserved cysteines to reduce substrates
Problem: The BolA family does not have conserved cysteines.
…It would have to obtain its reducing equivalents from elsewhere…
[Figure: BolA family alignment]
Prediction of interaction partner and molecular function complement each other:
BolA interacts with Grx, and BolA is (homologous to) a reductase
→ Grx provides BolA with reducing equivalents?!
There is a wealth of functional and structural
genomics data that can be related to the function
of individual proteins.
Exploiting that data is becoming a trade in itself
(biochemistry by other means)