Transcript Document

CMBI
Center for Molecular and Biomolecular Informatics
Drug design (Jacob de Vlieg, Organon)
Bacterial Genomics, systems biology of Lactobacillus plantarum (Ronald Siezen)
Comparative Genomics (Martijn Huynen)
Protein structure bioinformatics (Gert Vriend)
Huynen group: genome comparison
De Vlieg group: drug design
Siezen group: bacterial genomics
Vriend’s group: predicting the effect of mutations on the protein structure
A mutation in rhodopsin (eye) impairs the binding of retinal
Warning: Bioinformatics can be addictive
You can check out any time you want, but you can never leave.
Human genome, great expectations
A large fraction of the human genes has an unknown function
(Science, 2001)
Predicting Protein Function
What is function?
Function can be described at various levels.
Sequence similarity/homology is most relevant for “Molecular Function”, the aspect of protein function that is best conserved. Molecular function can often be predicted from similarities between protein sequences (BLAST) or structures.
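BLAST finds similar sequences with a fast heuristic (short word matches that are then extended). As a minimal sketch of the underlying idea, the Python below computes the exact best local alignment score (the Smith-Waterman recurrence) between two sequences. The scoring scheme (match +2, mismatch -1, linear gap -2) and the function name are illustrative assumptions; real searches use substitution matrices such as BLOSUM62, affine gap penalties and E-values.

```python
def local_alignment_score(a: str, b: str, match: int = 2,
                          mismatch: int = -1, gap: int = -2) -> int:
    """Best local (Smith-Waterman) alignment score of sequences a and b.

    Toy scoring (assumed): +match for identical residues, +mismatch otherwise,
    and a linear penalty `gap` per inserted or deleted residue.
    """
    # prev[j] / cur[j]: best score of a local alignment ending at a[i-1], b[j-1]
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur[j] = max(0,                 # start a new local alignment here
                         diag,              # align a[i-1] with b[j-1]
                         prev[j] + gap,     # a[i-1] against a gap in b
                         cur[j - 1] + gap)  # b[j-1] against a gap in a
            best = max(best, cur[j])
        prev = cur
    return best


print(local_alignment_score("MKTAYIAK", "MKTAYIAK"))  # identical: 8 residues x 2 = 16
print(local_alignment_score("XXMKTAYXX", "MKTAY"))    # shared core MKTAY: 10
```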
Comparative genomics
The (somewhat) intelligent comparative genomics meat grinder
Prediction of protein function, pathways
Predicting functional interactions between proteins by the co-occurrence of their genes in genomes
Distribution of four M. genitalium genes among 25 genomes (1 = present, 0 = absent):
MG299 (pta)    0001100001101011000101111
MG357 (ackA)   0001100001101011000101111
MG019 (dnaJ)   0011111101111011100111111
MG305 (dnaK)   0011111101111011100111111
Using the mutual information between genes as a scoring heuristic for their co-occurrence:
M(pta, ackA) = 0.69 (phosphotransacetylase, acetate kinase)
M(dnaJ, dnaK) = 0.55 (heat shock proteins)
M(dnaJ, ackA) = 0.19
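The mutual information can be computed directly from the 0/1 profiles. A minimal Python sketch follows (the function name is our own); using natural logarithms it reproduces the three values listed above (0.69, 0.55 and 0.19), which suggests the scores are given in nats. Identical profiles score their own entropy, which is why pta/ackA and dnaJ/dnaK, both perfectly matching pairs, get different scores: the score also depends on how evenly a gene is spread over the genomes.

```python
from collections import Counter
from math import log


def mutual_information(x: str, y: str) -> float:
    """Mutual information, in nats, between two presence/absence profiles.

    M(X, Y) = sum over observed (a, b) of p(a, b) * ln(p(a, b) / (p(a) * p(b)))
    """
    if len(x) != len(y):
        raise ValueError("profiles must cover the same genomes")
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum((c / n) * log((c / n) / ((px[a] / n) * (py[b] / n)))
               for (a, b), c in pxy.items())


# Profiles of the four M. genitalium genes over 25 genomes (from the table above)
pta = "0001100001101011000101111"
ackA = "0001100001101011000101111"
dnaJ = "0011111101111011100111111"
dnaK = "0011111101111011100111111"

for label, g1, g2 in [("pta, ackA", pta, ackA),
                      ("dnaJ, dnaK", dnaJ, dnaK),
                      ("dnaJ, ackA", dnaJ, ackA)]:
    print(f"M({label}) = {mutual_information(g1, g2):.2f}")
```

It prints M(pta, ackA) = 0.69, M(dnaJ, dnaK) = 0.55 and M(dnaJ, ackA) = 0.19.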
[Figure (numeric values omitted): CarB and PyrAB proteins, labelled YJR109C, D2085.1, YJL130C, Rv1384, sll0370, AF1274, HP0919, EC0033, and the gene pairs AQ2101 & AQ1172, MTH997 & MTH996 and MJ1378 & MJ1381]
Gene fission in the evolution of carbamoyl phosphate synthase B (carB)
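Gene fusion and its reverse, fission (as in the carB example above), can be detected by looking for two separate genes in one genome that align to different, non-overlapping parts of a single protein in another genome. The sketch below assumes hits from some homology search have already been collected; the Hit record, the 10-residue overlap tolerance and all gene names in the example are hypothetical, not the CarB data shown in the figure.

```python
from itertools import combinations
from typing import NamedTuple


class Hit(NamedTuple):
    """A hypothetical homology-search hit of one query gene onto a target protein."""
    gene: str    # query gene
    genome: str  # genome that contains the query gene
    target: str  # protein in another genome
    start: int   # first aligned target residue (inclusive)
    end: int     # last aligned target residue (inclusive)


def fusion_candidates(hits: list[Hit], max_overlap: int = 10) -> set[tuple[str, str, str]]:
    """(gene1, gene2, target) triples in which two genes from the same genome align to
    essentially non-overlapping segments of one target protein: a fusion in the
    target's genome, or equivalently a fission in the query genome."""
    groups: dict[tuple[str, str], list[Hit]] = {}
    for h in hits:
        groups.setdefault((h.target, h.genome), []).append(h)

    found = set()
    for (target, _genome), group in groups.items():
        for h1, h2 in combinations(group, 2):
            if h1.gene == h2.gene:
                continue
            # number of shared target residues; <= 0 when the segments are disjoint
            overlap = min(h1.end, h2.end) - max(h1.start, h2.start) + 1
            if overlap <= max_overlap:
                g1, g2 = sorted((h1.gene, h2.gene))
                found.add((g1, g2, target))
    return found


example = [
    Hit("geneA", "genome1", "fusedAB", 1, 400),
    Hit("geneB", "genome1", "fusedAB", 405, 900),
    Hit("geneC", "genome2", "fusedAB", 1, 400),
    Hit("geneD", "genome2", "fusedAB", 50, 450),  # overlaps geneC heavily: not a split
]
print(fusion_candidates(example))  # {('geneA', 'geneB', 'fusedAB')}
```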
Predicting functional relations between genes using (conserved) genomic associations
http://www.bork.embl-heidelberg.de/STRING/
Genomic Context Types:
• Conserved neighborhood
• Co-occurrence
• Gene fusion
(Dandekar et al., 1998; Huynen and Bork, 1998; Marcotte et al., 1999; Overbeek et al., 1999; Enright et al., 1999; Pellegrini et al., 1999; Snel et al., NAR 1999; von Mering et al., NAR 2002)
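Conserved neighborhood can be scored by counting in how many genomes two genes (or rather their orthologous groups) are found next to each other. The sketch below assumes each genome is given as an ordered list of (orthologous group, strand) tuples for a single linear chromosome, and treats two groups as neighbors when they are adjacent on the same strand; the input format, the function names and the min_genomes threshold are assumptions for illustration.

```python
from collections import Counter

Genome = list[tuple[str, str]]  # ordered (orthologous group, strand) pairs


def neighbor_pairs(genome: Genome, max_gap: int = 0) -> set[frozenset[str]]:
    """Unordered pairs of groups at most max_gap genes apart and on the same strand.

    The chromosome is treated as linear: the last and first genes are not paired.
    """
    pairs = set()
    for i, (og1, strand1) in enumerate(genome):
        for og2, strand2 in genome[i + 1 : i + 2 + max_gap]:
            if strand1 == strand2 and og1 != og2:
                pairs.add(frozenset((og1, og2)))
    return pairs


def conserved_neighbors(genomes: dict[str, Genome], min_genomes: int = 2,
                        max_gap: int = 0) -> dict[frozenset[str], int]:
    """Pairs that are neighbors in at least min_genomes genomes, with their counts."""
    counts = Counter()
    for genome in genomes.values():
        counts.update(neighbor_pairs(genome, max_gap))  # each pair at most once per genome
    return {pair: n for pair, n in counts.items() if n >= min_genomes}


genomes = {
    "genome1": [("A", "+"), ("B", "+"), ("C", "-")],
    "genome2": [("B", "+"), ("A", "+"), ("D", "+")],
    "genome3": [("A", "-"), ("C", "-")],
}
print(conserved_neighbors(genomes))  # only the A-B pair is adjacent in two genomes
```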
Genomic associations correlate with a wide array of functional interactions
[Figure: breakdown of interaction types (physical interaction, complex, metabolic pathway, non-metabolic pathway, process, hypothetical, unknown interaction) for gene pairs associated by gene order conservation, co-occurrence in genomes and gene fusion]
Huynen et al., Genome Research 2000
Predicting the function of a disease gene product of unknown function: frataxin
• Friedreich’s ataxia
• No (homolog with) known function
[Figure: presence and absence of frataxin (cyaY/Yfh1) and of hscB/Jac1, hscA/ssq1, iscS/Nfs1, iscU/Isu1-2, iscA/Isa1-2, fdx/Yah1, RnaM, IscR, Hyp, Atm1, Nfu1 and Arh1 in 30 genomes: P. falciparum, E. cuniculi, H. sapiens, D. melanogaster, C. elegans, S. cerevisiae, C. albicans, S. pombe, A. thaliana, M. jannaschii, A. pernix, E. coli, P. multocida, H. influenzae, V. cholerae, Buchnera, P. aeruginosa, X. fastidiosa, N. meningitidis, M. loti, C. crescentus, R. prowazekii, C. jejuni, H. pylori, D. radiodurans, M. tuberculosis, M. genitalium, B. subtilis, Synechocystis, A. aeolicus]
Frataxin has co-evolved with hscA and hscB, indicating that it plays a role in iron-sulfur cluster assembly (Huynen et al., Hum Mol Genet 2001)
Iron-sulfur (2Fe-2S) cluster in the Rieske protein (Iwata et al., Structure 1996)
Prediction:
Confirmation:
Zooming in on one mitochondrial complex: NADH:ubiquinone oxidoreductase (Complex I)
- Complex I deficiency is a severe hereditary disease (patients < 5 years) without therapy
- For 60% of the patients, no mutation is found in the known CI genes
Tracing the evolution of Complex I from 14 subunits in the Bacteria to 46 subunits in the Mammals by comparative genome analysis
Bacteria: 14 subunits
Fungi: 37 subunits
Plants: 30 subunits
Algae: 30 subunits
Mammals: 46 subunits
Phylogenomics for protein function prediction
An ancient paralog of N7BM has been lost in the same lineages as N7BM itself, suggesting that the paralog, too, may play a role in Complex I
Gabaldón, Rainey and Huynen, JMB (2005)
Experimentally confirmed protein functions, predicted with various types of context:
Gene order                    13
Gene fusion                    3
Regulatory element             3
Complementary distribution     4
Co-occurrence                  4
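We read “compl. distribution” as complementary phylogenetic distribution: two genes that are (almost) never present in the same genome while together covering all genomes, which can point to two unrelated proteins carrying out the same function (non-orthologous gene displacement). Under that assumption, here is a minimal sketch on 0/1 profiles like the M. genitalium ones earlier; the scoring function, the threshold and the example profiles are hypothetical.

```python
def complementarity(x: str, y: str) -> float:
    """Fraction of genomes in which exactly one of the two genes is present."""
    if len(x) != len(y):
        raise ValueError("profiles must cover the same genomes")
    return sum(a != b for a, b in zip(x, y)) / len(x)


def complementary_pairs(profiles: dict[str, str],
                        min_score: float = 1.0) -> list[tuple[str, str, float]]:
    """All gene pairs whose profiles are complementary in at least min_score of genomes."""
    names = sorted(profiles)
    out = []
    for i, g1 in enumerate(names):
        for g2 in names[i + 1:]:
            score = complementarity(profiles[g1], profiles[g2])
            if score >= min_score:
                out.append((g1, g2, score))
    return out


profiles = {"geneX": "1100110", "geneY": "0011001", "geneZ": "1111111"}
print(complementary_pairs(profiles))  # [('geneX', 'geneY', 1.0)]
```

Lowering min_score tolerates a few genomes that contain both or neither gene, at the cost of more false candidates.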
Conclusions: conservation of coregulation and interaction
• Gene coregulation tends to be conserved in eukaryotes (76%) and in prokaryotes (80%)
• In the case of gene duplication, one gene tends to maintain the coregulatory link: there appears to be one functionally equivalent ortholog
Snel et al., Nucleic Acids Res 2004
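One way to compute a percentage like the 76% or 80% above is to take the coregulated gene pairs of one species and ask which fraction of them are also coregulated between their orthologs in another species. As a hedged sketch only (not necessarily how Snel et al. counted), the function below assumes a one-to-one ortholog map and counts only pairs for which both genes have an ortholog; all gene names in the example are hypothetical.

```python
def coregulation_conservation(pairs_a: list[tuple[str, str]],
                              pairs_b: set[frozenset[str]],
                              orthologs: dict[str, str]) -> float:
    """Fraction of coregulated pairs in species A whose orthologs are coregulated in B.

    pairs_a:   coregulated gene pairs in species A
    pairs_b:   coregulated gene pairs in species B (unordered)
    orthologs: assumed one-to-one map from A genes to B genes; duplicated genes
               would need extra handling
    """
    testable = conserved = 0
    for g1, g2 in pairs_a:
        if g1 in orthologs and g2 in orthologs:  # only pairs with both orthologs can be tested
            testable += 1
            if frozenset((orthologs[g1], orthologs[g2])) in pairs_b:
                conserved += 1
    return conserved / testable if testable else float("nan")


pairs_a = [("a1", "a2"), ("a1", "a3"), ("a4", "a5")]
orthologs = {"a1": "b1", "a2": "b2", "a3": "b3"}
pairs_b = {frozenset(("b1", "b2"))}
print(coregulation_conservation(pairs_a, pairs_b, orthologs))  # 0.5
```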
There is a lot of information out there. Combining it to make specific, testable predictions that get us closer to “function” is possible, but it (still) requires manual work.
(Predictions are easy; specific predictions a bit less so.)