CMBI
Center for Molecular and Biomolecular Informatics
• Drug design (Jacob de Vlieg, Organon)
• Bacterial genomics, systems biology of Lactobacillus plantarum (Ronald Siezen)
• Comparative genomics (Martijn Huynen)
• Protein structure bioinformatics (Gert Vriend)

Huynen group: genome comparison

De Vlieg group: drug design

Siezen group: bacterial genomics

Vriend’s group: predicting the effect of mutations on the protein structure
A mutation in rhodopsin (eye) impairs the binding of retinal.

Warning: Bioinformatics can be addictive
You can check out any time you want, but you can never leave.

Human genome, great expectations
A large fraction of the human genes has an unknown function (Science, 2001)

Predicting Protein Function
What is function? Various levels of description:
Sequence similarity/homology has the largest relevance for “Molecular Function”. This aspect of protein function is best conserved. Molecular function can often be predicted from similarities between protein sequences (BLAST) or structures.

Comparative genomics
The (somewhat) intelligent comparative genomics meat grinder
Prediction of protein function, pathways

Predicting functional interactions between proteins by the co-occurrence of their genes in genomes.
Distribution of four M. genitalium genes among 25 genomes:
MG299 (pta)    0001100001101011000101111
MG357 (ackA)   0001100001101011000101111
MG019 (dnaJ)   0011111101111011100111111
MG305 (dnaK)   0011111101111011100111111
Using the mutual information between genes as a scoring heuristic for their co-occurrence.
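As a minimal sketch (not the authors' code), the mutual information between two presence/absence profiles can be computed as follows. The function name mutual_information and the PROFILES dictionary are illustrative assumptions; the bit strings are the four profiles listed above. The use of natural logarithms is also an assumption, although with that choice the sketch gives 0.69, 0.55 and 0.19 for the three gene pairs scored on the slide.

```python
from collections import Counter
from math import log

# Presence/absence profiles copied from the slide (one character per genome).
PROFILES = {
    "pta":  "0001100001101011000101111",
    "ackA": "0001100001101011000101111",
    "dnaJ": "0011111101111011100111111",
    "dnaK": "0011111101111011100111111",
}


def mutual_information(a: str, b: str) -> float:
    """Mutual information (in nats) between two equal-length 0/1 profiles."""
    if len(a) != len(b):
        raise ValueError("profiles must cover the same set of genomes")
    n = len(a)
    joint = Counter(zip(a, b))   # counts of (a_i, b_i) pairs over all genomes
    count_a = Counter(a)         # marginal counts for gene a
    count_b = Counter(b)         # marginal counts for gene b
    total = 0.0
    for (x, y), n_xy in joint.items():
        # p(x,y) * ln(p(x,y) / (p(x) p(y))); unseen combinations contribute 0
        total += (n_xy / n) * log(n_xy * n / (count_a[x] * count_b[y]))
    return total


if __name__ == "__main__":
    for g1, g2 in [("pta", "ackA"), ("dnaJ", "dnaK"), ("dnaJ", "ackA")]:
        score = mutual_information(PROFILES[g1], PROFILES[g2])
        print(f"M({g1}, {g2}) = {score:.2f}")
```

For two identical profiles the score equals the entropy of the profile, so a gene pair present in roughly half of the genomes (pta, ackA) scores higher than an equally well-matched pair present in most genomes (dnaJ, dnaK).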
M(pta, ackA) = 0.69 (phosphotransacetylase, acetate kinase)
M(dnaJ, dnaK) = 0.55 (heat shock proteins)
M(dnaJ, ackA) = 0.19

Gene fission in the evolution of carbamoyl phosphate synthase B (carB)
[Figure: tree of CarB / PyrAB sequences. Labels: YJR109C, D2085.1, YJL130C, Rv1384, sll0370, AF1274, AQ2101 & AQ1172, HP0919, EC0033, MTH997 & MTH996, MJ1378 & MJ1381]

Predicting functional relations between genes using (conserved) genomic associations
http://www.bork.embl-heidelberg.de/STRING/
Genomic context types:
• Conserved neighborhood
• Co-occurrence
• Gene fusion
Dandekar et al., 1998; Marcotte et al., 1999; Huynen and Bork, 1998; Overbeek et al., 1999; Enright et al., 1999; Pellegrini et al., 1999; Snel et al., NAR 1999; von Mering et al., NAR 2002

Genomic associations correlate with a wide array of functional interactions
[Figure: three panels (Gene Order Conservation, Co-occurrence in Genomes, Gene Fusion) showing the share of each interaction type: physical interaction, complex, metabolic pathway, non-metabolic pathway, process, hypothetical, unknown interaction]
Huynen et al., Genome Research 2000

Predicting function of a disease gene protein with unknown function: frataxin
• Friedreich’s ataxia
• No (homolog with) known function
[Figure labels, genomes: P. falciparum, E. cuniculi, H. sapiens, D. melanogaster, C. elegans, S. cerevisiae, C. albicans, S. pombe, A. thaliana, M. jannaschii, A. pernix, E. coli, P. multocida, H. influenzae, V. cholerae, Buchnera, P. aeruginosa, X. fastidiosa, N. meningitidis, M. loti, C. crescentus, R. prowazekii, C. jejuni, H. pylori, D. radiodurans, M. tuberculosis, M. genitalium, B. subtilis, Synechocystis, A. aeolicus]
[Figure labels, genes: cyaY/Yfh1, hscB/Jac1, hscA/ssq1, iscS/Nfs1, iscU/Isu1-2, iscA/Isa1-2, fdx/Yah1, RnaM, IscR, Hyp, Atm1, Nfu1, Arh1]
Frataxin has co-evolved with hscA and hscB, indicating that it plays a role in iron-sulfur cluster assembly (Huynen et al., Hum Mol Gen 2001)
Iron-sulfur (2Fe-2S) cluster in the Rieske protein (Iwata et al., Structure 1996)
Prediction: Confirmation:

Zooming in on one mitochondrial complex, NADH:ubiquinone oxidoreductase (Complex I)
- Complex I deficiency is a severe hereditary disease (patients < 5 years) without therapy
- For 60% of the patients no mutation is found in known CI genes

Tracing the evolution of Complex I from 14 subunits in the Bacteria to 46 subunits in the Mammals by comparative genome analysis
Bacteria: 14 subunits
Algae: 30
Plants: 30
Fungi: 37
Mammals: 46

Phylogenomics for protein function prediction
An ancient paralog of N7BM has been lost in the same lineages as N7BM itself, suggesting a possible role in Complex I
Gabaldon, Rainey and Huynen, JMB (2005)

Experimentally confirmed protein functions, predicted with various types of context
13 gene order
3 gene fusion
3 regulatory element
4 complementary distribution
4 co-occurrence

Conclusions: conservation of coregulation, interaction
• Gene co-regulation tends to be conserved in eukaryotes (76%) and in prokaryotes (80%)
• In the case of gene duplication, one gene tends to maintain the coregulatory link: there appears to be one functionally equivalent ortholog
Snel et al., Nucleic Acids Res 2004

There is a lot of information out there. Combining it to make specific, testable predictions that get us closer to “function” is possible but does (still) require handwork. (Predictions are easy, specific predictions a bit less so.)