Symplectic Biology: universals in bacterial genomics 31 july 2006 Génétique des Génomes Bactériens http://www.pasteur.fr/recherche/unites/REG/ [email protected].
Symplectic biology: The Delphic Boat
Biology is a science of relationships between objects rather than of objects themselves: "symplectic" comes from sun, together, and plektein, to weave.
Proteins are part of complexes, as parts are in an engine.
As in building a boat, failing to understand these relationships will result in the ultimate failure of synthetic biology.
The Delphic Boat: Harvard University Press, February 2003

Authors
Génétique des Génomes Bactériens: Gang Fang, Evelyne Krin, Etienne Larsabal, Géraldine Pascal, Eduardo Rocha, Agnieszka Sekowska
Genoscope: Valérie Barbe, Stéphane Cruveiller, Sophie Mangenot, Géraldine Pascal, Zoé Rouy, David Vallenet, Claudine Médigue
Génétique in silico: Marc Bailly-Béchet, Massimo Vergassola
Abdus Salam International Centre for Theoretical Physics: Mudassar Iqbal, Matteo Marsili

Context
Physics: matter, energy, time
Statistical physics: Physics + information
Biology: Physics + information, coding, control...
Arithmetic: sequences of integers, recursivity, coding...
Computation: Arithmetic + program + machine...
A metaphor with practical consequences, the genetic program: we know how to manipulate the genes and their products

What is Life?
Three processes are needed for Life:
1. Metabolism
2. Compartmentalisation
3. Information transfer (Living Computers?) => the goal of genomics is to decipher the program associated with the machine
Processes 1 and 2 are the driving force for a coupling between the genome structure and the structure of the cell.
The cell is the atom of life, with two strategies: a single envelope (prokaryotes) or a multiplication of membranes and skins (eukaryotes). This is correlated with the genome sequence: at first sight prokaryotic genomes look random and eukaryotic genomes look repeated.

Information transfer
Replication (law: "complementarity"; concept: "effective")
Transcription (law: "complementarity"; concept: "constructive")
Translation (law: a "cypher", the "genetic code"; concept: "prospective")
Myhill, J. (1952) Some philosophical implications of mathematical logic. Three classes of ideas. The Review of Metaphysics 6: 165-198.

Process
The action process
Replication, transcription, translation: high parallelism
"Beginning, Repeated Routine and Check Points, End"
The action is always oriented, with a beginning and an end
The control process of Check Points is rarely taken into account in present research (except in replication/division), but its role is essential to permit the coordination of multiple actions in parallel

Machines...
What is computing? Two things are needed for computing:
A read/write machine
A program on a physical support (typically, a tape illustrates the sequential string of symbols that makes up the program), split (in practice) into two entities: Program (providing the goal) and Data (providing the context)
The machine is distinct from the program

Cells as computers
Genetic studies rest on the description of genomes as texts written with a four-letter alphabet: do cells behave as computers?
Horizontal gene transfer, viruses, genetic engineering (=> reconstruction of the hepatitis C virus) and animal cloning all point to a separation between a "Machine" (the cell factory) and Data + Program.

Turing
Is there a map of the cell in the chromosome?
If the machine has not only to behave as a computer but also to construct the machine itself, one must find an image of the machine somewhere in the machine (John von Neumann)

Drosophiloculus, Homunculus?

Transition

Genome organisation
Is the gene order random in the chromosomes?
At first sight, consistent with different DNA management processes, not much is conserved, and genes transferred from other organisms are distributed throughout genomes.
However, groups of genes such as operons or pathogenicity islands tend to cluster in specific places, and they code for proteins with common functions. "Persistent" genes are clustered together.
Also, some motifs are ubiquitously present, suggesting general rules constraining genome organisation.
E Larsabal, A Danchin. Genomes are covered with ubiquitous 11 bp periodic patterns, the "class A flexible patterns". BMC Bioinformatics (2005) 6: 206

Universal Motifs
Larsabal E, Danchin A. BMC Bioinformatics (2005) 6: 206
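As a rough illustration of how a periodicity of about 11 bp can be looked for in a sequence, the sketch below counts how often a given nucleotide recurs at each distance k and normalises by the number of positions compared. It is a minimal sketch, not the method of the cited paper; the function name `same_base_spacing` and the toy sequence are assumptions made for this example.

```python
def same_base_spacing(seq, base, max_dist):
    """For k = 1..max_dist, return the fraction of positions i
    where both seq[i] and seq[i + k] equal `base`."""
    seq = seq.upper()
    result = {}
    for k in range(1, max_dist + 1):
        n = len(seq) - k  # number of position pairs at distance k
        if n <= 0:
            break
        hits = sum(1 for i in range(n) if seq[i] == base and seq[i + k] == base)
        result[k] = hits / n
    return result


# Hypothetical toy sequence: an A every 11 bases, C everywhere else.
toy = ("A" + "C" * 10) * 20
freqs = same_base_spacing(toy, "A", 30)
best = max(freqs, key=freqs.get)  # distance with the highest recurrence
print(best, freqs[best])
```

On a real genome the signal is much weaker than in this toy case, so the recurrence profile would have to be compared with a null model (for instance a shuffled sequence) before reading anything into a peak.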
A universal rule: class A flexible patterns
The flexible nature of the patterns permits DNA to accommodate superturns or local bending

A universal feature of the program: the period of 10-11.5
[Figure: Helicobacter pylori; panels "real", "model" and "difference", x-axis 0 to 100 bp; the difference panel is labelled fs(G-) and spans -0.01 to 0.01]

Flexible motifs of type A
1-xAxxxxTxxxxAxxxxTTxxxxxAxxxxTxxxxAxxx: All kingdoms
2-xxxxxxxxxxxGxxxxTTxxxCxxxxxTxxxxxxxxx: Proteobacteria
4-xxxxxxTxxxxAGxxxTTxxxxxxxxTxxxxxxxxxx: Archaea
5'-xxx-10xxxxxxxxx0xxxxxxxx10xxxxxxbp-3'
[Figure: the pattern TTxxxGxxxTxxxxxxxxxxTT drawn on the double helix with its complementary bases, seen from two sides]
The nucleotides composing this class A flexible pattern are fully accessible through this side and the dinucleotides are set in major grooves.
The nucleotides composing this class A flexible pattern are accessible through this side too, but the dinucleotides are set in minor grooves.

From the leading strand to the lagging strand
[Figure: circular maps from Ori (0) through 90, Ter (180) and 270, showing CDS density and leading-strand CDS density for four genomes]
Escherichia coli: 55% leading
Bacillus subtilis: 75% leading
Treponema pallidum: 65% leading
Thermoanaerobacter tengcongensis: 87% leading

To lead or to lag...
Is it possible to see whether the genes of a chromosome are distributed at random between the leading and the lagging strand?
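One simple way to approach this question, assuming the origin and terminus are already known, is to count the genes that are co-oriented with the replication fork and compare the count with the 50% expected by chance. The sketch below does this with an exact two-sided binomial test. The gene list, coordinates and function names are hypothetical and serve only to illustrate the counting.

```python
from math import comb


def on_leading_strand(pos, strand, ori, ter, genome_len):
    """True if a gene is co-oriented with the replication fork.

    On the replichore replicated in the direction of increasing
    coordinates, '+' genes are leading; on the other one, '-' genes are.
    """
    d_gene = (pos - ori) % genome_len
    d_ter = (ter - ori) % genome_len
    clockwise_replichore = d_gene < d_ter
    return (strand == "+") == clockwise_replichore


def leading_fraction(genes, ori, ter, genome_len):
    flags = [on_leading_strand(p, s, ori, ter, genome_len) for p, s in genes]
    return sum(flags) / len(flags)


def binomial_two_sided_p(k, n):
    """Exact two-sided p-value for k successes in n trials under p = 0.5."""
    probs = [comb(n, i) / 2**n for i in range(n + 1)]
    return min(1.0, sum(p for p in probs if p <= probs[k] * (1 + 1e-9)))


# Hypothetical toy chromosome: 1000 bp, origin at 0, terminus at 500.
genes = [(100, "+"), (200, "+"), (300, "-"), (400, "+"),
         (600, "-"), (700, "-"), (800, "+"), (900, "-")]
frac = leading_fraction(genes, ori=0, ter=500, genome_len=1000)
p_value = binomial_two_sided_p(round(frac * len(genes)), len(genes))
print(frac, p_value)
```

With only eight toy genes the test is not significant; with the thousands of genes of a real genome, biases such as the percentages quoted above would stand out clearly from 50%.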
To lag or to lead...
Choosing an origin of replication arbitrarily, and a property of the strand (base composition, codon composition, codon usage, amino acid composition of the coded protein...), one can use discriminant analysis to see whether the hypothesis holds.
E. Rocha, A. Danchin & A. Viari. Universal replication biases in bacteria. Mol. Microbiol. (1999) 32: 11-16

That is the question...
[Figure: accuracy (y-axis) against position in % (x-axis, 0 to 100), with curves for bases, dinucleotides, codons and amino acids. Panels: Bacillus subtilis, Borrelia burgdorferi, Chlamydia trachomatis, Escherichia coli, Haemophilus influenzae, Helicobacter pylori, Methanobacterium thermoautotrophicum, Mycobacterium tuberculosis, Treponema pallidum]

Visible in proteins...
GT on the leading strand, CA on the lagging strand...

Essential genes in Bacillus subtilis: function enters
[Figure: share of genes on the leading and lagging strands (0% to 100%) for essential and non-essential genes, each split into non-highly expressed and highly expressed]
Rocha EP, Danchin A. Essentiality, not expressiveness, drives gene-strand bias in bacteria. Nature Genetics (2003) 34: 377-378.
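The origin-scanning idea of the "To lag or to lead..." slide can be sketched as follows: for each candidate origin, label every gene as leading or lagging, then measure how well a strand property separates the two labels. The sketch uses a nearest-centroid rule on a single feature (the excess of G+T over C+A in the coding sequence) as a simplified stand-in for discriminant analysis; it is not the procedure of the cited paper. The terminus is assumed to lie opposite the origin, and all names and toy data are hypothetical.

```python
def gt_excess(cds):
    """(G + T - C - A) / length of a coding sequence."""
    cds = cds.upper()
    gt = cds.count("G") + cds.count("T")
    ca = cds.count("C") + cds.count("A")
    return (gt - ca) / len(cds)


def labels_for_origin(genes, ori, genome_len):
    """Leading (True) / lagging (False) labels, assuming ter is opposite ori."""
    half = genome_len / 2
    labels = []
    for pos, strand, _ in genes:
        clockwise = (pos - ori) % genome_len < half
        labels.append((strand == "+") == clockwise)
    return labels


def centroid_accuracy(values, labels):
    """Resubstitution accuracy of a nearest-centroid rule on one feature."""
    lead = [v for v, is_lead in zip(values, labels) if is_lead]
    lag = [v for v, is_lead in zip(values, labels) if not is_lead]
    if not lead or not lag:
        return 0.5  # one class is empty: nothing to discriminate
    c_lead = sum(lead) / len(lead)
    c_lag = sum(lag) / len(lag)
    correct = sum(
        (abs(v - c_lead) < abs(v - c_lag)) == is_lead
        for v, is_lead in zip(values, labels)
    )
    return correct / len(values)


def scan_origins(genes, genome_len, step):
    values = [gt_excess(cds) for _, _, cds in genes]
    return {
        ori: centroid_accuracy(values, labels_for_origin(genes, ori, genome_len))
        for ori in range(0, genome_len, step)
    }


# Hypothetical toy genome (1000 bp, true origin at 0): GT-rich leading genes.
genes = []
for pos in range(50, 1000, 100):
    for strand in "+-":
        leading = (strand == "+") == (pos < 500)
        genes.append((pos, strand, "GTGTGTCA" if leading else "CACACAGT"))
accuracy = scan_origins(genes, genome_len=1000, step=250)
print(accuracy)
```

Because swapping the two labels leaves the accuracy of this rule unchanged, the scan peaks both at the true origin and at the opposite position; telling origin from terminus needs additional information.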
When polymerases collide
Co-oriented: DNAP deceleration; end of transcription
Head-on: arrest of RNAP & DNAP; transcription abortion
Consequences: 1. Replication slow-down 2. Loss of transcripts
Consequences: 1. Aborted transcripts 2. Truncated essential proteins

From function to structure, not vice versa

The first discovery of genomics
In 1991, at the EU meeting on genome programs in Elounda, Greece, the presentation of yeast chromosome III and of the first 100 kb of the Bacillus subtilis genome revealed that, contrary to expectation (the only cases where this had been observed were phages, for obvious reasons), at least half of the genes uncovered were totally unknown, whether in structure or in function.
Among the reasons for this are our present lack of deep knowledge of metabolism and our lack of knowledge of the way new genes are created: function is selected first, then a structure is recruited that will be improved as it is submitted to natural selection for increased fitness of its host (acquisitive evolution).

Genome projects
2045 ongoing projects, 348 completed, mostly from microbes (228 with more than 1500 genes, more or less correctly annotated)
144,116,054,623 nucleotides at the International Nucleotide Sequence Database Collaboration (INSDC)
Microbes make up 50% of the Earth's protoplasm
40-50% of coding DNA sequences (CDSs) do not correspond to known functions; 10% correspond to the core genome ("persistent" genes)

The darwinian trio
Variation / Selection / Amplification
Stabilisation
Evolution -> creates -> Function -> captures (recruits) -> Structure -> code -> Sequence

What functions for life?
Extending Cuvier's vision
Root function and helper functions [the root function of a printer is "printing"; "feeding paper" and "supplying energy" are helper functions]
To be (to persist in time) can be proposed as the root function of living organisms
Self-consistence implies correlation of forms
Fighting weathering implies chemical turnover (metabolism) and protection (compartmentalisation)
Exploration, associated with sensing and memorizing, is the discovery that made life as we know it

What functions for life?
[Diagram of functions; terms in reading order: exploration; sensing; representation (memory); Compartmentalisation; Metabolism; energisation; shaping; making; Information transfer; replication; construction; transcription; making of biomass; appendages; precursors; envelope; skeleton; phospholipid and envelope biosynthesis; translation; transport; degradation; editing; circulation (channelling); salvage; folding/scaffolding; protection; cleaning; control; partitioning; inactivation; storage; maintenance (repair, degradation); modification (labelling, maturation, addressing, stabilisation, protection, control)]

From sequence to function?
Inductive reasoning combines sequence data and in silico predictions, which are tested using expression profiling (transcriptome and proteome) as well as many other "neighbourhoods", such as amino acid composition, isoelectric point in proteins or codon usage biases in genes.
One notes that regulation evolves much faster than any other process: in the long run the structural genes are the most important ones.

Three examples of the role of the context
Microbial genes are of infinite diversity, but there exist universals; only about 10% of their genes are of persistent and recognized function; we do not yet have a fair idea of the number of microbial species; the number of genes in a given species is highly variable (horizontal gene transfer)
Example 1: persistent genes
Example 2: orphan genes and universal amino acids
[Example 3: a new metabolic pathway] ....

Gene persistence
Some of the essential genes missing from the list of persistent genes have diverged considerably. To assess the contribution of this effect we measured, for each pair of genomes, the correlation between the similarity of orthologous pairs and that of the 16S rRNA. The correlations were high. For example (A), 38% (resp. 48%) of B. subtilis (resp. E. coli) persistent genes showed a correlation coefficient >0.9 between the sequence similarity of the pair of orthologs and that of the 16S rRNA. In contrast, some genes (B) evolve in an erratic way. This may be due to horizontal gene transfer, to local adaptations leading to a faster or slower evolutionary pace, or simply to wrong assignments of orthology. The latter can be a significant problem, especially in large protein families.
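The comparison described above can be illustrated with a plain Pearson correlation between two lists of similarities, one value per genome pair. The slide does not say which correlation coefficient was used, so Pearson's r is an assumption here, and the similarity values below are invented toy numbers, not data from the study.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length >= 2")
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant list")
    return sxy / sqrt(sxx * syy)


# Hypothetical similarities for five genome pairs.
rrna_16s = [0.99, 0.95, 0.90, 0.85, 0.80]    # 16S rRNA similarity
clocklike = [0.90, 0.80, 0.70, 0.60, 0.50]   # ortholog tracking the 16S
erratic = [0.50, 0.90, 0.40, 0.80, 0.60]     # ortholog evolving erratically

r_clock = pearson(rrna_16s, clocklike)
r_erratic = pearson(rrna_16s, erratic)
print(r_clock > 0.9, abs(r_erratic) < 0.5)
```

A gene of type (A) would pass a threshold such as r > 0.9, while a gene of type (B) would not.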
The genes presenting such an erratic pattern are seldom found in the persistent set.
G Fang, EPC Rocha, A Danchin. How essential are non-essential genes? Mol Biol Evol (2005) 22: 2147-2156

Genomic islands
A clustering method that analyses codon usage biases using information theory groups the genes into homogeneous clusters, which are not distributed randomly in the chromosome. The method finds both the specific codon usage bias of each class and the most relevant number of classes (4 for E. coli and 5 for B. subtilis). One cluster is related to expression levels. Other groups feature an over-representation of genes belonging to different functional groups: horizontally transferred genes, motility and intermediary metabolism.
M. Bailly-Béchet
M Bailly-Bechet, A Danchin, M Iqbal, M Marsili, M Vergassola. Codon usage domains over bacterial chromosomes. PLoS Computational Biology (2006) 2: April 20th

What functions for life?
Scenario for the origin of life
To be (to persist in time) can be proposed as the root function of living organisms
Fighting weathering implies chemical turnover (metabolism) on solid surfaces, and immobility requires protection (compartmentalisation)
Compartmentalised metabolism creates surface substitutes (RNA)
Exploration, associated with sensing and memorizing (information transfer), is the discovery that made life as we know it
A Danchin. Homeotopic transformation and the origin of translation. Progress in Biophysics and Molecular Biology (1989) 54: 81-86

Persistent genes organisation recapitulates the origin of life
Using 228 genomes, we have identified genes that tend to remain close to one another; this "mutual attraction" constructs a remarkable network made of three concentric circles.
The external network, made from genes of intermediary metabolism, is highly fragmented; the middle network has tRNA synthetases at its core; and the internal network, almost continuous, makes the core of information transfer around the ribosome, transcription and replication.
G. Fang

Thank you

Coping with cold and aging: Lessons from Pseudoalteromonas haloplanktis
1 August 2006

Authors
Génétique des Génomes Bactériens: Gang Fang, Evelyne Krin, Etienne Larsabal, Géraldine Pascal, Eduardo Rocha, Agnieszka Sekowska
Genoscope: Valérie Barbe, Stéphane Cruveiller, Sophie Mangenot, Géraldine Pascal, Zoé Rouy, David Vallenet, Claudine Médigue
University of Hong Kong: Christine Ho, Frankie Cheung
University of Liège: Georges Feller
University of Naples: Luisa Tutino
University of Strasbourg: Philippe Bertin
University of Stockholm: Gunnar von Heijne

Ecological neighbourhood: growth in the cold

Challenges posed by cold
Protein folding
RNA folding
Membrane fluidity
Sensitivity towards oxygen and radicals
Distribution of amino acids in proteins...
=> A series of unexpected answers

Genome Organisation: P. haloplanktis

Bias in amino acid distribution
Neighbourhood: distribution of amino acids in the proteome
G. Pascal

Universal biases in protein amino acid composition
First axis: separates Integral Inner Membrane Proteins (IIMP) from the rest; driven by the opposition between charged and large hydrophobic residues
Second axis: separates proteins according to an opposition driven by the G+C content of the first codon base
Third axis: separates proteins by their content in aromatic amino acids; enriched in orphan proteins

Temperature-dependent biases in protein amino acid composition
• The general trend of amino acid composition bias is to avoid some amino acids at higher temperatures (associated with aging processes)
• Mesophilic bacteria belong to at least two different classes (in a five-cluster analysis)
• Biases are always dominated by the IIMP clustering
C Médigue, E Krin, G Pascal, V Barbe, A Bernsel, PN Bertin, F Cheung, S Cruveiller, S D'Amico, A Duilio, G Fang, G Feller, C Ho, S Mangenot, G Marino, J Nilsson, E Parrilli, EPC Rocha, Z Rouy, A Sekowska, ML Tutino, D Vallenet, G von Heijne, A Danchin. Coping with cold: the genome of the versatile marine Antarctica bacterium Pseudoalteromonas haloplanktis TAC125. Genome Research (2005) 15: 1325-1335

Biases correlated with temperature
An asparagine bias is specific to psychrophiles
IIMPs
Motility
Envelope, outer membrane
Transport (TonB), secretion
Adaptation to stress
DNA and RNA metabolism
53% mesophiles, 55% psychrophiles, 62% thermophiles
G. Pascal
[isoaspartate]

Orphans: the gluons
A remarkable role of aromatic amino acids creates a universal bias.
Expressed orphan proteins are enriched in these residues, suggesting that they might participate in a process of gain of function during evolution. We postulate that the majority is made of proteins ("gluons") involved in stabilising complexes, thus defining the "self" of the species.
G Pascal, C Médigue, A Danchin. Universal biases in protein composition of model prokaryotes. Proteins (2005) 60: 27-35

Why aromatic residues in orphan proteins?
From orphans: "Gluons"
G. Pascal
Orphans lose their status in the course of evolution: Rocha. 2002. Pedulla. 2003

Thank you