Symplectic Biology: universals in bacterial genomics 31 july 2006 Génétique des Génomes Bactériens http://www.pasteur.fr/recherche/unites/REG/ [email protected].

Symplectic Biology:
universals in bacterial genomics
31 July 2006
Génétique des Génomes Bactériens
http://www.pasteur.fr/recherche/unites/REG/
[email protected]
Symplectic biology:
The Delphic Boat
Biology is a science of relationships between objects rather than of objects themselves. "Symplectic" comes from the Greek sun, together, and plektein, to weave
Proteins are parts of complexes, as are the parts of an engine
As when constructing a boat, failing to understand the relationships between the parts will result in the ultimate failure of synthetic biology
The Delphic Boat: Harvard University Press, February 2003
Authors
Génétique des Génomes Bactériens
Gang Fang
Evelyne Krin
Etienne Larsabal
Géraldine Pascal
Eduardo Rocha
Agnieszka Sekowska
Genoscope
Valérie Barbe
Stéphane Cruveiller
Sophie Mangenot
Géraldine Pascal
Zoé Rouy
David Vallenet
Claudine Médigue
Génétique in silico
 Marc Bailly-Béchet
 Massimo Vergassola
Abdus Salam International Centre for Theoretical Physics
Mudassar Iqbal
Matteo Marsili
Context
Physics: matter, energy, time
Statistical physics: Physics + information
Biology: Physics + information, coding, control...
Arithmetic: sequence of integers, recursion, coding…
Computation: Arithmetic + program + machine...
A metaphor with practical consequences, the genetic program: we know how to manipulate the genes and their products
What is Life?
Three processes are needed for Life:
1. Metabolism
2. Compartmentalisation
3. Information transfer (Living Computers?) => the goal of genomics is to decipher the program associated with the machine
Driving force for a coupling between the genome structure and the structure of the cell:
The cell is the atom of life, with two strategies: a single envelope (prokaryotes) or multiplication of membranes and skins (eukaryotes); this is correlated with the genome sequence: at first sight prokaryotic genomes look random and eukaryotic genomes look repeated
Information transfer
Replication (law: “complementarity”; concept: “effective”)
Transcription (law: “complementarity”; concept: “constructive”)
Translation (law: a “cypher”, the “genetic code”; concept: “prospective”)
Myhill, J. (1952) Some philosophical implications of mathematical logic. Three classes of ideas. The Review of Metaphysics 6: 165-198.
Process
The action process
Replication, transcription, translation: high parallelism
“Beginning, Repeated Routine and Check Points, End”
The action is always oriented, with a beginning and an end
The control process of Check Points is rarely taken into
account in present research (except in replication/division),
but its role is essential to permit coordination of multiple
actions in parallel
Machines…
What is computing?
Two processes are needed for computing:
A read/write machine
A program on a physical support (typically, a tape illustrates the
sequential string of symbols that makes up the program), split (in
practice) into two entities:
Program (providing the goal)
Data (providing the context)
The machine is distinct from the program
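The separation between machine, program and data can be illustrated with a minimal sketch of a read/write machine. Everything here is an assumption made for illustration (the `run` function, the `COMPLEMENT` rule table, the `_` blank symbol); it is not taken from the talk.

```python
from typing import Dict, List, Tuple

# A program maps (state, symbol read) -> (symbol to write, head move, next state).
Program = Dict[Tuple[str, str], Tuple[str, int, str]]


def run(program: Program, tape: List[str], state: str = "start",
        max_steps: int = 1000) -> List[str]:
    """The machine: it reads, looks up the program, writes and moves.

    The machine knows nothing about what the program computes.
    """
    tape = list(tape)  # work on a copy of the data
    head = 0
    for _ in range(max_steps):
        if state == "halt":
            break
        if head == len(tape):
            tape.append("_")  # extend the tape with a blank symbol
        write, move, state = program[(state, tape[head])]
        tape[head] = write
        head += move
    return tape


# A hypothetical program: complement each base, halt on the first blank.
COMPLEMENT: Program = {
    ("start", "A"): ("T", 1, "start"),
    ("start", "T"): ("A", 1, "start"),
    ("start", "G"): ("C", 1, "start"),
    ("start", "C"): ("G", 1, "start"),
    ("start", "_"): ("_", 0, "halt"),
}

# The data: a made-up sequence written on the tape.
result = run(COMPLEMENT, list("GATTACA"))
print("".join(result))
```

The same `run` machine executes any other rule table unchanged, which is the point of the distinction drawn above.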
Cells as computers
Genetic studies rest on the description of genomes as texts written with a four-letter alphabet: do cells behave as computers?
Horizontal gene transfer
Viruses
Genetic engineering => reconstruction of the hepatitis C virus
Animal cloning
all point to a separation between
a « Machine » (cell factory)
and
Data + Programme
Turing
Is there a map of the cell
in the chromosome?
If the machine has not only to behave as a
computer but has also to construct the
machine itself, one must find an image of
the machine somewhere in the machine
(John von Neumann)
Drosophiloculus,
Homunculus ?
Transition
Genome organisation
Is the gene order random in the chromosomes?
At first sight, consistent with different DNA management processes, not much is conserved, and genes transferred from other organisms are distributed throughout genomes
However, groups of genes such as operons or pathogenicity
islands tend to cluster in specific places, and they code for
proteins with common functions. « Persistent » genes are
clustered together
Also, some motifs are ubiquitously present, suggesting
general rules constraining genome organisation
E Larsabal, A Danchin
Genomes are covered with ubiquitous 11bp periodic patterns, the "class A flexible patterns"
BMC Bioinformatics (2005) 6: 206
Universal Motifs
Larsabal E, Danchin A.
Genomes are covered with ubiquitous 11 bp periodic patterns, the "class A flexible patterns"
BMC Bioinformatics. 2005 6:206.
A universal rule: class A flexible patterns
The flexible nature of the patterns permits DNA to accommodate superhelical turns or local bending
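One simple way to look for a periodic pattern of this kind is to count how often a short motif recurs at each distance. The sketch below is only an illustration under stated assumptions: the function names, the choice of the dinucleotide "TT" and the toy sequence are hypothetical, and this is not the method of Larsabal and Danchin.

```python
from typing import List


def occurrences(seq: str, motif: str) -> List[int]:
    """Start positions of every (possibly overlapping) occurrence of motif."""
    return [i for i in range(len(seq) - len(motif) + 1)
            if seq.startswith(motif, i)]


def spacing_counts(seq: str, motif: str, max_dist: int = 100) -> List[int]:
    """counts[d] = number of pairs of occurrences exactly d bases apart."""
    positions = occurrences(seq.upper(), motif.upper())
    present = set(positions)
    counts = [0] * (max_dist + 1)
    for p in positions:
        for d in range(1, max_dist + 1):
            if p + d in present:
                counts[d] += 1
    return counts


# Made-up toy sequence: "TT" placed every 11 bases in a background of C.
toy = ("TT" + "C" * 9) * 20
counts = spacing_counts(toy, "TT", max_dist=40)
best = counts.index(max(counts))
print(best, counts[best])
```

On a real genome the counts would be compared with those expected from a model of the sequence, since base composition alone produces a non-flat background.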
A universal feature of the program: the period of 10-11.5
[Figure: Helicobacter pylori. Three panels, "real", "model" and "difference", each plotted from 0 to 100 bp; the difference panel, labelled fs(G-), ranges from -0.01 to 0.01; slide label: motifs]
Flexible motifs of type A
1-xAxxxxTxxxxAxxxxTTxxxxxAxxxxTxxxxAxxx: All kingdoms
2-xxxxxxxxxxxGxxxxTTxxxCxxxxxTxxxxxxxxx: Proteobacteria
4-xxxxxxTxxxxAGxxxTTxxxxxxxxTxxxxxxxxxx: Archaea
5'-xxx-10xxxxxxxxx0xxxxxxxx10xxxxxxbp-3'
[Figure: the motif TTxxxGxxxTxxxxxxxxxxTT placed on the DNA double helix, seen from two sides, with nucleotides labelled TT, G, T and AA, C, A; slide label: methods]
First side: the nucleotides composing this class A flexible pattern are fully accessible, and the dinucleotides are set in major grooves.
Second side: the nucleotides composing this class A flexible pattern are accessible too, but the dinucleotides are set in minor grooves.
From the leading
strand to the lagging
strand
[Figure: circular chromosome maps running from Ori to Ter, with positions marked at 0, 90, 180 and 270, showing CDSs density and leading CDSs density]
Escherichia coli: 55% leading
Bacillus subtilis: 75% leading
Treponema pallidum: 65% leading
Thermoanaerobacter tengcongensis: 87% leading
To lead or to lag...
Is it possible to see whether the position of
genes in the chromosome is randomly
distributed on the leading and lagging strand?
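The quantity behind this question, the share of genes transcribed in the direction of fork movement, can be computed as in the sketch below. The coordinates, the gene list and the function names are made up for illustration; the convention assumed is that replication runs in both directions from Ori to Ter on a circular chromosome.

```python
from typing import Iterable, Tuple


def on_leading_strand(pos: int, strand: int, ori: int, ter: int,
                      genome_len: int) -> bool:
    """True if the gene is transcribed in the direction of fork movement.

    strand is +1 (increasing coordinates) or -1 (decreasing coordinates).
    """
    from_ori = (pos - ori) % genome_len    # clockwise distance from Ori
    ori_to_ter = (ter - ori) % genome_len  # length of the clockwise replichore
    clockwise_replichore = from_ori < ori_to_ter
    # The fork moves towards increasing coordinates on the clockwise
    # replichore and towards decreasing coordinates on the other one.
    return strand == (1 if clockwise_replichore else -1)


def leading_fraction(genes: Iterable[Tuple[int, int]], ori: int, ter: int,
                     genome_len: int) -> float:
    """Fraction of (position, strand) genes lying on the leading strand."""
    genes = list(genes)
    leading = sum(on_leading_strand(p, s, ori, ter, genome_len)
                  for p, s in genes)
    return leading / len(genes)


# Hypothetical genes on a 1000 bp chromosome with Ori at 0 and Ter at 500.
toy_genes = [(100, 1), (300, 1), (450, -1), (600, -1), (800, -1), (900, 1)]
frac = leading_fraction(toy_genes, ori=0, ter=500, genome_len=1000)
print(round(frac, 3))
```

Under a random distribution the fraction would be close to 0.5, so a value well above that in a real genome points to a strand preference.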
To lag or to lead...
Choosing arbitrarily an origin of replication and a property of the strand (base composition, codon composition, codon usage, amino acid composition of the coded protein…), one can use discriminant analysis to see whether the hypothesis of a random distribution holds.
E. Rocha, A. Danchin & A. Viari Universal replication biases in bacteria. Mol. Microbiol. (1999) 32: 11-16
That is the question…
[Figure: accuracy (vertical axis) plotted against position in % (horizontal axis, 0 to 100), with curves for bases, dinucleotides, codons and amino acids. Panels: Bacillus subtilis, Borrelia burgdorferi, Chlamydia trachomatis, Escherichia coli, Haemophilus influenzae, Treponema pallidum, Mycobacterium tuberculosis, Methanobacterium thermoautotrophicum, Helicobacter pylori]
Visible in proteins…
GT on the leading strand, CA on the lagging strand...
Essential genes in Bacillus subtilis: function enters
[Figure: bar chart of the share of genes (0% to 100%) on the leading and lagging strands, for essential and non-essential genes, each split into highly expressed and non-highly expressed genes]
Rocha EP, Danchin A.
Essentiality, not expressiveness, drives gene-strand bias in bacteria
Nature Genetics. 2003 34:377-378.
When polymerases collide
Co-oriented: DNAP deceleration; end of transcription
Head-on: arrest of RNAP & DNAP; transcription abortion
Consequences:
1. Replication slow-down
2. Loss of transcripts
Consequences:
1. Aborted transcripts
2. Truncated essential proteins
From function to
structure, not vice
versa
The first discovery of
genomics
In 1991, at the EU meeting on genome programs in Elounda,
Greece, the presentation of the yeast chromosome III and the first
100 kb of the Bacillus subtilis genome revealed that, contrary to
expectation (the only cases where this had been observed were
phages, for obvious reasons), at least half of the genes uncovered
were totally unknown, whether in structure or in function
Among the reasons for this are our present lack of deep knowledge of metabolism and our lack of knowledge of the way new genes are created: function is selected first, and then a structure is recruited that will be improved as it is subjected to natural selection for increased fitness of its host (acquisitive evolution)
Genome projects
2045 ongoing projects, 348 completed, mostly from
microbes (228 with more than 1500 genes, more or
less correctly annotated)
144,116,054,623 nucleotides at International Nucleotide
Sequence Database Collaboration (INSDC)
Microbes make up 50% of the Earth's protoplasm
40-50% of coding DNA sequences (CDSs) do not correspond to known functions; 10% correspond to the core genome (« persistent » genes)
The darwinian trio
Variation / Selection / Amplification
Stabilisation
Evolution -> creates -> Function -> captures (recruits) -> Structure -> code -> Sequence
What functions for life?
Extending Cuvier’s vision
Root function and helper functions [the root function of a printer is “printing”; “feeding paper” and “supplying energy” are helper functions]
To be, that is to persist in time, can be proposed as the root function of living organisms
Self-consistency implies correlation of forms
Fighting weathering implies chemical turnover (metabolism) and protection (compartmentalisation)
Exploration, associated with sensing and memorizing, is the discovery that made life as we know it
Génétique des Génomes Bactériens
http://www.pasteur.fr/recherche/unites/REG/ causeries/causeries.html [email protected]
What functions for life?
[Diagram: functions of life]
Exploration, sensing, representation (memory)
Main headings: Compartmentalisation; Metabolism; Information transfer
Functions shown: energisation; shaping; making an envelope; making a skeleton; making appendages; construction of biomass precursors; phospholipid and envelope biosynthesis; replication; transcription; translation; transport; degradation; editing; circulation (channelling); salvage; folding/scaffolding; protection; cleaning; control; partitioning; inactivation; storage; maintenance (repair, degradation); modification (labelling, maturation, addressing, stabilisation, protection, control)
From sequence to
function?
Inductive reasoning combines sequence data and in silico predictions, which are tested using expression profiling (transcriptome and proteome) as well as many other « neighborhoods », such as amino acid composition, isoelectric point in proteins or codon usage biases in genes
One notes that regulation evolves much faster than any other process: in the long run the structural genes are the most important ones
Three examples of the
role of the context
Microbial genes are of infinite diversity but there exist universals; only about 10% of their genes are persistent and of recognized function; we do not yet have a fair idea of the number of microbial species; the number of genes in a given species is highly variable (horizontal gene transfer)
Example 1: persistent genes
Example 2: orphan genes and universal amino acids
[Example 3: a new metabolic pathway]
....
Gene Persistence
Gene persistence
Some of the essential genes missing from the list of persistent genes have diverged considerably. To assess the contribution of this effect we measured for each pair of genomes the correlation between the similarity of orthologous pairs and that of the 16S rRNA. The correlations were high. For example (A), 38% (resp. 48%) of B. subtilis (resp. E. coli) persistent genes showed a correlation coefficient >0.9 between the sequence similarity of the pair of orthologs and that of the 16S rRNA.
In contrast, some genes (B) evolve in an erratic way. This may be due to horizontal gene transfer, local adaptations leading to a faster or slower evolutionary pace, or simply wrong assignments of orthology. The latter can be a significant problem, especially in large protein families. The genes presenting such an erratic pattern are seldom found in the persistent set.
G Fang, EPC Rocha, A Danchin
How essential are non-essential genes?
Mol Biol Evol (2005) 22: 2147-2156
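The comparison described above comes down to a correlation coefficient between two lists of similarities, one value per pair of genomes. A minimal sketch follows, assuming a Pearson coefficient (the slide does not say which coefficient was used) and using made-up similarity values.

```python
from math import sqrt
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two series of the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical similarities across five genome pairs.
rrna_16s = [0.99, 0.95, 0.90, 0.85, 0.80]    # 16S rRNA similarity
clock_like = [0.92, 0.80, 0.71, 0.60, 0.52]  # ortholog tracking the 16S rRNA
erratic = [0.60, 0.85, 0.40, 0.90, 0.55]     # ortholog evolving erratically

r_clock = pearson(rrna_16s, clock_like)
r_erratic = pearson(rrna_16s, erratic)
print(round(r_clock, 2), round(r_erratic, 2))
```

A gene of type (A) would give a coefficient above 0.9, as the first toy series does, while a gene of type (B) would give a coefficient far from 1.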
Genomic islands
A clustering method based on the analysis of codon usage biases using information theory groups the genes into homogeneous clusters, which are not distributed randomly in the chromosome. The method finds both the specific codon usage bias of each class and the most relevant number of classes (4 for E. coli and 5 for B. subtilis). One cluster is related to expression levels. Other groups feature an over-representation of genes belonging to different functional groups: horizontally transferred genes, motility and intermediary metabolism.
M. Bailly-Béchet
M Bailly-Bechet, A Danchin, M Iqbal, M Marsili, M Vergassola
Codon usage domains over bacterial chromosomes
PLoS Computational Biology (2006) 2: April 20th
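To give an idea of what an information-theoretic comparison of codon usage looks like, the sketch below measures how different two genes are with the Jensen-Shannon divergence. This is an assumption chosen for illustration: the slide does not specify the measure used in the published method, raw codon frequencies are used here rather than frequencies among synonymous codons, and the three short sequences are made up.

```python
from collections import Counter
from math import log2
from typing import Dict


def codon_freqs(cds: str) -> Dict[str, float]:
    """Relative frequency of each codon in a coding sequence (frame 0)."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    counts = Counter(cds[i:i + 3] for i in range(0, usable, 3))
    total = sum(counts.values())
    return {codon: n / total for codon, n in counts.items()}


def js_divergence(p: Dict[str, float], q: Dict[str, float]) -> float:
    """Jensen-Shannon divergence in bits: 0 if identical, 1 if disjoint."""
    div = 0.0
    for codon in set(p) | set(q):
        pk, qk = p.get(codon, 0.0), q.get(codon, 0.0)
        mk = (pk + qk) / 2  # mixture of the two distributions
        if pk > 0:
            div += 0.5 * pk * log2(pk / mk)
        if qk > 0:
            div += 0.5 * qk * log2(qk / mk)
    return div


# Hypothetical genes coding for the same peptide (Ala Ala Lys Glu).
gene_a = "GCTGCTAAAGAA"
gene_b = "GCTGCTAAAGAA"  # same codon choices as gene_a
gene_c = "GCGGCGAAGGAG"  # synonymous codons only

d_same = js_divergence(codon_freqs(gene_a), codon_freqs(gene_b))
d_diff = js_divergence(codon_freqs(gene_a), codon_freqs(gene_c))
print(d_same, d_diff)
```

A clustering procedure would then group genes whose pairwise divergences are small, which is where the choice of the number of classes comes in.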
What functions for life?
Scenario for the origin of life
To be — to persist in time — can be proposed as
the root function of living organisms
Fighting weathering implies chemical turnover (metabolism) on solid surfaces, and immobility requires protection (compartmentalisation)
Compartmentalised metabolism creates surface substitutes (RNA)
Exploration, associated with sensing and memorizing (information transfer), is the discovery that made life as we know it
A Danchin Homeotopic transformation and the origin of translation Progress in Biophysics and Molecular Biology (1989) 54: 81-86
Persistent genes organisation
recapitulates the origin of life
Using 228 genomes, we have identified genes that tend to remain close to one another; this « mutual attraction » constructs a remarkable network made of three concentric circles
The external network, made from genes of intermediary metabolism, is highly fragmented; the middle network has tRNA synthetases at its core, and the internal network, almost continuous, makes the core of information transfer around the ribosome, transcription and replication
G. Fang
Thank you
Coping with cold and aging:
Lessons from Pseudoalteromonas
haloplanktis
1 August 2006
Authors
Génétique des Génomes Bactériens
Gang Fang
Evelyne Krin
Etienne Larsabal
Géraldine Pascal
Eduardo Rocha
Agnieszka Sekowska
Genoscope
Valérie Barbe
Stéphane Cruveiller
Sophie Mangenot
Géraldine Pascal
Zoé Rouy
David Vallenet
Claudine Médigue
University of Hong Kong
Christine Ho
Frankie Cheung
University of Liège
Georges Feller
University of Naples
Luisa Tutino
University of Strasbourg
Philippe Bertin
University of Stockholm
Gunnar von Heijne
Ecological
neighbourhood:
growth in the cold
Challenges posed by cold
Protein folding
RNA folding
Membrane fluidity
Sensitivity towards oxygen and radicals
Distribution of amino acids in proteins...
=> A series of unexpected answers
Genome organisation: P. haloplanktis
[Figure: genome organisation of P. haloplanktis; slide labels: folding, origins]
Bias in amino acid distribution
Neighbourhood: distribution of amino acids in the proteome
G. Pascal
Universal biases in protein
amino acid composition
First axis: separates Integral Inner Membrane Proteins
(IIMP) from the rest; driven by opposition between charged
and large hydrophobic residues
Second axis: separates proteins according to an
opposition driven by the G+C content of the first codon
base
Third axis: separates proteins by their content in
aromatic amino acids; enriched in orphan proteins
Temperature-dependent
biases in protein amino acid
composition
• The general trend of amino acid composition bias is to avoid some amino acids at higher temperatures (associated with aging processes)
• Mesophilic bacteria belong to at least two different classes (in a 5-cluster analysis)
• Biases are always dominated by the IIMP clustering
C Médigue, E Krin, G Pascal, V Barbe, A Bernsel, PN Bertin, F Cheung, S Cruveiller, S D'Amico, A Duilio, G Fang, G Feller, C Ho, S Mangenot, G
Marino, J Nilsson, E Parrilli, EPC Rocha, Z Rouy, A Sekowska, ML Tutino, D Vallenet, G von Heijne, A Danchin
Coping with cold: the genome of the versatile marine Antarctica bacterium Pseudoalteromonas haloplanktis TAC125
Genome Research (2005) 15: 1325-1335
Biases correlated with temperature
An asparagine bias is specific to psychrophiles
[Figure: protein clusters: IIMPs; motility; envelope, outer membrane; transport (TonB), secretion; adaptation to stress; DNA and RNA metabolism]
53% mesophiles
55% psychrophiles
62% thermophiles
G. Pascal
isoaspartate
Orphans: the gluons
A remarkable role of aromatic amino acids creates
a universal bias. Expressed orphan proteins are
enriched in these residues, suggesting that they
might participate in a process of gain of function
during evolution. We postulate that the majority is
made of proteins — gluons — involved in
stabilising complexes, thus defining the "self" of
the species.
G Pascal, C Médigue, A Danchin
Universal biases in protein composition of model prokaryotes
Proteins (2005) 60: 27-35
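The enrichment mentioned above can be checked protein by protein by computing the share of aromatic residues. The sketch below is illustrative only: it assumes that "aromatic" means Phe, Trp and Tyr (histidine is sometimes counted too), and both sequences and their names are made up.

```python
from typing import Dict

# Assumed definition of aromatic residues: Phe (F), Trp (W), Tyr (Y).
AROMATIC = frozenset("FWY")


def aromatic_fraction(protein: str) -> float:
    """Fraction of residues in a one-letter protein sequence that are aromatic."""
    residues = [aa for aa in protein.upper() if aa.isalpha()]
    if not residues:
        raise ValueError("empty protein sequence")
    return sum(aa in AROMATIC for aa in residues) / len(residues)


# Hypothetical sequences standing in for an annotated and an orphan protein.
toy_proteins: Dict[str, str] = {
    "known_1": "MKTAYIAKQR",
    "orphan_1": "MFWYKLFWYA",
}
fractions = {name: aromatic_fraction(seq) for name, seq in toy_proteins.items()}
print(fractions)
```

Comparing the distribution of this fraction between orphan and non-orphan proteins of a proteome is one way to look at the bias described on the slide.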
Why aromatic residues in orphan proteins?
From orphans « Gluons »
G. Pascal
Orphans lose their status in the course of evolution: Rocha 2002; Pedulla 2003
Thank you