Evolution of regulatory interactions in bacteria


Comparative genomics and evolution of regulatory interactions in bacteria
Mikhail Gelfand
Research and Training Center of Bioinformatics
Institute for Information Transmission Problems
Russian Academy of Sciences
September 2006
Это – ряд наблюдений. В углу – тепло.
Взгляд оставляет на вещи след.
Вода представляет собой стекло.
Человек страшней, чем его скелет.
Иосиф Бродский
A list of some observations. In a corner, it’s warm.
A glance leaves an imprint on anything it’s dwelt on.
Water is glass’s most public form.
Man is more frightening than its skeleton.
Joseph Brodsky
Basic assumptions and techniques
• Phylogenetic footprinting (Ross Hardison, eukaryotes, 1988):
regulatory (transcription factor-binding) sites are more
conserved than surrounding non-coding regions
=> TF-binding sites are seen as conserved islands in
multiple alignments of gene upstream regions.
• Works for close genomes (e.g. E. coli – Salmonella, sometimes Yersinia), where upstream regions are alignable.
• Ignores site turnover
• Consistency filtering (Gelfand and Mironov, 1999, bacteria):
regulatory systems are biologically reasonable
=> regulons are conserved (more or less)
=> true sites occur upstream of orthologous genes
(false sites are scattered at random)
• need to take care of the operon structure
• assumes conservation of TF-binding motif in DNA
• ignores evolution of regulatory systems
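Consistency filtering can be sketched in a few lines. The sketch assumes that candidate sites have already been predicted in each genome and that genes have already been assigned to ortholog groups. All genome, gene and group names are hypothetical. The threshold `min_genomes` is an illustrative parameter, not a value taken from the original method.

```python
from collections import defaultdict

# Hypothetical ortholog assignment: (genome, gene) -> ortholog group.
ORTHOLOG_GROUP = {
    ("g1", "geneA"): "OG1", ("g2", "geneA"): "OG1", ("g3", "geneA"): "OG1",
    ("g1", "geneB"): "OG2", ("g2", "geneB"): "OG2", ("g3", "geneB"): "OG2",
}

# Hypothetical candidate sites, each identified by its downstream (genome, gene).
CANDIDATE_SITES = {
    ("g1", "geneA"), ("g2", "geneA"), ("g3", "geneA"),  # upstream of orthologs
    ("g2", "geneB"),                                    # isolated hit
}


def consistency_filter(sites, ortholog_group, min_genomes=2):
    """Keep a site only if its ortholog group carries sites in at least
    `min_genomes` different genomes; isolated hits are treated as false."""
    genomes_with_site = defaultdict(set)
    for genome, gene in sites:
        genomes_with_site[ortholog_group[(genome, gene)]].add(genome)
    return {
        (genome, gene)
        for genome, gene in sites
        if len(genomes_with_site[ortholog_group[(genome, gene)]]) >= min_genomes
    }


kept = consistency_filter(CANDIDATE_SITES, ORTHOLOG_GROUP)
print(sorted(kept))
```

The sketch ignores operon structure. In practice, a site upstream of an operon has to be credited to every gene in that operon, which the slide lists as a separate requirement.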
Conserved motif upstream of nrd genes
Identification of the candidate regulator
by the analysis of phyletic patterns
• COG1327: the only COG with exactly the
same phylogenetic pattern as the motif
– “large scale” on the level of major taxa
– “small scale” within major taxa:
• absent in small parasites among alpha- and gammaproteobacteria
• absent in Desulfovibrio spp. among delta-proteobacteria
• absent in Nostoc sp. among cyanobacteria
• absent in Oenococcus and Leuconostoc among Firmicutes
• present only in Treponema denticola among four spirochetes
COG1327 “Predicted transcriptional regulator,
consists of a Zn-ribbon and ATP-cone domains”:
regulator of the nrd genes?
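Matching phyletic patterns amounts to comparing presence/absence vectors. The minimal sketch below uses hypothetical genomes and COG identifiers, not the data behind the COG1327 result. A COG is reported when its pattern differs from the motif's pattern in at most `max_mismatches` genomes. Setting `max_mismatches=0` corresponds to the "exactly the same pattern" criterion.

```python
# Hypothetical presence (1) / absence (0); column order follows GENOMES.
GENOMES = ["G1", "G2", "G3", "G4", "G5"]
MOTIF_PATTERN = (1, 1, 0, 0, 1)
COG_PATTERNS = {
    "COG_A": (1, 1, 0, 0, 1),
    "COG_B": (1, 1, 1, 0, 1),
    "COG_C": (0, 1, 0, 0, 1),
}


def hamming(a, b):
    """Number of genomes in which two phyletic patterns disagree."""
    return sum(x != y for x, y in zip(a, b))


def matching_cogs(motif, cogs, max_mismatches=0):
    """COGs whose pattern differs from the motif's in <= max_mismatches genomes."""
    return sorted(cog for cog, pattern in cogs.items()
                  if hamming(motif, pattern) <= max_mismatches)


print(matching_cogs(MOTIF_PATTERN, COG_PATTERNS))
print(matching_cogs(MOTIF_PATTERN, COG_PATTERNS, max_mismatches=1))
```

Allowing a mismatch or two can help with unfinished genomes or misannotated genes, at the cost of more spurious matches.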
Additional evidence – 1
• nrdR is sometimes clustered with nrd genes or with replication genes dnaB, dnaI, polA
Additional evidence – 2
• In some genomes, candidate NrdR-binding sites are found upstream of other replication-related genes
– dNTP salvage
– topoisomerase I, replication initiator dnaA, chromosome partitioning, DNA helicase II
Multiple sites (nrd genes): FNR, DnaA, NrdR
Mode of regulation
• Repressor (overlaps with promoters)
• Co-operative binding:
– most sites occur in tandem (> 90% of cases)
– the distance between the copies (centers of
palindromes) equals an integer number of DNA turns:
• mainly (94%) 30-33 bp, in 84% 31-32 bp – 3 turns
• 21 bp (2 turns) in Vibrio spp.
• 41-42 bp (4 turns) in some Firmicutes
• experimental confirmation in Streptomyces
(Borovok et al. 2004, Grinberg et al. 2006) and
in E. coli (Grinberg et al. 2006)
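The "integer number of DNA turns" argument is simple arithmetic on the spacing between the centres of the two sites. The sketch below assumes a B-DNA helical repeat of 10.5 bp per turn. It uses a hypothetical tolerance of 0.15 turns for deciding that two sites face the same side of the helix. Neither value comes from the slides.

```python
BP_PER_TURN = 10.5  # assumed helical repeat of B-DNA


def helical_turns(spacing_bp):
    """Convert a centre-to-centre spacing (bp) into helical turns."""
    return spacing_bp / BP_PER_TURN


def same_face(spacing_bp, tolerance=0.15):
    """True if two sites lie on the same face of the helix, i.e. their
    spacing is within `tolerance` turns of a whole number of turns."""
    turns = helical_turns(spacing_bp)
    return abs(turns - round(turns)) <= tolerance


for spacing in (21, 30, 31, 32, 33, 41, 42, 26):
    print(spacing, round(helical_turns(spacing), 2), same_face(spacing))
```

With these assumptions the observed spacings (21, 30 to 33, and 41 to 42 bp) all come out close to 2, 3 and 4 turns. A spacing such as 26 bp would place the two sites on opposite faces.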
Evolutionary processes that
shape regulatory systems
• Expansion and contraction of regulons
(birth or death of sites)
• Duplications of regulators
(with or without regulated loci)
• Loss of regulators
(with or without regulated loci)
• Re-assortment of regulators and structural genes
• … especially in complex systems
• Change of regulator specificity
• Horizontal transfer
Birth and death of sites is a very dynamic
process (even in bacteria)
NadR-binding sites upstream of pncB seem absent in
Klebsiella pneumoniae and Serratia marcescens
… but there are candidate sites further upstream …
… and they are clearly different (not simply misaligned).
Loss of regulators and cryptic sites
Loss of RbsR in Y. pestis
(the ABC transporter is also lost)
[Figure labels: RbsR-binding site; start codon of rbsD]
Regulon expansion:
how FruR has become CRA
[Figure, three panels showing the FruR/CRA regulon at successive stages, within Gamma-proteobacteria: (1) the common ancestor of Enterobacteriales; (2) the common ancestor of Escherichia and Salmonella; (3) E. coli and Salmonella spp. Genes shown, with the sugar labels mannose, glucose, mannitol, and fructose: manXYZ, ptsHI-crr, edd, eda, epd, adhE, aceEF, mtlA, mtlD, gapA, fbp, pykF, fruBA, fruK, pfkA, pgk, gpmA, icdA, ppsA, pckA, aceA, aceB, tpiA.]
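Ancestral regulon membership of the kind shown in the panels can be inferred in several ways. One simple, generic option is Fitch parsimony on the presence or absence of a site upstream of one gene. The sketch below is not necessarily the method used for this figure. The tree and the site states are hypothetical, and only the bottom-up pass is shown. That pass returns the candidate states at the root and the minimum number of gains and losses.

```python
# A tree is either a leaf name (str) or a pair (left_subtree, right_subtree).
TREE = (((("A", "B"), "C"), "D"), "E")            # hypothetical rooted tree
SITE = {"A": 1, "B": 1, "C": 0, "D": 0, "E": 0}  # hypothetical: 1 = site present


def fitch(tree, states):
    """Bottom-up Fitch pass: return (possible states at this node,
    minimum number of gains/losses in the subtree)."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch(tree[0], states)
    right_set, right_cost = fitch(tree[1], states)
    common = left_set & right_set
    if common:
        return common, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1


root_states, changes = fitch(TREE, SITE)
print(root_states, changes)
```

Here the root is inferred to lack the site, and a single gain on the branch leading to A and B explains the data. This is the "regulon expansion" scenario. Assigning states to every internal node requires the usual top-down pass, which is omitted here.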
Trehalose/maltose catabolism, alpha-proteobacteria
Duplicated LacI-family
regulators: lineage-specific
post-duplication loss
The binding motifs are very similar (the blue branch is
somewhat different: to avoid cross-recognition?)
Utilization of maltose/maltodextrin, Firmicutes
Displacement: invasion of a regulator from a
different subfamily (horizontal transfer from a
related species?) – “blue” sites
Orthologous TFs with completely different regulons (alpha-proteobacteria and Xanthomonadales)
Utilization of an unknown galactoside in
gamma-proteobacteria
Yersinia and Klebsiella: two regulons, GalR (not shown; includes the genes galK and galT) and LacI-X
Erwinia: one regulon, GalR
Loss of regulator and merger of regulons: it seems that LacI-X was present in the common ancestor (Klebsiella is an outgroup)
Catabolism of gluconate, proteobacteria
Extreme variability in the regulation of “marginal” regulon members
[Figure panels: β-proteobacteria; Pseudomonas spp.; γ-proteobacteria]
Combined regulatory network for iron homeostasis genes in α-proteobacteria
[Figure: the transcription factors RirA, Irr, Fur, and IscR under iron-limited [-Fe] and iron-replete [+Fe] conditions (Irr shown as degraded in the presence of heme). They are connected to iron uptake systems (siderophore uptake, Fe2+/Fe3+ uptake), iron storage (ferritins), FeS synthesis, heme synthesis, and iron-requiring enzymes [iron cofactor]. Signals: Fe and the FeS status of the cell.]
The connecting lines denote regulatory interactions, with the thickness reflecting the frequency of the interaction in the analyzed genomes. The suggested negative or positive mode of regulation is shown by dead-end and arrow-end lines, respectively.
Fe and Mn regulons
[Table: Distribution of Irr, Fur/Mur, MntR, RirA, and IscR regulons in α-proteobacteria.
Columns: organism, abbreviation, Irr, Mur/Fur, MntR, RirA, IscR (+/− entries).
Rows: species of Rhizobiales (Rhizobiaceae, Bradyrhizobiaceae, Bartonella, Brucella, Mesorhizobium), Rhodobacterales, Caulobacterales, Hyphomonadaceae, Parvularculales, Sphingomonadales, Rhodospirillales, and Rickettsiales (Pelagibacter ubique HTCC1002 of the SAR11 cluster; Rickettsia and Ehrlichia species), arranged in groups A–D.]
'#?' in the RirA column denotes the absence of the rirA gene in an unfinished genomic sequence and the presence of candidate RirA-binding sites upstream of the iron uptake genes.
[Later build of the same table, annotated “Not RirA. IscR?”]
UPDATE to the '#?' note: the genomes have been finished, and there is still no rirA gene.
Distribution of the conserved members of the Fe- and Mn-responsive regulons
and the predicted RirA, Fur/Mur, Irr, and DtxR binding sites in α-proteobacteria
Gene functions:
Iron uptake
Iron storage
FeS synthesis
Iron usage
Heme biosynthesis
Regulatory genes
Manganese uptake
Phylogenetic tree of the Fur family of transcription factors in α-proteobacteria – I
[Figure: tree with the reference Fur proteins: Escherichia coli (sp|P0A9A9), Pseudomonas aeruginosa (sp|Q03456) and Neisseria meningitidis (sp|P0A0S7), i.e. Fur in γ- and β-proteobacteria; Helicobacter pylori (sp|O25671), Fur in ε-proteobacteria; Bacillus subtilis (sp|P54574), Fur in Firmicutes. α-proteobacterial branches: Mur, regulator of manganese uptake genes (sit, mntH); Fur, regulator of iron uptake and metabolism genes; Irr.]
Species on the α-proteobacterial Fur branch: Erythrobacter litoralis, Caulobacter crescentus, Zymomonas mobilis, Novosphingobium aromaticivorans, Oceanicaulis alexandrii, Sphingopyxis alaskensis, Gluconobacter oxydans, Rhodospirillum rubrum, Parvularcula bermudensis, Magnetospirillum magneticum.
Sequence logos:
identified Mur-binding sites in the A, B, and C groups of α-proteobacteria;
identified Fur-binding sites in the “other” group of α-proteobacteria;
known Fur-binding sites in Escherichia coli and Bacillus subtilis.
Phylogenetic tree of the Fur family of transcription factors in α-proteobacteria – II
[Figure: the same reference Fur proteins (γ-, β-, and ε-proteobacteria, Firmicutes) and the α-proteobacterial Mur/Fur branch, with the Irr branch expanded. Irr in α-proteobacteria is the regulator of iron homeostasis. It is found in Rhizobiales (two copies in Rhizobium leguminosarum, Brucella melitensis, Bradyrhizobium japonicum, and Rhodopseudomonas palustris), in Rhodobacterales, and in Pelagibacter ubique HTCC1002.]
Sequence logos for the identified Irr-binding sites in α-proteobacteria:
The A group (8 species) – Irr
The B group (4 species) – Irr
The C group (12 species) – Irr
Phylogenetic tree of the Rrf2 family of transcription factors in α-proteobacteria
[Figure: tree with the reference proteins nitrite/NO-sensing regulator NsrR (Nitrosomonas europaea, Escherichia coli), iron repressor RirA (Rhizobium leguminosarum), cysteine metabolism repressor CymR (Bacillus subtilis), cytochrome complex regulator Rrf2 (Desulfovibrio vulgaris), and iron-sulfur cluster synthesis repressor IscR (Escherichia coli). α-proteobacterial branches are labeled NsrR, RirA, IscR, and IscR-II.
Markings distinguish proteins with the conserved C-X(6-9)-C(4-6)-C motif within the effector-responsive domain from proteins without a cysteine triad motif. They also show positional clustering of rrf2-like genes with iron uptake and storage genes, Fe-S cluster synthesis operons, genes involved in nitrosative stress protection, sulfate uptake/assimilation genes, thioredoxin reductase, carboxymuconolactone decarboxylase-family genes, and the hmc cytochrome operon.]
Sequence logos for the identified RirA-binding sites in α-proteobacteria:
The A group – RirA (8 species)
The C group – quasi-RirA (12 genomes)
An attempt to reconstruct the history
Regulators and their binding motifs
• Subtle changes at close evolutionary
distances
• Cases of motif conservation at surprisingly
large distances
• Surprisingly similar motifs of unrelated
regulators: “site usurpation” (???)
• Correlation between contacting
nucleotides and amino acid residues
DNA motifs and protein-DNA interactions
Entropy at aligned sites and the number of contacts
(heavy atoms in a base pair at a distance < cutoff from a protein atom)
[Figure panels: CRP, PurR, IHF, TrpR]
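The quantity on one axis of these panels, the entropy of each position in an alignment of binding sites, is easy to compute. The sketch below uses hypothetical sites and hypothetical per-position contact counts; the real contact counts would come from protein-DNA structures, as described above. It then correlates the two series with a plain Pearson coefficient. The choice of Pearson rather than another statistic is an assumption.

```python
import math
from collections import Counter


def column_entropies(sites):
    """Shannon entropy (bits) of the nucleotide distribution at each
    position of a set of equal-length, gap-free aligned sites."""
    n = len(sites)
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("sites must be aligned to the same length")
    result = []
    for i in range(length):
        counts = Counter(s[i] for s in sites)
        result.append(sum(-(c / n) * math.log2(c / n) for c in counts.values()))
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


SITES = ["TGTGA", "TGTGA", "TGCGA", "TATGA"]  # hypothetical aligned sites
CONTACTS = [4, 1, 1, 4, 3]                    # hypothetical contacts per position

entropy = column_entropies(SITES)
print([round(h, 2) for h in entropy])
print(round(pearson(entropy, CONTACTS), 2))
```

In this toy data the variable positions have few contacts, so the correlation is negative. That is the direction expected if contacting positions are more conserved.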
Specificity-determining positions in the LacI family
• Training set: 459 sequences, average length 338 amino acids, 85 specificity groups
– 44 SDPs:
10 residues contact NPF (an analog of the effector)
7 residues in the effector contact zone (5 Å < dmin < 10 Å)
6 residues in the intersubunit contacts
5 residues in the intersubunit contact zone (5 Å < dmin < 10 Å)
7 residues contact the operator sequence
6 residues in the operator contact zone (5 Å < dmin < 10 Å)
[Figure: LacI from E. coli]
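A common way to score candidate specificity-determining positions is the mutual information between the residue in an alignment column and the specificity group of each sequence. As far as I know, SDPpred-style methods rank positions by a corrected and statistically assessed version of this quantity. The toy below uses the raw value only, on a hypothetical six-sequence alignment with two specificity groups.

```python
import math
from collections import Counter


def mutual_information(column, groups):
    """Mutual information (bits) between residues in one alignment column
    and the specificity-group labels of the same sequences."""
    n = len(column)
    residue_counts = Counter(column)
    group_counts = Counter(groups)
    joint_counts = Counter(zip(column, groups))
    mi = 0.0
    for (residue, group), count in joint_counts.items():
        p_joint = count / n
        p_indep = (residue_counts[residue] / n) * (group_counts[group] / n)
        mi += p_joint * math.log2(p_joint / p_indep)
    return mi


# Hypothetical alignment of six proteins (three columns) in two specificity groups.
ALIGNMENT = ["AKL", "AKV", "ARL", "DKV", "DRL", "DKI"]
GROUPS = ["g1", "g1", "g1", "g2", "g2", "g2"]

scores = [
    mutual_information([seq[i] for seq in ALIGNMENT], GROUPS)
    for i in range(len(ALIGNMENT[0]))
]
print([round(s, 2) for s in scores])
```

Column 1 separates the two groups perfectly (1 bit). Column 2 is independent of the grouping (0 bits). Column 3 is weakly informative. The raw score is biased upward for small groups, which is why a correction and a significance test are needed on real data.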
CRP/FNR family of regulators
CooA (Desulfovibrio): TGTCGGCnnGCCGACA
CRP (Gamma-proteobacteria): TTGTGAnnnnnnTCACAA
FNR (Gamma-proteobacteria): TTGATnnnnATCAA
HcpR (Desulfovibrio): TTGTgAnnnnnnTcACAA
Correlation between contacting nucleotides and amino acid residues
CooA in Desulfovibrio spp.     DD  ALTTEQLSLHMGATRQTVSTLLNNLVR
                               DV  ELTMEQLAGLVGTTRQTASTLLNDMIR
CRP in Gamma-proteobacteria    EC  KITRQEIGQIVGCSRETVGRILKMLED
                               YP  KXTRQEIGQIVGCSRETVGRILKMLED
                               VC  KITRQEIGQIVGCSRETVGRILKMLEE
HcpR in Desulfovibrio spp.     DD  DVSKSLLAGVLGTARETLSRALAKLVE
                               DV  DVTKGLLAGLLGTARETLSRCLSRMVE
FNR in Gamma-proteobacteria    EC  TMTRGDIGNYLGLTVETISRLLGRFQK
                               YP  TMTRGDIGNYLGLTVETISRLLGRFQK
                               VC  TMTRGDIGNYLGLTVETISRLLGRFQK
Contacting residues: REnnnR
TG: 1st arginine
GA: glutamate and 2nd arginine
Binding motifs: CooA TGTCGGCnnGCCGACA; CRP TTGTGAnnnnnnTCACAA; HcpR TTGTgAnnnnnnTcACAA; FNR TTGATnnnnATCAA
The correlation holds for other factors in the family.
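The rule on this slide can be applied mechanically. The sketch copies the first aligned fragment of each family from the alignment above. It takes the columns of the R-E-n-n-n-R pattern as they fall in the CRP fragment; the 0-based columns 14, 15 and 19 are read off this alignment. It then predicts which half-site dinucleotides should be present: a first arginine implies TG, and a glutamate plus a second arginine implies GA. Comparing the predictions with the motifs still requires aligning the motifs on their centres, which the sketch does not attempt.

```python
# One aligned fragment per family, copied from the alignment above.
FRAGMENTS = {
    "CooA": "ALTTEQLSLHMGATRQTVSTLLNNLVR",
    "CRP": "KITRQEIGQIVGCSRETVGRILKMLED",
    "HcpR": "DVSKSLLAGVLGTARETLSRALAKLVE",
    "FNR": "TMTRGDIGNYLGLTVETISRLLGRFQK",
}

# 0-based alignment columns of R, E and the second R of "REnnnR" in CRP.
FIRST_ARG, GLU, SECOND_ARG = 14, 15, 19


def predicted_half_site_features(fragment):
    """Apply the slide's rule: 1st Arg -> TG; Glu and 2nd Arg -> GA."""
    features = set()
    if fragment[FIRST_ARG] == "R":
        features.add("TG")
    if fragment[GLU] == "E" and fragment[SECOND_ARG] == "R":
        features.add("GA")
    return features


for family, fragment in FRAGMENTS.items():
    residues = fragment[FIRST_ARG] + fragment[GLU] + fragment[SECOND_ARG]
    print(family, residues, sorted(predicted_half_site_features(fragment)))
```

CooA (RQT) is predicted to keep only TG, CRP and HcpR (RER) both dinucleotides, and FNR (VER) only GA. This mirrors the motifs above: TGTCG in CooA, TGTGA in CRP and HcpR, and no TG-for-TG match in FNR's TTGAT.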
Open problems
• Model the evolution of regulatory systems (a catalog of elementary events, estimates of probabilities)
– Birth of a binding site; what are the mechanisms?
– Loss of a binding site
– Duplication of a regulated gene and/or a regulator
– Horizontal transfer of a regulated gene and/or a regulator
– Loss of a regulated gene and/or a regulator
– Change of specificity
– General properties?
• Distribution of TF family and regulon sizes
• Stable cores and flexible margins of functional systems (in terms of gene presence and regulation)
• Co-evolution of TFs and DNA sites:
– “Neutral” model for the evolution of binding sites (with invariant functional pressure from the bound protein)
– How do the motifs evolve? What is the driving force – changes in TFs?
– TF-family, position-specific protein-DNA recognition code?
All of this needs to take into account the incompleteness and noise in the data.
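The simplest possible entry in such a catalog treats one site as a two-state continuous-time Markov chain (absent ⇄ present) with a birth rate and a death rate. This is a generic textbook model offered as one possible starting point, not a model proposed in the slides. The rates `LAM` and `MU` below are hypothetical. The closed-form probability approaches the stationary value λ/(λ+μ) at rate λ+μ.

```python
import math


def p_present(t, lam, mu, present_at_start):
    """Probability that a site is present after time t in a two-state
    chain with birth rate lam (absent -> present) and death rate mu."""
    total = lam + mu
    stationary = lam / total
    decay = math.exp(-total * t)
    if present_at_start:
        return stationary + (1 - stationary) * decay
    return stationary * (1 - decay)


LAM, MU = 0.1, 0.4  # hypothetical birth and death rates per unit time
for t in (0.0, 1.0, 5.0, 50.0):
    print(t, round(p_present(t, LAM, MU, True), 3),
          round(p_present(t, LAM, MU, False), 3))
```

Fitting λ and μ across a tree of genomes, and letting them depend on the regulator and the regulated gene, is where the open questions above begin.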
RNA regulatory systems
• Riboswitches: regulation by formation of
alternative structures dependent on binding of
small molecules
• T-boxes: regulation by formation of alternative
structures dependent on binding of uncharged
tRNA
• Highly conserved
(sequence, secondary structure)
=> easy to recognize
• Large
=> phylogenetic trees, duplications etc.
Systematic analysis of T-boxes (very preliminary results)
T-boxes: the mechanism
(Grundy & Henkin)
Partial alignment of predicted T-boxes (3′ part: T-box with the conserved TGG, antiterminator, and terminator; the terminator is underlined in the original figure)

Aminoacyl-tRNA synthetases
SA   serS      -> 26  CGTTA  51  AAATAGGGTGGCAACGCGTAGAC------------CACGTCCCTTGTAGGGATGTGGTCTTTTTTTA
DHA  tyrZ      -> 47  CGTTA  65  AGGTAAGGTGGTAACACGGGAGCA-------TACTCTCGTCCTTCTGGCAATGAAGGACGGGAGTTTTTTGTTTT
ST   trpS      -> 37  CCTTA  61  AATTGAGGTGGTACCGCGTATTACTT----GTAATAACGCCCTCACGTTTTAATAGCGTGGGGACTTTTTGCTAT
CA   aspS      -> 39  CGTTA  34  ATAAAGGATGGCACCGTGAAAA----------GCCTTCACTCCTTACTGGAGTGGAGGCTTTTTTTATTTTAAATAAA
DF   valS      -> 41  CGTTA  77  AATTAAGGTGGTAACGCGAGC------------TTTTCGTCCTTTTTAAAGAGGATGAAGAGCTCTTTTTTATTTCT
PN   thrS      -> 30  CGTTA  38  AATGAAGGTGGAACCACGTTG-------------CGACGTCCTTTCGAGGATGTCGCATTTTTTTATTAG
MN   ileS      -> 89  CGTTA  68  AATTAAGGTGGTACCACGAGC-------------TTTCGTCCTTTGATGAAAGTTCTTTTTTATTGAT
DF   leuS      -> 28  AGCTA  29  AATTAGGGTGGTACCGCGAAGATT-------TATCCTCGTCCCTAAACGTAAGTTTAGTGACGAGGATTTTTTATTTTCA
HD   argS      -> 41  CGTTA  27  AACGAGAGTGGTACCGCGGGTAA---------AAGCTCGCCTCTTTTTAGAAGAGGCGGGTTTTTTATTTT
DF   proS      -> 33  CGTTA  30  AACTAGAGTGGTACCGCGGAAAT-----TAAACCTTTCGTCTCTATACTTGTATAGAGATGAGAGGTTTTTTATATTTTCAGG
ZC   lysS      -> 46  CGTTA  63  AACTGAGGTGGTACCGCGAAGCTAA-----CAACTCTCGTCCTCAAGATGAATAATCTTGGGGGTGGGAGTTTTTTTGTTGCA
BQ   metS      -> 55  CGTTA  66  AAATAAGGTGGTACCGCGACTGTTTA---TACAGCCCCGCCCTTATCTTTTTTAGATAAGGGCGGGGCTTTTTATATTTAA
MN   pheS      -> 14  AATTA  20  AAAACGGATGGTACCGCGTGTC-------------AACGCTCCGCTTAAGGAGTTTTGGCACTTTTTTTGTTTT
MN   glyQ      -> 14  AGCTA  23  AATTAGGGTGGAACCGCGTTT------------CAAACGCCCCTATGTCAGTTGGCATGGGAGTGATTGAGCGTGGCTCTTTT
ST   alaS      -> 20  AATTA  18  AATAGAGGTGGTACCGCGGTT--------------TTCGCCCTCTGTGAGATGGACTTGTTTTGTATGGAGGACTATTTGAAA

Amino acid biosynthetic genes
SA   trpE      -> 32  AATTA   4  AACTAAGGTGGCACCACGGTA-------------ACGCGTCCTTACAGGTATATGCGTTATGTGGTGTCTTTTT
BS   ilvB      -> 50  CGTTA  47  AACAAGGGTGGTACCGCGGAAAGAAA---AGCCTTTTCGCCCCTTTTAGCTATCGCAGTTACTGCGCGGCTGATTGT
CA   ilvC      -> 40  CGTTA  14  AATTTGGGTGGTACCGCGCGACCAAA-----AATTCTCGCCCCAAGCAGGGAATTTTGGCCGTTTTTTTATATAAATAAAT
BQ   asnA      -> 51  CGTTA  62  AATTTGGGTGGTACCGCGGAACC-----AAAGCCTTTCGTCCCAGTTTTTTGGGAAAGAAGGGCTTTTTTTGTTGGCTT
BS   proB      -> 33  CGTTA  30  AATCAAGGTGGTACCACGGAAAC--------CCATTTCGTCCTTATGAATCAGGATGAAATGGGTTTTTTTATTGTAGA
SA   cysE      -> 33  CATTA  62  ATTCAGAGTGGAACCGTGCGG-------------AAGCGCCTCTAACAATACAATTTGTATGTTAGTGGTGCTTTTTTG
MN   hisC      -> 46  CGTTA  50  AATGAAGGTGGAACCACGTGTGT---------GTCAGCGTCCTTGCAAGTTTTTTGCAAGGGCGCTTTTTTGAATAGT
DHA  pheA      -> 41  CGTTA  50  AAAAAGGGTGGTACCGCGTGAC---------TTAACTCGTCCCTTATTTGGGGGTGAGGTAAGTCTTTTTTTATTTA
HD   serA      -> 42  cgtta  57  AATGAGGGTGGCACCGCGGTATG-------AACCTTCCGCCCCTCACGACAGTCGTCGTGTGGGCAGAAGGTTTTTTTACTAT
BQ   phhA      -> 51  CGTTA  34  AAATAGGGTGGTACCGCGATTC------------TTTCGCCCCTATCGGATTTTCCGATAGGGGCTTTTTCTATTTC
EF   yxjH      -> 40  CGTTA  51  AAAAAAGGTGGTACCGCGATAA-----------TAATCGCCCTTTTACTAGTTACGGCTAGTAAAAGGGCGTTTTTTTATAAA

Amino acid transporters
CA   yckK      -> 38  CGTTA  57  AATTAGAGTGGTACCGTGGAATT-------CAACTTCTGCCTCTAACTATGAGGATAGAAGTTTTTTGTTTTTAT
DF   yqiX      -> 41  CCTTA  30  AAAAAGAGTGGTAACGCGGATAT----------AATTCGTCTCTTAGCTGTAAAGCTAAGGGACTTTTTTGATTTA
HD   BH0807    -> 74  TGTTA  56  AACTGGGGTGGCACCACGACAAG----------TGATCGTCCCCAAGACTTTTATCAGTCTTGGGGACGTTTTTTTGTTCAT
EF   yheL      ->  8  AATTA  33  AATTAAGGTGGTACCGCGGAGA-----------GATTCGTCCTTATTCTTTAAGGATGAATCTCTCTTTTTATGTAGC
BQ   ykbA      -> 46  CGTTA  45  AACAAGGGTGGAACCACGAATAT--------AACACTCGTCCCTTTTTTAGGGAGGAGTGTTTTTTTATT
BQ   sdt2      -> 40  CGTTA  56  AATTGAGGTGGTACCACGGTATTAACATTACATATATCGTCCTCTACATGCATATTTGCGTGTAGGGGACTTTTTTATTTTC
EF   yusC      -> 42  CGTTA  60  AATTAAGGTGGTATCACGAAATGA-----CAAACTTTCGTCCTTTTTGCTGTAATAGCAAAAGGATGGAAGTTTTTTTGTTT
CA   yhaG      -> 48  CGTTA  51  AATTTAGGTGGTACCGCGGAAGT---------ATCTCCGTCCTAATTAATAAGATTAGGGCGGAGTTTTTTATTTGC
BQ   brnQ      -> 44  CGTTA  66  AATTAGGGTGGTATCGCGGGTAAA------TATAACTCGTCCCTTTCTTTAGGGACGAGTTTTTTGTGTTCTT
     REF01723  -> 44  CGTTA  55  AATTGAGGTGGCACCACGAATGC----------GATTCGTCCTCTTGGCTCACAGCCAAGAGGCTTTTTTGTTTTTTTAATA
BS   yvbW      -> 56  CGTTA  32  AACAAGAGTGGTACCGCGGTCAGC--CGAAGGCTCGTCGTCTCTTTATCTATTAGATTAGGTAGGAGACGGCGGGCTTTTTT
… continued (in the 5’ direction)
[Figure: alignment of the specifier hairpin region for the same rows; labels: specifier hairpin, anti-anti (specifier) codon, SC. Specifier amino acid of each T-box:
Aminoacyl-tRNA synthetases: SA serS Ser; DHA tyrZ Tyr; ST trpS Trp; CA aspS Asp; DF valS Val; PN thrS Thr; MN ileS Ile; DF leuS Leu; HD argS Arg; DF proS Pro; ZC lysS Lys; BQ metS Met; MN pheS Phe; MN glyQ Gly; ST alaS Ala.
Amino acid biosynthetic genes: SA trpE Trp; BS ilvB Leu; CA ilvC Val; BQ asnA Asn; BS proB Pro; SA cysE Cys; MN hisC His; DHA pheA Phe; HD serA Ser; BQ phhA Tyr; EF yxjH Met.
Amino acid transporters: CA yckK Cys; DF yqiX Arg; HD BH0807 Lys; EF yheL Tyr; BQ ykbA Thr; BQ sdt2 Trp; EF yusC Met; CA yhaG Trp; BQ brnQ Ile; REF01723 His; BS yvbW Leu.]
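Locating the conserved T-box core in a candidate region can be sketched as a mismatch count against a consensus. The sketch assumes the consensus AGGGTGGNACCGCG, where N matches any base. This string agrees with most rows of the alignment above and is an assumption of the sketch, not a parameter from the original analysis. The fragments are the 5′ ends of three rows (SA serS, MN ileS, DF leuS), copied without gaps.

```python
CONSENSUS = "AGGGTGGNACCGCG"  # assumed T-box core; N matches any base


def mismatches(window, consensus=CONSENSUS):
    """Count positions where the window differs from a non-N consensus base."""
    return sum(1 for w, c in zip(window, consensus) if c != "N" and w != c)


def best_hit(seq, consensus=CONSENSUS):
    """Return (start position, mismatches) of the best-matching window;
    ties are resolved in favour of the leftmost window."""
    k = len(consensus)
    return min(((i, mismatches(seq[i:i + k], consensus))
                for i in range(len(seq) - k + 1)),
               key=lambda hit: hit[1])


ROWS = {
    "SA serS": "AAATAGGGTGGCAACGCGTAGAC",
    "MN ileS": "AATTAAGGTGGTACCACGAGC",
    "DF leuS": "AATTAGGGTGGTACCGCGAAGATT",
}
for name, seq in ROWS.items():
    print(name, best_hit(seq))
```

A genome-wide search would use a profile (position weight matrix or covariance model) rather than a single consensus. It would also check the antiterminator and terminator structure downstream, which this sketch ignores.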
~800 T-boxes in ~90 bacteria
• Firmicutes
– aa-tRNA synthetases
– enzymes
– transporters
– all amino acids excluding glutamine, glutamate, lysine
• Actinobacteria (regulation of translation – predicted)
– branched-chain (ileS)
– aromatic (Atopobium minutum)
• Delta-proteobacteria
– branched-chain (leu – enzymes)
• Thermus/Deinococcus group (aa-tRNA synthetases)
– branched-chain (ileS, valS)
– glycine
• Chloroflexi, Dictyoglomi
– aromatic (trp – enzymes)
– branched-chain (ileS)
– threonine
Recent duplications and bursts: ARG-T-box in Clostridium difficile
[Figure: tree of ARG-specific T-box regulatory sites upstream of argS genes in Lactobacillales, Clostridiales, and Bacillales, with new C. difficile T-boxes (marked NEW) upstream of yqiXYZ and arg genes. C. difficile panel: argS, yqiXYZ, RDF02391, argCJBDF, argH, argG, and others, classed as aminoacyl-tRNA synthetase, amino acid biosynthetic genes, and predicted amino acid transporters.]
Gram+ bacteria: AhrC regulatory protein (negative regulation of arginine biosynthesis, positive regulation of arginine catabolism), binding to the 5’ UTR of the regulated genes.
Clostridium difficile: AhrC is lost; expansion of the T-box regulon: the arginine biosynthetic and transport genes are regulated by T-box antitermination.
[Figure: yqiXYZ, argC, argH, and argG in C. difficile and in other Clostridia spp. (CA, CTC, CTH, CPE, CB, CPE); markers: AhrC binding site; ARG-specific T-box regulatory site.]
ASN/ASP/HIS T-boxes: duplications and changes in specificity
[Figure: tree of HIS-, ASP-, and ASN-specific T-boxes in Bacillales, Clostridiales, Lactobacillales, and other Gram-positive bacteria, regulating hisS-aspS, the his operon, hisXYZ, aspS, asnS, asnA, and glnQHMP. Blow-up of Lactobacillales (including P. pentosaceus, L. reuteri, and L. johnsonii): specifier codons AAC (ASN), CAC (HIS), and GAC (ASP); marked events: disruption of the hisS-aspS operon; mutation of the regulatory codon.]
Branched-chain amino acids: duplications and changes in specificity
[Figure: tree of LEU-, ILE-, and VAL-specific T-boxes in Firmicutes (Bacillales, Clostridiaceae, Lactobacillaceae, and other Lactobacillales) and δ-proteobacteria, regulating leuS, ileS, valS, the leu and ilv operons, ilvC, IlvBN, IlvCB, brnQ, yvbW, opp, panE, and lp3666. Marked events: T-box duplication and mutation of the regulatory codon; recent T-box duplication and mutation of the regulatory codon. Blow-up: a transporter with ATC (Ile) and GTC (Val) codons; dual regulation of common enzymes with ATC (Ile) and CTC (Leu) codons.]
Same enzymes – different regulators (common part of the aromatic amino acid biosynthesis pathway)
[Figure: pathway from PEP and E4P through DAHP, shikimate, and chorismate to anthranilate, ADC and folate, and the kynurenine pathway. Genes: aroA, aroB, aroC, aroD, aroE, aroF, aroH, aroI, trpE, trpG, trpDCFBA, tyrA, pheB, hisC, aspB, phhA, pabA, pabB. Transporters with their specificity: TRP yhaG, TRP trpXYZ, TRP/PHE yocR family, TYR yheL.]
aro genes:
Regulated by TYR (BC)
Regulated by PHE (SWO, DRE, HMO, CH, MTH, CTH)
Regulated by TRP (DE, DEH)
cf. E. coli: AroF,G,H: feedback inhibition by TRP, TYR, PHE; transcriptional regulation by TrpR, TyrR
S-box (SAM riboswitch)
[Figure: conserved secondary structure of the S-box with helices P1–P5, the base stem, and the 5' and 3' ends (Grundy and Henkin, 1998)]
S-box riboswitch: regulator of methionine biosynthesis
Firmicutes:
Loss of S-boxes
Lactobacillales: Met-T-box
Streptococcales: MtaR (transcription factor); SAM-III riboswitch (metK) (the Henkin group)
Bacillales: S-box
Clostridiales: S-box
Proteobacteria:
Xanthomonas: S-box
E. coli: TFs
alphas: SAM-II
Geobacter: S-box
Other genomes with S-boxes: the Zoo
• Petrotoga
• actinobacteria (Streptomyces, Thermobifida)
• Chlorobium, Chloroflexus, Cytophaga
• Fusobacterium
• Deinococcus
Need more genomes
Acknowledgements
• Andrei A. Mironov (algorithms and software)
• Alexandra B. Rakhmaninova (SDPs)
• Olga Kalinina (SDPs/LacI)
• Olga Laikova (LacI, sugars)
• Dmitry Ravcheev (FruR)
• Dmitry Rodionov (now at Burnham Institute) (NrdR, iron)
• Alexei Vitreschak (RNA)
• Leonid Mirny, MIT (protein/DNA contacts, SDPs)
• Andy Johnston, University of East Anglia (iron)
• Howard Hughes Medical Institute
• Russian Fund of Basic Research
• Russian Academy of Sciences, program “Molecular and Cellular Biology”
• INTAS