Evolution of regulatory interactions in bacteria

Mikhail Gelfand
Research and Training Center “Bioinformatics”,
Institute for Information Transmission Problems, RAS
Moscow, Russia
Singapore, 17-18 July 2006
Comparative genomics of regulation
• Why
– Functional annotation of genes
– Metabolic modeling
– Practical applications in genetic engineering, drug targeting etc.
• How
– Close genomes: phylogenetic footprinting.
Regulatory sites are seen as conservation islands in alignments of gene
upstream regions
– Distant genomes: consistency filtering.
Candidate sites in one genome may be unreliable, but independent occurrence
upstream of orthologous genes in many genomes yields reliable predictions
• Caveats
– Presence of (predicted) binding sites does not immediately imply functional regulation
– Operon structure
– Need to verify presence of orthologous transcription factors in the studied genomes
– Orthologous factors may have different binding motifs
– One functional system may be regulated by different factors within and between genomes
• Many genomes
– Taxon-specific regulation
– Evolution
• individual sites
• transcription-factor families
• transcription factors and their binding motifs
• simple and complex regulatory systems
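The consistency-filtering idea for distant genomes can be sketched in a few lines. This is only an illustration under stated assumptions: the genome names, ortholog-group IDs, and the `min_genomes` threshold are hypothetical, and the input is assumed to be a precomputed mapping from each genome to the ortholog groups that have a candidate site upstream.

```python
from collections import defaultdict

def consistency_filter(candidate_sites, min_genomes=3):
    """Keep ortholog groups with a candidate site in at least `min_genomes` genomes.

    candidate_sites maps a genome name to the set of ortholog-group IDs whose
    upstream regions contain a candidate site (all names are hypothetical).
    """
    support = defaultdict(set)
    for genome, groups in candidate_sites.items():
        for group in groups:
            support[group].add(genome)
    # A group is retained only when enough genomes independently support it.
    return {g: sorted(gs) for g, gs in support.items() if len(gs) >= min_genomes}

# Toy input, invented for illustration only.
toy = {
    "genomeA": {"bioB", "bioY", "xyz1"},
    "genomeB": {"bioB", "bioY"},
    "genomeC": {"bioB", "abc2"},
    "genomeD": {"bioB", "bioY", "abc2"},
}
reliable = consistency_filter(toy, min_genomes=3)
print(reliable)
```

Groups seen in only one or two genomes (`xyz1`, `abc2`) are dropped as likely false positives; the threshold would need tuning for a real data set.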
How it works: Two simple examples
• Biotin regulator of alpha-proteobacteria
• Universal regulator of ribonucleotide
reductases: reconstruction of the
regulatory system and the mechanism of
regulation
BirA (biotin regulator in eubacteria and archaea):
conserved signal, changed spacing
Profile 1: Gram-positive bacteria, Archaea
Profile 2: Gram-negative bacteria
BirA of alpha-proteobacteria: no DNA-binding domain
Identification of the candidate regulator (BioR) in alpha-proteobacteria
1. Candidate binding sites: similar palindromes upstream of biotin biosynthesis and transport genes in different genomes
TTATAGATAA
TTATCTATAA
TTATAGATAg
TTATCTATAA
TTATCTATAA
TTATAGATAg
TTATCTATAA
TcATATATtA
TcATAGATAg
TTATCTATAA
TTATCTATAA
TTATCTATtA
TTATCTAcAA
TTATCTATAA
TTATCTATAA
TTATCTATAA
TcATAGATtA
cTATAGATAA
TTATCTAcAA
2. Positional clustering: a candidate transcription factor from the GntR family is often found in the same loci (black arrows)
3. Phyletic patterns: the phyletic distribution of candidate sites (red circles) exactly coincides with the phyletic distribution of the candidate regulator
4. Autoregulation: in many cases there are candidate sites upstream of the bioR gene itself
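As a small illustration of step 1, the candidate sites listed above can be summarized by a majority consensus and checked for reverse-complement symmetry. This is a sketch, not the tool used in the study; the function names are hypothetical, and the sites are uppercased before counting.

```python
from collections import Counter

# Candidate sites copied from the slide (lowercase marks deviations there).
SITES = """
TTATAGATAA TTATCTATAA TTATAGATAg TTATCTATAA TTATCTATAA TTATAGATAg
TTATCTATAA TcATATATtA TcATAGATAg TTATCTATAA TTATCTATAA TTATCTATtA
TTATCTAcAA TTATCTATAA TTATCTATAA TTATCTATAA TcATAGATtA cTATAGATAA
TTATCTAcAA
""".upper().split()

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq):
    """Reverse complement of a DNA string over A, C, G, T."""
    return seq.translate(COMPLEMENT)[::-1]

def consensus(sites):
    """Most frequent nucleotide in each column of equal-length sites."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*sites))

cons = consensus(SITES)
print(cons)
# The two common variants are the same site read from opposite strands.
print(revcomp("TTATAGATAA"))
# The 4-bp flanks of the consensus are reverse complements of each other.
print(cons[:4] == revcomp(cons[-4:]))
```

The list mixes both orientations of the site, so a real analysis would first orient the sites consistently before building a profile.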
Conserved signal upstream of nrd genes
Identification of the candidate regulator
by the analysis of phyletic patterns
• COG1327: the only COG with exactly the
same phylogenetic pattern as the signal
– “large scale” on the level of major taxa
– “small scale” within major taxa:
• absent in small parasites among alpha- and gamma-proteobacteria
• absent in Desulfovibrio spp. among delta-proteobacteria
• absent in Nostoc sp. among cyanobacteria
• absent in Oenococcus and Leuconostoc among Firmicutes
• present only in Treponema denticola among four spirochetes
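The phyletic-pattern search can be sketched as a comparison of presence/absence vectors. The genome names and COG IDs below are toy placeholders (not real data), and the input is assumed to be a precomputed presence table.

```python
def matching_cogs(signal_pattern, cog_patterns):
    """Return COG IDs whose presence/absence pattern equals that of the signal.

    signal_pattern: dict genome -> bool (signal found upstream of nrd genes).
    cog_patterns:   dict COG ID -> dict genome -> bool (COG member present).
    """
    genomes = sorted(signal_pattern)
    target = tuple(signal_pattern[g] for g in genomes)
    return [
        cog for cog, pattern in cog_patterns.items()
        if tuple(pattern.get(g, False) for g in genomes) == target
    ]

# Toy data: all names are hypothetical.
signal = {"g1": True, "g2": True, "g3": False, "g4": True, "g5": False}
cogs = {
    "COG_A": {"g1": True, "g2": True, "g3": True, "g4": True, "g5": False},
    "COG_B": {"g1": True, "g2": True, "g3": False, "g4": True, "g5": False},
    "COG_C": {"g1": False, "g2": True, "g3": False, "g4": True, "g5": False},
}
hits = matching_cogs(signal, cogs)
print(hits)
```

Exact matching is the strictest criterion; with incomplete genomes one might instead rank COGs by the number of mismatching genomes.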
COG1327 “Predicted transcriptional regulator,
consists of a Zn-ribbon and ATP-cone domains”:
regulator of the riboflavin pathway?
Additional evidence – 1
• nrdR is sometimes clustered with nrd genes or with replication genes dnaB, dnaI, polA
Additional evidence – 2
• In some genomes, candidate NrdR-binding sites are found upstream of other replication-related genes
– dNTP salvage
– topoisomerase I, replication initiator dnaA, chromosome partitioning, DNA helicase II
Multiple sites (nrd genes): FNR, DnaA, NrdR
Mode of regulation
• Repressor (overlaps with promoters)
• Co-operative binding:
– most sites occur in tandem (> 90% cases)
– the distance between the copies (centers of
palindromes) equals an integer number of DNA turns:
• mainly (94%) 30-33 bp, in 84% 31-32 bp – 3 turns
• 21 bp (2 turns) in Vibrio spp.
• 41-42 bp (4 turns) in some Firmicutes
• experimental confirmation in Streptomyces
(Borovok et al., 2004)
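The "integer number of DNA turns" argument is simple arithmetic, sketched here under the assumption of a helical repeat of about 10.5 bp per turn for B-form DNA (this value is not stated on the slide).

```python
BP_PER_TURN = 10.5  # assumed helical repeat of B-form DNA, in base pairs

def helical_phase(distance_bp, bp_per_turn=BP_PER_TURN):
    """Return (nearest whole number of turns, offset in bp from that number)."""
    nearest = round(distance_bp / bp_per_turn)
    offset_bp = distance_bp - nearest * bp_per_turn
    return nearest, offset_bp

# Center-to-center distances mentioned on the slide.
for d in (21, 30, 31, 32, 33, 41, 42):
    turns, offset = helical_phase(d)
    print(f"{d} bp: {turns} turns, offset {offset:+.1f} bp")
```

Distances of 31-32 bp fall within half a base pair of three full turns, which is consistent with the two sites lying on the same face of the helix.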
Evolutionary processes that
shape regulatory systems
• Expansion and contraction of regulons
• Duplications of regulators with or without
regulated loci
• Loss of regulators with or without
regulated loci
• Re-assortment of regulators and structural
genes
• … especially in complex systems
• Horizontal transfer
Loss of regulators, and cryptic sites
Loss of RbsR in Y. pestis
(the ABC transporter is also lost)
RbsR binding site
Start codon of rbsD
Regulon expansion:
how FruR has become CRA
[Figure: the same map of sugars and genes shown at three evolutionary stages: (1) Gamma-proteobacteria, common ancestor of Enterobacteriales; (2) Enterobacteriales, common ancestor of Escherichia and Salmonella; (3) Enterobacteriales, E. coli and Salmonella spp.
Sugars: Mannose, Glucose, Mannitol, Fructose.
Genes: manXYZ, ptsHI-crr, edd, epd, eda, adhE, aceEF, mtlA, gapA, fbp, pykF, mtlD, fruBA, fruK, pfkA, pgk, gpmA, icdA, ppsA, pckA, aceA, tpiA, aceB.]
Trehalose/maltose catabolism, alpha-proteobacteria
Duplicated LacI-family
regulators: lineage-specific
post-duplication loss
The binding signals are very similar (the blue branch is
somewhat different: to avoid cross-recognition?)
Utilization of an unknown galactoside,
gamma-proteobacteria
Yersinia and Klebsiella: two regulons, GalR (not shown, includes genes galK and galT) and LacI-X
Erwinia: one regulon, GalR
Loss of regulator and merger of regulons: it seems that lacI-X was present in the common ancestor (Klebsiella is an outgroup)
Utilization of maltose/maltodextrin, Firmicutes
Two different ABC transporters (shades of red)
PTS (pink)
Glucoside hydrolases (shades of green)
Two regulators (black and grey)
Modularity of the functional subsystem
Two different ABC systems
Three hydrolases in one operon (E. faecalis) or separately
Changes of regulation
Displacement: invasion of a regulator from a
different subfamily (horizontal transfer from a
related species?) – blue sites
Orthologous TFs with completely different regulons (alpha-proteobacteria and Xanthomonadales)
Catabolism of gluconate, proteobacteria
extreme variability of regulation of “marginal” regulon members
[Figure panels: β, Pseudomonas spp., γ]
Combined regulatory network for iron homeostasis genes in α-proteobacteria
[Figure labels: regulators RirA, Irr, Fur, IscR, shown under [-Fe] and [+Fe] conditions; FeS; heme; degraded; Fe; FeS status of cell. Regulated modules: siderophore uptake, Fe2+/Fe3+ uptake, iron uptake systems, iron storage ferritins, FeS synthesis, heme synthesis, iron-requiring enzymes [iron cofactor], transcription factors.]
The connecting lines denote regulatory interactions, with line thickness reflecting the frequency of the interaction in the analyzed genomes. The suggested negative or positive mode of operation is shown by a dead end or an arrowhead at the end of the line, respectively.
Fe and Mn regulons
Distribution of Irr, Fur/Mur, MntR, RirA, and IscR regulons in α-proteobacteria
[Table: one row per genome, with columns Organism, Abb., Irr, MUR / FUR, MntR, RirA, IscR; entries are "+", "-", "+ +", or "#?". Genomes are divided into groups A, B, C, and D.]
Taxon labels: Rhizobiales (Rhizobiaceae, Bartonella, Bradyrhizobiaceae), Rhodobacterales (Rhodobacteraceae), Hyphomonadaceae, Caulobacterales, Parvularculales, Sphingomonadales, Rhodospirillales, SAR11 cluster, Rickettsiales.
Genomes (abbreviation): Sinorhizobium meliloti (SM), Rhizobium leguminosarum (RL), Rhizobium etli (RHE), Agrobacterium tumefaciens (AGR), Mesorhizobium loti (ML), Mesorhizobium sp. BNC1 (MBNC), Brucella melitensis (BME), Bartonella quintana and spp. (BQ), Bradyrhizobium japonicum (BJ), Rhodopseudomonas palustris (RPA), Nitrobacter hamburgensis (Nham), Nitrobacter winogradskyi (Nwi), Rhodobacter capsulatus (RC), Rhodobacter sphaeroides (Rsph), Silicibacter sp. TM1040 (STM), Silicibacter pomeroyi (SPO), Jannaschia sp. CC51 (Jann), Rhodobacterales bacterium HTCC2654 (RB2654), Roseobacter sp. MED193 (MED193), Roseovarius nubinhibens ISM (ISM), Roseovarius sp. 217 (ROS217), Loktanella vestfoldensis SKA53 (SKA53), Sulfitobacter sp. EE-36 (EE36), Oceanicola batsensis HTCC2597 (OB2597), Oceanicaulis alexandrii HTCC2633 (OA2633), Caulobacter crescentus (CC), Parvularcula bermudensis HTCC2503 (PB2503), Erythrobacter litoralis (ELI), Novosphingobium aromaticivorans (Saro), Sphingopyxis alaskensis RB2256 (Sala), Zymomonas mobilis (ZM), Gluconobacter oxydans (GOX), Rhodospirillum rubrum (Rrub), Magnetospirillum magneticum (Amb), Pelagibacter ubique HTCC1002 (PU1002), Rickettsia and Ehrlichia species.
'#?' in the RirA column denotes the absence of the rirA gene in an unfinished genomic sequence and the presence of candidate RirA-binding sites upstream of the iron uptake genes.
Phylogenetic tree of the Fur family of transcription factors in α-proteobacteria - I
[Figure: phylogenetic tree; leaves are labeled with locus tags and organism names. Reference clades: Fur in γ- and β-proteobacteria (Escherichia coli sp|P0A9A9, Pseudomonas aeruginosa sp|Q03456, Neisseria meningitidis sp|P0A0S7); Fur in ε-proteobacteria (Helicobacter pylori sp|O25671); Fur in Firmicutes (Bacillus subtilis sp|P54574). Clades in α-proteobacteria: Mur, regulator of manganese uptake genes (sit, mntH); Fur, regulator of iron uptake and metabolism genes; Irr.]
Identified Mur-binding sites: the A, B, and C groups of α-proteobacteria
Sequence logos for the identified Fur-binding sites in the D group of α-proteobacteria
Sequence logos for the known Fur-binding sites in Escherichia coli and Bacillus subtilis
Phylogenetic tree of the Fur family of transcription factors in α-proteobacteria - II
[Figure: phylogenetic tree; leaves are labeled with locus tags and organism names, some marked with an asterisk. Reference clades: Fur in γ- and β-proteobacteria (Escherichia coli sp|P0A9A9, Pseudomonas aeruginosa sp|Q03456, Neisseria meningitidis sp|P0A0S7); Fur in ε-proteobacteria (Helicobacter pylori sp|O25671); Fur in Firmicutes (Bacillus subtilis sp|P54574). Clades in α-proteobacteria: Mur / Fur; Irr, regulator of iron homeostasis.]
Sequence logos for the identified Irr-binding sites in α-proteobacteria
The A group (8 species) - Irr
The B group (4 species) - Irr
The C group (12 species) - Irr
Phylogenetic tree of the Rrf2 family of transcription factors in α-proteobacteria
[Figure: phylogenetic tree; leaves are labeled with locus tags. Annotated clades: NsrR, nitrite/NO-sensing regulator (Nitrosomonas europaea, Escherichia coli); RirA, iron repressor (Rhizobium leguminosarum); CymR, cysteine metabolism repressor (Bacillus subtilis); Rrf2, cytochrome complex regulator (Desulfovibrio vulgaris); IscR, iron-sulfur cluster synthesis repressor (Escherichia coli); IscR-II.]
Positional clustering of rrf2-like genes with:
– iron uptake and storage genes;
– Fe-S cluster synthesis operons;
– genes involved in nitrosative stress protection;
– sulfate uptake/assimilation genes;
– thioredoxin reductase;
– carboxymuconolactone decarboxylase-family genes;
– hmc cytochrome operon
Legend:
– proteins with the conserved C-X(6-9)-C(4-6)-C motif within effector-responsive domain
– proteins without a cysteine triad motif
Sequence logos for the identified RirA-binding sites in α-proteobacteria
The A group - RirA (8 species)
The C group - RirA (12 species)
Distribution of the conserved members of the Fe- and Mn-responsive regulons and the predicted RirA, Fur/Mur, Irr, and DtxR binding sites in α-proteobacteria
Gene functions:
Iron uptake
Iron storage
FeS synthesis
Iron usage
Heme biosynthesis
Regulatory genes
Manganese uptake
An attempt to reconstruct the history
Regulators and their signals
• Subtle changes at close evolutionary
distances
• Cases of motif conservation at surprisingly
large distances
• Correlation between contacting
nucleotides and amino acid residues
DNA signals and protein-DNA interactions
Entropy at aligned sites and the number of contacts
(heavy atoms in a base pair at a distance <cutoff from a protein atom)
CRP
PurR
IHF
TrpR
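The per-position entropy used in this comparison can be computed as Shannon entropy over the columns of aligned sites. The sketch below uses a toy alignment and a hypothetical function name; it does not reproduce the contact counting from protein-DNA structures.

```python
import math
from collections import Counter

def column_entropy(sites):
    """Shannon entropy in bits for each column of equal-length aligned sites."""
    entropies = []
    for col in zip(*sites):
        total = len(col)
        # Sum of p * log2(1/p) over the nucleotides observed in this column.
        h = sum((c / total) * math.log2(total / c) for c in Counter(col).values())
        entropies.append(h)
    return entropies

# Toy alignment, invented for illustration only.
toy_sites = ["ACGT", "ACGA", "ACTC", "AGAG"]
entropies = column_entropy(toy_sites)
print([round(h, 3) for h in entropies])
```

A fully conserved column has entropy 0 and a column with four equally frequent nucleotides has entropy 2 bits; the slide relates low entropy to a high number of protein contacts.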
Specificity-determining positions
in the LacI family
• Training set: 459 sequences, average length: 338 amino acids, 85 specificity groups
– 44 SDPs
• 10 residues contact NPF (analog of the effector)
• 7 residues in the effector contact zone (5 Å < dmin < 10 Å)
• 6 residues in the intersubunit contacts
• 5 residues in the intersubunit contact zone (5 Å < dmin < 10 Å)
• 7 residues contact the operator sequence
• 6 residues in the operator contact zone (5 Å < dmin < 10 Å)
LacI from E. coli
The LacI family:
subtle changes in signals at close distances
CRP/FNR family of regulators
CooA (Desulfovibrio): TGTCGGCnnGCCGACA
CRP (Gamma-proteobacteria): TTGTGAnnnnnnTCACAA
HcpR (Desulfovibrio): TTGTgAnnnnnnTcACAA
FNR (Gamma-proteobacteria): TTGATnnnnATCAA
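All four consensus signals on this slide are palindromes in the reverse-complement sense, which a short check makes explicit. This is a minimal sketch; treating `n` as self-complementary and ignoring letter case are assumptions of the example.

```python
# 'N' (any nucleotide) is treated as its own complement.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def is_palindromic(motif):
    """True if the motif equals its own reverse complement (case-insensitive)."""
    seq = motif.upper()
    return seq == seq.translate(COMPLEMENT)[::-1]

MOTIFS = [
    "TGTCGGCnnGCCGACA",
    "TTGTGAnnnnnnTCACAA",
    "TTGATnnnnATCAA",
    "TTGTgAnnnnnnTcACAA",
]
results = {m: is_palindromic(m) for m in MOTIFS}
print(results)
```

Palindromic symmetry of this kind is expected for factors that bind DNA as homodimers, each subunit contacting one half-site.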
Correlation between contacting
nucleotides and amino acid residues
• CooA in Desulfovibrio spp.
• CRP in Gamma-proteobacteria
• HcpR in Desulfovibrio spp.
• FNR in Gamma-proteobacteria
Factor  Genome  Aligned sequence
COOA    DD      ALTTEQLSLHMGATRQTVSTLLNNLVR
COOA    DV      ELTMEQLAGLVGTTRQTASTLLNDMIR
CRP     EC      KITRQEIGQIVGCSRETVGRILKMLED
CRP     YP      KXTRQEIGQIVGCSRETVGRILKMLED
CRP     VC      KITRQEIGQIVGCSRETVGRILKMLEE
HCPR    DD      DVSKSLLAGVLGTARETLSRALAKLVE
HCPR    DV      DVTKGLLAGLLGTARETLSRCLSRMVE
FNR     EC      TMTRGDIGNYLGLTVETISRLLGRFQK
FNR     YP      TMTRGDIGNYLGLTVETISRLLGRFQK
FNR     VC      TMTRGDIGNYLGLTVETISRLLGRFQK
Contacting residues: REnnnR
TG: 1st arginine
GA: glutamate and 2nd arginine
TGTCGGCnnGCCGACA
TTGTGAnnnnnnTCACAA
TTGTgAnnnnnnTcACAA
TTGATnnnnATCAA
The correlation holds for other factors in the family
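The residues at the three REnnnR positions can be read directly from the aligned sequences above. In this sketch the 0-based positions 14, 15, and 19 are an assumption inferred from where "RETVGR" falls in the CRP rows, and the pairing of factor labels with sequences follows their order on the slide.

```python
# (factor, genome abbreviation, aligned sequence), in slide order.
HELICES = [
    ("CooA", "DD", "ALTTEQLSLHMGATRQTVSTLLNNLVR"),
    ("CooA", "DV", "ELTMEQLAGLVGTTRQTASTLLNDMIR"),
    ("CRP", "EC", "KITRQEIGQIVGCSRETVGRILKMLED"),
    ("CRP", "YP", "KXTRQEIGQIVGCSRETVGRILKMLED"),
    ("CRP", "VC", "KITRQEIGQIVGCSRETVGRILKMLEE"),
    ("HcpR", "DD", "DVSKSLLAGVLGTARETLSRALAKLVE"),
    ("HcpR", "DV", "DVTKGLLAGLLGTARETLSRCLSRMVE"),
    ("FNR", "EC", "TMTRGDIGNYLGLTVETISRLLGRFQK"),
    ("FNR", "YP", "TMTRGDIGNYLGLTVETISRLLGRFQK"),
    ("FNR", "VC", "TMTRGDIGNYLGLTVETISRLLGRFQK"),
]

# Assumed 0-based alignment columns of R, E, and the second R in "REnnnR".
CONTACT_POSITIONS = (14, 15, 19)

def contact_residues(seq, positions=CONTACT_POSITIONS):
    """Concatenate the residues found at the assumed contacting positions."""
    return "".join(seq[i] for i in positions)

by_factor = {}
for factor, genome, seq in HELICES:
    by_factor.setdefault(factor, set()).add(contact_residues(seq))
print(by_factor)
```

CRP and HcpR share the same triple and nearly the same signal, whereas CooA and FNR differ at these positions and recognize different signals, which is the correlation the slide points to.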
Open problems
• Model the evolution of regulatory systems (a catalog of
elementary events, estimates of probabilities)
– Birth of a binding site; what are the mechanisms?
– Loss of a binding site
– Duplication of a regulated gene and/or a regulator
– Horizontal transfer of a regulated gene and/or a regulator
– Loss of a structural gene and/or a regulator
– General properties?
• Distribution of TF family and regulon sizes
• Stable cores and flexible margins of functional systems (in terms of gene
presence and regulation)
• Co-evolution of TFs and DNA sites:
– “Neutral” model for the evolution of binding sites (with invariant functional
pressure from the bound protein)
– How do the signals evolve? What is the driving force – changes in TFs?
– TF-family, position-specific protein-DNA recognition code?
All of this needs to take into account the incompleteness of, and noise in, the data
Acknowledgements
• Andrei A. Mironov (algorithms and software)
• Alexandra B. Rakhmaninova (SDPs)
• Dmitry Rodionov (now at Burnham Institute) (BioR,
NrdR, iron)
• Olga Laikova (LacI, sugars)
• Dmitry Ravcheev (FruR)
• Olga Kalinina (SDPs/LacI)
• Leonid Mirny, MIT (protein/DNA contacts, SDPs)
• Andy Johnston, University of East Anglia (iron)
• Howard Hughes Medical Institute
• Russian Fund of Basic Research
• Russian Academy of Sciences, program “Molecular and Cellular Biology”
• INTAS