Riboswitches: the oldest regulatory system?


Evolution of bacterial regulatory systems
Mikhail Gelfand
Research and Training Center “Bioinformatics”
Institute for Information Transmission Problems
Moscow, Russia
CASB-20, UCSD, La Jolla, 13-14.III.2009
Plan
• Co-evolution of transcription factors
and their binding motifs
• Evolution of regulatory systems and
regulons
Regulators and their motifs
• Cases of motif conservation at
surprisingly large distances
• Subtle changes at close evolutionary
distances
• Correlation between contacting
nucleotides and amino acid residues
NrdR (regulator of ribonucleotide reductases
and some other replication-related genes):
conservation at large distances
DNA motifs and protein-DNA interactions
Entropy at aligned sites and the number of contacts
(heavy atoms in a base pair at a distance <cutoff from a protein atom)
CRP
PurR
IHF
TrpR
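The "entropy at aligned sites" quantity on this slide can be illustrated with a minimal sketch. It assumes Shannon entropy in bits computed per column of ungapped, equal-length sites; the function name and the toy alignment are hypothetical and are not data from the talk.

```python
import math
from collections import Counter


def column_entropies(sites):
    """Shannon entropy (bits) of every column of equal-length aligned sites."""
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all sites must have the same length")
    entropies = []
    for column in zip(*sites):
        counts = Counter(base.upper() for base in column)
        total = sum(counts.values())
        # Sum of -p * log2(p) over the bases observed in this column.
        entropies.append(
            sum(-(c / total) * math.log2(c / total) for c in counts.values())
        )
    return entropies


# Hypothetical toy alignment: only the third column varies.
toy_sites = ["TGTGA", "TGTGA", "TGAGA", "TGCGA"]
print([round(h, 3) for h in column_entropies(toy_sites)])
```

Low-entropy columns are the conserved positions; the slide relates them to the number of protein contacts per base pair.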
The CRP/FNR family of regulators
CooA (Desulfovibrio): TGTCGGCnnGCCGACA
CRP (Gamma-proteobacteria): TTGTGAnnnnnnTCACAA
FNR (Gamma-proteobacteria): TTGATnnnnATCAA
HcpR (Desulfovibrio): TTGTgAnnnnnnTcACAA
Correlation between contacting
nucleotides and amino acid residues
Contacting residues: REnnnR
TG: 1st arginine
GA: glutamate and 2nd arginine

Regulator  Taxon                     Sequence                      Motif
CooA       Desulfovibrio spp.        ALTTEQLSLHMGATRQTVSTLLNNLVR   TGTCGGCnnGCCGACA
CooA       Desulfovibrio spp.        ELTMEQLAGLVGTTRQTASTLLNDMIR   TGTCGGCnnGCCGACA
CRP        Gamma-proteobacteria      KITRQEIGQIVGCSRETVGRILKMLED   TTGTGAnnnnnnTCACAA
CRP        Gamma-proteobacteria      KXTRQEIGQIVGCSRETVGRILKMLED   TTGTGAnnnnnnTCACAA
CRP        Gamma-proteobacteria      KITRQEIGQIVGCSRETVGRILKMLEE   TTGTGAnnnnnnTCACAA
HcpR       Desulfovibrio spp.        DVSKSLLAGVLGTARETLSRALAKLVE   TTGTgAnnnnnnTcACAA
HcpR       Desulfovibrio spp.        DVTKGLLAGLLGTARETLSRCLSRMVE   TTGTgAnnnnnnTcACAA
FNR        Gamma-proteobacteria      TMTRGDIGNYLGLTVETISRLLGRFQK   TTGATnnnnATCAA
FNR        Gamma-proteobacteria      TMTRGDIGNYLGLTVETISRLLGRFQK   TTGATnnnnATCAA
FNR        Gamma-proteobacteria      TMTRGDIGNYLGLTVETISRLLGRFQK   TTGATnnnnATCAA
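To see how the REnnnR contacting positions vary across the family, here is a small sketch that reads the three contacting columns out of the sequences listed above. It assumes the fragments are aligned without gaps (they all have the same length) and locates the columns by matching `RE...R` in the CRP sequence; the function name and this way of anchoring the columns are illustrative assumptions, not the method used in the talk.

```python
import re

# One representative fragment per regulator, copied from the alignment above.
HELICES = {
    "CooA": "ALTTEQLSLHMGATRQTVSTLLNNLVR",
    "CRP": "KITRQEIGQIVGCSRETVGRILKMLED",
    "HcpR": "DVSKSLLAGVLGTARETLSRALAKLVE",
    "FNR": "TMTRGDIGNYLGLTVETISRLLGRFQK",
}


def contact_residues(helices, reference="CRP", pattern=r"RE...R"):
    """Residues at the three REnnnR columns, located in the reference sequence."""
    match = re.search(pattern, helices[reference])
    if match is None:
        raise ValueError("pattern not found in the reference sequence")
    start = match.start()
    # Columns of the 1st arginine, the glutamate and the 2nd arginine.
    columns = (start, start + 1, start + 5)
    return {name: "".join(seq[c] for c in columns) for name, seq in helices.items()}


for name, residues in contact_residues(HELICES).items():
    print(name, residues)
```

Comparing the extracted residue triplets with the motif column of the table is the kind of residue-to-nucleotide comparison this slide describes.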
The correlation holds for other factors in the family
The LacI family:
subtle changes in motifs at close distances
The LacI family: systematic analysis
• 1369 DNA-binding domains in 200 orthologous groups, <Id>=35%, <L>=71 aa
• 4484 binding sites, L=20 nt, <Id>=45%
• Calculate mutual information between columns of
TF and site alignments
• Set threshold on mutual information of correlated
pairs
Definitions
Protein alignment (columns indexed by i)
LAFDHDQILQMAQERLQGKVRYQP-IGFELLPEKFSLRQLQRMYETVLGRS---LDKRNF
LAFDHNQILDYGYQRLRNKLEYSP-IAFEVLPELFTLNDLFQLYTTVLGED--FADYSNF
LSFDHNEILAYGHRRLRNKLEYSP-VAFEVLPEMFTLNDLYQLYTTVLGEN--FSDYSNF
LSFDHNEILAYGHRRLRNKLEYSP-VAFEVLPEMFTLNDLYQLYTTVLGEN--FSDYSNF
LAFDHSKILAYGHRRLCNKLEYSP-VAFDVLPEYFTLNDLYQFYSTVLGAN--FSDYSNF
LAFDHSKILAYGHRRLCNKLEYSP-VAFDVLPEYFTLNDLYQFYSTVLGAN--FSDYSNF
LAFDHSKILAYGHRRLCNKLEYSP-VAFDVLPEYFTLNDLYQFYSTVLGAN--FSDYSNF
LAFDHNQILDYGYQRLRNKLEYSP-IAFEVLPELFTLNDLFQLYTTVLGED--FADYSNF
LSFDHNEILAYGHRRLRNKLEYSP-VAFEVLPEMFTLNDLYQLYTTVLGEN--FSDYSNF
LSFDHNEILAYGHRRLRNKLEYSP-VAFEVLPEMFTLNDLYQLYTTVLGEN--FSDYSNF
Sites (columns indexed by j)
tTAaTGgCTTTAtGcCACTAT
TTAaaGTAAtAaTTACCATAA
AaAtTGTCTTTAtGcCACTAT
TTATGGTAAATTcTACCATAA
TTATGGTAAATTcTACCATAA
TTATgGTCAgTTTcACcAaAA
TTaGTCgAAATAaccaACtAA
TTATCGTCAtCtcGACGACAA
TttAGGTAAgTTATACTTTTA
tTAaTGgCTTTAtGcCACTAT
Mutual information
\[ I_{i,j} = \sum_{n=1}^{4} \sum_{a=1}^{20} p_{i,j}(a,n) \log \frac{p_{i,j}(a,n)}{p_i(a)\, p_j(n)} \]
Z-score
\[ Z_{i,j} = \frac{I_{i,j} - E(\tilde{I}_{i,j})}{\sigma(\tilde{I}_{i,j})} \]
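The mutual information and Z-score defined on this slide can be sketched as follows. The slide does not say how the tilde quantity is obtained; this sketch assumes it is the mutual information of the same two columns after randomly shuffling the pairing between proteins and sites, and it uses base-2 logarithms. Function names, the shuffle count and the toy columns are hypothetical.

```python
import math
import random
from collections import Counter


def mutual_information(aa_column, nt_column):
    """Mutual information (bits) between a protein column and a site column."""
    n = len(aa_column)
    joint = Counter(zip(aa_column, nt_column))
    aa_counts = Counter(aa_column)
    nt_counts = Counter(nt_column)
    mi = 0.0
    for (aa, nt), count in joint.items():
        p_joint = count / n
        p_indep = (aa_counts[aa] / n) * (nt_counts[nt] / n)
        mi += p_joint * math.log2(p_joint / p_indep)
    return mi


def z_score(aa_column, nt_column, n_shuffles=1000, seed=0):
    """Z-score of the observed MI against a shuffled-pairing null (assumption)."""
    rng = random.Random(seed)
    observed = mutual_information(aa_column, nt_column)
    shuffled = list(nt_column)
    null = []
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)  # break the protein-site pairing
        null.append(mutual_information(aa_column, shuffled))
    mean = sum(null) / n_shuffles
    sd = math.sqrt(sum((x - mean) ** 2 for x in null) / n_shuffles)
    return (observed - mean) / sd if sd > 0 else float("nan")


# Hypothetical toy columns: residue S always pairs with G, residue A with T.
aa_col = "SSSSAAAA"
nt_col = "GGGGTTTT"
print(mutual_information(aa_col, nt_col), z_score(aa_col, nt_col, n_shuffles=200))
```

Pairs of columns whose Z-score exceeds a chosen threshold would then be reported as correlated pairs; the threshold itself is not given in the slides.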
Correlated pairs
Higher order correlations
-ATIKDVAKRANVSTTTV- AATTGTGAGCGCTCACT
SL
SQ
TL
TQ
Not a phylogenetic trace
[Figure: tree of LacI-family groups. Leaf labels such as 41_SR_GGA, 36[10]_AR_GAA and 2[54]_MR_GTT_GAT pair protein residues (SR, AR, TR, MR) with site nucleotides.]
NrtR (regulator of NAD metabolism)
Comparison with the recently solved structure:
correlated positions indeed bind the DNA
(more exactly, form a hydrophobic cluster)
Catalog of events
• Expansion and contraction of regulons
• New regulators (where from?)
• Duplications of regulators with or without
regulated loci
• Loss of regulators with or without regulated
loci
• Re-assortment of regulators and structural
genes
• … especially in complex systems
• Horizontal transfer
Regulon expansion, or
how FruR has become CRA
• CRA (a.k.a. FruR) in Escherichia coli:
– global regulator
– well-studied in experiment
(many regulated genes known)
• Going back in time: looking for candidate
CRA/FruR sites upstream of (orthologs of)
genes known to be regulated in E. coli
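A search for candidate sites upstream of orthologous genes can be sketched with a position weight matrix scan. The talk does not name the search method, so the log-odds scoring, pseudocount, uniform background, threshold, function names and the toy sequences below are all assumptions for illustration; the training sites are invented and are not real CRA/FruR sites.

```python
import math


def build_pwm(sites, pseudocount=0.5, background=0.25):
    """Log-odds position weight matrix from equal-length training sites."""
    length = len(sites[0])
    pwm = []
    for pos in range(length):
        column = [site[pos].upper() for site in sites]
        total = len(column) + 4 * pseudocount
        pwm.append({
            base: math.log2(((column.count(base) + pseudocount) / total) / background)
            for base in "ACGT"
        })
    return pwm


def scan(sequence, pwm, threshold):
    """Return (start, window, score) for every window scoring >= threshold."""
    seq = sequence.upper()
    width = len(pwm)
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        if any(base not in "ACGT" for base in window):
            continue  # skip windows with ambiguous characters
        score = sum(pwm[i][base] for i, base in enumerate(window))
        if score >= threshold:
            hits.append((start, window, round(score, 2)))
    return hits


# Hypothetical training sites and upstream region (not real data).
toy_sites = ["GCTGAA", "GCTGAA", "GCTGAT", "ACTGAA"]
toy_upstream = "TTTTGCTGAAGGGG"
print(scan(toy_upstream, build_pwm(toy_sites), threshold=5.0))
```

In a comparative setting the same scan would be repeated on the upstream regions of orthologs in each genome, and a gene would be assigned to the ancestral regulon only if candidate sites are found consistently.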
Common ancestor of gamma-proteobacteria
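[Figure: pathway map of candidate CRA/FruR regulon members. Sugars: mannose, glucose, mannitol, fructose. Genes: manXYZ, ptsHI-crr, edd, epd, eda, adhE, aceEF, mtlA, gapA, fbp, pykF, mtlD, fruBA, fruK, pfkA, pgk, gpmA, icdA, ppsA, pckA, aceA, tpiA, aceB. Legend: Gamma-proteobacteria.]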
Common ancestor of the Enterobacteriales
[Figure: the same pathway map and gene set. Legend: Gamma-proteobacteria, Enterobacteriales.]
Common ancestor of Escherichia and Salmonella
[Figure: the same pathway map and gene set. Legend: Gamma-proteobacteria, Enterobacteriales, E. coli and Salmonella spp.]
Regulation of amino acid biosynthesis
in the Firmicutes
• Interplay between regulatory RNA elements and
transcription factors
• Expansion of T-box systems (normally RNA
structures regulating aminoacyl-tRNA synthetases)
Recent duplications and bursts:
ARG-T-box in Clostridium difficile
[Figure: tree of ARG-specific T-box regulatory sites from Lactobacillales, Clostridiales and Bacillales. Most leaves are T-boxes upstream of argS: LR_ARGS, CPE_ARGS, CAC_ARGS, CB_ARGS, CBE_ARGS, CTC_ARGS, LP_ARGS, LME_ARGS, LJ_ARGS, LGA_ARGS, PPE_ARGS, LSA_ARGS, BC_ARGS2, EF_ARGS, BH_ARGS. Leaves marked NEW include the Clostridium difficile T-boxes CDF_YQIXYZ, RDF02391, CDF_ARGC, CDF_ARGH.]
Legend: aminoacyl-tRNA synthetase; biosynthetic genes; amino acid transporters.
Clostridium difficile: RDF02391, yqiXYZ, argCJBDF, argH, argG (predicted amino acid transporters and amino acid biosynthetic genes); others: argS.
… caused by loss of transcription factor AhrC
Gram+ bacteria:
• AhrC regulatory protein (negative regulation of arginine metabolism, positive regulation of arginine catabolism)
• binding to the 5’ UTR gene region (AhrC site), regulation of gene expression
Clostridium difficile:
• AhrC is lost
• expansion of the T-box regulon: regulation of expression of arginine biosynthetic and transport genes by T-box antitermination
[Figure: loci yqiXYZ, argC, argH, argG compared between other clostridia spp. (CA, CTC, CTH, CPE, CB, CPE) and Clostridium difficile. Legend: AhrC binding site; ARG-specific T-box regulatory site.]
Duplications and changes in specificity: ASN/ASP/HIS T-boxes
Rapid mutation of regulatory codons
[Figure: tree of ASN-, ASP- and HIS-specific T-boxes in Bacillales, Lactobacillales, Clostridiales and other Gram+ bacteria, with branches marked NEW. Regulated loci shown: hisS-aspS, hisXYZ (his operon), hisS, aspS, asnS, asnA, glnQHMP. Regulatory codons shown: ASP GAC, ASN AAC. Leaf labels combine a genome code and a gene, e.g. BS_HISS, CDF_ASNA, LRE_ASPS, LJ_glnQHMP, SMU_ASPS2. Species named: P. pentosaceus, L. reuteri, L. johnsonii.]
Blow-up 1
• disruption of hisS-aspS operon
• mutation of regulatory codon
[Figure: enlarged Lactobacillales subtree with leaves LCA_HISS, LJ_HISS, PPE_HISXYZ, PPE_ASNS2, LB_HISS, LRE_ASPS, LB_ASNA, LP_HISS, PPE_HISS, PPE_ASNA, LP_ASNA, LRE_HISS, LJ_ASNA, LJ_GLNQHMP, LD_ASNA. Regulatory codons: ASN AAC, HIS CAC, ASP GAC. Loci shown: hisS-aspS, hisXYZ, hisS, aspS, asnS, asnA, glnQHMP. Species named: P. pentosaceus, L. reuteri, L. johnsonii.]
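The regulatory codons named on these slides (ASN AAC, HIS CAC, ASP GAC) make the "mutation of regulatory codon" event easy to illustrate: a short sketch can list which specificities are one substitution apart. The function names are hypothetical, and only the three codons shown in the slides are used.

```python
from itertools import combinations

# Regulatory codons and specificities as given on the slides.
REGULATORY_CODONS = {"AAC": "ASN", "CAC": "HIS", "GAC": "ASP"}


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("codons must have the same length")
    return sum(x != y for x, y in zip(a, b))


def single_step_switches(codons):
    """Specificity pairs whose codons differ by one nucleotide, with its position."""
    switches = []
    for (c1, aa1), (c2, aa2) in combinations(sorted(codons.items()), 2):
        if hamming(c1, c2) == 1:
            position = next(i for i in range(len(c1)) if c1[i] != c2[i]) + 1
            switches.append((aa1, aa2, position))
    return switches


for aa1, aa2, position in single_step_switches(REGULATORY_CODONS):
    print(f"{aa1} <-> {aa2}: one substitution at codon position {position}")
```

All three codons differ only in their first nucleotide, so any of the three specificities can be reached from any other by a single substitution.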
Blow-up 2. Prediction
Regulators lost in lineages with expanded HIS-T-box regulon?
… and validation
• conserved motifs upstream of HIS biosynthesis genes
  (motif logos: Bacillales (his operon), Clostridiales, Thermoanaerobacteriales, Halanaerobiales)
• candidate transcription factor yerC co-localized with the his genes
• present only in genomes with the motifs upstream of the his genes
• genomes with neither YerC motif nor HIS-T-boxes: attenuators
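The validation logic in the bullets above amounts to comparing presence/absence patterns across genomes. A minimal sketch, assuming each genome is summarized by three boolean calls; the genome names, field names and function names are hypothetical and carry no real data.

```python
def tf_only_with_motif(genomes):
    """True if every genome that has the yerC gene also has the upstream motif."""
    return all(g["yerc_motif"] for g in genomes.values() if g["yerc_gene"])


def classify_his_regulation(genome):
    """Candidate regulatory mode(s) of the his genes from presence/absence calls."""
    modes = []
    if genome["yerc_motif"]:
        modes.append("YerC")
    if genome["his_tbox"]:
        modes.append("T-box")
    # Neither a YerC motif nor a HIS-T-box: attenuator is the remaining candidate.
    return modes or ["attenuator (candidate)"]


# Hypothetical presence/absence table (illustration only).
TOY_GENOMES = {
    "genome_A": {"yerc_gene": True, "yerc_motif": True, "his_tbox": False},
    "genome_B": {"yerc_gene": False, "yerc_motif": False, "his_tbox": True},
    "genome_C": {"yerc_gene": False, "yerc_motif": False, "his_tbox": False},
}

print(tf_only_with_motif(TOY_GENOMES))
for name, genome in TOY_GENOMES.items():
    print(name, classify_his_regulation(genome))
```

A single genome that carries the factor without the motif would make the first check fail, which is what makes the co-occurrence pattern informative.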
The evolutionary history of the his genes
regulation in the Firmicutes
T-boxes: Summary / History
Life without Fur
Regulation of iron homeostasis
(the Escherichia coli paradigm)
Iron:
• essential cofactor (limiting in many environments)
• dangerous at high concentrations
FUR (responds to iron):
• synthesis of siderophores
• transport (siderophores, heme, Fe2+, Fe3+)
• storage
• iron-dependent enzymes
• synthesis of heme
• synthesis of Fe-S clusters
Similar in Bacillus subtilis
Regulation of iron homeostasis in α-proteobacteria
[Figure: regulatory scheme under iron-limited [-Fe] and iron-replete [+Fe] conditions. Transcription factors: Fur, RirA, Irr, IscR. Signals: Fe, FeS (FeS status of cell), heme (label: degraded). Regulated subsystems: iron uptake systems (siderophore uptake, Fe2+/Fe3+ uptake), iron storage (ferritins), FeS synthesis, heme synthesis, iron-requiring enzymes [iron cofactor].]
Experimental studies:
• FUR/MUR: Bradyrhizobium, Rhizobium and Sinorhizobium
• RirA (Rrf2 family): Rhizobium and Sinorhizobium
• Irr (FUR family): Bradyrhizobium, Rhizobium and Brucella
Distribution of transcription factors in genomes
Search for candidate motifs and binding sites using standard comparative genomic techniques
Regulation of genes in functional subsystems
Rhizobiales
Bradyrhizobiaceae
Rhodobacteriales
The Zoo (likely ancestral state)
Reconstruction of history
Frequent co-regulation with Irr
Strict division of function with Irr
Appearance of the iron-Rhodo motif
All logos and Some Very Tempting Hypotheses:
1. Cross-recognition of FUR and IscR motifs in the ancestor.
2. When FUR had become MUR, and IscR had been lost in Rhizobiales, emerging RirA (from the Rrf2 family, with a rather different general consensus) took over their sites.
3. Iron-Rhodo boxes are recognized by IscR: directly testable
Summary and open problems
• Regulatory systems are very flexible
– easily lost
– easily expanded (in particular, by duplication)
– may change specificity
– rapid turnover of regulatory sites
• With more stories like these, we can start thinking about
a general theory
– catalog of elementary events; how frequent?
– mechanisms (duplication, birth e.g. from enzymes, horizontal
transfer)
– conserved (regulon cores) and non-conserved (marginal regulon
members) genes in relation to metabolic and functional
subsystems/roles
– (TF family-specific) protein-DNA recognition code
– distribution of TF families in genomes; distribution of regulon
sizes; etc.
People
• Andrei A. Mironov – software, algorithms
• Alexandra Rakhmaninova – SDP, protein-DNA correlations

• Anna Gerasimova (now at LBNL) – NadR
• Olga Kalinina (on loan to EMBL) – SDP
• Yuri Korostelev – protein-DNA correlations
• Olga Laikova – LacI
• Dmitry Ravcheev – CRA/FruR
• Dmitry Rodionov (on loan to Burnham Institute) – iron etc.
• Alexei Vitreschak – T-boxes and riboswitches

• Andy Johnston (U. of East Anglia) – experimental validation (iron)
• Leonid Mirny (MIT) – protein-DNA, SDP
• Andrei Osterman (Burnham Institute) – experimental validation

Howard Hughes Medical Institute
Russian Foundation for Basic Research
Russian Academy of Sciences, program “Molecular and Cellular Biology”