
A biophysical approach to
predicting intrinsic and extrinsic
nucleosome positioning signals
Alexandre V. Morozov
Department of Physics & Astronomy and
the BioMaPS Institute for Quantitative Biology,
Rutgers University
IPAM, Nov. 26 2007
Introduction to chromatin scales
Electron micrograph of D. melanogaster
chromatin: arrays of regularly spaced
nucleosomes, each ~80 Å across.
Overview of gene regulation
RNA Pol II + TAFs
[mRNA]
[TF1]
[TF2]
[TF3]
Gene
[Nucleosomes]
Prediction and design of gene expression levels from
DNA sequence:
1. Prediction of transcription factor and nucleosome
occupancies in vitro and in vivo from genomic sequence
2. Prediction of levels of mRNA production from transcription
factor and nucleosome occupancies
Data for modeling eukaryotic gene
regulation
Available data sources:
• DNA sequence data for multiple organisms: …accagtttacgt…
• Genome-wide transcription factor occupancy data (ChIP-chip)
• Structural data for 100s of protein-DNA complexes
• Nucleosome positioning data: MNase digestion + sequencing or microarrays
Biophysical picture of gene transcription
Wray, G. A. et al. Mol Biol Evol 2003 20:1377-1419
Chromatin Structure & Nucleosomes
Structure of the nucleosome core particle (NCP)
Left-handed super-helix: (1.84 turns, 147 bp, R = 41.9 Å, P = 25.9 Å)
PDB code: 1kx5
T.J.Richmond: K.Luger et al. Nature 1997 (2.8 Å); T.J.Richmond & C.A.Davey Nature 2003 (1.9 Å)
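As a quick consistency check on the superhelix parameters quoted above, the sketch below computes the contour length of an ideal helix with that radius, pitch and number of turns, then divides by 147 bp. The function name `superhelix_path_length` is illustrative and not from the talk, and the helix is assumed to be perfectly regular.

```python
import math

def superhelix_path_length(radius, pitch, turns):
    """Contour length of an ideal helix (same units as radius and pitch)."""
    # One helical turn unrolls into the hypotenuse of a right triangle
    # with legs 2*pi*R (circumference) and P (pitch).
    length_per_turn = math.hypot(2.0 * math.pi * radius, pitch)
    return turns * length_per_turn

# Values quoted on the slide for the nucleosome core particle (Angstroms).
R_ANGSTROM = 41.9
P_ANGSTROM = 25.9
TURNS = 1.84
N_BP = 147

path = superhelix_path_length(R_ANGSTROM, P_ANGSTROM, TURNS)
rise_per_bp = path / N_BP
print(f"path length = {path:.1f} A, rise per bp = {rise_per_bp:.2f} A")
```

The rise per base pair comes out at roughly 3.3 Å, close to the value expected for B-form DNA, so the quoted radius, pitch, turn count and DNA length are mutually consistent.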
Gene regulation through chromatin structure
• Transcription factor-DNA interactions are affected by the chromatin
• Chromatin remodeling by ATP-dependent complexes
• Histone variants (H2A.Z)
• Post-translational histone modifications (“histone code”)
[Figure labels: H2A, H2B, H3, H4, H3 tail]
Experimental validation of the
histone-DNA interaction model
Jon Widom
• Adding key dinucleotide motifs increases nucleosome affinity
• Deleting dinucleotide motifs or disrupting their spacing decreases affinity
[Figure: test sequences aligned on the dyad (position axis 8-138), with bar charts of relative affinity for the sequence sets f1-f3 (fold to f1), g1-g5 (fold to g1) and h1-h3 (fold to h1).]
Sequences shown, in the order they appear on the slide:
ctggagaatcccggtgccgaggccgctcaattggtcgtagcaagctctagcaccgcttaaacgcacgtacgcgctgtcccccgcgttttaaccgccaaggggattactccctagtctccaggcacgtgtcagatatatacatcctgt
ctggagatacccggtgctaaggccgcttaattggtcgtagcaagctctagcaccgcttaaacgcacgtacgcgctgtctaccgcgttttaaccgccaataggattacttactagtctctaggcacgtgtaagatatatacatcctgt
gtcgtagcaagctctagcaccgcttaaacgcacgtacgcgctgtctaccgcgttttaaccgccaataggattacttactag
atggatccttgcaagctcttggtgcgctttttcggctgttgacgccctgttcggcagtttttgcgcaccttgagccccctctccggaattcac
atggatccgcgcaagctcgcggtgcgcttaaacggctggcgacgccctggccggcagtttaagcgcaccgcgagccccctctccggaattcac
atggatcctcgcaagcgagctttgctaggccccgtctgtcgcctcacgggacggaaggggcctagcacagctcgcccccgctccggaattcac
atggatccatgcaagctcatggtgcgcaatttcggctgatgacgccctgatcggcagaaattgcgcaccatgagccccctctccggaattcac
atggatccatgcaagctcatggtgcgcccgggcggctgatgacgccctgatcggcagcccgggcgcaccatgagccccctctccggaattcac
ctggagaatcccggtgccgaggccgctcaattggtcgtagcaagctctagcaccgcttaaacgcacgtacgcgctgtcccccgcgttttaaccgccaaggggattactccctagtctccaggcacgtgtcagatatatacatcctgt
atggatcctagcaagctctaggtgcgcttaaacggctgtagacgccctatcctgtacggcagtttaagcgcacctagagcctccggaattcac
atggatcctagcatactctaggttagcttaaactactgtagacttactgtacggcagtttaagctaacctagagtaccctctccggaattcac
Histone-DNA interaction model and DNA
flexibility
[Figure: nucleosomal DNA sequences drawn around the dyad, with AA/TT/TA and GC dinucleotide positions marked.]
• Nucleosome affinity depends on the presence and spacing of key dinucleotide motifs (e.g. TA, CA)
• Nucleosome affinity can be explained by DNA flexibility

Base-pair steps are fundamental units for DNA
mechanics
Data-driven model for DNA elastic energy
(DNABEND)
Geometry distributions for TA steps in ~100
non-homologous protein-DNA complexes:
• mean = $\langle\theta\rangle$
• width ~ $\langle(\theta - \langle\theta\rangle)^2\rangle^{-1}$
Quadratic sequence-specific DNA elastic energy:
• Matrix of force constants: F
$E_{el} = \frac{1}{2} \sum_{bs} \sum_{i,j=1}^{6} F_{ij} (\theta_i - \langle\theta_i\rangle)(\theta_j - \langle\theta_j\rangle)$
W.K. Olson et al., PNAS 1998
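The quadratic energy for a single base-pair step can be sketched as follows. This is a minimal illustration, not the DNABEND code: the function names are hypothetical, the force-constant matrix is assumed to be the inverse covariance of the observed step geometries, and the "observations" are synthetic random numbers standing in for real structural data.

```python
import numpy as np

def elastic_energy(theta, theta_mean, force_constants):
    """Quadratic elastic energy of one base-pair step.

    theta, theta_mean: length-6 arrays of step parameters
    force_constants:   6x6 symmetric matrix F
    """
    delta = np.asarray(theta, dtype=float) - np.asarray(theta_mean, dtype=float)
    return 0.5 * delta @ np.asarray(force_constants, dtype=float) @ delta

def fit_step_model(observed):
    """Estimate the mean geometry and F from observed step geometries.

    observed: (n_samples, 6) array. F is taken as the inverse covariance
    (an assumption of this sketch).
    """
    observed = np.asarray(observed, dtype=float)
    mean = observed.mean(axis=0)
    cov = np.cov(observed, rowvar=False)
    return mean, np.linalg.inv(cov)

# Synthetic data standing in for geometries of one step type (NOT real data).
rng = np.random.default_rng(0)
samples = rng.normal(loc=[0, 0, 3.4, 0, 0, 36],
                     scale=[0.5, 0.5, 0.2, 3, 5, 5],
                     size=(200, 6))
mean, F = fit_step_model(samples)
print(elastic_energy(mean, mean, F))        # zero at the mean geometry
print(elastic_energy(samples[0], mean, F))  # positive away from the mean
```

Summing this quantity over all base-pair steps of a sequence, with the mean and F chosen per dinucleotide type, gives a sequence-specific total elastic energy of the kind described on the slide.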
Elastic rod model
DNA looping induced by a Lac repressor tetramer
Elastic energy and geometry of DNA
constrained to follow an arbitrary curve
(DNABEND)
“Constraint” energy: $E_{constr} = \sum_{bp} (\Delta r_{bp})^2$
Sequence-specific DNA elastic energy: $E_{el}$
Total energy: $E_{tot} = E_{el} + w E_{constr}$
Minimize $E_{tot}$ to determine energy & geometry: $\partial E_{tot} / \partial \theta_i = 0$
System of linear equations: ½ x 6Nbs x 6Nbs
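To illustrate why minimizing a quadratic total energy reduces to a system of linear equations, here is a toy sketch. It assumes, purely for illustration, that the constraint deviation depends linearly on the degrees of freedom (Δr = A x - b); the matrices, the function name `minimize_total_energy` and all numbers are hypothetical and not taken from DNABEND.

```python
import numpy as np

def minimize_total_energy(F, x0, A, b, w):
    """Minimize E_tot(x) = 0.5*(x-x0)^T F (x-x0) + w*|A x - b|^2.

    Setting the gradient to zero gives the linear system
        (F + 2 w A^T A) x = F x0 + 2 w A^T b.
    """
    lhs = F + 2.0 * w * A.T @ A
    rhs = F @ x0 + 2.0 * w * A.T @ b
    x = np.linalg.solve(lhs, rhs)
    d = x - x0
    r = A @ x - b
    e_el = 0.5 * d @ F @ d      # elastic part
    e_constr = r @ r            # constraint part (without the weight w)
    return x, e_el, e_constr

# Toy problem with hypothetical numbers: 3 degrees of freedom, 2 constraints.
F = np.diag([1.0, 2.0, 4.0])
x0 = np.zeros(3)
A = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0]])
b = np.array([1.0, 1.0])

for w in (0.1, 10.0, 1000.0):
    x, e_el, e_constr = minimize_total_energy(F, x0, A, b, w)
    print(f"w={w:g}: E_el={e_el:.4f}, E_constr={e_constr:.6f}")
```

Increasing the weight w forces the solution closer to the constraint at the cost of a higher elastic energy, which is the trade-off the weighted sum expresses.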
Example of DNA geometry prediction:
nucleosome structure
Ideal superhelix
Prediction for NCP (1kx5)
Predictions of nucleosome binding affinities
Experimental techniques:
• nucleosome dialysis
A.Thastrom et al., J.Mol.Biol. 1999,2004;
P.T.Lowary & J.Widom, J.Mol.Biol. 1998
• nucleosome exchange
T.E.Shrader & D.M.Crothers PNAS 1989;
T.E.Shrader & D.M.Crothers J.Mol.Biol. 1990
Alignment model (Segal E. et al. Nature 2006):
Collect nucleosome-bound sequences
in yeast
Center align sequences
Construct nucleosome-DNA model
using observed dinucleotide frequencies
Alignment Model (in vivo selection)
MNase digestion
Extract DNA, clone into plasmids
Sequence and center-align
AGGTTTATAG..
AGGTTAATCG..
AGGTAAATAA..
………………..
Fragment length: 142-152 bp
Di-nucleotide log score:
$\sum_{i=1}^{L-1} \log[ P(S_{i+1} | S_i) / P_B(S_{i+1}) ]$
From nucleosome energies to probabilities and
occupancies
[Plot: nucleosome energy vs. chromosomal coordinate]
Use dynamic programming to find the partition function Z
and thus probabilities and occupancies of each DNA-binding
factor, e.g. nucleosomes:
$Z = \sum_{conf} \exp[-E(conf)]$
[Plot: nucleosome probability & occupancy vs. chromosomal coordinate]
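A minimal sketch of the dynamic-programming step, restricted to a single particle type (nucleosomes) with steric exclusion, is given below. It uses forward and backward partial partition functions; the function name, the toy energy landscape and the short footprint are illustrative assumptions, and a genome-scale implementation would need log-space arithmetic to avoid overflow.

```python
import numpy as np

def nucleosome_probabilities(energies, footprint):
    """Start-site probabilities and occupancy for non-overlapping particles.

    energies[i]: energy (in kT, chemical potential included) of a particle
                 covering positions i .. i+footprint-1.
    Returns (Z, start_prob, occupancy).
    """
    energies = np.asarray(energies, dtype=float)
    n_sites = len(energies) + footprint - 1      # sequence length in bp
    weights = np.exp(-energies)

    # forward[i]: partition function of the first i bp
    forward = np.zeros(n_sites + 1)
    forward[0] = 1.0
    for i in range(1, n_sites + 1):
        forward[i] = forward[i - 1]              # bp i-1 is free
        if i >= footprint:                       # or a particle ends at bp i-1
            forward[i] += forward[i - footprint] * weights[i - footprint]

    # backward[i]: partition function of bp i .. n_sites-1
    backward = np.zeros(n_sites + 1)
    backward[n_sites] = 1.0
    for i in range(n_sites - 1, -1, -1):
        backward[i] = backward[i + 1]            # bp i is free
        if i + footprint <= n_sites:             # or a particle starts at bp i
            backward[i] += weights[i] * backward[i + footprint]

    Z = forward[n_sites]
    starts = np.arange(len(energies))
    start_prob = forward[starts] * weights * backward[starts + footprint] / Z

    # Occupancy of bp j: sum of start probabilities of particles covering j.
    occupancy = np.convolve(start_prob, np.ones(footprint))
    return Z, start_prob, occupancy

# Toy landscape (hypothetical numbers): 10 bp, footprint 3, one favorable site.
E = np.array([0.0, 0.0, -3.0, 0.0, 0.0, 0.0, 0.0, 0.0])
Z, p, occ = nucleosome_probabilities(E, footprint=3)
print(Z, p.round(3), occ.round(3))
```

Adding further DNA-binding factors such as transcription factors amounts to adding more terms, each with its own footprint and energy, to the forward and backward recursions.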
Nucleosome occupancy is dynamic
Nucleosome-free site
TGACGTCA
Nucleosome-occluded site
TGACGTCA
Nucleosome is displaced
by the bound TF
TGACGTCA
Nucleosome occupancy of TATA boxes
explains gene expression levels
Nucleosome occupancy in the vicinity of
genes
Nucleosome occupancy in the vicinity of
TATA boxes: default repression
TATA
Functional sites by ChIP-chip:
in vivo genome-wide measurements
of TF occupancy
• Genome-wide occupancies for 203 transcription factors in yeast by ChIP-chip (Harbison et al., Nature 2004: “Transcriptional regulatory code”)
• MacIsaac et al., BMC Bioinformatics 2006: “An improved map of phylogenetically conserved regulatory sites” (98 factor specificities + 26 more from the literature)
Nucleosome occupancy of transcription
factor binding sites: default repression
• <Occ(functional sites)> - <Occ(non-functional sites)>
• In vitro: nucleosomes compete for DNA sequence only with each other
DNABEND: Nucleosomes
p < 0.05
Nucleosome occupancy of
transcription factor binding sites
• <Occ(functional sites)> - <Occ(non-functional sites)>
• In vivo: nucleosomes compete for DNA sequence with TFs
DNABEND: Nucleosomes + TFs
p < 0.05
Functional transcription factor sites are
clustered
DNABEND: Nucleosomes + TFs, randomized functional sites
p < 0.05
functional sites
non-functional sites
Clustering!
Functional transcription factor sites are
not occupied by nucleosomes in vivo
Yuan et al. microarray experiment
DNABEND + Transcription Factors
DNABEND
Alignment model
Nucleosome-induced cooperativity
Nucleosome-occluded
TF sites: no separate
binding
Nucleosome-occluded
TF sites: cooperative
binding
TGACGTCA
TAAGGCCT
TGACGTCA
TAAGGCCT
Miller and Widom, Mol.Cell.Biol. 2003
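The idea of nucleosome-induced cooperativity can be illustrated with a deliberately simplified equilibrium model. The sketch assumes that one nucleosome covers both TF sites, that a TF cannot bind while the nucleosome is present, and that the two TFs do not interact directly; the function name and all numerical values are hypothetical.

```python
def tf1_occupancy(x1, x2, nucleosome_weight):
    """Probability that TF1 is bound when one nucleosome covers both sites.

    x1, x2: concentration times binding constant for each TF (dimensionless).
    nucleosome_weight: statistical weight of the nucleosome-covered state.
    """
    # States: nucleosome on (no TF can bind), or nucleosome off with any
    # combination of the two TFs bound independently.
    z = nucleosome_weight + (1.0 + x1) * (1.0 + x2)
    return x1 * (1.0 + x2) / z

for x2 in (0.0, 1.0, 10.0, 100.0):
    free = tf1_occupancy(1.0, x2, nucleosome_weight=0.0)
    occluded = tf1_occupancy(1.0, x2, nucleosome_weight=100.0)
    print(f"x2={x2:6.1f}  no nucleosome: {free:.3f}  with nucleosome: {occluded:.3f}")
```

Without the nucleosome, TF1 occupancy does not depend on TF2 at all. With the nucleosome present, raising the TF2 level raises TF1 occupancy, even though the two factors never touch each other.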
Nucleosome occupancy of TF sites in a
model system
TF sites
pCYC1
Nucleosome-induced cooperativity:
example
Nucleosome position predictions:
GAL1-10 locus
GAL10
GAL1
Nucleosomes in vitro
Nucleosomes in vivo
TBP
GAL4
Nucleosome position predictions:
HIS3-PET56 locus
Nucleosomes in vitro
Nucleosomes in vivo
TBP
GCN4
Conclusions
• Predicted histone-DNA binding affinities and genome-wide nucleosome occupancies using a DNA mechanics model + a thermodynamic model of nucleosomes competing with other factors for genomic sequence
• Chromatin structure around ORF starts is consistent with microarray-based measurements of nucleosome positions, and can be explained with a simple model of nucleosomes “phasing off” bound TBPs
• Nucleosome-induced cooperativity (brought about by clustering of functional transcription factor binding sites) is responsible for the increased accessibility of functional sites
Future Directions
• Lots of nucleosome positioning sequences [soon to become] available: can a better model of dinucleotide (base stacking) energies be built? {Anirvan Sengupta, Rutgers}
• Can such a model be used to inform a better DNA mechanics model? Conversely, can a DNA mechanics model be “compressed”, i.e. encapsulated in a simple set of dinucleotide energies? {Anirvan Sengupta, Rutgers}
• DNABEND extensions to non-nucleosome systems, e.g. nucleoid proteins, DNA loops etc.? {John Marko, Jon Widom, Northwestern}
• Prediction of in vivo nucleosome positions in gene expression libraries {Ligr et al., Genetics 2006: random libraries of yeast promoters; Lu Bai et al., unpublished}
Acknowledgements
PEOPLE:
• Eric Siggia (Rockefeller University)
• Jon Widom (Northwestern University)
• Harmen Bussemaker (Columbia University)
FUNDING:
• Leukemia & Lymphoma Society Fellowship
• BioMaPS Institute, Rutgers University
Nucleosome occupancy of chromosomal
regions
Induced periodicity of stable nucleosomes
Nucleosome position predictions:
summary