
Integrative Analysis of Multiple Genome-wide Chromatin State Maps
Jason Ernst, Pouya Kheradpour, Luke Ward, Nisha Rajagopal, Bing Ren, Brad Bernstein, Manolis Kellis
Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA. Broad Institute of MIT and Harvard, Cambridge, MA.
Summary
A Multivariate Hidden Markov Model for Discovering and Analyzing Chromatin States
The multitude of histone modifications has led to the hypothesis of a histone
code, capable of encoding a plethora of epigenetic information about cellular
state. However, while a strictly combinatorial usage of these marks would imply
thousands of possible distinct codes, only a small number of distinct and
biologically meaningful sets of combinations of marks seem to occur in practice.
To systematically discover these hidden ‘chromatin states’, we have developed
ChromHMM, a Multivariate Hidden Markov Model that integrates ChIP-seq and
ChIP-chip datasets in a rigorous graphical probabilistic framework, where each
chromatin state is associated with a vector of emission probabilities indicating
the precise combination of chromatin marks that define it, and a vector of
transition probabilities encoding spatial relationships between neighboring
chromatin states. We have used ChromHMM to learn chromatin states de novo
across full epigenomes, revealing distinct classes of promoter, enhancer,
transcribed and repressed regions, and other types of epigenomic elements
independent of any previous annotations. We have also used ChromHMM to
study dynamic changes in chromatin states across multiple cell types, to classify
genes based on chromatin mark combinations across their entire length, and to
relate enhancer regions to likely target genes based on their co-variation in
multiple cell types. The overall framework enables us to annotate regions of the
genome likely associated with functional genetic variants, study motifs and
transcription factor binding in the context of epigenetic state, associate
chromatin state with gene expression state, and predict the most informative
combinations of marks to guide future experiments.
State correlation with expression as a function of position (axis: -50kb to 50kb)
Emission probabilities in a state are modeled with a product of independent
Bernoulli random variables.
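The emission model stated above can be illustrated with a minimal sketch. It assumes each mark is summarized as present (1) or absent (0) in a genomic bin; the function name `emission_log_prob` and the example probabilities are hypothetical and are not taken from ChromHMM itself.

```python
import math

def emission_log_prob(observed, probs):
    """Log-probability of a binary mark vector under independent Bernoullis.

    observed: list of 0/1 flags, one per chromatin mark (1 = mark present).
    probs:    per-mark presence probabilities for one state, each assumed
              to lie strictly between 0 and 1 so the logarithm is defined.
    """
    if len(observed) != len(probs):
        raise ValueError("observed and probs must have the same length")
    total = 0.0
    for x, p in zip(observed, probs):
        # A present mark contributes p; an absent mark contributes (1 - p).
        total += math.log(p if x == 1 else 1.0 - p)
    return total

# Hypothetical three-mark state; the numbers are made up for illustration.
promoter_like = [0.9, 0.8, 0.05]
prob = math.exp(emission_log_prob([1, 1, 0], promoter_like))
print(prob)  # equals 0.9 * 0.8 * (1 - 0.05), up to floating-point rounding
```

Working in log space avoids numerical underflow when many marks are multiplied together.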
40-state model using 21 common marks in IMR90|ES
Transition Matrix
State groups: Promoter, Transcribed, Active Intergenic, Repressed, Repetitive
Labels: Lower expressed, Higher expressed
One advantage of HMMs is that they naturally capture dependencies between
neighboring states, such as those found between promoter regions and transcribed
states, or in the spreading of chromatin as is found for large-scale repressed
regions.
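The effect of neighbor dependencies can be sketched with a toy two-state model. The example below uses standard log-space Viterbi decoding purely as an illustration; the poster does not specify this exact procedure, and the function name `viterbi` and all parameter values are assumptions chosen for the example.

```python
import math

def viterbi(obs, init, trans, emit):
    """Most likely state path for a binary multi-mark HMM (log-space Viterbi).

    obs:   list of binary mark vectors, one per genomic bin.
    init:  init[s] = probability of starting in state s.
    trans: trans[r][s] = probability of moving from state r to state s.
    emit:  emit[s][m] = probability that mark m is present in state s.
    """
    n_states = len(init)

    def log_emit(s, x):
        # Product of independent Bernoullis, computed as a sum of logs.
        return sum(math.log(p if v else 1.0 - p) for v, p in zip(x, emit[s]))

    score = [math.log(init[s]) + log_emit(s, obs[0]) for s in range(n_states)]
    back = []
    for x in obs[1:]:
        prev, score, ptr = score, [], []
        for s in range(n_states):
            # Best predecessor of state s at this bin.
            best = max(range(n_states),
                       key=lambda r: prev[r] + math.log(trans[r][s]))
            ptr.append(best)
            score.append(prev[best] + math.log(trans[best][s]) + log_emit(s, x))
        back.append(ptr)

    # Trace back from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]

# Hypothetical parameters: two "sticky" states and two marks.
init = [0.5, 0.5]
trans = [[0.95, 0.05], [0.05, 0.95]]
emit = [[0.2, 0.7], [0.7, 0.2]]
# The third bin, taken alone, looks like state 1.
obs = [[0, 1], [0, 1], [1, 0], [0, 1], [0, 1]]
path = viterbi(obs, init, trans, emit)
print(path)  # [0, 0, 0, 0, 0]
```

With these values the isolated third bin is absorbed into the surrounding state, because two low-probability transitions cost more than one poorly matching emission.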
Greedy Mark Ordering from IMR90 only model with chromatin
Dynamics of Chromatin States across Nine Human Cell Types
Joint work with Pouya Kheradpour, Tarjei S. Mikkelsen, Lucas D. Ward, Noam Shoresh, Charles B. Epstein, Bradley E. Bernstein
Chromatin marks from Epigenome Roadmap Reference Epigenome Mapping Center (REMC) led by Bing Ren, UCSD.
States in cell type II
Comparing models with different numbers of states
States in cell type
Example of chromatin state annotation
Pairwise fold enrichments
Intergenic SNP identified as associated with plasma eosinophil count levels in
Gudbjartsson et al., 2009, found in a candidate enhancer state.
Assessing predictive value for a subset of marks in a new cell type
Posterior Confusion Matrix (rows: State with All Marks; columns: State with Subset of 11 Marks)
GWAS SNP data from Hindorff et al.
Hemidesmosome assembly; 34; 10^-7
Anatomical structure morphogenesis; 2.3; 10^-13
Muscle structure development; 4.6; 10^-6
These states also showed distinct enrichments and depletions in
Genome-wide Association Studies, suggesting distinct epigenomic
roles in different diseases.
State 1
Blood vessel development; 5.1; 10^-6
Strong enhancer (state 4)
Top motifs
NF-Y; Rfx; Jundm2; Myc; CTCF
ChIP-seq Enrichments
State in GM12878  State in HSMM  Top Biological Process GO Category for TSS in intersection  Corrected P-value  Fold
1                 1              Cellular macromolecule metabolic process                    <10^-167           1.4
1                 13             Immune Response                                             <10^-21            6.8
3                 3              Nervous system development                                  <10^-12            2.8
13                1              Extracellular structure organization                        0.008              8.0
13                13             Sensory perception of smell                                 <10^-203           4.5
Emission parameters from models learned independently
Average Expression Level of Nearest Gene
p53; Egr; TBX5; AREB6; AP-1
Enhancer activity patterns predict upstream regulators and
downstream gene targets.
p53; Bach2; Eomes; Nkx6-1; AP-2
Bach1; AP-1; NF-E2; Bach2; TEF-1
Bach2; TEF-1; Nrf-2; p53; Maf
RP58; HEN1; Mef2; Gfi1b; Msx-1
Ascl2; AP-4; Myf; HEN1; HEB
Bach1; Bach2; AP-1; NF-E2; Nrf-2
Bach2; Bach1; HEN1; Tal1beta; Arid3a
Ets; FEV; Tel2; Nanog; Sox
Immune response; 3.4; 10^-19
Positive regulation of T-cell differentiation; 11; 0.01
Nine cell types: H1, K562, GM12878, HepG2, Huvec, HSMM, NHLF, NHEK, HMEC
Enrichment in GWAS studies
Model inferred from concatenating data from nine cell types
Top: the most likely state assignments, based on the raw input signal shown at bottom.
The top table shows, for each state in a larger 79-state model, the best correlation of its emission parameters
with those of any state in models with increasing numbers of states, based on a nested parameter initialization.
The panel below shows that distinct biological states emerge only with a sufficient number of states.
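The comparison described above can be sketched as follows, assuming each state is summarized by its vector of emission probabilities and that "best correlation" means the maximum Pearson correlation over the states of the smaller model. The function names and example values are hypothetical.

```python
import math

def pearson(a, b):
    """Pearson correlation of two equal-length vectors (neither constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def best_correlations(reference_emissions, model_emissions):
    """For each reference state, the highest correlation of its emission
    vector with the emission vector of any state in the other model."""
    return [max(pearson(ref, cand) for cand in model_emissions)
            for ref in reference_emissions]

# Hypothetical three-state reference model and two-state smaller model,
# each row giving emission probabilities for three marks.
reference = [[0.9, 0.8, 0.1], [0.1, 0.2, 0.9], [0.1, 0.9, 0.1]]
smaller = [[0.9, 0.8, 0.1], [0.1, 0.2, 0.9]]
best = best_correlations(reference, smaller)
print([round(b, 2) for b in best])
```

In this toy case the first two reference states are recovered exactly, while the third has no close counterpart in the smaller model, which is the kind of gap that adding states is meant to close.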
State in GM12878
Nrf-2; AP-1; Ets; NF-E2; Bach1
Irf; NF-kappaB; PU.1; Pou2f1; Octamer
Tel2; Ets; Elk1; NF-kappaB; Irf
Oxygen transport; 28; 10^-5
TF binding and motif enrichment
Cellular macromolecule metabolic process; 1.4; 10^-146
This matrix compares the state assignments based on all 41 marks with those based on the subset of 11 ENCODE marks
in the 51-state CD4T model. Each entry gives the proportion of the row state (assigned using all the marks)
that is assigned to the column state when using the subset of marks.
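A row-normalized confusion matrix of this kind can be sketched as below. This version assumes one hard state label per genomic bin under each model; the panel title suggests the poster's matrix may instead be weighted by posterior probabilities, which this sketch does not attempt. The function name and data are hypothetical.

```python
def confusion_matrix(states_all, states_subset, n_states):
    """Row-normalized confusion matrix between two state assignments.

    states_all:    state index per bin from the model using all marks.
    states_subset: state index per bin from the model using a mark subset.
    Entry [r][c] is the fraction of bins in row state r (all marks) that
    are assigned to column state c (subset of marks).
    """
    counts = [[0] * n_states for _ in range(n_states)]
    for r, c in zip(states_all, states_subset):
        counts[r][c] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        # Rows for states that never occur are left as all zeros.
        matrix.append([v / total if total else 0.0 for v in row])
    return matrix

# Hypothetical assignments for eight bins and three states.
all_marks = [0, 0, 0, 0, 1, 1, 2, 2]
subset_marks = [0, 0, 0, 1, 1, 1, 2, 0]
matrix = confusion_matrix(all_marks, subset_marks, 3)
for row in matrix:
    print(row)
```

A matrix close to the identity would indicate that the subset of marks recovers nearly the same states as the full set.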
Greedy Algorithm Ordering of Marks
RNA Processing; 2.5; 10^-87
PU.1; STAT; NF-kappaB; Nrf-2; Elf1
GATA; Nrf-2; STAT; TCF11::MafG; Lmo2-complex
Lipid homeostasis; 8; 10^-5
Nrf-2; Maf; NF-E2; Hoxb7; TCF11::MafG
HNF1; Bach1; Bach2; TEF-1; Fox1
HNF1; HNF4; Foxd1; RXR::LXR; PPARG
Rfx; Oct4; Pou2f1; Octamer; Sox
HNF1; HNF4; PPARG; RXR::LXR; Tcf
ChIP-seq data from Kasowski et al., Science 2010 and Kunarso et al., Nature Genetics 2010
Disease variants annotated by chromatin dynamics and predicted regulators
Class I, Class II, Class III, Class IV
References
1. Barski A., Cuddapah S., Cui K., Roh T.Y., Schones D.E., Wang Z., Wei G., Chepelev I., and Zhao K. High-Resolution Profiling of Histone Methylations in the Human Genome. Cell 129: 823-837, 2007.
2. Boyle A.P., Davis S., Shulha H.P., Meltzer P., Margulies E.H., Weng Z., Furey T.S., and Crawford G.E. High-resolution mapping and characterization of open chromatin across the genome. Cell 132: 311-322, 2008.
3. Gudbjartsson D.F. et al. Sequence variants affecting eosinophil numbers associate with asthma and myocardial infarction. Nat Genet 41: 342-347, 2009.
4. Hindorff L.A., Junkins H.A., Mehta J.P., and Manolio T.A. A Catalog of Published Genome-Wide Association Studies. Available at: www.genome.gov/26525384. Accessed April 15, 2009.
5. Li et al. PLoS Biol 6(2), 2007.
6. Sandmann et al. Genes & Dev 21: 436-449, 2007. 10.1101/gad.1509007
7. Sandmann et al. Dev Cell 10(6): 797-807, 2006. 10.1016/j.devcel.2006.04.009
8. Siepel A., Bejerano G., Pedersen J.S., Hinrichs A.S., Hou M., Rosenbloom K., Clawson H., Spieth J., Hillier L.W., Richards S., Weinstock G.M., Wilson R.K., Gibbs R.A., Kent W.J., Miller W., and Haussler D. Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome Res 15: 1034-1050, 2005.
9. Wang Z., Zang C., Rosenfeld J.A., Schones D.E., Barski A., Cuddapah S., Cui K., Roh T.Y., Peng W., Zhang M.Q., and Zhao K. Combinatorial patterns of histone acetylations and methylations in the human genome. Nature Genetics 40: 897-903, 2008.
10. Zeitlinger et al. Genes & Dev 21: 385-390, 2007. 10.1101/gad.1509607
Funding: NSF CAREER award 0644282, NSF Postdoctoral Fellowship to JE, NIH R01 1R01HG004037-01A1, ENCODE 1U54HG004570-01,
and NHGRI 1 RC1 HG005334-01 are gratefully acknowledged.