Integrative Analysis of Multiple Genome-wide Chromatin State Maps

Jason Ernst, Pouya Kheradpour, Luke Ward, Nisha Rajagopal, Bing Ren, Brad Bernstein, Manolis Kellis

Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA. Broad Institute of MIT and Harvard, Cambridge, MA.

Summary

A Multivariate Hidden Markov Model for Discovering and Analyzing Chromatin States

The multitude of histone modifications has led to the hypothesis of a histone code, capable of encoding a plethora of epigenetic information about cellular state. However, while a strictly combinatorial usage of these marks would imply thousands of possible distinct codes, only a small number of distinct and biologically meaningful combinations of marks seem to occur in practice.

To systematically discover these hidden ‘chromatin states’, we have developed ChromHMM, a Multivariate Hidden Markov Model that integrates ChIP-seq and ChIP-chip datasets in a rigorous graphical probabilistic framework. Each chromatin state is associated with a vector of emission probabilities, indicating the precise combination of chromatin marks that defines it, and a vector of transition probabilities, encoding spatial relationships between neighboring chromatin states.

We have used ChromHMM to learn chromatin states de novo across full epigenomes, revealing distinct classes of promoter, enhancer, transcribed and repressed regions, and other types of epigenomic elements independent of any previous annotations. We have also used ChromHMM to study dynamic changes in chromatin states across multiple cell types, to classify genes based on chromatin mark combinations across their entire length, and to relate enhancer regions to likely target genes based on their co-variation in multiple cell types.
The overall framework enables us to annotate regions of the genome likely associated with functional genetic variants, study motifs and transcription factor binding in the context of epigenetic state, associate chromatin state with gene expression state, and predict the most informative combinations of marks to guide future experiments.

[Figure: State correlation with expression as a function of position, -50kb to 50kb; lower expressed to higher expressed.]

Emission probabilities in a state are modeled with a product of independent Bernoulli random variables.

40-state model using 21 common marks in IMR90|ES

[Figure: Transition Matrix]
[State groups labeled in the figures: Promoter, Repetitive, Repressed, Active Intergenic, Transcribed]

One advantage of HMMs is that they naturally capture dependencies between neighboring states, such as those found between promoter regions and transcribed states, or in the spreading of chromatin found across large-scale repressed regions.

Greedy Mark Ordering from IMR90 only model with chromatin

Dynamics of Chromatin States across Nine Human Cell Types
Joint work with Pouya Kheradpour, Tarjei S. Mikkelsen, Lucas D. Ward, Noam Shoresh, Charles B. Epstein, Bradley E. Bernstein

Chromatin marks from Epigenome Roadmap Reference Epigenome Mapping Center (REMC) led by Bing Ren, UCSD.

Comparing models with different numbers of states

[Figure: Pairwise fold enrichments]
[Axis labels: States in cell type; States in cell type II]

Example of chromatin state annotation
An intergenic SNP associated with plasma eosinophil count levels in Gudbjartsson et al., 2009 falls in a candidate enhancer state.
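The emission and transition structure described above can be illustrated with a small sketch. It assumes each genomic bin has already been reduced to a vector of present/absent calls per mark, scores a bin under a state as a product of independent Bernoulli terms, and uses standard Viterbi decoding to pick the most likely state path. The function names, the toy parameters, and the choice of Viterbi decoding are assumptions made for illustration; they are not taken from ChromHMM itself.

```python
import math

def log_emission(obs, probs):
    """Log-probability of a binary mark vector under independent Bernoullis.

    obs[m]   : 1 if mark m is present in the bin, else 0
    probs[m] : probability that mark m is present in this state
    """
    total = 0.0
    for present, p in zip(obs, probs):
        total += math.log(p if present else 1.0 - p)
    return total

def viterbi(observations, init, trans, emit):
    """Most likely state path for a sequence of binary mark vectors.

    init[k]     : probability of starting in state k
    trans[j][k] : probability of moving from state j to state k
    emit[k][m]  : probability that mark m is present in state k
    """
    n_states = len(init)
    # score[k] = best log-probability of any path ending in state k
    score = [math.log(init[k]) + log_emission(observations[0], emit[k])
             for k in range(n_states)]
    back = []  # back[t][k] = best predecessor of state k at step t + 1
    for obs in observations[1:]:
        prev, score, pointers = score, [], []
        for k in range(n_states):
            best_j = max(range(n_states),
                         key=lambda j: prev[j] + math.log(trans[j][k]))
            pointers.append(best_j)
            score.append(prev[best_j] + math.log(trans[best_j][k])
                         + log_emission(obs, emit[k]))
        back.append(pointers)
    # Trace back from the best final state.
    state = max(range(n_states), key=lambda k: score[k])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]

# Toy two-state, two-mark example (all numbers are made up).
init = [0.5, 0.5]
trans = [[0.9, 0.1],
         [0.1, 0.9]]
emit = [[0.9, 0.8],    # state 0: both marks usually present
        [0.1, 0.05]]   # state 1: both marks usually absent
bins = [(1, 1), (1, 0), (0, 0), (0, 0), (0, 0)]
path = viterbi(bins, init, trans, emit)
print(path)
```

On this toy input the decoded path is [0, 0, 1, 1, 1]. Working in log space avoids numerical underflow when the sequence of bins is long.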
Assessing predictive value for a subset of marks in a new cell type

[Figure: Posterior Confusion Matrix. Axes: State with All Marks; State with Subset of 11 Marks.]

Enrichment in GWAS studies
GWAS SNP data from Hindorff et al.
These states also showed distinct enrichments and depletions in genome-wide association studies, suggesting distinct epigenomic roles in different diseases.

Enhancer activity patterns predict upstream regulators and downstream gene targets.
[Figure: enhancer clusters A to J across H1, K562, GM12878, HepG2, Huvec, HSMM, NHLF, NHEK, HMEC. Panels: Strong enhancer (state 4); Average Expression Level of Nearest Gene.]

Top GO categories (category; fold; P-value):
Hemidesmosome assembly; 34; 10^-7
Anatomical structure morphogenesis; 2.3; 10^-13
Muscle structure development; 4.6; 10^-6
Blood vessel development; 5.1; 10^-6
Immune response; 3.4; 10^-19
Positive regulation of T-cell differentiation; 11; 0.01

Top motifs:
NF-Y; Rfx; Jundm2; Myc; CTCF
p53; Egr; TBX5; AREB6; AP-1
p53; Bach2; Eomes; Nkx6-1; AP-2
Bach1; AP-1; NF-E2; Bach2; TEF-1
Bach2; TEF-1; Nrf-2; p53; Maf
RP58; HEN1; Mef2; Gfi1b; Msx-1
Ascl2; AP-4; Myf; HEN1; HEB
Bach1; Bach2; AP-1; NF-E2; Nrf-2
Bach2; Bach1; HEN1; Tal1beta; Arid3a
Ets; FEV; Tel2; Nanog; Sox

State in GM12878   State in HSMM   Top Biological Process GO Category for TSS in intersection   Corrected P-value   Fold
1                  1               Cellular macromolecule metabolic process                     <10^-167            1.4
1                  13              Immune response                                              <10^-21             6.8
3                  3               Nervous system development                                   <10^-12             2.8
13                 1               Extracellular structure organization                         0.008               8.0
13                 13              Sensory perception of smell                                  <10^-203            4.5

ChIP-seq Enrichments
Emission parameters from models learned independently
Model inferred from concatenating data from nine cell types
[Other figure labels: Promoter; State 1]

The top shows the most likely state assignments, based on the raw input signal shown at the bottom.

The top table shows, for each state in a larger 79-state model, the best correlation of its emission parameters with any state in models with increasing numbers of states, learned using a nested parameter initialization.
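The model-comparison idea in the caption above can be sketched in a few lines. For each state of a larger reference model, the code finds the highest correlation between its emission vector and the emission vector of any state in a smaller model; a low value suggests the smaller model has no counterpart for that state. The use of Pearson correlation, the function names, and the toy emission values are assumptions for illustration only.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length vectors (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0.0 or var_y == 0.0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)

def best_matches(reference, candidate):
    """For each reference state, the highest correlation with any candidate state."""
    return [max(pearson(ref, cand) for cand in candidate) for ref in reference]

# Toy emission matrices (rows = states, columns = marks); values are made up.
reference = [[0.9, 0.8, 0.1],
             [0.1, 0.2, 0.9],
             [0.5, 0.1, 0.1]]
small = [[0.8, 0.7, 0.1],
         [0.1, 0.1, 0.8]]

scores = best_matches(reference, small)
for state, score in enumerate(scores):
    print(f"reference state {state}: best correlation {score:.2f}")
```

In this toy case the first two reference states are closely matched by the smaller model, while the third has a noticeably lower best correlation, which is the kind of pattern that would indicate a state emerging only once enough states are allowed.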
The panel below shows that distinct biological states emerge only with a sufficient number of states.

This matrix compares the state assignments based on all 41 marks with those based on the subset of 11 ENCODE marks in the 51-state CD4T model. An entry in a cell indicates the proportion of the state of the row, using all the marks, that is assigned to the state of the column, using a subset of the marks.

Greedy Algorithm Ordering of Marks

TF binding and motif enrichment
ChIP-seq data from Kasowski et al., Science 2010 and Kunarso et al., Nature Genetics 2010.

Disease variants annotated by chromatin dynamics and predicted regulators
[Figure labels: Class I; Class II; Class III; Class IV; State in GM12878]

[Figure: enhancer clusters K to T]

Top GO categories (category; fold; P-value):
Oxygen transport; 28; 10^-5
Cellular macromolecule metabolic process; 1.4; 10^-146
RNA Processing; 2.5; 10^-87
Lipid homeostasis; 8; 10^-5

Top motifs:
Nrf-2; AP-1; Ets; NF-E2; Bach1
Irf; NF-kappaB; PU.1; Pou2f1; Octamer
Tel2; Ets; Elk1; NF-kappaB; Irf
PU.1; STAT; NF-kappaB; Nrf-2; Elf1
GATA; Nrf-2; STAT; TCF11::MafG; Lmo2-complex
Nrf-2; Maf; NF-E2; Hoxb7; TCF11::MafG
HNF1; Bach1; Bach2; TEF-1; Fox1
HNF1; HNF4; Foxd1; RXR::LXR; PPARG
Rfx; Oct4; Pou2f1; Octamer; Sox
HNF1; HNF4; PPARG; RXR::LXR; Tcf

References
1. Barski A., Cuddapah S., Cui K., Roh T.Y., Schones D.E., Wang Z., Wei G., Chepelev I., and Zhao K. High-resolution profiling of histone methylations in the human genome. Cell 129: 823-837, 2007.
2. Boyle A.P., Davis S., Shulha H.P., Meltzer P., Margulies E.H., Weng Z., Furey T.S., and Crawford G.E. High-resolution mapping and characterization of open chromatin across the genome. Cell 132: 311-322, 2008.
3. Gudbjartsson D.F. et al. Sequence variants affecting eosinophil numbers associate with asthma and myocardial infarction. Nat Genet 41: 342-347, 2009.
4. Hindorff L.A., Junkins H.A., Mehta J.P., and Manolio T.A. A Catalog of Published Genome-Wide Association Studies. Available at: www.genome.gov/26525384. Accessed April 15, 2009.
5. Li et al., PLoS Biol 6(2) (2007).
6. Sandmann et al., Genes & Dev. 21: 436-449 (2007). doi:10.1101/gad.1509007
7. Sandmann et al., Dev. Cell 10(6): 797-807 (2006). doi:10.1016/j.devcel.2006.04.009
8. Siepel A., Bejerano G., Pedersen J.S., Hinrichs A.S., Hou M., Rosenbloom K., Clawson H., Spieth J., Hillier L.W., Richards S., Weinstock G.M., Wilson R.K., Gibbs R.A., Kent W.J., Miller W., and Haussler D. Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome Res 15: 1034-1050, 2005.
9. Wang Z., Zang C., Rosenfeld J.A., Schones D.E., Barski A., Cuddapah S., Cui K., Roh T.Y., Peng W., Zhang M.Q., and Zhao K. Combinatorial patterns of histone acetylations and methylations in the human genome. Nature Genetics 40: 897-903, 2008.
10. Zeitlinger et al., Genes & Dev. 21: 385-390 (2007). doi:10.1101/gad.1509607

Funding: Support from NSF CAREER award 0644282, an NSF Postdoctoral Fellowship to JE, NIH R01 1R01HG004037-01A1, ENCODE 1U54HG004570-01, and NHGRI 1 RC1 HG005334-01 is gratefully acknowledged.