Transcript Noonan

Regulomics I: Methods to read out regulatory functions

Identifying regulatory functions in genomes
[Figure: genome browser view of Chr5: 133,876,119 – 134,876,119; tracks: Genes, Transcription]
• Regulatory elements are not easily detected by sequence analysis
• Examine biochemical correlates of RE activity in cells/tissues:
  • Chromatin Immunoprecipitation (ChIP-seq)
  • DNase-seq and FAIRE
  • Methylated DNA immunoprecipitation (MeDIP)

Identifying regulatory functions in genomes
From Noonan and McCallion, Ann Rev Genomics Hum Genet 11:1 (2010)

Biochemical indicators of regulatory function
1. TF binding
2. Histone modification
   • H3K27ac
   • H3K4me3
3. Chromatin modifiers & coactivators (p300, MLL)
4. DNA looping factors (cohesin)

Regulatory functions are tissue/cell type/time point-specific
From Visel et al. (2009) Nature 461:199

Identifying regulatory functions in genomes
[Figure: genome browser view of Chr5: 133,876,119 – 134,876,119; tracks: Genes, Transcription, Histone mods, TF binding]

Methods
• ChIP-seq: TFs, histone mods
• Chromatin accessibility: DNase, FAIRE
From Furey (2012) Nat Rev Genet 13:840

ChIP-seq
[Figure: ChIP-seq workflow; labels: PCR, ChIP, Input, Signal, Peak call]
• Align reads to reference
• Use peaks of mapped reads to identify binding events

Calling peaks in ChIP-seq data
[Figure: ChIP and Input tracks with a peak call; enrichment relative to control]
• ChIP-seq is an enrichment method
• Requires a statistical framework for determining the significance of enrichment
• ChIP-seq ‘peaks’ are regions of enriched read density relative to an input control
• Input = sonicated chromatin collected prior to immunoprecipitation
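A minimal sketch of "enrichment relative to control": compare ChIP and input read counts in the same window after scaling the input to the ChIP sequencing depth. The function name, the pseudocount, and all read counts are assumptions made for illustration; this is not the method of any particular peak caller.

```python
def fold_enrichment(chip_count, input_count, chip_total, input_total, pseudocount=1.0):
    """Depth-normalized ChIP/input ratio for one genomic window.

    chip_total and input_total are the total mapped reads in each library.
    The pseudocount (an assumption of this sketch) avoids division by zero.
    """
    # Scale the input count to the ChIP library size so the two are comparable.
    expected = input_count * (chip_total / input_total)
    return (chip_count + pseudocount) / (expected + pseudocount)


# Hypothetical windows: (ChIP reads, input reads)
windows = {"win_a": (120, 10), "win_b": (15, 12), "win_c": (40, 40)}
chip_total, input_total = 10_000_000, 20_000_000

for name, (chip, inp) in windows.items():
    print(name, round(fold_enrichment(chip, inp, chip_total, input_total), 2))
```

A ratio alone says nothing about significance; a window with few reads can show a high ratio by chance, which is why a statistical framework is needed.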

There are many ChIP-seq peak callers available
From Wilbanks and Facciotti PLoS ONE 5:e11471 (2010)

Generating ChIP-seq peak profiles
Artifacts:
• Repeats
• PCR duplicates
From Park (2009) Nat Rev Genet 10:669
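One common way to handle PCR duplicates is to keep a single read per (chromosome, strand, 5' position). The sketch below assumes reads are already aligned and represented as simple tuples; the tuple layout and the example reads are hypothetical, and this is a simplification rather than the procedure described in Park (2009).

```python
def remove_duplicates(reads):
    """Keep the first read seen at each (chrom, 5' position, strand)."""
    seen = set()
    unique = []
    for chrom, pos, strand in reads:
        key = (chrom, pos, strand)
        if key not in seen:
            seen.add(key)
            unique.append(key)
    return unique


# Hypothetical aligned reads: (chromosome, 5' position, strand)
reads = [
    ("chr5", 100, "+"),
    ("chr5", 100, "+"),  # same start and strand: treated as a PCR duplicate
    ("chr5", 100, "-"),  # same position, opposite strand: kept
    ("chr5", 250, "+"),
    ("chr5", 100, "+"),  # another duplicate
]
deduped = remove_duplicates(reads)
print(len(reads), "reads ->", len(deduped), "after duplicate removal")
```

Collapsing to one read per position can also discard genuine signal at very strongly bound sites, so the threshold is a design choice.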

Assessing statistical significance
• Assume read distribution follows a Poisson distribution
• Many sites in input data will have some reads by chance
• Some sites will have many reads
[Figure: x-axis: # of reads at a site (S)]
Empirical FDR:
• Call peaks in input (using ChIP as control)
• FDR = ratio of # of peaks of given enrichment value called in input vs ChIP
From Pepke et al (2009) Nat Meth 6:S22
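The two ideas on this slide can be sketched as follows: a Poisson tail probability for seeing S or more reads at a site by chance, and an empirical FDR computed from the sample-swapped peak calls. The function names, the background rate, and the peak scores are assumptions made for illustration, not values from Pepke et al. (2009).

```python
import math


def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam), with lam > 0."""
    # Sum P(X = i) for i < k in log space, then take the complement.
    cdf = sum(math.exp(-lam + i * math.log(lam) - math.lgamma(i + 1)) for i in range(k))
    return 1.0 - cdf


def empirical_fdr(chip_scores, swapped_scores, threshold):
    """Peaks called in input (ChIP as control) divided by peaks called in ChIP."""
    n_chip = sum(s >= threshold for s in chip_scores)
    n_swapped = sum(s >= threshold for s in swapped_scores)
    if n_chip == 0:
        return None  # no ChIP peaks at this threshold, FDR undefined
    return min(1.0, n_swapped / n_chip)


# Hypothetical background of 2 reads per site: how surprising are 10 reads?
print(poisson_sf(10, 2.0))

# Hypothetical enrichment scores for peaks from each direction of the comparison
chip_scores = [5, 8, 12, 20, 30]
swapped_scores = [5, 6]
print(empirical_fdr(chip_scores, swapped_scores, threshold=5))
```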

Assessing statistical significance
Sequencing depth matters:
[Figure: x-axis: # of reads at a site (S)]
From Park (2009) Nat Rev Genet 10:669

ChIP-seq signal profiles vary depending on factor

[Figure panels: Transcription factors; Pol II; Histone mods]

From Park (2009) Nat Rev Genet 10:669

Quantitative analysis of ChIP-seq signal profiles
[Figure: ChIP-seq signal in HeLa vs K562; clustering separates sites strongly marked in HeLa, sites strongly marked in both, and sites strongly marked in K562]

ChIP-seq analysis workflow
From Park (2009) Nat Rev Genet 10:669

Interpreting ChIP-seq datasets
Requires some prior knowledge
• TF function
• Histone modification
• Potential target genes
Exploit existing annotation
• Promoter locations
• Known binding sites
• Known histone modification maps

Example from PS1: CTCF and RAD21 (cohesin)

CTCF and cohesin co-occupy many sites

Promoters Insulators Enhancers

From Kagey et al (2010) Nature 467:430
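Co-occupancy can be quantified by intersecting the two peak sets. The sketch below assumes peaks are half-open (chrom, start, end) intervals; the coordinates are hypothetical and are not taken from PS1 or Kagey et al. (2010). The all-against-all comparison is fine for a toy example; real peak sets are usually intersected with a dedicated tool such as bedtools.

```python
def overlaps(a, b):
    """True if two half-open intervals (chrom, start, end) share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def co_occupied(peaks_a, peaks_b):
    """Peaks from peaks_a that overlap at least one peak in peaks_b."""
    return [p for p in peaks_a if any(overlaps(p, q) for q in peaks_b)]


# Hypothetical peak calls
ctcf = [("chr1", 100, 300), ("chr1", 1000, 1200), ("chr2", 50, 250)]
rad21 = [("chr1", 250, 450), ("chr2", 400, 600), ("chr1", 5000, 5200)]

shared = co_occupied(ctcf, rad21)
print(f"{len(shared)} of {len(ctcf)} CTCF peaks overlap a RAD21 peak")
```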

Promoter Enhancers?

• CTCF: marks insulators and promoters
• RAD21 (cohesin): marks insulators, promoters and enhancers

Discovering regulatory functions specific to a biological state

Limb Brain

Function?

• Assign enhancers to genes based on proximity (not ideal)
• GREAT: bejerano.stanford.edu/great/
• Gene ontology annotation assigned to regulatory sequences
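Proximity-based assignment, in its simplest form, links each enhancer to the gene with the nearest transcription start site on the same chromosome. This is a sketch of that naive rule only, not of GREAT's own association rules; the gene names and coordinates are hypothetical.

```python
def assign_to_nearest_gene(enhancer, genes):
    """Return (gene, distance) for the TSS closest to the enhancer midpoint.

    enhancer: (chrom, start, end); genes: list of (name, chrom, tss).
    Returns None if no gene lies on the enhancer's chromosome.
    """
    chrom, start, end = enhancer
    mid = (start + end) // 2
    candidates = [(abs(tss - mid), name) for name, g_chrom, tss in genes if g_chrom == chrom]
    if not candidates:
        return None
    distance, name = min(candidates)
    return name, distance


# Hypothetical annotation and enhancer
genes = [("geneA", "chr5", 1000), ("geneB", "chr5", 50000), ("geneC", "chr7", 2000)]
enhancer = ("chr5", 30000, 30400)
print(assign_to_nearest_gene(enhancer, genes))
```

The rule is "not ideal" because enhancers can act over long distances and skip the nearest gene, which is the motivation for the long-range interaction methods later in the lecture.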

TF motif elicitation from ChIP-seq data
• CTCF: ~20,000 binding sites identified by ChIP
• MEME suite: http://meme.nbcr.net/meme/
From Furey (2012) Nat Rev Genet 13:840
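To show what a motif summarizes, the sketch below builds a position frequency matrix and a consensus from sequences that are assumed to be already aligned and of equal length. That assumption is the hard part that de novo tools such as MEME solve, so this is not MEME's algorithm. The sequences are hypothetical and are not real CTCF sites.

```python
from collections import Counter


def position_frequency_matrix(sites):
    """Per-position base frequencies for equal-length, pre-aligned sites."""
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length")
    pfm = []
    for i in range(length):
        counts = Counter(s[i] for s in sites)
        pfm.append({base: counts.get(base, 0) / len(sites) for base in "ACGT"})
    return pfm


def consensus(pfm):
    """Most frequent base at each position (ties broken in A, C, G, T order)."""
    return "".join(max("ACGT", key=lambda b: col[b]) for col in pfm)


# Hypothetical aligned binding-site sequences
sites = ["CCAGT", "CCACT", "CCAGT", "GCAGT"]
pfm = position_frequency_matrix(sites)
print(consensus(pfm))
print(pfm[0])
```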

Single TF binding events may not indicate regulatory function
[Figure label: Enhancer-associated histone modification]
• Many TFs are present at high concentrations in the nucleus
• TF motifs are abundant in the genome
• Single TF binding events may be incidental

Mapping chromatin accessibility
• DNase I
• FAIRE
From Furey (2012) Nat Rev Genet 13:840

DNase I hypersensitivity identifies TF binding events
From Furey (2012) Nat Rev Genet 13:840

DNase I hypersensitivity identifies regulatory elements
[Figure label: DNase I hypersensitive sites]
From Song et al., Genome Res 21:1757 (2011)

De novo TF motif discovery by DNase I hypersensitivity mapping
In human ES cells:
From Neph (2012) Nature 489:83

De novo TF motif discovery by DNase I hypersensitivity mapping
Across tissue types:
From Neph (2012) Nature 489:83

Capturing long-range regulatory interactions
From Visel et al. (2009) Nature 461:199

Chromosome Conformation Capture Methods
• ChIP for specific factors, then sequence: ChIA-PET
• Sequence: Hi-C

Long-range regulatory interactions mediated by specific factors: RNA PolII
From Kieffer-Kwon et al. (2013) Cell 155:1507

Long-range regulatory interactions mediated by specific factors: Cohesin

[Figure legend: Int = Intergenic or intronic; Pr = Promoter; Ex = Exonic]
From DeMare et al. (2013) Genome Res. 23:1224

Summary
• Relevant overview papers on ChIP-seq and DNase-seq posted on class wiki
• Wednesday: Epigenetics and the histone code