1. Interpreting rich epigenomic datasets


Chromatin state and DNA sequence
in TF binding dynamics and disease
Manolis Kellis
Broad Institute of MIT and Harvard
MIT Computer Science & Artificial Intelligence Laboratory
DNA vs. epigenome in dynamics & disease
[Diagram: sequence specificity (motifs, e.g. CATGACTG vs. CATGCCTG) and chromatin state, with their interplay (?) in TF binding (ENCODE; Ernst, Bernstein). Genotype, epigenotype, and disease, linked by GWAS and QTLs (Roadmap; Eaton, De Jager).]
States combine histone marks, FAIRE, Pol2, DNase
Transition matrix
• ENCODE datasets: Bernstein, Stam, Lieb, Crawford
• Several classes of DNase hypersensitive regions
• Do they have different TF-binding properties?
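Chromatin state models of this kind are usually hidden Markov models, in which a transition matrix gives the probability of moving from one state to the next along the genome. As a minimal sketch only, the block below estimates such a matrix by counting adjacent-bin transitions in an already-labeled state path. The state names and the toy path are hypothetical, and a real model would learn transitions jointly with the mark emissions rather than from a fixed labeling.

```python
from collections import Counter, defaultdict

def transition_matrix(state_path):
    """Row-normalized transition frequencies between consecutive genomic bins."""
    counts = defaultdict(Counter)
    for current, following in zip(state_path, state_path[1:]):
        counts[current][following] += 1
    matrix = {}
    for state, row in counts.items():
        total = sum(row.values())
        matrix[state] = {nxt: c / total for nxt, c in row.items()}
    return matrix

# Hypothetical per-bin state labels along one stretch of genome.
path = ["promoter", "promoter", "enhancer", "enhancer", "enhancer",
        "low", "low", "low", "low", "promoter"]
matrix = transition_matrix(path)
for state, row in matrix.items():
    print(state, row)
```

States that tend to persist across neighboring bins show up as large diagonal entries, which is the pattern a transition-matrix heatmap typically makes visible.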
TFs show characteristic chromatin state preferences
• Confirm TFDNAse
relationship
• However: Different
TFs bind different
chromatin states
• Dynamic binding
across cell types?
• Patterns hold
across 300+ TF
binding expts
• What about
dynamics?
Dynamic enhancers vs. constitutive CTCF/promoters
Dynamic TF binding and dynamic enhancer activity
Dynamic enh./static promoters
TF binding corr. w/ TF expression
TF co-occurrence patterns driven by chromatin state
Raw enrichments
Conditional enrichments
(if state preference is known)
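The difference between raw and conditional enrichments can be illustrated with a small sketch. It assumes each region carries a chromatin state label and binary binding calls for two TFs. The region data, the state names, and both function names are hypothetical, and the per-state independence expectation used here is one plausible way to condition on state preference, not necessarily the calculation behind the slides.

```python
from collections import defaultdict

def raw_enrichment(regions):
    """Observed co-bound regions over the count expected under genome-wide independence."""
    n = len(regions)
    n_a = sum(r["A"] for r in regions)
    n_b = sum(r["B"] for r in regions)
    n_ab = sum(r["A"] and r["B"] for r in regions)
    return n_ab / (n_a * n_b / n)

def conditional_enrichment(regions):
    """Same ratio, but independence is assumed only within each chromatin state."""
    by_state = defaultdict(list)
    for r in regions:
        by_state[r["state"]].append(r)
    observed = sum(r["A"] and r["B"] for r in regions)
    expected = 0.0
    for members in by_state.values():
        n_a = sum(r["A"] for r in members)
        n_b = sum(r["B"] for r in members)
        expected += n_a * n_b / len(members)
    return observed / expected

def make_regions(state, both, a_only, b_only, neither):
    """Build toy regions with 0/1 binding calls for TFs A and B."""
    rows = [(1, 1)] * both + [(1, 0)] * a_only + [(0, 1)] * b_only + [(0, 0)] * neither
    return [{"state": state, "A": a, "B": b} for a, b in rows]

# Toy data: both TFs bind only in enhancers, and independently of each other there.
regions = make_regions("enhancer", 25, 25, 25, 25) + make_regions("low_signal", 0, 0, 0, 900)
raw = raw_enrichment(regions)
cond = conditional_enrichment(regions)
print(raw, cond)
```

In this toy case the raw enrichment is 10-fold while the conditional enrichment is 1, so the apparent co-occurrence is fully explained by a shared state preference.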
Chromatin state preferences are motif encoded
• States bound by TFs enriched in corresponding motifs
• Enrichment also found in states of specific repression
Bound regions in preferred states depleted in motifs
• Permissive binding in promoters/enhancers/insulators
• DNase/FAIRE regions lacking marks: not permissive
Summary
• Chromatin states, TF dynamics, and motifs
– TFs bind DNase; distinct chromatin state preferences
– Chromatin state preferences are partly motif-encoded
– States predict most previously-observed co-binding
– Motifs guide states, states enable permissive binding
• Methylation vs. genotype in Alzheimer’s Disease
– Variability between individuals mostly genotype-driven
– Most variable: promoter-flanking, brain enhancers
– Predictive for AD: Global inhibition of 7000 probes
– Enhancers, not promoters. NRSF, ELK1, CTCF targets
• Conclusions:
– Power of regulatory annotation for interpreting disease
– Interplay of DNA sequence & epigenome in TFs/disease
Interpreting disease-association signals
[Diagram: sequence variant (CATGACTG vs. CATGCCTG); Genotype, Epigenome, and Disease, with links labeled GWAS, mQTLs, and MWAS.]
(1) Interpret variants using ENCODE
- Chromatin states: Enhancers, promoters, motifs
- Enrichment in individual loci, across 1000s of SNPs in T1D
(2) Epigenome changes in disease
- Molecular phenotypic changes in patients vs. controls
- Small variation in brain methylomes, mostly genotype-driven
- 1000s of brain-specific enhancers increase methylation in Alzheimer’s
Methylation in 750 Alzheimer patients/controls
750 individuals (~50% w/AD)
Memory and Aging Project
Religious Order Study
486,000 methylation probes
Philip De Jager, Epigenomics Roadmap
[Diagram: Genome, Epigenome, Phenotype; (1) meQTL, (2) MWAS; Epigenome classification; Brad Bernstein, REMC mapping.]
• Patients followed for 10+ years with cognitive evaluations
• Brain samples donated post-mortem: methylation/genotype
• Seek predictive features: SNPs, QTLs, mQTLs, regulation
Global variability in DLPFC and CD4+ methylation
[Figure: correlation matrix of methylation profiles for samples from dorso-lateral pre-frontal cortex (DLPFC) and CD4+ T-cells, labeled by sample ID (e.g. 10551157.CD4, 10551157.DLPFC), with gender (M/F) and batch annotations; color scale from most similar to least similar.]
Little variability, focused on regulatory regions
[Figure: probe intensity distribution; axes: methylation level (0.0 to 1.0) and inter-individual variability.]
[Figure: methylation enrichment in states: Active promoter, Weak promoter / enhancer, Active enhancer, Weak enhancer, Gene body, Active gene body, Repetitive, Heterochromatin, Low signal.]
• Hemi-methylated probes are also the most variable
• Tiny fraction (0.6%) of all probes
• Promoters: Stable low (active)
• Gene bodies: Stable high (active)
• Enhancers/poised: Most variable
Most epigenomic variability is genotype-driven
[Figure: overlaid Manhattan plots; x-axes: chromosome and genomic position, distance from CpG (Mb); y-axis: P-value (-log10 P).]
• Overlay Manhattan plots of 450,000 methylation probes
• Cutoff of 10^-14 (10^-2 after Benjamini-Hochberg correction)
• 150,000 mQTLs at P<0.01 after FDR correction
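The Benjamini-Hochberg correction named above can be sketched in a few lines. This is a generic, minimal implementation of the standard step-up procedure, not the pipeline used for the mQTL analysis; the function name and the five toy P-values are assumptions for illustration.

```python
def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values (q-values) in the original input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Toy P-values (made up); real input would be one P-value per probe-SNP test.
pvals = [0.001, 0.008, 0.039, 0.041, 0.6]
qvals = benjamini_hochberg(pvals)
n_significant = sum(q < 0.05 for q in qvals)
print(qvals, n_significant)
```

Because each P-value is scaled by the number of tests divided by its rank, a very stringent raw cutoff is needed when hundreds of thousands of probes are each tested against many SNPs.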
Multimodal, SNP-associated, promoter-depleted
[Figure: Venn diagram of multimodal probes (~3K) and SNP-associated probes (29% of all): 184 multimodal only, 2,647 shared, 138,731 SNP-associated only.]
[Figure: % of CpG probes per chromatin state, all probes vs. SNP-associated probes. States: 1 Active promoter, 2 Promoter flanking, 3 Active enhancer, 4 Weak enhancer, 5 Gene bodies, 6 Active gene bodies, 7 Repetitive, 8 Heterochromatin, 9 Low signal.]
• 93.5% of multimodal probes are SNP-associated
• Importance of distinguishing contribution of genotype to disease associations
• SNP-associated probes depleted in promoters (driven epigenetically>genetically, open chrom)
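The 93.5% figure can be checked from the three counts on this slide. The sketch assumes they correspond to the Venn regions multimodal-only (184), shared (2,647), and SNP-associated-only (138,731); that assignment is inferred from the numbers rather than stated in the transcript. The 486,000 total is the probe count quoted elsewhere in the talk, and the variable names are illustrative.

```python
# Counts as read from the slide; the region assignment is an assumption.
multimodal_only = 184
both = 2647
snp_only = 138731
total_probes = 486000  # probe count quoted elsewhere in the talk

multimodal_total = multimodal_only + both   # the "~3K" multimodal probes
snp_total = both + snp_only                 # all SNP-associated probes

pct_multimodal_snp = 100 * both / multimodal_total
pct_snp_of_all = 100 * snp_total / total_probes

print(f"{pct_multimodal_snp:.1f}% of multimodal probes are SNP-associated")
print(f"{pct_snp_of_all:.1f}% of all probes are SNP-associated")
```

Under this reading both slide annotations (93.5% and roughly 29%) come out consistent with the counts.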
>80% variance explained for 50,000+ probes
[Figure: significance (q-value) and variance explained (adjusted R2) plotted against distance to CpG (Mb); color scale: log2(count).]
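As a rough illustration of what "variance explained" means for a single probe, the sketch below regresses methylation on genotype dosage (0, 1, or 2 copies of an allele) and reports adjusted R2. The single-SNP linear model, the function name, and the toy values are assumptions; the transcript does not specify the model behind the figure.

```python
def adjusted_r2(genotypes, methylation):
    """Adjusted R^2 of a one-predictor least-squares fit of methylation on genotype."""
    n = len(genotypes)
    mean_x = sum(genotypes) / n
    mean_y = sum(methylation) / n
    sxx = sum((x - mean_x) ** 2 for x in genotypes)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(genotypes, methylation))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(genotypes, methylation))
    ss_tot = sum((y - mean_y) ** 2 for y in methylation)
    r2 = 1 - ss_res / ss_tot
    predictors = 1
    return 1 - (1 - r2) * (n - 1) / (n - predictors - 1)

# Toy probe (made-up values): methylation tracks allele dosage closely.
genotypes = [0, 0, 1, 1, 2, 2]
methylation = [0.10, 0.14, 0.48, 0.52, 0.88, 0.92]
adj = adjusted_r2(genotypes, methylation)
print(round(adj, 3))
```

A probe like this toy one, where genotype accounts for nearly all of the variation between individuals, would fall in the ">80% variance explained" group.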
Global hyper-methylation in 1000s of AD-associated loci
QQ plot: Many loci with weak effects?
[Figure: QQ plot of observed vs. expected P-values (-logP), alongside methylation for 480,000 probes ranked by Alzheimer’s association, with the top 7000 probes marked.]
Alzheimer’s-associated probes are hypermethylated
• Global effect across 1000s of probes
– Rank all probes by Alzheimer’s association
– Observe functional changes down ranklist
– 7000 probes show shift in methylation
[Figure legend: Alzheimer’s, Normal]
Complex disease: genome-wide effects
Chromatin state breakdown reveals activity
• Hypermethylated probes (repressed)
• Significant probes are in enhancers, not promoters
[Figure: % probes (0.00 to 0.30) per chromatin state: 1 Active promoter, 2 Promoter flanking, 3 Active enhancer, 4 Weak enhancer, 5 Gene bodies, 6 Active gene bodies, 7 Repetitive, 8 Heterochromatin, 9 Low signal. Red: more methylated in Alzheimer’s. Blue: less methylated in Alzheimer’s.]
* => Fisher exact test, p-value <= 0.001
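The starred bars rest on a Fisher exact test of whether significant probes are over-represented in a chromatin state. Below is a minimal standard-library sketch of the one-sided test on a 2x2 table. The function name and the toy counts are hypothetical, and the transcript does not say whether the original test was one-sided or two-sided.

```python
from math import comb

def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher exact p-value P(X >= a) for the table [[a, b], [c, d]].

    a: significant probes in the state    b: significant probes elsewhere
    c: other probes in the state          d: other probes elsewhere
    """
    row1 = a + b              # all significant probes
    col1 = a + c              # all probes in the state
    n = a + b + c + d
    upper = min(row1, col1)
    # Hypergeometric tail: tables at least as extreme as the observed one.
    tail = sum(comb(col1, k) * comb(n - col1, row1 - k) for k in range(a, upper + 1))
    return tail / comb(n, row1)

# Toy counts (made up): 1000 probes, 100 in enhancers, 50 significant,
# of which 20 fall in enhancers (5 would be expected by chance).
p_enhancer = fisher_exact_greater(20, 30, 80, 870)
starred = p_enhancer <= 0.001
print(p_enhancer, starred)
```

With real data the same table would be built once per chromatin state, and exact arithmetic on very large counts may be slow, so a statistics library would be the practical choice.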
Estimating number of functionally-associated probes
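[Figure: number of probes with functional enrichment compared with expected, by chromatin state: Promoter, Active TSS flanking, Poised promoter, Active enhancer, Weak enhancer, Strong transcription, Weak transcription, Polycomb repressed; 10,000 marked.]
• Functional enrichments found for 10,000 probes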
Predictive power of hyper-methylation signal
[Figure: "Enhancer probes: risk". Relative risk (0.0 to 2.0) by sum of methylation signal in 1,026 regulatory regions, shown for the 1st through 5th methylation quintiles, the 5th quintile relative to the 1st, and APOE (e4/e3); label: "Significant".]
• Sum total methylation levels across 1026 probes
• Individuals in top quintile show 2.5-fold higher risk
• By comparison, the APOE4 allele confers 1.5-fold
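The quintile comparison can be sketched as follows: sum each individual's methylation across the selected probes, rank individuals by that sum, and compare the fraction with AD in the top and bottom fifths. The function name and all data below are synthetic and for illustration only; the transcript does not describe covariate adjustment or how the reported risks were estimated.

```python
def relative_risk_top_vs_bottom(scores, has_ad):
    """AD risk in the top methylation quintile divided by risk in the bottom quintile."""
    ranked = sorted(zip(scores, has_ad))      # ascending by summed methylation
    size = len(ranked) // 5                   # individuals per quintile
    bottom = [ad for _, ad in ranked[:size]]
    top = [ad for _, ad in ranked[-size:]]
    return (sum(top) / size) / (sum(bottom) / size)

# Synthetic cohort of 50 individuals with 3 probes each (values are made up).
methylation = [[i / 100, i / 100, i / 100] for i in range(50)]
scores = [sum(row) for row in methylation]    # sum of methylation per individual

# Synthetic diagnoses: 3 of 10 cases in the lowest quintile, 6 of 10 in the highest.
cases = {0, 1, 2} | set(range(10, 40, 2)) | set(range(40, 46))
has_ad = [1 if i in cases else 0 for i in range(50)]

rr = relative_risk_top_vs_bottom(scores, has_ad)
print(rr)
```

Summing across many probes trades per-locus resolution for a single aggregate score, which suits a signal made of many individually weak effects.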
AD-associated probes enriched in ELK1/NRSF targets
[Figure: motif enrichment for ELK1, NRSF, and CTCF across all probes, ranked by AD assoc. P-value.]
• Regulatory motifs enriched in top-scoring probes
• Genomic basis for association, potential cis or trans effect
• Reveals biological pathways involved and potential targets
Collaborators and Acknowledgements
• Chromatin state dynamics, ENCODE
– Brad Bernstein, John Stam, Jason Lieb, Crawford
• Methylation in Alzheimer’s disease
– Philip deJager & Gyan Srivastava, Brad Bernstein
– Religious Order Study, Memory and Aging Project
• Large-scale epigenomic datasets
– Epigenomics Roadmap, ENCODE project, NHGRI
• Funding
– NHGRI, NIH, NSF, Sloan Foundation
MIT Computational Biology group
Compbio.mit.edu
Mike Lin, Ben Holmes, Angela Yen, Matt Eaton, Soheil Feizi, Luke Ward, Bob Altshuler, Stefan Washietl, Pouya Kheradpour, Manolis Kellis, Jason Ernst, Jessica Wu, Irwin Jungreis, Daniel Marbach, Louisa DiStefano, Sushmita Roy, Chris Bristow, Mukul Bansal, Rachel Sealfon, Dave Hendrix, Loyal Goff
Stata3, Stata4