Yeast whole-genome analysis of conserved regulatory motifs


Epigenomic views of human disease
reveal 1000s of regulatory variants
Manolis Kellis
Broad Institute of MIT and Harvard
MIT Computer Science & Artificial Intelligence Laboratory
Interpreting complex disease: from regions to models
• Gene annotation (Coding, 5’/3’UTR, RNAs) → Evolutionary signatures
• Roles in gene/chromatin regulation → Activator/repressor signatures
• Non-coding annotation → Chromatin signatures
• Other evidence of function → Signatures of selection (sp/pop)
• Disease-associated variant (SNP/CNV/…), e.g. CATGACTG vs. CATGCCTG
• Challenge: from loci to mechanism, pathways, drug targets
Need: A systems-level understanding of genomes and gene regulation
• The regulators: Transcription factors, microRNAs, sequence specificities
• The regions: enhancers, promoters, and their tissue-specificity
• The targets: TFs → targets, regulators → enhancers, enhancers → genes
• The grammars: Interplay of multiple TFs → prediction of gene expression
→ The parts list = Building blocks of genome/disease regulatory networks
Systems-level views of disease epigenomics
• Chromatin states help interpret disease associations
– Annotate dynamic regulatory elements in multiple cell types
– Activity-based linking of regulators → enhancers → targets
– Mechanistic predictions, 2000+ T1D-associated enhancers
• Global methylation changes in Alzheimer’s Disease
– Little variability between individuals, genotype-driven
– Most variable regions: promoter-flanking, brain enhancers
– Global inhibition of 7000+ probes. Predictive power for AD
– Enhancers, not promoters. Targets of NRSF, ELK1, CTCF
• Conclusions:
– Power of regulatory annotation for interpreting disease
– 1000s of regions functionally associated with disease
– Weak associations, concentrated in regulatory pathways
Interpreting disease-association signals
(1) Interpret variants using ENCODE
- Chromatin states: Enhancers, promoters, motifs
- Enrichment in individual loci, across 1000s of SNPs in T1D
(2) Epigenome changes in disease
- Molecular phenotypic changes in patients vs. controls
- Small variation in brain methylomes, mostly genotype-driven
- 1000s of brain-specific enhancers increase methylation in Alzheimer’s
[Figure: Genotype (CATGACTG / CATGCCTG) and Disease; links labeled GWAS, mQTLs, MWAS]
Chromatin states dynamics across nine cell types
[Figure: chromatin states across nine cell types; panel labels: predicted linking, correlated activity]
• Single annotation track for each cell type
• Summarize cell-type activity at a glance
• Can study 9-cell activity pattern across
Ernst et al, Nature 2011
Enhancer-gene links supported by eQTL-gene links
eQTL study
[Figure: gene and sequence variant at distal position; label: 15kb]

Individuals   Expression level of gene   Sequence variant at distal position
Indiv. 1      -0.5                       A
Indiv. 2      -1.5                       A
Indiv. 3      -1.8                       A
Indiv. 4       3.1                       C
Indiv. 5       1.1                       A
Indiv. 6      -1.8                       A
Indiv. 7      -1.4                       A
Indiv. 8       3.2                       C
Indiv. 9       4.4                       C
…             …                          …

Validation rationale:
• Expression Quantitative Trait Loci (eQTLs) provide independent SNP-to-gene links
• Do they agree with activity-based links?
Example: Lymphoblastoid (GM) cells study
• Expression/genotype across 60 individuals (Montgomery et al, Nature 2010)
• 120 eQTLs are eligible for enhancer-gene linking based on our datasets
• 51 actually linked (43%) using predictions
→ 4-fold enrichment (10% exp. by chance)
• Independent validation of links.
• Relevance to disease datasets.
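The validation arithmetic on this slide can be checked in a few lines. The sketch below uses only the numbers shown (nine example individuals; 51 of 120 eligible eQTLs linked; 10% expected by chance). The binomial tail probability is an added illustration, not a test the slides say was used, and the variable names are hypothetical.

```python
from math import comb
from statistics import mean

# Toy eQTL table from the slide: expression of the gene and allele per individual.
expression = [-0.5, -1.5, -1.8, 3.1, 1.1, -1.8, -1.4, 3.2, 4.4]
alleles = ["A", "A", "A", "C", "A", "A", "A", "C", "C"]

# An eQTL is a variant whose allele tracks expression: compare group means.
mean_a = mean(e for e, a in zip(expression, alleles) if a == "A")
mean_c = mean(e for e, a in zip(expression, alleles) if a == "C")
print(f"mean expression: A = {mean_a:.2f}, C = {mean_c:.2f}")

# Enrichment of predicted enhancer-gene links among eligible eQTLs.
eligible = 120      # eQTLs eligible for enhancer-gene linking (from the slide)
linked = 51         # eQTLs recovered by the activity-based predictions
chance_rate = 0.10  # fraction expected by chance (from the slide)

observed_rate = linked / eligible
fold = observed_rate / chance_rate


def binom_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p); an assumed null model for illustration."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


p_value = binom_tail(linked, eligible, chance_rate)
print(f"observed rate = {observed_rate:.1%}, fold enrichment = {fold:.2f}")
print(f"binomial tail P = {p_value:.2e}")
```

The fold enrichment is simply the observed fraction divided by the chance fraction, which is where the roughly 4-fold figure comes from.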
Introducing multi-cell activity profiles
[Figure: activity profiles across nine cell types (HUVEC, NHEK, GM12878, K562, HepG2, NHLF, HMEC, HSMM, H1). Panels: gene expression (ON / OFF), chromatin states (active enhancer / repressed), active TF motif enrichment (motif enrichment / motif depletion), TF regulator expression (TF On / TF Off), dip-aligned motif biases (motif aligned / flat profile)]
• Link enhancers to target genes
Introducing multi-cell activity profiles
[Figure: multi-cell activity profiles across the same nine cell types and five panels]
• Link TFs to target enhancers
• Predict activators vs. repressors
Coordinated activity reveals activators/repressors
Enhancer activity
Activity signatures for each TF
Ex1: Oct4 predicted activator
of embryonic stem (ES) cells
Ex2: Gfi1 repressor of
K562/GM cells
• Enhancer networks: Regulator → enhancer → target gene
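One way to read "coordinated activity" is as a correlation across the nine cell types between a TF's expression and the enrichment of its motif in active enhancers. The sketch below assumes that reading: positive correlation suggests an activator, negative correlation a repressor. The profiles, the 0.5 threshold, and the function names are hypothetical and are not taken from the slides.

```python
from statistics import mean

CELL_TYPES = ["HUVEC", "NHEK", "GM12878", "K562", "HepG2", "NHLF", "HMEC", "HSMM", "H1"]


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def classify_tf(tf_expression, motif_enrichment, threshold=0.5):
    """Label a TF from the sign of the expression/motif-enrichment correlation."""
    r = pearson(tf_expression, motif_enrichment)
    if r >= threshold:
        return "activator", r
    if r <= -threshold:
        return "repressor", r
    return "unclear", r


# Hypothetical profiles, one value per cell type in CELL_TYPES order.
# "Oct4-like": expressed only in H1, motif enriched in H1 enhancers.
oct4_like_expr = [0.1, 0.2, 0.1, 0.2, 0.1, 0.3, 0.1, 0.2, 4.0]
oct4_like_motif = [0.0, -0.1, 0.1, 0.0, -0.1, 0.1, 0.0, 0.1, 2.0]

# "Gfi1-like": expressed in GM12878 and K562, motif depleted in their enhancers.
gfi1_like_expr = [0.1, 0.2, 3.0, 3.5, 0.1, 0.2, 0.1, 0.3, 0.2]
gfi1_like_motif = [0.3, 0.2, -1.5, -1.8, 0.1, 0.2, 0.3, 0.1, 0.2]

for name, expr, motif in [
    ("Oct4-like", oct4_like_expr, oct4_like_motif),
    ("Gfi1-like", gfi1_like_expr, gfi1_like_motif),
]:
    label, r = classify_tf(expr, motif)
    print(f"{name}: r = {r:+.2f} -> predicted {label}")
```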
Revisiting disease-associated variants
• Disease-associated SNPs enriched for enhancers in relevant cell types
• E.g. lupus SNP in GM enhancer disrupts Ets1 predicted activator
Mechanistic predictions for top disease-associated SNPs
Lupus erythematosus in GM lymphoblastoid
Disrupt activator Ets-1 motif
→ Loss of GM-specific activation
→ Loss of enhancer function
→ Loss of HLA-DRB1 expression
Erythrocyte phenotypes in K562 leukemia cells
Creation of repressor Gfi1 motif
→ Gain K562-specific repression
→ Loss of enhancer function
→ Loss of CCDC162 expression
Detect SNPs that disrupt conserved regulatory motifs
• Functionally-associated SNPs enriched in states, constraint
• Prioritize candidates, increase resolution, disrupted motifs
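A common way to ask whether a SNP disrupts a motif is to score both alleles against a position weight matrix (PWM) and compare. The sketch below is a minimal illustration under stated assumptions: the PWM is hypothetical, built by treating the slide's example sequence CATGACTG as the consensus, and the 0.85 match probability and uniform background are arbitrary choices.

```python
from math import log2

BASES = "ACGT"


def build_pwm(consensus, match_prob=0.85, background=0.25):
    """Hypothetical log-odds PWM that favors the consensus base at each position."""
    mismatch_prob = (1 - match_prob) / 3
    return [
        {b: log2((match_prob if b == c else mismatch_prob) / background) for b in BASES}
        for c in consensus
    ]


def score(pwm, seq):
    """Sum of per-position log-odds for a sequence the same length as the PWM."""
    return sum(column[base] for column, base in zip(pwm, seq))


ref_allele_seq = "CATGACTG"  # example sequence from the slides
alt_allele_seq = "CATGCCTG"  # same site with the A>C variant

pwm = build_pwm(ref_allele_seq)
ref_score = score(pwm, ref_allele_seq)
alt_score = score(pwm, alt_allele_seq)
delta = alt_score - ref_score

print(f"ref = {ref_score:.2f}, alt = {alt_score:.2f}, delta = {delta:.2f}")
```

A strongly negative delta flags the variant as a candidate motif disruption; a real analysis would use curated motifs and also check conservation, as the bullets above suggest.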
Automating prediction of likely causal variants in LD
→ HaploReg (compbio.mit.edu/HaploReg)
• Start with any list of SNPs or select a GWA study
– Mine publicly available ENCODE data for significant hits
– Hundreds of assays, dozens of cells, conservation, motifs
– Report significant overlaps and link to info/browser
Ward and Kellis, NAR 2011
Functional enrichment for 1000s of SNPs
Beyond top few SNPs → entire rank list
Abhishek Sarkar, Luke Ward
Studying functional enrichments down the rank list
[Figure: all SNPs ranked by disease-association P-value, from top ranks to bottom (least significant); the curve shows increase vs. expectation (enriched in high ranks) or depletion vs. expectation, relative to the level expected at random]
• Rank all SNPs by disease-association P-value
• Find annotations and cell types enriched in high ranks
• Estimate number of SNPs that show functional roles
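The rank-list idea can be sketched as a cumulative "observed minus expected" curve. The code below assumes one plausible definition of excess: at each rank cutoff, the number of annotated SNPs seen so far minus the number expected if annotated SNPs were spread uniformly down the list. The flags are toy data and the function name is hypothetical.

```python
def excess_curve(annotated_flags):
    """Cumulative excess of annotated SNPs over a uniform expectation.

    annotated_flags: 0/1 values ordered from most to least significant SNP,
    1 meaning the SNP overlaps the annotation of interest.
    """
    total = len(annotated_flags)
    overall_fraction = sum(annotated_flags) / total
    curve = []
    observed = 0
    for rank, flag in enumerate(annotated_flags, start=1):
        observed += flag
        curve.append(observed - rank * overall_fraction)
    return curve


# Toy ranked list: annotated SNPs are concentrated near the top.
flags = [1, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0]
curve = excess_curve(flags)

peak_excess = max(curve)
peak_rank = curve.index(peak_excess) + 1
print(f"peak excess = {peak_excess:.2f} annotated SNPs at rank {peak_rank}")
```

Under this definition the peak of the curve gives a rough estimate of how many annotated SNPs exceed expectation, and the curve returns to zero at the bottom of the list by construction.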
1000s of GM/K562 enhancers contain Type1-Diabetes SNPs
Enhancers across cell types (lymphoblastoid, leukemia); chromatin states in GM12878:
• Promoters: 462 (excess 81)
• Enhancers: 2049 (excess 392)
– 1940 distinct loci (R^2<.8)
• Transcribed: 4740 (excess 522)
• Insulator: 240 (excess 23)
• Repressed: 1351 (excess 76)
• Other: 21k (deplete 1093)
• Type 1 diabetes: Rank all SNPs by association P-value
• Specific states in specific cell types enrich in high rank
→ Weak contributions from 1000s of regulatory regions
Methylation in 750 Alzheimer patients/controls
750 individuals (~50% w/AD)
Memory and Aging Project
Religious Order Study
[Figure: 486,000 methylation probes; Genome, Epigenome, Phenotype; (1) meQTL, (2) MWAS; Brad Bernstein: REMC mapping; Epigenome classification]
Philip deJager, Epigenomics Roadmap
• Patients followed for 10+ years with cognitive evaluations
• Brain samples donated post-mortem → methylation/genotype
• Seek predictive features: SNPs, QTLs, mQTLs, regulation
Little variability, focused on regulatory regions
[Figure: left panel, probe intensity distribution (methylation level, 0.0 to 1.0, and inter-individual variability); right panel, methylation enrichment in chromatin states (active promoter, weak promoter / enhancer, active enhancer, weak enhancer, gene body, active gene body, repetitive, heterochromatin, low signal)]
• Hemi-methylated probes are also the most variable
• Tiny fraction (0.6%) of all probes
• Promoters: Stable low (active)
• Gene bodies: Stable high (active)
• Enhancers/poised: Most variable
Most epigenomic variability is genotype-driven
[Figure: overlaid Manhattan plots; x-axis: chromosome (1 to 22) and genomic position, distance from CpG (MB); y-axis: P-value (-log10 P)]
• Overlay Manhattan plots of 450,000 methylation probes
• Cutoff of 10^-14 (10^-2 after Benjamini-Hochberg correction)
• 150,000 mQTLs at P<0.01 after FDR correction
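For reference, the Benjamini-Hochberg adjustment mentioned above can be sketched as follows. This is the standard step-up procedure written from scratch; the five P-values are toy numbers, not values from the study, and the function name is hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return BH-adjusted P-values (q-values) in the original input order."""
    m = len(p_values)
    # Indices of the P-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotone q-values.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


toy_p_values = [0.001, 0.008, 0.039, 0.041, 0.9]
q_values = benjamini_hochberg(toy_p_values)
significant = [q <= 0.01 for q in q_values]

for p, q, keep in zip(toy_p_values, q_values, significant):
    print(f"P = {p:<6} q = {q:.5f} significant at 0.01: {keep}")
```

With hundreds of thousands of probes tested against many SNPs, the number of tests is very large, which is why a corrected threshold of 0.01 corresponds to a far smaller raw P-value cutoff.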
MultimodalSNP-associatedPromoter-depleted
All probes 1 Active promoter
SNP-associated
Multimodal
probes (~3Κ)
184
2,647
2 Promoter
flanking
SNP-associated
probes (29% of all)
3 Active enhancer
4 Weak enhancer
138,731
5 Gene bodies
6 Active gene
bodies
• 93.5% of multimodal probes
are SNP-associated
• Importance of distinguishing
contribution of genotype
to disease associations
7 Repetitive
8 Heterochromatin
9 Low signal
% of CpG probes
• SNP-associated probes depleted in promoters
(driven epigenetically>genetically, open chrom)
>80% variance explained for 50,000+ probes
[Figure: two panels plotted against distance to CpG (MB): significance (q-value) and variance explained (adjusted R2); color scale: log2(count)]
Interpreting disease-association signals
(1) Interpret variants
(2) Epigenome changes in disease
[Figure: Genotype (CATGACTG / CATGCCTG), Epigenome, Disease; links labeled GWAS, mQTLs, MWAS]
Global hyper-methylation in 1000s of AD-associated loci
[Figure: QQ plot ("Many loci with weak effects?"), observed vs. expected P-value (-logP), 0 to 10; 480,000 probes ranked by Alzheimer’s association, top 7000 probes highlighted; methylation in Alzheimer’s vs. normal]
Alzheimer’s-associated probes are hypermethylated
• Global effect across 1000s of probes
– Rank all probes by Alzheimer’s association
– Observe functional changes down rank list
– 7000 probes show shift in methylation
Complex disease: genome-wide effects
Chromatin state breakdown reveals  activity
[Figure: % probes (0.00 to 0.30) per chromatin state for hypermethylated (repressed) probes: 1 Active promoter, 2 Promoter flanking, 3 Active enhancer, 4 Weak enhancer, 5 Gene bodies, 6 Active gene bodies, 7 Repetitive, 8 Heterochromatin, 9 Low signal; asterisks mark significant states]
Red: More methylated in Alzheimer’s
Blue: Less methylated in Alzheimer’s
Significant probes are in enhancers, not promoters
* => Fisher exact test, p-value <= 0.001
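The per-state significance marks come from a Fisher exact test. A minimal one-sided version can be written with the hypergeometric distribution, as sketched below. The 2x2 counts are hypothetical (the slides do not give them), and whether the original analysis was one-sided or two-sided is not stated, so the one-sided form here is an assumption.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher exact P-value for enrichment in the table [[a, b], [c, d]].

    Rows: AD-associated probes vs. other probes.
    Columns: probes in the chromatin state vs. probes outside it.
    Returns P(X >= a) under the hypergeometric null with fixed margins.
    """
    row1 = a + b
    col1 = a + c
    n = a + b + c + d
    upper = min(row1, col1)
    numerator = sum(comb(row1, x) * comb(n - row1, col1 - x) for x in range(a, upper + 1))
    return numerator / comb(n, col1)


# Hypothetical counts: 30 of 100 associated probes fall in enhancers,
# versus 100 of 1000 remaining probes.
p_enhancer = fisher_exact_greater(30, 70, 100, 900)
print(f"one-sided P = {p_enhancer:.2e}, significant at 0.001: {p_enhancer <= 0.001}")
```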
Estimating number of functionally-associated probes
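[Figure: number of functionally-associated probes by chromatin state (active TSS flanking, active enhancer, poised promoter, Polycomb repressed, weak enhancer, promoter, strong transcription, weak transcription) compared with expected; label: 10,000]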
• Functional enrichments found for 10,000 probes
Predictive power of hyper-methylation signal
[Figure: "Enhancer Probes: Risk"; relative risk (0.0 to 2.0) by quintile (1st to 5th Meth Quintile) of the sum of methylation signal in 1,026 regulatory regions, shown alongside APOE (e4/e3); labels: "Meth 5th Q relative to 1st Q", "Significant"]
• Sum total methylation levels across 1026 probes
• Individuals in top quintile show 2.5-fold higher risk
• By comparison, the APOE4 allele confers 1.5-fold
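The quintile comparison can be sketched as below: sum methylation per individual, sort, and compare the fraction with AD in the top fifth against the bottom fifth. The 50 individuals and their case labels are constructed toy data chosen so the ratio comes out at 2.5, mirroring the slide; they are not study data, and the function name is hypothetical.

```python
def quintile_relative_risk(methylation_sums, has_ad):
    """Risk of AD in the top methylation quintile relative to the bottom quintile."""
    paired = sorted(zip(methylation_sums, has_ad))
    size = len(paired) // 5
    bottom = paired[:size]
    top = paired[-size:]
    risk_bottom = sum(status for _, status in bottom) / size
    risk_top = sum(status for _, status in top) / size
    if risk_bottom == 0:
        raise ValueError("no cases in the bottom quintile; relative risk undefined")
    return risk_top / risk_bottom


# Toy cohort of 50 individuals, already ordered by summed methylation.
methylation_sums = list(range(50))
bottom_quintile = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]  # 2 of 10 with AD
middle_quintiles = [1, 0, 0] * 10                 # 30 individuals
top_quintile = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]     # 5 of 10 with AD
has_ad = bottom_quintile + middle_quintiles + top_quintile

relative_risk = quintile_relative_risk(methylation_sums, has_ad)
print(f"top vs. bottom quintile relative risk = {relative_risk:.2f}")
```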
AD-associated probes enriched in ELK1/NRSF targets
ELK1
NRSF
CTCF
All probes, ranked by AD assoc. P-value
• Regulatory motifs enriched in top-scoring probes
• Genomic basis for association, potential cis or trans effect
• Reveals biological pathways involved and potential targets
Systems-level views of disease epigenomics
• Chromatin states help interpret disease associations
– Annotate dynamic regulatory elements in multiple cell types
– Activity-based linking of regulators → enhancers → targets
– Mechanistic predictions, 2000+ T1D-associated enhancers
• Global methylation changes in Alzheimer’s Disease
– Little variability between individuals, genotype-driven
– Most variable regions: promoter-flanking, brain enhancers
– Predictive power for AD: Global inhibition of 7000 probes
– Enhancers, not promoters. Targets of NRSF, ELK1, CTCF
• Conclusions:
– Power of regulatory annotation for interpreting disease
– 1000s of regions functionally associated with disease
– Weak associations, concentrated in regulatory pathways
Interpreting disease-association signals
[Figure labels: Genotype (CATGACTG / CATGCCTG), GWAS, Disease, Epigenomic changes, Regulatory Annotation]
Collaborators and Acknowledgements
• Chromatin state dynamics
– Brad Bernstein, ENCODE consortium
• Methylation in Alzheimer’s disease
– Philip deJager, Brad Bernstein, David Bennett
– Religious Order Study, Memory and Aging Project
• Large-scale epigenomic datasets
– Epigenomics Roadmap, ENCODE project, NHGRI
• Funding
– NHGRI, NIH, NSF, Sloan Foundation
MIT Computational Biology group
Compbio.mit.edu
Mike Lin, Ben Holmes, Angela Yen, Matt Eaton, Manolis Kellis, Soheil Feizi, Bob Altshuler, Luke Ward, Jason Ernst, Jessica Wu, Irwin Jungreis, Daniel Marbach, Louisa DiStefano, Sushmita Roy, Chris Bristow, Mukul Bansal, Stefan Washietl, Pouya Kheradpour, Rachel Sealfon, Dave Hendrix, Loyal Goff