Regulatory variation and eQTLs

6.047/6.878/HST.507
Computational Biology: Genomes, Networks, Evolution
Lecture 16
Regulatory variation and eQTLs
Chris Cotsapas
Module 4: Population / Evolution / Phylogeny
• L15/16: Association mapping for disease and molecular traits
– Statistical genetics: disease mapping in populations (Mark Daly)
– Quantitative traits and molecular variation: eQTLs, cQTLs
• L17/18: Phylogenetics / Phylogenomics
– Phylogenetics: Evolutionary models, Tree building, Phylo inference
– Phylogenomics: gene/species trees, coalescent models, populations
• L19/20: Human history, Missing heritability
– Measuring natural selection in human populations
– The missing heritability in genome-wide associations
• And done! Last pset Nov 11 (no lab), In-class quiz on Nov 20
– No lab 4! Then entire focus shifts to projects, Thanksgiving, Frontiers
Today: Regulatory variation and eQTLs
1. Quantitative Trait Loci (QTLs), Regulatory Variation
– Molecular phenotypes as QTs: expression, chromatin…
– Discretization: a GWAS for each gene. Cis-/Trans-eQTLs
– Underlying regulatory variation: eQTLs, GWAS, cis-eQTL
2. Finding trans-eQTLs (distal from gene that varies)
– Challenges: Power, structure, sample size
– Cross-phenotype analysis: trans QTLs affect many genes
3. Identifying underlying regulatory mechanisms
– Cis-eQTLs: TSS-distance, cell type specificity
– eQTLs vs. GWAS: Expression as intermediate trait
4. Population differences, emerging efforts
– Shared associations, SNP-gene pairs, allelic direction
– Confound: environment, preparation, batch, ancestry
Quantitative traits
- weight, height
- anything measurable
- today: gene expression
QTLs (QT Loci)
- The loci that control quantitative traits
Regulatory variation
• What do trait-associated variants do?
• Genetic changes to:
– Coding sequence **
– Gene expression levels
– Splice isomer levels
– Methylation patterns
– Chromatin accessibility
– Transcription factor binding kinetics
– Cell signaling
– Protein-protein interactions
Regulatory
History, eQTL, mQTL, others
BASIC CONCEPTS
Within a population
• Damerval et al 1994
• 42/72 protein levels differ in maize
• 2D electrophoresis, eyeball spot quantitation
• Problems:
– genome coverage
– quantitation
– post-translational modifications
• Solution: use expression levels instead!
Usual mapping tools available
• Discretization approach
Whole-genome eQTL analysis is an independent GWAS for expression of each gene:
gene 1, gene 2, gene 3, gene 4, gene 5, …, gene N
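The "one GWAS per gene" idea can be sketched as a loop over genes, with an association score computed for every SNP inside the loop. This is a minimal illustration under stated assumptions, not a method from the lecture: genotypes are coded as allele dosages (0/1/2), the score is a plain least-squares slope with Pearson r, and the names `slope_and_r`, `eqtl_scan`, and all data values are hypothetical.

```python
from statistics import mean

def slope_and_r(dosages, expression):
    """Least-squares slope and Pearson r of expression on genotype dosage."""
    mx, my = mean(dosages), mean(expression)
    sxx = sum((x - mx) ** 2 for x in dosages)
    syy = sum((y - my) ** 2 for y in expression)
    sxy = sum((x - mx) * (y - my) for x, y in zip(dosages, expression))
    if sxx == 0 or syy == 0:
        return 0.0, 0.0  # monomorphic SNP or constant expression: no signal
    return sxy / sxx, sxy / (sxx * syy) ** 0.5

def eqtl_scan(genotypes, expression_by_gene):
    """Run one association scan per gene; keep the strongest SNP for each."""
    best = {}
    for gene, expr in expression_by_gene.items():
        scored = {snp: slope_and_r(d, expr) for snp, d in genotypes.items()}
        top = max(scored, key=lambda s: abs(scored[s][1]))
        best[gene] = (top, *scored[top])  # (SNP name, slope, r)
    return best

# Made-up toy data: 6 individuals, 2 SNPs, 2 genes.
genotypes = {"snpA": [0, 1, 2, 0, 1, 2], "snpB": [0, 0, 1, 1, 2, 2]}
expression_by_gene = {
    "gene1": [5.0, 6.1, 7.0, 5.2, 5.9, 7.1],
    "gene2": [3.0, 3.1, 4.0, 4.2, 5.1, 4.9],
}
hits = eqtl_scan(genotypes, expression_by_gene)
```

Each gene is treated as its own quantitative trait, so the number of tests grows as (number of SNPs) × (number of genes).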
Genetics of gene expression (eQTL)
• cis-eQTL
– The position of the eQTL maps near the physical position of the gene.
– Promoter polymorphism?
– Insertion/Deletion?
– Methylation, chromatin conformation?
• trans-eQTL
– The position of the eQTL does not map near the physical position of the gene.
– Regulator?
– Direct or indirect?
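The cis/trans distinction above is positional, so it can be sketched as a distance check. The slides do not say how near counts as "near"; the 1 Mb window below is an assumption chosen only for illustration, and `classify_eqtl` and the coordinates are hypothetical.

```python
CIS_WINDOW_BP = 1_000_000  # assumed window; studies choose different values

def classify_eqtl(snp_chrom, snp_pos, gene_chrom, gene_pos, window=CIS_WINDOW_BP):
    """Label an eQTL 'cis' if the SNP lies near the gene, otherwise 'trans'."""
    same_chrom = snp_chrom == gene_chrom
    if same_chrom and abs(snp_pos - gene_pos) <= window:
        return "cis"
    return "trans"

# Made-up coordinates.
near = classify_eqtl("chr1", 1_250_000, "chr1", 1_000_000)
far_same_chrom = classify_eqtl("chr1", 50_000_000, "chr1", 1_000_000)
other_chrom = classify_eqtl("chr7", 1_000_000, "chr1", 1_000_000)
```

A SNP on the same chromosome but far from the gene is labelled trans here, which matches the slide's definition by position only.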
Modified from Cheung and Spielman 2009 Nat Gen
yeast, mouse, maize, human
eQTL – THE ARRAY ERA
Yeast
• Brem et al Science 2002
• Linkage in 40 offspring of lab x wild strain cross
• 1528/6215 genes differentially expressed (DE) between parents
• 570 map in cross
– multiple QTLs
– 32% of 570 have cis linkage
• 262 not DE in parents also map
trans hotspots
Brem et al Science 2002
Yvert et al Nat Genet 2003
Mammals I
• F2 mice on atherogenic diet
• Expression arrays; WG linkage
Schadt et al Nature 2003
Mammals II
10% !!
Chesler et al Nat Genet 2005
Mammals III
• No major trans loci in humans
– Cheung et al Nature 2003
– Monks et al AJHG 2004
– Stranger et al PLoS Genet 2005, Science 2007
Open question
WHERE ARE THE TRANS eQTLS?
Whole-genome eQTL analysis is an independent GWAS for expression of each gene:
gene 1, gene 2, gene 3, gene 4, gene 5, …, gene N
Issues with trans mapping
• Power
– Genome-wide significance is 5e-8
– Multiple testing on ~20K genes
– Sample sizes clearly inadequate
• Data structure
– Bias corrections deflate variance
– Non-normal distributions
• Sample sizes
– Far too small
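To see why power is the first issue listed, it helps to work through the arithmetic. The sketch below assumes a Bonferroni-style correction across genes, which is one possible choice and is not stated on the slide: the 5e-8 genome-wide threshold is divided by roughly 20,000 genes, giving about 2.5e-12 per test.

```python
GENOME_WIDE_ALPHA = 5e-8  # single-GWAS threshold quoted on the slide
N_GENES = 20_000          # "~20K genes", one scan per gene

# Bonferroni-style correction across genes (an assumed choice for illustration).
per_test_threshold = GENOME_WIDE_ALPHA / N_GENES
print(f"{per_test_threshold:.1e}")
```

With eQTL cohorts of a few hundred individuals, very few trans effects would be expected to reach a threshold this strict.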
But…
• Assume that trans eQTLs affect many genes…
• …and you can use cross-trait methods!
Association data
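Z = \begin{pmatrix}
Z_{1,1} & Z_{1,2} & \cdots & Z_{1,p} \\
Z_{2,1} & Z_{2,2} & \cdots & Z_{2,p} \\
\vdots & \vdots & \ddots & \vdots \\
Z_{s,1} & Z_{s,2} & \cdots & Z_{s,p}
\end{pmatrix}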
Cross-phenotype meta-analysis
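[Figure: three panels showing distributions of −log(p), labelled λ = 1, λ ≠ 1, λ ≠ 1]
S_{CPMA} \sim \frac{L(\text{data} \mid \lambda \neq 1)}{L(\text{data} \mid \lambda = 1)}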
Cotsapas et al, PLoS Genetics
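A rough sketch of the likelihood ratio above, under assumptions the slide does not spell out: if the p-values for one SNP across many traits are uniform under the null, then −ln(p) follows an exponential distribution with rate λ = 1, and a SNP that affects many traits shifts that rate away from 1. The function name `cpma_statistic` is hypothetical, and this is an illustration of the idea, not the authors' implementation.

```python
import math

def cpma_statistic(p_values):
    """Likelihood-ratio statistic for the exponential rate of -ln(p) vs. rate 1.

    Assumes every p-value is in (0, 1] and not all of them equal 1.
    """
    x = [-math.log(p) for p in p_values]
    n, total = len(x), sum(x)
    lam_hat = n / total  # maximum-likelihood estimate of the rate

    def loglik(lam):
        # Log-likelihood of i.i.d. exponential data with rate lam.
        return n * math.log(lam) - lam * total

    # Twice the log of L(data | rate = lam_hat) / L(data | rate = 1).
    return 2.0 * (loglik(lam_hat) - loglik(1.0))

# Made-up inputs: 10 traits with identical p-values.
null_like = cpma_statistic([math.exp(-1)] * 10)  # mean of -ln(p) is 1
shifted = cpma_statistic([math.exp(-2)] * 10)    # mean of -ln(p) is 2
```

Under the null this kind of statistic is usually compared with a chi-square distribution with one degree of freedom, since only the rate is estimated.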
CPMA detects trans mixtures
[Figure: "CPMA: ability to detect trans eQTLs". y-axis: CPMA (chisq df=1), 0 to 40; x-axis: N/1000 traits affected by a trans eQTL at p ~ 0.001, 0 to 100]
Open research questions
• Do trans effects exist?
– Yes – heritability estimates suggest so.
– Can we detect them?
• Larger cohorts?
– Most eQTL studies ~50-500 individuals
– See later, GTEx Project
• Better methods?
– Collapsing data?
– PCA, summary statistics, modeling?
CAN WE LEARN REGULATORY VARIATION FROM eQTL?
First, let’s define the question
• Can we use genetic perturbations as a way to understand how genes are regulated?
• In what groups, in which tissues?
• To what stimuli/signaling events?
• Do cis eQTLs perturb promoter elements?
• Do trans perturb TFs? Signaling cascades?
Significant associations are symmetrically distributed around TSS
Most significant SNP per gene
0.001 permutation threshold
Stranger et al., PLoS Gen 2012
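The "most significant SNP per gene" and "permutation threshold" labels suggest a per-gene permutation test. The sketch below shows one common way to do this, assuming the best absolute correlation across SNPs is the test statistic and expression values are shuffled across individuals. It is not taken from the cited paper, and `abs_corr`, `gene_permutation_p`, and the data are hypothetical.

```python
import random
from statistics import mean

def abs_corr(x, y):
    """Absolute Pearson correlation; 0.0 if either input is constant."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return abs(sxy) / (sxx * syy) ** 0.5

def gene_permutation_p(snp_dosages, expression, n_perm=200, seed=0):
    """Permutation p-value for the best SNP of one gene."""
    rng = random.Random(seed)
    observed = max(abs_corr(d, expression) for d in snp_dosages)
    shuffled = list(expression)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break the genotype-expression link
        if max(abs_corr(d, shuffled) for d in snp_dosages) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

# Made-up toy data: 12 individuals, 2 SNPs, expression tracks the first SNP.
snps = [[0, 1, 2] * 4, [0, 0, 1, 1, 2, 2] * 2]
expr = [1.0, 2.0, 3.0, 1.1, 2.1, 3.1, 0.9, 1.9, 2.9, 1.0, 2.0, 3.0]
p_gene = gene_permutation_p(snps, expr)
```

Taking the maximum over SNPs inside each permutation accounts for testing many SNPs per gene. With this estimator, reaching a 0.001 threshold needs at least 999 permutations.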
69-80% of cis associations are cell type-specific
[Figure: cell type-specific and cell type-shared gene associations (0.001 permutation threshold); x-axis: no. of cell types with gene association]
• cis association sharing increases slightly when significance thresholds are relaxed
• Cell type specificity verified experimentally for subset of eQTLs
Dimas et al Science 2009
Slide courtesy Antigone Dimas
Open research questions
• Do cis eQTLs perturb functional elements?
– Given each is independent, how can we know?
• Do tissue-specific effects correlate with the expression of a gene across tissues? Or a regulator?
– Perhaps a gene is expressed, but in response to different regulators across tissues?
• If we ever find trans eQTLs…
– Common regulators of coregulated genes?
– Tissue specificity?
– Mechanisms?
Candidate genes, perturbations underlying organismal phenotypes
APPLICATION TO GWAS
eQTLs as intermediate traits
Schadt et al Nat Genet 2005
Exploring eQTLs in the relevant cell type is important for disease association studies
relevant cell type for disease
cell type not relevant for disease
Importance of cataloguing regulatory variation in multiple cell types
Slide courtesy Antigone Dimas
Modified from Nica and Dermitzakis Hum Mol Genet 2008
[Figure: MS and CD association signals on Chr 1 (199.0 to 199.5 Mb); axes: −log10 (p), 0 to 8, and Spearman's r (f = 10; d = 5), −1 to 1]
Barrett et al 2008
de Jager et al 2007
Franke et al 2010
Anderson et al 2011
POPULATION DIFFERENCES
Shared association in 8 HapMap populations
APOH: apolipoprotein H
Stranger et al., PLoS Gen 2012
Number of genes with cis-eQTL associations
8 extended HapMap populations, Spearman Rank Correlation (SRC)
Values in parentheses are fractions of the non-redundant count.

SRC permutation threshold:   0.01               0.001              (threshold not given)
Population                   # genes   FDR      # genes   FDR      # genes
CEU                          2869      0.06     657       0.03     313
CHB                          2832      0.06     774       0.02     378
GIH                          2959      0.06     698       0.03     300
JPT                          2900      0.06     795       0.02     386
LWK                          3818      0.05     773       0.02     311
MEX                          2609      0.07     472       0.04     165
MKK                          4222      0.04     947       0.02     411
YRI                          3961      0.05     799       0.02     328
non-redundant                12494              3130               1132
>2 pops                      6889 (0.55)        1074 (0.34)        547
8 pops                       151 (0.01)         63 (0.02)          28
Stranger et al., PLoS Gen 2012
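The associations on this slide are scored with Spearman rank correlation, which is a Pearson correlation computed on ranks. Genotype dosages have many ties, so the sketch below assigns average ranks to tied values. The helper names and data are hypothetical, and this is a plain-Python illustration, not the analysis code from the study.

```python
from statistics import mean

def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of positions i..j, 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / (sxx * syy) ** 0.5

# Made-up dosages and expression values for 6 individuals.
rho = spearman([0, 0, 1, 1, 2, 2], [1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
```

Because only ranks are used, the score is insensitive to outlying expression values, which is one reason rank-based tests are common for expression data.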
Direction of allelic effect
same SNP-gene combination across populations
[Figure: log2 expression of THAP5 (axis 6.50 to 8.00) by rs40915 genotype (CC, CT, TT) in Population 1 and Population 2; panel pairs labelled AGREEMENT and OPPOSITE]
Stranger et al., PLoS Gen 2012
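Checking the direction of allelic effect for the same SNP-gene pair in two populations amounts to comparing the sign of the expression-on-genotype slope. A minimal sketch, assuming genotypes CC/CT/TT are coded as 0/1/2 copies of the T allele; the function names and all expression values are invented for illustration.

```python
from statistics import mean

def allelic_slope(dosages, log2_expr):
    """Least-squares slope of log2 expression per copy of the counted allele."""
    mx, my = mean(dosages), mean(log2_expr)
    sxx = sum((x - mx) ** 2 for x in dosages)
    if sxx == 0:
        return 0.0  # monomorphic in this population
    sxy = sum((x - mx) * (y - my) for x, y in zip(dosages, log2_expr))
    return sxy / sxx

def effect_direction(pop1, pop2):
    """Compare slope signs for one SNP-gene pair in two populations."""
    s1, s2 = allelic_slope(*pop1), allelic_slope(*pop2)
    if s1 == 0 or s2 == 0:
        return "undetermined"
    return "agreement" if (s1 > 0) == (s2 > 0) else "opposite"

# Made-up (dosage, log2 expression) data for three hypothetical populations.
pop1 = ([0, 0, 1, 1, 2, 2], [6.6, 6.8, 7.2, 7.3, 7.8, 7.9])
pop2_same = ([0, 1, 1, 2, 2, 2], [6.7, 7.1, 7.2, 7.6, 7.8, 7.7])
pop2_flip = ([0, 0, 1, 1, 2, 2], [7.9, 7.7, 7.3, 7.2, 6.7, 6.6])

same = effect_direction(pop1, pop2_same)
flipped = effect_direction(pop1, pop2_flip)
```

The comparison only makes sense if the same allele is counted in both populations, so allele coding should be harmonized before slopes are compared.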
Slide courtesy Alkes Price
Population differences could have non-genetic basis
• Differences due to environment? (Idaghdour et al. 2008)
• Differences in cell line preparation? (Stranger et al. 2007)
• Differences due to batch effects? (Akey et al. 2007)
(Reviewed in Gilad et al. 2008)
Slide courtesy Alkes Price
Gene expression experiment
Does gene expression in 60 CEU + 60 YRI vary with ancestry?
Does gene expression in 89 AA vary with % Eur ancestry?
60 CEU + 60 YRI from HapMap, 89 AA from Coriell HD100AA
Gene expression measurements at 4,197 genes obtained using Affymetrix Focus array
Slide courtesy Alkes Price
Gene expression differences in African Americans validate CEU-YRI differences
12% ± 3% in cis
c = 0.43 (± 0.02)
(P-value < 10^-25)
Slide courtesy Alkes Price
RNAseq, GTEx
EMERGING EFFORTS
RNAseq questions
• Standard eQTLs
– Montgomery et al, Pickrell et al Nature 2010
• Isoform eQTLs
– Depth of sequence!
• Long genes are preferentially sequenced
• Abundant genes/isoforms ditto
• Power!?
• Mapping biases due to SNPs
Strategies for transcript assembly
Garber et al. Nat Methods 8:469 (2011)
GTEx – Genotype-Tissue EXpression
An NIH common fund project
Current: 35 tissues from 50 donors
Scale up: 20K tissues from 900 donors.
Novel methods groups: 5 current + RFA
RNAseq combined with other techs
• Regulons: TF gene sets via ChIP-seq
– Look for trans effects
• Open chromatin states (DNase I; methylation)
– Find active genes
– Changes in epigenetic marks correlated to RNA
– Genetic effects
• RNA/DNA comparisons
– Simultaneous SNP detection/genotyping
– RNA editing ???