Shaun Purcell Psychiatric & Neurodevelopmental Genetics


Gene-environment & gene-gene
interaction in association studies:
a methodologic introduction
Shaun Purcell
Psychiatric & Neurodevelopmental Genetics Unit
Center for Human Genetic Research
Massachusetts General Hospital
http://pngu.mgh.harvard.edu/~purcell
Finding disease-causing variation
…GGCGGTGTTCCGGGCCATCACCATTGCGGG
CCGGATCAACTGCCCTGTGTACATCACCAAG
GTCATGAGCAAGAGTGCAGCCGACATCATCG
CTCTGGCCAGGAAGAAAGGGCCCCTAGTTTT
TGGAGAGCCCATTGCCGCCAGCCTGGGGACC
GATGGCACCCATTACTGGAGCAAGAACTGGG
CCAAGGCTGCGGCGTTCGTGACTTCCCCTCC
CCTGAGCCCGGACCCTACCACGCCCGACTA…
chromosome 4 DNA sequence
SNP (single nucleotide polymorphism)
The Human Genome
Rare disease, major gene effect
Genotype           DD      Dd      dd
Risk of disease    0.001   0.001   0.95
Disease prevalence: ~1 in 1000
Individuals with dd are ~1000 times more likely to get disease
Frequency of d in controls: ~5%
Frequency of d in cases: ~96%
Common polygenic disease
Common disease, polygenic effects
Genotype           DD      Dd      dd
Risk of disease    0.01    0.012   0.0144
Disease prevalence: ~1 in 100
Each extra d allele increases risk by ~1.2 times
Frequency of d in controls: ~5%
Frequency of d in cases: ~6%
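The case allele frequency for the common polygenic disease follows from the genotype risks. The sketch below is a minimal illustration, assuming Hardy-Weinberg proportions and a population frequency of d of 0.05 (neither is stated on the slide; the function name is hypothetical):

```python
def case_allele_frequency(q, risks):
    """Frequency of allele d among cases, plus disease prevalence.

    q     -- population frequency of d (assumed; Hardy-Weinberg proportions)
    risks -- (risk_DD, risk_Dd, risk_dd), the genotype risks from the slide
    """
    p = 1.0 - q
    genotype_freqs = (p * p, 2 * p * q, q * q)  # DD, Dd, dd
    # Joint probability of carrying each genotype and being affected
    joint = [g * r for g, r in zip(genotype_freqs, risks)]
    prevalence = sum(joint)
    # Genotype frequencies among cases, by Bayes' rule
    Dd_cases = joint[1] / prevalence
    dd_cases = joint[2] / prevalence
    return dd_cases + 0.5 * Dd_cases, prevalence

freq_cases, prevalence = case_allele_frequency(0.05, (0.01, 0.012, 0.0144))
print(f"prevalence = {prevalence:.4f}, d in cases = {freq_cases:.3f}")
```

Under these assumptions the prevalence comes out near 1 in 100 and the frequency of d in cases near 6%, in line with the slide.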
[Diagram: Genotype, Environment and Phenotype, with paths labeled gene effect, environmental effect, gene-environment correlation and gene-environment interaction]
G x E interaction
The environment modifies the effect of a gene
A gene modifies the effect of an environment
[Diagram: two gene effects, related by linkage disequilibrium (LD) and by epistasis]
Gene × gene interaction
Epistasis: one gene modifies the effect of another
Classical definition of epistasis
[Figure: grid of genotypes at locus A (AA, Aa, aa) by locus B (BB, Bb, bb)]
The aa genotype masks the effect of the bb genotype
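Separate analysis
• locus A shows an association with the trait
• locus B appears unrelated
[Figure: trait by genotype at Marker A (AA, Aa, aa) and at Marker B (BB, Bb, bb)]
Joint analysis
• locus B modifies the effects of locus A
[Figure: trait by genotype at locus A (AA, Aa, aa) and locus B (BB, Bb, bb) jointly]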
Two locus genotypes
                 Locus B
                 BB      Bb      bb
Locus A    AA    AABB    AABb    AAbb
           Aa    AaBB    AaBb    Aabb
           aa    aaBB    aaBb    aabb
Epistasis & haplotypes
• Two-locus genotype
A/a B/b (AaBb)
A and B need not even be on same chromosome
• Haplotype
AB / ab
A and B on same chromosome; effect could appear as “interaction”
• cis versus trans effects
AB haplotype causes disease
  AB / ab: disease
  Ab / aB: no disease
A and B interact to cause disease
  AB / ab: disease
  Ab / aB: disease
Two locus genotypes
                 Locus B
                 BB       Bb       bb
Locus A    AA    fAABB    fAABb    fAAbb    fAA
           Aa    fAaBB    fAaBb    fAabb    fAa
           aa    faaBB    faaBb    faabb    faa
                 fBB      fBb      fbb      f
“Penetrance” = probability of developing disease given genotype
Common disease, polygenic effects
Genotype           DD      Dd      dd
Risk of disease    0.01    0.012   0.0144
Disease prevalence: ~1 in 100
Each extra d allele increases risk by ~1.2 times
Frequency of d in controls: ~5%
Frequency of d in cases: ~6%
Small single SNP effects might represent larger epistatic effects
Risk of developing disease
              BB      Bb      bb
AA            0.01    0.01    0.01
Aa            0.01    0.01    0.01
aa            0.01    0.01    0.20
(marginal)    0.01    0.01    0.012
Frequency a = b = 0.1
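The single-locus risks implied by this two-locus table can be checked by averaging over the other locus. A minimal sketch, assuming Hardy-Weinberg proportions and independence between the two loci (assumptions not stated on the slide; names are illustrative):

```python
# Two-locus penetrances from the slide: rows AA, Aa, aa; columns BB, Bb, bb
penetrance = [
    [0.01, 0.01, 0.01],
    [0.01, 0.01, 0.01],
    [0.01, 0.01, 0.20],
]

def hwe(q):
    """Genotype frequencies for 0, 1, 2 copies of the minor allele (assumes HWE)."""
    p = 1.0 - q
    return [p * p, 2 * p * q, q * q]

freq_a = hwe(0.1)  # frequency of allele a = 0.1, from the slide

# Marginal risk for each locus-B genotype, averaging over locus-A genotypes
marginal_B = [
    sum(freq_a[i] * penetrance[i][j] for i in range(3))
    for j in range(3)
]
print([round(r, 4) for r in marginal_B])
```

Only the bb column moves, and only from 0.01 to about 0.012, even though the aabb cell carries a twentyfold risk.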
Interaction may be a common feature
of genetic variation
• Brem et al (2005) Nature
– gene expression phenotypes in yeast
– two-stage approach to find pairs of loci
• 65% of these pairs showed significant interaction
• many secondary loci would be missed by standard
approaches though
Examples of interactions?
Risk                                    Environment              Outcome
phenylalanine hydroxylase deficiency    dietary phenylalanine    mental retardation
debrisoquine metabolism                 smoking                  lung cancer
fair skin                               sun exposure             skin cancer
Lewis blood group                       alcohol intake           coronary atherosclerosis
APOE genotype                           head injury              Alzheimer's disease
The rest of this talk…
• Statistical issues
• Study designs
• Examples
Population-based case/control
[Figure: genotypes (AA, AC, CC) of unrelated individuals]
Family-based transmission disequilibrium test (TDT)
[Figure: genotypes (AA, AC, CC) of family members]
Odds ratio: measure of association
        Case    Control
A       a       c
a       b       d
Odds of A in cases = a/b
Odds of A in controls = c/d
Odds ratio = (a/b)/(c/d) = ad / bc
E-
        Case    Control
A       80      80
a       20      20
Odds ratio = (80*20)/(80*20) = 1.00

E+
        Case    Control
A       60      80
a       40      20
Odds ratio = (60*20)/(80*40) = 0.375

Z = ( ln(OR_E-) - ln(OR_E+) ) / sqrt( V_E- + V_E+ )
V( ln(OR) ) = 1/a + 1/b + 1/c + 1/d
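The stratified odds ratios and the Z statistic for their difference can be computed directly from the counts on the slide. A minimal sketch (function and variable names are illustrative):

```python
import math

def odds_ratio(a, b, c, d):
    """OR = ad / bc, with a, b = A and a counts in cases; c, d = A and a counts in controls."""
    return (a * d) / (b * c)

def var_log_or(a, b, c, d):
    """Approximate variance of ln(OR): 1/a + 1/b + 1/c + 1/d."""
    return 1 / a + 1 / b + 1 / c + 1 / d

# Counts in the order: cases A, cases a, controls A, controls a
unexposed = (80, 20, 80, 20)  # E-
exposed = (60, 40, 80, 20)    # E+

or_unexposed = odds_ratio(*unexposed)
or_exposed = odds_ratio(*exposed)
z = (math.log(or_unexposed) - math.log(or_exposed)) / math.sqrt(
    var_log_or(*unexposed) + var_log_or(*exposed)
)
print(or_unexposed, or_exposed, round(z, 2))
```

With these counts Z comes out at roughly 2.05, just past the conventional two-sided 5% threshold of 1.96.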
Regression modeling of interaction
Y = b_X X + e
Y = b_X X + b_Z Z + b_I XZ + e
(b_I XZ is the interaction component)
Y = ( b_X + b_I Z ) X + b_Z Z + e
effect of X on Y is modified by Z
Y = b0 + b1 G + b2 E + b3 G×E
• Linear for continuous outcomes
• Logistic regression for yes/no outcomes
G = 0, 1, 2 copies of allele “A”
E = yes/no exposure (0/1) or continuous measure
[Figure: Y against gene dosage (0, 1, 2), with a separate line for E+]
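The rearranged model says the slope of Y on G changes with E: it is b1 when E = 0 and b1 + b3 when E = 1. A minimal sketch with noise-free data and hypothetical coefficients (all values and names are chosen for illustration only, and a real analysis would fit the full model with a regression package):

```python
def slope(xs, ys):
    """Least-squares slope of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

# Hypothetical coefficients for Y = b0 + b1*G + b2*E + b3*G*E
b0, b1, b2, b3 = 1.0, 0.5, 0.3, 0.4

# Noise-free data: every combination of G (0, 1, 2 copies) and E (0/1)
data = [(g, e, b0 + b1 * g + b2 * e + b3 * g * e)
        for g in (0, 1, 2) for e in (0, 1)]

def slope_given_e(e_value):
    """Slope of Y on G within one exposure stratum."""
    rows = [(g, y) for g, e, y in data if e == e_value]
    return slope([g for g, _ in rows], [y for _, y in rows])

slope_unexposed = slope_given_e(0)  # recovers b1
slope_exposed = slope_given_e(1)    # recovers b1 + b3
interaction = slope_exposed - slope_unexposed
print(slope_unexposed, slope_exposed, interaction)
```

The difference between the two stratum-specific slopes recovers b3, the interaction coefficient.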
The “Interactome”
Definitions of epistasis
Biological: individual-level phenomenon
Statistical: population-level phenomenon
[Figures: genotype grids, AA/Aa/aa by BB/Bb/bb]
Requires:
1) Variation between individuals
2) Effect on disease
Requires:
1) Correct statistical definition of effect
What do interactions mean?
• TEST MAIN EFFECT
– Null hypothesis straightforward
• TEST INTERACTION
– Null hypothesis is a mathematical model describing joint effects
        A-    A+
B-      1     a
B+      b     ?
Multiplicative risk ratios
        A-    A+    RR(A)
B-      1     a     a/1 = a
B+      b     ab    ab/b = a
Additive risk differences
        A-    A+       RD(A)
B-      1     a        a-1
B+      b     a+b-1    a+b-1-b = a-1
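Each null model holds a different measure of the effect of A constant across strata of B. A minimal sketch that checks this, using hypothetical single-factor relative risks a = 2 and b = 3 (values and names are illustrative):

```python
def effect_of_A(risks):
    """risks[(A, B)] = relative risk with A and B absent (0) or present (1).

    Returns {B stratum: (risk ratio for A, risk difference for A)}.
    """
    out = {}
    for b_status in (0, 1):
        rr = risks[(1, b_status)] / risks[(0, b_status)]
        rd = risks[(1, b_status)] - risks[(0, b_status)]
        out[b_status] = (rr, rd)
    return out

a, b = 2.0, 3.0  # hypothetical relative risks for A alone and B alone

multiplicative = {(0, 0): 1.0, (1, 0): a, (0, 1): b, (1, 1): a * b}
additive = {(0, 0): 1.0, (1, 0): a, (0, 1): b, (1, 1): a + b - 1}

mult_effects = effect_of_A(multiplicative)
add_effects = effect_of_A(additive)
print(mult_effects)  # RR(A) is the same in both B strata
print(add_effects)   # RD(A) is the same in both B strata
```

A table that is interaction-free under one model generally shows interaction under the other, which is why the choice of null model matters.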
“…we defined interaction as departure from a multiplicative model…”
• Multiplicative model (a×b)
– common, easy to implement, logistic regression
  • additive on log-odds scale
  • multiplicative on risk scale
• Other common models (on risk)
– additive (a + b)
– heterogeneity model (a + b – ab)
        A-    A+
B-      10    20
B+      20    30
LENGTH = A + B

        A-     A+
B-      100    400
B+      400    900
AREA = A + B + A×B
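The length and area tables hold the same measurements on two scales, and only one of them shows an interaction. A minimal sketch of the additive interaction contrast (the function name is illustrative):

```python
def interaction_contrast(table):
    """Additive interaction contrast y11 - y10 - y01 + y00; zero means purely additive."""
    return table[(1, 1)] - table[(1, 0)] - table[(0, 1)] + table[(0, 0)]

# Values from the slide, keyed by (A, B) with 0 = absent, 1 = present
length = {(0, 0): 10, (1, 0): 20, (0, 1): 20, (1, 1): 30}
# Area is the square of length, cell by cell
area = {key: value ** 2 for key, value in length.items()}

length_contrast = interaction_contrast(length)
area_contrast = interaction_contrast(area)
print(length_contrast, area_contrast)
```

The contrast is 0 for length and 200 for area, so the presence of a statistical interaction depends on the scale of measurement.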
[Figure: density plots of a trait on different scales (original, log-transform, cubic-transform, 7-point scale, censored; panels p0 to p4), with the corresponding G1, G2 and G1G2 effects]
OR(A) = 2, OR(B) = 2
Joint odds ratio: additive (3.00), multiplicative (4.00)
OR(A) = 1.2, OR(B) = 1.2
Joint odds ratio: additive (1.40), multiplicative (1.44)
[Figure: both cases plotted on an odds ratio axis from 1/3 to 5]
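The joint odds ratios quoted for the two null models can be reproduced in a few lines. A minimal sketch (the function name and model labels are illustrative):

```python
def joint_odds_ratio(or_a, or_b, model):
    """Expected odds ratio when both factors are present, assuming no interaction."""
    if model == "multiplicative":
        return or_a * or_b
    if model == "additive":
        return or_a + or_b - 1
    raise ValueError(f"unknown model: {model}")

results = {}
for single in (2.0, 1.2):
    add = joint_odds_ratio(single, single, "additive")
    mult = joint_odds_ratio(single, single, "multiplicative")
    results[single] = (add, mult)
    print(f"OR = {single}: additive {add:.2f}, multiplicative {mult:.2f}")
```

For odds ratios of 2 the two models predict 3.00 and 4.00, but for odds ratios of 1.2 they predict 1.40 and 1.44, which would be very hard to tell apart in practice.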
No controls (case-only design) vs. population-based controls vs. family-based controls
More robust, fewer assumptions vs. more efficient, powerful
Case-only design
• Detect interaction only, no main effects
Risk factors    Prevalence
G- E-           p0
G+ E-           pG
G- E+           pE
G+ E+           pGE = p0 ∙ pG/p0 ∙ pE/p0
Leads to
OR_INT = OR_GE / (OR_G ∙ OR_E)
It turns out,
OR_INT = OR_Case / OR_Control
where OR_Case is the association of G and E in cases
and OR_Control is the association of G and E in controls
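The two expressions for the interaction odds ratio agree for any set of counts. A minimal sketch using hypothetical counts (all numbers and names below are made up for illustration):

```python
def gxe_odds_ratio(n11, n10, n01, n00):
    """Odds ratio for the G-E association; arguments are counts of G+E+, G+E-, G-E+, G-E-."""
    return (n11 * n00) / (n10 * n01)

# Hypothetical counts in the order G+E+, G+E-, G-E+, G-E-
cases = (60, 30, 40, 70)
controls = (20, 40, 50, 90)

or_case = gxe_odds_ratio(*cases)
or_control = gxe_odds_ratio(*controls)
or_int = or_case / or_control

def or_vs_baseline(i):
    """Case/control odds ratio of exposure group i relative to the G-E- group."""
    return (cases[i] / controls[i]) / (cases[3] / controls[3])

or_ge, or_g, or_e = or_vs_baseline(0), or_vs_baseline(1), or_vs_baseline(2)
or_int_alt = or_ge / (or_g * or_e)
print(round(or_int, 3), round(or_int_alt, 3))
```

If G and E are independent in the population, the control odds ratio is expected to be 1, so the case-only odds ratio alone estimates the interaction; that independence is the assumption on which the case-only design rests.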
Case-only designs offer efficient detection of interaction
[Figure: % replicates significant at p=0.05 (0 to 100), under "No interaction" and "Interaction", for four designs: 100 cases, 100 controls; 200 cases, 200 controls; 200 cases only; 200 controls only]
Case-only design isn’t always valid
• Chromosomal proximity (Gene A, Gene B)
• Multiple ethnicities in case sample (Gene A, Gene B): stratification
Epistasis:
LD in cases ≠ LD in controls
Genes in 5q GABA cluster
[Figure: LD in controls and in cases (Scz)]
Pamela Sklar
Tracey Petryshen
C&M Pato
TDT requires independence assumption
If variants A and B are in LD (common haplotypes AB / ab)
[Diagram: parent-offspring trios genotyped at A (AA, Aa, aa), stratified for bb probands and for BB probands; transmission within each stratum appears as →100% or →0%]
→ false positive interactions (due to linkage or population stratification)
An “all pairs of SNPs” approach to
epistasis does not scale well
# SNPs     # pairs
5          10
10         45
50         1,225
100        4,950
500        124,750
500,000    124,999,750,000
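The pair counts in the table follow from n(n-1)/2. A minimal sketch (the function name is illustrative):

```python
def n_pairs(n_snps):
    """Number of unordered SNP pairs: n(n-1)/2."""
    return n_snps * (n_snps - 1) // 2

for n in (5, 10, 50, 100, 500, 500_000):
    print(f"{n:>7,} SNPs -> {n_pairs(n):>15,} pairs")
```

The count grows with the square of the number of SNPs, which is why an exhaustive pairwise scan becomes expensive at genome-wide scale.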
Multiple testing increases false positives
[Figure: P(at least 1 false positive) (0 to 1) against number of independent tests performed (0 to 50), for a per-test false positive rate of 0.05 and for a per-test false positive rate of 0.001 = 0.05/50]
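The probability of at least one false positive across independent tests is 1 - (1 - alpha)^n. A minimal sketch comparing the two per-test rates mentioned above (the function name is illustrative):

```python
def familywise_error(alpha, n_tests):
    """P(at least one false positive) across n independent tests at level alpha."""
    return 1 - (1 - alpha) ** n_tests

n = 50
uncorrected = familywise_error(0.05, n)     # per-test rate 0.05
bonferroni = familywise_error(0.05 / n, n)  # per-test rate 0.001 = 0.05/50
print(round(uncorrected, 3), round(bonferroni, 3))
```

At 50 tests the uncorrected rate is above 0.9, while dividing the per-test rate by the number of tests keeps it just under 0.05.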
Tests for interaction have low power
[Figure: statistical power (0 to 1) against increasing sample N, with curves for the standard association test and the epistasis test]
Dysbindin-1 (DTNBP1) &
schizophrenia
• DTNBP1 & 7 other genes encode proteins that
make up the BLOC1 protein complex
– biogenesis of lysosome-related organelles complex 1
• DTNBP1’s effect on Scz mediated via BLOC1?
– if so, an analysis including all 8 genes
might help to resolve inconsistent studies
Derek Morris
Aiden Corvin
Michael Gill
DTNBP1 association studies
[Figure: alleles reported as associated at each DTNBP1 SNP, shown against the exons, one row per study]
Studies: Straub et al. (2002); Schwab et al. (2003); Van den Oord et al. (2003); Van den Bogaert et al. (2003); Tang et al. (2003); Kirov et al. (2004); Williams et al. (2004); Funke et al. (2004); Numakawa et al. (2004); Li et al. (2005)
SNPs: rs2743852, P1583, P1795, P1792, P1578, P1763, P1320, P1757, P1765, rs2619550, P1325, rs2619542, P1635, P1655, rs3829893, P1287, rs734129, P1333, P1328, rs1047631, rs2619538
Types of interaction
[Figure: three panels comparing G+ and G-: direction of effect, presence of effect, magnitude of effect]
Duplicate gene action
Example: Kernel Color in Wheat
[Figure: grid of genotypes AA/Aa/aa by BB/Bb/bb]
Only 1 dominant allele required, either A or B
A_B_    Normal
A_bb    Normal
aaB_    Normal
aabb    No product
Complementary gene action
Example: Flower color in sweet pea
[Figure: grid of genotypes AA/Aa/aa by BB/Bb/bb]
One recessive genotype at either gene would increase disease risk,
i.e. genes A and B required
A_B_    Normal
A_bb    No product
aaB_    No product
aabb    No product
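Duplicate and complementary gene action differ only in whether a dominant allele is needed at either locus or at both. A minimal sketch that encodes the two rules and lists all nine two-locus genotypes (function names are illustrative):

```python
def has_dominant(genotype):
    """True if a two-letter genotype such as 'Aa' carries an upper-case (dominant) allele."""
    return any(allele.isupper() for allele in genotype)

def duplicate_gene_action(geno_a, geno_b):
    # A dominant allele at either locus is sufficient
    return "Normal" if has_dominant(geno_a) or has_dominant(geno_b) else "No product"

def complementary_gene_action(geno_a, geno_b):
    # A dominant allele is needed at both loci
    return "Normal" if has_dominant(geno_a) and has_dominant(geno_b) else "No product"

for geno_a in ("AA", "Aa", "aa"):
    for geno_b in ("BB", "Bb", "bb"):
        print(geno_a + geno_b,
              duplicate_gene_action(geno_a, geno_b),
              complementary_gene_action(geno_a, geno_b))
```

Under duplicate gene action only aabb gives no product; under complementary gene action only the A_B_ genotypes are normal.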
[Figure: genotype grids (AA/Aa/aa by BB/Bb/bb) for four models: complementary gene action, duplicate gene action, heterogeneity model, “checkerboard” model]
Negative feedback: a common biological mechanism
Negative feedback: simple model of dysregulation
[Diagram: genotype classes -/-, +/- and +/+]
Negative feedback: single marker analysis leads to the “opposite allele” problem
[Figure: single marker relative risk (0.2 to 2.6) against frequency of one locus (0.1 to 0.9; other locus fixed p=0.4), with curves labeled by genotype (-/-, +/-, +/+)]
Standard single SNP analyses
373 Irish schizophrenics, 812 controls
[Figure: -log10(p-value) (0 to 2.5, with the p=0.05 threshold marked) for SNPs in DTNBP1, MUTED, PLDN, SNAPAP, CNO, BLOC1S1, BLOC1S2, BLOC1S3]
Dysbindin-1 by itself shows no evidence of association with Scz
A single gene-based test
[Diagram: labels A to J crossed with labels 1 to 8 (A1 … A8, B1 … B8, …, J6, J7, J8)]
80 allele-based tests
Gene-based p = 0.0009
Correcting for multiple tests, p = 0.025
[Figure: odds ratio (0 to 2.5) for DTNBP1 by MUTED genotype; other labels shown: PLDN, SNAPAP, BLOC1S1, CNO, BLOC1S3, BLOC1S2]
An independent replication?
DTNBP1 × MUTED epistasis
(Straub et al. WCPG meeting Oct 2005.)
Known protein
interactions
in BLOC-1 complex
Methylenetetrahydrofolate reductase (MTHFR) polymorphisms
and serum folate interact to influence negative symptoms
and cognitive impairment in schizophrenia
Joshua Roffman, Donald Goff, et al
• Folic acid deficiency may
contribute to negative
symptoms and cognitive
impairment in schizophrenia
– underlying mechanism remains
uncertain
• A cohort of 159 outpatients
with schizophrenia
measured:
– negative symptoms
– frontal lobe deficits
[Figure: PANSS Negative Symptoms, WCST % Perseverative Errors and Verbal Fluency by MTHFR genotype (C/C & C/T vs. T/T), for low folate and high folate]
• Interaction of low serum folic acid and homozygosity for the MTHFR 677T allele confers risk.
• Patients homozygous for the MTHFR 677T allele may therefore benefit specifically from folic acid supplementation.
Further reading
• Cordell HJ (2002) Human Molecular Genetics 11: 2463-2468.
– a statistical review of epistasis, methods and definitions
• Clayton D & McKeigue P (2001) The Lancet, 358, 1357-60.
– a critical appraisal of GxE research
• Marchini J, Donnelly P & Cardon LR (2005) Nature Genetics,
37, 413-417
– epistasis in whole-genome association studies