Gene hunting from a Bayesian viewpoint Tuan V. Nguyen Bone and Mineral Research Program Garvan Institute of Medical Research Sydney, Australia.


Gene hunting from a Bayesian viewpoint
Tuan V. Nguyen
Bone and Mineral Research Program
Garvan Institute of Medical Research
Sydney, Australia
Is the gene search justified?
• Exploration of disease pathway
• Public health implications
• Pharmacological applications
• Treatment?
[Figure: prevalence (%) and 10-year risk of fracture (Fx) by femoral neck BMD]
Complex traits
BMD: genetics and environments
Variation in complex trait = G + E + GxE
(G=genetics; E=environment, x=interaction)
[Figure: two scatter plots of twin 2 against twin 1. MZ twins: r = 0.73; DZ twins: r = 0.47]
Genetics, BMD and fracture
Fracture
BMD
Does familial risk of BMD affect fracture risk?
                  Intraclass correlation in BMD
RR of BMD/Fx      r = 0.8      r = 0.9
_________________________________________________
5                 1.14         1.16
6                 1.17         1.20
7                 1.21         1.24
8                 1.24         1.28
_________________________________________________
Genes that affect BMD explain only a small part of the variation in fracture (Fx) risk
Fracture risk: genetics and environments
Twin study
[Figure: two bar charts by zygosity (DZ, MZ), one showing pairwise concordance and one showing relative risk]
P Kannus et al, BMJ 1999; 319:1334-7
Current strategies
• Linkage analysis
• Genome-wide screen
• Association analysis
• “Candidate gene”
Linkage analysis – identical by descent (ibd)
Parents AB x AC; siblings AB, AC: IBD = 0
Parents AB x CD; siblings AC, AD: IBD = 1
Parents AB x CD; siblings BC, BC: IBD = 2
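The counting behind IBD can be sketched in a few lines of Python. This is only an illustration: the function name `ibd_shared` is hypothetical, and the shortcut assumes a fully informative mating (parents AB and CD, four distinct alleles), so that two alleles with the same letter are also copies of the same parental allele. When the parents share an allele letter, the same count gives identity by state rather than by descent.

```python
def ibd_shared(sib1: str, sib2: str) -> int:
    """Count the alleles two siblings share identical by descent (0, 1 or 2).

    Assumption: the parents are AB x CD (four distinct alleles), so every
    allele letter identifies exactly one parental chromosome.
    """
    return len(set(sib1) & set(sib2))


# Sibling pairs from an AB x CD mating
print(ibd_shared("AC", "BD"))  # no allele in common
print(ibd_shared("AC", "AD"))  # A in common
print(ibd_shared("BC", "BC"))  # both alleles in common
```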
Linkage analysis: basic model
[Figure: squared difference in BMD among siblings (y-axis) plotted against number of alleles shared IBD (0, 1, 2; x-axis)]
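The basic model regresses the squared sib-pair difference in BMD on the number of alleles shared IBD; a negative slope is the signal for linkage. This approach is commonly called Haseman-Elston regression, a name the slide itself does not use. The sketch below fits the line by ordinary least squares. The sib-pair values are made up purely for illustration, and the helper name `slope_intercept` is hypothetical.

```python
def slope_intercept(x, y):
    """Ordinary least-squares slope and intercept of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical sib pairs: (alleles shared IBD, BMD of sib 1, BMD of sib 2)
pairs = [
    (0, 0.80, 1.10), (0, 0.95, 0.70),
    (1, 0.85, 1.00), (1, 0.90, 0.78),
    (2, 0.88, 0.92), (2, 1.01, 0.97),
]
ibd = [p[0] for p in pairs]
squared_diff = [(p[1] - p[2]) ** 2 for p in pairs]

slope, intercept = slope_intercept(ibd, squared_diff)
print(f"slope = {slope:.4f}, intercept = {intercept:.4f}")
```

With these illustrative numbers the slope is negative: pairs sharing more alleles IBD are more alike in BMD.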
Population-based association analysis
Fracture: AC AB AC BC AA AB BB AA AC AB
Controls: AC BB BC BC CC AB BB CC BB BC
Family-based association analysis
[Diagram: pedigree with genotypes AB, AA, AB, AB, AC, BC, BC, AA, AB]
Genome-wide vs candidate gene approach
Genome-wide screen                  Candidate gene analysis
Complex                             Simple
No prior knowledge of mechanism     Prior knowledge of mechanism
Expensive                           Inexpensive
No specific genes                   Specific genes
Linkage vs association phenomena
                            Linkage     Association
Magnitude of “effect”       No          Yes
Transmission                Yes         No/Yes
Study design complexity     Complex     Simple
Power                       Low         High
False +ve                   High        High
Test statistic
Test statistic = signal / noise
               = effect size / random error
Result: significant (+ve) or not significant (-ve)
Criterion: P-value
P < 0.05 → +ve
P > 0.05 → -ve
Consider an example
Genotype    Fracture    No fx
BB          300         300
Bb          600         650
bb          100         50
OR (bb vs. other genotypes) = (0.1 / 0.9) / (0.05 / 0.95) = 2.11
ln(OR) = 0.75; SE(ln OR) = 0.18
P-value < 0.0001
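The arithmetic in this example can be checked with a short script. This is a sketch under stated assumptions: it compares the rarest genotype with the other two pooled (100 vs. 900 among fractures, 50 vs. 950 among non-fractures), uses the usual large-sample standard error of the log odds ratio, and takes a two-sided normal P-value. The function name `odds_ratio_test` is illustrative.

```python
import math


def odds_ratio_test(a, b, c, d):
    """Odds ratio and Wald z-test from a 2x2 table.

    a, b: cases with / without the genotype
    c, d: controls with / without the genotype
    """
    odds_ratio = (a / b) / (c / d)
    ln_or = math.log(odds_ratio)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of ln(OR)
    z = ln_or / se
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return odds_ratio, ln_or, se, z, p_two_sided


odds_ratio, ln_or, se, z, p = odds_ratio_test(100, 900, 50, 950)
print(f"OR = {odds_ratio:.2f}, ln(OR) = {ln_or:.2f}, SE = {se:.2f}, "
      f"z = {z:.2f}, P = {p:.1e}")
```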
Diagnostic analogy
Diagnosis                                Genetic research
Has cancer, test +ve: OK                 Association, significant: power
Has cancer, test –ve: ! (false -ve)      Association, NS
No cancer, test +ve: ! (false +ve)       No association, significant: P-value
No cancer, test –ve: OK                  No association, NS
The meaning of P-value
• P-value: probability of getting a significant
statistical test given that there is no association (or
no linkage)
• P-value = P(significant stat | H0 is true)
The logic of P-value
• If Tuan has hypertension, he is unlikely to have pheochromocytoma
• Tuan has pheochromocytoma
• Tuan is unlikely to have hypertension

• If there was truly no association, the observation is unlikely
• The observation occurred
• The no-association hypothesis is unlikely
P-value
“Thinking about P values seems quite counterintuitive at first, as you must use backwards,
awkward logic. Unless you are a lawyer or a
Talmudic scholar … you will probably find this
sort of reasoning uncomfortable” (Intuitive
Statistics)
What do we want to know?
Clinical
P(+ve | cancer), or
P(cancer | +ve) ?
Research
P(Significant test | Association), or
P(Association | Significant test) ?
Breast Cancer Screening
Prevalence = 1%; Sensitivity = 90%; Specificity = 91%
Population
  Cancer (n=10):       +ve N=9      -ve N=1
  No cancer (n=990):   +ve N=90     -ve N=900
P(Cancer | +ve result) = 9/(9+90) = 9%
Genetic association
Prior prob. of association = 0.05; Power = 90%; P-value threshold = 5%
1000 SNPs
  True (n=50):     +ve N=45     -ve N=5
  False (n=950):   +ve N=48     -ve N=902
P(True association | +ve result) = 45/(45+48) = 48%
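Both the screening example and the SNP example apply Bayes' theorem to a prior probability, a true positive rate and a false positive rate. The sketch below makes that explicit; the function name `positive_predictive_value` is illustrative. It works with exact proportions rather than rounded counts, so the SNP result comes out slightly above the 45/(45+48) figure.

```python
def positive_predictive_value(prior, sensitivity, false_positive_rate):
    """P(condition is true | positive result) by Bayes' theorem."""
    true_positive = prior * sensitivity
    false_positive = (1 - prior) * false_positive_rate
    return true_positive / (true_positive + false_positive)


# Breast cancer screening: prevalence 1%, sensitivity 90%, specificity 91%
ppv_screening = positive_predictive_value(0.01, 0.90, 1 - 0.91)

# Genetic association: prior 0.05, power 90%, significance threshold 5%
ppv_association = positive_predictive_value(0.05, 0.90, 0.05)

print(f"P(cancer | +ve) = {ppv_screening:.3f}")
print(f"P(true association | +ve) = {ppv_association:.3f}")
```

In both cases a positive result is far from conclusive: about 9% for screening and just under 50% for the SNP association.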
Problem with p-value
No. of patients given A and B    No. preferring A:B    % preferring A    Two-sided P-value
20                               15:5                  75.0              0.04
200                              115:85                57.5              0.04
2,000                            1046:954              52.3              0.04
2,000,000                        1001445:998555        50.07             0.04
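The pattern in this table can be reproduced approximately. The source does not say which test produced the P-values; the sketch below assumes a sign test of "no preference" (each patient prefers A with probability 0.5), evaluated with a continuity-corrected normal approximation, and it takes the second row as 115:85, which is what 57.5% of 200 implies. The function name `two_sided_p` is illustrative.

```python
import math


def two_sided_p(prefer_a, prefer_b):
    """Two-sided P-value for H0: P(prefer A) = 0.5.

    Uses the normal approximation to the binomial with a continuity
    correction; this is an assumed method, not one stated in the source.
    """
    n = prefer_a + prefer_b
    deviation = abs(prefer_a - n / 2)
    z = max(deviation - 0.5, 0.0) / math.sqrt(n / 4)
    return math.erfc(z / math.sqrt(2))


for a, b in [(15, 5), (115, 85), (1046, 954), (1001445, 998555)]:
    share = 100 * a / (a + b)
    print(f"{a}:{b}  {share:.2f}% prefer A  P = {two_sided_p(a, b):.3f}")
```

Every row gives a P-value close to 0.04, although the preference for A shrinks from 75% to 50.07%.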
P value is
• NOT the likelihood that findings are due to chance
• NOT the probability that the null hypothesis is
true given the data
• A p-value = 0.05 does not mean that there is a
95% chance that a real difference exists
• The lower the p-value, the stronger the evidence for
an effect
Bayes factor
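\[
BF = \frac{P(\text{data} \mid H_0)}{P(\text{data} \mid H_1)}
\]
\[
BF = \frac{\left(2\pi\sigma^2/n\right)^{-0.5} \exp\left\{-0.5\,(\bar{x}-\theta_0)^2 / (\sigma^2/n)\right\}}
          {\left(2\pi(\tau^2+\sigma^2/n)\right)^{-0.5} \exp\left\{-0.5\,(\bar{x}-\theta_0)^2 / (\tau^2+\sigma^2/n)\right\}}
   = \left(1+\frac{n\tau^2}{\sigma^2}\right)^{0.5} \exp\left\{-0.5\,z^2\left[1+\frac{\sigma^2}{n\tau^2}\right]^{-1}\right\}
\]
(The Greek symbols were lost in the transcript and are restored here by assumption: x̄ is the sample mean, θ0 its value under H0, σ²/n its sampling variance, τ² the prior variance under H1, and z = (x̄ − θ0)/(σ/√n).)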
Minimum Bayes factor
\[
BF_{\min} = z \exp\left(\frac{1 - z^2}{2}\right)
\]
Probability of the null hypothesis
\[
P(H_0 \mid \text{Data}) = \left[1 + \frac{1 - P(H_0)}{P(H_0)\,BF}\right]^{-1}
\]
Where P(H0) is the prior probability of the null hypothesis
Re-evaluation of some “positive” studies
Study    OR      95% CI        P-value    Z       MinBF    Min P(H0|data)
1        2.10    1.10, 3.90    0.021      2.30    0.27     0.21
2        1.50    1.10, 2.10    0.010      2.46    0.20     0.16
3        2.67    1.37, 6.02    0.009      2.60    0.15     0.13
4        2.59    1.23, 4.45    0.004      2.90    0.07     0.07
5        2.26    1.09, 4.69    0.026      2.19    0.33     0.25
6        2.60    1.40, 5.00    0.003      2.94    0.06     0.06
7        2.58    1.36, 4.91    0.004      2.89    0.07     0.07
8        2.79    1.02, 7.65    0.043      2.00    0.45     0.31
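The MinBF and Min P(H0|data) columns can be recomputed from the Z column. The sketch below uses the minimum Bayes factor z·exp((1 − z²)/2) and the posterior formula for the null hypothesis. The prior P(H0) = 0.5 is an assumption: it is not stated with the table, but it reproduces the tabulated values. The function names are illustrative.

```python
import math


def min_bayes_factor(z):
    """Minimum Bayes factor (H0 vs H1) for a z statistic: z*exp((1 - z^2)/2).

    For |z| <= 1 the bound is 1, i.e. no evidence against H0.
    """
    z = abs(z)
    return z * math.exp((1 - z * z) / 2) if z > 1 else 1.0


def posterior_null(bf, prior_null=0.5):
    """P(H0 | data) = [1 + (1 - P(H0)) / (P(H0) * BF)]^-1."""
    return 1 / (1 + (1 - prior_null) / (prior_null * bf))


# Z values of studies 1 to 8
for study, z in enumerate([2.30, 2.46, 2.60, 2.90, 2.19, 2.94, 2.89, 2.00], 1):
    bf = min_bayes_factor(z)
    print(f"Study {study}: Z = {z:.2f}, MinBF = {bf:.2f}, "
          f"min P(H0|data) = {posterior_null(bf):.2f}")
```

Even the most significant of these studies (P = 0.003) leaves a probability of about 0.06 for the null hypothesis under this prior.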
Bayes Factor for genetic association study
Let p be the allele frequency of the genetic marker; then
the BF can be shown to be:
[Equation not recoverable from the transcript; surviving terms: 1/BF, 4n^2 p^3 (1 - p)^3, (n - 3), F, exponent n/2]
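Bayes factor and posterior probability of an association
Let \pi_0 be the prior probability of a true association;
the posterior probability of the association is:
\[
P(\text{association} \mid \text{data}) = \frac{1}{1 + \dfrac{(1 - \pi_0)/\pi_0}{BF}}
\]
(Here the BF is taken in favour of the association, so a larger BF gives a larger posterior probability.)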
Bayes factor vs p-value – t/z test
[Figure: LODA, i.e. log10(Bayes factor), plotted against log10(P-value); lines from uppermost to lowermost are n = 10, 15, 50, 100, 500, 1000, 10000]
P-value and Bayes factor – LD test
[Figure: log10(Bayes factor) plotted against log10(p-value); lines from uppermost to lowermost are n = 10, 50, 100, 300, 500, 1000, 10000]
Bayes factor and p-value
P-value                                         Bayes factor
• Non-comparative                               • Comparative
• Observed + hypothetical data                  • Only observed data
• Evidence only negative                        • Evidence +ve or –ve
• Sensitive                                     • Insensitive
• No formal justification or interpretation     • Formal justification and interpretation
Summary
• The criterion of p < 0.05 is not an adequate measure
of a genetic association
• Bayes factor is potentially a relevant measure of
association
Distribution of sample sizes
[Figure: histogram of number of studies by sample size]
Ioannidis et al, Trends Mol Med 2003
Distribution of effect sizes
[Figure: histogram of number of studies by effect size (OR)]
Ioannidis et al, Trends Mol Med 2003
[Figure: correlation between the odds ratio in the first studies and in subsequent studies]
Ioannidis et al, Nat Genet 2001
[Figure: evolution of the strength of an association as more information is accumulated]
Ioannidis et al, Nat Genet 2001
Predictors of statistically significant discrepancies between the first and subsequent studies of the same genetic association

Predictor                                       Odds ratio, univariate analysis    Odds ratio, multivariate analysis
Total no. of studies (per association)          1.17 (1.03, 1.33)                  1.18 (1.02, 1.37)
Sample size of the first study                  0.42 (0.17, 0.98)                  0.44 (0.19, 0.99)
Single first study with clear genetic effect    9.33 (1.01, 86.3)                  NS
Ioannidis et al, Nat Genet 2001
Diagnosis and statistical reasoning
Diagnosis                                       Research
Prior probability of disease (prevalence)       Prior probability of research hypothesis
Positive test result (+ve)                      Statistical significance (S)
Sensitivity: P(+ve | diseased)                  Power (1 − β): P(S | association)
False positive rate: P(+ve | not diseased)      P-value: P(S | no association)
Positive predictive value: P(diseased | +ve)    Bayesian probability: P(association | S)
Risk factors for fracture
• Blonde hair
• Being tall
• Wearing trousers (women)
• High heels (women)
• Drinking coffee
• Drinking tea
• Coca cola
• High protein intake
Cancer risk
• Electric razors
• Broken arms (women)
• Fluorescent lights
• Allergies
• Breeding reindeer
• Being a waiter
• Owning a pet bird
• Being short
• Being tall
• Hot dogs
• Having a refrigerator!
Altman and Simon, JNCI 1992
“Half of what doctors know is wrong.
Unfortunately we don’t know which
half.”
Quoted from the Dean of Yale Medical School,
in “Medicine and Its Myths”, New York Times Magazine,
16/3/2003