
Bayesian inference in differential expression experiments
Sylvia Richardson, Natalia Bochkina, Alex Lewin
Centre for Biostatistics, Imperial College, London
Biological Atlas of Insulin Resistance, BBSRC
www.bgx.org.uk
Background
• Investigating changes of gene expression under different conditions is one of the key questions in many biological experiments
• Specificity of the context:
  – High dimensional data (tens of thousands of genes) and few samples, hence the need to borrow information
  – Many sources of variability
• Important to adopt a flexible modelling framework
Bayesian hierarchical modelling allows important features of the data to be captured while maintaining the generalisability of the tools and techniques developed
Modelling differential expression
[Flow diagram] For each of condition 1 and condition 2: start with given point estimates of expression, then a hierarchical model of replicate variability and array effect; the two branches feed a differential expression parameter, summarised either by its posterior distribution (flat prior) or by mixture modelling for classification.
Outline
• Background
• Bayesian hierarchical models for differential
expression experiments
• Decision rules based on tail posterior probabilities
– Comparison with existing approaches
– FDR estimation for tail posterior probabilities
• Extension of tail posterior probabilities to analysing
multiclass experiments
• Illustration
• Discussion and further work
I -- Bayesian hierarchical model for differential expression (Lewin et al., Biometrics, 2006)
Data: y_gcr = log gene expression for gene g, replicate r, condition c
α_g = gene effect
δ_g = differential effect for gene g between the 2 conditions
β_r(g)c = array effect, modelled as a smooth (spline) function of α_g
σ²_gc = gene specific variance
• 1st level:
y_g1r ∼ N(α_g - ½ δ_g + β_r(g)1 , σ²_g1)
y_g2r ∼ N(α_g + ½ δ_g + β_r(g)2 , σ²_g2)
with Σ_r β_r(g)c = 0 and β_r(g)c a function of α_g with parameters {c, d}
• 2nd level:
Exchangeable variances: σ²_gc ∼ g(a_c, b_c) (lognormal or inverse-gamma)
"Flat" priors for α_g, δ_g and {c, d}
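As a concrete illustration of the first level of this model, here is a minimal Python/NumPy sketch that simulates log-expression data for two conditions; the array effects β_r(g)c are omitted and all numerical values (number of genes, effect sizes, variance hyperparameters) are illustrative choices, not those of the paper.

import numpy as np

rng = np.random.default_rng(0)
G, m1, m2 = 1000, 3, 3                              # genes, replicates per condition

alpha = rng.normal(7.0, 2.0, size=G)                # gene effects (flat prior in the model)
delta = np.where(rng.random(G) < 0.1,               # ~10% of genes differentially expressed
                 rng.normal(0.0, 1.0, G), 0.0)
sigma2 = rng.lognormal(-2.0, 0.8, size=(G, 2))      # exchangeable gene-specific variances

# first-level model: y_gcr ~ N(alpha_g -/+ delta_g/2, sigma^2_gc), array effects omitted
y1 = rng.normal(alpha[:, None] - delta[:, None] / 2, np.sqrt(sigma2[:, [0]]), (G, m1))
y2 = rng.normal(alpha[:, None] + delta[:, None] / 2, np.sqrt(sigma2[:, [1]]), (G, m2))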
Joint modelling of array effects and differential expression
• Performs normalisation simultaneously with estimation
  – Gives fewer false positives than the plug-in approach
• The BHM set-up allows some of the modelling assumptions to be checked using mixed posterior predictive checks:
  – the need for gene specific variances
  – their 2nd level distribution
We found that a lognormal or a 2-parameter inverse gamma distribution for the variances gave similar model checks
Selecting genes that are differentially expressed
• Interested in testing the null hypothesis H0(g): δ_g = 0 versus H1(g): δ_g ≠ 0
Two broad approaches have been used:
• P-value type (e.g. the moderated t statistic): the p-value is U[0, 1] under H0 and close to 0 under H1. References: Baldi and Long, Smyth 2004, ...
• Mixture: classify on P(H0 | y_gcr), which is close to 1 under H0 and close to 0 under H1. References: Lonnstedt and Speed 02, Newton and Kendziorski 01, 03, Lonnstedt and Britton 05, Gottardo 06, ...
Bayesian mixtures
• Rely on the specification of a prior model for δ_g:
δ_g ∼ π_0 δ_0 + (1 - π_0) h(δ_g | η)
• Choice of model for the alternative (see the poster by Alex Lewin)
  – Could influence the performance of the classification
  – Checking how well the alternative fits the data is non standard
Here we investigate the properties of Bayesian selection rules based on a "non informative" prior for δ_g
II -- Bayesian selection rules for pairwise comparisons
• 1st level (no array effect):
y_g1r | α_g, δ_g, σ²_g1 ∼ N(α_g - δ_g/2, σ²_g1),  r = 1, ..., m1
y_g2r | α_g, δ_g, σ²_g2 ∼ N(α_g + δ_g/2, σ²_g2),  r = 1, ..., m2
• Hierarchical model:
α_g, δ_g ∼ 1,  σ²_gs ∼ f(σ²_gs | θ_s),  θ_s ∼ f(θ_s)
Idea: extend the p-value approach by considering tail probabilities of an appropriate function of the parameters
Posterior distributions
Let ȳ_gs = (1/m_s) Σ_{r=1..m_s} y_gsr, where m_s is the number of replicates in condition s, and ȳ_g = ȳ_g2 - ȳ_g1. Let s²_gs = (1/(m_s - 1)) Σ_{r=1..m_s} (y_gsr - ȳ_gs)², s = 1, 2.
Let w²_g = σ²_g1/m1 + σ²_g2/m2 be the variance of ȳ_g.
• Define the "Bayesian T statistic": t_g = δ_g / w_g
The following conditional distributions hold:
δ_g | ȳ_g, w_g ∼ N(ȳ_g, w²_g)
t_g | ȳ_g, w_g ∼ N(ȳ_g / w_g, 1)
• Posterior distributions:
f(δ_g | ȳ_g, s²_gs) = ∫_0^∞ (1/w_g) φ((δ_g - ȳ_g)/w_g) f(w²_g | s²_gs) d(w²_g)
f(t_g | ȳ_g, s²_gs) = ∫_0^∞ φ(t_g - ȳ_g/w_g) f(w²_g | s²_gs) d(w²_g)
where f(w²_g | s²_gs) denotes the posterior density of the variances
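This mixture-over-w²_g representation can be made operational by Monte Carlo: draw w²_g from its posterior, then δ_g and t_g from their conditional normals. The sketch below assumes a conjugate InvGamma(a, b) prior for each σ²_gs with flat priors on the means; the function name and hyperparameter values are illustrative, not the paper's.

import numpy as np

def posterior_samples(ybar_g, s2_g1, s2_g2, m1, m2, a=1.5, b=0.05, n=20000, rng=None):
    if rng is None:
        rng = np.random.default_rng(1)
    # sigma^2_gs | s^2_gs ~ InvGamma(a + (m_s - 1)/2, b + (m_s - 1) s^2_gs / 2)
    # (flat-prior means integrated out)
    sig2_1 = 1.0 / rng.gamma(a + (m1 - 1) / 2, 1.0 / (b + (m1 - 1) * s2_g1 / 2), n)
    sig2_2 = 1.0 / rng.gamma(a + (m2 - 1) / 2, 1.0 / (b + (m2 - 1) * s2_g2 / 2), n)
    w2 = sig2_1 / m1 + sig2_2 / m2              # variance of ybar_g
    delta = rng.normal(ybar_g, np.sqrt(w2))      # delta_g | ybar_g, w_g ~ N(ybar_g, w_g^2)
    t = delta / np.sqrt(w2)                      # Bayesian T statistic t_g = delta_g / w_g
    return delta, t, w2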
Tail posterior probabilities 1 (N. Bochkina and SR, 2006)
• Use selection rules of the form P{T_g > T_g^(α) | y_gsr} ≥ p_cut,
where T_g is a suitable function of the parameters of interest and T_g^(α) denotes the 1 - α/2 percentile of T_g obtained under the null hypothesis, i.e. we summarise the distribution of T_g by a tail area
• Which 'statistic' to choose: T_g = δ_g or t_g?
• How to define its percentiles?
  – suppose that we could have observed data with ȳ_g = 0 (its expected value under the null)
  – work out the percentiles using the posterior distributions conditional on ȳ_g = 0
Tail posterior probabilities 2
Recall f(δ_g | ȳ_g, s²_gs) = ∫_0^∞ (1/w_g) φ((δ_g - ȳ_g)/w_g) f(w²_g | s²_gs) d(w²_g).
The corresponding distribution function involves numerical integration, which is computationally expensive.
But t_g | ȳ_g = 0, w_g ∼ N(0, 1), so the distribution function of t_g | ȳ_g = 0, s²_gs does not involve gene specific parameters:
the percentile t_g^(α) = t^(α) = Φ^{-1}(1 - α/2) is easy to calculate.
Consider the tail probability: p(t_g, t^(α)) = P{|t_g| > t^(α) | y_gsr}
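Given posterior draws of w²_g (for example from the earlier posterior_samples sketch), the tail posterior probability can be estimated by averaging the conditional normal tail of t_g. A minimal sketch, with hypothetical names:

import numpy as np
from scipy.stats import norm

def tail_pp(ybar_g, w2_draws, alpha=0.05):
    t_alpha = norm.ppf(1 - alpha / 2)            # percentile computed under ybar_g = 0
    mean_t = ybar_g / np.sqrt(w2_draws)          # t_g | w_g ~ N(ybar_g / w_g, 1)
    # P{ |t_g| > t^(alpha) | w_g } averaged over the posterior draws of w_g^2
    p = norm.sf(t_alpha - mean_t) + norm.cdf(-t_alpha - mean_t)
    return p.mean()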
Density of p(t_g, t^(α)) for α = 0.05, for data generated under the null hypothesis
[Figure: density of the tail posterior probability under H0, with 5% and 0.5% tail areas marked]
• F0, the distribution of p(t_g, t^(α)) under H0, can be simulated or evaluated numerically
• Key point: F0 is gene independent (conjugate σ_g1 = σ_g2 case)
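One possible way to obtain F0 by simulation in the conjugate equal-variance case is sketched below: generate null data with δ_g = 0, compute the tail posterior probability for each simulated gene, and keep the empirical distribution. Hyperparameters, simulation sizes and the function name are illustrative assumptions.

import numpy as np
from scipy.stats import norm

def simulate_F0(m1, m2, a, b, alpha=0.05, n_genes=5000, n_mc=2000, rng=None):
    if rng is None:
        rng = np.random.default_rng(2)
    t_alpha = norm.ppf(1 - alpha / 2)
    pp = np.empty(n_genes)
    for i in range(n_genes):
        sigma2 = 1.0 / rng.gamma(a, 1.0 / b)              # common gene variance from the prior
        y1 = rng.normal(0.0, np.sqrt(sigma2), m1)         # null data: delta_g = 0
        y2 = rng.normal(0.0, np.sqrt(sigma2), m2)
        ybar = y2.mean() - y1.mean()
        ss_pool = (m1 - 1) * y1.var(ddof=1) + (m2 - 1) * y2.var(ddof=1)
        # posterior of the common variance (flat priors on the means)
        sig2_post = 1.0 / rng.gamma(a + (m1 + m2 - 2) / 2,
                                    1.0 / (b + ss_pool / 2), n_mc)
        w = np.sqrt(sig2_post * (1 / m1 + 1 / m2))
        # p(t_g, t^(alpha)) = E_w[ P(|N(ybar/w, 1)| > t^(alpha)) ]
        pp[i] = np.mean(norm.sf(t_alpha - ybar / w) + norm.cdf(-t_alpha - ybar / w))
    return np.sort(pp)                                     # empirical F0, reusable for all genes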
Another Bayesian rule
• A natural idea is to compare the parameter δ_g to 0, i.e. to consider p(δ_g, 0) = P{δ_g > 0 | y_gsr}, or its complement, or for the 2-sided alternative: max{p(δ_g, 0), 1 - p(δ_g, 0)}
• It turns out that this Bayesian selection rule behaves like a "p-value":
  – The distribution of p(δ_g, 0) is uniform under H0
  – There is an equivalence with frequentist testing based on the marginal distribution of ȳ_g under the null, in the spirit of the moderated t statistic introduced by Smyth 2004
Link between p(δ_g, 0) and the moderated t statistic
Suppose σ²_g1 = σ²_g2 = σ²_g with conjugate prior InvGamma(a, b). Then in
f(δ_g | ȳ_g, s²_gs) = ∫_0^∞ (1/w_g) φ((δ_g - ȳ_g)/w_g) f(w²_g | s²_gs) d(w²_g)
the distribution of w_g is conjugate, and
δ_g | y_gr, a, b ∼ t(2a + m1 + m2 - 2, ȳ_g, S_g / √k),
where S²_g = E(σ²_g | y_gr, a, b) = (2b + s²_g (m1 + m2 - 2)) / (2a + m1 + m2 - 2) is the Bayesian posterior estimate of the variance and k = (m1^{-1} + m2^{-1})^{-1}.
Hence
p(δ_g, 0) = P{δ_g > 0 | y_gr, a, b} = F_{T(2a+m1+m2-2)}(ȳ_g √k / S_g) = 1 - P{t > t_mod | H0},
with t_mod = ȳ_g √k / S_g the moderated t statistic.
Under H0, the distribution of p(δ_g, 0) is uniform.
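In this conjugate case the quantity p(δ_g, 0) is available in closed form. A small sketch, with illustrative names and s2_g taken to be the pooled within-gene sample variance:

import numpy as np
from scipy.stats import t as t_dist

def p_delta_positive(ybar_g, s2_g, m1, m2, a, b):
    df = 2 * a + m1 + m2 - 2
    S2 = (2 * b + s2_g * (m1 + m2 - 2)) / df      # shrunk posterior estimate of the variance
    k = 1.0 / (1.0 / m1 + 1.0 / m2)
    # p(delta_g, 0) = P{ delta_g > 0 | data } = F_{T(df)}( ybar_g * sqrt(k) / S_g )
    return t_dist.cdf(ybar_g * np.sqrt(k) / np.sqrt(S2), df)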
Histograms of the measures of differential expression
[Figure: histograms of p(t_g, t_g^(α)) and p(δ_g, 0) for simulated data, under H0 and under H1]
Tail posterior probabilities 3
• Investigate the performance of selection rules of the form p(t_g, t^(α)) = P{|t_g| > t^(α) | y_gsr} > p_cut
In particular:
• What is the FDR associated with each value of p_cut? In the conjugate case:
FDR(p_cut) = π_0 P{p(t_g, t^(α)) > p_cut | H0} / P{p(t_g, t^(α)) > p_cut},
estimating π_0 as in Storey, the numerator using F0 and the denominator using the observed proportion
• How does this rule compare to rules based on p(δ_g, 0)?
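The FDR estimate can be computed directly from the observed tail pp values and a simulated null distribution F0. The sketch below uses a Storey-type estimate of π_0 whose exact form (the threshold λ and its use on the tail pp scale) is an assumption, not necessarily the estimator used in the paper; pp_obs and pp_null are hypothetical argument names for the observed tail pp's and draws from F0.

import numpy as np

def estimate_fdr(pp_obs, pp_null, p_cut, lam=0.5):
    # Storey-type pi0: genes with small tail pp are mostly null, calibrated against F0
    pi0 = min(np.mean(pp_obs <= lam) / np.mean(pp_null <= lam), 1.0)
    num = pi0 * np.mean(pp_null > p_cut)         # pi0 * P{ p(t_g, t^(alpha)) > p_cut | H0 }
    den = max(np.mean(pp_obs > p_cut), 1.0 / len(pp_obs))   # observed proportion selected
    return num / den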
Comparison of estimated (solid line) and "true" FDR (dashed line) on simulated data
[Figure: three panels, for π_0 = 0.95, 0.90 and 0.70]
III -- Data sets and biological questions
Biological questions
• Understand the mechanisms of insulin resistance
• Cell line experiments in which the reaction of a mouse muscle cell line to treatment by insulin or metformin (an insulin replacement drug) is observed after 2 and 12 hours
• Questions of interest relate to simple and compound comparisons
Design: 3 replicates for each condition, Affymetrix MOE430A chips, 22690 genes per chip
Data pre-processed by RMA and normalised using intensity dependent LOESS normalisation
Volcano plots for muscle cell data: change between insulin and control at 2 hours
[Figure: two volcano plots against ȳ_g]
• p(t_g, t^(α)), α = 0.05: less peaked around zero, allows better separation
• 2 max{p(δ_g, 0), 1 - p(δ_g, 0)} - 1, cut-off 0.925: peaked around zero, varies steeply as a function of ȳ_g
Insulin versus control (tail posterior probabilities, estimated FDR)
• 2 hours: π_0 = 0.61, 1151 genes selected (FDR = 0.5%)
• 12 hours: π_0 = 0.98, 13 genes selected (FDR = 0.5%)
Metformin versus control (tail posterior probabilities, estimated FDR)
• 2 hours: π_0 = 0.56, 1854 genes selected (FDR = 0.5%)
• 12 hours: π_0 = 0.79, 72 genes selected (FDR = 0.5%)
IV -- Extension to the analysis of multi class data
• In our case study, 3 groups (control c = 0, insulin c = 1, metformin c = 2) and 3 time points: t = 0, t = 1 (2 hours), t = 2 (12 hours), each replicated 3 times
• An ANOVA-like model formulation is suited to the analysis of such multifactorial experiments (a simulation sketch follows the model):
y_gtcr = α_g + γ_gt + δ_gtc + ε_gtcr
ε_gtcr ∼ N(0, σ²_g)
σ_g^{-2} ∼ Γ(a, b)
α_g, γ_gt, δ_gtc ∼ 1,  t, c = 1, 2
Global variance parametrisation (borrowing information)
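A small NumPy sketch generating data from this ANOVA-like formulation, with a single gene-specific variance shared across time points and treatments; the dimensions follow the case study, while effect sizes and hyperparameters are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(3)
G, T, C, R = 500, 2, 3, 3                        # genes, time points, conditions, replicates

alpha = rng.normal(7.0, 2.0, G)                  # gene effect
gamma = rng.normal(0.0, 0.5, (G, T))             # time effect
delta = np.zeros((G, T, C))                      # treatment effects; delta_{gt0} = 0 (control)
delta[:, :, 1:] = np.where(rng.random((G, T, 2)) < 0.1,
                           rng.normal(0.0, 1.0, (G, T, 2)), 0.0)
sigma2 = 1.0 / rng.gamma(1.5, 1.0 / 0.05, G)     # sigma_g^{-2} ~ Gamma(a, b), global per gene

mean = alpha[:, None, None] + gamma[:, :, None] + delta
y = rng.normal(mean[..., None], np.sqrt(sigma2)[:, None, None, None], (G, T, C, R))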
Joint tail posterior probabilities
• Interest is in testing a compound null hypothesis, i.e. one involving several differential parameters, e.g. testing jointly for the effect of insulin and metformin at 2 hours
• In this case, we are interested in a specific alternative:
H0(g): δ_g11 = 0 & δ_g12 = 0 versus H1(g): δ_g11 ≠ 0 & δ_g12 ≠ 0
Note: rejecting the null hypothesis in an ANOVA setting corresponds to a different alternative
• Define joint tail posterior probabilities:
p_g^J = P{|t_g11| > t^(α) & |t_g12| > t^(α) | data},
where t_gtc = δ_gtc / w_g is the Bayesian T statistic for each treatment
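Given joint posterior draws of (δ_g11, δ_g12, w_g), e.g. MCMC output, the joint tail posterior probability is the proportion of draws in which both Bayesian T statistics exceed the null percentile in absolute value. A minimal sketch with hypothetical argument names:

import numpy as np
from scipy.stats import norm

def joint_tail_pp(delta11, delta12, w, alpha=0.05):
    t_alpha = norm.ppf(1 - alpha / 2)
    t11, t12 = delta11 / w, delta12 / w          # Bayesian T statistics per treatment
    # joint event: both effects exceed the null percentile in absolute value
    return np.mean((np.abs(t11) > t_alpha) & (np.abs(t12) > t_alpha))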
Benefits of joint posterior probabilities
• Takes into account the correlation of the differential expression measures between conditions induced by sharing the same variance parameter
• Usual practice is to:
  – Carry out pairwise comparisons
  – Select genes for each comparison using the same cut-off on the pp
  – Intersect the lists and find genes common to both
• The joint pp is shown to lead to fewer false positives in this case of positive correlation (simulation study)
Correlation of the DE parameters and the Bayesian T statistics for insulin and metformin (2 hours)
[Figure: correlation between δ_g11 and δ_g12; correlation between t_g11 and t_g12]
• With joint tail posterior probabilities and a cut-off of p_cut = 0.92, 280 genes are selected as jointly perturbed at 2 hours
• Applying pairwise comparisons and combining the lists adds another 47 genes to the list
Discussion 1
• Tail posterior probabilities (Tpp) are a generic tool that can be used in any situation where a large number of hypotheses related in a hierarchical fashion are to be tested
• We have derived the distribution of the Tpp under the null and proposed a corresponding estimate of FDR
• This distribution requires numerical integration but is gene independent (conjugate case), so it only needs to be evaluated once
• The Tpp is a smooth function of the amount of DE with a gradient that "spreads" the genes, allowing genes to be chosen with the desired level of uncertainty about their DE
• Interesting connection between Bayesian and frequentist inference for the differential expression parameter
Discussion 2
• Interesting to compare the performance of the Tpp with that of mixture models, e.g. Gamma mixtures (see the poster by Alex Lewin):
δ_g ∼ p_0 δ_0 + p_1 G(-x | 1.5, λ_1) + p_2 G(x | 1.5, λ_2)
with the first component corresponding to H0 and the other two to H1, a Dirichlet distribution for (p_0, p_1, p_2) and Exp(1) hyperpriors for λ_1 and λ_2
• Normal and t mixtures have also been considered:
δ_g ∼ p_0 δ_0 + (1 - p_0) T(ν, μ, τ)   (μ ∼ 1, τ, ν^{-1} ∼ Exp(1))
δ_g ∼ p_0 δ_0 + (1 - p_0) N(μ, τ)   (μ ∼ 1, τ ∼ Exp(1))
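For illustration only, sampling δ_g from the point-mass-plus-Normal version of such a mixture might look as follows; here p_0, μ and τ are fixed values (with τ read as a variance), whereas the models above place hyperpriors on them.

import numpy as np

def sample_delta_mixture(G, p0=0.85, mu=0.0, tau=0.5, rng=None):
    if rng is None:
        rng = np.random.default_rng(4)
    is_null = rng.random(G) < p0                 # H0 component: point mass at zero
    delta = np.where(is_null, 0.0, rng.normal(mu, np.sqrt(tau), G))
    return delta, is_null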
Simulated data
• 3000 variables, 6 replicates, 2 conditions
y_g1r ∼ N(α_g, σ²_g),  y_g2r ∼ N(α_g + δ_g, σ²_g)
• σ²_g ∼ 0.03 + LogNorm(-3.85, 0.82)
• α_g ∼ Norm(7, 25)
• δ_g slightly asymmetric:
  5%: δ_g | δ_g > 0 ∼ h(δ_g)
  10%: δ_g | δ_g < 0 ∼ h(-δ_g)
  85%: δ_g ∼ N(0, 0.01)
where h(|δ_g| | δ_g ≠ 0) = 0.2 U[0, 2.5] + 0.4 U[0.07, 0.7] + 0.4 N(0.7, 0.7)
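A sketch reproducing this simulation design is given below. Where the slide is ambiguous, the second parameter of Norm, LogNorm and N is read as a variance, "6 replicates" as 6 per condition, and h() is implemented exactly as written; these are assumptions.

import numpy as np

rng = np.random.default_rng(5)
G, m = 3000, 6                                   # genes, replicates per condition (assumption)

def h(n):
    # h = 0.2 U[0, 2.5] + 0.4 U[0.07, 0.7] + 0.4 N(0.7, 0.7)
    comp = rng.choice(3, size=n, p=[0.2, 0.4, 0.4])
    out = np.empty(n)
    out[comp == 0] = rng.uniform(0.0, 2.5, (comp == 0).sum())
    out[comp == 1] = rng.uniform(0.07, 0.7, (comp == 1).sum())
    out[comp == 2] = rng.normal(0.7, np.sqrt(0.7), (comp == 2).sum())
    return out

alpha = rng.normal(7.0, np.sqrt(25.0), G)        # alpha_g ~ Norm(7, 25)
sigma2 = 0.03 + rng.lognormal(-3.85, np.sqrt(0.82), G)

u = rng.random(G)
delta = rng.normal(0.0, np.sqrt(0.01), G)        # 85%: delta_g ~ N(0, 0.01)
delta[u < 0.05] = h((u < 0.05).sum())            # 5%: positive component
delta[(u >= 0.05) & (u < 0.15)] = -h(((u >= 0.05) & (u < 0.15)).sum())   # 10%: negative

y1 = rng.normal(alpha[:, None], np.sqrt(sigma2)[:, None], (G, m))
y2 = rng.normal((alpha + delta)[:, None], np.sqrt(sigma2)[:, None], (G, m))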
Comparison of mixture and tail pp
• Fit the 3 mixture models (Gamma, Normal and t alternatives) and the flat model
• Classification: for the mixtures, P{H1 | data}; for the flat model, the tail posterior probability
Comparable performance, with a slight edge for the Gamma and Normal mixtures
Thanks
BBSRC Exploiting Genomics grant
Wellcome Trust BAIR consortium
Colleagues in the Biostatistics group:
Marta Blangiardo, Anne Mette Hein, Maria de Iorio
Colleagues in the Biology group at Imperial
Tim Aitman, Ulrika Andersson, Dave Carling
Papers and technical reports: www.bgx.org.uk/
For the tail probability paper: www.bgx.org.uk/Natalia/Bochkina.ps