Counting chickens and other tales’ Using random effect

Download Report

Transcript Counting chickens and other tales’ Using random effect

‘Counting chickens and other tales’
Using random effect models and
MCMC estimation in applied
statistics research.
Dr William J. Browne
School of Mathematical Sciences
University of Nottingham
Outline
• Background to my research, random effect and
multilevel models and MCMC estimation.
• Random effect models for complex data
structures including artificial insemination and
Danish chicken examples.
• Multivariate random effect models and great tit
nesting behaviour.
• Sample size calculations for complex random
effect models.
• Multilevel modelling of mastitis incidence.
• Other research and future work.
Background
• 1995-1998 – PhD in Statistics, University of Bath.
“Applying MCMC methods to multilevel models.”
• 1998-2003 – Postdoctoral research positions at the
Centre for Multilevel Modelling at the Institute of
Education, London.
• 2003-2006 – Lecturer in Statistics at University of
Nottingham.
• 2006- Associate professor of Statistics at University of
Nottingham.
Research interests:
Multilevel modelling, complex random effect modelling,
applied statistics, Bayesian statistics and MCMC
estimation.
Random effect models
• Models that account for the underlying structure in the dataset.
• Originally developed for nested structures (multilevel models), for
example in education, pupils nested within schools.
• An extension of linear modelling with the inclusion of random effects.
A typical 2-level model is
yij   0  xij 1  u j  eij
u j ~ N (0,  u2 ), eij ~ N (0,  e2 )
Here i might index pupils and j index schools.
Alternatively in another example i might index cows and j index herds.
The important thing is that the model and statistical methods used are
the same!
Estimation Methods for Multilevel
Models
Due to additional random effects no simple matrix formulae
exist for finding estimates in multilevel models.
Two alternative approaches exist:
1. Iterative algorithms e.g. IGLS, RIGLS, that alternate
between estimating fixed and random effects until
convergence. Can produce ML and REML estimates.
2. Simulation-based Bayesian methods e.g. MCMC that
attempt to draw samples from the posterior distribution
of the model.
One possible computer program to use for multilevel
models which incorporates both approaches is MLwiN.
MLwiN
• Software package designed specifically for fitting
multilevel models.
• Developed by a team led by Harvey Goldstein and Jon
Rasbash at the Institute of Education in London over
past 15 years or so. Earlier incarnations ML2, ML3,
MLN.
• Originally contained ‘classical’ estimation methods
(IGLS) for fitting models.
• MLwiN launched in 1998 also included MCMC
estimation.
• My role in the team was as developer of the MCMC
functionality in MLwiN in my time at Bath and during 4.5
years at the IOE.
Note: MLwiN core team relocated to Bristol in 2005.
MCMC Algorithm
• Consider the 2-level normal response model
yij   0  xij 1  u j  eij
u j ~ N (0,  u2 ), eij ~ N (0,  e2 )
• MCMC algorithms usually work in a Bayesian
framework and so we need to add prior
distributions for the unknown parameters.
• Here there are 4 sets of unknown parameters:
 , u, u2 , e2
• We will add prior distributions
p( ), p( u2 ), p( e2 )
MCMC Algorithm (2)
One possible MCMC algorithm for this model then
involves simulating in turn from the 4 sets of
conditional distributions. Such an algorithm is known
as Gibbs Sampling. MLwiN uses Gibbs sampling for
all normal response models.
Firstly we set starting values for each group of unknown
2
2
parameters,  (0) , u(0) ,  u (0) ,  e (0)
Then sample from the following conditional distributions,
firstly
p( | y, u , 2 , 2 )
( 0)
To get  (1) .
u ( 0)
e ( 0)
MCMC Algorithm (3)
We next sample from
p(u | y, (1) , u2(0) , e2(0) )
to get u(1), then
p( u2 | y, (1) , u(1) , e2(0) )
to get  u (1) , then finally
p( e2 | y, (1) , u(1) , u2(1) )
2
To get  e2(1) . We have then updated all of the unknowns
in the model. The process is then simply repeated
many times, each time using the previously
generated parameter values to generate the next set
Burn-in and estimates
Burn-in: It is general practice to throw away the
first n values to allow the Markov chain to
approach its equilibrium distribution namely the
joint posterior distribution of interest. These
iterations are known as the burn-in.
Finding Estimates: We continue generating
values at the end of the burn-in for another m
iterations. These m values are then averaged to
give point estimates of the parameter of interest.
Posterior standard deviations and other
summary measures can also be obtained from
the chains.
So why use MCMC?
• Often gives better (in terms of bias) estimates for nonnormal responses (see Browne and Draper, 2006).
• Gives full posterior distribution so interval estimates for
derived quantities are easy to produce.
• Can easily be extended to more complex problems as
we will see next.
• Potential downside 1: Prior distributions required for all
unknown parameters.
• Potential downside 2: MCMC estimation is much slower
than the IGLS algorithm.
• For more information see my book: MCMC Estimation in
MLwiN – Browne (2003).
Extension 1: Cross-classified models
For example, schools by neighbourhoods. Schools will draw pupils from
many different neighbourhoods and the pupils of a neighbourhood will go
to several schools. No pure hierarchy can be found and pupils are said to
be contained within a cross-classification of schools by neighbourhoods:
nbhd 1
nbhd 2
School 1
xx
x
School 2
x
x
Nbhd 3
School 3
xx
x
School 4
x
xxx
School
Pupil
Nbhd
S1
S2
S3
S4
P1 P2 P3 P4 P5 P6 P7 P8 P9 P10 P11 P12
N1
N2
N3
Notation
With hierarchical models we use a subscript notation that has one
subscript per level and nesting is implied reading from the left. For
example, subscript pattern ijk denotes the i’th level 1 unit within the
j’th level 2 unit within the k’th level 3 unit.
If models become cross-classified we use the term classification
instead of level. With notation that has one subscript per
classification, that also captures the relationship between
classifications, notation can become very cumbersome. We propose
an alternative notation introduced in Browne et al. (2001) that only
has a single subscript no matter how many classifications are in the
model.
Single subscript notation
School
Pupil
S1
S3
S4
P1 P2 P3 P4 P5 P6 P7 P8 P9 P10 P11 P12
Nbhd
i
1
2
3
4
5
6
7
8
9
10
11
12
S2
N1
nbhd(i)
1
2
1
2
1
2
2
3
3
2
3
3
sch(i)
1
1
1
2
2
2
3
3
4
4
4
4
N2
N3
We write the model as
( 2)
(3)
yi  0  unbhd
(i )  usch (i )  ei
(1)
where classification 2 is neighbourhood and classification 3
is school. Classification 1 always corresponds to the
classification at which the response measurements are
made, in this case pupils. For pupils 1 and 11 equation (1)
becomes:
y1   0  u1( 2)  u1(3)  e1
y11   0  u3( 2)  u 4(3)  e1
Classification diagrams
In the single subscript notation we lose information about the relationship
(crossed or nested) between classifications. A useful way of conveying this
information is with the classification diagram. Which has one node per
classification and nodes linked by arrows have a nested relationship and
unlinked nodes have a crossed relationship.
Neighbourhood
School
Neighbourhood
School
Pupil
Nested structure where
schools are contained within
neighbourhoods
Pupil
Cross-classified structure where
pupils from a school come from
many neighbourhoods and pupils
from a neighbourhood attend several
schools.
Example : Artificial insemination by donor
1901 women
279 donors
1328 donations
12100 ovulatory cycles
response is whether conception occurs in a given cycle
In terms of a unit diagram:
Women
Cycles
w1
c1 c2 c3 c4…
Or a classification diagram:
w2
w3
c1 c2 c3 c4…
Donor
c1 c2 c3 c4…
Donation
Donations
Donors
d1
m1
d2
d1 d2 d3
d1 d2
m2
m3
Woman
Cycle
Model for artificial insemination data
We can write the model as
yi ~ Binomial (1, i )
Results:
Parameter
Description
Estimate(se)
( 2)
(3)
( 4)
logit(  i )  ( X ) i  u woman
(i )  u donation(i )  u donor(i )
0
intercept
( 2)
2
u woman
(i ) ~ N (0, u ( 2) )
1
azoospermia
0.22(0.11)
(3)
2
u donation
(i ) ~ N (0, u (3) )
2
semen quality
0.19(0.03)
( 4)
2
u donor
(i ) ~ N (0, u ( 4) )
3
womens age>35
4
sperm count
0.20(0.07)
5
sperm motility
0.02(0.06)
6
insemination to early
-0.72(0.19)
7
insemination to late
-0.27(0.10)
Note cross-classified models can
be fitted in IGLS but are far easier
to fit using MCMC estimation.
-4.04(2.30)
-0.30(0.14)
 u2( 2)
women variance
1.02(0.21)
 u2(3)
donation variance
0.644(0.21)
 u2( 4)
donor variance
0.338(0.07)
Extension 2: Multiple membership models
When level 1 units are members of more than one higher
level unit we describe a model for such data as a multiple
membership model.
For example,
• Pupils change schools/classes and each school/class has
an effect on pupil outcomes.
• Patients are seen by more than one nurse during the
course of their treatment.
Notation
yi  ( XB)i 
w
( 2)
i, j
jnurse ( i )
u
( 2)
j
 ei
u (j2) ~ N (0,  u2( 2) )
(2)
ei ~ N (0,  e2 )
n1(j=1)
n2(j=2)
n3(j=3)
p1(i=1)
0.5
0
0.5
p2(i=2)
1
0
0
p3(i=3)
0
0.5
0.5
p4(i=4)
0.5
0.5
0
y1  ( XB)1  0.5u1( 2)  0.5u3( 2)  e1
y2  ( XB) 2  1u1( 2)  e2
y3  ( XB)3  0.5u2( 2)  0.5u3( 2)  e3
y4  ( XB) 4  0.5u1( 2)  0.5u2( 2)  e4
Note that nurse(i) now indexes the set of
nurses that treat patient i and w(2)i,j is a
weighting factor relating patient i to nurse j.
For example, with four patients and three
nurses, we may have the following weights:
Here patient 1 was seen by nurse 1
and 3 but not nurse 2 and so on. If
we substitute the values of w(2)i,j , i
and j. from the table into (2) we get
the series of equations :
Classification diagrams for multiple
membership relationships
Double arrows indicate a multiple membership relationship between
classifications.
nurse
We can mix multiple membership, crossed and
hierarchical structures in a single model.
hospital
patient
nurse
GP practice
patient
Here patients are multiple members of nurses,
nurses are nested within hospitals and GP
practice is crossed with both nurse and hospital.
Example involving nesting, crossing and
multiple membership – Danish chickens
Production hierarchy
10,127 child flocks
725 houses
304 farms
Breeding hierarchy
10,127 child flocks
200 parent flocks
As a unit diagram:
farm
As a classification diagram:
f1
f2…
Farm
Houses
Child flocks
Parent flock
h1
c1 c2 c3…
p1
h2
h1
c1 c2 c3….
p2
h2
c1 c2 c3…. c1 c2 c3….
p3
p4
p5….
House
Parent flock
Child flock
Model and results
yi ~ Binomial (1,  i )
logit(  i )  ( XB)i 
( 3)
( 4)
 wi(,2j)u (j2)  uhouse
( i )  u farm( i )  ei
j p . flock( i )
( 3)
2
u (j2 ) ~ N (0, u2( 2 ) ) uhouse
( i ) ~ N (0,  u ( 3) )
4)
2
u (farm
( i ) ~ N (0,  u ( 4 ) )
Response is cases of
salmonella
Note multiple membership
models can be fitted in IGLS
and this model/dataset
represents roughly the most
complex model that the
method can handle.
Such models are far easier to
fit using MCMC estimation.
Results:
Parameter
Description
Estimate(se)
0
intercept
-2.322(0.213)
1
1996
-1.239(0.162)
2
1997
-1.165(0.187)
3
hatchery 2
-1.733(0.255)
4
hatchery 3
-0.211(0.252)
5
hatchery 4
-1.062(0.388)
 u2( 2 )
parent flock variance
0.895(0.179)
 u2( 3)
house variance
0.208(0.108)
 u2( 4)
farm variance
0.927(0.197)
Random effect modelling of
great tit nesting behaviour
• An extension of cross-classified models to
multivariate responses.
• Collaborative research with Richard Pettifor
(Institute of Zoology, London), and Robin
McCleery and Ben Sheldon (University of
Oxford).
Wytham woods great tit dataset
• A longitudinal study of great tits
nesting in Wytham Woods,
Oxfordshire.
• 6 responses : 3 continuous & 3 binary.
• Clutch size, lay date and mean
nestling mass.
• Nest success, male and female
survival.
• Data: 4165 nesting attempts over a
period of 34 years.
• There are 4 higher-level classifications
of the data: female parent, male
parent, nestbox and year.
Data background
The data structure can be summarised as follows:
Source
Number
of IDs
Median
#obs
Mean
#obs
Year
34
104
122.5
Nestbox
968
4
4.30
Male parent
2986
1
1.39
Female parent
2944
1
1.41
Note there is very little information on each
individual male and female bird but we can get
some estimates of variability via a random effects
model.
Diagrammatic representation of the dataset.
Year
Nest-box
1
3
♀16♂16
70
69
2
♀14♂14
5
6
♀17♂17
♀15♂18
♀18♂11
♀13♂15
♀11♂10
68
4
♀15♂11
♀13♂13
♀11♂10
67
66
♀8♂8
65
♀5♂1
64
♀1♂1
♀12♂11
♀12♂11
♀9♂6
♀6♂5
♀7♂6
♀2♂2
♀10♂12
♀10♂9
♀3♂7
♀3♂3
♀4♂4
Univariate cross-classified
random effect modelling
• For each of the 6 responses we will firstly fit a univariate
model, normal responses for the continuous variables
and probit regression for the binary variables. For
example using notation of Browne et al. (2001) and
letting response yi be clutch size:
(2)
(3)
(4)
(5)
y   u
u
u
u
e
i
m ale(i )
fem ale(i ) nestbox(i )
year(i ) i
(2)
(3)
u
~ N (0,  2 ), u
~ N (0,  2 ),
m ale(i )
u (2)
fem ale(i )
u (3)
(4)
(5)
u
~ N (0,  2 ), u
~ N (0,  2 ),
nestbox(i )
u (4)
year(i )
u (5)
e ~ N (0,  2 )
i
e
Estimation
• We use MCMC estimation in MLwiN and choose
‘diffuse’ priors for all parameters.
• We run 3 MCMC chains from different starting
points for 250k iterations each (500k for binary
responses) and use the Gelman-Rubin
diagnostic to decide burn-in length.
• We compared results with the equivalent
classical model using the Genstat software
package and got broadly similar results.
• We fit all four higher classifications and do not
consider model comparison.
Clutch Size
Parameter
Estimate (S.E.)
β
8.808 (0.109)
0.365 (0.101)
0.107 (0.026)
0.046 (0.043)
0.974 (0.062)
1.064 (0.055)
 u2(5)
(Year)
 u2( 4) (Nest box)
 u2(3) (Male)
 u2( 2) (Female)
 e2 (Observation)
Percentage
variance
14.3%
4.2%
1.8%
38.1%
41.6%
Here we see that the average clutch size is just
below 9 eggs with large variability between female
birds and some variability between years. Male birds
and nest boxes have less impact.
Lay Date (days after April 1st)
Parameter
β
 u2(5)
(Year)
 u2( 4) (Nest box)
 u2(3) (Male)
 u2( 2) (Female)
 e2 (Observation)
Estimate (S.E.)
29.38 (1.07)
37.74 (10.08)
3.38 (0.56)
0.22 (0.39)
8.55 (1.03)
25.10 (1.04)
Percentage
variance
50.3%
4.5%
0.3%
11.4%
33.5%
Here we see that the mean lay date is around the
end of April/beginning of May. The biggest driver of
lay date is the year which is probably indicating
weather differences. There is some variability due to
female birds but little impact of nest box and male
bird.
Nestling Mass
Parameter
Estimate (S.E.)
β
18.829 (0.060)
0.105 (0.032)
0.026 (0.013)
0.153 (0.030)
0.163 (0.031)
0.720 (0.035)
 u2(5)
(Year)
 u2( 4) (Nest box)
 u2(3) (Male)
 u2( 2) (Female)
 e2 (Observation)
Percentage
variance
9.0%
2.2%
13.1%
14.0%
61.7%
Here the response is the average mass of the chicks
in a brood at 10 days old. Note here lots of the
variability is unexplained and both parents are
equally important.
Human example
Sarah Victoria Browne
Helena Jayne Browne
Born 20th July 2004
Born 22nd May 2006
Birth Weight 6lb 6oz
Birth Weight 8lb 0oz
Father’s birth weight 9lb 13oz, Mother’s birth weight 6lb 8oz
Nest Success
Parameter
Estimate (S.E.)
β
0.010 (0.080)
0.191 (0.058)
0.025 (0.020)
0.065 (0.054)
0.060 (0.052)
 u2(5)
 u2( 4)
 u2(3)
 u2( 2)
(Year)
(Nest box)
(Male)
(Female)
Percentage
overdispersion
56.0%
7.3%
19.1%
17.6%
Here we define nest success as one of the ringed
nestlings captured in later years. The value 0.01 for β
corresponds to around a 50% success rate. Most of
the variability is explained by the Binomial
assumption with the bulk of the over-dispersion
mainly due to yearly differences.
Male Survival
Parameter
Estimate (S.E.)
β
-0.428 (0.041)
0.032 (0.013)
0.006 (0.006)
0.025 (0.023)
0.014 (0.017)
 u2(5)
 u2( 4)
 u2(3)
 u2( 2)
(Year)
(Nest box)
(Male)
(Female)
Percentage
overdispersion
41.6%
7.8%
32.5%
18.2%
Here male survival is defined as being observed
breeding in later years. The average probability is
0.334 and there is very little over-dispersion with
differences between years being the main factor.
Note the actual response is being observed breeding
in later years and so the real probability is higher!
Female survival
Parameter
Estimate (S.E.)
β
-0.302 (0.048)
0.053 (0.018)
0.065 (0.024)
0.014 (0.017)
0.013 (0.014)
 u2(5)
 u2( 4)
 u2(3)
 u2( 2)
(Year)
(Nest box)
(Male)
(Female)
Percentage
overdispersion
36.6%
44.8%
9.7%
9.0%
Here female survival is defined as being observed
breeding in later years. The average probability is
0.381 and again there isn’t much over-dispersion with
differences between nestboxes and years being the
main factors.
Multivariate modelling of the
great tit dataset
• We now wish to combine the six univariate models into
one big model that will also account for the correlations
between the responses.
• We choose a MV Normal model and use latent variables
(Chib and Greenburg, 1998) for the 3 binary responses
that take positive values if the response is 1 and
negative values if the response is 0.
• We are then left with a 6-vector for each observation
consisting of the 3 continuous responses and 3 latent
variables. The latent variables are estimated as an
additional step in the MCMC algorithm and for
identifiability the elements of the level 1 variance matrix
that correspond to their variances are constrained to
equal 1.
Multivariate Model
(2)
(3)
(4)
(5)
y   u
u
u
u
e
i
m ale(i )
fem ale(i ) nestbox(i )
year(i ) i
u
(2)
(3)
~ MVN (0, 
), u
~ MVN (0, 
),
m ale(i )
u (2)
fem ale(i )
u (3)
u
(4)
(5)
~ MVN (0, 
), u
~ MVN (0, 
),
nestbox(i )
u (4)
year(i )
u (5)
Here the vector valued
response is decomposed into
a mean vector plus random
effects for each classification.
e ~ MVN (0,  )
i
e
Inverse Wishart priors are used
for each of the classification
variance matrices. The values
are based on considering overall
variability in each response and
assuming an equal split for the 5
classifications.
p(u(i) ) ~ invWishart [6,6 * Su(i) ],
6
0
0
0
0.5 0



(
i
)
Su  




0
0
0
0
0
0 

15 0
0
0
0 

0 0.2
0
0
0 
0 0 0.05
0
0 
0 0
0
0.05
0 

0 0
0
0
0.05
Use of the multivariate model
• The multivariate model was fitted using an
MCMC algorithm programmed into the MLwiN
package which consists of Gibbs sampling steps
for all parameters apart from the level 1 variance
matrix which requires Metropolis sampling (see
Browne 2006).
• The multivariate model will give variance
estimates in line with the 6 univariate models.
• In addition the covariances/correlations at each
level can be assessed to look at how
correlations are partitioned.
e
z
t
c
h
s
i
Partitioning of covariances
C
Survival
l
Raw phenotypic c orrel ati on
L
a
y
i
n
(1) Caus al environmental
effect
g
Raw phenotypic c orrel ati on
b
u
a
d
a
t
e
(2) Female conditi on
effec t
F
Clutch Siz e
Survival
(i) Female
Layi ng date
Clutch Siz e
Clutch Siz e
Layi ng date
u
Layi ng date
Layi ng date
(iii) Observ ation
c
partitioni ng of
c ovari ance at
different level s
Clutch Siz e
(ii) Female
Clutch Siz e
Layi ng date
e
Fec undity
(ii) Observ ation
Layi ng date
Survival
(i) Terr itory
Clutch Siz e
[partiti oning of
c ovari ance
at different levels]
Fec undity
n
d
i
t
y
Correlations from a 1-level model
•
If we ignore the structure of the data and consider it as 4165
independent observations we get the following correlations:
CS
LD
NM
NS
MS
LD
-0.30
X
X
X
X
NM
-0.09
-0.06
X
X
X
NS
0.20
-0.22
0.16
X
X
MS
0.02
-0.02
0.04
0.07
X
FS
-0.02
-0.02
0.06
0.11
0.21
Note correlations in bold are statistically significant i.e. 95% credible
interval doesn’t contain 0.
Correlations in full model
CS
LD
NM
NS
MS
LD
N, F, O
-0.30
X
X
X
X
NM
F, O
-0.09
F, O
-0.06
X
X
X
NS
Y, F
0.20
N, F, O
-0.22
O
0.16
X
X
MS
0.02
-0.02
0.04
Y
0.07
X
FS
F, O
-0.02
F, O
-0.02
0.06
Y, F
0.11
Y, O
0.21
Key: Blue +ve, Red –ve: Y – year, N – nestbox, F – female, O - observation
Pairs of antagonistic covariances at
different classifications
There are 3 pairs of antagonistic correlations i.e.
correlations with different signs at different
classifications:
LD & NM : Female 0.20 Observation -0.19
Interpretation: Females who generally lay late, lay heavier
chicks but the later a particular bird lays the lighter its
chicks.
CS & FS : Female 0.48 Observation -0.20
Interpretation: Birds that lay larger clutches are more likely
to survive but a particular bird has less chance of
surviving if it lays more eggs.
LD & FS : Female -0.67 Observation 0.11
Interpretation: Birds that lay early are more likely to survive
but for a particular bird the later they lay the better!
Sample size calculations in random effect
models
• A current ESRC grant (2006-2009) that funds a
postdoc (Mousa Golalizadeh).
• The grant will consider the problem of deciding
on how much data to collect for a research
question taking account of the likely structure in
the collected data (See later slides).
• The grant will also investigate how various
MCMC algorithm developments perform in
practice when applied to real datasets.
• Finally we will investigate when complex models
are identifiable in the presence of ‘sparse’ data.
Background
• Many quantitative research questions are of the form of
a hypothesis – A has a significant effect on B.
• To answer such a question data is collected that allows
the researcher to (hopefully) test whether statistically A
has a significant effect on B. (In fact we aim to reject the
hypothesis that A doesn’t significantly affect B).
• A test is performed and either the researcher is happy
and A indeed has a significant effect on B or is left
wondering why the data collected do not back up their
hypothesis. Is the hypothesis false or was the data not
sufficient?
• The sufficiency of the data is the motivation for sample
size calculations.
Example
• Suppose I have the research question ‘Are Welshmen
on average taller than 175 cms?’
• I now need to get hold of a random sample of n
Welshmen and measure each of their heights.
• I make some statistical assumption about the distribution
of the heights of Welshmen e.g. that they come from a
Normal distribution.
• I might like to check this assumption by plotting a
histogram of the data.
• I can then form a statistical hypothesis test and test
whether indeed Welshmen are taller than 175cms.
• I need to decide how big to make n, my sample of
Welshmen.
Hypothesis Testing
• Let us assume our null hypothesis is that
the average height of Welshmen (μ) is
175cm.
• So we test H0:μ=175 vs HA:μ>175 (or
alternatively H0:θ=0 vs HA:θ>0 where θ=μ175)
• In practice we calculate from our sample its
mean (x ) and standard deviation (s2) and
use these along with n to form a test
statistic which we can compare with the
distribution assumed under H0
Type I and Type II errors
• No hypothesis test is perfect and there is always the possibility of
errors
Truth
Reject H0
H0 True
Type I error
Decision Accept H Correct
0
H0 False
Correct
Type II error
• P(Type I error) = α = significance level or size
• P(Type II error) = β, 1-β is the power of the test.
• In general we fix α to some value e.g. 0.05, 0.01 then 1-β depends
on our sample size.
Example hypothesis test
• Let us assume that in reality our sample mean is 180cms and the
population standard deviation (sd) is 5cms (known).
• We can then form a test statistic as follows:
Z
X  175

n

5
~ N (0,1)
5
n
• Note here that for small n and unknown sd we should use a
student-t distribution rather than Normal.
• For a 1-sided Z test we wish Z= n > 1.645 and so we need our
sample to be of size 3 to reject H0, using a student-t distribution
increases this to 5. (Here α=0.05)
• However if the sample mean had been only 176cms then we would
need n > (1.645*5)2 = 68 Welshmen to reject H0
Power calculations
• Our last slide in some sense is backwards as we cannot get
from a given sample mean to choosing a sample size!
• What we do instead is use different terminology and play
God!
• We will choose an ‘effect size’, γ which will represent a
guess at the increase in the sample mean for Welshmen.
• There then exists an (approximate) formula that links four
quantities, size (α), power (1-β), effect size (γ) and sample
size (n)

SE( )
 z1  z1 
Here RHS is sum of cases H0 true and
H0 false.
• Note that the standard error (SE) of γ is a function of n and σ
the population sd which is assumed known.
• We can now evaluate one of these quantities conditional on
the others e.g. what sample size is required given α,1-β and
γ?
Welsh height example
Here we have looked at two examples with effect sizes 5 and 1
respectively. Assume σ takes the value 5 and so let us suppose we
take a sample of size 25 Welshmen.

Then
 z1  z1 
SE( )
Case 1: 5/(5/√25)=1.645+z1-β,z1-β=3.355
β=0.9996
Case 2: 1/(5/ √25)=1.645+z1-β,z1-β=-0.645
β=0.25946
So here a sample of 25 Welshmen from a population with mean 180cms
would almost always result in rejecting H0,
but if the population mean is 176cms then only 26% of such samples
would be rejected.
We can plot curves of how power increases with sample size as shown
in the next slide.
Power curve for Welshmen
example
Here we see the two power curves for the
two scenarios:
Extending the idea

 z1  z1 
• The simple formula
can
SE( )
be used in many situations and hypothesis tests.
• To generalise the idea we assume that γ is an effect
size associated with a statistic that we wish to
compare with a (null) hypothesized value of 0.
• The complication occurs in finding a formula for the
standard error for the statistic and relating this
formula to the sample size, n.
• We will next consider an alternative approach before
returning to look at how the above approach extends
to multilevel models.
The use of simulation
• In reality our (hoped for) research path will be as follows:
Construct research question -> Form null hypothesis that
we believe false -> Collect appropriate data -> Reject
hypothesis therefore proving our research question.
• Assuming what we believe in our research question is
correct and hence null hypothesis is false we can still be
let down by not collecting enough data.
• The idea behind using simulation is to simulate the data
gathering process (assuming we know the right answer)
many times and see how often we can reject the null
hypothesis. The percentage of rejected null hypotheses
(via simulation) will then estimate power.
Simulation in our example
• Consider our Welsh height example case 2 where we
believe Welshmen have a mean height of 176cms (and
sd = 5cms) and we are testing the hypothesis
H0:μ=175cms, and we consider a sample size 25.
• Then we generate N samples (e.g. 5000) of size 25,
• xi ~ N (176,52 ),i  1,...,N and for each sample form a lower
bound for the confidence interval of the form
• xi 1.645 S.E.(xi ) . This we compare with the value 175
and the proportion greater than 175 is an estimate of the
power of the test.
• We can repeat this exercise for different sample sizes
and form a power curve.
Power curve comparison
Note simulation curve is a good approximation of the theoretical curve
although there are some minor (Monte Carlo) errors even with 5000
simulations per sample size.
Advantages/Disadvantages
• Theoretical approach is quick when the
formula can be derived.
• Approximations for more complex
situations exist which are equally quick.
• Simulation approach generalizes to more
situations but is much slower and we may
need large numbers of simulations per
scenario to get accurate power estimates.
What happens with multilevel data?
We will here mainly consider 2-level models and
take as our application area education, so we
have students nested within schools.
When deciding on a sampling scheme we have
many choices:
• How many schools, N ?
• How many pupils per school, nj ?
• Should we collect the same size sample from
each school ?
Our decision will depend on which parameter we
wish to estimate in the model.
Education Example
• For motivation we considered a two level dataset with exam marks
measured for each student in a collection of schools. In fact this
dataset exists and has 4915 students in 96 schools.
• Our hypothesis of interest is that the exam mark for an average
student is > 20 (null hypothesis = 20) which with such a large
sample results in the null hypothesis being rejected for our particular
data.
• If we fit the following multilevel model to the data we get the
estimates given:
yij   0  u j  eij
u j ~ N (0,  u2 ), eij ~ N (0,  e2 )
 0  21.685,  u2  16.205,  e2  139.367
• If we treat these estimates as population values, we are interested in
what power for testing our hypothesis results from various
combinations of N and nj
Design effect formula
•
•
•
•
•
•
•
If we assume balance then with n pupils in each of N schools for our
simple model (and only this simple model) the following formula holds:
Design effect = 1 + (n-1)ρ where ρ is the intra-class correlation.
So if we know the simple random sample size required for a given
power we need to multiply this by the design effect.
For example our data has ρ=16.205/(16.205+139.367)=0.104
So for schools of size 10 pupils we would need 1+9*0.104=1.94 times
as many students (in total) to get the same power.
For this model (and this model only) we could therefore perform our
power calculations assuming simple random sampling from a population
with variance 155.572 and scale up the sample required based on the
design.
So 1.685  1.645 0.84  n  338.4
155.572
•
n
And for schools of size 10 we require 1.94*338.4=657 pupils which we
can round up to 66 schools.
Simulating multilevel designs
• The process here is similar to the earlier
example except that we need to simulate from a
multilevel model and fit the models using MLwiN
(Rasbash, Browne et al. 2000).
• To this end we will write macro code in the
MLwiN macro language to perform the task.
• The MLwiN macro language allows datasets to
be simulated, models to be set up and run using
various algorithms and results collected.
• It has the advantage of performing all the
operations in one package but programming in
the macro language is not for the faint hearted!
Simulation continued
• We will perform simulations for schools of 10 pupils where number
of schools (N) ranges from 5 to 70. For each N, 5000 datasets are
generated.
• For each dataset we need to generate 10*N level 1 residuals with
variance 139.367, N level 2 residuals with variance 16.205 and add
these residuals up correctly with the fixed effect estimate 21.685.
• MLwiN has commands to generate random Normally distributed
observations but also has a SIMU command which given a model is
set up and estimates given will simulate from it directly making life
easier.
• For each simulated dataset we fit the variance components model
using the RIGLS algorithm. For small numbers of level 2 units we
may have estimation difficulties but MLwiN has an ERROR 0
command which simply ignores such problems.
• Note it is also important to ensure the command BATCH 1 is
included else MLwiN may only run RIGLS for 1 iteration for each
model!!
Comparison of formula/simulations
• The following graph compares the design effect formula to the
simulation approach:
Multilevel mastitis modelling!
• Martin Green has been successful in obtaining 4 years of
funding from Wellcome to come and work with me.
• The project is entitled ‘Use of Bayesian statistical
methods to investigate farm management strategies,
cow traits and decision-making in the prevention of
clinical and sub-clinical mastitis in dairy cows.’
• Martin is a specialist farm animal veterinary surgeon and
has recently also been appointed to a chair in
Nottingham’s new vet school.
• He completed a PhD in 2004 at the University of
Warwick in veterinary epidemiology and used MCMC to
fit binary response multilevel models in both MLwiN and
WinBUGS to look at various factors that affect clinical
mastitis in dairy cows.
Wellcome Fellowship
• In the 4 years of the grant we are fitting random effect
models to a large dataset that Martin has been collecting
in an earlier Milk Development Council funded grant.
• In particular we are looking at how farm management
practices, environmental conditions and cow
characteristics influence the risk of mastitis during the
dry period.
• We aim to get both interesting applied results and also
some novel statistical methodology from the grant and
MCMC will again play a big part.
• From a statistical point of view we are looking at model
fit diagnostics and model comparison in binary response
random effect models.
Other applications
• Current PhD student projects:
Kelly Handley : Statistical Analysis of Mass
Spectrometry data.
Chris Brignell : Statistical Shape Analysis in
Brain Imaging.
• Various MMath /BSc. projects looking at
applications of multilevel modelling of
share prices, educational data, house
prices and disease mapping.
Future Work
• Co-Investigator on BBSRC grant application (joint
between Nottingham and Bristol vet schools) entitled ‘An
investigation of Molecular Characteristics, Infection
Patterns and Prevention of E. Coli and S. uberis Mastitis
in UK Dairy Herds.’ (Main investigators Andrew Bradley
& Martin Green).
• Have been investigating MCMC algorithms for
‘structured multivariate Normal models’ which are what
in reality the IGLS algorithm fits. This model family also
includes multilevel time series models. I have been
investigating these models with an MMath. student and I
will have a visitor from Turkey who has government
funding to visit me and investigate the models.
• I am a named collaborator on the ESRC research node
held by Kelvyn Jones and the multilevel team in Bristol.
References
• Browne, W.J. (2003). MCMC Estimation in MLwiN. London: Institute
of Education, University of London.
• Browne, W.J. (2006). MCMC Estimation of ‘constrained’ variance
matrices with applications in multilevel modelling. Computational
Statistics and Data Analysis. 50: 1655-1677.
• Browne, W.J. and Draper D. (2006). A Comparison of Bayesian and
likelihood methods for fitting multilevel models (with discussion).
Bayesian Analysis. 1: 473-550.
• Browne, W.J., Goldstein, H. and Rasbash, J. (2001). Multiple
membership multiple classification (MMMC) models. Statistical
Modelling 1: 103-124.
• Chib, S. and Greenburg, E. (1998). Analysis of multivariate probit
models. Biometrika 85, 347-361.
• Rasbash, J., Browne, W.J., Goldstein, H., Yang, M., Plewis, I.,
Healy, M., Woodhouse, G.,Draper, D., Langford, I., Lewis, T. (2000).
A User’s Guide to MLwiN, Version 2.1, London: Institute of
Education, University of London.