Multilevel modelling
Volume 5
Slide Set 2
Revised June 2011
1
11: THREE-LEVEL MODELS
Two views
• “The intractable statistical complexity that is occasioned
by unduly ambitious three-level models” (Bickel, 2007,
246) AND
• “higher levels may have substantial effects, but without
the guidance of well-developed theory or rich substantive
literature, unproductive guesswork, data dredging and
intractable statistical complications come to the fore”
(Bickel, 2007, 219)
• But technically, a three-level model is a straightforward development of the two-level model; and substantively, research problems are not confined to two levels!
THREE-LEVEL MODEL
• Unit and classification diagrams, dataframes
• Some examples of applied research
• Algebraic specification of the 3-level random-intercepts model
• Various forms of the VPC
• Specifying models in MLwiN
• Residuals
• Applying the model
- the repeated cross-sectional model; changing school
performance
• Further levels:
- as structures etc
- in MLwiN
Three-level models
Unit and classification diagrams
• Student achievement affected by student characteristics, class
characteristics and school characteristics
• Need more than one class per school; imbalance is allowed
• Need lots of pupils in lots of classes in lots of schools!
[Unit diagram: School (Sc1, Sc2, Sc3) > Class (C1, C2 within each school) > Student (St1, St2, ...); classification diagram: Student nested within Class nested within School]
4
Data Frame for 3-level model
Classifications or levels: Student (i), Class (j), School (k)
Response: current exam score (ijk)
Explanatory variables: previous exam score (ijk), gender (ijk), teaching style (jk), school type (k)
NB categorical and continuous variables can be included at any level

Student i   Class j   School k   Current exam score   Previous exam score   Gender   Teaching style   Type
1           1         1          75                   56                    M        Formal           State
2           1         1          71                   45                    M        Formal           State
3           1         1          91                   72                    F        Formal           State
1           2         1          68                   49                    F        Informal         State
2           2         1          37                   36                    M        Informal         State
1           1         2          67                   56                    M        Formal           Private
2           1         2          82                   76                    F        Formal           Private
3           1         2          85                   50                    F        Formal           Private
1           1         3          54                   39                    M        Informal         State

NB data must be sorted correctly for MLwiN, which recognises units by a change in the higher-level indices
5
Some examples (with references)
• West, B T et al (2007) Linear mixed models,
Chapman and Hall, Boca Raton
• Dependent variable: student’s gain in Maths score,
kindergarten to first grade
• Explanatory variables
- 1: Student (1190): Maths score in kindergarten, Sex,
Minority, SES
- 2: Classroom (312) Teacher’s years of teaching
experience, Teacher’s maths experience, teacher’s
maths knowledge
- 3: School (107) % households in n’hood of school in
poverty
NB lacks power to infer to specific classes/schools?
6
Some examples continued
• Bickel, R (2007) Multilevel analysis for
applied research, Guildford Press, New York
• Dependent variable: Maths score for 8th
graders in Kentucky
• Explanatory variables
- 1: Student (50,000): Gender, Ethnicity,
- 2: Schools (347) School size, % of school
students receiving free/reduced cost lunch
- 3: Districts (107) District school size
7
Some examples continued
• Romano, E et al (2005) Multilevel correlates of
childhood physical aggression and prosocial
behaviour. Journal of Abnormal Child Psychology, 33,
565-578
- individual, family and neighbourhood
• Wiggins, R et al (2002) Place and personal
circumstances in a multilevel account of women's
long-term illness. Social Science & Medicine, 54, 827-838
- Large scale study, 75k+ women in 9539 wards in 401
districts; used PCA to construct level-2 variables from
census data
8
Algebraic specification of the random-intercepts model

y_ijk = β_0ijk + β_1 x_ijk
β_0ijk = β_0 + v_0k + u_0jk + e_0ijk

i indexes level 1 (e.g. pupil), j level 2 (e.g. class), k level 3 (e.g. school)

v_0k is the random effect at the school level, an allowed-to-vary departure from the grand mean;
u_0jk is the random effect at the class level, a departure from the school effect;
e_0ijk is the random effect at the pupil level, a departure from the class effect within a school.

Variance between schools = Var(v_0k) = σ²_v0
Variance between classes within schools = Var(u_0jk) = σ²_u0
Variance between pupils within classes within schools = Var(e_0ijk) = σ²_e
Variance between classes = σ²_v0 + σ²_u0

Random effects at different levels are assumed to be uncorrelated.
9
Various forms of the VPC for the random-intercepts model

Proportion of variance due to differences between schools
= intra-school correlation = σ²_v0 / (σ²_v0 + σ²_u0 + σ²_e)

Proportion of variance due to differences between classes (and schools)
= intra-class correlation = (σ²_v0 + σ²_u0) / (σ²_v0 + σ²_u0 + σ²_e)

Proportion of the between-class variance due to differences between schools
(equivalently, the correlation between two classes in the same school) = σ²_v0 / (σ²_v0 + σ²_u0)
10
Correlation structure of the 3-level model

Intra-class correlation (within the same school & same class): r1
Intra-school correlation (within the same school, different class): r2
Different schools: 0

[Original slide: a 10 x 10 correlation matrix for 10 pupils (P) in classes (C) in 3 schools (S): school 1 has pupils 1-2 in class 1 and pupil 3 in class 2; school 2 has pupils 1-2 in class 1 and pupils 3-4 in class 2; school 3 has pupils 1-2 in class 1 and pupil 3 in class 2. The matrix is block-diagonal by school: 1 on the diagonal, r1 between pupils in the same class, r2 between pupils in the same school but different classes, and 0 between pupils in different schools.]
11
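As an illustration (my addition, not from the original slides), the sketch below builds this block correlation matrix with NumPy from the school and class memberships shown above; the values r1 = 0.33 and r2 = 0.18 are borrowed from the Snijders & Bosker VPC example a few slides later.

```python
import numpy as np

def three_level_corr(school, clas, r1, r2):
    """Correlation matrix for pupils given their school and class labels.

    Same class (and school): r1; same school, different class: r2;
    different schools: 0; diagonal: 1.
    """
    school = np.asarray(school)
    clas = np.asarray(clas)
    same_school = school[:, None] == school[None, :]
    same_class = same_school & (clas[:, None] == clas[None, :])
    R = np.where(same_class, r1, np.where(same_school, r2, 0.0))
    np.fill_diagonal(R, 1.0)
    return R

# The 10 pupils from the slide: school and class membership
schools = [1, 1, 1, 2, 2, 2, 2, 3, 3, 3]
classes = [1, 1, 2, 1, 1, 2, 2, 1, 1, 2]
print(three_level_corr(schools, classes, r1=0.33, r2=0.18))
```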
Example: pupils within classes within schools (Snijders & Bosker data)
Response is the score on a maths test at age 14

                         Estimate   S.E.
Fixed
  β0                     7.96       0.23
Random
  σ²_v0 (school)         2.124      0.546
  σ²_u0 (class)          1.746      0.226
  σ²_e (pupil)           7.816      0.186

Total variance is 2.124 + 1.746 + 7.816 = 11.686
12
Variance Partition Coefficients:
pupils within classes within schools (Snijders & Bosker data)
Intra-school correlation
= 2.124/11.686 = 0.18 (similarity of pupils in the same school).
Intra-class correlation
= (2.124 + 1.746)/11.686 = 0.33 (similarity of pupils in the same class, in the same school).
The similarity of classes within the same school (correlation between mean maths scores for 2 randomly selected classes in a randomly selected school) is 2.124/(2.124 + 1.746) = 0.55.
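A minimal sketch (my addition) reproducing these VPCs from the estimated variance components:

```python
# Variance components from the Snijders & Bosker example
school, clas, pupil = 2.124, 1.746, 7.816
total = school + clas + pupil                       # 11.686

intra_school = school / total                       # ~0.18
intra_class = (school + clas) / total               # ~0.33
class_within_school = school / (school + clas)      # ~0.55

print(round(intra_school, 2), round(intra_class, 2), round(class_within_school, 2))
```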
13
Specifying models in MLwiN
• Three-level variance components for attainment
14
Specifying models in MLwiN
• Are there classes and/or schools where the gender gap is large, small or inverse?
• Student gender in the fixed part and variance functions at each level:

Level 3 variance:  σ²_v0 x²_0ijk + 2σ_v0v1 x_0ijk x_1ijk + σ²_v1 x²_1ijk
Level 2 variance:  σ²_u0 x²_0ijk + 2σ_u0u1 x_0ijk x_1ijk + σ²_u1 x²_1ijk
Level 1 variance:  σ²_e0 x²_0ijk + 2σ_e0e1 x_0ijk x_1ijk
15
Specifying models in MLwiN
• Is the Gender gap differential according to teaching style?
• Cross-level interactions between Gender and Teaching style in the
fixed part of the model
• i.e. main effects for gender & style, and a first-order interaction between
student gender and class teaching style
Fixed part
Cons: mean score for Male in
Formally-taught class
Female: differential for female in
formal class
Informal: differential for male in
informal class
Female*Informal: differential
for female in informal class
16
Residuals
• Key notion is that highest level residual is a
random, allowed-to-vary departure from
general relationship
• Each lower-level residual is an allowed-to-vary random
departure from the higher-level
departure
17
Level 3 residuals: school departures from the grand-mean line

[Figure: the grand-mean line ŷ_ijk = β_0 + β_1 x_ijk and a school line ŷ_ijk = β_0k + β_1 x_ijk; the level-3 residual v_0k is the vertical departure of the school line from the grand-mean line]
18
Level 2 residuals: class departures from the associated school line

[Figure: the level-2 residual u_0jk is the vertical departure of a class line ŷ_ijk = β_0jk + β_1 x_ijk from its school's line ŷ_ijk = β_0k + β_1 x_ijk]
19
Level-1 residuals: student departures from the associated class line

[Figure: the level-1 residual e_0ijk is the departure of a student's observed score from the class line ŷ_ijk = β_0jk + β_1 x_ijk]
20
Applying the model: the repeated cross-sectional model; changing school performance

[Classification diagram: School (Sc1, Sc2, Sc3, ...) > Cohort (e.g. 1985, 1986, 1987 within each school) > Student (St1 ... St9, St1 ... St25, St1 ... St32, etc.); schools differ in which cohorts are present and in cohort size]
• Modelling Exam scores for groups of students who entered
school in 1985 and a further group who entered in 1986.
• In a multilevel sense we do not have 2 cohort units but 2S
cohort units where S is the number of schools.
• The model can be extended to handle an arbitrary number of
cohorts with imbalance
21
Applying the model: the repeated cross-sectional model; changing school performance
• Modelling exam scores at age 16:
  Level 3: 139 state schools from the Inner London Education Authority
  Level 2: 304 cohorts, with a maximum of 3 cohorts in any one school
  Level 1: 115,347 pupils, with a maximum of 135 pupils in any one school cohort
• pupil-level variables: Sex, Ethnicity, Verbal Reasoning at age 11
• cohort-level variables: % of pupils in each school receiving free school meals in that year, % of pupils in the highest VR band in that year, the year that the cohort graduated
• school-level variables: the 'sex' of the school (Mixed, Boys, Girls); the school's religious denomination (Non-denominational, CofE, Catholic)
22
Further levels - as structures, etc
Some examples of 4-level nested structures:
• students within classes within schools within LEAs
• people within households within postcode sectors within regions
• and finally, repeated measures (measurement occasions) within students within cohorts within schools

[Classification diagram: School (Sc1, Sc2, ...) > Cohort (1990, 1991) > Student (St1, St2, ...) > Measurement occasion (O1, O2)]
Cohorts are now repeated measures on schools and tell us
about stability of school effects over time
Measurement occasions are repeated measures on students and
can tell us about students’ learning trajectories.
23
Further levels - in MLwiN
• Click on extra subscripts!
• Default is a maximum of 5 but can be
increased
13. Three-level Model Changing School
performance in London Schools, 1985-7: the
ethnic dimension
24
12 Logit models: an example of
a non-linear multilevel model
-
Generalised models
Multilevel logit
Extra-binomial variation
Estimation: quasi-likelihood and MCMC
VPC
Population average and subject specific
estimates
- Application: teenage employment in
Glasgow
25
Introduction to generalized multilevel
models
So far: the response is linearly related to the predictors and all random terms are assumed Normal
• Now: a range of other data types for the response; all can be handled routinely by MLwiN
• Achieved by 2 aspects:
  - a non-linear link between the response and the predictors
  - a non-Gaussian level-1 distribution
26
Response      Example                        Model
Binary        Yes/No                         Logit (or probit or log-log) model with binomial level-1 random term
Proportion    Proportion unemployed          Logit etc. with binomial level-1 random term
Categorical   Travel by train, car, foot     Logit model with multinomial random term; can handle ordered and unordered categories
Count         No. of crimes in an area       Log model with level-1 Poisson random term; can include an offset
Count         LOS (length of stay)           Log model with level-1 NBD random term; can include an offset

Focus here: specifying binomial multilevel models with a response that is either binary or a proportion, and their implementation in MLwiN.

Reading: Subramanian S V, Duncan C, Jones K (2001) Multilevel perspectives on modeling census data. Environment and Planning A 33(3) 399-417
27
Modelling proportions
• Proportions: e.g. the employment rate, conceived as the underlying propensity of being employed
• Linear probability model: use a standard regression model with a linear relationship and a Gaussian random term
• But there are 3 problems:
  (1) Nonsensical predictions: predicted proportions are unbounded, outside the range 0 to 1
  (2) Anticipated non-linearity as the bounds are approached
  (3) Heterogeneity: inherent unequal variance, dependent on the mean (i.e. the fixed part) and on the denominator
• A logit model with a binomial random term resolves all three problems (could also use probit or complementary log-log)
28
The multilevel logistic RI model
• The underlying probability or proportion π is non-linearly related to the predictor(s):

π = exp(β_0 + β_1 x_1 + ... + u_0j) / (1 + exp(β_0 + β_1 x_1 + ... + u_0j))

where e is the base of the natural logarithm

• Linearised by the logit transformation:

log_e(π / (1 − π)) = β_0 + β_1 x_1 + ... + u_0j;   u_0j ~ N(0, σ²_u0)

• The logit transformation produces a linear function of the parameters while keeping predicted probabilities bounded between 0 and 1, thereby solving problems 1 and 2
• NB parameters are on the logit scale
29
Solving problem 3: assume binomial variation for the level-1 random part
• The variance of the response is presumed binomial:

Var(y | π) = π(1 − π) / n

i.e. the observed proportion depends on the underlying proportion and the denominator
• Achieved in practice by replacing the constant variable at level 1 by a binomial weight, z, and constraining the level-1 variance to 1 for exact binomial variation
• The random (level-1) component can be written as

y_i = π_i + e_i z_i,   z_i = sqrt( π̂_i (1 − π̂_i) / n_i ),   σ²_e = 1
30
Moving between Proportions, Odds and Logits

      Raw outcome     Proportion (p)   Odds          Odds (p/(1-p))   Log of odds Log_e(p/(1-p))
A     5 out of 10     0.5              5 to 5        1.0              0
B     6 out of 10     0.6              6 to 4        1.5              0.41
C     8 out of 10     0.8              8 to 2        4                1.39

Going back from the logit scale (MLwiN: Logit, Expo and Alogit calculations):

      Logit     Odds = e^logit       Proportion = e^logit / (1 + e^logit)
A     0         e^0 = 1.0            0.5
B     0.41      e^0.41 = 1.5         0.6
C     1.39      e^1.39 = 4           0.8
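A quick numerical check of these conversions (my own illustration, not from the slides):

```python
import math

for p in (0.5, 0.6, 0.8):
    odds = p / (1 - p)
    logit = math.log(odds)
    back = math.exp(logit) / (1 + math.exp(logit))   # anti-logit recovers p
    print(f"p={p}  odds={odds:.2f}  logit={logit:.2f}  anti-logit={back:.2f}")
```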
31
Binomial and Extra-binomial variation
• Binomial variation is all due to underlying probability & values of the
denominator, level 1 random variation is not freely estimated
• More general approach: allow the level-1 variance to be estimated from the data (un-constrain the parameter σ²_e from 1)
• Over-dispersion (variance parameter > 1): unexplained variation in the response that is not adequately modelled by the fixed part
• Under-dispersion (< 1): suggests greater dependency than that expected under the binomial assumption; may indicate a missing important level in the model structure
• Can only be done with proportions, NOT binary data
• Application: French KM and Jones K (2006) Impact of definition on
the study of avoidable mortality Social Science & Medicine, 62 (6),
1443-1456
32
Estimation
Quasi-likelihood
(Marginal Quasi-Likelihood versus Penalised/Predictive Quasi-Likelihood; 1st and 2nd order)
- the model is linearised by a 1st- or 2nd-order Taylor series expansion and Goldstein's IGLS is applied
- MQL versus PQL: whether higher-level random effects are included in the linearisation
- MQL1 is the crudest approximation; estimates may be biased downwards (especially if the within-cluster sample size is small and the between-cluster variance is large, e.g. households), but it is stable
- PQL2 is the best approximation, but may not converge
- Advice: start with MQL1 to get starting values for PQL2
MCMC methods
- good quality estimates even where cluster size is small
- give a deviance-based criterion (DIC) for sequential model testing
- Advice: start with MQL1 and then switch to MCMC, then wait!
33
A simulation study
• To assess the bias of different estimators for binary
outcome
• Rodriguez, G, 'Multilevel Generalized Linear
Models', in de Leeuw, J and Meijer, E (Eds.) 2008,
Handbook of Multilevel Analysis, chapter 9, 335-376
• Generated 100 datasets with real structure (2449
births to 1558 mothers living in 161 communities in
Guatemala; ie pathological case) with known
coefficients, all fixed and random coefficients set
equal to 1
• Data and papers available from
http://data.princeton.edu/multilevel/
34
Results from simulation study

                       Fixed parameters                          Random parameters
Estimation method      Individual   Family      Community       Family      Community
True value             1.000        1.000       1.000           1.000       1.000
MQL-1                  0.738        0.744       0.771           0.100       0.732
MQL-2                  0.853        0.859       0.909           0.273       0.763
PQL-1                  0.808        0.806       0.831           0.432       0.781
PQL-2                  0.933        0.940       0.993           0.732       0.924
Quadrature (5 pts)     0.983        0.988       1.037           0.962       0.981
Quadrature (20 pts)    0.983        0.990       1.039           0.973       0.979
Gibbs                  0.971        0.978       1.022           0.922       0.953

Quadrature (e.g. gllamm in Stata) gives good-quality estimates, but the computational burden increases rapidly with the dimensionality of the problem: e.g. with 12 quadrature points a 3-level random-intercepts model requires the evaluation of the equivalent of 144 likelihoods, BUT with 12 points, 3 levels and a random intercept & slope, the equivalent of almost 21,000 likelihoods.
35
References on estimation
• Rodriguez, G., and Goldman, N. (1995). An assessment
of estimation procedures for multilevel models with binary
responses. J. Royal Statistical Society, A, 158: 73-90.
• Breslow, N. E. and Clayton, D. G. (1993). Approximate
inference in generalized linear mixed models. Journal
American Statistical Association, 88, 9-25.
• Goldstein, H., and Rasbash, J. (1996). Improved
approximations for multilevel models with binary
responses. Journal of the Royal Statistical Society, A,
159: 505-514.
• Rodríguez, G and Goldman, N (2001) Improved
estimation procedures for multilevel models with binary
response: a case study. Journal of the Royal Statistical
Society, Series A (Statistics in Society), 164(2), 339-355.
36
Variance Partitioning Coefficient
For a 2-level Normal-response random-intercepts model:

VPC = Level 2 variance / (Level 1 variance + Level 2 variance)

BUT with the logit model the level-2 variance is on the logit scale and the level-1 variance is on the probability scale, so they cannot be directly compared. Also the level-1 variance depends on the underlying proportion.

Possible solutions include
i) set the level-1 variance to the variance of a standard logistic distribution (π²/3 ≈ 3.29):

VPC = σ²_u / (σ²_u + 3.29)

This ignores the fact that the level-1 variance is not constant.
37
Variance Partitioning Coefficient (cont.)
Possible solutions include
ii) Simulation
Step 1: simulate M (say 5000) values u(m) for the level-2 random effect from N(0, σ̂²_u), using the estimated level-2 variance
Step 2: combine each simulated u(m) with the fixed-part estimates and particular values of the predictor variables to get predicted logits; anti-logit to get probabilities:

π*(m) = [1 + exp(−(β̂'x* + u(m)))]^(−1)

The level-2 variance on the probability scale is the variance of these values.
Step 3: calculate a level-1 variance for each of the 5000 simulations on the probability scale:

v*_1(m) = π*(m) (1 − π*(m))

and take the mean of these values to get the overall level-1 variance.
Step 4: use the usual VPC formula, now that the level-1 and level-2 variances are on the same scale.

Browne W J, Subramanian S V, Jones K, Goldstein H (2005) Variance partitioning in multilevel logistic models that exhibit overdispersion. Journal of the Royal Statistical Society A, 168(3), 599-613
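A minimal sketch of this simulation method (my own illustration; the fixed-part logit of 0 and level-2 variance of 0.5 are made-up values, not estimates from the slides):

```python
import numpy as np

rng = np.random.default_rng(1)
M = 5000
sigma2_u = 0.5        # assumed level-2 variance on the logit scale (illustrative)
xb = 0.0              # assumed fixed-part predicted logit for a chosen covariate pattern

u = rng.normal(0.0, np.sqrt(sigma2_u), M)      # Step 1: simulate level-2 effects
p = 1.0 / (1.0 + np.exp(-(xb + u)))            # Step 2: anti-logit to probabilities
level2_var = p.var()                           #         level-2 variance, probability scale
level1_var = (p * (1.0 - p)).mean()            # Step 3: mean Bernoulli variance
vpc = level2_var / (level2_var + level1_var)   # Step 4: usual VPC formula
print(round(vpc, 3))
```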
38
Binomial models: The Median Odds Ratio

Level-2 variance    MOR     VPC
0.06                1.25    0.02
0.18                1.50    0.05
0.34                1.75    0.09
0.53                2.00    0.14
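For reference (my addition, following Larsen & Merlo's definition rather than anything stated on this slide), the MOR for a logistic random-intercepts model is MOR = exp(√(2σ²_u)·Φ⁻¹(0.75)), and the VPC column uses the latent-variable formula σ²_u/(σ²_u + π²/3). A quick check of the table:

```python
import math
from statistics import NormalDist

def mor(sigma2_u):
    # Larsen & Merlo median odds ratio
    return math.exp(math.sqrt(2 * sigma2_u) * NormalDist().inv_cdf(0.75))

def vpc_latent(sigma2_u):
    # latent-variable VPC with logistic level-1 variance pi^2/3 (~3.29)
    return sigma2_u / (sigma2_u + math.pi ** 2 / 3)

for v in (0.06, 0.18, 0.34, 0.53):
    print(v, round(mor(v), 2), round(vpc_latent(v), 2))
```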
39
Population average and cluster specific estimates
MLwiN gives directly the CLUSTER-SPECIFIC estimates (often called unit- or subject-specific because of their use in repeated-measures studies)
- the fixed effects conditional on the higher-level random effects
- the effect, for a particular 'individual' or unit, of a unit change in a predictor
NOT the POPULATION-AVERAGE estimates (e.g. GEE)
- i.e. the marginal expectation of the dependent variable across the population, "averaged" across the random effects
- the effect of a change in the whole population if everyone's predictor variable changed by one unit
In non-linear models these are different, and the PA estimates will generally be smaller than the CS estimates, especially as the size of the random effects grows.
Can derive PA from CS but not vice versa (newer versions give both by simulation):
- the median of the predictive distribution is the unit-specific (cluster-specific) estimate
- the mean of the predictive distribution is the population-average estimate
Ritz, J and Spiegelman, D (2004) Equivalence of conditional and marginal regression models for clustered and longitudinal data. Statistical Methods in Medical Research, 13(4), 309-323
40
Data for the example: teenage employment
• "Ungrouped", that is individual, data
• Model a binary outcome of employed or not, and two individual predictors

Name    Person   District   Employed   Qualif   Sex
Craig   1        1          Yes        Low      Male
Subra   2        1          Yes        High     Male
Nina    3        1          Yes        Low      Fem
Min     4        1          No         Low      Fem
Myles   5        1          No         High     Male
...
Sarah   12       50         Yes        High     Fem
Kat     13       50         No         Low      Fem
Colin   14       50         Yes        Low      Male
Andy    15       50         No         High     Male
41
Same data as a multilevel structure: a set of tables for each district

For each postcode sector there is a 2 x 2 table of employment cross-classifying QUALIF (Low/High) by GENDER (Male/Female), together with the sector's unemployment rate, e.g.:

  Sector G1A (unemployment rate 15%):  Low: 5 out of 6, 3 out of 12;   High: 2 out of 7, 7 out of 9
  Sector G1B (12%):                    Low: 5 out of 9, 8 out of 8;    High: 7 out of 11, 7 out of 9
  Sector G99Z (3%):                    Low: 3 out of 3, -;             High: 2 out of 3, ... out of 5

• Level 1: cell in the table
• Level 2: postcode sector
• Margins: define the two categorical predictors
• Internal cells: the response, e.g. 5 out of 6 are employed
42
Turning a table into a model: 1

Within district j the table of employment proportions is

                GENDER Male         GENDER Female       Unemp rate
Qualif Low      p1j = e1j / n1j     p2j = e2j / n2j     wj
Qualif High     p3j = e3j / n3j     p4j = e4j / n4j

RESPONSE
L_ij = predicted log-odds of employment for type of person i in place j

FIXED part (contrast coding, base = unqualified male):
  β_0 x_0  (base: unqualified male)
+ β_1 x_1  (contrast: unqualified female)
+ β_2 x_2  (contrast: qualified male)
+ β_3 x_3  (contrast: qualified female)

RANDOM part:
Level 2:  + u_0j
Level 1 (separate coding):
  + e_1j z_1j (unqualified male)  + e_2j z_2j (unqualified female)
  + e_3j z_3j (qualified male)    + e_4j z_4j (qualified female)
43
Turning a table into a model: 3
So, for i = 1,...,4 cells within postcode sectors j:

RESPONSE     FIXED PART               RANDOM PART
L_1j    =    β_0 x_0                  + u_0j + e_1j z_1j
L_2j    =    β_0 x_0 + β_1 x_1        + u_0j + e_2j z_2j
L_3j    =    β_0 x_0 + β_2 x_2        + u_0j + e_3j z_3j
L_4j    =    β_0 x_0 + β_3 x_3        + u_0j + e_4j z_4j

with e_ij z_ij,   z_ij = sqrt( π̂_ij (1 − π̂_ij) / n_ij ),   σ²_e = 1 if binomial.
44
Population average and cluster specific estimates
Null random-intercepts LOGIT model
CLUSTER-SPECIFIC = medians: the probability of being employed for the median teenager in the median district
POPULATION-AVERAGE = means: the probability of being employed for the mean teenager in the mean district
Little difference here as the level-2 variance is relatively small
45
Multilevel modelling of binary data
• Exactly the same except
• The response is either 1 or 0
• The denominator is a set of 1’s
• So that a ‘Yes’ is 1 out of 1 , while a ‘No’ is 0 out of 1
• Insufficient information to estimate over-dispersion
46
Laboratory 12: Binary logistic modelling
In this exercise we are going to model Math5 as a Pass/Fail; it makes sense to start from scratch!
1 Input the JSP data and name the variables as c1 'school', c2 'pupil', c3 'math3', c4 'math5', c5 'male'
2 Sort the data: pupils at level 1 and schools at level 2
3 Centre the predictor Math3 around its mean of 25
4 Recode Math5 so that 0 to 34 is given 0 (i.e. a fail) and 34 to 100 is given 1 (a pass).
Store this in a new variable c6 called 'Pass'. It is a good idea to overlap the values (34) so that no decimal values are missed. Do not declare it as a categorical variable but keep the numerical values. Use Tabulate to see what proportion of pupils have passed the test.
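For readers working outside MLwiN, a minimal pandas sketch of the same data preparation (steps 2-4); the file name jsp.csv and the column names school, pupil, math3, math5, male are assumptions that mirror the lab's naming:

```python
import pandas as pd

# Assumed file and column names mirroring the lab (c1-c5)
jsp = pd.read_csv("jsp.csv", names=["school", "pupil", "math3", "math5", "male"])

jsp = jsp.sort_values(["school", "pupil"])      # step 2: sort by higher-level unit first
jsp["math3c"] = jsp["math3"] - 25               # step 3: centre Math3 around 25
jsp["pass"] = (jsp["math5"] >= 34).astype(int)  # step 4: treat 34-100 as a pass (1), below as fail (0)

print(jsp["pass"].mean())                       # proportion of pupils passing the test
```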
47
Binary logistic modelling (continued)
5 Create a constant and a denom; both of these are a set
of 1’s ; same length as pupils; name them ‘Cons’ ‘Denom’
6 Specify a model with ‘Pass’ as the response, pupils at
level 1; schools are level 2, and a Binomial distribution
7 Declare X0 to be the constant and include this in the fixed
part and the level 2 random part (NOT level 1)
8 Check that the hierarchy is correct
9 Check that fpath directories for discrete macros & for
temporary files to the current directory can be written to
10 Set the default options for estimation: exact binomial, MQL, 1st order
11 Start the model, allow it to converge and interpret
12 Change the non-linear settings to PQL, 2nd order.......
48
Binary logistic modelling (continued)
13 More to convergence and note any changes
14 Add an extra term for Math3-25 to the fixed part only
15 More to convergence and interpret
16 Make predictions for the fixed part and the level-2 random part and store the logits in c50
17 Make a varying-relations plot for schools by plotting the predicted logits against Math3
   Anti-logit the values in c50 to get a plot of the probabilities in the varying-relations plot
and over to you...
49
13: Modelling Count Data: Outline
• Characteristics of count data and the Poisson distribution
• Applying the Poisson: Flying bomb strikes in South London
• Deaths by horse-kick: a single-level Poisson model fitted in
MLwiN
• Overdispersion: types and consequences, the unconstrained
Poisson, the Negative Binomial
•Taking stock: 4 distributions for modeling counts
•Number of extramarital affairs: the incidence rate ratio (IRR);
handling categorical & continuous predictors; comparing
model with DIC
•Titanic survivor data; taking account of exposure, the offset
•Multilevel Poisson and NBD models; estimation and VPC
•Applications: HIV in India Chapter 12
Some characteristics of count data
Very common in the social sciences: number of children; number of marriages; number of arrests; number of traffic accidents; number of flows; number of deaths
Counts have particular characteristics:
• Integers & cannot be negative
• Often positively skewed, with a 'floor' of zero
• In practice often rare events which peak at 1, 2 or 3 and are rare at higher values
Modelled by:
• Logit regression models the log-odds of an underlying propensity of an outcome
• Poisson regression models the log of the underlying rate of occurrence of a count
51
Theoretical Poisson distribution I
The Poisson distribution results if the underlying random events per unit time or space have a constant mean rate of occurrence (λ) and each event is independent.
Siméon-Denis Poisson (1838) Research on the Probability of Judgments in Criminal and Civil Matters

Applying the Poisson: flying-bomb strikes in South London
Key research question: falling at random, or under a guidance system?
If the strikes are random, independent events they should be distributed spatially as a Poisson distribution.
Divide south London into 576 equally sized small areas (0.24 km²), count the number of bombs in each area and compare to a Poisson.

Mean rate = λ = [229(0) + 211(1) + 93(2) + 35(3) + 7(4) + 1(5)] / 576 = 0.929 hits per unit square

Hits         0       1       2       3       4      5+
Observed     229     211     93      35      7      1
Poisson      227.5   211.3   98.15   30.39   7.06   1.54

Very close fit; concluded random
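A short sketch (my addition) reproducing the expected Poisson counts from the observed frequencies:

```python
from scipy.stats import poisson

observed = {0: 229, 1: 211, 2: 93, 3: 35, 4: 7, 5: 1}
n = sum(observed.values())                              # 576 areas
lam = sum(k * v for k, v in observed.items()) / n       # ~0.929 hits per square

for k in range(5):
    print(k, observed[k], round(n * poisson.pmf(k, lam), 1))
print("5+", observed[5], round(n * poisson.sf(4, lam), 2))  # P(X >= 5)
```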
52
Theoretical Poisson distribution II

[Figure: Poisson probability mass function (PMF) for three different mean occurrences: 1, 4 and 10]

When the mean = 1 the distribution is very positively skewed.
As the mean occurrence increases (a more common event), the distribution approaches the Gaussian; so use the Poisson for 'rareish' events with a mean below about 10.
Fundamental property of the Poisson: mean = variance.

Simulated 10,000 observations according to a Poisson:

Mean     Variance   Skewness
0.993    0.98       1.00
4.001    4.07       0.50
10.03    10.38      0.33

The variance is not a freely estimated parameter as it is in the Gaussian.
53
Death by horse-kick I: the data
Bortkiewicz, L von (1898) The Law of Small Numbers, Leipzig
Number of soldiers killed annually by horse-kicks in the Prussian cavalry; 10 corps over 20 years (occurrences per unit time).
The full data are 200 corps-years of observations:
1 0 0 2 0 2 0 1 1 2 0 0 0 2 0 0 2 0 1 1 1 1 ......

As a frequency distribution (grouped data):

Deaths       0     1    2    3   4   5+
Frequency    109   65   22   3   1   0

Mean 0.61, Variance 0.611, Number of obs 200

[Figure: bar chart of the frequency distribution of deaths per corps-year]

Interpretation: a mean rate of 0.61 deaths per corps-year (i.e. rare).
The mean equals the variance, therefore a Poisson distribution.
54
Death by horse-kick II: as a Poisson
Again: the Poisson results if the underlying random events per unit time or space have a constant mean (λ) rate of occurrence, and each event is independent.
With a mean (and therefore a variance) of 0.61:

Deaths       0     1    2    3   4   5+
Frequency    109   65   22   3   1   0
Theory       109   66   20   4   1   0

Formula for the Poisson PMF:

PMF(x) = e^(−λ) λ^x / x!,   x = 0, 1, 2, ...

• e is the base of the natural logarithm (2.7182)
• λ is the mean (shape parameter): the average number of events in a given time interval
• x! is the factorial of x

E.g.: with a mean rate λ of 0.61 accidents per corps-year, what is the probability of getting 3 accidents in a corps in a year?

PMF(3) = e^(−0.61) × 0.61³ / (3 × 2 × 1) ≈ 0.021
55
Horse-kick III: as a single-level Poisson model
General form of the single-level model:

y_i ~ Poisson(π_i)
y_i = π_i + e_0i z_0i
π_i = exp(β_0 x_0i + β_1 x_1i + ...)
log_e(π_i) = β_0 x_0i + β_1 x_1i + ...
z_0i = π̂_i^0.5,   var(e_0i) = 1

• The observed count is distributed as an underlying Poisson with a mean rate of occurrence π; that is, as an underlying mean and a level-1 random term with weight z_0 (the Poisson 'weight')
• The mean rate is related to the predictors non-linearly, as an exponential relationship
• Model log_e(π) to get a linear model (log link)
• The Poisson weight is the square root of the estimated underlying count, re-estimated at each iteration
• The variance of the level-1 residuals is constrained to 1
• Modelling on the log_e scale means that a negative count cannot be predicted on the raw scale
• The level-1 variance is constrained to be exactly Poisson (variance = mean)
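As a check (my own sketch, using statsmodels rather than MLwiN), fitting the null Poisson model to the 200 corps-year counts reproduces the intercept of about −0.494 reported on the next slide, i.e. log(0.61):

```python
import numpy as np
import statsmodels.api as sm

# Expand the grouped horse-kick frequencies into 200 corps-year counts
deaths = np.repeat([0, 1, 2, 3, 4], [109, 65, 22, 3, 1])

# Null (intercept-only) Poisson model with a log link
model = sm.GLM(deaths, np.ones((len(deaths), 1)), family=sm.families.Poisson())
fit = model.fit()
print(fit.params[0])              # ~ -0.494 on the log scale
print(np.exp(fit.params[0]))      # ~ 0.61 deaths per corps-year
```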
56
Horse-kick IV: Null single-level Poisson model in MLwiN
 The raw ungrouped counts are modeled with a log link and
a variance constrained to be equal to the mean
 -0.494 is the mean rate of occurrence on the log scale
 Exponentiate -0.494 to get the mean rate of 0.61
 0.61 is interpreted as RATE; the number of events per unit
time (or space), ie 0.61 horse-kick deaths per corps-year
57
Overdispersion I: types and consequences
So far: equi-dispersion, variance equal to the mean
• Overdispersion: variance > mean; a long tail, e.g. length of stay (common)
• Under-dispersion: variance < mean; data more alike than a pure Poisson process; in a multilevel setting, the possibility of a missing level
Consequences of overdispersion
• Fixed-part SEs: "point estimates are accurate but they are not as precise as you think they are"
• In multilevel models, mis-estimation of the higher-level random part
Apparent and true overdispersion: a thought experiment
• Number of extra-marital affairs: men and women with different means

Who?           Mean   Var    Comment
Men & Women    0.55   0.76   Overdispersed
Men            1.00   1.00   Poisson
Women          0.10   0.10   Poisson

• Apparent: a mis-specified fixed part that has not separated out distributions with different means
• True: a genuine stochastic property of more inherent variability
• In practice, model the fixed part as well as possible, and allow for overdispersion
58
Overdispersion II: the unconstrained Poisson
•Deaths by horse-kick
•estimate an overdispersed Poisson
• allow the level-1
variance to be
estimated
 Not significantly different from 1;
 No evidence that this is not a
Poisson distribution
59
Overdispersion III: the Negative Binomial
• Instead of fitting an overdispersed Poisson, could fit an NBD model
• Handles long-tailed distributions: an explicit model in which the variance is greater than the mean
• Can even have an over-dispersed NBD

y_i ~ NBD(π_i)
y_i = π_i + e_0i z_0i + e_1i z_1i
log_e(π_i) = β_0 x_0i + β_1 x_1i + ...
z_0i = π̂_i^0.5,   z_1i = π̂_i
var(y_i | π_i) = π_i + π_i² / v

• Same log link, but the NBD has 2 parameters for the level-1 variance, i.e. a quadratic level-1 variance; v is the overdispersion parameter
60
Overdispersion IV: the Negative Binomial
• Horse-kick analysis: null single-level NBD model; essentially no change, and v is estimated to be 0.00 (see with Store model / Compare stored models)
• Overdispersed negative binomial: no evidence of overdispersion; deaths are independent
61
Linking the Binomial and the Poisson: I
First, Bernoulli and Binomial
• The Bernoulli is a distribution for binary discrete events
• y is the observed outcome, i.e. 1 or 0
• E(y) = π, the underlying propensity/probability of occurrence
• Var(y) = π(1 − π)

Mean π     Variance π(1 − π)
0.01       0.01 × 0.99 = 0.0099
0.5        0.5 × 0.5 = 0.2500
0.99       0.99 × 0.01 = 0.0099

• The Binomial is a distribution for discrete events out of a number of trials
• y is the observed outcome; n is the number of trials
• E(y) = π, the underlying propensity of occurrence
• Var(y) = [π(1 − π)] / n
• Least variation when the denominator is large (more reliable), and as the underlying probability approaches 0 or 1
62
Linking the Binomial and the Poisson: II
• Poisson is limit of a binomial process in which prob → 0, n→∞
• Poisson describes the probability that a random event will occur in
a time or space interval when the probability is very small, the
number of trials is very large, and each event is independent
• EG : The probability that any automobile is involved in an accident
is very small, but there are very many cars on a road in a day
so that the distribution (if each crash is independent) follows a
Poisson count
• If non-independence of crashes (a ‘pile-up’), then over-dispersed
Poisson/NBD, latter used for ‘contagious’ processes
• In practice, Poisson and NBD used for rare occurrences, less than
10 cases per interval,& hundreds or even thousands for
denominator/ trials [Clayton & Hills (1993)Statistical Models in
Epidemiology OUP]
63
Taking stock: 4 distributions for counts
 If common rate of occurrence (mean >10) then use raw counts and
Gaussian distribution (assess Normality assumption of the residuals)
 If rare rate of occurrence, then use over-dispersed Poisson or NBD; the
level-1 unconstrained variance estimate will allow assessment of departure
from equi-dispersion; improved SE’s, but biased estimates if apparent
overdispersion due to model mis-specification
 Use the Binomial distribution if count is out of some total and the event is
not rare; that is numerator and denominator of the same order
•Mean variance
relations for 4 different
distributions that could
be used for counts
64
Modeling the number of extra-marital affairs: single-level Poisson with a single categorical predictor
Extract of the raw data (601 individuals, from Fair 1978):

Affairs   Cons   Children   Religious   YearsMarried   Age
0         1      NoKids     Slightly    10-15 years    37.0
0         1      NoKids     Somewhat    < 4 years      27.0
3         1      NoKids     Slightly    < 4 years      27.0
0         1      WithKids   Anti        6-10 years     32.0
3         1      WithKids   Slightly    < 4 years      27.0
0         1      WithKids   Very        6-10 years     57.0

Fair, R C (1978) A theory of extramarital affairs. Journal of Political Economy, 86(1), 45-61

Single categorical predictor, Children, with NoKids as the base.
Understanding customized predictions...
65
Single-level Poisson with a single categorical predictor
Understanding customized predictions
• On the log scale:
  NoKids: -0.092
  WithKids: -0.092 + 0.606 = 0.514
First use the equation to get the underlying log-number of events, then exponentiate to get the estimated count (since married).
• As mean/median counts:
  NoKids: expo(-0.092) = 0.912
  WithKids: expo(-0.092 + 0.606) = 1.672
Those with children have a higher average rate of affairs (but have they been married longer?)
66
Modeling the number of extra-marital affairs: the incidence rate ratio (IRR)
So far, mean counts:
  NoKids: expo(-0.092) = 0.912
  WithKids: expo(-0.092 + 0.606) = 1.672
But also as an IRR, comparing the ratio of those with and without kids:
  IRR = 1.672 / 0.912 = 1.833
That is, WithKids have an 83% higher rate.
BUT this can be obtained directly from the model by exponentiating the estimate for the contrasted category: expo(0.606) = 1.833
Rules:
a) exponentiating the estimates for the (constant plus the contrasted category) gives the mean rate for the contrasted category
b) exponentiating the contrasted category alone gives the IRR in comparison to the base category
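A two-line numerical check of those rules (my addition):

```python
import math

base = math.exp(-0.092)                  # mean rate, NoKids  ~0.912
contrast = math.exp(-0.092 + 0.606)      # mean rate, WithKids ~1.672
print(contrast / base, math.exp(0.606))  # both give the IRR   ~1.833
```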
67
Why is the exponentiated coefficient an IRR?
• As always, the estimated coefficient is the change in the response corresponding to a one-unit change in the predictor; here the response is the underlying logged count
• When X_i is 0 (NoKids):   log count | X_i = 0  is  β_0
• But when X_i is 1 (WithKids):  log count | X_i = 1  is  β_0 + β_1
• Subtracting the first equation from the second gives
  (log count | X_i = 1) − (log count | X_i = 0) = β_1
• Exponentiating both sides gives (note the division sign)
  (count | X_i = 1) / (count | X_i = 0) = exp(β_1)
• Thus exp(β_1) is a rate ratio corresponding to the ratio of the mean number of affairs for a with-child person to the mean number of affairs for a without-child person
• Incidence: the number of new cases
• Rate: because it is the number of events per unit time or space
• Ratio: because it is the ratio of two rates
68
Modeling the number of extra-marital affairs: changing the base category
• Previously the contrasted category was WithKids: +0.606
• Now the contrasted category is NoKids: -0.606
• Changing the base simply produces a change of sign on the log_e scale
• Exponentiating the contrasted category:
  Before: expo(+0.606) = 1.833
  Now: expo(-0.606) = 0.546
• Doubling the rate is +0.693 on the log_e scale; halving the rate is -0.693
• An IRR of 0.111 expresses the same relationship as a 9-fold increase the other way round, but is more difficult to appreciate
• Advice: choose the base category to be the one with the lowest mean rate; you then get positive contrasted estimates, always comparing a larger value to a base of 1
69
Affairs: modeling a set of categorical predictors
A model with 'years married' included, with < 4 years as the base.
Customised predictions: mean rate, IRR, & graph with 95% CIs

YrsMarried    A: Log estimate     Mean rate (expo A)   B: Differential log estimate   IRR (expo B)
< 4 years     -0.297              0.743                0.000                          1.000
4-5 years     -0.297 + 0.773      1.616                0.773                          2.166
6-10 years    -0.297 + 0.932      1.888                0.932                          2.540
10-15 years   -0.297 + 1.041      2.103                1.041                          2.832
70
Affairs: modeling a continuous predictor, Age
Age is entered as a 2nd-order polynomial centred around 17 years (the youngest person in the survey; also the lowest rate).
To get the mean rate as it changes with age:
  expo(-0.990 + 0.149·(Age-17) − 0.003·(Age-17)²)
To get the IRR for a person aged 33 compared to one aged 17:
  expo(0.149·16 − 0.003·16²)   (drop the constant!)
Easiest to interpret as graphs!
71
Affairs: a SET of predictors & models

Term                  Poisson   SE      Extra-Poisson   SE      NBD      SE
Fixed
Cons                  -1.33     0.20    -1.33           0.53    -1.56    0.51
Years married (base < 4 years)
  4-5 years           0.89      0.13    0.89            0.34    1.20     0.36
  6-10 years          1.00      0.14    1.00            0.38    1.01     0.41
  10-15 years         1.31      0.15    1.31            0.40    1.42     0.42
Religious (base Very)
  Somewhat            -0.00     0.15    -0.00           0.39    0.06     0.37
  Slightly            0.95      0.14    0.95            0.37    1.28     0.39
  Non                 0.87      0.14    0.87            0.37    1.16     0.38
  Anti                1.36      0.16    1.36            0.41    1.52     0.47
Children
  WithKids            -0.06     0.11    -0.06           0.28    0.25     0.29
Age
  (age-17)^1          0.05      0.02    0.05            0.05    0.01     0.05
  (age-17)^2          -0.00     0.00    -0.00           0.00    0.00     0.00
Random part
  Var                 1.00      0.00    6.78            0.39    1.00     0.00
  NBD var (v)                                                   5.15     0.36

• Notice the substantial overdispersion
• Poisson & extra-Poisson: no change in the estimates; some change for the NBD
• Notice the larger SEs when allowing for overdispersion; the NBD is the most conservative
• In the full model, WithKids is not significant
72
NBD model
for Marital
Affairs
 IRR of 1 for Under 4 years
married, Very religious, No
children, aged 17
 Previous Age effect is
really length of marriage
Used comparable vertical
axes, range of 4
73
NBD model for number of Extra-Marital Affairs
 With 95% confidence intervals
 NB that they are asymmetric on the unlogged scale
74
Affairs: evaluating a sequence of models using the DIC
• The likelihood, and hence the deviance, is not available for Poisson and NBD models fitted by quasi-likelihood
• The DIC criterion is available through MCMC; it typically needs a larger number of iterations than Normal & Binomial models (suggested default is 50k not 5k)

Terms          Null       + Relig     + YrsMar    + Kids     + Age2
Cons           0.375      -0.139      -1.131      -1.135     -1.352
Slightly                  0.818       1.031       1.040      0.963
Somewhat                  -0.017      0.050       0.056      0.005
Anti                      1.086       1.406       1.412      1.367
Not                       0.641       0.917       0.925      0.881
4-5 years                             0.918       0.943      0.882
6-10 years                            1.077       1.100      0.994
10-15 years                           1.265       1.290      1.303
WithKids                                          -0.029     -0.051
(age-17)^1                                                   0.048
(age-17)^2                                                   -0.001
DIC            3421.5     3296.8      3065.8      3068.0     3052.5
ΔDIC                      -124.653    -231.074    +2.176     -15.440
pD             1          5           8           9          11

• Currently MCMC is not available in MLwiN for over-dispersed Poisson nor NBD models, so Wald tests in the Intervals and tests window have to be used instead
75
Titanic survivor data: taking account of exposure
• So far the response has been an observed count; now we want to model a count given exposure. E.g. only 1 high-class female child survived, but only 1 was exposed!
Here there are 2 possible measures of exposure:
a) the number of potential cases; could use a binomial
b) the expected number if everyone had the same exposure (i indexes a cell)

Overall survival rate = total survivors / total exposed = 499/1316 = 0.379
Exp_i = Cases_i × survival rate

The latter is often used to treat the exposure as a nuisance parameter & allows calculation of Standardised Rates: SR_i = (Obs_i / Exp_i) × 100

Survive   Cases   Age     Gender   Class   Exp     SR
14        168     Adult   Men      Mid     63.7    22.0
75        462     Adult   Men      Low     175.2   42.8
13        48      Child   Men      Low     18.2    71.4
57        175     Adult   Men      High    66.4    85.9
14        31      Child   Women    Low     11.8    119.1
76        165     Adult   Women    Low     62.6    121.5
80        93      Adult   Women    Mid     35.3    226.9
140       144     Adult   Women    High    54.6    256.4
13        13      Child   Women    Mid     4.93    263.7
1         1       Child   Women    High    0.38    263.7
11        11      Child   Men      Mid     4.17    263.7
5         5       Child   Men      High    1.89    263.7

Previous examples:
Horse-kick: exposure removed by design (200 corps-years)
Affairs: included length of marriage; theoretically interesting
76
Modeling SRs: the use of the OFFSET
Model: SR_i = (Obs_i / Exp_i) = F(Age_i, Gender_i, Class_i)
where i is a cell, a group with the same characteristics.
Aim: are the observed survivors greater or less than expected, and how are these 'differences' related to a set of predictor variables?

As a non-linear model:
E(SR_i) = E(Obs_i / Exp_i) = exp(β_0 x_0i + β_1 x_1i + ...)

As a linear model (division of the raw data becomes subtraction of a log):
log_e(Obs_i) − log_e(Exp_i) = β_0 x_0i + β_1 x_1i + ...

As a model with an offset, moving log_e(Exp_i) to the right-hand side and constraining its coefficient to be 1, i.e. Exp becomes a predictor variable:
log_e(Obs_i) = 1.0 × log_e(Exp_i) + β_0 x_0i + β_1 x_1i + ...

NB MLwiN automatically log_e-transforms the observed response; you have to create the log_e of the expected count and declare it as an offset.
[Photo: Sir John Nelder]
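Outside MLwiN the same offset idea looks like this; a minimal statsmodels sketch (my illustration, assuming arrays obs, exp_count and a design matrix X have already been built from the Titanic cells):

```python
import numpy as np
import statsmodels.api as sm

# obs: observed survivors per cell; exp_count: expected survivors per cell;
# X: design matrix with a constant plus age/gender/class dummies (assumed already built)
def fit_sr_model(obs, exp_count, X):
    # Poisson log-linear model; log(expected) enters as an offset with its coefficient fixed at 1
    model = sm.GLM(obs, X, family=sm.families.Poisson(), offset=np.log(exp_count))
    return model.fit()
```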
77
Surviving on the Titanic as a log-linear model
• Include the offset
• Fit a saturated model, i.e. Age*Gender*Class (2*2*3): 12 terms for 12 cells
• Make predictions on the log_e scale (must include the constant); exponentiate all terms to get departures from the expected rate, that is, the modeled SRs

Age     Gender   Class   Log estimate   Modeled SR   SR = Obs/Exp
Child   Women    Low     0.17           1.19         1.19
Child   Women    Mid     0.97           2.64         2.64
Child   Women    High    0.97           2.64         2.64
Child   Men      Low     -0.34          0.71         0.71
Child   Men      Mid     0.97           2.64         2.64
Child   Men      High    0.97           2.64         2.64
Adult   Women    Low     0.19           1.21         1.21
Adult   Women    Mid     0.82           2.27         2.27
Adult   Women    High    0.94           2.56         2.56
Adult   Men      Low     -0.85          0.43         0.43
Adult   Men      Mid     -1.52          0.22         0.22
Adult   Men      High    -0.15          0.86         0.86
78
Titanic survival: parsimonious model
 Remove insignificant terms starting with 3-way interactions for
High*women*children
Customized
predictions: Very
low rates of survival
for Low and Middle
class adult men;
large gender gap
for adults, but not
for children
79
Titanic survival: parsimonious model
• Modeled SRs and descriptive SRs
• Ordered by worst survival
• Estimated SRs only shown if the 95% CIs do not include 1.0

Age     Gender   Class   Est SR   SR
Adult   Men      Mid     0.22     0.22
Adult   Men      Low     0.42     0.43
Adult   Women    Low     1.21     1.21
Child   Men      High    1.71     2.64
Adult   Women    Mid     2.26     2.27
Child   Women    High    2.56     2.64
Adult   Women    High    2.57     2.56
Child   Women    Mid     2.62     2.64
Child   Men      Mid     2.68     2.64
Child   Women    Low     *        1.19
Child   Men      Low     *        0.71
Adult   Men      High    *        0.86
80
Two-level multilevel Poisson

y_ij ~ Poisson(π_ij)
y_ij = π_ij + e_0ij z_0ij
log_e(π_ij) = β_0 x_0ij + β_1 x_1ij + ... + u_0j
u_0j ~ N(0, σ²_u0)
z_0ij = π̂_ij^0.5,   var(e_0ij) = 1

One new term: the level-2 differential u_0j, on the log_e scale, is assumed to come from a Normal distribution with variance σ²_u0.
• Can also fit a Poisson multilevel model with an offset, and an NBD multilevel model, in MLwiN
81
Estimation of multilevel Poisson and NBD models in MLwiN I
• Same options as for binary and binomial responses
• Quasi-likelihood, and therefore MQL or PQL, fitted using IGLS/RIGLS: fast, but no deviance (have to use Wald tests); may be troubled by a small number of higher-level units; simulations have shown that MQL tends to overestimate the higher-level variance parameters
• MCMC estimates: good quality, and the DIC can be used to compare Poisson models; but currently MCMC is not possible for extra-Poisson nor for NBD models
• MCMC in MLwiN often produces highly correlated chains (in part because the parameters of the model are highly correlated; variance = mean). It therefore requires a substantial number of simulations, typically much larger than for Normal or Binomial models
82
Estimation of multilevel Poisson and NBD models in MLwiN II
• Possibility to output to WinBUGS and use the univariate AR sampler and the Gamerman (1997) method, which tends to produce less correlated chains, but WinBUGS is generally considerably slower [Gamerman, D (1997) Sampling from the posterior distribution in generalized linear mixed models. Statistics and Computing 7, 57-68]
• Advice: start with IGLS PQL; switch to MCMC and be prepared to make 500,000 simulations (suggest 1-in-10 thinning to store the chains); use the effective sample size (ESS) to assess the required chain length, e.g. an ESS of at least 500 for the key parameters of interest; compare the results and contemplate using PQL and an over-dispersed Poisson
• Freely available software MIXPREG for multilevel Poisson counts including offsets; uses full-information maximum likelihood estimated by quadrature: http://tigger.uic.edu/~hedeker/mixpcm.PDF
83
VPC for Poisson models
Can either use
• the simulation method to derive the VPC (modify the binomial procedure), or
• the exact method: http://people.upei.ca/hstryhn/iccpoisson.ppt (Henrik Stryhn)

VPC for the two-level random-intercepts model (available for other models too):

VPC = Upper / (Upper + Lower)
Upper = e^[2(β_0 x_0i + β_1 x_1i + ...) + 2σ²_u0] − e^[2(β_0 x_0i + β_1 x_1i + ...) + σ²_u0]
Lower = e^[(β_0 x_0i + β_1 x_1i + ...) + σ²_u0 / 2]

[Figure: VPC versus mean rate for three values of the level-2 variance on the log_e scale (1.0, 1.5, 2.0). Clearly the VPC depends on the linear predictor and on σ²_u0.]
84
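A small sketch of that exact formula (my addition); xb is the linear predictor on the log scale and sigma2_u the level-2 variance:

```python
import math

def poisson_vpc(xb, sigma2_u):
    """Exact VPC for a two-level random-intercepts Poisson model (Stryhn)."""
    upper = math.exp(2 * xb + 2 * sigma2_u) - math.exp(2 * xb + sigma2_u)  # level-2 component
    lower = math.exp(xb + sigma2_u / 2)                                    # level-1 (Poisson) component
    return upper / (upper + lower)

for xb in (-1.0, 0.0, 1.0):
    print(xb, round(poisson_vpc(xb, 1.0), 3))
```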
Median Mean Ratio (MMR) for
Poisson models
Equivalent to MOR but for Poisson model
Directly comparable with fixed-effects relative risk ratios:
Gives the increased risk (on average; hence the median) that would
result from moving from a lower to a higher risk area if two areas
were chosen at random from the distribution with the estimated level
2 variance
85
Modeling Counts in MLwiN: HIV in India
Aim: investigate the State geography of HIV in terms of risk
Data: nationally representative sample of 100k individuals in 2005-2006
Response: HIV sero-status from blood samples
Structure: 1720 cells within 28 States; cells are groups of people who share common characteristics [Age group (4), Education (4), Sex (2), Urbanity (2) and State (28)]
Rarity: only 467 sero-positives were found
Model: log count of the number of sero-positives in a cell related to an offset of the log expected count if national rates applied
Predictors: Age, Sex, Education and Urbanity
Two-level multilevel Poisson, extra-Poisson & NBD
HIV in India: Standardized Morbidity Rates
Higher educated females have the lowest risk, across the age-groups
87
HIV in India: some results
Risks for different States relative to living in urban and rural areas nationally.
88
General References on Modeling Counts
Agresti, A (2001) Categorical Data Analysis (2nd ed). New York: Wiley
Cameron, A C and Trivedi, P K (1998) Regression Analysis of Count Data. Cambridge University Press
Hilbe, J M (2007) Negative Binomial Regression. Cambridge University Press
McCullagh, P and Nelder, J (1989) Generalized Linear Models (2nd ed). Chapman & Hall/CRC
89
14 & 15 Modelling Growth
Development and Change
with multilevel models
Outline
1 : Three types of change
- age, period and cohort
- designs to separate the effects
2: Madeira Growth Study
- design: accelerated longitudinal design
- analysis: random effects multilevel model
- results
3: Happiness in USA 1973-2004
- design: synthetic cohorts as cross-classifications
- analysis: additive & multiplicative cross-classifications
- results
Three types of 'change'
1: Age: how the outcome develops as individuals develop; maturation; internal to individuals; e.g. becoming more conservative as one grows older
2: Period: the specific time at which the outcome is measured; external to individuals; represents variation over time periods regardless of age, e.g. 7/11
3: Cohort: a group who share the same events within the same time interval; e.g. the baby boomers
The centrality of cohorts for understanding social change was made by Ryder (1965): successive replacement
The identification problem
The three types of change are conceptually clearly distinct, but in practice it is difficult to distinguish between them:
Age = Period − Cohort
E.g. a child aged 12 will have been born in 1980 if they were measured in 1992.
The ability to disentangle the effects of these different time dimensions depends on the research design that is used.
Separating types of change: alternative designs
1: Cross-sectional design
Everybody is measured at the same time; period is held constant; age and birth-cohort differences are confounded, so it is not empirically possible to disentangle age & cohort. E.g. greater conservatism at older ages: we cannot separate people becoming so as they got older from always having been so.
2: Longitudinal
Individuals are measured on more than one occasion, which allows the study of developmental rates. But cohort is kept constant by the sampling, so change could be either age or period; age & period are confounded, and the observed change may not be the individual process of maturation but specific to this cohort as it collectively ages.
3: Multiple cohorts followed longitudinally
can disentangle the effects, but is expensive and time consuming, so...
Accelerated Longitudinal Design
Multiple sequential age cohorts followed longitudinally, usually with overlap.
E.g. follow 7-year-olds until 11, 11-year-olds until 15, 15-year-olds until 19.
Accelerated? With 4 years of study you get a 12-year span of development; by splicing the cohorts together you get one overall development trajectory.
Advantages: very efficient in terms of time; less problem of attrition; less cost keeping a team in the field.
Age, period & cohort are potentially un-confounded in this design, so that we can assess how developed an 11-year-old is in different cohorts.
Accelerated Longitudinal Design
Major aim: obtain the individual development trajectory in the shortest possible time.
Cohort changes as threats: the cohorts cannot be spliced together into one overall development trajectory if there is no 'convergence', i.e. if the means for 11-year-olds differ across cohorts; the dominant approach tests for such 'unwelcome' cohort effects.
Cohort changes as an opportunity: what happens to individuals as society is changing? Very few studies of this.
Madeira and Rapid Change
• Ultra-peripheral region: an island volcano 600 km from the Moroccan coast; inaccessible internally
• Population a quarter of a million
• Since the 1980s the economy has changed from one based on subsistence production (fishing and small farms) into an off-shore tourist/business centre, with agriculture now the export of bananas & wine
• EU funding: from 1994, $400M on infrastructural projects (bridges, tunnels and the airport)
• In 1988 the poorest region in the EU, with GDP per head 40% of the European average; 10 years later it had risen to 58%
• "economic transformation and very much improved accessibility has had a profound effect on the population, particularly the young"
Madeira : economy undergoing change
Madeira Growth Study

Five age cohorts, born two years apart (1980, 1982, 1984, 1986, 1988), were each measured in three consecutive school years; across the cohorts the measurements cover grades 8-9-10, 10-11-12, 12-13-14, 14-15-16 and 16-17-18, with the overlapping grades linking adjacent cohorts.
Stratified sampling: Stage 1, 29 schools were selected taking
into account area, the school grade & sport facilities. Stage 2,
507 students according to proportion of the population in
each of Madeira’s 11 districts. Aimed for 50 boys & 50 girls in
each age cohort. Complete records were obtained for 498
children so that the dropout is only 9 children or less than 2%.
Madeira Growth Study
3 levels: repeated measures (measurement occasions) within students within cohorts

[Classification diagram: Age cohort (1 ... 5) > Student (St1 ... St100 per cohort) > Measurement occasion (O1 ... O3)]

• Response: 12-minute run test; aerobic performance, cardiovascular endurance
• Accelerated longitudinal design: five age cohorts of 8, 10, 12, 14 and 16 year olds studied for 3 consecutive years
• Cohorts tell us about the stability of effects over time
• Measurement occasions are repeated measures on students and can tell us about students' development trajectories
• Ignore period: with physical growth, a short-term time effect can safely be assumed to be zero
Analysis of MGS: modelling development OVER TIME
The multilevel model has become the paradigmatic method for repeated measurements; it deals with
• Missingness: can handle non-informative drop-out
• Dependency: potentially complex dependencies in the outcome over time; we do not have 500 × 3 independent pieces of information, which matters for degrees of freedom and Type I errors
• Heterogeneity: e.g. individual growth varies from person to person; variability between people can increase or decrease over time
Plan
• Single-level growth model
• Two-level growth model with child effects
• Three-level growth model with cohort effects
Single-level growth model relating the response to Age (or Time) as a quadratic function; i indexes the child

Graph   Shape                         Intercept   Linear slope   Quadratic slope
a)      Decelerating positive slope   2           2              -0.25
b)      Accelerating positive slope   2           2              +0.25
c)      Positive to negative slope    2           8              -1.2
d)      Inverted U shape              2           11             -2.2

Quite flexible, but could use a cubic (1 more bend) or (much) more flexible splines (see the sketch below).
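A tiny sketch (my addition) evaluating those four quadratic trajectories over an illustrative 0-4 range of the Age/Time variable:

```python
import numpy as np

shapes = {
    "a) decelerating positive": (2, 2, -0.25),
    "b) accelerating positive": (2, 2, +0.25),
    "c) positive to negative":  (2, 8, -1.2),
    "d) inverted U":            (2, 11, -2.2),
}

t = np.linspace(0, 4, 5)
for label, (b0, b1, b2) in shapes.items():
    y = b0 + b1 * t + b2 * t ** 2      # quadratic growth curve
    print(label, np.round(y, 2))
```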
The two-level growth model
• Single response: endurance (12-minute run test)
• Single predictor: Age, where 0 is the baseline
• Two-level hierarchy: a panel study with occasions at level 1 nested within pupils at level 2

[Figure: a set of characteristic plots of some scenarios for development in the population when the overall development is a 'decelerating positive slope'; each line is a child, all sharing the same general development]
Two-level random-slopes growth model

y_ij = β_0 x_0ij + β_1 Age_ij + β_2 Age²_ij + β_3 Boy_j + β_4 Age_ij·Boy_j + (u_0j x_0ij + u_1j Age_ij + e_0ij x_0ij)

where i is the occasion and j is the child.

• Fixed part: the general trend for boys differentiated from girls; Age is time-varying, Boy is time-invariant
• Random part (level 2): between-children variance; handles heterogeneity between children

[u_0j, u_1j]' ~ N(0, Ω_u),   Ω_u = [σ²_u0; σ_u0u1, σ²_u1]

• Random part (level 1): between-occasion variance

[e_0ij] ~ N(0, σ²_e0)
Dependency in the 2-level random-intercepts model

y_ij = β_0 x_0ij + β_1 Age_ij + β_2 Age²_ij + β_3 Boy_j + β_4 Age_ij·Boy_j + (u_0j x_0ij + e_0ij x_0ij)

Dependency has a compound-symmetry form: the residual degree of correlation between occasions is

r = σ²_u0 / (σ²_u0 + σ²_e0)
  = between-child variance / (between-child variance + between-occasion variance)

CS: the same between any pair of occasions.
Some alternative dependency patterns (4 occasions)

Compound symmetry (the same for all pairs): σ²_u0 + σ²_e0 on the diagonal and σ²_u0 for every pair of occasions:

  Time 1:  σ²_u0 + σ²_e0
  Time 2:  σ²_u0            σ²_u0 + σ²_e0
  Time 3:  σ²_u0            σ²_u0            σ²_u0 + σ²_e0
  Time 4:  σ²_u0            σ²_u0            σ²_u0            σ²_u0 + σ²_e0

Un-structured (freely estimated for each pair):

  Time 1:  σ²_1
  Time 2:  σ_12   σ²_2
  Time 3:  σ_13   σ_23   σ²_3
  Time 4:  σ_14   σ_24   σ_34   σ²_4

Toeplitz (depends on the lag between occasions):

  Time 1:  σ²_0
  Time 2:  λ_1    σ²_0
  Time 3:  λ_2    λ_1    σ²_0
  Time 4:  λ_3    λ_2    λ_1    σ²_0

Why bother? It affects the SEs if you get it wrong, especially for time-invariant variables.
Three-level growth model
• Single response: endurance (12-minute run test, 100 metres)
• Predictor: Age, where 0 is the baseline
• Three-level hierarchy: occasions at level 1 nested within pupils at level 2 nested within cohorts at level 3

y_ijk = β_0 x_0 + β_1 Age_ijk + β_2 Age²_ijk + β_3 Boy_jk + β_4 Age_ijk·Boy_jk + β_5 Cohort_k + (v_0k + u_0jk + e_0ijk)

y_ijk      distance covered on occasion i by child j of cohort k
Cohort_k   the cohort number (0 to 4), representing the 1980, 1982, ..., 1988 cohorts
β_5        the linear change in distance achieved for a cohort that is two years later
v_0k       the unexplained differential achievement for a cohort around the linear trend
Three-level model: options & estimation
Cohorts - fixed or random?
- Conventional wisdom: when there is a small number of higher-level units, they are more appropriately specified as fixed-effect dummies
- But Yang & Land (2008), for the APC model, found that random effects give better results even when the number of units is small
- If fixed effects are used, all the degrees of freedom at the cohort level are used up, with no opportunity to include cohort-level variables
- Shrinkage: random effects that are unreliable due to small numbers of observations are shrunk towards zero
Full Bayesian analysis: a better base for inference
- inference about every parameter fully takes into account the uncertainty associated with all other parameters
- allows the distribution of the variance of the random effects to be skewed
- compare models with the Deviance Information Criterion, a goodness-of-fit measure penalized for model complexity
Results of 2 level model
Splicing together the
7-18 trajectory
No evidence of
random slope
heterogeneity
between children
ie Parallel lines
supported
Compound
symmetry only
Results of 3 level model
Allowing Age by
Sex by Cohort
interactions in
the fixed part
Convergence
for girls
Cohort change
for boys
Predicted performance for a
14 year old boy and girl
born in different cohorts
Yang (2008) Happiness in USA: APC
model
•
•
•
•
•
•
Instant Citation classic
Innovation: used Synthetic cohorts from the General Social Survey; grouped
people in 5 year bands 1895-1900, 1900-5, ….. 1975-1980
Period 1973 to 2004 in years
Age: continuous age as at survey
Multilevel cross classification of people in Period and in Cohorts
Overcomes identification problem
Yang (2008) Random-effects APC model
[Classification diagram: People nested within the cross-classification of Cohorts and Periods]
Multilevel proportional-odds model (ordinal: cumulative odds) with random effects
Developing the APC model to take account of Geography: motivation
[Diagram of the Wilkinson pathway: societal structures (income inequality, gaps in social status, dominance hierarchy) act through material circumstances and psycho-social pathways (anxiety, stress, social cohesion, social capital) to produce outcomes: health inequality, violence, unhappiness and reduced life expectancy]
• Relative income model: inequality as the main driver
• US states are a key testing ground for the Wilkinson hypothesis
Developing the APC model to take account of Geography: form of model
[Classification diagram: People nested within the cross-classification of Cohorts, Periods and States]
• Individual fixed effects: Age, Gender, Education, Income, etc.
• Random effects: 18 Cohorts + 21 Periods + 49 States + 21 × 49 Period-by-State combinations
Results: Quadratic Age (across Periods, Cohorts and States) and random effects [plots, logit scale, same scale]
Results: Random effects, continued [plots, logit scale, same scale]
Results: Random effects, continued [plots, logit scale, same scale; net of State and Period effects]
Results: Individual fixed effects [three slides of estimates]
Results: State effects [cross-sectional Gini not significant]
16: Spatial models
• Spatial models as multiple membership
models
• Lip cancer in Scotland: Poisson MM model and CAR
• Respiratory cancer in Ohio: repeated
measures and spatial effects
125
Spatial Models as a combination of strict hierarchy
and multiple membership (including GWR)
[Map diagram of areas A–M. A person in A is affected by area A (strict hierarchy, SH) and by its neighbours B, C, D (multiple membership, MM); a person in H is affected by H (SH) and by E, I, K, G (MM)]
Multiple membership defined by common boundary; weights as function of
inverse distance between centroids of areas
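A minimal sketch of constructing multiple-membership weights from an adjacency list, using the areas in the diagram (the neighbour sets are read off the figure; the inverse-distance variant is noted in a comment):

# Equal weights for each neighbour of an area, summing to 1
neighbours = {"A": ["B", "C", "D"], "H": ["E", "I", "K", "G"]}

mm_weights = {area: {nb: 1.0 / len(nbs) for nb in nbs}
              for area, nbs in neighbours.items()}
print(mm_weights["A"])   # {'B': 0.333..., 'C': 0.333..., 'D': 0.333...}

# Alternatively, weight each neighbour by the inverse distance between area
# centroids and then normalise so that each area's weights sum to 1.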
126
Scottish Lip Cancer Spatial multiple-membership model
• Response: observed counts of male lip cancer for the 56 regions of Scotland (1975–1980)
• Predictor: % of the workforce in outdoor occupations (agriculture, forestry, fishing); expected count based on population size
• Structure: areas and their neighbours, defined as sharing a common border (up to 11 neighbours); equal weights for each neighbouring region that sum to 1. The rate of lip cancer in each region is affected by both the region itself and its nearest neighbours, after taking account of outdoor activity
• Model: log of the response related to the fixed predictor, with an offset; Poisson distribution for the counts
NB: two sets of random effects:
1 area random effects (i.e. unstructured, non-spatial variation)
2 a multiple-membership set of random effects for the neighbours of each region
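A minimal sketch of the offset and the standardised ratio it implies (the counts and expected counts below are made-up values, not the Scottish data):

import numpy as np

observed = np.array([9, 39, 11])        # toy observed lip-cancer counts for three regions
expected = np.array([1.4, 8.7, 3.0])    # toy expected counts based on population size

offset = np.log(expected)               # enters the Poisson linear predictor as log(E_i)
smr = observed / expected               # standardised morbidity ratio: observed / expected
print(np.round(offset, 2), np.round(smr, 2))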
127
MCMC estimation: 50,000 draws
[MLwiN results screenshot, Poisson MM model. Fixed effects: the offset and a well-supported positive relation with outdoor occupations; a well-supported residual neighbourhood effect. NB: the Poisson model gives highly correlated chains]
Scottish Lip Cancer: CAR model
CAR: one set of random effects, each with an expected value equal to the weighted average of the surrounding random effects, and a variance divided by the number of neighbours:
\[
u_i \sim N(\bar{u}_i,\; \sigma^2_u / n_i), \qquad \bar{u}_i = \sum_{j \in \text{neigh}(i)} w_{ij}\, u_j / n_i
\]
where \(n_i\) is the number of neighbours of area \(i\) and the weights are typically all 1.
MLwiN: limited capabilities for the CAR model, i.e. at one level only (unlike BUGS)
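A minimal sketch of the CAR conditional mean and variance for one area with unit weights (the adjacency and current random-effect values are illustrative):

# Area A has neighbours B, C, D; with unit weights the conditional mean of u_A
# is the average of its neighbours' effects, and the variance is sigma2_u / n_A.
u = {"B": -0.1, "C": 0.4, "D": 0.0}     # current values of neighbouring random effects
sigma2_u = 0.5

n_a = len(u)
u_bar = sum(u.values()) / n_a
cond_var = sigma2_u / n_a
print(round(u_bar, 3), round(cond_var, 3))   # 0.1 0.167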
129
MCMC estimation: CAR model, 50,000 draws
[MLwiN results screenshot, Poisson CAR model. Fixed effects: the offset and a well-supported positive relation with outdoor occupations; a well-supported residual neighbourhood effect]
130
NB Scales: shrinkage
131
Ohio cancer: repeated measures (space and time!)
• Response: counts of respiratory cancer deaths in Ohio counties
• Aim: are there hotspot counties with distinctive trends? (small numbers, so 'borrow strength' from neighbours)
• Structure: annual repeated measures (1979–1988) for counties
  Classification 3: neighbourhoods as MM (3–8 neighbours)
  Classification 2: counties (88)
  Classification 1: occasions (88 × 10)
• Predictors: expected deaths; time
• Model: log of the response related to the fixed predictor, with an offset; Poisson distribution for the counts (C1). Two sets of random effects:
  1 area random effects allowed to vary over time: a trend for each county drawn from the Ohio distribution (C2)
  2 a multiple-membership set of random effects for the neighbours of each region (C3)
132
MCMC estimation: repeated-measures model, 50,000 draws
[MLwiN results screenshot: general trend; default priors; neighbourhood variance; variance function for the between-county time trend]
133
Respiratory cancer trends in Ohio: raw and modelled
Red: County 41 in 1988; SMR = 77/49 = 1.57
Blue: County 80 in 1988: SMR= 6/19 = 0.31
134
MCMC: Out to BUGS: 2-level model
#----MODEL Definition----
model
{
# Level 1 definition
for(i in 1:N) {
Price[i] ~ dnorm(mu[i],tau[i])
mu[i]<- beta[1] * Cons[i]
+ beta[2] * Size_5[i]
+ beta[3] * Semi[i]
+ beta[4] * Det[i]
+ u2[District[i],1] * Cons[i]
+ u2[District[i],2] * Size_5[i]
}
# Higher level definitions
for (j in 1:n2) {
u2[j,1:2] ~
dmnorm(zero2[1:2],tau.u2[1:2,1:2])
}
# Priors for fixed effects
for (k in 1:4) { beta[k] ~ dflat() }
# Priors for random terms
for(i in 1:N) {
tau[i] <- 1/sigma2[i]
sigma2[i] <- va0[1] * Cons[i] * Cons[i]
+ 2 * va0[2] * Size_5[i] * Cons[i]
+ va0[3] * Size_5[i] * Size_5[i]
}
for(i in 1:3) {
va0[i] ~ dflat()
}
for (i in 1:2) {zero2[i] <- 0}
tau.u2[1:2,1:2] ~ dwish(R2[1:2, 1:2],2)
sigma2.u2[1:2,1:2] <- inverse(tau.u2[,])
}
#----Initial values file----
list(beta = c(199.944427, 23.441458, 40.083321, 107.602844),
MCMC: Out to BUGS: lip cancer MM
#----MODEL Definition----
model
{
# Level 1 definition
for(i in 1:N) {
obs[i] ~ dpois(mu[i])
log(mu[i]) <- offs[i] + beta[1] * cons[i]
+ beta[2] * perc_aff[i]
+ u2[area[i]] * cons[i]
+ weight1[i] * u3[neigh1[i]] * cons[i]
+ weight2[i] * u3[neigh2[i]] * cons[i]
+ weight3[i] * u3[neigh3[i]] * cons[i]
+ weight4[i] * u3[neigh4[i]] * cons[i]
+ weight5[i] * u3[neigh5[i]] * cons[i]
+ weight6[i] * u3[neigh6[i]] * cons[i]
+ weight7[i] * u3[neigh7[i]] * cons[i]
+ weight8[i] * u3[neigh8[i]] * cons[i]
+ weight9[i] * u3[neigh9[i]] * cons[i]
+ weight10[i] * u3[neigh10[i]] * cons[i]
+ weight11[i] * u3[neigh11[i]] * cons[i]
}
# Higher level definitions
for (j in 1:n2) {
u2[j] ~ dnorm(0,tau.u2)
}
for (j in 1:n3) {
u3[j] ~ dnorm(0,tau.u3)
}
# Priors for fixed effects
for (k in 1:2) { beta[k] ~ dflat() }
# Priors for random terms
tau.u2 ~ dgamma(0.001000,0.001000)
sigma2.u2 <- 1/tau.u2
tau.u3 ~ dgamma(0.001000,0.001000)
sigma2.u3 <- 1/tau.u3
}
#----Initial values file----
list(beta = c(-0.295369, 0.048434),
u2 = c(0.089997, 0.068526, 0.068117, 0.018452
MCMC: Out to BUGS: lip cancer CAR
#----MODEL Definition----
model
{
# Level 1 definition
for(i in 1:N) {
obs[i] ~ dpois(mu[i])
log(mu[i]) <- offs[i] + beta[1] * perc_aff[i]
+ carmean + u3[area[i]] * cons[i]
}
# Higher level definitions
u3[1:n3] ~
car.normal(adj[],weights[],num[],tau.u3)
# Priors for fixed effects
beta[1] ~ dflat()
carmean ~ dflat()
# Priors for random terms
tau.u3 ~ dgamma(0.001000,0.001000)
sigma2.u3 <- 1/tau.u3
}
#----Initial values file----
list(beta = c(0.035503),
carmean = 0,
Comparing WinBUGS and MLwiN
• IGLS estimation is far quicker and, for Normal-response models, gives very good estimates
• Model comparison is also easier, with formal test statistics
• It is relatively easy to set up models and display results in MLwiN
• MCMC in MLwiN is almost always faster than in WinBUGS (typically around 15-fold in the examples)
• But MLwiN has a restricted choice of MCMC algorithms and a restricted range of models
• Having two independent MCMC implementations for fitting some models is useful, as programming mistakes do occur
More information on spatial models in MLwiN
• Browne, W. J. (2003) MCMC Estimation in MLwiN, Chapter 16: Spatial models
• Lawson, A. B., Browne, W. J. and Vidal Rodeiro, C. L. (2003) Disease Mapping using WinBUGS and MLwiN. Wiley, London (Chapter 8: GWR)
139
Lab 12: Binary logistic
Modelling: JSP Data with
response of pass/fail
JSP LOGIT Model 1a: RI, Exact Binomial
JSP LOGIT Model 1b: RI, Unconstrained
JSP LOGIT Model 2: maths3-25 fixed effect &
RI, Exact Binomial