AMMBR course design
CONTENT / METHOD
Y is 0/1
logistic regression
conjoint analysis
our own survey
multi-level methods
Advanced Methods and Models in Behavioral Research – 2009/2010
DATA COLLECTION
Data collection
• The surveys are not ready to roll yet: some technical issues remain unsolved
• Please wait until you get the green light before sending your invitations
• If you have not done so yet: send your list of respondents (it’s mandatory)
MULTI-LEVEL ANALYSIS
Multi-level models or ...
• Bayesian hierarchical models
• mixed models (in SPSS)
• hierarchical linear models
• random effects models
• random coefficient models
• subject specific models
• variance component models
• variance heterogeneity models
... all dealing with clustered data.
One solution: the variance component model
Clustered data / multi-level models
• Pupils within schools
(within regions within countries)
• Firms within regions (or sectors)
• Vignettes within persons
[copy to blackboard]
Two issues with clustered data
• Your estimates will (in all likelihood) look too precise: the standard errors come out too small, so you "find" effects that do not exist in the population
[further explanation on blackboard]
• You will want to distinguish between effects within
clusters and effects between clusters
[see next two slides]
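The first issue can be made visible with a small simulation. This is a minimal sketch, not part of the course material: the number of clusters, the cluster size, the seed, and the variable names (id, u, x, y) are all assumptions chosen for illustration. The predictor x has no true effect on y, yet ignoring the clustering makes its standard error too small.

```stata
* Hypothetical simulated data: 50 clusters with 20 cases each
clear
set seed 12345
set obs 50
gen id = _n                    // cluster identifier
gen u  = rnormal()             // cluster-level random intercept
gen x  = rnormal()             // cluster-level predictor with NO true effect
expand 20                      // 20 cases per cluster; id, u and x are copied
gen y  = u + rnormal()         // y does not depend on x

regress y x                    // ignores the clustering
regress y x, vce(cluster id)   // cluster-robust standard errors
```

Comparing the two standard errors for x shows how much precision the naive regression claims without justification; the cluster-robust one is typically the larger of the two.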
On individual vs aggregate data
For instance:
• X = introvert, Y = school results
• X = age of McDonald’s employee, Y = like the manager
Had we only known that the data are clustered!
Using the school example: lines represent schools. And within schools the effect of being introvert is positive!
So the effect of X within clusters can be different from the effect between clusters!
MAIN MESSAGES
• Be able to recognize clustered data and deal with it appropriately (how you do that will follow)
• Distinguish two kinds of effects: those at the "micro-level" vs those at the aggregate level (and do not test a micro-hypothesis with aggregate data)
A toy example – two schools, two pupils
Two schools each with two pupils. We first calculate the means.
(taken from Rasbash)
[Figure: exam score per pupil. School 1: 3 and 2; School 2: -1 and -4; overall mean 0]
Overall mean = (3 + 2 + (-1) + (-4))/4 = 0
Now the variance
[Figure: the four exam scores (School 1: 3 and 2; School 2: -1 and -4) around the overall mean 0]
The total variance is the sum of the squared deviations of the observations from the overall mean, divided by the sample size (4):
(9 + 4 + 1 + 16)/4 = 7.5
The variance of the school means
around the overall mean
[Figure: school means 2.5 (School 1) and -2.5 (School 2) around the overall mean 0]
The variance of the school means around the overall mean =
(2.5^2 + (-2.5)^2)/2 = 6.25
(total variance was 7.5)
The variance of the pupils’ scores
around their school’s mean
[Figure: scores 3 and 2 around the School 1 mean 2.5; scores -1 and -4 around the School 2 mean -2.5]
The variance of the pupils’ scores around their school’s mean =
((3-2.5)^2 + (2-2.5)^2 + (-1-(-2.5))^2 + (-4-(-2.5))^2)/4 = 1.25
-> So you can partition the variance
into an individual level and a school level
How much of the variability in pupil attainment is attributable to factors at the school level and how much to factors at the pupil level?
In terms of our toy example we can now say:
6.25/7.5 = 83% of the total variation in pupils’ attainment is attributable to school-level factors
1.25/7.5 = 17% of the total variation in pupils’ attainment is attributable to pupil-level factors
And this is important: we want to know how to explain (in this example) school attainment, and apparently the differences are at the school level more than at the pupil level.
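The toy calculation can be reproduced in Stata. This is a minimal sketch: the four scores come from the slides, while the variable names (school, score, overall_mean, school_mean, sq_total, sq_between, sq_within) are made up here for illustration. As on the slides, the sums of squares are divided by N = 4, so the mean of each squared-deviation variable is the corresponding variance.

```stata
* The four pupils of the toy example
clear
input school score
1  3
1  2
2 -1
2 -4
end

egen overall_mean = mean(score)                 // 0
bysort school: egen school_mean = mean(score)   // 2.5 and -2.5

gen sq_total   = (score - overall_mean)^2       // 9, 4, 1, 16
gen sq_between = (school_mean - overall_mean)^2 // 6.25 for every pupil
gen sq_within  = (score - school_mean)^2        // .25, .25, 2.25, 2.25

* The "Mean" column shows 7.5 (total), 6.25 (between) and 1.25 (within)
summarize sq_total sq_between sq_within
```

The between and within parts add up to the total (6.25 + 1.25 = 7.5), which is what makes the partition into a school share and a pupil share possible.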
Standard multiple regression won't do
Y    D1   D2   D3   D4   D5   id
+4   -1   -1    0    1    0    1
-3    1    1    1    0   -1    1
+2    0    0    1    0   -1    2
 0    1    0   -1    1    0    2
+1    …    …    …    …    …    3
+2    …    …    …    …    …    3
-3    …    …    …    …    …    4
+4    …    …    …    …    …    4
 …    …    …    …    …    …    …
So:
• You can use all the data and just run a multiple regression, but then you disregard the clustering effect, which gives incorrect confidence intervals
• You can aggregate within clusters, and then run a multiple regression on the aggregate data. Two problems: no individual-level testing is possible, and you get fewer data points.
So what can we do?
Multi-level models
The usual multiple regression model assumes
$Y_i = \beta_0 + \beta_1 x_{1i} + \dots + \beta_k x_{ki} + \varepsilon_i$
... with the subscript "i" defined at the case level,
... and the epsilons independently distributed with covariance matrix $\sigma^2 I$.
With clustered data, you know these assumptions are not met.
Solution 1: add dummy-variables per cluster
• So just multiple regression, but with as many dummy variables as you have clusters (minus 1):
$Y_i = \beta_0 + \beta_1 x_{1i} + \dots + \beta_k x_{ki} + \gamma_1 \mathit{dummy}_{1i} + \dots + \gamma_j \mathit{dummy}_{ji} + \varepsilon_i$
... where, in this example, there are j+1 clusters.
IF the clustering is (largely) due to differences in the intercept between persons, this might work.
BUT if there are only a handful of cases per person, this necessitates a huge number of extra variables.
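A minimal sketch of Solution 1 in Stata, using a tiny made-up data set (three clusters of three cases; the numbers and the names id, x, y and the stub d_ are hypothetical, chosen only for illustration):

```stata
* Hypothetical data: 3 clusters, 3 cases each
clear
input id x y
1 1  4
1 2  5
1 3  7
2 1  1
2 2  2
2 3  2
3 1  8
3 2  9
3 3 11
end

tab id, gen(d_)        // creates one dummy per cluster: d_1, d_2, d_3
regress y x d_2 d_3    // d_1 is left out, so cluster 1 is the reference
```

With three clusters this costs only two extra variables; with hundreds of persons and a handful of cases each, the same approach would add hundreds of dummies.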
Solution 2: split your micro-level X-vars
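Say you have: x1, a predictor measured at the micro level (per case)
then create: the mean of x1 within each cluster, and the deviation of x1 from that cluster mean
and add both as predictors (instead of x1)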
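The split could look as follows in Stata. This is a sketch on made-up numbers; the names id, x1, mean_x1 and dev_x1 are hypothetical.

```stata
* Hypothetical data: 2 clusters, 2 cases each
clear
input id x1
1 2
1 4
2 7
2 9
end

bysort id: egen mean_x1 = mean(x1)   // cluster mean: 3 for id 1, 8 for id 2
gen dev_x1 = x1 - mean_x1            // deviation from the cluster mean: -1, 1, -1, 1
list, sepby(id)
```

mean_x1 only varies between clusters and dev_x1 only varies within clusters, so entering both lets the model estimate a between-cluster effect and a within-cluster effect separately.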
Make sure that you understand what is happening here, and why it is of use.
Solution 3: the variance component model
In the variance component model, we split the randomness into a "personal part" and a "rest part".
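In symbols, the split could be written as below. The notation is an assumption made here for illustration and is not taken from the slides: j indexes the person (cluster), i the case within that person, u_j is the "personal part" and \varepsilon_{ij} the "rest part".

```latex
\begin{aligned}
Y_{ij} &= \beta_0 + \beta_1 x_{1ij} + \dots + \beta_k x_{kij} + u_j + \varepsilon_{ij}, \\
u_j &\sim N(0, \sigma_u^2), \qquad \varepsilon_{ij} \sim N(0, \sigma_\varepsilon^2), \\
\rho &= \frac{\sigma_u^2}{\sigma_u^2 + \sigma_\varepsilon^2}.
\end{aligned}
```

Here \rho is the share of the unexplained variance that sits at the person level; it plays the same role as the school-level share 6.25/7.5 in the toy example.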
Now: how do you do this in Stata?
<See Stata demo>
[note to CS: use age and schooling as examples to split at restaurant level]
relevant commands
xtset and xtreg
bys <varA>: egen <meanvarB> = mean(<varB>)
gen <dvarB> = <varB> - <meanvarB>
convenience commands
tab <var>, gen(<stub>)
order
edit
drop
des
sum
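A minimal sketch of xtset and xtreg at work, on simulated data. Everything about the data is an assumption for illustration: 30 persons, 5 cases each, and the names id, u, x, y. The mle option is used here as one way to fit the random-intercept model; it is not necessarily the option used in the demo.

```stata
* Hypothetical simulated data: 30 persons with 5 cases each
clear
set seed 2009
set obs 30
gen id = _n                          // person identifier
gen u  = rnormal(0, 2)               // "personal part": one draw per person
expand 5                             // 5 cases (e.g. vignettes) per person
gen x  = rnormal()
gen y  = 1 + 0.5*x + u + rnormal()   // the last term is the "rest part"

xtset id                             // tell Stata which variable defines the clusters
xtreg y x, mle                       // random-intercept (variance component) model
```

In the xtreg output, sigma_u and sigma_e are the estimated standard deviations of the two parts, and rho is the estimated share of variance at the person level.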
Up next
• How do we run the "Solution 1" and "Solution 2" analyses?
• We have now seen the random intercept, but how about random slopes?
When you have multi-level data (2 levels)
1. If applicable: consider whether using separate dummies per group might help (use only when this does not create a lot of dummies)
2. Run an empty mixed model (i.e., with just the constant included) in Stata. Look at the level at which most of the variance resides.
3. If applicable: split micro-level variables into "group mean" variables and "difference from group mean" variables.
4. Re-run your mixed model with these variables included (as you would in a multiple regression analysis)
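The four steps above can be sketched in Stata as follows. The data are simulated and entirely hypothetical (40 clusters of 10 cases; names id, m, u, x, y, mean_x, dev_x, stub d_), and they are built so that x relates negatively to y between clusters but positively within clusters, as in the school example. Using xtreg with the mle option for the mixed model is an assumption of this sketch.

```stata
* Hypothetical simulated data: 40 clusters with 10 cases each
clear
set seed 4321
set obs 40
gen id = _n
gen m  = rnormal()                       // cluster-level component of x
gen u  = rnormal()                       // cluster-level random intercept
expand 10
gen x  = m + rnormal()
gen y  = -2*m + (x - m) + u + rnormal()  // negative between clusters, positive within

* Step 1: dummies per cluster (here 40, so probably too many to be useful)
tab id, gen(d_)

* Step 2: empty model; rho shows the share of variance at the cluster level
xtset id
xtreg y, mle

* Step 3: split x into a group mean and a difference from the group mean
bysort id: egen mean_x = mean(x)
gen dev_x = x - mean_x

* Step 4: re-run the model with both parts as predictors
xtreg y mean_x dev_x, mle
```

The coefficient of mean_x describes the between-cluster effect and the coefficient of dev_x the within-cluster effect; entering x alone would blend the two.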
To Do
• Send your respondent list if you have not done so yet.
• Check the material online (as of tomorrow morning)
• Check in Stata: how do I create:
– dummies per cluster
– the mean of a variable within a cluster
– the deviation from the mean within a cluster
• Next time: bring your laptop. We’ll have a full session of practice only