Transcript Document

Instead of Friday, March 20:
This Monday, 25 March, hours 7 and 8
In addition to the regular program:
Tuesday, 2 April, hours 7 and 8
Regular:
Friday, 5 April, hours 5 and 6
Advanced Methods and Models in Behavioral Research – 2013
Check out:
My logistic regression run on auto.dta
(Not easy / thinking out loud / there is more than one correct answer)
Revisit your own and others’ logit do files;
check if you are able to do this yourself
The exam
Same kind of setup as MMBR:
On laptop
ExamMonitor installed
No books or notes allowed, only Stata’s help files
but:
No (or hardly any) multiple choice questions
Largest part is working on data
MMBR is considered working knowledge
You get the data before the exam (!)
Our exam data
• Roger Fuchs & Freek Schoonbrood BEP project
• Go through the experiment at the link supplied in a
minute
• Make sure to:
– Answer seriously
– Understand that you are doing a conjoint analysis
– Realize that the data from this experiment are going to be
the ones that we will use during the exam
– Write down notes for improvement of the survey
Logistics
• We now go through the experiment
• We try to come up with as many improvements as we can think
of
• Roger and Freek implement the ones they feel make sense
today
• Everyone arranges for at least 5 participants as of tomorrow:
ensure some variance! (and/or put an invite on your
Facebook/Twitter/... page)
• As soon as I have at least some data, I will put the data set
online (note that it might not be complete yet, as more
participants may follow later)
• (note that we are doing this quick-and-dirty: we do not
check for all kinds of sampling biases, etc. Think about how we
could have done that!)
http://bep.freek.ws
In with the (multi-level) statistics...
MULTI-LEVEL ANALYSIS
Multi-level models or ...
• Bayesian hierarchical models
• mixed models (in SPSS)
• hierarchical linear models
• random effects models
• random coefficient models
• subject specific models
• variance component models
• variance heterogeneity models
dealing with clustered data.
One solution: the variance component model
Clustered data / multi-level models
• Pupils within schools
(within regions within countries)
• Firms within regions (or sectors)
• Vignettes within persons
• Employees within stores (our fastfood.dta example)
Two issues with clustered data
• Your estimates will (in all likelihood) look too precise:
standard errors come out too small, so you find effects
that do not exist in the population
[do we get that?]
• You will want to distinguish between effects within
clusters and effects between clusters
[see next two slides]
On individual vs aggregate data
For instance:
X = introvert
Y = school results
X = age of McDonald’s employee
Y = like the manager
Had we only known that the data are clustered!
Using the school example: lines represent schools. And within
schools the effect of being introvert is positive!
So the effect of an X within clusters can be different
from the effect between clusters!
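The difference between within-cluster and between-cluster effects can be made concrete with simulated data. A minimal sketch, assuming made-up data: the five schools, the variable names school, x, y, xbar and xdev, and all numbers are invented for illustration and are not taken from the course data.

```stata
clear
set seed 2013
set obs 200

* five hypothetical schools with 40 pupils each
generate school = ceil(_n/40)

* x (say, introversion) is higher on average in higher-numbered schools
generate x = school + rnormal()

* y (school results): schools with a higher average x score lower,
* but within a school a higher x goes with a higher y (slope +1)
generate y = -3*school + (x - school) + rnormal()

* pooled regression that ignores the clustering
regress y x

* split x into a school mean and a deviation from that mean
bysort school: egen xbar = mean(x)
generate xdev = x - xbar
regress y xbar xdev
```

In this setup the pooled slope of x comes out negative, while the coefficient on xdev (the effect within schools) is close to +1 by construction.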
MAIN MESSAGES
Be able to recognize clustered data and deal
with it appropriately (how to do that will
follow)
Distinguish two kinds of effects: those at the
"micro-level" (within clusters) vs those at the
aggregate level (between clusters). They
need not be the same!
(and ... do not test a micro-hypothesis with
aggregate data)
A toy example – two schools, two pupils
Two schools each with two pupils. We first calculate the means.
(taken from Rasbash)
[Figure: exam scores per school. School 1: 3 and 2. School 2: -1 and -4. Overall mean: 0.]
Overall mean= (3+2+(-1)+(-4))/4=0
Now the variance
[Figure: the same four exam scores (School 1: 3 and 2; School 2: -1 and -4) around the overall mean of 0.]
The total variance is the sum of the squares of the departures of the
observations around the mean, divided by the sample size (4) =
(9+4+1+16)/4=7.5
The variance of the school means
around the overall mean
[Figure: exam scores with the school means added. School 1: 3 and 2, mean 2.5. School 2: -1 and -4, mean -2.5. Overall mean: 0.]
The variance of the school means around the overall mean =
(2.5^2 + (-2.5)^2)/2 = 6.25
(total variance was 7.5)
The variance of the pupils’ scores
around their school’s mean
[Figure: exam scores and school means. School 1: 3 and 2, mean 2.5. School 2: -1 and -4, mean -2.5.]
The variance of the pupils’ scores around their school’s mean =
((3-2.5)^2 + (2-2.5)^2 + (-1-(-2.5))^2 + (-4-(-2.5))^2)/4 = 1.25
-> So you can partition the total variance
into individual-level variance and school-level variance
How much of the variability in pupil attainment is
attributable to factors at the school and how much to
factors at the pupil level?
In terms of our toy example we can now say
6.25/7.5 = 83% of the total variation of
pupils’ attainment is attributable to school
level factors
1.25/7.5 = 17% of the total variation of
pupils’ attainment is attributable to pupil
level factors
And this is important: we want to know how to explain
(in this example) school attainment, and apparently the
differences are at the school level more than the pupil level
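The toy example can be checked in Stata. A minimal sketch: the four scores come from the slides, while the variable and scalar names are made up for the illustration.

```stata
clear
input school score
1  3
1  2
2 -1
2 -4
end

* overall mean and school means
egen grandmean = mean(score)
bysort school: egen schoolmean = mean(score)

* squared departures: total, between schools, within schools
generate sq_total   = (score - grandmean)^2
generate sq_between = (schoolmean - grandmean)^2
generate sq_within  = (score - schoolmean)^2

* dividing the sum of squares by the sample size (4) is just taking the mean
summarize sq_total, meanonly
scalar v_total = r(mean)
summarize sq_between, meanonly
scalar v_between = r(mean)
summarize sq_within, meanonly
scalar v_within = r(mean)

display "total variance  = " v_total
display "between schools = " v_between " (share " v_between/v_total ")"
display "within schools  = " v_within " (share " v_within/v_total ")"
```

This reproduces 7.5, 6.25 and 1.25. Because both schools have the same number of pupils, averaging the between-school squares over the four pupils gives the same 6.25 as averaging over the two schools.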
Standard multiple regression won't do
Y    D1   D2   D3   D4   D5   id
+4   -1   -1    0    1    0    1
-3    1    1    1    0   -1    1
+2    0    0    1    0   -1    2
 0    1    0   -1    1    0    2
+1    …    …    …    …    …    3
+2    …    …    …    …    …    3
-3    …    …    …    …    …    4
+4    …    …    …    …    …    4
 …    …    …    …    …    …    …
So you can use all the data and
just run a multiple regression, but
then you disregard the clustering,
which gives incorrect confidence
intervals (and you cannot
distinguish between effects at the
cluster level vs at the individual level)
Possible solution (but not so good)
You can aggregate within clusters,
and then run a multiple regression
on the aggregate data. Two
problems: no individual-level
testing possible + you get far
fewer data points.
So what can we do?
Multi-level models
The standard multiple regression model assumes
... with the subscript "i" defined at the case-level.
... and the epsilons independently distributed with
covariance matrix I.
With clustered data, you know these
assumptions are not met.
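For reference, the standard multiple regression model is usually written as follows. This is a sketch in conventional textbook notation; the symbols \beta, k and \sigma^2 are assumptions, since the slide's own formula is not part of this transcript.

```latex
Y_i = \beta_0 + \beta_1 x_{1i} + \dots + \beta_k x_{ki} + \varepsilon_i,
\qquad
\operatorname{Cov}(\varepsilon) = \sigma^2 I
```

With clustered data, the errors of cases in the same cluster are correlated, so the off-diagonal elements of the covariance matrix are not all zero.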
Solution 1: add dummy-variables per cluster
• Try multiple regression, but with as many dummy
variables as you have clusters (minus 1)
... where, in this example, there are j+1 clusters.
IF the clustering differences are (largely) due to differences in the
intercept between persons, this might work.
BUT if there are only a handful of cases per person, this
necessitates a huge number of extra variables
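A minimal sketch of Solution 1 in Stata. It assumes Stata's example data set nlswork (loaded with webuse, which needs an internet connection), where idcode identifies the person and plays the role of the cluster. The data set, the choice of variables, and the restriction to persons with idcode up to 20 are illustrative assumptions, not part of the course material.

```stata
webuse nlswork, clear

* keep a handful of persons so the number of dummies stays manageable
keep if idcode <= 20

* i.idcode adds one dummy per person (minus one reference category)
regress ln_wage age i.idcode

* the same dummies created by hand, as with "tab <var>, gen()"
tabulate idcode, generate(person)
```

With all persons in this data set the regression would need thousands of dummies, which is the drawback named above.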
Solution 2: split your micro-level X-vars
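Say you have: a micro-level variable x1 (measured per case within clusters)
then create: the cluster mean of x1, and the difference between x1 and its cluster mean
and add both as predictors (instead of x1)
Make sure that you understand what is happening here,
and why it is of use.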
Solution 3: the variance component model
In the variance component model,
we split the randomness
into a "personal part" and a "rest part"
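One common way to write this down, as a sketch in conventional notation (the symbols are assumptions; the slide's own formula is not in this transcript), with i for the case and j for the person or cluster:

```latex
Y_{ij} = \beta_0 + \beta_1 x_{1ij} + \dots + \beta_k x_{kij} + u_j + \varepsilon_{ij},
\qquad
u_j \sim N(0, \sigma_u^2), \quad \varepsilon_{ij} \sim N(0, \sigma_\varepsilon^2)
```

Here u_j is the "personal part" shared by all cases of person j, and \varepsilon_{ij} is the "rest part" that differs per case.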
Now: how do you do this in Stata?
<See Stata demo>
[note to CS: use age and schooling as examples to split at restaurant level]
relevant commands
xtset and xtreg
bys <varA>: egen <meanvarB> = mean(<varB>)
gen <dvarB> = <varB> - <meanvarB>
convenience commands
tab <var>, gen()
order
edit
drop
des
sum
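Put together, the relevant commands could look like this. A hedged sketch, assuming Stata's example data nlswork (person identifier idcode, outcome ln_wage, micro-level predictor age) rather than the course's fastfood.dta, whose variable names are not given here. The names mean_age and dev_age are made up for the illustration.

```stata
webuse nlswork, clear

* split the micro-level variable into a cluster mean and a deviation from it
bysort idcode: egen mean_age = mean(age)
generate dev_age = age - mean_age

* declare the cluster (panel) variable, then fit a random-intercept model
xtset idcode
xtreg ln_wage mean_age dev_age, re
```

The coefficient on mean_age then describes the effect between persons and the coefficient on dev_age the effect within persons.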
Up next
• How do we run the "Solution 1", "Solution 2", and "Solution 3"
analyses and compare which works best? What about
assumption checking?
• We have now seen the random intercept, but how about random slopes?
When you have multi-level data (2 levels)
1. If applicable: consider whether using separate dummies per
group might help (use only when this does not create a lot of
dummies)
2. Run an empty mixed model (i.e., just the constant included) in
Stata. Look at the level on which most of the variance resides.
3. If applicable: divide micro-variables into "group mean" variables
and "difference from group mean" variables.
4. Re-run your mixed model with these variables included (as
you would a multiple regression analysis)
5. (and note: use regression diagnostics secretly, to find outliers
and such)
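Step 2 can be sketched as follows, assuming Stata 13 or later (where the command is called mixed; older versions use xtmixed) and Stata's example data nlswork as a stand-in for your own data, with idcode as the cluster identifier.

```stata
webuse nlswork, clear

* empty model: only a constant, plus a random intercept per person
mixed ln_wage || idcode:

* share of the total variance that sits at the person (cluster) level
estat icc
```

The intraclass correlation reported by estat icc is the between-cluster variance divided by the total variance, the same kind of share as the one computed by hand in the toy example with two schools.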
On non-response
Non-response analysis
• Not everyone who is invited is going to participate
• Think about selective non-response: some (kinds of)
individuals might be less likely to participate.
How might that influence the results?
Data: TVSFP on influencing behavior
Online as
motoroccasion8March2013.dta