Multilevel Models 1
Sociology 229: Advanced Regression
Copyright © 2010 by Evan Schofer
Do not copy or distribute without permission
Announcements
• Assignment 4 Due
• Assignments 2 & 3 handed back.
Multilevel Data
• Often we wish to examine data that is
“clustered” or “multilevel” in structure
– Classic example: Educational research
• Students are nested within classes
• Classes are nested within schools
• Schools are nested within districts or US states
• We often refer to these as “levels”
• Ex: If the study is individual/class/school…
• Level 1 = individual level
• Level 2 = classroom
• Level 3 = school
– Note: Some stats books/packages label differently!
Multilevel Data
• Students nested in class, school, and state
• Variables at each level may affect student outcomes
[Diagram: classes nested within schools, and schools nested within states (California, Oregon)]
Multilevel Data
• Simpler example: 2-level data
[Diagram: six classes]
• Which can be shown as:
[Diagram: Level 2 = Class 1, Class 2, Class 3; Level 1 = students (S1, S2, S3) nested within each class]
Multilevel Data
• We are often interested in effects of variables
at multiple levels
• Ex: Predicting student test scores
• Individual level: grades, SES, gender, race, etc.
• Class level: Teacher qualifications, class size, track
• School: Private vs. public, resources
• State: Ed policies (funding, tests), budget
– And, it is useful to assess the relative importance
of each level in predicting outcomes
• Should educational reforms target classrooms?
Schools? Individual students?
• Which is most likely to have big consequences?
Multilevel Data
• Repeated measurement is also “multilevel” or
“clustered”
• Measurements over time (T1, T2, T3…) are nested
within persons (or firms or countries)
• Level 1 is the measurement (at various points in time)
• Level 2 = the individual
[Diagram: repeated measurements T1–T5 nested within Person 1, Person 2, Person 3, and Person 4]
Multilevel Data
• Examples of multilevel/clustered data:
• Individuals from same family
– Ex: Religiosity
• People in same country (in a cross-national survey)
– Ex: Civic participation
• Firms from within the same industry
– Ex: Firm performance
• Individuals measured repeatedly
– Ex: Depression
• Workers within departments, firms, & industries
– Ex: Worker efficiency
– Can you think of others?
Example: Pro-environmental values
• Source: World Values Survey (27 countries)
• Let’s simply try OLS regression
. reg supportenv age male dmar demp educ incomerel ses
      Source |       SS       df       MS              Number of obs =   27807
-------------+------------------------------           F(  7, 27799) =  104.06
       Model |  2761.86228     7  394.551755           Prob > F      =  0.0000
    Residual |  105404.878 27799  3.79167876           R-squared     =  0.0255
-------------+------------------------------           Adj R-squared =  0.0253
       Total |   108166.74 27806  3.89005036           Root MSE      =  1.9472

------------------------------------------------------------------------------
  supportenv |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |  -.0021927    .000803    -2.73   0.006    -.0037666   -.0006187
        male |   .0960975   .0236758     4.06   0.000     .0496918    .1425032
        dmar |   .0959759     .02527     3.80   0.000     .0464455    .1455063
        demp |  -.1226363   .0254293    -4.82   0.000     -.172479   -.0727937
        educ |   .1117587   .0058261    19.18   0.000     .1003393    .1231781
   incomerel |   .0131716   .0056011     2.35   0.019     .0021931    .0241501
         ses |   .0922855   .0134349     6.87   0.000     .0659525    .1186186
       _cons |   5.742023   .0518026   110.84   0.000     5.640487    5.843559
------------------------------------------------------------------------------
Aggregation
• If you want to focus on higher-level hypotheses
(e.g., schools, not children), you can aggregate
• Make “school” the unit of analysis
• OLS regression analysis of school-level variables
• Individual-level variables (e.g., student achievement) can
be included as school averages (aggregates)
– Ex: Model average school test score as a function of school resources and average student SES (see the sketch below)
• Problem: The approach destroys individual-level data
• Also: Loss of statistical power (Tabachnick & Fidell 2007)
• Also: Can’t draw individual-level interpretations: ecological fallacy.
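A minimal Stata sketch of the aggregation strategy above (hypothetical variable names testscore, ses, resources, schoolid; not from the original slides):

  * collapse student records to one observation per school (school means)
  collapse (mean) testscore ses resources, by(schoolid)

  * OLS on the aggregated, school-level data
  reg testscore resources ses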
Example: Pro-environmental values
• Aggregation: Analyze country means (N=27)
. reg supportenv age male dmar demp educ incomerel ses
      Source |       SS       df       MS              Number of obs =      27
-------------+------------------------------           F(  7,    19) =    0.91
       Model |  2.58287267     7   .36898181           Prob > F      =  0.5216
    Residual |  7.72899325    19  .406789119           R-squared     =  0.2505
-------------+------------------------------           Adj R-squared = -0.0257
       Total |  10.3118659    26  .396610228           Root MSE      =   .6378

------------------------------------------------------------------------------
  supportenv |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .0211517   .0391649     0.54   0.595    -.0608215    .1031248
        male |   3.966173   4.479358     0.89   0.387    -5.409232    13.34158
        dmar |   .8001333   1.127099     0.71   0.486    -1.558913     3.15918
        demp |  -.0571511   1.165915    -0.05   0.961    -2.497439    2.383137
        educ |   .3743473   .2098779     1.78   0.090    -.0649321    .8136268
   incomerel |    .148134   .1687438     0.88   0.391    -.2050508    .5013188
         ses |  -.4126738   .4916416    -0.84   0.412    -1.441691    .6163439
       _cons |   2.031181   3.370978     0.60   0.554    -5.024358     9.08672
------------------------------------------------------------------------------
Note loss of statistical power – few variables
are significant when N is only 27
Ecological Fallacy
• Issue: Data aggregation limits your ability to
draw conclusions about level-1 units
• The “ecological fallacy”
– Robinson, W.S. (1950). "Ecological Correlations
and the Behavior of Individuals". American
Sociological Review 15: 351–357
• Among US states, immigration rate correlates positively
with average literacy
• Does this mean that immigrants tend to be more literate
than US citizens?
• NO: You can’t assume an individual-level correlation!
– The correlation at individual level is actually negative
– But: immigrants settled in states with high levels of literacy –
yielding a correlation in aggregate statistics.
OLS Approaches
• Another option: Just use OLS regression
• Allows you to focus on lower-level units
– No need for aggregation
• Ex: Just analyze individuals as the unit of analysis,
ignoring clustering among schools
• Include independent variables measured at the
individual-level and other levels
• Problems:
• 1. Violates OLS assumptions (see below)
• 2. OLS can’t take full advantage of richness of multilevel
data
– Ex: Complex variation in intercepts, slopes across groups.
Multilevel Data: Problems
• Issue: Multilevel data often results in violations
of OLS regression assumptions
• OLS requires an independent random sample…
• Students from the same class (or school) are not
independent… and may have correlated error
– If you don’t control for sources of correlated error,
models tend to underestimate standard errors
• This leads to false rejection of H0
– “Type I Error” -- Too many asterisks in table
• This is a serious issue, as we always want to err in the
direction of conservatism
Multilevel Data: Problems
• Why might nested data have correlated error?
– Example: Student performance on a test
• Students in a given classroom may share & experience
common (unobserved) characteristics
• Ex: Maybe the classroom is too dark, causing all
students to perform poorly on tests
– If all those students score poorly, they fall below the
regression line… and have negative error
– But OLS regression requires that error be “random”
– Within-class error should be random, not consistently negative
– Other sources of within-class (or school) error
• An especially good teacher; poor school funding
• Other ideas?
Multilevel Data: Problems
• Sources of correlated error within groups
– Ex: Cross-national study of homelessness
• People in welfare states have a common unobserved
characteristic: access to generous benefits
– Ex: Study of worker efficiency in workgroups
• Group members may influence each other (peer
pressure) leading to group commonalities.
Multilevel Data: Problems
• When is multilevel data NOT a problem?
– Answer: If you can successfully control for
potential sources of correlated error
• Add a control to OLS model for: classroom, school,
and state characteristics that would be sources of
correlated error in each group
• Ex: Teacher quality, class size, budget, etc…
• But: We often can’t identify or measure all
relevant sources of correlated error
• Thus, we need to abandon simple OLS regression and
try other approaches.
Example: Pro-environmental values
• Source: World Values Survey (~26 countries)
. reg supportenv age male dmar demp educ incomerel ses
      Source |       SS       df       MS              Number of obs =   27807
-------------+------------------------------           F(  7, 27799) =  104.06
       Model |  2761.86228     7  394.551755           Prob > F      =  0.0000
    Residual |  105404.878 27799  3.79167876           R-squared     =  0.0255
-------------+------------------------------           Adj R-squared =  0.0253
       Total |   108166.74 27806  3.89005036           Root MSE      =  1.9472

------------------------------------------------------------------------------
  supportenv |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |  -.0021927    .000803    -2.73   0.006    -.0037666   -.0006187
        male |   .0960975   .0236758     4.06   0.000     .0496918    .1425032
        dmar |   .0959759     .02527     3.80   0.000     .0464455    .1455063
        demp |  -.1226363   .0254293    -4.82   0.000     -.172479   -.0727937
        educ |   .1117587   .0058261    19.18   0.000     .1003393    .1231781
   incomerel |   .0131716   .0056011     2.35   0.019     .0021931    .0241501
         ses |   .0922855   .0134349     6.87   0.000     .0659525    .1186186
       _cons |   5.742023   .0518026   110.84   0.000     5.640487    5.843559
------------------------------------------------------------------------------
Robust Standard Errors
• Strategy #1: Improve our estimates of the
standard errors
– Option 1: Robust Standard Errors
• reg y x1 x2 x3, vce(robust)
• The Huber / White / “Sandwich” estimator
• An alternative method of computing standard errors
that is robust to a variety of assumption violations
– Provides accurate estimates in presence of heteroskedasticity
• Also, robust to model misspecification
– Note: Freedman’s criticism: What good are accurate SEs if
coefficients are biased due to poor specification?
• Doesn’t fix the clustered error problem…
Example: Pro-environmental values
• Robust Standard Errors
. reg supportenv age male dmar demp educ incomerel ses, robust
Linear regression                                      Number of obs =   27807
                                                       F(  7, 27799) =  102.48
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.0255
                                                       Root MSE      =  1.9472

------------------------------------------------------------------------------
             |               Robust
  supportenv |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |  -.0021927   .0008113    -2.70   0.007    -.0037829   -.0006024
        male |   .0960975   .0237017     4.05   0.000      .049641     .142554
        dmar |   .0959759    .025602     3.75   0.000     .0457948     .146157
        demp |  -.1226363   .0251027    -4.89   0.000    -.1718388   -.0734339
        educ |   .1117587   .0057498    19.44   0.000     .1004888    .1230286
   incomerel |   .0131716   .0056017     2.35   0.019      .002192    .0241513
         ses |   .0922855   .0135905     6.79   0.000     .0656474    .1189237
       _cons |   5.742023   .0527496   108.85   0.000     5.638631    5.845415
------------------------------------------------------------------------------
Standard errors shift a tiny bit… fairly similar to OLS
in this case
Robust Cluster Standard Errors
• Option 2: “Robust cluster” standard errors
– An extension of robust SEs to address clustering
• reg y x1 x2 x3, vce(cluster groupid)
– Note: Cluster implies robust (vs. regular SEs)
• It is easy to adapt robust standard errors to address
clustering in data; See:
– http://www.stata.com/support/faqs/stat/robust_ref.html
– http://www.stata.com/support/faqs/stat/cluster.html
• Result: SE estimates typically increase, which is
appropriate because non-independent cases aren’t
providing as much information compared to a sample of
independent cases.
Example: Pro-environmental values
• Robust Cluster Standard Errors
. reg supportenv age male dmar demp educ incomerel ses, cluster(country)
Linear regression                                      Number of obs =   27807
Number of clusters (country) = 26                      F(  7,    25) =   12.94
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.0255
                                                       Root MSE      =  1.9472

------------------------------------------------------------------------------
             |               Robust
  supportenv |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |  -.0021927   .0017599    -1.25   0.224    -.0058172    .0014319
        male |   .0960975   .0341053     2.82   0.009     .0258564    .1663386
        dmar |   .0959759   .0722285     1.33   0.196    -.0527815    .2447333
        demp |  -.1226363   .0820805    -1.49   0.148    -.2916842    .0464115
        educ |   .1117587   .0301004     3.71   0.001     .0497658    .1737515
   incomerel |   .0131716   .0260334     0.51   0.617    -.0404452    .0667885
         ses |   .0922855   .0405742     2.27   0.032     .0087214    .1758496
       _cons |   5.742023   .2451109    23.43   0.000     5.237208    6.246838
------------------------------------------------------------------------------
Cluster standard errors really change the picture.
Several variables lose statistical significance.
Dummy Variables
• Another solution to correlated error within
groups/clusters: Add dummy variables
• Include a dummy variable for each Level-2 group, to
explicitly model variance in means
• A simple version of a “fixed effects” model (see below)
• Ex: Student achievement; data from 3 classes
• Level 1: students; Level 2: classroom
• Create dummy variables for each class
– Include all but one dummy variable in the model
– Or include all dummies and suppress the intercept
$Y_i = \alpha + \beta_1 D_{Class2} + \beta_2 D_{Class3} + \beta_3 X_i + \varepsilon_i$
Dummy Variables
• What is the consequence of adding group
dummy variables?
• A separate intercept is estimated for each group
• Correlated error is absorbed into intercept
– Groups won’t systematically fall above or below the regression
line
• In fact, all “between group” variation (not just error) is
absorbed into the intercept
– Thus, other variables are really just looking at within group
effects
– This can be good or bad, depending on your goals.
Dummy Variables
• Note: You can create a set of dummy
variables in stata as follows:
• xi i.classid – creates dummy variables for each
unique value of the variable “classid”
– Creates variables named _Iclassid_1, _Iclassid_2, etc.
• These dummies can be added to the analysis by
specifying the variable: _Iclassid*
• Ex: reg y x1 x2 x3 _Iclassid*, nocons
– “nocons” removes the constant, allowing you to use a full set
of dummies. Alternately, you could drop one dummy.
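For the country example on the next slide, the commands would look roughly like this (the xi step is implied by the notes above; only the final reg line appears on the next slide):

  * create dummy variables _Icountry_* for each country code
  xi i.country

  * dummy variable model; one country is the omitted (base) category, so the constant stays in
  reg supportenv age male dmar demp educ incomerel ses _Icountry*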
Example: Pro-environmental values
• Dummy variable model
. reg supportenv age male dmar demp educ incomerel ses _Icountry*
      Source |       SS       df       MS              Number of obs =   27807
-------------+------------------------------           F( 32, 27774) =   98.50
       Model |  11024.1401    32  344.504377           Prob > F      =  0.0000
    Residual |  97142.6001 27774  3.49760928           R-squared     =  0.1019
-------------+------------------------------           Adj R-squared =  0.1009
       Total |   108166.74 27806  3.89005036           Root MSE      =  1.8702

------------------------------------------------------------------------------
  supportenv |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |  -.0038917   .0008158    -4.77   0.000    -.0054906   -.0022927
        male |   .0979514   .0229672     4.26   0.000     .0529346    .1429683
        dmar |   .0024493   .0252179     0.10   0.923     -.046979    .0518777
        demp |  -.0733992   .0252937    -2.90   0.004    -.1229761   -.0238223
        educ |   .0856092   .0061574    13.90   0.000     .0735404     .097678
   incomerel |   .0088841   .0059384     1.50   0.135    -.0027554    .0205237
         ses |   .1318295   .0134313     9.82   0.000     .1055036    .1581554
_Icountry_32 |  -.4775214    .085175    -5.61   0.000    -.6444687   -.3105742
_Icountry_50 |   .3943565   .0844248     4.67   0.000     .2288798    .5598332
_Icountry_70 |   .1696262   .0865254     1.96   0.050     .0000321    .3392203
             |        … dummies omitted …
_Icountr~891 |    .243995   .0802556     3.04   0.002       .08669    .4012999
       _cons |   5.848789    .082609    70.80   0.000     5.686872    6.010707
------------------------------------------------------------------------------
Dummy Variables
• Benefits of the dummy variable approach
• It is simple
– Just estimate a different intercept for each group
• sometimes the dummy interpretations can be of interest
• Weaknesses
• Cumbersome if you have many groups
• Uses up lots of degrees of freedom (not parsimonious)
• Makes it hard to look at other kinds of group dummies
– Non-varying group variables = collinear with dummies
• Can be problematic if your main interest is to study effects of
variables across groups
– Dummies purge that variation… focus on within-group variation
– If there isn’t much within group variation, there isn’t much to analyze
– Related point: fixed effects can amplify noise (e.g., in panel data).
Dummy Variables
• Note: Dummy variables are a simple example
of a “fixed effects” model (FEM)
• Effect of each group is modeled as a “fixed effect”
rather than a random variable
• Also can be thought of as the “within-group” estimator
– Looks purely at variation within groups
– Stata can do a Fixed Effects Model without the
effort of using all the dummy variables
• Simply request the “fixed effects” estimator in xtreg.
Fixed Effects Model (FEM)
• Fixed effects model:
$Y_{ij} = \alpha_j + \beta X_{ij} + \varepsilon_{ij}$
• For i cases within j groups
• Therefore $\alpha_j$ is a separate intercept for each group
• It is equivalent to looking solely at within-group variation:
$Y_{ij} - \bar{Y}_j = \beta (X_{ij} - \bar{X}_j) + (\varepsilon_{ij} - \bar{\varepsilon}_j)$
• X-bar-sub-j is mean of X for group j, etc
• Model is “within group” because all variables are
centered around mean of each group.
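A minimal sketch of the within transformation done by hand, assuming the grouping variable is country; the slope matches xtreg, fe, though the standard errors differ slightly because plain OLS does not adjust the degrees of freedom for the absorbed group means:

  * group-mean center the outcome and a predictor
  egen mean_supportenv = mean(supportenv), by(country)
  egen mean_educ = mean(educ), by(country)
  gen dev_supportenv = supportenv - mean_supportenv
  gen dev_educ = educ - mean_educ

  * OLS on the deviations = the “within group” estimator
  reg dev_supportenv dev_educ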
Fixed Effects Model (FEM)
. xtreg supportenv age male dmar demp educ incomerel ses, i(country) fe
Fixed-effects (within) regression               Number of obs      =     27807
Group variable (i): country                     Number of groups   =        26

R-sq:  within  = 0.0220                         Obs per group: min =       511
       between = 0.0368                                        avg =    1069.5
       overall = 0.0239                                        max =      2154

                                                F(7,27774)         =     89.23
corr(u_i, Xb)  = 0.0213                         Prob > F           =    0.0000

------------------------------------------------------------------------------
  supportenv |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |  -.0038917   .0008158    -4.77   0.000    -.0054906   -.0022927
        male |   .0979514   .0229672     4.26   0.000     .0529346    .1429683
        dmar |   .0024493   .0252179     0.10   0.923     -.046979    .0518777
        demp |  -.0733992   .0252937    -2.90   0.004    -.1229761   -.0238223
        educ |   .0856092   .0061574    13.90   0.000     .0735404     .097678
   incomerel |   .0088841   .0059384     1.50   0.135    -.0027554    .0205237
         ses |   .1318295   .0134313     9.82   0.000     .1055036    .1581554
       _cons |   5.878524    .052746   111.45   0.000     5.775139    5.981908
-------------+----------------------------------------------------------------
     sigma_u |  .55408807
     sigma_e |  1.8701896
         rho |  .08069488   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0:     F(25, 27774) =    94.49            Prob > F = 0.0000

Note: coefficients are identical to the dummy variable model!
ANOVA: A Digression
• Suppose you wish to model variable Y for j
groups (clusters)
• Ex: Wages for different racial groups
• Definitions:
• The grand mean is the mean of all groups
– Y-bar
• The group mean is the mean of a particular sub-group
of the population
– Y-bar-sub-j
ANOVA: Concepts & Definitions
• Y is the dependent variable
• We are looking to see if Y depends upon the particular
group a person is in
• The effect of a group is the difference
between a group’s mean & the grand mean
• Effect is denoted by alpha ($\alpha$)
• If Y-bar = $8.75 and Y-bar for Group 1 = $8.90, then the Group 1 effect is $0.15
• Effect of being in group j is:
$\alpha_j = \bar{Y}_j - \bar{Y}$
• It is like a deviation, but for a group.
ANOVA: Concepts & Definitions
• ANOVA is based on partitioning deviation
• We initially calculated deviation as the distance of a point from the grand mean:
$d_i = Y_i - \bar{Y}$
• But, you can also think of deviation from a group mean (called “e”):
$e_{i,Group1} = Y_{i,Group1} - \bar{Y}_{Group1}$
• Or, for any case i in group j:
$e_{ij} = Y_{ij} - \bar{Y}_j$
ANOVA: Concepts & Definitions
• The location of any case is determined by:
• The Grand Mean, $\mu$, common to all cases
• The group “effect” $\alpha_j$, common to members of group j
• The distance between a group and the grand mean
• “Between group” variation
• The within-group deviation (e): called “error”
• The distance from the group mean to a case’s value
The ANOVA Model
• This is the basis for a formal model:
• For any population with mean $\mu$
• Comprised of J subgroups, Nj in each group
• Each with a group effect $\alpha_j$
• The location of any individual can be expressed as follows:
$Y_{ij} = \mu + \alpha_j + e_{ij}$
• Yij refers to the value of case i in group j
• eij refers to the “error” (i.e., deviation from
group mean) for case i in group j
Sum of Squared Deviation
• We are most interested in two parts of model
• The group effects: $\alpha_j$
• Deviation of the group from the grand mean
• Individual case error: $e_{ij}$
• Deviation of the individual from the group mean
• Each are deviations that can be summed up
• Remember, we square deviations when summing
• Otherwise, they add up to zero
• Remember variance is just squared deviation
Sum of Squared Deviation
• The total deviation can be partitioned into $\alpha_j$ and $e_{ij}$ components:
• That is, $\alpha_j$ + $e_{ij}$ = total deviation:
$e_{ij} = Y_{ij} - \bar{Y}_j$
$\alpha_j = \bar{Y}_j - \bar{Y}$
$\alpha_j + e_{ij} = (\bar{Y}_j - \bar{Y}) + (Y_{ij} - \bar{Y}_j) = Y_{ij} - \bar{Y}$
Sum of Squared Deviation
• The total deviation can be partitioned into $\alpha_j$ and $e_{ij}$ components:
• The total variance (SStotal) is made up of:
– $\alpha_j$ : between group variance (SSbetween)
– $e_{ij}$ : within group variance (SSwithin)
– SStotal = SSbetween + SSwithin
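In symbols, summing squared deviations over all i cases in all j groups gives the standard ANOVA identity:

$\sum_j \sum_i (Y_{ij} - \bar{Y})^2 = \sum_j N_j (\bar{Y}_j - \bar{Y})^2 + \sum_j \sum_i (Y_{ij} - \bar{Y}_j)^2$

that is, SStotal = SSbetween + SSwithin.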
ANOVA & Fixed Effects
• Note that the ANOVA model is similar to the
fixed effects model
• But FEM also includes a $\beta X$ term to model the linear trend
ANOVA
$Y_{ij} = \mu + \alpha_j + e_{ij}$
Fixed Effects Model
$Y_{ij} = \alpha_j + \beta X_{ij} + \varepsilon_{ij}$
• In fact, if you don’t specify any X variables, they are pretty much the same
Within Group & Between Group
Models
• Group-effect dummy variables in a regression model create a specific estimate of group effects for all cases
• βs & error are based on the remaining “within group” variation
• We could do the opposite: ignore within-group variation and just look at differences between groups
• Stata’s xtreg command can do this, too
• This is essentially just modeling group means!
Between Group Model
. xtreg supportenv age male dmar demp educ incomerel ses, i(country) be
Between regression (regression on group means)  Number of obs      =        27
Group variable (i): country                     Number of groups   =        27

R-sq:  within  =      .                         Obs per group: min =         1
       between = 0.2505                                        avg =       1.0
       overall = 0.2505                                        max =         1

                                                F(7,19)            =      0.91
sd(u_i + avg(e_i.)) =  .6378002                 Prob > F           =    0.5216

------------------------------------------------------------------------------
  supportenv |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .0211517   .0391649     0.54   0.595    -.0608215    .1031248
        male |   3.966173   4.479358     0.89   0.387    -5.409232    13.34158
        dmar |   .8001333   1.127099     0.71   0.486    -1.558913     3.15918
        demp |  -.0571511   1.165915    -0.05   0.961    -2.497439    2.383137
        educ |   .3743473   .2098779     1.78   0.090    -.0649321    .8136268
   incomerel |    .148134   .1687438     0.88   0.391    -.2050508    .5013188
         ses |  -.4126738   .4916416    -0.84   0.412    -1.441691    .6163439
       _cons |   2.031181   3.370978     0.60   0.554    -5.024358     9.08672
------------------------------------------------------------------------------
Note: Results are identical to the aggregated
analysis… Note that N is reduced to 27
Fixed vs. Random Effects
• Dummy variables produce a “fixed” estimate
of the intercept for each group
• But, models don’t need to be based on fixed effects
• Example: The error term (ei)
• We could estimate a fixed value for all cases
– This would use up lots of degrees of freedom – even more
than using group dummies
• In fact, we would use up ALL degrees of freedom
– Stata output would simply report back the raw data (expressed
as deviations from the constant)
• Instead, we model e as a random variable
– We assume it is normal, with standard deviation sigma.
Random Effects
• A simple random intercept model
– Notation from Rabe-Hesketh & Skrondal 2005, p. 4-5
Random Intercept Model
$Y_{ij} = \beta_0 + \zeta_j + \varepsilon_{ij}$
• Where $\beta_0$ is the main intercept
• Zeta ($\zeta_j$) is a random effect for each group
– Allowing each of j groups to have its own intercept
– Assumed to be independent & normally distributed
• Error (e) is the error term for each case
– Also assumed to be independent & normally distributed
• Note: Other texts refer to random intercepts as $u_j$ or $\eta_j$.
Random Effects
• Issue: The dummy variable approach
(ANOVA, FEM) treats group differences as a
fixed effect
• Alternatively, we can treat it as a random effect
• Don’t estimate values for each case, but model it
• This requires making assumptions
– e.g., that group differences are normally distributed with a
standard deviation that can be estimated from data.
Linear Random Intercepts Model
• The random intercept idea can be applied to
linear regression
• Often called a “random effects” model…
• Result is similar to FEM, BUT:
• FEM looks only at within-group effects
• Aggregate models (“between effects”) look across groups
– Random effects models are a hybrid: a weighted average of between & within group effects
• It exploits between & within information, and thus can
be more efficient than FEM & aggregate models.
– IF distributional assumptions are correct.
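One standard way to see the “weighted average” point (a textbook result, not from the original slides) is the quasi-demeaned form of the random effects GLS estimator. OLS is run on partially demeaned data:

$Y_{ij} - \hat{\theta}_j \bar{Y}_j = \beta_0 (1 - \hat{\theta}_j) + \beta (X_{ij} - \hat{\theta}_j \bar{X}_j) + v_{ij}$, where $\hat{\theta}_j = 1 - \sqrt{\sigma_{\varepsilon}^2 / (\sigma_{\varepsilon}^2 + n_j \sigma_u^2)}$

As $\hat{\theta}_j$ approaches 1 this becomes the within (fixed effects) estimator; as $\hat{\theta}_j$ approaches 0 it becomes pooled OLS.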
Linear Random Intercepts Model
. xtreg supportenv age male dmar demp educ incomerel ses, i(country) re
Random-effects GLS regression                   Number of obs      =     27807
Group variable (i): country                     Number of groups   =        26

R-sq:  within  = 0.0220                         Obs per group: min =       511
       between = 0.0371                                        avg =    1069.5
       overall = 0.0240                                        max =      2154

Random effects u_i ~ Gaussian                   Wald chi2(7)       =    625.50
corr(u_i, X)   = 0 (assumed)                    Prob > chi2        =    0.0000

Assumes normal u_j, uncorrelated with the X vars

------------------------------------------------------------------------------
  supportenv |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |  -.0038709   .0008152    -4.75   0.000    -.0054688   -.0022731
        male |   .0978732   .0229632     4.26   0.000     .0528661    .1428802
        dmar |   .0030441   .0252075     0.12   0.904    -.0463618      .05245
        demp |  -.0737466   .0252831    -2.92   0.004    -.1233007   -.0241926
        educ |   .0857407   .0061501    13.94   0.000     .0736867    .0977947
   incomerel |   .0090308   .0059314     1.52   0.128    -.0025945    .0206561
         ses |    .131528   .0134248     9.80   0.000     .1052158    .1578402
       _cons |   5.924611   .1287468    46.02   0.000     5.672272     6.17695
-------------+----------------------------------------------------------------
     sigma_u |  .59876138
     sigma_e |  1.8701896
         rho |  .09297293   (fraction of variance due to u_i)
------------------------------------------------------------------------------

sigma_u = SD of u (the random intercepts); sigma_e = SD of e; rho = intra-class correlation
Linear Random Intercepts Model
• Notes: Model can also be estimated with
maximum likelihood estimation (MLE)
• Stata:
xtreg y x1 x2 x3, i(groupid) mle
– Versus “re”, which specifies weighted least squares estimator
• Results tend to be similar
• But, MLE results include a formal test to see whether
intercepts really vary across groups
– Significant p-value indicates that intercepts vary
. xtreg supportenv age male dmar demp educ incomerel ses, i(country) mle
Random-effects ML regression                    Number of obs      =     27807
Group variable (i): country                     Number of groups   =        26

… MODEL RESULTS OMITTED …

------------------------------------------------------------------------------
             |      Coef.   Std. Err.                    [95% Conf. Interval]
-------------+----------------------------------------------------------------
    /sigma_u |   .5397755   .0758087                      .4098891    .7108206
    /sigma_e |   1.869954   .0079331                       1.85447    1.885568
         rho |   .0769142    .019952                      .0448349    .1240176
------------------------------------------------------------------------------
Likelihood-ratio test of sigma_u=0: chibar2(01) = 2128.07  Prob >= chibar2 = 0.000
Choosing Models
• Which model is best?
• There is much discussion (e.g., Halaby 2004)
• Fixed effects are most consistent under a
wide range of circumstances
• Consistent: Estimates approach true parameter values
as N grows very large
• But, they are less efficient than random effects
– In cases with low within-group variation (big between group
variation) and small sample size, results can be very poor
– Random Effects = more efficient
• But, runs into problems if specification is poor
– Esp. if X variables correlate with random group effects
– Usually due to omitted variables.
Hausman Specification Test
• Hausman Specification Test: A tool to help
evaluate fit of fixed vs. random effects
• Logic: Both fixed & random effects models are
consistent if models are properly specified
• However, some model violations cause random effects
models to be inconsistent
– Ex: if X variables are correlated to random error
• In short: Models should give the same results… If not,
random effects may be biased
– If results are similar, use the most efficient model: random
effects
– If results diverge, odds are that the random effects model is
biased. In that case use fixed effects…
Hausman Specification Test
• Strategy: Estimate both fixed & random
effects models
• Save the estimates each time
• Finally invoke Hausman test
– Ex:
• xtreg var1 var2 var3, i(groupid) fe
• estimates store fixed
• xtreg var1 var2 var3, i(groupid) re
• estimates store random
• hausman fixed random
Hausman Specification Test
• Example: Environmental attitudes fe vs re
. hausman fixed random
Direct comparison of coefficients…

                      ---- Coefficients ----
             |      (b)          (B)           (b-B)     sqrt(diag(V_b-V_B))
             |     fixed        random       Difference          S.E.
-------------+------------------------------------------------------------------
         age |   -.0038917    -.0038709       -.0000207        .0000297
        male |    .0979514     .0978732        .0000783        .0004277
        dmar |    .0024493     .0030441       -.0005948        .0007222
        demp |   -.0733992    -.0737466        .0003475        .0007303
        educ |    .0856092     .0857407       -.0001314        .0002993
   incomerel |    .0088841     .0090308       -.0001467        .0002885
         ses |    .1318295      .131528        .0003015        .0004153
------------------------------------------------------------------------------
              b = consistent under Ho and Ha; obtained from xtreg
              B = inconsistent under Ha, efficient under Ho; obtained from xtreg

    Test:  Ho:  difference in coefficients not systematic

                 chi2(7) = (b-B)'[(V_b-V_B)^(-1)](b-B)
                         =     2.70
               Prob>chi2 =   0.9116

The non-significant p-value indicates that the two models yield similar results…
Within & Between Effects
• Issue: What is the relationship between
within-group effects and between-group
effects?
• FEM models within-group variation
• BEM models between group variation (aggregate)
– Usually they are similar
• Ex: Student skills & test performance
• Within any classroom, skilled students do best on tests
• Between classrooms, classes with more skilled
students have higher mean test scores
– BUT…
Within & Between Effects
• But: Between and within effects can differ!
• Ex: Effects of wealth on attitudes toward welfare
• At the country level (between groups):
– Wealthier countries (high aggregate mean) tend to have pro-welfare attitudes (ex: Scandinavia)
• At the individual level (within group)
– Wealthier people are conservative, don’t support welfare
• Result: Wealth has opposite between vs within effects!
– Watch out for ecological fallacy!!!
– Issue: Such dynamics often result from omitted
level-1 variables (omitted variable bias)
• Ex: If we control for individual “political conservatism”,
effects may be consistent at both levels…
Within & Between Effects / Centering
• Multilevel models & “centering” variables
• Grand mean centering: computing variables
as deviations from overall mean
• Often done to X variables
• Has effect that baseline constant in model reflects
mean of all cases
– Useful for interpretation
• Group mean centering: computing variables
as deviation from group mean
• Useful for decomposing within vs. between effects
• Often in conjunction with aggregate group mean vars.
Within & Between Effects
• You can estimate BOTH within- and between-group effects in a single model
• Strategy: Split a variable (e.g., SES) into two new
variables…
– 1. Group mean SES
– 2. Within-group deviation from mean SES
» Often called “group mean centering”
• Then, put both variables into a random effects model
• Model will estimate separate coefficients for between
vs. within effects
– Ex:
• egen meanvar1 = mean(var1), by(groupid)
• gen withinvar1 = var1 - meanvar1
• Include mean (aggregate) & within variable in model.
Within & Between Effects
• Example: Pro-environmental attitudes
. xtreg supportenv meanage withinage male dmar demp educ incomerel ses,
i(country) mle
Random-effects ML regression                    Number of obs      =     27807
Group variable (i): country                     Number of groups   =        26

Random effects u_i ~ Gaussian                   Obs per group: min =       511
                                                               avg =    1069.5
                                                               max =      2154

                                                LR chi2(8)         =    620.41
Log likelihood = -56918.299                     Prob > chi2        =    0.0000

Between & within effects are opposite: older countries are MORE environmental,
but older people are LESS. Omitted variables? Wealthy European countries with
strong green parties have older populations!

------------------------------------------------------------------------------
  supportenv |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     meanage |   .0268506   .0239453     1.12   0.262    -.0200812    .0737825
   withinage |   -.003903   .0008156    -4.79   0.000    -.0055016   -.0023044
        male |   .0981351   .0229623     4.27   0.000     .0531299    .1431403
        dmar |    .003459   .0252057     0.14   0.891    -.0459432    .0528612
        demp |  -.0740394     .02528    -2.93   0.003    -.1235873   -.0244914
        educ |   .0856712   .0061483    13.93   0.000     .0736207    .0977216
   incomerel |    .008957   .0059298     1.51   0.131    -.0026651    .0205792
         ses |    .131454   .0134228     9.79   0.000     .1051458    .1577622
       _cons |   4.687526   .9703564     4.83   0.000     2.785662     6.58939
------------------------------------------------------------------------------
Generalizing: Random Coefficients
• Linear random intercept model allows random
variation in intercept (mean) for groups
• But, the same idea can be applied to other coefficients
• That is, slope coefficients can ALSO be random!
Random Coefficient Model
$Y_{ij} = \beta_1 + \zeta_{1j} + \beta_2 X_{ij} + \zeta_{2j} X_{ij} + \varepsilon_{ij}$
Which can be written as:
$Y_{ij} = (\beta_1 + \zeta_{1j}) + (\beta_2 + \zeta_{2j}) X_{ij} + \varepsilon_{ij}$
• Where zeta-1 ($\zeta_{1j}$) is a random intercept component
• Zeta-2 ($\zeta_{2j}$) is a random slope component.
Linear Random Coefficient Model
[Figure: fitted lines showing both intercepts and slopes varying randomly across j groups (Rabe-Hesketh & Skrondal 2004, p. 63)]
Random Coefficients Summary
• Some things to remember:
• Dummy variables allow fixed estimates of intercepts
across groups
• Interactions allow fixed estimates of slopes across
groups
– Random coefficients allow intercepts and/or
slopes to have random variability
• The model does not directly estimate those effects
– Just as we don’t estimate coefficients of “e” for each case…
• BUT, random components can be predicted after you
run a model
– Just as you can compute residuals – random error
– This allows you to examine some assumptions (normality).
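A sketch of how those predictions can be obtained after xtmixed (new variable names are arbitrary; see the xtmixed postestimation help for the ordering of the predicted effects):

  * random intercept + random educ slope
  xtmixed supportenv educ || country: educ, mle

  * predict the group-level random effects (BLUPs): the educ slope deviation, then the intercept deviation
  predict re_educ re_cons, reffects

  * eyeball the predicted random intercepts to check the normality assumption
  histogram re_cons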
STATA Notes: xtreg, xtmixed
• xtreg – allows estimation of between, within
(fixed), and random intercept models
• xtreg y x1 x2 x3, i(groupid) fe  - fixed (within) model
• xtreg y x1 x2 x3, i(groupid) be  - between model
• xtreg y x1 x2 x3, i(groupid) re  - random intercept (GLS)
• xtreg y x1 x2 x3, i(groupid) mle - random intercept (MLE)
• xtmixed – allows random slopes & coefs
• “Mixed” models refer to models that have both fixed and
random components
• xtmixed [depvar] [fixed equation] || [random eq], options
• Ex: xtmixed y x1 x2 x3 || groupid: x2
– Random intercept is assumed. Random coef for X2 specified.
STATA Notes: xtreg, xtmixed
• Random intercepts
• xtreg y x1 x2 x3, i(groupid) mle
– Is equivalent to
• xtmixed y x1 x2 x3 || groupid: , mle
• xtmixed assumes random intercept – even if no other
random effects are specified after “groupid”
– But, we can add random coefficients for all Xs:
• xtmixed y x1 x2 x3 || groupid: x1 x2 x3 , mle cov(unstr)
– Useful to add: “cov(unstructured)”
• Stata default treats random terms (intercept, slope) as
totally uncorrelated… not always reasonable
• “cov(unstr)” relaxes constraints regarding covariance among random effects (see Rabe-Hesketh & Skrondal).
STATA Notes: GLLAMM
• Note: xtmixed can do a lot… but GLLAMM
can do even more!
• “General linear & latent mixed models”
• Must be downloaded into stata. Type “search gllamm”
and follow instructions to install…
– GLLAMM can do a wide range of mixed & latent-variable models
• Multilevel models; Some kinds of latent class models;
Confirmatory factor analysis; Some kinds of Structural
Equation Models with latent variables… and others…
• Documentation available via Stata help
– And, in the Rabe-Hesketh & Skrondal text.
Random intercepts: xtmixed
• Example: Pro-environmental attitudes
. xtmixed supportenv age male dmar demp educ incomerel ses || country: , mle
Mixed-effects ML regression                     Number of obs      =     27807
Group variable: country                         Number of groups   =        26

                                                Obs per group: min =       511
                                                               avg =    1069.5
                                                               max =      2154

                                                Wald chi2(7)       =    625.75
Log likelihood = -56919.098                     Prob > chi2        =    0.0000

------------------------------------------------------------------------------
  supportenv |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |  -.0038662   .0008151    -4.74   0.000    -.0054638   -.0022687
        male |   .0978558   .0229613     4.26   0.000     .0528524    .1428592
        dmar |   .0031799   .0252041     0.13   0.900    -.0462193    .0525791
        demp |  -.0738261   .0252797    -2.92   0.003    -.1233734   -.0242788
        educ |   .0857707   .0061482    13.95   0.000     .0737204     .097821
   incomerel |   .0090639   .0059295     1.53   0.126    -.0025578    .0206856
         ses |   .1314591   .0134228     9.79   0.000     .1051509    .1577674
       _cons |   5.924237    .118294    50.08   0.000     5.692385    6.156089
------------------------------------------------------------------------------
[remainder of output cut off]

Note: xtmixed yields identical results to xtreg, mle
Random intercepts: xtmixed
• Ex: Pro-environmental attitudes (cont’d)
------------------------------------------------------------------------------
  supportenv |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |  -.0038662   .0008151    -4.74   0.000    -.0054638   -.0022687
        male |   .0978558   .0229613     4.26   0.000     .0528524    .1428592
        dmar |   .0031799   .0252041     0.13   0.900    -.0462193    .0525791
        demp |  -.0738261   .0252797    -2.92   0.003    -.1233734   -.0242788
        educ |   .0857707   .0061482    13.95   0.000     .0737204     .097821
   incomerel |   .0090639   .0059295     1.53   0.126    -.0025578    .0206856
         ses |   .1314591   .0134228     9.79   0.000     .1051509    .1577674
       _cons |   5.924237    .118294    50.08   0.000     5.692385    6.156089
------------------------------------------------------------------------------

------------------------------------------------------------------------------
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
country: Identity            |
                   sd(_cons) |   .5397758   .0758083      .4098899    .7108199
-----------------------------+------------------------------------------------
                sd(Residual) |   1.869954   .0079331       1.85447    1.885568
------------------------------------------------------------------------------
LR test vs. linear regression: chibar2(01) = 2128.07  Prob >= chibar2 = 0.0000

xtmixed output puts all random effects below the main coefficients. Here, they are
“_cons” (the constant) for groups defined by “country”, plus the residual (e).
A non-zero sd(_cons) indicates that intercepts vary across groups.
Random Coefficients: xtmixed
• Ex: Pro-environmental attitudes (cont’d)
. xtmixed supportenv age male dmar demp educ incomerel ses || country: educ, mle
[output omitted]

------------------------------------------------------------------------------
  supportenv |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |  -.0035122   .0008185    -4.29   0.000    -.0051164    -.001908
        male |   .1003692   .0229663     4.37   0.000     .0553561    .1453824
        dmar |   .0001061   .0252275     0.00   0.997    -.0493388     .049551
        demp |  -.0722059   .0253888    -2.84   0.004     -.121967   -.0224447
        educ |    .081586   .0115479     7.07   0.000     .0589526    .1042194
   incomerel |    .008965   .0060119     1.49   0.136    -.0028181    .0207481
         ses |   .1311944   .0134708     9.74   0.000     .1047922    .1575966
       _cons |   5.931294    .132838    44.65   0.000     5.670936    6.191652
------------------------------------------------------------------------------

------------------------------------------------------------------------------
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
country: Independent         |
                    sd(educ) |   .0484399   .0087254      .0340312    .0689492
                   sd(_cons) |   .6179026   .0898918      .4646097     .821773
-----------------------------+------------------------------------------------
                sd(Residual) |    1.86651   .0079227      1.851046    1.882102
------------------------------------------------------------------------------
LR test vs. linear regression:  chi2(2) = 2187.33   Prob > chi2 = 0.0000

Here, we have allowed the slope of educ to vary randomly across countries.
The educ slope varies, too (non-zero sd(educ))!
Random Coefficients: xtmixed
• What if the random intercept or slope
coefficients aren’t significantly different from
zero?
• Answer: that means there isn’t much random variability
in the slope/intercept
• Conclusion: You don’t need to specify that random
parameter
– Also: Models include a LRtest to compare with a
simple OLS model (no random effects)
• If models don’t differ (Chi-square is not significant) stick
with a simpler model.
Random Coefficients: xtmixed
• What are random coefficients doing?
• Let’s look at results from a simplified model
– Only random slope & intercept for education
[Figure: fitted regression lines for each country; x-axis = highest educational level attained. The model fits a different slope & intercept for each group!]
Random Coefficients
• Why bother with random coefficients?
– 1. A solution for clustering (non-independence)
– Usually people just use random intercepts, but slopes may be
an issue also
– 2. You can create a better-fitting model
– If slopes & intercepts vary, a random coefficient model may fit
better
– Assuming distributional assumptions are met
– Model fit compared to OLS can be tested….
– 3. Better predictions
– Attention to group-specific random effects can yield better
predictions (e.g., slopes) for each group
» Rather than just looking at “average” slope for all groups.
Random Coefficients
• 4. Multilevel models explicitly put attention on
levels of causality
• Higher level / “contextual” effects versus individual /
unit-level effects
• A technology for separating out between/within
• NOTE: this can be done w/out random effects
– But it goes hand-in-hand with clustered data…
• Note: Be sure you have enough level-2 units!
– Ex: Models of individual environmental attitudes
• Adding level-2 effects: Democracy, GDP, etc.
– Ex: Classrooms
• Is it student SES, or “contextual” class/school SES?
Multilevel Model Notation
• So far, we have expressed random effects in
a single equation:
Random Coefficient Model
$Y_{ij} = \beta_1 + \zeta_{1j} + \beta_2 X_{ij} + \zeta_{2j} X_{ij} + \varepsilon_{ij}$
• However, it is common to separate levels:
Level 1 equation:  $Y_{ij} = \beta_1 + \beta_2 X_{ij} + \varepsilon_{ij}$
Intercept equation:  $\beta_1 = \gamma_1 + u_{1j}$
Slope equation:  $\beta_2 = \gamma_2 + u_{2j}$
Gamma ($\gamma$) = constant; u = random effect
Here, we specify a random component for the level-1 constant & slope
Multilevel Model Notation
• The “separate equation” formulation is no
different from what we did before…
• But it is a vivid & clear way to present your models
• All random components are obvious because they are
stated in separate equations
• NOTE: Some software (e.g., HLM) requires this
– Rules:
• 1. Specify an OLS model, just like normal
• 2. Consider which OLS coefficients should have a
random component
– These could be the intercept or any X (slope) coefficient
• 3. Specify an additional formula for each random
coefficient… adding random components when desired
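To see that the two formulations are the same model, substitute the intercept and slope equations into the level-1 equation:

$Y_{ij} = (\gamma_1 + u_{1j}) + (\gamma_2 + u_{2j}) X_{ij} + \varepsilon_{ij} = \gamma_1 + \gamma_2 X_{ij} + u_{1j} + u_{2j} X_{ij} + \varepsilon_{ij}$

which is the single-equation random coefficient model above, with $\gamma$ playing the role of $\beta$ and $u$ playing the role of $\zeta$.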
Cross-Level Interactions
• Does context (i.e., level-2) influence the effect
of level-1 variables?
– Example: Effect of poverty on homelessness
• Does it interact with welfare state variables?
– Ex: Effect of gender on math test scores
• Is it different in coed vs. single-sex schools?
– Can you think of others?
Cross-level interactions
• Idea: specify a level-2 variable that affects a
level-1 slope
Level 1 equation:  $Y_{ij} = \beta_1 + \beta_2 X_{ij} + \varepsilon_{ij}$
Intercept equation:  $\beta_1 = \gamma_1 + u_{1j}$
Slope equation with interaction:  $\beta_2 = \gamma_2 + \gamma_3 Z_j + u_{2j}$
Cross-level interaction: the level-2 variable Z affects the slope (β2) of a level-1 X variable
Coefficient $\gamma_3$ reflects the size of the interaction (the effect on β2 per unit change in Z)
Cross-level Interactions
• Cross-level interaction in single-equation
form:
Random Coefficient Model with cross-level interaction
$Y_{ij} = \beta_1 + \zeta_{1j} + \beta_2 X_{ij} + \zeta_{2j} X_{ij} + \beta_3 X_{ij} Z_j + \varepsilon_{ij}$
– Stata strategy: manually compute cross-level
interaction variables
• Ex: Poverty*WelfareState, Gender*SingleSexSchool
• Then, put interaction variable in the “fixed” model
– Interpretation: B3 coefficient indicates the impact
of each unit change in Z on slope B2
• If B3 is positive, increase in Z results in larger B2 slope.
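One plausible way to build such variables for the example on the next slide (the construction commands are not shown in the original; income_mean is assumed here to be the country mean of incomerel):

  * country mean income, the individual deviation from it, and the cross-level interaction
  egen income_mean = mean(incomerel), by(country)
  gen income_dev = incomerel - income_mean
  gen inc_meanXeduc = income_mean * educ

  * put the interaction in the fixed part; let the income_mean slope (and intercept) vary randomly
  xtmixed supportenv age male dmar demp educ income_dev inc_meanXeduc ses || country: income_mean, mle cov(unstr)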
Cross-level Interactions
• Pro-environmental attitudes
. xtmixed supportenv age male dmar demp educ income_dev inc_meanXeduc ses ||
country: income_mean , mle cov(unstr)
Mixed-effects ML regression                     Number of obs      =     27807
Group variable: country                         Number of groups   =        26

inc_meanXeduc is the interaction between country mean income and individual-level education.

------------------------------------------------------------------------------
  supportenv |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |  -.0038786   .0008148    -4.76   0.000    -.0054756   -.0022817
        male |   .1006206   .0229617     4.38   0.000     .0556165    .1456246
        dmar |   .0041417    .025195     0.16   0.869    -.0452395    .0535229
        demp |  -.0733013   .0252727    -2.90   0.004    -.1228348   -.0237678
        educ |   -.035022   .0297683    -1.18   0.239    -.0933668    .0233227
  income_dev |   .0081591    .005936     1.37   0.169    -.0034753    .0197934
inc_meanXeduc|   .0265714   .0064013     4.15   0.000     .0140251    .0391177
         ses |   .1307931   .0134189     9.75   0.000     .1044926    .1570936
       _cons |   5.892334    .107474    54.83   0.000     5.681689    6.102979
------------------------------------------------------------------------------
Interaction: inc_meanXeduc has a positive effect… The education slope is
bigger in wealthy countries
Note: main effects change. “educ” indicates slope when inc_mean = 0
Cross-level Interactions
• Random part of output (cont’d from last slide)
. xtmixed supportenv age male dmar demp educ income_dev inc_meanXeduc ses ||
country: income_mean , mle cov(unstr)
------------------------------------------------------------------------------
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
country: Unstructured        |
                sd(income~n) |   .5419256   .2095339       .253995    1.156256
                   sd(_cons) |   2.326379   .8679172       1.11974      4.8333
        corr(income~n,_cons) |  -.9915202   .0143006      -.999692   -.7893791
-----------------------------+------------------------------------------------
                sd(Residual) |   1.869388   .0079307      1.853909    1.884997
------------------------------------------------------------------------------
LR test vs. linear regression:  chi2(3) = 2124.20   Prob > chi2 = 0.0000
Random components: the income_mean slope is allowed to have random variation, and the
intercepts (“_cons”) are allowed to have random variation.
“cov(unstr)” allows for the possibility of correlation between random slopes &
intercepts… generally a good idea.
Beyond 2-level models
• Sometimes data has 3 levels or more
• Ex: School, classroom, individual
• Ex: Family, individual, time (repeated measures)
• Can be dealt with in xtmixed, GLLAMM, HLM
• Note: stata manual doesn’t count lowest level
– What we call 3-level is described as “2-level” in stata manuals
– xtmixed syntax: specify “fixed” equation and then
random effects starting with “top” level
• xtmixed var1 var2 var3 || schoolid: var2 || classid:var3
– Again, specify unstructured covariance: cov(unstr)
Beyond Linear Models
• Stata can specify multilevel models for
dichotomous & count variables
– Random intercept models
• xtlogit – logistic regression – dichotomous
• xtpois – poisson regression – counts
• xtnbreg – negative binomial – counts
• xtgee – any family, link… w/random intercept
– Random intercept & coefficient models
– Plus, allows more than 2 levels…
• xtmelogit – mixed logit model
• xtmepoisson – mixed poisson model
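A minimal sketch of a mixed (multilevel) logit, assuming a hypothetical binary outcome named volunteer:

  * random intercept logit across countries
  xtmelogit volunteer age educ || country:

  * add a random slope for educ as well
  xtmelogit volunteer age educ || country: educ, covariance(unstructured)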
Panel Data
• Panel data is a multilevel structure
• Cases measured repeatedly over time
• Measurements are ‘nested’ within cases
[Diagram: repeated measurements T1–T5 nested within Person 1, Person 2, Person 3, and Person 4]
– Obviously, error is clustered within cases… but…
– Error may also be clustered by time
• Historical time events or life-course events may mean
that cases aren’t independent
– Ex: All T1s and all T5s
• Ex: Models of economic growth… certain periods
(e.g., Oil shocks of 1970s) affect all countries.
Panel Data
• Issue: panel data may involve clustering
across cases & time
• Good news: Stata’s “xt” commands were
made for this
• Allow specification of both ID and TIME clusters…
• Ex: xtreg var1 var2 var3, mle i(countryid) t(year)
– Note: You can also “mix and match” fixed and
random effects
• Ex: You can use dummies (manually) to deal with time-clustering, with a random effect for case ids (see the sketch below)
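A sketch of that “mix and match” idea, with hypothetical variable names: year dummies (fixed) absorb period shocks, while country intercepts are treated as random:

  * fixed effects for time, random intercept for cases
  xi i.year
  xtreg y x1 x2 _Iyear*, i(countryid) mle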
Panel Data: serial correlation
• Panel data may have another problem:
• Sequential cases may have correlated error
– Ex: Adjacent years (1950 & 1951 or 2007 & 2008) may be
very similar. Correlation denoted by “rho” (ρ)
• Called “autocorrelation” or “serial correlation”
• “Time-series” models are needed
• xtregar – xtreg, for cases in which the error-term is
“first-order autoregressive”
• First order means the prior time influences the current
– Only adjacent time-points… assumes no effect of those prior
• Can be used to estimate FEM, BEM, or GLS model
• Use option “lbi” to test for autocorrelation (rho = 0?).
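A minimal sketch with hypothetical variable names (xtset declares the panel id and time variable before xtregar):

  xtset countryid year

  * fixed effects model with AR(1) errors; lbi adds tests of serial correlation (rho = 0)
  xtregar y x1 x2, fe lbi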
Panel Data: Choosing a Model
• If clustering is mainly a nuisance:
• Adjust SEs: vce(cluster caseid)
• Or simple fixed or random effects
– Choice between fixed & random
• Fixed is “safer” – reviewers are less likely to complain
– If hausman test works, random = OK, too
• But, if cross-sectional variation is of interest, fixed can
be a problem…
– In that case, use random effects… and hope the reviewers
don’t give you grief.
Panel Data: Choosing a Model
• If you have substantive interests in cross-level
dynamics, mixed models are probably the
way to go…
• Plus, you can create a better-fitting model
– Allows you to relax the assumption that slopes are the same
across groups.