Transcript Multilevel modelling in 15 minutes
Scottish Social Survey Network: Master Class 1
Data Analysis with Stata
Dr Vernon Gayle and Dr Paul Lambert
23 rd January 2008, University of Stirling The SSSN is funded under Phase II of the ESRC Research Development Initiative
1
Multilevel data and analysis with Stata (in 15 minutes)
2
Generalised linear model
• Y = BX + e • Y = outcome variable(s) • X = explanatory variables • e = error term for each individual response
Generalised linear mixed models
–
Adding complexity to the GLM, such as by disaggregating the error structures
3
The work of statistical modelling
• Y i = BX i + e i • Most of the time: – we have a single Y – we ignore e – we concentrate on what goes into B 4
Example
• Data: British Household Panel Survey 2005 adult interviews (7k adults in work) • Y = GHQ scale score for adults in employment (General Health Questionnaire, higher = worse subjective well-being) • X = various possible measures, including gender, age, marital status, occupational advantage, education, partner ’s GHQ • You can run this example, the files are at: 5
Cons Fem Age Age-squared Cohab Own CAMSIS Father’s CAMSIS Degree/Diploma Vocational qual No qual Works > 10hrs Partner’s GHQ R2 Results from four linear models
1
11.03** -0.33*
2
6.29** 1.25** 0.22** -0.0024** -0.77**
3
6.14** 1.28** 0.23** -0.0026** -0.76** -0.01* 0.01
-0.05
-0.13
-0.11
0.13
0.0009
0.0234
0.0244
4
6.56** 1.39** 0.22** -0.0024** -1.52** -0.01
0.08** 0.0293
Some regression assumptions
All variables are measured without errors All relevant predictors of the independent variable are included in the analysis Expected value of the error is zero Heteroscedasticity of the error No autocorrelation (no relation between error terms for different cases) – [above using: Menard, S. 1995.
Applied Logistic Regression Analysis
, London: Sage.] 7
Multilevel modelling
• What if there was some connection between some of the cases within the dataset? – This occurs by design in certain projects • e.g. educational research, sample includes multiple children from the same school – Some connections (‘hierarchical clusters’) are standard in most social surveys 8
Regions Wave 1 PSU 1 . . Individuals . . Person Groups PSU 2 PSU 3 Wave 2 Wave 3 . . . . Interviewers :
W 1, 3 :
Interviewer 1
W 2 only :
Interviewer 2
. .
Interviewer 2
. .
Interviewer 3
. .
.
.
Interviewer 3 Interviewer 1
9
How to account for hierarchy / clustering in individual data? 1. We could try a unique dummy var. for every cluster – – – – Country:
Y = BX + scot + wal + Nir + e ‘areg’ in Stata allows several hundred variables like this
often called a ‘hierarchical fixed effect’ but many hierarchies have too many clusters for this to be satisfactory 2. We could use higher level explanatory variables – – e.g. average unemployment rate in local authority district these are also ‘hierarchical fixed effects’ 3. We could try telling the model that we expect the error terms to be related – these are ‘hierarchical random effects’ = multilevel models 10
Creating a multilevel model
• Linear model: Y i = BX i + e i • Multilevel model (‘random intercepts’) Y ij = BX ij + u j + e ij • Multilevel model (‘random coefficients’) Y ij = BX ij + UB j + u j + e ij 11
How to implement multilevel models? • In SPSS and Stata, there are extension specifications which can be made in order to specify the simplest random intercepts model 12
Stata examples
• regress ghq fem age age2 cohab • regress ghq fem age age2 cohab, robust cluster(ohid) • xtmixed ghq fem age age2 cohab ||ohid: 13
Comments
• Models which ignore clustering should be unbiassed but inefficient • The simplest multilevel model: Shouldn’t change coefficent estimates (unbiased) Should change confidence intervals (inefficient) 14
15
16
3-level model in Stata (xtmixed) 17
The same model in MLwiN 18
A controversial claim about Stata • Stata is
the best
because: package to use for multilevel modelling, – It is integrated with data management capacity: easy to change variables; change cases; add higher level explanatory variables; etc – It has a wide range of hierarchical model estimators – It allows easy comparison between long-standing hierarchical estimators (from economics) and new random effects models • By constrast: – Other mainstream packages don’t have adequate range of model estimators – Specialist packages (e.g. MLwiN; HLM) do have more advanced modelling estimators, but they inhibit data manipulation / serious model building 19