Analyzing Health Equity Using Household Survey Data
Lecture 10: Multivariate Analysis of Health Survey Data
“Analyzing Health Equity Using Household Survey Data” Owen O’Donnell, Eddy van Doorslaer, Adam Wagstaff and Magnus Lindelow, The World Bank, Washington DC, 2008, www.worldbank.org/analyzinghealthequity
Why multivariate analysis?
• Health sector inequalities are measured through the bivariate relationship between a health variable and SES
• To go beyond measurement of inequalities, need multivariate analysis, e.g.
  – Finer description of inequality through standardisation for age, gender, etc.
  – Explanation of inequality through decomposition of covariance
  – Identification of causal relationship between health variable and SES
Descriptive analysis
• Aim is to describe SES-related inequality in health
• How does health vary with SES, conditional on other factors?
• OLS describes how the mean of health varies with SES, conditional on controls
• Modelling issues (omitted variable bias, endogeneity) are irrelevant
• But cannot place a causal interpretation on the estimates
Causal analysis
• For causal inference, need a modelling approach
• Appropriate model and estimator depend upon the degree of detail required
• To identify the total causal effect and not its mechanisms, a reduced form is adequate, e.g. decomposition
• To separately identify direct and indirect effects, need a structural model
Household production model
• Health “produced” from inputs
• Inputs selected conditional on (unobservable) health endowments
• So, inputs are endogenous
• Reduced form (RF) demand relations combine technological impact and behavioural response
• To isolate technological impact, must confront endogeneity of inputs:
  – Instrumental variables
  – Panel data
Sample design and area effects
• Health data come from complex surveys
• Stratified sampling – separate sampling from population sub-groups (strata)
• Cluster sampling – clusters of observations not sampled independently
• Over-sampling – e.g. of poor, insured
• Area effects – feature of population but importance depends on sample design
Standard stratified sampling
• Population categorised by relatively few strata, e.g. urban/rural, regions
• Separate random sample of pre-defined size selected from each stratum
• Sample strata proportions need not correspond to population proportions; sample weights correct for this (separate issue)
• If population means differ by strata, standard errors of means and other descriptive statistics should be adjusted down
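The last point can be illustrated with a small sketch. It is written in plain Python purely for illustration (the lecture itself works in Stata), the data, stratum names and population shares are made up, and finite population corrections are ignored. It compares the standard error of a stratified mean with the one obtained by treating the same observations as a simple random sample.

```python
from math import sqrt
from statistics import mean, variance

# Hypothetical sample of a health score, drawn separately from two strata.
sample = {
    "urban": [10.0, 12.0, 11.0, 13.0],
    "rural": [20.0, 22.0, 21.0, 23.0],
}
# Assumed population shares of each stratum (taken as known from the frame).
pop_share = {"urban": 0.5, "rural": 0.5}


def stratified_mean_se(sample, pop_share):
    """Population mean estimate and SE that respect the stratified design."""
    est = sum(pop_share[h] * mean(ys) for h, ys in sample.items())
    # Only within-stratum variation enters: sum_h W_h^2 * s_h^2 / n_h
    var = sum(pop_share[h] ** 2 * variance(ys) / len(ys)
              for h, ys in sample.items())
    return est, sqrt(var)


def naive_mean_se(sample):
    """The same data treated as one simple random sample."""
    pooled = [y for ys in sample.values() for y in ys]
    return mean(pooled), sqrt(variance(pooled) / len(pooled))


strat_est, strat_se = stratified_mean_se(sample, pop_share)
naive_est, naive_se = naive_mean_se(sample)
print(f"stratified: mean={strat_est:.2f}, SE={strat_se:.3f}")
print(f"naive SRS : mean={naive_est:.2f}, SE={naive_se:.3f}")
```

Because the two strata have very different means, the between-stratum variation inflates the naive standard error, while the stratified formula uses only the within-stratum variation and is therefore smaller.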
Stratification and modelling
• Exogenous stratification – OLS is consistent, efficient and SEs valid
• Endogenous stratification – adjust SEs
• Relative to simple SEs, adjustment can be important
• Relative to corrections for heteroskedasticity and clustering, adjustment is usually modest
• May want intercept/slope differences by strata
Example of adjustment to OLS standard errors
Table 1: OLS regression of height-for-age z-scores (*-100), Vietnam 1998 (children < 10 years)

                                                     Standard errors
Variable                                Coefficient  Unadjusted  Stratification  Hetero.   Cluster   Strat. & cluster
                                                                 adjusted        robust    adjusted  adj.
Child's age (months)                    3.70***      0.1986      0.2466          0.2470    0.2885    0.2872
Child's age squared (/100)              -2.38***     0.1554      0.1755          0.1758    0.1966    0.1957
Child is male                           12.31***     3.2927      3.2708          3.2792    3.3649    3.2844
(log) Hhold. consumption per capita     -37.85***    3.9843      4.1046          4.1116    5.4035    5.4582
Safe drinking water                     -7.43        4.9533      4.8300          4.8441    9.1538    9.2098
Satisfactory sanitation                 -15.53***    5.1009      4.8199          4.8326    6.1202    6.0937
Years of schooling of household head    -0.87*       0.4804      0.4770          0.4786    0.7302    0.7188
Mother has primary school diploma       -2.33        4.0598      4.1309          4.1397    6.1913    6.2438
Sample size                             5218

Notes: Dependent variable is negative of z-score, multiplied by 100.
***, ** & * indicate 1%, 5% & 10% significance according to unadjusted standard errors.
Bold indicates a change in significance level relative to that using unadjusted standard errors.
Regression also contains region dummies at the level of stratification.
Cluster sampling
• 2-stage (or more) sampling process
  1. Clusters sampled from pop./strata
  2. Households sampled from clusters
• Observations are not independent within clusters and are likely correlated through unobservables
• Consequences and remedies depend on the nature of the within-cluster correlation
Exogenous cluster effects
$$ y_{ic} = X_{ic}\beta + \lambda_c + \varepsilon_{ic}, \qquad E[\varepsilon_{ic} \mid X_{ic}, \lambda_c] = E[\varepsilon_{ic}] = 0 \tag{1} $$
If $E[\lambda_c \mid X_{ic}] = E[\lambda_c]$, have the random effects model.
Conventional estimators, e.g. OLS, probit, etc., are consistent but inefficient, and SEs need adjustment.
Can accept inefficiency and adjust SEs. In Stata, use option cluster(varname).
For efficiency, must estimate and take account of within-cluster correlation, e.g. GLS, random effects probit.
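A minimal sketch of the "accept inefficiency and adjust SEs" route, assuming a single regressor plus an intercept. It is plain Python for illustration only; the data and the function name are hypothetical, and the small-sample scaling factor that Stata applies with cluster() is omitted, so the numbers would not match Stata output exactly. The key step is that the scores (regressor deviation times residual) are summed within each cluster before being squared.

```python
from math import sqrt
from collections import defaultdict

# Hypothetical data: (cluster id, x, y). Residuals share a sign within each
# cluster, mimicking an unobserved cluster effect.
data = [
    ("c1", 1.0, 2.0), ("c1", 2.0, 3.0),
    ("c2", 3.0, 2.0), ("c2", 4.0, 3.0),
    ("c3", 5.0, 4.0), ("c3", 6.0, 5.0),
    ("c4", 7.0, 8.0), ("c4", 8.0, 9.0),
]


def ols_slope_with_ses(data):
    """OLS slope with conventional and cluster-robust standard errors."""
    n = len(data)
    x_bar = sum(x for _, x, _ in data) / n
    y_bar = sum(y for _, _, y in data) / n
    sxx = sum((x - x_bar) ** 2 for _, x, _ in data)
    sxy = sum((x - x_bar) * (y - y_bar) for _, x, y in data)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    resid = [y - intercept - slope * x for _, x, y in data]

    # Conventional SE: assumes independent, homoskedastic errors.
    s2 = sum(e * e for e in resid) / (n - 2)
    se_conventional = sqrt(s2 / sxx)

    # Cluster-robust SE: sum the scores within each cluster, then square.
    score = defaultdict(float)
    for (c, x, _), e in zip(data, resid):
        score[c] += (x - x_bar) * e
    se_cluster = sqrt(sum(s * s for s in score.values())) / sxx
    return slope, se_conventional, se_cluster


slope, se_conventional, se_cluster = ols_slope_with_ses(data)
print(f"slope={slope:.3f}, conventional SE={se_conventional:.4f}, "
      f"cluster SE={se_cluster:.4f}")
```

In this made-up example the point estimate is unchanged by the correction; only the standard error differs, and the cluster-robust one is larger because the residuals move together within clusters.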
Endogenous cluster effects
(1) with $E[\lambda_c \mid X_{ic}] \neq E[\lambda_c]$ is the fixed effects model.
Regressors are correlated with the composite error, so conventional estimators are inconsistent.
Need to purge cluster effects from the composite error.
In linear model – cluster dummies, differences from cluster means or first differences.
Binary choice – fixed effects logit.
Having purged cluster effects, there is no need to correct SEs.
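The "differences from cluster means" option can be sketched as follows. This is an illustration in plain Python with made-up, noise-free data in which the true slope is 2 and the cluster effect rises with the cluster's average x; the helper names are hypothetical.

```python
from collections import defaultdict

# Hypothetical data: (cluster, x, y) with y = 2*x + cluster effect, where the
# cluster effects (0, 10, 20) are correlated with x.
data = [
    ("c1", 1.0, 2.0), ("c1", 2.0, 4.0), ("c1", 3.0, 6.0),
    ("c2", 4.0, 18.0), ("c2", 5.0, 20.0), ("c2", 6.0, 22.0),
    ("c3", 7.0, 34.0), ("c3", 8.0, 36.0), ("c3", 9.0, 38.0),
]


def slope(xs, ys):
    """OLS slope of y on x (with an intercept)."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx


def cluster_means(data):
    """Mean of x and of y within each cluster."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for c, x, y in data:
        sums[c][0] += x
        sums[c][1] += y
        sums[c][2] += 1
    return {c: (sx / n, sy / n) for c, (sx, sy, n) in sums.items()}


means = cluster_means(data)

# Pooled OLS ignores the cluster effect and picks up its correlation with x.
pooled_slope = slope([x for _, x, _ in data], [y for _, _, y in data])

# Within transformation: subtracting cluster means removes the cluster effect.
within_slope = slope(
    [x - means[c][0] for c, x, _ in data],
    [y - means[c][1] for c, _, y in data],
)
print(pooled_slope, within_slope)
```

Here pooled OLS gives a slope of 5.0, while the within estimator recovers the slope of 2.0 used to construct the data.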
Comparison of estimators for a cluster sample
Table 2: Regressions of height-for-age z-scores (*-100), Vietnam 1998 (children < 10 years)

                                        OLS                           Random Effects                Fixed Effects
Variable                                Coeff.      Cluster adj. SE   Coeff.      Robust SE         Coeff.      Robust SE
Child's age (months)                    3.72***     0.2917            3.74***     0.2451            3.78***     0.2430
Child's age squared (/100)              -2.40***    0.1987            -2.40***    0.1742            -2.44***    0.1732
Child is male                           12.26***    3.4527            12.19***    3.2394            12.97***    3.2443
(log) Hhold. consumption p.c.           -50.93***   5.1149            -43.17***   4.0778            -30.37***   4.6090
Safe drinking water                     -12.55      8.6438            -7.93       4.8984            -2.75       5.4247
Satisfactory sanitation                 -22.90***   5.6974            -19.39***   4.8446            -9.77**     4.9364
Years of schooling of HoH               -0.39       0.6628            -0.33       0.4828            -0.55       0.5081
Mum has primary school dip.             2.67        5.3187            1.71        4.1140            1.74        4.3186
Intercept                               445.00***   44.5600           377.01***   32.1941           276.19***   35.0991
R²                                      0.1527
B-P LM                                                                485.84 (0.000)
Hausman                                                                                             50.54 (0.0000)
Sample size                             5218

Notes: Dependent variable is negative of z-score, multiplied by 100.
***, ** & * indicate significance at 1%, 5% & 10% respectively.
SE - standard error, Robust SE - robust to general heteroskedasticity.
B-P LM - Breusch-Pagan Lagrange Multiplier test of significance of commune effects (p-value).
Hausman - Hausman test of random versus fixed effects (p-value).
Stata computation
OLS with cluster corrected SEs
    regr depvar varlist, cluster(commune)
OLS with cluster and stratification corrected SEs
    svyset commune, strata(region)
    svy: reg depvar varlist
Random effects (FGLS)
    xtreg depvar varlist, re i(commune)
Fixed effects
    xtreg depvar varlist, fe i(commune)
But community effects can be interesting
Exogenous community effects
$$ y_{ic} = X_{ic}\beta + Z_c\gamma + \lambda_c^{*} + \varepsilon_{ic}, \qquad E[\varepsilon_{ic} \mid X_{ic}, Z_c, \lambda_c^{*}] = E[\varepsilon_{ic}] = 0 \tag{2} $$
Condition for consistency: $E[\lambda_c^{*} \mid X_{ic}, Z_c] = E[\lambda_c^{*}]$
SEs need to be adjusted for within-cluster correlation.
Efficiency loss from OLS may not be large.
This REM is also known as the hierarchical model.
Endogenous community effects
• With a single cross-section, not possible to include community level regressors (they are collinear with the community fixed effects)
• With panel data, can do this
• In cross-section:
  – Run fixed effects and obtain estimates of the community level effects
  – Regress these effects on community level regressors
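The two cross-section steps above can be sketched as follows, in plain Python for illustration. The data, the community regressor z and the helper ols are all hypothetical. The recovered effects absorb the overall constant, so only the slope on z is of interest in the second step.

```python
from collections import defaultdict

# Hypothetical data: (commune, x, y) with y = 2*x + commune effect. The commune
# effects are 0, 10 and 20, and z is one commune level regressor.
data = [
    ("c1", 1.0, 2.0), ("c1", 2.0, 4.0), ("c1", 3.0, 6.0),
    ("c2", 4.0, 18.0), ("c2", 5.0, 20.0), ("c2", 6.0, 22.0),
    ("c3", 7.0, 34.0), ("c3", 8.0, 36.0), ("c3", 9.0, 38.0),
]
z = {"c1": 0.0, "c2": 1.0, "c3": 2.0}


def ols(xs, ys):
    """Return (intercept, slope) from a simple regression of y on x."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    return y_bar - b * x_bar, b


# Step 1: fixed effects (within) slope, then recover each commune's effect.
groups = defaultdict(list)
for c, x, y in data:
    groups[c].append((x, y))
means = {
    c: (sum(x for x, _ in obs) / len(obs), sum(y for _, y in obs) / len(obs))
    for c, obs in groups.items()
}
_, beta_fe = ols(
    [x - means[c][0] for c, x, _ in data],
    [y - means[c][1] for c, _, y in data],
)
effects = {c: y_bar - beta_fe * x_bar for c, (x_bar, y_bar) in means.items()}

# Step 2: one observation per commune; regress estimated effects on z.
communes = sorted(effects)
_, gamma = ols([z[c] for c in communes], [effects[c] for c in communes])
print(beta_fe, effects, gamma)
```

With these constructed data the first step returns a slope of 2.0 and the second step a coefficient of 10.0 on z, matching how the commune effects were generated.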
Example explanation of community effects
Table 3: Analysis of commune level variation in height-for-age z-scores (*-100), Rural Vietnam 1998 (children < 10 years)

                                        OLS                           Random Effects                2nd-stage Fixed Effects
Variable                                Coeff.      Cluster adj. SE   Coeff.      Robust SE         Coeff.      SE
Commune health centre variables
Vitamin A available >= 1/2 time         -10.11      6.6530            -6.86143    6.5927            -8.27114    6.7506
Has electricity                         -38.79***   11.4558           -50.56***   12.1861           -45.34***   10.7991
Has clean water source                  9.57        7.6534            7.2341      8.4061            7.0070      8.7610
Has sanitary toilet                     -27.53***   7.0928            -24.50***   7.6694            -24.30***   7.8715
Has child growth chart                  -13.85*     7.2046            -10.2623    7.5879            -11.732     7.6292
Number of inpatient beds                1.52*       0.8298            2.12**      0.9242            2.09**      0.9744
Has a doctor                            11.39       6.9765            9.6255      7.1834            10.1856     7.5207
Intercept                               371.89***   48.8784           344.71***   41.5639           279.13***   41.6264
R²                                      0.1313
B-P LM                                                                248.42 (0.0000)
Sample size                             4099

Notes: Dependent variable is negative of z-score, multiplied by 100.
OLS & Random Effects - Coefficients on commune level regressors only are presented.
2nd-stage Fixed Effects - Estimated commune effects from fixed effects regressed on commune variables.
***, ** & * indicate significance at 1%, 5% & 10% respectively.
SE - standard error, Robust SE - robust to general heteroskedasticity.
B-P LM - Breusch-Pagan Lagrange Multiplier test of significance of community effects (p-value).
Stata computation for 2-step procedure
Run fixed effects and save predictions of the fixed effects
    xtreg depvar varlist, fe i(commune)
    predict ce, u
Use the between-groups panel estimator to regress these predicted effects on community level regressors
    xtreg ce varlist2, be i(commune)
Sample weights
• Stratification, over-sampling and non-response can all lead to a sample that is not representative of the population
• Sample weights are the inverse of the probability that an observation is a sample member
• Sample weights must be applied to get unbiased estimates of population means, etc. and correct SEs
• Should also be applied in “descriptive regressions”
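As a small illustration of inverse-probability weighting, the sketch below (plain Python, with made-up values and selection probabilities) compares the unweighted and weighted sample means when one group is over-sampled.

```python
# Hypothetical sample: (health score, probability of being sampled).
# The first four observations come from an over-sampled group (e.g. the poor).
sample = [
    (2.0, 0.5), (4.0, 0.5), (3.0, 0.5), (3.0, 0.5),
    (8.0, 0.1), (6.0, 0.1),
]

values = [y for y, _ in sample]
# Sample weight = inverse of the probability of being a sample member.
weights = [1.0 / p for _, p in sample]

unweighted_mean = sum(values) / len(values)
weighted_mean = sum(w * y for w, y in zip(weights, values)) / sum(weights)

print(f"unweighted mean = {unweighted_mean:.3f}")
print(f"weighted mean   = {weighted_mean:.3f}")
```

The unweighted mean is pulled towards the over-sampled group; the weighted mean gives each observation the number of population members it stands for.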
Should weights be applied to estimate a model?
• If selection is on exogenous factors, unweighted estimates are consistent and more efficient than weighted
  – Simple (robust) SEs are OK
• Otherwise, weighting required for consistency
  – If stratification and weights, take account of both in computation of SEs
  – If no stratification, apply conventional SE formula to weighted data
What if there is parameter heterogeneity in population?
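$$ y_{is} = X_{is}\beta_s + \varepsilon_{is} $$
Say we are interested in an average, such as
$$ \bar{\beta} = \frac{1}{N}\sum_{s=1}^{S} N_s \beta_s $$
where $N_s$ is the population size of sector $s$ and $N = \sum_{s} N_s$.
Consistent estimate is the population weighted average of the sector specific OLS estimates $\hat{\beta}_s$.
Unweighted OLS on the whole sample is not consistent for the average parameter.
But neither is weighted OLS on the whole sample.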
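A noise-free sketch of this point, in plain Python for illustration. The two sectors, their data and their population sizes are made up, and the helper wls_slope is hypothetical. The sector slopes are 1 and 3, and the spread of x differs across sectors, which is what drives a wedge between the pooled estimates and the population-weighted average.

```python
def wls_slope(xs, ys, ws):
    """Slope from weighted least squares of y on x with an intercept."""
    w_sum = sum(ws)
    x_bar = sum(w * x for w, x in zip(ws, xs)) / w_sum
    y_bar = sum(w * y for w, y in zip(ws, ys)) / w_sum
    sxy = sum(w * (x - x_bar) * (y - y_bar) for w, x, y in zip(ws, xs, ys))
    sxx = sum(w * (x - x_bar) ** 2 for w, x in zip(ws, xs))
    return sxy / sxx


# Hypothetical sectors: sample data plus an assumed population size N_s.
sectors = {
    "s1": {"x": [0.0, 1.0, 2.0], "y": [0.0, 1.0, 2.0], "N": 900},   # slope 1
    "s2": {"x": [0.0, 2.0, 4.0], "y": [0.0, 6.0, 12.0], "N": 300},  # slope 3
}
N = sum(s["N"] for s in sectors.values())

# Population-weighted average of the sector-specific OLS slopes.
beta_bar = sum(
    s["N"] / N * wls_slope(s["x"], s["y"], [1.0] * len(s["x"]))
    for s in sectors.values()
)

# Pooled regressions on the whole sample, without and with sample weights.
xs = [x for s in sectors.values() for x in s["x"]]
ys = [y for s in sectors.values() for y in s["y"]]
ones = [1.0] * len(xs)
pw = [s["N"] / len(s["x"]) for s in sectors.values() for _ in s["x"]]

pooled_unweighted = wls_slope(xs, ys, ones)
pooled_weighted = wls_slope(xs, ys, pw)
print(beta_bar, pooled_unweighted, pooled_weighted)
```

In this example the population-weighted average of the sector slopes is 1.5, whereas pooled OLS gives roughly 2.91 unweighted and 2.54 weighted, so neither pooled regression recovers the average parameter.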
Example application of sample weights
Table 4: Weighted regressions of height-for-age z-scores (*-100), Vietnam 1998 (children < 10 years)

                                        OLS                           Random Effects                Fixed Effects
Variable                                Coeff.      Adjusted SE       Coeff.      Robust SE         Coeff.      Robust SE
Child's age (months)                    3.90***     0.3218            3.90***     0.2652            3.91***     0.2642
Child's age squared (/100)              -2.51***    0.2206            -2.50***    0.1875            -2.51***    0.1875
Child is male                           14.86***    3.5718            14.56***    3.3595            14.89***    3.3731
(log) Hhold. consumption p.c.           -50.14***   5.5131            -40.67***   4.3511            -26.05***   5.0196
Safe drinking water                     -12.16      10.2770           -6.92       5.1624            -2.07       5.6079
Satisfactory sanitation                 -22.01***   5.9503            -19.81***   5.3653            -10.48*     5.4439
Years of schooling of HoH               -0.21       0.7355            -0.15       0.5122            -0.42       0.5363
Mum has primary school dip.             3.62        5.6510            3.04        4.2925            2.19        4.4958
Intercept                               428.15***   48.9827           347.47***   34.9686           236.12***   38.5646
R²                                      0.1496                        0.4320                        0.2457
Sample size                             5218

Notes: Dependent variable is negative of z-score, multiplied by 100.
***, ** & * indicate significance at 1%, 5% & 10% respectively.
Adjusted SE - standard error adjusted for clustering and stratification and robust to heteroskedasticity.
Robust SE - standard error robust to general heteroskedasticity.