
Analyzing Health Equity Using Household Survey Data Lecture 10 Multivariate Analysis of Health Survey Data

“Analyzing Health Equity Using Household Survey Data” Owen O’Donnell, Eddy van Doorslaer, Adam Wagstaff and Magnus Lindelow, The World Bank, Washington DC, 2008, www.worldbank.org/analyzinghealthequity

Why multivariate analysis?

• Health sector inequalities are measured through the bivariate relationship between a health variable and socioeconomic status (SES)
• To go beyond measurement of inequalities, we need multivariate analysis, e.g.
  – Finer description of inequality through standardisation for age, gender, etc.
  – Explanation of inequality through decomposition of covariance
  – Identification of a causal relationship between the health variable and SES

Descriptive analysis

• Aim is to describe SES-related inequality in health
• How does health vary with SES, conditional on other factors?
• OLS describes how the mean of health varies with SES, conditional on controls
• Modelling issues (omitted variable bias, endogeneity) are irrelevant
• But a causal interpretation cannot be placed on the estimates
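A minimal simulated sketch of the descriptive point, assuming made-up variables `age`, `income` and `health` (none of them come from the Vietnam data): conditioning on a control that is correlated with SES changes the OLS slope on SES. Both slopes are descriptions of conditional means, not causal effects.

```python
import numpy as np

def ols(y, X):
    """OLS coefficients; X should include a column of ones."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

rng = np.random.default_rng(0)
n = 5000
age = rng.uniform(20, 70, n)                      # hypothetical control
income = 0.05 * age + rng.normal(0, 1, n)         # SES measure, correlated with age
health = 1.0 * income - 0.03 * age + rng.normal(0, 1, n)

const = np.ones(n)
# Bivariate gradient: mean health by income, ignoring age
b_bivariate = ols(health, np.column_stack([const, income]))[1]
# Conditional gradient: mean health by income, holding age constant
b_conditional = ols(health, np.column_stack([const, income, age]))[1]
print(b_bivariate, b_conditional)
```

In this simulation the bivariate slope is pulled below the conditional slope because older people have higher income but worse health.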

Causal analysis

• For causal inference, a modelling approach is needed
• The appropriate model and estimator depend upon the degree of detail required
• To identify the total causal effect, and not its mechanisms, a reduced form is adequate, e.g. for decomposition
• To identify direct and indirect effects separately, a structural model is needed

Household production model

• Health is “produced” from inputs
• Inputs are selected conditional on (unobservable) health endowments
• So, inputs are endogenous
• Reduced-form (RF) demand relations capture the combined technological impact and behavioural response
• To isolate the technological impact, we must confront the endogeneity of inputs:
  – Instrumental variables
  – Panel data
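To make the endogeneity argument concrete, here is a hedged sketch of two-stage least squares on simulated data. The unobserved `endowment` drives both the input choice `x` and health, and `z` is a hypothetical instrument that shifts the input but has no direct effect on health; these are assumptions of the simulation, not claims about any real instrument.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20000
endowment = rng.normal(0, 1, n)   # unobserved health endowment
z = rng.normal(0, 1, n)           # hypothetical instrument: shifts the input only
# Input chosen partly in response to the endowment (compensating behaviour)
x = 0.8 * z - 0.5 * endowment + rng.normal(0, 1, n)
# Health production: true technological effect of the input is 0.5
health = 0.5 * x + endowment + rng.normal(0, 1, n)

X = np.column_stack([np.ones(n), x])
Z = np.column_stack([np.ones(n), z])

# OLS mixes the technological effect with the behavioural response
beta_ols = np.linalg.lstsq(X, health, rcond=None)[0]

# 2SLS: first stage projects X on Z, second stage regresses health on fitted X
X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]
beta_iv = np.linalg.lstsq(X_hat, health, rcond=None)[0]
print(beta_ols[1], beta_iv[1])
```

Only the point estimates are computed here; valid 2SLS standard errors would need residuals formed with the original `X`, not `X_hat`.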

Sample design and area effects

• Health data come from complex surveys
• Stratified sampling – separate sampling from population sub-groups (strata)
• Cluster sampling – observations are sampled in clusters, not independently
• Over-sampling – e.g. of the poor, the insured
• Area effects – a feature of the population, but their importance depends on the sample design

Standard stratified sampling

• Population categorised by relatively few strata, e.g. urban/rural, regions
• Separate random sample of pre-defined size selected from each stratum
• Sample strata proportions need not correspond to population proportions → sample weights (separate issue)
• If population means differ by strata, standard errors of means and other descriptive statistics should be adjusted down
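A sketch of why stratification lowers the standard error of a mean when stratum means differ. It assumes two hypothetical strata, ignores the finite population correction, and takes population stratum shares equal to sample shares (proportional allocation), so no sample weights are needed.

```python
import numpy as np

rng = np.random.default_rng(2)
# Hypothetical urban and rural strata with different mean health scores
n_h = {"urban": 400, "rural": 600}
means = {"urban": 10.0, "rural": 6.0}
samples = {h: rng.normal(means[h], 2.0, n_h[h]) for h in n_h}

y = np.concatenate(list(samples.values()))
n = y.size

# Naive SE: treats the pooled sample as one simple random sample
se_srs = y.std(ddof=1) / np.sqrt(n)

# Stratified SE: sum over strata of W_h^2 * s_h^2 / n_h, with W_h the stratum share
W = {h: n_h[h] / n for h in n_h}
var_strat = sum(W[h] ** 2 * samples[h].var(ddof=1) / n_h[h] for h in n_h)
se_strat = np.sqrt(var_strat)
print(se_srs, se_strat)
```

The naive SE counts the between-stratum difference in means as sampling noise, although the design fixes how many observations come from each stratum.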

Stratification and modelling

• Exogenous stratification – OLS is consistent and efficient, and SEs are valid
• Endogenous stratification – adjust SEs
• Relative to simple SEs, the adjustment can be important
• Relative to corrections for heteroskedasticity and clustering, the adjustment is usually modest
• May want intercept/slope differences by strata

Example of adjustment to OLS standard errors

Table 1: OLS regression of height-for-age z-scores (*-100), Vietnam 1998 (children < 10 years)

| | Coefficient | SE: unadjusted | SE: stratification adjusted | SE: hetero. robust | SE: cluster adjusted | SE: strat. & cluster adj. |
|---|---|---|---|---|---|---|
| Child's age (months) | 3.70*** | 0.1986 | 0.2466 | 0.2470 | 0.2885 | 0.2872 |
| Child's age squared (/100) | -2.38*** | 0.1554 | 0.1755 | 0.1758 | 0.1966 | 0.1957 |
| Child is male | 12.31*** | 3.2927 | 3.2708 | 3.2792 | 3.3649 | 3.2844 |
| (log) Hhold. consumption per capita | -37.85*** | 3.9843 | 4.1046 | 4.1116 | 5.4035 | 5.4582 |
| Safe drinking water | -7.43 | 4.9533 | 4.8300 | 4.8441 | 9.1538 | 9.2098 |
| Satisfactory sanitation | -15.53*** | 5.1009 | 4.8199 | 4.8326 | 6.1202† | 6.0937† |
| Years of schooling of household head | -0.87* | 0.4804 | 0.4770 | 0.4786 | 0.7302† | 0.7188† |
| Mother has primary school diploma | -2.33 | 4.0598 | 4.1309 | 4.1397 | 6.1913 | 6.2438 |
| Sample size | 5218 | | | | | |

Notes: Dependent variable is the negative of the z-score, multiplied by 100.
***, ** & * indicate 1%, 5% & 10% significance according to unadjusted standard errors.
† indicates a change in significance level relative to that using unadjusted standard errors.
Regression also contains region dummies at the level of stratification.

Cluster sampling

• 2-stage (or more) sampling process:
  1. Clusters sampled from the population/strata
  2. Households sampled from clusters
• Observations are not independent within clusters and are likely correlated through unobservables
• Consequences and remedies depend on the nature of the within-cluster correlation
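One standard way to quantify the consequence of within-cluster correlation for a sample mean is Kish's design effect, deff = 1 + (m - 1)ρ, for equal cluster sizes m and intra-cluster correlation ρ. This textbook approximation is added for illustration and is not taken from the lecture; the numbers below are hypothetical.

```python
def design_effect(m, rho):
    """Kish approximation: factor by which cluster sampling inflates the variance
    of a sample mean, for m observations per cluster and intra-cluster correlation rho."""
    return 1.0 + (m - 1) * rho

def effective_sample_size(n, m, rho):
    """Size of a simple random sample giving the same variance of the mean."""
    return n / design_effect(m, rho)

# Hypothetical design: 20 households per cluster, modest correlation of 0.05
print(design_effect(20, 0.05))
print(effective_sample_size(5000, 20, 0.05))
```

Even a small ρ nearly doubles the variance when clusters contain 20 households, which is why unadjusted SEs can be badly understated.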

Exogenous cluster effects

$$ y_{ic} = X_{ic}\beta + \alpha_c + \varepsilon_{ic}, \qquad E(\varepsilon_{ic} \mid X_{ic}, \alpha_c) = 0 \qquad (1) $$

If $E(\alpha_c \mid X_{ic}) = 0$, we have the random effects model. Conventional estimators, e.g. OLS, probit, etc., are consistent but inefficient, and SEs need adjustment.

We can accept the inefficiency and adjust the SEs. In Stata, use the option cluster(varname). For efficiency, we must estimate and take account of the within-cluster correlation, e.g. GLS, random effects probit.
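A sketch of the "accept inefficiency and adjust SEs" route: OLS with a cluster-robust sandwich variance, the idea behind Stata's cluster() option. The small-sample factor G/(G-1)·(N-1)/(N-K) is assumed to match the one Stata's regress applies. The data are simulated with a random cluster effect independent of the regressor, i.e. model (1) with E(α_c | X_ic) = 0.

```python
import numpy as np

def ols_cluster_se(y, X, cluster):
    """OLS coefficients with conventional and cluster-robust (sandwich) standard errors."""
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta

    # Conventional SEs, assuming independent, homoskedastic errors
    sigma2 = resid @ resid / (n - k)
    se_iid = np.sqrt(np.diag(sigma2 * XtX_inv))

    # "Meat": sum over clusters of outer products of within-cluster score sums X_c' e_c
    groups = np.unique(cluster)
    meat = np.zeros((k, k))
    for g in groups:
        idx = cluster == g
        score = X[idx].T @ resid[idx]
        meat += np.outer(score, score)

    G = groups.size
    adj = G / (G - 1) * (n - 1) / (n - k)   # small-sample factor (assumed Stata-style)
    V = adj * XtX_inv @ meat @ XtX_inv
    return beta, se_iid, np.sqrt(np.diag(V))

# Simulated clustered sample: 200 clusters of 25 observations (hypothetical)
rng = np.random.default_rng(3)
G, m = 200, 25
cluster = np.repeat(np.arange(G), m)
x = rng.normal(0, 1, G)[cluster] + rng.normal(0, 1, G * m)  # regressor varies partly by cluster
alpha = rng.normal(0, 1, G)[cluster]                         # random cluster effect, independent of x
y = 1.0 + 0.5 * x + alpha + rng.normal(0, 1, G * m)
X = np.column_stack([np.ones(G * m), x])

beta, se_iid, se_cluster = ols_cluster_se(y, X, cluster)
print(beta, se_iid, se_cluster)
```

The coefficients are unaffected by the adjustment; only the SEs grow, and most for regressors that themselves vary at the cluster level (here the constant and, partly, `x`).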

Endogenous cluster effects

Model (1) with $E(\alpha_c \mid X_{ic}) \neq 0$ is the fixed effects model. The regressors are correlated with the composite error $\alpha_c + \varepsilon_{ic}$ → conventional estimators are inconsistent.

We need to purge the cluster effects from the composite error:
  – In the linear model – cluster dummies, differences from cluster means or first differences.
  – Binary choice – fixed effects logit.

Having purged the cluster effects, there is no need to correct SEs.
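A sketch of purging the cluster effects by differencing from cluster means (the within transformation), on simulated data in which the cluster effect is correlated with the regressor. The function and variable names are hypothetical.

```python
import numpy as np

def within_estimator(y, X, cluster):
    """Fixed effects (within) estimator: OLS on deviations from cluster means.
    X should not contain a constant, since demeaning removes it."""
    _, inv = np.unique(cluster, return_inverse=True)
    counts = np.bincount(inv)
    y_mean = np.bincount(inv, weights=y) / counts
    X_mean = np.column_stack(
        [np.bincount(inv, weights=X[:, j]) / counts for j in range(X.shape[1])]
    )
    y_dm = y - y_mean[inv]
    X_dm = X - X_mean[inv]
    beta, *_ = np.linalg.lstsq(X_dm, y_dm, rcond=None)
    return beta

rng = np.random.default_rng(4)
G, m = 300, 15
cluster = np.repeat(np.arange(G), m)
alpha = rng.normal(0, 1, G)                                           # cluster effects
x = (alpha + rng.normal(0, 1, G))[cluster] + rng.normal(0, 1, G * m)  # E(alpha | x) != 0
y = 0.5 * x + alpha[cluster] + rng.normal(0, 1, G * m)

X_pooled = np.column_stack([np.ones(G * m), x])
b_pooled = np.linalg.lstsq(X_pooled, y, rcond=None)[0][1]   # inconsistent here
b_fe = within_estimator(y, x[:, None], cluster)[0]          # cluster effects removed
print(b_pooled, b_fe)
```

Demeaning gives the same slope as including a full set of cluster dummies, without building a 300-column dummy matrix.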

Comparison of estimators for a cluster sample

Table 2: Regressions of height-for-age z-scores (*-100), Vietnam 1998 (children < 10 years)

| | OLS Coeff. | OLS Cluster adjusted SE | Random Effects Coeff. | Random Effects Robust SE | Fixed Effects Coeff. | Fixed Effects Robust SE |
|---|---|---|---|---|---|---|
| Child's age (months) | 3.72*** | 0.2917 | 3.74*** | 0.2451 | 3.78*** | 0.2430 |
| Child's age squared (/100) | -2.40*** | 0.1987 | -2.40*** | 0.1742 | -2.44*** | 0.1732 |
| Child is male | 12.26*** | 3.4527 | 12.19*** | 3.2394 | 12.97*** | 3.2443 |
| (log) Hhold. consumption p.c. | -50.93*** | 5.1149 | -43.17*** | 4.0778 | -30.37*** | 4.6090 |
| Safe drinking water | -12.55 | 8.6438 | -7.93 | 4.8984 | -2.75 | 5.4247 |
| Satisfactory sanitation | -22.90*** | 5.6974 | -19.39*** | 4.8446 | -9.77** | 4.9364 |
| Years of schooling of HoH | -0.39 | 0.6628 | -0.33 | 0.4828 | -0.55 | 0.5081 |
| Mum has primary school dip. | 2.67 | 5.3187 | 1.71 | 4.1140 | 1.74 | 4.3186 |
| Intercept | 445.00*** | 44.5600 | 377.01*** | 32.1941 | 276.19*** | 35.0991 |
| R² | 0.1527 | | | | | |
| B-P LM | | | 485.84 (0.000) | | | |
| Hausman | | | | | 50.54 (0.0000) | |
| Sample size | 5218 | | | | | |

Notes: Dependent variable is the negative of the z-score, multiplied by 100.
***, ** & * indicate significance at 1%, 5% & 10% respectively.
SE – standard error; Robust SE – robust to general heteroskedasticity.
B-P LM – Breusch-Pagan Lagrange Multiplier test of significance of commune effects (p-value).
Hausman – Hausman test of random versus fixed effects (p-value).

Stata computation

OLS with cluster-corrected SEs:
    regress depvar varlist, cluster(commune)

OLS with cluster- and stratification-corrected SEs:
    svyset commune, strata(region)
    svy: regress depvar varlist

Random effects (FGLS):
    xtreg depvar varlist, re i(commune)

Fixed effects:
    xtreg depvar varlist, fe i(commune)

But community effects can be interesting

Exogenous community effects



$$ y_{ic} = X_{ic}\beta + Z_c\gamma + \alpha^*_c + \varepsilon_{ic}, \qquad E(\varepsilon_{ic} \mid X_{ic}, Z_c, \alpha^*_c) = 0 \qquad (2) $$

where $Z_c$ are community-level regressors. Condition for consistency: $E(\alpha^*_c \mid X_{ic}, Z_c) = 0$.

SEs need to be adjusted for within-cluster correlation. The efficiency loss from OLS may not be large. This random effects model (REM) is also known as the hierarchical model.

Endogenous community effects

• With a single cross-section, it is not possible to include community-level regressors in the fixed effects model, since they are perfectly collinear with the community effects
• With panel data, this can be done
• In a cross-section:
  – Run fixed effects and obtain estimates of the community-level effects
  – Regress these effects on community-level regressors
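The two-step procedure can be sketched on simulated data. Step 1 is a within estimator of β; the estimated community effect is the community mean of y net of the fitted X part, which should correspond to what Stata's `predict ce, u` after `xtreg, fe` returns, up to an additive constant that only shifts the step-2 intercept. Step 2 regresses these effects on a hypothetical community regressor `z`, one observation per community, as the between estimator does. In the simulation the unobserved community effect is correlated with `x` but not with `z`; the step-2 coefficient is only consistent under that kind of assumption.

```python
import numpy as np

rng = np.random.default_rng(5)
G, m = 400, 10
cluster = np.repeat(np.arange(G), m)
z = rng.binomial(1, 0.5, G).astype(float)   # hypothetical community regressor (0/1)
alpha_star = rng.normal(0, 1, G)            # unobserved community effect
# Individual regressor correlated with the unobserved community effect
x = (0.5 * alpha_star + rng.normal(0, 1, G))[cluster] + rng.normal(0, 1, G * m)
y = 0.5 * x + (-1.0 * z + alpha_star)[cluster] + rng.normal(0, 1, G * m)

counts = np.bincount(cluster)

def cluster_mean(v):
    """Mean of v within each community (index = community id)."""
    return np.bincount(cluster, weights=v) / counts

# Step 1: within (fixed effects) estimate of beta
x_dm = x - cluster_mean(x)[cluster]
y_dm = y - cluster_mean(y)[cluster]
beta_fe = (x_dm @ y_dm) / (x_dm @ x_dm)

# Estimated community effects: community mean of y net of the fitted X part
ce = cluster_mean(y) - beta_fe * cluster_mean(x)

# Step 2: between regression of the effects on the community regressor,
# one observation per community
Zc = np.column_stack([np.ones(G), z])
gamma = np.linalg.lstsq(Zc, ce, rcond=None)[0]
print(beta_fe, gamma)
```

Running step 2 on one row per community, rather than on every individual, avoids counting each community's estimated effect m times.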

Example explanation of community effects

Table 3: Analysis of commune-level variation in height-for-age z-scores (*-100), Rural Vietnam 1998 (children < 10 years)

| | OLS Coeff. | OLS Cluster adj. SE | Random Effects Coeff. | Random Effects SE | 2nd-stage Fixed Effects Coeff. | 2nd-stage Fixed Effects Robust SE |
|---|---|---|---|---|---|---|
| Commune health centre variables: | | | | | | |
| Vitamin A available >= 1/2 time | -10.11 | 6.6530 | -6.86143 | 6.5927 | -8.27114 | 6.7506 |
| Has electricity | -38.79*** | 11.4558 | -50.56*** | 12.1861 | -45.34*** | 10.7991 |
| Has clean water source | 9.57 | 7.6534 | 7.2341 | 8.4061 | 7.0070 | 8.7610 |
| Has sanitary toilet | -27.53*** | 7.0928 | -24.50*** | 7.6694 | -24.30*** | 7.8715 |
| Has child growth chart | -13.85* | 7.2046 | -10.2623 | 7.5879 | -11.732 | 7.6292 |
| Number of inpatient beds | 1.52* | 0.8298 | 2.12** | 0.9242 | 2.09** | 0.9744 |
| Has a doctor | 11.39 | 6.9765 | 9.6255 | 7.1834 | 10.1856 | 7.5207 |
| Intercept | 371.89*** | 48.8784 | 344.71*** | 41.5639 | 279.13*** | 41.6264 |
| R² | 0.1313 | | | | | |
| B-P LM | | | 248.42 (0.0000) | | | |
| Sample size | 4099 | | | | | |

Notes: Dependent variable is the negative of the z-score, multiplied by 100.
OLS & Random Effects – coefficients on commune-level regressors only are presented. 2nd-stage Fixed Effects – estimated commune effects from fixed effects regressed on commune variables.
***, ** & * indicate significance at 1%, 5% & 10% respectively.
SE – standard error; Robust SE – robust to general heteroskedasticity.
B-P LM – Breusch-Pagan Lagrange Multiplier test of significance of community effects (p-value).

Stata computation for 2-step procedure

Run fixed effects and save predictions of the fixed effects:
    xtreg depvar varlist, fe i(commune)
    predict ce, u

Use the between-groups panel estimator to regress these predicted effects on community-level regressors:
    xtreg ce varlist2, be i(commune)

Sample weights

• Stratification, over-sampling and non-response can all lead to a sample that is not representative of the population
• Sample weights are the inverse of the probability that an observation is a sample member
• Sample weights must be applied to get unbiased estimates of population means, etc., and correct SEs
• They should also be applied in “descriptive regressions”
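A minimal sketch of inverse-probability weighting for a population mean, assuming a hypothetical population in which 20% are poor and the survey draws equal numbers of poor and non-poor, so the poor are over-sampled.

```python
import numpy as np

rng = np.random.default_rng(6)
# Hypothetical population: 20,000 poor (mean health 4) and 80,000 non-poor (mean health 8)
N_poor, N_nonpoor = 20_000, 80_000
# The survey over-samples the poor: 1,000 drawn from each group
n_poor, n_nonpoor = 1_000, 1_000
y = np.concatenate([rng.normal(4, 1, n_poor), rng.normal(8, 1, n_nonpoor)])

# Selection probability of each sampled observation, and its sample weight
p = np.concatenate([np.full(n_poor, n_poor / N_poor),
                    np.full(n_nonpoor, n_nonpoor / N_nonpoor)])
w = 1.0 / p

unweighted_mean = y.mean()                  # close to 6: the poor are over-represented
weighted_mean = np.sum(w * y) / np.sum(w)   # close to the population mean 0.2*4 + 0.8*8 = 7.2
print(unweighted_mean, weighted_mean)
```

Each sampled poor person stands for 20 people in the population and each non-poor person for 80, which restores the population shares.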

Should weights be applied to estimate a model?

• If selection is on exogenous factors, unweighted estimates are consistent and more efficient than weighted ones
  – Simple (robust) SEs are OK
• Otherwise, weighting is required for consistency
  – If there are stratification and weights, take account of both in the computation of SEs
  – If there is no stratification, apply the conventional SE formula to the weighted data

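What if there is parameter heterogeneity in the population?

$$ y_{is} = X_{is}\beta_s + \varepsilon_{is}, \qquad s = 1, \dots, S $$

Say we are interested in an average, such as

$$ \bar{\beta} = \frac{1}{N}\sum_{s=1}^{S} N_s \beta_s $$

where $N_s$ is the population size of sector $s$ and $N = \sum_s N_s$. A consistent estimate is the population-weighted average of the sector-specific OLS estimates $\hat{\beta}_s$:

$$ \hat{\bar{\beta}} = \frac{1}{N}\sum_{s=1}^{S} N_s \hat{\beta}_s $$

Unweighted OLS on the whole sample is not consistent for the average parameter.

But neither is weighted OLS on the whole sample.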
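The heterogeneity argument can be checked on simulated data. The sketch assumes two hypothetical sectors with slopes 1 and 3, population sizes 80,000 and 20,000 (so the target average is 1.4), more variation in x in sector 2, and over-sampling of sector 2. The population-weighted average of the sector-specific OLS slopes recovers the target; pooled OLS, weighted or not, does not, because in this setup it weights each sector's slope by that sector's (weighted or unweighted) share of the total variation in x rather than by its population share.

```python
import numpy as np

rng = np.random.default_rng(7)
# Hypothetical population: two sectors with different slopes
N = {1: 80_000, 2: 20_000}
beta_true = {1: 1.0, 2: 3.0}
x_sd = {1: 1.0, 2: 3.0}
n = {1: 1_000, 2: 3_000}   # sector 2 is over-sampled

xs, ys, ws, beta_hat = {}, {}, {}, {}
for s in N:
    x = rng.normal(0, x_sd[s], n[s])
    y = beta_true[s] * x + rng.normal(0, 1, n[s])
    xs[s], ys[s] = x, y
    ws[s] = np.full(n[s], N[s] / n[s])   # sample weight = 1 / selection probability
    X = np.column_stack([np.ones(n[s]), x])
    beta_hat[s] = np.linalg.lstsq(X, y, rcond=None)[0][1]   # sector-specific OLS slope

N_total = sum(N.values())
target = sum(N[s] * beta_true[s] for s in N) / N_total      # population average slope
beta_bar_hat = sum(N[s] * beta_hat[s] for s in N) / N_total

# Pooled regressions that ignore the heterogeneity
x_all = np.concatenate([xs[s] for s in N])
y_all = np.concatenate([ys[s] for s in N])
w_all = np.concatenate([ws[s] for s in N])
X_all = np.column_stack([np.ones(x_all.size), x_all])
b_pooled = np.linalg.lstsq(X_all, y_all, rcond=None)[0][1]
sw = np.sqrt(w_all)   # weighted OLS = OLS on rows scaled by sqrt(weight)
b_wpooled = np.linalg.lstsq(X_all * sw[:, None], y_all * sw, rcond=None)[0][1]
print(target, beta_bar_hat, b_pooled, b_wpooled)
```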

Example application of sample weights

Table 4: Weighted regressions of height-for-age z-scores (*-100), Vietnam 1998 (children < 10 years)

| | OLS Coeff. | OLS Adjusted SE | Random Effects Coeff. | Random Effects Robust SE | Fixed Effects Coeff. | Fixed Effects Robust SE |
|---|---|---|---|---|---|---|
| Child's age (months) | 3.90*** | 0.3218 | 3.90*** | 0.2652 | 3.91*** | 0.2642 |
| Child's age squared (/100) | -2.51*** | 0.2206 | -2.50*** | 0.1875 | -2.51*** | 0.1875 |
| Child is male | 14.86*** | 3.5718 | 14.56*** | 3.3595 | 14.89*** | 3.3731 |
| (log) Hhold. consumption p.c. | -50.14*** | 5.5131 | -40.67*** | 4.3511 | -26.05*** | 5.0196 |
| Safe drinking water | -12.16 | 10.2770 | -6.92 | 5.1624 | -2.07 | 5.6079 |
| Satisfactory sanitation | -22.01*** | 5.9503 | -19.81*** | 5.3653 | -10.48* | 5.4439 |
| Years of schooling of HoH | -0.21 | 0.7355 | -0.15 | 0.5122 | -0.42 | 0.5363 |
| Mum has primary school dip. | 3.62 | 5.6510 | 3.04 | 4.2925 | 2.19 | 4.4958 |
| Intercept | 428.15*** | 48.9827 | 347.47*** | 34.9686 | 236.12*** | 38.5646 |
| R² | 0.1496 | | 0.4320 | | 0.2457 | |
| Sample size | 5218 | | | | | |

Notes: Dependent variable is the negative of the z-score, multiplied by 100.
***, ** & * indicate significance at 1%, 5% & 10% respectively.
Adjusted SE – standard error adjusted for clustering and stratification and robust to heteroskedasticity. Robust SE – standard error robust to general heteroskedasticity.
