Transcript uk14_tobias

Analysis of time-stratified case crossover studies in environmental epidemiology using Stata

Aurelio Tobías, Spanish Council for Scientific Research (CSIC), Barcelona, Spain
Ben Armstrong and Antonio Gasparrini, London School of Hygiene and Tropical Medicine, UK
2014 UK Stata Users Group meeting, London, 12 September 2014

Background

The time-stratified case-crossover design is a popular alternative to conventional time series regression for analysing associations between time series of environmental exposures (air pollution, weather) and counts of health outcomes.

The case-crossover design

• Proposed by Maclure (1991) to study transient effects on the risk of acute health events

[Figure: timeline ending at the health event, with the question "Have you done any unusual activity … compared to your usual routine?"]

[Figure: timeline showing the control period, then the risk period, then the health event, with "Exposed?" asked of each period]

• Analysed like a matched case-control study

                        Risk period
                        Exp.    No
    Control     Exp.     a       c
    period      No       b       d

    OR = b/c
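As a rough illustration of the matched analysis, the odds ratio OR = b/c uses only the discordant pairs. The sketch below uses Stata's immediate command mcci with made-up counts (a=10, b=30, c=15, d=45 are hypothetical, not from the talk). Here b counts events exposed in the risk period only and c counts events exposed in the control period only.

```stata
* Hypothetical counts of matched risk/control periods (not from the talk)
*   a = exposed in both periods
*   b = exposed in the risk period only
*   c = exposed in the control period only
*   d = exposed in neither period
local a = 10
local b = 30
local c = 15
local d = 45

* Matched-pair odds ratio by hand: only discordant pairs contribute
display "OR = b/c = " `b'/`c'

* Same estimate from the immediate matched case-control command,
* which also reports a confidence interval and McNemar's test
mcci `a' `b' `c' `d'
```

With these illustrative counts the odds ratio is 30/15 = 2. The concordant cells a and d do not enter the estimate.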

Application to environmental studies

• First adapted to air pollution studies in Philadelphia (Neas et al. 1999) and Barcelona (Sunyer et al. 2000)
• Compares exposure levels on the day (t) when the health event occurs with levels before (t-7) and after (t+7) the health event
• Controls for time trend and seasonality by design, since it compares exposure levels between the same weekdays within each month of each year

Time-stratified case-crossover

(Figueiras et al. 2010)

Option 1: conditional logistic regression

• The dataset needs to be reshaped from time-series format to individual matched case-control format by using the new command mkcco

. mkcco vary, date(vardate)

• This also creates three new variables
  – case: case days (=1)
  – wn: weights observations by the number of events on the case day
  – ccset: set number to match case and control days
• Then use Stata's clogit command

. use madrid, clear
. list date y pm10 temp in 1/31, noobs clean

         date    y   pm10   temp
    01jan2003   64     14    9.6
    02jan2003   56     12     11
    03jan2003   73     16    9.8
    04jan2003   69     14    8.8
    05jan2003   71     17    4.3
    06jan2003   68      9      7
    07jan2003   63     19      3
    08jan2003   85     13    6.8
    09jan2003   67     13    4.6
    10jan2003   80     22    2.7
    11jan2003   65     23      1
    12jan2003   95     17    .85
    13jan2003   60     45     .5
    14jan2003   76     60    .45
    15jan2003   77     72    1.3
    16jan2003   75     54    2.6
    17jan2003   76     65      2
    18jan2003   75     43     .8
    19jan2003   74     12    6.3
    20jan2003   79     19    5.8
    21jan2003   73     16      8
    22jan2003   72     21    6.7
    23jan2003   72     25      7
    24jan2003   74     46    6.2
    25jan2003   59     42    8.1
    26jan2003   77     26     10
    27jan2003   64     22     15
    28jan2003   65     41    9.7
    29jan2003   74     18    5.7
    30jan2003   77     17    4.8
    31jan2003   55     17    2.3

January 2003

    Sun  Mon  Tue  Wed  Thu  Fri  Sat
                     1    2    3    4
      5    6    7    8    9   10   11
     12   13   14   15   16   17   18
     19   20   21   22   23   24   25
     26   27   28   29   30   31

. generate dow = dow(date)
. list date y pm10 temp in 1/31 if dow==1, noobs clean

         date    y   pm10   temp
    06jan2003   68      9      7
    13jan2003   60     45     .5
    20jan2003   79     19    5.8
    27jan2003   64     22     15

. mkcco y, date(date)

Dataset reshaped from 1096 to 5480 obs. (1096 case days and 4384 control days).
New variables case, wn and ccset have been added to the resaphed dataset.

. list date y pm10 temp case ccset wn in 1/36 if dow==1, noobs clean

         date    y   pm10   temp   case     ccset   wn
    06jan2003   68      9      7      1   2003021   68
    13jan2003   60     45     .5      0   2003021   68
    20jan2003   79     19    5.8      0   2003021   68
    27jan2003   64     22   14.8      0   2003021   68

...

         date    y   pm10   temp   case     ccset   wn
    06jan2003   68      9      7      0   2003022   60
    13jan2003   60     45     .5      1   2003022   60
    20jan2003   79     19    5.8      0   2003022   60
    27jan2003   64     22   14.8      0   2003022   60

...

         date    y   pm10   temp   case     ccset   wn
    06jan2003   68      9      7      0   2003023   79
    13jan2003   60     45     .5      0   2003023   79
    20jan2003   79     19    5.8      1   2003023   79
    27jan2003   64     22   14.8      0   2003023   79

...

         date    y   pm10   temp   case     ccset   wn
    06jan2003   68      9      7      0   2003024   64
    13jan2003   60     45     .5      0   2003024   64
    20jan2003   79     19    5.8      0   2003024   64
    27jan2003   64     22   14.8      1   2003024   64

. clogit case pm10 [w=wn], group(ccset)
(frequency weights assumed)
...

Conditional (fixed-effects) logistic regression   Number of obs   =     295025
                                                  LR chi2(1)      =       8.39
                                                  Prob > chi2     =     0.0038
Log likelihood = -98913.41                        Pseudo R2       =     0.0000

------------------------------------------------------------------------------
        case |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        pm10 |   .0007406   .0002552     2.90   0.004     .0002404    .0012408
------------------------------------------------------------------------------
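The coefficient for pm10 is a log odds ratio per one-unit increase, which is hard to read at this scale. A minimal sketch of re-expressing it as a percent change in risk per 10-unit increase follows. It assumes pm10 is recorded in micrograms per cubic metre, which the slides do not state, and the 10-unit increment is chosen here only for illustration.

```stata
* Coefficient and 95% CI limits copied from the clogit output
local b  = .0007406
local lo = .0002404
local hi = .0012408

* Percent change in risk per 10-unit increase in pm10
* (assumes pm10 is measured in micrograms per cubic metre)
display "percent change: " %5.2f 100*(exp(10*`b')-1)
display "95% CI lower:   " %5.2f 100*(exp(10*`lo')-1)
display "95% CI upper:   " %5.2f 100*(exp(10*`hi')-1)
```

This gives roughly a 0.74% increase per 10 units, with a 95% confidence interval of about 0.24% to 1.25%. With the fitted model still in memory, lincom 10*pm10, or should report the same quantity on the odds ratio scale.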

Option 1: conditional logistic regression

• Advantage
  – Similar approach to a matched case-control study, so familiar to epidemiologists
• Limitations
  – Unfriendly data management
  – Large (huge) datasets at individual level
  – Slow convergence when fitting interaction terms for individual factors (sex, age)
  – Cannot account for overdispersion and residual autocorrelation

Option 2: Poisson regression

• Lu and Zeger (2007)
  – "… the time-stratified case-crossover design leads to the same estimate as obtained from a Poisson regression with dummy variables indicating the strata"
• Strata are defined by the 3-way interaction between year, month and day of week
• Either of Stata's poisson or glm commands can be used (glm jointly with the scale option)

. use madrid, clear
. generate yy = year(date)
. generate mm = month(date)
. generate dow = dow(date)
. poisson y pm10 i.yy##i.mm#i.dow
...

Poisson regression                                Number of obs   =       1096
                                                  LR chi2(252)    =    1374.12
                                                  Prob > chi2     =     0.0000
Log likelihood = -3787.589                        Pseudo R2       =     0.1535

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        pm10 |   .0007406   .0002552     2.90   0.004     .0002404    .0012408
             |   ... 252 parameters!
       _cons |    4.35927   .0563535    77.36   0.000     4.248819    4.469721
------------------------------------------------------------------------------
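The large parameter count follows from the stratification: three years of daily data give 3 × 12 × 7 = 252 year-by-month-by-weekday strata, each holding four or five days. The sketch below checks this on a synthetic calendar. It assumes the series runs daily from 1 January 2003 to 31 December 2005 (1096 days), which is consistent with the output shown but not stated outright; the variable names ndays and first are introduced here for illustration.

```stata
* Synthetic daily calendar for 1 Jan 2003 to 31 Dec 2005 (1096 days),
* an assumption standing in for the dates in the madrid dataset
clear
set obs 1096
generate date = mdy(1,1,2003) + _n - 1
format date %td

generate yy  = year(date)
generate mm  = month(date)
generate dow = dow(date)

* One stratum per year x month x weekday combination
egen set = group(yy mm dow)

* Number of days in each stratum, and a flag for one row per stratum
bysort set: generate ndays = _N
egen first = tag(set)

* Highest stratum id (= number of strata) and days per stratum
summarize set
summarize ndays if first
```

On this calendar the maximum of set is 252, and ndays ranges from 4 to 5 across strata.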

. glm y pm10 i.yy##i.mm#i.dow, family(poisson) link(log) scale(x2)
...

Generalized linear models                          No. of obs      =      1096
Optimization     : ML                              Residual df     =       843
                                                   Scale parameter =         1
Deviance         =  1069.73306                     (1/df) Deviance =   1.26896
Pearson          =  1065.616046                    (1/df) Pearson  =  1.264076

Variance function: V(u) = u                        [Poisson]
Link function    : g(u) = ln(u)                    [Log]

                                                   AIC             =  7.373338
Log likelihood   = -3787.588997                    BIC             =  -4830.78

------------------------------------------------------------------------------
             |                 OIM
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        pm10 |   .0007406   .0002869     2.58   0.010     .0001782     .001303
             |   ...
       _cons |    4.35927   .0633589    68.80   0.000     4.235088    4.483451
------------------------------------------------------------------------------
(Standard errors scaled using square root of Pearson X2-based dispersion.)
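The scaled standard error can be reproduced by hand from the numbers in the two outputs: the model-based standard error is multiplied by the square root of the Pearson chi-squared divided by the residual degrees of freedom. A minimal check, using only values copied from the output (the local macro names are illustrative):

```stata
* Values copied from the poisson and glm output
* se_model: model-based standard error of pm10
* pearson:  Pearson chi-squared statistic
* df:       residual degrees of freedom
local se_model = .0002552
local pearson  = 1065.616046
local df       = 843

* Dispersion estimate and rescaled standard error
display "dispersion = " %8.6f `pearson'/`df'
display "scaled SE  = " %9.7f `se_model'*sqrt(`pearson'/`df')
```

The dispersion is about 1.264 and the rescaled standard error about .0002869, matching the glm output. The coefficient itself is unchanged.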

Option 2: Poisson regression

• Advantages
  – Avoids reshaping the dataset, keeping the original time-series format
  – Can account for overdispersion and residual autocorrelation
• Limitations
  – Large number of interaction parameters, slowing convergence (and even making it difficult with missing data)
  – Even slower when fitting interaction terms for individual factors (sex, age)

Option 3: Conditional Poisson regression

• Armstrong and Gasparrini (ISEE 2011)
  – "… Poisson models with large numbers of indicator variables can alternatively be fit with conditional Poisson models, conditioning on numbers of events in the time stratum"
  – First proposed by Farrington (1995) for the self-controlled case-series design for vaccine safety studies
• Strata are defined by grouping the same weekday within each month of each year; then use Stata's xtpoisson command jointly with the fe option
• But overdispersion needs to be accounted for with the new xtpois_ovdp command
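The reason conditioning removes the stratum parameters can be sketched as follows, assuming the usual log-linear model with a separate intercept for each stratum. The notation (stratum s, day i, exposure x_i, intercept α_s, stratum total n_s) is introduced here for illustration and does not appear in the slides.

```latex
\begin{align*}
Y_i &\sim \mathrm{Poisson}(\mu_i), \qquad
  \mu_i = \exp(\alpha_s + \beta x_i), \quad i \in s, \\[4pt]
(Y_i)_{i \in s} \;\Big|\; \sum_{j \in s} Y_j = n_s
  &\sim \mathrm{Multinomial}\bigl(n_s, (\pi_i)_{i \in s}\bigr), \qquad
  \pi_i = \frac{\exp(\alpha_s + \beta x_i)}{\sum_{j \in s} \exp(\alpha_s + \beta x_j)}
        = \frac{\exp(\beta x_i)}{\sum_{j \in s} \exp(\beta x_j)}.
\end{align*}
```

The intercept α_s cancels, so only β is estimated. Each π_i has the same form as the conditional logistic contribution of a case on day i with the other days of the stratum as controls, which is consistent with the identical pm10 coefficient reported by all three fits.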

. egen set = group(yy mm dow)
. xtset set date
...

. xtpoisson y pm10, fe
...

Conditional fixed-effects Poisson regression    Number of obs      =      1096
Group variable: set                             Number of groups   =       252

                                                Obs per group: min =         4
                                                               avg =       4.3
                                                               max =         5

                                                Wald chi2(1)       =      8.42
Log likelihood  = -2854.4979                    Prob > chi2        =    0.0037

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        pm10 |   .0007406   .0002552     2.90   0.004     .0002404    .0012408
------------------------------------------------------------------------------

. xtpois_ovdp

Estimate and standard errors corrected for over-dipersion
df: 843 ; pearson x2: 1065.6 ; dispersion: 1.26

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        pm10 |   .0007406   .0002869     2.58   0.010     .0001782     .001303
------------------------------------------------------------------------------

CPU time

• My example (3 years of data, 60 events/day)
  – clogit      0.181 sec.
  – poisson     2.111 sec.
  – xtpoisson   0.066 sec.
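The slides do not say how these times were measured. One way to make a similar comparison is Stata's timer command, which records elapsed time. The sketch below is an assumption-laden stand-in: it uses synthetic data (random pm10 values and Poisson counts with mean 60) rather than the madrid dataset, and fits the unconditional model with stratum indicators i.set instead of the 3-way interaction, so the timings will differ from those on the slide.

```stata
* Synthetic data (an assumption; the talk used the madrid dataset)
clear
set seed 12345
set obs 1096
generate date = mdy(1,1,2003) + _n - 1
format date %td
generate pm10 = 50*runiform()
generate y    = rpoisson(60)

* Time strata: same weekday within each month of each year
generate yy  = year(date)
generate mm  = month(date)
generate dow = dow(date)
egen set = group(yy mm dow)
xtset set date

* Timer 1: Poisson regression with one indicator per stratum
* Timer 2: conditional (fixed-effects) Poisson regression
timer clear
timer on 1
quietly poisson y pm10 i.set
timer off 1
timer on 2
quietly xtpoisson y pm10, fe
timer off 2
timer list
```

timer list prints the elapsed seconds for each timer. Repeating the fits several times and averaging would give a steadier comparison.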

Option 3: Conditional Poisson regression

• Advantages
  – Avoids reshaping the dataset, keeping the original time-series format
  – Can account for overdispersion and residual autocorrelation
  – Easy fitting of interaction terms for individual factors (sex, age), with faster convergence
• Limitations
  – ?

Summary

• Case-crossover studies can easily be analysed using Stata
  1) Conditional logistic regression requires unfriendly data management and large (huge) datasets at individual level
  2) Poisson regression with a 3-way interaction requires fitting a large number of interaction parameters, making convergence difficult
  3) Conditional Poisson regression seems to solve both

Thanks for your attention!
