Transcript uk13_fisher

Two-stage individual participant data meta-analysis and flexible forest plots

David Fisher

MRC Clinical Trials Unit Hub for Trials Methodology Research at UCL

2013 UK Stata Users Group Meeting, Cass Business School, London

Outline of presentation

• Introduction to individual patient data (IPD) meta-analysis (MA)
  • IPD vs aggregate-data (AD) MA
  • “One-stage” vs “two-stage” IPD MA
• The ipdmetan command
  • Basic use; comparison with metan
  • Covariate interactions
  • Combining AD with IPD
  • Advanced syntax
• The forestplot command
  • Interface with ipdmetan
  • Stand-alone use and “stacking”
• Summary and conclusion

Introduction to IPD meta-analysis

• Meta-analysis (MA):
  • Uses statistical methods to combine the results of “similar” trials into a single estimate of effect
  • Increases power and precision
  • Assesses whether treatment effects are similar across trials (heterogeneity)
• Aggregate data (AD) vs IPD:
  • “Traditional” MAs gather results from publications
    • Results are aggregated across all patients in each trial; nothing is known about individual patients
  • IPD MAs gather raw data from trial investigators
    • Ensures all relevant patients are included
    • Ensures a consistent analysis across all trials
    • Allows more complex analyses, e.g. patient-level interactions

“One-stage” IPD MA

• Consider a linear regression (the extension to GLMs or time-to-event regressions is straightforward)
• For a one-stage IPD MA (i = trial, j = patient):

$$y_{ij} = \alpha_i + (\beta + u_i)\,x_{ij}$$

  where $\alpha_i$ = trial-specific intercepts (one per trial identifier) and $\beta$ = overall treatment effect estimated across all trials (with optional random effect $u_i$)

Examples in Stata:
• Fixed effects: regress y x i.trial
• Random effects: xtmixed y x i.trial || trial: x, nocons
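The fixed-effects one-stage fit can be sketched outside Stata. Giving every trial its own intercept α_i is equivalent to centring x and y within each trial (the Frisch-Waugh-Lovell theorem), so β is the pooled within-trial least-squares slope, which is the coefficient on x that regress y x i.trial reports. A minimal Python sketch, assuming records are (trial, x, y) tuples; the function name and the data are hypothetical:

```python
from collections import defaultdict


def one_stage_fixed(records):
    """Common slope beta in y_ij = alpha_i + beta * x_ij (fixed trial intercepts).

    records: iterable of (trial, x, y) tuples. Centring x and y within each
    trial removes alpha_i, so beta is the pooled within-trial OLS slope.
    """
    by_trial = defaultdict(list)
    for trial, x, y in records:
        by_trial[trial].append((x, y))

    s_xy = s_xx = 0.0
    for rows in by_trial.values():
        x_bar = sum(x for x, _ in rows) / len(rows)
        y_bar = sum(y for _, y in rows) / len(rows)
        for x, y in rows:
            s_xy += (x - x_bar) * (y - y_bar)
            s_xx += (x - x_bar) ** 2
    return s_xy / s_xx


# Hypothetical data: three trials with different intercepts, common slope 2
data = [(t, x, 10 * t + 2 * x) for t in (1, 2, 3) for x in (0, 1, 2, 3)]
print(one_stage_fixed(data))  # 2.0
```

The trial intercepts (10, 20, 30) differ widely, yet they have no influence on the estimated slope, which is the point of including i.trial.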

“Two-stage” IPD MA

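• For a two-stage IPD MA, first fit the model separately within each trial:

$$y_{1j} = \alpha^{(1)} + \beta^{(1)} x_{1j} \quad \text{for trial 1}$$
$$\vdots$$
$$y_{ij} = \alpha^{(i)} + \beta^{(i)} x_{ij} \quad \text{for trial } i$$

• Then pool the trial-specific estimates:

$$\hat\beta = \frac{\sum_i w_i \hat\beta^{(i)}}{\sum_i w_i}, \qquad \operatorname{Var}(\hat\beta) = \frac{1}{\sum_i w_i}, \qquad \text{where } w_i = \frac{1}{\operatorname{se}(\hat\beta^{(i)})^2}$$

• Weights $w_i$ may be altered to give random effects
  • e.g. DerSimonian & Laird: $w_i = 1 / \left(\operatorname{se}(\hat\beta^{(i)})^2 + \hat\tau^2\right)$
• Straightforward, but currently messy in Stata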
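The second stage is plain arithmetic on the trial-level estimates and their SEs. A minimal Python sketch of the pooling formulas above, including the DerSimonian & Laird moment estimate of τ²; the function name and interface are illustrative assumptions, not ipdmetan's internals:

```python
import math


def pool(estimates, ses, random_effects=False):
    """Two-stage (inverse-variance) pooling of trial estimates.

    estimates: trial-level coefficients beta^(i) (e.g. log hazard ratios)
    ses:       their standard errors se(beta^(i))
    Returns (pooled estimate, SE of pooled estimate, tau^2). With
    random_effects=True, tau^2 is the DerSimonian & Laird moment estimate
    and is added to every within-trial variance before re-weighting.
    """
    w = [1 / se ** 2 for se in ses]
    sum_w = sum(w)
    fixed = sum(wi * b for wi, b in zip(w, estimates)) / sum_w

    tau2 = 0.0
    df = len(estimates) - 1
    if random_effects and df > 0:
        q = sum(wi * (b - fixed) ** 2 for wi, b in zip(w, estimates))  # Cochran's Q
        c = sum_w - sum(wi ** 2 for wi in w) / sum_w
        tau2 = max(0.0, (q - df) / c)
        w = [1 / (se ** 2 + tau2) for se in ses]
        sum_w = sum(w)

    pooled = sum(wi * b for wi, b in zip(w, estimates)) / sum_w
    return pooled, math.sqrt(1 / sum_w), tau2


print(pool([0.1, 0.3], [0.1, 0.1]))
print(pool([0.1, 0.3], [0.1, 0.1], random_effects=True))
```

With two equally precise trials the pooled estimate is 0.2 either way. Under random effects, Q = 2 on 1 df gives τ² = 0.01, which doubles each trial's variance and widens the pooled SE from about 0.071 to 0.1.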

Treatment-covariate interactions

• Assessment of patient-level covariate interactions is a great advantage of IPD:

$$y_{ij} = \alpha_i + \beta x_{ij} + \gamma z_{ij} + \delta x_{ij} z_{ij}$$

• Arguably best done with “one-stage”
  • Main effects and interactions (and correlations) estimated simultaneously
• But a basic analysis is also possible with “two-stage”
  • Relative effect (interaction coefficient $\delta$) only
  • Same approach (inverse-variance) as for main effects
  • Uses only within-trial information, so there is no estimation bias from between-trial effects
  • Can be presented in a forest plot, with assessment of heterogeneity etc.

Discussed in a published paper (Fisher et al. 2011)
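The two-stage interaction analysis can be sketched for the linear model above. Within each trial, δ̂_i is estimated from that trial's patients only (here, for a binary treatment x, as the difference between the arm-specific slopes of y on z), and the δ̂_i are then pooled with inverse-variance weights exactly as for main effects, so trial-level differences such as the intercepts α_i never enter δ̂_i. A minimal Python sketch with hypothetical data and function names; the per-arm SE is a simplifying assumption, whereas fitting the full interaction model by ordinary least squares would pool the residual variance across arms:

```python
import math


def slope_and_se(z, y):
    """Least-squares slope of y on z, with its model-based standard error."""
    n = len(z)
    z_bar, y_bar = sum(z) / n, sum(y) / n
    s_zz = sum((zi - z_bar) ** 2 for zi in z)
    slope = sum((zi - z_bar) * (yi - y_bar) for zi, yi in zip(z, y)) / s_zz
    intercept = y_bar - slope * z_bar
    rss = sum((yi - intercept - slope * zi) ** 2 for zi, yi in zip(z, y))
    return slope, math.sqrt(rss / (n - 2) / s_zz)


def trial_interaction(rows):
    """Within-trial interaction: slope of y on z in arm 1 minus that in arm 0.

    rows: (x, z, y) tuples with x the 0/1 treatment arm. The point estimate
    equals the x*z coefficient of y = a + b*x + c*z + d*x*z fitted to this
    trial alone; this SE lets each arm have its own residual variance.
    """
    fits = {}
    for arm in (0, 1):
        z = [zi for xi, zi, _ in rows if xi == arm]
        y = [yi for xi, _, yi in rows if xi == arm]
        fits[arm] = slope_and_se(z, y)
    (b0, se0), (b1, se1) = fits[0], fits[1]
    return b1 - b0, math.sqrt(se0 ** 2 + se1 ** 2)


def pool_interactions(trials):
    """Second stage: fixed-effect inverse-variance pooling of the deltas."""
    estimates = [trial_interaction(rows) for rows in trials]
    weights = [1 / se ** 2 for _, se in estimates]
    delta = sum(w * d for w, (d, _) in zip(weights, estimates)) / sum(weights)
    return delta, math.sqrt(1 / sum(weights))


def make_trial(alpha, noise=0.1):
    """Hypothetical trial: beta = 1, gamma = 0.5, delta = -0.3, alternating noise."""
    rows = []
    for x in (0, 1):
        for z in range(10):
            e = noise if z % 2 == 0 else -noise
            rows.append((x, z, alpha + 1.0 * x + 0.5 * z - 0.3 * x * z + e))
    return rows


trials = [make_trial(alpha) for alpha in (0.0, 2.0, 5.0)]
print(pool_interactions(trials))
```

Because the alternating noise is identical in both arms, it cancels in each difference of slopes, and the pooled δ recovers -0.3 (up to floating-point rounding) despite the very different trial intercepts.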

“One-stage” vs “two-stage”

One-stage
Pros:
- All coefficients and correlations estimated simultaneously
- Flexible and extendable model structure
Cons:
- Requires more statistical expertise
- Challenging in certain situations, e.g. random effects with time-to-event data
- Not a natural fit with forest plots

Two-stage
Pros:
- Natural extension of AD MA
- Easily presentable in forest plots
- Applicable to any set of effect estimates and SEs (incl. interactions)
- Negligible difference from one-stage in most common scenarios
Cons:
- Only a single estimate can be pooled, which limits complexity (e.g. interactions)
- Theoretically inferior in (at least) some scenarios

Example data

• IPD MA of randomised trials of post-operative radiotherapy (PORT) in non-small cell lung cancer
• Trial ID (k = 11)
• Patient ID (n = 2343)
• Treatment arm
• Outcome: time to death from any cause (overall survival), subject to censoring
  • Time to event (from randomisation)
  • Event type (death or censoring)
• Certain covariate measurements are also available, though not necessarily for all trials or patients
  • Disease stage (a factor, but treated as continuous)
  • (+ others)

ipdmetan syntax

Uses “prefix” command syntax:

ipdmetan [exp_list] , study(study_ID) [ipd_options ad(aggregate_data_options) forestplot(forest_plot_options)] : estimation_command ...

Example:

ipdmetan, study(trialid) eform : stcox arm, strata(sex)

• ipdmetan options go after the comma and before the colon
• estimation_command and its options go after the colon
• By default, the coefficient of the first independent variable (excluding baseline factor levels) is pooled, here arm
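The prefix design separates what is estimated (the command after the colon) from how the results are combined. A Python sketch of that design, not of ipdmetan's code: the function names, the (study, arm, y) data and the mean-difference estimator are all hypothetical stand-ins for study(), the dataset and the estimation command. Studies where the estimator cannot be fitted are kept in the results with no estimate, so they can still be reported:

```python
import math
from collections import defaultdict


def two_stage(records, study_of, estimate):
    """Prefix-style two-stage analysis: fit within each study, then pool.

    records:  iterable of data rows
    study_of: function giving a row's study ID (cf. study(study_ID))
    estimate: function playing the role of the estimation command; returns
              (coefficient, SE) or raises ValueError if it cannot be fitted
    """
    by_study = defaultdict(list)
    for row in records:
        by_study[study_of(row)].append(row)

    results = {}
    for study, rows in by_study.items():
        try:
            results[study] = estimate(rows)
        except ValueError:
            results[study] = None  # kept, and flagged as insufficient data

    fitted = [r for r in results.values() if r is not None]
    weights = [1 / se ** 2 for _, se in fitted]
    pooled = sum(w * b for w, (b, _) in zip(weights, fitted)) / sum(weights)
    return results, pooled, math.sqrt(1 / sum(weights))


def mean_difference(rows):
    """Hypothetical estimator: mean of y in arm 1 minus arm 0, with its SE."""
    arms = {0: [], 1: []}
    for _, arm, y in rows:
        arms[arm].append(y)
    if min(len(ys) for ys in arms.values()) < 2:
        raise ValueError("insufficient data")

    def mean_and_var(ys):
        m = sum(ys) / len(ys)
        return m, sum((y - m) ** 2 for y in ys) / (len(ys) - 1)

    (m0, v0), (m1, v1) = mean_and_var(arms[0]), mean_and_var(arms[1])
    return m1 - m0, math.sqrt(v0 / len(arms[0]) + v1 / len(arms[1]))


# Rows are (study, arm, y); study "C" has only one control patient
data = [("A", 0, 1), ("A", 0, 3), ("A", 1, 2), ("A", 1, 4),
        ("B", 0, 0), ("B", 0, 2), ("B", 1, 3), ("B", 1, 5),
        ("C", 0, 1), ("C", 1, 2), ("C", 1, 6)]
results, pooled, se = two_stage(data, lambda row: row[0], mean_difference)
print(results)
print(pooled, se)
```

Swapping mean_difference for any other function that returns (coefficient, SE) changes the analysis without touching the pooling code, which mirrors how any estimation_command can follow the colon.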

Trials included: 11
Patients included: 2342

Meta-analysis pooling of main (treatment) effect estimate arm
using Fixed-effects
-------------------------------------------------------------------
trial reference number |   Effect   [95% Conf. Interval]   % Weight
-----------------------+-------------------------------------------
belgium                |    1.456      1.072     1.979       11.09
...                    |
EORTC 08861            |    1.643      0.913     2.956        3.02
LILLE                  |    1.568      1.060     2.319        6.81
...                    |      ...        ...       ...
-----------------------+-------------------------------------------
Overall effect         |    1.178      1.064     1.305      100.00
-------------------------------------------------------------------
Test of overall effect = 1:  z = 3.153  p = 0.002

Heterogeneity Measures
--------------------------------------------------
                  |    value       df    p-value
------------------+-------------------------------
Cochrane Q        |    15.88       10      0.103
I² (%)            |    37.0%
Modified H²       |    0.588
tau²              |   0.0180
--------------------------------------------------

• Output style is similar to metan or metaan; the first column is headed by the variable label of trialid (“trial reference number”)
• I² = between-study variance (tau²) as a percentage of total variance
• Modified H² = ratio of tau² to typical within-study variance
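The heterogeneity measures follow from Cochran's Q and its degrees of freedom alone: H² = Q/df, modified H² = H² − 1 = (Q − df)/df, and I² = (Q − df)/Q = H²_M/(1 + H²_M), each truncated at zero. A Python sketch (function names hypothetical; the χ² tail probability uses the standard series for the regularized incomplete gamma function rather than a statistics library) reproduces the I², modified H² and p-value printed above from Q = 15.88 on 10 df. τ² additionally needs the trial weights, so it is not recomputed here:

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-squared(df) variable, via the series
    for the regularized lower incomplete gamma function P(df/2, x/2)."""
    a, z = df / 2.0, x / 2.0
    if z <= 0:
        return 1.0
    term = total = 1.0 / a
    n = 0
    while abs(term) > 1e-15 * abs(total):
        n += 1
        term *= z / (a + n)
        total += term
    lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return 1.0 - lower


def heterogeneity(q, df):
    """I-squared, modified H-squared and the p-value of Cochran's Q test."""
    i2 = max(0.0, (q - df) / q)
    h2m = max(0.0, (q - df) / df)
    return i2, h2m, chi2_sf(q, df)


i2, h2m, p = heterogeneity(15.88, 10)
print(f"I2 = {100 * i2:.1f}%  H2_M = {h2m:.3f}  p = {p:.3f}")
# I2 = 37.0%  H2_M = 0.588  p = 0.103
```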

Basic forest plot

[Figure: basic forest plot of hazard ratios; x-axis (log scale) ticks at .25, .5, 1, 2, 4. Plotted values:]

trial reference number                    Effect (95% CI)       % Weight
belgium                                   1.46 (1.07, 1.98)       11.09
LCSG 773                                  1.12 (0.83, 1.53)       11.13
CAMS                                      1.03 (0.77, 1.38)       12.20
MRC LU11                                  0.96 (0.74, 1.24)       16.00
EORTC 08861                               1.64 (0.91, 2.96)        3.02
SLOVENIA                                  0.89 (0.54, 1.49)        3.97
LILLE                                     1.57 (1.06, 2.32)        6.81
GETCB 04CB86                              1.14 (0.80, 1.62)        8.48
GETCB 05CB86                              1.44 (1.13, 1.83)       17.84
ITALY                                     0.69 (0.40, 1.20)        3.49
KOREA                                     1.16 (0.76, 1.76)        5.98
Overall (I-squared = 37.0%, p = 0.103)    1.18 (1.06, 1.31)      100.00
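The plotted values are enough to check the two-stage arithmetic approximately. A Python sketch, assuming only the rounded hazard ratios and 95% CIs shown in the plot: it recovers each log HR and its SE from the CI width, then reapplies fixed-effect inverse-variance weights. Because the inputs are rounded to two decimals, the pooled HR and the percentage weights agree with ipdmetan's only to within a few tenths of a percent:

```python
import math

# Hazard ratios and 95% CIs as displayed (2 d.p.) in the basic forest plot
trials = {
    "belgium": (1.46, 1.07, 1.98),
    "LCSG 773": (1.12, 0.83, 1.53),
    "CAMS": (1.03, 0.77, 1.38),
    "MRC LU11": (0.96, 0.74, 1.24),
    "EORTC 08861": (1.64, 0.91, 2.96),
    "SLOVENIA": (0.89, 0.54, 1.49),
    "LILLE": (1.57, 1.06, 2.32),
    "GETCB 04CB86": (1.14, 0.80, 1.62),
    "GETCB 05CB86": (1.44, 1.13, 1.83),
    "ITALY": (0.69, 0.40, 1.20),
    "KOREA": (1.16, 0.76, 1.76),
}
Z = 1.959964  # 97.5th percentile of the standard normal distribution

# Work on the log scale: log HR, and the SE implied by the CI width
logs = {k: math.log(hr) for k, (hr, lo, hi) in trials.items()}
ses = {k: (math.log(hi) - math.log(lo)) / (2 * Z) for k, (hr, lo, hi) in trials.items()}
w = {k: 1 / ses[k] ** 2 for k in trials}
sw = sum(w.values())

pooled = sum(w[k] * logs[k] for k in trials) / sw
se = math.sqrt(1 / sw)
pct = {k: 100 * w[k] / sw for k in trials}  # percentage weights

print(f"HR {math.exp(pooled):.2f} "
      f"({math.exp(pooled - Z * se):.2f}, {math.exp(pooled + Z * se):.2f})")
for k in trials:
    print(f"{k:<13} {pct[k]:5.2f}%")
```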

Forest plot of covariate interactions

ipdmetan, study(trialid) eform interaction keepall : stcox arm##c.stage

• With the interaction option, the default is to pool the coefficient of the first interaction term
• keepall: trials with insufficient data are still listed

Trials included: 8
Patients included: 1962

Meta-analysis pooling of interaction effect estimate 1.arm#c.stage
using Fixed-effects

[Figure: forest plot of interaction effects; x-axis (log scale) ticks at .125, .25, .5, 1, 2, 4, 8. Plotted values:]

trial reference number                    Effect (95% CI)       % Weight
belgium                                   0.92 (0.61, 1.40)       18.70
LCSG 773                                  0.76 (0.40, 1.45)        8.11
CAMS                                      0.77 (0.43, 1.39)        9.49
MRC LU11                                  0.62 (0.36, 1.07)       11.26
EORTC 08861                               0.39 (0.14, 1.09)        3.16
GETCB 04CB86                              0.94 (0.50, 1.77)        8.22
GETCB 05CB86                              0.97 (0.72, 1.30)       38.35
KOREA                                     2.09 (0.70, 6.27)        2.73
SLOVENIA                                  (Insufficient data)
LILLE                                     (Insufficient data)
ITALY                                     (Insufficient data)
Overall (I-squared = 2.7%, p = 0.409)     0.87 (0.72, 1.04)      100.00

Inclusion of aggregate data

• I don’t have a separate aggregate dataset, so I will create one artificially from my IPD dataset:

. ** Generate artificial trial subgrouping
. gen subgroup = inlist(trialid, 1, 8, 12, 15)
. label define subgroup_ 0 "Trial group 1" 1 "Trial group 2"
. label values subgroup subgroup_
. ** Run ipdmetan within one of the subgroups; save the dataset
. qui ipdmetan, study(trialid) by(subgroup) nooverall nograph saving(subgroup1.dta) : stcox arm if subgroup==1, strata(sex)

(Aside: contents of subgroup1.dta)

_use  trialid  _labels         _ES   _seES    _lci    _uci   _wgt   _NN
   1        1  belgium       0.376   0.156   0.069   0.682  0.286   202
   1        8  EORTC 08861   0.496   0.300  -0.091   1.084  0.078   105
   1       12  LILLE         0.450   0.200   0.058   0.841  0.176   163
   1       15  GETCB 05CB86  0.362   0.123   0.120   0.603  0.460   539
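Only _ES and _seES carry independent information here; the other numeric columns follow from them. A Python sketch, assuming normal-theory 95% limits and fixed-effect weights normalised over the four trials in the file, reproduces the listed _lci, _uci and _wgt to within the rounding of the three-decimal inputs:

```python
# _ES (log hazard ratio) and _seES as listed for subgroup1.dta
saved = {
    "belgium": (0.376, 0.156),
    "EORTC 08861": (0.496, 0.300),
    "LILLE": (0.450, 0.200),
    "GETCB 05CB86": (0.362, 0.123),
}
Z = 1.959964  # 97.5th percentile of the standard normal distribution

inv_var = {name: 1 / se ** 2 for name, (_, se) in saved.items()}
total = sum(inv_var.values())

derived = {}
for name, (es, se) in saved.items():
    # Normal-theory 95% limits, and fixed-effect weight normalised within the file
    derived[name] = (es - Z * se, es + Z * se, inv_var[name] / total)
    lci, uci, wgt = derived[name]
    print(f"{name:<13} _lci={lci:6.3f} _uci={uci:6.3f} _wgt={wgt:5.3f}")
```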

Inclusion of aggregate data: Syntax

. ipdmetan, study(trialid) eform nooverall ad(subgroup1.dta, byad) : stcox arm if subgroup==0, strata(sex)

• nooverall: do not pool the IPD and aggregate data together
• ad(subgroup1.dta, byad): aggregate data syntax; “byad” treats the IPD and aggregate data as subgroups
• The estimation_command (here run on the remaining IPD trials) follows the colon

Trials included from IPD: 7
Patients included: 1333
Trials included from aggregate data: 4
Patients included: 1009

Inclusion of aggregate data: Screen output

Pooling of main (treatment) effect estimate arm
using Fixed-effects
------------------------------------------------------------------
trial reference number |  Effect   [95% Conf. Interval]   % Weight
-----------------------+------------------------------------------
IPD                    |
  LCSG 773             |   1.123     0.827     1.526       11.13
  CAMS                 |   1.029     0.768     1.378       12.20
  ...                  |     ...
  Subgroup effect      |   1.021     0.896     1.163       61.25
-----------------------+------------------------------------------
Aggregate              |
  belgium              |   1.456     1.072     1.979       11.09
  EORTC 08861          |   1.643     0.913     2.956        3.02
  ...                  |     ...
  Subgroup effect      |   1.479     1.256     1.743       38.75
------------------------------------------------------------------
Tests of effect size = 1:
  IPD        z = 0.305   p = 0.760
  Aggregate  z = 4.682   p = 0.000
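With byad, each subgroup effect is an ordinary fixed-effect inverse-variance pool, so its z test, and a test of whether the IPD and aggregate results differ, can be checked from the printed summaries. A Python sketch, assuming the two subgroups contain different trials (so their estimates are independent) and using only the rounded values printed above; because of that rounding, the IPD z comes out near 0.31 rather than exactly 0.305. The between-subgroup Q is an illustration, not a reproduction of ipdmetan's code:

```python
import math

Z = 1.959964  # 97.5th percentile of the standard normal distribution


def from_ci(ratio, lower, upper):
    """Log ratio and SE recovered from a ratio and its 95% confidence limits."""
    return math.log(ratio), (math.log(upper) - math.log(lower)) / (2 * Z)


# Subgroup effects as printed above (hazard ratios, 3 d.p.)
groups = {
    "IPD": from_ci(1.021, 0.896, 1.163),
    "Aggregate": from_ci(1.479, 1.256, 1.743),
}

# Test of effect size = 1 within each subgroup (two-sided normal p-value)
z_tests = {}
for name, (b, se) in groups.items():
    z = b / se
    z_tests[name] = (z, math.erfc(abs(z) / math.sqrt(2)))
    print(f"{name:<10} z = {z:.3f}  p = {z_tests[name][1]:.3f}")

# Between-subgroup heterogeneity: Cochran's Q on the two subgroup estimates
w = {name: 1 / se ** 2 for name, (_, se) in groups.items()}
overall = sum(w[name] * groups[name][0] for name in groups) / sum(w.values())
q_between = sum(w[name] * (groups[name][0] - overall) ** 2 for name in groups)
p_between = math.erfc(math.sqrt(q_between / 2))  # chi-squared tail with 1 df
print(f"Q_between = {q_between:.2f} on 1 df, p = {p_between:.4f}")
```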

Inclusion of aggregate data: Forest plot

[Figure: forest plot with IPD and aggregate-data subgroups; x-axis (log scale) ticks at .25, .5, 1, 2, 4. Plotted values:]

trial reference number                    Effect (95% CI)       % Weight
IPD
LCSG 773                                  1.12 (0.83, 1.53)       18.18
CAMS                                      1.03 (0.77, 1.38)       19.92
MRC LU11                                  0.96 (0.74, 1.24)       26.12
SLOVENIA                                  0.89 (0.54, 1.49)        6.48
GETCB 04CB86                              1.14 (0.80, 1.62)       13.85
ITALY                                     0.69 (0.40, 1.20)        5.69
KOREA                                     1.16 (0.76, 1.76)        9.76
Subtotal (I-squared = 0.0%, p = 0.740)    1.02 (0.90, 1.16)      100.00
Aggregate
belgium                                   1.46 (1.07, 1.98)       28.61
EORTC 08861                               1.64 (0.91, 2.96)        7.79
LILLE                                     1.57 (1.06, 2.32)       17.56
GETCB 05CB86                              1.44 (1.13, 1.83)       46.03
Subtotal (I-squared = 0.0%, p = 0.964)    1.48 (1.26, 1.74)      100.00

Advanced syntax example: non “e-class” estimation command

ipdmetan (u[1,1]/V[1,1]) (1/sqrt(V[1,1])), study(trialid) eform ad(subgroup1.dta, byad) lcols(evrate=_d %3.2f "Event rate") rcols(u[1,1] %5.2f "o-E(o)" V[1,1] %5.1f "V(o)") forest(nooverall nostats nowt) : sts test arm if subgroup==0, mat(u V)

• sts test does not return its results in e(b), so the effect estimate and its SE must be specified manually, as the two expressions before the comma
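The two expressions before the comma implement the Peto (log-rank) approach: for each trial, log HR = (O − E)/V with SE = 1/√V, taking O − E and V from the matrices named in mat(u V). A Python sketch using the o-E(o) and V(o) values shown in this example's forest plot for the IPD trials (rounded as displayed). It shows that inverse-variance pooling of these per-trial estimates is identical to the classic Peto estimate Σ(O − E)/ΣV, and the result lands close to the IPD subgroup effect printed earlier (HR 1.021, 95% CI 0.896 to 1.163):

```python
import math

# o-E(o) and V(o) for each IPD trial, as displayed (rounded) in this example
peto = {
    "LCSG 773": (4.77, 41.0),
    "CAMS": (1.07, 44.9),
    "MRC LU11": (-2.48, 59.4),
    "SLOVENIA": (-2.56, 15.6),
    "GETCB 04CB86": (4.95, 31.6),
    "ITALY": (-4.50, 13.2),
    "KOREA": (3.06, 22.4),
}

# Per trial, what the exp_list (u[1,1]/V[1,1]) (1/sqrt(V[1,1])) computes:
# log hazard ratio (O - E)/V and its standard error 1/sqrt(V)
per_trial = {name: (oe / v, 1 / math.sqrt(v)) for name, (oe, v) in peto.items()}

# Inverse-variance pooling of those estimates (weights 1/SE^2 = V) ...
weights = {name: 1 / se_i ** 2 for name, (_, se_i) in per_trial.items()}
iv_log_hr = sum(weights[n] * per_trial[n][0] for n in peto) / sum(weights.values())

# ... equals the classic Peto estimate sum(O - E) / sum(V)
sum_oe = sum(oe for oe, _ in peto.values())
sum_v = sum(v for _, v in peto.values())
log_hr, se = sum_oe / sum_v, 1 / math.sqrt(sum_v)

print(f"log HR {iv_log_hr:.4f} vs {log_hr:.4f}")
print(f"HR {math.exp(log_hr):.3f} "
      f"({math.exp(log_hr - 1.96 * se):.3f}, {math.exp(log_hr + 1.96 * se):.3f})")
```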

Advanced syntax example: columns of data in forestplot

• lcols(evrate=_d %3.2f "Event rate"): mean of a variable currently in memory (note the user-assigned name evrate, chosen to match a variable name in the aggregate dataset)
• rcols(u[1,1] %5.2f "o-E(o)" V[1,1] %5.1f "V(o)"): collect lists of returned statistics

Advanced syntax example: Forest plot

[Figure: forest plot with event-rate and log-rank statistic columns; x-axis (log scale) ticks at .25, .5, 1, 2, 4. Displayed columns:]

trial reference number                    Event rate    o-E(o)    V(o)
IPD
LCSG 773                                        0.72      4.77    41.0
CAMS                                            0.58      1.07    44.9
MRC LU11                                        0.78     -2.48    59.4
SLOVENIA                                        0.85     -2.56    15.6
GETCB 04CB86                                    0.68      4.95    31.6
ITALY                                           0.51     -4.50    13.2
KOREA                                           0.81      3.06    22.4
Subtotal (I-squared = 0.0%, p = 0.710)          0.69      3.24   229.6
Aggregate
belgium                                         0.83
EORTC 08861                                     0.43
LILLE                                           0.64
GETCB 05CB86                                    0.50
Subtotal (I-squared = 0.0%, p = 0.964)


• The o-E(o) and V(o) variables do not appear in the aggregate dataset, so they are not plotted for the aggregate trials
• The event-rate subtotal cannot be calculated for the aggregate data

The forestplot command

• Does not perform any calculations or estimation; simply plots existing data as a forest plot
• Overall/subgroup estimates, spacings, labels, text columns etc. need to be created and arranged in advance:
  • Ordering and spacing, and marking of subgroup/overall estimates for plotting as “diamonds”: _use
  • Principal left-hand data column (study IDs, heterogeneity etc.; string format): _labels
• This setup is done automatically by ipdmetan before passing to forestplot (but can also be done manually by the user)
• Multiple datasets can be passed to forestplot at once to create a single large “stacked” plot on a common x-axis

forestplot syntax

forestplot [varlist] [if] [in] [, plot_options graph_options using_option]

• varlist = manually specify the variable names to plot
• plot_options control the data plotting (within the plot region)
• graph_options control the surroundings (outside the plot region; the graph region)
• using_option represents one or more options that allow suitable datasets (or parts of datasets), possibly with different plot_options, to be fed to forestplot to form a single large forest plot on a single x-axis
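Stacking only works because all parts share one x-axis, so that axis has to cover every part. A Python sketch of one hypothetical rule, ticks at powers of 2 spanning the union of all confidence limits; this is an assumption, not forestplot's algorithm, although with the CI limits shown in this talk it happens to give the .25 to 4 axis of the main-effect plots and the .125 to 8 axis of the interaction plot:

```python
import math


def log2_ticks(lower_limits, upper_limits):
    """Powers of 2 spanning every confidence interval on a ratio scale.

    A hypothetical rule for a shared x-axis when several results-sets are
    stacked into one plot; it is not taken from forestplot itself.
    """
    lo = math.floor(math.log2(min(lower_limits)))
    hi = math.ceil(math.log2(max(upper_limits)))
    return [2.0 ** k for k in range(lo, hi + 1)]


# 95% CI limits of the IPD and aggregate-data trials (main-effect analysis)
ipd_lower = [0.83, 0.77, 0.74, 0.54, 0.80, 0.40, 0.76]
ipd_upper = [1.53, 1.38, 1.24, 1.49, 1.62, 1.20, 1.76]
ad_lower = [1.07, 0.91, 1.06, 1.13]
ad_upper = [1.98, 2.96, 2.32, 1.83]

# Each part alone would get a narrower axis; stacking needs the union
print(log2_ticks(ipd_lower, ipd_upper))                        # [0.25, 0.5, 1.0, 2.0]
print(log2_ticks(ipd_lower + ad_lower, ipd_upper + ad_upper))  # [0.25, 0.5, 1.0, 2.0, 4.0]
```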

using_option syntax

using(filenamelist [if] [in] [, plot_options]) [using(filenamelist [if] [in] [, plot_options]) ...]

• filenamelist is a list of one or more Stata-format datasets
  • parts may be specified with [if] [in]
  • the same filename can appear more than once
  • the order of filenames determines placement in the graph
• Different plot_options may be specified for each using option
  • For the same options applied to multiple files, place them in a single filenamelist
  • For different options applied to each file, place each file in a different using option

plot_options syntax

• Based on the metan syntax: options refer to different parts of the forest plot
• Most options appropriate to the underlying twoway plot type are acceptable, with some exceptions

Option     Function                                     twoway plot type
boxopt     Weighted boxes for study point estimates     scatter [aweight]
pointopt   Points for study point estimates             scatter
ciopt      Lines for confidence intervals               rspike, hor; pcarrow
diamopt    Diamond for summary estimate                 pcspike (x4)
olineopt   Vertical line through summary estimate       rspike

Example forestplot dataset (“resultsset” from the last ipdmetan example)

• _use marks the row type (subgroup heading, study estimate, subgroup/overall estimate, text row) and _by the subgroup (1 = IPD, 2 = Aggregate); the dataset also contains _study
• Estimates, CIs and weights (_ES, _lci, _uci, _wgt) are on the log hazard ratio scale; the extra data columns are evrate, u_1_1_, V_1_1_ and _NN

_labels                                    _ES    _lci    _uci   _wgt  evrate  u_1_1_  V_1_1_   _NN
IPD
LCSG 773                                 0.116  -0.190   0.422  0.111    0.72    4.77    41.0
CAMS                                     0.024  -0.269   0.316  0.121    0.58    1.07    44.9
MRC LU11                                -0.042  -0.296   0.213  0.160    0.78   -2.48    59.4
SLOVENIA                                -0.164  -0.660   0.332  0.042    0.85   -2.56    15.6
GETCB 04CB86                             0.157  -0.192   0.506  0.085    0.68    4.95    31.6
ITALY                                   -0.341  -0.881   0.199  0.036    0.51   -4.50    13.2
KOREA                                    0.136  -0.278   0.550  0.061    0.81    3.06    22.4
Subtotal (I-squared = 0.0%, p = 0.710)   0.019  -0.111   0.149  0.615    0.69    3.24   229.6
Aggregate
belgium                                  0.376   0.069   0.682  0.110    0.83                     202
EORTC 08861                              0.496  -0.091   1.084  0.030    0.43                     105
LILLE                                    0.450   0.058   0.841  0.068    0.64                     163
GETCB 05CB86                             0.362   0.120   0.603  0.177    0.50                     539
Subtotal (I-squared = 0.0%, p = 0.964)   0.392   0.228   0.556  0.385                            1009
Heterogeneity between groups: p = 0.000
Overall (I-squared = 38.4%, p = 0.093)   0.162   0.061   0.264  1.000                            1009

“Stacking” of forest plots

• Imagine:
  • the dataset on the previous slide is saved as ipdtest.dta
  • we want the IPD boxes to be red, and the AD boxes to be green
• We proceed as follows:
  • Run forestplot with two using(...) options, one for each part of the plot, with the same filename
    • (Alternatively: run ipdmetan twice and save under different filenames)
  • Specify our desired plot_options as suboptions to using()

forestplot, using(ipdtest.dta if _by==1, boxopt(mcolor(red))) using(ipdtest.dta if _by==2, boxopt(mcolor(green))) lcols(evrate) rcols(u_1_1_ V_1_1_) nooverall nostats nowt

[Figure: stacked forest plot with the same columns and values as the advanced syntax example (trial reference number, Event rate, o-E(o), V(o); x-axis ticks at .25, .5, 1, 2, 4), with IPD boxes in red and aggregate-data boxes in green]

Summary and conclusion

• IPD is increasingly used, and its advantages are widely accepted
• Many meta-analysts use two-stage models to analyse IPD
• Currently only AD MA (e.g. metan) and one-stage IPD (e.g. xtmixed) commands exist in Stata
• ipdmetan is a general-purpose command for two-stage IPD MA
• forestplot is a flexible forest plot command
  • it does not carry out analysis itself, so is not restricted by it
  • it may be useful outside the MA context (e.g. presenting trial subgroups)

Further information

• Other related programs (all call forestplot by default):
  • admetan: calls ipdmetan to analyse AD (direct alternative to metan)
  • ipdover: fits a model within a series of subgroups
  • petometan: performs meta-analysis of time-to-event data using the Peto (log-rank) method
• SSC and Stata Journal article in near future

Thank you!

• Questions, requests and bug reports: by email to the author

• Thanks to:
  • Jayne Tierney, Patrick Royston
  • Ross Harris (author of metan) for advice and support
  • Assorted colleagues for testing
• Reference:
  • Fisher D. J. et al. 2011. Journal of Clinical Epidemiology 64: 949-967.