C3 graphs - Medical Research Council
Death and Missing Data in
Longitudinal Studies:
Quality of Life at the End of Life
Paula Diehr
Maximising return from cohort studies:
prevention of attrition and efficient analysis
London, June 25, 2006
Charge
“The use of imputation to deal with attrition in
cohort studies”
I will concentrate primarily on what to do about
death in longitudinal studies
In my cohorts of older or sicker adults more than
half the missing values are missing due to death
Taking care of the deaths first often helps deal with
the other missing data
My MO
First step: create a meaningful graph
Organize the data
Do something about the deaths
A place for every observation that could have been
made (if the person hadn’t died)
assign a valid value
Impute the (remaining) missing data
Graph
Analyze
Outline
ADHC example (very simple)
C3 example (more issues)
Death
Organization
Missing data
Analysis
Example 1: ADHC
Diehr and Johnson. Accounting for missing data in
end-of-life research.
J Palliat Med 2005; 8(Suppl 1):S50-S57.
Example: ADHC
Adult Day Health Care study
RCT (ADHC vs Usual Care)
939 Frail Veterans
At risk of nursing home placement
1 year study: data at 0, 6, 12 months
Findings: ADHC expensive, ineffective
Frail veterans didn’t fail
Why?
Health Variable
Utility (sort-of)
0 to 100
100 is perfect health
(0 is dead, but we will let dead be missing at first)
Raw Data (phf)
[Figure: mean phf with 95% CI at baseline, 6 and 12 months, by missing pattern. Complete case: N = 626, 626, 626; some-missing patterns: N = 279, 134, 24.]
Accounting
939 persons
3*939=2817 observations if complete
502 observations were missing
302 missing because of death
200 missing for other reasons
60% of missing were due to death
Deaths set to Zero (phf)
[Figure: mean phf with 95% CI at baseline, 6 and 12 months, by missing pattern. Complete case: N = 785, 785, 785; some-missing patterns: N = 120, 89, 53.]
Death=0 and Impute if 1 Known (phf)
[Figure: mean phf with 95% CI at baseline, 6 and 12 months. Complete case: N = 939, 939, 939; no missing pattern remains.]
In ADHC Example:
Complete case data too optimistic – significant
improvement (65% complete)
Available data even more optimistic
Accounting for the deaths showed significant
decline (84% complete)
Imputing remaining missing values showed
significant decline (100% complete) (ITT)
Example 2: C3 Study
Complementary Comfort Care
Bill Lafferty, P.I.
NCI
Study Design
RCT
Effect of massage or meditation on QOL and
Sx in patients at the end of life
QOL and Sx assessed ~ every week until death
In progress
3 years of data collection
First 100 cases (DSMB ok)
Outcome Variables
Quality of Life (QOL)
Symptoms (SX)
Health Rating (Hlthrat)
QOL (pqol)
How would you rate your overall quality of life
during the past 7 days?
0 is NO QUALITY OF LIFE
to
10 is PERFECT QUALITY OF LIFE
Note: if 0 had been “dead”, this would be a “preference-rated /
utility / rating scale” variable and dead would have the value
zero. Missed opportunity.
Health rating (Hlthrat)
0=worst possible health you can imagine and
still be alive
10 = as near perfect health as you can imagine
Baseline only
2. Death
Everyone is expected to die in C3.
Approaches to Handle Death
Ignore
Set death to a “low” value, perform sensitivity
analysis to see if final results change (arbitrary)
Impute the values after death as if the person were
still alive (immortal cohort)
Joint modeling of survival and health
Health conditional on being alive
Transformation approach
Transformation Approach
Transform the outcome variable that has no
value for death to another variable that does
have a natural value for death.
Dichotomize, assign deaths to “low” category.
Transform to a probability
Probability of being healthy
Dead have probability 0
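The dichotomize-and-assign-deaths-low idea fits in a few lines. A minimal sketch (not the study's code; the 0/100 coding and the treatment of QOL of 7 as "good" follow the transformation table shown later in the talk):

```python
def good_qol(qol, dead=False):
    """Dichotomize QOL (0-10): 100 if 'good' (7 or above), else 0.
    Deaths are assigned to the low category, so dead = 0."""
    if dead:
        return 0
    return 100 if qol >= 7 else 0
```

Because the dead get a valid value (0) rather than a missing one, the mean of this variable is interpretable with or without deaths in the sample.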
Probability Transformations
Probability (QOL > 7 now | QOL now)
Dichotomize (good QOL ≥ 7 or bad QOL < 7 now)
Probability (QOL > 7 next week | QOL now)
Probability (Hlthrat > 7 now | QOL now)
Diehr et al, J Clin Epidemiology, 2005
Four codings of QOL: QOL (original), QOL>7 now, P(QOL>7) next week, P(Hlthrat>7) now*

QOL, original coding: ordinal scale 10, 9, 8, ..., 1, 0, dead (a state worse than death?)
  Ordinal: OK if dead is worst QOL
  OK if nonparametric (ordinal) analysis -- without deaths? with deaths?
  Mean is meaningless; mean difference, change, or AUC is meaningless
QOL>7 now: dichotomize to good QOL yes/no, dead = 0
  QOL 10, 9, 8, 7 -> 100; QOL 6 down to 0 -> 0; dead -> 0
  OK if death is not good QOL (assumes death is bad QOL)
  Mean interpretable, any analysis OK
  AUC = weeks with good QOL; change meaningful
  Loses information? Bad cutpoint?
P(QOL>7 next week | QOL now): estimated from transition pairs

  QOL now:            10   9   8   7   6   5   4   3   2   1   0  dead
  QOL>7 now:         100 100 100 100   0   0   0   0   0   0   0     0
  P(QOL>7 next wk):   94  88  76  59  39  22  11   5   2   1  .5     0

  Dead have 0 probability of high QOL 1 week later
  Mean interpretable, any analysis OK
  AUC = # of good-QOL weeks starting 1 week after baseline; change, difference OK
  Assumes death is part of the QOL construct (dead people have bad QOL). Probably OK.
QOLt = P(Hlthrat>7 now | QOL now): dead have 0 probability of being healthy now

  QOL now:            10   9   8   7   6   5   4   3   2   1   0  dead
  QOL>7 now:         100 100 100 100   0   0   0   0   0   0   0     0
  P(QOL>7 next wk):   94  88  76  59  39  22  11   5   2   1  .5     0
  QOLt:               75  66  55  44  34  25  17  12   8   5   3     0

  Mean interpretable, any analysis OK
  AUC = healthy weeks starting at baseline; change, difference OK
  Assumes death is part of the health construct (dead people are not healthy) -- this seems obvious
  Dead vs. 0?
Transformation modifies relative spacing
  QOL, original coding: all distances are the same (10-9 = 1; 2-1 = 1)
  QOLt is different: the 10-to-9 step is 75-66 = 9, but the 2-to-1 step is 8-5 = 3
  Break between 6 and 7: 1 (QOL), 100 (dichotomized), 20 (next week), 10 (QOLt)
  Use QOLt for this analysis
Transform to prob(healthy)
“Healthy” = Hlthrat score of 7 or more
Logit(healthy0) = -3.323 + 0.442 * QOL0

QOL    = original coding
QOLt   = transformed to Prob(healthy)
QOLtd  = QOLt with deaths set to zero
QOLtdi = QOLtd with missing imputed
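The fitted logit equation can be applied directly. A minimal sketch that reproduces the QOLt column from the earlier transformation table (rounding to whole percentages):

```python
import math

def qolt(qol):
    """QOLt: Prob(healthy now) * 100, i.e. P(Hlthrat >= 7 | QOL now),
    from the logistic fit logit(healthy) = -3.323 + 0.442 * QOL."""
    logit = -3.323 + 0.442 * qol
    return 100 / (1 + math.exp(-logit))

# QOLtd coding: deaths are then set to zero, since a dead person
# has zero probability of being healthy now.
```

Rounded, this gives 75, 66, 55, 44, 34, 25, 17, 12, 8, 5, 3 for QOL of 10 down to 0, matching the table.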
SX
Memorial Symptom Assessment Scale (MSAS)
In the past week did you have:
Difficulty concentrating, Pain, Lack of energy, Cough,
Changes in skin, Dry mouth, Nausea, Feeling drowsy,
Numbness/tingling in hands and feet, Difficulty
sleeping, Feeling bloated, Problems with urination,
Vomiting, Shortness of breath, Diarrhea, sweats, mouth
sores, problems with sexual interest, itching, lack of
appetite, dizziness, difficulty swallowing, change in the
way food tastes, weight loss, hair loss, constipation,
swelling of arms or legs, “I don’t look like myself ”,
other (!)
Feeling sad, worrying, feeling irritable, feeling nervous
Sx Scoring (MSAS)
First 22 (physical), distress-weighted:
  0 did not occur; 0.8 occurred but did not bother me at all;
  1.6 a little bit; 2.4 somewhat; 3.2 a lot; 4.0 bothered me very much
Last 4 (psychological), frequency:
  0 did not occur; 1 occurred rarely; 2 occasionally; 3 frequently; 4 almost constantly
Total score is the average item value (high is bad, 4 is max)
“Continuous”, low value is good
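The scoring can be sketched as below; a sketch only, using the standard MSAS distress weights (0, 0.8, 1.6, 2.4, 3.2, 4.0) for the physical items and 0-4 frequency for the four psychological items. The response labels are illustrative, not the exact questionnaire wording:

```python
# Distress weighting for the 22 physical items (standard MSAS weights)
PHYSICAL_SCORES = {"did not occur": 0, "not at all": 0.8, "a little bit": 1.6,
                   "somewhat": 2.4, "a lot": 3.2, "very much": 4.0}
# Frequency scoring for the 4 psychological items
PSYCH_SCORES = {"did not occur": 0, "rarely": 1, "occasionally": 2,
                "frequently": 3, "almost constantly": 4}

def msas_total(physical_responses, psych_responses):
    """Total MSAS score: average of all item scores (0-4; high is bad)."""
    scores = ([PHYSICAL_SCORES[r] for r in physical_responses] +
              [PSYCH_SCORES[r] for r in psych_responses])
    return sum(scores) / len(scores)
```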
SX -> SXt (selected values): **SXt = P(Hlthrat>7 | SX)

  SX:    .03  .25   .5    1  1.5    2  2.5  dead
  SXt:    83   75   66   43   22   10    3     0

Transformation can be done for continuous variables
3. Organization
Longitudinal Data-- Ideal
Rectangular File
Spreadsheet
A QOL value in every cell
ADHC
939 rows (1 row for each person)
3 columns (0, 6, 12 months)
C3
300 rows (1 row for each person)
3*52 = 156 columns, (1 column for each week)
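Building such a rectangular file from long-format records is mechanical. A minimal sketch, assuming a hypothetical (person_id, week, value) record layout rather than the study's actual file format:

```python
def rectangularize(records, n_weeks):
    """Build a person x week grid from (person_id, week, value) records.
    Every cell that could have been observed exists; None marks missing."""
    grid = {}
    for pid, week, value in records:
        row = grid.setdefault(pid, [None] * n_weeks)
        row[week] = value
    return grid
```

The point of the grid is exactly the "place for every observation that could have been made": missingness becomes visible as None cells rather than absent rows.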
ADHC was not ideal
We set dead to zero
We imputed the missing values
Complete 3 x 939 array
C3 not ideal
Deaths
Missing data
Unscheduled weeks
Recruited over time
persons will have unequal number of weeks
Each person has a different schedule
When did the missing interviews “not happen”?
Tidy Dataset
Person’s potential f/u = weeks from enrollment
to end of data collection
Bin (cell, column) for each week of potential f/u
First enrollee will have 52*3 bins
Enrollee 2.5 years later will have 52/2=26 bins
Deaths: Set value in bins from death to the end
of this person’s potential follow-up to zero
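Filling the after-death bins with zeroes is then a one-liner per row. A sketch, assuming each person's row spans their own potential follow-up and None marks a missed interview:

```python
def apply_death(row, death_week=None):
    """QOLtd coding: from the week of death through the end of this
    person's potential follow-up, the value is 0 (not missing)."""
    if death_week is None:          # alive at end of data collection
        return list(row)
    return [0 if w >= death_week else v for w, v in enumerate(row)]
```

Note that pre-death missed interviews remain None; only the post-death bins get the valid value 0.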
Person 34
50-year old man
Referred from Hospice
Dying of cancer, frequent severe pain
QOLbase = 10
SXbase = .75
Lived 135 days (19 weeks)
Potential f/u 463 days (66 weeks)
(from his enrollment to end of data collection)
328 days dead (47 weeks)
Person 34 QOL (original coding)
[Figure: QOL, original 0-10 coding, plotted against days after enrollment (0-500).]
Person 34 QOLt (transformed)
[Figure: QOLt (0-100 probability-of-healthy scale) plotted against days after enrollment (0-500).]
Person 34 QOLtd (set dead to zero)
[Figure: QOLt and QOLtd overlaid against days after enrollment (0-500); QOLtd is zero from death to the end of potential follow-up.]
4. Missing Data and Imputation
Influence of the deaths
Complete case analysis gives no weight to deaths
Transforming and setting deaths to 0 may give
too much weight to deaths, because after death a
person has no missing data
May need to impute other missing data as well
Can remove later as sensitivity analysis
Only during potential follow-up
Missing
All methods are based on untestable assumptions
Multiple imputation for cross-sectional missing data: software available
Longitudinal: jury’s still out, and no software
C3 data surely not MAR
(unless accounting for death makes them MAR?)
Gain some intuition
CHS subjects who return from being missing
Pattern: Y0 Y1 _ _ (Y4) _ Y6 Y7
Y4 is “like” a missing value
  10 times as likely to be missing as Y1 or Y7
  This person had other missing data
  Like a healthier subset of the missing?
Impute Y4 in various simple ways
Compare observed to imputed value of Y4
Engels and Diehr. Journal of Clinical Epidemiology 2003; 56:968-976.
Findings
Most imputed values were biased too healthy
Most imputed values were under-dispersed
Best were: (before+after)/2, LOCF, NOCB,
regression on baseline data
Best were: NOCB, LOCF
Conclusion: use the person’s own longitudinal
data to impute missing data
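The simple longitudinal methods compared there are easy to state. A sketch of LOCF, NOCB, and the (before+after)/2 rule on one person's row (None = missing; the fallback to a one-sided fill at the edges is my assumption, not necessarily the paper's):

```python
def locf(row):
    """Last observation carried forward (leading gaps stay None)."""
    out, last = [], None
    for v in row:
        if v is not None:
            last = v
        out.append(last)
    return out

def nocb(row):
    """Next observation carried backward (trailing gaps stay None)."""
    return list(reversed(locf(list(reversed(row)))))

def before_after_mean(row):
    """(before + after)/2 using the nearest observed values on each side;
    falls back to a one-sided carry at the edges."""
    f, b = locf(row), nocb(row)
    return [v if v is not None
            else (f[i] + b[i]) / 2 if f[i] is not None and b[i] is not None
            else f[i] if f[i] is not None else b[i]
            for i, v in enumerate(row)]
```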
Imputation of Missing
Everyone has a favorite method
I prefer imputation by a simple method,
using the person’s own longitudinal data
Knowing the person died helps
Scatterplot of QOLtd by several f(time)
for each person who died
Log of “time until death” looked the best
for all subjects.
Person 34, QOLtd by log(days from death)
[Figure: QOLt and QOLtd plotted against "ln(400 - # of days u..." (x axis 5.5 to 6.0).]
Imputation of Missing Data
(weeks with no entry)
Separate regression for each person.
Set QOLtdi = a + b* ln(days before death) if
QOLtd is missing
Other approaches
Modeling
Multiple imputation
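The per-person regressions can be sketched with ordinary least squares. A sketch only (not the study's SPSS code); here time is in weeks, the predictor is ln(weeks before death), and weeks at or after death stay at the QOLtd value of zero:

```python
import math

def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x for one person's observed weeks."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys)) /
         sum((x - mx) ** 2 for x in xs))
    return my - b * mx, b          # (intercept a, slope b)

def impute_qoltdi(row, death_week):
    """QOLtdi: fill missing pre-death weeks with a + b*ln(weeks before
    death), fitted to this person's own observed values."""
    obs = [(math.log(death_week - w), v)
           for w, v in enumerate(row) if v is not None and w < death_week]
    a, b = fit_line([x for x, _ in obs], [y for _, y in obs])
    return [v if v is not None
            else (0 if w >= death_week else a + b * math.log(death_week - w))
            for w, v in enumerate(row)]
```

This is the "use the person's own longitudinal data" principle from the previous slide, with time-until-death as the predictor.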
Person 34 QOLtdi (impute missing)
[Figure: QOLt, QOLtd, and QOLtdi overlaid against days after enrollment (0-500).]
Different N
Interpretation
Person 34, SXtd by log(days from death)
[Figure: SXt and SXtd plotted against "ln(400 - # of days u..." (x axis 5.5 to 6.0).]
Person 34 SX, deaths and missing
[Figure: SXt, SXtd, and SXtdi against days after enrollment (0-500). Slide notes: pain; MI, LOCF, missing = “5”.]
Average QOLtdi and SXtdi in the first 6 months
(estimated % healthy, conditional on either QOL or SX)
[Figure: mean QOLtdi and mean SXtdi by week, weeks 0-25.]
Standardized at baseline
QOL < SX
AUC (to date): 7.8 wk vs 9.9 wk, t = 3.8
5. Analysis
Possible Outcome Variables
QOL, QOLt
QOLtd
  For graphs, population means
QOLtdi | alive
  For analytic methods that (implicitly) impute missing (GEE, AUC, growth curve, multi-level)
QOLtdi
  If death, missing rates low (or MCAR); imputed values improve estimates
f⁻¹(QOLtdi)
  Original scale; death is its own category
Survival Function
[Figure: survival function, survival in days (as of 2-15-2006), 0-600, with censored cases marked.]
Healthy volunteer effect
THE Graph
Inverse QOLtdi in First 6 Months
[Figure: weekly counts (weeks 0-25) of persons with back-transformed QOLtdi of 7-10, 3-6, 0-2, or dead. N = 84 with at least 26 weeks potential f/u.]
At least 26 weeks potential f/u; back-transform to original coding (QOL)
Accounts for death and imputed values; Hospice vs Other?; ordinal analysis
Hospice effect on QOLtdi (n=84)
[Figure: average QOLtdi per week, weeks 0-25, Hospice Referral vs Other.]
Similar baseline
AUC = weeks of healthy life
QOL AUC = WHL | QOL
Regression of QOLtdi on Time
[Figure: average QOLtdi in Hospice Referral vs Other, with fitted regression lines, weeks 0-30.]
QOLtdi | Alive
[Figure: average QOLtdi per week (alive only), weeks 0-25, Hospice Referral vs Other.]
Different folks each time
Immortal cohort
6. Discussion
Transformations/Death
Imputation
Tidy dataset
Transformation:
Dichotomizing and QOLtd are the only measures that
combine death and QOL (utility, preferences)
Transformation is not appropriate for every variable.
Death should be part of the construct.
Dichotomizing, OK to put death in “low” category
Death is bad health (Hlthrat )
Death is probably bad QOL
May we think of death as bad SX?
Unclear. Maybe death cures SX. (itching)
Does using Pr(Hlthrat > 7 | SX) get around this
problem? Only need to assume that the dead are not healthy.
Multiple Imputation
vs. sensitivity analysis
with AUC
Person 34 SX, multiple imputation?
[Figure: SXtd and SXtdi against days after enrollment (0-500).]
Person 34 SX: AUC by trapezoidal rule
[Figure: SXtd and SXtdi against days after enrollment (0-500).]
Is the trapezoidal rule imputation?
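The trapezoidal AUC itself is one line. A sketch over equally spaced weekly values; on the 0-100 probability-of-healthy scale, AUC/100 estimates weeks of healthy life over the window:

```python
def auc_trapezoid(values, dt=1.0):
    """Area under a curve sampled at equal spacing dt (trapezoidal rule).
    For weekly QOLtd values on a 0-100 scale, auc/100 = healthy weeks."""
    return sum(dt * (a + b) / 2 for a, b in zip(values, values[1:]))
```

Connecting the points across a gap with a straight line is essentially a (before+after)/2-style interpolation, which is why one can ask whether the trapezoidal rule is itself a form of imputation.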
To create a tidy dataset
Bin the data in equal-time bins (1 week), 1 bin for each
potential week of f/u
Transform QOL to a new 0 to 100 scale where dead = 0 (QOLt)
Fill in zeroes for the potential weeks when the person was
dead (QOLtd)
Impute the missing data for potential weeks when the
person was alive but data were missing (QOLtdi)
BTDI --- Be Tidy!
Tidy Dataset
Necessary to place the imputed and after-death interviews
Makes it clear what is known when, as everyone
has a value at each potential time
Specifically deals with death and missing data, so
assumptions are clear
“Virtual” tidy dataset may be enough in simpler
datasets
Death Matters
Be Tidy