C3 graphs - Medical Research Council
Death and Missing Data in
Longitudinal Studies:
Quality of Life at the End of Life
Paula Diehr
Maximising return from cohort studies:
prevention of attrition and efficient analysis
London, June 25, 2006
Charge
“The use of imputation to deal with attrition in
cohort studies”
I will concentrate primarily on what to do about
death in longitudinal studies
In my cohorts of older or sicker adults more than
half the missing values are missing due to death
Taking care of the deaths first often helps deal with
the other missing data
My MO
First step: create a meaningful graph
Organize the data
Do something about the deaths
A place for every observation that could have been
made (if the person hadn’t died)
assign a valid value
Impute the (remaining) missing data
Graph
Analyze
Outline
ADHC example (very simple)
C3 example (more issues)
Death
Organization
Missing data
Analysis
Example 1: ADHC
Diehr and Johnson. Accounting for missing data in
end-of-life research.
J Palliat Med 2005; 8(Suppl 1):S50-S57.
Example: ADHC
Adult Day Health Care study
RCT (ADHC vs Usual Care)
939 Frail Veterans
At risk of nursing home placement
1 year study: data at 0, 6, 12 months
Findings: ADHC expensive, ineffective
Frail veterans didn’t fail
Why?
Health Variable
Utility (sort-of)
0 to 100
100 is perfect health
(0 is dead, but we will let dead be missing at first)
Raw Data (phf)
[Figure: mean phf with 95% CI at baseline, 6 and 12 months, by missing pattern. Complete case: N = 626, 626, 626; some-missing patterns: N = 279, 134, 24.]
Accounting
939 persons
3*939=2817 observations if complete
502 observations were missing
302 missing because of death
200 missing for other reasons
60% of missing were due to death
Deaths set to Zero (phf)
[Figure: mean phf with 95% CI at baseline, 6 and 12 months, by missing pattern. Complete case: N = 785, 785, 785; some-missing patterns: N = 120, 89, 53.]
Death=0 and Impute if 1 Known (phf)
[Figure: mean phf with 95% CI at baseline, 6 and 12 months. Complete case: N = 939, 939, 939; no missing pattern remains.]
In ADHC Example:
Complete case data too optimistic – significant
improvement (65% complete)
Available data even more optimistic
Accounting for the deaths showed significant
decline (84% complete)
Imputing remaining missing values showed
significant decline (100% complete) (ITT)
Example 2: C3 Study
Complementary Comfort Care
Bill Lafferty, P.I.
NCI
Study Design
RCT
Effect of massage or meditation on QOL and
Sx in patients at the end of life
QOL and Sx assessed ~ every week until death
In progress
3 years of data collection
First 100 cases (DSMB ok)
Outcome Variables
Quality of Life (QOL)
Symptoms (SX)
Health Rating (Hlthrat)
QOL (pqol)
How would you rate your overall quality of life
during the past 7 days?
0 is NO QUALITY OF LIFE
to
10 is PERFECT QUALITY OF LIFE
Note: if 0 had been “dead”, this would be a “preference-rated /
utility / rating scale” variable and dead would have the value
zero. Missed opportunity.
Health rating (Hlthrat)
0=worst possible health you can imagine and
still be alive
10 = as near perfect health as you can imagine
Baseline only
2. Death
Everyone is expected to die in C3.
Approaches to Handle Death
Ignore
Set death to a “low” value, perform sensitivity
analysis to see if final results change (arbitrary)
Impute the values after death as if the person were
still alive (immortal cohort)
Joint modeling of survival and health
Health conditional on being alive
Transformation approach
Transformation Approach
Transform the outcome variable that has no
value for death to another variable that does
have a natural value for death.
Dichotomize, assign deaths to “low” category.
Transform to a probability
Probability of being healthy
Dead have probability 0
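The dichotomize-and-assign-deaths-low idea fits in a few lines. A minimal sketch (not the study's code; the 0/100 coding and the treatment of QOL of 7 as "good" follow the transformation table shown later in the talk):

```python
def good_qol(qol, dead=False):
    """Dichotomize QOL (0-10): 100 if 'good' (7 or above), else 0.
    Deaths are assigned to the low category, so dead = 0."""
    if dead:
        return 0
    return 100 if qol >= 7 else 0
```

Because the dead get a valid value (0) rather than a missing one, the mean of this variable is interpretable with or without deaths in the sample.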
Probability Transformations
Probability (QOL > 7 now | QOL now)
Dichotomize (good QOL ≥ 7 or bad QOL < 7 now)
Probability (QOL > 7 next week | QOL now)
Probability (Hlthrat > 7 now | QOL now)
Diehr et al, J Clin Epidemiology, 2005
Four codings of QOL: QOL (original), QOL>7 now, P(QOL>7) next week, P(Hlthrat>7) now*

QOL, original coding: ordinal scale 10, 9, 8, ..., 1, 0, dead (a state worse than death?)
  Ordinal: OK if dead is worst QOL
  OK if nonparametric (ordinal) analysis -- without deaths? with deaths?
  Mean is meaningless; mean difference, change, or AUC is meaningless
QOL>7 now: dichotomize to good QOL yes/no, dead = 0
  QOL 10, 9, 8, 7 -> 100; QOL 6 down to 0 -> 0; dead -> 0
  OK if death is not good QOL (assumes death is bad QOL)
  Mean interpretable, any analysis OK
  AUC = weeks with good QOL; change meaningful
  Loses information? Bad cutpoint?
P(QOL>7 next week | QOL now): estimated from transition pairs

  QOL now:            10   9   8   7   6   5   4   3   2   1   0  dead
  QOL>7 now:         100 100 100 100   0   0   0   0   0   0   0     0
  P(QOL>7 next wk):   94  88  76  59  39  22  11   5   2   1  .5     0

  Dead have 0 probability of high QOL 1 week later
  Mean interpretable, any analysis OK
  AUC = # of good-QOL weeks starting 1 week after baseline; change, difference OK
  Assumes death is part of the QOL construct (dead people have bad QOL). Probably OK.
QOLt = P(Hlthrat>7 now | QOL now): dead have 0 probability of being healthy now

  QOL now:            10   9   8   7   6   5   4   3   2   1   0  dead
  QOL>7 now:         100 100 100 100   0   0   0   0   0   0   0     0
  P(QOL>7 next wk):   94  88  76  59  39  22  11   5   2   1  .5     0
  QOLt:               75  66  55  44  34  25  17  12   8   5   3     0

  Mean interpretable, any analysis OK
  AUC = healthy weeks starting at baseline; change, difference OK
  Assumes death is part of the health construct (dead people are not healthy) -- this seems obvious
  Dead vs. 0?
Transformation modifies relative spacing
  QOL, original coding: all distances are the same (10-9 = 1; 2-1 = 1)
  QOLt is different: the 10-to-9 step is 75-66 = 9, but the 2-to-1 step is 8-5 = 3
  Break between 6 and 7: 1 (QOL), 100 (dichotomized), 20 (next week), 10 (QOLt)
  Use QOLt for this analysis
Transform to prob(healthy)
“Healthy” = Hlthrat score of 7 or more
Logit(healthy0) = -3.323 + 0.442 * QOL0

QOL    = original coding
QOLt   = transformed to Prob(healthy)
QOLtd  = QOLt with deaths set to zero
QOLtdi = QOLtd with missing imputed
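The fitted logit equation can be applied directly. A minimal sketch that reproduces the QOLt column from the earlier transformation table (rounding to whole percentages):

```python
import math

def qolt(qol):
    """QOLt: Prob(healthy now) * 100, i.e. P(Hlthrat >= 7 | QOL now),
    from the logistic fit logit(healthy) = -3.323 + 0.442 * QOL."""
    logit = -3.323 + 0.442 * qol
    return 100 / (1 + math.exp(-logit))

# QOLtd coding: deaths are then set to zero, since a dead person
# has zero probability of being healthy now.
```

Rounded, this gives 75, 66, 55, 44, 34, 25, 17, 12, 8, 5, 3 for QOL of 10 down to 0, matching the table.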
SX
Memorial Symptom Assessment Scale (MSAS)
In the past week did you have:
Difficulty concentrating, Pain, Lack of energy, Cough,
Changes in skin, Dry mouth, Nausea, Feeling drowsy,
Numbness/tingling in hands and feet, Difficulty
sleeping, Feeling bloated, Problems with urination,
Vomiting, Shortness of breath, Diarrhea, sweats, mouth
sores, problems with sexual interest, itching, lack of
appetite, dizziness, difficulty swallowing, change in the
way food tastes, weight loss, hair loss, constipation,
swelling of arms or legs, “I don’t look like myself ”,
other (!)
Feeling sad, worrying, feeling irritable, feeling nervous
Sx Scoring (MSAS)
First 22 (physical), distress-weighted:
  0 did not occur; 0.8 occurred but did not bother me at all;
  1.6 a little bit; 2.4 somewhat; 3.2 a lot; 4.0 bothered me very much
Last 4 (psychological), frequency:
  0 did not occur; 1 occurred rarely; 2 occasionally; 3 frequently; 4 almost constantly
Total score is the average item value (high is bad, 4 is max)
“Continuous”, low value is good
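The scoring can be sketched as below; a sketch only, using the standard MSAS distress weights (0, 0.8, 1.6, 2.4, 3.2, 4.0) for the physical items and 0-4 frequency for the four psychological items. The response labels are illustrative, not the exact questionnaire wording:

```python
# Distress weighting for the 22 physical items (standard MSAS weights)
PHYSICAL_SCORES = {"did not occur": 0, "not at all": 0.8, "a little bit": 1.6,
                   "somewhat": 2.4, "a lot": 3.2, "very much": 4.0}
# Frequency scoring for the 4 psychological items
PSYCH_SCORES = {"did not occur": 0, "rarely": 1, "occasionally": 2,
                "frequently": 3, "almost constantly": 4}

def msas_total(physical_responses, psych_responses):
    """Total MSAS score: average of all item scores (0-4; high is bad)."""
    scores = ([PHYSICAL_SCORES[r] for r in physical_responses] +
              [PSYCH_SCORES[r] for r in psych_responses])
    return sum(scores) / len(scores)
```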
SX -> SXt (selected values): **SXt = P(Hlthrat>7 | SX)

  SX:    .03  .25   .5    1  1.5    2  2.5  dead
  SXt:    83   75   66   43   22   10    3     0

Transformation can be done for continuous variables
3. Organization
Longitudinal Data-- Ideal
Rectangular File
Spreadsheet
A QOL value in every cell
ADHC
939 rows (1 row for each person)
3 columns (0, 6, 12 months)
C3
300 rows (1 row for each person)
3*52 = 156 columns, (1 column for each week)
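Building such a rectangular file from long-format records is mechanical. A minimal sketch, assuming a hypothetical (person_id, week, value) record layout rather than the study's actual file format:

```python
def rectangularize(records, n_weeks):
    """Build a person x week grid from (person_id, week, value) records.
    Every cell that could have been observed exists; None marks missing."""
    grid = {}
    for pid, week, value in records:
        row = grid.setdefault(pid, [None] * n_weeks)
        row[week] = value
    return grid
```

The point of the grid is exactly the "place for every observation that could have been made": missingness becomes visible as None cells rather than absent rows.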
ADHC was not ideal
We set dead to zero
We imputed the missing values
Complete 3 x 939 array
C3 not ideal
Deaths
Missing data
Unscheduled weeks
Recruited over time
persons will have unequal number of weeks
Each person has a different schedule
When did the missing interviews “not happen”?
Tidy Dataset
Person’s potential f/u = weeks from enrollment
to end of data collection
Bin (cell, column) for each week of potential f/u
First enrollee will have 52*3 bins
Enrollee 2.5 years later will have 52/2=26 bins
Deaths: Set value in bins from death to the end
of this person’s potential follow-up to zero
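Filling the after-death bins with zeroes is then a one-liner per row. A sketch, assuming each person's row spans their own potential follow-up and None marks a missed interview:

```python
def apply_death(row, death_week=None):
    """QOLtd coding: from the week of death through the end of this
    person's potential follow-up, the value is 0 (not missing)."""
    if death_week is None:          # alive at end of data collection
        return list(row)
    return [0 if w >= death_week else v for w, v in enumerate(row)]
```

Note that pre-death missed interviews remain None; only the post-death bins get the valid value 0.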
Person 34
50-year old man
Referred from Hospice
Dying of cancer, frequent severe pain
QOLbase = 10
SXbase = .75
Lived 135 days (19 weeks)
Potential f/u 463 days (66 weeks)
(from his enrollment to end of data collection)
328 days dead (47 weeks)
Person 34 QOL (original coding)
[Figure: QOL, original 0-10 coding, plotted against days after enrollment (0-500).]
Person 34 QOLt (transformed)
[Figure: QOLt (0-100 probability-of-healthy scale) plotted against days after enrollment (0-500).]
Person 34 QOLtd (set dead to zero)
[Figure: QOLt and QOLtd overlaid against days after enrollment (0-500); QOLtd is zero from death to the end of potential follow-up.]
4. Missing Data and Imputation
Influence of the deaths
Complete case analysis gives no weight to deaths
Transforming and setting deaths to 0 may give
too much weight to deaths, because after death a
person has no missing data
May need to impute other missing data as well
Can remove later as sensitivity analysis
Only during potential follow-up
Missing
All methods are based on untestable assumptions
Multiple imputation for cross-sectional missing data: software available
Longitudinal: jury’s still out, and no software
C3 data surely not MAR
(unless accounting for death makes them MAR?)
Gain some intuition
CHS subjects who return from being missing
Pattern: Y0 Y1 _ _ (Y4) _ Y6 Y7
Y4 is “like” a missing value
  10 times as likely to be missing as Y1 or Y7
  This person had other missing data
  Like a healthier subset of the missing?
Impute Y4 in various simple ways
Compare observed to imputed value of Y4
Engels and Diehr. Journal of Clinical Epidemiology 2003; 56:968-976.
Findings
Most imputed values were biased too healthy
Most imputed values were under-dispersed
Best were: (before+after)/2, LOCF, NOCB,
regression on baseline data
Best were: NOCB, LOCF
Conclusion: use the person’s own longitudinal
data to impute missing data
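The simple longitudinal methods compared there are easy to state. A sketch of LOCF, NOCB, and the (before+after)/2 rule on one person's row (None = missing; the fallback to a one-sided fill at the edges is my assumption, not necessarily the paper's):

```python
def locf(row):
    """Last observation carried forward (leading gaps stay None)."""
    out, last = [], None
    for v in row:
        if v is not None:
            last = v
        out.append(last)
    return out

def nocb(row):
    """Next observation carried backward (trailing gaps stay None)."""
    return list(reversed(locf(list(reversed(row)))))

def before_after_mean(row):
    """(before + after)/2 using the nearest observed values on each side;
    falls back to a one-sided carry at the edges."""
    f, b = locf(row), nocb(row)
    return [v if v is not None
            else (f[i] + b[i]) / 2 if f[i] is not None and b[i] is not None
            else f[i] if f[i] is not None else b[i]
            for i, v in enumerate(row)]
```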
Imputation of Missing
Everyone has a favorite method
I prefer imputation by a simple method,
using the person’s own longitudinal data
Knowing the person died helps
Scatterplot of QOLtd by several f(time)
for each person who died
Log of “time until death” looked the best
for all subjects.
Person 34, QOLtd by log(days from death)
[Figure: QOLt and QOLtd plotted against "ln(400 - # of days u..." (x axis 5.5 to 6.0).]
Imputation of Missing Data
(weeks with no entry)
Separate regression for each person.
Set QOLtdi = a + b* ln(days before death) if
QOLtd is missing
Other approaches
Modeling
Multiple imputation
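The per-person regressions can be sketched with ordinary least squares. A sketch only (not the study's SPSS code); here time is in weeks, the predictor is ln(weeks before death), and weeks at or after death stay at the QOLtd value of zero:

```python
import math

def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x for one person's observed weeks."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys)) /
         sum((x - mx) ** 2 for x in xs))
    return my - b * mx, b          # (intercept a, slope b)

def impute_qoltdi(row, death_week):
    """QOLtdi: fill missing pre-death weeks with a + b*ln(weeks before
    death), fitted to this person's own observed values."""
    obs = [(math.log(death_week - w), v)
           for w, v in enumerate(row) if v is not None and w < death_week]
    a, b = fit_line([x for x, _ in obs], [y for _, y in obs])
    return [v if v is not None
            else (0 if w >= death_week else a + b * math.log(death_week - w))
            for w, v in enumerate(row)]
```

This is the "use the person's own longitudinal data" principle from the previous slide, with time-until-death as the predictor.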
Person 34 QOLtdi (impute missing)
[Figure: QOLt, QOLtd, and QOLtdi overlaid against days after enrollment (0-500).]
Different N
Interpretation
Person 34, SXtd by log(days from death)
[Figure: SXt and SXtd plotted against "ln(400 - # of days u..." (x axis 5.5 to 6.0).]
Person 34 SX, deaths and missing
[Figure: SXt, SXtd, and SXtdi against days after enrollment (0-500). Slide notes: pain; MI, LOCF, missing = “5”.]
Average QOLtdi and SXtdi in the first 6 months
(estimated % healthy, conditional on either QOL or SX)
[Figure: mean QOLtdi and mean SXtdi by week, weeks 0-25.]
Standardized at baseline
QOL < SX
AUC (to date): 7.8 wk vs 9.9 wk, t = 3.8
5. Analysis
Possible Outcome Variables
QOL, QOLt
QOLtd
  For graphs, population means
QOLtdi | alive
  For analytic methods that (implicitly) impute missing (GEE, AUC, growth curve, multi-level)
QOLtdi
  If death, missing rates low (or MCAR); imputed values improve estimates
f⁻¹(QOLtdi)
  Original scale; death is its own category
Survival Function
[Figure: survival function, survival in days (as of 2-15-2006), 0-600, with censored cases marked.]
Healthy volunteer effect
THE Graph
Inverse QOLtdi in First 6 Months
[Figure: weekly counts (weeks 0-25) of persons with back-transformed QOLtdi of 7-10, 3-6, 0-2, or dead. N = 84 with at least 26 weeks potential f/u.]
At least 26 weeks potential f/u; back-transform to original coding (QOL)
Accounts for death and imputed values; Hospice vs Other?; ordinal analysis
Hospice effect on QOLtdi (n=84)
[Figure: average QOLtdi per week, weeks 0-25, Hospice Referral vs Other.]
Similar baseline
AUC = weeks of healthy life
QOL AUC = WHL | QOL
Regression of QOLtdi on Time
[Figure: average QOLtdi in Hospice Referral vs Other, with fitted regression lines, weeks 0-30.]
QOLtdi | Alive
[Figure: average QOLtdi per week (alive only), weeks 0-25, Hospice Referral vs Other.]
Different folks each time
Immortal cohort
6. Discussion
Transformations/Death
Imputation
Tidy dataset
Transformation:
Dichotomizing and QOLtd are the only measures that
combine death and QOL (utility, preferences)
Transformation is not appropriate for every variable.
Death should be part of the construct.
Dichotomizing, OK to put death in “low” category
Death is bad health (Hlthrat )
Death is probably bad QOL
May we think of death as bad SX?
Unclear. Maybe death cures SX. (itching)
Does using Pr(Hlthrat > 7 | SX) get around this
problem? Only need to assume that the dead are not healthy.
Multiple Imputation
vs. sensitivity analysis
with AUC
Person 34 SX, multiple imputation?
[Figure: SXtd and SXtdi against days after enrollment (0-500).]
Person 34 SX: AUC by trapezoidal rule
[Figure: SXtd and SXtdi against days after enrollment (0-500).]
Is the trapezoidal rule imputation?
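The trapezoidal AUC itself is one line. A sketch over equally spaced weekly values; on the 0-100 probability-of-healthy scale, AUC/100 estimates weeks of healthy life over the window:

```python
def auc_trapezoid(values, dt=1.0):
    """Area under a curve sampled at equal spacing dt (trapezoidal rule).
    For weekly QOLtd values on a 0-100 scale, auc/100 = healthy weeks."""
    return sum(dt * (a + b) / 2 for a, b in zip(values, values[1:]))
```

Connecting the points across a gap with a straight line is essentially a (before+after)/2-style interpolation, which is why one can ask whether the trapezoidal rule is itself a form of imputation.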
To create a tidy dataset
Bin the data in equal-time bins (1 week), 1 bin for each
potential week of f/u
Transform QOL to a new 0 to 100 scale where dead = 0 (QOLt)
Fill in zeroes for the potential weeks when the person was
dead (QOLtd)
Impute the missing data for potential weeks when the
person was alive but data were missing (QOLtdi)
BTDI --- Be Tidy!
Tidy Dataset
Necessary to place the imputed and after-death interviews
Makes it clear what is known when, as everyone
has a value at each potential time
Specifically deals with death and missing data, so
assumptions are clear
“Virtual” tidy dataset may be enough in simpler
datasets
Death Matters
Be Tidy