Clinical Trials for Quality of Life Endpoints in Oncology Jeff A. Sloan, Ph.D. Mayo Clinic, Rochester, MN, USA Oncology Education Session Rochester, November 1, 2005
Clinical Trials for Quality of Life Endpoints in Oncology
Jeff A. Sloan, Ph.D.
Mayo Clinic, Rochester, MN, USA
Oncology Education Session Rochester, November 1, 2005
QOL challenges
• Reliability: if I were to use this tool under the same conditions would I get the same results?
• Validity: am I measuring what I want to measure?
• Missing data: imputation, design considerations
Take home message: there is good news
• There are problems with using QOL assessments as indicators of efficacy in clinical trials.
• There are scientifically sound solutions to these problems. The problems have been disseminated widely and consistently. The solutions have not.
Checklist for designing, conducting and reporting HRQL PRO in clinical trials
Patient Reported Outcomes (PRO) and Regulatory Issues: A European Guidance Document for the improved integration of health-related quality of life assessment in the drug regulatory process. Chassany O et ERIQA Working Group. Drug Information Journal 2002.
HRQL / PRO objectives
• Added value of HRQL / PRO
• Choice of the questionnaires
• Hypotheses of HRQL / PRO changes
Study design
• Basic principles of RCT fulfilled?
• Timing and frequency of assessment
• Mode and site of administration...
HRQL / PRO measure
• Description of the measure (items, domains…)
• Evidence of validity
• Evidence of cultural adaptation
Statistical analysis plan
• Primary or secondary endpoint
• Superiority or equivalence trial
• Sample size
• ITT, type I error, missing data
Reporting of results
• Participation rate, data completeness
• Distribution of HRQL / PRO scores
Interpreting the results
• Effect size
• Minimal Clinically Important Difference
• Comparison with other criteria / scores
• Number needed to treat…
EMEA RECOMMENDATIONS
Points to consider (CPMP/EWP/562/98) on clinical investigation of medicinal products in the chronic treatment of patients with COPD, 1999
• In the major efficacy studies of symptomatic benefit the primary endpoint should reflect the clinical benefit the applicant wishes to claim in the future SPC (Summary of Product Characteristics)
• It should include the FEV1 as a measure of lung function and include a measure of symptomatic benefit
• A significant benefit for both endpoints should be demonstrated so that no multiplicity adjustment to significance levels is indicated
• The primary symptomatic benefit endpoint should be justified by referencing published data which supports its validity; one example is the St George’s Respiratory Questionnaire
• There are a number of secondary endpoints which may provide useful information … e.g. symptom scales, … and quality of life assessment
QOL: The big picture
QOL
[Diagram labels: Social, Spiritual, Physical, Emotional, Intellectual, Activities of Daily Living (ADL), Symptoms/Toxicity, Health Status (HS), HRQOL, NHRQOL]
QOL is not Survival or Treatment Response
Symptoms and QOL: Is there a difference?
• If you count the number of emetic episodes, you are assessing a symptom
• If you ask the patient how bad their nausea is, you are assessing QOL
• The measurement issues and analytical procedures are the same
• Literature is converging to the term patient-reported outcomes (PRO)
Developmental Timeline of Commonly Used QOL Measurement Tools
[Figure: timeline from '67 to '00. Tools shown: Uniscale, SDS, FLIC, SF-36, EORTC QLQ C30, BPI, POMS, COOP/WONCA, EQ5D, BFI, FACT & CARES, ESAS, SF-12, MDASI, SF-8. The pairing of tools with years is not recoverable from the extracted text.]
What is an Appropriate QOL Instrument?
• Research objective (HYPOTHESIS DRIVEN)
• Specific rationale for the QOL part of the study
• Relevant domains of QOL (LIST & MATCH)
• Disease and patient population characteristics
• Psychometric characteristics (reliability & validity) of QOL instrument
• Practical considerations (e.g. respondent burden, language translations)
Timing of QOL Assessment
• Study objective
• Characteristics and natural course of disease
• Baseline and one follow-up QOL assessment are necessary
• Treatment regimen
• Similar timing of QOL assessment across treatment arms
• Expected effects of the treatment
QOL Research Themes
1. Assess QOL within clinical trials with efficiency, consistency, specificity
2. Improve QOL methodology
3. Develop intervention studies targeted at QOL endpoints
QOL in NCCTG Clinical Trials
• Since 1995, 84 trials with QOL component
• >50 different QOL questionnaires used
• >20 papers per year published with QOL
• Average baseline compliance rate: 94%
What underlies these QOL metrics?
• “NCCTG does not experience the problems that other groups report with respect to QOL”.
• “Efforts to make the inclusion of QOL components in treatment trials easy and efficient have been well received by investigators”.
(Integrating cancer control research into the CCOP network: a case study of the NCCTG, NCI, 2004)
QOL Team Resources
• MD tumor group liaisons
• Operations manual
• Forms bank
• Literature bank
• Background templates
• Web-based utilities
Cancer Patient Assessment
• Cancer patient assessment involves tumor growth and survival data.
• We measure these scientifically and the effect of interventions on these endpoints.
• Cancer also involves other things besides tumors and reduced lifespan that can be measured…
… by answering scientific questions
• What is the value added of loooooong QOL assessments to treatment trials?
• What is the evidence for the use of single-item QOL assessments?
• How do you deal with multiple endpoints?
• How do you handle missing data?
• What is the clinical significance of QOL assessments?
What is the value added of additional questions?
Less is More
• Numerous studies indicate shorter assessments are “just as good” as longer assessments
• Bernhard. Single item quality of life indicators in cancer clinical trials. Brit J Cancer 84(9):1156-1165, 2002
• Vickers. Controlled Clinical Trials, 24: 731-735, 2003
• Abdel-Khalek. Measuring anxiety. Death Studies 22(8):763-772, 1998
• Gardner. Ed Psych Measurement 58(6):898-915, 1998
• Sloan. Overall QOL. JCO 16:3662-3673, 1998
• Sloan. Clinical significance of single items relative to summated scores. Mayo Clinic Proc 77: 479-487, 2002
Sloan et al, Biopharm Stat 14(1): 73-96, 2004.
Single-Item or Multiple-Item PRO?
Situations where a single item may suffice
• Phase II study attempting to assess whether a treatment has any impact on QOL
• A stratification factor for the presence or absence of depressive issues
• Need to assess fatigue/pain as a correlate of toxicity (brief fatigue/pain inventory)
• Identifying patients who have need of further QOL assessment (e.g., score of 6 or less on a single item)
• A clinical setting wherein a basic idea of which domains of QOL (mental, physical, social) may be affected by a particular treatment or situation
Situations where a multi-item index may be needed
• A Phase III study where it is known that QOL is impacted and more delineation of which QOL components are affected is needed
• A screen to identify the presence or absence of clinical depression
• Need to assess the impact of fatigue/pain on the activities of daily living (ADL items for pain/fatigue)
• Detailing the QOL-related issues once a cut off score on a single item has been obtained
• A clinical setting wherein precise indications of the way in which the different domains of QOL may be affected by a particular treatment or situation
Sloan et al, Mayo Clinic Proc 77: 479-487, 2002.
A Comparison of Simple Single-Item Measures and the Common Toxicity Criteria in Detecting the Onset of Oxaliplatin-Induced Peripheral Neuropathy in Patients with Colorectal Cancer
R. F. Morton, J. A. Sloan, A. Grothey, D. J. Sargent, H. McLeod, E. M. Green, C. Fuchs, R. K. Ramanathan, S. K. Williamson, R. M. Goldberg ASCO 2005
Background
• Peripheral neuropathy (PN) is common during treatment with Oxaliplatin
• Assessment of PN is historically done via the Common Toxicity Criteria (CTC)
• We developed a single-item numerical analogue scale (NAS) assessment to help measure PN
• We compared the two measures to look at the sensitivity of the CTC in detecting the onset of PN
Methods
•
696 patients randomized to FOLFOX4
•
PN assessed bi-weekly during treatment
•
NAS filled out at baseline and every 12 weeks during treatment
NCCTG/Intergroup Trial N9741
Randomization:
• IFL: Irinotecan + 5-FU/LV
• FOLFOX4: Oxaliplatin + 5-FU/LV
• IROX: Irinotecan + Oxaliplatin
Goldberg et al, JCO 2004
NAS Tools
An Empirical Anomaly
• According to CTC only 20% of patients experienced serious PN
• Clinical knowledge suggested the incidence rate should be much higher (about 80%)
Agreement
                            2 Point Change in QOL
                            No (N=420)   Yes (N=276)   % Agreement   Kappa Statistic
Grade 2+ PN   No (N=440)       308          132
              Yes (N=256)      112          144            65%            0.25
Grade 3+ PN   No (N=597)       380          217
              Yes (N=99)        40           59            63%            0.13
The agreement of < 65% indicates CTC and NAS measure different aspects of PN.
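The percent agreement and kappa figures on this slide can be recomputed from the four cell counts of each 2x2 table. The sketch below is a minimal illustration, not the authors' analysis code: the function name `agreement_and_kappa` is hypothetical, and it assumes the slide reports the unweighted Cohen's kappa. Under that assumption the grade 2+ table gives a kappa of about 0.26 (the slide shows 0.25, so the slide may use a different rounding or variant) and the grade 3+ table gives about 0.13.

```python
def agreement_and_kappa(both_no, a_no_b_yes, a_yes_b_no, both_yes):
    """Return (observed agreement, unweighted Cohen's kappa) for a 2x2 table.

    Rating A is the CTC grade threshold, rating B is the 2-point NAS change.
    """
    n = both_no + a_no_b_yes + a_yes_b_no + both_yes
    observed = (both_no + both_yes) / n
    a_yes = a_yes_b_no + both_yes
    b_yes = a_no_b_yes + both_yes
    # Agreement expected by chance if the two ratings were independent.
    expected = (a_yes * b_yes + (n - a_yes) * (n - b_yes)) / n ** 2
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Cell counts as read from the Agreement slide (696 patients in total).
grade2 = agreement_and_kappa(both_no=308, a_no_b_yes=132, a_yes_b_no=112, both_yes=144)
grade3 = agreement_and_kappa(both_no=380, a_no_b_yes=217, a_yes_b_no=40, both_yes=59)

print(f"Grade 2+: agreement {grade2[0]:.1%}, kappa {grade2[1]:.2f}")
print(f"Grade 3+: agreement {grade3[0]:.1%}, kappa {grade3[1]:.2f}")
```

Low kappa alongside roughly two-thirds raw agreement is what the slide's conclusion rests on: much of the raw agreement is what chance alone would produce given the marginal totals.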
Dose to PN: CTC versus NAS Which Comes First?
[Figure: curves for 2 Point Change in QOL and Grade 3+ PN against dose; x-axis: Dose (mg/m²), 0 to 2000; y-axis: 0 to 100]
Median dose to NAS CSD of 424 mg/m² versus 765 (961) mg/m² for CTC grade 2+ (3+) event
Time to PN: CTC versus NAS Which Comes First?
[Figure: curves for 2 Point Change in QOL and Grade 3+ PN against time; x-axis: Time, 0 to 2 (units not shown); y-axis: 0 to 100]
Patients notice an increase in PN two or three months earlier via the NAS
Conclusions
• Grade 2+ PN is found to be a significant problem according to the NAS
• Using CTC, PN is under-reported
• NAS may allow for earlier detection
• NAS should be used in conjunction with CTC
Are the occurrence of adverse events and clinically significant changes in symptom specific and global quality of life measures predictable?
Sumithra J. Mandrekar, Ph.D.
Mashele M. Huschka, B.S.
James R. Jett, M.D.
Jeff A. Sloan, Ph.D.
Mayo Clinic Rochester, MN
NCCTG Lung Cancer Trials
Study 95-20-53 (sample size 76)
  Description: A Pilot Study of High-Dose Thoracic Radiation Therapy w/ Concomitant Cisplatin/Etoposide in Limited-Stage SCLC
  Assessments: Uniscale, LCSS
  Assessment schedule: Baseline, prior to irradiation, prior to last cycle and at 3 months, 1 year & 2 year follow-up visits
Study 95-24-52 (sample size 34)
  Description: A Phase II Trial of Edatrexate in Combo w/ Vinblastine, Adriamycin, Cisplatin & Filgrastim in Pts w/ Advanced NSCLC
  Assessments: Uniscale, FACT-L v3
  Assessment schedule: Baseline and prior to each treatment cycle
Study 97-24-51 (sample size 177)
  Description: Phase III Randomized, Double-Blind Study of CAI & Placebo w/ Advanced NSCLC
  Assessments: Uniscale, FACT-L v4
  Assessment schedule: Baseline and monthly during course of treatment
Study 98-24-52 (sample size 99)
  Description: Randomized Phase II Study of Docetaxel & Gemcitabine for Stage IIIB/IV NSCLC
  Assessments: Uniscale, LCSS
  Assessment schedule: Baseline and prior to each treatment cycle
Study N0021 (sample size 68)
  Description: Phase II Study of Gemcitabine and Epirubicin for the Treatment of Mesothelioma
  Assessments: Uniscale, SDS
  Assessment schedule: Baseline, at each evaluation and 3 months & 1 year follow-up visits
Study N0022 (sample size 58)
  Description: Oral Vinorelbine For the Treatment of Metastatic Non-Small Cell Lung Cancer in Patients >= 65 Years of Age: A Phase II Trial of Efficacy, Toxicity, and Patients' Perceived Preference for Oral Therapy
  Assessments: Uniscale, LCSS
  Assessment schedule: Baseline and immediately after completion of second cycle of chemotherapy
QOL Assessments
• Spitzer’s Uniscale
  • 1 question for the global assessment of QOL
• Functional Assessment of Cancer Therapy Lung (FACT-L)
  • 27 questions divided into 4 well-being constructs: physical, social/family, emotional, and functional
  • 10 questions specific to lung cancer
• Lung Cancer Symptom Scale (LCSS)
  • 9 questions pertaining to lung cancer symptoms
• Symptom Distress Scale (SDS)
  • 12 questions related to symptoms commonly experienced by cancer patients
Determine the relationship of a single-item assessment with the multiple-item summated scales
Post-Baseline QOL
                           Uniscale                               Multiple-items
                           Mean (SD)      Median   Range          Mean (SD)      Median   Range
FACT-L (N=148)             71.1 (19.13)   75.0     (0.0-97.0)     74.9 (12.26)   75.7     (30.7-99.3)
LCSS & Uniscale (N=164)    68.6 (25.44)   76.5     (1.0-100.0)    72.0 (16.26)   74.0     (0.0-99.3)
SDS (N=46)                 65.5 (23.05)   68.5     (4.0-97.0)     73.9 (14.53)   77.9     (38.5-96.2)
Total (N=358)              69.6 (22.31)   75.0     (0.0-100.0)    73.6 (14.22)   75.2     (0.0-99.3)
Spearman Rank Correlations between the Uniscale and the FACT-L, LCSS, and SDS were 0.66, 0.57, and 0.49 respectively
When QOL is high: Uniscale > LCSS
When QOL is low: Uniscale < LCSS
Greater variability in Uniscale Scores
Correlation=0.43
Determine if clinically significant declines are more readily detected by a single-item or multiple-item assessment
Individual Patient Data over time; Greater variability in Uniscale Scores
Clinically Significant Decline (CSD) [10-point decline on a 0-100 scale]
                            Uniscale n (%)   Multiple-items n (%)   Percent Agreement
FACT-L (N=120)*             73 (60.8%)       46 (38.3%)             56%
LCSS & Uniscale (N=152)*    91 (59.9%)       66 (43.4%)             59%
SDS (N=45)*                 20 (44.4%)       13 (28.9%)             71%
Total (N=317)*              184 (58.0%)      125 (39.4%)            59%
*Represents the number of patients that completed the Uniscale at baseline and at least once post-baseline and completed a multiple-item assessment at baseline and at least once post-baseline
Uniscale more likely to detect a CSD in QOL than the multiple-item assessments (58% vs. 39%)
The overall percent agreement in detecting a CSD in QOL between Uniscale and multiple-item assessments was 59%
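The CSD rule used here (a 10-point decline on a 0-100 scale) and the percent agreement between two instruments can be sketched as follows. This is an illustration under stated assumptions, not the study's code: the patient scores are made up, the function names are hypothetical, and the sketch assumes a CSD counts if it occurs at any post-baseline assessment, which the slides do not spell out.

```python
CSD_THRESHOLD = 10  # points on a 0-100 scale, per the slide's definition


def has_csd(baseline, post_baseline_scores, threshold=CSD_THRESHOLD):
    """True if any post-baseline score is at least `threshold` below baseline.

    Assumption: a decline at any post-baseline assessment counts.
    """
    return any(baseline - score >= threshold for score in post_baseline_scores)


def percent_agreement(flags_a, flags_b):
    """Percentage of patients on whom two binary indicators agree."""
    matches = sum(a == b for a, b in zip(flags_a, flags_b))
    return 100.0 * matches / len(flags_a)


# Hypothetical data: (baseline score, [post-baseline scores]) per patient.
uniscale = [(80, [75, 65]), (70, [72, 68]), (90, [85, 88]), (60, [45])]
multi_item = [(78, [74, 70]), (72, [70, 69]), (88, [80, 77]), (65, [50])]

uni_flags = [has_csd(base, post) for base, post in uniscale]
multi_flags = [has_csd(base, post) for base, post in multi_item]
agreement = percent_agreement(uni_flags, multi_flags)

print(uni_flags, multi_flags, agreement)
```

Percent agreement counts patients flagged by both instruments and patients flagged by neither, so two instruments can agree fairly often even when one detects far more declines than the other.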
Determine how single-item assessment and multiple-item summated scales relate to adverse events data
Adverse Events (AE)
• Severe adverse event is defined as a grade 3, 4, or 5, regardless of attribution
  • 33% experienced a severe AE post baseline
• Nine AEs experienced by at least 2% of the population that can also be collected via a QOL instrument
  • Alopecia, Anorexia, Constipation, Diarrhea, Dyspnea, Fatigue, Nausea, Neurosensory, Vomiting
  • 95% experienced at least one of the nine AEs
  • 20% had at least one of the nine graded as severe
• CSD in AE is defined as a baseline AE of grade 0, 1, or 2 that changes to a grade 3, 4, or 5 post baseline
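The two adverse-event definitions above translate directly into grade comparisons. The following is a minimal sketch with hypothetical function names and made-up grades; it assumes grades are integers from 0 to 5 and that a change at any post-baseline report qualifies.

```python
SEVERE_GRADE = 3  # grades 3, 4, and 5 count as severe, regardless of attribution


def is_severe(grade):
    """True for a grade 3, 4, or 5 adverse event."""
    return grade >= SEVERE_GRADE


def has_csd_in_ae(baseline_grade, post_baseline_grades):
    """True if a baseline grade of 0-2 becomes grade 3-5 at any post-baseline report."""
    return (not is_severe(baseline_grade)) and any(
        is_severe(grade) for grade in post_baseline_grades
    )


# Hypothetical fatigue grades: (baseline grade, [post-baseline grades]).
patients = [(0, [1, 2]), (2, [3]), (3, [4]), (1, [1, 5])]
csd_flags = [has_csd_in_ae(base, post) for base, post in patients]

print(csd_flags)
```

The third hypothetical patient has a severe AE post baseline but no CSD in AE, because the event was already grade 3 at baseline; this is the distinction between the "Severe AE" and "CSD in AE" tables.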
Severe AE and CSD in QOL
Uniscale
                     Number evaluable*   Severe AE     CSD in QOL    Percent agreement
FACT-L & Uniscale    122                 26 (21.3%)    74 (60.7%)    46%
LCSS & Uniscale      155                 74 (47.7%)    92 (59.3%)    51%
SDS & Uniscale       46                  17 (37.0%)    20 (43.5%)    46%
Total                323                 117 (36.2%)   186 (57.6%)   48%

Multiple-items
                     Number evaluable*   Severe AE     CSD in QOL    Percent agreement
FACT-L & Uniscale    140                 30 (21.4%)    52 (37.1%)    64%
LCSS & Uniscale      156                 76 (48.7%)    67 (43.0%)    53%
SDS & Uniscale       45                  17 (37.8%)    13 (28.9%)    60%
Total                341                 123 (36.1%)   132 (38.7%)   59%
*Represents the number of patients that had an adverse event (any grade) and completed a QOL assessment at baseline and at least once post-baseline
CSD in AE and CSD in QOL
Uniscale
                Number evaluable*   CSD in AE     CSD in QOL    Percent agreement
Alopecia        139                 2 (1.4%)      75 (54.0%)    46%
Anorexia        94                  8 (8.5%)      61 (64.9%)    37%
Constipation    69                  6 (8.7%)      37 (53.6%)    52%
Diarrhea        73                  13 (17.8%)    44 (60.2%)    41%

Multiple-items
                Number evaluable*   CSD in AE     CSD in QOL    Percent agreement
Alopecia        145                 2 (1.4%)      59 (40.7%)    59%
Anorexia        99                  9 (9.1%)      50 (50.5%)    44%
Constipation    72                  6 (8.3%)      17 (23.6%)    74%
Diarrhea        77                  15 (19.5%)    31 (40.3%)    53%
*Represents the number of patients that had a baseline and post-baseline adverse event (any grade) and completed a QOL assessment at baseline and at least once post-baseline
CSD in AE and CSD in QOL
Uniscale
                Number evaluable*   CSD in AE     CSD in QOL     Percent agreement
Dyspnea         155                 43 (27.7%)    90 (58.1%)     46%
Fatigue         226                 42 (18.6%)    139 (61.5%)    42%
Nausea          208                 34 (16.4%)    118 (56.7%)    42%
Neurosensory    189                 9 (4.8%)      116 (61.4%)    39%
Vomiting        142                 23 (16.2%)    72 (50.7%)     50%

Multiple-items
                Number evaluable*   CSD in AE     CSD in QOL     Percent agreement
Dyspnea         159                 43 (27.0%)    67 (42.1%)     57%
Fatigue         236                 45 (19.1%)    96 (40.1%)     59%
Nausea          216                 33 (15.3%)    84 (38.9%)     58%
Neurosensory    202                 11 (5.5%)     73 (36.1%)     64%
Vomiting        150                 23 (15.3%)    60 (40.0%)     63%
*Represents the number of patients that had a baseline and post-baseline adverse event (any grade) and completed a QOL assessment at baseline and at least once post-baseline
K-M Estimate of the Time to First Occurrence of Severe AE and CSD in QOL
[Figure: Kaplan-Meier curves for Severe AE, Multiple-item CSD, and Uniscale CSD; x-axis: Time (years), 0 to 5; y-axis: 0 to 100]
Severe AE Median: 304 days
Multiple-item Median: 142 days
Uniscale Median: 67 days
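The medians quoted for these curves come from Kaplan-Meier estimation of time to first event. The sketch below shows the product-limit calculation on made-up follow-up data; the function names and the data are hypothetical and it is not the analysis code behind the slides. The slide curves appear to plot cumulative incidence, which is one minus the event-free probability computed here.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of the event-free probability.

    times:  follow-up time for each patient (e.g. days)
    events: True if the event was observed, False if the patient was censored
    Returns a list of (time, event_free_probability) at each distinct event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    event_free = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Group all patients who share the same follow-up time.
        while i < len(data) and data[i][0] == t:
            n_events += int(data[i][1])
            n_leaving += 1
            i += 1
        if n_events:
            event_free *= 1 - n_events / at_risk
            curve.append((t, event_free))
        at_risk -= n_leaving
    return curve


def median_time(curve):
    """First time at which the event-free probability drops to 0.5 or below."""
    for t, probability in curve:
        if probability <= 0.5:
            return t
    return None  # median not reached


# Hypothetical days to first CSD; False marks a patient censored without a CSD.
days = [30, 45, 67, 67, 90, 120, 200, 365]
observed = [True, True, True, False, True, False, True, False]

curve = kaplan_meier(days, observed)
print(curve)
print("median:", median_time(curve))
```

Censored patients leave the risk set without lowering the curve, which is why the estimate differs from a simple proportion of patients with an event.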
K-M Estimate of Time to First Occurrence of Severe Fatigue and CSD in LCSS Fatigue
[Figure: curves for AE and LCSS; x-axis: Time (years), 0 to 3; y-axis: 0 to 100; annotated values: 70% and 12%]
LCSS Median: 81 days
K-M Estimate of Time to First Occurrence of Severe Fatigue and CSD in SDS Fatigue
[Figure: curves for AE and SDS; x-axis: Time (months), 0 to 12; y-axis: 0 to 100; annotated values: 83.7% and 25.3%]
SDS Median: 52 days
• 6 events reported via CTC
• 25 CSD reported via SDS
Summary
• Uniscale demonstrates greater variability than the multiple-item indices
• The Uniscale is better able to detect a CSD in QOL than the multiple-item assessments, and captures a CSD earlier than the multiple-item assessments
• Correlations and percent agreement between Uniscale and multiple-item assessments were modest
Summary
• There is indication that a CSD in QOL occurs earlier than CTC AE reporting
• Consistent with a recent finding that single-item QOL assessments detect a patient-perceived problem in peripheral neuropathy more than six weeks earlier than CTC (Morton et al, ASCO 2005)
• The multiple-item assessments are in better agreement with occurrence or CSD in AE compared to the Uniscale
What is the evidence for the use of simple (single item) LASA’s?
The literature for simple assessments is considerable
• Grunberg S.M. (1996). Comparison of conditional quality of life terminology and visual analogue scale measurements. Quality of Life Research; 5: 65-72.
• Gudex C. (1996). Health state valuations from the general public using the Visual Analogue Scale. Quality of Life Research, 5: 521-531.
• Hyland ME. Development of a new type of global quality of life scale and comparison and preference for 12 global scales. Quality of Life Research. 5(5): 469-480. 1996.
• Sriwatanakul, K. (1983). Studies with different types of visual analog scales for measurement of pain; Clinical Pharmacology and Therapeutics; 34(2): 234-239.
• Wewers ME. (1990). A Critical Review of Visual Analogue Scales in the Measurement of Clinical Phenomena. Research in Nursing & Health, 13: 227-236.
• Bretscher M. (1999). Quality of Life in Hospice Patients: A Pilot Study, Psychosomatics, 40, 309-313.
The Visual Analogue Uniscale
Please mark with an ‘X’ the appropriate place within the bar to indicate your rating of this person’s quality of life during the past week.
Lowest quality applies to someone completely dependent physically on others, seriously impaired mentally, unaware of surroundings, and in a hopeless position.
Highest quality applies to someone physically and mentally independent, communicating well with others, able to do most of the things enjoyed, pulling own weight, with a hopeful yet realistic attitude.
Lowest Quality [bar: please mark one ‘X’ within the bar] Highest Quality
Uniscale-NAS (Numeric Analog Scale)
Directions: Please circle the number (0-10) best reflecting your response to the following that describes your feelings during the past week, including today.
How would you describe:
1. your overall Quality of Life?
0 (As bad as it can be) 1 2 3 4 5 6 7 8 9 10 (As good as it can be)
Linear Analogue Self Assessment (LASA)
• General measure of global QOL dimensional constructs
• Overall QOL Uniscale question plus domain specific questions
• LASA 6 questions
• covering domains: QOL, Mental, Social, Spiritual, Emotional, Physical
  e.g. How would you describe your overall physical well-being during the past week, including today? (0: as bad as it can be; 10: as good as it can be)
• LASA additional items (any understandable construct)
  e.g. How would you describe your anxiety during the past week, including today? (0: anxiety as bad as it can be; 10: no anxiety)
LASA Validity Data
• Median split adds 3 months to median survival in advanced cancer patients (Sloan, JCO, 1998)
• Qualitative study: score of 5 or less indicates need for intervention (Frost, unpublished)
• “Stable” populations average roughly 7, with SD roughly 2 on 10-point scale (20 on 100-point scale) (Locke, in preparation)
LASA Norms (Various)
• Hospice patients 7.6
• Advanced cancer patients 7.2
• Recovering surgical patients 6.6
• Healthy volunteers 8.2
• Medical students 4.4
A Structured Multidisciplinary Psychosocial Intervention Improves the Quality of Life of Patients with Advanced Stage Cancer
T Rummans, M Clark, J Sloan, M Frost, P Atherton, M Bostwick, G Gamble, M Johnson, J Richardson Mayo Clinic, Rochester, MN
In press, JCO
Background
• Some studies have suggested a psychosocial intervention has a positive effect on survival, while others have not demonstrated such an effect or suggested a negative effect on survival. (Spiegel, 1990; Goodwin NEJM 2001; Spiegel, Cancer, 2002)
• Most interventions are single-focus and have targeted mood (Fawzy, AGP, 1993; Jacobsen JCO 2002; Kolden, Psycho-Onc. 2002)
Motivation for the present study
• A multidisciplinary intervention had not been tried nor tested for feasibility
• Overall QOL is the composite, multidimensional psychosocial target
Study Schema
Patients with Advanced Stage Disease scheduled to undergo radiation therapy
R (randomization)
Arm A: Structured multidisciplinary psychosocial intervention. Eight 90-minute sessions over 4 weeks
Arm B: Standard Care
QOL assessed at baseline and week 4 (EOT)
Stratification: tumor type, ECOG PS, age
Secondary endpoint assessment tools
• Linear Analogue Self Assessment (LASA) items
• Profile of Mood States – Short Form (POMS)
• Symptom Distress Scale (SDS)
• FACIT – Spiritual Well-Being
Which is the “real” symptom endpoint?
[Figure: QOL score over time for Treatment A and Treatment B, with individual trajectories for Patient #1 and Patient #2; time axis: Baseline, t1, t2, t3, …, tk = End of Treatment; numbered markers 1, 2, 3 identify the comparisons below]
1 = Change over time for an individual (from baseline)
2 = Difference between groups at a point (or area under the curve)
3 = Differences in changes from baseline
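The three candidate endpoints can each be computed from the same longitudinal scores, which makes the differences between them concrete. The sketch below uses made-up scores for two patients per arm; all names and data are hypothetical, and the trapezoidal rule for the area under the curve is an assumption, since the slide does not say how the area is computed.

```python
from statistics import mean


def change_from_baseline(scores):
    """Endpoint 1: last score minus baseline score for one patient."""
    return scores[-1] - scores[0]


def area_under_curve(times, scores):
    """Trapezoidal area under one patient's QOL trajectory (assumed method)."""
    area = 0.0
    for k in range(1, len(times)):
        area += (times[k] - times[k - 1]) * (scores[k] + scores[k - 1]) / 2
    return area


# Hypothetical QOL scores (0-100) at baseline, week 2, and week 4.
weeks = [0, 2, 4]
arm_a = [[70, 72, 74], [80, 80, 82]]
arm_b = [[75, 70, 66], [78, 74, 70]]

# Endpoint 1: change over time for each individual.
changes_a = [change_from_baseline(p) for p in arm_a]
changes_b = [change_from_baseline(p) for p in arm_b]

# Endpoint 2: difference between groups at a single time point (end of treatment).
diff_at_end = mean(p[-1] for p in arm_a) - mean(p[-1] for p in arm_b)

# Endpoint 3: difference between groups in mean change from baseline.
diff_in_change = mean(changes_a) - mean(changes_b)

print(changes_a, changes_b, diff_at_end, diff_in_change)
print(area_under_curve(weeks, arm_a[0]))
```

In this made-up data the end-of-treatment difference (10 points) and the difference in changes (11.5 points) are not equal because the arms start from different baselines, which is why the choice of endpoint should be fixed in advance.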
Primary Result: Overall QOL at 4 weeks
At week 4, overall QOL was 10 points higher in the intervention arm than in the standard care arm (80 versus 70 on the 100-point scale respectively, p=0.047).
The treatment group improved 3.3 points from baseline, while the control group decreased 8.9 points on average, p=0.009.
More than three times as many patients in the treatment group reported a 10-point improvement in QOL from baseline compared to the control group (30% versus 9%, p=0.004).
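The checklist earlier in the talk lists number needed to treat among the ways to interpret HRQL results. As an illustration only, the responder proportions reported above (30% versus 9% with a 10-point improvement) can be turned into a risk difference and an NNT. The slides themselves do not report an NNT; the calculation and the variable names below are added here as a sketch.

```python
import math

# Proportions with a 10-point QOL improvement, as reported in the primary result.
responders_intervention = 0.30
responders_control = 0.09

# Absolute difference in the proportion of responders between arms.
risk_difference = responders_intervention - responders_control

# NNT is the reciprocal of the absolute difference, conventionally rounded up.
nnt = math.ceil(1 / risk_difference)

print(f"risk difference = {risk_difference:.2f}, NNT = {nnt}")
```

Read this way, roughly one additional patient reports a 10-point improvement for every five who receive the intervention, subject to the uncertainty in the two proportions.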
MC997C: Mean LASA Scores at Week 4
                        Normal Range 0-10, Best=10   Transformed Range 0-100, Best=100
Overall QOL             7.3                          72.8
Mental Well-being       7.8                          77.8
Physical Well-being     6.3                          63.3
Emotional Well-being    7.6                          75.9
Social Activity         6.2                          62.4
Spiritual Well-being    8.4                          83.9

MC997C: Mean POMS Scores at Week 4
SUBSCALE SCORES                   Normal Range 0-20, Best=20   Transformed Range 0-100, Best=100
Tension-Anxiety Subscale          16.3                         81.7
Depression-Dejection Subscale     17.7                         88.5
Anger-Hostility Subscale          17.8                         88.9
Vigor-Activity Subscale           7.2                          36.0
Fatigue-Inertia Subscale          11.8                         59.2
Confusion-Bewilderment Subscale   16.5                         82.5
TOTAL SCORES                      Range 0-120, Best=120        Range 0-100, Best=100
POMS-SF Total Score               87.4                         72.8
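The week 4 scores are shown both on each instrument's native range and on a transformed 0-100 range. A linear rescaling reproduces the transformed values, as sketched below. The function name `to_0_100` is hypothetical, and applying the rescaling to the rounded means is an assumption: it matches the slide for most entries (for example the POMS-SF total), while a few differ in the first decimal, presumably because the original transformation was applied to unrounded patient-level scores.

```python
def to_0_100(score, low, high):
    """Linearly rescale a score from the range [low, high] to [0, 100]."""
    return 100.0 * (score - low) / (high - low)


# Mean scores as reported on their native ranges.
poms_total = to_0_100(87.4, 0, 120)      # POMS-SF total, range 0-120
depression = to_0_100(17.7, 0, 20)       # POMS subscale, range 0-20
overall_qol = to_0_100(7.3, 0, 10)       # LASA item, range 0-10

print(round(poms_total, 1), round(depression, 1), round(overall_qol, 1))
```

Putting every instrument on a common 0-100 scale is what allows a single rule, such as a 10-point change, to be applied across instruments with different native ranges.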