Clinical Trials for Quality of Life Endpoints in Oncology Jeff A. Sloan, Ph.D. Mayo Clinic, Rochester, MN, USA Oncology Education Session Rochester, November 1, 2005


Clinical Trials for Quality of Life Endpoints in Oncology

Jeff A. Sloan, Ph.D.

Mayo Clinic, Rochester, MN, USA

Oncology Education Session Rochester, November 1, 2005

QOL challenges

Reliability: if I were to use this tool under the same conditions would I get the same results?

Validity: am I measuring what I want to measure?

Missing data: imputation, design considerations

Take home message: there is good news

There are problems with using QOL assessments as indicators of efficacy in clinical trials.

There are scientifically sound solutions to these problems. The problems have been disseminated widely and consistently. The solutions have not.

Checklist for designing, conducting and reporting HRQL / PRO in clinical trials

Patient Reported Outcomes (PRO) and Regulatory Issues: A European Guidance Document for the improved integration of health-related quality of life assessment in the drug regulatory process. Chassany O and the ERIQA Working Group. Drug Information Journal 2002.

HRQL / PRO objectives

• Added value of HRQL / PRO
• Choice of the questionnaires
• Hypotheses of HRQL / PRO changes

Study design

• Basic principles of RCT fulfilled?
• Timing and frequency of assessment
• Mode and site of administration...

HRQL / PRO measure

• Description of the measure (items, domains…)
• Evidence of validity
• Evidence of cultural adaptation

Statistical analysis plan

• Primary or secondary endpoint
• Superiority or equivalence trial
• Sample size
• ITT, type I error, missing data

Reporting of results

• Participation rate, data completeness
• Distribution of HRQL / PRO scores

Interpreting the results

• Effect size
• Minimal Clinically Important Difference
• Comparison with other criteria / scores
• Number needed to treat…

EMEA RECOMMENDATIONS

Points to consider (CPMP/EWP/562/98) on clinical investigation of medicinal products in the chronic treatment of patients with COPD, 1999

In the major efficacy studies of symptomatic benefit the primary endpoint should reflect the clinical benefit the applicant wishes to claim in the future SPC (Summary of Product Characteristics)

• It should include the FEV1 as a measure of lung function and include a measure of symptomatic benefit

• A significant benefit for both endpoints should be demonstrated so that no multiplicity adjustment to significance levels is indicated

• The primary symptomatic benefit endpoint should be justified by referencing published data which supports its validity; one example is the St George’s Respiratory Questionnaire

• There are a number of secondary endpoints which may provide useful information. … e.g. symptom scales, … and quality of life assessment

QOL: The big picture

[Diagram: QOL and its components. Labels shown: Social, Spiritual, Physical, Emotional, Intellectual, Activities of Daily Living (ADL), Symptoms/Toxicity, Health Status (HS), HRQOL, NHRQOL]

QOL is not Survival or Treatment Response

Symptoms and QOL: Is there a difference?

If you count the number of emetic episodes, you are assessing a symptom

If you ask the patient how bad their nausea is, you are assessing QOL

The measurement issues and analytical procedures are the same

Literature is converging to the term patient-reported outcomes (PRO)

Developmental Timeline of Commonly Used QOL Measurement Tools

[Figure: timeline running from '67 to '00. Tools shown: Uniscale, SF-36, EORTC QLQ C30, BPI, POMS, COOP/WONCA, EQ5D, BFI, SDS, FLIC, FACT & CARES, ESAS, SF-12, MDASI, SF-8]

What is an Appropriate QOL Instrument?

• Research objective (HYPOTHESIS DRIVEN)
• Specific rationale for the QOL part of the study
• Relevant domains of QOL (LIST & MATCH)
• Disease and patient population characteristics
• Psychometric characteristics (reliability & validity) of QOL instrument
• Practical considerations (e.g. respondent burden, language translations)

Timing of QOL Assessment

Study objective

Characteristics and natural course of disease

Baseline and one follow-up QOL assessment are necessary

Treatment regimen

Similar timing of QOL assessment across treatment arms

Expected effects of the treatment

QOL Research Themes

1. Assess QOL within clinical trials with efficiency, consistency, specificity
2. Improve QOL methodology
3. Develop intervention studies targeted at QOL endpoints

QOL in NCCTG Clinical Trials

Since 1995, 84 trials with QOL component

>50 different QOL questionnaires used

>20 papers per year published with QOL

Average baseline compliance rate: 94%

What underlies these QOL metrics?

“NCCTG does not experience the problems that other groups report with respect to QOL”.

“Efforts to make the inclusion of QOL components in treatment trials easy and efficient have been well received by investigators”.

(Integrating cancer control research into the CCOP network: a case study of the NCCTG, NCI, 2004)

QOL Team Resources

MD tumor group liaisons

Operations manual

Forms bank

Literature bank

Background templates

Web-based utilities

Cancer Patient Assessment

Cancer patient assessment involves tumor growth and survival data.

We measure these scientifically and the effect of interventions on these endpoints.

Cancer also involves other things besides tumors and reduced lifespan that can be measured…..

… by answering scientific questions

What is the value added of loooooong QOL assessments to treatment trials?

What is the evidence for the use of single-item QOL assessments?

How do you deal with multiple endpoints?

How do you handle missing data?

What is the clinical significance of QOL assessments?

What is the value added of additional questions?

Less is More

Numerous studies indicate shorter assessments are “just as good” as longer assessments

• Bernhard. Single item quality of life indicators in cancer clinical trials. Brit J Cancer 84(9):1156-1165, 2002
• Vickers. Controlled Clinical Trials 24:731-735, 2003
• Abdel-Khalek. Measuring anxiety. Death Studies 22(8):763-772, 1998
• Gardner. Ed Psych Measurement 58(6):898-915, 1998
• Sloan. Overall QOL. JCO 16:3662-3673, 1998
• Sloan. Clinical significance of single items relative to summated scores. Mayo Clinic Proc 77:479-487, 2002
• Sloan et al, Biopharm Stat 14(1):73-96, 2004

Single-Item or Multiple-Item PRO?

Situations where a single item may suffice

• Phase II study attempting to assess whether a treatment has any impact on QOL
• A stratification factor for the presence or absence of depressive issues
• Need to assess fatigue/pain as a correlate of toxicity (brief fatigue/pain inventory)
• Identifying patients who have need of further QOL assessment (e.g., score of 6 or less on a single item)
• A clinical setting wherein a basic idea of which domains of QOL (mental, physical, social) may be affected by a particular treatment or situation

Situations where a multi-item index may be needed

• A Phase III study where it is known that QOL is impacted and more delineation of which QOL components are affected is needed
• A screen to identify the presence or absence of clinical depression
• Need to assess the impact of fatigue/pain on the activities of daily living (ADL items for pain/fatigue)
• Detailing the QOL-related issues once a cut off score on a single item has been obtained
• A clinical setting wherein precise indications of the way in which the different domains of QOL may be affected by a particular treatment or situation

Sloan et al, Mayo Clinic Proc 77: 479-487, 2002.

A Comparison of Simple Single-Item Measures and the Common Toxicity Criteria in Detecting the Onset of Oxaliplatin-Induced Peripheral Neuropathy in Patients with Colorectal Cancer

R. F. Morton, J. A. Sloan, A. Grothey, D. J. Sargent, H. McLeod, E. M. Green, C. Fuchs, R. K. Ramanathan, S. K. Williamson, R. M. Goldberg ASCO 2005

Background

Peripheral neuropathy (PN) is common during treatment with Oxaliplatin

Assessment of PN is historically done via the Common Toxicity Criteria (CTC)

We developed a single-item numerical analogue scale assessment to help measure PN

We compared the two measures to look at the sensitivity of the CTC in detecting the onset of PN

Methods

696 patients randomized to FOLFOX4

PN assessed bi-weekly during treatment

NAS filled out at baseline and every 12 weeks during treatment

NCCTG/Intergroup Trial N9741

RANDOMIZATION

IFL: Irinotecan + 5-FU/LV
FOLFOX4: Oxaliplatin + 5-FU/LV
IROX: Irinotecan + Oxaliplatin

Goldberg et al, JCO 2004

NAS Tools

An Empirical Anomaly

• According to CTC only 20% of patients experienced serious PN • Clinical knowledge suggested the incidence rate should be much higher (about 80%)

Agreement

                            2 Point Change in QOL
                            No (N=420)    Yes (N=276)
Grade 2+ PN   No (N=440)       308           132
              Yes (N=256)      112           144
Grade 3+ PN   No (N=597)       380           217
              Yes (N=99)        40            59

                 % Agreement    Kappa Statistic
Grade 2+ PN          65%             0.25
Grade 3+ PN          63%             0.13

The agreement of < 65% indicates CTC and NAS measure different aspects of PN.
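As a rough check on the agreement figures, percent agreement and Cohen's kappa can be recomputed from the 2x2 cell counts on the Agreement slide. The sketch below is only an illustration: the function name and argument order are assumptions, and it uses the standard unweighted kappa formula, which may not be exactly the computation behind the slide. With these counts it gives agreement of about 65% and 63%, and kappa of about 0.26 and 0.13, in line with the reported 0.25 and 0.13.

```python
def agreement_stats(both_no, qol_no_pn_yes, qol_yes_pn_no, both_yes):
    """Percent agreement and unweighted Cohen's kappa for a 2x2 table.

    Rows/columns: 2-point change in QOL (NAS) versus CTC-graded PN.
    """
    n = both_no + qol_no_pn_yes + qol_yes_pn_no + both_yes
    observed = (both_no + both_yes) / n

    # Marginal "yes" totals for each measure.
    qol_yes = qol_yes_pn_no + both_yes
    pn_yes = qol_no_pn_yes + both_yes

    # Agreement expected by chance if the two measures were independent.
    expected = (qol_yes * pn_yes + (n - qol_yes) * (n - pn_yes)) / n ** 2

    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Cell counts taken from the Agreement slide (N = 696).
grade2_agreement, grade2_kappa = agreement_stats(308, 112, 132, 144)
grade3_agreement, grade3_kappa = agreement_stats(380, 40, 217, 59)

print(f"Grade 2+: agreement {grade2_agreement:.1%}, kappa {grade2_kappa:.2f}")
print(f"Grade 3+: agreement {grade3_agreement:.1%}, kappa {grade3_kappa:.2f}")
```

A kappa this low alongside agreement near 65% is what the slide's conclusion rests on: much of the raw agreement is what chance alone would produce.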

Dose to PN: CTC versus NAS Which Comes First?

[Figure: curves for 2 Point Change in QOL and Grade 3+ PN. x-axis: Dose (mg/m²), 0 to 2000; y-axis: 0 to 100]

Median dose to NAS CSD of 424 mg/m² versus 765 (961) mg/m² for CTC grade 2+ (3+) event

Time to PN: CTC versus NAS Which Comes First?

[Figure: curves for 2 Point Change in QOL and Grade 3+ PN. x-axis: Time, 0 to 2; y-axis: 0 to 100]

Patients notice an increase in PN two or three months earlier via the NAS

Conclusions

Grade 2+ PN is found to be a significant problem according to the NAS

Using CTC, PN is under-reported

• NAS may allow for earlier detection
• NAS should be used in conjunction with CTC

Are the occurrence of adverse events and clinically significant changes in symptom specific and global quality of life measures predictable?

Sumithra J. Mandrekar, Ph.D.

Mashele M. Huschka, B.S.

James R. Jett, M.D.

Jeff A. Sloan, Ph.D.

Mayo Clinic Rochester, MN

NCCTG Lung Cancer Trials

Study 95-20-53 (Sample Size: 76)
Description: A Pilot Study of High-Dose Thoracic Radiation Therapy w/ Concomitant Cisplatin/Etoposide in Limited-Stage SCLC
Assessments: Uniscale, LCSS
Assessment Schedule: Baseline, prior to irradiation, prior to last cycle and at 3 months, 1 year & 2 year follow-up visits

Study 95-24-52 (Sample Size: 34)
Description: A Phase II Trial of Edatrexate in Combo w/ Vinblastine, Adriamycin, Cisplatin & Filgrastim in Pts w/ Advanced NSCLC
Assessments: Uniscale, FACT-L v3
Assessment Schedule: Baseline and prior to each treatment cycle

Study 97-24-51 (Sample Size: 177)
Description: Phase III Randomized, Double-Blind Study of CAI & Placebo w/ Advanced NSCLC
Assessments: Uniscale, FACT-L v4
Assessment Schedule: Baseline and monthly during course of treatment

Study 98-24-52 (Sample Size: 99)
Description: Randomized Phase II Study of Docetaxel & Gemcitabine for Stage IIIB/IV NSCLC
Assessments: Uniscale, LCSS
Assessment Schedule: Baseline and prior to each treatment cycle

Study N0021 (Sample Size: 68)
Description: Phase II Study of Gemcitabine and Epirubicin for the Treatment of Mesothelioma
Assessments: Uniscale, SDS
Assessment Schedule: Baseline, at each evaluation and 3 months & 1 year follow-up visits

Study N0022 (Sample Size: 58)
Description: Oral Vinorelbine For the Treatment of Metastatic Non-Small Cell Lung Cancer in Patients >= 65 Years of Age: A Phase II Trial of Efficacy, Toxicity, and Patients' Perceived Preference for Oral Therapy
Assessments: Uniscale, LCSS
Assessment Schedule: Baseline and immediately after completion of second cycle of chemotherapy

QOL Assessments

Spitzer’s Uniscale

1 question for the global assessment of QOL

Functional Assessment of Cancer Therapy Lung (FACT-L)

27 questions divided into 4 well-being constructs: physical, social/family, emotional, and functional

10 questions specific to lung cancer

Lung Cancer Symptom Scale (LCSS)

9 questions pertaining to lung cancer symptoms

Symptom Distress Scale (SDS)

12 questions related to symptoms commonly experienced by cancer patients

Determine the relationship of a single-item assessment with the multiple-item summated scales

Post-Baseline QOL

                  FACT-L (N=148)   LCSS & Uniscale (N=164)   SDS (N=46)     Total (N=358)
Uniscale
  Mean (SD)       71.1 (19.13)     68.6 (25.44)              65.5 (23.05)   69.6 (22.31)
  Median          75.0             76.5                      68.5           75.0
  Range           (0.0-97.0)       (1.0-100.0)               (4.0-97.0)     (0.0-100.0)
Multiple-items
  Mean (SD)       74.9 (12.26)     72.0 (16.26)              73.9 (14.53)   73.6 (14.22)
  Median          75.7             74.0                      77.9           75.2
  Range           (30.7-99.3)      (0.0-99.3)                (38.5-96.2)    (0.0-99.3)

Spearman Rank Correlations between the Uniscale and the FACT-L, LCSS, and SDS were 0.66, 0.57, and 0.49 respectively

When QOL is high: Uniscale > LCSS

When QOL is low: Uniscale < LCSS

Greater variability in Uniscale Scores

Correlation=0.43

Determine if clinically significant declines are more readily detected by a single-item or multiple-item assessment

Individual Patient Data over time; Greater variability in Uniscale Scores

Clinically Significant Decline (CSD) [10-point decline on a 0-100 scale]

                        FACT-L (N=120)*   LCSS & Uniscale (N=152)*   SDS (N=45)*   Total (N=317)*
Uniscale, n (%)         73 (60.8%)        91 (59.9%)                 20 (44.4%)    184 (58.0%)
Multiple-items, n (%)   46 (38.3%)        66 (43.4%)                 13 (28.9%)    125 (39.4%)
Percent Agreement       56%               59%                        71%           59%

*Represents the number of patients that completed the Uniscale at baseline and at least once post-baseline and completed a multiple-item assessment at baseline and at least once post-baseline

Uniscale more likely to detect a CSD in QOL than the multiple-item assessments (58% vs. 39%)

The overall percent agreement in detecting a CSD in QOL between Uniscale and multiple-item assessments was 59%
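The CSD rule and the percent agreement between instruments can be sketched in a few lines. This is a minimal illustration, not the analysis code from the study: the patient records are hypothetical, the function names are made up, and it assumes a CSD is counted when any post-baseline score falls at least 10 points below baseline on the 0-100 scale.

```python
def has_csd(baseline, followups, threshold=10):
    """True if any follow-up score is at least `threshold` points below baseline."""
    return any(baseline - score >= threshold for score in followups)


def percent_agreement(flags_a, flags_b):
    """Percentage of patients classified the same way by both instruments."""
    matches = sum(a == b for a, b in zip(flags_a, flags_b))
    return 100 * matches / len(flags_a)


# Hypothetical patients: (baseline, [post-baseline scores]) on a 0-100 scale.
patients = [
    {"uniscale": (80, [75, 65]), "multi_item": (78, [74, 70])},
    {"uniscale": (60, [62, 58]), "multi_item": (70, [69, 72])},
    {"uniscale": (90, [70]), "multi_item": (85, [72])},
    {"uniscale": (50, [45]), "multi_item": (66, [55])},
]

uniscale_csd = [has_csd(*p["uniscale"]) for p in patients]
multi_item_csd = [has_csd(*p["multi_item"]) for p in patients]

print("Uniscale CSD:", uniscale_csd)
print("Multiple-item CSD:", multi_item_csd)
print("Percent agreement:", percent_agreement(uniscale_csd, multi_item_csd))
```

Percent agreement counts both kinds of match (CSD on both instruments, or on neither), so two instruments can agree on a patient without either one detecting a decline.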

Determine how single-item assessment and multiple-item summated scales relate to adverse events data

Adverse Events (AE)

Severe adverse event is defined as a grade 3, 4, or 5, regardless of attribution

• 33% experienced a severe AE post baseline

Nine AEs experienced by at least 2% of the population that can also be collected via a QOL instrument

• Alopecia, Anorexia, Constipation, Diarrhea, Dyspnea, Fatigue, Nausea, Neurosensory, Vomiting
• 95% experienced at least one of the nine AEs
• 20% had at least one of the nine graded as severe

CSD in AE is defined as a baseline AE of grade 0, 1, or 2 that changes to a grade 3, 4, or 5 post baseline
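The two adverse event definitions translate directly into code. The sketch below is illustrative only: the function names and the example grades are hypothetical, and grades are assumed to be integers from 0 to 5.

```python
def is_severe(grade):
    """Severe AE: grade 3, 4, or 5, regardless of attribution."""
    return grade >= 3


def has_csd_in_ae(baseline_grade, post_baseline_grades):
    """CSD in AE: baseline grade 0, 1, or 2 that becomes grade 3, 4, or 5 later."""
    started_non_severe = not is_severe(baseline_grade)
    became_severe = any(is_severe(g) for g in post_baseline_grades)
    return started_non_severe and became_severe


# Hypothetical fatigue grades for three patients.
print(has_csd_in_ae(1, [1, 2, 3]))  # non-severe at baseline, severe later
print(has_csd_in_ae(3, [3, 4]))     # already severe at baseline
print(has_csd_in_ae(0, [1, 2]))     # never reaches grade 3
```

Under this definition a patient who is already at grade 3 or higher at baseline cannot register a CSD in AE, even if the grade worsens.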

Severe AE and CSD in QOL

                      FACT-L & Uniscale   LCSS & Uniscale   SDS & Uniscale   Total
Uniscale
  Number evaluable*   122                 155               46               323
  Severe AE           26 (21.3%)          74 (47.7%)        17 (37.0%)       117 (36.2%)
  CSD in QOL          74 (60.7%)          92 (59.3%)        20 (43.5%)       186 (57.6%)
  Percent agreement   46%                 51%               46%              48%
Multiple-items
  Number evaluable*   140                 156               45               341
  Severe AE           30 (21.4%)          76 (48.7%)        17 (37.8%)       123 (36.1%)
  CSD in QOL          52 (37.1%)          67 (43.0%)        13 (28.9%)       132 (38.7%)
  Percent agreement   64%                 53%               60%              59%

*Represents the number of patients that had an adverse event (any grade) and completed a QOL assessment at baseline and at least once post-baseline

CSD in AE and CSD in QOL

                      Alopecia     Anorexia     Constipation   Diarrhea
Uniscale
  Number evaluable*   139          94           69             73
  CSD in AE           2 (1.4%)     8 (8.5%)     6 (8.7%)       13 (17.8%)
  CSD in QOL          75 (54.0%)   61 (64.9%)   37 (53.6%)     44 (60.2%)
  Percent agreement   46%          37%          52%            41%
Multiple-items
  Number evaluable*   145          99           72             77
  CSD in AE           2 (1.4%)     9 (9.1%)     6 (8.3%)       15 (19.5%)
  CSD in QOL          59 (40.7%)   50 (50.5%)   17 (23.6%)     31 (40.3%)
  Percent agreement   59%          44%          74%            53%

*Represents the number of patients that had a baseline and post-baseline adverse event (any grade) and completed a QOL assessment at baseline and at least once post-baseline

CSD in AE and CSD in QOL

                      Dyspnea      Fatigue       Nausea        Neurosensory   Vomiting
Uniscale
  Number evaluable*   155          226           208           189            142
  CSD in AE           43 (27.7%)   42 (18.6%)    34 (16.4%)    9 (4.8%)       23 (16.2%)
  CSD in QOL          90 (58.1%)   139 (61.5%)   118 (56.7%)   116 (61.4%)    72 (50.7%)
  Percent agreement   46%          42%           42%           39%            50%
Multiple-items
  Number evaluable*   159          236           216           202            150
  CSD in AE           43 (27.0%)   45 (19.1%)    33 (15.3%)    11 (5.5%)      23 (15.3%)
  CSD in QOL          67 (42.1%)   96 (40.1%)    84 (38.9%)    73 (36.1%)     60 (40.0%)
  Percent agreement   57%          59%           58%           64%            63%

*Represents the number of patients that had a baseline and post-baseline adverse event (any grade) and completed a QOL assessment at baseline and at least once post-baseline

K-M Estimate of the Time to First Occurrence of Severe AE and CSD in QOL

[Figure: Kaplan-Meier curves. x-axis: Time (years), 0 to 5; y-axis: 0 to 100]

Severe AE Median: 304 days
Multiple-item Median: 142 days
Uniscale Median: 67 days
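For readers unfamiliar with how a Kaplan-Meier median time to first event is obtained, here is a minimal sketch of the product-limit estimator. It is not the software used for these analyses: the function names are made up, the follow-up data are hypothetical, and exact fractions are used only to avoid floating-point noise at the 50% boundary.

```python
from fractions import Fraction


def kaplan_meier(times, events):
    """Return [(time, event_free_fraction)] at each time with at least one event.

    times  -- follow-up time per patient (e.g. days)
    events -- 1 if the event (e.g. first CSD) was observed, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = Fraction(1)
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Group all patients who share this follow-up time.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events:
            survival *= 1 - Fraction(n_events, at_risk)
            curve.append((t, survival))
        at_risk -= n_leaving
    return curve


def median_time(curve):
    """First time at which the event-free fraction is at or below 50%."""
    for t, s in curve:
        if s <= Fraction(1, 2):
            return t
    return None  # median not reached


# Hypothetical follow-up data in days (not taken from the trials).
times = [20, 45, 50, 80, 110, 150]
events = [1, 1, 1, 0, 1, 0]
curve = kaplan_meier(times, events)
print(curve)
print("Median time to first event:", median_time(curve))
```

Censored patients (event 0) leave the risk set without lowering the curve, which is why the estimate differs from a simple proportion of patients with an event.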

K-M Estimate of Time to First Occurrence of Severe Fatigue and CSD in LCSS

[Figure: Kaplan-Meier curves for Fatigue AE and LCSS. x-axis: Time (years), 0 to 3; y-axis: 0 to 100. Values annotated on the plot: 70% and 12%]

LCSS Median: 81 days

K-M Estimate of Time to First Occurrence of Severe Fatigue and CSD in SDS

[Figure: Kaplan-Meier curves for Fatigue AE and SDS. x-axis: Time (months), 0 to 12; y-axis: 0 to 100. Values annotated on the plot: 83.7% and 25.3%]

SDS Median: 52 days

6 events reported via CTC

25 CSD reported via SDS

Summary

Uniscale demonstrates greater variability than the multiple-item indices

The Uniscale is better able to detect a CSD in QOL than the multiple item assessments, and captures a CSD earlier than the multiple item assessments

Correlations and percent agreement between Uniscale and multiple-item assessments were modest

Summary

There is indication that a CSD in QOL occurs earlier than CTC AE reporting

Consistent with a recent finding that single-item QOL assessments detect a patient-perceived problem in peripheral neuropathy more than six weeks earlier than CTC (Morton et al, ASCO 2005)

The multiple-item assessments are in better agreement with occurrence or CSD in AE compared to the Uniscale

What is the evidence for the use of simple (single item) LASA’s?

The literature for simple assessments is considerable

• Grunberg S.M. (1996). Comparison of conditional quality of life terminology and visual analogue scale measurements. Quality of Life Research, 5: 65-72.

• Gudex C. (1996). Health state valuations from the general public using the Visual Analogue Scale. Quality of Life Research, 5: 521-531.

• Hyland M.E. (1996). Development of a new type of global quality of life scale and comparison and preference for 12 global scales. Quality of Life Research, 5(5): 469-480.

• Sriwatanakul K. (1983). Studies with different types of visual analog scales for measurement of pain. Clinical Pharmacology and Therapeutics, 34(2): 234-239.

• Wewers M.E. (1990). A Critical Review of Visual Analogue Scales in the Measurement of Clinical Phenomena. Research in Nursing & Health, 13: 227-236.

• Bretscher M. (1999). Quality of Life in Hospice Patients: A Pilot Study. Psychosomatics, 40: 309-313.

The Visual Analogue Uniscale

Please mark with an ‘X’ the appropriate place within the bar to indicate your rating of this person’s quality of life during the past week.

Lowest quality applies to someone completely dependent physically on others, seriously impaired mentally, unaware of surroundings, and in a hopeless position.

Highest quality applies to someone physically and mentally independent, communicating well with others, able to do most of the things enjoyed, pulling own weight, with a hopeful yet realistic attitude.

Lowest Quality (Please mark one ‘X’ within the bar) Highest Quality

Uniscale-NAS (Numeric Analog Scale)

Directions: Please circle the number (0-10) best reflecting your response to the following that describes your feelings during the past week, including today.

How would you describe:

1. your overall Quality of Life?

0 (As bad as it can be) 1 2 3 4 5 6 7 8 9 10 (As good as it can be)

Linear Analogue Self Assessment (LASA)

General measure of global QOL dimensional constructs

Overall QOL Uniscale question plus domain specific questions

LASA 6 questions covering domains: QOL, Mental, Social, Spiritual, Emotional, Physical

e.g. How would you describe your overall physical well-being during the past week, including today? (0: as bad as it can be; 10: as good as it can be)

LASA additional items (any understandable construct)

e.g. How would you describe your anxiety during the past week, including today? (0: anxiety as bad as it can be; 10: no anxiety)

LASA Validity Data

Median split adds 3 months to median survival in advanced cancer patients (Sloan, JCO, 1998)

Qualitative study: score of 5 or less indicates need for intervention (Frost, unpublished)

“Stable” populations average roughly 7, with SD roughly 2 on a 10-point scale (20 on a 100-point scale) (Locke, in preparation)

LASA Norms (Various)

Hospice patients 7.6

Advanced cancer patients 7.2

Recovering surgical patients 6.6

Healthy volunteers 8.2

Medical students 4.4

A Structured Multidisciplinary Psychosocial Intervention Improves the Quality of Life of Patients with Advanced Stage Cancer

T Rummans, M Clark, J Sloan, M Frost, P Atherton, M Bostwick, G Gamble, M Johnson, J Richardson Mayo Clinic, Rochester, MN

In press, JCO

Background

Some studies have suggested a psychosocial intervention has a positive effect on survival, while others have not demonstrated such an effect or suggested a negative effect on survival.

(Spiegel, 1990; Goodwin NEJM 2001; Spiegel, Cancer, 2002)

Most interventions are single-focus and have targeted mood (Fawzy, AGP, 1993; Jacobsen, JCO 2002; Kolden, Psycho-Onc. 2002)

Motivation for the present study

• A multidisciplinary intervention had not been tried nor tested for feasibility
• Overall QOL is the composite, multidimensional psychosocial target

Study Schema

Patients with Advanced Stage Disease scheduled to undergo radiation therapy

Randomization (R)

Arm A: Structured multidisciplinary psychosocial intervention. Eight 90-minute sessions over 4 weeks
Arm B: Standard Care

QOL assessed at baseline and week 4 (EOT)

Stratification: tumor type, ECOG PS, age

Secondary endpoint assessment tools

• Linear Analogue Self Assessment (LASA) items
• Profile of Mood States – Short Form (POMS)
• Symptom Distress Scale (SDS)
• FACIT – Spiritual Well-Being

Which is the “real” symptom endpoint?

[Figure: QOL score over time (Baseline, t1, t2, t3, ..., tk = End of Treatment) for Treatment A and Treatment B, with individual curves for Patient #1 and Patient #2. The numbers 1, 2 and 3 on the plot mark the comparisons listed below.]

1 = Change over time for an individual (from baseline)
2 = Difference between groups at a point (or area under the curve)
3 = Differences in changes from baseline
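The three candidate endpoints can be made concrete with a small sketch. Everything here is hypothetical: the scores, the assessment times, and the function names are invented for illustration, and group summaries use simple means rather than any particular model.

```python
from statistics import mean


def change_from_baseline(scores):
    """Endpoint 1: last score minus baseline score for one patient."""
    return scores[-1] - scores[0]


def area_under_curve(times, scores):
    """Trapezoidal area under one patient's QOL curve."""
    pairs = list(zip(times, scores))
    return sum((t1 - t0) * (s0 + s1) / 2
               for (t0, s0), (t1, s1) in zip(pairs, pairs[1:]))


# Hypothetical QOL scores (0-100) at weeks 0, 2 and 4, two patients per arm.
weeks = [0, 2, 4]
arm_a = [[70, 75, 80], [60, 70, 70]]
arm_b = [[70, 65, 60], [80, 70, 70]]

# Endpoint 1: within-patient change from baseline.
changes_a = [change_from_baseline(s) for s in arm_a]
changes_b = [change_from_baseline(s) for s in arm_b]

# Endpoint 2: difference between arms at end of treatment.
end_difference = mean(s[-1] for s in arm_a) - mean(s[-1] for s in arm_b)

# Endpoint 3: difference between arms in mean change from baseline.
change_difference = mean(changes_a) - mean(changes_b)

print("Changes, arm A:", changes_a, "arm B:", changes_b)
print("Difference at end of treatment:", end_difference)
print("Difference in changes from baseline:", change_difference)
print("AUC, first patient in arm A:", area_under_curve(weeks, arm_a[0]))
```

In this toy data endpoints 2 and 3 give different answers because the arms do not start from the same baseline mean, which is the point of asking which endpoint is the "real" one.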

Primary Result: Overall QOL at 4 weeks

At week 4, overall QOL was 10 points higher in the intervention arm than in the standard care arm (80 versus 70 on the 100-point scale respectively, p=0.047).

The treatment group improved 3.3 points from baseline, while the control group decreased 8.9 points on average, p=0.009.

More than three times as many patients in the treatment group reported a 10-point improvement in QOL from baseline compared to the control group (30% versus 9%, p=0.004).

MC997C: Mean LASA Scores at Week 4

                         Normal Range 0-10, Best=10   Transformed Range 0-100, Best=100
Overall QOL              7.3                          72.8
Mental Well-being        7.8                          77.8
Physical Well-being      6.3                          63.3
Emotional Well-being     7.6                          75.9
Social Activity          6.2                          62.4
Spiritual Well-being     8.4                          83.9

MC997C: Mean POMS Scores at Week 4

SUBSCALE SCORES                  Normal Range 0-20, Best=20   Transformed Range 0-100, Best=100
Tension-Anxiety Subscale         16.3                         81.7
Depression-Dejection Subscale    17.7                         88.5
Anger-Hostility Subscale         17.8                         88.9
Vigor-Activity Subscale          7.2                          36.0
Fatigue-Inertia Subscale         11.8                         59.2
Confusion-Bewilderment Subscale  16.5                         82.5

TOTAL SCORES                     Range 0-120, Best=120        Range 0-100, Best=100
POMS-SF Total Score              87.4                         72.8
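The transformed 0-100 scores are consistent with a simple linear rescaling of each instrument's native range. The sketch below assumes that is how they were obtained; the function name is hypothetical. Small last-digit differences from the slide values (for example 73.0 versus 72.8 for overall QOL) are to be expected if the transformation was applied to individual scores before averaging, or to means carrying more decimal places than shown.

```python
def rescale_to_100(score, max_score, min_score=0):
    """Linearly map a score from [min_score, max_score] onto a 0-100 scale."""
    return 100 * (score - min_score) / (max_score - min_score)


# Mean scores taken from the MC997C slides.
print(round(rescale_to_100(87.4, 120), 1))  # POMS-SF total, native range 0-120
print(round(rescale_to_100(17.7, 20), 1))   # Depression-Dejection, range 0-20
print(round(rescale_to_100(8.4, 10), 1))    # Spiritual well-being LASA, range 0-10
```

Putting every instrument on a common 0-100 scale is what allows a single rule, such as a 10-point change, to be applied across LASA items, POMS subscales and totals.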