Measurement of Fatigue in Cancer Clinical Trials


Measurement of Fatigue in
Cancer Clinical Trials
Session V
8:30am – 10:00am
Tuesday, Oct. 16, 2007
Key Questions
1. What are the key attributes of a fatigue
measure to be used in a clinical trial?
2. How important is measurement error if
fatigue is a primary or secondary
endpoint?
3. What is the appropriate reference
period?
4. What are the issues and challenges to
assessing fatigue longitudinally within
a trial (response shift, meaningful
change)?
Brief Presentations
• Desirable Attributes of a Fatigue
Measure
– Bryce B. Reeve, Ph.D.
• The Accuracy of Fatigue Recall Ratings
– Mark P. Jensen, Ph.D.
• Longitudinal Assessment of Fatigue in
Cancer Trials
– Charles S. Cleeland, Ph.D.
Discussants
• William Frey, Ph.D.
• Maria Sgambati, M.D.
• Ralph Turner, Ph.D.
• ASCPRO Members
Desirable Attributes of a PRO Measure
of Fatigue for Cancer Clinical Trials
Borrowing heavily from:
Before we begin…
1. This list is intended to be a resource for
determining some of the optimal qualities to
consider in selecting a PRO measure of
fatigue for a trial. PROs will vary in all
aspects of the properties listed below. No
one instrument need satisfy all criteria to be
considered appropriate for use in a trial.
2. The level of evidence to support use of a
PRO instrument should be judged by how
well the instrument has been evaluated for
the listed properties in a sample that
resembles the study target population.
Conceptual and Measurement Model
• How is fatigue defined by the developer?
• Does the definition capture the important
elements, attributes, or sub-domains of
importance for your study?
• Does the developer provide a conceptual
model?
• Do the scales reflect the concept definition?
• Do empirical analyses confirm the factor
structure described in the conceptual model?
Unidimensional Fatigue Scales

Scale Name                                      What is Assessed?
Brief Fatigue Inventory (BFI)                   Severity
Cancer-Related Fatigue Distress Scale (CRFDS)   Impact
Fatigue Severity Scale (FSS)                    Impact & Functional Outcomes
FACT-F                                          Severity & Impact
Global Vigor and Affect (GVA)                   Severity
Pearson-Byars Fatigue Feeling Checklist         Severity
Rhoten Fatigue Scale                            Severity
Multidimensional Fatigue Scales

Scale Name                                          What is Assessed?
Cancer Fatigue Scale                                Phenomenology & Severity
Fatigue Symptom Inventory                           Severity, Impact, & Duration
Multidimensional Fatigue Inventory (MFI-20)         General, physical, mental, reduced motivation, reduced activity
Multidimensional Fatigue Symptom Inventory (MFSI)   Experience, somatic symptoms, cognitive, affective & behavioral symptoms
Piper Fatigue Scale                                 Behavioral/severity, affective meaning, sensory, cognitive/mood
Schwartz Cancer Fatigue Scale (SCFS)                Physical, emotional, cognitive, temporal
[Diagrams: alternative factor structures for a fatigue measure — (1) a single Fatigue factor with five severity items and five impact items as indicators; (2) separate Fatigue Severity and Fatigue Impact factors, each with five items; (3) a higher-order Fatigue factor above the Fatigue Severity and Fatigue Impact factors.]
Reliability
• Does each scale (or sub-scale) meet
the minimum level of reliability for
group level measurement?
• Does psychometric evidence support
minimal floor and ceiling effects of the
measure for the study population?
Comparison of Measurement Precision
PROMIS CAT vs. Legacy Measures
[Figure: standard error of measurement plotted across the fatigue continuum (no fatigue to severe fatigue) for the 4-item SF-36/Vitality scale, a 4-item CAT, the 13-item FACIT-Fatigue, a 13-item CAT, and the full 98-item bank. Reference lines mark SE = 0.32 (reliability = 0.90) and SE = 0.22 (reliability = 0.95).]
PROMIS CAT Outperforms Legacy Questionnaires
[Figure: standard error of measurement across the physical-function continuum (disabled to high physical functioning; US general population mean at T = 50) for the 10-item SF-36 physical function scale, the 20-item HAQ, a 10-item PROMIS CAT, and the full item bank. Precision increases as standard error decreases toward the reliability = 0.90 line.]
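The reliability annotations in the precision figures (SE = 0.32 at reliability = 0.90; SE = 0.22 at reliability = 0.95) follow from a standard IRT relationship not stated explicitly in the slides: on a standardized theta metric with unit variance, marginal reliability ≈ 1 − SE². A minimal sketch under that assumption:

```python
# Assumption: scores are on a standardized IRT theta metric (variance 1),
# where marginal reliability relates to the standard error of
# measurement (SE) as: reliability = 1 - SE**2.

def reliability_from_se(se: float) -> float:
    """Marginal reliability implied by a standard error of measurement."""
    return 1.0 - se ** 2

for se in (0.32, 0.22):
    print(f"SE = {se:.2f} -> reliability = {reliability_from_se(se):.2f}")
# SE = 0.32 -> reliability = 0.90
# SE = 0.22 -> reliability = 0.95
```

This is why shorter CATs can match long fixed forms: the CAT keeps the standard error (and hence reliability) near-constant across the severity continuum instead of only near the scale's center.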
Why is reliability important in a clinical
trial?
• Ratings of subjective experiences involve measurement error, which must be compensated for either by using well-designed assessment tools or by increasing the number of subjects studied.
• A study found that improving reliability from .70 to .90 decreases sample size requirements by 22%.
– Perkins DO, Wyatt RJ, Bartko JJ. Penny-wise and pound-foolish: The impact of measurement error on sample size requirements in clinical trials. Biol Psychiatry. 2000;47:762-766.
• If your primary endpoint is a PRO, then you want a
measure with the least error to detect group
differences.
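The 22% figure can be reproduced with a simplified classical-test-theory argument (a sketch of the logic, not the Perkins et al. calculation itself): measurement error attenuates an observed standardized effect size by √reliability, and required sample size scales with 1/effect², hence with 1/reliability.

```python
# Sketch (assumption: classical test theory): error attenuates the
# observed standardized effect size by sqrt(reliability), so the
# required sample size scales as 1 / reliability.

def sample_size_ratio(rel_low: float, rel_high: float) -> float:
    """Required N with the more reliable measure, as a fraction of
    the N required with the less reliable measure."""
    return rel_low / rel_high

reduction = 1 - sample_size_ratio(0.7, 0.9)
print(f"Sample size reduction: {reduction:.0%}")  # prints "Sample size reduction: 22%"
```

Under this approximation, moving from reliability .70 to .90 cuts the required sample by .7/.9 ≈ 78%, i.e., a 22% reduction, matching the figure cited on the slide.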
Content Validity
• What evidence was used to determine if
the instrument measures the
appropriate content and represents the
variety of attributes that make up
fatigue?
• What methods were used to evaluate
readability and common/shared
interpretation of item wording?
Face Validity Example
Construct Validity
• To what extent does the fatigue scale/subscale correlate with measures to which it is similar (i.e., convergent validity) and not correlate with measures that are dissimilar (i.e., divergent validity)?
• Is the scale able to differentiate among
groups known from past research to have
different levels of fatigue?
• What is the ability of the fatigue scale to
measure change in severity over a period in
which a treatment is known to have an effect
on fatigue (i.e., responsiveness, sensitivity to
change)?
Reference / Recall Period
• What reference period is used to measure fatigue (e.g., present, past 24 hours, 2-3 days, 7 days, 4 weeks)?
Reference / Recall Period
General School of Thought
• Compared with average momentary
ratings, recall bias increases as the
length of recall extends from one day to
28 days.
• Recall ratings for fatigue are higher
than the average of the momentary
assessment.
Interpretability
• What efforts have been made to make
the scores on the fatigue measure
easily understood by decision-makers
(providers, patients, regulators, payers,
and researchers)?
• Has a minimally important difference (MID) been estimated for the measure based on an appropriate anchor?
Response Burden
• How many items are in the
questionnaire?
• What is the response scale (e.g., 5-point response scale, visual analog scale, checklist)?
• What education level is required to read
and comprehend the questionnaire?
Administrative Burden
• What is the average and range of time
needed to complete the questionnaire
via interviewer-administered
assessment (if necessary)?
• What resources (e.g., laptop, internet
connection) are required to administer
the CRF measure?
• How difficult is it to record responses
and to score the CRF measure?
Alternate Accessible Forms
• What modes are available for
administering the fatigue measure
(paper, computer, IVR)?
• What psychometric evidence is there
for the equivalence of results from
different modes?
Cultural / Language Adaptations
• What languages are available?
– What process was used to translate the form into different languages?
• What psychometric and qualitative validation studies have been performed to assess conceptual equivalence?
Proprietary Rights
• Is there a cost associated with using the tool, or are there copyright restrictions?
Single vs. Multiple Item Fatigue Measure

Single Item
• Short
• Capture in multiple assessments
• Reliable?
• Content validity?

Multi-Item
• 3-7 items
• Reduced error
• Greater content validity
• Redundant?
• More fatigue?
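The "reduced error" advantage of multi-item measures can be illustrated with the Spearman-Brown prophecy formula, a standard classical-test-theory result (not taken from these slides; the reliability values below are hypothetical):

```python
def spearman_brown(r_single: float, k: float) -> float:
    """Projected reliability of a scale k times as long as a scale
    with reliability r_single (Spearman-Brown prophecy formula),
    assuming the added items are parallel to the original."""
    return k * r_single / (1 + (k - 1) * r_single)

# Hypothetical single item with reliability 0.55, lengthened to a
# 5-item scale of parallel items:
print(round(spearman_brown(0.55, 5), 2))  # -> 0.86
```

This captures the trade-off on the slide: a handful of parallel items can push a marginally reliable single item well above the usual group-level threshold, at the cost of added length and possible redundancy.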
Assessment of Fatigue as an
Endpoint in a Clinical Trial.
Consistency in our Message