No Slide Title
Generic measures:
limitations of use within
specific settings?
J Freeman
Institute of Health Studies
Plymouth University
Properties
Clinical
feasibility
Psychometric
reliability
validity
appropriateness
responsiveness
Validity
Does it measure what it says
it measures?
Content validity
Criterion validity
Construct (convergent
and discriminant)
Bowling 1997
Construct validity
The extent to which
empirical data supports
hypotheses concerning
the attributes being
measured
detective work
jigsaw puzzle
Appropriateness
Is the range of the
construct measured within
the sample similar to the
range covered by the
instrument?
Van der Putten et al. 1999
The 36-item Short Form
Health Survey (SF-36):
Gold-standard generic
self-report measure of
health status
Adopted & disseminated
world-wide
Standardised UK and US
version
SF-36 dimensions
Dimensions                    No. items
Physical function             10
Physical role limitations     3
Emotional role limitations    3
Emotional well-being          5
Bodily Pain                   2
Energy / vitality             4
Social function               2
Health perceptions            5
The SF-36
Relatively few studies
have evaluated its use as
an outcome measure for
clinical practice or clinical
trials in MS
Aim of study
To explore the reliability,
validity, clinical
appropriateness, and
responsiveness of the SF-36
in MS patients within a
health care setting
Methods
150 patients with clinically
definite MS
Broad spectrum of disease
severity
Assessments completed once
in 106 patients and twice
in 44 rehabilitation
inpatients
Assessments
Disease severity: EDSS
Health Status: SF-36
Disability: FIM
Handicap: LHS
Emotional well-being: GHQ
Assessment of
construct validity...
Convergent validity
Correlations between SF-36
dimensions & instruments
measuring similar & different
constructs
Group differences validity
ANOVA to differentiate
between different groups
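As a rough illustration of the group-differences check, the sketch below computes a one-way ANOVA F statistic in plain Python. The function name `one_way_anova_f` and the example scores are hypothetical and are not taken from the study; a statistics package would normally be used to obtain the p-value as well.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic for a list of score lists."""
    all_scores = [x for g in groups for x in g]
    grand_mean = mean(all_scores)
    k = len(groups)        # number of groups
    n = len(all_scores)    # total number of observations
    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual scores around their own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within

# Hypothetical health-status scores for three disease-severity groups
mild = [70, 75, 80]
moderate = [50, 55, 60]
severe = [20, 30, 25]
f_stat = one_way_anova_f([mild, moderate, severe])
```

A large F value indicates that the group means differ by more than the within-group scatter would suggest.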
...Assessment of
construct validity
Hypothesis testing
T-tests to investigate
whether results are in line with
theoretical expectations
Assessment of
appropriateness
Examination of the scale
score distributions of the 8
dimensions and the 2
summary components of
the SF-36 and all other
measures
range, mean, sd, floor, ceiling
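The distribution summary listed above could be computed along the following lines. This is a minimal sketch: the function name `score_distribution` is hypothetical, and the 0 to 100 scoring range is an assumption about how the scale scores are expressed, so both limits are left as parameters.

```python
from statistics import mean, stdev

def score_distribution(scores, min_score=0, max_score=100):
    """Summarise a scale: range, mean, SD, and floor/ceiling percentages."""
    n = len(scores)
    return {
        "range": (min(scores), max(scores)),
        "mean": mean(scores),
        "sd": stdev(scores),
        # Floor: share of respondents at the lowest possible score
        "floor_pct": 100 * sum(s == min_score for s in scores) / n,
        # Ceiling: share of respondents at the highest possible score
        "ceiling_pct": 100 * sum(s == max_score for s in scores) / n,
    }

# Hypothetical dimension scores for five respondents
summary = score_distribution([0, 0, 50, 100, 25])
```

A high floor or ceiling percentage means many respondents cannot register further deterioration or improvement on that scale.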
Sample characteristics
Mean age: 45 (24 - 78 yrs)
Female: 68%
Disease pattern: SP 50%, PP 11%, RR 33%, Benign 6%
Mean years since diagnosis: 11 (0.1 - 38)
Mean EDSS: 5.7 (1 - 9)
Results: convergent &
discriminant validity
Convergent & discriminant
validity supported
Substantial correlations with related
scales, e.g. FIM with SF-36 physical
function (r = 0.68), EDSS (r = 0.82)
Weak correlations with unrelated scales,
e.g. GHQ with SF-36 physical function
(r = 0.26)
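For readers who want to reproduce this kind of check, the sketch below computes a correlation coefficient between two sets of scores. It assumes Pearson's product-moment coefficient, since the slides do not say which coefficient was used; the function name `pearson_r` and the data are hypothetical.

```python
from math import sqrt
from statistics import mean

def pearson_r(x, y):
    """Pearson correlation between paired score lists x and y."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical paired scores on two related scales
scale_a = [10, 20, 30, 40, 50]
scale_b = [12, 18, 33, 41, 47]
r = pearson_r(scale_a, scale_b)
```

Convergent validity expects a substantial r between related scales; discriminant validity expects a weak r between unrelated ones.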
Results: group
differences validity
Group differences validity
supported
Significant differences
demonstrated in health status at
different levels of disease severity
(p<0.05)
Results: hypothesis
testing
As hypothesised:
Patients requiring carer
assistance reported lower
physical scores (p<0.0001)
Patients scoring > 5 GHQ
points reported lower SF-36
emotional scores (p<0.0001)
Results:
appropriateness
Scores span the entire
spectrum of available
range
Significant floor and
ceiling effects (>20%) in:
- physical function
- physical role limitations
- emotional role limitations
- bodily pain
Results:
appropriateness
Floor & ceiling effects
particularly marked when
patient selection restricted to
narrow range
- physical dimensions 52% floor in
severe group
- physical role limitations 84% floor in
severe group
- role limitations 45% ceiling in mild
group
Implications
[Figure: score range, with floor and ceiling marked]
Implications
Spectrum of SF-36 scale too
limited to detect changes which
may occur in people with MS (pwMS)
Likely to limit its potential
responsiveness
Limited usefulness within specific
MS populations/settings
Recommendations
Generic measures should be
tested for specific populations
and for specific purposes
When evaluating health status
in MS the SF-36 should be
supplemented with other
relevant & validated measures
to ensure comprehensive &
valid measurement
Recommendations
Clinicians & researchers should
understand the properties of an
outcome measure when
choosing an instrument and
interpreting the information it
generates
...the measure you choose is key
in determining effectiveness
Properties of Outcome
Measures
Clinical
feasibility
Psychometric
reliability
validity
appropriateness
responsiveness
Reliability of gait
measurements using
CODAmpx30 motion analysis
system
Veronica Maynard
Institute of Health Studies
University of Plymouth
Reliability
Reliability refers to the
consistency or repeatability of
a measurement taken under the
same conditions
Factors affecting
reliability
instrumental reliability: reliability of measurement device
rater reliability: reliability of rater administering measurement device
response reliability: reliability/stability of variable being measured
Sources of error
Measurement error
difference between a
measurement & its true
value
Systematic error
bias resulting from one or
more processes
Random error
Reliability
3 broad categories
of reliability:
equivalence
(reproducibility)
stability (repeatability)
internal consistency
(homogeneity)
Types of reliability & how they
are determined
Reliability
Equivalence or reproducibility: inter-rater reliability
Stability or consistency: intra-rater or test-retest reliability
Internal consistency: split-half reliability & item analysis
(Adapted from: Sim & Wright 2000, p.132)
Aim of study
To determine intra-rater
and inter-rater reliability
of gait measurements using
CODA mpx30 motion
analysis system
Reliability studies (I)
Intra-rater reliability
study:
10 healthy subjects
mean age 39.2 (29-52) yrs
3 recordings
single trained observer
Reliability studies (II)
Inter-rater reliability
study:
19 healthy subjects
mean age 34.4 (20-49) yrs
3 trained observers
Procedure
self-selected speed
Investigators blind
Points for analysis:
i) initial contact (IC)
ii) mid-stance (MSt) and
iii) mid-swing (MSw)
Stick figure illustrations of position of right leg
(red) at 1) IC 2) MSt and 3) MSw. Joint angles,
moments and powers were determined at these
points in the gait cycle.
Procedure (cont)
Spatiotemporal parameters:
walking velocity
duration of stance
duration of swing
Kinematic variables:
hip, knee & ankle angles at IC, MSt
and MSw
Kinetic variables:
moments & power at hip, knee, ankle
at IC and MSt
Analysis
Sagittal plane data
Bland & Altman methods
Intraclass correlation coefficient (ICC)
to determine consistency and
agreement among ratings
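The two analyses named above can be sketched as follows. This is an illustration only: the slides do not state which ICC form was used, so the code assumes the single-measure two-way forms of Shrout & Fleiss, ICC(3,1) for consistency and ICC(2,1) for absolute agreement. The function names and the example data are hypothetical.

```python
from statistics import mean, stdev

def bland_altman(a, b):
    """Return (bias, lower limit, upper limit) of agreement for paired data."""
    diffs = [x - y for x, y in zip(a, b)]
    bias = mean(diffs)
    spread = 1.96 * stdev(diffs)  # 95% limits of agreement
    return bias, bias - spread, bias + spread

def icc_single(ratings):
    """ratings: one row per subject, one column per rater or session.
    Returns (consistency ICC(3,1), agreement ICC(2,1))."""
    n, k = len(ratings), len(ratings[0])
    grand = mean([x for row in ratings for x in row])
    row_means = [mean(row) for row in ratings]
    col_means = [mean(col) for col in zip(*ratings)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # raters
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_error / ((n - 1) * (k - 1))
    consistency = (msr - mse) / (msr + (k - 1) * mse)
    agreement = (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)
    return consistency, agreement

# Hypothetical ankle angles (degrees) from two sessions on four subjects
am = [10, 12, 14, 16]
pm = [9, 12, 15, 16]
bias, lower, upper = bland_altman(am, pm)
consistency, agreement = icc_single([[a, p] for a, p in zip(am, pm)])
```

Consistency ignores a constant offset between raters, whereas agreement penalises it, which is why the two values can differ for the same data.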
[Figure: Right Ankle Sagittal Rotation. Vertical axis: Dorsiflexion (+ve) (degrees), -20 to 20. Horizontal axis: Time (% Gait Cycle), 0 to 100, divided into Stance Phase and Swing Phase, with IC, MSt, TO and MSw marked.]
Graphical illustration of sagittal plane joint movement
of the ankle during a single gait cycle (dorsiflexion
positive, plantarflexion negative). IC= Initial contact;
MSt = Mid stance; TO = Toe off; MSw = Mid swing
Results (I)
Intra-rater study:
Good agreement for spatiotemporal
parameters
Generally low ICC values (ICC
< 0.75) for all parameters
Bland & Altman plots:
reasonable agreement for
kinematic data at ankle and
knee
Summary of key findings (II)
Inter-rater study:
Generally good agreement for
spatio-temporal parameters
(ICC > 0.70)
Lower ICC values & wide limits
of agreement for kinematic
data (especially hip)
[Figure: Bland & Altman plots, panels 1) and 2). Horizontal axis: angle mean am-pm (degrees). Vertical axis: angle difference am-pm (degrees).]
Examples of distribution plots from Bland &
Altman test for am-pm repeatability showing mean
measurements against differences between
measurements for ankle range of motion (degrees) at 1)
initial contact and 2) mid stance.
Factors affecting
reliability
Errors associated with marker placement
Soft tissue motion
Natural variation in individual
gait cycle
Sampling rate
Recommendations
Standard protocol for marker
placement
Training of observers
Averaging of min 3 gait cycles
(Winter 1984)
Interpret with caution data from
single cycle
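The averaging recommendation might be implemented roughly as below. This sketch assumes each gait cycle has already been time-normalised to the same number of samples (for example, one value per % of the gait cycle); the function name `average_cycles` and the data are hypothetical.

```python
from statistics import mean

def average_cycles(cycles, min_cycles=3):
    """Point-by-point mean of several time-normalised gait cycles."""
    if len(cycles) < min_cycles:
        raise ValueError(f"need at least {min_cycles} gait cycles")
    if len({len(c) for c in cycles}) != 1:
        raise ValueError("cycles must have the same number of samples")
    # zip(*cycles) groups the samples taken at the same point in each cycle
    return [mean(samples) for samples in zip(*cycles)]

# Hypothetical ankle angles (degrees) at four points in three cycles
cycles = [
    [-2.0, 5.0, 10.0, -8.0],
    [-1.0, 6.0, 11.0, -9.0],
    [-3.0, 4.0, 12.0, -7.0],
]
averaged = average_cycles(cycles)
```

Averaging reduces the influence of natural cycle-to-cycle variation on any single reported value.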
General Recommendations
Standard protocol
Training
Averaging may be required
Determine level of error
Assess reliability before use in
research/clinically
Assess reliability in population under
study
Responsiveness
S.K. Spooner PhD BSc SRCh
Scheme Co-ordinator
Podiatry
Properties
Clinical
feasibility
Psychometric
reliability
validity
appropriateness
responsiveness
Responsiveness to
Change
HRQOL measures should
be responsive to
interventions that change
HRQOL
Evaluating responsiveness
requires assessing HRQOL
relative to an external
indicator of change
Testing for
Responsiveness
Measurement tools should
be tested on patients
receiving treatment of
known efficacy
Capable of detecting
treatment effects?
Responsiveness
Indices
Effect size (ES) = D/SD
Standardized Response Mean
(SRM) = D/SD+
Guyatt responsiveness
statistic (RS) = D/SD++
Where:
D = mean change
SD = baseline SD
SD+ = SD of D
SD++ = SD of D among “unchanged”
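The three indices defined above can be computed as in this minimal sketch. The function name `responsiveness_indices`, its argument names, and the example scores are hypothetical; sample standard deviations (n - 1 denominator) are assumed.

```python
from statistics import mean, stdev

def responsiveness_indices(baseline, follow_up, unchanged_changes):
    """Return ES, SRM and Guyatt RS for paired baseline/follow-up scores.

    unchanged_changes: change scores from patients judged "unchanged"
    by an external indicator (needed for the Guyatt statistic).
    """
    changes = [f - b for b, f in zip(baseline, follow_up)]
    d = mean(changes)  # D = mean change
    return {
        "ES": d / stdev(baseline),            # D / baseline SD
        "SRM": d / stdev(changes),            # D / SD of D
        "RS": d / stdev(unchanged_changes),   # D / SD of D among unchanged
    }

# Hypothetical scores at admission and discharge
admission = [40, 50, 60]
discharge = [45, 56, 64]
stable_patient_changes = [1, -1, 0, 2, -2]
indices = responsiveness_indices(admission, discharge, stable_patient_changes)
```

All three share the same numerator and differ only in which variability the mean change is compared against.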
So How Big Are
Different Changes?
Effect size benchmarks
Small: 0.20 - 0.49
Moderate: 0.50 - 0.79
Large: 0.80 or above
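These benchmarks translate into a simple lookup, sketched below. The function name `classify_effect_size` is hypothetical, and the label "negligible" for values under 0.20 is an assumption, since the benchmarks above do not name that band.

```python
def classify_effect_size(es):
    """Label an effect size using the 0.20 / 0.50 / 0.80 benchmarks."""
    size = abs(es)  # direction of change does not affect magnitude
    if size >= 0.80:
        return "large"
    if size >= 0.50:
        return "moderate"
    if size >= 0.20:
        return "small"
    return "negligible"  # assumed label for values below 0.20

labels = [classify_effect_size(v) for v in (0.10, 0.30, 0.56, 0.93)]
```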
Example 1
Freeman, J. et al.: Clinical
appropriateness: a key
factor in outcome
measurement selection:
the 36 item short health
survey in multiple
sclerosis. J Neurol
Neurosurg Psychiatry
2000; 68:150-156
Results
n=44
Effect sizes for SF-36
dimensions ranged from
negligible to small (0.01-0.30)
Pain & Physical Function
demonstrated statistically
significant change from
admission to discharge
Results
In contrast:
Functional independence
measure (ES = 0.56)
London Handicap Scale
(ES = 0.58)
28-item General Health
Questionnaire (ES = 0.51)
Example 2
Mens, J.M. et al:
Reliability & validity of hip
abduction strength to
measure disease severity
in posterior pelvic pain
since pregnancy. Spine
2002; 27(15): 1674-9
Example 2
Responsiveness of hip
abduction strength
expressed as standardized
response mean was
compared with
responsiveness of Quebec
Back Pain Disability Scale
in patients with PPPP
Results
Responsiveness of hip
abduction strength was
“large” (SRM =0.93)
In comparison, Quebec
Back Pain Disability Scale
(SRM = 1.20)
Change and
Responsiveness
Depends on Treatment
[Figure: Impact on SF-36 (vertical axis) for different Treatment Outcomes (horizontal axis).]
Magnitude of Change
Should Parallel
Underlying Change
[Figure: Change in HRQOL (vertical axis, 0 to 12) against Size of Intervention (horizontal axis: Feather, Rock, Car, Train).]
Generic vs Condition
Specific Instruments
SF-36 is generic measure,
and may contain items
unrelated to disease being
studied.
Generic vs Condition
Specific Instruments
Generic instruments are
most useful in
discriminating and making
comparisons of different
disease states for
determining severity of
disease impact and cross-condition comparisons.
Generic vs Condition
Specific Instruments
Disease-specific
instruments can assess
limitations or restrictions
associated with particular
disease states.
May be more responsive
to minimally significant
changes.
Value Depends on
Cost
Whatever instrument is
employed, the importance
of HRQOL change
depends on what it costs
to produce it!