No Slide Title

Generic measures: limitations of use within specific settings?
J Freeman
Institute of Health Studies
Plymouth University
Properties
 Clinical
 Feasibility
 Psychometric: reliability, validity, appropriateness, responsiveness
Validity
Does it measure what it says it measures?
 Content validity
 Criterion validity
 Construct (convergent and discriminant)
Bowling 1997
Construct validity
 The extent to which empirical data supports hypotheses concerning the attributes being measured
 detective work
 jigsaw puzzle
Appropriateness
 Is the range of the construct measured within the sample similar to the range covered by the instrument?
Van der Putten et al. 1999
The 36-item Short Form Health Survey (SF-36):
 Gold-standard generic self-report measure of health status
 Adopted & disseminated world-wide
 Standardised UK and US versions
SF-36 dimensions
Dimension                      No. of items
Physical function              10
Physical role limitations      4
Emotional role limitations     3
Emotional well-being           5
Bodily pain                    2
Energy / vitality              4
Social function                2
Health perceptions             5
The SF-36
 Relatively few studies have evaluated its use as an outcome measure for clinical practice or clinical trials in MS
Aim of study
 To explore the reliability, validity, clinical appropriateness, and responsiveness of the SF-36 in MS patients within a health care setting
Methods
 150 patients with clinically definite MS
 Broad spectrum of disease severity
 Assessments completed once in 106 patients, and twice in 44 rehabilitation inpatients
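Assessments
 Disease severity: Expanded Disability Status Scale (EDSS)
 Health status: SF-36
 Disability: Functional Independence Measure (FIM)
 Handicap: London Handicap Scale (LHS)
 Emotional well-being: General Health Questionnaire (GHQ)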
Assessment of construct validity...
 Convergent validity: correlations between SF-36 dimensions & instruments measuring similar & different constructs
 Group differences validity: ANOVA to differentiate between groups
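The convergent-validity and group-differences analyses rest on two standard statistics. A minimal pure-Python sketch follows; it assumes Pearson's r (the slides do not state whether Pearson or Spearman coefficients were used), and `pearson_r` and `one_way_anova_f` are hypothetical helper names, not the study's own code.

```python
import math
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def one_way_anova_f(groups):
    """F statistic for a one-way ANOVA across two or more groups of scores
    (e.g. SF-36 scores grouped by disease-severity band)."""
    all_scores = [s for g in groups for s in g]
    grand = mean(all_scores)
    k, n = len(groups), len(all_scores)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((s - mean(g)) ** 2 for g in groups for s in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))
```

In practice `scipy.stats.pearsonr` and `scipy.stats.f_oneway` compute the same statistics and also return p-values.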
...Assessment of construct validity
 Hypothesis testing: t-tests to investigate whether results are in line with theoretical expectations
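A hypothesis test of this kind compares mean scores between two groups, for example patients who do and do not require carer assistance. A minimal sketch, assuming an independent two-sample Student's t-test with pooled variance (the slides do not say which t-test variant was used); `student_t` is a hypothetical name.

```python
import math
from statistics import mean, variance


def student_t(a, b):
    """Independent two-sample t statistic with pooled variance.
    Returns (t, degrees of freedom); assumes equal group variances."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    se = math.sqrt(pooled * (1 / na + 1 / nb))
    return (mean(a) - mean(b)) / se, na + nb - 2
```

`scipy.stats.ttest_ind` gives the same statistic (with `equal_var=True`) plus a p-value; setting `equal_var=False` switches to Welch's test if variances differ.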
Assessment of appropriateness
 Examination of the scale score distributions of the 8 dimensions and the 2 summary components of the SF-36 and all other measures: range, mean, SD, floor, ceiling
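SF-36 dimension scores are conventionally transformed to a 0 (worst) to 100 (best) scale, so floor and ceiling effects can be read off as the percentage of respondents scoring exactly 0 or 100. A minimal sketch of the distribution summary listed above, assuming that 0-100 range and the >20% threshold used on the results slides; `distribution_summary` is a hypothetical name.

```python
from statistics import mean, stdev


def distribution_summary(scores, lowest=0.0, highest=100.0, threshold=20.0):
    """Range, mean, SD and % of respondents at the scale floor and ceiling.
    A floor/ceiling effect is flagged when more than `threshold` % sit there."""
    n = len(scores)
    floor_pct = 100.0 * sum(s == lowest for s in scores) / n
    ceiling_pct = 100.0 * sum(s == highest for s in scores) / n
    return {
        "range": (min(scores), max(scores)),
        "mean": mean(scores),
        "sd": stdev(scores),
        "floor_pct": floor_pct,
        "ceiling_pct": ceiling_pct,
        "floor_effect": floor_pct > threshold,
        "ceiling_effect": ceiling_pct > threshold,
    }
```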
Sample characteristics
 Mean age: 45 years (range 24-78)
 Female: 68%
 Disease pattern: secondary progressive (SP) 50%, primary progressive (PP) 11%, relapsing-remitting (RR) 33%, benign 6%
 Mean years since diagnosis: 11 (range 0.1-38)
 Mean EDSS: 5.7 (range 1-9)
Results: convergent & discriminant validity
 Convergent & discriminant validity supported
Substantial correlations with related scales, e.g. FIM (r = 0.68) and EDSS (r = 0.82) with SF-36 physical function
Weak correlations with unrelated scales, e.g. GHQ with SF-36 physical function (r = 0.26)
Results: group differences validity
 Group differences validity supported
Significant differences in health status demonstrated at different levels of disease severity (p<0.05)
Results: hypothesis testing
As hypothesised:
 Patients requiring carer assistance reported lower physical scores (p<0.0001)
 Patients scoring > 5 GHQ points reported lower SF-36 emotional scores (p<0.0001)
Results: appropriateness
 Scores span the entire available range
 Significant floor and ceiling effects (>20%) in:
- physical function
- physical role limitations
- emotional role limitations
- bodily pain
Results: appropriateness
 Floor & ceiling effects particularly marked when patient selection is restricted to a narrow range:
- physical dimensions: 52% floor in severe group
- physical role limitations: 84% floor in severe group
- role limitations: 45% ceiling in mild group
Implications
[Figure: SF-36 score range, with floor and ceiling marked]
Implications
Spectrum of SF-36 scales too limited to detect changes that may occur in people with MS (pwMS)
→ likely to limit its potential responsiveness
→ limited usefulness within specific MS populations/settings
Recommendations
 Generic measures should be tested for specific populations and for specific purposes
 When evaluating health status in MS, the SF-36 should be supplemented with other relevant & validated measures to ensure comprehensive & valid measurement
Recommendations
Clinicians & researchers should understand the properties of an outcome measure when choosing an instrument and interpreting the information it generates
...the measure you choose is key in determining effectiveness
Properties of Outcome Measures
 Clinical
 Feasibility
 Psychometric: reliability, validity, appropriateness, responsiveness
Reliability of gait measurements using the CODA mpx30 motion analysis system
Veronica Maynard
Institute of Health Studies
University of Plymouth
Reliability
 Reliability refers to the consistency or repeatability of a measurement taken under the same conditions
Factors affecting reliability
 Instrumental reliability: reliability of the measurement device
 Rater reliability: reliability of the rater administering the measurement device
 Response reliability: reliability/stability of the variable being measured
Sources of error
 Measurement error: difference between a measurement & its true value
 Systematic error: bias resulting from one or more processes
 Random error
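The three error terms above are often written together as the classical measurement model (added here for illustration, not taken from the slides): an observed value X is the true value T plus a measurement error E, which splits into a systematic bias B and a random component ε with zero mean.

```latex
X = T + E, \qquad E = B + \varepsilon, \qquad \mathrm{E}[\varepsilon] = 0
```

Averaging repeated measurements shrinks the contribution of ε but leaves B unchanged, which is why systematic error has to be tackled through protocol and training rather than more repetitions.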
Reliability
3 broad categories of reliability:
 equivalence (reproducibility)
 stability (repeatability)
 internal consistency (homogeneity)
Types of reliability & how they are determined
Reliability type                   How determined
Equivalence or reproducibility     Inter-rater reliability
Stability or consistency           Intra-rater or test-retest reliability
Internal consistency               Split-half reliability & item analysis
(Adapted from: Sim & Wright 2000, p.132)
Aim of study
To determine intra-rater and inter-rater reliability of gait measurements using the CODA mpx30 motion analysis system
Reliability studies (I)
Intra-rater reliability study:
 10 healthy subjects
 mean age 39.2 (29-52) yrs
 3 recordings
 single trained observer
Reliability studies (II)
Inter-rater reliability study:
 19 healthy subjects
 mean age 34.4 (20-49) yrs
 3 trained observers
Procedure
 Self-selected speed
 Investigators blinded
Points for analysis:
i) initial contact (IC)
ii) mid stance (MSt)
iii) mid swing (MSw)
[Figure: stick figures 1), 2) and 3)]
Stick figure illustrations of the position of the right leg (red) at 1) IC, 2) MSt and 3) MSw. Joint angles, moments and powers were determined at these points in the gait cycle.
Procedure (cont)
 Spatiotemporal parameters: walking velocity, duration of stance, duration of swing
 Kinematic variables: hip, knee & ankle angles at IC, MSt and MSw
 Kinetic variables: moments & power at hip, knee and ankle at IC and MSt
Analysis
 Sagittal plane data
 Bland & Altman methods
 Intraclass correlation coefficient (ICC) to determine consistency and agreement among ratings
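"Consistency" and "agreement" correspond to two different single-measure ICC forms computed from the same two-way ANOVA table: consistency ignores a constant offset between raters, absolute agreement penalises it. A minimal sketch, assuming complete data (every subject rated by every rater or session); the slides do not state which ICC model was used, and `icc_single` is a hypothetical name. The two values correspond to what are often labelled ICC(3,1) and ICC(2,1) in the Shrout and Fleiss scheme.

```python
from statistics import mean


def icc_single(ratings):
    """ratings: n subjects x k raters (or sessions), no missing values.
    Returns (consistency ICC, absolute-agreement ICC), single-measure forms."""
    n, k = len(ratings), len(ratings[0])
    grand = mean(x for row in ratings for x in row)
    row_means = [mean(row) for row in ratings]
    col_means = [mean(row[j] for row in ratings) for j in range(k)]
    # Two-way ANOVA sums of squares: subjects, raters, residual
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    icc_c = (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err)
    icc_a = (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )
    return icc_c, icc_a
```

For example, a rater who always reads 1 degree higher than another gives a consistency ICC of 1 but an agreement ICC below 1.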

[Figure: Right ankle sagittal rotation; dorsiflexion (+ve) (degrees) against time (% gait cycle), with stance and swing phases and IC, MSt, TO and MSw marked]
Graphical illustration of sagittal plane joint movement of the ankle during a single gait cycle (dorsiflexion positive, plantarflexion negative). IC = initial contact; MSt = mid stance; TO = toe off; MSw = mid swing
Results (I)
 Intra-rater study:
 Good agreement for spatiotemporal parameters
 Generally low ICC values (ICC < 0.75) for all parameters
 Bland & Altman plots showed reasonable agreement for kinematic data at ankle and knee
Summary of key findings (II)
 Inter-rater study:
 Generally good agreement for spatiotemporal parameters (ICC > 0.70)
 Lower ICC values & wide limits of agreement for kinematic data (especially hip)
[Figure: two Bland & Altman plots, 1) and 2); x-axis: angle mean am-pm (degrees); y-axis: angle difference am-pm (degrees)]
Examples of distribution plots from the Bland & Altman test for am-pm repeatability, showing mean measurements against differences between measurements for ankle range of motion (degrees) at 1) initial contact and 2) mid stance.
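Each point in a Bland & Altman plot is one subject: the x value is the mean of the two measurements and the y value is their difference. The summary lines are the mean difference (bias) and the 95% limits of agreement, bias ± 1.96 × SD of the differences. A minimal sketch for paired am/pm angles; `bland_altman` is a hypothetical name, and the 1.96 multiplier assumes roughly normally distributed differences.

```python
from statistics import mean, stdev


def bland_altman(first, second):
    """Bias and 95% limits of agreement between paired measurements
    (e.g. am vs pm joint angles for the same subjects)."""
    diffs = [a - b for a, b in zip(first, second)]
    means = [(a + b) / 2 for a, b in zip(first, second)]
    bias = mean(diffs)
    sd = stdev(diffs)
    return {
        "means": means,  # x-axis of the plot
        "diffs": diffs,  # y-axis of the plot
        "bias": bias,
        "loa": (bias - 1.96 * sd, bias + 1.96 * sd),
    }
```

Wide limits relative to the clinically meaningful change in a joint angle are what "wide limits of agreement" in the inter-rater results refers to.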
Factors affecting reliability
 Errors associated with marker placement
 Soft tissue motion
 Natural variation in individual gait cycles
 Sampling rate
Recommendations
 Standard protocol for marker placement
 Training of observers
 Averaging of a minimum of 3 gait cycles (Winter 1984)
 Interpret data from a single cycle with caution
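Averaging helps because the random cycle-to-cycle variation in a mean of k cycles has a standard error of SD/√k; it does nothing for systematic error such as a misplaced marker. A minimal sketch that enforces the 3-cycle minimum recommended above and reports the standard error; `average_over_cycles` is a hypothetical name.

```python
from statistics import mean, stdev


def average_over_cycles(values, min_cycles=3):
    """Average one gait variable (e.g. knee angle at IC) across cycles.
    Returns (mean, standard error); refuses fewer than min_cycles values."""
    if len(values) < min_cycles:
        raise ValueError(f"need at least {min_cycles} gait cycles, got {len(values)}")
    return mean(values), stdev(values) / len(values) ** 0.5
```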
General Recommendations
 Standard protocol
 Training
 Averaging may be required
 Determine level of error
 Assess reliability before use in research/clinically
 Assess reliability in the population under study
Responsiveness
S.K. Spooner PhD BSc SRCh
Scheme Co-ordinator
Podiatry
Properties
 Clinical
 Feasibility
 Psychometric: reliability, validity, appropriateness, responsiveness
Responsiveness to Change
 HRQOL (health-related quality of life) measures should be responsive to interventions that change HRQOL
 Evaluating responsiveness requires assessing HRQOL relative to an external indicator of change
Testing for Responsiveness
 Measurement tools should be tested on patients receiving treatment of known efficacy
 Capable of detecting treatment effects?
Responsiveness Indices
 Effect size (ES) = D/SD
 Standardized Response Mean (SRM) = D/SD+
 Guyatt responsiveness statistic (RS) = D/SD++
Where:
D = mean change
SD = baseline SD
SD+ = SD of the individual change scores
SD++ = SD of the change scores among "unchanged" patients
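The three indices share the same numerator (mean change D) and differ only in the denominator. A minimal sketch, assuming sample standard deviations and that the "unchanged" patients have already been identified with an external indicator of change; `responsiveness_indices` is a hypothetical name.

```python
from statistics import mean, stdev


def responsiveness_indices(baseline, follow_up, unchanged_changes):
    """ES, SRM and Guyatt RS as defined above.
    baseline / follow_up: paired scores for treated patients;
    unchanged_changes: change scores of patients judged 'unchanged'."""
    changes = [f - b for b, f in zip(baseline, follow_up)]
    d = mean(changes)  # D = mean change
    return {
        "ES": d / stdev(baseline),            # D / baseline SD
        "SRM": d / stdev(changes),            # D / SD of change scores
        "RS": d / stdev(unchanged_changes),   # D / SD of change in 'unchanged'
    }
```

Because the denominators differ, the same data can give very different ES and SRM values, so indices should only be compared like-for-like.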
So How Big Are Different Changes?
 Effect size benchmarks
 Small: 0.20 - 0.49
 Moderate: 0.50 - 0.79
 Large: 0.80 or above
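The benchmarks translate directly into thresholds. A minimal sketch; classifying by magnitude (so that deteriorations are labelled the same way as improvements) and the "negligible" label below 0.20 (the word used in the Example 1 results) are assumptions, and `effect_size_category` is a hypothetical name.

```python
def effect_size_category(es):
    """Label an effect size (or SRM) using the benchmarks above, by magnitude."""
    size = abs(es)
    if size >= 0.80:
        return "large"
    if size >= 0.50:
        return "moderate"
    if size >= 0.20:
        return "small"
    return "negligible"
```

Applied to the examples that follow, the FIM effect size of 0.56 falls in "moderate" and the hip abduction SRM of 0.93 in "large".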
Example 1
 Freeman, J. et al.: Clinical appropriateness: a key factor in outcome measurement selection: the 36 item short health survey in multiple sclerosis. J Neurol Neurosurg Psychiatry 2000; 68:150-156
Results
 n=44
 Effect sizes for SF-36 dimensions ranged from negligible to small (0.01-0.30)
 Pain & Physical Function demonstrated statistically significant change from admission to discharge
Results
 In contrast:
 Functional Independence Measure (ES = 0.56)
 London Handicap Scale (ES = 0.58)
 28-item General Health Questionnaire (ES = 0.51)
Example 2
 Mens, J.M. et al.: Reliability & validity of hip abduction strength to measure disease severity in posterior pelvic pain since pregnancy. Spine 2002; 27(15): 1674-9
Example 2
 Responsiveness of hip abduction strength, expressed as the standardized response mean, was compared with the responsiveness of the Quebec Back Pain Disability Scale in patients with posterior pelvic pain since pregnancy (PPPP)
Results
 Responsiveness of hip abduction strength was "large" (SRM = 0.93)
 In comparison, Quebec Back Pain Disability Scale (SRM = 1.20)
Change and Responsiveness Depend on Treatment
[Figure: bar chart of impact on SF-36 (magnitude of change) for different treatment outcomes]
Magnitude of Change Should Parallel Underlying Change
[Figure: change in HRQOL plotted against size of intervention]
Generic vs Condition-Specific Instruments
 The SF-36 is a generic measure, and may contain items unrelated to the disease being studied.
Generic vs Condition-Specific Instruments
 Generic instruments are most useful for discriminating between and comparing different disease states, for determining the severity of disease impact, and for cross-condition comparisons.
Generic vs Condition-Specific Instruments
 Disease-specific instruments can assess limitations or restrictions associated with particular disease states.
 May be more responsive to minimally significant changes.
Value Depends on Cost
 Whatever instrument is employed, the importance of HRQOL change depends on what it costs to produce it!
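One common way to make this concrete (not described on the slide) is an incremental ratio: the extra cost of one intervention over another divided by the extra HRQOL change it produces. A minimal sketch; `cost_per_unit_change` is a hypothetical name, and the units of cost and HRQOL change are whatever the analyst chooses.

```python
def cost_per_unit_change(cost_new, cost_old, effect_new, effect_old):
    """Incremental cost per extra unit of HRQOL change (an ICER-style ratio)."""
    delta_effect = effect_new - effect_old
    if delta_effect == 0:
        raise ValueError("no difference in HRQOL change; ratio undefined")
    return (cost_new - cost_old) / delta_effect
```

The ratio depends on the instrument: a less responsive measure records a smaller change, which inflates the apparent cost per unit of benefit.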
