Dept of Physiotherapy
Reliability and Validity Testing
Definitions
Validity - the extent to which a test
measures what it is designed to measure
Reliability - the extent to which a test or
measure is reproducible
Validity
Logical (face) - how much the measure
obviously involves the performance.
Construct - how well the measure relates to
the theory
Content - how well the outcome evaluates the
intervention
Criterion - how well the test measures against
a set standard
Assessment of Validity
Criterion validity
Concurrent
Predictive
Prescriptive
Bland and Altman
Bias
Dispersion of the Bias
Relationship of Bias to value
M = Experimental measured value
GS = Gold Standard measured value
Diff = M - GS

M      GS     Diff   Mean (M,GS)
102    103    -1     102.5
96     98     -2     97
98     93     5      95.5
105    101    4      103

Mean Diff (Bias) = 1.5
SD (SD Bias) = 3.5
1.96*SD = 6.9
Bias + 1.96 SDs (ULA, upper limit of agreement) = 8.4
Bias - 1.96 SDs (LLA, lower limit of agreement) = -5.4
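The bias and limits of agreement for the four pairs of readings on this slide can be reproduced in a few lines. This is a minimal sketch using only the Python standard library; the variable names are illustrative, and the sample standard deviation (n - 1 denominator) is assumed because it matches the SD of 3.5 on the slide.

```python
from statistics import mean, stdev

# Paired readings from the slide: experimental measure (M) and gold standard (GS).
m = [102, 96, 98, 105]
gs = [103, 98, 93, 101]

# Per-subject difference and per-subject mean of the two methods.
diffs = [a - b for a, b in zip(m, gs)]
pair_means = [(a + b) / 2 for a, b in zip(m, gs)]

bias = mean(diffs)             # mean difference
sd_bias = stdev(diffs)         # sample SD of the differences (n - 1)
half_width = 1.96 * sd_bias
upper_limit = bias + half_width
lower_limit = bias - half_width

print(f"Bias = {bias:.1f}, SD = {sd_bias:.1f}")
print(f"Limits of agreement: {lower_limit:.1f} to {upper_limit:.1f}")
```

Plotting `diffs` against `pair_means`, with horizontal lines at the bias and the two limits, gives the Bland and Altman plot.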
[Figure: Bland and Altman plot. Prediction versus True VO2max; difference against mean (mls/min/kg). Horizontal lines mark the mean difference (bias), mean + 1.96 stdev and mean - 1.96 stdev.]
Bland and Altman Limits of Agreement
Advantages
Easy to interpret visually
Can indicate bias in measurements
Can be clinically useful
Useful for validity
Disadvantages
Difficult for more than two raters or datasets
More complex to interpret
Needs high numbers
Should also report raw data to interpret variation
Reliability
A measure CANNOT be valid but NOT
reliable
However a measure CAN BE reliable but
NOT valid
Reliability
Observed score =
True score + Error score
True score hard to evaluate but we can
estimate the error score
Sources of Error
The Participants
Sources of Error
The Testing
Poor directions
Additional motivation
Inconsistent protocol
Sources of Error
The Scoring
The scorers
Type of scoring system
Sources of Error
The Instrumentation
Calibration
Inaccuracies
Sensitivity
Statistical techniques
Pearson's r
ICC
Limits of agreement
Cronbach's alpha
Kappa statistic
Weighted kappa statistic
Pearson's r
Weaknesses
Bi-variate
Limited to two variables
Does not consider differences in variance
Only measures association not agreement
Not really appropriate for reliability
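A short example may make the "association not agreement" point concrete. The sketch below uses hypothetical data: a device that always reads exactly 10 units above the gold standard. The function name `pearson_r` and the offset of 10 are assumptions for illustration only.

```python
from math import sqrt
from statistics import mean

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

gold = [103, 98, 93, 101]
# Hypothetical device with a constant systematic error of +10 units.
measured = [g + 10 for g in gold]

r = pearson_r(measured, gold)
bias = mean(m - g for m, g in zip(measured, gold))

print(f"r = {r:.2f}, mean difference = {bias}")
```

The correlation is perfect even though no single reading agrees with the gold standard, which is why r alone says little about reliability or agreement.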
Intra-class correlation (ICC)
Strengths
Univariate
Allows for unequal cell numbers
Value from -1 to +1
Allows any number of raters or subjects
Weaknesses
Has several formulae
Does not imply usefulness
Ratios can be difficult to compare
Between subject variation should reflect population
Calculation
Variance between (due to) repeated trials
Variance between (due to) repeated
observers/observations
Variance from ANOVA model = Mean
Squares
Shrout and Fleiss formulae
Case 1: Each subject rated by a different set of
k raters randomly selected from a larger
population of raters
Case 2: A random sample of k raters, selected
from a larger population of raters, rates each
subject
Case 3: Each subject is rated by k raters who
are the only raters of interest
Cases (1,1), (2,1) & (3,1) are used when
the unit of measurement is obtained from
only one measurement
Cases (1,k), (2,k) & (3,k) are used when
the unit of measurement is obtained from
more than one measurement (i.e. a mean
measurement)
How to calculate
Use equations and values obtained from
ANOVAs (Rankin and Stokes, 1998)
Use macros downloaded from SPSS.com
(may not work with all versions of SPSS)
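As an alternative to the SPSS macros, the six Shrout and Fleiss coefficients can be computed directly from the ANOVA mean squares. The following is a minimal sketch, not a replacement for a validated statistics package. It assumes a complete subjects-by-raters table with no missing values; the ratings (six subjects, four raters) are illustrative data, and the function names are hypothetical.

```python
def anova_mean_squares(ratings):
    """Return (BMS, WMS, JMS, EMS) for a subjects x raters table."""
    n, k = len(ratings), len(ratings[0])
    grand = sum(map(sum, ratings)) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_subjects = k * sum((m - grand) ** 2 for m in row_means)
    ss_raters = n * sum((m - grand) ** 2 for m in col_means)
    ss_error = ss_total - ss_subjects - ss_raters
    bms = ss_subjects / (n - 1)                   # between subjects
    wms = (ss_raters + ss_error) / (n * (k - 1))  # within subjects
    jms = ss_raters / (k - 1)                     # between raters
    ems = ss_error / ((n - 1) * (k - 1))          # residual
    return bms, wms, jms, ems

def shrout_fleiss_icc(ratings):
    """Return a dict of ICC(1,1), (2,1), (3,1), (1,k), (2,k), (3,k)."""
    n, k = len(ratings), len(ratings[0])
    bms, wms, jms, ems = anova_mean_squares(ratings)
    return {
        "1,1": (bms - wms) / (bms + (k - 1) * wms),
        "2,1": (bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n),
        "3,1": (bms - ems) / (bms + (k - 1) * ems),
        "1,k": (bms - wms) / bms,
        "2,k": (bms - ems) / (bms + (jms - ems) / n),
        "3,k": (bms - ems) / bms,
    }

# Illustrative ratings: rows are subjects, columns are raters.
ratings = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]

icc = shrout_fleiss_icc(ratings)
for case, value in icc.items():
    print(f"ICC({case}) = {value:.2f}")
```

The same data give very different coefficients depending on the case chosen, so the case used should always be reported alongside the value.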
Cronbach's Alpha
Generalised measure of reliability
Easy to interpret
Similar to intraclass correlation
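Cronbach's alpha can be sketched from the item variances and the variance of the total score. The example below is a minimal illustration on hypothetical data (six subjects scored on four items or by four raters); the function name `cronbach_alpha` is an assumption, and sample variances (n - 1) are used throughout.

```python
from statistics import variance

def cronbach_alpha(scores):
    """Cronbach's alpha for a subjects x items table of scores."""
    k = len(scores[0])
    # Variance of each item (column) across subjects.
    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    # Variance of each subject's total score.
    total_var = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

# Hypothetical scores: rows are subjects, columns are items (or raters).
scores = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]

alpha = cronbach_alpha(scores)
print(f"Cronbach's alpha = {alpha:.3f}")
```

For a complete table like this one, alpha equals the Shrout and Fleiss ICC(3,k), which is one sense in which it is similar to the intraclass correlation.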
Kappa statistics
Kappa statistic
Nominal data
Weighted Kappa statistic
Ordinal data
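Both statistics can be sketched with one function, treating kappa as one minus the ratio of observed to chance-expected disagreement. This is a minimal sketch assuming two raters, Cohen's kappa, and linear weights for the weighted version; the function name `kappa` and the example ratings are hypothetical.

```python
from collections import Counter

def kappa(rater_a, rater_b, categories, weighted=False):
    """Cohen's kappa; with weighted=True, linearly weighted kappa.

    `categories` must list every category, in rank order for ordinal data.
    """
    n = len(rater_a)
    k = len(categories)
    index = {c: i for i, c in enumerate(categories)}

    def disagreement(i, j):
        if weighted:
            return abs(i - j) / (k - 1)   # partial credit for near misses
        return 0.0 if i == j else 1.0     # all-or-nothing for nominal data

    count_a = Counter(index[a] for a in rater_a)
    count_b = Counter(index[b] for b in rater_b)
    observed = sum(disagreement(index[a], index[b])
                   for a, b in zip(rater_a, rater_b)) / n
    expected = sum(disagreement(i, j) * count_a[i] * count_b[j]
                   for i in range(k) for j in range(k)) / n ** 2
    return 1 - observed / expected

# Nominal data: two raters classify ten patients (hypothetical).
a_nominal = ["yes", "yes", "no", "no", "yes", "no", "yes", "yes", "no", "no"]
b_nominal = ["yes", "no", "no", "no", "yes", "no", "yes", "yes", "yes", "no"]
k_nominal = kappa(a_nominal, b_nominal, ["no", "yes"])

# Ordinal data: two raters grade six patients on a 0-2 scale (hypothetical).
a_ordinal = [0, 1, 2, 2, 1, 0]
b_ordinal = [0, 1, 2, 1, 1, 1]
k_unweighted = kappa(a_ordinal, b_ordinal, [0, 1, 2])
k_weighted = kappa(a_ordinal, b_ordinal, [0, 1, 2], weighted=True)

print(f"kappa (nominal) = {k_nominal:.2f}")
print(f"kappa (ordinal, unweighted) = {k_unweighted:.2f}")
print(f"kappa (ordinal, weighted) = {k_weighted:.2f}")
```

On the ordinal example the weighted value is higher than the unweighted one because both disagreements are only one grade apart.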
Generating ICCs
Need
Correct macro
Data laid out appropriately
Two lines of syntax to run macros
All files resident in the same directory
References
Sim J (1993) Measurement validity in Physical
Therapy research. Physical Therapy, 73 (2); 48-55
Rankin G, Stokes M (1998) Reliability of assessment
tools in rehabilitation: an illustration of appropriate
statistical analyses. Clinical Rehabilitation, 12; 187
Bland JM, Altman DG (1986) Statistical methods for
assessing agreement between two methods of clinical
measurement. Lancet, Feb 8; 307-310.
Krebs DE (1984) Intraclass correlation coefficients:
Use and calculation. Physical Therapy, 64 (10); 1581-1582.
Thomas JR, Nelson JK (2001) Research Methods in
Physical Activity 4th Ed. Human Kinetics, Leeds.
George K, Batterham A, Sullivan I (2000) Validity
in clinical research: a review of basic concepts and
definitions. Physical Therapy in Sport, 1; 19-27.
More references
Eliasziw M, Young SL, Woodbury MG, Fryday-Field K
(1994) Statistical methodology for the concurrent
assessment of interrater and intrarater reliability:
Using goniometric measurements as an example.
Physical Therapy, 74 (8); 777-788.
Keating J, Matyas T (1998) Unreliable inferences
from reliable measurements. Australian Journal of
Physiotherapy, 44 (1); 5-10.
Greenfield MLH, Kuhn JE, Wojtys EM (1998) Validity
and Reliability. American Journal of Sports Medicine,
26 (3); 483-485.
Batterham AM, George KP (2000) Reliability in
evidence-based clinical practice: a primer for allied
health professionals. Physical Therapy in Sport, 1;
54-61.