Application Techniques


PTP 560
• Research Methods
• Week 3
Thomas Ruediger, PT
Reliability
• Observed score = True Score ± Error, (X) = (T) ± (E)
• Consistent
– Score
– Performance
• The true score (T) is free from error
• Measurement Error
– Hypothetically, it could be zero
– Practically, it never is
• Systematic (e.g., a scale or stadiometer that is consistently off)
• Random (e.g., the measurement is done differently for no particular reason)
• Or both
Types of Measurement Error
• Systematic
– Biased: always present
– Consistent: the same instrument errs in the same way each time
– Often more of a validity concern, but it affects reliability
– Examples?
• Random
– Unpredictable factors
– As likely to be high as low
– Examples?
Sources of Measurement Error
• Individual
– Skill of the person taking the measure
– Also called rater or tester error
• The instrument
– Error can be limited by using the same instrument each time
• Lability of the phenomenon (when the variation is not from the instrument or tester)
– An actual change from measurement to measurement; a real difference is observed
Regression towards the mean
• Initial extreme high scores
– Subsequent scores will tend toward the mean
– Proportional to the amount of error
• Extreme low scores
– Will also tend toward the mean subsequently
– Proportional to the amount of error
• “Bell Shaped”
• Research repercussion
– Group assignments based on scores
– Intervention effect may be masked
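To make the regression effect concrete, here is a minimal Python simulation sketch of X = T + E; the group size, true-score SD, error SD, and cutoff are arbitrary values chosen only for illustration.

```python
import numpy as np

# Minimal sketch of regression toward the mean, using X = T + E.
# n, the SDs, and the cutoff below are arbitrary illustrative values.
rng = np.random.default_rng(0)

n = 10_000
true = rng.normal(50, 10, n)           # stable true scores (T)
test1 = true + rng.normal(0, 5, n)     # first observed score (X1 = T + E1)
test2 = true + rng.normal(0, 5, n)     # second observed score (X2 = T + E2)

extreme = test1 > 65                   # pick the "extreme high" group on test 1
print(f"Test 1 mean of extreme group: {test1[extreme].mean():.1f}")
print(f"Test 2 mean of the same people: {test2[extreme].mean():.1f}")
# The second mean falls back toward 50; the fall-back grows with the error SD,
# which is why groups assigned by extreme baseline scores can mask an effect.
```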
Reliability Coefficients
• True Score Variance/Total Variance
• Can range from 0 to 1
– By convention 0.00 to 1.00
– 0.00 = no reliability
– 1.00 = perfect reliability
• Portney and Watkins Guidelines *TESTABLE
– Less than 0.50 = poor reliability
– 0.50 to 0.75 = moderate reliability
– 0.75 to 1.00 = good reliability
– These are NOT standards
– Acceptable level should be based on application
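As a small sketch only, the cut-points above can be written as a helper function; the function name is made up, and the labels remain guidelines, not standards.

```python
def pw_reliability_label(r: float) -> str:
    """Label a reliability coefficient using the Portney & Watkins
    guideline cut-points. These are guidelines, not standards; the
    acceptable level still depends on the application."""
    if not 0.0 <= r <= 1.0:
        raise ValueError("Reliability coefficients range from 0.00 to 1.00")
    if r < 0.50:
        return "poor"
    if r < 0.75:
        return "moderate"
    return "good"

print(pw_reliability_label(0.82))  # -> good
```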
Correlation v Agreement
• Correlation – degree of association
– Is X correlated/associated with Y
• Usually not as clinically important for PT
– We want to know whether measures agree, not just whether they correlate; we want the accuracy to be consistent
• We generally want to know agreement
– Between tests
– Between raters
Reliability
• Required for validity
– A valid measure must be reliable
– But a measure does not have to be valid to be reliable
• Four general approaches
– Test-Retest
• (Nominal data) Kappa statistic: percent agreement corrected for chance (sketch below)
– Good vs. No Good
• (Ordinal Data) Spearman rho
• (Interval or Ratio Data) Pearson Product-moment
• ICC (For Ordinal, Interval, and Ratio Data)
– Association and agreement reflected
– The current preferred index
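For the nominal-data case, a rough sketch of the kappa idea: observed percent agreement corrected for the agreement expected by chance. The Good vs. No Good counts below are invented for illustration.

```python
# Sketch: Cohen's kappa for two raters classifying the same subjects
# as "Good" vs. "No Good". The counts are invented for illustration.
#                    Rater B: Good   Rater B: No Good
# Rater A: Good            40               5
# Rater A: No Good          8              47
a, b, c, d = 40, 5, 8, 47
n = a + b + c + d

p_observed = (a + d) / n                                   # percent agreement
p_chance = ((a + b) * (a + c) + (c + d) * (b + d)) / n**2  # agreement expected by chance
kappa = (p_observed - p_chance) / (1 - p_chance)

print(f"Observed agreement: {p_observed:.2f}, kappa: {kappa:.2f}")
```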
Reliability
– Rater reliability
• ICC should be used
– Alternate forms
• Limits of Agreement
– Internal Consistency (Homogeneity)
• Usually Cronbach’s alpha
Reliability
• Generalizability
– Reliability is not “owned” by the instrument
– May not apply to:
• Another population
• Another rater (or group of raters)
• Different time interval
• Minimum Detectable Difference
– Or minimum detectable change
– How much change is needed to say it’s not chance
– Not the same as MCID
Minimum detectable difference
(MDD)?
• Smallest difference that reflects a true difference
• The better the reliability, the smaller the MDD
• Different from a statistical difference
• MDD = 1.96 × SEM × √2, where 1.96 corresponds to the 95% CI (worked sketch below)
• Ask yourself: Is there a difference between measurement 1 and measurement 2?
– Is it statistically different?
– Is it clinically different? (Next slide)
Eliasziw M, Young SL, Woodbury MG, Fryday-Field K. Statistical methodology for the concurrent assessment of interrater and intrarater reliability: using
goniometric measurements as an example. Phys Ther. Aug 1994;74(8):777-788.
Weir JP. Quantifying test-retest reliability using the intraclass correlation coefficient and the SEM. J Strength Cond Res. Feb 2005;19(1):231-240.
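A minimal sketch of the MDD calculation above; the SD and ICC values are hypothetical, and SEM is estimated here as SD × √(1 − ICC), one common approach (see Weir 2005).

```python
import math

# Sketch of the slide's formula: MDD = 1.96 * SEM * sqrt(2).
# The SD and ICC are hypothetical; SEM is estimated as SD * sqrt(1 - ICC).
sd = 8.0      # between-subject standard deviation of the measure (hypothetical)
icc = 0.90    # test-retest reliability coefficient (hypothetical)

sem = sd * math.sqrt(1 - icc)         # standard error of measurement
mdd95 = 1.96 * sem * math.sqrt(2)     # 95% minimum detectable difference

print(f"SEM = {sem:.2f}, MDD(95%) = {mdd95:.2f}")
# A change smaller than mdd95 could plausibly be measurement error alone.
```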
Minimum Clinically Important
Difference (MCID)?
• Smallest difference considered clinically non-trivial
• Smallest change that the patient perceives as beneficial
• Usually associated with either:
– Expert judgment of clinician
– External Health Status Measure
Validity
• A measurement measures what it is intended to measure
• We use measurements to draw inferences in clinical use
– Due to indirect nature of measuring
– To apply our result to a diagnostic challenge
– Ex: Why do we do a manual muscle test?
• Validity
– Is not something an instrument has
• Is specific to the intended use
• Not required for Reliability
– (i.e. Just because it is reliable does not mean it is valid)
• Multiple types
Validity
– Face Validity (LEAST rigorous, looks like it should
make sense)
– Content (tests the content; e.g., GRE content is a good predictor of passing the licensure exam)
– Criterion-referenced (To a GOLD or a Reference
standard)
• Concurrent validity
• Predictive validity
– Construct (Figure 6.2 in P&W helpful here)
• Part content
• Part theoretical
• Multiple ways to assess (I won’t test these!)
Validity of Change
• Change is often how we make clinical decisions
– Evaluate treatment effect
– Consider different options
• Validity affected by four issues
– Level of measurement (Ordinal has highest risk)
– Reliability
• There will likely be a change due to chance
• There may be a true change
(One suggestion: reliability > 0.50 before using change scores)
– Stability of variable
– Baseline scores
• Floor effect
• Ceiling effect
Truth
                  +         -
Test    +         a         b
        -         c         d

Sn = a/(a+c)          Sp = d/(b+d)
PPV = a/(a+b)         NPV = d/(c+d)
+LR = Sn/(1-Sp)       -LR = (1-Sn)/Sp
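A minimal sketch that computes each index in the table above from the cell counts; the counts used here are placeholders, not data from the lecture.

```python
# Sketch: diagnostic indices from the 2x2 table above.
#                 Truth +   Truth -
#   Test +           a         b
#   Test -           c         d
# The cell counts below are placeholders; substitute real data.
a, b, c, d = 45, 10, 5, 40

sn = a / (a + c)        # sensitivity
sp = d / (b + d)        # specificity
ppv = a / (a + b)       # positive predictive value
npv = d / (c + d)       # negative predictive value
plr = sn / (1 - sp)     # positive likelihood ratio
nlr = (1 - sn) / sp     # negative likelihood ratio

print(f"Sn={sn:.2f} Sp={sp:.2f} PPV={ppv:.2f} NPV={npv:.2f} +LR={plr:.2f} -LR={nlr:.2f}")
```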
Truth
                  +         -
Test    +         99        b
        -         1         d

Sn = a/(a+c) = ?          Sp = d/(b+d)

In this example we picked 100 people with a known disorder, applied our clinical test, and got these results.
Truth
                  +         -
Test    +         a         20
        -         c         80

Sn = a/(a+c)          Sp = d/(b+d) = ?

In this example we picked 100 people known to not have the disorder, applied our clinical test, and got these results.
Now a patient comes in
• The history suggests to you that she has the
disorder
• You do the clinical test
• The result of the test is negative
• Which is more useful?
– SpPin, or
– SnNout?
Another patient comes in
• The history suggests to you that she does not have
the disorder
• She is very concerned that she has it
• You do the clinical test
• The result of the test is positive
• Which is more useful:
– SpPin, or
– SnNout?
Truth
                  +         -
Test    +         99        20
        -         1         80

+LR = Sn/(1-Sp) = ?          -LR = (1-Sn)/Sp = ?
Likelihood Ratios
• Allow us to quantify the likelihood that a condition is present (or absent)
• Importance ↑ as an LR moves away from 1
• An LR of 1 does not change our confidence
• Which number is further away from 1?
– (look at the nomogram)
• - LR is further away from 1
(this is a logarithmic scale)
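A worked sketch using the cell counts from the example slides above (99, 20, 1, 80); the log10 distance is one way to see why the -LR ends up further from 1 here.

```python
import math

# Worked example with the counts from the slides: a=99, b=20, c=1, d=80.
a, b, c, d = 99, 20, 1, 80

sn = a / (a + c)        # 99/100 = 0.99
sp = d / (b + d)        # 80/100 = 0.80
plr = sn / (1 - sp)     # 0.99 / 0.20 = 4.95
nlr = (1 - sn) / sp     # 0.01 / 0.80 = 0.0125

# Distance from 1 on a logarithmic scale (what the nomogram reflects):
print(f"+LR = {plr:.2f}, |log10| = {abs(math.log10(plr)):.2f}")
print(f"-LR = {nlr:.4f}, |log10| = {abs(math.log10(nlr)):.2f}")
# The -LR sits further from 1 on the log scale, so here a negative
# result shifts our confidence more than a positive result would.
```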