Reliability
Definition: The stability or consistency of a test.
Assumption: True score = obtained score +/- error
Domain Sampling Model
[Diagram: the test is a sample of items drawn from the larger item domain]
Overview of Reliability Techniques
[Diagram summarizing four approaches:]
1. Test-retest: the same test (A) administered at time 1 (T1) and time 2 (T2)
2. Parallel/Alternate Forms: form A and form B each administered once; the two scores are correlated
3. Split-half: a single test (e.g., 100 items) split into two halves (e.g., 50 pairs of scores), and the half scores correlated
4. Internal Consistency: K-R-20 or Coefficient Alpha computed from one administration of test A
Test-retest method
[error is due to changes occurring due to the passage of time]
Some Issues:
• Length of time between test administrations is crucial
(generally, the longer the interval, the lower the reliability)
• Memory
• Stability of the construct being assessed
• Speed tests, sensory discrimination, psychomotor tests
(possible fatigue factor)
Parallel/Alternate Forms
[error due to test content and perhaps passage of time]
Two types:
1) Immediate (back-to-back administrations)
2) Delayed (a time interval between administrations)
Some Issues:
• Need same number & type of items on each test
• Item difficulty must be the same on each test
• Variability of scores must be the same on each test
Split-half reliability
[error due to differences in item content between the halves of the test]
• Typically, responses on odd versus even items are employed
• Correlate total scores on odd items with the scores obtained
on even items
• Need to use the Spearman-Brown correction formula:

$$r_{ttc} = \frac{n\,r_{12}}{1 + (n - 1)\,r_{12}}$$

where $r_{ttc}$ = corrected reliability for the total test, $r_{12}$ = correlation between the two halves of the test, and $n$ = number of times the test is lengthened (n = 2 when correcting a half-test correlation)
Person   Odd   Even
1        36    43
2        44    40
3        42    37
4        33    40
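A minimal sketch of the computation, assuming the four rows above are complete odd-half and even-half totals (four examinees is far too few for a stable estimate; `pearson_r` is a helper defined here, not a library function):

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

odd = [36, 44, 42, 33]   # total scores on odd items
even = [43, 40, 37, 40]  # total scores on even items

r12 = pearson_r(odd, even)               # correlation between the halves
r_ttc = (2 * r12) / (1 + (2 - 1) * r12)  # Spearman-Brown with n = 2
print(f"r12 = {r12:.2f}, corrected r_ttc = {r_ttc:.2f}")
```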
KR-20 and Coefficient Alpha [error due to item similarity]
• KR-20 is used with scales that have right & wrong responses (e.g., achievement tests)
• Alpha is used for scales that have a range of response options where there are no right or wrong responses (e.g., 7-point Likert-type scales)

KR-20:
$$r_{tt} = \frac{k}{k-1}\left(1 - \frac{\sum p_i (1 - p_i)}{\sigma_y^2}\right)$$
where k = number of items, $p_i$ = proportion of examinees getting item i correct, and $\sigma_y^2$ = variance of total test scores.

Coefficient Alpha:
$$\alpha = \frac{k}{k-1}\left(1 - \frac{\sum \sigma_i^2}{\sigma_y^2}\right)$$
where $\sigma_i^2$ = variance of scores on each item.
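A minimal sketch of both coefficients, assuming small made-up response matrices (rows = examinees, columns = items) and population variances (divide by N) to match the formulas above:

```python
def pvar(xs):
    """Population variance (divide by N)."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def kr20(data):
    """data: 0/1 (wrong/right) responses, one row per examinee."""
    k, n = len(data[0]), len(data)
    totals = [sum(row) for row in data]
    p = [sum(row[j] for row in data) / n for j in range(k)]
    sum_pq = sum(pi * (1 - pi) for pi in p)
    return (k / (k - 1)) * (1 - sum_pq / pvar(totals))

def coefficient_alpha(data):
    """data: any numeric responses (e.g., 1-7 Likert ratings)."""
    k = len(data[0])
    totals = [sum(row) for row in data]
    item_vars = sum(pvar([row[j] for row in data]) for j in range(k))
    return (k / (k - 1)) * (1 - item_vars / pvar(totals))

right_wrong = [[1, 1, 0, 1], [1, 0, 0, 1], [1, 1, 1, 1], [0, 0, 0, 1]]
likert = [[7, 6, 6], [4, 4, 5], [2, 3, 2], [5, 5, 6]]
print(kr20(right_wrong), coefficient_alpha(likert))
```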
Possible problem with choosing test items based on their correlations with a criterion
[Scatterplot: each item's correlation with the criterion (y-axis, .00 to .50) plotted against its correlation with total test scores (x-axis, .00 to .50); a "selection zone" marks the items with the highest criterion correlations]
Factors Affecting Reliability
1) Variability of scores (generally, the more
variability, the higher the reliability)
2) Number of items (the more questions, the higher
the reliability; see the Spearman-Brown sketch after this list)
3) Item difficulty (moderately difficult items lead to
higher reliability, e.g., p-value of .40 to .60)
4) Homogeneity/similarity of item content (e.g., item
x total score correlation; the more homogeneity,
the higher the reliability)
5) Scale format/number of response options (the
more options, the higher the reliability)
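A minimal sketch of factor 2, assuming a starting reliability of .60 (a made-up value) and applying the Spearman-Brown formula from the split-half slide to predict reliability as the test is lengthened:

```python
r = 0.60  # reliability of the original test (assumed)
for n in (1, 2, 3, 4):  # n = factor by which the test is lengthened
    r_new = (n * r) / (1 + (n - 1) * r)
    print(f"test lengthened {n}x: predicted reliability = {r_new:.2f}")
# 1x -> .60, 2x -> .75, 3x -> .82, 4x -> .86
```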
Standard Error of Measurement
[Error that exists in an individual’s test score]
$$SEM = \sigma \sqrt{1 - r}$$

where $\sigma$ = standard deviation of test scores and $r$ = reliability.

Examples:
$\sigma$ = 10, r = .90 → SEM = 3.16
$\sigma$ = 10, r = .60 → SEM = 6.32
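A minimal sketch reproducing the two examples above:

```python
import math

def sem(sd, reliability):
    """Standard error of measurement: sd * sqrt(1 - r)."""
    return sd * math.sqrt(1 - reliability)

print(sem(10, 0.90))  # ~3.16
print(sem(10, 0.60))  # ~6.32
```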
Normal Curve
[Figure: normal curve over z-scores −4 to +4, marking the mean and the 68%, 95%, and 99% regions; the 95% region ends at z = 1.96 and the 99% region at z = 2.58]
• 3.16 x 1.96 = 6.19 (95% confidence)
• 3.16 x 2.58 = 8.15 (99% confidence)
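A minimal sketch of the resulting confidence bands, assuming a hypothetical obtained score of 100 and the SEM of 3.16 from the earlier example:

```python
score, sem_val = 100, 3.16  # obtained score is a made-up example
for z, level in ((1.96, "95%"), (2.58, "99%")):
    half = sem_val * z
    print(f"{level}: {score - half:.2f} to {score + half:.2f}")
# 95%: 93.81 to 106.19;  99%: 91.85 to 108.15
```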
Other Standard Errors

Standard error of the mean:
$$S_{\bar{X}} = \frac{s}{\sqrt{N}}$$
where s = standard deviation and N = number of observations (sample size).

Standard error of proportion:
$$SE_p = \sqrt{\frac{p(1 - p)}{N}}$$
where p = proportion and N = sample size.

Standard error of difference in proportions (independent samples):
$$SE_{p_1 - p_2} = \sqrt{\frac{p_1(1 - p_1)}{N_1} + \frac{p_2(1 - p_2)}{N_2}}$$

Standard error of estimate (validity coefficient):
$$\sigma_{y'} = \sigma_y \sqrt{1 - r_{xy}^2}$$
where $\sigma_y$ = standard deviation of y (the criterion) and $r_{xy}^2$ = squared correlation between x and y.
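A minimal sketch collecting the four formulas, with made-up numbers for illustration (the difference-in-proportions version assumes two independent samples):

```python
import math

def se_mean(s, n):
    """Standard error of the mean: s / sqrt(N)."""
    return s / math.sqrt(n)

def se_proportion(p, n):
    """Standard error of a proportion: sqrt(p(1-p)/N)."""
    return math.sqrt(p * (1 - p) / n)

def se_diff_proportions(p1, n1, p2, n2):
    """SE of the difference between two independent proportions."""
    return math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

def se_estimate(sd_y, r_xy):
    """Standard error of estimate: sd_y * sqrt(1 - r_xy^2)."""
    return sd_y * math.sqrt(1 - r_xy ** 2)

print(se_mean(15, 100), se_proportion(0.40, 200))
print(se_diff_proportions(0.40, 200, 0.55, 180), se_estimate(10, 0.50))
```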