Transcript RELIABILITY
LECTURE 6
RELIABILITY
• Reliability is a proportion-of-variance measure (a squared quantity)
• Defined as the proportion of observed score (x) variance due to true score (τ) variance:
ρ²_x = ρ_xx' = σ²_τ / σ²_x
VENN DIAGRAM REPRESENTATION
[Venn diagram: observed score variance Var(x) partitioned into true score variance Var(τ) and error variance Var(e); the proportion Var(τ)/Var(x) is the reliability]
PARALLEL FORMS OF TESTS
• If two items x1 and x2 are parallel, they have
• equal true scores: τ1 = τ2
• equal true score variance: Var(τ1) = Var(τ2)
• equal error variance: Var(e1) = Var(e2)
• uncorrelated errors: ρ(e1, e2) = 0
Reliability: 2 parallel forms
• x1 = τ + e1, x2 = τ + e2
• reliability = ρ² = ρ_xx' = correlation between parallel forms
Reliability: parallel forms
[Path diagram: τ → x1 and τ → x2, each with loading λ_x and error e; ρ_xx' = λ_x · λ_x]
Reliability: 3 or more parallel forms
• For 3 or more items x_i, the same general form holds
• reliability of any pair is the correlation between them
• reliability of the composite (sum of items) is based on the average inter-item correlation: stepped-up reliability, Spearman-Brown formula
Reliability: 3 or more parallel forms
Spearman-Brown formula for reliability
r_xx = k·r̄(i,j) / [1 + (k−1)·r̄(i,j)]
Example: 3 items; 1 correlates .5 with 2, 1 correlates .6 with 3, and 2 correlates .7 with 3; the average is .6
r_xx = 3(.6) / [1 + 2(.6)] = 1.8/2.2 = .82
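The stepped-up reliability above is easy to verify with a short sketch (the function name is mine, not from the lecture):

```python
# Spearman-Brown stepped-up reliability of a k-item composite
# from the average inter-item correlation.

def spearman_brown(k, avg_r):
    """Reliability of a composite of k parallel items with
    average inter-item correlation avg_r."""
    return k * avg_r / (1 + (k - 1) * avg_r)

# The slide's example: 3 items with pairwise correlations .5, .6, .7
avg_r = (0.5 + 0.6 + 0.7) / 3          # average inter-item correlation = .6
r_xx = spearman_brown(3, avg_r)        # 1.8 / 2.2
print(round(r_xx, 2))                  # 0.82
```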
Reliability: tau equivalent scores
• If two items x1 and x2 are tau equivalent, they have
• τ1 = τ2
• equal true score variance: Var(τ1) = Var(τ2)
• unequal error variance: Var(e1) ≠ Var(e2)
• uncorrelated errors: ρ(e1, e2) = 0
Reliability: tau equivalent scores
• x1 = τ + e1, x2 = τ + e2
• reliability = ρ² = ρ_xx' = correlation between tau equivalent forms (same computation as for parallel forms, but the observed score variances differ)
Reliability: Spearman-Brown
Can show the reliability of the parallel forms or tau equivalent composite is
ρ_kk' = [k·ρ_xx'] / [1 + (k−1)·ρ_xx']
k = # times the test is lengthened
example: test score has rel = .7
doubling the length produces rel = 2(.7)/[1 + .7] = .824
Reliability: Spearman-Brown
example: test score has rel = .95
halving the length (k = .5) produces
ρ_xx = .5(.95)/[1 + (.5 − 1)(.95)] = .905
Thus, a short form with a random sample of half the items will still produce a test with adequate score reliability.
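The prophecy formula accepts any lengthening factor k, including fractional k for shortening; a minimal sketch reproducing both worked examples (helper name is mine):

```python
def spearman_brown_prophecy(k, rel):
    """Predicted reliability when a test is lengthened k times
    (k < 1 means the test is shortened)."""
    return k * rel / (1 + (k - 1) * rel)

print(round(spearman_brown_prophecy(2, 0.70), 3))    # doubling: 0.824
print(round(spearman_brown_prophecy(0.5, 0.95), 3))  # halving: 0.905
```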
Reliability: KR-20 for parallel or tau equivalent items/scores
Items are scored as 0 or 1, dichotomous scoring
Kuder and Richardson (1937): special cases of Cronbach's more general equation for parallel tests.
KR-20 = [k/(k−1)] [1 − Σ p_i q_i / σ²_y],
where p_i = proportion of respondents obtaining a score of 1 on item i, and q_i = 1 − p_i.
p_i is the item difficulty.
Reliability: KR-21 for parallel forms assumption
Items are scored as 0 or 1, dichotomous scoring
Kuder and Richardson (1937)
KR-21 = [k/(k−1)] [1 − k·p̄·q̄ / σ²_c]
p̄ is the mean item difficulty and q̄ = 1 − p̄
KR-21 assumes that all items have the same difficulty (parallel forms); the item mean gives the best estimate of the population values. KR-21 ≤ KR-20.
Reliability: congeneric scores
• If two items x1 and x2 are congeneric:
1. τ1 ≠ τ2
2. unequal true score variance: Var(τ1) ≠ Var(τ2)
3. unequal error variance: Var(e1) ≠ Var(e2)
4. uncorrelated errors: ρ(e1, e2) = 0
Reliability: congeneric scores
x1 = τ1 + e1, x2 = τ2 + e2
ρ_jj' = Cov(τ1, τ2) / (σ_x1 · σ_x2)
This is the correlation between two separate measures that have a common latent variable.
[Congeneric measurement structure diagram: τ1 → x1 with loading λ_x1 and error e1; τ2 → x2 with loading λ_x2 and error e2; τ1 and τ2 correlated φ12; ρ_xx' = λ_x1 · φ12 · λ_x2]
Reliability: Coefficient alpha
Composite = sum of k parts, each with its own true score and variance
C = x1 + x2 + … + x_k
α_est = [k/(k−1)] [1 − Σ s²_i / s²_C] ≤ ρ_CC' (alpha is a lower bound to the composite reliability)
Reliability: Coefficient alpha
Alpha:
1. = Spearman-Brown for parallel or tau equivalent tests
2. = KR-20 for dichotomous items (tau equivalent)
3. = Hoyt reliability, even for σ²_person×item ≠ 0 (congeneric)
Hoyt reliability
• Based on ANOVA concepts extended during the 1930s by Cyrus Hoyt at U. Minnesota • Considers items and subjects as factors that are either random or fixed (different models with respect to expected mean squares) • Presaged more general Coefficient alpha derivation
Reliability: Hoyt ANOVA

Source            df           Expected Mean Square
Persons (random)  I−1          σ²_e + σ²_p×i + K·σ²_persons
Items (random)    K−1          σ²_e + σ²_p×i + I·σ²_items
Error             (I−1)(K−1)   σ²_e + σ²_p×i

parallel forms => σ²_p×i = 0
ρ_Hoyt = [E(MS_persons) − E(MS_error)] / E(MS_persons)
est ρ_Hoyt = [MS_persons − MS_error] / MS_persons
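The estimator in the last line needs only two mean squares; plugging in the MS values from the SPSS ANOVA table later in this lecture reproduces its alpha of .2625:

```python
def hoyt_reliability(ms_persons, ms_error):
    """Hoyt's ANOVA estimate: (MS_persons - MS_error) / MS_persons."""
    return (ms_persons - ms_error) / ms_persons

# Mean squares from the SPSS ANOVA output shown later in the lecture
print(round(hoyt_reliability(1.5160, 1.1180), 4))   # 0.2625, equals alpha
```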
Reliability: Coefficient alpha
Composite = sum of k parts, each with its own true score and variance
C = x1 + x2 + … + x_k
Example: s_x1 = 1, s_x2 = 2, s_x3 = 3
s_C = 5
α_est = [3/(3−1)] [1 − (1+4+9)/25] = 1.5 [1 − 14/25] = 16.5/25 = .66
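The slide's arithmetic can be checked with a small sketch (the function name is mine):

```python
def alpha_from_variances(item_vars, composite_var):
    """Coefficient alpha from item variances and the composite variance."""
    k = len(item_vars)
    return (k / (k - 1)) * (1 - sum(item_vars) / composite_var)

# Slide example: item SDs 1, 2, 3 and composite SD 5
a = alpha_from_variances([1**2, 2**2, 3**2], 5**2)
print(round(a, 2))   # 0.66
```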
RELIABILITY (concept map): generalizability (g-coefficients, d-coefficients, ANOVA); test-retest; inter-rater; parallel form; internal consistency — Cronbach's alpha, average inter-item (Hoyt), dichotomous scoring (KR-20, KR-21), split half (Spearman-Brown correction)
SPSS DATA FILE (12 cases × 4 dichotomous items)
JOE      0 1 1 1
SUZY     1 0 0 1
FRANK    0 0 1 1
JUAN     0 1 0 1
SHAMIKA  0 0 1 1
ERIN     0 1 1 0
MICHAEL  1 1 0 1
BRANDY   1 1 1 1
WALID    0 1 0 1
KURT     0 0 0 0
ERIC     1 0 1 1
MAY      1 1 0 1
SPSS RELIABILITY OUTPUT R E L I A B I L I T Y A N A L Y S I S S C A L E (A L P H A) Reliability Coefficients N of Cases = 12.0 N of Items = 4 Alpha = .1579
SPSS RELIABILITY OUTPUT R E L I A B I L I T Y A N A L Y S I S S C A L E (A L P H A) Reliability Coefficients N of Cases = 12.0 N of Items = 8 Alpha = .6391
Note: same items duplicated
TRUE SCORE THEORY AND STRUCTURAL EQUATION MODELING
True score theory is consistent with the concepts of SEM:
- latent score (true score) is called a factor in SEM
- error of measurement
- the path coefficient between observed score x and latent score is the same as the index of reliability
COMPOSITES AND FACTOR STRUCTURE
• 3 Manifest (Observed) Variables required for a unique identification of a single factor • Parallel forms implies – Equal path coefficients (termed factor loadings) for the manifest variables – Equal error variances – Independence of errors
[Parallel forms factor diagram: τ → x1, x2, x3, each with loading λ_x and error e; λ_xi · λ_xj = λ_x · λ_x = reliability between variables i and j]
RELIABILITY FROM SEM
• TRUE SCORE VARIANCE OF THE COMPOSITE IS OBTAINABLE FROM THE LOADINGS:
σ²_τ(C) = (Σ_{i=1}^{k} λ_i)² · variance of factor
k = # items or subtests
for equal loadings this is k²·λ²_x, i.e., k² times the pairwise average reliability λ²_x of the items
RELIABILITY FROM SEM
• RELIABILITY OF THE COMPOSITE IS OBTAINABLE FROM THE LOADINGS:
α = [k/(k−1)] [1 − 1/(k·λ²_x)]
• example: λ²_x = .8, k = 11
α = [11/10] [1 − 1/8.8] = .975
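Reading the formula as α = [k/(k−1)][1 − 1/(k·λ²_x)] (this reading is an assumption on my part), the worked example checks out:

```python
def alpha_equal_loadings(k, lam_sq):
    """Alpha for k standardized items, each with the same squared
    loading lam_sq (assumed reading of the slide's formula)."""
    return (k / (k - 1)) * (1 - 1 / (k * lam_sq))

print(round(alpha_equal_loadings(11, 0.8), 3))   # 0.975
```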
TAU EQUIVALENCE
• ITEM TRUE SCORES DIFFER BY A CONSTANT: τ_i = τ_j + c
• ERROR STRUCTURE UNCHANGED AS TO EQUAL VARIANCES, INDEPENDENCE
CONGENERIC MODEL
• LESS RESTRICTIVE THAN PARALLEL FORMS OR TAU EQUIVALENCE: – LOADINGS MAY DIFFER – ERROR VARIANCES MAY DIFFER • MOST COMPLEX COMPOSITES ARE CONGENERIC: – WAIS, WISC-III, K-ABC, MMPI, etc.
[Congeneric factor diagram: τ → x1, x2, x3 with loadings λ_x1, λ_x2, λ_x3 and errors e1, e2, e3; ρ(x1, x2) = λ_x1 · λ_x2]
COEFFICIENT ALPHA
• ρ_xx' = 1 − σ²_E / σ²_X = 1 − [Σ σ²_i (1 − ρ_ii)] / σ²_X,
• since errors are uncorrelated
• α = [k/(k−1)] [1 − Σ s²_i / s²_C]
• where C = Σ x_i (composite score)
• s²_i = variance of subtest x_i
• s²_C = variance of composite
• Does not assume knowledge of subtest ρ_ii
COEFFICIENT ALPHA NUNNALLY’S COEFFICIENT
• IF WE KNOW THE RELIABILITIES OF EACH SUBTEST:
ρ_N = [K/(K−1)] [1 − Σ s²_i (1 − r_ii) / s²_X]
• where r_ii = coefficient alpha of each subtest
• Willson (1996) showed ρ_N ≥ ρ_xx'
[Nunnally's reliability case diagram: τ → x1, x2, x3 with loadings λ_Xi; each item also has a specific factor s_i and error e_i; λ_Xi · λ_Xi = λ²_xi + s²_i]
Reliability Formula for SEM with Multiple Factors (congeneric with subtests)
Single factor model: ρ = (Σ λ_i)² / [(Σ λ_i)² + Σ θ_ii + Σ θ_ij]
If θ_ij = 0, this reduces to ρ = (Σ λ_i)² / [(Σ λ_i)² + Σ θ_ii]
= (sum of factor loadings on the 1st factor)² / sum of observed variances
This generalizes (Bentler, 2004) to the sum of factor loadings on the 1st factor divided by the sum of variances and covariances of the factors for multifactor congeneric tests.
Maximal Reliability for Unit-weighted Composites. Peter M. Bentler, University of California, Los Angeles. UCLA Statistics Preprint No. 405, October 7, 2004. http://preprints.stat.ucla.edu/405/MaximalReliabilityforUnit-weightedcomposites.pdf
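A sketch of the single-factor case with θ_ij = 0 (this is the coefficient often called omega); the loadings and error variances below are hypothetical, not from the lecture:

```python
def omega_single_factor(loadings, error_vars):
    """Composite reliability for a single-factor congeneric model
    with uncorrelated errors: (sum of loadings)^2 over
    (sum of loadings)^2 + sum of error variances."""
    num = sum(loadings) ** 2
    return num / (num + sum(error_vars))

# hypothetical loadings and uniquenesses for a 3-item congeneric scale
print(round(omega_single_factor([0.7, 0.8, 0.9], [0.51, 0.36, 0.19]), 3))  # 0.845
```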
Multifactor models and specificity
• Specificity is the correlation between two observed items independent of the true score
• It can be considered another factor
• Cronbach's alpha can overestimate reliability if such factors are present
• Correlated errors can also result in alpha overestimating reliability
[Correlated error problems — diagrams: τ → x1, x2, x3 with errors e1, e2, e3 and specific factors s_i; specificities can be misinterpreted as a correlated error model if the specificities are correlated or form a second factor]
SPSS SCALE ANALYSIS
• ITEM DATA
• EXAMPLE: (Likert items, 0-4 scale)
                              Mean    Std Dev   Cases
1. CHLDIDEAL (0-8)            2.7029  1.4969    882.0
2. BIRTH CONTROL PILL OK      2.2959  1.0695    882.0
3. SEXED IN SCHOOL            1.1451   .3524    882.0
4. POL. VIEWS (CONS-LIB)      4.1349  1.3379    882.0
5. SPANKING OK IN SCHOOL      2.1111   .8301    882.0
CORRELATIONS
• Correlation Matrix
           CHLDIDEL   PILLOK  SEXEDUC  POLVIEWS
CHLDIDEL    1.0000
PILLOK       .1074   1.0000
SEXEDUC      .1614    .2985   1.0000
POLVIEWS     .1016    .2449    .1630   1.0000
SPANKING    -.0154   -.0307   -.0901   -.1188
SCALE CHARACTERISTICS
• Statistics for Scale: Mean 12.3900, Variance 7.5798, Std Dev 2.7531, Variables 5
• Item Means: Mean 2.4780, Minimum 1.1451, Maximum 4.1349, Range 2.9898, Max/Min 3.6109, Variance 1.1851
• Item Variances: Mean 1.1976, Minimum .1242, Maximum 2.2408, Range 2.1166, Max/Min 18.0415, Variance .7132
• Inter-item Correlations: Mean .0822, Minimum -.1188, Maximum .2985, Range .4173, Max/Min -2.5130, Variance .0189
ITEM-TOTAL STATS
• Item-total Statistics
             Scale Mean   Scale Variance   Corrected       Squared    Alpha
             if Item      if Item          Item-Total      Multiple   if Item
             Deleted      Deleted          Correlation     R          Deleted
CHLDIDEAL     9.6871       4.4559           .1397           .0342      .2121
PILLOK       10.0941       5.2204           .2487           .1310      .0961
SEXEDUC      11.2449       6.9593           .2669           .1178      .2099
POLVIEWS      8.2551       4.7918           .1704           .0837      .1652
SPANKING     10.2789       7.3001          -.0913           .0196      .3655
ANOVA RESULTS
• Analysis of Variance

Source of Variation   Sum of Sq.    DF     Mean Square    F       Prob.
Between People         1335.5664     881       1.5160
Within People          8120.8000    3528       2.3018
  Measures             4180.9492       4    1045.2373     934.9   .0000
  Residual             3939.8508    3524       1.1180
Total                  9456.3664    4409       2.1448
RELIABILITY ESTIMATE
• Reliability Coefficients 5 items • Alpha = .2625 Standardized item alpha = .3093
• Standardized alpha assumes all items are parallel (it is computed from the average inter-item correlation)
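The standardized item alpha can be reproduced from the mean inter-item correlation reported earlier (r̄ = .0822, k = 5), since it is the Spearman-Brown stepped-up formula applied to that mean:

```python
def standardized_alpha(k, mean_r):
    """Standardized item alpha from the mean inter-item correlation
    (equivalent to the Spearman-Brown stepped-up formula)."""
    return k * mean_r / (1 + (k - 1) * mean_r)

# Reproduces the SPSS output above: 5 items, mean inter-item r = .0822
print(round(standardized_alpha(5, 0.0822), 4))   # 0.3093
```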
RELIABILITY: APPLICATIONS
STANDARD ERRORS
• s_e = standard error of measurement
• = s_x [1 − ρ_xx]^(1/2)
• can be computed if ρ_xx is estimable
• provides an error band around an observed score: [x − 1.96 s_e, x + 1.96 s_e]
• ASSUMES ERRORS ARE NORMALLY DISTRIBUTED
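A minimal sketch of the 95% error band; the score, SD, and reliability values below are hypothetical:

```python
def sem_band(x, s_x, rel, z=1.96):
    """Error band around an observed score, assuming normal errors."""
    se = s_x * (1 - rel) ** 0.5   # standard error of measurement
    return (x - z * se, x + z * se)

# hypothetical example: observed score 90, SD 15, reliability .90
lo, hi = sem_band(90, 15, 0.90)
print(round(lo, 1), round(hi, 1))   # 80.7 99.3
```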
TRUE SCORE ESTIMATE
• τ_est = ρ_xx · x + [1 − ρ_xx] · x_mean
• example: x = 90, mean = 100, rel = .9
• τ_est = .9(90) + [1 − .9](100) = 81 + 10 = 91
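The regressed true score estimate, reproducing the slide's example (the function name is mine):

```python
def true_score_estimate(x, mean, rel):
    """Regressed (Kelley) estimate of the true score:
    rel * x + (1 - rel) * mean."""
    return rel * x + (1 - rel) * mean

print(round(true_score_estimate(90, 100, 0.9), 1))   # 91.0
```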
STANDARD ERROR OF TRUE SCORE ESTIMATE
• s_τ = s_x [ρ_xx]^(1/2) [1 − ρ_xx]^(1/2)
• Provides an estimate of the range of likely true scores for an estimated true score
DIFFERENCE SCORES
• Difference scores are widely used in education and psychology:
– Learning disability = Achievement − Predicted Achievement
– Gain score from beginning to end of school year
– Brain injury is detected by a large discrepancy in certain IQ scale scores
RELIABILITY OF D SCORES
• D = x − y
• s²_D = s²_x + s²_y − 2 r_xy s_x s_y
• r_DD = [r_xx s²_x + r_yy s²_y − 2 r_xy s_x s_y] / [s²_x + s²_y − 2 r_xy s_x s_y]
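A sketch of the difference-score reliability; the numeric example is hypothetical (equal SDs, both reliabilities .8, r_xy = .5), and illustrates why difference scores are typically less reliable than their components:

```python
def diff_score_reliability(rxx, ryy, rxy, sx, sy):
    """Reliability of D = x - y from the classical formula."""
    num = rxx * sx**2 + ryy * sy**2 - 2 * rxy * sx * sy
    den = sx**2 + sy**2 - 2 * rxy * sx * sy
    return num / den

# hypothetical pre/post example
print(round(diff_score_reliability(0.8, 0.8, 0.5, 1, 1), 2))   # 0.6
```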
REGRESSION DISCREPANCY
• D = y − y_pred
• where y_pred = b·x + b_0
• s_DD = [(1 − r²_xy)(1 − r_DD)]^(1/2)
• where r_DD = [r_yy + r_xx r²_xy − 2 r²_xy] / [1 − r²_xy]
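Under one reading of the formula above, r_DD = [r_yy + r_xx·r²_xy − 2 r²_xy] / [1 − r²_xy] (the r_xx·r²_xy term is an assumption on my part); the sketch includes two sanity checks that this reading satisfies:

```python
def regression_discrepancy_reliability(rxx, ryy, rxy):
    """Reliability of D = y - y_pred, assumed form:
    (r_yy + r_xx * r_xy**2 - 2 * r_xy**2) / (1 - r_xy**2)."""
    return (ryy + rxx * rxy**2 - 2 * rxy**2) / (1 - rxy**2)

# sanity checks: an uncorrelated predictor leaves r_DD = r_yy,
# and perfectly reliable tests give r_DD = 1
print(regression_discrepancy_reliability(0.9, 0.8, 0.0))
print(regression_discrepancy_reliability(1.0, 1.0, 0.6))
```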
TRUE DISCREPANCY
• D = b_Dy.x (y − y_mean) + b_Dx.y (x − x_mean)
• s_D = [b²_Dy.x + b²_Dx.y + 2 b_Dy.x b_Dx.y r_xy]^(1/2)
• and r_DD = {[2 − (r_xx − r_yy)² + (r_yy − r_xy)² − 2(r_yy − r_xy)(r_xx − r_xy) r²_xy] / [(1 − r_xy)(r_yy + r_xx − 2 r_xy)]}^(−1)