Transcript RELIABILITY
LECTURE 6
RELIABILITY
• Reliability is a proportion-of-variance measure (a squared quantity)
• Defined as the proportion of observed score (x) variance due to true score (τ) variance:
ρ²_x = ρ_xx' = σ²_τ / σ²_x
VENN DIAGRAM REPRESENTATION
[Venn diagram: observed score variance Var(x) partitioned into true score variance Var(τ) and error variance Var(e); the proportion Var(τ)/Var(x) is the reliability]
PARALLEL FORMS OF TESTS
• If two items x1 and x2 are parallel, they have
• equal true scores: τ1 = τ2
• equal true score variance: Var(τ1) = Var(τ2)
• equal error variance: Var(e1) = Var(e2)
• uncorrelated errors: ρ(e1, e2) = 0
Reliability: 2 parallel forms
• x1 = τ + e1, x2 = τ + e2
• reliability = ρ² = ρ_xx' = correlation between parallel forms
Reliability: parallel forms
[Path diagram: τ → x1 and τ → x2, each with loading λ_x and error e; ρ_xx' = λ_x · λ_x]
Reliability: 3 or more parallel forms
• For 3 or more items x_i, the same general form holds
• reliability of any pair is the correlation between them
• reliability of the composite (sum of items) is based on the average inter-item correlation: stepped-up reliability, Spearman-Brown formula
Reliability: 3 or more parallel forms
Spearman-Brown formula for reliability
r_xx = k·r̄(i,j) / [1 + (k−1)·r̄(i,j)]
Example: 3 items; 1 correlates .5 with 2, 1 correlates .6 with 3, and 2 correlates .7 with 3; the average is .6
r_xx = 3(.6) / [1 + 2(.6)] = 1.8/2.2 = .82
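The stepped-up reliability above is easy to verify with a short sketch (the function name is mine, not from the lecture):

```python
# Spearman-Brown stepped-up reliability of a k-item composite
# from the average inter-item correlation.

def spearman_brown(k, avg_r):
    """Reliability of a composite of k parallel items with
    average inter-item correlation avg_r."""
    return k * avg_r / (1 + (k - 1) * avg_r)

# The slide's example: 3 items with pairwise correlations .5, .6, .7
avg_r = (0.5 + 0.6 + 0.7) / 3          # average inter-item correlation = .6
r_xx = spearman_brown(3, avg_r)        # 1.8 / 2.2
print(round(r_xx, 2))                  # 0.82
```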
Reliability: tau equivalent scores
• If two items x1 and x2 are tau equivalent, they have
• τ1 = τ2
• equal true score variance: Var(τ1) = Var(τ2)
• unequal error variance: Var(e1) ≠ Var(e2)
• uncorrelated errors: ρ(e1, e2) = 0
Reliability: tau equivalent scores
• x1 = τ + e1, x2 = τ + e2
• reliability = ρ² = ρ_xx' = correlation between tau equivalent forms (same computation as for parallel forms, but the observed score variances differ)
Reliability: Spearman-Brown
Can show the reliability of the parallel forms or tau equivalent composite is
ρ_kk' = [k·ρ_xx'] / [1 + (k−1)·ρ_xx']
k = # times the test is lengthened
example: test score has rel = .7
doubling the length produces rel = 2(.7)/[1 + .7] = .824
Reliability: Spearman-Brown
example: test score has rel = .95
halving the length (k = .5) produces
ρ_xx = .5(.95)/[1 + (.5 − 1)(.95)] = .905
Thus, a short form with a random sample of half the items will still produce a test with adequate score reliability.
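The prophecy formula accepts any lengthening factor k, including fractional k for shortening; a minimal sketch reproducing both worked examples (helper name is mine):

```python
def spearman_brown_prophecy(k, rel):
    """Predicted reliability when a test is lengthened k times
    (k < 1 means the test is shortened)."""
    return k * rel / (1 + (k - 1) * rel)

print(round(spearman_brown_prophecy(2, 0.70), 3))    # doubling: 0.824
print(round(spearman_brown_prophecy(0.5, 0.95), 3))  # halving: 0.905
```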
Reliability: KR-20 for parallel or tau equivalent items/scores
Items are scored as 0 or 1, dichotomous scoring
Kuder and Richardson (1937): special cases of Cronbach's more general equation for parallel tests.
KR-20 = [k/(k−1)] [1 − Σ p_i q_i / σ²_y],
where p_i = proportion of respondents obtaining a score of 1 on item i, and q_i = 1 − p_i.
p_i is the item difficulty.
Reliability: KR-21 for parallel forms assumption
Items are scored as 0 or 1, dichotomous scoring
Kuder and Richardson (1937)
KR-21 = [k/(k−1)] [1 − k·p̄·q̄ / σ²_c]
p̄ is the mean item difficulty and q̄ = 1 − p̄
KR-21 assumes that all items have the same difficulty (parallel forms); the item mean gives the best estimate of the population values. KR-21 ≤ KR-20.
Reliability: congeneric scores
• If two items x1 and x2 are congeneric:
1. τ1 ≠ τ2
2. unequal true score variance: Var(τ1) ≠ Var(τ2)
3. unequal error variance: Var(e1) ≠ Var(e2)
4. uncorrelated errors: ρ(e1, e2) = 0
Reliability: congeneric scores
x1 = τ1 + e1, x2 = τ2 + e2
ρ_jj' = Cov(τ1, τ2) / (σ_x1 · σ_x2)
This is the correlation between two separate measures that have a common latent variable.
[Congeneric measurement structure diagram: τ1 → x1 with loading λ_x1 and error e1; τ2 → x2 with loading λ_x2 and error e2; τ1 and τ2 correlated φ12; ρ_xx' = λ_x1 · φ12 · λ_x2]
Reliability: Coefficient alpha
Composite = sum of k parts, each with its own true score and variance
C = x1 + x2 + … + x_k
α_est = [k/(k−1)] [1 − Σ s²_i / s²_C] ≤ ρ_CC' (alpha is a lower bound to the composite reliability)
Reliability: Coefficient alpha
Alpha:
1. = Spearman-Brown for parallel or tau equivalent tests
2. = KR-20 for dichotomous items (tau equivalent)
3. = Hoyt reliability, even for σ²_person×item ≠ 0 (congeneric)
Hoyt reliability
• Based on ANOVA concepts extended during the 1930s by Cyrus Hoyt at U. Minnesota • Considers items and subjects as factors that are either random or fixed (different models with respect to expected mean squares) • Presaged more general Coefficient alpha derivation
Reliability: Hoyt ANOVA

Source            df           Expected Mean Square
Persons (random)  I−1          σ²_e + σ²_p×i + K·σ²_persons
Items (random)    K−1          σ²_e + σ²_p×i + I·σ²_items
Error             (I−1)(K−1)   σ²_e + σ²_p×i

parallel forms => σ²_p×i = 0
ρ_Hoyt = [E(MS_persons) − E(MS_error)] / E(MS_persons)
est ρ_Hoyt = [MS_persons − MS_error] / MS_persons
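The estimator in the last line needs only two mean squares; plugging in the MS values from the SPSS ANOVA table later in this lecture reproduces its alpha of .2625:

```python
def hoyt_reliability(ms_persons, ms_error):
    """Hoyt's ANOVA estimate: (MS_persons - MS_error) / MS_persons."""
    return (ms_persons - ms_error) / ms_persons

# Mean squares from the SPSS ANOVA output shown later in the lecture
print(round(hoyt_reliability(1.5160, 1.1180), 4))   # 0.2625, equals alpha
```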
Reliability: Coefficient alpha
Composite = sum of k parts, each with its own true score and variance
C = x1 + x2 + … + x_k
Example: s_x1 = 1, s_x2 = 2, s_x3 = 3
s_C = 5
α_est = [3/(3−1)] [1 − (1+4+9)/25] = 1.5 [1 − 14/25] = 16.5/25 = .66
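The slide's arithmetic can be checked with a small sketch (the function name is mine):

```python
def alpha_from_variances(item_vars, composite_var):
    """Coefficient alpha from item variances and the composite variance."""
    k = len(item_vars)
    return (k / (k - 1)) * (1 - sum(item_vars) / composite_var)

# Slide example: item SDs 1, 2, 3 and composite SD 5
a = alpha_from_variances([1**2, 2**2, 3**2], 5**2)
print(round(a, 2))   # 0.66
```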
RELIABILITY (concept map): generalizability (g-coefficients, d-coefficients, ANOVA); test-retest; inter-rater; parallel form; internal consistency — Cronbach's alpha, average inter-item (Hoyt), dichotomous scoring (KR-20, KR-21), split half (Spearman-Brown correction)
SPSS DATA FILE (12 cases × 4 dichotomous items)
JOE      0 1 1 1
SUZY     1 0 0 1
FRANK    0 0 1 1
JUAN     0 1 0 1
SHAMIKA  0 0 1 1
ERIN     0 1 1 0
MICHAEL  1 1 0 1
BRANDY   1 1 1 1
WALID    0 1 0 1
KURT     0 0 0 0
ERIC     1 0 1 1
MAY      1 1 0 1
SPSS RELIABILITY OUTPUT R E L I A B I L I T Y A N A L Y S I S S C A L E (A L P H A) Reliability Coefficients N of Cases = 12.0 N of Items = 4 Alpha = .1579
SPSS RELIABILITY OUTPUT R E L I A B I L I T Y A N A L Y S I S S C A L E (A L P H A) Reliability Coefficients N of Cases = 12.0 N of Items = 8 Alpha = .6391
Note: same items duplicated
TRUE SCORE THEORY AND STRUCTURAL EQUATION MODELING
True score theory is consistent with the concepts of SEM:
- latent score (true score) is called a factor in SEM
- error of measurement
- the path coefficient between observed score x and latent score is the same as the index of reliability
COMPOSITES AND FACTOR STRUCTURE
• 3 Manifest (Observed) Variables required for a unique identification of a single factor • Parallel forms implies – Equal path coefficients (termed factor loadings) for the manifest variables – Equal error variances – Independence of errors
[Parallel forms factor diagram: τ → x1, x2, x3, each with loading λ_x and error e; λ_xi · λ_xj = λ_x · λ_x = reliability between variables i and j]
RELIABILITY FROM SEM
• TRUE SCORE VARIANCE OF THE COMPOSITE IS OBTAINABLE FROM THE LOADINGS:
σ²_τ(C) = (Σ_{i=1}^{k} λ_i)² · variance of factor
k = # items or subtests
for equal loadings this is k²·λ²_x, i.e., k² times the pairwise average reliability λ²_x of the items
RELIABILITY FROM SEM
• RELIABILITY OF THE COMPOSITE IS OBTAINABLE FROM THE LOADINGS:
α = [k/(k−1)] [1 − 1/(k·λ²_x)]
• example: λ²_x = .8, k = 11
α = [11/10] [1 − 1/8.8] = .975
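Reading the formula as α = [k/(k−1)][1 − 1/(k·λ²_x)] (this reading is an assumption on my part), the worked example checks out:

```python
def alpha_equal_loadings(k, lam_sq):
    """Alpha for k standardized items, each with the same squared
    loading lam_sq (assumed reading of the slide's formula)."""
    return (k / (k - 1)) * (1 - 1 / (k * lam_sq))

print(round(alpha_equal_loadings(11, 0.8), 3))   # 0.975
```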
TAU EQUIVALENCE
• ITEM TRUE SCORES DIFFER BY A CONSTANT: τ_i = τ_j + c
• ERROR STRUCTURE UNCHANGED AS TO EQUAL VARIANCES, INDEPENDENCE
CONGENERIC MODEL
• LESS RESTRICTIVE THAN PARALLEL FORMS OR TAU EQUIVALENCE: – LOADINGS MAY DIFFER – ERROR VARIANCES MAY DIFFER • MOST COMPLEX COMPOSITES ARE CONGENERIC: – WAIS, WISC-III, K-ABC, MMPI, etc.
[Congeneric factor diagram: τ → x1, x2, x3 with loadings λ_x1, λ_x2, λ_x3 and errors e1, e2, e3; ρ(x1, x2) = λ_x1 · λ_x2]
COEFFICIENT ALPHA
• ρ_xx' = 1 − σ²_E / σ²_X = 1 − [Σ σ²_i (1 − ρ_ii)] / σ²_X,
• since errors are uncorrelated
• α = [k/(k−1)] [1 − Σ s²_i / s²_C]
• where C = Σ x_i (composite score)
• s²_i = variance of subtest x_i
• s²_C = variance of composite
• Does not assume knowledge of subtest ρ_ii
COEFFICIENT ALPHA NUNNALLY’S COEFFICIENT
• IF WE KNOW THE RELIABILITIES OF EACH SUBTEST:
ρ_N = [K/(K−1)] [1 − Σ s²_i (1 − r_ii) / s²_X]
• where r_ii = coefficient alpha of each subtest
• Willson (1996) showed ρ_N ≥ ρ_xx'
[Nunnally's reliability case diagram: τ → x1, x2, x3 with loadings λ_Xi; each item also has a specific factor s_i and error e_i; λ_Xi · λ_Xi = λ²_xi + s²_i]
Reliability Formula for SEM with Multiple Factors (congeneric with subtests)
Single factor model: ρ = (Σ λ_i)² / [(Σ λ_i)² + Σ θ_ii + Σ θ_ij]
If θ_ij = 0, this reduces to ρ = (Σ λ_i)² / [(Σ λ_i)² + Σ θ_ii]
= (sum of factor loadings on the 1st factor)² / sum of observed variances
This generalizes (Bentler, 2004) to the sum of factor loadings on the 1st factor divided by the sum of variances and covariances of the factors for multifactor congeneric tests.
Maximal Reliability for Unit-weighted Composites. Peter M. Bentler, University of California, Los Angeles. UCLA Statistics Preprint No. 405, October 7, 2004. http://preprints.stat.ucla.edu/405/MaximalReliabilityforUnit-weightedcomposites.pdf
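A sketch of the single-factor case with θ_ij = 0 (this is the coefficient often called omega); the loadings and error variances below are hypothetical, not from the lecture:

```python
def omega_single_factor(loadings, error_vars):
    """Composite reliability for a single-factor congeneric model
    with uncorrelated errors: (sum of loadings)^2 over
    (sum of loadings)^2 + sum of error variances."""
    num = sum(loadings) ** 2
    return num / (num + sum(error_vars))

# hypothetical loadings and uniquenesses for a 3-item congeneric scale
print(round(omega_single_factor([0.7, 0.8, 0.9], [0.51, 0.36, 0.19]), 3))  # 0.845
```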
Multifactor models and specificity
• Specificity is the correlation between two observed items independent of the true score
• It can be considered another factor
• Cronbach's alpha can overestimate reliability if such factors are present
• Correlated errors can also result in alpha overestimating reliability
[Correlated error problems — diagrams: τ → x1, x2, x3 with errors e1, e2, e3 and specific factors s_i; specificities can be misinterpreted as a correlated error model if the specificities are correlated or form a second factor]
SPSS SCALE ANALYSIS
• ITEM DATA
• EXAMPLE: (Likert items, 0-4 scale)
                              Mean    Std Dev   Cases
1. CHLDIDEAL (0-8)            2.7029  1.4969    882.0
2. BIRTH CONTROL PILL OK      2.2959  1.0695    882.0
3. SEXED IN SCHOOL            1.1451   .3524    882.0
4. POL. VIEWS (CONS-LIB)      4.1349  1.3379    882.0
5. SPANKING OK IN SCHOOL      2.1111   .8301    882.0
CORRELATIONS
• Correlation Matrix
           CHLDIDEL   PILLOK  SEXEDUC  POLVIEWS
CHLDIDEL    1.0000
PILLOK       .1074   1.0000
SEXEDUC      .1614    .2985   1.0000
POLVIEWS     .1016    .2449    .1630   1.0000
SPANKING    -.0154   -.0307   -.0901   -.1188
SCALE CHARACTERISTICS
• Statistics for Scale: Mean 12.3900, Variance 7.5798, Std Dev 2.7531, Variables 5
• Item Means: Mean 2.4780, Minimum 1.1451, Maximum 4.1349, Range 2.9898, Max/Min 3.6109, Variance 1.1851
• Item Variances: Mean 1.1976, Minimum .1242, Maximum 2.2408, Range 2.1166, Max/Min 18.0415, Variance .7132
• Inter-item Correlations: Mean .0822, Minimum -.1188, Maximum .2985, Range .4173, Max/Min -2.5130, Variance .0189
ITEM-TOTAL STATS
• Item-total Statistics
             Scale Mean   Scale Variance   Corrected       Squared    Alpha
             if Item      if Item          Item-Total      Multiple   if Item
             Deleted      Deleted          Correlation     R          Deleted
CHLDIDEAL     9.6871       4.4559           .1397           .0342      .2121
PILLOK       10.0941       5.2204           .2487           .1310      .0961
SEXEDUC      11.2449       6.9593           .2669           .1178      .2099
POLVIEWS      8.2551       4.7918           .1704           .0837      .1652
SPANKING     10.2789       7.3001          -.0913           .0196      .3655
ANOVA RESULTS
• Analysis of Variance

Source of Variation   Sum of Sq.    DF     Mean Square    F       Prob.
Between People         1335.5664     881       1.5160
Within People          8120.8000    3528       2.3018
  Measures             4180.9492       4    1045.2373     934.9   .0000
  Residual             3939.8508    3524       1.1180
Total                  9456.3664    4409       2.1448
RELIABILITY ESTIMATE
• Reliability Coefficients 5 items • Alpha = .2625 Standardized item alpha = .3093
• Standardized alpha assumes all items are parallel (it is computed from the average inter-item correlation)
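The standardized item alpha can be reproduced from the mean inter-item correlation reported earlier (r̄ = .0822, k = 5), since it is the Spearman-Brown stepped-up formula applied to that mean:

```python
def standardized_alpha(k, mean_r):
    """Standardized item alpha from the mean inter-item correlation
    (equivalent to the Spearman-Brown stepped-up formula)."""
    return k * mean_r / (1 + (k - 1) * mean_r)

# Reproduces the SPSS output above: 5 items, mean inter-item r = .0822
print(round(standardized_alpha(5, 0.0822), 4))   # 0.3093
```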
RELIABILITY: APPLICATIONS
STANDARD ERRORS
• s_e = standard error of measurement
• = s_x [1 − ρ_xx]^(1/2)
• can be computed if ρ_xx is estimable
• provides an error band around an observed score: [x − 1.96 s_e, x + 1.96 s_e]
• ASSUMES ERRORS ARE NORMALLY DISTRIBUTED
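A minimal sketch of the 95% error band; the score, SD, and reliability values below are hypothetical:

```python
def sem_band(x, s_x, rel, z=1.96):
    """Error band around an observed score, assuming normal errors."""
    se = s_x * (1 - rel) ** 0.5   # standard error of measurement
    return (x - z * se, x + z * se)

# hypothetical example: observed score 90, SD 15, reliability .90
lo, hi = sem_band(90, 15, 0.90)
print(round(lo, 1), round(hi, 1))   # 80.7 99.3
```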
TRUE SCORE ESTIMATE
• τ_est = ρ_xx · x + [1 − ρ_xx] · x_mean
• example: x = 90, mean = 100, rel = .9
• τ_est = .9(90) + [1 − .9](100) = 81 + 10 = 91
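The regressed true score estimate, reproducing the slide's example (the function name is mine):

```python
def true_score_estimate(x, mean, rel):
    """Regressed (Kelley) estimate of the true score:
    rel * x + (1 - rel) * mean."""
    return rel * x + (1 - rel) * mean

print(round(true_score_estimate(90, 100, 0.9), 1))   # 91.0
```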
STANDARD ERROR OF TRUE SCORE ESTIMATE
• s_τ = s_x [ρ_xx]^(1/2) [1 − ρ_xx]^(1/2)
• Provides an estimate of the range of likely true scores for an estimated true score
DIFFERENCE SCORES
• Difference scores are widely used in education and psychology:
– Learning disability = Achievement − Predicted Achievement
– Gain score from beginning to end of school year
– Brain injury is detected by a large discrepancy in certain IQ scale scores
RELIABILITY OF D SCORES
• D = x − y
• s²_D = s²_x + s²_y − 2 r_xy s_x s_y
• r_DD = [r_xx s²_x + r_yy s²_y − 2 r_xy s_x s_y] / [s²_x + s²_y − 2 r_xy s_x s_y]
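A sketch of the difference-score reliability; the numeric example is hypothetical (equal SDs, both reliabilities .8, r_xy = .5), and illustrates why difference scores are typically less reliable than their components:

```python
def diff_score_reliability(rxx, ryy, rxy, sx, sy):
    """Reliability of D = x - y from the classical formula."""
    num = rxx * sx**2 + ryy * sy**2 - 2 * rxy * sx * sy
    den = sx**2 + sy**2 - 2 * rxy * sx * sy
    return num / den

# hypothetical pre/post example
print(round(diff_score_reliability(0.8, 0.8, 0.5, 1, 1), 2))   # 0.6
```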
REGRESSION DISCREPANCY
• D = y − y_pred
• where y_pred = b·x + b_0
• s_DD = [(1 − r²_xy)(1 − r_DD)]^(1/2)
• where r_DD = [r_yy + r_xx r²_xy − 2 r²_xy] / [1 − r²_xy]
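Under one reading of the formula above, r_DD = [r_yy + r_xx·r²_xy − 2 r²_xy] / [1 − r²_xy] (the r_xx·r²_xy term is an assumption on my part); the sketch includes two sanity checks that this reading satisfies:

```python
def regression_discrepancy_reliability(rxx, ryy, rxy):
    """Reliability of D = y - y_pred, assumed form:
    (r_yy + r_xx * r_xy**2 - 2 * r_xy**2) / (1 - r_xy**2)."""
    return (ryy + rxx * rxy**2 - 2 * rxy**2) / (1 - rxy**2)

# sanity checks: an uncorrelated predictor leaves r_DD = r_yy,
# and perfectly reliable tests give r_DD = 1
print(regression_discrepancy_reliability(0.9, 0.8, 0.0))
print(regression_discrepancy_reliability(1.0, 1.0, 0.6))
```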
TRUE DISCREPANCY
• D = b_Dy.x (y − y_mean) + b_Dx.y (x − x_mean)
• s_D = [b²_Dy.x + b²_Dx.y + 2 b_Dy.x b_Dx.y r_xy]^(1/2)
• and r_DD = {[2 − (r_xx − r_yy)² + (r_yy − r_xy)² − 2(r_yy − r_xy)(r_xx − r_xy) r²_xy] / [(1 − r_xy)(r_yy + r_xx − 2 r_xy)]}^(−1)