Transcript CHAPTER 5
Slide 1: CHAPTER 5, Test Scores as Composites
This chapter is about the quality of the items in a test.

Slide 2: Test Scores as Composites
What is a composite test score? A composite test score is a total test score created by summing two or more subtest scores, e.g., the WAIS-IV Full Scale IQ, which consists of (1) the Verbal Comprehension Index, (2) the Perceptual Reasoning Index, (3) the Working Memory Index, and (4) the Processing Speed Index. Qualifying examinations and the EPPP are also composite test scores.

Slide 3: Item Scoring Schemes ("skeems") / Systems
We have two different scoring systems:
1. Dichotomous scores: restricted to 0 and 1, such as scores on true/false and multiple-choice questions.
2. Non-dichotomous scores: not restricted to 0 and 1; they can take a range of possible points (1, 2, 3, 4, 5, ...), such as scores on essays.

Slide 4: Dichotomous Scheme Examples
1. The space between nerve cell endings is called the: a. Dendrite, b. Axon, c. Synapse, d. Neutron. (In this item, responses a, b, and d are scored 0; response c is scored 1.)
2. Teachers in public school systems should have the right to strike: a. Agree, b. Disagree. (In this item, a response of Agree is scored 1; Disagree is scored 0.) Or you can use True/False.

Slide 5: Practical Implications for Test Construction
Variance and covariance measure the quality of the items in a test; reliability and validity measure the quality of the entire test. Variance is the degree of variability of scores around the mean: sigma^2 = SS/N, used for one set of data.

Slide 6: Practical Implications for Test Construction
Correlation is based on a statistic called covariance (COVxy or Sxy): COVxy = SP/(N - 1), used for two sets of data. Covariance is a number that reflects the degree to which two variables vary together. r = SP/sqrt(SSx * SSy).

Slide 7: Variance
X: 1, 2, 4, 5
Population: sigma^2 = SS/N. Sample: s^2 = SS/(n - 1) = SS/df.
SS = sum(x^2) - (sum(x))^2/N, or SS = sum((x - mu)^2): the sum of squared deviations from the mean.

Slide 8: Covariance
Covariance is a number that reflects the degree to which two variables vary together. Original data:
X: 1, 2, 4, 5
Y: 3, 6, 4, 7

Slide 9: Covariance
COVxy = SP/(N - 1). There are two ways to calculate SP: SP = sum(xy) - (sum(x) * sum(y))/N, or SP = sum((x - mu_x)(y - mu_y)). SP requires two sets of data; SS requires only one set of data.

Slide 10: Descriptive Statistics for Dichotomous Data

Slide 11: Descriptive Statistics for Dichotomous Data: Item Variance and Covariance

Slide 12: Descriptive Statistics for Dichotomous Data
P = item difficulty: P = (# of examinees who answered the item correctly) / (total # of examinees), or P = f/N (see handout). The higher the P value, the easier the item.

Slide 13: Relationship between Item Difficulty P and sigma^2
[Graph: item variance sigma^2 (quality) plotted against item difficulty P, from 0 (difficult) to 1 (easy); variance is highest at P = 0.5.]

Slide 14: Non-dichotomous Score Examples
1. Write a grammatically correct German sentence using the first-person singular form of the verb verstehen. (A maximum of 3 points may be awarded, and partial credit may be given.)
2. An intellectually disabled person is a nonproductive member of society: 5. Strongly agree, 4. Agree, 3. No opinion, 2. Disagree, 1. Strongly disagree. (Scores can range from 1 to 5 points, with high scores indicating a positive attitude toward intellectually disabled citizens.)

Slide 15: Descriptive Statistics for Non-dichotomous Variables

Slide 16: Descriptive Statistics for Non-dichotomous Variables

Slide 17: Variance of a Composite, sigma^2_C
sigma^2 = SS/N; sigma^2_a = SSa/Na; sigma^2_b = SSb/Nb; sigma^2_C = sigma^2_a + sigma^2_b. Ex. from the WAIS-III: FSIQ = VIQ + PIQ. With more than two subtests, sigma^2_C = sigma^2_a + sigma^2_b + sigma^2_c + ...; calculate the variance for each subtest and add them up. (Note: summing only the subtest variances treats the subtests as uncorrelated; when components covary, their covariances also contribute to the composite variance, as in the test-variance formula on slide 20.)
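To make the preceding formulas concrete, here is a minimal Python sketch of the Chapter 5 statistics: item difficulty P, variance, covariance, correlation, and a composite variance. The X/Y values are the four score pairs from the covariance slides; the 0/1 item responses and the subtest variances are hypothetical, added only for illustration.

from math import sqrt

def item_difficulty(responses):
    """P = f / N: proportion of examinees answering the item correctly."""
    return sum(responses) / len(responses)

def sum_of_squares(scores):
    """SS = sum of squared deviations from the mean (one set of data)."""
    mean = sum(scores) / len(scores)
    return sum((x - mean) ** 2 for x in scores)

def sum_of_products(xs, ys):
    """SP = sum of cross-products of deviations (two sets of data)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys))

X = [1, 2, 4, 5]          # data from the variance/covariance slides
Y = [3, 6, 4, 7]

pop_variance    = sum_of_squares(X) / len(X)             # sigma^2 = SS/N
sample_variance = sum_of_squares(X) / (len(X) - 1)       # s^2 = SS/(n-1)
covariance_xy   = sum_of_products(X, Y) / (len(X) - 1)   # COVxy = SP/(N-1) = 2.0
r_xy = sum_of_products(X, Y) / sqrt(sum_of_squares(X) * sum_of_squares(Y))  # r = 0.6

item = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1]    # hypothetical dichotomous responses
p = item_difficulty(item)                # P = 0.7, a fairly easy item
item_variance = p * (1 - p)              # variance of a 0/1 item, largest at P = 0.5

# Composite variance when subtests are treated as uncorrelated (slide 17):
subtest_variances = [4.0, 3.0, 5.0]      # hypothetical subtest variances
composite_variance = sum(subtest_variances)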
Slide 18: Variance of a Composite, sigma^2_C
What is the composite test score? Ex. the WAIS-IV Full Scale IQ, which consists of (a) the Verbal Comprehension Index, (b) the Perceptual Reasoning Index, (c) the Working Memory Index, and (d) the Processing Speed Index. With more than two subtests: sigma^2_C = sigma^2_a + sigma^2_b + sigma^2_c + sigma^2_d.

Slide 19: *Suggestions to Increase the Total Score Variance of a Test
1. Increase the number of items in the test.
2. Keep item difficulties (P) in the medium range.
3. Items with similar content have higher correlations and higher covariance.
4. Item score and total score variances alone are not indices (in-de-ceez) of test quality (reliability and validity).

Slide 20: *1. Increase the Number of Items in a Test (how to calculate the test variance)
The variance of a 25-item test is higher than the variance of a 20-item test: sigma^2_test = N(sigma^2_x) + N(N - 1)(COVx), where sigma^2_x = item variance, COVx = item covariance, and N = number of items in the test. Ex. if COVx = 0.10 and sigma^2_x = 0.20, then for N = 20 the test variance is 20(0.20) + 20(19)(0.10) = 42, and for N = 25 it is 25(0.20) + 25(24)(0.10) = 65.

Slide 21: 2. Item Difficulties
Item difficulties should be almost equal across items, and difficulty levels should be in the medium range.

Slide 22: 3. Items with Similar Content Have Higher Correlations and Higher Covariance

Slide 23: 4. Item Score and Total Score Variances Alone Are Not Indices (in-de-ceez) of Test Quality
Variance and covariance are important and necessary; however, they are not sufficient to determine test quality. To determine a higher level of test quality we use reliability and validity.

Slide 24: UNIT II, RELIABILITY
Chapter 6: Reliability and the Classical True Score Model. Chapter 7: Procedures for Estimating Reliability. Chapter 8: Introduction to Generalizability Theory. Chapter 9: Reliability Coefficients for Criterion-Referenced Tests.

Slide 25: CHAPTER 6, Reliability and the Classical True Score Model
Reliability (rho) is a measure of consistency/dependability: a test measures the same thing more than once and produces the same outcome. Reliability refers to the consistency of examinees' performance over repeated administrations of the same test or parallel forms of the test (Linda Crocker text).

Slide 26: THE MODERN MODELS

Slide 27: *Types of Reliability
- Test-Retest (a measure of stability), 2 administrations: administer the same test/measure at two different times to the same group of participants; coefficient: r(test1, test2). Ex. an IQ test.
- Parallel/Alternate (Interitem/Equivalent) Forms (a measure of equivalence), 2 administrations: administer two different forms of the same test to the same group of participants; coefficient: r(testA, testB). Ex. a stats test.
- Test-Retest with Alternate Forms (a measure of stability and equivalence), 2 administrations: on Monday, you administer form A to the 1st half of the group and form B to the 2nd half; on Friday, you administer form B to the 1st half and form A to the 2nd half; coefficient: r(testA, testB).
- Inter-Rater (a measure of agreement), 1 administration: have two raters rate behaviors and then determine the amount of agreement between them; coefficient: percentage of agreement.
- Internal Consistency (a measure of how consistently each item measures the same underlying construct), 1 administration: correlate performance on each item with overall performance across participants; coefficients: Cronbach's alpha, Kuder-Richardson, split-half, and Hoyt's methods.
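A small sketch of how the "2 administration" methods in the table above are scored: give the test twice (or give two forms) and correlate the two score lists. The data are the class IQ scores from the test-retest example on the next slide (slide 28); the computed value is only an illustration.

from math import sqrt

def pearson_r(xs, ys):
    """r = SP / sqrt(SSx * SSy)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sp  = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ssx = sum((x - mx) ** 2 for x in xs)
    ssy = sum((y - my) ** 2 for y in ys)
    return sp / sqrt(ssx * ssy)

monday = [125, 110, 130, 122, 115]   # John, Jo, Mary, Kathy, David (1st administration)
friday = [120, 112, 128, 120, 120]   # same students, 2nd administration

r_test_retest = pearson_r(monday, friday)   # about 0.89: scores are quite stable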
Slide 28: Test-Retest, Class IQ Scores
Student   X (1st time, Mon)   Y (2nd time, Fri)
John      125                 120
Jo        110                 112
Mary      130                 128
Kathy     122                 120
David     115                 120

Slide 29: Parallel/Alternate Forms, Scores on Two Forms of a Stats Test
Student   Form A   Form B
John      95       92
Jo        84       82
Mary      90       88
Kathy     76       80
David     81       78

Slide 30: Test-Retest with Alternate Forms
On Monday, you administer form A to the 1st half of the group and form B to the 2nd half.
Form A, 1st group (Mon): David 85, Mary 94, Jo 78, John 81, Kathy 67.
Form B, 2nd group (Mon): Mark 82, Jane 95, George 80, Mona 80, Maria 70.
(Next slide.)

Slide 31: Test-Retest with Alternate Forms
On Friday, you administer form B to the 1st half of the group and form A to the 2nd half.
Form B, 1st group (Fri): David 85, Mary 94, Jo 78, John 81, Kathy 67.
Form A, 2nd group (Fri): Mark 82, Jane 95, George 80, Mona 80, Maria 70.

Slide 32: How Reliability Is Measured
Reliability is measured using a correlation coefficient, r(test1, test2) or r(x, y). Reliability coefficients indicate how scores on one test change relative to scores on a second test. They can range from 0.00 to plus or minus 1.00: 1.00 = perfect reliability; 0.00 = no reliability.

Slide 33: THE CLASSICAL MODEL

Slide 34: A Conceptual Definition of Reliability, the Classical Model
Observed Score = True Score +/- Error Score, i.e., X = T +/- E. Error has two sources: method error and trait error.

Slide 35: Classical Test Theory
The Observed Score, X = T + E: X is the score you actually record or observe on a test.
The True Score, T = X - E: the difference between the observed score and the error score is the true score; T reflects the examinee's true knowledge.
The Error Score, E = X - T: the difference between the observed score and the true score is the error score; E represents the factors that cause the true score and the observed score to differ.

Slide 36: A Conceptual Definition of Reliability
Observed Score (X): X = T +/- E, the score that is actually observed. It consists of two components, the true score and the error score.

Slide 37: A Conceptual Definition of Reliability
True Score: T = X - E, a perfect reflection of the true value for the individual; it is a theoretical score.

Slide 38: A Conceptual Definition of Reliability
Method error is due to characteristics of the test or testing situation; trait error is due to characteristics of the individual. Conceptually, Reliability = True Score / (True Score + Error Score) = True Score / Observed Score. The reliability of the observed score becomes higher as error is reduced!

Slide 39: A Conceptual Definition of Reliability
Error Score: E = X - T, the difference (+/-) between the observed and true scores. X = T +/- E: e.g., 95 = 90 + 5 or 85 = 90 - 5; the difference between T and X is 5 points, so E = +/-5.

Slide 40: The Classical True Score Model
X = T +/- E, where X represents the observed test score, T represents the individual's true score (true knowledge), and E represents the random error component.

Slide 41: Classical Test Theory, What Makes Up the Error Score?
E = X - T. The error score consists of (1) method error and (2) trait error. Method error is the difference between true and observed scores resulting from the test or the testing situation. Trait error is the difference between true and observed scores resulting from characteristics of the examinees. (See next slide.)

Slide 42: What Makes Up the Error Score?
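A minimal sketch of the classical model X = T + E for a single examinee whose true score is 90 (the example from slide 39: 95 = 90 + 5, 85 = 90 - 5). The remaining observed scores are hypothetical, added only to show that the error score E = X - T can be positive or negative and averages out toward zero over repeated testings.

true_score = 90
observed = [95, 85, 92, 88, 90, 87, 93]    # first two from slide 39, rest hypothetical

errors = [x - true_score for x in observed]    # E = X - T for each administration
mean_error = sum(errors) / len(errors)         # close to 0 over many testings

for x, e in zip(observed, errors):
    # X = T + E holds for every administration by definition of E
    assert x == true_score + e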
Slide 43: Expected Value of the True Score
Definition of the true score: the true score is defined as the expected value of the examinee's test scores (the mean of the observed scores) over many repeated testings with the same test.

Slide 44: Error Score
Definition of the error score: the expected error score for an examinee over many repeated testings is zero. Since the true score is the expected observed score, e(Ej) = e(Xj) - Tj = Tj - Tj = 0, where e(Ej) = the expected value of the error score and Tj = examinee j's true score. (Ex. next.)

Slide 45: Error Score
X - E = T (equivalently X +/- E = T): the difference between the observed score and the error score is the true score. Scores from the same examinee, whose true score is 90: 98 - 8 = 90; 88 + 2 = 90; 80 + 10 = 90; 100 - 10 = 90; 95 - 5 = 90; 81 + 9 = 90; 88 + 2 = 90; 90 - 0 = 90. The errors sum to zero: -8 + 2 + 10 - 10 - 5 + 9 + 2 - 0 = 0.

Slide 46: *Increasing the Reliability of a Test (Meaning Decreasing Error), 7 Steps
1. Increase sample size (n).
2. Eliminate unclear questions.
3. Standardize testing conditions.
4. Moderate the degree of difficulty of the tests (P).
5. Minimize the effects of external events.
6. Standardize instructions (directions).
7. Maintain consistent scoring procedures (use a rubric).

Slide 47: *Increasing the Reliability of Your Items in a Test

Slide 48: *Increasing Reliability (cont.)

Slide 49: How Reliability (rho) Is Measured for an Item/Score
rho = True Score / (True Score + Error Score), or rho = T/(T + E), with 0 <= rho <= 1. Note: in this formula you always add the error (the difference between T and X) to the true score in the denominator, whether it is positive or negative.

Slide 50: Which Item Has the Highest Reliability?
The maximum score for this question is 10 points; rho = T/(T + E), with the error added regardless of sign:
Error +2, T = 8:  8/10 = 0.80
Error -3, T = 6:  6/9  = 0.666
Error +7, T = 1:  1/8  = 0.125
Error -1, T = 9:  9/10 = 0.90
Error +4, T = 6:  6/10 = 0.60
Error -4, T = 6:  6/10 = 0.60
Error +1, T = 7:  7/8  = 0.875
Error  0, T = 10: 10/10 = 1.0
Error -5, T = 4:  4/9  = 0.444
Error +6, T = 3:  3/9  = 0.333
The more error, the less reliable the score.

Slide 51: How Classical Reliability (rho) Is Measured for a Test
X = T + E; rho = T/X for an essay item/score. Examinees: X1 = t1 + e1, ex. 10 = 7 + 3; X2 = t2 + e2, ex. 8 = 5 + 3; X3 = t3 + e3, ex. 6 = 4 + 2. Then calculate sigma^2_X = 4 and sigma^2_T = 2.33.

Slide 52: How Classical Reliability (rho) Is Measured for a Test
Reliability coefficient for all items: rho_x1x2 = sigma^2_T / sigma^2_X. For the previous example, rho_x1x2 = 2.33/4.00 = 0.58.

Slide 53: How the Reliability Coefficient (rho) Is Measured for a Test
T + E = X: 3 + 2 = 5; 4 + 3 = 7; 8 + 6 = 13; 9 + 5 = 14; 2 + 1 = 3; 1 + 1 = 2; 8 + 1 = 9; 7 + 3 = 10.
rho = sigma^2_T / sigma^2_X = 9.643/19.554 = 0.493.

Slide 54: Reliability Coefficient (rho) for Parallel Test Forms
The reliability coefficient (rho) is the correlation between scores on parallel test forms. (Next slide.)

Slide 55: Scores on Parallel Test Forms (X +/- E = T)
Test A (X): 98 - 2 = 96; 88 + 2 = 90; 80 + 11 = 91; 100 - 8 = 92; 95 - 3 = 92; 81 + 12 = 93; 88 + 1 = 89; 90 - 3 = 87.
Test B (Y): 95 - 6 = 89; 80 + 6 = 86; 87 - 4 = 83; 75 + 12 = 87; 90 - 5 = 85; 82 - 2 = 80; 86 - 3 = 83; 85 + 6 = 91.
r = SP/sqrt(SSx * SSy) = 0.882.

Slide 56: *Reliability Coefficient and Reliability Index
Reliability coefficient: rho_x1x2 = sigma^2_T / sigma^2_X. Reliability index: rho_XT = sigma_T / sigma_X. Therefore rho_x1x2 = (rho_XT)^2, or rho_XT = sqrt(rho_x1x2), just like the relationship between sigma^2 and sigma. The higher the item-reliability index, the higher the internal consistency of the test.

Slide 57: *Reliability Coefficient and Reliability Index
The reliability coefficient, rho_x1x2 = sigma^2_T / sigma^2_X, is the correlation coefficient that expresses the degree of reliability of a test. The reliability index, rho_XT = sigma_T / sigma_X, is the correlation coefficient that expresses the degree of relationship between the true (T) and observed (X) scores of a test; it is the square root of the reliability coefficient.
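A sketch of the reliability coefficient and reliability index using the two worked examples above. The slides use the sample variance, SS/(n - 1), which reproduces sigma^2_T = 2.33, sigma^2_X = 4, rho = 0.58 and rho = 0.493.

from math import sqrt

def sample_variance(scores):
    """s^2 = SS / (n - 1)."""
    mean = sum(scores) / len(scores)
    return sum((x - mean) ** 2 for x in scores) / (len(scores) - 1)

def reliability_coefficient(true_scores, observed_scores):
    """rho_x1x2 = variance of true scores / variance of observed scores."""
    return sample_variance(true_scores) / sample_variance(observed_scores)

# Example from slides 51-52: X = 10, 8, 6 and T = 7, 5, 4
rho_1 = reliability_coefficient([7, 5, 4], [10, 8, 6])    # 2.33 / 4.00 = 0.58

# Example from slide 53: eight examinees, T + E = X
T = [3, 4, 8, 9, 2, 1, 8, 7]
X = [5, 7, 13, 14, 3, 2, 9, 10]
rho_2 = reliability_coefficient(T, X)                     # 9.643 / 19.554 = 0.493

reliability_index = sqrt(rho_2)   # rho_XT = sqrt(rho_x1x2), correlation of T with X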
Slide 58: Reliability of a Composite (C = a + b + ... + k)
Two ways to determine/predict the reliability of composite test scores:
*1. The Spearman-Brown prophecy formula: allows us to estimate the reliability of a composite of parallel tests when the reliability of one of those tests is known. (Ex. next.)
*2. Cronbach's alpha, or coefficient alpha.

Slide 59: *Next week: the split-half reliability method, which is the same as the Spearman-Brown prophecy formula when K = 2.

Slide 60: *1. Spearman-Brown Prophecy Formula

Slide 61: *1. Spearman-Brown Prophecy Formula

Slide 62: If N (or K) = 2, we can call it the split-half reliability method, which is used for measuring internal-consistency reliability (see next chapter). The effect of changing test length can also be estimated with the Spearman-Brown prophecy formula, just as test variance is increased by increasing the number of items in a test (Chapter 5).

Slide 63: *The Spearman-Brown prophecy formula is used for (a, b, c):
a. Correcting for one half of the test by estimating the reliability of the whole test.
b. Determining how many additional items are needed to increase reliability to a certain level.
c. Determining how many items can be eliminated without reducing reliability below a predetermined level.

Slide 64: Reliability of a Composite (C = a + b + ... + k)
*2. Cronbach's alpha, or coefficient alpha, is a preferred statistic. It allows us to estimate the reliability of a composite when we know the composite score variance and/or the covariances among all its components. (Next slide.)

Slide 65: Reliability of a Composite, *2. Cronbach's Alpha
alpha = rho_cc' = [K/(K - 1)] * (1 - sum(sigma^2_i) / sigma^2_C), where K = number of tests = 3; sigma^2_i = the variance of each test (sigma^2_ta = 2, sigma^2_tb = 3, sigma^2_tc = 4); and sigma^2_C = the composite score variance = 12.

Slide 66: The Standard Error of Measurement (sigma_E or sigma_M)
The standard error of measurement is the mean of the standard deviations (sigma) of the errors (E) made by several examinees; E = T - X. Error magnitudes per examinee per test:
Examinee   Test 1            Test 2   Test 3   Test 4
1          E = 95 - 90 = 5   4        3        4
2          E = 85 - 86 = 1   1        3        2
3          E = 90 - 95 = 5   3        1        3
4          E = 95 - 93 = 2   2        4        1
           sigma1            sigma2   sigma3   sigma4

Slide 67: *The Standard Error of Measurement (sigma_E)
1. Find the sigma of these errors (E) for all of the examinees' tests; the mean/average of these sigmas is called the standard error of measurement.
2. sigma_E = sigma_X * sqrt(1 - rho_xx'), where rho_xx' = r = the reliability coefficient (or use rho_x1x2 for parallel tests) and sigma_X = the standard deviation of the set of observed scores (X).

Slide 68: *The Standard Error of Measurement (sigma_E)
sigma_E is a tool used to estimate or infer how far an observed score (X) deviates from a true score (T). sigma_E = sigma_X * sqrt(1 - rho_xx'). With rho_xx' = r = 0.91 (using rho_x1x2 for parallel tests) and sigma_X = 10, sigma_E = 10 * sqrt(1 - 0.91) = 3. (Next slide.)

Slide 69: The Standard Error of Measurement (sigma_E)
This means the average difference between the true scores (T) and the observed scores (X) is 3 points across examinees; this value is the standard error of measurement.
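A sketch of the three Chapter 6 formulas discussed above. The Spearman-Brown formula is not spelled out in the transcript (slides 60-61 are title-only), so the standard textbook form is used here and its example reliability of 0.70 is hypothetical; the alpha and SEM examples reuse the numbers from slides 65 and 68.

from math import sqrt

def spearman_brown(rho_one_test, k):
    """Reliability of a composite of k parallel tests, given the reliability
    of one of them: rho_kk' = k*rho / (1 + (k - 1)*rho)."""
    return k * rho_one_test / (1 + (k - 1) * rho_one_test)

def cronbach_alpha(component_variances, composite_variance):
    """alpha = [K/(K - 1)] * (1 - sum(sigma^2_i) / sigma^2_C)."""
    k = len(component_variances)
    return (k / (k - 1)) * (1 - sum(component_variances) / composite_variance)

def standard_error_of_measurement(sd_observed, reliability):
    """sigma_E = sigma_X * sqrt(1 - rho_xx')."""
    return sd_observed * sqrt(1 - reliability)

split_half = spearman_brown(0.70, 2)           # doubling a test with rho = .70 gives about 0.82
alpha = cronbach_alpha([2, 3, 4], 12)          # slide 65 numbers: (3/2)*(1 - 9/12) = 0.375
sem = standard_error_of_measurement(10, 0.91)  # slide 68 numbers: 10 * sqrt(0.09) = 3.0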