CHAPTER 5
Test Scores as Composites
This chapter is about the quality of items in a test.
Test Scores as Composites
What is the Composite Test Score?
A composite test score is a total test score created by summing two or more subtest scores, e.g., the WAIS-IV Full Scale IQ, which consists of (1) the Verbal Comprehension Index, (2) the Perceptual Reasoning Index, (3) the Working Memory Index, and (4) the Processing Speed Index.
Qualifying examinations and the EPPP are also composite test scores.
Item Scoring Schemes
We have two different scoring systems:
1. Dichotomous scores
Dichotomous scores are restricted to 0 and 1, such as scores on true-false and multiple-choice questions.
2. Non-dichotomous scores
Non-dichotomous scores are not restricted to 0 and 1; they can have a range of possible points (1, 2, 3, 4, 5, ...), such as scores on essays.
Dichotomous Scheme Examples
1. The space between nerve cell endings is called the
a. Dendrite
b. Axon
c. Synapse
d. Neuron
(In this item, responses a, b, and d are scored 0; response c is scored 1.)
2. Teachers in public school systems should have the right to strike.
a. Agree
b. Disagree
(In this item, a response of Agree is scored 1; Disagree is scored 0. Or, you can use True and False as the response options.)
Practical Implications for Test Construction
Variance and covariance measure the quality of the items in a test; reliability and validity measure the quality of the entire test.
σ² = SS/N is used for one set of data.
Variance is the degree of variability of scores around the mean.
Practical Implications for Test Construction
Correlation is based on a statistic called covariance (COVxy or Sxy).
COVxy = SP/(N - 1) is used for two sets of data.
Covariance is a number that reflects the degree to which two variables vary together.
r = SP/√(SSx · SSy)
Variance
X: 1, 2, 4, 5
σ² = SS/N (population)
s² = SS/(n - 1) = SS/df (sample)
SS = Σx² - (Σx)²/N
SS = Σ(x - μ)²
SS = the sum of squared deviations from the mean
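As a quick check of the two SS formulas on this slide, here is a minimal Python sketch using the column of scores above (the variable names are mine):

```python
# Variance of X = [1, 2, 4, 5] using both SS formulas from the slide.
X = [1, 2, 4, 5]
N = len(X)
mean = sum(X) / N                                            # mu = 3.0

SS_definitional = sum((x - mean) ** 2 for x in X)            # sum((x - mu)^2) = 10.0
SS_computational = sum(x ** 2 for x in X) - sum(X) ** 2 / N  # 46 - 144/4 = 10.0

pop_variance = SS_definitional / N            # sigma^2 = SS/N = 2.5
sample_variance = SS_definitional / (N - 1)   # s^2 = SS/(n-1) = SS/df ~= 3.33
print(SS_definitional, SS_computational, pop_variance, sample_variance)
```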
Covariance
Covariance is a number that reflects the degree to which two variables vary together.
Original data:
X: 1, 2, 4, 5
Y: 3, 6, 4, 7
Covariance
COVxy = SP/(N - 1)
There are two ways to calculate SP:
SP = Σxy - (Σx · Σy)/N
SP = Σ(x - μx)(y - μy)
SP requires two sets of data; SS requires only one set of data.
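The paired data on the previous slide can verify both SP formulas and the resulting covariance and correlation; a minimal Python sketch (names are mine):

```python
import math

# SP, covariance, and r for the paired data (X: 1,2,4,5; Y: 3,6,4,7).
X = [1, 2, 4, 5]
Y = [3, 6, 4, 7]
N = len(X)

SP_computational = sum(x * y for x, y in zip(X, Y)) - sum(X) * sum(Y) / N  # 66 - 60 = 6.0
mx, my = sum(X) / N, sum(Y) / N
SP_definitional = sum((x - mx) * (y - my) for x, y in zip(X, Y))           # also 6.0

cov_xy = SP_computational / (N - 1)          # COVxy = SP/(N-1) = 2.0
SSx = sum((x - mx) ** 2 for x in X)          # 10.0
SSy = sum((y - my) ** 2 for y in Y)          # 10.0
r = SP_computational / math.sqrt(SSx * SSy)  # r = SP/sqrt(SSx*SSy) = 0.6
print(SP_computational, SP_definitional, cov_xy, r)
```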
Descriptive Statistics for
Dichotomous Data
Descriptive Statistics for
Dichotomous Data
Item Variance & Covariance
Descriptive Statistics for Dichotomous Data
P = item difficulty:
P = (# of examinees who answered the item correctly) / (total # of examinees), or P = f/N
(See handout.)
The higher the P value, the easier the item.
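The handout data are not reproduced in this transcript, so the sketch below uses made-up 0/1 item scores purely to illustrate P = f/N:

```python
# Item difficulty P = f/N: the proportion of examinees who answered correctly.
item_scores = [1, 0, 1, 1, 0, 1, 1, 1]   # hypothetical dichotomous scores, not the handout data
P = sum(item_scores) / len(item_scores)  # f/N = 6/8 = 0.75
print(P)                                 # higher P -> easier item
```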
Relationship between Item Difficulty (P) and σ²
(Figure: item variance σ², a measure of item quality, plotted against item difficulty P. The curve is zero at P = 0, the most difficult, and at P = 1, the easiest, and peaks at P = 0.5.)
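The relationship in the figure is the variance of a dichotomous (0/1) item, σ² = P(1 - P), which the following few lines tabulate:

```python
# Item variance for a dichotomous item: sigma^2 = P * (1 - P).
# It is 0 at P = 0 (too hard) and P = 1 (too easy), and largest at P = 0.5.
for P in [0.0, 0.1, 0.25, 0.5, 0.75, 0.9, 1.0]:
    print(f"P = {P:4.2f}  ->  variance = {P * (1 - P):.3f}")
```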
Non-dichotomous Scores Examples
1. Write a grammatically correct German sentence using the first person singular form of the verb verstehen. (A maximum of 3 points may be awarded, and partial credit may be given.)
2. An intellectually disabled person is a nonproductive member of society.
5. Strongly agree  4. Agree  3. No opinion  2. Disagree  1. Strongly disagree
(Scores can range from 1 to 5 points, with high scores indicating a positive attitude toward intellectually disabled citizens.)
Descriptive Statistics for Non-dichotomous Variables
Variance of a Composite (σ²C)
σ² = SS/N
σ²a = SSa/Na
σ²b = SSb/Nb
σ²C = σ²a + σ²b
Ex. from the WAIS-III: FSIQ = VIQ + PIQ.
If there are more than 2 subtests: σ²C = σ²a + σ²b + σ²c + ...
Calculate the variance for each subtest and add them up. (Strictly, the sum of subtest variances equals the composite variance only when the subtests are uncorrelated; correlated subtests add covariance terms, as in the test-variance formula on a later slide.)
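A minimal sketch of this rule with hypothetical subtest scores (assuming, as the slide does, that no covariance terms are added):

```python
# Composite variance by summing subtest variances: sigma^2_C = sigma^2_a + sigma^2_b.
def pop_variance(scores):
    m = sum(scores) / len(scores)
    return sum((x - m) ** 2 for x in scores) / len(scores)  # SS/N

subtest_a = [12, 15, 9, 14]   # hypothetical scores, not WAIS data
subtest_b = [11, 13, 8, 12]

var_c = pop_variance(subtest_a) + pop_variance(subtest_b)
print(pop_variance(subtest_a), pop_variance(subtest_b), var_c)  # 5.25  3.5  8.75
```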
Variance of a Composite (σ²C)
What is the composite test score? Ex. the WAIS-IV Full Scale IQ, which consists of (a) the Verbal Comprehension Index, (b) the Perceptual Reasoning Index, (c) the Working Memory Index, and (d) the Processing Speed Index.
With more than 2 subtests: σ²C = σ²a + σ²b + σ²c + σ²d
*Suggestions to Increase the Total Score Variance of a Test
1. Increase the number of items in the test.
2. Keep item difficulties (p) in the medium range.
3. Items with similar content have higher correlations and higher covariance.
4. Item score and total score variances alone are not indices of test quality (reliability and validity).
*1. Increase the Number of Items in a Test (how to calculate the test variance)
The variance for a test of 25 items is higher than the variance for a test of 20 items.
σ² = N(σ²x) + N(N - 1)(COVx)
Ex. COVx = item covariance = 0.10 and σ²x = item variance = 0.20.
First try N = 20 items: σ² = test variance = 20(0.20) + 20(19)(0.10) = 42.
Then try N = 25 items: σ² = test variance = 25(0.20) + 25(24)(0.10) = 65.
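Both results can be reproduced directly from the formula; a quick check in Python:

```python
# Total test variance from item statistics: sigma^2 = N*var_item + N*(N-1)*cov_item
def test_variance(n_items, item_var, item_cov):
    return n_items * item_var + n_items * (n_items - 1) * item_cov

print(test_variance(20, 0.20, 0.10))  # 4 + 38 = 42.0 for 20 items
print(test_variance(25, 0.20, 0.10))  # 5 + 60 = 65.0 for 25 items
```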
2. Item Difficulties
Item difficulties should be almost equal for all of the items, and difficulty levels should be in the medium range.
3-Items with Similar Content have Higher
Correlations & Higher Covariance
4. Item Score and Total Score Variances Alone are not Indices of Test Quality
Variance and covariance are important and necessary; however, they are not sufficient to determine test quality. To determine a higher level of test quality, we use reliability and validity.
UNIT II: RELIABILITY
CHAP 6: RELIABILITY AND THE CLASSICAL TRUE SCORE MODEL
CHAP 7: PROCEDURES FOR ESTIMATING RELIABILITY
CHAP 8: INTRODUCTION TO GENERALIZABILITY THEORY
CHAP 9: RELIABILITY COEFFICIENTS FOR CRITERION-REFERENCED TESTS
CHAPTER 6
Reliability and the Classical True Score Model
Reliability (p) is a measure of consistency/dependability: a test measures the same thing more than once and produces the same outcome.
Reliability refers to the consistency of examinees' performance over repeated administrations of the same test or parallel forms of the test (Linda Crocker text).
THE MODERN MODELS
*TYPES OF RELIABILITY
(Type of reliability | what it is | how you do it | what the reliability coefficient looks like)

Test-Retest (2 administrations): a measure of stability. Administer the same test/measure at two different times to the same group of participants. Coefficient: r test1·test2. Ex. an IQ test.

Parallel/Alternate (Interitem/Equivalent) Forms (2 administrations): a measure of equivalence. Administer two different forms of the same test to the same group of participants. Coefficient: r testA·testB. Ex. a stats test.

Test-Retest with Alternate Forms (2 administrations): a measure of stability and equivalence. On Monday, you administer form A to the 1st half of the group and form B to the second half; on Friday, you administer form B to the 1st half and form A to the 2nd half. Coefficient: r testA·testB.

Inter-Rater (1 administration): a measure of agreement. Have two raters rate behaviors and then determine the amount of agreement between them. Coefficient: percentage of agreement.

Internal Consistency (1 administration): a measure of how consistently each item measures the same underlying construct. Correlate performance on each item with overall performance across participants. Coefficients: Cronbach's alpha method, Kuder-Richardson method, split-half method, Hoyt's method.
Test-Retest
Class IQ scores, administered twice:
Student | X (1st time, Mon) | Y (2nd time, Fri)
John    | 125 | 120
Jo      | 110 | 112
Mary    | 130 | 128
Kathy   | 122 | 120
David   | 115 | 120
Parallel/Alternate Forms
Scores on 2 forms of a stats test:
Student | Form A | Form B
John    | 95 | 92
Jo      | 84 | 82
Mary    | 90 | 88
Kathy   | 76 | 80
David   | 81 | 78
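Neither this slide nor the previous one reports the resulting correlation, but the tables are complete, so a short sketch can compute r = SP/√(SSx·SSy) for both (the function and the rounded results are mine):

```python
import math

# Pearson r via the chapter's formula: r = SP / sqrt(SSx * SSy).
def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sp = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ssx = sum((a - mx) ** 2 for a in x)
    ssy = sum((b - my) ** 2 for b in y)
    return sp / math.sqrt(ssx * ssy)

# Test-retest: IQ scores on Monday (X) vs. Friday (Y).
print(pearson_r([125, 110, 130, 122, 115], [120, 112, 128, 120, 120]))  # ~0.89
# Parallel forms: Form A vs. Form B of the stats test.
print(pearson_r([95, 84, 90, 76, 81], [92, 82, 88, 80, 78]))            # ~0.93
```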
Test-Retest with Alternate Forms
On Monday, you administer form A to the 1st half of the group and form B to the second half. On Friday, you administer form B to the 1st half of the group and form A to the 2nd half.
Monday:
Form A, 1st group        Form B, 2nd group
David  85                Mark   82
Mary   94                Jane   95
Jo     78                George 80
John   81                Mona   80
Kathy  67                Maria  70
(Continued on the next slide.)
Test-Retest with Alternate Forms
On Friday, you administer form B to the 1st half of the group and form A to the second half.
Friday:
Form B, 1st group        Form A, 2nd group
David  85                Mark   82
Mary   94                Jane   95
Jo     78                George 80
John   81                Mona   80
Kathy  67                Maria  70
HOW RELIABILITY IS MEASURED
Reliability is measured by using a correlation coefficient: r test1·test2, or r x·y.
Reliability coefficients:
Indicate how scores on one test change relative to scores on a second test.
Can range from 0.00 to ±1.00:
• ±1.00 = perfect reliability
• 0.00 = no reliability
THE CLASSICAL MODEL
A CONCEPTUAL DEFINITION OF RELIABILITY: CLASSICAL MODEL
Observed Score = True Score ± Error Score
X = T ± E
(Error score = method error + trait error)
Classical Test Theory
The Observed Score: X = T + E. X is the score you actually record or observe on a test.
The True Score: T = X - E; the difference between the observed score and the error score is the true score. The T score reflects the examinee's true knowledge.
The Error Score: E = X - T; the difference between the observed score and the true score is the error score. E comprises the factors that cause the true score and the observed score to differ.
A CONCEPTUAL DEFINITION OF RELIABILITY
Observed Score = True Score ± Error Score; X = T ± E
The Observed Score (X) is the score actually observed. It consists of two components:
• True Score
• Error Score
A CONCEPTUAL DEFINITION OF RELIABILITY
Observed Score = True Score ± Error Score
The True Score, T = X - E, is a perfect reflection of the true value for an individual; it is a theoretical score.
A CONCEPTUAL DEFINITION OF RELIABILITY
Observed Score = True Score ± Error Score
Method error is due to characteristics of the test or testing situation; trait error is due to individual characteristics.
Conceptually: Reliability = True Score / (True Score + Error Score) = True Score / Observed Score
The reliability of the observed score becomes higher if error is reduced!
A CONCEPTUAL DEFINITION OF RELIABILITY, OR
Observed Score = True Score ± Error Score
The Error Score, E = X - T, is the difference (±) between the observed and true scores.
X = T ± E
Ex. 95 = 90 + 5 or 85 = 90 - 5; the difference between T and X is 5 points, so E = ±5.
The Classical True Score Model
X = T ± E
X = the observed test score
T = the individual's true score (true knowledge)
E = the random error component
Classical Test Theory
What makes up the Error Score? E = X - T
The error score consists of (1) method error and (2) trait error.
1. Method Error: the difference between true and observed scores resulting from the test or testing situation.
2. Trait Error: the difference between true and observed scores resulting from the characteristics of the examinees.
(See next slide.)
What Makes up the Error Score?
Expected Value of True Score
Definition of the True Score: the true score is defined as the expected value of the examinee's test scores (the mean of the observed scores) over many repeated testings with the same test.
Error Score
Definition of the Error Score: the expected value of an examinee's error scores over many repeated testings should be zero.
ε(Ej) = ε(Xj) - Tj = Tj - Tj = 0
ε(Ej) = the expected value of the error
Tj = examinee j's true score
(Ex. next slide.)
Error Score
X - E = T: the difference between the observed score and the error score is the true score (all scores are from the same examinee, X ± E = T):
98 - 8 = 90
88 + 2 = 90
80 + 10 = 90
100 - 10 = 90
95 - 5 = 90
81 + 9 = 90
88 + 2 = 90
90 - 0 = 90
The errors sum to zero: -8 + 2 + 10 - 10 - 5 + 9 + 2 - 0 = 0
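The same bookkeeping in a few lines of Python (using E = X - T, so the signs are reversed from the slide's list, but the sum is still zero):

```python
# Errors for one examinee (T = 90) over eight repeated testings sum to zero.
T = 90
observed = [98, 88, 80, 100, 95, 81, 88, 90]
errors = [x - T for x in observed]  # E = X - T
print(errors)                       # [8, -2, -10, 10, 5, -9, -2, 0]
print(sum(errors))                  # 0 -> the expected error score is zero
```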
*INCREASING THE RELIABILITY OF A TEST (Meaning Decreasing Error)
7 steps:
1. Increase sample size (n).
2. Eliminate unclear questions.
3. Standardize testing conditions.
4. Moderate the degree of difficulty of the tests (P).
5. Minimize the effects of external events.
6. Standardize instructions (directions).
7. Maintain consistent scoring procedures (use a rubric).
*Increasing the Reliability of Your Items in a Test
*Increasing Reliability (cont.)
(Details shown on the slides; not captured in this transcript.)
How Reliability (p) is Measured for an Item/Score
p = True Score / (True Score + Error Score), or p = T/(T + E)
p ranges from 0 to 1.
Note: in this formula you always add the error (the difference between T and X) to the true score in the denominator, whether the error is positive or negative:
p = T / (T + |E|)
Which Item has the Highest Reliability?
The maximum score for this question is 10 points. p = T/(T + E)
Error +2 → T = 8:  8/10 = 0.80
Error -3 → T = 6:  6/9 = 0.666
Error +7 → T = 1:  1/8 = 0.125
Error -1 → T = 9:  9/10 = 0.90
Error +4 → T = 6:  6/10 = 0.60
Error -4 → T = 6:  6/10 = 0.60
Error +1 → T = 7:  7/8 = 0.875
Error 0 → T = 10:  10/10 = 1.0
Error -5 → T = 4:  4/9 = 0.444
Error +6 → T = 3:  3/9 = 0.333
More error → less reliable.
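A small sketch that reproduces the whole list (T = points kept, |E| = points of error; the pairing of values is read off the slide):

```python
# p = T / (T + |E|): the error magnitude is always added in the denominator.
items = [(8, 2), (6, 3), (1, 7), (9, 1), (6, 4),
         (6, 4), (7, 1), (10, 0), (4, 5), (3, 6)]  # (T, |E|) pairs from the slide
for T, E in items:
    print(f"T = {T:2d}, |E| = {E}  ->  p = {T / (T + E):.3f}")
# The item with no error (T = 10, E = 0) has p = 1.0; more error -> less reliable.
```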
How Classical Reliability (p) is Measured for a Test
X = T + E; p = T/X for a single essay item/score.
Examinees:
1. X1 = t1 + e1   Ex. 10 = 7 + 3
2. X2 = t2 + e2   Ex. 8 = 5 + 3
3. X3 = t3 + e3   Ex. 6 = 4 + 2
Then calculate σ²X = 4 and σ²T = 2.33.
How Classical Reliability (p) is Measured for a Test
Reliability coefficient for all items:
px1x2 = σ²T/σ²X
px1x2 for the previous example = 2.33/4.00 = 0.58
pk = σ²T/σ²X
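Both this example and the eight-examinee example on the next slide reduce to the same two-line computation (sample variances, SS/(n - 1)):

```python
# Reliability coefficient p = sigma^2_T / sigma^2_X, with sample variances SS/(n-1).
def sample_var(scores):
    m = sum(scores) / len(scores)
    return sum((s - m) ** 2 for s in scores) / (len(scores) - 1)

T3, X3 = [7, 5, 4], [10, 8, 6]          # the three essay scores above
print(sample_var(T3) / sample_var(X3))  # 2.33 / 4.00 ~= 0.58

T8 = [3, 4, 8, 9, 2, 1, 8, 7]           # the next slide's example
X8 = [5, 7, 13, 14, 3, 2, 9, 10]
print(sample_var(T8) / sample_var(X8))  # 9.643 / 19.554 ~= 0.493
```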
How the Reliability Coefficient (p) is Measured for a Test
T ± E = X:
3 + 2 = 5
4 + 3 = 7
8 + 6 = 13
9 + 5 = 14
2 + 1 = 3
1 + 1 = 2
8 + 1 = 9
7 + 3 = 10
p = σ²T/σ²X = 9.643/19.554 = 0.493
Reliability Coefficient (p) for Parallel Test Forms
Reliability coefficient (p) = the correlation between scores on parallel test forms. (Next slide.)
Scores on Parallel Test Forms (X ± E = T)
X, Test A:           Y, Test B:
98 - 2 = 96          95 - 6 = 89
88 + 2 = 90          80 + 6 = 86
80 + 11 = 91         87 - 4 = 83
100 - 8 = 92         75 + 12 = 87
95 - 3 = 92          90 - 5 = 85
81 + 12 = 93         82 - 2 = 80
88 + 1 = 89          86 - 3 = 83
90 - 3 = 87          85 + 6 = 91
r = SP/√(SSx · SSy) = 0.882
*Reliability Coefficient and Reliability Index
Reliability coefficient: px1x2 = σ²T/σ²X
Reliability index: pxt = σT/σX
Therefore px1x2 = (pxt)², or pxt = √px1x2, just like the relationship between σ² and σ.
The higher the item-reliability index, the higher the internal consistency of the test.
*Reliability Coefficient and Reliability Index
Reliability coefficient: px1x2 = σ²T/σ²X. The reliability coefficient is the correlation coefficient that expresses the degree of reliability of a test.
Reliability index: pxt = σT/σX. The reliability index is the correlation coefficient that expresses the degree of relationship between the true (T) and observed (X) scores of a test. It is the square root (√) of the reliability coefficient.
Reliability of a Composite (C = a + b + ... + k)
Two ways to determine/predict the reliability of composite test scores:
*1. The Spearman-Brown prophecy formula: allows us to estimate the reliability of a composite of parallel tests when the reliability of one of these tests is known. (Ex. next.)
*2. Cronbach's alpha (α), or coefficient α.
*Next week: the split-half reliability method, which is the same as the Spearman-Brown prophecy formula when K = 2.
*1. Spearman-Brown Prophecy Formula
(Formula and worked example shown on the slides; not captured in this transcript. A sketch of the standard form appears after the next slide.)
If N or K = 2, we can call it the split-half reliability method, which is used for measuring internal-consistency reliability (see next chapter).
The effect of changing test length can also be estimated by using the Spearman-Brown prophecy formula, just like increasing the variance of a test by increasing the number of items in the test (Chapter 5). A sketch follows.
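The formula itself was not captured in this transcript, so the sketch below uses the standard Spearman-Brown form, p_new = k·p / (1 + (k - 1)·p), where k is the factor by which the test length changes:

```python
# Standard Spearman-Brown prophecy formula (assumed here; the slides' own
# rendering of the formula did not survive in the transcript).
def spearman_brown(p, k):
    """Predicted reliability when a test with reliability p is made k times as long."""
    return k * p / (1 + (k - 1) * p)

print(spearman_brown(0.70, 2))    # doubling a test with p = .70 -> ~0.82
print(spearman_brown(0.70, 0.5))  # halving it -> ~0.54
# With k = 2 this is exactly the split-half correction described above.
```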
*The Spearman-Brown prophecy formula is used for:
a. Correcting for one half of the test by estimating the reliability of the whole test.
b. Determining how many additional items are needed to increase reliability up to a certain level.
c. Determining how many items can be eliminated without reducing reliability below a predetermined level.
Reliability of a Composite (C = a + b + ... + k)
*2. Cronbach's alpha (α), or coefficient α, is a preferred statistic. It allows us to estimate the reliability of a composite when we know the composite score variance and/or the covariances among all its components. (Next slide.)
Reliability of a Composite (C = a + b + ... + k)
*2. Cronbach's alpha (α), or coefficient α:
α = pcc′ = [K/(K - 1)] · (1 - Σσ²i / σ²C)
K = # of tests = 3
σ²i = the variance of each test: σ²ta = 2, σ²tb = 3, σ²tc = 4
σ²C = composite score variance = 12
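Plugging these numbers in (a one-screen check; the result is not printed on the slide):

```python
# Cronbach's alpha: alpha = (K / (K - 1)) * (1 - sum(subtest variances) / composite variance)
K = 3
subtest_vars = [2, 3, 4]   # sigma^2_ta, sigma^2_tb, sigma^2_tc
composite_var = 12         # sigma^2_C
alpha = (K / (K - 1)) * (1 - sum(subtest_vars) / composite_var)
print(alpha)               # (3/2) * (1 - 9/12) = 0.375
```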
The Standard Error of Measurement (σE or σM)
The standard error of measurement is the mean of the standard deviations (σ) of all the errors (E) made by several examinees. E = T - X.
Errors over four testings:
Examinee | Test 1 | Test 2 | Test 3 | Test 4
1 | E = 95 - 90 = 5 | 4 | 3 | 4
2 | E = 85 - 86 = 1 | 1 | 3 | 2
3 | E = 90 - 95 = 5 | 3 | 1 | 3
4 | E = 95 - 93 = 2 | 2 | 4 | 1
(σ1, σ2, σ3, σ4 = the σ of each examinee's errors)
*The Standard Error of Measurement (σE)
1. Find the σs of these errors (E) for all of the examinees' tests. The mean/average of these σs is called the standard error of measurement.
2. σE = σx · √(1 - pxx′)
pxx′ = r = the reliability coefficient (or use px1x2 for parallel tests).
σx = the standard deviation of a set of observed scores (X).
*The Standard Error of Measurement (σE)
σE is a tool used to estimate or infer how far an observed score (X) deviates from a true score (T).
σE = σx · √(1 - pxx′)
pxx′ = r = the reliability coefficient (px1x2 for parallel tests) = 0.91
σx = the standard deviation of the set of observed scores = 10
σE = 3 (next slide)
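A two-line check of the formula with this slide's numbers:

```python
import math

# Standard error of measurement: sigma_E = sigma_X * sqrt(1 - p_xx').
sigma_x = 10                          # SD of the observed scores
p_xx = 0.91                           # reliability coefficient
print(sigma_x * math.sqrt(1 - p_xx))  # 10 * sqrt(0.09) = 3.0
```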
The Standard Error of Measurement (σE)
This means the average difference between the true scores (T) and the observed scores (X) is 3 points for all examinees, which is called the standard error of measurement.