Chapter 6 Item Analysis


Chapter 7 Item Analysis
In constructing a new test (or shortening or
lengthening an existing one), the final set of
items is usually identified through a process
known as item analysis.
—Linda Crocker
Both the validity and the reliability of any
test depend ultimately on the characteristics of
its items.
Two Approaches to Item Analysis
• Qualitative Analysis
• Quantitative Analysis
• Qualitative Analysis
includes the consideration of content validity (content and form of items), as well as the evaluation of items in terms of effective item-writing procedures.
• Quantitative Analysis
includes principally the measurement of item difficulty and item discrimination.
§1 Item Difficulty
1. Definition
The item difficulty for item i, $p_i$, is defined as the proportion of examinees who get that item correct.
Though the proportion of examinees passing an item traditionally has been called the item difficulty, this proportion logically should be called item easiness, because the proportion increases as the item becomes easier.
2. Estimation Methods
• Method for Dichotomously Scored Items
• Method for Polytomously Scored Items
• Grouping Method
• Method for Dichotomously Scored Items
$$P = \frac{R}{N} \tag{7.1}$$
P is the difficulty of a certain item.
R is the number of examinees who get that item correct.
N is the total number of examinees.
Example 1
Eighty high school students take a science achievement test. Sixty-one students pass item 1 and 32 students pass item 10. Calculate the difficulty of item 1 and of item 10 separately.
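The slide gives no key for Example 1, so the following Python sketch simply applies Formula 7.1 to the numbers stated there. The function name `item_difficulty` is an illustrative choice, not something from the slides.

```python
def item_difficulty(num_correct: int, num_examinees: int) -> float:
    """Proportion of examinees answering the item correctly (Formula 7.1)."""
    if num_examinees <= 0:
        raise ValueError("num_examinees must be positive")
    return num_correct / num_examinees


# Example 1: 80 examinees; 61 pass item 1 and 32 pass item 10.
p_item1 = item_difficulty(61, 80)
p_item10 = item_difficulty(32, 80)
print(f"item 1: P = {p_item1:.2f}; item 10: P = {p_item10:.2f}")
```

Item 1 has the larger P, so it is the easier of the two items.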
• Method for Polytomously Scored Items
$$P = \frac{\bar{X}}{X_{\max}} \tag{7.2}$$
$\bar{X}$ is the mean score of all examinees on the item.
$X_{\max}$ is the maximum possible score on that item.
Example 2
The maximum score on an open-ended item is 20 points, and the mean score of all examinees on this item is 11 points. What is the item difficulty?
Key: P = 11 / 20 = .55
• Grouping Method (Use of Extreme Groups)
Upper (U) and Lower (L) criterion groups are selected from the extremes of the distribution of test scores or job ratings.
T. L. Kelley (1939) proposed the upper and lower 27% as the optimal cut points when the total test scores are normally distributed.
$$P = \frac{P_U + P_L}{2} \tag{7.3}$$
$P_U$ is the proportion of examinees in the upper group who get the item correct.
$P_L$ is the proportion of examinees in the lower group who get the item correct.
Example 3
A language test is taken by 370 examinees. In the upper 27% extreme group, 64 examinees pass item 5; in the lower 27% extreme group, 33 examinees pass the same item. Compute the difficulty of item 5.
Key: .49
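The key for Example 3 can be retraced with a short Python sketch of Formula 7.3. It assumes that each extreme group holds 27% of the 370 examinees, rounded to a whole person; the slide does not state the group size explicitly, and `grouped_difficulty` is a hypothetical helper name.

```python
def grouped_difficulty(upper_correct: int, lower_correct: int, group_size: int) -> float:
    """Item difficulty from extreme groups: P = (P_U + P_L) / 2 (Formula 7.3)."""
    p_upper = upper_correct / group_size
    p_lower = lower_correct / group_size
    return (p_upper + p_lower) / 2


# Assumption: group size is 27% of the 370 examinees, rounded to a whole person.
group_size = round(0.27 * 370)
p_item5 = grouped_difficulty(64, 33, group_size)
print(group_size, p_item5)
```

Under that assumption each group contains 100 examinees, so P_U = .64, P_L = .33, and P is about .485, which the slide reports as .49.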
3. Correcting for Chance Effects on Item Difficulty for Multiple-Choice Items
$$CP = \frac{KP - 1}{K - 1} \tag{7.4}$$
CP is the corrected item difficulty.
P is the uncorrected item difficulty.
K is the number of choices for that item.
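Example 4
The difficulty of one five-choice item is .50, and the difficulty of another four-choice item is .53. Which item is more difficult?
ANSWER
$$CP_1 = \frac{KP - 1}{K - 1} = \frac{5 \times 0.50 - 1}{5 - 1} \approx 0.38$$
$$CP_2 = \frac{KP - 1}{K - 1} = \frac{4 \times 0.53 - 1}{4 - 1} \approx 0.37$$
So the four-choice item is more difficult: after the correction for chance, a smaller proportion of examinees answer it correctly.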
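As a check on Example 4, this Python sketch applies Formula 7.4 to both items and compares the corrected values. The helper name `corrected_difficulty` is illustrative only.

```python
def corrected_difficulty(p: float, k: int) -> float:
    """Chance-corrected difficulty CP = (K*P - 1) / (K - 1) (Formula 7.4)."""
    if k < 2:
        raise ValueError("an item needs at least two choices")
    return (k * p - 1) / (k - 1)


cp_five_choice = corrected_difficulty(0.50, 5)   # five-choice item, P = .50
cp_four_choice = corrected_difficulty(0.53, 4)   # four-choice item, P = .53

# The item with the smaller corrected proportion is the more difficult one.
harder = "four-choice" if cp_four_choice < cp_five_choice else "five-choice"
print(f"CP1 = {cp_five_choice:.3f}, CP2 = {cp_four_choice:.3f}, harder: {harder}")
```

The uncorrected values suggest the four-choice item is easier (.53 against .50); the ordering reverses once guessing is taken into account.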
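4. Item Difficulty and Discrimination
[Figure: discrimination plotted against difficulty]
If there are 100 persons in a population, the number of discriminations an item can make, that is, the number of pairs formed by one person who passes the item and one who fails it, can be calculated as follows:
P = .01: 1 × 99 = 99
P = .02: 2 × 98 = 196
P = .30: 30 × 70 = 2100
P = .50: 50 × 50 = 2500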
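The products above follow one rule: with N persons and difficulty P, the item separates R = N × P passers from N − R failers, giving R × (N − R) pairs. The Python sketch below reproduces the four values and searches for the maximum; the function name is an illustrative assumption.

```python
def num_discriminations(p: float, n: int = 100) -> int:
    """Number of passer/failer pairs an item creates among n persons."""
    passers = round(p * n)
    return passers * (n - passers)


counts = {p: num_discriminations(p) for p in (0.01, 0.02, 0.30, 0.50)}
print(counts)

# Search all possible numbers of passers (0..100) for the largest pair count.
best_passers = max(range(101), key=lambda r: r * (100 - r))
print(best_passers)
```

The pair count peaks when half of the group passes, which is why items of middle difficulty can discriminate the most.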
5. Test Difficulty and the Distribution of Test Scores
• How to Calculate the Test Difficulty
Two methods:
A. Calculate the mean of all item difficulties of the test.
B. Compute the ratio of the mean test score to the maximum possible test score.
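A minimal Python sketch of the two methods follows. The response matrix is invented purely for illustration (it is not data from the slides) and assumes every item is scored 0 or 1 with equal weight, in which case the two methods give the same value.

```python
# Hypothetical 0/1 responses: rows are examinees, columns are items.
responses = [
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 1, 0],
    [0, 0, 0, 1],
    [1, 1, 0, 0],
]
n_examinees = len(responses)
n_items = len(responses[0])

# Method A: mean of the item difficulties.
item_difficulties = [
    sum(row[i] for row in responses) / n_examinees for i in range(n_items)
]
method_a = sum(item_difficulties) / n_items

# Method B: mean total score divided by the maximum possible total score.
total_scores = [sum(row) for row in responses]
method_b = (sum(total_scores) / n_examinees) / n_items

print(method_a, method_b)
```

With items that carry different maximum scores, the two methods need not agree, so this equality should not be assumed in general.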
• Test Difficulty and the Distribution of Test Scores
[Figure: (a) positively skewed distribution; (b) negatively skewed distribution]
§2 Item Discrimination
When the test as a whole is to be evaluated by means of
criterion-related validation, the items may themselves be
evaluated and selected on the basis of their relationships to
the external criterion.
When high-scoring examinees have a high probability of answering an item correctly and low-scoring examinees have a low probability of answering it correctly, we say that the item discriminates, or differentiates, among the examinees.
1. Interpretation
Item discrimination refers to the degree to
which an item differentiates correctly among
test takers in the behavior that the test is
designed to measure.
2. Estimation Methods
• Index of Discrimination
(used for dichotomously scored items)
$$D = P_H - P_L \tag{7.5}$$
One or two cutting scores are set to divide the examinees into an upper-scoring group and a lower-scoring group.
$P_H$ is the proportion in the upper group who answer the item correctly, and $P_L$ is the proportion in the lower group who answer the item correctly.
Values of D may range from -1.00 to 1.00.
Example 1
A world history test is taken by 140 students.
(1) If the upper and lower groups are each defined as 27% of the examinees, how many examinees are in each group?
(2) If 18 examinees in the upper group and 6 examinees in the lower group answer item 5 correctly, calculate the discrimination index for item 5.
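The slide gives no key for this example. The Python sketch below works through both parts with Formula 7.5, assuming the group size is 27% of 140 rounded to a whole person; `discrimination_index` is a hypothetical helper name.

```python
def discrimination_index(upper_correct: int, lower_correct: int, group_size: int) -> float:
    """D = P_H - P_L for equal-size upper and lower groups (Formula 7.5)."""
    return (upper_correct - lower_correct) / group_size


# (1) Assumption: 27% of 140 examinees, rounded to a whole person.
group_size = round(0.27 * 140)

# (2) 18 correct in the upper group, 6 correct in the lower group.
d_item5 = discrimination_index(18, 6, group_size)
print(group_size, round(d_item5, 2))
```

Under that rounding assumption each group has 38 examinees and D is about .32.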
Example 2
Test data for 50 examinees on an 8-item job stress scale.
Item
1
2
3
4
5
6
PH
PL
.54
.32
D
.18 .25
.81 .47
.56 .11
.32
.05
.51
.10
.36
.27
7
8
.18
. 23
.63
.25
.56
.19
.41 -.05
.38
.37
Guidelines for Interpretation of D Value
D ≥ .40: the item is functioning quite satisfactorily
.30 ≤ D ≤ .39: little or no revision is required
.20 ≤ D ≤ .29: the item is marginal and needs revision
D ≤ .19: the item should be eliminated or completely revised
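These guidelines can be expressed as a small lookup, sketched here in Python. The bands on the slide are stated to two decimals; treating them as contiguous thresholds (for example, anything from .30 up to but not including .40 falls in the second band) is an assumption of this sketch, as is the function name.

```python
def interpret_d(d: float) -> str:
    """Map a discrimination index to the guideline wording on the slide."""
    if d >= 0.40:
        return "functioning quite satisfactorily"
    if d >= 0.30:
        return "little or no revision is required"
    if d >= 0.20:
        return "marginal and needs revision"
    return "should be eliminated or completely revised"


for d in (0.42, 0.32, 0.25, -0.06):
    print(f"D = {d:+.2f}: {interpret_d(d)}")
```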
• Correlation Indices of Item Discrimination
(1) Pearson Product Moment Correlation Coefficient
$$r_{XY} = \frac{\sum xy}{N s_X s_Y}$$
Here x and y are deviation scores, $x = X - \bar{X}$ and $y = Y - \bar{Y}$.
This formula is commonly used to estimate the degree of the relationship between item and criterion scores.
(2) Point Biserial Correlation
If we use the total test score as the criterion and the test item is scored 0 or 1, then we can use the following formula:
$$r_{pbi} = \frac{\bar{X}_p - \bar{X}_t}{s_t}\sqrt{\frac{p}{q}} \tag{7.6}$$
$\bar{X}_p$ is the mean test score of those who answer the item correctly.
$\bar{X}_t$ is the mean test score of the entire group.
$s_t$ is the standard deviation of test scores for the entire group.
p is the pass ratio of that item (difficulty).
q is the fail ratio of that item.
Example 3
Test Data of 15 Examinees
Examinee     1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
Test score  90  81  80  78  77  70  69  65  55  50  49  42  35  31  10
Item score   1   0   1   1   1   1   1   0   0   0   1   0   1   0   0
Note: $s_t = \sqrt{\dfrac{\sum (X - \bar{X})^2}{n}}$
$$\bar{X}_t = \frac{90 + 81 + 80 + \dots + 10}{15} = 58.80$$
$$s_t = \sqrt{\frac{58936 - 882^2 / 15}{15}} = 21.72$$
$$r_{pbi} = \frac{68.5 - 58.80}{21.72}\sqrt{\frac{.5333}{.4667}} = .48$$
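The arithmetic of Example 3 can be retraced with the Python sketch below, which applies Formula 7.6 to the 15 test scores and the 0/1 item scores read from the example's data table. The function name is illustrative, and the standard deviation uses the divide-by-n form given in the note.

```python
import math

test_scores = [90, 81, 80, 78, 77, 70, 69, 65, 55, 50, 49, 42, 35, 31, 10]
item_scores = [1, 0, 1, 1, 1, 1, 1, 0, 0, 0, 1, 0, 1, 0, 0]


def point_biserial(scores, item):
    """r_pbi = (mean_p - mean_t) / s_t * sqrt(p / q) (Formula 7.6)."""
    n = len(scores)
    mean_t = sum(scores) / n
    # Population standard deviation (divide by n), as in the note.
    s_t = math.sqrt(sum((x - mean_t) ** 2 for x in scores) / n)
    passed = [x for x, u in zip(scores, item) if u == 1]
    p = len(passed) / n
    q = 1 - p
    mean_p = sum(passed) / len(passed)
    return (mean_p - mean_t) / s_t * math.sqrt(p / q)


r_pbi = point_biserial(test_scores, item_scores)
print(round(r_pbi, 2))
```

Eight of the 15 examinees pass the item (p = .5333), their mean score is 68.5, and the result rounds to the .48 shown in the example.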
Transformation of Formula 7.6
$$r_{pbi} = \frac{\bar{X}_p - \bar{X}_q}{s_t}\sqrt{pq} \tag{7.7}$$
$\bar{X}_q$ is the mean test score of those who answer that item incorrectly.
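(3) Biserial Correlation Coefficient
$$r_b = \frac{\bar{X}_p - \bar{X}_t}{s_t} \cdot \frac{p}{Y}$$
or
$$r_b = \frac{\bar{X}_p - \bar{X}_q}{s_t} \cdot \frac{pq}{Y}$$
Y is the ordinate (height) of the standard normal curve at the point that divides the area under the curve into the proportions p and q.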
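To show how Y enters the biserial formula, the Python sketch below applies the first form to the 15-examinee data of Example 3. The slides do not work this case, so the result is an illustration rather than a value from the source. Y is taken as the standard normal density at the z-score that cuts off area p, computed with the standard library's `statistics.NormalDist`.

```python
import math
from statistics import NormalDist

test_scores = [90, 81, 80, 78, 77, 70, 69, 65, 55, 50, 49, 42, 35, 31, 10]
item_scores = [1, 0, 1, 1, 1, 1, 1, 0, 0, 0, 1, 0, 1, 0, 0]


def biserial(scores, item):
    """r_b = (mean_p - mean_t) / s_t * (p / Y); requires 0 < p < 1."""
    n = len(scores)
    mean_t = sum(scores) / n
    s_t = math.sqrt(sum((x - mean_t) ** 2 for x in scores) / n)
    passed = [x for x, u in zip(scores, item) if u == 1]
    p = len(passed) / n
    mean_p = sum(passed) / len(passed)
    # Y: height of the standard normal curve where it splits into areas p and q.
    std_normal = NormalDist()
    y = std_normal.pdf(std_normal.inv_cdf(p))
    return (mean_p - mean_t) / s_t * (p / y)


r_b = biserial(test_scores, item_scores)
print(round(r_b, 2))
```

For the same data the biserial coefficient comes out larger than the point biserial, which is the usual relationship between the two indices.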
(4) Correlation Between Items
a) Tetrachoric Correlation Coefficient
Each variable is created through dichotomizing an underlying normal distribution.
$$r_t = \cos\left(\frac{\sqrt{AD}}{\sqrt{AD} + \sqrt{BC}} \times 180^\circ\right) \tag{7.8}$$
                 Item i
                 0      1
Item j   1       A      B      A+B
         0       C      D      C+D
                 A+C    B+D
b) PHI Coefficient
$$r_\phi = \frac{BC - AD}{\sqrt{(A + B)(C + D)(A + C)(B + D)}} \tag{7.9}$$
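Both item-by-item indices can be computed from the four cell counts, as in the Python sketch below. It assumes, as the BC − AD numerator implies, that B and C count the examinees whose two responses agree and that A and D count those whose responses disagree. The counts are hypothetical, and the cosine expression is an approximation to the tetrachoric coefficient rather than an exact estimate.

```python
import math


def phi_coefficient(a: int, b: int, c: int, d: int) -> float:
    """Phi from a 2x2 table; the denominator uses the four marginal totals."""
    return (b * c - a * d) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))


def tetrachoric_approx(a: int, b: int, c: int, d: int) -> float:
    """Cosine approximation to the tetrachoric correlation (angle in degrees)."""
    root_ad = math.sqrt(a * d)
    root_bc = math.sqrt(b * c)
    return math.cos(math.radians(180 * root_ad / (root_ad + root_bc)))


# Hypothetical counts: B and C are the agreeing cells, A and D the disagreeing ones.
a, b, c, d = 10, 40, 30, 20
r_phi = phi_coefficient(a, b, c, d)
r_t = tetrachoric_approx(a, b, c, d)
print(round(r_phi, 3), round(r_t, 3))
```

When AD equals BC the angle is 90 degrees and the approximation returns zero, matching a phi of zero.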
• Variance for item
$$s_i^2 = \frac{\sum_{j=1}^{n} (X_{ij} - \bar{X}_i)^2}{n} \tag{7.10}$$
Difficulty and Discrimination
P      D
1.00   0.00
0.90   0.20
0.70   0.60
0.60   0.80
0.50   1.00
0.40   0.80
0.30   0.60
0.10   0.20
0.00   0.00
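The D values in this table match the largest D attainable at each difficulty level. That reading is an inference, not something stated on the slide: combining P = (P_U + P_L)/2 with D = P_U − P_L (taking the upper-group proportion P_H of Formula 7.5 to be P_U, and assuming equal-size groups), and keeping both proportions between 0 and 1, the maximum is 2 × min(P, 1 − P). A Python sketch of that bound:

```python
def max_discrimination(p: float) -> float:
    """Largest possible D = P_U - P_L when (P_U + P_L) / 2 equals p."""
    return 2 * min(p, 1 - p)


difficulties = [1.00, 0.90, 0.70, 0.60, 0.50, 0.40, 0.30, 0.10, 0.00]
max_d = [round(max_discrimination(p), 2) for p in difficulties]
print(max_d)
```

Only an item of difficulty .50 can reach D = 1.00; very easy and very hard items are capped near zero.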
§3 Application Case of Item Analysis
1. Procedures
• Select a representative sample of examinees and administer the test;
• Divide the examinees into an upper 27% (or 30%, etc.) group and a lower 27% group according to their test scores;
• Calculate PU and PL, then estimate P and D for each item;
• Compare the responses on the different choices for each item between the upper group and the lower group;
• Revise items.
2. Analysis Case
Number of Examinees on Each Choice
Item  Group    A    B    C    D  Omit  Key     P      D     rb
1     Upper    5   92    1    2     0  B    0.71   0.42   0.52
      Lower   22   50   12   16     0
2     Upper   58   10   15   16     1  A    0.42   0.32   0.33
      Lower   26   21   15   36     2
3     Upper   17   15   28   28    12  D    0.31  -0.06  -0.04
      Lower   25   11   19   34    11
4     Upper    1   44   14   36     5  C    0.12   0.04   0.08
      Lower    1   56   10   28     5
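The P and D columns of the analysis case can be recomputed from the choice counts with the Python sketch below. It uses the grouping formula P = (P_U + P_L)/2 and D = P_U − P_L, and takes each group size as 100 because every row of counts sums to 100. The dictionary layout is an illustrative choice; the rb column is not recomputed because it requires the individual total scores.

```python
GROUP_SIZE = 100  # each Upper/Lower row of counts sums to 100

# item number: (key, upper-group counts, lower-group counts)
items = {
    1: ("B", {"A": 5, "B": 92, "C": 1, "D": 2, "Omit": 0},
             {"A": 22, "B": 50, "C": 12, "D": 16, "Omit": 0}),
    2: ("A", {"A": 58, "B": 10, "C": 15, "D": 16, "Omit": 1},
             {"A": 26, "B": 21, "C": 15, "D": 36, "Omit": 2}),
    3: ("D", {"A": 17, "B": 15, "C": 28, "D": 28, "Omit": 12},
             {"A": 25, "B": 11, "C": 19, "D": 34, "Omit": 11}),
    4: ("C", {"A": 1, "B": 44, "C": 14, "D": 36, "Omit": 5},
             {"A": 1, "B": 56, "C": 10, "D": 28, "Omit": 5}),
}

results = {}
for number, (key, upper, lower) in items.items():
    upper_correct = upper[key]
    lower_correct = lower[key]
    p = (upper_correct + lower_correct) / (2 * GROUP_SIZE)
    d = (upper_correct - lower_correct) / GROUP_SIZE
    results[number] = (round(p, 2), round(d, 2))

print(results)
```

Item 3 stands out: its keyed choice D is selected by more lower-group than upper-group examinees, which produces the negative D.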
Choice Analysis
• Whether more examinees choose the correct choice than choose the wrong choices
• Whether a large number of examinees choose the wrong choices
• Whether more examinees in the upper group than in the lower group choose the correct choice
• Whether more examinees in the upper group than in the lower group choose a wrong choice
• Whether there is any choice that few examinees choose
• Whether there is any item that a considerable number of examinees leave unanswered