Transcript: Differential Item Functioning
Anatomy of the name
• DIFFERENTIAL – not differential calculus!
– Comparing two groups
• ITEM – focus on ONE item at a time
– Not the whole test
• FUNCTIONING – all we have is the item performance (1 or 0)
– Not about the content or format of the item
Is there any Differential Item Functioning between groups?
[Figure: proportion correct on ITEM01–ITEM06 for the Female and Male groups, unmatched.]
[Figure: proportion correct on ITEM01–ITEM06 for Female and Male students matched on total score.]
Why do we care about DIF?
• Part of the test validation process
– The test should be free of bias against minorities
• Necessary but not sufficient
– Inference or interpretation beyond the statistics must be involved
• Bias? DIF? Impact?
– DIF: conditional on ability
– Bias: pejorative in nature
– Impact: not conditional on ability
Definition of DIF
• An item has no DIF if the probability of getting the item right depends only on ability, not on group membership.
• An item has DIF if, after conditioning on ability, the probability of getting the item right still depends on group membership.
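In symbols (the notation here is mine, not from the slides), the no-DIF condition says that at every ability level the focal and base groups have the same chance of a correct response:

```latex
P(X = 1 \mid \theta, G = \text{focal}) = P(X = 1 \mid \theta, G = \text{base})
\quad \text{for all ability levels } \theta .
```

DIF means this equality fails for at least some ability level θ.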
Causes & Types of DIF
• Causes
– Construct irrelevant variance
– Opportunity to learn
• Types
– Adverse
– Benign

Cause (K-12)                    Type      Responsibility
Construct irrelevant variance   Adverse   MP Field
Opportunity to learn            Benign    Client
Some DIF Examples
• Meaning of "ascend" in an MCAS vocabulary test
• Potato salad example in a NAEP biology test
• Train schedule in an urban area in an LSAT logical-reasoning problem
• Color of a lemon, from ETS
Empirical Evidence
• A DIF statistic is a kind of function: y = f(x)
• Inputs:
– Item response vector
– Total score
– Group indicator
• Output:
– A number called the DIF index
Feverish World of DIF
• Virtually any categorical data analysis method can be used, since a DIF index is simply a mathematical function with the item response vector as its main input.
– Mantel-Haenszel method
– Standardization method
– Logistic regression method
– Dimensionality analysis
– IRT-based methods
One question, many answers
• Mantel-Haenszel method – difference expressed as a constant odds ratio
• Standardization method – difference in proportion correct
• Logistic regression method – coefficient estimate for the group variable
• Dimensionality analysis – a second dimension in the data
• IRT-based methods – area between two ICCs
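As one illustration of the Mantel-Haenszel approach listed above, here is a minimal sketch. At each matched score level we form a 2×2 table (base/focal × right/wrong) and pool them into a common odds ratio; the counts below are hypothetical, not from the presentation:

```python
# Mantel-Haenszel common odds ratio across matched score levels.
# alpha near 1.0 means no DIF; alpha > 1 favors the base group.
from math import log

def mantel_haenszel(tables):
    """tables: list of (base_right, base_wrong, focal_right, focal_wrong),
    one 2x2 table per matched score level."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in tables if a + b + c + d > 0)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in tables if a + b + c + d > 0)
    alpha = num / den
    return alpha, -2.35 * log(alpha)   # delta-scale version of the index

# Hypothetical counts at three score levels:
tables = [(30, 20, 12, 18), (45, 15, 20, 10), (50, 10, 25, 5)]
alpha, delta = mantel_haenszel(tables)   # here alpha > 1, delta < 0
```

A negative delta under this orientation indicates the item disadvantages the focal group.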
Area between two ICCs
[Figure: item characteristic curves for the Male and Female groups plotted over θ from −3 to 3; the area between the two ICCs is the DIF measure.]
DIF in MP
• Standardization method
• Index describing the degree of DIF
– Standardized P-difference
• Group comparisons
– Male–Female
– White–Black
– White–Hispanic
• Minimum of 200 examinees in a group
Classification of DIF
Scale of the standardized P-difference (D):

  −0.15  −0.10  −0.05    0    0.05   0.10   0.15
     C   |   B    |      A      |    B   |   C

i.e. A: |D| < 0.05; B: 0.05 ≤ |D| < 0.10; C: |D| ≥ 0.10.
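The A/B/C classification above can be sketched as a small helper (a minimal sketch of the cut-points only; operational rules may also involve statistical significance):

```python
# Classify an item by the size of its standardized P-difference.
def dif_category(std_p_dif):
    d = abs(std_p_dif)
    if d < 0.05:
        return "A"   # negligible DIF
    elif d < 0.10:
        return "B"   # moderate DIF
    else:
        return "C"   # large DIF

category = dif_category(-0.02)   # → "A"
```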
Grade  Subject  Form  Position  Item Number  Type  M/F Stat  M/F Cat
03     mat      00    01        201401       MC    -0.02     A
03     mat      00    02        201417       MC    -0.00     B
03     mat      00    03        226696       MC     0.03     A
03     mat      00    04        201459       MC     0.01     A
03     mat      00    08        201408       MC     0.01     A
03     mat      00    09        201286       MC    -0.06     A
03     mat      00    10        201604       MC    -0.05     A
Some more Jargon
• Matching variable (conditioning variable)
– Total score, theta score, or an external measure
• Focal group – the study group
• Base group – also called the reference group
[Figure: 2×2 count tables of right/wrong responses on the item of interest for the base (White) group and the focal (Black) group; once the groups are matched, we can study this item of interest for both the White group and the Black group.]
Impact vs. DIF
• Impact
– Difference between two groups in performance at the item level (and total-score level), without matching
• DIF
– Difference between two groups in performance at the item level AFTER the groups are matched on ability
Standardized P-Difference
1) Match the different groups by score level
2) At every score level, get the proportion correct for each group
3) Apply weighting to the difference in proportion correct
4) Accumulate these weighted differences across all score levels
5) Divide the sum of the weighted differences by the sum of the weights
Formal Definition of Standardized P-Difference
STD P-DIF = [ Σ_m w_m (P_fm − P_bm) ] / [ Σ_m w_m ]

where, at score level m:
• w_m : weighting factor
• P_fm : proportion correct of the focal group
• P_bm : proportion correct of the base group

Written out over the 41 score levels of a 40-item test, the numerator is:

Σ_m w_m (P_fm − P_bm) = w_0 (P_f0 − P_b0) + w_1 (P_f1 − P_b1) + w_2 (P_f2 − P_b2) + w_3 (P_f3 − P_b3) + … + w_14 (P_f14 − P_b14) + … + w_40 (P_f40 − P_b40)
Does it work?
• If we know in advance which items have DIF, we can test whether the method catches the DIF properly.
• We simulated data from a 40-item test. One item had DIF: we made it more difficult for one group than for the other.
• We ran the standardized P-difference procedure to evaluate DIF for each item.
• Ideally, the method would make the right decision on each item.
Data Simulation plan
• Examinees
– 2000 examinees in the focal group and 8000 in the base group
– Focal group ability: ~N(0, 1)
– Base group ability: ~N(1, 1)
• Items
– 40 MC items only
– 41 score levels (from 0 to 40)
• DIF setting
– Only 1 item has DIF
– For that item, the focal group's difficulty parameter is 1.0 higher than the base group's
– All other items have the same parameters for both groups
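The simulation plan above can be sketched under a Rasch (1PL) model. The item difficulties, the DIF item's index, and the seed are illustrative assumptions; the presentation does not specify the exact generating model:

```python
# Simulate a 40-item test where one item is 1.0 harder for the focal group.
import math, random
random.seed(1)

N_ITEMS, DIF_ITEM, DIF_SIZE = 40, 26, 1.0          # DIF_ITEM index is an assumption
difficulty = [random.gauss(0, 1) for _ in range(N_ITEMS)]

def p_correct(theta, b):
    return 1.0 / (1.0 + math.exp(-(theta - b)))    # Rasch success probability

def simulate(n_examinees, ability_mean, is_focal):
    responses = []
    for _ in range(n_examinees):
        theta = random.gauss(ability_mean, 1)
        row = []
        for j, b in enumerate(difficulty):
            if is_focal and j == DIF_ITEM:
                b += DIF_SIZE                      # the DIF item is harder for focal
            row.append(1 if random.random() < p_correct(theta, b) else 0)
        responses.append(row)
    return responses

focal = simulate(2000, 0.0, True)    # focal ability ~ N(0, 1)
base  = simulate(8000, 1.0, False)   # base ability ~ N(1, 1)
```

With the base group a full standard deviation more able, its raw-score distribution sits noticeably to the right of the focal group's, as in the figure that follows.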
Raw Score Distribution
[Figure: raw-score distributions for the base and focal groups, raw score 0–40 on the x-axis, relative frequency 0%–6% on the y-axis.]
[Figure: P values of ITEM01–ITEM07 for the base and focal groups.]
Non-DIF item (ITEM 26) P values by raw score (RS); weights sum to 2000:

RS  Weight  Pf    Pb      RS  Weight  Pf    Pb
 0      0   0.00  0.00    21     77   0.71  0.64
 1      0   0.00  0.00    22     55   0.65  0.69
 2      0   0.00  0.00    23     68   0.62  0.72
 3      3   0.00  0.00    24     66   0.70  0.66
 4      2   0.00  0.00    25     58   0.74  0.76
 5      5   0.00  0.22    26     63   0.75  0.79
 6     17   0.24  0.12    27     58   0.74  0.78
 7     28   0.21  0.16    28     52   0.77  0.80
 8     44   0.25  0.33    29     40   0.85  0.83
 9     48   0.27  0.32    30     47   0.77  0.82
10     69   0.32  0.35    31     43   0.91  0.85
11     76   0.28  0.32    32     41   0.83  0.88
12     96   0.41  0.40    33     51   0.80  0.88
13     93   0.45  0.42    34     35   0.97  0.86
14    104   0.47  0.46    35     26   0.96  0.90
15    101   0.52  0.49    36     23   0.91  0.91
16     94   0.59  0.54    37     27   0.93  0.94
17    106   0.56  0.58    38     18   1.00  0.92
18     86   0.58  0.58    39     11   1.00  0.98
19     79   0.68  0.62    40      8   1.00  1.00
20     82   0.68  0.61
DIF item (ITEM 27) P values by raw score (RS); weights sum to 2000:

RS  Weight  Pf    Pb      RS  Weight  Pf    Pb
 0      0   0.00  0.00    21     77   0.18  0.54
 1      0   0.00  0.00    22     55   0.24  0.60
 2      0   0.00  0.00    23     68   0.18  0.61
 3      3   0.00  0.00    24     66   0.30  0.69
 4      2   0.00  0.00    25     58   0.28  0.73
 5      5   0.00  0.06    26     63   0.35  0.76
 6     17   0.00  0.06    27     58   0.36  0.79
 7     28   0.04  0.07    28     52   0.35  0.75
 8     44   0.02  0.07    29     40   0.50  0.81
 9     48   0.06  0.07    30     47   0.30  0.83
10     69   0.06  0.12    31     43   0.47  0.87
11     76   0.04  0.12    32     41   0.61  0.88
12     96   0.04  0.12    33     51   0.71  0.89
13     93   0.09  0.23    34     35   0.74  0.94
14    104   0.05  0.22    35     26   0.73  0.95
15    101   0.09  0.29    36     23   0.91  0.97
16     94   0.07  0.34    37     27   0.81  0.98
17    106   0.12  0.32    38     18   0.78  0.99
18     86   0.13  0.38    39     11   1.00  0.98
19     79   0.14  0.52    40      8   1.00  1.00
20     82   0.20  0.48
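Plugging the ITEM 27 table above into the standardized P-difference formula confirms the method flags it (the weights and P values are copied from the table; the calculation itself is a sketch):

```python
# Standardized P-difference for ITEM 27, using the weights and P values above.
w = [0, 0, 0, 3, 2, 5, 17, 28, 44, 48, 69, 76, 96, 93, 104, 101, 94, 106, 86, 79,
     82, 77, 55, 68, 66, 58, 63, 58, 52, 40, 47, 43, 41, 51, 35, 26, 23, 27, 18, 11, 8]
pf = [0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.04, 0.02, 0.06, 0.06, 0.04, 0.04,
      0.09, 0.05, 0.09, 0.07, 0.12, 0.13, 0.14, 0.20, 0.18, 0.24, 0.18, 0.30, 0.28,
      0.35, 0.36, 0.35, 0.50, 0.30, 0.47, 0.61, 0.71, 0.74, 0.73, 0.91, 0.81, 0.78,
      1.00, 1.00]
pb = [0.00, 0.00, 0.00, 0.00, 0.00, 0.06, 0.06, 0.07, 0.07, 0.07, 0.12, 0.12, 0.12,
      0.23, 0.22, 0.29, 0.34, 0.32, 0.38, 0.52, 0.48, 0.54, 0.60, 0.61, 0.69, 0.73,
      0.76, 0.79, 0.75, 0.81, 0.83, 0.87, 0.88, 0.89, 0.94, 0.95, 0.97, 0.98, 0.99,
      0.98, 1.00]
d = sum(wi * (f - b) for wi, f, b in zip(w, pf, pb)) / sum(w)
# d comes out well below -0.10, i.e. a "C" item on the classification scale.
```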
[Figure: scatter plots of focal-group P values against base-group P values (both axes 0 to 1, base group on the x-axis) for ITEM 26 (non-DIF item) and ITEM 27 (DIF item).]
[Figure: STD_P_DIFFERENCE for ITEM01–ITEM40, y-axis from −0.3 to 0.15.]
Some more complexity?
• Double differential functioning?
– Discrimination parameter or point-biserial correlation
• How big is big?
– Hypothesis testing
• Spoiled onion in the basket?
– Purification of the matching criterion
• Polytomous items
– Testlet DIF