Transcript: Differential Item Functioning
Anatomy of the name
• DIFFERENTIAL – not differential calculus!
– Comparing two groups
• ITEM – focus on ONE item at a time
– Not the whole test
• FUNCTIONING – all we have is the item performance (1 or 0)
– Not about the content or format of the item
Is there any Differential Item Functioning between groups?
[Figure: proportion correct on ITEM01–ITEM06 for the Female and Male groups, unmatched.]
[Figure: proportion correct on ITEM01–ITEM06 for Female and Male students matched on total score.]
Why do we care about DIF?
• Part of the test validation process
– The test should be free of bias against minorities
• Necessary but not sufficient
– Inference or interpretation beyond the statistics must be involved
• Bias? DIF? Impact?
– DIF: conditional on ability
– Bias: pejorative in nature
– Impact: not conditional on ability
Definition of DIF
• An item has no DIF if the probability of getting the item right depends only on ability, not on group membership.
• An item has DIF if, after conditioning on ability, the probability of getting the item right still depends on group membership.
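In symbols (the notation here is mine, not from the slides), the no-DIF condition says that at every ability level the focal and base groups have the same chance of a correct response:

```latex
P(X = 1 \mid \theta, G = \text{focal}) = P(X = 1 \mid \theta, G = \text{base})
\quad \text{for all ability levels } \theta .
```

DIF means this equality fails for at least some ability level θ.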
Causes & Types of DIF
• Causes
– Construct irrelevant variance
– Opportunity to learn
• Types
– Adverse
– Benign

Cause (K-12)                    Type      Responsibility
Construct irrelevant variance   Adverse   MP Field
Opportunity to learn            Benign    Client
Some DIF Examples
• Meaning of "ascend" in an MCAS vocabulary test
• Potato salad example in a NAEP biology test
• Train schedule in an urban area in an LSAT logical-reasoning problem
• Color of a lemon, from ETS
Empirical Evidence
• A DIF statistic is a kind of function: y = f(x)
• Inputs:
– Item response vector
– Total score
– Group indicator
• Output:
– A number called the DIF index
Feverish World of DIF
• Virtually any categorical data analysis method can be used, since a DIF index is simply a mathematical function with the item response vector as its main input.
– Mantel-Haenszel method
– Standardization method
– Logistic regression method
– Dimensionality analysis
– IRT-based methods
One question, many answers
• Mantel-Haenszel method – difference expressed as a constant odds ratio
• Standardization method – difference in proportion correct
• Logistic regression method – coefficient estimate for the group variable
• Dimensionality analysis – a second dimension in the data
• IRT-based methods – area between two ICCs
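As one illustration of the Mantel-Haenszel approach listed above, here is a minimal sketch. At each matched score level we form a 2×2 table (base/focal × right/wrong) and pool them into a common odds ratio; the counts below are hypothetical, not from the presentation:

```python
# Mantel-Haenszel common odds ratio across matched score levels.
# alpha near 1.0 means no DIF; alpha > 1 favors the base group.
from math import log

def mantel_haenszel(tables):
    """tables: list of (base_right, base_wrong, focal_right, focal_wrong),
    one 2x2 table per matched score level."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in tables if a + b + c + d > 0)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in tables if a + b + c + d > 0)
    alpha = num / den
    return alpha, -2.35 * log(alpha)   # delta-scale version of the index

# Hypothetical counts at three score levels:
tables = [(30, 20, 12, 18), (45, 15, 20, 10), (50, 10, 25, 5)]
alpha, delta = mantel_haenszel(tables)   # here alpha > 1, delta < 0
```

A negative delta under this orientation indicates the item disadvantages the focal group.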
Area between two ICCs
[Figure: item characteristic curves for the Male and Female groups plotted over θ from −3 to 3; the area between the two ICCs is the DIF measure.]
DIF in MP
• Standardization method
• Index describing the degree of DIF
– Standardized P-difference
• Group comparisons
– Male–Female
– White–Black
– White–Hispanic
• Minimum of 200 examinees in a group
Classification of DIF
Scale of the standardized P-difference (D):

  −0.15  −0.10  −0.05    0    0.05   0.10   0.15
     C   |   B    |      A      |    B   |   C

i.e. A: |D| < 0.05; B: 0.05 ≤ |D| < 0.10; C: |D| ≥ 0.10.
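The A/B/C classification above can be sketched as a small helper (a minimal sketch of the cut-points only; operational rules may also involve statistical significance):

```python
# Classify an item by the size of its standardized P-difference.
def dif_category(std_p_dif):
    d = abs(std_p_dif)
    if d < 0.05:
        return "A"   # negligible DIF
    elif d < 0.10:
        return "B"   # moderate DIF
    else:
        return "C"   # large DIF

category = dif_category(-0.02)   # → "A"
```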
Grade  Subject  Form  Position  Item Number  Type  M/F Stat  M/F Cat
03     mat      00    01        201401       MC    -0.02     A
03     mat      00    02        201417       MC    -0.00     B
03     mat      00    03        226696       MC     0.03     A
03     mat      00    04        201459       MC     0.01     A
03     mat      00    08        201408       MC     0.01     A
03     mat      00    09        201286       MC    -0.06     A
03     mat      00    10        201604       MC    -0.05     A
Some more Jargon
• Matching variable (conditioning variable)
– Total score, theta score, or an external measure
• Focal group – the study group
• Base group – also called the reference group
[Figure: 2×2 count tables of right/wrong responses on the item of interest for the base (White) group and the focal (Black) group; once the groups are matched, we can study this item of interest for both the White group and the Black group.]
Impact vs. DIF
• Impact
– Difference between two groups in performance at the item level (and total-score level), without matching
• DIF
– Difference between two groups in performance at the item level AFTER the groups are matched on ability
Standardized P-Difference
1) Match the different groups by score level
2) At every score level, get the proportion correct for each group
3) Apply weighting to the difference in proportion correct
4) Accumulate these weighted differences across all score levels
5) Divide the sum of the weighted differences by the sum of the weights
Formal Definition of Standardized P-Difference
STD P-DIF = [ Σ_m w_m (P_fm − P_bm) ] / [ Σ_m w_m ]

where, at score level m:
• w_m : weighting factor
• P_fm : proportion correct of the focal group
• P_bm : proportion correct of the base group

Written out over the 41 score levels of a 40-item test, the numerator is:

Σ_m w_m (P_fm − P_bm) = w_0 (P_f0 − P_b0) + w_1 (P_f1 − P_b1) + w_2 (P_f2 − P_b2) + w_3 (P_f3 − P_b3) + … + w_14 (P_f14 − P_b14) + … + w_40 (P_f40 − P_b40)
Does it work?
• If we know in advance which items have DIF, we can test whether the method catches the DIF properly.
• We simulated data from a 40-item test. One item had DIF: we made it more difficult for one group than for the other.
• We ran the standardized P-difference procedure to evaluate DIF for each item.
• Ideally, the method would make the right decision on each item.
Data Simulation plan
• Examinees
– 2000 examinees in the focal group and 8000 in the base group
– Focal group ability: ~N(0, 1)
– Base group ability: ~N(1, 1)
• Items
– 40 MC items only
– 41 score levels (from 0 to 40)
• DIF setting
– Only 1 item has DIF
– For that item, the focal group's difficulty parameter is 1.0 higher than the base group's
– All other items have the same parameters for both groups
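The simulation plan above can be sketched under a Rasch (1PL) model. The item difficulties, the DIF item's index, and the seed are illustrative assumptions; the presentation does not specify the exact generating model:

```python
# Simulate a 40-item test where one item is 1.0 harder for the focal group.
import math, random
random.seed(1)

N_ITEMS, DIF_ITEM, DIF_SIZE = 40, 26, 1.0          # DIF_ITEM index is an assumption
difficulty = [random.gauss(0, 1) for _ in range(N_ITEMS)]

def p_correct(theta, b):
    return 1.0 / (1.0 + math.exp(-(theta - b)))    # Rasch success probability

def simulate(n_examinees, ability_mean, is_focal):
    responses = []
    for _ in range(n_examinees):
        theta = random.gauss(ability_mean, 1)
        row = []
        for j, b in enumerate(difficulty):
            if is_focal and j == DIF_ITEM:
                b += DIF_SIZE                      # the DIF item is harder for focal
            row.append(1 if random.random() < p_correct(theta, b) else 0)
        responses.append(row)
    return responses

focal = simulate(2000, 0.0, True)    # focal ability ~ N(0, 1)
base  = simulate(8000, 1.0, False)   # base ability ~ N(1, 1)
```

With the base group a full standard deviation more able, its raw-score distribution sits noticeably to the right of the focal group's, as in the figure that follows.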
Raw Score Distribution
[Figure: raw-score distributions for the base and focal groups, raw score 0–40 on the x-axis, relative frequency 0%–6% on the y-axis.]
[Figure: P values of ITEM01–ITEM07 for the base and focal groups.]
Non-DIF item (ITEM 26) P values by raw score (RS); weights sum to 2000:

RS  Weight  Pf    Pb      RS  Weight  Pf    Pb
 0      0   0.00  0.00    21     77   0.71  0.64
 1      0   0.00  0.00    22     55   0.65  0.69
 2      0   0.00  0.00    23     68   0.62  0.72
 3      3   0.00  0.00    24     66   0.70  0.66
 4      2   0.00  0.00    25     58   0.74  0.76
 5      5   0.00  0.22    26     63   0.75  0.79
 6     17   0.24  0.12    27     58   0.74  0.78
 7     28   0.21  0.16    28     52   0.77  0.80
 8     44   0.25  0.33    29     40   0.85  0.83
 9     48   0.27  0.32    30     47   0.77  0.82
10     69   0.32  0.35    31     43   0.91  0.85
11     76   0.28  0.32    32     41   0.83  0.88
12     96   0.41  0.40    33     51   0.80  0.88
13     93   0.45  0.42    34     35   0.97  0.86
14    104   0.47  0.46    35     26   0.96  0.90
15    101   0.52  0.49    36     23   0.91  0.91
16     94   0.59  0.54    37     27   0.93  0.94
17    106   0.56  0.58    38     18   1.00  0.92
18     86   0.58  0.58    39     11   1.00  0.98
19     79   0.68  0.62    40      8   1.00  1.00
20     82   0.68  0.61
DIF item (ITEM 27) P values by raw score (RS); weights sum to 2000:

RS  Weight  Pf    Pb      RS  Weight  Pf    Pb
 0      0   0.00  0.00    21     77   0.18  0.54
 1      0   0.00  0.00    22     55   0.24  0.60
 2      0   0.00  0.00    23     68   0.18  0.61
 3      3   0.00  0.00    24     66   0.30  0.69
 4      2   0.00  0.00    25     58   0.28  0.73
 5      5   0.00  0.06    26     63   0.35  0.76
 6     17   0.00  0.06    27     58   0.36  0.79
 7     28   0.04  0.07    28     52   0.35  0.75
 8     44   0.02  0.07    29     40   0.50  0.81
 9     48   0.06  0.07    30     47   0.30  0.83
10     69   0.06  0.12    31     43   0.47  0.87
11     76   0.04  0.12    32     41   0.61  0.88
12     96   0.04  0.12    33     51   0.71  0.89
13     93   0.09  0.23    34     35   0.74  0.94
14    104   0.05  0.22    35     26   0.73  0.95
15    101   0.09  0.29    36     23   0.91  0.97
16     94   0.07  0.34    37     27   0.81  0.98
17    106   0.12  0.32    38     18   0.78  0.99
18     86   0.13  0.38    39     11   1.00  0.98
19     79   0.14  0.52    40      8   1.00  1.00
20     82   0.20  0.48
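Plugging the ITEM 27 table above into the standardized P-difference formula confirms the method flags it (the weights and P values are copied from the table; the calculation itself is a sketch):

```python
# Standardized P-difference for ITEM 27, using the weights and P values above.
w = [0, 0, 0, 3, 2, 5, 17, 28, 44, 48, 69, 76, 96, 93, 104, 101, 94, 106, 86, 79,
     82, 77, 55, 68, 66, 58, 63, 58, 52, 40, 47, 43, 41, 51, 35, 26, 23, 27, 18, 11, 8]
pf = [0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.04, 0.02, 0.06, 0.06, 0.04, 0.04,
      0.09, 0.05, 0.09, 0.07, 0.12, 0.13, 0.14, 0.20, 0.18, 0.24, 0.18, 0.30, 0.28,
      0.35, 0.36, 0.35, 0.50, 0.30, 0.47, 0.61, 0.71, 0.74, 0.73, 0.91, 0.81, 0.78,
      1.00, 1.00]
pb = [0.00, 0.00, 0.00, 0.00, 0.00, 0.06, 0.06, 0.07, 0.07, 0.07, 0.12, 0.12, 0.12,
      0.23, 0.22, 0.29, 0.34, 0.32, 0.38, 0.52, 0.48, 0.54, 0.60, 0.61, 0.69, 0.73,
      0.76, 0.79, 0.75, 0.81, 0.83, 0.87, 0.88, 0.89, 0.94, 0.95, 0.97, 0.98, 0.99,
      0.98, 1.00]
d = sum(wi * (f - b) for wi, f, b in zip(w, pf, pb)) / sum(w)
# d comes out well below -0.10, i.e. a "C" item on the classification scale.
```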
[Figure: scatter plots of focal-group P values against base-group P values (both axes 0 to 1, base group on the x-axis) for ITEM 26 (non-DIF item) and ITEM 27 (DIF item).]
[Figure: STD_P_DIFFERENCE for ITEM01–ITEM40, y-axis from −0.3 to 0.15.]
Some more complexity?
• Double differential functioning?
– Discrimination parameter or point-biserial correlation
• How big is big?
– Hypothesis testing
• Spoiled onion in the basket?
– Purification of the matching criterion
• Polytomous items
– Testlet DIF