Transcript: IRT basics

IRT basics: Theory and parameter estimation
Wayne C. Lee, David Chuah, Patrick Wadlington, Steve Stark, & Sasha Chernyshenko
Overview

How do I begin a set of IRT analyses?
- What do I need?
  - Software
  - Data
- What do I do?
  - On-line!
  - Input/syntax files
  - Examination of output
“Eye-ARE-What?”

Item response theory (IRT) is a set of probabilistic models that describes the relationship between a respondent's magnitude on a construct (a.k.a. latent trait; e.g., extraversion, cognitive ability, affective commitment) and his or her probability of a particular response to an individual item.
But what does that buy you?

IRT provides more information than classical test theory (CTT):
- Classical test statistics depend on the particular set of items and the sample examined
- IRT item parameters, given adequate model fit, do not depend on the sample examined
- IRT can examine item bias/measurement equivalence and provides conditional standard errors of measurement
Before we begin…

Data preparation
- Raw data must be recoded where necessary: negatively worded items are reverse coded so that all items in the scale point in the positive direction
- Dichotomization (optional): collapsing multiple response options into two values (0, 1; wrong, right), as sketched below
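As an illustration of this preparation step (not part of the original tutorial), a minimal Python sketch; the item positions, Likert range, and cutoff are assumptions made up for the example.

# Hypothetical recoding step: reverse-code negatively worded items and
# (optionally) dichotomize 5-point Likert responses. The item positions and the
# cutoff below are made-up assumptions, not values from the tutorial.
LIKERT_MAX = 5
NEGATIVE_ITEMS = {1, 4}   # zero-based positions of negatively worded items (example)
CUTOFF = 4                # responses of 4 or 5 scored as 1, otherwise 0 (example rule)

def recode(responses):
    """Reverse-code negative items so all items point in the positive direction."""
    return [(LIKERT_MAX + 1 - r) if i in NEGATIVE_ITEMS else r
            for i, r in enumerate(responses)]

def dichotomize(responses):
    """Collapse 1-5 responses to 0/1 using the example cutoff."""
    return [1 if r >= CUTOFF else 0 for r in responses]

print(dichotomize(recode([5, 1, 4, 2, 5])))   # -> [1, 1, 1, 0, 0]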
Calibration and validation files

The data are split into two separate files:
- A calibration sample for estimating the IRT parameters
- A validation sample for assessing the fit of the model to the data
Data files for the programs that we will be discussing must be in ASCII/text format (one way of producing them is sketched below).
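A minimal sketch, assuming the recoded responses are held as (ID, responses) pairs, of one way to produce the two ASCII files; the 50/50 split, the random seed, and the AGR2_VAL.DAT file name are assumptions made for the example (AGR2_CAL.DAT is the calibration file named in the BILOG syntax later).

# Illustrative calibration/validation split written out as fixed-format ASCII.
import random

def split_and_write(records, cal_path="AGR2_CAL.DAT", val_path="AGR2_VAL.DAT", seed=1):
    """records: list of (id_string, [item responses]) tuples."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    half = len(shuffled) // 2
    for path, subset in ((cal_path, shuffled[:half]), (val_path, shuffled[half:])):
        with open(path, "w") as f:
            for pid, resp in subset:
                # 4-character ID followed by one column per item, no delimiters
                f.write(f"{pid:>4}" + "".join(str(r) for r in resp) + "\n")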
Investigating dimensionality

- The models presented make a common assumption of unidimensionality
- Hattie (1985) reviewed 30 techniques for assessing it
- Some propose examining the ratio of the 1st eigenvalue to the 2nd eigenvalue (Lord, 1980); see the sketch below
- On-line we describe how to examine the eigenvalues following Principal Axis Factoring (PAF)
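A sketch of the eigenvalue check, assuming the item responses sit in a NumPy array (rows = respondents, columns = items). Ordinary Pearson correlations are used only to keep the sketch self-contained; as the next slide notes, dichotomous data call for tetrachoric correlations, and the on-line materials use PAF eigenvalues.

# Rough eigenvalue check for a dominant first factor. Pearson correlations of
# the raw item scores are a stand-in here; substitute tetrachoric correlations
# (or PAF) as the tutorial recommends for dichotomous data.
import numpy as np

def first_to_second_eigenvalue_ratio(item_scores):
    corr = np.corrcoef(np.asarray(item_scores, dtype=float), rowvar=False)
    eigenvalues = np.sort(np.linalg.eigvalsh(corr))[::-1]   # largest first
    print("Eigenvalues:", np.round(eigenvalues, 3))
    return eigenvalues[0] / eigenvalues[1]   # large ratio suggests a dominant first factor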
PAF and scree plots

[Scree plot from Principal Axis Factoring showing a dominant first factor.]
If the data are dichotomous, factor analyze tetrachoric correlations; this assumes a continuum underlies the item responses.
Two models presented

- The Three-Parameter Logistic model (3PL)
  - For dichotomous data
  - E.g., cognitive ability tests
- Samejima's Graded Response model (SGR)
  - For polytomous data where options are ordered along a continuum
  - E.g., Likert scales
Both are common models among applied psychologists.
The 3PL model

Three item parameters:
- a = item discrimination
- b = item extremity/difficulty
- c = lower asymptote, “pseudo-guessing”
Theta refers to the latent trait.
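The model equation itself appears only as a graphic on the slide; for reference, the standard 3PL item response function, with the conventional D = 1.7 scaling constant (the same factor noted later in the MULTILOG output), is

P_i(\theta) = c_i + \frac{1 - c_i}{1 + \exp[-D\,a_i(\theta - b_i)]}, \qquad D = 1.7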
Effect of the “a” parameter
[IRF plot: a small “a” gives a flat curve, i.e., poor discrimination.]

Effect of the “a” parameter
[IRF plot: a larger “a” gives a steeper curve, i.e., better discrimination.]

Effect of the “b” parameter
[IRF plot: a low “b” shifts the curve to the left, an “easy” item.]

Effect of the “b” parameter
[IRF plot: a higher “b” shifts the curve to the right, a more difficult item. “b” is inversely related to the CTT p value.]

Effect of the “c” parameter
[IRF plot: with c = 0 the curve asymptotes at zero.]

Effect of the “c” parameter
[IRF plot: with c > 0, “low ability” respondents may still endorse the correct response.]
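A small sketch reusing the 3PL function above with made-up parameter values; it reproduces numerically the pattern the plots illustrate (flat curve for small a, steep curve for large a, nonzero floor when c > 0).

# Illustrative 3PL response probabilities at a few theta values. The parameter
# values are invented for the example; D = 1.7 is the usual scaling constant.
import math

def p_3pl(theta, a, b, c, D=1.7):
    """Probability of a correct/endorsed response under the 3PL model."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))

examples = {"small a": (0.4, 0.0, 0.0),
            "larger a": (2.0, 0.0, 0.0),
            "higher b with c = 0.2": (1.0, 1.5, 0.2)}
for label, (a, b, c) in examples.items():
    probs = [round(p_3pl(t, a, b, c), 2) for t in (-2, -1, 0, 1, 2)]
    print(f"{label:>22}: {probs}")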
Estimating 3PL parameters

DOS version of BILOG (Scientific Software)
- Easier to estimate parameters for a large number of scales or experimental groups
- Multiple files in the directory, but small size overall
- Data file must be saved as ASCII text
  - ID number
  - Individual responses
- Input file (ASCII text)
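As a small aid to reading the data layout used throughout (a 4-character ID field followed by ten one-character item responses, i.e., the (4A1,10A1) format in the syntax files), a sketch of parsing one record; the example line is invented.

# Parse one fixed-width record: 4 ID characters, then 10 one-character responses.
def parse_record(line, n_id_chars=4, n_items=10):
    ident = line[:n_id_chars].strip()
    responses = [int(ch) for ch in line[n_id_chars:n_id_chars + n_items]]
    return ident, responses

print(parse_record("00011011011101"))   # -> ('0001', [1, 0, 1, 1, 0, 1, 1, 1, 0, 1])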
BILOG input file (*.BLG)

AGREEABLENESS CALIBRATION FOR IRT TUTORIAL.
>COMMENT
>GLOBAL DFN='AGR2_CAL.DAT', NIDW=4, NPARM=3, OFNAME='OMIT.KEY', SAVE;
>SAVE SCO = 'AGR2_CAL.SCO', PARM = 'AGR2_CAL.PAR', COV = 'AGR2_CAL.COV';
>LENGTH NITEMS=(10);
>INPUT SAMPLE=99999;
(4A1,10A1)
>TEST TNAME=AGR;
>CALIB NQPT=40, CYC=100, NEW=30, CRIT=.001, PLOT=0;
>SCORE MET=2, IDIST=0, RSC=0, NOPRINT;

Annotations:
- First line: title line.
- >GLOBAL: DFN = data file name; NIDW = characters in the ID field; NPARM = number of item parameters (3 for the 3PL); OFNAME = file for missing/omitted responses.
- >SAVE: requested output files for scoring (SCO), parameters (PARM), and covariances (COV).
- >LENGTH NITEMS: number of items.
- >INPUT SAMPLE: sample size.
- (4A1,10A1): FORTRAN statement for reading the data (4-character ID, 10 item responses).
- >TEST TNAME: name of the scale/measure.
- >CALIB: estimation specifications (not the defaults for BILOG).
- >SCORE: scoring by maximum likelihood, no prior distribution of scale scores, no rescaling.
Phase one output file (*.PH1)

CLASSICAL ITEM STATISTICS FOR SUBTEST AGR

ITEM  NAME  NUMBER TRIED  NUMBER RIGHT  PERCENT  LOGIT/1.7  PEARSON  BISERIAL
   1  0001        1500.0        1158.0    0.772       0.72    0.535     0.742
   2  0002        1500.0         991.0    0.661       0.39    0.421     0.545
   3  0003        1500.0        1354.0    0.903       1.31    0.290     0.500
   4  0004        1500.0        1187.0    0.791       0.78    0.518     0.733
   5  0005        1500.0         970.0    0.647       0.36    0.566     0.728
   6  0006        1500.0        1203.0    0.802       0.82    0.362     0.519
   7  0007        1500.0         875.0    0.583       0.20    0.533     0.674
   8  0008        1500.0         810.0    0.540       0.09    0.473     0.594
   9  0009        1500.0        1022.0    0.681       0.45    0.415     0.542
  10  0010        1500.0         869.0    0.579       0.19    0.426     0.538

(PEARSON and BISERIAL are the item-test correlations.)
These classical statistics can indicate problems in parameter estimation.
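A rough sketch of how the key columns of this table can be computed from a 0/1 response matrix; BILOG's exact biserial computation is not reproduced, and the Pearson item-total correlation here is uncorrected, so treat it only as an approximation of the printed values.

# Approximate classical item statistics (proportion correct, LOGIT/1.7, and an
# uncorrected Pearson item-total correlation) from a 0/1 response matrix.
import math
import numpy as np

def classical_item_stats(scores):
    scores = np.asarray(scores, dtype=float)
    total = scores.sum(axis=1)
    for j in range(scores.shape[1]):
        p = scores[:, j].mean()
        logit = math.log(p / (1.0 - p)) / 1.7
        r_it = np.corrcoef(scores[:, j], total)[0, 1]
        print(f"item {j + 1:2d}  p = {p:.3f}  logit/1.7 = {logit:.2f}  r_it = {r_it:.3f}")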
Phase two output file (*.PH2)

CYCLE 12: LARGEST CHANGE = 0.00116
          -2 LOG LIKELIHOOD = 15181.4541
CYCLE 13: LARGEST CHANGE = 0.00071
[FULL NEWTON STEP]
          -2 LOG LIKELIHOOD = 15181.2347
CYCLE 14: LARGEST CHANGE = 0.00066

Check these cycles for convergence.
Phase three output file (*.PH3)

- Theta estimation
- Scoring of individual respondents
- Required for DTF analyses
Parameter file (specified, *.PAR)

[Screenshot of the saved *.PAR file: the title line is echoed, followed by one line per item (0001AGR, 0002AGR, …) containing the parameter estimates and their standard errors; the slide callouts mark the “a”, “b”, and “c” columns. The file ends with a FORTRAN format statement, (32X,2F12.6,12X,F12.6), for reading the parameters.]
PARTO3PL output (*.3PL)

ITEM              a          b          c
0001AGR    1.533393  -0.737439   0.147203
0002AGR    0.870309  -0.414371   0.132796
0003AGR    0.743095  -1.983831   0.197127
0004AGR    1.256263  -0.952323   0.090901
0005AGR    1.403904  -0.387767   0.056774
0006AGR    0.777440  -1.147869   0.173882
0007AGR    1.369223  -0.127368   0.088135
0008AGR    0.979045  -0.043135   0.056546
0009AGR    0.839144  -0.526234   0.129646
0010AGR    0.879683  -0.118738   0.101087

(The listing also carries two unlabeled columns per item, numerically equal to the intercept, -a*b, and to 1/a; only the a, b, and c columns are labeled on the slide.)
Scoring and covariance files

Like the *.PAR file, these must be specifically requested:
- *.COV provides the parameters as well as the variances/covariances between the parameters
  - Necessary for DIF analyses
- *.SCO provides ability score information for each respondent
Samejima's Graded Response model

Used when options are ordered along a continuum, as with Likert scales. In the model equation (see below):
- v = response to the polytomously scored item i
- k = a particular option
- a = discrimination parameter
- b = extremity parameter
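The equations on the slide are not reproduced in the transcript; the standard form of Samejima's graded response model, written with the same D = 1.7 constant, is

P^{*}_{ik}(\theta) = \frac{1}{1 + \exp[-D\,a_i(\theta - b_{ik})]}, \qquad
P(v = k \mid \theta) = P^{*}_{ik}(\theta) - P^{*}_{i,k+1}(\theta),

where P^{*}_{ik} is the probability of responding in option k or above, with P^{*}_{i1}(\theta) = 1 and P^{*}_{i,K+1}(\theta) = 0 for an item with K options.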
Sample SGR Plot
[Option response functions for an item with low discrimination (a = 0.4); callouts mark the “low option” and “high option” curves.]

Sample SGR Plot
[Option response functions for the same item with better discrimination (a = 2).]
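A sketch of the option response functions behind these two plots, using the graded response formulas above; the threshold (b) values are invented for a 5-option item and only the discrimination differs between the two cases.

# Option response probabilities under the graded response model (D = 1.7).
# The thresholds below are assumed values for a 5-option item.
import math

def grm_option_probs(theta, a, thresholds, D=1.7):
    """Return P(option k | theta) for k = 1 .. len(thresholds) + 1."""
    p_star = [1.0] + [1.0 / (1.0 + math.exp(-D * a * (theta - b))) for b in thresholds] + [0.0]
    return [p_star[k] - p_star[k + 1] for k in range(len(p_star) - 1)]

thresholds = [-2.0, -1.0, 0.5, 1.5]
for a in (0.4, 2.0):
    probs = [round(p, 2) for p in grm_option_probs(0.0, a, thresholds)]
    print(f"a = {a}: option probabilities at theta = 0 -> {probs}")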
Running MULTILOG

- MULTILOG for DOS
  - Example with DOS batch file
- INFORLOG with MULTILOG
  - INFORLOG is typically interactive
  - The process is automated with a batch file and an input file (described on-line)
    - *.IN1 (parameter estimation)
    - *.IN2 (scoring)
The first input file (*.IN1)

CALIBRATION OF AGREEABLENESS GRADED RESPONSE MODEL
>PRO IN RA NI=10 NE=1500 NCHAR=4 NG=1;
>TEST ALL GR NC=(5,5,5,5,5,5,5,5,5,5);
>EST NC=50;
>SAVE;
>END;
5
01234
1111111111
2222222222
3333333333
4444444444
5555555555
(4A1,10A1)

Annotations:
- First line: title line.
- >PRO: number of items (NI), number of examinees (NE), characters in the ID field (NCHAR), and a single group (NG=1).
- >TEST ALL GR: the SGR model; NC gives the number of options for each item.
- >EST NC=50: number of cycles for estimation.
- >END: end of the command syntax.
- The "5" and "01234" lines: five characters denoting the five response options.
- The lines of 1s through 5s: recoding of the options for MULTILOG.
- (4A1,10A1): FORTRAN statement for reading the data.
The second input file (*.IN2)

SCORING AGREEABLENESS SCALE SGR MODEL
>PRO SCORE IN RA NI=10 NE=1500 NCHAR=4 NG=1;
>TEST ALL GR NC=(5,5,5,5,5,5,5,5,5,5);
>START;
Y
>SAVE;
>END;
5
12345
1111111111
2222222222
3333333333
4444444444
5555555555
(4A1,10A1)

Annotations:
- >PRO SCORE: this is a scoring run.
- The "Y" after >START;: yes to INFORLOG (the item parameters are read from a separate file).
Running MULTILOG

Run the batch file:
- *.IN1 -> *.LS1 (the *.lis file renamed as *.ls1)
  - Ensure that the data were read in and the model was specified correctly
  - Also provides a report of the estimation procedure with the estimated item parameters
Things of note…
[Excerpt of the MULTILOG estimation output:]

ITEM 1: 5 GRADED CATEGORIES
          P(#)   ESTIMATE (S.E.)
  A        1      1.99  (0.12)
  B( 1)    2     -3.03  (0.18)
  B( 2)    3     -2.35  (0.11)
  B( 3)    4     -0.98  (0.06)
  B( 4)    5      2.01  (0.10)

  @THETA:  -2.0  -1.5  -1.0  -0.5   0.0   0.5   1.0   1.5   2.0
 I(THETA):  1.08  1.04  1.05  0.81  0.49  0.35  0.47  0.79  0.99

 OBSERVED AND EXPECTED COUNTS/PROPORTIONS IN CATEGORY(K):
               1     2     3     4     5
 OBS. FREQ.   21    44   277  1050   108
 OBS. PROP. 0.01  0.03  0.18  0.70  0.07
 EXP. PROP. 0.01  0.03  0.19  0.70  0.07

Notes from the slide callouts:
- “a” includes the 1.7 scaling factor.
- The observed frequencies for each option can suggest collapsing sparsely used options.
Scoring output

- *.IN2 -> *.LS2
- The last portion of the file contains the person parameters (estimated theta, standard error, the number of iterations used, and the respondent's ID number).
What now?

Review:
- Data requirements for IRT
- Two models: 3PL (dichotomous) and SGR (polytomous); more on-line!
MODFIT:
- Can plot IRFs and ORFs
- Model-data fit: input the item parameters and the validation sample