Kim - Measured Progress


IRT Fixed Parameter Calibration and Other
Approaches to Maintaining Item Parameters
on a Common Ability Scale
Seonghoon Kim, PhD
Keimyung University
Email: [email protected]
Presented at Measured Progress
on July 10, 2008
Overview
I. Nature of IRT Ability Scale
II. Three Approaches to Maintaining Item Parameters on a Common Scale
III. Principle of Fixed Parameter Calibration (FPC)
IV. Use of Computer Programs for FPC
V. Applications of FPC for Scaling and Equating
2
Reference Guide

This presentation is based on the following articles and on my recent thoughts and work on FPC:
Kim, S. (2006a). A comparative study of IRT fixed parameter calibration methods. Journal of Educational Measurement, 43(4), 355-381.
Kim, S. (2006b). A study on IRT fixed parameter calibration methods using BILOG-MG. Journal of Educational Evaluation, 19(1), 323-342.
Kim, S., & Kolen, M. J. (2006). Robustness to format effects of IRT linking methods for mixed-format tests. Applied Measurement in Education, 19(4), 357-381.
Kim, S., & Lee, W. (2006). An extension of four IRT linking methods for mixed-format tests. Journal of Educational Measurement, 43(1), 53-76.
Kim, S., & Kolen, M. J. (2007). Effects on scale linking of different definitions of criterion functions for the IRT characteristic curve methods. Journal of Educational and Behavioral Statistics, 32(4), 371-397.
3
I. Nature of IRT Ability Scale
Indeterminacy in IRT modeling

Item response function (IRF) and metrics




Two-parameter logistic (2PL) model
IRF = P(θ | a, b) = 1/[1+exp(-Da(θ-b))]
Suppose that θO = A θN + B
If aO = aN /A and bO = A bN + B,
P(θO | aO, bO) = P(θN | aN, bN)

Therefore, the IRF is invariant under a linear transformation of the
ability scale when the item parameters are transformed accordingly.
Thus, in practice, either θO or θN can be used; this is the
scale indeterminacy.
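The invariance stated on this slide can be checked numerically. The following is a minimal sketch, assuming D = 1.7 and purely hypothetical values for A, B, aN, and bN; it only illustrates that the two parameterizations give the same response probability.

```python
import math

D = 1.7  # logistic scaling constant (assumed value for this sketch)

def irf_2pl(theta, a, b):
    """Two-parameter logistic item response function."""
    return 1.0 / (1.0 + math.exp(-D * a * (theta - b)))

# Hypothetical linking coefficients and new-scale item parameters.
A, B = 1.2, 0.5
a_new, b_new = 0.9, -0.3

# Item parameters re-expressed on the old scale.
a_old = a_new / A
b_old = A * b_new + B

for theta_new in (-2.0, 0.0, 1.5):
    theta_old = A * theta_new + B
    p_new = irf_2pl(theta_new, a_new, b_new)
    p_old = irf_2pl(theta_old, a_old, b_old)
    # The two probabilities agree up to floating-point rounding.
    print(f"theta_N={theta_new:5.1f}  P_N={p_new:.6f}  P_O={p_old:.6f}")
```

Because aO(θO - bO) = aN(θN - bN) term by term, any choice of A > 0 and B gives the same result.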
4
I. Nature of IRT Ability Scale
“0, 1” Scaling vs. Rasch Scaling

“0, 1” scaling



Scaling by arbitrarily assuming that the mean (M)
and standard deviation (SD) of the ability
distribution are equal to 0 (origin) and 1 (unit).
Such arbitrary but “standardized” fixing is
unavoidable when the M and SD are unknown.
Rasch scaling


Setting the origin (0) of the scale at the average
difficulty of all items involved, while fixing the unit
at 1.
The fixed unit is guaranteed by the Rasch modeling.
5
I. Nature of IRT Ability Scale
Need for a Fixed Common Ability Scale

A fixed common scale should be used
across test administrations for several
reasons




To check the invariance property of item
parameters
To achieve comparability between item
parameters from different administrations
To develop an item pool
To conduct IRT equating
6
I. Nature of IRT Ability Scale
Need for a Fixed Common Ability Scale

Developing a common ability scale requires all new scales to be linked
to the fixed old scale θO.
[Figure: new scales θN1, θN2, and θN3 each linked to the fixed old scale θO]
7
I. Nature of IRT Ability Scale
Factors for Development of a Common Scale

Development of a fixed common scale is subject to
  Data collection design for IRT scaling and equating test forms
    Random groups design vs. Common-item nonequivalent groups design
  Scaling convention
    “0,1” scaling vs. Rasch scaling
  Item parameter estimation method
    Marginal maximum likelihood (MML) estimation vs. Joint maximum likelihood (JML) estimation
8
The Context
Assumed in This Presentation

Data collection design for IRT scaling and equating test forms
  Common-item nonequivalent groups (CING) design
    Anchor items (i.e., common items) link two test forms
Scaling convention
  “0, 1” scaling
    Group dependent
    In a random groups design, two “0, 1” scales from alternative forms may be considered equivalent.
Marginal Maximum Likelihood (MML) Estimation
  Estimation of item parameters
  Estimation of underlying ability distribution
    Quadrature weights are estimated at quadrature points.
9
Data Structure Illustration for the
CING Design
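[Figure: examinee-by-item data layout. The old form (Group 1) consists of the old form unique items plus the common (anchor) items; the new form (Group 2) consists of the common (anchor) items plus the new form unique items. The common items are administered to both the old and new groups.]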
10
II. Three Approaches to Maintaining a
Common Scale

Separate calibration by form and linking
  Estimate transformation coefficients A and B using two sets of item parameter estimates for the anchor items
  Use A and B to transform new form item parameter estimates into those on the old scale
Fixed parameter calibration (FPC)
  Holding the old form anchor item parameters fixed and estimating the new form non-anchor items
Concurrent calibration (aka multiple-group estimation)
  Combining new and old form data and estimating all item parameters and the underlying ability distributions, with the old group being designated as the reference-scale group
  Will not be addressed in detail in this presentation
11
II. Maintaining the Old Scale
Separate Calibration by Form and Linking

“0, 1” scales from two test forms
  Old form scale: θO (reference; fixed origin & unit)
  New form scale: θN (arbitrary origin & unit)
Scheme of linking two “0, 1” scales
  θO = A θN + B
[Figure: the θN axis (points -1, 0, 1) mapped onto the θO axis; the origin of θN falls at B on θO, and one unit on θN spans A units on θO]
12
II. Maintaining the Old Scale
Separate Calibration by Form and Linking



Linking ability scales is completed by placing all item
parameters from separate calibrations onto the fixed
old scale.
In the case of the 2PL model, given A and B, aN and bN
parameters from a new scale are transformed into
a* = aN /A and b* = A bN + B
In practice, A and B are estimated with item parameter
estimates from the old and new scales.




Mean-Sigma Method (Marco, 1977)
Mean-Mean Method (Loyd & Hoover, 1980)
Haebara Method (Haebara, 1980)
Stocking-Lord Method (Stocking & Lord, 1983)
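The two moment methods listed above can be sketched in a few lines. This is a minimal illustration, assuming the standard definitions (Mean-Sigma uses the means and standard deviations of the anchor b estimates; Mean-Mean uses the means of the anchor a and b estimates); the anchor values below are hypothetical and error-free, so both methods recover the coefficients used to generate them.

```python
from statistics import mean, pstdev

def mean_sigma(b_old, b_new):
    """Mean-Sigma coefficients from anchor difficulties on both scales."""
    A = pstdev(b_old) / pstdev(b_new)
    B = mean(b_old) - A * mean(b_new)
    return A, B

def mean_mean(a_old, a_new, b_old, b_new):
    """Mean-Mean coefficients from anchor discriminations and difficulties."""
    A = mean(a_new) / mean(a_old)
    B = mean(b_old) - A * mean(b_new)
    return A, B

def to_old_scale(a_new, b_new, A, B):
    """a* = aN / A and b* = A bN + B."""
    return a_new / A, A * b_new + B

# Hypothetical anchor parameters on the old scale.
a_old = [0.8, 1.0, 1.2, 0.9]
b_old = [-1.0, -0.2, 0.4, 1.1]

# Build matching new-scale values from known (hypothetical) coefficients.
true_A, true_B = 1.25, 0.4
a_new = [a * true_A for a in a_old]
b_new = [(b - true_B) / true_A for b in b_old]

A_ms, B_ms = mean_sigma(b_old, b_new)
A_mm, B_mm = mean_mean(a_old, a_new, b_old, b_new)
print("Mean-Sigma:", round(A_ms, 4), round(B_ms, 4))
print("Mean-Mean: ", round(A_mm, 4), round(B_mm, 4))
print("Item 1 on old scale:", to_old_scale(a_new[0], b_new[0], A_ms, B_ms))
```

With real estimates the two sets of anchor parameters contain sampling error, so the methods generally give different A and B; the characteristic curve methods (Haebara, Stocking-Lord) are not shown here.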
13
II. Maintaining the Old Scale
Comparative Performance


Suppose that the characteristic curve (Haebara or
Stocking-Lord) method is employed as a linking
method for the “separate calibration and linking”
approach.
The relative performance of the three alternative approaches
to maintaining the old scale depends on
whether the new form items are common items or not
(Hanson & Béguin, 2002; Kim, 2006b; Kim & Kolen, in
process).
For the common items, concurrent calibration would perform
best, due mainly to the larger sample size (new group + old
group) available for them than for the non-common items.
For the non-common items, the three approaches would
perform almost equally.
14
II. Maintaining the Old Scale
Comparative Performance
Possibility of Old Form Item Parameters Being Replaced
(When New Form Items Are Calibrated)
Method                              Unique Items   Anchor Items
Separate Calibration and Linking    No             Yes *
Concurrent Calibration              Yes **         Yes **
Fixed Parameter Calibration         No             No
Note. It is assumed that the old form item parameters were obtained beforehand.
* Parameters of old form anchor items can be replaced with those of new form anchor items.
** Old form item parameters can be changed by the inclusion of new form items; the change may be substantial for anchor items.
II. Maintaining the Old Scale
When is FPC most appropriate?


When using the “stable” old form anchor item
parameters to obtain or diagnose the
parameters of new form non-anchor items on
the fixed old scale
Note



Placing the parameters of new form non-anchor
items on the old scale is the focus.
Updating the old form item parameters is not a
concern at all.
The old form anchor items are assumed to have
stable parameter estimates because a large sample
was used for obtaining them.
16
III. Principle of FPC
Basics

Why
  To place the parameters of new form non-anchor items onto the fixed old scale
How
  Holding the old form anchor item parameters fixed and estimating the new form non-anchor items
Critical Process
  Estimating the underlying distribution of ability for the new form on the fixed old scale so that the new item parameters may be properly expressed on the old scale.
  By the IRT modeling, the underlying distribution can be estimated using both the new form data and the fixed anchor item parameters.
17
III. Principle of FPC
Schematic Illustration of Updating Priors and Underlying
Distributions of Ability
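[Figure: flow of prior updating in FPC. With the anchor item parameters (a1O, b1O, a2O, b2O, …) held fixed, EM iterations start from the 1st initial prior on θO and estimate the new item parameters (a1N, b1N, …, bJN). The 1st estimated ability distribution becomes the 2nd initial prior, the 2nd estimated ability distribution becomes the 3rd initial prior, and so on, ending with the final estimated ability distribution and the estimated new item parameters on the θO scale.]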
18
III. Principle of FPC
Numerical Expression: Multiple Prior Weights Updating
and Multiple EM Cycles (MWU-MEM)
Likelihood Function for Estimating New Form Non-Anchor Item Parameters
(Iteration s, quadrature point k, person i, data y, parameters Δ)
\[ \ell(\Delta_{NEW}) = \sum_{k=1}^{K} \sum_{i=1}^{N} \log f(y_i^{NEW} \mid q_k, \Delta_{NEW}) \; p(q_k \mid y_i, \Delta_{OLD}, \hat{\Delta}_{NEW}^{(s-1)}, \hat{\pi}^{(s-1)}) \]
Closed-Form Formula for Estimating Quadrature Weights of the Underlying Ability Distribution from the New Form Data
\[ \hat{\pi}_k^{(s)} = \frac{1}{N} \sum_{i=1}^{N} p(q_k \mid y_i, \Delta_{OLD}, \hat{\Delta}_{NEW}^{(s-1)}, \hat{\pi}^{(s-1)}) \]
Refer to Kim (2006a) for numerical details.
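The weight-updating formula can be illustrated with a small simulation. The sketch below is a simplification, not the full MWU-MEM procedure: it assumes the 2PL model, uses only fixed anchor items (no free items are estimated), and all item parameters, sample sizes, and iteration counts are hypothetical. It shows how repeatedly applying the closed-form update moves a N(0, 1) starting prior toward the distribution of a group simulated from N(1, 1).

```python
import math
import random

D = 1.7  # logistic scaling constant (assumed)

def p2pl(theta, a, b):
    return 1.0 / (1.0 + math.exp(-D * a * (theta - b)))

def update_weights(lik, weights):
    """One update: pi_k = (1/N) * sum_i p(q_k | y_i, fixed items, old pi)."""
    n = len(lik)
    new = [0.0] * len(weights)
    for row in lik:                          # one examinee at a time
        joint = [l * w for l, w in zip(row, weights)]
        total = sum(joint)
        for k, j in enumerate(joint):
            new[k] += j / (total * n)        # posterior weight, averaged over persons
    return new

def moments(points, weights):
    m = sum(q * w for q, w in zip(points, weights))
    v = sum((q - m) ** 2 * w for q, w in zip(points, weights))
    return m, math.sqrt(v)

random.seed(2008)
# Hypothetical fixed anchor parameters on the old scale.
a_fix = [0.6 + 0.03 * j for j in range(20)]
b_fix = [-2.0 + 0.2 * j for j in range(20)]
# New group simulated from N(1, 1) on the old scale.
thetas = [random.gauss(1.0, 1.0) for _ in range(800)]
data = [[int(random.random() < p2pl(t, a, b)) for a, b in zip(a_fix, b_fix)]
        for t in thetas]

points = [-4.0 + 8.0 * k / 30 for k in range(31)]
# Likelihood of each response pattern at each quadrature point;
# it never changes here because every item parameter is fixed.
lik = [[math.prod(p2pl(q, a, b) if y else 1.0 - p2pl(q, a, b)
                  for y, a, b in zip(row, a_fix, b_fix))
        for q in points] for row in data]

dens = [math.exp(-0.5 * q * q) for q in points]   # N(0, 1) starting prior
weights = [d / sum(dens) for d in dens]
for _ in range(60):
    weights = update_weights(lik, weights)

est_mean, est_sd = moments(points, weights)
print(f"estimated mean = {est_mean:.3f}, estimated SD = {est_sd:.3f}")
```

In the full procedure the free item parameters are re-estimated between weight updates, so the likelihood values change from cycle to cycle; here they are computed once because nothing is free.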
19
III. Principle of FPC
Summary of Key Points




The values of the fixed anchor item parameters are
expressed on the fixed old scale, so the origin and unit
of the ability scale for the new form data have been
already set. That is, we do not need to use “0, 1”
scaling for the new form data.
New form non-anchor item parameters should be
estimated using the new form underlying distribution
that is properly recovered on the fixed old scale.
As with ability estimates, the underlying distribution
can be estimated using the new form data and the
fixed anchor item parameters.
Fixing the anchor item parameters pulls the underlying
distribution onto the old scale gradually. Accordingly,
the new form item parameters are also pulled onto the
old scale.
20
III. Principle of FPC
Concerns about the Unstable Estimates of Anchor Item
Parameters




Unstable estimates of the fixed item parameters might
adversely affect the performance of FPC.
However, Kim (2006a) showed that FPC is robust to
sampling errors of the fixed item parameter estimates
in calibrating non-anchor items.
This seems to be because the new form data
collaborate with the fixed item parameters in “revealing”
the old scale.
In other words, as long as the sample size of the new
group is large enough, unstable estimates of the fixed
item parameters would have little effect on the
estimation of both the underlying distribution for the
new group and the non-anchor item parameters.
21
III. Principle of FPC
Two Alternatives to the MWU-MEM Method



Some computer programs, such as BILOG-MG,
do not update the prior quadrature weights
during EM cycles when conducting FPC.
The resulting posterior (quadrature) weights
would not properly represent the underlying
ability distribution for the new form data.
Two ad-hoc methods can be used to obtain
good estimates of the quadrature weights for
the underlying distribution.


Simple Transformation Prior Update (STPU) Method
Iterative-Run Prior Update (IRPU) Method
22
III. Principle of FPC
Two Alternatives to the MWU-MEM Method

Simple Transformation Prior Update (STPU)
Method


Uses A and B from a linking method to simply
update the prior ability distribution by transforming
the posterior distribution from the regular, separate
calibration with the new form. Then, conduct FPC
with the updated prior ability distribution.
Iterative-Run Prior Update (IRPU) Method

Uses iteratively updated prior ability distributions
through multiple FPC runs of BILOG-MG. An
estimated posterior distribution in a calibration run
is used as a prior distribution in the next calibration
until the sequential procedure minimizes the
difference between the two distributions.
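The transformation step of the STPU method amounts to rescaling the quadrature points while leaving the weights unchanged. The sketch below assumes that reading; the function name is hypothetical, and the five points, weights, and the A and B values are taken from the BILOG-MG illustration later in this presentation.

```python
def transform_prior(points, weights, A, B):
    """Rescale quadrature points by theta* = A*theta + B; keep the weights."""
    return [A * q + B for q in points], list(weights)

# First five quadrature points and posterior weights from the "0, 1" run.
points_01 = [-4.036, -3.767, -3.498, -3.229, -2.960]
weights_01 = [0.2163e-04, 0.7268e-04, 0.2169e-03, 0.5802e-03, 0.1392e-02]

A, B = 1.040535, 1.033264   # Mean-Sigma coefficients from the illustration
points_old, weights_old = transform_prior(points_01, weights_01, A, B)

for q, w in zip(points_old, weights_old):
    print(f"{q: .4E}  {w:.4E}")
```

The rescaled points and unchanged weights are then supplied as the prior for the FPC run. The IRPU method instead keeps the points fixed and feeds each run's posterior weights into the next run.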
23
III. Principle of FPC
Two Alternatives to the MWU-MEM Method

Kim (2006b) shows that the two ad hoc
methods for updating the prior ability
distribution work very well.



In recovering the parameters of non-anchor items,
the two methods perform almost as well as the
Stocking-Lord linking method and concurrent
calibration.
In practice, the STPU method may be preferred
due to simplicity.
The IRPU method has the same feature as the
MWU-MEM method, except for multiple runs of
FPC. Thus, theoretically, the IRPU method may
be more acceptable than the STPU method.
24
III. Principle of FPC
Caveats against Using “Constrained” Estimation for FPC

One might think that imposing strong
Bayesian priors on the fixed item parameters
and freeing the non-anchor item parameters
would function similarly to FPC.



A rationale for such constrained estimation can be
found in, for example, the BILOG (Mislevy & Bock,
1990) manual.
In theory, it sounds reasonable.
However, my experience suggests that using strong
priors to fix the anchor item parameters tends to
distort the non-fixed item parameters.
25
III. Principle of FPC
Caveats against Using “Constrained” Estimation for FPC


Note that in constrained estimation the
anchor item parameters are to be
estimated (although almost fixed), while
in FPC they are excluded from the
parameter list to be estimated.
Without a facility to update ability prior
weights, both the underlying distribution
and non-anchor item parameters would
be distorted.
26
IV. Use of Computer Programs for
FPC

BILOG-MG 7.0 (Zimowski et al., 2003)



The “FIX” option does not function properly
because the prior weights are not updated
during EM cycles (Kim, 2006a).
The STPU or IRPU method can be used.
PARSCALE 4.1 (Muraki & Bock, 2003)


For FPC to work properly, the “POSTERIOR”
option should be used (Kim, 2006a).
Without the “POSTERIOR” option, the STPU
or IRPU method can be used.
27
IV. Use of Computer Programs for FPC
Illustration of FPC with BILOG-MG

Data
  3,000 examinees for the new form data
    The data were obtained by simulating examinees from a Normal (1, 1) distribution, against the old group of N(0, 1) distribution.
  25-item multiple-choice (MC) test
FPC
  First 20 items fixed (item parameters are ready for use)
  Last 5 items freed
  The three-parameter logistic (3PL) model is used for item analyses.
Comparison of Default, STPU, and IRPU FPC methods
28
IV. Use of Computer Programs for FPC
Illustration of FPC with BILOG-MG
Command File (to Use the Default FPC Facility)
Default FPC with BILOG-MG
The examinee group (2) was sampled from N(1,1)
>COMMENT
Fixed-parameter calibration
>GLOBAL DFNAME='New.txt', PRNAME='Sample.PRM',
        NPARM=3, SAVE;
>SAVE PAR='itempar';
>LENGTH NITEMS=25;
>INPUT NTOT=25, SAMPLE=3000, NALT=5, NID=4;
>ITEMS INUM=(1(1)25),
       INAMES=(O01(1)O20, P01, P02, P03, P04, P05);
>TEST TNAME=G2_FIX, INUM=(1(1)25), FIX=(1(0)20, 0(0)5);
(4A1, T1, 25A1)
>CALIB NQPT=31, CYCLE=100, CRIT=0.001, NEWTON=1, NOADJUST;
>SCORE NOPRINT;
29
IV. Use of Computer Programs for FPC
Illustration of FPC with BILOG-MG
Data File (New.txt)
1111111111111111110111111
1111110100111111110011100
1111101000111111010111111
1111110111111111111111111
1111111110111111011011111
1111111110111011101011111
0111100100000100001001111
0110011110111111010011111
1111111111111111111111111
0111101111111011110011111
1111111111111110111111111
1111010111111111111011111
1111111110011111100011110
1111111111111111111111111
. . . . . . . .
. . . . . . . .
. . . . . . . .
Item Responses for
Anchor Items
30
IV. Use of Computer Programs for FPC
Illustration of FPC with BILOG-MG
Fixed Parameter File (Sample.PRM)
20        (No. of fixed items)
Item   a         b          c
01     0.48877   -1.76191   0.18850
02     0.78980   -1.51222   0.18301
03     0.86113   -1.46012   0.17266
04     0.59502   -1.07553   0.20835
05     0.81096   -0.79854   0.20981
06     0.84988   -0.62070   0.12481
07     0.59386   -0.30609   0.17302
08     0.79144   -0.07422   0.23463
09     0.51684    0.48596   0.20394
10     0.90287    1.19854   0.16761
11     0.50175   -2.00058   0.21263
12     0.81267   -1.53418   0.15649
13     1.16172   -1.22405   0.13872
14     0.52306   -1.01148   0.18519
15     0.74785   -0.84378   0.20893
16     0.77883   -0.68332   0.19013
17     0.88805   -0.41610   0.18126
18     0.90752    0.08592   0.17534
19     0.62818    0.65946   0.26229
20     0.85275    1.82052   0.13813
31
IV. Use of Computer Programs for FPC
Illustration of FPC with BILOG-MG
Command File for the STPU Method (Before Transformation)
Single Group “0, 1” Scaling,
Although the examinee group was sampled from N(1,1).
>COMMENT
STPU FPC before Transformation of Ability Points
>GLOBAL DFNAME='New.txt',
        NPARM=3, SAVE;
>SAVE PAR='sampleSim01.PAR';
>LENGTH NITEMS=25;
>INPUT NTOT=25, SAMPLE=3000, NALT=5, NID=4;
>ITEMS INUM=(1(1)25),
       INAMES=(O01(1)O20, P01, P02, P03, P04, P05);
>TEST TNAME=NO_FIX;
(4A1, T1, 25A1)
>CALIB NQPT=31, CYCLE=100, CRIT=0.001, NEWTON=0, IDIST=0;
>SCORE NOPRINT;
32
IV. Use of Computer Programs for FPC
Illustration of FPC with BILOG-MG
Posterior Distribution from “0, 1” Scaling for the STPU Method
QUADRATURE POINTS, POSTERIOR WEIGHTS, MEAN AND S.D.:
                     1            2            3            4            5
POINT      -0.4036E+01  -0.3767E+01  -0.3498E+01  -0.3229E+01  -0.2960E+01
POSTERIOR   0.2163E-04   0.7268E-04   0.2169E-03   0.5802E-03   0.1392E-02
                     6            7            8            9           10
POINT      -0.2691E+01  -0.2422E+01  -0.2153E+01  -0.1884E+01  -0.1615E+01
POSTERIOR   0.3030E-02   0.6054E-02   0.1104E-01   0.1842E-01   0.2878E-01
                    11           12           13           14           15
POINT      -0.1346E+01  -0.1076E+01  -0.8074E+00  -0.5384E+00  -0.2693E+00
POSTERIOR   0.4281E-01   0.5985E-01   0.7752E-01   0.9294E-01   0.1036E+00
                    16           17           18           19           20
POINT      -0.2361E-03   0.2688E+00   0.5379E+00   0.8069E+00   0.1076E+01
POSTERIOR   0.1074E+00   0.1034E+00   0.9265E-01   0.7725E-01   0.6001E-01
                    21           22           23           24           25
POINT       0.1345E+01   0.1614E+01   0.1883E+01   0.2152E+01   0.2421E+01
POSTERIOR   0.4343E-01   0.2927E-01   0.1837E-01   0.1073E-01   0.5841E-02
                    26           27           28           29           30
POINT       0.2690E+01   0.2959E+01   0.3228E+01   0.3498E+01   0.3767E+01
POSTERIOR   0.2957E-02   0.1399E-02   0.6105E-03   0.2514E-03   0.9631E-04
                    31
POINT       0.4036E+01
POSTERIOR   0.3212E-04
MEAN        0.00000
S.D.        1.00000
33
IV. Use of Computer Programs for FPC
Illustration of FPC with BILOG-MG
Command File for the STPU Method (After Transformation)
STPU FPC with Transformed Prior Points
The examinee group was sampled from N(1,1).
(Omitted: the same as the commands for the before-transformation “0, 1” calibration)
>TEST TNAME=G2_FIX, INUM=(1(1)25), FIX=(1(0)20, 0(0)5);
(4A1, T1, 25A1)
>CALIB NQPT=31, CYCLE=100, CRIT=0.001, NEWTON=0, IDIST=1,
       NOADJUST;
>QUAD POINTS=(
-3.1663E+000 -2.8864E+000 -2.6065E+000 -2.3266E+000 -2.0467E+000
-1.7668E+000 -1.4869E+000 -1.2070E+000 -9.2710E-001 -6.4720E-001
-3.6730E-001 -8.6352E-002 1.9314E-001 4.7304E-001 7.5305E-001
1.0330E+000 1.3130E+000 1.5930E+000 1.8729E+000 2.1529E+000
2.4328E+000 2.7127E+000 2.9926E+000 3.2725E+000 3.5524E+000
3.8323E+000 4.1122E+000 4.3921E+000 4.6731E+000 4.9530E+000
5.2329E+000),
WEIGHTS=(
2.1630E-005 7.2680E-005 2.1690E-004 5.8020E-004 1.3920E-003
3.0300E-003 6.0540E-003 1.1040E-002 1.8420E-002 2.8780E-002
4.2810E-002 5.9850E-002 7.7520E-002 9.2940E-002 1.0360E-001
1.0740E-001 1.0340E-001 9.2650E-002 7.7250E-002 6.0010E-002
4.3430E-002 2.9270E-002 1.8370E-002 1.0730E-002 5.8410E-003
2.9570E-003 1.3990E-003 6.1050E-004 2.5140E-004 9.6310E-005
3.2120E-005);
>SCORE NOPRINT;
Note. The POINTS are rescaled by θ* = Aθ + B, with A = 1.040535 and B = 1.033264. The WEIGHTS are those from the “0, 1” scaling (not transformed).
34
IV. Use of Computer Programs for FPC
Illustration of FPC with BILOG-MG
2nd Command File for the IRPU Method
IRPU FPC with Updated Prior Weights
The examinee group was sampled from N(1,1).
(Omitted: the same as the commands for the default FPC run)
>TEST TNAME=G2_FIX, INUM=(1(1)25), FIX=(1(0)20, 0(0)5);
(4A1, T1, 25A1)
>CALIB NQPT=31, CYCLE=100, CRIT=0.001, NEWTON=0, IDIST=1,
       NOADJUST;
>QUAD POINTS=(
-4.0000E+000 -3.7330E+000 -3.4670E+000 -3.2000E+000 -2.9330E+000
-2.6670E+000 -2.4000E+000 -2.1330E+000 -1.8670E+000 -1.6000E+000
-1.3330E+000 -1.0670E+000 -8.0000E-001 -5.3330E-001 -2.6670E-001
-7.7720E-016 2.6670E-001 5.3330E-001 8.0000E-001 1.0670E+000
1.3330E+000 1.6000E+000 1.8670E+000 2.1330E+000 2.4000E+000
2.6670E+000 2.9330E+000 3.2000E+000 3.4670E+000 3.7330E+000
4.0000E+000),
WEIGHTS=(
8.8370E-007 3.0840E-006 1.0040E-005 3.1720E-005 9.4690E-005
2.5560E-004 6.3580E-004 1.4490E-003 3.0500E-003 6.0110E-003
1.1060E-002 1.8890E-002 3.0200E-002 4.5590E-002 6.4400E-002
8.4190E-002 1.0160E-001 1.1300E-001 1.1550E-001 1.0830E-001
9.2970E-002 7.3160E-002 5.2690E-002 3.4660E-002 2.0800E-002
1.1400E-002 5.7180E-003 2.6290E-003 1.1160E-003 4.3390E-004
1.5790E-004);
>SCORE NOPRINT;
Note. The POINTS are fixed (-4.0 to 4.0). The WEIGHTS are updated weights (= posterior weights from the 1st run of IRPU FPC).
35
IV. Use of Computer Programs for FPC
Illustration of FPC with BILOG-MG
History of Updated Posterior Distributions by the IRPU Method
Iter#   Mean    Std. Dev.
0       0.000   1.000
1       0.699   0.923   (from Default FPC)
2       0.876   0.921
3       0.933   0.932
4       0.954   0.943
5       0.963   0.951
6       0.967   0.956
7       0.969   0.960
8       0.971   0.963
9       0.972   0.965
10      0.973   0.966
11      0.973   0.967
12      0.974   0.968
Note. Iterations stopped because the M and SD did not change beyond the 0.001 limit.
36
IV. Use of Computer Programs for FPC
Illustration of FPC with BILOG-MG
FPC Estimates of Non-Anchor Item Parameters
on the Fixed Old Scale
Mean/Sigma
Item   a       b        c
21     0.591   -1.947   0.212
22     0.831   -1.643   0.222
23     1.027   -1.781   0.196
24     0.566   -0.988   0.213
25     0.605   -0.727   0.206

Default FPC
Item   a       b        c
21     0.650   -1.994   0.214
22     0.922   -1.699   0.230
23     1.128   -1.850   0.198
24     0.635   -1.089   0.220
25     0.681   -0.847   0.216

STPU FPC
Item   a       b        c
21     0.605   -1.909   0.210
22     0.863   -1.587   0.222
23     1.065   -1.723   0.196
24     0.575   -0.991   0.209
25     0.614   -0.729   0.205

IRPU FPC
Item   a       b        c
21     0.624   -1.844   0.208
22     0.887   -1.542   0.217
23     1.100   -1.663   0.195
24     0.594   -0.952   0.207
25     0.637   -0.689   0.206
37
IV. Use of Computer Programs for FPC
Illustration of FPC with BILOG-MG
FPC Estimates of Mean and SD of the Underlying
Distribution on the Fixed Old Scale
Method        Mean        Std. Dev.
Default FPC   0.699       0.923       (under-estimation)
STPU FPC      1.003       1.018
IRPU FPC      0.974       0.968
Mean-Sigma    B = 1.033   A = 1.041
Note. The new group examinees were from a N(1,1) distribution that was expressed on the fixed old scale.
38
IV. Use of Computer Programs for FPC
Illustration of FPC with PARSCALE

Data
  3,000 examinees for the new form data
    The data were obtained by simulating examinees from a Normal (0.5, 1.2^2) distribution, against the old group of N(0, 1) distribution.
  A mixed-format test of 15 MC items and 2 five-category constructed-response (CR) items
FPC
  First 10 MC items fixed (item parameters are ready for use)
  Last 5 MC and 2 CR items freed
  The 3PL model for MC items and the generalized partial credit (GPC) model for CR items
Comparison of STPU and MWU-MEM methods
39
IV. Use of Computer Programs for FPC
Illustration of FPC with PARSCALE
Command File (MWU-MEM FPC)
MWU-MEM FPC with PARSCALE
The examinee group was sampled from N(0.5, 1.2^2)
>COMMENT
10 common items fixed and 2 CR items calibration
>FILE DFNAME='new.txt',
      IFNAME='MC10FIX.IFN',
      SAVE;
>SAVE PARM='MC10FIX';
>INPUT NTOT=17, TAKE=3000, NID=5, NTEST=1, LENGTH=17;
(5A1, T1, 17A1)
>TEST TNAME=I10FIX, ITEMS=(1(1)45), NBLOCK=17;
>BLOCK BNAME=FIXED, NITEMS=1, NCAT=2,
       ORI=(0, 1), MOD=(1, 2), REP=10, SKIP;
>BLOCK BNAME=FREEMC, NITEMS=1, NCAT=2,
       ORI=(0, 1), MOD=(1, 2), GPARM=0.2, GUESS=(2, EST), REP=5;
>BLOCK BNAME=FREED, NITEMS=1, NCAT=5,
       ORI=(0,1,2,3,4), MOD=(1,2,3,4,5), REP=2;
>CALIB NQPT=41, PAR, LOG, SCALE=1.7, CYCLE=200, NEWTON=0,
       FREE=(NOADJUST, NOADJUST), ESTORDER, SPRI, GPRI, POSTERIOR;
>SCORE NOSCORE;
40
IV. Use of Computer Programs for FPC
Illustration of FPC with PARSCALE
Data File (New.txt)
11111101111111132
11111111111111144
11111011001111032
11111111101111134
11111100011111031
11111111111111144
11110110000101113
11010100011111111
01111101001111144
00000101000100001
00011000001100100
11111101101111122
. . . . . . . . .
. . . . . . . . .
. . . . . . . . .
Item Responses for
CR Items
Item Responses for
Anchor Items
41
IV. Use of Computer Programs for FPC
Illustration of FPC with PARSCALE
Command File to Prepare IFNAME File (MC10FIX.IFN)
MWU-MEM FPC with PARSCALE
No Fix, “0, 1” Scaling
>COMMENT
10 common items fixed and 2 CR items calibration
>FILE DFNAME='new.txt',
      SAVE;
>SAVE PARM='MC10FIX';
>INPUT NTOT=17, TAKE=3000, NID=5, NTEST=1, LENGTH=17;
(5A1, T1, 17A1)
>TEST TNAME=I10FIX, ITEMS=(1(1)45), NBLOCK=17;
>BLOCK BNAME=FIXED, NITEMS=1, NCAT=2,
       ORI=(0, 1), MOD=(1, 2), GPARM=0.2, GUESS=(2, EST), REP=10;
>BLOCK BNAME=FREEMC, NITEMS=1, NCAT=2,
       ORI=(0, 1), MOD=(1, 2), GPARM=0.2, GUESS=(2, EST), REP=5;
>BLOCK BNAME=FREED, NITEMS=1, NCAT=5,
       ORI=(0,1,2,3,4), MOD=(1,2,3,4,5), REP=2;
>CALIB NQPT=41, PAR, LOG, SCALE=1.7, CYCLE=200, NEWTON=0,
       FREE=(NOADJUST, NOADJUST), ESTORDER, SPRI, GPRI, POSTERIOR;
>SCORE NOSCORE;
Note. In this run there is no IFNAME keyword in the FILE command and no SKIP keyword in the FIXED block.
42
IV. Use of Computer Programs for FPC
Illustration of FPC with PARSCALE
Item Parameter Output File from “0, 1” Scaling
MWU-MEM FPC with PARSCALE
No Fix, “0, 1” Scaling
I10FIX   17   17   7   0
GROUP 01
Block   Item   a (S.E.)            b (S.E.)             c (S.E.)
FIXED   0001   0.94308 (0.07058)   -1.12375 (0.14908)   0.26134 (0.07792)
FIXED   0002   0.98019 (0.06877)   -0.93880 (0.12173)   0.21813 (0.06540)
FIXED   0003   1.18582 (0.07723)   -0.72689 (0.08253)   0.19030 (0.04856)
(Omitted)
FREED   0016   1.16556 (0.03437)   -0.14845 (0.01309)
        Category parameters:   1.25729   0.29044   -0.33537   -1.21236
        S.E.:                  0.04262   0.03157    0.02902    0.03037
FREED   0017   1.42147 (0.04095)   -0.19171 (0.01178)
        Category parameters:   1.29058   0.38858   -0.50917   -1.16999
        S.E.:                  0.03895   0.02653    0.02434    0.02606
43
IV. Use of Computer Programs for FPC
Illustration of FPC with PARSCALE
Modified Item Parameter File (MC10FIX.IFN)
MWU-MEM FPC with PARSCALE
No Fix, “0, 1” Scaling
I10FIX   17   17   7   0
GROUP 01
Block   Item   a (S.E.)            b (S.E.)             c (S.E.)
FIXED   0001   0.69300 (0.00000)   -1.50000 (0.00000)   0.12500 (0.00000)
FIXED   0002   0.78600 (0.00000)   -1.00000 (0.00000)   0.18500 (0.00000)
FIXED   0003   0.89700 (0.00000)   -0.60000 (0.00000)   0.23300 (0.00000)
(Omitted)
FREED   0016   1.16556 (0.03437)   -0.14845 (0.01309)
        Category parameters:   1.25729   0.29044   -0.33537   -1.21236
        S.E.:                  0.04262   0.03157    0.02902    0.03037
FREED   0017   1.42147 (0.04095)   -0.19171 (0.01178)
        Category parameters:   1.29058   0.38858   -0.50917   -1.16999
        S.E.:                  0.03895   0.02653    0.02434    0.02606
Note. For the 10 fixed items, the a, b, and c values from the “0, 1” scaling are replaced with the fixed a, fixed b, and fixed c values.
44
IV. Use of Computer Programs for FPC
Illustration of FPC with PARSCALE
Command File for the STPU Method (After Transformation)
STPU FPC with Transformed Prior Points
The examinee group was sampled from N(0.5, 1.2^2).
(Omitted: the same as the commands for MWU-MEM FPC)
>CALIB NQPT=31, PAR, LOG, SCALE=1.7, CYCLE=200, NEWTON=0,
       FREE=(NOADJUST, NOADJUST), ESTORDER, SPRI, GPRI,
       DIST=4, QPREAD;
>QUADP POINTS=(
-5.2976E+000 -4.9280E+000 -4.5598E+000 -4.1902E+000 -3.8206E+000
-3.4524E+000 -3.0828E+000 -2.7132E+000 -2.3450E+000 -1.9754E+000
-1.6059E+000 -1.2377E+000 -8.6808E-001 -4.9891E-001 -1.2988E-001
2.3929E-001 6.0846E-001 9.7749E-001 1.3467E+000 1.7162E+000
2.0844E+000 2.4540E+000 2.8236E+000 3.1918E+000 3.5614E+000
3.9310E+000 4.2992E+000 4.6688E+000 5.0384E+000 5.4066E+000
5.7761E+000),
WEIGHTS=(
1.2430E-005 3.4290E-005 8.7330E-005 2.0480E-004 4.4420E-004
9.1150E-004 1.8720E-003 4.1960E-003 1.0550E-002 2.5160E-002
4.2780E-002 5.0290E-002 5.8510E-002 8.6110E-002 9.9290E-002
8.6880E-002 9.7990E-002 1.0840E-001 9.0140E-002 7.7730E-002
6.4860E-002 4.3230E-002 2.4440E-002 1.3010E-002 6.7400E-003
3.3710E-003 1.6010E-003 7.1410E-004 2.9710E-004 1.1500E-004
4.1380E-005);
>SCORE NOSCORE;
Note. The POINTS are rescaled by θ* = Aθ + B, with A = 1.38 and B = 0.24. The WEIGHTS are those from the “0, 1” scaling (not transformed).
45
IV. Use of Computer Programs for FPC
Illustration of FPC with PARSCALE
FPC Estimates of Non-Anchor Item Parameters
on the Fixed Old Scale
STPU Method
Item   a       b        c       c2       c3       c4      c5
11     0.741   -1.361   0.194
12     0.767   -0.995   0.238
13     0.741   -0.906   0.185
14     0.942   -0.442   0.140
15     1.181   -0.113   0.234
16     0.920    0.025           -1.569   -0.343   0.449   1.562
17     1.120   -0.031           -1.667   -0.522   0.615   1.452

MWU-MEM Method
Item   a       b        c       c2       c3       c4      c5
11     0.741   -1.361   0.194
12     0.768   -0.994   0.238
13     0.741   -0.908   0.184
14     0.942   -0.444   0.139
15     1.180   -0.113   0.234
16     0.921    0.025           -1.568   -0.342   0.450   1.561
17     1.120   -0.030           -1.666   -0.522   0.615   1.454
46
IV. Use of Computer Programs for FPC
Illustration of FPC with PARSCALE
FPC Estimates of Mean and SD of the Underlying
Distribution on the Fixed Old Scale
Method        Mean    Std. Dev.
STPU FPC      0.460   1.242
MWU-MEM FPC   0.456   1.227
Mean-Sigma    B = 0.239 (under-estimation)   A = 1.384 (over-estimation)
Note. The new group examinees were from a N(0.5, 1.2^2) distribution that was expressed on the fixed old scale.
47
V. Applications of FPC for Scaling and
Equating




Online Calibration in Computerized Adaptive
Testing (CAT)
Calibration of Pretest Items on the Fixed
Operational Scale in Regular, Non-CAT
Administration
In a Mixed-Format Test, Separate Calibration of
CR Items from MC Items To Minimize Effects of
Bad CR Items on MC Item Calibration
Equating Test Forms in the CING Design
48
V. Applications of FPC
Online Calibration in CAT



In CAT, different sets of operational items are
adaptively administered to examinees, with
pretest items “seeded” in a certain common
block of examinee groups.
Because the operational items were already
calibrated, their parameters are known in CAT.
Thus, FPC may be the best way to calibrate
and diagnose the pretest items on the scale of
the operational items, without affecting the
operational item parameters.
49
V. Applications of FPC
Calibration of Pretest Items on the Fixed Operational
Scale



To develop test forms, pretest items are often
administered together with operational items to
examinees.
However, it would be wise to calibrate
operational items separately from pretest items,
because the operational item parameters could
be contaminated by bad pretest items.
In this case, the ability distribution that is
estimated using only the operational items can
be reasonably used as the prior ability
distribution for FPC with the pretest items,
while the operational item parameters are used
to fix the operational items in the FPC.
50
V. Applications of FPC
FPC with Different Formats of Items




A mixed-format test contains different types of items;
for instance, some are MC items and others are CR
items.
Simultaneous calibration with both types of items can
be conducted, assuming that a dominant factor
underlies examinees’ responses to items.
However, practitioners may want to calibrate MC items
separately from CR items, because calibration with bad
CR items might adversely affect the estimation of MC
item parameters.
In this case, MC items are first calibrated and then CR
items are calibrated while fixing the MC item
parameters.
51
V. Applications of FPC
Equating Test Forms in the CING Design



Test equating using IRT requires all item
parameters to be placed on a common scale
(which is usually the old form scale).
Once all item and ability parameters are placed
on a common scale, IRT true score or observed
score equating is conducted.
Thus, FPC can be effectively used for placing
all item parameters on the fixed old scale.
Naturally, the anchor items are the common items
between the new and old forms.
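Once all item parameters are on the old scale, IRT true score equating can be sketched as follows. This is a simplified illustration under the 2PL model with hypothetical item parameters; with the 3PL model, true scores below the sum of the c parameters would need separate handling, which is not shown.

```python
import math

D = 1.7  # logistic scaling constant (assumed)

def p2pl(theta, a, b):
    return 1.0 / (1.0 + math.exp(-D * a * (theta - b)))

def tcc(theta, items):
    """Test characteristic curve: expected number-correct score at theta."""
    return sum(p2pl(theta, a, b) for a, b in items)

def theta_for_true_score(score, items, lo=-10.0, hi=10.0, tol=1e-10):
    """Bisection search for the theta whose true score equals `score`."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if tcc(mid, items) < score:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def equate_true_score(score_new, new_items, old_items):
    """Old-form true score equivalent of a new-form true score."""
    theta = theta_for_true_score(score_new, new_items)
    return tcc(theta, old_items)

# Hypothetical (a, b) pairs, all expressed on the fixed old scale.
old_form = [(0.8, -1.0), (1.0, 0.0), (1.2, 0.5), (0.9, 1.0)]
new_form = [(a, b + 0.5) for a, b in old_form]   # a uniformly harder new form

for x in (1.0, 2.0, 3.0):
    print(x, round(equate_true_score(x, new_form, old_form), 3))
```

Because the hypothetical new form is harder, each new-form true score maps to a higher old-form equivalent.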
52
EXPLORE FPC
END
Thank You