1. Overview - University of Connecticut


A Practitioner’s Introduction to Equating
with Primers on Classical Test Theory (CTT) and
Item Response Theory (IRT)
Joseph Ryan, Arizona State University
Frank Brockmann, Center Point Assessment Solutions
Workshop:
Assessment, Research and Evaluation Colloquium
Neag School of Education, University of Connecticut
October 22, 2010
Acknowledgments
• Council of Chief State School Officers (CCSSO)
• Technical Issues in Large Scale Assessment (TILSA) and Subcommittee on Equating,
part of the State Collaborative on Assessment and Student Standards (SCASS)
• Doug Rindone and Duncan MacQuarrie, CCSSO TILSA Co-Advisers
Phoebe Winter, Consultant
Michael Muenks, TILSA Equating Subcommittee Chair
• Technical Special Interest Group of National Assessment of Educational Progress (NAEP)
coordinators
• Hariharan Swaminathan, University of Connecticut
• Special thanks to Michael Kolen, University of Iowa
Workshop Topics
The workshop covers the following topics:
1. Overview - Key concepts of assessment, linking, and
equating
2. Measurement Primer – Classical and IRT theories
3. Equating Basics
4. The Mechanics of Equating
5. Equating Issues
1. Overview
Key Concepts in
Assessment, Linking, Equating
Assessment, Linking, and Equating
Validity is…
… an integrated evaluative judgment of the degree to which
empirical evidence and theoretical rationales support the
adequacy and appropriateness of inferences and actions based
on test scores or other modes of assessment.
(Messick, 1989, p. 13)
Validity is the essential motivation for developing and
evaluating appropriate linking and equating procedures.
Assessment, Linking, and Equating
The Linking Continuum
Weaker kinds of linking: scores are matched or paired, but do NOT have the same meaning or interpretation.
Example: 2007 NRT Gr 5 Test linked to 2007 SBA Gr 5 Test.
Equating (strongest link): scores are matched or paired, and the scores have the SAME meaning or interpretation.
Example: 2006 SBA Gr 5 Test equated to 2007 SBA Gr 5 Test.
Linking and Equating
• Equating
• Scale aligning
• Predicting/Projecting
Holland in Dorans, Pommerich and Holland (2007)
Misconceptions About Equating
Equating is…
• …a threat to measuring gains.
• …a tool for universal applications.
• …a repair shop.
• …a semantic misappropriation.
2. Measurement Primer
Classical Test Theory (CTT)
Item Response Theory (IRT)
Classical Test Theory
The Basic Model
O = T + E
(observed score = true score + error, with some MAJOR assumptions)
• Reliability is derived from the ratio of true score variance to observed score variance
• Key item features include:
 Difficulty
 Discrimination
 Distractor Analysis
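The CTT model above can be illustrated with a small simulation. This is an illustrative sketch, not material from the workshop: all numbers are invented, and reliability is computed as the ratio of true-score variance to observed-score variance.

```python
import random
import statistics

# Illustrative sketch of the CTT model O = T + E with made-up numbers.
random.seed(42)

n_students = 5000
true_scores = [random.gauss(50, 10) for _ in range(n_students)]  # T
errors      = [random.gauss(0, 5)  for _ in range(n_students)]   # E
observed    = [t + e for t, e in zip(true_scores, errors)]       # O = T + E

var_t = statistics.variance(true_scores)
var_o = statistics.variance(observed)
reliability = var_t / var_o  # theoretical value here: 100 / (100 + 25) = 0.8

print(f"estimated reliability: {reliability:.3f}")
```

With a large sample the estimate lands close to the theoretical value of 0.8; smaller samples would show more fluctuation.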
Classical Test Theory
Reliability reflects the consistency of students' scores
• Over time, test retest
• Over forms, alternate form
• Within forms, internal consistency
Validity reflects the degree to which scores assess
what the test is designed to measure in terms of
• Content
• Criterion related measures
• Construct
Item Response Theory (IRT)
The Concept
An approach to item and test analysis that estimates
students’ probable responses to test questions, based on
• the ability of the students
• one or more characteristics of the test items
Item Response Theory (IRT)
• IRT is now used in most large-scale assessment
programs
INFO
• IRT models apply to items that use
 dichotomous scoring, with right (1) or wrong (0) answers, and
 polytomous scoring, with ordered categories (1, 2, 3, 4), common for written essays and open-ended constructed-response items
• IRT is used in addition to procedures from CTT
Item Response Theory (IRT)
IRT Models
All IRT models reflect the ability of students. In addition, the
most common basic IRT models include:
The 1-parameter model (aka Rasch model) – models item difficulty
The 2-parameter model – models item difficulty and discrimination
The 3-parameter model – models item difficulty, discrimination, and pseudo-guessing
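The three models nest inside one another, which a short sketch makes concrete (the function and parameter values below are illustrative, not from the workshop):

```python
import math

# theta = student ability; b = difficulty; a = discrimination; c = pseudo-guessing.
def p_correct(theta, b, a=1.0, c=0.0):
    """3PL probability of a correct response. Setting a=1 and c=0 gives the
    1-parameter (Rasch) model; setting only c=0 gives the 2-parameter model."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

# A student of average ability (theta=0) on an average-difficulty item (b=0):
p1 = p_correct(0.0, 0.0)                # 1PL: 0.5
p2 = p_correct(0.0, 0.0, a=2.0)         # 2PL: still 0.5 when theta equals b
p3 = p_correct(0.0, 0.0, a=2.0, c=0.2)  # 3PL: guessing floor raises it to 0.6
print(p1, p2, p3)
```

At theta = b the 1PL and 2PL probabilities are always 0.5; the pseudo-guessing parameter shifts the whole curve upward.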
Item Response Theory (IRT)
IRT Assumptions
Item Response Theory requires major assumptions:
• Unidimensionality
• Item Independence
• Data-Model Fit
• Fixed but arbitrary scale origin
Item Response Theory (IRT)
A Simple Conceptualization
[Figure: Items 1, 2, and 3 and four students (Blake, Alex, Devon, Chris) placed on a single scale running from -3 to +3. Easier items and lower-ability students fall toward the left; harder items and higher-ability students fall toward the right, with example locations at -1.5 and +2.25.]
Item Response Theory (IRT)
Probability of a Student Answer
[Figure: probability of a correct response plotted as a function of student ability.]
Item Response Theory (IRT)
Item Characteristic Curve for Item 2
[Figure: item characteristic curve for Item 2.]
Item Response Theory (IRT)
IRT and Flexibility
IRT provides considerable flexibility in terms of
INFO
• constructing alternate test forms
• administering tests well matched or adapted to
students’ ability level
• building sets of connected tests that span a
wide range (perhaps two or more grades)
• inserting or embedding new items into existing
test forms for field testing purposes so new
items can be placed on the measurement scale
3. Equating Basics
Basic Terms (Sets 1, 2, and 3)
Equating Designs (a, b, c)
Item Banking (a, b, c, d)
Basic Terms Set 1
Column A
Column B
__Anchor Items
__Appended Items
__Embedded Items
A. Sleepwear
B. Nautically themed
apparel
C. Vestigial organs
D. EMIP learning module
USEFUL
TERMS
Basic Terms Set 2
For each term, make some notes on your handout:
USEFUL
TERMS
Pre-equating -
Post-equating -
Basic Terms Set 3
For each term, make some notes on your handout:
USEFUL
TERMS
Horizontal Equating –
Vertical Equating (Vertical Scaling) –
Form-to-Form (Chained) Equating –
Item Banking –
Equating Designs
a. Random Equivalent Groups
b. Single Group
c. Anchor Items
Equating Designs
a. Random Equivalent Groups
From the testing population, two random samples are drawn: Random Sample Group 1 takes Test Form A, and Random Sample Group 2 takes Test Form B.
Equating Designs
b. Single Group
From the testing population, a single random sample group takes Test Form A (first) and then Test Form B (second).
The potential for order effects is significant; equating designs that use this data collection method should always be counterbalanced!
CAUTION
Equating Designs
b. Single Group with Counterbalance
From the testing population or tested sample: Random Subgroup 1 takes Test Form A (first) and Test Form B (second); Random Subgroup 2 takes Test Form B (first) and Test Form A (second).
Equating Designs
c. Anchor Item Design
[Figure: Testing Sample 1 takes the Test Form A items plus the anchor items; Testing Sample 2 takes the Test Form B items plus the same anchor items. The anchors are the common items, and they are not always at the end of the form.]
Equating Designs
c. Anchor Item Set
PROPER
Anchor Selection
GRADE 5 Mathematics Test Form A (50 test items): 10 items each from Content Standards 1, 2, 3, 4, and 5.
GRADE 5 Mathematics Anchor Set (10 items): 2 items each from Content Standards 1, 2, 3, 4, and 5.
Equating Designs
c. Anchor Item Designs
USEFUL
TERMS
• Internal/Embedded
• Internal/Appended
• External
Equating Designs
Internal Embedded Anchor Items
[Figure: Test Form A and Test Form B each contain 15 items. Items 2, 5, 9, 12, and 15 are the embedded, internal anchor items common to both forms; the remaining items (1, 3, 4, 6, 7, 8, 10, 11, 13, 14) are unique to each form.]
Equating Designs
Internal Appended Anchor Items
[Figure: Test Form A consists of form-specific items 1-10 (A) followed by appended, internal anchor items 11-15 (C); Test Form B consists of form-specific items 1-10 (B) followed by the same anchor items 11-15 (C).]
Equating Designs
External Anchor Items
[Figure: Test Form A is Part 1 with form-specific items 1-10 (A); Test Form B is Part 1 with form-specific items 1-10 (B). Part 2 for both forms is a separate section of appended, external anchor items 1-5 (C).]
Equating Designs
Guidelines for Anchor Items
RULES
of
THUMB
• Mini-Test
• Similar Location
• No Alterations
• Item Format Representation
3. Equating Basics
Basic Terms (Sets 1, 2, and 3)
Equating Designs (a, b, c)
Item Banking (a, b, c, d)
Item Banking
a. Basic Concepts
b. Anchor-item Based Field Test
c. Matrix Sampling
d. Spiraling Forms
Item Banking
a. Basic Concepts
• An item bank is a large collection of calibrated and
scaled test items representing the full range, depth,
and detail of the content standards
• Item Bank development is supported by field testing
a large number of items, often with one or more
anchor item sets.
• Item banks are designed to provide a pool of items
from which equivalent test forms can be built.
• Pre-equated forms are based on a large and stable
item bank.
Item Banking
b. Anchor Item Based Field Test Design
RULE of
THUMB
Field test items are most appropriately embedded within, not
appended to, the common items.
Item Banking
c. Matrix Sampling
• Items can be assembled into relatively small blocks
(or sets) of items.
• A small number of blocks can be assigned to each
test form to reduce test length.
• Blocks may be assigned to multiple forms to enhance equating.
• Blocks need not be assigned to multiple forms if randomly equivalent groups are used.
Item Banking
c. Matrix Sampling
Item Banking
d. Spiraling Forms
Test forms can be assigned to individual students, or to students grouped in classrooms, schools, districts, or some other units.
1. "Spiraling" at the student level involves assigning different forms to different students within a classroom.
2. "Spiraling" at the classroom level involves assigning different forms to different classrooms within a school.
3. "Spiraling" at the school or district level follows a similar pattern.
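Student-level spiraling amounts to dealing forms out round-robin within each classroom. A minimal sketch, with hypothetical classroom and student names:

```python
# Illustrative sketch: spiral test forms at the student level by assigning
# forms round-robin within each classroom. All names and data are made up.
forms = ["A", "B", "C"]

classrooms = {
    "Room 101": ["Blake", "Alex", "Devon", "Chris", "Jordan"],
    "Room 102": ["Pat", "Sam", "Lee", "Morgan"],
}

assignments = {}
for room, students in classrooms.items():
    for i, student in enumerate(students):
        assignments[(room, student)] = forms[i % len(forms)]  # A, B, C, A, B, ...

print(assignments[("Room 101", "Blake")])  # "A"
print(assignments[("Room 101", "Chris")])  # "A" (the fourth student wraps around)
```

Classroom-level spiraling would apply the same modulo trick over classrooms rather than students.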
Item Banking
d. Spiraling Forms
Item Banking
d. Spiraling Forms
Spiraling at the student level is technically desirable:
• provides randomly equivalent groups
• minimizes classroom effect on IRT estimates
(most IRT procedures assume independent
responses)
Spiraling at the student level is logistically problematic:
• exposes all items in one location
• requires careful monitoring of test packets and
distribution
• requires matching test form to answer key at the
student level
It’s Never Simple!
IMPORTANT
CAUTION
Linking and equating procedures are employed in the broader context of educational measurement, which includes at least the following sources of random variation (statistical error variance) or imprecision:
• Content and process representation
• Errors of measurement
• Sampling errors
• Violations of assumptions
• Parameter estimation variance
• Equating estimation variance
4. The Mechanics of Equating
The Linking-Equating Continuum
Classical Test Theory (CTT) Approaches
Item Response Theory (IRT) Approaches
The Linking-Equating Continuum
USEFUL
TERMS
Linking is the broadest term used to refer to a
collection of procedures through which
performance on one assessment is associated or
paired with performance on a second
assessment.
Equating is the strongest claim made about the
relationship between performance on two
assessments and asserts that the scores that are
equated have the same substantive meaning.
The Linking-Equating Continuum
different
forms of linking
equating
(strongest kind of linking)
The Linking-Equating Continuum
Frameworks
There are a number of frameworks for describing various forms of
linking:
• Mislevy, 1992
• Linn, 1993
• Holland, 2007
(in Dorans, Pommerich, and Holland 2007)
The Linking-Equating Continuum
[Figure: The linking continuum runs from Moderation and Projection (scores do NOT have the same meaning or interpretation) through Calibration to Equating (scores have the SAME meaning or interpretation).]
Linking Procedures/Approaches
• CTT: Linear, Equipercentile
• IRT: Common Item, Common Person
• Pool/Item Bank Development
• Pre- and Post-equating
In 1992, Mislevy described four typologies of linking test forms: moderation, projection, calibration, and
equating (Mislevy, 1992, pp. 21-26). In his model, moderation is the weakest form of linking tests, while
equating is considered the strongest type. Thus, equating is done to make scores as interchangeable
as possible.
The Linking-Equating Continuum
Equating – strongest form of linking, invariant across
populations, maintains substantive meaning
USEFUL
TERMS
Calibration – may use equating procedures, not
necessarily invariant across populations, and
substantive meaning might not be preserved
Prediction/Projection – unidirectional statistical
procedure for predicting scores or projecting
distributions
Moderation – weakest form of linking, may be
statistical or judgmental (social), based on comparisons
of distributions or panel/reviewer decisions.
CTT Linking-Equating Approaches
a. Mean Method
b. Linear Method
c. Equipercentile Method
CTT Linking-Equating Approaches
a. Mean Method
• Adjusts one set of scores based on the difference in the means of two tests
• Assumes a constant difference in the scales across all scores
• Useful for carefully developed and parallel or close-to-parallel forms
• Simple, but strains assumptions of parallel forms
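The mean method reduces to a single additive shift. A minimal sketch with made-up scores:

```python
# Minimal sketch of CTT mean equating (illustrative scores, not real data):
# every Form B score is adjusted by the difference in form means, assuming a
# constant difference across the whole scale.
form_a = [30, 35, 40, 45, 50]  # scores from one group
form_b = [28, 33, 38, 43, 48]  # scores from a comparable group

mean_a = sum(form_a) / len(form_a)  # 40.0
mean_b = sum(form_b) / len(form_b)  # 38.0
shift = mean_a - mean_b             # Form B is 2 points harder here

def mean_equate(score_b):
    """Express a Form B raw score on the Form A scale."""
    return score_b + shift

print(mean_equate(38))  # 40.0
```

Every score gets the same adjustment, which is exactly why the method strains the parallel-forms assumption when forms differ in spread.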
CTT Linking-Equating Approaches
b. Linear Method
• Based on setting standardized deviation scores from two tests equal
• Can be done in raw score scale with simple linear regression
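Setting standardized deviation scores equal means (y − mean_A)/SD_A = (x − mean_B)/SD_B, solved for y. A sketch with invented score lists:

```python
import statistics

# Minimal sketch of CTT linear equating (made-up scores): set z-scores on the
# two forms equal and solve for the Form A equivalent of a Form B score.
form_a = [20, 30, 40, 50, 60]
form_b = [25, 30, 35, 40, 45]

mu_a, sd_a = statistics.mean(form_a), statistics.pstdev(form_a)
mu_b, sd_b = statistics.mean(form_b), statistics.pstdev(form_b)

def linear_equate(x_b):
    """Form B score -> Form A scale via the linear (z-score) method."""
    return mu_a + (sd_a / sd_b) * (x_b - mu_b)

print(linear_equate(35))  # the Form B mean maps onto the Form A mean: 40.0
```

Unlike the mean method, the adjustment here varies with the score: scores far from the mean are stretched by the ratio of standard deviations.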
CTT Linking-Equating Approaches
b. Linear Method
[Figure: Linear Equating. Raw score on Form A (0-10, vertical axis) plotted against raw score on Form B (0-10, horizontal axis); the equating line aligns the two forms at the mean and at one standard deviation above and below the mean (SD -1, Mean, SD +1).]
CTT Linking-Equating Approaches
c. Equipercentile Method
• Based on scores that correspond to the same percentile rank position from two tests
• Does not assume a linear relationship between the two tests
• Provides for linking scores across the full range of possible test scores
• May require "smoothing" of the distributions, especially with small samples
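The core idea can be sketched with tiny made-up score lists; a real application would use full score distributions, interpolation, and smoothing:

```python
# Rough sketch of the equipercentile idea: a Form B score is mapped to the
# Form A score holding the same percentile rank. Scores are invented.
form_a = sorted([12, 15, 18, 22, 25, 28, 31, 35, 38, 42])
form_b = sorted([10, 13, 16, 19, 23, 26, 29, 32, 36, 39])

def percentile_rank(scores, x):
    """Fraction of scores at or below x."""
    return sum(s <= x for s in scores) / len(scores)

def equipercentile_equate(x_b):
    """Find the Form A score whose percentile rank matches x_b's rank on B."""
    target = percentile_rank(form_b, x_b)
    # pick the smallest Form A score whose rank reaches the target
    for s in form_a:
        if percentile_rank(form_a, s) >= target:
            return s
    return form_a[-1]

print(equipercentile_equate(23))  # 5th of 10 B scores -> 5th A score: 25
```

Because the mapping follows the empirical distributions point by point, no linear relationship between the forms is assumed.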
CTT Linking-Equating Approaches
c. Equipercentile Method
IRT Linking-Equating Approaches
a. common items
b. common people or randomly
equivalent groups treated as
being the same people
IRT Linking-Equating Approaches
IRT linking and equating approaches:
• provide flexibility and are applicable to many settings
• provide consistency by employing the IRT model being used for calibration and scaling
• provide indices that reveal departures from what is expected (tests of fit)
IRT Linking-Equating Approaches
a. Common Items
Approaches can be based on:
1. Applying an equating constant
2. Estimating item parameters with fixed or
concurrent/simultaneous calibration
3. Applying the Test Characteristic Curve (TCC) procedure of Stocking & Lord (1983)
IRT Linking-Equating Approaches
a. Common Items
Applying an equating constant
• Appropriate when two or more tests have a common set of anchor items and also some items unique to each form
• Requires selecting one form or some other location on the scale as the origin of the scale
IRT Linking-Equating Approaches
a. Common Items
1. Applying an Equating Constant
Test Form X (20 items)
[Figure: Anchor items A, B, and C plotted on the scale from -3 (easier items, lower-ability students) to +3 (harder items, higher-ability students), falling on the easier half of the scale.]
IRT Linking-Equating Approaches
a. Common Items
1. Applying an Equating Constant
Test Form Y (20 items)
[Figure: The same anchor items A, B, and C plotted on the -3 to +3 scale, now falling on the harder half of the scale.]
IRT Linking-Equating Approaches
Common Item Approach
CAUTION
THREE ITEMS IS NEVER (EVEN UNDER MASSIVE DELUSIONAL INFLUENCES) ENOUGH ITEMS!
EVEN WITH 15 TO 20 ITEMS (A MINIMUM) IT NEVER WORKS THIS SIMPLY.
Applying an equating constant
[Figure: Test Form X (top) vs. Test Form Y (bottom). Adding the equating constant of +2 shifts every point on the Form X scale (-3 becomes -1, -2 becomes 0, 0 becomes 2, 1 becomes 3, 2 becomes 4, 3 becomes 5), bringing anchor items A, B, and C into alignment across the two forms.]
IRT Linking-Equating Approaches
a. Common Items
1. Applying an Equating Constant
Test Form X* (adjusted) and Test Form Y
[Figure: After the adjustment, anchor items A, B, and C fall at the same locations for both forms on a common scale running from -3 to +5.]
IRT Linking-Equating Approaches
a. Common Items
1. Determining an Equating Constant
Item       Form Y   Form X   (Y - X)
Item A      0.5     -1.5       2
Item B      1.0     -1.0       2
Item C      1.5     -0.5       2
Sum         3.0     -3.0       6
Average     1.0     -1.0       2

Constant = Form Y - Form X = 2
If C = Y - X, then Y = X + C
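The table's arithmetic is a one-liner in code. This sketch uses the difficulty values from the table above; only the function name is an invention for illustration:

```python
# Determine the equating constant from the anchor items' difficulty (b) values:
# the constant is the average Form Y minus Form X difference on the common items.
anchor_b = {          # item: (b on Form Y, b on Form X), values from the table
    "A": (0.5, -1.5),
    "B": (1.0, -1.0),
    "C": (1.5, -0.5),
}

diffs = [y - x for (y, x) in anchor_b.values()]
constant = sum(diffs) / len(diffs)  # C = mean(Y - X) = 2.0

def to_form_y_scale(b_x):
    """If C = Y - X, then Y = X + C."""
    return b_x + constant

print(constant)               # 2.0
print(to_form_y_scale(-1.5))  # Item A's Form X value moved to the Y scale: 0.5
```

In practice the item-by-item differences would not be identical, and the averaging is what absorbs that noise.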
IRT Linking-Equating Approaches
a. Common Items
1. Applying an Equating Constant
• The common items used for equating are the anchor items
• Generally 15 to 20 items are needed for common item equating
• Not all items designed as anchor items will work effectively
• The anchor items should be in the same location on the tests
• The anchor items should reflect the content, format, and difficulty range of the whole test
IRT Linking-Equating Approaches
a. Common Items
2. Fixed Calibration
Test Form X (the Base form; anchor items 1, 5, 7, 10, etc.):
1. Designate this as the Base form, which defines the scale origin.
2. Calibrate parameters (difficulty, discrimination, and guessing) of all items.
3. Treat the item parameters of the anchor items as fixed.

Test Form Y:
4. Use parameters of the anchor items from Form X for the same items (anchors) on Form Y.
5. Calibrate the Form Y items using the fixed parameter values of the anchor items; treat all other items and their parameters as free to vary.
6. The resultant calibration of Form Y will be on the same scale as Form X; it is "anchored" through the fixed values of the common items.
IRT Linking-Equating Approaches
a. Common Items
2. Concurrent or Simultaneous Calibration
Test Form X: 40 items (including 15 anchor items)
Test Form Y: 40 items (including the same 15 anchor items)

Consider the following:
• 500 students take Form X
• 500 students take Form Y
• 1,000 take the anchor items
• The data for all students are stacked as shown on the next slide…
IRT Linking-Equating Approaches
a. Common Items
2. Concurrent or Simultaneous Calibration
[Table: stacked data matrix, 1,000 students by 65 item columns]

                               Form X Items (25)       Anchor Items (15)       Form Y Items (25)
500 students who take Form X   responses to 25 items   responses to 15 items   missing data
500 students who take Form Y   missing data            responses to 15 items   responses to 25 items

• Data are calibrated on 1,000 students
• Students each "take" 65 items (each student actually answers 40)
• Students are missing data on the form they did not take.
• All students respond to the anchor items.
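The stacked layout above is easy to build mechanically. This sketch uses None for the structurally missing responses and random 0/1 values as placeholder data; the counts match the slide (25 + 15 + 25 = 65 columns, 500 students per form):

```python
import random

random.seed(0)
N_X, N_ANCHOR, N_Y = 25, 15, 25  # columns: Form-X-only, anchor, Form-Y-only
n_per_form = 500

def fake_responses(n):
    """Placeholder 0/1 item responses (invented data for illustration)."""
    return [random.randint(0, 1) for _ in range(n)]

rows = []
for _ in range(n_per_form):  # 500 students who took Form X
    rows.append(fake_responses(N_X) + fake_responses(N_ANCHOR) + [None] * N_Y)
for _ in range(n_per_form):  # 500 students who took Form Y
    rows.append([None] * N_X + fake_responses(N_ANCHOR) + fake_responses(N_Y))

assert len(rows) == 1000               # calibrated on 1,000 students
assert all(len(r) == 65 for r in rows)  # every row spans all 65 columns
answered = [sum(v is not None for v in r) for r in rows]
print(set(answered))  # every student actually answers 40 items: {40}
```

An IRT calibration program would treat the None cells as not-presented rather than incorrect, which is what puts both forms on one scale.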
IRT Linking-Equating Approaches
a. Common Items
3. Test Characteristic Curve (TCC) Procedures
• Developed by Stocking and Lord (1983)
• Very flexible and widely used
• Commonly applied with the 2- and 3-parameter IRT models
IRT Linking-Equating Approaches
a. Common Items
3. Test Characteristic Curve (TCC) Procedures
• IRT scales have an arbitrary origin and an arbitrary scale spacing (e.g., the size of each unit of measurement).
 The origin is selected and fixed
 The scale spread is expanded or reduced
• Item parameter estimates for the same items from two independent calibrations will differ due to
 Origin and scale differences
 Characteristics of other items
 Possibly sampling and estimation error
IRT Linking-Equating Approaches
a. Common Items
3. Test Characteristic Curve (TCC) Procedures
• If two scales differ in origin (location) and spread (variability), a linear transformation can be applied to one scale to re-express or transform it onto the other scale
• The choice of what scale to use is informed by considering the intended use of the items, test forms, or item bank
• The figures on the next slide illustrate the basic idea of the TCC method
IRT Linking-Equating Approaches
[Figure: test characteristic curves for the two forms before and after transformation.]
IRT Linking-Equating Approaches
a. Common Items
3. Test Characteristic Curve (TCC) Procedures
• Transforms the item parameter values for the common items on one test form to be on the same scale as their corresponding parameter values on the other (target) form
• Requires two constants: the parameters are multiplied by one constant and then added to the second constant
• Begins with carefully chosen initial values for the constants
• Refines the constants to minimize the differences in estimated scores based on the transformed test form and the target form
• Never as simple as the theory
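The steps above can be sketched as code. This is a hedged illustration, not the Stocking & Lord (1983) algorithm itself: the item parameters are invented, and a crude grid search stands in for the iterative minimization they describe.

```python
import math

def p(theta, a, b):
    """2PL probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# Common items: (a, b) on the target form, and (a, b) from an independent
# calibration of the other form (same items, different scale). Invented values.
target = [(1.2, -0.5), (0.8, 0.0), (1.0, 0.7)]
other  = [(2.4, -1.25), (1.6, -1.0), (2.0, -0.65)]

quad = [q / 10.0 for q in range(-30, 31)]  # quadrature points on theta

def loss(A, B):
    """Sum of squared TCC differences after transforming the other form's
    parameters with b* = A*b + B and a* = a/A."""
    total = 0.0
    for t in quad:
        tcc_target = sum(p(t, a, b) for a, b in target)
        tcc_other  = sum(p(t, a / A, A * b + B) for a, b in other)
        total += (tcc_target - tcc_other) ** 2
    return total

# Crude grid search over the slope A and intercept B (real implementations
# minimize this criterion iteratively with derivatives).
best = min(((loss(A / 20.0, B / 20.0), A / 20.0, B / 20.0)
            for A in range(10, 41) for B in range(-40, 41)),
           key=lambda r: r[0])
print(f"slope A ~ {best[1]:.2f}, intercept B ~ {best[2]:.2f}")
```

With these invented parameters the search recovers the transformation A = 2, B = 2 that aligns the two test characteristic curves exactly.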
IRT Linking-Equating Approaches
b. Common Persons
or Random Equivalent Groups
The same students, or two groups sampled to be equivalent on critical
relevant characteristics, take Form X and Form Y; the forms do not have
any common items
• Example: students’ average ability on Form X is -1.0 (low
ability) and their average ability on Form Y is +1.0 (high ability)
How can the same group of students have two
different mean abilities?
QUESTION
• Differences in students’ abilities cannot explain the differences
in the performance on Forms X and Y since the same students
(common students) take both forms
IRT Linking-Equating Approaches
b. Common Persons
or Random Equivalent Groups
• The difference in mean performance reflects the difference in
the difficulty of the two forms
• The test forms must be different in difficulty since the students’
abilities were held constant (same students)
• On Form X, students look less able with a mean of -1; on Form Y
students look more able with a mean of +1
• Form X is harder than Form Y in that it makes students look less
able; the test forms differ by +2 units
• The difference of +2 is used as a linking constant to adjust the
tests onto a single scale in the same way as a linking constant
derived from common items
5. Equating Issues
Substantive Concerns
Technical Issues
Quality Control Issues
• Test design, development & administration
• Scoring, analysis and equating
Technical Documentation
Accountability Compliance
Item Formats and Platforms
Common Equating Concerns/Issues
Substantive Concerns
• Validity is the central issue
• Validity evidence must document fairness, absence of bias, and equal access for all students
• Carefully planned and rigorously monitored item and test form development are the most essential ingredients for successful equating
• Equating goes bad through items and test forms, not in the psychometrics
Common Equating Concerns/Issues
Technical Issues
• Examining and testing IRT assumptions
• Conducting and documenting IRT tests of fit
 Data to model fit
 Linking/equating fit
• Item Parameter Drift
Common Equating Concerns/Issues
Quality Control Issues
Test design, development, and administration problems
• Changes in content standards or test specifications
• Item contexts that differ between forms and affect performance on anchor items
• Anchor items that appear in very different locations among forms
• Item misprints/errors
• Unintended accommodations (maps or periodic tables on walls, calculators, etc.)
• All manner of weird and unimaginable stuff and happenings
Common Equating Concerns/Issues
Quality Control Issues
Item scoring, analysis, and equating quality issues
• Non-standard scoring criteria or changes in scoring procedures
• Redefinition in scoring rubrics, variation in benchmark papers
• Item parameter drift
• Departures from specified equating procedures
• Unreliable and/or inconsistent item performance or score distributions
• Departures from specified data processing protocols
• All manner of weird and unimaginable stuff and happenings
Common Equating Concerns/Issues
Technical Documentation
• General technical reports
• Standards setting reports
• Equating technical reports
• Specify requirements for documentation in RFPs, with TAC reviews and due dates
Can an independent contractor replicate the equating results?
QUESTION
Common Equating Concerns/Issues
Accountability Concerns
• Standard Setting
• Adequate Yearly Progress (AYP)
Common Equating Concerns/Issues
Item Formats and Platforms
• Open-ended or Constructed Response Tasks
• Writing Assessment
• Paper-and-Pencil and Computerized Assessments
References
Dorans, N. J., Pommerich, M., & Holland, P. W. (2007). Linking and aligning scores and scales. Statistics for Social and Behavioral Sciences. New York: Springer.
Linn, R. L. (1993). Linking results of distinct assessments. Applied Measurement in Education, 6, 83-102.
Mislevy, R. J. (1992). Linking educational assessments: Concepts, issues, methods, and prospects. Princeton, NJ: Educational Testing Service.
Ryan, J., & Brockmann, F. (2009). A practitioner's introduction to equating with primers on classical test theory and item response theory. Washington, DC: Council of Chief State School Officers (CCSSO).
Stocking, M. L., & Lord, F. M. (1983). Developing a common metric in item response theory. Applied Psychological Measurement, 7, 201-210.
A Practitioner’s Introduction to Equating
END
Joseph Ryan, Arizona State University
[email protected]
Frank Brockmann, Center Point Assessment Solutions
[email protected]