Vertical Scaling
VERTICAL SCALING
H. Jane Rogers
Neag School of Education
University of Connecticut
Presentation to the TNE Assessment Committee, October 30, 2006
Scaling
Definition:
Scaling is a process in which raw scores
on a test are transformed to a new scale
with desired attributes (e.g., mean, SD)
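A minimal sketch of such a transformation (my own illustration, not from the presentation): a linear rescaling that gives the scores a chosen mean and SD. The function name and target values are hypothetical.

```python
import statistics

def scale_scores(raw_scores, target_mean, target_sd):
    """Linearly rescale raw scores to a metric with the desired mean and SD."""
    m = statistics.mean(raw_scores)
    s = statistics.stdev(raw_scores)
    return [target_mean + target_sd * (x - m) / s for x in raw_scores]

# Hypothetical raw scores, rescaled to a mean-500 / SD-100 reporting scale
raw = [12, 15, 18, 20, 25]
scaled = scale_scores(raw, target_mean=500, target_sd=100)
print(round(statistics.mean(scaled)))  # 500
```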
Scaling
Purposes:
1. Reporting scores on a convenient
metric
2. Providing a common scale on which
scores from different forms of a test
can be reported (after equating or
linking)
Scaling
There are two distinct testing situations
where scaling is needed
Scaling
SITUATION 1
• Examinees take different forms of a test for
security reasons or at different times of year
• Forms are designed to the same specifications
but may differ slightly in difficulty due to
chance factors
• Examinee groups taking the different forms are
not expected to differ greatly in proficiency
Scaling
SITUATION 2
• Test forms are intentionally designed to differ
in difficulty
• Examinee groups are expected to be of
differing proficiency
• EXAMPLE: test forms designed for different
grade levels
EQUATING
For SITUATION 1, we often refer to the
scaling process as EQUATING
Equating is the process of mapping the
scores on Test Y onto the scale of Test X
so that we can say what the score of an
examinee who took Test Y would have
been had the examinee taken Test X (the
scores are exchangeable)
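One simple way to carry out such a mapping is linear (mean-sigma) equating under a random-groups design. The sketch below is my own illustration with hypothetical score data, not the presentation's procedure: it maps a Test Y score onto the Test X scale by matching z-scores.

```python
import statistics

def linear_equate(y_score, x_scores, y_scores):
    """Map a Test Y score onto the Test X scale by matching z-scores,
    so the equated scores are exchangeable (linear / mean-sigma equating)."""
    mx, sx = statistics.mean(x_scores), statistics.stdev(x_scores)
    my, sy = statistics.mean(y_scores), statistics.stdev(y_scores)
    return mx + sx * (y_score - my) / sy

# Hypothetical data: Form Y ran 5 points easier overall,
# so a 25 on Form Y equates to a 20 on Form X
form_x = [10, 20, 30]
form_y = [15, 25, 35]
print(linear_equate(25, form_x, form_y))  # 20.0
```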
EQUATING
This procedure is often called
HORIZONTAL EQUATING
LINKING
For SITUATION 2, we refer to the scaling
process as LINKING, or scaling to
achieve comparability
This process is sometimes called
VERTICAL EQUATING, although
equating is not strictly possible in this
case
REQUIREMENTS FOR SCALING
• In order to place the scores on two tests
on a common scale, the tests must
measure the same attribute
e.g., the scores on a reading test cannot
be converted to the scale of a
mathematics test
EQUATING DESIGNS FOR
VERTICAL SCALING
1. COMMON PERSON DESIGN
Tests to be equated are given to different
groups of examinees with a common group
taking both tests
2. COMMON ITEM (ANCHOR TEST)
DESIGN
Tests to be equated are given to different
groups of examinees with all examinees
taking a common subset of items (anchor
items)
EQUATING DESIGNS FOR
VERTICAL SCALING
• EXTERNAL ANCHOR OR SCALING
TEST DESIGN
Different groups of examinees take
different tests, but all take a common
test in addition
Example of Vertical Scaling Design
(Common Persons)

Grade (Nov. testing)   Students     Test Level 2         Test Level 3         Test Level 4
2                      1 .. N2      Mean = 26.6, SD = 4.1
3                      1 .. N3      Mean = 34.7, SD = 4.3  Mean = 26.1, SD = 4.7
4                      1 .. N                              Mean = 35.3, SD = 5.1  Mean = 25.9, SD = 4.8
4                      N+1 .. N4                                                  Mean = 26.0, SD = 5.0
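In this design, the Grade 3 students took both the Level 2 and Level 3 tests, so their means and SDs on the two levels can be used to link the levels linearly. A sketch of that mean-sigma linking (the function is my own illustration, using the means and SDs reported in the table):

```python
def link_levels(score_hi, mean_hi, sd_hi, mean_lo, sd_lo):
    """Convert a higher-level score to the lower level's scale using the
    common group's mean and SD on each level (mean-sigma linking)."""
    return mean_lo + sd_lo * (score_hi - mean_hi) / sd_hi

# Grade 3 took both levels: Level 2 (mean 34.7, SD 4.3) and Level 3 (mean 26.1, SD 4.7)
lvl2_equiv = link_levels(26.1, mean_hi=26.1, sd_hi=4.7, mean_lo=34.7, sd_lo=4.3)
print(lvl2_equiv)  # 34.7 -- the Level 3 mean maps onto the Level 2 mean
```

Linking Level 3 to Level 2, then Level 4 to Level 3, and chaining the results is how the separate levels end up on one vertical scale.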
Example of Vertical Scaling Design
(Common Items)

[Layout diagram: Years 1, 2, and 3 each take a subset of Item Blocks 1-4, with blocks shared between adjacent years serving as the common (anchor) items]
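For the common-item design, one standard approach is chained linear linking through the anchor block: a score on Test Y is first expressed on the anchor scale using group 2's statistics, then carried onto the Test X scale using group 1's statistics. The function and toy data below are my own illustration, not the presentation's method.

```python
import statistics

def chained_linear_link(y_score, grp2_test_y, grp2_anchor, grp1_test_x, grp1_anchor):
    """Chained linear linking through common (anchor) items:
    Test Y -> anchor scale via group 2, then anchor -> Test X scale via group 1."""
    m, s = statistics.mean, statistics.stdev
    v = m(grp2_anchor) + s(grp2_anchor) * (y_score - m(grp2_test_y)) / s(grp2_test_y)
    return m(grp1_test_x) + s(grp1_test_x) * (v - m(grp1_anchor)) / s(grp1_anchor)

# Hypothetical toy data: both groups average the same on the anchor block,
# so an average score on Test Y links to the average score on Test X
x_equiv = chained_linear_link(5, [0, 10], [0, 5], [0, 20], [0, 5])
print(x_equiv)  # 10.0
```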
Problems with Vertical Scaling
• If the construct or dimension being
measured changes across grades/years/
forms, scores on different forms mean
different things and we cannot
reasonably place scores on a common
scale
• May be appropriate for a construct like
reading; less appropriate for
mathematics, science, social studies, etc.
Problems with Vertical Scaling
• Both common person and common item
designs have practical problems of items
that may be too easy for one group and
too hard for the other
• Must ensure that examinees have had
exposure to content of common items or
off-level test (cannot scale up, only down
in common persons design)
Problems with Vertical Scaling
• Scaled scores are not interpretable in
terms of what a student knows or can do
• Comparison of scores on scales that
extend across several years is particularly
risky