Unraveling the Mysteries of Setting
Standards and Scaled Scores
Julie Miles, PhD, 10.27.2011
Overview of the Session
1. What is Standard Setting?
– Basic Vocabulary
– Definition
– Performance Level Descriptions
– Threshold Descriptions
– When Does It Occur?
– Methods Used in Virginia
2. The Connection to Scaled Scores
– Converting Raw Scores to Scaled Scores
– Example Conversion
3. From Scaled Scores to Equated Forms
– How Are Scaled Scores Connected to Equating?
– The Basics of Equating
– Recap of How It All Comes Together
What Is Standard Setting?
What is Standard Setting?
Basic Vocabulary
Content Standards: the content and skills that students are expected
to know and be able to do.
Performance Levels (Achievement Levels, Performance Categories):
Labels for levels of student achievement (e.g., below basic, basic,
proficient and advanced).
Performance Level Descriptors (PLDs): Descriptions of the
competencies associated with each level of achievement.
Cut Scores (Performance Standards): Scores on an assessment that
separate one level of achievement from another.
What is Standard Setting?
Definition
A judgmental process that involves a variety of steps and engages
relevant stakeholders throughout. Steps in this process typically
include:
1. Identifying the relevant knowledge and skills to be taught and
assessed at each grade/content area to support the goals of the
state
2. Defining the expectations associated with each Performance Level
3. Convening a committee of educators to provide content-based
recommendations for cut scores at each grade or subject area
4. Submitting the cut score recommendations to the State Board of
Education (SBOE) for review and adoption
What is Standard Setting?
Performance Level Descriptors (PLDs)
Define the knowledge, skills, and abilities (KSAs) that students are
expected to demonstrate to gain entry into a specific performance level
(e.g., Proficient or Advanced).
• The main goal of standard setting is to quantify or operationalize the
Performance Level Descriptors.
EXAMPLE Proficient PLD:
Explain the role of geography in the political, cultural, and economic development of
Virginia and the United States
What is Standard Setting?
Threshold Descriptions (TDs)
Define what students who are “just over the threshold” of a performance level
(e.g., a student scoring 400 or 401, or 500 or 501) should be able to
demonstrate in terms of KSAs.
• These are the borderline or minimally qualified students in terms of
performance
EXAMPLE Proficient PLD:
Explain the role of geography in the political, cultural, and economic development of
Virginia and the United States
EXAMPLE “Just-Barely” Proficient TD:
Identify and explain major geographic features on maps.
Interpret charts based on background geographic information.
What is Standard Setting?
When Does It Occur?
Design and Implementation of Revised SOL Tests
YEAR ONE
• Revision of Content Standards
• Revise Curriculum Frameworks
• Develop Item and Test Specifications
• New Item Development

YEAR TWO
• New Item Content Review
• Embedded Field-Testing of New SOL Items
• Spring 2010 SOL Test Administration (aligned to old curriculum)
• Field-Test Item Analysis and Review
• SOL Test Form Development (first operational assessments aligned to new curriculum)

YEAR THREE
• Spring 2011 SOL Administration
• Score Operational/Field-Test Items
• Standard Setting Meeting
• Report Assessment Results
What is Standard Setting?
What is Standard Setting?
Methods Used in Virginia
Virginia predominantly uses the “Modified Angoff” (SOL and VMAST), “Body of
Work” (VAAP), and “Reasoned Judgment” (VGLA) methods. All methods
typically have similar components:
1. Overview of standard setting
2. Review of the test blueprint and performance level descriptions
3. Creation of the threshold descriptions
4. Overview of the actual test administered to students
5. Three rounds of judgments by the committee:
• MC tests: should a ‘just-barely’ student get the item correct 2 out of 3
times?
• VGLA: how many points should a ‘just-barely’ student earn on this SOL?
• VAAP: which performance level does a COE (Collection of Evidence)
represent?
6. The final round results in cut score recommendations that are provided to
the SBOE:
• the number of correct answers needed to gain entry into each
performance level.
A minimal sketch of how such item-level judgments can be aggregated into a
raw cut score follows.
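To make the Modified Angoff arithmetic concrete, here is a short Python
sketch of one way the judgments might be aggregated. The panelists, items,
and 0/1 ratings are hypothetical; operational procedures also include
discussion and impact data between rounds.

```python
# Minimal sketch of aggregating Modified Angoff judgments (hypothetical data).
# For each MC item, a panelist judges whether a 'just-barely' proficient
# student would answer correctly 2 out of 3 times (1 = yes, 0 = no).

ratings = {  # panelist -> one 0/1 judgment per item (illustrative values)
    "panelist_1": [1, 0, 1, 1, 0, 1],
    "panelist_2": [1, 1, 1, 0, 0, 1],
    "panelist_3": [0, 1, 1, 1, 1, 1],
}

# Each panelist's judgments sum to an expected number-correct for the
# borderline student; the committee recommendation is the rounded mean.
per_panelist = [sum(judgments) for judgments in ratings.values()]
n_items = len(next(iter(ratings.values())))
cut_score = round(sum(per_panelist) / len(per_panelist))
print(f"Recommended raw cut score: {cut_score} of {n_items}")  # 4 of 6
```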
The Connection to Scaled Scores
The Connection to Scaled Scores
Converting Raw Scores to Scaled Scores
The recommendations for a cut score from standard setting are in a
raw score metric. But this is not helpful from year to year.
• Student ability differs from student to student.
• Test forms change from year to year (and within a year).
– A raw score of 36 on a slightly easier test does not indicate the
same level of achievement as a raw score of 36 on a slightly more
difficult test.
We need a metric that is stable from year to year!
• This is where I earn my keep.
• The metric is based on item response theory (IRT) and is called
“theta.” The theta value (associated with a raw score) is converted
to a scaled score that remains stable from year to year, so that a 400
is comparable to a 400 regardless of the student, year, or form.
The Connection to Scaled Scores
Example Conversion to Scaled Scores
Algebra II

The scaled score is a linear function of theta, ScaledScore = a·θ + b,
anchored at the two cut scores:

500 = a·θa + b
400 = a·θp + b

where θa is the value of theta (2.6157) corresponding to the raw score (45)
at the pass/advanced level, and θp is the value of theta (0.6416)
corresponding to the raw score (30) at the pass/proficient level.

Solving for a yields:

a = 100 / (θa − θp)

and substituting the values of theta corresponding to the raw score cuts
gives:

a = 100 / (2.6157 − 0.6416) = 50.659

Solving for b yields:

b = 400 − a·θp

and substituting the values of θp and a gives:

b = 400 − 50.659(0.6416) = 367.497

As a check, the pass/proficient theta maps back to the 400 cut:

ScaledScore = 50.659(0.6416) + 367.497 = 399.999 ≈ 400
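The same arithmetic, written as a short Python sketch using the values from
this example:

```python
# Derive the theta-to-scaled-score line ScaledScore = a*theta + b from the
# two anchored cut points (pass/proficient = 400, pass/advanced = 500),
# using the Algebra II values above.

theta_p = 0.6416   # theta at the pass/proficient raw cut (30 correct)
theta_a = 2.6157   # theta at the pass/advanced raw cut (45 correct)

a = 100 / (theta_a - theta_p)  # slope: 100 scale points between the cuts
b = 400 - a * theta_p          # intercept: anchors the proficient cut at 400

def scaled_score(theta: float) -> float:
    """Map an IRT theta estimate onto the reporting scale."""
    return a * theta + b

print(round(scaled_score(theta_p), 3))  # 400.0 (pass/proficient)
print(round(scaled_score(theta_a), 3))  # 500.0 (pass/advanced)
```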
From Scaled Scores to Equated Forms
From Scaled Scores to Equated Forms
How are Scaled Scores Connected to Equating?
• When a test is built, the item difficulties (in the Rasch metric) are
known from the field-test statistical analyses.
• The tests are built to Rasch difficulty targets for the overall test and
all reporting categories, based on the standard setting form.
• Even though an attempt is made to construct test forms of equal
Rasch-based difficulty from form to form and year to year, there will
be small variations in difficulty.
• When building tests, the IRT model makes it possible to estimate the
raw score that corresponds to a scaled score of 400 (a sketch of this
lookup follows the list).
• Each core form of a test is equated to the established scale so that
the scores indicate the same level of achievement regardless of the
core form taken.
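A minimal sketch of that lookup, assuming a hypothetical raw-score-to-theta
table for a new form and the scaling constants derived in the earlier
Algebra II example:

```python
# Given a (hypothetical) raw-score-to-theta table for a new form and the
# established scaling constants, find the raw score whose scaled score
# falls closest to the 400 cut.

a, b = 50.659, 367.497  # scaling constants from the Algebra II example

# Illustrative raw -> theta values for a few raw scores on a new form.
raw_to_theta = {28: 0.48, 29: 0.55, 30: 0.64, 31: 0.73, 32: 0.81}

def nearest_raw_cut(table: dict, target: float = 400.0) -> int:
    """Return the raw score whose scaled score lies closest to the target."""
    return min(table, key=lambda raw: abs(a * table[raw] + b - target))

print(nearest_raw_cut(raw_to_theta))  # 30 on this hypothetical form
```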
From Scaled Scores to Equated Forms
The Basics of Equating
Common-Item Nonequivalent Groups Design
The common-item set is constructed as a “mini version” of the total test.

Year 1, Test X: common items C1, ..., C10, plus unique items X1, X2, ..., X50
Year 2, Test Y: the same common items C1, ..., C10, plus unique items Y1, Y2, ..., Y50
From Scaled Scores to Equated Forms
The Basics of Equating
The same common items are calibrated separately on each form:

Common items   Test X (Year 1, more difficult)   Test Y (Year 2, less difficult)
Item C1        b = -1.0                          b = -1.3
Item ...       ...                               ...
Item C10       b = 0.8                           b = 0.5
Mean b         0.5                               0.2

Difference in common-item means = 0.5 - 0.2 = 0.3

The unique items (X1, X2, ..., X50 and Y1, Y2, ..., Y50) carry their own
difficulties, but only the common items are used to estimate the 0.3-logit
shift that places the Year 2 calibration on the Year 1 (base) scale. A short
sketch of this calculation follows.
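A minimal Python sketch of that mean-b adjustment. Only C1 and C10 carry the
values shown above; the middle item is invented so the means come out to 0.5
and 0.2:

```python
# Mean-b (mean-mean) Rasch equating with the common items: the difference
# in mean common-item difficulty estimates the shift between the two forms'
# scales. Only C1 and C10 come from the slide; C5 is made up.

common_b_year1 = {"C1": -1.0, "C5": 1.7, "C10": 0.8}  # Test X calibration
common_b_year2 = {"C1": -1.3, "C5": 1.4, "C10": 0.5}  # Test Y calibration

mean_y1 = sum(common_b_year1.values()) / len(common_b_year1)  # 0.5
mean_y2 = sum(common_b_year2.values()) / len(common_b_year2)  # 0.2
shift = mean_y1 - mean_y2                                     # 0.3

# Adding the shift re-expresses Year 2 difficulties on the Year 1 scale.
equated_year2 = {item: b + shift for item, b in common_b_year2.items()}
print(f"shift = {shift:.1f}")  # 0.3, matching the slide
print({k: round(v, 1) for k, v in equated_year2.items()})  # matches Year 1
```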
Recap of How It All Comes Together
1. Test is developed
2. Cut scores are recommended (standard setting)
3. Cut scores are adopted by the SBOE
4. Test is scaled
5. Test is equated
6. Scores are reported
Questions?
[email protected]