Science WASL is

CSSS Large Scale Assessment Webinar
Adaptive Testing in Science
Kevin King (WestEd)
Roy Beven (NWEA)
Agenda
1. Experience with CATs
2. CAT Overview: What and Why
3. Longitudinal Scale
4. Nature of CATs and their Item Banks
5. Using different item types in CATs
6. Discussion: Implications for NGSS?
Kevin King:
HS Biology, Integrated Science, Research Methods Teacher (9 years)
Science Assessment Specialist for UT State (2003-2010)
Assessment Development Coordinator for UT State (2010-2012)
Senior Assessment Manager for WestEd (2012-present)
Roy Beven:
HS Physics, Math, Geology, Tech-Ed Teacher (23 years)
Lead Science Assessment Specialist for WA State (2001-2008)
Senior Science Content Specialist for NWEA (2008-present)
Presenters’ Experience with CAT
Utah: peer review acceptance of Utah Adaptive Assessment System
Smarter Balanced: state co-chair work group member for item development;
program management liaison for multiple work groups
MAP® for Science: an interim adaptive test designed to measure growth,
administered last year to over 1.7 million students, mostly in grades 3-8,
across the nation
CAT Overview: What
Tests are designed to assess the performance of students
by locating them on a scale with a high degree of accuracy
and precision.
A computer algorithm selects each item according to the student's most
recent position on the scale or some other criterion.
When the student answers an item correctly, the computer selects an item
higher on the scale; when the student answers incorrectly, it selects an
item lower on the scale.
The computer selects items until all the criteria are met.
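The select-up, select-down behavior described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `Item`, `next_item`, and `run_cat` are hypothetical, the difficulty values are invented, and the fixed-step rule is a simplification. Operational CATs typically re-estimate the student's position with a measurement model after each response rather than stepping by a constant.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    name: str
    difficulty: float  # position of the item on the single scale (invented values)


def next_item(bank, used, target):
    """Return the unused item whose difficulty is closest to the target."""
    candidates = [item for item in bank if item.name not in used]
    return min(candidates, key=lambda item: abs(item.difficulty - target))


def run_cat(bank, answers_correctly, start=0.0, step=2.0, max_items=3):
    """Give items one at a time, moving the target up the scale after a
    correct answer and down after an incorrect one (simplified rule)."""
    target, used, path = start, set(), []
    while len(path) < max_items and len(used) < len(bank):
        item = next_item(bank, used, target)
        used.add(item.name)
        correct = answers_correctly(item)
        path.append((item.name, correct))
        target += step if correct else -step
    return path


# Hypothetical three-item spelling bank.
bank = [
    Item("red", -2.0),
    Item("school", 0.0),
    Item("encyclopedia", 2.0),
]

# A simulated student who can answer any item at or below 1.0 on the scale.
path = run_cat(bank, lambda item: item.difficulty <= 1.0)
print(path)
```

Here the only stopping rule is the item count; a real test would stop when all of its blueprint criteria are met.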
CAT Overview: What
[Figure: branching diagram of an adaptive spelling test for Students 1-4.
Items shown: Spell “Red”, Spell “School”, Spell “Encyclopedia”. Each item
has Correct and Incorrect branches that route students to the next item.]
CAT Overview: What (continued)
Possible criteria for item selection (aka, CAT blueprint):
• Student grade range (i.e., blueprinted standards)
• Number of items (i.e., operational and field test)
• Claims being reported (e.g., 3 or 4 disciplines)
• Standard Error of Measurement (SEM)
• Adequate coverage of standards
• Adequate cognitive complexity (DoK)
• Adequate types of items
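One way to picture a CAT blueprint is as a small data structure plus a check that decides whether the test can stop. The sketch below is hypothetical: `Blueprint`, `criteria_met`, and `rasch_sem` are illustrative names, only three of the criteria listed above are modeled, and the SEM formula assumes a Rasch (one-parameter) model, which the slides do not specify.

```python
import math
from dataclasses import dataclass


@dataclass
class Blueprint:
    min_items: int
    max_items: int
    min_items_per_claim: dict  # claim name -> minimum number of items
    target_sem: float          # stop once the SEM is at or below this value


def rasch_sem(theta, difficulties):
    """SEM at ability theta under a Rasch model (an assumption here):
    1 / sqrt(sum of p * (1 - p)) over the administered items."""
    information = 0.0
    for b in difficulties:
        p = 1.0 / (1.0 + math.exp(-(theta - b)))
        information += p * (1.0 - p)
    return 1.0 / math.sqrt(information)


def criteria_met(blueprint, claims_given, sem):
    """claims_given lists the claim of each administered item, in order."""
    n = len(claims_given)
    if n >= blueprint.max_items:
        return True  # hard cap on test length
    if n < blueprint.min_items:
        return False
    for claim, minimum in blueprint.min_items_per_claim.items():
        if claims_given.count(claim) < minimum:
            return False
    return sem <= blueprint.target_sem


bp = Blueprint(
    min_items=6,
    max_items=12,
    min_items_per_claim={"life": 2, "earth/space": 2, "physical": 2},
    target_sem=0.3,
)
given = ["life", "life", "earth/space", "earth/space", "physical", "physical"]
print(criteria_met(bp, given, sem=0.25))
```

Coverage of standards, DoK, and item types could be added as further per-category minimums in the same way.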
CAT Overview: What (continued)
Constraining a CAT
CAT Overview: What (continued)
Sample Test Design with only 3 Criteria
- 3 reporting goals (e.g., life, earth/space, physical)
- 30 operational items (10 per goal)
- SEM
Items 1-10 to establish preliminary score
Items 11-25 to balance the number of items per goal
Items 26-30 to establish the SEM
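The three-stage design above might be expressed as follows. This is a sketch, not the presenters' algorithm: `phase` and `goal_for_next_item` are hypothetical names, and picking the goal with the fewest items so far is just one plausible way to balance 10 items per goal.

```python
GOALS = ("life", "earth/space", "physical")
ITEMS_PER_GOAL = 10
TOTAL_ITEMS = 30


def phase(item_number):
    """Map an operational item number (1-30) to its stage in the sample design."""
    if not 1 <= item_number <= TOTAL_ITEMS:
        raise ValueError("item_number must be between 1 and 30")
    if item_number <= 10:
        return "establish preliminary score"
    if item_number <= 25:
        return "balance items per goal"
    return "establish SEM"


def goal_for_next_item(counts):
    """Return the goal with the fewest items so far that is still below its
    quota, or None when every goal already has its 10 items."""
    open_goals = [g for g in GOALS if counts.get(g, 0) < ITEMS_PER_GOAL]
    if not open_goals:
        return None
    return min(open_goals, key=lambda g: counts.get(g, 0))


# After 10 items the counts are uneven, so the next item comes from the
# goal that is furthest behind.
counts = {"life": 5, "earth/space": 2, "physical": 3}
print(phase(11), "->", goal_for_next_item(counts))
```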
CAT Overview: Why (continued)
Tests present an individually tailored set of questions to
each student. Tests can quickly identify which skills
students have mastered. Tests provide accurate scores for
all students across the full range of the achievement
continuum.
SBAC: http://www.smarterbalanced.org/smarter-balanced-assessments/computer-adaptive-testing/
CATs have been found to be as accurate as fixed-form tests
that are twice as long. CATs drawing from large item pools
can provide much more information, and more precise
information, than fixed-form tests. CATs provide immediate
feedback to students and teachers.
ASCD: http://www.ascd.org/publications/educational-leadership/mar14/vol71/num06/The-Potential-of-Adaptive-Assessment.aspx
Longitudinal Scale
Many existing LSAs develop a new scale for each grade-level
test each year, then equate these new scales back to
the scale established when the tests were first
administered.
CATs establish one scale. Items are calibrated onto this one
scale for the life of the test. The scale could be re-established,
but this would affect all items in the item bank.
Nature of CATs and their Item Banks
Larger than static test item banks (typically 4-10 times
larger).
Last longer than static test item banks, as individual item
exposure is limited.
Need to cover the “range of the algorithm” criteria (e.g.,
DoK, standards) at a range of item difficulty. The range of
difficulty is not fully known until after items are field
tested, which is a challenge when building a bank at the
onset of a new test.
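A simple audit can show where a bank fails to cover the criteria across the difficulty range. The sketch below is hypothetical throughout: the tuple layout, the band cut points, and the names `difficulty_band` and `coverage_gaps` are assumptions for illustration, and the difficulties would only be available after field testing.

```python
from collections import Counter
from itertools import product

BANDS = ("low", "medium", "high")


def difficulty_band(difficulty, edges=(-1.0, 1.0)):
    """Bin a calibrated difficulty into a coarse band (cut points are invented)."""
    low_edge, high_edge = edges
    if difficulty < low_edge:
        return "low"
    if difficulty < high_edge:
        return "medium"
    return "high"


def coverage_gaps(bank, standards, dok_levels, min_per_cell=1):
    """bank is a list of (standard, dok, difficulty) tuples. Return every
    (standard, dok, band) cell holding fewer than min_per_cell items."""
    counts = Counter((s, d, difficulty_band(b)) for s, d, b in bank)
    return [
        cell
        for cell in product(standards, dok_levels, BANDS)
        if counts[cell] < min_per_cell
    ]


# Hypothetical bank: standard "S1" at DoK 1 has no hard item yet.
bank = [("S1", 1, -2.0), ("S1", 1, 0.0)]
print(coverage_gaps(bank, standards=["S1"], dok_levels=[1]))
```

Cells that come back from the audit point to where new items need to be written and field tested.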
Using different item types in CATs
1. Multiple Choice dichotomously scored items
2. Technology Enhanced Items (TEIs)
3. Polytomously scored items
4. Constructed response items
5. Common Stimulus Item Sets (CSIS)
6. Simulations with scoring by path (PhET, SimSci, NAEP)
7. Others?
Discussion: Implications for LSA of NGSS?
• What part of a state’s NGSS assessment system could
(might) be a CAT?
• Can (should) a single CAT measure all grade ranges?
K-2, 3-5, 6-8, 9-12
• Can (should) a CAT report on the 3 dimensions of the
NGSS (DCIs, SEPs, and CCs)?
• Can a CAT report on the 4 disciplines of the NGSS?
• Can a CAT report on an adequate range of NGSS PEs?