CRESST / Harvard
“Believe me, it’s not cheating, but some strange method”
GRE/TOEFL prep teacher, Shanghai
Daniel Koretz
Harvard Graduate School of Education
National Center for Research on Evaluation, Standards, and Student Testing
Annual CRESST Conference
September 11, 2002, Los Angeles, CA
Validating inferences in the age of NCLB
Validity is a property of inferences, not of measures
Key inferences are now about gains obtained under high-stakes conditions
Traditional validation is insufficient:
  Inappropriate framework
  Insufficient methods
Risk is false positives: inflated gains
Map of talk
Will not show evidence of severe inflation (old hat by now)
Will discuss an approach to the validation of gains
Will illustrate possible leverage points for coaching and inflation of scores
Will note possible directions for future research
Why traditional validation is insufficient
Cross-sectional, and therefore insensitive to changes in levels of performance
Insufficient in high-stakes contexts:
  Largely ignores behavioral responses to testing
  Ignores inadvertent emphases in tests
  Assumes stability in relationships between aspects of performance, both tested and untested
Why these limitations matter
Scores can rise rapidly, and be inflated, without affecting correlations among tests
Behavioral responses to testing (e.g., coaching) can make the sampled content unrepresentative of the domain after initial validation
Inadvertent emphases in tests can provide leverage for coaching
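A minimal simulation, on entirely hypothetical data, of why rising scores need not disturb correlations among tests: if inflation raises every student's score on the high-stakes test by roughly the same amount, the mean rises but Pearson's r with an audit test does not move, because r is unchanged when a constant is added to one variable. The uniform 0.6-point shift, the noise levels, and the sample size are assumptions for illustration only; real inflation is rarely perfectly uniform, so in practice correlations would shift a little rather than not at all.

```python
import random

def pearson(xs, ys):
    """Pearson correlation, computed directly from deviations about the means."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

random.seed(2002)

# Hypothetical students: a shared proficiency plus test-specific noise.
proficiency = [random.gauss(0, 1) for _ in range(1000)]
audit_test = [p + random.gauss(0, 0.7) for p in proficiency]   # stands in for an audit test such as the ACT
high_stakes = [p + random.gauss(0, 0.7) for p in proficiency]  # stands in for an accountability test such as KIRIS

# Assumed inflation: every student gains 0.6 points on the high-stakes test,
# with no change in proficiency and no change on the audit test.
INFLATION = 0.6
inflated = [s + INFLATION for s in high_stakes]

r_before = pearson(audit_test, high_stakes)
r_after = pearson(audit_test, inflated)
mean_gain = sum(inflated) / len(inflated) - sum(high_stakes) / len(high_stakes)

print(f"correlation before: {r_before:.3f}, after: {r_after:.3f}")
print(f"mean gain on high-stakes test: {mean_gain:.2f}; gain on audit test: 0.00")
```

The correlation is identical before and after, while the high-stakes mean has risen by the full inflation amount: a correlation check alone would not flag the gain as suspect.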
KY math trends, KIRIS and ACT
[Figure: Kentucky mathematics trends on KIRIS and ACT, 1992 to 1995; y-axis: standard deviation units (-0.1 to 0.7), x-axis: year]
Correlations Between ACT and KIRIS Mathematics
Year    Student level    School level
1992    0.54             0.69
1993    0.71             0.75
1994    0.70             0.58
1995    0.72             0.74
Keys to validating gains
Assess generalizability of gains to other (audit) measures
Determine how much generalizability should be expected
  Based on users’ inferences (example of TAAS vs. NAEP)
Examine behavioral responses
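One way to make the first two steps concrete is to express gains on both tests in baseline standard-deviation units and then ask how much of the high-stakes gain the audit test supports. The sketch below does that under a deliberately simple assumption that is not from the talk: a real gain of g on the high-stakes test should appear as `expected_share * g` on the audit test, where `expected_share` is the judgment the slide says must be grounded in users' inferences. All numbers and function names are hypothetical and are not actual KIRIS, ACT, TAAS, or NAEP results.

```python
def standardized_gain(mean_before, mean_after, sd_before):
    """Change in mean score expressed in baseline standard-deviation units."""
    return (mean_after - mean_before) / sd_before

def unexplained_gain(test_gain, audit_gain, expected_share):
    """Part of the high-stakes gain not supported by the audit test.

    Simplifying assumption (hypothetical): a real gain g on the high-stakes
    test should show up as expected_share * g on the audit test.
    """
    implied_real_gain = audit_gain / expected_share
    return max(test_gain - implied_real_gain, 0.0)

# Hypothetical summary statistics for illustration only.
test_gain = standardized_gain(mean_before=50.0, mean_after=62.0, sd_before=20.0)
audit_gain = standardized_gain(mean_before=19.0, mean_after=19.5, sd_before=5.0)

for share in (1.0, 0.5):
    print(f"expected share {share}: unexplained gain = "
          f"{unexplained_gain(test_gain, audit_gain, share):.2f} SD")
```

With a 0.6 SD gain on the high-stakes test and 0.1 SD on the audit test, the unsupported portion is 0.5 SD if full generalization is expected and 0.4 SD if only half is; the conclusion about inflation depends on that expectation, which is why the slide ties it to users' inferences.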
CRESST work on the validation of gains
Develop a framework for validation efforts (Tech Report 551)
Explore teacher surveys and interviews as a means of obtaining information about behavioral responses to testing (ongoing)
Develop statistical models for the analysis of gains (new)
Framework for validating gains
Identify substantive and nonsubstantive performance elements (PEs) in the test and in the inferences based on it
Determine the weights given to PEs in the test
  May be unintended
  May be trivial or zero
Determine the weights given to PEs in key inferences about gains
Validity hinges on whether changes in performance on PEs are consistent with the inference weights
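A simplified reading of the framework, not the formal model of Tech Report 551: the change in the test score is the sum of PE changes weighted by the test's weights, while the inference users draw implicitly weights the same PEs differently. The PE names, weights, and changes below are invented; the `PerformanceElement` type and `weighted_change` helper are hypothetical. One PE is nonsubstantive (familiarity with item format), so it carries weight in the test but zero weight in the inference.

```python
from typing import NamedTuple

class PerformanceElement(NamedTuple):
    name: str
    test_weight: float       # share of the test score that depends on this PE
    inference_weight: float  # importance of this PE to the inference about gains
    change: float            # change in performance on this PE (hypothetical SD units)

# Hypothetical PEs; each set of weights sums to 1.
pes = [
    PerformanceElement("number sense", 0.30, 0.30, 0.20),
    PerformanceElement("geometry", 0.30, 0.30, 0.50),
    PerformanceElement("problem solving", 0.10, 0.40, 0.05),
    # Nonsubstantive PE: it moves the score but is irrelevant to the inference.
    PerformanceElement("familiarity with item format", 0.30, 0.00, 0.80),
]

def weighted_change(elements, weight_field):
    """Sum of PE changes, weighted by either the test's or the inference's weights."""
    return sum(getattr(pe, weight_field) * pe.change for pe in elements)

score_gain = weighted_change(pes, "test_weight")
inferred_gain = weighted_change(pes, "inference_weight")
print(f"gain on the test: {score_gain:.3f}")
print(f"gain relevant to the inference: {inferred_gain:.3f}")
print(f"gap (potential inflation): {score_gain - inferred_gain:.3f}")
```

Here the test registers a gain of 0.455 while the inference-weighted gain is only 0.23. The gap comes from two sources the slide names: large improvement on a PE the inference ignores, and little improvement on a PE the inference weights heavily but the test weights lightly.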
Types of test preparation
Teaching more
Working harder
Working more effectively
Reallocation
Alignment
Coaching
Cheating
Reallocation
Refers to shifting limited instructional resources among substantive areas
  Within subject
  Between subjects
Results in reallocating achievement
Can lead to either meaningful change or inflation
Inflates by undermining representation of the domain
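A toy model of reallocation within a subject, assuming (hypothetically) that performance in each content area grows linearly with instructional hours and that the test samples only two of four equally important areas. The area names, weights, and the two-points-per-hour payoff are all invented for illustration. Moving the same 100 hours toward tested areas raises the test score while leaving domain-wide achievement unchanged: achievement is reallocated, not increased, so the test gain is inflation with respect to the domain.

```python
# Hypothetical domain: four content areas, equally important to the inference
# (weights in percent).
DOMAIN_WEIGHTS = {"fractions": 25, "measurement": 25, "geometry": 25, "statistics": 25}
# Hypothetical test that samples only two of those areas.
TEST_WEIGHTS = {"fractions": 50, "measurement": 50, "geometry": 0, "statistics": 0}
# Assumed linear payoff of instruction, in score points per hour.
POINTS_PER_HOUR = 2

def achievement(hours):
    """Performance in each area under the linear-payoff assumption."""
    return {area: POINTS_PER_HOUR * h for area, h in hours.items()}

def composite(performance, weights):
    """Weighted average of area performance (weights are percentages)."""
    return sum(weights[area] * performance[area] for area in weights) / 100

# The same 100 instructional hours, before and after shifting time toward tested areas.
before = achievement({"fractions": 25, "measurement": 25, "geometry": 25, "statistics": 25})
after = achievement({"fractions": 40, "measurement": 40, "geometry": 10, "statistics": 10})

test_gain = composite(after, TEST_WEIGHTS) - composite(before, TEST_WEIGHTS)
domain_gain = composite(after, DOMAIN_WEIGHTS) - composite(before, DOMAIN_WEIGHTS)
print(f"test gain: {test_gain:.1f} points, domain gain: {domain_gain:.1f} points")
# → test gain: 30.0 points, domain gain: 0.0 points
```

If the de-emphasized areas mattered less to the inference (lower domain weights), the same shift could produce a meaningful domain gain, which matches the slide's point that reallocation can go either way.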
Alignment
Sometimes presented as providing protection against inflation: emphasis on PEs deemed important
But this is just a form of reallocation
Whether gains are inflated depends on:
  Importance of emphasized material to inference, and
  Importance of de-emphasized or omitted material to inference
Coaching
Focuses on details of the test:
  Substantive, including item style
  Non-substantive, such as item formats and scoring rubrics
Includes test-taking tricks (e.g., process of elimination, or POE, and plugging in answer choices)
Can inflate scores or simply waste time
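A small illustration of the two tricks named above, on a hypothetical multiple-choice item ("If 3x + 7 = 22, what is x?") that is not from any test discussed in the talk. Process of elimination discards implausible choices with a coarse check, and plug-in substitutes the remaining choices into the stem, so the student reaches the keyed answer without solving the equation. The helper names are hypothetical. On a constructed-response version of the item neither trick is available, which is one reason gains produced this way may not generalize.

```python
# Hypothetical multiple-choice item: "If 3x + 7 = 22, what is x?"
choices = {"A": 3, "B": 5, "C": 7, "D": 15}

def satisfies_stem(x):
    """The item's condition, checked directly rather than solved."""
    return 3 * x + 7 == 22

def eliminate(options, plausible):
    """Process of elimination: drop choices that fail a quick plausibility check."""
    return {label: value for label, value in options.items() if plausible(value)}

def plug_in(options, check):
    """Plug-in: substitute each remaining choice into the stem."""
    return [label for label, value in options.items() if check(value)]

# Coarse check a coached student might use: since 3x + 7 = 22, x is positive and 3x < 22.
remaining = eliminate(choices, lambda x: 0 < x < 22 / 3)
answer = plug_in(remaining, satisfies_stem)
print(sorted(remaining), answer)  # → ['A', 'B', 'C'] ['B']
```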
Possible levers for coaching
Possibly inadvertent content overweighting
Item style
Recurrent content detail
Recurrent form of presentation
Inadvertent, recurrent construct underrepresentation
Recurrent cognitive demand with limited construct relevance
Item from G8 MCAS
Eva has four sets of straws. The measurements of the straws are given below. Which set of straws could not be used to form a triangle?
A. Set 1: 4 cm, 4 cm, 7 cm
B. Set 2: 2 cm, 3 cm, 8 cm
C. Set 3: 3 cm, 4 cm, 5 cm
D. Set 4: 5 cm, 12 cm, 13 cm
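The straws item turns on a single rule, the triangle inequality. If a detail like this recurs across test forms, it can be coached as a one-line check (the two shortest lengths must sum to more than the longest) rather than taught as part of broader geometric reasoning; the slide does not show whether this particular detail recurred on MCAS. The sketch applies that check to the four sets listed in the item; the function name is hypothetical.

```python
# The four straw sets from the item (lengths in cm).
straw_sets = {
    "Set 1": (4, 4, 7),
    "Set 2": (2, 3, 8),
    "Set 3": (3, 4, 5),
    "Set 4": (5, 12, 13),
}

def forms_triangle(lengths):
    """Triangle inequality: the two shortest sides must sum to more than the longest."""
    shortest, middle, longest = sorted(lengths)
    return shortest + middle > longest

cannot_form = [name for name, lengths in straw_sets.items() if not forms_triangle(lengths)]
print(cannot_form)  # → ['Set 2']
```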
Item from G8 MCAS
Each arrangement in this pattern is made up of tiles. How many tiles will be in the 6th arrangement in the pattern?
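The figure for this item is not reproduced, so the tile counts below are invented. Pattern items presented in a recurrent form can often be answered by a rote procedure: list the counts, take differences until a row is constant, and extend. The sketch implements that finite-difference procedure; the function name `nth_term` is hypothetical. It silently assumes a polynomial pattern, so it fails on other kinds of growth (given 1, 2, 4, 8 it predicts 15 for the fifth term rather than the doubling pattern's 16).

```python
def nth_term(terms, n):
    """Extend a sequence to its n-th term (1-based) by finite differences.

    Assumes the pattern is polynomial, i.e. some row of repeated
    differences is constant; that assumption makes the rule mechanical.
    """
    rows = [list(terms)]
    # Take differences until a row is constant (or has a single entry).
    while len(rows[-1]) > 1 and len(set(rows[-1])) > 1:
        prev = rows[-1]
        rows.append([b - a for a, b in zip(prev, prev[1:])])
    # Grow each row from the bottom up until the sequence has n terms.
    while len(rows[0]) < n:
        rows[-1].append(rows[-1][-1])
        for i in range(len(rows) - 2, -1, -1):
            rows[i].append(rows[i][-1] + rows[i + 1][-1])
    return rows[0][n - 1]

# Hypothetical tile counts for the first three arrangements.
print(nth_term([3, 5, 7], 6))  # → 13 (linear pattern)
print(nth_term([1, 4, 9], 6))  # → 36 (square pattern)
```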
Prompt from G8 MCAS
Use the balance scales below to answer the question below
Prompt from G10 NAEP
Use the unit of length below to estimate the perimeter of the figure shown. Between which two consecutive whole-number units does the perimeter lie?
Prompt from G10 MCAS
Use the map below to answer this question.
Prompt from a G8 KIRIS item
Prompt from G10 MCAS
Use the figure below to answer the next question
Answers for G10 MCAS prompt
If the figure above is folded into a cube, which of the following solids will be formed?
Next steps for research
Develop methods for ascertaining which levers teachers use to inflate scores
Develop methods for identifying systematically the patterns in tests that facilitate or inhibit coaching and inappropriate reallocation
Develop methods for ‘unpacking’ lack of generalization and for better distinguishing between meaningful gains and inflation