Transcript Slide 1

Measuring and Modeling Growth in a High Stakes
Environment
John Cronin, Ph.D.
Director
The Kingsbury Center @ NWEA
Contacting us:
Rebecca Moore: 503-548-5129
E-mail: [email protected]
This PowerPoint presentation and recommended resources are
available at our website: www.kingsburycenter.org
This presentation is the top presentation on this page
http://kingsburycenter.org/our-research/research-reports-publications
How tests are used to evaluate teachers
Testing
Metric (Growth Score)
Analysis (Value Added)
Evaluation (Rating)
Three principles
• Ineffective should mean egregiously
incompetent.
• You must support teachers with an ineffective
rating through improvement or dismissal.
• You shouldn’t dismiss a teacher who can’t be
replaced with someone more effective.
What question is being answered? –
Performance Management
Is the progress produced by this
teacher dramatically greater or
less than teaching peers that
deliver instruction to comparable
students?
Our nation has moved from a model of
education reform that focused on fixing
schools to a model that is focused on
fixing the teaching profession.
Difficulty of New York Proficient Level
[Chart: Math and Reading, Grade 2 through Grade 8; vertical axis runs from 0 to 100]
Difficulty of ACT college readiness standards
[Chart: Math, Reading, and English for Grade 8 EXPLORE, Grade 10 PLAN, and Grade 11 ACT; vertical axis: NWEA Percentile – 2011 Norms, 0 to 100]
Moving from Proficiency to Growth
All students count when
accountability is measured
through growth.
One district’s change in 5th grade math performance
relative to Kentucky cut scores
[Chart: Mathematics; horizontal axis: Fall RIT; vertical axis: Number of Students; categories: Up, No Change, Down; labels: proficiency, college readiness]
Number of 5th grade students meeting math growth
target in the same district
[Chart: Mathematics; horizontal axis: Student’s score in fall; vertical axis: Number of Students; categories: Met growth target, Failed growth target]
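The contrast between the two charts above can be sketched in a few lines of Python. This is an illustration only: the `Student` class, the cut score of 210, the growth targets, and all scores are hypothetical values, not data from the district shown.

```python
from dataclasses import dataclass


@dataclass
class Student:
    fall: float    # fall score (hypothetical)
    spring: float  # spring score (hypothetical)
    target: float  # growth target in score points (hypothetical)


def proficiency_change(student: Student, cut: float) -> str:
    """Classify a student by movement across a single cut score."""
    before = student.fall >= cut
    after = student.spring >= cut
    if before == after:
        return "No Change"
    return "Up" if after else "Down"


def met_growth_target(student: Student) -> bool:
    """True when the fall-to-spring gain reaches the student's target."""
    return (student.spring - student.fall) >= student.target


CUT = 210  # hypothetical proficiency cut score

students = [
    Student(fall=185, spring=196, target=9),  # far below the cut, strong growth
    Student(fall=209, spring=212, target=7),  # crosses the cut, weak growth
    Student(fall=230, spring=238, target=6),  # far above the cut, strong growth
]

changes = [proficiency_change(s, CUT) for s in students]
growth = [met_growth_target(s) for s in students]

for s, c, g in zip(students, changes, growth):
    print(f"fall={s.fall} spring={s.spring} proficiency={c} met_target={g}")
```

In this made-up example, only the student near the cut score registers under the proficiency view, while the growth view counts every student, including the two who grew the most.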
Issues in the use of growth and value-added
measures
Measurement design of the
instrument
Many assessments are not
designed to measure growth.
Others do not measure growth
equally well for all students.
Tests are not equally accurate for all
students
California STAR
NWEA MAP
Tests are not equally accurate for all
students
Grade 6 New York Mathematics
Issues in the use of growth and value-added
measures
“Among those who ranked in the top
category on the TAKS reading test, more
than 17% ranked among the lowest two
categories on the Stanford. Similarly
more than 15% of the lowest value-added
teachers on the TAKS were in the highest
two categories on the Stanford.”
Corcoran, S., Jennings, J., & Beveridge, A., Teacher Effectiveness on High and Low Stakes
Tests, Paper presented at the Institute for Research on Poverty summer workshop, Madison, WI
(2010).
Issues in the use of growth and value-added
measures
Instability of results
A variety of factors can cause value-added results to lack stability.
Results are more likely to be stable
at the extremes. The use of
multiple years of data is highly
recommended.
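One way to see why multiple years help is the standard error of an average, sketched here in Python. The sketch assumes each year's estimate carries an equal and independent error, which is a simplification that real value-added data may not satisfy. The function name and the single-year standard error of 2.0 are hypothetical.

```python
import math


def se_of_multi_year_average(single_year_se: float, years: int) -> float:
    """Standard error of the mean of `years` yearly estimates,
    assuming equal and independent errors in every year."""
    if years < 1:
        raise ValueError("years must be at least 1")
    return single_year_se / math.sqrt(years)


one_year = se_of_multi_year_average(2.0, 1)
four_years = se_of_multi_year_average(2.0, 4)

print(one_year, four_years)
```

Under these assumptions, averaging four years halves the standard error of a single year. Correlated errors across years would shrink that gain.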
Reliability of teacher value-added
estimates
Teachers with growth scores in lowest and
highest quintile over two years using NWEA’s
Measures of Academic Progress

           Bottom quintile Y1&Y2    Top quintile Y1&Y2
Number     59/493                   63/493
Percent    12%                      13%

r          .64
r2         .41

Typical r values for measures of teaching effectiveness range
between .30 and .60 (Brown Center on Education Policy, 2010)
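The figures on this slide can be checked with simple arithmetic. The short Python sketch below uses only the counts and the r value reported above. The variable and function names are illustrative.

```python
TEACHERS = 493           # denominator reported on the slide
bottom_both_years = 59   # bottom quintile in both Y1 and Y2
top_both_years = 63      # top quintile in both Y1 and Y2
r = 0.64                 # reported correlation


def share(count: int, total: int) -> int:
    """Percentage of `total`, rounded to the nearest whole number."""
    return round(100 * count / total)


bottom_pct = share(bottom_both_years, TEACHERS)
top_pct = share(top_both_years, TEACHERS)
r_squared = round(r * r, 2)

print(bottom_pct, top_pct, r_squared)
```

The squared correlation is the share of variance in one set of scores that is statistically accounted for by the other, so an r of .64 leaves more than half of the variance unexplained.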
Issues in the use of growth and value-added
measures
Control for statistical error
All models attempt to address this
issue. Nevertheless, many teachers’
value-added scores will fall within
the range of statistical error.
Range of teacher value-added
estimates
Mathematics Growth Index Distribution by Teacher - Validity Filtered
[Chart: vertical axis: Average Growth Index Score and Range, -12.00 to 12.00; teachers grouped by quintile, Q1 through Q5]
Each line in this display represents a single teacher. The graphic
shows the average growth index score for each teacher (green line),
plus or minus the standard error of the growth index estimate (black
line). We removed students who had tests of questionable validity and
teachers with fewer than 20 students.
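The idea of a score falling within the range of statistical error can be sketched as follows. This is a minimal Python illustration, assuming that a growth index of zero represents typical growth and using a band of plus or minus one standard error as in the display described above. The teacher labels, scores, and standard errors are invented for the example.

```python
from dataclasses import dataclass


@dataclass
class TeacherEstimate:
    name: str
    growth_index: float  # average growth index for the teacher's students
    std_error: float     # standard error of that average


def classify(t: TeacherEstimate, z: float = 1.0) -> str:
    """Compare the band growth_index +/- z * std_error with zero.
    Zero as the reference point is an assumption of this sketch."""
    low = t.growth_index - z * t.std_error
    high = t.growth_index + z * t.std_error
    if low > 0:
        return "above"
    if high < 0:
        return "below"
    return "within error"


teachers = [
    TeacherEstimate("A", 3.2, 1.1),
    TeacherEstimate("B", 0.8, 1.4),
    TeacherEstimate("C", -2.5, 1.0),
    TeacherEstimate("D", -0.9, 1.2),
]

labels = {t.name: classify(t) for t in teachers}
print(labels)
```

In this invented set, two of the four teachers cannot be distinguished from typical growth once the error band is taken into account. A wider band (a larger `z`) would place more teachers in that group.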
Issues in the use of growth and value-added
measures
Instructional alignment
Tests should align to the teacher’s
instructional responsibilities.
Issues in the use of growth and value-added
measures
Measurement sensitivity
Assessments should be instructionally sensitive.
Issues in the use of growth and value-added
measures
Model Wars
There are a variety of models in the
marketplace. These models may
come to different conclusions about
the effectiveness of a teacher or
school. Differences in findings are
more likely to happen at the
extremes.
Issues in the use of growth and value-added
measures
Lack of random assignment
The use of a value-added model
assumes that the school doesn’t
add a source of variation that isn’t
controlled for in the model.
e.g., young teachers are assigned
disproportionate numbers of
students with poor discipline
records.
Issues in the use of growth and value-added
measures
Uncovered Subjects and Teachers
High quality tests may not be
administered, or available, for many
teachers and grades. Subjects like
social studies may be particularly
problematic.
Issues in the use of growth and value-added
measures
Idiosyncratic cases
In self-contained classrooms, one or
two idiosyncratic cases can have a
large effect on results.
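A small worked example shows how much a single case can move a class average. The Python sketch below uses an invented class of 24 students and invented growth scores.

```python
from statistics import mean

# Hypothetical class of 24 students, each with a growth score of 2.0
typical_class = [2.0] * 24

# Same class, but one student has an extreme score of -22.0
with_outlier = [2.0] * 23 + [-22.0]

typical_mean = mean(typical_class)
outlier_mean = mean(with_outlier)

print(typical_mean, outlier_mean)
```

Here one student out of 24 cuts the class average in half, from 2.0 to 1.0. The smaller the class, the larger the effect of any single case.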
Other issues
Security and Cheating
When measuring growth, one
teacher who cheats disadvantages
the next teacher.
Security considerations
• Teachers should not be allowed to view the contents
of the item bank or record items.
• Districts should have policies for accommodation that
are based on student IEPs.
• Districts should consider having both the teacher and
a proctor in the test room.
• Districts should consider whether other security
measures are needed for the protection of both
teachers and administrators.
Other issues
Proctoring
Proctoring both with and without the
classroom teacher raises possible
problems.
Documentation that test
administration procedures were
properly followed is important.
Potential Litigation Issues
The use of testing data for high-stakes personnel
decisions does not yet have a strong, coherent
body of case law.
Expect litigation if value-added results are the
lynchpin evidence for a teacher-dismissal case
until a body of case law is established.
Possible legal issues
• Title VII of the Civil Rights Act of 1964 –
Disparate impact of sanctions on a protected
group.
• State statutes that provide tenure and other
related protections to teachers.
• Challenges to a finding of “incompetence”
stemming from the growth or value-added
data.
Recommendations
• Embrace the formative advantages of growth
measurement as well as the summative.
• Create comprehensive evaluation systems with
multiple measures of teacher effectiveness (RAND,
2010).
• Select measures as carefully as value-added models.
• Use multiple years of student achievement data.
• Understand the issues and the tradeoffs.