Transcript: Nonprofit Leadership Forum

What Does Research Tell Us About
Identifying Effective Teachers?
Jonah Rockoff
Columbia Business School
Nonprofit Leadership Forum, May 2010
First, Let’s Define “Effective”
● Can be an inputs-based concept
– Observable actions or characteristics
● Can be an outcomes-based concept
– Measured by student success
● Recent work of economists focuses on outcomes
– Uses a value-added approach
– Outcomes measured are typically standardized exams in math and reading, usually in elementary/middle school
● Movement to bring rigorous analysis to teacher evaluations based on in-class observation
Basics of Value Added Analysis
● VA is all about comparing actual student outcomes to a counterfactual expectation
● Suppose we knew the “right” counterfactual expectation for each child; call it A*
– Expected achievement w/ some basic level of educational quality (e.g., “the average teacher”)
1. Subtract the expectation (A*) from actual student achievement (A); call this G
2. To get VA for a teacher, take the average G across all of the students she taught (see the sketch below)
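As a concrete illustration, here is a minimal sketch of this two-step calculation in Python. The student-level table, its column names, and all numbers are hypothetical, invented for the example:

```python
import pandas as pd

# Hypothetical student-level data: actual achievement (A), the
# counterfactual expectation (A*), and each student's teacher.
df = pd.DataFrame({
    "teacher": ["Smith", "Smith", "Jones", "Jones", "Jones"],
    "A":       [0.45, 0.30, -0.10, 0.05, 0.20],  # actual achievement
    "A_star":  [0.20, 0.25,  0.00, 0.10, 0.05],  # expected achievement
})

# Step 1: subtract the expectation from actual achievement; call it G.
df["G"] = df["A"] - df["A_star"]

# Step 2: a teacher's value-added is the average G across her students.
va_by_teacher = df.groupby("teacher")["G"].mean()
print(va_by_teacher)
```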
Setting Expectations
● How to set up the counterfactual expectation is the big question in value-added work
● Typically, we estimate expectations with data
– Example: set the expectation (A*) as the average achievement of students w/ the same prior test scores (see the sketch below)
● Quality of estimates is contingent on the quality of the data and the process that generates it
– Expectations set too low make teachers look good; expectations set too high make them look bad
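A minimal sketch of that example, assuming a hypothetical table where prior-year scores take a handful of discrete values; real value-added models condition on richer data, but the grouping logic is the same:

```python
import pandas as pd

# Hypothetical data: prior-year score level and current score per student.
df = pd.DataFrame({
    "prior_score":   [1, 1, 2, 2, 2, 3, 3],
    "current_score": [0.4, 0.6, 1.1, 0.9, 1.3, 2.0, 1.8],
})

# Set A* for each student to the average current score among all
# students with the same prior score.
df["A_star"] = df.groupby("prior_score")["current_score"].transform("mean")
print(df)
```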
Potential Statistical Problems
● #1: Systematic sorting of students
– Concern here is bias
– Unfair treatment of teachers that is systematic
• Example: the principal’s friends get “easier” kids
● #2: Instability of VA estimates
– Concern here is imprecision
– If estimates are very noisy, using them for rewards/consequences means lots of mistakes (see the simulation below)
– Also means it may be a poor motivational tool
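An illustrative simulation of the imprecision problem; the signal and noise magnitudes below are assumptions chosen for the example, not estimates from any study:

```python
import numpy as np

rng = np.random.default_rng(0)
n_teachers = 10_000
true_sd, noise_sd = 0.15, 0.15  # assumed signal and noise magnitudes

true_va  = rng.normal(0, true_sd, n_teachers)
estimate = true_va + rng.normal(0, noise_sd, n_teachers)  # one noisy year

# Among teachers flagged as bottom quartile by the noisy estimate, how
# many are not truly in the bottom quartile?
est_bottom  = estimate < np.quantile(estimate, 0.25)
true_bottom = true_va  < np.quantile(true_va, 0.25)
mistake_share = (est_bottom & ~true_bottom).mean() / est_bottom.mean()
print(f"Share of bottom-quartile flags that are mistakes: {mistake_share:.0%}")
```

With noise as large as the signal, a substantial share of flagged teachers (on the order of 40% in this setup) are mis-flagged.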
Basic Findings from VA Research
● Substantial variation in VA across teachers
– 1 s.d. in VA ≈ 0.1 to 0.2 s.d. in achievement
– A bit more variation in math than reading/ELA
– Much of the variation is within schools
● VA estimates appear to contain real power to predict teacher effectiveness as measured by student achievement
– Stability across years is enough to appear useful in teacher evaluation
– Bias is not a big deal overall, though it could matter for individual teachers
Results on Stability from KS, KRS

[Figure: students’ percentile rank, relative to peers, for teachers sorted by early value-added. Panels: (1) Group Teachers, Years 1/2; (2) Compare in Years 3/4; (3) Large Persistent Differences. Bottom-quartile and top-quartile teachers remain clearly separated.]
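The exercise in the figure above can be sketched with simulated data: give each teacher a persistent component plus estimation noise, form quartiles on the years 1/2 estimates, and check whether those groups still differ in years 3/4. All magnitudes here are assumptions for illustration, not the KS/KRS estimates:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n = 5_000
true_va = rng.normal(0, 0.15, n)            # assumed persistent component
va_12 = true_va + rng.normal(0, 0.10, n)    # estimate from years 1/2
va_34 = true_va + rng.normal(0, 0.10, n)    # estimate from years 3/4

df = pd.DataFrame({"va_12": va_12, "va_34": va_34})
df["quartile"] = pd.qcut(df["va_12"], 4, labels=["Q1", "Q2", "Q3", "Q4"])

# Persistent differences: quartiles formed on years 1/2 should still
# separate teachers' measured performance in years 3/4.
print(df.groupby("quartile", observed=True)["va_34"].mean())
```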
Why Get Excited About Value Added?
● Why not just hire good teachers?
– “Wise selection is the best means of improving the school system, and the greatest lack of economy exists wherever teachers have been poorly chosen.”
• Frank Pierrepont Graves, NYS Commissioner, 1932
● Because it is, unfortunately, easier said than done
– Decades of work on type of certification, graduate education, exam scores, GPA, college selectivity
– (Very) small, positive effects on student outcomes
● Rockoff et al. (2008): non-traditional predictors
– Personality, content knowledge, cognitive ability, self-efficacy, commercial teacher selection test score
– Result: no silver bullets, but moderate power to distinguish when pooling measures into an index (sketched below)
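A hedged sketch of pooling measures into an index: the equal-weight z-score average below is one common, simple choice, not necessarily the weighting used in the study, and the recruit-level measures and numbers are invented:

```python
import pandas as pd

def predictor_index(df: pd.DataFrame, cols: list[str]) -> pd.Series:
    """Standardize each measure, then average: a simple equal-weight index."""
    z = (df[cols] - df[cols].mean()) / df[cols].std()
    return z.mean(axis=1)

# Hypothetical recruit-level measures drawn from the slide's list.
recruits = pd.DataFrame({
    "content_knowledge": [55, 70, 62],
    "cognitive_ability": [0.2, 1.1, -0.3],
    "self_efficacy":     [3.8, 4.2, 3.5],
})
recruits["index"] = predictor_index(recruits, list(recruits.columns))
print(recruits)
```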
What You Get is What You See?
● Why not identify individuals likely to be effective teachers through direct observation of teaching?
● There is consistent evidence that subjective evaluations of existing teachers are strongly related to gains in student achievement
– Research extends back nearly a century
• Hill (‘21), Gotham (‘45), Brookover (’45), Anderson (’54)
– More recent analysis focuses on rubric-based teaching evaluations and principal opinions
• Schacter & Thum, Milanowski, Tyler et al., Rockoff & Speroni, Jacob & Lefgren, Harris & Sass, Rockoff et al.
Less Math, but No Less Difficult
● One nice aspect of subjective evaluation is that it does not rely on complicated formulae
● However, the details of how evaluation is done present issues similar to VA analysis
– Context (Does one size fit all?)
– Focus (What goes on the evaluation form?)
– Bias (Are evaluators fair and impartial?)
– Imprecision (Can a few lessons stand in for a whole year?)
A (Modest?) Proposal
● Provide VA estimates to principals
– Help them with the problem of estimating A*
– Let them combine VA with other information (e.g., observation) to evaluate teachers
● NYC has done this: “Teacher Data Reports”
– Piloted using a randomized controlled trial
• “Treatment principals” received reports and training
– Rockoff et al. (2009) study this pilot using baseline and follow-up surveys of principals
Principals’ Evaluations and VA
[Figure: Teacher Impacts on Math Performance. Each teacher’s value-added score for math (x-axis, roughly -0.4 to 0.4) is plotted against the principal’s baseline overall evaluation rating (Very Poor, Poor, Fair, Good, Very Good, Exceptional).]

● Substantial variation in baseline evaluations
● Strong relationship between VA estimates and the principal’s rating

Note: The value-added score is “All Schools, All Teachers, Same Grade,” measured in student-level standard deviation units. For the math test the student-level standard deviation is approximately 0.70 in 4th grade, 0.77 in 5th, 0.81 in 6th, 0.79 in 7th, and 0.74 in 8th.
New and Useful Information?
● Were treatment principals’ evaluations affected by the VA reports? (the specification behind these tables is sketched below)
– Are the effects greater for more precise VA?

Post-experiment evaluations regressed on value-added and pre-experiment ratings. Standard errors in parentheses; p-values of the Treatment vs. Control difference in brackets.

All teachers:

                                     Treatment        Control          Difference
Math
 Value-added Score, Multi-year,
   Peer Comparison                   0.185** (0.036)  0.034 (0.035)    0.150** [0.002]
 Principal's Pre-experiment Rating   0.631** (0.046)  0.718** (0.042)  -0.087 [0.157]
 Experience Controls                 Y                Y
 R-squared                           0.468            0.473
 Sample Size                         615              631

English Language Arts (ELA)
 Value-added Score, Multi-year,
   Peer Comparison                   0.025 (0.038)    0.003 (0.036)    0.022 [0.672]
 Principal's Pre-experiment Rating   0.671** (0.046)  0.696** (0.057)  -0.025 [0.690]
 Experience Controls                 Y                Y
 R-squared                           0.439            0.419
 Sample Size                         583              607

By confidence interval of the teacher's value-added estimate:

More Precise than Median Teacher:

                                     Treatment        Control          Difference
Math
 Value-added Score                   0.360** (0.111)  -0.075 (0.083)   0.436** [0.000]
 Principal's Pre-experiment Rating   0.552** (0.079)  0.724** (0.077)  -0.172+ [0.069]
 R-squared                           0.734            0.713
 Sample Size                         340              336

English Language Arts (ELA)
 Value-added Score                   0.285* (0.125)   0.049 (0.091)    0.236+ [0.070]
 Principal's Pre-experiment Rating   0.539** (0.091)  0.747** (0.082)  -0.208* [0.045]
 R-squared                           0.645            0.720
 Sample Size                         313              322

Less Precise than Median Teacher:

                                     Treatment        Control          Difference
Math
 Value-added Score                   0.208** (0.060)  0.067 (0.042)    0.140* [0.022]
 Principal's Pre-experiment Rating   0.555** (0.094)  0.705** (0.081)  -0.150 [0.146]
 R-squared                           0.684            0.635
 Sample Size                         275              295

English Language Arts (ELA)
 Value-added Score                   0.021 (0.065)    0.062 (0.044)    -0.041 [0.572]
 Principal's Pre-experiment Rating   0.590** (0.085)  0.632** (0.101)  -0.042 [0.702]
 R-squared                           0.662            0.635
 Sample Size                         270              285
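For the mechanics behind these tables, here is a hedged sketch of the kind of specification involved: the post-experiment rating regressed on the value-added score and the pre-experiment rating, fit separately by experimental arm, with the Difference column comparing the two VA coefficients. The data and variable names below are synthetic stand-ins, not the study’s data:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic teacher-level data; magnitudes are illustrative only.
rng = np.random.default_rng(2)
n = 600
teachers = pd.DataFrame({
    "treated":    rng.integers(0, 2, n),
    "va_score":   rng.normal(0, 1, n),
    "pre_rating": rng.normal(0, 1, n),
})
# Built so treated principals put extra weight on VA after the reports.
teachers["post_rating"] = (
    0.7 * teachers["pre_rating"]
    + 0.15 * teachers["treated"] * teachers["va_score"]
    + rng.normal(0, 0.5, n)
)

# Fit the same specification separately in each arm, as in the tables.
fits = {arm: smf.ols("post_rating ~ va_score + pre_rating", data=grp).fit()
        for arm, grp in teachers.groupby("treated")}
diff = fits[1].params["va_score"] - fits[0].params["va_score"]
print(f"Treatment minus control coefficient on VA: {diff:.3f}")
```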
In Conclusion
● Identifying highly effective teachers is near impossible if all you have to go on is a CV
● Value-added and in-class observation offer potential insight into this problem
– Both, of course, are imperfect
● Innovative evaluation policies that begin to harness this information can raise teacher quality and improve student outcomes