Standards-based Teacher
Evaluation: What Have We
Learned?
Steven Kimball, Tony Milanowski &
Herb Heneman
Consortium for Policy Research in Education
Wisconsin Center for Education Research
University of Wisconsin-Madison
Why Study Teacher Evaluation?

Traditional teacher evaluation weak on
accountability and instructional improvement:
– Narrow conceptions of teaching
– Undifferentiated criteria, i.e.,
satisfactory/unsatisfactory
– Limited evidence, e.g., single class observation
– One-way communication in evaluation
– Variability/inconsistency
Why Study Teacher Evaluation?

“To this day, almost all educational personnel
decisions are based on judgments which,
according to the research, are only slightly more
accurate than they would be if they were based
on pure chance.”
(Medley & Coker, 1987, p.243)

Classroom-based teacher evaluation increasingly a
part of new compensation reforms (e.g., Denver
ProComp; Teacher Incentive Fund program)
Performance Evaluation A Key Part of
Knowledge & Skill-based Pay




Base pay increase or bonus (typically $300 - $3,000) for competency
demonstration
- skill blocks – technology, student assessment, curriculum unit
design, etc.
- portfolio completion
- dual certification
- graduate degree in subject taught
Base pay increase or bonus for NBPTS certification ($1,000 - $15,000)
Base pay increase or bonus for classroom performance mastery
(typically $1,000 - $3,000), as measured by standards-based
teacher evaluation
May involve changes to single salary schedule
- fewer steps
- fewer or redefined lanes
- performance-linked career ladder progression
Why Standards-based Teacher
Evaluation?

Standards for teaching and related performance-based assessments exist for NBPTS, PRAXIS III,
Connecticut BEST program, Danielson Framework

Is it better than typical evaluation methods?

Can it serve as a foundation for knowledge- and
skills-based pay?
What is Standards-based Teacher
Evaluation?

Developed from effective teaching literature and
newer conceptions of teaching and learning (e.g.,
pedagogical content knowledge)

Comprehensive

Clear expectations with performance differentiation

Typically reflection and growth oriented

Multiple sources of evidence
Framework for Teaching (Danielson, 1996)
4 Domains, 22 components, 66 elements
Domains: Planning and Preparation, Classroom
Environment, Instruction, Professional
Responsibilities
Components from Instruction Domain:
– Communicating clearly and accurately
– Using question & discussion techniques
– Engaging students in learning
– Providing feedback to students
– Flexibility & responsiveness
Example from Instruction Domain,
Communicating Clearly and Accurately
Component
Element: Directions and Procedures
- Unsatisfactory: Teacher directions and procedures are confusing to students.
- Basic: Teacher directions and procedures are clarified after initial student confusion or are excessively detailed.
- Proficient: Teacher directions and procedures are clear to students and contain an appropriate level of detail.
- Distinguished: Teacher directions and procedures are clear to students and anticipate possible student misunderstanding.
Source: Danielson (1996), p. 91.
Let’s Try It!
 Using rubric to evaluate practice
CPRE Research Standards-based Teacher
Evaluation
Primary Sites:
- Cincinnati Public Schools
- Vaughn Next Century Learning Center (LA charter school)
- Washoe County (NV) School District
Secondary Sites: Anoka-Hennepin & La Crescent, MN; Coventry,
RI; Newport News, VA
Each site adopted standards-based teacher performance
evaluation systems based largely or in part on Framework for
Teaching (Danielson, 1996)
Vaughn’s Standards
Instructional Planning & Classroom Management
comparable to Framework
3-10 Content-specific Dimensions (Departs from
Framework)
– Literacy
– Language development
– Mathematics
– Special Education, Science, Social Studies, Arts,
Technology, Physical Education, Teaming
Vaughn’s Standards, cont.
Elements of Literacy:
– Teaches phonemic awareness & systematic explicit phonics
– Develops spelling skills
– Strengthens grade-level vocabulary development
– Guides students through the reading process
– Focus on comprehension skills, aiming at higher-order thinking skills
– Guides students through the writing process
– Integrates literature
– Uses appropriate materials & teaching strategies
– Implements appropriate student activities
Evaluation Procedures - Cincinnati
Experienced Teachers
– Peer ‘Teacher Evaluator’ makes 4 observations, administrator
makes 2; Teacher Evaluator rates management &
instruction using 4-level rubric
– Administrator uses portfolio to rate planning &
professionalism using 4-level rubric
First Year Teachers, Struggling Teachers
– Peer ‘Consulting Teacher’ makes 6 observations to
rate on management & instruction, uses portfolio to
rate on planning & professionalism
Evaluation Procedures
Vaughn
– Teacher, peer, and administrator rate twice per year based
on varied # of observations during a fixed period, using 4
level rubric
Washoe
– Administrator rates using 4 level rubric closely modeled on
Framework. Evidence drawn from 1-6 formal classroom
observations, ‘walk-thru’s’, artifacts and discussions with
teachers
‘Theory of Action’ Linking SBTE to
Student Learning
KSBP System (Model of Quality Teaching, Assessment Process, Feedback, PD Opportunities, Incentives)
-> Shared Conception of Good Teaching; Teacher Skill Development; Attraction & Retention
-> Teaching According to the Model
-> Student Learning (also influenced by Other Causes)
Key Links We Focused On
Is teaching according to the model associated with
higher levels of student learning?
What are the impacts of assessing teacher skills on
the postulated mediating factors of teacher quality?
- Shared conception of teaching
- Skill development
- Attraction/retention
Statistical Methods
Multilevel random intercept models with controls for prior
year achievement and various student characteristics
(e.g., free/reduced lunch) at Level 1

Two versions at Level 2:
1. No predictors of classroom intercepts -> empirical Bayes (EB) intercept
residuals obtained and correlated with evaluation scores for
evaluated teachers
2. Evaluation score as predictor of classroom intercepts ->
calculate effect of change in evaluation score on student test
scores (in S.D. units)
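
One way to write down the two model versions is sketched below. The notation is an assumption made for illustration (the slides give no equations): y_ij is the test score of student i in classroom j, x_ij collects the student characteristics, TE_j is the teacher's evaluation score, and the normal error terms are the usual textbook convention rather than something the slides state.

```latex
% Level 1: student i in classroom j (illustrative notation, not from the slides)
y_{ij} = \beta_{0j} + \beta_{1}\, y^{\text{prior}}_{ij}
       + \boldsymbol{\beta}_{2}^{\top}\mathbf{x}_{ij} + e_{ij},
\qquad e_{ij} \sim N(0, \sigma^{2})

% Level 2, version 1: no predictors; the EB estimate of u_{0j} is the
% classroom residual that is then correlated with the evaluation score
\beta_{0j} = \gamma_{00} + u_{0j}, \qquad u_{0j} \sim N(0, \tau^{2})

% Level 2, version 2: evaluation score TE_j predicts the classroom intercept;
% \gamma_{01} is the effect of a one-level change in TE_j on test scores
\beta_{0j} = \gamma_{00} + \gamma_{01}\,\mathrm{TE}_{j} + u_{0j}
```

With test scores standardized, the coefficient on TE_j in version 2 reads directly in S.D. units.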
Evaluation Score – Student
Achievement Correlations*

Site (grades)          Year    Reading   Math   Other
Cincinnati (Gr 3-8)    01-02   .32       .43    .27 (Science)
                       02-03   .28       .34    -.02 (Science)
                       03-04   .29       .22    .29 (Science)
Vaughn (Gr 2-5)        00-01   .50       .21    .18 (Lang. Arts)
                       01-02   .61       .45    .38 (Lang. Arts)
                       02-03   .10       .13    .33 (Lang. Arts)
Washoe (Gr 3-5)        01-02   .21       .19    -
Washoe (Gr 4-6)        02-03   .25       .24    -
Washoe (Gr 3-6)        03-04   .19       .21    -

* Combined across grades
Red italic: C.I. Includes 0
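
The "C.I. includes 0" flag can be illustrated with a small sketch. The slides do not say how the intervals were computed or how many teachers were in each sample, so the Fisher z-transform used here and the sample size `n = 50` in the usage lines are assumptions chosen only for illustration.

```python
import math


def fisher_ci(r, n, z_crit=1.96):
    """Approximate 95% confidence interval for a Pearson correlation r
    from n paired observations, using Fisher's z-transform
    (an assumed method; the slides do not name one)."""
    if n <= 3:
        raise ValueError("Fisher's z interval needs n > 3")
    z = math.atanh(r)                 # transform r to the z scale
    se = 1.0 / math.sqrt(n - 3)       # standard error on the z scale
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


def ci_includes_zero(r, n):
    """True when the approximate interval for r spans zero."""
    low, high = fisher_ci(r, n)
    return low <= 0.0 <= high


# Hypothetical sample size: n = 50 teachers is NOT from the slides.
strong = ci_includes_zero(0.32, 50)   # a correlation of the size reported for reading
weak = ci_includes_zero(-0.02, 50)    # a correlation near zero
```

Under this assumed sample size, a correlation of .32 has an interval that excludes zero, while one of -.02 does not. With smaller samples the intervals widen, so the same correlation can lose its flag.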
Effect of a One-Level Change in Teacher Evaluation (TE)
Score on Student Test Scores (S.D. units)*

Site          Year    Reading   Math   Other
Cincinnati    01-02   .26       .23    .09 (Science)
              02-03   .14       .18    -.01 (Science)
              03-04   .14       .13    .13 (Science)
Vaughn        00-01
              01-02   .24       .36    .27 (Lang. Arts)
              02-03   .03       .12    .20 (Lang. Arts)
Washoe        01-02   .13       .14    -
              02-03   .14       .19    -
              03-04   .10       .16    -

* Combined across grades
Reliability/Inter-rater Agreement

Cincinnati:
- 60% to 80% absolute agreement at domain level; higher for
classroom mgmt, lower in new teacher samples
- Estimated reliability of 6 classroom observations over the
year: .73-.89 at domain level; lower for new teachers
Vaughn:
- self, peer, administrator agreement .74-.86 (Coef. Alpha
for 5 common domains)
- “Cross-semester” domain score correlations: .74-.93
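
The two agreement statistics named above can be sketched as follows. The rating data in the usage lines are invented for illustration, and treating each rater (self, peer, administrator) as an "item" when computing coefficient alpha is an assumption about how the Vaughn figure was obtained.

```python
from statistics import variance


def absolute_agreement(rater_a, rater_b):
    """Share of teachers given exactly the same rubric level by two raters."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("rating lists must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)


def coefficient_alpha(ratings):
    """Cronbach's coefficient alpha.

    ratings: one list per rater, each holding that rater's scores for the
    same teachers in the same order (raters play the role of 'items').
    """
    k = len(ratings)
    if k < 2:
        raise ValueError("need at least two raters")
    n = len(ratings[0])
    if n < 2 or any(len(r) != n for r in ratings):
        raise ValueError("each rater must score the same two or more teachers")
    rater_variances = [variance(r) for r in ratings]
    totals = [sum(r[i] for r in ratings) for i in range(n)]
    return k / (k - 1) * (1 - sum(rater_variances) / variance(totals))


# Hypothetical 4-level rubric scores for five teachers (not study data).
evaluator = [3, 2, 4, 3, 1]
administrator = [3, 2, 3, 3, 1]
agreement = absolute_agreement(evaluator, administrator)  # 4 of 5 match
alpha = coefficient_alpha([evaluator, administrator])
```

Absolute agreement counts only exact matches, so it is the stricter of the two. Alpha rewards raters who rank teachers consistently even when their levels differ by a point.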
Decision-Making Studies

Evaluator Interviews & Review of Evaluation Reports
(Cincinnati, Washoe)
- Cincinnati: well-defined rating process & close
adherence to it; outside evaluators reduce leniency
- Vaughn: less structure, more ‘gut’; strong culture
and agreement on instruction counteract leniency
- Washoe: more variation in process; more rigor for
new teachers, leniency for tenured teachers
Is Experience Better at Predicting
Student Achievement?

Correlation of student achievement with experience
lower than with evaluation score and varies by site:
- Cincinnati: -.20 to .15
- Vaughn: .00 to .32
- Washoe: .07 to .16

Student achievement tends to rise with experience over
first 3-5 years, then levels off
Relationship of Experience to Student
Achievement, Washoe County 03-04
[Figure: standardized EB intercept residual for math (y-axis) plotted against step (x-axis)]
Impact – Shared Conception



Cincinnati – helped make district expectations clear,
especially to new teachers; reinforced emphasis on
student standards.
Vaughn – reinforcement & reflection of Vaughn
culture (“little school that could”)
Washoe – limited impact, competition with other
initiatives and school priorities
Impact – Attraction & Retention

Cincinnati:
- Initial implementation appears to have encouraged senior teachers
with concerns about skill evaluation to retire
- Rigorous process may encourage turnover of new teachers

Vaughn
- Strong communication of Vaughn expectations discourages low
efficacy teachers from joining and lower performers from staying

Washoe
- No impact on recruitment
- Marginal improvement in weeding out marginal teachers
Impact – Skill Improvement

Commonalities:
- Impact wide but not deep
- Promotes reflection
- Classroom management, planning
- Attention to student academic standards
- ‘Tips’, materials, solutions to specific problems

While evaluation process affects teacher practice, to maximize
impact, systematic feedback, coaching, & aligned professional
development are needed
Impact – Skill Improvement

Cincinnati:
- Initially encouraged participation in study groups
- Minimal impact on experienced teachers

Vaughn
- Influences development of new & junior teachers

Washoe
- Varies widely depending on principal & teacher
- Developmental potential limited by leniency
Other Findings about KSBP &
Evaluation

Classroom-level value-added estimates are unstable from
year to year

At Vaughn, by the end of year 3 there was minimal
between-classroom variation in student test scores

Teachers accepted the teaching standards used to
evaluate performance, but had mixed reactions on the
fairness and validity of evaluation ratings
Other Findings, cont.
Administrators accepted the teaching standards, reported
increased workload in implementing new system, & had
difficulties providing feedback
Implementation glitches were frustrating to teachers and
administrators in systems with high stakes
Lack of alignment of human resource systems (recruitment,
selection, induction, mentoring, professional development,
compensation, performance management, instructional
leadership) to the teaching standards
Guidelines for Design and
Implementation

Specify that performance improvement is a strategic
imperative

Develop teaching standards and scoring rubrics (i.e.,
competency model)

Prepare for added teacher and administrator workload

Thorough communication
Guidelines, continued

Video-taping classroom practice and/or use of
multiple evaluators

Train and re-train evaluators

Support teachers through feedback and professional
development

Align human resource management systems
Strategic HR Alignment
Student Achievement Goals
Performance Improvement Strategy
(Programs, Plans)
Performance Competencies
(What Teachers & Administrators Need to Know & Be
Able to Do)
Human Resource Programs
Recruitment - Selection - Induction - Mentoring
Prof. Development - Compensation - Performance Management Leaders
Guidelines, continued

Pilot system and monitor implementation

Examine validity and inter-rater agreement
Examples of CPRE Work on Standards-based Teacher Evaluation (SBTE)
1997: Odden & Kelley, Paying Teachers for What They Know
and Can Do (2nd ed. 2002, Corwin Press)
2005: Milanowski, Kimball, & Odden. Teacher
Accountability Measures and Links to Learning. In Rubenstein,
R., Schwartz, A.E., Stiefel, L., and Zabel, J. (Eds.), Measuring
School Performance & Efficiency: Implications for Practice
and Research, 2005 Yearbook of the American Education
Finance Association. Larchmont, NY: Eye on Education.
2006: Heneman, Milanowski, Kimball & Odden, Standards-based
teacher evaluation as a foundation for knowledge- and
skill-based pay, available at:
http://www.cpre.org/Publications/RB45.pdf
For more information: www.wcer.wisc.edu/cpre