Brophy and Colwell NAfME Presentation


ASSESSMENT SRIG
BIENNIAL MEETING
MARCH 30, 2012
NAfME NATIONAL CONFERENCE
3:45PM-5:45PM
GRAND B
TIMOTHY S. BROPHY, CHAIR
KELLY PARKES, INCOMING CHAIR
Teacher Evaluation: Issues of Validity and Reliability
TODAY’S PROGRAM
• 3:45pm. Greeting and Welcome; Election results. Timothy S. Brophy, Chair
• 3:55pm. Program begins: Teacher Evaluations – Issues of Validity and Reliability
  • Timothy S. Brophy and Richard Colwell. Teacher Evaluation: Issues of Validity and Reliability.
• 4:20pm. Dru Davison, Memphis City Schools. The Tennessee Fine Arts Pilot: A Multiple Measures Portfolio System (Perform, Create, Respond, Connect) with Blind Peer Review. Electronic presentation.
• 4:40pm. Keitha Lucas Hamann, University of Minnesota-Twin Cities, and Doug Orzolek, University of St. Thomas. Teacher Performance Assessment in Minnesota: Challenges for Music Educators.
• 5:05pm. Breakout groups – Strategies for Measuring Student Growth in Music
• 5:30pm. Leaders report
• 5:40pm. Announcements of upcoming events. Closing remarks by Kelly Parkes, Incoming Chair
TEACHER EVALUATIONS: ISSUES OF VALIDITY AND RELIABILITY
TIMOTHY S. BROPHY, UNIVERSITY OF FLORIDA
RICHARD COLWELL, PROFESSOR EMERITUS, UNIVERSITY OF ILLINOIS
NAfME CONFERENCE ASSESSMENT SRIG MEETING
MARCH 30, 2012
SESSION OVERVIEW
• The Context for the Reform of Teacher Evaluation
• The Problem: Determining Music Teacher Effectiveness
• Validity and Reliability Issues
• Challenges to the SRIG
THE POLITICAL CONTEXT:
THE AMERICAN RECOVERY AND REINVESTMENT ACT (2009)
Achieving Equity in Teacher Distribution
The State will take actions to improve teacher effectiveness and comply with section 1111(b)(8)(C) of the ESEA (20 U.S.C. 6311(b)(8)(C)) in order to address inequities in the distribution of highly qualified teachers between high- and low-poverty schools, and to ensure that low-income and minority children are not taught at higher rates than other children by inexperienced, unqualified, or out-of-field teachers. (H.R. 1, p. 169)
THE POLITICAL CONTEXT:
RACE TO THE TOP PHASE 2 CFDA NUMBER: 84.395A (2010)
RTTT Phase 2 defines teacher evaluation:
States, LEAs, or schools must include multiple measures, provided that teacher effectiveness is evaluated, in significant part, by student growth (as defined in this notice). Supplemental measures may include, for example, multiple observation-based assessments of teacher performance. (p. 19499)
THE POLITICAL CONTEXT:
RACE TO THE TOP PHASE 2
Student achievement means:
(b) For non-tested grades and subjects: alternative measures of student learning and performance such as student scores on pre-tests and end-of-course tests; student performance on English language proficiency assessments; and other measures of student achievement that are rigorous and comparable across classrooms. (p. 19500)
Student growth means the change in student achievement for an individual student between two or more points in time. A State may also include other measures that are rigorous and comparable across classrooms. (p. 19500)
Source: Federal Register/Vol. 75, No. 71/Wednesday, April 14, 2010/Notices
THE NEW “EVALUATION EQUATION”
Teacher evaluation and “effectiveness” determination = 35-50% student achievement + 50-65% observations or other methods
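The weighting above can be read as a simple composite formula. Below is a minimal sketch in Python, assuming a hypothetical 40%/60% split (drawn from the ranges on this slide) and hypothetical component scores on a 0-100 scale; actual state systems define their own weights and scales.

```python
# Minimal sketch of a weighted "evaluation equation" composite.
# The 40/60 weights and the component scores are hypothetical;
# actual weights vary by state within the ranges shown on this slide.

def composite_evaluation(student_growth_score: float,
                         observation_score: float,
                         growth_weight: float = 0.40) -> float:
    """Combine a student-growth score and an observation score
    (both on a 0-100 scale) into a single evaluation score."""
    observation_weight = 1.0 - growth_weight   # e.g., 0.60
    return (growth_weight * student_growth_score
            + observation_weight * observation_score)

# Example: growth score 72, observation score 85, 40% growth weighting.
print(round(composite_evaluation(72, 85), 1))   # 79.8
```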
MUSIC TEACHER EFFECTIVENESS
RTTT defines effective teachers in very specific terms. We need to be able to determine what it means for music teachers to be:
“Effective” – a teacher whose students achieve at “acceptable rates,” i.e., at least one grade level in an academic year
“Highly effective” – a teacher whose students achieve at “high rates,” for example, 1.5 grade levels in an academic year
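If student growth in music could be expressed in grade-level equivalents (the open question that follows), the RTTT labels reduce to simple thresholds. A minimal sketch, with the function name and the example growth values invented for illustration:

```python
# Minimal sketch of the RTTT effectiveness labels applied to
# grade-level-equivalent growth. Whether music growth can be expressed
# in "grade levels per year" is exactly the open question this session
# raises; the thresholds follow the RTTT definitions on this slide.

def effectiveness_label(grade_levels_per_year: float) -> str:
    if grade_levels_per_year >= 1.5:
        return "highly effective"
    if grade_levels_per_year >= 1.0:
        return "effective"
    return "below expected growth"

print(effectiveness_label(1.2))   # effective
print(effectiveness_label(1.6))   # highly effective
```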
A BIG QUESTION:
What is a “year’s growth” in music education?
How do we find out?
THE “ELEPHANT IN THE LIVING ROOM”: GROWTH IN MUSIC
What do we need to measure “one grade level” of growth in music?
• A rigorous, standards-based grade-level music curriculum covering all standards
• Clear, consistent grade-level expectations
• Valid, reliable assessments
• Comparability across schools, districts, and states
PART 1 OF THE EQUATION: VALID AND RELIABLE
ASSESSMENTS OF STUDENT MUSIC LEARNING
Student music learning = student achievement in RTTT
Assessment must be done well or not at all
NAEP is one reference for validity and reliability
NAfME continues to advocate for the arts as a core subject. Question: if music is a core subject, how do we define it? What is assessed?
The 2008 NAEP analysis omitted validity, reliability, item analysis, regressions, factor analysis, and other test characteristics
The NAEP analysis was concerned with demographic and SES-related characteristics – race, gender, free and reduced lunch, community and school type, etc.
PART 2 OF THE EQUATION: “OTHER MEASURES”
STRENGTHS AND CAUTIONS
• Classroom Observation
• Principal Evaluation
• Instructional Artifact
• Portfolio
• Teacher Self-Report
• Student Survey
• Value-Added Model
Source: Goe, Holdheide, & Miller (2011). A practical guide to designing comprehensive teacher evaluation systems. Washington, DC: National Comprehensive Center for Teacher Quality.
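Of the measures listed above, the value-added model is the most statistical. The sketch below illustrates the basic idea only (regress current scores on prior scores, then average each teacher's residuals); operational value-added models add many more covariates and adjustments, and all scores and teacher labels here are invented.

```python
# Deliberately simplified value-added sketch: regress post-test scores
# on pre-test scores, then average each teacher's residuals. Real
# value-added models use many more covariates and shrinkage estimators;
# the data here are invented for illustration only.
import numpy as np

pre  = np.array([48, 55, 62, 70, 51, 66, 58, 73])   # prior-year scores
post = np.array([55, 60, 70, 74, 54, 75, 61, 80])   # current-year scores
teacher = np.array(["A", "A", "A", "A", "B", "B", "B", "B"])

# Predict post-test from pre-test with a simple linear fit.
slope, intercept = np.polyfit(pre, post, 1)
residuals = post - (slope * pre + intercept)

# A teacher's "value-added" estimate = mean residual of their students.
for t in ("A", "B"):
    print(t, round(residuals[teacher == t].mean(), 2))
```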
GENERAL VALIDITY ISSUES IN USING STUDENT LEARNING MEASURES IN TEACHER EVALUATIONS
To what extent do changes in a student’s performance reflect actual changes in his or her understanding of the underlying content?
When student test scores are used to estimate teaching effectiveness, to what extent do those estimates accurately represent the teacher’s contribution to student learning?
What evidence do we have regarding various threats to the validity of inferences for a particular use of a measure?
How do we attribute student performance to individual teachers when the assessments are intended to cover material from multiple courses?
Source: Steele, Hamilton, & Stecher (2010). Incorporating student performance measures into teacher evaluation systems. Santa Monica, CA: RAND Corporation.
VALIDITY ISSUES FOR MUSIC TEACHER EVALUATION
Student achievement data used for music teacher evaluation MUST come from music assessments, not from an arbitrary attribution of the music teacher’s effect on scores in the “usual tested subjects” of math, reading, science, and writing
Student music achievement MUST be measured using valid, reliable instruments
“Other measures” MUST be valid for music teachers and account for the variables unique to music education
Observations and evaluative tools MUST be implemented by trained personnel who are content experts in music education
GENERAL RELIABILITY ISSUES
Common approach: internal consistency reliability, which expresses the extent to which items on the test measure the same underlying construct
Measures of internal consistency do not take into account interrater reliability in the scoring of any open-response items that tests may include, and they also do not measure the reliability of the value-added estimates themselves.
Interrater reliability is an important consideration for items that are scored by human raters, because one wants to minimize the extent to which an individual’s score on the assessment depends on the idiosyncrasies of the rater who happens to score it.
Reliability of value-added estimates is an important consideration because, due to random classroom- and student-level error, value-added estimates are known to be unstable from year to year.
Source: Steele, Hamilton, & Stecher (2010). Incorporating student performance measures into teacher evaluation systems. Santa Monica, CA: RAND Corporation.
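The “common approach” named above, internal consistency, is typically reported as Cronbach’s alpha, which can be computed directly from its definition. A minimal sketch with an invented item-score matrix:

```python
# Minimal sketch of internal consistency (Cronbach's alpha) computed
# from its textbook definition; the item-score matrix is invented.
import numpy as np

# rows = students, columns = items (e.g., 0-4 rubric scores)
scores = np.array([
    [3, 4, 3, 2],
    [2, 2, 3, 2],
    [4, 4, 4, 3],
    [1, 2, 1, 1],
    [3, 3, 4, 3],
])

k = scores.shape[1]
item_vars = scores.var(axis=0, ddof=1)          # variance of each item
total_var = scores.sum(axis=1).var(ddof=1)      # variance of total scores
alpha = (k / (k - 1)) * (1 - item_vars.sum() / total_var)
print(round(alpha, 3))
```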
RELIABILITY NEEDS FOR MUSIC TEACHER EVALUATION:
STUDENT MUSIC ACHIEVEMENT
• Clearly defining “open-ended” responses in music – prepared performance, on-demand performance, composition, improvisation, arrangement, etc.
• Norming/calibration of the rubrics used for open-ended responses
• Expert rubric development and training of scorers
• Thorough item analysis for all item types
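One concrete calibration check implied by the norming bullet above is interrater agreement. A minimal sketch computing exact agreement and Cohen’s kappa for two hypothetical raters scoring the same performances on a 0-4 rubric (all ratings invented):

```python
# Minimal sketch of an interrater calibration check: exact agreement
# and Cohen's kappa for two raters scoring the same performances on a
# 0-4 rubric. The ratings are invented for illustration.
import numpy as np

rater1 = np.array([4, 3, 3, 2, 4, 1, 3, 2, 4, 3])
rater2 = np.array([4, 3, 2, 2, 4, 1, 3, 3, 4, 3])

exact_agreement = np.mean(rater1 == rater2)

# Cohen's kappa: observed agreement corrected for the chance agreement
# implied by each rater's score distribution.
categories = np.unique(np.concatenate([rater1, rater2]))
p1 = np.array([np.mean(rater1 == c) for c in categories])
p2 = np.array([np.mean(rater2 == c) for c in categories])
p_expected = np.sum(p1 * p2)
kappa = (exact_agreement - p_expected) / (1 - p_expected)

print(round(exact_agreement, 2), round(kappa, 2))
```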
DEVELOPING ASSESSMENT RELIABILITY AND VALIDITY: ITEM ANALYSIS
Readily available analysis techniques allow us to obtain sophisticated item analysis data for music items
Item Response Theory (IRT) models should become the standard analysis approach
Three-parameter models for dichotomous items measure difficulty and discrimination while controlling for guessing
Polytomous generalized rating scale models extend IRT to the analysis of rubric-based assessments (e.g., Samejima’s graded response model)
User-friendly software programs such as XCalibre4™ make these complex calculations accessible
Frank Baker’s classic book, The Basics of Item Response Theory, is now available as a free ERIC document
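The two model families named above have compact response functions, sketched below with invented item parameters; in practice, calibration software estimates the parameters from student response data. The graded response model treats each rubric score as an ordered category, which matches how performance rubrics are typically scored.

```python
# Minimal sketch of the two IRT models named on this slide, with
# invented item parameters. In practice the parameters (a, b, c) are
# estimated from student response data by calibration software.
import numpy as np

def p_correct_3pl(theta, a, b, c):
    """3-parameter logistic model for a dichotomous item:
    discrimination a, difficulty b, pseudo-guessing c."""
    return c + (1 - c) / (1 + np.exp(-a * (theta - b)))

def grm_category_probs(theta, a, thresholds):
    """Samejima's graded response model for a rubric-scored item:
    returns the probability of each score category 0..len(thresholds)."""
    # Probability of scoring at or above each (increasing) threshold.
    p_at_or_above = 1 / (1 + np.exp(-a * (theta - np.asarray(thresholds))))
    bounds = np.concatenate(([1.0], p_at_or_above, [0.0]))
    return bounds[:-1] - bounds[1:]           # category probabilities

# Example: a student of average ability (theta = 0).
print(p_correct_3pl(theta=0.0, a=1.2, b=0.5, c=0.2))
print(grm_category_probs(theta=0.0, a=1.5, thresholds=[-1.0, 0.0, 1.0]))
```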
TEACHER EFFECTIVENESS
A CALL FOR ACTION IN MUSIC EDUCATION
Prince et al. (2009), The Other 69 Percent: “Identifying highly effective teachers of subjects that are not tested with standardized achievement tests — such as teachers of art, music, physical education, vocational education, and foreign languages — requires a different approach.” (p. 5)
“It is easy to believe that we can assess whether students read well or solve math problems well or understand social studies or science, but it is much more difficult to imagine how to assess whether students properly understand a subject such as art. Until we can agree on what constitutes effective teacher performance, it will be difficult to measure it and reward it.” (p. 6)
MUSIC TEACHER EFFECTIVENESS
QUESTIONS FOR OUR PROFESSION
What is an effective music teacher?
What is a highly effective music teacher?
How do we measure music teacher effectiveness?
How do we evaluate music teacher effectiveness?
CHALLENGE TO THE SRIG:
EVALUATION RESEARCH NEEDS
FIRST AND FOREMOST: We must lead the profession to develop technically sound, valid, reliable assessments of student music learning in every state, thoroughly analyzed for validity, reliability, DIF, and item characteristics
A process or model of assessment development for states and districts
In cooperation with SMTE, collect and evaluate the validity and reliability of music teacher evaluation systems in NAfME states
Design and implement studies to develop empirically supported criteria for music teacher evaluation, use these criteria to develop music teacher evaluation models, and assess their validity and reliability
THE “EVALUATION DILEMMA”
“Solutions to the evaluation dilemma are as complex as the issue itself. The evaluation of music teachers remains an area in need of relevant research, and the development of an appropriate evaluation and observation instrument must be urgently addressed. It is now the responsibility of the united music teaching profession, in tandem with active music education researchers, to address this challenge.”
Source: Brophy (1993). Evaluation of music educators: Toward defining an appropriate instrument.
THANK YOU