Transcript Slide 1
Measuring and Modeling Growth in a High Stakes Environment
John Cronin, Ph.D., Director, The Kingsbury Center @ NWEA

Contacting us
Rebecca Moore: 503-548-5129
E-mail: [email protected]
This PowerPoint presentation and recommended resources are available at our website: www.kingsburycenter.org
This presentation is the top presentation on this page: http://kingsburycenter.org/our-research/research-reports-publications

How tests are used to evaluate teachers
Testing; Metric (Growth Score); Analysis (Value Added); Evaluation (Rating)

Three principles
• Ineffective should mean egregiously incompetent.
• You must support teachers with an ineffective rating through improvement or dismissal.
• You shouldn't dismiss a teacher who can't be replaced with someone more effective.

What question is being answered? – Performance Management
Is the progress produced by this teacher dramatically greater or less than that of teaching peers who deliver instruction to comparable students?

Our nation has moved from a model of education reform that focused on fixing schools to a model that is focused on fixing the teaching profession.

Difficulty of New York Proficient Level
[Chart. Series: Math, Reading. Horizontal axis: Grade 2 through Grade 8. Vertical axis: 0 to 100.]

Difficulty of ACT college readiness standards
[Chart. Series: Math, Reading, English. Horizontal axis: Grade 8 EXPLORE, Grade 10 PLAN, Grade 11 ACT. Vertical axis: NWEA Percentile – 2011 Norms, 0 to 100.]

Moving from Proficiency to Growth
All students count when accountability is measured through growth.
One district's change in 5th grade math performance relative to Kentucky cut scores
[Chart: Mathematics. Horizontal axis: Fall RIT. Vertical axis: Number of Students. Categories: Up, No Change, Down. Marked cut scores: proficiency, college readiness.]

Number of 5th grade students meeting math growth target in the same district
[Chart: Mathematics. Horizontal axis: Student's score in fall. Vertical axis: Number of Students. Categories: Met growth target, Failed growth target.]

Issues in the use of growth and value-added measures
Measurement design of the instrument
Many assessments are not designed to measure growth. Others do not measure growth equally well for all students.

Tests are not equally accurate for all students
[Charts: California STAR; NWEA MAP]

Tests are not equally accurate for all students
[Chart: Grade 6 New York Mathematics]

Issues in the use of growth and value-added measures
"Among those who ranked in the top category on the TAKS reading test, more than 17% ranked among the lowest two categories on the Stanford. Similarly more than 15% of the lowest value-added teachers on the TAKS were in the highest two categories on the Stanford."
Corcoran, S., Jennings, J., & Beveridge, A. (2010). Teacher Effectiveness on High and Low Stakes Tests. Paper presented at the Institute for Research on Poverty summer workshop, Madison, WI.

Issues in the use of growth and value-added measures
Instability of results
A variety of factors can cause value-added results to lack stability. Results are more likely to be stable at the extremes. The use of multiple years of data is highly recommended.

Reliability of teacher value-added estimates
Teachers with growth scores in the lowest and highest quintile over two years using NWEA's Measures of Academic Progress

            Bottom quintile Y1&Y2    Top quintile Y1&Y2
Number      59/493                   63/493
Percent     12%                      13%

r = .64, r² = .41
Typical r values for measures of teaching effectiveness range between .30 and .60 (Brown Center on Education Policy, 2010).

Issues in the use of growth and value-added measures
Control for statistical error
All models attempt to address this issue.
Nevertheless, many teachers' value-added scores will fall within the range of statistical error.

Range of teacher value-added estimates
[Chart: Mathematics Growth Index Distribution by Teacher - Validity Filtered. Vertical axis: Average Growth Index Score and Range, -12.00 to 12.00. Teachers are grouped into quintiles Q1 through Q5.]
Each line in this display represents a single teacher. The graphic shows the average growth index score for each teacher (green line), plus or minus the standard error of the growth index estimate (black line). We removed students who had tests of questionable validity and teachers with fewer than 20 students.

Issues in the use of growth and value-added measures
Instructional alignment
Tests should align to the teacher's instructional responsibilities.

Issues in the use of growth and value-added measures
Measurement sensitivity
Assessments should be instructionally sensitive.

Issues in the use of growth and value-added measures
Model Wars
There are a variety of models in the marketplace. These models may come to different conclusions about the effectiveness of a teacher or school. Differences in findings are more likely to happen at the extremes.

Issues in the use of growth and value-added measures
Lack of random assignment
The use of a value-added model assumes that the school doesn't add a source of variation that isn't controlled for in the model, e.g., young teachers are assigned disproportionate numbers of students with poor discipline records.

Issues in the use of growth and value-added measures
Uncovered Subjects and Teachers
High quality tests may not be administered, or available, for many teachers and grades. Subjects like social studies may be particularly problematic.

Issues in the use of growth and value-added measures
Idiosyncratic cases
In self-contained classrooms, one or two idiosyncratic cases can have a large effect on results.
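The growth index display described earlier (each teacher's average growth index, plus or minus its standard error, with teachers under 20 students removed) can be illustrated with a minimal sketch. Several things here are assumptions rather than the presenter's method: the standard error is taken as the sample standard deviation divided by the square root of the class size, a score of zero is treated as "typical growth", and the function names and teacher data are hypothetical.

```python
import math
import statistics

MIN_STUDENTS = 20  # from the slide: teachers with fewer than 20 students were removed


def growth_index_range(student_indices):
    """Return (mean, low, high) for one teacher, or None if the class is too small.

    Assumption: standard error = sample standard deviation / sqrt(n).
    """
    n = len(student_indices)
    if n < MIN_STUDENTS:
        return None
    mean = statistics.mean(student_indices)
    se = statistics.stdev(student_indices) / math.sqrt(n)
    return mean, mean - se, mean + se


def within_statistical_error(rng):
    """True when the band of mean +/- one standard error includes zero."""
    _, low, high = rng
    return low <= 0.0 <= high


# Hypothetical per-student growth index values for three teachers.
teachers = {
    "teacher_a": [1.0, -1.0] * 10,  # 20 students, average near zero
    "teacher_b": [2.0, 3.0] * 10,   # 20 students, clearly above zero
    "teacher_c": [4.0] * 12,        # only 12 students, so excluded
}

for name, indices in teachers.items():
    rng = growth_index_range(indices)
    if rng is None:
        print(f"{name}: excluded, fewer than {MIN_STUDENTS} students")
        continue
    mean, low, high = rng
    label = "within error of zero" if within_statistical_error(rng) else "distinguishable from zero"
    print(f"{name}: {mean:+.2f} ({low:+.2f} to {high:+.2f}), {label}")
```

A band of one standard error is a narrow criterion. A production model would more likely use a wider confidence interval, which would place even more teachers inside the range of statistical error.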
Other issues
Security and Cheating
When measuring growth, one teacher who cheats disadvantages the next teacher.

Security considerations
• Teachers should not be allowed to view the contents of the item bank or record items.
• Districts should have policies for accommodation that are based on student IEPs.
• Districts should consider having both the teacher and a proctor in the test room.
• Districts should consider whether other security measures are needed for the protection of both the teacher and administrators.

Other issues
Proctoring
Proctoring both with and without the classroom teacher raises possible problems. Documentation that test administration procedures were properly followed is important.

Potential Litigation Issues
The use of testing data for high stakes personnel decisions does not yet have a strong, coherent body of case law. Until that body of case law is established, expect litigation if value-added results are the linchpin evidence in a teacher-dismissal case.

Possible legal issues
• Title VII of the Civil Rights Act of 1964: disparate impact of sanctions on a protected group.
• State statutes that provide tenure and other related protections to teachers.
• Challenges to a finding of "incompetence" stemming from the growth or value-added data.

Recommendations
• Embrace the formative advantages of growth measurement as well as the summative.
• Create comprehensive evaluation systems with multiple measures of teacher effectiveness (Rand, 2010).
• Select measures as carefully as value-added models.
• Use multiple years of student achievement data.
• Understand the issues and the tradeoffs.
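The recommendation to use multiple years of data can be given a rough numeric feel using the r = .64 reported on the reliability slide. The sketch below is an illustration under assumptions, not part of the presentation: it treats .64 as a year-to-year correlation, treats each year as a parallel measurement of the same teacher effect, and applies the Spearman-Brown formula, a standard psychometric projection that the slides do not mention. The function name is hypothetical.

```python
def spearman_brown(r_single, k):
    """Projected reliability of the average of k parallel measurements.

    Assumption: each year is an equally reliable, parallel measurement.
    """
    return k * r_single / (1 + (k - 1) * r_single)


r_year_to_year = 0.64  # r reported on the reliability slide

# Share of variance in one year's results associated with the other year's.
shared_variance = r_year_to_year ** 2
print(f"r squared: {shared_variance:.2f}")

# Projected reliability when averaging one, two, or three years of results.
for years in (1, 2, 3):
    projected = spearman_brown(r_year_to_year, years)
    print(f"{years} year(s): projected reliability {projected:.2f}")
```

Under these assumptions the projected reliability rises with each added year, but with diminishing returns, and real year-to-year results are unlikely to be perfectly parallel.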