Transcript Document
Practical Considerations for Choosing an Accountability Model
Pete Goldschmidt
American Educational Research Association Annual Meeting, San Francisco, CA, April 7-11, 2006
UCLA Graduate School of Education & Information Studies
National Center for Research on Evaluation, Standards, and Student Testing

Practical considerations - 1
• Need to decide what the accountability system is trying to do.
• Accountability models should hold schools accountable for those things for which schools are responsible.
• Generally we consider those things to be student outcomes.
• Outcomes can be multi-faceted, but the emphasis is on academic performance.
• Academic performance is usually limited to a few subjects.
• Academic performance is usually measured by a large-scale assessment.

Accountability Means That We Intend to Hold Schools Responsible for Student Outcomes
• Student outcomes accumulate over time.
• Student achievement in grade 4 is a function of achievement in grade 3 and grade 2 (etc.) as well as school processes, family background, innate ability, peers, luck, and error (Hanushek, 1979).
• To get an unbiased estimate of the effect of school processes, we would need a measure of ability prior to any schooling.
• We are unlikely to have those data, but to reduce bias we can estimate the marginal effect of schooling on the change in achievement from one grade to the next (fixed information is incorporated into the previous test score).
• By looking only at incremental change, we reduce the bias from not having true a priori measures.

Practical considerations - 2
• An accountability model should only be based on results that reflect the effects of internal factors (factors schools control).
• Simple aggregate static measures of student performance judge schools on both internal and external factors – but are overly influenced by external factors.

Concerns - 1
• Static models assume current achievement is solely a function of current schooling processes.
• Aggregating individual student variables inflates their importance – correlations between aggregate performance and school enrollment characteristics are about .75.

Directly Comparing the Relationships Among Indicators Reveals…

                  API 2000   API 2001   Overall Quality   External Factor   Internal Factor
API 2000             1
API 2001            .995        1
Overall Quality     .757       .754           1
External Factor     .889       .884          .900               1
Internal Factor     .405       .419          .857              .584                1

Concerns - AYP is a Simple Aggregate Static Measure
• AYP, by construction, also penalizes large and heterogeneous schools.
• AYP as an accountability model can categorize schools, but does so very imprecisely.
• Policy makers want the accountability model to provide more than simply a categorization (and at least the correct categorization), but potentially also to use the results to inform school improvement.
• It ignores the fact that the accumulation of both external and internal factors over time affects current student performance.
• Although the P stands for progress, no inferences regarding progress can be made, as AYP is heavily influenced by external factors – factors that are outside of schools' control.

Validating AYP
• Validating AYP results with an additional accountability model is a good idea.
• Too often the focus is on choosing the "right" model – as decided by identifying fewer schools as not making AYP.
• AYP results do not match growth or value-added results.
• The less an accountability model relies on static aggregate information to rank schools, the less its results will match AYP – and the more likely it is that schools traditionally thought of as being good will not look as favorable.

Policy – What should results address?
• Growth and value-added models address different questions (states should explicitly address this rather than have multiple sets of results that may confuse stakeholders).
• e.g., if you are interested in how everyone at the school did this year.
• If you are interested in how this cohort of students is doing this year.
• If you are interested in how students improved from last year.
• If you are interested in how second grade (3rd, etc.) improved.
• Achievement gaps.
• How will we get to 100% proficiency?

Concerns - 3
• An accountability model will not facilitate meeting the ultimate goal in 2013-2014.
• No model can ensure that a state will meet goals.
• e.g., we may have accounting rules in business, and every business – say Delta, for example – aims to make a profit, but simply setting a trajectory for growth in revenue (and decrease in costs) by year does not guarantee it will happen.

Practical considerations – moving beyond AYP: Data
• Is more data better?
• Yes:
• Allows for longer growth trajectories.
• Generates more precise estimates of growth.
• Reduces the effect of initial status on growth (or picks up that initial status is not determinative of future growth).

Data
• But:
• More data, more missingness (although it is beneficial to trade off additional missingness for more occasions).
• The time span may not be relevant for the grade level.
• Models using more occasions are more consistent than models using fewer, but models with different numbers of occasions are only moderately correlated.

Data and policy questions
• Expectations.
• Accounting for external factors.
• Gaps and changes in gaps.
• Interested in cohort-to-cohort improvement:
• Cohort models have less stringent data requirements than panel models (e.g. vertical scaling).
• Cohort results may be confounded with external factors (e.g. changes in student background).
• Interested in individual student growth:
• Need to consider the metric and its interpretation.
• More likely to address confounding issues.

Practical way to combine results?
[Figure: 2×2 matrix for combining results – first cut: Met AYP vs. Not Met AYP; second cut: High Growth vs. Low Growth.]

Criteria You Choose Will Yield Different Conclusions About School Performance

[Figure: scatter plot of school gain (schgain, y-axis, 20-40) against school initial status (schinitial, x-axis, 170-220). School A: highest initial status; School B: highest gain; School C: low initial status, high gain; School D: low initial status, better-than-expected gain; School E: medium initial status, lower-than-expected gain.]

Note: Schools rank differently by different criteria. Status (x-axis): A > B > E > C > D; Gain (y-axis): B > C > A > E > D; Conditional gain (regression line): B > C > D > E > A. The vertical line and the horizontal line represent district average initial status and district average gain, respectively.

Model subgroups directly (less data)
• Focusing on achievement gaps and the likelihood of meeting the target in 2013-2014, while utilizing an accountability model not intended for evaluation.
• Can use a longitudinal binomial growth model that simply models the probability over time that a subgroup will be proficient.
• Does not require a vertically equated metric.
• Provides a clear picture of current status.
• Provides a direct estimate of progress over time.
• Demonstrates where subgroups are and where they are going.

Data Structure by subgroups

Year 1
              Girl       Boy
Low SES       215/245    234/257
Not Low SES   300/345    300/330

Year 2
              Girl       Boy
Low SES       220/249    230/260
Not Low SES   304/351    304/326

Year 3
              Girl       Boy
Low SES       215/232    243/260
Not Low SES   3006/3347  301/330

Binomial longitudinal model

[Figure 2: Subgroup performance over four years – estimated probability of proficiency (0 to 1.00) by year (0 to 4) for the four LOW_1_1 × GIRL_1_1 subgroups.]
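The longitudinal binomial model above can be sketched as a logistic regression fit to grouped proficient/tested counts per subgroup per year. This is an illustrative sketch, not the presentation's actual (multilevel) model: the counts are simulated, the IRLS fitting routine and the coefficient values are assumptions, and only the predictor names (year, LOW, GIRL) echo the slides.

```python
import numpy as np

def fit_binomial_logit(X, successes, totals, n_iter=25):
    """Fit logit(p) = X @ beta to grouped binomial counts by IRLS (Fisher scoring)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = X @ beta
        p = 1.0 / (1.0 + np.exp(-eta))
        w = totals * p * (1.0 - p)                # IRLS weights: n * p * (1 - p)
        z = eta + (successes - totals * p) / w    # working response
        beta = np.linalg.solve(X.T @ (X * w[:, None]), X.T @ (w * z))
    return beta

# One row per year x LOW x GIRL cell, mirroring the subgroup count table above.
cells = [(yr, low, girl) for yr in range(4) for low in (0, 1) for girl in (0, 1)]
X = np.array([[1.0, yr, low, girl] for yr, low, girl in cells])
totals = np.full(len(cells), 500.0)

# Hypothetical "true" coefficients: intercept, per-year growth, LOW and GIRL effects.
true_beta = np.array([-1.0, 0.5, -0.8, 0.1])
successes = totals / (1.0 + np.exp(-(X @ true_beta)))  # expected proficient counts

beta_hat = fit_binomial_logit(X, successes, totals)

# Project the probability of proficiency for low-SES girls one year ahead (year 4).
p_year4 = 1.0 / (1.0 + np.exp(-(np.array([1.0, 4.0, 1.0, 1.0]) @ beta_hat)))  # ≈ 0.57
```

Because the model works on proficient/tested counts rather than scale scores, it needs no vertically equated metric, which is the point made in the bullets above.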
Binomial longitudinal model

[Figure: fitted proficiency-probability trajectories (MPROFNUM, roughly 0.069 to 0.772, vs. YEAR2, 0 to 4) for eight schools (Lev-id 6057814, 6057822, 6057830, 6057848, 6061337, 6061345, 6061352, 6061360), shown separately for LOW_1_1 = 0 and LOW_1_1 = 1.]

Binomial growth model by NCLB categories

[Figure: percent proficient (0% to 100%) against the State AMO trajectory, SY 2001 through SY 2013.]

• Can see where schools are and where they are headed.
• Despite favorable early results, only 26% of schools meet the 2013-2014 standard (including confidence intervals).

Decomposing changes in school performance (more data)
• Cohort and individual student performance.
• Change in cohort performance.
• Change in individual student performance.

Cohorts year 1 (less data)

[Figure: achievement (0-8) by grade (0-8) for cohorts in year 1.]

Cohorts year 2

[Figure: achievement (0-8) by grade (0-8) for cohorts in year 2.]

Panel and cohorts

[Figure: achievement by grade, showing between-cohort differences at Time 1 and Time 2 and within-cohort (panel) growth.]

Longitudinal Cohort & Panel Growth Model (more data)
Table 5: Random effects – variability breakdown

Level 1: Temporal variation
Level 2: Between students, within cohorts and schools
    Initial Status      84.9%
    Individual growth   42.7%
Level 3: Between cohorts, within schools
    Initial Status       6.7%
    Individual growth   42.2%
    Cohort growth       45.2%
Level 4: Between schools
    Initial Status       8.4%
    Individual growth   15.1%
    Cohort growth       54.8%

Relationship between initial status, cohort, and individual student growth

[Figure 9: Four-level mixed model value-added plot (r = .18, r = .52).]

Practical applications
• Capacity and assumed stakeholder understanding affect state growth model proposals:
• Regression-based growth models.
• Value tables.
• Percent of expected growth.
• Percent achieving a year's growth.
• All accountability models depend on values – some models explicitly assign value to results (although this can be somewhat arbitrary, e.g., a year's growth, expected growth, or points for changing categories).

Discussion
• An aggregate measure leads to an ecological fallacy.
• Static measures ignore accumulated effects on performance over time.
• AYP is an aggregate static measure.
• Cohort improvement models address school improvement, but not individual student growth directly.
• Panel growth models follow individual students.
• How to handle external factors?
• Use initial status.
• Use student background.
• Time frame.
• No model assures meeting the target, and even with growth – and assuming many factors remain constant – it is still unlikely that a majority of schools will reach 100% proficiency in 2013-2014.

Pete Goldschmidt
voice: 310.794-4395
email: [email protected]
©2006 Regents of the University of California
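To make concrete the value-table option listed under Practical applications: a value table explicitly assigns points to a student's year-over-year move between performance categories, and a school's score is the average over its students. The category labels and point values below are hypothetical illustrations, not taken from the presentation.

```python
# Hypothetical value table: rows = last year's category, cols = this year's category.
CATEGORIES = ["Below Basic", "Basic", "Proficient", "Advanced"]
POINTS = [
    [0,  50, 100, 125],   # from Below Basic
    [0,  25, 100, 125],   # from Basic
    [0,   0, 100, 125],   # from Proficient
    [0,   0,  75, 125],   # from Advanced
]

def school_value_score(transitions):
    """Average value-table points over a school's (prior, current) category-index pairs."""
    pts = [POINTS[prior][curr] for prior, curr in transitions]
    return sum(pts) / len(pts)

# Three students: Below Basic -> Basic, Basic -> Proficient, Proficient -> Proficient.
score = school_value_score([(0, 1), (1, 2), (2, 2)])   # (50 + 100 + 100) / 3
```

The arbitrariness the slide notes shows up directly here: the point values in the table are a policy choice, and different choices reorder schools.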