
Practical Considerations for
Choosing an Accountability Model
Pete Goldschmidt
American Educational Research Association
Annual Meeting
San Francisco, CA - April 7-11, 2006
UCLA Graduate School of Education & Information Studies
National Center for Research on Evaluation,
Standards, and Student Testing
Practical considerations - 1
• Need to decide what the accountability
system is trying to do.
• Accountability models should hold schools
accountable for those things for which schools
are responsible.
• Generally we consider those things to be
student outcomes.
• Outcomes can be multi-faceted, but the emphasis
is on academic performance.
• Academic performance is usually limited to a
few subjects.
• Academic performance is usually measured
by a large-scale assessment.
Accountability Means That We Intend to Hold
Schools Responsible for Student Outcomes
• Student outcomes accumulate over time.
• Student achievement in grade 4 is a function of
achievement in grade 3 and grade 2 (etc.) as well
as school processes, family background, innate
ability, peers, luck, and error (Hanushek, 1979).
• To get an unbiased estimate of the effect of school
processes, we would need a measure of ability prior
to any schooling.
• We are unlikely to have those data, but to reduce
bias we can estimate the marginal effect of schooling
on the change in achievement from one grade to the
next (fixed information is incorporated into the
previous test score).
• By looking only at incremental change, we reduce
the bias from not having true a priori measures.
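The logic of the last two bullets can be illustrated with a small simulation (all data, effect sizes, and variable names here are hypothetical): when a fixed, unobserved ability both drives achievement and is correlated with which school a student attends, grade-4 score levels badly overstate school differences, while the grade-3-to-grade-4 gain differences the fixed component away and recovers the one-year marginal effect.

```python
import random

random.seed(0)

# Hypothetical setup: school 1 adds 0.5 points per year of schooling, and
# higher-ability students are more likely to attend it (selection).
rows = []
for _ in range(10000):
    ability = random.gauss(0, 1)
    school = 1 if ability + random.gauss(0, 1) > 0 else 0
    effect = 0.5 * school
    y3 = ability + 3 * effect + random.gauss(0, 0.3)  # grade 3 score
    y4 = ability + 4 * effect + random.gauss(0, 0.3)  # grade 4 score
    rows.append((school, y3, y4))

def mean(xs):
    return sum(xs) / len(xs)

# Static comparison: grade-4 levels mix the ability gap with four years
# of accumulated school effects.
level_gap = (mean([y4 for s, _, y4 in rows if s == 1])
             - mean([y4 for s, _, y4 in rows if s == 0]))

# Incremental comparison: the grade-3-to-4 gain cancels fixed ability and
# prior accumulation, isolating the one-year marginal effect (about 0.5).
gain_gap = (mean([y4 - y3 for s, y3, y4 in rows if s == 1])
            - mean([y4 - y3 for s, y3, y4 in rows if s == 0]))
```

In this sketch the static gap comes out several times larger than the true per-year effect, while the gain-based gap sits near 0.5; it is only an illustration of the argument, not of any real assessment data.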
Practical considerations - 2
• An accountability model should only be
based on results that reflect the effects
of internal factors (factors schools
control).
• Simple aggregate static measures of
student performance judge schools
based on both internal and external
factors – and are overly influenced by
external factors.
Concerns - 1
• Static models assume current
achievement is solely a function of
current schooling processes.
• Aggregating individual student variables
inflates their importance – correlations
between aggregate performance and
school enrollment characteristics are
about .75.
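A quick simulation (hypothetical schools and effect sizes) shows why aggregation inflates these correlations: averaging within schools cancels individual-level noise, so a modest student-level SES-score correlation becomes a very strong school-level one.

```python
import random

random.seed(1)

def corr(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

# Hypothetical data: 200 schools x 100 students. Schools differ in average
# student SES; individual scores are mostly individual-level noise.
student_ses, student_score, school_ses, school_score = [], [], [], []
for _ in range(200):
    school_mean_ses = random.gauss(0, 1)
    ses_list, score_list = [], []
    for _ in range(100):
        ses = school_mean_ses + random.gauss(0, 1)
        score = 0.4 * ses + random.gauss(0, 1)  # modest individual effect
        ses_list.append(ses)
        score_list.append(score)
    student_ses += ses_list
    student_score += score_list
    school_ses.append(sum(ses_list) / 100)
    school_score.append(sum(score_list) / 100)

# Averaging cancels individual noise, so the school-level correlation
# greatly exceeds the student-level one.
r_student = corr(student_ses, student_score)
r_school = corr(school_ses, school_score)
```

With these invented parameters the student-level correlation is around .5 while the school-level correlation approaches 1 – the same mechanism that produces aggregate correlations near .75 between school performance and enrollment characteristics.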
Directly Comparing the Relationships
Among Indicators Reveals…
                   API 2000   API 2001   Overall Quality   External Factor   Internal Factor
API 2000             1
API 2001              .995       1
Overall Quality       .757       .754        1
External Factor       .889       .884        .900              1
Internal Factor       .405       .419        .857              .584              1
Concerns - AYP is a Simple Aggregate
Static Measure
• AYP, by construction, also penalizes large and
heterogeneous schools.
• AYP as an accountability model can categorize
schools, but does so very imprecisely.
• Policy makers want the accountability model to
provide more than simply a categorization (and
at least the correct categorization), but potentially
also to use the results to inform school
improvement.
• It ignores the fact that the accumulation of both
external and internal factors over time affects
current student performance.
• Although the P stands for progress, no inferences
regarding progress can be made, as AYP is heavily
influenced by external factors – factors that are
outside of schools' control.
Validating AYP
• Validating AYP results with an additional
accountability model is a good idea.
• Too often the focus is on choosing the "right"
model – as decided by which model identifies
fewer schools as not making AYP.
• AYP results do not match growth or
value-added results.
• The less an accountability model relies on
static aggregate information to rank schools,
the less its results will match AYP – and the
more likely it is that schools traditionally
thought of as being good will not look as
favorable.
Policy – What should results address?
• Growth and value-added models address different
questions (states should explicitly address this
rather than have multiple sets of results that may
confuse stakeholders):
• e.g., if you are interested in how everyone at
the school did this year.
• If you are interested in how this cohort of
students is doing this year.
• If you are interested in how students improved
from last year.
• If you are interested in how second grade (3rd,
etc.) improved.
• Achievement gaps.
• How will we get to 100% proficiency?
Concerns - 3
• An accountability model will not, by itself,
facilitate meeting the ultimate goal in 2013-2014.
• No model can ensure that a state will
meet its goals.
• e.g., we may have accounting rules in business,
and every business – say, Delta, for
example – aims to make a profit, but simply
setting a year-by-year trajectory for growth in
revenue (and decrease in costs) does not
guarantee it will happen.
Practical considerations – moving
beyond AYP - Data
• Is more data better?
• Yes
• Allows for longer growth trajectories.
• Generates more precise estimates of growth.
• Reduces the effect of initial status on growth (or
reveals that initial status does not fully
determine future growth).
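The precision point can be sketched directly (a simple OLS view; the function and numbers are illustrative, not from the slides): with equally spaced annual testing occasions, the standard error of a student's estimated growth slope shrinks rapidly as occasions are added.

```python
# Standard error of an OLS growth slope for equally spaced occasions,
# holding the residual standard deviation fixed (illustrative values).
def slope_se(n_occasions, residual_sd=1.0):
    xs = list(range(n_occasions))
    xbar = sum(xs) / n_occasions
    sxx = sum((x - xbar) ** 2 for x in xs)
    return residual_sd / sxx ** 0.5
```

With three occasions the slope SE is about 0.71 residual SDs; with six it falls to about 0.24 – more occasions yield more precise growth estimates.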
Data
• But,
• More data, more missingness (although it is
beneficial to trade off additional missingness
for more occasions).
• Time span may not be relevant for grade
level.
• Models using more occasions are more
consistent than models using fewer, but
models of different occasions are only
moderately correlated.
Data and policy questions
• Expectations.
• Accounting for external factors.
• Gaps and changes in gaps.
• Interested in cohort to cohort improvement.
• Cohort models have less stringent data
requirements than panel models (e.g. vertical
scaling).
• Cohort results may be confounded with external
factors (e.g. changes in student background).
• Interested in individual student growth.
• Need to consider metric and its interpretation.
• More likely to address confounding issues.
Practical way to combine results?
[Figure: 2×2 grid crossing growth (High / Low) with AYP status (Not Met / Met); cells are labeled "first cut" and "second cut" to indicate the order in which schools would be flagged.]
Criteria You Choose Will Yield Different
Conclusions About School Performance
[Figure: scatterplot of school gain (schgain, 20-40) against initial status (schinitial, 170-220). School A: highest initial status; School B: highest gain; School C: low initial status, high gain; School D: low initial status, better-than-expected gain; School E: medium initial status, lower-than-expected gain.]
Note: Schools rank differently by different criteria.
Status (x-axis): A > B > E > C > D;
Gain (y-axis): B > C > A > E > D;
Conditional gain (regression line): B > C > D > E > A.
The vertical line and the horizontal line represent district average initial status and district average gain, respectively.
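These three orderings can be reproduced with a small sketch (the status/gain values below are made up to echo the plot, not taken from it); conditional gain is the residual from regressing gain on initial status.

```python
# Hypothetical school-level data echoing the scatterplot:
# school -> (initial status, gain).
schools = {
    "A": (210, 31.0),   # highest initial status
    "B": (205, 42.0),   # highest gain
    "C": (180, 38.0),   # low initial status, high gain
    "D": (170, 30.0),   # low initial status
    "E": (190, 30.5),   # medium initial status
}

names = list(schools)
status = {s: schools[s][0] for s in names}
gain = {s: schools[s][1] for s in names}

# Conditional gain: residual from an OLS regression of gain on initial status.
xbar = sum(status.values()) / len(names)
ybar = sum(gain.values()) / len(names)
slope = (sum((status[s] - xbar) * (gain[s] - ybar) for s in names)
         / sum((status[s] - xbar) ** 2 for s in names))
resid = {s: gain[s] - (ybar + slope * (status[s] - xbar)) for s in names}

# Three different criteria yield three different school rankings.
by_status = sorted(names, key=lambda s: -status[s])       # A B E C D
by_gain = sorted(names, key=lambda s: -gain[s])           # B C A E D
by_conditional = sorted(names, key=lambda s: -resid[s])   # B C D E A
```

The same five schools come out in three different orders depending on whether the criterion is status, raw gain, or gain conditional on where a school started.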
Model subgroups directly
(less data)
• Focusing on achievement gaps and the
likelihood of meeting the target in 2013-2014,
while utilizing an accountability model not
intended for evaluation:
• We can use a longitudinal binomial growth model that
simply models the probability over time that a
subgroup will be proficient.
• Does not require a vertically equated metric.
• Provides a clear picture of current status.
• Provides a direct estimate of progress over time.
• Demonstrates where subgroups are and where they are going.
Data Structure by subgroups
Year 1
                 Girl       Boy
Low SES          215/245    234/257
Not Low SES      300/345    300/330

Year 2
                 Girl       Boy
Low SES          220/249    230/260
Not Low SES      304/351    304/326

Year 3
                 Girl       Boy
Low SES          215/232    243/260
Not Low SES      3006/3347  301/330
Binomial longitudinal model
[Figure: estimated probability of proficiency (0 to 1.00) over four years for the four subgroups defined by LOW_1_1 (0/1) and GIRL_1_1 (0/1).]
Figure 2: Subgroup performance over four years.
Binomial longitudinal model
[Figure: per-school panels (Lev-id 6057814, 6057822, 6057830, 6057848, 6061337, 6061345, 6061352, 6061360) plotting MPROFNUM (0.069 to 0.772) against YEAR2 (1.00 to 4.00), shown separately for LOW_1_1 = 0 and LOW_1_1 = 1.]
Binomial growth model by NCLB categories
[Figure: percent proficient (0-100%) by school year, SY2001 through SY2013, for schools grouped by NCLB category, plotted against the rising state AMO trajectory.]
Can see where schools are and where they are headed.
Despite favorable early results, only 26% of schools meet the 2013-2014 standard (including confidence intervals).
Decomposing changes in school
performance (more data)
• Cohort and individual student
performance.
• Change in cohort performance.
• Change in individual student
performance.
Cohorts year 1 (less data)
[Figure: achievement (0-8) plotted against grade (0-8) for cohorts in year 1.]
Cohorts year 2
[Figure: achievement (0-8) plotted against grade (0-8) for cohorts in year 2.]
Panel and cohorts
[Figure: achievement (0-8) by grade (0-8), distinguishing between-cohort change from Time 1 to Time 2 and within-cohort (panel) growth.]
Longitudinal Cohort & Panel Growth
Model (more data).
Table 5: Random effects – variability breakdown

Level 1: Temporal variation
Level 2: Between students, within cohorts and schools
    Initial status      84.9%
    Individual growth   42.7%
Level 3: Between cohorts, within schools
    Initial status       6.7%
    Individual growth   42.2%
    Cohort growth       45.2%
Level 4: Between schools
    Initial status       8.4%
    Individual growth   15.1%
    Cohort growth       54.8%
Relationship between initial status,
cohort, and individual student growth
r0
r  .18
r  .52
Figure 9: Four level mixed model value added plot.
‹#›/28
Practical applications
• Capacity and assumed stakeholder
understanding affect state growth model
proposals:
• Regression-based growth models
• Value tables
• Percent of expected growth
• Percent achieving a year's growth
• All accountability models depend on value – some
models explicitly assign value to results (although
this can be somewhat arbitrary, e.g., a year's
growth, expected growth, or points for changing
categories)
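A value table, for instance, can be sketched in a few lines (the point values below are invented to illustrate the arbitrariness the slide mentions, not taken from any state's system):

```python
# Hypothetical value table: points awarded for a student's movement
# between performance categories from one year to the next.
VALUE_TABLE = {
    # (last year, this year): points
    ("below basic", "below basic"): 0,
    ("below basic", "basic"): 75,
    ("below basic", "proficient"): 100,
    ("below basic", "advanced"): 100,
    ("basic", "below basic"): 0,
    ("basic", "basic"): 50,
    ("basic", "proficient"): 100,
    ("basic", "advanced"): 100,
    ("proficient", "below basic"): 0,
    ("proficient", "basic"): 25,
    ("proficient", "proficient"): 100,
    ("proficient", "advanced"): 100,
    ("advanced", "below basic"): 0,
    ("advanced", "basic"): 25,
    ("advanced", "proficient"): 75,
    ("advanced", "advanced"): 100,
}

def school_score(transitions):
    """Average value-table points over a school's student transitions."""
    return sum(VALUE_TABLE[t] for t in transitions) / len(transitions)

transitions = [("basic", "proficient"), ("below basic", "basic"),
               ("proficient", "proficient")]
# school_score(transitions) is (100 + 75 + 100) / 3, roughly 91.7
```

A school's score is just the average points over its students' category transitions; because the point values themselves are a policy choice, changing them can reorder schools.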
Discussion
• An aggregate measure can lead to an ecological fallacy.
• Static measures ignore accumulated effects on
performance over time.
• AYP is an aggregate static measure.
• Cohort improvement models address school
improvement, but not student growth directly.
• Panel growth models follow individual students.
• How should external factors be handled?
• Use initial status
• Use student background
• Time frame
• No model assures meeting the target – and even
assuming many factors remain constant, it is still
unlikely that a majority of schools will reach 100%
proficiency in 2013-2014.
Pete Goldschmidt
voice
310.794-4395
email
[email protected]
©2006 Regents of the University of California