NCLB and Growth Models: In Conflict or in Concert?

Susan L. Rigney, United States Department of Education
Joseph A. Martineau, Michigan Department of Education
Presented at the MARCES conference on
Longitudinal Modeling of Student Achievement
College Park, MD
November 7, 2005
Introduction
“In response to your concerns about giving
schools credit for improving student
achievement, we are also considering the
idea of a growth model…”
Margaret Spellings
9/13/05
Author Perspectives

Sue Rigney
Education Specialist in the Office of Student Assessment and School Accountability (Title I) at the U.S. Department of Education.
Primary responsibility = monitoring state compliance with the standards, assessment, and accountability requirements of NCLB
Secondary responsibility = contributing to ongoing discussion, clarification, and implementation of policies related to assessment and accountability

Author Perspectives

Joseph Martineau




Psychometrician for the Michigan Office of Educational Assessment and Accountability.
Primary concerns = congruence of accountability systems with values of educational research & adequacy of statistical & psychometric methodology
Secondary concerns = philosophy and policy of accountability in terms of both practicality and feasibility
Authorship should not be construed as an endorsement of NCLB as a whole.
In conflict?
CRS says
  Substantial interest…in the possible use of individual/cohort growth models… Such AYP models are not consistent with certain statutory provisions of NCLB as currently interpreted by USED
But NCLB (Sec. 4) says
  The Secretary shall take such steps as are necessary to provide for the orderly transition to, and implementation of, programs authorized by this Act
In concert?

USED Growth Model Study Group

IES grant for longitudinal data systems

State Accountability Workbook
Amendments
Types of Models

Definitions developed by a State collaborative through CCSSO (Goldschmidt et al., 2005)
Definitions
  Cross-sectional models
    Status Models
    Improvement Models
  Longitudinal Models
    Growth Models
    Residual Growth (RG) Models
      Commonly labeled “Value Added” Models
      Why we use the term RG
The Intersection of Policy and Growth Models

3-8 Assessments Provide Longitudinal Data

Safe Harbor

Use of Improvement Index in AYP

CCSSO SCASS Activities

USED Assistant Secretary Luce
Systemic Coherence:
A Standard for Evaluating Models

Three broad principles of systemic coherence
  Models are consistent with policy goals
  Models are integrated as a part of a consistent system of content standards, assessments, performance standards, and accountability criteria
  Models are implemented in a manner consistent with the values of educational research

1. Standards-based
Assessments must cover depth and breadth
Results expressed in terms of performance levels
% Proficient is most influential component of AYP

2. All Students



Participate (95% rule)
Results reported for all
AYP = Not all Visible




Full Academic Year
Minimum n
LEP exemption for ELA test
Held to same standards

Alternate based on alternate achievement
standards
3. School Improvement


Annual Measurable Objectives

Increased in 2004-05

Adjustment for transition in 2005-06
School accountable for subgroups


More visible in 2005-06
Consequences

Can/should growth moderate consequences?
Consistency of Content Standards, Assessments,
Performance Standards, and Accountability Criteria

Accountability based on academic
indicators

Peer Review of State Assessment Systems
Alignment
Performance descriptors
Alternate assessments

Coherent Assessment System
State assessments
Rational, coherent design
Relative contribution of different tests
Matrix forms equivalent
Comparability
  English vs Spanish
  Computer vs paper & pencil
Local assessments
Aligned, equivalent, comparable results for subgroups, aggregable
Results understandable


Educators know what to do

Articulation across grades

Articulation across performance levels
A “progression matrix” that shows
  Proficient is different from basic because…
  Proficient in third grade is different from proficient in fourth grade because…

Administrators know how to allocate resources
Consistency with Values of Educational
Research

As defined by Gregory N. Derry [1]:

Free flow of information & Curiosity
  Replicability
  Thorough peer review
  Improvement
Honesty and Open-mindedness
  Willingness to consider multiple alternatives
  Scrupulous investigations of weaknesses
  Flexibility to adopt feasible improvements

[1] Professor of Physics at Loyola University and author of What Science Is and How It Works (Princeton University Press, 1999)
Attributes of Systemic Coherence Applicable
in this Context
1. Alignment of standards and assessments
2. The same performance standards for all
3. Inclusion of all student groups
4. Explicit tracking of achievement gaps
5. Appropriate statistical and psychometric models
6. A program of ongoing research
7. Consistency of reports with all other attributes
1. Alignment of Standards and Assessments

Foundation of validity of school
accountability decisions

USED expects independent verification of
  Full range of content standards?
  Address content and process skills?
  Same degree and pattern of emphasis?
  Scores reflect full range of achievement?
  Procedures to maintain/improve?

Alignment methods

Alignment Methodology
  Webb (SCASS TILSA)
  Porter (SCASS SEC)
  Achieve
  Buros

Methods do not address articulation across grades
JM: Current instantiations of “independent review” may underestimate alignment

2. The Same Standards for All Students

Grade-level achievement standards


All students proficient by 2013-14



Except for students with most significant cognitive disabilities (1%)
What about growth toward proficient?
What about length of time in system?
Proposals to balance fairness toward both educators and
student groups should also be a part of any plan to
implement growth models for accountability purposes.
Fairness toward one should not be sacrificed for fairness
toward the other.
2. The Same Standards for All Students

JM: The NCLB expectation that all students will be proficient by a given date seems unreasonable. The recognition that there will always be individual differences among students (and aggregate differences across schools in their intake populations) should also be incorporated in setting policy targets.
SR: Safe harbor recognizes that adequate yearly progress may be met with less than 100% meeting annual and long-range goals.
JM: The safe harbor provision of NCLB is a good beginning, but does not fully account for these realities.
2. The Same Standards for All Students

JM: The punitive nature of NCLB consequences can actually undermine policy objectives by adding turbulence to schools serving low-achieving students.
SR: The pressures of accountability have resulted in remarkable successes (Ed Trust), and there are multiple safeguards to prevent Type I error.
JM: The multiple safeguards are an important start, but policies encouraging more assistance in and attraction of highly effective educators to low-achieving schools are more likely to support the policy objectives.
SR: NCLB funds are available for recruitment and retention bonuses, and data indicate that states are beginning to use these funds in this way.
Implications for growth model

Expectation of same growth for all
maintains achievement gap

Expectation of 12 months growth in 1 year
maintains achievement gap

Expectation of normative growth
maintains achievement gap
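
The arithmetic behind these three bullets can be sketched in a few lines. This is only an illustration: the group labels, starting scores, and annual gains below are hypothetical numbers chosen for the example, not data from the presentation.

```python
# Hypothetical starting scale scores for two student groups (illustrative only).
start_scores = {"higher_scoring_group": 230.0, "lower_scoring_group": 200.0}


def project(start, annual_gain, years):
    """Score after `years` of a constant annual gain."""
    return start + annual_gain * years


def gap_after(years, gains):
    """Score gap between the two groups after `years`, given per-group annual gains."""
    high = project(start_scores["higher_scoring_group"],
                   gains["higher_scoring_group"], years)
    low = project(start_scores["lower_scoring_group"],
                  gains["lower_scoring_group"], years)
    return high - low


# Same expected growth for every group: the gap never moves.
same_growth = {"higher_scoring_group": 10.0, "lower_scoring_group": 10.0}
# A larger expected gain for the lower-scoring group: the gap narrows.
catch_up_growth = {"higher_scoring_group": 10.0, "lower_scoring_group": 15.0}

print(gap_after(0, same_growth))      # 30.0 (initial gap)
print(gap_after(5, same_growth))      # 30.0 (unchanged after five years)
print(gap_after(5, catch_up_growth))  # 5.0  (narrowed after five years)
```

Under these assumed numbers, the gap closes only when the lower-scoring group is expected to grow faster than the higher-scoring group.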
3. Inclusion of All Student Groups

Missing data means missing students


How many missing students does it take to
compromise validity?
Robustness to missing data does not imply
that it is OK to leave out data where it can
reasonably be obtained
4. Explicitly Tracking Achievement Gaps

Closing the achievement gap is a…
  Policy objective
  Matter of ethics
  Attainable


Tracking the achievement gap makes
inequities publicly visible
4. Explicitly Tracking Achievement Gaps,
continued…
Separate models from those used to track attainment of growth targets
Include in the model variables defining policy-defined subgroups
Interaction of grade with subgroup variables
Simple graphical representation of the results
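
One way to read the bullets above is as a regression with a grade-by-subgroup interaction. The notation below is assumed for illustration and does not appear in the original slides: $y_{ig}$ is the score of student $i$ in grade $g$, and $S_i$ is a 0/1 indicator for membership in one policy-defined subgroup. It is a minimal sketch, not the specification of any particular state model.

```latex
\begin{align}
y_{ig} &= \beta_0 + \beta_1 g + \beta_2 S_i + \beta_3 (g \times S_i) + \varepsilon_{ig} \\
\mathrm{gap}(g) &= E[y_{ig} \mid S_i = 1] - E[y_{ig} \mid S_i = 0] = \beta_2 + \beta_3 g
\end{align}
```

In this sketch, $\beta_3$ is the change in the gap per grade, so plotting $\mathrm{gap}(g)$ against grade gives the kind of simple graphical summary the last bullet calls for.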

5. Appropriate Statistical and Psychometric
Models

Statistical concerns
  Match of model to data structure
  Violations of assumptions
  Do random effects models “cheat?”
  How do we integrate results from alternate assessments?
  What is the sample, and what is the population?
Different models needed for different purposes
  Meeting growth targets
  Tracking achievement gaps
  Primary research
5. Appropriate Statistical and Psychometric
Models

Statistical concerns

Are the models correlational or causal? The mandated data collection is correlational.

JM: The mandated policy uses are more causal. The descriptive statistics are used to label schools as in need of improvement, and if students are not achieving reasonable goals, it is hard to argue with this label. However, the distinction between schools in need of improvement and ineffective educators is unlikely to be either fathomed or appreciated by many people. The nature of NCLB consequences invites this unfounded interpretation.
SR: The statute provides substantial resources for professional development and instructional materials in order to help educators meet the extraordinary needs of the children they serve.
5. Appropriate Statistical and Psychometric
Models, continued…

Unwarranted assumptions

No equating error
  Vertical – Doran (2005)
  Horizontal – not studied, but most assessments only have a few anchor items in common across years
Interval level scale
  If using scale scores, most models assume equal interval measurement
  Psychometrically suspect
  Effects not well studied
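
A small numeric sketch may help show why the equal-interval assumption matters. If scale scores are only ordinal, any order-preserving rescaling is equally defensible, yet comparisons of gains can change under such a rescaling. The scores below are hypothetical, and the square-root transformation is just one arbitrary monotone rescaling chosen for illustration.

```python
import math


def gain(pre, post, transform=lambda x: x):
    """Gain from pre to post after applying a monotone rescaling to both scores."""
    return transform(post) - transform(pre)


# Hypothetical (pre, post) scale scores for a low-scoring and a high-scoring student.
low_student = (100.0, 121.0)
high_student = (400.0, 421.0)

# On the reported scale, both students gain the same number of points.
raw_low = gain(*low_student)
raw_high = gain(*high_student)

# After an order-preserving square-root rescaling, the gains are no longer equal.
sqrt_low = gain(*low_student, transform=math.sqrt)
sqrt_high = gain(*high_student, transform=math.sqrt)

print(raw_low, raw_high)
print(sqrt_low, sqrt_high)
```

Both scales rank the four scores identically, but only one of them says the two students grew by the same amount. Which statement is right depends on whether the reported scale truly has equal intervals.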

5. Appropriate Statistical and Psychometric
Models, continued…

Unwarranted assumptions, continued…

A single continuous scale on the same construct across grades (vertical or developmental scales)
Mathematical demonstrations (Martineau, 2004, in press)
  We purposely build content shift into our assessments across grades
  High correlations among sub-constructs do not take care of the problem
  Students whose growth is occurring outside the curriculum-defined range for the grade are not measured well
  Effects of prior schools/grades become attributed to later schools/grades
  Practically significant effects of the misattributions occur in all reasonably conceivable assessment scenarios
Empirical validation (Lockwood et al., under peer review)
  Subscales of math assessment: greater variability within teacher across subscales than across teachers within subscale
  Low correlations in “value added” across subscales
  The sub-content matters tremendously
5. Appropriate Statistical and Psychometric
Models, continued…

Unwarranted assumptions, continued…


We need to account for equating error
We need to study the effects of the interval-level measurement assumption and either
  Validate the assumption, or
  Not make the assumption
We need to either
  Develop psychometric models that can account for change in content across grades, or
  Not assume the same content across grades
Analytical models that avoid scale assumptions
  Hill’s Value Table approach (this conference)
  Betebenner transition matrix approach (2005)
  Standards-based interpretations, can use baseline data
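
To make the idea of a transition matrix concrete, the following is a generic sketch that tabulates how students move between performance levels from one year to the next. It is not Betebenner's or Hill's actual procedure; the function name, level list, and student records are hypothetical and exist only for this example.

```python
from collections import Counter

# Performance levels, lowest to highest (labels assumed for illustration).
LEVELS = ["Basic", "Proficient", "Advanced"]


def transition_matrix(pairs):
    """Row-normalized proportions of students moving from a prior-year level
    (row) to a current-year level (column)."""
    counts = Counter(pairs)
    matrix = {}
    for prior in LEVELS:
        row_total = sum(counts[(prior, current)] for current in LEVELS)
        matrix[prior] = {
            current: (counts[(prior, current)] / row_total if row_total else 0.0)
            for current in LEVELS
        }
    return matrix


# Hypothetical (prior-year level, current-year level) records, one per student.
pairs = [
    ("Basic", "Basic"), ("Basic", "Proficient"),
    ("Basic", "Basic"), ("Basic", "Proficient"),
    ("Proficient", "Proficient"), ("Proficient", "Advanced"),
    ("Advanced", "Advanced"),
]

matrix = transition_matrix(pairs)
for prior in LEVELS:
    print(prior, matrix[prior])
```

Because the table uses only performance levels, it needs no interval-scale or vertical-scale assumption, and a baseline year's matrix can serve as the reference point for later years.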
6. An Ongoing Program of Research
A turbulent field (“in its adolescence,” to quote Lissitz)
Large-scale implementation in a turbulent field requires extraordinary flexibility to keep up with the state of the art
And yet, too much flexibility can thwart useful interpretation of trend data

7. Consistency of Reports with Other
Attributes

Responsive to instruction?

Understandable to stakeholders?

Grounded in policy aims?

Valid & reliable?
Setting standards for growth
What’s reasonable?
vs
What do we hope to accomplish?
What’s fair?
Growth & school consequences
Achievement    Growth: less than 1 year    Growth: 1 year    Growth: more than 1 year
Advanced       OK (?)                      OK                Great
Proficient     Not OK                      OK                OK
Basic          Not OK                      Not OK            OK (?)
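
The grid above can be read as a simple lookup from achievement level and growth to a judgment. The sketch below encodes the slide's cells as given; the function names and the rule that exactly 1.0 years counts as "1 year" are assumptions made for illustration, not part of the presentation.

```python
# Judgments from the grid, keyed by (achievement level, growth band).
JUDGMENTS = {
    ("Advanced", "less than 1 year"): "OK (?)",
    ("Advanced", "1 year"): "OK",
    ("Advanced", "more than 1 year"): "Great",
    ("Proficient", "less than 1 year"): "Not OK",
    ("Proficient", "1 year"): "OK",
    ("Proficient", "more than 1 year"): "OK",
    ("Basic", "less than 1 year"): "Not OK",
    ("Basic", "1 year"): "Not OK",
    ("Basic", "more than 1 year"): "OK (?)",
}


def growth_band(years_of_growth):
    """Map measured growth (in years) to one of the three bands.
    Treating exactly 1.0 as '1 year' is an assumption of this sketch."""
    if years_of_growth < 1:
        return "less than 1 year"
    if years_of_growth == 1:
        return "1 year"
    return "more than 1 year"


def judgment(achievement_level, years_of_growth):
    """Look up the grid cell for a school's achievement level and growth."""
    return JUDGMENTS[(achievement_level, growth_band(years_of_growth))]


print(judgment("Basic", 1.5))
print(judgment("Proficient", 0.5))
print(judgment("Advanced", 1))
```

The two cells marked with a question mark are the ones where growth and status point in different directions, which is where a policy decision is still needed.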
Conclusions

Can we add growth?


Yes!
Should we add growth?

Yes, where there is an evaluative framework tied to policy
objectives, a systemic approach, and alignment with the values of
educational research

Must we add growth?

An option, not a requirement, because of the extraordinary infrastructure it requires
Recommendations for Policymakers
Understand the basic differences between models – Run simulations with real data
Understand the limitations

Listen to practitioners
Listen to methodologists

Anticipate cost/benefits
Lack of stability corrupts meaning
Do not over-specify the details in statute

This field moves ahead quickly
Flexibility to implement advances is key

Recommendations for Accountability
Implementation Staff

State Directors: give your staff time to write it up!!
Require greater detail in the Technical Manuals that allows for comprehensive review of the procedures
Explain it (as much as you can) to your legislators and Congresspersons
Challenge assumptions
  Status quo is good
  Change is good
  Resource assumptions
  Claims of proponents
Recommendations for Technical Researchers

Validity need not conflict with transparency

Validity
  Maintain sufficient complexity to produce valid results
Transparency for non-technical stakeholders
  Simple, but accurate reports
  Grounded interpretations
Transparency for technical stakeholders
  Comprehensive documentation of the entire system, including psychometric and statistical models
  Facilitation of replication
  Facilitation of primary research on strengths and weaknesses
Recommendations for Technical Researchers

Pay systemic attention to…






Assumptions of psychometric models
Assumptions of content standard models
Assumptions of statistical models
Think carefully about what the models can tell us and
cannot tell us about instruction, curriculum, and student
development
Develop simple graphical representations of the model
and its important concepts for policymaker consumption
Become involved in public policy forums as a community
lobby in order to promote appropriate interpretation of
data.

We cannot give our cautions, wash our hands of how the data is
used, and stand on the outside of the political process
Recommendations for All Stakeholders

Realize that with all of the high stakes surrounding accountability uses of student achievement data, there are forces that can work against community interests:

Economic benefits, reputations, and other personal investments can cause proponents of specific systems to avoid scrupulous investigations of the shortcomings of those systems and/or the benefits of competing approaches
Willingness to be, and accountability for being, rigorously honest and open-minded about multiple approaches is an essential part of improving and evaluating growth-based accountability systems