A Feasibility Study for the Vertical Linking of the

Growth Options for California
County and District Evaluators’ Meetings
May 10 and 19, 2005
Unpublished Work © 2005 by Educational Testing Service

Californians Want to Measure Student Growth

- CST scales are separate by grade
- Each grade has its own Basic (300) and Proficient (350) standards
- Connections do not presently exist between grades

“Measuring growth” can mean different things to different users

- “Vertical scaling”
  - Catch-all phrase used by a variety of people to represent growth measures
  - A technical term for one particular statistical procedure
  - May or may not be most useful and cost-effective growth measure needed by CA
- Today we will explain options for measuring growth and get your input

Progress Toward Determining the Best Growth Measure(s) for CA

- Exploratory study of vertical scaling of CSTs
- Technical Advisory Group
- Interviews of CA school district staff about what growth measures would be useful
- Growth Options Task Force
- Evaluators’ meetings
- Growth Options Task Force follow-up

Vertical Scaling (Technical definition)

- Connect the scales across grades by having students take “linking” items from adjacent grade tests
- These links place the items (and scores) across grades on a common scale
- Scale scores might range from 200 (grade 2) up to 800 (grade 11)

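For readers who want to see the mechanics, here is a minimal sketch of how common-item linking can chain adjacent grades onto one scale. It assumes a simple mean/sigma (linear) transformation computed from made-up linking-item estimates; the operational CST work would rely on IRT calibration, so this is an illustration, not the study’s procedure.

```python
import numpy as np

def mean_sigma_link(lower, upper):
    """Linear (mean/sigma) transformation that puts the lower grade's scale
    onto the upper grade's scale, using estimates for the same linking items
    obtained separately from each grade's calibration."""
    a = np.std(upper, ddof=1) / np.std(lower, ddof=1)
    b = np.mean(upper) - a * np.mean(lower)
    return a, b

# Hypothetical difficulty estimates for linking items shared by grades 3 and 4.
grade3_link = np.array([-0.8, -0.2, 0.1, 0.5, 0.9])    # from the grade 3 calibration
grade4_link = np.array([-1.2, -0.6, -0.3, 0.1, 0.4])   # same items, grade 4 calibration

a, b = mean_sigma_link(grade3_link, grade4_link)

# A grade 3 estimate re-expressed on the grade 4 metric; chaining such
# transformations grade by grade (2->3->...->11) yields one vertical scale,
# which could then be rescaled to run from roughly 200 to 800.
theta_grade3 = 0.3
print(a * theta_grade3 + b)
```
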
Vertical Scaling

- Ideal goals:
  - Scale scores increase by grade
  - Scale scores can be compared across grades
  - A 500 “means the same thing” if it comes from a grade 4 test or a grade 5 test
  - “Growth” of 10 units “means the same thing” in low grades as in high grades
- Ideal approximated in real life but never exactly met
  - Vast majority of vertical scales have been developed with published norm-referenced tests
  - Few vertical scales exist for state standards-referenced tests

Exploratory Vertical Scaling Study for California

- ELA grades 2-11
- Math grades 2-7
- Linking embedded in 2004 operational CST testing
  - No incremental testing or cost to state
- Linking items
  - Measured standards that were common across adjacent grades
  - Placed in “field test buckets”

Design

- N = 3000 to 5000 per linking item
- ELA: 17-25 linking items per grade pair
- Math: 18-24 linking items per grade pair
- Grade 2 students took some grade 3 items and grade 3 students took some grade 2 items, etc.
- Scales linked sequentially: 2<3<4<5<6<7<8<9<10<11

Evaluation of Links

- Evidence that supports the validity of vertical scaling is the growth of student scores
  - Better performance of higher-grade students than lower-grade students on common items
  - Scale score distributions that increase as grade increases

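A rough sketch of the first kind of evidence, with invented data: if the links are behaving, the higher grade should outperform the lower grade on nearly all of the common items. Sample sizes and item counts here are hypothetical.

```python
import numpy as np

# Hypothetical scored responses to 20 linking items shared by grades 3 and 4
# (rows = students, columns = items, 1 = correct, 0 = incorrect).
rng = np.random.default_rng(0)
grade3_resp = rng.binomial(1, 0.55, size=(4000, 20))
grade4_resp = rng.binomial(1, 0.65, size=(4000, 20))

# Proportion correct on each linking item, by grade.
p3 = grade3_resp.mean(axis=0)
p4 = grade4_resp.mean(axis=0)

# Share of linking items on which the higher grade did better, and the average
# advantage; values near 1.0 and clearly positive support the link.
print((p4 > p3).mean())
print((p4 - p3).mean())
```
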
Findings:

- Higher-grade students consistently did better than lower-grade students on common items that came from the higher-grade operational test
- Higher-grade students did not necessarily do better than lower-grade students when common items were from the lower-grade operational test
- Position effects were evident: items became more difficult when they appeared later in a test

Findings (cont.):

- Scale scores generally increased by grade, except
  - ELA: grades 9, 10, 11 minimal growth
  - Math: grades 6 and 7 essentially no growth

Conclusions of exploratory study

- Concerns:
  - ELA: Minimal growth in grades 9, 10, and 11
  - Math: Minimal growth in grades 6 and 7
- Possible factors affecting vertical links
  - Item position effects
  - Grade x curriculum interactions
  - Changes in populations
- Not clear if vertical scaling will work for CSTs at all grades

Phone Interviews

- March/April 2005
- 15 respondents from CA counties and districts
- Asked 5 questions

Are you currently using STAR data to make any longitudinal comparisons, and if so, what are you doing with that data?

- Used NRT or CST
- Aware of inappropriateness of using current CST scale scores for growth

Who are the most important potential users in your district of longitudinal information?

- Full range: Teachers to Superintendents
- Parents
- School Boards
- Administrators: instructional planning
- Teachers: expected student performance

If we were able to improve the psychometric underpinnings for making comparisons across grades using CSTs, would that be of benefit to your district? How would you plan to use that information?

- Overwhelming enthusiasm for legitimate method of making longitudinal comparisons
- Should provide legitimate procedure so users don’t “hurt themselves”
- Concern about over-burdening the CSTs by addition of one more purpose

Longitudinal comparisons do have their limitations and can be misinterpreted, so we’d like to get your input on what interpretive materials would be most useful to you.

- Current post-test workshops and guides should cover this
- Few saw need for special efforts
- Largest districts have resources to address this
- Teacher-specific interpretive materials would be helpful

One of the options we are considering is a vertical scale. If we used a vertical scale, there would be some changes, and we would need to have an in-grade scale that differed from an “across-grade” scale. Would that be a problem in your district?

- Two diametrically opposed opinions:
  - Acquired meaning of 300 and 350 too important to do away with
  - The meaning of 300 and 350 could be easily supplanted
- Use of both in-grade and across-grade scales seen as complicated and potentially confusing

Growth Options Task Force

- Tom Barrett, Riverside USD
- Paula Carroll, Lodi USD
- J.T. Lawrence, San Diego COE
- Phil Morse, LAUSD
- Jim Parker, Paramount USD
- Jim Stack, SFUSD
- Mary Tribbey, Butte COE
- Mao Vang, Sacramento City USD

Major Options for Tracking Growth

- Vertical Scales
- Norms
- Tables of Expected Growth

Vertical Scales

- Advantages
  - Scale scores comparable across grades
  - Useful if tracking students across many grades
  - Suitable for statistical analyses

Vertical Scales

- Disadvantages
  - Assumption of hierarchical growth may not be met; scores may not grow between grades
  - Across-grade scale different from within-grade scale
  - Can highlight inconsistencies (if they exist) of within-grade standards
  - Scale scores have no intrinsic meaning
  - Need caution in comparing growth in different parts of scale
  - Special data collection needed

Norms

- CA percentiles, NCEs, or Z-scores
- By grade, by content area
- “Typical” growth defined to be what is seen cross-sectionally in state from grade to grade
- Types
  - Static (using a base year such as 2003)
  - Rolling (using current year)

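As a sketch of how such norms might be looked up for one grade and content area (hypothetical scores, not an ETS procedure): the percentile rank comes straight from the reference distribution, and the NCE uses the conventional 50 + 21.06 × z scaling of the normal deviate for that percentile. A static norm would fix the reference to a base year such as 2003; a rolling norm would recompute it from the current year.

```python
import numpy as np
from scipy import stats

# Hypothetical statewide CST scale scores for one grade and content area,
# standing in for a base-year (static) reference distribution.
rng = np.random.default_rng(0)
reference = rng.normal(330, 55, size=100_000)

def norm_lookup(score, reference):
    """Percentile rank, z-score, and NCE of a score against a reference distribution."""
    pct = stats.percentileofscore(reference, score)
    z = (score - reference.mean()) / reference.std(ddof=1)
    nce = 50 + 21.06 * stats.norm.ppf(pct / 100)   # conventional NCE scaling
    return round(pct, 1), round(z, 2), round(nce, 1)

print(norm_lookup(350, reference))
```
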
Norms

- Advantages
  - Fairly easy to understand
  - Allow comparisons of relative standing and growth relative to norm group
  - Minimal assumptions are required
  - Comparisons can be made across content areas
  - No special data collection needed

Norms

- Disadvantages
  - Need to keep clear relative nature of comparison (static vs. rolling norm)
  - No continuous growth scale
  - Growth expectations are based on cross-sectional, not longitudinal, data
  - “Typical” growth does not necessarily mean student is progressing sufficiently toward Proficiency

Tables of Expected Growth

- Use longitudinal CA data (e.g., grade 3 and 4 performance for the same students)
- Determine statistical expectation of grade 4 scores typically seen for students with each possible grade 3 score
- Calculate standard error along with expectation
- Standardized deviations from expectations can be compared across grades and content areas

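A minimal sketch of how such a table and its standard error might be built from matched longitudinal records, assuming a simple linear regression of grade 4 on grade 3 scores with made-up data (the operational model could be more elaborate). The last lines compute the standardized deviation that the slide describes as comparable across grades and content areas.

```python
import numpy as np

# Hypothetical matched records: each row is one student's grade 3 and grade 4
# CST scale scores in the same content area.
rng = np.random.default_rng(1)
g3 = rng.normal(330, 50, size=50_000)
g4 = 40 + 0.9 * g3 + rng.normal(0, 35, size=50_000)

# Expectation of the grade 4 score given the grade 3 score (simple linear fit),
# plus the standard error of that expectation (residual SD).
slope, intercept = np.polyfit(g3, g4, deg=1)
se = np.std(g4 - (intercept + slope * g3), ddof=2)

# Table of expected grade 4 scores at selected grade 3 scores.
expected = {s: round(intercept + slope * s, 1) for s in range(200, 601, 100)}
print(expected, round(se, 1))

# Standardized deviation from expectation for one student:
# (observed score - expected score) / SE.
obs_g3, obs_g4 = 320, 370
print(round((obs_g4 - (intercept + slope * obs_g3)) / se, 2))
```
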
Tables of Expected Growth

- Advantages
  - Fairly easy to understand
  - Allow comparisons of growth relative to norm group
  - Minimal assumptions are required; could be done for high school courses
  - Comparisons can be made across content areas
  - Based on actual student growth

Tables of Expected Growth

- Disadvantages
  - Tables of expectations may need to be recalculated each year
  - No continuous growth scale
  - “Typical” growth does not necessarily mean student is progressing sufficiently toward Proficiency
  - Matching student data over years required
  - Expectations would not include students who have been in CA < 1 year or who cannot be tracked

Growth Options Task Force

- Discussed options in detail for a day
- Norms may be most easily understood
- Growth Expectations may be most useful for administrators and program evaluation
- Classification may be useful: growth is average/above average/below average
- Standardized growth measures that could be pooled over grades could be useful: (Observed score – Expected score)/SE
- Will work with CDE and ETS to pilot test some options

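To make the pooled measure concrete with invented numbers: a student whose observed score is 370 against an expected score of 355, with an SE of 35, would have standardized growth of (370 - 355)/35 ≈ 0.43, i.e., modestly above expectation; because the measure is unit-free, such values could be compared or pooled across grades and content areas.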