Causal Rasch Models and Individual Growth
Trajectories
National Center for the Improvement of Educational Assessment
January 18, 2011
A. Jackson Stenner
Chairman & CEO, MetaMetrics
1
“Although adopting a probabilistic
model for describing responses to an
intelligence test, we have taken no
sides in a possible argument about
responses being ultimately
explainable in causal terms.”
(Rasch, 1960, p.90)
2
Three well-researched constructs
• Reader ability
• Text complexity
• Comprehension
3
Reader Ability
Temperature
4
Reading is a process in which
information from the text and
the knowledge possessed by
the reader act together to
produce meaning.
Anderson, R.C., Hiebert, E.H., Scott, J.A., & Wilkinson, I.A.G. (1985)
Becoming a nation of readers: The report of the Commission on Reading
Urbana, IL: University of Illinois
5
An Equation
Conceptual: Comprehension = Reader Ability − Text Complexity
Statistical:
\[ \text{Raw Score} = \sum_i \frac{e^{(RA - TC_i)}}{1 + e^{(RA - TC_i)}} \]
RA = Reading Ability
TC = Text Calibrations
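The statistical form of the equation can be evaluated directly. The following is a minimal sketch, assuming all quantities are expressed in logits; the function name and the five calibration values are hypothetical and chosen only for illustration.

```python
import math

def expected_raw_score(reader_ability, text_calibrations):
    """Sum of Rasch success probabilities over the items.

    Each term equals e^(RA - TC_i) / (1 + e^(RA - TC_i)), written in the
    algebraically identical form 1 / (1 + e^-(RA - TC_i)).
    """
    return sum(
        1.0 / (1.0 + math.exp(-(reader_ability - tc)))
        for tc in text_calibrations
    )

# Hypothetical calibrations for a five-item test, in logits.
calibrations = [-1.0, -0.5, 0.0, 0.5, 1.0]
score = expected_raw_score(0.0, calibrations)
```

Because these illustrative calibrations are symmetric around the reader's ability, the expected raw score is exactly half the number of items.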
6
Each of these thermometers is engineered to
use the same correspondence table
Each of these reading tests is engineered to use
the same correspondence table
7
Correspondence Table: °C and Lexile
Raw Score | °C | Lexile | Raw Score | °C | Lexile | Raw Score | °C | Lexile | Raw Score | °C | Lexile
1 | 35.6 | 378L | 12 | 36.8 | 905L | 23 | 38.0 | 1116L | 34 | 39.2 | 1331L
2 | 35.7 | 509L | 13 | 36.9 | 926L | 24 | 38.1 | 1134L | 35 | 39.3 | 1355L
3 | 35.8 | 589L | 14 | 37.0 | 947L | 25 | 38.2 | 1151L | 36 | 39.4 | 1381L
4 | 35.9 | 647L | 15 | 37.1 | 968L | 26 | 38.3 | 1170L | 37 | 39.6 | 1409L
5 | 36.0 | 695L | 16 | 37.2 | 987L | 27 | 38.4 | 1188L | 38 | 39.7 | 1440L
6 | 36.1 | 736L | 17 | 37.3 | 1007L | 28 | 38.6 | 1207L | 39 | 39.8 | 1474L
7 | 36.2 | 770L | 18 | 37.4 | 1025L | 29 | 38.7 | 1226L | 40 | 39.9 | 1513L
8 | 36.3 | 801L | 19 | 37.6 | 1044L | 30 | 38.8 | 1245L | 41 | 40.0 | 1560L
9 | 36.4 | 830L | 20 | 37.7 | 1062L | 31 | 38.9 | 1265L | 42 | 40.1 | 1616L
10 | 36.6 | 857L | 21 | 37.8 | 1080L | 32 | 39.0 | 1286L | 43 | 40.2 | 1697L
11 | 36.7 | 881L | 22 | 37.9 | 1098L | 33 | 39.1 | 1308L | 44 | 40.3 | 1829L
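A table of this kind can be sketched by inverting the Rasch expected-score function: for each raw score, find the ability at which the expected raw score equals it, which is the maximum-likelihood estimate when item calibrations are treated as known. The sketch below is an illustration under stated assumptions, not the procedure used to produce the table above: it works in logits (no Lexile conversion is attempted), uses bisection as one convenient root finder, and the function names and calibrations are hypothetical.

```python
import math

def expected_raw_score(ability, calibrations):
    """Rasch expected count correct for one person (logit units)."""
    return sum(1.0 / (1.0 + math.exp(-(ability - c))) for c in calibrations)

def ability_for_raw_score(raw, calibrations, lo=-20.0, hi=20.0, tol=1e-9):
    """Bisection search for the ability whose expected raw score equals raw.

    The expected score is strictly increasing in ability, so the root is
    unique. The bounds are an assumption that suits calibrations near zero.
    """
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if expected_raw_score(mid, calibrations) < raw:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def correspondence_table(calibrations):
    """Map each non-extreme raw score (1 to n-1) to an ability measure."""
    n = len(calibrations)
    return {raw: ability_for_raw_score(raw, calibrations) for raw in range(1, n)}

# Hypothetical five-item test; calibrations in logits.
calibrations = [-1.0, -0.5, 0.0, 0.5, 1.0]
table = correspondence_table(calibrations)
```

Extreme raw scores (zero or all correct) are omitted because they have no finite estimate, which mirrors the 1 to 44 range shown for the 45-item instruments.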
8
Anatomy of Two Measurement Procedures
Aspect/Construct | Temperature | Reader Ability
Object of measurement | Person | Person
Instrument | Thermometer | Reading test
Measurement outcome | Number of theory-calibrated cavities (0-45) that fail to reflect green light | Count correct on a collection of 45 theory-calibrated test items
Substantive theory | Thermodynamic theory | Lexile theory
Unit of measurement | Degree Fahrenheit (°F) | Lexile (L)
Correspondence table/calibration equation | Exploits a chemical reaction and light absorption to table temperature as a function (Guttman model) of a sufficient statistic | Exploits semantic and syntactic features of test items to table reader ability as a function (Rasch model) of a sufficient statistic
Measure/Quantity | Measurement outcome converted into a quantity via the substantive theory | Measurement outcome converted into a quantity via the substantive theory
Readable technology | NexTemp Thermometer™ | Oasis™
General objectivity | Point estimates of temperature are independent of the thermometer | Point estimates of reader ability are independent of the reading test
9
Ten Features of Causal Response Models – whether Guttman or Rasch
1. Both measurement procedures depend on within-person causal interpretations of how these two instruments work. NexTemp uses a causal Guttman model; The Lexile Framework for Reading uses a causal Rasch model.
2. In both cases the measurement mechanism is well specified and can be manipulated to produce predictable changes in measurement outcomes (e.g., percent correct or percent of cavities turning black).
3. Item parameters are supplied by substantive theory, and thus person parameter estimates are generated without reference to or use of any data on other persons or populations. Therefore, effects of the examinee population have been completely eliminated from consideration in the estimation of person parameters for reader ability and temperature.
10
Ten Features of Causal Response Models – whether Guttman or Rasch, cont’d.
4. In both cases the quantitivity hypothesis can be experimentally tested by evaluating the trade-off property. A change in the person parameter can be offset by a compensating change in the measurement mechanism so that the measurement outcome is held constant.
5. When uncertainty in item difficulties is too large to ignore, individual item difficulties may be a poor choice of calibration parameters in causal models. As an alternative we recommend, when feasible, averaging over individual item difficulties to produce “ensemble” means. These means can be excellent dependent variables for testing causal theories.
6. Index models are not causal because manipulation of neither the indicators nor the person parameter produces a predictable change in the measurement outcome.
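The trade-off property in item 4 can be illustrated numerically under the Rasch model, where the expected outcome depends only on the difference between the person parameter and the item calibrations. The sketch below is illustrative only: the ability, shift, and calibration values are hypothetical and expressed in logits, and a numerical check is of course not a substitute for the experimental test the slide describes.

```python
import math

def expected_raw_score(ability, calibrations):
    """Rasch expected count correct for one person (logit units)."""
    return sum(1.0 / (1.0 + math.exp(-(ability - c))) for c in calibrations)

# Hypothetical instrument and person, in logits.
calibrations = [-1.2, -0.4, 0.1, 0.6, 1.5]
ability = 0.3
shift = 0.7

# Baseline measurement outcome.
baseline = expected_raw_score(ability, calibrations)

# Raise the person parameter and make every item harder by the same amount.
harder_items = [c + shift for c in calibrations]
traded_off = expected_raw_score(ability + shift, harder_items)

# Raising only the person parameter changes the outcome.
ability_only = expected_raw_score(ability + shift, calibrations)
```

In this sketch `traded_off` matches `baseline` up to floating-point error, while `ability_only` is larger, which is the compensating pattern the trade-off property predicts.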
11
Ten Features of Causal Response Models – whether Guttman or Rasch, cont’d.
7. Causal Rasch models are individual centered and are explanatory at both within-subject and between-subject levels. The attribute on which I differ from myself a decade ago is the same attribute on which I differ from my brother today.
8. When data fit a Rasch model, differences between person measures are objective. When data fit a causal Rasch model, absolute person measures are objective (i.e., independent of instrument).
9. The case against an individual causal account, although popular, has been poorly made. Investigators need only experiment to isolate the causal mechanism in their instruments, test for the trade-off property, and confirm invariance over individuals. This has been accomplished for a construct, reader ability, that has been described by scholars as the most complex cognitive activity that humans regularly engage in. Given the success with reading, we think it likely that other behavioral constructs can be similarly measured.
10. Causal Rasch models make possible the construction of generally objective growth trajectories. Each trajectory can be completely separated from the instruments used in its construction and from the performance of any other persons whatsoever.
12
To causally explain a phenomenon [a measurement
outcome] is to provide information about the factors
[person processes and instrument mechanisms] on
which it depends and to exhibit how it depends on
those factors. This is exactly what the provision of
counterfactual information…accomplishes: we see
what factors some explanandum M [measurement
outcome, raw score] depends on (and how it depends
on those factors) when we have identified one or
more variables such that changes in these (when
produced by interventions) are associated with
changes in M (Woodward, 2003, p.204).
13
How Many Ways Can We Say X Causes Y?
X “elicited a greater” Y
X “impacts” Y
X “accounts for” Y
X “has been linked to” Y
Y “is the result of” X
X “didn’t diminish” Y
Y “because of” X
Y “depends on” X
X “has led to” Y
X “largely motivates” Y
Y “stemmed from” X
X “proved critical to” Y
X “fosters” Y
X “changes” Y
X “triggers” Y
X “affects” Y
14
Psychometrics vs. Metrology
Aspect | Group Centered | Individual Centered
Interpretation of Probability | Interpretation involves 100 people with the same ability answering a single item | Interpretation involves administering 100 items with the same calibration to a single person
Person Measures | A person’s response record is embedded in different samples, and each group-specific Rasch analysis produces a different measure | A person’s response record is evaluated against theory-referenced calibrations
Measurement Error | Traditional test theory uses a sample standard deviation and a sample correlation to compute an SEM, which is intended to characterize the individual | ISEM is the within-person standard deviation over replications of the measurement procedure
Data Fit to the Model | Varies with the locally constructed frame of reference; sample dependent | Fit is to a theory, thus sample independent
Validity | Correlational, thus sample dependent | Causal within person, thus sample independent
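The individual-centered definition of measurement error in the table (a within-person standard deviation over replications) can be sketched by simulation. This is only an illustration under several assumptions: the replications are simulated rather than observed, every replication reuses the same 45 hypothetical calibrations, all values are in logits, and the function names are invented for the example.

```python
import math
import random
import statistics

def p_correct(ability, calibration):
    """Rasch probability of a correct response (logit units)."""
    return 1.0 / (1.0 + math.exp(-(ability - calibration)))

def estimate_ability(raw, calibrations, lo=-20.0, hi=20.0, tol=1e-9):
    """Bisection for the ability whose expected raw score equals raw."""
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if sum(p_correct(mid, c) for c in calibrations) < raw:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def within_person_sd(true_ability, calibrations, replications, rng):
    """Standard deviation of one person's measures over simulated replications."""
    n = len(calibrations)
    measures = []
    for _ in range(replications):
        # Simulate one administration and count the correct responses.
        raw = sum(rng.random() < p_correct(true_ability, c) for c in calibrations)
        if 0 < raw < n:  # extreme scores have no finite estimate; skip them
            measures.append(estimate_ability(raw, calibrations))
    return statistics.stdev(measures)

rng = random.Random(0)  # fixed seed so the sketch is repeatable
calibrations = [-2.0 + 4.0 * i / 44 for i in range(45)]  # hypothetical, -2 to 2
isem = within_person_sd(0.0, calibrations, 200, rng)
```

No group statistics enter the calculation: the spread is computed entirely from repeated measurements of a single simulated person.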
15
16
17
Figure 1: Plot of Theoretical Text Complexity versus Empirical Text Complexity for 475 articles
“Pizza Problems”
r = 0.952
r” = 0.960
R²” = 0.921
RMSE” = 99.8L
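As a quick check on these fit statistics, the share of variance in empirical text complexity that the theory leaves unexplained follows from the reported R² (the small gap between 0.960² and 0.921 is presumably rounding in the reported values, which is an assumption):

\[ 0.960^2 \approx 0.922, \qquad 1 - R^2 = 1 - 0.921 = 0.079 \approx 8\% \]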
18
What could account for the 8% unexplained variance?
• Missing Variables
• Improved Proxies/Operationalizations
• Expanded Error Model
• Rounding Error
• Interaction between Individual and Text
• Psychometric Uncertainty Principle
19
20
Student 1528
May 2007 – Dec. 2009
6th Grade
Male
Hispanic
Paid Lunch
284 Encounters
117,484 Words
2,894 Items
848 Minutes
[Figure: plot for Student 1528; axis ticks at 1000, 1200, 1400, and 1600; annotations “Text Demands for College and Career” and “May 2016 (12th Grade)”]
21
Item-Based vs.
Ensemble-Based Psychometrics
22
Reading Task-Complexity Plane for Dichotomous Items
[Figure: axis “Unit Size Adjustment Applied to Logits” with ticks from 0.7 to 1.3; axis “Added Easiness” / “Added Hardness”; labels “Auto-Generated Cloze”, “Native Lexile”, and “Production Cloze”]
23
Comparing Item-Based vs. Ensemble-Based Psychometrics
• Item-Based
– Item statistics
– Item characteristic curves
– DIF for items
• Ensemble-Based
– Ensemble statistics
– Ensemble characteristic curves
– DIF for ensembles
24
The Ensemble
• Objective: Correspondence Table
– Raw score to Lexile measure
• What we think we know
– Mean and spread of item distributions for a passage
• What is assumed to be unknown
– Individual item difficulties
1300L
(132L)
25
The Process – Iteration 1
STEP 1: Sample 45 item difficulties from the ensemble
STEP 2: Compute Lexile measures for each raw score (1 to 44)
STEP 3: Table results
Raw Score | Sample 1 Lexile Measure
1 | 362L
2 | 514L
3 | 584L
... | ...
44 | 1811L
26
The Process – Iteration 2
STEP 1: Sample 45 item difficulties from the ensemble
STEP 2: Compute Lexile measures for each raw score (1 to 44)
STEP 3: Table results
Raw Score | Sample 1 Lexile Measure | Sample 2 Lexile Measure
1 | 362L | 354L
2 | 514L | 506L
3 | 584L | 575L
... | ... | ...
44 | 1811L | 1797L
27
The Process – Iteration 1,000
STEP 1: Sample 45 item difficulties from the ensemble
STEP 2: Compute Lexile measures for each raw score (1 to 44)
STEP 3: Table results
Raw Score | Sample 1 Lexile Measure | … | Sample 1,000 Lexile Measure | Mean of 1,000: Mean Lexile Measure
1 | 362L | … | 354L | 378L
2 | 514L | … | 506L | 509L
3 | 584L | … | 575L | 589L
... | ... | … | ... | ...
44 | 1811L | … | 1797L | 1829L
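The three steps and the final averaging can be sketched as a small Monte Carlo loop. This is a minimal sketch under stated assumptions rather than the production procedure: the ensemble is assumed to be a normal distribution described by a mean and a spread, measures are left in logits with no conversion to Lexile units, the measure for each raw score is found by bisection on the Rasch expected score, and all names and numeric values are hypothetical.

```python
import math
import random

def expected_raw_score(ability, difficulties):
    """Rasch expected count correct (logit units)."""
    return sum(1.0 / (1.0 + math.exp(-(ability - d))) for d in difficulties)

def measure_for_raw_score(raw, difficulties, tol=1e-6):
    """Bisection for the ability whose expected raw score equals raw."""
    lo, hi = min(difficulties) - 20.0, max(difficulties) + 20.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if expected_raw_score(mid, difficulties) < raw:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def ensemble_correspondence_table(mean, spread, n_items=45, n_samples=1000, seed=0):
    """Average the raw-score-to-measure table over sampled item sets."""
    rng = random.Random(seed)
    totals = {raw: 0.0 for raw in range(1, n_items)}
    for _ in range(n_samples):
        # STEP 1: sample item difficulties from the (assumed normal) ensemble.
        difficulties = [rng.gauss(mean, spread) for _ in range(n_items)]
        # STEP 2: compute a measure for each non-extreme raw score.
        for raw in totals:
            totals[raw] += measure_for_raw_score(raw, difficulties)
    # STEP 3, then the final column: table the mean over all samples.
    return {raw: total / n_samples for raw, total in totals.items()}

# A small run for illustration; the slides describe 1,000 samples.
table = ensemble_correspondence_table(mean=0.0, spread=1.0, n_samples=30)
```

Only the ensemble mean and spread enter the calculation, which matches the premise that individual item difficulties are treated as unknown.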
28
Closing
No matter how they are sliced and diced, analyses of joint and conditional probability distributions yield no more than patterns of association. Nothing in the response data, nor in Rasch analyses of these data, exposes the processes (features of the object of measurement) or mechanisms (features of the instrument) that are hypothesized to be conjointly causal on the measurement outcomes.
29
Contact Info:
A. Jackson Stenner
CEO, MetaMetrics
30