The Tortured History of Reading
Comprehension Assessment
P. David Pearson
UC Berkeley
Professor and Former Dean
Slides available at www.scienceandliteracy.org
The guiding questions…
Are there lessons from the past?
Is there hope for the future?
Will we ever get it right?
My predictions
Are there lessons from the past? MAYBE … YES
Is there hope for the future? MAYBE … MAYBE
Will we ever get it right? MAYBE … NO
Why me?
Have lived the tensions in comprehension assessment, but
An unwitting and unwilling candidate for the task?
Comprehension assessment was on the way to more important goals.
Mea Culpa
Those whose work I omit
Those whose work I DON’T omit
Why now?
Post-NCLB uneasiness among practitioners that the code,
as important as it is, may not be the point of reading
New Rhetorical Moves
Deeper Learning
21st Century Skills
Reading-Writing Connections
Common Core Standards
• Read like a detective; write like a reporter
Common Core Standards: The substance
New R&D: Reading for Understanding
New psychometric tools
Why now? More…
National thirst for accountability requires impeccable measures (both conceptually and psychometrically)
When the stakes are high, so too must be the standards
Pleas of teachers desperate for useful tools
Comprehension analogue of running records of oral reading as indices of fluency and accuracy
The real need…
Theoretically elegant measures, yes
Even more need for an everyday monitoring tool
LESSONS FROM THE PAST
There are many
A few of my favorites
Lesson #1: No matter how hard we
try, we never “see” reading
comprehension.
If we never see the click of comprehension…
What criterion do we adopt as our gold standard for determining the validity of the various markings we have to live with?
Cognitive interview
• Illinois Goal Assessment Program
• NAEP and most other assessments
The immediate apprehension of understanding
• The sense of getting it!
– Self ratings and think alouds
Criterion variables worth predicting
Lesson #2: No matter how novel you think
your idea is, you can find a precedent for it
that is at least a century old
I know! Let’s use open-ended performance items on our high-stakes state test!
Tidbits from the 1919 Wisconsin State
High School Reading Exam
Write a few memory gems from your
literature experiences in school.
Name three novels you have read this
year and give the plot of each.
What is the significance of the “letter” in Hawthorne’s The Scarlet Letter?
Define these terms: poetic license,
flashback, and simile.
So as long as we are forced to use MC formats, let’s have the students pick more than one right answer!
A curious example from early 1900s
“Every one of us, whatever our speculative
opinion, knows better than he practices, and
recognizes a better law than he obeys.”
Check two of the following statements with the
same meaning as the quotation above.
To know right is to do the right.
Our speculative opinions determine our
actions.
Our deeds often fall short of the actions we
approve.
Our ideas are in advance of our everyday
behavior.
From Thurstone, undated, circa 1910
So what if we use an error
detection paradigm?
Chapman 1924
Find the statements in Part 2 of the paragraph
that don’t fit the statements in Part 1…
I know! Let’s do think alouds to get at what students are really doing while they answer the questions!
Touton and Berry (1931) Error
analyses
(a) failure to understand the question
(b) failure to isolate elements of “an involved
statement” read in context
(c) failure to associate related elements in a context
(d) failure to grasp and retain ideas essential to
understanding concepts
(e) failure to see setting of the context as a whole
(f) other irrelevant answers
Lesson #3: Grain size really matters
in reading assessment, even within
comprehension assessment
The Scene in the US in the 1970s and
early 1980s
Behavioral objectives
Mastery Learning
Criterion referenced assessments
Curriculum-embedded assessments
Minimal competency tests: New Jersey
Statewide assessments: Michigan &
Minnesota
Historical relationships between instruction and assessment
Skill 1: Teach → Assess → Conclude
Skill 2: Teach → Assess → Conclude
The 1970s skills-management mentality: Teach a skill, assess it for mastery, reteach it if necessary, and then go on to the next skill.
The 1970s, cont.
Foundation: Benjamin Bloom’s ideas of mastery learning
Skill 1: Teach → Assess → Conclude
Skill 2: Teach → Assess → Conclude
Skill 3: Teach → Assess → Conclude
Skill 4: Teach → Assess → Conclude
Skill 5: Teach → Assess → Conclude
Skill 6: Teach → Assess → Conclude
And we taught each of these skills until we had covered the entire curriculum for a grade level.
Sue’s grandmother lives on a farm. Ellen’s grandmother
lives in the city. Sue’s grandmother, who just turned 55,
phones Sue every month. Ellen’s grandmother, who is
also 55, sends Ellen e-mails several times a week. Both
grandmothers love their granddaughters.
How are Sue and Ellen’s grandmothers alike?
They both love their granddaughters
They both use e-mail
They both live on a farm
How are they different?
They live in different places
They have different color hair
They are different ages
Fast Forward to 2002
These specific skill tests have not gone away
Today’s standards are yesterday’s objectives or skills
The 2000s
Skill/standard 1: Teach → Assess → Conclude
Skill/standard 2: Teach → Assess → Conclude
Skill/standard 3: Teach → Assess → Conclude
Skill/standard 4: Teach → Assess → Conclude
Skill/standard 5: Teach → Assess → Conclude
Skill/standard 6: Teach → Assess → Conclude
And we taught each of these standards until we had covered the entire curriculum for a grade level.
A word about benchmark
assessments…
The world is filled with assessments that
provide useful information…
But are not worth teaching to
They are good thermometers or dipsticks
Not good curriculum
The ultimate assessment dilemma…
What do we do with all of these timed tests of
fine-grained skills:
Words correct per minute
Words recalled per minute
Letter sounds named per minute
Phonemes identified per minute
Scott Paris: Constrained versus unconstrained
skills
Pearson: Mastery constructs versus growth
constructs
Why they are so seductive
Mirror at least some of the components of the
NRP report
Correlate with lots of other assessments that have
the look and feel of real reading
Take advantage of the well-documented finding that speed metrics are almost always correlated with ability, especially verbal ability.
Example: alphabet knowledge
90% of the kids might be 90% accurate but…
They will be normally distributed in terms of LNPM (letter names per minute)
How to get a high correlation between a mastered skill and something else
[Chart contrasting Letter Name Fluency (LNPM) with Letter Name Accuracy]
The wider the distribution of scores, the greater the likelihood of obtaining a high correlation with a criterion
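The restriction-of-range point above can be illustrated with a small simulation (a sketch with invented numbers, not real data): a mastered skill sits at the ceiling with little variance, so it cannot correlate strongly with a criterion, while a speeded version of the same skill spreads students out.

```python
import random

random.seed(42)

n = 1000
ability = [random.gauss(0, 1) for _ in range(n)]

# Letter-name ACCURACY: a mastered skill, so most kids are at or near
# the ceiling of 1.0 (restricted range).
accuracy = [min(1.0, 1.02 + 0.05 * a + random.gauss(0, 0.02)) for a in ability]

# Letter-name FLUENCY (letter names per minute): normally distributed,
# with wide spread driven by the same underlying ability.
fluency = [40 + 10 * a + random.gauss(0, 5) for a in ability]

# Criterion: another reading measure driven by the same ability.
criterion = [a + random.gauss(0, 0.5) for a in ability]

def pearson_r(xs, ys):
    """Pearson product-moment correlation."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

r_accuracy = pearson_r(accuracy, criterion)
r_fluency = pearson_r(fluency, criterion)
print(f"accuracy vs. criterion: r = {r_accuracy:.2f}")
print(f"fluency  vs. criterion: r = {r_fluency:.2f}")
```

The fluency measure wins the correlation contest not because it captures more of the construct, but because its scores have room to vary.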
Face validity problem: What virtue is there in
doing things faster?
naming letters, sounds, words, ideas
What would you do differently if you knew
that Susie was faster than Ted at naming X, Y,
or Z???
Why I fear the use of these tests
They meet only one of the tests of validity:
criterion-related validity
correlate with other measures given at
the same time--concurrent validity
predict scores on other reading
assessments--predictive validity
Fail the test of curricular or face validity
They do not, on the face of it, look like
what we are teaching…especially the
speeded part
Unless, of course, we change
instruction to match the test
Really fail the test of consequential
validity
Weekly timed trials instruction
Confuses means and ends
Proxies don’t make good goals
The Achilles Heel: Consequential
Validity
Give DIBELS
Give Comprehension Test
Use results to craft instruction
Give DIBELS again
Give Comprehension Test
The emperor has no clothes
Collateral Damage
Tight link between instruction and assessment
Assess at a low level of challenge
Basic Skills Conspiracy
First you have to get all the words right and all
the facts straight before you can do the what
ifs and I wonder whats.
The bottom line on so many of these tests
Never send a test out to do a curriculum’s job!
Lesson #4: It is very difficult to
oust the incumbent
• Two mini-case studies
• Unconventional state
assessments
• Performance assessments
Valencia and Pearson (1987). Reading Assessment: Time for a Change. The Reading Teacher.
A set of contrasts between cognitively oriented views of
reading and prevailing practices in assessing reading circa
1986
New views of the reading process tell us that . . . / Yet when we assess reading comprehension, we . . .
New view: Prior knowledge is an important determinant of reading comprehension.
Yet we: Mask any relationship between prior knowledge and reading comprehension by using lots of short passages on lots of topics.
New view: A complete story or text has structural and topical integrity.
Yet we: Use short texts that seldom approximate the structural and topical integrity of an authentic text.
New view: Inference is an essential part of the process of comprehending units as small as sentences.
Yet we: Rely on literal comprehension test items.
New view: The diversity in prior knowledge across individuals as well as the varied causal relations in human experiences invites many possible inferences to fit a text or question.
Yet we: Use multiple-choice items with only one correct answer, even when many of the responses might, under certain conditions, be plausible.
New view: The ability to vary reading strategies to fit the text and the situation is one hallmark of an expert reader.
Yet we: Seldom assess how and when students vary the strategies they use during normal reading, studying, or when the going gets tough.
New view: The ability to synthesize information from various parts of the text and different texts is a hallmark of an expert reader.
Yet we: Rarely go beyond finding the main idea of a paragraph or passage.
New view: The ability to ask good questions of text, as well as to answer them, is a hallmark of an expert reader.
Yet we: Seldom ask students to create or select questions about a selection they may have just read.
New view: All aspects of a reader’s experience, including habits that arise from school and home, influence reading comprehension.
Yet we: Rarely view information on reading habits and attitudes as being as important as information about performance.
New view: Reading involves the orchestration of many skills that complement one another in a variety of ways.
Yet we: Use tests that fragment reading into isolated skills and report performance on each.
New view: Skilled readers are fluent; their word identification is sufficiently automatic to allow most cognitive resources to be used for comprehension.
Yet we: Rarely consider fluency as an index of skilled reading.
New view: Learning from text involves the restructuring, application, and flexible use of knowledge in new situations.
Yet we: Often ask readers to respond to the text’s declarative knowledge rather than to apply it to near and far transfer tasks.
Why did we take this stance?
Need a little mini-history of assessment to understand our motives
The Scene in the US in the 1970s and
early 1980s
Behavioral objectives
Mastery Learning
Norm-referenced assessments
Criterion referenced assessments
Curriculum-embedded assessments
Minimal competency tests: New Jersey
Statewide assessments: Michigan &
Minnesota
Cognitive perspectives claim that we
had
Paid too much attention to measurement
theory and
Not enough to reading theory
Authentic Texts
Select, not construct, texts for understanding
Can’t tinker with the text to rationalize items
and distractors
More than one right answer
How does Ronnie reveal his interest in Anne?
Ronnie cannot decide whether to join in the conversation.
Ronnie gives Anne his treasure, the green ribbon.
Ronnie gives Anne his soda.
Ronnie invites Anne to play baseball.
During the game, he catches a glimpse of the green ribbon
in her hand.
Rate all of the responses on some
scale of relevance
How does Ronnie reveal his interest in Anne?
(2)(1)(0) Ronnie cannot decide whether to join in the
conversation.
(2)(1)(0) Ronnie gives Anne his treasure, the green ribbon.
(2)(1)(0) Ronnie gives Anne his soda.
(2)(1)(0) Ronnie invites Anne to play baseball.
(2)(1)(0) During the game, he catches a glimpse of the
green ribbon in her hand.
Best predictor of retelling scores
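The rate-every-response format above can be scored with a simple rule. A hypothetical sketch (option labels shortened from the Ronnie item, and the max-minus-disagreement scoring rule is invented for illustration):

```python
# Relevance key for each response option: 2 = highly relevant,
# 1 = partly relevant, 0 = irrelevant.
key = {"join conversation": 0, "green ribbon": 2, "soda": 1,
       "baseball": 1, "glimpse of ribbon": 2}

# One student's ratings of the same options.
student = {"join conversation": 0, "green ribbon": 2, "soda": 0,
           "baseball": 1, "glimpse of ribbon": 1}

# Score = maximum possible minus total absolute disagreement with the key,
# so every option contributes information, not just one "right answer".
max_score = 2 * len(key)
score = max_score - sum(abs(key[o] - student[o]) for o in key)
print(score, "/", max_score)  # → 8 / 10
```

Because every option is scored, each item yields several data points instead of one, which is what makes the format attractive for information and reliability.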
Include
Prior knowledge
Metacognition
Habits, attitudes, and dispositions
Initiatives
IGAP in Illinois
MEAP in Michigan
Alternative assessment systems in lots of
states
Kentucky
Vermont
Maryland
Washington
CLAS
Some findings from the ill-fated IGAP
Comprehension plus PK, Metacognition,
Habits/Attitude
Factor analyses (Pearson et al., 1991) demonstrated three reliably separable factors
Metacognitive stances
habits/attitudes items
a combination of the comprehension and prior
knowledge items (could not separate them)
Fate
Went the way of all tests that challenge the
conventional wisdom
More than one right answer was NOT the right
answer
Intentionally validated for group decisions not
individual (as accountability changed…)
Parts of it were corruptible:
Not good to teach to (e.g. metacognitive items)
Lost opportunities
Selecting or rating answers really is a good idea
Maximizes information and reliability
Legacies
Authentic passages
Text analyses to determine importance
Item design process
The infusion of reading theory to stand alongside measurement theory
The Golden Years of the 90s?
A flying start in the late 1980s and early 1990s
International activity in Europe, Down Under,
North America
Developmental Rubrics
Performance Tasks
Portfolios of Various Sorts
Increase the use of constructed response items in
NRTs (including NAEP)
Late 1980s/early 1990s:
Portfolios
Performance Assessments
Make Assessment Look Like Instruction
Activities → from which we draw → Conclusions on standards 1–n
We engage in instructional activities, from which we collect evidence which permits us to draw conclusions about student growth or accomplishment on several dimensions (standards) of interest.
The complexity of performance assessment practices: one to many
Activity X → Standards 1, 2, 3, 4, 5
Any given activity may offer evidence for many standards, e.g., responding to a story.
The complexity of performance assessment practices: many to one
Activities 1, 2, 3, 4, 5 → Standard X
For any given standard, there are many activities from which we could gather relevant evidence about growth and accomplishment, e.g., reads fluently.
The complexity of performance assessment practices: many to many
Activities 1–5 ↔ Standards 1–5
• Any given artifact/activity can provide evidence for many standards
• Any given standard can be indexed by many different artifacts/activities
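The many-to-many structure above can be sketched as a pair of mappings (all activity and standard names are invented for illustration): each activity lists the standards it evidences, and inverting the map recovers, for each standard, the activities that document it.

```python
# Each classroom activity -> the standards it can supply evidence for.
evidence_for = {
    "respond to a story":    ["comprehension", "writing", "response to literature"],
    "oral reading record":   ["fluency", "word identification"],
    "research report":       ["comprehension", "writing", "use of sources"],
    "literature discussion": ["comprehension", "response to literature"],
}

# Invert the map: each standard -> the activities that index it.
activities_for = {}
for activity, standards in evidence_for.items():
    for standard in standards:
        activities_for.setdefault(standard, []).append(activity)

print(activities_for["comprehension"])
# several distinct activities index the same standard
```

The inversion makes the bookkeeping burden of performance assessment concrete: every artifact must be cross-indexed against every standard it might speak to.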
Representation of Self
Assessment can be something that someone or something else does to you.
Assessment can be something that you (learn to) do for yourself.
Post 1996: The Demise of Performance Assessment
A definite retreat from performance-based assessment as a wide-scale tool
Psychometric issues
Cost issues
Labor issues
Political issues
The Remains…
Still alive inside classrooms and schools
Hybrid assessments based on the NAEP
and other assessment models
multiple-choice
short answer
extended response
Is there hope for the future?
The stars may be aligned
Consensus on Comprehension
CCSS
PARCC/SBAC
Renaissance brewing?
Automated scoring
Integration
of the first order—with reading and writing and
language
of the second order—with the disciplines
Consensus on comprehension
Something like a “Kintschian” view that readers build successive models of meaning
Text base: What the text says
Situation model: What the text means
Kintschian Model
Text → (1) Locate and Recall → Text Base: what the text says (Reader as Decoder)
(2) Integrate and Interpret → Situation Model: what the text means (Reader as Meaning Maker)
(3) Knowledge Base / Experience: inside the head and out in the world
RAND REPORT
“The process of simultaneously extracting and constructing meaning through interaction and involvement with written language. We use the words extracting and constructing to emphasize both the importance and the insufficiency of the text as a determinant of reading comprehension.”
10 recurring standards for College and Career Readiness
Show up grade after grade
In more complex applications to more sophisticated texts
Across the disciplines of literature, science, and social studies
Affordances of the CCSS
1. An uplifting vision based on our best research on the
nature of reading comprehension
2. Focus on results rather than means
3. Integrated model of literacy
4. Reading standards are consistent with cognitive
theory
5. Elaborated theory of text complexity
6. Shared responsibility (text in subject matter learning)
for promoting literacy
7. Lots of meaty material in writing and language
standards
1. An Uplifting Vision: ELA CCSS
Students who meet the Standards readily undertake the close, attentive reading that is at the heart of understanding and enjoying complex works of literature.
They habitually perform the critical reading necessary to pick
carefully through the staggering amount of information available
today in print and digitally.
They actively seek the wide, deep, and thoughtful engagement with
high-quality literary and informational texts that builds knowledge,
enlarges experience, and broadens world views.
They reflexively demonstrate the cogent reasoning and use of
evidence essential to both private deliberation and responsible
citizenship in a democratic republic.
PARCC and SBAC: The National Consortia
A Performance Renaissance?
Some compelling trends on the horizon
CBAL and GISA from ETS (Tuesday, 8:15, Mardi Gras Ballroom FGH, Marriott)
ORCA from UConn (Tuesday, 2:15, Mardi Gras Ballroom FGH, Marriott)
SCALE–New York City
• GISA: Global and Integrated Scenario-Based Assessment
• CBAL: Cognitively-Based Assessment of, for and as Learning
• ORCA: Online Reading Comprehension Assessment
• SCALE: Stanford Center for Assessment, Learning and Equity
CBAL (ETS)
• Elegant conceptual organization: semiotic space, literate practices
• Lots of scaffolding across tasks
• Categorizing evidence for arguments
• MC version of scaffolding
• Student accepts more responsibility
• Integration of reading, writing, rhetoric, and discourse
• Lead-in tasks scaffold the writing and assess reading
• Critique on the way to production
SCALE–New York City
• Challenging
• Highly scaffolded
• Explicit criteria, openly shared
• Multiple primary sources
• Integration of reading, writing, disciplinary knowledge, rhetoric, and discourse (What do you think? What makes you think so?)
• Thoughtful learning progressions
• Simple probes can lead to deep learning
Simple but deep comprehension questions:
What does this document tell you about the causes
of the Spanish-American War?
What evidence supports your answer?
New Psychometric Tools
More elaborate design frameworks, integrated with cognitive accounts of the phenomena being tested
• Mislevy et al.’s Evidence-Centered Design
• Wilson et al.’s BEAR system
Advances in scaling
Automated scoring of essays and constructed responses
• From unbridled skepticism to hopeless enthusiasm
Two faces of integration
Reading-writing-language
Disciplinary grounding
So is there hope for the future?
Maybe+ (on a scale of No … Maybe … Yes)
Will we ever get it right?
What is missing
Learning progressions
Generalizability agenda
Disciplinary grounding
Beyond cognition to critical thinking and
critical literacy
Context
Text Task Scenarios
Built on the idea that reading really is determined by a reader interacting with a text (of some sort) by completing a task for a purpose in a context.
How can we model the variability that
the context brings to comprehension?
Level playing field
Standardize the stimuli and conditions of
assessment
Change the question from
How do folks stack up when I have maximized the
similarity of conditions?
To
Under what conditions can a given student succeed or
fail at a given task?
• Hearkens back to Feuerstein’s dynamic assessment notions
Knowledge
Interest
Purpose
Discipline
Text Complexity
Task Characteristics
Degrees of Scaffolding
Social supports
Another vision of computer adaptive
testing
Report student profiles as a function of the
levels of each relevant variable that put
him/her at different points along the scale.
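The profile idea above can be sketched as a small data structure (a hypothetical illustration; all variable names, levels, and success rates are invented): instead of one score under standardized conditions, report how a student performs as each contextual variable is varied.

```python
# Contextual variables and the levels at which the student was assessed.
conditions = {
    "knowledge":   ["low", "high"],
    "scaffolding": ["none", "heavy"],
    "complexity":  ["grade-level", "stretch"],
}

# Pretend observed success rates for one student under each condition level.
observed = {
    ("knowledge", "low"): 0.45,    ("knowledge", "high"): 0.85,
    ("scaffolding", "none"): 0.50, ("scaffolding", "heavy"): 0.80,
    ("complexity", "grade-level"): 0.75, ("complexity", "stretch"): 0.40,
}

# Assemble the per-variable profile the slide envisions reporting.
profile = {
    var: {level: observed[(var, level)] for level in levels}
    for var, levels in conditions.items()
}
for var, levels in profile.items():
    print(var, levels)
```

Such a profile answers "under what conditions can this student succeed?" rather than "how does this student rank under one standardized condition?"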
What would this approach mean for
task generalizability?
Seeking differentiation not generalization
Different underlying construct of “leveling the
playing field”
Not maximizing the standardization of relevant factors
across persons and occasions
Maximizing the optimization of relevant factors within
persons
• How do YOU perform when the stars are optimally aligned in your favor?
• How do YOU perform when one or more of those stars is/are less optimally aligned?
Return to the hard work on
assessment
Encouraged by recent funding of new century
assessments
Encouraged by reading for understanding assessment
grants in the US
Encouraged by grass roots efforts
Tests that take the high road
Focus on making and monitoring meaning
Focus on the role of reading in knowledge building and the
acquisition of disciplinary knowledge
Focus on critical reasoning and problem solving
Focus on representation of self.
The unfinished business from the 1990s
Will we ever get it right?
No … Maybe … Yes
(High stakes and low challenge)