The Tortured History of Reading Comprehension Assessment
P. David Pearson, UC Berkeley Professor and Former Dean
Slides available at www.scienceandliteracy.org.
The guiding questions…
• Are there lessons from the past?
• Is there hope for the future?
• Will we ever get it right?

My predictions
• Are there lessons from the past? MAYBE… YES
• Is there hope for the future? MAYBE… MAYBE
• Will we ever get it right? MAYBE… NO

Why me? I have lived the tensions in comprehension assessment, but I am an unwitting and unwilling candidate for the task: comprehension assessment was on the way to more important goals.

Mea culpa
• Those whose work I omit
• Those whose work I DON'T omit

Why now?
• Post-NCLB uneasiness among practitioners that the code, as important as it is, may not be the point of reading
• New rhetorical moves: Deeper Learning, 21st Century Skills, Reading-Writing Connections
• Common Core Standards: read like a detective; write like a reporter
• Common Core Standards: the substance
• New R&D: Reading for Understanding
• New psychometric tools

Why now? More…
• National thirst for accountability requires impeccable measures (both conceptually and psychometrically)
• When the stakes are high, so too must be the standards
• Pleas of teachers desperate for useful tools: a comprehension analogue of running records of oral reading as indices of fluency and accuracy

The real need… Theoretically elegant, yes, but an even greater need for an everyday monitoring tool.

LESSONS FROM THE PAST: there are many; here are a few of my favorites.

Lesson #1: No matter how hard we try, we never "see" reading comprehension; we see only its artifacts.

If we never see the click of comprehension, what criterion do we adopt as our gold standard for determining the validity of the various markings we have to live with?
• Cognitive interviews (Illinois Goal Assessment Program; NAEP and most other assessments)
• The immediate apprehension of understanding: the sense of getting it!
• Self-ratings and think-alouds
• Criterion variables worth predicting

Lesson #2: No matter how novel you think your idea is, you can find a precedent for it that is at least a century old.

I know! Let's use open-ended performance items on our high-stakes state test!

Tidbits from the 1919 Wisconsin State High School Reading Exam:
• Write a few memory gems from your literature experiences in school.
• Name three novels you have read this year and give the plot of each.
• What is the significance of the "letter" in Hawthorne's The Scarlet Letter?
• Define these terms: poetic license, flashback, and simile.

So as long as we are forced to use MC formats, let's have the students pick more than one right answer!

A curious example from the early 1900s: "Every one of us, whatever our speculative opinion, knows better than he practices, and recognizes a better law than he obeys." Check two of the following statements with the same meaning as the quotation above.
• To know right is to do the right.
• Our speculative opinions determine our actions.
• Our deeds often fall short of the actions we approve.
• Our ideas are in advance of our everyday behavior.
(From Thurstone, undated, circa 1910)

So what if we use an error-detection paradigm? Chapman (1924): find the statements in Part 2 of the paragraph that don't fit the statements in Part 1…

I know! Let's do think-alouds to get at what students are really doing while they answer the questions!
Touton and Berry (1931) error analyses:
(a) failure to understand the question
(b) failure to isolate elements of "an involved statement" read in context
(c) failure to associate related elements in a context
(d) failure to grasp and retain ideas essential to understanding concepts
(e) failure to see the setting of the context as a whole
(f) other irrelevant answers

Lesson #3: Grain size really matters in reading assessment, even within comprehension assessment.

The scene in the US in the 1970s and early 1980s:
• Behavioral objectives
• Mastery learning
• Criterion-referenced assessments
• Curriculum-embedded assessments
• Minimal competency tests: New Jersey
• Statewide assessments: Michigan & Minnesota

Historical relationships between instruction and assessment. The 1970s skills-management mentality: teach a skill, assess it for mastery, reteach it if necessary, and then go on to the next skill. (Diagram: for each skill, Teach → Assess → Conclude, then on to the next skill.) The foundation was Benjamin Bloom's ideas of mastery learning. And we taught each of these skills until we had covered the entire curriculum for a grade level.

A sample item: Sue's grandmother lives on a farm. Ellen's grandmother lives in the city. Sue's grandmother, who just turned 55, phones Sue every month. Ellen's grandmother, who is also 55, sends Ellen e-mails several times a week. Both grandmothers love their granddaughters.
How are Sue and Ellen's grandmothers alike?
• They both love their granddaughters
• They both use e-mail
• They both live on a farm
How are they different?
• They live in different places
• They have different color hair
• They are different ages

Fast forward to 2002: these specific skill tests have not gone away. Today's standards are yesterday's objectives or skills. (Diagram: for each skill/standard, Teach → Assess → Conclude.) And we taught each of these standards until we had covered the entire curriculum for a grade level.

A word about benchmark assessments… The world is filled with assessments that provide useful information but are not worth teaching to. They are good thermometers or dipsticks, not good curriculum.

The ultimate assessment dilemma… What do we do with all of these timed tests of fine-grained skills:
• Words correct per minute
• Words recalled per minute
• Letter sounds named per minute
• Phonemes identified per minute
Scott Paris: constrained versus unconstrained skills. Pearson: mastery constructs versus growth constructs.

Why they are so seductive:
• They mirror at least some of the components of the NRP report.
• They correlate with lots of other assessments that have the look and feel of real reading.
• They take advantage of the well-documented finding that speed metrics are almost always correlated with ability, especially verbal ability.

Example: alphabet knowledge. 90% of the kids might be 90% accurate, but they will be normally distributed in terms of letter names per minute (LNPM). That is how to get a high correlation between a mastered skill and something else: use Letter Name Fluency (LNPM) rather than Letter Name Accuracy. The wider the distribution of scores, the greater the likelihood of obtaining a high correlation with a criterion.

Face validity problem: What virtue is there in doing things faster (naming letters, sounds, words, ideas)? What would you do differently if you knew that Susie was faster than Ted at naming X, Y, or Z?
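The statistical point here (a mastered skill piles up at the ceiling, while a speeded metric stays spread out and so correlates with ability) can be illustrated with a small simulation. This is a sketch under invented parameters; none of the distributions or coefficients come from the talk or from real LNPM data.

```python
# Illustrative simulation (hypothetical numbers): why a speeded metric can
# correlate highly with ability even when accuracy is at ceiling for nearly
# everyone, as with letter-name accuracy vs. letter names per minute.
import random
import statistics

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

random.seed(1)
n = 1000
ability = [random.gauss(0, 1) for _ in range(n)]

# Accuracy is a mastered skill: most students hit the 100% ceiling,
# so its distribution is severely compressed.
accuracy = [min(1.0, 1.02 + 0.03 * a + random.gauss(0, 0.02)) for a in ability]

# Speed stays normally distributed and tracks ability.
speed = [40 + 10 * a + random.gauss(0, 5) for a in ability]

r_acc = pearson_r(accuracy, ability)
r_speed = pearson_r(speed, ability)
assert r_speed > r_acc  # the wider-spread metric correlates more strongly
print(f"accuracy-ability r = {r_acc:.2f}, speed-ability r = {r_speed:.2f}")
```

The ceiling clips most of the variance out of the accuracy scores, which is exactly the range-restriction effect that makes the speeded version look like the "better" predictor.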
Why I fear the use of these tests:
• They meet only one of the tests of validity: criterion-related validity. They correlate with other measures given at the same time (concurrent validity) and predict scores on other reading assessments (predictive validity).
• They fail the test of curricular or face validity. They do not, on the face of it, look like what we are teaching, especially the speeded part (unless, of course, we change instruction to match the test).
• They really fail the test of consequential validity: weekly timed-trials instruction confuses means and ends. Proxies don't make good goals.

The Achilles heel: consequential validity. (Diagram: Give DIBELS → Give comprehension test → Use results to craft instruction → Give DIBELS again → Give comprehension test.) The emperor has no clothes.

Collateral damage: a tight link between instruction and assessment means we assess at a low level of challenge. The Basic Skills Conspiracy: first you have to get all the words right and all the facts straight before you can do the "what ifs" and "I wonder whats." The bottom line on so many of these tests: never send a test out to do a curriculum's job!

Lesson #4: It is very difficult to oust the incumbent.
• Two mini-case studies
• Unconventional state assessments
• Performance assessments

Valencia and Pearson (1987), "Reading Assessment: Time for a Change," in The Reading Teacher: a set of contrasts between cognitively oriented views of reading and prevailing practices in assessing reading circa 1986.

New views of the reading process tell us that… / Yet when we assess reading comprehension, we…
• Prior knowledge is an important determinant of reading comprehension. / Mask any relationship between prior knowledge and reading comprehension by using lots of short passages on lots of topics.
• A complete story or text has structural and topical integrity. / Use short texts that seldom approximate the structural and topical integrity of an authentic text.
• Inference is an essential part of the process of comprehending units as small as sentences. / Rely on literal comprehension test items.
• The diversity in prior knowledge across individuals, as well as the varied causal relations in human experiences, invites many possible inferences to fit a text or question. / Use multiple-choice items with only one correct answer, even when many of the responses might, under certain conditions, be plausible.
• The ability to vary reading strategies to fit the text and the situation is one hallmark of an expert reader. / Seldom assess how and when students vary the strategies they use during normal reading, studying, or when the going gets tough.
• The ability to synthesize information from various parts of the text and different texts is a hallmark of an expert reader. / Rarely go beyond finding the main idea of a paragraph or passage.
• The ability to ask good questions of text, as well as to answer them, is a hallmark of an expert reader. / Seldom ask students to create or select questions about a selection they may have just read.
• All aspects of a reader's experience, including habits that arise from school and home, influence reading comprehension. / Rarely view information on reading habits and attitudes as being as important as information about performance.
• Reading involves the orchestration of many skills that complement one another in a variety of ways. / Use tests that fragment reading into isolated skills and report performance on each.
• Skilled readers are fluent; their word identification is sufficiently automatic to allow most cognitive resources to be used for comprehension. / Rarely consider fluency as an index of skilled reading.
• Learning from text involves the restructuring, application, and flexible use of knowledge in new situations. / Often ask readers to respond to the text's declarative knowledge rather than to apply it to near and far transfer tasks.
Why did we take this stance? You need a little mini-history of assessment to understand our motives.

The scene in the US in the 1970s and early 1980s:
• Behavioral objectives
• Mastery learning
• Norm-referenced assessments
• Criterion-referenced assessments
• Curriculum-embedded assessments
• Minimal competency tests: New Jersey
• Statewide assessments: Michigan & Minnesota

Cognitive perspectives claimed that we had paid too much attention to measurement theory and not enough to reading theory.

Authentic texts: select, not construct, texts for understanding. You can't tinker with the text to rationalize items and distractors.

More than one right answer: How does Ronnie reveal his interest in Anne?
• Ronnie cannot decide whether to join in the conversation.
• Ronnie gives Anne his treasure, the green ribbon.
• Ronnie gives Anne his soda.
• Ronnie invites Anne to play baseball.
• During the game, he catches a glimpse of the green ribbon in her hand.

Rate all of the responses on some scale of relevance: How does Ronnie reveal his interest in Anne?
(2)(1)(0) Ronnie cannot decide whether to join in the conversation.
(2)(1)(0) Ronnie gives Anne his treasure, the green ribbon.
(2)(1)(0) Ronnie gives Anne his soda.
(2)(1)(0) Ronnie invites Anne to play baseball.
(2)(1)(0) During the game, he catches a glimpse of the green ribbon in her hand.
This rated-response score was the best predictor of retelling scores.

Also include prior knowledge, metacognition, and habits, attitudes, and dispositions.

Initiatives: IGAP in Illinois; MEAP in Michigan; alternative assessment systems in lots of states (Kentucky, Vermont, Maryland, Washington, CLAS).

Some findings from the ill-fated IGAP (comprehension plus prior knowledge, metacognition, habits/attitude): factor analyses (Pearson et al., 1991) demonstrated three reliably separable factors:
• metacognitive stances
• habits/attitudes items
• a combination of the comprehension and prior knowledge items (the two could not be separated)

Fate: it went the way of all tests that challenge the conventional wisdom. More than one right answer was NOT the right answer. It was intentionally validated for group decisions, not individual ones (as accountability changed…). Parts of it were corruptible: not good to teach to (e.g., the metacognitive items).

Lost opportunities: selecting or rating answers really is a good idea; it maximizes information and reliability.

Legacies:
• Authentic passages
• Text analyses to determine importance
• The item design process
• The infusion of reading theory to stand alongside measurement theory

The Golden Years of the 90s? A flying start in the late 1980s and early 1990s, with international activity in Europe, Down Under, and North America:
• Developmental rubrics
• Performance tasks
• Portfolios of various sorts
• Increased use of constructed-response items in NRTs (including NAEP)

Late 1980s/early 1990s: portfolios and performance assessments make assessment look like instruction. We engage in instructional activities, from which we collect evidence, which permits us to draw conclusions about student growth or accomplishment on several dimensions (standards 1–n) of interest.
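The rate-every-response format above (the Ronnie item, scored (2)(1)(0) per option) can be sketched as a simple scorer. The option labels, key weights, and scoring rule below are hypothetical illustrations, not the actual IGAP key; the point is that rating every option yields more score levels per item than a single right answer.

```python
# Sketch of scoring a "rate every response" item. The relevance key
# (2 = highly relevant, 1 = mildly relevant, 0 = irrelevant) is an
# invented example, not the real test key.
key = {
    "joins conversation": 0,
    "gives green ribbon": 2,
    "gives soda": 1,
    "invites to baseball": 1,
    "glimpses ribbon during game": 2,
}

def score(ratings):
    """Award a point for each option the student rates as the key does."""
    return sum(1 for opt, r in ratings.items() if key.get(opt) == r)

student = {
    "joins conversation": 0,
    "gives green ribbon": 2,
    "gives soda": 2,          # over-rates a mildly relevant option
    "invites to baseball": 1,
    "glimpses ribbon during game": 2,
}
print(score(student), "of", len(key))  # 4 of 5
```

A conventional multiple-choice item produces only two score levels (right/wrong); this item produces six (0 through 5), which is one way to see the "maximizes information and reliability" claim.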
The complexity of performance assessment practices, one to many: any given activity (e.g., responding to a story) may offer evidence for many standards. (Diagram: Activity X → Standards 1 through 5.)

The complexity of performance assessment practices, many to one: for any given standard (e.g., reads fluently), there are many activities from which we could gather relevant evidence about growth and accomplishment. (Diagram: Activities 1 through 5 → Standard X.)

The complexity of performance assessment practices, many to many:
• Any given artifact/activity can provide evidence for many standards.
• Any given standard can be indexed by many different artifacts/activities.

Representation of self:
• Assessment can be something that someone or something else does to you.
• Assessment can be something that you (learn to) do for yourself.

Post-1996: the demise of performance assessment. A definite retreat from performance-based assessment as a wide-scale tool:
• Psychometric issues
• Cost issues
• Labor issues
• Political issues

The remains… Still alive inside classrooms and schools: hybrid assessments based on the NAEP and other assessment models (multiple choice, short answer, extended response).

Is there hope for the future? The stars may be aligned:
• Consensus on comprehension
• CCSS
• PARCC/SBAC
• Renaissance brewing?
• Automated scoring
• Integration of the first order (with reading, writing, and language) and of the second order (with the disciplines)

Consensus on comprehension: something like a "Kintschian" view that readers build successive models of meaning. Text base: what the text says. Situation model: what the text means. (Diagram of the Kintschian model: (1) the reader as decoder locates and recalls, building the text base from the text: what it says, inside the head; (2) the reader as meaning maker integrates and interprets, building the situation model: what it means, out in the world; (3) both draw on the knowledge base and experience.)

The RAND Report: "The process of simultaneously extracting and constructing meaning through interaction and involvement with written language. We use the words extracting and constructing to emphasize both the importance and the insufficiency of the text as a determinant of reading comprehension."

10 recurring standards for College and Career Readiness: they show up grade after grade, in more complex applications to more sophisticated texts, across the disciplines of literature, science, and social studies.

Affordances of the CCSS:
1. An uplifting vision based on our best research on the nature of reading comprehension
2. Focus on results rather than means
3. Integrated model of literacy
4. Reading standards are consistent with cognitive theory
5. Elaborated theory of text complexity
6. Shared responsibility (text in subject-matter learning) for promoting literacy
7. Lots of meaty material in writing and language standards

1. An uplifting vision: ELA CCSS. Students who meet the Standards readily undertake the close, attentive reading that is at the heart of understanding and enjoying complex works of literature. They habitually perform the critical reading necessary to pick carefully through the staggering amount of information available today in print and digitally. They actively seek the wide, deep, and thoughtful engagement with high-quality literary and informational texts that builds knowledge, enlarges experience, and broadens world views.
They reflexively demonstrate the cogent reasoning and use of evidence essential to both private deliberation and responsible citizenship in a democratic republic.

PARCC and SBAC: the national consortia. A performance renaissance? Some compelling trends on the horizon:
• GISA (Global and Integrated Scenario-Based Assessment) and CBAL (Cognitively Based Assessment of, for, and as Learning) from ETS
• ORCA (Online Reading Comprehension Assessment) from UConn
• SCALE (Stanford Center for Assessment, Learning and Equity) in New York City
(Conference sessions: CBAL Tuesday 8:15 and ORCA Tuesday 2:15, Mardi Gras Ballroom FGH, Marriott.)

CBAL (ETS):
• Elegant conceptual organization: semiotic space, literate practices
• Lots of scaffolding across tasks
• Categorizing evidence for arguments
• An MC version of scaffolding
• The student accepts more responsibility
• Integration of reading, writing, rhetoric, and discourse: lead-in tasks scaffold the writing and assess reading; critique on the way to production

SCALE-New York City:
• Challenging
• Highly scaffolded
• Explicit criteria openly shared
• Multiple primary sources
• Integration of reading, writing, disciplinary knowledge, rhetoric, and discourse: What do you think? What makes you think so?
• Thoughtful learning progressions
• Simple probes can lead to deep learning. Simple but deep comprehension questions: What does this document tell you about the causes of the Spanish-American War? What evidence supports your answer?

New psychometric tools:
• Mislevy et al.'s Evidence-Centered Design
• Wilson et al.'s BEAR system
• Advances in scaling
• More elaborate design frameworks, integrated with cognitive accounts of the phenomena being tested
• Automated scoring of essays and constructed responses

From unbridled skepticism to hopeless enthusiasm.

Two faces of integration: reading-writing-language, and disciplinary grounding.

So is there hope for the future?
Maybe… no? Maybe… yes?

Will we ever get it right? What is missing:
• Learning progressions
• A generalizability agenda
• Disciplinary grounding
• Beyond cognition to critical thinking and critical literacy
• Context, text, task

Scenarios: built on the idea that reading really is determined by a reader interacting with a text (of some sort) by completing a task for a purpose in a context. How can we model the variability that the context brings to comprehension?

Level playing field: standardize the stimuli and conditions of assessment. Change the question from "How do folks stack up when I have maximized the similarity of conditions?" to "Under what conditions can a given student succeed or fail at a given task?" This hearkens back to Feuerstein's dynamic assessment notions. The relevant conditions:
• Knowledge
• Interest
• Purpose
• Discipline
• Text complexity
• Task characteristics
• Degrees of scaffolding
• Social supports

Another vision of computer-adaptive testing: report student profiles as a function of the levels of each relevant variable that put him/her at different points along the scale. What would this approach mean for task generalizability? Seeking differentiation, not generalization: a different underlying construct of "leveling the playing field." Not maximizing the standardization of relevant factors across persons and occasions, but maximizing the optimization of relevant factors within persons:
• How do YOU perform when the stars are optimally aligned in your favor?
• How do YOU perform when one or more of those stars is/are less optimally aligned?

Return to the hard work on assessment:
• Encouraged by recent funding of new-century assessments
• Encouraged by Reading for Understanding assessment grants in the US
• Encouraged by grass-roots efforts

Tests that take the high road:
• Focus on making and monitoring meaning
• Focus on the role of reading in knowledge building and the acquisition of disciplinary knowledge
• Focus on critical reasoning and problem solving
• Focus on representation of self
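The profile-reporting idea above can be sketched as a tiny aggregation over assessment conditions. The factor names, levels, and scores below are all hypothetical; the sketch only shows the shape of reporting performance per condition level rather than as one standardized score.

```python
# Sketch (invented data): report a student's performance as a profile over
# assessment conditions ("under what conditions can this student succeed?")
# instead of a single score from standardized conditions.
from statistics import mean

# Each observation records the conditions of a task attempt plus a score.
observations = [
    {"text_complexity": "easy", "scaffolding": "high", "score": 0.9},
    {"text_complexity": "easy", "scaffolding": "low",  "score": 0.8},
    {"text_complexity": "hard", "scaffolding": "high", "score": 0.7},
    {"text_complexity": "hard", "scaffolding": "low",  "score": 0.4},
]

def profile(obs, factor):
    """Mean score at each level of one condition variable."""
    levels = {}
    for o in obs:
        levels.setdefault(o[factor], []).append(o["score"])
    return {level: mean(scores) for level, scores in levels.items()}

print(profile(observations, "scaffolding"))
# e.g., shows the student succeeding on hard text when scaffolding is high
```

The same observations can be profiled along any of the condition variables (knowledge, interest, text complexity, scaffolding, and so on), which is the differentiation-not-generalization move the slide describes.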
The unfinished business from the 1990s.

Will we ever get it right? No? Maybe? Yes? High stakes and low challenge…