
Ecologically valid uses for assessment at the
nexus between language, content, and task
John Norris & Barry O’Sullivan
Copyright © 2007, John M. Norris
Inevitable changes in L2 education
Sources > Implications
• Philosophy of education > Learning is experiential
• Cognitive psychology > Expertise = K+S+D (holistic development)
• SLA theory and research > Learner + context interact
• The language ‘crisis’ > Insufficient scope and achievement
• The value of L2 learning > Language enables, empowers
• The realities of education > Learning is local, needs differ
The ecologies of language education are changing…
How will language educators evolve?
Integration Innovation
Goal: Learning to …
use language to communicate particular meanings for particular purposes
LANGUAGE
CONTENT
Innovation: TBLT – CBI – LSP – LAC – etc.
TASK
The role of assessment…?
Possible contributions of assessment:
Objective measurement of language proficiency constructs
Quality assurance/accountability mechanism
Tool for policy implementation, gate-keeping, etc.
Understanding: Heuristic for awareness-raising and program illumination
Improving: Integral feedback component in curriculum-instruction-learning
How can assessment support innovative teaching and learning of
integrated content-language-task objectives?
What are the characteristics of assessment systems that
respond to the needs of language educators in this new ecology?
The language – content – task ‘problem’ in assessment
Language testers’ point of view
Content
• Meaningful stuff about which humans tend to want to talk
• Background, subject, situational, experiential knowledge
• Problematic because it comes in variable amounts and types, and it interacts with L2 use
• So, we need to control for its effects by careful selection, maximizing the generic, and through observation/analysis of bias

Language
• L2 knowledge, skills
• Grammatical, textual, functional, sociolinguistic
• Enables/disables talking about stuff for purposes
• Mediated by strategic competence
Communicative Language Ability

Task
• Devices for eliciting samples of L2 knowledge, skills
• Constellation of purposes, features, conditions, interlocutors in use domain
• Problematic: the more they resemble what we actually do, the less they can be trusted to tell us about L2 knowledge
• So, we have to sample a bunch of them to get at reliable estimates, and we have to keep them from becoming too ‘tasky’

Technical problem: How do we get at a maximally accurate measure of the language ability construct despite content and task (or other learning context factors)?
Characteristics of assessment
Emphasize…
But what about…
•Test ‘tasks’ doable in a single sitting
of an exam
>Complex tasks not authentically
completed in exam sitting/setting?
•Rating criteria based on models of
CLA
>Indigenous rating criteria used by
actual interlocutors?
•Generalization and extrapolation
about abilities beyond test context
>Task-specific abilities as realized in
context?
•Maximal control over nature of
performance elicited
>Open-ended, creative, unanticipated
performance?
•Norm-referenced (between learner)
comparisons
>Criterion- or individual-referenced
comparisons?
•Interpretations about language ability
constructs
>Interpretations about language-content-task abilities?
Language educators’ realities
Innovative language educators ask…
How do we place learners into integrated language+content+task curricula?
What do we provide feedback on (language form, content coverage, task
completion), and how can we do it effectively on different target tasks?
How do we gather maximally useful information—with respect to language and content and task—for doing specific things in our classes and programs?
How can we track learner development in language, content, and task, and what is the relationship between the three?
How should we determine weighting of language, content, and task abilities or
knowledge in assigning grades?
What do we target as outcomes of our programs—what learners really can do—and
what are the best ways of demonstrating them?
How do we encourage student learning of what we target to the levels they need?
How do we improve our classes and programs?
Functional problem
A fundamental tension
“… assessments can have many different functions.
What is appropriate for one assessment purpose
may be inappropriate for another…”
Council of Europe (2001), p. 180
“…the inferences we want to make are about
underlying ‘language ability’ or ‘capacity for language
use’ or ‘ability for use’…”
Bachman (2002), p. 454
Reconceptualizing the ‘problem’
Sum 1: If ‘language’ education is going to be about more than
‘communicative language ability’, then shouldn’t assessment follow suit?
Sum 2: Monolithic and prescriptive practices of language assessment belie
the realities of multiple assessment uses in education.
Sum 3: Treating assessment as a primarily technical problem of ‘good
measures’ probably won’t resolve the primarily functional needs of
language educators for useful assessments.
Developing useful assessments that do educational good
Assessments that do good…
What is the starting point in assessment development?
Measurement approach
• What’s the construct to be measured?
• How do we create reliable measures of that construct?
• How do we eliminate threats to the validity of interpretations about that construct?
• Even though we might assess different L2 constructs, there is one right way to do it.

Ecological approach
• Why are we assessing?
• How will assessment processes and outcomes be used by whom?
• How do we maximize the utility of assessments for specific users in specific educational settings?
• To meet the actual uses for assessment, there are lots of right ways to do it.
Intended uses for assessment
WHO? Test Users
WHAT? Test Information
WHY? Test Purposes
IMPACT? Test Consequences
At the center: INTENDED ASSESSMENT USE
Intended uses for assessment

WHO? Test Users: Who are the assessment users?
Learners, Teachers, Program administrators, Policy makers, Parents, Other programs

WHAT? Test Information: What do they need to know?
Task success? Content coverage? L2 knowledge? Accuracy/complexity/fluency development? Declarative knowledge? Performance ability? Capacity to learn? Motivation?

WHY? Test Purposes: What will they do with it?
Placement, Feedback, Curriculum development, Program evaluation, Articulation, Certification

IMPACT? Test Consequences: To what end?
Effective learning, Improved teaching, Well-articulated courses, Valued learning outcomes, Evolving programs & change, Enlightened education, Satisfied learners

At the center: INTENDED ASSESSMENT USE
Developing intentional assessments
Primary Intended Users, Assessment Consultants, and Stakeholders & Audiences
Negotiate & specify:
• priority uses
• methods
• analyses
• reporting
• constraints
ENABLING USE, EMPOWERING USERS
Why bother?
Prioritizes needs of assessment users in the specific ecology of an
educational setting
Shifts ownership and responsibility into the hands of the users
Rules out irrelevant concerns before they bog down development
Forces users to be clear about the interpretations they want to make
Illuminates gaps in educational planning and implementation;
demands content-language-task expertise from educators
Puts specific information into the hands of specific users in ways that
enable them to take action
Educationally relevant and useful assessments
Resolving the ‘problem’ in practice
What happens if we ignore use?
Common European Framework:
Language+Pluriculturalism+Anti-racism
Online Advanced French Course:
Advanced intercultural evaluation
Targeted learning outcomes:
Language: Language/Writing development
Content: Socio-cultural awareness
Task: Intercultural evaluation
Assessment realities
(writing feedback):
•Language development
•Socio-cultural awareness
•Intercultural evaluation
From: Starkey & Osler (2001)
Using Intended Use in situ
Context
•German at Georgetown University
•New curriculum: Developing Multiple Literacies
•Fully integrated Language & Content courses
•Task- & genre-based instruction
•Advanced L2 literacy targets
Using Intended Use in situ
“Taken together, these documents are intended
to guide not only the development and
implementation, but also the evaluation and
revision of all quizzes, tests, examinations,
written and oral performances, and other forms
of assessment which play an integral role in the
success of the GUGD’s educational efforts.”
From: Assessment policies in the GUGD (rev. August, 2002)
Intended use 1: Placement
Intended test use
Who: Faculty decision makers;
incoming students
What: Estimate of incoming
students’ curriculum-related
German knowledge/abilities;
capacity to benefit from courses
Why: Placement into curricular level
acknowledging abilities and
addressing learning needs
Impact: Efficient/effective teaching
and learning for learners grouped
by similar ability and need

Constraints
• Wide range of learner abilities
• Administration time (2 hrs.)
• Scoring time (same day)
• Decision-making efficiency
• Transportability (off-site administration)
• Migration to computer-based administration
• Language v. content v. task as basis for placement?
Intended use 1: Placement
Basis for assessment: Learner abilities to process texts (receptively and productively) selected as representative of critical transition points from one curricular level to the next. Focus on language ability in context. Content and task reflected in nature of texts selected (basis for elicitation), but not used as basis for interpretation.

Development: Identification and vetting of texts (aural and written) by teacher curriculum experts → investment in placement process, accuracy, impact on teaching and learning → increased awareness about learner abilities vis-à-vis curricular expectations.

[Placement process: LCT score + C-test score + RCT score + Background info form → Curriculum level recommendation → Adjudication → Communication]
Intended use 2: Writing
development and outcomes
Intended test use
Who: Faculty, instructors, curriculum
developers, learners
What: Representative samples of writing
performance abilities (task + L2 +
content) at the end of each curricular
level
Why: Understanding student
development & achievement of
targeted abilities for improving C&I
Impact: Feasible curricular expectations
supported by effective pedagogy

Constraints
• Explicitness of curricular expectations
• Availability/agreement on ‘prototypical’ performance tasks, content, L2
• Competing uses for assessment (feedback, tracking, etc.)
• End-of-semester timing
• Learner investment, understanding of assessment expectations
Intended use 2: Writing
development and outcomes
[Diagram: Teachers, organized in level groups, cycle through deliberation, development, analysis, and revision to produce prototypical performance writing tasks, level performance profiles, task assignment sheets, and sample student performances.]
Intended use 2: Writing
development and outcomes
[Diagram: Across the semester, Assignments 1–4 build toward a Prototypical Performance Writing Task. Each assignment follows a consistent assignment framework (task, content, language), reflects curricular level expectations, and is accompanied by explicit performance criteria.]
Intended use 2: Writing
development and outcomes
“Assessment in this kind of a context is, I would
almost say probably an indispensable aspect in
order to clarify any number of things. Because it
is in the discourse about assessment and how we
would do that that our knowledge became
articulated or the holes in that knowledge
became clearer to ourselves. Or the cover-ups
that we had engaged in were no longer possible if
we wanted to be honest with ourselves about it.”
Intended use 2: Writing
development and outcomes
Using Assessment for Curricular Change
• Forced the curriculum to become real
• Close specification of L2 progress within/across curricular levels
• Disambiguation of learning outcomes in terms of task, content,
language
• Curricular ‘map’ for use by teachers and learners (what happens
when?); basis for feedback on task, content, language
• Forged agreement between curricular levels on what can and
cannot be expected
• Basis for longitudinal developmental analysis (L2 in task with
content)
Why Bother to Rethink Assessment?
What we value is what we assess
By adopting intended use as the starting (and ending) point for assessment, we…
• increase the likelihood that assessments will be used and useful
• decrease misuse/abuse of assessment
• force educators to make explicit their assumptions about the relationship between language, content, task
• enable educators to gather empirical information for making decisions and taking actions relevant to curriculum, teaching, learning
• situate assessment practices within specific educational ecologies (localization), rather than situating education within generic ecologies of assessment (globalization).