Test Alignment

Report on the
CEFR / ACTFL / ILR & STANAG
“Alignment Conference”
June 30 - July 03, 2010
Leipzig, Germany
Julie J. Dubeau, M.A.
BILC Secretary
Varna, Bulgaria
October 14, 2010
CEFR / ACTFL / ILR / STANAG
“Alignment Conference”
• Goal(s) of the Conference
• Some Perspectives Presented
• Some Preliminary Questions
• Some Preliminary Conclusions
About the Conference
Taken from http://www.uni-leipzig.de/actflcefr2010
• The goal of the ACTFL / CEFR Alignment
Conference 2010 is to bring together some
45 leaders in the field from both Europe
and North America to explore a crosswalk
between the ACTFL Proficiency
Guidelines and the Common European
Framework of Reference for Languages
(CEFR) and to establish equivalencies on
theoretical and empirical grounds.
About the Conference (cont’d)
• Both sets of scales claim to measure the
same construct: proficiency. The
successful establishment of equivalencies
would support the validity of both scales.
• Problems in establishing equivalencies
would point to the need for further
research and development.
Further very ambitious goals included the following:
• to present and discuss:
• empirical studies on the validity and reliability of
tests based on either framework;
• theoretical studies of the construct validity of
either framework; and empirical studies
comparing both frameworks and/or tests based
on both frameworks;
• to present and critically discuss:
• standardized tests (test systems) based on
either framework; and for different target groups
(age, education, professional purposes, etc.);
Further very ambitious goals included the following (cont’d):
• to develop guidelines for developing tests that
can be rated according to both scales; and
• to develop guidelines for developing proficiency
tests, their administration, and evaluation.
• These goals will be accomplished by combining
general session presentations with break-out
discussion groups. Oh My!!
The ‘Alignment Conference’
• 4 parallel workshops
• Opening addresses
• Opening plenary presentations:
– It’s easier to malign tests than to align tests
• Ray Clifford
– The CEFR: An evolving framework of reference
• Nick Saville
See abstracts on website
The ‘Alignment Conference’
12 papers organized by topics:
• Topic 1: Conceptual issues in a crosswalk
• Topic 2: Technical issues across domains
• Topic 3: Research and empirical issues in a
crosswalk
• 3 breakout sessions, each focusing on one of the following:
purposes and benefits; skills and domains; or issues such as
intercultural competence, implications for younger
learners, and language policy and curricular issues.
• http://www.uni-leipzig.de/actflcefr2010/abstracts.html
Test Equivalence, Equating, and Linking:
The Issue of Validity
Prof. Olaf Bärenfänger
Universität Leipzig
Slides used with permission
Test Equating and Linking
1. Some Conceptual Clarifications
2. Validity as Core of Equivalence
3. Conclusions
Equating adjusts for differences in difficulty, not differences in
content. <...> in most cases, (different) tests clearly measure very
different content/constructs. We refer generically to a relationship
between scores on such tests as linking.
<...> it is virtually certain that score differences are attributable to
construct differences as well as to errors of measurement, either or
both of which could be quite large.
With equal force, however, the adequacy of the linking may be
highly suspect depending on the nature of the decisions made
based on the linking.
(Kolen & Brennan 2004, 2nd ed.: 423 f.)
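As a hedged illustration of what "adjusting for differences in difficulty" can mean in the simplest case, the sketch below applies linear (mean and standard deviation) equating in Python. The score lists and the function name linear_equate are hypothetical and are not taken from the slides or from Kolen & Brennan. The method only makes sense on the assumption that both forms measure the same construct, which is exactly what the quotation says cannot be taken for granted when different tests are linked.

```python
from statistics import mean, pstdev


def linear_equate(x, scores_x, scores_y):
    """Map a raw score x on form X onto the scale of form Y.

    Linear equating matches the means and standard deviations of the
    two score distributions: y = mu_Y + (sd_Y / sd_X) * (x - mu_X).
    """
    mu_x, mu_y = mean(scores_x), mean(scores_y)
    sd_x, sd_y = pstdev(scores_x), pstdev(scores_y)
    return mu_y + (sd_y / sd_x) * (x - mu_x)


# Hypothetical raw scores from two comparable groups of examinees.
form_x = [10, 12, 14, 16, 18]  # harder form: lower raw scores
form_y = [20, 23, 26, 29, 32]  # easier form: higher raw scores

# The mean of form X maps onto the mean of form Y.
equated_mean = linear_equate(14, form_x, form_y)
print(round(equated_mean, 2))
```

The adjustment is purely statistical. Nothing in it checks whether the two forms cover the same content, which is why the quotation reserves the looser term "linking" for tests built on different constructs.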
Olaf Bärenfänger
Does Linking Really Only Mean to Correlate Test
Scores?
Linking = comparing the validity of two different tests and adjusting
for difficulty
“Validity is an integrated evaluative judgement of the degree to
which empirical evidence and theoretical rationales support the
adequacy and appropriateness of inferences and actions based on
test scores or other modes of assessment.” (Messick 1989: 13)
“even for purposes of applied decision making, reliance on criterion
validity or content coverage is not enough.” (Messick 1989: 17)
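The question in the slide title can be illustrated numerically. The Python sketch below uses invented ratings (test_a and test_b are hypothetical, not data from the conference) to show that two sets of ratings can correlate perfectly while never assigning the same level to any examinee. It covers only the statistical side of the argument; the validity considerations in the Messick quotations go well beyond it.

```python
from statistics import mean


def pearson_r(a, b):
    """Pearson product-moment correlation of two equal-length lists."""
    mean_a, mean_b = mean(a), mean(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


# Hypothetical level ratings for the same five examinees on two tests.
test_a = [1, 2, 3, 4, 5]
test_b = [3, 4, 5, 6, 7]  # every examinee is rated two levels higher

r = pearson_r(test_a, test_b)

# Share of examinees who receive the identical level on both tests.
exact_agreement = sum(x == y for x, y in zip(test_a, test_b)) / len(test_a)

print(r, exact_agreement)
```

A high correlation therefore says that the two tests rank examinees alike, not that a given level on one scale corresponds to the same level on the other.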
Olaf Bärenfänger
Suggestions for a Linking Argument
Step 1: Gather and compare all kinds of evidential information
about the two tests, e.g.
• Constructs
• Internal validity (e.g. difficulty, discrimination, estimation of
reliability, SEM, factor analysis, qualitative analyses,
G theory, IRT)
• External validity (e.g. correlation/regression studies, anchoring,
test calibration with IRT, linking through experts’ judgements)
• Content relevance (quality of relation between test domain and
real life domain)
• Content representativeness (quantity of relation between test
domain and real life domain)
• Process analyses
Olaf Bärenfänger
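Two of the internal-validity statistics named in Step 1, reliability and SEM (standard error of measurement), can be sketched as follows. This is a minimal Python example under stated assumptions: the item scores are invented, Cronbach's alpha is used as one common reliability estimate (the slide does not say which estimate is meant), and the function names are hypothetical.

```python
from statistics import pstdev, pvariance


def cronbach_alpha(items):
    """Cronbach's alpha; items holds one list of examinee scores per item."""
    k = len(items)
    totals = [sum(person) for person in zip(*items)]
    item_variance_sum = sum(pvariance(item) for item in items)
    return (k / (k - 1)) * (1 - item_variance_sum / pvariance(totals))


def standard_error_of_measurement(total_scores, reliability):
    """SEM = SD of total scores * sqrt(1 - reliability)."""
    return pstdev(total_scores) * (1 - reliability) ** 0.5


# Hypothetical scores of five examinees on three items.
items = [
    [2, 3, 3, 4, 5],
    [1, 3, 4, 4, 5],
    [2, 2, 3, 5, 5],
]
totals = [sum(person) for person in zip(*items)]

alpha = cronbach_alpha(items)
sem_value = standard_error_of_measurement(totals, alpha)
print(round(alpha, 3), round(sem_value, 3))
```

Comparing such figures for two tests is only one strand of the evidence listed above; similar reliability does not by itself show that the tests measure the same construct.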
Facets of validity (rows: basis; columns: function of the test)
 | Test Interpretation | Test Use
Evidential Basis | Construct validity | Construct validity + Relevance/utility
Consequential Basis | Value implications | Social consequences
Some Conclusions
• Things are far more complicated than initially assumed.
• When the goal is to link two tests, we need to be aware that
linking is more than mere concurrent validity.
• Linking is essentially an issue of validity.
• One avenue might be to make use of an equivalence argument as
suggested.
• In order to pursue this goal, more collaboration between
researchers and test institutions is needed, as well as an
agreement on the details of an equivalence argument.
• There is probably still a long way to go before we have safe
crosswalks between different test systems.
Olaf Bärenfänger
CAN ACTFL/ILR AND CEFR BE
ALIGNED FOR SPEAKING
ASSESSMENT?
Pardee Lowe, Jr.
U.S. Gov. Interagency Language Roundtable
Slides used with permission
POSSIBILITIES
• COMPARE?
• RELATE?
• ALIGN?
• EQUATE?
• For the last two: Does one need the same
“construct”?
Pardee Lowe, Jr.
QUESTIONS?
• THE FOLLOWING DISCUSSES
SIMILARITIES AND DIFFERENCES
BETWEEN ACTFL/ILR AND CEFR
• PRESENTED ARE THOSE ASSESSMENT
FEATURES WHICH ALLOW ACTFL/ILR TO
BE USED FOR EXAMINATIONS IN
SPEAKING
• THE QUESTIONS TO CEFR ADEPTS:
– HOW MANY OF THESE FEATURES OCCUR
IN CEFR?
– IN WHAT WAY(S)?
Pardee Lowe, Jr.
SIMILARITIES
• WORK IN PROGRESS
• PROSE DESCRIPTIONS
• HIERARCHICAL
• CRITERION-REFERENCED
• CAN-DO STATEMENTS
• LEVELS VS BANDS
• PLUS LEVELS VS PLUS BANDS
Pardee Lowe, Jr.
CEFR BANDS
• HOW MANY BANDS ARE THERE?
• WHAT DOES “BAND” MEAN?
• BASKET ANALOGY?
• IS IT A RANGE?
• DOES IT HAVE HEIGHT? DEPTH?
• HOW ARE QUALITY AND QUANTITY ACCOUNTED
FOR WITHIN A BAND?
• HOW MANY TASKS IN EACH BAND?
• HOW ARE “BANDS” ASSIGNED?
– PARTICULARLY “PLUS BANDS”??
Pardee Lowe, Jr.
MAJOR PARAMETERS
FOR ACTFL/ILR & CEFR ALIGNMENT
PARAMETER | ACTFL | CEFR
RELATIONAL FRAME | FIXED | FLEXIBLE
RELATIONAL MODEL | NATIVE SPEAKER | MODEL LEARNER?
FEATURES | A CORE PER LEVEL | RICH NUMBER
BOUNDARIES | DELINEATED | BASKETS
Pardee Lowe, Jr.
QUESTIONS?
• HOW MANY OF THESE ACTFL/ILR
FEATURES OCCUR IN CEFR?
• ARE THEY EMPLOYED IN THE SAME
WAY?
• IF NOT, HOW WOULD ACTFL/ILR AND/OR
CEFR HAVE TO BE ALTERED TO ACHIEVE
ALIGNMENT?
Pardee Lowe, Jr.
Framing research to develop
guidelines for developing tests
that can be rated according to
both scales: The case of writing
Liz Hamp-Lyons
University of Nottingham, UK/University of Hong Kong
Slides used with permission
What the CEFR knows about
assessing high level English writing
Liz Hamp-Lyons
What ACTFL knows about
assessing high level English writing
“It must be noted that the Superior level
encompasses levels 3, 4, and 5 of the
ILR scale. However, the abilities at the
Superior level described in these
guidelines are baseline abilities for
performance at that level rather than a
complete description of the full range of
Superior.”
Liz Hamp-Lyons
ACTFL Superior Level writing (ILR
Levels 3 and above)
SUPERIOR
Writers at the Superior level are able to produce
most kinds of formal and informal correspondence,
complex summaries, precis, reports, and research
papers on a variety of practical, social, academic, or
professional topics treated both abstractly and
concretely. They use a variety of sentence
structures, syntax, and vocabulary to direct their
writing to specific audiences, and they demonstrate
an ability to alter style, tone, and format according to
the specific requirements of the discourse. These
writers demonstrate a strong awareness of writing
for the other and not for the self.
See p.24 of handout booklet for full description.
Liz Hamp-Lyons
What the ILR knows about high
level English writing
Writing 5 (Functionally Native Proficiency):
Has writing proficiency equal to that of a well
educated native. Without non-native errors of
structure, spelling, style or vocabulary can write and
edit both formal and informal correspondence,
official reports and documents, and professional/
educational articles including writing for special
purposes which might include legal, technical,
educational, literary and colloquial writing. In
addition to being clear, explicit and informative, the
writing and the ideas are also imaginative. The writer
employs a very wide range of stylistic devices.
Liz Hamp-Lyons
Are the two frameworks intended to
be, or indeed claimed to be,
equivalent?
They are stylistically different
They are strikingly different in length
ACTFL descriptors are a mix of ‘can-do’s’,
personal attributes (“they are able to…”),
text characteristics…
CEFR descriptors are superficially ‘can-do’s’
(“Can express him/herself with clarity and
precision relating to the addressee flexibly and
effectively.”) but in fact are far too vague to
be useable as they stand
Liz Hamp-Lyons
“The successful establishment of
equivalencies would support the
validity of both scales.”
OK…
We are not there yet
We may not be on the track to get
there
What CAN we do?
Liz Hamp-Lyons
Framing… to develop
guidelines for assessing writing
at the higher levels
Guidelines should
Begin from a construct
The construct needs
A theory of second language acquisition
A theory of learning
A theory of written language mastery trajectories
An empirical model of written language use at
different levels
An argument (a language use argument or an
assessment use argument) that will weave all these
dimensions together appropriately for different
audiences/clients
Liz Hamp-Lyons
Framing… to develop
guidelines for assessing writing
at the higher levels
Guidelines should
Continue by stipulating the need to
obtain perceptions/judgements from a
range of stakeholders
Language testing specialists
Teachers of the area under study
Students of the area under study
Score users
Test developers/item writers
Liz Hamp-Lyons
Framing… to develop
guidelines for assessing writing at
the higher levels
Guidelines should also
Propose a specification study comparing/
contrasting tasks, input texts, level descriptors
across all components of each test (for which
the “same construct” is being claimed)
Propose a text linguistic study such as corpus
analysis of tasks for difficulty specification; or
discourse analysis of what persons scoring
specified levels on each test can do within the
domain.
Liz Hamp-Lyons
Framing… to develop
guidelines for assessing writing
at the higher levels
Guidelines also need to
Stipulate the need for rigorous quantitative
studies to test hypotheses deriving from the
construct definition and qualitative elicitation
stages
Propose consequential validity processes to
check the impact of any emerging conclusions
on teachers, learners and other stakeholders
outside the testing and research enterprises
Liz Hamp-Lyons
A safe crosswalk???
Where we are TODAY!
Considerations
• Can BILC Study groups tackle these
complex and complicated issues?
• Is the issue about comparing/aligning
STANAG/CEFR scales?
• Can we reconcile scale orientations?
• Each test would have to be linked to a test
derived from the other scale!
• Generalizations cannot be made today.
Will they ever be reliable?
The Way Ahead?
• Another conference is being planned for
next year.
• Setting a research agenda will be critical
• BILC will monitor developments, report back,
and continue to encourage dialogue &
research among our constituents
• What can you do????
Communicative proficiency and linguistic development:
intersections between SLA and language testing research
• http://eurosla.org/monographs/EM01/EM01home.html
• It is an edited volume by the SLATE network (Second Language Acquisition
and Testing in Europe), a group of SLA researchers and language testers
which aims to explicate the CEFR in various languages informed by SLA
and language assessment research. http://www.slategroup.eu/
• The introductory chapter by Hulstijn et al., in particular, offers a useful
summary of the origins of the CEFR and raises a number of issues also
discussed at the “Leipzig” conference (e.g. the origins of the CEFR in the
Threshold series as well as in some North American scales including
ACTFL, FSI and ILR, see p. 14; the suitability of the CEFR for language
assessment, etc.).
• 2010 ACTFL Convention and World Languages Expo at the Hynes
Convention Center in Boston, MA, from November 19-21, 2010
(pre-convention workshops on Thursday, November 18). www.actfl.org
