
How ‘good’ are our speaking test tasks: implications of recent research findings
Barry O’Sullivan
Centre for Language Assessment Research (CLARe), Roehampton University
Focus of this talk
- Outline the basic premise of the paper
- Present the findings of four research studies
- Discuss the implications for task-based testing & research
Focus 1 – O’Sullivan
- Identified a series of variables likely to provoke ‘affective’ reactions to interlocutors in ‘direct’ test tasks
- Explored the impact on performance in a ‘direct’ test of the following variables:
  1. age, language level, personality, and sex, of both test taker and interlocutor
  2. acquaintanceship
- Also explored the impact of topic and gender in an ‘indirect’ test task
Focus 1 – Results
- Found a significant effect in each study that focused on a single variable
- Found significant interactions involving all of the variables explored, with a tendency towards complex, often three-way, interactions
- Found significant (though small) effects for the ‘indirect’ task where a question on a male-oriented topic was delivered by a male speaker
Focus 2 – Weir & Wu
- Looked at the parallel-form equivalence of 3 alternate forms of a semi-direct oral proficiency test comprising 3 tasks
- Argued that various kinds of evidence are needed to ensure true equivalence
- Presented quantitative and qualitative evidence of equivalence
Focus 2 – Results
- Found that different forms of test tasks can be shown to be equivalent from the quantitative perspective
- Demonstrated how qualitative evidence (rater judgements) can support or reject the claims made from the quantitative evidence

“The results show that without taking the necessary steps to control context variables affecting test difficulty, test quality may fluctuate over tasks in different test forms.” Weir & Wu (2006: 192)
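[Editor’s note: Weir & Wu’s own analyses are not reproduced here, but the short Python sketch below, run on invented scores, illustrates two common strands of quantitative equivalence evidence: a repeated-measures comparison of score distributions across alternate forms, and alternate-form correlations. All data and names are assumptions for illustration only.]

```python
# A rough sketch, with invented data, of quantitative parallel-form
# evidence: do alternate forms yield comparable scores for the same
# candidates? (Not Weir & Wu's actual analysis.)
import numpy as np
from scipy.stats import friedmanchisquare, pearsonr

rng = np.random.default_rng(0)

# Hypothetical ratings for 30 candidates on three alternate forms.
form_a = rng.normal(3.5, 0.6, 30)
form_b = form_a + rng.normal(0.0, 0.3, 30)  # near-parallel forms
form_c = form_a + rng.normal(0.0, 0.3, 30)

# Repeated-measures comparison across forms: a non-significant result
# is consistent with (though it does not prove) score equivalence.
stat, p = friedmanchisquare(form_a, form_b, form_c)
print(f"Friedman chi-square = {stat:.2f}, p = {p:.3f}")

# Alternate-form correlations as a second strand of evidence.
for label, other in (("A-B", form_b), ("A-C", form_c)):
    r, _ = pearsonr(form_a, other)
    print(f"r({label}) = {r:.2f}")
```

[As the slide stresses, a clean quantitative result of this kind still needs qualitative (rater-judgement) evidence before equivalence can be claimed.]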
Focus 3 – O’Sullivan, Weir & Horai
- Explored the impact on task performance of three variables (planning time; planning condition; response time)
- Suggested a methodology for ensuring the true equivalence of test tasks
- Focused on the individual long-turn task
Focus 3 – Evidence of Equivalence
- Examined tasks using a checklist (based on Skehan 1996)
- Identified 9 task versions, reduced to 8 tasks
- Ran pilot studies with learners, then a trial with 54 learners
- Quantitative evidence: reduced to 4 tasks
- Qualitative evidence: confirmed the 4 tasks
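[Editor’s note: the way three task variables span a space of candidate task versions can be made concrete with a short sketch. The variable levels below are hypothetical placeholders, not those used in the study, which identified 9 versions and trimmed from there.]

```python
# A minimal sketch of how three task variables generate candidate task
# versions; all levels below are assumed for illustration only.
from itertools import product

planning_time = ["none", "1 min", "3 min"]   # assumed levels
planning_condition = ["unguided", "guided"]  # assumed levels
response_time = ["1 min", "2 min"]           # assumed levels

versions = list(product(planning_time, planning_condition, response_time))
print(f"{len(versions)} candidate versions under these assumed levels")
for v in versions[:3]:
    print(v)
```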
Focus 4 – Horai
- Followed on from the study reported in Focus 3, adding proficiency level as an intervening variable
- Found significant differences in performance and in cognitive processing across the four tasks
- Supports the argument that task difficulty rests not in the task itself but in the interaction between the task and the individual’s ability (i.e. context & cognitive validity)
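[Editor’s note: a rough illustration of what such an interaction looks like statistically. The sketch below fits a task-by-proficiency model on invented data, not Horai’s; if difficulty rested in the task alone, the interaction term would be negligible.]

```python
# A rough sketch, on invented data, of a task-by-proficiency
# interaction in performance scores.
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
rows = []
for task in ["T1", "T2", "T3", "T4"]:
    for level in ["lower", "higher"]:
        base = {"T1": 3.0, "T2": 3.2, "T3": 3.4, "T4": 3.6}[task]
        boost = 0.8 if level == "higher" else 0.0
        # Hypothetical interaction: T4 is disproportionately hard
        # for lower-level learners.
        penalty = -0.5 if (task == "T4" and level == "lower") else 0.0
        for _ in range(25):
            rows.append({"task": task, "level": level,
                         "score": base + boost + penalty
                                  + rng.normal(0.0, 0.4)})

model = smf.ols("score ~ C(task) * C(level)",
                data=pd.DataFrame(rows)).fit()
# Inspect the C(task):C(level) row: a significant interaction means
# task difficulty changes with test-taker level.
print(sm.stats.anova_lm(model, typ=2))
```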
Observations
- Focus 1: learners’ affective reactions to their interlocutor (peer or examiner) can systematically affect performance
- Focus 2: it is possible to generate truly equivalent speaking tests, but there may be differences at the task level
- Focus 3: task equivalence can only be claimed where both quantitative and qualitative evidence is established
- Focus 4: task difficulty is not a constant (as is presumed in much assessment work) but changes with the level of the test taker
Implications for Task-Based Testing
1. The results from Study 1, together with the ‘negotiation of discourse’, argue against using interactive tasks in test events
2. The results of Studies 2 & 3 suggest that true alternate test and task forms are possible for monologic formats
3. The results from Study 4 imply that group-level comparisons based on task performance may be unstable
4. The results of all four imply that it may not be possible to develop truly equivalent versions of interactive test tasks
Implications for TBLT Research
- Have researchers taken either affect or equivalence into account? O’Sullivan (2000) & Wu (2005) present review tables that suggest the answer is NO
- Should they? YES
- When in particular? When their research relies on using two or more ‘similar’ tasks, and when they are exploring the language of interaction
References
Horai, Tomoko. forthcoming. Intra-task Comparison in Monologic Tasks in L2 Speaking Testing. PhD dissertation, Roehampton University.
Lumley, Tom & O’Sullivan, Barry. 2005. The Impact of Test Taker Characteristics on Speaking Test Task Performance. Language Testing, 22 (4): 415–437.
O’Sullivan, Barry. 2000. Exploring Gender and Oral Proficiency Interview Performance. System, 28 (3): 373–386.
O’Sullivan, Barry. 2002. Learner Acquaintanceship and Oral Proficiency Test Pair-Task Performance. Language Testing, 19 (3): 277–295.
O’Sullivan, Barry. forthcoming. Modelling Performance in Oral Language Testing. Frankfurt: Peter Lang. Based on the author’s PhD dissertation, University of Reading (2000).
O’Sullivan, Barry, Weir, Cyril & Horai, Tomoko. 2004. Exploring Difficulty in Speaking Tasks: an Intra-task Perspective. ESOL/The British Council/IDP Australia: IELTS Research Report.
Weir, Cyril & Wu, Jessica. 2006. Establishing Test Form and Individual Task Comparability – A Case Study of the GEPT Intermediate Spoken Performance Test. Language Testing, 23 (2): 167–197.
Weir, Cyril. 2004. Language Testing and Validity: an Evidence-based Approach. Oxford: Palgrave.
Wu, Jessica. 2005. Task Difficulty in Semi-direct Speaking Tests. Unpublished PhD dissertation, Roehampton University.
CONTACT
Dr Barry O’Sullivan
Director
Centre for Language Assessment Research (CLARe)
Digby Stuart College
Roehampton University
Roehampton Lane
London
SW15 5PU
United Kingdom
Tel: +44 (0)20 8392 3348
Fax: +44 (0)20 8392 3031
[email protected]