
Processing Corpus-derived
Multi-unit Sequences by L2
English Learners
Fei Fei
Second Language Studies Program
Michigan State University
May 19, Beijing
1
Purpose of the present study
Formulaic language use has long been one of the
research foci in the study of second language
acquisition. For L2 learners of intermediate and
advanced proficiency, formulaic language was the
biggest stumbling block to sounding nativelike (Wray,
2002).
However, most studies on formulaic sequences focus
on textual and descriptive aspects. Few studies
investigate multi-unit sequence processing among L2
English learners. Even fewer explore individual factors
in multi-unit sequence processing.
2
What is a formulaic sequence?
• Formulaic sequences: stored and retrieved
holistically from memory at the time of use.
• Issues are
1. Compositionality (e.g., Howarth, 1998; Wray, 2002)
2. Representation and production (e.g., Sinclair, 1991;
N. Ellis, 1996)
3. Development in L2 (e.g., Wong-Fillmore, 1976)
3
Why corpus-derived multi-unit sequences?
Word strings or lexical bundles generated on the basis of frequency may not be
stored holistically in the mind, and formulaic sequences that are stored as a whole
may not be identified by a given corpus analysis.
However, Wray (2002, p. 25) described “frequency as a salient,
perhaps even a determining, factor in the identification of formulaic
sequences.”
Numerous studies of formulaic sequences are based on corpus frequency
(e.g., Sinclair & Renouf, 1988; DeCock, Granger, Leech & McEnery, 1998;
Moon, 1998; Hunston & Francis, 2000).
The more often a word string is needed, the more likely it is to be stored in
prefabricated form to save processing effort. Once it is stored, the more
likely it is to be the preferred choice at the time of use.
4
What is a corpus-derived MUS?
In short, corpus-derived multi-word sequences
• are based on corpus frequency;
• may not be psycholinguistically valid;
• are either fully fixed in form or semi-preconstructed
phrases;
• are a subset of formulaic sequences.
5
Target multi-unit sequences
• Schmitt et al. (2004):
Longman Grammar of Spoken and Written English
Lexical Phrases and Language Teaching
Hyland’s list
BNC (British National Corpus)
CANCODE (Cambridge and Nottingham Corpus of
Discourse in English)
MICASE (Michigan Corpus of Academic Spoken English)
• Biber (2004): the T2K-SWAL Corpus (TOEFL 2000
Spoken and Written Academic Language Corpus)
• ANC: American National Corpus
http://americannationalcorpus.org/frequency.html
6
Schmitt et al.’s (2004) study on
processing MUS: textual attributes
• Frequency
• Length
• Transparency in terms of meaning and
function
7
Individual variables in processing multi-unit sequences: proficiency
• Research (Hinger & Spottl, 2002; Spottl & McCarthy, 2003, 2004;
Schmitt et al., 2004) indicated that vocabulary size and language
proficiency were two factors in investigating cross-linguistic lexical
operations.
• Spottl & McCarthy (2004), in their cross-linguistic study of
formulaic sequences, argued that without a certain level of general
language proficiency, noticing did not even take place, and word
strings were completely ignored, or simply avoided by learners. L2
language proficiency was defined as scores on a proficiency test
(upper intermediate and advanced level).
• In Schmitt et al.’s (2004) study, the highest-level non-native
speakers demonstrated mostly native-like
performance.
8
Individual variables in processing MUS:
working memory
• Working memory can be divided into two main
components: one is phonological short-term memory
(STM), and the other is storage and processing
capacity, referred to as the Central Executive (CE).
• Previous studies showed that WM can affect:
1. L2 syntactic processing and development (e.g., Ellis &
Sinclair, 1996; Ellis, 2001; Juffs, 2004);
2. L2 lexical processing and development (e.g., French,
2003; Papagno & Vallar, 1995);
3. L2 proficiency and aptitude (e.g., Kroll, Michael,
Tokowicz, & Dufour, 2002; Payne & Whitney, 2002;
Service & Kohonen, 1995).
9
Individual variables in processing MUS:
working memory
• Myles et al. (1999) found that STM capacity can predict
the ability to chunk. “Chunking”, in their study, was
defined as the ability to remember set phrases in L2
and later use them appropriately.
• Roberts and Gibson (2001) found high correlations
between sentence memory and complex span, and between
sentence memory and N-back span. It was argued that
memory for sentences was not simply a result of
linguistic experience; rather, it was likely that an
independent working memory component contributed
to participants’ performance on sentence memory.
10
In sum,
The present study seeks to test the role
of proficiency and WM in L2 English
learners’ processing of high frequency
multi-unit sequences. Influences of
textual attributes of MUS are also
addressed. The study may contribute to
explaining the variances in L2 English
learners’ formulaic language use.
11
Research questions
• What is the relationship between
proficiency, WM and participants’
processing of MUS?
• Do textual attributes of MUS affect how L2
learners process them?
• What are the linguistic features of learners’
reproduction of MUS?
12
Participants
• Thirty-two adult L2 English learners participated in the
present study.
• They were graduate students recruited from a wide
range of disciplines at a large Midwestern university
in the United States. Their reported TOEFL scores ranged from
570 to 650.
• Participants' ages ranged from 21 to 38, with 10-14
years of formal English learning experience.
• All participants were native speakers of Chinese and
had been living in the United States for less than 2
years.
13
Measuring the variables
• The Elicited Imitation (EI) test is used as a measure of learners’
knowledge of precise grammatical factors (e.g. Hamayan et al.,
1977; Gallimore and Tharp, 1981; Munnich et al., 1994), L2
competence (Baddeley et al., 1998; N. Ellis, 2001), and implicit
knowledge (Erlam, 2006).
• The utterance elicited is argued to reflect the degree to which a
test taker is able to assimilate the stimulus into an internal
grammar (Munnich et al., 1994).
• “The basic idea is that if the stretches of language are long
enough, it overloads working memory, and the person is forced to
reconstruct the content of the dictation via their language
resources, rather than repeating the dictation back from rote
memory. One of those language resources is the inventory of
formulaic sequences stored in memory.” (Schmitt et al., 2004)
14
Measuring the variables
• The Elicited Imitation (EI) test is available at
http://distancelearning.llc.msu.edu/research/chunks/
with assigned ID and password
• Two factors in designing an EI test: sentence length (Bley-Vroman
and Chaudron, 1994) and time pressure (R. Ellis, 2005)
• There were two tasks in the EI test
Task 1 was a passage adapted from Schmitt et al.'s (2004) study,
which contained 25 target multi-word sequences.
Task 2 included 18 target multi-word sequences derived from the
American National Corpus and the T2K-SWAL Corpus. They were
embedded into 18 single sentences.
• Scoring: complete reproduction = 2 points
attempted reproduction with missing lexis = 1 point
missing reproduction = 0 points
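The scoring scheme above can be sketched in a few lines of Python. This is only an illustration: the category labels (`complete`, `partial`, `missing`) and the function name are assumptions introduced here, and deciding which category a reproduction falls into remains a human rater's judgment.

```python
# Points per rating category, taken from the scoring scheme on this slide.
# The category labels themselves are hypothetical names chosen for this sketch.
POINTS = {"complete": 2, "partial": 1, "missing": 0}

def ei_score(ratings):
    """Sum the points for one participant's rated reproductions."""
    return sum(POINTS[r] for r in ratings)

# 25 target sequences in Task 1 plus 18 in Task 2, each worth at most 2 points.
MAX_SCORE = (25 + 18) * POINTS["complete"]
```

With 43 target sequences in total, the maximum possible score under this scheme is 86.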
15
Measuring the variables
• The Working Memory test included
1. a reverse digit span task (15 items)
2. a word span task (15 items)
• Both span tasks were classical WM tasks.
They were adapted and written by two
researchers. The length of WM test items
varied from 5 to 8 for both reverse digit
span and word span.
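As a rough illustration of how a reverse digit span item could be checked, the sketch below compares a response with the reversed stimulus. The slides do not describe the scoring procedure, so the one-point-per-correct-item rule and all names here are assumptions.

```python
def reverse_digit_correct(stimulus, response):
    """Return True if the response repeats the stimulus digits in reverse order."""
    return list(response) == list(reversed(stimulus))

def span_score(items):
    """Count correctly reversed items; `items` is a list of (stimulus, response) pairs.

    One point per correct item is an assumed scoring rule, not the study's.
    """
    return sum(reverse_digit_correct(s, r) for s, r in items)

# Hypothetical 5-digit items: for 3-9-1-7-4 the correct response is 4-7-1-9-3.
example_items = [
    ([3, 9, 1, 7, 4], [4, 7, 1, 9, 3]),  # correctly reversed
    ([2, 8, 5, 6, 1], [1, 6, 5, 2, 8]),  # last two digits swapped
]
```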
16
Summary of the variables
• Dependent variable: processing of MUS as indicated
by participants’ mean scores on the Elicited Imitation
(EI) test
• Independent variables
Individual factors
1. Language proficiency (TOEFL scores within 2 years)
2. Working memory
Textual attributes
1. Frequency
2. Length
3. Transparency in terms of meaning and function
17
Quantitative results
Intercorrelations Among Proficiency, WM and Dictation scores
                        EI test   Proficiency   Reverse digit span   Word span   WM in total
Scores on the EI test   1.000
Proficiency             .586**    1.000
Reverse digit span      .333      .323          1.000
Word span               .616**    .551**        .658**               1.000
WM in total             .484**    .449*         .948**               .864**      1.000
Note. ** Correlation significant at the 0.01 level (2-tailed).
* Correlation significant at the 0.05 level (2-tailed).
18
Quantitative results
Results of Multiple Regression Analysis
Predictors in the model   R      R2     R2△    F          B          S.E.     Beta   t        Sig.
(Constant)                                                -136.647   47.697          -2.865   .008
Word span                 .616   .379   .357   17.086**   .942       .377     .420   2.495    .019
Proficiency               .683   .467   .427   11.805**   .192       .091     .355   2.105    .045
Scores on EI test = -136.647 + 0.942 * word span + 0.192 * proficiency
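To make the regression equation concrete, the following sketch plugs values into it. The coefficients are the B values from the regression table; the input values are hypothetical, since the slides do not report the range of word span scores.

```python
# Unstandardized coefficients (B) from the regression table.
INTERCEPT = -136.647
B_WORD_SPAN = 0.942
B_PROFICIENCY = 0.192

def predict_ei(word_span, proficiency):
    """Predicted EI score from word span and proficiency (TOEFL) scores."""
    return INTERCEPT + B_WORD_SPAN * word_span + B_PROFICIENCY * proficiency

# Hypothetical participant: word span score of 50, TOEFL score of 600.
predicted = predict_ei(50, 600)
```

Holding word span constant, a 10-point difference in TOEFL score corresponds to a predicted difference of about 1.9 points on the EI test.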
19
Quantitative results
Means, SD and t-tests of Textual Factors: Transparency, Length, Frequency
Independent variables   Groups   Mean      SD         t       Sig. (2-tailed)
Transparency            High     32.6842   16.63014   3.004   .006**
                        Low      19.6667   10.07220
Length                  High     27.3750   15.29937   0.979   .334
                        Low      22.9474   13.97805
Frequency               High     27.1200   16.46795   0.891   .378
                        Low      23.0556   11.94828
Note. * p < 0.05; ** p < 0.01.
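The t values in the table can be approximated from the reported means and SDs. The sketch below computes both the pooled-variance and the Welch (unequal-variance) t statistic from summary data. The group sizes are not reported on the slide; the values 19 and 24 used here are an assumption, chosen only because they are consistent with the four-decimal means and the total of 43 sequences.

```python
import math

def pooled_t(m1, s1, n1, m2, s2, n2):
    """Independent-samples t statistic assuming equal variances."""
    pooled_var = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    return (m1 - m2) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))

def welch_t(m1, s1, n1, m2, s2, n2):
    """Independent-samples t statistic without assuming equal variances."""
    return (m1 - m2) / math.sqrt(s1**2 / n1 + s2**2 / n2)

# Transparency: High vs. Low. Means and SDs are from the table;
# the group sizes (19 and 24) are ASSUMED, not reported.
t_welch = welch_t(32.6842, 16.63014, 19, 19.6667, 10.07220, 24)
t_pooled = pooled_t(32.6842, 16.63014, 19, 19.6667, 10.07220, 24)
```

Under these assumed group sizes, the Welch statistic comes out at roughly 3.00, close to the reported 3.004, while the pooled statistic is somewhat larger (about 3.17).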
20
Qualitative results
• Close examination of the transcribed data showed the following:
• (a) Complementizers in the clauses were generally not produced (e.g.,
“that” in multi-word sequences such as “make sure that” and “I
understand that”).
• (b) Participants reconstructed multi-word sequences in creative ways
(e.g., “in a variety of” was produced as “in varieties of,” “have varieties
of,” and “have various (colors)”).
• (c) In many cases, semantically similar sequences
were produced (e.g., “from the point of view” was replaced by phrases
such as “as to,” “for,” and “in terms of”).
• (d) There was L1 interference in reproduction (e.g., three participants
used “day and night” rather than “night and day”).
• It is assumed that the participants may have retrieved more frequent
or salient MUS within the same lexical framework (morphosyntax
interface).
21
Discussion
The primary purpose of the present study is
to examine the impact of textual and
individual factors on L2 English learners’
processing of corpus-derived MUS.
22
Discussion: WM and proficiency
• The finding that general proficiency played a role in processing MUS
was consistent with previous studies (Spottl et al., 2002; Schmitt et al.,
2002).
• However, when WM was taken into consideration, the results were
mixed. Evidence indicated that different memory tasks functioned
differently in the processing of MUS. Specifically, there was no
significant relationship between the reverse digit span and the
performance scores.
• A significant correlation was found between the word span and the
performance scores. This finding was consistent with Roberts and
Gibson’s (2003) view that STM as measured by simple word span may
be a better indicator of individual differences in online processing.
23
Discussion: WM and proficiency
• The findings were also supported by Myles et al.
(1999), who concluded that high-word-span
learners can accumulate more chunks than low-span
learners. The more chunks a learner has, the
more comparisons he/she can carry out to
establish cross-chunk analyses. The more
frequently chunk-internal analyses are made,
the easier it is to process chunks online.
• However, the results needed to be treated with
caution. This study investigated only a small
number of MUS (43 in total).
24
Discussion: Textual attributes
• Significant differences were only found when MUS were
categorized based on the degree of transparency in terms
of meaning and function. However, there were no
significant differences in terms of processing when MUS
were categorized based on frequency or length.
• A plausible interpretation is that the results had to do
with contextual information; that is, the sentences in which the
target sequences were embedded might have mitigated the
differences in terms of frequency or length to a certain
extent.
25
Conclusion
• Implications: the relationship between MUS and language proficiency
• Robinson (2002) stressed that “WM is only one of a complex
set of cognitive factors that come together to account for
learners’ performance.” In this study, two individual factors
(proficiency and WM as measured by word span) accounted for
46.7% of the variance in the scores on the EI test. Future
studies might include other variables in order to achieve a
better understanding of MUS processing.
• So, which variables to choose? Do we need a model?
26
Next steps
• Pausing as a significant indicator (R. Ellis)
• Using a Chinese EFL learner corpus
• A sample of 50 participants and a NS control group
• Data from stimulated recall for qualitative analysis
• Issue of scoring EI test (Prof. Hansen)
• The issue of using an EI test for FS will be addressed in a
follow-up study.
27
Thank you
28