Transcript AILA

Profiling French Vocabulary: The shape of lexicons by frequency & coverage
10.45-11.15, Monday, March 23, Session K
Nfld., Room 13, Mezzanine
Tom Cobb
Abstract
Lexical frequency profiling (LFP; Laufer & Nation, 1995), which has been highly influential in
ESL vocabulary research and instruction, has had a slower beginning in French. This has been
due to lack of access to large corpora of French from which pedagogically relevant frequency
information could be derived. Pioneering efforts in the 1990s (Goodfellow & Lamy, 2002) had
facilitated promising comparisons of the lexical coverage of French and English texts (Author
& Horst, 2004), which had pedagogical implications that were both interesting and practical
(Ovtcharov, Author & Halter, 2006) but inconclusive owing to incompleteness of the
frequency information. Now, however, work behind the Frequency Dictionary of French by
Lonsdale and Le Bras (Routledge, 2009) has produced and made available complete and
lemmatized corpus-based frequency information for French. This means that both
researchers and teachers can now in principle use the LFP methodology to explore
thoroughly the lexical composition, sophistication, and ‘richness’ of French texts. To be
discussed will be the method of incorporating the frequency information within an LFP
methodology, examples of the sort of research such profiling makes possible, and the means
by which researchers can access the tools of this analysis and use them for their own
purposes. Representative initial findings from the application of this methodology to French
will be offered, including a suggestion that French deploys its lexical resources rather
differently from how English does and may present unique and previously undefined lexical
challenges to its learners.
Recent corpus work in French has made lexical frequency profiling (LFP) methodology
available to French researchers and teachers. Initial findings suggest that French may present
unique lexical challenges to its learners. Attendees will be shown how to access the tools of
this analysis for use in their own work.
Lexical frequency profiling (LFP; Laufer &
Nation, 1995), which has been highly
influential in ESL vocabulary research
and instruction, has had a slower
beginning in French.
This has been due to lack of access to
large corpora of French from which
pedagogically relevant frequency
information could be derived.
Pioneering efforts in the 1990s
(Goodfellow & Lamy, 2002) had facilitated
promising comparisons of the lexical
coverage of French and English texts
(Author & Horst, 2004), which had pedagogical
implications that were both interesting
and practical (Ovtcharov, Author & Halter, 2006) but
inconclusive owing to incompleteness
of the frequency information.
Now, however, work behind the
Frequency Dictionary of French by
Lonsdale and Le Bras (Routledge, 2009) has
produced and made available
complete and lemmatized corpus-based frequency information for
French.
This means that both researchers and
teachers can now in principle use the
LFP methodology to explore the lexical
composition, sophistication, and
richness of French texts.
To be discussed will be the method of
incorporating frequency information
within an LFP methodology, examples
of the sort of research such profiling
makes possible, and the means by
which researchers can access the tools
of this analysis and use them for their
own purposes.
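A minimal sketch of the core LFP step, assuming the k-lists are available as a lemma-to-band table and that a form-to-lemma table handles inflection. `LEMMA_OF` and `BAND_OF` are hypothetical toy data, not the BYU lists, and the function illustrates the idea rather than reproducing VocabProfile's own code.

```python
from collections import Counter

def profile(tokens, lemma_of, band_of):
    """Return the share of running tokens that falls in each k-band.

    lemma_of maps an inflected form to its lemma (unlisted forms are taken
    as their own lemma); band_of maps a lemma to its k-list number.
    Lemmas found on no list are counted under "off-list".
    """
    counts = Counter()
    for token in tokens:
        form = token.lower()
        lemma = lemma_of.get(form, form)
        counts[band_of.get(lemma, "off-list")] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {band: n / total for band, n in counts.items()}

# Hypothetical toy tables (not real frequency data).
LEMMA_OF = {"les": "le", "mange": "manger"}
BAND_OF = {"le": 1, "manger": 1, "chat": 2, "souris": 3}

print(profile("Le chat mange les souris".split(), LEMMA_OF, BAND_OF))
# → {1: 0.6, 2: 0.2, 3: 0.2}
```

Summing these shares band by band gives the kind of cumulative coverage figures (80%, 90%, 95%, 98%) compared throughout the talk.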
Initial findings from the application of
this methodology to French will be
offered, including a suggestion that
French deploys its lexical resources
rather differently from English and
may present unique and previously
undefined lexical challenges to its
learners.
The main new idea of the “vocab
revolution” (1990 onward) in ESL/FL
Is Zipf’s old idea that some
words get far more use than others in any
language
Made usable only recently
by computer technology
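Zipf’s observation can be illustrated with an idealized rank-frequency model in which the word of rank r has a frequency proportional to 1/r. The sketch assumes that ideal form (exponent `s = 1`) and a hypothetical vocabulary of 100,000 types; real corpora only approximate it, so the numbers are illustrative, not measurements of English or French.

```python
def zipf_coverage(top_n, vocab_size, s=1.0):
    """Share of running words covered by the top_n ranks, assuming
    frequency proportional to 1 / rank**s over vocab_size word types."""
    weights = [1 / r**s for r in range(1, vocab_size + 1)]
    return sum(weights[:top_n]) / sum(weights)

# Coverage of the top 1k, 2k and 5k ranks in an idealized 100,000-type lexicon.
for n in (1000, 2000, 5000):
    print(n, round(zipf_coverage(n, 100_000), 3))
```

Under this idealization the top 1,000 ranks cover a little over 60% of running words; the 80% and 90% figures in the talk come from real corpora and lemma lists, not from the model.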
(1) consistency, (2) where to look
The AWL effect
So it was a reasonable question to
ask,
“Is there an AWL in French?”
An interesting question for several
reasons…
This gradually became a question that
could be answered
FRENCH – v.1 (figure)
English vs. French: 1k+2k coverage ENG = 80%, FR = 90%
So French is getting the AWL effect for free
And for fewer words
So the question had to be
reformulated:
Is there an AWL in French?
“Is there room for an AWL in French?”
The answer seemed to be “No”
1k+2k is already giving 90% coverage
And the remaining 10% is presumably
needed for technical, archaic, & oddball
items
With the implication that acquiring a
functional second lexicon was easier in
French
Back to English
1995-2005, a happy picture in ESL vocab
2k+AWL=90% (+technical=95%)
BUT SHORT-LIVED
1. The goal of vocab development was recalculated (Nation, 2006)
The comprehension bar got raised
95% coverage → 98% coverage
2. The how-to of building tech lists became less clear
3. Bigger, better frequency lists put the existence of an AWL in question
– BNC lists (2005)
– BNC-COCA lists (2012)
But the notion of 2000 words = 80% has pretty much survived
VP-BNC-COCA (figure)
So the new question about French is no longer
“Is there room for an AWL in French?”
but
“How are the medium- and low-frequency
lexical resources of French deployed in the
remaining 10% of coverage?”
What does this imply for learning French?
This question gradually became answerable
25 lemmatized French k-lists
• From Lonsdale & Le Bras dictionary project at BYU
• Based on 23-million-word corpus
• Continental + International French 50/50
• Spoken and written 50/50
• Literary 40%, expository 60%
• List-crunched for RANGE + FREQ
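The slide says only that the lists were crunched for range and frequency; the exact BYU procedure is not given here. One simple way to combine the two, offered as a hypothetical sketch: drop lemmas that occur in too few sub-corpora, rank the rest by total frequency, and cut the ranking into bands of 1,000. The `min_range` threshold, the tie-breaking rule, and the toy counts are all assumptions.

```python
def build_k_lists(counts_by_subcorpus, min_range=2, band_size=1000):
    """counts_by_subcorpus: {lemma: [count in sub-corpus 1, 2, ...]}.

    A lemma is kept only if it occurs in at least min_range sub-corpora
    (its "range"); survivors are ranked by total frequency, ties broken
    alphabetically, and cut into consecutive bands of band_size lemmas.
    """
    kept = []
    for lemma, counts in counts_by_subcorpus.items():
        lemma_range = sum(1 for c in counts if c > 0)
        if lemma_range >= min_range:
            kept.append((lemma, sum(counts)))
    kept.sort(key=lambda pair: (-pair[1], pair[0]))
    ranked = [lemma for lemma, _ in kept]
    return [ranked[i:i + band_size] for i in range(0, len(ranked), band_size)]

# Hypothetical counts in three sub-corpora.
toy = {
    "être": [10, 12, 9],
    "avoir": [8, 9, 7],
    "maison": [3, 0, 2],
    "zythum": [5, 0, 0],  # frequent in one sub-corpus only: fails the range test
}
print(build_k_lists(toy, min_range=2, band_size=2))
# → [['être', 'avoir'], ['maison']]
```

The range filter is what keeps a word that is frequent in only one text type from claiming a high-frequency slot.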
FRENCH – v.5
So now we can investigate the shape
of the mid-frequency French lexicon
And make plausible comparisons with
English
• What lies between 90% and 95%
coverage in French texts?
–Or between 90% and 98%?
• Is there “less to learn” in French than in
English?
• (Remembering that lemmas ≠ families)
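Asking what lies between 90%, 95% and 98% coverage comes down to reading a cumulative profile: add up the tokens covered band by band and note the first k-level at which a coverage target is met. A minimal sketch, using integer token counts to avoid rounding trouble; `k_level_for` is a hypothetical helper and the band counts are invented for illustration.

```python
def k_level_for(band_counts, off_list, target_pct):
    """First k-level at which cumulative coverage reaches target_pct.

    band_counts[i] is the number of running tokens whose lemma is on
    list k(i+1); off_list counts tokens on no list. Returns None if the
    listed bands never reach the target.
    """
    total = sum(band_counts) + off_list
    covered = 0
    for k, n in enumerate(band_counts, start=1):
        covered += n
        if covered * 100 >= target_pct * total:
            return k
    return None

# Hypothetical 1,000-token text: k1..k7 token counts plus 20 off-list tokens.
counts = [800, 50, 40, 30, 30, 20, 10]
for target in (90, 95, 98, 99):
    print(target, k_level_for(counts, off_list=20, target_pct=target))
# → 90 4, 95 5, 98 7, 99 None
```

The same function works for lemma lists or family lists, but, as the slide notes, the resulting k-levels are not directly comparable across the two.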
3 tests
Test 1
Translated popular texts
20 translated Reader’s Digest texts
– 20 Fr, 20 Eng
Half translated E→F, half F→E
Total 2939 words Eng, 3650 words Fr
Run through VP-Fr as a mini-corpus
(as a single file)
ENGLISH (figure; 95% and 98% coverage marked)
FRENCH (figure; 95% and 98% coverage marked)
Side by side (figure): Eng (fams) vs. Fr (lemmas)
Using 98% criterion
• A lot of words in that
blue circle!
• The difference
between k8 and k16 is
only 100 word types
• But these 100 words
are drawn from a
pool of 8,000 lemmas
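The bullets contrast types with the pool they are drawn from: the 8,000 figure presumably corresponds to the eight 1,000-lemma lists from k9 to k16. Counting distinct lemmas (types) and running tokens that fall in such a band range can be sketched as below; `types_and_tokens_in_bands` is a hypothetical helper, and the band numbers in the toy table are invented.

```python
def types_and_tokens_in_bands(lemmas, band_of, lo, hi):
    """Count distinct lemmas (types) and running tokens in bands lo..hi.

    Lemmas found on no list are ignored.
    """
    hits = [lem for lem in lemmas if lem in band_of and lo <= band_of[lem] <= hi]
    return len(set(hits)), len(hits)

# Toy data; the band numbers are invented for illustration.
BAND_OF = {"le": 1, "chat": 2, "vétuste": 11, "obséquieux": 14}
lemmas = ["le", "chat", "le", "vétuste", "chat", "obséquieux", "vétuste"]
print(types_and_tokens_in_bands(lemmas, BAND_OF, 9, 16))  # → (2, 3)
```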
Test 2
Translated extended literary work
• Samuel Beckett’s idea - French
as “an impoverished lexicon”?
• Actually he never said this
• But he did write in French, and
“use stark language to convey a stark world”
• How stark is Beckett’s
French?
“Waiting for Godot” vs. «En attendant Godot» (figure)
Proper nouns → 1k has changed the 1k-2k comparison
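One reading of the note on this slide is that proper nouns were reassigned to the 1k band, since a reader handles character names easily even though they are on no frequency list. A hypothetical sketch of that reassignment, assuming a hand-made set of names for the play; the toy band table is not the real k-lists.

```python
def band_with_names(lemma, band_of, proper_nouns):
    """k-band for a lemma, treating listed proper nouns as 1k items."""
    if lemma in proper_nouns:
        return 1
    return band_of.get(lemma, "off-list")

def k1_share(lemmas, band_of, proper_nouns=frozenset()):
    """Share of running tokens counted as 1k items."""
    k1 = sum(1 for lem in lemmas if band_with_names(lem, band_of, proper_nouns) == 1)
    return k1 / len(lemmas)

NAMES = {"vladimir", "estragon", "godot", "pozzo", "lucky"}  # hand-made, hypothetical
BAND_OF = {"attendre": 1}  # toy list, not the real k-lists
text = ["estragon", "attendre", "godot", "vladimir", "attendre", "carotte"]

print(round(k1_share(text, BAND_OF), 2))         # names off-list → 0.33
print(round(k1_share(text, BAND_OF, NAMES), 2))  # names as 1k → 0.83
```

In a dialogue-heavy play where names recur constantly, this choice alone can shift a noticeable share of tokens into the 1k band.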
Test 3
Maybe Tests 1+2 were something about
translated texts?
Ok, then let’s compare
4 random original editorial texts
Chosen 14-15 March, 2015
From
(1) Le Monde - Paris
(2) Le Devoir – Montreal
(3) The Globe & Mail – Toronto
(4) The NY Times – New York
Conclusion
(1) Comparing languages:
– French may make slightly more use of its
common words than English does
– But it makes far more use of its mid- and low-frequency lexical resources (3k to 20k+)
– Cobb & Horst (2004) was right as far as it went, but
incomplete
  • For lack of resources
Conclusion
(2) Comparing learning tasks:
Learning enough vocab for 90% coverage
looks slightly easier in French than English
But learning enough words for 98% or even
95% coverage looks far more difficult
How many FL2 S’s ever get there?
(3) The shapes of the two lexicons seem to be
like this:
English (figure; 95% and 98% coverage marked)
French (figure; 95% and 98% coverage marked)
French (F) vs. English (E) (figure)
But notice that the French early advantage
persists to about 4k
(So 3k words in French give better coverage than 3k words in English)
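Where one language’s early advantage runs out can be located by comparing two cumulative coverage curves band by band. A sketch with invented cumulative percentages (not the talk’s data); `last_band_ahead` is a hypothetical helper.

```python
def last_band_ahead(curve_a, curve_b):
    """Last k-level (1-based) up to which curve_a stays at or above curve_b.

    Both curves are cumulative coverage percentages for k1, k2, ...
    Returns 0 if curve_a is behind from the start.
    """
    last = 0
    for k, (a, b) in enumerate(zip(curve_a, curve_b), start=1):
        if a < b:
            break
        last = k
    return last

# Invented cumulative coverage (%) for k1..k5.
french = [84.0, 90.0, 92.0, 93.0, 94.0]
english = [80.0, 87.0, 90.0, 93.5, 95.0]
print(last_band_ahead(french, english))  # → 3
```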
Discussion
• Is the greater ease of acquiring a 90% lexicon in French
a reason for the traditional FL2 emphasis on phonology
and syntax?
• Is it that French is a more “academic/elitist” lexicon…
• Or just that English is less so?
– Maybe the shape of English reflects the lingua franca role
the language has come to play
– Such that its writers use *circumlocution* for complex
ideas, rather than seeking « le mot juste »?
• Flaubert
ENGLISH AS A LINGUA FRANCA? BUT SURELY NOT IN 19th CENT.
Further work
• As ever in corpus work, this needs empirical
validation
– Do L2 readers with 10k lexicons actually
experience a comprehension deficit?
• As ever in list work, new lists are probably just
around the next corner
– Any picture is strictly provisional
Pedagogical implications
• Are there manageable zones within the
French lexicon, like “technical lists”?
– … that could be found through work with
specialist corpora?
• Till then, the message seems to be
– Get out your flashcards!
• At least now we know what to put on them
• OR
All chapters + papers + /list_learn/
available at
www.lextutor.ca
Thank you!
[email protected]
A method note
• But wait!
• We are comparing lemmas v. families
(cat, cats v. cat, cats, catty)
• 1000 families give more coverage than 1000
lemmas
– How much more?
• Some recent work by Charles Browne
suggests an answer
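Why 1,000 families cover more than 1,000 lemmas can be shown on a toy count table: group word forms by the unit in question, take the top n units, and count the running tokens they cover. The forms, counts and groupings are invented, and `top_n_coverage` is a hypothetical helper; real family definitions (which derivatives count) are a separate question.

```python
from collections import Counter

def top_n_coverage(form_counts, unit_of, n):
    """Tokens covered by the n most frequent units (lemmas or families),
    returned with the total token count."""
    unit_counts = Counter()
    for form, count in form_counts.items():
        unit_counts[unit_of[form]] += count
    covered = sum(count for _, count in unit_counts.most_common(n))
    return covered, sum(form_counts.values())

FORMS = {"cat": 10, "cats": 5, "catty": 3, "dog": 9, "dogs": 4, "run": 8}
LEMMA_OF = {"cat": "cat", "cats": "cat", "catty": "catty",
            "dog": "dog", "dogs": "dog", "run": "run"}
FAMILY_OF = {"cat": "cat", "cats": "cat", "catty": "cat",
             "dog": "dog", "dogs": "dog", "run": "run"}

print(top_n_coverage(FORMS, LEMMA_OF, 2))   # → (28, 39)
print(top_n_coverage(FORMS, FAMILY_OF, 2))  # → (31, 39)
```

Folding "catty" into the "cat" family lets the same two headwords cover three more tokens, which is the effect the slide sets out to quantify.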
http://www.newgeneralservicelist.org/
2368 / 2818 *100 = 84%
1000 lems have ~16% less coverage than 1000 fams in Eng
At High-Frequency NGSL zone (1k+2k)
(probably less at lower frequency zones)
But even assuming (1) a 16%
difference that (2) was maintained at
lower-frequency zones
• About every six lemma lists (6 x 16% = 96%)
we would lose a k-level to maintain lemma-family equivalence
– So in 18 levels we would lose 3
• The picture would not change greatly
– Even in exaggerated worst-case scenario
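The arithmetic on these slides can be checked directly. The level adjustment below is one reading of the worst case (a constant 16% shortfall per lemma list, rounded to whole lists); the function name `family_equivalent_level` and the rounding rule are assumptions, not the talk’s own procedure.

```python
# NGSL slide: 2368 / 2818 as a percentage.
ratio_pct = 2368 / 2818 * 100
print(round(ratio_pct, 1))      # → 84.0
print(round(100 - ratio_pct))   # shortfall → 16

def family_equivalent_level(lemma_level, shortfall=0.16):
    """Hypothetical adjustment: subtract the whole lists 'lost' when each
    lemma list is assumed to cover `shortfall` less than a family list."""
    return lemma_level - round(lemma_level * shortfall)

print(round(18 * 0.16))             # lists lost over 18 levels → 3
print(family_equivalent_level(16))  # k16 French lemmas → 13
```

On this reading, k16 for French lemmas corresponds to roughly k13, which matches the adjusted comparison "K8 E-fams = k13 F-lems for 98%".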
Eng (fams) vs. Fr (lemmas) (figure)
K8 E-fams = k16 F-lems for 98%?
K8 E-fams = k13 F-lems for 98%
Pattern is the same