The Role of Corpora in ELT

Using Corpus Resources in
English Language Teaching
Hilary Nesi
Coventry University
UK
First, a bit of history behind the
“corpus revolution”……..
Two simultaneous linguistic
traditions
• Chomsky’s Syntactic Structures 1957
• Firth’s A Synopsis of Linguistic Theory
1957
The American
Linguistic Tradition
• A branch of cognitive psychology – a sort of
biology
• Data can be derived intuitively and presented in
isolated sentences
• The theory does not have social relevance
• The goal is to develop a system of abstract
principles to account for everything that humans
know about language ‘in advance of experience’
The British Linguistic Tradition
• A branch of social science
• Data must be studied in the context of whole
texts - attested, authentic instances of use
• The theory helps solve practical problems of
language and society
• The goal is to study meaning. Language is
social interaction, and transmits culture.
Competence and Performance
‘Competence’ = knowledge of the
language system
‘Performance’ = language that is
actually produced
Performance contains lots of
human errors
For example, lecturers say things like:
we'll local government will have had a
dialogue with their communities and
produced a s-,er agenda twent-, a Local
Agenda twenty-one nineteen-ninety-six
(BASE corpus sslct012)
And writers make mistakes with
spelling etc…
• On the other hand, diphtheria toxin, which
inhibits protein synthesis in eukaryotic
ribosomes, has no affect on the protein
synthesis
(BAWE corpus 0006c)
So is our own competence a
better guide to the language
system?
We can invent perfectly formed
sentences, even though we all make
slips, false starts, spelling mistakes etc.
when we use the language…..
However..
• There are lots of aspects of language use
we can’t invent because we are not
consciously aware of them…….
Fluent language users are usually quite
good at identifying:
• whether a sentence is correctly structured
• whether a word is frequent or rare, formal
or informal
• the collocations and connotations of
words
But they are not very good at
identifying:
• the relative frequency of different
grammatical structures
• small differences in meaning between
different structures
• whether a word is likely to occur in some
sorts of contexts but not others
For example,
Which is used more frequently, the simple or
the progressive form of the present tense?
What is the difference in meaning
between
‘take a job’ and ‘take on a job’?
Why does it sound odd to say:
‘I am deeply happy’?
Most people can’t answer these
sorts of questions intuitively
• That means that most teachers would not
be able to provide this sort of information
“off the cuff”
• It also means that the answers are not in
reference books compiled by experts who
rely on competence rather than
performance
The old way of compiling
dictionaries and grammar books…..
Compilers relied on:
• their own intuition
• examples of unusual usage that they had
noticed (usually while reading)
The new way of compiling
dictionaries and grammar books…..
Compilers look at:
• Thousands of ordinary examples of
language use – i.e. they examine
performance, rather than relying on
competence
This is the “corpus revolution”
• The thousands of ordinary examples of
language use can be found in a corpus
• Nowadays most new dictionaries,
pedagogical grammars, descriptive
grammars and textbooks are corpus-based
These books are based on corpora
such as:
• The British National Corpus (100 million
words)
• The Bank of English (524 million words)
Started in 1991; new data is acquired
continuously from websites, newspapers
etc.
A corpus is….
“a collection of pieces of language text in
electronic form, selected according to
external criteria to represent, as far as
possible, a language or language variety
as a source of data for linguistic research.”
(Sinclair 2005)
http://ahds.ac.uk/linguistic-corpora/
“text in electronic form”
so that large amounts of text can be
consulted very quickly, using
concordancing software –
it used to take years to count the number
of times a word occurred in a book – now
it only takes a second to find it in a
thousand books.
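What concordancing software does can be sketched in a few lines of Python. This is only a minimal illustration, not the tool behind any of the corpora mentioned here: the function name kwic, the width parameter and the two-sentence sample text are all invented for the example.

```python
import re

def kwic(text, keyword, width=30):
    """Return one keyword-in-context line per occurrence of keyword."""
    lines = []
    # \b stops 'job' from matching inside 'jobs' or 'jobless'
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    for m in pattern.finditer(text):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        # Right-align the left context so the keyword lines up in a column
        lines.append(f"{left:>{width}} [{m.group(0)}] {right}")
    return lines

# Invented sample text (an assumption; a real corpus would be read from files)
sample = (
    "He is now leaving to take a job in Brussels. "
    "A group of students could take on the job of compiling the list."
)

concordance = kwic(sample, "job")
for line in concordance:
    print(line)
```

Each output line shows the search word with a fixed window of text on either side, which is why real concordance lines often begin and end in the middle of a word.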
Corpus analysis can reveal:
• the relative frequency of different
grammatical structures
– e.g. it goes is very common, it had been
going is very rare
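A frequency comparison of this kind could be sketched as follows. The sample text is invented and far too small to show anything about English; the helper names count_pattern and per_million are assumptions made for the example. Counts are normalised per million words so that corpora of different sizes can be compared.

```python
import re

def count_pattern(text, pattern):
    """Count case-insensitive matches of a regular expression in text."""
    return len(re.findall(pattern, text, flags=re.IGNORECASE))

def per_million(count, total_words):
    """Normalise a raw count to occurrences per million words."""
    return count / total_words * 1_000_000

# Invented three-sentence sample (an assumption, not corpus data)
text = "It goes without saying. It goes on. It had been going well."

total = len(re.findall(r"\b\w+\b", text))            # total word tokens
simple = count_pattern(text, r"\bit goes\b")
past_perf_prog = count_pattern(text, r"\bit had been going\b")

print("it goes:", simple, "raw,", per_million(simple, total), "per million")
print("it had been going:", past_perf_prog, "raw,",
      per_million(past_perf_prog, total), "per million")
```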
Corpus analysis can reveal:
• small differences in meaning between
different structures
- e.g. take a job and take on a job
take a job
1. to pay off, she cannot now take a job paying less than pounds 12,000 a
2. river. He is now leaving to take a job in Brussels as a European
3. a kitchen assistant before taking a job as a pizza delivery driver 18
4. x years. Three years ago I took a part-time job and have received my tax
5. eir boy to be a lawyer. He took a job with the Ministry of the Interior but
6. e neuroses.' At 16, Moore took a summer job working on the chassis line
7. moving to New York, she took a modelling job and, while doing an ad for
8. block any move for him to take another job in football." Little would see a
take on a job
1. Whitbread is strong. Why take on the job of scrapping excess
2. ays be people unwilling to take on the stressful job-loads most
3. A group of students could take on the job of compiling the electoral
4. teaching qualification to take on a demanding job from which you
5. oes not improve when he takes on the job of defending Boston's
6. pounds 200,000. Now he takes on an unpaid job for an organisat
7. He's fat, he's 53 and he's taking on a stress-loaded job. He may be
8. ated plants, while women took on the job of grain preparation.‘
What’s the difference?
Take a job occurs in contexts which
state what sort of job it is; it means to
take employment.
Take on a job means to assume
responsibility for a task, paid or unpaid.
Corpus analysis can reveal:
• whether a word is likely to occur in some
sorts of contexts but not others
- e.g. what is wrong with ‘I am deeply
happy’?
‘I am deeply happy today’
i am deeply honored today
Hume was deeply worried about the view
we are so deeply indebted to
we have always been deeply grateful for the
that can be deeply offensive to people
to express very deeply held feelings of vulnerability
what accounts for stability in deeply divided societies?
horrifying and reprehensible, but also deeply puzzling
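The word that follows deeply in each of the lines above can also be pulled out automatically. The sketch below is a minimal illustration using only those eight lines; the function name right_collocates is invented for the example, and a real study would count collocates over a whole corpus rather than a handful of lines.

```python
import re
from collections import Counter

def right_collocates(lines, node):
    """Count the word immediately to the right of node in each line."""
    counts = Counter()
    for line in lines:
        tokens = re.findall(r"[a-z']+", line.lower())
        for i, tok in enumerate(tokens[:-1]):
            if tok == node:
                counts[tokens[i + 1]] += 1
    return counts

# The eight concordance lines quoted above
lines = [
    "i am deeply honored today",
    "Hume was deeply worried about the view",
    "we are so deeply indebted to",
    "we have always been deeply grateful for the",
    "that can be deeply offensive to people",
    "to express very deeply held feelings of vulnerability",
    "what accounts for stability in deeply divided societies?",
    "horrifying and reprehensible, but also deeply puzzling",
]

collocates = right_collocates(lines, "deeply")
for word, n in collocates.most_common():
    print(word, n)
```

In this small sample happy does not appear among the words that follow deeply.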
‘just like that’
“We’ll connect you to a network just like that”
(advertisement for an internet service provider)
How to consult one of the large
corpora for yourself…
• The British National Corpus
Searchable at www.natcorp.ox.ac.uk and
http://corpus.byu.edu/bnc
• The Bank of English
Sample available at
www.collins.co.uk/Corpus/CorpusSearch.aspx
• The Corpus of Contemporary American
English (COCA) - 400+ million words.
Searchable at http://www.americancorpus.org
The BNC interface
Part of the COCA interface
You can also consult other online concordancers,
e.g. http://springerexemplar.com/
But… is it worth your while to
consult a corpus?
Advantages:
• A corpus can provide language
information that is not available in
reference books
• A corpus can provide lots of authentic
examples of the way words and phrases
are used in context – you can use these
as a basis for teaching materials
Disadvantages:
• Concordance lines can be difficult to
interpret
• Because they are examples of
performance – real rather than idealised
language use - they may contain errors
• A corpus may not represent the kind of
language your students need to
produce……
A corpus is ‘selected …. to represent
….. a language or language variety’
• BUT “no corpus, no matter how large,
how carefully designed, can have exactly
the same characteristics as the language
itself”
Any corpus will under-represent:
• some types of language user
• some types of text
“small” corpora
• Created to represent types of language
that are inadequately represented in the
very large corpora used for dictionaries
and reference grammars
For example
• British Academic Written English (BAWE): 6,506,995 words.
http://www.coventry.ac.uk/bawe
• British Academic Spoken English (BASE): 1,644,942 words.
http://www.coventry.ac.uk/base
• The Michigan Corpus of Academic Spoken English
(MICASE): 1,848,364 words.
http://www.hti.umich.edu/m/micase/
• The Michigan Corpus of Upper-level Student Papers
(MICUSP): roughly 2.6 million words.
http://micusp.elicorpora.info/
A query tool for BASE and BAWE
Corpus design
• What kind of (small) corpus would you like
to create?
• What would it represent?
• How would you go about collecting
representative data?
Some uses of corpora
• Word counts and word lists
• The production of dictionaries and grammars
• The study of idiom and collocation
• Diachronic studies
• The study and comparison of language varieties
• The study of language acquisition
• The production of learning materials
etc….
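The first use in the list, word counts and word lists, is the simplest to try for yourself. A minimal sketch, assuming a plain-text corpus held in a string (the function name word_list and the sample sentence are invented for the example):

```python
import re
from collections import Counter

def word_list(text, top=None):
    """Return (word, count) pairs, most frequent first."""
    # Lower-case the text and keep simple words, allowing an internal apostrophe
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    return Counter(tokens).most_common(top)

# Invented sample text (an assumption; replace with the text of your own corpus)
sample = "The corpus is large. The corpus is a sample of the language."

for word, count in word_list(sample, top=3):
    print(word, count)
```

With a small corpus of your own, the same function gives a frequency list that can be compared with lists from the larger corpora.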