Slide 1 - Taichung Municipal Xiyuan Senior High School


Corpora and Language Teaching Research
Department of English, National Taiwan Normal University
Hao-Jan Chen (陳浩然)
1
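Outline
• Corpus linguistics is one of the fastest-growing fields of language research in recent years, and the number of studies and websites devoted to it keeps increasing. This workshop explores the relationship between corpora and language teaching research. The main topics are:
• 1. Well-known corpora of various types
• 2. How to collect data and build a corpus
• 3. Software for analysing corpus data
• 4. Searching and analysing corpora
• 5. Existing corpus websites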
2
1. Well-known corpora of various types
• What are the most famous cases?
• Brown
• LOB
• BNC
• Bank of English
• GigaEnglish
• And then Google, and what else?
• What else?
3
The other types of corpora
• Written corpus
• Spoken corpus
• Academic corpus
• Learner corpus
• Specialized corpus
• Business corpus
• Bilingual corpus
4
National Corpora
• The British National Corpus
• The American National Corpus
• The Polish National Corpus
• The Czech National Corpus
• The Hungarian National Corpus
• The Russian Reference Corpus
• The CORIS corpus
• The Hellenic National Corpus
• The German National Corpus
• The Slovak National Corpus
• The Modern Chinese Language Corpus
5
Corpora of the Brown family
• Brown
• Frown
• LOB
• Pre-LOB
• FLOB
• Kolhapur
• ACE
• WWC
• LCMC
6
Collins
• COBUILD
– John Sinclair at University of Birmingham
– originally 20 m words
– now the Bank of English, over 300 m words
– the more the better
– no fixed size: the idea of a Monitor corpus
7
BNC
8
ANC
9
Spoken corpora
• The London-Lund Corpus
• SEC, MARSEC and Aix-MARSEC
• The Bergen Corpus of London Teenage Language
• The Cambridge and Nottingham Corpus of Discourse in English
• The Spoken Corpus of the Survey of English Dialects
• The Intonational Variation in English Corpus
• The Longman British Spoken Corpus
• The Longman Spoken American Corpus
• The Santa Barbara Corpus of Spoken American English
• The Saarbrücken Corpus of Spoken English
• The Switchboard Corpus
• The Wellington Corpus of Spoken New Zealand English
• The Limerick Corpus of Irish English
• The Hong Kong Corpus of Conversational English
10
Academic and professional English corpora
1. The Michigan Corpus of Academic Spoken English
2. The British Academic Spoken English corpus
3. The Reading Academic Text corpus
4. The Academic Corpus
5. The Corpus of Professional Spoken American English
6. The Corpus of Professional English
11
Developmental and learner corpora
1. The Child Language Data Exchange System
2. The Louvain Corpus of Native English Essays
3. The Polytechnic of Wales corpus
4. The International Corpus of Learner English
5. The LINDSEI corpus
6. The Longman Learners’ Corpus
7. The Cambridge Learner Corpus
12
Chinese EFL Learner Corpus
• China: CLEC (Yang and Guei) and corpora from several other universities (including both written and spoken data)
• Taiwan: Iwill of TKU, Soochow U, NTNU, NTHU, NTU, and several others.
• Hong Kong: about 10 million words, from John Milton.
13
Chinese Learner Corpus
14
Online Search CLEC
15
Multilingual corpora
1. The Canadian Hansard Corpus
2. The English-Norwegian Parallel Corpus
3. The English-Swedish Parallel Corpus
4. The Oslo Multilingual Corpus
5. The ET10/63 and ITU/CRATER parallel corpora
6. The IJS-ELAN Slovene-English Parallel Corpus
7. The CLUVI parallel corpus
8. European Corpus Initiative Multilingual Corpus I
9. The MULTEXT corpora
10. The PAROLE corpora
11. Multilingual Corpora for Cooperation
12. The EMILLE Corpus
13. The BFSU Chinese-English Parallel Corpus
14. The Babel Chinese-English Parallel Corpus
15. Hong Kong Parallel Text
16
Chinese English Bilingual/Parallel Corpora
• Chinese English search
• Mainland China
• Hong-Kong
• Taiwan
• National Tsing-Hua University NLP Lab
• National Taiwan Normal University LLRC
17
The Trouble with Google
• not enough instances (max 1000)
• not enough context
– ca 10-word snippet around search term
• ridiculous sort order
– search term in titles and headings
• linguistically dumb
– not lemmatised
• think/thinks/thinking/thought: four searches
– not POS-tagged
• mixes up beat (n) and beat (v)
– and why not parsed
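The lemmatisation point above can be illustrated with a minimal Python sketch. It assumes a tiny hand-made lemma table (`LEMMA_FORMS`, a hypothetical name); a real project would rely on a lemmatised corpus or a full lemmatiser instead. One lemma query then replaces the four separate searches.

```python
import re

# Hypothetical, hand-made lemma table used only for illustration.
LEMMA_FORMS = {
    "think": {"think", "thinks", "thinking", "thought"},
}

def find_lemma(text, lemma):
    """Return every token in text that is a listed form of the lemma."""
    forms = LEMMA_FORMS.get(lemma, {lemma})
    tokens = re.findall(r"[A-Za-z']+", text)
    return [tok for tok in tokens if tok.lower() in forms]

sample = "She thinks he thought about it. I think thinking helps."
hits = find_lemma(sample, "think")
print(hits)
```

Without POS tags the sketch cannot tell the verb form thought from the noun thought, which is the same weakness the slide notes for beat (n) and beat (v).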
18
The Internet as (virtual) corpus
19
History
[Figure: corpus size in words (log scale, 10^6 to 10^9) by decade, 1960s to 2000s: Brown/LOB, COBUILD, BNC, Gigaword; 2010: ?]
20
Annotated Corpora
• What are the possible annotations?
• Leech highlighted the importance of
tagged corpora in corpus research.
21
Annotation of Corpora
• A. Part-of-speech tagging
• B. Syntactic annotation
• C. Semantic annotation
• D. Discourse annotation
• E. Pragmatic annotation
22
A. Part-of-speech tagging
LOB sample with POS tagging
A01 2 ^ *'_*' stop_VB electing_VBG life_NN peers_NNS **'_**' ._.
A01 3 ^ by_IN Trevor_NP Williams_NP ._.
A01 4 ^ a_AT move_NN to_TO stop_VB \0Mr_NPT Gaitskell_NP from_IN
A01 4 nominating_VBG any_DTI more_AP labour_NN
A01 5 life_NN peers_NNS is_BEZ to_TO be_BE made_VBN at_IN a_AT meeting_NN
A01 5 of_IN labour_NN \0MPs_NPTS tomorrow_NR ._.
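A tagged sample like the one above is easy to process by program. The following Python sketch assumes only what the sample shows: a text reference, a line number, and then `word_TAG` items separated by spaces. The function name `parse_tagged_line` is made up for this illustration.

```python
def parse_tagged_line(line):
    """Split a LOB-style line into its reference and (word, tag) pairs."""
    fields = line.split()
    text_id, line_no = fields[0], int(fields[1])
    pairs = []
    for item in fields[2:]:
        if "_" not in item:          # skip markers such as "^"
            continue
        # Split on the last underscore so the tag is always the final part.
        word, tag = item.rsplit("_", 1)
        pairs.append((word, tag))
    return text_id, line_no, pairs

line = "A01 3 ^ by_IN Trevor_NP Williams_NP ._."
text_id, line_no, pairs = parse_tagged_line(line)

# Tags beginning with "N" are the noun tags in this sample (NN, NNS, NP, ...).
nouns = [word for word, tag in pairs if tag.startswith("N")]
print(text_id, line_no, pairs, nouns)
```

Once the pairs are available, a search can be restricted by tag, for example to find a word only where it is used as a noun.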
23
Syntactic annotation
• Parsing
• the (automatic) analysis of texts (sentences) in terms of syntactic categories
[Figure: parse tree of the sentence "Pierre Vinken 61 years old will join the board as an executive director Nov 29", with nodes labelled S, NP, VP, ADJP and PP.]
24
Treebanks
• Geoffrey Sampson
• Meticulously hand-crafted syntactic annotation
– SUSANNE
– CHRISTINE
– LUCY
• Penn-Treebank
– University of Pennsylvania
– Massive amounts of automatically annotated data aimed at natural language processing work
25
Discourse annotation
• example: anaphoric relations in the
IBM/Lancaster corpus (UCREL)
• try to build up something like an ‘anaphoric treebank’
• what are anaphoric relations?
– links between a proform and an antecedent
– example:
The married couple said that they were happy
with their lot.
26
Sense tagging
27
Linguistic Data Consortium
28
Fields where corpora are used
• Lexicography to design dictionaries
• Language studies (relations between
languages, differences between genre,
evolution of the language)
• Computational linguistics (training and
testing methods)
• Language teaching (learner’s corpora)
• Cultural studies, psycholinguistics
29
Using corpora
• Application & evaluation
– Linguistic analysis
• Frequency & distribution
• Grammatical analyses
• Phraseology & pattern grammar
• Collocation analysis
• Contrastive analysis
– Applied analysis
• Discourse analysis
• Learner language analysis
• Literary analysis
• Translation studies
30
Using corpora
• Application & evaluation
– Applied in Education
• Lexicography
• Material preparation & presentation
• Data-driven learning
• E-learning
• Action research: problem solving
– Applied in NLP
31
2. How to collect data and build a corpus
• How about creating your own corpus?
• Where can you find suitable materials?
• Free data?
32
How to find and create your own corpora?
• What are the possible resources?
• Gutenberg as a corpus resource
• Wikipedia as a corpus
• Web articles
• ESP materials
• Learner data
33
Gutenberg Texts
34
Wikipedia text files
35
VOA News also available
36
How do you collect EAP corpora?
• Collect some academic articles you want to work on.
• There are various web sites; for instance, the LLT journal is available online.
37
PDF journal papers and others
• Some PDF files can be converted and
there are some tools you can use to
convert PDF
• And then you can process the texts with
the help of concordancers.
38
How about collecting a learner corpus?
• Taiwan learner corpus.
• Begin to work with learner data
• Collect students’ writing assignments.
• High school
• College
• Graduate level
39
JCEE
• Collect the data from your own students
• The JCEE wants to have something
• You can collect writing samples from your own class for analysis.
40
How to collect bilingual texts?
• Where to find suitable materials?
• The bilingual news at VOA
• The FTV bilingual news
• The Sinorama (used by NTHU CANDLE)
• The Studio Classroom texts.
41
How to collect other texts?
• 1. It would be more difficult to collect spoken data.
• 2. However, it is very useful to have a spoken learner corpus (EAP or non-EAP).
• 3. How about videotaping classroom discourse?
42
Classroom Discourse: Score
43
Web as Corpus
• Some web sites might offer tools to search
the web.
• Google ngram might be available.
• Downloadable huge corpus
44
WAC
45
3. Software for analysing corpus data
• If you have the data, what can you do with it?
• Commercial tools? Some are useful
• Monoconc Pro
• Wordsmith
• And free tools like AntConc and others
46
Ways to exploit a corpus
• Word (token) / type frequency lists
• N-grams
• Concordances
• Collocations/colligations
• Specially designed programs (when the corpus is annotated)
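The first three techniques in the list above can be sketched in a few lines of Python. This is a minimal illustration under simple assumptions: a regular-expression tokenizer that lowercases everything, and a toy text invented for the example. The function names are hypothetical, and dedicated concordancers offer far more options.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and return its word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def ngrams(tokens, n):
    """Return all contiguous n-token sequences as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def kwic(tokens, keyword, width=3):
    """Return key-word-in-context lines with `width` tokens on each side."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{tok}] {right}")
    return lines

text = "The cat sat on the mat. The dog sat on the cat."
tokens = tokenize(text)

freq = Counter(tokens)                 # token frequency list
bigrams = Counter(ngrams(tokens, 2))   # 2-gram frequency list
concordance = kwic(tokens, "sat", width=2)

print(len(tokens), "tokens,", len(freq), "types")
print(freq.most_common(3))
print(bigrams.most_common(3))
print("\n".join(concordance))
```

The number of keys in `freq` is the number of types, and the sum of its counts is the number of tokens, which is the distinction the first bullet draws.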
47
PC Concordancers
• PC Concordancers
• Monoconc
• Concordance
• Wordsmith
• Others
• freeware
48
Several Concordancers
49
An example of concordancing
(with Wordsmith)
50
Monoconc Pro by Michael Barlow
51
More Concordancers and Tools
• Not free
– R.J.C. Watt's Concordance, Michael Barlow's MonoConc, and Mike Scott's WordSmith Tools have similar features, and all cost a pretty penny, though time- and/or data-limited versions are available. All are Windows programs. Most of their functionality is available in free tools.
• Free
– Xaira is the successor to SARA for the BNC+XML. It can be used with other files. The ANC can be modified to be used with Xaira. It is easy to add simple XML markup to a text file; in fact, Xaira Tools will do it for you. Because it indexes the corpus, it is very fast to use, which WSTools is not with a large corpus like the ANC. Quite powerful and, did I mention, free.
– Kfngram makes n-gram indices of any text(s) you give it. Like WSTools' Cluster function, but free. Works on Windows.
– As noted above, ConcApp is very serviceable and free.
52
SARA: corpus list
53
54
Try some of these tools
• AntConc from Japan
• ConcApp from Hong Kong
• Kfngram from the USA
• Hands-on session.
• Also download the trial versions of Wordsmith and Monoconc Pro.
55
History of Tagging Methods
[Figure: timeline from 1960 to 2000 showing corpus milestones and tagging methods with their reported accuracy.]
Corpus milestones:
• Brown Corpus created (EN-US), 1 million words
• Brown Corpus tagged
• LOB Corpus created (EN-UK), 1 million words
• LOB Corpus tagged
• POS tagging separated from other NLP
• Penn Treebank Corpus (WSJ, 4.5M)
• British National Corpus (tagged by CLAWS)
Tagging methods:
• Greene and Rubin: rule based, 70%
• HMM tagging (CLAWS): 93%-95%
• DeRose/Church: efficient HMM, sparse data, 95%+
• Transformation-based tagging (Eric Brill): rule based, 95%+
• Tree-based statistics (Helmut Schmid): rule based, 96%+
• Trigram tagger (Kempe): 96%+
• Neural network: 96%+
• Combined methods: 98%+
56
POS Taggers
• Why do we need taggers?
• Brill’s tagger
• Tree-tagger
• Go-tagger
• Lancaster CLAWS site
• http://www.comp.lancs.ac.uk/ucrel/claws/trial.html
57
Gotagger
58
Tree-tagger
59
Any other tools? Many tools are available
60
LLT journal: tool for MWE extraction
61
KFngram
62
Parser
63
Parsing
• Parsing is a more demanding task, involving not only annotation but also linguistic analysis.
• Treebank: a collection of labeled constituent structures or phrase markers.
• A parsed corpus provides a labeled
analysis for each sentence to show how
the various words function.
• Two main approaches: probabilistic and
rule-based
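To make "labeled constituent structures" concrete, here is a short Python sketch that reads a bracketed tree into nested tuples and walks it. The bracketed string is a simplified bracketing written for this illustration, not an entry copied from any treebank, and the function names are hypothetical.

```python
def parse_brackets(s):
    """Parse a string like "(S (NP a) (VP b))" into (label, children) tuples."""
    tokens = s.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def node():
        nonlocal pos
        pos += 1                      # consume "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())       # nested constituent
            else:
                children.append(tokens[pos])  # a word (leaf)
                pos += 1
        pos += 1                      # consume ")"
        return (label, children)

    return node()

def leaves(tree):
    """Return the words of the tree in sentence order."""
    _, children = tree
    words = []
    for child in children:
        words.extend(leaves(child) if isinstance(child, tuple) else [child])
    return words

def count_label(tree, target):
    """Count the constituents carrying the given label."""
    label, children = tree
    total = 1 if label == target else 0
    for child in children:
        if isinstance(child, tuple):
            total += count_label(child, target)
    return total

# Simplified, illustrative bracketing (an assumption, not treebank data).
tree = parse_brackets("(S (NP Pierre Vinken) (VP will (VP join (NP the board))))")
print(leaves(tree))
print(count_label(tree, "NP"))
```

With trees in this form, queries such as "how many noun phrases does each sentence contain" become simple recursive functions.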
64
4. Searching and analysing corpora
• The directions:
• 1. The description of English language
• 2. The use of corpora in language
classroom
• 3. The analysis of learner language
65
Special Issue: Using Corpora in Language Teaching and Learning
• Genres, Registers, Text Types, Domain, and Styles: Clarifying the Concepts and Navigating a Path Through the BNC Jungle. David YW Lee, Lancaster University, UK. pp. 37-72
• Text Categories and Corpus Users: A Response to David Lee (Commentary). Guy Aston, University of Bologna, Italy. pp. 73-76
• An Evaluation of Intermediate Students' Approaches to Corpus Investigation. Claire Kennedy & Tiziana Miceli, Griffith University, Brisbane. pp. 77-90
• Looking at Citations: Using Corpora in English for Academic Purposes. Paul Thompson, Reading University; Chris Tribble, King's College London University & Reading University. pp. 91-105
66
• Lexical Behaviour in Academic and Technical Corpora: Implications for ESP Development. Alejandro Curado Fuentes, University of Extremadura, Spain. pp. 106-129
• Teaching German Modal Particles: A Corpus-Based Approach. Martina Mollering, Macquarie University, Sydney. pp. 130-151
• The Emergence of Texture: An Analysis of the Functions of the Nominal Demonstratives in an English Interlanguage Corpus. Terry Murphy, Yonsei University, Seoul. pp. 152-173
• Exploring Parallel Concordancing in English and Chinese. Wang Lixun, The Open University of Hong Kong. pp. 174-184
• A Case for Using a Parallel Corpus and Concordancer for Beginners of a Foreign Language. Elke St.John, University of Sheffield, UK. pp. 185-203
67
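Lexical research
• When corpora are used to investigate language, lexical research is usually the first thing that comes to mind, because finding words or strings and their contexts in a corpus is the easiest task. Corpus-based lexical research does have the longest history and the most researchers. Its main directions are:
• Word meaning: which sense a polysemous word takes in which context. An amusing example: If you dog a dog during the dog days of summer, you'll be a dog tired dogcatcher (Wall, et al. 1996). The several instances of dog in this sentence differ sharply in meaning.
• Word frequency: used to decide which high-frequency words to include when compiling small and medium-sized dictionaries.
• Word usage: attested sentences from authentic English provide the best illustrations of usage.
• Part of speech: for example, alarm is both a noun and a verb, and its separate uses can be identified.
• Collocation patterns: which words go with which. For example, *learn knowledge is wrong; the correct usage is gain knowledge.
• The relationship between roots and prefixes or suffixes: in impossible and irregular, for example, the same negative prefix is spelled differently depending on the sound of the letter that follows.
• Idiom use: the most frequent idioms, and the contexts or situations in which a given idiom occurs.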
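The collocation check described above (which verbs actually precede knowledge) can be sketched as a simple window count in Python. The three sentences are invented for this illustration and merely stand in for a real corpus; `left_collocates` is a hypothetical helper name.

```python
import re
from collections import Counter

def left_collocates(text, node, window=2):
    """Count the words found within `window` tokens to the left of `node`."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            counts.update(tokens[max(0, i - window):i])
    return counts

# Invented mini-corpus, used only to show the mechanics.
text = ("Students gain knowledge through reading. "
        "We gain knowledge from experience. "
        "They acquire knowledge slowly.")

counts = left_collocates(text, "knowledge", window=1)
print(counts.most_common())
```

On a real corpus a raw count like this is usually followed by an association measure, since very frequent words such as the appear next to almost everything.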
68
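Grammar research:
• Identification of phrases
• Identification of clauses
• Identification of phrase types, such as noun phrases and verb phrases
• Identification of clause types, such as main clauses and subordinate clauses
• The relationship between tense and context
• Research on the relationship between vocabulary and grammar (lexico-grammar):
• Collocation patterns between words and grammatical structures
• Comparison of word use across grammatical structures
• Comparison of how several synonyms are used across different grammatical structures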
69
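Language acquisition and development research
• Finding acquisition patterns in the language features of large numbers of learners
• Comparing first-language and foreign-language acquisition
• Examining the development of writing ability in large collections of children's compositions
• Examining the development of writing style in large collections of adult writing
• Computational linguistics
• Corpus linguistics and computational linguistics share many annotation and search techniques, but computational linguistics focuses on NLP (Natural Language Processing), for example machine translation research. In fact, machine translation can be developed using the parallel corpora of corpus linguistics.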
70
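Register research:
• Discourse research:
• Comparison of nouns and pronouns across reference types
• Comparison of anaphoric expressions across registers, for example conversation versus formal written language
• Grammatical and rhetorical features and structural development within a given text
• Reference marking in different types of text
• Register research:
• Linguistic features of a given register
• Comparison of features across registers, for example between speech and writing, or among different written genres
• Word frequency comparison across registers
• Morpheme comparison across registers
• Comparison of nominalization across registers
• Comparison of syntactic structure across registers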
71
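Foreign language teaching
• For example, idioms are one of the harder parts of language learning. Many advanced learners still understand very common idioms poorly or misunderstand them completely. The University of Nottingham in the UK therefore has a corpus-based research project on foreign students' acquisition of English idioms. It uses the 100-million-word CIC (Cambridge International Corpus) to find the most common idioms and the contexts or situations in which they occur, and then compares and analyses these against learners' understanding of both.
• From 黃希敏, Associate Professor, Department of Applied English, National Taipei University of Technology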
72
What analyses can you do?
• 1. Word frequency
• 2. Lexical bundles
• 3. Check the usage and contexts
• 4. Collocation patterns
• 5. Find the MWEs
• 6. Sentence structures
• 7. Finding information from tagged corpora
73
Ideas for using concordancers with students
• Concordancers can be extremely useful for creating your own teaching materials or for students to do some research for themselves. Here are some ideas that I have tried.
• Exploring collocations. When you get students to learn new words you can ask them to enter those words into the concordancer and see if they can find and record other words which are commonly used with them.
• Looking at their own errors. If your students commonly make collocation errors, instead of correcting them yourself you can ask the student to put the root word into the concordancer and see if they can discover what the error is for themselves.
• Understanding different uses / meanings. When your students are learning words which have multiple meanings, you can collect example sentences from a corpus and get the students to group the sentences according to their meaning.
• Finding genuine examples. Once you have taught your students some new words or phrases, you can get them to use the concordancer to find and record their own examples of the words being used. If you teach a specific use of a word or phrase you could get them to make sure the example they find matches the use you have taught.
• Materials creation. Teachers often produce gapfill activities that have a group of words they want to teach or revise, and a number of sentences that the students must put the correct words into. Using a concordancer and a simple word processing programme you can get authentic text with which to create your own gapfill activities, or even get your students to create the materials themselves.
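The gapfill idea above can also be automated once example sentences have been collected. The Python sketch below blanks out the first target word found in each sentence and keeps an answer key. The two sentences are invented stand-ins for concordance lines, and `make_gapfill` is a hypothetical helper name.

```python
import re

def make_gapfill(sentences, targets):
    """Blank out one target word per sentence; return (exercise, answers)."""
    exercise, answers = [], []
    for sentence in sentences:
        for word in targets:
            # Match the target as a whole word, ignoring case.
            pattern = re.compile(rf"\b{re.escape(word)}\b", re.IGNORECASE)
            if pattern.search(sentence):
                exercise.append(pattern.sub("_____", sentence, count=1))
                answers.append(word)
                break   # only one gap per sentence
    return exercise, answers

# Invented sentences standing in for lines copied from a concordancer.
sentences = [
    "She hopes to gain knowledge from the course.",
    "They make a decision every week.",
]
exercise, answers = make_gapfill(sentences, ["gain", "make"])

for number, item in enumerate(exercise, start=1):
    print(f"{number}. {item}")
print("Word bank:", ", ".join(sorted(answers)))
```

Printing the answers as a sorted word bank keeps the order of the sentences from giving the answers away.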
74
Some Studies in Taiwan
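• This study used the commercial lexical analysis software WordSmith (version 4) as its main research tool to address four research questions: (1) Which conjunctive adverbials occur most frequently in each series? (2) How does textbook difficulty relate to the number and types of conjunctive adverbials? (3) In what positions do conjunctive adverbials occur? (4) How are the conjunctive adverbials distributed across the four semantic categories? The study found that conjunctive adverbials for writing and for speech appeared in the same textbook with no further explanation of the difference between spoken and written use, and that in every series the most frequent conjunctive adverbials were spoken ones.
• A corpus study of conjunctive adverbials in ESL/EFL writing textbooks. 游秋暖, National Kaohsiung First University of Science and Technology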
75
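A study of skit performance and peer evaluation: teaching activities for Freshman English
• 江敏之, Department of English, National Chengchi University
• Each section of this chapter first presents statistics obtained by computer and by hand, together with categories of words and sentences and example sentences, and then discusses what these figures and texts mean, in order to show the features of the students' English writing in sentence structure and language function.
– Overview of vocabulary use and its significance
• Using Wordlist and Concordance in Wordsmith 3.0, we obtained the following figures and texts, which reflect how the students in this class use English vocabulary.
• 1. Basic data: for each student and for the whole class, (1) total number of words (tokens), (2) number of different words (types), (3) ratio of types to tokens, and (4) average word length (see Appendix 1).
• Frequency ranking: the 50 most frequent words (a limit of the demo version).
• Top ten by frequency: you (447), is (249), the (236), and (232), a (185), your (156), good (150), are (132), very (130), I (93)
→ You are good. You are very good.
• These ten most frequent words are pronouns (you, your, I), forms of be (is, are), articles (the, a), a conjunction (and), an adjective (good), and an adverb (very). With these words we can build the two sentences You are good and You are very good; interestingly, these two sentences also sum up most concisely how the students in this class viewed one another's performances.
• From the 50 most frequent words, we then select the most representative ones, group them into personal pronouns, forms of be, nouns, adjectives and adverbs, and conjunctions and articles, give example sentences for each group, and discuss what these high-frequency words mean.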
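The four basic figures listed above (tokens, types, type/token ratio, average word length) are straightforward to compute. The Python sketch below assumes a simple regular-expression tokenizer and lowercases the text before counting; WordSmith's own counting rules may differ, so the numbers are only indicative. `basic_stats` is a hypothetical name.

```python
import re

def basic_stats(text):
    """Return token count, type count, type/token ratio and mean word length."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        raise ValueError("text contains no word tokens")
    types = set(tokens)
    return {
        "tokens": len(tokens),
        "types": len(types),
        "type_token_ratio": len(types) / len(tokens),
        "mean_word_length": sum(len(t) for t in tokens) / len(tokens),
    }

# The two sentences built from the top-ten list serve as a tiny test text.
stats = basic_stats("You are good. You are very good.")
print(stats)
```

Because the type/token ratio falls as texts get longer, it is best compared across texts of similar length.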
76
David Lee conference Links
77
Linguistic and Functional Analysis of EAP
• Adding Information
• Comparing and Contrasting: Describing similarities and differences
• Exemplification: Introducing examples
• Expressing Cause and Effect
• Expressing Personal Opinions
• Expressing Possibility and Certainty
• Introducing a Concession
• Introducing Topics and Related Ideas
• Listing Items
• Reformulation: Paraphrasing or clarifying
• Quoting and Reporting
• Summarizing and Drawing Conclusions
78
Spoken Features
• Descriptive Studies
• We are happy to report that the research projects involving MICASE
--both in-house and externally-- became too numerous and difficult
for us to keep track of. Most of our early studies consisted of various
kinds of “forays” into the database as we searched for both
significant findings and for ways of effectively investigating them.
One of the earliest, conducted by graduate student research
assistant Stephanie Lindemann and external consultant Anna
Mauranen, took the much-studied little word “just.” They have been
able to show that in the MICASE data available at that juncture only
8% of its occurrences are concerned with time (as in “the mail has
just arrived”). Although this temporal use tends to be stressed in
ESL pedagogical materials, the primary function of “just” in
academic speech (about 75% of occurrences) is to soften comments
and suggestions, as in “if we could just move on to the second
issue.”
79
Teaching and Testing Developments
• Data collected for MICASE has also been used in
instructional materials projects. Elizabeth Axelson, ELI
Lecturer, used transcripts and sound files to develop
training materials for ITAs (International Teaching
Assistants), focusing on linguistic aspects of interactive
teaching. The materials developed by Axelson, along
with other discipline specific instructional materials for
ITA training, are available on the “Discourse within the
Disciplines” web pages at
http://deil.lang.uiuc.edu/TESOL/index.html. Susan
Reinhart has also incorporated MICASE data in her
forthcoming textbook on oral presentations (U-M Press).
We have made some of our in-house teaching materials
based on MICASE available at
www.lsa.umich.edu/eli/micase/teaching.htm
80
MICASE Kibbitzer
• Criteria/data: Uncountable or Countable? (by John Swales)
• Among or Between (by Faye Reinhard and John Swales)
• Hyperbole in Academic and Research Speech (by Aaron Ohlrogge & Judy Tsang)
• Less and Fewer? (by John Swales and Faye Reinhard)
• Making suggestions in MICASE (by John Swales & Sheryl Leicher)
• Modal contractions in MICASE: The case of will/'ll (by Sara Pilon & John Swales)
• No way (by Aaron Ohlrogge)
• Announcements of Self-repair (by Stephanie Marx & John Swales)
• "Anyone" and "anybody" in MICASE (by John Swales)
• The distribution of anaphoric so in MICASE (by Jennifer McCormick & Sarah Richardson)
• Pre- and post-dislocations in MICASE (by Rebecca Maybaum & John Swales)
• Vocatives in MICASE (by Jennifer McCormick & Sarah Richardson)
• ‘End up’ in MICASE (by Rafael Alejo, Annelie Adel, Jamie Kruis & John Swales)
• Interactional Query formulae in MICASE: "you know what I mean?" (by John Swales)
• How to do MICASE-based Investigations:
• Starting with a Word: Concern (by John Swales)
• Starting with a Functional Category (by John Swales)
81
Learner Corpus-Granger
• For learner corpora, check Granger's web site, which offers a lot of information.
82
Center for ECL
83
Some Possible Topics for Learner Corpus
84
5. Existing corpus websites
• If you do not want to develop your own corpus and only want to use existing tools, there are some options.
• You can also combine your own tools with other web sites.
• Several existing web sites let you use their tools, and searching them is very useful.
85
Web Concordancers
• Hong Kong Polytechnic
• National Chiaotung
• National Tsing-Hua
• National Taiwan Normal
86
Corpus Workbench-tools
87
Hong-Kong VLC
88
Web Concordances
89
American national corpus
Downloadable
90
BNC Search
91
PIE and BNC
92
View of BYU
• VIEW (Variation in English Words and
Phrases)
– http://view.byu.edu/
– Concordancing tool for the British National
Corpus
– A powerful concordancing tool
– Has a useful tutorial
• Click on what you want to do to see samples of
searches
93
BYU BNC
94
American English: 360 million words
95
British National Corpus: 100 million words, UK, 1980s-90s; end up Ving
96
British National Corpus: end up Ving by register / genre
97
BNC: end up Ving by “micro”-register
98
Oxford English Dictionary (OED): 37m words, 2.2 million quotations
99
Oxford English Dictionary (OED): 37m words, 2.2 million quotations
100
Collins COBUILD
101
MICASE
102
MICASE Search
103
BASE Corpus
104
Business English
105
Lexical Tutor
106
Just the Word
107
Sketch Engine from UK
• The very famous Sketch Engine is another
good tool for searching collocations and it
is very useful for finding…
108
Sketch Engine
109
SKE
110
Connection between Corpus and CALL
111