How I Learned to Stop Empiricising and Love my Intuitions


How I Learned to Stop Empiricising
and Love my Intuitions
Or: Why corpus research is like a tornado
DOUGAL GRAHAM – DOUGAL.GRA@KMUTT.AC.TH
Me & My Research
o Computational background
o Academic Formulas List
(Simpson-Vlach & Ellis, 2010)
o AFL for Engineering English
Q
1. In which genre (spoken, fiction, newspaper, academic) is
shall used most and in which the least, compared to will?
2. Put the following verbs in order of frequency (high to
low): promise, shine, finish, enable, jump.
3. Which of the following would occur more frequently
with little, and which with small: success, plate, hill,
baby, impact, pieces, wonder, distance.
(Davies, 2011)
Empirical approaches
o Phrase research: “as shown in chapter”
o Phrase list plus…
o Empirical metrics:
   o Frequency
   o Range
   o Mutual Information (MI)
   o Log-likelihood (LL)
   o Formula Teaching Worth (FTW) (Simpson-Vlach & Ellis, 2010)
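The first three metrics can be sketched in a few lines. This is a minimal illustration, not the procedure used in the talk: it assumes the corpus is already tokenised into one token list per text, takes range to mean the number of texts a phrase appears in, and uses one common form of mutual information (observed phrase probability against the product of the individual word probabilities). The names `ngrams` and `phrase_metrics` are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams in a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def phrase_metrics(texts, n):
    """Compute frequency, range, and MI for every n-gram.

    texts: list of token lists, one per corpus text (assumed input format).
    """
    freq = Counter()        # total occurrences of each n-gram
    text_range = Counter()  # number of texts containing each n-gram
    word_freq = Counter()   # single-word counts, used for the expected value
    for tokens in texts:
        word_freq.update(tokens)
        grams = ngrams(tokens, n)
        freq.update(grams)
        text_range.update(set(grams))  # count each text at most once

    total_words = sum(word_freq.values())
    total_grams = sum(freq.values())
    metrics = {}
    for gram, count in freq.items():
        # Probability expected if the words combined independently.
        p_expected = math.prod(word_freq[w] / total_words for w in gram)
        p_observed = count / total_grams
        metrics[gram] = {
            "frequency": count,
            "range": text_range[gram],
            "mi": math.log2(p_observed / p_expected),
        }
    return metrics


# Toy corpus of two "texts", for illustration only.
texts = [
    ["as", "shown", "in", "figure"],
    ["as", "shown", "in", "chapter"],
]
for gram, scores in phrase_metrics(texts, 3).items():
    print(" ".join(gram), scores)
```

A high MI marks a phrase whose words co-occur far more often than chance predicts; frequency and range alone tend to reward strings such as "what is the".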
Results
Three Words
what is the
the number of
as shown in
# and #
can be used
shown in figure
the value of
Four Words
can be used to
as a function of
the magnitude of the
as shown in figure
with respect to the
in this chapter we
the value of the
Five Words
at a rate of #
you should be able to
beyond the scope of this
how long will it take
the first law of thermodynamics
in such a way that
the rate of change of
Intuitively…
o Results not so useful
o Goal: “A useful list of formulaic English phrases”
o Re-visit metrics
   o Frequency
   o Range
   o Mutual Information (MI)
   o Log-likelihood (LL)
   o Formula Teaching Worth (FTW) (Simpson-Vlach & Ellis, 2010)
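When re-visiting the metrics, it helps to see what LL actually rewards. The sketch below assumes LL means the log-likelihood statistic commonly used in corpus linguistics to compare how often a phrase occurs in a target corpus against a reference corpus; whether the talk used exactly this form is an assumption. The function name and the counts in the example are hypothetical.

```python
import math


def log_likelihood(freq_a, size_a, freq_b, size_b):
    """Log-likelihood (G2) for one phrase observed in two corpora.

    freq_a, freq_b: occurrences of the phrase in corpus A and corpus B.
    size_a, size_b: total size of each corpus (e.g. in tokens).
    """
    total = size_a + size_b
    combined = freq_a + freq_b
    # Frequencies expected if the phrase were equally likely in both corpora.
    expected_a = size_a * combined / total
    expected_b = size_b * combined / total
    ll = 0.0
    for observed, expected in ((freq_a, expected_a), (freq_b, expected_b)):
        if observed > 0:  # a zero count contributes nothing to the sum
            ll += observed * math.log(observed / expected)
    return 2 * ll


# Hypothetical counts: a phrase seen 20 times in a 1,000-token target
# corpus and 5 times in a 1,000-token reference corpus.
print(log_likelihood(20, 1000, 5, 1000))
```

The score grows with the size of the frequency gap, so it identifies phrases that are distinctive of a corpus. It says nothing about whether those phrases are hard or worth teaching, which is the gap the intuitive re-evaluation addresses.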
Re-evaluation
o Intuitively, the results weren’t useful
o Confusion
o Martinez & Schmitt’s PHRASE List
o Intuitive criteria
Problems
o AFL approach
   o results not sufficiently useful
   o are the assumptions warranted?
o PHRASE List approach
   o Criteria very intuitive
   o Hand-sorting 15,000 items
Liking my intuitions
o Needs to be useful for learners
o Should be difficult language
o How can we determine the language that will be
difficult?
Results
Three Words
what is the
the number of
as shown in
# and #
can be used
shown in figure
the value of
Four Words
can be used to
as a function of
the magnitude of the
as shown in figure
with respect to the
in this chapter we
the value of the
Five Words
at a rate of #
you should be able to
beyond the scope of this
how long will it take
the first law of thermodynamics
in such a way that
the rate of change of
Semi-empirical
o marked part of speech: “for a given”
o marked word form: “is known as”
o marked collocations: “under the action of”
Semi-Intuitive
o non-prototypical word meaning: “let us consider”
o non-literal phrase meaning: “we can write”
o specialized syntax: “let X be”
Empiricism
Intuition
1. Empiricism
2. Intuitive re-evaluation
3. Semi-empirical criteria
4. Semi-intuitive criteria
5. Results
Embrace the tornado
Final Points
o Embrace the tornado
o Iterative design
o Precision vs. Recall
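The precision/recall trade-off can be made concrete with a small sketch. It assumes a hand-labelled set of phrases judged useful is available to score a candidate list against. The phrases below come from the slides, but which of them count as "useful" is an illustrative assumption, not a result from the talk, and `precision_recall` is a hypothetical helper.

```python
def precision_recall(selected, useful):
    """Score a candidate phrase list against a hand-labelled 'useful' set.

    precision: share of selected phrases that are actually useful.
    recall: share of useful phrases that the list managed to select.
    """
    selected, useful = set(selected), set(useful)
    true_positives = len(selected & useful)
    precision = true_positives / len(selected) if selected else 0.0
    recall = true_positives / len(useful) if useful else 0.0
    return precision, recall


# Illustrative labels only (assumed, not taken from the study).
selected = ["let us consider", "what is the", "is known as", "# and #"]
useful = ["let us consider", "is known as", "under the action of"]

precision, recall = precision_recall(selected, useful)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Stricter criteria tend to raise precision and lower recall; looser criteria do the reverse, at the cost of more hand-sorting.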
Selected References
Davies, M. (2011). Synchronic and diachronic use of corpora. In V. Viana,
S. Zyngier, & G. Barnbrook (Eds.), Perspectives on corpus linguistics (Vol.
48, pp. 63–80). John Benjamins Publishing.
Martinez, R., & Schmitt, N. (2012). A Phrasal Expressions List. Applied
Linguistics, 33(3), 299–320.
Simpson-Vlach, R., & Ellis, N. C. (2010). An Academic Formulas List: New
Methods in Phraseology Research. Applied Linguistics, 31(4), 487–512.
doi:10.1093/applin/amp058