Transcript Slide 1

Computational Linguistics
What is it and what (if any) are its
unifying themes?
Computational linguistics
1
I often agree with XKCD…
2
[Figure: an XKCD-style scale from less rigorous ("flakey") to more rigorous, ordering literary criticism, psychology, neuropsychology, biology, chemistry, and physics, and asking where linguistics and computational linguistics fit.]
3
What defines the rigor of a field?
• Whether results are reproducible
• Whether theories are testable/falsifiable
• Whether there is a common set of
methods for similar problems
• Whether approaches to problems can
yield interesting new questions/answers
4
Linguistics
5
[Figure: a scale from less rigorous to more rigorous placing, in order, literary criticism, sociology, linguistics, and engineering.]
6
[Figure: subfields of linguistics arranged along the less-rigorous-to-more-rigorous scale: other areas of sociolinguistics (e.g. Deborah Tannen), "theoretical" linguistics (e.g. minimalist syntax), "theoretical" linguistics (e.g. lexical-functional grammar), historical linguistics, some areas of sociolinguistics (e.g. Bill Labov), psycholinguistics, experimental phonetics.]
7
The true situation with linguistics
Okay, enough already.
What is computational linguistics?
• Text normalization/segmentation
• Morphological analysis
• Automatic word pronunciation prediction
• Transliteration
• Word-class prediction: e.g. part of speech tagging
• Parsing
• Semantic role labeling
• Machine translation
• Dialog systems
• Topic detection
• Summarization
• Text retrieval
• Bioinformatics
• Language modeling for automatic speech recognition
• Computer-aided language learning (CALL)
8
Computational linguistics
• Often thought of as natural language
engineering
• But there is also a serious scientific
component to it.
9
Why CL may seem ad hoc
• Wide variety of areas (as in linguistics)
• If it’s natural language engineering, the
goal is often just to build something that
works
• Techniques tend to change in somewhat
faddish ways…
– For example: machine learning approaches
fall in and out of favor
10
11
12
13
14
Machine learning in CL
• In general it’s a plus since it has meant
that evaluation has become more rigorous
• But it’s important that the field not turn into
applied machine learning
• For this to be avoided, people need to
continue to focus on what linguistic
features are important
• Fortunately, this seems to be happening
15
Some interesting themes…
• Finite-state methods:
– Many application areas
– Raises interesting questions about how much of
language is “regular” (in the sense of “finite state”)
• Grammar induction:
– Linguists have done a poor job at their stated goal of
explaining how humans learn grammar
• Computational models of language change:
– Historical evidence for language change is only
partial. There are many changes in language for
which we have no direct evidence.
16
Finite state methods
• Used from the 1950’s onwards
• Went out of fashion a bit during the 1980’s
• Then a revival in the 1990’s with the
advent of weighted finite-state methods
17
Some applications
• Analysis of word structure – morphology
• Analysis of sentence structure
– Part of speech tagging
– Parsing
• Speech recognition
• Text normalization
• Computational biology
• …
18
Regular languages
• A regular language over a finite alphabet is one that can be built up from individual symbols of the alphabet (plus the empty string and the empty language) using one or more of the following operations (illustrated below):
– Set union
– Concatenation
– Kleene closure (Kleene star)
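Since the slides give no code, here is a minimal Python sketch of those three operations, entirely my own illustration, on languages represented as finite sets of strings; the Kleene star is truncated at a fixed depth because the true closure is infinite.

```python
# A minimal sketch (not from the slides): the regular operations on languages
# represented as finite sets of strings. star() is truncated at `depth`
# because the true Kleene closure is an infinite set.

def union(l1, l2):
    """Set union of two languages."""
    return l1 | l2

def concat(l1, l2):
    """Every string of l1 followed by every string of l2."""
    return {x + y for x in l1 for y in l2}

def star(l, depth=3):
    """Kleene star, truncated: the empty string plus up to `depth` concatenations of l."""
    result, current = {""}, {""}
    for _ in range(depth):
        current = concat(current, l)
        result |= current
    return result

print(star({"a", "b"}, depth=2))           # {'', 'a', 'b', 'aa', 'ab', 'ba', 'bb'}
print(concat(union({"a"}, {"b"}), {"c"}))  # {'ac', 'bc'}
```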
19
Finite state automata: formal
definition
Every regular language can be recognized by a finite-state automaton.
Every finite-state automaton recognizes a regular language. (Kleene’s theorem)
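The definition itself appears only on the slide image; for reference, the standard textbook version is: a (deterministic) finite-state automaton is a 5-tuple
\[ M = (Q, \Sigma, \delta, q_0, F) \]
where \(Q\) is a finite set of states, \(\Sigma\) a finite alphabet, \(\delta : Q \times \Sigma \to Q\) the transition function (a relation \(\delta \subseteq Q \times \Sigma \times Q\) in the nondeterministic case), \(q_0 \in Q\) the start state, and \(F \subseteq Q\) the set of final states. \(M\) accepts a string if, reading it symbol by symbol from \(q_0\), it ends in a state in \(F\).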
20
Representation of FSA’s: State
Diagram
21
Regular relations: formal definition
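The slide's formal definition is not in the transcript; the standard one, added here for reference, is: a regular (rational) relation over alphabets \(\Sigma\) and \(\Delta\) is a subset of \(\Sigma^* \times \Delta^*\) that can be built from finite relations using union, concatenation, and Kleene closure; equivalently, it is a relation computed by some finite-state transducer.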
22
Finite-state transducers
23
An FST
24
Composition
• In addition to union, concatenation and
Kleene closure, regular relations are
closed under composition
• Composition is to be understood here the
same way as composition in algebra:
– R1 ∘ R2 means take the output of R1 and feed it to the input of R2
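A minimal Python sketch of that definition, using hypothetical toy relations of my own, represented as finite sets of (input, output) pairs:

```python
# A minimal sketch (toy data): regular relations as finite sets of
# (input, output) pairs, composed as described above.

def compose(r1, r2):
    """R1 o R2 = {(x, z) : (x, y) in R1 and (y, z) in R2 for some y}."""
    return {(x, z) for (x, y) in r1 for (y2, z) in r2 if y == y2}

R1 = {("1", "one"), ("2", "two")}        # rewrite a digit as a word
R2 = {("one", "ONE"), ("two", "TWO")}    # uppercase the word
print(compose(R1, R2))                   # {('1', 'ONE'), ('2', 'TWO')}
```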
25
Composition: an illustration
26
R1 as a transducer
27
R2 as a transducer
28
R1○R2
29
Some things you can do with FSTs
• Text analysis/normalization
– Word segmentation
– Abbreviation expansion
– Digit-to-number-name mappings
i.e. mapping from writing to language
• Morphological analysis
• Syntactic analysis
– E.g. part-of-speech tagging
• (With weights) pronunciation modeling and
language modeling for speech recognition
30
That’s fine for engineering but…
• Does it really account for the facts?
– Is morphology really regular?
– Is the mapping between writing and speech
really regular?
31
What is morphology?
• scripsērunt is third person, plural,
perfect, active of scrībō (`I write’)
• Morphology relates word forms
– the “lemma” of scripsērunt is scrībō
• Morphology analyzes the structure of
word forms
– scripsērunt has the structure scrīb+s+ērunt
32
Morphology is a relation
• Imagine you have a Latin morphological
analyzer comprising:
– D: a relation that maps between surface form
and decomposed form
– L: a relation that maps between decomposed
form and lemma
• Then:
– scripsērunt ○ D = scrīb+s+ērunt
– scripsērunt ○ D ○ L = scrībō
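A minimal sketch of these two compositions (toy single-pair relations with macrons dropped; a real D and L would of course cover the whole paradigm):

```python
# A minimal sketch (toy forms, macrons dropped): the analyzer's D and L as
# finite relations, applied to a string by composition.

D = {("scripserunt", "scrib+s+erunt")}   # surface form -> decomposed form
L = {("scrib+s+erunt", "scribo")}        # decomposed form -> lemma

# D o L: feed the output of D into the input of L.
D_then_L = {(x, z) for (x, y) in D for (y2, z) in L if y == y2}

def apply(relation, s):
    """s o R, reading the string s as the identity relation {(s, s)}."""
    return {out for (inp, out) in relation if inp == s}

print(apply(D, "scripserunt"))           # {'scrib+s+erunt'}
print(apply(D_then_L, "scripserunt"))    # {'scribo'}
```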
33
English regular plurals
• cat + s = cats /s/
• dog + s = dogs /z/
• spouse + s = spouses /əz/
• This can be implemented by a rule that composes with the base word, inserting the relevant form of the affix at the end (a rough sketch follows)
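A minimal Python stand-in for that rule, my own simplification rather than the slides' transducer, with the final sound approximated from spelling:

```python
# A minimal sketch: choose the plural allomorph from the end of the base.
# Spelling is used as a rough proxy for the final sound.

SIBILANT_ENDINGS = ("s", "z", "x", "sh", "ch", "se", "ce")
VOICELESS = set("ptkf")

def plural_allomorph(word):
    if word.endswith(SIBILANT_ENDINGS):
        return "/əz/"    # spouse + s = spouses
    if word[-1] in VOICELESS:
        return "/s/"     # cat + s = cats
    return "/z/"         # dog + s = dogs

for w in ("cat", "dog", "spouse"):
    print(w, plural_allomorph(w))
```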
34
Templatic affixes in Yowlumne
Transducer for each affix transforms base into required templatic
form and appends the relevant string.
35
Subtractive morphology
Transducer deletes final VC of the base…
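A minimal regular-expression sketch of such a deletion rule; the forms here are made up, since the slide's own data is not in the transcript:

```python
import re

# A minimal sketch: delete a final vowel+consonant sequence from the base.
# "panat" is a made-up example form.

def delete_final_vc(base, vowels="aeiou"):
    return re.sub(rf"[{vowels}][^{vowels}]$", "", base)

print(delete_final_vc("panat"))   # pan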
36
Bontoc infixation
• Insert a marker “>” after the first consonant
(if any)
• Change “>” into the infix –um-
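A minimal sketch of those two rules in Python; the marker ">" follows the slide, and fikas → fumikas is a commonly cited Bontoc example, used here as an assumed illustration:

```python
import re

def infix_um(word, vowels="aeiou"):
    # Rule 1: insert the marker ">" after the first consonant, if there is one;
    # otherwise place it at the front (so the infix surfaces as a prefix).
    marked = re.sub(rf"^([^{vowels}])", r"\1>", word)
    if ">" not in marked:
        marked = ">" + marked
    # Rule 2: rewrite the marker as the infix -um-.
    return marked.replace(">", "um", 1)

print(infix_um("fikas"))   # fumikas
```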
37
Side note … infixation in English
Kalamazoo → Kalama-f*****g-zoo
38
Reduplication: Gothic
Problem: mapping w to ww (unbounded copying) is not a regular relation, since a finite-state device cannot remember an arbitrarily long string in order to copy it
39
Factoring Reduplication
• Prosodic constraints
• Copy verification transducer C
40
Non-Exact Copies
• Dakota (Inkelas & Zoll, 1999):
41
Non-Exact Copies
• Basic and modified
stems in Sye (Inkelas &
Zoll, 1999):
“they will fall all over”
42
Morphological Doubling Theory
(Inkelas & Zoll, 1999)
• Most linguistic accounts of reduplication
assume that the copying is done as part of
morphology
• In MDT:
– Reduplication involves doubling at the
morphosyntactic level – i.e. one is actually
simply repeating words or morphemes
– Phonological doubling is thus expected, but
not required
43
Gothic Reduplication under
Morphological Doubling Theory
44
Summary
• If Inkelas & Zoll are right then all
morphology can be computed using
regular relations
• This in turn suggests that computational
morphology has picked the right tool for
the job
45
Another Example:
Linguistic analysis of text
• Maps the stuff you see on the page – e.g. text written in the standard orthography of a language – into linguistic units (words, morphemes, phonemes…)
• For example:
– I ate a 25kg bass
– [aɪ eɪt ə twɛnti faɪv kɪləɡræm bæs]
• This can be done using transducers
– But is the mapping between writing and language
really regular (finite-state)?
46
Linguistic analysis of text
• Abbreviation expansion
• Disambiguation
• Number expansion
• Morphological analysis of words
• Word pronunciation
• …
47
A transducer for number names
Consider a machine that maps between digit strings and their reading as number
names in English.
30,294,005,179,018,903.56 →
thirty quadrillion, two hundred and ninety four trillion, five billion, one hundred
seventy nine million, eighteen thousand, nine hundred three, point five six
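A minimal Python sketch of such a mapping, my own and far smaller than a full transducer: it reads the integer part in three-digit groups and the decimals digit by digit.

```python
# A minimal sketch of digit-string-to-number-name mapping.
# Scales beyond quadrillion are omitted.

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]
SCALES = ["", " thousand", " million", " billion", " trillion", " quadrillion"]

def under_1000(n):
    parts = []
    if n >= 100:
        parts.append(ONES[n // 100] + " hundred")
        n %= 100
    if n >= 20:
        parts.append(TENS[n // 10] + ("-" + ONES[n % 10] if n % 10 else ""))
    elif n:
        parts.append(ONES[n])
    return " ".join(parts)

def number_name(digits):
    whole, _, frac = digits.replace(",", "").partition(".")
    n = int(whole)
    if n == 0:
        words = "zero"
    else:
        groups, scale = [], 0
        while n:
            n, group = divmod(n, 1000)
            if group:
                groups.append(under_1000(group) + SCALES[scale])
            scale += 1
        words = ", ".join(reversed(groups))
    if frac:
        words += " point " + " ".join(ONES[int(d)] for d in frac)
    return words

print(number_name("30,294,005,179,018,903.56"))
```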
48
Mapping between speech and
writing
It seems obvious on the face of it that the
mapping between speech and its written
form is regular. After all, the words are
ordered in the same way as speech. Even
the letters tend to be ordered in the same
way as the sounds they represent.
49
Some examples where it isn’t…
‘honorific inversion’
[Figure: an Egyptian hieroglyphic example, given in transliteration, in which honorific inversion puts the divine or royal element first in writing even though it is read later.]
Finite state methods
• In morphology they seem almost exactly
correct as characterizations of the natural
phenomenon
• In the mapping from writing to language,
again, finite-state models seem almost
exactly correct
51
Grammar induction
The common “nativist” view in linguistics…
From Gilbert Harman's review of Chomsky's New Horizons in the Study of
Language and Mind (published in Journal of Philosophy, 98(5), May 2001):
Further reflection along these lines and a great deal of empirical study of
particular languages has led to the "principles and parameters" framework
which has dominated linguistics in the last few decades. The idea is that
languages are basically the same in structure, up to certain parameters, for
example, whether the head of a phrase goes at the beginning of a phrase or
at the end. Children do not have to learn the basic principles, they only need
to set the parameters. Linguistics aims at stating the basic principles and
parameters by considering how languages differ in certain more or less
subtle respects. The result of this approach has been a truly amazing
outpouring of discoveries about how languages are the same yet different.
52
Similarly…
Cedric Boeckx and Norbert Hornstein. 2003.
“The Varying Aims of Linguistic Theory.”
Children come equipped with a set of principles
of grammar construction (i.e. Universal Grammar
(UG)). The principles of UG have open
parameters. Specific grammars arise once
values for these open parameters are specified.
Parameter values are determined on the basis of
[the primary linguistic data]. A language specific
grammar, then, is simply a specification of the
values that the principles of UG leave open.
53
My “challenge” with Shalom Lappin
54
…
55
Automatic induction of grammars
from unannotated text
• Klein, Dan and Manning, Christopher.
2004. “Corpus-based induction of syntactic
structure: models of dependency and
constituency”. Proceedings of the 42nd
Annual Meeting of the Association for
Computational Linguistics
• Lots of subsequent work…
56
Different syntactic representations
57
Dependency Model with Valence
(DMV)
• Each head generates a set of non-STOP
arguments to one side, then a STOP
argument; then similarly on the other side
• Trained using expectation maximization
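The probability model itself is only on the slide; reconstructed from Klein & Manning (2004), with approximate notation, the probability of the dependency subtree rooted at a head h is:
\[
P(D(h)) = \prod_{dir \in \{l,r\}} \Bigg[ \prod_{a \in deps_D(h,dir)} P_{STOP}(\neg STOP \mid h, dir, adj)\; P_{CHOOSE}(a \mid h, dir)\; P(D(a)) \Bigg]\, P_{STOP}(STOP \mid h, dir, adj)
\]
where adj records whether the head has already taken an argument in that direction.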
58
Performance
59
Improvements
• Constituent structure can be induced in a similar
way to inducing word classes (e.g. parts of
speech) – by considering the environments in
which the putative constituent finds itself.
• In Klein & Manning’s constituent-context model
(CCM) probability of a bracketing is computed as
follows:
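The formula is only on the slide; a reconstruction from Klein & Manning’s papers, with approximate notation, is:
\[
P(S, B) = P(B) \prod_{\langle i,j \rangle} P(\alpha_{ij} \mid b_{ij})\; P(\beta_{ij} \mid b_{ij})
\]
where \(\alpha_{ij}\) is the yield of span \(\langle i,j \rangle\), \(\beta_{ij}\) its context (the terminals immediately before and after the span), and \(b_{ij}\) indicates whether the span is bracketed as a constituent.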
60
Combined DMV+CCM
Subsequent work – e.g. Rens Bod’s 2006 Unsupervised Data-Oriented Parsing – reports F-scores close to 83.0
For comparison, the best supervised parsers get about 91.0
61
Some objections … and a synopsis
• Children do not learn grammars from unannotated text
corpora: they get a lot of guidance from the
environmental situation
– Sure
• Performance of automatic induction algorithms is still far
from human performance so they do not constitute
evidence that we can do away with (nativist) linguistic
theories of language acquisition
– They do not show this. But the argument would have more
weight if nativist theories had already been demonstrated to
contribute to a working model of grammar induction
• But Computational Linguistics is starting to make some
serious contributions to this 50-year-old debate
62
The evolution of complex structure
in language
Examples from: Stump, Gregory (2001) Inflectional Morphology:
A Theory of Paradigm Structure. Cambridge University Press.
63
Evolutionary Modeling
(A tiny sample)
• Hare, M. and Elman, J. L. (1995) Learning and
morphological change. Cognition, 56(1):61--98.
• Kirby, S. (1999) Function, Selection, and Innateness:
The Emergence of Language Universals. Oxford
• Nettle, D. "Using Social Impact Theory to simulate
language change". Lingua, 108(2-3):95--117, 1999.
• de Boer, B. (2001) The Origins of Vowel Systems.
Oxford
• Niyogi, P. (2006) The Computational Nature of Language
Learning and Evolution. Cambridge, MA: MIT Press.
64
A multi-agent simulation
• System is seeded with a grammar and a small number of agents
– Each agent randomly selects a set of phonetic rules to apply to forms
– Agents are assigned to one of a small number of social groups
• Two parents “beget” child agents.
– Children are exposed to a predetermined number of training forms combined from both parents
• Forms are presented in proportion to their underlying “frequency”
– Children must learn to generalize to unseen slots for words
– Learning algorithm similar to:
• David Yarowsky and Richard Wicentowski (2000) “Minimally supervised morphological analysis by multimodal alignment.” Proceedings of ACL-2000, Hong Kong, pages 207-216.
• Features include last n-characters of input form, plus semantic class
– Learners select the optimal surface form to derive other forms from (optimal =
requiring the simplest resulting ruleset – a Minimum Description Length criterion)
• Forms are periodically pooled among all agents and the n best forms are kept for each word and each slot
• Population grows, but is kept in check by “natural disasters” and a quasi-Malthusian model of resource limitations
– Agents age and die according to reasonably realistic mortality statistics
65
Final states for a given initial state
66
Another example
• Kirby, Simon. 2001. “Spontaneous evolution of linguistic
structure: an iterated learning model of the emergence of
regularity and irregularity.” IEEE Transactions on
Evolutionary Computation, 5(2):102--110.
• Assumes two meaning components each with 5 values,
for 25 possible words
• Initial “speaker” randomly selects examples from the 25,
producing random strings for each, and “teaches” them
to the “hearer”
• Not all of the slots are filled, thus producing a
“bottleneck”: the hearer must compute forms for the
missing slots
67
The basic algorithm produces
results that are too regular
[Figure: two panels showing the lexicon in its initial state and in its final state]
68
A more realistic result…
• Addition of other
constraints, including
– a random tendency for
“speakers” to omit
symbols,
– a frequency
distribution over the 25
possible meaning
combinations
69
Summary
• Evolutionary modeling is evolving slowly
– We are a long way from being able to model
the complexities of known language evolution
• Nonetheless, computational approaches
promise to lend insights into how complex
social systems such as language change
over time, and complement discoveries in
historical linguistics
70
Final thoughts
• Language is central to what it means to be
human.
• Language is used to:
– Communicate information
– Communicate requests
– Persuade, cajole…
– (In written form) record history
– Deceive
• Other animals do some or most of these things
(cf. Anindya Sinha’s work on bonnet macaques)
• But humans are better at all of these
71
Final thoughts
• So the scientific study of language ought
to be more central than it is
• We need to learn much more about how
language works
– How humans evolved language
– How languages changed over time
– How humans learn language
• Computational linguistics can
contribute to all of these questions.
72
73