CSCI 5582 Artificial Intelligence

Natural Language Processing
Finite State Morphology
Slides adapted from Jurafsky & Martin, Speech and Language Processing
Outline
• Finite State Automata
• (English) Morphology
• Finite State Transducers
Finite State Automata
• Let’s start with the sheep language from Chapter 2: /baa+!/
Sheep FSA
• We can say the following things about this machine
 It has 5 states
 b, a, and ! are in its alphabet
 q0 is the start state
 q4 is an accept state
 It has 5 transitions
More Formally
• You can specify an FSA by enumerating
the following things.
 The set of states: Q
 A finite alphabet: Σ
 A start state
 A set of accept/final states
 A transition function that maps Q × Σ to Q
Yet Another View
• The guts of FSAs can ultimately be represented as tables
 If you’re in state 1 and you’re looking at an a, go to state 2
State   b   a   !
0       1   ∅   ∅
1       ∅   2   ∅
2       ∅   3   ∅
3       ∅   3   4
4:      ∅   ∅   ∅

(4: marks the accept state; ∅ means no legal move)
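In code, such a table is naturally a dictionary of dictionaries. Here is one way to encode the sheep table above (a sketch; the dict layout and names are our own):

```python
# Sheep FSA /baa+!/ as a transition table: TABLE[state][symbol] -> next state.
# Missing entries correspond to the empty cells (no legal move).
SHEEP_TABLE = {
    0: {'b': 1},
    1: {'a': 2},
    2: {'a': 3},
    3: {'a': 3, '!': 4},
    4: {},               # the accept state has no outgoing transitions
}
SHEEP_ACCEPT = {4}
```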
Recognition
• Recognition is the process of determining if
a string should be accepted by a machine
• Or… it’s the process of determining if a
string is in the language we’re defining
with the machine
• Or… it’s the process of determining if a
regular expression matches a string
• Those all amount to the same thing in the end
Recognition
• Traditionally (following Turing’s notion), this process is
depicted with a tape.
Recognition
• Simply a process of starting in the start
state
• Examining the current input
• Consulting the table
• Going to a new state and updating the
tape pointer
• Until you run out of tape
D-Recognize
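The textbook’s D-RECOGNIZE pseudocode boils down to a few lines. A minimal Python sketch, assuming the dict-of-dicts table above:

```python
def d_recognize(tape, table, start, accept):
    """Table-driven deterministic recognition (after J&M's D-RECOGNIZE)."""
    state = start
    for symbol in tape:                  # examine input, advance tape pointer
        if symbol not in table[state]:   # empty table cell: reject
            return False
        state = table[state][symbol]     # consult the table, go to a new state
    return state in accept               # out of tape: are we in an accept state?

assert d_recognize("baaa!", SHEEP_TABLE, 0, SHEEP_ACCEPT)
assert not d_recognize("baa", SHEEP_TABLE, 0, SHEEP_ACCEPT)
```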
Key Points
• Deterministic means that at each point in
processing there is always one unique
thing to do (no choices).
• D-recognize is a simple table-driven
interpreter
• The algorithm is universal for all
unambiguous regular languages.
 To change the machine, you simply change
the table.
Recognition as Search
• You can view this algorithm as a trivial kind
of state-space search.
• States are pairings of tape positions and
state numbers.
• Operators are compiled into the table
• Goal state is a pairing with the end of tape
position and a final accept state
• It is trivial because? (Each search state has exactly one successor, so the “search” never has to consider alternatives or back up.)
Non-Determinism
• One technique
 Multiple arcs leaving a state on the same input symbol
• Yet another technique
 Epsilon transitions
 Key point: these transitions do not examine or
advance the tape during recognition
Equivalence
• Non-deterministic machines can be
converted to deterministic ones with a
fairly simple construction
• That means that they have the same
power; non-deterministic machines are
not more powerful than deterministic
ones in terms of the languages they can
accept
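The construction in question is the classical subset construction. A minimal sketch (ignoring ε-arcs): each deterministic state is the set of non-deterministic states the machine could be in, and the resulting table can be fed straight to d_recognize above.

```python
def determinize(nd_table, start, accept):
    """Subset construction: DFA states are frozensets of NFA states.
    Epsilon arcs are omitted for brevity."""
    d_table, agenda = {}, [frozenset({start})]
    while agenda:
        S = agenda.pop()
        if S in d_table:
            continue                     # already expanded this subset
        d_table[S] = {}
        for sym in {sym for q in S for sym in nd_table[q]}:
            T = frozenset(t for q in S for t in nd_table[q].get(sym, ()))
            d_table[S][sym] = T
            agenda.append(T)
    d_accept = {S for S in d_table if S & accept}
    return d_table, frozenset({start}), d_accept

ND = {0: {'a': {0, 1}}, 1: {'b': {2}}, 2: {}}   # two 'a' arcs out of state 0
table, start, accepts = determinize(ND, 0, {2})
assert d_recognize("aab", table, start, accepts)
```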
ND Recognition
• Two basic approaches (used in all major
implementations of regular expressions;
see Friedl 2006)
1. Either take a ND machine and convert it to a
D machine and then do recognition with
that.
2. Or explicitly manage the process of
recognition as a state-space search (leaving
the machine as is).
Non-Deterministic Recognition: Search
• In a ND FSA there exists at least one path
through the machine for a string that is in the
language defined by the machine.
• But not all paths through the machine
for an accepted string lead to an accept state.
• No paths through the machine lead to an accept
state for a string not in the language.
Non-Deterministic Recognition
• So success in non-deterministic
recognition occurs when a path is found
through the machine that ends in an
accept.
• Failure occurs when all of the possible
paths for a given string lead to failure.
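A sketch of this search, reusing the table format above but with sets of next states; the key '' stands for ε (the names and the depth-first agenda discipline are our own choices):

```python
def nd_recognize(tape, table, start, accept):
    """ND recognition as state-space search over (machine state, tape position)."""
    agenda, seen = [(start, 0)], set()
    while agenda:
        node = agenda.pop()                   # depth-first: newest node first
        if node in seen:
            continue                          # don't re-explore a search state
        seen.add(node)
        state, pos = node
        if pos == len(tape) and state in accept:
            return True                       # found one path that accepts
        for nxt in table[state].get('', ()):  # epsilon arcs: tape stays put
            agenda.append((nxt, pos))
        if pos < len(tape):
            for nxt in table[state].get(tape[pos], ()):
                agenda.append((nxt, pos + 1)) # ordinary arcs consume a symbol
    return False                              # every path failed

# ND sheep FSA: two arcs on 'a' out of state 2 instead of a self-loop.
ND_SHEEP = {0: {'b': {1}}, 1: {'a': {2}}, 2: {'a': {2, 3}}, 3: {'!': {4}}, 4: {}}
assert nd_recognize("baaa!", ND_SHEEP, 0, {4})
```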
Example
[Eight figure slides stepping through non-deterministic recognition; the figures are not reproduced in this transcript.]
Key Points
• States in the search space are pairings of
tape positions and states in the machine.
• By keeping track of as yet unexplored
states, a recognizer can systematically
explore all the paths through the machine
given an input.
Why Bother?
• Non-determinism doesn’t get us more
formal power and it causes headaches so
why bother?
 More natural (understandable) solutions
 Not always equivalent
Compositional Machines
• Formal languages are just sets of strings
• Therefore, we can talk about various set
operations (intersection, union,
concatenation)
• This turns out to be a useful exercise
Union
Concatenation
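The classical constructions thread machines together with ε-arcs: union adds a new start state with ε-arcs into both machines; concatenation adds ε-arcs from the accept states of the first machine into the start state of the second. A sketch over the set-valued tables used above (function names are ours; machines are (table, start, accepts) triples, and every state, including accept states, must appear as a table key):

```python
def _shift(table, k):
    """Rename states by adding k so two machines' state numbers don't collide."""
    return {s + k: {sym: {t + k for t in ts} for sym, ts in arcs.items()}
            for s, arcs in table.items()}

def fsa_union(m1, m2):
    (t1, s1, a1), (t2, s2, a2) = m1, m2
    k = max(t1) + 1                           # offset for m2's states
    new_start = k + max(t2) + 1
    table = {**_shift(t1, 0), **_shift(t2, k)}
    table[new_start] = {'': {s1, s2 + k}}     # epsilon arcs into both old starts
    return table, new_start, a1 | {a + k for a in a2}

def fsa_concat(m1, m2):
    (t1, s1, a1), (t2, s2, a2) = m1, m2
    k = max(t1) + 1
    table = {**_shift(t1, 0), **_shift(t2, k)}
    for a in a1:                              # m1's accepts feed m2's start
        table[a].setdefault('', set()).add(s2 + k)
    return table, s1, {a + k for a in a2}
```

Since nd_recognize above already follows the ε-arcs stored under '', it can run the combined machines directly.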
Words
• Finite-state methods are particularly useful in dealing
with a lexicon
• Many devices, most with limited memory, need access to
large lists of words
• And they need to perform fairly sophisticated tasks with
those lists
• So we’ll first talk about some facts about words and
then come back to computational methods
English Morphology
• Morphology is the study of the ways that
words are built up from smaller
meaningful units called morphemes
• We can usefully divide morphemes into
two classes
 Stems: The core meaning-bearing units
 Affixes: Bits and pieces that adhere to stems
to change their meanings and grammatical
functions
English Morphology
• We can further divide morphology up into
two broad classes
 Inflectional
 Derivational
Inflectional Morphology
• Inflectional morphology concerns the
combination of stems and affixes where the
resulting word:
 Has the same word class as the original
 Serves a grammatical/semantic purpose that is
  different from the original
  but is nevertheless transparently related to the original
Nouns and Verbs in English
• Nouns are simple
 Markers for plural and possessive
• Verbs are only slightly more complex
 Markers appropriate to the tense of the verb
Regulars and Irregulars
• Things are complicated a little by the fact
that some words misbehave (refuse to follow
the rules)
 Mouse/mice, goose/geese, ox/oxen
 Go/went, fly/flew
• The terms regular and irregular are used
to refer to words that follow the rules and
those that don’t
Regular and Irregular Verbs
• Regulars…
 Walk, walks, walking, walked, walked
• Irregulars
 Eat, eats, eating, ate, eaten
 Catch, catches, catching, caught, caught
 Cut, cuts, cutting, cut, cut
Derivational Morphology
• Derivational morphology is the messy stuff
that no one ever taught you.
 Quasi-systematicity
 Irregular meaning change
 Changes of word class
Derivational Examples
• Verbs and Adjectives to Nouns
Suffix   Base            Derived noun
-ation   computerize     computerization
-ee      appoint         appointee
-er      kill            killer
-ness    fuzzy           fuzziness
Derivational Examples
• Nouns and Verbs to Adjectives
Suffix   Base          Derived adjective
-al      computation   computational
-able    embrace       embraceable
-less    clue          clueless
Morphology and FSAs
• We’d like to use the machinery provided
by FSAs to capture these facts about
morphology
 Accept strings that are in the language
 Reject strings that are not
 And do so in a way that doesn’t require us to
in effect list all the words in the language
Start Simple
• Regular singular nouns are ok
• Regular plural nouns have an -s on the
end
• Irregulars are ok as is
Simple Rules
Now Plug in the Words
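One way to sketch that machine in code: an FSA over word classes, with the lexicon “plugged in” as a map from words to class sequences (the state layout, class names, and the tiny lexicon are our own illustrations):

```python
# FSA over word classes; reuses the set-valued table format and nd_recognize.
NOUN_FSA = {
    0: {'reg-noun': {1}, 'irreg-sg-noun': {2}, 'irreg-pl-noun': {2}},
    1: {'-s': {2}},      # regular plural: reg-noun followed by -s
    2: {},
}
NOUN_ACCEPT = {1, 2}     # reg-noun alone, reg-noun + -s, or an irregular as-is

LEXICON = {              # plugging in (a few of) the words
    'cat': ['reg-noun'],         'cats': ['reg-noun', '-s'],
    'goose': ['irreg-sg-noun'],  'geese': ['irreg-pl-noun'],
}

def noun_ok(word):
    """Accept a word iff its class sequence drives the FSA to an accept state."""
    classes = LEXICON.get(word)
    return classes is not None and nd_recognize(classes, NOUN_FSA, 0, NOUN_ACCEPT)

assert noun_ok('cats') and noun_ok('geese') and not noun_ok('gooses')
```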
Derivational Rules
If everything is an accept state, how do things ever get rejected?
Parsing/Generation vs. Recognition
• We can now run strings through these machines
to recognize strings in the language
• But recognition is usually not quite what we need
 Often if we find some string in the language we might
like to assign a structure to it (parsing)
 Or we might have some structure and we want to
produce a surface form for it (production/generation)
• Example
 From “cats” to “cat +N +PL”
Finite State Transducers
• The simple story
 Add another tape
 Add extra symbols to the transitions
 On one tape we read “cats”, on the other we
write “cat +N +PL”
FSTs
Applications
• The kind of parsing we’re talking about is
normally called morphological analysis
• It can either be
 An important stand-alone component of many applications (spelling correction, information retrieval)
 Or simply a link in a chain of further linguistic analysis
Transitions
c:c   a:a   t:t   +N:ε   +PL:s
• c:c means read a c on one tape and write a c on the other
• +N:ε means read a +N symbol on one tape and write nothing on
the other
• +PL:s means read +PL and write an s
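A sketch of this machine in code, with arcs written as (state, lexical, surface, next) and '' standing for ε. One search routine serves both directions: generation reads the lexical side, parsing reads the surface side (the names are ours; no ε-cycle handling, which this small machine doesn’t need):

```python
ARCS = [
    (0, 'c', 'c', 1), (1, 'a', 'a', 2), (2, 't', 't', 3),
    (3, '+N', '', 4),      # read +N on the lexical tape, write nothing
    (4, '+PL', 's', 5),    # read +PL, write an s
]
ACCEPT = {5}

def transduce(tape, arcs, accept, side=0):
    """Collect every output string for paths that consume `tape` on one side."""
    results, agenda = set(), [(0, 0, [])]
    while agenda:
        state, pos, out = agenda.pop()
        if pos == len(tape) and state in accept:
            results.add(''.join(out))
        for s, lex, surf, nxt in arcs:
            if s != state:
                continue
            read, write = (lex, surf) if side == 0 else (surf, lex)
            new_out = out + [write] if write else out
            if read == '':                        # epsilon: write without reading
                agenda.append((nxt, pos, new_out))
            elif pos < len(tape) and tape[pos] == read:
                agenda.append((nxt, pos + 1, new_out))
    return results

print(transduce(['c', 'a', 't', '+N', '+PL'], ARCS, ACCEPT, side=0))  # {'cats'}
print(transduce(list('cats'), ARCS, ACCEPT, side=1))                  # {'cat+N+PL'}
```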
Ambiguity
• Recall that in non-deterministic recognition
multiple paths through a machine may
lead to an accept state.
• Didn’t matter which path was actually
traversed
• In FSTs the path to an accept state does
matter since different paths represent
different parses and different outputs will
result
Ambiguity
• What’s the right parse (segmentation) for “unionizable”?
 union-ize-able
 un-ion-ize-able
• Each represents a valid path through the
derivational morphology machine.
Ambiguity
• There are a number of ways to deal with
this problem
 Simply take the first output found
 Find all the possible outputs (all paths) and return them all (without choosing)
 Bias the search so that only one or a few likely paths are explored
The Gory Details
• Of course, it’s not as easy as
 “cat +N +PL” <-> “cats”
• As we saw earlier there are geese, mice and
oxen
• But there are also a whole host of
spelling/pronunciation changes that go along
with inflectional changes
 Cats vs Dogs
 Fox and Foxes
Multi-Tape Machines
• To deal with these complications, we will
add more tapes and use the output of one
tape machine as the input to the next
• So to handle irregular spelling changes
we’ll add intermediate tapes with
intermediate symbols
Multi-Level Tape Machines
• We use one machine to transduce between the
lexical and the intermediate level, and another
to handle the spelling changes to the surface
tape
Overall Scheme
Cascades
• This is an architecture that we’ll see again
and again
• Overall processing is divided up into distinct
rewrite steps
• The output of one layer serves as the input to
the next
• The intermediate tapes may or may not wind
up being useful in their own right
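A sketch of the cascade idea over the transduce routine above, feeding every output of one level into the next (the function is ours; symbols are assumed to be single characters here, so intermediate strings can simply be re-listed for the next level):

```python
def cascade(tapes, machines, side=1):
    """Run a pipeline of FSTs; each level re-parses every output of the last."""
    outputs = set(tapes)
    for arcs, accept in machines:
        outputs = {out for t in outputs
                   for out in transduce(list(t), arcs, accept, side)}
    return outputs
```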
Overall Plan
Final Scheme
Conclusion
• Finite state machines provide flexible and
efficient models of words
• Finite state transducers are the method of
choice for morphological analysis
 If there is a solved problem in NLP, this is it!
• Why not use finite state techniques for all
problems in NLP?
Lexical to Intermediate Level
Intermediate to Surface
• The add-an-“e” rule, as in fox^s# <-> foxes#
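A sketch of this spelling rule done with a regular expression rather than a hand-built transducer; ‘^’ is the morpheme boundary, ‘#’ the word boundary, and the context (after x, s, or z, before a plural s) follows the textbook rule:

```python
import re

def e_insertion(intermediate):
    """fox^s# -> foxes#: insert an e between a sibilant and the plural -s."""
    surface = re.sub(r'(?<=[xsz])\^(?=s#)', 'e', intermediate)
    return surface.replace('^', '')   # remaining morpheme boundaries just vanish

print(e_insertion('fox^s#'))   # foxes#
print(e_insertion('cat^s#'))   # cats#
```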
Foxes
Composition
1. Create a set of new states that
correspond to each pair of states from
the original machines (New states are
called (x,y), where x is a state from M1,
and y is a state from M2)
2. Create a new FST transition table for the
new machine according to the following
intuition …
Composition
• There should be a transition between two
states in the new machine if it’s the case
that the output for a transition from a
state from M1, is the same as the input to
a transition from M2 or …
Composition
• δ3((xa, ya), i:o) = (xb, yb) iff
 there exists some c such that
 δ1(xa, i:c) = xb and
 δ2(ya, c:o) = yb
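A sketch of this definition over the arc-list format used for the FST above; new states are (x, y) pairs, and the intermediate symbol c must match. Real composition also has to let one machine move on an ε while the other stays put; that case is ignored here for brevity (the function name is ours):

```python
def compose(arcs1, acc1, arcs2, acc2):
    """delta3((xa, ya), i:o) = (xb, yb) iff for some c,
    delta1(xa, i:c) = xb and delta2(ya, c:o) = yb."""
    arcs3 = [((xa, ya), i, o, (xb, yb))
             for (xa, i, c1, xb) in arcs1
             for (ya, c2, o, yb) in arcs2
             if c1 == c2]                  # the shared intermediate symbol c
    acc3 = {(x, y) for x in acc1 for y in acc2}
    return arcs3, acc3                     # the start state is (start1, start2)
```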