Finite-state automata 2 Day 13 LING 681.02 Computational Linguistics Harry Howard Tulane University Course organization http://www.tulane.edu/~ling/NLP/ NLTK is installed on the computers in this room! How.
Finite-state automata 2
Day 13
LING 681.02
Computational Linguistics
Harry Howard
Tulane University
Course organization
http://www.tulane.edu/~ling/NLP/
NLTK is installed on the computers in this
room!
How would you like to use the Provost's
$150?
23-Sept-2009
LING 681.02, Prof. Howard, Tulane University
SLP §2.2
Finite-state automata
2.2.1 Sheeptalk
Find your files
>>> import sys
>>> sys.path.append("/Users/harryhow/Documents/Work/Research/Sims/NLTK")
Run program
>>> import fsaproc
>>> test = 'baaa!'
>>> test = 'baaa!$'
>>> fsaproc.machine(test)
Go over print-out
Key points
D-recognize is a simple table-driven interpreter.
The algorithm is universal for all unambiguous
regular languages.
To change the machine, you simply change the table.
Crudely, therefore, matching strings with regular
expressions (à la Perl, grep, etc.) is a matter of:
translating the regular expression into a machine (a
table), and
passing the table and the string to an interpreter.
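The table-driven idea can be sketched in a few lines of Python. This is only an illustration, not the course's fsaproc module: the names SHEEPTALK, ACCEPT_STATES and d_recognize are made up here, and the table encodes the sheeptalk language /baa+!/ under the assumption that states are numbered 0 to 4.

```python
# Transition table for sheeptalk /baa+!/ (assumed state numbering 0..4).
# Keys are (state, symbol); a missing key means "no transition".
SHEEPTALK = {
    (0, "b"): 1,
    (1, "a"): 2,
    (2, "a"): 3,
    (3, "a"): 3,
    (3, "!"): 4,
}
ACCEPT_STATES = {4}


def d_recognize(tape, table, accept_states, start=0):
    """Return True if the deterministic machine in `table` accepts `tape`."""
    state = start
    for symbol in tape:
        if (state, symbol) not in table:
            return False           # empty table cell: reject
        state = table[(state, symbol)]
    return state in accept_states  # end of tape: accept only in a final state


print(d_recognize("baaa!", SHEEPTALK, ACCEPT_STATES))  # True
print(d_recognize("ba!", SHEEPTALK, ACCEPT_STATES))    # False
```

Swapping in a different table changes the language recognized; the interpreter itself stays the same.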
Recognition as search
You can view this algorithm as a kind of
state-space search.
States are pairings of tape positions and
state numbers.
The goal state is a pairing with the end of
tape position and a final accept state.
SLP §2.2
Finite-state automata
2.2.2 Formal languages
Generative Formalisms
Formal Languages are sets of strings composed of
symbols from a finite set of symbols.
Finite-state automata define formal languages
(without having to enumerate all the strings in the
language).
The term Generative is based on the view that you
can run the machine as a generator to get strings
from the language.
Generative Formalisms
An FSA can be viewed from two
perspectives, as:
an acceptor that can tell you if a string is in the
language, or
a generator that produces all and only the strings
in the language.
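To make the generator view concrete, here is a minimal sketch that runs a machine "forwards" and lists the strings it defines, up to a length bound (the language itself is infinite). The table is the assumed sheeptalk machine /baa+!/; generate and max_len are names invented for this illustration.

```python
from collections import deque

# Assumed sheeptalk table: (state, symbol) -> next state.
SHEEPTALK = {
    (0, "b"): 1,
    (1, "a"): 2,
    (2, "a"): 3,
    (3, "a"): 3,
    (3, "!"): 4,
}
ACCEPT_STATES = {4}


def generate(table, accept_states, max_len, start=0):
    """Yield every string of length <= max_len in the language, shortest first."""
    queue = deque([(start, "")])          # (current state, string built so far)
    while queue:
        state, prefix = queue.popleft()
        if state in accept_states:
            yield prefix
        if len(prefix) < max_len:
            # Follow every arc that leaves the current state.
            for (src, symbol), dst in sorted(table.items()):
                if src == state:
                    queue.append((dst, prefix + symbol))


print(list(generate(SHEEPTALK, ACCEPT_STATES, 6)))
# ['baa!', 'baaa!', 'baaaa!']
```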
SLP §2.2
Finite-state automata
2.2.4 Determinism
Determinism
A deterministic FSA has one unique thing to
do at each point in processing.
i.e. there are no choices
Non-determinism
Non-determinism cont.
Epsilon transitions
An arc has no symbol on it, represented as ε.
Such a transition does not examine or advance
the tape during recognition.
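One way to see what an ε-arc does is to compute which states are reachable without reading any input. The sketch below is illustrative only: the machine is a hypothetical sheeptalk variant with an ε-arc from state 3 back to state 2, the empty string is assumed to stand for ε, and epsilon_closure is a name chosen here.

```python
EPSILON = ""  # assumption: the empty string stands for ε in this sketch

# Hypothetical machine: 0 -b-> 1 -a-> 2 -a-> 3 -!-> 4, plus 3 -ε-> 2.
NFSA = {
    (0, "b"): {1},
    (1, "a"): {2},
    (2, "a"): {3},
    (3, EPSILON): {2},
    (3, "!"): {4},
}


def epsilon_closure(states, table):
    """Return all states reachable from `states` using only ε-arcs."""
    closure = set(states)
    stack = list(states)
    while stack:
        state = stack.pop()
        for nxt in table.get((state, EPSILON), set()):
            if nxt not in closure:   # the tape position is untouched
                closure.add(nxt)
                stack.append(nxt)
    return closure


print(sorted(epsilon_closure({3}, NFSA)))  # [2, 3]
print(sorted(epsilon_closure({1}, NFSA)))  # [1]
```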
SLP §2.2
Finite-state automata
2.2.5 Use of an NFSA to accept strings
Read on your own
pp. 33-5
SLP §2.2
Finite-state automata
2.2.6 Recognition as search
Non-deterministic
recognition: Search
In an ND FSA there is at least one path
through the machine for a string that is in
the language defined by the machine.
But not all paths through the
machine for an accepted string lead to an
accept state.
No paths through the machine lead to an
accept state for a string not in the language.
Non-deterministic
recognition
So success in non-deterministic recognition
occurs when a path is found through the
machine that ends in an accept state.
Failure occurs when all of the possible paths
for a given string lead to failure.
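The success and failure conditions above can be sketched as a small search over an agenda. This is an illustration under assumptions, not the textbook's ND-RECOGNIZE verbatim: the machine is assumed to be a non-deterministic sheeptalk automaton in which state 2 has two arcs labelled a, it has no ε-arcs, and nd_recognize is a name chosen here.

```python
# Assumed non-deterministic sheeptalk machine: (state, symbol) -> list of next states.
NFSA = {
    (0, "b"): [1],
    (1, "a"): [2],
    (2, "a"): [2, 3],   # the non-deterministic choice
    (3, "!"): [4],
}
ACCEPT_STATES = {4}


def nd_recognize(tape, table, accept_states, start=0):
    """Return True if at least one path through the machine accepts `tape`."""
    agenda = [(start, 0)]              # search states: (machine state, tape position)
    while agenda:
        state, pos = agenda.pop()      # take the most recently added search state
        if pos == len(tape):
            if state in accept_states:
                return True            # one successful path is enough
            continue                   # this path failed; try another
        for nxt in table.get((state, tape[pos]), []):
            agenda.append((nxt, pos + 1))
    return False                       # every possible path led to failure


print(nd_recognize("baaa!", NFSA, ACCEPT_STATES))  # True
print(nd_recognize("ba!", NFSA, ACCEPT_STATES))    # False
```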
Example
[Figure: input tape b a a a ! annotated with the machine states q0, q1, q2, q2, q3, q4]
Key points
States in the search space are pairings of
tape positions and states in the machine.
By keeping track of as yet unexplored
states, a recognizer can systematically
explore all the paths through the machine
given an input.
Ordering of states
But how do you keep track?
Depth-first/last in first out (LIFO)/stack
Unexplored states are added to the front of the
agenda, and they are explored by going to the most
recent.
Breadth-first/first in first out (FIFO)/queue
Unexplored states are added to the back of the
agenda, and they are explored by going to the
oldest (the front of the agenda).
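The two orderings differ only in which end of the agenda the next search state is taken from. The sketch below records the order in which search states are visited under each strategy. It assumes the same hypothetical non-deterministic sheeptalk machine (two a-arcs out of state 2, no ε-arcs); explore and the strategy labels are names invented here. New states are always appended at the end of a deque, so "most recent" is the right end and "oldest" is the left end.

```python
from collections import deque

# Assumed non-deterministic sheeptalk machine: (state, symbol) -> list of next states.
NFSA = {
    (0, "b"): [1],
    (1, "a"): [2],
    (2, "a"): [2, 3],
    (3, "!"): [4],
}
ACCEPT_STATES = {4}


def explore(tape, table, accept_states, strategy, start=0):
    """Return (accepted, visited); visited lists search states in visiting order."""
    agenda = deque([(start, 0)])
    visited = []
    while agenda:
        # Depth-first (LIFO) takes the newest state, breadth-first (FIFO) the oldest.
        state, pos = agenda.pop() if strategy == "depth" else agenda.popleft()
        visited.append((state, pos))
        if pos == len(tape):
            if state in accept_states:
                return True, visited
            continue
        for nxt in table.get((state, tape[pos]), []):
            agenda.append((nxt, pos + 1))
    return False, visited


print(explore("baa!", NFSA, ACCEPT_STATES, "depth")[1])
# [(0, 0), (1, 1), (2, 2), (3, 3), (4, 4)]
print(explore("baa!", NFSA, ACCEPT_STATES, "breadth")[1])
# [(0, 0), (1, 1), (2, 2), (2, 3), (3, 3), (4, 4)]
```

On this input the depth-first run happens to try the successful branch first, while the breadth-first run also visits the dead-end search state (2, 3) before succeeding.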
SLP §2.2
Finite-state automata
2.2.7 Comparison
Equivalence
Non-deterministic machines can be
converted to deterministic ones with a
fairly simple construction.
That means that they have the same power:
non-deterministic machines are not more
powerful than deterministic ones in terms of
the languages they can accept.
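The conversion can be sketched with the standard subset (powerset) construction, presumably the construction alluded to above: each state of the deterministic machine is a set of states of the non-deterministic one. The sketch assumes a machine without ε-arcs (the hypothetical non-deterministic sheeptalk automaton), and determinize is a name chosen for this illustration.

```python
# Assumed non-deterministic sheeptalk machine: (state, symbol) -> set of next states.
NFSA = {
    (0, "b"): {1},
    (1, "a"): {2},
    (2, "a"): {2, 3},
    (3, "!"): {4},
}
NFSA_ACCEPT = {4}


def determinize(table, accept_states, start=0):
    """Subset construction for an NFSA without ε-arcs."""
    alphabet = sorted({symbol for (_, symbol) in table})
    start_set = frozenset({start})
    dfsa = {}                      # (frozenset of states, symbol) -> frozenset
    todo = [start_set]
    seen = {start_set}
    while todo:
        current = todo.pop()
        for symbol in alphabet:
            # Union of everything reachable from any member state on this symbol.
            target = frozenset(
                nxt for state in current for nxt in table.get((state, symbol), ())
            )
            if not target:
                continue           # no arc: leave the table cell empty
            dfsa[(current, symbol)] = target
            if target not in seen:
                seen.add(target)
                todo.append(target)
    # A set-state accepts if it contains at least one original accept state.
    dfsa_accept = {s for s in seen if s & accept_states}
    return dfsa, dfsa_accept, start_set


dfsa, dfsa_accept, dfsa_start = determinize(NFSA, NFSA_ACCEPT)
for (src, symbol), dst in sorted(dfsa.items(), key=lambda kv: (sorted(kv[0][0]), kv[0][1])):
    print(sorted(src), symbol, sorted(dst))
```

In the result every (state, symbol) cell holds at most one target, so the machine can be run by a deterministic table-driven recognizer.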
Why bother?
Non-determinism doesn’t get us more
formal power and it causes headaches, so
why bother?
More natural (understandable) solutions.
Next time
SLP §2.3 briefly
SLP §3