Finite-state automata 2
Day 13
LING 681.02
Computational Linguistics
Harry Howard
Tulane University
Course organization
• http://www.tulane.edu/~ling/NLP/
• NLTK is installed on the computers in this room!
• How would you like to use the Provost's $150?
23-Sept-2009
LING 681.02, Prof. Howard, Tulane University
SLP §2.2
Finite-state automata
2.2.1 Sheeptalk
Find your files
>>> import sys
>>> sys.path.append("/Users/harryhow/Documents/Work/Research/Sims/NLTK")
Run program
>>> import fsaproc
>>> test = 'baaa!'
>>> test = 'baaa!$'
>>> fsaproc.machine(test)
Go over print-out
Key points
• D-recognize is a simple table-driven interpreter.
• The algorithm is universal for all unambiguous regular languages.
• To change the machine, you simply change the table.
• Crudely, therefore, matching strings with regular expressions (à la Perl, grep, etc.) is a matter of:
  - translating the regular expression into a machine (a table) and
  - passing the table and the string to an interpreter.
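
The table-driven idea can be sketched in a few lines of Python. This is not the course's fsaproc module: the dictionary layout keyed by (state, symbol) and the names d_recognize, SHEEPTALK and ACCEPT_STATES are assumptions made for illustration, using the sheeptalk language /baa+!/ from SLP §2.2.1.

```python
# Transition table for sheeptalk /baa+!/ (states 0-4; state 4 is final).
# A missing (state, symbol) entry means "no transition", i.e. reject.
SHEEPTALK = {
    (0, "b"): 1,
    (1, "a"): 2,
    (2, "a"): 3,
    (3, "a"): 3,
    (3, "!"): 4,
}
ACCEPT_STATES = {4}


def d_recognize(tape, table, accept_states, start=0):
    """Return True if the deterministic machine in `table` accepts `tape`."""
    state = start
    for symbol in tape:
        key = (state, symbol)
        if key not in table:
            return False  # no legal move from this state on this symbol
        state = table[key]
    # The whole tape has been read: accept only in a final state.
    return state in accept_states


print(d_recognize("baaa!", SHEEPTALK, ACCEPT_STATES))  # True
print(d_recognize("ba!", SHEEPTALK, ACCEPT_STATES))    # False
```

Only the table and the set of accept states describe the language; the interpreter itself never changes.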
Recognition as search
• You can view this algorithm as a kind of state-space search.
• States are pairings of tape positions and state numbers.
• The goal state is a pairing of the end-of-tape position with a final accept state.
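
To make the search view concrete, the sketch below lists the (tape position, state number) pairs a deterministic recognizer passes through. The function name search_states and the table layout are assumptions for illustration, not part of the course code.

```python
# Sheeptalk /baa+!/ as a (state, symbol) -> next-state table; state 4 is final.
SHEEPTALK = {
    (0, "b"): 1,
    (1, "a"): 2,
    (2, "a"): 3,
    (3, "a"): 3,
    (3, "!"): 4,
}


def search_states(tape, table, start=0):
    """Yield each (tape position, machine state) pair visited while reading `tape`."""
    position, state = 0, start
    yield position, state
    for symbol in tape:
        if (state, symbol) not in table:
            return  # dead end: the search stops here
        state = table[(state, symbol)]
        position += 1
        yield position, state


path = list(search_states("baa!", SHEEPTALK))
print(path)  # [(0, 0), (1, 1), (2, 2), (3, 3), (4, 4)]

# The goal pairs the end-of-tape position with the final state.
goal = (len("baa!"), 4)
print(path[-1] == goal)  # True
```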
SLP §2.2
Finite-state automata
2.2.2 Formal languages
Generative Formalisms
• Formal languages are sets of strings composed of symbols from a finite set of symbols.
• Finite-state automata define formal languages (without having to enumerate all the strings in the language).
• The term generative is based on the view that you can run the machine as a generator to get strings from the language.
Generative Formalisms
• An FSA can be viewed from two perspectives, as:
  - an acceptor that can tell you whether a string is in the language;
  - a generator that produces all and only the strings in the language.
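
The generator perspective can be sketched by walking the same kind of table outward from the start state. Sheeptalk is an infinite language, so this sketch stops at an assumed length bound; the names generate and max_length are illustrative only.

```python
from collections import deque

# Sheeptalk /baa+!/ as a (state, symbol) -> next-state table; state 4 is final.
SHEEPTALK = {
    (0, "b"): 1,
    (1, "a"): 2,
    (2, "a"): 3,
    (3, "a"): 3,
    (3, "!"): 4,
}
ACCEPT_STATES = {4}


def generate(table, accept_states, max_length, start=0):
    """Yield every string of at most `max_length` symbols that the machine accepts."""
    agenda = deque([("", start)])  # (string built so far, current state)
    while agenda:
        prefix, state = agenda.popleft()
        if state in accept_states:
            yield prefix
        if len(prefix) < max_length:
            # Follow every arc that leaves the current state.
            for (source, symbol), target in table.items():
                if source == state:
                    agenda.append((prefix + symbol, target))


print(list(generate(SHEEPTALK, ACCEPT_STATES, 6)))
# ['baa!', 'baaa!', 'baaaa!']
```

The same table serves both perspectives: the acceptor consumes symbols along the arcs, while the generator emits them.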
SLP §2.2
Finite-state automata
2.2.4 Determinism
Determinism
• A deterministic FSA has one unique thing to do at each point in processing.
  - i.e., there are no choices.
Non-determinism
Non-determinism cont.
• Epsilon transitions
  - An arc has no symbol on it, represented as ε.
  - Such a transition does not examine or advance the tape during recognition.
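
One way to see what "does not examine or advance the tape" means is to compute which states can be reached for free, without reading anything. The sketch below assumes a sheeptalk machine with an ε-arc from state 3 back to state 2, and uses the empty string as the ε marker; both choices, and the name epsilon_closure, are assumptions for illustration.

```python
EPSILON = ""  # assumed marker for an arc with no symbol on it

# Assumed non-deterministic sheeptalk machine: the loop is an epsilon arc 3 -> 2.
NFA = {
    (0, "b"): {1},
    (1, "a"): {2},
    (2, "a"): {3},
    (3, EPSILON): {2},
    (3, "!"): {4},
}


def epsilon_closure(states, table):
    """Return all states reachable from `states` using only epsilon arcs."""
    closure = set(states)
    stack = list(states)
    while stack:
        state = stack.pop()
        for target in table.get((state, EPSILON), set()):
            if target not in closure:
                closure.add(target)
                stack.append(target)
    return closure


print(sorted(epsilon_closure({3}, NFA)))  # [2, 3]
print(sorted(epsilon_closure({1}, NFA)))  # [1]
```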
SLP §2.2
Finite-state automata
2.2.5 Use of a nFSA to accept strings
Read on your own
• pp. 33-35
SLP §2.2
Finite-state automata
2.2.6 Recognition as search
Non-deterministic recognition: Search
• In an ND FSA there is at least one path through the machine for a string that is in the language defined by the machine.
• But not all paths through the machine for an accepted string lead to an accept state.
• No path through the machine leads to an accept state for a string not in the language.
Non-deterministic recognition
• So success in non-deterministic recognition occurs when a path is found through the machine that ends in an accept state.
• Failure occurs when all of the possible paths for a given string lead to failure.
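
These two conditions can be sketched as an agenda-based recognizer. The machine below is an assumed non-deterministic sheeptalk FSA in which state 2 has two arcs on "a" (loop, or move on to state 3); the name nd_recognize and the table layout are illustrative, and ε-arcs are left out to keep the sketch short.

```python
# Assumed non-deterministic sheeptalk machine; each entry maps to a SET of targets.
NFA = {
    (0, "b"): {1},
    (1, "a"): {2},
    (2, "a"): {2, 3},  # the non-deterministic choice
    (3, "!"): {4},
}
ACCEPT_STATES = {4}


def nd_recognize(tape, table, accept_states, start=0):
    """Return True if at least one path through the machine accepts `tape`."""
    agenda = [(0, start)]  # unexplored (tape position, state) pairs
    while agenda:
        position, state = agenda.pop()
        if position == len(tape):
            if state in accept_states:
                return True  # success: one accepting path is enough
            continue  # this path failed; try another
        for target in table.get((state, tape[position]), set()):
            agenda.append((position + 1, target))
    return False  # failure: every possible path has been tried


print(nd_recognize("baaa!", NFA, ACCEPT_STATES))  # True
print(nd_recognize("ba!", NFA, ACCEPT_STATES))    # False
```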
Example
[Figure: input tape and state sequence]
Tape:   b  a  a  a  !  \
States: q0 q1 q2 q2 q3 q4
Key points
• States in the search space are pairings of tape positions and states in the machine.
• By keeping track of as-yet-unexplored states, a recognizer can systematically explore all the paths through the machine given an input.
Ordering of states
• But how do you keep track?
  - Depth-first / last in, first out (LIFO) / stack: unexplored states are added to the front of the agenda, and they are explored by going to the most recent.
  - Breadth-first / first in, first out (FIFO) / queue: unexplored states are added to the back of the agenda, and they are explored by going to the oldest, at the front.
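
The difference between the two orderings shows up in the order in which search states are visited. In this sketch the recognizer always explores the front of the agenda; only the end at which new states are added changes. The function name explore, the strategy flag, and the machine are assumptions for illustration.

```python
from collections import deque

# Assumed non-deterministic sheeptalk machine (two arcs on "a" out of state 2).
NFA = {
    (0, "b"): {1},
    (1, "a"): {2},
    (2, "a"): {2, 3},
    (3, "!"): {4},
}
ACCEPT_STATES = {4}


def explore(tape, table, accept_states, strategy, start=0):
    """Return the (position, state) pairs in the order visited, up to the first goal."""
    agenda = deque([(0, start)])
    visited = []
    while agenda:
        position, state = agenda.popleft()  # always explore the front of the agenda
        visited.append((position, state))
        if position == len(tape):
            if state in accept_states:
                break  # goal reached
            continue
        targets = sorted(table.get((state, tape[position]), ()))
        successors = [(position + 1, target) for target in targets]
        if strategy == "depth":
            agenda.extendleft(reversed(successors))  # LIFO: add to the front
        else:
            agenda.extend(successors)  # FIFO: add to the back
    return visited


print(explore("baaa!", NFA, ACCEPT_STATES, "depth"))
# [(0, 0), (1, 1), (2, 2), (3, 2), (4, 2), (4, 3), (5, 4)]
print(explore("baaa!", NFA, ACCEPT_STATES, "breadth"))
# [(0, 0), (1, 1), (2, 2), (3, 2), (3, 3), (4, 2), (4, 3), (5, 4)]
```

Depth-first follows one choice as far as it goes before backing up, while breadth-first finishes every state at one tape position before moving to the next.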
SLP §2.2
Finite-state automata
2.2.7 Comparison
Equivalence
• Non-deterministic machines can be converted to deterministic ones with a fairly simple construction.
• That means that they have the same power: non-deterministic machines are not more powerful than deterministic ones in terms of the languages they can accept.
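
The conversion is usually done with the subset construction, in which each state of the deterministic machine stands for a set of states of the non-deterministic one. The sketch below assumes a machine without ε-arcs; the names determinize and run, and the example machine, are illustrative assumptions.

```python
# Assumed non-deterministic sheeptalk machine (two arcs on "a" out of state 2).
NFA = {
    (0, "b"): {1},
    (1, "a"): {2},
    (2, "a"): {2, 3},
    (3, "!"): {4},
}
ACCEPT_STATES = {4}


def determinize(table, accept_states, alphabet, start=0):
    """Subset construction for an NFA without epsilon arcs."""
    start_set = frozenset([start])
    dfa_table = {}
    seen = {start_set}
    agenda = [start_set]
    while agenda:
        current = agenda.pop()
        for symbol in alphabet:
            # Union of everything any member state can reach on this symbol.
            target = frozenset(
                t for state in current for t in table.get((state, symbol), ())
            )
            if not target:
                continue  # no arc on this symbol
            dfa_table[(current, symbol)] = target
            if target not in seen:
                seen.add(target)
                agenda.append(target)
    # A set-state is final if it contains at least one final NFA state.
    dfa_accept = {s for s in seen if s & accept_states}
    return dfa_table, dfa_accept, start_set


def run(tape, dfa_table, dfa_accept, start_set):
    """Deterministic recognition over the constructed table."""
    state = start_set
    for symbol in tape:
        state = dfa_table.get((state, symbol))
        if state is None:
            return False
    return state in dfa_accept


dfa_table, dfa_accept, start_set = determinize(NFA, ACCEPT_STATES, "ba!")
print(run("baaa!", dfa_table, dfa_accept, start_set))  # True
print(run("ba!", dfa_table, dfa_accept, start_set))    # False
```

For this machine the construction yields five set-states, with {2, 3} absorbing the choice that state 2 had to make on "a".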
Why bother?
• Non-determinism doesn’t get us more formal power and it causes headaches, so why bother?
• More natural (understandable) solutions.
Next time
SLP §2.3 briefly
SLP §3