Transcript PARSING

PARSING WITH
CONTEXT-FREE GRAMMARS
cc437
PARSING


Parsing is the process of recognizing and
assigning STRUCTURE
Parsing a string with a CFG:
–
–
Finding a derivation of the string consistent with
the grammar
The derivation gives us a PARSE TREE
EXAMPLE (CFR LAST WEEK)
PARSING AS SEARCH


Just as in the case of non-deterministic
regular expressions, the main problem with
parsing is the existence of CHOICE POINTS
There is a need for a SEARCH STRATEGY
determining the order in which alternatives
are considered
TOP-DOWN AND BOTTOM-UP
SEARCH STRATEGIES


The search has to be guided by the INPUT
and the GRAMMAR
TOP-DOWN search: the parse tree has to
be rooted in the start symbol S
–

EXPECTATION-DRIVEN parsing
BOTTOM-UP search: the parse tree must be
an analysis of the input
–
DATA-DRIVEN parsing
AN EXAMPLE OF TOP-DOWN SEARCH
(IN PARALLEL)
AN EXAMPLE OF BOTTOM-UP
SEARCH
NON-PARALLEL SEARCH

If it’s not possible to examine all alternatives
in parallel, it’s necessary to make further
decisions:
–
–
–
Which node in the current search space to
expand first (breadth-first or depth-first)
Which of the applicable grammar rules to expand
first
Which leaf node in a parse tree to expand next
(e.g., leftmost)
TOP-DOWN, DEPTH-FIRST,
LEFT-TO-RIGHT
TOP-DOWN, DEPTH-FIRST,
LEFT-TO-RIGHT (II)
TOP-DOWN, DEPTH-FIRST,
LEFT-TO-RIGHT (III)
TOP-DOWN, DEPTH-FIRST,
LEFT-TO-RIGHT (IV)
A T-D, D-F, L-R PARSER
TOP-DOWN vs

TOP-DOWN:
–
–
–

BOTTOM-UP
Only search among grammatical answers
BUT: suggests hypotheses that may not be
consistent with data
Problem: left-recursion
BOTTOM-UP:
–
–
Only forms hypotheses consistent with data
BUT: may suggest hypotheses that make no
sense globally
LEFT-RECURSION


A LEFT-RECURSIVE grammar may cause a
T-D, D-F, L-R parser to never return
Examples of left-recursive rules:
–
–
–
NP  NP PP
S  S and S
But also:


NP  Det Nom
Det  NP’s
THE PROBLEM WITH
LEFT-RECURSION
LEFT-RECURSION: POOR SOLUTIONS

Rewrite the grammar to a weakly equivalent
one
–

Problem: may not get correct parse tree
Limit the depth during search
–
Problem: limit is arbitrary
LEFT-CORNER PARSING


A hybrid of top-down and bottom-up parsing
Strategy: don’t consider any expansion
unless the current input can serve as the
LEFT-CORNER of that expansion
FURTHER PROBLEMS IN PARSING

Ambiguity
–
Church and Patel (1982): the number of
attachment ambiguities grows like the Catalan
numbers


C(2) = 2, C(3) = 5, C(4) = 14, C(5) = 132, C(6) = 469,
C(7) = 1430, C(8) = 4867
Avoiding reparsing
COMMON STRUCTURAL AMBIGUITIES

COORDINATION ambiguity
–

OLD (MEN AND WOMEN) vs
(OLD MEN) AND WOMEN
ATTACHMENT ambiguity:
–
Gerundive VP attachment ambiguity

–
I saw the Eiffel Tower flying to Paris
PP attachment ambiguity

I shot an elephant in my pajamas
PP ATTACHMENT AMBIGUITY
AMBIGUITY: SOLUTIONS


Use a PROBABILISTIC GRAMMAR (not
covered in this module)
Use semantics
AVOID RECOMPUTING INVARIANTS

Consider parsing with a top-down parser the
NP:
–

A flight from Indianapolis to Houston on TWA
With the grammar rules:
–
–
–
NP  Det Nominal
NP  NP PP
NP  ProperNoun
INVARIANTS AND TOP-DOWN
PARSING
THE EARLEY ALGORITHM
DYNAMIC PROGRAMMING



A standard T-D parser would reanalyze A
FLIGHT 4 times, always in the same way
A DYNAMIC PROGRAMMING algorithm
uses a table (the CHART) to avoid repeating
work
The Earley algorithm also
–
–
Does not suffer from the left-recursion problem
Solves an exponential problem in O(n3)
THE CHART

The Earley algorithm uses a table (the CHART) of
size N+1, where N is the length of the input
–

Each entry in the chart is a list of
–
–
–

Table entries sit in the `gaps’ between words
Completed constituents
In-progress constituents
Predicted constituents
All three types of objects are represented in the
same way as STATES
THE CHART:
GRAPHICAL REPRESENTATION
STATES

A state encodes two types of information:
–
–
–

How much of a certain rule has been encountered
in the input
Which positions are covered
A  , [X,Y]
DOTTED RULES
–
–
–
VP  V NP 
NP  Det  Nominal
S   VP
EXAMPLES
SUCCESS

The parser has succeeded if entry N+1 of the
chart contains the state
–
S   , [0,N]
THE ALGORITHM

The algorithm loops through the input without
backtracking, at each step performing three
operations:
–
–
–
PREDICTOR: add predictions to the chart
COMPLETER: Move the dot to the right when
looked-for constituent is found
SCANNER: read in the next input word
THE ALGORITHM: CENTRAL LOOP
EARLEY ALGORITHM:
THE THREE OPERATORS
EXAMPLE, AGAIN
EXAMPLE:
BOOK THAT FLIGHT
EXAMPLE:
BOOK THAT FLIGHT (II)
EXAMPLE:
BOOK THAT FLIGHT (III)
EXAMPLE:
BOOK THAT FLIGHT (IV)
READINGS

Jurafsky and Martin, chapter 10.1-10.4