CYK Parsing Method

The CYK Parsing Method
Chiyo Hotani
Tanya Petrova
CL2 Parsing Course
28 November, 2007
Overview
 CYK Recognition with CF grammar
Basic Algorithm
Problems: unit rules, ε-rules
Recognition with a grammar in CNF
 CYK Parsing with CNF
Parsing with CNF
Recognition Table
 Chart Parsing
 Summary
Advantages and Disadvantages
Other remarks
Basic Algorithm of CYK Recognition (1)
Example Grammar:
A grammar describing numbers in scientific notation
Input: 32.5e+1
Basic Algorithm of CYK Recognition (2)
Digit -> 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Sign -> + | -
derivations of substrings of length 1
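The length-1 step can be illustrated with a minimal sketch. It assumes only the two terminal rules shown on this slide; the names `TERMINAL_RULES` and `length_one_sets` are hypothetical and chosen for illustration. Symbols such as `.` and `e` get an empty set here because, in this grammar, they only appear inside longer right-hand sides.

```python
# Terminal rules of the example grammar, as listed on the slide.
TERMINAL_RULES = {
    "Digit": set("0123456789"),
    "Sign": {"+", "-"},
}


def length_one_sets(sentence):
    """For each input symbol, collect the non-terminals that produce it directly."""
    return [
        {nt for nt, terminals in TERMINAL_RULES.items() if symbol in terminals}
        for symbol in sentence
    ]


sentence = "32.5e+1"
sets = length_one_sets(sentence)
for symbol, nonterminals in zip(sentence, sets):
    print(repr(symbol), sorted(nonterminals))
```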
Basic Algorithm of CYK Recognition (3)
Number_S -> Integer | Real
Integer -> Digit | Integer Digit
Digit -> 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
derivations of substrings of length 1
 Unit rule: a rule of the form A -> B, where A and B are
non-terminals. Chains of unit rules can occur in a
derivation.
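Because unit rules can form chains, a set of non-terminals found for a substring has to be extended repeatedly until nothing new is added. The following is a minimal sketch of that closure, assuming the unit rules of the example grammar (the start symbol is written `Number` here); `UNIT_RULES` and `close_under_unit_rules` are hypothetical names.

```python
# Unit rules of the example grammar: left-hand side -> set of single non-terminal right-hand sides.
UNIT_RULES = {
    "Number": {"Integer", "Real"},
    "Integer": {"Digit"},
}


def close_under_unit_rules(nonterminals, unit_rules=UNIT_RULES):
    """Add every A with a unit rule A -> B where B is already in the set, until stable."""
    result = set(nonterminals)
    changed = True
    while changed:
        changed = False
        for lhs, rhs_set in unit_rules.items():
            if lhs not in result and rhs_set & result:
                result.add(lhs)
                changed = True
    return result


# A single digit derives from Digit, hence from Integer, hence from Number.
print(sorted(close_under_unit_rules({"Digit"})))
```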
Basic Algorithm of CYK Recognition (4)
Number_S -> Integer | Real
Integer -> Digit | Integer Digit
Fraction -> . Integer
Scale -> e Sign Integer | Empty
Basic Algorithm of CYK Recognition (5)
Number_S -> Integer | Real
Real -> Integer Fraction Scale
Number does indeed derive 32.5e+1.
Basic Algorithm of CYK Recognition (6)
ε-rules
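The non-terminals that can derive the empty string can be found with a simple fixed-point computation. The sketch below assumes the example grammar from the earlier slides plus a rule `Empty -> ε`, which is implied by the slides but not printed in them; the dictionary layout and the name `nullable_nonterminals` are illustrative choices.

```python
# Example grammar; each right-hand side is a list of symbols.
# ASSUMPTION: "Empty" has a single empty right-hand side (Empty -> ε).
GRAMMAR = {
    "Number": [["Integer"], ["Real"]],
    "Integer": [["Digit"], ["Integer", "Digit"]],
    "Real": [["Integer", "Fraction", "Scale"]],
    "Fraction": [[".", "Integer"]],
    "Scale": [["e", "Sign", "Integer"], ["Empty"]],
    "Digit": [[d] for d in "0123456789"],
    "Sign": [["+"], ["-"]],
    "Empty": [[]],
}


def nullable_nonterminals(grammar):
    """Return the non-terminals that derive the empty string."""
    nullable = set()
    changed = True
    while changed:
        changed = False
        for lhs, alternatives in grammar.items():
            if lhs in nullable:
                continue
            # A right-hand side derives ε if all of its symbols do (vacuously true for []).
            if any(all(sym in nullable for sym in rhs) for rhs in alternatives):
                nullable.add(lhs)
                changed = True
    return nullable


print(sorted(nullable_nonterminals(GRAMMAR)))
```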
Basic Algorithm of CYK Recognition (7)
 R_ε = { Empty, Scale } (the non-terminals that derive ε)
 Sentence: z = z_1 z_2 ... z_n
 s_{i,l} = z_i z_{i+1} ... z_{i+l-1}: the substring of z
starting at position i, of length l
 R_{s_{i,l}}: the set of non-terminals
deriving the substring s_{i,l}
A graphical presentation of substrings
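As a small illustration of the substring notation, the sketch below maps the 1-based position i and length l onto Python slicing. The helper names `substring` and `all_substrings` are hypothetical.

```python
def substring(z, i, l):
    """Return the substring of z that starts at 1-based position i and has length l."""
    return z[i - 1 : i - 1 + l]


def all_substrings(z):
    """Yield (i, l, substring) for every non-empty substring, shortest first."""
    n = len(z)
    for l in range(1, n + 1):
        for i in range(1, n - l + 2):
            yield i, l, substring(z, i, l)


z = "32.5e+1"
print(substring(z, 2, 4))            # the substring starting at position 2, of length 4
print(len(list(all_substrings(z))))  # n * (n + 1) / 2 substrings in total
```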
CYK recognition with a grammar in CNF
Required restrictions:
Eliminate ε-rules and unit rules
Limit the maximum length of the RHS of a
rule to 2
CNF
No ε-rules and no unit rules
All rules have one of the following two forms:
A -> a
A -> B C
Our example grammar in CNF
CYK Parsing with CNF
Building the recognition table
Input :
Our example grammar in CNF
input sentence: 32.5 e + 1
CYK Parsing with the CNF
Bottom row: read directly from the
grammar (rules of the form A -> a)
Two ways to compute R_{s_{i,l}}:
check each right-hand side
compute possible right-hand sides from
the recognition table
How this is done
Example: 2.5e (= s_{2,4})
1) Checking the right-hand side N1 Scale´:
N1 is not in R_{s_{2,1}} or R_{s_{2,2}}
N1 is a member of R_{s_{2,3}}
But Scale´ is not a member of R_{s_{5,1}}
2) R_{s_{2,4}} is the set of non-terminals that
have a right-hand side A B where either:
A in R_{s_{2,1}} and B in R_{s_{3,3}}
A in R_{s_{2,2}} and B in R_{s_{4,2}}
A in R_{s_{2,3}} and B in R_{s_{5,1}}
Possible combinations: N1 T2 or Number T2
Our grammar has no such right-hand side, so nothing is added to R_{s_{2,4}}.
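The second method can be sketched for this one cell. The transcript does not include the CNF grammar itself, so the binary rules below are an assumption: a reconstruction chosen to be consistent with the non-terminal names used in the example (N1, T2, Scale´, written `Scale'` with an ASCII apostrophe). The already-computed table entries are likewise filled in by hand under that assumption, and `combine` is a hypothetical helper name.

```python
from itertools import product

# ASSUMED binary rules of the CNF grammar: (B, C) -> set of A with A -> B C.
BINARY_RULES = {
    ("Integer", "Digit"): {"Number", "Integer"},
    ("Integer", "Fraction"): {"Number", "N1"},
    ("N1", "Scale'"): {"Number"},
    ("T1", "Integer"): {"Fraction"},
    ("N2", "Integer"): {"Scale'"},
    ("T2", "Sign"): {"N2"},
}

# Table entries needed for the substring at position 2 of length 4,
# keyed by (i, l) and entered by hand (ASSUMED values).
TABLE = {
    (2, 1): {"Number", "Integer", "Digit"},  # "2"
    (2, 2): set(),                           # "2."
    (2, 3): {"Number", "N1"},                # "2.5"
    (3, 3): set(),                           # ".5e"
    (4, 2): set(),                           # "5e"
    (5, 1): {"T2"},                          # "e"
}


def combine(i, l, table, rules=BINARY_RULES):
    """Try every split of the substring (i, l); return candidate pairs and derived non-terminals."""
    candidates, result = set(), set()
    for k in range(1, l):
        for pair in product(table[(i, k)], table[(i + k, l - k)]):
            candidates.add(pair)
            result |= rules.get(pair, set())
    return candidates, result


candidates, result = combine(2, 4, TABLE)
print(sorted(candidates))  # [('N1', 'T2'), ('Number', 'T2')]
print(result)              # set()
```

Only the last split produces candidate pairs, and neither pair is a right-hand side in the assumed grammar, so the cell stays empty.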
Recognition table
[Figure: recognition table, indexed by start position i and substring length l]
As a result we find out that:
This process is much less complicated
than the one we saw before
Reasons
• We do not have to repeat the process again and
again until no new non-terminals are added to
R_{s_{i,l}}
(The substrings we are dealing with
are proper substrings and cannot be equal to the
string we start with)
• We only have to find one place where the
substring must be split into two, because every
such rule has the form A -> B C
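Filling the whole recognition table bottom-up can be sketched as follows. The CNF grammar is not part of the transcript, so the rules below are an assumed reconstruction consistent with the non-terminal names used in the slides (Scale´ is written `Scale'`); `build_table` and `recognizes` are hypothetical names. Each cell is computed once, from strictly shorter substrings, by trying every split point.

```python
from itertools import product

# ASSUMED CNF grammar. Terminal rules: symbol -> non-terminals A with A -> symbol.
TERMINAL_RULES = {
    **{d: {"Number", "Integer", "Digit"} for d in "0123456789"},
    ".": {"T1"},
    "e": {"T2"},
    "+": {"Sign"},
    "-": {"Sign"},
}

# Binary rules: (B, C) -> non-terminals A with A -> B C.
BINARY_RULES = {
    ("Integer", "Digit"): {"Number", "Integer"},
    ("Integer", "Fraction"): {"Number", "N1"},
    ("N1", "Scale'"): {"Number"},
    ("T1", "Integer"): {"Fraction"},
    ("N2", "Integer"): {"Scale'"},
    ("T2", "Sign"): {"N2"},
}


def build_table(sentence):
    """Return a dict mapping (i, l) to the non-terminals deriving that substring."""
    n = len(sentence)
    table = {}
    # Bottom row: read directly from the terminal rules.
    for i in range(1, n + 1):
        table[(i, 1)] = set(TERMINAL_RULES.get(sentence[i - 1], set()))
    # Longer substrings: combine two shorter, already computed cells.
    for l in range(2, n + 1):
        for i in range(1, n - l + 2):
            cell = set()
            for k in range(1, l):  # k is the length of the left part
                for pair in product(table[(i, k)], table[(i + k, l - k)]):
                    cell |= BINARY_RULES.get(pair, set())
            table[(i, l)] = cell
    return table


def recognizes(sentence, start="Number"):
    """True if the start symbol derives the whole sentence."""
    if not sentence:
        return False
    return start in build_table(sentence)[(1, len(sentence))]


print(recognizes("32.5e+1"))
```

A dictionary keyed by (i, l) keeps the code close to the notation of the slides; a triangular list of lists would work equally well.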
Chart Parsing
A chart is just a recognition table.
A short retrospective of CYK
First: recognition table using the
original grammar.
Then: transforming grammar to CNF.
A short retrospective of CYK cont.
CNF is useful for improving the
efficiency, but it is actually a bit too
restrictive
Disadvantage of CNF:
Resulting recognition table lacks the
information we need to construct a
derivation using the original grammar!
A short retrospective of CYK cont.
In the transformation process, some
non-terminals were thrown away
(non-productive)
Missing information could be added.
A short retrospective of CYK cont.
Result: almost the same recognition
table.
Extra information on non-terminals
Obtained in a simpler and much more
efficient way.
Thank you
for your attention! 