The CYK Parsing Method
Chiyo Hotani
Tanya Petrova
CL2 Parsing Course
28 November, 2007
Overview
CYK Recognition with CF grammar
Basic Algorithm
Problems: unit rules, ε-rules
Recognition with a grammar in CNF
CYK Parsing with CNF
Parsing with CNF
Recognition Table
Chart Parsing
Summary
Advantages and Disadvantages
Other remarks
Basic Algorithm of CYK Recognition (1)
Example Grammar:
A grammar describing numbers in scientific notation
Input: 32.5e+1
Basic Algorithm of CYK Recognition (2)
Digit -> 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Sign -> + | -
derivations of substrings of length 1
Basic Algorithm of CYK Recognition (3)
Number -> Integer | Real
Integer -> Digit | Integer Digit
Digit -> 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
derivations of substrings of length 1
Unit rule: a rule of the form A -> B, where A and B are
non-terminals. A derivation can contain chains of
unit rules.
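The effect of unit-rule chains on a length-1 substring can be sketched as a small fixed-point loop. This is only an illustration, not the slides' own code: the dictionary `UNIT_RULES` and the function name `close_under_unit_rules` are assumptions, and only the unit rules visible on the slides are encoded.

```python
# Unit rules visible on the slides: Number -> Integer | Real, Integer -> Digit.
# The names UNIT_RULES and close_under_unit_rules are illustrative assumptions.
UNIT_RULES = {
    "Number": ["Integer", "Real"],
    "Integer": ["Digit"],
}

def close_under_unit_rules(nonterminals):
    """Add every A that has a unit rule A -> B with B already in the set."""
    result = set(nonterminals)
    changed = True
    while changed:  # repeat until no new non-terminal is added
        changed = False
        for lhs, targets in UNIT_RULES.items():
            if lhs not in result and any(t in result for t in targets):
                result.add(lhs)
                changed = True
    return result

# A single digit is derived by Digit; the chain then adds Integer and Number.
print(sorted(close_under_unit_rules({"Digit"})))  # ['Digit', 'Integer', 'Number']
```

The loop has to run more than once because Number only becomes derivable after Integer has been added.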
Basic Algorithm of CYK Recognition (4)
Number -> Integer | Real
Integer -> Digit | Integer Digit
Fraction -> . Integer
Scale -> e Sign Integer | Empty
Basic Algorithm of CYK Recognition (5)
Number -> Integer | Real
Real -> Integer Fraction Scale
Number does indeed derive 32.5e+1.
Basic Algorithm of CYK Recognition (6)
ε-rules
Basic Algorithm of CYK Recognition (7)
R_ε = { Empty, Scale } (the non-terminals that derive ε)
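One way to arrive at this set is to iterate over the rules until nothing changes. The sketch below assumes the example grammar as it can be pieced together from the slides; the rule Empty -> ε is an assumption (the slides use Empty without spelling the rule out), and `GRAMMAR` and `nullable_nonterminals` are illustrative names.

```python
# Example grammar pieced together from the slides; [] is the empty right-hand side.
# Assumption: Empty -> ε, which the slides imply but do not show.
GRAMMAR = {
    "Number":   [["Integer"], ["Real"]],
    "Integer":  [["Digit"], ["Integer", "Digit"]],
    "Real":     [["Integer", "Fraction", "Scale"]],
    "Fraction": [[".", "Integer"]],
    "Scale":    [["e", "Sign", "Integer"], ["Empty"]],
    "Digit":    [[d] for d in "0123456789"],
    "Sign":     [["+"], ["-"]],
    "Empty":    [[]],
}

def nullable_nonterminals(grammar):
    """Return the set of non-terminals that can derive the empty string."""
    nullable = set()
    changed = True
    while changed:
        changed = False
        for lhs, alternatives in grammar.items():
            if lhs in nullable:
                continue
            # all() over an empty right-hand side is True, which covers Empty -> ε
            if any(all(sym in nullable for sym in rhs) for rhs in alternatives):
                nullable.add(lhs)
                changed = True
    return nullable

print(sorted(nullable_nonterminals(GRAMMAR)))  # ['Empty', 'Scale']
```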
sentence: z = z_1 z_2 ... z_n
s_{i,l}: the substring of z starting at
position i, of length l:
s_{i,l} = z_i z_{i+1} ... z_{i+l-1}
R_{s_{i,l}}: the set of non-terminals
deriving the substring s_{i,l}
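The indexing convention is easy to get wrong by one, so here is a minimal sketch of it. The helper name `substring` is an assumption; positions are 1-based as in the notation above.

```python
def substring(z, i, l):
    """Return the substring of z that starts at 1-based position i and has length l."""
    if i < 1 or l < 0 or i + l - 1 > len(z):
        raise ValueError("substring out of range")
    return z[i - 1 : i - 1 + l]

z = "32.5e+1"
print(substring(z, 2, 4))       # 2.5e
print(substring(z, 1, len(z)))  # 32.5e+1, the whole sentence
```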
A graphical presentation of substrings
CYK recognition with a grammar in CNF
Required restrictions:
Eliminate ε-rules and unit rules
Limit the right-hand side of every
rule to at most 2 symbols
CNF
No ε-rules and no unit rules
All rules have one of the following two forms
(a is a terminal; A, B and C are non-terminals):
A -> a
A -> B C
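The two allowed rule forms can be checked mechanically. The following is a minimal sketch under assumed conventions: a grammar is a dictionary from a non-terminal to a list of right-hand sides, and `is_cnf` is an illustrative name.

```python
def is_cnf(grammar, nonterminals):
    """True if every rule is A -> a (one terminal) or A -> B C (two non-terminals)."""
    for alternatives in grammar.values():
        for rhs in alternatives:
            is_terminal_rule = len(rhs) == 1 and rhs[0] not in nonterminals
            is_binary_rule = len(rhs) == 2 and all(s in nonterminals for s in rhs)
            if not (is_terminal_rule or is_binary_rule):
                return False
    return True

# Fragment of the original grammar: Integer -> Digit is a unit rule, so not CNF.
original = {"Integer": [["Digit"], ["Integer", "Digit"]], "Digit": [["0"], ["1"]]}
# The same fragment with the unit rule replaced by the terminals it leads to.
converted = {"Integer": [["0"], ["1"], ["Integer", "Digit"]], "Digit": [["0"], ["1"]]}

print(is_cnf(original, set(original)))    # False
print(is_cnf(converted, set(converted)))  # True
```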
Our example grammar in CNF
CYK Parsing with CNF
Building the recognition table
Input :
Our example grammar in CNF
input sentence: 32.5 e + 1
CYK Parsing with the CNF
bottom row: read directly from the
grammar (rules of the form A -> a)
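Filling the bottom row can be sketched as a lookup per input symbol. The CNF grammar itself is not reproduced in this transcript, so the terminal rules below are an assumption: they follow from eliminating the unit rules of the original grammar, and the name T1 (for the decimal point) is hypothetical, chosen by analogy with T2 on the slides.

```python
DIGITS = "0123456789"

# Assumed terminal rules (A -> a) of the CNF grammar; T1 is a hypothetical name.
TERMINAL_RULES = {
    "Number": set(DIGITS),
    "Integer": set(DIGITS),
    "Digit": set(DIGITS),
    "T1": {"."},
    "T2": {"e"},
    "Sign": {"+", "-"},
}

def bottom_row(sentence):
    """For each input symbol, collect the non-terminals A with a rule A -> symbol."""
    return [
        {lhs for lhs, terminals in TERMINAL_RULES.items() if ch in terminals}
        for ch in sentence
    ]

for ch, cell in zip("32.5e+1", bottom_row("32.5e+1")):
    print(ch, sorted(cell))
```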
Two ways to compute R_{s_{i,l}}:
check each right-hand side
compute possible right-hand sides from
the recognition table
How this is done
Example: 2.5e ( = s_{2,4})
1) Checking the right-hand side N1 Scale´:
N1 is not in R_{s_{2,1}} or R_{s_{2,2}}
N1 is a member of R_{s_{2,3}}
But Scale´ is not a member of R_{s_{5,1}}
2) R_{s_{2,4}} is the set of non-terminals that
have a right-hand side A B where either:
A in R_{s_{2,1}} and B in R_{s_{3,3}}
A in R_{s_{2,2}} and B in R_{s_{4,2}}
A in R_{s_{2,3}} and B in R_{s_{5,1}}
Possible combinations: N1 T2 or Number T2
Our grammar has no such right-hand side, so nothing is added to R_{s_{2,4}}.
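The second method (combining entries already in the table) can be sketched as a complete recognizer. The CNF grammar is not reproduced in this transcript, so the rule set below is an assumption reconstructed from the original grammar and from the names N1, T2 and Scale´ that the slides mention; T1 and N2 are hypothetical names, and Scale´ is written `Scale'` in the code.

```python
DIGITS = "0123456789"

# Assumed CNF grammar (not shown in the transcript); T1 and N2 are hypothetical names.
TERMINAL_RULES = {
    "Number": set(DIGITS), "Integer": set(DIGITS), "Digit": set(DIGITS),
    "T1": {"."}, "T2": {"e"}, "Sign": {"+", "-"},
}
BINARY_RULES = {
    "Number":   [("Integer", "Digit"), ("N1", "Scale'"), ("Integer", "Fraction")],
    "N1":       [("Integer", "Fraction")],
    "Integer":  [("Integer", "Digit")],
    "Fraction": [("T1", "Integer")],
    "Scale'":   [("N2", "Integer")],
    "N2":       [("T2", "Sign")],
}

def recognition_table(z):
    """Map (i, l) to the set of non-terminals deriving the substring at 1-based i, length l."""
    n = len(z)
    table = {}
    for i in range(1, n + 1):  # bottom row: rules A -> a
        table[(i, 1)] = {a for a, ts in TERMINAL_RULES.items() if z[i - 1] in ts}
    for l in range(2, n + 1):  # longer substrings, shortest first
        for i in range(1, n - l + 2):
            cell = set()
            for k in range(1, l):  # split into a left part of length k and the rest
                left, right = table[(i, k)], table[(i + k, l - k)]
                for a, alternatives in BINARY_RULES.items():
                    if any(b in left and c in right for b, c in alternatives):
                        cell.add(a)
            table[(i, l)] = cell
    return table

table = recognition_table("32.5e+1")
print(sorted(table[(2, 3)]))       # ['N1', 'Number'] for "2.5"
print(sorted(table[(2, 4)]))       # [] for "2.5e"
print("Number" in table[(1, 7)])   # True: the whole input is recognized
```

Each cell is computed once from shorter substrings only, so no cell ever has to be revisited.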
Recognition table
(rows: substring length l; columns: start position i)
As a result we find out that:
This process is much less complicated
than the one we saw before
Reasons
• We do not have to repeat the process again and
again until no new non-terminals are added to
R_{s_{i,l}}
(the substrings we are dealing with
are really substrings and cannot be equal to the
string we start with)
• We only have to find one place where the
substring must be split in two, because every
non-terminal rule has the form A -> B C
Chart Parsing
A chart is just a recognition table.
A short retrospective of CYK
First: recognition table using the
original grammar.
Then: transforming grammar to CNF.
A short retrospective of CYK cont.
CNF is useful for improving the
efficiency, but it is actually a bit too
restrictive
Disadvantage of CNF:
Resulting recognition table lacks the
information we need to construct a
derivation using the original grammar!
A short retrospective of CYK cont.
In the transformation process, some
non-terminals were thrown away
(non-productive)
Missing information could be added.
A short retrospective of CYK cont.
Result: almost the same recognition
table.
Extra information on non-terminals
Obtained in a simpler and much more
efficient way.
Thank you
for your attention!