GRAMMAR & PARSING (Syntactic Analysis)

NLP- WEEK 4
SYNTACTIC STRUCTURE

To compute the syntactic structure of a sentence, one must consider TWO things:
– GRAMMAR = a formal specification of the structures allowable in a language
– PARSING Technique = the method of analysing a sentence to determine its structure according to the grammar
TREE Representation


The most common method to represent how a sentence is broken into its major subparts, & how these subparts are broken up in turn, is using a TREE.
Eg: Fatin ate the papaya.
LIST notation:
(S (NP (NAME Fatin))
   (VP (V ate)
       (NP (ART the)
           (N papaya))))
* Show the corresponding tree structure (Fig 3.1 pg 42, Allen)
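The LIST notation maps directly onto nested data. The following is a minimal sketch, assuming a tree is stored as a nested Python tuple of the form (label, child, ...) with words as plain strings; the function names `to_list_notation` and `words` are illustrative, not from Allen.

```python
# A tree as nested tuples: (label, child, child, ...); a word is a plain string.
tree = ("S",
        ("NP", ("NAME", "Fatin")),
        ("VP", ("V", "ate"),
               ("NP", ("ART", "the"), ("N", "papaya"))))

def to_list_notation(node):
    """Render a tree in bracketed LIST notation."""
    if isinstance(node, str):
        return node
    label, *children = node
    return "(" + " ".join([label] + [to_list_notation(c) for c in children]) + ")"

def words(node):
    """Collect the words at the bottom of the tree, left to right."""
    if isinstance(node, str):
        return [node]
    return [w for child in node[1:] for w in words(child)]

print(to_list_notation(tree))
print(" ".join(words(tree)))
```

Reading the words off the bottom of the tree from left to right gives back the original sentence.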
Tree Representation : Terminology


Trees = a special form of GRAPH
Structures consisting of:
– NODES (eg. labeled as S, NP)
– LINKS (connecting lines/arrows)
– ROOT (the node at the top; it dominates all other nodes)
– LEAVES (the nodes at the bottom)
– A LINK points from a PARENT node to a CHILD node
– Every CHILD node has a UNIQUE PARENT
– A PARENT node may point to MANY CHILD nodes
– An ANCESTOR of a node N is defined as N's parent, or the parent of its parent, and so on
– A node is DOMINATED by its ancestor nodes
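The terminology above can be sketched in code. This is only an illustration, assuming a hypothetical `Node` class in which each node keeps a pointer to its unique parent; the method names are invented for the example.

```python
class Node:
    """A tree node with one parent and any number of children."""
    def __init__(self, label, children=()):
        self.label = label
        self.parent = None              # every child has a unique parent
        self.children = list(children)  # a parent may have many children
        for child in self.children:
            child.parent = self

    def ancestors(self):
        """The parent, the parent's parent, and so on up to the root."""
        found, node = [], self.parent
        while node is not None:
            found.append(node)
            node = node.parent
        return found

    def dominates(self, other):
        """A node dominates every node that has it as an ancestor."""
        return any(a is self for a in other.ancestors())

    def leaves(self):
        """The nodes at the bottom of the tree, left to right."""
        if not self.children:
            return [self]
        return [leaf for c in self.children for leaf in c.leaves()]

papaya = Node("papaya")
root = Node("S", [
    Node("NP", [Node("NAME", [Node("Fatin")])]),
    Node("VP", [Node("V", [Node("ate")]),
                Node("NP", [Node("ART", [Node("the")]), Node("N", [papaya])])]),
])

print([a.label for a in papaya.ancestors()])
print(root.dominates(papaya))
print([leaf.label for leaf in root.leaves()])
```

The root has no parent, so its list of ancestors is empty, and it dominates every other node in the tree.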
CONSTRUCT a TREE Structure


To construct a tree structure for a sentence, one MUST know what structures are legal for English.
A set of REWRITE rules:
– describes what tree structures are allowable.
– says that a certain symbol may be expanded in the tree by a sequence of other symbols.
– Example rules: Grammar 3.2 (Allen, pg 42)
– Grammars consisting entirely of rules with a single symbol on the LHS (called the MOTHER) = Context Free Grammars (CFGs).
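Rewrite rules can be written down as plain data. The sketch below assumes a hypothetical rule set chosen to cover "Fatin ate the papaya" (it is not a copy of Allen's Grammar 3.2) and checks whether every expansion in a tree is allowed by some rule. The names `RULES`, `is_context_free` and `licensed` are illustrative.

```python
# Hypothetical rewrite rules as (LHS, RHS) pairs; an assumption for illustration.
RULES = [
    ("S", ["NP", "VP"]),
    ("VP", ["V", "NP"]),
    ("NP", ["NAME"]),
    ("NP", ["ART", "N"]),
    ("NAME", ["Fatin"]),
    ("V", ["ate"]),
    ("ART", ["the"]),
    ("N", ["papaya"]),
]

def is_context_free(rules):
    """Context free: every rule has exactly one symbol on its LHS."""
    return all(isinstance(lhs, str) and len(lhs.split()) == 1 for lhs, _ in rules)

def licensed(tree, rules):
    """True if each node and its children match the LHS and RHS of some rule."""
    if isinstance(tree, str):          # a word needs no further expansion
        return True
    label, *children = tree
    child_labels = [c if isinstance(c, str) else c[0] for c in children]
    return (label, child_labels) in rules and all(licensed(c, rules) for c in children)

good = ("S", ("NP", ("NAME", "Fatin")),
             ("VP", ("V", "ate"), ("NP", ("ART", "the"), ("N", "papaya"))))
bad = ("S", ("VP", ("V", "ate")))      # no rule rewrites S as VP alone

print(is_context_free(RULES), licensed(good, RULES), licensed(bad, RULES))
```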
CFGs

A very important class of grammars, for two reasons:
1. The formalism is powerful enough to describe most of the structure in natural languages.
2. Yet it is restricted enough that efficient parsers can be built to analyze sentences.
Terminology cont.




Symbols that cannot be further decomposed in a grammar = TERMINAL symbols (namely the words).
The other symbols, such as S, VP, NP = NONTERMINAL symbols.
The grammatical symbols, such as N and V, that describe word categories = LEXICAL symbols.
Some words will be listed under multiple categories.
Eg: the word "can" would be listed under both V and N.
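These three kinds of symbols can be read off a rule set mechanically. The following is a rough sketch over a hypothetical rule list (an assumption for illustration) in which "can" is entered under both V and N; it treats a symbol as lexical when one of its rules rewrites it directly as a single word.

```python
from collections import defaultdict

# Hypothetical rules as (LHS, RHS) pairs; "can" is listed under both V and N.
RULES = [
    ("S", ["NP", "VP"]),
    ("VP", ["V", "NP"]),
    ("NP", ["NAME"]),
    ("NP", ["ART", "N"]),
    ("NAME", ["Fatin"]),
    ("V", ["ate"]),
    ("V", ["can"]),
    ("ART", ["the"]),
    ("N", ["papaya"]),
    ("N", ["can"]),
]

# Symbols that appear on some LHS can be decomposed further.
nonterminals = {lhs for lhs, _ in RULES}
# Symbols that only ever appear on an RHS are the words.
terminals = {sym for _, rhs in RULES for sym in rhs if sym not in nonterminals}

# Lexicon: each word mapped to the set of categories it is listed under.
lexicon = defaultdict(set)
for lhs, rhs in RULES:
    if len(rhs) == 1 and rhs[0] in terminals:
        lexicon[rhs[0]].add(lhs)
lexical = set().union(*lexicon.values())

print(sorted(terminals))
print(sorted(nonterminals - lexical))   # the phrase-level symbols
print(sorted(lexical))
print(sorted(lexicon["can"]))
```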
Grammars and Parsing


Grammars have a special symbol called the START symbol ( = S).
A grammar is said to DERIVE a sentence if there is a sequence of rules that allows you to rewrite the start symbol into the sentence.
DERIVATIONS

Two important processes are based on derivations:
1. Sentence Generation: uses derivations to construct legal sentences.
2. Parsing: identifies the structure of sentences given a grammar.
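Sentence generation can be sketched as repeatedly rewriting the leftmost symbol, starting from S. The grammar below is a hypothetical one (an assumption, not Allen's), and because it has no recursive rules the set of derivable sentences is finite; `generate` is an illustrative name.

```python
# Hypothetical grammar: each nonterminal maps to its alternative right-hand sides.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "VP": [["V", "NP"]],
    "NP": [["NAME"], ["ART", "N"]],
    "NAME": [["Fatin"]],
    "V": [["ate"]],
    "ART": [["the"]],
    "N": [["papaya"]],
}

def generate(symbols):
    """Yield every word sequence derivable from the given list of symbols."""
    if not symbols:
        yield []
        return
    first, rest = symbols[0], symbols[1:]
    if first not in GRAMMAR:               # a word: keep it and carry on
        for tail in generate(rest):
            yield [first] + tail
    else:                                   # a nonterminal: try each rewrite
        for rhs in GRAMMAR[first]:
            yield from generate(rhs + rest)

sentences = [" ".join(s) for s in generate(["S"])]
for sentence in sentences:
    print(sentence)
```

Every sentence produced is legal according to the grammar, although not necessarily sensible: the grammar also derives "the papaya ate Fatin".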
SEARCHING TECHNIQUES
Two basic methods of searching:
1. A Top-down Strategy: start with the S symbol and then search through different ways to rewrite the symbols until the input sentence is generated, or until all possibilities have been explored.
2. A Bottom-up Strategy: start with the words in the sentence and use the rewrite rules backward to reduce the sequence of symbols until it consists solely of S. A sequence of symbols matching the RHS of a rule is replaced by the symbol on its LHS.
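Both strategies can be sketched as simple backtracking searches. This is a minimal illustration over a hypothetical grammar (an assumption, not Allen's), and it only answers yes or no rather than building a tree. The top-down version as written would loop forever on a left-recursive rule such as NP -> NP PP, which this grammar does not contain.

```python
# Hypothetical grammar: each nonterminal maps to its alternative right-hand sides.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "VP": [["V", "NP"]],
    "NP": [["NAME"], ["ART", "N"]],
    "NAME": [["Fatin"]],
    "V": [["ate"]],
    "ART": [["the"]],
    "N": [["papaya"]],
}

def top_down(symbols, words):
    """Rewrite symbols left to right, starting from S, until they match the words."""
    if not symbols:
        return not words                    # success only if all words are used
    first, rest = symbols[0], symbols[1:]
    if first in GRAMMAR:                    # nonterminal: try every rewrite
        return any(top_down(rhs + rest, words) for rhs in GRAMMAR[first])
    # terminal: it must equal the next input word
    return bool(words) and words[0] == first and top_down(rest, words[1:])

def bottom_up(symbols, seen=None):
    """Use the rules backward: replace a matched RHS by its LHS until only S is left."""
    seen = set() if seen is None else seen
    key = tuple(symbols)
    if key == ("S",):
        return True
    if key in seen:                         # this sequence was already explored
        return False
    seen.add(key)
    for lhs, alternatives in GRAMMAR.items():
        for rhs in alternatives:
            n = len(rhs)
            for i in range(len(symbols) - n + 1):
                if symbols[i:i + n] == rhs:
                    reduced = symbols[:i] + [lhs] + symbols[i + n:]
                    if bottom_up(reduced, seen):
                        return True
    return False

sentence = "Fatin ate the papaya".split()
print(top_down(["S"], sentence), bottom_up(sentence))
```

Both functions explore the same grammar from opposite ends: `top_down` starts from the symbol S and works toward the words, while `bottom_up` starts from the words and works toward S.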