一般性演講

Download Report

Transcript 一般性演講

Context Free Grammar (CFG): Specification for
Structures & Constituency

Parse Tree: graphical representation of structure





root node (S): a sentencial level structure
internal nodes: constituents of the sentence
arcs: relationship between parent nodes and their children (constituents)
terminal nodes: surface forms of the input symbols (e.g., words)
alternative representation: bracketed notation:


e.g., [I saw [the [girl [in [the park]]]]]
For example:
NP
NP
PP
NP
girl
in the
park
Jing-Shin Chang
1
Parse Tree: “I saw the girl in the park”
S
NP
VP
NP
NP
NP
PP
NP
pron
v
det
n
p
det
I
saw
the
girl
in
the park
Jing-Shin Chang
n
2
CFG: Components

CFG: formal specification of parse trees






G = {, N, P, S}
: terminal symbols
N: non-terminal symbols
P: production rules
S: start symbol
: terminal symbols

the input symbols of the language





pre-terminal: parts of speech (when words are regarded as terminals)
N: non-terminal symbols


programming language: tokens (reserved words, variables, operators, …)
natural languages: words or parts of speech
groups of terminals and/or other non-terminals
S: start symbol: the largest constituent of a parse tree
P: production (re-writing) rules



form: α → β (α: non-terminal, β: string of terminals and non-terminals)
meaning: α re-writes to (“consists of”, “derived into”)β, or βreduced to α
start with “S-productions” (S → β)
Jing-Shin Chang
3
CFG: Example Grammar

Grammar Rules





S → NP VP
NP → Pron | Proper-Noun | Det Norm
Norm → Noun Norm | Noun
VP → Verb | Verb NP | Verb NP PP | Verb PP
PP → Prep NP





S: sentence, NP: noun phrase, VP: verb phrase
Pron: pronoun
Det: determiner, Norm: Norminal
PP: prepositional phrase, Prep: preposition
Lexicon (in CFG form)






Noun → girl | park | desk
Verb → like | want | is | saw | walk
Prep → by | in | with | for
Det → the | a | this | these
Pron → I | you | he | she | him
Proper-Noun → IBM | Microsoft | Berkeley
Jing-Shin Chang
4
CFG: Accepted Languages

CFG Operations




derivation: applying a production rule to re-write the LHS non-terminal into
its constituents
rightmost derivation: a sequence of derivations in which the rightmost nonterminal is always re-write first
leftmost derivation: leftmost non-terminal first
Context-Free Language

Language accepted by a CFG

L(G) = {w | S =*=> w (strings of terminals that can be derived from start
symbol)}
Jing-Shin Chang
5
CFG: Expressive Power

CFG vs. Regular Expression (R.E.)




every R.E. can be recognized by a FSA
every FSA can be represented by a CFG with production rules of the
form: A -> a B | ε
therefore, L(RE) < L(CFG)
Writing a CFG for a FSA (RE)




define a non-terminal Ni for a state with state number i
start symbol S = N0 (assuming that state 0 is the initial state)
for each transition δ(i,a)=j (from state i to stet j on input alphabet a),
add a new production Ni -> a Nj to P
for each final state i, add a new production Ni -> εto P
Jing-Shin Chang
6
CFG: Expressive Power (cont.)

Writing a CFG for a FSA (RE)





define a non-terminal Ni for a state with state number i
start symbol S = N0 (assuming that state 0 is the initial state)
for each transition δ(i,a)=j (from state i to stet j on input alphabet a),
add a new production Ni -> a Nj to P
for each final state i, add a new production Ni -> εto P
For example: RE: (a|b)* a b b
a
0
a
b
1
b
2
b
Jing-Shin Chang
3
S -> a S | b S | a N1
N1 -> b N2
N2 -> b N3
N3 -> ε
7
CFG: Expressive Power (cont.)

Chomsky Hierarchy:




R.E.: regular set (FSA)
CFG: context-free (pushdown automata)
CSG: context-sensitive (linear bounded automata)
unrestricted: recursively enumerable (Tuning Machine)
Jing-Shin Chang
8
CFG: Equivalence

Chomsky Normal Form (CNF) (Chmosky, 1963):


ε-free, and
Every production rule is in either of the following form:






A -> A1 A2
A -> a (A1, A2: non-terminal, a: terminal)
two non-terminals or one terminal at the RHS
generate binary tree
good simplification for some algorithms (e.g., grammar training with
the inside-outside algorithm (Baker 1979))
Every CFG can be converted into a weakly equivalent CNF

equivalence: L(G1) = L(G2)



strong equivalent: assign the same phrase structure to each sentence
(except for renaming non-terminals)
weak equivalent: do not assign the same phrase structure to each sentence
e.g., A -> B C D == {A -> B X, X -> CD}
Jing-Shin Chang
9
CFG vs. Finite-State Machine

Inappropriateness of FAS



Constituents
Recursion
RTN (Recursive Transition Network)



FSA with augmentation of recursion
arc: terminal or non-terminal
if arc is non-terminal: call to a sub-transition network & return upon
traversal
Jing-Shin Chang
10
CFG for English

Sentence Level Constructions




Declarative (直述句): NP (Subject) VP
Imperative (命令句): VP
Yes-No Questions: Aux NP VP
WH-Questions: Wh-NP VP
Jing-Shin Chang
11
CFG for English

Noun Phrase


Head Noun
Modifiers:


Pre-nominal Modifiers:



pre-determiner: “all”
determiner: “the”
post-determiner: (ordinal) (cardinal) (quantifier) (ADJP)




pre-nominal (pre-head) and post-nominal (post-head)
ordinal: “first”,”second”,”next”
cardinal: “two”, ”three”
quantifier: “many”, “several”
Post-nominal Modifiers:



PP: prepositional phrase
non-finite clauses: VP(+ing), VP(+ed), VP(to-V) forms
relative clauses: restrictive, non-restrictive


the man whose son lives in NY (restrictive)
the man, whose son lives in NY (non-restrictive)
Jing-Shin Chang
12
CFG for English

Coordination (同位語, 對等連接詞,…)




conjunction (conj): and, or, but
X → X conj X
a big source of ambiguity: X can be almost anything
Comparison with Mathematic Operators


(left/right) association: ((a + b) + c), ( a ** (b ** c) )
(high/low) precedence: a + b x c : (a + b) x c, a + (b * c)
Jing-Shin Chang
13
CFG for English

Agreement

Subject-Verb (or Aux. Verb): person & number



I like her
He likes her
Gender Agreement (German or French): ADJ-Noun, Det-Noun
Jing-Shin Chang
14
CFG for English

Verb Phrases & Subcategorization

not every verb is compatible with every verb phrases




+ NP, +NP-NP, +to-V, +Ving…
e.g., transitive (Vt), intransitive (Vi)
subcat. frame for a verb: possible set of complements
CFG for SUBCAT Problems




Solution
2:
VP → verb
VP → verb NP
VP → verb S
Solution 1:
VP → v1
VP → v2 NP
VP → v3 S



v1 → disappear | …
v2 → find | leave | repeat
v3 → think | believe | say
verb
Jing-Shin Chang
→ disappear | … | find |… | think
15
CFG for English

Auxiliaries





modal: “can”, “may”, “must”, “will” +V(stem)
perfect: “have” +V(pp)
progressive: “be” +V(ing)
passive: “be” +V(past)
Multiple Auxiliaries

modal < perfect < progressive < passive




modal perfect: “could have been …”
modal passive: “will be married …”
perfect progressive: “have been feasting …”
modal perfect passive: “might have been prevented …”
Jing-Shin Chang
16