translationRule.ppt

What’s in a translation rule?
Paper by Galley, Hopkins, Knight & Marcu
Presentation By: Behrang Mohit
Problem
• The problem of syntax in SMT
• Yamada & Knight (2001) had transformations like child-reorderings
– Address the SOV vs. VSO orders
– Do not address all the syntactic movements
• English adverbs: The government simply says …
• ne … pas
Three Alternatives
• Abandon Syntax
– Evidence: Koehn et al. 2003
• Abandon English Syntax
– Learn grammar from parallel corpus
• Wu (1997): ITG: binary branching rules
• Use English syntax to learn transformation rules from a parallel corpus, using larger fragments of the English tree structure.
A Theory of Word Alignment
• Generative process
– Source string to target tree (symbol tree)
– Derivation step: replaces a substring of the source string with a subtree of the target tree.
– Derivation: a sequence of derivation steps.
Three Alternative Derivations
Replacing and Creating
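• Each source element is replaced at exactly one step of the derivation
• Each node of the target tree is created at exactly one step of the derivation
• Replaced(s, D): the step of derivation D at which source element s is replaced
– Replaced(va, D) = 2
• Created(t, D): the step of derivation D at which target node t is created
– Created(AUX, D) = 3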
Word Alignment
• Alignment: A relation between leaves of the target tree (t) and elements of the source string (s):
– s and t are aligned iff Replaced(s, D) = Created(t, D)
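The alignment induced by a derivation can be sketched in a few lines. This is only an illustration: the sentence pair ("il ne va pas" / "he does not go"), the four-step derivation, and the names `Step` and `induced_alignment` are assumptions made for this example, chosen so that Replaced(va, D) = 2 and Created(AUX, D) = 3 as on the slide. Node and word labels are assumed to be unique.

```python
from typing import Dict, List, Set, Tuple

# One derivation step: (source elements it replaces, target nodes it creates).
Step = Tuple[List[str], List[str]]


def induced_alignment(steps: List[Step], leaves: List[str]) -> Set[Tuple[str, str]]:
    """Pair source element s with target leaf t iff Replaced(s, D) == Created(t, D)."""
    replaced: Dict[str, int] = {}
    created: Dict[str, int] = {}
    for number, (sources, targets) in enumerate(steps, start=1):
        for s in sources:
            replaced[s] = number  # each source element is replaced exactly once
        for t in targets:
            created[t] = number  # each target node is created exactly once
    return {(s, t) for s in replaced for t in leaves if replaced[s] == created[t]}


# Hypothetical derivation (an assumption of this sketch, not taken from the slides).
DERIVATION: List[Step] = [
    (["il"], ["NP", "PRP", "he"]),                        # step 1
    (["va"], ["VB", "go"]),                               # step 2
    (["ne", "pas"], ["VP", "AUX", "does", "RB", "not"]),  # step 3
    ([], ["S"]),                                          # step 4
]
LEAVES = ["he", "does", "not", "go"]

induced = induced_alignment(DERIVATION, LEAVES)
```

In this example "ne" and "pas" are each aligned to both "does" and "not", because all four are handled in step 3.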
“Good Derivations”
• Input: source string, target tree, word alignments
• Good derivations: the set $\delta_A(S, T)$ of derivations that induce a superalignment of the given word alignment A.
– Derivations 1 & 3
Derivations → Rules
• ne VB pas
• NP VP
• Task: given T, S and A, learn $\rho_A(S, T)$, the set of rules used in any derivation $D \in \delta_A(S, T)$
• What about inferring complex rules?
Alignment Graph
• Target tree, augmented with the source strings
• Span of nodes
• Frontier set
• Frontier graph fragment: root and all sinks are in the frontier set
– Spans of the sinks form a partition of the span of the root.
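A minimal sketch of how spans and the frontier set might be computed. It assumes the following definitions, which the slide does not spell out: the span of a node is the set of source positions reachable from it; the complement span is the union of the spans of all nodes that are neither its ancestors nor its descendants; and a node is in the frontier set when the contiguous closure of its span does not overlap its complement span. The example tree and alignment ("il ne va pas" / "he does not go"), the function names, and the decision to exclude nodes with an empty span are assumptions of this sketch.

```python
from typing import Dict, List, Set

# Target tree as child lists; labels are assumed unique (hypothetical example).
CHILDREN: Dict[str, List[str]] = {
    "S": ["NP", "VP"], "NP": ["PRP"], "PRP": ["he"],
    "VP": ["AUX", "RB", "VB"], "AUX": ["does"], "RB": ["not"], "VB": ["go"],
}
# Source positions: il=0, ne=1, va=2, pas=3. "does" is left unaligned.
LEAF_ALIGN: Dict[str, Set[int]] = {"he": {0}, "does": set(), "not": {1, 3}, "go": {2}}


def spans(children, leaf_align, root) -> Dict[str, Set[int]]:
    """Span of each node: source positions reachable from it."""
    span: Dict[str, Set[int]] = {}

    def visit(node: str) -> Set[int]:
        result = set(leaf_align.get(node, set()))
        for kid in children.get(node, []):
            result |= visit(kid)
        span[node] = result
        return result

    visit(root)
    return span


def frontier_set(children, leaf_align, root) -> Set[str]:
    """Nodes whose span closure does not overlap their complement span."""
    span = spans(children, leaf_align, root)
    frontier: Set[str] = set()

    def visit(node: str, complement: Set[int]) -> None:
        own = span[node]
        if own:  # assumption: nodes with an empty span are excluded
            closure = set(range(min(own), max(own) + 1))
            if not (closure & complement):
                frontier.add(node)
        kids = children.get(node, [])
        for kid in kids:
            siblings: Set[int] = set()
            for other in kids:
                if other != kid:
                    siblings |= span[other]
            # Complement span of a child: the parent's plus its siblings' spans.
            visit(kid, complement | siblings)

    visit(root, set())
    return frontier


FRONTIER = frontier_set(CHILDREN, LEAF_ALIGN, "S")
```

Here RB and "not" fall outside the frontier set: their span {1, 3} closes to {1, 2, 3}, which overlaps position 2, covered by VB.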
Transformation process
• Input: Place the sinks in the order defined by the partition.
• Output: Replace each sink node with a variable corresponding to its position in the input, then take the tree part of the fragment.
• These rules are in $\rho_A(S, T)$
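The transformation can be illustrated with a small sketch that turns one frontier graph fragment into an input/output pair. The tuple encoding of the tree, the function name `fragment_to_rule`, the 1-based variable names `x1`, `x2`, … and the two example fragments are assumptions of this sketch. Sinks are either tree nodes (keyed by label, with the first source position of their span) or source words (keyed by position); labels are assumed unique within a fragment.

```python
from typing import Dict, Tuple

Tree = Tuple[str, list]  # (label, children); a leaf has an empty child list


def fragment_to_rule(tree: Tree,
                     node_sinks: Dict[str, int],
                     word_sinks: Dict[int, str]) -> Tuple[str, str]:
    """Return (input, output) for one fragment.

    node_sinks maps a sink tree node's label to the first source position
    of its span; word_sinks maps a source position to a source word.
    """
    # Input side: all sinks in source order.
    sinks = sorted(
        [(pos, label, True) for label, pos in node_sinks.items()]
        + [(pos, word, False) for pos, word in word_sinks.items()]
    )
    input_side = " ".join(name for _, name, _ in sinks)
    # A tree-node sink becomes a variable named after its input position.
    variable = {
        name: "x%d" % index
        for index, (_, name, is_node) in enumerate(sinks, start=1)
        if is_node
    }

    def render(node: Tree) -> str:
        label, kids = node
        if label in variable:
            return variable[label]
        if not kids:
            return label
        return "%s(%s)" % (label, " ".join(render(k) for k in kids))

    return input_side, render(tree)


# Hypothetical fragments for "il ne va pas" / "he does not go".
VP_FRAGMENT: Tree = ("VP", [("AUX", [("does", [])]), ("RB", [("not", [])]), ("VB", [])])
S_FRAGMENT: Tree = ("S", [("NP", []), ("VP", [])])

vp_rule = fragment_to_rule(VP_FRAGMENT, {"VB": 2}, {1: "ne", 3: "pas"})
s_rule = fragment_to_rule(S_FRAGMENT, {"NP": 0, "VP": 1}, {})
```

Under these assumptions `vp_rule` has input "ne VB pas" and `s_rule` has input "NP VP".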
Rule Extraction Algorithm
• Search the space of graph fragments for frontier graph fragments (FGF).
– Searching all fragments is exponential
• The frontier set (FS) can be found in linear time
• For each node (n) in the FS, there is a unique minimal FGF rooted at n.
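One way to picture the minimal fragment rooted at a frontier node: expand downward from the node and stop at the first frontier node on each path. The sketch below shows only the tree part of each fragment (source-word sinks are omitted). The example tree, the hard-coded frontier set, and the function names are assumptions made for illustration.

```python
from typing import Dict, List, Set, Tuple

Fragment = Tuple[str, list]  # (label, child fragments); a sink has no children

# Hypothetical target tree for "he does not go"; labels assumed unique.
CHILDREN: Dict[str, List[str]] = {
    "S": ["NP", "VP"], "NP": ["PRP"], "PRP": ["he"],
    "VP": ["AUX", "RB", "VB"], "AUX": ["does"], "RB": ["not"], "VB": ["go"],
}
# Frontier set assumed to have been computed already for this example.
FRONTIER: Set[str] = {"S", "NP", "PRP", "he", "VP", "VB", "go"}


def minimal_fragment(children: Dict[str, List[str]],
                     frontier: Set[str],
                     root: str) -> Fragment:
    """Expand from root, cutting at the first frontier node on each path."""

    def expand(node: str, is_root: bool) -> Fragment:
        if node in frontier and not is_root:
            return (node, [])  # a frontier node below the root is a sink
        return (node, [expand(kid, False) for kid in children.get(node, [])])

    return expand(root, True)


def all_minimal_fragments(children: Dict[str, List[str]],
                          frontier: Set[str]) -> Dict[str, Fragment]:
    """One minimal fragment per frontier node."""
    return {node: minimal_fragment(children, frontier, node) for node in frontier}


MINIMAL = all_minimal_fragments(CHILDREN, FRONTIER)
```

Each non-frontier node ends up inside exactly one minimal fragment, so the total work over all frontier nodes stays proportional to the size of the tree.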
Expanding from minimal fragments
• Compose a new frontier graph fragment by merging two of the minimal fragments
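Merging can be sketched as substituting one minimal fragment at the matching sink of another. The tuple encoding, the function name `compose`, and the two example fragments are assumptions of this sketch; labels are assumed unique, so a childless node whose label equals the child fragment's root is taken to be the sink to fill.

```python
from typing import Tuple

Fragment = Tuple[str, list]  # (label, child fragments); a sink has no children


def compose(parent: Fragment, child: Fragment) -> Fragment:
    """Replace the sink of `parent` labelled like `child`'s root with `child`."""
    label, kids = parent
    if not kids and label == child[0]:
        return child
    return (label, [compose(kid, child) for kid in kids])


# Hypothetical minimal fragments for "he does not go".
S_MIN: Fragment = ("S", [("NP", []), ("VP", [])])
VP_MIN: Fragment = ("VP", [("AUX", [("does", [])]), ("RB", [("not", [])]), ("VB", [])])

# A larger fragment spanning two levels of the tree.
S_VP = compose(S_MIN, VP_MIN)
```

The composed fragment still has NP and VB as sinks, so it remains a fragment whose root and sinks all come from the frontier set.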
Experiments
• French-English (Hansard)
– Human alignments
– GIZA++ alignments
• Chinese-English (FBIS)
– GIZA++ alignments (trained on huge corpus)
• Issue: Coverage of the extracted rules.
– Percentage of the parse trees in the corpus that can be transformed by the translation rules.
Coverage of the model
• Number of expansions
– Single: Yamada & Knight 2001
– 17 to 43 expansions for full coverage
Alignment
Lang Diffs
Another example of multi-level reordering
Conclusion
• Previous work: child-node reordering
• This model looks at larger tree fragments
• Translation rules are both syntactically and lexically motivated.
• The rule extraction algorithm can deal with alignment and systematic parsing errors.
• Next step: defining a probability distribution over the rules → Decoding
Explanatory power of the model