Discriminative Modeling of Extraction Sets for Machine Translation
Authors: John DeNero and Dan Klein
Presenter: Justin Chiu
UC Berkeley
Contribution
Extraction sets
◦ Nested collections of all the overlapping phrase pairs consistent with an underlying word alignment
Advantages over word-factored alignment models
◦ Can incorporate features on phrase pairs, not just word links
◦ Optimizes an extraction-based loss function directly tied to the rules used in translation
Performs better than both supervised and unsupervised baselines
Progress of Statistical MT
Generate translated sentences word by word
Use whole fragments of training examples to build translation rules
◦ Align at the word level
◦ Extract fragment-level rules from word-aligned sentence pairs (see the sketch below)
Tree-to-string translation
Extraction set models
◦ Set of all overlapping phrasal translation rules + alignment
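As a concrete picture of that rule-extraction step, here is a minimal Python sketch (my illustration, not code from the paper; words are 0-indexed, the alignment is a set of (src, tgt) link pairs, and max_len is an arbitrary phrase-length limit):

```python
def extract_phrase_pairs(alignment, src_len, max_len=4):
    """Return all bispans ((i1, i2), (j1, j2)) consistent with the alignment.

    A bispan is consistent when it contains at least one link and no link
    connects a word inside the span to a word outside it.
    """
    pairs = []
    for i1 in range(src_len):
        for i2 in range(i1, min(i1 + max_len, src_len)):
            # Target positions linked to the source span [i1, i2].
            tgt = [j for (i, j) in alignment if i1 <= i <= i2]
            if not tgt:
                continue
            j1, j2 = min(tgt), max(tgt)
            if j2 - j1 >= max_len:
                continue
            # Reject if any target word in [j1, j2] links outside [i1, i2].
            if any(j1 <= j <= j2 and not i1 <= i <= i2
                   for (i, j) in alignment):
                continue
            pairs.append(((i1, i2), (j1, j2)))
    return pairs

# Toy example: a two-word sentence pair with a monotone alignment.
print(extract_phrase_pairs({(0, 0), (1, 1)}, src_len=2))
# [((0, 0), (0, 0)), ((0, 1), (0, 1)), ((1, 1), (1, 1))]
```

Because every consistent bispan is kept, the extracted pairs overlap and nest rather than partitioning the sentence, which is exactly the "nested collections" structure an extraction set captures. (A full extractor would also extend spans over unaligned boundary words; this sketch omits that.)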
Outline
Extraction Set Models
Model Estimation
Model Inference
Experiments
EXTRACTION SET MODELS
Extraction Set Models
Input
◦ Unaligned sentence pair
Output
◦ Extraction set of phrasal translation rules
◦ Word alignment
Extraction Sets from Word Alignments
Possible and Null Alignment Links
Possible links have two types
◦ Function words that are unique to one language
◦ Short phrases that have no lexical equivalent
Null alignments
◦ Express content that is absent in the translation
Interpreting Possible and Null Alignment Links
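One natural way to interpret them in the consistency test of the sketch above (my reading of the slides, not necessarily the paper's exact definition): sure links must lie fully inside or fully outside a bispan, while possible links may cross its boundary and do not by themselves license extraction.

```python
def consistent(bispan, sure, possible):
    """Consistency of a bispan under sure vs. possible links (sketch)."""
    (i1, i2), (j1, j2) = bispan
    in_src = lambda i: i1 <= i <= i2
    in_tgt = lambda j: j1 <= j <= j2
    # At least one sure link must lie fully inside the bispan.
    if not any(in_src(i) and in_tgt(j) for (i, j) in sure):
        return False
    # No sure link may cross the boundary (one endpoint in, one out).
    # Possible links are deliberately ignored: in this sketch they neither
    # block nor license extraction.
    return all(in_src(i) == in_tgt(j) for (i, j) in sure)

# A possible link from an untranslated function word (source word 1) no
# longer blocks the one-word bispan it crosses:
sure, possible = {(0, 0)}, {(1, 0)}
print(consistent(((0, 0), (0, 0)), sure, possible))  # True
```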
Linear Model for Extraction Set
Scoring Extraction Sets
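What a linear score over extraction sets could look like, as a minimal sketch (the feature functions and names are placeholders, not the paper's feature set): the model scores a word alignment together with its extraction set by a dot product w·φ, where φ sums feature vectors over individual word links and over extracted bispans.

```python
from collections import Counter

def score(weights, links, bispans, link_feats, bispan_feats):
    """w . phi, with phi summed over word links and extracted bispans."""
    phi = Counter()
    for link in links:
        phi.update(link_feats(link))      # features on word links
    for bispan in bispans:
        phi.update(bispan_feats(bispan))  # features on whole phrase pairs
    return sum(weights.get(f, 0.0) * v for f, v in phi.items())

# Toy usage with made-up features:
w = {"link": 0.5, "bispan_width": -0.1}
print(score(w,
            links=[(0, 0), (1, 1)],
            bispans=[((0, 1), (0, 1))],
            link_feats=lambda l: {"link": 1},
            bispan_feats=lambda b: {"bispan_width": b[0][1] - b[0][0] + 1}))
# 0.5 * 2 + (-0.1) * 2 = 0.8
```

Features on whole bispans are what a word-factored model cannot express; here they are just one more term in the same linear score.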
MODEL ESTIMATION
MIRA (Margin-Infused Relaxed Algorithm)
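A sketch of a standard 1-best MIRA step on sparse feature dictionaries (the generic algorithm; the paper's exact variant may differ): move the weights just enough that the gold structure outscores the prediction by a margin equal to its loss, with the step size capped at C.

```python
def mira_update(w, phi_gold, phi_pred, loss, C=0.01):
    """One 1-best MIRA step; w and the phis are sparse feature dicts."""
    # Feature difference between the gold and predicted structures.
    feats = set(phi_gold) | set(phi_pred)
    delta = {f: phi_gold.get(f, 0.0) - phi_pred.get(f, 0.0) for f in feats}
    norm_sq = sum(v * v for v in delta.values())
    if norm_sq == 0.0:
        return w  # identical feature vectors: nothing to learn from
    # Current margin: score(gold) - score(pred).
    margin = sum(w.get(f, 0.0) * v for f, v in delta.items())
    # Smallest step that makes margin >= loss, clipped to [0, C].
    tau = min(C, max(0.0, loss - margin) / norm_sq)
    new_w = dict(w)
    for f, v in delta.items():
        new_w[f] = new_w.get(f, 0.0) + tau * v
    return new_w
```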
Extraction Set Loss Function
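One simple way to realize such a loss, as a sketch (the paper's actual loss and weighting may differ): count the bispans the model extracts that the gold extraction set does not license, plus the gold bispans the model misses.

```python
def extraction_loss(pred_bispans, gold_bispans, fp_weight=1.0, fn_weight=1.0):
    """Weighted count of extraction errors between two extraction sets."""
    pred, gold = set(pred_bispans), set(gold_bispans)
    false_pos = len(pred - gold)  # extracted, but not licensed by gold
    false_neg = len(gold - pred)  # licensed by gold, but missed
    return fp_weight * false_pos + fn_weight * false_neg
```

Plugging a loss like this into the MIRA step above trains the model directly against extraction mistakes rather than against word-link mistakes alone.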
MODEL INFERENCE
Possible Decompositions
DP for Extraction Sets
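For intuition, here is a skeleton of the kind of bitext-span dynamic program an ITG model induces (just the generic chart recursion, not the paper's algorithm or its pruning; null alignments and one-to-many blocks are omitted): each bispan is built from two smaller bispans combined in monotone or inverted order.

```python
import itertools

def itg_best_score(src_len, tgt_len, terminal_score, combine_score):
    """Best-scoring ITG-style derivation over a chart of bispans."""
    # chart[(i, k, j, l)] = best score of a bispan over src[i:k], tgt[j:l]
    chart = {}
    for i, j in itertools.product(range(src_len), range(tgt_len)):
        chart[(i, i + 1, j, j + 1)] = terminal_score(i, j)
    # Build larger bispans from smaller ones, in order of total size.
    spans = sorted(((i, k, j, l)
                    for i in range(src_len) for k in range(i + 1, src_len + 1)
                    for j in range(tgt_len) for l in range(j + 1, tgt_len + 1)),
                   key=lambda s: (s[1] - s[0]) + (s[3] - s[2]))
    for (i, k, j, l) in spans:
        for m in range(i + 1, k):        # source split point
            for n in range(j + 1, l):    # target split point
                for a, b in (((i, m, j, n), (m, k, n, l)),    # monotone
                             ((i, m, n, l), (m, k, j, n))):   # inverted
                    if a in chart and b in chart:
                        cand = chart[a] + chart[b] + combine_score(i, k, j, l)
                        if cand > chart.get((i, k, j, l), float("-inf")):
                            chart[(i, k, j, l)] = cand
    return chart.get((0, src_len, 0, tgt_len))
```

Roughly, the paper's DP builds on this shape by additionally scoring the bispans each chart item commits to extracting, so inference optimizes the extraction set score rather than a word-factored one.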
Finding Pseudo-Gold ITG Alignment
EXPERIMENTS
Five systems for comparison
Unsupervised baselines
◦ GIZA++
◦ Joint HMM
Supervised baseline
◦ Block ITG
Extraction Set Coarse Pass
◦ Does not score bispans that cross bracketings of ITG derivations
Full Extraction Set Model
Data
Discriminative training and alignment evaluation
◦ Trained the baseline HMM on 11.3 million words of FBIS newswire data
◦ Hand-aligned portion of the NIST MT02 test set: 150 training and 191 test sentences
End-to-end translation experiments
◦ Trained on a 22.1 million word parallel corpus of newswire sentences up to length 40 from the GALE program
◦ NIST MT04/MT05 test sets
Results
Discussion
Syntax labels vs. words
Word alignments to rules vs. rules to word alignments
◦ Information flows in both directions
65% of type 1 errors