
Discriminative Modeling of Extraction
Sets for Machine Translation
Authors
John DeNero and Dan Klein
Presenter
Justin Chiu
UC Berkeley
Contribution

Extraction set
◦ A nested collection of all the overlapping phrase
pairs consistent with an underlying word alignment

Advantages over word-factored alignment
models
◦ Can incorporate features on phrase pairs, not
just word links
◦ Optimizes an extraction-based loss function
directly tied to generating translations

Performs better than both supervised and
unsupervised baselines
Progress of Statistical MT


Generate translated sentences word by
word
Use whole fragments of training examples
to build translation rules
◦ Align at the word level
◦ Extract fragment-level rules from word-aligned
sentence pairs
 Tree-to-string translation

Extraction Set Models
◦ Set of all overlapping phrasal translation rules +
an alignment
Outline
 Extraction Set Models
 Model Estimation
 Model Inference
 Experiments

EXTRACTION SET
MODELS
Extraction Set Models

Input
◦ An unaligned sentence pair

Output
◦ Extraction set of phrasal translation rules
◦ Word alignment
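As a rough sketch of the underlying idea (not the paper's exact model), the classic consistency criterion for extracting phrase pairs from a sure word alignment can be written as follows; the function name and tight-span convention are illustrative choices:

```python
def extract_phrase_pairs(alignment, src_len, tgt_len, max_len=3):
    """Enumerate bispans (phrase pairs) consistent with a word alignment.

    A bispan (i, j, k, l) covers source words i..j-1 and target words
    k..l-1. It is consistent if no alignment link crosses its boundary:
    every link touching the source span stays inside the target span,
    and vice versa.
    """
    pairs = []
    for i in range(src_len):
        for j in range(i + 1, min(src_len, i + max_len) + 1):
            # Target positions linked to the candidate source span.
            tgt = [t for s, t in alignment if i <= s < j]
            if not tgt:
                continue
            k, l = min(tgt), max(tgt) + 1  # tightest covering target span
            if l - k > max_len:
                continue
            # Consistency: no link from inside the target span back to a
            # source word outside the source span.
            if any(k <= t < l and not (i <= s < j) for s, t in alignment):
                continue
            pairs.append((i, j, k, l))
    return pairs
```

The paper's extraction sets are richer than this (they include spans induced by possible and null links), but the consistency check above is the core notion being generalized.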
Extraction Sets from Word
Alignments

Possible and Null Alignment Links

Possible links have two types
◦ Function words that are unique to their language
◦ Short phrases that have no lexical equivalent

Null alignment
◦ Expresses content that is
absent in the translation
Interpreting Possible and Null
Alignment Links

Linear Model for Extraction Set

Scoring Extraction Sets
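A minimal sketch of how a linear model scores an (alignment, extraction set) pair: sum a sparse weight vector against features of every word link and every licensed bispan. The feature functions here are hypothetical stand-ins; the paper defines its own feature templates.

```python
def dot(weights, features):
    """Dot product of a sparse weight vector with a sparse feature dict."""
    return sum(weights.get(f, 0.0) * v for f, v in features.items())

def score_extraction_set(weights, link_feats, bispan_feats, links, bispans):
    """Linear score of an alignment plus its extraction set: the summed
    feature score of every word link and every licensed bispan.
    `link_feats` and `bispan_feats` are hypothetical feature functions
    returning sparse dicts."""
    return (sum(dot(weights, link_feats(a)) for a in links)
            + sum(dot(weights, bispan_feats(b)) for b in bispans))
```

The key point from the slide: because bispans are scored directly, the model can use phrase-level features that a word-factored model cannot express.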

MODEL ESTIMATION
MIRA (Margin-Infused Relaxed Algorithm)
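For concreteness, here is a sketch of a single 1-best MIRA step over sparse feature dicts: the smallest weight change that makes the gold analysis outscore the current guess by a margin equal to its loss, with step size capped at an aggressiveness constant C. This is a generic MIRA update, simplified relative to whatever k-best variant the paper actually trains with.

```python
def mira_update(weights, gold_feats, guess_feats, loss, C=0.01):
    """One 1-best MIRA step on sparse feature dicts."""
    delta = dict(gold_feats)                 # delta = gold - guess
    for f, v in guess_feats.items():
        delta[f] = delta.get(f, 0.0) - v
    margin = sum(weights.get(f, 0.0) * v for f, v in delta.items())
    norm_sq = sum(v * v for v in delta.values())
    if norm_sq == 0.0:
        return dict(weights)                 # identical features: no update
    # Closed-form clipped step size.
    tau = min(C, max(0.0, (loss - margin) / norm_sq))
    new_w = dict(weights)
    for f, v in delta.items():
        new_w[f] = new_w.get(f, 0.0) + tau * v
    return new_w
```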

Extraction Set Loss Function
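A simplified sketch of an extraction-based loss: count the bispans that appear in exactly one of the gold and predicted extraction sets (false positives plus false negatives). The paper's actual loss is more refined (e.g. in how it treats possible links); this version treats every bispan mismatch equally.

```python
def extraction_loss(gold_bispans, guess_bispans):
    """Symmetric-difference count over extracted bispans: the number of
    phrase pairs extracted by exactly one of the two analyses."""
    gold, guess = set(gold_bispans), set(guess_bispans)
    return len(gold ^ guess)
```

Because this loss is defined over extracted phrase pairs rather than word links, minimizing it during training directly targets the rules the downstream translation system will use.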

MODEL INFERENCE
Possible Decompositions
DP for Extraction Sets

Finding Pseudo-Gold ITG Alignment

EXPERIMENTS
Five systems for comparison

Unsupervised baselines
◦ GIZA++
◦ Joint HMM

Supervised baseline
◦ Block ITG

Extraction Set Coarse Pass
◦ Does not score bispans that cross bracketings
of ITG derivations

Full Extraction Set Model
Data

Discriminative training and alignment
evaluation
◦ Trained the baseline HMM on 11.3 million words of
FBIS newswire data
◦ Hand-aligned portion of the NIST MT02 test set
 150 training and 191 test sentences

End-to-end translation experiments
◦ Trained on a 22.1-million-word parallel corpus of
newswire sentences of up to 40 words from the
GALE program
◦ NIST MT04/MT05 test sets
Results
Discussion
Syntactic labels vs. words
 Word alignments to rules
 Rules to word alignments
 Information flows in two directions
 65% of type 1 errors
