Integrating Speech Recognition and Machine Translation
Spyros Matsoukas, Ivan Bulyko, Bing Xiang, Kham Nguyen, Richard Schwartz, John Makhoul
Integration Issues
▪ The Machine Translation (MT) system is trained on text data, so it expects
– segments that correspond to foreign sentences
– properly placed punctuation marks
– numbers, dates, monetary amounts, abbreviations, etc., as they appear in ordinary text
▪ However, Speech-To-Text (STT) output
– is segmented automatically on long pauses
• resulting segments may be too short, or may cross sentence boundaries
– has no punctuation
• punctuation needs to be automatically added prior to translation
– has numbers, dates, etc., in spoken form
• output can be parsed to convert numbers to written form
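A minimal sketch of the number-conversion step named in the last bullet. It assumes English number words purely for illustration (the STT output in this work is Arabic), and every name in it (`normalize_numbers`, `words_to_int`, the word tables) is hypothetical. Year-style readings such as "nineteen ninety nine" and digit-by-digit sequences are not handled.

```python
# Illustrative only: English number words stand in for the Arabic STT output.
UNITS = {w: i for i, w in enumerate(
    "zero one two three four five six seven eight nine ten eleven twelve "
    "thirteen fourteen fifteen sixteen seventeen eighteen nineteen".split())}
TENS = {w: 10 * i for i, w in enumerate(
    "twenty thirty forty fifty sixty seventy eighty ninety".split(), start=2)}
SCALES = {"thousand": 1_000, "million": 1_000_000}


def is_number_word(token):
    return token in UNITS or token in TENS or token in SCALES or token == "hundred"


def words_to_int(words):
    """Convert a run of number words, e.g. ['two', 'thousand', 'four'], to an int."""
    total, current = 0, 0
    for w in words:
        if w in UNITS:
            current += UNITS[w]
        elif w in TENS:
            current += TENS[w]
        elif w == "hundred":
            current = max(current, 1) * 100
        else:  # "thousand" or "million" closes the current group
            total += max(current, 1) * SCALES[w]
            current = 0
    return total + current


def normalize_numbers(text):
    """Replace each maximal run of number words in `text` with digits."""
    out, run = [], []
    for token in text.split():
        if is_number_word(token):
            run.append(token)
            continue
        if run:  # a non-number token ends the current run
            out.append(str(words_to_int(run)))
            run = []
        out.append(token)
    if run:  # flush a run that ends the segment
        out.append(str(words_to_int(run)))
    return " ".join(out)
```

For example, `normalize_numbers("the year two thousand four")` returns `"the year 2004"`.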
STT/MT Pipeline
▪ Initial set of experiments ran MT on the 1-best hypothesis from STT
STT Components
▪ STT-A
– EARS RT04 Arabic Broadcast News (BN) system
– Word pronunciations based on graphemes
– Acoustic models estimated using Maximum Mutual Information (MMI) and Speaker Adaptive Training (SAT) on 100 hours of BN audio data
– 3-gram language model trained on 400 million words of news text
▪ STT-B
– Uses a morphological analyzer and automatic methods to infer short vowels in word pronunciations
– Trained on an additional 50 hours of acoustic training data
▪ STT-C
– Makes use of additional language model training data
MT Components
▪ MT-A
– System developed during the period Sep 2004 – Apr 2005
– Phrase-based translation model, trained on 100M words of Arabic/English UN and news bitext
– 3-gram English LM, trained on 2 billion words of text (mostly newswire)
– Translation based on posterior probability P(English | Foreign)
▪ MT-B
– Uses a combination of generative and posterior translation probabilities
– Includes a phrase segmentation score
– Uses a method to compensate for over-estimated translation probabilities
– Optimizes decoding weights by minimizing Translation Edit Rate (TER) on N-best lists
TER results on the 2002 and 2004 MT Eval sets
System Id   2002    2004
MT-A        48.29   46.31
MT-B        46.35   45.55
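The MT-B bullet on optimizing decoding weights over N-best lists can be sketched as follows. The slides do not say which optimizer was used, so this substitutes an exhaustive grid search. The data layout (one list of hypotheses per segment, each with a `features` vector and a precomputed `edits` count against the reference) and the function names are assumptions made for illustration.

```python
from itertools import product


def corpus_ter(nbest_lists, weights):
    """Corpus-level error of the hypotheses that `weights` would rank first.

    nbest_lists: list of (hypotheses, reference_length) pairs, where each
    hypothesis is a dict with "features" (model scores) and "edits"
    (edit count against the reference, computed beforehand).
    """
    edits = ref_words = 0
    for hyps, ref_len in nbest_lists:
        # Pick the hypothesis with the highest weighted model score.
        best = max(hyps, key=lambda h: sum(w * f for w, f in zip(weights, h["features"])))
        edits += best["edits"]
        ref_words += ref_len
    return edits / ref_words


def tune_weights(nbest_lists, grid):
    """Try every weight vector on `grid` and keep the one with the lowest error."""
    n_features = len(nbest_lists[0][0][0]["features"])
    return min(product(grid, repeat=n_features),
               key=lambda w: corpus_ter(nbest_lists, w))
```

Grid search scales exponentially with the number of features, so it is only practical for a handful of weights; it is used here because it makes the objective (error of the re-ranked 1-best) easy to see.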
Test Data
▪ Tested integration on bnat05
– 6-hour set from several sources from Jan 2001 and Nov 2003
– Test set consists of both Modern Standard Arabic (MSA) and Arabic dialect segments
▪ All system comparisons based on TER
– MT system output automatically scored against single reference transcription, with mixed case
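As a rough illustration of how an edit-rate score of this kind is computed, the sketch below counts word-level insertions, deletions and substitutions and divides by the number of reference words. It is a simplification: true TER also allows block shifts at unit cost, which this code omits, so it can only overestimate TER. Comparison is case-sensitive, in line with the mixed-case scoring above. The function names are hypothetical.

```python
def edit_distance(hyp, ref):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, 1):
        cur = [i]
        for j, r in enumerate(ref, 1):
            cur.append(min(prev[j] + 1,              # extra word in the hypothesis
                           cur[j - 1] + 1,           # reference word missing from the hypothesis
                           prev[j - 1] + (h != r)))  # match or substitution
        prev = cur
    return prev[-1]


def ter_no_shifts(hyps, refs):
    """Edit rate in percent over a corpus, with one reference per hypothesis."""
    edits = sum(edit_distance(h.split(), r.split()) for h, r in zip(hyps, refs))
    ref_words = sum(len(r.split()) for r in refs)
    return 100.0 * edits / ref_words
```

The same computation applied to STT output against reference transcripts gives WER.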
Integration Results
▪ Effect of STT accuracy, segmentation and punctuation on MT accuracy
System STT/MT   STT WER   STT Segmentation   Punctuation   TER
STT-A, MT-A     22.2      auto               period        66.8
STT-B, MT-A     18.3      auto               period        65.9
STT-B, MT-B     18.3      auto               period        64.6
STT-B, MT-B     17.6      reference          period        61.9
REF, MT-B       0.0       reference          period        58.7
REF, MT-B       0.0       reference          reference     58.0
▪ At current MT performance level:
– large improvements in STT accuracy result in small TER gain (WER 22.2 to 18.3 lowers TER from 66.8 to 65.9)
– significant TER reduction (2.7% absolute, from 64.6 to 61.9) can be obtained by improving sentence boundary detection
– full punctuation helps translation only marginally (TER 58.7 to 58.0)
Optimizing STT segmentation for MT
▪ Tuned the audio segmentation procedure in order to output segments that match the reference in terms of average length
System STT/MT   STT WER   Avg. Segment Length (sec)   TER
STT-B, MT-B     18.3      6.17                        64.6
STT-C, MT-B     17.8      6.17                        64.4
STT-C, MT-B     17.7      9.47                        63.1
STT-C, MT-B     17.7      13.60                       62.8
 1.6% absolute TER gain for optimizing segmentation
 Additional gains can be obtained by
– Converting spoken numbers to written form prior to translation
(0.4-0.5% TER reduction)
– re-defining STT output segmentation, using linguistic information
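One way to picture the tuning of the audio segmentation toward a target average segment length is sketched below. The slides do not describe the actual procedure, so this assumes a single tunable parameter (a minimum pause duration) and a non-empty, time-ordered list of word start and end times. All names are hypothetical.

```python
def segment_on_pauses(words, min_pause):
    """Split at every inter-word pause of at least `min_pause` seconds.

    words: non-empty list of (start, end) times in seconds, in time order.
    Returns a list of (segment_start, segment_end) pairs.
    """
    segments = []
    seg_start, prev_end = words[0]
    for start, end in words[1:]:
        if start - prev_end >= min_pause:
            segments.append((seg_start, prev_end))
            seg_start = start
        prev_end = end
    segments.append((seg_start, prev_end))
    return segments


def average_length(segments):
    """Mean segment duration in seconds."""
    return sum(end - start for start, end in segments) / len(segments)


def tune_min_pause(words, target_avg, candidates):
    """Pick the candidate pause threshold whose average segment length
    is closest to `target_avg` (e.g. the reference average)."""
    return min(candidates,
               key=lambda p: abs(average_length(segment_on_pauses(words, p)) - target_avg))
```

Raising the pause threshold merges short segments, which is the direction taken in the table above (6.17 sec to 13.60 sec per segment).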
Sentence Boundary Detection (SBD)
▪ Used a hidden-event language model (HELM) to detect sentence boundaries in the 1-best STT output
– 4-gram HELM, trained on 850M words of Arabic news with Kneser-Ney smoothing
– Silence duration can be integrated as an observation into the HMM search
▪ Explored various configurations
– SBD-1: Use only LM to insert periods within speaker turns
– SBD-2: Use LM and silence duration jointly
– SBD-3: Bias the LM to insert boundaries at a higher rate (by 30-50%), then remove boundaries with lowest model posteriors while constraining the maximum sentence length
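The three configurations can be illustrated with a toy model. This sketch replaces the 4-gram HELM with a bigram one, for which each gap between words can be scored independently (a real 4-gram model needs the HMM search mentioned above). It treats silence duration as a linear term in the log-odds, and implements SBD-3 as greedy removal of the weakest boundaries. These are simplifying assumptions, and the token `<b>`, the floor value and all function names are hypothetical.

```python
import math

BOUNDARY = "<b>"  # hypothetical hidden-event token for a sentence boundary


def boundary_posteriors(words, bigram_logp, silences=None,
                        silence_weight=0.0, bias=0.0, floor=-10.0):
    """Posterior of a sentence boundary in each gap between adjacent words.

    bigram_logp maps (previous_token, next_token) to a log probability.
    silences[i] is the pause in seconds after words[i] (SBD-2);
    bias > 0 raises the insertion rate (first step of SBD-3).
    """
    def lp(prev, nxt):
        return bigram_logp.get((prev, nxt), floor)  # crude floor for unseen bigrams

    posteriors = []
    for i in range(len(words) - 1):
        a, b = words[i], words[i + 1]
        # Compare "a <b> b" against "a b" under the bigram model.
        log_odds = lp(a, BOUNDARY) + lp(BOUNDARY, b) - lp(a, b) + bias
        if silences is not None:
            log_odds += silence_weight * silences[i]
        posteriors.append(1.0 / (1.0 + math.exp(-log_odds)))
    return posteriors


def longest_sentence(boundaries, n_words):
    """Length in words of the longest sentence; gap i lies after word i."""
    longest, start = 0, 0
    for gap in sorted(boundaries):
        longest = max(longest, gap + 1 - start)
        start = gap + 1
    return max(longest, n_words - start)


def prune_boundaries(posteriors, threshold, target_count, max_len):
    """Second step of SBD-3: drop the weakest boundaries until `target_count`
    remain, skipping any removal that would create a sentence longer than
    `max_len` words."""
    n_words = len(posteriors) + 1
    kept = {i for i, p in enumerate(posteriors) if p >= threshold}
    for gap in sorted(kept, key=lambda i: posteriors[i]):  # weakest first
        if len(kept) <= target_count:
            break
        trial = kept - {gap}
        if longest_sentence(trial, n_words) <= max_len:
            kept = trial
    return sorted(kept)
```

With `silences=None` and `bias=0.0` the first function corresponds to SBD-1; passing pause durations gives an SBD-2-style joint score.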
SBD Results
▪ Effect of HELM-based SBD on MT accuracy, starting from one of two audio segmentations
– audio-seg-1: 9.47 sec / segment
– audio-seg-2: 13.60 sec / segment
SBD Configuration
Baseline audio
segmentation
SBD-1
SBD-2
SBD-3
TER (TER-MSA)
audio-seg-1
audio-seg-2
62.55 (60.32)
62.37 (60.28)
62.66 (60.25)
62.49 (60.20)
62.32 (59.78)
62.81 (60.42)
62.79 (60.28)
62.34 (60.02)
▪ HELM has larger effect on Modern Standard Arabic (MSA) regions, where STT accuracy is high
▪ SBD can be applied safely on top of any audio segmentation
Optimizing MT on Speech Data
▪ MT accuracy can be enhanced by optimizing MT decoding weights on broadcast speech data
– Optimization can compensate for differences in style between newswire text and STT transcript (esp. on broadcast conversations)
▪ Optimization Issue:
– MT optimization requires one-to-one mapping between translation hypotheses and references on the tuning set
– Non-trivial to tune on translations of automatically segmented STT output
▪ Solutions:
– Re-segment STT output according to reference segmentation prior to translation, then use translation hypotheses for tuning
– Tune based on translations of the STT reference transcriptions
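The first solution, re-segmenting STT output to follow the reference segmentation, could look like the sketch below. The slides do not say how the mapping was done; this version assumes that STT words carry time stamps and that each reference segment has a known time span, and assigns each word to the segment containing (or nearest to) its midpoint. The function names are hypothetical.

```python
def distance_to_segment(t, segment):
    """0 if time t falls inside the segment, else the gap to its nearer edge."""
    start, end = segment
    if start <= t <= end:
        return 0.0
    return min(abs(t - start), abs(t - end))


def resegment(stt_words, ref_segments):
    """Regroup STT words so that there is exactly one hypothesis per reference segment.

    stt_words: list of (word, start, end) in time order.
    ref_segments: list of (start, end) spans of the reference segments.
    Returns one string per reference segment (possibly empty).
    """
    grouped = [[] for _ in ref_segments]
    for word, start, end in stt_words:
        midpoint = (start + end) / 2
        nearest = min(range(len(ref_segments)),
                      key=lambda k: distance_to_segment(midpoint, ref_segments[k]))
        grouped[nearest].append(word)
    return [" ".join(words) for words in grouped]
```

Because the output always has one entry per reference segment, translations of these strings line up one-to-one with the reference translations, which is what the tuning step requires.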
MT Optimization Results
Updated development sets
Purpose      Broadcast News   Broadcast Conversations
Tuning       bnat06           bcat06
Validation   bnad06           bcad06
Results
OptSet    bnad06   bcad06
MT02      67.4     73.3
BNC-STT   66.9     71.9
BNC-REF   66.8     71.9
MT02: tuning on translations of the 2002 NIST MT evaluation set
BNC-STT: tuning on translations of manually segmented (according to reference) STT output
BNC-REF: tuning on translations of reference transcripts
Conclusions and Future Research
▪ Results on 1-best STT/MT integration show that sentence boundary detection has a large impact on MT performance
– Segmentation should be based on both audio and STT transcript
▪ Better performance is expected by coupling STT and MT more tightly
– Have begun running MT on consensus networks from STT output
– Will explore joint optimization of STT and MT system parameters
▪ At current operating point, improvements in MT will have the largest effect