Speech recognition in MUMIS
Mirjam Wester, Judith Kessens & Helmer Strik
Intro • Objective: Automatic speech recognition of football commentaries • SPEX transcribed two matches for two languages (Dutch and English): – England - Germany (Eng-Dld) and – Yugoslavia - The Netherlands (Yug-Ned) • Commentaries and stadium noise are mixed
Data Conversion • SPEX transcription: – text grid: • orthographic transcription • chunk alignment; chunk = a segment of speech of about 2 to 3 seconds – CD with one large wav file • Split according to chunk alignments
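The splitting step can be sketched as follows. This is a minimal illustration, not the tooling used in the project: the function name `split_wav`, the output file pattern, and the assumption that chunk boundaries are available as (start, end) times in seconds are all hypothetical.

```python
import wave

def split_wav(src_path, chunks, out_pattern="chunk_{:04d}.wav"):
    """Write one wav file per (start_sec, end_sec) pair and return the paths.

    `chunks` is assumed to come from the chunk alignment of the
    transcription; reading that alignment file is not shown here.
    """
    paths = []
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        rate = src.getframerate()
        for i, (start, end) in enumerate(chunks):
            first = int(round(start * rate))
            count = int(round(end * rate)) - first
            src.setpos(first)                 # jump to the first frame of the chunk
            frames = src.readframes(count)    # read only this chunk
            path = out_pattern.format(i)
            with wave.open(path, "wb") as dst:
                dst.setparams(params)         # same channels, sample width, rate
                dst.writeframes(frames)       # header frame count is fixed on close
            paths.append(path)
    return paths
```

Reading by frame offset avoids loading the whole CD-length file into memory, which matters when the source is one large wav file.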
Examples of data • Yug-Ned Dutch • Yug-Ned English • Eng-Dld Dutch • Eng-Dld English
Statistics
          #chunks  #speech chunks  #empty chunks  #words (types)  #words (tokens)
Dutch        5146            3006           2140            1954            12079
English      5613            3725           1843            2923            24022
English matches have two commentators, Dutch only one.
Overlapping segments have been disregarded.
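Counts of this kind can be derived from the per-chunk orthographic transcriptions. The sketch below assumes one string per chunk, with an empty string for chunks without speech, and uses lowercased whitespace tokenisation; both are illustrative assumptions, and `corpus_stats` is a hypothetical name.

```python
def corpus_stats(chunks):
    """Return chunk and word counts for a list of chunk transcriptions."""
    # Tokens are all running words; types are the distinct word forms.
    tokens = [w for text in chunks for w in text.lower().split()]
    speech = sum(1 for text in chunks if text.strip())
    return {
        "chunks": len(chunks),
        "speech_chunks": speech,
        "empty_chunks": len(chunks) - speech,
        "word_types": len(set(tokens)),
        "word_tokens": len(tokens),
    }
```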
Training
Dutch: • Yug-Ned ¾ of CD (19 min speech) • France Telecom Noise Reduction (FTNR)
English: • Yug-Ned ¾ of CD (28 min speech) • FTNR
For more information on the France Telecom Noise Reduction tool, see: B. Noé, J. Sienel, D. Jouvet, L. Mauuary, L. Boves, J. de Veth & F. de Wet, “Noise Reduction for Noise Robust Feature Extraction for Distributed Speech Recognition”, in Proc. of Eurospeech ’01.
Test
Dutch: • Yug-Ned ¼ of CD – 626 chunks, 1577 words – lexicon and language model based on complete Yug-Ned match
English: • Yug-Ned ¼ of CD – 636 chunks, 2641 words – lexicon and language model based on complete Yug-Ned match
SNR before and after FTNR tool
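The slides do not say how SNR was measured. One common estimate, shown here purely as an assumption, takes the noise power from a chunk without speech and subtracts it from the power of a speech chunk. The names `mean_power` and `snr_db` are hypothetical.

```python
import math

def mean_power(samples):
    """Average squared amplitude of a sequence of samples."""
    return sum(s * s for s in samples) / len(samples)

def snr_db(speech_chunk, noise_chunk):
    """Estimate SNR in dB from a speech chunk and a noise-only chunk.

    Assumes the noise in the speech chunk has the same power as in the
    noise-only chunk, so speech power = total power - noise power.
    """
    p_noise = mean_power(noise_chunk)
    p_speech = mean_power(speech_chunk) - p_noise
    if p_noise <= 0 or p_speech <= 0:
        raise ValueError("cannot estimate SNR from these chunks")
    return 10.0 * math.log10(p_speech / p_noise)
```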
WER results for Yug-Ned before and after FTNR
[Figure: x-axis: training material of the acoustic models (NL-original, NL-FTNR, Eng-Original, Eng-FTNR); y-axis ticks from 40 to 60]
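For reference, word error rate (WER) is conventionally the number of word substitutions, deletions and insertions needed to turn the recognised word sequence into the reference, divided by the number of reference words. The sketch below implements that standard definition with a word-level edit distance; it is not the scoring tool used for these results, and `wer` is a hypothetical name.

```python
def wer(reference, hypothesis):
    """Word error rate in percent between two whitespace-tokenised strings."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the reference prefix so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return 100.0 * prev[-1] / len(ref)
```

Because insertions count as errors, WER can exceed 100% when the hypothesis is much longer than the reference.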
Dutch – Polyphone • Data is phonetically rich sentences • Phone models were trained on: – Polyphone all speakers – Polyphone male speakers – Polyphone male speakers + MUMIS noise • Polyphone as bootstrap for segmentation of MUMIS material
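Adding MUMIS noise to clean Polyphone speech amounts to mixing a noise recording into the speech signal. The slides do not describe the mixing procedure; the sketch below assumes mixing at a chosen target SNR, with the noise repeated if it is shorter than the speech. The function name `mix_at_snr` and the target-SNR parameter are illustrative assumptions.

```python
import math

def mix_at_snr(speech, noise, snr_db):
    """Return speech plus scaled noise at the requested SNR (in dB)."""
    # Repeat or truncate the noise so it covers the whole speech signal.
    noise = [noise[i % len(noise)] for i in range(len(speech))]
    p_speech = sum(x * x for x in speech) / len(speech)
    p_noise = sum(x * x for x in noise) / len(noise)
    if p_speech <= 0 or p_noise <= 0:
        raise ValueError("speech and noise must both have non-zero power")
    # Choose the gain so that p_speech / (gain^2 * p_noise) = 10^(snr_db / 10).
    gain = math.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(speech, noise)]
```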
Polyphone models (Dutch) Yug-Ned test set
[Figure: x-axis: training material of the acoustic models (Poly-all, Poly-male, Poly-male+noise, Poly-seg., MUMIS); y-axis ticks from 45 to 95]
Cross tests (Dutch & English) Cross-tests: • train on ¾ Yug-Ned test on ¼ Eng-Dld • train on ¾ Eng-Dld test on ¼ Yug-Ned
MUMIS models (Dutch)
[Figure: test sets: Yug-Ned test, Eng-Dld test; x-axis: training material of the acoustic models (Yug-Ned, Eng-Dld-cross, Eng-Dld, Yug-Ned-cross); y-axis ticks from 45 to 70]
MUMIS models (English)
[Figure: test sets: Yug-Ned test, Eng-Dld test; x-axis: training material of the acoustic models (Yug-Ned, Eng-Dld-cross, Eng-Dld, Yug-Ned-cross); y-axis ticks from 45 to 70]
MUMIS models (Dutch+English)
[Figure: test sets: Yug-Ned test, Eng-Dld test; series: NL, ENG; x-axis: training material of the acoustic models (Yug-Ned, Eng-Dld-cross, Eng-Dld, Yug-Ned-cross); y-axis ticks from 45 to 70]
Function words vs content words
[Figure: groups: English data (Yug-Ned, Eng-Dld) and Dutch data (Yug-Ned, Eng-Dld); legend (word type): function, content, names, all; y-axis ticks from 0 to 80]
SNR vs. WER (1) Dutch Data
[Figure: x-axis: SNR1 (dB), ticks from 0 to 30; y-axis ticks from 0 to 90; series: YugNed, YugNed_ftnr, EngDld]
SNR vs. WER (2) English Data
[Figure: x-axis: SNR1 (dB), ticks from 0 to 40; y-axis ticks from 0 to 90; series: YugNed, YugNed_ftnr, EngDld]
Discussion • WERs are high • Noise?
– FTNR leads to higher SNR, but WERs do not improve substantially • Not enough training data?
– Polyphone for training/bootstrapping does not lead to lower WERs than training on MUMIS data – Noisifying Polyphone with MUMIS gives encouraging results
Discussion continued • Function words comprise ± 50% of the data, and cause a great deal of the errors • Names are recognized very well • Function words not necessary for information extraction (?)
Future work • Steps to noise robust speech recognition: – model/speaker adaptation – combinations of noisified Polyphone models and FTNR • Other issues: – transcription of more data • English, Dutch and German • preference specific games? radio? TV?
– generic football-specific language model – confidence measures?
Future work continued Questions: • What type of output from ASR is needed?
– word-graph – n-best list – top of the list – word spotting? only content words?
• For research purposes: is it possible to obtain data that has not been mixed (noise + commentary)?