Automatic Grapheme-Phoneme Conversion
for Spoken British English Corpora
C. AURAN, C. BOUZON & D.J. HIRST
Laboratoire Parole et Langage
CNRS UMR6057
Université de Provence
Summary
1. The Aix-MARSEC Project
Building Aix-MARSEC
Availability of the database
Methodology
2. Grapheme-Phoneme Conversion and Alignment
The Aix-MARSEC Methodology
Integration into PCE
3. Conclusion and Perspectives
The Aix-MARSEC Project
Building Aix-MARSEC
An evolution from the SEC and MARSEC corpora
SEC (Spoken English Corpus)
• 55,000 words, 339 min. 18 sec.
• BBC 1980s recordings
• 11 speaking styles
• 53 speakers (17 female and 36 male)
• Orthographic transcription
• Syntactic tagging and parsing
• Prosodic annotation: 14 tonetic stress marks (TSM)
MARSEC (Machine Readable SEC)
• Alignment of words and tone groups with the signal
• Conversion of all the TSM to ASCII characters
Aix-MARSEC
• Automatic grapheme-to-phoneme conversion
• Automatic phoneme level alignment
• Automatic intonation annotation using the Momel-Intsint methodology
• 8 annotation levels aligned: phonemes, syllable constituents, syllables, words, feet and rhythmic units, tone groups, Intsint coding
• Tagging and parsing alignment under way
Availability of the database
• Online version (www.lpl.univ-aix.fr/~EPGA/):
  • Annotation files (TextGrids)
  • Phoneme data tables
  • Perl and Praat scripts
• CD-ROM version:
  • Annotation files (TextGrids)
  • Phoneme data tables
  • Perl and Praat scripts
  • Sound files (.wav format)
Methodology
Processing chain: Orthographic transcription → G2P conversion → Raw phonemic transcription → Elision prediction → Optimised phonemic transcription → Automatic alignment → Aligned phonemic transcription → SC annotation, Syllable annotation, Word annotation
Other annotation levels shown in the chain: TSM annotation, Rhythmic annotation
Grapheme-Phoneme Conversion and Alignment
G2P Conversion and Alignment
The Aix-MARSEC Methodology
Orthographic transcription → G2P conversion → Raw phonemic transcription → Elision prediction → Optimised phonemic transcription → Automatic alignment → Aligned phonemic transcription → SC annotation, Syllable annotation, Word annotation
Current stage: Orthographic transcription → G2P conversion → Raw phonemic transcription
G2P Conversion: General principles
• Dictionary-based method (4 dictionaries used)
• Specific processing for numbers, abbreviations, etc.
• Syntagmatic effects (linking r, definite article)
Output: raw phonemic transcription
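The two syntagmatic effects named above can be illustrated with a minimal sketch. It is not the authors' implementation: the tiny lexicon, the SAMPA-style transcriptions, the spelling-based test for a potential linking r and the function name `transcribe` are all assumptions made for the example.

```python
# Hypothetical mini-lexicon: citation forms as space-separated SAMPA-style symbols.
LEXICON = {
    "the": "D @",
    "far": "f A:",
    "away": "@ w eI",
    "end": "e n d",
    "car": "k A:",
}

# First characters of the SAMPA vowel symbols used for English.
VOWEL_FIRST_CHARS = set("Ie{QVU@iAOu3a")


def transcribe(words):
    """Look each word up, then adjust it according to the following word."""
    raw = [LEXICON[w].split() for w in words]
    out = []
    for i, word in enumerate(words):
        phones = list(raw[i])
        nxt = raw[i + 1] if i + 1 < len(raw) else None
        next_is_vowel = nxt is not None and nxt[0][0] in VOWEL_FIRST_CHARS
        if word == "the" and next_is_vowel:
            # Definite article before a vowel (illustrative transcription choice).
            phones = ["D", "i"]
        elif word.endswith(("r", "re")) and next_is_vowel:
            # Linking r: crude orthographic test, assumed for this sketch only.
            phones.append("r")
        out.append(phones)
    return out


print(transcribe(["the", "end"]))
print(transcribe(["far", "away"]))
```

The point of the sketch is only that these effects need a one-word lookahead on top of the dictionary lookup.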
G2P Conversion: The 4 dictionaries
• Primary pronunciation dictionary (Advanced Learner's Dictionary, Oxford University Press; 71,000 entries)
• Complementary dictionary (700 entries)
• “Problematic forms” dictionary (for hesitations, partial words, …; 26 entries)
• “Reduced forms” dictionary (75 entries)
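A lookup across several dictionaries can be sketched as a cascade in which the first dictionary containing the word wins. The priority order used here (problematic forms, then complementary, then primary) is an assumption, as are the sample entries and the name `g2p`; the slides do not state the order. The reduced forms dictionary is left out because its use depends on the TSM annotation.

```python
from collections import ChainMap

# Hypothetical sample entries (SAMPA-style), one per dictionary.
primary = {"corpus": "k O: p @ s"}
complementary = {"marsec": "m A: s e k"}
problematic = {"erm": "3: m"}

# ChainMap searches its mappings from left to right.
lookup = ChainMap(problematic, complementary, primary)


def g2p(word):
    """Return the transcription of a word, or None if no dictionary has it."""
    return lookup.get(word.lower())


print(g2p("Corpus"))
print(g2p("erm"))
```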
G2P Conversion: Specific issues
• Abbreviations
• Numbers
• Sequences of numbers and capitals (postcodes)
• Genitives and contractions
• 3rd person singular and plural forms
• Preterite and past participle forms
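For the last three items, a dictionary of base forms can be extended with the standard textbook rules for the English -s and -ed suffixes. The sketch below shows those general rules, not the procedure actually used for the corpus; the function names and the list-of-symbols representation are assumptions.

```python
# SAMPA-style symbol classes for the final phoneme of the stem.
VOICELESS = {"p", "t", "k", "f", "T", "s", "S", "tS"}
SIBILANTS = {"s", "z", "S", "Z", "tS", "dZ"}


def add_s(stem):
    """Plural, genitive or 3rd person -s: /Iz/ after sibilants, /s/ after voiceless, else /z/."""
    last = stem[-1]
    if last in SIBILANTS:
        return stem + ["I", "z"]
    if last in VOICELESS:
        return stem + ["s"]
    return stem + ["z"]


def add_ed(stem):
    """Preterite or past participle -ed: /Id/ after /t, d/, /t/ after voiceless, else /d/."""
    last = stem[-1]
    if last in {"t", "d"}:
        return stem + ["I", "d"]
    if last in VOICELESS:
        return stem + ["t"]
    return stem + ["d"]


print(add_s(["k", "{", "t"]))    # cats
print(add_ed(["w", "O:", "k"]))  # walked
```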
Current stage: Orthographic transcription → G2P conversion → Raw phonemic transcription → Elision prediction → Optimised phonemic transcription
Elision Prediction: General principles
• Raw transcription ↔ citation forms
• Continuous speech ↔ specific phenomena (elisions, epenthesis, metathesis, etc.)
Elision prediction: Constraints
- Intonation constraints (TSM)
- Temporal constraints:
  Minimal threshold: 5 ms
  Thresholds for specific phonemes (Klatt, 1979): /t, d/ = 55 ms; /@/ = 55 ms; /T/ = 110 ms
  Lengthening factor z: z < 0 → elision; z ≥ 0 → no elision
- Phonotactic constraints (rules)
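The slides do not define the lengthening factor z. One possible reading, sketched here purely as an assumption, is a z-score comparing the observed duration of a word with the duration expected from its citation-form phonemes. The per-phoneme statistics below are placeholder numbers, not corpus values, and treating phoneme durations as independent is a simplification.

```python
from math import sqrt

# Placeholder (mean, standard deviation) durations in ms; NOT values from the corpus.
STATS = {"{": (80.0, 25.0), "n": (60.0, 20.0), "d": (55.0, 20.0)}


def lengthening_z(word_duration_ms, phonemes, stats=STATS):
    """z-score of the observed word duration against the expected one."""
    expected = sum(stats[p][0] for p in phonemes)
    spread = sqrt(sum(stats[p][1] ** 2 for p in phonemes))
    return (word_duration_ms - expected) / spread


def elision_allowed(z):
    """Decision rule from the slide: z < 0 -> elision, z >= 0 -> no elision."""
    return z < 0


z = lengthening_z(120.0, ["{", "n", "d"])
print(z, elision_allowed(z))
```

Under this reading, a word that is shorter than its citation form predicts (z < 0) leaves room for an elision, which the intonation and phonotactic constraints then confirm or block.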
Elision prediction: Rules
Principles  Phonemes     Contexts                                     Constraints          Examples
0                                                                     < 5 ms
1           d            and                                          TSM                  and then
2           h            he('s/ll/d) him his her                      TSM                  in her case
3           t d          {[t][d]} # {[t][d]}                          Th.¹ - except '-ed'  I've got to
4           t d          C1 + {[t][d]} # C2 – {[h][j]}                Th.                  mustn't lose
5           p k          nasal + {[p][k]} (#) C – {[r][l][j]}                              glimpse
6           l            [O:] + [l] (#) C                                                  always
7           T            C + [T] (#) [s]                                                   twelfths
8           p t k b d g  [s|z] + {[p|b][t|d][k|g]} (#) [s|z]                               tourists
9           @            [@] + {[l][r]} (#) + reduced vowel {[I][@]}  Th. - */rl/          camera
10          @            # [k@n] ('syll (syll [0…n])) #               TSM - Th.            confront
11          @            {[k][p]} + [@] + [n] #                       Th.                  open
Th. also appears once in the Constraints column for rules 6 to 8; which of the three rules it applies to cannot be read from the slide text.
¹ Th.: duration threshold
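The phonotactic contexts of rules 3 and 4 can be matched on a phoneme sequence as sketched below. This is an illustration only: it finds candidate positions and ignores the duration threshold and the '-ed' exception, and the sequence format (SAMPA-style symbols with "#" as word boundary) and the function name are assumptions.

```python
CONSONANTS = {
    "p", "b", "t", "d", "k", "g", "tS", "dZ", "f", "v", "T", "D",
    "s", "z", "S", "Z", "h", "m", "n", "N", "l", "r", "j", "w",
}
ALVEOLAR_STOPS = {"t", "d"}


def td_elision_candidates(phones):
    """Return (index, rule) pairs for word-final /t, d/ matching rule 3 or 4."""
    hits = []
    for i, p in enumerate(phones):
        # The stop must be followed by a word boundary and another phoneme.
        if p not in ALVEOLAR_STOPS or i + 2 >= len(phones) or phones[i + 1] != "#":
            continue
        nxt = phones[i + 2]
        if nxt in ALVEOLAR_STOPS:
            hits.append((i, 3))  # rule 3: {t, d} # {t, d}
        elif i > 0 and phones[i - 1] in CONSONANTS and nxt in CONSONANTS - {"h", "j"}:
            hits.append((i, 4))  # rule 4: C1 + {t, d} # C2, C2 not /h/ or /j/
    return hits


print(td_elision_candidates("aI v # g Q t # t u:".split()))  # I've got to
print(td_elision_candidates("m V s n t # l u: z".split()))   # mustn't lose
```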
Elision prediction: Evaluation
Measure     Value     Comment
Recall      50.51 %   Half of all elisions are correctly predicted
Precision   74.44 %   Three quarters of predicted elisions are correct
Silence     49.49 %
Noise       25.56 %
F-measure   60.18 %   Global quality of the algorithm
4,077 elided phonemes out of 199,770 in the corpus (≈ 2 %)
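The figures above are mutually consistent if silence and noise are taken as the complements of recall and precision and the F-measure as their harmonic mean. The short check below assumes those standard definitions; the function name is illustrative.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (both in percent)."""
    return 2 * precision * recall / (precision + recall)


precision, recall = 74.44, 50.51
silence = 100 - recall    # share of real elisions that were missed
noise = 100 - precision   # share of predicted elisions that were wrong
f = f_measure(precision, recall)

print(round(silence, 2), round(noise, 2), round(f, 2))
```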
Current stage: Orthographic transcription → G2P conversion → Raw phonemic transcription → Elision prediction → Optimised phonemic transcription → Automatic alignment → Aligned phonemic transcription
Alignment: General principles
HMM and Viterbi based alignment by Christophe Lévy (LIA, France)
- HMM trained on the TIMIT corpus of American English
- Gaussian Mixture Model (8 components and diagonal covariance matrices, estimated through the Expectation-Maximisation algorithm optimising the Maximum-Likelihood criterion)
- 12 MFCC (filter bank analysis) augmented with energy, delta and delta-delta coefficients
→ 39-coefficient vector per speech frame
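The principle of Viterbi alignment against a known phoneme sequence can be shown with a much simplified sketch. It is not the LIA aligner: it assumes one state per phoneme, no transition probabilities, and frame log-likelihoods supplied directly as a table instead of being computed by a GMM from the 39-coefficient vectors (13 static coefficients plus their deltas and delta-deltas). The function name and input format are assumptions.

```python
def forced_align(loglik):
    """loglik[t][s] = log-likelihood of frame t under the s-th phoneme.

    Returns, for each frame, the index of the phoneme it is assigned to.
    The path starts in the first phoneme, ends in the last one, and at
    each frame either stays in the same phoneme or moves to the next.
    """
    n_frames, n_phones = len(loglik), len(loglik[0])
    neg_inf = float("-inf")
    score = [[neg_inf] * n_phones for _ in range(n_frames)]
    back = [[0] * n_phones for _ in range(n_frames)]
    score[0][0] = loglik[0][0]
    for t in range(1, n_frames):
        for s in range(n_phones):
            stay = score[t - 1][s]
            move = score[t - 1][s - 1] if s > 0 else neg_inf
            if move > stay:
                score[t][s], back[t][s] = move + loglik[t][s], s - 1
            else:
                score[t][s], back[t][s] = stay + loglik[t][s], s
    # Backtrack from the last phoneme at the last frame.
    path = [n_phones - 1]
    for t in range(n_frames - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path


# Two phonemes, four frames: the first two frames fit phoneme 0 best.
print(forced_align([[0, -5], [0, -5], [-5, 0], [-5, 0]]))
```

Phoneme boundaries are the frames where the returned index changes; multiplying the frame index by the frame step gives the boundary time.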
Alignment: Evaluation
[Figure: histogram of alignment errors, horizontal axis from -50 to 50, with a polynomial trend line]
Mean absolute error: 22 ms
Mean error: -6.29 ms
Kurtosis: 8.15 (narrow distribution)
Skewness: -0.94 (left bias)
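Statistics of this kind can be computed from a list of signed boundary errors (automatic minus manual position, in ms) as sketched below. The moment-based definitions of skewness and kurtosis are an assumption, since the slide does not say which estimator was used (kurtosis here is the raw fourth standardised moment, not excess kurtosis); the sample data are invented.

```python
def error_stats(errors_ms):
    """Mean error, mean absolute error, skewness and kurtosis of signed errors."""
    n = len(errors_ms)
    mean = sum(errors_ms) / n
    mean_abs = sum(abs(e) for e in errors_ms) / n
    # Central moments of order 2, 3 and 4 (population form).
    m2 = sum((e - mean) ** 2 for e in errors_ms) / n
    m3 = sum((e - mean) ** 3 for e in errors_ms) / n
    m4 = sum((e - mean) ** 4 for e in errors_ms) / n
    return {
        "mean": mean,
        "mean_abs": mean_abs,
        "skewness": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2,
    }


print(error_stats([-2, -1, 0, 1, 2]))  # invented data, symmetric around 0
```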
Alignment: Evaluation
Acceptance threshold   Optimised transcription
64 ms                  93.25 %
32 ms                  82.02 %
20 ms                  68.37 %
16 ms                  59.97 %
15 ms                  57.40 %
10 ms                  42.43 %
5 ms                   23.72 %
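An acceptance rate of this kind is the share of boundaries whose absolute error does not exceed the threshold. The sketch below assumes an inclusive comparison (≤), which the slide does not specify, and uses invented error values.

```python
def acceptance_rate(errors_ms, threshold_ms):
    """Percentage of boundaries with |error| <= threshold."""
    within = sum(1 for e in errors_ms if abs(e) <= threshold_ms)
    return 100.0 * within / len(errors_ms)


errors = [3, -12, 18, -40, 70]  # invented signed errors in ms
for threshold in (64, 20, 5):
    print(threshold, acceptance_rate(errors, threshold))
```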
Integration into PCE
Integration: Motivations
Double focus:
• Segmental phenomena: formant charts
• Prosodic phenomena: tonal alignment
Phoneme level alignment for phoneticians and phonologists
Integration: 2 possible policies
• Direct integration: exact Aix-MARSEC methodology
  Requires word level manual alignment
• Alternative integration: adaptation of the Aix-MARSEC methodology
  Optional elisions predicted on the basis of phonotactic rules only + decision during the alignment phase
Conclusions and Perspectives
• A fully automatic methodology that is easy to extend
• Supports diverse phonological and phonetic studies, segmental and prosodic (formant charts, temporal, intonational and metrical studies, …)
• Full interactivity with other ProZEd modules (Momel-Intsint, …)
• Realistic integration into PCE (2 options)
Well… This time it’s for good!!
Presentation available from www.lpl.univ-aix.fr/~EPGA/
14 ASCII prosodic annotation symbols (Roach, 1994):
_    low level
~    high level
<    step-down
>    step-up
/’   (high) rise-fall
‘/   high fall-rise
\    high fall
/    high rise
,    low rise
‘    low fall
,\   (low rise-fall – not used)
\,   low fall-rise
*    stressed but unaccented
|    minor intonation unit boundary
||   major intonation unit boundary
Reduced forms processing
Aim: improving G2P conversion
Creation of a reduced forms dictionary based on O’Connor (1967) and Faure (1975)
Reduction constraint: absence of a TSM on the word
Example:
With TSM: ‘/and → converted into /{nd/
No TSM: and → converted into /@nd/
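The example above amounts to a simple selection rule, sketched here under stated assumptions: a TSM is taken to be any run of non-letter characters prefixed to the word, and the two one-entry dictionaries and the function names are hypothetical.

```python
# Hypothetical one-entry dictionaries using the transcriptions from the example.
FULL_FORMS = {"and": "{nd"}
REDUCED_FORMS = {"and": "@nd"}


def split_tsm(token):
    """Separate a leading TSM (assumed: leading non-letter characters) from the word."""
    i = 0
    while i < len(token) and not token[i].isalpha():
        i += 1
    return token[:i], token[i:]


def convert(token):
    """Use the reduced form only when the word carries no TSM."""
    tsm, word = split_tsm(token)
    word = word.lower()
    if not tsm and word in REDUCED_FORMS:
        return REDUCED_FORMS[word]
    return FULL_FORMS[word]


print(convert("'/and"))  # word carries a TSM: full form
print(convert("and"))    # no TSM: reduced form
```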