Vocal Tract & Lip Shape Estimation
By MS Shah & Vikash Sethia
Supervisor: Prof. PC Pandey
EE Dept, IIT Bombay
AIM-2003, EE Dept, IIT Bombay, 27th June, 2003
ABSTRACT:
The display of intensity, pitch, and vocal tract shape is considered helpful in speech training of the hearing impaired. A speech analysis package has been developed in MATLAB for displaying speech waveforms, pitch and energy contours, spectrogram, and areagram (a two-dimensional plot of the cross-sectional area of the vocal tract as a function of time and position along the tract length). Vocal tract shape estimation works satisfactorily for vowels, but during stop closures the place of closure cannot be estimated because the signal energy is very low. There is a need to investigate methods for predicting vocal tract shape during stop closure from the shapes estimated on either side of the closure. Work is in progress on lip shape estimation, which may find application in video telephony.
Introduction
Hearing impairment → Lack of auditory feedback during speech production → Speech impairment
Speech training for hearing-impaired children by visual (using a mirror) & tactile feedback: some important features and efforts are not distinguishable
Speech training aids: Display of articulatory efforts and acoustic parameters: vocal tract and lip shape, pitch, and energy variations
Vocal tract shape estimation
General model for speech production system
$$ s(n) = u(n) * g(n) * v(n) * r(n) $$
where * denotes convolution, s(n) = speech signal, u(n) = glottal excitation, g(n) = glottis impulse response, v(n) = impulse response of the vocal tract, r(n) = impulse response of radiation from lips.
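The cascade of excitation, glottis, vocal tract, and lip radiation can be sketched numerically as a chain of convolutions. This is only an illustration: the function name `synthesize` and all impulse responses below are hypothetical toy values, not those used in the presented work.

```python
import numpy as np

def synthesize(u, g, v, r):
    """Cascade model: s = u * g * v * r, where * is linear convolution."""
    s = np.asarray(u, dtype=float)
    for h in (g, v, r):
        s = np.convolve(s, h)
    return s

# Toy inputs (assumptions for illustration only).
u = np.zeros(400)
u[::100] = 1.0                        # impulse train, pitch period of 100 samples
g = np.array([1.0, 0.5, 0.25])        # hypothetical glottis impulse response
n = np.arange(50)
v = 0.9 ** n * np.cos(0.3 * n)        # hypothetical damped vocal tract resonance
r = np.array([1.0, -1.0])             # first difference, a common approximation of lip radiation

s = synthesize(u, g, v, r)
```

Because convolution is commutative and associative, the order in which the three responses are applied does not change s(n).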
Cont..
Acoustic tube model of the vocal tract
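At the m th section:
volume velocity: $u_m(x,t) = u_m^{+}(x,t) - u_m^{-}(x,t)$
pressure: $p_m(x,t) = \frac{\rho c}{A_m}\left[u_m^{+}(x,t) + u_m^{-}(x,t)\right]$
reflection coefficient: $r_m = \frac{A_m - A_{m+1}}{A_m + A_{m+1}}$
where $u_m^{+}$ and $u_m^{-}$ are the forward and backward travelling volume velocity waves, $A_m$ is the cross-sectional area of the m th section, $\rho$ is the density of air, and $c$ is the speed of sound.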
Cont..
Speech analysis model (Wakita-1973)
Assumption: vocal tract represented as an all-pole filter with
$$ h(n) = g(n) * v(n) * r(n) $$
Algorithmic steps:
• inverse filtering for error signal with LMS technique
• set of simultaneous equations solved with Robinson’s algorithm for reflection coefficients & relative area values
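The two steps above can be sketched as follows, assuming the autocorrelation formulation of least-squares inverse filtering and the Levinson-type recursion usually associated with Robinson's algorithm. The function names, the sign convention for the reflection coefficients, and the direction in which areas are accumulated are assumptions of this sketch; the slides do not state them.

```python
import numpy as np

def autocorr(frame, order):
    """Autocorrelation values r[0..order] of one windowed frame."""
    frame = np.asarray(frame, dtype=float)
    n = len(frame)
    return np.array([np.dot(frame[:n - k], frame[k:]) for k in range(order + 1)])

def levinson(r, order):
    """Solve the normal equations recursively.

    Returns the inverse-filter coefficients a (a[0] = 1), the reflection
    coefficients k, and the residual (error) energy.
    """
    a = np.zeros(order + 1)
    a[0] = 1.0
    k = np.zeros(order)
    err = r[0]
    for m in range(1, order + 1):
        acc = r[m] + np.dot(a[1:m], r[m - 1:0:-1])
        km = -acc / err
        k[m - 1] = km
        a_prev = a.copy()
        for i in range(1, m):
            a[i] = a_prev[i] + km * a_prev[m - i]
        a[m] = km
        err *= 1.0 - km * km
    return a, k, err

def areas_from_reflection(k, a0=1.0):
    """Relative areas, assuming k_m = (A_m - A_{m+1}) / (A_m + A_{m+1})."""
    areas = [a0]
    for km in k:
        areas.append(areas[-1] * (1.0 - km) / (1.0 + km))
    return np.array(areas)
```

Only area ratios are determined by the reflection coefficients, so the starting value `a0` is arbitrary and the result is a relative area function.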
Cont..
Implementation
■ Set-up: PC with sound card for signal acquisition (sampling rate used: 11.025 k sa/s)
■ “VTAG-1” developed for speech processing & display
• Pre-emphasis for 6 dB/octave equalization; analysis window: 256-sample Hamming with 50% overlap
• Robinson’s algorithm for obtaining reflection coefficients & area values
• Bezier form algorithm for interpolation of area values
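A minimal sketch of the front-end described above (pre-emphasis followed by 256-sample Hamming windows with 50% overlap). The package itself was written in MATLAB; this Python version is only illustrative. The first-order pre-emphasis filter and its coefficient `alpha = 0.95` are assumptions, since the slides specify only a 6 dB/octave equalization, and the function names are hypothetical.

```python
import numpy as np

FS = 11025            # sampling rate from the slides, samples per second
FRAME_LEN = 256       # analysis window length
HOP = FRAME_LEN // 2  # 50% overlap

def pre_emphasize(x, alpha=0.95):
    """First-order high-pass y[n] = x[n] - alpha * x[n-1] (about +6 dB/octave)."""
    x = np.asarray(x, dtype=float)
    if x.size == 0:
        return x
    y = np.empty_like(x)
    y[0] = x[0]
    y[1:] = x[1:] - alpha * x[:-1]
    return y

def frame_signal(x, frame_len=FRAME_LEN, hop=HOP):
    """Split x into overlapping frames and apply a Hamming window to each."""
    x = np.asarray(x, dtype=float)
    if len(x) < frame_len:
        return np.empty((0, frame_len))
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx] * np.hamming(frame_len)

# Usage: frames = frame_signal(pre_emphasize(signal))
```

At 11.025 k sa/s, a 256-sample window spans about 23 ms and a new frame starts roughly every 11.6 ms.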
VTAG-1 result for all-vowel word /aIje/
Synthesized vowels: /a/, /i/, /u/
Amplitude/pitch modulated synthesized vowel /a/: amplitude modulated, pitch modulated, amplitude & pitch modulated
Spectrograms for V-C-V sequence: /ata/, /aka/, /ada/, /aga/
Areagram for V-C-V sequence: /ata/, /aka/, /ada/, /aga/
Lip shape estimation
Mouth parameters:
Parameter estimation:
• Pitch tracking: odd harmonics absent for analysis window length = 2 * pitch period
• Magnitude spectrum above 4000 Hz clipped to zero
• Mean & variance used for generation of predictor surfaces
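The parameter estimation steps can be illustrated with a short sketch. It assumes that "mean & variance" refers to the first and second central moments of the magnitude spectrum treated as a distribution over frequency, and it uses a rectangular window of exactly two pitch periods so that the odd DFT bins vanish for a perfectly periodic signal. Both points are assumptions, and `spectral_moments` is a hypothetical name.

```python
import numpy as np

def spectral_moments(frame, fs, f_cut=4000.0):
    """Return (mean, variance, magnitude spectrum) of one analysis frame.

    The magnitude spectrum is zeroed above f_cut, normalized to sum to one,
    and then treated as a distribution over frequency in Hz.
    """
    frame = np.asarray(frame, dtype=float)
    mag = np.abs(np.fft.rfft(frame))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / fs)
    mag[freqs > f_cut] = 0.0
    total = mag.sum()
    if total == 0.0:
        return 0.0, 0.0, mag
    p = mag / total
    mean = float(np.sum(freqs * p))
    var = float(np.sum((freqs - mean) ** 2 * p))
    return mean, var, mag

# Synthetic periodic frame: pitch period of 50 samples, window of 2 periods.
fs = 11025
period = 50
n = np.arange(2 * period)
frame = np.sin(2 * np.pi * n / period) + 0.5 * np.sin(2 * np.pi * 2 * n / period)
mean, var, mag = spectral_moments(frame, fs)
# The harmonics fall on even DFT bins, so mag[1::2] is numerically zero.
```

With a window of two pitch periods, the h th harmonic lands on bin 2h, which is why energy in the odd bins indicates a mismatch between the window length and the actual pitch period.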
Lip shape estimation results
Pitch and mean vs. variance result (1): synthesized amplitude modulated vowel /u/
Pitch and mean vs. variance result (2): synthesized pitch/amplitude modulated vowel /a/
Pitch and mean vs. variance result (3): synthesized pitch modulated vowel /i/
Summary
■ Analysis & display package VTAG-1 developed for pitch/energy variation, spectrogram, & areagram (2-D plot of vocal tract area) to investigate the problems in estimation of vocal tract shape, for use in speech training aids for hearing-impaired children.
Cont.
■ Area estimation for vowels: not affected by amplitude & pitch variation
■ Area estimation during stop closure: place of closure cannot be estimated from the analysis result during stop closure
■ Further work:
• Investigate methods for predicting vocal tract area during stop closure from the areas estimated on either side of the closure
• Implement algorithm for generation of predictor surfaces for extraction of lip shape estimation parameters