Mechanical Model for Speech design


The Effect of Glottal Opening on the Acoustic Response of the Vocal Tract
Anna Barney,
Antonio De Stefano
ISVR, University of Southampton, UK
&
Nathalie Henrich
LAM, Université Paris VI, France
Introduction
We are interested in the interaction
between the voice source and the
vocal tract.
We hope that an improved
understanding of source-tract
interaction will enhance
naturalness in synthesised speech
Structure of this talk
• Types of source-tract interaction
• Effect of source-tract interaction on
formant frequencies: theory
• Mechanical model
• Measurement of the effect of source-tract interaction: static
• Measurement of the effect of source-tract interaction: dynamic
• Conclusions & Future work
Assumptions of Source-Filter Theory
• Source and vocal-tract filter do
not interact
• Non-linear effects are normally
lumped into the source model
• Formants are the resonances of
the vocal-tract, calculated when
the glottal impedance is infinite
Source Tract Interaction
(STI)
Childers & Wong (1994) define 3 principal types of STI:
• Loading of the source by the vocal tract
impedance
• Dissipation of vocal tract energy by glottal
opening (mainly at F1)
• Carry over of energy from one glottal period to
the next (for low glottal damping)
(D.G. Childers and C.-F. Wong, 'Measuring and Modeling Vocal
Source-Tract Interaction', IEEE Transactions on Biomedical
Engineering, Vol. 41. No. 7. pp. 663-671 (1994) )
Source Tract Interaction
(STI)
Flanagan (Speech analysis synthesis and
perception, 1965) considered the effect of
finite glottal impedance on a transmission
line model of the vocal tract
[Figure: transmission line model of the subglottal vocal tract, glottis and supraglottal vocal tract, with impedances labelled Zl, Zg, Za, Zb and Zo]
Source Tract Interaction
(STI)
Flanagan stated that a finite glottal
impedance would raise F1 and
increase formant damping
He predicted an increase in F1 of
1.4% for a glottal area of 5 mm²
Source Tract Interaction
(STI)
• Ananthapadmanabha, T.V. &
Fant G. (1982)
(Calculation of the true glottal volume velocity and its
components. Speech Commun. 1 (1982) 167-184).
• Found the theoretical effect of
glottal inertance to be small
Source Tract Interaction
(STI)
• P. Badin and G. Fant,
(Notes on Vocal tract computation. STL-QPSR 2-3/1984
(1984) 53-108)
• Modelled the sub-glottal system
as a short circuit
• Used a glottal area of 0.027 mm²
• Glottis modelled by inductance
only:
• F1 increased by 0.2%
Measurements on Real
Speech
• It is known that the formant estimates vary
depending on where in a pitch period the
estimation window is placed.
• F1 estimates obtained using group delay
characteristics and a minimum phase
assumption are generally a little higher
during the open phase than during the
closed phase.
(B. Yegnanarayana, R. Veldhuis, IEEE Trans. Speech & Audio Processing,
6(4), 1998)
• Closed-phase formant analysis is used to
get estimates of the vocal tract formants
that are reliably decoupled from any subglottal formants.
(L. C. Wood, D. J. P. Pearce, IEE Proceedings 136, pt 1, no. 2, 1989)
Source Tract Interaction
(STI)
Shifts in F1 may be small but they may
correlate with:
– changes in glottal OQ and/or
– changes in glottal amplitude
And may be of interest when considering
voice quality & naturalness of synthesis
Also – glottal areas considered in the
literature are always at the small end of
the range found in normal voicing.
Flanagan’s model
We implemented Flanagan’s
transmission line model with a
uniform duct of length
17.5 cm and area 2.89 cm² to
explore the change as glottal width
increased
The formant shift –
theory
[Figure: log amplitude versus frequency (Hz)]
Theoretical modelling of
the formant shift – static
glottis
To match our experimental measurements
we elaborated on Flanagan’s model
We used 4 T-sections for the supra-glottal
vocal tract and other parameters to
match those of our mechanical model
We chose the boundary condition at the
lips to match the boundary condition for
our measurements
Theoretical modelling of
the formant shift –glottal
impedance model
Flanagan (1965) & others for
finite glottal impedance:
\[ Z_g = \frac{12 \mu l_g}{w_g^3 h_g} + j\omega \frac{\rho l_g}{h_g w_g} \]
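The Flanagan expression for the glottal impedance can be evaluated numerically and attached to a very simple tube model to see the direction of the F1 shift. The sketch below is not the transmission line implementation used in the talk: it assumes a lossless uniform tube with an ideal open end at the lips and a short-circuited subglottal system, for which the volume-velocity transfer function is 1 / (cos kL + j Z0 sin kL / Zg). The air properties and the slit dimensions `l_g` and `h_g` are illustrative assumptions; only the duct length and area come from the slides.

```python
import numpy as np

RHO = 1.2    # air density, kg/m^3 (assumed)
MU = 1.8e-5  # dynamic viscosity of air, Pa.s (assumed)
C = 343.0    # speed of sound, m/s (assumed)


def flanagan_zg(freq_hz, w_g, l_g=3e-3, h_g=15e-3):
    """Z_g = 12*mu*l_g/(w_g^3*h_g) + j*omega*rho*l_g/(h_g*w_g).

    l_g and h_g are hypothetical slit dimensions in metres.
    """
    omega = 2 * np.pi * np.asarray(freq_hz, dtype=float)
    resistance = 12 * MU * l_g / (w_g**3 * h_g)
    inertance = RHO * l_g / (h_g * w_g)
    return resistance + 1j * omega * inertance


def first_formant(w_g, length=0.175, area=2.89e-4):
    """Peak of |H| between 200 Hz and 1 kHz for the simplified tube."""
    freqs = np.arange(200.0, 1000.0, 0.5)
    k = 2 * np.pi * freqs / C
    z0 = RHO * C / area  # characteristic impedance of the duct
    denom = np.cos(k * length) + 1j * z0 * np.sin(k * length) / flanagan_zg(freqs, w_g)
    return freqs[np.argmax(np.abs(1.0 / denom))]


if __name__ == "__main__":
    for width_mm in (1, 2, 3):
        print(width_mm, "mm:", first_formant(width_mm * 1e-3), "Hz")
```

Because the lip boundary condition and losses differ from the mechanical model, the absolute values from this sketch should not be compared with the measurements; it only illustrates that a finite, mainly inductive glottal impedance moves F1 above the closed-glottis value.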
Theoretical modelling of
the formant shift – glottal
impedance model
Laine & Karjalainen (1986):
\[ Z_g = R_g + j\omega L_g \]
where
\[ A_r = \frac{A_g}{A_t} \]
[The expressions for R_g, L_g, d_d and d_m are garbled in the transcript. Surviving terms: R_g: 0.69, A_r, ρc, 12μt, ρ, U, 2, A_t, hw³, A_g; L_g: ρ, 0.4, d_d, d_m, A_g; d_d: 0.48, A_g; d_m: π, 2μ, ρω, wh.]
Theoretical modelling of
the formant shift –glottal
impedance model
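Rösler & Strube (1989)
\[ Z_{gt} = R_{dk} + Z_{vi} + j\omega L_k \]
Where
\[ R_{dk} = \frac{K \rho U}{A_g^2}; \quad L_k = \left(\frac{2\rho}{h\alpha}\right) \ln\left(\frac{D}{\omega}\right); \]
\[ Z_{vi} = \frac{12 \mu l_g}{w_g^3 h_g} + j\omega \frac{6 \rho l_g}{5 h_g w_g} \]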
Theoretical modelling of
the formant shift –glottal
impedance model
• How should we model the subglottal impedance?
• Speech models often assume
that the lower end of the trachea
is a fully absorbing boundary
(r=0) so that there are no subglottal resonances.
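The fully absorbing condition can be stated with the standard plane-wave pressure reflection coefficient. In the relation below, Z_0 is the characteristic impedance of the duct, and reading A_t as the cross-sectional area at the termination is an assumption about the notation:

```latex
r = \frac{Z_l - Z_0}{Z_l + Z_0}, \qquad Z_0 = \frac{\rho c}{A_t}
\quad\Longrightarrow\quad
r = 0 \iff Z_l = \frac{\rho c}{A_t}
```

Any other value of Z_l, in magnitude or phase, gives a non-zero reflection and therefore allows sub-glottal resonances.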
Theoretical modelling of
the formant shift –glottal
impedance model
• We wanted to compare our
theoretical model with
measurements. We tried all
three glottal impedance models
and a range of sub-glottal
impedance models to find the
best fit to the data.
The Mechanical Model
We made our measurements of
F1 shift using a mechanical
model of the larynx and vocal
tract
The mechanical model
[Photographs: shutter driver system; the shutter region]
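Schematic Diagram of
the Model
[Figure: schematic of the model showing the flow direction and measurement points pt1, pt2 and pt3. Dimension labels: 55, 50, 130, 15, 17, 115, 175. All dimensions in mm, not to scale.]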
Instrumentation
• Rotameter: inlet volume flow rate
• Manometer: mean pressure
upstream
• Entran EPE-54 miniature pressure
transducers, diameter of 2.36 mm,
range 0 to 14 kPa: time-varying
pressure at the duct wall for up to 4
locations
• Shutter driving signal: shutter
position
• All time-histories are captured by a
simultaneous-sampling ADC
connected to a PC with a sampling
frequency of 8928 Hz.
Experimental
measurements – static case
• Glottal widths of 0, 1, 2 and 3 mm
• Excitation provided by speaker at duct
outlet – tonal discrete frequencies
between 300 Hz and
2 kHz
• Speaker modified duct boundary
condition at “lips” so it was closer to a
closed end condition. Impedance here
was held constant throughout the
measurements
Experimental
measurements – static case
• 2 pressure transducers between “glottis”
and “lips”
• Pressure transducer separation 80 mm
• Standing wave component pressure
amplitudes extracted as specified by
K. R. Holland & P. O. A. L. Davies
(The measurement of sound power flux in flow ducts. Journal of
Sound and Vibration 230 (2000) 915-932)
• Transfer function from “glottis” to
“lips” obtained.
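The idea of extracting standing wave component amplitudes from two wall pressures can be sketched as a 2x2 linear solve. This is a plain two-sensor plane-wave decomposition without mean flow or losses, not the full Holland & Davies procedure; the function names, the coordinate convention (x measured along the duct) and the speed of sound are assumptions made for illustration.

```python
import numpy as np


def decompose_standing_wave(p1, p2, x1, x2, freq_hz, c=343.0):
    """Return (p_plus, p_minus), the complex wave amplitudes at x = 0.

    Assumes p(x) = p_plus*exp(-j*k*x) + p_minus*exp(+j*k*x), with p1 and p2
    the complex pressure amplitudes measured at positions x1 and x2 (m).
    """
    k = 2 * np.pi * freq_hz / c
    m = np.array([
        [np.exp(-1j * k * x1), np.exp(1j * k * x1)],
        [np.exp(-1j * k * x2), np.exp(1j * k * x2)],
    ])
    p_plus, p_minus = np.linalg.solve(m, np.array([p1, p2], dtype=complex))
    return p_plus, p_minus


def pressure_at(x, p_plus, p_minus, freq_hz, c=343.0):
    """Reconstruct the complex pressure amplitude at position x (m)."""
    k = 2 * np.pi * freq_hz / c
    return p_plus * np.exp(-1j * k * x) + p_minus * np.exp(1j * k * x)
```

With the two amplitudes known, the pressure can be reconstructed at the "glottis" and "lips" positions and their ratio taken at each excitation frequency. The solve becomes ill-conditioned when the sensor separation approaches half a wavelength; for an 80 mm separation and c of about 343 m/s that is near 2.1 kHz, just above the 2 kHz upper excitation limit.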
Transfer function from
glottis to lips – measured &
theoretical – static
[Figures: four slides of transfer function plots, vertical axis in dB]
Glottal impedance model   Glottal width   Z_l             ∠Z_l     MSE
Flanagan model            1 mm            0.9 ρc/A_t      0.2π     4.00
                          2 mm            0.9 ρc/A_t      0.3π     7.95
                          3 mm            1.2 ρc/A_t      0.5π     12.44
Flanagan factor of 6/5    1 mm            0.8 ρc/A_t      0.1π     3.35
                          2 mm            0.6 ρc/A_t      0.5π     7.26
                          3 mm            1.2 ρc/A_t      0.5π     12.76
L&K model                 1 mm            19.98 ρc/A_t    0.5π     10.77
                          2 mm            19.98 ρc/A_t    0.5π     16.81
                          3 mm            19.98 ρc/A_t    0.1π     25.50
R&S model                 1 mm            0.7 ρc/A_t      0.00π    2.33
                          2 mm            0.4 ρc/A_t      0.1π     5.86
                          3 mm            0.3 ρc/A_t      0.5π     8.99
[Figure: transfer functions in dB for glottal widths of 0 mm, 1 mm, 2 mm and 3 mm]
Static case - Summary
• F1 & F2 increased with increasing glottal width
• Predicted values of F1 (799 Hz, 854 Hz, 882
Hz, 896 Hz) match the measurements well
• Increase in F1 between closed glottis and 1 mm
wide glottis is ~6%
• Increase in F1 between closed glottis and 3 mm
wide glottis is ~13%
• Increase in F1 larger than found by previous
researchers, perhaps due to using greater glottal
widths
Dynamic Experimental
measurements
• How do our measurements for the
static case transfer to a model
excited by a vibrating larynx?
• What is the dependence of F1 on
the open quotient?
• What is the dependence of F1 on
the glottal amplitude?
Experimental
measurements – dynamic
• Moving shutters 10 – 40 Hz
square wave excitation
• OQ: 20, 40, 60, 80 %
• Glottal width: 0.25 mm to
4 mm
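The square-wave shutter motion described above can be sketched as a glottal width signal that is fully open for a fraction OQ of each period and closed otherwise. The function name and the idealised instantaneous opening and closing are assumptions; the real shutters will have finite transition times.

```python
import numpy as np


def shutter_width(t, f0, open_quotient, peak_width_mm):
    """Idealised square-wave glottal width (mm) at times t (s).

    open_quotient is a fraction between 0 and 1 (e.g. 0.4 for OQ = 40 %).
    """
    phase = (np.asarray(t, dtype=float) * f0) % 1.0  # position within the cycle
    return np.where(phase < open_quotient, peak_width_mm, 0.0)


if __name__ == "__main__":
    fs = 8928  # sampling frequency quoted for the ADC, Hz
    t = np.arange(fs) / fs
    w = shutter_width(t, f0=20.0, open_quotient=0.4, peak_width_mm=2.0)
    print("fraction of samples open:", np.mean(w > 0))
```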
Peak glottal width versus
OQ for all f0
[Figure: glottal amplitude versus open quotient (20, 40, 60, 80 %)]
Pressure time history at
p1 in the duct
[Figure: pressure (Pa) versus time (s), with shutter closure and opening marked]
Experimental
measurements – dynamic
• F1 frequency found from AR
spectral estimation. AR analysis
uses whole glottal cycle to ensure
STI effects included in analysis
• AR analysis uses the Yule-Walker
algorithm with an order of
ceil((Fs/1000)+2) = 11
Experimental
measurements – dynamic
• F1 peak defined as maximum
value of spectrum between
200 Hz and 1 kHz
• Data set rejected if no peak visible
in this range hence small data set
for OQ = 80%
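The F1 estimation steps above (Yule-Walker AR fit of order ceil(Fs/1000 + 2), then a peak search between 200 Hz and 1 kHz with rejection when no peak is found) can be sketched as follows. This is a minimal numpy version under stated assumptions, not the analysis code used for the talk: the biased autocorrelation estimate, the frequency grid size and the rule "a maximum at the edge of the band counts as no peak" are choices made for illustration.

```python
import math
import numpy as np


def yule_walker(x, order):
    """AR coefficients a[1..p] of A(z) = 1 + sum a_k z^-k, and noise variance."""
    x = np.asarray(x, dtype=float)
    x = x - np.mean(x)
    n = len(x)
    # Biased autocorrelation estimates r[0..order]
    r = np.array([np.dot(x[:n - k], x[k:]) / n for k in range(order + 1)])
    # Toeplitz matrix R[i, j] = r[|i - j|]; solve R a = -r[1:]
    idx = np.abs(np.subtract.outer(np.arange(order), np.arange(order)))
    a = np.linalg.solve(r[idx], -r[1:])
    sigma2 = r[0] + np.dot(a, r[1:])
    return a, sigma2


def estimate_f1(x, fs, f_lo=200.0, f_hi=1000.0, n_freq=4096):
    """Frequency of the AR spectral peak in [f_lo, f_hi], or None if rejected."""
    order = math.ceil(fs / 1000 + 2)  # 11 for fs = 8928 Hz
    a, sigma2 = yule_walker(x, order)
    freqs = np.linspace(0.0, fs / 2, n_freq)
    lags = np.arange(1, order + 1)
    z = np.exp(-2j * np.pi * np.outer(freqs, lags) / fs)
    spectrum = sigma2 / np.abs(1.0 + z @ a) ** 2
    band = np.flatnonzero((freqs >= f_lo) & (freqs <= f_hi))
    i = band[np.argmax(spectrum[band])]
    if i == band[0] or i == band[-1]:
        return None  # maximum sits on the band edge: no peak visible
    return freqs[i]
```

Calling `estimate_f1(pressure, 8928)` on a pressure time history spanning whole glottal cycles returns a single F1 value, or `None` for a record that would be rejected.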
Frequency of F1 for
changing glottal width and
OQ
[Figure: AR analysis results, F1 (Hz) versus glottal width (mm)]
Summary – dynamic
measurements
• F1 increases with increasing
glottal width for fixed OQ
• F1 increases with increasing OQ
for fixed glottal width – at least at
small glottal widths
• Observed values of F1 much
higher than normally predicted for
open-closed tube of the same
length or expected for real speech.
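For reference, the open-closed tube prediction mentioned above follows from the quarter-wavelength resonance formula. The worked value below takes L = 0.175 m, the duct length quoted for the transmission line model, and assumes c = 343 m/s:

```latex
F_n = \frac{(2n-1)\,c}{4L}, \qquad
F_1 = \frac{343\ \mathrm{m/s}}{4 \times 0.175\ \mathrm{m}} = 490\ \mathrm{Hz}
```

A tube of the same length closed at both ends would instead have its first resonance at c/(2L) = 980 Hz, so the lip boundary condition strongly affects which value is the appropriate baseline.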
Theoretical model –
dynamic
• Simulink model
• Model adapted from one created
by Nicolas Montgermont and
Benoit Fabre, LAM for
investigating the flute
[Figure: Simulink model of dynamic case, with blocks labelled glottal excitation, switchable glottal impedance and duct model]
Pressure time history at
P1 - simulated
[Figure: simulated pressure time history, with closed and open phases marked]
F1 values for dynamic
simulation
Simulation - summary
• The simulation does show a
change in the formant frequency
as OQ changes
• The increase in F1 is much
smaller than observed in the
dynamic model experiments
• The dynamic model has much
greater damping, especially
during closure, than the
simulation
Future work
• To make a theoretical model of
the formant shift in the dynamic
case that matches the
measurements more closely
• To make similar measurements
in real speakers