Complete Discrete Time Model

Complete model covers periodic, noise and impulsive inputs.
For periodic input:
$$X(z) = A_v\,G(z)\,H(z) = A_v\,G(z)\,V(z)\,R(z)$$
1) R(z): radiation impedance.
It has been shown that R(z) can be approximated by a differentiator,
$$R(z) = 1 - z^{-1}$$
or
$$R(z) = 1 - \alpha z^{-1}$$
Therefore, in continuous time it can be written that
$$x(t) \approx A\,\frac{d}{dt}\left[u_g(t) * v(t)\right] = A\left[\frac{d}{dt}u_g(t)\right] * v(t)$$
→ R(z) can be moved to the glottis in the previous figure.
2) G(z): z-transform of the glottal flow input g[n] over one cycle.
It can be approximated by
$$g[n] = \left(\beta^{-n}u[-n]\right) * \left(\beta^{-n}u[-n]\right)$$
$$G(z) = \frac{1}{(1 - \beta z)^2}$$
If β < 1: two identical poles outside the unit circle, two zeros at infinity (maximum phase).
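As a quick numerical check of the glottal model, the sketch below builds the left-sided sequence, convolves it with itself, and compares the result with the closed form (|n| + 1) β^|n| for n ≤ 0. The value of `beta` and the truncation length `N` are assumptions chosen for illustration only.

```python
import numpy as np

beta = 0.9   # hypothetical value, 0 < beta < 1
N = 200      # truncation length of the left-sided sequence (assumption)

# beta^{-n} u[-n], stored for n = -(N-1), ..., 0; it decays toward n = -infinity.
n = np.arange(-(N - 1), 1)
left_exp = beta ** (-n)

# Convolution of two left-sided sequences lives on n = -2(N-1), ..., 0.
g = np.convolve(left_exp, left_exp)
n_g = np.arange(-2 * (N - 1), 1)

# Closed form of the untruncated convolution for n <= 0.
g_closed = (np.abs(n_g) + 1) * beta ** np.abs(n_g)

# Truncation only affects samples with |n| >= N, so compare the rest.
recent = n_g > -N
max_err = np.max(np.abs(g[recent] - g_closed[recent]))

# Poles of G(z) = 1 / (1 - beta z)^2 are the roots of (1 - beta z)^2 in z.
poles = np.roots([beta ** 2, -2.0 * beta, 1.0])
print("max error:", max_err)
print("pole magnitudes:", np.abs(poles))
```

Both poles sit near z = 1/β, outside the unit circle, which is what makes g[n] left-sided and maximum phase.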
3) V(z): all-pole vocal-tract function.
$$V(z) = \frac{1}{\prod_{k=1}^{C_i}\left(1 - c_k z^{-1}\right)\left(1 - c_k^* z^{-1}\right)}$$
Therefore
$$X(z) = A_v\,\frac{1 - \alpha z^{-1}}{(1 - \beta z)^2\,\prod_{k=1}^{C_i}\left(1 - c_k z^{-1}\right)\left(1 - c_k^* z^{-1}\right)}$$
• V(z) and R(z) are minimum phase.
• G(z) is maximum phase.
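The phase classification can be read directly from the pole and zero locations. The following sketch evaluates the complete periodic-input model on the unit circle and checks where the roots lie. All parameter values (`alpha`, `beta`, the poles in `c`, `A_v`) are hypothetical.

```python
import numpy as np

# Hypothetical parameter values, chosen only for illustration.
alpha = 0.98                                # R(z) = 1 - alpha z^-1
beta = 0.90                                 # G(z) = 1 / (1 - beta z)^2
c = np.array([0.97 * np.exp(1j * 0.35),     # vocal-tract poles c_k
              0.95 * np.exp(1j * 1.10)])
A_v = 1.0

def X(z):
    """Evaluate X(z) = A_v (1 - alpha z^-1) / ((1 - beta z)^2 prod_k (1 - c_k z^-1)(1 - c_k* z^-1))."""
    z = np.asarray(z, dtype=complex)
    num = 1.0 - alpha / z
    den = (1.0 - beta * z) ** 2
    for ck in c:
        den = den * (1.0 - ck / z) * (1.0 - np.conj(ck) / z)
    return A_v * num / den

zero_R = alpha                              # zero of R(z)
poles_V = np.concatenate([c, np.conj(c)])   # poles of V(z)
pole_G = 1.0 / beta                         # double pole of G(z)

R_is_min_phase = abs(zero_R) < 1.0
V_is_min_phase = bool(np.all(np.abs(poles_V) < 1.0))
G_is_max_phase = abs(pole_G) > 1.0

# Log-magnitude spectrum on the unit circle.
omega = np.linspace(0.01, np.pi, 512)
spectrum_db = 20.0 * np.log10(np.abs(X(np.exp(1j * omega))))
print(R_is_min_phase, V_is_min_phase, G_is_max_phase)
```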
Some related work
Zeros of Z-Transform (ZZT) Decomposition of Speech For Source-Tract Separation
Baris Bozkurt, Boris Doval, Christophe D’Alessandro, Thierry Dutoit
This study proposes a new spectral decomposition method for source-tract separation. It is
based on a new spectral representation called the Zeros of Z-Transform (ZZT), which is
an all-zero representation of the z-transform of the signal. We show that separate
patterns exist in ZZT representations of speech signals for the glottal flow and the vocal
tract contributions. The ZZT-decomposition is simply composed of grouping the zeros
into two sets, according to their location in the z-plane. This type of decomposition leads
to separating glottal flow contribution (without a return phase) from vocal tract
contributions in z domain.
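The grouping idea in this abstract can be illustrated with a toy example. This is only a minimal sketch, not the authors' implementation: it assumes the unit circle is the boundary between the two sets, and the function name `zzt_split` and the test frame are hypothetical.

```python
import numpy as np

def zzt_split(frame):
    """Return the zeros of the frame's z-transform, split by the unit circle.

    For x[0..N-1], X(z) = sum_n x[n] z^-n has the same nonzero roots as the
    polynomial x[0] z^(N-1) + ... + x[N-1], which np.roots solves.
    """
    zeros = np.roots(frame)
    inside = zeros[np.abs(zeros) <= 1.0]
    outside = zeros[np.abs(zeros) > 1.0]
    return inside, outside

# Toy frame: one zero inside (z = 0.5) and one outside (z = 2) the unit circle.
frame = np.convolve([1.0, -0.5], [1.0, -2.0])
inside, outside = zzt_split(frame)
print("inside:", inside, "outside:", outside)
```

For real speech frames, windowing and frame placement affect the zero pattern; those details are outside the scope of this sketch.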
A Method For Glottal Formant Frequency Estimation
Baris Bozkurt, Boris Doval, Christophe D’Alessandro, Thierry Dutoit
This study presents a method for estimation of glottal formant frequency (Fg) from speech signals. Our
method is based on zeros of z-transform decomposition of speech spectra into two spectra: glottal
flow dominated spectrum and vocal tract dominated spectrum. Peak picking is performed on the
amplitude spectrum of the glottal flow dominated part. The algorithm is tested on synthetic speech. It
is shown to be effective especially when glottal formant and first formant of vocal tract are not too
close. In addition, tests on a real speech example are also presented where open quotient estimates
from EGG signals are used as reference and correlated with the glottal formant frequency estimates.
Improved Differential Phase Spectrum Processing For Formant Tracking
Baris Bozkurt, Boris Doval, Christophe D’Alessandro, Thierry Dutoit
This study presents an improved version of our previously introduced formant tracking algorithm. The
algorithm is based on processing the negative derivative of the argument of the chirp-z transform
(termed as the differential phase spectrum) of a given speech signal. No modeling is included in the
procedure but only peak picking on differential phase spectrum. We discuss the effects of roots of z-transform to differential phase spectrum and the need to ensure that all zeros are at some distance
from the circle where chirp-z transform is computed. For that, we include an additional zero-decomposition step in our previously presented algorithm to improve its robustness. The final version
of the algorithm is tested for analysis of synthetic speech and real speech signals and compared to
two other formant tracking systems.
If the differentiation at the output (radiation impedance) is applied to the glottal flow, the source becomes the glottal flow derivative.
[Figure: glottal flow and glottal flow derivative]
The derivative of the glottal flow is more pulse-like.
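Moving the differentiator from the output to the glottis works because convolution is commutative and associative. A minimal sketch, assuming a raised-cosine pulse as a stand-in for the glottal flow and a single damped resonance as a stand-in for v[n] (both hypothetical):

```python
import numpy as np

# Hypothetical glottal flow pulse (raised cosine), not a fitted glottal model.
n = np.arange(80)
glottal_flow = 0.5 * (1.0 - np.cos(2.0 * np.pi * n / 80))

# Hypothetical vocal-tract impulse response: one decaying resonance.
m = np.arange(200)
v = (0.97 ** m) * np.cos(0.3 * m)

# R(z) = 1 - z^-1 as an FIR kernel.
r = np.array([1.0, -1.0])

# Differentiate at the output ...
out_last = np.convolve(np.convolve(glottal_flow, v), r)
# ... or differentiate the glottal flow first.
glottal_flow_derivative = np.convolve(glottal_flow, r)
out_first = np.convolve(glottal_flow_derivative, v)

print("same output:", np.allclose(out_last, out_first))
```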
NOISE INPUT
$$X(z) = A_n\,U(z)\,H(z) = A_n\,U(z)\,V(z)\,R(z)$$
IMPULSE INPUT
$$X(z) = A_i\,H(z) = A_i\,V(z)\,R(z)$$
The combination of the three inputs may be linear or nonlinear.
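The three source types can be passed through the same H(z) = V(z)R(z). The sketch below uses a single hypothetical resonance for V(z), an assumed `alpha`, and made-up gains. It shows only the linear combination, implemented as a plain sum.

```python
import numpy as np

def all_pole(x, a):
    """Direct-form recursion y[n] = x[n] - sum_{k>=1} a[k] y[n-k], with a[0] = 1."""
    y = np.zeros(len(x))
    for n in range(len(x)):
        acc = x[n]
        for k in range(1, min(n, len(a) - 1) + 1):
            acc -= a[k] * y[n - k]
        y[n] = acc
    return y

# Hypothetical single-resonance V(z) and radiation zero.
c = 0.97 * np.exp(1j * 0.3)
a = np.real(np.poly([c, np.conj(c)]))   # [1, -2 Re(c), |c|^2]
alpha = 0.98

def H(x):
    """Apply H(z) = V(z) R(z), truncated to the input length."""
    return np.convolve(all_pole(x, a), [1.0, -alpha])[:len(x)]

N = 400
rng = np.random.default_rng(0)

# Periodic source: impulse train convolved with a hypothetical glottal pulse.
pulse = 0.5 * (1.0 - np.cos(2.0 * np.pi * np.arange(40) / 40))
train = np.zeros(N)
train[::80] = 1.0
periodic_src = np.convolve(train, pulse)[:N]

noise_src = rng.standard_normal(N)      # u[n] for the noise input
impulse_src = np.zeros(N)               # single impulse
impulse_src[0] = 1.0

A_v, A_n, A_i = 1.0, 0.1, 1.0           # hypothetical gains
x_voiced = A_v * H(periodic_src)
x_noise = A_n * H(noise_src)
x_impulse = A_i * H(impulse_src)
x_mixed = x_voiced + x_noise + x_impulse   # linear combination only
```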
OTHER ZEROS OF THE VOCAL-TRACT
In the noise and impulse source states, oral tract constrictions may give zeros as well as poles (absorption of energy by cavity anti-resonances).
→ V(z) may have zeros:
$$X(z) = A\,\frac{\left(1 - \alpha z^{-1}\right)\prod_{k=1}^{M_i}\left(1 - a_k z^{-1}\right)\prod_{k=1}^{M_o}\left(1 - b_k z\right)}{(1 - \beta z)^2\,\prod_{k=1}^{C_i}\left(1 - c_k z^{-1}\right)\left(1 - c_k^* z^{-1}\right)}$$
→ The vocal tract function is generally mixed phase.
→ Maximum-phase elements of the vocal tract can also contribute to a more gradual attack of the speech waveform.
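The effect of a maximum-phase zero on the attack can be seen by flipping a zero from inside to outside the unit circle. The magnitude response stays the same, but the energy arrives later. In this sketch the zero radius `b` and the damped resonance are hypothetical, and the maximum-phase factor is delayed by one sample so that it is causal.

```python
import numpy as np

b = 0.9                               # hypothetical, |b| < 1
min_phase = np.array([1.0, -b])       # 1 - b z^-1, zero at z = b
max_phase = np.array([-b, 1.0])       # z^-1 (1 - b z), zero at z = 1/b

# Same magnitude response on the unit circle.
mag_min = np.abs(np.fft.fft(min_phase, 256))
mag_max = np.abs(np.fft.fft(max_phase, 256))

# Hypothetical damped resonance standing in for the all-pole part.
n = np.arange(300)
resonance = (0.98 ** n) * np.sin(0.25 * n)

y_min = np.convolve(min_phase, resonance)
y_max = np.convolve(max_phase, resonance)

# Normalized cumulative energy: the minimum-phase version builds up first.
energy_min = np.cumsum(y_min ** 2) / np.sum(y_min ** 2)
energy_max = np.cumsum(y_max ** 2) / np.sum(y_max ** 2)
print("energy in first 10 samples:", energy_min[9], energy_max[9])
```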
The modeling described here is called the SOURCE-FILTER MODEL of speech production.
Vocal Fold and Vocal Tract Interaction
In the source-filter model, it is assumed that the glottal impedance is infinite and that the glottal airflow is not influenced by the vocal tract.
However, the pressure in the vocal tract cavity above the glottis backs up against (resists) the glottal flow.
Electrical analog is shown below
Psg: subglottal (lung) pressure
p(t): sound pressure corresponding to a single first formant in front of the glottis (it has been found that other formants have negligible effect on glottal flow).
Zg(t): time-varying impedance of the glottis.
R, L, C: these parameters model the first formant, with
$$\Omega_0 = \frac{1}{\sqrt{LC}}$$ formant frequency (center frequency)
$$B_0 = \frac{1}{RC}$$ bandwidth (3 dB)
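A small helper makes the mapping between circuit elements and formant parameters concrete. It assumes Ω0 and B0 are in rad/s (dividing by 2π gives Hz) and fixes C to choose L and R. The function names and the 500 Hz / 60 Hz example are hypothetical.

```python
import math

def formant_from_rlc(R, L, C):
    """Return (Omega_0, B_0) in rad/s for a parallel RLC resonator."""
    omega0 = 1.0 / math.sqrt(L * C)
    b0 = 1.0 / (R * C)
    return omega0, b0

def rlc_from_formant(f0_hz, bw_hz, C=1.0):
    """Pick R and L for a given center frequency and bandwidth, with C fixed."""
    omega0 = 2.0 * math.pi * f0_hz
    b0 = 2.0 * math.pi * bw_hz
    L = 1.0 / (omega0 ** 2 * C)
    R = 1.0 / (b0 * C)
    return R, L, C

# Round trip for a hypothetical first formant at 500 Hz with 60 Hz bandwidth.
R, L, C = rlc_from_formant(500.0, 60.0)
omega0, b0 = formant_from_rlc(R, L, C)
print(omega0 / (2.0 * math.pi), b0 / (2.0 * math.pi))
```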
Zg(t) accounts for the interaction between the glottal flow and vocal tract.
If Zg(t) is comparable to the impedance of 1st formant then there will be considerable
interaction and Ω0, B0 will be affected.
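Also, Zg(t) has been found to be nonlinear:
$$p_{tg}(t) = \frac{k\rho}{2A^2(t)}\,u_g^2(t), \qquad k = 1.1$$
where A(t) is the smallest time-varying area of the glottal slit and ρ is the air density. Solving for the glottal flow and balancing currents in the circuit gives
$$C\,\frac{dp(t)}{dt} + \frac{p(t)}{R} + \frac{1}{L}\int_0^t p(\tau)\,d\tau = A(t)\sqrt{\frac{2\,p_{tg}(t)}{k\rho}}$$
where
$$p(t) + p_{tg}(t) = P_{sg}$$
The equations are nonlinear and time-varying.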
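One way to get a feel for these equations is to integrate them numerically. The sketch below uses a semi-implicit Euler step, a raised-cosine glottal area, and normalized parameter values. All of these are assumptions made for illustration and are not taken from the lecture.

```python
import numpy as np

# All numerical values are hypothetical, in normalized units.
k, rho, P_sg = 1.1, 1.0, 1.0
C = 1.0
omega0 = 2.0 * np.pi * 500.0      # first-formant center frequency (rad/s)
B0 = 2.0 * np.pi * 50.0           # first-formant bandwidth (rad/s)
L = 1.0 / (omega0 ** 2 * C)
R = 1.0 / (B0 * C)

fs = 200_000.0
dt = 1.0 / fs
T0, open_fraction = 0.010, 0.6    # pitch period (s), open fraction of the cycle
t = np.arange(0.0, 2.0 * T0, dt)

# Raised-cosine glottal area A(t), zero during the closed phase.
phase = (t % T0) / T0
A_max = 4.0 / (R * np.sqrt(2.0 / (k * rho * P_sg)))
area = np.where(phase < open_fraction,
                0.5 * A_max * (1.0 - np.cos(2.0 * np.pi * phase / open_fraction)),
                0.0)

p = 0.0                           # formant pressure p(t)
q = 0.0                           # running integral of p
p_hist = np.zeros_like(t)
u_hist = np.zeros_like(t)
for i in range(len(t)):
    p_tg = max(P_sg - p, 0.0)                        # transglottal pressure
    u_g = area[i] * np.sqrt(2.0 * p_tg / (k * rho))  # nonlinear glottal flow
    dp = (u_g - p / R - q / L) / C
    p += dt * dp                  # update p first ...
    q += dt * p                   # ... then the integral, using the new p
    p_hist[i] = p
    u_hist[i] = u_g

print("peak |p| / P_sg:", np.max(np.abs(p_hist)) / P_sg)
```

The arrays `u_hist` and `p_hist` can then be plotted against `t` to inspect the flow shape and any ripple for the chosen parameters.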
Numerical solution of the above equations reveals that the skewness of the glottal flow is due in part to A(t) and in part to the loading effect of the first formant.
The numerical solution also yields a ripple component.
[Figure: glottal flow derivative]
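The problem can be analyzed approximately by linearizing the differential equation
$$C\,\frac{dp(t)}{dt} + \frac{p(t)}{R} + \frac{1}{L}\int_0^t p(\tau)\,d\tau = A(t)\sqrt{\frac{2\,p_{tg}(t)}{k\rho}}$$
Using p_tg(t) = Psg − p(t) and the Taylor series approximation
$$\sqrt{1 - x} \approx 1 - \frac{1}{2}x \quad \text{if } x \ll 1,$$
the square root becomes
$$\sqrt{\frac{2\left(P_{sg} - p(t)\right)}{k\rho}} = \sqrt{\frac{2P_{sg}}{k\rho}}\,\sqrt{1 - \frac{p(t)}{P_{sg}}} \approx \sqrt{\frac{2P_{sg}}{k\rho}}\left(1 - \frac{p(t)}{2P_{sg}}\right)$$
so that
$$C\,\frac{dp(t)}{dt} + \frac{p(t)}{R} + \frac{1}{L}\int_0^t p(\tau)\,d\tau = A(t)\sqrt{\frac{2P_{sg}}{k\rho}}\left(1 - \frac{p(t)}{2P_{sg}}\right)$$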
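The quality of the first-order Taylor approximation depends on how small x = p(t)/Psg is. A quick check for a few hypothetical values of x:

```python
import numpy as np

# x stands for p(t) / P_sg; the sample values are hypothetical.
x = np.array([0.01, 0.05, 0.1, 0.3])
exact = np.sqrt(1.0 - x)
approx = 1.0 - 0.5 * x
rel_err = np.abs(approx - exact) / exact

for xi, ei in zip(x, rel_err):
    print(f"x = {xi:4.2f}  relative error = {100.0 * ei:.3f} %")
```

The relative error is roughly 0.14 % at x = 0.1 and about 1.6 % at x = 0.3, so the linearization is reasonable as long as the formant pressure stays well below the subglottal pressure.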
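$$C\,\frac{dp(t)}{dt} + \frac{p(t)}{R} + \frac{1}{L}\int_0^t p(\tau)\,d\tau + \frac{1}{2}\,p(t)\,g_0(t) = u_{sc}(t)$$
where
$$u_{sc}(t) = A(t)\sqrt{\frac{2P_{sg}}{k\rho}}, \qquad g_0(t) = \frac{u_{sc}(t)}{P_{sg}} = A(t)\sqrt{\frac{2}{k\rho\,P_{sg}}}$$
By differentiation,
$$C\,\frac{d^2p(t)}{dt^2} + \left[\frac{1}{R} + \frac{1}{2}\,g_0(t)\right]\frac{dp(t)}{dt} + \left[\frac{1}{L} + \frac{1}{2}\,\dot{g}_0(t)\right]p(t) = \dot{u}_{sc}(t)$$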
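The differentiation step can be checked term by term. The integral term simply returns p(t)/L, and the product of p(t) and g0(t) needs the product rule:

```latex
\begin{aligned}
\frac{d}{dt}\left[\frac{1}{L}\int_0^t p(\tau)\,d\tau\right] &= \frac{1}{L}\,p(t), \\
\frac{d}{dt}\left[\frac{1}{2}\,p(t)\,g_0(t)\right] &= \frac{1}{2}\,g_0(t)\,\frac{dp(t)}{dt} + \frac{1}{2}\,\dot{g}_0(t)\,p(t).
\end{aligned}
```

Collecting the dp/dt terms and the p(t) terms gives the two bracketed coefficients, 1/R + g0(t)/2 and 1/L + ġ0(t)/2.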

In the corresponding Norton equivalent circuit, usc(t) is now a time-varying source, and
$$R_g(t) = \frac{2}{g_0(t)}, \qquad L_g(t) = \frac{2}{\dot{g}_0(t)}$$
Formant Frequency and Bandwidth Modulation
Because the equation is linear but time-varying, the formant frequency and bandwidth are time-varying, i.e. they are modulated → the Laplace transform does not apply.
However, the equation can be solved at each time instant as a constant-coefficient equation.
$$H(s,t) = \frac{P(s,t)}{U_{sc}(s)} = \frac{s/C}{s^2 + B_1(t)\,s + \Omega_1^2(t)}$$
$$\Omega_1^2(t) = \Omega_0^2\left[1 + \frac{1}{2}\,L\,\dot{g}_0(t)\right]$$ (formant)
$$B_1(t) = B_0\left[1 + \frac{1}{2}\,R\,g_0(t)\right]$$ (bandwidth)
g0(t) is proportional to A(t) (glottal area)
→ the bandwidth increase B1(t) − B0 is proportional to the glottal area (B1(t) ≥ B0 since A(t) ≥ 0)
→ the shift of Ω1²(t) from Ω0² is proportional to the derivative of the glottal area (Ω1(t) may be above or below Ω0)
[Figure panels: glottal area, bandwidth, formant factor]
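The modulation formulas can be evaluated directly for an assumed g0(t). In this sketch the raised-cosine shape, the 6 ms open phase, and the peak value of `g0` (chosen so that B1 peaks at 3 B0) are all hypothetical.

```python
import numpy as np

# Hypothetical first formant and normalized circuit values.
omega0 = 2.0 * np.pi * 500.0
B0 = 2.0 * np.pi * 50.0
C = 1.0
L = 1.0 / (omega0 ** 2 * C)
R = 1.0 / (B0 * C)

fs = 100_000.0
T_open = 0.006                               # open-phase duration (s), assumed
t = np.arange(0.0, T_open, 1.0 / fs)

# g0(t) proportional to a raised-cosine glottal area (assumption).
g0_peak = 4.0 / R
g0 = 0.5 * g0_peak * (1.0 - np.cos(2.0 * np.pi * t / T_open))
g0_dot = np.gradient(g0, t)                  # numerical derivative of g0

B1 = B0 * (1.0 + 0.5 * R * g0)               # time-varying bandwidth
omega1_sq = omega0 ** 2 * (1.0 + 0.5 * L * g0_dot)
formant_factor = np.sqrt(omega1_sq) / omega0 # Omega_1(t) / Omega_0

print("B1/B0 range:", B1.min() / B0, B1.max() / B0)
print("formant factor range:", formant_factor.min(), formant_factor.max())
```

With these values the formant factor rises above 1 while the area is increasing and falls below 1 while it is decreasing, and the bandwidth never drops below B0.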
• In the minimum bandwidth modulation cases (/i/, /u/), B1(t) increases by a factor of 3 to 4.
• Multiplier of Ω1(t) ≈ 0.8 to 1.2
Conclusions
• The increase of B1(t) within a glottal cycle yields the truncation of glottal flow (sharp closing of folds).
• It is due to a decrease in the impedance at the glottis as the glottis opens. Reduced glottal impedance Zg(t) yields a pressure drop across the glottis.
[Figures: truncation effect (using Klatt synthesiser)]