Transcript Lecture 8

ECE 598: The Speech Chain
Lecture 8: Formant Transitions; Vocal Tract Transfer Function
Today
• Perturbation Theory:
  • A different way to estimate vocal tract resonant frequencies, useful for consonant transitions
• Syllable-Final Consonants: Formant Transitions
• Vocal Tract Transfer Function
  • Uniform Tube (Quarter-Wave Resonator)
  • During Vowels: All-Pole Spectrum
    • Q
    • Bandwidth
  • Nasal Vowels: Sum of two transfer functions gives spectral zeros
Topic #1:
Perturbation Theory
Perturbation Theory
(Chiba and Kajiyama, The Vowel, 1940)
A(x) is constant everywhere, except for one small perturbation.
Method:
1. Compute formants of the “unperturbed” vocal tract.
2. Perturb the formant frequencies to match the area perturbation.
Conservation of Energy Under
Perturbation
“Sensitivity” Functions
Sensitivity Functions for the Quarter-Wave Resonator (Lips Open)
[Figure: sensitivity functions plotted against position x, from 0 to L; labeled points: /AA/, /ER/, /IY/, /W/]
• Note: low F3 of /er/ is caused in part by a side branch under the tongue; perturbation alone is not enough to explain it.
Sensitivity Functions for the Half-Wave Resonator (Lips Rounded)
[Figure: sensitivity functions plotted against position x, from 0 to L; labeled points: /L,OW/, /UW/]
• Note: high F3 of /l/ is caused in part by a side branch above the tongue; perturbation alone is not enough to explain it.
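As a rough illustration of what a sensitivity function looks like, the sketch below uses ideal standing-wave shapes and takes the sensitivity as kinetic energy minus potential energy at each position. The function names, the choice of x = 0 at the glottis, the 17.5 cm length, the closed-closed model for rounded lips, and the sign convention (positive sensitivity means a squeeze at x lowers that formant) are all assumptions made for illustration; none of them is taken from the slides.

```python
import numpy as np

def quarter_wave_sensitivity(n, x, length):
    """Sensitivity of formant n for a tube closed at x=0 and open at x=length.

    Assumed ideal mode shapes: velocity ~ sin(k x), pressure ~ cos(k x),
    with k = (2n - 1) * pi / (2 * length).
    """
    k = (2 * n - 1) * np.pi / (2.0 * length)
    kinetic = np.sin(k * x) ** 2      # large where volume velocity peaks
    potential = np.cos(k * x) ** 2    # large where pressure peaks
    return kinetic - potential

def half_wave_sensitivity(n, x, length):
    """Same idea for a tube treated as closed at both ends (assumed model)."""
    k = n * np.pi / length
    return np.sin(k * x) ** 2 - np.cos(k * x) ** 2

if __name__ == "__main__":
    L = 17.5  # hypothetical vocal tract length in cm
    x = np.linspace(0.0, L, 8)
    for n in (1, 2, 3):
        print(n, np.round(quarter_wave_sensitivity(n, x, L), 2))
```

Under these assumptions every quarter-wave formant has sensitivity +1 at the open end, so a squeeze at the lips would lower all of them.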
Formant Frequencies of Vowels
From Peterson & Barney, 1952
Topic #2:
Formant Transitions,
Syllable-Final Consonant
Events in the Closure of a Nasal Consonant
• Formant Transitions
• Vowel Nasalization
• Nasal Murmur
Formant Transitions: A Perturbation Theory Model
Formant Transitions: Labial Consonants
Examples: “the mom”, “the bug”
Formant Transitions: Alveolar Consonants
Examples: “the supper”, “the tug”
Formant Transitions: Post-alveolar Consonants
Examples: “the shoe”, “the zsazsa”
Formant Transitions: Velar Consonants
Examples: “the gut”, “sing a song”
Topic #3:
Vocal Tract Transfer Functions
Transfer Function
• “Transfer Function”: $T(\omega) = \mathrm{Output}(\omega)/\mathrm{Input}(\omega)$
• In speech, it’s convenient to write $T(\omega) = U_L(\omega)/U_G(\omega)$
  • $U_L(\omega)$ = volume velocity at the lips
  • $U_G(\omega)$ = volume velocity at the glottis
  • $T(0) = 1$
• Speech recorded at a microphone = pressure
  • $P_R(\omega) = R(\omega)T(\omega)U_G(\omega)$
  • $R(\omega) = j\rho f/r$ = “radiation characteristic”
    • $\rho$ = density of air
    • $r$ = distance to the microphone
    • $f$ = frequency in Hertz
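Because the radiation characteristic is proportional to f, it tilts the recorded spectrum upward by about 6 dB per octave. The sketch below checks that numerically; the function names and the values of air density and microphone distance are placeholders chosen for illustration, not values from the lecture.

```python
import math

def radiation_characteristic(f_hz, rho=1.2, r_m=0.3):
    """R = j * rho * f / r, as written on the slide.

    rho (air density, assumed kg/m^3) and r_m (microphone distance, assumed
    metres) are hypothetical defaults.
    """
    return 1j * rho * f_hz / r_m

def level_db(value):
    """Magnitude in decibels."""
    return 20.0 * math.log10(abs(value))

if __name__ == "__main__":
    tilt = level_db(radiation_characteristic(2000.0)) - level_db(radiation_characteristic(1000.0))
    print(f"level change over one octave: {tilt:.2f} dB")
```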
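Transfer Function of an Ideal Uniform Tube
• Ideal Terminations:
  • Reflection coefficient at glottis: zero velocity, $\gamma = 1$
  • Reflection coefficient at lips: zero pressure, $\gamma = -1$
  • Obviously, this is an approximation, but it gives…
$$T(\omega) = \frac{1}{\cos(\omega L/c)} = \frac{\omega_1^2\,\omega_2^2\,\omega_3^2\cdots}{\cdots(\omega_3+\omega)(\omega_2+\omega)(\omega_1+\omega)(\omega_1-\omega)(\omega_2-\omega)(\omega_3-\omega)\cdots}$$
$$\omega_n = \frac{n\pi c}{L} - \frac{\pi c}{2L}, \qquad F_n = \frac{nc}{2L} - \frac{c}{4L}$$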
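To make the resonance formula concrete, here is a minimal sketch that evaluates Fn = nc/2L - c/4L. The function name and the default values (a 17.5 cm tube and a sound speed of 35000 cm/s) are assumptions chosen for illustration.

```python
def uniform_tube_formants(length_cm=17.5, c_cm_per_s=35000.0, count=3):
    """Resonances of an ideal tube closed at the glottis and open at the lips.

    Implements F_n = n*c/(2L) - c/(4L) for n = 1..count.
    length_cm and c_cm_per_s are hypothetical default values.
    """
    return [n * c_cm_per_s / (2.0 * length_cm) - c_cm_per_s / (4.0 * length_cm)
            for n in range(1, count + 1)]

if __name__ == "__main__":
    # With the assumed defaults this gives approximately 500, 1500, 2500 Hz.
    print(uniform_tube_formants())
```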
Transfer Function of an Ideal Uniform Tube
[Figure note: peaks are actually infinite in height; the figure is clipped to fit the display.]
Transfer Function of a Non-Ideal Uniform Tube
• Almost ideal terminations:
  • At glottis: velocity almost zero, $\gamma \approx 1$
  • At lips: pressure almost zero, $\gamma \approx -1$
• $T(\omega) = \dfrac{1}{j/Q + \cos(\omega L/c)}$
• … at $F_n = nc/2L - c/4L$, …
  • $T(2\pi F_n) = -jQ$
  • $20\log_{10}|T(2\pi F_n)| = 20\log_{10}Q$
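The peak value can be checked by evaluating the non-ideal tube formula at a resonance. This is a small sketch under assumed values: the function name, the tube length, the sound speed, and Q = 10 are all hypothetical.

```python
import math

def nonideal_tube_response(f_hz, length_cm=17.5, q=10.0, c_cm_per_s=35000.0):
    """T(w) = 1 / (j/Q + cos(w L / c)) for a nearly ideal uniform tube.

    All default parameter values are placeholders for illustration.
    """
    omega = 2.0 * math.pi * f_hz
    return 1.0 / (1j / q + math.cos(omega * length_cm / c_cm_per_s))

if __name__ == "__main__":
    t = nonideal_tube_response(500.0)  # 500 Hz is F1 for the assumed tube
    print(t, 20.0 * math.log10(abs(t)))
```

At the assumed first resonance the cosine term vanishes, so the response is essentially -jQ and the level is 20log10(Q).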
Transfer Function of a Non-Ideal Uniform Tube
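Transfer Function of a Vowel: Height of First Peak is $Q_1 = F_1/B_1$
$$T(\omega) = \prod_{n=1}^{\infty} \frac{(2\pi F_n)^2 + (\pi B_n)^2}{(j\omega + j2\pi F_n + \pi B_n)(j\omega - j2\pi F_n + \pi B_n)}$$
$$T(2\pi F_1) \approx \frac{(2\pi F_1)^2}{j4\pi F_1 \cdot \pi B_1} = -jF_1/B_1$$
Call $Q_n = F_n/B_n$:
$$T(2\pi F_1) \approx -jQ_1, \qquad 20\log_{10}|T(2\pi F_1)| \approx 20\log_{10}Q_1$$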
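Transfer Function of a Vowel: Bandwidth of a Peak is $B_n$
$$T(\omega) = \prod_{n=1}^{\infty} \frac{(2\pi F_n)^2 + (\pi B_n)^2}{(j\omega + j2\pi F_n + \pi B_n)(j\omega - j2\pi F_n + \pi B_n)}$$
$$T(2\pi F_1 + \pi B_1) \approx \frac{(2\pi F_1)^2}{(j4\pi F_1)(j\pi B_1 + \pi B_1)} = \frac{-jQ_1}{1+j}$$
At $f = F_1 + 0.5B_1$:
$$|T(\omega)| \approx Q_1/\sqrt{2}, \qquad 20\log_{10}|T(\omega)| \approx 20\log_{10}Q_1 - 3\,\mathrm{dB}$$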
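The peak height and the 3 dB bandwidth can be checked numerically by evaluating the all-pole product directly. The sketch below keeps only a single formant, with F1 = 500 Hz and B1 = 80 Hz as assumed values, so it agrees with the approximations only roughly; the function names are hypothetical.

```python
import math

def vowel_transfer(f_hz, formants_hz, bandwidths_hz):
    """Finite all-pole product: one conjugate pole pair per listed formant."""
    s = 1j * 2.0 * math.pi * f_hz
    t = 1.0 + 0j
    for fn, bn in zip(formants_hz, bandwidths_hz):
        sn = complex(-math.pi * bn, 2.0 * math.pi * fn)  # pole at -pi*Bn + j*2*pi*Fn
        t *= (sn * sn.conjugate()) / ((s - sn) * (s - sn.conjugate()))
    return t

def db(value):
    """Magnitude in decibels."""
    return 20.0 * math.log10(abs(value))

F1, B1 = 500.0, 80.0  # assumed example values
peak_db = db(vowel_transfer(F1, [F1], [B1]))
edge_db = db(vowel_transfer(F1 + 0.5 * B1, [F1], [B1]))

if __name__ == "__main__":
    print(f"peak level:     {peak_db:.2f} dB (20log10(Q1) = {20 * math.log10(F1 / B1):.2f} dB)")
    print(f"drop at F1+B1/2: {peak_db - edge_db:.2f} dB")
```

The small departure from exactly 3 dB in this sketch comes from the conjugate pole, which the hand approximation treats as constant across the peak.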
Amplitudes of Higher Formants: Include the Rolloff
$$T(\omega) = \prod_{n=1}^{\infty} \frac{(2\pi F_n)^2 + (\pi B_n)^2}{(j\omega + j2\pi F_n + \pi B_n)(j\omega - j2\pi F_n + \pi B_n)}$$
At $f$ above $F_1$:
$$T(2\pi f) \approx F_1/f$$
$$T(2\pi F_2) \approx (-jF_2/B_2)(F_1/F_2)$$
$$20\log_{10}|T(2\pi F_2)| \approx 20\log_{10}Q_2 - 20\log_{10}(F_2/F_1)$$
1/f Rolloff: 6 dB per octave (per doubling of frequency)
Vowel Transfer Function: Synthetic Example
• $B_1 = 80$ Hz: $L_1 = 20\log_{10}(500/80) = 16$ dB
• $B_2 = 240$ Hz: $L_2 = 20\log_{10}(1500/240) - 20\log_{10}(F_2/F_1) = 16\,\mathrm{dB} - 9.5\,\mathrm{dB}$
• $B_3 = 600$ Hz? (hard to measure because rolloff from F1, F2 turns the F3 peak into a plateau): $L_3 = 20\log_{10}(2500/600) - 20\log_{10}(F_3/F_1) - 20\log_{10}(F_3/F_2)$
• F4 peak completely swamped by rolloff from lower formants
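The level estimates in this example follow one pattern: start from 20log10(Fn/Bn) and subtract a rolloff term for each lower formant. A minimal sketch of that rule of thumb is given below, using the formant and bandwidth values from the example; the function name is hypothetical, and the code reproduces only the slide's approximation, not an exact evaluation of T.

```python
import math

def rule_of_thumb_levels(formants_hz, bandwidths_hz):
    """Approximate peak levels in dB.

    For each formant: 20*log10(Fn/Bn) minus 20*log10(Fn/Fm) for every
    lower formant Fm (the 1/f rolloff approximation).
    """
    levels = []
    for n, (fn, bn) in enumerate(zip(formants_hz, bandwidths_hz)):
        level = 20.0 * math.log10(fn / bn)
        for fm in formants_hz[:n]:
            level -= 20.0 * math.log10(fn / fm)
        levels.append(level)
    return levels

if __name__ == "__main__":
    for n, level in enumerate(rule_of_thumb_levels([500.0, 1500.0, 2500.0],
                                                   [80.0, 240.0, 600.0]), start=1):
        print(f"L{n} = {level:.1f} dB")
```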
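Shorthand Notation for the Spectrum of a Vowel
$$T(s) = \prod_{n=1}^{\infty} \frac{s_n s_n^*}{(s-s_n)(s-s_n^*)}$$
• $s = j\omega$
• $s_n = -\pi B_n + j2\pi F_n$
• $s_n^* = -\pi B_n - j2\pi F_n$
• $s_n s_n^* = |s_n|^2 = (2\pi F_n)^2 + (\pi B_n)^2$
• $T(0) = 1$
• $20\log_{10}|T(0)| = 0$ dB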
Another Shorthand Notation for the Spectrum of a Vowel
$$T(s) = \prod_{n=1}^{\infty} \frac{1}{(1-s/s_n)(1-s/s_n^*)}$$
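The two shorthand forms are the same function. One way to see it, shown here as a short worked step for a single factor of the product, is to divide the numerator and denominator by $s_n s_n^*$:

```latex
\frac{s_n s_n^*}{(s-s_n)(s-s_n^*)}
  = \frac{1}{\left(\dfrac{s-s_n}{-s_n}\right)\left(\dfrac{s-s_n^*}{-s_n^*}\right)}
  = \frac{1}{(1-s/s_n)(1-s/s_n^*)}
```

Setting $s = 0$ in the last form gives 1 for every factor, which is the $T(0) = 1$ property again.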
Topic #4:
Nasalized Vowels
Vowel Nasalization
[Figure labels: Nasalized Vowel, Nasal Consonant]
Nasalized Vowel
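• $P_R(\omega) = R(\omega)\,(U_L(\omega) + U_N(\omega))$
  • $U_N(\omega)$ = volume velocity from the nostrils
• $P_R(\omega) = R(\omega)\,(T_L(\omega) + T_N(\omega))\,U_G(\omega) = R(\omega)T(\omega)U_G(\omega)$
  • $T(\omega) = T_L(\omega) + T_N(\omega)$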
Nasalized Vowel
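$$T(s) = T_L(s) + T_N(s) = \frac{1}{(1-s/s_{Ln})(1-s/s_{Ln}^*)} + \frac{1}{(1-s/s_{Nn})(1-s/s_{Nn}^*)}$$
$$= \frac{2\,(1-s/s_{Zn})(1-s/s_{Zn}^*)}{(1-s/s_{Ln})(1-s/s_{Ln}^*)(1-s/s_{Nn})(1-s/s_{Nn}^*)}$$
• $s_{Zn}$ = nth spectral zero: $T(s) = 0$ if $s = s_{Zn}$
• $1/s_{Zn} \approx \tfrac{1}{2}\,(1/s_{Ln} + 1/s_{Nn})$ (an approximation unless the oral and nasal poles coincide)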
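To see where the zero lands, the sketch below sums one oral and one nasal pole pair, finds the zero of the sum by solving the numerator quadratic, and compares it with the averaged-reciprocal formula. The pole frequencies and bandwidths (500 Hz and 1000 Hz, 80 Hz and 100 Hz) and all function names are hypothetical values chosen for illustration.

```python
import numpy as np

def pole(f_hz, b_hz):
    """Pole at -pi*B + j*2*pi*F."""
    return complex(-np.pi * b_hz, 2.0 * np.pi * f_hz)

def pair(s, p):
    """One conjugate pole pair: 1 / ((1 - s/p)(1 - s/p*))."""
    return 1.0 / ((1.0 - s / p) * (1.0 - s / p.conjugate()))

def zero_from_average(s_l, s_n):
    """Zero estimate from 1/sZ = (1/sL + 1/sN) / 2."""
    return 1.0 / (0.5 * (1.0 / s_l + 1.0 / s_n))

def zero_exact(s_l, s_n):
    """Upper-half-plane root of (1-s/sN)(1-s/sN*) + (1-s/sL)(1-s/sL*) = 0."""
    a = 1.0 / abs(s_l) ** 2 + 1.0 / abs(s_n) ** 2
    b = -2.0 * ((1.0 / s_l).real + (1.0 / s_n).real)
    roots = np.roots([a, b, 2.0])
    return roots[np.argmax(roots.imag)]

if __name__ == "__main__":
    s_l, s_n = pole(500.0, 80.0), pole(1000.0, 100.0)  # assumed oral and nasal poles
    for name, z in (("exact", zero_exact(s_l, s_n)), ("averaged", zero_from_average(s_l, s_n))):
        print(f"{name:9s} zero frequency: {z.imag / (2 * np.pi):.0f} Hz")
```

In this sketch both estimates fall between the two pole frequencies, though they need not coincide when the poles are far apart.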
The “Pole-Zero Pair”
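$$20\log_{10}|T(\omega)| = 20\log_{10}\left|\frac{1}{(1-s/s_{Ln})(1-s/s_{Ln}^*)}\right| + 20\log_{10}\left|\frac{2\,(1-s/s_{Zn})(1-s/s_{Zn}^*)}{(1-s/s_{Nn})(1-s/s_{Nn}^*)}\right|$$
= original vowel log spectrum + log spectrum of a pole-zero pair (the constant factor of 2 adds about 6 dB at all frequencies)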
Additive Terms in the Log Spectrum
Transfer Function of a Nasalized Vowel
Pole-Zero Pairs in the Spectrogram
[Figure labels: Nasal Pole, Zero, Oral Pole]
Summary
• Perturbation Theory:
  • Squeeze near a velocity peak: formant goes down
  • Squeeze near a pressure peak: formant goes up
• Formant Transitions
  • Labial closure: loci near 250, 1000, 2000 Hz
  • Alveolar closure: loci near 250, 1700, 3000 Hz
  • Velar closure: F2 and F3 come together (“velar pinch”)
• Vocal Tract Transfer Function
  • $T(s) = \prod_n s_n s_n^* / ((s-s_n)(s-s_n^*))$
  • $|T(\omega = 2\pi F_n)| = Q_n = F_n/B_n$
  • 3 dB bandwidth = $B_n$ Hertz
  • $T(0) = 1$
• Nasal Vowels:
  • Sum of two transfer functions gives a spectral zero between the oral and nasal poles
  • Pole-zero pair is a local perturbation of the spectrum