Speech Coding
Using LPC
What is Speech Coding
Speech coding is the process of transforming a
speech signal into a more compact form for:
Transmission
Available Bandwidth
Encryption
Uncompressed Speech signal
Analog speech is a bandpassed signal between
200 and 3400 Hz.
Uncompressed digital speech is a bit stream at
64 kbit/s (as in standard telephone PCM: 8 kHz
sampling at 8 bits per sample).
Transmission technology must
transmit the signals from point A to point B:
with minimum degradation
using minimum bandwidth
Speech coding
By coding we mean an efficient representation of
the signal
– COMPRESSION
The main approaches:
waveform coding
smart quantizers
transform coding
Parametric / hybrid coding
How each of these works:
Waveform coders:
try to find an efficient
representation of the
waveform, directly.
Transform coders:
try to find an efficient
representation in the
frequency domain.
FFT, etc.
Parametric coders:
try to find a small set of
parameters that are an
efficient representation of
the signal.
[Figure: exc. → H(ω) → speech]
Comparison of speech coders
LPC (Linear Predictive Coding)
LPC is a model for signal production: it is based
on the assumption that the speech signal is
produced by a very specific model.
Speech Production in Humans
The speech signal is
created by:
A pressure source (lungs),
exciting ...
A Filter (Vocal tract:
pharynx - mouth [soft
palate, tongue] - nasal
cavity)
For a DSP Engineer
An excitation source
A time-varying filter
[Figure: Excitation → filter H(t, ω) → speech]
The model and its representation
The LPC model looks at
speech as:
Excitation:
periodic (voiced) originating in the
larynx
noise (unvoiced) fricative, produced in
the mouth
An all-pole filter
representing the vocal
tract
[Figure: excitation → all-pole filter H(ω) → speech]
Block Diagram
Why the name
“Linear Predictive Coding”
It is assumed that each new sample is a
weighted linear combination of the previous p
samples, plus a scaled excitation term:
$$ s(n) = \sum_{i=1}^{p} a_i \, s(n-i) + G \, e(n) $$
Z-Plane Representation
In the z-plane we can write the model as a transfer
function:
$$ H(z) = \frac{G}{1 - \sum_{i=1}^{p} a_i z^{-i}} $$
• Clearly this transfer function has only poles which is why it represents an all pole filter.
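As an illustration of the all-pole model, here is a minimal Python sketch that runs an excitation sequence through the difference equation s(n) = sum of a_i s(n-i) + G e(n). The function name `lpc_synthesize` and the choice to treat samples before n = 0 as zero are assumptions made for this example, not part of the slides.

```python
def lpc_synthesize(excitation, a, gain=1.0):
    """Filter an excitation through H(z) = G / (1 - sum_i a_i z^-i).

    a[0] holds a_1, a[1] holds a_2, and so on.
    Samples before n = 0 are assumed to be zero (an assumption of this sketch).
    """
    p = len(a)
    out = []
    for n, e_n in enumerate(excitation):
        # s(n) = G * e(n) + sum_{i=1..p} a_i * s(n - i)
        acc = gain * e_n
        for i in range(1, p + 1):
            if n - i >= 0:
                acc += a[i - 1] * out[n - i]
        out.append(acc)
    return out


# Impulse response of a one-pole filter with a_1 = 0.5
impulse = [1.0, 0.0, 0.0, 0.0]
response = lpc_synthesize(impulse, [0.5])
print(response)  # [1.0, 0.5, 0.25, 0.125]
```

The output depends only on the excitation and on past outputs, which is the recursive (pole-only) structure the transfer function describes.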
Mathematical analysis
Reminder: our problem is to find the LPC
parameters, for a given speech signal. This is
called the Inverse Problem.
How do we find the set of parameters that
gives the best match to the signal?
What are these Parameters
The Coefficients of the All Pole Filter
Pitch of the speech
How do we find the Coefficients:
least squares
Formulation:
Given a signal s(n);
Defining an error as:
$$ e(n) = s(n) - \sum_{i=1}^{p} a_i \, s(n-i) $$
Find the set of a_i that will minimize the mean
square error:
$$ E = \sum_{n} e^2(n) $$
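To make the quantity being minimized concrete, the following Python sketch evaluates the squared-error sum E for a given signal and a candidate coefficient set. The function name `prediction_error_energy` is hypothetical, and samples before the start of the signal are assumed to be zero.

```python
def prediction_error_energy(signal, a):
    """Return E = sum_n e(n)^2 with e(n) = s(n) - sum_i a_i * s(n - i).

    a[0] holds a_1, a[1] holds a_2, and so on.
    Samples before n = 0 are assumed to be zero (an assumption of this sketch).
    """
    energy = 0.0
    for n, s_n in enumerate(signal):
        predicted = sum(
            a[i - 1] * signal[n - i] for i in range(1, len(a) + 1) if n - i >= 0
        )
        energy += (s_n - predicted) ** 2
    return energy


signal = [1.0, 0.5, 0.25, 0.125]
print(prediction_error_energy(signal, [0.5]))  # 1.0
print(prediction_error_energy(signal, [0.0]))  # 1.328125
```

With a_1 = 0.5 every sample after the first is predicted exactly, so only the first sample contributes to E; with a_1 = 0 nothing is predicted and E is the full signal energy.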
Solution:
Simply equate the derivative of E to zero:
$$ \frac{\partial E}{\partial a_i} = 0, \quad i = 1 \ldots p $$
• Which gives us the Normal Equations:
$$ \sum_{k=1}^{p} a_k \sum_{n} s(n-k)\, s(n-i) = \sum_{n} s(n)\, s(n-i), \quad i = 1 \ldots p $$
• These are no more than p linear equations in
p unknowns...
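The step from the zero-derivative condition to the normal equations can be written out explicitly. The error e(n) depends on a_i only through the term that multiplies s(n-i), so:

$$ \frac{\partial E}{\partial a_i} = \sum_{n} 2\, e(n)\, \frac{\partial e(n)}{\partial a_i} = -2 \sum_{n} e(n)\, s(n-i) = 0 $$

$$ \sum_{n} \Big( s(n) - \sum_{k=1}^{p} a_k \, s(n-k) \Big)\, s(n-i) = 0, \quad i = 1 \ldots p $$

Moving the double sum to the other side gives the normal equations. Equivalently, the optimal prediction error is uncorrelated with each of the p past samples used for prediction.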
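Or in matrix form:
$$
\begin{pmatrix}
\sum_{n} s(n-1)\,s(n-1) & \sum_{n} s(n-1)\,s(n-2) & \cdots & \sum_{n} s(n-1)\,s(n-p) \\
\sum_{n} s(n-2)\,s(n-1) & \sum_{n} s(n-2)\,s(n-2) & \cdots & \sum_{n} s(n-2)\,s(n-p) \\
\vdots & \vdots & \ddots & \vdots \\
\sum_{n} s(n-p)\,s(n-1) & \sum_{n} s(n-p)\,s(n-2) & \cdots & \sum_{n} s(n-p)\,s(n-p)
\end{pmatrix}
\begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_p \end{pmatrix}
=
\begin{pmatrix}
\sum_{n} s(n-1)\,s(n) \\
\sum_{n} s(n-2)\,s(n) \\
\vdots \\
\sum_{n} s(n-p)\,s(n)
\end{pmatrix}
$$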
What is each element of the form
$$ \sum_{n} s(n-k)\, s(n-i) \; ? $$
A correlation; in other words:
take the signal, multiply it by a shifted version, and sum.
Since our signal is long and time-varying, we did it on
short windows
Two variants:
autocorrelation method
covariance method
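A minimal Python sketch of the autocorrelation method follows. Under this method the samples outside the analysis window are taken as zero, so each matrix entry depends only on the lag |i - k| and the whole system can be built from the values R(0) ... R(p). The function names `autocorrelation` and `toeplitz_system` are hypothetical names chosen for this example.

```python
def autocorrelation(window, max_lag):
    """Return [R(0), ..., R(max_lag)] with R(k) = sum_n window[n] * window[n - k].

    Samples outside the window are treated as zero (autocorrelation method).
    """
    n_samples = len(window)
    return [
        sum(window[n] * window[n - k] for n in range(k, n_samples))
        for k in range(max_lag + 1)
    ]


def toeplitz_system(r, p):
    """Build the p x p normal-equation matrix and right-hand side from R(0..p)."""
    matrix = [[r[abs(i - k)] for k in range(p)] for i in range(p)]
    rhs = [r[i + 1] for i in range(p)]
    return matrix, rhs


frame = [1.0, 2.0, 3.0, 4.0]
r = autocorrelation(frame, 2)
print(r)  # [30.0, 20.0, 11.0]

matrix, rhs = toeplitz_system(r, 2)
print(matrix)  # [[30.0, 20.0], [20.0, 30.0]]
print(rhs)     # [20.0, 11.0]
```

The covariance method differs in that it sums over a fixed range of n without assuming zeros outside the window, so its matrix is symmetric but in general not constant along the diagonals.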
Solving the Matrix
The coefficients a(i) were found using the
Levinson-Durbin recursion
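For reference, here is a minimal Python sketch of the Levinson-Durbin recursion, assuming the autocorrelation values R(0) ... R(p) of a window are already available and that the predictor uses the sign convention s(n) ≈ sum of a_i s(n-i). The function name and return format are choices made for this example and are not taken from the slides.

```python
def levinson_durbin(r, order):
    """Solve sum_k a_k * R(|i - k|) = R(i), i = 1..order.

    r is [R(0), ..., R(order)]. Returns (a, error), where a[0] holds a_1 and
    error is the remaining prediction error energy.
    """
    a = [0.0] * (order + 1)  # a[i] holds a_i; a[0] is unused
    error = r[0]
    for m in range(1, order + 1):
        if error <= 0.0:
            raise ValueError("prediction error vanished; cannot continue")
        # Reflection coefficient for stage m
        acc = r[m] - sum(a[i] * r[m - i] for i in range(1, m))
        k = acc / error
        # Update the lower-order coefficients, then append the new one
        new_a = a[:]
        new_a[m] = k
        for i in range(1, m):
            new_a[i] = a[i] - k * a[m - i]
        a = new_a
        error *= 1.0 - k * k
    return a[1:], error


# Autocorrelation of a first-order process with a_1 = 0.5: R(k) = 0.5 ** k
coeffs, err = levinson_durbin([1.0, 0.5, 0.25], 2)
print(coeffs, err)  # [0.5, 0.0] 0.75
```

The recursion exploits the constant-diagonal structure of the autocorrelation matrix, so it needs on the order of p² operations rather than the p³ of general elimination.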
Second Parameter
Pitch was found by computing the
correlation of each signal window with
itself (its autocorrelation)
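One simple way to turn the window autocorrelation into a pitch estimate is to pick the lag with the largest value inside a plausible range. The Python sketch below does that on a synthetic sine wave. The 8000 Hz sampling rate, the lag search range and the function name `estimate_pitch_period` are assumptions made for this illustration; the slides do not specify them.

```python
import math


def estimate_pitch_period(window, min_lag, max_lag):
    """Return the lag in [min_lag, max_lag] with the largest autocorrelation."""
    best_lag, best_value = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        value = sum(window[n] * window[n - lag] for n in range(lag, len(window)))
        if value > best_value:
            best_lag, best_value = lag, value
    return best_lag


sample_rate = 8000  # Hz, assumed for this example
true_period = 80    # samples, i.e. a 100 Hz tone at the assumed rate
window = [math.sin(2 * math.pi * n / true_period) for n in range(400)]

period = estimate_pitch_period(window, 20, 160)
print(period, sample_rate / period)  # 80 100.0
```

Real speech is noisier than a pure tone, so a practical estimator would usually add a voiced/unvoiced decision and some smoothing across frames; this sketch only shows the peak-picking idea.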
Then these parameters were transmitted
Parameter                 Bits
Predictor coefficients    18 * 8 = 144
Gain                      5
Pitch period              6
Voiced/unvoiced switch    1
Total                     156
Overall bit rate          50 * 156 = 7800 bits / second
Bit rate for plain LPC vocoder
Parameter                 Bits
Predictor coefficients    18 * 8 = 144
Gain                      5
DCT coefficients          40 * 4 = 160
Total                     309
Overall bit rate          50 * 309 = 15450 bits / second
Bit rate for voice-excited LPC vocoder with DCT
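The bit-rate arithmetic for both vocoders can be checked with a few lines of Python. The sketch assumes that each total is the number of bits sent per analysis frame and that the factor 50 is the number of frames per second; the dictionary keys are labels chosen for this example.

```python
def bits_per_frame(allocation):
    """Sum the bits assigned to each transmitted parameter."""
    return sum(allocation.values())


frames_per_second = 50  # assumed meaning of the factor 50 in the tables

plain_lpc = {
    "predictor coefficients": 18 * 8,
    "gain": 5,
    "pitch period": 6,
    "voiced/unvoiced switch": 1,
}
voice_excited_lpc = {
    "predictor coefficients": 18 * 8,
    "gain": 5,
    "DCT coefficients": 40 * 4,
}

for name, allocation in [("plain LPC", plain_lpc),
                         ("voice-excited LPC with DCT", voice_excited_lpc)]:
    total = bits_per_frame(allocation)
    print(name, total, total * frames_per_second)
# plain LPC 156 7800
# voice-excited LPC with DCT 309 15450
```

The voice-excited variant roughly doubles the bit rate because the 160 bits of DCT coefficients replace the 7 bits used for the pitch period and the voiced/unvoiced switch.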
Conclusion
Speech produced by the LPC method is
not identical to the original, but it
remains intelligible
LPC can be used in speech recognition
systems
LPC was widely used in military communications
because of its low transmission bit rate
There are many variants of the basic
scheme: LPC-10, CELP, MELP, RELP,
VSELP, ASELP, LD-CELP...