Scalable Video Coding Update


Linear Predictive Coding
for Speech Compression
Dev Ghosh
ECE 463
9 March 2006
Overview





General Model for Speech Synthesis
Channel Vocoder
Linear Predictive Coder (LPC-10)
Code Excited Linear Prediction (CELP)
Novel Application

Sub-band adaptive filtering based on
cochlear model
Model for Speech Synthesis

Speech produced by forcing air through vocal
cords, larynx, pharynx, mouth and nose
Excitation source → Vocal tract filter → Speech
At transmitter speech is divided into
segments

Each segment analyzed to determine excitation
signal and parameters of vocal tract filter
Channel Vocoder - analysis



Each segment of input speech analyzed by a
bank of (bandpass) analysis filters
Energy at output of each filter is estimated 50
times a second and transmitted to receiver
Decision made whether segment is
voiced (/a/, /e/, /o/) or
unvoiced (/s/, /f/)
Estimate of pitch period (period of
fundamental harmonic) is determined
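The analysis step can be sketched roughly as follows. This is a minimal illustration, not the vocoder's actual filter bank: it swaps the bandpass analysis filters for a naive DFT whose bins are grouped into bands, and the function name, band edges, and 160-sample frame (20 ms at 8000 samples/s, i.e. 50 frames a second) are assumptions chosen for the example.

```python
import cmath
import math

def band_energies(frame, sample_rate, band_edges_hz):
    """Energy of one frame in each band [lo, hi), using a naive DFT.

    Stand-in for a bank of bandpass analysis filters (an assumption
    made to keep the sketch short).
    """
    n = len(frame)
    spectrum = [
        sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n // 2 + 1)
    ]
    energies = []
    for lo, hi in zip(band_edges_hz, band_edges_hz[1:]):
        energies.append(sum(
            abs(spectrum[k]) ** 2
            for k in range(len(spectrum))
            if lo <= k * sample_rate / n < hi
        ))
    return energies

# One 20 ms frame containing a 500 Hz tone, analyzed in three bands.
frame = [math.sin(2 * math.pi * 500 * t / 8000) for t in range(160)]
energies = band_energies(frame, 8000, [0, 1000, 2000, 4000])
```

Calling this once per frame yields the per-band energy estimates that would be sent to the receiver alongside the voicing decision and pitch period.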
Voiced vs. Unvoiced Speech
Channel vocoder - synthesis

Vocal tract filter implemented by bank of
(bandpass) synthesis filters





For voiced segments, input is a periodic pulse generator
whose period is determined by the pitch estimate
For unvoiced segments, input is a pseudonoise source
Input to each synthesis filter is scaled by the energy
estimate for that band
First approach to speech compression
Linear Predictive Coder




Models vocal tract as a single linear
filter
yn = ∑aiyn-i+Gn
Output: yn, Input: n, Gain: G
Input is random noise (unvoiced) or
periodic pulse (voiced)
LPC-10 is a standard (2.4 kbit/s, 8000
samples/s)
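The synthesis model on this slide can be sketched as a short recursion. This is an illustrative sketch only: the function name, the unit-amplitude pulse train, and the Gaussian noise source are assumptions, not details of the LPC-10 standard.

```python
import random

def lpc_synthesize(a, gain, n_samples, pitch_period=None, seed=0):
    """Run y[n] = sum_i a[i-1] * y[n-i] + gain * eps[n] for one segment.

    pitch_period=None selects the unvoiced (random noise) excitation;
    an integer selects a voiced pulse train with that period in samples.
    """
    rng = random.Random(seed)
    y = []
    for n in range(n_samples):
        if pitch_period is None:
            eps = rng.gauss(0.0, 1.0)                    # unvoiced: noise
        else:
            eps = 1.0 if n % pitch_period == 0 else 0.0  # voiced: pulses
        past = sum(a[i] * y[n - 1 - i]
                   for i in range(len(a)) if n - 1 - i >= 0)
        y.append(past + gain * eps)
    return y

# First-order example: one pulse every 4 samples, decaying in between.
voiced = lpc_synthesize([0.5], gain=1.0, n_samples=6, pitch_period=4)
unvoiced = lpc_synthesize([0.5], gain=1.0, n_samples=6)
```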
LPC - Voiced/Unvoiced Decision




Voiced speech has more energy and lower
frequency than unvoiced
Speech segment is lowpass filtered; energy at
output relative to background noise is used to
determine voicing
Zero-crossings counted to determine
frequency
Continuity criterion: voicing decision of
neighboring frames taken into account
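A rough sketch of these three cues follows. The thresholds (`energy_ratio`, `max_crossing_rate`) and the neighbor-flipping rule are hypothetical values chosen for illustration, and the lowpass filtering step is omitted for brevity; LPC-10's actual decision logic is more elaborate.

```python
import math

def frame_energy(frame):
    return sum(x * x for x in frame) / len(frame)

def zero_crossings(frame):
    return sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))

def is_voiced(frame, noise_energy, energy_ratio=4.0, max_crossing_rate=0.25):
    """Hypothetical rule: voiced if energetic relative to the background
    noise and dominated by low frequencies (few zero crossings)."""
    energetic = frame_energy(frame) > energy_ratio * noise_energy
    low_freq = zero_crossings(frame) / len(frame) < max_crossing_rate
    return energetic and low_freq

def smooth_decisions(flags):
    """Continuity sketch: flip an isolated frame that disagrees with
    both of its neighbors."""
    out = list(flags)
    for i in range(1, len(flags) - 1):
        if flags[i - 1] == flags[i + 1] != flags[i]:
            out[i] = flags[i - 1]
    return out

vowel_like = [math.sin(2 * math.pi * 100 * n / 8000) for n in range(180)]
hiss_like = [0.05 * (-1) ** n for n in range(180)]
decisions = [is_voiced(f, noise_energy=0.01) for f in (vowel_like, hiss_like)]
```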
LPC - Estimating Pitch Period


Extracting pitch from short noisy
segment is difficult
One approach is to maximize
autocorrelation


Periodicity isn’t strong enough
Threshold can’t be used because maximum
value not known in advance
LPC - Estimating Pitch Period




LPC-10 uses average magnitude difference
function (AMDF)
$\mathrm{AMDF}(P) = \frac{1}{N} \sum_{i} |y_i - y_{i-P}|$
If $\{y_n\}$ is periodic with period $P_0$, samples $P_0$
apart will have values close to each other and
AMDF will have a minimum at $P_0$
AMDF is periodic for voiced and roughly flat
for unvoiced
AMDF is at a minimum when P is the pitch period, and
spurious minima in unvoiced segments are
shallow
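The AMDF search can be sketched in a few lines. The lag range and the test signal below are assumptions for illustration; the sketch simply picks the lag with the smallest AMDF and does not include the checks a real coder would use to reject shallow minima.

```python
import math

def amdf(y, lag):
    """Average magnitude difference for one candidate period (in samples)."""
    n_terms = len(y) - lag
    return sum(abs(y[i] - y[i - lag]) for i in range(lag, len(y))) / n_terms

def estimate_pitch(y, min_lag, max_lag):
    """Return the lag in [min_lag, max_lag] with the smallest AMDF."""
    return min(range(min_lag, max_lag + 1), key=lambda p: amdf(y, p))

# Synthetic "voiced" frame: fundamental with period 40 plus one harmonic.
period = 40
frame = [math.sin(2 * math.pi * n / period)
         + 0.5 * math.sin(4 * math.pi * n / period) for n in range(180)]
pitch = estimate_pitch(frame, 20, 60)
```

At 8000 samples/s, a period of 40 samples corresponds to a 200 Hz fundamental.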
LPC - Obtaining Vocal Tract Filter



At transmitter, we want filter coeffs that
best match the segment in a mean
squared error sense
$e_n^2 = \left(y_n - \sum_{i=1}^{M} a_i y_{n-i} - G\epsilon_n\right)^2$
Autocorrelation approach assumes $\{y_n\}$
is stationary
$A = R^{-1}P$
Recursive solution uses Levinson-Durbin
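A minimal sketch of the autocorrelation approach with the Levinson-Durbin recursion is shown below. The function names are assumptions, and the sign convention for the reflection coefficients is one of several in use; the recursion solves the same normal equations as inverting R directly, without forming the matrix.

```python
def autocorrelation(y, max_lag):
    """r[k] = sum_n y[n] * y[n-k], with samples outside the segment
    treated as zero."""
    return [sum(y[n] * y[n - k] for n in range(k, len(y)))
            for k in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Solve the normal equations for the predictor coefficients.

    Returns (a, k, err): coefficients a[0..order-1] such that
    y[n] ~ sum_i a[i-1] * y[n-i], the reflection coefficients k,
    and the final prediction error energy.
    """
    a = []
    reflection = []
    err = r[0]
    for m in range(order):
        acc = r[m + 1] - sum(a[i] * r[m - i] for i in range(m))
        k = acc / err
        # Update lower-order coefficients, then append the new one.
        a = [a[i] - k * a[m - 1 - i] for i in range(m)] + [k]
        reflection.append(k)
        err *= (1.0 - k * k)
    return a, reflection, err

# Autocorrelation of an ideal first-order process with a_1 = 0.5.
a, reflection, err = levinson_durbin([1.0, 0.5, 0.25], order=2)
```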
LPC - Obtaining the Vocal Tract Filter

Covariance approach discards
stationarity assumption (not valid for
speech signals)
$c_{ij} = E[y_{n-i} y_{n-j}]$
yields
CA = S
LPC - Obtaining the Vocal Tract Filter




$c_{ij}$ are estimated as
$c_{ij} = \sum_{n} y_{n-i} y_{n-j}$
No longer assume values of $y_n$ outside
of segment are zero
Cholesky decomposition required
Reflection coeffs used to update voicing
decision
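The covariance approach can be sketched as below, solving CA = S with a hand-written Cholesky factorization. The function name and the summation range (only samples with a full history inside the segment) are assumptions for illustration, and the sketch assumes C is positive definite; it does not show how reflection coefficients feed back into the voicing decision.

```python
import math

def covariance_lpc(y, order):
    """Solve C a = s, where c[i][j] = sum_n y[n-i] * y[n-j] over
    n = order .. len(y)-1 and s[i] = c[i][0]."""
    def c(i, j):
        return sum(y[n - i] * y[n - j] for n in range(order, len(y)))

    C = [[c(i, j) for j in range(1, order + 1)] for i in range(1, order + 1)]
    s = [c(i, 0) for i in range(1, order + 1)]

    # Cholesky factorization C = L L^T.
    L = [[0.0] * order for _ in range(order)]
    for i in range(order):
        for j in range(i + 1):
            acc = C[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(acc) if i == j else acc / L[j][j]

    # Forward substitution L z = s, then back substitution L^T a = z.
    z = [0.0] * order
    for i in range(order):
        z[i] = (s[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    a = [0.0] * order
    for i in reversed(range(order)):
        a[i] = (z[i] - sum(L[k][i] * a[k] for k in range(i + 1, order))) / L[i][i]
    return a

# Noise-free second-order test signal: y[n] = 1.2 y[n-1] - 0.5 y[n-2].
signal = [1.0, 0.5]
for _ in range(28):
    signal.append(1.2 * signal[-1] - 0.5 * signal[-2])
coeffs = covariance_lpc(signal, order=2)
```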
LPC - Transmitting Parameters


Tenth order filter used for voiced
speech and fourth order for unvoiced
Vocal tract filter is sensitive to errors in
reflection coeffs with magnitude close to one
$g_i = (1 + k_i)/(1 - k_i)$
are quantized and sent instead of $k_i$
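A small sketch of the mapping and its inverse shows why it helps: values of k that are close together near one are spread far apart in g, so a quantization error in g corresponds to a much smaller error in k. The function names are assumptions; practical coders often quantize the logarithm of this ratio rather than g itself.

```python
def reflection_to_g(k):
    """g = (1 + k) / (1 - k); requires |k| < 1."""
    return (1.0 + k) / (1.0 - k)

def g_to_reflection(g):
    """Inverse mapping used at the receiver: k = (g - 1) / (g + 1)."""
    return (g - 1.0) / (g + 1.0)

# Reflection coefficients 0.98 and 0.99 differ by 0.01, but their
# g values differ by roughly a factor of two.
for k in (0.0, 0.5, 0.98, 0.99):
    print(k, reflection_to_g(k))
```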
Code Excited Linear Prediction



Single pulse per pitch period leads to
buzzy twang
Variety of excitation signals is allowed
For each segment encoder finds
excitation vector that generates
synthesized speech that best matches
speech being coded
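The search described here is often called analysis-by-synthesis and can be sketched as follows. This is a simplified illustration under stated assumptions: the tiny fixed codebook, the function names, and the plain squared-error criterion are made up for the example, and real CELP coders add perceptual weighting and an adaptive (pitch) codebook that are omitted here.

```python
def synthesize(a, excitation):
    """Pass an excitation vector through the all-pole vocal tract filter."""
    y = []
    for n, e in enumerate(excitation):
        past = sum(a[i] * y[n - 1 - i]
                   for i in range(len(a)) if n - 1 - i >= 0)
        y.append(e + past)
    return y

def search_codebook(target, codebook, a):
    """Return (index, gain, error) of the excitation vector whose
    synthesized output best matches the target segment."""
    best = None
    for index, code in enumerate(codebook):
        synth = synthesize(a, code)
        energy = sum(s * s for s in synth)
        if energy == 0.0:
            continue
        # Optimal gain for this vector in the least-squares sense.
        gain = sum(t * s for t, s in zip(target, synth)) / energy
        error = sum((t - gain * s) ** 2 for t, s in zip(target, synth))
        if best is None or error < best[2]:
            best = (index, gain, error)
    return best

codebook = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [1.0, -1.0, 1.0, -1.0]]
lpc = [0.5]
target = [2.0 * s for s in synthesize(lpc, codebook[2])]
index, gain, error = search_codebook(target, codebook, lpc)
```

Only the codebook index and gain need to be transmitted, since the decoder holds the same codebook.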
Sub-band adaptive filtering


Multi-channel speech enhancement system
The greater the number of sub-bands used, the faster
the convergence of the overall system
Cochlear Modelling

Sub-band filters are distributed
logarithmically in frequency to
approximate distribution of filters in
cochlea
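As a rough illustration of logarithmic spacing, the sketch below places band center frequencies so that each is a constant ratio above the previous one. The frequency range, band count, and function name are assumptions; an actual cochlear model may use a different scale.

```python
def log_spaced_centers(f_low, f_high, n_bands):
    """Center frequencies with a constant ratio between neighbors."""
    ratio = (f_high / f_low) ** (1.0 / (n_bands - 1))
    return [f_low * ratio ** i for i in range(n_bands)]

# Six bands from 100 Hz to 3200 Hz: each center is about twice the last.
centers = log_spaced_centers(100.0, 3200.0, 6)
```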
Adaptive Noise Cancellation




LMS algorithm is used to model differential
transfer function between noise signals in a
number of sub-bands
Lower power and shorter filters used in each
sub-band
Convergence is equal across all bands if
power is distributed equally and filter lengths
are the same
Convergence dominated by sub-band with
greatest power
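A minimal sketch of an LMS noise canceller for a single band is given below; in the sub-band system one such filter would run per band. The step size `mu`, the tap count, and the two-tap test path are assumptions for illustration, and the sub-band splitting and recombination are not shown.

```python
import random

def lms_cancel(primary, reference, n_taps, mu):
    """Adaptive noise canceller for one band.

    primary:   speech plus noise
    reference: correlated noise-only signal
    Returns (enhanced signal, final filter weights).
    """
    w = [0.0] * n_taps
    enhanced = []
    for n in range(len(primary)):
        x = [reference[n - k] if n - k >= 0 else 0.0 for k in range(n_taps)]
        noise_estimate = sum(wk * xk for wk, xk in zip(w, x))
        e = primary[n] - noise_estimate          # error = enhanced output
        w = [wk + 2.0 * mu * e * xk for wk, xk in zip(w, x)]
        enhanced.append(e)
    return enhanced, w

# Noise-only test: the primary input is the reference passed through a
# hypothetical two-tap path, which the filter should learn.
rng = random.Random(1)
reference = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
primary = [0.8 * reference[n] - (0.3 * reference[n - 1] if n > 0 else 0.0)
           for n in range(2000)]
enhanced, weights = lms_cancel(primary, reference, n_taps=2, mu=0.05)
```

Because the adaptation speed depends on the product of step size and input power, a band with more power or a longer filter behaves differently from the others, which is the convergence point made on this slide.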