ECE 8443 – Pattern Recognition
ECE 8423 – Adaptive Signal Processing
LECTURE 04: LINEAR PREDICTION
• Objectives:
The Linear Prediction Model
The Autocorrelation Method
Levinson and Durbin Recursions
Spectral Modeling
Inverse Filtering and Deconvolution
• Resources:
ECE 4773: Intro to DSP
ECE 8463: Fund. Of Speech
WIKI: Minimum Phase
Markel and Gray: Linear Prediction
Deller: DT Processing of Speech
AJR: LP Modeling of Speech
MC: MATLAB Demo
• URL: .../publications/courses/ece_8423/lectures/current/lecture_04.ppt
• MP3: .../publications/courses/ece_8423/lectures/current/lecture_04.mp3
The Linear Prediction (LP) Model
• Consider a pth-order linear prediction model:

  \hat{x}(n) = \sum_{i=1}^{p} a_i x(n - n_0 - i)

  Without loss of generality, assume n_0 = 0.
• The prediction error is defined as:

  e(n) = x(n) - \hat{x}(n) = x(n) - \sum_{i=1}^{p} a_i x(n - i)
• We can define an objective function:

  J = E\{e^2(n)\} = E\{[x(n) - \hat{x}(n)]^2\}

    = E\left\{\left[x(n) - \sum_{i=1}^{p} a_i x(n-i)\right]^2\right\}

    = E\{x^2(n)\} - E\left\{2 x(n) \sum_{i=1}^{p} a_i x(n-i)\right\} + E\left\{\left[\sum_{i=1}^{p} a_i x(n-i)\right]^2\right\}

    = E\{x^2(n)\} - 2 \sum_{i=1}^{p} a_i E\{x(n) x(n-i)\} + E\left\{\left[\sum_{i=1}^{p} a_i x(n-i)\right]^2\right\}
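As a sanity check on the expansion of J (not part of the original slides), the direct and expanded forms can be compared numerically; the white-noise signal and the predictor coefficients below are arbitrary assumptions:

```python
import numpy as np

# Hypothetical check: E{(x(n) - sum_i a_i x(n-i))^2} equals
# E{x^2(n)} - 2 sum_i a_i E{x(n)x(n-i)} + E{(sum_i a_i x(n-i))^2}.
rng = np.random.default_rng(0)
x = rng.standard_normal(100000)      # stand-in signal (assumed, white noise)
a = np.array([0.5, -0.25])           # arbitrary predictor coefficients, p = 2
p = len(a)

# Sample-average estimates of the expectations (circular shifts for simplicity)
xhat = sum(a[i] * np.roll(x, i + 1) for i in range(p))
J_direct = np.mean((x - xhat) ** 2)

J_expanded = (np.mean(x ** 2)
              - 2 * sum(a[i] * np.mean(x * np.roll(x, i + 1)) for i in range(p))
              + np.mean(xhat ** 2))
print(J_direct, J_expanded)
```

The two estimates agree to floating-point precision, since the expansion is an algebraic identity.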
ECE 8423: Lecture 04, Slide 1
[Figure: block diagram of the predictor. x(n) feeds the predictor {a_i}; its output \hat{x}(n) is subtracted from x(n) to produce e(n).]
Minimization of the Objective Function
• Differentiate w.r.t. a_l:

  \frac{\partial J}{\partial a_l} = \frac{\partial}{\partial a_l}\left[E\{x^2(n)\} - 2\sum_{i=1}^{p} a_i E\{x(n) x(n-i)\} + E\left\{\left(\sum_{i=1}^{p} a_i x(n-i)\right)^2\right\}\right]

    = -2 E\{x(n) x(n-l)\} + 2 E\left\{\sum_{i=1}^{p} a_i x(n-i)\, x(n-l)\right\} = 0
• Rearranging terms:

  E\left\{\sum_{i=1}^{p} a_i x(n-i)\, x(n-l)\right\} = E\{x(n)\, x(n-l)\}
• Interchanging the order of summation and expectation on the left (why?):

  \sum_{i=1}^{p} a_i E\{x(n-i)\, x(n-l)\} = E\{x(n)\, x(n-l)\}

• Define a covariance function:

  c(i, j) = E\{x(n-i)\, x(n-j)\}
The Yule-Walker Equations (aka Normal Equations)
• We can rewrite our prediction equation as:

  \sum_{i=1}^{p} a_i E\{x(n-i)\, x(n-l)\} = E\{x(n)\, x(n-l)\} \quad \Rightarrow \quad \sum_{i=1}^{p} a_i c(i, l) = c(0, l), \quad l = 1, \ldots, p
• This is known as the Yule-Walker equation. Its solution produces what we
refer to as the Covariance Method for linear prediction.
  a_1 c(1,1) + a_2 c(2,1) + \cdots + a_p c(p,1) = c(0,1)
  a_1 c(1,2) + a_2 c(2,2) + \cdots + a_p c(p,2) = c(0,2)
  \vdots
  a_1 c(1,p) + a_2 c(2,p) + \cdots + a_p c(p,p) = c(0,p)
• We can write this set of p equations in matrix form, Ca = c, and can easily solve for the prediction coefficients, a = C^{-1}c, where:

  a = [a_1, a_2, \ldots, a_p]^T, \quad c = [c(0,1), c(0,2), \ldots, c(0,p)]^T

  C = \begin{bmatrix}
        c(1,1) & c(1,2) & \cdots & c(1,p) \\
        c(2,1) & c(2,2) & \cdots & c(2,p) \\
        \vdots & \vdots & \ddots & \vdots \\
        c(p,1) & c(p,2) & \cdots & c(p,p)
      \end{bmatrix}
• Note that the covariance matrix is symmetric: c(1,2) = c(2,1).
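A minimal sketch of the Covariance Method in code (the test signal and order are assumptions, and the expectations are replaced by sample averages over the analysis window):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(512)   # assumed test signal
p = 2                          # assumed model order

# c(i, j) = E{x(n-i) x(n-j)}, estimated by a sample average over n = p..N-1
def c(i, j, x=x, p=p):
    n = np.arange(p, len(x))
    return np.mean(x[n - i] * x[n - j])

C = np.array([[c(i, j) for j in range(1, p + 1)] for i in range(1, p + 1)])
cvec = np.array([c(0, j) for j in range(1, p + 1)])

a = np.linalg.solve(C, cvec)   # a = C^{-1} c
print(a)
```

Note that C comes out symmetric by construction, matching the symmetry property noted on the slide.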
Autocorrelation Method
• C is a covariance matrix, which means it has some special properties:
 Symmetric: under what conditions does its inverse exist?
 Fast Inversion: we can factor this matrix into upper and lower triangular
matrices and derive a fast algorithm for inversion known as the Cholesky
decomposition.
• If we assume stationary inputs, we can convert covariances to correlations, c(i, l) = r(|i - l|), and the system becomes Ra = r, where:

  a = [a_1, a_2, \ldots, a_p]^T, \quad r = [r(1), r(2), \ldots, r(p)]^T

  R = \begin{bmatrix}
        r(0)   & r(1)   & \cdots & r(p-1) \\
        r(1)   & r(0)   & \cdots & r(p-2) \\
        \vdots & \vdots & \ddots & \vdots \\
        r(p-1) & r(p-2) & \cdots & r(0)
      \end{bmatrix}
• This is known as the Autocorrelation Method. This matrix is symmetric, but is
also Toeplitz, which means the inverse can be performed efficiently using an
iterative algorithm we will introduce shortly.
• Note that the Covariance Method requires p(p+1)/2 unique values for the symmetric matrix, and p values for the associated vector. A fast algorithm, known as the Factored Covariance Algorithm, exists to compute C.
• The Autocorrelation method requires p+1 values to produce p LP coefficients.
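A sketch of the Autocorrelation Method under assumed conditions (the test signal is a synthetic AR(2) process, so the true coefficients are known):

```python
import numpy as np

rng = np.random.default_rng(2)
w = rng.standard_normal(4096)
x = np.zeros_like(w)
for n in range(2, len(w)):     # synthesize x(n) = 0.75 x(n-1) - 0.5 x(n-2) + w(n)
    x[n] = 0.75 * x[n - 1] - 0.5 * x[n - 2] + w[n]

p = 2
N = len(x)
# Only the p+1 autocorrelation values r(0), ..., r(p) are needed
r = np.array([np.dot(x[:N - k], x[k:]) / N for k in range(p + 1)])

# Symmetric Toeplitz system R a = r
R = np.array([[r[abs(i - j)] for j in range(p)] for i in range(p)])
a = np.linalg.solve(R, r[1:])
print(a)   # close to the true coefficients (0.75, -0.5)
```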
Linear Prediction Error
• Recall our expression for J, the prediction error energy:

  J = E\{e^2(n)\} = E\{[x(n) - \hat{x}(n)]^2\} = E\left\{\left[x(n) - \sum_{i=1}^{p} a_i x(n-i)\right]^2\right\}
• We can substitute our expression for the predictor coefficients and show:

  J = r(0) - \sum_{i=1}^{p} a_i r(i) \quad \text{(Autocorrelation Method)}

  J = c(0,0) - \sum_{i=1}^{p} a_i c(0,i) \quad \text{(Covariance Method)}

• These relations are significant because they show the error obeys the same linear prediction equation that we applied to the signal. This result has two interesting implications:
 Missing values of the autocorrelation function can be calculated using this
relation under certain assumptions (e.g., maximum entropy).
 The autocorrelation function shares many properties with the linear
prediction model (e.g., minimum phase). In fact, the two representations are
interchangeable.
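The autocorrelation-method error energy J = r(0) − Σ a_i r(i) can be evaluated directly; a small sketch with an assumed white-noise test signal:

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.standard_normal(2048)   # assumed test signal
p = 3
N = len(x)

r = np.array([np.dot(x[:N - k], x[k:]) / N for k in range(p + 1)])
R = np.array([[r[abs(i - j)] for j in range(p)] for i in range(p)])
a = np.linalg.solve(R, r[1:])

J = r[0] - np.dot(a, r[1:])     # minimized prediction error energy
print(J, r[0])
```

Because R is positive definite here, J is nonnegative and cannot exceed r(0); for white noise it stays close to r(0), since white noise is essentially unpredictable.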
Linear Filter Interpretation of Linear Prediction
• Recall our expression for the error signal:

  e(n) = x(n) - \hat{x}(n) = x(n) - \sum_{i=1}^{p} a_i x(n-i)
• We can rewrite this using the z-Transform:

  E(z) = Z\{e(n)\} = Z\left\{x(n) - \sum_{i=1}^{p} a_i x(n-i)\right\} = X(z) - \sum_{i=1}^{p} a_i z^{-i} X(z) = X(z)\left[1 - \sum_{i=1}^{p} a_i z^{-i}\right]
• This implies we can view the computation of the error as a filtering process:

  E(z) = X(z) A(z), \quad \text{where} \quad A(z) = 1 - \sum_{i=1}^{p} a_i z^{-i}

  [Figure: x(n) passes through H(z) = A(z) to produce e(n).]
• This, of course, implies we can invert the process and generate the original signal from the error signal:

  [Figure: e(n) passes through H(z) = 1/A(z) to produce x(n).]
• This rather remarkable view of the process exposes some important
questions about the nature of this filter:
 A(z) is an FIR filter. Under what conditions is it minimum phase?
 Under what conditions is the inverse, 1/A(z), stable?
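These two filtering steps can be sketched directly from the difference equations (the coefficients are arbitrary but chosen so 1/A(z) is stable; this is an illustration, not the lecture's code):

```python
import numpy as np

a = np.array([0.9, -0.4])      # assumed predictor coefficients (stable 1/A(z))
p = len(a)
rng = np.random.default_rng(4)
x = rng.standard_normal(256)   # assumed input signal

# Analysis (FIR) filter A(z): e(n) = x(n) - sum_i a_i x(n-i)
e = np.copy(x)
for n in range(len(x)):
    for i in range(1, min(n, p) + 1):
        e[n] -= a[i - 1] * x[n - i]

# Synthesis (IIR) filter 1/A(z): y(n) = e(n) + sum_i a_i y(n-i)
y = np.zeros_like(e)
for n in range(len(e)):
    y[n] = e[n]
    for i in range(1, min(n, p) + 1):
        y[n] += a[i - 1] * y[n - i]

print(np.max(np.abs(y - x)))   # perfect reconstruction up to round-off
```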
Residual Error
• To the right are some examples of the
linear prediction error for voiced
speech signals.
• The points where the prediction error
peaks are points in the signal where the
signal is least predictable by a linear
prediction model. In the case of voiced
speech, this relates to the manner in
which the signal is produced.
• Speech compression and synthesis
systems exploit the linear prediction
model as a first-order attempt to
remove redundancy from the signal.
• The LP model is independent of the
energy of the input signal. It is also
independent of the phase of the input
signal because the LP filter is a minimum
phase filter.
Durbin Recursion
• There are several efficient algorithms to compute the LP coefficients without
doing a matrix inverse. One of the most popular and insightful is known as the
Durbin recursion:
  E^{(0)} = r(0)

  k_i = \left[r(i) - \sum_{j=1}^{i-1} a_j^{(i-1)} r(i-j)\right] \Big/ E^{(i-1)}, \quad 1 \le i \le p

  a_i^{(i)} = k_i

  a_j^{(i)} = a_j^{(i-1)} - k_i\, a_{i-j}^{(i-1)}, \quad 1 \le j \le i-1

  E^{(i)} = (1 - k_i^2)\, E^{(i-1)}
• The intermediate coefficients, {ki}, are referred to as reflection coefficients. To
compute a pth order model, all orders from 1 to p are computed.
• This recursion is significant for several reasons:
 The error energy decreases as the LP order increases, indicating the model
continually improves.
 There is a one-to-one mapping between {ri}, {ki}, and {ai}.
 For the LP filter to be stable, |k_i| < 1. Note that the Autocorrelation Method guarantees the filter to be stable; the Covariance Method does not.
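The recursion translates almost line for line into code. A minimal sketch (the autocorrelation sequence below is an assumed example, chosen so the answer is easy to verify against a direct matrix solve):

```python
import numpy as np

def durbin(r, p):
    """Durbin recursion: solve the order-p normal equations from r(0)..r(p)."""
    a = np.zeros(p)
    E = r[0]                                  # E^(0) = r(0)
    k_all = np.zeros(p)
    for i in range(1, p + 1):
        # k_i = [r(i) - sum_{j<i} a_j^(i-1) r(i-j)] / E^(i-1)
        k = (r[i] - np.dot(a[:i - 1], r[i - 1:0:-1])) / E
        k_all[i - 1] = k
        a_prev = np.copy(a)
        a[i - 1] = k                          # a_i^(i) = k_i
        for j in range(i - 1):                # a_j^(i) = a_j^(i-1) - k_i a_{i-j}^(i-1)
            a[j] = a_prev[j] - k * a_prev[i - 2 - j]
        E = (1 - k * k) * E                   # E^(i) = (1 - k_i^2) E^(i-1)
    return a, k_all, E

# Assumed example sequence; a direct solve of R a = r gives the same coefficients
a, k, E = durbin(np.array([2.0, 1.0, 0.5, 0.25]), 3)
print(a, k, E)
```

Each stage satisfies |k_i| < 1, and the error energy E shrinks monotonically, as the slide notes.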
The Burg Algorithm
• Digital filters can be implemented
using many different forms. One very
important and popular form is a
lattice filter, shown to the right.
• Itakura showed the {k_i}'s can be computed directly:

  k_i = \frac{\sum_{m=0}^{N-1} e^{(i-1)}(m)\, b^{(i-1)}(m-1)}{\left[\sum_{m=0}^{N-1} \left(e^{(i-1)}(m)\right)^2 \sum_{m=0}^{N-1} \left(b^{(i-1)}(m-1)\right)^2\right]^{1/2}}
• Burg demonstrated that the LP approach can be viewed as a maximum entropy spectral estimate, and derived an expression for the reflection coefficients that guarantees -1 \le k_i \le 1:

  k_i = \frac{2 \sum_{m=0}^{N-1} e^{(i-1)}(m)\, b^{(i-1)}(m-1)}{\sum_{m=0}^{N-1} \left(e^{(i-1)}(m)\right)^2 + \sum_{m=0}^{N-1} \left(b^{(i-1)}(m-1)\right)^2}
• Makhoul showed that a family of lattice-based formulations exist.
• Most importantly, the filter coefficients can be updated in real-time in O(n).
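For the first stage, the Burg estimate is easy to compute directly; a sketch under the assumption that the order-0 forward and backward errors are the signal itself, e^(0)(m) = b^(0)(m) = x(m), for an assumed test signal:

```python
import numpy as np

rng = np.random.default_rng(5)
x = rng.standard_normal(1024)   # assumed test signal
e = np.copy(x)                  # forward error at order 0
b = np.copy(x)                  # backward error at order 0

# Burg: k_1 = 2 sum e(m) b(m-1) / [sum e(m)^2 + sum b(m-1)^2]
num = 2.0 * np.sum(e[1:] * b[:-1])
den = np.sum(e[1:] ** 2) + np.sum(b[:-1] ** 2)
k1 = num / den
print(k1)
```

By the arithmetic-geometric mean inequality, |num| ≤ den, so |k1| ≤ 1 regardless of the data, which is exactly the guarantee Burg's form provides.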
The Autoregressive Model
• Suppose we model our signal as the output of a linear filter with a white noise input w(n):

  [Figure: w(n) passes through H(z) = 1/A(z) to produce x(n).]

• The inverse LP filter can be thought of as an all-pole (IIR) filter:

  H(z) = \frac{1}{A(z)} = \frac{1}{1 - a_1 z^{-1} - a_2 z^{-2} - \cdots - a_p z^{-p}}
• This is referred to as an autoregressive (AR) model.
• If the system is actually a mixed model, referred to as an autoregressive
moving average (ARMA) model:
  H(z) = \frac{B(z)}{A(z)} = \frac{1 + b_1 z^{-1} + b_2 z^{-2} + \cdots + b_q z^{-q}}{1 - a_1 z^{-1} - a_2 z^{-2} - \cdots - a_p z^{-p}}
• The LP model can still approximate such a system because:

  \frac{1}{1 - a_1 z^{-1}} = 1 + a_1 z^{-1} + a_1^2 z^{-2} + \cdots
Hence, even if the system has poles and zeroes, the LP model is capable of
approximating the system’s overall impulse or frequency response.
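The geometric-series argument can be checked numerically; a small sketch with an assumed pole location:

```python
import numpy as np

a1 = 0.6                        # assumed coefficient, |a1| < 1 for convergence
N = 10

# Impulse response of 1/(1 - a1 z^{-1}): h(n) = a1 h(n-1), h(0) = 1
h = np.zeros(N)
h[0] = 1.0
for n in range(1, N):
    h[n] = a1 * h[n - 1]

# Matches the series expansion 1 + a1 z^{-1} + a1^2 z^{-2} + ...
print(np.allclose(h, a1 ** np.arange(N)))
```

So a single pole already corresponds to an infinite (but truncatable) series of terms, which is why a sufficiently high-order all-pole model can absorb the effect of zeros.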
Spectral Matching and Blind Deconvolution
• Recall our expression for the error energy: E^{(i)} = (1 - k_i^2)\, E^{(i-1)}
• The LP filter becomes more accurate as the order of the model increases.
• We can interpret this as a spectral
matching process, as shown to the
right. As the order increases, the LP
model better models the envelope of
the spectrum of the original signal.
• The LP model attempts to minimize
the error equally across the entire
spectrum.
• If the spectrum of the input signal has a systematic variation, such as a
bandpass filter shape, or a spectral tilt, the LP model will attempt to model
this. Therefore, we typically pre-whiten the signal before LP analysis.
• The process by which the LP filter learns the spectrum of the input signal is
often referred to as blind deconvolution.
Summary
• There are many interpretations and motivations for linear prediction ranging
from minimum mean-square error estimation to maximum entropy spectral
estimation.
• There are many implementations of the filter, including the direct form and
the lattice representation.
• There are many representations for the coefficients including predictor and
reflection coefficients.
• The LP approach can be extended to estimate the parameters of most digital
filters, and can also be applied to the problem of digital filter design.
• The filter can be estimated in batch mode using a frame-based analysis, or it
can be updated on a sample basis using a sequential or iterative estimator.
Hence, the LP model is our first adaptive filter. Such a filter can be viewed as
a time-varying digital filter that tracks a signal in real-time.
• Under appropriate Gaussian assumptions, LP analysis can be shown to be a
maximum likelihood estimate of the model parameters.
• Further, two models can be compared using a metric called the log likelihood
ratio. Many other metrics exist to compare such models, including cepstral
and principal components approaches.