
ECE 8443 – Pattern Recognition
ECE 8423 – Adaptive Signal Processing
LECTURE 08: LMS VARIANTS
• Objectives:
Algorithm Taxonomy
Normalized LMS
Variable Adaptation
Leaky LMS
Sign Algorithms
Smoothing
Block Algorithms
Volterra Filter
• Resources:
DJ: Family of LMS Algorithms
MATLAB: Leaky LMS
MATLAB: Block LMS
NCTU: Block LMS
• URL: .../publications/courses/ece_8423/lectures/current/lecture_08.ppt
• MP3: .../publications/courses/ece_8423/lectures/current/lecture_08.mp3
Algorithm Taxonomy
• Thus far we have focused on:
Adaptive algorithm: LMS
Criterion: Least-squares
Iterative: Steepest descent
Implementation: Tapped delay line
and this doesn’t even include
statistical methods for feature and
model adaptation.
• There are many alternative optimization
criteria (including ML methods). For example:


J^{(N)} = E[e^{2N}(n)]
e(n) = d(n) - f_n^t x_n
\nabla_f e^{2N}(n) = -2N e^{2N-1}(n) x_n
f_{n+1} = f_n - (\mu / 2) \nabla_f (e^{2N}(n))
f_{n+1} = f_n + \mu N e^{2N-1}(n) x_n
Least Mean Fourth (LMF) Algorithm: N = 2
ECE 8423: Lecture 08, Slide 1
Normalized LMS
• The stability, convergence and steady-state properties of the LMS algorithm
are directly influenced by the length of the filter and the power of the input
signal. We can normalize the adaptation constant with respect to these:
\mu = \alpha / (L E[x^2(n)]) = \alpha / T_P
• We can estimate the power using a time average:
L 1
TˆP   x 2 (n  j )  x tn x n
j 0
• The update equations become:
f_{n+1} = f_n + \alpha e(n) x_n / (x_n^t x_n)
• We can add a small positive constant to avoid division by zero:
f_{n+1} = f_n + \alpha e(n) x_n / (c + x_n^t x_n)
• For the homogeneous problem (d(n) = f^{*t} x_n), it can be shown that:
lime( n)  0
n
lim v n  f n  f *  
n 
• For zero-mean Gaussian inputs, it can be shown to converge in the mean.
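• A minimal sketch of one normalized LMS update, assuming NumPy arrays; c plays the role of the small positive constant above, and the function name is illustrative:

```python
import numpy as np

def nlms_update(f, x, d, alpha, c=1e-6):
    """One normalized LMS update; c is the small positive constant that
    guards against division by zero when the input power is near zero."""
    e = d - f @ x                              # e(n) = d(n) - f_n^t x_n
    f_next = f + alpha * e * x / (c + x @ x)   # step normalized by x_n^t x_n
    return f_next, e
```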
ECE 8423: Lecture 08, Slide 2
Variable Adaptation Rate
• Another variation of the LMS algorithm is to use an adaptation constant that
varies as a function of time:
\mu(n) = 1 / (n + c)
• This problem has been studied extensively in the context of neural networks.
Adaptation can be implemented using a constant proportional to the
“velocity” or “acceleration” of the error signal.
• One extension of this approach is the Variable Step algorithm:
f n1  f n  e(n)Mn x n
0

0 
 0 (n)
 0


(
n
)
0

1

Mn  
 
 
 


   L 1 (n)
 0
• This is a generalization in that each filter coefficient has its own adaptation constant. Also, the adaptation constant is a function of time.
• The step sizes are determined heuristically, for example by examining the sign of the instantaneous gradient, -2 e(n) x(n-i), as in the sketch below.
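• A sketch of one variable-step update under an assumed sign-of-gradient heuristic; the grow/shrink factors, step limits, and function name are illustrative choices, not values from the lecture:

```python
import numpy as np

def variable_step_update(f, x, d, mu, prev_grad, grow=1.1, shrink=0.5,
                         mu_min=1e-6, mu_max=0.1):
    """One variable-step LMS update: f_{n+1} = f_n + e(n) M_n x_n,
    where mu is the per-coefficient step vector (the diagonal of M_n)."""
    e = d - f @ x
    grad = -2.0 * e * x                                # instantaneous gradient components -2 e(n) x(n-i)
    same_sign = np.sign(grad) == np.sign(prev_grad)
    mu = np.where(same_sign, mu * grow, mu * shrink)   # heuristic per-coefficient step adjustment
    mu = np.clip(mu, mu_min, mu_max)
    f_next = f + e * mu * x                            # equals f_n + e(n) * diag(mu) @ x_n
    return f_next, mu, grad, e
```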
ECE 8423: Lecture 08, Slide 3
The Leaky LMS Algorithm
• Consider an LMS update modified to include a constant “leakage” factor:
f n1  f n  e(n)x n 0    1
• Under what conditions would it be advantageous to employ such a factor?
• What are the drawbacks of this approach?
• Substituting our expression for e(n) in terms of d(n) and rearranging terms:
f_{n+1} = [\gamma I - \mu x_n x_n^t] f_n + \mu d(n) x_n
• Assuming independence of the filter and the data vectors:
Ef n1   I  R Ef n   g
(1   ) 

 I -  [R 
I] Ef n   g



• It can be shown:
1
(1   ) 

lim Ef n1    R 
I g
n
 

• There is a bias from the LMS solution, R^{-1} g, which demonstrates that leakage is a compromise between bias and coefficient protection.
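• A minimal sketch of one leaky LMS update, assuming NumPy arrays; gamma is the leakage factor above and the function name is illustrative:

```python
import numpy as np

def leaky_lms_update(f, x, d, mu, gamma):
    """One leaky LMS update: f_{n+1} = gamma*f_n + mu*e(n)*x_n, 0 < gamma <= 1.
    gamma < 1 keeps the coefficients bounded (coefficient protection) at the
    cost of a bias away from the LMS solution R^{-1} g."""
    e = d - f @ x
    f_next = gamma * f + mu * e * x
    return f_next, e
```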
ECE 8423: Lecture 08, Slide 4
Sign Algorithms
• To decrease the computational complexity, it is also possible to update the
filter coefficients using just the sign of the error:
 Pilot LMS, or Signed Error, or Sign Algorithm:
f n1  f n   sgn[e(n)]x n
 Clipped LMS or Signed Regressor:
f n1  f n  e(n) sgn[x n ]
 Zero-Forcing LMS or Sign-Sign:
f n1  f n   sgn[e(n)]sgn[x n ]
 The algorithms are useful in applications where very high-speed data is being
processed, such as communications equipment.
 We can also derive these algorithms using a modified cost function:
J  Ee(n) 
 The Sign-Sign algorithm is used in the CCITT ADPCM standard.
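• A sketch of one update for each of the three sign variants above, assuming NumPy arrays; the variant labels and function name are illustrative:

```python
import numpy as np

def sign_lms_update(f, x, d, mu, variant="sign-sign"):
    """One update of a sign-based LMS variant.
    'error'     : f + mu*sgn(e)*x       (pilot LMS / signed error)
    'regressor' : f + mu*e*sgn(x)       (clipped LMS / signed regressor)
    'sign-sign' : f + mu*sgn(e)*sgn(x)  (zero-forcing LMS / sign-sign)"""
    e = d - f @ x
    if variant == "error":
        f_next = f + mu * np.sign(e) * x
    elif variant == "regressor":
        f_next = f + mu * e * np.sign(x)
    else:
        f_next = f + mu * np.sign(e) * np.sign(x)
    return f_next, e
```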
ECE 8423: Lecture 08, Slide 5
Smoothing of LMS Gradient Estimates
• The gradient descent approach using an instantaneous sampling of the error
signal is a noisy estimate. It is a tradeoff between computational simplicity
and performance.
• We can define an Averaged LMS algorithm:
f_{n+1} = f_n + (\mu / N) \sum_{j=n-N+1}^{n} e(j) x_j
• A more general form of this uses a low-pass filter:
f n1  f n   b n
bn (i)  LPF{e(n) x(n  i), e(n) x(n  i  1),...}
The filter can be FIR or IIR.
• A popular variant of this approach is known as the Momentum LMS filter:
f n1  f n  (1   )(f n  f n1 )  e(n)x n
• A nonlinear approach, known as the Median LMS (MLMS) Algorithm:
f n1  f n  Mede(n) x(n  i)N
• There are obviously many variants of these types of algorithms involving
heuristics and higher-order approximations to the derivative.
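• A minimal sketch of one Momentum LMS update following the update equation above, assuming NumPy arrays; the function name and the convention that alpha closer to 1 reduces the momentum contribution are illustrative:

```python
import numpy as np

def momentum_lms_update(f, f_prev, x, d, mu, alpha):
    """One Momentum LMS update:
    f_{n+1} = f_n + (1 - alpha)*(f_n - f_{n-1}) + mu*e(n)*x_n,
    where the (1 - alpha) term reuses the previous coefficient change."""
    e = d - f @ x
    f_next = f + (1.0 - alpha) * (f - f_prev) + mu * e * x
    return f_next, e
```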
ECE 8423: Lecture 08, Slide 6
The Block LMS (BLMS) Algorithm
• There is no need to update coefficients every sample. We can update
coefficients every N samples and use a block update for the coefficients:
f_{(j+1)N} = f_{jN} + (\mu_B / N) \sum_{i=0}^{N-1} e(jN+i) x_{jN+i}
where,
y(jN+i) = f_{jN}^t x_{jN+i}
e(jN+i) = d(jN+i) - y(jN+i) = d(jN+i) - f_{jN}^t x_{jN+i}
x_{jN+i} = [x(jN+i), x(jN+i-1), \ldots, x(jN+i-L+1)]^t
• The BLMS algorithm requires (NL+L) multiplications per block compared to
(NL+N) per block of N points for the LMS.
• It is easy to show that convergence is slowed by a factor of N for the block algorithm; on the other hand, it applies additional smoothing to the gradient estimate.
• The block algorithm does not produce exactly the same result as the per-sample update, but the results are sufficiently close.
• In many real applications, block algorithms are preferred because they fit
better with hardware constraints and significantly reduce computations.
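• A minimal block LMS sketch, assuming NumPy arrays and zero initial conditions for the tapped delay line; the function name and zero-padding convention are illustrative:

```python
import numpy as np

def block_lms(x, d, L, N, mu_B):
    """Block LMS: the coefficient vector f_{jN} is held fixed over a block of
    N samples and updated once per block with the summed gradient estimate.
    x, d : input and desired signals (equal-length 1-D arrays)
    L, N : filter length and block length; mu_B : block adaptation constant."""
    f = np.zeros(L)
    x_pad = np.concatenate([np.zeros(L - 1), x])   # zeros so x_n is defined for n < L-1
    n_blocks = len(x) // N
    e = np.zeros(n_blocks * N)
    for j in range(n_blocks):
        grad = np.zeros(L)
        for i in range(N):
            n = j * N + i
            x_n = x_pad[n:n + L][::-1]             # [x(n), x(n-1), ..., x(n-L+1)]
            e[n] = d[n] - f @ x_n                  # error uses the block's fixed f_{jN}
            grad += e[n] * x_n
        f = f + (mu_B / N) * grad                  # f_{(j+1)N} = f_{jN} + (mu_B/N) * sum
    return f, e
```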
ECE 8423: Lecture 08, Slide 7
The LMS Volterra Filter
• A second-order digital Volterra filter is defined as:
L 1
y(n)   f
L 1 L 1
(1)
i 0
( j ) x(n  j )    f ( 2)  j1 , j 2 x(n  j1 ) x(n  j 2 )
j1 0 j2  j1
where f(1)(j) are the usual linear filter coefficients, and f(2)(j1,j2) are the L(L+1)/2
quadratic coefficients.
• We can define the update equations in a manner completely analogous to the
standard LMS filter:
J  E{e 2 (n)}  E{[d (n)  y(n)]2 }
J
 0  E{e(n) x(n  j )}
0  j  L 1
(1)
f ( j )
J
 0  E{e(n) x(n  j1 ) x(n  j 2 )}
0  j1  j 2  L  1
( 2)
f ( j1 , j 2 )
• Not surprisingly, one form of the solution is:
Rf = g
where,
f = [f^{(1)t}, f^{(2)t}]^t
f^{(1)} = [f^{(1)}(0), f^{(1)}(1), \ldots, f^{(1)}(L-1)]^t
f^{(2)} = [f^{(2)}(0,0), \ldots, f^{(2)}(0,L-1), f^{(2)}(1,1), \ldots, f^{(2)}(L-1,L-1)]^t
ECE 8423: Lecture 08, Slide 8
The LMS Volterra Filter (Cont.)
• The correlation matrix, R, is defined as:
R = \begin{bmatrix} R_2 & R_3 \\ R_3^t & R_4 \end{bmatrix}
R_2 = E[x_n^{(1)} x_n^{(1)t}],   R_3 = E[x_n^{(1)} x_n^{(2)t}],   R_4 = E[x_n^{(2)} x_n^{(2)t}]
x_n^{(1)} = [x(n), x(n-1), \ldots, x(n-L+1)]^t
x_n^{(2)} = [x^2(n), \ldots, x(n) x(n-L+1), x^2(n-1), \ldots, x^2(n-L+1)]^t
• The extended cross-correlation vector is given by:
g = [g_2^t, g_4^t]^t
where
g_2 = E[d(n) x_n^{(1)}],   g_4 = E[d(n) x_n^{(2)}]
• An adaptive second-order Volterra filter can be derived:
y(n) = f_n^t x_n
f_{n+1}^{(1)} = (I - \mu R_2) f_n^{(1)} + \mu g_2
f_{n+1}^{(2)} = (I - \mu R_4) f_n^{(2)} + \mu g_4
• Once the basic extended equations have been established, the filter functions
much like our standard LMS adaptive filter.
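• A minimal sketch of an adaptive second-order Volterra filter that applies the instantaneous (LMS-style) update to the extended coefficient vector rather than the expectation form above; the helper names are illustrative:

```python
import numpy as np

def volterra_extended_vector(x_lin):
    """Build the extended data vector [x_n^(1); x_n^(2)] from the linear tap
    vector x_n^(1) = [x(n), x(n-1), ..., x(n-L+1)].  The quadratic part holds
    the L(L+1)/2 products x(n-j1)*x(n-j2) for 0 <= j1 <= j2 <= L-1."""
    L = len(x_lin)
    quad = [x_lin[j1] * x_lin[j2] for j1 in range(L) for j2 in range(j1, L)]
    return np.concatenate([np.asarray(x_lin), np.array(quad)])

def volterra_lms_update(f_ext, x_lin, d, mu):
    """One LMS update of the extended (linear + quadratic) coefficient vector,
    using the instantaneous error in place of the expectations above."""
    x_ext = volterra_extended_vector(x_lin)
    e = d - f_ext @ x_ext              # y(n) = f_n^t x_n on the extended vectors
    f_next = f_ext + mu * e * x_ext
    return f_next, e
```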
ECE 8423: Lecture 08, Slide 9
Summary
• There are many types of adaptive algorithms. The LMS adaptive filter we have previously discussed is just one implementation.
• The gradient descent algorithm can be approximated in many ways.
• Coefficients can be updated per sample or on a block-by-block basis.
• Coefficients can be smoothed using a block-based expectation.
• Filters can exploit higher-order statistical information in the signal
(e.g., the Volterra filter).
ECE 8423: Lecture 08, Slide 10