
ECE 8443 – Pattern Recognition

LECTURE 09:

RECURSIVE LEAST SQUARES

Objectives:

• Newton’s Method
• Application to LMS
• Recursive Least Squares
• Exponentially-Weighted RLS
• Comparison to LMS

Resources:

• Wiki: Recursive Least Squares
• Wiki: Newton’s Method
• IT: Recursive Least Squares
• YE: Kernel-Based RLS

• URL:

.../publications/courses/ece_8423/lectures/current/lecture_09.ppt

• MP3:

.../publications/courses/ece_8423/lectures/current/lecture_09.mp3

Newton’s Method

The main challenge with the steepest descent approach of the LMS algorithm is its slow and non-uniform convergence.

Another concern is the use of a single, instantaneous point estimate of the gradient (and we discussed an alternative block estimation approach).


We can derive a more powerful iterative approach that uses all previous data and is based on Newton’s method for finding the zeroes of a function.

Consider a function having a single zero: $f(x) = 0$ at $x = x^*$.

• Start with an initial guess, $x_0$.

• The next estimate is obtained by projecting the tangent to the curve to where it crosses the x-axis:

$$f'(x_0) = \frac{f(x_0) - 0}{x_0 - x_1} \quad\Rightarrow\quad x_1 = x_0 - \frac{f(x_0)}{f'(x_0)}$$

• The next estimate is formed as $x_1$, and the general iterative formula is:

$$x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}$$
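As a rough illustration of the iteration $x_{n+1} = x_n - f(x_n)/f'(x_n)$, the following Python sketch applies Newton's method to a scalar function. The function name `newton_root`, the stopping tolerance, and the test function $f(x) = x^2 - 2$ are assumptions chosen for illustration; they do not come from the lecture.

```python
def newton_root(f, f_prime, x0, tol=1e-12, max_iter=50):
    """Find a zero of f by Newton's method, starting from the guess x0."""
    x = x0
    for _ in range(max_iter):
        step = f(x) / f_prime(x)   # distance to where the tangent at x meets the x-axis
        x = x - step               # x_{n+1} = x_n - f(x_n) / f'(x_n)
        if abs(step) < tol:
            break
    return x


# Illustrative function (an assumption, not from the lecture): f(x) = x^2 - 2
root = newton_root(lambda x: x * x - 2.0, lambda x: 2.0 * x, x0=1.0)
print(root)
```

Starting from $x_0 = 1$, the iterates approach $\sqrt{2}$ within a handful of steps. The sketch assumes $f'(x)$ stays away from zero along the way.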

ECE 8423: Lecture 09, Slide 1

Application to Adaptive Filtering

To apply this to the problem of least-squares minimization, we must find the zero of the gradient of the mean-squared error.


Since the mean-squared error is a quadratic function, the gradient is linear, and hence convergence takes place in a single step.

$$J = E\{e^2(n)\}, \qquad e(n) = d(n) - \mathbf{f}_n^t \mathbf{x}_n$$

$$J = \sigma_d^2 - 2\,\mathbf{f}_n^t \mathbf{g} + \mathbf{f}_n^t \mathbf{R}\,\mathbf{f}_n$$

We find the optimal solution by equating the gradient of the error to zero:

$$\nabla J = \frac{\partial J}{\partial \mathbf{f}_n} = 2\mathbf{R}\mathbf{f}_n - 2\mathbf{g} = 0$$

and the optimum solution is:

$$\mathbf{f}^* = \mathbf{R}^{-1}\mathbf{g}$$

We can demonstrate that the Newton algorithm is given by:

$$\mathbf{f}_{n+1} = \mathbf{f}_n - \frac{1}{2}\mathbf{R}^{-1}\frac{\partial J}{\partial \mathbf{f}_n}$$

by substituting our expression for the gradient:

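$$\mathbf{f}_{n+1} = \mathbf{f}_n - \frac{1}{2}\mathbf{R}^{-1}\left(2\mathbf{R}\mathbf{f}_n - 2\mathbf{g}\right) = \mathbf{f}_n - \mathbf{f}_n + \mathbf{R}^{-1}\mathbf{g} = \mathbf{f}^*$$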

Note that this still requires estimates of the autocorrelation matrix and the gradient.
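The one-step convergence on a quadratic error surface can be checked numerically. The sketch below is a minimal two-coefficient example in plain Python; the values of `R` and `g`, the starting point, and the helper names are hypothetical and serve only to show that a single Newton update lands on $\mathbf{f}^* = \mathbf{R}^{-1}\mathbf{g}$.

```python
def solve_2x2(R, b):
    """Return R^{-1} b for a 2x2 matrix R (Cramer's rule)."""
    det = R[0][0] * R[1][1] - R[0][1] * R[1][0]
    return [(R[1][1] * b[0] - R[0][1] * b[1]) / det,
            (R[0][0] * b[1] - R[1][0] * b[0]) / det]


def gradient(R, g, f):
    """Gradient of J = sigma_d^2 - 2 f^t g + f^t R f, which is 2 R f - 2 g."""
    Rf = [R[i][0] * f[0] + R[i][1] * f[1] for i in range(2)]
    return [2.0 * Rf[i] - 2.0 * g[i] for i in range(2)]


def newton_step(R, g, f):
    """One Newton update: f_{n+1} = f_n - (1/2) R^{-1} grad J(f_n)."""
    correction = solve_2x2(R, gradient(R, g, f))
    return [f[i] - 0.5 * correction[i] for i in range(2)]


# Hypothetical statistics, chosen only for illustration.
R = [[1.0, 0.8], [0.8, 1.0]]   # autocorrelation matrix
g = [0.5, 0.2]                 # cross-correlation vector
f_star = solve_2x2(R, g)       # optimum f* = R^{-1} g

f_next = newton_step(R, g, [10.0, -7.0])   # arbitrary starting point
print(f_next, f_star)
```

Whatever starting point is used, `f_next` agrees with `f_star` up to rounding, because the gradient of a quadratic is linear in the coefficients.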


Estimating the Gradient

In practice, we can use an instantaneous estimate of the gradient, $-2e(n)\mathbf{x}_n$, as we did for the LMS algorithm. The update equation becomes:

$$\mathbf{f}_{n+1} = \mathbf{f}_n - \frac{1}{2}\mathbf{R}^{-1}\left(-2e(n)\mathbf{x}_n\right) = \mathbf{f}_n + \mathbf{R}^{-1}e(n)\mathbf{x}_n$$

The noisy estimate of the gradient will produce excess mean-squared error. To combat this, we can introduce an adaptation constant:

$$\mathbf{f}_{n+1} = \mathbf{f}_n + \alpha\,\mathbf{R}^{-1}e(n)\mathbf{x}_n$$

where $0 < \alpha < 1$.

Of course, convergence no longer occurs in one step. We are somewhat back to where we started with the iterative LMS algorithm, and we still have to worry about estimating the autocorrelation matrix.

To compare this solution to the LMS algorithm, we can rewrite the update equation in terms of the error signal:

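$$\mathbf{f}_{n+1} = \mathbf{f}_n + \alpha\mathbf{R}^{-1}\left(d(n) - \mathbf{f}_n^t\mathbf{x}_n\right)\mathbf{x}_n = \left(\mathbf{I} - \alpha\mathbf{R}^{-1}\mathbf{x}_n\mathbf{x}_n^t\right)\mathbf{f}_n + \alpha\mathbf{R}^{-1}d(n)\mathbf{x}_n$$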

Taking the expectation of both sides, and invoking independence:

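$$E[\mathbf{f}_{n+1}] = (1-\alpha)\,E[\mathbf{f}_n] + \alpha\mathbf{R}^{-1}\mathbf{g}, \quad\text{where } \mathbf{g} = E[d(n)\mathbf{x}_n]$$

Note that if $\alpha = 1$, $E[\mathbf{f}_{n+1}] = \mathbf{R}^{-1}\mathbf{g} = \mathbf{f}^*$. Otherwise, the update is a weighted average of the previous value and the newest estimate.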


Analysis of Convergence


Once again we can define an error vector:

$$\mathbf{u}_{n+1} = (1-\alpha)\,\mathbf{u}_n, \quad\text{where } \mathbf{u}_n = E[\mathbf{f}_n] - \mathbf{f}^*$$

The solution to this first-order difference equation is:

$$\mathbf{u}_n = (1-\alpha)^n\,\mathbf{u}_0$$

We can observe the following:

The algorithm converges in the mean provided $|1-\alpha| < 1$, or $0 < \alpha < 2$.

The convergence rate of each coefficient is identical and independent of the eigenvalue spread of the autocorrelation matrix, R.

The last point is a crucial difference between the Newton algorithm and LMS.
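The contrast with LMS can be made concrete by comparing per-mode decay factors. In the sketch below, the Newton factor $(1-\alpha)$ comes from the difference equation above; the LMS factor $(1 - \mu\lambda_i)$ is the standard steepest-descent result and is assumed here for comparison only. The eigenvalues, step sizes, and function names are hypothetical.

```python
def newton_mode_factors(alpha, eigenvalues):
    """Per-mode decay factor of the Newton update: (1 - alpha) for every mode."""
    return [1.0 - alpha for _ in eigenvalues]


def lms_mode_factors(mu, eigenvalues):
    """Per-mode decay factor of LMS in the mean: (1 - mu * lambda_i).

    Standard steepest-descent result, assumed here for comparison only.
    """
    return [1.0 - mu * lam for lam in eigenvalues]


def mean_error(factor, u0, n):
    """Closed-form solution u_n = factor**n * u_0 of the first-order recursion."""
    return factor ** n * u0


eigenvalues = [1.8, 0.2]   # hypothetical eigenvalues of R (spread of 9)
newton = newton_mode_factors(alpha=0.5, eigenvalues=eigenvalues)
lms = lms_mode_factors(mu=0.5, eigenvalues=eigenvalues)

for name, factors in (("Newton", newton), ("LMS", lms)):
    residual = [abs(mean_error(f, 1.0, 20)) for f in factors]
    print(name, factors, residual)
```

With these illustrative numbers, both Newton modes decay at the same rate, while the LMS mode tied to the small eigenvalue decays much more slowly than the other.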

We still need to worry about our estimate of the autocorrelation matrix, R:

$$R_n(i,j) = \sum_{l=0}^{n} x(l-i)\,x(l-j)$$

and we assume $x(n) = 0$ for $n < 0$. We can write an update equation as a function of $n$:

$$\mathbf{f}_{n+1} = \mathbf{f}_n + \alpha\,\mathbf{R}_n^{-1}e(n)\mathbf{x}_n, \quad\text{where } \mathbf{R}_n = [R_n(i,j)] = \sum_{l=0}^{n}\mathbf{x}_l\mathbf{x}_l^t$$


Estimation of the Autocorrelation Matrix and Its Inverse

The effort to estimate the autocorrelation matrix and its inverse is still considerable. We can easily derive an update equation for the autocorrelation:

$$\mathbf{R}_{n+1} = \mathbf{R}_n + \mathbf{x}_{n+1}\mathbf{x}_{n+1}^t$$

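To reduce the computational complexity of the inverse, we can invoke the matrix inversion lemma:

$$\left(\mathbf{A} + \mathbf{u}\mathbf{u}^t\right)^{-1} = \mathbf{A}^{-1} - \frac{\mathbf{A}^{-1}\mathbf{u}\mathbf{u}^t\mathbf{A}^{-1}}{1 + \mathbf{u}^t\mathbf{A}^{-1}\mathbf{u}}$$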

Applying this to the update equation for the autocorrelation function:

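$$\mathbf{R}_{n+1}^{-1} = \left(\mathbf{R}_n + \mathbf{x}_{n+1}\mathbf{x}_{n+1}^t\right)^{-1} = \mathbf{R}_n^{-1} - \frac{\mathbf{R}_n^{-1}\mathbf{x}_{n+1}\mathbf{x}_{n+1}^t\mathbf{R}_n^{-1}}{1 + \mathbf{x}_{n+1}^t\mathbf{R}_n^{-1}\mathbf{x}_{n+1}}$$

The computation is proportional to $L^2$ rather than $L^3$ for the inverse.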

The inverse of the autocorrelation matrix is never calculated directly; its estimate is simply updated at each step.
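The rank-one inverse update can be verified on a small case. The following plain-Python sketch applies the matrix inversion lemma to a hypothetical 2x2 matrix and checks that the result really is the inverse of $\mathbf{A} + \mathbf{u}\mathbf{u}^t$. The helper names and the numerical values are illustrative assumptions, and the update assumes $\mathbf{A}$ is symmetric, as an autocorrelation estimate is.

```python
def mat_vec(A, v):
    """Matrix-vector product A v."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]


def rank_one_inverse_update(A_inv, u):
    """Return (A + u u^t)^{-1} given A^{-1}, using the matrix inversion lemma.

    Assumes A is symmetric, so that u^t A^{-1} is the transpose of A^{-1} u.
    The cost is O(L^2) for an L x L matrix.
    """
    z = mat_vec(A_inv, u)                                  # A^{-1} u
    denom = 1.0 + sum(ui * zi for ui, zi in zip(u, z))     # 1 + u^t A^{-1} u
    n = len(u)
    return [[A_inv[i][j] - z[i] * z[j] / denom for j in range(n)]
            for i in range(n)]


def mat_mul(A, B):
    """Matrix product A B, used only to check the result."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


# Hypothetical 2x2 example: A = 2 I, so A^{-1} = 0.5 I.
A = [[2.0, 0.0], [0.0, 2.0]]
A_inv = [[0.5, 0.0], [0.0, 0.5]]
u = [1.0, 3.0]

updated_inv = rank_one_inverse_update(A_inv, u)
A_plus = [[A[i][j] + u[i] * u[j] for j in range(2)] for i in range(2)]
print(mat_mul(A_plus, updated_inv))   # should be close to the identity
```

Only matrix-vector products and an outer product are needed, which is where the $L^2$ cost comes from.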


Summary of the Overall Algorithm

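1) Initialize $\mathbf{f}_0$, $\mathbf{R}_{-1}^{-1}$

2) Iterate for $n = 0, 1, \ldots$

$$\mathbf{R}_{n+1}^{-1} = \left(\mathbf{R}_n + \mathbf{x}_{n+1}\mathbf{x}_{n+1}^t\right)^{-1} = \mathbf{R}_n^{-1} - \frac{\mathbf{R}_n^{-1}\mathbf{x}_{n+1}\mathbf{x}_{n+1}^t\mathbf{R}_n^{-1}}{1 + \mathbf{x}_{n+1}^t\mathbf{R}_n^{-1}\mathbf{x}_{n+1}}$$

$$e(n) = d(n) - \mathbf{f}_n^t\mathbf{x}_n$$

$$\mathbf{f}_{n+1} = \mathbf{f}_n + \alpha\,\mathbf{R}_n^{-1}e(n)\mathbf{x}_n$$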

There are a few approaches to the initialization in step (1). The most straightforward thing to do is:

$$\mathbf{R}_{-1}^{-1} = \sigma^2\,\mathbf{I}$$

where $\sigma^2$ is chosen to be a small positive constant (and can often be estimated based on a priori knowledge of the environment).

This approach has been superseded by recursive-in-time least squares solutions, which we will study next.


Recursive Least Squares (RLS)


Consider minimization of a finite duration version of the error:

$$J = \sum_{l=0}^{n} e^2(l) = \sum_{l=0}^{n}\left[d(l) - y(l)\right]^2$$

The objective of the RLS algorithm is to maintain a solution which is optimal at each iteration. Differentiation of the error leads to the normal equation:

$$\mathbf{R}_n\mathbf{f}_n = \mathbf{g}_n$$

where

$$R_n(i,j) = \sum_{l=0}^{n} x(l-i)\,x(l-j), \qquad g_n(i) = \sum_{l=0}^{n} d(l)\,x(l-i)$$

Note that we can now write recursive-in-time equations for R and g:

$$\mathbf{R}_n = \mathbf{R}_{n-1} + \mathbf{x}_n\mathbf{x}_n^t, \qquad \mathbf{g}_n = \mathbf{g}_{n-1} + d(n)\,\mathbf{x}_n$$

We seek solutions of the form:

$$\mathbf{f}_n = \mathbf{R}_n^{-1}\mathbf{g}_n, \qquad \mathbf{f}_{n+1} = \mathbf{R}_{n+1}^{-1}\mathbf{g}_{n+1}$$

We can apply the matrix inversion lemma for computation of the inverse:

Recursive Least Squares (Cont.)

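$$\mathbf{f}_{n+1} = \left[\mathbf{R}_n^{-1} - \frac{\mathbf{R}_n^{-1}\mathbf{x}_{n+1}\mathbf{x}_{n+1}^t\mathbf{R}_n^{-1}}{1 + \mathbf{x}_{n+1}^t\mathbf{R}_n^{-1}\mathbf{x}_{n+1}}\right]\left[\mathbf{g}_n + d(n+1)\,\mathbf{x}_{n+1}\right]$$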

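Define an intermediate vector variable, $\mathbf{z} = \mathbf{R}_n^{-1}\mathbf{x}_{n+1}$, and let $k = 1 + \mathbf{x}_{n+1}^t\mathbf{z}$. Using $\mathbf{R}_n^{-1}\mathbf{g}_n = \mathbf{f}_n$ and the symmetry of $\mathbf{R}_n^{-1}$:

$$\begin{aligned}
\mathbf{f}_{n+1} &= \mathbf{f}_n - \frac{\mathbf{z}\,\mathbf{x}_{n+1}^t\mathbf{f}_n}{k} + d(n+1)\,\mathbf{z} - \frac{d(n+1)\,\mathbf{z}\,\mathbf{x}_{n+1}^t\mathbf{z}}{k} \\
&= \mathbf{f}_n - \frac{\mathbf{z}\,\mathbf{x}_{n+1}^t\mathbf{f}_n}{k} + \frac{d(n+1)\,\mathbf{z}}{k} \\
&= \mathbf{f}_n + \frac{1}{k}\left[d(n+1) - \mathbf{x}_{n+1}^t\mathbf{f}_n\right]\mathbf{R}_n^{-1}\mathbf{x}_{n+1}
\end{aligned}$$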

Define the a priori error as:

$$e(n+1/n) = d(n+1) - \mathbf{f}_n^t\mathbf{x}_{n+1}$$

reflecting that this is the error obtained using the old filter and the new data.

Using this definition, we can rewrite the RLS algorithm update equation as:

$$\mathbf{f}_{n+1} = \mathbf{f}_n + \frac{1}{1 + \mathbf{x}_{n+1}^t\mathbf{R}_n^{-1}\mathbf{x}_{n+1}}\;e(n+1/n)\,\mathbf{R}_n^{-1}\mathbf{x}_{n+1}$$


Summary of the RLS Algorithm

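1) Initialize $\mathbf{f}_{-1}$, $\mathbf{R}_{-1}^{-1}$

2) Iterate for $n = 0, 1, \ldots$

$$e(n+1/n) = d(n+1) - \mathbf{f}_n^t\mathbf{x}_{n+1}$$

$$\alpha(n) = \frac{1}{1 + \mathbf{x}_{n+1}^t\mathbf{R}_n^{-1}\mathbf{x}_{n+1}}$$

$$\mathbf{f}_{n+1} = \mathbf{f}_n + \alpha(n)\,e(n+1/n)\,\mathbf{R}_n^{-1}\mathbf{x}_{n+1}$$

$$\mathbf{R}_{n+1}^{-1} = \mathbf{R}_n^{-1} - \alpha(n)\,\mathbf{R}_n^{-1}\mathbf{x}_{n+1}\mathbf{x}_{n+1}^t\mathbf{R}_n^{-1}$$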

Compare this to the Newton method:

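$$\mathbf{R}_{n+1}^{-1} = \left(\mathbf{R}_n + \mathbf{x}_{n+1}\mathbf{x}_{n+1}^t\right)^{-1} = \mathbf{R}_n^{-1} - \frac{\mathbf{R}_n^{-1}\mathbf{x}_{n+1}\mathbf{x}_{n+1}^t\mathbf{R}_n^{-1}}{1 + \mathbf{x}_{n+1}^t\mathbf{R}_n^{-1}\mathbf{x}_{n+1}}$$

$$e(n) = d(n) - \mathbf{f}_n^t\mathbf{x}_n$$

$$\mathbf{f}_{n+1} = \mathbf{f}_n + \alpha\,\mathbf{R}_n^{-1}e(n)\mathbf{x}_n$$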

The RLS algorithm can be expected to converge more quickly because of its use of an aggressive, adaptive step size.
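The RLS recursion can be sketched in a few lines of plain Python. The example below identifies a hypothetical two-tap FIR system from noise-free data. The tap values, the white Gaussian input, the function names, and the initialization of the inverse matrix to $(1/\delta)\mathbf{I}$ with a small $\delta$ are all assumptions made for illustration; the lecture does not specify them.

```python
import random


def mat_vec(A, v):
    """Matrix-vector product A v."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]


def dot(u, v):
    """Inner product u^t v."""
    return sum(a * b for a, b in zip(u, v))


def rls_update(f, R_inv, x_new, d_new):
    """One RLS iteration: returns the updated filter and inverse matrix."""
    err = d_new - dot(f, x_new)                 # a priori error e(n+1/n)
    z = mat_vec(R_inv, x_new)                   # R_n^{-1} x_{n+1}
    alpha = 1.0 / (1.0 + dot(x_new, z))         # adaptive step size alpha(n)
    f_next = [fi + alpha * err * zi for fi, zi in zip(f, z)]
    L = len(f)
    # Inverse update; relies on R_inv being symmetric so x^t R^{-1} = z^t.
    R_inv_next = [[R_inv[i][j] - alpha * z[i] * z[j] for j in range(L)]
                  for i in range(L)]
    return f_next, R_inv_next


def identify(true_taps, num_samples, delta=1e-3, seed=0):
    """Run RLS on a noise-free FIR system-identification problem."""
    rng = random.Random(seed)
    L = len(true_taps)
    f = [0.0] * L
    R_inv = [[(1.0 / delta if i == j else 0.0) for j in range(L)]
             for i in range(L)]                 # assumed initialization
    x_vec = [0.0] * L                           # x(n) = 0 for n < 0
    for _ in range(num_samples):
        x_vec = [rng.gauss(0.0, 1.0)] + x_vec[:-1]   # shift in the new sample
        d = dot(true_taps, x_vec)                    # desired signal
        f, R_inv = rls_update(f, R_inv, x_vec, d)
    return f


estimate = identify(true_taps=[0.7, -0.3], num_samples=200)   # hypothetical taps
print(estimate)
```

Each iteration costs on the order of $L^2$ operations, since it involves one matrix-vector product and one outer-product update.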


Exponentially-Weighted RLS Algorithm

We can define a weighted error function:

$$\tilde{J} = \sum_{l=0}^{n}\lambda^{\,n-l}\,e^2(l)$$

With $0 < \lambda < 1$, this gives more weight to the most recent errors.

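The RLS algorithm can be modified in this case:

1) Initialize $\mathbf{f}_{-1}$, $\mathbf{R}_{-1}^{-1}$

2) Iterate for $n = 0, 1, \ldots$

$$e(n/n-1) = d(n) - \mathbf{f}_{n-1}^t\mathbf{x}_n$$

$$\alpha(n) = \frac{1}{\lambda + \mathbf{x}_n^t\mathbf{R}_{n-1}^{-1}\mathbf{x}_n}$$

$$\mathbf{f}_n = \mathbf{f}_{n-1} + \alpha(n)\,e(n/n-1)\,\mathbf{R}_{n-1}^{-1}\mathbf{x}_n$$

$$\mathbf{R}_n^{-1} = \frac{1}{\lambda}\left[\mathbf{R}_{n-1}^{-1} - \alpha(n)\,\mathbf{R}_{n-1}^{-1}\mathbf{x}_n\mathbf{x}_n^t\mathbf{R}_{n-1}^{-1}\right]$$

• RLS is computationally more complex than simple LMS because it is $O(L^2)$ per iteration, compared with $O(L)$ for LMS.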

In principle, convergence is independent of the eigenvalue structure of the signal due to the premultiplication by the inverse of the autocorrelation matrix.
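To see what exponential weighting buys, the sketch below runs an exponentially-weighted RLS update on a hypothetical system whose taps switch halfway through the data, once with a forgetting factor of 0.95 and once with 1.0 (ordinary RLS). The symbol `lam` for the forgetting factor, the tap values, the white Gaussian input, and the $(1/\delta)\mathbf{I}$ initialization are assumptions for illustration only.

```python
import random


def dot(u, v):
    """Inner product u^t v."""
    return sum(a * b for a, b in zip(u, v))


def ewrls_update(f, R_inv, x, d, lam):
    """One exponentially-weighted RLS iteration with forgetting factor lam."""
    err = d - dot(f, x)                          # a priori error e(n/n-1)
    z = [dot(row, x) for row in R_inv]           # R_{n-1}^{-1} x_n
    alpha = 1.0 / (lam + dot(x, z))              # alpha(n)
    f_new = [fi + alpha * err * zi for fi, zi in zip(f, z)]
    L = len(f)
    R_inv_new = [[(R_inv[i][j] - alpha * z[i] * z[j]) / lam
                  for j in range(L)] for i in range(L)]
    return f_new, R_inv_new


def track(lam, taps_before, taps_after, half=200, delta=1e-3, seed=1):
    """Identify an FIR system whose taps switch after `half` samples."""
    rng = random.Random(seed)
    L = len(taps_before)
    f = [0.0] * L
    R_inv = [[(1.0 / delta if i == j else 0.0) for j in range(L)]
             for i in range(L)]                  # assumed initialization
    x = [0.0] * L
    for n in range(2 * half):
        x = [rng.gauss(0.0, 1.0)] + x[:-1]       # shift in the new sample
        taps = taps_before if n < half else taps_after
        f, R_inv = ewrls_update(f, R_inv, x, dot(taps, x), lam)
    return f


def error_norm(f, target):
    """Euclidean norm of the coefficient error vector."""
    return sum((a - b) ** 2 for a, b in zip(f, target)) ** 0.5


before, after = [0.7, -0.3], [-0.4, 0.6]   # hypothetical tap values
forgetting = track(0.95, before, after)
growing = track(1.0, before, after)        # lam = 1 recovers ordinary RLS

print(error_norm(forgetting, after), error_norm(growing, after))
```

With forgetting, the old data is discounted and the filter settles on the new taps; with `lam = 1.0`, the estimate remains a compromise between the two tap sets.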


Example: LMS and RLS Comparison

An IID sequence, $x(n)$, is input to a filter:

$$H(z) = 1 + 0.5\,z^{-1}$$

Measurement noise was assumed to be zero-mean Gaussian noise with unit variance, and a gain such that the SNR was 40 dB.

The norm of the coefficient error vector is plotted in the top figure for 1000 trials.

The filter length, L, was set to 8; the LMS adaptation constant was set to 0.05.

The adaptation step-size was set to the largest value for which the LMS algorithm would give stable results, and yet the RLS algorithm still outperforms LMS.

The lower figure corresponds to the same analysis with an input sequence:

$$x(n) = w(n) + 0.8\,w(n-1)$$

Why is performance in this case degraded?


Summary

Introduced Newton’s method as an alternative to simple LMS.

Derived the update equations for this approach.

Introduced the Recursive Least Squares (RLS) approach and an exponentially-weighted version of RLS.

Briefly discussed convergence and computational complexity.

Next: IIR adaptive filters.
