M513 – Advanced DSP Techniques

Unit Review & Exam Preparation

M513 – Main Topics Covered in the 2011/2012 Academic Year

1. Review of DSP Basics
2. Random Signal Processing
3. Optimal and Adaptive Filters
4. (PSD) Spectrum Estimation Techniques

(Exam questions will mainly come from parts 2 and 3, but good knowledge of part 1 is needed!)

Part 1 – Review of DSP Basics

DSP = Digital Signal Processing = Signal Analysis + Signal Processing, performed in the discrete-time domain.

• Fourier Transform Family
• More general transform (z-transform)
• LTI Systems and Convolution
• Guide to LTI Systems

Signal Analysis

• To analyse signals in the time domain we can use the appropriate member of the Fourier Transform family.

Fourier Transforms – Summary

• Fourier Transform (continuous, aperiodic in time ↔ continuous, aperiodic in frequency):

X(\Omega) = \int_{-\infty}^{\infty} x(t) \, e^{-j\Omega t} \, dt, \qquad x(t) = \frac{1}{2\pi} \int_{-\infty}^{\infty} X(\Omega) \, e^{j\Omega t} \, d\Omega

• Fourier Series (continuous, periodic in time ↔ discrete, aperiodic in frequency):

X_k = \frac{1}{T} \int_{-T/2}^{T/2} x_p(t) \, e^{-jk\Omega_0 t} \, dt, \qquad x_p(t) = \sum_{k=-\infty}^{\infty} X_k \, e^{jk\Omega_0 t}

• Discrete Time Fourier Transform (discrete, aperiodic in time ↔ continuous, periodic in frequency):

X(e^{j\omega}) = \sum_{n=-\infty}^{\infty} x(n) \, e^{-j\omega n}, \qquad x(n) = \frac{1}{2\pi} \int_{-\pi}^{\pi} X(e^{j\omega}) \, e^{j\omega n} \, d\omega

• Discrete Fourier Transform (discrete, periodic in time ↔ discrete, periodic in frequency):

X(k) = \sum_{n=0}^{N-1} x(n) \, e^{-j\frac{2\pi}{N}nk}, \qquad x(n) = \frac{1}{N} \sum_{k=0}^{N-1} X(k) \, e^{j\frac{2\pi}{N}nk}

Fourier Transforms

• The following analogies can be seen:
  – Periodic in time ↔ Discrete in frequency
  – Aperiodic in time ↔ Continuous in frequency
  – Continuous in time ↔ Aperiodic in frequency
  – Discrete in time ↔ Periodic in frequency
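The DFT pair at the bottom of the summary can be checked numerically. Below is a minimal Python/NumPy sketch (not from the original slides; the signal length N and the test signal are arbitrary choices) implementing the analysis and synthesis equations directly and cross-checking them against NumPy's FFT:

```python
import numpy as np

# Direct implementation of the DFT pair from the summary table above.
N = 8
x = np.cos(2 * np.pi * 2 * np.arange(N) / N) + 0.5 * np.random.randn(N)

n = np.arange(N)
k = n.reshape(-1, 1)          # one row per frequency bin k

# Analysis equation: X(k) = sum_n x(n) exp(-j 2*pi*n*k / N)
X = np.sum(x * np.exp(-2j * np.pi * n * k / N), axis=1)

# Synthesis equation: x(n) = (1/N) sum_k X(k) exp(+j 2*pi*n*k / N)
x_rec = np.sum(X.reshape(-1, 1) * np.exp(2j * np.pi * n * k / N), axis=0) / N

print(np.allclose(X, np.fft.fft(x)))   # True: matches the library FFT
print(np.allclose(x_rec, x))           # True: the pair inverts correctly
```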

More general transforms

• Two more transforms are introduced in order to generalise the Fourier transforms for both continuous and discrete-time domain signals.
• To understand their region of operation it is important to recognise that both the CTFT and the DTFT operate on only one limited part of the whole complex plane (plane of complex values).
• The CTFT operates on the frequency axis, i.e. the line σ = 0 of the complex plane s = σ + jΩ (i.e. s = jΩ).
• The DTFT operates on the frequency circle, i.e. the curve r = 1 of the complex plane z = re^{jω} (i.e. z = e^{jω}).

From Laplace to Z-Transform

• Evaluate the Laplace transform of the sampled signal x_s(t) and substitute z = e^{sT_s}:

X_s(s) = \sum_{n=-\infty}^{\infty} x(nT_s) \, e^{-snT_s} \quad\xrightarrow{\;z = e^{sT_s}\;}\quad X(z) = \sum_{n=-\infty}^{\infty} x(n) \, z^{-n}

From Laplace to Z-Transform

• Consider again the substitution we made on the previous slide:

z = e^{sT_s}, \qquad s = \sigma + j\Omega

z = e^{(\sigma + j\Omega)T_s} = e^{\sigma T_s} e^{j\Omega T_s} = e^{\sigma T_s} (\cos \Omega T_s + j \sin \Omega T_s)

|z| = e^{\sigma T_s} \, |\cos \Omega T_s + j \sin \Omega T_s| = e^{\sigma T_s}

\sigma < 0 \;\Rightarrow\; |z| < 1

i.e. the left half of the s-plane (σ < 0) maps into the interior of the unit circle in the z-plane (|z| < 1).

Axis in s to z Mapping

S-plane Im (j  ) 

s

T s

 

s

2  

s

2  

s

Re Z-plane Im

j

 1 

j

1 Re

Signal Processing

• Delay … signal
• Scale … signal
• Add … two or more samples (from the same or different signals)
⇒ Signal Filtering ⇒ Convolution

Convolution

• Gives the system input – system output relationship for LTI type systems (both DT and CT).

x(t) / x(n) → System → y(t) / y(n)

Impulse Response of the System

• Let h(n) be the response of the system to the impulse-type input δ(n) (i.e. the Impulse Response of the System).
• We note this as: δ(n) → LTI System → h(n)

Time-invariance

• For an LTI system, if δ(n) → h(n), then δ(n−k) → h(n−k) (this is the so-called time-invariance property of the system):

δ(n−k) → LTI System → h(n−k)

Linearity

• Linearity implies the following system behaviour:
  x(n) → LTI System → y(n)
  ax(n) → LTI System → ay(n)
  x₁(n) + x₂(n) → LTI System → y₁(n) + y₂(n)

Linearity and Time-Invariance

• We can now combine time-invariance and linearity:
  Σ_k δ(n−k) → LTI System → Σ_k h(n−k)
  Σ_k x(k) δ(n−k) → LTI System → Σ_k x(k) h(n−k)

Convolution Sum

• I.e., if δ(n) → h(n) and x(n) = Σ_k x(k) δ(n−k), then:

y(n) = \sum_{k=-\infty}^{\infty} x(k) \, h(n-k)

• i.e. the system output is the sum of lots of delayed impulse responses (i.e. responses to the individual, scaled impulse signals which make up the whole DT input signal).
• This sum is called the CONVOLUTION SUM.
• Sometimes we use ∗ to denote the convolution operation, i.e.:

y(n) = x(n) \ast h(n) = \sum_{k=-\infty}^{\infty} x(k) \, h(n-k)
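As a quick numerical illustration (not part of the original slides; the sequences are arbitrary), here is a minimal Python sketch of the convolution sum, checked against np.convolve:

```python
import numpy as np

# Direct evaluation of the convolution sum y(n) = sum_k x(k) h(n-k).
x = np.array([1.0, 2.0, 3.0, 4.0])
h = np.array([0.5, 0.25])

y = np.zeros(len(x) + len(h) - 1)
for n in range(len(y)):
    for k in range(len(x)):
        if 0 <= n - k < len(h):
            y[n] += x[k] * h[n - k]

print(np.allclose(y, np.convolve(x, h)))  # True: agrees with the library routine
```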

Convolution Sum for CT

• Similarly, for continuous time signals and systems (but a little bit more complicated):

x(t) = \int_{-\infty}^{\infty} x(\tau) \, \delta(t-\tau) \, d\tau, \qquad y(t) = x(t) \ast h(t) = \int_{-\infty}^{\infty} x(\tau) \, h(t-\tau) \, d\tau

• The first expression basically describes the analogue (CT) input signal x(t) as an integral (i.e. sum) of an infinite number of time-shifted and scaled impulse functions.

Important fact about convolution

Convolution in t domain ↔ Multiplication in f domain

but we also have

Multiplication in t domain ↔ Convolution in f domain

Discrete LTI Systems in Two Domains

x(n) → h(n) → y(n), i.e. y(n) = x(n) ∗ h(n)
X(z) → H(z) → Y(z), i.e. Y(z) = X(z) H(z)

h(n) – impulse response, H(z) – transfer function of the DT LTI system

Summary

• DT: H(z) is the z-transform of the system impulse response – the System Transfer Function. H(e^{jω}) is the Discrete Time Fourier Transform of the system impulse response – the System Frequency Response.

• CT: H(s) is the Laplace transform of the system impulse response – the System Transfer Function. H(jΩ) is the Fourier transform of the system impulse response – the System Frequency Response.

Guide to Discrete LTI Systems

h(n) – Impulse Response ⇄ (ZT / IZT) ⇄ H(z) – Transfer Function
H(z) ⇄ (including some mathematical manipulations) ⇄ Difference Equation
H(z) → (substitute z = e^{jω}) → H(e^{jω}) – Frequency Response
h(n) → (DTFT) → H(e^{jω}); H(e^{jω}) → (IDTFT) → h(n)

Guide to Continuous LTI Systems

h(t) – Impulse Response ⇄ (LT / ILT) ⇄ H(s) – Transfer Function
H(s) ⇄ (including some mathematical manipulations) ⇄ Differential Equation
H(s) → (substitute s = jΩ) → H(jΩ) – Frequency Response
h(t) → (FT) → H(jΩ); H(jΩ) → (IFT) → h(t)

Example(s) - Use guide to LTI systems to move between the various descriptions of the system.

E.g. Calculate transfer function and frequency response for the IIR filter given with the following difference equation.

 0.7

 0.3

 6  1     0.7

z

 1        0.3

z

 2   0.3

   6

z

 1

z

 1 6     6 0.3

z

 2   6

e

j

 

e

j

  0.3

e

 2

j

  

z

2 6

z

 0.7

z

 0.3

e j

2  6

e j

  0.7

e j

  0.3

Example(s) - Use guide to LTI systems to move between the various descriptions of the system.

Having obtained the frequency response, the system response to any frequency F₁ can easily be calculated, e.g. for a quarter of the sampling frequency F_s we have:

\omega_1 = 2\pi f_1 = 2\pi \frac{F_1}{F_s} = 2\pi \frac{F_s/4}{F_s} = \frac{\pi}{2}

H(e^{j\omega})\Big|_{\omega=\pi/2} = \frac{6 e^{j\pi/2}}{e^{j\pi} - 0.7 e^{j\pi/2} + 0.3} = \ldots

(note, in general this is a complex number so both phase and amplitude/gain can be calculated)
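Assuming the coefficients as reconstructed above, a minimal Python sketch can evaluate this frequency response at ω = π/2 both by direct substitution and with scipy.signal.freqz:

```python
import numpy as np
from scipy.signal import freqz

# Coefficient vectors in powers of z^-1, per the transfer function above.
b = [0.0, 6.0]          # numerator: 6 z^-1
a = [1.0, -0.7, 0.3]    # denominator: 1 - 0.7 z^-1 + 0.3 z^-2

w1 = np.pi / 2          # a quarter of the sampling frequency
H_direct = 6 * np.exp(1j * w1) / (np.exp(2j * w1) - 0.7 * np.exp(1j * w1) + 0.3)
_, H_freqz = freqz(b, a, worN=[w1])

print(abs(H_direct), np.angle(H_direct))   # gain and phase at Fs/4
print(np.allclose(H_direct, H_freqz[0]))   # True: both methods agree
```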

Example(s) - Use guide to LTI systems to move between the various descriptions of the system.

The opposite problem can also be easily solved, e.g. for the IIR filter with transfer function:

H(z) = \frac{z^2 - 0.2z - 0.08}{z^2 - 0.5}

find the corresponding difference equation to implement the system.

(z^2 - 0.5) \, Y(z) = (z^2 - 0.2z - 0.08) \, X(z)

Dividing both sides by z²:

(1 - 0.5 z^{-2}) \, Y(z) = (1 - 0.2 z^{-1} - 0.08 z^{-2}) \, X(z)

Y(z) = 0.5 z^{-2} Y(z) + X(z) - 0.2 z^{-1} X(z) - 0.08 z^{-2} X(z)

y(n) = 0.5 \, y(n-2) + x(n) - 0.2 \, x(n-1) - 0.08 \, x(n-2)
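A minimal sketch (assuming the coefficient signs as reconstructed above) that runs the difference equation sample by sample and checks it against scipy.signal.lfilter driven by the same H(z):

```python
import numpy as np
from scipy.signal import lfilter

b = [1.0, -0.2, -0.08]   # numerator of H(z) in powers of z^-1
a = [1.0, 0.0, -0.5]     # denominator of H(z) in powers of z^-1

x = np.random.randn(32)  # arbitrary test input
y = np.zeros_like(x)
for n in range(len(x)):  # direct implementation of the difference equation
    y[n] = (0.5 * (y[n - 2] if n >= 2 else 0.0)
            + x[n]
            - 0.2 * (x[n - 1] if n >= 1 else 0.0)
            - 0.08 * (x[n - 2] if n >= 2 else 0.0))

print(np.allclose(y, lfilter(b, a, x)))  # True: recursion implements H(z)
```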

Part 2 – Random Signals

Random signals – unpredictable type of signals ( … well, more or less).

• Moments of random signals – m_x, r_xx(m)
• Autocorrelation ↔ PSD
• Filtering Random Signals – Spectral Factorisation Equations (in three domains)
• Deconvolution and Inverse Filtering
• Minimum and Non-minimum phase systems/filters

Signal Classification

• Deterministic signals – can be characterised by a mathematical equation.
• Random (Nondeterministic, Stochastic) signals – cannot be characterised by a mathematical equation; they are usually characterised by their statistics.

Random (Nondeterministic, Stochastic) signals can be further classified as:
• Stationary signals – if their statistics do not change with time
• Nonstationary signals – if their statistics change with time

Signal Classification

• Wide Sense Stationary (random) signals – random signals with constant signal statistics up to the 2nd order.
• Ergodic (random) signals – random signals whose statistics can be measured by time averaging rather than ensemble averaging (i.e. the expectation of an ergodic signal is its time average).
• For simplicity reasons, we study Wide-Sense Stationary (WSS) and Ergodic signals.

1st Order Signal Statistics

• The mean value m_x of the signal x(n) is its 1st order statistics:

m_x = E[x(n)] = \lim_{N \to \infty} \frac{1}{2N+1} \sum_{n=-N}^{N} x(n)

If m_x is constant over time, we are talking about a stationary signal x(n).

E[x(n)] is the expected value of x(n) (E – expectation operator). Single waveform (time) averaging is used, so the signal is assumed ergodic (no need for ensemble averages).

2nd Order Statistics

• The autocovariance of the signal is the 2nd order signal statistics. It is calculated according to:

c_{xx}(k,l) = E\left[ (x(k) - m_x)(x(l) - m_x)^* \right]

where * denotes the complex conjugate in the case of complex signals.
• For equal lags, i.e. k = l, the autocovariance reduces to the variance:

c_{xx}(k,k) = \sigma_x^2

2nd Order Statistics

• The variance σ_x² of the signal is:

\sigma_x^2 = E\left[ (x(n) - m_x)^2 \right] = E[x^2(n)] - m_x^2

• Variance can be considered as a measure of signal dispersion around its mean.

Analogies to Electrical Signals

[Figure: two zero-mean signals with different variances.]

• m_x – mean – DC component of an electrical signal
• m_x² – mean squared – DC power
• E[x²(n)] – mean square – total average power
• σ_x² – variance – AC power
• σ_x – standard deviation – rms value

Autocorrelation

• This is also a 2nd order statistics of the signal and is very similar (in some cases identical) to the autocovariance.
• The autocorrelation of the signal is basically a product of the signal and its shifted version:

r_{xx}(k,l) = E[x(k) \, x^*(l)] = \lim_{N \to \infty} \frac{1}{2N+1} \sum_{n=-N}^{N} x(n) \, x(n+m)

where m = l − k.

• Autocorrelation is a measure of signal predictability – a correlated signal is one that has redundancy (it is compressible, e.g. speech, audio or video signals).

Autocorrelation

• For m = l − k we sometimes use the notation r_xx(m), or even r_x(m), instead of r_xx(k,l), i.e.:

r_{xx}(m) = E[x(k) \, x(k+m)] = \lim_{N \to \infty} \frac{1}{2N+1} \sum_{k=-N}^{N} x(k) \, x(k+m)

Autocorrelation and Autocovariance

• For zero-mean, stationary signals those two quantities are identical:

c_{xx}(m) = E\left[ (x(k) - m_x)(x(k+m) - m_x)^* \right] = E[x(k) \, x(k+m)] = r_{xx}(m), \qquad m_x = 0

• Also notice that the variance of the zero-mean signal then corresponds to the zero-lag autocorrelation:

c_{xx}(0) = r_{xx}(0) = \sigma_x^2

Autocorrelation

• Two important autocorrelation properties:

r_{xx}(m) = r_{xx}(-m) \qquad \text{(even symmetry)}

|r_{xx}(m)| \le r_{xx}(0)

• r_xx(0) is essentially the signal power so it must be larger than any other autocorrelation value of that signal (another way of looking at this property is to realise that a sample is best correlated with itself).
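In practice these quantities are estimated by time averaging over a finite record (valid for ergodic signals). A minimal Python sketch, using an arbitrary white test signal:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(10_000)      # arbitrary ergodic test signal

m_x = x.mean()                       # 1st order statistics: mean
var_x = np.mean((x - m_x) ** 2)      # 2nd order statistics: variance

def autocorr(x, max_lag):
    """Biased time-average estimate of r_xx(m), m = 0..max_lag."""
    N = len(x)
    return np.array([np.sum(x[:N - m] * x[m:]) / N for m in range(max_lag + 1)])

r = autocorr(x, 5)
print(m_x, var_x)   # close to 0 and 1 for this white test signal
print(r)            # r[0] ~ variance; other lags ~ 0; |r(m)| <= r(0)
```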

Example (Tutorial 1, Problems 2, 3)

Random phase family of sinusoids:

x(n) = A \cos(\omega_k n + \theta)

A and ω_k are fixed constants and θ is a uniformly distributed random variable (i.e. equally likely to have any value in the interval −π to π). Prove the stationarity of this process, i.e.

a) Find the mean and variance values (should be constant).
b) Find the autocorrelation function (ACF) (should only depend on the time lag value m, otherwise constant).

Example (Tutorial 1, Problems 2, 3)

m_x(n) = E_\theta[x(n)] = \int_{-\pi}^{\pi} A \cos(\omega_k n + \theta) \, \frac{1}{2\pi} \, d\theta

Expanding cos(ω_k n + θ) = cos(ω_k n) cos θ − sin(ω_k n) sin θ:

m_x(n) = \frac{A}{2\pi} \left[ \cos(\omega_k n) \int_{-\pi}^{\pi} \cos\theta \, d\theta - \sin(\omega_k n) \int_{-\pi}^{\pi} \sin\theta \, d\theta \right] = 0 = \text{const.}

Example (Tutorial 1, Problems 2, 3)

\sigma_x^2(n) = E_\theta[x^2(n)] - m_x^2 = E_\theta[x^2(n)]

E_\theta[x^2(n)] = \int_{-\pi}^{\pi} A^2 \cos^2(\omega_k n + \theta) \, \frac{1}{2\pi} \, d\theta = \frac{A^2}{2\pi} \int_{-\pi}^{\pi} \frac{1 + \cos(2\omega_k n + 2\theta)}{2} \, d\theta = \frac{A^2}{2} = \text{const.}

Example (Tutorial 1, Problems 2, 3)

Random phase family of sinusoids:

x(n) = A \cos(\omega_k n + \theta)

A and ω_k are fixed constants and θ is a uniformly distributed random variable (i.e. equally likely to have any value in the interval −π to π). Discuss two approaches to calculate the ACF for this process.

We can use the definition directly:

r_{xx}(m) = E_\theta[x(n) \, x(n+m)] = E_\theta\left[ A\cos(\omega_k n + \theta) \cdot A\cos(\omega_k (n+m) + \theta) \right] = \frac{A^2}{2} \cos(\omega_k m)

or we can go via the frequency domain: obtain the PSD R_xx(e^{jω}) (e.g. from |X|²) and take the inverse DTFT to get r_xx(m).
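These results can be checked by ensemble averaging over many realisations of θ. A minimal Python sketch (A, ω_k and the ensemble size are arbitrary choices, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(1)
A, w_k = 2.0, 0.3 * np.pi
n = np.arange(64)
theta = rng.uniform(-np.pi, np.pi, size=(100_000, 1))
x = A * np.cos(w_k * n + theta)             # one realisation per row

print(x.mean(axis=0)[:4])                   # ~0 for every n (constant mean)
print((x ** 2).mean(axis=0)[:4])            # ~A^2/2 = 2 (constant variance)

m = 3                                       # check the ACF at lag m
r_m = (x[:, :-m] * x[:, m:]).mean()
print(r_m, (A ** 2 / 2) * np.cos(w_k * m))  # both ~ (A^2/2) cos(w_k m)
```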

Power Spectral Density (PSD)

• Power Spectral Density is the Discrete Time Fourier Transform of the autocorrelation function

R_{xx}(e^{j\omega}) = \sum_{m=-\infty}^{\infty} r_{xx}(m) \, e^{-j\omega m}

• PSD contains power info (i.e. the distribution of signal power across the frequency spectrum) but has no phase information (PSD is "phase blind").

• PSD is always real and non-negative.

White Noise

• A white noise signal has a perfectly flat power spectrum (equal to the variance of the signal σ_w²):

R_{ww}(e^{j\omega}) = \sigma_w^2 \quad\xleftrightarrow{\;\text{DTFT / IDTFT}\;}\quad r_{ww}(m) = \sigma_w^2 \, \delta(m)

• The autocorrelation of white noise is a unit impulse with amplitude σ_w² – white noise is a perfectly uncorrelated signal (not realisable in practice; we usually use pseudorandom noise with a PSD almost flat over a finite frequency range).

Filtering Random Signals

• A filter scales the mean value of the input signal.
• In the time domain the scaling value is the sum of the impulse response:

m_y = m_x \sum_{k} h(k)

• In the frequency domain the scaling value is the frequency response of the filter at ω = 0:

m_y = H(e^{j0}) \, m_x

x(n), m_x → DIGITAL FILTER → y(n), m_y

Filtering Random Signals

• Cross-correlation between the filter input and output signals:

r_{yx}(k) = \sum_{l} h(l) \, r_{xx}(k-l)

or

r_{yx}(k) = r_{xx}(k) \ast h(k)

Filtering Random Signals

• Autocorrelation of the filter output:

r_{yy}(k) = E[y(n+k) \, y^*(n)] = \sum_{l} h^*(l) \, E[y(n+k) \, x^*(n-l)] = \sum_{l} h^*(l) \, r_{yx}(k+l) = r_{yx}(k) \ast h^*(-k)

Filtering Random Signals

• The autocorrelation of the filter output therefore depends only on k, the difference between the indices n + k and n, i.e.:

r_{yy}(k) = r_{yx}(k) \ast h^*(-k)

• Combining with:

r_{yx}(k) = r_{xx}(k) \ast h(k)

we have:

r_{yy}(k) = r_{xx}(k) \ast h(k) \ast h^*(-k)

Spectral Factorisation Equations

r_y(k) = r_x(k) \ast h(k) \ast h^*(-k)

Taking the DTFT of the above equation:

R_y(e^{j\omega}) = \left| H(e^{j\omega}) \right|^2 R_x(e^{j\omega})

Taking the ZT of the above equation:

R_y(z) = H(z) \, H^*(1/z^*) \, R_x(z)

Filtering Random Signals

• In terms of the z-transform:

R_y(z) = H(z) \, H^*(1/z^*) \, R_x(z)

• If h(n) is real (so that H(z) = H^*(z^*)):

R_y(z) = H(z) \, H(1/z) \, R_x(z)

• This is a special case of spectral factorisation.

Example – Tutorial 2, Problem 2 A zero mean white noise signal x(n) is applied to an FIR filter with impulse response sequence {0.5, 0, 0.75}. Derive an expression for the PSD of the signal at the output of the filter.

For the white noise input: r_{xx}(m) = \sigma_x^2 \, \delta(m), so R_{xx}(z) = \sigma_x^2.

H(z) = 0.5 + 0.75 \, z^{-2}

R_{yy}(z) = H(z) \, H(z^{-1}) \, R_{xx}(z) = (0.5 + 0.75 z^{-2})(0.5 + 0.75 z^{2}) \, \sigma_x^2
          = (0.25 + 0.375 z^{2} + 0.375 z^{-2} + 0.5625) \, \sigma_x^2
          = (0.8125 + 0.375 z^{2} + 0.375 z^{-2}) \, \sigma_x^2

Equivalently, in terms of the output autocorrelation:

r_{yy}(-2) = 0.375 \, \sigma_x^2; \qquad r_{yy}(0) = 0.8125 \, \sigma_x^2; \qquad r_{yy}(2) = 0.375 \, \sigma_x^2

so the PSD of the output is:

R_{yy}(e^{j\omega}) = \sigma_x^2 \, (0.8125 + 0.75 \cos 2\omega)
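A minimal Python sketch verifying this result: for white input, r_yy(k) = σ_x² Σ_l h(l) h(l+k), which is exactly what np.correlate(h, h, 'full') computes; σ_x² = 1 is an arbitrary choice here:

```python
import numpy as np

h = np.array([0.5, 0.0, 0.75])
s2 = 1.0                                   # sigma_x^2, taken as 1 here

# r_yy(k) = s2 * sum_l h(l) h(l+k), lags -2..2
r_yy = s2 * np.correlate(h, h, mode="full")
print(r_yy)                                # [0.375, 0., 0.8125, 0., 0.375]

# PSD: sum_m r_yy(m) e^{-jwm} should equal s2 * (0.8125 + 0.75 cos 2w)
w = np.linspace(0, np.pi, 5)
lags = np.arange(-2, 3)
R = (r_yy * np.exp(-1j * np.outer(w, lags))).sum(axis=1).real
print(np.allclose(R, s2 * (0.8125 + 0.75 * np.cos(2 * w))))  # True
```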

Example – What about an IIR type filter and a coloured noise signal? (not in Tutorial handouts)

An IIR filter described by the following difference equation:

y(n) = 0.4 \, y(n-1) + x(n)

is used to process a WSS signal with PSD:

R_x(e^{j\omega}) = 2.5 + 2\cos\omega

Find the PSD of the filter output.

Solution – the PSD of the output is required, so the spectral factorisation equation in the ω domain can be used:

R_y(e^{j\omega}) = \left| H(e^{j\omega}) \right|^2 R_x(e^{j\omega})

Example – What about an IIR type filter and a coloured noise signal? (not in Tutorial handouts)

First calculate the transfer function of the filter, then find the frequency response:

H(z) = \frac{1}{1 - 0.4 z^{-1}} \quad\Rightarrow\quad H(e^{j\omega}) = \frac{1}{1 - 0.4 e^{-j\omega}}

… then apply spectral factorisation:

R_y(e^{j\omega}) = H(e^{j\omega}) \, H^*(e^{j\omega}) \, R_x(e^{j\omega}) = \frac{2.5 + 2\cos\omega}{(1 - 0.4 e^{-j\omega})(1 - 0.4 e^{j\omega})} = \frac{2.5 + 2\cos\omega}{1.16 - 0.8\cos\omega}

Inverse Filters

• Consider the deconvolution problem as represented in the figure below:

x(n) → H(z) → y(n) → H⁻¹(z) → A x(n−d)

• Our task is to design a filter which will reconstruct or reconstitute the original input x(n) from the observed output y(n).
• The reconstruction may have an arbitrary gain A and a delay of d, hence the definition of the required inverse filter is:

H^{-1}(z) = \frac{A \, z^{-d}}{H(z)}

Inverse Filters

• The inverse system is said to ‘equalise’ the amplitude and phase response of H(z), or to ‘deconvolve’ the output y(n) in order to reconstruct the input x(n).

x(n) → H(z) → y(n) → H⁻¹(z) → A x(n−d)

Inverse Filters - Problem

• If H(z) is a non-minimum phase system, the zeros outside the unit circle become poles outside the unit circle and the inverse filter is unstable!

x(n) → H(z) → y(n) → H⁻¹(z) → A x(n−d)

Noise Whitening

• With inverse filtering x(n) does not have to be a white random sequence but the inverse filter H 0 -1 (z) has to produce the same sequence x(n).

• For noise whitening, input x(n) has to be a white random sequence as well as output of H 0 -1 (z), u(n) but sequences x(n) and u(n) are not the same.

Inverse filtering: x(n) → H₀(z) → y(n) → H₀⁻¹(z) → x(n)
Noise whitening: x(n) → H₁(z) → y(n) → H₀⁻¹(z) → u(n)

Deconvolution using autocorrelation

• Consider the filtering process again: x(n) → h(k) → y(n)

r_{yx}(k) = \sum_{l=0}^{L} h(l) \, r_{xx}(k-l)

• The matrix equation to be solved in order to estimate the impulse response h(k) before attempting the deconvolution is given on the next slide.

Deconvolution using autocorrelation

\begin{bmatrix} r_{yx}(0) \\ r_{yx}(1) \\ \vdots \\ r_{yx}(L) \end{bmatrix} =
\begin{bmatrix}
r_{xx}(0) & r_{xx}(1) & \cdots & r_{xx}(L) \\
r_{xx}(1) & r_{xx}(0) & \cdots & r_{xx}(L-1) \\
\vdots & \vdots & \ddots & \vdots \\
r_{xx}(L) & r_{xx}(L-1) & \cdots & r_{xx}(0)
\end{bmatrix}
\begin{bmatrix} b(0) \\ b(1) \\ \vdots \\ b(L) \end{bmatrix}

• The aim is to obtain the coefficients b(0), b(1), …, b(L), which can be done by inverting the matrix R_xx.
• This matrix is known as the autocorrelation matrix and can be used as an important 2nd order characterisation of a random signal.

Deconvolution using autocorrelation

• The solution of the equation from the previous slide:

\mathbf{R}_x \mathbf{b} = \mathbf{r}_{yx}

is obviously given with:

\mathbf{b} = \mathbf{R}_x^{-1} \mathbf{r}_{yx}

• Important to note is the structure of the autocorrelation matrix R_x.
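A minimal Python sketch of this procedure (the "true" impulse response and the record length are arbitrary choices): estimate r_xx and r_yx from data, build the Toeplitz autocorrelation matrix and solve for the coefficients:

```python
import numpy as np
from scipy.linalg import toeplitz

rng = np.random.default_rng(2)
h_true = np.array([1.0, -0.5, 0.25])       # system to be recovered
x = rng.standard_normal(50_000)
y = np.convolve(x, h_true)[:len(x)]        # observed output

L = 3
N = len(x)
r_xx = np.array([np.dot(x[:N - m], x[m:]) / N for m in range(L)])
r_yx = np.array([np.dot(y[m:], x[:N - m]) / N for m in range(L)])

R_xx = toeplitz(r_xx)                      # symmetric Toeplitz autocorrelation matrix
b = np.linalg.solve(R_xx, r_yx)
print(b)                                   # close to h_true = [1, -0.5, 0.25]
```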

Toeplitz matrices

• If A_{i,j} is the element of the matrix in the i-th row and j-th column then A_{i,j} = a_{i−j}:

A = \begin{bmatrix}
a_0 & a_{-1} & a_{-2} & \cdots \\
a_1 & a_0 & a_{-1} & \cdots \\
a_2 & a_1 & a_0 & \cdots \\
\vdots & \vdots & \vdots & \ddots
\end{bmatrix}

• Another important DSP operation – convolution – also has a strong relation to a Toeplitz type matrix, called the convolution matrix.

Convolution matrix

• To form the convolution matrix we need to represent the convolution operation in vector form.
• For example, the output of an FIR filter of length N can be written as:

y(n) = \sum_{k=0}^{N-1} h(k) \, x(n-k) = \mathbf{h}^T \mathbf{x}_n

where x_n is a vector of input samples to the filter and h is a vector of filter coefficients (the impulse response in the case of an FIR filter).
• The above equation represents the case where x(n) = 0 for n < 0.

Decomposition of Autocorrelation Matrix

• The Toeplitz structure of the autocorrelation matrix is used in its diagonalisation – the important process of decomposition of the autocorrelation matrix in the form:

\mathbf{R} = \mathbf{Q} \boldsymbol{\Lambda} \mathbf{Q}^{-1} = \mathbf{Q} \boldsymbol{\Lambda} \mathbf{Q}^{T}

where:
Λ – diagonal matrix containing the eigenvalues of R
Q – modal matrix containing the eigenvectors of R associated with the eigenvalues in Λ

White noise

• What would be the form of the autocorrelation matrix for the case of a white noise signal?
• Assuming an ideal white noise sequence, i.e. a perfectly uncorrelated signal, its autocorrelation is a unit impulse with amplitude σ_w²:

r_{ww}(m) = \sigma_w^2 \, \delta(m)

• The autocorrelation matrix is in this case diagonal (all non-diagonal elements are zero): R_ww = σ_w² I.

More on minimum and non minimum phase systems

• Non-minimum phase systems (sometimes also called mixed-phase systems) are systems with some of their zeros inside the unit circle and the remaining zeros outside the unit circle.
• If all of its zeros are outside the unit circle, a non-minimum phase system is called a maximum phase system.
• The minimum, non-minimum and maximum phase systems can also be recognised by their phase characteristics.
• The phase characteristic of a minimum phase system has a zero net phase change between ω = 0 and ω = π, while a non-minimum phase system has a non-zero net phase change between those frequencies.

More on minimum and non minimum phase systems

• A maximum phase system has the maximum phase change between ω = 0 and ω = π amongst all possible systems with the same amplitude response.

Example (Tutorial 2, Problem 4)

A zero-mean stationary white noise x(n) is applied to a filter with a transfer function:

H(z) = \frac{(z - 0.5)(z - 3)}{z^2}

Find all filters that can produce the same PSD as the above filter. Are those filters minimum or maximum phase filters?

Using the spectral factorisation equation:

S_{yy}(z) = H(z) \, H(z^{-1}) \, S_{xx}(z) = \frac{(z - 0.5)(z - 3)}{z^2} \cdot \frac{(z^{-1} - 0.5)(z^{-1} - 3)}{z^{-2}} \, \sigma_x^2

Example (Tutorial 2, Problem 4)

The same PSD would be obtained using the filter H₀(z):

S_{yy}(z) = \frac{(z - 0.5)(1 - 3z)}{z^2} \cdot \frac{(z^{-1} - 0.5)(1 - 3z^{-1})}{z^{-2}} \, \sigma_x^2

H_0(z) = \frac{(z - 0.5)(1 - 3z)}{z^2}

H₀(z) has two zeros, 0.5 and 1/3 ≈ 0.333, both inside the unit circle, i.e. this is a minimum phase filter.

H_2(z) = \frac{(1 - 0.5z)(z - 3)}{z^2}

H₂(z) has zeros at z = 2 and z = 3, both outside the unit circle, i.e. this is a maximum phase filter.

Part 3 – Optimal and Adaptive Digital Filters

Best filters for the task in hand

• Wiener Filter and Equation
• Finding the minimum of the cost function (MSE) = MMSE
• Steepest Descent algorithm
• LMS and RLS algorithms
• Optimal and Adaptive Filter Configurations (i.e. applications)

Optimal and Adaptive Filters

• Optimal filters are the "best" filters for the particular task. We use knowledge about the signals to design those filters and apply them to the task.
• Adaptive filters change their coefficients to improve their performance for the given task. They are not fixed and can therefore change their (statistical) properties over time.
• Adaptive filters may not be optimal but are constantly striving to become optimal.

Optimal (Wiener) Filter Design

• System Identification problem: x(n) → [h(n) = ?] → y(n)
• We want to estimate the impulse response h(n) of the "unknown" discrete-time system.
• We can use the equation for the cross-correlation between the filter input and output to obtain an estimate of h(n).

Optimal (Wiener) Filter Design

1. Convolution form:

r_{yx}(k) = \sum_{l=0}^{N-1} \hat{h}(l) \, r_{xx}(k-l)

2. Matrix form:

\begin{bmatrix} r_{yx}(0) \\ r_{yx}(1) \\ \vdots \\ r_{yx}(N-1) \end{bmatrix} =
\begin{bmatrix}
r_{xx}(0) & r_{xx}(1) & \cdots & r_{xx}(N-1) \\
r_{xx}(1) & r_{xx}(0) & \cdots & r_{xx}(N-2) \\
\vdots & \vdots & \ddots & \vdots \\
r_{xx}(N-1) & r_{xx}(N-2) & \cdots & r_{xx}(0)
\end{bmatrix}
\begin{bmatrix} \hat{h}(0) \\ \hat{h}(1) \\ \vdots \\ \hat{h}(N-1) \end{bmatrix}

3. Matrix form in short notation:

\mathbf{R}_{yx} = \mathbf{R}_{xx} \hat{\mathbf{h}}

Optimal (Wiener) Filter Design

3. Matrix form in short notation:

\mathbf{R}_{yx} = \mathbf{R}_{xx} \hat{\mathbf{h}}

R_yx – cross-correlation vector (between input and output signals)
R_xx – autocorrelation matrix (of the input signal)
ĥ – estimated impulse response vector

From the above equation we can easily obtain the vector ĥ:

\hat{\mathbf{h}} = \mathbf{R}_{xx}^{-1} \mathbf{R}_{yx}

Optimal (Wiener) Filter Design

• The equation

\hat{\mathbf{h}} = \mathbf{R}_{xx}^{-1} \mathbf{R}_{yx}

is also known as the Wiener-Hopf equation.
• Using this equation we have actually estimated (or designed) a filter with an impulse response close (or equal) to the impulse response of the unknown system.
• This type of optimal filter is also known as the Wiener filter.

Optimal (Wiener) Filter Design

• We can approach the problem of designing the Wiener filter estimate of the unknown system in a slightly different way. Consider the block diagram given below:

x(n) → [ĥ] → y(n); d(n) − y(n) = e(n) (difference formed at the summing junction Σ)

• A good estimate of the unknown filter impulse response h(n) can be obtained if the difference/error signal between the two outputs (real and estimated system) is minimal (ideally zero).

Optimal (Wiener) Filter Design

x(n) → [ĥ] → y(n); e(n) = d(n) − y(n)

• We use the following notation:
  d(n) – output of the unknown system (desired signal)
  y(n) – output of the system estimate
  x(n) – input signal (same for both systems)
  e(n) – error signal, e(n) = d(n) − y(n)
• For e(n) → 0, we expect to achieve a good estimate of the unknown system, i.e. ĥ(n) → h(n).

Optimal (Wiener) Filter Design

• Wiener filter design is actually a much more general problem – the desired signal d(n) does not have to be the output of an unknown system.

x(n) → [ĥ] → y(n); e(n) = d(n) − y(n)

Optimal (Wiener) Filter Design

• Another Wiener filter estimation example:

d(n) → Signal Distorting System → (+ noise w(n)) → x(n) → Optimal Filter h(n) → y(n); e(n) = d(n) − y(n)

Task: design (determine) h(n) in order to minimise the error e(n)!

Optimal (Wiener) Filter Design

x(n) → [ĥ] → y(n); e(n) = d(n) − y(n)

• Rather than minimising the current value of the error signal e(n), we can choose a more effective approach – minimise the expected value of the squared error – the mean square error (MSE) function.
• The function to be minimised (cost function) is therefore the MSE function defined as:

J = E[e^2(n)]

Mathematical Analysis

Filter output:

y(n) = \sum_{k=0}^{N-1} h(k) \, x(n-k)

Error signal:

e(n) = d(n) - y(n) = d(n) - \sum_{k=0}^{N-1} h(k) \, x(n-k)

MSE (cost) function:

J = E[e^2(n)] = E\left[ \left( d(n) - \sum_{k=0}^{N-1} h(k) \, x(n-k) \right)^2 \right]

We can try to minimise this expression or switch to matrix/vector notation.

Mathematical Analysis

J = E[e^2(n)] = E\left[ \left( d(n) - \sum_{k=0}^{N-1} h(k) \, x(n-k) \right)^2 \right]

Using vector notation:

\mathbf{x}(n) = [x(n), x(n-1), \ldots, x(n-N+1)]^T, \qquad \mathbf{h} = [h(0), h(1), \ldots, h(N-1)]^T

y(n) = \mathbf{h}^T \mathbf{x}(n)

Mathematical Analysis

E[e^2(n)] = E\left[ (d(n) - \mathbf{h}^T \mathbf{x}(n))^2 \right] = E[d^2(n)] - 2\,\mathbf{h}^T E[d(n)\mathbf{x}(n)] + \mathbf{h}^T E[\mathbf{x}(n)\mathbf{x}^T(n)] \, \mathbf{h}

J = P - 2\,\mathbf{R}_{dx}^T \mathbf{h} + \mathbf{h}^T \mathbf{R}_{xx} \mathbf{h}

where:

P = E[d^2(n)] – scalar
\mathbf{R}_{dx} = E[d(n)\,\mathbf{x}(n)] – cross-correlation vector
\mathbf{R}_{xx} = E[\mathbf{x}(n)\,\mathbf{x}^T(n)] – autocorrelation matrix

Mathematical Analysis

• To find the minimum error take the derivative with respect to the coefficients h(k) and set it equal to zero:

\frac{\partial E[e^2(n)]}{\partial \mathbf{h}} = -2\mathbf{R}_{dx} + 2\mathbf{R}_{xx} \mathbf{h}_{opt} = 0

• Solving for h:

\mathbf{h}_{opt} = \mathbf{R}_{xx}^{-1} \mathbf{R}_{dx}

Wiener-Hopf Equation … again

• The Wiener-Hopf equation therefore determines the set of optimal filter coefficients in the mean-square sense.
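A minimal Python sketch of this design procedure in a system identification setting (the "unknown" system h_true and the record length are arbitrary choices): estimate R_xx and R_dx from data and solve the Wiener-Hopf equation:

```python
import numpy as np

rng = np.random.default_rng(3)
h_true = np.array([0.9, 0.4, -0.2])               # "unknown" system
x = rng.standard_normal(100_000)
d = np.convolve(x, h_true)[:len(x)]               # desired = unknown system output

N = 3
X = np.stack([np.roll(x, k) for k in range(N)])   # rows: x(n), x(n-1), x(n-2)
R_xx = X @ X.T / len(x)                           # autocorrelation matrix estimate
R_dx = X @ d / len(x)                             # cross-correlation vector estimate

h_opt = np.linalg.solve(R_xx, R_dx)               # Wiener-Hopf solution
print(h_opt)                                      # close to h_true
```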

Example (Tutorial 3, Problems 1 and 2)

Derive the Wiener-Hopf equation for the Wiener FIR filter working as a noise canceller.

A detailed derivation of the Wiener-Hopf equation is shown in Tutorial 3; for the ANC application we can start from the cost function:

J = E[e^2(n)] = E\left[ \left( d(n) - \sum_{k=0}^{N-1} w(k) \, v_2(n-k) \right)^2 \right]

Example – Tutorial 3, Problems 1 and 2

Derivation of the Wiener-Hopf equation for the FIR noise canceller. With the desired signal d(n) = s(n) + v₁(n) and the reference noise v₂(n) as filter input:

e(n) = d(n) - \sum_{k=0}^{N-1} w(k) \, v_2(n-k)

Setting the derivative of J = E[e²(n)] with respect to each coefficient w(l) to zero:

\frac{\partial J}{\partial w(l)} = -2 \, E[e(n) \, v_2(n-l)] = 0, \qquad l = 0, 1, \ldots, N-1

E\left[ \left( d(n) - \sum_{k=0}^{N-1} w(k) \, v_2(n-k) \right) v_2(n-l) \right] = 0

r_{dv_2}(l) = \sum_{k=0}^{N-1} w(k) \, r_{v_2 v_2}(l-k)

Example – Tutorial 3, Problems 1 and 2

Derivation of the Wiener-Hopf equation for the FIR noise canceller:

r_{dv_2}(l) = \sum_{k=0}^{N-1} w(k) \, r_{v_2 v_2}(l-k), \qquad l = 0, 1, \ldots, N-1

or in matrix/vector form:

\mathbf{r}_{dv_2} = \mathbf{R}_{v_2} \mathbf{w} \quad\Rightarrow\quad \mathbf{w}_{opt} = \mathbf{R}_{v_2}^{-1} \mathbf{r}_{dv_2}

MSE Surface

• E[e²(n)] represents the expected value of the squared filter error e(n), i.e. the mean square error (MSE).
• For an N-coefficient filter this is an N-dimensional surface with the Wiener-Hopf solution positioned at the bottom of this surface (i.e. this is the minimum error point).
• We can plot it for the case of a 2-coefficient filter (more than that is impossible to draw).

[Figure: Mean Square Error surface example for a 2-weight Wiener filter, a bowl-shaped surface with the MMSE (Wiener optimum) at its minimum.]

MMSE

• Once the coefficients of the Wiener filter (i.e. the coordinates of the MMSE point) are known, the actual MMSE value is easy to calculate – we need to evaluate J for h = h_opt:

J_{min} = J\big|_{\mathbf{h}=\mathbf{h}_{opt}} = P - 2\mathbf{R}_{dx}^T \mathbf{h}_{opt} + \mathbf{h}_{opt}^T \mathbf{R}_{xx} \mathbf{h}_{opt}

Substituting h_opt = R_xx⁻¹ R_dx and using the symmetry of R_xx:

J_{min} = P - 2\mathbf{R}_{dx}^T \mathbf{R}_{xx}^{-1} \mathbf{R}_{dx} + (\mathbf{R}_{xx}^{-1}\mathbf{R}_{dx})^T \mathbf{R}_{xx} (\mathbf{R}_{xx}^{-1}\mathbf{R}_{dx})
        = P - 2\mathbf{R}_{dx}^T \mathbf{R}_{xx}^{-1} \mathbf{R}_{dx} + \mathbf{R}_{dx}^T \mathbf{R}_{xx}^{-1} \mathbf{R}_{dx}
        = P - \mathbf{R}_{dx}^T \mathbf{R}_{xx}^{-1} \mathbf{R}_{dx}
        = P - \mathbf{R}_{dx}^T \mathbf{h}_{opt}

Example (Tutorial 3, Problems 3 and 4)

• An alternative derivation of the MMSE equation is shown in Tutorial 3, Problem 3.
• The use of both the Wiener-Hopf and MMSE equations is demonstrated in Tutorial 3, Problem 4:

A two-coefficient Wiener filter is used to filter a desired signal d(n) corrupted by zero-mean, unit-variance white noise v(n), uncorrelated with d(n), i.e. x(n) = d(n) + v(n). Find r_dx, the optimal (Wiener-Hopf) solution w_opt and the MMSE J_min, assuming:

r_d(m) = 0.6^{|m|}, \qquad m_v = 0, \; \sigma_v^2 = 1

Example (Tutorial 3, Problems 3 and 4)

Since v(n) is zero mean and uncorrelated with d(n):

r_{dx}(m) = E[d(n) \, x(n-m)] = E[d(n)(d(n-m) + v(n-m))] = r_d(m)

\mathbf{r}_{dx} = \begin{bmatrix} r_d(0) \\ r_d(1) \end{bmatrix} = \begin{bmatrix} 1 \\ 0.6 \end{bmatrix}

Example (Tutorial 3, Problems 3 and 4)

\mathbf{w}_{opt} = \mathbf{R}_x^{-1} \mathbf{r}_{dx}

\mathbf{R}_x = E[\mathbf{x}(n)\mathbf{x}^T(n)] = \mathbf{R}_d + \mathbf{R}_v = \begin{bmatrix} 1 & 0.6 \\ 0.6 & 1 \end{bmatrix} + \begin{bmatrix} \sigma_v^2 & 0 \\ 0 & \sigma_v^2 \end{bmatrix} = \begin{bmatrix} 2 & 0.6 \\ 0.6 & 2 \end{bmatrix}

\mathbf{R}_x^{-1} = \begin{bmatrix} 0.549 & -0.165 \\ -0.165 & 0.549 \end{bmatrix}

\mathbf{w}_{opt} = \begin{bmatrix} 0.549 & -0.165 \\ -0.165 & 0.549 \end{bmatrix} \begin{bmatrix} 1 \\ 0.6 \end{bmatrix} = \begin{bmatrix} 0.451 \\ 0.165 \end{bmatrix}

Example (Tutorial 3, Problems 3 and 4)

MMSE:

J_{min} = r_d(0) - \mathbf{r}_{dx}^T \mathbf{w}_{opt} = 1 - \begin{bmatrix} 1 & 0.6 \end{bmatrix} \begin{bmatrix} 0.451 \\ 0.165 \end{bmatrix} \approx 0.451
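A minimal Python sketch reproducing the numbers of this example, following the matrices as reconstructed above:

```python
import numpy as np

R_d = np.array([[1.0, 0.6],
                [0.6, 1.0]])          # r_d(m) = 0.6^|m| for lags 0 and 1
R_v = np.eye(2)                       # unit-variance white noise
R_x = R_d + R_v                       # input autocorrelation matrix
r_dx = np.array([1.0, 0.6])           # = [r_d(0), r_d(1)]

w_opt = np.linalg.solve(R_x, r_dx)    # Wiener-Hopf solution
J_min = R_d[0, 0] - r_dx @ w_opt      # MMSE = r_d(0) - r_dx^T w_opt
print(w_opt)   # ~ [0.451, 0.165]
print(J_min)   # ~ 0.451
```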

MMSE

• Another very important observation can be made after rearranging the basic equation for the error signal:

e(n) = d(n) - \mathbf{h}^T \mathbf{x}(n)

E[\mathbf{x}(n) \, e(n)] = E[\mathbf{x}(n) \, d(n)] - E[\mathbf{x}(n) \mathbf{x}^T(n)] \, \mathbf{h} = \mathbf{R}_{dx} - \mathbf{R}_{xx} \mathbf{h}

• For h = h_opt this expression is zero, i.e. at the optimum the error signal is orthogonal to the input data.

The Steepest Descent Algorithm

• The Steepest Descent method iteratively estimates the solution to the Wiener-Hopf equation using a method called gradient descent.
• This minimisation method finds the minimum by estimating the gradient of the MSE surface and taking a step in the opposite direction to the gradient.
• The basic equation of gradient descent is:

\mathbf{h}_{n+1} = \mathbf{h}_n - \mu \, \nabla_{\mathbf{h}} E[e^2(n)]

where μ is the step size parameter and ∇_h E[e²(n)] is the gradient vector; the update makes h_{n+1}(k) approach h_opt.

The Steepest Descent Algorithm

\mathbf{h}_{n+1} = \mathbf{h}_n - \mu \, \nabla_{\mathbf{h}} E[e^2(n)]

• Notice that the expression for the gradient has already been obtained in the process of calculating the Wiener filter coefficients, i.e.:

\nabla_{\mathbf{h}} E[e^2(n)] = -2\mathbf{R}_{dx} + 2\mathbf{R}_{xx} \mathbf{h}

• This is a significant improvement in our search for more efficient solutions – the coefficients are now determined iteratively and no inverse of the autocorrelation matrix is needed.

The Steepest Descent Algorithm

\mathbf{h}_{n+1} = \mathbf{h}_n + 2\mu \left( \mathbf{R}_{dx} - \mathbf{R}_{xx} \mathbf{h}_n \right)

• We still need to estimate the autocorrelation matrix R_xx and the cross-correlation vector R_dx (for every iteration step!).
• Further simplification of the algorithm can be achieved by using the instantaneous estimates of R_xx and R_dx:

\hat{\mathbf{R}}_{xx} = \mathbf{x}(n) \mathbf{x}^T(n), \qquad \hat{\mathbf{R}}_{dx} = d(n) \, \mathbf{x}(n)

The LMS (Least Mean Squares) Algorithm for Adaptive Filtering

Substituting the instantaneous estimates into the steepest descent update:

\mathbf{h}_{n+1} = \mathbf{h}_n + 2\mu \left( d(n)\mathbf{x}(n) - \mathbf{x}(n)\mathbf{x}^T(n)\mathbf{h}_n \right)
              = \mathbf{h}_n + 2\mu \, \mathbf{x}(n) \left( d(n) - \mathbf{x}^T(n)\mathbf{h}_n \right)
              = \mathbf{h}_n + 2\mu \, e(n) \, \mathbf{x}(n)

Example – Tutorial 4, Problem 3

A 4-coefficient LMS-based FIR adaptive filter working in the system identification application is trying to identify a system with transfer function:

H(z) = \frac{z^{-1}}{(1 - 0.5z^{-1})(1 + 0.5z^{-1})}

Write the equations for the signals d(n) and e(n) and the update equation for each adaptive filter coefficient, i.e. w₁(n) … w₄(n).

Example – Tutorial 4, Problem 3

H(z) = \frac{z^{-1}}{(1 - 0.5z^{-1})(1 + 0.5z^{-1})} = \frac{z^{-1}}{1 - 0.25z^{-2}}

(1 - 0.25 z^{-2}) \, D(z) = z^{-1} X(z) \quad\Rightarrow\quad d(n) = x(n-1) + 0.25 \, d(n-2)

Adaptive filter output and error:

y(n) = w_1(n) x(n) + w_2(n) x(n-1) + w_3(n) x(n-2) + w_4(n) x(n-3)

e(n) = d(n) - y(n)

Weights update equations:

w_{i+1}(n+1) = w_{i+1}(n) + 2\mu \, e(n) \, x(n-i), \qquad i = 0, 1, 2, 3

Applications

• Before looking into the details of the Matlab implementation of the LMS update algorithm, some practical applications of adaptive filters are considered first. Those are:
  – System Identification
  – Inverse System Estimation
  – Adaptive Noise Cancellation
  – Linear Prediction

Applications: System Identification

Block diagram: x(n) is applied both to the Unknown System (output d(n)) and to the Digital Filter h(n) (output y(n)); e(n) = d(n) − y(n) drives the Adaptive Algorithm that updates h(n).

Definitions of signals:
x(n) – input applied to the unknown system and the adaptive filter
y(n) – filter output
d(n) – system (desired) output
e(n) – estimation error

Task: identifying the response of the unknown system.

Applications: Inverse Estimation

Block diagram: x(n) passes through the System and then the Digital Filter h(n) (output y(n)); a Delay applied to x(n) provides d(n); e(n) = d(n) − y(n) drives the Adaptive Algorithm.

Definitions of signals:
x(n) – input applied to the system
y(n) – filter output
d(n) – desired output (the delayed input)
e(n) – estimation error

Task: estimating the inverse of the system. The delay block ensures the causality of the estimated inverse.

Applications: Noise Cancellation

Block diagram: the Signal source provides d(n) = s(n) + n(n); the Noise source provides the reference x(n), which feeds the Digital Filter h(n) (output y(n)); e(n) = d(n) − y(n) drives the Adaptive Algorithm.

Definitions of signals:
x(n) – noise (the so-called reference signal)
y(n) – noise estimate
d(n) – signal + noise
e(n) – signal estimate

Task: removing background noise from the useful signal.

Applications: Linear Predictor

Block diagram: an AR Process generates x(n); a delayed copy of x(n) feeds the Digital Filter h(n) (output y(n)); d(n) = x(n); e(n) = d(n) − y(n) drives the Adaptive Algorithm.

Definitions of signals:
x(n) – signal to be predicted
y(n) – filter output (signal prediction)
d(n) – desired output
e(n) – estimation error

Task: estimating the future samples of the signal.

Applications: Linear Predictor

• Assuming that the signal x(n) is periodic, and is steady or varies slowly over time, the adaptive filter can be used to predict the future values of the desired signal based on past values.
• When x(n) is periodic and the filter is long enough to remember previous values, this structure, with the delay in the input signal, can perform the prediction. This configuration can also be used to remove a periodic signal from stochastic noise signals.

Example

• Have a look into Tutorials 3 and 4 for examples of each discussed configuration.

Adaptive LMS Algorithm Implementation

The LMS algorithm can easily be implemented in software. The main steps of this algorithm are:

1. Read in the next sample, x(n), and perform the filtering operation with the current version of the coefficients:

y(n) = \sum_{k=0}^{N-1} h_n(k) \, x(n-k)

2. Take the computed output and compare it with the expected output, i.e. calculate the error:

e(n) = d(n) - y(n)

3. Update the coefficients (obtain the next set of coefficients) using the following computation:

h_{n+1}(k) = h_n(k) + 2\mu \, e(n) \, x(n-k), \qquad k = 0, \ldots, N-1

• This algorithm is performed in a loop so that with each new sample a new coefficient vector h_{n+1}(k) is created. In this way the filter coefficients change and adapt.

Adaptive LMS Algorithm Implementation

• Before the LMS algorithm "kicks in" we also need to initialise the filter coefficients; the safest option is to initialise them all to zero:

h_0(k) = 0 \quad\text{for } k = 0 \ldots N-1

Then, for each n:

y(n) = \sum_{k=0}^{N-1} h_n(k) \, x(n-k), \qquad e(n) = d(n) - y(n), \qquad h_{n+1}(k) = h_n(k) + 2\mu \, e(n) \, x(n-k)
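The slides refer to a Matlab implementation of this loop; the sketch below is an illustrative Python translation (the step size μ and the "unknown" system being identified are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(4)
h_true = np.array([0.9, 0.4, -0.2, 0.1])      # "unknown" system to identify
N = len(h_true)
mu = 0.01                                     # step size

h = np.zeros(N)                               # 0. initialise coefficients to zero
x_buf = np.zeros(N)                           # most recent inputs x(n)..x(n-N+1)

for _ in range(20_000):
    x_n = rng.standard_normal()
    x_buf = np.concatenate(([x_n], x_buf[:-1]))
    d_n = h_true @ x_buf                      # desired output (unknown system)
    y_n = h @ x_buf                           # 1. filter with current coefficients
    e_n = d_n - y_n                           # 2. error
    h = h + 2 * mu * e_n * x_buf              # 3. coefficient update

print(h)   # converges towards h_true
```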

Other applications of adaptive filters

• PSD estimation of an observed signal
• Foetal ECG monitoring – cancelling of the maternal ECG
• Removal of mains interference in medical signals
• Radar signal processing:
  – background noise removal
  – RX-TX crosstalk reduction
  – adaptive jammer suppression
• Separation of speech from background noise
• Echo cancellation for speaker phones
• Beamforming

Part 4 PSD Estimation and Signal Modelling Techniques

Part 4 – PSD Estimation

There are several ways to find the PSD of a signal:

• Nonparametric techniques (periodogram and correlogram)
• Parametric techniques (AR, MA and ARMA models)
• Yule-Walker Equations and Signal Predictors

Approaches to PSD estimation

• Classical, Non-parametric Techniques – based on the Fourier Transform:
  – robust, require no previous knowledge about the data
  – assume zero data values outside the data window – results are distorted, resolution can be low
  – not suitable for short data records
• Modern, Parametric Techniques – include a priori model information concerning the spectrum to be estimated.
• Modern, Non-parametric methods – use singular value decomposition (SVD) of the signal to separate correlated and uncorrelated signal components for easier analysis.

Non-parametric PSD estimation techniques (DFT/FFT based)

• The PSD is estimated directly from the signal itself, with no previous knowledge about the signal.
• Periodogram – based on the following formula:

\hat{R}_{xx}(e^{j\omega}) = \frac{1}{N} \left| \sum_{n=0}^{N-1} x(n) \, e^{-j\omega n} \right|^2

• Correlogram – based on the following formula:

\hat{R}_{xx}(e^{j\omega}) = \sum_{m=-(N-1)}^{N-1} \hat{r}_{xx}(m) \, e^{-j\omega m}

where r̂_xx(m) is an estimate of the autocorrelation of the signal x(n):

\hat{r}_{xx}(m) = \frac{1}{N} \sum_{n=0}^{N-1-m} x(n) \, x(n+m)

Periodogram and Correlogram

• Note that since

\frac{1}{N} \left| \sum_{n=0}^{N-1} x(n) \, e^{-j\omega n} \right|^2 = \sum_{m=-(N-1)}^{N-1} \hat{r}_{xx}(m) \, e^{-j\omega m}

(note – not a strict mathematical derivation) the results obtained with those two estimators should coincide.
• Variations on the basic periodogram approach are usually used in practice.
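A minimal Python sketch confirming numerically that the two estimators coincide (the test signal and frequency grid are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(6)
N = 128
x = rng.standard_normal(N)
w = np.linspace(0, np.pi, 200)

# Periodogram: (1/N) |sum_n x(n) e^{-jwn}|^2
X = np.exp(-1j * np.outer(w, np.arange(N))) @ x
P_per = np.abs(X) ** 2 / N

# Correlogram: sum_m r_hat(m) e^{-jwm}, with the biased estimate r_hat
r_hat = np.correlate(x, x, mode="full") / N    # lags -(N-1)..(N-1)
lags = np.arange(-(N - 1), N)
P_cor = (np.exp(-1j * np.outer(w, lags)) @ r_hat).real

print(np.allclose(P_per, P_cor))   # True: the two estimators coincide
```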

Blackman-Tukey method

• Since the correlation function at its extreme lag values is not reliable (fewer data points enter the computation) it is recommended to use lag values of about 30%-40% of the total length of the data record.
• Blackman-Tukey is a windowed correlogram given by:

\hat{R}_{BT}(e^{j\omega}) = \sum_{m=-(L-1)}^{L-1} w(m) \, \hat{r}_{xx}(m) \, e^{-j\omega m}

where w(m) is the window with zero values for |m| > L − 1; also, L << N.

Bartlett Method

• This is an improved periodogram method (note that the previously discussed Blackman-Tukey is a correlogram method).
• Bartlett's method reduces the fluctuation of the periodogram by splitting up the available data of N observations into K = N/L subsections of L observations each.
• The spectral densities of the produced K periodograms are then averaged.

[Diagram: segments 1, 2, …, K of L samples each, one periodogram per segment; PSD estimate = (periodogram 1 + periodogram 2 + … + periodogram K) / K]

Welch Method

• Welch proposed a further modification to the Bartlett method and introduced overlapped and windowed data segments defined as:

x_i(n) = w(n) \, x(n + iD), \qquad n = 0, 1, \ldots, L-1, \quad i = 0, 1, \ldots, K-1

where:
w(n) – window of length L
D – offset distance (D = L gives no overlap; D = L/2 gives 50% overlap)
K – number of sections that the sequence x(n) is divided into

Welch Method

• The i-th periodogram is:

\hat{R}^{(i)}(e^{j\omega}) = \frac{1}{L} \left| \sum_{n=0}^{L-1} x_i(n) \, e^{-j\omega n} \right|^2

• The averaged periodogram is:

\hat{R}_W(e^{j\omega}) = \frac{1}{K} \sum_{i=0}^{K-1} \hat{R}^{(i)}(e^{j\omega})

Welch Method

[Diagram: overlapping data segments 1, 2, …, K, each offset by D samples; one periodogram per segment; PSD estimate = (periodogram 1 + … + periodogram K) / K]
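A minimal Python sketch (sampling rate, segment length and 50% overlap are arbitrary choices) comparing a raw periodogram with Welch's overlapped, averaged estimate via scipy.signal:

```python
import numpy as np
from scipy.signal import periodogram, welch

rng = np.random.default_rng(5)
fs = 1000.0
t = np.arange(8192) / fs
x = np.sin(2 * np.pi * 100 * t) + rng.standard_normal(t.size)  # tone in noise

f_p, P_p = periodogram(x, fs=fs)
f_w, P_w = welch(x, fs=fs, nperseg=512, noverlap=256)          # 50% overlap

# Welch trades frequency resolution for a much lower variance
# (the estimate fluctuates far less across frequency):
print(P_p.std(), P_w.std())
print(f_w[np.argmax(P_w)])   # peak near 100 Hz
```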

Modified Welch Method

• Data segments taken from the data record are progressively getting longer thus introducing a better frequency resolution.

• Due to an averaging procedure periodogram variance decreases and smoother periodograms are obtained

Modified (Symmetric) and Modified (Asymmetric) Welch Methods

[Diagrams: progressively longer data segments (grown symmetrically or asymmetrically), one periodogram per segment; PSD estimate = (periodogram 1 + … + periodogram K) / K]

Comparison of nonparametric PSD estimators

• We use the quality factor Q to evaluate the different nonparametric methods.
• This is the ratio of the square of the mean of the PSD estimate to its variance:

Q = \frac{\left( E[\hat{R}_{xx}] \right)^2}{\operatorname{var}(\hat{R}_{xx})}

Comparison of nonparametric PSD estimators

Method          | Conditions                  | Q         | Comments
Periodogram     | N → ∞                       | 1         | Inconsistent, independent of N
Bartlett        | N, L → ∞                    | 1.11 N Δf | Quality improves with data length
Welch           | N, L → ∞, 50% overlap       | 1.39 N Δf | Quality improves with data length
Blackman-Tukey  | N, L → ∞, triangular window | 2.34 N Δf | Quality improves with data length

• Δf is the 3 dB main lobe width of the associated window.

Parametric PSD estimation techniques (model based)

• use a priori model information about the spectrum to be estimated

Steps for parametric spectrum estimation

1. Select a suitable model for the procedure. This step may be based on:
   – a priori knowledge of the physical mechanism that generates the random process, or
   – trial and error, by testing various parametric models.
   (If the wrong model is selected, results can be worse than when using non-parametric methods for PSD estimation.)
2. Estimate the (p, q) order of the model (from the collected data and/or from a priori information).
3. Use the collected data to estimate the model parameters (coefficients).

Stochastic signal modelling

Deterministic signal modelling

Possible Models

Most general model: Autoregressive Moving Average (ARMA) – a and b coefficients (i.e. IIR), also known as the Pole-Zero Model.

Moving Average (MA) – b coefficients only (i.e. FIR) – the All-Zero Model.

Autoregressive (AR) – a coefficients only – the All-Pole Model.

[Block diagrams: x(n) feeding b(n) (feedforward) and a(n) (feedback) sections, summed to give y(n), for the ARMA, MA and AR structures.]

Model Equations

ARMA: \sum_{k=0}^{p} a_k \, y(n-k) = \sum_{k=0}^{q} b_k \, x(n-k)

MA: y(n) = \sum_{k=0}^{q} b_k \, x(n-k)

AR: \sum_{k=0}^{p} a_k \, y(n-k) = x(n)

Model Equations in z-domain (i.e. Model Transfer Functions)

Applying the z-transform to the ARMA equation:

Y(z) \sum_{k=0}^{p} a_k z^{-k} = X(z) \sum_{k=0}^{q} b_k z^{-k}

H(z) = \frac{Y(z)}{X(z)} = \frac{\sum_{k=0}^{q} b_k z^{-k}}{\sum_{k=0}^{p} a_k z^{-k}}

Model Equations in z-domain

for a₀ = 1:

ARMA: H(z) = \frac{\sum_{k=0}^{q} b_k z^{-k}}{1 + \sum_{k=1}^{p} a_k z^{-k}}

MA: H(z) = \sum_{k=0}^{q} b_k z^{-k}

AR: H(z) = \frac{1}{1 + \sum_{k=1}^{p} a_k z^{-k}}

Model Equations in ω-domain

for a₀ = 1:

ARMA: H(e^{j\omega}) = \frac{\sum_{k=0}^{q} b_k e^{-jk\omega}}{1 + \sum_{k=1}^{p} a_k e^{-jk\omega}}

MA: H(e^{j\omega}) = \sum_{k=0}^{q} b_k e^{-jk\omega}

AR: H(e^{j\omega}) = \frac{1}{1 + \sum_{k=1}^{p} a_k e^{-jk\omega}}

So how do we get the signal PSD from the estimated model?

• If a white noise signal w(n) is the input to our model (i.e. x(n) = w(n)), the output signal y(n) is a WSS (wide sense stationary) signal with PSD given as:

R_{yy}(e^{j\omega}) = \left| H(e^{j\omega}) \right|^2 R_{ww}(e^{j\omega})

or

R_{yy}(z) = H(z) \, H^*(1/z^*) \, R_{ww}(z)

PSD for ARMA modelled signal

R_{yy}(e^{j\omega}) = \sigma_w^2 \, \frac{\left| \sum_{k=0}^{q} b_k e^{-jk\omega} \right|^2}{\left| 1 + \sum_{k=1}^{p} a_k e^{-jk\omega} \right|^2}

• Using vector notation:

R_{yy}(e^{j\omega}) = \sigma_w^2 \, \frac{\mathbf{e}_q^H \mathbf{b} \mathbf{b}^H \mathbf{e}_q}{\mathbf{e}_p^H \mathbf{a} \mathbf{a}^H \mathbf{e}_p}

PSD for ARMA modelled signal

R_{yy}(e^{j\omega}) = \sigma_w^2 \, \frac{\mathbf{e}_q^H \mathbf{b} \mathbf{b}^H \mathbf{e}_q}{\mathbf{e}_p^H \mathbf{a} \mathbf{a}^H \mathbf{e}_p}

• where H denotes the Hermitian transpose (transpose + complex conjugate) and:

\mathbf{e}_q = \left[ 1, e^{j\omega}, e^{j2\omega}, \ldots, e^{jq\omega} \right]^T, \qquad \mathbf{e}_p = \left[ 1, e^{j\omega}, e^{j2\omega}, \ldots, e^{jp\omega} \right]^T

\mathbf{b} = [b_0, b_1, \ldots, b_q]^T, \qquad \mathbf{a} = [1, a_1, a_2, \ldots, a_p]^T

PSD for AR & MA modelled signals

• Similarly, for AR modelled signals we have:

R_{yy}(e^{j\omega}) = \frac{\sigma_w^2}{\left| 1 + \sum_{k=1}^{p} a_k e^{-jk\omega} \right|^2} = \frac{\sigma_w^2}{\mathbf{e}_p^H \mathbf{a} \mathbf{a}^H \mathbf{e}_p}

• and for MA models:

R_{yy}(e^{j\omega}) = \sigma_w^2 \left| \sum_{k=0}^{q} b_k e^{-jk\omega} \right|^2 = \sigma_w^2 \, \mathbf{e}_q^H \mathbf{b} \mathbf{b}^H \mathbf{e}_q
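A minimal Python sketch evaluating a model PSD as σ_w² |H(e^{jω})|² with scipy.signal.freqz; the AR(2) coefficients here are an arbitrary illustrative choice, not taken from the slides:

```python
import numpy as np
from scipy.signal import freqz

a = [1.0, -0.9, 0.5]        # A(z) = 1 - 0.9 z^-1 + 0.5 z^-2 (arbitrary AR(2))
b = [1.0]                   # all-pole (AR) model
s_w2 = 1.0                  # white noise variance

w, H = freqz(b, a, worN=512)
R_yy = s_w2 * np.abs(H) ** 2          # model PSD on a dense frequency grid
print(w[np.argmax(R_yy)])             # location of the spectral peak (rad/sample)
```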

Statistical Signal Modelling Yule-Walker Equations

• In statistical signal modelling the problem of determining the model coefficients boils down to solving a set of nonlinear equations called the Yule-Walker equations.
• The next couple of slides show how those equations are obtained, starting from the general expression for the ARMA model.
• Assuming a₀ = 1 we have:

y(n) = -\sum_{k=1}^{p} a_k \, y(n-k) + \sum_{k=0}^{q} b_k \, x(n-k)

Yule-Walker Equations

• Multiplying both sides of the ARMA equation by y(n−i) and taking the expectation we have:

r_y(i) = -\sum_{k=1}^{p} a_k \, E[y(n-k) \, y(n-i)] + \sum_{k=0}^{q} b_k \, E[x(n-k) \, y(n-i)]
       = -\sum_{k=1}^{p} a_k \, r_y(i-k) + \sum_{k=0}^{q} b_k \, E[x(n-k) \, y(n-i)]

• Since both x(n) and y(n) are jointly wide sense stationary processes, we can rewrite the last term of this equation using the following reasoning.

Yule-Walker Equations

• With y(n) = h(n) ∗ x(n) and a white input x(n) (so that r_x(m) = σ_x² δ(m)):

E[x(n-k) \, y(n-i)] = E\left[ x(n-k) \sum_{m} h(m) \, x(n-i-m) \right] = \sigma_x^2 \, h(k-i)

• i.e.:

r_y(i) = -\sum_{k=1}^{p} a_k \, r_y(i-k) + \sigma_x^2 \sum_{k=0}^{q} b_k \, h(k-i)

Yule-Walker Equations

• Further, for causal h(n) we can obtain the standard form of the Yule-Walker equations:

r_y(i) + \sum_{k=1}^{p} a_k \, r_y(i-k) = \sigma_x^2 \sum_{k=i}^{q} b_k \, h(k-i), \qquad 0 \le i \le q

• Introducing

c(i) = \sum_{k=i}^{q} b_k \, h(k-i)

we have:

r_y(i) + \sum_{k=1}^{p} a_k \, r_y(i-k) = \begin{cases} \sigma_x^2 \, c(i) & \text{for } 0 \le i \le q \\ 0 & \text{for } i > q \end{cases}

Yule-Walker Equations

• In matrix form:

\begin{bmatrix}
r_y(0) & r_y(-1) & \cdots & r_y(-p) \\
r_y(1) & r_y(0) & \cdots & r_y(1-p) \\
\vdots & \vdots & & \vdots \\
r_y(q) & r_y(q-1) & \cdots & r_y(q-p) \\
r_y(q+1) & r_y(q) & \cdots & r_y(q-p+1) \\
\vdots & \vdots & & \vdots
\end{bmatrix}
\begin{bmatrix} 1 \\ a_1 \\ \vdots \\ a_p \end{bmatrix}
= \sigma_x^2
\begin{bmatrix} c(0) \\ c(1) \\ \vdots \\ c(q) \\ 0 \\ \vdots \end{bmatrix}

Yule-Walker Equations for the AR model

• For the AR model (q = 0, b₀ = 1):

r_y(i) + \sum_{k=1}^{p} a_k \, r_y(i-k) = \sigma_x^2 \, \delta(i)

• In matrix form (using r_y(−m) = r_y(m)):

\begin{bmatrix}
r_y(0) & r_y(1) & \cdots & r_y(p) \\
r_y(1) & r_y(0) & \cdots & r_y(p-1) \\
\vdots & \vdots & \ddots & \vdots \\
r_y(p) & r_y(p-1) & \cdots & r_y(0)
\end{bmatrix}
\begin{bmatrix} 1 \\ a_1 \\ \vdots \\ a_p \end{bmatrix}
= \begin{bmatrix} \sigma_x^2 \\ 0 \\ \vdots \\ 0 \end{bmatrix}
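A minimal Python sketch of AR parameter estimation from these equations, exploiting the Toeplitz structure via scipy.linalg.solve_toeplitz (the generating AR(2) model is an arbitrary choice):

```python
import numpy as np
from scipy.linalg import solve_toeplitz
from scipy.signal import lfilter

rng = np.random.default_rng(7)
a_true = np.array([1.0, -0.9, 0.5])           # A(z) = 1 - 0.9 z^-1 + 0.5 z^-2
y = lfilter([1.0], a_true, rng.standard_normal(100_000))  # AR(2) realisation

p = 2
N = len(y)
r = np.array([np.dot(y[:N - m], y[m:]) / N for m in range(p + 1)])

# Yule-Walker: sum_k a_k r_y(i-k) = -r_y(i), i = 1..p (with a_0 = 1)
a_hat = solve_toeplitz(r[:p], -r[1:p + 1])
print(a_hat)            # close to [-0.9, 0.5]

s_w2 = r[0] + a_hat @ r[1:p + 1]              # noise variance from the i = 0 row
print(s_w2)             # close to 1
```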