Variable Step-Size Adaptive Filters
for Acoustic Echo Cancellation
Constantin Paleologu
Department of Telecommunications
Introduction
• acoustic echo cancellation (AEC)
 required in hands-free communication devices (e.g., mobile telephony or teleconferencing systems)
 cause: acoustic coupling between the loudspeaker and the microphone
 an adaptive filter identifies the acoustic echo path between the terminal's loudspeaker and microphone
• specific problems in AEC
 the echo path can be extremely long
 it may rapidly change at any time during the connection
 the background noise can be strong and non-stationary
• important issue in echo cancellation
 the behaviour during double-talk
 the presence of a double-talk detector (DTD)
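Introduction (cont.)
[Block diagram: adaptive filter. The input x(n) feeds a digital filter whose output y(n) is subtracted from the desired signal d(n); the resulting error e(n) drives the adaptive algorithm.]
$e(n) = d(n) - y(n)$
 Cost function J[e(n)] is minimized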
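Introduction (cont.)
• AEC configuration
[Block diagram: the far-end signal x(n) drives the loudspeaker and the adaptive filter ĥ(n). The acoustic echo path h produces the acoustic echo, which adds to the near-end signal v(n) (including background noise) to form the microphone signal d(n). The adaptive filter output (the "synthetic" echo) is subtracted from d(n), and the error e(n) ≈ v(n) is sent to the far-end. A DTD is also part of the scheme.]
Performance criteria:
- convergence rate vs. misadjustment
- tracking vs. robustness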
Fig. 1. Acoustic echo path: (a) impulse response; (b) frequency response.
[Axes: (a) amplitude vs. samples, 0 to 1000; (b) magnitude in dB vs. frequency, 0 to 4 kHz.]
Fig. 2. Acoustic echo path: (a) impulse response; (b) frequency response.
[Axes: (a) amplitude vs. samples, 0 to 1000; (b) magnitude in dB vs. frequency, 0 to 4 kHz.]
Fig. 3. Acoustic echo path: (a) impulse response; (b) frequency response.
[Axes: (a) amplitude vs. samples, 0 to 3000; (b) magnitude in dB vs. frequency, 0 to 4 kHz.]
Adaptive algorithms for AEC
• requirements
 fast convergence rate and tracking
 low misadjustment
 double-talk robustness
• most common choices
 normalized least-mean-square (NLMS) algorithm
 affine projection algorithm (APA)
• step-size parameter (controls the performance of these algorithms)
 large values → fast convergence rate and tracking
 small values → low misadjustment and double-talk robustness
conflicting requirements → variable step-size (VSS) algorithms
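• classical NLMS algorithm
$e(n) = d(n) - \mathbf{x}^T(n)\hat{\mathbf{h}}(n-1)$
$\hat{\mathbf{h}}(n) = \hat{\mathbf{h}}(n-1) + \mu(n)\mathbf{x}(n)e(n)$
where $\mu(n) = \mu / [\mathbf{x}^T(n)\mathbf{x}(n)]$, $\mu$ is the step-size parameter, and $L$ is the adaptive filter length
$\mathbf{x}(n) = [x(n), x(n-1), \ldots, x(n-L+1)]^T$
$\varepsilon(n) = d(n) - \mathbf{x}^T(n)\hat{\mathbf{h}}(n)$ → a posteriori error
$\varepsilon(n) = e(n)[1 - \mu(n)\mathbf{x}^T(n)\mathbf{x}(n)]$
$\varepsilon(n) = 0$ [assuming that $e(n) \neq 0$] ⇒ $\mu(n) = 1 / [\mathbf{x}^T(n)\mathbf{x}(n)]$ ⇒ $\mu = 1$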
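The NLMS recursion above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the function name `nlms`, the regularization constant `delta`, and the 3-tap toy echo path are not from the slides. With no near-end signal and μ = 1, the estimate should approach the true coefficients.

```python
import random


def nlms(x, d, L, mu=1.0, delta=1e-8):
    """Run the NLMS recursion and return (final coefficients, a priori errors).

    delta is a small regularization constant (an assumption of this sketch)
    that guards against division by zero when the input is silent.
    """
    h = [0.0] * L          # h_hat(n-1), initialised to zero
    buf = [0.0] * L        # x(n) = [x(n), x(n-1), ..., x(n-L+1)]
    errors = []
    for xn, dn in zip(x, d):
        buf = [xn] + buf[:-1]
        e = dn - sum(b * w for b, w in zip(buf, h))        # a priori error e(n)
        step = mu / (delta + sum(b * b for b in buf))      # mu(n) = mu / x^T x
        h = [w + step * b * e for w, b in zip(h, buf)]     # coefficient update
        errors.append(e)
    return h, errors


# Toy identification problem: a hypothetical 3-tap echo path, white input,
# and no near-end signal.
random.seed(0)
h_true = [0.5, -0.3, 0.1]
x = [random.gauss(0.0, 1.0) for _ in range(2000)]
d = [sum(h_true[k] * x[n - k] for k in range(len(h_true)) if n - k >= 0)
     for n in range(len(x))]
h_est, err = nlms(x, d, L=3, mu=1.0)
```

Lowering `mu` in this sketch slows convergence, which is the trade-off the step-size bullets describe.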
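Optimal step-size ?
[Block diagram: AEC configuration with the far-end signal x(n), the echo path h, the adaptive filter ĥ(n), and the near-end signal v(n); the error is sent to the far-end.]
In the presence of the near-end signal, the condition $\varepsilon(n) = 0$ is replaced by $\varepsilon(n) = v(n)$:
$E\{\varepsilon^2(n)\} = E\{v^2(n)\}$
$E\{e^2(n)\}[1 - \mu(n) L E\{x^2(n)\}]^2 = E\{v^2(n)\}$
$\mu(n) = \frac{1}{\mathbf{x}^T(n)\mathbf{x}(n)} \left[1 - \sqrt{\frac{E\{v^2(n)\}}{E\{e^2(n)\}}}\right]$
$\mu(n) = \frac{1}{\mathbf{x}^T(n)\mathbf{x}(n)} \left[1 - \frac{\hat{\sigma}_v(n)}{\hat{\sigma}_e(n)}\right]$
$\hat{\sigma}_e^2(n) = \lambda \hat{\sigma}_e^2(n-1) + (1 - \lambda) e^2(n)$, with $\lambda = 1 - 1/(KL)$, $K > 1$
$\hat{\sigma}_v(n)$ = ?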
1) near-end signal = background noise (single-talk scenario)
$v(n) = w(n)$
$\mu(n) = \frac{1}{\mathbf{x}^T(n)\mathbf{x}(n)} \left[1 - \frac{\hat{\sigma}_w}{\hat{\sigma}_e(n)}\right]$
where $\hat{\sigma}_w^2$ is the background noise power estimate
→ NPVSS-NLMS algorithm
[J. Benesty et al., "A nonparametric VSS NLMS algorithm", IEEE Signal Process. Lett., 2006]
Problem: background noise can be time-variant
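A minimal sketch of the NPVSS step size, assuming the background noise power is known in advance. The names `update_power` and `npvss_step`, the guards `delta` and `xi`, and the clipping at zero are assumptions of this sketch rather than details shown on the slide.

```python
import math


def update_power(prev, sample, lam):
    """Recursive power estimate: lam * prev + (1 - lam) * sample^2."""
    return lam * prev + (1.0 - lam) * sample * sample


def npvss_step(x_vec, sigma_e2, sigma_w2, delta=1e-8, xi=1e-8):
    """Step size mu(n) = [1 - sigma_w / sigma_e(n)] / (x^T x).

    delta and xi are assumed small constants that avoid divisions by zero;
    the factor is clipped at zero (also an assumption) so that the step
    never becomes negative when sigma_e(n) falls below sigma_w.
    """
    norm = delta + sum(v * v for v in x_vec)
    factor = 1.0 - math.sqrt(sigma_w2) / (xi + math.sqrt(sigma_e2))
    return max(0.0, factor) / norm


# Error power four times the noise power: the normalized step is halved.
mu_n = npvss_step([1.0, 0.0, 0.0], sigma_e2=4.0, sigma_w2=1.0, delta=0.0, xi=0.0)
```

As the filter converges, the error power approaches the noise power and the step shrinks toward zero, which is how low misadjustment is obtained.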
2) near-end signal = background noise + near-end speech (double-talk scenario)
$v(n) = w(n) + u(n)$
$\hat{\sigma}_v^2(n) = \hat{\sigma}_w^2(n) + \hat{\sigma}_u^2(n)$
where $\hat{\sigma}_u^2(n)$ is the near-end speech power estimate (???)
Problem: non-stationary character of the speech signal
• Solutions for evaluating the near-end signal power estimate $\hat{\sigma}_v^2(n)$ = ?
1. using the error signal e(n), with a larger value of the weighting factor:
$\hat{\sigma}_e^2(n) = \lambda \hat{\sigma}_e^2(n-1) + (1 - \lambda) e^2(n)$, with $\lambda = 1 - 1/(KL)$, $K > 1$
$\hat{\sigma}_v^2(n) = \gamma \hat{\sigma}_v^2(n-1) + (1 - \gamma) e^2(n)$, with $\gamma = 1 - 1/(QL)$, $Q > K$
→ simple VSS-NLMS (SVSS-NLMS) algorithm
$\mu_{\mathrm{SVSS}}(n) = \frac{1}{\delta + \mathbf{x}^T(n)\mathbf{x}(n)} \left[1 - \frac{\hat{\sigma}_v(n)}{\xi + \hat{\sigma}_e(n)}\right]$
where $\delta$ is the regularization parameter and $\xi$ is a small positive constant
! The value of γ influences the overall behaviour of the algorithm.
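The two recursive estimators of the SVSS-NLMS rule can be sketched as a small stateful helper. The class name `SvssStepSize`, the default values K = 6 and Q = 18 (borrowed from the λ and γ settings quoted in the figure captions), and the clipping at zero are assumptions of this sketch.

```python
import math


class SvssStepSize:
    """Tracks sigma_e^2 (fast, lambda) and sigma_v^2 (slow, gamma) from e(n)."""

    def __init__(self, L, K=6, Q=18, delta=1e-8, xi=1e-8):
        self.lam = 1.0 - 1.0 / (K * L)      # lambda = 1 - 1/(KL)
        self.gamma = 1.0 - 1.0 / (Q * L)    # gamma = 1 - 1/(QL), with Q > K
        self.delta = delta                  # regularization parameter
        self.xi = xi                        # small positive constant
        self.sigma_e2 = 0.0
        self.sigma_v2 = 0.0

    def update(self, e, x_vec):
        """Update both power estimates with e(n) and return mu_SVSS(n)."""
        self.sigma_e2 = self.lam * self.sigma_e2 + (1.0 - self.lam) * e * e
        self.sigma_v2 = self.gamma * self.sigma_v2 + (1.0 - self.gamma) * e * e
        norm = self.delta + sum(v * v for v in x_vec)
        factor = 1.0 - math.sqrt(self.sigma_v2) / (self.xi + math.sqrt(self.sigma_e2))
        return max(0.0, factor) / norm      # clipping at zero is an assumption


vss = SvssStepSize(L=4)
x_vec = [1.0, 0.0, 0.0, 0.0]
first = vss.update(1.0, x_vec)              # estimators react at different speeds
for _ in range(5000):
    last = vss.update(1.0, x_vec)           # stationary error: step decays
```

In this toy run the error level never changes, so both estimates settle at the same value and the step collapses; a sudden rise in the error would raise the fast estimate first and reopen the step.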
2. using a normalized cross-correlation based echo path change detector:
$\hat{\sigma}_v^2(n) = \hat{\sigma}_e^2(n) - \frac{1}{\hat{\sigma}_x^2(n)} \hat{\mathbf{r}}_{ex}^T(n) \hat{\mathbf{r}}_{ex}(n)$
$\hat{\sigma}_x^2(n) = \lambda \hat{\sigma}_x^2(n-1) + (1 - \lambda) x^2(n)$
$\hat{\mathbf{r}}_{ex}(n) = \lambda \hat{\mathbf{r}}_{ex}(n-1) + (1 - \lambda) \mathbf{x}(n) e(n)$
→ NEW-NPVSS-NLMS algorithm
$\mu_{\mathrm{NEW\text{-}NPVSS}}(n) = \frac{1}{\delta + \mathbf{x}^T(n)\mathbf{x}(n)} \left[1 - \frac{\hat{\sigma}_v(n)}{\xi + \hat{\sigma}_e(n)}\right]$ if $\zeta(n) < \varsigma$
$\mu_{\mathrm{NEW\text{-}NPVSS}}(n) = \frac{1}{\delta + \mathbf{x}^T(n)\mathbf{x}(n)}$ otherwise
[M. A. Iqbal et al., "Novel Variable Step Size NLMS Algorithms for Echo Cancellation", Proc. IEEE ICASSP, 2008]
$\zeta(n) = \frac{\hat{r}_{ed}(n) - \hat{\sigma}_e^2(n)}{\hat{\sigma}_d^2(n) - \hat{r}_{ed}(n)}$ → convergence statistic
$\hat{\sigma}_d^2(n) = \lambda \hat{\sigma}_d^2(n-1) + (1 - \lambda) d^2(n)$
$\hat{r}_{ed}(n) = \lambda \hat{r}_{ed}(n-1) + (1 - \lambda) e(n) d(n)$
! The value of ς influences the overall behaviour of the algorithm.
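The switching logic of the NEW-NPVSS-NLMS rule can be sketched as follows. Several points are assumptions of this sketch rather than facts from the slide: the function names, the use of the absolute value of the statistic, the direction of the comparison with the threshold (a small statistic is taken to mean that the filter has converged), and the clipping of the step at zero.

```python
def convergence_statistic(r_ed, sigma_e2, sigma_d2):
    """Statistic (r_ed - sigma_e^2) / (sigma_d^2 - r_ed), in absolute value.

    Returns infinity when the denominator vanishes (an assumed convention).
    """
    den = sigma_d2 - r_ed
    if den == 0.0:
        return float("inf")
    return abs((r_ed - sigma_e2) / den)


def new_npvss_step(x_vec, sigma_v, sigma_e, stat, threshold, delta=1e-8, xi=1e-8):
    """Variable step when the statistic is below the threshold, else full step."""
    norm = delta + sum(v * v for v in x_vec)
    if stat < threshold:
        return max(0.0, 1.0 - sigma_v / (xi + sigma_e)) / norm
    return 1.0 / norm                       # e.g. after an echo path change


stat_converged = convergence_statistic(r_ed=1.0, sigma_e2=1.0, sigma_d2=5.0)
mu_small = new_npvss_step([1.0], 1.0, 2.0, stat_converged, 0.1, delta=0.0, xi=0.0)
mu_full = new_npvss_step([1.0], 1.0, 2.0, 1.0, 0.1, delta=0.0, xi=0.0)
```

The threshold plays the role of ς: a larger value keeps the algorithm in the variable-step branch more often.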
[Block diagram: AEC configuration with the far-end signal x(n), the echo path h producing y(n), the adaptive filter ĥ(n) producing ŷ(n), and the near-end signal v(n) = w(n) + u(n); the error is sent to the far-end.]
$d(n) = y(n) + v(n)$
$E\{d^2(n)\} = E\{y^2(n)\} + E\{v^2(n)\}$
$E\{y^2(n)\} \approx E\{\hat{y}^2(n)\}$, assuming that the adaptive filter has converged to a certain degree; $\hat{y}(n)$ is available!
$E\{v^2(n)\} \approx E\{d^2(n)\} - E\{\hat{y}^2(n)\}$
$\hat{\sigma}_v^2(n) = \hat{\sigma}_d^2(n) - \hat{\sigma}_{\hat{y}}^2(n)$
$\mu(n) = \frac{1}{\delta + \mathbf{x}^T(n)\mathbf{x}(n)} \left[1 - \frac{\sqrt{\hat{\sigma}_d^2(n) - \hat{\sigma}_{\hat{y}}^2(n)}}{\xi + \hat{\sigma}_e(n)}\right]$
→ Practical VSS-NLMS (PVSS-NLMS) algorithm
[C. Paleologu, S. Ciochina, and J. Benesty, "Variable Step-Size NLMS Algorithm for Under-Modeling Acoustic Echo Cancellation", IEEE Signal Process. Lett., 2008]
! This algorithm assumes that the adaptive filter has converged to a certain degree.
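A hedged sketch of the PVSS-NLMS step size, given the three power estimates. The function names and the two absolute values are assumptions of this sketch: in practice the estimated difference between the power of d(n) and the power of ŷ(n) can turn negative, so some safeguard is needed before taking the square root.

```python
import math


def smooth(prev, sample, lam):
    """Recursive power estimate, as used for sigma_d^2, sigma_yhat^2, sigma_e^2."""
    return lam * prev + (1.0 - lam) * sample * sample


def pvss_step(x_vec, sigma_d2, sigma_y2, sigma_e2, delta=1e-8, xi=1e-8):
    """Step size built from sigma_v^2 = sigma_d^2 - sigma_yhat^2.

    The absolute values are assumed safeguards and are not taken from the
    slide; delta and xi are the regularization and small positive constants.
    """
    norm = delta + sum(v * v for v in x_vec)
    sigma_v = math.sqrt(abs(sigma_d2 - sigma_y2))
    return abs(1.0 - sigma_v / (xi + math.sqrt(sigma_e2))) / norm


# Microphone power 5, synthetic echo power 4, error power 4:
# sigma_v = 1 and sigma_e = 2, so the normalized step is 0.5.
mu_n = pvss_step([1.0, 0.0], 5.0, 4.0, 4.0, delta=0.0, xi=0.0)
```

No noise or speech power has to be supplied from outside: all three inputs come from signals already available inside the canceller.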
Table I. Computational complexity of the different variable step sizes.
Algorithm        Additions   Multiplications   Divisions   Square-roots
NPVSS-NLMS       3           3                 1           1
SVSS-NLMS        4           5                 1           1
NEW-NPVSS-NLMS   2L + 8      3L + 12           3           1
PVSS-NLMS        6           9                 1           1
Simulation results (I)
• conditions
 Acoustic echo cancellation (AEC) context, L = 1000.
 input signal x(n) – AR(1) signal or speech sequence.
 background noise w(n) – independent white Gaussian noise
signal (variable SNR)
 measure of performance – normalized misalignment (dB):
$20 \log_{10}(\| \mathbf{h} - \hat{\mathbf{h}}(n) \| / \| \mathbf{h} \|)$
• algorithms for comparisons
 NLMS
 NPVSS-NLMS
 SVSS-NLMS
 NEW-NPVSS-NLMS
 PVSS-NLMS
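The performance measure used in the plots, the normalized misalignment in dB, can be computed as in the sketch below. The function name `misalignment_db` and the convention of returning minus infinity for a perfect estimate are assumptions.

```python
import math


def misalignment_db(h, h_hat):
    """Normalized misalignment 20*log10(||h - h_hat|| / ||h||) in dB."""
    num = math.sqrt(sum((a - b) ** 2 for a, b in zip(h, h_hat)))
    den = math.sqrt(sum(a * a for a in h))
    if num == 0.0:
        return float("-inf")    # perfect estimate (assumed convention)
    return 20.0 * math.log10(num / den)


start = misalignment_db([1.0, 0.0], [0.0, 0.0])    # all-zero initial estimate
later = misalignment_db([1.0, 0.0], [0.9, 0.0])    # 10 % coefficient error
```

An all-zero initial estimate gives 0 dB, which is why the misalignment curves start near 0 dB.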
Fig. 4. Misalignment of the NLMS algorithm with two different step sizes, (a) μ = 1 and (b) μ = 0.05, and misalignment of the NPVSS-NLMS algorithm. The input signal is an AR(1) process, L = 1000, λ = 1 − 1/(6L), and SNR = 20 dB.
[Axes: misalignment (dB) vs. time (seconds).]
Fig. 5. Misalignment of the NLMS algorithm with two different step sizes, (a) μ = 1 and (b) μ = 0.05, and misalignment of the NPVSS-NLMS algorithm. The input signal is an AR(1) process, L = 1000, λ = 1 − 1/(6L), and SNR = 20 dB. Echo path changes at time 10.
[Axes: misalignment (dB) vs. time (seconds).]
Fig. 6. Misalignment of the SVSS-NLMS algorithm for different values of γ. Impulse response changes at time 10, and SNR decreases from 20 dB to 10 dB at time 20. The input signal is an AR(1) process, L = 1000, and λ = 1 – 1/(6L).
[Curves: γ = 1 – 1/(12L), γ = 1 – 1/(18L), γ = 1 – 1/(30L). Axes: misalignment (dB) vs. time (seconds).]
Fig. 7. Misalignment of the NEW-NPVSS-NLMS algorithm for different values of ς. Impulse response changes at time 10, and SNR decreases from 20 dB to 10 dB at time 20. The input signal is an AR(1) process, L = 1000, and λ = 1 – 1/(6L).
[Curves: ς = 1, ς = 0.1, ς = 0.001. Axes: misalignment (dB) vs. time (seconds).]
Fig. 8. Misalignment of the NPVSS-NLMS, SVSS-NLMS [with γ = 1 – 1/(18L)], NEW-NPVSS-NLMS (with ς = 0.1), and PVSS-NLMS algorithms. The input signal is speech, L = 1000, λ = 1 – 1/(6L), and SNR = 20 dB.
[Axes: misalignment (dB) vs. time (seconds).]
Fig. 9. Misalignment during impulse response change. The impulse response changes at time 20. Algorithms: NPVSS-NLMS, SVSS-NLMS [with γ = 1 – 1/(18L)], NEW-NPVSS-NLMS (with ς = 0.1), and PVSS-NLMS. The input signal is speech, L = 1000, λ = 1 – 1/(6L), and SNR = 20 dB.
[Axes: misalignment (dB) vs. time (seconds).]
Fig. 10. Misalignment during background noise variations. The SNR decreases from 20 dB to 10 dB between time 20 and 30, and to 0 dB between time 40 and 50. Algorithms: NPVSS-NLMS, SVSS-NLMS [with γ = 1 – 1/(18L)], NEW-NPVSS-NLMS (with ς = 0.1), and PVSS-NLMS. The input signal is speech, L = 1000, λ = 1 – 1/(6L), and SNR = 20 dB.
[Axes: misalignment (dB) vs. time (seconds).]
Fig. 11. Misalignment during double-talk, without DTD. Near-end speech appears between time 15 and 25 (with FNR = 5 dB), and between time 35 and 45 (with FNR = 3 dB). Algorithms: SVSS-NLMS [with γ = 1 – 1/(18L)], NEW-NPVSS-NLMS (with ς = 0.1), and PVSS-NLMS. The input signal is speech, L = 1000, λ = 1 – 1/(6L), and SNR = 20 dB.
[Axes: misalignment (dB) vs. time (seconds).]
Adaptive algorithms for AEC (cont.)
• Affine Projection Algorithm (APA)
 superior convergence rate as compared to the NLMS algorithm
 lower complexity as compared to the RLS algorithm
$\mathbf{e}(n) = \mathbf{d}(n) - \mathbf{X}^T(n)\hat{\mathbf{h}}(n-1)$
$\hat{\mathbf{h}}(n) = \hat{\mathbf{h}}(n-1) + \mu \mathbf{X}(n)[\mathbf{X}^T(n)\mathbf{X}(n)]^{-1}\mathbf{e}(n)$
where $\mathbf{d}(n) = [d(n), d(n-1), \ldots, d(n-p+1)]^T$ and $p$ is the projection order
$\mathbf{X}(n) = [\mathbf{x}(n), \mathbf{x}(n-1), \ldots, \mathbf{x}(n-p+1)]$
$\mathbf{x}(n-l) = [x(n-l), x(n-l-1), \ldots, x(n-l-L+1)]^T$, $l = 0, 1, \ldots, p-1$, where $L$ is the adaptive filter length
• $\mu$ is the step-size parameter
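One APA iteration for projection order p = 2 can be sketched without any matrix library, because the 2×2 matrix X^T(n)X(n) has a closed-form inverse. The function names and the toy numbers are hypothetical; the sketch omits regularization, so it assumes that x(n) and x(n–1) are not collinear.

```python
def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def apa_update_p2(h, x0, x1, d0, d1, mu=1.0):
    """One APA iteration for projection order p = 2.

    x0 = x(n) and x1 = x(n-1) are the two columns of X(n);
    d0 = d(n) and d1 = d(n-1). Returns the updated coefficients
    and the a priori error vector e(n).
    """
    e0 = d0 - dot(x0, h)
    e1 = d1 - dot(x1, h)
    # X^T X = [[a, b], [b, c]], inverted in closed form.
    a, b, c = dot(x0, x0), dot(x0, x1), dot(x1, x1)
    det = a * c - b * b
    g0 = (c * e0 - b * e1) / det
    g1 = (a * e1 - b * e0) / det
    h_new = [w + mu * (g0 * u + g1 * v) for w, u, v in zip(h, x0, x1)]
    return h_new, (e0, e1)


h0 = [0.0, 0.0, 0.0]
x0, x1 = [1.0, 2.0, 0.5], [2.0, 0.5, -1.0]    # x(n) and x(n-1) for L = 3
d0, d1 = 1.5, -0.7
h1, e = apa_update_p2(h0, x0, x1, d0, d1, mu=1.0)
post = (d0 - dot(x0, h1), d1 - dot(x1, h1))   # a posteriori errors
```

With μ = 1 both a posteriori errors vanish after the update, up to rounding.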
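• Variable Step-Size APA (VSS-APA)
$\hat{\mathbf{h}}(n) = \hat{\mathbf{h}}(n-1) + \mathbf{X}(n)[\mathbf{X}^T(n)\mathbf{X}(n)]^{-1}\boldsymbol{\mu}(n)\mathbf{e}(n)$
$\boldsymbol{\mu}(n) = \mathrm{diag}\{\mu_0(n), \mu_1(n), \ldots, \mu_{p-1}(n)\}$
$\mu_0(n) = \mu_1(n) = \ldots = \mu_{p-1}(n) = \mu$ → classical APA
$\boldsymbol{\varepsilon}(n) = \mathbf{d}(n) - \mathbf{X}^T(n)\hat{\mathbf{h}}(n)$ → a posteriori error vector
$\mathbf{e}(n) = \mathbf{d}(n) - \mathbf{X}^T(n)\hat{\mathbf{h}}(n-1)$ → a priori error vector
$\boldsymbol{\varepsilon}(n) = [\mathbf{I}_p - \boldsymbol{\mu}(n)]\mathbf{e}(n)$
$\boldsymbol{\varepsilon}(n) = \mathbf{0}_{p \times 1}$ ⇒ $\boldsymbol{\mu}(n) = \mathbf{I}_p$ → classical APA with μ = 1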
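[Block diagram: AEC configuration with the far-end signal x(n), the echo path h, the adaptive filter ĥ(n), and the near-end signal v(n); the error is sent to the far-end.]
In the presence of the near-end signal, the condition $\boldsymbol{\varepsilon}(n) = \mathbf{0}_{p \times 1}$ is replaced by $\boldsymbol{\varepsilon}(n) = \mathbf{v}(n)$,
where $\mathbf{v}(n) = [v(n), v(n-1), \ldots, v(n-p+1)]^T$
$\varepsilon_{l+1}(n) = [1 - \mu_l(n)] e_{l+1}(n) = v(n-l)$, $l = 0, 1, \ldots, p-1$
$E\{\varepsilon_{l+1}^2(n)\} = E\{v^2(n-l)\}$
$[1 - \mu_l(n)]^2 E\{e_{l+1}^2(n)\} = E\{v^2(n-l)\}$
$\mu_l(n) = 1 - \sqrt{\frac{E\{v^2(n-l)\}}{E\{e_{l+1}^2(n)\}}}$
$\mu_l(n) = 1 - \frac{\hat{\sigma}_v(n-l)}{\hat{\sigma}_{e_{l+1}}(n)}$
$\hat{\sigma}_{e_{l+1}}^2(n) = \lambda \hat{\sigma}_{e_{l+1}}^2(n-1) + (1 - \lambda) e_{l+1}^2(n)$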
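$\lambda = 1 - 1/(KL)$, with $K > 1$
$\hat{\sigma}_v(n-l)$ = ???
Proposed VSS-APA:
$\mu_l(n) = 1 - \frac{\sqrt{\hat{\sigma}_d^2(n-l) - \hat{\sigma}_{\hat{y}}^2(n-l)}}{\hat{\sigma}_{e_{l+1}}(n)}$, $l = 0, 1, \ldots, p-1$
$\hat{\mathbf{h}}(n) = \hat{\mathbf{h}}(n-1) + \mathbf{X}(n)[\delta \mathbf{I}_p + \mathbf{X}^T(n)\mathbf{X}(n)]^{-1}\boldsymbol{\mu}(n)\mathbf{e}(n)$
where $\delta$ is the regularization factor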
[C. Paleologu, J. Benesty, and S. Ciochina,“Variable Step-Size Affine Projection Algorithm
Designed for Acoustic Echo Cancellation ”, IEEE Trans. Audio, Speech, Language Process.,
Nov. 2008]
• main advantages
 non-parametric algorithm
 robustness to background noise variations and double-talk
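The diagonal step-size matrix of the VSS-APA can be sketched as a list of p values, one per component of the error vector. The function name, the small constant `xi`, and the two absolute values are assumptions of this sketch, added as safeguards against negative power differences and zero denominators.

```python
import math


def vss_apa_step_sizes(sigma_d2, sigma_y2, sigma_e, xi=1e-8):
    """Return [mu_0(n), ..., mu_{p-1}(n)].

    sigma_d2[l] and sigma_y2[l] are the power estimates of d and y_hat at
    time n - l; sigma_e[l] is the estimated standard deviation of e_{l+1}(n).
    The absolute values and xi are assumed safeguards.
    """
    return [abs(1.0 - math.sqrt(abs(d2 - y2)) / (xi + se))
            for d2, y2, se in zip(sigma_d2, sigma_y2, sigma_e)]


# Projection order p = 2 with hypothetical power estimates.
mu = vss_apa_step_sizes([5.0, 2.0], [4.0, 1.0], [2.0, 4.0], xi=0.0)
e_vec = [0.2, -0.4]
scaled = [m * e for m, e in zip(mu, e_vec)]   # the product mu(n) e(n) in the update
```

Each component of the error vector gets its own step, so a burst of near-end speech in one time slot does not force the same reduction on the others.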
Simulation results (II)
Fig. 12. Misalignment of the APA with three different step-sizes (μ = 1, μ = 0.25, and μ = 0.025), and VSS-APA. The input signal is speech, p = 2, L = 1000, λ = 1 − 1/(6L), and SNR = 20 dB.
[Axes: misalignment (dB) vs. time (seconds).]
Fig. 13. Misalignment of the VSS-APA with different projection orders, i.e., p = 1 (PVSS-NLMS algorithm), p = 2, and p = 4. Other conditions are the same as in Fig. 12.
[Axes: misalignment (dB) vs. time (seconds).]
Fig. 14. Misalignment during impulse response change. The impulse response changes at time 20. Algorithms: PVSS-NLMS algorithm (VSS-APA with p = 1), APA with p = 2 (μ = 0.25), and VSS-APA with p = 2. Other conditions are the same as in Fig. 12.
[Axes: misalignment (dB) vs. time (seconds).]
Fig. 15. Misalignment during background noise variations. The SNR decreases from 20 dB to 10 dB between time 20 and 40. Other conditions are the same as in Fig. 12.
[Curves: PVSS-NLMS (VSS-APA with p = 1), APA with p = 2, VSS-APA with p = 2. Axes: misalignment (dB) vs. time (seconds).]
Fig. 16. Misalignment during double-talk, without a DTD. Near-end speech appears between time 20 and 30 (with FNR = 4 dB). Other conditions are the same as in Fig. 12.
[Curves: PVSS-NLMS (VSS-APA with p = 1), APA with p = 2, VSS-APA with p = 2. Axes: misalignment (dB) vs. time (seconds).]
Fig. 17. Misalignment during double-talk with the Geigel DTD. Other conditions are the same as in Fig. 16.
[Curves: PVSS-NLMS (VSS-APA with p = 1), APA with p = 2, VSS-APA with p = 2. Axes: misalignment (dB) vs. time (seconds).]
Simulation results (III)
Comparisons with other VSS-type APAs
• conditions
 AEC context, L = 512
 x(n), u(n) = speech sequences
 w(n) = independent white Gaussian noise signal (SNR = 20 dB)
• algorithms for comparisons
 classical APA, μ = 0.2
 variable regularized APA (VR-APA)
[H. Rey, L. Rey Vega, S. Tressens, and J. Benesty, IEEE Trans. Signal Process.,
May 2007]
 robust proportionate APA (R-PAPA)
[T. Gänsler, S. L. Gay, M. M. Sondhi, and J. Benesty, IEEE Trans. Speech Audio
Process., Nov. 2000]
 “ideal” VSS-APA (VSS-APA-id) - assuming that v(n) is available
Fig. 18. Misalignments of APA with μ = 0.2, VR-APA, VSS-APA, and VSS-APA-id. Single-talk case, L = 512, p = 2 for all the algorithms, SNR = 20 dB.
[Axes: misalignment (dB) vs. time (seconds).]
Fig. 19. Echo path changes at time 21. Other conditions are the same as in Fig. 18.
[Two panels: p = 2 and p = 8. Curves in each panel: APA, VR-APA, VSS-APA, VSS-APA-id. Axes: misalignment (dB) vs. time (seconds).]
Fig. 20. Misalignments of APA, VR-APA, R-PAPA, VSS-APA, and VSS-APA-id. Background noise variation at time 14, for a period of 14 seconds (SNR decreases from 20 dB to 10 dB). Other conditions are the same as in Fig. 18.
[Axes: misalignment (dB) vs. time (seconds).]
Fig. 21. Misalignment of the algorithms during double-talk. Other conditions are the same as in Fig. 18.
[Two panels: without DTD and with Geigel DTD. Curves across the panels: APA, VR-APA, R-PAPA, VSS-APA, VSS-APA-id. Axes: misalignment (dB) vs. time (seconds).]
Conclusions
• a family of VSS-type algorithms was developed in the context
of AEC.
• they take into account the existence of the near-end signal,
aiming to recover it from the error of the adaptive filter.
• the VSS formulas do not require any additional parameters
from the acoustic environment (i.e., non-parametric).
• they are robust to near-end signal variations like the increase of
the background noise or double-talk.
• the experimental results indicate that these algorithms are
reliable candidates for real-world applications.