Multi-channel speech enhancement

Chunjian Li
DICOM, Aalborg University
Lecture notes for Speech Communications, 3/24/2006
Methods & applied fields

Dual-channel spectral subtraction
- noise reduction in speech

Adaptive Noise Canceling (ANC)
- noise reduction and interference elimination
- echo canceling
- adaptive beamforming


Blind Source Separation (BSS)
Blind Source Extraction (BSE)
Dual-channel spectral subtraction
- Hanson and Wong, ICASSP 1984.
The method

- The enhanced magnitude spectrum is obtained by subtracting an estimate of the noise magnitude spectrum from the noisy one, both raised to a power a: |\hat{S}(f)|^{a} = |Y(f)|^{a} - |\hat{N}(f)|^{a}, where Y(f) is the noisy spectrum.
- The exponent is chosen as a = 1, based on listening tests and a spectral distortion measure.
- The noisy phase is used in the reconstruction of the signal.
- The estimate of the noise spectrum is either obtained from a reference channel or estimated from the noisy signal, assuming the SNR is very low (about -12 dB). A sketch of the procedure follows below.
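As an illustration, here is a minimal sketch of the dual-channel procedure described above, assuming a noisy-speech channel and a separate noise-reference channel are available. The frame length, hop size, and windowing are arbitrary choices, not taken from Hanson and Wong.

```python
# Minimal sketch of dual-channel spectral subtraction with a = 1.
import numpy as np

def dual_channel_spectral_subtraction(noisy, noise_ref, frame_len=512, hop=256):
    """Subtract the reference-channel magnitude spectrum from the noisy
    magnitude spectrum frame by frame; reuse the noisy phase."""
    window = np.hanning(frame_len)
    out = np.zeros(len(noisy))
    for start in range(0, len(noisy) - frame_len, hop):
        y = np.fft.rfft(window * noisy[start:start + frame_len])
        n = np.fft.rfft(window * noise_ref[start:start + frame_len])
        # a = 1: magnitude subtraction, floored at zero
        mag = np.maximum(np.abs(y) - np.abs(n), 0.0)
        s = mag * np.exp(1j * np.angle(y))          # keep the noisy phase
        out[start:start + frame_len] += window * np.fft.irfft(s, frame_len)
    return out
```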
Revisiting the phase issue
To see the dependency of magnitude on phase:
\hat{S}(f) = \left[\, \left| S(f) + N(f) \right|^{a} - \left| \hat{N}(f) \right|^{a} \,\right]^{1/a} e^{j\theta(f)}

= \left[\, \left| N(f) \right|^{a} \left( 1 + \frac{\left| S(f) \right|^{2}}{\left| N(f) \right|^{2}} + 2\,\frac{\left| S(f) \right|}{\left| N(f) \right|}\cos(\Delta\theta) \right)^{a/2} - \left| \hat{N}(f) \right|^{a} \,\right]^{1/a} e^{j\theta(f)}

where \Delta\theta is the phase difference between the two signals (the clean speech and the noise).
It is clear that the estimate of the signal magnitude spectrum depends on both the SNR and the phase difference. The phase is nevertheless not estimated in this method, because the enhanced quality is acceptable without it.
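To make the dependence concrete, here is a small numerical sketch of the expression above with a = 1, a perfect noise estimate (|N̂(f)| = |N(f)| = 1), and an arbitrary grid of SNRs and phase differences.

```python
# Evaluate the phase dependence of the magnitude estimate (a = 1).
import numpy as np

a = 1.0
snr_db = np.array([-12.0, 0.0, 12.0])              # |S|/|N| in dB
ratio = 10.0 ** (snr_db / 20.0)                    # |S(f)| / |N(f)|
dtheta = np.linspace(0.0, np.pi, 5)                # phase difference grid

for r in ratio:
    # |S + N|^a with |N| = 1, then subtract a perfect noise estimate (= 1)
    mag = (r ** 2 + 2.0 * r * np.cos(dtheta) + 1.0) ** (a / 2.0) - 1.0
    mag = np.maximum(mag, 0.0) ** (1.0 / a)
    print(f"|S|/|N| = {r:.2f}:", np.round(mag, 3))
```

At Δθ = 0 the estimate equals the true magnitude |S(f)|; the deviation grows as the phase difference moves away from zero.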
Comments

- The simplest (and somewhat unrealistic) way of exploiting multiple channels.
- Aims at improving intelligibility.
- Significant intelligibility gains only at very low SNR (about -12 dB).
- Unvoiced speech is not processed.
Adaptive Noise Canceling (ANC)

- First proposed by Widrow et al. [1] in 1975.
- It is adaptive because an adaptive filter, such as the LMS algorithm, is used.
- The objective: estimate the noise in the primary channel using the noise recorded in the secondary (reference) channel, and subtract this estimate from the primary-channel recording.

[1] B. Widrow, J. R. Glover, J. M. McCool, et al., "Adaptive noise cancelling: Principles and applications," Proceedings of the IEEE, vol. 63, pp. 1692-1716, Dec. 1975.
Signal model

In the ANC signal model, the primary microphone records the speech plus noise, y(n) = s(n) + d_1(n), while the reference microphone records a noise d_2(n) that is correlated with d_1(n) but, ideally, contains no speech.
Signal estimation
The estimated signal:
\hat{s}(n) = y(n) - \hat{d}_1(n), \qquad \hat{d}_1(n) = \sum_{i=0}^{M-1} \hat{h}(i)\, d_2(n-i)

The optimization criterion:

\hat{h} = \arg\min_{h}\; E\!\left[\left( y(n) - \sum_{i=0}^{M-1} h(i)\, d_2(n-i) \right)^{2}\right]
Signal estimation
The minimization can be solved by applying the orthogonality principle:
r_{y d_2}(\tau) - \sum_{i=0}^{M-1} \hat{h}(i)\, r_{d_2}(\tau - i) = 0
This can be solved in the same way as the normal equations (written out in matrix form below), but it is usually solved by a sequential algorithm such as the LMS. The advantages of the LMS are:
- No matrix inversion, low complexity
- Fully adaptive, suitable for non-stationary signal and noise
- Low delay
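For reference, collecting the M orthogonality conditions above in matrix form gives the normal (Wiener-Hopf) equations. This is a sketch using the correlation matrix R_{d_2} and cross-correlation vector r_{y d_2} that appear on the LMS slides.

```latex
% Normal equations implied by the orthogonality conditions:
% the batch (non-sequential) solution requires a matrix inversion,
% which is exactly what the LMS avoids.
\mathbf{R}_{d_2}\,\hat{\mathbf{h}} = \mathbf{r}_{y d_2}
\qquad\Longrightarrow\qquad
\hat{\mathbf{h}} = \mathbf{R}_{d_2}^{-1}\,\mathbf{r}_{y d_2}
```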
LMS
- It is a sequential, gradient-descent minimization method.
- The estimate of the weights is updated each time a new sample is available:

\hat{\mathbf{h}}_{k} = \hat{\mathbf{h}}_{k-1} - \mu_{k}\, \mathbf{g}

where the elements of the gradient vector are

g(\tau) = \frac{\partial}{\partial \hat{h}(\tau)}\, E\!\left[\left( y(n) - \sum_{i=0}^{M-1} \hat{h}(i)\, d_2(n-i) \right)^{2}\right]
= -2\left( r_{y d_2}(\tau) - \sum_{i=0}^{M-1} \hat{h}(i)\, r_{d_2}(\tau - i) \right)
LMS
Or, in matrix form:

\mathbf{g} = -2\left( \mathbf{r}_{y d_2} - \mathbf{R}_{d_2}\, \hat{\mathbf{h}} \right)

The most important trick in this sequential implementation is to approximate the correlation matrix and the cross-correlation vector by their instantaneous estimates:

\hat{\mathbf{R}}_{d_2} = \mathbf{d}_2 \mathbf{d}_2^{H}, \qquad \hat{\mathbf{r}}_{y d_2} = \mathbf{d}_2\, y(n)

where \mathbf{d}_2 = [\, d_2(n), d_2(n-1), \ldots, d_2(n-M+1) \,]^{T}.
LMS
The step size is often chosen empirically, as long as the following condition is satisfied for stability reasons:

0 < \mu < \frac{1}{\lambda_{\max}}

where \lambda_{\max} is the largest eigenvalue of the matrix \mathbf{R}_{d_2}.

The larger the step size, the faster the convergence, but also the larger the estimation variance.
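Below is a minimal sketch of an LMS noise canceller consistent with the update above (with instantaneous estimates it reduces to ĥ ← ĥ + 2μ e(n) d₂). The filter length, step size, and toy signals are illustrative choices, not taken from the lecture.

```python
# Minimal LMS-based adaptive noise canceller.
import numpy as np

def lms_anc(y, d2, M=32, mu=0.01):
    """y: primary channel (speech + noise), d2: noise reference channel.
    Returns the enhanced signal s_hat = y - d1_hat."""
    h = np.zeros(M)
    s_hat = np.zeros(len(y))
    for n in range(M, len(y)):
        d2_vec = d2[n - M + 1:n + 1][::-1]       # [d2(n), ..., d2(n-M+1)]
        d1_hat = h @ d2_vec                      # estimate of the primary-channel noise
        e = y[n] - d1_hat                        # error = enhanced sample
        s_hat[n] = e
        h += 2 * mu * e * d2_vec                 # stochastic-gradient update
    return s_hat

# Toy usage: a sinusoid buried in noise, with a correlated noise reference.
rng = np.random.default_rng(0)
noise = rng.standard_normal(8000)
s = np.sin(2 * np.pi * 0.01 * np.arange(8000))
y = s + np.convolve(noise, [0.6, 0.3], mode="same")   # primary channel
enhanced = lms_anc(y, noise)
```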
Comments

- The LMS belongs to the family of stochastic gradient algorithms.
- The algorithm is based on instantaneous estimates of the correlation functions, which have high variance. It nevertheless works well because of its iterative nature, which averages the estimates over time.
- Low complexity: O(M), where M is the filter order.
- Although the derivation is based on a WSS assumption, the algorithm is applicable to non-stationary signals, due to the sequential implementation.
Implementation issues of ANC

- Microphones must be sufficiently separated in space or separated by acoustic barriers.
- Typically about 1500 taps are needed => large misadjustment => pronounced echo => a small step size must be used => long convergence time.
- Different delays from the sources to the two microphones must be taken care of.
- Frequency-domain LMS can reduce the number of taps needed.
- ANC can be generalized to a multi-channel system, which can be seen as a generalized beamforming system.
Eliminating cross-talk
Cross-talk: If the signal is also captured in the reference channel, the ANC
will suppress part of the signal. Cross-talk can be reduced by employing
two adaptive filters within a feedback loop.
Beamforming

- Compared to ANC, beamforming is a truly spatial filtering technique.
- First, locate the source direction; then form a beam pointing toward the source.
- The source localization problem is an analogue of the spectral analysis problem, with the frequency domain replaced by the spatial domain.
A simple array model

- Plane wave
- Uniform linear array
- Sensor responses are identical and LTI
- Sensors are omnidirectional
- One parameter to estimate: the direction of arrival (DOA)
ULA
The signal model:

\mathbf{y}(t) = \mathbf{a}(\theta)\, s(t) + \mathbf{e}(t)

where the array transfer vector is

\mathbf{a}(\theta) = \left[\, 1,\; e^{-j\omega_c \tau_2},\; \ldots,\; e^{-j\omega_c \tau_m} \,\right]^{T}

Here \tau_m is the delay of the m-th sensor with reference to the first sensor, and \omega_c is the center frequency of the signal. By defining the spatial frequency as

\omega_s = \omega_c\, \frac{d \sin\theta}{c}

we can write the array transfer vector as

\mathbf{a}(\theta) = \left[\, 1,\; e^{-j\omega_s},\; \ldots,\; e^{-j(m-1)\omega_s} \,\right]^{T}
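As a small illustration, here is a sketch of the steering vector a(θ) above. The number of sensors, spacing, and frequency are arbitrary example values, not from the lecture.

```python
# Sketch of the ULA steering vector a(theta).
import numpy as np

def steering_vector(theta, m=8, d=0.04, f=1000.0, c=340.0):
    """a(theta) for a uniform linear array: theta in radians, m sensors,
    spacing d (meters), narrowband center frequency f (Hz), sound speed c (m/s)."""
    omega_c = 2 * np.pi * f
    omega_s = omega_c * d * np.sin(theta) / c     # spatial frequency
    return np.exp(-1j * omega_s * np.arange(m))   # [1, e^{-j w_s}, ..., e^{-j(m-1) w_s}]
```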
ULA



- A direct analogy between frequency analysis and spatial analysis, via the spatial frequency.
- To avoid spatial aliasing: d \leq \lambda/2, i.e., the sensor spacing must not exceed half a wavelength.
- All frequency analysis techniques can be applied to the DOA estimation problem (a conventional-beamformer scan is sketched below).
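To illustrate the analogy, here is a sketch of the spatial counterpart of the periodogram: scanning candidate DOAs with the conventional (delay-and-sum) beamformer. It reuses the illustrative steering_vector() helper sketched above, and assumes X is an m x N matrix of complex narrowband array snapshots.

```python
# Conventional beamformer scan over candidate DOAs (spatial periodogram).
import numpy as np

def doa_spectrum(X, thetas, **array_kwargs):
    """X: m x N complex snapshots; thetas: candidate DOAs in radians.
    array_kwargs must match the array used to record X (e.g. m, d, f, c)."""
    R = X @ X.conj().T / X.shape[1]               # sample spatial covariance
    p = []
    for th in thetas:
        a = steering_vector(th, **array_kwargs)
        p.append(np.real(a.conj() @ R @ a) / len(a) ** 2)
    return np.array(p)

# The peak of doa_spectrum over a grid of thetas gives the DOA estimate.
```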
Spatial filtering

Analogy between spatial filter and temporal filter
Spatial filtering


- The spatially filtered signal: x(t) = \mathbf{h}^{*}\mathbf{a}(\theta)\, s(t)
- Objective: find the filter that passes the signal with a given DOA undistorted, and attenuates all other DOAs as much as possible (the closed-form solution is sketched below):

\min_{\mathbf{h}}\; \mathbf{h}^{*}\mathbf{h} \quad \text{subject to} \quad \mathbf{h}^{*}\mathbf{a}(\theta) = 1
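The slide does not write out the solution; for completeness, here is the standard Lagrange-multiplier result for this constrained problem (with unit-magnitude steering elements, a*(θ)a(θ) = m).

```latex
% Solution of  min_h h*h  subject to  h* a(theta) = 1 :
\mathbf{h}_{\mathrm{opt}}
  = \frac{\mathbf{a}(\theta)}{\mathbf{a}^{*}(\theta)\,\mathbf{a}(\theta)}
  = \frac{1}{m}\,\mathbf{a}(\theta)
% i.e., the delay-and-sum beamformer steered towards theta.
```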
The beam pattern
Restrictions to beamforming

- Very sensitive to the array geometry; needs good calibration.
- Has only directivity; no selectivity in range or other location parameters.
- The frequency response is not flat.
- Ambient noise is assumed to be spatially white.
- The beam width (or selectivity) depends on the size of the array.
- The spatial aliasing problem.
Blind Source Separation (BSS)

- MIMO systems.
- A spatial processing technique that requires no knowledge of the array geometry.
- Invisible beam.
- Arbitrarily high spatial resolution.
- Does not depend on the signal frequency.
- The spatial noise is not assumed to be white.
- Not a spatial sampling system.
Solutions to BSS

- Independent Component Analysis (ICA) [2]
- Independent Factor Analysis (IFA) [3]

[2] A. Hyvärinen, J. Karhunen, and E. Oja, Independent Component Analysis, John Wiley & Sons, Inc., 2001.
[3] H. Attias, "Independent factor analysis," Neural Computation, 1999.
Independent component analysis (ICA)

- Instantaneous mixing
- The number of sensors is greater than or equal to the number of sources
- No system noise
- The sources (components) are independent of each other
- The sources are non-Gaussian processes
ICA model
The cocktail party problem: three sources, three sensors.

x_1(t) = a_{11} s_1(t) + a_{12} s_2(t) + a_{13} s_3(t)
x_2(t) = a_{21} s_1(t) + a_{22} s_2(t) + a_{23} s_3(t)
x_3(t) = a_{31} s_1(t) + a_{32} s_2(t) + a_{33} s_3(t)

Or, in matrix form:

\mathbf{x} = \mathbf{A}\mathbf{s}

Neither s nor A is known, so the problem cannot be solved by linear algebra alone. If the sources are independent and non-Gaussian, the matrix A can be found by maximizing the non-Gaussianity of the sources (a small separation example follows below).
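As a concrete illustration of the model, here is a minimal sketch that mixes three made-up non-Gaussian sources with an arbitrary 3 x 3 matrix and separates them with scikit-learn's FastICA. The sources, mixing matrix, and parameters are assumptions for the example, not from the lecture.

```python
# Mix three non-Gaussian sources and recover them with FastICA.
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
t = np.linspace(0, 8, 4000)
s = np.c_[np.sin(2 * t),                       # non-Gaussian source 1
          np.sign(np.sin(3 * t)),              # non-Gaussian source 2
          rng.laplace(size=t.size)]            # non-Gaussian source 3
A = np.array([[1.0, 0.5, 0.3],
              [0.4, 1.0, 0.6],
              [0.3, 0.2, 1.0]])                # "unknown" mixing matrix
x = s @ A.T                                    # observed mixtures, x = A s

ica = FastICA(n_components=3, random_state=0)
s_hat = ica.fit_transform(x)                   # estimated sources (up to scale/order)
A_hat = ica.mixing_                            # estimated mixing matrix
```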
Contrast function
An iterative gradient method: first, initialize the A matrix. If the mixing matrix A is square and non-singular, move it to the left-hand side:

\mathbf{A}^{-1}\mathbf{x} = \mathbf{s}

Calculate the non-Gaussianity of s, and find the next estimate of A that gives higher non-Gaussianity. Iterate until convergence. The contrast function is the objective function to be maximized or minimized.
Maximizing non-Gaussianity


- Non-Gaussianity implies independence: a mixture of independent sources is more Gaussian than the sources themselves, so maximally non-Gaussian components correspond to the individual sources.
- Measuring non-Gaussianity:
  - by kurtosis (see the sketch below)
  - by negentropy
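Here is a minimal sketch of the kurtosis measure mentioned above (excess kurtosis, which is zero for a Gaussian); the test signals are arbitrary examples.

```python
# Excess kurtosis as a non-Gaussianity measure.
import numpy as np

def excess_kurtosis(x):
    """E[x^4]/E[x^2]^2 - 3 for a zero-mean signal x."""
    x = x - np.mean(x)
    return np.mean(x ** 4) / np.mean(x ** 2) ** 2 - 3.0

rng = np.random.default_rng(0)
print(excess_kurtosis(rng.standard_normal(100000)))   # ~ 0    (Gaussian)
print(excess_kurtosis(rng.laplace(size=100000)))      # ~ +3   (super-Gaussian)
print(excess_kurtosis(rng.uniform(-1, 1, 100000)))    # ~ -1.2 (sub-Gaussian)
```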
ICA methods




- ICA by maximizing non-Gaussianity
- ICA by maximum likelihood
- ICA by minimizing mutual information
- ICA by nonlinear decorrelation
Extensions to ICA





- Noisy ICA
- ICA with a non-square mixing matrix
- Independent Factor Analysis
- Convolutive mixtures
- Methods using time structure
Blind Source Extraction



- Only interested in one or a few sources out of many (feature extraction)
- Saves computation
- The exact number of sources need not be known
BSE
D. Mandic and A. Cichocki, "An online algorithm for blind extraction of sources with different dynamical structures."