Noise Suppression Techniques for Speech
Enhancement Using Adaptive Filtering
Derek Shiell
03/09/2006
ECE 463: Project Presentation
Professor Michael Honig
Overview
- Objective/Problem Description
- Applications
- Overview of Noise Reduction Methods
- System Description
- Filter analysis
  - Linear methods
    - Wiener approximation
    - KLT preprocessing
    - Signal subspace embedding
  - Kalman filter based methods
  - Non-linear methods
- Current results
- Future work
- Implementation/practical considerations
- Conclusions
Objective/Problem Description

The goal of my project was to research noise reduction techniques, specifically for automatic speech recognition system front-end processing on a single microphone, without an independent noise recording or clean reference signal.
Applications
- Cell phone speech enhancement
- Automatic speech recognition
- Speaker identification
- Biomedical signal processing

Image sources:
http://images.businessweek.com/mz/04/45/techbuy/images/razr_phone.jpg
http://www.nanopac.com/images/smnsbox.jpg
http://ldt.stanford.edu/~sgilutz/Shulis_Portfolio/fall/hci/images/sensory.jpg
Overview of Speech Enhancement

Microphone Array Processing
Utilizing multiple microphones, blind source separation (BSS) techniques such as independent component analysis (ICA) may be used to distinguish one speaker from other directional or diffuse noises.

Active echo/noise cancellation (ANC)
In this case, the echo or noise is estimated and re-generated with opposite phase to destructively interfere with the original echo or noise (a generic sketch of this idea follows the three descriptions).

Blind noise suppression
In this case, there is a single speech signal corrupted by noise, no separate noise recording with which to make noise estimates, and no source signal to reference.
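As a generic illustration of the ANC idea (estimate the noise, then add it back in anti-phase), here is a minimal sketch of the classic two-input LMS noise canceller. It assumes a separate noise reference input is available, unlike the single-sensor method of [4]; the function name, step size, and tap count are illustrative only.

```python
import numpy as np

def lms_noise_canceller(primary, noise_ref, n_taps=32, mu=0.01):
    """Classic two-input LMS canceller: adaptively estimate the noise that
    leaked into `primary` from `noise_ref`, then subtract (anti-phase add) it.

    primary   : speech + filtered noise (microphone signal)
    noise_ref : separate reference recording of the noise alone (assumed)
    Returns the error signal, which converges toward the clean speech.
    """
    w = np.zeros(n_taps)
    out = np.zeros(len(primary))
    for n in range(n_taps, len(primary)):
        x = noise_ref[n - n_taps:n][::-1]   # most recent reference samples
        noise_est = w @ x                   # estimated noise in the primary
        e = primary[n] - noise_est          # destructive interference
        w += 2 * mu * e * x                 # LMS weight update
        out[n] = e
    return out
```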
System Descriptions

[Figure: BSS based on frequency-domain ICA [6]]
[Figure: Active noise cancellation (ANC) with a single microphone/speaker [4]]
[Figure: Blind noise reduction schematic [1]]
Filter Analysis (1)

Linear MMSE (Wiener approximation)

Signal model and estimator:

$y_k = s_k + n_k, \qquad \hat{s}_k = g(y_k), \qquad \text{error} = s_k - \hat{s}_k$

MMSE cost function:

$\min_g \; E\big[(s_k - g(y_k))^2\big] = \min_g \; E\big[(y_k - g(y_k))^2\big] + 2E\big[n_k\,g(y_k)\big] - 2E\big[y_k n_k\big] + E\big[n_k^2\big]$

Dropping the terms that do not depend on g and replacing the first expectation by a frame average (frame length N), this reduces to:

$\min_g \; \frac{1}{N}\sum_{k=1}^{N}\big(y_k - g(y_k)\big)^2 + 2E\big[n_k\,g(y_k)\big]$
Filter Analysis (2)

Linear Estimation (continued)

The signal is estimated by linearly filtering the corrupted signal:

$\hat{s}_k = \mathbf{w}^T \mathbf{y}_k$

Minimizing the MMSE cost function with respect to $\mathbf{w}$ gives:

$\mathbf{w} = \hat{R}_y^{-1}\,(\hat{r}_y - r_n)$

This is an approximation to the Wiener solution in which the cross-correlation vector $p$ is estimated by $(r_y - r_n)$ (similar to spectral subtraction).
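A minimal numerical sketch of this closed form, assuming one noisy frame y and an estimate of the noise autocorrelation r_n are already available; the function and variable names are illustrative, not from the slides:

```python
import numpy as np

def wiener_approx_weights(y, r_n, order):
    """Sketch of w = R_y^{-1} (r_y - r_n) for one frame of noisy speech.

    y     : 1-D array, one frame of the noisy signal
    r_n   : noise autocorrelation estimate for lags 0..order-1
    order : number of filter taps
    """
    N = len(y)
    # Biased autocorrelation estimate of the noisy signal, lags 0..order-1
    r_y = np.array([np.dot(y[:N - k], y[k:]) / N for k in range(order)])
    # Toeplitz autocorrelation matrix built from r_y
    R_y = np.array([[r_y[abs(i - j)] for j in range(order)] for i in range(order)])
    # Spectral-subtraction-style estimate of the cross-correlation vector
    return np.linalg.solve(R_y, r_y - r_n[:order])

# s_hat[k] = w^T [y[k], y[k-1], ..., y[k-order+1]], i.e. an FIR filtering of y:
# s_hat = np.convolve(y, w)[:len(y)]
```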
Filter Analysis (3)

Linear estimation with the Karhunen-Loève Transform (KLT)

Preprocessing the signal with the KLT (or PCA) separates the signal into its directions of greatest variance. Using the transform, the signal can be mapped into a lower-dimensional space, which helps decorrelate the signal from the noise. For a changing signal this requires that U be adaptively updated.

Define U, the KLT transform, as the matrix of eigenvectors of $\hat{R}_y$, the autocorrelation matrix of the noisy signal:

$U = \mathrm{eig}(\hat{R}_y)$

Using this transformation, the transformed signal is:

$\tilde{y}_k = U y_k$

The resulting closed-form solution for the weight vector is:

$\mathbf{w} = \big(U \hat{R}_y U^T\big)^{-1} U\,(\hat{r}_y - r_n)$
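A small sketch of the KLT-preprocessed estimate, assuming the noisy frame vectors are stacked as columns of Y and a noise autocorrelation vector r_n is available; the variable names, the use of an eigendecomposition routine, and the choice of how many directions to keep are all illustrative:

```python
import numpy as np

def klt_wiener_weights(Y, r_n, n_keep):
    """KLT-preprocessed linear estimate: w = (U R_y U^T)^{-1} U (r_y - r_n).

    Y      : (dim, N) matrix of noisy frame vectors y_k, one per column
    r_n    : noise autocorrelation vector, length dim
    n_keep : number of largest-variance KLT directions to retain
    """
    dim, N = Y.shape
    R_y = Y @ Y.T / N                   # sample autocorrelation matrix
    r_y = R_y[:, 0]                     # first column as the autocorrelation vector

    # KLT basis: eigenvectors of R_y, ordered by decreasing eigenvalue
    eigvals, eigvecs = np.linalg.eigh(R_y)
    keep = np.argsort(eigvals)[::-1][:n_keep]
    U = eigvecs[:, keep].T              # rows span the retained subspace

    w = np.linalg.solve(U @ R_y @ U.T, U @ (r_y - r_n))
    return U, w

# Estimate for each frame vector: s_hat_k = w^T (U @ y_k)
```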
Filter Analysis (4)

Signal subspace embedding

This method allows a matrix of gain factors, W, rather than simply a weight vector w (MIMO), so that a simultaneous block estimate of $\hat{s}_k$ can be made. In addition, the matrix Q can be chosen either as I or so as to taper the tap weights by some factor(s), emphasizing $\hat{s}_k$ more in the minimization phase.

MMSE cost function:

$\min_{W}\; E\Big[\big(s_k - W\tilde{y}_k\big)^T Q\,\big(s_k - W\tilde{y}_k\big)\Big]$

Update equations for the filter matrix and the transform basis can be found iteratively (with $\mu$ a gradient step size and $e(y_k)$ the estimation residual):

$W_{i+1} = W_i + \mu\Big[\tfrac{1}{N}\sum_{k=1}^{N} Q\,e(y_k)\,y_k^T\,U_i^T - Q\,R_n\,U_i^T\Big]$

$U_{i+1} = U_i + \mu\Big[\tfrac{1}{N}\sum_{k=1}^{N} W_i^T Q\,e(y_k)\,y_k^T - W_i^T Q\,R_n\Big]$
Filter Analysis (5)

Kalman Filtering Approaches

Kalman filters are widely used in speech enhancement, and much theoretical work has been done analyzing them. The Kalman filter is the minimum mean-square estimator of the state of a linear dynamical system and can be used to derive many types of RLS filters. Extended Kalman filters handle nonlinear models through a linearization process.

Kalman filters have the advantages that they:
- are more robust (stationarity is not assumed)
- require only the previous estimate for the next estimation (versus all past values, for instance)
- are computationally efficient

Standard linear state-space model for the Kalman filter:

$x(n+1) = F(n+1, n)\,x(n) + v_1(n)$
$y(n) = C(n)\,x(n) + v_2(n)$
Filter Analysis (6)

Nonlinear filtering

Many nonlinear filtering methods exist to suppress noise in noisy speech. Examples include filters based on neural networks or phase space reconstruction. In general, they are very complex to analyze, but they do not require estimation of the noise or speech spectra and are not characterized by "musical tone" artifacts.

[Figure: Feed-forward neural network (1); phase space reconstruction for different speech phonemes [9]]
(1) http://research.yale.edu/ysm/images/78.2/articles-neural-network.jpg
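As a small illustration of the phase space reconstruction idea behind [8] and [9], here is a sketch of time-delay embedding of a speech signal; the embedding dimension and lag are illustrative defaults, not values taken from those papers:

```python
import numpy as np

def delay_embed(x, dim=3, lag=5):
    """Time-delay (phase space) embedding of a 1-D signal.

    Each row is the reconstructed state vector
    [x[n], x[n - lag], ..., x[n - (dim - 1) * lag]].
    dim and lag are illustrative choices.
    """
    n_vectors = len(x) - (dim - 1) * lag
    return np.column_stack(
        [x[(dim - 1 - d) * lag : (dim - 1 - d) * lag + n_vectors]
         for d in range(dim)])
```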
Typical Results

[Figure: Segmental SNR results (left) and SNR results (below) for various linear and nonlinear noise reduction methods [8]]
[Figure: Noisy speech signal (white noise), Wiener filtered, Ephraim filtered]

Comparison of segmental SNR performance for different noise sources:
1) White noise (SNR 6.08 dB)
2) Pink noise (SNR 4.34 dB)
3) Factory noise (SNR 5.16 dB)
4) F16 noise (SNR 4.61 dB)

Methods: a) linear estimation, b) linear estimation with KLT preprocessing, c) signal subspace embedding, d) weighted signal subspace embedding, e) NN with KLT, f) linear with clean target, g) nonlinear with clean target, h) standard spectral subtraction method (3 dB segmental SNR ~ 5 dB SNR) [1]
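Since the comparisons above are reported in segmental SNR, here is a sketch of how segmental SNR is typically computed; the frame length and clamping range are conventional choices, not values from the slides:

```python
import numpy as np

def segmental_snr(clean, enhanced, frame_len=256, clamp=(-10.0, 35.0)):
    """Frame-averaged segmental SNR in dB between a clean reference and an
    enhanced signal. Per-frame SNRs outside `clamp` are clipped, a common
    convention to limit the influence of silent or saturated frames."""
    n_frames = len(clean) // frame_len
    snrs = []
    for i in range(n_frames):
        s = clean[i * frame_len : (i + 1) * frame_len]
        e = enhanced[i * frame_len : (i + 1) * frame_len]
        noise_energy = np.sum((s - e) ** 2) + 1e-12
        snr = 10.0 * np.log10(np.sum(s ** 2) / noise_energy + 1e-12)
        snrs.append(np.clip(snr, *clamp))
    return float(np.mean(snrs))
```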
Future Work
- Perform ASR after noise reduction filtering
- AVICAR database
  - Data collected in a car environment
  - Time-varying SNR
  - No independent noise recording (detecting speech is difficult)
- Experiments
  - KLT preprocessing + linear estimation (Wiener)
  - Ephraim filter (ML short-time spectral amplitude estimator)
  - Nonlinear methods
Implementation/Practical Considerations
- Real-time processing
  - Applications require computationally efficient algorithms to be feasible.
- Determining the noise sample
  - With a single microphone, speech detection to estimate noise statistics is difficult.
  - Use visual information to detect speech, or use nonlinear noise reduction methods.
Conclusions
- Noise suppression methods have become increasingly important due to the proliferation of mobile devices, ASR systems, and biometrics/bioinformatics
- Speech enhancement is a very broad field
  - Array processing for source separation, noise cancellation
- Interested in blind noise reduction
  - Linear, linear + KLT preprocessing, signal subspace embedding
  - Kalman filter based methods, non-linear methods
- Using state-of-the-art noise reduction methods, typical SNR improvements are ~5 dB
- Proposed experiments to test ASR improvement
References
1. Eric A. Wan and Rudolph van der Merwe, "Noise-Regularized Adaptive Filtering for Speech Enhancement," Proc. Eurospeech, pp. 2643-2646, 1999.
2. Ki Yong Lee, Byung-Gook Lee, Iickho Song, and Souguil Ann, "Robust Estimation of AR Parameters and its Application for Speech Enhancement," Proc. IEEE ICASSP, pp. 309-312, 1992.
3. Phil S. Whitehead, David V. Anderson, and Mark A. Clements, "Adaptive, Acoustic Noise Suppression for Speech Enhancement," Proc. IEEE ICME, pp. 565-568, 2003.
4. A. V. Oppenheim, E. Weinstein, K. C. Zangi, M. Feder, and D. Gauger, "Single Sensor Active Noise Cancellation Based on the EM Algorithm," Proc. IEEE ICASSP, pp. 277-280, 1992.
5. T. Rutkowski, A. Cichocki, and A. K. Barros, "Speech Enhancement Using Adaptive Filters and Independent Component Analysis Approach," Proc. AISAT, 2000.
6. H. Saruwatari, K. Sawai, A. Lee, K. Shikano, A. Kaminuma, and M. Sakata, "Speech Enhancement and Recognition in Car Environment Using Blind Source Separation and Subband Elimination Processing," Proc. ICA, pp. 367-372, 2003.
7. Simon Haykin, Adaptive Filter Theory, Prentice-Hall Inc., Upper Saddle River, NJ, pp. 466-501, 2002.
8. M. T. Johnson, A. C. Lindgren, R. J. Povinelli, and X. Yuan, "Performance of Nonlinear Speech Enhancement using Phase Space Reconstruction," Proc. IEEE ICASSP, pp. 872-875, 2003.
9. Andrew C. Lindgren, "Speech Recognition Using Features Extracted from Phase Space Reconstructions," Thesis, Marquette University, Milwaukee, WI, May 2003.