
Motorola presents in collaboration with CNEL
Project Golden Voice
Meena Ramani, Lingyun Gu, Kausthub Kale
Introduction
Speech enhancement for cell phones
Frequency Independent Beamformer
Beamforming is a signal processing technique that operates on multi-sensor arrays
Use psychoacoustic and auditory system knowledge
to improve speech loudness and intelligibility
Types of Beamforming
Bandwidth Expansion
Frequency Dependent
Direction of arrival estimation & Beamforming
Frequency Independent
Motivation
Need for enhanced voice quality
Ability to identify different speakers in a
conference call
Constraints
Novel approach
Physical constraints
Advantage: Phone line frequency range: 300 Hz-3400 Hz; Recovered frequency range: 20 Hz-8000 Hz
Basic Assumption: The high correlation between the
low-frequency and high-frequency components of the
same phonemes
The few frequency-independent
beamformers available work with large
(512-microphone) array systems
Increase the intelligibility of speech
Motivation: The limitations of the traditional narrowband
transmission channel
Goal: Increase the speech intelligibility and quality
by adding artificial high frequency components
Conventional Beamformers are all
frequency dependent.
Complete mobility under noisy conditions
Bandwidth Extension of Telephone Speech
Features
extraction
The algorithm developed at CNEL works on a
narrow-baseline (4 cm), 2-microphone system
Spectral envelope
estimation
The results are superior to conventional
techniques
Low software and hardware complexity
Good performance at all frequencies
Improvements in SNR
Real time operation
Improvement in Recognition
[Block diagram: s_NB(n), LPC analysis, excitation regeneration, 1:2, WB LPC synthesis, HPF, LPF, summation (+), output ŝ_WB(n)]
Aim: Direction of Arrival (DOA) estimation
DOA Requirements
• Differentiate speech source from noise source
• Overcome problems of signal distortion due to noise
• Prevent loss of accuracy due to room reverberations
Excitation Regeneration
Frequency fold
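The "Frequency fold" step for excitation regeneration can be illustrated with a minimal sketch. It assumes the poster means standard spectral folding, that is, 1:2 upsampling by zero insertion, which mirrors the narrowband spectrum about the old Nyquist frequency. The function name `spectral_fold`, the 8 kHz narrowband sampling rate, and the 1 kHz test tone are illustrative choices, not taken from the poster.

```python
import numpy as np

def spectral_fold(excitation_nb):
    """Upsample 1:2 by inserting a zero after every sample.

    Zero insertion leaves the baseband spectrum in place and adds a
    mirror image above the original Nyquist frequency.
    """
    x = np.asarray(excitation_nb, dtype=float)
    folded = np.zeros(2 * len(x))
    folded[::2] = x
    return folded

# Illustrative check with a 1 kHz tone sampled at 8 kHz (assumed rates).
fs_nb = 8000
n = np.arange(800)
tone = np.sin(2 * np.pi * 1000 * n / fs_nb)

wb = spectral_fold(tone)                      # now at 16 kHz
spectrum = np.abs(np.fft.rfft(wb))
freqs = np.fft.rfftfreq(len(wb), d=1.0 / (2 * fs_nb))

# The two strongest bins: the original tone and its folded image.
peaks = np.sort(freqs[np.argsort(spectrum)[-2:]])
print(peaks)
```

The folded image appears at 8 kHz minus the tone frequency, which is why a later synthesis filter and high-pass stage are needed to shape the regenerated high band.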
DOA Algorithms
Spatial Correlation methods
• Delay and Sum
• Minimum Variance
Subspace decomposition methods
• MUSIC: Multiple Signal Classification
• Coherent MUSIC
• Root MUSIC
• ESPRIT: Estimation of Signal Parameters using Rotational Invariance
Results
Speech with babble noise in the background
Signal processed by the algorithm
Spectral Envelope Regeneration
GMM algorithm
$$f_{XY}(x, y) = \sum_{i=1}^{Q} \frac{\alpha_i}{(2\pi)^{n} |C_i|^{1/2}} \exp\left(-\frac{1}{2} \begin{bmatrix} x - \mu_i^{x} \\ y - \mu_i^{y} \end{bmatrix}^{T} C_i^{-1} \begin{bmatrix} x - \mu_i^{x} \\ y - \mu_i^{y} \end{bmatrix}\right)$$
with
$$\sum_{i=1}^{Q} \alpha_i = 1, \qquad C_i = \begin{bmatrix} C_i^{xx} & C_i^{xy} \\ C_i^{yx} & C_i^{yy} \end{bmatrix} \quad \text{and} \quad \mu_i = \begin{bmatrix} \mu_i^{x} \\ \mu_i^{y} \end{bmatrix}$$
$$\hat{y}_{MMSE} = \arg\min_{\hat{y}} E\left[\|y - \hat{y}\|^{2}\right] = \int y \, f_{y|x}(y \mid x) \, dy$$
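For a joint Gaussian mixture, the conditional mean above has a closed form: a weighted sum of per-component linear predictions. The sketch below is one way to compute it, assuming the weights are the posterior probabilities of each mixture component given x (the usual choice; the poster does not spell this out). The function name `gmm_mmse_estimate` and the array layout are hypothetical, and the parameters would come from a trained model that is not shown here.

```python
import numpy as np

def gmm_mmse_estimate(x, weights, means_x, means_y, cov_xx, cov_yx):
    """Conditional-mean (MMSE) estimate of y given x under a joint GMM.

    Shapes (assumed layout): weights (Q,), means_x (Q, n), means_y (Q, m),
    cov_xx (Q, n, n), cov_yx (Q, m, n).
    """
    x = np.asarray(x, dtype=float)
    weights = np.asarray(weights, dtype=float)
    means_x = np.asarray(means_x, dtype=float)
    means_y = np.asarray(means_y, dtype=float)
    cov_xx = np.asarray(cov_xx, dtype=float)
    cov_yx = np.asarray(cov_yx, dtype=float)

    n_mix, n_dim = means_x.shape
    log_post = np.empty(n_mix)
    cond_means = np.empty_like(means_y)
    for i in range(n_mix):
        diff = x - means_x[i]
        solved = np.linalg.solve(cov_xx[i], diff)   # (C_i^xx)^-1 (x - mu_i^x)
        _, logdet = np.linalg.slogdet(cov_xx[i])
        # log of alpha_i * N(x; mu_i^x, C_i^xx), the unnormalised posterior
        log_post[i] = np.log(weights[i]) - 0.5 * (
            diff @ solved + logdet + n_dim * np.log(2.0 * np.pi)
        )
        # Per-component prediction: mu_i^y + C_i^yx (C_i^xx)^-1 (x - mu_i^x)
        cond_means[i] = means_y[i] + cov_yx[i] @ solved

    # Normalise in the log domain for numerical stability.
    post = np.exp(log_post - log_post.max())
    post /= post.sum()
    return post @ cond_means

# Single-component example: the estimate is 1 + 1 * (2 / 2) = 2.
single = gmm_mmse_estimate([2.0], [1.0], [[0.0]], [[1.0]], [[[2.0]]], [[[1.0]]])

# Two symmetric components with no cross-covariance, evaluated at x = 0.
symmetric = gmm_mmse_estimate(
    [0.0], [0.5, 0.5], [[-1.0], [1.0]], [[-5.0], [5.0]],
    [[[1.0]], [[1.0]]], [[[0.0]], [[0.0]]],
)
```

In the bandwidth-extension setting, x would be the narrowband spectral feature vector and y the wideband envelope feature to be regenerated.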
$$\hat{y}_{MMSE} = \sum_{i=1}^{Q} w_i(x) \left[ \mu_i^{y} + C_i^{yx} \left(C_i^{xx}\right)^{-1} \left(x - \mu_i^{x}\right) \right]$$
Hamming window length: 20 ms
LPC order (wideband): 18
LPC order (narrowband): 14
Spectral representation: LPC cepstrum
Mixture number (Q): 128
VQ codebook size: 128
Speech with pink noise in the background
Signal processed by the algorithm
DOA Algorithm requirements
• Low computational intensity (FLOPS)
• High accuracy (Confidence Interval)
• High speed (Time taken)
• Easy to implement
• Work well at low SNRs
• Work well in a 2-microphone, narrow-baseline (4 cm) system
DOA Method: Equation for Implementation
Delay and Sum: $P(\theta) = a^{*}(\theta) \, S \, a(\theta)$
Minimum Variance: $P(\theta) = \dfrac{1}{a^{*}(\theta) \, S^{-1} \, a(\theta)}$
MUSIC: $P(\theta) = \dfrac{a^{*}(\theta) \, a(\theta)}{a^{*}(\theta) \, E_N E_N^{*} \, a(\theta)}$
Coherent MUSIC: $P(\theta) = \dfrac{a' a}{a' E_N E_N' a}$
Root MUSIC: $\theta_K = \sin^{-1}\left( c \cdot \mathrm{angle}(z) / (\omega_0 d) \right)$
ESPRIT: $\theta_K = \sin^{-1}\left( c \, \arg(\phi_K) / (\omega_0 d) \right)$
ESPRIT: Low FLOPS count
High speed and good low-SNR performance
Good accuracy
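The scan-based estimators (Delay and Sum, Minimum Variance, MUSIC) all evaluate a spectrum P(θ) over candidate angles and pick the peak. The following is a minimal narrowband sketch for a 2-microphone array with the 4 cm baseline; it is not the CNEL algorithm. The speed of sound (343 m/s), the 2 kHz analysis frequency, the steering-vector sign convention, and the simulated source at 30 degrees are all assumptions made for illustration.

```python
import numpy as np

C_SOUND = 343.0   # speed of sound in m/s (assumed)
D_MIC = 0.04      # 4 cm baseline, as on the poster
F0 = 2000.0       # narrowband analysis frequency in Hz (assumed)

def steering(theta_deg):
    """Steering vector a(theta) for a 2-microphone array (assumed convention)."""
    phase = 2 * np.pi * F0 * D_MIC * np.sin(np.deg2rad(theta_deg)) / C_SOUND
    return np.array([1.0, np.exp(-1j * phase)])

def doa_spectra(snapshots, angles_deg, n_sources=1):
    """Delay-and-sum, minimum-variance and MUSIC spectra over a grid of angles."""
    n_mics, n_snap = snapshots.shape
    S = snapshots @ snapshots.conj().T / n_snap      # sample covariance
    S_inv = np.linalg.inv(S)
    _, vecs = np.linalg.eigh(S)                      # eigenvalues ascending
    E_n = vecs[:, : n_mics - n_sources]              # noise subspace
    das, mv, music = [], [], []
    for th in angles_deg:
        a = steering(th)
        das.append(np.real(a.conj() @ S @ a))
        mv.append(1.0 / np.real(a.conj() @ S_inv @ a))
        proj = E_n.conj().T @ a                      # component in noise subspace
        music.append(np.real(a.conj() @ a) / np.real(proj.conj() @ proj))
    return np.array(das), np.array(mv), np.array(music)

# Simulated data: one source at 30 degrees plus white sensor noise.
rng = np.random.default_rng(0)
true_doa = 30.0
n_snap = 2000
source = rng.standard_normal(n_snap) + 1j * rng.standard_normal(n_snap)
noise = 0.1 * (rng.standard_normal((2, n_snap))
               + 1j * rng.standard_normal((2, n_snap)))
snapshots = np.outer(steering(true_doa), source) + noise

angles = np.linspace(-90.0, 90.0, 721)               # 0.25 degree grid
das, mv, music = doa_spectra(snapshots, angles)
estimates = {
    "delay_and_sum": angles[np.argmax(das)],
    "minimum_variance": angles[np.argmax(mv)],
    "music": angles[np.argmax(music)],
}
print(estimates)
```

With only two closely spaced microphones the delay-and-sum peak is very broad, while the MUSIC spectrum is sharply peaked at the same angle; the cost is an eigendecomposition plus a scan over the angle grid.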
Tradeoff between Accuracy and Computational intensity
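Root MUSIC and ESPRIT avoid the angular scan: each maps the phase of a complex root or eigenvalue directly to an angle through θ = sin⁻¹(c · phase / (ω₀ d)). A small sketch of that mapping follows, assuming a speed of sound of 343 m/s and the sign convention z = exp(j ω₀ d sin θ / c); the function name `phase_to_doa_deg` is hypothetical.

```python
import numpy as np

def phase_to_doa_deg(z, f0_hz, d_m=0.04, c=343.0):
    """Map the phase of a complex root/eigenvalue z to a DOA in degrees.

    Implements theta = arcsin(c * angle(z) / (omega0 * d)); c and the sign
    convention of z are assumptions, d defaults to the 4 cm baseline.
    """
    omega0 = 2.0 * np.pi * f0_hz
    s = c * np.angle(z) / (omega0 * d_m)
    # Clip to guard against values just outside [-1, 1] caused by noise.
    return np.rad2deg(np.arcsin(np.clip(s, -1.0, 1.0)))

# Round trip: build z for a source at 30 degrees, then recover the angle.
f0 = 2000.0
z = np.exp(1j * 2.0 * np.pi * f0 * 0.04 * np.sin(np.deg2rad(30.0)) / 343.0)
recovered = phase_to_doa_deg(z, f0)
```

Under these assumptions the phase stays unambiguous only while ω₀ d / c is below π, which for a 4 cm baseline is roughly 4.3 kHz.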
Improvements in SNR for varying
Noise DOA
Plot comparing the MSE for the six
different methods at different SNRs
Comparison of FLOPS for the six
different methods for 10 dB SNR
Performance comparison between
Motorola's Noise suppressor and
our algorithm
ESPRIT 34501
Outperforms!
