SPOKEN LANGUAGE SYSTEMS
Gammachirp Auditory Filter
Alex Park
May 7th, 2003
MIT Laboratory for Computer Science
Project Overview
• Goal:
– Investigate use of (non-linear) auditory filters for speech analysis
• Background:
– Sound analysis in auditory periphery similar to wavelet transform
• Comparison:
– Traditional Short-Time Fourier analysis
– Gammatone wavelet based analysis (auditory filter)
• Extension:
– Gammachirp filter has level-dependent parameters which can
model non-linear characteristics of auditory periphery
• Implementation:
– Specifics of Gammachirp implementation
– How to incorporate level dependency
Auditory Physiology
• Sound pressure variation in the air is transmitted
through the outer and middle ears to the end of the cochlea
• The basilar membrane, which runs the length of the cochlea,
maps frequency to the place of maximal displacement
[Figure: auditory pathway; labels: Outer ear, Middle ear, Cochlea, Basilar Membrane (High freq, 20 kHz; Low freq, 200 Hz), Auditory Nerve, Cortex]
Motivation – Why better auditory models?
• Automatic Speech Recognition (ASR)
– ASR systems perform adequately in ‘clean’ conditions
– Robustness is a major problem; degradation in low SNR
conditions is much worse than for humans
• Hearing research
– Build better hearing aids and cochlear implants
– Hearing impaired subjects with damaged cochlea have trouble
understanding speech in noisy environments
– Current hearing aids perform linear amplification, which amplifies
noise as well as the signal
• Is the lack of compressive non-linearity in the front-end a
common link?
Non-stationary Nature of Speech
• Why is speech a good candidate for local frequency
analysis?
[Figure: waveform of the word “tapestry”, time axis 0 to 0.8; labeled segments: /t/ transient, /ae/ tone, /s/ noise]
Time-Frequency Representation
• The most common way of representing changing spectral
content is the Short Time Fourier Transform (STFT)
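A minimal sketch of the STFT described above, written in plain Python so that each step is visible. The frame length, hop size, Hann window, sampling rate, and two-tone test signal are illustrative assumptions, not values from the slides.

```python
import cmath
import math


def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def stft_power(x, frame_len=64, hop=32):
    """Return one power spectrum per frame (bins 0..frame_len//2)."""
    w = hann(frame_len)
    frames = []
    for start in range(0, len(x) - frame_len + 1, hop):
        seg = [x[start + i] * w[i] for i in range(frame_len)]
        spectrum = []
        for k in range(frame_len // 2 + 1):
            # Naive DFT of the windowed segment at bin k.
            acc = sum(seg[i] * cmath.exp(-2j * math.pi * k * i / frame_len)
                      for i in range(frame_len))
            spectrum.append(abs(acc) ** 2)
        frames.append(spectrum)
    return frames


# Hypothetical non-stationary signal: a 1 kHz tone followed by a 3 kHz tone.
fs = 16000
x = [math.sin(2 * math.pi * 1000 * n / fs) for n in range(512)]
x += [math.sin(2 * math.pi * 3000 * n / fs) for n in range(512)]

spec = stft_power(x)
bin_hz = fs / 64  # spacing between adjacent frequency bins
first_peak = max(range(len(spec[0])), key=lambda k: spec[0][k]) * bin_hz
last_peak = max(range(len(spec[-1])), key=lambda k: spec[-1][k]) * bin_hz
print(first_peak, last_peak)
```

The strongest bin moves from the 1 kHz bin in the first frame to the 3 kHz bin in the last frame, which is the kind of changing spectral content a spectrogram displays.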
[Figure: a short waveform segment (0 to 0.06 on the time axis) passes through an FFT to give a power spectrum (Power vs. Frequency (Hz), 0 to 8000); spectrogram from STFT of “tapestry”]
STFT Characteristics
• We can think of the STFT as filtering using the following
basis
• In the frequency domain, we are using a filterbank
consisting of linearly spaced, constant bandwidth filters
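The filterbank view can be checked with a few lines of arithmetic. This is a sketch under assumed values: the sampling rate and frame length are hypothetical, and the bin spacing stands in as a nominal bandwidth figure, since the true bandwidth depends on the analysis window.

```python
fs = 16000      # sampling rate in Hz (assumed)
n_fft = 64      # frame length in samples (assumed)

# Bin k of an n_fft-point STFT is centered at k * fs / n_fft.
centers = [k * fs / n_fft for k in range(1, n_fft // 2)]

# Every bin uses the same window, so every filter has the same bandwidth.
# The bin spacing is used here as a nominal bandwidth figure.
bandwidth = fs / n_fft

spacings = [b - a for a, b in zip(centers, centers[1:])]
q_factors = [fc / bandwidth for fc in centers]

print(spacings[:3])
print(q_factors[:3], q_factors[-1])
```

The spacing between center frequencies is the same everywhere, while the ratio of center frequency to bandwidth (Q) grows linearly with frequency.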
[Figure: STFT filterbank frequency responses; axes: Freq (Hz), 0 to 2000, vs. dB, 0 to -40]
Auditory Filterbanks
• Physiological data indicate that, unlike STFT filters,
auditory filters:
– are spaced more closely at lower freq than at high freq
– have narrower bandwidths at lower frequencies (constant-Q)
• The Gammatone filter bank proposed by Patterson
models these characteristics using a wavelet transform.
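• The mother wavelet, or kernel function, is
$$a\,t^{n-1}\exp\bigl(-2\pi b_1\,\mathrm{ERB}(f_c)\,t\bigr)\,\exp\bigl(j 2\pi f_c t\bigr)$$
where $a\,t^{n-1}\exp(-2\pi b_1\,\mathrm{ERB}(f_c)\,t)$ is the Gamma envelope and $\exp(j 2\pi f_c t)$ is the tone carrier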
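A minimal sketch that samples the Gammatone kernel for a few center frequencies. The slides do not give parameter values, so the ERB formula (the Glasberg and Moore form), the order n = 4, b1 = 1.019, and the sampling rate are all assumptions taken from common practice.

```python
import cmath
import math


def erb(fc):
    """Equivalent rectangular bandwidth in Hz (Glasberg & Moore form, assumed)."""
    return 24.7 * (4.37 * fc / 1000.0 + 1.0)


def gammatone(fc, fs=16000, dur=0.05, n=4, b1=1.019, a=1.0):
    """Sample a * t^(n-1) * exp(-2*pi*b1*ERB(fc)*t) * exp(j*2*pi*fc*t)."""
    out = []
    for i in range(int(dur * fs)):
        t = i / fs
        envelope = a * t ** (n - 1) * math.exp(-2 * math.pi * b1 * erb(fc) * t)
        carrier = cmath.exp(2j * math.pi * fc * t)
        out.append(envelope * carrier)
    return out


def envelope_peak_time(fc, fs=16000):
    """Time in seconds at which the kernel magnitude is largest."""
    g = gammatone(fc, fs=fs)
    i_max = max(range(len(g)), key=lambda i: abs(g[i]))
    return i_max / fs


for fc in (250, 1000, 4000):
    print(fc, round(erb(fc), 1), envelope_peak_time(fc))
```

Under these assumptions the bandwidth grows with center frequency and the envelope peaks earlier, so high-frequency channels have shorter kernels, as in a wavelet analysis.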
[Figure: Gammatone kernel waveform; horizontal axis 0 to 0.5]
Gammatone Characteristics
• Unlike the STFT, the Gammatone filterbank uses the
following basis
• The corresponding frequency responses are
[Figure: Gammatone filterbank frequency responses; axes: Freq (Hz), 0 to 1500, vs. dB, 0 to -80]
What are we missing?
• The Gammatone filterbank has constant-Q bandwidths
and logarithmic spacing of center frequencies
• Also, the Gamma envelope decays exponentially, giving effectively compact support
• But, the filters are 1) symmetric and 2) linear
• Psychophysical experiments indicate that auditory filter
shapes are:
1) Asymmetric
* Sharper drop-off on high frequency side
2) Non-linear
* Filter shape and gain change depending on input level
* Compressive non-linearity of the cochlea
* Important for hearing in noise and for dynamic range
Gammachirp Characteristics
• The Gammachirp filter developed by Irino & Patterson
uses a modified version of the Gammatone kernel
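$$a\,t^{n-1}\exp\bigl(-2\pi b_1\,\mathrm{ERB}(f_c)\,t\bigr)\,\exp\bigl(j 2\pi f_c t + j c \ln t\bigr)$$
where $a\,t^{n-1}\exp(-2\pi b_1\,\mathrm{ERB}(f_c)\,t)$ is the Gamma envelope, $j 2\pi f_c t$ is the tone carrier, and $j c \ln t$ is the chirp term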
[Figure: Gammachirp impulse response (horizontal axis 0 to 0.5) and frequency response (Freq (Hz), 0 to 1500, vs. dB, 0 to -80)]
• Frequency response is asymmetric and can fit the passive auditory filter shape
• Level-dependent parameters can fit the changes in filter shape due to
stimulus level
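The asymmetry can be checked numerically by sampling the Gammachirp kernel and evaluating its magnitude response on either side of the center frequency. This is a sketch under assumptions: the ERB formula, n = 4, b1 = 1.019, and the chirp value c = -2 are illustrative choices, not parameters given in the slides.

```python
import cmath
import math


def erb(fc):
    """Equivalent rectangular bandwidth in Hz (Glasberg & Moore form, assumed)."""
    return 24.7 * (4.37 * fc / 1000.0 + 1.0)


def gammachirp(fc, c, fs=16000, dur=0.05, n=4, b1=1.019):
    """Sample t^(n-1) exp(-2 pi b1 ERB t) exp(j 2 pi fc t + j c ln t) for t > 0."""
    out = []
    for i in range(1, int(dur * fs)):   # start at i = 1 to avoid ln(0)
        t = i / fs
        env = t ** (n - 1) * math.exp(-2 * math.pi * b1 * erb(fc) * t)
        phase = 2 * math.pi * fc * t + c * math.log(t)
        out.append((t, env * cmath.exp(1j * phase)))
    return out


def gain_db(kernel, f, ref_f):
    """Magnitude response at f, in dB relative to the response at ref_f."""
    def mag(freq):
        return abs(sum(g * cmath.exp(-2j * math.pi * freq * t) for t, g in kernel))
    return 20 * math.log10(mag(f) / mag(ref_f))


fc = 1000.0
results = {}
for c in (0.0, -2.0):   # c = 0 reduces to the Gammatone; c = -2 is an assumed value
    kernel = gammachirp(fc, c)
    lo = gain_db(kernel, fc - 200, fc)
    hi = gain_db(kernel, fc + 200, fc)
    results[c] = (lo, hi)
    print(c, round(lo, 1), round(hi, 1))
```

With c = 0 the response is the same 200 Hz below and above the center frequency. With a negative c the response falls off much faster on the high-frequency side, matching the sharper high-side drop-off reported for auditory filters.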
Implementation
• Looking in the frequency domain, the Gammachirp can
be obtained by cascading a fixed Gammatone filter with
an asymmetric filter
• To fit psychophysical data, a fixed Gammachirp is
cascaded with level-dependent asymmetric IIR filters
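The frequency-domain cascade can be sketched with closed-form magnitude responses. The form of the asymmetric function used here, exp(c·θ) with θ = arctan((f − f_c) / (b1·ERB(f_c))), is an assumption drawn from the analytic Gammachirp literature; the slides do not state it, and it is not the IIR approximation they mention. The parameter values are also assumed.

```python
import math


def erb(fc):
    """Equivalent rectangular bandwidth in Hz (Glasberg & Moore form, assumed)."""
    return 24.7 * (4.37 * fc / 1000.0 + 1.0)


def gammatone_mag(f, fc, n=4, b1=1.019):
    """Gammatone magnitude response, normalized to 1 at f = fc."""
    x = (f - fc) / (b1 * erb(fc))
    return (1.0 + x * x) ** (-n / 2.0)


def asymmetric_mag(f, fc, c, b1=1.019):
    """Asymmetric function exp(c * theta), theta = arctan((f - fc) / (b1 ERB(fc)))."""
    theta = math.atan((f - fc) / (b1 * erb(fc)))
    return math.exp(c * theta)


def gammachirp_mag(f, fc, c, n=4, b1=1.019):
    """Cascade: fixed Gammatone response times the asymmetric function."""
    return gammatone_mag(f, fc, n, b1) * asymmetric_mag(f, fc, c, b1)


fc, c, n, b1 = 1000.0, -2.0, 4, 1.019   # assumed values
freqs = [500 + i for i in range(1001)]  # 500 Hz to 1500 Hz in 1 Hz steps
peak = max(freqs, key=lambda f: gammachirp_mag(f, fc, c, n, b1))
predicted = fc + c * b1 * erb(fc) / n   # peak location from setting the slope to zero
print(peak, round(predicted, 2))
```

Under these assumptions the cascade shifts the peak of the response below the Gammatone center frequency when c is negative, and changing c alone changes the asymmetry, which is what makes a level-dependent asymmetric stage convenient.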
[Figure: two plots of filter gain (dB) vs. frequency (Hz); labeled curves: Gammatone, Asymmetric Compensation Filter, Gammachirp; Passive Gammachirp, Level dependent asymmetries, Level dependent chirps]
Comparison: Tone vs. Passive Chirp outputs
[Figure: Gammatone Spectrogram and Passive Gammachirp Spectrogram, each over Time (s), 0.1 to 0.8]
• Gammatone output seems to have better frequency resolution
• Passive Gammachirp output seems to have better time resolution
Comparison: Tone vs. Active Chirp Outputs
[Figure: Gammatone output and Active Gammachirp output]
Incorporating level dependency
• As illustrated in the previous slide, the passive Gammachirp output offers
little advantage on clean speech using fixed stimulus levels
• We can incorporate parameter control via feedback
– Compute the passive GC spectrogram
– Segment into frames S1, S2, …, SN-1, SN
– For each time frame: get the stimulus level per channel, then filter with the level-specific filter
– Reconstruct the frames
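The frame-by-frame processing above can be sketched as follows. This is only an illustration of the control flow: `level_specific_gain` is a hypothetical stand-in that replaces the level-specific filter with a simple compressive gain, and the knee, compression ratio, frame length, and test signal are all assumed values. Frames do not overlap, so reconstruction is plain concatenation.

```python
import math


def frame_level_db(frame):
    """RMS level of one frame in dB re. amplitude 1.0, floored to avoid log(0)."""
    rms = math.sqrt(sum(s * s for s in frame) / len(frame))
    return 20 * math.log10(max(rms, 1e-10))


def level_specific_gain(level_db, knee_db=-40.0, ratio=3.0):
    """Hypothetical stand-in for the level-specific filter: a compressive gain.

    Above the knee, each extra dB of input yields only 1/ratio dB of output.
    """
    excess = max(level_db - knee_db, 0.0)
    gain_db = -excess * (1.0 - 1.0 / ratio)
    return 10 ** (gain_db / 20)


def process_channel(channel, frame_len=160):
    """Segment one channel into frames, scale each frame by its level-specific
    gain, and reconstruct by concatenating the processed frames."""
    out = []
    for start in range(0, len(channel), frame_len):
        frame = channel[start:start + frame_len]
        gain = level_specific_gain(frame_level_db(frame))
        out.extend(s * gain for s in frame)
    return out


def process_spectrogram(channels, frame_len=160):
    """Apply the per-frame processing independently to every channel."""
    return [process_channel(ch, frame_len) for ch in channels]


# Hypothetical passive-GC channel output: a quiet frame followed by a loud frame.
quiet = [0.01 * math.sin(2 * math.pi * 0.05 * n) for n in range(160)]
loud = [1.0 * math.sin(2 * math.pi * 0.05 * n) for n in range(160)]
out = process_spectrogram([quiet + loud])[0]

in_range_db = frame_level_db(loud) - frame_level_db(quiet)
out_range_db = frame_level_db(out[160:]) - frame_level_db(out[:160])
print(round(in_range_db, 1), round(out_range_db, 1))
```

The level difference between the two frames is smaller at the output than at the input, which is the compressive behavior the level-dependent stage is meant to add.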
Sample outputs
[Figure: sample outputs for clean speech and at 40 dB, 30 dB, and 20 dB SNR]
References
• Bleeck, S., Patterson, R.D., and Ives, T. (2003) Auditory Image Model for
Matlab. Centre for the Neural Basis of Hearing.
http://www.mrc-cbu.cam.ac.uk/cnbh/aimmanual/Introduction/
• Irino, T. and Patterson, R.D. (2001). “A compressive gammachirp auditory
filter for both physiological and psychophysical data,” J. Acoust. Soc. Am.
109, 2008-2022.
• Pickles, J.O. (1988). An Introduction to the Physiology of Hearing
(Academic, London).
• Slaney, M. (1993). “An efficient implementation of the Patterson-Holdsworth auditory filterbank,” Apple Computer Technical Report #35.
• Slaney, M. (1998). “Auditory Toolbox for Matlab,” Interval Research
Technical Report #1998-010.
http://rvl4.ecn.purdue.edu/~malcolm/interval/1998-010/
Sidenote
[Figure: outputs for clean speech, 40 dB SNR, and 30 dB SNR]