AES 120th Convention
Paris, France, 2006
Adaptive Time-Frequency Resolution for Analysis and Processing of Audio
Alexey Lukin
AES Student Member
Moscow State University, Moscow, Russia
Jeremy Todd
AES Member
iZotope, Inc., Cambridge, MA
Short-Time Fourier Transform
$$\mathrm{STFT}[n,\omega] = \sum_{m=-\infty}^{\infty} w[m]\, x[m+n]\, e^{-j\omega m}$$

Most commonly used transform for audio:
► Spectral analysis
► Noise reduction (spectral subtraction algorithms)
► Time-variable filters and other effects
+ Very fast implementation for a large number of bands via the FFT
+ Good energy compaction for many musical signals
– Many oscillations in basis functions → ringing (Gibbs phenomenon)
– Uniform frequency resolution → inadequate frequency resolution at low frequencies
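The STFT formula can be evaluated literally for a handful of frequencies. This is a minimal pure-Python sketch, assuming a window w[m] supported on 0 ≤ m < N, a periodic Hann window and bins ω_k = 2πk/N; these are illustrative choices, not taken from the slides. It costs O(N²) per frame, whereas a practical implementation would use the FFT, which is the speed advantage listed above.

```python
import cmath
import math


def hann(n_win):
    """Periodic Hann window w[m], 0 <= m < n_win (illustrative choice)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * m / n_win) for m in range(n_win)]


def stft_frame(x, n, window, n_bins):
    """Evaluate STFT[n, w_k] = sum_m w[m] * x[m + n] * exp(-j * w_k * m)
    for w_k = 2*pi*k / n_bins, k = 0 .. n_bins - 1.

    w[m] is zero outside 0 <= m < len(window), so the infinite sum becomes
    finite; samples x[m + n] outside the signal are treated as zero.
    """
    frame = []
    for k in range(n_bins):
        omega = 2 * math.pi * k / n_bins
        acc = 0j
        for m, w_m in enumerate(window):
            if 0 <= m + n < len(x):
                acc += w_m * x[m + n] * cmath.exp(-1j * omega * m)
        frame.append(acc)
    return frame


def stft(x, n_win, hop):
    """Frames start at n = 0, hop, 2*hop, ... while the window fits inside x."""
    window = hann(n_win)
    return [stft_frame(x, n, window, n_win)
            for n in range(0, len(x) - n_win + 1, hop)]
```

For example, with `x` a cosine at 4 cycles per 32 samples, `stft(x, 32, 16)` yields frames whose magnitude peaks at k = 4 (and at the mirror bin 28).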
A. Lukin, J. Todd “Adaptive Time-Frequency Resolution”
Filter banks

Idea: x[n] → Decomposition → Processing of subband signals → Synthesis → y[n]
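The decomposition → processing → synthesis idea can be illustrated with a two-band Haar filter bank in pure Python. The Haar filters are an assumption, chosen only because they form the simplest perfectly reconstructing pair; the slides do not specify a filter bank, and `low_gain`/`high_gain` are hypothetical stand-ins for "processing of subband signals".

```python
import math

SCALE = 1 / math.sqrt(2)  # orthonormal Haar normalization


def analysis(x):
    """Decomposition: split x (even length) into half-rate low and high subbands."""
    pairs = range(len(x) // 2)
    low = [SCALE * (x[2 * i] + x[2 * i + 1]) for i in pairs]
    high = [SCALE * (x[2 * i] - x[2 * i + 1]) for i in pairs]
    return low, high


def synthesis(low, high):
    """Synthesis: exact inverse of analysis() (perfect reconstruction)."""
    y = []
    for a, d in zip(low, high):
        y.append(SCALE * (a + d))
        y.append(SCALE * (a - d))
    return y


def process(x, low_gain=1.0, high_gain=1.0):
    """x[n] -> decomposition -> per-subband gain (hypothetical processing) -> y[n]."""
    low, high = analysis(x)
    low = [low_gain * v for v in low]
    high = [high_gain * v for v in high]
    return synthesis(low, high)
```

With both gains at 1 the output equals the input; setting `high_gain=0.0` removes the sample-to-sample alternating component and keeps pairwise averages.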
Decompositions of the time-frequency plane (figure: tilings of the time (t) / frequency (f) plane by the STFT and by the DWT, both limited by the uncertainty principle)
Suggested approach
Transforms must vary their time-frequency resolution in a perceptually motivated way:
► Imitating the time-frequency resolution of human hearing
► Adapting the resolution to local signal features
Spectrograms
Conventional STFT spectrogram (linear frequency scale)

Problems:
► Most perceptually meaningful energy is concentrated in the narrow band below 4 kHz → useful details are hard to see
► Time/frequency resolution trade-off
Spectrograms
Mel-scale STFT spectrogram (window size = 12 ms)

Problems:
► Poor frequency resolution at low frequencies → bass harmonics cannot be separated from the bass drum
► Time/frequency resolution trade-off
Spectrograms
Mel-scale STFT spectrogram (window size = 93 ms)

Problems:
► Poor time resolution at transients → time smearing of drums
Spectrograms

Simple solution: combine spectrograms with different resolutions
► Take the bass from the spectrogram with good frequency resolution
► Take the treble from the spectrogram with good time resolution
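(Diagram: x[t] is transformed by Filter bank 1 and Filter bank 2, giving coefficients a_{f,t,1} and a_{f,t,2}; an Analysis block sends a control signal to the Mixer of coefficients, which outputs a_{f,t}.)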
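A sketch of the mixer of coefficients for this simple solution, assuming both spectrograms have already been resampled onto a common time-frequency grid (the slides do not say how this is done). The crossover frequencies `f_lo` and `f_hi` and the linear crossfade between them are hypothetical choices: below `f_lo` only the long-window (good frequency resolution) spectrogram is used, above `f_hi` only the short-window (good time resolution) one.

```python
def crossfade_weight(f, f_lo, f_hi):
    """Weight of the long-window spectrogram at frequency f (1 at bass, 0 at treble)."""
    if f <= f_lo:
        return 1.0
    if f >= f_hi:
        return 0.0
    return (f_hi - f) / (f_hi - f_lo)


def combine(spec_long, spec_short, freqs, f_lo=200.0, f_hi=2000.0):
    """Mix two magnitude spectrograms given as rows (frames) over the same freqs.

    f_lo and f_hi are hypothetical crossover frequencies in Hz.
    """
    weights = [crossfade_weight(f, f_lo, f_hi) for f in freqs]
    return [
        [w * a + (1.0 - w) * b for w, a, b in zip(weights, row_long, row_short)]
        for row_long, row_short in zip(spec_long, spec_short)
    ]
```

A gradual crossfade, rather than a hard split at one frequency, avoids a visible seam in the combined spectrogram; the weighting here is fixed in advance, which matches the a priori (bass versus treble) rule on this slide rather than any signal analysis.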
Spectrograms
Combined resolution spectrogram (window sizes from 12 to 93 ms)

Simple solution:
► Combine spectrograms with different resolutions: take
bass from spectrogram with good frequency resolution,
take treble from spectrogram with good time resolution
Spectrograms


Better approach: select the best resolution for each time-frequency neighborhood
Criteria?
► Better frequency resolution at low frequencies (reflects a priori psychoacoustic knowledge)
► Maximal energy compaction (to minimize spectral smearing in both time and frequency)
(Figure: candidate STFT window sizes of 6, 12, 24, 48 and 96 ms for one time-frequency neighborhood, with the best one marked)
Spectrograms

Calculation of energy compaction (energy smearing within the given block, computed for each candidate resolution)
$$S_r = \frac{\sum_i i\, a_{i,r}}{\sum_i a_{i,r}}, \qquad r_0 = \arg\min_r S_r$$
Here $a_{i,r}$ are the STFT magnitudes in the block sorted in descending order, $S_r$ is the energy smearing for resolution $r$, and $r_0$ is the resolution with the best energy compaction.
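The energy smearing measure S_r and the choice of r_0 can be sketched as follows. This assumes each candidate resolution supplies the same number of magnitudes for the block (a longer window gives fewer frames but proportionally more bins), so the sorted sequences are comparable. Indexing i from 1 rather than 0 only shifts every S_r by 1 and does not change r_0; the function names are hypothetical.

```python
def energy_smearing(magnitudes):
    """S_r = sum_i i * a_i / sum_i a_i, with a_i sorted in descending order.

    Smaller values mean the block's energy is packed into fewer coefficients.
    An all-zero (silent) block returns 0.0.
    """
    a = sorted(magnitudes, reverse=True)
    total = sum(a)
    if total == 0:
        return 0.0
    return sum(i * v for i, v in enumerate(a, start=1)) / total


def best_resolution(blocks):
    """blocks maps a resolution label to the block's STFT magnitudes; returns r_0."""
    return min(blocks, key=lambda r: energy_smearing(blocks[r]))
```

For instance, a block whose energy sits in a single coefficient, such as `[0, 4, 0, 0]`, scores 1.0, while the evenly spread `[1, 1, 1, 1]` scores 2.5, so the first resolution would be selected.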
Spectrograms
Adaptive resolution spectrogram (window sizes from 12 to 93 ms)

Benefits:
► Sharper bass drum hits and other transients, even in the mid-frequency range
► Sharper guitar harmonics at high frequencies
Spectrograms
More examples (figure panels: conventional STFT spectrogram; tone onset waveform)
Spectrograms
More examples (figure panels: adaptive resolution spectrogram; combined resolution spectrogram)
Processing framework
General framework for multi-resolution processing:
► Perform processing with several different resolutions
► Adaptively combine (mix) the results in the time-frequency domain
► Mixing is controlled by a priori psychoacoustic knowledge and by analysis of local signal features (e.g., transience)
(Diagram: x[t] → Processing 1 → x1[t] → Filter bank; x[t] → Processing 2 → x2[t] → Filter bank; Analysis of x[t] → control → Mixer of coefficients → Inverse filter bank → y[t])
Noise reduction

Modifications to the spectral subtraction algorithm:
► Better frequency resolution at low frequencies (matching the frequency resolution of human hearing)
► Better temporal resolution near signal transients (to reduce the Gibbs phenomenon)
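(Diagram: the input x1[t] is processed by spectral subtraction with long windows, giving x2[t], and by spectral subtraction with short windows, giving x3[t]; each result passes through an STFT, and a Mixer of coefficients, controlled by transience analysis, combines them before synthesis produces y[t])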
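The noise reduction scheme can be outlined per STFT frame: apply a spectral subtraction gain at each of the two resolutions, then blend the two results with a weight driven by transience. In this sketch the gain rule (magnitude subtraction with a spectral floor), the transience measure (relative energy rise between consecutive frames) and all function and parameter names are hypothetical; the slides only state that transience analysis controls the mixer. Both results are assumed to be expressed on a common STFT grid before mixing.

```python
def subtraction_gain(mag, noise_mag, floor=0.1):
    """Magnitude spectral subtraction gain max(1 - N/|X|, floor) (hypothetical rule)."""
    if mag <= 0:
        return floor
    return max(1.0 - noise_mag / mag, floor)


def denoise_frame(coefs, noise_mags, floor=0.1):
    """Apply the gain to each complex STFT coefficient of one frame."""
    return [subtraction_gain(abs(c), n, floor) * c
            for c, n in zip(coefs, noise_mags)]


def transience(prev_energy, cur_energy, eps=1e-12):
    """Hypothetical transience in [0, 1]: relative energy rise from the previous frame."""
    rise = (cur_energy - prev_energy) / (cur_energy + eps)
    return min(max(rise, 0.0), 1.0)


def mix_frame(coefs_long, coefs_short, control):
    """Mixer of coefficients: control = 0 keeps the long-window result,
    control = 1 keeps the short-window result."""
    return [(1.0 - control) * a + control * b
            for a, b in zip(coefs_long, coefs_short)]
```

A continuous weight is used here instead of a hard switch between resolutions; whether the original system blends or switches is not stated on the slides.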
Noise reduction

Results of single-resolution and multi-resolution algorithms: noisy recording (guitar + castanets)
Noise reduction

Results of single-resolution and multi-resolution algorithms (figure panels: single resolution; multi-resolution)
Your questions?
Demo web page: http://www.izotope.com/tech/aes_adapt/
Poster session P17: Monday, 9:00 – 10:30 a.m.