RTpitchtrack - McGill University
Overview of Real-Time
Pitch Tracking
Approaches
Music information retrieval seminar
McGill University
Francois Thibault
Presentation Goals
Describe the requirements of real-time (RT) pitch
tracking algorithms for musical applications
Briefly introduce key developments in RT
pitch tracking algorithms
Provide insight on what techniques might
be more suitable for a given application
Pitch tracking requirements
in musical context
Must often function in real-time
Minimal output latency
Accuracy in the presence of noise
Frequency resolution
Flexibility and adaptability to various musical
requirements:
Pitch range
Dynamic range
…
Overview of techniques
Time-domain methods
Autocorrelation Function (Rabiner 77)
Average Magnitude Difference Function (AMDF)
Fundamental Period Measurement (Kuhn 90)
Frequency-domain methods
Cepstrum (Noll 66)
Harmonic Product Spectrum (Schroeder 68)
Constant-Q transform (Brown 92)
Least-Squares fitting (Choi 97)
Maximum Likelihood (McAulay 86, Puckette 98)
Other approaches…
Autocorrelation method
Based on the fact that a periodic signal will
correlate strongly with itself offset by the
fundamental period
Measures the extent to which a signal correlates
with a time-shifted version of itself
The time shifts at which the ACF displays peaks
correspond to likely period estimates
$$\phi(\tau) = \frac{1}{N} \sum_{n=0}^{N-1} x(n)\, x(n+\tau)$$
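The ACF sum can be sketched in a few lines of plain Python. This is a minimal illustration, not a tuned tracker: the function names, the synthetic test tone, and the choice to restrict the search to one pitch range (so that multiples of the period do not compete) are all assumptions made for the example.

```python
import math

def acf(x, max_lag):
    """phi(tau) = (1/N) * sum_n x(n) * x(n + tau), for tau = 0..max_lag."""
    n = len(x) - max_lag  # number of products that stay inside the frame
    return [sum(x[i] * x[i + tau] for i in range(n)) / n
            for tau in range(max_lag + 1)]

def acf_pitch(x, sample_rate, f_min, f_max):
    """Return the frequency whose lag gives the largest ACF value in range."""
    lag_min = int(sample_rate / f_max)
    lag_max = int(sample_rate / f_min)
    phi = acf(x, lag_max)
    best_lag = max(range(lag_min, lag_max + 1), key=lambda tau: phi[tau])
    return sample_rate / best_lag

sample_rate = 8000
# Synthetic tone: 200 Hz fundamental plus a weaker second harmonic.
frame = [math.sin(2 * math.pi * 200 * i / sample_rate)
         + 0.5 * math.sin(2 * math.pi * 400 * i / sample_rate)
         for i in range(400)]
acf_estimate = acf_pitch(frame, sample_rate, f_min=120, f_max=500)
```

Because the period is an integer number of samples (40) here, the estimate lands exactly on the fundamental; for high pitches the integer lag grid becomes coarse, which is the resolution problem noted in the pros/cons.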
Autocorrelation Pros/Cons
Simple implementation (good for hardware)
Can handle poor quality signals (phase
insensitive)
Often requires preprocessing (spectral
flattening)
Poor resolution for high frequencies
Analysis parameters hard to tune
Uncertainty between peaks generated by
formants and periodicity of sound can lead to
wrong estimation
AMDF
Again based on the idea that a periodic
signal will be similar to itself when shifted
by the fundamental period
Similar in concept to ACF, but looks at the
difference with a time-shifted version of itself
The time shifts which display valleys
correspond to likely period estimates
$$\psi(\tau) = \frac{1}{N} \sum_{n=0}^{N-1} \left| x(n) - x(n+\tau) \right|$$
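A minimal sketch of the AMDF, assuming the same kind of synthetic test tone as a stand-in for real input. The function names and the search range are illustrative choices, not part of the original method description.

```python
import math

def amdf(x, max_lag):
    """psi(tau) = (1/N) * sum_n |x(n) - x(n + tau)|, for tau = 0..max_lag."""
    n = len(x) - max_lag  # number of differences that stay inside the frame
    return [sum(abs(x[i] - x[i + tau]) for i in range(n)) / n
            for tau in range(max_lag + 1)]

def amdf_pitch(x, sample_rate, f_min, f_max):
    """Return the frequency whose lag gives the deepest AMDF valley in range."""
    lag_min = int(sample_rate / f_max)
    lag_max = int(sample_rate / f_min)
    psi = amdf(x, lag_max)
    best_lag = min(range(lag_min, lag_max + 1), key=lambda tau: psi[tau])
    return sample_rate / best_lag

sample_rate = 8000
# Synthetic tone: 200 Hz fundamental plus a weaker second harmonic.
frame = [math.sin(2 * math.pi * 200 * i / sample_rate)
         + 0.5 * math.sin(2 * math.pi * 400 * i / sample_rate)
         for i in range(400)]
amdf_estimate = amdf_pitch(frame, sample_rate, f_min=120, f_max=500)
```

Only subtractions and absolute values are needed, with no multiplications, which is why the AMDF is attractive for hardware.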
AMDF Pros/Cons
Poor frequency resolution
Even simpler implementation than ACF
(good for hardware)
Less computationally expensive than ACF
Combination of AMDF and ACF yields a
result more robust to noise (Kobayashi 95)
$$f(\tau) = \frac{\phi(\tau)}{\psi(\tau) + k}$$
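The combined function divides the ACF by the AMDF, so lags where the ACF peaks and the AMDF dips reinforce each other. The sketch below is a hedged illustration only: the value k = 1.0, the function name, and the noise-free test tone are assumptions, and no claim is made that this matches the parameters used by Kobayashi.

```python
import math

def weighted_acf(x, max_lag, k=1.0):
    """f(tau) = phi(tau) / (psi(tau) + k); k > 0 avoids division by zero."""
    n = len(x) - max_lag
    out = []
    for tau in range(max_lag + 1):
        phi = sum(x[i] * x[i + tau] for i in range(n)) / n        # ACF term
        psi = sum(abs(x[i] - x[i + tau]) for i in range(n)) / n   # AMDF term
        out.append(phi / (psi + k))
    return out

sample_rate = 8000
frame = [math.sin(2 * math.pi * 200 * i / sample_rate)
         + 0.5 * math.sin(2 * math.pi * 400 * i / sample_rate)
         for i in range(400)]
lag_min, lag_max = 16, 66  # roughly 500 Hz down to 120 Hz at 8 kHz
f = weighted_acf(frame, lag_max)
combined_lag = max(range(lag_min, lag_max + 1), key=lambda tau: f[tau])
combined_estimate = sample_rate / combined_lag
```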
Fundamental Period
Measurement approach
Signal is first run through a bank of half-octave
bandpass filters
If the filters are sharp enough, the output of one filter
should display the input waveform freed of its
upper partials (nearly sinusoidal)
It is up to a decision algorithm to decide which
filter output corresponds to the fundamental
frequency
The time between zero crossings of that filter output
determines the period
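The last step, measuring the time between zero crossings, can be sketched as follows. The filter bank and the decision algorithm are deliberately left out: the example assumes a nearly sinusoidal filter output has already been selected, and the function name and the use of linear interpolation between samples are illustrative choices.

```python
import math

def zero_crossing_period(y, sample_rate):
    """Mean time in seconds between successive upward zero crossings of y."""
    crossings = []
    for i in range(1, len(y)):
        if y[i - 1] < 0.0 <= y[i]:
            # Linear interpolation locates the crossing between two samples.
            frac = -y[i - 1] / (y[i] - y[i - 1])
            crossings.append((i - 1 + frac) / sample_rate)
    if len(crossings) < 2:
        return None  # not enough crossings in this frame
    return (crossings[-1] - crossings[0]) / (len(crossings) - 1)

sample_rate = 8000
# Stand-in for the selected filter output: a clean 220 Hz sinusoid.
filtered = [math.sin(2 * math.pi * 220 * i / sample_rate + 1.0)
            for i in range(800)]
period = zero_crossing_period(filtered, sample_rate)
fpm_estimate = 1.0 / period
```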
FPM Pros/Cons
Easy implementation (hardware and
software)
Efficiency of computation
Decision algorithm highly dependent on
thresholds
But, automatic threshold setting provided
for most situations
Cepstrum approach
Tool often used in speech processing
Cepstrum is defined as the power spectrum of
the logarithm of the power spectrum
Clearly separates the contributions of the vocal tract and
the excitation
A strong peak is displayed in the excitation part
(high-quefrency region of the cepstrum) at the quefrency
equal to the fundamental period
Use a peak picker on the cepstrum and translate
the peak quefrency into fundamental frequency
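A minimal sketch of cepstral pitch detection, following the definition above (two FFTs per frame). The small recursive FFT, the `eps` guard against log(0), the search range, and the synthetic ten-harmonic tone are assumptions made so the example runs with the standard library alone; a real tracker would window the frame and use an optimized FFT.

```python
import cmath
import math

def fft(a):
    """Recursive radix-2 FFT; len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return list(a)
    even, odd = fft(a[0::2]), fft(a[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def power_cepstrum(x, eps=1e-12):
    """Power spectrum of the log power spectrum (eps avoids log(0))."""
    log_power = [math.log(abs(v) ** 2 + eps) for v in fft(x)]
    return [abs(v) ** 2 for v in fft(log_power)]

def cepstrum_pitch(x, sample_rate, f_min, f_max):
    """Pick the strongest cepstral peak and convert quefrency to frequency."""
    ceps = power_cepstrum(x)
    q_min = int(sample_rate / f_max)
    q_max = int(sample_rate / f_min)
    best_q = max(range(q_min, q_max + 1), key=lambda q: ceps[q])
    return sample_rate / best_q

sample_rate = 8000
# Synthetic tone: 125 Hz fundamental with ten harmonics of amplitude 1/h.
frame = [sum(math.sin(2 * math.pi * h * 125 * i / sample_rate) / h
             for h in range(1, 11))
         for i in range(1024)]
cepstrum_estimate = cepstrum_pitch(frame, sample_rate, f_min=80, f_max=400)
```

The peak sits at a quefrency of 64 samples, the fundamental period, so the frequency is recovered as the sample rate divided by that quefrency.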
Cepstrum Pros/Cons
Less confusion between candidates than
in ACF
Proven method, especially suitable for
signals easily characterized by source-filter
models (e.g. voice)
Relatively computationally intensive (2
FFTs)
Harmonic Product Spectrum
approach
Measures the maximum coincidence of
harmonics for each spectral frame
Resulting periodic correlation array is
searched for its maximum, which should
correspond to the fundamental frequency
A further algorithm is run for octave correction
$$Y(\omega) = \prod_{r=1}^{R} \left| X(\omega r) \right|$$
$$\hat{Y} = \max_{\omega_i} Y(\omega_i)$$
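The product over harmonically compressed spectra can be sketched directly on FFT bins. This is an illustration under stated assumptions: four harmonics in the product, no zero padding, no octave-correction pass, and a synthetic tone whose fundamental falls exactly on a bin. The function names are hypothetical.

```python
import cmath
import math

def fft(a):
    """Recursive radix-2 FFT; len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return list(a)
    even, odd = fft(a[0::2]), fft(a[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def hps_pitch(x, sample_rate, num_harmonics=4):
    """Return the frequency of the bin k that maximizes prod_r |X(k * r)|."""
    magnitude = [abs(v) for v in fft(x)]
    n = len(x)
    k_max = (n // 2) // num_harmonics  # keep k * r below the Nyquist bin

    def product(k):
        p = 1.0
        for r in range(1, num_harmonics + 1):
            p *= magnitude[k * r]
        return p

    best_k = max(range(1, k_max), key=product)  # skip the DC bin
    return best_k * sample_rate / n

sample_rate = 8000
# Synthetic tone: 125 Hz fundamental with ten harmonics of amplitude 1/h.
frame = [sum(math.sin(2 * math.pi * h * 125 * i / sample_rate) / h
             for h in range(1, 11))
         for i in range(1024)]
hps_estimate = hps_pitch(frame, sample_rate)
```

The bin spacing here is 8000 / 1024, about 7.8 Hz, which illustrates why low pitches need zero padding or interpolation.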
HPS Pros/Cons
Simple to implement
Does well under wide variety of conditions
Poor low frequency resolution
Computational cost increased by the zero
padding required for interpolation of low
frequencies
Requires post-processing for error
correction
Constant-Q transform
approach
First computes the Constant-Q transform
to obtain constant pattern in log frequency
domain (Q = fc/bw)
Compute the cross-correlation with a fixed
comb pattern (ideal partial positions for
given fundamental frequency)
Peak-pick the result to obtain fundamental
frequency
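On a log-frequency axis the harmonics of any fundamental sit at fixed bin offsets, which is what makes a single comb pattern usable for all pitches. The sketch below shows only that pattern-matching step on a hand-built magnitude frame; it does not compute a constant-Q transform. The values 24 bins per octave, 55 Hz lowest bin, five harmonics, and the synthetic frame are assumptions for illustration.

```python
import math

def comb_offsets(bins_per_octave, num_harmonics):
    """Bin offsets of harmonics 1..H above the fundamental on a log axis."""
    return [round(bins_per_octave * math.log2(h))
            for h in range(1, num_harmonics + 1)]

def comb_correlation(cq_magnitudes, offsets):
    """Cross-correlate a constant-Q magnitude frame with the fixed comb."""
    last = len(cq_magnitudes) - max(offsets)
    return [sum(cq_magnitudes[k + off] for off in offsets)
            for k in range(last)]

bins_per_octave = 24
# Q = fc / bw, with bw taken as the spacing to the next bin.
q_factor = 1.0 / (2 ** (1.0 / bins_per_octave) - 1.0)
f_lowest = 55.0  # assumed centre frequency of bin 0

offsets = comb_offsets(bins_per_octave, 5)

# Synthetic constant-Q frame: unit peaks at the partials of a tone in bin 30.
cq_frame = [0.0] * 120
for off in offsets:
    cq_frame[30 + off] = 1.0

scores = comb_correlation(cq_frame, offsets)
best_bin = max(range(len(scores)), key=lambda k: scores[k])
cq_estimate = f_lowest * 2 ** (best_bin / bins_per_octave)
```

Bins an octave above or below the true fundamental also collect a partial score, which is the octave sensitivity listed in the pros/cons.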
Constant-Q Pros/Cons
Complexity of constant-Q reduced but
still… (Brown and Puckette 91)
Sensitive to octave errors
Other peaks could be candidates
Least-Squares fitting
approach
Perform least-squares spectral analysis:
minimize the error by fitting sinusoids to the signal
segment
Strong sinusoidal components are identified as
sharp valleys in the least-squares error signal
Relatively few evaluations of the error signal are
required to identify a valley
Fundamental frequency is obtained as the average
of the partial frequencies divided by their partial numbers
Uses rectangular windowing to provide faster
response
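The two ideas above, a valley in the least-squares error and the averaging of partial frequencies over partial numbers, can be sketched as follows. This is not the efficient evaluation scheme of the cited method: it fits a single sinusoid by brute force on a coarse candidate grid, and the function names, grid, and test values are assumptions.

```python
import math

def ls_residual(x, freq, sample_rate):
    """Residual energy after the best least-squares fit of one sinusoid."""
    c = [math.cos(2 * math.pi * freq * i / sample_rate) for i in range(len(x))]
    s = [math.sin(2 * math.pi * freq * i / sample_rate) for i in range(len(x))]
    cc = sum(v * v for v in c)
    ss = sum(v * v for v in s)
    cs = sum(a * b for a, b in zip(c, s))
    xc = sum(a * b for a, b in zip(x, c))
    xs = sum(a * b for a, b in zip(x, s))
    det = cc * ss - cs * cs          # 2x2 normal equations
    a = (xc * ss - xs * cs) / det    # cosine amplitude
    b = (xs * cc - xc * cs) / det    # sine amplitude
    return sum((xi - a * ci - b * si) ** 2 for xi, ci, si in zip(x, c, s))

def fundamental_from_partials(partials):
    """Average of partial frequency divided by partial number."""
    return sum(freq / number for number, freq in partials) / len(partials)

sample_rate = 8000
# Short rectangular-windowed frame: 64 samples of a 440 Hz sinusoid.
frame = [0.8 * math.sin(2 * math.pi * 440 * i / sample_rate + 0.3)
         for i in range(64)]
candidates = [400.0 + 5.0 * j for j in range(17)]  # 400..480 Hz
valley = min(candidates, key=lambda fc: ls_residual(frame, fc, sample_rate))

# Hypothetical measured partials as (partial number, frequency in Hz).
ls_estimate = fundamental_from_partials([(1, 219.0), (2, 442.0), (3, 660.0)])
```

The frame is only 8 ms long, which is the short-segment property that makes the approach attractive for low-latency use.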
LS fitting Pros/Cons
Operates on shorter frame segments
Best option for real-time applications with
minimum latency requirements
Efficient evaluation scheme allows
reasonable computation complexity
Maximum Likelihood
Maximum likelihood algorithm searches
through a set of possible ideal spectra and
chooses the closest match (Noll 69)
Was adapted to sinusoidal modeling
theory by finding the best fit of harmonic
partial sets to the measured model
(McAulay 86)
Enhance discrimination by suppressing
partials of small amplitude values
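The search over candidate harmonic sets can be illustrated with a toy score. To be clear about the assumptions: this is not the likelihood function of McAulay or Puckette. It simply sums the amplitudes of measured partials that coincide with the first few harmonics of each candidate, after discarding weak partials. The function name, tolerance, threshold, harmonic count, candidate grid, and the partial list are all hypothetical.

```python
def harmonic_match_score(f0, partials, num_harmonics=6,
                         tol_hz=5.0, min_amp=0.05):
    """Sum the amplitudes of measured partials lying on harmonics of f0."""
    score = 0.0
    for h in range(1, num_harmonics + 1):
        target = h * f0
        for freq, amp in partials:
            # Partials below min_amp are suppressed before matching.
            if amp >= min_amp and abs(freq - target) <= tol_hz:
                score += amp
    return score

# Hypothetical measured partials (frequency in Hz, amplitude).
# The 220 Hz fundamental itself is missing; 1000 Hz is a weak spurious peak.
partials = [(440.0, 1.0), (660.0, 0.7), (880.0, 0.5), (1000.0, 0.02)]

candidates = [float(f) for f in range(100, 501, 10)]
ml_estimate = max(candidates,
                  key=lambda f: harmonic_match_score(f, partials))
```

The candidate at 220 Hz wins even though no partial was measured there, which mirrors the claim that a fundamental can be guessed with partials missing. The outcome depends on the number of harmonics in the template: a longer template would let subharmonic candidates catch up, so practical systems add further weighting.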
ML Pros/Cons
Inherits high computational requirement
from sinusoidal modeling
Very robust estimation
Allows guess of fundamental frequency
even with several partials missing.
Other approaches
Neural Nets (Barnard 91)
Hidden Markov Models (Doval 91)
Parallel processing approaches (Rabiner
69)
Fourier of Fourier transforms (Marchand
2001)
Two-way mismatch model (Cano 98)
Subharmonic to harmonic ratio (Sun 2000)
Conclusions
A lot of research still… motivated by speech
telecommunication
Abundant literature since 1950
Complete and objective performance
overviews seem to be missing
Combination of techniques in parallel
processing seems foreseeable with
today’s fast computers