Poster transcript

AUDIO TONALITY MODE CLASSIFICATION
WITHOUT TONIC ANNOTATIONS
Zhiyao Duan1,2, Lie Lu1, and Changshui Zhang2
1. Microsoft Research Asia (MSRA), China.
2. Department of Automation, Tsinghua University, China.
Summary
Tonality mode classification for popular songs, where only the mode is labeled in the training data.
• Traditional key-finding algorithms often rely on tonic annotations of the training songs.
• Keys of popular songs are hard to obtain.
• It is easier to label the mode than the key of a song.
• Mode is more important than tonic.
An alignment approach transposes chroma features to a reference (but unknown) tonic.
• Three methods for mode learning:
  • Single Profile Correlation (SPC)
  • Multiple Profile Correlation (MPC)
  • Support Vector Machine (SVM)
Learning and Classification
• Key: C-major, a-minor, Eb-major, etc.
• Mode: major/minor
• Tonic: C, C#, D, etc.
Single Profile Correlation (SPC):
1. In training, each mode is represented by one chroma profile, using a 12-d or 7-d feature.
2. Each element of the 7-d profile corresponds to a diatonic note of the 12-d profile.
3. In testing, circularly shift the chroma vector of an excerpt 12 times.
4. Correlate the shifted vectors against the major/minor profiles; the most highly correlated profile indicates the mode.
5. Majority voting of excerpts for song mode.

Algorithm Flow (figure)
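A minimal Python sketch of the SPC training and testing steps above, assuming the training chroma vectors have already been transposed to a common tonic per mode by the alignment step described later; the helper names (train_spc, classify_excerpt_spc, etc.) are illustrative, not from the poster. The 7-d variant would simply keep only the diatonic elements of the learned profiles.

```python
# SPC sketch (assumption: training chroma vectors are already aligned per mode).
import numpy as np

def _corr(a, b):
    # Normalized inner product between two chroma vectors.
    return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)

def train_spc(major_aligned, minor_aligned):
    """One chroma profile per mode: the mean of the aligned training vectors."""
    return np.mean(major_aligned, axis=0), np.mean(minor_aligned, axis=0)

def classify_excerpt_spc(chroma, major_profile, minor_profile):
    """Shift the excerpt chroma 12 times; the best-correlated profile gives the mode."""
    best = max(((mode, _corr(np.roll(chroma, -j), prof))
                for j in range(12)
                for mode, prof in (('major', major_profile), ('minor', minor_profile))),
               key=lambda t: t[1])
    return best[0]

def classify_song_spc(excerpt_chromas, major_profile, minor_profile):
    """Majority vote over excerpt decisions gives the song-level mode."""
    votes = [classify_excerpt_spc(c, major_profile, minor_profile)
             for c in excerpt_chromas]
    return max(set(votes), key=votes.count)
```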
Feature Extraction and Alignment
Chroma feature extraction:
1. Divide a song into excerpts (15 s, 30 s, or the whole song).
2. In each frame (130 ms, with a 10 ms shift) of an excerpt, a 48-bin CQT over the frequency range 130 Hz (C3) to 1975 Hz (B6) is calculated.
3. For each excerpt, a 12-d chroma vector is calculated from the average CQT vector.
4. Each chroma vector is normalized.
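A rough Python sketch of this extraction pipeline, using librosa's CQT as a stand-in for the 48-bin CQT described; the 130 ms analysis window is left to the CQT filter lengths rather than set explicitly, and the function names are illustrative.

```python
# Chroma extraction sketch (assumptions noted above).
import numpy as np
import librosa

def excerpt_chroma(y, sr, hop_s=0.010):
    """Return one normalized 12-d chroma vector for an audio excerpt."""
    # Hop of ~10 ms, rounded to a multiple of 8 samples as required by the 4-octave CQT.
    hop = max(8, int(round(hop_s * sr / 8)) * 8)
    # 48-bin CQT, 12 bins/octave, from C3 (~130 Hz) up to B6 (~1976 Hz).
    C = np.abs(librosa.cqt(y, sr=sr, hop_length=hop,
                           fmin=librosa.note_to_hz('C3'),
                           n_bins=48, bins_per_octave=12))
    avg_cqt = C.mean(axis=1)                     # average CQT over all frames of the excerpt
    chroma = avg_cqt.reshape(4, 12).sum(axis=0)  # fold the 4 octaves into 12 pitch classes
    return chroma / (np.linalg.norm(chroma) + 1e-12)  # normalize

def song_chromas(y, sr, excerpt_s=15.0):
    """Split a song into fixed-length excerpts and return one chroma vector per excerpt."""
    n = int(excerpt_s * sr)
    return [excerpt_chroma(y[i:i + n], sr) for i in range(0, len(y) - n + 1, n)]
```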
Alignment:
To transpose chroma vectors within each mode
to a reference (but unknown) tonic.
Criterion: maximize the overall correlation of the aligned vectors with their average,

  max_{j_1, ..., j_N}  sum_i  <S^{j_i}(c_i), q> / ( ||S^{j_i}(c_i)|| · ||q|| )

where <·, ·> is the inner product; ||·|| is the norm; S^j(c) is the transposition of c, obtained by circularly shifting its items j positions to the left; S^{j_i}(c_i) is the i-th aligned vector; and q is the average vector of the aligned vectors.
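For concreteness, a small numpy sketch of the shift operator S^j and the normalized correlation used in this criterion (variable and function names are illustrative, not from the poster):

```python
import numpy as np

def shift(c, j):
    """S^j(c): circularly shift the chroma vector j positions to the left."""
    return np.roll(c, -j)

def corr(a, b):
    """Normalized inner product <a, b> / (||a|| * ||b||)."""
    return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)

def best_shift(c, q):
    """Shift j that maximizes the correlation of S^j(c) with the reference vector q."""
    return max(range(12), key=lambda j: corr(shift(c, j), q))
```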
Multiple Profile Correlation (MPC):
1. In training, K profiles (12-d or 7-d) represent each mode, obtained with a K-kernel Gaussian Mixture Model.
2. In testing, circularly shift the chroma vector of an excerpt 12 times to generate 12 vectors.
3. Correlate the shifted vectors with the major/minor profiles (Eq. (6)). The maximum or the weighted summation of the correlations defines the confidence score; the highest confidence score indicates the mode.
4. Majority voting of excerpts for song mode.
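A hedged sketch of MPC using scikit-learn's GaussianMixture. Since Eq. (6) is not reproduced in this transcript, the confidence score below is an assumption: the mixture-weight-weighted sum of correlations with the K mean profiles (the "weighted summation" option); the "maximum" option would take a max over the K correlations instead.

```python
# MPC sketch (assumption: K profiles = means of a K-component GMM fit per mode).
import numpy as np
from sklearn.mixture import GaussianMixture

def _corr(a, b):
    return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)

def train_mpc(aligned_chromas, K=4):
    """Fit a K-kernel GMM to the aligned chroma vectors of one mode."""
    return GaussianMixture(n_components=K, covariance_type='diag',
                           random_state=0).fit(np.asarray(aligned_chromas))

def mpc_confidence(chroma_shifted, gmm):
    """Mixture-weight-weighted sum of correlations with the K mean profiles."""
    return sum(w * _corr(chroma_shifted, mu)
               for w, mu in zip(gmm.weights_, gmm.means_))

def classify_excerpt_mpc(chroma, gmm_major, gmm_minor):
    """Best confidence over the 12 shifts and the two modes decides the mode."""
    best = max(((mode, mpc_confidence(np.roll(chroma, -j), g))
                for j in range(12)
                for mode, g in (('major', gmm_major), ('minor', gmm_minor))),
               key=lambda t: t[1])
    return best[0]
```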
Support Vector Machine (SVM):
1. In training, train an SVM on the training chroma vectors.
2. In testing, circularly shift the chroma vector of an excerpt 12 times to generate 12 vectors.
3. Classify each shifted vector; the label of the one with the highest classification confidence is assigned to the excerpt.
4. Majority voting of excerpts for song mode.
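A sketch of the SVM route, assuming scikit-learn's SVC and the magnitude of its decision function as the classification confidence; the training vectors are assumed to be the aligned chroma vectors of both modes.

```python
# SVM sketch (assumptions: sklearn SVC; |decision_function| as confidence).
import numpy as np
from sklearn.svm import SVC

def train_svm(aligned_major, aligned_minor):
    X = np.vstack([aligned_major, aligned_minor])
    y = np.array([1] * len(aligned_major) + [0] * len(aligned_minor))  # 1 = major, 0 = minor
    return SVC(kernel='rbf').fit(X, y)

def classify_excerpt_svm(chroma, svm):
    """Classify all 12 shifted vectors; keep the label of the most confident one."""
    shifts = np.array([np.roll(chroma, -j) for j in range(12)])
    conf = np.abs(svm.decision_function(shifts))   # distance from the classification surface
    labels = svm.predict(shifts)
    return 'major' if labels[int(np.argmax(conf))] == 1 else 'minor'

def classify_song_svm(excerpt_chromas, svm):
    """Majority vote over excerpt decisions gives the song-level mode."""
    votes = [classify_excerpt_svm(c, svm) for c in excerpt_chromas]
    return max(set(votes), key=votes.count)
```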
The issue of inter-class alignment:
The alignment between the vectors of the two modes also needs to be considered in the training phase.
Three alignment methods:
a) Make the major profile and the minor profile have the same tonic (see Fig. 2(a)). (bad)
b) Make major and minor "relative" (see Fig. 2(b)), such as C-major and a-minor. (bad)
c) Make the profiles of major and minor correlate the least, i.e., lie furthest apart, as in Fig. 2(c). (good)
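A possible sketch of Method (c): after intra-class alignment, the minor-mode set is shifted as a whole by the offset that makes the two mean profiles correlate least; the exact procedure on the poster may differ, so treat this as illustrative.

```python
# Inter-class alignment, Method (c): least-correlated / furthest-apart profiles.
import numpy as np

def _corr(a, b):
    return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)

def interclass_align(major_aligned, minor_aligned):
    p_major = np.mean(major_aligned, axis=0)
    p_minor = np.mean(minor_aligned, axis=0)
    # Offset that makes the two profiles correlate least (furthest apart).
    j = min(range(12), key=lambda k: _corr(p_major, np.roll(p_minor, -k)))
    minor_shifted = [np.roll(c, -j) for c in minor_aligned]
    return np.asarray(major_aligned), np.asarray(minor_shifted)
```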
A greedy method for alignment:
1. Initialization of the average vector q.
2. Align the training chroma vectors to q and update q, one by one.
Notes:
1. After N updates, the resulting q is used to initialize the average vector again, and Step 2 is performed once more.
2. The calculated average vector is stable even when the order of the training chroma vectors is randomly permuted.
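A sketch of this greedy alignment, under the assumption that q is initialized with the first training chroma vector (the poster's exact initialization is not legible in this transcript):

```python
# Greedy intra-class alignment sketch (assumption: q initialized with the first vector).
import numpy as np

def shift(c, j):
    return np.roll(c, -j)

def corr(a, b):
    return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)

def align_mode(chromas):
    """Greedily transpose all chroma vectors of one mode to a common (unknown) tonic."""
    q = np.asarray(chromas[0], dtype=float).copy()    # Step 1: initialize the average vector q
    aligned = []
    for c in chromas:                                  # Step 2: align and update one by one
        j = max(range(12), key=lambda k: corr(shift(c, k), q))
        aligned.append(shift(c, j))
        q = np.mean(aligned, axis=0)                   # update the running average vector
    # After N updates, q re-initializes the average and Step 2 is performed once more.
    q_final, aligned = q, []
    for c in chromas:
        j = max(range(12), key=lambda k: corr(shift(c, k), q_final))
        aligned.append(shift(c, j))
    return np.asarray(aligned), np.mean(aligned, axis=0)
```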
Explanation:
1) For Methods (a) and (b), the distance between the major profile and the minor profile in the training set is small, i.e., the training samples of the major and minor modes are close to each other in the feature space. Therefore, it is hard for the SVM to find a good classification surface between the two modes.
2) The decisive (shifted) chroma vector among the 12 shifted vectors of a test excerpt is the one furthest from the classification surface. This makes the distribution of the decisive test vectors different from that of the training vectors in Methods (a) and (b).
3) For Method (c), this alignment, together with the intra-class alignment, can be seen as analogous to minimizing the intra-class distance while maximizing the inter-class distance.
Experiments
Materials:
• 4,528 songs (2,786 major and 1,742 minor).
• Various genres, including rock, electronica, folk, country, jazz, etc.
• Songs with ambiguous modes or major-minor modulations were discarded.
• Training set: 25%; test set: 75%.
Results:
1. In SPC, profiles learned from aligned features > those without alignment or Krumhansl's profiles.
2. MPC > SPC.
3. 15 s and 30 s excerpts > the whole song.
4. 7 diatonic elements > 12 elements.

1. The least-correlation (furthest-apart) criterion works best.
2. 15 s and 30 s excerpts > the song level.
3. SVM > profile correlation methods.
4. The best result, obtained with SVM, reaches 78.2%.
Future Work
1. How to design a key-independent feature for mode classification?
2. How to exploit temporal information to improve mode model building?