Transcript PPT
A Probabilistic Model for
Melody Segmentation
By Miguel Ferrand, Peter Nelson,
and Geraint Wiggins
Outline
Overview of this model
N-gram models and Entropy
A case study
Comparison with an experiment on real listeners
Discussion
Overview
A probabilistic approach to predicting segmentation boundaries in melodies
No knowledge of music theory is used in this model; it is a purely mathematical method
Entropy is used as a measure of the unpredictability of musical features
Hypothesis: segmentation boundaries appear where entropy changes
N-gram Models (1)
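N-gram grammar (an $(n-1)$th-order Markov model): the probability of a symbol depends only on the $n-1$ symbols that precede it.
The probability of a sequence $s = w_1 \ldots w_l$ of length $l$, where $w_i^j$ denotes $w_i \ldots w_j$ and $n$ is the order (indices below 1 are dropped):
$$P(s) = \prod_{i=1}^{l} P(w_i \mid w_{i-n+1}^{i-1})$$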
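A minimal sketch of the chain-rule product above, assuming unsmoothed maximum-likelihood estimates taken directly from counts on a training sequence. The function names `train`, `cond_prob` and `sequence_probability` are hypothetical, and the toy corpus "abab" stands in for a melody feature sequence.

```python
from collections import Counter
from typing import Hashable, Sequence, Tuple


def train(corpus: Sequence[Hashable], n: int) -> Tuple[Counter, Counter]:
    """Count every k-gram (k = 1..n) and how often each context is followed by a symbol."""
    grams: Counter = Counter()
    contexts: Counter = Counter()
    for k in range(1, n + 1):
        for i in range(len(corpus) - k + 1):
            gram = tuple(corpus[i:i + k])
            grams[gram] += 1
            contexts[gram[:-1]] += 1  # for k = 1 the context is () and counts all symbols
    return grams, contexts


def cond_prob(grams: Counter, contexts: Counter, context: tuple, symbol: Hashable) -> float:
    """Unsmoothed maximum-likelihood estimate of P(symbol | context)."""
    total = contexts[context]
    return grams[context + (symbol,)] / total if total else 0.0


def sequence_probability(seq: Sequence[Hashable], grams: Counter, contexts: Counter, n: int) -> float:
    """P(s) = prod_i P(w_i | w_{i-n+1}^{i-1}); early symbols use the shorter available context."""
    p = 1.0
    for i, symbol in enumerate(seq):
        context = tuple(seq[max(0, i - n + 1):i])
        p *= cond_prob(grams, contexts, context, symbol)
    return p


grams, contexts = train(list("abab"), n=2)
print(sequence_probability(list("aba"), grams, contexts, n=2))  # → 0.5
print(sequence_probability(list("aa"), grams, contexts, n=2))   # → 0.0
```

The zero for "aa" comes from a single unseen bigram, which is the data-sparseness problem. For long melodies, summing log-probabilities instead of multiplying avoids floating-point underflow.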
N-gram Models (2)
Problems:
Data sparseness: some $P(w_i \mid \ldots) = 0$
Longer n-grams occur less often, so with a small training corpus many of them are never seen
Solution: linear interpolation smoothing.
For a trigram model:
$$P(w_k \mid w_{k-2}, w_{k-1}) = \lambda_1 P(w_k) + \lambda_2 P(w_k \mid w_{k-1}) + \lambda_3 P(w_k \mid w_{k-2}, w_{k-1})$$
where $\lambda_1 + \lambda_2 + \lambda_3 = 1$ and $\lambda_1 < \lambda_2 < \lambda_3$
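A sketch of trigram interpolation smoothing, conditioning on the two preceding symbols as the right-hand side of the formula requires. The weights (0.1, 0.3, 0.6) are hypothetical values chosen only to satisfy the stated constraints, not the values used in the study, and the helper names are likewise hypothetical.

```python
from collections import Counter
from typing import Hashable, Sequence, Tuple

# Hypothetical weights: they sum to 1 and satisfy l1 < l2 < l3, but are not the paper's values.
LAMBDAS = (0.1, 0.3, 0.6)


def train(corpus: Sequence[Hashable], n: int = 3) -> Tuple[Counter, Counter]:
    """Count k-grams (k = 1..n) and how often each context has a successor."""
    grams: Counter = Counter()
    contexts: Counter = Counter()
    for k in range(1, n + 1):
        for i in range(len(corpus) - k + 1):
            gram = tuple(corpus[i:i + k])
            grams[gram] += 1
            contexts[gram[:-1]] += 1
    return grams, contexts


def mle(grams: Counter, contexts: Counter, context: tuple, symbol: Hashable) -> float:
    """Unsmoothed estimate of P(symbol | context); 0.0 for an unseen context."""
    total = contexts[context]
    return grams[context + (symbol,)] / total if total else 0.0


def interpolated(grams: Counter, contexts: Counter, history: Sequence[Hashable],
                 symbol: Hashable, lambdas=LAMBDAS) -> float:
    """l1*P(w_k) + l2*P(w_k | w_{k-1}) + l3*P(w_k | w_{k-2}, w_{k-1})."""
    l1, l2, l3 = lambdas
    h = tuple(history)[-2:]  # the two preceding symbols (fewer at the start of a melody)
    return (l1 * mle(grams, contexts, (), symbol)
            + l2 * mle(grams, contexts, h[-1:], symbol)
            + l3 * mle(grams, contexts, h, symbol))


grams, contexts = train(list("abcabd"))
print(mle(grams, contexts, ("a", "b"), "a"))           # unsmoothed trigram estimate: 0.0
print(interpolated(grams, contexts, ("a", "b"), "a"))  # small but non-zero
```

The unigram term keeps every observed symbol above zero probability. When all three contexts have been seen with a successor, the smoothed values for a context still sum to 1.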
Entropy
For an N-gram model M, the entropy $H_c(M)$ associated with a context $c$, where $e$ ranges over all possible successor symbols of $c$:
$$H_c(M) = -\sum_{e} P(e \mid c) \log_2 P(e \mid c)$$
$P(e \mid c)$ is calculated with the linear interpolation smoothing method.
Low entropy means high predictability: the probability mass is concentrated on a few successors.
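A small sketch of the context entropy, assuming the usual Shannon form $H_c(M) = -\sum_e P(e \mid c) \log_2 P(e \mid c)$ with a base-2 logarithm (the base is an assumption). The successor distribution is passed in directly here; in the model it would come from the interpolation-smoothed estimates. The name `context_entropy` is hypothetical.

```python
import math
from typing import Hashable, Mapping


def context_entropy(successors: Mapping[Hashable, float]) -> float:
    """H_c(M) = sum_e P(e|c) * log2(1 / P(e|c)); zero-probability successors contribute nothing."""
    return sum(p * math.log2(1.0 / p) for p in successors.values() if p > 0)


print(context_entropy({"a": 0.25, "b": 0.25, "c": 0.25, "d": 0.25}))  # → 2.0 (least predictable over 4 symbols)
print(context_entropy({"a": 0.5, "b": 0.5}))                           # → 1.0
print(context_entropy({"a": 1.0}))                                     # → 0.0 (fully predictable)
```

Writing the term as `p * log2(1/p)` is equivalent to `-p * log2(p)` and keeps every term non-negative.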
A case study (1)
Deliège’s experiment
Subjects listened to a melody and had to identify segmentation points in real time. (Stimulus: the English horn solo from Wagner’s Tristan and Isolde)
Subjects included both musically trained and untrained listeners.
Listeners identified 8 main segment boundaries.
A case study (2)
Melody information is translated into an event-based representation:
Pitch Step (PS): interval to the following event, in semitones
Pitch Contour (PC): the sign of PS, {-1, 0, +1}
Duration Ratio (DR): ratio of the duration of the present event to that of the following event
Duration Contour (DC): direction of duration change, derived from DR: -1 if DR > 1; +1 if DR < 1; 0 if DR = 1
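The four features can be sketched as follows, assuming a monophonic melody without rests, given as hypothetical (MIDI pitch, duration) pairs with durations in beats. Two conventions are assumptions rather than details from the slides: PS is the following pitch minus the present one (so ascending steps are positive), and DR is the present duration divided by the following one. `Fraction` keeps the ratios exact so the DR = 1 test is reliable. The name `derive_features` is hypothetical.

```python
from fractions import Fraction
from typing import Dict, List, Sequence, Tuple


def sign(x) -> int:
    """Return -1, 0 or +1 according to the sign of x."""
    return (x > 0) - (x < 0)


def derive_features(events: Sequence[Tuple[int, Fraction]]) -> Dict[str, List]:
    """Compute PS, PC, DR and DC for each event relative to the event that follows it."""
    pitches = [p for p, _ in events]
    durs = [Fraction(d) for _, d in events]
    steps = range(len(events) - 1)  # the last event has no successor
    ps = [pitches[i + 1] - pitches[i] for i in steps]  # pitch step in semitones
    pc = [sign(x) for x in ps]                         # pitch contour
    dr = [durs[i] / durs[i + 1] for i in steps]        # duration ratio (present / following)
    dc = [-sign(r - 1) for r in dr]                    # -1 if DR > 1, +1 if DR < 1, 0 if DR = 1
    return {"PS": ps, "PC": pc, "DR": dr, "DC": dc}


events = [(60, 1), (62, 1), (62, Fraction(1, 2)), (59, 2)]
f = derive_features(events)
print(f["PS"], f["PC"], f["DC"])  # → [2, 0, -3] [1, 0, -1] [0, -1, 1]
```

Each feature sequence is one element shorter than the melody, because every value describes the step from an event to the next one.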
A case study (3)
A case study (4)
Trigram, bigram and unigram models were generated for each of PS, PC, DR and DC.
The standard deviation of entropy was calculated over a sliding window (size = 10).
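A sketch of the sliding-window statistic, assuming the window is measured in events, that only full windows are reported, and that the population standard deviation is meant; the slides specify none of these details. The entropy profile below is invented toy data, and `sliding_std` is a hypothetical name.

```python
import statistics
from typing import List, Sequence


def sliding_std(entropies: Sequence[float], window: int = 10) -> List[float]:
    """Population standard deviation of every run of `window` consecutive entropy values."""
    if window < 1:
        raise ValueError("window must be at least 1")
    return [statistics.pstdev(entropies[i:i + window])
            for i in range(len(entropies) - window + 1)]


# Toy entropy profile: a stable stretch followed by a jump.
profile = [1.0] * 10 + [3.0] * 10
spread = sliding_std(profile)
print(spread.index(max(spread)))  # → 5, the window straddling the jump evenly
```

Output index i corresponds to the window starting at event i, so a peak has to be shifted by about half a window to align it with a position in the melody.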
Results
A case study (5)
A case study (6)
Result
Duration-based features have a much higher entropy variance than pitch-based features.
Therefore time-based features are likely to carry more information for segmentation.
Distinct changes in entropy coincided with the melody segment boundaries indicated by listeners.
Discussion
The N-gram model may be oversimplified for music sequences:
each state depends only on the previous $n-1$ states.
However, human memory is not unlimited either;
the ability to establish long-span temporal relations is limited.