Music Retrieval and Analysis Part I: Music Retrieval Arbee L.P. Chen


Music Retrieval and Analysis

Part I: Music Retrieval

Arbee L.P. Chen ISMIR’03 Tutorial III

Outline

- Technologies
  - Architecture for music retrieval
  - Music representations
  - Music query processing
  - Music indexing
  - Similarity measures
- Systems and evaluation
  - Existing systems: Meldex, Themefinder, SEMEX, PROMS, OMRAS
  - System evaluation
- Future research directions

Architecture for Music Retrieval

[Figure: architecture for music retrieval. Components: Users, Music Player, Music Query Interface, Music Storage Manager, Music Feature Extractor, Music Query Processor, Music Database, Music Index. Data flows: music query, query result, result music objects, music objects, music features.]

Music Representations

[Figure: object model for music data]
- Media; Music (info, acoustical, thematic)
- Music_Info: key, beat, tempo
- Acoustical: loudness, pitch, duration, brightness, bandwidth
- Thematic: theme*, rhythm, melody, chord
- Music_Wave, Music_MIDI, Music_AU
- Legend: IS-A relationship; composition relationship; * multi-valued attribute

Styles of Music Composition

- Monophony
  - Monophonic music has at most one note playing at any given time; before a new note starts, the previous note must have ended
- Homophony
  - Homophonic music has at most one set of notes playing at the same time. For any set of notes that start at the same time, no new note or notes may begin until every note in that set has ended
- Polyphony
  - Polyphonic music has no such restrictions. Any note or set of notes may begin before any previous note or set of notes has ended

Monophony Representations

- Absolute measure
  - Absolute pitch: C5 C5 D5 A5 G5 G5 G5 F5 G5
  - Absolute duration: 1 1 1 1 1 0.5 0.5 1 1
  - Absolute pitch and duration: (C5,1)(C5,1)(D5,1)(A5,1)(G5,1)(G5,0.5)(G5,0.5)(F5,1)(G5,1)
- Relative measure
  - Contour (in semitones): 0 +2 +7 -2 0 0 -2 +2
  - IOI (inter-onset interval) ratio: 1 1 1 1 0.5 1 2 1
  - Contour and IOI ratio: (0,1)(+2,1)(+7,1)(-2,1)(0,0.5)(0,1)(-2,2)(+2,1)
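The relative measures can be derived mechanically from the absolute ones. The following is a minimal sketch, not part of the original slides; the helper names and the pitch numbering (C4 = 60, so C5 = 72) are assumptions made for illustration, and the IOI is taken to equal the duration because the example has no rests.

```python
# Semitone offsets of the natural note names within an octave.
NOTE_OFFSETS = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}

def to_midi(name):
    """Convert a natural note name such as 'C5' to a MIDI-style number.

    Assumes the convention C4 = 60 (so C5 = 72); the slides do not fix one.
    """
    letter, octave = name[0], int(name[1:])
    return 12 * (octave + 1) + NOTE_OFFSETS[letter]

def contour(pitches):
    """Pitch differences, in semitones, between successive notes."""
    numbers = [to_midi(p) for p in pitches]
    return [b - a for a, b in zip(numbers, numbers[1:])]

def ioi_ratios(iois):
    """Ratio of each inter-onset interval to the one before it."""
    return [b / a for a, b in zip(iois, iois[1:])]

pitches = "C5 C5 D5 A5 G5 G5 G5 F5 G5".split()
durations = [1, 1, 1, 1, 1, 0.5, 0.5, 1, 1]  # no rests, so IOI = duration

melody_contour = contour(pitches)
melody_ratios = ioi_ratios(durations)
relative = list(zip(melody_contour, melody_ratios))  # contour and IOI ratio pairs
```

Because only differences and ratios are kept, the relative form is unchanged by transposition or by a uniform change of tempo.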

Polyphony Representations

- All information preservation
  - Keep all information of absolute pitch and duration (start_time, pitch, duration)
  - (1,C5,1)(2,C5,1)(3,D5,1)(3,A5,1)(4,F5,4)(5,C6,1)(6,G5,0.5)(6.5,G5,0.5)…
- Relative note representation
  - Record difference of start times and contour (ignore duration)
  - (1,0)(1,+2)(0,+7)(1,-4)…
- Monophonic reduction
  - Select one note at every time step (main melody selection)
  - (C5,1)(C5,1)(A5,1)(F5,1)(C6,1)...
- Homophonic reduction (chord reduction)
  - Select every note at every time step
  - (C5)(C5)(D5,A5)(F5)(C6)(G5)(G5)…
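The reductions above can be sketched in a few lines. This is an illustrative sketch only: the function names are hypothetical, pitches are written as MIDI-style numbers under the assumed convention C5 = 72, and picking the highest pitch at each onset is one possible main-melody rule (it happens to reproduce the slide's example), not necessarily the rule the author intended.

```python
from itertools import groupby

# (start_time, pitch, duration) triples from the slide; C5 = 72 is assumed.
notes = [(1, 72, 1), (2, 72, 1), (3, 74, 1), (3, 81, 1),
         (4, 77, 4), (5, 84, 1), (6, 79, 0.5), (6.5, 79, 0.5)]

def group_by_onset(notes):
    """Group notes that start together: [(onset, [notes...]), ...]."""
    ordered = sorted(notes)
    return [(onset, list(group))
            for onset, group in groupby(ordered, key=lambda n: n[0])]

def relative_representation(notes):
    """(start-time difference, pitch difference) between successive notes."""
    ordered = sorted(notes)
    return [(b[0] - a[0], b[1] - a[1]) for a, b in zip(ordered, ordered[1:])]

def homophonic_reduction(notes):
    """Keep every pitch at each onset, dropping timing information."""
    return [tuple(p for _, p, _ in group) for _, group in group_by_onset(notes)]

def monophonic_reduction(notes):
    """Keep one note per onset; choosing the highest pitch is an assumption."""
    groups = group_by_onset(notes)
    melody = []
    for k, (onset, group) in enumerate(groups):
        _, pitch, duration = max(group, key=lambda n: n[1])
        if k + 1 < len(groups):
            # Truncate a note that would overlap the next onset.
            duration = min(duration, groups[k + 1][0] - onset)
        melody.append((pitch, duration))
    return melody

relative = relative_representation(notes)
chords = homophonic_reduction(notes)
melody = monophonic_reduction(notes)
```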

Music Representation - Theme

- Theme
  - A short tune that is repeated or developed in a piece of music
- A small part of a musical work
  - Efficient retrieval
- A highly semantic representation
  - Effective retrieval
- Automatic theme extraction
  - Exact repeating patterns
  - Approximate repeating patterns

Music Representation – Markov Models

- Capture global information for a music piece
  - Repeating patterns
  - Sequential patterns
- A lossy representation
- Good for music classification

Markov Model Representation

- [Pickens and Crawford, CIKM'02]
- Homophonic reduction
- For each chord, compute its distance from the 24 lexical chords
- Capture statistical properties by Markov models
- The representation of each song is reduced to a matrix

Markov Model Representation (Cont.)

[Figure labels: Chord, Lexical chords, Markov model representation]

Music Query Processing

- On-line methods (string matching algorithms)
  - Exact string matching
    - Brute-force method
    - KMP algorithm
    - Boyer-Moore algorithm
    - Shift-Or algorithm
  - Partial string matching
    - Shift-Or algorithm
  - Approximate string matching
    - Dynamic programming

Brute-Force Method

- T: A5 B5 A5 C5 A5 B5 A5 B5
- P: A5 B5 A5 B5
- Time complexity O(mn)

Alignments tried, one per shift:

T:  A5 B5 A5 C5 A5 B5 A5 B5
    A5 B5 A5 B5
       A5 B5 A5 B5
          A5 B5 A5 B5
             A5 B5 A5 B5
                A5 B5 A5 B5
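The brute-force method tries every shift of P against T and compares symbol by symbol. A minimal sketch, with a hypothetical function name and 0-based positions:

```python
def brute_force_search(text, pattern):
    """Return every 0-based shift at which pattern occurs in text."""
    n, m = len(text), len(pattern)
    matches = []
    for shift in range(n - m + 1):          # up to n - m + 1 alignments
        j = 0
        while j < m and text[shift + j] == pattern[j]:
            j += 1                          # up to m comparisons each
        if j == m:
            matches.append(shift)
    return matches

T = "A5 B5 A5 C5 A5 B5 A5 B5".split()
P = "A5 B5 A5 B5".split()
positions = brute_force_search(T, P)
```

For this example all five alignments are tried before the single occurrence is reported at the last one.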

KMP Algorithm

- Left-to-right scan
- Failure function shift rule
- O(m+n)

Failure function for P:

P:     A5 B5 A5 B5
f(i):  0  0  1  2

Alignments:

T:  A5 B5 A5 C5 A5 B5 A5 B5
    A5 B5 A5 B5
       A5 B5 A5 B5              (skip this step)
          A5 B5 A5 B5
             A5 B5 A5 B5
                A5 B5 A5 B5
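A compact sketch of the failure function and the left-to-right scan follows. The function names are illustrative and positions are 0-based; the pattern is assumed to be non-empty.

```python
def failure_function(pattern):
    """f[i] = length of the longest proper prefix of pattern[:i+1]
    that is also a suffix of it."""
    f = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = f[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        f[i] = k
    return f

def kmp_search(text, pattern):
    """Return every 0-based position where pattern occurs in text."""
    f = failure_function(pattern)
    matches, k = [], 0          # k = number of pattern symbols matched so far
    for i, symbol in enumerate(text):
        while k > 0 and symbol != pattern[k]:
            k = f[k - 1]        # fall back without re-reading the text
        if symbol == pattern[k]:
            k += 1
        if k == len(pattern):
            matches.append(i - k + 1)
            k = f[k - 1]
    return matches

T = "A5 B5 A5 C5 A5 B5 A5 B5".split()
P = "A5 B5 A5 B5".split()
table = failure_function(P)
positions = kmp_search(T, P)
```

Each text symbol is read once, which is where the O(m+n) bound comes from.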

Boyer-Moore Algorithm

- If the pattern P is relatively long and the alphabet is reasonably large, this algorithm is likely to be the most efficient string matching algorithm
- Right-to-left scan
- Bad character shift rule
- Good suffix shift rule
- O(m+n)

Bad Character Shift Rule

Right-to-left scan; the bad character is the C5 in T:

T:  A5 B5 A5 C5 A5 B5 A5 B5
    A5 B5 A5 B5
       A5 B5 A5 B5              (skip these steps)
          A5 B5 A5 B5           (skip these steps)
             A5 B5 A5 B5        (skip these steps)
                A5 B5 A5 B5
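The following sketch applies the bad character rule on its own, which is a simplification: the full Boyer-Moore algorithm also uses the good suffix rule and takes the larger of the two shifts. The function name and the returned list of alignments are illustrative choices.

```python
def bad_character_search(text, pattern):
    """Right-to-left matching with the bad character shift rule only.

    Returns (matches, alignments): the 0-based match positions and the
    shifts that were actually examined.
    """
    n, m = len(text), len(pattern)
    last = {symbol: i for i, symbol in enumerate(pattern)}  # rightmost index
    matches, alignments, shift = [], [], 0
    while shift <= n - m:
        alignments.append(shift)
        j = m - 1
        while j >= 0 and pattern[j] == text[shift + j]:
            j -= 1                              # scan right to left
        if j < 0:
            matches.append(shift)
            shift += 1
        else:
            bad = text[shift + j]
            # Align the rightmost occurrence of the bad character in the
            # pattern with it, or jump past it if it never occurs.
            shift += max(1, j - last.get(bad, -1))
    return matches, alignments

T = "A5 B5 A5 C5 A5 B5 A5 B5".split()
P = "A5 B5 A5 B5".split()
matches, alignments = bad_character_search(T, P)
```

On the slide's example, C5 does not occur in P, so the pattern jumps from shift 0 directly to shift 4.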

Good Suffix Shift Rule

The good suffix is the matched "A5 B5" at the right end of P:

T:  A5 C5 A5 B5 A5 B5 C5 B5
    A5 B5 A5 B5
       A5 B5 A5 B5              (skip this step)
          A5 B5 A5 B5

Shift-Or Algorithm

An example of the shift-or algorithm for p=aab and s=abcaaab

Table T (rows: pattern positions; columns: alphabet symbols):

        a  b  c
   a    0  1  1
   a    0  1  1
   b    1  0  1

State vectors (components listed for pattern positions a, a, b):

Symbol read    S(E)     T[symbol]   E
(initial)                           1 1 1
a              0 1 1    0 0 1       0 1 1
b              0 0 1    1 1 0       1 1 1
c              0 1 1    1 1 1       1 1 1
a              0 1 1    0 0 1       0 1 1
a              0 0 1    0 0 1       0 0 1
a              0 0 0    0 0 1       0 0 1
b              0 0 0    1 1 0       1 1 0
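The state updates in this example can be reproduced with integer bit operations. A minimal sketch, assuming bit j of the state stands for pattern position j (0 = "the prefix ending here matches"); the function name and the returned trace are illustrative.

```python
def shift_or_trace(pattern, text):
    """Run shift-or and return (trace, ends).

    trace holds the state vector E after each text symbol, as a tuple of
    bits ordered by pattern position; ends holds the 0-based text indices
    at which an occurrence of the pattern ends.
    """
    m = len(pattern)
    full = (1 << m) - 1
    table = {}
    for symbol in set(pattern) | set(text):
        mask = full
        for j, p in enumerate(pattern):
            if p == symbol:
                mask &= ~(1 << j)       # bit j is 0 where pattern[j] == symbol
        table[symbol] = mask

    state, trace, ends = full, [], []   # initial E is all ones
    for i, symbol in enumerate(text):
        state = ((state << 1) | table[symbol]) & full   # E = S(E) OR T[symbol]
        trace.append(tuple((state >> j) & 1 for j in range(m)))
        if state & (1 << (m - 1)) == 0:                 # last bit 0: match
            ends.append(i)
    return trace, ends

trace, ends = shift_or_trace("aab", "abcaaab")
```

The single occurrence ends at the last text symbol, where the final component of E becomes 0.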

Shift-Or Algorithm for Partial Matching [Lemstrom and Perttu, ISMIR’00]

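An example of the shift-or algorithm for p=aab and s=(ab)(ca)(aab)

Table T (rows: pattern positions; columns: alphabet symbols):

        a  b  c
   a    0  1  1
   a    0  1  1
   b    1  0  1

State vectors (components listed for pattern positions a, a, b):

Set read    S(E)     Combined mask                 E
(initial)                                          1 1 1
(ab)        0 1 1    T[a]^T[b]      = 0 0 0        0 1 1
(ca)        0 0 1    T[c]^T[a]      = 0 0 1        0 0 1
(aab)       0 0 0    T[a]^T[a]^T[b] = 0 0 0        0 0 0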
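For a polyphonic source, each step reads a set of simultaneous symbols, and the masks of the symbols in the set are combined with a bitwise AND before the usual update. The sketch below follows the table above; it is an illustration under that reading, not the SEMEX implementation, and its names are hypothetical.

```python
def shift_or_sets(pattern, source):
    """Shift-or over a sequence of symbol sets (one set per time step).

    Returns (trace, ends): the state vector after each set, and the
    0-based indices of the sets at which an occurrence ends.
    """
    m = len(pattern)
    full = (1 << m) - 1

    def mask(symbol):
        value = full
        for j, p in enumerate(pattern):
            if p == symbol:
                value &= ~(1 << j)
        return value

    state, trace, ends = full, [], []
    for i, chord in enumerate(source):
        combined = full
        for symbol in chord:
            combined &= mask(symbol)    # a position is open if any symbol fits
        state = ((state << 1) | combined) & full
        trace.append(tuple((state >> j) & 1 for j in range(m)))
        if state & (1 << (m - 1)) == 0:
            ends.append(i)
    return trace, ends

trace, ends = shift_or_sets("aab", ["ab", "ca", "aab"])
```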

Approximate Matching

- In practical pattern matching applications, exact matching is not always suitable
- In the field of MIR, approximation is measured mainly by the edit distance: the minimal number of local edit operations needed to transform one music object into another
- The dynamic programming method serves this purpose

Edit Distance

- Unit-cost edit distance
  - $W(a \to b) = 1,\ a \neq b$ (replacement)
  - $W(a \to \varepsilon) = W(\varepsilon \to b) = 1$ (deletion and insertion), where $\varepsilon$ denotes the empty symbol
- Non-unit-cost edit distance (content-sensitive)
  - The costs of replacement, deletion and insertion can be any values, depending on the cost function
  - E.g., $W(m \to n) = 0.2$ and $W(a \to \varepsilon) = 0.8$

Dynamic Programming Method

- Given two strings S1 = abac and S2 = aaccb
- The edit distance is evaluated by dynamic programming
- The edit distance is 3 (1 deletion, 2 insertions)

Dynamic programming table (rows: S1; columns: S2):

          a   a   c   c   b
      0   1   2   3   4   5
  a   1   0   1   2   3   4
  b   2   1   1   2   3   3
  a   3   2   1   2   3   4
  c   4   3   2   1   2   3
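The table can be filled row by row with the standard recurrence. In this sketch the three cost callables are hypothetical hooks for the content-sensitive (non-unit) costs of the previous slide; when they are omitted, every operation costs 1.

```python
def edit_distance_table(s1, s2, sub_cost=None, del_cost=None, ins_cost=None):
    """Fill the dynamic-programming table for transforming s1 into s2."""
    sub_cost = sub_cost or (lambda a, b: 1)
    del_cost = del_cost or (lambda a: 1)
    ins_cost = ins_cost or (lambda b: 1)

    rows, cols = len(s1) + 1, len(s2) + 1
    table = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):                    # delete a prefix of s1
        table[i][0] = table[i - 1][0] + del_cost(s1[i - 1])
    for j in range(1, cols):                    # insert a prefix of s2
        table[0][j] = table[0][j - 1] + ins_cost(s2[j - 1])
    for i in range(1, rows):
        for j in range(1, cols):
            a, b = s1[i - 1], s2[j - 1]
            replace = table[i - 1][j - 1] + (0 if a == b else sub_cost(a, b))
            delete = table[i - 1][j] + del_cost(a)
            insert = table[i][j - 1] + ins_cost(b)
            table[i][j] = min(replace, delete, insert)
    return table

table = edit_distance_table("abac", "aaccb")
distance = table[-1][-1]   # bottom-right cell holds the edit distance
```

The time and space cost is proportional to the product of the two string lengths, which is why the method suits short queries best.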

Music Indexing

- Tree-based index (suffix tree)
- List-based index (1D-list)
- N-gram index
- Indexing Markov models?

Tree-Based Index

- [Chen, et al., ICME'00]
- Music objects are coded as strings of music segments
  - Four segment types to model music contour
  - Pitch and duration are considered
- Index structures
  - Augmented suffix tree
- Both incipit/partial and exact/approximate matching can be handled

Tree-Based Index (Cont.)

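[Figure: the four segment types (type A, type B, type C, type D), and an example melody plotted as note number (60, 62, 64, 65, 67) against beat, labelled with the segments (A, 1, +1), (B, 3, -3), (D, 3, -3), (B, 1, -2), (C, 1, +1), (C, 1, +2), (C, 1, +2)]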

Tree-Based Index (Cont.)

[Figure: (a) an example suffix tree, the suffix tree of the string S = "ABCAB"; (b) a 1-D augmented suffix tree]

List-Based Index

- [Liu, Hsu and Chen, ICMCS'99]
- Music objects are coded as melody strings
  - "so-mi-mi-fa-re-re-do-re-mi-fa-so-so-so"
- Melody strings are organized as linked lists
- Both incipit/partial and exact/approximate matching can be handled
  - Exact link, insertion link, dropout link, transposition link

List-Based Index (Cont.)

[Figure: list-based index over the melody symbols do, re, mi, fa, so, la, si, between start and end nodes, with linked entries such as 1:7, 1:5, 1:2 and 2:1; panels (a), (b) and (c)]

N-Gram Index

- A widely used technique in music databases
- Target strings are cut into index terms by a sliding window of length N
- The index can be implemented by various methods, e.g., an inverted file
- Queries are also cut into index terms of length N
- Searching and joining are then performed

N-Gram Index [Doraisamy and Ruger, ISMIR’02]

S = aabbcaab

Inverted file of 2-grams:

2-gram    Position
aa        1, 6
ab        2, 7
bb        3
bc        4
ca        5

Query = bbca
- Cut into 2-grams: bb, ca
- Positions: bb at 3, ca at 5
- Join: the substring is found from position 3 to position 6
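The example can be reproduced with a dictionary-based inverted file. This is a sketch under assumptions: the function names are hypothetical, positions are 1-based to match the slide, and a query whose length is not a multiple of N is handled by adding one overlapping final N-gram, a detail the slide does not specify.

```python
from collections import defaultdict

def build_index(s, n):
    """Inverted file: N-gram -> list of 1-based start positions in s."""
    index = defaultdict(list)
    for i in range(len(s) - n + 1):
        index[s[i:i + n]].append(i + 1)
    return index

def search(index, query, n):
    """Return (start, end) positions of query, found by joining N-grams.

    Assumes len(query) >= n.
    """
    offsets = list(range(0, len(query) - n + 1, n))
    if offsets[-1] != len(query) - n:
        offsets.append(len(query) - n)      # overlapping final N-gram
    candidates = set(index.get(query[:n], []))
    for offset in offsets[1:]:
        positions = set(index.get(query[offset:offset + n], []))
        # Join: keep starts whose later N-gram appears at the right distance.
        candidates = {p for p in candidates if p + offset in positions}
    return sorted((p, p + len(query) - 1) for p in candidates)

index = build_index("aabbcaab", 2)
hits = search(index, "bbca", 2)
```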

Similarity Measures

- The effectiveness of MIR depends on the similarity measure
- Edit distance (suitable for short queries)
- Difference between two probability matrices
- Note shift distance [Typke, et al., ISMIR'03]

Probability Matrix Distance

S1: CCCAABCB
S2: CCAAABCB

Kullback-Leibler (KL) divergence:

$$D(q \,\|\, d) = \sum_{q_i \in q,\; d_i \in d} \left( \sum_{x \in X} q_i(x) \log \frac{q_i(x)}{d_i(x)} \right)$$

- The value is 0 when the two matrices are the same
- q: query probability matrix
- d: data probability matrix
- i: row
- x: column

Probability matrix of S1:

        A      B      C
  A     0.5    0      0.25
  B     0.5    0      0.25
  C     0      1      0.5

Probability matrix of S2:

        A      B      C
  A     0.7    0      0.3
  B     0.3    0      0.3
  C     0      1      0.3

D(S1||S2) = 0.1092
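The example value can be checked numerically. The sketch below makes several assumptions that the slide does not state: matrices are stored as dictionaries keyed by the current symbol (so each inner dictionary is one column of the printed tables), terms with a zero query probability are skipped, the logarithm is base 10, and the S2 entries are taken as printed (rounded to one decimal). Under these assumptions the computed distance rounds to the slide's 0.1092.

```python
import math

def transition_matrix(s, alphabet):
    """First-order transition probabilities: matrix[current][next]."""
    counts = {a: {b: 0 for b in alphabet} for a in alphabet}
    for current, following in zip(s, s[1:]):
        counts[current][following] += 1
    matrix = {}
    for a in alphabet:
        total = sum(counts[a].values())
        matrix[a] = {b: (counts[a][b] / total if total else 0.0)
                     for b in alphabet}
    return matrix

def kl_distance(q, d):
    """Sum of KL divergences between matching distributions of q and d.

    Terms with q_i(x) = 0 contribute nothing. A zero in d where q is
    positive would make the divergence infinite; this sketch does not
    handle that case.
    """
    total = 0.0
    for state in q:
        for x, qx in q[state].items():
            if qx > 0:
                total += qx * math.log10(qx / d[state][x])   # base 10 assumed
    return total

q = transition_matrix("CCCAABCB", "ABC")
# S2 matrix with the values as printed on the slide.
d_printed = {"A": {"A": 0.7, "B": 0.3, "C": 0.0},
             "B": {"A": 0.0, "B": 0.0, "C": 1.0},
             "C": {"A": 0.3, "B": 0.3, "C": 0.3}}
distance = kl_distance(q, d_printed)
```

The divergence is not symmetric, so swapping the roles of query and data generally gives a different value.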

Probability Matrix Distance (Cont.)

- Ineffective for MIR with short queries
  - Do 0-entries in the query model mean unknown values?
  - Do 0-entries in the corpus model mean facts?
- Performance comparison with string matching is needed

Note Shift Distance

- Sum of the two-dimensional distances between the notes of the query and the notes of the answer

Music Retrieval Systems

- Music representations
- Music query processing
- Special features

Meldex

- [McNab, et al., D-Lib Magazine'97]
- "Melodic contour" or "pitch profile"
  - 113531 → RUUDD (R: Repeat, U: Up, D: Down)
- Approximate string matching
  - Dynamic programming
- Query by humming
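The contour coding in the example can be written as a one-pass comparison of neighbouring pitches. A minimal sketch with a hypothetical function name; it illustrates the R/U/D coding only and is not taken from Meldex itself.

```python
def contour_string(pitches):
    """Code each step between successive pitches as R, U or D."""
    symbols = []
    for previous, current in zip(pitches, pitches[1:]):
        if current == previous:
            symbols.append("R")     # repeat
        elif current > previous:
            symbols.append("U")     # up
        else:
            symbols.append("D")     # down
    return "".join(symbols)

profile = contour_string([1, 1, 3, 5, 3, 1])
```

Because interval sizes are discarded, a hummed query only needs to get the direction of each step right.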

Themefinder

- [Kornstadt, Computing in Musicology'98]
- Themes are selected manually
- Allows different query types
  - Pitch
  - Interval
  - Contour
- Provides exact matching only

SEMEX

- [Lemstrom and Perttu, ISMIR'00]
- The pattern is monophonic; the musical source is polyphonic
- Finding all positions of S (source) that have an occurrence of p
  - p=bca
  - S=< a , b>
- Shift-or algorithm
- No similarity function

PROMS

- [Clausen, et al., ISMIR'00]
- Representation by pitch and onset time (ignore duration)
- Index by inverted file
- Fault-tolerant music search
  - Allow missing notes
  - Allow fuzzy notes
  - Query=(b, (d or c), a, b)

OMRAS

- [Dovey, ISMIR'01]
- Searching in a "piano roll" model
- Gap-based dynamic programming
- Example (gap = 2):
  - Data: T0=<64,72,76>, T1=<60>, T2=<59,67,79>, T3=<55,63>, T4=<55,67,79>
  - Query: S0=<59,67>, S1=<55,67>

        S0   S1
  T0    0    0
  T1    0    0
  T2    2    0
  T3    1    0
  T4    0    2

[Figure: piano roll]

System Evaluation

- Traditional measures of effectiveness are precision and recall

$$\text{precision} = \frac{\text{number of retrieved references that are relevant}}{\text{number of references that are retrieved}}$$

$$\text{recall} = \frac{\text{number of retrieved references that are relevant}}{\text{number of relevant references}}$$
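Both measures follow directly from the set of retrieved items and the set of relevant items. A minimal sketch; the song identifiers are made up for illustration, and returning 0.0 for an empty set is a convention chosen here.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)    # retrieved references that are relevant
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical query result and ground truth.
retrieved = ["song1", "song2", "song3", "song4"]
relevant = ["song2", "song4", "song7"]
precision, recall = precision_recall(retrieved, relevant)
```

Computing the pair at successive cut-offs of a ranked result list gives the points of a recall-precision curve.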

The Recall-Precision Curve

A Platform for Evaluating MIR Systems

- Evaluation of various music retrieval approaches
  - Efficiency: response time
  - Effectiveness: recall-precision curve
- The Ultima project builds such a platform [Hsu, Chen and Chen, CIKM'02]
  - Same data set and query set for various approaches
  - Compare recall-precision curves

The Ultima Project

- Data store
- Query generation module
- Query processing module
- Result summarization module
- Report module
- Mediator

[Figure: Ultima architecture. Labels: 1D-List, APS, APM, Query Processing Module, Report Module, Summarization Module, Query Generation Module, SMF Table, Data Store (MS Access), to the Internet]

Future Research Directions

- Music retrieval based on music structure
- Music retrieval based on user's perceptiveness
- Similarity measure for polyphonic music
- A novel index structure for polyphony
- Fair evaluation method

Music Retrieval and Analysis

Part II: Music Analysis

Outline

- Music Segmentation and Structure Analysis
  - Local Boundary Detection
  - Repeating Pattern Discovery
  - Phrase Extraction
- Music Classification
- Music Recommendation Systems
- Future Research Directions

Local Boundary Detection

- [Cambouropoulos, ICMC'01]
- Segment music by local discontinuities between notes
- Calculate boundary strength values for each interval of a melodic surface, i.e., pitch, IOI, and rest, according to the strength of local discontinuities

Local Boundary Detection (Cont.)

- A music object m has a parametric profile $P_k$, which is represented as a sequence of n intervals
  - $P_k = [x_1, x_2, \dots, x_n]$ where $k \in \{\text{pitch}, \text{IOI}, \text{rest}\}$
  - Pitch intervals are measured in semitones
  - IOI and rest intervals are measured in milliseconds or numerical duration values
- IOI (inter-onset interval)
  - The amount of time between the onset of one note and the onset of the next note
- Rest
  - The amount of time between the offset of one note and the onset of the next note

[Figure: IOI, pitch and rest intervals]

Local Boundary Detection (Cont.)

- The degree of change r between two successive interval values $x_i$ and $x_{i+1}$ is:

$$r_{i,i+1} = \frac{|x_i - x_{i+1}|}{x_i + x_{i+1}} \quad \text{iff } x_i + x_{i+1} \neq 0 \text{ and } x_i, x_{i+1} \geq 0$$

$$r_{i,i+1} = 0 \quad \text{iff } x_i = x_{i+1} = 0$$

- The strength of the boundary $s_i$ for interval $x_i$ is:

$$s_i = x_i \times (r_{i-1,i} + r_{i,i+1})$$

- Overall local boundary strength based on the three intervals:

$$w_p \cdot s_{i(p)} + w_d \cdot s_{i(d)} + w_r \cdot s_{i(r)}$$

Local Boundary Detection (Cont.)
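The boundary strength computation can be sketched as follows. Assumptions made here, none of which come from the slides: the interval values are non-negative (for example, absolute pitch intervals), a missing neighbour at either end of a profile contributes r = 0, and the input sequences and weights in the example are hypothetical.

```python
def degree_of_change(x1, x2):
    """r for two successive non-negative interval values."""
    if x1 + x2 == 0:
        return 0.0
    return abs(x1 - x2) / (x1 + x2)

def boundary_strengths(intervals):
    """s_i = x_i * (r_{i-1,i} + r_{i,i+1}) for one parametric profile."""
    r = [degree_of_change(a, b) for a, b in zip(intervals, intervals[1:])]
    strengths = []
    for i, x in enumerate(intervals):
        left = r[i - 1] if i > 0 else 0.0       # assumed 0 at the start
        right = r[i] if i < len(r) else 0.0     # assumed 0 at the end
        strengths.append(x * (left + right))
    return strengths

def overall_strengths(pitch, ioi, rest, w_p, w_d, w_r):
    """Weighted sum of the pitch, IOI and rest boundary strengths."""
    s_p = boundary_strengths(pitch)
    s_d = boundary_strengths(ioi)
    s_r = boundary_strengths(rest)
    return [w_p * p + w_d * d + w_r * r for p, d, r in zip(s_p, s_d, s_r)]

# Hypothetical profiles for a five-note fragment (four intervals each).
pitch_intervals = [1, 1, 4, 1]
ioi_intervals = [1, 1, 2, 1]
rest_intervals = [0, 0, 0, 0]

pitch_strengths = boundary_strengths(pitch_intervals)
overall = overall_strengths(pitch_intervals, ioi_intervals, rest_intervals,
                            w_p=0.25, w_d=0.5, w_r=0.25)   # example weights
strongest = overall.index(max(overall))
```

In this example the large leap combined with the longer IOI at the third interval produces the strongest local boundary.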

Repeating Pattern Discovery

- A repeating pattern in music data is defined as a sequence of notes which appears more than once in a music object
- The themes or motives are typical kinds of repeating patterns
- Exact repeating patterns [Hsu, Liu and Chen, TMM'01]
  - By the string-join operator
- Approximate repeating patterns [Lartillot, ISMIR'03]
  - By detecting when their successive notes are sufficiently close and their borders contrast sufficiently with the outer environment

Exact Repeating Pattern Discovery

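S = 1234 A 2345 B 3456 C 123456

Repeating patterns by length (frequency in parentheses):

- Length 2: { 12(2), 23(3), 34(4), 45(3), 56(2) }
- Length 3: { 123(2), 234(3), 345(3), 456(2) }
- Length 4: { 1234(2), 2345(2), 3456(2) }

Nontrivial repeating patterns { }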
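The frequencies in this example can be verified by brute-force enumeration of substrings. Note that this is a plain counting sketch for checking the numbers, not the string-join operator of [Hsu, Liu and Chen, TMM'01], and it does not filter out trivial patterns; the function name is hypothetical.

```python
from collections import Counter

def repeating_patterns(s, min_length=2):
    """Map each substring that occurs at least twice to its frequency.

    Overlapping occurrences are counted. Enumerating all substrings is
    quadratic in the number of candidates, so this is only suitable for
    short strings.
    """
    counts = Counter(s[i:j]
                     for i in range(len(s))
                     for j in range(i + min_length, len(s) + 1))
    return {pattern: c for pattern, c in counts.items() if c >= 2}

S = "1234A2345B3456C123456"
patterns = repeating_patterns(S)
by_length = {}
for pattern, frequency in patterns.items():
    by_length.setdefault(len(pattern), {})[pattern] = frequency
```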

Approximate Repeating Pattern Discovery

- Step 1
  - Find all note pairs (n1, n2) which satisfy the pitch distance constraint
- Step 2
  - Group the similar note pairs: D((n1, n2), (n1', n2'))
- Step 3
  - Merge the adjacent note pairs for approximate repeating patterns
  - (n1, n2), (n2, n3): (n1, n2, n3)
  - (n1', n2'), (n2', n3'): (n1', n2', n3')

Approximate Repeating Pattern Discovery (Cont.)

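For a note pair $(n_1, n_2)$ with pitches $(p_1, p_2)$ and onsets $(o_1, o_2)$:

$$\Delta p = p_2 - p_1, \qquad \Delta o = o_2 - o_1$$

For a note pair $(n_1', n_2')$ with pitches $(p_1', p_2')$ and onsets $(o_1', o_2')$:

$$\Delta p' = p_2' - p_1', \qquad \Delta o' = o_2' - o_1'$$

- $p_i$ is the pitch value of $n_i$
- $o_i$ is the onset time of $n_i$

$$D((n_1, n_2), (n_1', n_2')) = (|\Delta p - \Delta p'| + 1) \times \left(\max\left\{\frac{\Delta o}{\Delta o'}, \frac{\Delta o'}{\Delta o}\right\}\right)^{0.7}$$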

Phrase Extraction

- Two features used for phrase extraction
  - Duration and rest
  - Melodic shapes [Huron, Computing in Musicology'95]
- Statistical information in Western folksongs
  - The most common length of a phrase is 8 notes
  - Half of all phrases are between 7 and 9 notes in length
  - Three-quarters of all phrases are between 6 and 10 notes in length

Phrase Extraction (Cont.)

- A: the pitch value of the first note in the target phrase
- B: the pitch value of the last note in the target phrase
- C: the average pitch value of the remaining notes in the target phrase

Contour Type    Number of Phrases    Percentage
Convex          13926                38.6%
Descending      10376                28.8%
Ascending       6983                 19.4%
Concave         3496                 9.7%
Others          1294                 3.5%

Arch shape definitions (comparison operators partly lost in extraction): AC>B; A C B; A>C B>C

Phrase Extraction (Cont.)

- Identify the positions of all the terminative notes
- Extract the music pieces according to the terminative notes
- Select the candidate music pieces for decomposition based on the length information
  - If the length ≤ 12, the music piece is marked as a phrase
  - If the length > 12, decompose the music piece into phrases
    - convex > descending > ascending > concave

Phrase Extraction (Cont.)

[Figure: a music piece with note numbers 64 62 60 57 55 67 67 69 67 69 64 62 ..., with labels x, y, z, tested against the contour types in order: Convex? No. Descending?]

Length = 12, A = 64, B = 62, C = 63.7

Order   Length of the prefix fragment   Pitch of the first note   Pitches of the remaining notes            Pitch of the last note
1       6                               64                        62, 60, 57, 55                            67
2       7                               64                        62, 60, 57, 55, 67                        67
3       8                               64                        62, 60, 57, 55, 67, 67                    69
4       9                               64                        62, 60, 57, 55, 67, 67, 69                67
5       10                              64                        62, 60, 57, 55, 67, 67, 69, 67            69
6       11                              64                        62, 60, 57, 55, 67, 67, 69, 67, 69        64
7       12                              64                        62, 60, 57, 55, 67, 67, 69, 67, 69, 64    62

Music Classification

- After music segmentation, different kinds of music units can be extracted from music objects, such as repeating patterns and phrases
- Different kinds of music units may have different semantics in musicology
- These extracted music units can be used in music classification, retrieval, and analysis

Music Recommendation Systems

- [Chen and Chen, CIKM'01]
- The results of music classification can be used for music-related services
- By analyzing the user access histories, we can discover which music classes the users may be interested in and which users belong to the same group
- By using different kinds of recommendation mechanisms, we can recommend music objects to the users

Architecture

[Figure: recommendation system architecture. Components: Users, Interface, Recommendation Module (CB Method, COL Method, STA Method), Profile Manager, Track Selector, Feature Extractor, Classifier, Database (Music Objects, Feature Points, Access Histories, Music Groups, User Groups). Data flows: music object, representative track, feature point.]

A polyphonic music object
• one melody track
• other accompaniment tracks

Recommendation Mechanisms

- Content-based filtering approach
  - Similarity between music objects and user profiles
  - Recommend the music objects that belong to the music groups the user has recently been interested in
- Collaborative filtering approach
  - Similarity between user profiles
  - Provide unexpected recommendations to the users in the same user group
- Statistical approach
  - Recommend "hot" music objects

Future Research Directions

- Polyphonic Music Segmentation
- Efficient Approximate Repeating Pattern Discovery
- Representation and Similarity Measure for Musical Structures
- Musical Style/Form Detection

References

[Camb01] Cambouropoulos, E., "The Local Boundary Detection Model (LBDM) and its Application in the Study of Expressive Timing," Proc. International Computer Music Conference, 2001.

[Chen00] Chen, A.L.P., M. Chang, J. Chen, J.L. Hsu, C.H. Hsu, and S.Y.S. Hua, "Query by Music Segments: An Efficient Approach for Song Retrieval," Proc. IEEE International Conference on Multimedia and Expo, 2000.

[Chen01] Chen, H.C. and A.L.P. Chen, "A Music Recommendation System Based on Music Data Grouping and User Interests," Proc. ACM International Conference on Information and Knowledge Management, 2001.

[Clau00] Clausen, M., R. Engelbrecht, D. Meyer, and J. Schmitz, "PROMS: A Web based Tool for Searching in Polyphonic Music," Proc. International Symposium on Music Information Retrieval, 2000.

[Dove01] Dovey, M.J., "A Technique for Regular Expression Style Searching in Polyphonic Music," Proc. International Symposium on Music Information Retrieval, 2001.

References (Cont.)

[Korn98] Kornstadt, A., "Themefinder: A Web-based Melodic Search Tool," Computing in Musicology, 11:231-236, 1998.

[Dora02] Doraisamy, S. and S.M. Ruger, "A comparative and fault-tolerance study of the use of n grams with polyphonic music," Proc. International Symposium on Music Information Retrieval, 2002.

[Hsu01] Hsu, J.L., C.C. Liu and A.L.P. Chen, "Discovering Nontrivial Repeating Patterns in Music Data," IEEE Transactions on Multimedia, Vol. 3, No. 3, 2001.

[Hsu02] Hsu, J.L., A.L.P. Chen and H.C. Chen, "The Effectiveness Study of Various Music Information Retrieval Approaches," Proc. ACM International Conference on Information and Knowledge Management, 2002.

[Huro95] Huron, D., "The Melodic Arch in Western Folksongs," Computing in Musicology, Volume 10, 1995.

References (Cont.)

[Lart03] Lartillot, O., "Discovering musical patterns through perceptive heuristics," Proc. International Symposium on Music Information Retrieval, 2003.

[Lems00] Lemstrom, K. and S. Perttu, "SEMEX-An Efficient Music Retrieval Prototype," Proc. International Symposium on Music Information Retrieval, 2000.

[Liu99] Liu, C.C., J.L. Hsu, and A.L.P. Chen, "An Approximate String Matching Algorithm for Content-Based Music Data Retrieval," Proc. IEEE International Conference on Multimedia Computing and Systems, 1999.

[Mcna97] McNab, R.J., L.A. Smith, D. Bainbridge, and I.H. Witten, "The New Zealand Digital Library: MELody inDEX," D-Lib Magazine, 1997.

[Pick02] Pickens, J., and T. Crawford, "Harmonic Models for Polyphonic Music Retrieval," Proc. ACM Conference on Information and Knowledge Management, 2002.