Transcript Meeting

AFP System Flowchart
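Offline: Database Building
Database songs → extract landmarks → convert to hash keys and hash values → hash table
Online: Music Retrieval
Query snippet → extract landmarks → convert to hash keys → hash table lookup → statistical analysis → return results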
1/32
Landmark Identification
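Query snippet → spectrogram → peaks → landmarks
• Signal processing: compute the spectrogram of the query snippet
• Find the peaks with higher energy
• Form landmarks: pair an anchor point with a peak inside its target zone
[Figure: spectrogram (Time vs. Frequency) showing an anchor point and its target zone, with marks at 134 and 152 on the time axis and at 203 and 219 on the frequency axis]
Landmark
t1 | t2 | f1 | f2
134 | 152 | 203 | 219
... | ... | ... | ...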
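The pairing step on this slide can be sketched as follows. This is a minimal Python illustration, not the implementation behind the slides: the function name `form_landmarks`, the target-zone limits `max_dt` and `max_df`, and the `fan_out` cap are assumptions introduced here, and peak picking is taken as already done.

```python
from typing import List, Tuple

Peak = Tuple[int, int]                 # (time frame, frequency bin)
Landmark = Tuple[int, int, int, int]   # (t1, f1, t2, f2)

def form_landmarks(peaks: List[Peak],
                   max_dt: int = 127,  # assumed target-zone width in frames
                   max_df: int = 127,  # assumed target-zone half-height in bins
                   fan_out: int = 3) -> List[Landmark]:
    """Pair each anchor peak with up to `fan_out` later peaks in its target zone."""
    peaks = sorted(peaks)              # sort by time, then frequency
    landmarks = []
    for i, (t1, f1) in enumerate(peaks):
        paired = 0
        for t2, f2 in peaks[i + 1:]:
            if t2 - t1 > max_dt:
                break                  # peaks are time-sorted, so later ones are too far
            if t2 > t1 and abs(f2 - f1) <= max_df:
                landmarks.append((t1, f1, t2, f2))
                paired += 1
                if paired == fan_out:
                    break
    return landmarks

# The first two peaks use the values shown on the slide; the third is made up.
peaks = [(134, 203), (152, 219), (300, 50)]
landmarks = form_landmarks(peaks)
print(landmarks)
```

With these toy peaks, only the anchor at (134, 203) finds a partner inside its zone, which gives the single landmark row shown in the table.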
2/32
Hash Table Construction

Computation of hash key and hash value
Hash Key (24 bits): f1 (9 bits) | Δf (8 bits) | Δt (7 bits)
Hash Value (32 bits): SongID (18 bits) | t1 (14 bits)
Hash Table
[Figure: hash table in which each hash key (0, 1, 2, ..., 14250, ...) points to its stored hash values; example hash values shown: 101055, 249028, 1067857, 249514, 542650, 1199420, ...]
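The bit layout above can be illustrated with a short packing sketch in Python. The field widths come from the slide; the field order (f1 and SongID in the most significant bits), the wrap-around handling of a negative Δf, and all function names are assumptions made for this example.

```python
F1_BITS, DF_BITS, DT_BITS = 9, 8, 7   # hash key: 24 bits in total
SONG_BITS, T1_BITS = 18, 14           # hash value: 32 bits in total

def make_hash_key(f1: int, f2: int, t1: int, t2: int) -> int:
    """Pack f1, delta-f and delta-t into one 24-bit integer."""
    df = (f2 - f1) & ((1 << DF_BITS) - 1)   # assumption: negative delta-f wraps modulo 2**8
    dt = (t2 - t1) & ((1 << DT_BITS) - 1)
    f1 &= (1 << F1_BITS) - 1
    return (f1 << (DF_BITS + DT_BITS)) | (df << DT_BITS) | dt

def make_hash_value(song_id: int, t1: int) -> int:
    """Pack the song ID and the landmark start time into one 32-bit integer."""
    song_id &= (1 << SONG_BITS) - 1
    t1 &= (1 << T1_BITS) - 1
    return (song_id << T1_BITS) | t1

def split_hash_value(value: int):
    """Recover (song_id, t1) from a packed hash value."""
    return value >> T1_BITS, value & ((1 << T1_BITS) - 1)

key = make_hash_key(f1=203, f2=219, t1=134, t2=152)
value = make_hash_value(song_id=444, t1=134)
print(key, value, split_hash_value(value))
```

A 14-bit t1 can address 16384 frames and an 18-bit SongID can address 262144 songs, so longer songs or larger databases would need a different split.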
3/32
Retrieval Process
• Convert query landmarks to hash keys
• Retrieve the hash values
• Derive the song ID and landmark start time
• Find the number of time-consistent landmarks → match landmark count (MLC)
• Use MLC for final ranking

[Figure: the hash keys of the query snippet (2, 50, ..., 14250) are looked up in the hash table (hash keys 0, 1, 2, ..., 14250, ..., 18279, ...) to retrieve the stored hash values, e.g. 101055, 249028, 1067857, 249514, 542650, 1199420, ...]
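The first three retrieval steps can be sketched in Python as below. The hash table is modelled as a plain dictionary from hash key to a list of packed hash values, with the song ID assumed to sit above the 14-bit t1 field; the function name `lookup` and the toy table contents are hypothetical.

```python
from collections import defaultdict

T1_BITS = 14  # assumed layout: low 14 bits hold t1, the bits above hold the song ID

def lookup(hash_table, query):
    """query: list of (hash_key, q_t1) pairs from the query snippet.

    Returns {song_id: [(db_t1, q_t1), ...]} for every matching hash key.
    """
    hits = defaultdict(list)
    for key, q_t1 in query:
        for value in hash_table.get(key, []):      # keys absent from the table yield nothing
            song_id = value >> T1_BITS
            db_t1 = value & ((1 << T1_BITS) - 1)
            hits[song_id].append((db_t1, q_t1))
    return dict(hits)

# Toy hash table: the keys echo the slide, the stored values are made up.
hash_table = {
    2: [(7 << T1_BITS) | 120, (9 << T1_BITS) | 40],
    50: [(7 << T1_BITS) | 135],
}
query = [(2, 20), (50, 35), (14250, 60)]
hits = lookup(hash_table, query)
print(hits)
```

Each returned pair keeps both start times, which is what the time-consistency check in the next step needs.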
4/32
Offset Time

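Offset time = DbStartTime − QStartTime
[Figure: a query with landmarks at QStartTime1 and QStartTime2 aligned against a song with landmarks at DbStartTime1 and DbStartTime2, giving offset times a and b]
Song ID | Offset time | Hash key
444 | 17 | 54372
… | … | …
496 | 158 | 17936
… | … | …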
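One way to turn offset times into a match landmark count is to histogram the offsets per song and take the height of the tallest bin. The sketch below assumes that reading of "time-consistent"; the function name and the toy start times are made up for illustration.

```python
from collections import Counter

def matched_landmark_count(pairs):
    """pairs: list of (db_start_time, q_start_time) for one candidate song.

    Returns (most_common_offset, count); the count is treated as the MLC here.
    """
    offsets = Counter(db_t - q_t for db_t, q_t in pairs)
    if not offsets:
        return None, 0
    return offsets.most_common(1)[0]

# Toy candidates: three landmarks of song 444 agree on offset 17, one is spurious.
candidates = {
    444: [(37, 20), (52, 35), (77, 60), (300, 41)],
    496: [(178, 20), (90, 35)],
}
scores = {song: matched_landmark_count(p) for song, p in candidates.items()}
ranking = sorted(scores, key=lambda s: scores[s][1], reverse=True)
print(scores, ranking)
```

A true match produces many landmarks sharing one offset, while chance hash collisions scatter across many offsets, which is why the peak count separates the two.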
5/32
Match Frequency Count 1 (MFC1)
MFC1: number of landmarks in the DB that have t1 and f1 similar to those of the query landmarks (with a tolerance of ±2 frames)
• Restrictions on the identified landmarks:
  • Within the right interval
  • With the same hash key
[Figure: MFC1 of one query against three songs]
Song | MFC1
SongA | 8
SongB | 4
SongC | 3
6/32
Match Frequency Count 2 (MFC2)
MFC2: almost the same as MFC1, with fewer restrictions
• Restriction: within the right interval only
[Figure: MFC2 of one query against three songs]
Song | MFC2
SongA | 13
SongB | 4
SongC | 3
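The difference between the two counts can be sketched in Python as follows. The slides do not spell out the matching rule, so this is only one plausible reading: the "right interval" is taken to be the query's time span after alignment by the offset time, the ±2 tolerance is applied to both t1 and f1, and each DB landmark is counted at most once. All names and the toy data are hypothetical.

```python
def match_frequency_count(query, song, offset, tol=2, same_key=True):
    """Count landmarks of `song` that line up with some query landmark.

    query, song: lists of (t1, f1, hash_key); offset: DbStartTime - QStartTime.
    same_key=True gives an MFC1-style count, same_key=False an MFC2-style count.
    """
    q_start = min(t for t, _, _ in query)
    q_end = max(t for t, _, _ in query)
    count = 0
    for t1, f1, key in song:
        t_aligned = t1 - offset                  # map DB time onto the query time axis
        if not (q_start - tol <= t_aligned <= q_end + tol):
            continue                             # outside the interval covered by the query
        for qt, qf, qkey in query:
            close = abs(t_aligned - qt) <= tol and abs(f1 - qf) <= tol
            if close and (qkey == key or not same_key):
                count += 1
                break                            # count each DB landmark at most once
    return count

# Toy data: the second song landmark is close in t1 and f1 but has another hash key.
query = [(10, 100, 1), (20, 150, 2), (30, 200, 3)]
song = [(27, 100, 1), (38, 151, 9), (47, 200, 3), (500, 100, 1)]
mfc1 = match_frequency_count(query, song, offset=17, same_key=True)
mfc2 = match_frequency_count(query, song, offset=17, same_key=False)
print(mfc1, mfc2)
```

Under this reading, dropping the hash-key restriction requires access to each song's landmarks by time rather than by hash key, which would explain the extra stored information that MFC2 needs.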
7/32
Comparison between MFC1 & MFC2
MFC1
• Within the right interval
• Same hash key as the query landmarks
• No need to store extra info
MFC2
• Within the right interval
• May have a different hash key from the query landmarks
• Need to store extra info
• More discriminant
[Figure: the MFC1 method and the MFC2 method applied to the same query and Song A; MFC1: 5, MFC2: 11]
8/32
Learning to Rank
Use machine learning for ranking, with three paradigms
• Pointwise approach
  • A is right and B is wrong
  • Ex. PRank
• Pairwise approach
  • A>B, C>D, A>D, etc.
  • Ex. Ranking SVM
• Listwise approach
  • A>B>C>D…
  • Ex. ListNet
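The three paradigms differ mainly in what a single training loss looks at: one candidate, a pair of candidates, or the whole list. The Python sketch below shows one representative loss per paradigm. The pointwise squared error is a generic stand-in and not PRank itself; the pairwise hinge is of the kind Ranking SVM optimizes; the listwise loss is a top-one cross-entropy in the style of ListNet. Function names are assumptions.

```python
import math

def pointwise_loss(score, label):
    """Squared error between one candidate's score and its 0/1 relevance label."""
    return (score - label) ** 2

def pairwise_hinge_loss(score_better, score_worse, margin=1.0):
    """Penalize a pair when the preferred item does not win by `margin`."""
    return max(0.0, margin - (score_better - score_worse))

def listwise_top1_loss(scores, labels):
    """Cross-entropy between the softmax of the labels and the softmax of the scores."""
    def softmax(xs):
        m = max(xs)                              # subtract the max for numerical stability
        exps = [math.exp(x - m) for x in xs]
        total = sum(exps)
        return [e / total for e in exps]
    p_true, p_pred = softmax(labels), softmax(scores)
    return -sum(t * math.log(p) for t, p in zip(p_true, p_pred))

labels = [1, 0, 0]                               # candidate A is the correct song
good = listwise_top1_loss([3.0, 1.0, 0.0], labels)
bad = listwise_top1_loss([0.0, 1.0, 3.0], labels)
print(pairwise_hinge_loss(2.0, 0.5), pairwise_hinge_loss(0.5, 2.0), good < bad)
```

In a re-ranking setting, the scores would come from a model fed with per-candidate features such as MLC and MFC.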

9/32
Experimental Settings
OS | Windows 7 Enterprise, 64-bit
RAM | 8 GB main memory
CPU | Intel® Core™ i7-4770 (3.40 GHz)
Programming language | MATLAB
10/32
Corpora for the Experiments
Datasets
Item | Baina | George
Size | 500 songs | 10000
File duration | 3-10 minutes | 30 sec to 10 minutes
Total duration | 38 hours 22 minutes | 636 hours 41 minutes
Languages | Mandarin and English | 952 from GTZAN dataset plus other 9048 noisy mp3, in English and Mandarin
Audio format | Mono/stereo, mp3/wav, 44.1 kHz, 16 bits (both datasets)
Query sets
Item | Baina | George
Size | 1412 | 1062
Query duration | About 10 sec | About 10 sec
Total duration | 3 hours 55 minutes | 2 hours 57 minutes
Source | Recordings of 5 clips in a very noisy environment, chopped into 1042 10-sec segments (with 9-sec overlap) | Recordings of 345 clips in a noisy environment, chopped into 1062 10-sec segments (without overlap)
11/32
Experimental Results Using Baina Dataset
Re-ranking is invoked when the difference between the MLCs of the top-2 candidates is larger than 15
• Only the top-10 candidates are re-ranked

Baina dataset
Method | Accuracy (%) | Error reduction rate (%)
Original | 86.83 |
MFC1 | 89.02 | 16.63
MFC2 | 91.78 | 37.59
Ranking SVM | 91.997 | 39.23
ListNet | 91.997 | 39.23
[Figure: Accuracy (%) vs. Time (second) on the Baina dataset for original, MFC1, MFC2, Ranking SVM and ListNet; accuracy axis from 86 to 92, time axis from 0.33 to 0.4]
Example recording
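The error reduction rates reported for the Baina dataset appear consistent with the usual definition, the share of the baseline's errors that a method removes. The Python sketch below applies that definition to the accuracies on this slide; the function name is an assumption, and the definition itself is inferred from the numbers rather than stated in the slides.

```python
def error_reduction_rate(baseline_acc: float, new_acc: float) -> float:
    """Percentage of the baseline's errors removed by the new method."""
    baseline_err = 100.0 - baseline_acc
    new_err = 100.0 - new_acc
    return 100.0 * (baseline_err - new_err) / baseline_err

# Accuracies (%) from the Baina results; the original system scores 86.83.
baina = {"MFC1": 89.02, "MFC2": 91.78, "Ranking SVM": 91.997, "ListNet": 91.997}
rates = {name: round(error_reduction_rate(86.83, acc), 2)
         for name, acc in baina.items()}
print(rates)
```

Rounded to two decimals, this gives 16.63, 37.59 and 39.23, matching the reported column.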
12/32
Experimental Results Using George Dataset

Use the same condition for re-ranking
George dataset
Method | Accuracy (%) | Error reduction rate (%)
Original | 83.52 |
MFC1 | 84.46 | 5.71
MFC2 | 85.78 | 13.71
Ranking SVM | 85.88 | 14.29
ListNet | 85.78 | 13.71
[Figure: Accuracy (%) vs. Time (second) on the George dataset for original, MFC1, MFC2, Ranking SVM and ListNet; accuracy axis from 83.5 to 86, time axis from 11.2 to 11.55]
13/32