Transcript Meeting
[1/32] AFP System Flowchart

Offline (database building):
  database songs -> extract landmarks -> convert to hash keys and hash values -> hash table

Online (music retrieval):
  query snippet -> extract landmarks -> convert to hash keys -> hash table lookup -> statistical analysis -> return results

[2/32] Landmark Identification

Steps: query snippet spectrogram -> find peaks -> form landmarks
- Signal processing: find the peaks with higher energy.
- Form landmarks: pair an anchor point with the peaks inside its target zone.

[Figure: spectrogram (frequency vs. time) showing an anchor point, its target zone, and one landmark]

Example landmark:
  t1    t2    f1    f2
  134   152   203   219
  ...   ...   ...   ...

[3/32] Hash Table Construction

Computation of hash key and hash value:
- Hash key (24 bits): f1 (9 bits), Δf (8 bits), Δt (7 bits)
- Hash value (32 bits): SongID (18 bits), t1 (14 bits)

[Figure: hash table. Hash keys 0, 1, 2, ..., 14250, ... each map to a list of hash values; values shown include 101055, 249028, 1067857, 249514, 542650, 1199420, ...]

[4/32] Retrieval Process

1. Convert the query landmarks to hash keys.
2. Retrieve the hash values.
3. Derive the song ID and landmark start time.
4. Find the number of time-consistent landmarks (match landmark count, MLC).
5. Use MLC for the final ranking.

[Figure: hash keys of the query snippet (2, 50, 14250) looked up in the hash table from the previous slide, which also shows key 18279]

[5/32] Offset Time

Offset time = DbStartTime − QStartTime

[Figure: a query with landmarks starting at QStartTime1 and QStartTime2, aligned against a song with landmarks starting at DbStartTime1 and DbStartTime2; the two offset times are labeled a and b]

  Song ID   Offset time   Hash key
  444       17            54372
  …         …             …
  496       158           17936
  …         …             …

[6/32] Match Frequency Count 1 (MFC1)

MFC1: the number of landmarks in the DB that have similar t1 and f1 to the query landmarks (with a tolerance of ±2 frames).

Restrictions on the identified landmarks:
- Within the right interval
- With the same hash key

[Figure: MFC1 of a query against three songs: Song A 8, Song B 4, Song C 3]

[7/32] Match Frequency Count 2 (MFC2)

MFC2: almost the same as MFC1, but with fewer restrictions.

Restriction:
- Within the right interval only

[Figure: MFC2 of a query against three songs: Song A 13, Song B 4, Song C 3]

[8/32] Comparison between MFC1 and MFC2

MFC1:
- Within the right interval
- Same hash key as the query landmarks
- No need to store extra info

MFC2:
- Within the right interval
- May have a different hash key from the query landmarks
- Needs to store extra info
- More discriminant

[Figure: the same query matched against Song A with both methods: MFC1 = 5, MFC2 = 11]

[9/32] Learning to Rank

Use machine learning for ranking, with three paradigms:
- Pointwise approach: A is right and B is wrong. Example: PRank
- Pairwise approach: A>B, C>D, A>D, etc. Example: RankingSVM
- Listwise approach: A>B>C>D… Example: ListNet

[10/32] Experimental Settings

  OS                     Windows 7 Enterprise, 64-bit
  RAM                    8 GB main memory
  CPU                    Intel® Core™ i7-4770 (3.40 GHz)
  Programming language   MATLAB

[11/32] Corpora for the Experiments

Datasets:
                   Baina                    George
  Size             500 songs                10000
  File duration    3-10 minutes             30 sec to 10 minutes
  Total duration   38 hours 22 minutes      636 hours 41 minutes
  Languages        Mandarin and English     952 from the GTZAN dataset plus 9048 other noisy mp3 files, in English and Mandarin
  Audio format     Mono/stereo, mp3/wav, 44.1 kHz, 16 bits (both datasets)

Query sets:
                   Baina                    George
  Size             1412                     1062
  Query duration   About 10 sec             About 10 sec
  Total duration   3 hours 55 minutes       2 hours 57 minutes
  Source           Recordings of 5 clips in a very noisy environment, chopped into 1042 10-sec segments (with 9-sec overlap)
                                            Recordings of 345 clips in a noisy environment, chopped into 1062 10-sec segments (without overlap)

[12/32] Experimental Results Using Baina Dataset

- Re-ranking is invoked when the difference between the MLCs of the top-2 candidates is larger than 15.
- Only the top-10 candidates are re-ranked.

  Method        Accuracy (%)   Error reduction rate (%)
  Original      86.83
  MFC1          89.02          16.63
  MFC2          91.78          37.59
  Ranking SVM   91.997         39.23
  ListNet       91.997         39.23

Error reduction rate = (accuracy − original accuracy) / (100 − original accuracy) × 100. For MFC1: (89.02 − 86.83) / (100 − 86.83) × 100 = 16.63.

[Figure: Baina dataset, accuracy (%) vs. time (second) for Original, MFC1, MFC2, Ranking SVM, and ListNet; accuracy axis 86 to 92, time axis 0.33 to 0.4]

Example recording

[13/32] Experimental Results Using George Dataset

- The same condition is used for re-ranking.

  Method        Accuracy (%)   Error reduction rate (%)
  Original      83.52
  MFC1          84.46          5.71
  MFC2          85.78          13.71
  Ranking SVM   85.88          14.29
  ListNet       85.78          13.71

[Figure: George dataset, accuracy (%) vs. time (second) for Original, MFC1, MFC2, Ranking SVM, and ListNet; accuracy axis 83.5 to 86, time axis 11.2 to 11.55]
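
To make the Hash Table Construction, Retrieval Process, and Offset Time slides concrete, the following is a minimal Python sketch, not the MATLAB implementation used in the experiments. The bit widths (9/8/7 for the key, 18/14 for the value) and the offset definition (DbStartTime − QStartTime) come from the slides. Everything else is an assumption made for illustration: the order of the fields inside each word, storing a negative Δf modulo 256, the dictionary-of-lists hash table, the function names, and the toy landmarks for songs 7 and 9.

```python
from collections import Counter

# Bit widths from the slides; the field ORDER inside each word is an assumption.
F1_BITS, DF_BITS, DT_BITS = 9, 8, 7      # 24-bit hash key
SONG_BITS, T1_BITS = 18, 14              # 32-bit hash value


def make_hash_key(f1, f2, t1, t2):
    """Pack f1, delta-f and delta-t into one 24-bit integer."""
    f1 &= (1 << F1_BITS) - 1
    df = (f2 - f1) & ((1 << DF_BITS) - 1)  # assumption: negative delta-f wraps mod 256
    dt = (t2 - t1) & ((1 << DT_BITS) - 1)
    return (f1 << (DF_BITS + DT_BITS)) | (df << DT_BITS) | dt


def make_hash_value(song_id, t1):
    """Pack the song ID and the landmark start time into one 32-bit integer."""
    song_id &= (1 << SONG_BITS) - 1
    return (song_id << T1_BITS) | (t1 & ((1 << T1_BITS) - 1))


def split_hash_value(value):
    """Recover (song_id, t1) from a packed hash value."""
    return value >> T1_BITS, value & ((1 << T1_BITS) - 1)


def build_table(songs):
    """Offline step: songs maps song_id -> list of (t1, f1, t2, f2) landmarks."""
    table = {}
    for song_id, landmarks in songs.items():
        for t1, f1, t2, f2 in landmarks:
            key = make_hash_key(f1, f2, t1, t2)
            table.setdefault(key, []).append(make_hash_value(song_id, t1))
    return table


def match_landmark_counts(table, query_landmarks):
    """Online step: count landmarks per song that agree on one offset time."""
    votes = Counter()
    for t1, f1, t2, f2 in query_landmarks:
        for value in table.get(make_hash_key(f1, f2, t1, t2), []):
            song_id, db_t1 = split_hash_value(value)
            votes[(song_id, db_t1 - t1)] += 1   # offset = DbStartTime - QStartTime
    best = {}
    for (song_id, _offset), count in votes.items():
        best[song_id] = max(best.get(song_id, 0), count)
    return best


# Hypothetical toy data: song 7 contains the landmark from slide 2.
songs = {
    7: [(134, 203, 152, 219), (140, 180, 150, 200), (160, 100, 170, 90)],
    9: [(10, 203, 28, 219)],
}
table = build_table(songs)

# The query is song 7 shifted 100 frames earlier, so every true match has offset 100.
query = [(34, 203, 52, 219), (40, 180, 50, 200), (60, 100, 70, 90)]
print(match_landmark_counts(table, query))
```

In this toy run, song 9 shares one hash key with the query but at a different offset, so it collects a single vote, while the three landmarks of song 7 all agree on the same offset. Taking the largest count per song plays the role of the match landmark count used for ranking. The MFC1/MFC2 re-ranking and the learning-to-rank stage are not covered by this sketch.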