DTW for QBSH
J.-S. Roger Jang (張智星)
http://mirlab.org/jang
MIR Lab, CSIE Dept.
National Taiwan University
Dynamic Time Warping (DTW)
Goal:
Allow comparisons with high tolerance to tempo variation
Characteristics:
Robust to irregular tempo variations
Needs trial and error to deal with key transposition
Computationally expensive
Does not satisfy the triangle inequality
Some indexing algorithms do exist nonetheless
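The claim that DTW does not satisfy the triangle inequality can be checked with a tiny counterexample. The sketch below makes some assumptions. It uses a plain DTW with the 0-45-90 degree local paths of Type 2, anchored at both ends, with local cost |x(i) - y(j)|. The function name `dtw` and the three toy sequences are illustrative only.

```python
def dtw(x, y):
    """Type-2 DTW distance (0-45-90 degree local paths), anchored at both ends."""
    m, n = len(x), len(y)
    INF = float("inf")
    D = [[INF] * (n + 1) for _ in range(m + 1)]
    D[0][0] = 0.0  # virtual start so that D(1, 1) = |x(1) - y(1)|
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            D[i][j] = abs(x[i - 1] - y[j - 1]) + min(D[i][j - 1], D[i - 1][j - 1], D[i - 1][j])
    return D[m][n]

a, b, c = [0], [1, 2], [1, 2, 2]
print(dtw(a, b), dtw(b, c), dtw(a, c))  # 3.0 0.0 5.0
```

Here dtw(a, c) = 5 exceeds dtw(a, b) + dtw(b, c) = 3. Because DTW may stretch b to match c at zero cost, metric-space indexing cannot be applied directly.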
Dynamic Time Warping: Type 1
t: input pitch vector (8 sec)
r: reference pitch vector
Local paths: 27-45-63 degrees
[Figure: DTW table with t(i) along the i-axis and r(j) along the j-axis]
3-step formula for DTW:
1. $D(i, j)$: DTW distance between $t(1:i)$ and $r(1:j)$
2. Recurrent formula for $D(i, j)$:
$$D(i, j) = |t(i) - r(j)| + \min\{D(i-1, j-2),\ D(i-1, j-1),\ D(i-2, j-1)\}$$
with the initial condition $D(1, 1) = |t(1) - r(1)|$
3. Answer, where $m$ and $n$ are the lengths of $t$ and $r$:
$\min_j D(m, j)$ for anchored beginning
$D(m, n)$ for anchored beginning and anchored end
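Here is a minimal Python sketch of the 3-step Type-1 formula. It assumes pitch vectors are plain lists of numbers, for example semitones. The name `dtw_type1` and the `anchored_end` flag are hypothetical. Two layers of +inf padding replace the boundary checks on D(i-2, ·) and D(·, j-2).

```python
import math

def dtw_type1(t, r, anchored_end=False):
    """Type-1 DTW (27-45-63 degree local paths) with anchored beginning D(1,1).
    Returns D(m, n) if anchored_end, else min_j D(m, j)."""
    m, n = len(t), len(r)
    # Indices 0 and 1 are +inf padding; cell (i, j) of the slide lives at [i+1][j+1].
    D = [[math.inf] * (n + 2) for _ in range(m + 2)]
    for i in range(2, m + 2):
        for j in range(2, n + 2):
            cost = abs(t[i - 2] - r[j - 2])
            if i == 2 and j == 2:
                D[i][j] = cost  # D(1,1) = |t(1) - r(1)|
            else:
                D[i][j] = cost + min(D[i - 1][j - 2],   # D(i-1, j-2)
                                     D[i - 1][j - 1],   # D(i-1, j-1)
                                     D[i - 2][j - 1])   # D(i-2, j-1)
    if anchored_end:
        return D[m + 1][n + 1]       # D(m, n)
    return min(D[m + 1][2:])         # min over j of D(m, j)

print(dtw_type1([1, 2, 3], [1, 1, 2, 2, 3, 3]))                     # 0
print(dtw_type1([1, 2, 3], [1, 1, 2, 2, 3, 3], anchored_end=True))  # inf
```

The second call returns inf. Under Type-1 local paths, three input frames can reach at most reference index 5, so D(3, 6) is unreachable. This is one reason to use the anchored-beginning answer, min over j of D(m, j).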
Dynamic Time Warping: Type 2
t: input pitch vector (8 sec)
r: reference pitch vector
Local paths: 0-45-90 degrees
[Figure: DTW table with t(i) along the i-axis and r(j) along the j-axis]
DTW recurrence:
$$D(i, j) = |t(i) - r(j)| + \min\{D(i, j-1),\ D(i-1, j-1),\ D(i-1, j)\}$$
Minimum distance: $\min_j D(m, j)$
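The Type-2 recurrence translates just as directly. This sketch assumes anchored beginning with a free end, matching the "min distance" line above. The name `dtw_type2` is hypothetical. Because every predecessor is at most one step away, a single layer of +inf padding is enough.

```python
import math

def dtw_type2(t, r):
    """Type-2 DTW (0-45-90 degree local paths), anchored beginning, free end.
    Returns min_j D(m, j)."""
    m, n = len(t), len(r)
    # Row 0 and column 0 are the single layer of +inf padding.
    D = [[math.inf] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = abs(t[i - 1] - r[j - 1])
            if i == 1 and j == 1:
                D[i][j] = cost  # D(1,1) = |t(1) - r(1)|
            else:
                D[i][j] = cost + min(D[i][j - 1],      # D(i, j-1), 90 degrees
                                     D[i - 1][j - 1],  # D(i-1, j-1), 45 degrees
                                     D[i - 1][j])      # D(i-1, j), 0 degrees
    return min(D[m][1:])

print(dtw_type2([1, 2, 3], [1, 2, 2, 3, 9]))  # 0
```

Unlike Type 1, the horizontal and vertical steps let a single input frame match several reference frames, or the reverse, with no slope limit.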
Local Path Constraints
Type 1: 27-45-63 local paths
$$D(i, j) = |t(i) - r(j)| + \min\{D(i-1, j-2),\ D(i-1, j-1),\ D(i-2, j-1)\}$$
Type 2: 0-45-90 local paths
$$D(i, j) = |t(i) - r(j)| + \min\{D(i, j-1),\ D(i-1, j-1),\ D(i-1, j)\}$$
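The names "27-45-63" and "0-45-90" come from the angles of the local steps in the (i, j) plane. This short check computes them from the step offsets (Δi, Δj) that lead into D(i, j). The constant and function names are illustrative.

```python
import math

# Each local path is a step (di, dj) from a predecessor cell to D(i, j).
TYPE1_STEPS = [(1, 2), (1, 1), (2, 1)]
TYPE2_STEPS = [(0, 1), (1, 1), (1, 0)]

def step_angles(steps):
    """Angle of each step in degrees, measured from the i-axis and rounded."""
    return [round(math.degrees(math.atan2(dj, di))) for di, dj in steps]

print(step_angles(TYPE1_STEPS))  # [63, 45, 27]
print(step_angles(TYPE2_STEPS))  # [90, 45, 0]
```

The 27- and 63-degree Type-1 steps have slopes 1/2 and 2. Every Type-1 path is therefore confined to an overall slope between 1/2 and 2. Type-2 paths have no such bound.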
Path Penalty
Small or no penalty for the 45-degree path
Large penalty for paths deviating from 45 degrees
$$D(i, j) = |t(i) - r(j)| + \min\{D(i-1, j-2) + \eta,\ D(i-1, j-1),\ D(i-2, j-1) + \eta\}, \qquad \eta > 0$$
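A sketch of Type-1 DTW with the path penalty η, assuming anchored beginning and a free end. The name `dtw_type1_penalty` and the default `eta=1.0` are hypothetical. A suitable value of η depends on the pitch units and has to be tuned.

```python
import math

def dtw_type1_penalty(t, r, eta=1.0):
    """Type-1 DTW (anchored beginning, free end) that adds penalty eta to the
    two non-45-degree local paths."""
    m, n = len(t), len(r)
    D = [[math.inf] * (n + 2) for _ in range(m + 2)]  # two layers of +inf padding
    for i in range(2, m + 2):
        for j in range(2, n + 2):
            cost = abs(t[i - 2] - r[j - 2])
            if i == 2 and j == 2:
                D[i][j] = cost
            else:
                D[i][j] = cost + min(D[i - 1][j - 2] + eta,  # 63-degree path, penalized
                                     D[i - 1][j - 1],        # 45-degree path, no penalty
                                     D[i - 2][j - 1] + eta)  # 27-degree path, penalized
    return min(D[m + 1][2:])

print(dtw_type1_penalty([1, 2, 3], [1, 1, 2, 2, 3, 3], eta=0))  # 0
print(dtw_type1_penalty([1, 2, 3], [1, 1, 2, 2, 3, 3], eta=1))  # 2
```

With eta=0, the perfect but steep 63-degree path costs nothing. With eta=1, each of its two steps is charged, so a path that stays near 45 degrees becomes competitive.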
Weighted DTW Distance
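Observations:
At the beginning of a note, the user's pitch is unstable.
In the latter part of a note, the user's pitch is more stable and closer to the note's pitch.
Weighted DTW Distance:
At the beginning of a note, the weight function w(j) is smaller.
In the latter part of a note, the weight function w(j) is larger.
$$D(i, j) = w(j)\,|t(i) - r(j)| + \min\{D(i-2, j-1),\ D(i-1, j-1),\ D(i-1, j-2)\}$$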
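The slide does not specify w(j), so the weight function below is a hypothetical choice. It treats each run of equal values in the reference pitch vector r as one note. Within each note, the weight ramps linearly from `w_min` at the onset to 1.0 at the end. Both function names and `w_min=0.5` are illustrative assumptions, and the DTW is Type 1 with anchored beginning and a free end.

```python
import math

def note_weights(r, w_min=0.5):
    """Hypothetical w(j): within each note (a run of equal values in r), ramp
    linearly from w_min at the note onset to 1.0 at its last frame."""
    w = [0.0] * len(r)
    start = 0
    for j in range(1, len(r) + 1):
        if j == len(r) or r[j] != r[start]:  # the note occupying [start, j) ends here
            length = j - start
            for k in range(length):
                frac = k / (length - 1) if length > 1 else 1.0
                w[start + k] = w_min + (1.0 - w_min) * frac
            start = j
    return w

def dtw_type1_weighted(t, r, w):
    """Type-1 DTW (anchored beginning, free end) with local cost w(j)|t(i) - r(j)|."""
    m, n = len(t), len(r)
    D = [[math.inf] * (n + 2) for _ in range(m + 2)]  # two layers of +inf padding
    for i in range(2, m + 2):
        for j in range(2, n + 2):
            cost = w[j - 2] * abs(t[i - 2] - r[j - 2])
            if i == 2 and j == 2:
                D[i][j] = cost
            else:
                D[i][j] = cost + min(D[i - 2][j - 1], D[i - 1][j - 1], D[i - 1][j - 2])
    return min(D[m + 1][2:])

print(note_weights([60, 60, 60, 62, 62]))  # [0.5, 0.75, 1.0, 0.5, 1.0]
```

A pitch error at the start of a note is therefore charged only half as much as the same error at the end of the note.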
DTW Paths of “Anchored Beginning”
Anchored beginning: the end position is free to move.
Assumption: the speed of the user's acoustic input falls between 1/2 and 2 times that of the intended song.
DTW table size for an 8-sec query = 250 x 375
250 = 31.25*8 (31.25 pitch points per second)
375 = 250*1.5
[Figure: DTW paths with anchored beginning in the (i, j) table]
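The table-size arithmetic can be reproduced directly. The constant names are illustrative. The frame rate of 31.25 points per second is implied by 250 = 31.25*8, and the factor 1.5 is taken from the slide.

```python
FRAME_RATE = 31.25   # pitch points per second, implied by 250 = 31.25*8
QUERY_SEC = 8        # query length in seconds
SPAN_RATIO = 1.5     # reference span as a multiple of the query length (from the slide)

m = round(FRAME_RATE * QUERY_SEC)   # rows: query frames
n = round(m * SPAN_RATIO)           # columns: reference frames examined
print(m, n, m * n)                  # 250 375 93750
```

That is 93,750 cells per candidate song for an anchored-beginning comparison.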
DTW Paths of “Anchored Anywhere”
Anchored anywhere: both ends are free to move.
DTW table size for an 8-sec query against a 3-min song = 250 x 5625
250 = 31.25*8
5625 = 31.25*180
[Figure: DTW paths with anchored anywhere in the (i, j) table]
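"Anchored anywhere" can be sketched as Type-2 DTW with a free start and a free end. Row 1 does not accumulate along the reference, so a path may begin at any j, and the answer is the minimum over the last row. The name `dtw_anywhere` and its `(distance, end_j)` return value are hypothetical.

```python
import math

def dtw_anywhere(t, r):
    """Type-2 DTW with both ends free. Returns (distance, end_j), end_j 1-based."""
    m, n = len(t), len(r)
    D = [[math.inf] * (n + 1) for _ in range(m + 1)]  # one layer of +inf padding
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = abs(t[i - 1] - r[j - 1])
            if i == 1:
                D[i][j] = cost  # free beginning: a path may start at any j
            else:
                D[i][j] = cost + min(D[i][j - 1], D[i - 1][j - 1], D[i - 1][j])
    best_j = min(range(1, n + 1), key=lambda j: D[m][j])  # free end
    return D[m][best_j], best_j

print(dtw_anywhere([3, 4, 5], [0, 0, 3, 4, 5, 0, 0]))  # (0, 5)
```

For the 8-sec query and 3-min song above, this fills a 250 x 5625 table, i.e. 1,406,250 cells per song. That cost is why the anchored-beginning variant is cheaper.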
Implementation Issues
To save memory
Use 2-column table for type-1 DTW
Use 1-column table for type-2 DTW
To avoid too many if-then statements
Pad type-1 DTW with two-layer padding
Pad type-2 DTW with one-layer padding
To find a suitable path
Minimizing total distance
Minimizing average distance
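The 1-column table for Type-2 DTW can be sketched as follows. Before col[j] is overwritten, it still holds D(i-1, j). col[j-1] already holds D(i, j-1), and D(i-1, j-1) is carried in one scalar. col[0] is the single +inf padding cell, so the inner loop needs no boundary tests. A virtual D(0, 0) = 0 seeds D(1, 1). The function name is hypothetical. The slides do not state whether their implementation uses exactly this update order.

```python
import math

def dtw_type2_1col(t, r):
    """Type-2 DTW (anchored beginning, free end) using one column of length n+1."""
    n = len(r)
    col = [math.inf] * (n + 1)  # col[0] is permanent +inf padding, i.e. D(i, 0)
    for i, ti in enumerate(t):
        diag = 0.0 if i == 0 else math.inf  # D(i-1, 0); 0 only for the virtual D(0, 0)
        for j in range(1, n + 1):
            up = col[j]                               # D(i-1, j), not yet overwritten
            col[j] = abs(ti - r[j - 1]) + min(col[j - 1],  # D(i, j-1)
                                              diag,        # D(i-1, j-1)
                                              up)          # D(i-1, j)
            diag = up                                 # becomes D(i-1, j-1) for the next j
    return min(col[1:])                               # min_j D(m, j)

print(dtw_type2_1col([1, 2, 3], [1, 2, 2, 3, 9]))  # 0
```

Memory drops from m x n cells to n + 1 cells. A Type-1 version needs two stored columns because it also reads D(i-2, j-1).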
Other Variants
Local constraints
Flexible start/end positions
DTW Path of “Match Beginning”
DTW Path of “Match Anywhere”
Key Transposition (1/2)
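Goal:
Allow users' input in different keys
Method 1: Mean shift and heuristic modification
Shift the input so that its mean equals that of the reference; call this shift t
Compute DTW with shifts t-2, t, and t+2; call the best one t'
Compute DTW with shifts t'-1 and t'+1; keep the best among t'-1, t', and t'+1
This takes 5 DTW computations when comparing the input with each song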
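A sketch of Method 1, assuming pitch is measured in semitones and the shift is added to the input. The names `dtw_dist` and `key_shift_heuristic` are hypothetical. The helper is a Type-2 DTW with anchored beginning and a free end. A small cache ensures each candidate shift triggers exactly one DTW computation, which gives the 5 computations per song noted above.

```python
def dtw_dist(t, r):
    """Type-2 DTW, anchored beginning and free end, one-column table."""
    INF = float("inf")
    col = [INF] * (len(r) + 1)
    for i, ti in enumerate(t):
        diag = 0.0 if i == 0 else INF
        for j, rj in enumerate(r, start=1):
            up = col[j]
            col[j] = abs(ti - rj) + min(col[j - 1], diag, up)
            diag = up
    return min(col[1:])

def key_shift_heuristic(t, r, dtw=dtw_dist):
    """Mean shift, then a coarse (+/-2) and a fine (+/-1) search.
    Returns (best_shift, best_distance, number_of_dtw_calls)."""
    cache = {}

    def dist(shift):
        # Memoize so that each candidate shift costs exactly one DTW computation.
        if shift not in cache:
            cache[shift] = dtw([x + shift for x in t], r)
        return cache[shift]

    mean_shift = sum(r) / len(r) - sum(t) / len(t)                         # "t" on the slide
    coarse = min([mean_shift - 2, mean_shift, mean_shift + 2], key=dist)   # "t'"
    best = min([coarse - 1, coarse, coarse + 1], key=dist)
    return best, cache[best], len(cache)

print(key_shift_heuristic([60, 62, 64], [65, 67, 69]))  # (5.0, 0.0, 5)
```

Computing the mean over the whole reference is a simplification here. In practice, the mean would be taken over the part of the song being compared.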
Key Transposition (2/2)
Method 2: Fixed-point iteration
Step 1: DTW alignment
Step 2: Stop if the mapping path is fixed (unchanged)
Step 3: Shift to the same mean based on the alignment
Step 4: Go back to Step 1
Characteristics:
The DTW distance is monotonically nonincreasing, which guarantees convergence
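A sketch of Method 2 under one stated assumption that differs from the slides: the local cost is the squared difference (t(i) - r(j))^2 rather than |t(i) - r(j)|. With squared cost, the shift that equalizes the means along a fixed path is the optimal shift for that path. Re-running DTW can only lower the cost further, so the nonincreasing-distance argument holds. The function names and `max_iter` are hypothetical, and the DTW is Type 2 anchored at both ends.

```python
import math

def dtw_path(t, r):
    """Type-2 DTW anchored at both ends with squared local cost.
    Returns (distance, path) with path a list of 0-based (i, j) pairs."""
    m, n = len(t), len(r)
    D = [[math.inf] * (n + 1) for _ in range(m + 1)]
    D[0][0] = 0.0  # virtual start so that D(1, 1) = cost(1, 1)
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = (t[i - 1] - r[j - 1]) ** 2
            D[i][j] = cost + min(D[i][j - 1], D[i - 1][j - 1], D[i - 1][j])
    # Backtrack from (m, n), always stepping to the cheapest predecessor.
    path, i, j = [], m, n
    while (i, j) != (0, 0):
        path.append((i - 1, j - 1))
        i, j = min([(i - 1, j - 1), (i - 1, j), (i, j - 1)],
                   key=lambda c: D[c[0]][c[1]])
    return D[m][n], path[::-1]

def fixed_point_shift(t, r, max_iter=20):
    """Alternate DTW alignment and mean shifting until the path stops changing."""
    shift = 0.0
    dist, path = dtw_path(t, r)                                   # Step 1
    for _ in range(max_iter):
        # Step 3: shift so the aligned pairs have the same mean (the mean of
        # r(j) - t(i) along the path minimizes the squared cost for that path).
        shift = sum(r[j] - t[i] for i, j in path) / len(path)
        new_dist, new_path = dtw_path([x + shift for x in t], r)  # Step 4: back to Step 1
        if new_path == path:                                      # Step 2
            return shift, new_dist, new_path
        dist, path = new_dist, new_path
    return shift, dist, path
```

The `max_iter` cap is only a safety net. Because the distance never increases and there are finitely many paths, the loop normally stops on its own.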
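Type-3 DTW: Frame-to-Note Alignment
DP-based method for filling the table
Notes: 65, 62, 65, 64, 67
Frame-level pitch vector
Recurrent formula, where t(i) is the i-th frame of the pitch vector and r(j) is the pitch of the j-th note:
$$D(i, j) = |t(i) - r(j)| + \min\{D(i-1, j),\ D(i-1, j-1)\}$$
Local constraint: $D(i, j)$ is reached from $D(i-1, j)$ or $D(i-1, j-1)$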
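A sketch of Type-3 frame-to-note alignment, assuming both ends are anchored: frame 1 maps to note 1 and frame m to note n, so m must be at least n. The name `dtw_type3` is hypothetical. It also backtracks to report which note each frame was assigned to. Each frame either stays on the current note or advances to the next one, which is why note durations play no role.

```python
import math

def dtw_type3(frames, notes):
    """Type-3 DTW: D(i, j) = |t(i) - r(j)| + min(D(i-1, j), D(i-1, j-1)).
    Returns (distance, labels) with labels[i] = 0-based note index of frame i."""
    m, n = len(frames), len(notes)
    D = [[math.inf] * (n + 1) for _ in range(m + 1)]
    D[0][0] = 0.0  # virtual start: frame 1 must align to note 1
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            D[i][j] = abs(frames[i - 1] - notes[j - 1]) + min(D[i - 1][j], D[i - 1][j - 1])
    # Backtrack from (m, n): record each frame's note, then step to the cheaper predecessor.
    labels, j = [], n
    for i in range(m, 0, -1):
        labels.append(j - 1)
        if D[i - 1][j - 1] <= D[i - 1][j]:
            j -= 1  # the previous frame belonged to the previous note
    return D[m][n], labels[::-1]

notes = [65, 62, 65, 64, 67]
frames = [65, 65, 62, 62, 62, 65, 64, 64, 67]
print(dtw_type3(frames, notes))  # (0.0, [0, 0, 1, 1, 1, 2, 3, 3, 4])
```

The table is m x (number of notes) rather than m x (number of reference frames). This is where the efficiency gain mentioned for Type 3 comes from.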
Type-3 DTW
Characteristics:
Mapping path
Frame-based query input vs. note-based music database
Note duration unused
More efficient, less effective
Heuristics for key transposition
Type-3 DTW: Effects of Key Transposition
Rough key transposition
Fine key transposition
Please refer to the online tutorial page for playback.