DTW for Speech Recognition
J.-S. Roger Jang (張智星)
http://www.cs.nthu.edu.tw/~jang
MIR Lab (多媒體資訊檢索實驗室)
CS, Tsing Hua Univ. (清華大學 資工系)
Dynamic Time Warping (DTW)
Characteristics:
Pattern-matching-based approach
Requires less memory and computation
Suitable for speaker-dependent recognition
Suitable for small to medium vocabulary
Suitable for microprocessor/chip implementation
Applications
Speaker identification & verification for surveillance
Voice commands for mobile phones, toys
-2-
Dynamic Time Warping: Type 1
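t: input MFCC matrix
(Each column is a frame’s feature.)
r: reference MFCC matrix
Local paths: 27-45-63 degrees
DTW recurrence:
$$D(i, j) = |t(i) - r(j)| + \min\{D(i-1, j-2),\ D(i-1, j-1),\ D(i-2, j-1)\}$$
[Figure: DTW grid with input frames t(i-1), t(i) on the horizontal axis i and reference frames r(j-1), r(j) on the vertical axis j; the three local paths lead into D(i, j).]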
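The type-1 recurrence can be sketched in Python as follows. This is a minimal illustration, not the author's implementation: the function name `dtw_type1` is hypothetical, the local distance is assumed to be the Euclidean norm, and both corners are assumed fixed. Two layers of infinite padding stand in for the out-of-range predecessors.

```python
import numpy as np

def dtw_type1(t, r):
    """Type-1 DTW (27-45-63 degree local paths) with both corners fixed.

    t, r: 2-D arrays whose columns are frame feature vectors.
    Returns the accumulated distance at the last frame pair,
    or inf when no legal path connects the two corners.
    """
    t = np.asarray(t, dtype=float)
    r = np.asarray(r, dtype=float)
    m, n = t.shape[1], r.shape[1]
    # Two padding layers of inf, so i-2 and j-2 never fall outside the table.
    D = np.full((m + 2, n + 2), np.inf)
    for i in range(m):
        for j in range(n):
            d = np.linalg.norm(t[:, i] - r[:, j])  # local distance |t(i) - r(j)|
            I, J = i + 2, j + 2                    # indices into the padded table
            if i == 0 and j == 0:
                D[I, J] = d                        # path starts at the first frame pair
            else:
                D[I, J] = d + min(D[I - 1, J - 2],
                                  D[I - 1, J - 1],
                                  D[I - 2, J - 1])
    return D[m + 1, n + 1]
```

Because every step advances at least one frame on each axis and at most two, a 1-frame input can never be aligned to a 4-frame reference, and the function returns infinity in that case.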
-3-
Dynamic Time Warping: Type 2
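t: input MFCC matrix
(Each column is a frame’s feature.)
r: reference MFCC matrix
Local paths: 0-45-90 degrees
DTW recurrence:
$$D(i, j) = |t(i) - r(j)| + \min\{D(i, j-1),\ D(i-1, j-1),\ D(i-1, j)\}$$
[Figure: DTW grid with input frames t(i-1), t(i) on the horizontal axis i and reference frames r(j-1), r(j) on the vertical axis j; the three local paths lead into D(i, j).]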
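A minimal Python sketch of the type-2 recurrence is given below. The name `dtw_type2` is hypothetical, the local distance is assumed to be the Euclidean norm, each column of the input matrices is assumed to be one frame, and both corners are assumed fixed. One layer of padding is enough because the recurrence only looks one step back.

```python
import numpy as np

def dtw_type2(t, r):
    """Type-2 DTW (0-45-90 degree local paths) with both corners fixed.

    t, r: 2-D arrays whose columns are frame feature vectors.
    Returns the accumulated distance at the last frame pair.
    """
    t = np.asarray(t, dtype=float)
    r = np.asarray(r, dtype=float)
    m, n = t.shape[1], r.shape[1]
    # One padding layer of inf; D[0, 0] = 0 is a virtual origin so that
    # the first real cell equals |t(1) - r(1)|.
    D = np.full((m + 1, n + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d = np.linalg.norm(t[:, i - 1] - r[:, j - 1])  # local distance
            D[i, j] = d + min(D[i, j - 1], D[i - 1, j - 1], D[i - 1, j])
    return D[m, n]
```

Unlike type 1, the 0-degree and 90-degree steps let one frame align with arbitrarily many frames of the other sequence, so any two non-empty sequences have a finite distance.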
-4-
Local Path Constraints
Type 1: 27-45-63 local paths
$$D(i, j) = |t(i) - r(j)| + \min\{D(i-1, j-2),\ D(i-1, j-1),\ D(i-2, j-1)\}$$
Type 2: 0-45-90 local paths
$$D(i, j) = |t(i) - r(j)| + \min\{D(i, j-1),\ D(i-1, j-1),\ D(i-1, j)\}$$
-5-
Path Penalty for Type-1 DTW
Path penalty
No penalty for the 45-degree path
Some penalty η > 0 for paths that deviate from 45 degrees
$$D(i, j) = |t(i) - r(j)| + \min\{D(i-1, j-2) + \eta,\ D(i-1, j-1),\ D(i-2, j-1) + \eta\}$$
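The effect of the penalty can be sketched in Python. This is an illustration under assumptions: the parameter name `eta`, the use of a single penalty value for both off-diagonal paths, and the Euclidean local distance are choices made here, not details confirmed by the slides.

```python
import numpy as np

def dtw_type1_penalized(t, r, eta=0.0):
    """Type-1 DTW where the 27- and 63-degree steps cost an extra eta.

    t, r: 2-D arrays whose columns are frame feature vectors.
    eta: hypothetical penalty added to each step that leaves the 45-degree path.
    """
    t = np.asarray(t, dtype=float)
    r = np.asarray(r, dtype=float)
    m, n = t.shape[1], r.shape[1]
    D = np.full((m + 2, n + 2), np.inf)  # two padding layers of inf
    for i in range(m):
        for j in range(n):
            d = np.linalg.norm(t[:, i] - r[:, j])
            I, J = i + 2, j + 2
            if i == 0 and j == 0:
                D[I, J] = d
            else:
                D[I, J] = d + min(D[I - 1, J - 2] + eta,  # penalized
                                  D[I - 1, J - 1],        # 45 degrees, no penalty
                                  D[I - 2, J - 1] + eta)  # penalized
    return D[m + 1, n + 1]
```

Aligning 3 frames to 4 frames needs exactly one off-diagonal step, so with identical frame values the result equals `eta`; sequences of equal length can follow the diagonal and pay no penalty.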
-6-
DTW Paths of “Match Corners”
We assume the speed of a user’s acoustic input is between 1/2 and 2 times that of the intended sentence.
Both corners are fixed. (Endpoint detection is critical.)
Suitable for voice command applications
[Figure: DTW paths on the i-j grid with both corners fixed.]
-7-
DTW Paths of “Match Anywhere”
No fixed anchor positions
Suitable for retrieval of personal spoken documents
[Figure: DTW paths on the i-j grid with no fixed anchor positions.]
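One possible reading of "match anywhere" is sketched below: a short query t is aligned to the best-matching stretch of a longer reference r, with the start and end positions along r left free. The function name `dtw_match_anywhere`, the choice of type-2 local paths, and the Euclidean local distance are assumptions made for this illustration.

```python
import numpy as np

def dtw_match_anywhere(t, r):
    """Align the whole query t to any stretch of the reference r.

    t, r: 2-D arrays whose columns are frame feature vectors.
    Returns (distance, end), where end is the 0-based index of the
    reference frame at which the best match ends.
    """
    t = np.asarray(t, dtype=float)
    r = np.asarray(r, dtype=float)
    m, n = t.shape[1], r.shape[1]
    D = np.full((m + 1, n + 1), np.inf)
    D[0, :] = 0.0  # free start: the path may begin at any reference frame
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d = np.linalg.norm(t[:, i - 1] - r[:, j - 1])
            D[i, j] = d + min(D[i, j - 1], D[i - 1, j - 1], D[i - 1, j])
    end = int(np.argmin(D[m, 1:]))  # free end: best last query frame position
    return D[m, end + 1], end
```

For example, `dtw_match_anywhere(np.array([[5., 6., 7.]]), np.array([[0., 0., 5., 6., 7., 0., 0.]]))` finds the query embedded in the reference with zero distance, ending at reference frame 4.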
-8-
Other Variants
Local constraints
Start/ending area
-9-
Implementation Issues
To save memory
Use 2-column table for type-1 DTW
Use 1-column table for type-2 DTW
To avoid too many if-then statements
Pad type-1 DTW with two-layer padding
Pad type-2 DTW with one-layer padding
To find a suitable path
Minimizing total distance
Minimizing average distance
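The memory-saving idea for type-2 DTW can be sketched as follows: since D(i, j) depends only on the previous column and the cell just below it, a single column updated in place is enough when only the final distance is needed. The function name `dtw_type2_one_column` and the Euclidean local distance are assumptions; backtracking the path would require keeping more than this one column.

```python
import numpy as np

def dtw_type2_one_column(t, r):
    """Type-2 DTW distance using a single column of the DP table.

    t, r: 2-D arrays whose columns are frame feature vectors.
    Returns the accumulated distance with both corners fixed.
    """
    t = np.asarray(t, dtype=float)
    r = np.asarray(r, dtype=float)
    m, n = t.shape[1], r.shape[1]
    # col[j] holds D(i-1, j) until it is overwritten with D(i, j).
    # Index 0 is the one-layer padding; col[0] = 0 is the virtual origin.
    col = np.full(n + 1, np.inf)
    col[0] = 0.0
    for i in range(m):
        diag = col[0]    # D(i-1, j-1) for j = 1
        col[0] = np.inf  # padding cell of the current column
        for j in range(1, n + 1):
            up = col[j]  # D(i-1, j), saved before it is overwritten
            d = np.linalg.norm(t[:, i] - r[:, j - 1])
            col[j] = d + min(col[j - 1], diag, up)
            diag = up    # becomes D(i-1, j-1) for the next j
    return col[n]
```

The padding cell removes the need for boundary tests inside the inner loop, which is the purpose of the one-layer padding listed above.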
-10-
DTW Path of “Match Corners”
-11-
DTW Path of “Match Anywhere”
-12-
DTW Path of “Match Anywhere”
[Figure: DTW path matching the query 清華大學 (“Tsing Hua University”) anywhere within the utterance 我今天很高興來到清華大學進行演講 (“I am very happy to come to Tsing Hua University today to give a talk”). DTW total distance = 304.957.]
-13-
DTW for Spoken Document Retrieval
Applications
Voice-based audio/video retrieval
Issues in SDR using DTW
Speaker normalization
Vocal tract length normalization (VTLN)
Frequency warping
Efficiency
-14-
DTW for Speaker-independent Voice Command Recognition
Applications
Digit recognition
Technical highlights
Extensive recordings
Clustering within each command
Some indexing methods for DTW
Suitable for small-vocabulary applications
-15-