Construction of Discriminative Kernels from Known and Unknown Non-targets for PLDA-SVM Scoring
Wei RAO and Man-Wai MAK
Dept. of Electronic and Information Engineering, The Hong Kong Polytechnic University
Introduction
Methods
We exploited the new NIST 2012 SRE protocol to enhance the performance of
PLDA-SVM scoring [3], which is an effective way to utilize the multiple
enrollment utterances of target speakers. We used the score vectors of both
known and unknown non-targets as the impostor-class data to train
speaker-dependent SVMs. We also applied utterance partitioning (UP-AVR) [4]
to alleviate the imbalance between the speaker- and impostor-class data
during SVM training.
Key Findings
Results show that incorporating known non-targets into the training of
speaker-dependent PLDA-SVMs together with utterance partitioning can boost
the performance of i-vector based PLDA systems significantly.
Unknown Non-targets for PLDA-SVM Training/Scoring
[Figure: the target speaker's i-vectors $X_s$ and the unknown non-targets' i-vectors $X_b$ each pass through PLDA scoring and the empirical kernel map.]
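Empirical LR Kernel Maps
The target speaker's i-vectors are $X_s = \{x_{s,1}, \ldots, x_{s,H_s}\}$ and the background speakers' i-vectors are $X_b = \{x_{b,1}, \ldots, x_{b,B}\}$. The kernel map uses $X = \{X_s, X_b'\}$, where $X_b' = \{x_{b,1}, \ldots, x_{b,B'}\} \subseteq X_b$ and $B' \le B$.

Test vector:
$$\vec{S}_{LR}(x_t, X) = \left[ S_{LR}(x_t, x_{s,1}), \ldots, S_{LR}(x_t, x_{s,H_s}),\; S_{LR}(x_t, x_{b,1}), \ldots, S_{LR}(x_t, x_{b,B'}) \right]^T$$

Speaker-class vectors ($j = 1, \ldots, H_s$):
$$\vec{S}_{LR}(x_{s,j}, X) = \left[ S_{LR}(x_{s,j}, x_{s,1}), \ldots, S_{LR}(x_{s,j}, x_{s,H_s}),\; S_{LR}(x_{s,j}, x_{b,1}), \ldots, S_{LR}(x_{s,j}, x_{b,B'}) \right]^T$$

Impostor-class vectors ($j = 1, \ldots, B$):
$$\vec{S}_{LR}(x_{b,j}, X) = \left[ S_{LR}(x_{b,j}, x_{s,1}), \ldots, S_{LR}(x_{b,j}, x_{s,H_s}),\; S_{LR}(x_{b,j}, x_{b,1}), \ldots, S_{LR}(x_{b,j}, x_{b,B'}) \right]^T$$

Training: the target speaker's enrollment utterances $U_{s,1}, \ldots, U_{s,H_s}$ supply the speaker-class vectors $\vec{S}_{LR}(x_{s,1}, X), \ldots, \vec{S}_{LR}(x_{s,H_s}, X)$, and the background speakers' utterances $U_{b,1}, \ldots, U_{b,B}$ supply the impostor-class vectors $\vec{S}_{LR}(x_{b,1}, X), \ldots, \vec{S}_{LR}(x_{b,B}, X)$.

SVM score of a test utterance:
$$S_{SVM}(x_t, X_s, X_b) = \sum_{j \in SV_s} \alpha_{s,j} K(x_t, x_{s,j}) - \sum_{j \in SV_b} \alpha_{s,j} K(x_t, x_{b,j}) + d_s$$
where $SV_s$ and $SV_b$ index the speaker-class and impostor-class support vectors, $\alpha_{s,j}$ are the Lagrange multipliers, and $d_s$ is the bias of the SVM of speaker $s$. The kernel is
$$K(x_t, x_{s,j}) = \exp\left\{ -\frac{1}{2\gamma^2} \left\| \vec{S}_{LR}(x_t, X) - \vec{S}_{LR}(x_{s,j}, X) \right\|^2 \right\}$$
where $\gamma$ is the kernel width.

[Figure: results on CC2 for PLDA, PLDA+UP-AVR, PLDA+SVM(Unknown), and PLDA+UP-AVR+SVM(Unknown).]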
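As a rough illustration of PLDA-SVM scoring with an empirical LR kernel map, the NumPy sketch below maps each i-vector to its vector of scores against a set X, applies an RBF kernel to the mapped vectors, and combines kernel values into an SVM score. It is a sketch under stated assumptions, not the authors' implementation: `plda_score` is a stand-in (a dot product, not a real PLDA likelihood ratio), and the function names, `gamma`, the multipliers, and the bias are hypothetical values chosen only so the example runs.

```python
import numpy as np

def plda_score(x, y):
    # ASSUMPTION: placeholder for S_LR(x, y). A real system would return a
    # PLDA likelihood-ratio score here, not a dot product.
    return float(np.dot(x, y))

def kernel_map(x, X):
    # Empirical kernel map: scores of x against every i-vector in X.
    return np.array([plda_score(x, xi) for xi in X])

def rbf_kernel(x1, x2, X, gamma):
    # RBF kernel evaluated on the mapped score vectors.
    diff = kernel_map(x1, X) - kernel_map(x2, X)
    return float(np.exp(-np.dot(diff, diff) / (2.0 * gamma ** 2)))

def svm_score(x_t, spk_sv, spk_alpha, imp_sv, imp_alpha, bias, X, gamma):
    # Weighted kernel sum over speaker-class support vectors, minus the
    # weighted sum over impostor-class support vectors, plus the bias.
    pos = sum(a * rbf_kernel(x_t, x, X, gamma) for a, x in zip(spk_alpha, spk_sv))
    neg = sum(a * rbf_kernel(x_t, x, X, gamma) for a, x in zip(imp_alpha, imp_sv))
    return pos - neg + bias

rng = np.random.default_rng(0)
X_s = rng.standard_normal((3, 5))   # toy target-speaker i-vectors
X_b = rng.standard_normal((4, 5))   # toy background (non-target) i-vectors
X = np.vstack([X_s, X_b])           # set used by the kernel map
x_t = X_s[0]                        # toy test i-vector

# Hypothetical multipliers and bias; real values come from SVM training.
score = svm_score(x_t, X_s, [0.5, 0.3, 0.2], X_b, [0.25] * 4, 0.0, X, gamma=10.0)
```

Swapping the background i-vectors for known non-targets' i-vectors changes only the impostor-class inputs; the kernel and score functions stay the same.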
Known Non-targets for PLDA-SVM Training/Scoring
NIST 2012 SRE permits systems to use the information of other target speakers (called known non-targets) in each verification trial.
$X_a$ contains the i-vectors of the competing known non-targets with respect to $s$.
$S_{LR}(x_i, x_j)$: PLDA score of i-vectors $x_i$ and $x_j$.
[Figure: the target speaker's i-vectors $X_s$ and the known non-targets' i-vectors $X_a$ each pass through PLDA scoring and the empirical kernel map; the resulting vectors $\vec{S}_{LR}(x_{s,1}, X), \ldots, \vec{S}_{LR}(x_{s,H_s}, X)$ and $\vec{S}_{LR}(x_{a,1}, X), \ldots, \vec{S}_{LR}(x_{a,A}, X)$ are used to train the target-speaker SVM.]

SVM score with known non-targets:
$$S_{SVM}(x_t, X_s, X_a) = \sum_{j \in SV_s} \alpha_{s,j} K(x_t, x_{s,j}) - \sum_{j \in SV_a} \alpha_{s,j} K(x_t, x_{a,j}) + d_s$$
$$K(x_t, x_{a,j}) = \exp\left\{ -\frac{1}{2\gamma^2} \left\| \vec{S}_{LR}(x_t, X) - \vec{S}_{LR}(x_{a,j}, X) \right\|^2 \right\}, \quad x_{a,j} \in X_a$$

[Figure: UP-AVR and SVM training/scoring pipeline. Target speaker enrollment utterances go through feature extraction and index randomization, then utterance partitioning, giving $U_0, U_1, \ldots, U_N$; the i-vector extractor converts these into the target speaker's i-vectors $X_s$. Together with the background speakers' i-vectors $X_b$, they train the target-speaker SVM, which scores the test utterance $U_t$ as $S_{SVM}(x_t, X_s, X_b)$.]

Results
[Figure: panels labelled CC4 and CC5, with axes EER (%) and MinNDCF, comparing PLDA, PLDA+UP-AVR, PLDA+SVM(Unknown), PLDA+UP-AVR+SVM(Unknown), PLDA+UP-AVR+SVM(Known), and PLDA+UP-AVR+SVM(Pooling).]
Results demonstrate the advantages of including known non-targets for training the SVMs.
UP-AVR is very important for SVM scoring: PLDA-SVM scoring with UP-AVR performs much better than PLDA scoring.
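UP-AVR (utterance partitioning with acoustic vector resampling [4]) is the step this poster credits as very important for SVM scoring. The sketch below illustrates the two stages named in the poster's diagram, index randomization followed by utterance partitioning, on a matrix of acoustic frames. It is a minimal sketch under assumptions: the function name `up_avr` and the parameters `n_parts` and `n_resamples` are hypothetical, the frames are toy data, and i-vector extraction from each sub-utterance is not shown.

```python
import numpy as np

def up_avr(frames, n_parts, n_resamples, rng):
    """Return the full utterance followed by n_resamples * n_parts sub-utterances.

    frames: array of shape (num_frames, feature_dim), one acoustic vector per row.
    """
    segments = [frames]  # the full-length utterance is kept as well
    for _ in range(n_resamples):
        # Index randomization: shuffle the frame indices.
        order = rng.permutation(len(frames))
        # Utterance partitioning: split the shuffled indices into n_parts groups.
        for idx in np.array_split(order, n_parts):
            segments.append(frames[idx])
    return segments

# Toy example: 10 frames of 2-dimensional features (hypothetical data).
toy_frames = np.arange(20).reshape(10, 2)
toy_segments = up_avr(toy_frames, n_parts=4, n_resamples=2,
                      rng=np.random.default_rng(0))
num_segments = len(toy_segments)  # 1 + n_resamples * n_parts = 9
```

Each returned segment would be passed to the i-vector extractor, so one enrollment utterance yields several speaker-class i-vectors, which is how partitioning helps with the imbalance between speaker-class and impostor-class training data.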
References
[1] P. Kenny, “Bayesian speaker verification with heavy-tailed priors,” in Proc. Odyssey: Speaker and Language Recognition Workshop, Brno, Czech Republic, June 2010.
[2] D. Garcia-Romero and C. Y. Espy-Wilson, “Analysis of i-vector length normalization in speaker recognition systems,” in Proc. Interspeech 2011, Florence, Italy, Aug. 2011, pp. 249–252.
[3] M. W. Mak and W. Rao, “Likelihood-Ratio Empirical Kernels for I-Vector Based PLDA-SVM Scoring,” in Proc. ICASSP 2013, Vancouver, Canada, May 2013, pp. 7702–7706.
[4] W. Rao and M. W. Mak, “Boosting the Performance of I-Vector Based Speaker Verification via Utterance Partitioning,” IEEE Trans. on Audio, Speech and Language Processing, vol. 21, no. 5, pp. 1012–1022, May 2013.