Construction of Discriminative Kernels from Known and Unknown Non-targets for PLDA-SVM Scoring
Wei RAO and Man-Wai MAK
Dept. of Electronic and Information Engineering, The Hong Kong Polytechnic University
Introduction
Methods
We exploited the NIST 2012 SRE protocol, which allows known non-targets to
be used in each verification trial, to enhance the performance of PLDA-SVM
scoring [3], an effective way to utilize the multiple enrollment utterances
of target speakers. We used the score vectors of both known and unknown
non-targets as impostor-class data to train speaker-dependent SVMs. We also
applied utterance partitioning to alleviate the imbalance between the
speaker-class and impostor-class data during SVM training.
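The assembly of the SVM training data described above can be sketched as follows. This is only an illustration under stated assumptions: the function name `build_svm_training_set`, the score-vector dimension, and all counts are hypothetical, and random numbers stand in for real PLDA score vectors.

```python
import numpy as np

def build_svm_training_set(speaker_vecs, unknown_vecs, known_vecs):
    """Stack score vectors; label the speaker class +1 and the impostor class -1."""
    # Impostor class = unknown non-targets pooled with known non-targets.
    impostor = np.vstack([unknown_vecs, known_vecs])
    data = np.vstack([speaker_vecs, impostor])
    labels = np.concatenate([np.ones(len(speaker_vecs)), -np.ones(len(impostor))])
    return data, labels

rng = np.random.default_rng(0)
dim = 8                                   # hypothetical score-vector dimension
speaker = rng.normal(size=(5, dim))       # hypothetical: 5 enrollment utterances
unknown = rng.normal(size=(300, dim))     # hypothetical: 300 unknown non-targets
known = rng.normal(size=(50, dim))        # hypothetical: 50 known non-targets

data, labels = build_svm_training_set(speaker, unknown, known)
ratio = (labels == -1).sum() / (labels == 1).sum()
print(data.shape, ratio)  # (355, 8) 70.0
```

With these toy counts the impostor class outnumbers the speaker class 70 to 1. If utterance partitioning produced, say, 17 i-vectors per enrollment utterance, the speaker class would grow to 85 vectors and the ratio would drop to about 4.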
Key Findings
Results show that incorporating known non-targets into the training of
speaker-dependent PLDA-SVMs together with utterance partitioning can boost
the performance of i-vector based PLDA systems significantly.
Methods
Empirical LR Kernel Maps
S_{LR}(x_i, x_j) denotes the PLDA score of i-vectors x_i and x_j. Let
X = \{X_s, X_b\}, where X_s = \{x_{s,1}, \ldots, x_{s,H_s}\} contains the
target speaker's i-vectors and X_b = \{x_{b,1}, \ldots, x_{b,B}\} contains
the background speakers' i-vectors. PLDA scoring followed by the empirical
kernel map converts an i-vector x into the score vector
\[
S_{LR}(x, X) = \left[ S_{LR}(x, x_{s,1}), \ldots, S_{LR}(x, x_{s,H_s}), S_{LR}(x, x_{b,1}), \ldots, S_{LR}(x, x_{b,B}) \right]^{\mathsf{T}}
\]
The map is applied to the test vector x_t, the speaker-class vectors
x_{s,j}, and the impostor-class vectors x_{b,j}.

Unknown Non-targets for PLDA-SVM Training/Scoring
[Figure: PLDA scoring followed by the empirical kernel map is applied to the
target speaker's i-vectors X_s and to the unknown non-targets' (background
speakers') i-vectors X_b. The resulting score vectors
S_{LR}(x_{s,1}, X), \ldots, S_{LR}(x_{s,H_s}, X) and
S_{LR}(x_{b,1}, X), \ldots, S_{LR}(x_{b,B}, X) are used for SVM training.]
The SVM score of a test i-vector x_t is
\[
S_{SVM}(x_t, X_s, X_b) = \sum_{j \in SV_s} \alpha_{s,j} K(x_t, x_{s,j}) - \sum_{j \in SV_b} \alpha_{s,j} K(x_t, x_{b,j}) + d_s
\]
with the kernel
\[
K(x_t, x_{s,j}) = \exp\left\{ -\frac{1}{2\gamma^2} \left\| S_{LR}(x_t, X) - S_{LR}(x_{s,j}, X) \right\|^2 \right\}
\]

[Figure: CC2 results for PLDA, PLDA+UP-AVR, PLDA+SVM(Unknown), and
PLDA+UP-AVR+SVM(Unknown).]
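The empirical LR kernel map, the kernel built on it, and the resulting SVM score can be illustrated numerically. The following is a minimal sketch, not the authors' implementation: `plda_score_stub` is a hypothetical stand-in (a plain dot product) for the real PLDA score S_LR, and the function names, `gamma` value, weights, and toy vectors are all assumptions made for illustration.

```python
import numpy as np

def plda_score_stub(x_i, x_j):
    """Hypothetical stand-in for S_LR(x_i, x_j): a dot product, NOT a real PLDA score."""
    return float(np.dot(x_i, x_j))

def empirical_kernel_map(x, X, score_fn=plda_score_stub):
    """Return S_LR(x, X): the scores of x against every i-vector in the list X."""
    return np.array([score_fn(x, x_ref) for x_ref in X])

def lr_rbf_kernel(x_t, x_j, X, gamma=1.0, score_fn=plda_score_stub):
    """K(x_t, x_j) = exp(-||S_LR(x_t, X) - S_LR(x_j, X)||^2 / (2 * gamma^2))."""
    diff = empirical_kernel_map(x_t, X, score_fn) - empirical_kernel_map(x_j, X, score_fn)
    return float(np.exp(-np.dot(diff, diff) / (2.0 * gamma ** 2)))

def svm_score(x_t, sv_speaker, alpha_speaker, sv_impostor, alpha_impostor,
              bias, X, gamma=1.0):
    """Speaker-class kernel sum minus impostor-class kernel sum, plus a bias."""
    pos = sum(a * lr_rbf_kernel(x_t, x, X, gamma)
              for a, x in zip(alpha_speaker, sv_speaker))
    neg = sum(a * lr_rbf_kernel(x_t, x, X, gamma)
              for a, x in zip(alpha_impostor, sv_impostor))
    return pos - neg + bias

# Toy 2-dimensional "i-vectors" (hypothetical values).
X_s = [np.array([1.0, 0.0]), np.array([0.9, 0.1])]
X_b = [np.array([0.0, 1.0]), np.array([-1.0, 0.0]), np.array([0.1, -0.9])]
X = X_s + X_b
x_t = np.array([0.95, 0.05])

print(empirical_kernel_map(x_t, X).shape)  # (5,)
# Weights and bias below are made up; a trained SVM would supply them.
print(svm_score(x_t, X_s, [1.0, 1.0], X_b, [0.5, 0.5, 0.5], bias=0.0, X=X))
```

The score vector has one entry per i-vector in X, so its dimension is H_s + B. In practice the weights, support vectors, and bias would come from SVM training on the mapped score vectors.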
Known Non-targets for PLDA-SVM Training/Scoring
[Figure: PLDA scoring followed by the empirical kernel map is applied to the
target speaker's i-vectors X_s and to the known non-targets' i-vectors X_a.
The resulting score vectors S_{LR}(x_{s,1}, X), \ldots, S_{LR}(x_{s,H_s}, X)
and S_{LR}(x_{a,1}, X), \ldots, S_{LR}(x_{a,A}, X) are used for SVM training
of the target-speaker SVM.]
X_a contains the i-vectors of the competing known non-targets with respect
to s. The SVM score of a test i-vector x_t is
\[
S_{SVM}(x_t, X_s, X_a) = \sum_{j \in SV_s} \alpha_{s,j} K(x_t, x_{s,j}) - \sum_{j \in SV_a} \alpha_{s,j} K(x_t, x_{a,j}) + d_s
\]

Results
[Figure: MinNDCF versus EER (%), panels CC4 and CC5, for PLDA, PLDA+UP-AVR,
PLDA+SVM(Unknown), PLDA+UP-AVR+SVM(Unknown), PLDA+UP-AVR+SVM(Known), and
PLDA+UP-AVR+SVM(Pooling).]
The results demonstrate the advantages of including known non-targets for
training the SVMs.
UP-AVR is very important for SVM scoring: PLDA-SVM scoring with UP-AVR
performs much better than PLDA scoring.
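The results on this poster are reported in terms of EER (equal error rate) and MinNDCF. As a minimal sketch of how an EER could be computed from target and non-target trial scores, assuming a simple threshold sweep over the observed scores (the function name and the toy scores are hypothetical, and this is not the official NIST scoring tool):

```python
import numpy as np

def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER by sweeping the decision threshold over all observed scores."""
    tar = np.asarray(target_scores, dtype=float)
    non = np.asarray(nontarget_scores, dtype=float)
    thresholds = np.sort(np.concatenate([tar, non]))
    # Miss rate: fraction of target trials scoring below the threshold.
    p_miss = np.array([(tar < t).mean() for t in thresholds])
    # False-alarm rate: fraction of non-target trials scoring at or above it.
    p_fa = np.array([(non >= t).mean() for t in thresholds])
    # Pick the threshold where the two error rates are closest and average them.
    idx = np.argmin(np.abs(p_miss - p_fa))
    return 0.5 * (p_miss[idx] + p_fa[idx])

# Toy scores (hypothetical): one target and one non-target overlap.
eer = equal_error_rate([2.0, 3.0, 4.0, 5.0], [0.0, 1.0, 2.5, 1.5])
print(f"EER = {100 * eer:.1f}%")  # EER = 25.0%
```

A lower EER means the score distributions of targets and non-targets overlap less, which is the sense in which the systems above are compared.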
UP-AVR
[Figure: UP-AVR. An utterance passes through feature extraction and index
randomization, then utterance partitioning into sub-utterances
U_1, U_2, U_3, \ldots, U_N; these and the full-length utterance U_0 are fed
to the i-vector extractor. Other labels in the figure: target speaker
enrollment utterances U_{s,1}, \ldots; test utterance U_t; target speaker's
i-vectors X_s; background speakers' i-vectors X_b; target-speaker SVM with
output S_{SVM}(x_t, X_s, X_b).]
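The two UP-AVR steps named in the figure, index randomization and utterance partitioning, can be sketched as follows. This is a simplified illustration rather than the authors' code: the function name `up_avr`, the parameter names, and the toy feature matrix are assumptions, and the i-vector extractor that would consume each sub-utterance is not shown.

```python
import numpy as np

def up_avr(frames, n_partitions=4, n_resamplings=2, seed=0):
    """Return [U_0, U_1, ...]: the full utterance plus partitions of shuffled frames."""
    frames = np.asarray(frames)
    rng = np.random.default_rng(seed)
    sub_utts = [frames]  # U_0: the full-length utterance
    for _ in range(n_resamplings):
        # Index randomization: shuffle the order of the acoustic vectors.
        order = rng.permutation(len(frames))
        # Utterance partitioning: split the shuffled indices into N segments.
        for idx in np.array_split(order, n_partitions):
            sub_utts.append(frames[idx])
    return sub_utts

frames = np.arange(20 * 3).reshape(20, 3)  # toy utterance: 20 frames, 3-dim features
subs = up_avr(frames, n_partitions=4, n_resamplings=2)
print(len(subs))  # 9
```

With N partitions and R resamplings, one utterance yields R * N + 1 frame sets (here 2 * 4 + 1 = 9), each of which would be turned into one i-vector. This is how partitioning can increase the number of speaker-class vectors available for SVM training.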
Motivation
NIST 2012 SRE permits systems to use the information of other target
speakers (called known non-targets) in each verification trial.
For known non-targets, the kernel is
\[
K(x_t, x_{a,j}) = \exp\left\{ -\frac{1}{2\gamma^2} \left\| S_{LR}(x_t, X) - S_{LR}(x_{a,j}, X) \right\|^2 \right\}, \quad x_{a,j} \in X_a
\]

References
[1] P. Kenny, “Bayesian speaker verification with heavy-tailed priors”, in Proc. Odyssey:
Speaker and Language Recognition Workshop, Brno, Czech Republic, June 2010.
[2] D. Garcia-Romero and C. Y. Espy-Wilson, “Analysis of i-vector length normalization in speaker
recognition systems”, in Proc. Interspeech 2011, Florence, Italy, Aug. 2011, pp. 249–252.
[3] M. W. Mak and W. Rao, “Likelihood-ratio empirical kernels for i-vector based PLDA-SVM
scoring”, in Proc. ICASSP 2013, Vancouver, Canada, May 2013, pp. 7702–7706.
[4] W. Rao and M. W. Mak, “Boosting the performance of i-vector based speaker verification via
utterance partitioning”, IEEE Trans. on Audio, Speech and Language Processing, vol. 21,
no. 5, pp. 1012–1022, May 2013.