Transcript Slide 1

SNR-Dependent Mixture of PLDA for
Noise Robust Speaker Verification
Man-Wai Mak
Interspeech 2014
Department of Electronic and Information Engineering
The Hong Kong Polytechnic University, Hong Kong SAR, China
Contents
1. Motivation of Work
2. Conventional PLDA
3. Mixture of PLDA for Noise Robust Speaker Verification
4. Experiments on SRE12
5. Conclusions
Motivation
• Conventional i-vector/PLDA systems use a single PLDA
model to handle all SNR conditions.
[Figure: enrollment utterances → i-vector/PLDA scoring → PLDA score]
Motivation
• We argue that a PLDA model should focus on a small
range of SNR.
[Figure: PLDA Models 1, 2 and 3, each producing its own PLDA score]
Distribution of SNR in SRE12
Each SNR region is handled by a PLDA model.
Proposed Solution
• The full spectrum of SNRs is handled by a mixture of
PLDA in which the posteriors of the indicator variables
depend on the utterance’s SNR.
[Figure: SNR estimator → SNR posterior estimator; PLDA Models 1, 2 and 3 → PLDA score]
Key Features of Proposed Solution
• Verification scores depend not only on the same-speaker
and different-speaker likelihoods but also on
the posterior probabilities of SNR.
Probabilistic LDA (PLDA)
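• In PLDA, the i-vectors x are modeled by a factor analyzer of
the form:
$$x_{ij} = m + V z_i + \epsilon_{ij}$$
where
  $x_{ij}$: i-vector extracted from the j-th session of the i-th speaker
  $m$: global mean of all i-vectors
  $V$: speaker factor loading matrix
  $z_i$: speaker factor
  $\epsilon_{ij}$: residual noise with covariance Σ
• Density of x is
$$p(x) = \mathcal{N}(x \mid m,\; V V^{\mathsf{T}} + \Sigma)$$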
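To make the factor-analysis model concrete, here is a minimal NumPy sketch. It assumes the usual PLDA form x_ij = m + V z_i + ε_ij with z_i ~ N(0, I) and ε_ij ~ N(0, Σ). The function names and the toy sizes (2-D i-vectors, 1-D speaker subspace) are illustrative choices, not taken from the talk.

```python
import numpy as np

def sample_plda(m, V, Sigma, n_speakers, n_sessions, rng):
    """Draw i-vectors x_ij = m + V z_i + eps_ij, with z_i ~ N(0, I)."""
    D, M = V.shape
    z = rng.standard_normal((n_speakers, M))      # one speaker factor per speaker
    eps = rng.multivariate_normal(np.zeros(D), Sigma, size=(n_speakers, n_sessions))
    return m + (z @ V.T)[:, None, :] + eps        # shape: (speakers, sessions, D)

def plda_log_density(x, m, V, Sigma):
    """log N(x | m, V V^T + Sigma): marginal log-density of a single i-vector."""
    C = V @ V.T + Sigma
    d = x - m
    _, logdet = np.linalg.slogdet(C)
    return -0.5 * (len(m) * np.log(2 * np.pi) + logdet + d @ np.linalg.solve(C, d))

rng = np.random.default_rng(0)
m = np.array([1.0, -2.0])
V = np.array([[2.0], [1.0]])                      # toy: 2-D data, 1-D speaker subspace
Sigma = np.diag([0.5, 0.3])

# One session per speaker, so every sample is an independent draw from p(x).
X = sample_plda(m, V, Sigma, n_speakers=20000, n_sessions=1, rng=rng).reshape(-1, 2)
empirical_cov = np.cov(X, rowvar=False)
```

With enough samples, `empirical_cov` should approach V Vᵀ + Σ, which is the covariance that appears in the marginal density of x.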
Probabilistic LDA (PLDA)
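• The PLDA parameters ω={m, V, Σ} are estimated by
maximizing an auxiliary function of the form
$$Q(\omega' \mid \omega) = E_Z\Big\{ \sum_{ij} \ln\big[ p(x_{ij} \mid z_i, \omega')\, p(z_i) \big] \,\Big|\, X, \omega \Big\}$$
(expectation over the latent speaker factors z_i given the training i-vectors X)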
Mixture of PLDA
• Model parameters of mPLDA:
  - parameters for modeling the SNR of utterances
  - parameters for modeling SNR-dependent i-vectors
Generative Model for mPLDA
[Equations not preserved; surviving annotations: SNR in dB; posterior of SNR; definition of the posterior probability of SNR]
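As a hedged illustration of an SNR-dependent posterior, the sketch below assumes that utterance SNR (in dB) is modeled by a 1-D Gaussian mixture, so the posterior of component k follows from Bayes' rule. The component weights, means, and standard deviations are made-up values, and the function name is hypothetical.

```python
import math

def snr_posteriors(snr_db, weights, means, stds):
    """Posterior P(k | snr) under an assumed 1-D Gaussian mixture over SNR (dB)."""
    # Log of weight * Gaussian density; the shared -0.5*log(2*pi) term cancels.
    log_terms = [
        math.log(w) - math.log(s) - 0.5 * ((snr_db - mu) / s) ** 2
        for w, mu, s in zip(weights, means, stds)
    ]
    top = max(log_terms)                  # subtract the max for numerical stability
    unnorm = [math.exp(t - top) for t in log_terms]
    total = sum(unnorm)
    return [u / total for u in unnorm]

# Hypothetical three-component model: low, medium, and high SNR regions.
weights = [0.3, 0.4, 0.3]
means = [6.0, 15.0, 30.0]
stds = [3.0, 4.0, 6.0]

post = snr_posteriors(7.0, weights, means, stds)   # a fairly noisy utterance
```

For an utterance at 7 dB the low-SNR component receives most of the posterior mass, so the PLDA model associated with that region would dominate under this assumption.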
PLDA vs mPLDA
Generative Model
PLDA
Mixture of PLDA
Likelihood-Ratio Scores of mPLDA
• Same-speaker likelihood: a function of the i-vectors of the target and
test speakers and of the SNR of the target and test utterances
Likelihood-Ratio Scores of mPLDA
• Different-speaker likelihood:
• Verification score = (same-speaker likelihood) / (different-speaker likelihood)
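The likelihood ratio can be sketched in NumPy under some stated assumptions: each mixture component k has its own PLDA parameters (m_k, V_k, Σ_k), the target and test i-vectors share one speaker factor z under the same-speaker hypothesis, and components are weighted by the SNR posteriors of the two utterances. The function names and the toy parameters are hypothetical, and this is one plausible reading rather than the exact scoring equations of the talk.

```python
import numpy as np

def log_gauss(x, mean, cov):
    """Log-density of a multivariate Gaussian."""
    d = x - mean
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (len(x) * np.log(2 * np.pi) + logdet + d @ np.linalg.solve(cov, d))

def mplda_llr(xs, xt, post_s, post_t, models):
    """Log-likelihood ratio for an assumed mixture of PLDA models.

    models: list of (m_k, V_k, Sigma_k), all V_k with the same number of columns.
    post_s, post_t: SNR posteriors of the target and test utterances.
    """
    D, K = len(xs), len(models)
    same, diff_s, diff_t = [], [], []
    for ks in range(K):
        m_s, V_s, S_s = models[ks]
        diff_s.append(np.log(post_s[ks]) + log_gauss(xs, m_s, V_s @ V_s.T + S_s))
        for kt in range(K):
            m_t, V_t, S_t = models[kt]
            V_joint = np.vstack([V_s, V_t])       # both i-vectors share one z
            cov = V_joint @ V_joint.T
            cov[:D, :D] += S_s                    # block-diagonal residual covariance
            cov[D:, D:] += S_t
            same.append(np.log(post_s[ks]) + np.log(post_t[kt])
                        + log_gauss(np.concatenate([xs, xt]),
                                    np.concatenate([m_s, m_t]), cov))
    for kt in range(K):
        m_t, V_t, S_t = models[kt]
        diff_t.append(np.log(post_t[kt]) + log_gauss(xt, m_t, V_t @ V_t.T + S_t))
    # log(same-speaker likelihood) - log(different-speaker likelihood)
    return (np.logaddexp.reduce(same)
            - np.logaddexp.reduce(diff_s) - np.logaddexp.reduce(diff_t))

# Toy setup: three hypothetical SNR-dependent PLDA models on 4-D i-vectors.
rng = np.random.default_rng(1)
D, M = 4, 2
models = [(rng.standard_normal(D), rng.standard_normal((D, M)), np.eye(D) * s)
          for s in (0.5, 1.0, 2.0)]
xs, xt = rng.standard_normal(D), rng.standard_normal(D)
score = mplda_llr(xs, xt, [0.7, 0.2, 0.1], [0.1, 0.3, 0.6], models)
```

With a single component (K = 1) and posteriors equal to 1, this reduces to the conventional PLDA log-likelihood ratio.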
PLDA vs mPLDA
Auxiliary Function
PLDA:
Mixture of PLDA:
Annotated quantities: no. of mixtures; latent indicator variables; latent speaker factors; SNR of training utterances; speaker indexes; session indexes
PLDA vs mPLDA
E-Step
PLDA
Mixture of PLDA
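For the conventional PLDA case, the E-step can be sketched with the standard factor-analysis posterior of the speaker factor. This assumes the model x_ij = m + V z_i + ε_ij with z_i ~ N(0, I) and residual covariance Σ; the function name is hypothetical, and the mPLDA E-step (which also involves the indicator variables) is not covered here.

```python
import numpy as np

def plda_e_step(X_i, m, V, Sigma):
    """Posterior mean and second moment of the speaker factor z_i.

    X_i: array of shape (H_i, D) holding the H_i i-vectors of speaker i.
    """
    H, M = X_i.shape[0], V.shape[1]
    SinvV = np.linalg.solve(Sigma, V)            # Sigma^{-1} V, shape (D, M)
    L = np.eye(M) + H * V.T @ SinvV              # posterior precision of z_i
    L_inv = np.linalg.inv(L)
    # Sigma is symmetric, so SinvV.T equals V^T Sigma^{-1}.
    z_mean = L_inv @ SinvV.T @ (X_i - m).sum(axis=0)
    z_second = L_inv + np.outer(z_mean, z_mean)  # E[z z^T | X_i]
    return z_mean, z_second

# Toy usage with made-up parameters.
rng = np.random.default_rng(0)
m = rng.standard_normal(3)
V = rng.standard_normal((3, 2))
Sigma = np.diag([0.5, 1.0, 2.0])
X_i = rng.standard_normal((5, 3))                # five sessions of one speaker
z_mean, z_second = plda_e_step(X_i, m, V, Sigma)
```

The more sessions a speaker has, the larger the posterior precision L becomes, so the estimate of z_i tightens.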
PLDA versus mPLDA
M-Step
PLDA
Mixture of PLDA
Experiments
• Evaluation dataset: Common evaluation condition 2 of NIST SRE
2012 core set.
• Parameterization: 19 MFCCs together with energy plus their
1st and 2nd derivatives → 60-dim
• UBM: gender-dependent, 1024 mixtures
• Total Variability Matrix: gender-dependent, 500 total factors
• I-Vector Preprocessing:
  - Whitening by WCCN then length normalization
  - Followed by LDA (500-dim → 200-dim) and WCCN
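The first preprocessing stage (WCCN followed by length normalization) can be sketched as below. This is a generic illustration on random toy data with 4-D vectors, not the authors' pipeline; the LDA step and the second WCCN are left out, and all function names are hypothetical.

```python
import numpy as np

def within_class_cov(X, labels):
    """Pooled within-speaker covariance of the row vectors in X."""
    D = X.shape[1]
    W = np.zeros((D, D))
    for spk in np.unique(labels):
        Xc = X[labels == spk]
        Xc = Xc - Xc.mean(axis=0)            # remove the speaker mean
        W += Xc.T @ Xc
    return W / len(X)

def wccn_transform(X, labels):
    """Return B with B B^T = W^{-1}; X @ B has identity within-class covariance."""
    W = within_class_cov(X, labels)
    return np.linalg.cholesky(np.linalg.inv(W))

def length_normalize(X):
    """Scale every row vector to unit Euclidean length."""
    return X / np.linalg.norm(X, axis=1, keepdims=True)

# Toy data: 10 speakers, 20 sessions each, 4-D vectors.
rng = np.random.default_rng(0)
labels = np.repeat(np.arange(10), 20)
speaker_means = rng.standard_normal((10, 4)) * 3
X = speaker_means[labels] + rng.standard_normal((200, 4)) @ np.diag([1.0, 2.0, 0.5, 4.0])

B = wccn_transform(X, labels)
X_norm = length_normalize(X @ B)
```

After the WCCN projection the within-speaker covariance is the identity, and length normalization then places every vector on the unit sphere.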
Experiments
• In NIST 2012 SRE, training utterances from telephone channels are clean,
but some of the test utterances are noisy.
• We used the FaNT tool to add babble noise to the clean training
utterances
[Figure: block diagram with utterances from microphone channels, utterances from telephone channels, babble noise, and FaNT]
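The idea of adding noise at a chosen SNR can be illustrated with a simple energy-based mix. This sketch is not FaNT and makes no claim about how FaNT measures levels; it only scales a noise signal so that the speech-to-noise power ratio hits a target. The function name and the synthetic signals are assumptions for illustration.

```python
import numpy as np

def mix_at_snr(speech, noise, snr_db):
    """Add noise to speech so that the power ratio equals snr_db (in dB)."""
    noise = np.resize(noise, speech.shape)        # repeat or trim noise to match length
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    # Choose the gain so that p_speech / (gain^2 * p_noise) = 10^(snr_db / 10).
    gain = np.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return speech + gain * noise

rng = np.random.default_rng(0)
speech = np.sin(2 * np.pi * 200 * np.arange(8000) / 8000)   # stand-in for a clean utterance
babble = rng.standard_normal(3000)                          # stand-in for a babble recording
noisy = mix_at_snr(speech, babble, snr_db=6.0)
```

Repeating the call with several target SNRs would yield training copies of the same utterance at different noise levels.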
Performance on SRE12
• Train on tel+mic speech and test on noisy tel speech (CC4)
• Train on tel+mic speech and test on tel speech recorded in
noisy environments (CC5)
• Use FaNT and a VAD to determine the SNR of test utts.
See our ISCSLP14 paper.
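A VAD-based SNR estimate can be sketched as follows. This is a generic energy-based approach, assumed here for illustration and not the FaNT-based procedure mentioned on the slide: noise power is estimated from non-speech frames, and speech power from speech frames after subtracting the noise power. The function name, the frame length, and the synthetic signal are hypothetical.

```python
import numpy as np

def estimate_snr_db(signal, is_speech, frame_len=160):
    """Estimate SNR (dB) from frame powers and a boolean per-frame VAD mask."""
    n_frames = len(signal) // frame_len
    frames = signal[: n_frames * frame_len].reshape(n_frames, frame_len)
    power = (frames ** 2).mean(axis=1)
    mask = np.asarray(is_speech[:n_frames], dtype=bool)
    noise_power = power[~mask].mean()                       # non-speech frames: noise only
    speech_power = max(power[mask].mean() - noise_power, 1e-12)
    return 10 * np.log10(speech_power / noise_power)

# Synthetic check: a tone (power 0.1) over white noise (power 0.01) in the first half.
rng = np.random.default_rng(0)
n = 16000
noise = 0.1 * rng.standard_normal(n)
tone = np.sqrt(0.2) * np.sin(2 * np.pi * 400 * np.arange(n) / 8000)
vad = np.arange(n // 160) < 50                              # first 50 frames are "speech"
signal = noise + np.where(np.arange(n) < 50 * 160, tone, 0.0)
snr = estimate_snr_db(signal, vad)                          # about 10 dB by construction
```

In practice the mask would come from a real VAD, and its errors would carry over directly into the SNR estimate.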
Performance on SRE12
• Train on tel+mic speech and test on noisy tel speech (CC4)
• Use FaNT and a VAD to determine the SNR of test utts.
[Figure: results for female and male speakers, each comparing PLDA with mPLDA]
Conclusions
• Mixture of SNR-dependent PLDA is a flexible model
that can handle noisy speech with a wide range of
SNR
• The contributions of the mixtures are probabilistically
combined based on the SNR of the test utterances
and the target speaker’s utterances
• Results show that the mixture PLDA performs better
than conventional PLDA whenever the SNR of test
utterances varies widely.
Hard-Decision Mixture of PLDA
Training of mPLDA
• Auxiliary function:
Annotated quantities: no. of mixtures; latent indicator variables; latent speaker factors; SNR of training utterances; speaker indexes; session indexes
PLDA Scoring
x_s and x_t share the same z
Probabilistic LDA (PLDA)
• PLDA example: 2-D data in 1-D subspace
[Figure: a sample z is taken according to p(z)]
Source: S. Prince, “Computer vision: models, learning and inference”, 2012