USING CLASS WEIGHTING
IN INTER-CLASS MLLR
Sam-Joo Doh and Richard M. Stern
Department of Electrical and Computer Engineering
and School of Computer Science
Carnegie Mellon University
October 20, 2000
Outline
Introduction
Review
  Transformation-based adaptation
  Inter-class MLLR
Application of weights
  → For different neighboring classes
Summary
Carnegie Mellon
Robust Speech Group
Introduction
We would like to achieve
  Better adaptation using a small amount of adaptation data
  → Enhance recognition accuracy
Current method
  Reduce the number of parameters
  Assume a transformation function
  → Transformation-based adaptation
  Example: Maximum likelihood linear regression (MLLR)
Introduction (cont’d)
Transformation-based adaptation
  → Transformation classes are assumed to be independent
  → It does not achieve reliable estimates for multiple classes using a small amount of adaptation data
Better idea?
  → Utilize inter-class relationships to achieve more reliable estimates for multiple classes
Transformation-Based Adaptation
Estimate each target parameter (mean vector)
Transformation-Based Adaptation (cont’d)
Estimate each transformation function
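The idea of estimating one transformation function per class, rather than each mean vector separately, can be sketched as follows. This is only an illustration, assuming the affine (MLLR-style) form named earlier as the example; the function `adapt_means`, the `class_of` mapping, and all numeric values are hypothetical.

```python
import numpy as np

def adapt_means(means, class_of, transforms):
    """Apply mu_hat = A @ mu + b, using the transform of each Gaussian's class."""
    adapted = np.empty_like(means)
    for k, mu in enumerate(means):
        A, b = transforms[class_of[k]]
        adapted[k] = A @ mu + b
    return adapted

# Toy setup (hypothetical values): 4 Gaussians with 2-dimensional means, 2 classes.
means = np.array([[0.0, 1.0], [1.0, 1.0], [2.0, 0.0], [3.0, 1.0]])
class_of = [0, 0, 1, 1]
transforms = {
    0: (np.eye(2), np.array([0.5, 0.0])),   # class 0: pure shift
    1: (2.0 * np.eye(2), np.zeros(2)),      # class 1: pure scaling
}
adapted = adapt_means(means, class_of, transforms)
print(adapted)
```

Every Gaussian in a class shares the same (A, b), so the number of free parameters depends on the number of classes rather than on the number of Gaussians.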
Transformation-Based Adaptation (cont’d)
Trade-off
[Figure: quality versus number of transformation classes]
Previous Works
Consider correlations among model parameters
Mostly in a Bayesian framework
Considering a few neighboring models:
  → Not effective
Considering all neighboring models:
  → Too much computation
It is difficult to apply correlations to multi-Gaussian mixtures: no explicit correspondence
Previous Works (cont’d)
Using correlations among model parameters
Inter-Class Relationship
Inter-class relationship among transformation functions?
Inter-Class Relationship (cont’d)
Two classes are independent
$$\hat{\mu}_{1j} = f_1(\mu_{1j}),\ j \in \text{class 1}; \qquad \hat{\mu}_{2k} = f_2(\mu_{2k}),\ k \in \text{class 2}$$
Inter-Class Relationship (cont’d)
If we know an inter-class transformation g12(.)
  → Transform class 2 parameters
$$\mu_{2k}^{(12)} = g_{12}(\mu_{2k})$$
$$\hat{\mu}_{1j} = f_1(\mu_{1j}),\ j \in \text{class 1}; \qquad \hat{\mu}_{2k} = f_1(\mu_{2k}^{(12)}),\ k \in \text{class 2}$$
  → Now class 2 data contribute to the estimation of f1(.)
  → More reliable estimation of f1(.) while it keeps the characteristics of Class 1
f2(.) can be estimated by transforming class 1 parameters
Inter-class MLLR
Use linear regression for inter-class transformation
$$\hat{\mu}_k = A_1 \mu_k + b_1, \quad k \in \text{Class 1 (target class)}$$
$$\hat{\mu}_k = A_2 \mu_k + b_2, \quad k \in \text{Class 2}$$
  → $\hat{\mu}_k = A_1 (T_{12} \mu_k + d_{12}) + b_1 = A_1 \mu_k^{(12)} + b_1, \quad k \in \text{Class 2}$
Estimate (A1, b1) to minimize Q
$$Q = \sum_t \sum_{k \in \text{Class 1}} \gamma_t(k)\,(o_t - A_1 \mu_k - b_1)^T C_k^{-1} (o_t - A_1 \mu_k - b_1) + \sum_t \sum_{k \in \text{Class 2}} \gamma_t(k)\,(o_t - A_1 \mu_k^{(12)} - b_1)^T C_k^{-1} (o_t - A_1 \mu_k^{(12)} - b_1)$$
where
  $C_k$: covariance matrix of Gaussian k
  $\gamma_t(k)$: a posteriori probability of being with Gaussian k at time t
  $o_t$: input feature vector at time t
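A minimal numerical sketch of minimizing Q is given below. It is not the authors' implementation: it assumes diagonal covariances C_k (so each row of [b1 A1] can be solved as a separate weighted least-squares problem), a known inter-class transformation (T12, d12), and noise-free toy data with one observation per Gaussian. The names `estimate_transform`, `pooled`, and all numeric values are hypothetical.

```python
import numpy as np

def estimate_transform(obs, gammas, means, variances):
    """Minimize sum_t sum_k gamma_t(k) (o_t - A mu_k - b)^T C_k^{-1} (o_t - A mu_k - b)
    for diagonal C_k, one weighted least-squares solve per output dimension.

    obs: (T, D) features, gammas: (T, K) posteriors,
    means: (K, D) means (neighbor-class means already mapped by T12, d12),
    variances: (K, D) diagonal entries of C_k.
    """
    T, D = obs.shape
    K = means.shape[0]
    ext = np.hstack([np.ones((K, 1)), means])   # extended means [1, mu_k]
    occ = gammas.sum(axis=0)                    # sum_t gamma_t(k)
    first = gammas.T @ obs                      # sum_t gamma_t(k) o_t, shape (K, D)
    W = np.empty((D, D + 1))
    for i in range(D):
        inv_var = 1.0 / variances[:, i]
        G = (ext * (occ * inv_var)[:, None]).T @ ext   # normal-equation matrix
        z = ext.T @ (first[:, i] * inv_var)            # right-hand side
        W[i] = np.linalg.solve(G, z)
    return W[:, 1:], W[:, 0]                    # A, b

# Class 1 (target) has only 2 Gaussians: too few to fix a 2-D affine transform alone.
mu1 = np.array([[0.0, 0.0], [1.0, 0.0]])
mu2 = np.array([[0.0, 1.0], [2.0, 3.0]])        # class 2 means
T12 = np.array([[1.0, 0.5], [0.0, 1.0]])        # assumed known inter-class transform
d12 = np.array([0.5, 0.0])
mu2_mapped = mu2 @ T12.T + d12                  # mu_k^(12) = T12 mu_k + d12
pooled = np.vstack([mu1, mu2_mapped])

A_true = np.array([[1.1, 0.2], [-0.1, 0.9]])
b_true = np.array([0.3, -0.2])
obs = pooled @ A_true.T + b_true                # one noise-free frame per Gaussian
gammas = np.eye(4)                              # hard alignment: frame t <-> Gaussian t
variances = np.ones((4, 2))

A_est, b_est = estimate_transform(obs, gammas, pooled, variances)
print(A_est, b_est)
```

In this toy case the two class 2 Gaussians supply the extra constraints needed to determine (A1, b1), which is the sense in which class 2 data contribute to the target class estimate.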
Application of Weights
Neighboring classes have different contributions to the
target class
Application of Weights (cont’d)
Application of weights to the neighboring classes
  ✓ We assume $\hat{\mu}_k = A_1 \mu_k^{(1n)} + b_1$ in neighboring class n
  ✓ The error using (A1, b1) in neighboring class n:
$$o_t = A_1 \mu_k^{(1n)} + b_1 + e_t$$
  ✓ Weighted least squares estimation:
    ⇒ Use the variance of the error for weight
    ⇒ Large error → Small weight
    ⇒ Small error → Large weight
$$Q = \sum_t \sum_{k \in \text{Class 1}} \gamma_t(k)\,(o_t - A_1 \mu_k - b_1)^T C_k^{-1} (o_t - A_1 \mu_k - b_1) + \sum_{n \in \text{Neighbor}} \sum_t \sum_{k \in \text{Class } n} \gamma_t(k)\,(o_t - A_1 \mu_k^{(1n)} - b_1)^T C_k^{-1} (o_t - A_1 \mu_k^{(1n)} - b_1)$$
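The slide text does not show how the weight enters Q, so the following is only a sketch under stated assumptions: one scalar weight per class, taken as the inverse of the total variance of that class's error vectors e_t, normalized so the target class has weight 1, and applied by scaling the posteriors γ_t(k). The names `class_weights`, `errors_by_class`, and `class_of` are hypothetical, as are all values.

```python
import numpy as np

def class_weights(errors_by_class, floor=1e-8):
    """Return {class id: 1 / error variance} (assumed weighting rule).

    errors_by_class maps a class id to an (N, D) array whose rows are the
    errors e_t = o_t - A1 @ mu_k - b1 observed for Gaussians of that class.
    """
    weights = {}
    for n, errors in errors_by_class.items():
        centered = errors - errors.mean(axis=0)
        # Total variance: mean squared distance of the errors from their mean.
        variance = float(np.mean(np.sum(centered ** 2, axis=1)))
        weights[n] = 1.0 / max(variance, floor)
    return weights

# Toy errors (hypothetical): class 1 is the target, class 2 fits the target
# transform fairly well, class 3 fits it poorly.
pattern = np.array([[1.0, -1.0], [-1.0, 1.0], [1.0, 1.0], [-1.0, -1.0]])
weights = class_weights({1: 0.05 * pattern, 2: 0.1 * pattern, 3: 1.0 * pattern})

# Apply the weights by scaling gamma_t(k) for every Gaussian k of each class.
gammas = np.ones((5, 3))                  # (T, K) toy posteriors
class_of = np.array([1, 2, 3])            # class of each Gaussian
scale = np.array([weights[c] for c in class_of])
scale = scale / weights[1]                # normalize: target class weight = 1 (assumption)
weighted_gammas = gammas * scale
print(weights, weighted_gammas[0])
```

The class with the larger error receives the smaller weight, matching the "large error, small weight" rule; the least-squares solution is unchanged by the overall normalization.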
Number of Neighboring Classes
Limit the number of neighboring classes
  → Sort neighboring classes
  → Set threshold for the number of samples
  → Use “closer” neighboring classes first
  → Count the number of samples used
  → Use next neighboring classes until the number of samples exceeds the threshold
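The selection steps above can be sketched as a short routine. The closeness measure is not specified in the slides, so the `distance` field here is a hypothetical stand-in, as are `select_neighbors` and the sample counts.

```python
def select_neighbors(neighbors, threshold):
    """Pick neighboring classes, closest first, until the sample count exceeds threshold.

    neighbors: list of (class_id, distance, n_samples) tuples.
    Returns the chosen class ids and the number of samples they provide.
    """
    chosen, total = [], 0
    for class_id, _distance, n_samples in sorted(neighbors, key=lambda item: item[1]):
        if total > threshold:
            break                      # enough samples collected
        chosen.append(class_id)
        total += n_samples
    return chosen, total

# Hypothetical neighbors of one target class.
neighbors = [("B", 0.9, 40), ("C", 0.2, 30), ("D", 0.5, 50), ("E", 1.5, 100)]
chosen, total = select_neighbors(neighbors, threshold=70)
print(chosen, total)
```

With more adaptation data, each class contributes more samples, so the threshold is reached after fewer neighboring classes.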
Experiments
Test data
  1994 DARPA, Wall Street Journal (WSJ) task
  10 non-native speakers × 20 test sentences (Spoke 3: s3-94)
Baseline system: CMU SPHINX-3
  Continuous HMM, 6000 senones
  39-dimensional features
    • MFCC cepstra + delta + delta-delta + power
Supervised/unsupervised adaptation
  Focus on small amounts of adaptation data
13 phonetic-based classes for inter-class MLLR
Experiments (cont’d)
Supervised adaptation: word error rates

Adaptation method                                  1 adapt. sent.   3 adapt. sent.
Baseline (no adaptation)                           27.3%            27.3%
Conventional MLLR (one class)                      24.1%            23.1%
Inter-class MLLR without weights (full + shift)    20.4% (15.4%)    19.6% (15.2%)
Inter-class MLLR with weights (full + shift)       20.2% (16.2%)    19.3% (16.5%)

The figures in parentheses match the relative WER reduction over conventional MLLR.
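The parenthesized figures for supervised adaptation are consistent with the relative WER reduction with respect to conventional MLLR. A quick arithmetic check, using only the numbers reported above (the helper name `relative_reduction` is hypothetical):

```python
def relative_reduction(reference_wer, wer):
    """Relative WER reduction in percent, with respect to reference_wer."""
    return 100.0 * (reference_wer - wer) / reference_wer

# (conventional MLLR WER, inter-class MLLR WER) pairs from the supervised results.
checks = {
    "without weights, 1 sentence": (24.1, 20.4),
    "without weights, 3 sentences": (23.1, 19.6),
    "with weights, 1 sentence": (24.1, 20.2),
    "with weights, 3 sentences": (23.1, 19.3),
}
reductions = {name: round(relative_reduction(ref, wer), 1)
              for name, (ref, wer) in checks.items()}
for name, value in reductions.items():
    print(f"{name}: {value}%")
```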
Experiments (cont’d)
Unsupervised adaptation: word error rates

Adaptation method                                  1 test sent.     10 test sent.
Baseline (no adaptation)                           27.3%            27.3%
Conventional MLLR (one class)                      26.7%            23.9%
Inter-class MLLR without weights (full + shift)    24.0% (10.1%)    20.1% (15.9%)
Inter-class MLLR with weights (full + shift)       24.3% (9.0%)     19.9% (16.7%)
Experiments (cont’d)
Limit the number of neighboring classes
Supervised adaptation: 10 adaptation sentences
[Figure: WER (%) versus average number of classes; WER axis from 19 to 21, class axis from 0 to 12]
Summary
Application of weights
  → Use weighted least squares estimation
  → Was helpful for the supervised case
  → Was not helpful for the unsupervised case (with a small amount of adaptation data)
Number of neighboring classes
  → Use a smaller number of neighboring classes as more adaptation data become available
Summary
Inter-class transformation
  → It can carry speaker-dependent information
  → We may prepare several sets of inter-class transformations
  → Select the appropriate set for a new speaker
Combination with Principal Component MLLR
  → Did not provide additional improvement
Thank you!