Bagging-based System
Combination for Domain
Adaptation
Linfeng Song, Haitao Mi, Yajuan Lü and Qun Liu
Institute of Computing Technology
Chinese Academy of Sciences
An Example
An initial MT system is tuned on a development set that is 90% domain A and 10% domain B; the translation styles of A and B are quite different.
The result is a tuned MT system that fits domain A.
But the test set is 10% domain A and 90% domain B: the translation style fits A, while we mainly want to translate B.
Traditional Methods
A domain recognizer is trained on monolingual data with domain annotation.
The recognizer splits the bilingual training data into a domain-A portion and a domain-B portion, from which an MT system for domain A and an MT system for domain B are trained.
The recognizer likewise splits the test set into a domain-A part and a domain-B part; each part is translated by its matching MT system, and the two outputs are merged into the final translation result.
The merits
Simple and effective
Fits human intuition
The drawbacks
Classification Error (CE)
Especially for unsupervised methods
Supervised methods can keep CE low, yet the need for annotated data limits their usage
Our motivation
Jump out of the rut of doing adaptation directly
Statistical methods (such as bagging) can help
Preliminary
The general framework of Bagging
General framework of Bagging
From a training set D, bootstrap resampled training sets D1, D2, D3, …, and train one classifier C1, C2, C3, … on each.
At test time, a test sample is fed to every classifier, and the results of C1, C2, C3, … are combined by voting into the final result.
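To make the framework concrete, here is a minimal Python sketch (not from the paper; `build_classifier` is a hypothetical stand-in for whatever base learner is being bagged, and each trained classifier is assumed to be a callable):

```python
import random
from collections import Counter

def bagging_train(train_set, build_classifier, n_bags=10, seed=0):
    """Train one classifier per bootstrap resample of training set D."""
    rng = random.Random(seed)
    classifiers = []
    for _ in range(n_bags):
        # D_t: sample |D| items from D with replacement.
        bag = rng.choices(train_set, k=len(train_set))
        classifiers.append(build_classifier(bag))
    return classifiers

def bagging_predict(classifiers, sample):
    """Each classifier C_t votes on the sample; the majority label wins."""
    votes = Counter(clf(sample) for clf in classifiers)
    return votes.most_common(1)[0][0]
```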
Our method
Training
Suppose there is a development set: A, A, A, B, B. For simplicity there are only 5 sentences; 3 belong to domain A and 2 to domain B.
We bootstrap N new development sets, and for each set a subsystem is tuned:
A,B,B,B,B → MT system-1
A,A,B,B,B → MT system-2
A,A,B,B,B → MT system-3
A,A,A,B,B → MT system-4
A,A,A,A,B → MT system-5
…
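A minimal sketch of this bootstrapping step, assuming the labels "A"/"B" stand in for the five dev sentences; the tuning itself (the slides only say each resampled set is "tuned") is left as a comment:

```python
import random

dev_set = ["A", "A", "A", "B", "B"]  # the toy 5-sentence dev set

def bootstrap_dev_sets(dev_set, n_subsystems, seed=0):
    """Draw N new dev sets, each sampled with replacement from the original."""
    rng = random.Random(seed)
    return [rng.choices(dev_set, k=len(dev_set)) for _ in range(n_subsystems)]

for i, d in enumerate(bootstrap_dev_sets(dev_set, n_subsystems=5), start=1):
    # Each resampled set is then passed to the tuner to fit MT system-i.
    print(f"MT system-{i} is tuned on {sorted(d)}")
```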
Decoding
For simplicity, suppose only 2 subsystems have been tuned:
Subsystem-1, W: <-0.8, 0.2>
Subsystem-2, W: <-0.6, 0.4>
Now a sentence "A B" needs to be translated.
After translation, each subsystem generates its N-best candidates:
Subsystem-1: a b; <0.2, 0.2> and a c; <0.2, 0.3>
Subsystem-2: a b; <0.2, 0.2>, a b; <0.1, 0.3> and a d; <0.3, 0.4>
Fuse these N-best lists and eliminate duplicates:
a b; <0.2, 0.2>
a b; <0.1, 0.3>
a c; <0.2, 0.3>
a d; <0.3, 0.4>
Candidates are identical only if their target strings and feature values are entirely equal.
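A minimal sketch of the fusion step, using the example's two N-best lists; following the identity rule above, the (target string, feature values) pair serves as the dedup key:

```python
def fuse_nbest(nbest_lists):
    """Merge N-best lists, keeping each (target, features) candidate once."""
    seen, fused = set(), []
    for nbest in nbest_lists:
        for target, feats in nbest:
            if (target, feats) not in seen:  # dedup on string + features
                seen.add((target, feats))
                fused.append((target, feats))
    return fused

sys1 = [("a b", (0.2, 0.2)), ("a c", (0.2, 0.3))]
sys2 = [("a b", (0.2, 0.2)), ("a b", (0.1, 0.3)), ("a d", (0.3, 0.4))]
# The shared "a b; <0.2, 0.2>" is kept once, giving 4 unique candidates.
print(fuse_nbest([sys1, sys2]))
```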
Calculate the voting score, where S represents the number of subsystems:
final_score(c) = Σ_{t=1}^{S} w_t · feat_c
a b; <0.2, 0.2>; -0.16
a b; <0.1, 0.3>; +0.04
a c; <0.2, 0.3>; -0.10
a d; <0.3, 0.4>; -0.18
The candidate with the highest score wins.
Since the subsystems are copies of the same model sharing the same training data, score calibration across them is unnecessary.
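A minimal sketch of the voting step; it reproduces the slide's scores (-0.16, +0.04, -0.10, -0.18) from the two weight vectors and picks the winner:

```python
def final_score(feats, subsystem_weights):
    """final_score(c) = sum over subsystems t of w_t . feat_c (dot products)."""
    return sum(sum(w * f for w, f in zip(wt, feats)) for wt in subsystem_weights)

weights = [(-0.8, 0.2), (-0.6, 0.4)]  # W of Subsystem-1 and Subsystem-2
fused = [("a b", (0.2, 0.2)), ("a b", (0.1, 0.3)),
         ("a c", (0.2, 0.3)), ("a d", (0.3, 0.4))]

scored = [(target, feats, round(final_score(feats, weights), 2))
          for target, feats in fused]
best = max(scored, key=lambda c: c[2])  # the highest voting score wins
print(best)  # ('a b', (0.1, 0.3), 0.04)
```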
Experiments
Basic Setups
Data: NTCIR-9 Chinese-English patent corpus
1k sentence pairs as the development set
Another 1k pairs as the test set
The remainder is used for training
System: hierarchical phrase-based model
Alignment: GIZA++, grow-diag-final
Effectiveness: Show and Prove
Tune 30 subsystems using bagging
Tune 30 subsystems with random initial weights
Evaluate the fusion results of the first N (N = 5, 10, 15, 20, 30) subsystems of both and compare
Results: 1-best
[Chart: 1-best BLEU vs. number of subsystems (1, 5, 10, 15, 20, 30), bagging vs. random tuning. Bagging reaches 31.9 at 30 subsystems: +0.82 over the single-system baseline (31.08) and +0.70 over random fusion (31.2).]
Results: Oracle
[Chart: oracle BLEU vs. number of subsystems (1, 5, 10, 15, 20, 30), bagging vs. random tuning. Bagging reaches 42.96 at 30 subsystems: +6.22 over the single-system baseline (36.74) and +3.71 over random fusion (39.25).]
Compare with traditional methods
Evaluate a supervised method
To tackle data sparsity, it operates only on the development set and test set
Evaluate an unsupervised method
Similar to Yamada (2007)
To avoid data sparsity, only the LM is domain-specific
Results
[Chart: 1-best BLEU by method — baseline 31.08, bagging 31.9, supervised 31.63, unsupervised 31.24.]
Conclusions
We propose a bagging-based method to address the multi-domain translation problem.
Experiments show that:
Bagging is effective for the domain adaptation problem
Our method clearly surpasses the baseline, and even beats some traditional methods
Thank you for listening.
Any questions?