Weakly-Supervised Learning with
Cost-Augmented Contrastive Estimation:
Supplementary Material
Kevin Gimpel
Mohit Bansal
Toyota Technological Institute at Chicago, IL 60637, USA
{kgimpel,mbansal}@ttic.edu
1 Tag Bigram Costs
We used treebanks for 11 languages from the
CoNLL 2006/2007 shared tasks, none of which were
used in our POS tagging experiments: Arabic, Bulgarian, Catalan, Czech,
English, Spanish, German, Hungarian, Italian,
Japanese, and Turkish. We replicated each shorter treebank
enough times to make it similar in size to the largest treebank. We then
counted gold POS tag unigrams and bigrams in
the concatenation. Table 1 shows the counts and
costs for tag bigrams.
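The replication and counting steps described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the use of token counts to measure treebank size, the rounding rule for the number of copies, and the toy data are all assumptions made for the example. The ⟨BOS⟩ and ⟨EOS⟩ boundary markers follow the symbols that appear in Table 1.

```python
from collections import Counter

BOS, EOS = "⟨BOS⟩", "⟨EOS⟩"  # sentence-boundary symbols as written in Table 1


def replicate_to_similar_size(treebanks):
    """Repeat each treebank a whole number of times so that its token
    count is close to that of the largest treebank (assumed size measure)."""
    sizes = {name: sum(len(s) for s in sents) for name, sents in treebanks.items()}
    largest = max(sizes.values())
    balanced = []
    for name, sents in treebanks.items():
        copies = max(1, round(largest / sizes[name]))  # assumed rounding rule
        balanced.extend(sents * copies)
    return balanced


def count_tags(sentences):
    """Count gold tag unigrams and bigrams, padding each sentence with BOS/EOS."""
    unigrams, bigrams = Counter(), Counter()
    for tags in sentences:
        unigrams.update(tags)
        padded = [BOS] + list(tags) + [EOS]
        bigrams.update(zip(padded, padded[1:]))
    return unigrams, bigrams


# Hypothetical toy treebanks: each sentence is a list of gold universal POS tags.
toy = {
    "lang_a": [["DET", "NOUN", "VERB", "."], ["NOUN", "VERB", "."]],  # 7 tokens
    "lang_b": [["PRON", "VERB", "."]],                                # 3 tokens
}
corpus = replicate_to_similar_size(toy)
unigrams, bigrams = count_tags(corpus)
```

In this toy setup the smaller treebank is repeated twice, so both languages contribute a comparable number of tokens to the concatenated counts.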
tag bigram   count   cost   tag bigram   count   cost   tag bigram   count   cost
CONJ ⟨EOS⟩       5 115.23   CONJ PRT      8494  40.85   ADJ VERB     38685  25.69
DET ⟨EOS⟩       20 101.36   CONJ NUM      8708  40.60   PRON ADP     38982  25.61
ADP ⟨EOS⟩       30  97.31   NUM ADJ       9204  40.05   VERB CONJ    39641  25.44
DET PRT        109  84.41   PRT PRON      9349  39.89   . ADV        40531  25.22
ADV ⟨EOS⟩      201  78.29   ADP CONJ      9362  39.88   ADV .        41062  25.09
X DET          259  75.75   ADJ DET       9385  39.85   ADV ADJ      41244  25.05
PRT ⟨EOS⟩      281  74.94   PRT ADJ       9646  39.58   ADJ CONJ     44536  24.28
PRON ⟨EOS⟩     406  71.26   CONJ CONJ     9939  39.28   CONJ PRON    45286  24.11
ADV X          486  69.46   ADJ PRT      10069  39.15   NOUN DET     46804  23.78
X PRT          506  69.06   ADV NUM      10207  39.01   ⟨BOS⟩ CONJ   48265  23.48
DET CONJ       518  68.82   ADV PRT      10230  38.99   ADP VERB     49139  23.30
X ADV          739  65.27   DET DET      10469  38.76   ADJ ADJ      51444  22.84
PRT X          745  65.19   ADV CONJ     10739  38.50   ADP ADJ      54707  22.22
X ⟨EOS⟩        747  65.16   PRON PRT     10873  38.38   ADP PRON     56097  21.97
X PRON         805  64.41   ⟨BOS⟩ X      11226  38.06   CONJ DET     57176  21.78
NUM X         1013  62.11   VERB NUM     11281  38.01   . ADP        58998  21.47
VERB ⟨EOS⟩    1023  62.02   PRON CONJ    11922  37.46   VERB ADJ     59035  21.46
CONJ X        1037  61.88   NOUN ⟨EOS⟩   12334  37.12   NOUN ADV     59291  21.42
PRON X        1141  60.92   ADP ADV      12637  36.88   PRT VERB     59408  21.40
DET X         1282  59.76   NOUN X       13247  36.41   . PRON       59523  21.38
NUM ⟨EOS⟩     1475  58.36   DET NUM      14495  35.51   . DET        60663  21.19
X VERB        1490  58.26   DET PRON     14720  35.35   VERB PRON    62171  20.94
NUM ADV       1587  57.63   CONJ .       14921  35.22   ⟨BOS⟩ DET    62920  20.82
X NUM         1630  57.36   NUM ADP      15024  35.15   NOUN PRT     70345  19.71
ADJ ⟨EOS⟩     1845  56.12   ADJ PRON     15396  34.90   . .          71475  19.55
PRT CONJ      1936  55.64   ADP .        15595  34.77   VERB ADV     76624  18.85
NUM PRON      1968  55.47   NUM NUM      15807  34.64   NUM NOUN     78804  18.57
PRT ADP       2039  55.12   PRON DET     16134  34.43   ADV VERB     80126  18.41
ADJ X         2477  53.17   ⟨BOS⟩ ADJ    18858  32.87   ⟨BOS⟩ NOUN   82206  18.15
NUM DET       2564  52.83   ⟨BOS⟩ .      18939  32.83   . VERB       84367  17.89
ADP X         2595  52.71   X .          18973  32.81   CONJ VERB    85420  17.77
X ADJ         2667  52.43   PRT PRT      21393  31.61   PRON NOUN    91872  17.04
⟨BOS⟩ PRT     2787  51.99   PRON ADV     21569  31.53   DET ADJ      93111  16.91
VERB X        2809  51.92   ADV PRON     23035  30.87   ADJ ADP     104280  15.77
ADP PRT       2885  51.65   . NUM        23833  30.53   NOUN PRON   106705  15.54
PRT DET       3301  50.30   PRT .        24106  30.42   . CONJ      108929  15.34
X CONJ        4259  47.75   ADV NOUN     25692  29.78   ADJ .       109098  15.32
DET ADP       4276  47.71   ADV DET      27526  29.09   CONJ NOUN   113019  14.97
X ADP         4957  46.24   ADV ADV      28654  28.69   . NOUN      118788  14.47
NUM CONJ      5078  45.99   NOUN NUM     28957  28.59   VERB DET    124640  13.99
. PRT         5782  44.70   NUM .        29359  28.45   NOUN CONJ   141525  12.72
X X           5881  44.53   CONJ ADP     29523  28.39   VERB ADP    150002  12.14
NUM PRT       5896  44.50   VERB PRT     29762  28.31   VERB VERB   150999  12.07
PRON NUM      5944  44.42   ⟨BOS⟩ VERB   30087  28.20   VERB .      153257  11.92
ADJ NUM       6243  43.93   CONJ ADV     30668  28.01   VERB NOUN   157745  11.63
DET ADV       6501  43.52   ADV ADP      31008  27.90   PRON VERB   164872  11.19
NUM VERB      6834  43.02   PRON ADJ     31259  27.82   ADP DET     193443   9.59
PRT ADV       6841  43.01   CONJ ADJ     32453  27.45   NOUN ADJ    255720   6.80
. X           6844  43.01   PRT NOUN     32560  27.41   NOUN VERB   260172   6.63
ADP ADP       6954  42.85   ADP NUM      32903  27.31   ADJ NOUN    293295   5.43
⟨BOS⟩ NUM     7079  42.67   PRON PRON    33339  27.18   . ⟨EOS⟩     367592   3.17
DET VERB      7390  42.24   ⟨BOS⟩ ADP    33470  27.14   NOUN NOUN   409828   2.09
DET .         7413  42.21   ⟨BOS⟩ PRON   33486  27.13   NOUN ADP    427409   1.67
PRT NUM       7526  42.06   PRON .       33567  27.11   DET NOUN    454980   1.04
X NOUN        7870  41.61   . ADJ        35337  26.59   ADP NOUN    470575   0.70
ADJ ADV       7932  41.53   ⟨BOS⟩ ADV    36636  26.23   NOUN .      504897   0.00
Table 1: Counts and costs for universal tag bigrams based on treebanks for 11 languages not used in
experiments. The cost used for unseen bigrams is the maximum of all costs in the table.
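
As a rough illustration of how the costs in Table 1 relate to the counts, the listed values appear numerically consistent with 10 · ln(max count / count). For example, 10 · ln(504897 / 5) ≈ 115.23 for CONJ ⟨EOS⟩, and the most frequent bigram, NOUN ., has cost 0.00. This relationship is inferred from the table entries and is not stated in this section, so the formula, the scale factor, and the function names below should be read as assumptions. The fallback for unseen bigrams follows the rule given in the caption.

```python
import math


def bigram_costs(counts, scale=10.0):
    """Map each bigram count to scale * ln(max_count / count).
    The formula and scale are inferred from Table 1, not given in the text."""
    max_count = max(counts.values())
    return {bg: scale * math.log(max_count / c) for bg, c in counts.items()}


def cost_of(bigram, costs):
    """Look up a bigram's cost; unseen bigrams receive the maximum cost,
    as stated in the caption of Table 1."""
    return costs.get(bigram, max(costs.values()))


# A few (tag bigram, count) pairs copied from Table 1.
counts = {
    ("NOUN", "."): 504897,
    ("ADP", "NOUN"): 470575,
    ("DET", "PRT"): 109,
    ("CONJ", "⟨EOS⟩"): 5,
}
costs = bigram_costs(counts)

for bigram, cost in sorted(costs.items(), key=lambda kv: kv[1]):
    print(bigram, round(cost, 2))

# ("FOO", "BAR") is a made-up bigram used only to exercise the unseen case.
print(round(cost_of(("FOO", "BAR"), costs), 2))
```

Under this reading, rarer tag bigrams are penalized more heavily, and a bigram never observed in the treebanks is treated as at least as costly as the rarest observed one.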