Methods for Learning Classifier Combinations: No Clear Winner


Methods for Learning Classifier Combinations: No Clear Winner
Dmitriy Fradkin, Paul Kantor
DIMACS, Rutgers University
7/17/2015
Dmitriy Fradkin, ACM SAC'2005
[Diagram: for each training topic (Topic 1, Topic 2, …), the scores of System 1 and System 2 are combined by Local Fusion; for New Topics, with no training data, the scores must be combined by Federated or Global Fusion.]
Overview
• Discuss local fusion methods
• Describe a new fusion approach for multitopic problems that we call "federated"
• Compare it empirically to the global approach, previously described in [Bartell et al. 1994]
• Interpret the results
Related Work in IR
• [Bartell et al., 1994] - global fusion of systems
• [Hull et al., 1996] - local fusion methods for document filtering (averaging, linear and logistic regression, grid search)
• [Lam and Lai, 2001] used category-specific features to model error rate, and then picked the single best system for a category
• [Bennett et al., 2002] uses "reliability indicators" together with scores as input to a metaclassifier
Combination of Classifiers
Relevance judgment: $y(d, q) \in \{0, 1\}$
Decision rule: $C(d, q) \in \{0, 1\}$, $C(d, q) = \mathrm{sign}(r(d, q) - \theta_q)$
The problem of fusion can be formulated as the problem of finding a way to combine several decision rules.
Linear Combinations
$$C_F(d, q) = \mathrm{sign}\Big(\sum_{j=1}^{l} \lambda_j x_j(d, q) - \theta_q\Big)$$
where $\lambda$ is an $l$-dimensional vector of weights, $\theta$ is a threshold, and $x(d, q)$ is the $l$-dimensional vector of normalized scores given by the systems to document $d$ on topic $q$:
$$x_s = \frac{r_s(d, q) - \min_{d'} r_s(d', q)}{\max_{d'} r_s(d', q) - \min_{d'} r_s(d', q)}$$
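The min-max normalization and the linear decision rule above can be sketched as follows (a minimal Python sketch; the function names are ours, not from the paper):

```python
# Min-max normalization of one system's raw scores on a single topic:
# x_s = (r_s(d,q) - min_d' r_s(d',q)) / (max_d' r_s(d',q) - min_d' r_s(d',q))

def normalize(scores):
    """Rescale a list of raw scores to [0, 1] per topic."""
    lo, hi = min(scores), max(scores)
    if hi == lo:                      # degenerate case: all scores equal
        return [0.0 for _ in scores]
    return [(r - lo) / (hi - lo) for r in scores]

def fused_decision(x, lam, theta):
    """Linear fusion rule C_F: 1 if the weighted sum of normalized
    scores exceeds the threshold, else 0."""
    s = sum(l * xi for l, xi in zip(lam, x))
    return 1 if s - theta > 0 else 0
```

Each system is normalized separately per topic, so scores from systems with different ranges become comparable before fusion.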
Input to Local Fusion
         System 1  System 2  …  System L  Relevance
doc 1    x_11      x_12      …  x_1l      y_1
doc 2    x_21      x_22      …  x_2l      y_2
…        …         …         …  …         …
doc n    x_n1      x_n2      …  x_nl      y_n

$x_j$ - vector of scores for the $j$th document, $j = 1, \ldots, n$ (documents)
$y_j$ - relevance judgment for the $j$th document
Local Fusion Methods
A new fusion method:
Centroid: $\lambda = \bar{x}^{+} - \bar{x}^{-}$
(the difference between the centroids of the positive and the negative training documents)

Other methods:
Linear: $\min_{\lambda} \sum_{j=1,\ldots,n} (y_j - \lambda \cdot x_j)^2$
Linear 2: $\min_{\lambda, \theta} \sum_{j=1,\ldots,n} (y_j - \lambda \cdot x_j - \theta)^2$
Local Fusion Methods (cont.)
Logistic: $\min_{\lambda, \theta} -\sum_{j=1,\ldots,n} \big(y_j \log p_j + (1 - y_j) \log(1 - p_j)\big)$
where $p_j = p(y = 1 \mid x_j, \lambda)$ and $\log \frac{p_j}{1 - p_j} = \lambda \cdot x_j - \theta$
Since log is a monotone function, the underlying decision rule is linear.
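The Centroid and Linear 2 rules can be sketched in a few lines (a NumPy sketch under our reading of the formulas; function names are ours):

```python
import numpy as np

def centroid_rule(X, y):
    """Centroid method: lambda = mean of positive rows minus mean of
    negative rows of the score matrix X (rows = documents)."""
    X, y = np.asarray(X, float), np.asarray(y)
    return X[y == 1].mean(axis=0) - X[y == 0].mean(axis=0)

def linear2_rule(X, y):
    """Linear 2: least-squares fit of y ~ lambda . x + intercept."""
    X = np.asarray(X, float)
    A = np.hstack([X, np.ones((len(X), 1))])    # extra column for the offset
    sol, *_ = np.linalg.lstsq(A, np.asarray(y, float), rcond=None)
    return sol[:-1], sol[-1]                    # (lambda, offset)
```

The centroid rule needs no optimization at all, which is part of why it is cheap to recompute when new topics arrive.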
Threshold Tuning
• Once a vector of parameters is found for a local rule, we compute the fusion score on the training set and find a threshold maximizing a particular utility measure:
$\theta_i = \Theta(q_i, \lambda_i)$
Different combinations lead to different scores and decisions.
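The threshold search can be sketched as a sweep over candidate cutoffs on the training fusion scores (a minimal sketch; the utility measure is passed in as a function, and the exhaustive sweep is our illustration, not necessarily the authors' procedure):

```python
def tune_threshold(scores, y, utility):
    """Try each distinct training fusion score as a threshold and keep
    the one maximizing the given utility measure."""
    best_theta, best_u = 0.0, float("-inf")
    for theta in sorted(set(scores)):
        pred = [1 if s >= theta else 0 for s in scores]
        u = utility(pred, y)
        if u > best_u:
            best_theta, best_u = theta, u
    return best_theta
```

Only the observed scores need to be tried, since utility changes only when the threshold crosses a training score.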
Global Fusion
When there are many topics:
• Combine all document-query relevance judgments and corresponding scores together (as if for a single query)
• Compute a local fusion rule
When data for a new training topic becomes available we can either:
• solve the problem from scratch, or
• continue using the same rule.
Input to Global Fusion
               System 1  System 2  …  System L  Relevance
doc 1/query 1  x_111     x_112     …  x_11l     y_11
doc 1/query 2  x_121     x_122     …  x_12l     y_12
…              …         …         …  …         …
doc 1/query m  x_1m1     x_1m2     …  x_1ml     y_1m
doc 2/query 1  x_211     x_212     …  x_21l     y_21
…              …         …         …  …         …
doc n/query m  x_nm1     x_nm2     …  x_nml     y_nm
Question:
Suppose we know local fusion rules on a set of queries.
• Can we exploit this knowledge on other queries?
• Can we come up with a scheme that can easily incorporate new training queries?
Federated Fusion
Given training queries $q_1, \ldots, q_m$ with their local fusion rules $(\lambda_j, \theta_j)$:
$$\lambda^{*} = \frac{1}{m} \sum_{j=1}^{m} \lambda_j, \qquad \theta^{*} = \frac{1}{m} \sum_{j=1}^{m} \theta_j = \frac{1}{m} \sum_{j=1}^{m} \Theta(q_j, \lambda_j)$$
New training topics are easy to incorporate!
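The federated rule is just the componentwise average of the per-query rules, which can be sketched as (a minimal sketch; each rule is a `(weights, threshold)` pair, a representation we chose for illustration):

```python
def federated_rule(rules):
    """Average per-query weight vectors and thresholds (lambda_j, theta_j)
    into a single federated rule (lambda*, theta*)."""
    m = len(rules)
    dim = len(rules[0][0])
    lam_star = [sum(r[0][k] for r in rules) / m for k in range(dim)]
    theta_star = sum(r[1] for r in rules) / m
    return lam_star, theta_star
```

Because the combination is a running mean, adding a new training topic only requires its local rule, not refitting on all pooled data as in global fusion.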
Experimental Evaluation
• Reuters Corpus v1, version 2 (RCV1-v2)
• 99 topics
• Completely judged
• ~23K documents (as in Lewis et al. 2004) to train individual systems
• Selected 4060 (from ~800K) to construct fusion rules
• 9-fold cross-validation over topics
Utility Measures
$$T11NU = \frac{2|D^{+}| - |D^{-}|}{2|T^{+}|}, \qquad T11SU = \frac{\max(T11NU, -0.5) + 0.5}{1.5}$$
$T^{+}$ - all positive documents; $D^{+}$ - submitted positive; $D^{-}$ - submitted negative
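The measure can be computed directly from three counts (a sketch, reading $|D^{+}|$ as submitted relevant documents, $|D^{-}|$ as submitted non-relevant documents, and $|T^{+}|$ as all relevant documents for the topic):

```python
def t11su(n_rel_submitted, n_nonrel_submitted, n_rel_total):
    """TREC-11 scaled utility:
    T11NU = (2|D+| - |D-|) / (2|T+|)
    T11SU = (max(T11NU, -0.5) + 0.5) / 1.5"""
    t11nu = (2 * n_rel_submitted - n_nonrel_submitted) / (2 * n_rel_total)
    return (max(t11nu, -0.5) + 0.5) / 1.5
```

The clipping at -0.5 bounds the penalty for flooding the user with non-relevant documents, so T11SU always lies in [0, 1].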
Term Representation
$$f(t, d) = \begin{cases} 1 + \log(f'(t, d)), & \text{if } f'(t, d) > 0, \\ 0, & \text{otherwise} \end{cases}$$
where $f'(t, d)$ is the number of times term $t$ occurs in document $d$.
IDF weighting: let $i'(t)$ be the number of documents, in the training set $T$, containing term $t$. Then:
$$iD(t) = \log\Big(\frac{1 + |T|}{1 + i'(t)}\Big)$$
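The two weighting formulas translate directly into code (a minimal sketch of the formulas above; function names are ours):

```python
import math

def log_tf(raw_count):
    """Damped term frequency: 1 + log f'(t,d) if the term occurs, else 0."""
    return 1 + math.log(raw_count) if raw_count > 0 else 0.0

def idf(n_docs_with_term, n_docs_total):
    """Smoothed IDF weight: iD(t) = log((1 + |T|) / (1 + i'(t)))."""
    return math.log((1 + n_docs_total) / (1 + n_docs_with_term))
```

The +1 terms keep the IDF finite for terms that appear in no training document, and the log damping keeps very frequent terms from dominating a document vector.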
Individual Classifiers
• Bayesian Binary Regression (BBR) [Genkin et al. 2004]
• kNN, k=384 (k was chosen on the basis of prior experiments)
• Rocchio Classifier
Single Classifiers and BBR-kNN fusion
[Chart: T11SU per topic (topics ordered by number of relevant documents, from 4 to 1911) for kNN, Rocchio, BBR, BBR-kNN global, and BBR-kNN federated.]
Global vs. Federated
[Chart: T11SU of BBR-kNN global vs. BBR-kNN federated on individual topics (CCAT, M14, C18, GCRIM, …), in decreasing number of relevant documents.]
Global vs. Federated
[Scatter plot: BBR-kNN Federated (T11SU) vs. Global (T11SU), one point per topic.]
Results
Local Fusion  kNN    Rocchio  BBR    BBR-kNN global  BBR-kNN federated
none          0.583  0.54     0.578  …               …
Centroid      …      …        …      0.569           0.587
Linear        …      …        …      0.569           0.574
Linear 2      …      …        …      0.569           0.575
Logistic      …      …        …      0.556           0.549

Average T11SU measure across 99 topics of RCV1
Conclusions
• The Centroid method performs best with federated fusion
• Federated fusion gives higher average utility,
• but global fusion performs better on a greater number of topics.
• This seems to be related to the number of relevant documents for individual topics (federated is better for topics with few relevant documents).
• No Clear Winner: the choice of methods depends on the user's objectives
• However, federated fusion is computationally more efficient
• Topic properties have to be considered when choosing a combination method
Acknowledgments
• KD-D group via NSF grant EIA-0087022
• Members of DIMACS MMS project: Fred Roberts (PI), Andrei Anghelescu, Alex Genkin, Dave Lewis, David Madigan, Vladimir Menkov
• Kwong Bor Ng
• Anonymous reviewers