Ranking with High-Order and Missing Information

M. Pawan Kumar
Ecole Centrale Paris
Aseem Behl
Puneet Dokania
Pritish Mohapatra
C. V. Jawahar
PASCAL VOC
“Jumping” Classification
[Diagram: Processing, Features, Training, Classifier]
PASCAL VOC
“Jumping” Classification
[Diagram: Processing, Features, ✗, Training, Classifier]
Think of a classifier !!!
PASCAL VOC
“Jumping” Ranking
[Diagram: Processing, Features, ✗, Training, Classifier]
Think of a classifier !!!
Ranking vs. Classification
[Figure: six images ordered Rank 1 to Rank 6, shown over several slide builds. Average Precision values on the slide: 1, 0.92, 0.81. Accuracy values on the slide: 1, 0.67.]
Ranking is not the same as classification
Average precision is not the same as accuracy
Should we use 0-1 loss based classifiers?
Or should we use AP loss based rankers?
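The gap between the two measures can be checked numerically. A minimal sketch, assuming binary relevance labels listed in ranked order and a classifier that labels the top `k` items positive. The function names, the two rankings and the cut-off `k` are illustrative assumptions, not taken from the slides.

```python
def average_precision(ranked_labels):
    """AP of a ranking given as 1 (relevant) / 0 (irrelevant), best rank first."""
    hits, precisions = 0, []
    for position, label in enumerate(ranked_labels, start=1):
        if label == 1:
            hits += 1
            precisions.append(hits / position)  # precision at this positive
    return sum(precisions) / len(precisions) if precisions else 0.0


def accuracy_at_cutoff(ranked_labels, k):
    """Accuracy when the top-k items are predicted positive, the rest negative."""
    correct = sum(1 for pos, label in enumerate(ranked_labels)
                  if (label == 1) == (pos < k))
    return correct / len(ranked_labels)


ranking_a = [1, 1, 0, 1, 0, 0]  # made-up ranking: positives at ranks 1, 2, 4
ranking_b = [1, 0, 1, 1, 0, 0]  # made-up ranking: positives at ranks 1, 3, 4

ap_a = average_precision(ranking_a)
ap_b = average_precision(ranking_b)
acc_a = accuracy_at_cutoff(ranking_a, k=3)
acc_b = accuracy_at_cutoff(ranking_b, k=3)
```

Under these assumptions both rankings have accuracy 4/6 ≈ 0.67, while AP is 11/12 ≈ 0.92 for `ranking_a` and 29/36 ≈ 0.81 for `ranking_b`, so accuracy cannot tell them apart.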
Outline
• Optimizing Average Precision (AP-SVM)
• High-Order Information
• Missing Information
Yue, Finley, Radlinski and Joachims, SIGIR 2007
Problem Formulation
Single Input X
$\Phi(x_i)$ for all $i \in P$
$\Phi(x_k)$ for all $k \in N$
Problem Formulation
Single Output R
$R_{ik} = +1$ if $i$ is better ranked than $k$
$R_{ik} = -1$ if $k$ is better ranked than $i$
Problem Formulation
Scoring Function
$s_i(w) = w^\top \Phi(x_i)$ for all $i \in P$
$s_k(w) = w^\top \Phi(x_k)$ for all $k \in N$
$S(X,R;w) = \sum_{i \in P} \sum_{k \in N} R_{ik} \, (s_i(w) - s_k(w))$
Ranking at Test-Time
$R(w) = \operatorname{argmax}_R S(X,R;w)$
[Figure: eight samples placed at ranks 1 to 8]
Sort samples according to individual scores $s_i(w)$
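Why sorting solves the maximization: each term $R_{ik}(s_i - s_k)$ is largest when $R_{ik}$ has the sign of $s_i - s_k$, which is what sorting by score gives. A small sketch that checks this by brute force; the scores, index sets and function name are hypothetical.

```python
from itertools import permutations


def joint_score(order, scores, positives, negatives):
    """S(X,R;w): sum over i in P, k in N of R_ik * (s_i - s_k).

    `order` lists sample indices from best rank to worst rank.
    """
    rank = {sample: r for r, sample in enumerate(order)}
    total = 0.0
    for i in positives:
        for k in negatives:
            r_ik = 1 if rank[i] < rank[k] else -1
            total += r_ik * (scores[i] - scores[k])
    return total


# Hypothetical scores s(w) for five samples.
scores = [0.9, -0.3, 0.4, 1.2, 0.1]
positives = [0, 2]      # index set P
negatives = [1, 3, 4]   # index set N

# Test-time prediction: sort samples by decreasing score.
sorted_order = sorted(range(len(scores)), key=lambda j: -scores[j])

# Brute-force check over all 5! rankings that sorting attains the maximum of S.
best_value = max(joint_score(p, scores, positives, negatives)
                 for p in permutations(range(len(scores))))
sorted_value = joint_score(sorted_order, scores, positives, negatives)
```

The brute-force loop is only there to confirm the claim on a tiny input; at test time the sort alone is enough.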
Learning Formulation
Loss Function
$\Delta(R^*,R(w)) = 1 - \text{AP of rank } R(w)$
Non-convex
Parameter cannot be regularized
Learning Formulation
Upper Bound of Loss Function
$\Delta(R^*,R(w)) = S(X,R(w);w) + \Delta(R^*,R(w)) - S(X,R(w);w)$
$\le S(X,R(w);w) + \Delta(R^*,R(w)) - S(X,R^*;w)$, since $R(w)$ maximizes $S(X,R;w)$
$\le \max_R \{ S(X,R;w) + \Delta(R^*,R) \} - S(X,R^*;w)$
Convex
Parameter can be regularized
$\min_w \|w\|^2 + C\xi$
subject to $S(X,R;w) + \Delta(R^*,R) - S(X,R^*;w) \le \xi$, for all $R$
Optimization for Learning
Cutting Plane Computation
$\max_R \; S(X,R;w) + \Delta(R^*,R)$
[Figure: eight samples placed at ranks 1 to 8]
Sort positive samples according to scores si(w)
Sort negative samples according to scores sk(w)
Find best rank of each negative sample independently
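What the cutting plane step computes can be shown on a tiny input. The sketch below does not implement the efficient sort-based procedure listed on the slide; it swaps in an exhaustive search over all rankings, which is only feasible for a handful of samples. Scores, index sets and function names are hypothetical.

```python
from itertools import permutations


def joint_score(order, scores, positives, negatives):
    """S(X,R;w) for a ranking given as sample indices, best rank first."""
    rank = {sample: r for r, sample in enumerate(order)}
    return sum((1 if rank[i] < rank[k] else -1) * (scores[i] - scores[k])
               for i in positives for k in negatives)


def ap_loss(order, positives):
    """Delta(R*, R) = 1 - AP of the ranking `order`."""
    positive_set = set(positives)
    hits, precision_sum = 0, 0.0
    for position, sample in enumerate(order, start=1):
        if sample in positive_set:
            hits += 1
            precision_sum += hits / position
    return 1.0 - precision_sum / len(positives)


def most_violated_ranking(scores, positives, negatives):
    """Exhaustive argmax over R of S(X,R;w) + Delta(R*,R)."""
    samples = list(positives) + list(negatives)
    return max(permutations(samples),
               key=lambda order: joint_score(order, scores, positives, negatives)
               + ap_loss(order, positives))


scores = [0.2, 0.1, 0.15, 0.05]          # hypothetical s(w) values
positives, negatives = [0, 1], [2, 3]

ground_truth = tuple(positives + negatives)   # R*: all positives above all negatives
violator = most_violated_ranking(scores, positives, negatives)
slack = (joint_score(violator, scores, positives, negatives)
         + ap_loss(violator, positives)
         - joint_score(ground_truth, scores, positives, negatives))
```

The resulting `slack` is the smallest ξ that satisfies every constraint for this `w`, and it is never smaller than the AP loss of the ranking obtained by sorting the scores.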
Optimization for Learning
[Figure: training time and cutting plane computation. Labels on the slide, in order: AP, 5x slower, 0-1, Slightly faster, AP]
Mohapatra, Jawahar and Kumar, NIPS 2014
Experiments
Images: PASCAL VOC 2011
Classes: Jumping, Phoning, Playing Instrument, Reading, Riding Bike, Riding Horse, Running, Taking Photo, Using Computer, Walking
10 ranking tasks
Poselets Features
Cross-validation
AP-SVM vs. SVM
PASCAL VOC ‘test’ Dataset
[Figure: difference in AP]
Better in 8 classes, tied in 2 classes
AP-SVM vs. SVM
Folds of PASCAL VOC ‘trainval’ Dataset
[Figure: difference in AP]
AP-SVM is statistically better in 3 classes
SVM is statistically better in 0 classes
Outline
• Optimizing Average Precision
• High-Order Information (HOAP-SVM)
• Missing Information
Dokania, Behl, Jawahar and Kumar, ECCV 2014
High-Order Information
• People perform similar actions
• People strike similar poses
• Objects are of same/similar sizes
• “Friends” have similar habits
• How can we use them for classification? For ranking?
Problem Formulation
Input $x = \{x_1, x_2, x_3\}$
Output $y \in \{-1,+1\}^3$
$\Psi(x,y) = \begin{bmatrix} \Psi_1(x,y) \\ \Psi_2(x,y) \end{bmatrix}$
$\Psi_1(x,y)$: Unary Features
$\Psi_2(x,y)$: Pairwise Features
Learning Formulation
Input $x = \{x_1, x_2, x_3\}$
Output $y \in \{-1,+1\}^3$
$\Delta(y^*,y)$ = Fraction of incorrectly classified persons
Optimization for Learning
Input $x = \{x_1, x_2, x_3\}$
Output $y \in \{-1,+1\}^3$
$\max_y \; w^\top \Psi(x,y) + \Delta(y^*,y)$
Graph Cuts (if supermodular)
LP Relaxation, or exhaustive search
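For three persons the exhaustive-search option is easy to write down. A minimal sketch, assuming a stand-in for $w^\top \Psi(x,y)$ made of per-person unary scores plus a reward whenever two connected persons share a label. That form, the numbers and the function names are assumptions for illustration, not the features used in the talk.

```python
from itertools import product


def joint_score(y, unary, pairwise):
    """Stand-in for w^T Psi(x,y): unary terms plus rewards for agreeing labels."""
    score = sum(y_i * u_i for y_i, u_i in zip(y, unary))
    for (i, j), weight in pairwise.items():
        if y[i] == y[j]:
            score += weight
    return score


def hamming_loss(y_true, y):
    """Delta(y*, y): fraction of incorrectly classified persons."""
    return sum(a != b for a, b in zip(y_true, y)) / len(y_true)


def loss_augmented_inference(y_true, unary, pairwise):
    """Exhaustive max over all y in {-1,+1}^n of score(y) + Delta(y*, y)."""
    return max(product((-1, +1), repeat=len(unary)),
               key=lambda y: joint_score(y, unary, pairwise)
               + hamming_loss(y_true, y))


unary = [0.8, -0.1, 0.3]                  # hypothetical unary scores for x1, x2, x3
pairwise = {(0, 1): 0.5, (1, 2): 0.5}     # hypothetical agreement weights
y_true = (+1, +1, -1)

# Plain inference (no loss term) and loss-augmented inference.
prediction = max(product((-1, +1), repeat=len(unary)),
                 key=lambda y: joint_score(y, unary, pairwise))
most_violating = loss_augmented_inference(y_true, unary, pairwise)
```

In this toy setting the second person has a negative unary score but is labeled positive in the joint maximum, because agreeing with both neighbours outweighs the unary term.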
Classification
Input $x = \{x_1, x_2, x_3\}$
Output $y \in \{-1,+1\}^3$
$\max_y \; w^\top \Psi(x,y)$
Graph Cuts (if supermodular)
LP Relaxation, or exhaustive search
Ranking?
Input $x = \{x_1, x_2, x_3\}$
Output $y \in \{-1,+1\}^3$
Use difference of max-marginals
Max-Marginal for Positive Class
Input $x = \{x_1, x_2, x_3\}$
Output $y \in \{-1,+1\}^3$
Best possible score when person $i$ is positive
$\mathrm{mm}^{+}(i;w) = \max_{y,\, y_i = +1} w^\top \Psi(x,y)$
Convex in w
Max-Marginal for Negative Class
Input $x = \{x_1, x_2, x_3\}$
Output $y \in \{-1,+1\}^3$
Best possible score when person $i$ is negative
$\mathrm{mm}^{-}(i;w) = \max_{y,\, y_i = -1} w^\top \Psi(x,y)$
Convex in w
Ranking
Input $x = \{x_1, x_2, x_3\}$
Output $y \in \{-1,+1\}^3$
Use difference of max-marginals
HOB-SVM
$s_i(w) = \mathrm{mm}^{+}(i;w) - \mathrm{mm}^{-}(i;w)$
Difference-of-Convex in w
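The max-marginal scores can be sketched by exhaustive search over the eight labelings of three persons. As an assumption for illustration, $w^\top \Psi(x,y)$ is replaced by unary scores plus a reward for agreeing labels; the numbers and function names are hypothetical.

```python
from itertools import product


def joint_score(y, unary, pairwise):
    """Stand-in for w^T Psi(x,y): unary terms plus rewards for agreeing labels."""
    score = sum(y_i * u_i for y_i, u_i in zip(y, unary))
    for (i, j), weight in pairwise.items():
        if y[i] == y[j]:
            score += weight
    return score


def max_marginal(i, label, unary, pairwise):
    """Best joint score over all labelings y with y_i fixed to `label`."""
    return max(joint_score(y, unary, pairwise)
               for y in product((-1, +1), repeat=len(unary))
               if y[i] == label)


def hob_scores(unary, pairwise):
    """s_i = mm+(i) - mm-(i) for every person i."""
    return [max_marginal(i, +1, unary, pairwise)
            - max_marginal(i, -1, unary, pairwise)
            for i in range(len(unary))]


unary = [0.8, -0.1, 0.3]                  # hypothetical unary scores
pairwise = {(0, 1): 0.5, (1, 2): 0.5}     # hypothetical agreement weights

scores = hob_scores(unary, pairwise)
# Rank persons by decreasing difference of max-marginals.
ranking = sorted(range(len(scores)), key=lambda i: -scores[i])
```

Each score depends on all three persons through the pairwise terms, which is how the high-order information enters the ranking. A positive score means the best joint labeling assigns that person +1.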
Ranking
Why not optimize AP directly?
High-Order AP-SVM
HOAP-SVM
$s_i(w) = \mathrm{mm}^{+}(i;w) - \mathrm{mm}^{-}(i;w)$
Problem Formulation
Single Input X
$\Phi(x_i)$ for all $i \in P$
$\Phi(x_k)$ for all $k \in N$
Problem Formulation
Single Output R
$R_{ik} = +1$ if $i$ is better ranked than $k$
$R_{ik} = -1$ if $k$ is better ranked than $i$
Problem Formulation
Scoring Function
$s_i(w) = \mathrm{mm}^{+}(i;w) - \mathrm{mm}^{-}(i;w)$ for all $i \in P$
$s_k(w) = \mathrm{mm}^{+}(k;w) - \mathrm{mm}^{-}(k;w)$ for all $k \in N$
$S(X,R;w) = \sum_{i \in P} \sum_{k \in N} R_{ik} \, (s_i(w) - s_k(w))$
Ranking at Test-Time
$R(w) = \operatorname{argmax}_R S(X,R;w)$
[Figure: eight samples placed at ranks 1 to 8]
Sort samples according to individual scores $s_i(w)$
Learning Formulation
Loss Function
$\Delta(R^*,R(w)) = 1 - \text{AP of rank } R(w)$
Learning Formulation
Upper Bound of Loss Function
$\min_w \|w\|^2 + C\xi$
subject to $S(X,R;w) + \Delta(R^*,R) - S(X,R^*;w) \le \xi$, for all $R$
Optimization for Learning
Difference-of-convex program
Very efficient CCCP
Linearization step by Dynamic Graph Cuts
Kohli and Torr, ECCV 2006
Update step equivalent to AP-SVM
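The two alternating CCCP steps can be seen on a toy problem. This sketch is not the HOAP-SVM objective: it assumes a one-dimensional difference-of-convex function $f(w) = u(w) - v(w)$ with $u(w) = w^4$ and $v(w) = 2w^2$, chosen only because the convex update has a closed form. All names are hypothetical.

```python
import math


def objective(w):
    """Toy difference-of-convex function f(w) = w^4 - 2 w^2."""
    return w ** 4 - 2.0 * w ** 2


def cccp(w0, iterations=50):
    """Run CCCP on the toy objective and return all iterates."""
    iterates = [w0]
    w = w0
    for _ in range(iterations):
        # Linearization step: slope of v(w) = 2 w^2 at the current iterate.
        slope = 4.0 * w
        # Update step: minimize the convex function w^4 - slope * w.
        # Setting its derivative 4 w^3 - slope to zero gives a cube root.
        w = math.copysign(abs(slope / 4.0) ** (1.0 / 3.0), slope)
        iterates.append(w)
    return iterates


iterates = cccp(w0=0.1)
values = [objective(w) for w in iterates]
```

Starting from 0.1 the iterates approach the local minimum at w = 1 and the objective never increases. In the learning problem above, the slide assigns the linearization step to dynamic graph cuts and the update step to an AP-SVM style problem.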
Experiments
Images: PASCAL VOC 2011
Classes: Jumping, Phoning, Playing Instrument, Reading, Riding Bike, Riding Horse, Running, Taking Photo, Using Computer, Walking
10 ranking tasks
Poselets Features
Cross-validation
HOB-SVM vs. AP-SVM
PASCAL VOC ‘test’ Dataset
[Figure: difference in AP]
Better in 4, worse in 3 and tied in 3 classes
HOB-SVM vs. AP-SVM
Folds of PASCAL VOC ‘trainval’ Dataset
[Figure: difference in AP]
HOB-SVM is statistically better in 0 classes
AP-SVM is statistically better in 0 classes
HOAP-SVM vs. AP-SVM
PASCAL VOC ‘test’ Dataset
[Figure: difference in AP]
Better in 7, worse in 2 and tied in 1 class
HOAP-SVM vs. AP-SVM
Folds of PASCAL VOC ‘trainval’ Dataset
[Figure: difference in AP]
HOAP-SVM is statistically better in 4 classes
AP-SVM is statistically better in 0 classes
Outline
• Optimizing Average Precision
• High-Order Information
• Missing Information (Latent-AP-SVM)
Behl, Jawahar and Kumar, CVPR 2014
Fully Supervised Learning
Weakly Supervised Learning
Rank images by relevance to ‘jumping’
Two Approaches
• Use Latent Structured SVM with AP loss
– Unintuitive Prediction
– Loose Upper Bound on Loss
– NP-hard Optimization for Cutting Planes
• Carefully design a Latent-AP-SVM
– Intuitive Prediction
– Tight Upper Bound on Loss
– Optimal Efficient Cutting Plane Computation
Results
Questions?
Code + Data Available