Large-Scale Object Recognition with Weak Supervision
Weiqiang Ren, Chong Wang, Yanhua Cheng, Kaiqi Huang, Tieniu Tan
{wqren,cwang,yhcheng,kqhuang,tnt}@nlpr.ia.ac.cn
Task 2: Classification + Localization
Task 2b: Classification + localization with additional training data, ordered by classification error
1. Only classification labels are used.
2. The full image is used as the object location.
Outline
• Motivation
• Method
• Results
Motivation
Why Weakly Supervised Localization (WSL)?
Knowing where to look makes recognizing objects easier!
However, in the classification-only task, no annotations of object location are available.
Weakly Supervised Localization
Current WSL Results on VOC07
[Figure: bar chart of VOC 2007 results for the WSL methods listed below, with fully supervised DPM 5.0 (33.7) as a reference]
13.9: Weakly supervised object detector learning with model drift detection, ICCV 2011
15.0: Object-centric spatial pooling for image classification, ECCV 2012
22.4: Multi-fold MIL training for weakly supervised object localization, CVPR 2014
22.7: On learning to localize objects with minimal supervision, ICML 2014
26.2: Discovering Visual Objects in Large-scale Image Datasets with Weak Supervision, submitted to TPAMI
26.4: Weakly supervised object detection with posterior regularization, BMVC 2014
31.6: Weakly supervised object localization with latent category learning, ECCV 2014 (Sep 11, Poster Session 4A, #34)
Our Work
1. Weakly Supervised Object Localization with Latent Category Learning, ECCV 2014. VOC 2007 result: 31.6 (DPM 5.0: 33.7)
2. Discovering Visual Objects in Large-scale Image Datasets with Weak Supervision, submitted to TPAMI. VOC 2007 result: 26.2 (DPM 5.0: 33.7)
For efficiency on large-scale tasks, we use the second method.
Method
Framework
[Figure: framework diagram with input images, conv layers, FC layers, detection prediction, rescoring and classification prediction; components numbered 1 to 4]
1st: CNN Architecture
Chatfield et al. Return of the Devil in the Details: Delving Deep into Convolutional Nets
2nd: MILinear SVM
MILinear: Region Proposal
Good region proposal algorithms have:
• High recall
• High overlap
• A small number of proposals
• Low computation cost
We use MCG pretrained on VOC 2012 (additional data):
• Training: 128 windows/image
• Testing: 256 windows/image
• Compared with Selective Search (~2000 windows/image)
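Recall and overlap, the first two criteria above, are usually measured against ground-truth boxes. The following is a minimal sketch, assuming boxes given as [x1, y1, x2, y2] and the common IoU ≥ 0.5 match threshold; `iou` and `proposal_recall` are hypothetical helpers, not part of MCG or Selective Search, and `top_k` plays the role of the 128/256-window budgets.

```python
import numpy as np

def iou(box, boxes):
    """IoU between one box and each row of `boxes`; all boxes are [x1, y1, x2, y2]."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)

def proposal_recall(gt_boxes, proposals, top_k, thresh=0.5):
    """Fraction of ground-truth boxes matched (IoU >= thresh) by the first
    `top_k` proposals; assumes proposals are already sorted best-first."""
    kept = proposals[:top_k]
    hits = [iou(gt, kept).max() >= thresh for gt in gt_boxes]
    return float(np.mean(hits))

gt = np.array([[10, 10, 50, 50], [60, 60, 100, 100]], dtype=float)
props = np.array([[12, 12, 48, 52], [0, 0, 20, 20], [55, 65, 100, 100]], dtype=float)
print(proposal_recall(gt, props, top_k=1))  # 0.5
print(proposal_recall(gt, props, top_k=3))  # 1.0
```

A larger window budget can only raise recall, which is the trade-off behind keeping 128 to 256 MCG windows instead of ~2000 Selective Search windows.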
MILinear: Feature Representations
• Low-level features
  – SIFT, LBP, HOG
  – Shape context, Gabor, …
• Mid-level features
  – Bag of Visual Words (BoVW)
• Deep hierarchical features
  – Convolutional Networks
  – Deep Auto-Encoders
  – Deep Belief Nets
MILinear: Positive Window Mining
• Clustering
  – KMeans
• Topic models
  – pLSA, LDA, gLDA
• CRF
• Multiple Instance Learning
  – DD, EMDD, APR
  – MI-NN, MI-SVM, mi-SVM
  – MILBoost
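Among the MIL options listed, MI-SVM treats each positive image as a bag of windows and keeps only its highest-scoring window as the positive example. A minimal sketch of that selection step, assuming a linear scorer with the bias folded into the features as a constant column; `mine_positive_windows` is a hypothetical name, and this is not presented as the authors' code.

```python
import numpy as np

def mine_positive_windows(pos_bags, w):
    """Pick one 'witness' window per positive image (MI-SVM style).

    pos_bags: list of (n_windows, dim) arrays, the proposal features of each
              image labelled as containing the class.
    w:        current linear detector (bias folded in as a constant feature).
    Returns an (n_images, dim) array holding the top-scoring window of each image.
    """
    return np.stack([bag[np.argmax(bag @ w)] for bag in pos_bags])

bags = [np.array([[1.0, 0.0], [0.0, 1.0]]),
        np.array([[2.0, 2.0], [0.0, 3.0]])]
w = np.array([0.0, 1.0])
print(mine_positive_windows(bags, w))  # rows [0, 1] and [0, 3]
```

In a full MI-SVM loop, this selection alternates with retraining the detector on the selected windows plus all windows from negative images.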
MILinear: Objective Function and Optimization
• Multiple-instance linear SVM
• Optimization: trust region Newton
  – A Newton-type method
  – Works in the primal
  – Faster convergence
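The slide does not spell out the objective, so what follows is a hedged sketch under stated assumptions: an L2-regularized squared-hinge (L2-loss) linear SVM, with positives chosen MI-SVM style and every window from a negative image used as a negative. It optimizes in the primal with plain Newton steps and a backtracking line search, a simpler substitute for the trust region Newton method named above. All function names and the mean-window initialization are hypothetical.

```python
import numpy as np

def l2svm_objective(w, X, y, C):
    """0.5*||w||^2 + C * sum(max(0, 1 - y_i w.x_i)^2), plus the margin-violating set."""
    margins = 1.0 - y * (X @ w)
    active = margins > 0
    return 0.5 * w @ w + C * np.sum(margins[active] ** 2), active

def train_l2svm_newton(X, y, C=1.0, max_iter=50, tol=1e-8):
    """Primal Newton iterations on the L2-loss SVM (labels y in {-1, +1})."""
    w = np.zeros(X.shape[1])
    for _ in range(max_iter):
        f, active = l2svm_objective(w, X, y, C)
        Xa, ya = X[active], y[active]
        grad = w + 2.0 * C * Xa.T @ (Xa @ w - ya)
        if np.linalg.norm(grad) < tol:
            break
        # Generalized Hessian: only margin-violating points contribute.
        H = np.eye(X.shape[1]) + 2.0 * C * Xa.T @ Xa
        step = np.linalg.solve(H, -grad)
        t = 1.0
        # Backtracking (Armijo) line search keeps every update a descent step.
        while t > 1e-10 and l2svm_objective(w + t * step, X, y, C)[0] > f + 1e-4 * t * (grad @ step):
            t *= 0.5
        w = w + t * step
    return w

def train_milinear(pos_bags, neg_windows, C=1.0, rounds=5):
    """Alternate witness selection and SVM training (MI-SVM style).

    pos_bags:    list of (n_windows, dim) arrays, one per positive image.
    neg_windows: (m, dim) windows from images without the class; all are negatives.
    """
    # Hypothetical initialization: represent each positive image by its mean window.
    pos = np.stack([bag.mean(axis=0) for bag in pos_bags])
    y = np.concatenate([np.ones(len(pos_bags)), -np.ones(len(neg_windows))])
    for _ in range(rounds):
        w = train_l2svm_newton(np.vstack([pos, neg_windows]), y, C)
        # Re-select the highest-scoring window in every positive image.
        pos = np.stack([bag[np.argmax(bag @ w)] for bag in pos_bags])
    return w
```

The squared hinge is used here because it is differentiable, which keeps Newton-type primal solvers simple; the Hessian only involves windows that currently violate the margin.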
MILinear: Optimization Efficiency
3rd: Detection Rescoring
• Rescoring with softmax
[Figure: 1000-dim detection scores → max → 1000-dim vector → trained softmax → 1000 classes]
• Softmax: considers all the categories simultaneously in each minibatch of the optimization
  – Suppresses the responses of other, similar-looking object categories
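The rescoring diagram (max, then softmax, over 1000-dim vectors) can be read as: take the maximum detector score over windows for each class, then feed that vector to a trained softmax. A minimal sketch, assuming the softmax is a plain multinomial logistic regression trained with minibatch gradient descent; the function names, learning rate, and weight decay are hypothetical, and a small K stands in for 1000.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract the row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def image_scores(window_scores):
    """(n_windows, K) detector scores -> (K,) image-level scores via max over windows."""
    return window_scores.max(axis=0)

def train_softmax_rescorer(S, labels, lr=0.3, epochs=1000, batch=32, l2=1e-4, seed=0):
    """Multinomial logistic regression on max-pooled detection scores.

    S: (n_images, K) image-level scores; labels: (n_images,) class ids in [0, K).
    The softmax normalizer couples all K classes in every minibatch update."""
    rng = np.random.default_rng(seed)
    n, k = S.shape
    W, b = np.zeros((k, k)), np.zeros(k)
    onehot = np.eye(k)[labels]
    for _ in range(epochs):
        order = rng.permutation(n)
        for start in range(0, n, batch):
            idx = order[start:start + batch]
            # Gradient of the mean cross-entropy with respect to the logits.
            G = (softmax(S[idx] @ W + b) - onehot[idx]) / len(idx)
            W -= lr * (S[idx].T @ G + l2 * W)
            b -= lr * G.sum(axis=0)
    return W, b

def rescore(window_scores, W, b):
    """Class probabilities for one image after softmax rescoring."""
    return softmax(image_scores(window_scores)[None, :] @ W + b)[0]
```

Because the softmax normalizes over all classes, raising one class's probability lowers the others', so a look-alike class whose detector outscores the true class before rescoring can be demoted after it.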
4th: Classification Rescoring
• Linear combination of the 1000-dim classification scores and the 1000-dim WSL scores:
$S = \lambda S_{cls} + (1 - \lambda) S_{WSL}$
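A minimal sketch of the linear combination together with the top-5 error used in the results, where the two score vectors are mixed as S = λ·S_cls + (1 − λ)·S_WSL (the weight name λ is an assumption). The slides do not say how the weight was chosen; the grid search over held-out images is an assumption too, and `combine`, `pick_lambda`, and `top5_error` are hypothetical helpers.

```python
import numpy as np

def top5_error(scores, labels):
    """Fraction of images whose true label is not among the 5 highest scores."""
    top5 = np.argsort(-scores, axis=1)[:, :5]
    return float(np.mean(~(top5 == labels[:, None]).any(axis=1)))

def combine(s_cls, s_wsl, lam):
    """S = lam * S_cls + (1 - lam) * S_WSL on (n_images, n_classes) arrays."""
    return lam * s_cls + (1.0 - lam) * s_wsl

def pick_lambda(s_cls, s_wsl, labels, grid=None):
    """Grid-search lam on held-out images (hypothetical selection procedure)."""
    grid = np.linspace(0.0, 1.0, 11) if grid is None else grid
    errors = [top5_error(combine(s_cls, s_wsl, lam), labels) for lam in grid]
    best = int(np.argmin(errors))
    return float(grid[best]), errors[best]

# Toy case with 6 classes: each scorer alone misses one of the two images.
s_cls = np.array([[0.0, 1, 2, 3, 4, 5], [0.0, 5, 0, 0, 0, 0]])
s_wsl = np.array([[5.0, 0, 0, 0, 0, 0], [5.0, 0, 1, 2, 3, 4]])
labels = np.array([0, 1])
print(top5_error(s_cls, labels), top5_error(s_wsl, labels))  # 0.5 0.5
print(pick_lambda(s_cls, s_wsl, labels)[1])                  # 0.0
```

The toy case mirrors the claim that the two sources are complementary: each alone misses an image that the other ranks correctly, and a mixture recovers both.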
One funny thing: we tried some other score-combination strategies, but they did not seem to work!
Results
1st: Classification without WSL
Method                          Top-5 error (%)
Baseline, one CNN               13.7
Average of four CNNs            12.5
2nd: MILinear on ImageNet 2014
Method                          Detection error (%)
Baseline (full image)           61.96
MILinear                        40.96
Winner                          25.3
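The detection error above follows the ILSVRC classification-with-localization criterion. A minimal sketch, assuming the commonly described rule (up to five (label, box) guesses per image; the image counts as correct if one guess has a ground-truth label and IoU ≥ 0.5 with a box of that label), and showing how a full-image baseline is scored. The official evaluation kit may differ in details, and all names here are hypothetical.

```python
import numpy as np

def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union

def image_correct(guesses, gt, thresh=0.5):
    """guesses: up to five (label, box) predictions; gt: list of (label, box) objects."""
    return any(gl == tl and iou(gb, tb) >= thresh
               for gl, gb in guesses[:5] for tl, tb in gt)

def localization_error(all_guesses, all_gt):
    """Fraction of images with no correct (label, box) guess."""
    return 1.0 - float(np.mean([image_correct(g, t) for g, t in zip(all_guesses, all_gt)]))

def full_image_guesses(top5_labels, width, height):
    """Full-image baseline: the whole image is the box for every predicted label."""
    return [(c, (0, 0, width, height)) for c in top5_labels]

guesses = full_image_guesses([3, 7, 1, 0, 2], 100, 100)
print(image_correct(guesses, [(3, (0, 0, 80, 80))]))    # True: IoU 0.64
print(image_correct(guesses, [(3, (10, 10, 40, 40))]))  # False: IoU 0.09
```

The full-image box only passes when the object fills most of the frame, which is why localizing with MILinear windows cuts the error so sharply compared with the baseline.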
2nd: MILinear on VOC 2007
2nd: MILinear on ILSVRC 2013 detection
mAP: 9.63% vs. 8.99% (DPM 5.0)!
2nd: MILinear for Classification
Method                          Top-5 error (%)
MILinear                        17.1
3rd: WSL Rescoring (Softmax)
Method                          Top-5 error (%)
Baseline, one CNN               13.7
Average of four CNNs            12.5
MILinear                        17.1
MILinear + rescoring            13.5
Softmax-based rescoring successfully suppresses the predictions of other, similar-looking object categories!
4th: Cls and WSL Combination
$S = \lambda S_{cls} + (1 - \lambda) S_{WSL}$
Method                          Top-5 error (%)
Baseline, one CNN               13.7
Average of four CNNs            12.5
MILinear                        17.1
MILinear + rescoring            13.5
Cls (12.5) + MILinear (13.5)    11.5
WSL and Cls can be complementary to each other!
Russakovsky et al. ImageNet Large Scale Visual Recognition Challenge.
Conclusion
• WSL helps classification: combining it with the CNN classifier lowered top-5 error from 12.5% to 11.5%
• WSL has large potential: weakly labeled data is cheap
Thank You!