Slide 1 - ImageNet

TTIC_ECP: Deep Epitomic CNNs and Explicit Scale/Position Search
Deep Epitomic Nets and Scale/Position Search for Image Classification
TTIC_ECP team
George Papandreou (Toyota Technological Institute at Chicago)
Iasonas Kokkinos (Ecole Centrale Paris/INRIA)
TTIC_ECP entry in a nutshell
Goal: Invariance in Deep CNNs
Part 1: Deep epitomic nets: local translation (deformation)
Part 2: Global scaling and translation
(0) Baseline max-pooled net: 13.0%
(1) Epitomic DCNN: 11.9% (~1% gain)
(2) Epitomic DCNN + search: 10.56% (~1.5% gain)
Fusion (1)+(2): 10.22%
Top-5 error. All DCNNs have 6 convolutional and 2 fully-connected layers.
Deep Convolutional Neural Networks (DCNNs)
convolutional
fully connected
Cascade of convolution + max-pooling blocks
(deformation-invariant template matching)
Our work: different blocks (P1) & different architecture (P2)
LeCun et al.: Gradient-Based Learning Applied to Document Recognition, Proc. IEEE 1998
Krizhevsky et al.: ImageNet Classification with Deep CNNs, NIPS 2012
Part 1: Deep epitomic nets
Epitomes: translation-invariant patch models
Patch Templates
Separate modeling: more data & less power per parameter
Epitomes: a lot more for just a bit more
EM-based training
Jojic, Frey, Kannan: Epitomic analysis of appearance and shape, ICCV 2003
Benoit, Mairal, Bach, Ponce: Sparse image representation with epitomes, CVPR 2011
Grosse, Raina, Kwong, Ng: Shift-invariant sparse coding, UAI 2007
Mini-epitomes for image classification
Dictionary of mini-epitomes
Dictionary of patches (K-means)
Gains in (flat) BoW classification
Papandreou, Chen, Yuille: Modeling Image Patches with a Dictionary of Mini-Epitomes, CVPR 2014
From flat to deep: Epitomic convolution
Max-Pooling
Max over image positions
Epitomic Convolution
Max over epitome positions
G. Papandreou: Deep Epitomic Convolutional Neural Networks, arXiv, June 2014.
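The contrast between the two operations can be sketched in NumPy. Max-pooling matches a fixed filter at several image positions and keeps the best response; epitomic convolution matches a fixed image patch against every patch-sized window of a larger epitome. The function names, the 8x8 epitome and the 6x6 patch size are illustrative assumptions, not values from the talk.

```python
import numpy as np

def max_pooled_response(image, filt, pool=2):
    """Match one filter at pool x pool neighbouring image positions; keep the max."""
    h, w = filt.shape
    return max(
        float(np.sum(image[dy:dy + h, dx:dx + w] * filt))
        for dy in range(pool) for dx in range(pool)
    )

def epitomic_response(patch, epitome):
    """Match one image patch against every patch-sized window of the epitome."""
    h, w = patch.shape
    H, W = epitome.shape
    best, best_pos = -np.inf, None
    for dy in range(H - h + 1):
        for dx in range(W - w + 1):
            r = float(np.sum(patch * epitome[dy:dy + h, dx:dx + w]))
            if r > best:
                best, best_pos = r, (dy, dx)
    return best, best_pos

rng = np.random.default_rng(0)
epitome = rng.standard_normal((8, 8))   # hypothetical 8x8 mini-epitome
patch = epitome[2:8, 1:7].copy()        # 6x6 patch cut from the epitome
score, pos = epitomic_response(patch, epitome)
```

In both cases the max absorbs a small translation; the difference is only whether the search runs over image positions or over positions inside the epitome.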
Deep Epitomic Convolutional Nets
Epitomic convolution
Convolution + max-pooling
Supervised dictionary learning by back-propagation
G. Papandreou: Deep Epitomic Convolutional Neural Networks, arXiv, June 2014.
Deep Epitomic Convolutional Nets
Parameter sharing: faster and more reliable model learning
Consistent improvements
(0) Baseline max-pooled net: 13.0%
(1) Epitomic DCNN: 11.9% (~1% gain)
Part 2: Global scaling and translation
Scale Invariance challenge
Category-dependent (ear detector)
Scale-dependent (area)
Training set: Dogs, Skyscrapers
Rule: Large skyscrapers have ears, large dogs don’t
Category-dependent
Scale Invariant classification
MIL: End-to-end training!
Scale-dependent
‘bag’ of features
feature
This work:
A. Howard. Some improvements on deep convolutional neural network based image classification, 2013.
K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition, 2014.
T. Dietterich et al. Solving the multiple-instance problem with axis-parallel rectangles. Artificial Intelligence, 1997.
Step 1: Efficient multi-scale convolutional features
220x220x3
pyramid
stitch
5x5x512
GPU
I(x,y)
Patchwork(x,y)
I(x,y,s)
C(x,y,s)
C(x,y)
unstitch
multi-scale convolutional features
Sermanet, P., Eigen, D., Zhang, X., Mathieu, M., Fergus, R., LeCun, Y.: OverFeat, ICLR 2014
Dubout, C., Fleuret, F.: Exact acceleration of linear object detectors, ECCV 2012
Iandola, F., Moskewicz, M., Karayev, S., Girshick, R., Darrell, T., Keutzer, K.: DenseNet, arXiv 2014
Step 2: From fully connected to fully convolutional
220x220x3
pyramid
stitch
GPU
I(x,y)
I(x,y,s)
convolutional
1x1x4096
Patchwork(x,y)
convolutional
F(x,y)
fully connected
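Step 2 rests on the fact that a fully connected layer applied to a 5x5x512 feature block is the same computation as a convolution with a 5x5x512 kernel per output unit. A minimal NumPy sketch of that equivalence follows; the helper name `fc_as_conv` is hypothetical, and the output width is set to 16 instead of the 4096 on the slide only to keep the demo small.

```python
import numpy as np

def fc_as_conv(features, weights, bias):
    """Slide a fully connected layer over a feature map ('valid' convolution).

    features: (H, W, C) feature map; weights: (k, k, C, D); bias: (D,).
    Returns an (H-k+1, W-k+1, D) map; each cell is the FC output for one k x k window.
    """
    k = weights.shape[0]
    H, W, _ = features.shape
    D = weights.shape[-1]
    out = np.empty((H - k + 1, W - k + 1, D))
    for y in range(H - k + 1):
        for x in range(W - k + 1):
            window = features[y:y + k, x:x + k, :]
            out[y, x] = np.tensordot(window, weights, axes=3) + bias
    return out

rng = np.random.default_rng(0)
k, C, D = 5, 512, 16                    # D is 4096 on the slide; 16 is an assumption
weights = rng.standard_normal((k, k, C, D))
bias = rng.standard_normal(D)

# On a 5x5x512 input the convolution collapses to the original FC layer (1x1xD).
block = rng.standard_normal((5, 5, C))
fc_out = block.reshape(-1) @ weights.reshape(-1, D) + bias
conv_out = fc_as_conv(block, weights, bias)

# On a larger feature map the same weights give one output vector per position.
big = rng.standard_normal((8, 9, C))
score_map = fc_as_conv(big, weights, bias)
```

Applied to the patchwork features, this yields a dense map of responses instead of a single vector, which is what makes the later search over position and scale possible.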
Step 3: Global max-pooling
pyramid
stitch
GPU
I(x,y)
Patchwork(x,y)
I(x,y,s)
learned class-specific bias
Consistent, explicit position and scale search during training and testing
For free: argmax yields 48% localization error
(0) Baseline max-pooled net: 13.0%
(1) Epitomic DCNN: 11.9% (~1% gain)
(2) Epitomic DCNN + search: 10.56% (~1.5% gain)
Fusion (1)+(2): 10.22%
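The global max-pooling step on this slide can be sketched as follows: for each class, take the maximum score over all positions and scales, add a class-specific bias, and keep the argmax as a rough localization. The data layout (one score map per scale, classes on the last axis) and the point at which the bias is added are assumptions made for this sketch.

```python
import numpy as np

def global_max_pool(score_maps, class_bias):
    """score_maps: list of (H_s, W_s, K) arrays, one per scale; class_bias: (K,).

    Returns per-class scores (K,) and, per class, the (scale, y, x) of the maximum.
    """
    K = class_bias.shape[0]
    best = np.full(K, -np.inf)
    where = [None] * K
    for s, m in enumerate(score_maps):
        flat = m.reshape(-1, K)              # rows index positions as y * W_s + x
        idx = flat.argmax(axis=0)
        vals = flat[idx, np.arange(K)]
        for c in range(K):
            if vals[c] > best[c]:
                best[c] = vals[c]
                y, x = divmod(int(idx[c]), m.shape[1])
                where[c] = (s, y, x)
    return best + class_bias, where

# Toy example: two scales, two classes.
maps = [np.zeros((3, 3, 2)), np.zeros((2, 4, 2))]
maps[0][1, 2, 0] = 5.0                       # class 0 fires at scale 0, (y=1, x=2)
maps[1][1, 3, 1] = 7.0                       # class 1 fires at scale 1, (y=1, x=3)
scores, where = global_max_pool(maps, np.array([0.5, -1.0]))
predicted = int(scores.argmax())
```

Because the max is differentiable almost everywhere, gradients flow only through the winning position and scale, which is what allows the same search to be used during training and testing.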
Deep Epitomic Nets and Scale/Position Search for Image Classification
Goal: Invariance in Deep CNNs
(0) Baseline max-pooled net: 13.0%
(1) Epitomic DCNN: 11.9% (~1% gain)
(2) Search: 10.56% (~1.5% gain)
Fusion (1)+(2): 10.22%
DCNN: 6 Convolutional + 2 Fully Connected layers
The Deeper the Better: stay tuned!
Epitomic implementation details
• Architecture of our deep epitomic net (11.94%)
• Training took 3 weeks on a single Titan (60 epochs)
• Standard choices for learning rate, momentum, etc.
Pyramidal search implementation details
• Image is warped to a square. Its position in the mosaic is fixed
• Scales: 400, 300, 220, 160, 120, 90 pixels
• Mosaic: 720 pixels
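A minimal sketch of how the six square scales could be stitched into one 720-pixel mosaic. The scales and mosaic size come from the slide; the shelf-style layout, the nearest-neighbour warp and all function names are assumptions, since the slide only states that positions are fixed, not where they are.

```python
import numpy as np

SCALES = [400, 300, 220, 160, 120, 90]   # from the slide
MOSAIC = 720                              # from the slide

def shelf_layout(sizes, mosaic):
    """Place squares left to right in rows ('shelves'); returns {size: (y, x)}.

    Hypothetical layout: the slide does not give the actual positions.
    """
    layout, x, y, shelf_h = {}, 0, 0, 0
    for s in sorted(sizes, reverse=True):
        if x + s > mosaic:                # start a new shelf below the current one
            x, y, shelf_h = 0, y + shelf_h, 0
        if y + s > mosaic:
            raise ValueError("squares do not fit in the mosaic")
        layout[s] = (y, x)
        x += s
        shelf_h = max(shelf_h, s)
    return layout

def resize_nearest(img, size):
    """Warp an image to size x size with nearest-neighbour sampling."""
    ys = np.arange(size) * img.shape[0] // size
    xs = np.arange(size) * img.shape[1] // size
    return img[ys][:, xs]

def stitch(img, layout, mosaic):
    """Paste every rescaled copy of img at its fixed position in the mosaic."""
    canvas = np.zeros((mosaic, mosaic) + img.shape[2:], dtype=img.dtype)
    for s, (y, x) in layout.items():
        canvas[y:y + s, x:x + s] = resize_nearest(img, s)
    return canvas

layout = shelf_layout(SCALES, MOSAIC)
img = np.random.default_rng(0).random((50, 80, 3))
canvas = stitch(img, layout, MOSAIC)
```

With fixed positions, the network processes the whole mosaic in a single pass, and the responses for each scale can be read back ("unstitched") from known regions of the output.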