DenseNet: Replacing HOG with Deep Convnet Pyramids for Object Detection
Forrest Iandola, Sergey Karayev, Ross Girshick, Matt Moskewicz, Yangqing Jia, Kurt Keutzer, and Trevor Darrell
University of California, Berkeley
Overview
Object Detection
• Selective Search + ConvNets
• Multiscale Pyramid Descriptors
• DenseNet: ConvNet Pyramids for improved efficiency
• DenseNet code is available: give it a try in your pipeline
Forrest Iandola [email protected]
Deep Convolutional Neural Networks
• 1989: high-quality digit recognition (Bell Labs, LeCun)
• 2012: best ImageNet classification (Toronto)
• 2013: best PASCAL detection (Berkeley)
• 2014: efficient detection + replace HOG with ConvNets
Regions with CNN Features
Ross Girshick, Jeff Donahue, Trevor Darrell, Jitendra Malik. Rich feature hierarchies for accurate object detection and semantic segmentation. arXiv 2013.
• "Selective Search" region proposals (Uijlings et al., IJCV 2013)
• Caffe: efficient ConvNet GPU implementation from Berkeley, http://caffe.berkeleyvision.org
• Linear classifier on the CNN features
• Result: > 50% mAP on PASCAL VOC 2007 detection
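The R-CNN test-time recipe above can be sketched as follows: take region proposals, warp each crop to a fixed size, extract features, and score with a linear classifier. This is a minimal illustration, not the R-CNN code. `mean_intensity` is a hypothetical stand-in for the Caffe ConvNet, and the nearest-neighbor warp is a simplification. R-CNN warps to its network's fixed input size (227x227); the tiny default `warp_size` here only keeps the example small.

```python
from typing import Callable, List, Sequence, Tuple

Image = List[List[float]]          # grayscale image, indexed [row][col]
Box = Tuple[int, int, int, int]    # (x0, y0, x1, y1), x1 and y1 exclusive


def warp(image: Image, box: Box, size: int) -> Image:
    """Crop `box` and resize it to size x size with nearest-neighbor sampling."""
    x0, y0, x1, y1 = box
    w, h = x1 - x0, y1 - y0
    return [[image[y0 + (r * h) // size][x0 + (c * w) // size]
             for c in range(size)]
            for r in range(size)]


def score_proposals(image: Image,
                    proposals: Sequence[Box],
                    extract_features: Callable[[Image], List[float]],
                    weights: List[float],
                    bias: float,
                    warp_size: int = 4) -> List[Tuple[Box, float]]:
    """One feature extraction per proposal, then a linear score; best first."""
    scored = []
    for box in proposals:
        feats = extract_features(warp(image, box, warp_size))  # per-window forward pass
        score = sum(wt * f for wt, f in zip(weights, feats)) + bias
        scored.append((box, score))
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored


def mean_intensity(patch: Image) -> List[float]:
    """Hypothetical placeholder for ConvNet features: one mean-intensity value."""
    n = len(patch) * len(patch[0])
    return [sum(map(sum, patch)) / n]
```

The per-window loop is the point of the sketch. Every proposal triggers its own feature extraction, which is the cost that the next slide quantifies.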
Efficiency Issues with R-CNN
• 2000 windows per image = 100x the input image size (each window gets its own ConvNet forward pass)
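One way to read "2000 windows = 100x the input image size" is to sum the areas of all proposal boxes and divide by the image area. Under that reading, 2000 windows that each cover about 5% of the image add up to 2000 × 0.05 = 100 image-areas. The helper below is a hypothetical sketch of that bookkeeping. It counts overlapping windows separately, because per-window feature extraction recomputes shared pixels.

```python
from typing import Iterable, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1), x1 and y1 exclusive


def window_area_ratio(boxes: Iterable[Box], image_w: int, image_h: int) -> float:
    """Total pixel area of all windows, in units of the input image area.

    Overlaps are deliberately counted once per window: a per-window ConvNet
    processes every window independently, so shared pixels are paid for again.
    """
    total = sum((x1 - x0) * (y1 - y0) for x0, y0, x1, y1 in boxes)
    return total / (image_w * image_h)
```

For example, 2000 copies of a 10x50 box on a 100x100 image give a ratio of 100.0.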
Sliding-Window Detection on HOG Pyramids
pyra = featpyramid(image)
• Can add parts, if desired
• 33% mAP on PASCAL VOC 2007 detection
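A sliding-window detector on a feature pyramid can be sketched in a few lines: build progressively smaller copies of the image, compute features on each, and score a linear filter at every position of every level. This is a minimal illustration, not the DPM `featpyramid` code. The `features` argument is a hypothetical placeholder (identity by default) standing in for HOG, the resize is nearest-neighbor, and each feature cell is assumed to cover one pixel. Under that assumption, windows map back to image pixels by dividing by the level's scale.

```python
from typing import Callable, List, Tuple

Grid = List[List[float]]  # image or feature map, indexed [row][col]


def rescale(grid: Grid, scale: float) -> Grid:
    """Nearest-neighbor resize by `scale` (assumed <= 1, i.e. downsampling)."""
    rows = max(1, int(len(grid) * scale))
    cols = max(1, int(len(grid[0]) * scale))
    return [[grid[int(r / scale)][int(c / scale)] for c in range(cols)]
            for r in range(rows)]


def feat_pyramid(image: Grid, scales_per_octave: int, num_levels: int,
                 features: Callable[[Grid], Grid] = lambda g: g) -> List[Tuple[float, Grid]]:
    """Compute features on each level; level i is resized by 2 ** (-i / scales_per_octave)."""
    pyra = []
    for i in range(num_levels):
        scale = 2 ** (-i / scales_per_octave)
        pyra.append((scale, features(rescale(image, scale))))
    return pyra


def correlate(feat: Grid, filt: Grid) -> Grid:
    """Score a linear filter (a dot product) at every valid window position."""
    fh, fw = len(filt), len(filt[0])
    return [[sum(filt[i][j] * feat[r + i][c + j]
                 for i in range(fh) for j in range(fw))
             for c in range(len(feat[0]) - fw + 1)]
            for r in range(len(feat) - fh + 1)]


def detect(pyra: List[Tuple[float, Grid]], filt: Grid,
           threshold: float) -> List[Tuple[float, float, float, float, float]]:
    """Slide the filter over every level; return (x0, y0, x1, y1, score) in image pixels."""
    fh, fw = len(filt), len(filt[0])
    dets = []
    for scale, feat in pyra:
        for r, row in enumerate(correlate(feat, filt)):
            for c, score in enumerate(row):
                if score > threshold:
                    # One feature cell = one pixel here, so only the resize is undone.
                    dets.append((c / scale, r / scale,
                                 (c + fw) / scale, (r + fh) / scale, score))
    return dets
```

Features are computed once per level and then shared by every window on that level, unlike the per-window extraction in R-CNN.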
Efficiency of HOG Pyramids
• Pyramid = 8x the input image size
• Typical settings: 5 octaves, 10 scales per octave
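The "8x" figure follows from a geometric series. With 10 scales per octave, each level's side length shrinks by 2^(-1/10), so its area shrinks by 2^(-2/10) ≈ 0.87. Summing 5 octaves (50 levels) gives about 7.7 image-areas. The sketch assumes every level is at or below the input resolution; a pyramid that also includes upsampled levels would be larger.

```python
def pyramid_area_ratio(scales_per_octave: int, octaves: int) -> float:
    """Total pyramid area in units of the input image area (downsampled levels only)."""
    total = 0.0
    for i in range(scales_per_octave * octaves):
        side = 2 ** (-i / scales_per_octave)  # linear scale factor of level i
        total += side * side                   # area scales with the square
    return total


print(round(pyramid_area_ratio(10, 5), 2))  # prints 7.72
```

Adding more octaves barely changes the total, because the series converges to 1 / (1 - 2^(-0.2)) ≈ 7.73.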
Sliding-Window Detection on ConvNet Pyramids
• Pyramid = 8x the input image size
• Efficiency of HOG + accuracy of deep learning
• Easy to use:
pyra = convnet_featpyramid(image)
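ConvNet feature maps are much coarser than the input, so a window found on a ConvNet pyramid level must be mapped back to image pixels through both the network stride and the level's resize factor. The helper below is a hypothetical sketch. It assumes a stride of 16 pixels per feature cell, which is the total stride of conv5 in an AlexNet-style network (4 × 2 × 2). It also ignores receptive-field padding offsets, which a real implementation would need to account for.

```python
from typing import Tuple


def feature_window_to_image_box(col: int, row: int, width: int, height: int,
                                scale: float, stride: int = 16
                                ) -> Tuple[float, float, float, float]:
    """Map a window on one ConvNet pyramid level back to original-image pixels.

    (col, row) is the window's top-left feature cell and (width, height) its size
    in cells. `stride` is input pixels per feature cell (assumed 16 here) and
    `scale` is the factor by which the image was resized for this level.
    """
    x0 = col * stride / scale
    y0 = row * stride / scale
    x1 = (col + width) * stride / scale
    y1 = (row + height) * stride / scale
    return x0, y0, x1, y1
```

For example, a 6x6-cell window at cell (2, 3) on a half-resolution level maps to (64.0, 96.0, 256.0, 288.0) in the original image.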
Implementing ConvNet Pyramids
State-of-the-art ConvNet implementations (e.g. Caffe):
• Can handle any input image size
• But they need batches of same-sized images to saturate the GPU
• Easy to use:
pyra = convnet_featpyramid(image)
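One simple way to satisfy the same-size requirement is to pad every pyramid scale to the size of the largest one and run all scales as a single batch. This is a sketch of that idea only, using hypothetical helper names. It is not necessarily how DenseNet arranges its scales, and it wastes computation on padding for the small scales; packing several small scales onto one shared canvas is another option.

```python
from typing import List, Tuple

Image = List[List[float]]  # indexed [row][col]


def pad_to(image: Image, height: int, width: int, fill: float = 0.0) -> Image:
    """Pad an image on the bottom and right so it becomes height x width."""
    padded = [row + [fill] * (width - len(row)) for row in image]
    padded += [[fill] * width for _ in range(height - len(image))]
    return padded


def batch_pyramid(levels: List[Image], fill: float = 0.0
                  ) -> Tuple[List[Image], List[Tuple[int, int]]]:
    """Pad every pyramid level to the largest level's size.

    Returns the same-sized batch plus each level's original (height, width), so
    features computed over the padded area can be cropped away afterwards.
    """
    height = max(len(img) for img in levels)
    width = max(len(img[0]) for img in levels)
    batch = [pad_to(img, height, width, fill) for img in levels]
    sizes = [(len(img), len(img[0])) for img in levels]
    return batch, sizes
```

Keeping the original sizes matters. Without them, detections in the padded border of a small scale could not be told apart from real image content.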
Computational Performance
• Selective Search: 2000 windows = 100x the input image size, 1/10 fps
• Pyramids: 8x the input image size, 1 fps
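Setting the two rows side by side is simple arithmetic on the slide's own numbers, not an additional measurement. The pyramid processes 12.5x fewer pixels, and the frame rate improves 10x, so the speedup roughly tracks the reduction in pixels.

```latex
\frac{100\times}{8\times} = 12.5 \quad \text{(fewer pixels to process)}, \qquad
\frac{1\ \text{fps}}{1/10\ \text{fps}} = 10 \quad \text{(speedup)}
```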
Future Applications
For each of the 6000 papers citing HOG, replace
pyra = featpyramid(image)  #HOG Pyramid
with
pyra = convnet_featpyramid(image)
• Exemplar-SVM (Alyosha Efros)
• RGB-D Recognition (Saurabh Gupta)
• Tracking Algorithms (TTI-Japan)
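The one-line swap works because a pyramid-based detector only consumes the pyramid, not the features that produced it. The sketch below illustrates that interface under stated assumptions. `hog_like_pyramid` and `convnet_like_pyramid` are hypothetical toy stand-ins, not the real `featpyramid` or `convnet_featpyramid`, and `max_response` is a toy detector.

```python
from typing import Callable, List, Sequence, Tuple

Grid = List[List[float]]
Pyramid = List[Tuple[float, Grid]]  # (scale, feature map) per level
Det = Tuple[float, float]           # toy detection: (scale, score)


def run_pipeline(images: Sequence[Grid],
                 build_pyramid: Callable[[Grid], Pyramid],
                 detect: Callable[[Pyramid], List[Det]]) -> List[List[Det]]:
    """Run a pyramid-based detector over many images.

    The detector never sees which features were used, so switching from HOG
    to ConvNet features only changes the `build_pyramid` argument.
    """
    return [detect(build_pyramid(image)) for image in images]


# Hypothetical stand-ins for the two feature extractors named on the slide.
def hog_like_pyramid(image: Grid) -> Pyramid:
    return [(1.0, image)]


def convnet_like_pyramid(image: Grid) -> Pyramid:
    return [(1.0, [[2.0 * v for v in row] for row in image])]


def max_response(pyra: Pyramid) -> List[Det]:
    """Toy detector: report the strongest feature value on each level."""
    return [(scale, max(max(row) for row in feat)) for scale, feat in pyra]
```

Swapping `hog_like_pyramid` for `convnet_like_pyramid` in the `run_pipeline` call is the only change; `max_response` stays untouched.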