DenseNet: Replacing HOG with Deep Convnet Pyramids for Object Detection

Forrest Iandola, Sergey Karayev, Ross Girshick, Matt Moskewicz, Yangqing Jia, Kurt Keutzer, and Trevor Darrell

[email protected]

University of California, Berkeley

Overview

Object Detection
• Selective Search + ConvNets
• Multiscale Pyramid Descriptors
• DenseNet: ConvNet Pyramids for improved efficiency
• DenseNet code is available – give it a try in your pipeline

Deep Convolutional Neural Networks

1989: high-quality digit recognition (Bell Labs – LeCun)
2012: best ImageNet Classification (Toronto)
2013: best PASCAL Detection (Berkeley)
2014: efficient detection + replace HOG with ConvNets

Regions with CNN Features

"Selective Search" region proposals (Uijlings et al., IJCV 2013)

Ross Girshick, Jeff Donahue, Trevor Darrell, Jitendra Malik. Rich feature hierarchies for accurate object detection and semantic segmentation. ArXiv 2013.

Caffe – efficient ConvNet GPU implementation from Berkeley
http://caffe.berkeleyvision.org

Linear Classifier

> 50% mAP on PASCAL 07 detection

Efficiency Issues with R-CNN

2000 windows = 100x the input image size
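The slide does not say how the 100x figure is derived. One simple reading is that the combined area of the proposal windows is about 100 times the area of the image. The sketch below computes that ratio for a list of boxes. The function name `window_area_ratio`, the image size, and the box size are all assumptions chosen for illustration (each box covers 5% of the image so that 2000 boxes land on the slide's number); they are not values from the talk.

```python
def window_area_ratio(boxes, image_h, image_w):
    """Sum of box areas divided by the image area.

    Boxes are (x1, y1, x2, y2) in pixels, with x2 and y2 exclusive.
    """
    total = sum((x2 - x1) * (y2 - y1) for x1, y1, x2, y2 in boxes)
    return total / (image_h * image_w)


# Hypothetical example: 2000 proposals, each 100x100 px, on a 500x400 image.
boxes = [(0, 0, 100, 100)] * 2000
print(window_area_ratio(boxes, image_h=400, image_w=500))  # 100.0
```

Under this reading, every window is pushed through the ConvNet separately, so pixels shared by overlapping windows are processed many times over.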

Sliding-Window Detection on HOG Pyramids

pyra = featpyramid(image)

Can add parts, if desired

33% mAP on PASCAL 07 detection
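As a rough illustration of what happens after `featpyramid`, the sketch below slides a single linear template over every level of a feature pyramid and keeps the windows whose score exceeds a threshold. It assumes the pyramid is a plain list of (height, width, channels) NumPy arrays; the functions `score_level` and `detect` are hypothetical names for this example, not part of any released detector code, and parts and non-maximum suppression are left out.

```python
import numpy as np


def score_level(features, template):
    """Score a (h, w, d) template at every valid position of a (H, W, d) map."""
    H, W, _ = features.shape
    h, w, _ = template.shape
    scores = np.empty((H - h + 1, W - w + 1))
    for y in range(scores.shape[0]):
        for x in range(scores.shape[1]):
            # Linear classifier score: dot product of template and window.
            scores[y, x] = np.sum(features[y:y + h, x:x + w] * template)
    return scores


def detect(pyramid, template, threshold):
    """Return (level, y, x, score) for every window scoring above threshold."""
    detections = []
    for level, features in enumerate(pyramid):
        if (features.shape[0] < template.shape[0]
                or features.shape[1] < template.shape[1]):
            continue  # this level is smaller than the template
        scores = score_level(features, template)
        for y, x in zip(*np.where(scores > threshold)):
            detections.append((level, int(y), int(x), float(scores[y, x])))
    return detections


# Toy pyramid: the template's pattern is planted once, in level 1.
template = np.ones((2, 2, 4))
pyramid = [np.zeros((6, 8, 4)), np.zeros((5, 6, 4)), np.zeros((1, 1, 4))]
pyramid[1][2:4, 3:5] = 1.0
print(detect(pyramid, template, threshold=15.0))
```

Because the template has a fixed size in feature cells, searching coarser pyramid levels is what lets the same template find larger objects.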

Efficiency of HOG Pyramids

Pyramid = 8x the input image size

Typical settings:
• 5 octaves
• 10 scales per octave
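The 8x figure can be checked with a short geometric sum. The sketch below assumes the finest pyramid level is the input image itself and that each successive level shrinks both sides by a factor of 2^(-1/10), which is what 10 scales per octave implies; the function name `pyramid_area_ratio` is made up for this example.

```python
def pyramid_area_ratio(octaves=5, scales_per_octave=10):
    """Total pixels over all pyramid levels, divided by input-image pixels.

    Assumes level 0 is the input resolution and each level scales both
    image sides by 2 ** (-1 / scales_per_octave) relative to the previous one.
    """
    levels = octaves * scales_per_octave
    side_scale = 2.0 ** (-1.0 / scales_per_octave)
    # Area scales with the square of the side length.
    return sum((side_scale ** k) ** 2 for k in range(levels))


ratio = pyramid_area_ratio()
print(f"{ratio:.2f}x the input image size")
```

With the slide's settings the sum comes out a little under 8, consistent with the rounded figure above. Nearly all of the cost sits in the first few octaves, since each octave has a quarter of the pixels of the one before it.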

Sliding-Window Detection on ConvNet Pyramids

Pyramid = 8x the input image size

Efficiency of HOG + Accuracy of Deep Learning

Easy to use:

pyra = convnet_featpyramid(image)

Implementing ConvNet Pyramids

State-of-the-art ConvNet implementations (e.g. Caffe):
• Can handle any input image size
• BUT, need batches of same-sized images to saturate GPU

Easy to use:

pyra = convnet_featpyramid(image)
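One way to reconcile differently sized pyramid levels with the need for same-sized batches is to pack the levels onto a few fixed-size canvases ("planes") and feed those planes to the network as one batch. The slide text does not spell out the mechanism, so the sketch below is only an illustration of that idea: the greedy shelf packing, the names `pack_levels` and `stitch`, the plane size, and the 16-pixel padding are all assumptions made for this example.

```python
import numpy as np


def pack_levels(level_sizes, plane_h, plane_w, pad=16):
    """Greedy shelf packing of (h, w) rectangles onto plane_h x plane_w planes.

    Returns one (plane_index, y, x) offset per level, in input order.
    Neighbouring rectangles are kept at least `pad` pixels apart.
    """
    placements = []
    plane, shelf_y, shelf_h, cursor_x = 0, 0, 0, 0
    for h, w in level_sizes:
        if h > plane_h or w > plane_w:
            raise ValueError(f"level {h}x{w} does not fit on a plane")
        if cursor_x + w > plane_w:      # row is full: start a new shelf
            shelf_y += shelf_h + pad
            cursor_x, shelf_h = 0, 0
        if shelf_y + h > plane_h:       # plane is full: start a new plane
            plane += 1
            shelf_y, cursor_x, shelf_h = 0, 0, 0
        placements.append((plane, shelf_y, cursor_x))
        cursor_x += w + pad
        shelf_h = max(shelf_h, h)
    return placements


def stitch(levels, plane_h, plane_w, pad=16):
    """Copy pyramid levels onto zero-filled planes that all share one size."""
    placements = pack_levels([lvl.shape[:2] for lvl in levels],
                             plane_h, plane_w, pad)
    n_planes = max(p for p, _, _ in placements) + 1
    planes = np.zeros((n_planes, plane_h, plane_w) + levels[0].shape[2:],
                      dtype=levels[0].dtype)
    for lvl, (p, y, x) in zip(levels, placements):
        planes[p, y:y + lvl.shape[0], x:x + lvl.shape[1]] = lvl
    return planes, placements


# Three toy RGB pyramid levels packed onto 128x128 planes.
levels = [np.ones((s, s, 3), dtype=np.float32) for s in (100, 50, 25)]
planes, placements = stitch(levels, plane_h=128, plane_w=128)
print(planes.shape, placements)
```

The returned offsets are what would be needed afterwards to cut each level's features back out of the network output, after dividing by the network's stride.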

Computational Performance

Selective Search
• 2000 windows = 100x the input image size
• 1/10 fps

Pyramids
• Pyramid = 8x the input image size
• 1 fps
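To put the two frame rates in perspective, the short calculation below estimates how long each approach would take on the 4952 images of the PASCAL VOC 2007 test set. It uses only the frame rates from the slide; the helper name `hours_to_process` is an assumption for this example, and the estimate ignores everything except per-image feature computation at the stated rate.

```python
VOC2007_TEST_IMAGES = 4952  # size of the PASCAL VOC 2007 test set


def hours_to_process(num_images, fps):
    """Wall-clock hours needed to process num_images at a given frame rate."""
    return num_images / fps / 3600.0


for name, fps in [("Selective Search", 1 / 10), ("Pyramids", 1.0)]:
    hours = hours_to_process(VOC2007_TEST_IMAGES, fps)
    print(f"{name}: {hours:.1f} hours")
```

The roughly 10x gap in frame rate is in line with the gap in pixels processed (100x versus 8x the input image size, a factor of 12.5).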

Future Applications

for each of the 6000 papers citing HOG:
    pyra = featpyramid(image) # HOG Pyramid
    pyra = convnet_featpyramid(image)

Exemplar-SVM (Alyosha Efros)
RGB-D Recognition (Saurabh Gupta)
Tracking Algorithms (TTI-Japan)