
ImageNet Classification with
Deep Convolutional Neural Networks
Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton, NIPS 2012
Eunsoo Oh
(오은수)
ILSVRC
● ImageNet Large Scale Visual Recognition Challenge
● An image classification challenge with 1,000 categories (1.2 million images)
[Figure: classification pipeline with a deep convolutional neural network (ILSVRC-2012 winner)]
reference : http://www.image-net.org/challenges/LSVRC/2013/slides/ILSVRC2013_12_7_13_clsloc.pdf
Why Deep Learning?
● “Shallow” vs. “deep” architectures
Learn a feature hierarchy all the way from pixels to classifier
reference : http://web.engr.illinois.edu/~slazebni/spring14/lec24_cnn.pdf
Background
● A neuron
[Figure: a single neuron: inputs x1, ..., xd (raw pixels) are weighted by w1, ..., wd, summed with a bias b, and passed through a nonlinearity f, giving output f(w·x + b)]
reference : http://en.wikipedia.org/wiki/Sigmoid_function#mediaviewer/File:Gjl-t(x).svg
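A minimal sketch of this single-neuron computation in Python, assuming a sigmoid for the nonlinearity f (as in the referenced figure); the inputs, weights, and bias are made-up illustrative values, not values from the paper:

```python
import numpy as np

def sigmoid(z):
    # Classic squashing nonlinearity f; AlexNet itself replaces this with ReLU.
    return 1.0 / (1.0 + np.exp(-z))

def neuron(x, w, b, f=sigmoid):
    # Output of a single neuron: f(w·x + b)
    return f(np.dot(w, x) + b)

# Toy example with d = 3 inputs (illustrative values only)
x = np.array([0.2, 0.5, 0.1])   # raw pixel inputs x1..xd
w = np.array([0.4, -0.3, 0.8])  # weights w1..wd
b = 0.1
print(neuron(x, w, b))
```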
Background
● Multi-Layer Neural Networks
● Nonlinear classifier
● Learning can be done by gradient descent → the Back-Propagation algorithm
[Figure: multi-layer network with an input layer, a hidden layer, and an output layer]
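A minimal sketch of gradient-descent learning with back-propagation in a tiny one-hidden-layer network; the data, layer sizes, and learning rate are illustrative assumptions, not the paper's setup:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 4 samples, 3 inputs, binary target (illustrative only)
X = rng.standard_normal((4, 3))
y = np.array([[0.0], [1.0], [1.0], [0.0]])

# One hidden layer of 5 units, sigmoid activations
W1, b1 = rng.standard_normal((3, 5)) * 0.1, np.zeros(5)
W2, b2 = rng.standard_normal((5, 1)) * 0.1, np.zeros(1)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
lr = 0.5

for step in range(1000):
    # Forward pass
    h = sigmoid(X @ W1 + b1)       # hidden layer
    out = sigmoid(h @ W2 + b2)     # output layer
    # Backward pass: back-propagate the squared-error gradient
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    # Gradient-descent updates
    W2 -= lr * h.T @ d_out; b2 -= lr * d_out.sum(0)
    W1 -= lr * X.T @ d_h;   b1 -= lr * d_h.sum(0)

print("final outputs:", out.ravel())
```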
Background
● Convolutional Neural Networks
● Variation of multi-layer neural networks
● Kernel (Convolution Matrix)
reference : http://en.wikipedia.org/wiki/Kernel_(image_processing)
Background
● Convolutional Filter
[Figure: a convolutional filter slides over the input, producing a feature map]
reference : http://cs.nyu.edu/~fergus/tutorials/deep_learning_cvpr12/fergus_dl_tutorial_final.pptx
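A minimal sketch of how a single convolutional filter produces a feature map; the 3x3 kernel here is a hand-picked illustrative example, not a learned filter from the paper (and, as is common in CNNs, the kernel is not flipped, so this is strictly cross-correlation):

```python
import numpy as np

def conv2d_valid(image, kernel):
    # Slide the kernel over every valid position and take the weighted sum.
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    feature_map = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            feature_map[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return feature_map

image = np.arange(36, dtype=float).reshape(6, 6)   # toy 6x6 input
edge_kernel = np.array([[-1.0, 0.0, 1.0],          # simple edge-like 3x3 kernel
                        [-1.0, 0.0, 1.0],
                        [-1.0, 0.0, 1.0]])
print(conv2d_valid(image, edge_kernel))            # 4x4 feature map
```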
Proposed Method
● Deep Convolutional Neural Network
● 5 convolutional and 3 fully connected layers
● 650,000 neurons, 60 million parameters
● Some techniques for boosting performance
● ReLU nonlinearity
● Training on Multiple GPUs
● Overlapping max pooling
● Data Augmentation
● Dropout
Rectified Linear Units (ReLU)
reference : http://www.image-net.org/challenges/LSVRC/2012/supervision.pdf
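The figure on this slide (see the referenced PDF) is not reproduced here; the ReLU nonlinearity itself is f(x) = max(0, x), which the paper reports trains several times faster than saturating nonlinearities such as tanh. A minimal sketch:

```python
import numpy as np

def relu(x):
    # Rectified Linear Unit: f(x) = max(0, x), applied element-wise
    return np.maximum(0.0, x)

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(x))        # [0.  0.  0.  0.5 2. ]
print(np.tanh(x))     # saturating alternative the paper compares against
```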
Training on Multiple GPUs
● The network is spread across two GPUs
● Two GTX 580 GPUs, each with 3GB of memory
● Current GPUs are particularly well-suited to cross-GPU parallelization
● Very efficient GPU implementation of the CNN
Pooling
● Spatial Pooling
● Non-overlapping / overlapping regions
● Sum or max
[Figure: max pooling and sum pooling over spatial regions of a feature map]
reference : http://cs.nyu.edu/~fergus/tutorials/deep_learning_cvpr12/fergus_dl_tutorial_final.pptx
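A minimal sketch of max pooling over spatial regions; with window size z and stride s < z the regions overlap, matching the paper's overlapping scheme (z = 3, s = 2). The 7x7 feature map is an illustrative assumption:

```python
import numpy as np

def max_pool2d(x, z=3, s=2):
    # Max pooling with a z x z window and stride s; s < z gives overlapping regions.
    out_h = (x.shape[0] - z) // s + 1
    out_w = (x.shape[1] - z) // s + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = x[i*s:i*s+z, j*s:j*s+z].max()
    return out

feature_map = np.arange(49, dtype=float).reshape(7, 7)   # toy 7x7 feature map
print(max_pool2d(feature_map, z=3, s=2))                  # overlapping (paper's setting)
print(max_pool2d(feature_map, z=2, s=2))                  # non-overlapping comparison
```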
Data Augmentation
● Extract 224x224 crops from the 256x256 training images
● Also use horizontal flips of the crops
[Figure: a 256x256 training image with several 224x224 crops and their horizontal flips]
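A minimal sketch of this augmentation, assuming the training image is a NumPy array of shape (256, 256, 3); the crop offset is random and the horizontal flip is applied with probability 0.5:

```python
import numpy as np

def augment(image, crop=224, rng=np.random.default_rng()):
    # Random 224x224 crop out of a 256x256 training image...
    h, w = image.shape[:2]
    top = rng.integers(0, h - crop + 1)
    left = rng.integers(0, w - crop + 1)
    patch = image[top:top + crop, left:left + crop]
    # ...plus a horizontal flip half of the time.
    if rng.random() < 0.5:
        patch = patch[:, ::-1]
    return patch

image = np.zeros((256, 256, 3), dtype=np.uint8)   # placeholder training image
print(augment(image).shape)                        # (224, 224, 3)
```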
Dropout
● Independently set each hidden unit's activity to zero with probability 0.5
● Used in the two globally-connected hidden layers at the net's output
reference : http://www.image-net.org/challenges/LSVRC/2012/supervision.pdf
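A minimal sketch of dropout as described above: at training time each hidden unit is zeroed independently with probability 0.5; at test time all units are used and, as in the paper, their outputs are halved to match the expected training-time activity:

```python
import numpy as np

def dropout(activations, p=0.5, train=True, rng=np.random.default_rng()):
    if train:
        # Zero each hidden unit independently with probability p.
        mask = rng.random(activations.shape) >= p
        return activations * mask
    # At test time keep all units but scale outputs by (1 - p).
    return activations * (1.0 - p)

h = np.ones(8)                    # toy hidden-layer activations
print(dropout(h, train=True))     # roughly half the units zeroed
print(dropout(h, train=False))    # all units, scaled by 0.5
```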
Overall Architecture
● Trained with stochastic gradient descent on two NVIDIA GPUs
for about a week (5~6 days)
● 650,000 neurons, 60 million parameters, 630 million connections
● The last layer contains 1,000 neurons and produces a distribution over the 1,000 class labels.
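A sketch of the 5-convolutional + 3-fully-connected layout in PyTorch, using the kernel counts and sizes reported in the paper (96/256/384/384/256 convolutional kernels, two 4096-unit fully-connected layers, 1,000-way output). This is a simplified single-GPU approximation: the two-GPU split and local response normalization are omitted, the padding choices are assumptions, and a 227x227 input is assumed so the spatial dimensions work out:

```python
import torch
import torch.nn as nn

class AlexNetSketch(nn.Module):
    # Single-GPU approximation of the paper's 5 conv + 3 FC architecture.
    def __init__(self, num_classes=1000):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 96, kernel_size=11, stride=4), nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),                 # overlapping pooling
            nn.Conv2d(96, 256, kernel_size=5, padding=2), nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(256, 384, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(384, 384, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(384, 256, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
        )
        self.classifier = nn.Sequential(
            nn.Dropout(0.5), nn.Linear(256 * 6 * 6, 4096), nn.ReLU(inplace=True),
            nn.Dropout(0.5), nn.Linear(4096, 4096), nn.ReLU(inplace=True),
            nn.Linear(4096, num_classes),                          # 1,000-way output
        )

    def forward(self, x):
        x = self.features(x)
        x = torch.flatten(x, 1)
        return self.classifier(x)

model = AlexNetSketch()
logits = model(torch.randn(1, 3, 227, 227))   # assumed 227x227 input
print(logits.shape)                            # torch.Size([1, 1000])
```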
Results
● ILSVRC-2010 test set
[Table: error rates on the ILSVRC-2010 test set for the ILSVRC-2010 winner, the previous best published result, and the proposed method]
reference : http://image-net.org/challenges/LSVRC/2012/ilsvrc2012.pdf
Results
● ILSVRC-2012 results
● Runner-up: top-5 error rate 26.172%
● Proposed method: top-5 error rate 16.422%
Qualitative Evaluations
Qualitative Evaluations
ILSVRC-2013 Classification
reference : http://www.image-net.org/challenges/LSVRC/2013/slides/ILSVRC2013_12_7_13_clsloc.pdf
ILSVRC-2014 Classification
[Figure: ILSVRC-2014 entries with 22-layer and 19-layer networks]
Conclusion
● A large, deep convolutional neural network for large-scale image classification was proposed
● 5 convolutional layers, 3 fully-connected layers
● 650,000 neurons, 60 million parameters
● Several techniques for boosting performance
● Several techniques for reducing overfitting
● The proposed method won ILSVRC-2012
● Achieved a winning top-5 error rate of 15.3%, compared to 26.2% achieved by the second-best entry
Q&A
Quiz
● 1. The proposed method uses hand-designed features, so there is no need to learn features and feature hierarchies. (True / False)
● 2. Which technique was not used in this paper?
① Dropout
② Rectified Linear Units nonlinearity
③ Training on multiple GPUs
④ Local contrast normalization
Appendix
Feature Visualization
● 96 learned low-level (1st-layer) filters
Appendix
Visualizing CNN
reference : M. D. Zeiler and R. Fergus. Visualizing and understanding convolutional neural networks.
arXiv preprint arXiv:1311.2901, 2013.
Appendix
Local Response Normalization
● $a^i_{x,y}$ : the activity of a neuron computed by applying kernel i at position (x, y)
● The response-normalized activity $b^i_{x,y}$ is given by
  $b^i_{x,y} = a^i_{x,y} \Big/ \Big(k + \alpha \sum_{j=\max(0,\, i-n/2)}^{\min(N-1,\, i+n/2)} \big(a^j_{x,y}\big)^2\Big)^{\beta}$
● N : the total # of kernels in the layer
● n : hyper-parameter, n = 5
● k : hyper-parameter, k = 2
● α : hyper-parameter, α = 10^(-4)
● β : hyper-parameter, β = 0.75
● This aids generalization even though ReLUs don't require it.
● This reduces the top-5 error rate by 1.2%.
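A minimal sketch of this normalization, assuming activations are stored as an array of shape (N, H, W) with the kernel index first; the hyper-parameter values follow the bullets above:

```python
import numpy as np

def local_response_norm(a, n=5, k=2.0, alpha=1e-4, beta=0.75):
    # a has shape (N, H, W): activity of N kernels at each position (x, y).
    N = a.shape[0]
    b = np.empty_like(a)
    for i in range(N):
        lo = max(0, i - n // 2)
        hi = min(N - 1, i + n // 2)
        denom = (k + alpha * np.sum(a[lo:hi + 1] ** 2, axis=0)) ** beta
        b[i] = a[i] / denom
    return b

activations = np.random.rand(16, 4, 4)          # toy: 16 kernels over a 4x4 map
print(local_response_norm(activations).shape)   # (16, 4, 4)
```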
Appendix
Another Data Augmentation
● Alter the intensities of the RGB channels in training images
● Perform PCA on the set of RGB pixel values
● To each training image, add multiples of the found principal
components
● To each RGB image pixel $I_{xy} = [I^R_{xy}, I^G_{xy}, I^B_{xy}]^T$, add the quantity
  $[\mathbf{p}_1, \mathbf{p}_2, \mathbf{p}_3]\,[\alpha_1\lambda_1, \alpha_2\lambda_2, \alpha_3\lambda_3]^T$
● $\mathbf{p}_i$, $\lambda_i$ : the i-th eigenvector and eigenvalue of the 3x3 covariance matrix of RGB pixel values
● $\alpha_i$ : random variable drawn from a Gaussian with mean 0 and standard deviation 0.1
● This reduces the top-1 error rate by over 1%
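A minimal sketch of this PCA-based color augmentation, assuming the image is a float array of shape (H, W, 3). For simplicity the eigen-decomposition is computed per image here, whereas the paper computes it once over the RGB pixel values of the whole training set:

```python
import numpy as np

def pca_color_augment(image, sigma=0.1, rng=np.random.default_rng()):
    # image: float array of shape (H, W, 3), RGB
    pixels = image.reshape(-1, 3)
    cov = np.cov(pixels, rowvar=False)          # 3x3 covariance of RGB values
    eigvals, eigvecs = np.linalg.eigh(cov)      # PCA: eigenvalues and eigenvectors
    alphas = rng.normal(0.0, sigma, size=3)     # alpha_i ~ N(0, 0.1^2)
    shift = eigvecs @ (alphas * eigvals)        # [p1 p2 p3][a1*l1, a2*l2, a3*l3]^T
    return image + shift                        # same quantity added to every pixel

image = np.random.rand(8, 8, 3)                 # toy image
print(pca_color_augment(image).shape)           # (8, 8, 3)
```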
Appendix
Details of Learning
● Use stochastic gradient descent with a batch size of 128 examples, momentum of 0.9, and weight decay of 0.0005
● The update rule for weight w was
  $v_{i+1} = 0.9\,v_i - 0.0005\,\epsilon\,w_i - \epsilon\,\big\langle \tfrac{\partial L}{\partial w}\big|_{w_i} \big\rangle_{D_i}$, $\quad w_{i+1} = w_i + v_{i+1}$
● i : the iteration index, v : the momentum variable
● $\epsilon$ : the learning rate, initialized at 0.01 and reduced three times prior to termination
● $\big\langle \tfrac{\partial L}{\partial w}\big|_{w_i} \big\rangle_{D_i}$ : the average over the i-th batch $D_i$ of the derivative of the objective with respect to w, evaluated at $w_i$
● Train for 90 cycles through the training set of 1.2 million images
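A minimal sketch of one such update step, assuming w and the batch-averaged gradient are NumPy arrays; the constants match the paper's momentum 0.9, weight decay 0.0005, and initial learning rate 0.01:

```python
import numpy as np

def sgd_momentum_step(w, v, grad, lr=0.01, momentum=0.9, weight_decay=0.0005):
    # v_{i+1} = 0.9*v_i - 0.0005*lr*w_i - lr*<dL/dw>_{D_i}
    v = momentum * v - weight_decay * lr * w - lr * grad
    # w_{i+1} = w_i + v_{i+1}
    return w + v, v

w = np.zeros(10)               # toy weight vector
v = np.zeros_like(w)           # momentum variable, initialized to zero
grad = np.random.randn(10)     # stand-in for the batch-averaged gradient
w, v = sgd_momentum_step(w, v, grad)
print(w[:3])
```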