Fergus - Frontiers in Computer Vision



The Role of Learning in Vision

3.30pm: Rob Fergus
3.40pm: Andrew Ng
3.50pm: Kai Yu
4.00pm: Yann LeCun
4.10pm: Alan Yuille
4.20pm: Deva Ramanan
4.30pm: Erik Learned-Miller
4.40pm: Erik Sudderth
4.50pm: Spotlights – Qiang Ji, M-H Yang
4.55pm: Discussion
5.30pm: End

Topics: Overview; Feature / Deep Learning; Compositional Models; Learning Representations; Low-level Representation; Learning on the fly
An Overview of Hierarchical Feature Learning and Relations to Other Models
Rob Fergus
Dept. of Computer Science, Courant Institute, New York University
Motivation
• Multitude of hand-designed features currently in use
  – SIFT, HOG, LBP, MSER, Color-SIFT, …
• Maybe there is some way of learning the features?
• Also, these features just capture low-level edge gradients
  – e.g. Felzenszwalb, Girshick, McAllester and Ramanan [PAMI]; Yan & Huang (winner of the PASCAL 2010 classification competition)
Beyond Edges?
• Mid-level cues: continuation, parallelism, junctions, corners
  – "Tokens" from Vision by D. Marr
• High-level object parts
• Difficult to hand-engineer → what about learning them?
Deep/Feature Learning Goal
• Build a hierarchy of feature extractors (≥ 1 layers)
  – All the way from pixels → classifier
  – Homogeneous structure per layer
  – Unsupervised training
• Pipeline: Image/Video Pixels → Layer 1 → Layer 2 → Layer 3 → Simple Classifier
• Numerous approaches:
  – Restricted Boltzmann Machines (Hinton, Ng, Bengio, …)
  – Sparse coding (Yu, Fergus, LeCun)
  – Auto-encoders (LeCun, Bengio)
  – ICA variants (Ng, Cottrell)
Single Layer Architecture

Input: Image Pixels / Features
→ Filter
→ Normalize
→ Pool
Output: Features / Classifier

Links to neuroscience. Details in the boxes matter (especially in a hierarchy).
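The filter → normalize → pool stage above can be sketched in a few lines of NumPy. This is an illustrative toy version of my own, not code from the talk: cross-correlation with a small filter dictionary, a rectifying non-linearity, divisive normalization across features, then non-overlapping max pooling.

```python
import numpy as np

def single_layer(image, filters, pool=2, eps=1e-5):
    """One filter -> normalize -> pool stage (illustrative sketch).

    image:   2-D array (H, W)
    filters: 3-D array (K, fh, fw) of learned or hand-set filters
    """
    K, fh, fw = filters.shape
    H, W = image.shape
    oh, ow = H - fh + 1, W - fw + 1

    # Filter: valid cross-correlation with each dictionary element,
    # followed by a rectifying non-linearity.
    maps = np.empty((K, oh, ow))
    for k in range(K):
        for i in range(oh):
            for j in range(ow):
                maps[k, i, j] = np.sum(image[i:i+fh, j:j+fw] * filters[k])
    maps = np.maximum(maps, 0.0)  # rectification

    # Normalize: divisive normalization across the feature dimension, so
    # features at the same location compete to explain the input.
    maps = maps / (eps + np.linalg.norm(maps, axis=0, keepdims=True))

    # Pool: non-overlapping spatial max pooling for local invariance.
    oh2, ow2 = oh // pool, ow // pool
    maps = maps[:, :oh2 * pool, :ow2 * pool]
    return maps.reshape(K, oh2, pool, ow2, pool).max(axis=(2, 4))
```

Stacking such stages, with the output of one feeding the filters of the next, gives the Layer 1 / Layer 2 / Layer 3 hierarchy in the pipeline above.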
Example Feature Learning Architectures

Pixels / Features
→ Filter with Dictionary + Non-linearity (patch / tiled / convolutional)
→ Normalization between feature responses: (Group) Sparsity, Max / Softmax, Local Contrast Normalization (Subtractive / Divisive)
→ Spatial / Feature pooling (Sum or Max)
→ Features
SIFT Descriptor

Image Pixels
→ Apply Gabor filters
→ Spatial pool (Sum)
→ Normalize to unit length
→ Feature Vector
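Reading SIFT through the same filter / pool / normalize lens can be made concrete with a simplified sketch (my own approximation, not the real SIFT implementation: quantized gradient orientations stand in for the oriented filtering, magnitudes are sum-pooled over a grid of spatial cells, and the vector is normalized to unit length; real SIFT adds Gaussian weighting, soft binning, and clipping).

```python
import numpy as np

def sift_like(patch, n_orient=8, grid=4, eps=1e-7):
    """Simplified SIFT-style descriptor: oriented-gradient energy summed
    over a grid x grid array of spatial cells, then L2-normalized."""
    gy, gx = np.gradient(patch.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.arctan2(gy, gx) % (2 * np.pi)

    # "Filter": quantize each pixel's gradient into an orientation bin.
    bins = np.minimum((ang / (2 * np.pi) * n_orient).astype(int), n_orient - 1)

    # "Pool": sum gradient magnitude per orientation within each cell.
    H, W = patch.shape
    ch, cw = H // grid, W // grid
    desc = np.zeros((grid, grid, n_orient))
    for i in range(grid):
        for j in range(grid):
            m = mag[i*ch:(i+1)*ch, j*cw:(j+1)*cw]
            b = bins[i*ch:(i+1)*ch, j*cw:(j+1)*cw]
            for o in range(n_orient):
                desc[i, j, o] = m[b == o].sum()

    # "Normalize": unit length, as on the slide.
    v = desc.ravel()
    return v / (np.linalg.norm(v) + eps)
```

With the default 4x4 grid and 8 orientations this yields the familiar 128-dimensional descriptor.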
Spatial Pyramid Matching
SIFT
Feature
s
Filter with
Visual Words
Lazebnik,
Schmid,
Ponce
[CVPR 2006]
Max
Multi-scale
spatial pool
(Sum)
Classifier
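The pyramid pooling stage can be sketched as follows (a minimal hypothetical version: hard-assign each local descriptor to its nearest visual word, then sum word counts over increasingly fine spatial grids and concatenate the histograms).

```python
import numpy as np

def spatial_pyramid(descriptors, positions, vocab, levels=(1, 2, 4)):
    """Spatial-pyramid histogram (sketch of the pooling in SPM).

    descriptors: (N, D) local features (e.g. SIFT)
    positions:   (N, 2) x,y locations normalized to [0, 1)
    vocab:       (V, D) visual-word dictionary
    """
    # Filter with visual words: hard assignment = nearest dictionary entry.
    d2 = ((descriptors[:, None, :] - vocab[None, :, :]) ** 2).sum(-1)
    words = d2.argmin(axis=1)

    V = len(vocab)
    hists = []
    for g in levels:  # multi-scale spatial pooling (sum per cell)
        cell = np.minimum((positions * g).astype(int), g - 1)  # (N, 2)
        idx = (cell[:, 1] * g + cell[:, 0]) * V + words
        hists.append(np.bincount(idx, minlength=g * g * V).astype(float))
    return np.concatenate(hists)
```

The concatenated histogram (here of length V·(1 + 4 + 16)) is what gets fed to the classifier; Lazebnik et al. additionally weight the levels in their pyramid-match kernel.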
Role of Normalization
• Lots of different mechanisms (max, sparsity, LCN, etc.)
• All induce local competition between features to explain the input
  – "Explaining away"
  – Just like top-down models
  – But a more local mechanism
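The competition effect can be seen in a toy example (my own illustration, not from the talk): under softmax-style normalization, raising one feature's raw response suppresses the others' normalized outputs even though their raw responses are unchanged, a local form of explaining away.

```python
import numpy as np

def softmax(responses, beta=4.0):
    """Softmax normalization over the feature dimension: features at one
    location compete for a fixed total activation."""
    e = np.exp(beta * (responses - responses.max()))  # shift for stability
    return e / e.sum()

# Two features see the same input; then feature 0's raw response grows.
weak = softmax(np.array([1.0, 1.0]))    # evenly matched: [0.5, 0.5]
strong = softmax(np.array([3.0, 1.0]))  # feature 0 now "explains" the input
```

Feature 1's raw response is 1.0 in both cases, yet its normalized output collapses once feature 0 dominates.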
Example: Convolutional Sparse Coding
Zeiler et al. [CVPR'10/ICCV'11], Kavukcuoglu et al. [NIPS'10], Yang et al. [CVPR'10]

[Figure: feature maps, each under an |.|1 sparsity penalty, are convolved with learned filters and summed to reconstruct the input]
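Inference in such a model can be sketched with a few ISTA steps (a hypothetical minimal version, not the cited authors' code): for fixed filters f_k, minimize 0.5·‖x − Σ_k f_k ∗ z_k‖² + λ·Σ_k ‖z_k‖₁ over the feature maps z_k by alternating a gradient step on the reconstruction term with soft-thresholding, which is where the |.|1 penalties act.

```python
import numpy as np
from scipy.signal import convolve2d, correlate2d

def csc_ista(x, filters, lam=0.05, step=0.02, n_iter=100):
    """ISTA for convolutional sparse coding (illustrative sketch):
    min_z 0.5*||x - sum_k f_k * z_k||^2 + lam * sum_k |z_k|_1."""
    z = [np.zeros_like(x) for _ in filters]
    for _ in range(n_iter):
        recon = sum(convolve2d(zk, fk, mode='same')
                    for zk, fk in zip(z, filters))
        resid = x - recon
        for k, fk in enumerate(filters):
            # Gradient step on the reconstruction term (adjoint of
            # convolution is correlation)...
            zk = z[k] + step * correlate2d(resid, fk, mode='same')
            # ...then soft-threshold: the |.|_1 sparsity penalty in action.
            z[k] = np.sign(zk) * np.maximum(np.abs(zk) - step * lam, 0.0)
    return z
```

With a small enough step size the objective decreases monotonically, so the sparse reconstruction ends up closer to the input than the all-zero code it starts from.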
Role of Pooling
• Spatial pooling
  – Invariance to small transformations
  – Larger receptive fields
  – Zeiler, Taylor, Fergus [ICCV 2011]
• Pooling across feature groups
  – Gives AND/OR type behavior
  – Compositional models of Zhu & Yuille: Chen, Zhu, Lin, Yuille, Zhang [NIPS]
• Pooling with latent variables (& springs)
  – Pictorial structures models: Felzenszwalb, Girshick, McAllester, Ramanan [PAMI 2009]
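The invariance claim for spatial pooling can be checked directly (a toy illustration of my own): non-overlapping max pooling leaves the output unchanged when the input response shifts by less than the pool size.

```python
import numpy as np

def max_pool_1d(x, size):
    """Non-overlapping 1-D max pooling."""
    n = len(x) // size
    return x[:n * size].reshape(n, size).max(axis=1)

a = np.array([0, 9, 0, 0, 0, 0, 0, 0], float)  # response at position 1
b = np.array([9, 0, 0, 0, 0, 0, 0, 0], float)  # same response, shifted by 1
```

Both inputs pool to the same vector, so a classifier reading the pooled output cannot tell the one-pixel shift apart; larger pool windows trade more invariance for less spatial precision.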
Object Detection with Discriminatively Trained Part-Based Models
Felzenszwalb, Girshick, McAllester, Ramanan [PAMI 2009]

HOG Pyramid
→ Apply object part filters
→ Pool part responses (latent variables & springs)
→ Non-max Suppression (Spatial)
→ Score
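The "latent variables & springs" pooling can be read as a generalized distance transform: each part may shift away from its anchor, paying a quadratic spring cost, and the detector keeps the best placement. A minimal 1-D sketch of that idea (hypothetical, for illustration; the real system works over a 2-D HOG pyramid and uses an efficient distance-transform algorithm):

```python
import numpy as np

def spring_pool(response, anchor, spring=0.5):
    """Best part placement: max over positions p of
    (filter response at p) - spring * (p - anchor)^2."""
    pos = np.arange(len(response))
    scores = response - spring * (pos - anchor) ** 2
    p = int(scores.argmax())
    return scores[p], p

def dpm_score(root, parts):
    """Root filter score plus spring-pooled part scores (sketch of the
    DPM scoring sum for one candidate root placement)."""
    total = root
    for response, anchor in parts:
        s, _ = spring_pool(response, anchor)
        total += s
    return total
```

A part with a strong response two cells from its anchor can still win, as long as its response beats the quadratic deformation penalty; that is the "spring" in pictorial structures.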