
Computer Vision, Part 2
Object recognition and scene
“understanding”
• What makes object recognition a hard task for
computers?
HMAX
Riesenhuber, M. & Poggio, T. (1999),
“Hierarchical Models of Object Recognition in Cortex”
Serre, T., Wolf, L., Bileschi, S., Riesenhuber, M., and Poggio, T. (2006),
“Robust Object Recognition with Cortex-Like Mechanisms”
• HMAX: A hierarchical neural-network model of object
recognition.
• Meant to model human vision at level of “immediate
recognition” capabilities of ventral visual pathway,
independent of attention or other top-down processes.
• Also called “Standard Model” (because it incorporates the
“standard model” of visual cortex)
• Inspired by earlier “Neocognitron” model of Fukushima (1980)
General ideas behind model
• “Immediate” visual processing is feedforward and hierarchical: low levels
detect simple features, which are combined hierarchically into increasingly
complex features to be detected
• Layers of hierarchy alternate between “sensitivity” (to detecting features)
and “invariance” (to position, scale, orientation)
• Size of receptive fields increases along the hierarchy
• Degree of invariance increases along the hierarchy
The HMAX model for object recognition
(Riesenhuber, Poggio, Serre, et al.)
Classification layer: object or image classification
C2 layer: max activation over each prototype
S2 layer: prototypes (small image patches)
C1 layer: max over local S1 units
S1 layer: edge detectors
Image (gray-scale)
• Job of HMAX is to produce a higher-level representation of an image that will be useful for classification.
• Layers alternate between “specificity” and “invariance” over position, scale, orientation.
S1 layer
Edge detectors
4 orientations, 16 scales
Image (gray-scale)
One S1 receptive field:
Etc.: 16 scales
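The S1 edge detectors can be sketched as a small bank of oriented filters. The slide only says "edge detectors, 4 orientations, 16 scales"; the Gabor form used here follows the cited Serre et al. paper, and the function names, the kernel size, and the values of `wavelength`, `sigma`, and `gamma` are illustrative assumptions, not the model's published settings.

```python
import math

def gabor_kernel(size, theta, wavelength, sigma, gamma=0.3):
    """Build a size x size zero-mean Gabor kernel (hypothetical parameters)."""
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate coordinates so the cosine varies along direction theta.
            x0 = x * math.cos(theta) + y * math.sin(theta)
            y0 = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(x0 ** 2 + (gamma * y0) ** 2) / (2 * sigma ** 2))
            row.append(envelope * math.cos(2 * math.pi * x0 / wavelength))
        kernel.append(row)
    # Subtract the mean so a uniform image region gives no response.
    mean = sum(map(sum, kernel)) / (size * size)
    return [[v - mean for v in row] for row in kernel]

def s1_response(image, kernel, top, left):
    """Absolute filter response for the patch with top-left corner (top, left)."""
    total = 0.0
    for dy, row in enumerate(kernel):
        for dx, weight in enumerate(row):
            total += weight * image[top + dy][left + dx]
    return abs(total)

# Four orientations, as on the slide; a single scale is shown for brevity.
ORIENTATIONS = [0.0, math.pi / 4, math.pi / 2, 3 * math.pi / 4]

# Toy 7x7 image of vertical stripes with period 4 pixels.
stripes = [[math.cos(math.pi * (x - 3) / 2) for x in range(7)] for _ in range(7)]
flat = [[1.0] * 7 for _ in range(7)]

responses = [
    s1_response(stripes, gabor_kernel(7, th, wavelength=4.0, sigma=2.0), 0, 0)
    for th in ORIENTATIONS
]
flat_response = s1_response(flat, gabor_kernel(7, 0.0, wavelength=4.0, sigma=2.0), 0, 0)
```

In this toy setup the `theta = 0` unit responds much more strongly to the vertical stripes than the `theta = pi / 2` unit, and the flat image produces essentially no response.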
C1 layer
Max activation over local S1 units (local position, scale)
4 orientations, 8 scales
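A minimal sketch of the C1 max operation, for one orientation. It assumes that the 16 S1 scales are merged in adjacent pairs to give the 8 C1 scales, and that the two S1 maps being merged have the same dimensions; `c1_pool`, `pool`, and `stride` are hypothetical names and values.

```python
def c1_pool(s1_a, s1_b, pool, stride):
    """Max over pool x pool neighborhoods and over two adjacent S1 scales."""
    height, width = len(s1_a), len(s1_a[0])
    pooled = []
    for top in range(0, height - pool + 1, stride):
        row = []
        for left in range(0, width - pool + 1, stride):
            # One C1 unit: the strongest S1 response in its neighborhood,
            # regardless of exact position or which of the two scales fired.
            row.append(max(
                s1_map[y][x]
                for s1_map in (s1_a, s1_b)
                for y in range(top, top + pool)
                for x in range(left, left + pool)
            ))
        pooled.append(row)
    return pooled

# Toy S1 maps for two adjacent scales (same orientation).
s1_scale_1 = [[1, 2, 0, 0],
              [3, 4, 0, 1],
              [0, 0, 5, 6],
              [0, 0, 7, 8]]
s1_scale_2 = [[9, 0, 0, 0],
              [0, 0, 0, 2],
              [0, 0, 0, 0],
              [1, 0, 0, 0]]

c1_map = c1_pool(s1_scale_1, s1_scale_2, pool=2, stride=2)
# c1_map == [[9, 2], [1, 8]]
```

Because only the maximum survives, shifting a feature by a pixel or changing its size slightly leaves the C1 value unchanged, which is the source of the local invariance.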
S2 layer
Calculate similarity to prototype (radial basis function)
4 orientations, 8 scales
S2 unit: Calculate similarity to prototype for each “pooled” position in C1 layer.
Prototypes (~1000, chosen from image collection, translated to C1 features)
Similarity: Radial basis function:

$$S2_i = \exp\left(-\beta \, \lVert X - P_i \rVert^2\right)$$

(X: patch of C1 activations at one pooled position; P_i: prototype i; β: sharpness of the tuning)
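The radial basis function similarity can be sketched as follows, assuming the usual Gaussian form exp(-β‖X - P_i‖²) over flattened C1 patches. The name `s2_response` and the default `beta=1.0` are illustrative assumptions.

```python
import math

def s2_response(c1_patch, prototype, beta=1.0):
    """Gaussian RBF similarity between a flattened C1 patch and a prototype."""
    squared_distance = sum((x - p) ** 2 for x, p in zip(c1_patch, prototype))
    return math.exp(-beta * squared_distance)

prototype = [1.0, 0.0]
same = s2_response([1.0, 0.0], prototype)      # identical patch: response is 1.0
different = s2_response([0.0, 0.0], prototype)  # squared distance 1: response is exp(-1)
```

The response is 1 when the patch matches the prototype exactly and falls toward 0 as the patch moves away from it; a larger `beta` makes the unit more selective.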

C2 layer
Max activation over position, orientation, scale
S2_1 → MAX (1 value)
S2_2 → MAX (1 value)
…
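A sketch of the C2 step: each prototype contributes exactly one number, the best S2 match found anywhere in the image. It assumes the C1 patches from all positions and scales have already been collected into one list of flattened vectors; `c2_vector` and `beta` are hypothetical names.

```python
import math

def c2_vector(c1_patches, prototypes, beta=1.0):
    """One value per prototype: max RBF similarity over all C1 patches."""
    features = []
    for prototype in prototypes:
        best = max(
            math.exp(-beta * sum((x - p) ** 2 for x, p in zip(patch, prototype)))
            for patch in c1_patches
        )
        features.append(best)
    return features

# Toy data: two C1 patches (all positions and scales pooled together), two prototypes.
patches = [[0.0, 0.0], [1.0, 1.0]]
prototypes = [[1.0, 1.0], [0.0, 1.0]]

c2 = c2_vector(patches, prototypes)
# c2[0] is 1.0 (exact match exists); c2[1] is exp(-1) (closest patch is at squared distance 1)
```

With roughly 1000 prototypes, the result is a vector of roughly 1000 values whose length does not depend on the image size.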
Classification (e.g., dog / not dog)
Support Vector Machine
C2 layer: max over position, orientation, scale
C2 feature values: .11, .78, …, .32
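To illustrate how the C2 values feed the classifier, here is the decision rule of an already-trained linear SVM. The weights and bias are made up for illustration, and the three-element input is a toy stand-in for the full C2 vector; nothing here reflects the actual trained model or its kernel.

```python
def svm_decision(features, weights, bias):
    """Signed score of a linear SVM: positive means the target class."""
    return sum(w * x for w, x in zip(weights, features)) + bias

# Toy C2 vector (three values only) and hypothetical trained parameters.
c2_features = [0.11, 0.78, 0.32]
weights = [1.0, 2.0, -1.0]
bias = -1.0

score = svm_decision(c2_features, weights, bias)
label = "dog" if score > 0 else "not dog"
```

Here the score is about 0.35, so the toy classifier answers "dog".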
Streetscenes “scene understanding” system
(Bileschi, 2006)
Use HMAX + SVM to identify object classes:
Car, Pedestrian, Bicycle, Building, Tree
How Streetscenes Works
(Bileschi, 2006)
1. Densely tile the image with windows of different sizes.
2. C1 and C2 features are computed in each window.
3. The features in each window are given as input to each of five trained support vector machines.
4. If any return a classification with score above a learned threshold, that object is said to be “detected”.
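The four steps above can be sketched as a sliding-window loop. This is a minimal sketch, not Bileschi's implementation: the 50% window overlap, the function names, and the stub feature extractor and classifiers in the example are all assumptions made for illustration.

```python
def tile_windows(width, height, sizes, stride_fraction=0.5):
    """Step 1: yield (left, top, size) for square windows tiling the image."""
    for size in sizes:
        step = max(1, int(size * stride_fraction))
        for top in range(0, height - size + 1, step):
            for left in range(0, width - size + 1, step):
                yield left, top, size

def detect(image_size, sizes, extract_features, classifiers, thresholds):
    """Steps 2-4: featurize each window, score it with every classifier,
    and keep the windows whose score exceeds that class's threshold."""
    width, height = image_size
    detections = []
    for left, top, size in tile_windows(width, height, sizes):
        features = extract_features(left, top, size)
        for label, score_fn in classifiers.items():
            score = score_fn(features)
            if score > thresholds[label]:
                detections.append((label, left, top, size, score))
    return detections

# Stubs standing in for the HMAX features and trained SVMs (hypothetical).
def stub_features(left, top, size):
    return [left, top, size]

stub_classifiers = {
    "car": lambda f: f[0] - 1.5,   # fires only for windows with left >= 2
    "tree": lambda f: -1.0,        # never fires
}
stub_thresholds = {"car": 0.0, "tree": 0.0}

windows = list(tile_windows(4, 4, sizes=[2, 4]))
found = detect((4, 4), [2, 4], stub_features, stub_classifiers, stub_thresholds)
```

In the real system the classifier dictionary would hold the five trained SVMs (car, pedestrian, bicycle, building, tree), and each threshold would be learned rather than fixed at zero.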
Object detection (here, “car”) with HMAX model
(Bileschi, 2006)
Sample of results from HMAX model
(Serre et al., 2006)