Standard Brain Model for Vision
The talk is given by Tomer Livne and Maria Zeldin
Overview
- Introduction to the biological basis of vision
- Computer analogy to biology
- Implementation
- Discussion

Overview of biological vision
- Hierarchical structure
- From simple features to complex ones (Hubel & Wiesel)
- Increased invariance

The basic idea
Hubel and Wiesel (1962, 1965), following their experimental results, proposed a model in which neighbouring simple cells are combined into a complex cell.
The result is complex cells with phase independence.
Max vs. sum pooling
Electrophysiological results indicate that the pooling may not be linear: the response of a complex cell is best described by the activity of its maximal afferent.
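To make the two pooling rules concrete, here is a tiny Python illustration (the afferent responses are invented numbers, not data from the papers): MAX pooling simply propagates the strongest simple-cell input, while a linear SUM mixes all of them.

import numpy as np

# Hypothetical responses of four simple-cell afferents (same orientation,
# neighbouring positions/phases) to a single stimulus.
simple_responses = np.array([1.0, 0.7, 0.0, 0.0])

complex_max = simple_responses.max()  # MAX pooling: 1.0, driven only by the best afferent
complex_sum = simple_responses.sum()  # linear SUM pooling: 1.7, every afferent contributes
print(complex_max, complex_sum)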
From simple to complex cells:
A straightforward extension of this is to start with simple cells and end up with "higher-order hyper-complex cells".
This is the basis of the whole hierarchy idea!
The hierarchy based on the brain model:
Hierarchical models of object recognition in cortex. Riesenhuber and Poggio. Nature Neuroscience, November 1999.
Clearer explanation of the hierarchy
[Figure: simple cells tuned to four orientations (-, |, \, /) give graded responses (e.g. 1, 0.7, 0, 0); MAX pooling over the simple cells yields the complex-cell responses.]
Computer vision
- Usual approach: image patching
- Biologically motivated approach: hierarchy
Representing objects by invariant complex features
The IT area of the brain deals with object recognition. In this area there are cells that respond best to a specific object.
Hierarchical models of object recognition in cortex. Riesenhuber and Poggio. Nature Neuroscience, November 1999.
Recognizing the same faces
In the previous task our brains did a very good job of recognizing the same face even though the scale, expression, and illumination were different, and did not classify different faces as the same even though they appeared under similar physical conditions.
Motivation
The presented approach tries to implement the hierarchical idea presented above in a computer system, in order to achieve similar robustness.
The models we present deal with a more general problem: object classification. We can say that the problem of recognizing different transformations of an object is similar to the problem of classification.
Can computers achieve properties similar to biology?
Riesenhuber & Poggio (1999) demonstrate that they can, by comparing electrophysiological results from cells in the monkey brain with an implemented hierarchical model.
Training stage:
The monkey was trained to recognize a restricted set of views of unfamiliar target stimuli resembling paperclips. The experimenters then checked which IT cell responded best across all views, and the cell that responded the most was picked for the study.
Test stage:
The cell responded best to the trained views, second best to new transformations of the trained object, and showed very little response to new objects (distractors).
Examining the results:
Hierarchical models of object recognition in cortex. Riesenhuber and Poggio. Nature Neuroscience, November 1999.
The hierarchy based on the brain model:
We saw this part; now let's compare it to the model.
Hierarchical models of object recognition in cortex. Riesenhuber and Poggio. Nature Neuroscience, November 1999.
Results of scrambling
Hierarchical models of object recognition in cortex. Riesenhuber and Poggio. Nature Neuroscience, November 1999.
Summary
- Goal: brain-based object classification
- Biological view of the problem
- Implementation of a hierarchical structure
- Comparing experimental results to model results
What's next?
- Models based on the hierarchical idea we already discussed:
  - Riesenhuber & Poggio (1999)
  - Serre & Riesenhuber (2004)
  - Serre, Wolf, Bileschi, Riesenhuber, & Poggio (2007)
  - Mutch & Lowe (2006)
- Modifications of the basic ideas
- Limitations and shortcomings
Method #1
Riesenhuber & Poggio, "Hierarchical models of object recognition in cortex", Nature Neuroscience, 1999.
Later modified by Serre, Wolf, Bileschi, Riesenhuber, & Poggio, "Robust object recognition with cortex-like mechanisms", 2007.

S1 – Gabor filters
- 16 different sizes (7x7, 9x9, ..., 37x37)
- 4 orientations
- A total of 64 S1-type detectors
Robust object recognition with cortex-like mechanisms. Serre, Wolf, Bileschi, Riesenhuber and Poggio. IEEE, March 2007.
A serial implementation of filtering
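Below is a minimal Python sketch (not the authors' code) of how an S1 bank of this kind could be built; the wavelength and sigma settings per filter size are illustrative assumptions, not the values tuned in the paper.

import numpy as np

def gabor_kernel(size, theta, wavelength, sigma, gamma=0.3):
    # One size x size Gabor filter at orientation theta (radians).
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    return np.exp(-(xr**2 + (gamma * yr)**2) / (2 * sigma**2)) * np.cos(2 * np.pi * xr / wavelength)

# 16 sizes (7x7, 9x9, ..., 37x37) times 4 orientations = 64 S1-type detectors.
sizes = range(7, 38, 2)
orientations = [0, np.pi / 4, np.pi / 2, 3 * np.pi / 4]
s1_bank = [gabor_kernel(s, theta, wavelength=s / 2.0, sigma=s / 4.0)
           for s in sizes for theta in orientations]
print(len(s1_bank))  # 64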

C1 – MAX pooling
- 8 different sizes (8x8, 10x10, ..., 22x22)
- 4 orientations
- A total of 32 C1-type detectors
- Used to define features during the learning stage
Robust object recognition with cortex-like mechanisms. Serre, Wolf, Bileschi, Riesenhuber and Poggio. IEEE, March 2007.
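As a rough sketch only (the non-overlapping grid and the pairing of exactly two adjacent S1 scales are simplifying assumptions), C1 can be pictured as a MAX over a local spatial neighbourhood and over a band of S1 scales, separately for each orientation:

import numpy as np

def c1_pool(s1_scale_a, s1_scale_b, grid=8):
    # MAX over the scale band (two adjacent S1 maps, same orientation) ...
    m = np.maximum(s1_scale_a, s1_scale_b)
    # ... then MAX over non-overlapping grid x grid spatial neighbourhoods.
    h, w = (m.shape[0] // grid) * grid, (m.shape[1] // grid) * grid
    m = m[:h, :w].reshape(h // grid, grid, w // grid, grid)
    return m.max(axis=(1, 3))

c1_map = c1_pool(np.random.rand(64, 64), np.random.rand(64, 64), grid=8)
print(c1_map.shape)  # (8, 8)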

S2 – learned features
- Holds N learned features
- 4 patch sizes (4x4, 8x8, 12x12, 16x16), indicating how many neighbouring C1 cells are considered (this is done separately for each C1 scale)
- For each image patch X, a Gaussian radial basis function of the Euclidean distance to each stored feature Pi (i = 1..N) is calculated:
  r = exp(-β ||X - Pi||²)
Robust object recognition with cortex-like mechanisms. Serre, Wolf, Bileschi, Riesenhuber and Poggio. IEEE, March 2007.
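A minimal sketch of a single S2 unit following the r = exp(-β ||X - Pi||²) form above; β is left as a free parameter here, and the patch and prototype are assumed to have matching shapes.

import numpy as np

def s2_response(patch, prototype, beta=1.0):
    # Gaussian RBF match between a C1 patch X and a stored feature P_i.
    diff = patch.ravel() - prototype.ravel()
    return np.exp(-beta * np.dot(diff, diff))  # exp(-beta * ||X - P_i||^2)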


C2 – max pooling
- For each stored feature, keep the best (closest) match anywhere in the image
Classifier
- Classification is based on both C1 and C2 features
Robust object recognition with cortex-like mechanisms. Serre, Wolf, Bileschi, Riesenhuber and Poggio. IEEE, March 2007.
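Continuing the sketch above (names and shapes are illustrative), C2 keeps the best S2 match per stored feature over all positions and scales, producing the N-dimensional vector the classifier works on:

import numpy as np

def c2_vector(s2_maps):
    # s2_maps: one array per stored feature, holding that feature's S2 responses
    # at every position and scale; C2 keeps the global maximum for each feature.
    return np.array([m.max() for m in s2_maps])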
Summary
- 4 layers of processing
- 2 types of operations (MAX, sum)
- Output: an N-dimensional vector

Model's performance
- Testing the model
- Defining features
- Flexibility of the design

Robustness to background
- Ignoring unrelated data presented in the image
- Training and test images contain both targets and distractors
- Performed best with C2-type detectors
- Simple detection: present/absent (no location information)
- Approaches maximal performance with 1000-5000 features
- Performance improves with increased training (more examples)
Robust object recognition with cortex-like mechanisms. Serre, Wolf, Bileschi, Riesenhuber and Poggio. IEEE, March 2007.
Object-specific features or a universal dictionary
- A universal-dictionary-based system is good for small training sets (10,000 features)
- An object-specific system is better when using large training sets (it improves with practice: an increased number of features [200 per image])
Robust object recognition with cortex-like mechanisms. Serre, Wolf, Bileschi, Riesenhuber and Poggio. IEEE, March 2007.
Object recognition without clutter
- Scene understanding using a windowing strategy
- Large inter-category variability
- Training sets of only either positive (target) or negative (no target) examples
- 2 classification systems: C1 based and C2 based
- The C1-based system performs better (it is able to efficiently represent objects' boundaries)
Robust object recognition with cortex-like mechanisms. Serre, Wolf, Bileschi, Riesenhuber and Poggio. IEEE, March 2007.
Texture-based objects
- Again C1-based and C2-based classifiers
- C2 features are now evaluated only locally, not over all image locations
- C2-based classification is better (the features are more invariant and complex)
- Evaluated by correct labeling of pixels in the image
Robust object recognition with cortex-like mechanisms. Serre, Wolf, Bileschi, Riesenhuber and Poggio. IEEE, March 2007.
A unified system – looking at multiple processing levels
- The hierarchical nature of the described system enables the use of multiple levels of features
- Recognizing both shape-based and texture-based objects in the same image
- Two processing pathways
Robust object recognition with cortex-like mechanisms. Serre, Wolf, Bileschi, Riesenhuber and Poggio. IEEE, March 2007.
Scene understanding task
- Complex scene understanding requires more than just detecting objects; location information for the detected objects is also required
Shape-based objects
- C1-based classification, using a windowing approach, for both identification and localization
- Local neighborhood suppression by the maximal detected result (sketched below)
Texture-based objects
- C2-based classification
- Texture boundaries pose a problem (solved by additionally segmenting the image and averaging the responses within each segment)
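A rough sketch of the local suppression step mentioned above, written as a simple greedy non-maximum suppression; the radius and the (x, y, score) detection format are assumptions, not the paper's exact procedure.

def suppress_neighbourhood(detections, radius=32):
    # detections: list of (x, y, score) tuples. Keep the strongest detections and
    # drop weaker ones that fall within `radius` pixels of an already accepted one.
    kept = []
    for x, y, score in sorted(detections, key=lambda d: -d[2]):
        if all((x - kx) ** 2 + (y - ky) ** 2 > radius ** 2 for kx, ky, _ in kept):
            kept.append((x, y, score))
    return kept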
Model summary
- Hierarchical design
- Efficiency
- Multiple processing pathways
- Universality vs. specificity
- Limitations

Method #2
Mutch & Lowe, "Multiclass Object Recognition with Sparse, Localized Features", 2006.


Image scaling: 10 scales
S1 – Gabor filters
- Single size (11x11)
- 4 orientations
- Evaluated at every possible location and scale
Multiclass Object Recognition with Sparse, Localized Features. By Mutch & Lowe. IEEE 2006

C1 – local invariance
- MAX pooling using a 10x10 (position) x 2 (scale) filter
- Each orientation is pooled separately
- Used to define features during the learning stage
- Sampled with larger steps (subsampling)
Multiclass Object Recognition with Sparse, Localized Features. By Mutch & Lowe. IEEE 2006

S2 – intermediate features
- 4 filter sizes (4x4, 8x8, 12x12, 16x16), defined by the stored features
- A universal feature set
- The response to each filter (feature) is calculated as:
  R(X, P) = exp(-||X - P||² / (2σ²α))
Multiclass Object Recognition with Sparse, Localized Features. By Mutch & Lowe. IEEE 2006
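A sketch of that S2 response under stated assumptions: sigma defaults to 1 and alpha is taken to grow with the patch area so that different patch sizes remain comparable (the exact alpha convention below is an assumption, not quoted from the paper).

import numpy as np

def s2_response(patch, prototype, sigma=1.0):
    # Response of an S2 unit to an n x n C1 patch X given a stored prototype P.
    n = patch.shape[0]
    alpha = (n / 4.0) ** 2  # assumed patch-area normalisation
    d2 = np.sum((patch - prototype) ** 2)
    return np.exp(-d2 / (2.0 * sigma ** 2 * alpha))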

C2 – global invariance
- A vector of size d holding the maximal response (anywhere in the image) to each feature
SVM classifier
- Majority-voting based decision
Multiclass Object Recognition with Sparse, Localized Features. By Mutch & Lowe. IEEE 2006
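A minimal sketch of this final stage, assuming each image has already been reduced to a d-dimensional C2 vector; the data are random placeholders, and scikit-learn's SVC is used because it implements a one-vs-one, majority-voting multiclass decision.

import numpy as np
from sklearn.svm import SVC

d = 100
c2_train = np.random.rand(50, d)             # placeholder C2 vectors, one per training image
labels = np.random.randint(0, 5, size=50)    # placeholder class labels

clf = SVC(kernel="linear")                   # multiclass handled by one-vs-one voting
clf.fit(c2_train, labels)
predictions = clf.predict(np.random.rand(3, d))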
An overall look at all the stages:
Multiclass Object Recognition with Sparse, Localized Features. By Mutch & Lowe. IEEE 2006
Summary
- Similar assumptions
- Differences in construction

Model performance and improvements
- Testing classification
- More biologically motivated improvements

Testing classification
- 101 categories (from Caltech101)
- Training sets of 15 (or 30) images per category
- Learn random features (in both size and location), an equal number for each category
- Construct C2 vectors
- Train the SVM (the improved model also performs feature selection)
- Test stage
Results of the test:
Multiclass Object Recognition with Sparse, Localized Features. By Mutch & Lowe. IEEE 2006
To get better results, some improvements were added to the model:
- S2 encodes only the dominant orientation at each location
- Increased number of tested orientations (from 4 to 12)
- Lateral inhibition: suppressing below-threshold filter outputs in the S1 and C1 layers (sketched after this list)
- Limited S2 invariance: in order to preserve a certain amount of geometric relations, S2 features are limited to certain places in the image (relative to the center of the object)
- Select only good features for classification
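A sketch of the lateral-inhibition step from the list above; the threshold convention (a fraction h of the locally strongest orientation response) is an assumption made for illustration.

import numpy as np

def lateral_inhibition(responses, h=0.5):
    # responses: (n_orientations, H, W) filter outputs at one layer (S1 or C1).
    # Zero out any response below a fraction h of the local maximum over orientations.
    local_max = responses.max(axis=0, keepdims=True)
    return np.where(responses < h * local_max, 0.0, responses)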
Running the previous test on the improved model led to the following results:
Multiclass Object Recognition with Sparse, Localized Features. By Mutch & Lowe. IEEE 2006
Refining the model
Multiclass Object Recognition with Sparse, Localized Features. By Mutch & Lowe. IEEE 2006
Testing detection/localization
- Sliding window
- Merging overlapping detections
- Single/multiple-scale test images
Multiclass Object Recognition with Sparse, Localized Features. By Mutch & Lowe. IEEE 2006
Summary
- Efficiency
- Improvements
- Limitations

THE END
Thank you for listening!

A simple cell is an early visual neuron, meaning it responds best to a line of a specific size, orientation, and phase.
[Figure: two example receptive fields, one responding best to a 180 deg. phase and the other to a 90 deg. phase.]
[Figure: an image feeds phase-sensitive simple cells, which in turn feed a phase-insensitive complex cell.]