Transcript Slides

Object Recognition
Recognition -- topics
• Features
• Classifiers
• Example ‘winning’ system
Object Classes
Individual Recognition
Object parts
Automatic, or query-driven
[Figure: car image with labeled parts: window, mirror, door knob, headlight, bumper, back wheel, front wheel]
[Figure: class vs. non-class example images]
Variability of Airplanes Detected
[Figure: class vs. non-class airplane examples]
Features and Classifiers
Same features with different classifiers
Same classifier with different features
Generic Features: the same for all classes
• Simple (wavelets)
• Complex (Geons)
Class-specific Features: Common Building Blocks
Optimal Class Components?
• Large features are too rare
• Small features are found everywhere
Find features that carry the highest amount of information
Entropy
The entropy of a binary variable x ∈ {0, 1}:
H(x) = – ∑ p(xi) log2 p(xi)

p(x=0)   p(x=1)   H
0.5      0.5      1.00
0.1      0.9      0.47
0.01     0.99     0.08
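
A quick check of the table in plain Python (nothing assumed beyond the formula above):

    # Binary entropy H = -p*log2(p) - (1-p)*log2(1-p), reproducing the table.
    import math

    def binary_entropy(p):
        if p in (0.0, 1.0):          # p*log2(p) -> 0 in the limit
            return 0.0
        return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

    for p in (0.5, 0.1, 0.01):
        print(f"p(x=0) = {p:<5}  H = {binary_entropy(p):.2f}")
    # p(x=0) = 0.5    H = 1.00
    # p(x=0) = 0.1    H = 0.47
    # p(x=0) = 0.01   H = 0.08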
Mutual information
H(C) = – ∑ P(c) log2 P(c)
[Diagram: a binary feature F splits the class entropy into H(C) when F=1 and H(C) when F=0]
I(C;F) = H(C) – H(C|F)
Mutual Information I(C;F)
Class:   0 0 1 1 1 0 0 1
Feature: 0 0 1 0 1 0 1 1
I(C;F) = H(C) – H(C|F)
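
Working through the eight-sample example in Python (a sketch; the helper functions are illustrative, not from the slides):

    # I(C;F) = H(C) - H(C|F) for the example above.
    import math
    from collections import Counter

    C = [0, 0, 1, 1, 1, 0, 0, 1]   # class labels
    F = [0, 0, 1, 0, 1, 0, 1, 1]   # feature values

    def entropy(values):
        n = len(values)
        return -sum(k / n * math.log2(k / n) for k in Counter(values).values())

    def conditional_entropy(c, f):
        # H(C|F) = sum over v of P(F=v) * H(C given F=v)
        n = len(c)
        total = 0.0
        for v in set(f):
            sub = [ci for ci, fi in zip(c, f) if fi == v]
            total += len(sub) / n * entropy(sub)
        return total

    print(f"H(C)   = {entropy(C):.3f}")                              # 1.000
    print(f"H(C|F) = {conditional_entropy(C, F):.3f}")               # 0.811
    print(f"I(C;F) = {entropy(C) - conditional_entropy(C, F):.3f}")  # 0.189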
Optimal classification features
• Theoretically: maximizing the delivered information minimizes classification error
• In practice: informative object components can be identified in training images
Selecting Fragments
[Plot: mutual information vs. detection threshold (0.00 to 40.00) for face fragments: forehead, hairline, mouth, eye, nose, nosebridge, long_hairline, chin, twoeyes]
Horse-class features
Car-class features
Pictorial features
Learned from examples
Star model
Detected fragments ‘vote’ for the center location
Find location with maximal vote
In variations, a popular state-of-the-art scheme
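
A minimal sketch of the voting step, with made-up fragment detections; each detection carries a learned offset from fragment to object center, votes accumulate in a grid, and the peak is taken as the center (real systems bin or smooth the accumulator):

    # Star-model voting: each detected fragment votes for the object center.
    import numpy as np

    H, W = 100, 100
    votes = np.zeros((H, W))

    # Hypothetical detections: fragment position (x, y) and its learned
    # offset (dx, dy) to the object center.
    detections = [((30, 40), (10, 5)),
                  ((55, 42), (-15, 3)),
                  ((38, 60), (2, -15)),
                  ((70, 20), (5, 5))]      # a stray vote

    for (x, y), (dx, dy) in detections:
        cx, cy = x + dx, y + dy
        if 0 <= cx < W and 0 <= cy < H:
            votes[cy, cx] += 1             # accumulate the vote

    cy, cx = np.unravel_index(votes.argmax(), votes.shape)
    print(f"estimated center: ({cx}, {cy}), votes: {int(votes[cy, cx])}")
    # estimated center: (40, 45), votes: 3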
Bag of words
Bag of visual words
1. Feature detection and representation
• Regular grid (Vogel & Schiele, 2003; Fei-Fei & Perona, 2005)
• A large collection of image patches
2. Generate a dictionary using K-means clustering
Recognition by Bag of Words (BoW):
• Each class has its own histogram of visual words
• Limited or no geometry
• Simple and popular, no longer state-of-the-art (a pipeline sketch follows below)
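
A minimal bag-of-visual-words pipeline sketched with scikit-learn's KMeans; the patch extractor and images here are stand-ins (random crops of random arrays), where a real system would use a regular grid of descriptors or interest points:

    # Build a visual dictionary with K-means, then describe each image as a
    # normalized histogram over the visual words.
    import numpy as np
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(0)

    def extract_patches(image, n=50, size=8):
        h, w = image.shape
        ys = rng.integers(0, h - size, n)
        xs = rng.integers(0, w - size, n)
        return np.stack([image[y:y+size, x:x+size].ravel()
                         for y, x in zip(ys, xs)])

    images = [rng.random((64, 64)) for _ in range(10)]    # stand-in images
    all_patches = np.vstack([extract_patches(im) for im in images])

    k = 20                                                # dictionary size
    kmeans = KMeans(n_clusters=k, n_init=10).fit(all_patches)

    def bow_histogram(image):
        words = kmeans.predict(extract_patches(image))    # assign visual words
        return np.bincount(words, minlength=k) / len(words)

    print(bow_histogram(images[0]).round(2))              # the image's word histogram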
HoG Descriptor
Dalal, N. & Triggs, B., Histograms of Oriented Gradients for Human Detection
Shape context
Recognition Class II:
SVM
Example Classifiers
SVM – linear separation in feature space
The Margin
Separating line:  w ∙ x + b = 0
Far line:         w ∙ x + b = +1
Their distance:   w ∙ ∆x = +1, so |∆x| = 1/|w|
Margin (between the –1 and +1 lines):  2/|w|
Max Margin Classification
The examples are vectors xi
The labels yi are +1 for class, -1 for non-class
Maximize the margin 2/|w| subject to yi (w ∙ xi + b) ≥ 1 for all i
(Equivalently, as usually used: minimize ½|w|² under the same constraints)
How to solve such a constrained optimization?
Solving the SVM problem
• Duality
• Final form
• Efficient solution
• Extensions
Using Lagrange multipliers:
Minimize LP = ½|w|² – ∑ αi [yi (w ∙ xi + b) – 1]
with αi ≥ 0 the Lagrange multipliers
Minimizing the Lagrangian
Minimize LP: set all derivatives to 0:
∂LP/∂w = 0 gives w = ∑ αi yi xi
∂LP/∂b = 0 gives ∑ αi yi = 0
Also for the derivative w.r.t. αi.
Dual formulation: maximize the Lagrangian w.r.t. the αi under the above two conditions.
Solved in 'dual' formulation
Maximize w.r.t. αi:
LD = ∑ αi – ½ ∑i,j αi αj yi yj (xi ∙ xj)
with the conditions: αi ≥ 0 and ∑ αi yi = 0
Dual formulation
Mathematically equivalent formulation: maximize the Lagrangian with respect to the αi.
After manipulations, a concise matrix form: maximize ∑ αi – ½ αᵀHα, with Hij = yi yj (xi ∙ xj), subject to α ≥ 0 and ∑ αi yi = 0.
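
A toy illustration of the dual problem, solved with scipy's general-purpose SLSQP routine on a small made-up separable 2-D dataset (a dedicated QP solver would be used in practice):

    # Maximize L_D = sum(a) - 0.5 a^T H a  s.t.  a >= 0, sum(a_i y_i) = 0.
    import numpy as np
    from scipy.optimize import minimize

    X = np.array([[2.0, 2.0], [4.0, 4.0], [4.0, 0.0],    # class +1
                  [0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])   # class -1
    y = np.array([1, 1, 1, -1, -1, -1], dtype=float)

    H = (y[:, None] * y[None, :]) * (X @ X.T)    # H_ij = yi yj (xi . xj)

    def neg_dual(a):                             # minimize the negated dual
        return 0.5 * a @ H @ a - a.sum()

    res = minimize(neg_dual, np.zeros(len(y)),
                   bounds=[(0, None)] * len(y),
                   constraints=[{'type': 'eq', 'fun': lambda a: a @ y}])
    alpha = res.x

    w = (alpha * y) @ X                          # w = sum_i alpha_i yi xi
    sv = alpha > 1e-6                            # the support vectors
    b = np.mean(y[sv] - X[sv] @ w)               # from yi (w . xi + b) = 1
    print("alpha:", alpha.round(3), " w:", w.round(3), " b:", round(b, 3))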
Summary points
• Linear separation with the largest margin, f(x) = w ∙ x + b
• Dual formulation
• Natural extension to non-separable classes
• Extension through kernels, f(x) = ∑ αi yi K(xi, x) + b (a sketch follows below)
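
The kernel form in practice, sketched with scikit-learn's SVC; `dual_coef_` holds the products αi yi for the support vectors, so the decision function can be reconstructed exactly as in the last bullet:

    # Kernel SVM: f(x) = sum_i alpha_i yi K(xi, x) + b
    import numpy as np
    from sklearn.svm import SVC

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 2))
    y = (X[:, 0]**2 + X[:, 1]**2 > 1).astype(int)   # a ring: not linearly separable

    gamma = 1.0
    clf = SVC(kernel='rbf', gamma=gamma).fit(X, y)

    # Reconstruct f(x) from the dual form for one test point.
    x = np.array([[0.2, 0.3]])
    K = np.exp(-gamma * ((clf.support_vectors_ - x)**2).sum(axis=1))
    f = clf.dual_coef_ @ K + clf.intercept_     # sum_i (alpha_i yi) K(xi, x) + b
    print(f, clf.decision_function(x))          # the two agree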
Felzenszwalb
• Felzenszwalb, McAllester, Ramanan, CVPR 2008: A Discriminatively Trained, Multiscale, Deformable Part Model
• Many implementation details; will describe the main points.
Using patches with HoG descriptors
and classification by SVM
Person model
HoG orientations with w > 0
Object model using HoG
A bicycle and its ‘root filter’
The root filter is a patch of HoG descriptors.
The image is partitioned into 8×8-pixel cells; in each cell we compute a histogram of gradient orientations.
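
Computing such a descriptor with scikit-image's `hog` (a sketch; the 128×64 window and 9 orientation bins are the usual Dalal & Triggs settings, not anything specific to this paper):

    # HoG: 8x8-pixel cells, per-cell orientation histograms, block-normalized.
    import numpy as np
    from skimage.feature import hog

    image = np.random.rand(128, 64)        # stand-in for a detection window

    descriptor = hog(image,
                     orientations=9,            # 9 orientation bins per cell
                     pixels_per_cell=(8, 8),    # the 8x8 cells named above
                     cells_per_block=(2, 2),    # normalize over 2x2-cell blocks
                     feature_vector=True)

    print(descriptor.shape)   # (3780,): 15x7 blocks, 2x2 cells each, 9 bins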
Dealing with scale: multi-scale analysis
The filter is searched over a pyramid of HoG descriptors, to deal with unknown scale
Adding Parts
A part Pi = (Fi, vi, si, ai, bi):
Fi is the filter for the i-th part; vi is the center of a box of possible positions for part i relative to the root position; si is the size of this box.
ai and bi are two-dimensional vectors specifying the coefficients of a quadratic function scoring each possible placement of the i-th part. That is, ai and bi are two numbers each, and the penalty for a deviation ∆x, ∆y from the expected location is a1 ∆x + a2 ∆y + b1 ∆x² + b2 ∆y²
Bicycle model: root, parts, spatial map
Person model
Match Score
The full score of a potential match is:
∑ Fi ∙ Hi + ∑ (ai1 xi + ai2 yi + bi1 xi² + bi2 yi²)
Fi ∙ Hi is the appearance part.
(xi, yi) is the deviation of part pi from its expected location in the model; this is the spatial part.
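
A numpy sketch of this score with made-up filter responses and deformation coefficients (the bi are negative, so deviating from the expected location costs score):

    # DPM-style match score: appearance responses plus quadratic spatial terms.
    import numpy as np

    def match_score(appearance, deviations, a, b):
        """appearance: filter responses Fi . Hi (root first, then parts);
        deviations: per-part (xi, yi) offsets from the expected locations;
        a, b: per-part linear and quadratic deformation coefficients."""
        score = appearance.sum()
        for (xi, yi), (a1, a2), (b1, b2) in zip(deviations, a, b):
            score += a1 * xi + a2 * yi + b1 * xi**2 + b2 * yi**2
        return score

    appearance = np.array([2.1, 0.8, 1.2])    # root + two parts (made up)
    deviations = [(0, 0), (1, -2)]            # part offsets from their anchors
    a = [(0.0, 0.0), (0.0, 0.0)]
    b = [(-0.5, -0.5), (-0.5, -0.5)]
    print(match_score(appearance, deviations, a, b))   # 4.1 - 2.5 = 1.6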
Using SVM:
The score of a match can be expressed as the dot product of a vector β of coefficients with an image representation ψ:
Score = β ∙ ψ
Using the vectors ψ to train an SVM classifier:
β ∙ ψ > +1 for class examples
β ∙ ψ < –1 for non-class examples
However, ψ depends on the placement z, that is, the values of ∆xi, ∆yi.
We need to take the best ψ over all placements. In their notation:
f(x) = max over placements z of β ∙ ψ(x, z)
Classification then uses f(x) > 1.
Recognition
Search with gradient descent over the placement, including the levels in the hierarchy. Start with the root filter and find places of high score for it. For these high-scoring locations, search for the optimal placement of the parts, at a level with twice the resolution of the root filter, using gradient descent.
Essentially, maximize
∑ Fi ∙ Hi + ∑ (ai1 xi + ai2 yi + bi1 xi² + bi2 yi²)
over placements (xi, yi)
Final decision
β ∙ ψ > θ implies class
• Training: positive examples with bounding boxes around the objects, and negative examples
• Learn the root filter using SVM
• Define a fixed number of parts, at locations of high energy in the root filter HoG
• Use these to start the iterative learning
Hard Negatives
The set M of hard negatives for a known β and data set D:
these are support vectors (y ∙ f = 1) or misses (y ∙ f < 1).
Optimal SVM training does not need all the examples; hard examples are sufficient.
For a given β, use the positive examples + C hard examples.
Use this data to compute β by standard SVM.
Iterate (with a new set of C hard examples).
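
A sketch of this mining loop using scikit-learn's LinearSVC; the helper, the cap of 200 (standing in for C), and the synthetic data are all illustrative, not from the paper:

    # Hard-negative mining: retrain on positives + hard negatives, refresh, iterate.
    import numpy as np
    from sklearn.svm import LinearSVC

    def mine_hard_negatives(clf, negatives, limit):
        """Keep negatives with y*f <= 1; with y = -1 this means f(x) >= -1:
        margin violations and support vectors, hardest first."""
        f = clf.decision_function(negatives)
        hard = negatives[f >= -1]
        order = np.argsort(-f[f >= -1])       # most-violating first
        return hard[order][:limit]

    rng = np.random.default_rng(0)
    positives = rng.normal(loc=2.0, size=(100, 10))
    negatives = rng.normal(loc=0.0, size=(5000, 10))

    hard = negatives[rng.choice(len(negatives), 200, replace=False)]
    for _ in range(3):                        # a few mining rounds
        X = np.vstack([positives, hard])
        labels = np.array([1] * len(positives) + [-1] * len(hard))
        clf = LinearSVC(C=1.0).fit(X, labels)
        hard = mine_hard_negatives(clf, negatives, limit=200)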
All images contain at least 1 bike
Future challenges:
• Dealing with a very large number of classes
– ImageNet: 15,000 categories, 12 million images
• To consider: human-level performance for at least one class