Transcript Document

Computer Vision

Template matching and object recognition

Marc Pollefeys COMP 256 Some slides and illustrations from D. Forsyth, …

Computer Vision

Tentative class schedule

Jan 16/18: Introduction
Jan 23/25: Cameras / Radiometry
Jan 30/Feb 1: Sources & Shadows / Color
Feb 6/8: Linear filters & edges / Texture
Feb 13/15: Multi-View Geometry / Stereo
Feb 20/22: Optical flow / Project proposals
Feb 27/Mar 1: Affine SfM / Projective SfM
Mar 6/8: Camera Calibration / Segmentation
Mar 13/15: Spring break
Mar 20/22: Fitting / Prob. Segmentation
Mar 27/29: Silhouettes and Photoconsistency / Linear tracking
Apr 3/5: Project Update / Non-linear Tracking
Apr 10/12: Object Recognition
Apr 17/19: Range data
Apr 24/26: Final project

Computer Vision

Recognition by finding patterns

• We have seen very simple template matching (under filters)
• Some objects behave like quite simple templates
  – Frontal faces
• Strategy:
  – Find image windows
  – Correct the lighting
  – Pass them to a statistical test (a classifier) that accepts faces and rejects non-faces

Computer Vision

Basic ideas in classifiers

• Loss: some errors may be more expensive than others
  – e.g. for a fatal disease that is easily cured by a cheap medicine with no side-effects, false positives in diagnosis are better than false negatives
• We discuss two-class classification: L(1→2) is the loss caused by calling a 1 a 2
• Total risk of using classifier s (written out below)
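The slide leaves the risk expression implicit; one standard way to write it in the notation above, weighting each kind of error by its loss and by how often classifier s makes it, is

$$R(s) = \Pr\{1 \to 2 \mid \text{using } s\}\, L(1 \to 2) \;+\; \Pr\{2 \to 1 \mid \text{using } s\}\, L(2 \to 1)$$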

Computer Vision

Basic ideas in classifiers

• Generally, we should classify as 1 if the expected loss of classifying as 1 is smaller than that of classifying as 2
• This gives: choose 1 if $p(1 \mid x)\, L(1 \to 2) > p(2 \mid x)\, L(2 \to 1)$, and choose 2 if $p(1 \mid x)\, L(1 \to 2) < p(2 \mid x)\, L(2 \to 1)$
• Crucial notion: the decision boundary, i.e. the points where the loss is the same for either choice

Computer Vision Some loss may be inevitable: the minimum risk (shaded area) is called the Bayes risk 6

Computer Vision Finding a decision boundary is not the same as modelling a conditional density.

7

Computer Vision

Example: known distributions

• Assume normal class densities: p-dimensional measurements with common (known) covariance $\Sigma$ and different (known) means $\mu_k$:
$p(x \mid k) = (2\pi)^{-p/2}\, |\Sigma|^{-1/2} \exp\!\left( -\tfrac{1}{2} (x - \mu_k)^T \Sigma^{-1} (x - \mu_k) \right)$
• Class priors are $\pi_k$
• We can ignore a factor common to all classes; the posteriors are then proportional to
$p(k \mid x) \propto \pi_k \exp\!\left( -\tfrac{1}{2} (x - \mu_k)^T \Sigma^{-1} (x - \mu_k) \right)$

Computer Vision

• The classifier boils down to: choose the class $k$ that minimizes
$\delta(x, \mu_k)^2 - 2 \log \pi_k$, where $\delta(x, \mu_k) = \left( (x - \mu_k)^T \Sigma^{-1} (x - \mu_k) \right)^{1/2}$ is the Mahalanobis distance
• Because the covariance is common, this simplifies to the sign of a linear expression (i.e. a Voronoi diagram in 2D for $\Sigma = I$ and equal priors)
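As an illustration only, here is a minimal sketch of that rule in Python; the means, covariance, and priors below are made-up numbers, not values from the lecture:

```python
import numpy as np

# Made-up parameters for two classes with a shared covariance (illustration only)
mu = {1: np.array([0.0, 0.0]), 2: np.array([3.0, 1.0])}
prior = {1: 0.7, 2: 0.3}
Sigma = np.array([[2.0, 0.3], [0.3, 1.0]])
Sigma_inv = np.linalg.inv(Sigma)

def classify(x):
    """Choose the class k minimizing delta(x, mu_k)^2 - 2 log pi_k."""
    def score(k):
        d = x - mu[k]
        mahalanobis_sq = d @ Sigma_inv @ d   # squared Mahalanobis distance to the class mean
        return mahalanobis_sq - 2.0 * np.log(prior[k])
    return min(mu.keys(), key=score)

print(classify(np.array([2.5, 0.8])))  # prints 2 for these made-up parameters
```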

Computer Vision

Plug-in classifiers

• Assume that distributions have some parametric form - now estimate the parameters from the data.

• Common choices:
  – assume a normal distribution with shared covariance and different means; use the usual estimates
  – ditto, but with different covariances; again use the usual estimates
• Issue: parameter estimates that are "good" may not give optimal classifiers.

Computer Vision

Histogram based classifiers

• Use a histogram to represent the class conditional densities – (i.e. p(x|1), p(x|2), etc) • Advantage: estimates become quite good with enough data!

• Disadvantage: Histogram becomes big with high dimension – but maybe we can assume feature independence?

11

Computer Vision

Finding skin

• Skin has a very small range of (intensity-independent) colours, and little texture
  – Compute an intensity-independent colour measure, check if the colour is in this range, and check if there is little texture (median filter)
  – See this as a classifier: we can set up the tests by hand, or learn them
• Get the class-conditional densities (histograms) and the priors from data (by counting)
• The classifier then reports skin wherever the resulting posterior for skin is high enough
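A minimal sketch of that kind of histogram classifier, assuming we already have labelled skin and non-skin pixel samples; the chromaticity measure, bin count, and threshold below are arbitrary choices for illustration, not values from the paper:

```python
import numpy as np

def fit_colour_histograms(skin_pixels, nonskin_pixels, bins=32):
    """Class-conditional histograms over (r, g) chromaticities in [0, 1] (intensity-independent)."""
    rng = [[0, 1], [0, 1]]
    h_skin, _, _ = np.histogram2d(skin_pixels[:, 0], skin_pixels[:, 1], bins=bins, range=rng)
    h_non, _, _ = np.histogram2d(nonskin_pixels[:, 0], nonskin_pixels[:, 1], bins=bins, range=rng)
    p_skin = h_skin / max(h_skin.sum(), 1)          # p(colour | skin)
    p_non = h_non / max(h_non.sum(), 1)             # p(colour | not skin)
    prior_skin = len(skin_pixels) / (len(skin_pixels) + len(nonskin_pixels))  # prior by counting
    return p_skin, p_non, prior_skin

def classify_pixel(rg, p_skin, p_non, prior_skin, bins=32, theta=0.5):
    """Report skin if the posterior p(skin | colour) exceeds the threshold theta."""
    i = min(int(rg[0] * bins), bins - 1)
    j = min(int(rg[1] * bins), bins - 1)
    num = p_skin[i, j] * prior_skin
    den = num + p_non[i, j] * (1.0 - prior_skin)
    return den > 0 and num / den > theta
```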

Computer Vision 13 Figure from “Statistical color models with application to skin detection,” M.J. Jones and J. Rehg, Proc. Computer Vision and Pattern Recognition, 1999 copyright 1999, IEEE

Computer Vision Receiver Operating Curve 14 Figure from “Statistical color models with application to skin detection,” M.J. Jones and J. Rehg, Proc. Computer Vision and Pattern Recognition, 1999 copyright 1999, IEEE

Computer Vision

Finding faces

• Faces “look like” templates (at least when they’re frontal).

• General strategy:
  – Search image windows at a range of scales
  – Correct for illumination
  – Present the corrected window to a classifier
• Issues:
  – How should the window be corrected?
  – What features?
  – What classifier?
  – What about lateral views?

Computer Vision

Naive Bayes

• (Important: "naive" is not necessarily pejorative)
• Find faces by vector-quantizing image patches, then computing a histogram of patch types within a face
• The histogram doesn't work when there are too many features
  – the features are the patch types
  – assume they're independent and cross fingers (sketched below)
  – reduction in degrees of freedom
  – very effective for face finders
• Why? Probably because the examples that would present real problems aren't frequent.

Many face finders on the face detection home page http://home.t-online.de/home/Robert.Frischholz/face.htm

16
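A minimal sketch of the independence assumption behind this kind of face finder, assuming the window has already been vector-quantized into patch-type indices and per-type frequencies have been estimated from training histograms; every name and number here is illustrative:

```python
import numpy as np

def naive_bayes_log_odds(patch_types, freq_face, freq_nonface, prior_face=0.01):
    """Score a window by treating its quantized patch types as independent features.

    patch_types: array of patch-type indices observed in the window
    freq_face, freq_nonface: per-type probabilities from the training histograms
    """
    eps = 1e-9  # avoid log(0) for patch types never seen in training
    log_face = np.log(prior_face) + np.sum(np.log(freq_face[patch_types] + eps))
    log_non = np.log(1.0 - prior_face) + np.sum(np.log(freq_nonface[patch_types] + eps))
    return log_face - log_non  # > 0 means "face" is the more probable class

# Illustrative use with made-up frequencies over 4 patch types
freq_face = np.array([0.1, 0.4, 0.3, 0.2])
freq_nonface = np.array([0.4, 0.2, 0.2, 0.2])
window = np.array([1, 1, 2, 3, 1])
print(naive_bayes_log_odds(window, freq_face, freq_nonface))
```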

Computer Vision 17 Figure from A Statistical Method for 3D Object Detection Applied to Faces and Cars, H. Schneiderman and T. Kanade, Proc. Computer Vision and Pattern Recognition, 2000, copyright 2000, IEEE

Computer Vision

Face Recognition

• Whose face is this? (perhaps in a mugshot)
• Issue:
  – Which differences are important and which are not?
  – Reduce the dimension of the images, while maintaining the "important" differences.
• One strategy: principal components analysis

Computer Vision

Template matching

• Simple cross-correlation between images
• The best match wins: $\arg\max_i S_i$, where $S_i = I_i^T I$
• Computationally expensive, i.e. the presented image has to be correlated with every image in the database!
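A minimal sketch of that brute-force matcher over a small database of flattened images; normalising the vectors first is a common implementation choice, not something the slide specifies:

```python
import numpy as np

def best_match(database, query):
    """Return the index of the database image maximizing the correlation S_i = I_i^T I.

    database: (num_images, num_pixels) array of flattened images
    query:    (num_pixels,) flattened image
    """
    db = database / np.linalg.norm(database, axis=1, keepdims=True)  # optional normalisation
    q = query / np.linalg.norm(query)
    scores = db @ q   # one dot product per stored image: expensive for large databases
    return int(np.argmax(scores))
```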

Computer Vision

Eigenspace matching

• Consider PCA: $I_i \approx \bar{I} + E\, p_i$
• Then the match becomes $\arg\max_i S_i$, with
$S_i = (I_i - \bar{I})^T (I - \bar{I}) \approx p_i^T p$, where $p = E^T (I - \bar{I})$
• Much cheaper to compute!
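A minimal sketch of that speed-up, assuming the database images are flattened vectors and we keep k principal components; the variable names and the choice of k are illustrative:

```python
import numpy as np

def build_eigenspace(database, k=10):
    """Fit PCA on the database and precompute each image's coefficient vector p_i."""
    mean = database.mean(axis=0)
    centered = database - mean
    # Rows of Vt are the principal directions of the centered data
    _, _, Vt = np.linalg.svd(centered, full_matrices=False)
    E = Vt[:k].T                 # (num_pixels, k) eigenspace basis
    coeffs = centered @ E        # p_i = E^T (I_i - mean), one k-vector per stored image
    return mean, E, coeffs

def match(query, mean, E, coeffs):
    """Project the query once, then compare only k-dimensional coefficient vectors."""
    p = (query - mean) @ E
    scores = coeffs @ p          # approximates (I_i - mean)^T (query - mean)
    return int(np.argmax(scores))
```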

Computer Vision 21

Computer Vision 22

Eigenfaces

A face image is approximated as the mean face plus a linear combination of eigenfaces.

Computer Vision 23

Computer Vision

Appearance manifold approach

(Nayar et al. '96)
- for every object, sample the set of viewing conditions (e.g. a sequence of views of the object)
- use these images as feature vectors
- apply a PCA over all the images
- keep the dominant PCs
- the views of one object represent a manifold in the space of projections
- recognition: what is the nearest manifold for a given view?

Computer Vision

Object-pose manifold

• Appearance changes projected on PCs (1D pose changes) • Sufficient characterization for recognition and pose estimation 25

Computer Vision

Real-time system

(Nayar et al. ‘96) 26

Computer Vision

Difficulties with PCA

• Projection may suppress important detail
  – the directions of smallest variance are not necessarily unimportant
• The method does not take the discriminative task into account
  – typically, we wish to compute features that allow good discrimination
  – these are not the same as the directions of largest variance

Computer Vision 28

Computer Vision

Linear Discriminant Analysis

• We wish to choose linear functions of the features that allow good discrimination.

– Assume the class-conditional covariances are the same
– Want a linear feature that maximises the spread of the class means for a fixed within-class variance
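The slide does not write the criterion out; in standard notation (our notation, not a quotation), with between-class scatter $S_B$ and within-class scatter $S_W$, Fisher's linear discriminant chooses the direction $w$ maximising

$$J(w) = \frac{w^T S_B\, w}{w^T S_W\, w}$$

whose maximiser satisfies the generalised eigenvalue problem $S_B w = \lambda S_W w$; for two classes with a shared covariance this gives $w \propto S_W^{-1} (\mu_1 - \mu_2)$.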

Computer Vision 30

Computer Vision 31

Computer Vision 32

Computer Vision

Neural networks

• Linear decision boundaries are useful
  – but often not very powerful
  – we seek an easy way to get more complex boundaries
• Compose linear decision boundaries
  – i.e. have several linear classifiers, and apply a classifier to their output
  – a nuisance, because sign(ax+by+cz) etc. isn't differentiable
  – so use a smooth "squashing function" in place of sign
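A minimal sketch of that composition: two linear classifiers squashed by a sigmoid, whose outputs feed a final squashed linear unit. The weights are arbitrary and only show the shape of the computation:

```python
import numpy as np

def sigmoid(z):
    """Smooth, differentiable replacement for sign()."""
    return 1.0 / (1.0 + np.exp(-z))

def two_layer_net(x, W1, b1, w2, b2):
    """Compose linear decision boundaries: each hidden unit is a squashed linear classifier."""
    h = sigmoid(W1 @ x + b1)      # each row of W1 defines one linear boundary
    return sigmoid(w2 @ h + b2)   # a classifier applied to the hidden outputs

# Arbitrary weights for illustration: two hidden units on a 2D input
W1 = np.array([[1.0, -1.0], [0.5, 2.0]])
b1 = np.array([0.0, -1.0])
w2 = np.array([1.5, -2.0])
b2 = 0.1
print(two_layer_net(np.array([0.3, 0.7]), W1, b1, w2, b2))
```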

Computer Vision 34

Computer Vision 35

 Computer Vision

Training

• Choose parameters to minimize the error on the training set, e.g. the squared error
$E(\theta) = \tfrac{1}{2} \sum_e \left( o(x_e; \theta) - y_e \right)^2$
• Stochastic gradient descent, computing the gradient using a trick (backpropagation, a.k.a. the chain rule)
• Stop when the error is low and hasn't changed much
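A minimal sketch of that training loop, using a finite-difference gradient instead of backpropagation to keep it short; the learning rate, epoch count, and interface are arbitrary choices:

```python
import numpy as np

def numerical_grad(loss, theta, eps=1e-5):
    """Finite-difference gradient; backpropagation computes the same thing, only faster."""
    g = np.zeros_like(theta)
    for i in range(theta.size):
        d = np.zeros_like(theta)
        d[i] = eps
        g[i] = (loss(theta + d) - loss(theta - d)) / (2.0 * eps)
    return g

def sgd(loss_on_example, theta, examples, lr=0.1, epochs=100):
    """Stochastic gradient descent: one small step per training example, in random order."""
    for _ in range(epochs):
        for e in np.random.permutation(len(examples)):
            grad = numerical_grad(lambda t: loss_on_example(t, examples[e]), theta)
            theta = theta - lr * grad
    return theta
```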

Computer Vision 37 The vertical face-finding part of Rowley, Baluja and Kanade’s system Figure from “Rotation invariant neural-network based face detection,” H.A. Rowley, S. Baluja and T. Kanade, Proc. Computer Vision and Pattern Recognition, 1998, copyright 1998, IEEE

Computer Vision Histogram equalisation gives an approximate fix for illumination induced variability 38

Computer Vision 39 Architecture of the complete system: they use another neural net to estimate orientation of the face, then rectify it. They search over scales to find bigger/smaller faces.

Figure from “Rotation invariant neural-network based face detection,” H.A. Rowley, S. Baluja and T. Kanade, Proc. Computer Vision and Pattern Recognition, 1998, copyright 1998, IEEE

Computer Vision 40 Figure from “Rotation invariant neural-network based face detection,” H.A. Rowley, S. Baluja and T. Kanade, Proc. Computer Vision and Pattern Recognition, 1998, copyright 1998, IEEE

Computer Vision

Convolutional neural networks

• Template matching using NN classifiers seems to work
• Natural features are filter outputs
  – probably spots and bars, as in texture
  – but why not learn the filter kernels, too?
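A minimal sketch of that filter / subsample / filter / subsample / classify structure, written in PyTorch; the layer sizes follow the classic 32x32-input LeNet-style layout, but treat the exact numbers as illustrative rather than the architecture from the paper:

```python
import torch
import torch.nn as nn

class SmallConvNet(nn.Module):
    """Filter, subsample, filter, subsample, then classify from the resulting features."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 6, kernel_size=5),    # first bank of learned filter kernels
            nn.Tanh(),
            nn.AvgPool2d(2),                   # subsample
            nn.Conv2d(6, 16, kernel_size=5),   # second bank of learned filters
            nn.Tanh(),
            nn.AvgPool2d(2),                   # subsample again
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(16 * 5 * 5, 120), nn.Tanh(),
            nn.Linear(120, 84), nn.Tanh(),
            nn.Linear(84, num_classes),        # classify from the filtered, subsampled features
        )

    def forward(self, x):
        return self.classifier(self.features(x))

out = SmallConvNet()(torch.zeros(1, 1, 32, 32))   # one 32x32 grayscale input
print(out.shape)                                  # torch.Size([1, 10])
```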

Computer Vision 42 A convolutional neural network, LeNet; the layers filter, subsample, filter, subsample, and finally classify based on outputs of this process.

Figure from “Gradient-Based Learning Applied to Document Recognition”, Y. Lecun et al Proc. IEEE, 1998 copyright 1998, IEEE

Computer Vision 43 LeNet is used to classify handwritten digits. Notice that the test error rate is not the same as the training error rate, because the test set consists of items not in the training set. Not all classification schemes necessarily have small test error when they have small training error.

Figure from “Gradient-Based Learning Applied to Document Recognition”, Y. Lecun et al Proc. IEEE, 1998 copyright 1998, IEEE

Computer Vision

Support Vector Machines

• Neural nets try to build a model of the posterior, p(k|x) • Instead, try to obtain the decision boundary directly – potentially easier, because we need to encode only the geometry of the boundary, not any irrelevant wiggles in the posterior.

– Not all points affect the decision boundary 44

Computer Vision

Support Vector Machines

• A set $S$ of points $x_i \in R^n$, each of which belongs to one of two classes $y_i \in \{-1, +1\}$
• The goal is to find a hyperplane that divides $S$ into these two classes
• $S$ is separable if $\exists\, w \in R^n,\ b \in R$ such that $y_i (w \cdot x_i + b) \ge 1$ for $i = 1, \ldots, N$
• Separating hyperplane: $w \cdot x + b = 0$
• Distance of $x_i$ to the hyperplane: $d_i = \dfrac{w \cdot x_i + b}{\|w\|}$, so the constraint means $y_i d_i \ge \dfrac{1}{\|w\|}$, with equality for the closest points

Computer Vision

Support Vector Machines

• The optimal separating hyperplane (OSH) maximizes the margin: the closest points of each class (the support vectors) lie at distance $1/\|w\|$ from it, so the two classes are separated by $2/\|w\|$

Problem 1: Minimize $\ \tfrac{1}{2}\, w \cdot w$
Subject to $\ y_i (w \cdot x_i + b) \ge 1, \quad i = 1, 2, \ldots, N$

Computer Vision

Solve using Lagrange multipliers

• Lagrangian:
$L(w, b, \alpha) = \tfrac{1}{2}\, w \cdot w \;-\; \sum_{i=1}^{N} \alpha_i \left[ y_i (w \cdot x_i + b) - 1 \right]$
• At the solution:
$\dfrac{\partial L}{\partial b} = -\sum_{i=1}^{N} \alpha_i y_i = 0 \qquad\qquad \dfrac{\partial L}{\partial w} = w - \sum_{i=1}^{N} \alpha_i y_i x_i = 0$
• Therefore:
$L = \sum_{i=1}^{N} \alpha_i \;-\; \tfrac{1}{2} \sum_{i,j=1}^{N} \alpha_i \alpha_j y_i y_j\, (x_i \cdot x_j), \qquad \alpha_i \ge 0$

Computer Vision

Dual problem

Problem 2: Minimize $\ \tfrac{1}{2}\, \alpha^T D\, \alpha \;-\; \sum_{i=1}^{N} \alpha_i$
Subject to $\ \sum_{i=1}^{N} y_i \alpha_i = 0, \quad \alpha_i \ge 0$
where $D_{ij} = y_i y_j\, (x_i \cdot x_j)$

$w = \sum_{i=1}^{N} \alpha_i y_i x_i$ ($\alpha_i > 0$ only for the support vectors)

Kuhn-Tucker condition: $\alpha_i \left[ y_i (w \cdot x_i + b) - 1 \right] = 0$

$b = y_j - w \cdot x_j$ (for $x_j$ a support vector)

Computer Vision

Linearly non-separable cases

• Find a trade-off between maximum separation and misclassifications

Problem 3: Minimize $\ \tfrac{1}{2}\, w \cdot w \;+\; C \sum_{i} \xi_i$
Subject to $\ y_i (w \cdot x_i + b) \ge 1 - \xi_i, \quad \xi_i \ge 0, \quad i = 1, 2, \ldots, N$

Computer Vision 50

Dual problem for non separable cases

Problem 4: Minimize $\ \tfrac{1}{2}\, \alpha^T D\, \alpha \;-\; \sum_{i=1}^{N} \alpha_i$
Subject to $\ \sum_{i=1}^{N} y_i \alpha_i = 0, \quad 0 \le \alpha_i \le C$
where $D_{ij} = y_i y_j\, (x_i \cdot x_j)$

Kuhn-Tucker conditions: $\alpha_i \left[ y_i (w \cdot x_i + b) - 1 + \xi_i \right] = 0$ and $(C - \alpha_i)\, \xi_i = 0$

Support vectors ($\alpha_i > 0$):
– $0 < \alpha_i < C$: margin vectors ($\xi_i = 0$)
– $\alpha_i = C$: errors; $\xi_i > 1$ means misclassified, $0 < \xi_i \le 1$ means too close to the OSH

Computer Vision

Decision function

• Once $w$ and $b$ have been computed, the classification decision for an input $x$ is given by
$f(x) = \operatorname{sign}(w \cdot x + b)$
• Note that the globally optimal solution can always be obtained (convex problem)
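A minimal sketch of that decision function written in terms of the dual variables, which is the form that carries over to kernels; the support vectors, labels, multipliers, and bias below are placeholders standing in for the output of a trained solver:

```python
import numpy as np

def svm_decision(x, support_vectors, labels, alphas, b, kernel=np.dot):
    """f(x) = sign( sum_i alpha_i * y_i * K(x_i, x) + b ), summing over support vectors only."""
    s = sum(a * y * kernel(sv, x) for a, y, sv in zip(alphas, labels, support_vectors))
    return int(np.sign(s + b))

# Placeholder values standing in for a trained SVM
support_vectors = [np.array([1.0, 1.0]), np.array([-1.0, -1.0])]
labels = [+1, -1]
alphas = [0.5, 0.5]
b = 0.0
print(svm_decision(np.array([2.0, 0.5]), support_vectors, labels, alphas, b))  # prints 1
```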

Computer Vision

Non-linear SVMs

• Non-linear separation surfaces can be obtained by non-linearly mapping the data to a high-dimensional space and then applying the linear SVM technique
• Note that the data only appears through dot products
• The need for dot products in the high-dimensional space can be avoided by using Mercer kernels:
$K(x_i, x_j) = \Phi(x_i) \cdot \Phi(x_j)$
e.g.
$K(x, y) = (x \cdot y + 1)^p$ (polynomial kernel)
$K(x, y) = \exp\!\left( -\|x - y\|^2 / 2\sigma^2 \right)$ (Gaussian radial basis function)
$K(x, y) = \tanh(\kappa\, x \cdot y - \delta)$ (sigmoidal function)
For example, in 2D: $(x \cdot y)^2 = (x_1 y_1 + x_2 y_2)^2 = x_1^2 y_1^2 + 2 x_1 x_2 y_1 y_2 + x_2^2 y_2^2 = \Phi(x) \cdot \Phi(y)$ with $\Phi(x) = (x_1^2,\ \sqrt{2}\, x_1 x_2,\ x_2^2)$

Computer Vision

$(x^2,\ xy,\ y^2,\ x,\ y) \;\longrightarrow\; (u_0,\ u_1,\ u_2,\ u_3,\ u_4)$

A space in which the decision boundary is linear: a conic in the original space has the form
$a u_0 + b u_1 + c u_2 + d u_3 + e u_4 + f = 0$

Computer Vision

SVMs for 3D object recognition

(Pontil & Verri, PAMI '98)
• Consider images as vectors
• Compute the pairwise OSH using a linear SVM
• The support vectors are representative views of the considered object (relative to the other object)
• Tournament-like classification:
  – competing classes are grouped in pairs
  – classes that are not selected are discarded
  – until only one class is left
• Complexity is linear in the number of classes
• No pose estimation

Computer Vision

Vision applications

• A reliable, simple classifier
  – use it wherever you need a classifier
• Commonly used for face finding
• Pedestrian finding
  – many pedestrians look like lollipops (hands at sides, torso wider than legs) most of the time
  – classify image regions, searching over scales
  – but what are the features?
  – compute wavelet coefficients for pedestrian windows and average over pedestrians; coefficients whose average is clearly different from zero are probably strongly associated with pedestrians (see the sketch below)
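A minimal sketch of that feature-selection idea, using crude 2x2 Haar-style responses as stand-in wavelet coefficients; the window layout, responses, and threshold are arbitrary and are not the wavelet set used in the paper:

```python
import numpy as np

def haar_responses(window):
    """Simple 2x2 Haar-style responses (left-right and top-bottom differences).

    Assumes all windows have even height and width and share the same shape."""
    h = window[:, 1::2] - window[:, 0::2]   # horizontal (left-right) differences
    v = window[1::2, :] - window[0::2, :]   # vertical (top-bottom) differences
    return np.concatenate([h.ravel(), v.ravel()])

def informative_coefficients(pedestrian_windows, threshold=0.1):
    """Average each coefficient over many pedestrian windows; keep those far from zero."""
    responses = np.stack([haar_responses(w) for w in pedestrian_windows])
    mean_response = responses.mean(axis=0)
    return np.where(np.abs(mean_response) > threshold)[0]   # indices tied to pedestrians
```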

Computer Vision 56 Figure from, “A general framework for object detection,” by C. Papageorgiou, M. Oren and T. Poggio, Proc. Int. Conf. Computer Vision, 1998, copyright 1998, IEEE

Computer Vision 57 Figure from, “A general framework for object detection,” by C. Papageorgiou, M. Oren and T. Poggio, Proc. Int. Conf. Computer Vision, 1998, copyright 1998, IEEE

Computer Vision 58 Figure from, “A general framework for object detection,” by C. Papageorgiou, M. Oren and T. Poggio, Proc. Int. Conf. Computer Vision, 1998, copyright 1998, IEEE

Computer Vision

Latest results on pedestrian detection: Viola, Jones and Snow's paper (ICCV '03, Marr prize)
• Combine static and dynamic features
• Cascade for efficiency (4 frames/s)
• 5 best features out of 55k (AdaBoost)
• 5 best static features out of 28k (AdaBoost)
• Some positive examples used for training

Computer Vision

Dynamic detection

False detections: typically 1 in 400,000 (roughly 1 every 2 frames for 360x240 video)

Computer Vision

Static detection

61

Computer Vision

Next class: Object recognition

Reading: Chapter 23