
Object Recognition

- Instance Recognition
  - Known, rigid object
  - Only variation is from relative position & orientation (and camera parameters)
  - “Cluttered image” = possibility of occlusion, irrelevant features
- Generic (category-level) Recognition
  - Any object in a class (e.g. chair, cat)
  - Much harder – requires a ‘language’ to describe the classes of objects

Ellen L. Walker

Instance Recognition & Pose Determination

- Instance recognition: given an image, what object(s) exist in the image?
  - Assuming we have geometric features (e.g. sets of control points) for each object
  - Assuming we have a method to extract the model features from images
- Pose determination (sometimes simultaneous): given an object extracted from an image and its model, find the geometric transformation between the image and the model
  - This requires a mapping between extracted features and model features


Instance Recognition

- Build a database of objects of interest
  - Features
  - Reference images
- Extract features (or isolate the relevant portion) from the scene
- Determine the object and its pose
  - Object(s) that best match features in the image
  - Transformation between the ‘standard’ pose in the database and the pose in the image
  - Rigid translation & rotation OR affine transform


What Kinds of Features?

- Lines
- Contours
- 3D surfaces
- Viewpoint-invariant 2D features (e.g. SIFT)
- Features extracted by machine learning (e.g. principal component features)


Geometric Alignment

- OFFLINE (we don’t care how slow this is!)
  - Extract interest points from each database image (of the isolated object)
  - Store the resulting information (features and original locations) in an indexing structure (e.g. a search tree)
- ONLINE (processing time matters; see the sketch below)
  - Extract features from the new image
  - Compare to the database features
  - Verify the consistency of each group of N (e.g. 3) features found from the same image
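
A minimal sketch of this offline/online split in Python, assuming descriptors are fixed-length numpy vectors (e.g. 128-D SIFT). The database layout, the `add_object` helper, and the 0.7 distance-ratio test are illustrative choices, not part of the slides:

```python
# Offline: index descriptors from all database images in a k-d tree.
# Online: match query descriptors against the tree.
import numpy as np
from scipy.spatial import cKDTree

db_descriptors = []   # N x 128 descriptor vectors (e.g. SIFT)
db_info = []          # (object_id, original_location) per descriptor

def add_object(object_id, descriptors, locations):
    for d, loc in zip(descriptors, locations):
        db_descriptors.append(d)
        db_info.append((object_id, loc))

def build_index():
    # OFFLINE: slow is fine here
    return cKDTree(np.asarray(db_descriptors))

def match_features(tree, query_descriptors, ratio=0.7):
    """ONLINE: return (query_index, db_index) pairs passing a distance-ratio test."""
    dists, idxs = tree.query(np.asarray(query_descriptors), k=2)
    matches = []
    for qi, (d, i) in enumerate(zip(dists, idxs)):
        if d[0] < ratio * d[1]:   # nearest neighbor much closer than second-nearest
            matches.append((qi, i[0]))
    return matches
```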


Hough Transform for Verification

- Each minimal set of matches votes for a transformation
- Example: SIFT features (location, scale, orientation)
- Each Hough cell represents:
  - Object center’s location (x, y)
  - Scale (s)
  - Planar (in-image) rotation (θ)
- Each individual feature votes for the closest 16 bins (2 in each dimension) to its own (x, y, s, θ)
- Every peak in the histogram is considered for a possible match
- The entire object’s set of features is transformed and checked in the image. If enough are found, it’s a match.
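
A rough sketch of the 4-D voting scheme, assuming each feature match has already been converted into a predicted object pose (x, y, log-scale, rotation). The bin widths and accumulator layout are hypothetical:

```python
# 4-D Hough voting over (x, y, scale, rotation): each pose votes for the
# 2 nearest bins in each dimension, i.e. 2^4 = 16 bins in total.
import itertools
from collections import defaultdict
from math import floor

BIN = (16.0, 16.0, 0.5, 0.5)   # bin widths: x, y, log2(scale), rotation (rad)

def vote(accumulator, pose):
    """Vote for the 16 bins whose centers straddle this pose.
    (Rotation wraparound is ignored for brevity.)"""
    for corner in itertools.product((0, 1), repeat=4):
        key = tuple(floor(p / b - 0.5) + c    # the two bins nearest to p
                    for p, b, c in zip(pose, BIN, corner))
        accumulator[key] += 1

accumulator = defaultdict(int)
# for each feature match:  vote(accumulator, predicted_pose)
# then: peaks = [k for k, v in accumulator.items() if v >= MIN_VOTES]
# and each peak is verified by transforming the whole object into the image.
```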


Issues of Hough-Based Alignment

- Too many correspondences
  - Imagine 10 points from the image, 5 points in the model
  - If all pairs of pairs are considered, we have 45 * 10 = 450 correspondences to consider!
  - In general, N image points and M model points yield (N choose 2) * (M choose 2) = (N*(N-1)*M*(M-1))/4 correspondences to consider!
  - Can we limit the pairs we consider?
- Accidental peaks
  - Just like the regular Hough transform, some peaks can be "conspiracies of coincidences"
  - Therefore, we must verify all "reasonably large" peaks


Parameters of Hough-based Alignment

- How coarse (big) are the Hough space bins?
  - If too coarse, unrelated features will “conspire” to form a peak
  - If too fine, matching features will spread out and the peak will be lost
  - The finer the binning, the more time & space it takes
  - Multiple votes per feature provides a compromise
- How many features are needed to cast a vote?
  - The minimum needed to determine the necessary bin?
  - Requiring more features cuts down time, but might lose good information


More Parameters

- What is the minimum number of votes to accept an alignment?
- What is the maximum total error for success (or, what is the minimum number of points and the maximum error per point)?


Alignment by Optimization

- We need to use the features to find the transformation that best fits them
- Least squares optimization (see 6.1.1 for details; see the sketch after this list):

      E = Σᵢ ||rᵢ||² = Σᵢ ||f(xᵢ; p) − x′ᵢ||²

  - xᵢ is the feature vector from the database
  - f is the transformation
  - p is the set of parameters of the transformation
  - x′ᵢ is the corresponding feature from the image
- Iterative and robust methods are also discussed in 6.1
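
A minimal least-squares sketch, assuming f is an affine transform so the problem is linear; `numpy.linalg.lstsq` then minimizes the E above directly:

```python
# Least-squares alignment for an affine f(x; p) = A x + t, minimizing
# E = sum_i ||f(x_i; p) - x'_i||^2 over all correspondences.
import numpy as np

def fit_affine(x, x_prime):
    """x, x_prime: N x 2 arrays of corresponding points (N >= 3)."""
    n = len(x)
    M = np.hstack([x, np.ones((n, 1))])      # each row: [xi, yi, 1]
    # Solve M @ P = x_prime for P (3 x 2) in the least-squares sense
    P, _, _, _ = np.linalg.lstsq(M, x_prime, rcond=None)
    A, t = P[:2].T, P[2]                     # 2x2 linear part, translation vector
    return A, t

def residual_error(A, t, x, x_prime):
    return np.sum((x @ A.T + t - x_prime) ** 2)   # this is E above
```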


Variations on Least Squares

- Weighted Least Squares
  - In the error equations, weight each point by the reciprocal of its variance (an estimate of the uncertainty in the point’s location)
  - The less sure the location, the lower the weight
- Iterative Methods (search) – see Optimization slides

- RANSAC (Random Sample Consensus)
  - Choose k correspondences and compute a transformation
  - Apply the transformation to all correspondences; count the inliers
  - Repeat many times. The result is the transformation that yields the most inliers.
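
A hedged RANSAC sketch; `fit_model` and `apply_model` stand in for whatever transformation is being estimated (e.g. the affine fit above), and the threshold and iteration count are tuning parameters:

```python
# RANSAC: repeatedly fit a model to a minimal random sample and keep the
# model supported by the most inliers.
import random
import numpy as np

def ransac(correspondences, k, fit_model, apply_model, thresh, iters=1000):
    """correspondences: list of (x, x_prime) pairs; k: minimal sample size."""
    best_model, best_inliers = None, []
    for _ in range(iters):
        sample = random.sample(correspondences, k)   # minimal random subset
        model = fit_model(sample)
        inliers = [(x, xp) for x, xp in correspondences
                   if np.linalg.norm(apply_model(model, x) - xp) < thresh]
        if len(inliers) > len(best_inliers):
            best_model, best_inliers = model, inliers
    # A common refinement: refit the model to all inliers before returning.
    return best_model, best_inliers
```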


Geometric Transformations (review)

- In general, a geometric transformation is any operation on points that yields points
- Linear transformations can be represented by matrix multiplication of homogeneous coordinates:

      [ t11  t12  t13 ]   [ x ]   [ x' ]
      [ t21  t22  t23 ] × [ y ] = [ y' ]
      [ t31  t32  t33 ]   [ 1 ]   [ s' ]

- The resulting point is (x'/s', y'/s')


Example transformations

- Translation
  - Set the diagonal to 1, the right column to the new location, all else 0
  - The effect is to add (dx, dy) to (x, y)
- Rotation
  - Set the upper four elements to cos(θ), −sin(θ), sin(θ), cos(θ); set the last diagonal element to 1, all else 0
- Scale
  - Set the diagonal to 1 and the lower right to 1 / (scale factor)
  - OR set the diagonal to the scale factor, except the lower right to 1
- Projective transform
  - Any arbitrary 3x3 matrix!
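
A small sketch of these matrices as numpy arrays, including the division by s' from the previous slide; the helper names are mine:

```python
# Homogeneous 3x3 transformation matrices for translation, rotation, scale.
import numpy as np

def translation(dx, dy):
    return np.array([[1, 0, dx], [0, 1, dy], [0, 0, 1]], float)

def rotation(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]], float)

def scale(factor):
    # Equivalently: diagonal 1 with lower-right 1/factor (same up to scale).
    return np.array([[factor, 0, 0], [0, factor, 0], [0, 0, 1]], float)

def apply(T, x, y):
    xp, yp, sp = T @ np.array([x, y, 1.0])
    return xp / sp, yp / sp        # divide through by s'
```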


Combining Transformations

- Rotation by θ about an arbitrary point (xc, yc):

  1. Translate so that the arbitrary point becomes the origin:

             [ 1  0  -xc ]
     Temp1 = [ 0  1  -yc ] × P
             [ 0  0   1  ]

  2. Rotate by θ:

             [ cos θ  -sin θ  0 ]
     Temp2 = [ sin θ   cos θ  0 ] × Temp1
             [   0       0    1 ]

  3. Translate back to the original coordinates:

             [ 1  0  xc ]
     Temp3 = [ 0  1  yc ] × Temp2
             [ 0  0   1 ]


More generally

- If T1, T2, T3 are a series of matrices representing transformations, then T3 × T2 × T1 × P performs T1, T2, then T3 on P
- Order matters!
- You can precompute a single transformation matrix as T = T3 × T2 × T1; then P' = T × P is the transformed point (see the sketch below)
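
A sketch of the composition, using the rotation-about-a-point recipe from the previous slide; the assert simply checks that the center is a fixed point of the combined transform:

```python
# Composing transformations: matrices multiply right-to-left, so the
# first operation performed is the rightmost matrix.
import numpy as np

def rotate_about(theta, xc, yc):
    c, s = np.cos(theta), np.sin(theta)
    T1 = np.array([[1, 0, -xc], [0, 1, -yc], [0, 0, 1]], float)  # to origin
    R  = np.array([[c, -s, 0], [s,  c, 0], [0, 0, 1]], float)    # rotate
    T3 = np.array([[1, 0,  xc], [0, 1,  yc], [0, 0, 1]], float)  # back
    return T3 @ R @ T1           # precomputed T; then P' = T @ P

# The rotation center maps to itself:
T = rotate_about(np.pi / 3, 5.0, 2.0)
assert np.allclose(T @ [5.0, 2.0, 1.0], [5.0, 2.0, 1.0])
```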


Transformations and Invariants

- Invariants are properties that are preserved through transformations
  - The angle between two vectors is invariant to translation, scaling, and rotation (or any combination thereof)
  - The distance between two points is invariant to translation and rotation (or any combination thereof)
- Angle- and distance-preserving transformations are called rigid transformations
  - These are the only transformations that can logically be performed on non-deformable physical objects


Geometric Invariants

- Given: a known shape and a known transformation
- Use: a measure that is invariant over the transformation
  - The value is measurable and constant over all transformed shapes
- Examples:
  - Euclidean distance: invariant under translation & rotation
  - Angle between line segments: translation, rotation, scale
  - Cross-ratio: projective transformations (including perspective)
- Note: invariants are good for locating objects, but give no transformation information for the transformations they are invariant to!


Cross Ratio: Invariant of Projection

- Consider four rays “cut” by two lines, meeting them in collinear points A, B, C, D
- I = (A−C)(B−D) / ((A−D)(B−C))

[Figure: four rays through a point, crossed by two lines; each line meets the rays at points labeled A, B, C, D]
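
A quick sketch of the cross ratio, treating A–D as scalar positions along the line, with a numeric check that a (hypothetical) 1-D projective map leaves it unchanged:

```python
# Cross ratio of four collinear points, given as signed positions along
# the line (e.g. distances from a reference point on the line).
def cross_ratio(a, b, c, d):
    return ((a - c) * (b - d)) / ((a - d) * (b - c))

# Invariance check under a 1-D projective map t -> (p t + q) / (r t + s):
def proj(t, p=2.0, q=1.0, r=0.5, s=3.0):
    return (p * t + q) / (r * t + s)

pts = [0.0, 1.0, 2.5, 4.0]
assert abs(cross_ratio(*pts) - cross_ratio(*map(proj, pts))) < 1e-9
```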


Cross Ratio Examples

- Two images of one object yield 2 matching cross ratios!
- Dual of the cross ratio: four lines through a point instead of four points on a line
- Any five non-collinear but coplanar points yield two cross-ratios (from sets of 4 lines)


Using Invariants for Recognition

- Measure the invariant in one image (or on the object)
- Find all possible instances of the invariant (e.g. all sets of 4 collinear points) in the (other) image
- If any instance of the invariant matches the measured one, then you (might) have found the object
- Research question: to what extent are invariants useful in noisy images?


Calibration Problem (Alignment to World Coordinates)

- Given:
  - A set of control points
    - Known locations in "standard orientation"
    - Known distances in world units, e.g. mm
    - "Easy" to find in images
  - An image including all control points
- Find:
  - The rigid transformation from "standard orientation" and world units to image orientation and pixel units
  - This transformation is a 3x3 matrix


Calibration Solution

- The transformation from image to world can be represented as a rotation followed by a scale, then a translation:

      P_world = T × S × R × P_image

- This provides 2 equations per point:

      x_world = x_image·s·cos(θ) − y_image·s·sin(θ) + dx
      y_world = x_image·s·sin(θ) + y_image·s·cos(θ) + dy

- Because we have 4 unknowns (s, θ, dx, dy), we can solve the equations given 2 points (4 values)
- But the relationship between sin(θ) and cos(θ) is nonlinear


Getting Rotation Directly

- Find the direction of the segment (P1, P2) in the image
  - Remember tan(θ) = (y2 − y1) / (x2 − x1)
- Subtract the direction found from the (known) direction of the segment in "standard position"
  - This is θ, the rotation in the image
- Fill in sin(θ) and cos(θ); now the equations are linear and the usual tools can be used to solve them (see the sketch below)
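
A sketch of this two-point calibration, assuming points are given as (x, y) rows; θ comes from the difference of segment directions, after which the scale and translation fall out of a linear solve:

```python
# Two-point calibration: recover theta from segment directions, then
# solve the now-linear equations for the scale and translation.
import numpy as np

def calibrate(img_pts, world_pts):
    """img_pts, world_pts: 2x2 arrays, one control point per row."""
    (x1, y1), (x2, y2) = img_pts
    (u1, v1), (u2, v2) = world_pts
    # Rotation: direction of the segment in world minus direction in image
    theta = np.arctan2(v2 - v1, u2 - u1) - np.arctan2(y2 - y1, x2 - x1)
    c, s = np.cos(theta), np.sin(theta)
    # With theta fixed, u = k(c x - s y) + dx, v = k(s x + c y) + dy
    # is linear in the scale k and translation (dx, dy):
    A = np.array([[c * x1 - s * y1, 1, 0],
                  [s * x1 + c * y1, 0, 1],
                  [c * x2 - s * y2, 1, 0],
                  [s * x2 + c * y2, 0, 1]], float)
    b = np.array([u1, v1, u2, v2], float)
    k, dx, dy = np.linalg.lstsq(A, b, rcond=None)[0]
    return k, theta, dx, dy
```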


Non-Rigid Transformations

- An affine transformation has 6 independent parameters
  - The last row of the matrix is fixed at [0 0 1]
  - We ignore an arbitrary scale factor that can be applied
  - Allows shear (diagonal stretching of the x and/or y axis)
  - At least 3 control points are needed to find the transform (3 points = 6 values)
- A projective transformation has 8 independent parameters
  - Fix the lower-right corner (the overall scale) at 1
  - Ignore the arbitrary scale factor that can be applied
  - Requires at least 4 control points (8 values)
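
A sketch of recovering the 8 projective parameters from 4 control points by fixing the lower-right matrix entry at 1; for the affine case, the same idea needs only 3 points and 6 unknowns:

```python
# Solve for a projective (3x3) transform from 4 point correspondences.
# Each correspondence (x, y) -> (u, v) gives two linear equations once
# the lower-right entry is fixed at 1.
import numpy as np

def projective_from_points(src, dst):
    """src, dst: 4x2 arrays of corresponding points."""
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y]); b.append(u)
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y]); b.append(v)
    h = np.linalg.solve(np.array(A, float), np.array(b, float))
    return np.append(h, 1.0).reshape(3, 3)    # 8 parameters + the fixed 1
```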


Image Warping

- Given an affine transformation (any 3x3 transformation)
- Given an image with 3 control points specified (the origin and two axis extrema)
- Create a new image that maps the 3 control points to 3 corners of a pixel-aligned square
- Technique:
  - The control points define an affine matrix
  - For each point in the new image, apply the transformation to find a point in the old image; copy its pixel value to the new image
  - If the point is outside the borders of the old image, use a default pixel value, e.g. black
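
A direct (and deliberately slow) sketch of the technique for a grayscale image, assuming `T_dst_to_src` maps new-image coordinates back into the old image:

```python
# Inverse-mapping warp: walk over destination pixels, transform back into
# the source image, and copy the pixel (default/black if out of bounds).
import numpy as np

def warp(src, T_dst_to_src, out_shape, fill=0):
    h, w = out_shape
    dst = np.full((h, w), fill, dtype=src.dtype)
    for yd in range(h):
        for xd in range(w):
            xs, ys, ss = T_dst_to_src @ (xd, yd, 1.0)
            xi, yi = int(round(xs / ss)), int(round(ys / ss))  # nearest pixel
            if 0 <= yi < src.shape[0] and 0 <= xi < src.shape[1]:
                dst[yd, xd] = src[yi, xi]
    return dst
```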


Which feature is which? (Finding correspondences)

- Direct measurements can rule out some correspondences
  - Round hole vs. square hole
  - Big hole vs. small hole (relative to some other measurable distance)
  - Red dot vs. green dot
- Invariant relationships between features can rule out others
  - Distance between 2 points (relative…)
  - Angle between segments defined by 3 points
- Correspondences that cannot be ruled out must be considered (too many correspondences?)


Structural Matching

- Recast the problem as "consistent labeling"
- A consistent labeling is an assignment of labels to parts that satisfies:
  - If Pi and Pj are related parts, then their labels f(Pi), f(Pj) are related in the same way
- Example: if two segments are connected at a vertex in the model, then the respective matching segments in the image must also be connected at a vertex


Interpretation Tree

[Figure: a tree rooted at (empty), branching to A=a, A=b, A=c; beneath each A-choice, branches for the remaining labels: B=b, B=c under A=a; B=a, B=c under A=b; B=a, B=b under A=c]

- Each branch is a choice of feature-label match
- Cut off a branch (and all its children) if a constraint is violated


Constraints on Correspondences (review)

- Unary constraints are direct measurements
  - Round hole vs. square hole
  - Big hole vs. small hole (relative to some other measurable distance)
  - Red dot vs. green dot
- Binary constraints are measurements between 2 features
  - Distance between 2 points (relative…)
  - Angle between segments defined by 3 points
- Higher-order constraints might measure relationships among 3 or more features


Searching the Interpretation Tree

- Depth-first search (recursive backtracking; see the sketch below)
  - Straightforward, but could be time-consuming
- Heuristic (e.g. best-first) search
  - Requires good guesses as to which branch to expand next
  - (Specifics are covered in Artificial Intelligence)
- Parallel relaxation
  - Each node gets all labels
  - Every constraint removes inconsistent labels
  - (Review the neural net slides for details)
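
A sketch of the depth-first option, assuming a user-supplied `consistent` predicate that encodes the binary constraints; it returns the first consistent labeling found, or None:

```python
# Recursive backtracking over the interpretation tree: assign a label to
# each part in turn, pruning a branch as soon as a constraint is violated.
def consistent_labeling(parts, labels, consistent, assignment=None):
    """consistent(p1, l1, p2, l2) checks one binary constraint."""
    assignment = assignment or {}
    if len(assignment) == len(parts):
        return assignment                        # all parts labeled
    part = parts[len(assignment)]
    for label in labels:
        if all(consistent(part, label, p, l) for p, l in assignment.items()):
            assignment[part] = label             # descend this branch
            result = consistent_labeling(parts, labels, consistent, assignment)
            if result is not None:
                return result
            del assignment[part]                 # backtrack: cut the branch
    return None
```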


Dealing with Large Databases

- Techniques from Information Retrieval
  - The study of finding items in large data sets efficiently
  - E.g. hashing vs. brute-force search
- Example: “Image Retrieval Using Visual Words”
  - Vocabulary construction (offline)
  - Database construction (offline)
  - Image retrieval (online)


Vocabulary Construction

- Extract affine covariant regions from the images (~300k regions)
  - Shape-adapted regions around feature points
- Compute a SIFT descriptor for each region
- Determine the average covariance matrix for each descriptor (tracked from frame to frame)
  - How does this patch change over time?
- Cluster the regions using K-means clustering (thousands of clusters; see the sketch below)
  - Each cluster center becomes a ‘word’
- Eliminate too-frequent ‘words’ (stop words)
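
A compact sketch of the clustering step using SciPy’s k-means; the cluster count and the 1% stop-word cutoff are illustrative choices:

```python
# Vocabulary construction: cluster descriptors into visual words and
# drop the most frequent ones as stop words.
import numpy as np
from scipy.cluster.vq import kmeans2

def build_vocabulary(descriptors, k=10_000, stop_fraction=0.01):
    """descriptors: N x 128 float array of SIFT descriptors pooled over images."""
    words, labels = kmeans2(descriptors, k, minit='++')
    counts = np.bincount(labels, minlength=k)
    n_stop = int(k * stop_fraction)             # drop the most frequent words
    stop_words = set(np.argsort(counts)[-n_stop:])
    return words, stop_words
```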


Database Construction

- Determine word distributions for each document (image)
  - Word frequency = (number of times this word occurs) / (number of words in the doc)
  - Inverse document frequency = log((number of documents) / (number of documents containing this word))
  - tf-idf measure = (word freq) × (inverse doc freq)
- Each document is represented by a vector of tf-idf measures, one per word
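
A small sketch of this weighting, assuming each document is a list of visual-word ids:

```python
# tf-idf vectors: term frequency within each document times the inverse
# document frequency of each word across the whole database.
import math
from collections import Counter

def tfidf_vectors(documents, vocab_size):
    n_docs = len(documents)
    doc_freq = Counter(w for doc in documents for w in set(doc))
    idf = [math.log(n_docs / doc_freq[w]) if doc_freq[w] else 0.0
           for w in range(vocab_size)]
    vectors = []
    for doc in documents:
        counts = Counter(doc)
        vectors.append([counts[w] / len(doc) * idf[w]
                        for w in range(vocab_size)])
    return vectors
```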


Image Retrieval

- Extract regions, descriptors, and visual words
- Compute the tf-idf vector for the query image (or region)
- Retrieve candidates with the most similar tf-idf vectors
  - Brute force, or using an ‘inverted index’ (see the sketch below)
- (Optional) re-rank or verify all candidate matches (e.g. spatial consistency, validation of the transformation)
- (Optional) expand the result by submitting highly-ranked matches as new queries
- (OK for <10k keyframes, <100k visual words)
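
A sketch of the inverted-index idea: each visual word points to the documents containing it, so only documents sharing at least one word with the query need to be scored:

```python
# Inverted index: map each visual word to the set of documents containing it.
from collections import defaultdict

def build_inverted_index(documents):
    index = defaultdict(set)            # word id -> set of document ids
    for doc_id, doc in enumerate(documents):
        for word in set(doc):
            index[word].add(doc_id)
    return index

def candidates(index, query_words):
    """Documents sharing any visual word with the query."""
    return set().union(*(index[w] for w in set(query_words) if w in index))
```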


Improvements

- Vocabulary tree approach
  - Instead of flat ‘words’, create a ‘vocabulary tree’
  - Hierarchical: each branch has several prototypes
  - In recognition, follow the branch with the closest prototype (recursively through the tree; see the sketch below)
  - Very fast: 40k CDs recognized in real time (30/sec); 1M frames at 1 Hz (1/sec)
- More sophisticated data structures
  - K-D trees
  - Other ideas from IR
- A very active research field right now
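
A sketch of the recursive descent through a (hypothetical) vocabulary-tree node structure:

```python
# Vocabulary-tree lookup: at each level, follow the child whose prototype
# is closest to the descriptor, recursing until a leaf word is reached.
import numpy as np

def lookup(node, descriptor):
    """node: {'word': id} at a leaf, otherwise
    {'prototypes': k x d array, 'children': list of k child nodes}."""
    if 'word' in node:
        return node['word']
    dists = np.linalg.norm(node['prototypes'] - descriptor, axis=1)
    return lookup(node['children'][int(np.argmin(dists))], descriptor)
```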


Application: Location Recognition

- Match an image to the location where it was taken
  - E.g. annotating Google Maps, organizing information on Flickr, star maps
- Match via vanishing points (when architectural objects are prominent)
- Find landmarks (the ones everyone photographs)
  - Identify them automatically as part of the indexing process
- Issues:
  - A large number of photos
  - Lots of ‘clutter’ (e.g. foliage) that doesn’t help recognition


Image Retrieval

- Determine the tf-idf measure for the image (using words already included in the database)
- Match to the tf-idf measures for images in the DB
  - Similarity measured by normalized dot product (more similar = higher)
  - Difference measured by Euclidean distance
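
A small sketch of both measures over tf-idf vectors, plus ranking by the normalized dot product:

```python
# Similarity/difference measures for tf-idf vectors.
import numpy as np

def normalized_dot(a, b):
    """Cosine similarity: higher = more similar."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def euclidean(a, b):
    """Lower = more similar."""
    return float(np.linalg.norm(np.asarray(a, float) - np.asarray(b, float)))

def rank_matches(query_vec, db_vecs):
    """Database indices sorted by decreasing similarity to the query."""
    sims = [normalized_dot(query_vec, v) for v in db_vecs]
    return sorted(range(len(db_vecs)), key=lambda i: -sims[i])
```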
