Recognition using Attention


Dudek & Jugessur, ICRA 2000.

Robust Place and Object Recognition using Local Appearance based Methods

Gregory Dudek and Deeptiman Jugessur

Center for Intelligent Machines, McGill University

April 2000, IEEE ICRA

Outline

• Applications

• PCA: shortcomings

• Objectives

• Approach

• Background

• System Overview

• Results

• Conclusion

Two Applications

• Object recognition: what is that thing?

– Recognizing a known object from its visual appearance.

– Landmarks, grasping targets, etc.

• Place recognition (coarse localization): what room am I in?

– Recognizing the current waypoint on a trajectory, validating the current locale for the application of a precise localization method, topological navigation.


PCA-based recognition.


• Has now become a well-established method for image recognition.

• PCA-based recognition: a global transform of an image with N degrees of freedom into an eigenspace with M << N degrees of freedom.

– The M degrees of freedom are the “most important” characteristics of the set of images being memorized.

• Avoids having to segment image into object & background by using the whole thing.
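The N-to-M reduction described above can be sketched in a few lines. This is a minimal illustration using NumPy's SVD on synthetic data, not the authors' implementation; the function names `build_eigenspace` and `project`, the image size, and the choice M = 5 are all assumptions made for the example.

```python
import numpy as np

def build_eigenspace(images, m):
    """Return (mean, basis); basis holds the top-m principal directions.

    images: array of shape (num_images, N), one flattened image per row.
    """
    mean = images.mean(axis=0)
    centered = images - mean
    # Rows of vt are orthonormal principal directions, sorted by variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:m]

def project(image, mean, basis):
    """Map an N-dimensional image to its M eigenspace coordinates."""
    return basis @ (image - mean)

rng = np.random.default_rng(0)
images = rng.random((20, 225))   # 20 synthetic 15x15 images, so N = 225
mean, basis = build_eigenspace(images, m=5)
coords = project(images[0], mean, basis)
print(coords.shape)              # (5,)
```

Each image is then represented by M numbers instead of N, and recognition compares those short vectors rather than raw pixels.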


Observations

• Using the whole image implies recognizing the combination of object AND background.

• Segmenting the object from the background would avoid dependence on the background, but it’s too difficult.

• Using a small sub-region gives a less precise recognition (i.e. the sub-window could come from more than one image), but it is efficient.

• Many sub-windows together can “vote” for an unambiguous recognition.

• If the sub-windows are suitably chosen, they may totally ignore the background.
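The voting idea can be illustrated with a small sketch. It assumes each sub-window has already been reduced to a short feature vector and that each vote comes from the nearest training feature; the function name `classify_by_vote`, the labels, and the coordinates are hypothetical values chosen for the example.

```python
from collections import Counter
import numpy as np

def classify_by_vote(train_feats, train_labels, test_feats):
    """Label each test sub-window by its nearest training feature, then vote."""
    votes = []
    for feat in test_feats:
        dists = np.linalg.norm(train_feats - feat, axis=1)
        votes.append(train_labels[int(np.argmin(dists))])
    tally = Counter(votes)
    return tally.most_common(1)[0][0], tally

# Hypothetical 2-D feature coordinates for sub-windows of two objects.
train_feats = np.array([[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [5.1, 4.8]])
train_labels = ["cup", "cup", "stapler", "stapler"]
# Three test sub-windows: two lie near "cup", one matches the other object.
test_feats = np.array([[0.1, 0.0], [0.3, 0.2], [4.0, 4.0]])

winner, tally = classify_by_vote(train_feats, train_labels, test_feats)
print(winner, dict(tally))   # "cup" wins by 2 votes to 1
```

A single sub-window can be misleading, as the third one is here, but the majority still settles on one answer.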

Problem Statement

• Improving the performance of classic PCA-based recognition by accounting for:

– Varying backgrounds

– Planar rotations

– Occlusions

• Also (discussed in less detail):

– Changes in object pose

– Non-rigid deformation

Our key idea(s).

• Use sub-windows: several together uniquely accomplish recognition.

• Sub-windows are selected by an attention operator (several kinds can be used).

• Each sub-window is sampled non-uniformly to weight it towards its center.

• Use only the amplitude spectrum to buy rotational invariance.


Background

• Standard Appearance Based Recognition

– M. Turk and A. Pentland 1991

– S.K. Nayar, H. Murase, S.A. Nene 1994

– H. Murase, S.K. Nayar 1995

– Shortcomings (due to global approach): background, scale, rotations, local changes of the image or object, occlusion.

Background (part 2)

• “Enhanced” local sub-window methods

– D. Lowe 1999: scale invariance, simple features.

– C. Schmid 1999: probabilistic approach based on sub-windows extracted using the Harris operator.

– C. Schmid & R. Mohr 1997: numerous sub-windows extracted using the Harris operator for database image retrieval (simpler problem).

– K. Ohba & K. Ikeuchi 1997: K.L.T. operator used for the extraction of sub-windows for the creation of an eigenspace. Only handles occlusion.

• Interest operator of choice:

– D. Reisfeld, H. Wolfson, Y. Yeshurun 1995: local symmetry operator

Approach

• 2 phases:

– Training (off-line) for the entire database of recognizable images:

  • Run an interest operator to obtain a saliency map for each image.

  • Choose sub-windows around the salient points for each image.

  • Select most informative sub-windows and use foveal sampling.

  • Create the eigenspace with the processed sub-windows.

– Testing (on-line) for a candidate test image:

  • Run the same interest operator to obtain the saliency map.

  • Choose the sub-windows and process the information within them.

  • Project the sub-windows onto the eigenspace.

  • Perform classification based on nearest neighbor rules.
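The first two steps of each phase (saliency map, then sub-windows around salient points) might look like the sketch below. Gradient magnitude is used purely as a stand-in interest operator; it is not the local symmetry operator named in the slides. The function names `saliency_map` and `top_subwindows` and the test image are assumptions made for illustration.

```python
import numpy as np

def saliency_map(image):
    """Stand-in interest operator: gradient magnitude at each pixel."""
    gy, gx = np.gradient(image.astype(float))
    return np.hypot(gx, gy)

def top_subwindows(image, k, size=15):
    """Return k sub-windows (size x size) centered on the most salient pixels.

    No non-maximum suppression is applied, so windows may overlap heavily.
    """
    half = size // 2
    sal = saliency_map(image)
    # Exclude pixels too close to the border to hold a full sub-window.
    sal[:half, :] = sal[-half:, :] = -np.inf
    sal[:, :half] = sal[:, -half:] = -np.inf
    order = np.argsort(sal, axis=None)[::-1][:k]
    rows, cols = np.unravel_index(order, sal.shape)
    return [image[r - half:r + half + 1, c - half:c + half + 1]
            for r, c in zip(rows, cols)]

# Synthetic image: a bright square on a dark background.
image = np.zeros((64, 64))
image[20:40, 25:45] = 1.0
windows = top_subwindows(image, k=10)
print(len(windows), windows[0].shape)   # 10 (15, 15)
```

Because the same operator runs in both phases, test sub-windows tend to land on the same image structures that were stored during training.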

Recognition Model

Off-line: database of recognizable images → run all images through the interest operator → extract sub-windows based on interest operator saliency values and information content → 2D FFT → obtain amplitude spectra for the sub-windows → create low dim. eigenspace.

On-line: candidate test image → run the image through the interest operator → extract sub-windows → 2D FFT → project onto eigenspace → eigenspace for classification.

Polar Samplings and 2D FFT

[Figure: two sub-windows, each passed through polar sampling and then a 2D FFT, yielding the same amplitude spectrum (in theory).]
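A rough sketch of why polar sampling followed by a 2D FFT gives rotation invariance: on a (radius, angle) grid a planar rotation becomes a circular shift along the angle axis, and the FFT amplitude ignores such shifts. The function `polar_sample`, its grid sizes, and the bilinear interpolation are assumptions for illustration, not the sampling scheme used by the authors.

```python
import numpy as np

def polar_sample(patch, n_r=8, n_theta=32):
    """Resample a square patch on a (radius, angle) grid, bilinearly."""
    n = patch.shape[0]
    c = (n - 1) / 2.0                               # patch center
    radii = np.linspace(0.0, c, n_r, endpoint=False)
    thetas = 2.0 * np.pi * np.arange(n_theta) / n_theta
    r, t = np.meshgrid(radii, thetas, indexing="ij")
    ys, xs = c + r * np.sin(t), c + r * np.cos(t)
    y0, x0 = np.floor(ys).astype(int), np.floor(xs).astype(int)
    wy, wx = ys - y0, xs - x0                       # interpolation weights
    return ((1 - wy) * (1 - wx) * patch[y0, x0]
            + (1 - wy) * wx * patch[y0, x0 + 1]
            + wy * (1 - wx) * patch[y0 + 1, x0]
            + wy * wx * patch[y0 + 1, x0 + 1])

rng = np.random.default_rng(1)
patch = rng.random((15, 15))
rotated = np.rot90(patch)                           # 90-degree planar rotation

amp_a = np.abs(np.fft.fft2(polar_sample(patch)))
amp_b = np.abs(np.fft.fft2(polar_sample(rotated)))
print(np.allclose(amp_a, amp_b))
```

The match is exact here only because a 90-degree rotation maps the sampling grid onto itself. For arbitrary angles, interpolation makes the spectra agree only approximately, which is presumably what "in theory" refers to. Sampling every radius with the same number of angles also places more samples per unit area near the center.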

Shift Theorem

$$ f(x, y) \leftrightarrow F(u, v) $$

The shift theorem states that:

$$ f(x - a, y - b) \leftrightarrow e^{-j 2\pi (ua + vb)} F(u, v) $$

The amplitudes are the same, as:

$$ \left| e^{-j 2\pi (ua + vb)} F(u, v) \right| = \left| F(u, v) \right| $$
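The shift theorem can be checked numerically on a discrete array. This sketch applies a circular shift of (a, b) = (3, 5) samples to a random array with `np.roll`; the array size and shift amounts are arbitrary choices for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
signal = rng.random((8, 32))
a, b = 3, 5
shifted = np.roll(signal, shift=(a, b), axis=(0, 1))   # circular shift

spec = np.fft.fft2(signal)
spec_shifted = np.fft.fft2(shifted)

# Frequencies in cycles per sample along each axis.
u, v = np.meshgrid(np.fft.fftfreq(8), np.fft.fftfreq(32), indexing="ij")
phase = np.exp(-2j * np.pi * (u * a + v * b))

print(np.allclose(spec_shifted, phase * spec))          # only the phase changes
print(np.allclose(np.abs(spec_shifted), np.abs(spec)))  # amplitudes agree
```

The phase factor has unit magnitude, so discarding phase and keeping only the amplitude spectrum removes the dependence on the shift.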

Place Recognition

[Figure: test images, each shown with its best match among the training images.]

Place Recognition (2)

[Figure: test images, each shown with its best match among the training images.]

Object Recognition

[Figure: test image, training image, and the resulting recognition.]

Object Recognition (2)

[Figure: test image and its best matches among the training images. Note: background variation and occlusion.]

Performance metrics

• On-line performance:

– 15x15 pixel sub-windows: 90% recognition with 10 sub-windows (10 interest points).

– 15x15 pixel sub-windows: 100% recognition using 15 more sub-windows.

– Interest operator can take 1/30 s to 10 min. (depending on the operator, image size, etc.).

– Classification in eigenspace well under 1 sec (can be performed in real time).

Performance vs Number of Interest Points

[Plot: recognition rate (up to 100%) against the number of features.]

Note: 10 windows of size 15x15 means using only 0.7% of the total image content.
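The 0.7% figure can be reproduced with a quick calculation. The slides do not state the image size, so the 640x480 frame used here is an assumption; it happens to give a value close to the one quoted.

```python
windows, side = 10, 15
width, height = 640, 480            # assumed frame size, not given in the slides

used = windows * side * side        # pixels covered if no windows overlap
fraction = used / (width * height)
print(f"{used} pixels = {fraction:.2%} of the image")   # 2250 pixels = 0.73% of the image
```

If sub-windows overlap, the fraction of distinct pixels actually used is smaller still.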

Conclusion & Extensions


• Approach to object and place recognition from single video images. Works despite planar rotation, occlusion or other deformations.

• Highly robust.

• Recognition rates of up to 100% with 20 test images.

• Improved robustness to background can be achieved using “masking” [Jugessur & Dudek CVPR 2000].

• Ongoing work seeks to exploit the geometry of interest points.

• Could filter in Eigenspace during training to select only “useful” features.

That’s all

Questions you could ask


• Have you considered the use of alternative interest/attention operators? Does the operator matter?

• What if the background is much more interesting (to the operator) than the object?

• How much does color information matter?

• What is the consequence of not using geometric information (and what does that really mean)?


Performance metrics

• Training time: roughly 64 windows, 15x15, 17 objects, 3 views per object: 24 hours.

– This is using MATLAB and highly non-optimized code.

– Using similar methods on global images, other groups have reported times on the order of minutes for similar tasks.

• On-line performance:

– Interest operator can take 1/30 s to 10 min. (depending on the operator, image size, etc.).

– Classification in eigenspace well under 1 sec (can be performed in real time).
