Transcript Recognition using Attention
Dudek & Jugessur, ICRA 2000.
Robust Place and Object Recognition using Local Appearance based Methods
Gregory Dudek and Deeptiman Jugessur
Center for Intelligent Machines, McGill University
April 2000, IEEE ICRA
Outline
• Applications
• PCA: shortcomings
• Objectives
• Approach
• Background
• System Overview
• Results
• Conclusion
Two Applications
• Object recognition: what is that thing?
– Recognizing a known object from its visual appearance.
– Landmarks, grasping targets, etc.
• Place recognition (coarse localization): what room am I in?
– Recognizing the current waypoint on a trajectory, validating the current locale for the application of a precise localization method, topological navigation.
PCA-based recognition.
• Has now become a well-established method for image recognition.
• PCA-based recognition: global transform of an image with N degrees of freedom into an eigenspace with M << N degrees of freedom.
– The M degrees of freedom are the “most important” characteristics of the set of images being memorized.
• Avoids having to segment image into object & background by using the whole thing.
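The global transform described above can be sketched in a few lines. This is a minimal illustration, assuming flattened images stored as rows of a matrix and an SVD-based PCA; the function names `build_eigenspace` and `project` and the synthetic data are hypothetical, not taken from the talk.

```python
import numpy as np

def build_eigenspace(images, m):
    """Return (mean, basis) for an M-dimensional eigenspace.

    images: array of shape (num_images, N), one flattened image per row.
    basis:  array of shape (m, N) whose rows are the top-m principal axes.
    """
    mean = images.mean(axis=0)
    centered = images - mean
    # Rows of vt are the principal axes, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:m]

def project(image, mean, basis):
    """Map one flattened image (N values) to its M eigenspace coordinates."""
    return basis @ (image - mean)

# Synthetic stand-in data: 20 "images" with N = 225 values each (assumption).
rng = np.random.default_rng(0)
train = rng.random((20, 225))
mean, basis = build_eigenspace(train, m=5)
coords = project(train[0], mean, basis)   # M = 5 numbers instead of N = 225
```

Each image is then compared in the M-dimensional space rather than pixel by pixel, which is what makes the M << N reduction useful for recognition.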
Observations
• Using the whole image implies recognizing the combination of object AND background.
• Segmenting object from background would avoid dependence on background, but it’s too difficult.
• Using a small sub-region gives a less precise recognition (e.g. the sub-window could come from more than one image), but it is efficient.
• Many sub-windows together can “vote” for an unambiguous recognition.
• If the sub-windows are suitably chosen, they may totally ignore the background.
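The voting idea can be illustrated with a small sketch. It assumes each sub-window has already been reduced to eigenspace coordinates, and that every test sub-window casts one vote for the label of its nearest training sub-window; the exact voting rule is an assumption, and the labels and coordinates below are made up for illustration.

```python
import numpy as np
from collections import Counter

def vote(test_coords, train_coords, train_labels):
    """Each test sub-window votes for the label of its nearest training sub-window."""
    votes = Counter()
    for c in test_coords:
        dists = np.linalg.norm(train_coords - c, axis=1)
        votes[train_labels[int(np.argmin(dists))]] += 1
    # The label with the most votes is the recognition result.
    return votes.most_common(1)[0][0], votes

# Hypothetical 2-D eigenspace coordinates for four training sub-windows.
train_coords = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
train_labels = ["mug", "mug", "stapler", "stapler"]

# Three test sub-windows; one of them is ambiguous on its own.
test_coords = np.array([[0.05, 0.1], [4.9, 5.2], [0.2, -0.1]])
winner, votes = vote(test_coords, train_coords, train_labels)
```

A single sub-window may match the wrong image, but the tally across several sub-windows tends to settle on one answer.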
Problem Statement
• Improving the performance of classic PCA based recognition by accounting for:
– Varying backgrounds
– Planar rotations
– Occlusions
• Also (discussed in less detail)
– Changes in object pose
– Non-rigid deformation
Our key idea(s).
• Use sub-windows: several together uniquely accomplish recognition.
• Sub-windows are selected by an attention operator (several kinds can be used).
• Each sub-window is sampled non-uniformly to weight it towards its center.
• Use only the amplitude spectrum to buy rotational invariance.
Background
• Standard Appearance Based Recognition
– M. Turk and A. Pentland 1991
– S.K. Nayar, H. Murase, S.A. Nene 1994
– H. Murase, S.K. Nayar 1995
– Shortcomings (due to global approach):
  • Background
  • Scale
  • Rotations
  • Local changes of the image or object
  • Occlusion
Background (part 2)
• “Enhanced” local sub-window methods
– D. Lowe 1999: scale invariance, simple features.
– C. Schmid 1999: probabilistic approach based on sub-windows extracted using the Harris operator.
– C. Schmid & R. Mohr 1997: numerous sub-windows extracted using the Harris operator for database image retrieval (simpler problem).
– K. Ohba & K. Ikeuchi 1997: K.L.T. operator used for the extraction of sub-windows for the creation of an eigenspace. Only handles occlusion.
• Interest operator of choice:
– D. Reisfeld, H. Wolfson, Y. Yeshurun 1995: local symmetry operator
Approach
• 2 phases:
– Training (off-line) for the entire database of recognizable images:
  • Run an interest operator to obtain a saliency map for each image.
  • Choose sub-windows around the salient points for each image.
  • Select most informative sub-windows and use foveal sampling.
  • Create the eigenspace with the processed sub-windows.
– Testing (on-line) for a candidate test image:
  • Run the same interest operator to obtain the saliency map.
  • Choose the sub-windows and process the information within them.
  • Project the sub-windows onto the eigenspace.
  • Perform classification based on nearest neighbor rules.
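The first two steps of either phase (saliency map, then sub-windows around the salient points) might look roughly like the sketch below. Note the assumptions: gradient magnitude is used as a stand-in interest operator (it is not the local symmetry operator used in the talk), the information-content selection and foveal sampling steps are omitted, and the names `saliency_map` and `select_subwindows` are hypothetical.

```python
import numpy as np

def saliency_map(image):
    """Stand-in interest operator: gradient magnitude (NOT the symmetry operator)."""
    gy, gx = np.gradient(image.astype(float))
    return np.hypot(gx, gy)

def select_subwindows(image, num_windows=10, size=15):
    """Cut size x size windows around the most salient points, strongest first."""
    half = size // 2
    sal = saliency_map(image)
    # Ignore points too close to the border to hold a full window.
    sal[:half, :] = -np.inf
    sal[-half:, :] = -np.inf
    sal[:, :half] = -np.inf
    sal[:, -half:] = -np.inf
    windows, centers = [], []
    for _ in range(num_windows):
        r, c = np.unravel_index(np.argmax(sal), sal.shape)
        if not np.isfinite(sal[r, c]):
            break  # no usable salient points left
        windows.append(image[r - half:r + half + 1, c - half:c + half + 1])
        centers.append((int(r), int(c)))
        # Suppress the neighbourhood so the next window lands elsewhere.
        sal[max(r - half, 0):r + half + 1, max(c - half, 0):c + half + 1] = -np.inf
    return np.array(windows), centers

# Synthetic 120 x 160 test image (assumption, for illustration only).
rng = np.random.default_rng(1)
image = rng.random((120, 160))
windows, centers = select_subwindows(image, num_windows=10, size=15)
```

The same function would be run on training and test images alike, so that both phases pick sub-windows by the same rule.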
Recognition Model
[Figure: block diagram of the recognition model.
Off-line: database of recognizable images → run all images through the interest operator → extract sub-windows based on interest operator saliency values and information content → 2D FFT → obtain amplitude spectra for the sub-windows → create low dim. eigenspace.
On-line: candidate test image → run the image through the interest operator → 2D FFT → project onto eigenspace → eigenspace for classification.]
Polar Samplings and 2D FFT
[Figure: two sub-windows, each passed through polar sampling and a 2D FFT, giving the same amplitude spectrum (in theory).]
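Why polar sampling helps can be checked numerically: on a polar grid, a planar rotation becomes a circular shift along the angle axis, and the FFT amplitude ignores such shifts. The sketch below assumes a synthetic continuous pattern and a rotation that is an exact multiple of the angular step; the grid sizes and the names `pattern` and `polar_sample` are illustrative choices, not values from the talk.

```python
import numpy as np

def pattern(x, y):
    """Synthetic continuous 'image': two off-center Gaussian blobs."""
    return (np.exp(-((x - 3.0) ** 2 + (y - 1.0) ** 2) / 4.0)
            + 0.5 * np.exp(-((x + 2.0) ** 2 + (y - 4.0) ** 2) / 2.0))

def polar_sample(func, n_r=8, n_theta=32, r_max=7.0, rotation=0.0):
    """Sample func on an (n_r, n_theta) polar grid, scene rotated by `rotation` radians."""
    r = np.linspace(0.5, r_max, n_r)[:, None]
    theta = (2 * np.pi * np.arange(n_theta) / n_theta)[None, :]
    # Rotating the scene by `rotation` equals sampling at angle theta - rotation.
    x = r * np.cos(theta - rotation)
    y = r * np.sin(theta - rotation)
    return func(x, y)

n_theta = 32
original = polar_sample(pattern, n_theta=n_theta)
# Rotate by exactly 5 angular steps.
rotated = polar_sample(pattern, n_theta=n_theta, rotation=2 * np.pi * 5 / n_theta)

amp_original = np.abs(np.fft.fft2(original))
amp_rotated = np.abs(np.fft.fft2(rotated))
```

For rotations that fall between angular samples, or for real images that must be interpolated, the match is only approximate, which is presumably what "in theory" refers to.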
Shift Theorem
$f(x, y) \Leftrightarrow F(u, v)$
Shift theorem states that:
$f(x - a, y - b) \Leftrightarrow e^{-j 2\pi (au + bv)} F(u, v)$
Amplitudes are the same, as:
$|e^{-j 2\pi (au + bv)} F(u, v)| = |F(u, v)|$
Place Recognition
[Figure: test images and training images, with the best match shown for each test image.]
Place Recognition (2)
[Figure: test images and training images, with the best match shown for each test image.]
Object Recognition
[Figure: test image, training image, and the resulting recognition.]
Object Recognition (2)
[Figure: test image and training image, with best matches.]
Note: background variation and occlusion
Performance metrics
• On-line performance:
– 15x15 pixel subwindows: 90% recognition with 10 subwindows (10 interest points).
– 15x15 pixel subwindows: 100% recognition using 15 more subwindows.
– Interest operator can take 1/30 s to 10 min. (depending on the operator, image size, etc.).
– Classification in eigenspace well under 1 sec (can be performed in real time).
Performance vs Number of Interest Points
[Plot: performance (up to 100%) against number of features.]
Note: 10 windows of size 15x15 means using only 0.7% of the total image content.
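The 0.7% figure is easy to reproduce under one assumption: the slides do not state the image size, so the sketch below assumes 640x480 images and non-overlapping windows.

```python
# Pixels covered by 10 non-overlapping 15 x 15 sub-windows.
window_pixels = 10 * 15 * 15

# Assumed image size (not stated in the slides): 640 x 480.
image_pixels = 640 * 480

fraction = window_pixels / image_pixels
print(f"{fraction:.2%}")   # prints 0.73%
```

With a different image size the percentage changes accordingly, so the number should be read as an order-of-magnitude statement.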
Conclusion & Extensions
• Approach to object and place recognition from single video images. Works despite planar rotation, occlusion or other deformations.
• Highly robust.
• Recognition rates of up to 100% with 20 test images.
• Improved robustness to background can be achieved using “masking” [Jugessur & Dudek CVPR 2000].
• Ongoing work seeks to exploit geometry of interest points.
• Could filter in Eigenspace during training to select only “useful” features.
That’s all
Questions you could ask
• Have you considered the use of alternative interest/attention operators? Does the operator matter?
• What if the background is much more interesting (to the operator) than the object?
• How much does color information matter?
• What is the consequence of not using geometric information (and what does that really mean)?
Performance metrics
• Training time: roughly 64 windows, 15x15, 17 objects, 3 views per object: 24 hours.
– This is using MATLAB and highly non-optimized code.
• Using similar methods on global images, other groups have reported times on the order of minutes for similar tasks.
• On-line performance:
– Interest operator can take 1/30 s to 10 min. (depending on the operator, image size, etc.).
– Classification in eigenspace well under 1 sec (can be performed in real time).