Visual Categorization with Bags of Keypoints
Gabriella Csurka, Christopher R. Dance, Lixin Fan, Jutta Willamowski, Cedric Bray
Presented by Yun-hsueh Liu
EE 148, Spring 2006 (5/30/2006)
What is Generic Visual Categorization?
Categorization: distinguish different classes.
Generic Visual Categorization: generic enough to cope with many object types simultaneously, and to be readily extended to new object types.
Must handle variation in view, imaging, lighting, and occlusion, as well as typical object and scene variations.
Previous Work in Computational Vision
Single Category Detection: decide if a member of one visual category is present in a given image (faces, cars, targets).
Content-Based Image Retrieval: retrieve images on the basis of low-level image features, such as colors or textures.
Recognition: distinguish between images of structurally distinct objects within one class (say, different cell phones).
Bag-of-Keypoints Approach
Pipeline: Interest Point Detection → Key Patch Extraction → Feature Descriptors → Bag of Keypoints → Multi-class Classifier
[Diagram: each image is reduced to a vector of visual-word counts, e.g. (0.1, ..., 0.5, 1.5), which is fed to the classifier]
SIFT Descriptors
[Pipeline diagram repeated, with the Feature Descriptors stage highlighted: each key patch is described by a SIFT descriptor]
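As a concrete stand-in for this stage, here is a minimal sketch of interest point detection plus 128-dimensional SIFT description, assuming OpenCV's SIFT implementation (cv2.SIFT_create); the library choice and the function name extract_descriptors are mine, not the paper's.

    # Minimal sketch of the detection + description stages, assuming OpenCV;
    # the paper's own detector/descriptor toolchain differs.
    import cv2
    import numpy as np

    def extract_descriptors(image_path):
        """Detect interest points and compute a 128-D SIFT descriptor per key patch."""
        gray = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
        sift = cv2.SIFT_create()
        keypoints, descriptors = sift.detectAndCompute(gray, None)
        if descriptors is None:                          # no keypoints found
            return np.empty((0, 128), dtype=np.float32)
        return descriptors                               # shape: (num_keypoints, 128)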
Bag of Keypoints (1)
Construction of a vocabulary:
K-means clustering finds “centroids” on all the descriptors extracted from all the training images.
Define a “vocabulary” as the set of centroids, where every centroid represents a “word”.
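A minimal sketch of the vocabulary step, assuming scikit-learn's KMeans (a library choice of mine); k = 1000 matches the operating point selected in the experiments later in the talk.

    # Sketch of vocabulary construction: k-means over all training descriptors.
    import numpy as np
    from sklearn.cluster import KMeans

    def build_vocabulary(per_image_descriptors, k=1000, seed=0):
        """Cluster all training descriptors; the k centroids are the visual words."""
        all_descriptors = np.vstack(per_image_descriptors)   # pool every training image
        kmeans = KMeans(n_clusters=k, n_init=10, random_state=seed)
        kmeans.fit(all_descriptors)
        return kmeans                    # vocabulary lives in kmeans.cluster_centers_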
Bag of Keypoints (2)
Histogram: count the number of occurrences of the different visual words in each image.
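A minimal sketch of the histogram step, reusing the fitted kmeans object from the vocabulary sketch above: each descriptor is assigned to its nearest centroid, and the counts form the image's bag-of-keypoints vector.

    # Sketch of the bag-of-keypoints representation of one image.
    import numpy as np

    def keypoint_histogram(descriptors, kmeans):
        """Return N(t, i): how often each visual word t occurs in this image."""
        words = kmeans.predict(descriptors)              # nearest visual word per keypoint
        return np.bincount(words, minlength=kmeans.n_clusters)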
Multi-class Classifier
In this paper, classification is based on conventional machine learning approaches:
Naïve Bayes
Support Vector Machine (SVM)
Multi-class classifier – Naïve Bayes (1)
Let V = {v_t}, t = 1, ..., N, be a visual vocabulary, in which each v_t represents a visual word (a cluster center) in the feature space.
Let I = {I_i} be a set of labeled images, and let C_j denote our classes, where j = 1, ..., M.
N(t, i) = number of times v_t occurs in image I_i (the keypoint histogram).
Score approach: we want to determine P(C_j | I_i), where

    P(C_j | I_i) ∝ P(C_j) ∏_t P(v_t | C_j)^N(t,i)    (*)
Multi-class Classifier – Naïve Bayes (2)
Goal: find the single class C_j for which (*) has maximum value:

    C* = argmax_j P(C_j) ∏_t P(v_t | C_j)^N(t,i)

In order to avoid zero probabilities, use Laplace smoothing:

    P(v_t | C_j) = (1 + Σ_{I_i ∈ C_j} N(t, i)) / (N + Σ_{s=1..N} Σ_{I_i ∈ C_j} N(s, i))
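A minimal sketch of this Naïve Bayes classifier over word histograms, working in log space to avoid numerical underflow; the uniform class prior is an assumption of the sketch, not something stated on the slide.

    # Sketch of the Naive Bayes scorer with add-one (Laplace) smoothing.
    import numpy as np

    def train_naive_bayes(histograms, labels, num_classes):
        """Estimate log P(v_t | C_j) from training histograms, Laplace-smoothed."""
        n_words = histograms.shape[1]
        counts = np.zeros((num_classes, n_words))
        for hist, j in zip(histograms, labels):
            counts[j] += hist                  # sum N(t, i) over images in class C_j
        # Laplace smoothing: add 1 to every word count so no probability is zero.
        smoothed = (counts + 1.0) / (counts.sum(axis=1, keepdims=True) + n_words)
        return np.log(smoothed)

    def classify(histogram, log_word_probs):
        """argmax_j sum_t N(t, i) * log P(v_t | C_j), assuming a uniform prior."""
        return int(np.argmax(log_word_probs @ histogram))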
Multi-class classifier – Support Vector Machine (SVM)
Input: the keypoint histogram of each image.
Multi-class, one-against-all approach.
A linear SVM gives better performance than quadratic or cubic SVMs.
Goal: find hyperplanes which separate the multi-class data with maximum margin.
Multi-class classifier – SVM (2)
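A minimal sketch of the one-against-all linear SVM stage, assuming scikit-learn's LinearSVC (a library choice of mine); train_histograms and test_histograms are hypothetical variables holding the per-image keypoint histograms.

    # Sketch: one binary hyperplane per class; prediction takes the highest score.
    from sklearn.svm import LinearSVC

    svm = LinearSVC()                          # linear kernel, one-vs-rest by default
    svm.fit(train_histograms, train_labels)    # train_histograms: (n_images, k) counts
    predicted = svm.predict(test_histograms)   # class with the largest decision score
    scores = svm.decision_function(test_histograms)  # per-class scores, usable for mean rank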
Evaluation of Multi-class Classifiers
Three performance measures:
The confusion matrix: each column of the matrix represents the instances in a predicted class; each row represents the instances in an actual class.
The overall error rate = Pr(output class ≠ true class).
The mean rank: the mean position of the correct label when the labels output by the multi-class classifier are sorted by classifier score.
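A minimal sketch of all three measures, assuming scores holds one row of per-class classifier scores per test image (for instance, the SVM decision values from the earlier sketch).

    # Sketch of confusion matrix, overall error rate, and mean rank.
    import numpy as np
    from sklearn.metrics import confusion_matrix

    def evaluate(y_true, y_pred, scores):
        cm = confusion_matrix(y_true, y_pred)     # rows: actual class, columns: predicted
        error_rate = np.mean(y_pred != y_true)    # Pr(output class != true class)
        # Mean rank: position of the true label when classes are sorted by score.
        order = np.argsort(-scores, axis=1)       # best-scoring class first
        ranks = np.argmax(order == np.asarray(y_true)[:, None], axis=1) + 1
        return cm, error_rate, ranks.mean()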
n-Fold Cross Validation
What is a “fold”? Randomly break the dataset into n partitions.
Example: suppose n = 10.
Training on partitions 2, 3, ..., 10; testing on 1 → result 1.
Training on partitions 1, 3, ..., 10; testing on 2 → result 2.
...
Answer = average of result 1, result 2, ....
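A minimal sketch of this procedure, assuming scikit-learn's KFold for the random partitioning, with n = 10 as in the example.

    # Sketch of n-fold cross-validation: train on n-1 folds, test on the held-out fold.
    import numpy as np
    from sklearn.model_selection import KFold

    def cross_validated_error(model, X, y, n=10, seed=0):
        folds = KFold(n_splits=n, shuffle=True, random_state=seed)
        errors = []
        for train_idx, test_idx in folds.split(X):
            model.fit(X[train_idx], y[train_idx])   # train on the other n-1 folds
            errors.append(np.mean(model.predict(X[test_idx]) != y[test_idx]))
        return np.mean(errors)                      # average of the n results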
Experiment on Naïve Bayes – k’s effect
Plot the overall error rate as a function of the number of clusters k.
Result: the error rate decreases as k increases.
Selected operating point: k = 1000.
Beyond this point, the error rate decreases only slowly.
Experiment on Naïve Bayes – Confusion Matrix
true \ predicted   faces  buildings  trees  cars  phones  bikes  books  error rate  mean rank
faces                 76          2      3     4       9      2      4          24       1.49
buildings              4         44      2     1      15     15     19          56       1.88
trees                  2          5     80     0       1     12      0          20       1.33
cars                   3          0      0    75      16      0      6          25       1.33
phones                 4          5      0     3      70      8      7          27       1.63
bikes                  4          1      5     1      14     73      2          27       1.57
books                 13          3      0     4      11      0     69          31       1.57
Experiment on SVM – Confusion Matrix
true \ predicted   faces  buildings  trees  cars  phones  bikes  books  error rate  mean rank
faces                 98          1      1     0       0      0      0           2       1.04
buildings             14         63     10     1       5      4      3          27       1.77
trees                 10          3     81     1       4      1      0          19       1.28
cars                  10          0      1    85       3      0      1          15       1.30
phones                34          3      0     5      55      1      2          45       1.83
bikes                  0          1      6     0       2     91      0           9       1.09
books                 13          6      0     5       3      0     73          27       1.39
Interpretation of Results
The confusion matrix: in general, SVM makes more correct predictions than Naïve Bayes does.
The overall error rate: in general, Naïve Bayes > SVM (the SVM achieves the lower error rate).
The mean rank: in general, SVM < Naïve Bayes (the SVM ranks the correct label closer to first).
Why do we have errors?
There are objects from more than two classes in one image.
The data set is not totally clean (noise).
Each image is given only one training label.
Conclusion
Bag-of-Keypoints is a new and efficient generic visual categorizer.
Evaluated on a seven-category database, the method proved robust to the choice of clusters, background clutter, and multiple objects.
Any questions?
Thank you for listening to my presentation!! :)