Simultaneous Image Classification and Annotation

Download Report

Transcript Simultaneous Image Classification and Annotation

(Final Version)
Simultaneous Image
Classification and Annotation
Chong Wang, David Blei, Li Fei-Fei
Computer Science Department
Princeton University
Published in CVPR 2009
Presented by Eric Wang
7-3-09
• Introduction
Outline
• Review of sLDA and Corr-LDA
• Model description
• Model Inference and Parameter Estimation
• Empirical Results
• Conclusion
Introduction
• Images Classification refers to assigning a class label to each
image which globally describes the image.
• Image Annotation refers to assigning words which describe
individual regions of the image.
• The images considered in this paper are both classified with a
class label and annotated with free text.
• This paper will combine the basic framework of Corr-LDA, and
a highly modified version of Supervised LDA (sLDA) to yield a
model which simultaneously classifies images and annotates
the individual regions.
Review of sLDA
• For each document
• In this model, the response variable y is a continuous
random variable.
•
are treated as unknown constants
to be estimated, rather than as random variables.
Source: D. M. Blei and J. D. McAuliffe. Supervised topic models. In NIPS, 2007.
Review of sLDA
• An application of sLDA considered by Blei et. al was
regressing a corpus of textual movie reviews to
number of stars given.
Source: D. M. Blei and J. D. McAuliffe. Supervised topic models. In NIPS, 2007.
Review of Corr-LDA
• For each document
• Corr-LDA is a simple extension of LDA for images to
annotate image regions.
Source: D. M. Blei and M. I. Jordan. Modeling annotated data. In SIGIR, 2003.
Annotated sLDA Model
• This step is identical to LDA, the topic proportions are drawn
once per document.
• In this paper,  is not optimized.
Annotated sLDA Model
• A region is characterized by one of 240 codewords (quantized from 128
dimensional SIFT features).
• Regions are found by segmenting images using the N-cuts algorithm.
•

parameterizes a particular multinomial distribution (topic) over the
quantized codewords.
Annotated sLDA Model
•
The class label c is completely determined by the topic indicators z_{1:N} using a modified sLDA
framework.
•
The total number of classes is known a priori and the class indicators are treated separately from
the annotations. This is a simpler approach than the one taken by L.J. Li, R. Socher and L. Fei-Fei in
that there is no “switch” variable which determines whether a word is an annotation or label.
•
The softmax function is well studied and is also known as “multinomial logistic regression”
Annotated sLDA Model
•
The annotations are assigned to specific regions in the same manner as in
Corr-LDA.
•
This will, for example, encourage words such as “blue” and “white” to be
associated with regions (and thus, codewords) which capture sky.
•
Though not explicitly shown in the graphical model,  and
symmetric Dirichlet priors.

have
Inference of Latent Variables
These updates are
identical to those
used in Corr-LDA.
• Let n parameterize a multinomial over the K topics
• Let  parameterize a Dirichlet over topic distributions.
• Let  parameterize a multinomial over image regions
m
• These updates are local to each document (thus the omission of d).
Inference of Latent Variables
• This equation updates the posterior distribution over topics.
• Note that this update depends on both class label c and the
annotation information wm .
Inference of Latent Variables
Parameter Estimation
Updates of codebook word
f in codebook topic i
(proportional to a
constant).
has no closed form solution and is optimized via conjugate gradient
Inference of Latent Variables
Parameter Estimation
Updates annotation
word w in annotation
topic i (proportional
to a constant).
has no closed form solution and is optimized via conjugate gradient
Empirical Results
• LableMe dataset
– 8 classes: “highway,” “inside city,” “tall building,”
“street,” “forest,” “coast,” “mountain,” and “open
country.”
– 200 256x256 training images per class.
• UIUC dataset
– 8 types of sports: “badminton,” “bocce,” “croquet,”
“polo,” “rockclimbing,” “rowing,” “sailing” and
“snowboarding.”
– 1792 256x256 training images.
• 240 codeword dictionary.
• Annotations which appeared less than 3 times
were removed.
Empirical Results: Classification
•
The black line represents of the performance of Bosch et. al 2006, which
employs a non-annotated LDA on the image regions and a KNN to classify the
images.
•
The blue line is the performance of Fei-Fei and Perona 2005, which uses
unannotated labeled images
•
The models presented in this paper are much more resistant to overfitting
than the models of Bosch et. al and Fei-Fei and Perona .
Emperical Results: Classification
• Confusion matrices comparing the performance of multi-class
sLDA with annotations and multi-class sLDA using 100 topic
models
• Annotations seem to improve performance slightly, although,
as the last slide shows, the main benefit is more consistent
performance as a function of the number of topics.
Empirical Results: Annotation
• The F-Measure is used as a score.
• Results are given over all numbers of topics considered
above.
• LabelMe:
• 38.2% (corr-LDA)
• 38.7% (multi-class sLDA with annotations)
• UIUC-Sport:
• 34.7% (corr-LDA)
• 35.0% (multiclass sLDA with annotations).
Conclusion
• Combining image annotation with classification provides state
of the art image classification performance.
• However, the addition of the classification framework
provides only a small improvement to the annotation
performance.
• The authors’ primary contribution is showing that image
classification and annotation are related and can be
conducted simultaneously in the same framework.
• Inference was done in a Variational EM framework.