
Automatic Image Annotation and
Retrieval using Cross-Media Relevance
Models
J. Jeon, V. Lavrenko and R. Manmatha
Computer Science Department
University of Massachusetts – Amherst
Presenter: Carlos Diuk
Introduction

The Problem:

Automatically annotate and retrieve images from large collections.
Retrieval example: answer the query "Tigers in grass" with images showing tigers in grass.
Introduction


Manual annotation is already being done in libraries.
Different approaches to automatic image annotation:

Co-occurrence Model
Translation Model
Cross-Media Relevance Model
Introduction – related work

Co-occurrence Model
Looks at the co-occurrence of words with image regions created using a regular grid.

Translation Model
Image annotation is viewed as the task of translating from a vocabulary of blobs to a vocabulary of words.
Introduction – CMRM

Cross-Media Relevance Models (CMRM)

Assume that images can be described using a small vocabulary of blobs.
From a training set of annotated images, learn the joint distribution of blobs and words.
Introduction – CMRM

Cross-Media Relevance Models (CMRM)

Allow query expansion:

A standard technique for reducing ambiguity in information retrieval.
Perform an initial query, then expand it with terms from the top-ranked documents (a small sketch follows).
Example in the image context: "tiger" is more often associated with grass, water, and trees than with cars or computers.
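A minimal sketch of the expansion idea, as generic pseudo-relevance feedback rather than the paper's exact procedure; the function and the toy documents are illustrative only:

from collections import Counter

def expand_query(query_terms, ranked_docs, num_docs=3, num_terms=2):
    # Pseudo-relevance feedback: add frequent terms from the top-ranked
    # documents to the original query. `ranked_docs` is a list of
    # documents (each a list of terms), already ordered by an initial
    # retrieval run.
    counts = Counter()
    for doc in ranked_docs[:num_docs]:
        counts.update(t for t in doc if t not in query_terms)
    expansion = [term for term, _ in counts.most_common(num_terms)]
    return list(query_terms) + expansion

# Toy example: "tiger" picks up co-occurring scene words.
docs = [["tiger", "grass", "water"],
        ["tiger", "trees", "grass"],
        ["car", "road"]]
print(expand_query(["tiger"], docs))  # ['tiger', 'grass', 'water']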
Introduction – CMRM

Variations:

Document-based expansion

PACMRM (probabilistic annotation-based CMRM)
The blobs of each test image are used to generate words and their associated probabilities. Each test image yields a vector of probabilities over every word in the vocabulary.

FACMRM (fixed annotation-based CMRM)
Use the top N words from PACMRM to annotate images (a small sketch follows).

Query-based expansion

DRCMRM (direct-retrieval CMRM)
The query words are used to generate a set of blob probabilities. This vector of blob probabilities is compared with the vector from each test image using the Kullback-Leibler divergence, and images are ranked by the resulting KL distance.
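An illustrative sketch of the FACMRM step, assuming a PACMRM-style word-probability vector is already available; the words and numbers are made up:

def fixed_annotation(word_probs, top_n=5):
    # FACMRM-style step (illustrative): keep the top-N words from a
    # PACMRM-style probability vector as the image's fixed annotation.
    # `word_probs` maps each vocabulary word to P(word | image blobs).
    ranked = sorted(word_probs.items(), key=lambda kv: kv[1], reverse=True)
    return [word for word, _ in ranked[:top_n]]

probs = {"tiger": 0.21, "grass": 0.18, "water": 0.12, "sky": 0.05, "car": 0.01}
print(fixed_annotation(probs, top_n=3))  # ['tiger', 'grass', 'water']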
Discrete features in images


Segmentation of images into regions yields fragile and error-prone results.
Normalized cuts are used instead (Duygulu et al.):

33 features are extracted from each region.
K-means clustering (K = 500) is used to group regions by these features, giving a vocabulary of 500 blobs (a small sketch follows).
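A minimal sketch of the blob-vocabulary step, assuming the 33-dimensional region features have already been extracted; scikit-learn's KMeans is used here purely for illustration and is not necessarily the clustering implementation used in the paper:

import numpy as np
from sklearn.cluster import KMeans

# Assume `region_features` holds one 33-dimensional feature vector per
# segmented region, pooled over all training images (synthetic here).
rng = np.random.default_rng(0)
region_features = rng.normal(size=(10000, 33))

# Quantize regions into a discrete vocabulary of 500 blobs.
kmeans = KMeans(n_clusters=500, n_init=10, random_state=0).fit(region_features)

# Each region is now identified by its blob (cluster) id, and an image
# is represented by the set of blob ids of its regions.
print(kmeans.predict(region_features[:5]))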
CMRM Algorithms



A test image I = {b1, ..., bm} is a set of blobs.
Each training image J = {b1, ..., bm; w1, ..., wn} has both blobs and annotation words.
Two problems:

Given an un-annotated image I, assign meaningful keywords to it.
Given a text query, retrieve images that contain the objects mentioned.
CMRM Algorithms

Calculating probabilities.
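The slide shows the paper's estimation formulas. Roughly, for a test image with blobs b_1 ... b_m and a training set T of annotated images J (with P(J) taken to be uniform):

P(w, b_1, \dots, b_m) = \sum_{J \in T} P(J)\, P(w \mid J) \prod_{i=1}^{m} P(b_i \mid J)

with smoothed maximum-likelihood estimates inside each training image J:

P(w \mid J) = (1 - \alpha_J)\, \frac{\#(w, J)}{|J|} + \alpha_J\, \frac{\#(w, T)}{|T|}
P(b \mid J) = (1 - \beta_J)\, \frac{\#(b, J)}{|J|} + \beta_J\, \frac{\#(b, T)}{|T|}

where \#(w, J) is the count of w in J, \#(w, T) its count over the whole training set, and |J|, |T| the corresponding sizes. Annotation probabilities then follow by conditioning: P(w \mid b_1, \dots, b_m) \propto P(w, b_1, \dots, b_m).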
CMRM Algorithms

Image retrieval

INPUT: query Q = w1 ... wn and a collection C of images
OUTPUT: images described by the query words.
Annotation-based retrieval models (PACMRM / FACMRM):

Annotate images as shown.
Perform text retrieval as usual.
Fixed-length annotation vs. probabilistic annotation (a sketch follows):
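An illustrative sketch of the two retrieval flavours, assuming each image already has PACMRM word probabilities and a fixed FACMRM annotation; the scoring functions are simplified stand-ins, not the paper's exact ranking formulas:

import math

def prob_retrieval_score(query, word_probs):
    # Probabilistic-annotation retrieval (PACMRM-style), illustrative:
    # rank an image by the product of its word probabilities over the
    # query terms (computed in log space for numerical stability).
    return sum(math.log(word_probs.get(w, 1e-12)) for w in query)

def fixed_retrieval_score(query, top_n_words):
    # Fixed-annotation retrieval (FACMRM-style), illustrative:
    # score an image by how many query words appear in its
    # fixed top-N annotation.
    return sum(1 for w in query if w in top_n_words)

image_probs = {"tiger": 0.21, "grass": 0.18, "water": 0.12}
print(prob_retrieval_score(["tiger", "grass"], image_probs))
print(fixed_retrieval_score(["tiger", "grass"], ["tiger", "grass", "water"]))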
CMRM Algorithms

Image retrieval

INPUT: query Q = w1 ... wn and a collection C of images
OUTPUT: images described by the query words.
Direct retrieval model (DRCMRM):

Convert the query into the language of blobs, instead of converting images into words.
Estimation and ranking:
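A rough reconstruction of the two formulas shown on the slide, in the spirit of the paper: the query Q = w_1 ... w_k is first mapped to a distribution over blobs,

P(b \mid Q) \propto \sum_{J \in T} P(J)\, P(b \mid J) \prod_{j=1}^{k} P(w_j \mid J)

and each test image I is then ranked by the Kullback-Leibler divergence between the query's blob distribution and the image's smoothed blob distribution, smaller being better:

KL(Q \,\|\, I) = \sum_{b} P(b \mid Q) \log \frac{P(b \mid Q)}{P(b \mid I)}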

Results

Dataset:
Corel Stock Photo CDs (5,000 images: 4,000 training, 500 evaluation, 500 testing). 371 words and 500 blobs. Manual annotations.

Metrics:
Recall: number of correctly retrieved images divided by the number of relevant images.
Precision: number of correctly retrieved images divided by the number of retrieved images (a small sketch follows).
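A tiny sketch of the two metrics as defined above; the image ids are made up:

def precision_recall(retrieved, relevant):
    # Precision and recall as defined on the slide;
    # `retrieved` and `relevant` are sets of image ids.
    correct = len(retrieved & relevant)
    precision = correct / len(retrieved) if retrieved else 0.0
    recall = correct / len(relevant) if relevant else 0.0
    return precision, recall

# Toy numbers: 4 retrieved, 5 relevant, 3 of the retrieved are relevant.
print(precision_recall({1, 2, 3, 4}, {2, 3, 4, 5, 6}))  # (0.75, 0.6)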
Comparisons

Co-occurrence vs. Translation vs. FACMRM
Results

Precision and recall for 70 one-word queries.
Results

PACMRM vs. DRCMRM

Some nice examples
Images automatically annotated as "sunset", but not annotated as such manually.

Some nice examples
Response to the query "tiger".
Response to the query "pillar".

Some bad examples
Questions - Discussion

No semantic representation (just color, texture, shape).
How could we annotate a newspaper's collection? ("Kennedy", not just "people")
Google: cooperative annotation?

Google search for "tiger":
Google search for "Kennedy":