Transcript SIMPLIcity

Image Retrieval and Annotation via a Stochastic Modeling Approach
Jia Li, Ph.D.
The Pennsylvania State University
Outline
- Introduction
- Image retrieval: SIMPLIcity
- Automatic annotation: ALIP
- A stochastic modeling approach
- Conclusions and future work
Image Retrieval
- The retrieval of relevant images from an image database on the basis of automatically derived image features
- Applications: biomedicine, defense, commerce, culture, education, entertainment, the Web, …
- Approaches:
  - Color layout
  - Region based
  - User feedback
Can a computer do this?

“Building, sky, lake, landscape, Europe, tree”
Outline
- Introduction
- Image retrieval: SIMPLIcity
- Automatic annotation: ALIP
- A stochastic modeling approach
- Conclusions and future work
The SIMPLIcity System
- Semantics-sensitive Integrated Matching for Picture LIbraries
- Major features:
  - Sensitive to semantics: combines semantic classification with image retrieval
  - Region-based retrieval: wavelet-based feature extraction and k-means clustering
  - Reduced sensitivity to inaccurate segmentation and a simple user interface: Integrated Region Matching (IRM)
Wavelets
Fast Image Segmentation
- Partition the image into 4×4 blocks
- Extract wavelet-based features from each block
- Use the k-means algorithm to cluster the feature vectors into "regions" (see the sketch below)
- Compute a shape feature for each region by normalized inertia
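A minimal sketch of this block/feature/cluster pipeline, assuming a grayscale image as a NumPy array and using PyWavelets and scikit-learn (the actual SIMPLIcity features are computed on color components and include the normalized-inertia shape feature, which is omitted here):

```python
import numpy as np
import pywt                      # PyWavelets, for the block-wise wavelet transform
from sklearn.cluster import KMeans

def segment_image(img, n_regions=4):
    """Cluster 4x4 blocks of a grayscale image into regions.

    img: 2-D NumPy array with height and width divisible by 4.
    Returns an (H/4, W/4) map of region labels.
    """
    h, w = img.shape
    feats = []
    for i in range(0, h, 4):
        for j in range(0, w, 4):
            block = img[i:i + 4, j:j + 4].astype(float)
            # One-level Haar transform of the 4x4 block: LL and (LH, HL, HH) subbands.
            ll, (lh, hl, hh) = pywt.dwt2(block, 'haar')
            # Feature vector: average intensity plus energy in the high-frequency bands,
            # a stand-in for the wavelet features used by SIMPLIcity.
            feats.append([
                block.mean(),
                np.sqrt((lh ** 2).mean()),
                np.sqrt((hl ** 2).mean()),
                np.sqrt((hh ** 2).mean()),
            ])
    labels = KMeans(n_clusters=n_regions, n_init=10).fit_predict(np.array(feats))
    return labels.reshape(h // 4, w // 4)
```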
IRM: Integrated Region Matching
- IRM defines an image-to-image distance as a weighted sum of region-to-region distances (written out below)
- The weighting matrix is determined by significance constraints and the "most similar highest priority" (MSHP) greedy algorithm
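Written out (a summary of the published IRM formulation; the symbols are introduced here rather than taken from the slide): if one image has regions r_1, ..., r_m with significances p_1, ..., p_m and the other has regions r'_1, ..., r'_n with significances q_1, ..., q_n, then

\[
d_{\mathrm{IRM}}(I_1, I_2) \;=\; \sum_{i=1}^{m}\sum_{j=1}^{n} s_{i,j}\, d(r_i, r'_j),
\qquad
\sum_{j=1}^{n} s_{i,j} = p_i, \quad
\sum_{i=1}^{m} s_{i,j} = q_j, \quad
s_{i,j} \ge 0,
\]

and the MSHP greedy algorithm fills in the weighting matrix by assigning as much significance as possible to the most similar region pairs first.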
A 3-D Example for IRM
IRM: Major Advantages
1. Reduces the influence of inaccurate segmentation
2. Helps to clarify the semantics of a particular region given its neighbors
3. Provides the user with a simple interface
Experiments and Results
- Speed
  - 800 MHz Pentium PC running Linux
  - Databases: a 200,000-image general-purpose DB (60,000 photographs + 140,000 hand-drawn art images) and 70,000 pathology image segments
  - Image indexing time: one second per image
  - Image retrieval time:
    - Without the scalable IRM: 1.5 seconds/query CPU time
    - With the scalable IRM: 0.15 second/query CPU time
    - External query: one extra second of CPU time
RANDOM SELECTION
Query Results
Current SIMPLIcity System
External Query
Robustness to Image Alterations
- 10% brightening on average
- 8% darkening
- Blurring with a 15×15 Gaussian filter
- 70% sharpening
- 20% more saturation
- 10% less saturation
- Shape distortions
- Cropping, shifting, rotation (a sketch for generating such altered queries follows)
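A minimal sketch of how such altered query images might be generated with Pillow (an illustration only; the alteration code used in the original experiments is not part of the slides, and the 15×15 Gaussian kernel is approximated here by a blur radius):

```python
from PIL import Image, ImageEnhance, ImageFilter

def altered_queries(path):
    """Yield (name, image) pairs with alterations similar to the robustness tests."""
    img = Image.open(path).convert("RGB")
    yield "brighten_10",   ImageEnhance.Brightness(img).enhance(1.10)
    yield "darken_8",      ImageEnhance.Brightness(img).enhance(0.92)
    yield "blur",          img.filter(ImageFilter.GaussianBlur(radius=7))  # stand-in for a 15x15 kernel
    yield "sharpen_70",    ImageEnhance.Sharpness(img).enhance(1.70)
    yield "saturate_20",   ImageEnhance.Color(img).enhance(1.20)
    yield "desaturate_10", ImageEnhance.Color(img).enhance(0.90)
    yield "rotate_10deg",  img.rotate(10, expand=True)
    w, h = img.size
    yield "crop_center",   img.crop((w // 10, h // 10, 9 * w // 10, 9 * h // 10))
```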
Status of SIMPLIcity
- Researchers from more than 40 institutions and government agencies have requested and obtained SIMPLIcity
- We have applied SIMPLIcity to:
  - Automatic image classification
  - Searching pathology images
  - Searching art and cultural images
Outline
- Introduction
- Image retrieval: SIMPLIcity
- Automatic annotation: ALIP
- A stochastic modeling approach
- Conclusions and future work
Image Database
- The image database contains categorized images.
- Each category is annotated with a few words, for example:
  - Landscape, glacier
  - Africa, wildlife
- Each category of images is referred to as a concept (see the data-layout sketch below).
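A minimal sketch of how such a categorized training database could be organized (the class and field names are illustrative, not taken from ALIP):

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Concept:
    """One category ('concept') of the training database."""
    name: str
    words: List[str]                                   # the few annotation words for the category
    image_paths: List[str] = field(default_factory=list)

training_db = [
    Concept("landscape-glacier", ["landscape", "glacier"]),
    Concept("africa-wildlife", ["Africa", "wildlife"]),
    # ... one entry per concept; the initial ALIP experiment used 600 concepts,
    #     each trained with 40 images.
]
```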
A Category of Images
Annotation: “man, male, people, cloth, face”
ALIP: Automatic Linguistic Indexing of Pictures
- Learn relations between annotation words and images from the training database.
- Profile each category with a statistical image model: a 2-D multiresolution hidden Markov model (2-D MHMM).
- Assess the similarity between an image and a category by the image's likelihood under the category's profiling model (see the ranking sketch below).
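A minimal sketch of this likelihood-based ranking, assuming a hypothetical log_likelihood(features, model) routine for a trained 2-D MHMM (neither the routine nor the data layout comes from the ALIP code):

```python
def rank_concepts(image_features, concept_models, log_likelihood, top_k=5):
    """Rank concepts by the log-likelihood of an image under each profiling 2-D MHMM.

    image_features: multiresolution feature vectors extracted from the image
    concept_models: dict mapping concept name -> trained 2-D MHMM
    log_likelihood: callable(features, model) -> float, supplied by the MHMM code
                    (hypothetical interface, for illustration only)
    Returns the top_k (concept, log_likelihood) pairs, best first.
    """
    scores = [(name, log_likelihood(image_features, model))
              for name, model in concept_models.items()]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_k]
```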
Training Process
Automatic Annotation Process
Model: 2-D MHMM
- Represent an image by local features extracted at multiple resolutions.
- Model the feature vectors and their inter- and intra-scale dependence.
- The 2-D MHMM finds "modes" of the feature vectors and characterizes their spatial dependence.
2-D HMM
- Regard an image as a grid; a feature vector is computed at each node.
- Each node is in a hidden state.
- The states are governed by a Markov mesh (a causal Markov random field).
- Given its state, a node's feature vector is conditionally independent of the other feature vectors and follows a normal distribution.
- The states are introduced to model the spatial dependence among feature vectors efficiently.
- The states are not observable, which makes estimation difficult.
2-D HMM
- The underlying states are governed by a Markov mesh, with the causal (raster-scan) order defined by (i',j') < (i,j) if i' < i, or if i' = i and j' < j (formalized below).
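As formulas (standard 2-D HMM notation, introduced here rather than taken from the slide): writing s_{i,j} for the hidden state and f_{i,j} for the feature vector at node (i,j), the Markov mesh and the Gaussian emission assumptions read

\[
P\bigl(s_{i,j} \mid s_{i',j'},\; (i',j') < (i,j)\bigr)
  \;=\; P\bigl(s_{i,j} \mid s_{i-1,j},\, s_{i,j-1}\bigr),
\qquad
f_{i,j} \mid s_{i,j} = l \;\sim\; \mathcal{N}(\mu_l, \Sigma_l).
\]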
2-D MHMM
- An image is represented as a pyramid grid.
- A Markovian dependence is assumed across resolutions.
- Given the state of a parent node, the states of its child nodes follow a Markov mesh whose transition probabilities depend on the parent state.
2-D MHMM
- First-order Markov dependence across resolutions.
2-D MHMM
- The child nodes at resolution r of node (k,l) at resolution r-1:
- Conditional independence given the parent state:
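In the standard 2-D MHMM formulation (notation introduced here; the superscript indexes the resolution), these are

\[
\mathcal{D}^{(r)}(k,l) \;=\; \bigl\{(2k,2l),\ (2k,2l+1),\ (2k+1,2l),\ (2k+1,2l+1)\bigr\},
\]
\[
P\Bigl(\bigl\{s^{(r)}_{i,j} : (i,j) \in \mathcal{D}^{(r)}(k,l)\bigr\}_{(k,l)} \Bigm| \bigl\{s^{(r-1)}_{k,l}\bigr\}_{(k,l)}\Bigr)
  \;=\; \prod_{(k,l)} P\Bigl(s^{(r)}_{i,j},\ (i,j) \in \mathcal{D}^{(r)}(k,l) \Bigm| s^{(r-1)}_{k,l}\Bigr),
\]

i.e., given the parent states at resolution r-1, the state configurations of different child blocks at resolution r are conditionally independent.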
Annotation Process
- Rank the categories by the likelihood of the image to be annotated under each category's profiling 2-D MHMM.
- Select annotation words from those used to describe the top-ranked categories:
  - Statistical significance is computed for each candidate word (sketched below).
  - Words that are unlikely to have appeared by chance are selected.
  - This favors the selection of rare words.
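A minimal sketch of one way such a significance test can be set up, assuming a hypergeometric chance model: the p-value of a word that annotates m of the M training concepts and appears in j of the k top-ranked categories (the exact criterion used by ALIP is described in the PAMI paper cited below; this is only an illustration):

```python
from scipy.stats import hypergeom

def word_significance(j, k, m, M):
    """P-value that a word appears at least j times in k randomly chosen categories.

    j: number of top-ranked categories annotated with the word
    k: number of top-ranked categories examined
    m: number of categories in the whole database annotated with the word
    M: total number of categories (e.g., 600 concepts)
    A small p-value means the word is unlikely to have appeared by chance,
    which automatically favors rare words.
    """
    return hypergeom.sf(j - 1, M, m, k)

# Example: a word used by 10 of 600 concepts that shows up in 3 of the top 8 categories.
print(word_significance(j=3, k=8, m=10, M=600))
```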
Initial Experiment
- 600 concepts, each trained with 40 images
- 15 minutes of Pentium CPU time per concept; training is done only once
- Highly parallelizable algorithm (see the sketch below)
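Since each concept is profiled independently, training parallelizes trivially; a minimal sketch with Python's multiprocessing, reusing the training_db list from the earlier sketch (train_mhmm is a hypothetical per-concept training routine, not ALIP code):

```python
from multiprocessing import Pool

def train_mhmm(image_paths):
    """Placeholder for the per-concept 2-D MHMM training routine (hypothetical)."""
    raise NotImplementedError

def train_one(concept):
    # Each concept is profiled independently, so training jobs do not interact.
    return concept.name, train_mhmm(concept.image_paths)

if __name__ == "__main__":
    with Pool() as pool:
        # One ~15-minute training job per concept; jobs run concurrently across cores.
        models = dict(pool.map(train_one, training_db))
```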
Preliminary Results
Computer prediction:
- people, Europe, man-made, water
- building, sky, lake, landscape, Europe, tree
- food, indoor, cuisine, dessert
- snow, animal, wildlife, sky, cloth, ice, people
- people, Europe, female
More Results
Results: Using Our Own Photographs
- P: photographer annotation
- Underlined words: words predicted by the computer
- (Parentheses): words not in the computer's learned "dictionary"
Systematic Evaluation
- 10 classes: Africa, beach, buildings, buses, dinosaurs, elephants, flowers, horses, mountains, food.
600-Class Classification
- Task: classify a given image into one of the 600 semantic classes
- Gold standard: the photographer/publisher classification
- This procedure gives a lower bound on the accuracy measures because:
  - There can be semantic overlap among classes (e.g., "Europe" vs. "France" vs. "Paris", or "tigers I" vs. "tigers II")
  - Training images in the same class may not be visually similar (e.g., the "sport events" class includes different sports and different shooting angles)
- Result: on 11,200 test images, ALIP selected the exact class as its best choice 15% of the time
  - Since random guessing over 600 classes would be correct only 1/600 ≈ 0.17% of the time, ALIP performs roughly 90 times better than random selection (0.15 / (1/600) = 90)
More Information
- J. Li and J. Z. Wang, "Automatic linguistic indexing of pictures by a statistical modeling approach," IEEE Transactions on Pattern Analysis and Machine Intelligence, 25(9):1075-1088, 2003.
Conclusions
- SIMPLIcity system
- Automatic Linguistic Indexing of Pictures (ALIP):
  - Highly challenging
  - Much more to be explored
  - Statistical modeling has shown some success.
Future Work
- Explore new methods for better accuracy:
  - refine statistical modeling of images
  - refine matching schemes
- Apply these methods to:
  - learning from 3-D medical images
  - special image databases
  - very large databases
- Integration with large-scale information systems
- …