Transcript Document

IPAL at ImageClef 2007
Mixing Features, Models
and Knowledge
Sheng Gao
IPAL French-Singaporean Joint Lab
Institute for Infocomm Research
Singapore
IPAL at ImageClef’07

Team members
Jean-Pierre Chevallet, IPAL & CNRS,
France (photo & medical)
 Thi Hoang Diem Le, I2R, Singapore (photo
& medical)
 Trong Ton Pham, I2R, Singapore (photo)
 Joo Hwee Lim, I2R, Singapore (photo)

Outline

Ad-hoc photographic image retrieval
(ImageCLEFphoto)






Content based image retrieval (CBIR) system
Text based information retrieval (TBIR) system
Mix-modality retrieval system
Benchmark results
Summary
Ad-hoc medical image retrieval (ImageCLEFmed)



UMLS based medical retrieval system
Benchmark results
Summary
ImageCLEFphoto’07 vs.ImageCLEFphoto’06

Less text information is available in
ImageCLEFphoto’07.
Image text annotations: notes are excluded this year.
Query: only annotation in the title field are used.


Visual information plays more important role in
ImageCLEFphoto’07 than ImageCLEFphoto’06.
Query image samples are excluded from the image
database.
244 new images are added in the image database.
Similar text annotation, different
visual representation
01/1311
Title:
Accommodation
Huanchaco - Exterior View

01/1310
Title:
Accommodation
Huanchaco - Interior View
Visual evidence plays a critical role
in the case.
Low-level visual features







COR: Auto color correlogram, 324-dimension.
HSV: 166-dimensional histogram in HSV(162dimension,18*3*3) and gray image (4-dimension).
GABOR: 48-dimension including means and variances at
2-scale and 12-orientations in 5x5 grids.
SIFT: 128-dimensional appearance feature.
EDGE: Canny edge histogram, 80-dimension (6orientations * 16 patches).
HSV_UNI: 96-dimensional HSV histogram, 32-bin per
channel.
GABOR_global: 60-dimension including means and
variances at 5-scale and 6-orientations in whole image.
Indexing and similarity measure

Indexing
tf-idf: index histogram, each bin treated as a word.
SVD: indexing at the eigen-space (80% eigenvectors are kept).
ISM: Integrated Statistic Model (ISM) based supervised learning is
used to learn ranking function (refer to S. Gao, et al., ACM
Multimedia’07).
HME: Hidden Maximum Entropy (HME) based supervised
learning is used for ranking function (refer to S. Gao, et al.
ICME’07 and ICIP’07).
WORD_SVD: index at intermediate concepts
200 frequent keywords are intermediate concepts.
HME models are trained for 200 concepts.
Indexing image at 200-dim concept space.
WORD_ISM: ISM is used for ranking function at 200-dim feature.
BoV: index with the bag-of-visterm (1000 visterms).
Indexing and similarity measure

Similarity measure
Cosine distance if the image is indexed by one feature
vector, e.g. tf-idf, SVD, etc.
Likelihood ratio between the positive and negative
models, if supervised learning is used, e.g. ISM, HME.
CBIR system

Example fusion structure
Visual feature
COR
Indexing and ranking
Tf-idf
SVD
HSV
GABOR
SIFT
Fusion
cor
ISM
hsv
HME
gabor
WORD_SVD
WORD_ISM
Fusion
sift
Fusion
system
TBIR system

Process XML annotation files
XML
reader
Remove
stop words
Stemming
Lexicon
TBIR system

Language model based IR: a document-dependent
language model is estimated for each document in the
database.
Estimate
LM
P(w1|D)
P(w2|D)
……
P(wn|D)
Lexicon

Ranking according to the probability of the query, Q, is
generated by the document-dependent LM.
P  Q D    P  qi D 
i
Q: q1,q2,……,qm
TBIR system

Latent Semantic Indexing (LSI) based IR
Term-document
matrix
SVD
Eigenspace
Lexicon
Eigenspace

Index in eigen-space
Cosine distance is used for similarity measure in LSI
space.
TBIR system

Access Wikipedia to extract external knowledge
for query / document expansion.
4,881,983 pages are downloaded (Wikipedia in
English, April 2, 2007 ).
23,399 animal terms are extracted, which are useful
for queries 5, 20 and 35.
709 geographical terms are extracted, of which only
mountain should be useful for queries 4 and 44.
Mix-modality system

Linear combination.
CBIR
TBIR

w
1-w
Mix-system
Cross-modality pseudo-relevance feedback (PRF)
Query
words
Query
expansion
TBIR
Top N image document
CBIR
One scheme: CBIR to boost TBIR
Results


27 runs are submitted including CBIR, TBIR and mixmodality runs.
Best run MAP: 0.2833
6th place among 476 runs; 2nd place among automatic runs.
IPAL_04V_12RUNS WEIGHT: combine 12 CBIR runs using the
empirically tuned weights.
IPAL_11TrV_LM_12RUNSVISUAL: LM-based TBIR plus the
PRF.

CBIR best run MAP: 0.1204 without PRF, 4th place
among CBIR.
1st run from INAOE (MAP: 0.1925) and the 2nd run from XRCE
(MAP: 0.1890)
Results

Our TBIR best run MAP: 0.1806 with automatic
feedback, 7th place among TBIR.
19TiV_WTmM S0M2D0.8C6T6: with a very small
thesaurus manually extracted from Wikipedia. It is
terms that are not from info boxes but are relevant to
this collection.
Using a black and white image detector based on HSV
value of image.
Run15: LM, document expansion with automatic
Wikipedia.

Top TBIR runs’ MAP: 0.2020 without feedback
(Budapest) and 0.2075 with feedback (XRCE).
PRF analysis

CBIR system
Feedback from the TBIR has few effect.
MAP of the HSV-based CBIR (run 02) is only
increased to 0.0693 from 0.0684 (run 01).
Combining 12 CBIR PRF runs, MAP is increased to
0.1358 (run 05) from 0.1204 (run 04).

TBIR system
Feedback from CBIR significantly improve MAP.
MAP of the LM-based TBIR is 0.1377 (run 08). With
PRF (run 04), MAP reaches 0.2442 (run 11).
Summary on ImageCLEFphoto



Combing rich visual content representations and
indexing techniques significantly improve the
CBIR system comparing with any individual
visual system.
CBIR based pseudo-relevance feedback
significantly boost text based search system.
Exploiting external knowledge such as Wikipedia
gives an extra bonus, however, it is less effective
than expected. Its large size causes confusion due
to lake of disambiguation.
ImageCLEFmed - Bayesian network based approach

Conceptualization:
Knowledge base: UMLS Metathesaurus (NLM).
Images / texts
French
German
English
q
TreeTagger
c1
XIotamap
UMLS
Metamap
...
c2
cm
d
Concepts
cj
cn
ImageCLEFmed - Bayesian network based approach
q

Retrieval process
Document D observed:
P(D)=1
P(c, D)  P(c | D)  P( D) 
c1
wc
2
w
 c'
c 'D
c, wc  tf  idf
...
c2
cm
d
cj
cn
ImageCLEFmed - Bayesian network based approach
Inference via semantic links from document
concept nodes to query concept nodes
P(cq )  max i ( P(cq | pai (cq ))  P( pai (cq )))
 max i (  P( pai (cq )))
L: Maximum length of UMLS taxonomy
l: minimal length of path between 2 concepts
pa(c): document concept nodes which are parent
nodes of c
ImageCLEFmed - Bayesian network based approach
Relevance status value, RSV(q,d) : belief at q
RSV (q, d )  bel (q ) 
w
ci q
ci
P (ci )
w
ci q
ci
Results
Run
Isa
PARCHD
BRRN
RL
RQ
Map
R-Prec
IPAL-TXT-BAYISA0.1
0.1
0
0
0
0
0.3057
0.332
IPAL-TXT-BAYALLREL2
0.2
0.01
0.01
0
0
0.3042
0.333
IPAL-TXT-BAYALLREL1
0.2
0.01
0.01
0.001
0.001 0.304
0.3338
IPAL-TXT-BAYISA0.2
0.2
0
0
0
0
0.3039
0.3255
IPAL-TXT-BAYISA0.3
0.3
0
0
0
0
0.2996
0.3212
IPAL-TXT-BAYISA0.4
0.4
0
0
0
0
0.2935
0.3177
Summary on ImageCLEFmed



Bayesian model approach exploits semantic
relationship between documents concepts and
query concepts in an unified framework.
It enhances the VSM by using the semantic
relatedness between concepts.
Improvements on relationship weighting issue as
well as performance of model are our further
study.