
A Dynamic Probabilistic
Multimedia Retrieval Model
Tzvetanka I. Ianeva
Arjen P. de Vries
Thijs Westerveld
ICME 2004
Introduction
• Video Representation schemes used for
retrieval:
– Static
– Spatio-temporal
• Video is a temporal medium, so a ‘good’
model should address the limitations of
keyframe-based shot representation
Spatio-temporal grouping
• Spatial priority and tracking of regions
from frame to frame
• Joint spatial and temporal segmentation
– Human vision finds salient structures jointly in space and
time (Gepshtein and Kubovy, 2000)
Motivation
• Pursue video retrieval instead of image
(keyframe) retrieval
• Extension of the Static Probabilistic
Multimedia Retrieval model (2003)
• GMM in DCT-space-time domain
– Diagonal covariance
Static Model
[Figure: documents (Docs) and their models (Models)]
• Indexing
– Estimate Gaussian Mixture Models from
images using EM
– Based on feature vector with colour,
texture and position information from
pixel blocks
– Fixed number of components
Static Model
• Indexing
– Estimate a Gaussian
Mixture Model from each
keyframe (using EM)
– Fixed number of
components (C=8)
– Feature vectors contain
colour, texture, and
position information from
pixel blocks:
< x,y,DCT >
Static Model
• Retrieval
– Calculate conditional probabilities of
query samples given models in collection
[Figure: query Q scored against the models in the collection:
P(Q|M1), P(Q|M2), P(Q|M3), P(Q|M4)]
Dynamic Model
• Selecting frames
– 1 second sequence around the keyframe
– Entire video shot as sequence of frames
sampled at regular intervals
• Features
< x, y, t, DCT >
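The feature layout above can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the input layout (a list of frames, each a list of `(x, y, dct)` block tuples) and the function names are assumptions made for the example, and placing a lone frame at t = 0.5 is likewise an assumption.

```python
def normalized_times(n_frames):
    """Map frame indices 0..n_frames-1 onto timestamps in [0, 1]."""
    if n_frames == 1:
        return [0.5]  # assumption: a lone frame sits mid-interval
    return [i / (n_frames - 1) for i in range(n_frames)]


def dynamic_features(frames):
    """Build <x, y, t, DCT> vectors from per-frame block descriptors.

    `frames` is a list of frames; each frame is a list of
    (x, y, dct_coefficients) tuples, one per pixel block.
    This input layout is hypothetical.
    """
    times = normalized_times(len(frames))
    vectors = []
    for t, blocks in zip(times, frames):
        for x, y, dct in blocks:
            vectors.append([x, y, t] + list(dct))
    return vectors


# Three sampled frames, each with one block carrying two DCT coefficients.
frames = [
    [(0.25, 0.75, [10.0, 1.0])],
    [(0.25, 0.75, [11.0, 1.5])],
    [(0.25, 0.75, [12.0, 2.0])],
]
features = dynamic_features(frames)
```

The same function covers both frame-selection options: only the list of frames passed in changes.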
Dynamic Model
• Indexing:
– GMM of multiple frames around keyframe
– Feature vectors extended with timestamp
normalized in [0,1]: <x,y,t,DCT>
[Figure: frames placed on a normalized time axis (0, .5, 1)]
Query example: A single
image
• Artificial sequence of 29 images as the single
query example where the time is normalized
between 0 and 1
• Extend the query example image’s features
with a fixed temporal feature value of 0.5
– Better results and lower computational cost
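The two query options can be compared in a small sketch. It assumes static block features arrive as `(x, y, dct)` tuples; the function names are hypothetical, and only the 29-image count and the fixed value 0.5 come from the slide.

```python
def single_image_features(blocks, t=0.5):
    """Extend static <x, y, DCT> features with a fixed temporal value."""
    return [[x, y, t] + list(dct) for x, y, dct in blocks]


def artificial_sequence_features(blocks, n_copies=29):
    """Repeat the image n_copies times with timestamps spread over [0, 1]."""
    vectors = []
    for i in range(n_copies):
        t = i / (n_copies - 1)
        vectors.extend([x, y, t] + list(dct) for x, y, dct in blocks)
    return vectors


# Hypothetical query image with two pixel blocks.
query_blocks = [(0.0, 0.0, [5.0, 0.5]), (0.5, 0.0, [6.0, 0.25])]
fixed = single_image_features(query_blocks)
sequence = artificial_sequence_features(query_blocks)
```

The fixed-value variant yields one sample per block, while the artificial sequence yields 29 times as many samples to score, which is consistent with the lower computational cost noted above.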
Dynamic Model Advantages
• More training data for models
– Less sensitive to random initialization
• Reduced dependency upon selecting
appropriate keyframe
• Some spatio-temporal aspects of shot are
captured
– (Dis-)appearance of objects
Dynamic Model
[Three figure-only slides; images not preserved in the transcript]
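Retrieval Framework
• Smoothing
  RSV(\omega_i) = \frac{1}{N} \sum_{j=1}^{N} \log\left[ \kappa P(x_j \mid \omega_i) + (1 - \kappa) P(x_j) \right]
• Building dynamic GMMs
  P(x \mid \omega_i) = \sum_{c=1}^{N_C} P(C_{i,c}) \, G(x, \mu_{i,c}, \Sigma_{i,c})
  G(x, \mu, \Sigma) = \frac{1}{\sqrt{(2\pi)^n |\Sigma|}} \, e^{-\frac{1}{2} (x - \mu)^T \Sigma^{-1} (x - \mu)}
Likelihood goes to infinity ???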
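A minimal sketch of the smoothed score, assuming diagonal covariances as stated on the Motivation slide. The mixing weight `kappa=0.9`, the constant background density, and the toy models are all hypothetical values chosen for illustration; the slides do not give them.

```python
import math


def diag_gaussian(x, mean, var):
    """Gaussian density with diagonal covariance (var holds the diagonal)."""
    log_density = 0.0
    for xi, mi, vi in zip(x, mean, var):
        log_density += -0.5 * (math.log(2.0 * math.pi * vi) + (xi - mi) ** 2 / vi)
    return math.exp(log_density)


def mixture_density(x, components):
    """P(x | w_i) for a model given as a list of (prior, mean, var)."""
    return sum(p * diag_gaussian(x, m, v) for p, m, v in components)


def rsv(samples, components, background, kappa=0.9):
    """Average smoothed log-likelihood of the query samples under one model.

    background(x) returns P(x), the collection-wide density.
    """
    total = 0.0
    for x in samples:
        total += math.log(kappa * mixture_density(x, components)
                          + (1.0 - kappa) * background(x))
    return total / len(samples)


# Toy single-component models and a constant background (all hypothetical).
model_a = [(1.0, [0.0, 0.0], [1.0, 1.0])]
model_b = [(1.0, [5.0, 5.0], [1.0, 1.0])]
query = [[0.1, -0.1], [0.0, 0.2]]
score_a = rsv(query, model_a, lambda x: 0.1)
score_b = rsv(query, model_b, lambda x: 0.1)
```

Because of the background term, a model that assigns almost zero density to a sample still receives a finite score rather than minus infinity.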
Experimental Set-up
• Build models for each shot
– Static, Dynamic, Language
• Build Queries from topics
– Construct simple keyword text query
– Select visual example
– Rescale and compress example images to
match video size and quality
Combining Modalities
• Independence assumption textual/visual
– P(Qt,Qv|Shot) = P(Qt|LM) * P(Qv|GMM)
• Combination works if both runs useful
[CWI:TREC:2002]
• Dynamic run more useful than static run

Run            MAP
ASR only       .130
Static only    .022
Static+ASR     .105
Dynamic only   .022
Dynamic+ASR    .132
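Under the independence assumption, the joint score is a product of the two per-modality probabilities, or a sum in log space. A small sketch with made-up probabilities (the shot names and values are hypothetical, not results from the experiments):

```python
import math


def combined_log_score(log_p_text, log_p_visual):
    """log P(Qt,Qv|Shot) = log P(Qt|LM) + log P(Qv|GMM)."""
    return log_p_text + log_p_visual


def rank_shots(scores):
    """scores maps shot id -> (log P(Qt|LM), log P(Qv|GMM)); best first."""
    return sorted(scores,
                  key=lambda shot: combined_log_score(*scores[shot]),
                  reverse=True)


# Hypothetical per-shot probabilities for one topic.
scores = {
    "shot_a": (math.log(0.020), math.log(0.001)),
    "shot_b": (math.log(0.010), math.log(0.004)),
    "shot_c": (math.log(0.001), math.log(0.002)),
}
ranking = rank_shots(scores)
```

Here `shot_b` ranks first even though `shot_a` has the better text score, because the visual evidence shifts the product.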
Dynamic: Higher initial precision
[Figure: static run vs. dynamic run]
Dow Jones Topic (120)
• Text query: “Dow Jones Industrial Average
rise day points”
[Figure: text query + visual example = combined result; images not preserved]
Conclusions
• Dynamic model captures visual
similarity better
– Spatio-temporal aspects
– More training data
– Appropriate keyframe less critical
– Less sensitive to the random initialization
• ASR + dynamic better than either alone
Future work
• More data needs more computational effort
– Optimizations?
• Avoid the singular solutions
– Dynamic number of components?
• Full covariance in space-time < x,y,t >
• Integration of audio
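The singular-solution issue in the list above can be illustrated in one dimension: when a component's mean sits exactly on a sample, its log-density grows without bound as the variance shrinks. The variance floor shown here is one common remedy and is an assumption of this sketch, not something the slides propose; the floor value is arbitrary.

```python
import math


def log_gaussian_1d(x, mean, var):
    """Log-density of a one-dimensional Gaussian."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def floored(var, floor=1e-3):
    """Clamp a variance from below (hypothetical floor value)."""
    return max(var, floor)


variances = (1e-2, 1e-6, 1e-12)
# Component centred exactly on the sample x = 0.3.
unbounded = [log_gaussian_1d(0.3, 0.3, v) for v in variances]
bounded = [log_gaussian_1d(0.3, 0.3, floored(v)) for v in variances]
```

With the floor in place the log-density stops increasing once the variance drops below the floor.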
Thanks !!!
Merging Run Results
• Combining (conflicting) examples difficult
[CWI:TREC:2002]
• Single example → Miss relevant shots
• Round-Robin Merging
[Figure: two ranked lists (1 to 10 each) interleaved into a
combined list: 1, 1, 2, 2, 3, 3, 4, 4, ...]

Examples    MAP     MAP +ASR
Single      .022    .132
All         .031    .149
Selected    .039    .151
Best        .050    .155
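Round-robin merging of the per-example rankings can be sketched as below. Skipping a shot that an earlier list already contributed is an assumption of this sketch; the slides do not say how duplicates are handled. The shot ids are hypothetical.

```python
def round_robin_merge(ranked_lists, limit=None):
    """Interleave ranked lists: rank 1 of each list, then rank 2, and so on."""
    merged, seen = [], set()
    depth = max((len(ranking) for ranking in ranked_lists), default=0)
    for rank in range(depth):
        for ranking in ranked_lists:
            if rank < len(ranking) and ranking[rank] not in seen:
                seen.add(ranking[rank])  # assumption: keep first occurrence only
                merged.append(ranking[rank])
    return merged if limit is None else merged[:limit]


# Rankings produced by two different visual examples for the same topic.
run_a = ["s1", "s2", "s3"]
run_b = ["s4", "s1", "s5"]
combined = round_robin_merge([run_a, run_b])
```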
Conclusions
• Visual aspects of an information need
are best captured by using multiple
examples
• Combining results for multiple (good)
examples in round-robin fashion, each
ranked on both modalities, gives near-best
performance for almost all topics