Video indexing and retrieval at TREC 2002

Christian Wolf (1)
David Doermann (2)
[email protected]
[email protected]

(1) Laboratoire de Reconnaissance de Formes et Vision
Institut National des Sciences Appliquées de Lyon
Bât. Jules Verne, 20, Avenue Albert Einstein
69621 Villeurbanne cedex, France

(2) Laboratory for Language and Media Processing
Institute for Advanced Computer Studies
University of Maryland
College Park, MD 20742-3275, USA
Plan of the presentation

• Introduction - the TREC competition
• Features & query techniques
• Experiments & results
  - Run types
  - Example queries
  - The impact of speech/text/color
• Conclusion and outlook
The NIST Text REtrieval Conference (TREC)

The goal of the conference series is to encourage research in information retrieval from large amounts of text by providing
• a large test collection
• uniform scoring procedures
• a forum for organizations interested in comparing their results

The Video Retrieval Track aims at the investigation of content-based retrieval from digital video: 68.45 hours of MPEG-1 video from the Internet Archive and the Open Video Project.
Aims and tasks

Three subtasks are defined in the Video Track, and participants are free to choose for which tasks they want to submit results:
• Shot boundary determination
• Feature extraction
• Search

Collections:
• Feature development collection (23.6 h)
• Feature test collection (5 h)
• Search test collection (40.12 h)
Search: different query types

Two different query types are supported by the competition: manual and interactive queries.
Example search topics

Find shots with Eddie Rickenbacker in them
Find additional shots with James H. Chandler
Find pictures of George Washington
Find shots with a depiction of Abraham Lincoln
Find shots of people spending leisure time at the beach, for example: walking,
Find shots of one or more musicians: a man or woman playing a music instrument with instrumental music audible. Musician(s) and instrument(s) must be at least partly visible sometime during the shot.
Find shots of football players
Find shots of one or more women standing in long dresses. Dress should be one piece and extend below knees. The entire dress from top to end of dress below knees should be visible at some point.
Find shots of the Golden Gate Bridge
Find shots of Price Tower, designed by Frank Lloyd Wright and built in Bartlesville, Oklahoma.
Find shots containing Washington Square Park's arch in New York City. The entire arch should be visible at some point.
Find overhead views of cities - downtown and suburbs. The viewpoint should be higher than the highest building visible.
Find shots of oil fields, rigs, derricks, oil drilling/pumping equipment. Shots just of refineries are not desired.
Find shots with a map (sketch or graphic) of the continental US.
Find shots of a living butterfly
Find more shots with one or more snow-covered mountain peaks or ridges. Some sky must be visible behind them.
The feature extraction task: overlay text

Processing steps:
• Detection
• Multiple frame integration
• Binarization
• Separation of characters into 4 types: upper case (A-Z), lower case (a-z), digits (0-9), bad (rest)

A linear classifier trained with Fisher's linear discriminant is used to classify the OCR output for each text box into text and non-text.
Text examples (raw OCR output):
TONY RIYERA
ARNOLD GILLESPIE
EUGENE PODDANY
EMERY NAWKúN5
GEORGE GORDON
GERALD NEYIU
D i recto r
TRUE BOAROMAN
CARL URBAN
Art Direction
EMERY NAWKINS
Music Score
Director
GEORGE GORDON
l E W K E LLER
PRODUCTION
a yen Pu s1c~
Non-text examples (raw OCR output):
. .a
i ~ i
a 7) E nAl~
1
I.
Mol, 6
I J'-N
r
~v
i
r low
r
e,740~17-j
F 00
Ii
s
!'/
Features (used for the suppression of false alarms):
OCR: Scansoft
Example text box: "Soukaina Oufkir"

F1 = number of good characters (upper + lower + digits) / number of characters
F2 = number of class changes / number of characters
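A minimal sketch of how these two features could be computed and combined into a linear text/non-text decision; the discriminant weights and threshold below are illustrative placeholders, not the values trained with Fisher's discriminant in the actual system.

```python
def ocr_features(s):
    """Compute F1 (fraction of good characters) and F2
    (class changes per character) for an OCR string."""
    chars = [c for c in s if not c.isspace()]
    n = len(chars)
    if n == 0:
        return 0.0, 0.0

    def char_class(c):
        if c.isupper():
            return "upper"
        if c.islower():
            return "lower"
        if c.isdigit():
            return "digit"
        return "bad"

    classes = [char_class(c) for c in chars]
    good = sum(cl != "bad" for cl in classes)
    changes = sum(a != b for a, b in zip(classes, classes[1:]))
    return good / n, changes / n

# A real text box vs. a non-text false alarm:
print(ocr_features("Soukaina Oufkir"))   # high F1, few class changes
print(ocr_features("a 7) E nAl~"))       # low F1, many class changes

# Hypothetical linear decision; Fisher's discriminant would yield
# the projection weights w and threshold t from training data.
w, t = (1.0, -1.0), 0.5                  # illustrative values only
f1, f2 = ocr_features("EMERY NAWKINS")
is_text = (w[0] * f1 + w[1] * f2) > t
```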
Features

Search test collection (40 h), 14524 shots; the shot boundary definition is given as MPEG-7 XML.

Donated features (MPEG-7 XML): 10 different binary features from different donors (32 detectors in all); a confidence value is given for each shot. Among them:
• Outdoors (IBM, Mediamill, MSRA)
• Face (IBM, Mediamill, MSRA)
• Speech recognition (LIMSI, MSRA)

Detected and recognized text: developed by INSA de Lyon [Wolf and Jolion, 2002].

Temporal color correlograms: developed by UMD in collaboration with the University of Oulu [Rautiainen and Doermann, 2002].
Query techniques

A query combines four sources of evidence: recognized text, speech, binary features, and temporal color features.
Recognized text and speech

For the actual retrieval we used the freely available Managing Gigabytes (MG) software (http://www.cs.mu.oz.au/mg). Two query metrics are available:
• Boolean
• Ranked, based on the cosine measure.

MG has been written for error-free documents, so it checks for exact matches on the stemmed words (e.g. "produced" matches "producer").
We added an inexact match feature by using N-grams:

Target:   "Nick Chandler"
Query:    "chandler"
N-grams:  chand|handl|andle|ndler|chandl|handle|andler|chandle|handler|chandler
Results:  "ni ck l6 tia ndler"
          "colleges cattlemen handlers of livestock"
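A minimal sketch of the N-gram idea, assuming the query word is expanded into all character substrings of at least a minimum length so that damaged OCR/ASR tokens can still match on shared fragments; the function names are illustrative, not MG's API.

```python
def char_ngrams(word, n_min=5):
    """All character n-grams of length n_min .. len(word),
    e.g. 'chandler' -> chand, handl, ..., handler, chandler."""
    word = word.lower()
    return {word[i:i + n]
            for n in range(n_min, len(word) + 1)
            for i in range(len(word) - n + 1)}

def ngram_match(query_word, vocabulary, n_min=5):
    """Score each vocabulary word by the number of n-grams it
    shares with the query word (no entry = no inexact match)."""
    q = char_ngrams(query_word, n_min)
    return {w: len(q & char_ngrams(w, n_min))
            for w in vocabulary
            if q & char_ngrams(w, n_min)}

# Damaged OCR output still matches on the 'ndler' fragment:
print(ngram_match("chandler", ["ndler", "handlers", "nick"]))
# -> {'ndler': 1, 'handlers': 6}
```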
Binary features

The binary features specify the presence of a feature in each shot, the information being given as a confidence measure in [0,1].

Training the combining classifier: a sample x is fed to the donated base classifiers (People - IBM, People - Mediamill, People - MSRA, Outdoors - IBM, Outdoors - Mediamill, Outdoors - MSRA, ...), whose outputs are combined.

The product rule: Q_i(x) = ∏_j C_ij(x)
Quantifies the true likelihood if the features are statistically independent. Bad if base classifiers are weakly trained or have high error rates.

The sum rule: Q_i(x) = Σ_j C_ij(x)
Works well with base classifiers with independent noise behaviour.

C_ij ... output of classifier j for class i
Q_i  ... output of combined classifier for class i
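A minimal sketch of the two combination rules, assuming each donated detector returns a confidence in [0, 1] per shot; the numbers are made up for illustration.

```python
from math import prod

def product_rule(confidences):
    """Q_i(x) = prod_j C_ij(x): the true likelihood under statistical
    independence, but one near-zero detector output vetoes the shot."""
    return prod(confidences)

def sum_rule(confidences):
    """Q_i(x) = sum_j C_ij(x), here averaged: more robust when the
    base classifiers have independent noise behaviour."""
    return sum(confidences) / len(confidences)

# Three "people" detectors on one shot (illustrative values):
people = [0.27, 0.87, 0.94]
print(product_rule(people))  # ~0.22, dominated by the weak 0.27
print(sum_rule(people))      # ~0.69
```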
Binary features - ranked queries

Feature                Query vector   Shot 1   Shot 2
People - IBM           1.0            0.27     0.27
People - Mediamill     1.0            0.87     0.23
People - MSRA          1.0            0.94     0.56
Outdoors - IBM         1.0            0.15     0.15
Outdoors - Mediamill   1.0            0.08     0.76
Indoors - IBM          0.0            0.65     0.07

[Diagram: the 3-dimensional case, query vector (1, 1, 0)]

Euclidean distance:    D(x, y) = sqrt((x - y)^T (x - y))
Mahalanobis distance:  D(x, y) = sqrt((x - y)^T Σ^(-1) (x - y))

Σ ... covariance matrix for the complete data set
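A minimal numpy sketch of ranking shots by their distance to the binary-feature query vector; the data is the toy table above, and in practice the covariance matrix would be estimated from all 14524 shots rather than regularized from two samples.

```python
import numpy as np

# Rows: shots, columns: binary feature confidences
# (People IBM/Mediamill/MSRA, Outdoors IBM/Mediamill, Indoors IBM)
shots = np.array([
    [0.27, 0.87, 0.94, 0.15, 0.08, 0.65],  # shot 1
    [0.27, 0.23, 0.56, 0.15, 0.76, 0.07],  # shot 2
])
query = np.array([1.0, 1.0, 1.0, 1.0, 1.0, 0.0])

def euclidean(x, y):
    d = x - y
    return np.sqrt(d @ d)

def mahalanobis(x, y, cov_inv):
    d = x - y
    return np.sqrt(d @ cov_inv @ d)

# Covariance of the complete data set; a small ridge term keeps
# this toy two-sample estimate invertible.
cov = np.cov(shots, rowvar=False) + 1e-6 * np.eye(shots.shape[1])
cov_inv = np.linalg.inv(cov)

for name, dist in [("Euclidean", lambda s: euclidean(s, query)),
                   ("Mahalanobis", lambda s: mahalanobis(s, query, cov_inv))]:
    order = sorted(range(len(shots)), key=lambda i: dist(shots[i]))
    print(name, ["shot %d" % (i + 1) for i in order])
```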
Temporal color features

For each shot, a temporal color correlogram is stored [Rautiainen and Doermann, 2002]:

γ(d)_{ci,cj} = Pr_{p1 ∈ I^n_{ci}, p2 ∈ I^n} [ p2 ∈ I^n_{cj}  |  |p1 − p2| = d ]

It stores the probability that, given any pixel p1 of color ci, a pixel p2 at distance d is of color cj, taken over the shot's frames I^n. The distance is calculated using the L1 norm.

TREC: auto-correlogram, i.e. ci = cj.
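A minimal sketch of a temporal auto-correlogram over a shot's frames, with colors already quantized to a small palette; a deliberately simplified illustration rather than the exact TREC implementation.

```python
import numpy as np

def temporal_autocorrelogram(frames, n_colors, distances=(1, 3, 5, 7)):
    """frames: list of 2D integer arrays of quantized color indices.
    Returns gamma[c, k]: estimated probability that a pixel at L1
    distance distances[k] from a pixel of color c also has color c."""
    hits = np.zeros((n_colors, len(distances)))
    counts = np.zeros_like(hits)
    for img in frames:
        h, w = img.shape
        ys, xs = np.mgrid[0:h, 0:w]
        for k, d in enumerate(distances):
            # enumerate all offsets (dy, dx) with |dy| + |dx| == d
            for dy in range(-d, d + 1):
                for dx in {d - abs(dy), -(d - abs(dy))}:
                    y2, x2 = ys + dy, xs + dx
                    ok = (y2 >= 0) & (y2 < h) & (x2 >= 0) & (x2 < w)
                    c1 = img[ys[ok], xs[ok]]
                    c2 = img[y2[ok], x2[ok]]
                    for c in range(n_colors):
                        sel = (c1 == c)
                        counts[c, k] += sel.sum()
                        hits[c, k] += (c2[sel] == c).sum()
    return np.divide(hits, counts, out=np.zeros_like(hits),
                     where=counts > 0)

# Toy shot: two 8x8 frames quantized to 4 colors
rng = np.random.default_rng(0)
shot = [rng.integers(0, 4, size=(8, 8)) for _ in range(2)]
print(temporal_autocorrelogram(shot, n_colors=4))
```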
The query tool

[Screenshot of the query tool]
Querying

• Keyword based queries on text or speech or both together, with or without N-grams, boolean or ranked.
• Ranked color queries.
• Ranked queries on binary features.
• Filters on binary features.
• AND, OR combination of query results, incl. weighted combinations of the rankings of both queries (see the formula below).
• Truncate queries.
• View the keyframes of queries.
• Export query results into stardom, the graphical browsing tool.

[Diagram: weighted OR-combination of the ranked result lists of queries 1-4]

Combined score of shot s from its rank r_{s,i} in each single query i, with weights λ_i and result set size N:

m_s = Σ_i λ_i · (1 − (r_{s,i} − 1) / (N − 1))
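A minimal sketch of this weighted rank combination, assuming r_{s,i} is the 1-based rank of shot s in the i-th result list and that shots absent from a list contribute nothing for it; shot ids and weights are illustrative.

```python
def combine_ranked(result_lists, weights):
    """result_lists: ranked lists of shot ids (best first).
    Scores each shot by m_s = sum_i w_i * (1 - (r_si - 1)/(N - 1))
    and returns the shot ids sorted by combined score."""
    scores = {}
    for ranked, w in zip(result_lists, weights):
        n = len(ranked)
        for r, shot in enumerate(ranked, start=1):
            contrib = w * (1.0 - (r - 1) / (n - 1)) if n > 1 else w
            scores[shot] = scores.get(shot, 0.0) + contrib
    return sorted(scores, key=scores.get, reverse=True)

# Text query ranking vs. color query ranking, text weighted higher:
text_hits  = ["s12", "s7", "s33", "s2"]
color_hits = ["s7", "s2", "s41", "s12"]
print(combine_ranked([text_hits, color_hits], weights=[2.0, 1.0]))
# -> ['s7', 's12', ...]
```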
Stardom

[Screenshots of the stardom graphical browsing tool]
Experiments

• Manual run using all available features.
• Manual run without speech recognition.
• Interactive run using all available features. The graphical tool was used to browse the data, but all submitted results were queries submitted by the command line tool.
Topic   min.   Description
75             Eddie Rickenbacker
76      6      James Chandler
77      14     George Washington
78      19     Abraham Lincoln
79      43     People at the beach
80      20     Musicians with music playing
81             Football players
82      29     Women standing in long dresses
83      19     Golden Gate Bridge
84      11     Price Tower
85             Washington Square Park's arch
86      85     Overhead views of cities
87      61     Oil fields, rigs, derricks
88      43     Map of the continental US
89      61     A living butterfly
90      12     Snow covered mountain peaks
91             Parrots
92      20     Sailboats, sailing ship
93      15     Beef or dairy cattle, cows
94      17     People walking in cities
95      15     Nuclear explosion with mushroom
96      39     US flag
97      23     Microscopic views of living cells
98      15     Locomotive approaching
99      19     Rocket or missile taking off
Example queries

“Find additional shots of James H. Chandler”, manual query (topic 76): Prec./100 = 0.2, avg. prec. = 0.38.

Query type           on                 weight
text/speech          James Chandler     4
text/speech          Jim Chandler       4
text/speech N-gram   James Chandler     2
Color                1. Example video   1
Color                2. Example video   1
Color                3. Example video   1

The sub-queries are OR-combined (weight 100000) and ANDed (weight 1) with a binary feature filter: People >= 0.25, Landscape <= 0.75.
“Shots of rockets or missiles taking off”, manual & interactive query (topic 99): Prec./100 = 0.05, avg. prec. = 0.34.

Query type        on                        weight
text/speech       rocket missile            2
text/speech       taking off launch start   2
Color             1. Example video          1
Color             2. Example video          1
Binary features   Ranked: 7000 -People -Faces

The sub-queries are OR-combined (weight 100000) and ANDed (weight 1) with the binary feature query.
Manual vs. interactive queries

Manual query (topic 79): Prec./100 = 0, avg. prec. = 0.

Query type    on                      weight
text/speech   beach                   4
text/speech   beach fun sun           3
text/speech   leisure sand vacation   2
Color         1. Example video        1
Color         2. Example video        1
Color         3. Example video        1
Color         4. Example video        1

OR-combined (weight 100000) and ANDed (weight 1) with binary feature filters:
People >= 0.25, Indoors <= 0.75, Outdoors >= 0.25
PL <= 0.5, OD >= 0.5, CT <= 0.05, ID <= 0.75, LS >= 0.5

Interactive query (topic 79): Prec./100 = 0.07, avg. prec. = 0.11.

Query type    on         weight
text/speech   swimming   2
text/speech   shore      1
text/speech   water      2

OR-combined (weight 100000) and ANDed (weight 1) with a binary feature filter:
Landscape >= 0.3, Cityscape <= 0.5, Outdoors >= 0.5
Ranked binary queries: distance functions

Full query:
Topic   Eucl.   Mah.
82      0.24    0.16
84      0       0
99      0       0

Binary query only:
Topic   Query                                    Eucl.   Mah.
82      +People +Indoors -Outdoors -Landscape    0.24    0.16
84      +Cityscape -People -Face                 0.48    0.06
99      -People -Face                            0.96    0.8

[Example false alarm: the query vector compared with the feature vector of a falsely returned shot]
Distributions of the 3 “people” detectors:

            People IBM   People Mediamill   People MSRA
vars        0.12         0.13               0.06
diff        0            0.16               0.07
std dev.    0            0.12               0.04

[Histogram of the detector confidence values over the collection]
Precision curves per topic

[Plots per topic: precision / result set size and precision / recall, for the manual, manual without ASR, and interactive runs]
Precision curves consolidated

[Plots: precision / result set size and precision / recall, consolidated over topics, for the manual, manual without ASR, and interactive runs]
Comparison with other teams

Average precision - manual runs:

ID   mean   std. dev.
1    0.23   0.14
2    0.14   0.17
3    0.11   0.15
4    0.09   0.12
5    0.09   0.16
6    0.08   0.11
7    0.07   0.20
8    0.07   0.09
9    0.06   0.10
10   0.06   0.18
11   0.06   0.10
12   0.06   0.12
13   0.06   0.09
14   0.06   0.10
15   0.04   0.11
16   0.03   0.08
17   0.03   0.06
18   0.03   0.05
19   0.02   0.05
20   0.01   0.02
21   0.01   0.01
22   0.01   0.01
23   0.01   0.00
24   0.00   0.01
25   0.00   0.01
26   0.00   0.01
27   0.00   0.00
Comparison with other teams

Average precision - interactive runs:

ID   mean   std. dev.
1    0.52   0.24
2    0.32   0.21
3    0.31   0.20
4    0.29   0.21
5    0.26   0.21
6    0.24   0.22
7    0.22   0.23
8    0.18   0.21
9    0.15   0.15
10   0.15   0.15
11   0.07   0.11
12   0.05   0.08
13   0.05   0.08
Speech

The quality of the speech queries depends strongly on the topic. In general, the result sets of speech queries are very heterogeneous and need to be filtered, e.g. by binary filters.

Example: “rocket missile”
Color

As expected, the color filters have been very useful in cases where the query images were very different from other images in terms of low level features, or where the relevant shots in the database share common color properties with the example query (e.g. shots are in the same environment).

Query “living cells”: the results of the run without speech are better than those of the run including speech.
Color

Searching for “James Chandler” using the color features only.
Recognized text

The type of videos present in the collection does not favor the use of recognized text. In most videos, the only text present in the documentaries is the title at the beginning and the credits at the end.

Example keyframes with recognized text: “Music”, “Oil”, “Energy Gas”, “Air plane”, “Airline”, “Dance”
Conclusion and Outlook

• Exploit temporal continuities between the frames, as already proposed by the Dutch team during TREC 2001. This seems to be especially important for video OCR, since sometimes single shots with text only “interrupt” content shots.
• Training of the combination of features.
• More research into the combination of the binary features (normalization, robust outlier detection etc.).
• Browsing: the graphical viewing interface could be very promising if it is possible to integrate tiny (and enlargeable) keyframes into the grid.
• Use of additional features:
  - Explicit color filters and query by (sketched) example: define regions and color ranges.
  - Motion features.
  - Usage of the internet to get example images (Google).