Transcript Document
Nicolas Moënne-Loccoz
Viper group
Computer vision & multimedia laboratory
University of Geneva
Knowledge-based event
recognition from salient
regions of activity
M4 – Meeting – January 2004
January 23 2003 / [email protected]
Outline
• Context
• Salient Regions of Activity (SRA)
• Learning the semantics of SRA
• Visual Event Query language
• Conclusion
NML - CVML - UniGe
Context
• Retrieval of visual events based on user queries
– Abstract representation of the visual content
– Query language to express visual events
• Approach
– Region-based description of the content
– Classification of the regions
– Events queried as spatio-temporal constraints on the regions
Overview
[diagram: videos database → region extraction → salient regions of activity → classification using domain knowledge → labelled regions → user queries]
Salient regions of activity
• Regions of the image space
– Moving in the scene
– Having a homogeneous colour distribution
⇒ Moving objects or meaningful parts of moving objects
• Extraction :
– From moving salient points
– By an adaptive mean-shift algorithm
Salient points extraction
• Scale-invariant interest points (Mikolajczyk, Schmid 2001)
– Extracted in the linear scale-space

  L_{v_i}(v, s) = I(v) * G_{v_i}(s)

– Local maxima of the scale-normalized Harris function (image space)

  h(v, s) = det(H(v, s)) − α Trace²(H(v, s))

             | L_x²(v, s)      L_x L_y(v, s) |
  H(v, s) =  |                               |
             | L_x L_y(v, s)   L_y²(v, s)    |

– Local maxima of the scale-normalized Laplacian (scale space)

  l(v, s) = s² |L_xx(v, s) + L_yy(v, s)|
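The two responses above can be sketched in a few lines of numpy/scipy; this is a minimal illustration, assuming a grayscale image as a 2-D float array, with the Harris constant α and the integration-scale ratio chosen as typical illustrative values (they are not given on the slide):

```python
# Sketch of the scale-normalized Harris and Laplacian responses at one
# scale s; alpha and the integration-scale ratio are assumed values.
import numpy as np
from scipy.ndimage import gaussian_filter

def harris_laplace_responses(image, s, alpha=0.04):
    """Return (h, l): Harris response and scale-normalized Laplacian at scale s."""
    # First derivatives of the scale-space representation L(v, s).
    Lx = gaussian_filter(image, s, order=(0, 1))
    Ly = gaussian_filter(image, s, order=(1, 0))
    # Second-moment matrix entries, smoothed at an integration scale.
    si = 1.4 * s  # assumed integration/differentiation scale ratio
    A = gaussian_filter(Lx * Lx, si)
    B = gaussian_filter(Ly * Ly, si)
    C = gaussian_filter(Lx * Ly, si)
    # Scale-normalized Harris function: det(H) - alpha * Trace(H)^2.
    h = (s ** 2) ** 2 * (A * B - C * C) - alpha * (s ** 2 * (A + B)) ** 2
    # Scale-normalized Laplacian: s^2 * |Lxx + Lyy|.
    Lxx = gaussian_filter(image, s, order=(0, 2))
    Lyy = gaussian_filter(image, s, order=(2, 0))
    l = s ** 2 * np.abs(Lxx + Lyy)
    return h, l
```

Interest points would then be the local maxima of h over the image plane that are also local maxima of l across scales.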
Salient points extraction
• Example: [figure: interest points detected at their characteristic scales]
Salient points trajectories
• Trajectories are used to:
– Find salient points moving in the scene
– Track salient points over time
• Point matching using local grayvalue invariants (Schmid):

  g(w) = ( L,
           L_i L_i,
           L_i L_ij L_j,
           L_ii,
           L_ij L_ji,
           ε_ij (L_jkl L_i L_k L_l − L_jkk L_i L_l L_l),
           L_iij L_j L_k L_k − L_ijk L_i L_j L_k,
           −ε_ij L_jkl L_i L_k L_l,
           L_ijk L_i L_j L_k )

  with Einstein summation over the indices i, j, k, l ∈ {x, y}, where L_i is the
  derivative of L in direction i and ε is the antisymmetric tensor
  (ε_ii = 0, ε_ij = −ε_ji for i, j ∈ {x, y}).
Salient points trajectories
• Mahalanobis distance:

  d(w_i, w_j) = (g(w_i) − g(w_j))ᵀ Λ⁻¹ (g(w_i) − g(w_j))

• The set of matching points minimizes

  Σ d(w_i, w_j)   over   w_i ∈ W_t, w_j ∈ W_{t+1}

– Greedy winner-takes-all algorithm
⇒ Set of point trajectories  T = { (w_i, w_j) | w_i ∈ W_t, w_j ∈ W_{t+1} }
⇒ Moving salient points: the points whose trajectory in T exhibits motion
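The greedy winner-takes-all matching can be sketched as a pass over the pairwise Mahalanobis distances, taking the globally best remaining pair each time; the descriptor matrices and the covariance Λ here are placeholder inputs, not values from the slides:

```python
# Sketch of greedy winner-takes-all matching between the descriptors of
# two consecutive frames under the Mahalanobis distance.
import numpy as np

def greedy_match(G_t, G_t1, Lambda):
    """Match rows of G_t to rows of G_t1, smallest distance first."""
    Linv = np.linalg.inv(Lambda)
    # Pairwise squared Mahalanobis distances d(w_i, w_j).
    diff = G_t[:, None, :] - G_t1[None, :, :]          # shape (n, m, k)
    D = np.einsum('nmk,kl,nml->nm', diff, Linv, diff)
    matches, used_i, used_j = [], set(), set()
    # Winner-takes-all: accept pairs in order of increasing distance,
    # skipping any point that is already matched.
    for i, j in zip(*np.unravel_index(np.argsort(D, axis=None), D.shape)):
        if i not in used_i and j not in used_j:
            matches.append((int(i), int(j)))
            used_i.add(i); used_j.add(j)
    return matches
```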
Salient regions estimation
• Estimate characteristic regions r of the moving salient points
• Mean-shift algorithm: estimate the position v_r of r

  v_r = Σ_v K(v − v_r) P(v) v  /  Σ_v K(v − v_r) P(v)

• Likelihood of pixels (RGB colour distribution):

  P(v) = P(v | w),  with the colour distribution modelled as N(μ_w, Σ_w)

• Ellipsoidal Epanechnikov kernel:

  K(v − v_r) = (3/4) (1 − (v − v_r)ᵀ Σ_r⁻¹ (v − v_r))
Salient regions estimation
• Kernel adaptation step: estimate the shape and size Σ_r of r

  Σ_r = cov_{P(v)}(r)

• Algorithm:

  W ← moving salient points w
  for each w ∈ W
      v_r ← w,  Σ_r ← diag(3s_w, 3s_w)
      repeat
          v_r ← Mean-Shift
          Σ_r ← Kernel Adaptation
      until (v_r, Σ_r) converge
      W ← W \ { w' ∈ W | w' ∈ r }
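The mean-shift / kernel-adaptation loop above can be sketched as follows, assuming pixel positions V (n × 2) with likelihoods P; the factor scaling the weighted covariance is an assumed choice, not taken from the slide:

```python
# Sketch of the adaptive mean-shift iteration: alternate a mean-shift
# step under an ellipsoidal Epanechnikov kernel with a re-estimation of
# the kernel's covariance from the weighted pixels.
import numpy as np

def adaptive_mean_shift(V, P, v_r, Sigma_r, n_iter=20, tol=1e-3):
    for _ in range(n_iter):
        d = V - v_r
        # Epanechnikov kernel: positive only inside the ellipsoid
        # (v - v_r)^T Sigma_r^{-1} (v - v_r) <= 1.
        q = np.einsum('nk,kl,nl->n', d, np.linalg.inv(Sigma_r), d)
        K = np.where(q <= 1.0, 0.75 * (1.0 - q), 0.0)
        w = K * P
        if w.sum() == 0:
            break
        # Mean-shift step: kernel- and likelihood-weighted mean position.
        v_new = (w[:, None] * V).sum(axis=0) / w.sum()
        # Kernel adaptation: weighted pixel covariance (scale 3 assumed
        # so the ellipsoid covers the support of the region).
        d_new = V - v_new
        Sigma_new = 3.0 * np.einsum('n,nk,nl->kl', w, d_new, d_new) / w.sum()
        converged = np.linalg.norm(v_new - v_r) < tol
        v_r, Sigma_r = v_new, Sigma_new
        if converged:
            break
    return v_r, Sigma_r
```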
Salient regions representation
• A salient region of activity is represented by:
– Its position v_r
– Its ellipsoid Σ_r
– Its colour distribution Θ_r = (μ_rgb, Σ_rgb)
– Its set of salient points W_r
• Salient regions tracking
– Regions are matched by a majority vote of their salient points
Salient regions of activity
[figure: examples of extracted salient regions of activity]
Regions classification
• To obtain an abstract description :
– Map regions to a domain-specific basic vocabulary
Meetings : {Arm, Head, Body, Noise}
• SVM classifier:
– Trained on a set of 500 annotated salient regions of activity (~200 frames)
Regions classification
• Confusion matrix:

              Arm     Head    Body    Noise
  Arm         1.000   0       0       0
  Head        0       0.909   0.091   0
  Body        0       0       1.000   0
  Noise       0       0.052   0       0.946
• Discussion:
– The Noise class is ill-defined
– The good results are explained by the limited number of classes
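A row-normalized confusion matrix like the one above can be computed directly from ground-truth and predicted region labels; a minimal numpy sketch, with the label vocabulary mirroring the slides:

```python
# Sketch: build a row-normalized confusion matrix (per-class recognition
# rates) from true and predicted labels.
import numpy as np

LABELS = ["Arm", "Head", "Body", "Noise"]

def confusion_matrix(true_labels, pred_labels):
    idx = {name: k for k, name in enumerate(LABELS)}
    M = np.zeros((len(LABELS), len(LABELS)))
    for t, p in zip(true_labels, pred_labels):
        M[idx[t], idx[p]] += 1
    # Normalize each row; rows with no samples stay zero.
    sums = M.sum(axis=1, keepdims=True)
    return np.divide(M, sums, out=np.zeros_like(M), where=sums > 0)
```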
Visual event language
• To express visual event queries
– Spatio-temporal constraints on labelled regions (LR)
• To integrate domain knowledge
– As a specification of the layout (L)
– As a set of basic events
A formula of the language is a conjunction of:
– Temporal relations: {after, just-after} between 2 LR
– Spatial relations: {above, left} between 2 LR; {in} between a LR and a L
– Identity relations: {is} between 2 LR; {is-a} between a LR and a label
Knowledge - Meetings
• Scene layout : L = {SEATS, DOOR, BOARD}
Knowledge - Meetings
• Basic events : {Meeting-participant, sitting, standing}
Meeting-participant : actor : LR
    constraints : is-a(head, LR).

Sitting : actor : LR
    constraints : Meeting-participant(LR),
                  in(SEATS, LR).

Standing : actor : LR
    constraints : Meeting-participant(LR),
                  ~in(SEATS, LR).
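These basic events can be sketched as plain predicate functions over labelled regions; the dict representation and its field names ("label", "zones") are assumptions made for illustration, not part of the language defined on the slides:

```python
# Sketch of the basic events as Python predicates over a labelled region,
# here a dict with an assumed "label" field (classifier output) and an
# assumed "zones" field (layout zones the region lies in).
def is_a(label, LR):
    return LR["label"] == label

def in_zone(zone, LR):
    return zone in LR["zones"]

def meeting_participant(LR):
    # Meeting-participant: a region labelled as a head.
    return is_a("Head", LR)

def sitting(LR):
    # Sitting: a participant located in the SEATS layout zone.
    return meeting_participant(LR) and in_zone("SEATS", LR)

def standing(LR):
    # Standing: a participant outside the SEATS layout zone.
    return meeting_participant(LR) and not in_zone("SEATS", LR)
```

A user query such as Sitting-down would then conjoin these predicates with the identity and temporal relations over two labelled regions.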
Events queries
• Example of user queries :
Sitting-down : actors LR1, LR2
constraints is(LR1, LR2),
sitting(LR1),
standing(LR2),
just-after(LR1, LR2).
Go-to-board : actors LR1, LR2
    constraints is(LR1, LR2),
                standing(LR1),
                ~in(BOARD, LR1),
                standing(LR2),
                in(BOARD, LR2),
                just-after(LR2, LR1).
Events queries - Results
• Results:

                Precision   Recall
  Sit-down      0.43        1.00
  Stand-up      0.50        1.00
  Go-to-board   1.00        1.00
  Enter         0.20        1.00
  Leave         0.25        0.50
• Discussion:
– The recall validates the retrieval capability
– False alarms occur because of the hard (binary) decisions
Conclusion
• Contributions
– A framework well suited to constrained domains
– Generic representation of the visual content
– Paradigm to retrieve visual events from videos
• Limitations
– Cannot retrieve all visual events (e.g. emotion)
• Ongoing work
– Uncertainty handling and fuzziness
– Integration of other modalities (e.g. transcripts)