
Nicolas Moënne-Loccoz
Viper group
Computer vision & multimedia laboratory
University of Geneva
Knowledge-based event recognition from salient regions of activity
M4 – Meeting – January 2004
January 23 2003 / [email protected]
Outline
• Context
• Salient Regions of Activity (SRA)
• Learning the semantics of SRA
• Visual Event Query language
• Conclusion
NML - CVML - UniGe
Context
• Retrieval of visual events based on user queries
→ Abstract representation of the visual content
→ Query language to express visual events
• Approach
– Region-based description of the content
– Classification of the regions
– Events queried as spatio-temporal constraints on the regions
Overview
[Figure: system overview relating Videos database, Region extraction, Salient regions of activity, Classification, Labelled regions, Domain Knowledge and User queries]
Salient regions of activity
• Regions of the image space
– Moving in the scene
– Having a homogeneous colour distribution
→ Moving objects or meaningful parts of moving objects
• Extraction :
– From moving salient points
– By an adaptive mean-shift algorithm
Salient points extraction
• Scale invariant interest points (Mikolajczyk, Schmid 2001)
– Extracted in the linear scale-space
$L_{v_i}(v, s) = I(v) * G_{v_i}(s)$
– Local maxima of the scale normalized Harris function (image space)
$h(v, s) = \det(H(v, s)) - \alpha \, \mathrm{Trace}^2(H(v, s))$
$H(v, s) = \begin{pmatrix} L_x^2(v, s) & L_x L_y(v, s) \\ L_x L_y(v, s) & L_y^2(v, s) \end{pmatrix}$
– Local maxima of the scale normalized Laplacian (scale space)
$l(v, s) = s^2 \, |L_{xx}(v, s) + L_{yy}(v, s)|$
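The scale-selection step can be sketched as follows. This is a minimal illustration and not the authors' implementation: it assumes that s is the standard deviation of the Gaussian, approximates L_xx + L_yy with a 5-point finite-difference stencil on the smoothed image, and covers only the Laplacian criterion (the Harris criterion is left out). The function names are hypothetical.

```python
import numpy as np

def gaussian_kernel(s):
    """1-D Gaussian of standard deviation s, truncated at 3*s and normalised."""
    radius = int(3 * s)
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x ** 2 / (2.0 * s ** 2))
    return k / k.sum()

def smooth(image, s):
    """Separable Gaussian smoothing: L(v, s) = I(v) * G(s)."""
    k = gaussian_kernel(s)
    rows = np.apply_along_axis(np.convolve, 1, image, k, mode="same")
    return np.apply_along_axis(np.convolve, 0, rows, k, mode="same")

def normalized_laplacian(image, s):
    """l(v, s) = s^2 |L_xx + L_yy|, using a 5-point Laplacian stencil."""
    L = smooth(image, s)
    lap = np.zeros_like(L)
    lap[1:-1, 1:-1] = (L[2:, 1:-1] + L[:-2, 1:-1]
                       + L[1:-1, 2:] + L[1:-1, :-2]
                       - 4.0 * L[1:-1, 1:-1])
    return s ** 2 * np.abs(lap)

def characteristic_scale(image, point, scales):
    """Scale at which the normalised Laplacian is largest at `point` (row, col)."""
    row, col = point
    responses = [normalized_laplacian(image, s)[row, col] for s in scales]
    return scales[int(np.argmax(responses))]
```

For a synthetic Gaussian blob, the selected scale should coincide with the standard deviation of the blob, which is the property that makes the detector scale invariant.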
Salient points extraction
• Example :
[Figure: example of extracted salient points; axis label: scale]
Salient points trajectories
• Trajectories used to :
– Find salient points moving in the scene
– Track salient points over time
• Points matching using Local grayvalue invariants (Schmid)
$g(w) = \begin{pmatrix}
L \\
L_i L_i \\
L_i L_{ij} L_j \\
L_{ii} \\
L_{ij} L_{ji} \\
\varepsilon_{ij} (L_{jkl} L_i L_k L_l - L_{jkk} L_i L_l L_l) \\
L_{iij} L_j L_k L_k - L_{ijk} L_i L_j L_k \\
-\varepsilon_{ij} L_{jkl} L_i L_k L_l \\
L_{ijk} L_i L_j L_k
\end{pmatrix}$
Repeated indices are summed over: $L_i = \sum_{i \in \{x, y\}} L_i$
$\varepsilon_{ii} = 0, \; i \in \{x, y\}$
$\varepsilon_{ij} = -\varepsilon_{ji}, \; i, j \in \{x, y\}$
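As an illustration of the index notation used by the local grayvalue invariants, the sketch below writes out the invariants up to second order (L, L_i L_i, L_i L_ij L_j, L_ii, L_ij L_ji) with the sums over {x, y} expanded. The function name and argument order are hypothetical, and the derivatives are assumed to be already computed at the salient point.

```python
def second_order_invariants(L, Lx, Ly, Lxx, Lxy, Lyy):
    """Invariants up to second order, with sums over {x, y} expanded."""
    return [
        L,                                                   # L
        Lx * Lx + Ly * Ly,                                   # L_i L_i
        Lx * Lxx * Lx + 2 * Lx * Lxy * Ly + Ly * Lyy * Ly,   # L_i L_ij L_j
        Lxx + Lyy,                                           # L_ii
        Lxx * Lxx + 2 * Lxy * Lxy + Lyy * Lyy,               # L_ij L_ji
    ]
```

Rotating the image by 90 degrees swaps the roles of x and y in the derivatives, and the five values are unchanged, which is why such descriptors can be compared across frames regardless of orientation.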
Salient points trajectories
• Mahalanobis distance :
$d(w_i, w_j) = (g(w_i) - g(w_j))^T \Lambda^{-1} (g(w_i) - g(w_j))$
• The set of matching points minimizes
$\sum_{w_i \in W_t, \, w_j \in W_{t+1}} d(w_i, w_j)$
– Greedy Winner-Takes-All algorithm
→ Set of points trajectories
$T_{w_i} = \{ (w_i, w_j) \mid w_i \in W_t, \, w_j \in W_{t+1} \}$
→ Moving salient points : selected by a condition on the trajectory $T_w$ (the condition is not recoverable from the source)
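One possible reading of the greedy winner-takes-all matching is sketched below: all candidate pairs between two consecutive frames are sorted by Mahalanobis distance, and the closest pair whose two points are both still free is accepted, until no pair remains. This interpretation, the function names and the optional `max_dist` cut-off are assumptions made for illustration; the inverse covariance matrix is assumed to be given.

```python
def mahalanobis(g_i, g_j, inv_cov):
    """d = (g_i - g_j)^T inv_cov (g_i - g_j) for descriptors given as lists."""
    diff = [a - b for a, b in zip(g_i, g_j)]
    n = len(diff)
    return sum(diff[r] * inv_cov[r][c] * diff[c]
               for r in range(n) for c in range(n))

def greedy_match(desc_t, desc_t1, inv_cov, max_dist=float("inf")):
    """Greedy one-to-one matching between frame t and frame t+1.

    Returns a list of index pairs (i, j): point i of frame t matched to
    point j of frame t+1.
    """
    pairs = sorted(
        (mahalanobis(g_i, g_j, inv_cov), i, j)
        for i, g_i in enumerate(desc_t)
        for j, g_j in enumerate(desc_t1)
    )
    used_i, used_j, matches = set(), set(), []
    for dist, i, j in pairs:
        if dist > max_dist:
            break  # pairs are sorted, so all remaining ones are farther
        if i not in used_i and j not in used_j:
            matches.append((i, j))
            used_i.add(i)
            used_j.add(j)
    return matches
```

The greedy pass does not guarantee the global minimum of the summed distance, but it is simple and fast, which fits the per-frame setting described on the slide.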
Salient regions estimation
• Estimate characteristic regions r of the moving salient points
• Mean-Shift algorithm : estimate the position $v_r$ of r
$\Delta v_r = \frac{\sum_v K(v - v_r) \, P(v) \, (v - v_r)}{\sum_v K(v - v_r) \, P(v)}$
→ Likelihood of pixels (RGB colour distribution)
$P(v) = P(v \mid w) = N(\mu_w, \Sigma_w)$
→ Ellipsoidal Epanechnikov Kernel
$K(v - v_r) = \frac{3}{4} \left( 1 - (v - v_r)^T \Sigma_r^{-1} (v - v_r) \right)$
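A single mean-shift update with an ellipsoidal Epanechnikov kernel might look like the sketch below. It is an illustration under assumptions: pixels are 2-D positions, the kernel is set to zero outside the ellipse (the usual convention for this kernel), and the pixel likelihood P(v) is passed in as a callable instead of being computed from an RGB colour model. All names are hypothetical.

```python
def epanechnikov(dv, inv_sigma):
    """K = 3/4 (1 - dv^T inv_sigma dv) inside the ellipse, 0 outside."""
    q = (dv[0] * (inv_sigma[0][0] * dv[0] + inv_sigma[0][1] * dv[1])
         + dv[1] * (inv_sigma[1][0] * dv[0] + inv_sigma[1][1] * dv[1]))
    return 0.75 * (1.0 - q) if q < 1.0 else 0.0

def mean_shift_step(v_r, inv_sigma, pixels, likelihood):
    """Move v_r by the kernel- and likelihood-weighted mean offset."""
    num_x = num_y = den = 0.0
    for v in pixels:
        dv = (v[0] - v_r[0], v[1] - v_r[1])
        weight = epanechnikov(dv, inv_sigma) * likelihood(v)
        num_x += weight * dv[0]
        num_y += weight * dv[1]
        den += weight
    if den == 0.0:
        return v_r  # no supporting pixel under the kernel: stay in place
    return (v_r[0] + num_x / den, v_r[1] + num_y / den)
```

Repeating the step moves the kernel towards the centre of the pixels with high likelihood; it stops moving once the weighted offsets cancel out.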
Salient regions estimation
• Kernel adaptation step : estimate shape and size $\Sigma_r$ of r
$\Sigma_r = \mathrm{cov}(\cdot)$ of the pixel likelihood $P(v)$ over the region r (exact arguments not recoverable from the source)
• Algorithm :
W ← moving salient points w
for each w ∈ W
    v_r ← w, Σ_r ← diag(3 s_w, 3 s_w)
    repeat
        v_r ← Mean-Shift
        Σ_r ← Kernel-Adaptation
    until v_r, Σ_r converge
    W ← W − {w ∈ r}
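The control flow of the region estimation algorithm can be sketched as a small driver loop. The mean-shift step, the kernel adaptation step and the point-in-region test are passed in as callables because their exact form is not fixed here; the names, the convergence tolerance `tol` and the iteration cap `max_iter` are assumptions added for illustration.

```python
def estimate_regions(points, scale_of, mean_shift, adapt_kernel, contains,
                     tol=1e-3, max_iter=50):
    """Grow one region per remaining moving salient point.

    points       : list of (x, y) moving salient points
    scale_of     : callable w -> characteristic scale s_w
    mean_shift   : callable (v_r, sigma_r) -> new v_r
    adapt_kernel : callable (v_r, sigma_r) -> new sigma_r (2x2 nested lists)
    contains     : callable (v_r, sigma_r, w) -> True if w lies in the region
    """
    remaining = list(points)
    regions = []
    while remaining:
        w = remaining.pop(0)
        s = scale_of(w)
        v_r = tuple(w)
        sigma_r = [[3.0 * s, 0.0], [0.0, 3.0 * s]]   # diag(3 s_w, 3 s_w)
        for _ in range(max_iter):
            new_v = mean_shift(v_r, sigma_r)
            new_sigma = adapt_kernel(new_v, sigma_r)
            moved = max(abs(a - b) for a, b in zip(new_v, v_r))
            reshaped = max(abs(a - b)
                           for row_a, row_b in zip(new_sigma, sigma_r)
                           for a, b in zip(row_a, row_b))
            v_r, sigma_r = new_v, new_sigma
            if moved < tol and reshaped < tol:
                break
        regions.append((v_r, sigma_r))
        # W <- W - {w in r}: points covered by the region seed no new region.
        remaining = [p for p in remaining if not contains(v_r, sigma_r, p)]
    return regions
```

Removing the points covered by a converged region is what keeps the number of regions well below the number of salient points.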
Salient regions representation
• Set of salient regions of activity represented by :
– Position $v_r$
– Ellipsoid $\Sigma_r$
– Colour distribution $(\mu_{rgb}, \Sigma_{rgb})$
– Set of salient points $W_r$
• Salient regions tracking
– Regions are matched by a majority vote of their salient points
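The majority vote used for region tracking can be illustrated as follows. The sketch assumes that point matches between two frames are available as a dictionary and that each matched point of the next frame is assigned to at most one region; the region with the most votes wins (a plurality vote, strictly speaking). The data layout and names are hypothetical.

```python
from collections import Counter

def match_region(region_points, point_matches, point_to_region_next):
    """Return the id of the next-frame region that receives the most votes.

    region_points        : ids of the salient points of the current region
    point_matches        : dict point id (frame t) -> point id (frame t+1)
    point_to_region_next : dict point id (frame t+1) -> region id (frame t+1)
    """
    votes = Counter()
    for p in region_points:
        q = point_matches.get(p)
        if q is None:
            continue  # point lost between the two frames
        region = point_to_region_next.get(q)
        if region is not None:
            votes[region] += 1
    if not votes:
        return None
    return votes.most_common(1)[0][0]
```

A few wrongly matched points do not change the outcome as long as most points of the region are tracked correctly.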
Salient regions of activity
Regions classification
• To obtain an abstract description :
– Map regions to a domain-specific basic vocabulary
→ Meetings : {Arm, Head, Body, Noise}
• SVM classifier :
– Set of 500 annotated salient regions of activity (~200 frames)
Regions classification
• Confusion Matrix :
         Arm     Head    Body    Noise
Arm      1.000   0       0       0
Head     0       0.909   0.091   0
Body     0       0       1.000   0
Noise    0       0.052   0       0.946
• Discussion :
– Noise class is ill-defined
– Good results explained by the limited number of classes
Visual event language
• To express visual event queries
– Spatio-temporal constraints on labelled regions (LR)
• To integrate domain knowledge
– As specification of the layout (L)
– As set of basic events
→ A formula of the language is a conjunctive form of :
– Temporal relations : {after, just-after} between 2 LR
– Spatial relations : {above, left} between 2 LR, {in} between a LR and a L
– Identity relations : {is} between 2 LR, {is-a} between a LR and a label
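The relations of the language can be read as boolean predicates over labelled regions, and a formula as their conjunction. The sketch below is one hypothetical encoding: a labelled region is reduced to a track identifier, a label, a frame index and a centre position, layout zones are axis-aligned rectangles with made-up coordinates, and the semantics chosen for each relation (for instance, "above" meaning a smaller y in image coordinates) are assumptions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LR:
    """Hypothetical labelled region: one tracked region at one frame."""
    track: int
    label: str
    t: int
    x: float
    y: float

# Hypothetical layout: zone name -> (x_min, y_min, x_max, y_max) in pixels.
LAYOUT = {
    "SEATS": (0, 50, 100, 100),
    "DOOR": (0, 0, 10, 50),
    "BOARD": (60, 0, 100, 40),
}

def after(a, b):
    return a.t > b.t

def just_after(a, b):
    return a.t == b.t + 1

def above(a, b):
    return a.y < b.y   # image coordinates: y grows downwards

def left(a, b):
    return a.x < b.x

def in_(zone, lr):
    x0, y0, x1, y1 = LAYOUT[zone]
    return x0 <= lr.x <= x1 and y0 <= lr.y <= y1

def is_(a, b):
    return a.track == b.track

def is_a(label, lr):
    return lr.label == label

def holds(*constraints):
    """A formula is a conjunction: it holds when every constraint holds."""
    return all(constraints)
```

For example, with `head0 = LR(1, "Head", 0, 10, 20)` and `head1 = LR(1, "Head", 1, 10, 60)`, the formula `holds(is_(head1, head0), just_after(head1, head0), in_("SEATS", head1))` evaluates to true under these assumptions.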
Knowledge - Meetings
• Scene layout : L = {SEATS, DOOR, BOARD}
Knowledge - Meetings
• Basic events : {Meeting-participant, sitting, standing}
Meeting-participant : actor : LR
    constraints : is-a(Head, LR).
Sitting : actor : LR
    constraints : Meeting-participant(LR),
                  in(SEATS, LR).
Standing : actor : LR
    constraints : Meeting-participant(LR),
                  ~in(SEATS, LR).
Events queries
• Example of user queries :
Sitting-down : actors : LR1, LR2
    constraints : is(LR1, LR2),
                  sitting(LR1),
                  standing(LR2),
                  just-after(LR1, LR2).
Go-to-board : actors : LR1, LR2
    constraints : is(LR1, LR2),
                  standing(LR1),
                  ~in(BOARD, LR1),
                  standing(LR2),
                  in(BOARD, LR2),
                  just-after(LR2, LR1).
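The Sitting-down query can be evaluated by enumerating ordered pairs of labelled regions and keeping those that satisfy every constraint. The sketch below is a hypothetical encoding, not the system's query engine: labelled regions are tuples (track, label, frame, x, y), the SEATS zone is a made-up rectangle, and "just-after" is taken to mean the immediately following frame.

```python
from collections import namedtuple
from itertools import permutations

# Hypothetical labelled region: track id, class label, frame index, centre.
LR = namedtuple("LR", "track label t x y")

# Hypothetical SEATS zone as (x_min, y_min, x_max, y_max).
SEATS = (0, 50, 100, 100)

def in_seats(lr):
    x0, y0, x1, y1 = SEATS
    return x0 <= lr.x <= x1 and y0 <= lr.y <= y1

def meeting_participant(lr):
    return lr.label == "Head"            # is-a(Head, LR)

def sitting(lr):
    return meeting_participant(lr) and in_seats(lr)

def standing(lr):
    return meeting_participant(lr) and not in_seats(lr)

def sitting_down(regions):
    """All ordered pairs (LR1, LR2) satisfying the Sitting-down constraints."""
    return [
        (lr1, lr2)
        for lr1, lr2 in permutations(regions, 2)
        if lr1.track == lr2.track        # is(LR1, LR2)
        and sitting(lr1)
        and standing(lr2)
        and lr1.t == lr2.t + 1           # just-after(LR1, LR2)
    ]
```

Every predicate returns a plain boolean, so a region that falls slightly inside or outside a zone flips the result of the whole conjunction.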
Events queries - Results
• Results :
              Precision   Recall
Sit-down      0.43        1.00
Stand-up      0.50        1.00
Go-to-board   1.00        1.00
Enter         0.20        1.00
Leave         0.25        0.50
• Discussion :
– Recall values validate the retrieval capability
– False alarms occur because of the hard (binary) decisions
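For reference, precision and recall of a query can be computed from the set of retrieved events and the set of relevant (ground-truth) events, as in the sketch below. The slides report only the resulting ratios; the identifiers used in any call to this function are illustrative and do not come from the experiment.

```python
def precision_recall(retrieved, relevant):
    """Precision = hits / retrieved, recall = hits / relevant."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall
```

A query that returns every relevant event together with many false alarms has a recall of 1.00 and a low precision, which is the pattern visible in most rows of the table.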
Conclusion
• Contributions
– Well-suited framework for constrained domains
– Generic representation of the visual content
– Paradigm to retrieve visual events from videos
• Limitations
– Cannot retrieve all visual events (e.g. emotion)
• Ongoing work
– Uncertainty handling and fuzziness
– Integration of other modalities (e.g. transcripts)