Transcript Document
Knowledge-based event recognition from salient regions of activity
Nicolas Moënne-Loccoz
Viper group, Computer Vision & Multimedia Laboratory, University of Geneva
M4 Meeting – 23 January 2004 / [email protected]

Outline
• Context
• Salient Regions of Activity (SRA)
• Learning the semantics of SRA
• Visual event query language
• Conclusion

Context
• Retrieval of visual events based on user queries
  – Abstract representation of the visual content
  – Query language to express visual events
• Approach
  – Region-based description of the content
  – Classification of the regions
  – Events queried as spatio-temporal constraints on the regions

Overview
• [Pipeline diagram] Videos database → region extraction → salient regions of activity → classification (guided by domain knowledge) → labelled regions → user queries

Salient regions of activity
• Regions of the image space
  – Moving in the scene
  – Having a homogeneous colour distribution
  → Moving objects, or meaningful parts of moving objects
• Extraction:
  – From moving salient points
  – By an adaptive mean-shift algorithm

Salient points extraction
• Scale-invariant interest points (Mikolajczyk & Schmid, 2001)
  – Extracted in the linear scale-space
    $L_{v_i}(v, s) = I(v) * G_{v_i}(s)$
  – Local maxima of the scale-normalised Harris function (image space)
    $h(v, s) = \det\big(H(v, s)\big) - \alpha \, \mathrm{Trace}^2\big(H(v, s)\big)$, with
    $H(v, s) = \begin{pmatrix} L_x^2(v, s) & L_x L_y(v, s) \\ L_x L_y(v, s) & L_y^2(v, s) \end{pmatrix}$
  – Local maxima of the scale-normalised Laplacian (scale space)
    $l(v, s) = s^2 \big( L_{xx}(v, s) + L_{yy}(v, s) \big)$
• Example: interest points shown at their characteristic scale [figure]

Salient points trajectories
• Trajectories are used to:
  – Find salient points moving in the scene
  – Track salient points over time
• Point matching using local grayvalue invariants (Schmid):
  $g(w) = \big( L,\; L_i L_i,\; L_i L_{ij} L_j,\; L_{ii},\; L_{ij} L_{ji},\; \epsilon_{ij}(L_{jkl} L_i L_k L_l - L_{jkk} L_i L_l L_l),\; L_{iij} L_j L_k L_k - L_{ijk} L_i L_j L_k,\; -\epsilon_{ij} L_{jkl} L_i L_k L_l,\; L_{ijk} L_i L_j L_k \big)$
  (Einstein summation over $i, j, k, l \in \{x, y\}$, with $L_i = \partial_i L$ and $\epsilon_{ij}$ the 2-D antisymmetric tensor: $\epsilon_{ii} = 0$, $\epsilon_{ij} = -\epsilon_{ji}$)
• Mahalanobis distance:
  $d(w_i, w_j) = \big(g(w_i) - g(w_j)\big)^T \Lambda^{-1} \big(g(w_i) - g(w_j)\big)$
• The set of matching points minimises $\sum_{w_i \in W_t,\, w_j \in W_{t-1}} d(w_i, w_j)$
  – Greedy Winner-Takes-All algorithm → a set of point trajectories $T_w = \{(w_i, w_j) : w_i \in W_t,\; w_j \in W_{t-1}\}$
  – Moving salient points: the points whose trajectory $T_w$ moves within the scene

Salient regions estimation
• Estimate the characteristic region r of each moving salient point
• Mean-shift algorithm: estimate the position $v_r$ of r
    $v_r \leftarrow \dfrac{\sum_v K(v - v_r)\, P(v)\, v}{\sum_v K(v - v_r)\, P(v)}$
  – $P(v)$: likelihood of pixel v under the RGB colour distribution of w, $P(v) = P\big(v \mid \mathcal{N}(\mu_w, \Sigma_w)\big)$
  – Ellipsoidal Epanechnikov kernel: $K(v - v_r) = \frac{3}{4}\big(1 - (v - v_r)^T \Sigma_r^{-1} (v - v_r)\big)$
• Kernel adaptation step: estimate the shape and size $\Sigma_r$ of r
    $\Sigma_r \leftarrow \mathrm{cov}_r\big(P(v)\big)$  (covariance of the pixels of r, weighted by their likelihood)
• Algorithm:
    W ← moving salient points
    for each w ∈ W:
        v_r ← w ;  Σ_r ← diag(3·s_w, 3·s_w)
        repeat
            v_r ← mean-shift update
            Σ_r ← kernel adaptation
        until v_r and Σ_r converge
        W ← W \ {w' ∈ W : w' ∈ r}

Salient regions representation
• A salient region of activity is represented by:
  – Its position $v_r$
  – Its ellipsoid $\Sigma_r$
  – Its colour distribution $(\mu_{rgb}, \Sigma_{rgb})$
  – Its set of salient points $W_r$
• Salient regions tracking
  – Regions are matched by a majority vote of their salient points
• Example: extracted salient regions of activity [figure]

Regions classification
• To obtain an abstract description:
  – Map regions to a domain-specific basic vocabulary
  – Meetings: {Arm, Head, Body, Noise}
• SVM classifier:
  – Trained on a set of 500 annotated salient regions of activity (~200 frames); a sketch of this step is given below
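The slides give no implementation details for this classifier; the following is a minimal sketch of how the labelling step could look, assuming a feature vector built from the region descriptors listed above (position, ellipsoid, colour) and a standard RBF SVM via scikit-learn. The dictionary-based region layout, the helper names (region_descriptor, train_region_classifier, label_region) and the library choice are illustrative assumptions, not the author's implementation.

```python
# Hypothetical sketch of the region-labelling step: an SVM trained on
# annotated salient regions of activity. Feature layout and library
# usage are assumptions, not taken from the slides.
import numpy as np
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

LABELS = ["Arm", "Head", "Body", "Noise"]   # basic vocabulary for meetings

def region_descriptor(region):
    """Flatten a salient region of activity into a feature vector:
    position v_r, ellipsoid Sigma_r and mean RGB colour (assumed layout)."""
    return np.concatenate([
        np.asarray(region["position"], dtype=float),        # 2-D centre
        np.asarray(region["covariance"], dtype=float).ravel(),  # 2x2 ellipsoid
        np.asarray(region["mean_rgb"], dtype=float),         # colour mean
    ])

def train_region_classifier(annotated_regions):
    """Train the SVM on annotated (region, label) pairs
    (the slides report ~500 such regions over ~200 frames)."""
    X = np.stack([region_descriptor(r) for r, _ in annotated_regions])
    y = np.array([LABELS.index(label) for _, label in annotated_regions])
    clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
    clf.fit(X, y)
    return clf

def label_region(clf, region):
    """Map a new salient region of activity to the basic vocabulary."""
    return LABELS[int(clf.predict(region_descriptor(region)[None, :])[0])]
```

With only four classes and simple descriptors, this setting is consistent with the near-perfect per-class rates reported in the confusion matrix below.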
Regions classification – Results
• Confusion matrix:

                Arm     Head    Body    Noise
    Arm         1.000   0       0       0
    Head        0       0.909   0.091   0
    Body        0       0       1.000   0
    Noise       0       0.052   0       0.946

• Discussion:
  – The Noise class is ill-defined
  – The good results are largely explained by the limited number of classes

Visual event language
• To express visual event queries
  – Spatio-temporal constraints on labelled regions (LR)
• To integrate domain knowledge
  – As a specification of the layout (L)
  – As a set of basic events
• A formula of the language is a conjunctive form of:
  – Temporal relations: {after, just-after} between two LRs
  – Spatial relations: {above, left} between two LRs; {in} between an LR and an L
  – Identity relations: {is} between two LRs; {is-a} between an LR and a label

Knowledge – Meetings
• Scene layout: L = {SEATS, DOOR, BOARD} [figure]
• Basic events: {Meeting-participant, Sitting, Standing}

    Meeting-participant:
        actors: LR
        constraints: is-a(head, LR).

    Sitting:
        actor: LR
        constraints: Meeting-participant(LR), in(SEATS, LR).

    Standing:
        actor: LR
        constraints: Meeting-participant(LR), ~in(SEATS, LR).

Events queries
• Examples of user queries (a toy evaluation sketch is given after the conclusion):

    Sitting-down:
        actors: LR1, LR2
        constraints: is(LR1, LR2), sitting(LR1), standing(LR2), just-after(LR1, LR2).

    Go-to-board:
        actors: LR1, LR2
        constraints: is(LR1, LR2), standing(LR1), ~in(BOARD, LR1),
                     standing(LR2), in(BOARD, LR2), just-after(LR2, LR1).

Events queries – Results
• Results:

                  Precision   Recall
    Sit-down      0.43        1.00
    Stand-up      0.50        1.00
    Go-to-board   1.00        1.00
    Enter         0.20        1.00
    Leave         0.25        0.50

• Discussion:
  – The recall validates the retrieval capability
  – False alarms occur because of the hard decision

Conclusion
• Contributions
  – A well-suited framework for constrained domains
  – A generic representation of the visual content
  – A paradigm to retrieve visual events from videos
• Limitations
  – Cannot retrieve all visual events (e.g. emotions)
• Ongoing work
  – Uncertainty handling and fuzziness
  – Integration of other modalities (e.g. transcripts)
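As referenced in the events-queries section above, the following is a small, hypothetical sketch (not from the slides) of how a conjunctive query such as Sitting-down might be evaluated over labelled regions. The LabelledRegion fields, the predicate implementations (just_after with a fixed temporal gap, in(SEATS, LR) reduced to a boolean flag) and the brute-force pair enumeration are illustrative assumptions only.

```python
# Toy evaluation of a conjunctive event query over labelled regions.
# Data layout and predicates are assumptions made for illustration.
from dataclasses import dataclass

@dataclass
class LabelledRegion:
    ident: int          # identity: same physical object across time
    label: str          # basic vocabulary label, e.g. "Head"
    t_start: float      # temporal extent of the region track
    t_end: float
    in_seats: bool      # layout relation in(SEATS, LR), reduced to a flag

def is_same(lr1, lr2):                 # is(LR1, LR2)
    return lr1.ident == lr2.ident

def just_after(lr1, lr2, gap=1.0):     # just-after(LR1, LR2): LR1 follows LR2
    return 0.0 <= lr1.t_start - lr2.t_end <= gap

def meeting_participant(lr):           # is-a(head, LR)
    return lr.label == "Head"

def sitting(lr):                       # Meeting-participant(LR), in(SEATS, LR)
    return meeting_participant(lr) and lr.in_seats

def standing(lr):                      # Meeting-participant(LR), ~in(SEATS, LR)
    return meeting_participant(lr) and not lr.in_seats

def sitting_down(regions):
    """Return (LR1, LR2) pairs satisfying the Sitting-down query:
    is(LR1, LR2), sitting(LR1), standing(LR2), just-after(LR1, LR2)."""
    return [(a, b) for a in regions for b in regions
            if is_same(a, b) and sitting(a) and standing(b) and just_after(a, b)]
```

A real system would resolve the constraints over the tracked regions of the video database rather than by brute-force enumeration, but the conjunctive structure of the query stays the same.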