Transcript Slide 1
VISUAL HUMAN TRACKING AND GROUP ACTIVITY ANALYSIS: A VIDEO MINING SYSTEM FOR RETAIL MARKETING
PhD thesis by Alex Leykin, Indiana University
Motivation
• Automated tracking and activity recognition is missing from marketing research
• The hardware is already there
• Visual information can reveal a lot about how humans interact with each other
• Helps in making intelligent marketing decisions
Goals
• Process visual information to get a formal representation of human locations (Visual Tracking)
• Extract semantic information from the tracks (Activity Analysis)
Related Work: Detection and Tracking
• Yacoob and Davis, "Learned models for estimation of rigid and articulated human motion from stationary or moving camera," IJCV 2000
• Zhao and Nevatia, "Tracking multiple humans in crowded environment," CVPR 2004
• Haritaoglu, Harwood, and Davis, "W-4: Real-time surveillance of people and their activities," PAMI 2000
• Deutscher, North, Bascle, and Blake, "Tracking through singularities and discontinuities by random sampling," ICCV 1999
• Elgammal and Davis, "Probabilistic Framework for Segmenting People Under Occlusion," ICCV 2001
• Isard and MacCormick, "BraMBLe: a Bayesian multiple blob tracker," ICCV 2001
Related Work: Activity Recognition
• Haritaoglu and Flickner, "Detection and tracking of shopping groups in stores," CVPR 2001
• Oliver, Rosario, and Pentland, "A Bayesian computer vision system for modeling human interactions," PAMI 2000
• Buzan, Sclaroff, and Kollios, "Extraction and clustering of motion trajectories in video," ICPR 2004
• Hongeng, Nevatia, and Bremond, "Video-based event recognition: activity representation and probabilistic recognition methods," CVIU 2004
• Bobick and Ivanov, "Action recognition using probabilistic parsing," CVPR 1998
Low-level Processing
System Components
(System diagram, flattened:)
• Low-level: Camera Model, Obstacle Model, Foreground Segmentation, Head Detection
• Tracking: Jump-diffuse transitions, Priors and Likelihoods, Accept/Reject Candidate
• Event Detection: Actor Distances, Deterministic Agglomerative Clustering, Validity Index
• Activity Detection: Event Distances, Fuzzy Agglomerative Clustering, Adaptively Remove Weak Clusters
Background Modeling
Each pixel maintains a codebook of codewords; each codeword stores a color mean μ_RGB and a brightness range [I_low, I_hi].
Adaptive Background Update
Match pixel p to the codebook b:
• I_low < I(p) < I_high
• color distortion between RGB(p) and μ_RGB is below T_RGB
If there is no match:
• if the codebook is saturated (t(p)/t_high > T_t1, t(p)/t_low > T_t2), the pixel is foreground
• else, create a new codeword
Else:
• update the matched codeword with the new pixel information
• if more than one codeword matches, merge the matching codewords
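The matching rule above can be sketched as follows. This is a minimal illustration, assuming a codeword stores a color mean `mu` and a brightness range [`I_low`, `I_hi`]; the threshold `T_COLOR`, the field names, and the chromaticity-line distortion measure are our assumptions, not the thesis implementation.

```python
import numpy as np

T_COLOR = 0.1  # max allowed relative color distortion (assumed threshold)

def color_distortion(pixel, mu):
    """Distance of `pixel` from the chromaticity line through `mu`."""
    pixel, mu = np.asarray(pixel, float), np.asarray(mu, float)
    p2 = pixel @ pixel
    proj2 = (pixel @ mu) ** 2 / (mu @ mu)   # squared projection onto mu
    return np.sqrt(max(p2 - proj2, 0.0))

def is_background(pixel, codebook):
    """Pixel is background if some codeword matches its brightness and color."""
    brightness = np.linalg.norm(pixel)
    for cw in codebook:
        if cw["I_low"] <= brightness <= cw["I_hi"] and \
           color_distortion(pixel, cw["mu"]) < T_COLOR * brightness:
            return True
    return False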
Background Subtraction
Head Detection
Vanishing Point Projection (VPP) Histogram; vanishing point in the Z-direction.
Camera Setup
• Two camera types: perspective and spherical
• Mixture of indoor and outdoor scenes
• Color and thermal image sensors
• Varying lighting conditions (daylight, cloud cover, incandescent, etc.)
Camera Modeling
Perspective projection: X, Y, Z are recovered from [sx; sy; s] = P [X; Y; Ż; 1] using SVD, where P is the 3×4 projection matrix.
Spherical projection (camera center [X_c, Y_c, Z_c], longitude θ, latitude φ):
X = cos(θ) tan(π − φ)(Z_c − Ż)
Y = sin(θ) tan(π − φ)(Z_c − Ż)
Z = Ż
Assumption: floor plane Z_f = 0.
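The spherical back-projection formulas can be sketched as below, assuming pixel coordinates have already been converted to longitude θ and latitude φ; treating the camera center (X_c, Y_c) as a translation of the result is our assumption about the coordinate frame.

```python
import math

def spherical_to_floor(theta, phi, cam, z=0.0):
    """Back-project spherical angles to a world point at height z.

    theta: longitude, phi: latitude, cam: camera center (Xc, Yc, Zc).
    Follows X = cos(theta) tan(pi - phi) (Zc - z), Y = sin(theta) ... .
    """
    xc, yc, zc = cam
    r = math.tan(math.pi - phi) * (zc - z)  # horizontal range from the camera axis
    return (xc + math.cos(theta) * r,
            yc + math.sin(theta) * r,
            z)
```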
Tracking
Goal: find a correspondence between the bodies already detected in the current frame and the bodies that appear in the next frame.
Apply Markov Chain Monte Carlo (MCMC) to estimate the next state x_t from the previous state x_{t−1} and the observation z_t.
Jump-diffuse transitions: add body, delete body, recover deleted body, change size, move.
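The jump-diffuse move set can be sketched as follows. The state layout (a list of body dicts), the step sizes, and the uniform move choice are illustrative assumptions; only the move names come from the slide.

```python
import random

# "Jump" moves (add/delete/recover) change the number of bodies;
# "diffuse" moves (resize/move) perturb one body's parameters.
MOVES = ["add", "delete", "recover", "resize", "move"]

def propose(state, deleted, rng):
    """Propose a new candidate state; the input state is left untouched."""
    move = rng.choice(MOVES)
    state = [dict(b) for b in state]          # copy so the caller's state survives
    if move == "add":
        state.append({"x": rng.uniform(0, 10), "y": rng.uniform(0, 10), "w": 1.0})
    elif move == "delete" and state:
        deleted.append(state.pop(rng.randrange(len(state))))
    elif move == "recover" and deleted:
        state.append(deleted.pop())
    elif move == "resize" and state:
        b = rng.choice(state)
        b["w"] = max(0.1, b["w"] + rng.gauss(0, 0.1))
    elif move == "move" and state:
        b = rng.choice(state)
        b["x"] += rng.gauss(0, 0.5)
        b["y"] += rng.gauss(0, 0.5)
    return move, state
```

Each proposal would then be accepted or rejected against the posterior, as in any Metropolis-Hastings scheme.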
Tracking
The location of each pedestrian is estimated probabilistically based on:
• the current image
• the previous state of the system
• physical constraints
The goal of our tracking system is to find the candidate state x′ (a set of bodies along with their parameters) which, given the last known state x, will best fit the current observation z:

P(x′ | z, x) = L(z | x′) · P(x′ | x)

where L(z | x′) is the observation likelihood and P(x′ | x) is the state prior probability.
Tracking: Priors
Constraints on the body parameters:
• N(h_μ, h_σ²) and N(w_μ, w_σ²): body height and width
• U(x)_R and U(y)_R: body coordinates are weighted uniformly within the rectangular region R of the floor map
Temporal continuity:
• d(w_t, w_{t−1}) and d(h_t, h_{t−1}): variation from the previous size
• d(x_t, x′_{t−1}) and d(y_t, y′_{t−1}): variation from the Kalman-predicted position
• N(μ_door, σ_door): distance to the closest door (for new bodies)
Tracking Likelihoods: Distance weight plane
Problem: blob trackers ignore blob position in 3D (see Zhao and Nevatia, CVPR 2004).
Solution: employ a "distance weight plane"

D_xy = |P_xyz − C_xyz|

where C_xyz and P_xyz are the world coordinates of the camera and of the reference point respectively, and P_z = h/2.
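Given a per-pixel map of world coordinates, the distance weight plane is one vectorized norm. The precomputed `world` map is an assumption of this sketch (building it requires the camera model above, and the thesis fixes the reference point's height P_z at half the body height).

```python
import numpy as np

def distance_weight_plane(world, cam):
    """D[x, y] = |P_xyz - C_xyz| for every pixel.

    world: H x W x 3 array of per-pixel world coordinates (assumed given).
    cam:   camera center (Xc, Yc, Zc).
    """
    return np.linalg.norm(world - np.asarray(cam, float), axis=-1)
```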
Tracking Likelihoods: Z-buffer
Each pixel is labeled by depth order: 0 = background, 1 = furthermost body, 2 = next closest body, etc.
Tracking Likelihoods: Color Histogram
Color observation likelihood is based on the Bhattacharyya distance between the candidate and observed color histograms:

P_color = 1 − w_color · B(c_t, c_{t−1})

Implementing the z-buffer (Z) and the distance weight plane (D) allows the multiple-body configuration likelihood to be computed in one computationally efficient step.
Let:
• I: the set of all blob pixels
• O: the set of body pixels
The likelihood then combines two pixel sums, each weighted by the distance weight plane D: blob pixels left unexplained by any body (pixels of I outside O, where Z_xy = 0) and body pixels falling outside the foreground blobs (pixels of O outside I).
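The Bhattacharyya-based color term can be sketched as below. The 1 − w_color·B form of the likelihood and the default weight are assumptions of this sketch; the slides only fix the distance itself.

```python
import numpy as np

def bhattacharyya(h1, h2):
    """Bhattacharyya distance between two (unnormalized) histograms."""
    h1 = np.asarray(h1, float)
    h2 = np.asarray(h2, float)
    h1 = h1 / h1.sum()
    h2 = h2 / h2.sum()
    bc = np.sum(np.sqrt(h1 * h2))          # Bhattacharyya coefficient, 1 = identical
    return np.sqrt(max(1.0 - bc, 0.0))     # 0 = identical, 1 = disjoint

def color_likelihood(h_candidate, h_observed, w_color=1.0):
    """Color likelihood decreasing in the histogram distance (assumed form)."""
    return 1.0 - w_color * bhattacharyya(h_candidate, h_observed)
```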
Tracking: Anisotropic Weighted Mean Shift
(Figure: classic mean shift vs. our mean shift with bandwidth H, frames t−1 and t.)
Actors and events
• Shopper groups are formed by individual shoppers who shop together for some amount of time
  – More than a fleeting crossing of paths
  – Dwelling together
  – Splitting and reuniting after a period of time
Swarming
• Shopper groups are detected based on the "swarming" idea in reverse
  – Swarming is used in graphics to generate flocking behaviour in animations
  – Rules define flocking behaviour:
    • Avoid collisions with neighbors
    • Maintain a fixed distance from neighbors
    • Coordinate the velocity vector with neighbors
Tracking Customer Groups
• We treat customers as swarming agents, acting according to simple rules (e.g. stay together with swarm members)
Terminology
• Actors: shoppers (bodies detected in tracking), each represented as (x, y, id)
• Swarming events: short-time activity sequences of multiple agents interacting with each other
  – Could be fleeting (crossing paths)
  – Later analysis sorts this out and ignores chance encounters
Swarming
• The actors that best fit this model signal a Swarming Event
• Multiple swarming events are further clustered with fuzzy weights to find shoppers belonging to the same group over long periods
Event detection
• Two actors come sufficiently close according to some distance measure based on:
  – Relative position p_i = (x_i, y_i) of actor i on the floor
  – Body orientation α_i
  – Dwelling state δ_i ∈ {T, F}
The distance between two agents is a linear combination of co-location, co-orientation and co-dwelling.
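A minimal version of this linear-combination distance is below. The actor representation as a dict and the default weights are placeholders; the three terms (position gap, orientation gap, dwelling agreement) follow the slide.

```python
import math

def actor_distance(a, b, w=(1.0, 0.5, 0.5)):
    """d(b_i, b_j) = w1*|p_i - p_j| + w2*|alpha_i - alpha_j| + w3*|delta_i - delta_j|."""
    co_location = math.dist(a["p"], b["p"])          # floor-plane distance
    co_orientation = abs(a["alpha"] - b["alpha"])    # body orientation gap
    co_dwelling = float(a["dwell"] != b["dwell"])    # 0 if dwelling states agree
    return w[0] * co_location + w[1] * co_orientation + w[2] * co_dwelling
```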
Event detection
Perform agglomerative clustering of actors a into clusters C:
• Initialize: N singleton clusters
• Do: merge the two closest clusters
• While not: the validity index I reaches its maximum
The validity index I consists of an isolation term I_ni and a compactness term I_nc.
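The merge-until-the-validity-index-peaks loop can be sketched as follows. The single-link cluster distance and the simple isolation-minus-compactness index stand in for the thesis's I_ni and I_nc terms, which the slides do not spell out.

```python
import math

def cluster(points):
    """Agglomerative clustering of 2D points, keeping the best-scoring partition."""
    def dist(c1, c2):
        # single-link distance between two clusters (an assumption)
        return min(math.dist(p, q) for p in c1 for q in c2)

    def validity(clusters):
        # isolation: mean inter-cluster distance; compactness: mean intra-cluster
        inter = [dist(a, b) for i, a in enumerate(clusters) for b in clusters[i + 1:]]
        intra = [math.dist(p, q) for c in clusters
                 for i, p in enumerate(c) for q in c[i + 1:]]
        isolation = sum(inter) / len(inter) if inter else 0.0
        compactness = sum(intra) / len(intra) if intra else 0.0
        return isolation - compactness

    clusters = [[p] for p in points]
    best, best_score = clusters, validity(clusters)
    while len(clusters) > 1:
        i, j = min(((i, j) for i in range(len(clusters))
                    for j in range(i + 1, len(clusters))),
                   key=lambda ij: dist(clusters[ij[0]], clusters[ij[1]]))
        clusters = ([c for k, c in enumerate(clusters) if k not in (i, j)]
                    + [clusters[i] + clusters[j]])
        score = validity(clusters)
        if score > best_score:
            best, best_score = clusters, score
    return best
```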
Event detection
(Plots: final events; validity index by iteration number.)
Activity Detection
• Shopper group detection is accomplished by clustering the short-term events over long time periods
  – The events may be separated in time, but they will be part of the same shopper group if the actors are the same (the first term)
Activity detection
• Higher-level activities (shopper groups) are detected using these events as building blocks over longer time periods
• Some definitions:
  – B_ei = {b_ei}: the set of all bodies taking part in an event e_i
  – τ_ei and τ_ej: the average times at which events e_i and e_j happen
Activity detection
Define a measure of similarity between two events:

D_e²(e_i, e_j) = |(B_ei ∖ B_ej) ∪ (B_ej ∖ B_ei)| / (|B_ei| + |B_ej|) + (τ_ei − τ_ej)²

The first term measures the overlap between the two sets of actors; the second, their separation in time.
Activity detection
• Perform fuzzy agglomerative clustering
• Minimize the objective function, where the w_ij are fuzzy weights
• ρ(.) is the loss function from robust statistics, built from asymmetric variants of Tukey's biweight estimators
• ψ(.) is the weight function
• Adaptively keep only the strong fuzzy clusters and label the remaining clusters as activities
Results: Swarming activities detected in space-time
• Dot location: average event location
• Dot size: validity
• Dots of the same color: belong to the same activity
Group Detection Results
Quantitative Results
Tracking
Sequence number      1     2     3     4     5     6   Total   Percent
Frames            1054  1700   601  1506  2031  1652    8544
People               8    15    16     3     2     4      48
People missed        3     0     5     0     0     0            12.5
False hits           1     0     1     0     0     0             4.1
Identity switches    3     0     2     0     0     0            10.4
Group Detection

Sequence      1     2     3   Total   Percent
Groups       20    17    17      54     100
P+            0     1     0       1     1.8
P−            7     3     7      12    22.2
Partial       0     1     0       2     3.7

Groups: ground truth (manually determined); P+: false positives; P−: false negatives (groups missed); Partial: partially identified groups (≥2 people in the group correctly identified).
Qualitative Assessments
• Longer paths provide better group detection (p-value ≪ 1)
• Two-people groups are the easiest to detect
• Simple one-step clustering of trajectories is not sufficient for long-term group detection
• Employee tracks pose a significant problem and have to be excluded
• Several groups were missed by the operator in the initial ground truth
  – The system caught groups missed by the human expert, confirmed after inspection of the results
Contributions
– Background subtraction based on a codebook model (RGB + thermal)
– Introduced a head candidate selection method based on the VPP histogram
– Resolved track initialization ambiguity and non-unique body-blob correspondence
– Informed jump-diffuse transitions in the MCMC tracker
– Weight plane and z-buffer improve likelihood estimation
– Anisotropic mean shift with an obstacle model
– Two-layer formal framework for high-level activity detection
– Implemented robust fuzzy clustering to group events into activities
Future Work
• Improved tracking (via feature points)
• Demographic analysis
• Focus of attention
• Sensor fusion
• Other types of swarming activities
Thank you!
Questions?
Actor distance (from the event detection slides):

d(b_i, b_j) = w_1 |p_i − p_j| + w_2 |α_i − α_j| + w_3 |δ_i − δ_j|