Transcript Slide 1

VISUAL HUMAN TRACKING AND GROUP ACTIVITY ANALYSIS: A VIDEO MINING SYSTEM FOR RETAIL MARKETING

PhD Thesis by Alex Leykin, Indiana University

Motivation

• Automated tracking and activity recognition is missing from marketing research
• The hardware is already in place
• Visual information can reveal a lot about human interactions with each other
• Helps in making intelligent marketing decisions

Goals

• Process visual information to obtain a formal representation of human locations (Visual Tracking)
• Extract semantic information from the tracks (Activity Analysis)

Related Work: Detection and Tracking

• Yacoob and Davis, "Learned models for estimation of rigid and articulated human motion from stationary or moving camera", IJCV 2000
• Zhao and Nevatia, "Tracking multiple humans in crowded environment", CVPR 2004
• Haritaoglu, Harwood, and Davis, "W4: Real-time surveillance of people and their activities", PAMI 2000
• Deutscher, North, Bascle, and Blake, "Tracking through singularities and discontinuities by random sampling", ICCV 1999
• Elgammal and Davis, "Probabilistic framework for segmenting people under occlusion", ICCV 2001
• Isard and MacCormick, "BraMBLe: a Bayesian multiple blob tracker", ICCV 2001

Related Work: Activity Recognition

• Haritaoglu and Flickner, "Detection and tracking of shopping groups in stores", CVPR 2001
• Oliver, Rosario, and Pentland, "A Bayesian computer vision system for modeling human interactions", PAMI 2000
• Buzan, Sclaroff, and Kollios, "Extraction and clustering of motion trajectories in video", ICPR 2004
• Hongeng, Nevatia, and Bremond, "Video-based event recognition: activity representation and probabilistic recognition methods", CVIU 2004
• Bobick and Ivanov, "Action recognition using probabilistic parsing", CVPR 1998

Low-level Processing

System Components

Tracking: Camera Model, Obstacle Model, Foreground Segmentation, Head Detection, Jump-diffuse transitions, Priors and Likelihoods, Accept/Reject Candidate
Event Detection: Actor Distances, Deterministic Agglomerative Clustering, Validity Index
Activity Detection: Event Distances, Fuzzy Agglomerative Clustering, Adaptively Remove Weak Clusters

Background Modeling

Each pixel is modeled by a codebook of codewords; each codeword stores:
• the color mean μ_RGB
• the brightness bounds I_low and I_hi

Adaptive Background Update

Match pixel p to a codeword b in the codebook:
  I(p) > I_low, I(p) < I_hi, and (RGB(p) ∙ μ_RGB) < T_RGB
If there is no match:
  if the codebook is saturated, then the pixel is foreground
  else create a new codeword
Else update the matching codeword with the new pixel information
If there is more than one match, merge the matching codewords
Temporal thresholds: t(p)/t_high > T_t1 and t(p)/t_low > T_t2
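The matching and update steps above can be sketched as follows. This is a minimal illustration of a per-pixel codebook background model; the threshold values, the brightness margin, and the saturation limit are illustrative assumptions, not the thesis parameters.

```python
import numpy as np

# Illustrative thresholds (assumptions, not values from the thesis).
T_RGB = 0.1         # max chromatic distortion to count as a match
I_MARGIN = 40       # brightness slack when creating a codeword
MAX_CODEWORDS = 8   # codebook "saturation" limit per pixel

def color_dist(rgb, mu):
    """Chromatic distance: 1 - cos(angle) between pixel and codeword color."""
    denom = np.linalg.norm(rgb) * np.linalg.norm(mu)
    if denom == 0:
        return 0.0
    return 1.0 - float(np.dot(rgb, mu)) / denom

def classify_and_update(codebook, rgb):
    """Return 'background' or 'foreground' for one pixel, updating the codebook."""
    intensity = float(np.mean(rgb))
    matches = [cw for cw in codebook
               if cw["I_low"] <= intensity <= cw["I_hi"]
               and color_dist(rgb, cw["mu"]) < T_RGB]
    if not matches:
        if len(codebook) >= MAX_CODEWORDS:
            return "foreground"      # saturated codebook: pixel disagrees with BG
        codebook.append({"mu": np.array(rgb, float),
                         "I_low": intensity - I_MARGIN,
                         "I_hi": intensity + I_MARGIN, "n": 1})
        return "background"
    cw = matches[0]                  # update the first match with a running mean
    cw["n"] += 1
    cw["mu"] += (np.array(rgb, float) - cw["mu"]) / cw["n"]
    cw["I_low"] = min(cw["I_low"], intensity)
    cw["I_hi"] = max(cw["I_hi"], intensity)
    return "background"
```

A new codeword is created for unseen colors until the codebook saturates, after which unexplained pixels are labeled foreground.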

Background Subtraction

Head Detection

Vanishing Point Projection (VPP) Histogram: project along the vanishing point in the Z-direction

Camera Setup

• Two camera types: perspective and spherical
• Mixture of indoor and outdoor scenes
• Color and thermal image sensors
• Varying lighting conditions (daylight, cloud cover, incandescent, etc.)

Camera Modeling

Perspective projection: recover X, Y from [sx; sy; s] = P [X; Y; Ż; 1] using SVD, where P is the 3×4 projection matrix.

Spherical projection (longitude θ, latitude φ, camera center [X_c, Y_c, Z_c]):
X = cos(θ) tan(π − φ) (Z_c − Ż)
Y = sin(θ) tan(π − φ) (Z_c − Ż)
Z = Ż

Assumption: floor plane Z_f = 0.
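The spherical back-projection above can be written as a short helper. Symbols follow the slide; placing the result in world coordinates by offsetting with the camera center (Xc, Yc) is my assumption for the sketch.

```python
import math

def spherical_to_world(theta, phi, Zc, Zdot, Xc=0.0, Yc=0.0):
    """Back-project a spherical-camera ray (longitude theta, latitude phi)
    to world coordinates, assuming the ray hits height Zdot above the
    floor plane Z = 0. Camera center is (Xc, Yc, Zc)."""
    r = math.tan(math.pi - phi) * (Zc - Zdot)  # horizontal range from camera
    return (Xc + math.cos(theta) * r,
            Yc + math.sin(theta) * r,
            Zdot)
```

For example, a ray at latitude 3π/4 from a camera 3 m above the floor lands 3 m away horizontally.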

Tracking

Goal: find a correspondence between the bodies already detected in the current frame and the bodies that appear in the next frame.


Apply Markov Chain Monte Carlo (MCMC) to estimate the next state x_t from the previous state x_{t−1}.

Jump-diffuse transitions given observation z_t: add body, delete body, recover a deleted body, change size, move.
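One MCMC step with this jump-diffuse move set can be sketched as below. The body parameterization, proposal scales, and posterior are illustrative assumptions, not the thesis implementation; the move names mirror the slide.

```python
import random

MOVES = ["add", "delete", "recover", "resize", "move"]

def mcmc_step(state, deleted, posterior, rng=random):
    """Propose one jump/diffuse move on `state` (a list of body dicts) and
    accept or reject it with the Metropolis ratio. `deleted` stores bodies
    that can be recovered later."""
    candidate = [dict(b) for b in state]
    move = rng.choice(MOVES)
    popped = None
    if move == "add":
        candidate.append({"x": rng.uniform(0, 10), "y": rng.uniform(0, 10),
                          "w": 0.5, "h": 1.7})
    elif move == "delete" and candidate:
        popped = candidate.pop(rng.randrange(len(candidate)))
    elif move == "recover" and deleted:
        candidate.append(dict(deleted[-1]))
    elif move == "resize" and candidate:
        b = candidate[rng.randrange(len(candidate))]
        b["w"] += rng.gauss(0, 0.05)
        b["h"] += rng.gauss(0, 0.05)
    elif move == "move" and candidate:
        b = candidate[rng.randrange(len(candidate))]
        b["x"] += rng.gauss(0, 0.2)
        b["y"] += rng.gauss(0, 0.2)
    p_new, p_old = posterior(candidate), posterior(state)
    accepted = p_old == 0 or rng.random() < min(1.0, p_new / p_old)
    if not accepted:
        return state
    if popped is not None:
        deleted.append(popped)        # commit the delete only on acceptance
    elif move == "recover" and deleted:
        deleted.pop()                 # the recovered body leaves the pool
    return candidate
```

Running many such steps lets the chain add, remove, and refine bodies while the posterior keeps implausible configurations from being accepted.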

Tracking

The location of each pedestrian is estimated probabilistically based on:
• the current image
• the previous state of the system
• physical constraints

The goal of our tracking system is to find the candidate state x′ (a set of bodies along with their parameters) which, given the last known state x, best fits the current observation z:

P(x′ | z, x) = L(z | x′) · P(x′ | x)

(observation likelihood × state prior probability)

Tracking: Priors

Constraints on the body parameters:

N(h_μ, h_σ²) and N(w_μ, w_σ²): body height and width
U(x)_R and U(y)_R: body coordinates are weighted uniformly within the rectangular region R of the floor map

Temporal continuity:

d(w_t, w_{t−1}) and d(h_t, h_{t−1}): variation from the previous size
d(x_t, x′_{t−1}) and d(y_t, y′_{t−1}): variation from the Kalman-predicted position
N(μ_door, σ_door): distance to the closest door (for new bodies)
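The prior terms listed above combine naturally as a sum of log-probabilities. The sketch below assumes specific means, variances, and a rectangular floor region; all numeric values are illustrative, not the thesis parameters.

```python
import math

def log_gauss(x, mu, sigma):
    """Log-density of N(mu, sigma^2) at x."""
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))

def log_prior(body, prev, floor_rect, door_xy, is_new):
    """body/prev: dicts with x, y, w, h; floor_rect: (xmin, ymin, xmax, ymax)."""
    xmin, ymin, xmax, ymax = floor_rect
    if not (xmin <= body["x"] <= xmax and ymin <= body["y"] <= ymax):
        return float("-inf")                      # uniform over R: zero outside
    lp = log_gauss(body["h"], 1.7, 0.15)          # height prior N(h_mu, h_sigma^2)
    lp += log_gauss(body["w"], 0.5, 0.10)         # width prior N(w_mu, w_sigma^2)
    if is_new:                                    # new bodies should enter near a door
        d = math.hypot(body["x"] - door_xy[0], body["y"] - door_xy[1])
        lp += log_gauss(d, 0.0, 1.0)
    elif prev is not None:                        # temporal continuity terms
        lp += log_gauss(body["w"] - prev["w"], 0.0, 0.05)
        lp += log_gauss(body["h"] - prev["h"], 0.0, 0.05)
        lp += log_gauss(body["x"] - prev["x"], 0.0, 0.3)  # vs. Kalman prediction
        lp += log_gauss(body["y"] - prev["y"], 0.0, 0.3)
    return lp
```

A body of typical height scores higher than an implausibly tall one, and positions outside the floor region get zero prior mass.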

Tracking Likelihoods: Distance weight plane

Problem: blob trackers ignore blob position in 3D (see Zhao and Nevatia, CVPR 2004)

Solution: employ a "distance weight plane"

D_xy = |P_xyz − C_xyz|

where C and P are the world coordinates of the camera and the reference point, respectively, and P_z = h/2.
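The distance weight plane can be precomputed once per camera. In this sketch, each pixel is assumed to come with the floor coordinates it back-projects to; the grid layout and default body height are illustrative assumptions.

```python
import numpy as np

def distance_weight_plane(floor_xy, camera_xyz, body_height=1.7):
    """floor_xy: (H, W, 2) array of floor coordinates per pixel.
    Returns the (H, W) plane of distances D_xy = |P_xyz - C_xyz|,
    with the reference point P at half body height (P_z = h/2)."""
    H, W, _ = floor_xy.shape
    ref = np.dstack([floor_xy, np.full((H, W), body_height / 2.0)])
    return np.linalg.norm(ref - np.asarray(camera_xyz, float), axis=2)
```

Pixels whose reference point lies farther from the camera get a larger weight, compensating for the smaller image footprint of distant bodies.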

Tracking Likelihoods: Z-buffer

0 = background, 1 = furthermost body, 2 = next closest body, etc.

Tracking Likelihoods: Color Histogram

Color observation likelihood is based on the Bhattacharyya distance between the candidate and observed color histograms:

P_color = 1 − w_color (1 − B(c_t, c_{t−1}))

Implementation of the z-buffer (Z) and the distance weight plane (D) allows the multiple-body configuration likelihood to be computed in one computationally efficient step.

Let:
I = the set of all blob pixels
O = the set of body pixels

P₁ ∝ Σ_{xy ∈ I ∖ (O ∩ Z(Z_xy ≠ 0))} D_xy (blob pixels not explained by any visible body)
P₂ ∝ Σ_{xy ∈ (O ∩ Z(Z_xy ≠ 0)) ∖ (O ∩ I)} D_xy (visible body pixels not covered by a blob)
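The color likelihood above reduces to a few lines once the histograms are normalized. The weight w_color below is an illustrative value, not one from the thesis.

```python
import numpy as np

def bhattacharyya(h1, h2):
    """Bhattacharyya coefficient of two histograms (1.0 = identical)."""
    h1 = np.asarray(h1, float)
    h2 = np.asarray(h2, float)
    h1 = h1 / h1.sum()               # normalize to probability distributions
    h2 = h2 / h2.sum()
    return float(np.sum(np.sqrt(h1 * h2)))

def color_likelihood(h_candidate, h_observed, w_color=0.5):
    """P_color = 1 - w_color * (1 - B(c_t, c_{t-1}))."""
    return 1.0 - w_color * (1.0 - bhattacharyya(h_candidate, h_observed))
```

Identical histograms give a likelihood of 1; completely disjoint ones are penalized by the full weight w_color.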

Tracking: Anisotropic Weighted Mean Shift

(Figure: classic mean shift vs. our mean shift with bandwidth H, frames t−1 and t)

Actors and events

• Shopper groups are formed by individual shoppers who shop together for some amount of time
  – More than a fleeting crossing of paths
  – Dwelling together
  – Splitting and uniting after a period of time

Swarming

• Shopper groups are detected based on the "swarming" idea in reverse
  – Swarming is used in graphics to generate flocking behaviour in animations
  – Rules define flocking behaviour:
    • Avoid collisions with the neighbors
    • Maintain a fixed distance from the neighbors
    • Coordinate the velocity vector with the neighbors
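The three flocking rules above can be sketched as one synchronous update step. The gains, neighbor radius, and time step are illustrative assumptions.

```python
import numpy as np

def flocking_step(pos, vel, radius=2.0, dt=0.1,
                  w_sep=0.5, w_coh=0.1, w_align=0.2):
    """pos, vel: (N, 2) arrays of agent positions and velocities.
    Returns updated (pos, vel) after applying the three swarming rules."""
    N = len(pos)
    new_vel = vel.copy()
    for i in range(N):
        d = np.linalg.norm(pos - pos[i], axis=1)
        nbr = (d > 0) & (d < radius)          # neighbors within the radius
        if not nbr.any():
            continue
        # 1. Avoid collisions: steer away from nearby neighbors.
        sep = np.sum(pos[i] - pos[nbr], axis=0)
        # 2. Maintain distance with the group: steer toward the centroid.
        coh = pos[nbr].mean(axis=0) - pos[i]
        # 3. Coordinate velocity: match the neighbors' mean velocity.
        align = vel[nbr].mean(axis=0) - vel[i]
        new_vel[i] = vel[i] + w_sep * sep + w_coh * coh + w_align * align
    return pos + dt * new_vel, new_vel
```

Repeated steps make nearby agents align their motion, which is the behaviour the group detector looks for "in reverse".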

Tracking Customer Groups

• We treat customers as swarming agents, acting according to simple rules (e.g. stay together with fellow swarm members)

(Figure: detected customer groups)

Terminology

• Actors: shoppers (bodies detected in tracking), represented as (x, y, id)
• Swarming events: short-time activity sequences of multiple agents interacting with each other
  – Could be fleeting (crossing paths)
  – Later analysis sorts this out and ignores chance encounters

Swarming

• The actors that best fit this model signal a swarming event
• Multiple swarming events are further clustered with fuzzy weights to find shoppers belonging to the same group over long periods

Event detection

• Two actors come sufficiently close according to some distance measure:
  – Relative position p_i = (x_i, y_i) of actor i on the floor
  – Body orientation α_i
  – Dwelling state δ_i = {T, F}

The distance between two agents is a linear combination of co-location, co-ordination, and co-dwelling.
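The three-term combination can be sketched as a small function. The weights below are illustrative assumptions, and the angular difference is wrapped so orientations just below and above ±π compare sensibly.

```python
import math

def actor_distance(a, b, w1=1.0, w2=0.5, w3=0.5):
    """a, b: dicts with position p=(x, y), orientation alpha (radians),
    and dwelling flag (True/False). Weighted sum of co-location,
    co-ordination, and co-dwelling terms."""
    co_location = math.hypot(a["p"][0] - b["p"][0], a["p"][1] - b["p"][1])
    co_ordination = abs(math.atan2(math.sin(a["alpha"] - b["alpha"]),
                                   math.cos(a["alpha"] - b["alpha"])))
    co_dwelling = float(a["dwelling"] != b["dwelling"])
    return w1 * co_location + w2 * co_ordination + w3 * co_dwelling
```

Two actors at the same spot, facing the same way, in the same dwelling state have distance zero; each disagreement adds its weighted penalty.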

Event detection

Perform agglomerative clustering of N actors a into clusters C:
• Initialize: N singleton clusters
• Do: merge the two closest clusters
• While not: the validity index I reaches its maximum

The validity index I consists of isolation I_ni and compactness I_nc.

Event detection

(Figure: final events and the validity index per clustering iteration)
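The merge loop driven by a validity index can be sketched as below. The particular index here (minimum inter-centroid distance minus total within-cluster spread) is an illustrative stand-in for the isolation/compactness terms on the slide.

```python
import numpy as np

def validity(points, clusters):
    """Isolation minus compactness: prefer tight, well-separated clusters."""
    cent = [points[c].mean(axis=0) for c in clusters]
    compact = sum(np.linalg.norm(points[c] - cent[k], axis=1).sum()
                  for k, c in enumerate(clusters))
    if len(clusters) < 2:
        return -compact
    isol = min(np.linalg.norm(cent[i] - cent[j])
               for i in range(len(clusters)) for j in range(i + 1, len(clusters)))
    return isol - compact

def agglomerate(points, n_min=1):
    """points: (N, 2) array. Repeatedly merge the two closest clusters and
    return the clustering with the best validity index (as index lists)."""
    clusters = [[i] for i in range(len(points))]
    best, best_score = [list(c) for c in clusters], validity(points, clusters)
    while len(clusters) > n_min:
        cent = [points[c].mean(axis=0) for c in clusters]
        pairs = [(np.linalg.norm(cent[i] - cent[j]), i, j)
                 for i in range(len(clusters))
                 for j in range(i + 1, len(clusters))]
        _, i, j = min(pairs)          # merge the closest pair of clusters
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
        score = validity(points, clusters)
        if score > best_score:
            best, best_score = [list(c) for c in clusters], score
    return best
```

On two well-separated pairs of points, the index peaks at the two-cluster solution rather than at singletons or one big cluster.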

Activity Detection

• Shopper group detection is accomplished by clustering the short-term events over long time periods.
  – The events can be separated in time, but they will be part of the same shopper group if the actors are the same (the first term of the event distance).

Activity detection

• Higher-level activities (shopper groups) are detected using these events as building blocks over longer time periods
• Some definitions:
  – B_ei = {b_ei}: the set of all bodies taking part in event e_i
  – τ_ei and τ_ej: the average times at which events e_i and e_j happen

Activity detection

Define a measure of similarity between two events:

D_e²(e_i, e_j) = w₁ ( |(B_ei ∖ B_ej) ∪ (B_ej ∖ B_ei)| / |B_ei ∪ B_ej| )² + w₂ (τ_ei − τ_ej)²

The first term measures the overlap between the two sets of actors; the second measures their separation in time.

Activity detection

• Perform fuzzy agglomerative clustering
• Minimize the objective function, where w_ij are fuzzy weights
• ρ(.) is the loss function from robust statistics and ψ(.) is the weight function; asymmetric variants of Tukey's biweight estimators are used
• Adaptively choose only the strong fuzzy clusters
• Label the remaining clusters as activities

Results: Swarming activities detected in space-time

• Dot location: average event location
• Dot size: validity
• Dots of the same color: belong to the same activity

Group Detection Results

Quantitative Results

Tracking

Sequence number      1     2     3     4     5     6   Total      %
Frames            1054  1700   601  1506  2031  1652    8544
People               8    15    16     3     2     4      48
People missed        3     0     5     0     0     0            12.5
False hits           1     0     1     0     0     0             4.1
Identity switches    3     0     2     0     0     0            10.4

Group Detection

Sequence     1    2    3   Total   Percent
Groups      20   17   17      54       100
P+           0    1    0       1       1.8
P−           7    3    7      12      22.2
Partial      0    1    0       2       3.7

• Groups: ground truth (manually determined)
• P+: false positives
• P−: false negatives (groups missed)
• Partial: partially identified groups (≥ 2 people in the group correctly identified)

Qualitative Assessments

• Longer paths provide better group detection (p_val << 1)
• Two-person groups are the easiest to detect
• Simple one-step clustering of trajectories is not sufficient for long-term group detection
• Employee tracks pose a significant problem and have to be excluded
• Several groups were missed by the operator in the initial ground truth
  – The system caught groups missed by the human expert, confirmed after inspection of the results

Contributions

– BG subtraction based on a codebook (RGB + thermal)
– Introduced a head candidate selection method based on the VPP histogram
– Resolved track initialization ambiguity and non-unique body-blob correspondence
– Informed jump-diffuse transitions in the MCMC tracker
– Weight plane and z-buffer improve likelihood estimation
– Anisotropic mean shift with an obstacle model
– Two-layer formal framework for high-level activity detection
– Implemented robust fuzzy clustering to group events into activities

Future Work

• Improved Tracking (via feature points) • Demographical analysis • Focus of Attention • Sensor Fusion • Other Types of Swarming Activities

Thank you!

Questions?

Event detection distance measure:

d(b_i, b_j) = w₁ |p_i − p_j| + w₂ |α_i − α_j| + w₃ |δ_i − δ_j|