Transcript Slide 1
VISUAL HUMAN TRACKING AND GROUP ACTIVITY ANALYSIS: A VIDEO MINING SYSTEM FOR RETAIL MARKETING
PhD thesis by Alex Leykin, Indiana University
Motivation
• Automated tracking and activity recognition is missing from marketing research
• The hardware is already there
• Visual information can reveal a lot about how humans interact with each other
• Helps in making intelligent marketing decisions
Goals
• Process visual information to get a formal representation of human locations (Visual Tracking)
• Extract semantic information from the tracks (Activity Analysis)
Related Work: Detection and Tracking
• Yacoob and Davis, "Learned models for estimation of rigid and articulated human motion from stationary or moving camera," IJCV 2000
• Zhao and Nevatia, "Tracking multiple humans in crowded environment," CVPR 2004
• Haritaoglu, Harwood, and Davis, "W-4: Real-time surveillance of people and their activities," PAMI 2000
• Deutscher, North, Bascle, and Blake, "Tracking through singularities and discontinuities by random sampling," ICCV 1999
• Elgammal and Davis, "Probabilistic Framework for Segmenting People Under Occlusion," ICCV 2001
• Isard and MacCormick, "BraMBLe: a Bayesian multiple blob tracker," ICCV 2001
Related Work: Activity Recognition
• Haritaoglu and Flickner, "Detection and tracking of shopping groups in stores," CVPR 2001
• Oliver, Rosario, and Pentland, "A Bayesian computer vision system for modeling human interactions," PAMI 2000
• Buzan, Sclaroff, and Kollios, "Extraction and clustering of motion trajectories in video," ICPR 2004
• Hongeng, Nevatia, and Bremond, "Video-based event recognition: activity representation and probabilistic recognition methods," CVIU 2004
• Bobick and Ivanov, "Action recognition using probabilistic parsing," CVPR 1998
Low-level Processing
System Components
(System diagram, flattened:)
• Low-level: Camera Model, Obstacle Model, Foreground Segmentation, Head Detection
• Tracking: Jump-diffuse transitions, Priors and Likelihoods, Accept/Reject Candidate
• Event Detection: Actor Distances, Deterministic Agglomerative Clustering, Validity Index
• Activity Detection: Event Distances, Fuzzy Agglomerative Clustering, Adaptively Remove Weak Clusters
Background Modeling
Each pixel maintains a codebook of codewords; each codeword stores a color mean μ_RGB and a brightness range [I_low, I_hi].
Adaptive Background Update
Match pixel p to the codebook b:
• I_low < I(p) < I_high
• color distortion between RGB(p) and μ_RGB is below T_RGB
If there is no match:
• if the codebook is saturated (t(p)/t_high > T_t1, t(p)/t_low > T_t2), the pixel is foreground
• else, create a new codeword
Else:
• update the matched codeword with the new pixel information
• if more than one codeword matches, merge the matching codewords
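The matching rule above can be sketched as follows. This is a minimal illustration, assuming a codeword stores a color mean `mu` and a brightness range [`I_low`, `I_hi`]; the threshold `T_COLOR`, the field names, and the chromaticity-line distortion measure are our assumptions, not the thesis implementation.

```python
import numpy as np

T_COLOR = 0.1  # max allowed relative color distortion (assumed threshold)

def color_distortion(pixel, mu):
    """Distance of `pixel` from the chromaticity line through `mu`."""
    pixel, mu = np.asarray(pixel, float), np.asarray(mu, float)
    p2 = pixel @ pixel
    proj2 = (pixel @ mu) ** 2 / (mu @ mu)   # squared projection onto mu
    return np.sqrt(max(p2 - proj2, 0.0))

def is_background(pixel, codebook):
    """Pixel is background if some codeword matches its brightness and color."""
    brightness = np.linalg.norm(pixel)
    for cw in codebook:
        if cw["I_low"] <= brightness <= cw["I_hi"] and \
           color_distortion(pixel, cw["mu"]) < T_COLOR * brightness:
            return True
    return False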
Background Subtraction
Head Detection
Vanishing Point Projection (VPP) Histogram; vanishing point in the Z-direction.
Camera Setup
• Two camera types: perspective and spherical
• Mixture of indoor and outdoor scenes
• Color and thermal image sensors
• Varying lighting conditions (daylight, cloud cover, incandescent, etc.)
Camera Modeling
Perspective projection: X, Y, Z are recovered from [sx; sy; s] = P [X; Y; Ż; 1] using SVD, where P is the 3×4 projection matrix.
Spherical projection (camera center [X_c, Y_c, Z_c], longitude θ, latitude φ):
X = cos(θ) tan(π − φ)(Z_c − Ż)
Y = sin(θ) tan(π − φ)(Z_c − Ż)
Z = Ż
Assumption: floor plane Z_f = 0.
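The spherical back-projection formulas can be sketched as below, assuming pixel coordinates have already been converted to longitude θ and latitude φ; treating the camera center (X_c, Y_c) as a translation of the result is our assumption about the coordinate frame.

```python
import math

def spherical_to_floor(theta, phi, cam, z=0.0):
    """Back-project spherical angles to a world point at height z.

    theta: longitude, phi: latitude, cam: camera center (Xc, Yc, Zc).
    Follows X = cos(theta) tan(pi - phi) (Zc - z), Y = sin(theta) ... .
    """
    xc, yc, zc = cam
    r = math.tan(math.pi - phi) * (zc - z)  # horizontal range from the camera axis
    return (xc + math.cos(theta) * r,
            yc + math.sin(theta) * r,
            z)
```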
Tracking
Goal: find a correspondence between the bodies already detected in the current frame and the bodies that appear in the next frame.
Apply Markov Chain Monte Carlo (MCMC) to estimate the next state x_t from the previous state x_{t−1} and the observation z_t.
Jump-diffuse transitions: add body, delete body, recover deleted body, change size, move.
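The jump-diffuse move set can be sketched as follows. The state layout (a list of body dicts), the step sizes, and the uniform move choice are illustrative assumptions; only the move names come from the slide.

```python
import random

# "Jump" moves (add/delete/recover) change the number of bodies;
# "diffuse" moves (resize/move) perturb one body's parameters.
MOVES = ["add", "delete", "recover", "resize", "move"]

def propose(state, deleted, rng):
    """Propose a new candidate state; the input state is left untouched."""
    move = rng.choice(MOVES)
    state = [dict(b) for b in state]          # copy so the caller's state survives
    if move == "add":
        state.append({"x": rng.uniform(0, 10), "y": rng.uniform(0, 10), "w": 1.0})
    elif move == "delete" and state:
        deleted.append(state.pop(rng.randrange(len(state))))
    elif move == "recover" and deleted:
        state.append(deleted.pop())
    elif move == "resize" and state:
        b = rng.choice(state)
        b["w"] = max(0.1, b["w"] + rng.gauss(0, 0.1))
    elif move == "move" and state:
        b = rng.choice(state)
        b["x"] += rng.gauss(0, 0.5)
        b["y"] += rng.gauss(0, 0.5)
    return move, state
```

Each proposal would then be accepted or rejected against the posterior, as in any Metropolis-Hastings scheme.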
Tracking
The location of each pedestrian is estimated probabilistically based on:
• the current image
• the previous state of the system
• physical constraints
The goal of our tracking system is to find the candidate state x′ (a set of bodies along with their parameters) which, given the last known state x, will best fit the current observation z:

P(x′ | z, x) = L(z | x′) · P(x′ | x)

where L(z | x′) is the observation likelihood and P(x′ | x) is the state prior probability.
Tracking: Priors
Constraints on the body parameters:
• N(h_μ, h_σ²) and N(w_μ, w_σ²): body height and width
• U(x)_R and U(y)_R: body coordinates are weighted uniformly within the rectangular region R of the floor map
Temporal continuity:
• d(w_t, w_{t−1}) and d(h_t, h_{t−1}): variation from the previous size
• d(x_t, x′_{t−1}) and d(y_t, y′_{t−1}): variation from the Kalman-predicted position
• N(μ_door, σ_door): distance to the closest door (for new bodies)
Tracking Likelihoods: Distance weight plane
Problem: blob trackers ignore blob position in 3D (see Zhao and Nevatia, CVPR 2004).
Solution: employ a "distance weight plane"

D_xy = |P_xyz − C_xyz|

where C_xyz and P_xyz are the world coordinates of the camera and of the reference point respectively, and P_z = h/2.
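Given a per-pixel map of world coordinates, the distance weight plane is one vectorized norm. The precomputed `world` map is an assumption of this sketch (building it requires the camera model above, and the thesis fixes the reference point's height P_z at half the body height).

```python
import numpy as np

def distance_weight_plane(world, cam):
    """D[x, y] = |P_xyz - C_xyz| for every pixel.

    world: H x W x 3 array of per-pixel world coordinates (assumed given).
    cam:   camera center (Xc, Yc, Zc).
    """
    return np.linalg.norm(world - np.asarray(cam, float), axis=-1)
```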
Tracking Likelihoods: Z-buffer
Each pixel is labeled by depth order: 0 = background, 1 = furthermost body, 2 = next closest body, etc.
Tracking Likelihoods: Color Histogram
Color observation likelihood is based on the Bhattacharyya distance between the candidate and observed color histograms:

P_color = 1 − w_color · B(c_t, c_{t−1})

Implementing the z-buffer (Z) and the distance weight plane (D) allows the multiple-body configuration likelihood to be computed in one computationally efficient step.
Let:
• I: the set of all blob pixels
• O: the set of body pixels
The likelihood then combines two pixel sums, each weighted by the distance weight plane D: blob pixels left unexplained by any body (pixels of I outside O, where Z_xy = 0) and body pixels falling outside the foreground blobs (pixels of O outside I).
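The Bhattacharyya-based color term can be sketched as below. The 1 − w_color·B form of the likelihood and the default weight are assumptions of this sketch; the slides only fix the distance itself.

```python
import numpy as np

def bhattacharyya(h1, h2):
    """Bhattacharyya distance between two (unnormalized) histograms."""
    h1 = np.asarray(h1, float)
    h2 = np.asarray(h2, float)
    h1 = h1 / h1.sum()
    h2 = h2 / h2.sum()
    bc = np.sum(np.sqrt(h1 * h2))          # Bhattacharyya coefficient, 1 = identical
    return np.sqrt(max(1.0 - bc, 0.0))     # 0 = identical, 1 = disjoint

def color_likelihood(h_candidate, h_observed, w_color=1.0):
    """Color likelihood decreasing in the histogram distance (assumed form)."""
    return 1.0 - w_color * bhattacharyya(h_candidate, h_observed)
```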
Tracking: Anisotropic Weighted Mean Shift
(Figure: classic mean shift vs. our mean shift with bandwidth H, frames t−1 and t.)
Actors and events
• Shopper groups are formed by individual shoppers who shop together for some amount of time
  – More than a fleeting crossing of paths
  – Dwelling together
  – Splitting and reuniting after a period of time
Swarming
• Shopper groups are detected based on the "swarming" idea in reverse
  – Swarming is used in graphics to generate flocking behaviour in animations
  – Rules define flocking behaviour:
    • Avoid collisions with neighbors
    • Maintain a fixed distance from neighbors
    • Coordinate the velocity vector with neighbors
Tracking Customer Groups
• We treat customers as swarming agents, acting according to simple rules (e.g. stay together with swarm members)
Terminology
• Actors: shoppers (bodies detected in tracking), each represented as (x, y, id)
• Swarming events: short-time activity sequences of multiple agents interacting with each other
  – Could be fleeting (crossing paths)
  – Later analysis sorts this out and ignores chance encounters
Swarming
• The actors that best fit this model signal a Swarming Event
• Multiple swarming events are further clustered with fuzzy weights to find shoppers belonging to the same group over long periods
Event detection
• Two actors come sufficiently close according to some distance measure based on:
  – Relative position p_i = (x_i, y_i) of actor i on the floor
  – Body orientation α_i
  – Dwelling state δ_i ∈ {T, F}
The distance between two agents is a linear combination of co-location, co-orientation and co-dwelling.
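A minimal version of this linear-combination distance is below. The actor representation as a dict and the default weights are placeholders; the three terms (position gap, orientation gap, dwelling agreement) follow the slide.

```python
import math

def actor_distance(a, b, w=(1.0, 0.5, 0.5)):
    """d(b_i, b_j) = w1*|p_i - p_j| + w2*|alpha_i - alpha_j| + w3*|delta_i - delta_j|."""
    co_location = math.dist(a["p"], b["p"])          # floor-plane distance
    co_orientation = abs(a["alpha"] - b["alpha"])    # body orientation gap
    co_dwelling = float(a["dwell"] != b["dwell"])    # 0 if dwelling states agree
    return w[0] * co_location + w[1] * co_orientation + w[2] * co_dwelling
```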
Event detection
Perform agglomerative clustering of actors a into clusters C:
• Initialize: N singleton clusters
• Do: merge the two closest clusters
• While not: the validity index I reaches its maximum
The validity index I consists of an isolation term I_ni and a compactness term I_nc.
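The merge-until-the-validity-index-peaks loop can be sketched as follows. The single-link cluster distance and the simple isolation-minus-compactness index stand in for the thesis's I_ni and I_nc terms, which the slides do not spell out.

```python
import math

def cluster(points):
    """Agglomerative clustering of 2D points, keeping the best-scoring partition."""
    def dist(c1, c2):
        # single-link distance between two clusters (an assumption)
        return min(math.dist(p, q) for p in c1 for q in c2)

    def validity(clusters):
        # isolation: mean inter-cluster distance; compactness: mean intra-cluster
        inter = [dist(a, b) for i, a in enumerate(clusters) for b in clusters[i + 1:]]
        intra = [math.dist(p, q) for c in clusters
                 for i, p in enumerate(c) for q in c[i + 1:]]
        isolation = sum(inter) / len(inter) if inter else 0.0
        compactness = sum(intra) / len(intra) if intra else 0.0
        return isolation - compactness

    clusters = [[p] for p in points]
    best, best_score = clusters, validity(clusters)
    while len(clusters) > 1:
        i, j = min(((i, j) for i in range(len(clusters))
                    for j in range(i + 1, len(clusters))),
                   key=lambda ij: dist(clusters[ij[0]], clusters[ij[1]]))
        clusters = ([c for k, c in enumerate(clusters) if k not in (i, j)]
                    + [clusters[i] + clusters[j]])
        score = validity(clusters)
        if score > best_score:
            best, best_score = clusters, score
    return best
```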
Event detection
(Plots: final events; validity index by iteration number.)
Activity Detection
• Shopper group detection is accomplished by clustering the short-term events over long time periods
  – The events may be separated in time, but they will be part of the same shopper group if the actors are the same (the first term)
Activity detection
• Higher-level activities (shopper groups) are detected using these events as building blocks over longer time periods
• Some definitions:
  – B_ei = {b_ei}: the set of all bodies taking part in an event e_i
  – τ_ei and τ_ej: the average times at which events e_i and e_j happen
Activity detection
Define a measure of similarity between two events:

D_e²(e_i, e_j) = |(B_ei ∖ B_ej) ∪ (B_ej ∖ B_ei)| / (|B_ei| + |B_ej|) + (τ_ei − τ_ej)²

The first term measures the overlap between the two sets of actors; the second, their separation in time.
Activity detection
• Perform fuzzy agglomerative clustering
• Minimize the objective function, where the w_ij are fuzzy weights
• ρ(.) is the loss function from robust statistics, built from asymmetric variants of Tukey's biweight estimators
• ψ(.) is the weight function
• Adaptively keep only the strong fuzzy clusters and label the remaining clusters as activities
Results: Swarming activities detected in space-time
• Dot location: average event location
• Dot size: validity
• Dots of the same color: belong to the same activity
Group Detection Results
Quantitative Results
Tracking
Sequence number      1     2     3     4     5     6   Total   Percent
Frames            1054  1700   601  1506  2031  1652    8544
People               8    15    16     3     2     4      48
People missed        3     0     5     0     0     0            12.5
False hits           1     0     1     0     0     0             4.1
Identity switches    3     0     2     0     0     0            10.4
Group Detection

Sequence      1     2     3   Total   Percent
Groups       20    17    17      54     100
P+            0     1     0       1     1.8
P−            7     3     7      12    22.2
Partial       0     1     0       2     3.7

Groups: ground truth (manually determined); P+: false positives; P−: false negatives (groups missed); Partial: partially identified groups (≥2 people in the group correctly identified).
Qualitative Assessments
• Longer paths provide better group detection (p-value ≪ 1)
• Two-people groups are the easiest to detect
• Simple one-step clustering of trajectories is not sufficient for long-term group detection
• Employee tracks pose a significant problem and have to be excluded
• Several groups were missed by the operator in the initial ground truth
  – The system caught groups missed by the human expert, confirmed after inspection of the results
Contributions
– Background subtraction based on a codebook model (RGB + thermal)
– Introduced a head candidate selection method based on the VPP histogram
– Resolved track initialization ambiguity and non-unique body-blob correspondence
– Informed jump-diffuse transitions in the MCMC tracker
– Weight plane and z-buffer improve likelihood estimation
– Anisotropic mean shift with an obstacle model
– Two-layer formal framework for high-level activity detection
– Implemented robust fuzzy clustering to group events into activities
Future Work
• Improved tracking (via feature points)
• Demographic analysis
• Focus of attention
• Sensor fusion
• Other types of swarming activities
Thank you!
Questions?
Actor distance (from the event detection slides):

d(b_i, b_j) = w_1 |p_i − p_j| + w_2 |α_i − α_j| + w_3 |δ_i − δ_j|