Transcript slides
Motion Segmentation from
Clustering of Sparse Point
Features Using Spatially
Constrained Mixture Models
Shrinivas Pundlik
Committee members
Dr. Stan Birchfield (chair)
Dr. Adam Hoover
Dr. Ian Walker
Dr. Damon Woodard
Motion Segmentation
Gestalt insight: grouping forms the basis of human perception
Gestalt laws: factors that affect the grouping process (cues):
- similarity
- proximity
- common motion (common fate)
- continuity
Motion segmentation: segmenting images based on common motion; points moving together are grouped together.
Typically, motion segmentation uses common motion + proximity.
Applications of Motion Segmentation
- object detection
- tracking
  - pedestrian detection
  - vehicle tracking
- robotics
- surveillance
- image and video compression
- scene reconstruction
- video manipulation / editing
  - video matting
  - video annotation
  - motion magnification
[Example figures: pedestrian detection (Viola et al., 2003); vehicle tracking (Kanhere et al., 2005); video editing (Criminisi et al., 2006)]
Previous Work

Approach:
- Motion Layer Estimation: Wang and Adelson 1994; Ayer and Sawhney 1995; Jojic and Frey 2001; Willis et al. 2003; Smith et al. 2004; Kokkinos and Maragos 2004; Xiao and Shah 2005
- Multi-Body Factorization: Costeira and Kanade 1995; Ke and Kanade 2002; Vidal and Sastry 2003; Yan and Pollefeys 2006; Gruber and Weiss 2006
- Object-Level Grouping: Sivic et al. 2004; Kanhere et al. 2005
- Miscellaneous: Black and Fleet 1998; Birchfield 1999; Levine and Weiss 2006

Algorithm:
- Expectation Maximization: Wang and Adelson 1994; Ayer and Sawhney 1995; Jojic and Frey 2001
- Graph Cuts: Willis et al. 2003; Xiao and Shah 2005; Criminisi et al. 2006
- Normalized Cuts: Shi and Malik 1998
- Belief Propagation: Kumar et al. 2005
- Variational Methods: Cremers and Soatto 2005; Brox et al. 2005

Nature of Data:
- Sparse Features: Sivic et al. 2004; Kanhere et al. 2005; Rothganger et al. 2004
- Dense Motion: Cremers and Soatto 2005; Brox et al. 2005
- Motion + Image Cues: Xiao and Shah 2005; Kumar et al. 2005; Criminisi et al. 2006
Challenges: Short Term

[Example scene with labeled regions: 1. statue, 2. wall, 3. trees, 4. grass, 5. biker, 6. pedestrian]

- computation of motion in the scene
- influence of the neighboring motion
- number of objects / regions in the scene
- initialization of motion parameters
- description of complex motions (articulated human motion)
Challenges: Long Term

[Diagram: trajectories of fast, medium, slow, and crawling motions plotted against displacement thresholds and time windows over time t, contrasting batch processing with incremental processing]

- batch processing vs. incremental processing
- updating the reference frame
- maintaining existing groups: growing existing regions, splitting
- adding new groups (new objects)
- deleting invisible groups
Objectives

- motion segmentation using sparse point features
- automatically determine the number of groups
- handling dynamic sequences
- real-time performance
- handling complex motions

[Diagram: Feature Tracking (motion computation) feeds Motion Segmentation (two-frame clustering and long-term maintenance of groups), cast in a Mixture Model Framework linking observed data, group assignment, and parameter estimation; motion models range from translation and affine to complex models (Articulated Human Motion Models)]
Overview of the Topics
Feature Tracking: Tracking sparse point features
for computation of image motion and its extension to
joint feature tracking.
S. T. Birchfield and S. J. Pundlik, “Joint Tracking of Features and
Edges”, CVPR, 2008.
Motion Segmentation: Clustering point features in
videos based on their motion and spatial connectivity.
S. J. Pundlik and S. T. Birchfield, “Motion Segmentation at Any
Speed”, BMVC 2006.
S. J. Pundlik and S. T. Birchfield, “Real Time Motion
Segmentation of Sparse Feature Points at Any Speed”, IEEE Trans.
on Systems, Man, and Cybernetics, 2008.
Articulated Human Motion Models: Learning human walking motion from various poses and view angles for segmentation and pose estimation (special handling of a complex motion model).
Iris Segmentation: Texture and intensity based segmentation of non-ideal iris images.
S. J. Pundlik, D. L. Woodard and S. T. Birchfield, “Non-Ideal Iris Segmentation Using Graph Cuts”, CVPR Workshop on Biometrics, 2008.
Point Features

[Figure: input image, its gradients, and the detected point features capturing the information content]

Popular features:
- Harris corner feature [Harris & Stephens 1988, Schmid et al. 2000]
- Shi-Tomasi feature [Shi & Tomasi 1994]
- Forstner corner feature [Forstner 1994]
- Scale invariant feature transform (SIFT) [Lowe 2004]
- Gradient Location and Orientation Histogram (GLOH) [Mikolajczyk and Schmid 2005]
- Features from accelerated segment test (FAST) [Rosten and Drummond 2005]
- Speeded up robust features (SURF) [Bay et al. 2006]
- DAISY [Tola et al. 2008]
Utility of Point Features
Advantages:
highly repeatable and extensible (work for a variety of images)
efficient to compute (real time implementations available)
local methods for processing (tracking through multiple frames)
tracking multiple point features = sparse optical flow
sparse point feature tracks yield the image motion
Tracking Point Features: Lucas-Kanade

Assume constant brightness, which gives the optic flow constraint equation:

    I_x u + I_y v + I_t = 0

where I_x, I_y are the image spatial derivatives, I_t is the image temporal derivative, and (u, v) is the pixel displacement.

Estimate the pixel displacement u = (u, v)^T by minimizing:

    E_{LK}(u, v) = K_\rho * (I_x u + I_y v + I_t)^2

where K_\rho is a convolution kernel. Differentiating with respect to u and v and setting the derivatives to zero leads to a linear system Z u = e, with gradient covariance matrix

    Z = K_\rho * \begin{bmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{bmatrix}, \qquad e = -K_\rho * \begin{bmatrix} I_x I_t \\ I_y I_t \end{bmatrix}

Iterate using the Newton-Raphson method.
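To make the two-frame estimate concrete, here is a minimal NumPy sketch of a single Lucas-Kanade solve for one feature; the window size, the derivative filters, and all names are illustrative choices, not the implementation evaluated in this work.

    import numpy as np

    def lk_displacement(I1, I2, x, y, half_win=7):
        # Spatial derivatives of the first image; temporal derivative between frames.
        Iy_full, Ix_full = np.gradient(I1.astype(float))
        It_full = I2.astype(float) - I1.astype(float)
        # Collect derivatives over the window around the feature (uniform kernel K).
        r, c = int(round(y)), int(round(x))
        win = np.s_[r - half_win:r + half_win + 1, c - half_win:c + half_win + 1]
        Ix, Iy, It = Ix_full[win].ravel(), Iy_full[win].ravel(), It_full[win].ravel()
        # Gradient covariance matrix Z and right-hand side e of Z u = e.
        Z = np.array([[np.dot(Ix, Ix), np.dot(Ix, Iy)],
                      [np.dot(Ix, Iy), np.dot(Iy, Iy)]])
        e = -np.array([np.dot(Ix, It), np.dot(Iy, It)])
        return np.linalg.solve(Z, e)   # displacement (u, v)

In practice this solve sits inside a Newton-Raphson loop (warp I2 by the current estimate and re-solve until the update is small) and inside a coarse-to-fine pyramid.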
Detection of Point Features

Gradient covariance matrix:

    Z = K_\rho * \begin{bmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{bmatrix}

where K_\rho is a convolution kernel and I_x, I_y are the image gradients.

Good feature: min(\lambda_1, \lambda_2) > threshold, where \lambda_1, \lambda_2 are the eigenvalues of Z.

[Figure: three image patches and their intensity surfaces over (x, y)]
- no feature: low intensity variation; two small eigenvalues (e_max = 5.15, e_min = 3.13)
- edge feature: unidirectional intensity variation; a small and a large eigenvalue (e_max = 1026.9, e_min = 29.9)
- good feature: bidirectional intensity variation; two large eigenvalues (e_max = 1672.44, e_min = 932.4)
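The detection criterion can be sketched in a few lines; this assumes a uniform convolution kernel and a user-chosen threshold, and omits the non-maximum suppression a real detector would add.

    import numpy as np
    from scipy.ndimage import uniform_filter

    def min_eigenvalue_map(I, win=7):
        Iy, Ix = np.gradient(I.astype(float))
        # Entries of Z = K * [[Ix^2, IxIy], [IxIy, Iy^2]], smoothed over the window.
        Ixx = uniform_filter(Ix * Ix, win)
        Ixy = uniform_filter(Ix * Iy, win)
        Iyy = uniform_filter(Iy * Iy, win)
        # Closed-form smaller eigenvalue of the 2x2 matrix at every pixel.
        tr, det = Ixx + Iyy, Ixx * Iyy - Ixy * Ixy
        return tr / 2 - np.sqrt(np.maximum(tr * tr / 4 - det, 0.0))

    # Good features: pixels whose smaller eigenvalue clears the threshold.
    # ys, xs = np.nonzero(min_eigenvalue_map(img) > threshold)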
Dense Optical Flow: Horn-Schunck

Horn-Schunck: find global displacement functions u(x,y) and v(x,y) by minimizing:

    E_{HS} = \iint (I_x u + I_y v + I_t)^2 + \lambda \left( \|\nabla u\|^2 + \|\nabla v\|^2 \right) \, dx \, dy

with the data term (optical flow constraint), the smoothness term, and regularization parameter \lambda. Solving via the Euler-Lagrange equations:

    I_x (I_x u + I_y v + I_t) - \lambda \nabla^2 u = 0
    I_y (I_x u + I_y v + I_t) - \lambda \nabla^2 v = 0

and approximating the Laplacian by \nabla^2 u \approx \kappa (\bar{u} - u), where \bar{u} is the average displacement in the neighborhood and \kappa a constant, leads to a sparse system.
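A minimal sketch of the resulting iterative scheme, assuming the classical update in which the Laplacian is replaced by the difference between the neighborhood average and the central value; lam stands in for the regularization parameter.

    import numpy as np
    from scipy.ndimage import convolve

    def horn_schunck(I1, I2, lam=100.0, n_iters=100):
        Iy, Ix = np.gradient(I1.astype(float))
        It = I2.astype(float) - I1.astype(float)
        u = np.zeros_like(Ix)
        v = np.zeros_like(Ix)
        # Kernel computing the average displacement in the neighborhood.
        avg = np.array([[1., 2., 1.], [2., 0., 2.], [1., 2., 1.]]) / 12.0
        for _ in range(n_iters):
            u_bar, v_bar = convolve(u, avg), convolve(v, avg)
            # Jacobi-style update derived from the Euler-Lagrange equations.
            resid = (Ix * u_bar + Iy * v_bar + It) / (lam + Ix ** 2 + Iy ** 2)
            u = u_bar - Ix * resid
            v = v_bar - Iy * resid
        return u, v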
Need for a Joint Approach

Lucas-Kanade (1981):
- local method (local smoothing)
- pixel displacement: constant within a small neighborhood
- robust under noise
- produces sparse optical flow

Horn-Schunck (1981):
- global method (global smoothing)
- pixel displacement: a smooth function over the image domain
- sensitive to noise
- produces dense optical flow

Use global smoothing to improve feature tracking: Joint Feature Tracking.
Use local smoothing to improve dense optical flow: Combined Local-Global approach (Bruhn et al., 2004).
Joint Lucas-Kanade (JLK)

Joint Lucas-Kanade energy functional:

    E_{JLK} = \sum_{i=1}^{N} \left[ E_D(i) + \lambda E_S(i) \right]

where N is the number of feature points, the data term E_D(i) is the optical flow constraint for feature i, and the smoothness term (regularization) E_S(i) penalizes deviation of feature i's displacement from the expected values (\hat{u}_i, \hat{v}_i) computed from its neighbors.

Differentiating E_{JLK} w.r.t. (u, v) gives a 2N x 2N system whose (2i-1)th and (2i)th rows correspond to feature i. The sparse system is solved using Jacobi iterations.
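Since the exact published system is not reproduced in these slides, the following is only a structural sketch of solving such a 2N x 2N system with Jacobi iterations: each feature keeps its own 2x2 gradient covariance matrix Z[i] and right-hand side e[i], and the regularization pulls its displacement toward the expected value from its neighbors. The names and the per-feature update are assumptions for illustration, not the paper's formulation.

    import numpy as np

    def jlk_jacobi(Z, e, neighbors, lam=0.1, n_iters=50):
        # Z: (N, 2, 2) gradient covariance matrices; e: (N, 2) right-hand sides;
        # neighbors[i]: indices of features adjacent to feature i.
        N = len(e)
        u = np.zeros((N, 2))
        eye = np.eye(2)
        for _ in range(n_iters):
            u_next = np.empty_like(u)
            for i in range(N):
                u_hat = u[neighbors[i]].mean(axis=0)   # expected value for feature i
                # Per-feature 2x2 solve balancing data term and smoothness term.
                u_next[i] = np.linalg.solve(Z[i] + lam * eye, e[i] + lam * u_hat)
            u = u_next
        return u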
Results of JLK

[Figure: JLK tracking results on image regions with repetitive texture and low texture]
Overview of the Topics
Feature Tracking: Tracking sparse point features
for computation of image motion and its extension to
joint feature tracking.
S. T. Birchfield and S. J. Pundlik, “Joint Tracking of Features and
Edges”, CVPR, 2008.
Motion Segmentation: Clustering point features in
videos based on their motion and spatial connectivity.
S. J. Pundlik and S. T. Birchfield, “Motion Segmentation at Any
Speed”, BMVC 2006.
S. J. Pundlik and S. T. Birchfield, “Real Time Motion
Segmentation of Sparse Feature Points at Any Speed”, IEEE Trans.
on Systems, Man, and Cybernetics, 2008.
Articulated Human Motion Models: Learning human walking motion from various poses and view angles for segmentation and pose estimation (special handling of a complex motion model).
Iris Segmentation: Texture and intensity based segmentation of non-ideal iris images.
S. J. Pundlik, D. L. Woodard and S. T. Birchfield, “Non-Ideal Iris Segmentation Using Graph Cuts”, CVPR Workshop on Biometrics, 2008.
Mixture Models Basics

[Illustration: drawing a sample from a mixture of 3 bins (components)]

Posterior probability of drawing a Red sample:

    P(Red | sample) \propto P(sample | Red) \, P(Red)

- P(sample | Red): likelihood of the sample being Red (measurement); how Red is the drawn sample?
- P(Red): prior probability of the Red bin; how big is the Red bin?

Probability of drawing a sample from a mixture of three bins:

    P(sample) = P(sample|Red)P(Red) + P(sample|Green)P(Green) + P(sample|Blue)P(Blue)

Mixture model: likelihoods and priors for all the components.
Challenge: the only available information is the drawn sample!
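As a worked version of the bin example, a few lines of Python; the priors and likelihoods are invented numbers for illustration.

    # Invented priors (bin sizes) and likelihoods (how Red the sample looks, etc.).
    priors = {'Red': 0.5, 'Green': 0.3, 'Blue': 0.2}
    likelihoods = {'Red': 0.8, 'Green': 0.1, 'Blue': 0.1}

    # P(sample) = sum_c P(sample|c) P(c) = 0.8*0.5 + 0.1*0.3 + 0.1*0.2 = 0.45
    p_sample = sum(likelihoods[c] * priors[c] for c in priors)

    # Posterior P(Red|sample) = P(sample|Red) P(Red) / P(sample) ~= 0.889
    p_red = likelihoods['Red'] * priors['Red'] / p_sample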
Mixture Model Example: GMM

[Figure: histogram of grayscale values fit with four Gaussian components, \theta_1 = \{\mu_1, \sigma_1\}, \ldots, \theta_4 = \{\mu_4, \sigma_4\}]

Gaussian density for the jth component (the ith pixel conditioned on the parameters of the jth Gaussian density):

    p(x_i | \theta_j) = \frac{1}{\sqrt{2\pi\sigma_j^2}} \exp\left( -\frac{(x_i - \mu_j)^2}{2\sigma_j^2} \right)

Parameters of a Gaussian density, \theta: mean (\mu) and variance (\sigma^2).
Learning Mixture Models

Mixture model defined as:

    f(x_i | \Theta) = \sum_{j=1}^{K} \pi_j \, p(x_i | \theta_j)

where K is the number of components (known), x_i is an observed data point (known), \theta_j are the density parameters (unknown), \pi_j are the mixing weights (unknown), and p(x_i | \theta_j) is the component density.

Learning mixture models (parameter estimation): estimate the mixing weights and the component density parameters.

Circular nature of the problem: parameter estimation requires the class association (segmentation), which in turn requires the parameters.
Expectation Maximization

EM: an iterative two-step algorithm for parameter estimation.
- E step: find the expectation of the likelihood function (segmentation / label assignment)
- M step: maximize the likelihood function (parameter estimation based on the segmentation)
- convergence: when the likelihood cannot be further maximized (when estimates do not change between successive iterations)

1. Initialize:
   a. number of components K
   b. component density parameters θ for all components
   c. mixing weights π
   d. convergence criterion
2. Repeat until convergence (see the sketch below):
   E STEP
   a. for all N data points:
      i. compute the likelihood from the component density
      ii. estimate the membership weights w
   M STEP
   b. estimate the mixing weights
   c. estimate the component density parameters
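A compact sketch of these steps for a one-dimensional Gaussian mixture; the initialization and convergence test are deliberately simple, and the variable names are mine.

    import numpy as np

    def em_gmm_1d(x, K, n_iters=100, tol=1e-6):
        N = len(x)
        # 1. Initialize: spread means over the data, shared variance, uniform weights.
        mu = np.linspace(x.min(), x.max(), K)
        var = np.full(K, x.var())
        pi = np.full(K, 1.0 / K)
        prev_ll = -np.inf
        for _ in range(n_iters):
            # E step: membership weights w[i, j] = P(component j | x_i).
            lik = pi * np.exp(-(x[:, None] - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
            w = lik / lik.sum(axis=1, keepdims=True)
            # M step: re-estimate mixing weights and density parameters.
            Nk = w.sum(axis=0)
            pi = Nk / N
            mu = (w * x[:, None]).sum(axis=0) / Nk
            var = (w * (x[:, None] - mu) ** 2).sum(axis=0) / Nk
            # Convergence: stop when the log-likelihood stops improving.
            ll = np.log(lik.sum(axis=1)).sum()
            if ll - prev_ll < tol:
                break
            prev_ll = ll
        return pi, mu, var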
Various Mixture Models

- Finite Mixture Model (FMM): data term (how closely the data follow the models) with one prior for each component (mixing weights). Learned with the EM algorithm.
- Spatially Variant Finite Mixture Model (ML-SVFMM) [1]: a prior distribution for each data element (label probabilities). Learned with the EM algorithm.
- Spatially Variant Finite Mixture Model (MAP-SVFMM) [1]: adds a smoothness term (spatial interaction of the data elements); neighbors mostly have similar labels (loose constraint). Learned with the EM algorithm.
- Spatially Constrained Finite Mixture Model (SCFMM): enforces spatial connectivity of labels. Learned with a greedy EM algorithm.

[1] S. Sanjay-Gopal and T. Hebert, “Bayesian Pixel Classification Using Spatially Variant Finite Mixtures and Generalized EM Algorithm”, IEEE Trans. on Image Processing, 1998.
Greedy EM (Iterative Region Growing)

[Illustration: iterative region growing on a 4-connected grid from start locations 1, 2, and 3]

Properties of Greedy EM:
- enforces spatial connectivity of labels (SCFMM)
- automatically determines the number of groups
- local initialization of parameters
- primary user-defined parameters: the inclusion criterion and the minimum number of elements in a group
Grouping Point Features

Between two frames, repeat until all features have been considered:
1. Randomly select a seed feature.
2. Fit a motion model to its neighbors.
3. Repeat until the group does not change (see the sketch below):
   a. Discard all features except the one nearest the centroid.
   b. Grow the group by recursively including neighboring features with similar motion.
   c. Update the motion model.

[Illustration: grouping features from a single seed point; the seed, the group centroid, and the grown group over successive iterations of the original feature set]
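A hedged sketch of the grow-and-update step for a translational motion model; the neighbor graph, the similarity tolerance, and the single-pass structure are illustrative simplifications of the algorithm above, which also restarts from the feature nearest the centroid until the group stabilizes.

    import numpy as np

    def grow_group(seed, flow, neighbors, tol=1.0):
        # flow: (N, 2) feature displacements between the two frames;
        # neighbors[i]: indices of features adjacent to feature i.
        model = flow[seed].copy()          # translational motion model
        group, frontier = {seed}, [seed]
        while frontier:
            i = frontier.pop()
            for j in neighbors[i]:
                # Recursively include neighbors whose motion fits the model.
                if j not in group and np.linalg.norm(flow[j] - model) < tol:
                    group.add(j)
                    frontier.append(j)
            # Update the motion model from the current group.
            model = flow[list(group)].mean(axis=0)
        return group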
Grouping Consistent Features

Input: point features tracked between two frames.
Output: groups of point features.

For N seed points (seed 1, seed 2, seed 3, ...): group the point features from each seed, then gather the sets of features that are always grouped together; each such set is a consistent feature group.
Grouping Consistent Features

Consistency check: features that are always grouped together, no matter the seed point, form a consistent group.

[Illustration: features a, b, c, d grouped from two different seed points; summing the binary membership indicators across the runs shows which features received identical labels in every run]

In practice, we use 7 seed points.
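The consistency check itself reduces to a small bookkeeping step; this sketch assumes labels[s][i] holds the group label that the run seeded at point s assigned to feature i, which is my own encoding of the illustration above.

    from collections import defaultdict

    def consistent_groups(labels):
        # Features whose label agrees across every seeded run were always
        # grouped together; each distinct signature is one consistent group.
        groups = defaultdict(list)
        for i in range(len(labels[0])):
            signature = tuple(run[i] for run in labels)
            groups[signature].append(i)
        return list(groups.values())

    # With the 7 seed points used in practice:
    # consistent = consistent_groups([run1, run2, run3, run4, run5, run6, run7])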
Consistent Features: Multiple Groups

[Figure: feature groups obtained for various iterations and the resulting consistent feature groups]
Maintaining Groups Over Time

[Flowchart: numbered feature groups in frame k are tracked to frame k + n; lost features are dropped and newly added features are assigned to groups; if the χ² test fails, or multiple groups are found, the features are regrouped by finding consistent groups]

- track features from frame k to frame k + n
- handle lost features and newly added features
- regroup the features (find consistent groups) whenever the χ² test fails or multiple groups are found
Experimental Results

Sequences: freethrow, mobile-calendar, statue, robots, car-map.

Videos

[Videos: mobile-calendar sequence; statue sequence]
Results Over Time

The algorithm dynamically determines the number of feature groups.

[Plots: number of feature groups over time for the freethrow, car-map, mobile-calendar, robots, statue, and vehicles sequences]
Comparison with Other Approaches

Algorithm                        | Run Time (sec/frame) | Max. number of groups
---------------------------------|----------------------|----------------------
Xiao and Shah (PAMI, 2005)       | 520                  | 4
Kumar et al. (ICCV, 2005)        | 500                  | 6
Smith et al. (PAMI, 2004)        | 180                  | 3
Rothganger et al. (CVPR, 2004)   | 30                   | 3
Jojic and Frey (CVPR, 2001)      | 1                    | 3
Cremers and Soatto (IJCV, 2005)  | 40                   | 4
Our algorithm (TSMC, 2008)       | 0.16                 | 8
Effect of Joint Feature Tracking

[Figure: input sequence with tracking results from standard Lucas-Kanade vs. Joint Lucas-Kanade]
Overview of the Topics
Feature Tracking: Tracking sparse point features
for computation of image motion and its extension to
joint feature tracking.
S. T. Birchfield and S. J. Pundlik, “Joint Tracking of Features and
Edges”, CVPR, 2008.
Motion Segmentation: Clustering point features in
videos based on their motion and spatial connectivity.
S. J. Pundlik and S. T. Birchfield, “Motion Segmentation at Any
Speed”, BMVC 2006.
S. J. Pundlik and S. T. Birchfield, “Real Time Motion
Segmentation of Sparse Feature Points at Any Speed”, IEEE Trans.
on Systems, Man, and Cybernetics, 2008.
Articulated Human Motion Models: Learning human walking motion from various poses and view angles for segmentation and pose estimation (special handling of a complex motion model).
Iris Segmentation: Texture and intensity based segmentation of non-ideal iris images.
S. J. Pundlik, D. L. Woodard and S. T. Birchfield, “Non-Ideal Iris Segmentation Using Graph Cuts”, CVPR Workshop on Biometrics, 2008.
Articulated Motion Models

Purpose of human motion analysis:
- pedestrian detection / surveillance
- action recognition
- pose estimation

Traditional approaches use:
- appearance
- frame differencing

Theme: sparse motion alone captures a wealth of information.

Objectives:
- learn articulated human motion models
- motion only, no appearance
- viewpoint and scale invariant detection
- varying lighting conditions (day and night time sequences)
- detection in the presence of camera and background motion
- pose estimation
Use of Motion Capture Data

Top-Down Approach: train high-level descriptors (appearance or motion based) that describe articulated motion at a global level for detection.

Bottom-Up Approach: learn the motion of individual joints from the training data and aggregate the information to detect human motion.

[Illustration: 3D motion capture (mocap) data; displacement of the limbs (hand, foot 1, foot 2) w.r.t. the body center]
Approach Overview

Training: 3D motion capture points spanning angular viewpoints and walking poses.

Motion Descriptor

[Figures: Gaussian weight maps for the various means and orientations that constitute the motion descriptor; spatial arrangement of the descriptor bins w.r.t. the body center; bin values of the motion descriptor describing human subjects from various viewpoints and pose configurations; confusion matrix for 64 training descriptors]
Segmentation Results

[Figures: view-invariant segmentation of articulated motion using the motion descriptor, for right profile, left profile, angular, and front views; segmentation of articulated motion in a challenging sequence involving camera and background motion]
Pose Estimation Results

[Figures: pose estimation for right-profile, front, and angular views, and for a nighttime sequence]
Videos of Detection and Pose Estimation
Overview of the Topics
Feature Tracking: Tracking sparse point features
for computation of image motion and its extension to
joint feature tracking.
S. T. Birchfield and S. J. Pundlik, “Joint Tracking of Features and
Edges”, CVPR, 2008.
Motion Segmentation: Clustering point features in
videos based on their motion and spatial connectivity.
S. J. Pundlik and S. T. Birchfield, “Motion Segmentation at Any
Speed”, BMVC 2006.
S. J. Pundlik and S. T. Birchfield, “Real Time Motion
Segmentation of Sparse Feature Points at Any Speed”, IEEE Trans.
on Systems, Man, and Cybernetics, 2008.
Articulated Human Motion Models: Learning human walking motion from various poses and view angles for segmentation and pose estimation (special handling of a complex motion model).
Iris Segmentation: Texture and intensity based segmentation of non-ideal iris images.
S. J. Pundlik, D. L. Woodard and S. T. Birchfield, “Non-Ideal Iris Segmentation Using Graph Cuts”, CVPR Workshop on Biometrics, 2008.
Iris Image Segmentation

Non-ideal iris image segmentation using texture and intensity.

[Figure: input eye image; the gradient magnitude distinguishes textured regions (higher gradient magnitude, e.g. eyelashes) from un-textured regions (lower gradient magnitude); point features fall with higher density on eyelash regions and lower density on non-eyelash regions; intensity separates the iris, pupil, and background (four regions in all)]

Coarse Texture Computation

Ideas:
- local intensity variations (computed from the gradient magnitude and point features) can be used as a texture representation that separates eyelash and non-eyelash regions
- possible segments based on image intensity: iris, pupil, and background
Iris Segmentation and Recognition

Iris segmentation pipeline: input iris image → removal of specular reflections → preprocessed input → iris segmentation → iris mask → iris refinement → iris ellipse.

Iris recognition:
- unwrap and normalize the iris mask
- generate an iris signature from the iris mask (using the texture in the iris)
- compare iris signatures using the Hamming distance (see the sketch below)
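A minimal sketch of the final matching step, assuming binary iris signatures accompanied by validity masks; the masked-XOR form is standard iris-recognition practice (Daugman) rather than code from this dissertation.

    import numpy as np

    def iris_hamming(code1, code2, mask1, mask2):
        # Compare only the bits valid in both signatures (unoccluded iris regions).
        valid = mask1 & mask2
        disagree = (code1 ^ code2) & valid
        return disagree.sum() / valid.sum()

    # Smaller distances indicate the same iris; the accept threshold is tuned
    # on the database (values near 0.3 are common in the literature).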
Image Segmentation Results

[Figures: input images, four-class segmentations (pupil, iris, eyelashes, background), and the resulting iris masks]
Iris Recognition

Iris recognition using our segmentation algorithm:
- West Virginia Non-Ideal Database: 1868 images; 467 classes, 4 images/class
- West Virginia Off-Axis Database: 584 images; 146 classes, 4 images/class
Conclusions and Future Work

Motion segmentation based on sparse feature clustering:
- spatially constrained mixture model and greedy EM algorithm
- automatically determines the number of groups
- real-time performance
- handles long, dynamic sequences and an arbitrary number of feature groups

Joint feature tracking:
- incorporates neighboring feature motion
- improved performance in areas of low texture or repetitive texture

Detection of articulated motion:
- motion-based approach for learning high-level human motion models
- segments and tracks human motion under varying pose, scale, and lighting conditions
- view-invariant pose estimation

Iris segmentation:
- graph-cuts-based dense segmentation using texture and intensity
- combines appearance and eye geometry
- handles non-ideal iris images with occlusions, illumination changes, and eye rotation

Future Work:
- integration of motion segmentation, joint feature tracking, and articulated motion segmentation
- dense segmentation from the sparse feature groups
- handling non-rigid motions, non-textured regions, and occlusions
- combining sparse feature groups, discontinuities, and image contours for a novel representation of video
Questions?