Transcript slides

EECS 274 Computer Vision
Tracking
Tracking
• Motivation: Obtain a compact
representation from an image/motion
sequence/set of tokens
• Should support the application
• Broad theory is absent at present
• Some slides from R. Collins lecture notes
• Reading: FP Chapter 17
Tracking
• Very general model:
– We assume there are moving objects, which have an underlying
state X
– There are measurements Y, some of which are functions of this
state
– There is a clock
• at each tick, the state changes
• at each tick, we get a new observation
• Examples
– object is ball, state is 3D position+velocity, measurements are
stereo pairs
– object is person, state is body configuration, measurements are
frames, clock is in camera (30 fps)
Three main steps
Simplifying assumptions
Tracking as induction
• Assume data association is done
– we’ll talk about this later; a dangerous
assumption
• Do correction for the 0’th frame
• Assume we have corrected estimate for
i’th frame
– show we can do prediction for i+1, correction
for i+1
General model for tracking
• The moving object of interest is characterized by an
underlying state X
• State X gives rise to measurements or observations Y
• At each time t, the state changes to Xt and we get a new
observation Yt
(Graphical model: a chain of states X1 → X2 → … → Xt, each emitting an observation Y1, Y2, …, Yt; the model is best explained with this graph)
Base case
Induction step
Given
Induction step
Explanation with graphical model
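For reference, the base case and induction step above are the standard recursive Bayes-filter relations (a sketch, following FP Ch. 17):

Base case: $P(X_0 \mid y_0) \propto P(y_0 \mid X_0)\, P(X_0)$

Prediction: $P(X_i \mid y_0, \ldots, y_{i-1}) = \int P(X_i \mid X_{i-1})\, P(X_{i-1} \mid y_0, \ldots, y_{i-1})\, dX_{i-1}$

Correction: $P(X_i \mid y_0, \ldots, y_i) = \dfrac{P(y_i \mid X_i)\, P(X_i \mid y_0, \ldots, y_{i-1})}{\int P(y_i \mid X_i)\, P(X_i \mid y_0, \ldots, y_{i-1})\, dX_i}$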
Linear dynamic models
• Use notation ~ to mean “has the pdf of”,
N(a, b) is a normal distribution with mean
a and covariance b.
• Then a linear dynamic model has the form
$$x_i \sim N(D_i\, x_{i-1};\ \Sigma_{d_i})$$
$$y_i \sim N(M_i\, x_i;\ \Sigma_{m_i})$$
D: transition matrix
M: observation (measurement) matrix
• This is much, much more general than it
looks, and extremely powerful
Propagation of Gaussian densities
Examples
• Drifting points
– we assume that the new position of the point is the
old one, plus noise.
– For the measurement model, we may not need to
observe the whole state of the object
• e.g. a point moving in 3D, at the 3k’th tick we see x, 3k+1’th
tick we see y, 3k+2’th tick we see z
• in this case, we can still make decent estimates of all three
coordinates at each tick.
– This property, which does not apply to every model, is
called Observability
Examples
• Points moving with constant velocity
• Points moving with constant acceleration
• Periodic motion
• etc.
Moving with constant velocity
• We have
$$p_i = p_{i-1} + \Delta t\, v_{i-1} + \varepsilon_i$$
$$v_i = v_{i-1} + \varsigma_i$$
– (the Greek letters denote noise terms)
• Stack position p and velocity v into a single state vector x_i:
$$\begin{pmatrix} p \\ v \end{pmatrix}_i = \begin{pmatrix} \mathrm{Id} & \Delta t\, \mathrm{Id} \\ 0 & \mathrm{Id} \end{pmatrix} \begin{pmatrix} p \\ v \end{pmatrix}_{i-1} + \text{noise}$$
– which is the form we had above:
$$x_i \sim N(D_i\, x_{i-1};\ \Sigma_{d_i}), \qquad y_i \sim N(M_i\, x_i;\ \Sigma_{m_i})$$
$$D_i = \begin{pmatrix} \mathrm{Id} & \Delta t\, \mathrm{Id} \\ 0 & \mathrm{Id} \end{pmatrix}, \qquad M_i = \begin{pmatrix} \mathrm{Id} & 0 \end{pmatrix}$$
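As a concrete illustration, here is a minimal numpy sketch (assuming 2D points, a hypothetical time step dt, and an assumed process-noise covariance) that builds these matrices and simulates one tick of the dynamics:

```python
import numpy as np

def constant_velocity_model(d=2, dt=1.0):
    """Build D (transition) and M (measurement) for a constant-velocity model.
    State is [position; velocity], each of dimension d; only position is observed."""
    I = np.eye(d)
    D = np.block([[I, dt * I],
                  [np.zeros((d, d)), I]])   # x_i = D x_{i-1} + noise
    M = np.hstack([I, np.zeros((d, d))])    # y_i = M x_i + noise
    return D, M

D, M = constant_velocity_model(d=2, dt=1.0)
x = np.array([0.0, 0.0, 1.0, 0.5])          # position (0,0), velocity (1,0.5)
Sigma_d = 0.01 * np.eye(4)                   # assumed process-noise covariance
x_next = D @ x + np.random.multivariate_normal(np.zeros(4), Sigma_d)
y_next = M @ x_next                          # (noiseless) measurement of position
```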
Moving with constant acceleration
• We have
$$p_i = p_{i-1} + \Delta t\, v_{i-1} + \varepsilon_i$$
$$v_i = v_{i-1} + \Delta t\, a_{i-1} + \varsigma_i$$
$$a_i = a_{i-1} + \xi_i$$
– (the Greek letters denote noise terms)
• Stack position, velocity, and acceleration into a single state vector:
$$\begin{pmatrix} p \\ v \\ a \end{pmatrix}_i = \begin{pmatrix} \mathrm{Id} & \Delta t\, \mathrm{Id} & 0 \\ 0 & \mathrm{Id} & \Delta t\, \mathrm{Id} \\ 0 & 0 & \mathrm{Id} \end{pmatrix} \begin{pmatrix} p \\ v \\ a \end{pmatrix}_{i-1} + \text{noise}$$
– which is the form we had above:
$$D_i = \begin{pmatrix} \mathrm{Id} & \Delta t\, \mathrm{Id} & 0 \\ 0 & \mathrm{Id} & \Delta t\, \mathrm{Id} \\ 0 & 0 & \mathrm{Id} \end{pmatrix}, \qquad M_i = \begin{pmatrix} \mathrm{Id} & 0 & 0 \end{pmatrix}$$
Constant velocity dynamic model
(Plots: the first component of the state (position) and the velocity against time; a second plot shows the measurements and the first component of the state against time.)
Constant acceleration dynamic model
(Plots: position and velocity components of the state against time.)
Kalman filter
• Key ideas:
– Linear models interact uniquely well with
Gaussian noise
• make the prior Gaussian, everything else stays
Gaussian, and the calculations are easy
– Gaussians are really easy to represent
• once you know the mean and covariance, you’re
done
Kalman filter in 1D
• Dynamic model: $x_i \sim N(d_i\, x_{i-1};\ \sigma_{d_i}^2)$, $\quad y_i \sim N(m_i\, x_i;\ \sigma_{m_i}^2)$
• Notation: $\bar{x}_i^-$, $\sigma_i^-$ denote the predicted mean and standard deviation before the i-th measurement; $\bar{x}_i^+$, $\sigma_i^+$ denote the corrected mean and standard deviation after the i-th measurement
Prediction for 1D Kalman filter
• Because the new state is obtained by
– multiplying old state by known constant
– adding zero-mean noise
• Therefore, predicted mean for new state is
– constant times mean for old state
• Predicted variance is
– sum of constant^2 times old state variance and noise variance
– the old state is a normal random variable; multiplying a normal rv by a
constant multiplies its mean by that constant and its variance by the
square of the constant; adding zero-mean noise adds zero to the mean;
adding independent rv's adds their variances
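In this notation the prediction step is:
$$\bar{x}_i^- = d_i\, \bar{x}_{i-1}^+, \qquad (\sigma_i^-)^2 = \sigma_{d_i}^2 + d_i^2\, (\sigma_{i-1}^+)^2$$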
Correction for 1D Kalman filter
• Pattern match to identities given in book
– basically, guess the integrals, get:
• Notice:
– if the measurement noise is small, we rely mainly on the measurement
– if it's large, mainly on the prediction
$$\sigma_{m_i} \to 0\ \Rightarrow\ \bar{x}_i^+ \to \frac{y_i}{m_i},\quad \sigma_i^+ \to 0$$
$$\sigma_{m_i}\ \text{large}\ \Rightarrow\ \bar{x}_i^+ \to \bar{x}_i^-,\quad \sigma_i^+ \to \sigma_i^-$$
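For reference, the correction equations referred to above are the standard 1D result (see FP Ch. 17):
$$\bar{x}_i^+ = \frac{\bar{x}_i^-\, \sigma_{m_i}^2 + m_i\, y_i\, (\sigma_i^-)^2}{\sigma_{m_i}^2 + m_i^2\, (\sigma_i^-)^2}, \qquad (\sigma_i^+)^2 = \frac{\sigma_{m_i}^2\, (\sigma_i^-)^2}{\sigma_{m_i}^2 + m_i^2\, (\sigma_i^-)^2}$$
The limit behaviors above follow directly: as $\sigma_{m_i} \to 0$ the corrected mean tends to $y_i / m_i$, and for large $\sigma_{m_i}$ it tends to the prediction $\bar{x}_i^-$.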
Example: Kalman filtering
(Plot legend: state; measurement; * = $\bar{x}_i^-$, the prediction before the measurement; $\bar{x}_i^+$, the correction after the measurement)
Kalman filter: general case
In higher dimensions, the derivation follows the same lines, but isn't as easy; the expressions are given in FP Ch. 17.
In the example shown, the measurement standard deviation is small, so the state estimates are rather good.
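For the general case, here is a minimal numpy sketch of the predict/correct cycle under the linear-Gaussian model above; the matrices D, M and the covariances Sigma_d, Sigma_m are whatever the chosen dynamic model provides:

```python
import numpy as np

def kalman_predict(x_plus, P_plus, D, Sigma_d):
    """Prediction: propagate the corrected mean/covariance through the linear dynamics."""
    x_minus = D @ x_plus
    P_minus = D @ P_plus @ D.T + Sigma_d
    return x_minus, P_minus

def kalman_correct(x_minus, P_minus, y, M, Sigma_m):
    """Correction: fold in measurement y with observation matrix M."""
    S = M @ P_minus @ M.T + Sigma_m           # innovation covariance
    K = P_minus @ M.T @ np.linalg.inv(S)      # Kalman gain
    x_plus = x_minus + K @ (y - M @ x_minus)
    P_plus = (np.eye(len(x_minus)) - K @ M) @ P_minus
    return x_plus, P_plus

# Example with the constant-velocity D, M from the earlier sketch:
# x, P = kalman_predict(x, P, D, Sigma_d)
# x, P = kalman_correct(x, P, y_observed, M, Sigma_m)
```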
Smoothing
• Idea
– We don't have the best estimate of the state: what about the future
measurements?
– Run two filters, one moving forward, the other backward in time,
and combine the state estimates
• The crucial point here is that we can obtain a smoothed estimate by
viewing the backward filter's prediction as yet another measurement
for the forward filter
– so we've already done the equations
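Concretely, for a scalar state the forward estimate $(\bar{x}_i^{f}, \sigma_i^{f})$ and the backward estimate $(\bar{x}_i^{b}, \sigma_i^{b})$ combine by inverse-variance weighting (a sketch of the standard combination of two independent Gaussian estimates):
$$(\sigma_i^{*})^{-2} = (\sigma_i^{f})^{-2} + (\sigma_i^{b})^{-2}, \qquad \bar{x}_i^{*} = (\sigma_i^{*})^2 \left[ (\sigma_i^{f})^{-2}\, \bar{x}_i^{f} + (\sigma_i^{b})^{-2}\, \bar{x}_i^{b} \right]$$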
Forward filter
(Plot legend: state; measurement; * = $\bar{x}_i^-$, the prediction before the measurement; $\bar{x}_i^+$, the correction after the measurement)
Backward filter
(Plot legend: state; measurement; * = $\bar{x}_i^-$, the prediction before the measurement; $\bar{x}_i^+$, the correction after the measurement)
Forward-backward filter
(Plot legend: state; measurement; * = $\bar{x}_i^-$, the prediction before the measurement; $\bar{x}_i^+$, the correction after the measurement)
Data association
• Also known as the correspondence problem
• Given that feature detectors (measurements) are not perfect, how can one find the
correspondence between measurements and features in the model?
Data association
• Determine which measurements are informative
(as not every measurement conveys the same
amount of information)
• Nearest neighbors
– choose the measurement with highest probability
given predicted state
– popular, but can lead to catastrophe
• Probabilistic Data Association (PDA)
– combine measurements, weighting by probability
given predicted state
– gate using predicted state
Data association
Constant velocity model; measurements are plotted with their standard deviation (dashed lines)
(Plot legend: state; measurement; * = $\bar{x}_i^-$, the prediction before the measurement; $\bar{x}_i^+$, the correction after the measurement)
Data association with NN
• The blue track represents the actual measurements of state, and the red tracks are noise
• The noise is all over the place, and the gate around each prediction typically contains only one measurement (the right one)
• This means that we can track rather well (the blue track is very largely obscured by the overlaid measurements of state)
• Nearest neighbors (NN), which picks the best measurement consistent with the prediction, works well in this case
• Occasional mis-identification need not cause problems
(Plot legend: state; measurement; * = $\bar{x}_i^-$, the prediction before the measurement; $\bar{x}_i^+$, the correction after the measurement)
Data association with NN
• If the dynamic model is not sufficiently constrained, choosing the
measurement that fits best may lead to failure
(Plot legend: state; measurement; * = $\bar{x}_i^-$, the prediction before the measurement; $\bar{x}_i^+$, the correction after the measurement)
Data association with NN
• Here the tracker does in fact lose track
• These problems occur because error can accumulate: it is now
relatively easy to continue tracking the wrong point for a long time,
and the longer we do this the less chance there is of recovering the
right point
Probabilistic data association
• Instead of choosing the region most like
the predicted measurement,
• we can exclude all regions that are too
different and then use the others,
• weighting them according to their similarity
to prediction
Instead of using $P(y_i \mid y_0, \ldots, y_{i-1})$, use
$$E_h[y_i] = \sum_j P(h_j \mid y_0, \ldots, y_{i-1})\, y_i^{\,j}$$
where the superscript indicates the region.
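A simplified numpy sketch of this gate-and-weight idea for a single track (not the full PDA filter; the Gaussian predicted-measurement density and the gate threshold are assumptions for illustration):

```python
import numpy as np

def pda_combine(y_pred, S, candidates, gate=9.0):
    """Gate candidate measurements, weight the survivors by likelihood under the
    predicted measurement density, and return their weighted combination."""
    S_inv = np.linalg.inv(S)
    d2 = np.array([(y - y_pred) @ S_inv @ (y - y_pred) for y in candidates])
    keep = d2 < gate                           # chi-square-style gating
    if not np.any(keep):
        return y_pred                          # nothing plausible: fall back on the prediction
    w = np.exp(-0.5 * d2[keep])                # likelihood up to a constant
    w /= w.sum()
    return w @ np.asarray(candidates)[keep]    # probabilistically weighted measurement

# Nearest neighbor instead: candidates[np.argmin(d2)] (risky if clutter falls inside the gate)
```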
Nonlinear dynamics
• As one repeats iterations of this function (there isn't any noise),
points move towards or away from the points where sin(x) = 0
• This means that even if one has a quite simple probability
distribution on $x_0$, one can end up with a very complex distribution
(many tiny peaks) quite quickly
• Most easily demonstrated by running particles through this
dynamics, as in the next slide
$$x_{i+1} = x_i + \epsilon \sin(x_i)$$
Nonlinear dynamics
• The top figure shows tracks of particle position against iteration for
the dynamics of the previous slide, where the particle position was
normal in the initial configuration (first graph at the bottom)
• As the succeeding graphs show, the number of particles near a point
(the histogram, approximating $p(x_n)$) very quickly becomes complex,
with many tiny peaks
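A small numpy sketch reproducing this experiment under assumed settings (eps = 0.3 and a standard normal initial distribution for $x_0$):

```python
import numpy as np

eps = 0.3                                   # assumed step size for x <- x + eps*sin(x)
particles = np.random.randn(10000)          # x_0 ~ N(0, 1)

history = [particles.copy()]
for _ in range(40):                         # iterate the deterministic nonlinear dynamics
    particles = particles + eps * np.sin(particles)
    history.append(particles.copy())

# Histogramming the particles at each iteration approximates p(x_n); the initially
# smooth density develops sharp peaks near the attracting points where sin(x) = 0.
counts, edges = np.histogram(history[-1], bins=100)
```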
Propagation of general densities
Factored sampling
• Represent the state distribution non-parametrically
– Prediction: Sample points from prior density for the state, p(x)
– Correction: Weight the samples according to p(y|x)
$$p(x \mid y) = k\, p(y \mid x)\, p(x)$$
Sample weights:
$$\pi^{(n)} = \frac{p_z(s^{(n)})}{\sum_{j=1}^{N} p_z(s^{(j)})}, \qquad p_z(x) = p(y \mid x)$$
$$p(x_t \mid y_0, \ldots, y_t) = \frac{p(y_t \mid x_t)\, p(x_t \mid y_0, \ldots, y_{t-1})}{\int p(y_t \mid x_t)\, p(x_t \mid y_0, \ldots, y_{t-1})\, dx_t}$$
Representing $p(y \mid x)$ in terms of $\{s_{t-1}^{(n)},\ \pi_{t-1}^{(n)}\}$
Particle filtering
• We want to use sampling to propagate
densities over time (i.e., across frames in
a video sequence)
• At each time step, represent posterior
p(xt|yt) with weighted sample set
• Previous time step’s sample set p(xt-1|yt-1)
is passed to next time step as the effective
prior
Particle filtering
1. Start with weighted samples from the previous time step
2. Sample and shift according to the dynamics model
3. Spread due to randomness; this is the predicted density p(xt|yt-1)
4. Weight the samples according to the observation density
5. Arrive at the corrected density estimate p(xt|yt)
M. Isard and A. Blake, “CONDENSATION -- conditional density propagation for visual
tracking”, IJCV 29(1):5-28, 1998
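A minimal numpy sketch of one such time step in the factored-sampling style described above; sample_dynamics and obs_likelihood are placeholder functions standing in for whatever dynamics and observation models are chosen:

```python
import numpy as np

def particle_filter_step(samples, weights, sample_dynamics, obs_likelihood, y):
    """One predict/correct step with a weighted sample set {s^(n), pi^(n)}."""
    n = len(samples)
    # Resample from the previous posterior (the weights act as the effective prior)
    idx = np.random.choice(n, size=n, p=weights)
    # Predict: push each sample through the (stochastic) dynamics model
    predicted = np.array([sample_dynamics(samples[i]) for i in idx])
    # Correct: weight each sample by the observation density p(y | x)
    new_weights = np.array([obs_likelihood(y, s) for s in predicted])
    new_weights /= new_weights.sum()
    return predicted, new_weights

# Example placeholders (1D state, drifting-point dynamics, Gaussian likelihood):
# sample_dynamics = lambda x: x + np.random.normal(0, 0.1)
# obs_likelihood  = lambda y, x: np.exp(-0.5 * ((y - x) / 0.2) ** 2)
```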
Parameterization of splines
Dynamic model p(xt|xt-1)
• Can learn from examples using ARMA or
LDS
• Can be modeled with random walk
• Often factorize xt
Observation model p(yt|xt)
Observation model: 2D case
Particle filtering results
Issues
• Initialization
• Obtaining observation and dynamics model
– Generative observation model: “render” the state on top of the image and compare
– Discriminative observation model: classifier or detector score
– Dynamics model: learn (very difficult) or specify using domain knowledge
• Prediction vs. correction
– If the dynamics model is too strong, will end up ignoring the data
– If the observation model is too strong, tracking is reduced to repeated detection
• Data association
– What if we don’t know which measurements to associate with which
tracks?
• Drift
– Errors caused by dynamical model, observation model, and data
association tend to accumulate over time
• Failure detection
Mean-Shift Object Tracking
• Start from the position of the model in the current frame
• Search in the model’s neighborhood in the next frame
• Find the best candidate by maximizing a similarity function
• Repeat the same process in the next pair of frames
(Figure: the model in the current frame and candidate regions in the next frame)
Mean-Shift Object Tracking
Target Representation
• Choose a reference target model
• Choose a feature space
• Represent the model by its PDF in the feature space
(Figure: the target model represented as a histogram over a quantized color space; probability vs. color bin 1..m)
Kernel-Based Object Tracking, by Comaniciu, Ramesh, and Meer
Mean-Shift Object Tracking
PDF Representation
Target model (centered at 0):
$$\hat{q} = \{q_u\}_{u=1..m}, \qquad \sum_{u=1}^{m} q_u = 1$$
Target candidate (centered at y):
$$\hat{p}(y) = \{p_u(y)\}_{u=1..m}, \qquad \sum_{u=1}^{m} p_u(y) = 1$$
Similarity function:
$$f(y) = f\left[\hat{q},\ \hat{p}(y)\right]$$
(Figure: model and candidate color histograms over bins 1..m)
Mean-Shift Object Tracking
Finding the PDF of the target model
• $\{x_i\}_{i=1..n}$: target pixel locations (model centered at 0, candidate centered at y)
• $k(x)$: a differentiable, isotropic, convex, monotonically decreasing kernel profile
• $b(x)$: the color bin index (1..m) of pixel x
• Peripheral pixels are affected by occlusion and background interference, so the kernel gives them smaller weight

Probability of feature u in the model:
$$q_u = C \sum_{i=1}^{n} k\left(\left\| x_i \right\|^2\right)\, \delta\!\left[b(x_i) - u\right]$$
Probability of feature u in the candidate:
$$p_u(y) = C_h \sum_{i=1}^{n} k\left(\left\| \frac{y - x_i}{h} \right\|^2\right)\, \delta\!\left[b(x_i) - u\right]$$
Here the kernel value acts as the pixel weight, and C, $C_h$ are normalization factors.
(Figure: resulting model and candidate histograms)
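A small numpy sketch of these kernel-weighted histograms, assuming per-pixel color-bin indices b(x) have already been computed and using the Epanechnikov profile k(x) = 1 - x (introduced a few slides later):

```python
import numpy as np

def kernel_histogram(bin_idx, coords, center, h, m):
    """Kernel-weighted color histogram p_u(y): pixels near the center count more."""
    r2 = np.sum(((coords - center) / h) ** 2, axis=1)    # ||(y - x_i)/h||^2 per pixel
    k = np.clip(1.0 - r2, 0.0, None)                      # Epanechnikov profile, 0 outside the window
    hist = np.bincount(bin_idx, weights=k, minlength=m)   # sum kernel weights per color bin
    return hist / max(hist.sum(), 1e-12)                  # normalization (the C_h factor)

# q_hat: model histogram centered at the model position y0
# p_hat: candidate histogram centered at a test position y
# q_hat = kernel_histogram(bin_idx, coords, y0, h, m)
# p_hat = kernel_histogram(bin_idx, coords, y,  h, m)
```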
Mean-Shift Object Tracking
Similarity Function
Target model: $\hat{q} = (q_1, \ldots, q_m)$
Target candidate: $\hat{p}(y) = (p_1(y), \ldots, p_m(y))$
Similarity function: $f(y) = f\left[\hat{p}(y),\ \hat{q}\right] = ?$
The Bhattacharyya Coefficient
$$f(y) = \sum_{u=1}^{m} \sqrt{p_u(y)\, q_u}$$
Writing $q' = \left(\sqrt{q_1}, \ldots, \sqrt{q_m}\right)$ and $p'(y) = \left(\sqrt{p_1(y)}, \ldots, \sqrt{p_m(y)}\right)$, both unit vectors, the coefficient is the cosine of the angle between them:
$$f(y) = \cos\theta_y = \frac{p'(y)^{T}\, q'}{\|p'(y)\|\, \|q'\|}$$
Mean-Shift Object Tracking
Target Localization Algorithm
• Start from the position of the model in the current frame: model histogram $\hat{q}$
• Search in the model’s neighborhood in the next frame: candidate histograms $\hat{p}(y)$
• Find the best candidate by maximizing the similarity function $f\left[\hat{p}(y),\ \hat{q}\right]$
Mean-Shift Object Tracking
Approximating the Similarity Function
Model location: $y_0$; candidate location: $y$
$$f(y) = \sum_{u=1}^{m} \sqrt{p_u(y)\, q_u}$$
Linear approximation around $y_0$:
$$f(y) \approx \frac{1}{2} \sum_{u=1}^{m} \sqrt{p_u(y_0)\, q_u} \;+\; \frac{1}{2} \sum_{u=1}^{m} p_u(y)\, \sqrt{\frac{q_u}{p_u(y_0)}}$$
Substituting $p_u(y) = C_h \sum_i k\left(\left\|\frac{y - x_i}{h}\right\|^2\right) \delta\!\left[b(x_i) - u\right]$, the first term is independent of y and the second becomes
$$\frac{C_h}{2} \sum_{i=1}^{n} w_i\, k\left(\left\|\frac{y - x_i}{h}\right\|^2\right), \qquad w_i = \sum_{u=1}^{m} \sqrt{\frac{q_u}{p_u(y_0)}}\; \delta\!\left[b(x_i) - u\right]$$
This is a (weighted) kernel density estimate, as a function of y.
Mean-Shift Object Tracking
Maximizing the Similarity Function
The mode of
$$\frac{C_h}{2} \sum_{i=1}^{n} w_i\, k\left(\left\|\frac{y - x_i}{h}\right\|^2\right)$$
is the sought maximum.
Important assumption: the target representation provides sufficient discrimination, so that there is one mode in the searched neighborhood.
Mean-Shift Object Tracking
Applying Mean-Shift
The mode of
$$\frac{C_h}{2} \sum_{i=1}^{n} w_i\, k\left(\left\|\frac{y - x_i}{h}\right\|^2\right)$$
is the sought maximum.

Original mean shift: find the mode of $c \sum_{i=1}^{n} k\left(\left\|\frac{y - x_i}{h}\right\|^2\right)$ using
$$y_1 = \frac{\sum_{i=1}^{n} x_i\, g\left(\left\|\frac{y_0 - x_i}{h}\right\|^2\right)}{\sum_{i=1}^{n} g\left(\left\|\frac{y_0 - x_i}{h}\right\|^2\right)}$$

Extended mean shift: find the mode of $c \sum_{i=1}^{n} w_i\, k\left(\left\|\frac{y - x_i}{h}\right\|^2\right)$ using
$$y_1 = \frac{\sum_{i=1}^{n} x_i\, w_i\, g\left(\left\|\frac{y_0 - x_i}{h}\right\|^2\right)}{\sum_{i=1}^{n} w_i\, g\left(\left\|\frac{y_0 - x_i}{h}\right\|^2\right)}$$
Mean-Shift Object Tracking
About Kernels and Profiles
A special class of radially symmetric kernels: $K(x) = c\, k\left(\|x\|^2\right)$, where $k$ is the profile of the kernel $K$.
Define $g(x) = -k'(x)$.
Extended mean shift: find the mode of $c \sum_{i=1}^{n} w_i\, k\left(\left\|\frac{y - x_i}{h}\right\|^2\right)$ using
$$y_1 = \frac{\sum_{i=1}^{n} x_i\, w_i\, g\left(\left\|\frac{y_0 - x_i}{h}\right\|^2\right)}{\sum_{i=1}^{n} w_i\, g\left(\left\|\frac{y_0 - x_i}{h}\right\|^2\right)}$$
Mean-Shift Object Tracking
Choosing the Kernel
A special class of radially symmetric kernels: $K(x) = c\, k\left(\|x\|^2\right)$
Epanechnikov kernel profile:
$$k(x) = \begin{cases} 1 - x & \text{if } x \le 1 \\ 0 & \text{otherwise} \end{cases}$$
Its derivative gives a uniform kernel profile:
$$g(x) = -k'(x) = \begin{cases} 1 & \text{if } x \le 1 \\ 0 & \text{otherwise} \end{cases}$$
With this choice the extended mean-shift update
$$y_1 = \frac{\sum_{i=1}^{n} x_i\, w_i\, g\left(\left\|\frac{y_0 - x_i}{h}\right\|^2\right)}{\sum_{i=1}^{n} w_i\, g\left(\left\|\frac{y_0 - x_i}{h}\right\|^2\right)}$$
reduces to
$$y_1 = \frac{\sum_{i=1}^{n} x_i\, w_i}{\sum_{i=1}^{n} w_i}$$
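Putting the pieces together, here is a minimal numpy sketch of one target-localization pass with the uniform g (so each iteration moves y to a weighted centroid); kernel_histogram is the sketch from the model-PDF slide, and details such as boundary handling are omitted:

```python
import numpy as np

def meanshift_localize(bin_idx, coords, q_hat, y0, h, m, n_iter=20, tol=0.1):
    """Mean-shift target localization: move y toward the weighted centroid of pixels,
    with per-pixel weights w_i = sqrt(q_u / p_u(y0)) for each pixel's color bin u."""
    y = np.asarray(y0, dtype=float)
    for _ in range(n_iter):
        p_hat = kernel_histogram(bin_idx, coords, y, h, m)
        w = np.sqrt(q_hat / np.maximum(p_hat, 1e-12))[bin_idx]   # weights w_i per pixel
        r2 = np.sum(((coords - y) / h) ** 2, axis=1)
        inside = r2 <= 1.0                        # uniform g: only pixels inside the window count
        y_new = (w[inside, None] * coords[inside]).sum(axis=0) / w[inside].sum()
        if np.linalg.norm(y_new - y) < tol:
            return y_new
        y = y_new
    return y
```

At the converged location, the Bhattacharyya coefficient np.sum(np.sqrt(p_hat * q_hat)) gives the similarity score, which is what the adaptive-scale step on the next slide compares across different h.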
Mean-Shift Object Tracking
Adaptive Scale
Problem: the scale of the target changes in time, so the scale (h) of the kernel must be adapted.
Solution: run the localization 3 times with different h and choose the h that achieves maximum similarity.
Mean-Shift Object Tracking
Results
Feature space: 16×16×16 quantized RGB
Target: manually selected on 1st frame
Average mean-shift iterations: 4
Mean-Shift Object Tracking
Results
Partial occlusion
Distraction
Motion blur