Tracking
with focus on the particle filter
(part II)
Michael Rubinstein
IDC
Last time…
• Background
– State space
– Dynamic systems
– Recursive Bayesian filters
– Restrictive cases
• Kalman filter
• Grid-based filter
– Suboptimal approximations
• Extended Kalman filter
© Michael Rubinstein
This talk
• Particle filtering
– MC integration
– Sequential Importance Sampling (SIS)
– Resampling
– PF variants
• Multiple-target tracking
– BraMBLe: A Bayesian Multiple-Blob Tracker / Isard, MacCormick
© Michael Rubinstein
Dynamic System
[Diagram: hidden Markov chain x_{k-1} → x_k → x_{k+1} evolving through f_k ("stochastic diffusion"), with observations z_{k-1}, z_k, z_{k+1} generated through h_k]
State equation: $x_k = f_k(x_{k-1}, v_k)$
– $x_k$: state vector at time instant k
– $f_k$: state transition function, $f_k: \mathbb{R}^{N_x} \times \mathbb{R}^{N_v} \to \mathbb{R}^{N_x}$
– $v_k$: i.i.d. process noise
Observation equation: $z_k = h_k(x_k, w_k)$
– $z_k$: observations at time instant k
– $h_k$: observation function, $h_k: \mathbb{R}^{N_x} \times \mathbb{R}^{N_w} \to \mathbb{R}^{N_z}$
– $w_k$: i.i.d. measurement noise
© Michael Rubinstein
Recursive Bayes filter
Goal: estimate $p(x_k \mid z_{1:k})$
[Figure: posterior pdf over the sample space]
• Prediction:
$p(x_k \mid z_{1:k-1}) = \int p(x_k \mid x_{k-1})\, p(x_{k-1} \mid z_{1:k-1})\, dx_{k-1}$  (1)
• Update:
$p(x_k \mid z_{1:k}) = \dfrac{p(z_k \mid x_k)\, p(x_k \mid z_{1:k-1})}{p(z_k \mid z_{1:k-1})}$  (2)
where $p(z_k \mid z_{1:k-1}) = \int p(z_k \mid x_k)\, p(x_k \mid z_{1:k-1})\, dx_k$
© Michael Rubinstein
Particle filtering
• Many variations, one general concept:
Represent the posterior pdf by a set of randomly chosen, weighted samples (particles)
[Figure: posterior pdf and its sample-based representation over the sample space]
• Randomly chosen = Monte Carlo (MC)
• As the number of samples becomes very large, the characterization becomes an equivalent representation of the true pdf
© Michael Rubinstein
Particle filtering
• Compared to the methods we mentioned last time:
– Can represent any arbitrary distribution (multimodal support)
– Keeps track of as many hypotheses as there are particles
– Approximate representation of a complex model, rather than an exact representation of a simplified model
• The basic building block: Importance Sampling
© Michael Rubinstein
Monte Carlo integration
• Evaluate complex integrals using probabilistic techniques
• Assume we are trying to estimate a complicated integral of a function f over some domain D:
$F = \int_D f(x)\, dx$
• Also assume there exists some PDF p defined over D
© Michael Rubinstein
Monte Carlo integration
• Then
$F = \int_D f(x)\, dx = \int_D \frac{f(x)}{p(x)}\, p(x)\, dx$
• But
$\int_D \frac{f(x)}{p(x)}\, p(x)\, dx = E\!\left[\frac{f(x)}{p(x)}\right], \quad x \sim p$
• This is true for any PDF p over D!
© Michael Rubinstein
Monte Carlo integration
• Now, if we have i.i.d. random samples $x_1, \ldots, x_N$ sampled from p, then we can approximate $E\!\left[\frac{f(x)}{p(x)}\right]$ by
$F_N = \frac{1}{N} \sum_{i=1}^{N} \frac{f(x_i)}{p(x_i)}$
• Guaranteed by the law of large numbers:
$F_N \xrightarrow{\ a.s.\ } E\!\left[\frac{f(x)}{p(x)}\right] = F \quad \text{as } N \to \infty$
© Michael Rubinstein
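As a concrete illustration (not from the slides), here is a minimal Python sketch of the estimator $F_N$; the integrand, the uniform sampling density, and the function names are my own choices.

```python
import numpy as np

def mc_integrate(f, p_sample, p_pdf, n=100_000, seed=0):
    """Monte Carlo estimate of F = ∫ f(x) dx using samples x_i ~ p."""
    rng = np.random.default_rng(seed)
    x = p_sample(rng, n)                 # draw x_1, ..., x_N i.i.d. from p
    return np.mean(f(x) / p_pdf(x))      # F_N = (1/N) Σ f(x_i)/p(x_i)

# Toy example: F = ∫_0^1 x^2 dx = 1/3, using p = U[0,1]
f = lambda x: x**2
p_sample = lambda rng, n: rng.uniform(0.0, 1.0, n)
p_pdf = lambda x: np.ones_like(x)
print(mc_integrate(f, p_sample, p_pdf))  # ~0.3333
```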
Importance Sampling (IS)
• What about $p(x) \to 0$?
• If p is very small, f/p can be arbitrarily large, 'damaging' the average (the importance weights)
• Design p such that f/p is bounded
• Rule of thumb: take p as similar to f as possible (p is called the importance or proposal density)
• The effect: get more samples in 'important' areas of f, i.e. where f is large
© Michael Rubinstein
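To make the rule of thumb concrete, the following toy Python experiment (my own, not from the slides) compares the spread of the IS estimate under a poorly matched uniform proposal and under a proposal shaped like f; with the matched Gaussian proposal, f/p stays bounded (in fact constant here), so the estimator variance essentially vanishes.

```python
import numpy as np

rng = np.random.default_rng(1)
f = lambda x: np.exp(-0.5 * (x - 3.0) ** 2)     # integrand concentrated near x = 3

def is_estimate(sample, pdf, n=10_000):
    x = sample(n)
    return np.mean(f(x) / pdf(x))               # importance-sampling estimate of ∫ f dx

# Proposal 1: uniform on [0, 10] -- few samples land where f is large
est_u = [is_estimate(lambda n: rng.uniform(0, 10, n),
                     lambda x: np.full_like(x, 0.1)) for _ in range(50)]
# Proposal 2: Gaussian matched to f -- f/p is bounded (constant)
est_g = [is_estimate(lambda n: rng.normal(3.0, 1.0, n),
                     lambda x: np.exp(-0.5 * (x - 3.0) ** 2) / np.sqrt(2 * np.pi))
         for _ in range(50)]
print(np.std(est_u), np.std(est_g))             # the matched proposal has far lower spread
```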
Convergence of MC integration
• Chebyshev's inequality: let X be a random variable with expected value μ and standard deviation σ. For any real number k > 0,
$\Pr\{|X - \mu| \ge k\sigma\} \le \frac{1}{k^2}$
[Portrait: Pafnuty Lvovich Chebyshev]
• For example, for $k = \sqrt{2}$ it shows that at least half the values lie in the interval $(\mu - \sqrt{2}\sigma,\ \mu + \sqrt{2}\sigma)$
• Let $y_i = \frac{f(x_i)}{p(x_i)}$; then the MC estimator is $F_N = \frac{1}{N} \sum_{i=1}^{N} y_i$
© Michael Rubinstein
Convergence of MC integration
• By Chebyshev's inequality (with $k = 1/\sqrt{\delta}$),
$\Pr\left\{|F_N - E[F_N]| \ge \left(\frac{V[F_N]}{\delta}\right)^{1/2}\right\} \le \delta$
$V[F_N] = V\!\left[\frac{1}{N}\sum_{i=1}^{N} y_i\right] = \frac{1}{N^2}\, V\!\left[\sum_{i=1}^{N} y_i\right] = \frac{1}{N^2} \sum_{i=1}^{N} V[y_i] = \frac{1}{N}\, V[y]$
$\Pr\left\{|F_N - F| \ge \frac{1}{\sqrt{N}}\left(\frac{V[y]}{\delta}\right)^{1/2}\right\} \le \delta$
• Hence, for a fixed threshold, the error decreases at rate $1/\sqrt{N}$
© Michael Rubinstein
Convergence of MC integration
• Meaning:
1. To cut the error in half, it is necessary to evaluate 4 times as many samples
2. The convergence rate is independent of the integrand dimension!
• In contrast, the convergence rate of grid-based approximations degrades as $N_x$ increases
© Michael Rubinstein
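A quick numerical check of the $1/\sqrt{N}$ behaviour (an illustration of mine, with an arbitrary 3-D integrand): quadrupling N roughly halves the RMS error.

```python
import numpy as np

rng = np.random.default_rng(0)
f = lambda x: np.sum(x**2, axis=1)            # integrand over the 3-D unit cube
true_F = 1.0                                  # ∫_[0,1]^3 (x1²+x2²+x3²) dx = 3 · (1/3) = 1

def rms_error(n, trials=200, dim=3):
    est = [np.mean(f(rng.uniform(0, 1, (n, dim)))) for _ in range(trials)]
    return np.sqrt(np.mean((np.array(est) - true_F) ** 2))

for n in (1_000, 4_000, 16_000):
    print(n, rms_error(n))                    # RMS error roughly halves as N quadruples
```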
IS for Bayesian estimation
$E[f(X)] = \int f(x_{0:k})\, p(x_{0:k} \mid z_{1:k})\, dx_{0:k} = \int f(x_{0:k})\, \frac{p(x_{0:k} \mid z_{1:k})}{q(x_{0:k} \mid z_{1:k})}\, q(x_{0:k} \mid z_{1:k})\, dx_{0:k}$
• We characterize the posterior pdf using a set of samples (particles) and their weights $\{x_{0:k}^i, w_k^i\}_{i=1}^{N}$
• Then the joint posterior density at time k is approximated by
$p(x_{0:k} \mid z_{1:k}) \approx \sum_{i=1}^{N} w_k^i\, \delta(x_{0:k} - x_{0:k}^i)$
© Michael Rubinstein
IS for Bayesian estimation
• We draw the samples from the importance density $q(x_{0:k} \mid z_{1:k})$ with importance weights
$w_k^i \propto \frac{p(x_{0:k}^i \mid z_{1:k})}{q(x_{0:k}^i \mid z_{1:k})}$
• Sequential update (after some calculation…):
Particle update: $x_k^i \sim q(x_k \mid x_{k-1}^i, z_k)$
Weight update: $w_k^i \propto w_{k-1}^i\, \dfrac{p(z_k \mid x_k^i)\, p(x_k^i \mid x_{k-1}^i)}{q(x_k^i \mid x_{k-1}^i, z_k)}$
© Michael Rubinstein
Sequential Importance Sampling (SIS)
$\{x_k^i, w_k^i\}_{i=1}^{N} = \mathrm{SIS}\left[\{x_{k-1}^i, w_{k-1}^i\}_{i=1}^{N}, z_k\right]$
• FOR i = 1:N
– Draw $x_k^i \sim q(x_k \mid x_{k-1}^i, z_k)$
– Update weights $w_k^i = w_{k-1}^i\, \dfrac{p(z_k \mid x_k^i)\, p(x_k^i \mid x_{k-1}^i)}{q(x_k^i \mid x_{k-1}^i, z_k)}$
• END
• Normalize weights
© Michael Rubinstein
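A hedged Python sketch of one SIS step; the proposal and the transition/likelihood densities are user-supplied callables, and the vectorized interface is an assumption of mine rather than anything prescribed by the slides.

```python
import numpy as np

def sis_step(particles, weights, z_k, propose, prop_pdf, trans_pdf, lik_pdf, rng):
    """One SIS update: w_k^i ∝ w_{k-1}^i · p(z_k|x_k^i) p(x_k^i|x_{k-1}^i) / q(x_k^i|x_{k-1}^i, z_k)."""
    # Draw x_k^i ~ q(x_k | x_{k-1}^i, z_k) for every particle
    new_particles = np.array([propose(xp, z_k, rng) for xp in particles])
    # Update the importance weights (the pdf callables are assumed vectorized)
    new_weights = weights * lik_pdf(z_k, new_particles) \
                          * trans_pdf(new_particles, particles) \
                          / prop_pdf(new_particles, particles, z_k)
    return new_particles, new_weights / new_weights.sum()   # normalize
```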
State estimates
• Any function $f(x_k)$ can be calculated from the discrete pdf approximation (with normalized weights):
$E[f(x_k)] \approx \sum_{i=1}^{N} w_k^i\, f(x_k^i)$
• Examples:
– Mean (simple weighted average)
– MAP estimate: particle with the largest weight
– Robust mean: mean within a window around the MAP estimate
[Figure: posterior with MAP, mean, and robust-mean estimates marked]
© Michael Rubinstein
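A small Python helper (my own sketch, for a 1-D state) computing the three estimates named above; the window width is an arbitrary illustrative choice.

```python
import numpy as np

def point_estimates(particles, weights, window=0.5):
    """Weighted mean, MAP (largest-weight particle), and a robust mean around the MAP (1-D case)."""
    mean = np.sum(weights * particles)                     # E[x_k] ≈ Σ_i w_k^i x_k^i
    x_map = particles[np.argmax(weights)]                  # particle with the largest weight
    near = np.abs(particles - x_map) < window              # particles inside a window around the MAP
    robust_mean = np.sum(weights[near] * particles[near]) / np.sum(weights[near])
    return mean, x_map, robust_mean
```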
Choice of importance density
Hsiao et al.
© Michael Rubinstein
Choice of importance density
• Most common (suboptimal) choice: the transitional prior
$q(x_k \mid x_{k-1}^i, z_k) = p(x_k \mid x_{k-1}^i)$
$\Rightarrow w_k^i \propto w_{k-1}^i\, \dfrac{p(z_k \mid x_k^i)\, p(x_k^i \mid x_{k-1}^i)}{q(x_k^i \mid x_{k-1}^i, z_k)} = w_{k-1}^i\, p(z_k \mid x_k^i)$
Compare with the grid filter weight update:
$w_{k|k}^i = \dfrac{w_{k|k-1}^i\, p(z_k \mid x_k^i)}{\sum_{j=1}^{N_s} w_{k|k-1}^j\, p(z_k \mid x_k^j)}$
© Michael Rubinstein
The degeneracy phenomenon
• Unavoidable problem with SIS: after a few iterations most particles have negligible weights
– Large computational effort for updating particles with very small contribution to $p(x_k \mid z_{1:k})$
• Measure of degeneracy – the effective sample size:
$N_{\mathrm{eff}} = \dfrac{1}{\sum_{i=1}^{N} (w_k^i)^2}$
– Uniform weights: $N_{\mathrm{eff}} = N$; severe degeneracy: $N_{\mathrm{eff}} = 1$
© Michael Rubinstein
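The effective sample size is a one-liner in Python; the sanity checks below reproduce the two extreme cases from the slide.

```python
import numpy as np

def effective_sample_size(weights):
    """N_eff = 1 / Σ_i (w_k^i)^2, assuming the weights are normalized."""
    return 1.0 / np.sum(np.asarray(weights) ** 2)

print(effective_sample_size(np.full(100, 0.01)))   # uniform weights: N_eff = N = 100
w = np.zeros(100); w[0] = 1.0
print(effective_sample_size(w))                    # one dominant particle: N_eff = 1
```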
Resampling
• The idea: when degeneracy exceeds some threshold, eliminate particles with low importance weights and multiply particles with high importance weights
$\{x_k^i, w_k^i\}_{i=1}^{N} \rightarrow \{x_k^{i*}, \tfrac{1}{N}\}_{i=1}^{N}$
• The new set is generated by sampling with replacement from the discrete representation of $p(x_k \mid z_{1:k})$ such that $\Pr\{x_k^{i*} = x_k^j\} = w_k^j$
© Michael Rubinstein
Resampling
$\{x_k^{i*}, w_k^i\}_{i=1}^{N} = \mathrm{RESAMPLE}\left[\{x_k^i, w_k^i\}_{i=1}^{N}\right]$
• Generate N i.i.d. variables $u_i \sim U[0,1]$
• Sort them in ascending order
• Compare them with the cumulative sum of the normalized weights
[Figure: Ristic et al.]
© Michael Rubinstein
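One possible Python implementation of the recipe above (multinomial resampling); searchsorted performs the comparison against the cumulative sum. This is a sketch of mine, not code from the slides.

```python
import numpy as np

def multinomial_resample(particles, weights, rng):
    """Resample with replacement so that Pr{x_k^{i*} = x_k^j} = w_k^j.

    Follows the recipe on the slide: draw N uniforms, sort them, and step
    through the cumulative sum of the normalized weights.
    """
    n = len(weights)
    u = np.sort(rng.uniform(size=n))                      # N i.i.d. U[0,1], sorted ascending
    csum = np.cumsum(weights)                             # cumulative sum of normalized weights
    idx = np.minimum(np.searchsorted(csum, u), n - 1)     # first csum entry exceeding each u
    return particles[idx], np.full(n, 1.0 / n)            # equal weights 1/N after resampling
```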
Resampling
• Complexity: O(N log N)
– O(N) sampling algorithms exist
Hsiao et al.
© Michael Rubinstein
Generic PF
$\{x_k^i, w_k^i\}_{i=1}^{N} = \mathrm{PF}\left[\{x_{k-1}^i, w_{k-1}^i\}_{i=1}^{N}, z_k\right]$
• Apply SIS filtering: $\{x_k^i, w_k^i\}_{i=1}^{N} = \mathrm{SIS}\left[\{x_{k-1}^i, w_{k-1}^i\}_{i=1}^{N}, z_k\right]$
• Calculate $N_{\mathrm{eff}}$
• IF $N_{\mathrm{eff}} < N_{\mathrm{thr}}$
  • $\{x_k^i, w_k^i\}_{i=1}^{N} = \mathrm{RESAMPLE}\left[\{x_k^i, w_k^i\}_{i=1}^{N}\right]$
• END
© Michael Rubinstein
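Putting the pieces together, a sketch of one generic PF step; it reuses the sis_step and multinomial_resample helpers from the earlier sketches, and the default threshold $N_{\mathrm{thr}} = N/2$ is a common heuristic rather than something specified in the slides.

```python
import numpy as np

def pf_step(particles, weights, z_k, propose, prop_pdf, trans_pdf, lik_pdf, rng, n_thr=None):
    """One generic PF step: SIS update, then resample if N_eff falls below N_thr."""
    particles, weights = sis_step(particles, weights, z_k,
                                  propose, prop_pdf, trans_pdf, lik_pdf, rng)
    n_eff = 1.0 / np.sum(weights ** 2)            # effective sample size
    if n_thr is None:
        n_thr = len(weights) / 2.0                # common heuristic, not from the slides
    if n_eff < n_thr:
        particles, weights = multinomial_resample(particles, weights, rng)
    return particles, weights
```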
Generic PF
{xki , N 1}
Uniformly weighted measure
Approximates p( xk | z1:k 1 )
Compute for each particle
its importance weight to
Approximate p( xk | z1:k )
{xki , wk }
{xki* , N 1}
(Resample if needed)
{xki 1 , N 1}
Project ahead to approximate
p( xk 1 | z1:k )
p( xk 1 | z1:k 1 )
Van der Merwe et al.
© Michael Rubinstein
{xki 1 , wk 1}
PF variants
• Sampling Importance Resampling (SIR)
• Auxiliary Sampling Importance Resampling
(ASIR)
• Regularized Particle Filter (RPF)
• Local-linearization particle filters
• Multiple models particle filters (maneuvering
targets)
• …
© Michael Rubinstein
Sampling Importance Resampling (SIR)
• A.k.a. bootstrap filter, Condensation
• Initialize $\{x_0^i, w_0^i\}_{i=1}^{N}$ from the prior distribution $X_0$
• For k > 0 do:
  • Resample $\{x_{k-1}^i, w_{k-1}^i\}_{i=1}^{N}$ into $\{x_{k-1}^{i*}, \tfrac{1}{N}\}_{i=1}^{N}$
  • Predict: $x_k^i \sim p(x_k \mid x_{k-1} = x_{k-1}^{i*})$
  • Reweight: $w_k^i = p(z_k \mid x_k = x_k^i)$
  • Normalize weights
  • Estimate $\hat{x}_k$ (for display)
© Michael Rubinstein
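A self-contained bootstrap (SIR) filter for a toy 1-D random-walk model; the model, noise levels, and prior are illustrative choices of mine, but the resample → predict → reweight → estimate cycle follows the steps above.

```python
import numpy as np

def sir_filter(z_seq, n=500, seed=0):
    """Bootstrap (SIR) filter for a toy 1-D random-walk model.

    State:       x_k = x_{k-1} + v_k,  v_k ~ N(0, 1)
    Observation: z_k = x_k + w_k,      w_k ~ N(0, 1)
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(0.0, 5.0, n)                    # initialize from the prior N(0, 5^2)
    w = np.full(n, 1.0 / n)
    estimates = []
    for z in z_seq:
        x = x[rng.choice(n, size=n, p=w)]          # resample into equal weights
        x = x + rng.normal(0.0, 1.0, n)            # predict: x_k^i ~ p(x_k | x_{k-1}^i)
        w = np.exp(-0.5 * (z - x) ** 2)            # reweight: w_k^i = p(z_k | x_k^i)
        w /= w.sum()                               # normalize
        estimates.append(np.sum(w * x))            # estimate (posterior mean, for display)
    return np.array(estimates)

# Example: track a noisy random walk
rng = np.random.default_rng(42)
truth = np.cumsum(rng.normal(0, 1, 50))
obs = truth + rng.normal(0, 1, 50)
print(np.mean(np.abs(sir_filter(obs) - truth)))    # mean absolute tracking error
```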
Intermission
Questions?
© Michael Rubinstein
Multiple Targets (MTT/MOT)
• Previous challenges
– Full/partial occlusions
– Entering/leaving the scene
–…
• And in addition
– Estimating the number of objects
– Computationally tractable for multiple simultaneous
targets
– Interaction between objects
– Many works on multiple single-target filters
© Michael Rubinstein
BraMBLe: A Bayesian Multiple-Blob Tracker
M. Isard and J. MacCormick
Compaq Systems Research Center
ICCV 2001
Some slides taken from Qi Zhao
Some images taken from Isard and MacCormick
BraMBLe
• First rigorous particle filter implementation with a variable number of targets
• Posterior distribution is constructed over possible
object configurations and number
• Sensor: single static camera
• Tracking: SIR particle filter
• Performance: real-time for 1-2 simultaneous
objects
© Michael Rubinstein
The BraMBLe posterior
$p(\mathbf{X}_k \mid \mathbf{Z}_{1:k})$
[Diagram: the state at frame k (number of objects, positions, shapes, velocities, …) conditioned on the image sequence]
© Michael Rubinstein
State space
• Hypothesis configuration: $\mathbf{X}_k = (m_k, x_k^1, x_k^2, \ldots, x_k^{m_k})$, with dimension $N_x = 1 + 13 M_{\max}$
• Object configuration: $x_k^i = (\phi_k^i, \mathbf{X}_k^i, \mathbf{V}_k^i, \mathbf{S}_k^i)$
– identifier $\phi$
– position $\mathbf{X} = (x, z)$
– velocity $\mathbf{V} = (v_x, v_z)$
– shape $\mathbf{S} = (w_f, w_w, w_s, w_h, h, \gamma_w, \gamma_s)$
© Michael Rubinstein
Object model
• A person is modeled as a generalized cylinder with a vertical axis in world coordinates (calibrated camera)
$\mathbf{S} = (w_f, w_w, w_s, w_h, h, \gamma_w, \gamma_s)$
$(r_i, y_i) \in \{(w_f, 0),\ (w_w, \gamma_w h),\ (w_s, \gamma_s h),\ (w_h, h)\}$
© Michael Rubinstein
Observation likelihood
$p(\mathbf{Z}_t \mid \mathbf{X}_t)$
• The image is overlaid with a rectangular grid (e.g., every 5 pixels)
[Figure: filter bank applied at each grid point – a Gaussian and a Mexican hat (second derivative of a Gaussian) on the Y, Cr, Cb channels]
© Michael Rubinstein
Observation likelihood
$p(\mathbf{Z}_t \mid \mathbf{X}_t)$
• The response values are assumed conditionally independent given X:
$p(\mathbf{Z} \mid \mathbf{X}) = \prod_g p(z_g \mid \mathbf{X}) = \prod_g p(z_g \mid l_g)$
[Figure: grid-point labels $l_g \in \{0, 1, 2, 3\}$]
© Michael Rubinstein
Appearance models
• GMMs for the background and foreground are trained using k-means:
$p(z_g \mid l_g = 0) = \frac{1}{K} \sum_{k=1}^{K} \mathcal{N}(\mu_{g,k}^{B}, \Sigma_{g,k}^{B}), \quad K = 4$
$p(z_g \mid l_g \neq 0) = \frac{1}{K} \sum_{k=1}^{K} \mathcal{N}(\mu_{k}^{F}, \Sigma_{k}^{F}), \quad K = 16$
© Michael Rubinstein
Observation likelihood
 p( z g | l g  0) 

log 
 p( z | l  0) 
g
g


© Michael Rubinstein
System (prediction) model $p(\mathbf{X}_t \mid \mathbf{X}_{t-1})$
• The number of objects can change:
– Each object has a constant probability r of remaining in the scene.
– At each time step, there is a constant probability i that a new object enters the scene.
• The predicted configuration $\mathbf{X}_{t+1}^{n} = (m_{t+1}^{n}, \tilde{x}_{t+1}^{n,1}, \ldots)$ is generated from $\mathbf{X}_{t}^{n} = (m_{t}^{n}, \tilde{x}_{t}^{n,1}, \ldots)$ by a prediction function (for persisting objects) and an initialization function (for new objects)
© Michael Rubinstein
Prediction function
• Motion evolution: damped constant velocity
• Shape evolution: 1st-order auto-regressive process (ARP) model
[Diagram: $\mathbf{X}_{t+1}$, $\mathbf{V}_{t+1}$ – damped constant-velocity update with damping factor 0.8, $\mathbf{X}_t + 0.8\,\mathbf{V}_t$]
© Michael Rubinstein
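A hedged Python sketch of such a prediction step; the 0.8 damping factor comes from the slide, while the noise scales, the AR coefficient, and the mean-shape parameter are placeholders of my own, not values from the BraMBLe paper.

```python
import numpy as np

def predict_object(pos, vel, shape, shape_mean, rng,
                   damping=0.8, pos_noise=0.05, shape_alpha=0.9, shape_noise=0.01):
    """Sketch: damped constant-velocity motion + 1st-order ARP shape evolution."""
    new_pos = pos + damping * vel + rng.normal(0.0, pos_noise, size=pos.shape)
    new_vel = damping * vel + rng.normal(0.0, pos_noise, size=vel.shape)
    # ARP: the shape relaxes toward a mean shape with small random perturbations
    new_shape = shape_mean + shape_alpha * (shape - shape_mean) \
                + rng.normal(0.0, shape_noise, size=shape.shape)
    return new_pos, new_vel, new_shape
```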
Particles
N points: $\mathbf{X}_t^1, \mathbf{X}_t^2, \ldots, \mathbf{X}_t^{N-1}, \mathbf{X}_t^N$
N weights: $\pi_t^1, \pi_t^2, \ldots, \pi_t^{N-1}, \pi_t^N$
© Michael Rubinstein
Estimate
$\hat{\mathbf{X}}_t$
• Denote by $\mathcal{M}_t = \{1, \ldots, M\}$ the set of existing unique identifiers
[Figure: total probability that object i exists, accumulated over (particle, target) pairs]
© Michael Rubinstein
Results
• N=1000 particles
• initialization samples always generated
© Michael Rubinstein
Results
• A single foreground model cannot distinguish between overlapping objects – this causes ID switches
© Michael Rubinstein
Parameters
© Michael Rubinstein
Summary
• The particle filters were shown to produce good approximations under relatively weak assumptions:
– Can deal with nonlinearities
– Can deal with non-Gaussian noise
– Multiple hypotheses
– Can be implemented in O(N)
– Easy to implement
– Adaptive focus on more probable regions of the state space
© Michael Rubinstein
In practice
1. State (object) model
2. System (evolution) model
3. Measurement (likelihood) model
4. Initial (prior) state
5. State estimate (given the pdf)
6. PF specifics
   1. Importance density
   2. Resampling method
• Configurations for specific problems can be found in the literature
© Michael Rubinstein
Thank you!
© Michael Rubinstein
References
• Beyond the Kalman Filter: Particle Filters for Tracking Applications / Ristic, Arulampalam, Gordon
– Online tutorial: A Tutorial on Particle Filters for Online Nonlinear/Non-Gaussian Bayesian Tracking / Arulampalam et al., 2002
• Stochastic Models, Estimation and Control / Peter S. Maybeck
• An Introduction to the Kalman Filter / Greg Welch, Gary Bishop
• Particle Filters: An Overview / Matthias Mühlich
© Michael Rubinstein
Sequential derivation 1
• Suppose that at time k−1, $\{x_{0:k-1}^i, w_{k-1}^i\}_{i=1}^{N}$ characterize $p(x_{0:k-1} \mid z_{1:k-1})$
• We receive a new measurement $z_k$ and need to approximate $p(x_{0:k} \mid z_{1:k})$ using a new set of samples
• We choose q such that
$q(x_{0:k} \mid z_{1:k}) = q(x_k \mid x_{0:k-1}, z_{1:k})\, q(x_{0:k-1} \mid z_{1:k-1})$
and we can generate new particles $x_k^i \sim q(x_k \mid x_{0:k-1}^i, z_{1:k})$
© Michael Rubinstein
Sequential derivation 2
• For the weight update equation, it can be shown that
$p(x_{0:k} \mid z_{1:k}) = \frac{p(z_k \mid x_k)\, p(x_k \mid x_{k-1})}{p(z_k \mid z_{1:k-1})}\, p(x_{0:k-1} \mid z_{1:k-1}) \propto p(z_k \mid x_k)\, p(x_k \mid x_{k-1})\, p(x_{0:k-1} \mid z_{1:k-1})$
• And so
$w_k^i \propto \frac{p(x_{0:k}^i \mid z_{1:k})}{q(x_{0:k}^i \mid z_{1:k})} = \frac{p(z_k \mid x_k^i)\, p(x_k^i \mid x_{k-1}^i)\, p(x_{0:k-1}^i \mid z_{1:k-1})}{q(x_k^i \mid x_{0:k-1}^i, z_{1:k})\, q(x_{0:k-1}^i \mid z_{1:k-1})} = w_{k-1}^i\, \frac{p(z_k \mid x_k^i)\, p(x_k^i \mid x_{k-1}^i)}{q(x_k^i \mid x_{0:k-1}^i, z_{1:k})}$
© Michael Rubinstein
Sequential derivation 3
• Further, if $q(x_k \mid x_{0:k-1}, z_{1:k}) = q(x_k \mid x_{k-1}, z_k)$
• Then the weight update rule becomes
$w_k^i \propto w_{k-1}^i\, \dfrac{p(z_k \mid x_k^i)\, p(x_k^i \mid x_{k-1}^i)}{q(x_k^i \mid x_{k-1}^i, z_k)}$  (3)
(and we need not store entire particle paths and the full history of observations)
• Finally, the (filtered) posterior density is approximated by
$p(x_k \mid z_{1:k}) \approx \sum_{i=1}^{N} w_k^i\, \delta(x_k - x_k^i)$
© Michael Rubinstein
Choice of importance density
• Choose q to minimize the variance of the weights
• Optimal choice:
$q(x_k \mid x_{k-1}^i, z_k)_{\mathrm{opt}} = p(x_k \mid x_{k-1}^i, z_k)$
$\Rightarrow w_k^i \propto w_{k-1}^i\, p(z_k \mid x_{k-1}^i)$
– Usually we cannot sample from $q_{\mathrm{opt}}$ or solve for $w_k^i$ (in some specific cases it works)
• Most commonly used (suboptimal) alternative:
$q(x_k \mid x_{k-1}^i, z_k) = p(x_k \mid x_{k-1}^i)$
$\Rightarrow w_k^i \propto w_{k-1}^i\, p(z_k \mid x_k^i)$
– i.e. the transitional prior
© Michael Rubinstein
Generic PF
• Resampling reduces degeneracy, but new problems arise…
1. Limits parallelization
2. Sample impoverishment: particles with high weights are selected many times, which leads to loss of diversity
– If the process noise is small, all particles tend to collapse to a single point within a few iterations
– Methods exist to counter this as well…
© Michael Rubinstein