Visual Tracker Sampler


Tracking by Sampling Trackers
Junseok Kwon* and Kyoung Mu Lee
Computer Vision Lab., Dept. of EECS, Seoul National University, Korea
Homepage: http://cv.snu.ac.kr
Goal of Visual Tracking
• Robustly track the target in real-world scenarios

[Figure: example tracking result at frame #1 and frame #43]
Bayesian Tracking Approach
• Maximum a Posteriori (MAP) estimate:

$$\arg\max_{X_t} p(X_t \mid Y_{1:t})$$

• The state $X_t = \{X_t^x, X_t^y, X_t^s\}$ consists of the x position, y position, and scale of the target.

[Figure: example observations of the target, e.g. edge and intensity]
State Sampling
• MAP estimate by Monte Carlo sampling:

$$\arg\max_{X_t^{(l)}} p(X_t^{(l)} \mid Y_{1:t}), \quad l = 1, \dots, N$$

[Figure: samples in the state space (x position vs. scale), guided by the visual tracker]
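To make the Monte Carlo step concrete, here is a minimal Python sketch (my own illustration, not the authors' code): draw N candidate states from a Gaussian proposal around the previous state, score them with a posterior callable, and keep the argmax. The `posterior` callable and the proposal standard deviations are hypothetical placeholders.

```python
import numpy as np

def mc_map_estimate(posterior, prev_state, n_samples=300,
                    sigma=(4.0, 4.0, 0.02)):
    """Monte Carlo MAP estimate: arg max over sampled states.

    posterior  -- callable scoring p(X_t | Y_1:t) up to a constant (placeholder)
    prev_state -- previous state (x position, y position, scale)
    """
    rng = np.random.default_rng()
    # Draw candidate states X_t^(l), l = 1..N, around the previous state
    samples = rng.normal(loc=prev_state, scale=sigma, size=(n_samples, 3))
    scores = np.array([posterior(s) for s in samples])
    return samples[np.argmax(scores)]   # arg max_l p(X_t^(l) | Y_1:t)
```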
Problem of Previous Works
• The tracking environment changes over time.
• Conventional trackers have difficulty obtaining good samples: a fixed visual tracker cannot reflect the changing tracking environment well.
Our Approach: Tracker Sampling
• Sample the tracker itself as well as the state.

[Figure: tracker space containing Tracker #1, Tracker #2, ..., Tracker #M; each sampled tracker performs its own state sampling in the state space (x position vs. scale)]
Two Challenges
• How is the tracker space defined?
• When and which tracker should be sampled?

[Figure: tracker space containing Tracker #1, Tracker #2, ..., Tracker #M]
Challenge 1: Tracker Space
• No previous work has defined a tracker space.
• Designing such a space is difficult because a visual tracker is hard to describe formally.
Bayesian Tracking Approach
• Go back to the Bayesian tracking formulation:

$$\arg\max_{X_t} p(X_t \mid Y_{1:t})$$

• Updating rule:

$$p(X_t \mid Y_{1:t}) \propto p(Y_t \mid X_t) \int p(X_t \mid X_{t-1})\, p(X_{t-1} \mid Y_{1:t-1})\, dX_{t-1}$$
Bayesian Tracking Approach
• What are the important ingredients of a visual tracker?

$$p(X_t \mid Y_{1:t}) \propto p(Y_t \mid X_t) \int p(X_t \mid X_{t-1})\, p(X_{t-1} \mid Y_{1:t-1})\, dX_{t-1}$$

1. Appearance model (the observation likelihood $p(Y_t \mid X_t)$)
2. Motion model (the transition density $p(X_t \mid X_{t-1})$)
3. State representation type (how the state $X_t$ describes the target)
4. Observation type (which features form $Y_t$)

These four ingredients span the tracker space.
Challenge 2: Tracker Sampling
• Tracker sampling: when and which tracker should be sampled, so as to reflect the current tracking environment?
• Each tracker #m in the tracker space is a combination of an appearance model ($A_t$), a motion model ($M_t$), a state representation type ($S_t$), and an observation type ($O_t$).
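As a concrete, purely illustrative rendering of this decomposition, a basic tracker can be modeled as a tuple of the four ingredients. Every name below is a hypothetical sketch, not the paper's code.

```python
from dataclasses import dataclass
from typing import Callable, Tuple
import numpy as np

@dataclass(frozen=True)
class BasicTracker:
    """One point T in the tracker space: a choice of the four ingredients."""
    appearance: np.ndarray   # A_t: a reference appearance model (e.g. one PC)
    motion: np.ndarray       # M_t: a motion model (e.g. a cluster mean vector)
    state_repr: Tuple        # S_t: a fragment layout describing the target
    observation: Callable    # O_t: a feature extractor producing Y_t

def tracker_space(A, M, S, O):
    """Enumerate all |A||M||S||O| basic trackers from the sampled sets."""
    return [BasicTracker(a, m, s, o)
            for a in A for m in M for s in S for o in O]
```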
Reversible Jump MCMC (RJ-MCMC)
• We use the RJ-MCMC method for tracker sampling: add and delete moves grow or shrink each ingredient set over time.

$$\mathbf{A}_t = \{A_t^1, A_t^2, \dots, A_t^{|\mathbf{A}_t|}\} \quad \text{(set of sampled appearance models)}$$
$$\mathbf{M}_t = \{M_t^1, M_t^2, \dots, M_t^{|\mathbf{M}_t|}\} \quad \text{(set of sampled motion models)}$$
$$\mathbf{S}_t = \{S_t^1, S_t^2, \dots, S_t^{|\mathbf{S}_t|}\} \quad \text{(set of sampled state representation types)}$$
$$\mathbf{O}_t = \{O_t^1, O_t^2, \dots, O_t^{|\mathbf{O}_t|}\} \quad \text{(set of sampled observation types)}$$

• The sampled basic trackers are all combinations $T_t^1, T_t^2, \dots, T_t^{|\mathbf{A}_t||\mathbf{M}_t||\mathbf{S}_t||\mathbf{O}_t|}$.
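A minimal sketch of one add/delete move, assuming a `log_posterior(models)` scorer of the kind defined on the next slides. For simplicity the proposal is taken to be symmetric so the Q terms cancel, which is a simplification of the paper's scheme.

```python
import math
import random

def rjmcmc_move(models, candidates, log_posterior):
    """One reversible-jump move on a set of sampled models: propose adding
    or deleting one model, then accept with ratio min(1, p(new) / p(old))
    under the symmetric-proposal simplification."""
    proposal = list(models)
    unused = [c for c in candidates if c not in proposal]
    if unused and (not proposal or random.random() < 0.5):
        proposal.append(random.choice(unused))      # add move
    elif proposal:
        proposal.remove(random.choice(proposal))    # delete move
    else:
        return models                               # nothing to propose
    log_alpha = log_posterior(proposal) - log_posterior(models)
    accept = math.log(random.random() + 1e-300) < min(0.0, log_alpha)
    return proposal if accept else models
```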
Sampling of Appearance Model
• Make candidate appearance models using SPCA*
  – The candidates are principal components (PCs) of the target appearance.

Sparse Principal Component Analysis (SPCA)*

* A. d'Aspremont et al. A direct formulation for sparse PCA using semidefinite programming. SIAM Review, 2007.
Sampling of Appearance Model
• Accept an appearance model with acceptance ratio

$$\alpha = \min\!\left(1,\; \frac{p(\mathbf{A}_t^* \mid X_t, Y_{1:t})\, Q(\mathbf{A}_t; \mathbf{A}_t^*)}{p(\mathbf{A}_t \mid X_t, Y_{1:t})\, Q(\mathbf{A}_t^*; \mathbf{A}_t)}\right),$$

where

$$\log p(\mathbf{A}_t^* \mid X_t, Y_{1:t}) = \sum_{i=1}^{|\mathbf{A}_t^*|} \sum_{j=t-5}^{t-1} \mathrm{DD}\big(Y_j(X_j), \psi_i\big) - \lambda \log |\mathbf{A}_t^*|.$$

• The double sum is the total likelihood score over recent frames: an accepted model $\psi_i$ increases this score when it is adopted as the target reference.
• The $-\lambda \log |\mathbf{A}_t^*|$ penalty keeps the number of sampled models limited.
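For candidate generation, scikit-learn's `SparsePCA` can stand in for the semidefinite-programming formulation cited above; the patch shapes and parameters below are illustrative assumptions, not the authors' settings.

```python
import numpy as np
from sklearn.decomposition import SparsePCA

def appearance_candidates(recent_patches, n_components=5):
    """recent_patches: (n_frames, h, w) grayscale crops of the tracked target.
    Returns n_components sparse PCs of the target appearance, each reshaped
    back to a patch; these are the candidate appearance models."""
    n, h, w = recent_patches.shape
    X = recent_patches.reshape(n, -1)     # one flattened patch per row
    spca = SparsePCA(n_components=n_components, alpha=1.0, random_state=0)
    spca.fit(X)
    return spca.components_.reshape(n_components, h, w)
```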
Sampling of Motion Model
• Make candidate motion models using KHM*
  – The candidates are the mean vectors of clusters of recent motion vectors.

K-Harmonic Means clustering (KHM)*

* B. Zhang, M. Hsu, and U. Dayal. K-harmonic means: a data clustering algorithm. HP Technical Report, 1999.
Sampling of Motion Model
• Accept a motion model with acceptance ratio

$$\alpha = \min\!\left(1,\; \frac{p(\mathbf{M}_t^* \mid X_t, Y_{1:t})\, Q(\mathbf{M}_t; \mathbf{M}_t^*)}{p(\mathbf{M}_t \mid X_t, Y_{1:t})\, Q(\mathbf{M}_t^*; \mathbf{M}_t)}\right),$$

where

$$\log p(\mathbf{M}_t^* \mid X_t, Y_{1:t}) = -\sum_{i=1}^{|\mathbf{M}_t^*|} \mathrm{VAR}(D_t, \mu_i) - \lambda \log |\mathbf{M}_t^*|.$$

• The first term is the total clustering error of the recent motion vectors $D_t$: an accepted model decreases this error when it is set to the mean vector $\mu_i$ of its cluster.
• The $-\lambda \log |\mathbf{M}_t^*|$ penalty keeps the number of sampled models limited.
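A compact numpy sketch of the K-harmonic means center update from Zhang et al.'s report, applied to recent motion vectors; k, p, and the iteration count are arbitrary illustrative choices.

```python
import numpy as np

def khm_motion_candidates(motion_vectors, k=3, p=3.5, iters=30):
    """K-harmonic means on 2-D motion vectors; returns k cluster means
    (the candidate motion models)."""
    rng = np.random.default_rng(0)
    centers = motion_vectors[rng.choice(len(motion_vectors), k, replace=False)]
    for _ in range(iters):
        # d[i, j] = distance from motion vector i to center j
        d = np.linalg.norm(motion_vectors[:, None] - centers[None], axis=2)
        d = np.maximum(d, 1e-8)                 # avoid division by zero
        # KHM weights: q_ij = d_ij^(-p-2) / (sum_l d_il^(-p))^2
        q = d ** (-p - 2) / (d ** (-p)).sum(1, keepdims=True) ** 2
        centers = ((q[..., None] * motion_vectors[:, None]).sum(0)
                   / q.sum(0)[:, None])
    return centers
```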
Sampling of State Representation
• Make candidate state representation types using VPE*
  – The candidates describe the target as different combinations of multiple fragments.

[Figure: a state representation built from Fragment 1 (position, edge) and Fragment 2 (intensity)]

Vertical Projection of Edge (VPE)*

* F. Wang, S. Yu, and J. Yang. Robust and efficient fragments-based tracking using mean shift. Int. J. Electron. Commun., 64(7):614–623, 2010.
Sampling of State Representation
• Accept a state representation type with acceptance ratio

$$\alpha = \min\!\left(1,\; \frac{p(\mathbf{S}_t^* \mid X_t, Y_{1:t})\, Q(\mathbf{S}_t; \mathbf{S}_t^*)}{p(\mathbf{S}_t \mid X_t, Y_{1:t})\, Q(\mathbf{S}_t^*; \mathbf{S}_t)}\right),$$

where

$$\log p(\mathbf{S}_t^* \mid X_t, Y_{1:t}) = -\sum_{i=1}^{|\mathbf{S}_t^*|} \sum_{j=1}^{F_t^i} \mathrm{VAR}(f_j) - \lambda \log |\mathbf{S}_t^*|,$$

and $F_t^i$ is the number of fragments of the $i$-th type.

• The first term is the total variance of the target appearance within each fragment $f_j$ over recent frames: an accepted type reduces this variance.
• The $-\lambda \log |\mathbf{S}_t^*|$ penalty keeps the number of sampled types limited.
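To illustrate the score, a small hypothetical helper (not the authors' code) that measures the total appearance variance inside each fragment of a candidate representation over recent frames; lower variance means a better candidate.

```python
import numpy as np

def fragment_variance_score(recent_patches, fragments):
    """recent_patches: (n_frames, h, w) aligned target crops.
    fragments: list of (top, left, height, width) boxes of one candidate type.
    Returns the negated total within-fragment variance, so that a higher
    score corresponds to a better state representation."""
    total = 0.0
    for top, left, fh, fw in fragments:
        region = recent_patches[:, top:top + fh, left:left + fw]
        # per-pixel variance across recent frames, averaged over the fragment
        total += region.var(axis=0).mean()
    return -total
```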
Sampling of Observation
• Make candidate observation types using a GFB*
  – The candidates are the responses of multiple Gaussian filters whose variances differ.

Gaussian Filter Bank (GFB)*

* J. Sullivan, A. Blake, M. Isard, and J. MacCormick. Bayesian object localisation in images. IJCV, 44(2):111–135, 2001.
Sampling of Observation
• Accept an observation type with acceptance ratio

$$\alpha = \min\!\left(1,\; \frac{p(\mathbf{O}_t^* \mid X_t, Y_{1:t})\, Q(\mathbf{O}_t; \mathbf{O}_t^*)}{p(\mathbf{O}_t \mid X_t, Y_{1:t})\, Q(\mathbf{O}_t^*; \mathbf{O}_t)}\right),$$

where

$$\log p(\mathbf{O}_t^* \mid X_t, Y_{1:t}) = \sum_{i=1}^{|\mathbf{O}_t^*|} \sum_{j,k=t-5}^{t-1} \mathrm{DD}\big(\phi_j^i, \bar{\phi}_k^i\big) - \sum_{i=1}^{|\mathbf{O}_t^*|} \sum_{j,k=t-5}^{t-1} \mathrm{DD}\big(\phi_j^i, \phi_k^i\big) - \lambda \log |\mathbf{O}_t^*|,$$

with $\phi_j^i$ and $\bar{\phi}_k^i$ denoting foreground and background responses of the $i$-th observation type in recent frames.

• Over recent frames, an accepted type makes the foregrounds more similar to one another, but makes foregrounds and backgrounds more different.
• The $-\lambda \log |\mathbf{O}_t^*|$ penalty keeps the number of sampled types limited.
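A minimal Gaussian filter bank via `scipy.ndimage.gaussian_filter`; the sigmas are arbitrary illustrative values, and each response map plays the role of one candidate observation type.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def gaussian_filter_bank(image, sigmas=(1.0, 2.0, 4.0, 8.0)):
    """Return the responses of Gaussian filters whose variances differ;
    each response is one candidate observation type."""
    img = image.astype(np.float64)
    return [gaussian_filter(img, sigma=s) for s in sigmas]
```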
Overall Procedure
[Figure: each sampled tracker (#1 ... #M) in the tracker space performs its own state sampling in the state space (x position vs. scale), and the trackers interact with one another]
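Tying the sketches above together, a schematic per-frame loop, reusing the hypothetical `rjmcmc_move` and `tracker_space` helpers from earlier; everything else is passed in as a callable, since the real components are beyond a sketch.

```python
def track_frame(frame, ingredient_sets, make_candidates, log_posteriors,
                sample_states):
    """One frame of the visual tracker sampler (schematic).

    ingredient_sets -- dict: name -> current sampled set (A_t, M_t, S_t, O_t)
    make_candidates -- callable(name, frame) -> candidates (SPCA/KHM/VPE/GFB)
    log_posteriors  -- dict: name -> callable(models) scoring that set
    sample_states   -- callable(tracker, frame) -> (state, score)
    """
    # 1. Tracker sampling: one RJ-MCMC move per ingredient set
    for name in ingredient_sets:
        cands = make_candidates(name, frame)
        ingredient_sets[name] = rjmcmc_move(
            ingredient_sets[name], cands, log_posteriors[name])
    # 2. Rebuild the basic trackers from the sampled ingredients
    trackers = tracker_space(*ingredient_sets.values())
    # 3. State sampling by every tracker; the interaction step, in which
    #    trackers exchange good samples, is elided in this sketch
    results = [sample_states(trk, frame) for trk in trackers]
    return max(results, key=lambda r: r[1])    # MAP state over all trackers
```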
Qualitative Results

[Videos: tracking results on the Iron-man, Matrix, Skating1, and Soccer datasets]
Quantitative Results
Average center location error in pixels (lower is better):

Sequence     MC    IVT   MIL   VTD   Ours
soccer       53    116    41    23     17
skating1    172    213    85     8      8
animal       26     21    30    22     10
shaking      98    150    38    20      5
Soccer*      72    225   147    34     24
Skating1*   126    291    87    16      8
Iron-man     78    104   122    30     15
Matrix      123     50    57    80     12
MC: Khan et al. MCMC-based particle filtering for tracking a variable number of interacting targets. PAMI, 2005.
IVT: Ross et al. Incremental learning for robust visual tracking. IJCV, 2007.
MIL: Babenko et al. Visual tracking with online multiple instance learning. CVPR, 2009.
VTD: Kwon et al. Visual tracking decomposition. CVPR, 2010.
Summary
• Visual tracker sampler: a new framework that samples the visual tracker itself as well as the state.
• An efficient sampling strategy for sampling the visual trackers.
http://cv.snu.ac.kr/paradiso