
Tracking objects across cameras by incrementally learning inter camera colour calibration and patterns of activity (ECCV 2006)

Andrew Gilbert and Richard Bowden, University of Surrey
CSE 252C, Fall 2006, UCSD

Inter Camera Tracking

 As the number of cameras in a network increases, a user's ability to manage such vast amounts of information becomes limited.

 We would like to automatically track objects across a camera network:
 Tracking an object throughout each camera's field of view (FOV)
 Successful object "handover" between cameras in the network
 We would like to track objects across cameras without any explicit geometric or color calibration

Proposed Algorithm

 Using an incremental learning method, the inter-camera color variations and the probability distribution of spatio-temporal links between cameras are modeled.

 Requires no color or spatial pre-calibration
 No batch processing
 We want the algorithm to work immediately, to improve performance as more data becomes available, and to adapt to changes in the cameras' environments

Test Environment

 Four non-overlapping color cameras in an office building.


Intra-Camera Tracking

 The static background color distribution is modeled.

 A Gaussian mixture model on a per-pixel basis is used to form the foreground vs. background pixel segmentation, learned using an online approximation to expectation maximization.
 Shadows are identified and removed by relaxing the model's constraint on intensity but not chromaticity, and the foreground object is formed using connected component analysis on the resulting binary segmentation.
 Objects are linked temporally with a Kalman filter to provide movement trajectories within each camera.

(Kalman filter overview later!)
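The per-pixel background model above can be sketched in a simplified form. This toy version keeps a single Gaussian per pixel rather than the paper's full mixture, and the learning rate and threshold below are assumed values for illustration, not the authors' settings.

```python
import numpy as np

ALPHA = 0.05   # online learning rate (assumed)
K_SIGMA = 2.5  # foreground threshold in standard deviations (assumed)

def update_background(frame, mean, var):
    """Classify pixels as foreground, then update the model online."""
    diff = frame - mean
    foreground = diff ** 2 > (K_SIGMA ** 2) * var  # outside the model
    # Online approximation to EM: only background pixels update the model,
    # so a moving object does not get absorbed into the background.
    bg = ~foreground
    mean[bg] += ALPHA * diff[bg]
    var[bg] += ALPHA * (diff[bg] ** 2 - var[bg])
    return foreground

# Toy example: a static grey background with one bright blob ("object").
h, w = 48, 64
mean = np.full((h, w), 100.0)
var = np.full((h, w), 25.0)
frame = np.full((h, w), 100.0)
frame[10:20, 10:20] = 200.0
fg = update_background(frame, mean, var)
print(fg.sum())  # number of foreground pixels
```

Connected component analysis on the binary mask `fg` would then group these pixels into one object region.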

Object Description

 Once foreground objects have been identified, an object descriptor is formed.

 A color histogram is used to describe the color fingerprint of the object:
 Spatially invariant
 Simple and efficient to compute
 Through some quantization, it also provides some invariance to changes in color appearance
 Similarity between objects is given by histogram intersection
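Histogram intersection is simple enough to show directly; a minimal numpy sketch:

```python
import numpy as np

def histogram_intersection(h1, h2):
    """Sum of bin-wise minima; 1.0 for identical normalised histograms."""
    return np.minimum(h1, h2).sum()

# Two normalised 3-bin histograms (illustrative values).
a = np.array([0.5, 0.3, 0.2])
b = np.array([0.4, 0.4, 0.2])
print(histogram_intersection(a, a))  # 1.0
print(histogram_intersection(a, b))  # 0.9
```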

CCCM

 Consensus-Colour Conversion of Munsell colour space
 Breaks RGB color into 11 basic colors
 Each basic color represents a perceptual color category established through a physiological study of how humans categorize color.

 Works best without color calibration of the cameras in the network
 Provides consistent inter-camera quantization without calibration, relying on the perceptual consistency of the colors: if an object is perceived as red in both images, then CCCM will provide a consistent result
 With calibration, quantized RGB performs best (we'll use this later).
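A rough illustration of CCCM-style quantisation into the 11 basic colour terms. The real CCCM partitions Munsell colour space according to the cited study; the RGB prototypes below are hypothetical stand-ins chosen only to make the idea concrete, not the published category boundaries.

```python
import numpy as np

# Illustrative RGB prototypes for the 11 basic colour terms (assumed values).
BASIC_COLOURS = {
    "black": (0, 0, 0), "white": (255, 255, 255), "grey": (128, 128, 128),
    "red": (255, 0, 0), "green": (0, 255, 0), "blue": (0, 0, 255),
    "yellow": (255, 255, 0), "orange": (255, 165, 0), "brown": (139, 69, 19),
    "pink": (255, 192, 203), "purple": (128, 0, 128),
}
NAMES = list(BASIC_COLOURS)
PROTOS = np.array([BASIC_COLOURS[n] for n in NAMES], dtype=float)

def cccm_label(rgb):
    """Map an RGB pixel to its nearest basic-colour prototype."""
    d = np.linalg.norm(PROTOS - np.asarray(rgb, dtype=float), axis=1)
    return NAMES[int(d.argmin())]

print(cccm_label((250, 10, 10)))  # red
```

Counting labels over an object's pixels then yields the 11-bin descriptor used for matching.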


Color Descriptor


Building Temporal links between Cameras

 Assume that objects follow similar routes inter-camera and that repetitions form consistent trends across the data.
 The temporal inter-camera links can be used to link camera regions together, yielding a probability distribution of objects' movement between cameras.

 As the number of cameras in the network increases, the number of possible links increases exponentially
 The majority of the possible links are invalid because they correspond to impossible routes
 We want a solution that can distinguish between valid and invalid links

Building Temporal Links between Cameras

 Within each camera's field of view, the tracking algorithm forms a color descriptor for each object: the median histogram of the object recorded over its entire trajectory within a single camera

B = median(b_1, b_2, ..., b_n)

 Each new object is compared to previous objects within a given time window, T.

Add the other equations here…. JLG!!!

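The per-trajectory descriptor and the time-window comparison can be sketched as follows; the window length T and the histogram values are illustrative.

```python
import numpy as np

T_WINDOW = 10.0  # seconds (assumed value)

def median_descriptor(frame_histograms):
    """B = median(b_1, ..., b_n), taken bin-wise over the trajectory."""
    return np.median(np.asarray(frame_histograms), axis=0)

def candidates(new_time, previous):
    """previous: list of (exit_time, descriptor); keep those within T."""
    return [(t, d) for t, d in previous if 0 <= new_time - t <= T_WINDOW]

# Three frame histograms from one trajectory in a single camera.
traj = [[0.5, 0.3, 0.2], [0.6, 0.2, 0.2], [0.5, 0.4, 0.1]]
B = median_descriptor(traj)
print(B)

# Only objects that left another camera recently are compared.
prev = [(1.0, B), (50.0, B)]
print(len(candidates(8.0, prev)))  # 1
```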

Region Links

 An example probability distribution showing a distinct link between two regions

Incremental Block subdivision and recombination

 System based on rectangular subdivision
 Initially there is one block for each of the four cameras.
 If the maximum peak of the distribution is found to exceed the noise floor, this indicates a possible correlation between the blocks.

 If a correlation exists, the block is subdivided into four equal-sized blocks, and previous data together with new data is used to form new links between the newly formed sub-blocks
 If there is no correlation, the links are thrown away to minimize maintenance
 Blocks which are found to have similar distributions to their neighbors are combined together
 This reduces the number of blocks and links maintained, and increases accuracy
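The subdivision rule can be sketched as below. Reducing the correlation test to a single peak value against a noise floor, and the threshold itself, are assumptions made for illustration.

```python
NOISE_FLOOR = 0.1  # assumed threshold on the link-distribution peak

def subdivide(block):
    """Split a rectangular block (x, y, w, h) into four equal sub-blocks."""
    x, y, w, h = block
    hw, hh = w / 2, h / 2
    return [(x, y, hw, hh), (x + hw, y, hw, hh),
            (x, y + hh, hw, hh), (x + hw, y + hh, hw, hh)]

def refine(block, link_peak):
    """Subdivide only where the link distribution shows a real peak."""
    if link_peak > NOISE_FLOOR:
        return subdivide(block)  # correlation found: refine the region
    return [block]               # no correlation: keep block, discard links

print(len(refine((0, 0, 100, 100), link_peak=0.5)))   # 4
print(len(refine((0, 0, 100, 100), link_peak=0.05)))  # 1
```

Recombination would run the inverse check, merging neighbouring blocks whose link distributions are similar.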

Incremental Block subdivision and recombination


Calculating Posterior Appearance Distributions

 Given an object which disappears in region y, we can model its reappearance probability over time as:
 Where the weight at time t is given as:
 This probability is used to weight the observation likelihood obtained through color similarity to obtain a posterior probability of a match.
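A sketch of the prior-times-likelihood weighting. The paper's actual weight formula is not reproduced here; this only illustrates how a temporal reappearance prior rescales colour similarity scores into a posterior, with all numbers illustrative.

```python
import numpy as np

def posterior(colour_likelihoods, temporal_priors):
    """Element-wise product of likelihood and prior, normalised to sum to 1."""
    p = np.asarray(colour_likelihoods) * np.asarray(temporal_priors)
    return p / p.sum()

# Two candidate matches with similar colour scores, but only the first is
# temporally plausible given where and when the object disappeared.
colour = [0.8, 0.7]
prior = [0.6, 0.05]
post = posterior(colour, prior)
print(post.argmax())  # 0
```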


Modeling Color Variations

 The CCCM color quantization assumes a similar color response between cameras.

 Use CCCM as the initial color descriptor; in parallel, form color transformation matrices between cameras
 The tracked people are used as the calibration objects, and the transformation matrix is formed incrementally to model color changes between cameras

Modeling Color Variations

 Six transformations (with inverses) provide the twelve transformations needed to transform objects between the four cameras
 This allows a less coarse quantization (quantized RGB) to be used, with improved performance

[Diagram: the four cameras (1-4) linked by the pairwise transformations T12, T13, T14, T23, T24, T34]

Modeling Color Variations

 There is a color transformation matrix that transforms the color space of one camera into that of another camera
 With each tracked object, find this transformation matrix (using SVD) and then average it into the aggregate color transformation matrix
 The aggregate is initially set to the identity matrix
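The estimate-then-average scheme can be sketched as follows. The paper solves for the transform with SVD; `np.linalg.lstsq` (itself SVD-based) stands in here, and the blending weight is an assumed value, not the authors' update rule.

```python
import numpy as np

BLEND = 0.1  # incremental averaging weight (assumed)

def estimate_transform(src, dst):
    """Least-squares 3x3 matrix T with src @ T ~ dst (rows are RGB samples)."""
    T, *_ = np.linalg.lstsq(src, dst, rcond=None)
    return T

def update_aggregate(T_agg, T_new, blend=BLEND):
    """Move the aggregate transform a small step towards the new estimate."""
    return (1 - blend) * T_agg + blend * T_new

# Synthetic corresponding colour samples of one tracked person in two cameras.
rng = np.random.default_rng(0)
src = rng.uniform(0, 1, (50, 3))
T_true = np.array([[0.9, 0.0, 0.0], [0.0, 1.1, 0.0], [0.0, 0.0, 1.0]])
dst = src @ T_true

T_est = estimate_transform(src, dst)
T_agg = update_aggregate(np.eye(3), T_est)  # aggregate starts at identity
print(np.allclose(T_est, T_true))  # True
```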

Results

 The data used consisted of 10,000 objects tracked over a period of 72 hours of continuous operation
 Evaluation was performed on an unseen ground-truthed 20-minute sequence with 300 instances of people tracked for more than 1 second.


Results


Conclusions

 Derived the main entry/exit areas of each camera probabilistically using incremental learning
 Simultaneously, the inter-camera color variations are learned
 These allow people to be tracked between spatially separated, uncalibrated cameras with up to 81% accuracy
 No a priori information is used and learning is unsupervised
 Fulfills the three ideals of working immediately, improving performance as more data is accumulated, and adapting to changes

Let’s break it!

 Adding / taking off layers of clothing of different colors would fool the object descriptor.

 Take irregular paths. The link probabilities are built following the average paths.

 Move really slowly, so that you blend in with the background. (?)
 Anything else?... (invisible cloak)

Results


Kalman Filter

 Rudolph E. Kalman in 1960 published a recursive solution to the discrete-data linear filtering problem.

 A set of mathematical equations that implement a predictor-corrector estimator that is optimal in the sense that it minimizes the estimated error covariance
 Used extensively in computer graphics/vision for tracking
 Greg Welch, Gary Bishop (www.cs.unc.edu/~welch/kalman)

Kalman Filter

 Trying to estimate the state x of a discrete-time controlled process that is governed by the linear stochastic difference equation:

x_k = A x_{k-1} + B u_{k-1} + w_{k-1}

 With a measurement z:

z_k = H x_k + v_k

Kalman Filter

 The Kalman filter estimates the system state at some time and then obtains feedback in the form of (noisy) measurements
 The equations for the Kalman filter fall into two categories:
 (1) Time update equations: responsible for projecting forward in time the current state and error covariance estimates to obtain the a priori estimates for the next time step
 (2) Measurement update equations: measurement feedback - incorporates new measurements into the a priori estimate to obtain an improved a posteriori estimate

Kalman Filter

 Time update = predictor
 Measurement update = corrector


Kalman Filter – Time Update

 Time update (predict):

x̂⁻_k = A x̂_{k-1} + B u_{k-1}
P⁻_k = A P_{k-1} Aᵀ + Q

 P is the error covariance estimate

Kalman Filter – Measurement Update

 Measurement update (correct):

K_k = P⁻_k Hᵀ (H P⁻_k Hᵀ + R)⁻¹
x̂_k = x̂⁻_k + K_k (z_k - H x̂⁻_k)
P_k = (I - K_k H) P⁻_k

 K is the Kalman gain matrix
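The two update stages fit in a short numpy sketch using the Welch-Bishop notation from these slides. The constant-velocity model and noise covariances are illustrative, with a 1-D position standing in for the tracked object centroid.

```python
import numpy as np

A = np.array([[1.0, 1.0], [0.0, 1.0]])  # state transition: (position, velocity)
H = np.array([[1.0, 0.0]])              # we only measure position
Q = np.eye(2) * 1e-4                    # process noise covariance (assumed)
R = np.array([[0.25]])                  # measurement noise covariance (assumed)

def predict(x, P):
    x_prior = A @ x                      # project the state forward
    P_prior = A @ P @ A.T + Q            # project the error covariance forward
    return x_prior, P_prior

def correct(x_prior, P_prior, z):
    S = H @ P_prior @ H.T + R
    K = P_prior @ H.T @ np.linalg.inv(S)     # Kalman gain
    x = x_prior + K @ (z - H @ x_prior)      # blend in the measurement
    P = (np.eye(2) - K @ H) @ P_prior        # updated error covariance
    return x, P

# Noisy position measurements of an object moving at roughly unit speed.
x, P = np.array([0.0, 0.0]), np.eye(2)
for z in [1.0, 2.1, 2.9, 4.0]:
    x, P = predict(x, P)
    x, P = correct(x, P, np.array([z]))
print(float(x[0]), float(x[1]))  # estimated position and velocity
```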

Kalman Filter


Extended Kalman Filter (EKF)

 The Kalman filter estimates the state of a stochastic process that is linear. Many applications are non-linear!! Nothing to fear, we have the EKF!
 EKF = a Kalman filter that linearizes about the current mean and covariance

Kalman Filter

 Because the predicted state is based only on the previous state, the filter is computationally and memory efficient, and there are many implementations available
 OpenCV has an implementation.

 Oh, and the state being tracked is the centroid of the objects…

Questions

 Examples didn’t show occlusion during inter-camera tracking. How well does this system work with occlusion and tracking multiple people simultaneously?

 What other descriptors in addition to color histograms can be used to describe the tracked objects?

 Are there better performing intra-camera tracking techniques besides Kalman filter?
