
Distribution Fields: A Flexible Representation for Low-level Vision Problems

Laura Sevilla-Lara, Manjunath Narayana, Erik Learned-Miller

University of Massachusetts Amherst

Abstract

Distribution Fields are a data structure for the representation of images. At each pixel location they contain one distribution over the possible values of the pixel. This representation addresses several issues that appear in low-level vision problems. It is robust to both oversensitivity and undersensitivity to the spatial structure of the image, and it can explain non-rigid motion. Unlike other histogram-based descriptors, it includes more spatial information, making it more robust to certain motions like rotation. Also, the basin of attraction of this representation is larger than that of other descriptors, allowing it to capture very large displacements. Distribution Fields provide a natural way of comparing and combining images or views of an object. All these issues appear in many different low-level vision problems such as alignment, tracking, optical flow, background subtraction, and registration. Our experiments in tracking and background subtraction provide evidence of the potential of this descriptor as a common framework for these problems.

Prior work

• Alignment: Congealing [1] uses pixel-wise distributions to maximize self-likelihood among pixels in the same column.

• Tracking: DFs are related to histogram-based tracking algorithms like MeanShift [2].

• Background Subtraction: Common algorithms for background subtraction represent background and foreground pixels as a Mixture of Gaussians [3] or perform non-parametric estimation [4].

• Geometric Blur: DFs capture the idea of geometric blur [5], which looks at the average feature response over a set of transformations; however, DFs adapt the set of transformations on the fly.

• SIFT and HOG features: DFs use the same ideas about integrating feature responses over space. However, they avoid hard bin boundaries, use overlapping bins, and optimize the “bin” width for each image [6, 7].

Background Subtraction

[Figure: background distribution estimation, foreground distribution estimation, and the detected foreground object in frame t.]

Distribution Fields

Definition

•Distribution Fields (DF) are a data structure to represent images. In a DF, each pixel p in the image has an associated non-parametric probability distribution over the values that it can take in a particular feature space. This distribution represents the probability of each value being observed at pixel p.

Representation

•A DF is represented as a (2+n)-dimensional matrix, where the first 2 dimensions are the width and height of the image, and the remaining n dimensions index the feature space that we choose. For example, if the feature space is grayscale intensity, then an image of size m x n yields a 3D matrix of size m x n x b, where b is the number of bins.
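As a minimal sketch of this layout (NumPy, which the original does not prescribe; the sizes are arbitrary):

```python
import numpy as np

m, n, b = 240, 320, 16    # image height, width, and number of intensity bins
df = np.zeros((m, n, b))  # DF for a grayscale m x n image: a 3D matrix, m x n x b
# df[i, j, :] holds the distribution over the b possible values at pixel (i, j);
# in a valid DF it sums to 1.
```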

Exploding

•Exploding an image to obtain a basic DF is the process of filling a DF with a Kronecker delta function at each pixel value. In the example of intensity space, the matrix is filled using the following expression:

d(i, j, k) = 1 if I(i, j) = k, and 0 otherwise.

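A minimal sketch of exploding under this definition, assuming NumPy and an image already quantized to integer bin indices (the function name explode is ours, echoing the poster’s term):

```python
import numpy as np

def explode(image, b=256):
    """Explode a grayscale image into a basic DF: a Kronecker delta per pixel.

    image: (m, n) integer array with values in [0, b)
    returns: (m, n, b) array with df[i, j, k] = 1 iff image[i, j] == k
    """
    m, n = image.shape
    df = np.zeros((m, n, b))
    i, j = np.indices((m, n))
    df[i, j, image] = 1.0  # place the delta at each pixel's observed value
    return df
```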

The likelihood match

[Figure: a patch J compared against the distribution field D(I, sigma).]

The “sharpening” match
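The poster does not spell out either match function. A plausible sketch of the likelihood match, assuming it scores a patch J by the mean log-probability of its pixels under the smoothed field D(I, sigma):

```python
import numpy as np

def likelihood_match(df, patch, eps=1e-10):
    """Score a patch under a smoothed DF: mean log-probability of its pixels.

    df:    (m, n, b) distribution field D(I, sigma)
    patch: (m, n) integer array, values quantized to [0, b)
    """
    i, j = np.indices(patch.shape)
    probs = df[i, j, patch]            # probability of each observed pixel value
    return np.log(probs + eps).mean()  # eps guards against log(0)
```

Under this reading, the “sharpening” match would search over sigma for the value that best explains the patch, consistent with the “optimal sigma” bullet later in the poster.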

[Figure: background subtraction pipeline. A sequence of background images yields a distribution field for the background, which is smoothed; the detected foreground object, with a tolerance region around it, is “exploded” with a uniform distribution over the tolerance region and smoothed into a foreground DF. The updated background and foreground DFs are then used in frame t+1.]

Bayes’ formula for the posterior probability of the background label in frame t+1, given the pixel intensity at location (x, y) in the image:

P(bg | I(x, y)) = P(I(x, y) | bg) P(bg) / [ P(I(x, y) | bg) P(bg) + P(I(x, y) | fg) P(fg) ]

where P(I(x, y) | bg) and P(I(x, y) | fg) are read from the smoothed background and foreground DFs, P(bg) is the prior of the background label at location (x, y) in frame t+1, and P(fg) is the prior of the foreground label at location (x, y) in frame t+1.
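A sketch of this per-pixel posterior, assuming NumPy and frames quantized to the DF’s b bins (all names are ours):

```python
import numpy as np

def background_posterior(bg_df, fg_df, bg_prior, fg_prior, frame_bins):
    """Per-pixel posterior of the background label, following Bayes' rule above.

    bg_df, fg_df:       (m, n, b) smoothed DFs for background and foreground
    bg_prior, fg_prior: (m, n) label priors at each location for frame t+1
    frame_bins:         (m, n) integer image, quantized to [0, b)
    """
    i, j = np.indices(frame_bins.shape)
    p_bg = bg_df[i, j, frame_bins] * bg_prior  # likelihood x prior, background
    p_fg = fg_df[i, j, frame_bins] * fg_prior  # likelihood x prior, foreground
    return p_bg / (p_bg + p_fg + 1e-10)        # normalize over the two labels
```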

Tracking

Smoothing

•A DF can be smoothed in image space to represent expected motion. This propagates the expected pixel value to its vicinity, and can be implemented as a convolution with a Gaussian kernel.

•Small changes in the feature value, for example due to sub-pixel motions, can also be explained in a DF by convolving in feature space.
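A sketch of both convolutions, assuming SciPy’s gaussian_filter with one sigma per axis (the original only specifies a Gaussian kernel; the renormalization is our addition to keep distributions valid near feature-space boundaries):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def smooth_df(df, sigma_space, sigma_feature):
    """Smooth a DF spatially (expected motion) and along the feature axis
    (small value changes, e.g. from sub-pixel motion)."""
    out = gaussian_filter(df, sigma=(sigma_space, sigma_space, sigma_feature))
    norm = out.sum(axis=2, keepdims=True)  # per-pixel mass over feature bins
    return out / np.maximum(norm, 1e-10)   # re-normalize each distribution
```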

Combining images

•A set of images can be combined easily using a DF representation. Summing two DFs component by component and normalizing yields another DF (see the sketch after this list).

•This provides an empirical distribution of an object’s appearance.

• Adjusts to “alignability” of images.

• Gives “average distance” from every pixel in one image to a neighborhood of high likelihood in the other image.

• Optimal “sigma” is both a good way to sort images by similarity and a very intuitive description of how similar the images are.

• Works in any feature space.

• No bins!

• Not oversensitive to position (like pixel based measures).

• Not undersensitive to position (like bin based measures).
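A minimal sketch of combining two DFs (NumPy; the function name is ours):

```python
import numpy as np

def combine_dfs(df_a, df_b):
    """Combine two DFs of the same shape into an empirical appearance model."""
    combined = df_a + df_b                      # component-wise sum
    norm = combined.sum(axis=2, keepdims=True)  # per-pixel mass over feature bins
    return combined / np.maximum(norm, 1e-10)   # renormalize each distribution
```

Because each pixel’s distribution is renormalized, the result is again a valid DF, so further views can be folded in the same way.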

Basin of attraction experiments

• DFs allow information about a pixel to be widely spread in space. Typically, this information is spread by blurring the image, which destroys information about pixel values. This is easily avoided in DFs, thanks to their layered structure, even when using different distance functions.

[Tracking results: moving camera, large displacement, illumination changes, distractors, occlusion.]

References

[1] E. Learned-Miller, “Data driven image models through continuous joint alignment,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 28, no. 2, February 2006.

[2] D. Comaniciu, V. Ramesh, and P. Meer, “Real-time tracking of non-rigid objects using mean shift,” in Proc. IEEE Conference on Computer Vision and Pattern Recognition, vol. 2, pp. 142-149, 2000.

[3] C. Stauffer and W. E. L. Grimson, “Adaptive background mixture models for real-time tracking,” in Proc. IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 2, 1999.

[4] A. M. Elgammal, D. Harwood, and L. S. Davis, “Non-parametric model for background subtraction,” in Proc. 6th European Conference on Computer Vision (ECCV ’00), Part II, Springer-Verlag, London, UK, pp. 751-767, 2000.

[5] A. C. Berg and J. Malik, “Geometric blur for template matching,” in Proc. IEEE Conference on Computer Vision and Pattern Recognition, 2001.

[6] D. G. Lowe, “Distinctive image features from scale-invariant keypoints,” International Journal of Computer Vision, vol. 60, no. 2, pp. 91-110, 2004.

[7] N. Dalal and B. Triggs, “Histograms of oriented gradients for human detection,” in Proc. IEEE Conference on Computer Vision and Pattern Recognition, 2005.