Modeling Clutter Perception using Parametric Proto-object Partitioning
Chen-Ping Yu¹, Wen-Yu Hua³, Dimitris Samaras¹, Gregory Zelinsky¹,²
¹Dept. of Computer Science, ²Dept. of Psychology, Stony Brook University; ³Dept. of Statistics, Penn State University
Introduction
- The Problems: (1) Model human clutter perception using proto-objects. (2) Estimate "set size" for realistic scenes.
- What is Visual Clutter? A "confused collection" or a "crowded disorderly state". Increasing visual clutter leads to poorer performance in many behavioral tasks (e.g., visual search).
- What is a Set Size Effect? A drop in search performance as the number of objects increases [1]. However, the number of objects is difficult to quantify in real-world scenes.

Method
- Superpixel Graph: An image is first preprocessed into superpixels using SLIC [3]. It is then represented as a graph whose nodes are the superpixels. Each pair of adjacent nodes is connected by a weighted edge.
- Parameters: a lower-bound parameter 𝜖 ∈ {0.01, 0.02, …, 0.20} and a percentile parameter 𝜏 ∈ {0.5, 0.6, …, 0.9}.
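The graph construction can be sketched as follows. This is a minimal sketch, not the authors' code. It assumes the superpixel label map is already available as a 2-D integer array (for example from an SLIC implementation such as scikit-image's `skimage.segmentation.slic`). It also assumes that two superpixels are adjacent when their pixels touch horizontally or vertically (4-connectivity). The function name `adjacency_edges` is hypothetical.

```python
import numpy as np


def adjacency_edges(labels):
    """Return undirected edges (a, b) with a < b between superpixels whose
    pixels touch horizontally or vertically (4-connectivity, an assumption).

    Hypothetical helper for illustration; not the authors' implementation.
    """
    labels = np.asarray(labels)
    edges = set()
    # Pair every pixel with its right neighbour, then with its lower neighbour.
    for left, right in ((labels[:, :-1], labels[:, 1:]),
                        (labels[:-1, :], labels[1:, :])):
        boundary = left != right  # pixel pairs straddling two superpixels
        for u, v in zip(left[boundary].tolist(), right[boundary].tolist()):
            edges.add((min(u, v), max(u, v)))
    return edges


# Toy 3x3 label map with four superpixels (0-3).
labels = np.array([[0, 0, 1],
                   [0, 2, 1],
                   [3, 3, 3]])
print(sorted(adjacency_edges(labels)))
```

In the model, each of these edges would then receive a dissimilarity weight (EMD) computed from the features of its two superpixels.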
- Optimization: The Weibull mixture model (WMM) is fit either by maximum likelihood estimation (MLE) using the Nelder-Mead algorithm or by nonlinear least squares (NLS).
- MLE: highest Spearman's ρ = 0.8038, with 𝜖 = 0.14 and 𝜏 = 0.8; 10-fold cross-validation: 0.7599.
- NLS: highest Spearman's ρ = 0.7966, with 𝜖 = 0.14 and 𝜏 = 0.4; 10-fold cross-validation: 0.7375.
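The MLE variant can be sketched as a Nelder-Mead minimization of the negative log-likelihood of a two-component Weibull mixture. This is a minimal sketch under stated assumptions, not the authors' implementation:

- The EMD values of all edges are collected in a 1-D array of positive numbers.
- The logistic/log reparameterization, the initial guess, and the name `fit_wmm_mle` are illustrative choices.
- The lower-bound parameter 𝜖 and the percentile parameter 𝜏 are not modeled here.

Only the SciPy calls `scipy.stats.weibull_min.pdf` and `scipy.optimize.minimize(method="Nelder-Mead")` are used.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import weibull_min


def fit_wmm_mle(x, init=(0.5, 2.0, 0.1, 2.0, 0.5)):
    """Fit a two-component Weibull mixture to positive data x by maximum likelihood.

    init and the returned parameters are (w, shape1, scale1, shape2, scale2),
    where w is the mixing weight of component 1. Returns (params, log_likelihood).
    Hypothetical helper for illustration; not the authors' code.
    """
    x = np.asarray(x, dtype=float)

    def unpack(theta):
        # Unconstrained -> valid parameters: logistic for w, exp for shapes/scales.
        w = 1.0 / (1.0 + np.exp(-theta[0]))
        c1, s1, c2, s2 = np.exp(theta[1:])
        return w, c1, s1, c2, s2

    def neg_log_lik(theta):
        w, c1, s1, c2, s2 = unpack(theta)
        dens = (w * weibull_min.pdf(x, c1, scale=s1)
                + (1.0 - w) * weibull_min.pdf(x, c2, scale=s2))
        return -np.sum(np.log(dens + 1e-300))  # small constant avoids log(0)

    w0, c10, s10, c20, s20 = init
    theta0 = np.array([np.log(w0 / (1.0 - w0)),
                       np.log(c10), np.log(s10), np.log(c20), np.log(s20)])
    res = minimize(neg_log_lik, theta0, method="Nelder-Mead",
                   options={"maxiter": 5000, "maxfev": 5000,
                            "xatol": 1e-6, "fatol": 1e-8})
    return unpack(res.x), -res.fun
```

Mixture components can swap labels during fitting. Deciding which component is "similar" should therefore happen after the fit, for example by picking the component with the smaller scale. The NLS variant would instead minimize squared differences between the mixture density and a histogram of the EMD values.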
[Figure: superpixel graph (SLIC, k = 1000)]
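- Edge Weights (Earth Mover's Distance): Each edge is weighted by the dissimilarity between its two nodes in terms of intensity, color, and orientation. We use the Earth Mover's Distance (EMD) as the dissimilarity measure. Given two signatures $P = \{(p_1, w_{p_1}), \dots, (p_m, w_{p_m})\}$ and $Q = \{(q_1, w_{q_1}), \dots, (q_n, w_{q_n})\}$ to be compared, EMD finds the optimal flow $F = [f_{ij}]$ that minimizes
$$\mathrm{WORK}(P, Q, F) = \sum_{i=1}^{m} \sum_{j=1}^{n} f_{ij}\, d_{ij},$$
where $d_{ij}$ denotes a dissimilarity metric (here, the $L_2$ distance) between $p_i$ and $q_j$ in the feature space.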
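For one-dimensional histograms with equal total mass, the optimal flow has a closed form when the ground distance between bins $i$ and $j$ is $|i - j|$ (which coincides with the $L_2$ distance in 1-D). In that case the EMD equals the sum of absolute differences between the two cumulative histograms. The sketch below assumes normalized histograms over the same, unit-spaced bins, for example the intensity histograms of two adjacent superpixels. It is a simplification and not necessarily how the model computes EMD for color or orientation. `emd_1d` is a hypothetical name.

```python
import numpy as np


def emd_1d(p, q):
    """EMD between two 1-D histograms over the same unit-spaced bins.

    Both histograms are normalized to unit mass first (equal-mass assumption).
    Hypothetical helper for illustration; not the authors' code.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p = p / p.sum()
    q = q / q.sum()
    # |P(k) - Q(k)| is the mass that must cross the boundary after bin k.
    return float(np.sum(np.abs(np.cumsum(p) - np.cumsum(q))))


print(emd_1d([1, 0, 0], [0, 0, 1]))        # 2.0: all mass moves two bins
print(emd_1d([0.5, 0.5, 0], [0, 0.5, 0.5]))  # 1.0
```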
[Figure: How can we quantify set size, or the number of objects, in these scenes?]
Experiments and Results
- Goal: Correlate the model's clutter ranking of our 90-image dataset with the behavioral clutter rankings using Spearman's ρ.

Model                          Spearman's ρ
WMM-mle                        0.8038
WMM-nls                        0.7966
Mean-shift [6]                 0.7262
Graph based [7]                0.6612
Power Law [8]                  0.6439
Edge Density [9]               0.6231
Feature Congestion [10]        0.5337
# of Objects (SUN) [2]         0.5255
Color-cluster clutter [11]     0.4810

Correlations between human clutter perception and all evaluated methods. WMM is our Weibull mixture model. Our method runs in 20 seconds per 800×600 image on an Intel Core i7 3.0 GHz machine with 8 GB RAM.
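Spearman's ρ is the Pearson correlation of the two rank vectors. A minimal NumPy sketch follows. It assumes the model's clutter scores and the human clutter ranking are given per image in the same order, and that tied scores receive their average rank, as in the standard definition. `average_ranks` and `spearman_rho` are hypothetical names.

```python
import numpy as np


def average_ranks(values):
    """Rank values 1..n; tied values receive the mean of their ranks.
    Hypothetical helper for illustration."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="mergesort")
    sorted_vals = values[order]
    ranks = np.empty(len(values))
    i = 0
    while i < len(values):
        j = i
        # Extend j over the run of values tied with sorted_vals[i].
        while j + 1 < len(values) and sorted_vals[j + 1] == sorted_vals[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2.0 + 1.0  # mean of ranks i+1..j+1
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    rx -= rx.mean()
    ry -= ry.mean()
    return float(np.sum(rx * ry) / np.sqrt(np.sum(rx ** 2) * np.sum(ry ** 2)))


print(average_ranks([10, 20, 20, 30]))          # ranks 1, 2.5, 2.5, 4
print(spearman_rho([1, 2, 3, 4], [1, 3, 2, 4]))  # 0.8
```

Averaging tied ranks matters for clutter scores because two images can yield the same proto-object count.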
- What are Proto-objects? Regions of locally similar features. They can be objects, object
parts, or just pieces that come together to form objects.
- What does our Clutter Model do? It segments proto-objects from an image, then counts
the number of proto-objects as an estimate of visual clutter.
Contributions
- Edge Labeling for Superpixel Clustering: Each edge is labeled as Similar or Dissimilar based on a similarity threshold. The dissimilar edges are removed to form superpixel clusters, which are merged to form proto-objects.
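The edge-labeling and merging step can be sketched with a union-find structure. Every edge whose EMD weight is at or below the similarity threshold is labeled Similar and joins its two superpixels. Dissimilar edges are dropped, and the remaining connected components are the proto-objects. The sketch also returns the normalized clutter measure (proto-object count divided by the initial number of superpixels).

This is a minimal sketch with the following assumptions:

- Edges are given as (u, v, weight) triples over superpixels numbered 0..n-1.
- A lower EMD means more similar.
- The threshold is supplied by the caller rather than estimated from the WMM.

`count_proto_objects` is a hypothetical name.

```python
def count_proto_objects(n_superpixels, edges, threshold):
    """Merge superpixels joined by Similar edges (weight <= threshold, an
    assumed convention) and return (number of proto-objects, normalized clutter).

    Hypothetical helper for illustration; not the authors' code.
    """
    parent = list(range(n_superpixels))

    def find(a):
        # Path halving keeps the union-find trees shallow.
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    for u, v, weight in edges:
        if weight <= threshold:  # Similar edge: merge its two superpixels
            ru, rv = find(u), find(v)
            if ru != rv:
                parent[ru] = rv
        # Dissimilar edges are removed, i.e., simply ignored.

    n_proto = sum(1 for a in range(n_superpixels) if find(a) == a)
    return n_proto, n_proto / n_superpixels


# Four superpixels in a chain; the middle edge is dissimilar.
print(count_proto_objects(4, [(0, 1, 0.1), (1, 2, 0.9), (2, 3, 0.2)], 0.5))  # (2, 0.5)
```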
- Clutter Model: Our model successfully predicts the degree to which a person will perceive an image as cluttered, and it outperforms all the other models of clutter perception that we evaluated.
- Parametric Modeling of Earth Mover's Distance Statistics: We show that Earth Mover's Distance (EMD) statistics follow a Weibull distribution, which enables efficient parametric modeling.
Dataset
- 90 real-world images (800×600) sampled from the SUN Database [2].
- Divided into 6 groups, each with a different range of object counts (from SUN09).
- Clutter Dataset: We obtained a clutter ground truth by having people rank-order a subset of images from SUN09 [2] from least to most cluttered.
- Proto-object Segmentation: Unsupervised image partitioning using our novel parametric EMD model.
[Figure: Four sample images from our dataset. Human clutter rankings, left to right: 6, 47, 70, 87. Proto-object model rankings with the best-tuned parameter setting (𝜖 = 0.14, 𝜏 = 0.8): 7, 40, 81, 83.]
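- Compute the Similarity Threshold Using a Weibull Mixture Model: When P and Q have the same total mass, EMD is identical to the Mallows distance [4],
$$M_p(P, Q) = \min_{F} \left( \mathbb{E}_{F} \lVert X - Y \rVert^{p} \right)^{1/p},$$
where the minimum is taken over all joint distributions $F$ of $(X, Y)$ with marginals $P$ and $Q$. In addition, $L_p$-based distance statistics follow a Weibull distribution [5]. Therefore, a two-component WMM (similar/dissimilar) can be used to compute the similarity threshold.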
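The EMD/Mallows equivalence can be checked numerically in one dimension. Consider two equal-size samples treated as equal-mass distributions. The optimal coupling pairs the sorted values, so the Mallows distance with p = 1 is the mean absolute difference of the sorted samples. The EMD, in turn, is the area between the two empirical CDFs. This sketch assumes 1-D data and the ground distance |a - b|. The function names are hypothetical.

```python
import numpy as np


def mallows_1d(x, y):
    """Mallows distance with p = 1 between two equal-size 1-D samples.
    In 1-D the optimal coupling pairs the sorted values. Hypothetical helper."""
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    if len(x) != len(y):
        raise ValueError("this sketch assumes equal sample sizes (equal mass)")
    return float(np.mean(np.abs(x - y)))


def emd_1d_samples(x, y):
    """EMD between the empirical distributions of x and y (ground distance
    |a - b|), computed as the area between their CDFs. Hypothetical helper."""
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    grid = np.sort(np.concatenate([x, y]))
    widths = np.diff(grid)
    # Both empirical CDFs are constant on each interval [grid[k], grid[k + 1]).
    cdf_x = np.searchsorted(x, grid[:-1], side="right") / len(x)
    cdf_y = np.searchsorted(y, grid[:-1], side="right") / len(y)
    return float(np.sum(np.abs(cdf_x - cdf_y) * widths))


rng = np.random.default_rng(1)
a, b = rng.normal(size=50), rng.exponential(size=50)
print(mallows_1d(a, b), emd_1d_samples(a, b))  # the two values agree
```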
- Clutter rankings (15 raters) and object segmentations (SUN) are available for each image.
- Mean correlation between all pairs of human rankings: Spearman's ρ = 0.6919.
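The inter-rater figure is an average of Spearman's ρ over all pairs of raters. With 15 raters there are 15·14/2 = 105 pairs. This is a minimal sketch, assuming each rater's ranking is a permutation of 1..n with no ties. In that case Spearman's ρ reduces to 1 - 6Σd²/(n(n² - 1)). `rho_no_ties` and `mean_pairwise_rho` are hypothetical names.

```python
from itertools import combinations


def rho_no_ties(r1, r2):
    """Spearman's rho for two rankings that are permutations of 1..n.
    Hypothetical helper for illustration."""
    n = len(r1)
    d2 = sum((a - b) ** 2 for a, b in zip(r1, r2))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))


def mean_pairwise_rho(rankings):
    """Average Spearman's rho over all unordered pairs of raters."""
    pairs = list(combinations(rankings, 2))
    return sum(rho_no_ties(a, b) for a, b in pairs) / len(pairs)


# Three toy raters ranking four images; the third reverses the first two.
print(mean_pairwise_rho([[1, 2, 3, 4], [1, 2, 3, 4], [4, 3, 2, 1]]))
```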
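Weibull Mixture Model (WMM):
$$p(x) = \sum_{k=1}^{K} \pi_k \, \frac{\beta_k}{\lambda_k} \left( \frac{x}{\lambda_k} \right)^{\beta_k - 1} \exp\!\left[ -\left( \frac{x}{\lambda_k} \right)^{\beta_k} \right], \qquad \sum_{k=1}^{K} \pi_k = 1$$
[Figure panels: Intensity, Color, Orientation]
Similarity Threshold: the crossing point $x^{*}$ between the two components,
$$\pi_1 \, f(x^{*}; \beta_1, \lambda_1) = \pi_2 \, f(x^{*}; \beta_2, \lambda_2),$$
where $f(x; \beta, \lambda)$ denotes a single Weibull term of the mixture above.
Application to parameter-free image partitioning: use only a 2-component WMM and do not enforce the lower-bound parameter 𝜖.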
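Given a fitted two-component WMM, the similarity threshold can be located numerically by bisection. It is the point between the two component modes where the weighted component densities are equal. This is a minimal sketch with the following assumptions:

- Component 1 is the "similar" (lower-EMD) component.
- Both shape parameters exceed 1, so each mode is positive.
- The weighted densities change sign exactly once between the modes.

`weibull_pdf`, `weibull_mode`, and `crossing_threshold` are hypothetical names.

```python
import numpy as np


def weibull_pdf(x, shape, scale):
    """Two-parameter Weibull density for x > 0."""
    z = x / scale
    return (shape / scale) * z ** (shape - 1) * np.exp(-z ** shape)


def weibull_mode(shape, scale):
    """Mode of a Weibull distribution with shape > 1."""
    return scale * ((shape - 1) / shape) ** (1 / shape)


def crossing_threshold(w, c1, s1, c2, s2, tol=1e-10):
    """Bisection for the point between the two modes where
    w * f1(x) == (1 - w) * f2(x). Hypothetical helper; component 1 is
    assumed to be the 'similar' (smaller-EMD) component."""
    def g(x):
        return w * weibull_pdf(x, c1, s1) - (1 - w) * weibull_pdf(x, c2, s2)

    lo, hi = weibull_mode(c1, s1), weibull_mode(c2, s2)
    if g(lo) <= 0 or g(hi) >= 0:
        raise ValueError("weighted densities do not cross between the modes")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            lo = mid  # component 1 still dominates: crossing lies to the right
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Equal weights and equal shapes: x*^2 = 2 ln 2 / 0.75 for shape 2, scales 1 and 2.
print(crossing_threshold(0.5, 2.0, 1.0, 2.0, 2.0))
```

Edges whose EMD falls below this value would be labeled Similar.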
[Figure: example images from the object-count groups with 1-10, 31-40, and 51-60 objects; 15 images per group, 90 images total]
- Normalized Clutter Measure: The number of final proto-objects is divided by the initial number of superpixels to produce the final clutter measure for a given image.

References & Acknowledgment
[1] J. M. Wolfe. Visual search. In Attention, 1998.
[2] J. Xiao, J. Hays, K. Ehinger, A. Oliva, and A. Torralba. SUN database: Large-scale scene recognition from abbey to zoo. In CVPR, 2010.
[3] R. Achanta, A. Shaji, K. Smith, A. Lucchi, P. Fua, and S. Süsstrunk. SLIC superpixels compared to state-of-the-art superpixel methods. IEEE TPAMI, 2012.
[4] E. Levina and P. Bickel. The earth mover's distance is the Mallows distance: Some insights from statistics. In ICCV, 2001.
[5] G. J. Burghouts, A. W. M. Smeulders, and J.-M. Geusebroek. The distribution family of similarity distances. In NIPS, 2007.
[6] D. Comaniciu and P. Meer. Mean shift: A robust approach toward feature space analysis. IEEE TPAMI, 2002.
[7] P. F. Felzenszwalb and D. P. Huttenlocher. Efficient graph-based image segmentation. IJCV, 2004.
[8] M. J. Bravo and H. Farid. A scale invariant measure of clutter. Journal of Vision, 2008.
[9] M. L. Mack and A. Oliva. Computational estimation of visual complexity. In the 12th Annual Object, Perception, Attention, and Memory Conference, 2004.
[10] R. Rosenholtz, Y. Li, and L. Nakano. Measuring visual clutter. Journal of Vision, 2007.
[11] M. C. Lohrenz, J. G. Trafton, R. M. Beck, and M. L. Gendron. A model of clutter for complex, multivariate geospatial displays. Human Factors, 2009.
We thank the authors of the C3 model, Dr. Burghouts (author of [5]), and Dr. Matthew Asher for discussions and code sharing. This work was supported by NIMH Grant R01-MH064748 to G.J.Z., NSF Grant IIS-1111047 to G.J.Z. and D.S., and the SUBSAMPLE Project of the DIGITEO Institute, France.