Recognition Using Regions Jitendra Malik UC Berkeley Joint work with Chunhui Gu, Joseph Lim & Pablo Arbelaez UC Berkeley Computer Vision Group.


Recognition Using Regions
Jitendra Malik
UC Berkeley
Joint work with Chunhui Gu, Joseph Lim & Pablo Arbelaez
UC Berkeley
Computer Vision Group
Detection and Segmentation: Bottles
[Figure: Orig. Image | Segmentation]
Detection and Segmentation: Giraffes
[Figure: Orig. Image | Segmentation]
Detection and Segmentation: Mugs
[Figure: Orig. Image | Segmentation]
Outline
• Current paradigm: Multiscale scanning
• Our approach
– Bottom up region segmentation
– Hough transform style voting (learned weights)
– Top down segmentation
• Results on ETHZ, Caltech 101, MSRC
Detection: Is this an X?
Ask this question repeatedly, varying position, scale, category…
Paradigm introduced by Rowley, Baluja & Kanade 96 for face detection.
Viola & Jones 01, Dalal & Triggs 05, Felzenszwalb, McAllester, Ramanan 08
Problems with the multi-scale scanning paradigm
• Computational complexity
– 10^6 windows, 10 scales, 10^4 categories
• Not natural for irregularly shaped objects
• Segmentation is delinked
• Context is delinked
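A rough sense of the computational cost on this slide can be had by multiplying the three counts. The sketch below assumes they act as independent factors (every window is tested at every scale for every category), which is one reading of the slide rather than something it states. The names `scanning_evaluations`, `WINDOWS`, `SCALES` and `CATEGORIES` are made up for illustration.

```python
# Counts taken from the slide; treating them as independent factors
# is an assumption of this sketch.
WINDOWS = 10**6
SCALES = 10
CATEGORIES = 10**4


def scanning_evaluations(windows, scales, categories):
    """Number of 'is this an X?' questions asked for a single image."""
    return windows * scales * categories


total = scanning_evaluations(WINDOWS, SCALES, CATEGORIES)
print(f"{total:.0e} classifier evaluations per image")
```

Under that reading, a single image requires on the order of 10^11 window classifications.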
Our Approach
• Perceptual organization provides the right primitives for visual recognition.
• After more than a decade of work, we finally have high-quality, generic detectors for contours and regions. We now only need to work with ~100 elements, each with its local scale estimate.
• In this talk, we demonstrate recognition using regions. Detection and segmentation happen in the same framework.
• There will always be some errors in the bottom-up grouping process, so the recognition machinery needs to be robust to them.
Contour Detection (CVPR 2008)
Region Detection (CVPR 2009)
Region detector wins on any measure!
[Figure panels: Region Benchmarks on BSDS; Probabilistic Rand Index on BSDS; Region Benchmarks on MSRC/PASCAL08; Variation of Information on BSDS]
Parallelizing Image Segmentation
Catanzaro et al, UC Berkeley, ICCV 09
• GTX 280 is an Nvidia graphics processor, a massively parallel general-purpose computing platform
– 30 cores, 8-wide SIMD = 240-way parallelism
– 140 GB/s memory bandwidth (modern CPUs have ~10-20 GB/s)
– Special memory subsystems for graphics processing
• Sequential Implementation: 5 minutes per image
• Parallel, Optimized Implementation: 2 seconds
Why Use Regions?
• Local estimate of scale; no search necessary
• Shape, color and texture in the same framework
• Hierarchy of regions (“partonomy”) represents scenes, objects, parts. Makes use of context natural.
• Do not suffer from background clutter
• Reduce candidate windows on detection task
– 1000 to 10000 times fewer windows on the ETHZ dataset
• Need to be robust to segmentation errors
Object Representation using Regions
[Figure panels: Region Segmentation; Bag of Regions]
Region Representation
Discriminative Weight Learning
• Not all regions are equally important
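[Diagram: exemplar I compared with image J (distance $D_{IJ}$) and image K (distance $D_{IK}$)]
$D_{IJ} = \sum_i w_i \cdot d_{iJ}$ and $d_{iJ} = \min_j \chi^2(f_i^I, f_j^J)$
want: $D_{IK} > D_{IJ}$
Max-margin formulation results in a sparse solution of weights.
Frome, Singer and Malik. NIPS '06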
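The weighted exemplar-to-image distance on this slide can be sketched in a few lines. This is a minimal illustration, not the authors' implementation. The weights are set by hand rather than learned with the max-margin formulation. The 1/2 factor in the chi-squared distance is one common convention and is an assumption here. The function names and the toy two-bin histograms are hypothetical.

```python
def chi2(p, q):
    """Chi-squared distance between two non-negative histograms.
    The 0.5 factor is a common convention, assumed for this sketch."""
    return 0.5 * sum((a - b) ** 2 / (a + b) for a, b in zip(p, q) if a + b > 0)


def elementary_distances(exemplar_feats, image_feats):
    """d_iJ: distance from exemplar region i to its closest region in image J."""
    return [min(chi2(f_i, f_j) for f_j in image_feats) for f_i in exemplar_feats]


def image_distance(weights, exemplar_feats, image_feats):
    """D_IJ = sum_i w_i * d_iJ."""
    d = elementary_distances(exemplar_feats, image_feats)
    return sum(w * d_i for w, d_i in zip(weights, d))


# Toy data: exemplar I has two regions; only the first carries weight (sparse).
feats_I = [[1.0, 0.0], [0.5, 0.5]]
feats_J = [[1.0, 0.0], [0.0, 1.0]]  # contains a region identical to I's first
feats_K = [[0.0, 1.0]]              # contains no such region
w = [1.0, 0.0]

D_IJ = image_distance(w, feats_I, feats_J)
D_IK = image_distance(w, feats_I, feats_K)
print(D_IJ, D_IK, D_IK > D_IJ)
```

With these toy values the desired ordering D_IK > D_IJ holds because the only weighted region of I has an exact match in J and none in K.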
Weight Learning Results
Classification
• For exemplar 1,2,…,N of class C, define likelihood function of query image J:
$$L_J(C) = \frac{1}{N} \sum_{I=1}^{N} f_I(D_{IJ})$$
where f converts a distance to a similarity measure (e.g. logistic regression, negation)
• The predicted category label for image J:
$$\tilde{C}_J = \arg\max_C \, L_J(C)$$
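The classification rule above averages per-exemplar similarities and picks the best class. The sketch below assumes a single logistic function shared by all exemplars, whereas the slide allows a separate f for each exemplar I. The parameters `a` and `b`, the class names and the distance values are all hypothetical.

```python
import math


def similarity(distance, a=1.0, b=0.0):
    """Logistic map from distance to similarity: smaller distance, larger value.
    a and b are illustrative; in practice they would be fitted."""
    return 1.0 / (1.0 + math.exp(a * distance + b))


def likelihood(distances):
    """L_J(C): mean similarity of query J to the N exemplars of class C."""
    return sum(similarity(d) for d in distances) / len(distances)


def predict(distances_by_class):
    """Return the class C that maximizes L_J(C)."""
    return max(distances_by_class, key=lambda c: likelihood(distances_by_class[c]))


# Hypothetical distances D_IJ from query J to the exemplars of two classes.
distances_by_class = {"mug": [0.2, 0.4], "bottle": [1.5, 2.0]}
label = predict(distances_by_class)
print(label)
```

Here the query is closer to the mug exemplars, so their class wins the arg max.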
Voting for Detection Windows
[Diagram: exemplar with window $W^E$; query]
Back projection: $W_{ij}^Q = f(R_i^E, Q_j, W^E)$
Clustering: $\{W_{ij}^Q\}$
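The slide does not spell out the form of f, so the following is only one plausible sketch of back projection and vote clustering. It assumes boxes are (x, y, width, height) tuples, that f applies the scale and offset carrying the exemplar region's box onto the matched query region's box, and that votes are grouped greedily by overlap. All function names, the overlap threshold and the coordinates are hypothetical, and the learned vote weights are left out.

```python
def back_project(region_e, region_q, window_e):
    """Map the exemplar window into the query using the scale and offset
    that carry the exemplar region box onto the matched query region box."""
    ex, ey, ew, eh = region_e
    qx, qy, qw, qh = region_q
    wx, wy, ww, wh = window_e
    sx, sy = qw / ew, qh / eh
    return (qx + (wx - ex) * sx, qy + (wy - ey) * sy, ww * sx, wh * sy)


def iou(a, b):
    """Intersection over union of two (x, y, w, h) boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    iw = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    ih = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = iw * ih
    return inter / (aw * ah + bw * bh - inter)


def cluster(windows, thresh=0.5):
    """Greedy grouping: a vote joins the first cluster whose seed it overlaps."""
    clusters = []
    for w in windows:
        for c in clusters:
            if iou(c[0], w) >= thresh:
                c.append(w)
                break
        else:
            clusters.append([w])
    return clusters


window_e = (10, 10, 100, 200)  # exemplar bounding box W^E
matches = [                    # (exemplar region box, matched query region box)
    ((20, 30, 40, 50), (120, 130, 80, 100)),
    ((60, 100, 30, 40), (200, 270, 60, 80)),
    ((20, 30, 40, 50), (400, 400, 20, 25)),  # a spurious match
]
votes = [back_project(r_e, r_q, window_e) for r_e, r_q in matches]
clusters = cluster(votes)
best = max(clusters, key=len)
print(best[0], len(best))
```

The two consistent matches vote for the same query window, while the spurious match lands in its own cluster and is outvoted.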
Generating Object Support Hypothesis
[Figure panels: Query Image; Exemplar Image; Matched Region; Projected exemplar mask; Exemplar Region; Hypothesis]
Refine Object Support Hypothesis
[Figure panels: Hypothesis; Query; Constraints; Result; UCM]
Constrained Segmentation Algorithm: propagates the labels on the leaves to the rest of the region tree by constructing a Voronoi tessellation with respect to the ultrametric distance.
[Arbeláez, Cohen. CVPR 08]
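The label propagation idea can be illustrated on a toy region tree. This sketch assumes the tree is given as a child-to-parent map plus a merge height per internal node, that the ultrametric distance between two leaves is the height of their lowest common ancestor, and that ties go to whichever seed is encountered first. It labels leaves only and is not the algorithm of the cited paper. All names and values are hypothetical.

```python
def ancestors(node, parent):
    """Path from a node up to the root, including the node itself."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def ultrametric_distance(a, b, parent, height):
    """Height of the lowest common ancestor of leaves a and b."""
    if a == b:
        return 0.0
    above_a = set(ancestors(a, parent))
    for node in ancestors(b, parent):
        if node in above_a:
            return height[node]
    raise ValueError("leaves are not in the same tree")


def propagate_labels(leaves, seeds, parent, height):
    """Give each leaf the label of its nearest seed (an ultrametric Voronoi cell)."""
    labels = {}
    for leaf in leaves:
        nearest = min(seeds, key=lambda s: ultrametric_distance(leaf, s, parent, height))
        labels[leaf] = seeds[nearest]
    return labels


# Toy tree: (a, b) merge at 0.2, (c, d) merge at 0.3, everything merges at 0.9.
parent = {"a": "n1", "b": "n1", "c": "n2", "d": "n2", "n1": "root", "n2": "root"}
height = {"n1": 0.2, "n2": 0.3, "root": 0.9}
seeds = {"a": "object", "d": "background"}  # constraints placed on two leaves

labels = propagate_labels(["a", "b", "c", "d"], seeds, parent, height)
print(labels)
```

Leaf b is assigned to the object because it merges with seed a at a low height, and leaf c goes to the background for the same reason with seed d.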
ETHZ Shape (Ferrari et al. 06)
• Contains 255 images of 5 diverse shape-based classes.
Detection Results on ETHZ
Method              Det. rate at 0.3 FPPI
Hough baseline [1]  31.0%
kAS [1]             62.4%
Shape [2]           67.2%
Ours                87.1±2.8%
[1] Ferrari et al. PAMI 2008.
[2] Ferrari, Jurie, Schmid. CVPR 2007.
Segmentation Results on ETHZ
[Figure: Orig. Image | Segmentation]
The mean average precision is 75.7±3.2%
Complexity Reduction
Caltech 101 results
Context from region tree (ICCV 09)
MSRC dataset
Confusion matrix (mean diagonal 67%)
A framework for image understanding
• Our approach
– Bottom up region segmentation
– Hough transform style voting (learned weights)
– Top down segmentation
– Capture context by region tree
• Results on ETHZ, Caltech 101, MSRC are competitive
• A lot more needs to be done, but I think this is the right way to approach the grand problem of image understanding