Submodular meets Structured: Finding Diverse Subsets in Exponentially-Large Structured Item Sets (and what to do with them) Dhruv Batra Virginia Tech.


Submodular meets Structured:
Finding Diverse Subsets in
Exponentially-Large Structured Item Sets
(and what to do with them)
Dhruv Batra
Virginia Tech
Joint work with: Adarsh Prasad, Stefanie Jegelka
Students:
Aishwarya Agrawal (VT), Harsh Agrawal (VT), Neelima Chavali (VT), Michael Cogswell (VT),
Gordon Christie (VT), Abner Guzman-Rivera (UIUC), Ankit Laddha (CMU), Adarsh Prasad (UT-Austin),
Xiao Lin (VT), Clint Solomon (VT), Qing Sun (VT), Payman Yadollahpour (TTIC)
Colleagues:
Stefanie Jegelka (UC Berkeley), Pushmeet Kohli (MSR), Greg Shakhnarovich (TTIC),
Danny Tarlow (MSR).
Motivation: Uncertainty in AI
“uncertainty arises because of
limitations in our ability to observe the world,
limitations in our ability to model it,
and possibly even because of innate nondeterminism.”
-- Koller & Friedman
(C) Dhruv Batra
Local Ambiguity
slide credit: Fei-Fei Li, Rob Fergus & Antonio Torralba
“I saw her duck”
Image Credit: Liang Huang
“I saw her duck with a telescope…”
Image Credit: Liang Huang
Global Ambiguity
• “While hunting in Africa, I shot an elephant in my pajamas.
How an elephant got into my pajamas, I’ll never know!”
– Groucho Marx (1930)
Problems
• Model-Class is Wrong!
– Approximation Error
• Not Enough Training Data!
– Estimation Error
• MAP is NP-Hard
– Optimization Error
• Inherent Ambiguity
– Bayes Error
Single Prediction = Uncertainty Mismanagement
Need: Better Representation of Uncertainty
Example Result
Now what?
Outline of the talk
Part 1
Part 2
Now what?
Exponentially-Large Item Set
Quality/Score of Items
Subset Selection
Subset Selection
Our work: Diverse M-Best Solutions [ECCV ’12; NIPS ’14]
– Greedy Submodular Maximization
– Provably Near-Optimal Subset Selection
Score + Diversity
Submodularity in a slide
$A = \{y_1, y_2\} \subseteq B = \{y_1, y_2, y_3, y_4\}$, $y \notin B$
$F(A \cup \{y\}) - F(A) \ge F(B \cup \{y\}) - F(B)$
Slide Credit: submodularity.org
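The diminishing-returns inequality on this slide can be checked numerically on a small coverage function. This is only an illustrative sketch: the item names and the group memberships in `GROUPS` are made up, and `coverage` is one standard example of a submodular function, not necessarily the F used in the talk.

```python
# Hypothetical items and the groups (e.g. labels) each one touches.
GROUPS = {
    "y1": {"person"},
    "y2": {"dog", "person"},
    "y3": {"dog"},
    "y4": {"horse", "person"},
    "y": {"horse", "dog"},
}

def coverage(items):
    """F(S): number of distinct groups touched by the items in S."""
    covered = set()
    for item in items:
        covered |= GROUPS[item]
    return len(covered)

def marginal_gain(item, items):
    """F(S U {item}) - F(S)."""
    return coverage(set(items) | {item}) - coverage(items)

A = {"y1", "y2"}
B = {"y1", "y2", "y3", "y4"}
gain_A = marginal_gain("y", A)  # "horse" is new for A
gain_B = marginal_gain("y", B)  # B already covers everything y offers
print(gain_A, gain_B)  # prints: 1 0
```

Adding y to the smaller set A helps at least as much as adding it to the superset B, which is exactly the inequality stated above.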
(Modular) Score + (Submodular) Diversity
Greedy Subset Selection
Exponentially Large Space!
Can’t enumerate
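A minimal sketch of greedy subset selection, assuming a modular score plus a coverage-style diversity term weighted by a hypothetical parameter `lam`. The items, scores and groups are invented for illustration. The inner loop enumerates every item, which is exactly what becomes impossible when the item set is exponentially large; in that setting the loop has to be replaced by an inference call.

```python
# Hypothetical item scores (modular part) and group memberships (diversity part).
SCORE = {"a": 3.0, "b": 2.9, "c": 1.0, "d": 0.5}
GROUPS = {"a": {"person"}, "b": {"person"}, "c": {"dog"}, "d": {"horse", "dog"}}

def coverage(selected):
    """Number of distinct groups covered by the selected items."""
    covered = set()
    for item in selected:
        covered |= GROUPS[item]
    return len(covered)

def greedy_select(m, lam):
    """Pick m items, each time adding the item with the largest
    score + lam * (marginal coverage gain)."""
    selected = []
    for _ in range(m):
        base = coverage(selected)
        best, best_gain = None, float("-inf")
        for item in SCORE:  # explicit enumeration: only feasible for tiny sets
            if item in selected:
                continue
            gain = SCORE[item] + lam * (coverage(selected + [item]) - base)
            if gain > best_gain:
                best, best_gain = item, gain
        selected.append(best)
    return selected

print(greedy_select(2, 0.0))  # score only: ['a', 'b']
print(greedy_select(2, 2.0))  # score + diversity: ['a', 'd']
```

With `lam = 0` the second pick is the near-duplicate `b`; with diversity switched on, the low-scoring but novel `d` is preferred.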
Score = Factor Graph
[Figure: variables y1, y2, …, yn with Node Scores / Local Rewards (k×1 tables) and Edge Scores / Distributed Prior (k×k tables)]
MAP Inference: Most Likely Assignment
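To make "MAP inference on a factor graph" concrete, here is a sketch for the simplest case, a chain, solved by dynamic programming (Viterbi). The chain structure, the shared edge table and the numbers are assumptions chosen for illustration; the slide's figure does not fix a particular graph.

```python
def viterbi(node, edge):
    """MAP for a chain model.

    node: n lists of k node scores (the k x 1 tables).
    edge: one k x k table of edge scores shared by all neighbouring pairs.
    Returns the labeling maximizing
    sum_i node[i][y_i] + sum_i edge[y_i][y_{i+1}], and its score.
    """
    n, k = len(node), len(node[0])
    best = list(node[0])  # best[s]: best score of a prefix ending in state s
    back = []             # back-pointers, one list per transition
    for i in range(1, n):
        new_best, ptr = [], []
        for cur in range(k):
            prev = max(range(k), key=lambda p: best[p] + edge[p][cur])
            ptr.append(prev)
            new_best.append(best[prev] + edge[prev][cur] + node[i][cur])
        best = new_best
        back.append(ptr)
    last = max(range(k), key=lambda s: best[s])
    labels = [last]
    for ptr in reversed(back):  # follow back-pointers from the last node
        labels.append(ptr[labels[-1]])
    labels.reverse()
    return labels, max(best)

node = [[10, 0], [4, 5], [0, 3]]  # three variables, two labels each
edge = [[2, 0], [0, 2]]           # reward neighbouring variables that agree
labels, score = viterbi(node, edge)
print(labels, score)  # [0, 1, 1] 20
```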
Greedy Subset Selection
Diversity via Groups
[Figure labels: person | dog person | dog | horse person | horse | dog | horse]
Diversity = Coverage of Groups
[Figure labels: person | dog person | dog | horse person | horse | dog | horse]
Diversity = (Soft) Coverage of Groups
[Figure labels: person | dog person | dog | horse person | horse | dog | horse]
Marginal Gain = Label Score
[Figure: the same label groups (person, dog, horse), annotated with per-label counts and marginal gains of 0 or +1]
Greedy Subset Selection
Just a modified MAP call!
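One way to read "Marginal Gain = Label Score" is that the diversity gain of a candidate decomposes into a per-label reward that depends only on how often earlier solutions already used that label. The sketch below assumes this reading. The function names and the counts are hypothetical, and the soft variant uses a square root as one possible concave function.

```python
import math

def label_gains(labels, counts, concave):
    """Reward for using each label once more, given how often
    previously selected solutions already used it."""
    gains = {}
    for label in labels:
        used = counts.get(label, 0)
        gains[label] = concave(used + 1) - concave(used)
    return gains

def hard(count):
    """Hard coverage: a group counts once, however often it is hit."""
    return min(count, 1)

def marginal_gain(candidate_labels, gains):
    """Diversity gain of a candidate = sum of the rewards of its labels."""
    return sum(gains[label] for label in candidate_labels)

LABELS = ["person", "dog", "horse"]
counts = {"person": 1, "dog": 1}  # labels used by earlier picks (made up)

hard_gains = label_gains(LABELS, counts, hard)       # person 0, dog 0, horse 1
soft_gains = label_gains(LABELS, counts, math.sqrt)  # repeats still earn a little
print(hard_gains)
print(marginal_gain({"horse", "person"}, hard_gains))  # 1
```

Because the gain is a sum of label rewards, it can be added to the model's own scores, so the greedy step reduces to a MAP call on a modified model rather than an enumeration of all items.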
Diversity = High-Order Potential (HOP)
Diversity | HOP
Label Occurrence | Label Costs [Delong et al. CVPR10]
Label Transitions | Cooperative Cuts [Jegelka et al. CVPR11; Kohli et al. CVPR13]
Hamming Ball | Cardinality Potential [Tarlow et al. AISTATS10]
DivMBest | Node Potential Perturb [Batra et al. ECCV12]
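The last row can be sketched in a few lines: with Hamming diversity, each new solution comes from a MAP call on node scores that have been lowered wherever earlier solutions placed their labels. This is a sketch under assumptions, not the authors' implementation: `map_oracle` is a brute-force stand-in for a real solver (dynamic programming, graph cuts), and the chain model, the numbers and the weight `lam` are invented.

```python
from itertools import product

def map_oracle(node, edge):
    """Stand-in MAP solver: brute force over a tiny chain.
    A real model would plug in dynamic programming or graph cuts here."""
    n, k = len(node), len(node[0])

    def score(y):
        unary = sum(node[i][y[i]] for i in range(n))
        pairwise = sum(edge[y[i]][y[i + 1]] for i in range(n - 1))
        return unary + pairwise

    return max(product(range(k), repeat=n), key=score)

def div_m_best(node, edge, m, lam):
    """Greedy diverse solutions via node-potential perturbation."""
    perturbed = [row[:] for row in node]  # leave the caller's scores untouched
    solutions = []
    for _ in range(m):
        y = map_oracle(perturbed, edge)
        solutions.append(y)
        # Rewarding Hamming distance to y equals (up to a constant)
        # penalizing agreement with y at every node.
        for i, label in enumerate(y):
            perturbed[i][label] -= lam
    return solutions

node = [[10, 0], [4, 5], [0, 3]]
edge = [[2, 0], [0, 2]]
print(div_m_best(node, edge, 2, 3))  # [(0, 1, 1), (0, 0, 0)]
```

In this toy model the plain second-best labeling is (0, 0, 1), one flip away from the MAP, while the diverse second solution differs in two positions.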
[Prasad, Jegelka, Batra, NIPS14]
Your Options
• Nothing: User-in-the-loop [ECCV12]
– Additional Information: None
• Tracking [ECCV12]
– Additional Information: Time
• (Approximate) Min Bayes Risk [CVPR14]
– Additional Information: Loss function
• Re-ranking [CVPR13]
– Additional Information: higher-order constraints
• Holistic Scene Understanding [Under Review]
Increasing Side Information
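For the (Approximate) Min Bayes Risk option, the extra ingredient is a loss function. A rough sketch of the idea, assuming the M solutions are weighted by a softmax of their scores (the `temperature` parameter and this choice of distribution are assumptions, as are the toy solutions and scores):

```python
import math

def hamming(a, b):
    """Number of positions where two labelings disagree."""
    return sum(x != y for x, y in zip(a, b))

def approx_mbr(solutions, scores, loss=hamming, temperature=1.0):
    """Return the solution with the lowest expected loss, where the
    expectation is taken over the M solutions only."""
    top = max(scores)
    weights = [math.exp((s - top) / temperature) for s in scores]
    total = sum(weights)
    probs = [w / total for w in weights]
    risks = [
        sum(p * loss(candidate, other) for p, other in zip(probs, solutions))
        for candidate in solutions
    ]
    best = min(range(len(solutions)), key=lambda i: risks[i])
    return solutions[best], risks

solutions = [(1, 1, 1, 1), (0, 0, 0, 0), (0, 0, 0, 1)]
scores = [2.0, 1.9, 1.9]  # the first solution is the MAP, but an outlier
choice, risks = approx_mbr(solutions, scores)
print(choice)  # (0, 0, 0, 1)
```

Here the highest-scoring solution is not chosen, because the other two plausible solutions agree with each other and disagree with it.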
Interactive Segmentation
• Setup
– Model: Color/Texture + Potts Grid CRF
– Inference: Graph-cuts
– Dataset: 50 train/val/test images from PASCAL
Image + Scribbles
MAP
2nd Best MAP: 1-2 Nodes Flipped
Diverse 2nd Best: 100-500 Nodes Flipped
Pose Estimation
• Setup
– Model: Mixture of Parts Tree [Park & Ramanan, ICCV ‘11]
– Inference: Dynamic Programming
– Dataset: PARSE
Image Credit: [Yang & Ramanan, ICCV ‘11]
Pose Estimation: 10 guesses/frame
[Premachandran, Tarlow, Batra, CVPR14]
Pose Tracking
• Chain CRF with M states at each frame
DivMBest Solutions
Image Credit: [Yang & Ramanan, ICCV ‘11]
Pose Tracking
[Panels: MAP | DivMBest + Viterbi]
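The tracking setup (a chain CRF with M states at each frame) can be sketched as a Viterbi pass that picks one hypothesis per frame. The sketch assumes each hypothesis is summarized by a single 1-D position and that the pairwise term penalizes the distance between consecutive picks. The data, the `smooth` weight and that pairwise term are illustrative assumptions, since a real pose is a vector of part locations.

```python
def track(hyps, scores, smooth=1.0):
    """hyps[t][m]: position of hypothesis m in frame t.
    scores[t][m]: its single-frame score.
    Returns the index of the chosen hypothesis in every frame."""
    best = list(scores[0])
    back = []
    for t in range(1, len(hyps)):
        new_best, ptr = [], []
        for m, pos in enumerate(hyps[t]):
            prev = max(
                range(len(hyps[t - 1])),
                key=lambda p: best[p] - smooth * abs(hyps[t - 1][p] - pos),
            )
            jump = abs(hyps[t - 1][prev] - pos)
            ptr.append(prev)
            new_best.append(best[prev] - smooth * jump + scores[t][m])
        best = new_best
        back.append(ptr)
    state = max(range(len(best)), key=lambda m: best[m])
    path = [state]
    for ptr in reversed(back):  # trace the best path backwards
        path.append(ptr[path[-1]])
    return path[::-1]

hyps = [[0.0, 5.0], [0.5, 9.0], [1.0, 5.0]]    # 3 frames, M = 2 hypotheses
scores = [[1.0, 0.9], [0.8, 1.0], [1.0, 0.9]]
per_frame = [max(range(len(s)), key=s.__getitem__) for s in scores]
print(per_frame)            # [0, 1, 0]: best per frame, jumps to position 9.0
print(track(hyps, scores))  # [0, 0, 0]: temporally smooth choice
```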
Example Result
PASCAL VOC Semantic Segmentation
[Plot: PASCAL Accuracy (y-axis, 44% to 59%; higher is better) vs. #Solutions / Image (x-axis, 1 to 10)]
DivMBest (Oracle): 15%-gain possible with the Same Features and the Same Model
Methods shown: MAP [Carreira ECCV12] (previous state of the art), MBR [CVPR14], Re-ranking [CVPR13], +Deep Features [CVPR14 workshop], [Hariharan ECCV14], Deep SegNet [Under Review]
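The oracle curve is usually read as "if the best of the first M solutions could always be picked, how accurate would the result be?". A small sketch of that computation follows, with made-up per-image accuracies. The actual PASCAL measure is computed over the whole dataset rather than averaged per image, so this only illustrates the shape of the argument.

```python
def oracle_accuracy(per_image_acc, m):
    """Mean over images of the best accuracy among the first m solutions."""
    return sum(max(accs[:m]) for accs in per_image_acc) / len(per_image_acc)

# Hypothetical accuracies of 3 solutions on 2 images (solution 1 is the MAP).
per_image_acc = [
    [0.5, 0.25, 0.75],
    [0.5, 0.75, 0.25],
]
curve = [oracle_accuracy(per_image_acc, m) for m in (1, 2, 3)]
print(curve)  # [0.5, 0.625, 0.75]
```

The curve can only go up with M, which is why it is an upper bound on what re-ranking or MBR can recover from the same set of solutions.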
[Figure: four modules (Pose Estimation, Coarse 3D Layout, Object Segmentation, Scene Recognition), each producing Hypothesis #1 … #M; the Scene Recognition hypotheses shown are Street, Inside City and Construction; hypotheses that agree across modules are marked Consistent]
“A dog is standing next to a woman on a couch”
Ambiguity: (woman on couch) vs (dog on couch)
Dataset: PASCAL Sentence Dataset
• Vision: Semantic Segmentation
– Labels: Chairs, desks, etc
• NLP: Sentence Parsing
– Output: Parse Tree
[Figure: Hypothesis #1 … #M for each module, with segmentation labels Person, Couch, Dog; agreeing hypotheses are marked Consistent]
NYU-v2 RGBD Dataset
• Semantic Segmentation
– Labels: Chairs, desks, etc
• 3D Support Estimation
– Supported from below, behind, etc
[Figure: Hypothesis #1 … #M, with region labels Wall, Other Structure, Table, Other Prop, Television, Chair]
NYU-v2 RGBD Dataset
• Semantic Segmentation
– Labels: Chairs, desks, etc
• 3D Support Estimation
– Supported from below, behind, etc
[Figure: Hypothesis #1 … #M, with region labels Wall, Other Structure, Table, Other Prop, Television, Chair; the agreeing segmentation and support hypotheses are marked Consistent]
Interpretation: Projected Message Passing
Vision: Semantic Segmentation
NLP: Sentence Parsing
Vision: Human Body Pose
Vision: 3D Layout
Summary
• Perception problems are ambiguous
• All models are wrong
• Need communication between many different
modules or perception sub-problems
• Key Problem: State-space explosion
– {all-3D-states} x {all-segmentation-states} x …
• Key Idea:
– Keep around multiple plausible structured hypotheses
– Natural connections to submodular maximization
Acknowledgements
• Students
– Virginia Tech
• Aishwarya Agrawal, Harsh Agrawal, Neelima Chavali, Michael
Cogswell, Gordon Christie, Xiao Lin, Clint Solomon, Qing Sun
– External
• Abner Guzman-Rivera (UIUC), Ankit Laddha (CMU), Adarsh Prasad
(UT-Austin), Payman Yadollahpour (TTIC)
• Collaborators
– Stefanie Jegelka (UC Berkeley), Pushmeet Kohli (MSR),
Greg Shakhnarovich (TTIC), Danny Tarlow (MSR)
• Sponsors
Thanks!