Visual Recognition With Humans in the Loop


ECCV 2010, Crete, Greece
Steve Branson, Catherine Wah, Florian Schroff, Boris Babenko, Serge Belongie, Peter Welinder, Pietro Perona
What type of bird is this?
Field Guide: …?
Computer Vision: Bird? Chair? Bottle?
• Field guides are difficult for average users
• Computer vision doesn't work perfectly (yet)
• Research has focused mostly on basic-level categories
Parakeet Auklet
Visual Recognition With Humans in the Loop
What kind of bird is this? Parakeet Auklet
Levels of Categorization
Basic-Level Categories: Airplane? Chair? Bottle? …
[Griffin et al. '07, Lazebnik et al. '06, Grauman et al. '06, Everingham et al. '06, Felzenszwalb et al. '08, Viola et al. '01, …]
Levels of Categorization
Subordinate Categories: American Goldfinch? Indigo Bunting? …
[Belhumeur et al. '08, Nilsback et al. '08, …]
Levels of Categorization
Parts and Attributes: Yellow Belly? Blue Belly? …
[Farhadi et al. '09, Lampert et al. '09, Kumar et al. '09]
Visual 20 Questions Game
Blue Belly? no
Cone-shaped Beak? yes
Striped Wing? yes
American Goldfinch? yes
Hard classification problems can be turned into a sequence of easy ones.
Recognition With Humans in the Loop
Computer Vision → Cone-shaped Beak? yes → Computer Vision → American Goldfinch? yes
• Computers: reduce the number of required questions
• Humans: drive up the accuracy of vision algorithms
Research Agenda
2010: Heavy reliance on human assistance (Blue belly? no; Cone-shaped beak? yes; Striped wing? yes; American Goldfinch? yes)
2015: More automated as computer vision improves (Striped wing? yes; American Goldfinch? yes)
2025: Fully automatic (American Goldfinch? yes)
Field Guides
www.whatbird.com
Example Questions
Basic Algorithm
Input image x → Computer Vision → p(c | x)
Max expected information gain → Question 1: Is the belly black? A: no (u1) → p(c | x, u1)
Max expected information gain → Question 2: Is the bill hooked? A: yes (u2) → p(c | x, u1, u2)
…
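The loop sketched above can be written down compactly. Below is a minimal illustrative sketch (not the authors' code): it assumes binary yes/no questions, a per-class answer table p_yes[c, q] ≈ p(answer to question q is "yes" | class c), and a computer-vision estimate p_cv[c] ≈ p(c | x); all function and variable names here are ours.

```python
import numpy as np

def entropy(p):
    """Shannon entropy (bits) of a discrete distribution."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def expected_information_gain(posterior, p_yes, asked):
    """For each unasked question: H(current posterior) - E_answer[H(updated posterior)]."""
    gains = np.full(p_yes.shape[1], -np.inf)
    h_now = entropy(posterior)
    for q in range(p_yes.shape[1]):
        if q in asked:
            continue
        p_answer_yes = np.sum(posterior * p_yes[:, q])      # p(yes | history)
        post_yes = posterior * p_yes[:, q]
        post_no = posterior * (1.0 - p_yes[:, q])
        post_yes /= max(post_yes.sum(), 1e-12)
        post_no /= max(post_no.sum(), 1e-12)
        h_next = p_answer_yes * entropy(post_yes) + (1 - p_answer_yes) * entropy(post_no)
        gains[q] = h_now - h_next
    return gains

def run_session(p_cv, p_yes, ask_user, n_questions=5):
    """Interleave question selection, user answers, and posterior updates."""
    posterior = p_cv / p_cv.sum()
    asked = set()
    for _ in range(n_questions):
        q = int(np.argmax(expected_information_gain(posterior, p_yes, asked)))
        answer = ask_user(q)                                 # True means "yes"
        likelihood = p_yes[:, q] if answer else (1.0 - p_yes[:, q])
        posterior = posterior * likelihood
        posterior /= posterior.sum()
        asked.add(q)
    return posterior
```

A caller would supply p_cv from the vision system (or a flat class prior, as in the no-vision variant on the next slide) and an ask_user callback that poses the selected question to a person.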
Without Computer Vision
Input image x → Class prior → p(c)
Max expected information gain → Question 1: Is the belly black? A: no (u1) → p(c | u1)
Max expected information gain → Question 2: Is the bill hooked? A: yes (u2) → p(c | u1, u2)
…
Basic Algorithm
Select the next question that maximizes expected information gain.
• Easy to compute if we can estimate probabilities of the form p(c | x, u1, u2, …, ut), where c is the object class, x is the image, and u1, …, ut is the sequence of user responses.
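Written out, one standard form of the selection rule described here (the symbols j, U^{t-1}, and H are our notation, chosen to match the posterior used throughout the talk):

```latex
% Pick the question j* with maximum expected information gain,
% where U^{t-1} = (u_1, ..., u_{t-1}) are the responses collected so far.
\[
j^{\ast} \;=\; \arg\max_{j} \Big[\, H\!\left(c \mid x, U^{t-1}\right)
      \;-\; \mathbb{E}_{u_j}\, H\!\left(c \mid x, U^{t-1}, u_j\right) \Big],
\qquad
H(c \mid \cdot) \;=\; -\sum_{c} p(c \mid \cdot)\, \log p(c \mid \cdot).
\]
```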
Basic Algorithm
p(c | x, u1, …, ut) = p(u1, …, ut | c) · p(c | x) / Z
where p(u1, …, ut | c) is the model of user responses, p(c | x) is the computer vision estimate, and Z is a normalization factor.
Modeling User Responses
• Assume independence: p(u1, u2, …, ut | c) = ∏_{i=1…t} p(ui | c)
• Estimate p(ui | c) using Mechanical Turk
Example interface: "What is the color of the belly?" (Pine Grosbeak). Workers choose a color (grey, red, black, white, brown, blue) and a confidence level (Guessing, Probably, Definitely).
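A minimal sketch of how the per-question response model could be estimated from such Mechanical Turk data, assuming each annotation is recorded as a (class, question, response) triple; the Laplace smoothing and all names are our assumptions, not details given on the slide.

```python
import numpy as np

def estimate_response_model(annotations, n_classes, n_questions, n_choices, alpha=1.0):
    """annotations: iterable of (class_id, question_id, response_id) from MTurk workers.
    Returns p[c, q, r] ~ p(u_q = r | c), with Laplace smoothing alpha."""
    counts = np.zeros((n_classes, n_questions, n_choices))
    for c, q, r in annotations:
        counts[c, q, r] += 1
    counts += alpha
    return counts / counts.sum(axis=2, keepdims=True)

def sequence_likelihood(p, c, responses):
    """With the independence assumption on the slide, a full response sequence
    factorizes: p(u_1, ..., u_t | c) = prod_i p(u_i | c).
    responses: list of (question_id, response_id) pairs."""
    return float(np.prod([p[c, q, r] for q, r in responses]))
```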
Incorporating Computer Vision
• Use any recognition algorithm that can estimate p(c | x).
• We experimented with two simple methods:
1-vs-all SVM: p(c | x) ∝ exp{ γ · m_c(x) }, where m_c(x) is the SVM decision value for class c
Attribute-based classification: p(c | x) ∝ ∏_i p(a_i = a_i^c | x)  [Lampert et al. '09, Farhadi et al. '09]
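Hedged sketches of the two mappings as reconstructed above: the exponentiated-score form follows the SVM formula on the slide, and the attribute product follows the Lampert-style classifier it cites; the function names, stability shift, and normalization details are ours.

```python
import numpy as np

def svm_scores_to_prob(m, gamma=1.0):
    """m[c] = 1-vs-all SVM decision value for class c.
    Returns p(c|x) proportional to exp(gamma * m_c(x)), normalized over classes."""
    z = gamma * np.asarray(m, dtype=float)
    z -= z.max()                      # shift for numerical stability
    p = np.exp(z)
    return p / p.sum()

def attribute_prob(p_attr_given_x, class_signatures):
    """Attribute-based estimate in the spirit of Lampert et al. '09:
    p(c|x) proportional to prod_i p(a_i = a_i^c | x).
    p_attr_given_x[i] = p(a_i = 1 | x); class_signatures[c, i] in {0, 1}."""
    p_attr_given_x = np.asarray(p_attr_given_x, dtype=float)
    s = np.asarray(class_signatures, dtype=float)
    per_attr = s * p_attr_given_x + (1 - s) * (1 - p_attr_given_x)
    p = np.prod(per_attr, axis=1)
    return p / p.sum()
```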
Incorporating Computer Vision
• Used the VLFeat and MKL code of [Vedaldi et al. '08, Vedaldi et al. '09] plus color features
• Kernels: geometric blur, self-similarity, SIFT and color SIFT, color histograms, color layout; bag-of-words and spatial pyramid representations combined with multiple kernels
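The talk used the MKL code of Vedaldi et al.; as a rough stand-in, the sketch below combines several precomputed kernel matrices with fixed weights and trains a multi-class SVM on the result to produce the p(c | x) fed into the question loop. This is a simplification for illustration, not the authors' pipeline.

```python
import numpy as np
from sklearn.svm import SVC

def combine_kernels(kernel_matrices, weights=None):
    """kernel_matrices: list of (n, m) Gram matrices, one per feature channel.
    Uniform weights by default; true MKL would learn these weights instead."""
    if weights is None:
        weights = np.ones(len(kernel_matrices)) / len(kernel_matrices)
    return sum(w * K for w, K in zip(weights, kernel_matrices))

def train_and_score(train_kernels, test_kernels, y_train):
    """Train a multi-class SVM on the combined training kernel; return class
    probabilities for the test images, usable as p(c | x) in the question loop."""
    K_train = combine_kernels(train_kernels)        # (n_train, n_train)
    K_test = combine_kernels(test_kernels)          # (n_test, n_train)
    clf = SVC(kernel="precomputed", probability=True).fit(K_train, y_train)
    return clf.predict_proba(K_test)
```

Learning the kernel weights, as true MKL does, rather than fixing them uniformly is the main difference between this sketch and what the authors actually ran.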
Birds 200 Dataset
• 200 classes, 6,000+ images, 288 binary attributes
• Why birds?
Example classes: Black-footed Albatross, Groove-billed Ani, Parakeet Auklet, Field Sparrow, Vesper Sparrow, Baird's Sparrow, Henslow's Sparrow, Arctic Tern, Common Tern, Forster's Tern
Results: Without Computer Vision
Comparing Different User Models
Results: Without Computer Vision
Perfect users: 100% accuracy in 8 ≈ log2(200) questions, if users' answers agree with the field guides…
Results: Without Computer Vision
Real users answer the questions: MTurkers don't always agree with the field guides…
Results: Without Computer Vision
Probabilistic User Model: tolerates imperfect user responses
Results: With Computer Vision
Results: With Computer Vision
Users drive performance: 19% → 68% (computer vision alone: 19%)
Results: With Computer Vision
Computer vision reduces manual labor: 11.1 → 6.5 questions
Examples
Different questions asked with and without computer vision (Western Grebe)
Without computer vision, Q #1: Is the shape perching-like? no (definitely)
With computer vision, Q #1: Is the throat white? yes (definitely)
Examples
User input helps correct computer vision
Magnolia Warbler: computer vision alone predicts Common Yellowthroat; after the user answers "Is the breast pattern solid? no (definitely)", the prediction is corrected to Magnolia Warbler.
Recognition is Not Always Successful
Even with unlimited questions, some images are misclassified: an Acadian Flycatcher is labeled Least Flycatcher, and a Parakeet Auklet is labeled Least Auklet (user response: Is the belly multicolored? yes (definitely)).
Summary
• Recognition of fine-grained categories
• Users drive up performance (19% → 68%)
• Computer vision reduces manual labor (11.1 → 6.5 questions)
• More reliable than field guides
Future Work
• Extend to domains other than birds
• Methodologies for generating questions
• Improve computer vision
Questions?
Project page and datasets available at:
http://vision.caltech.edu/visipedia/
http://vision.ucsd.edu/project/visipedia/