Visipedia: Visual Recognition with Humans in the Loop
Steve Branson¹, Catherine Wah¹, Boris Babenko¹, Florian Schroff¹, Peter Welinder², Pietro Perona², Serge Belongie¹
¹Computer Science and Engineering, University of California, San Diego
²Electrical Engineering, California Institute of Technology
{sbranson,cwah,bbabenko,gschroff,sjb}@cs.ucsd.edu
{welinder,perona}@caltech.edu
Modeling User Responses
Interactive Object Recognition
Abstract
We introduce Visipedia, a user-generated encyclopedia of visual knowledge that is
intended to enrich the content of Wikipedia. Vision is the predominant sense through
which people observe the world: people are visual learners, and images are
fundamental to the ways in which people encode knowledge and perceive the world.
Unfortunately, the organization of visual content on the web is
still very impoverished. This is in large part due to the raw size and complexity of images
and the lack of scalable computer vision algorithms capable of automatically
recognizing or organizing images on a semantic level. The shortcomings of computer
vision algorithms can in part be explained by a shortage in the quantity and quality of the
labeled images necessary for training machine learning algorithms. We propose a
collaborative effort between computers and humans toward the development of
Visipedia, where the initial user-generated population of Visipedia will help train machine
learning algorithms, which will in turn help automate the process of building Visipedia.
Toward this aim, we propose new paradigms for interactive algorithms that combine
computer vision with user input, richer representations of visual objects
than are traditionally studied in computer vision, and learning algorithms that are
better suited to Internet-scale recognition.
Caltech Birds-200 Dataset
• Image harvesting: text search of species name on Flickr
• Data cleaning: identifying bird presence/absence with Amazon Mechanical Turk (“a marketplace for work that requires human intelligence” [http://www.mturk.com])
MTurker Label Certainty
User Responses are Stochastic
Rose-breasted Grosbeak
What is Visipedia?
• Visual counterpart to Wikipedia
• User-generated encyclopedia of visual knowledge
• An effort to associate Wikipedia articles with large
quantities of well-organized, intuitive visual concepts
• A paradigm for combining computer vision and machine
learning with human annotation
Attribute-based Classification
Q: Is the belly red? yes (Def.)
Q: Is the breast black? yes (Def.)
Q: Is the primary color red? yes (Def.)
• Visual attributes from http://www.whatbird.com
- Attribute classification tasks might be easier
- Easier to incorporate human knowledge
Adding Computer Vision Helps
• Attribute labeling: MTurk interface
• Computer vision reduces manual labor
• Computer vision improves performance
• Different questions are asked with and without computer vision
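One way to read the bullets above is as a Bayesian update in which the computer vision output serves as a prior over classes and each user answer reweights it. The sketch below is only an illustration under that assumption: the function name `update_posterior`, the class probabilities, and the answer likelihoods are all hypothetical and are not taken from the poster.

```python
def update_posterior(prior, answer_likelihood):
    """Bayes update: posterior(c) is proportional to prior(c) * p(answer | c)."""
    unnormalized = {c: prior[c] * answer_likelihood[c] for c in prior}
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("the answer has zero probability under every class")
    return {c: v / total for c, v in unnormalized.items()}


# All numbers below are made up for illustration.
classes = ["Western Grebe", "Rose-breasted Grosbeak", "Yellow-headed Blackbird"]

# Without vision: start from a uniform prior over classes.
uniform_prior = {c: 1.0 / len(classes) for c in classes}

# With vision: start from (hypothetical) classifier scores p(class | image).
vision_prior = {
    "Western Grebe": 0.6,
    "Rose-breasted Grosbeak": 0.3,
    "Yellow-headed Blackbird": 0.1,
}

# Hypothetical p(user answers "yes" to "Is the throat white?" | class).
# The values are not 0 or 1 because user responses are stochastic.
p_yes_throat_white = {
    "Western Grebe": 0.9,
    "Rose-breasted Grosbeak": 0.2,
    "Yellow-headed Blackbird": 0.1,
}

posterior_without_vision = update_posterior(uniform_prior, p_yes_throat_white)
posterior_with_vision = update_posterior(vision_prior, p_yes_throat_white)
```

With these illustrative numbers, the same single answer leaves the correct class more probable when the vision prior is used, which is one sense in which vision can reduce the number of questions a user must answer.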
Motivation
Western Grebe
Rose-breasted Grosbeak
Yellow-headed Blackbird
Only CV
CV + Q #1: Is the crown black? yes (Def.)
w/ vision: Q #1: Is the throat white? yes (Def.)
w/o vision: Q #1: Is the shape perching-like? no (Def.)
Rose-breasted Grosbeak
Recognition is not Always Successful
Visual 20-Questions Game
Need for more training data
Need for more realistic data
Parakeet Auklet
Least Auklet
Sayornis
Gray Kingbird
Indigo Bunting
Blue Grosbeak
• Choose question to maximize expected information gain
Q: Is the belly multicolored? yes (Def.)
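The rule "choose question to maximize expected information gain" can be sketched as follows, assuming binary yes/no questions and a known table of p(yes | class) for each question. The function names, the class posterior, and the probability tables are hypothetical placeholders, not values from the poster.

```python
import math


def entropy(dist):
    """Shannon entropy in bits of a class -> probability mapping."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def expected_information_gain(posterior, p_yes):
    """Expected drop in class entropy from asking one yes/no question.

    posterior: class -> current probability of that class
    p_yes:     class -> p(user answers "yes" | class)
    """
    gain = entropy(posterior)
    for answer_is_yes in (True, False):
        lik = {c: (p_yes[c] if answer_is_yes else 1.0 - p_yes[c]) for c in posterior}
        p_answer = sum(posterior[c] * lik[c] for c in posterior)
        if p_answer == 0:
            continue  # this answer can never occur, so it contributes nothing
        updated = {c: posterior[c] * lik[c] / p_answer for c in posterior}
        gain -= p_answer * entropy(updated)
    return gain


def choose_question(posterior, questions):
    """Return the question text with the largest expected information gain."""
    return max(questions, key=lambda q: expected_information_gain(posterior, questions[q]))


# Made-up example: four equally likely classes and two candidate questions.
posterior = {
    "Parakeet Auklet": 0.25,
    "Least Auklet": 0.25,
    "Indigo Bunting": 0.25,
    "Blue Grosbeak": 0.25,
}
questions = {
    # Hypothetical p(yes | class); this question splits the classes into two groups.
    "Is the belly black?": {
        "Parakeet Auklet": 0.9, "Least Auklet": 0.9,
        "Indigo Bunting": 0.1, "Blue Grosbeak": 0.1,
    },
    # Hypothetical p(yes | class); this question does not separate any classes.
    "Is the bill hooked?": {
        "Parakeet Auklet": 0.5, "Least Auklet": 0.5,
        "Indigo Bunting": 0.5, "Blue Grosbeak": 0.5,
    },
}
best = choose_question(posterior, questions)
```

In this toy setup the first question is selected because its answer changes the class posterior, while the second leaves the posterior unchanged and so has zero expected gain. In an interactive loop, the posterior would be updated with the user's answer and the selection repeated.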
Dealing with Many Related Classes
MTurker Feedback
(A) Easy for Humans (B) Hard for Humans (C) Easy for Humans
Computer Vision
Chair? Airplane? …
Finch? Bunting?…
Yellow Belly? Blue Belly? …
Input Image
Question 1: Is the belly black? A: NO
Question 2: Is the bill hooked? A: YES
• “These hits were fun. Will you be posting more of them anytime soon? Thanks!”
• “These are Beautiful birds and I am enjoying this hit collection”
• “I really enjoy doing your hits, they are fun and interesting. Thanks.”
• “Love doing these because I'm a bird watcher.”
• “the birds are so cute..hope u can send more kind of birds”
• “I REALLY LOVE THE COLOR OF THE BIRDS.”
• “Thank you for providing this job. The fact that the images are beautiful to look at make it a lot more enjoyable to do!”
• Hourly Wage ≈ $1.25