VizWiz: nearly real-time answers to visual questions


Labor Marketplace Applications
KAIST KSE
Uichin Lee
TurKit: Human Computation Algorithms on Mechanical Turk
Greg Little, Lydia B. Chilton, Rob Miller, and Max Goldman
(MIT CSAIL)
UIST 2010
Workflow in M-Turk
(Figure: the requester posts HIT Groups to Mechanical Turk; results are collected in a CSV file and exported for use.)
Workflow: Pros & Cons
• Easy to run simple, parallelized tasks.
• Not so easy to run tasks in which Turkers improve on or validate each other’s work.
• TurKit to the rescue!
The TurKit Toolkit
• Arrows in the figure indicate the flow of information
• Programmer writes 2 sets of source code:
  – HTML files for web servers (*.html)
  – JavaScript executed by TurKit (*.js)
• Output is retrieved via a JavaScript database
(Figure: the programmer’s JavaScript (*.js) drives TurKit, HTML files (*.html) are served by a web server, TurKit posts tasks to Mechanical Turk where Turkers complete them, and results are stored in the JavaScript database.)
Crash-and-rerun programming model
• Observation: local computation is cheap, but external human computation costs money
• Managing state in a long-running program is challenging
  – Examples: what if the computer restarts, or an error occurs?
• Solution: store state in the database (just in case)
• If an error happens, just crash the program and re-run it, replaying the history stored in the DB
  – Throw a “crash” exception; the script is automatically re-run
• New keyword “once” (see the sketch below):
  – Removes non-determinism
  – Avoids re-executing an expensive operation when the script is re-run
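A minimal TurKit-style JavaScript sketch of crash-and-rerun with once, assuming once memoizes its function’s return value in the database and mturk.prompt returns the collected answers; the prompt text and worker count are illustrative:

// The whole script may be crashed and re-executed many times, but the work
// wrapped in once() runs (and costs money) only on the first pass; later
// runs replay its recorded result from the JavaScript database.
var ideas = once(function () {
    // Expensive external step: ask 10 workers a question (illustrative prompt).
    return mturk.prompt("Suggest a name for a new coffee shop.", 10)
})

// Cheap, deterministic local computation can safely re-run every time.
var count = ideas.length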
Example: quicksort
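A rough reconstruction of a human-powered quicksort in TurKit-style JavaScript, assuming mturk.vote(message, options) returns the winning option; the comparison wording is illustrative and this is only a sketch, not the paper’s exact code:

// Human-powered quicksort: Turkers supply the comparisons via vote tasks.
function humanCompare(a, b) {
    // A majority vote decides which of the two items ranks higher.
    return mturk.vote("Which sentence is better written?", [a, b]) == a
}

function quicksort(items) {
    if (items.length <= 1) return items
    var pivot = items[0]
    var left = [], right = []
    for (var i = 1; i < items.length; i++) {
        if (humanCompare(items[i], pivot)) {
            left.push(items[i])
        } else {
            right.push(items[i])
        }
    }
    return quicksort(left).concat([pivot], quicksort(right))
}

var sorted = quicksort(["sentence A", "sentence B", "sentence C"])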
Parallelism
• The first time the script runs, HITs A and C are created
• Within a forked branch, if a task fails (e.g., HIT A), TurKit crashes only that branch and re-runs it
• Synchronization with join() (see the sketch below)
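A minimal sketch of the fork/join pattern above, assuming fork takes a function to run as a branch and join() waits for all forked branches; the prompts stand in for HITs A and C:

// Both branches are forked before either one blocks, so HITs A and C are
// posted in parallel on the first run. If one branch’s task fails, only that
// branch crashes and is retried; the other branch’s recorded results are kept.
var resultA, resultC

fork(function () {
    resultA = mturk.prompt("Task A: describe this photo.", 3)   // HIT A
})
fork(function () {
    resultC = mturk.prompt("Task C: suggest a caption.", 3)     // HIT C
})

join()   // block until both forked branches have finished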
MTurk Functions
• Prompt(message, # of people)
– mturk.prompt("What is your favorite color?", 100)
• Voting(message, options)
• Sort(message, items)
(Figure: example VOTE() and SORT() tasks.)
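A short example combining the three built-ins listed above, assuming prompt returns the collected answers, sort returns the items in ranked order, and vote returns the winning option (all prompts are illustrative):

// Gather candidate slogans, rank them with human comparisons, then have
// workers vote between the two top-ranked candidates.
var slogans = mturk.prompt("Propose a slogan for a bike-sharing service.", 5)
var ranked = mturk.sort("Which slogan is more memorable?", slogans)
var winner = mturk.vote("Pick the better slogan.", [ranked[0], ranked[1]])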
TurKit: Implementation
• TurKit: written in Java, using Rhino to interpret JavaScript code and E4X to handle XML results from MTurk
• IDE: hosted on Google App Engine (GAE)
(Screenshot: the TurKit online IDE)
Soylent: A Word Processor with a
Crowd Inside
Michael Bernstein, Greg Little, Rob Miller, David Karger,
David Crowell, Katrina Panovich
MIT CSAIL
Björn Hartmann (UC Berkeley), Mark Ackerman (University of Michigan)
UIST 2010
Part of the slides are from http://projects.csail.mit.edu/soylent/
Soylent: a word processing interface
• Uses crowd contributions to aid complex writing tasks.
Challenges for Programming Crowds
• The authors have interacted with ~9000 Turkers
on ~2000 different tasks
• Key Problem: crowd workers often produce poor
output on open-ended tasks
– High variance of effort: Lazy Turker vs. Eager Beaver
– Errors made by Turkers; e.g., shortening “Of Mice and Men” to “Of Mice,” or calling it a movie
• 30% Rule: ~30% of the results from open-ended
tasks will be unsatisfactory
• Solution: Find-Fix-Verify (see the sketch below)
  – Find: given to, e.g., 10 Turkers; keep patches that at least 20% of the Turkers agree upon
  – Fix: given to, e.g., 5 Turkers
  – Verify: given to, e.g., 5 Turkers
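A schematic sketch of the Find-Fix-Verify stages with the worker counts above, in TurKit-style JavaScript; findPatches, fixPatch, and verifyFixes are hypothetical helpers standing in for Soylent’s actual task templates:

var paragraph = "The paragraph to be shortened."

// Find: ask 10 workers to mark patches needing work; keep only patches that
// at least 20% of the workers (i.e., 2 of 10) independently selected.
var candidates = findPatches(paragraph, 10)                    // hypothetical helper
var patches = candidates.filter(function (p) {
    return p.votes >= 0.2 * 10
})

// Fix: 5 workers each propose a rewrite for every agreed-upon patch.
var fixes = patches.map(function (p) { return fixPatch(p, 5) })      // hypothetical helper

// Verify: 5 workers vote out rewrites that change the meaning or add errors.
var results = fixes.map(function (f) { return verifyFixes(f, 5) })   // hypothetical helper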
Find-Fix-Verify Discussion
• Why split Find and Fix?
  – Forces Lazy Turkers (who tend to choose the easiest work) to work on a problem of our choice
  – Edits to “highlighted patches” (instead of a whole paragraph) make the task manageable, and also allow workers to perform tasks in parallel (and make results easy to merge)
• Why add Verify?
  – Quality rises when we place Turkers in productive tension
  – Allows trading off lag time against quality
• Timeout at each stage keeps operations responsive (e.g., 10-20 minutes)
Evaluation
• Implementation
– Microsoft Word plug-in using MS Visual Studio
Tools for Office (VSTO) and Windows Presentation
Foundation (WPF)
– TurKit Mechanical Turk toolkit
• Evaluation:
– Shortn, Crowdproof, Human Macro
• Metrics: quality, delay, cost
Shortn Evaluation
• Setting:
– Find: 6-10 workers ($0.08 per Find)
– Fix/Verify: 3-5 workers ($0.05 per Fix, $0.04 per Verify)
– Delay: wait time, work time
• Results:
  – Text shortened to 78%-90% of its original length
  – Wait time across all stages: median 18.5 minutes
  – Work time: median 118 seconds
  – Average cost: $1.41 ($0.55 + $0.48 + $0.38)
  – Caveat: some modifications are grammatically appropriate, but stylistically incorrect
Crowdproof Evaluation
• Tested 5 input texts
– Manually labeled all spelling, grammatical and style
errors in each of the five inputs (total 49 errors)
• Ran Crowdproof with a 20-minute stage timeout and measured how many corrections Crowdproof made
– Task setup: Find: $0.06, Fix: $0.08, Verify: $0.04
• Soylent: 67% vs. MS Word (grammar check): 30%
– Combined: 82%
Human Macro Evaluation
• Gave 5 prompts (Input and Output) to users (CS students, an admin, an author)
• Asked them to write the descriptions (Requests) used for Human Macro tasks
Discussion
• Delay in interface outsourcing: minutes to hours
• Privacy? Confidential documents?
• Legal ownership?
• Workers may lack the domain knowledge or shared context needed to contribute usefully
VizWiz: Nearly Real-time
Answers to Visual Questions
Jeffrey P. Bigham, Chandrika Jayant,
Hanjie Ji, Greg Little, Andrew Miller,
Robert C. Miller, Robin Miller, Aubrey
Tatarowicz, Brandyn White, Samuel White,
and Tom Yeh. UIST 2010.
Part of the slides are from: http://husk.eecs.berkeley.edu/courses/cs298-52-sp11/images/8/8d/Vizwiz_soylent.pdf
VizWiz
• “automatic and human-powered services to
answer general visual questions for people
with visual impairments.”
• Lets blind people use mobile phones to:
1. Take a photo
2. Speak a question
3. Receive multiple spoken answers
Motivation
• Current technology uses automatic approaches to
help blind people access visual information
– Optical Character Recognition (OCR)
– Ex) Kurzweil knfbReader: ~$1,000
• Problems: error-prone, limited in scope,
expensive
– Ex: OCR cannot read graphic labels, handwritten menus, or street signs
Motivation
• Solution: ask real people
• Can phrase questions naturally
– “What is the price of the cheapest salad?” vs. OCR
reading the entire menu
• Feedback
– Real people can guide blind people to take better
photos
• Focus on blind people’s needs, not current
technology
Human-Powered Services
• Problem: Response latency
• Solution: quikTurkit (and some tricks)
– “First attempt to get work done by web based
workers in nearly real-time”
– Maintain a pool of workers to answer questions
quikTurkit
• “requesters create their own web site on which
Mechanical Turk workers answer questions.”
• “answers are posted directly to the requester’s
web site, which allows [them] ... to be returned
before an entire HIT is complete.”
• “workers are required to answer multiple
previously-asked questions to keep workers
around long enough to possibly answer new
questions”
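A rough sketch of quikTurkit’s pooling idea, with hypothetical helpers (activeWorkers, pendingQuestions, recentQuestions, postHits); it only illustrates the recruitment and recycling loop, not the real toolkit’s API:

// Keep enough HITs posted that workers are already on the requester’s
// answering site when a new question arrives.
function maintainPool(targetWorkers) {
    var shortfall = targetWorkers - activeWorkers().length   // hypothetical helper
    if (shortfall > 0) {
        postHits(shortfall)   // each HIT sends a worker to the answering site
    }
}

// Serve a new question if one is waiting; otherwise recycle a recently asked
// question so the worker stays around long enough to catch the next real one.
function nextQuestionFor(workerId) {
    var open = pendingQuestions()
    return open.length > 0 ? open[0] : recentQuestions()[0]
}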
Answering Site
Deployment
• Setting: 11 blind iPhone users
• quikTurkit:
– Reward range: $0.01 (1/2 of jobs), $0.02 (1/4), $0.03 (1/4)
• Results: 86.6% of first answers “correct”
– Average of 133.3s latency for first answer
• Problem: photos that are too dark or too blurry are unanswerable
  – VizWiz 2.0 detects such photos and alerts the user when a photo is too dark or blurry
VizWiz: LocateIt
• Combine VizWiz with computer vision to help blind
people locate objects
(Figure: the LocateIt web interface, and the LocateIt mobile sensor (zoom and filter) and sonification modules, which convey distance and angle.)
Future Work
• Expand worker pools beyond Mechanical Turk (e.g., social networks)
• Reduce cost by using games, volunteers, or friends
• Improve the interface to make photo-taking easier for blind people
• Combine with automatic approaches to reduce delay
Discussion
• Resource usage vs. delay (wasting resources for
better responses or near real-time services?)
– Any better approach than quikTurkit?
• Quality control? How do we make sure that workers correctly identified the photos?
  – How should the system accept/reject submissions (if it becomes a large-scale service)?
• Other application scenarios with quikTurkit?
• Adding inertial sensors to LocateIt?