TurKit - Interactive Computing Lab

Task and Workflow Design I
KSE 801
Uichin Lee
TurKit: Human Computation
Algorithms on Mechanical Turk
Greg Little, Lydia B. Chilton, Rob Miller,
and Max Goldman
(MIT CSAIL)
UIST 2010
Workflow in M-Turk
[Diagram: a requester posts HIT groups to Mechanical Turk; turkers complete the HITs; results are collected in a CSV file and exported for use.]
Workflow: Pros & Cons
• Easy to run simple, parallelized tasks.
• Not so easy to run tasks in which turkers improve on or validate each other's work.
• TurKit to the rescue!
The TurKit Toolkit
• Arrows indicate the flow of information.
• Programmer writes 2 sets of source code:
– HTML files for the web server
– JavaScript executed by TurKit
• Output is retrieved via a JavaScript database.
[Diagram: the programmer writes *.js scripts run by TurKit (backed by the JavaScript database) and *.html task pages served by a web server; both connect to Mechanical Turk, where turkers do the work.]
Crash-and-rerun programming model
• Observation: local computation is cheap, but external calls cost money.
• Managing state over a long-running program is challenging
– Examples: What if the computer restarts? What if an error occurs?
• Solution: store the results of completed steps in the database.
• If an error happens, just crash the program and re-run it, replaying the history stored in the DB.
– Throw a “crash” exception; the script is automatically re-run.
• New primitive “once” (see the sketch below):
– Removes non-determinism
– Avoids re-executing an expensive operation on re-runs
• But why should we re-run at all?
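A minimal sketch of the model in TurKit-style JavaScript (the createHIT/waitForHIT parameter details here are illustrative, not the exact API):

    // Every (re-)run executes the script from the top. once() consults the
    // JavaScript database: if this step already completed, its recorded
    // result is returned and the side effect is skipped; otherwise it runs
    // the function and records the result.
    var hitId = once(function () {
        // Expensive, non-deterministic external call: executed only once,
        // no matter how many times the script crashes and re-runs.
        return mturk.createHIT({
            title: "Describe this image",          // illustrative parameters
            question: "http://example.com/describe.html",
            reward: 0.02
        })
    })

    // If the HIT has no results yet, this throws the “crash” exception;
    // TurKit re-runs the script later, replaying completed steps from the DB.
    var result = mturk.waitForHIT(hitId)
    print(result.answer)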
Example: quicksort
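The slide refers to the paper's human-powered quicksort. A sketch in TurKit-style JavaScript (the comparison wording is made up; fork/join are explained on the next slide):

    // Human-powered quicksort: every pairwise comparison is a HIT, and
    // fork() lets the comparisons of one partition run in parallel.
    function quicksort(a) {
        if (a.length <= 1) return a
        a = a.slice(0)                  // work on a copy
        var pivot = a.shift()
        var left = [], right = []
        for (var i = 0; i < a.length; i++) {
            var item = a[i]             // capture the value for this branch
            fork(function () {
                // a fork runs immediately until it would block on a HIT
                if (mturk.vote("Which is better?", [item, pivot]) == pivot) {
                    left.push(item)
                } else {
                    right.push(item)
                }
            })
        }
        join()                          // wait for all comparisons to finish
        return quicksort(left).concat([pivot], quicksort(right))
    }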
Parallelism
• The first time the script runs, HITs A and C will be created
• For a given forked branch, if a task is not yet done (e.g., HIT A), TurKit crashes only that forked branch (and re-runs it later)
• Synchronization w/ join() (see the sketch below)
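A minimal sketch of that scenario (the task texts and variable names are invented for illustration):

    var a, b, c
    fork(function () {
        a = mturk.prompt("Task A: suggest a title", 1)    // HIT A
        b = mturk.prompt("Task B: improve on: " + a, 1)   // HIT B depends on A
    })
    fork(function () {
        c = mturk.prompt("Task C: write a caption", 1)    // HIT C, independent
    })
    join()  // block here until both branches have finished

    // First run: the first branch posts HIT A, blocks, and crashes only that
    // branch; the second branch still runs and posts HIT C. A later re-run
    // replays A’s answer from the database and posts HIT B, and so on until
    // join() succeeds.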
MTurk Functions
• prompt(message, # of people)
– mturk.prompt("What is your favorite color?", 100)
• vote(message, options): returns the winning option
• sort(message, items): returns the human-ordered list
(Usage of all three is sketched below.)
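Illustrative calls for the three built-ins (the option and item values are made up):

    // prompt: ask a free-form question; here, collect 100 answers
    var colors = mturk.prompt("What is your favorite color?", 100)

    // vote: workers choose among fixed options; returns the winner
    var winner = mturk.vote("Which color name sounds friendliest?",
                            ["cerulean", "crimson", "chartreuse"])

    // sort: workers make pairwise comparisons; returns the ordered list
    var ranked = mturk.sort("Which suggestion is better?", colors)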
TurKit: Implementation
• TurKit is written in Java, using Rhino to interpret JavaScript code and E4X to handle the XML results from MTurk
• The online IDE runs on Google App Engine (GAE)
[Screenshot: the TurKit online IDE]
Exploring Iterative and Parallel
Human Computation Processes
Greg Little, Lydia B. Chilton
Max Goldman, Robert C. Miller
HCOMP 2010
HC Task Model
• Dimensions:
– Dependent (iterative) vs. independent (parallel) tasks
– Creation vs. decision tasks
• Task model examples:
– Creation tasks (creating new content): e.g., writing descriptions, ideas, imagery, solutions, etc.
– Decision tasks (voting/rating): e.g., rating the quality of a description of an image
HC Task Model
• Combining tasks: iterative and parallel patterns (sketched below)
– Iterative pattern: a sequence of creation tasks where the result of each task feeds into the next, followed by a comparison task
– Parallel pattern: a set of creation tasks executed in parallel, followed by a task of choosing the best
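A sketch of both patterns in TurKit-style JavaScript (the prompts and the count of 6 creation tasks mirror the experiment below; the details are illustrative):

    // Iterative pattern: each creation task builds on the previous result,
    // and a comparison (vote) task decides which version to keep.
    var text = mturk.prompt("Describe the image.", 1)
    for (var i = 1; i < 6; i++) {
        var draft = mturk.prompt("Improve this description: " + text, 1)
        text = mturk.vote("Which description is better?", [text, draft])
    }

    // Parallel pattern: independent creation tasks, then choose the best.
    var drafts = mturk.prompt("Describe the image.", 6)
    var best = mturk.vote("Which description is best?", drafts)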
Experiment: Writing Image Description
• Iterative vs. parallel: each condition uses 6 creation tasks ($0.02 each), followed by rating tasks (1-10 scale, $0.01 each)
Experiment: Writing Image Description
• Turkers in the iterative condition are shown the description so far and asked to improve it, while the parallel condition always shows an empty text area
Experiment: Writing Image Description
• Average rating after n iterations
– After six iterations: 7.9 (iterative) vs. 7.4 (parallel); t-test, t(29) = 2.1, p = 0.04
[Plot: average rating vs. iteration for the iterative and parallel conditions]
Experiment: Writing Image Description
• Length vs. rating: positive correlation
• The two outliers (circled) represent instances of text copied from the Internet (with only superficially related descriptions)
[Scatter plot: rating vs. length in characters, with the two outliers circled]
Experiment: Writing Image Description
• Work quality:
– 31% mainly append content at the end, and make only minor modifications (if any) to existing content;
– 27% modify/expand existing content, but it is evident that they use the provided description as a basis;
– 17% seem to ignore the provided description entirely and start over;
– 13% mostly trim or remove content;
– 11% make very small changes (adding a word, fixing a misspelling, etc.);
– 1% copy-paste superficially related content found on the Internet.
• Creating vs. improving: both take about the same time (avg. 211 seconds)
Experiment: Brainstorming
• Iterative work: higher average rating
– But biased thinking: suggestions riff on earlier ones (e.g., tech -> xxtech -> yytech)
• Parallel work: more diversity, higher deviation in ratings
– Suggests no iteration for brainstorming
[Plot: average rating vs. iteration for the iterative and parallel conditions]
Example: Blurry Text Recognition
• Iterative performs better than parallel
[Plot: transcription accuracy vs. iteration for the iterative and parallel conditions]
Summary
• TurKit: a flexible programming tool for M-Turk
• Various workflows can be designed, e.g., iterative, parallel, and hybrid
• Iterative performs better than parallel in several cases (e.g., image description, brainstorming, text recognition)