Ubiquitous Human Computation
KAIST KSE
Uichin Lee
May 11, 2011
Outline
• Review recent papers:
– Crowd-Sourced Sensing and Collaboration Using
Twitter, WoWMOM 2010
– Earthquake Shakes Twitter User:
Analyzing Tweets for Real-Time Event Detection,
WWW 2010
– Location-based Crowdsourcing: Extending
Crowdsourcing to the Real World, NordiCHI 2010
– Social Sensors and Pervasive Services: Approaches and
Perspectives, PerCol 2011
• Understand the potential of ubiquitous human
computation (+social networking)
Crowd-Sourced Sensing and
Collaboration Using Twitter
Murat Demirbas, Murat Ali Bayir, Cuneyt
Gurcan Akcora, Yavuz Selim Yilmaz
SUNY Buffalo
WoWMOM 2010
Slides are based on http://www.cse.buffalo.edu/~demirbas/presentations/twitter.pdf
Cellphones!
• 3-4B cellphone users worldwide
• 1.13 billion phones sold in 2009
(36 per sec) vs 0.3 billion PCs
• 174M were smartphones
– 15% (up from 12.8% in 2008)
– Expected to exceed # feature
phones
Status quo in cellphones
• Each device connects to the Internet
– to download/upload data and
– to accomplish a task that does not require
collaboration and coordination
What is missing?
• An infrastructure to assist mobile users to
perform collaboration and coordination
ubiquitously
• Any user should be able to search & aggregate
the data published by other users in a region
Our goal
• To provide a crowdsourced sensing and
collaboration service using Twitter
• To enable aggregation and sharing of data;
dynamically assign sensing tasks to other
cellphone users
Why Twitter?
• Open publish-subscribe system: 105 million users,
over 30 million users in the US, 55 million tweets and 600
million search queries every day
• Each tweet has 140 char limit
• Twitter provides an open source search API and a
REST API (that enables developers to access
tweets, timelines, and user data)
• Different actors may integrate published data
differently and can offer new services in
unanticipated ways
Crowdsourcing architecture
Sensweet
• Employs the smartphone’s ability to work in the
background without distracting a mobile user
– Sense the surrounding environment and send the
resulting data to Twitter
• To search and process sensor values on Twitter,
we need to agree on a standard for publishing
these sensor readings
– Bio-code: Uses Twitter bio sections & allows users to
search for the sensors they are looking for on-the-fly
– TweetML: Uses pre-defined hashtags to improve
searchability
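To illustrate why an agreed publishing standard matters, here is a minimal Python sketch that encodes a sensor reading as hashtag-style tokens and parses it back. The tag layout (`#sensor_…`, `#val_…`, `#loc_…`) and the function names are hypothetical, invented for this example; they are not the actual TweetML or Bio-code syntax, which these slides do not specify.

```python
TWEET_LIMIT = 140  # character limit per tweet, as stated in the slides


def encode_reading(sensor: str, value: str, location: str) -> str:
    """Encode one reading as hashtag-style tokens (hypothetical layout)."""
    tweet = f"#sensor_{sensor} #val_{value} #loc_{location}"
    if len(tweet) > TWEET_LIMIT:
        raise ValueError("reading does not fit in one tweet")
    return tweet


def decode_reading(tweet: str) -> dict:
    """Recover the fields from a tweet produced by encode_reading."""
    fields = {}
    for token in tweet.split():
        if token.startswith("#") and "_" in token:
            # Split only on the first underscore so values may contain underscores.
            key, value = token[1:].split("_", 1)
            fields[key] = value
    return fields
```

Because every field is a fixed-prefix hashtag, a plain keyword search for a prefix such as `#sensor_noise` would be enough to find matching readings.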
Askweet
• Accepts a question from Twitter
– tries to answer the question using the data on
Twitter, potentially data published by Sensweets
– if that is not possible, Askweet finds experts on
Twitter and forwards the question to these
experts (not clear how this was done in the paper)
• Parallelizable, easy to “cloudify” for scalable
service provisioning
Applications
• Crowdsourced weather
• Noise map application
• Location-based queries (with Foursquare)
1. Crowdsourced weather
• Current weather, everybody on Twitter can be an
expert
• Question to Askweet: “?Weather Loc:Buffalo,NY”
• Forwarded question: “How is the weather there now?
Reply 0 for sunny, 1 for cloudy, 2 for rainy, and 3 for
snowy”
http://ubicomp.cse.buffalo.edu/rainradar
Experimental results
for NYC in different
time slices
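The numeric reply codes make the answers easy to aggregate. The Python sketch below takes a majority vote over the replies; the slides do not say how Askweet combines answers, so the voting rule and the function name are assumptions made for illustration.

```python
from collections import Counter

# Reply codes taken from the forwarded question in the slides.
WEATHER_CODES = {"0": "sunny", "1": "cloudy", "2": "rainy", "3": "snowy"}


def aggregate_weather(replies):
    """Return the most common valid weather label, or None if no valid reply."""
    votes = Counter(
        WEATHER_CODES[r.strip()] for r in replies if r.strip() in WEATHER_CODES
    )
    if not votes:
        return None
    return votes.most_common(1)[0][0]
```

Replies that are not one of the four codes are ignored rather than guessed at, which keeps noisy free-text answers from skewing the result.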
2. Noise map application
• Implemented a Sensweet client for the Nokia N97
Smartphone series
• Sensweet client detects a noise level of the
surrounding environment and forwards this data
to Twitter in the TweetML format
• Sound sample is classified into: Low, Medium,
High state
– Each level is modeled using normal distribution
– Input signal is compared with 3 distributions (Low,
Medium, and High)
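One way to read the classification step above is as a maximum-likelihood choice among three normal distributions. The Python sketch below does that; the means and standard deviations are hypothetical placeholders (the slides do not give them), and the comparison rule is an assumption about how the "compare with 3 distributions" step might work.

```python
import math

# Hypothetical (mean, standard deviation) per level, in dB; not from the slides.
LEVELS = {"Low": (40.0, 5.0), "Medium": (60.0, 5.0), "High": (80.0, 5.0)}


def normal_pdf(x, mean, std):
    """Density of a normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def classify_noise(sample_db):
    """Pick the level whose normal distribution gives the sample the highest density."""
    return max(LEVELS, key=lambda level: normal_pdf(sample_db, *LEVELS[level]))
```

In practice the per-level parameters would be fitted from labeled sound samples recorded on the phone.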
Noise map application
Noise levels for a user
3. Location based queries
• Factual vs. non-factual queries
– Factual: “hotels in Miami”
– Non-factual: “Anyone knows any cheap, good
hotel, price ranges between 100 to 200 dollars in
Miami?”
• Traditional search engines perform poorly!
• Significant fraction of location-based queries (in
Twitter) is non-factual
– e.g., 63% of the queries were non-factual, while only 37%
of them were factual (manual classification of 269 queries)
Crowdsourcing Location-based Queries, Bulut et al., Pervasive Collaboration and Social Networking, 2011
http://www.percom.org/proceedings/workshops/papers/p490-bulut.pdf
Location based queries
• Aardvark uses a social network of
the asker to find suitable answerers
for the query and forwards this
query to the answerers, and returns
any answer back to the asker.
• How about Twitter + Foursquare?
– Use Foursquare to determine
users that frequent the queried
locale and that are interested in
the queried category (e.g., food,
nightlife)
– Find the right set of people to ask!
[Figure: question-answering workflow among the Asker, the Moderator, and the Users, with steps numbered 1 to 7]
• Question detection: tweet starting with “?”, keyword checking (anyone, suggestion, where)
• Moderator: labels the category and quality of questions
• The system forwards validated questions to appropriate people (using Twitter bio or Foursquare info)
• The system constantly polls the Twitter account to check for answers
• Data flows: [Questions detected], [Valid questions], [Questions to be asked], [Answer detected], [Valid answers], [Answer to be forwarded]
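The question-detection rule named in the workflow (a tweet starting with “?” plus keyword checking) could be sketched in Python as below. Whether the two checks are combined with "and" or "or" is not stated on the slide; this sketch assumes "or", and the function name is made up for illustration.

```python
# Keywords listed on the slide; the matching rule itself is an assumption.
QUESTION_KEYWORDS = ("anyone", "suggestion", "where")


def looks_like_question(tweet: str) -> bool:
    """Flag a tweet that starts with '?' or contains one of the keywords."""
    text = tweet.strip().lower()
    if text.startswith("?"):
        return True
    # Strip simple punctuation so "where?" still matches "where".
    words = text.replace("?", " ").replace(",", " ").split()
    return any(word in QUESTION_KEYWORDS for word in words)
```

Tweets flagged this way would still go to the moderator, who labels the category and quality before anything is forwarded.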
Experiment Setup
• Question dataset consists of 269 questions that
the system collected over Twitter and validated as
acceptable by the moderators.
• Manually categorize questions as factual and
nonfactual: 63% - non-factual; 37% factual
• Some examples of questions for each type.
Foursquare Reply Rate vs. Random User Reply Rate
Foursquare Response Time
• 13 minutes median response time which is comparable
with Aardvark
• 50% of the answers were received within the first 20
minutes.
Earthquake Shakes Twitter User:
Analyzing Tweets for Real-Time
Event Detection
Takeshi Sakaki (@tksakaki)
Makoto Okazaki (@okazaki117)
Yutaka Matsuo (@ymatsuo)
Tokyo University
WWW 2010 Conference
What’s happening?
• Twitter
– is one of the most popular microblogging services
– has received much attention recently
• Microblogging
– is a form of blogging
• that allows users to send brief text updates
– is a form of micromedia
• that allows users to send photographs or audio clips
• In this research, we focus on an important
characteristic: real-time nature
Real-time Nature of Microblogging
• Disastrous events: storms, fires, traffic jams, riots, heavy rainfalls, earthquakes
• Social events: parties, baseball games, presidential campaigns
– Twitter users write tweets several times in a single day.
– There is a large number of tweets, which results in many
reports related to events
– We can know how other users are doing in real time
– We can know what happens around other users in real time.
Our Goals
• propose an algorithm to detect a target event
– do semantic analysis on Tweet
• to obtain tweets on the target event precisely
– regard Twitter user as a sensor
• to detect the target event
• to estimate location of the target
• produce a probabilistic spatio-temporal model for
– event detection
– location estimation
• propose Earthquake Reporting System using Japanese
tweets
Twitter and Earthquakes in Japan
a map of Twitter user
world wide
a map of earthquake
occurrences world wide
The intersection is the regions with many earthquakes
and many Twitter users.
Twitter and Earthquakes in Japan
Other regions:
Indonesia, Turkey, Iran, Italy, and Pacific coastal US cities
Event detection algorithms
• do semantic analysis on Tweet
– to obtain tweets on the target event precisely
• regard Twitter user as a sensor
– to detect the target event
– to estimate location of the target
Semantic Analysis on Tweet
• Search tweets including keywords related to a
target event
– Example: In the case of earthquakes
• “shaking”, “earthquake”
• Classify tweets into a positive class or a
negative class
– Example:
• “Earthquake right now!!” --- positive
• “Someone is shaking hands with my boss” --- negative
– Create a classifier
Semantic Analysis on Tweet
• Create classifier for tweets
– use Support Vector Machine(SVM)
• Features (Example: I am in Japan, earthquake right now!)
– A: Statistical features (7 words, the 5th word)
the number of words in a tweet message and the position of the
query within a tweet
– B: Keyword features ( I, am, in, Japan, earthquake, right, now)
the words in a tweet
– C: Word context features (Japan, right)
the words before and after the query word
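The three feature groups can be reproduced for the slide's example tweet with a short Python sketch. It assumes simple English word splitting with a regular expression; the actual system worked on Japanese tweets, which would need a morphological analyzer instead, and the function name is hypothetical.

```python
import re


def extract_features(tweet: str, query: str) -> dict:
    """Compute the three feature groups for one tweet and one query word."""
    words = re.findall(r"[A-Za-z']+", tweet)
    lowered = [w.lower() for w in words]
    idx = lowered.index(query.lower())  # raises ValueError if the query is absent
    return {
        # A: number of words and 1-based position of the query word
        "statistical": (len(words), idx + 1),
        # B: all words in the tweet
        "keywords": words,
        # C: the words immediately before and after the query word
        "context": (
            words[idx - 1] if idx > 0 else None,
            words[idx + 1] if idx + 1 < len(words) else None,
        ),
    }
```

Calling `extract_features("I am in Japan, earthquake right now!", "earthquake")` yields 7 words with the query in 5th position and the context pair (Japan, right), matching the example on the slide. These values would then be turned into a numeric vector for the SVM.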
Tweet as Sensor Data
[Figure: the correspondence between tweet processing and sensor data processing for event detection]
• Event detection from Twitter: target event → observation by Twitter users → tweets → classifier → values → probabilistic model
• Object detection in ubiquitous environment: target object → observation by sensors → values → probabilistic model
Tweet as Sensor Data
[Figure: the same correspondence, with an earthquake as the example]
• Event detection from Twitter: target event occurrence (earthquake) → some users post “earthquake right now!!” → search and classify them into the positive class → probabilistic model → detect an earthquake
• Object detection in ubiquitous environment: target object (earthquake) → some earthquake sensors respond with a positive value → probabilistic model → detect an earthquake
We can apply methods for sensory data detection to
tweet processing
Tweet as Sensor Data
• We make two assumptions to apply methods for observation by
sensors
• Assumption 1: Each Twitter user is regarded as a sensor
– a tweet → a sensor reading
– a sensor detects a target event and makes a report probabilistically
– Example:
• make a tweet about an earthquake occurrence
• “earthquake sensor” returns a positive value
• Assumption 2: Each tweet is associated with time and location info
– time : posting timestamp
– location : GPS data or location information in user’s profile
By processing time and location information,
we can detect target events and find events’ locations
Probabilistic Model
• Why do we need probabilistic models?
– Sensor readings are noisy and sometimes sensors work
incorrectly
– We cannot judge whether a target event occurred or not
from a single tweet
– We have to calculate the probability of an event
occurrence from a series of data
• We propose probabilistic models for
– event detection from time-series data
– location estimation from a series of spatial information
Temporal Model
• We must calculate the probability of an event
occurrence from a set of sensor readings
• We examine the actual time-series data to
create a temporal model
[Figure: number of tweets over time, Aug 9 to Aug 17]
Temporal Model with Exponential Dist.
Example: Earthquake and Typhoon
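A minimal Python sketch of how an exponential temporal model can turn tweet counts into an event probability. It assumes that positive tweets decay as n0·exp(−λk) per time step and that each positive tweet is an independent false alarm with probability `p_false`; the function and parameter names are illustrative, and the slides do not give the exact formulation or parameter values used in the paper.

```python
import math


def expected_positive_tweets(n0: float, lam: float, t: int) -> float:
    """Cumulative tweets by time step t if counts decay as n0 * exp(-lam * k)."""
    # Geometric series: sum over k = 0..t of exp(-lam * k)
    return n0 * (1.0 - math.exp(-lam * (t + 1))) / (1.0 - math.exp(-lam))


def event_probability(p_false: float, n_positive: float) -> float:
    """Probability that not all n_positive independent reports are false alarms."""
    return 1.0 - p_false ** n_positive
```

Feeding the output of `expected_positive_tweets` into `event_probability` shows how confidence grows as more reports accumulate; the independence assumption it relies on is the one examined in the information-diffusion slides.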
Spatial Model
• We must calculate the probability distribution
of location of a target
• We apply Bayes filters to this problem which
are often used in location estimation by
sensors
– Kalman Filters
– Particle Filters
Bayesian Filters for Location
Estimation
• Kalman Filters
– are the most widely used variant of Bayes filters
– approximate the probability distribution which is
virtually identical to a uni-modal Gaussian
representation
– advantages: computational efficiency
– disadvantages: limited to accurate sensors or
sensors with high update rates
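For intuition, here is a minimal scalar Kalman measurement update in Python, applied to a single static coordinate such as the latitude of an event. This is a textbook simplification under Gaussian noise, not the configuration used in the paper; the prior and noise variances are assumed values.

```python
def kalman_update(mean, var, measurement, meas_var):
    """One measurement update for a static scalar state with Gaussian noise."""
    gain = var / (var + meas_var)  # Kalman gain
    new_mean = mean + gain * (measurement - mean)
    new_var = (1.0 - gain) * var
    return new_mean, new_var


def estimate_coordinate(measurements, meas_var, prior_mean=0.0, prior_var=1e6):
    """Fuse noisy readings of one coordinate (e.g. latitude) sequentially."""
    mean, var = prior_mean, prior_var
    for z in measurements:
        mean, var = kalman_update(mean, var, z, meas_var)
    return mean, var
```

Each update costs a handful of arithmetic operations, which is the computational efficiency noted above; the price is that the belief is always a single Gaussian.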
Bayesian Filters for Location
Estimation
• Particle Filters
– represent the probability distribution by sets of
samples, or particles
– advantages: able to represent arbitrary probability
densities
• particle filters can converge to the true posterior even in
non-Gaussian, nonlinear dynamic systems.
– disadvantages: difficult to apply to
high-dimensional estimation problems
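A small particle filter for a static 2-D location can be sketched in Python as follows. It treats each tweet location as a noisy observation of the event location with a Gaussian likelihood; the noise scale `sigma`, the jitter, the particle count, and the function name are assumptions for illustration, not the paper's settings.

```python
import math
import random


def particle_filter(observations, bounds, n_particles=2000, sigma=1.0,
                    jitter=0.05, seed=0):
    """Estimate a static 2-D location from noisy (x, y) observations."""
    rng = random.Random(seed)
    (x_min, x_max), (y_min, y_max) = bounds
    # Start with particles spread uniformly over the search area.
    particles = [(rng.uniform(x_min, x_max), rng.uniform(y_min, y_max))
                 for _ in range(n_particles)]
    for ox, oy in observations:
        # Weight each particle by a Gaussian likelihood of the observation.
        weights = [math.exp(-((px - ox) ** 2 + (py - oy) ** 2) / (2 * sigma ** 2))
                   for px, py in particles]
        if sum(weights) == 0.0:
            weights = [1.0] * n_particles  # avoid degenerate resampling
        # Resample in proportion to the weights, then add small jitter.
        particles = [(px + rng.gauss(0, jitter), py + rng.gauss(0, jitter))
                     for px, py in rng.choices(particles, weights=weights,
                                               k=n_particles)]
    mean_x = sum(p[0] for p in particles) / n_particles
    mean_y = sum(p[1] for p in particles) / n_particles
    return mean_x, mean_y
```

Because the belief is a cloud of samples rather than one Gaussian, the same loop would also work with a non-Gaussian likelihood, at the cost of needing many particles as the state dimension grows.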
Information Diffusion Related to
Real-time Events
• Proposed spatiotemporal models need to
meet one condition that
– sensors are assumed to be independent
• We check if information diffusions about
target events happen because
– if an information diffusion happened among users,
Twitter user sensors are not independent, they
affect each other (correlation!)
Information Diffusion Related to
Real-time Events
Information Flow Networks on Twitter
Nintendo DS Game
an earthquake
a typhoon
In the case of an earthquake and a typhoon, very little information
diffusion takes place on Twitter, compared to Nintendo DS Game
→ We assume that Twitter user sensors are independent about
earthquakes and typhoons
Experiments and Evaluation
• We demonstrate performances of
– tweet classification
– event detection from time-series data
→ show this result in “application”
– location estimation from a series of spatial
information
Evaluation of Semantic Analysis
• Queries
– Earthquake query: “shaking” and “earthquake”
– Typhoon query: “typhoon”
• Examples to create classifier
– 597 positive examples
Evaluation of Semantic Analysis
Features      Recall    Precision   F-Value
Statistical   87.50%    63.64%      73.69%
Keywords      87.50%    38.89%      53.85%
Context       50.00%    66.67%      57.14%
All           87.50%    63.64%      73.69%
• We obtain highest F-value when we use Statistical
features and all features.
• Keyword features and Word Context features don’t
contribute much to the classification performance
• A user becomes surprised and might produce a very
short tweet
• It’s apparent that the precision is not so high as the
recall
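The F-values reported above are consistent with the harmonic mean of recall and precision. The short Python check below recomputes them from the reported recall and precision; only the helper name is introduced here.

```python
def f_value(recall: float, precision: float) -> float:
    """Harmonic mean of recall and precision (the F1 score)."""
    if recall + precision == 0:
        return 0.0
    return 2 * recall * precision / (recall + precision)


# (recall, precision) pairs copied from the reported results, as fractions.
RESULTS = {
    "Statistical": (0.8750, 0.6364),
    "Keywords": (0.8750, 0.3889),
    "Context": (0.5000, 0.6667),
    "All": (0.8750, 0.6364),
}

for name, (recall, precision) in RESULTS.items():
    print(f"{name}: F = {100 * f_value(recall, precision):.2f}%")
```

The recomputed values show why low precision drags the keyword-only F-value down even though its recall equals that of the statistical features.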
Evaluation of Spatial Estimation
• Target events
– earthquakes
• 25 earthquakes from August 2009 to October 2009
– typhoons
• name: Melor
• Baseline methods
– weighted average
• simply takes the average of latitudes and longitudes
– median
• simply takes the median of latitudes and longitudes
• Metric: distance from an epicenter
– The smaller the better!
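The two baselines and the distance metric can be sketched in Python as below. The average here is unweighted, as the slide's description suggests, and the haversine great-circle formula is an assumed choice for "distance from an epicenter"; the slides do not say which distance measure was used.

```python
import math
from statistics import mean, median

EARTH_RADIUS_KM = 6371.0


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def average_estimate(points):
    """Baseline: unweighted average of latitudes and longitudes."""
    return mean(p[0] for p in points), mean(p[1] for p in points)


def median_estimate(points):
    """Baseline: per-coordinate median of latitudes and longitudes."""
    return median(p[0] for p in points), median(p[1] for p in points)
```

With a single far-away tweet in the input, the average is pulled toward the outlier while the median stays near the bulk of the reports, which is the usual reason to include both baselines.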
Evaluation of Spatial Estimation
[Figure: map of tweets around Kyoto, Osaka, and Tokyo (balloon: each tweet; color: post time), showing the estimation by median, the estimation by particle filter, and the actual earthquake center]
Evaluation of Spatial Estimation
Typhoon
Discussions of Experiments
• Particle filter performs better than other methods
• If the center of a target event is in an oceanic area,
it’s more difficult to locate it precisely from tweets
• It becomes more difficult to make good estimation in
less populated areas
Results of Earthquake Detection
JMA intensity scale    2 or more     3 or more     4 or more
Num of earthquakes     78            25            3
Detected               70 (89.7%)    24 (96.0%)    3 (100.0%)
Promptly detected*     53 (67.9%)    20 (80.0%)    3 (100.0%)
* Promptly detected: detected within a minute
JMA intensity scale: the original scale of earthquakes by the Japan Meteorological Agency
Period: Aug. 2009 – Sep. 2009
Tweets analyzed: 49,314 tweets
Positive tweets: 6,291 tweets by 4,218 users
We detected 96% of the earthquakes of JMA intensity
scale 3 or more during the period.
Conclusions
• We investigated the real-time nature of Twitter for event
detection
• Semantic analyses were applied to tweets classification
• We consider each Twitter user as a sensor and set a problem
to detect an event based on sensory observations
• Location estimation methods such as Kalman filters and
particle filters are used to estimate locations of events
• We developed an earthquake reporting system, which is a
novel approach to notify people promptly of an earthquake
event
• We plan to expand our system to detect events of various
kinds such as rainbows, traffic jams, etc.
Location-based Crowdsourcing:
Extending Crowdsourcing to the Real
World
Alt et al.
NordiCHI 2010
Motivation
• Crowdsourcing beyond the digital?
– Seekers and solvers
– Important aspects: right time and location for
matchmaking.
• Several scenarios:
– Recommendations on demand (e.g., buying
something?)
– Recording on demand (e.g., missing lectures?)
– Remotely looking around? (e.g., apartment?)
– Real-time weather information
– Translations on demand
System Architecture
The mobile client screenshots: (a) Main menu where users can
search tasks. (b) A sample task retrieved from the database.
Lessons learned
• Users prefer address-based task selection (GPS
is too hard to parse)
• Picture tasks are most popular (easy to
handle)
• Tasks were mainly solved at or close to home
• Tasks are solved after work
• Response times vary
Lessons learned
• Informative tasks are as popular as picture tasks
• Time-critical tasks are of little interest
• Solution should be achievable in 10 minutes
• Tasks are still solved after work
• Mid-day breaks are good times to search for tasks
• Solving a task can take up to one day
• Home and surrounding areas are the most favorite
places for solving tasks
• Voluntary tasks have lower chance (monetary rewards:
77%)
• Users search for tasks in their current location
Social Sensors and Pervasive
Services: Approaches and
Perspectives
Rosi et al.,
PerCol 2011
Social Sensors?
• Device intelligence with various on-board
sensors such as GPS
• Human intelligence with “social sensors”
– Twitter posts, Facebook status updates, pictures
posted on Flickr
– Personal information: shopping patterns, place
visit patterns, etc. (with some potential social
interactions)
Approaches to integrate social sensing
and pervasive services
• A: Extracting data from social networks
– Detecting crowded sites (Fujisaka et al., 2010)
– Mining landmarks from blogs (Ji et al., 2009)
– Event detection using Flickr (Zhao et al., 2006)
• B: Exploiting social networks as a socio-pervasive middleware
– Twitter with sensors (Demirbas et al., 2010)
– S-Sensors with micro-blogging (Baqer et al., 2009)
– Status update feeds to social networks (CenceMe)
Approaches to integrate social sensing
and pervasive services
• C: Pervasive overlays on social networks
– Interconnecting and sharing data sensed from
personal devices with the rest of the world
– SenseFace:
• Capture and process (local), and disseminate data
(social nets)
• Dynamically mash-up sensor data and social networks
• D: App-specific socio-pervasive networks
– Fusing mobile, sensor, and social data to fully
enable context-aware computing
Some Issues
• Key issues
– Rich data, yet comes at the cost of understanding the data
– Sheer size (raw facts and data produced by sensors)
• Un-structured, noisy data
– Unified data representation and interpretation
– Overcoming uncertainty of data
• No guarantee on the delivery of specific info about facts
and at specific times by social sensors
– Systems require “critical mass”; heterogeneous popularity
based on location (e.g., rural area vs. urban area)
• Completely out-of-loop of system managers and app
developers