Slides - Information Sciences Institute

Information Diffusion in Social Media
Kristina Lerman
University of Southern California
CS 599: Social Media Analysis
Information diffusion on Twitter follower graph
Diffusion on networks
• The spread of disease, ideas, behaviors, … on a network can
be described as a contagion process where an active node
(infected/informed/adopted) activates its non-active
neighbors with some probability
– … creates a cascade on a network
• How large do cascades become?
• What determines their growth?
Diffusion models
• Complex response: infection requires multiple exposures.
• Non-monotonic exposure response
[Figure: exposure response functions for complex contagion and for the threshold model (threshold marked at f_i k_i); x-axis: number of infected neighbors, y-axis: infection probability, up to 1]
Epidemic diffusion model
• Infected nodes propagate contagion to susceptible neighbors
with probability m (transmissibility or virality of contagion)
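[Figure: exposure response function for the epidemic model, showing an exposed node and its infected neighbors; x-axis: number of infected neighbors, y-axis: infection probability, up to 1]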
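The epidemic model on this slide can be sketched as a short simulation. This is a minimal illustration, not the model used in the lecture: it assumes each newly infected node gets exactly one chance to infect each susceptible neighbor with probability m. The function name `simulate_cascade` and the toy ring network are hypothetical.

```python
import random

def simulate_cascade(neighbors, seed, m, rng):
    """Spread a contagion from `seed`; each newly infected node gets one
    chance to infect each susceptible neighbor with probability m."""
    infected = {seed}
    frontier = [seed]
    while frontier:
        next_frontier = []
        for node in frontier:
            for nb in neighbors[node]:
                # rng.random() is in [0, 1), so m = 0 never infects and m = 1 always does
                if nb not in infected and rng.random() < m:
                    infected.add(nb)
                    next_frontier.append(nb)
        frontier = next_frontier
    return infected

# Hypothetical toy network: a ring of 10 nodes, each linked to its two neighbors.
ring = {i: [(i - 1) % 10, (i + 1) % 10] for i in range(10)}
cascade = simulate_cascade(ring, seed=0, m=0.5, rng=random.Random(0))
print(len(cascade), "of", len(ring), "nodes infected")
```

Running it for many seeds and values of m gives an empirical cascade-size curve of the kind discussed on the following slides.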
Epidemic threshold
• Epidemic threshold t:
– For m < t, localized cascades (epidemic dies out)
– For m > t, global cascades
• Epidemic threshold depends on topology only: it is the inverse of the largest
eigenvalue of the adjacency matrix of the network
– True for any network
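As a rough illustration of the eigenvalue statement above, the sketch below computes a threshold for a small undirected graph. It assumes the standard result that the threshold equals the reciprocal of the largest eigenvalue of the adjacency matrix; the function name `epidemic_threshold` and the example graph are hypothetical.

```python
import numpy as np

def epidemic_threshold(adjacency):
    """Return 1 / (largest eigenvalue) of a symmetric adjacency matrix."""
    eigenvalues = np.linalg.eigvalsh(adjacency)  # ascending order, symmetric input
    return 1.0 / eigenvalues[-1]

# Hypothetical example: complete graph on 4 nodes (largest eigenvalue is 3).
A = np.ones((4, 4)) - np.eye(4)
t = epidemic_threshold(A)
print(t)
```

Denser or more hub-dominated graphs have a larger leading eigenvalue and therefore a lower threshold.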
[Figure: cascade size (from 0 to N) vs. transmissibility m, with the epidemic threshold marked]
Differences in the Mechanics of
Information Diffusion across
Topics: Idioms, Political Hashtags
and Complex Contagion on Twitter
Daniel M Romero, Brendan Meeder and Jon
Kleinberg
Presentation by Aswin Rajkumar
Motivation and Contribution
• Information Diffusion and Topics
- E.g., controversial political topics have high information
diffusion.
- Scientific study of the variation in diffusion mechanics across
topics.
• Contribution of the paper
- Empirical analysis of real world data
- Observation that the mechanics of spread can be defined
using two variables, stickiness and persistence.
- Confirmation of sociological theories found in the offline
world – diffusion of innovations
The Study – How?
• Dataset: a Twitter snapshot covering a large number of
tweets over a period of several months (Aug 2009 to Jan 2010)
• 3 billion messages from over 60 million users
• Tokens: #hashtags; the study uses the top 500 hashtags
• Network: built from @mentions, which define each user's neighbor set
– Edge from X to Y if X has @-mentioned Y at least t times, with t = 3
– Why? Mentions show X's attention to Y.
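The network construction above can be sketched in a few lines. This is only an illustration: it assumes the rule is "at least t mentions from X to Y" with t = 3, and the function name `build_mention_network` and the sample data are hypothetical.

```python
from collections import Counter

def build_mention_network(mentions, t=3):
    """Return directed edges (x, y) where x @-mentioned y at least t times."""
    counts = Counter(mentions)
    return {pair for pair, n in counts.items() if n >= t}

# Hypothetical sample: (author, mentioned_user) pairs, one per @-mention.
mentions = [("ann", "bob")] * 3 + [("bob", "ann")] * 2 + [("cat", "bob")] * 4
edges = build_mention_network(mentions)
print(sorted(edges))
```

Here bob's two mentions of ann fall below the threshold, so the edge exists in one direction only.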
The Study – What?
• Adoption and Spread of Hashtags - Diffusion
• Topics – Politics, Celebrity, Music, Movies, Games, Idioms,
Sports and Technology
• Stickiness - the probability that a piece of information will
pass from a person who knows or mentions it to another
person who is exposed to it.
• Persistence and “complex contagion”, a principle from
sociology. Persistence is the relative extent to which repeated
exposures to a hashtag continue to have significant marginal
effects on adoption, i.e., how slowly the effect of additional
exposures decays.
Complex Contagion
Complex contagion refers to the phenomenon in social
networks in which multiple sources of exposure to an
innovation are required before an individual adopts the
change of behavior. - Wikipedia
[Figure: exposure curve P(k), annotated with stickiness and persistence]
Analysis – Stickiness and Persistence
• Take the top 500 hashtags
• Classify them into 8 topics or categories
• Construct p(k) curves for each hashtag and average them
separately within each category
• Compare the shapes
Political Hashtags – High Stickiness and Persistence
Twitter Idioms – High Stickiness, Low Persistence
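Example hashtags by category:
• Games: #mw2, #mafiawars
• Movies: #lost, #newmoon
• Celebrity: #mj, #brazilwantsjb
• Music: #pandora, #thisiswar
• Politics: #obama, #hcr
• Sports: #cricket, #nhl
• Technology: #photoshop, #digg
• Twitter idioms: #cantlivewithout, #iloveitwhen, #musicmonday, #followfriday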
Analysis – Subgraph Structure
• Interconnections among early adopters
• Subgraphs for political hashtags - High in-degree, large
number of triangles.
• Tie Strength – Strong, Weak.
Credit : Bridge-talent.com
Exposure Curve - Definitions
• K-exposed – A user is k-exposed to a tag h if he has not used h, but is
connected to k other users who have used h in the past.
• What’s the probability that a k-exposed user u will use hashtag h in the
future?
1) Ordinal time estimate
Probability that a k-exposed user u uses hashtag h before becoming
(k+1)-exposed.
P(k) = I(k) / E(k)
E(k) – number of users who were k-exposed at some time
I(k) – number of k-exposed users who used h before becoming (k+1)-exposed
2) Snapshot estimate
Similar, but based on a time window from t1 to t2.
E(k) – number of users who are k-exposed at t1
I(k) – number of users who are k-exposed at t1 and used h before t2
P(k) = I(k) / E(k) is the exposure curve.
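The ordinal time estimate can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the input format (per-user lists of the times at which neighbors first used the tag), the function name `ordinal_exposure_curve`, and the tie rule are all hypothetical choices.

```python
from collections import Counter

def ordinal_exposure_curve(exposure_times, adoption_time):
    """Ordinal-time estimate P(k) = I(k) / E(k) for one hashtag.

    exposure_times: user -> times at which each neighbor first used the tag
    adoption_time:  user -> time the user first used the tag (absent if never)
    Assumption: an adoption at the same instant as an exposure precedes it.
    """
    E, I = Counter(), Counter()
    for user, times in exposure_times.items():
        adopted_at = adoption_time.get(user)
        times = sorted(times)
        for k, t_k in enumerate(times, start=1):
            if adopted_at is not None and adopted_at <= t_k:
                break  # already adopted before the k-th exposure
            E[k] += 1  # user was k-exposed without having used the tag
            t_next = times[k] if k < len(times) else float("inf")
            if adopted_at is not None and adopted_at <= t_next:
                I[k] += 1  # adopted before becoming (k+1)-exposed
                break
    return {k: I[k] / E[k] for k in sorted(E)}

# Hypothetical data for three users.
exposure_times = {"u1": [1, 5], "u2": [2, 3, 9], "u3": [4]}
adoption_time = {"u1": 3, "u2": 10}
curve = ordinal_exposure_curve(exposure_times, adoption_time)
print(curve)
```

In this toy data, three users are 1-exposed and one of them adopts before a second exposure, so P(1) is one third.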
Comparison Parameters
• Persistence parameter
F(P) = A(P) / R(P)
A(P) – area under the P(k) curve
R(P) – area of the rectangle of length K and height max P(k)
Curve comparisons (figure panels): "increases rapidly and falls" vs. "increases slowly and saturates"; "increases slowly and saturates" vs. "rapid increase"
• Stickiness parameter
M(P) = max P(k)
Plots
[Figures: persistence parameter F(P) = A(P) / R(P) and stickiness parameter M(P) = max P(k)]
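Both parameters are simple to compute from a sampled exposure curve. The sketch below assumes the curve is sampled at k = 0, 1, ..., K and that the area A(P) is taken with straight lines between consecutive points (trapezoids); the function names and the two example curves are made up for illustration.

```python
def stickiness(p):
    """M(P): the maximum of the exposure curve."""
    return max(p)

def persistence(p):
    """F(P) = A(P) / R(P) for a curve sampled at k = 0, 1, ..., K."""
    K = len(p) - 1
    area = sum((p[k] + p[k + 1]) / 2 for k in range(K))  # trapezoids of width 1
    return area / (K * max(p))  # rectangle of length K and height max P(k)

# Hypothetical curves with the same peak height but different decay.
spiky = [0.0, 0.02, 0.005, 0.002, 0.001]   # rises quickly, then falls
steady = [0.0, 0.01, 0.018, 0.02, 0.02]    # rises slowly and saturates

for name, p in (("spiky", spiky), ("steady", steady)):
    print(name, stickiness(p), persistence(p))
```

The two curves have equal stickiness, but the one that saturates fills more of its bounding rectangle and so has higher persistence.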
Improvements and Related Work
• The @mention network is not very representative. Also, attention
should be from Y to X.
• Considers only average persistence. Median and variance should be
analyzed too.
• Other types of networks, e.g., blogs. [Gruhl, Guha, Liben-Nowell, Tomkins – Information Diffusion through Blogspace]
• Influence on online behavior, e.g., games. [Woo, Kang, Kim – The
Contagion of Malicious Behaviors in Online Games]
• Network structure is dynamic in real life. [Baños, Borge-Holthoefer, Wang,
Moreno, González-Bailón – Diffusion Dynamics with Changing Network
Composition]
Conclusion
• Hashtags of different topics exhibit different mechanics of
spread. Politically controversial hashtags have the highest
diffusion.
• Information diffusion depends on the probability of users
adopting a hashtag after repeated exposure to it. Depends on
the magnitude of the probabilities as well as the rate of decay
• Confirms the sociological theory of complex contagion
• Higher in-degree and stronger ties result in better spread.
Questions?
What Stops Social Epidemics? (Ver Steeg et al.)
• Why do information cascades in social media
– Grow quickly initially
– But remain much smaller than predicted by epidemic
models?
• Information cascades differ from viral contagion:
– Response to repeated exposure is important on Digg (and
Twitter)
– Drastically alters predictions about size of epidemics
Social news:
• Users submit or vote for
(infected by) news stories
• Social network
– Users follow ‘friends’ to see
• Stories friends submit
• Stories friends vote for
• Trending stories
– Digg promotes most popular
stories to its Top News page
How large are cascades in social media?
Number of people who share a message (with a URL)
• Digg: 3.5K URLs, 258K users, 1.7M edges
• Twitter: 70K URLs, 700K users, 36M edges
Most cascades less than 1% of total network size!
[Lerman et al. “Social Contagion: An Empirical Study of Information Spread on Digg and Twitter
Follower Graphs” arXiv:1202.3162]
Why are these cascades so small?
[Figure: cascade size vs. transmissibility m under the standard model of epidemic growth (heterogeneous mean field theory, SIR model, same degree distribution as Digg); the range in which most cascades fall is marked]
Transmissibility of almost all Digg stories falls within the width of this line in the plot?!
Maybe graph structure is responsible?
[Figure: cascade size vs. transmissibility m, with the epidemic threshold marked. Curves: mean field prediction (same degree distribution); simulated cascades on a random graph with the same degree distribution; simulated cascades on the observed Digg graph]
→ Clustering reduces epidemic threshold and cascade size,
but not enough!
What about the spreading mechanism?
[Figure: network of infected and not-infected nodes, with one node marked "?"]
Are repeat exposures a big effect?
Yes, more than half of the users are exposed to the same information more than once!
How do people respond to repeated exposure?
[Figure: exposure response]
Not much.
We have similar results for Twitter. Also noted by Romero et al., WWW 2011.
Big consequences for cascade growth
• Most people are exposed to a story more than once
• Repeated exposures have little effect
• Growth of epidemics is severely curtailed (especially
compared to the Independent Cascade Model)
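A small calculation shows why the response to repeated exposure matters so much. The sketch compares two hypothetical exposure response functions: the Independent Cascade Model, where each of k exposures is an independent chance m to infect, and a limiting case assumed here purely for illustration, where only the first exposure has any effect. The function names are not from the slides.

```python
def independent_cascade_response(m, k):
    """Infection probability after k independent exposures, each with chance m."""
    return 1.0 - (1.0 - m) ** k

def first_exposure_only_response(m, k):
    """Illustrative limiting case: repeat exposures add nothing."""
    return m if k >= 1 else 0.0

m = 0.05  # hypothetical transmissibility
for k in (1, 2, 5, 10):
    print(k, independent_cascade_response(m, k), first_exposure_only_response(m, k))
```

The two functions agree at k = 1, but under independent exposures the infection probability keeps growing with k, which is what drives the much larger predicted cascades.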
Weak response to repeated exposures
suppresses outbreaks
[Figure: result of simulations that take the effect of repeat exposure into account, compared with actual Digg cascades; the epidemic threshold is unchanged. Axis labels: λ*; m* (transmissibility)]
How Limited Visibility and Divided Attention Constrain
Social Contagion (Hodas & Lerman, 2012)
• Questions
– How do people respond to exposures to information by
friends on social media?
– What role does content play in information diffusion?
• Findings
– Users have finite ability to process information
• Most recently received messages are retweeted, the rest are
overlooked
• Highly connected users (hubs) are far less likely to retweet any
message they receive than poorly connected people
– Reduced susceptibility of hubs to “infections” explains why
cascades are small
Mechanics of information diffusion
User must see an item and find it interesting before he/she can
spread it (e.g., by retweeting it, voting for or liking it, …)
[Diagram: See? (cognitive factors, interface) → Interesting? (tastes, content) → Respond (retweet)]
Cognitive factors: Position bias
• People pay more attention to items at the top of the screen or a
list of items
[Payne, The Art of Asking
Questions (1951) ]
[Buscher et al, CHI’09]
[Counts & Fisher ICWSM’11]
… limits how far down the list/page the user navigates
Measuring position bias
• Amazon Mechanical Turk experiments
• Users were asked to recommend science stories
• We controlled the order stories were presented to users
Position bias: stories at top list positions received more recommendations
[Lerman & Hogg (2014), “Leveraging position bias to improve peer recommendation,” PLoS ONE]
Position bias creates “limited attention”
[Figure: post visibility (probability to view a post) vs. position; a new post at the top of the user’s screen is most likely to be seen]
… some time later, newer posts appear at the top and the post is less likely to be seen
Position bias and number of friends
[Figure: post visibility vs. position for a user with few friends and a user with many friends]
… some time later, newer posts appear at the top and the post is less likely to be seen; a post of the same age is even less visible to a highly connected user
Friends are a source of distraction
[Figure: fitted function P(n_f) of the number of friends n_f; constants shown in the fits: 0.22, 0.21, 0.52]
• Limited attention makes hubs less susceptible to ‘infection’
– Users with more friends are more active
– Users with more friends are distracted by more content
Users retweet most recent messages
[Figure: “Time Response Function” for high connectivity users and low connectivity users as a function of time t; a power-law exponent of 1.15 is marked]
• Users retweet newest messages (at the top of their screen)
• Hubs are much less likely to retweet an older message
Does content matter?
$P_{msg}(t - t_0, n_f \mid \text{retweet}) \propto I_{msg} \, V(t - t_0)$
where the left side is the probability to tweet a message, $V$ is the visibility, and $I_{msg}$ is the “virality”
[Figure: estimated virality]
Do “viral” messages spread farther?
[Figure: x-axis: ln(“virality”)]
… “viral” messages can reach many or few people
How do people respond to multiple exposures?
[Figure: exposure response vs. number of tweeting friends]
• Is this evidence for complex contagion?
“Complex contagion”: an artifact of heterogeneity
[Figure: exposure response for low connectivity users and for high connectivity users]
• Breaking down exposure response by different subpopulations, separated according to number of friends they
follow, reveals simple, monotonic response
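The aggregation effect described above can be reproduced with made-up numbers. In this sketch both groups have a simple, monotonically increasing response, yet the pooled curve rises and then falls because the mix of users changes with exposure level. All quantities here (group sizes, response slopes, the name `aggregate_response`) are hypothetical and chosen only to illustrate the effect.

```python
def aggregate_response(groups, k):
    """Pooled adoption probability at exposure level k.

    groups: list of (users_at_k, response_at_k) function pairs.
    """
    users = [n(k) for n, _ in groups]
    adopters = [n(k) * p(k) for n, p in groups]
    return sum(adopters) / sum(users)

# Hypothetical numbers, chosen only to illustrate the effect.
low = (lambda k: 1000 * 0.5 ** k, lambda k: 0.02 * k)  # few friends: rarely reach high k
high = (lambda k: 100.0, lambda k: 0.002 * k)          # many friends: common at every k

curve = [aggregate_response([low, high], k) for k in range(1, 9)]
for k, value in enumerate(curve, start=1):
    print(k, round(value, 4))
```

At low k the pooled curve is dominated by responsive low-connectivity users; at high k it is dominated by hubs, whose response is weaker, so the pooled curve peaks at an intermediate k even though neither group's response ever decreases.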
Summary
• “A meme is not a virus”
– Information spread ≠ Disease spread
• Big consequences for modeling information spread in social
media
• Highly connected people (hubs) act as firewalls to
information spread
– They have a hard time finding messages in their stream
– People have a finite capacity to process information; the
more messages they receive, the less likely they are to respond
to any given one
– Information overload actually reduces the size of
information cascades