Transcript Slide 1
CS224W: Social and Information Network Analysis
Jure Leskovec, Stanford University
http://cs224w.stanford.edu
Networks with positive and negative relationships
Our basic unit of investigation will be signed triangles
First we will talk about undirected networks, then directed ones
Plan for today:
Model: Consider two social theories of signed networks
Data: Reason about them in large online networks
Application: Predict if A and B are linked with + or -
7/18/2015
Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu
2
Networks with positive and negative relationships
Consider an undirected complete graph
Label each edge as either:
Positive: friendship, trust, positive sentiment, …
Negative: enemy, distrust, negative sentiment, …
Examine triples of connected nodes A, B, C
Start with the intuition [Heider ’46]:
Friend of my friend is my friend
Enemy of enemy is my friend
Enemy of friend is my enemy
Look at connected triples of nodes:
Balanced: + + + and + – –
Consistent with the “friend of a friend” or “enemy of the enemy” intuition
Unbalanced: + + – and – – –
Inconsistent with the “friend of a friend” or “enemy of the enemy” intuition
Graph is balanced if every connected
triple of nodes has:
all 3 edges labeled +, or
exactly 1 edge labeled +
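The local definition can be checked mechanically. A minimal sketch, assuming a complete signed graph stored as a Python dict keyed by frozenset node pairs (the representation and the names is_balanced_complete and edge_signs are illustrative, not from the lecture). A triangle has all 3 edges + or exactly 1 edge + precisely when the product of its three signs is +1.

```python
from itertools import combinations


def edge_signs(triples):
    """Hypothetical helper: build {frozenset({u, v}): sign} from (u, v, sign) triples."""
    return {frozenset((u, v)): s for u, v, s in triples}


def is_balanced_complete(nodes, sign):
    """Local balance check for a complete signed graph.

    sign maps frozenset({u, v}) -> +1 or -1 for every pair of nodes
    (an illustrative representation, not one fixed by the lecture).
    """
    for a, b, c in combinations(nodes, 3):
        product = (sign[frozenset((a, b))]
                   * sign[frozenset((b, c))]
                   * sign[frozenset((a, c))])
        # Product +1 <=> 0 or 2 negative edges <=> all 3 edges + or exactly 1 edge +.
        if product < 0:
            return False
    return True
```

For example, is_balanced_complete("ABC", edge_signs([("A", "B", 1), ("B", "C", -1), ("A", "C", -1)])) returns True: A and B are friends with a common enemy C. Checking every triple costs O(n^3), which is fine for illustration but not for large networks.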
Balance implies global coalitions [Cartwright-Harary]
If all triangles are balanced, then either:
The network contains only positive edges, or
Nodes can be split into 2 sets such that edges within each set are positive and negative edges only run between the sets
[Figure: coalitions L and R]
Any 2 nodes in L are friends
Any 2 nodes in R are friends
Every node in L is an enemy of every node in R
Construction: pick any node A; L = A and the friends of A, R = the enemies of A
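The Cartwright-Harary split can be built constructively: pick a node A, put A and its friends in L and its enemies in R, then verify that every edge inside a side is + and every edge across is –. A minimal sketch for a complete signed graph, assuming the same illustrative frozenset-keyed sign dict; split_coalitions is a hypothetical name.

```python
from itertools import combinations


def split_coalitions(nodes, sign):
    """Cartwright-Harary split of a complete signed graph into two coalitions.

    sign maps frozenset({u, v}) -> +1 or -1 for every pair (illustrative representation).
    Returns (L, R) if the graph is balanced, otherwise None.
    """
    nodes = list(nodes)
    a = nodes[0]  # any node can serve as the anchor A
    left = {a} | {v for v in nodes[1:] if sign[frozenset((a, v))] > 0}  # A and A's friends
    right = set(nodes) - left  # A's enemies (empty if all edges are positive)
    # Verify: + edges within each side, - edges across the sides.
    for u, v in combinations(nodes, 2):
        same_side = (u in left) == (v in left)
        if (sign[frozenset((u, v))] > 0) != same_side:
            return None
    return left, right
```

An empty R corresponds to the first case of the theorem (only positive edges). Returning None whenever the check fails reflects the theorem's contrapositive: if no consistent split exists, some triangle is unbalanced.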
International relations:
Positive edge: alliance
Negative edge: animosity
Separation of Bangladesh from Pakistan in 1971: US supports Pakistan. Why?
USSR was enemy of China
China was enemy of India
India was enemy of Pakistan
US was friendly with China
China vetoed Bangladesh from U.N.
[Figure: signed graph on B (Bangladesh), P (Pakistan), U (US), C (China), I (India), R (USSR); two edges are marked +? and –? as the signs implied by balance]
When is a graph that is not complete balanced?
Def 1: Local view
Fill in the missing edges to achieve balance
Def 2: Global view
Divide the graph into two coalitions
The 2 definitions are equivalent!
Graph is balanced if and only if it contains no
cycle with an odd number of negative edges.
How to compute this?
Find connected components on + edges
For each component create a super-node
Connect components A and B if there is a negative edge between their members
Assign super-nodes to sides using BFS
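Using BFS, assign each super-node a side (L or R), giving super-nodes joined by a negative edge opposite sides
Graph is unbalanced if two super-nodes joined by a negative edge are assigned the same side (a negative edge inside a single super-node also makes the graph unbalanced)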
[Figure: BFS assigns sides L/R to the super-nodes; two super-nodes joined by a negative edge get the same side, so the example graph is Unbalanced!]
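The super-node procedure for a graph that need not be complete can be sketched as follows, assuming undirected edges given as a list of (u, v, sign) triples. Components on + edges are found with union-find here (a swapped-in choice; BFS on the + edges would work equally well), and the final BFS 2-colors the super-node graph. check_balance is a hypothetical name.

```python
from collections import deque


def check_balance(nodes, edges):
    """Balance test for a signed graph that need not be complete.

    edges: list of (u, v, sign) with sign +1 or -1 (undirected).
    Returns (True, side) with side[node] in {"L", "R"}, or (False, None).
    """
    # 1) Connected components on + edges (union-find; any method works).
    parent = {v: v for v in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v, s in edges:
        if s > 0:
            parent[find(u)] = find(v)

    # 2) Super-node graph: connect components joined by a - edge.
    neg = {find(v): set() for v in nodes}
    for u, v, s in edges:
        if s < 0:
            cu, cv = find(u), find(v)
            if cu == cv:
                return False, None  # - edge inside one + component
            neg[cu].add(cv)
            neg[cv].add(cu)

    # 3) BFS: neighboring super-nodes must get opposite sides.
    side = {}
    for start in neg:
        if start in side:
            continue
        side[start] = "L"
        queue = deque([start])
        while queue:
            c = queue.popleft()
            for d in neg[c]:
                if d not in side:
                    side[d] = "R" if side[c] == "L" else "L"
                    queue.append(d)
                elif side[d] == side[c]:
                    return False, None  # conflict: graph is unbalanced
    return True, {v: side[find(v)] for v in nodes}
```

For example, check_balance([1, 2, 3, 4], [(1, 2, 1), (3, 4, 1), (1, 3, -1), (2, 4, -1)]) reports a balanced graph with {1, 2} on one side and {3, 4} on the other, while an all-negative triangle is reported unbalanced (a cycle with 3 negative edges).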
[CHI ‘10]
Each link AB is explicitly tagged with a sign:
Epinions: Trust/Distrust
Does A trust B’s product reviews?
(only positive links are visible)
Wikipedia: Support/Oppose
Does A support B to become
Wikipedia administrator?
Slashdot: Friend/Foe
Does A like B’s comments?
Other examples:
Online multiplayer games
[CHI ‘10]
Does structural balance hold?
Triad   Epinions P(T)  Epinions P0(T)  Wikipedia P(T)  Wikipedia P0(T)  Balanced?
+ + +   0.87           0.62            0.70            0.49             yes
+ – –   0.07           0.05            0.21            0.10             yes
+ + –   0.05           0.32            0.08            0.40             no
– – –   0.007          0.003           0.011           0.010            no
P(T) … probability of a triad
P0(T) … triad probability if the signs were random
[Figure: triads in the real data vs. in shuffled data]
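P(T) and P0(T) can be computed from triangle counts. A minimal sketch, assuming signs are stored per unordered node pair with comparable node labels, and approximating the shuffled-sign baseline with a binomial model in which each edge is independently + with probability p, the overall fraction of + edges (this binomial approximation is an assumption here). triad_stats is a hypothetical name; index k of each returned list is the number of + edges in the triad.

```python
from collections import defaultdict
from math import comb


def triad_stats(sign):
    """P(T) and a random-sign baseline P0(T) for the 4 undirected signed triads.

    sign maps frozenset({u, v}) -> +1 or -1; node labels must be comparable.
    Triad type k = number of positive edges (3: +++, 2: ++-, 1: +--, 0: ---).
    """
    adj = defaultdict(set)
    for e in sign:
        u, v = sorted(e)
        adj[u].add(v)
        adj[v].add(u)

    counts = [0, 0, 0, 0]
    for e in sign:
        u, v = sorted(e)
        for w in adj[u] & adj[v]:
            if w > v:  # count each triangle u < v < w exactly once
                k = sum(sign[frozenset(pair)] > 0 for pair in ((u, v), (v, w), (u, w)))
                counts[k] += 1
    total = sum(counts)
    if total == 0:
        raise ValueError("graph has no triangles")

    p = sum(s > 0 for s in sign.values()) / len(sign)  # fraction of + edges
    P = [c / total for c in counts]
    # Assumed baseline: each edge independently + with probability p.
    P0 = [comb(3, k) * p**k * (1 - p) ** (3 - k) for k in range(4)]
    return P, P0
```

Balance predicts P(T) > P0(T) for the balanced types (k = 3 and k = 1) and P(T) < P0(T) for the unbalanced ones (k = 2 and k = 0).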
Intuitive picture of social network in terms of densely linked clusters
How does structure interact with links?
Embeddedness of link (A,B): number of shared neighbors
[CHI ‘10]
Embeddedness of ties:
Positive ties tend to be more embedded
Positive ties tend to be more clumped together
Public display of signs (votes) in Wikipedia further attenuates this
[Figure panels: Epinions, Wikipedia]
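Embeddedness, and the comparison between positive and negative ties, can be computed directly. A minimal sketch, assuming edges are (u, v, sign) triples with sign +1 or -1, treated as undirected; embeddedness_by_sign is a hypothetical helper that averages the number of shared neighbors separately for + and – edges.

```python
from collections import defaultdict


def embeddedness_by_sign(edges):
    """Average embeddedness of + and - edges.

    edges: list of (u, v, sign) with sign +1 or -1, treated as undirected.
    Embeddedness of (u, v) = number of neighbors shared by u and v.
    """
    nbrs = defaultdict(set)
    for u, v, _s in edges:
        nbrs[u].add(v)
        nbrs[v].add(u)

    totals = {1: [0, 0], -1: [0, 0]}  # sign -> [sum of embeddedness, number of edges]
    for u, v, s in edges:
        shared = (nbrs[u] & nbrs[v]) - {u, v}
        totals[s][0] += len(shared)
        totals[s][1] += 1
    return {s: (t / n if n else 0.0) for s, (t, n) in totals.items()}
```

For a positive triangle 1-2-3 with a negative pendant edge 3-4, embeddedness_by_sign([(1, 2, 1), (2, 3, 1), (1, 3, 1), (3, 4, -1)]) gives {1: 1.0, -1: 0.0}: each + edge shares one neighbor, the – edge shares none.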
[CHI ‘10]
Clustering:
+net: More clustering than baseline
–net: Less clustering than baseline
Size of connected component:
+/–net: Smaller than the baseline
[CHI ‘10]
New setting:
Links are directed and created over time
[Figure: triad on A, B, X with directed signed edges]
16 * 2 signed directed triads: how many are now explained by balance?
Only half (8 out of 16)
Is there a better explanation? Yes. Status.
[CHI ‘10]
Links are directed and created over time
Status theory [Davis-Leinhardt ‘68, Guha et al. ’04, Leskovec et al. ‘10]
Positive link A → B means: B has higher status than A
Negative link A → B means: B has lower status than A
Status and balance give different predictions for the sign of A → B:
Both of X's links negative: Balance: +, Status: –
Both of X's links positive: Balance: +, Status: –
[Figure: the two directed triads on X, A, B]
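The two theories' predictions can be computed for any triad. A sketch under stated assumptions: edges are (src, dst, sign) triples using the labels "A", "B", "X"; X has status 0; a + link src → dst puts dst one step above src, and a – link puts dst one step below. The edge directions in the usage example are chosen for illustration, since the slide's arrows did not survive extraction; they are one assignment that reproduces "Balance: +, Status: –". node_status and predict_sign_ab are hypothetical names.

```python
def node_status(edge_xa, edge_xb):
    """Status of A and B relative to X (status 0), following status theory.

    Each edge is (src, dst, sign) with sign +1 or -1 and one endpoint "X".
    """
    status = {"X": 0}
    for src, dst, s in (edge_xa, edge_xb):
        if src == "X":
            status[dst] = s   # X -> dst: status(dst) = 0 + s
        else:
            status[src] = -s  # src -> X: 0 = status(src) + s
    return status


def predict_sign_ab(edge_xa, edge_xb):
    """Predicted sign of a new link A -> B as (balance, status)."""
    balance = edge_xa[2] * edge_xb[2]  # balance ignores directions
    st = node_status(edge_xa, edge_xb)
    diff = st["B"] - st["A"]
    status = (diff > 0) - (diff < 0)   # +1, -1, or 0 = no prediction
    return balance, status
```

With A → X negative and X → B negative, A sits above X and B below it, so predict_sign_ab(("A", "X", -1), ("X", "B", -1)) gives (1, -1); with X → A positive and B → X positive, A is again above B and the result is also (1, -1).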
[CHI ‘10]
Edges are directed
Edges are created over time
X has links to A and B
Now, A links to B (triad A-B-X)
How does the sign of A → B depend on the signs of X's links?
We need to formalize:
Links are embedded in triads: this provides context for signs
Users are heterogeneous in their linking behavior
[Figure: triad on X, A, B; the sign of A → B is unknown (?)]
[CHI ‘10]
Link (A,B) appears in the context (A,B; X)
16 different contextualized links:
[Figure: the 16 contextualized link types]
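The count of 16 follows from 2 directions × 2 signs for the link between A and X, times 2 directions × 2 signs for the link between B and X. A small enumeration; the numbering it prints is arbitrary and need not match the t1…t16 numbering used in the lecture.

```python
from itertools import product

# A context (A,B; X) is fixed by the direction and sign of the A-X link
# and the direction and sign of the B-X link.
A_X_DIRECTIONS = ("A->X", "X->A")
B_X_DIRECTIONS = ("B->X", "X->B")
SIGNS = ("+", "-")

contexts = list(product(A_X_DIRECTIONS, SIGNS, B_X_DIRECTIONS, SIGNS))

for i, (da, sa, db, sb) in enumerate(contexts, start=1):
    # Arbitrary numbering, not the lecture's t1..t16.
    print(f"context {i:2d}: {da} ({sa}), {db} ({sb})")

print(len(contexts))  # 16
```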
[CHI ‘10]
Surprise: how much the behavior of users deviates from the baseline in context t:
(A1, B1; X1), …, (An, Bn; Xn) … instances of contextualized link t
k of them closed with a plus
p_g(A_i) … generative baseline of A_i: empirical probability of A_i giving a plus
Then the generative surprise of triad type t is the standardized random variable:
s_g(t) = \frac{k - \sum_{i=1}^{n} p_g(A_i)}{\sqrt{\sum_{i=1}^{n} p_g(A_i)\,(1 - p_g(A_i))}}
In practice (what we really do):
1) For every node compute the baseline
2) Identify all the edges that close the same type of triad
3) Compute the surprise
[Figure: triad on X, A, B]
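The generative surprise formula translates directly into code. A minimal sketch, assuming directed (src, dst, sign) edges with sign +1 or -1 and that step 2 (grouping the links that close the same triad type) has already produced the list of instances; generative_baselines and generative_surprise are hypothetical names. Swapping in p_r(B_i), the fraction of B_i's incoming links that are positive, presumably yields the receptive surprise; that analogy is an assumption here.

```python
from collections import Counter
from math import sqrt


def generative_baselines(edges):
    """Step 1: p_g(A) = fraction of A's outgoing links that are positive.

    edges: list of directed (src, dst, sign) with sign +1 or -1.
    """
    out_all, out_pos = Counter(), Counter()
    for src, _dst, s in edges:
        out_all[src] += 1
        if s > 0:
            out_pos[src] += 1
    return {a: out_pos[a] / out_all[a] for a in out_all}


def generative_surprise(instances, p_g):
    """Step 3: s_g(t) for one contextualized link type t.

    instances: list of (A_i, closed_with_plus) for the n links of type t
               (step 2, grouping links by context, is assumed done already).
    p_g: dict of generative baselines, e.g. from generative_baselines().
    """
    k = sum(1 for _a, plus in instances if plus)
    expected = sum(p_g[a] for a, _plus in instances)
    variance = sum(p_g[a] * (1 - p_g[a]) for a, _plus in instances)
    if variance == 0:
        raise ValueError("surprise undefined: every baseline is 0 or 1")
    return (k - expected) / sqrt(variance)
```

For instance, four links whose creators each have baseline 0.5 and which all close with a plus give k = 4, an expected count of 2 and variance 1, so the surprise is 2.0: two standard deviations more positive than those users' baselines predict.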
Two basic examples:
[Figure: triad on X, A, B in which both of X's links are –]
Gen. surprise of A: —
Rec. surprise of B: —
[Figure: triad on X, A, B in which both of X's links are +]
Gen. surprise of A: —
Rec. surprise of B: —
END (when I spent 15 min finishing up the previous lecture)
[CHI ‘10]
Determine node status:
Assign X status 0
Based on the signs and directions of the edges, set the status of A and B
Example: X → A is + and X → B is +, so A and B both get status +1
Surprise is status-consistent if:
Gen. surprise has the same sign as the status of B
Rec. surprise has the opposite sign from the status of A
In the example, status-consistent means Gen. surprise > 0 and Rec. surprise < 0
Surprise is balance-consistent if it completes a balanced triad
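The consistency rules can be written as small predicates. A sketch under assumptions: the statuses of A and B relative to X are passed in directly, and "completes a balanced triad" is read as: the sign of the surprise matches the A-B sign that would balance the triad, i.e. the product of the signs of X's two links. That reading is an interpretation of the one-line rule, and all function names are hypothetical.

```python
def sign_of(x):
    """+1, -1, or 0."""
    return (x > 0) - (x < 0)


def status_consistent(gen_surprise, rec_surprise, status_a, status_b):
    """Generative surprise should share the sign of status(B);
    receptive surprise should have the opposite sign from status(A)."""
    gen_ok = sign_of(gen_surprise) == sign_of(status_b)
    rec_ok = sign_of(rec_surprise) == -sign_of(status_a)
    return gen_ok, rec_ok


def balance_consistent(surprise, sign_xa, sign_xb):
    """Assumed reading of 'completes a balanced triad': the surprise leans toward
    the A-B sign that balances the triad, i.e. sign_xa * sign_xb."""
    return sign_of(surprise) == sign_xa * sign_xb
```

In the example with A and B both at status +1, status_consistent(1.5, -2.0, 1, 1) returns (True, True), matching "Gen. surprise > 0, Rec. surprise < 0".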
[CHI ‘10]
Predictions:
[Table: generative surprise Sg(ti) and receptive surprise Sr(ti) per triad type (rows include t2, t3, t14, t15, t16), with columns Bg, Br, Sg, Sr marking agreement with balance and status]
[WWW ‘10]
Both theories make predictions about the global structure of the network
Structural balance – Factions
Find coalitions
Status theory – Global Status
Flip the direction and sign of minus edges
Assign each node a unique status so that edges point from low to high
[Figure: nodes ordered by status 1, 2, 3]
[WWW ‘10]
What fraction of the edges of the network satisfy balance and status?
Observations:
No evidence for global balance beyond the random baselines
Real data is 80% consistent vs. 80% consistency under the random baseline
Evidence for global status beyond the random baselines
Real data is 80% consistent, but only 50% consistent under the random baseline
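Both global measures reduce to counting agreeing edges once a split or a ranking is fixed. A minimal sketch, assuming directed (u, v, sign) edges and a given coalition assignment or status ranking (hypothetical inputs); searching for the split or ranking that maximizes these fractions is a separate optimization problem not shown here.

```python
def flip_negative(edges):
    """Status view of the network: a - edge u -> v becomes a + edge v -> u."""
    return [(u, v) if s > 0 else (v, u) for u, v, s in edges]


def fraction_status_consistent(edges, status):
    """Fraction of (flipped) edges that point from lower to higher status.

    edges: list of directed (u, v, sign); status: dict node -> unique value.
    """
    flipped = flip_negative(edges)
    return sum(status[u] < status[v] for u, v in flipped) / len(flipped)


def fraction_balance_consistent(edges, side):
    """Fraction of edges agreeing with a 2-coalition split:
    + edges inside a coalition, - edges between coalitions.

    side: dict node -> 0 or 1.
    """
    agree = sum((side[u] == side[v]) == (s > 0) for u, v, s in edges)
    return agree / len(edges)
```

For edges [("a", "b", 1), ("c", "b", -1), ("a", "c", 1)], the ranking a < b < c makes every flipped edge point upward (fraction 1.0), while the split {a, b} vs. {c} agrees with only 2 of the 3 edges.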
[WWW ‘10]
Edge sign prediction problem
Given a network and signs on all but one edge, predict the missing sign
Machine Learning Formulation:
Predict sign of edge (u,v)
Class label:
+1: positive edge
-1: negative edge
Dataset:
Original: 80% +edges
Balanced: 50% +edges
Learning method:
Logistic regression
Evaluation:
Accuracy and ROC curves
Features for learning:
Next slide
[Figure: signed network in which the sign of edge (u,v) is unknown (?)]
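The slides give only the target of the balanced dataset (50% + edges), not how it is built. A minimal sketch, assuming random downsampling of the majority class; balance_classes is a hypothetical name.

```python
import random


def balance_classes(labeled_edges, seed=0):
    """Downsample the majority class so + and - edges are equally common.

    labeled_edges: list of (edge, label) with label +1 or -1.
    Assumption: random downsampling; the slide only states the 50% target.
    """
    pos = [x for x in labeled_edges if x[1] > 0]
    neg = [x for x in labeled_edges if x[1] < 0]
    n = min(len(pos), len(neg))
    rng = random.Random(seed)  # fixed seed for reproducibility
    sample = rng.sample(pos, n) + rng.sample(neg, n)
    rng.shuffle(sample)
    return sample
```

Balancing matters for evaluation: on the original data with 80% + edges, always predicting + is already 80% accurate, while on the balanced data that trivial predictor drops to 50%.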
[WWW ‘10]
For each edge (u,v) create features:
Triad counts (16):
Counts of signed triads that edge uv takes part in
Node degree (7 features):
Signed degree:
d+out(u), d-out(u), d+in(v), d-in(v)
Total degree:
dout(u), din(v)
Embeddedness of edge (u,v)
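The feature list can be sketched as a function that turns an edge into a 23-dimensional vector (7 degree/embeddedness features plus 16 triad counts). A sketch under assumptions: the network is a list of directed (src, dst, sign) edges with sign +1 or -1, the edge (u, v) being predicted is excluded from that list, and the ordering of the 16 triad types is arbitrary. edge_features and TRIAD_TYPES are hypothetical names.

```python
from collections import defaultdict
from itertools import product

# The 16 triad types: (direction, sign) of the u-w link x (direction, sign) of the v-w link.
# "out" means the link goes from u (or v) to w, "in" means from w to u (or v).
TRIAD_TYPES = list(product(("out", "in"), (1, -1), ("out", "in"), (1, -1)))


def edge_features(edges, u, v):
    """Features for predicting the sign of edge (u, v).

    edges: list of directed (src, dst, sign), sign +1 or -1, NOT including (u, v).
    Returns 7 degree/embeddedness features followed by 16 triad counts.
    """
    out_pos, out_neg = defaultdict(int), defaultdict(int)
    in_pos, in_neg = defaultdict(int), defaultdict(int)
    links = defaultdict(list)  # node -> [(other endpoint, direction, sign)]
    for src, dst, s in edges:
        if s > 0:
            out_pos[src] += 1
            in_pos[dst] += 1
        else:
            out_neg[src] += 1
            in_neg[dst] += 1
        links[src].append((dst, "out", s))
        links[dst].append((src, "in", s))

    # Embeddedness: neighbors shared by u and v (ignoring direction).
    common = ({w for w, _, _ in links[u]} & {w for w, _, _ in links[v]}) - {u, v}

    triads = dict.fromkeys(TRIAD_TYPES, 0)
    for wu, du, su in links[u]:
        if wu not in common:
            continue
        for wv, dv, sv in links[v]:
            if wv == wu:
                triads[(du, su, dv, sv)] += 1

    degree = [
        out_pos[u], out_neg[u],   # d+out(u), d-out(u)
        in_pos[v], in_neg[v],     # d+in(v), d-in(v)
        out_pos[u] + out_neg[u],  # dout(u)
        in_pos[v] + in_neg[v],    # din(v)
        len(common),              # embeddedness of (u, v)
    ]
    return degree + [triads[t] for t in TRIAD_TYPES]
```

These vectors, labeled +1 or -1, are what a logistic regression classifier would be trained on.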
[WWW ‘10]
Classification Accuracy:
Epinions: 93.5%
Slashdot: 94.4%
Wikipedia: 81%
Signs can be modeled from local network structure alone
Trust propagation model of [Guha et al. ‘04] has 14% error on Epinions
Triad features perform less well for less embedded edges
Wikipedia is harder to model:
Votes are publicly visible
[Figure panels: Epinions, Slashdot, Wikipedia]
Do people use these very different linking systems by obeying the same principles?
How generalizable are the results across the datasets?
Train on the dataset in the row, predict on the dataset in the column
Nearly perfect generalization of the models even though the networks come from very different applications
Signed networks provide insight into how
social computing systems are used:
Status vs. Balance
Different role of reciprocated links
Role of embeddedness and public display
Sign of relationship can be reliably predicted
from the local network context
~90% accuracy in predicting the sign of an edge
More evidence that networks are globally organized based on status
People use signed edges consistently regardless of the particular application
Near perfect generalization of models across datasets
Many further directions:
Status difference of nodes A and B [ICWSM ‘10]
[Figure: x-axis Status difference (A-B), ranging from A<B through A=B to A>B]