Transcript Slide 1

CS224W: Social and Information Network Analysis
Jure Leskovec, Stanford University
http://cs224w.stanford.edu




Networks with positive and negative relationships
Our basic unit of investigation will be signed triangles
First we will talk about undirected nets, then directed
Plan for today:
▪ Model: Consider two social theories of signed nets
▪ Data: Reason about them in large online networks
▪ Application: Predict if A and B are linked with + or -
7/18/2015
Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu
2

Networks with positive and negative relationships

Consider an undirected complete graph
Label each edge as either:
▪ Positive: friendship, trust, positive sentiment, …
▪ Negative: enemy, distrust, negative sentiment, …

Examine triples of connected nodes A, B, C

Start with the intuition [Heider ’46]:
▪ Friend of my friend is my friend
▪ Enemy of enemy is my friend
▪ Enemy of friend is my enemy

Look at connected triples of nodes:
▪ Balanced (triangles + + + and + - -): Consistent with “friend of a friend” or “enemy of the enemy” intuition
▪ Unbalanced (triangles + + - and - - -): Inconsistent with the “friend of a friend” or “enemy of the enemy” intuition

Graph is balanced if every connected triple of nodes has:
▪ all 3 edges labeled +, or
▪ exactly 1 edge labeled +
[Figure: one unbalanced and one balanced example graph]
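
The definition above can be checked directly on a small complete graph. The following is a minimal sketch, assuming signs are stored as +1/-1 in a dictionary keyed by unordered node pairs; the function names and the data layout are illustrative and not part of the lecture.

```python
from itertools import combinations


def is_balanced_triangle(s_ab, s_bc, s_ac):
    """A triangle is balanced iff it has three + edges or exactly one + edge,
    which is the same as the product of the three signs being positive."""
    return s_ab * s_bc * s_ac > 0


def is_balanced_complete(nodes, sign):
    """Check every triple of a complete signed graph.

    sign maps frozenset({u, v}) -> +1 or -1 (assumed layout).
    """
    return all(
        is_balanced_triangle(
            sign[frozenset((a, b))],
            sign[frozenset((b, c))],
            sign[frozenset((a, c))],
        )
        for a, b, c in combinations(nodes, 3)
    )
```

The product-of-signs test is only a compact way of writing the two bullet cases; it adds no new condition.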


Balance implies global coalitions [Cartwright-Harary]
If all triangles are balanced, then either:
▪ The network contains only positive edges, or
▪ Nodes can be split into 2 sets where negative edges only point between the sets
[Figure: two sets L and R, with + edges inside each set]
Pick a node A and split the remaining nodes by their edge to A:
▪ L: A and the friends of A
▪ R: Enemies of A
Then:
▪ Any 2 nodes in L are friends
▪ Any 2 nodes in R are friends
▪ Every node in L is enemy of R
[Figure: example graph on nodes A, B, C, D, E split into sets L and R]
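
The split into friends and enemies of A can be sketched in code. This is an illustrative sketch under the same assumptions as the slide (complete graph, every edge signed); `split_into_coalitions` and the `sign` dictionary layout are hypothetical names chosen for the example.

```python
from itertools import combinations


def split_into_coalitions(nodes, sign, a):
    """Put a and its friends in L and its enemies in R, then verify the split.

    sign maps frozenset({u, v}) -> +1 or -1 (assumed layout).
    Returns (L, R) if every edge inside a set is + and every edge
    between the sets is -, otherwise None.
    """
    left = {a} | {v for v in nodes if v != a and sign[frozenset((a, v))] > 0}
    right = set(nodes) - left
    for u, v in combinations(nodes, 2):
        same_side = (u in left) == (v in left)
        expected = 1 if same_side else -1
        if sign[frozenset((u, v))] != expected:
            return None
    return left, right
```

If the graph contains only positive edges, R comes back empty, which corresponds to the first case of the theorem.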

International relations:
▪ Positive edge: alliance
▪ Negative edge: animosity

Separation of Bangladesh from Pakistan in 1971: US supports Pakistan. Why?
▪ USSR was enemy of China
▪ China was enemy of India
▪ India was enemy of Pakistan
▪ US was friendly with China
▪ China vetoed Bangladesh from U.N.
[Figure: signed graph on B (Bangladesh), P (Pakistan), U (US), C (China), I (India), R (USSR); the edges marked +? and –? are the ones to be predicted]

Balanced?

Def 1: Local view
▪ Fill in the missing edges to achieve balance

Def 2: Global view
▪ Divide the graph into two coalitions

The 2 defs. are equivalent!


Graph is balanced if and only if it contains no cycle with an odd number of negative edges.
How to compute this?
▪ Find connected components on + edges
▪ For each component create a super-node
▪ Connect components A and B if there is a negative edge between the members
▪ Assign super-nodes to sides using BFS


Using BFS assign each node a side
Graph is unbalanced if any two connected super-nodes are assigned the same side
[Figure: super-nodes labeled L and R; two connected super-nodes receive the same side, so the graph is unbalanced]
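
The procedure (components on + edges, super-nodes, BFS side assignment) can be sketched as follows. This is one possible implementation, not the lecture's code: it assumes an undirected edge list of `(u, v, sign)` tuples with sign +1 or -1, uses union-find for the + components, and also treats a negative edge inside a + component as unbalanced, since that edge closes a cycle with exactly one negative edge.

```python
from collections import defaultdict, deque


def is_balanced(nodes, edges):
    """Return True if the signed undirected graph has no cycle with an
    odd number of negative edges. edges: iterable of (u, v, sign)."""
    edges = list(edges)

    # Step 1: connected components on + edges (union-find).
    parent = {v: v for v in nodes}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for u, v, s in edges:
        if s > 0:
            parent[find(u)] = find(v)

    # Step 2: each component is a super-node; negative edges connect them.
    adj = defaultdict(set)
    for u, v, s in edges:
        if s < 0:
            cu, cv = find(u), find(v)
            if cu == cv:
                return False  # negative edge inside a + component
            adj[cu].add(cv)
            adj[cv].add(cu)

    # Step 3: BFS over super-nodes, alternating sides along negative edges.
    side = {}
    for start in {find(v) for v in nodes}:
        if start in side:
            continue
        side[start] = 0
        queue = deque([start])
        while queue:
            c = queue.popleft()
            for d in adj[c]:
                if d not in side:
                    side[d] = 1 - side[c]
                    queue.append(d)
                elif side[d] == side[c]:
                    return False  # two connected super-nodes on the same side
    return True
```

For example, `is_balanced("ABC", [("A", "B", 1), ("B", "C", -1), ("A", "C", -1)])` finds the coalitions {A, B} and {C} and reports a balanced graph.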
[CHI ‘10]

Each link AB is explicitly tagged with a sign:
 Epinions: Trust/Distrust
 Does A trust B’s product reviews?
(only positive links are visible)
 Wikipedia: Support/Oppose
 Does A support B to become
Wikipedia administrator?
+
–
+
–
–
+
–
+
+
+
–
+
+–
–
 Slashdot: Friend/Foe
 Does A like B’s comments?
 Other examples:
 Online multiplayer games
7/18/2015
Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu
21
[CHI ‘10]

Does structural balance hold?

Triad    Balanced   Epinions           Wikipedia
                    P(T)     P0(T)     P(T)     P0(T)
+ + +    yes        0.87     0.62      0.70     0.49
+ - -    yes        0.07     0.05      0.21     0.10
+ + -    no         0.05     0.32      0.08     0.49
- - -    no         0.007    0.003     0.011    0.010

P(T) … probability of a triad
P0(T) … triad probability if the signs were random
[Figure: real data vs. shuffled data]
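
A rough way to see where a random-sign baseline like P0(T) comes from: if each edge were independently + with probability p, the four undirected triad types would follow a binomial pattern. The sketch below assumes this independence model, which only approximates shuffling signs on a fixed network; the function name and the value of p in the usage note are hypothetical.

```python
def random_sign_triad_probs(p):
    """Triad type probabilities when each edge is + with probability p,
    independently of the other edges (an assumed simplification)."""
    q = 1.0 - p
    return {
        "+++": p ** 3,          # all three edges positive
        "++-": 3 * p * p * q,   # exactly one negative edge (3 positions)
        "+--": 3 * p * q * q,   # exactly one positive edge (3 positions)
        "---": q ** 3,          # all three edges negative
    }
```

Calling `random_sign_triad_probs(0.85)` with a hypothetical 85% share of positive edges shows the expected pattern: + + + dominates, and + + - is far more likely than + - -.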

Intuitive picture of social network in terms of densely linked clusters

How does structure interact with links?

Embeddedness of link (A,B): Number of shared neighbors
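
Embeddedness as defined here is a set intersection. A minimal sketch, assuming the network is given as a dictionary from each node to the set of its neighbors (an illustrative layout, not one prescribed by the lecture):

```python
def embeddedness(adj, a, b):
    """Number of neighbors shared by a and b.

    adj maps node -> set of neighbor nodes (assumed layout).
    """
    return len((adj[a] & adj[b]) - {a, b})
```

The endpoints themselves are excluded so that the link (A,B) does not count toward its own embeddedness.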
[CHI ‘10]

Embeddedness of ties:
▪ Positive ties tend to be more embedded

Positive ties tend to be more clumped together
Public display of signs (votes) in Wikipedia further attenuates this
[Figure panels: Epinions, Wikipedia]
[CHI ‘10]

Clustering:
▪ +net: More clustering than baseline
▪ –net: Less clustering than baseline

Size of connected component:
▪ +/–net: Smaller than the baseline
[CHI ‘10]

New setting:
▪ Links are directed and created over time
[Figure: nodes A, B, X; the 16 * 2 signed directed triads]

How many triads are now explained by balance?
▪ Only half (8 out of 16)
Is there a better explanation? Yes. Status.
[CHI ‘10]


Links are directed and created over time
Status theory [Davis-Leinhardt ‘68, Guha et al. ’04, Leskovec et al. ‘10]
▪ Link A -> B with sign + means: B has higher status than A
▪ Link A -> B with sign – means: B has lower status than A

Status and balance give different predictions:
▪ Triad on A, X, B with two – edges: Balance: +, Status: –
▪ Triad on A, X, B with two + edges: Balance: +, Status: –
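
The two theories can be compared on a single triad in a few lines. This is a sketch under stated assumptions: X is fixed at status 0, each of A and B has exactly one signed edge to or from X, and edge directions are passed in explicitly because they are not recoverable from the text above. All function and parameter names are illustrative.

```python
def balance_prediction(sign_ax, sign_xb):
    """Balance ignores direction: the closing edge should make the
    product of the three signs positive."""
    return sign_ax * sign_xb


def status_relative_to_x(node_to_x, sign):
    """Status of a node when X has status 0.

    node_to_x is True if the edge points from the node to X.
    A + edge means the target is higher, a - edge means it is lower.
    """
    return -sign if node_to_x else sign


def status_prediction(a_to_x, sign_ax, b_to_x, sign_bx):
    """Predicted sign of A -> B: + if B ends up higher than A, - if lower,
    0 if the two statuses tie (no prediction)."""
    diff = status_relative_to_x(b_to_x, sign_bx) - status_relative_to_x(a_to_x, sign_ax)
    return (diff > 0) - (diff < 0)
```

With the assumed directions A -> X (–) and X -> B (–), `balance_prediction(-1, -1)` gives +1 while `status_prediction(True, -1, False, -1)` gives -1, matching the "Balance: +, Status: –" disagreement.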
[CHI ‘10]


Edges are directed
Edges are created over time
▪ X has links to A and B
▪ Now, A links to B (triad A-B-X)
▪ How does the sign of A-B depend on the signs of the links from X?

We need to formalize:
▪ Links are embedded in triads:
  ○ Provides context for signs
▪ Users are heterogeneous in their linking behavior
[Figure: triad on A, B, X where the sign of the new link A-B is unknown]
[CHI ‘10]

Link (A,B) appears in the context (A,B; X)

16 different contextualized links:
[Figure: the 16 contextualized link types]
[CHI ‘10]

Surprise: How much behavior of user deviates from baseline in context t:
▪ (A1, B1; X1), …, (An, Bn; Xn) … instances of contextualized link t
▪ k of them closed with a plus
▪ pg(Ai) … generative baseline of Ai: empirical prob. of Ai giving a plus

Then: generative surprise of triad type t (a standardized random variable):
  s_g(t) = \frac{k - \sum_{i=1}^{n} p_g(A_i)}{\sqrt{\sum_{i=1}^{n} p_g(A_i)\,(1 - p_g(A_i))}}

Give a better explanation of what we really do (2 slides):
1) For every node compute the baseline
2) Identify all the edges that close same type of triads
3) Compute surprise
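
The generative surprise is a standardized count: the observed number of plus edges minus its expectation under the baselines, divided by the standard deviation. A minimal sketch, assuming the baselines pg(Ai) have already been computed per instance; the function name and argument layout are illustrative.

```python
import math


def generative_surprise(k, baselines):
    """Standardized deviation of k (number of links closed with a plus)
    from the sum of the generative baselines p_g(A_i).

    baselines: list of p_g(A_i), one entry per instance of triad type t.
    """
    expected = sum(baselines)
    variance = sum(p * (1.0 - p) for p in baselines)
    if variance == 0:
        raise ValueError("surprise is undefined when every baseline is 0 or 1")
    return (k - expected) / math.sqrt(variance)
```

With four instances whose baselines are all 0.5, the expectation is 2 and the variance is 1, so observing k = 4 plus edges gives a surprise of 2.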

Two basic examples:
Triad on A, X, B with two – edges:
Gen. surprise of A: —
Rec. surprise of B: —
Triad on A, X, B with two + edges:
Gen. surprise of A: —
Rec. surprise of B: —

END (when I spent 15 min for finishing up the
previous lecture)
[CHI ‘10]

Determine node status:
▪ Assign X status 0
▪ Based on signs and directions of edges set status of A and B

Surprise is status-consistent, if:
▪ Gen. surprise is status-consistent if it has same sign as status of B
▪ Rec. surprise is status-consistent if it has the opposite sign from the status of A

Example: X (status 0) links with + to both A and B, so A and B both have status +1
Status-consistent if:
▪ Gen. surprise > 0
▪ Rec. surprise < 0

Surprise is balance-consistent, if:
▪ It completes a balanced triad
[CHI ‘10]

Predictions:
[Table: surprise values Sg(ti) and Sr(ti) per triad type, with consistency columns Bg, Br, Sg, Sr; rows shown include t3, t15, t2, t14, t16]
[WWW ‘10]

Both theories make predictions about the global structure of the network

Structural balance – Factions
▪ Find coalitions

Status theory – Global Status
▪ Flip direction and sign of minus edges
▪ Assign each node a unique status so that edges point from low to high
[Figure: nodes with status 1, 2, 3 and edges pointing from low to high]
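
The global status recipe (flip minus edges, then order nodes so every edge points from low to high) amounts to a topological sort of the flipped graph. The sketch below uses Kahn's algorithm, which is a choice made for this example rather than something the lecture specifies; the edge layout `(u, v, sign)` and the function name are assumptions.

```python
from collections import defaultdict, deque


def assign_global_status(edges):
    """edges: iterable of directed (u, v, sign) with sign +1 or -1.

    Returns a dict node -> status (1 = lowest) such that every edge,
    after flipping the minus edges, points from low to high status.
    Returns None if no such ordering exists (the flipped graph has a cycle).
    """
    succ = defaultdict(set)
    indeg = defaultdict(int)
    nodes = set()
    for u, v, s in edges:
        if s < 0:
            u, v = v, u  # flip direction (and, implicitly, sign) of minus edges
        nodes.update((u, v))
        if v not in succ[u]:
            succ[u].add(v)
            indeg[v] += 1

    # Kahn's algorithm: repeatedly take a node with no remaining lower-status parent.
    queue = deque(n for n in nodes if indeg[n] == 0)
    status = {}
    while queue:
        n = queue.popleft()
        status[n] = len(status) + 1
        for m in succ[n]:
            indeg[m] -= 1
            if indeg[m] == 0:
                queue.append(m)
    return status if len(status) == len(nodes) else None
```

On real data a perfect ordering rarely exists, so in practice one would measure the fraction of edges consistent with a best-effort ordering; this sketch only covers the exact case.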
[WWW ‘10]

Fraction of edges of the network that satisfy Balance and Status?

Observations:
▪ No evidence for global balance beyond the random baselines
  ○ Real data is 80% consistent vs. 80% consistency under random baseline
▪ Evidence for global status beyond the random baselines
  ○ Real data is 80% consistent, but 50% consistency under random baseline
[WWW ‘10]
Edge sign prediction problem
▪ Given a network and signs on all but one edge, predict the missing sign

Machine Learning Formulation:
▪ Predict sign of edge (u,v)
▪ Class label:
  ○ +1: positive edge
  ○ -1: negative edge
▪ Dataset:
  ○ Original: 80% +edges
  ○ Balanced: 50% +edges

Learning method:
▪ Logistic regression

Evaluation:
▪ Accuracy and ROC curves

Features for learning:
▪ Next slide
[Figure: signed network with nodes u and v; the sign of edge (u,v) is unknown]
[WWW ‘10]
For each edge (u,v) create features:
▪ Triad counts (16):
  ○ Counts of signed triads edge uv takes part in
▪ Node degree (7 features):
  ○ Signed degree: d+out(u), d-out(u), d+in(v), d-in(v)
  ○ Total degree: dout(u), din(v)
  ○ Embeddedness of edge (u,v)
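
The seven degree features can be computed from a plain edge list. The sketch below is illustrative only: it assumes directed edges stored as `(src, dst, sign)` tuples, hides the edge (u,v) itself before counting, and takes embeddedness to be the number of shared neighbors regardless of edge direction. The 16 triad counts are not covered here.

```python
def degree_features(edges, u, v):
    """Degree features for predicting the sign of edge (u, v).

    edges: iterable of directed (src, dst, sign) with sign +1 or -1.
    """
    # Hide the edge whose sign is being predicted.
    known = [(a, b, s) for a, b, s in edges if (a, b) != (u, v)]

    out_pos = sum(1 for a, _, s in known if a == u and s > 0)
    out_neg = sum(1 for a, _, s in known if a == u and s < 0)
    in_pos = sum(1 for _, b, s in known if b == v and s > 0)
    in_neg = sum(1 for _, b, s in known if b == v and s < 0)

    def neighbors(x):
        # Neighbors in either direction.
        return {b for a, b, _ in known if a == x} | {a for a, b, _ in known if b == x}

    return {
        "d+out(u)": out_pos,
        "d-out(u)": out_neg,
        "d+in(v)": in_pos,
        "d-in(v)": in_neg,
        "dout(u)": out_pos + out_neg,
        "din(v)": in_pos + in_neg,
        "embeddedness": len((neighbors(u) & neighbors(v)) - {u, v}),
    }
```

The resulting dictionary can be flattened into a feature vector for a logistic regression model, alongside the triad counts.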
[WWW ‘10]

Classification Accuracy:
▪ Epinions: 93.5%
▪ Slashdot: 94.4%
▪ Wikipedia: 81%

Signs can be modeled from local network structure alone
▪ Trust propagation model of [Guha et al. ‘04] has 14% error on Epinions

Triad features perform less well for less embedded edges
Wikipedia is harder to model:
▪ Votes are publicly visible
[Figure panels: Epin, Slash, Wiki]

Do people use these very different linking systems by obeying the same principles?
▪ How generalizable are the results across the datasets?
  ○ Train on row “dataset”, predict on “column”

Nearly perfect generalization of the models even though networks come from very different applications

Signed networks provide insight into how social computing systems are used:
▪ Status vs. Balance
▪ Different role of reciprocated links
▪ Role of embeddedness and public display

Sign of relationship can be reliably predicted from the local network context
▪ ~90% accuracy in predicting the sign of the edge


More evidence that networks are globally organized based on status
People use signed edges consistently regardless of particular application
▪ Near perfect generalization of models across datasets

Many further directions:
▪ Status difference of nodes A and B [ICWSM ‘10]:
[Figure: x-axis: status difference (A-B), with regions A<B, A=B, A>B]