Centrality in Social Networks
Kristina Lerman
University of Southern California
CS 599: Social Media Analysis
Network analysis
4.2
Three views of social network analysis
1. Freeman, L. 1979 “Centrality in Social Networks: Conceptual
Clarification”, Social Networks 1, No. 3.
2. Bonacich, P. 1987 “Power and Centrality in Social Networks:
a Family of Measures”, American Journal of Sociology
3. Franceschet, M. 2011 “PageRank: Standing on the Shoulders
of Giants”, Commun. ACM, Vol. 54, pp. 92-101.
Power in social networks
Certain positions within the network give nodes more power
– Directly affect/influence others
– Control the flow of information
– Avoid control of others
Centrality in social networks
• Centrality encodes the relationship between structure and
power in groups
Certain positions within the network give nodes more
power or importance
• How do we measure importance?
– Who can directly affect/influence others?
• Highest degree nodes are “in the thick of it”
– Who controls information flow?
• Nodes that fall on shortest paths between others can disrupt the
flow of information between them
– Who can quickly inform most others?
• Nodes who are close to other nodes can quickly get information to
them
Degree centrality
• The number of others a node is connected to
– Node with high degree has high potential communication
activity
node   Indegree   Outdegree   Total degree
1      0          1           1
2      3          2           5
3      1          3           4
4      2          1           3
5      2          1           3
[Figure: example directed network with nodes 1-5]
Mathematical representation and operations on graphs
Adjacency matrix A
(rows: source node, columns: target node)
     1  2  3  4  5
1    0  1  0  0  0
2    0  0  1  1  0
3    0  1  0  1  1
4    0  0  0  0  1
5    0  1  0  0  0
Out-degree: row sum
$d_i^{out} = \sum_j A_{ij}$
In-degree: column sum
$d_i^{in} = \sum_j A_{ji}$
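The row-sum and column-sum rules can be checked directly on the example matrix. This is a minimal sketch in plain Python; the variable names are illustrative and not from the slides.

```python
# Adjacency matrix of the 5-node example: A[i][j] = 1 means an edge
# from node i+1 to node j+1 (rows are sources, columns are targets).
A = [
    [0, 1, 0, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 1, 0, 1, 1],
    [0, 0, 0, 0, 1],
    [0, 1, 0, 0, 0],
]
n = len(A)

out_degree = [sum(A[i]) for i in range(n)]                      # row sums
in_degree = [sum(A[j][i] for j in range(n)) for i in range(n)]  # column sums
total_degree = [i + o for i, o in zip(in_degree, out_degree)]

for node in range(n):
    print(node + 1, in_degree[node], out_degree[node], total_degree[node])
```

The printed rows should agree with the degree table for nodes 1-5.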
Betweenness centrality
• Number of shortest paths (geodesics) connecting all pairs of
other nodes that pass through a given node
– Node with highest betweenness can potentially control or
distort communication
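Shortest paths in the example network, as listed on the slide (one per pair of nodes):
from 1: 1→2, 1→2→3, 1→2→4, 1→2→4→5
from 2: 2→3, 2→4, 2→4→5
from 3: 3→2, 3→4, 3→5
from 4: 4→5, 4→5→2, 4→5→2→3
from 5: 5→2, 5→2→3, 5→2→4
[Figure: example directed network with nodes 1-5]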
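A rough way to reproduce the path counting is a breadth-first search from every node. This sketch enumerates all shortest paths and counts how many pass through each node as an interior node. It counts ties as well, so 2→3→5 is counted alongside 2→4→5. Freeman's measure weights tied paths fractionally, which this simple count does not do. The function names are illustrative.

```python
from collections import deque

# Directed example network as an adjacency list (node -> successors).
graph = {1: [2], 2: [3, 4], 3: [2, 4, 5], 4: [5], 5: [2]}

def shortest_paths_from(source):
    """Return every shortest path (list of nodes) from source to each reachable node."""
    dist = {source: 0}
    preds = {source: []}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:              # first time v is reached
                dist[v] = dist[u] + 1
                preds[v] = [u]
                queue.append(v)
            elif dist[v] == dist[u] + 1:   # another equally short route to v
                preds[v].append(u)

    def expand(node):
        # Rebuild all shortest paths ending at node by walking predecessors.
        if node == source:
            return [[source]]
        return [path + [node] for p in preds[node] for path in expand(p)]

    return [path for target in dist if target != source for path in expand(target)]

def path_counts():
    """Count, for each node, the shortest paths passing through it as an interior node."""
    counts = {node: 0 for node in graph}
    for source in graph:
        for path in shortest_paths_from(source):
            for interior in path[1:-1]:
                counts[interior] += 1
    return counts

betweenness = path_counts()
print(betweenness)
```

Node 2 comes out on top, which matches the intuition that most routes in this network funnel through it.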
Closeness centrality
• Node that is closest to other nodes can reach other nodes in
shortest amount of time
– Can best avoid being controlled by others
• Closeness centrality is based on the sum of geodesic distances from a node
to all other nodes: the smaller the sum, the closer (more central) the node
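The distance sums can be computed with a breadth-first search from each node. This sketch uses the 5-node directed example. Because that network is directed, some nodes cannot be reached from others (nothing links to node 1), so the sum here runs only over reachable nodes. That restriction is an assumption of this sketch, not something stated on the slides.

```python
from collections import deque

# Directed example network as an adjacency list (node -> successors).
graph = {1: [2], 2: [3, 4], 3: [2, 4, 5], 4: [5], 5: [2]}

def geodesic_distances(source):
    """Breadth-first search: number of hops from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

distance_sum = {}   # sum of geodesic distances to reachable nodes (smaller = closer)
reachable = {}      # how many other nodes each node can reach
for node in graph:
    dist = geodesic_distances(node)
    distance_sum[node] = sum(dist.values())
    reachable[node] = len(dist) - 1

for node in graph:
    print(node, distance_sum[node], reachable[node])
```

Node 3 has the smallest sum among the nodes that reach three others, since all of its targets are one hop away.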
Self-consistent measures of centrality
• Katz (1953): Katz score
– “not only on how many others a person is connected to,
but who he is connected to”
– One’s status is determined by the status of the people
s/he is connected to
• Bonacich (1972): Eigenvector centrality
– Node’s centrality is the sum of the centralities of its
connections
$\lambda e_i = \sum_j A_{ij} e_j$, or in matrix form $\lambda e = A e$
– e is the eigenvector of A, and $\lambda$ its associated eigenvalue
(largest eigenvalue)
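One common way to find e is power iteration: multiply a starting vector by A repeatedly and rescale. The sketch below applies it to the 5-node example matrix. The iteration count and function names are assumptions chosen for illustration.

```python
import math

# Adjacency matrix of the 5-node example (rows are sources, columns are targets).
A = [
    [0, 1, 0, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 1, 0, 1, 1],
    [0, 0, 0, 0, 1],
    [0, 1, 0, 0, 0],
]

def matvec(M, v):
    """Matrix-vector product M v."""
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def eigenvector_centrality(M, iterations=200):
    """Power iteration: returns (estimate of largest eigenvalue, unit-length eigenvector)."""
    n = len(M)
    e = [1.0 / math.sqrt(n)] * n                 # unit-length starting vector
    lam = 0.0
    for _ in range(iterations):
        w = matvec(M, e)
        lam = math.sqrt(sum(x * x for x in w))   # ||A e|| approaches the eigenvalue
        e = [x / lam for x in w]                 # rescale to unit length
    return lam, e

lam, e = eigenvector_centrality(A)
residual = max(abs(a - lam * b) for a, b in zip(matvec(A, e), e))
print(round(lam, 4), [round(x, 3) for x in e], residual)
```

With this convention a node's score is built from the nodes it points to. For this graph the largest eigenvalue is about 1.618, and node 3 receives the highest score.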
Alpha-centrality (Bonacich, 1987)
• Similar to eigenvector centrality, but the degree to which a
node's centrality contributes to the centralities of other nodes
depends on parameter $\alpha$.
$c_i(\alpha) = \sum_j \left(1 + \alpha\, c_j(\alpha)\right) A_{ij}$
• Mathematical interpretation
– $c_i(\alpha)$ is the expected number of paths activated directly or
indirectly by a node i
$c_i(\alpha) = \sum_j \left(A + \alpha A^2 + \alpha^2 A^3 + \dots\right)_{ij}$
A closer look at Alpha-Centrality
• Alpha-Centrality matrix
$c(\alpha) = A + \alpha A^2 + \alpha^2 A^3 + \dots$
• 1st term: number of paths of length 1 (edges) between i and j
A =
0 1 0 0 0
0 0 1 1 0
0 1 0 1 1
0 0 0 0 1
0 1 0 0 0
• Contribution of this term to $c_i(\alpha)$ is $\sum_j A_{ij}$
A closer look at Alpha-Centrality
• Alpha-Centrality matrix
$c(\alpha) = A + \alpha A^2 + \alpha^2 A^3 + \dots$
• 2nd term: number of paths of length 2 between i and j
A           x   A           =   A^2
0 1 0 0 0       0 1 0 0 0       0 0 1 1 0
0 0 1 1 0       0 0 1 1 0       0 1 0 1 2
0 1 0 1 1       0 1 0 1 1       0 1 1 1 1
0 0 0 0 1       0 0 0 0 1       0 1 0 0 0
0 1 0 0 0       0 1 0 0 0       0 0 1 1 0
A closer look at Alpha-Centrality
• Alpha-Centrality matrix
$c(\alpha) = A + \alpha A^2 + \alpha^2 A^3 + \dots$
• 3rd term: number of paths of length 3 between i and j
A^2         x   A           =   A^3
0 0 1 1 0       0 1 0 0 0       0 1 0 1 2
0 1 0 1 2       0 0 1 1 0       0 2 1 1 1
0 1 1 1 1       0 1 0 1 1       0 2 1 2 2
0 1 0 0 0       0 0 0 0 1       0 0 1 1 0
0 0 1 1 0       0 1 0 0 0       0 1 0 1 2
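The matrix products on these slides can be reproduced with a few lines of plain Python. Entry (i, j) of the k-th power counts the routes of k edges from node i to node j. These routes may revisit nodes, for example 2→3→2. The helper name `matmul` is illustrative.

```python
# Adjacency matrix of the 5-node example (rows are sources, columns are targets).
A = [
    [0, 1, 0, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 1, 0, 1, 1],
    [0, 0, 0, 0, 1],
    [0, 1, 0, 0, 0],
]

def matmul(X, Y):
    """Product of two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

A2 = matmul(A, A)    # routes of length 2
A3 = matmul(A2, A)   # routes of length 3

for row in A2:
    print(row)
print()
for row in A3:
    print(row)
```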
A closer look at Alpha-Centrality
• Alpha-Centrality matrix
$c(\alpha) = A + \alpha A^2 + \alpha^2 A^3 + \dots = \sum_{k=0}^{\infty} \alpha^k A^{k+1}$
• Number of paths diverges as the path length k grows
• To keep the infinite sum finite, $\alpha < 1/\lambda_1$, where $\lambda_1$ is the
largest eigenvalue of A (also called radius of centrality)
$c(\alpha) = A \sum_{k=0}^{\infty} \alpha^k A^k = (I - \alpha A)^{-1} A$
• Interpretation: Node’s centrality is the sum of paths of any
length connecting it to other nodes, exponentially attenuated
by the length of the path, so that longer paths contribute less
than shorter paths
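As a sketch of the series, the code below accumulates the row sums of $\alpha^k A^{k+1}$ term by term for the 5-node example. It then checks the result against the self-consistent definition $c_i(\alpha) = \sum_j (1 + \alpha c_j(\alpha)) A_{ij}$. The value $\alpha = 0.3$ and the number of terms are illustrative assumptions. For this graph $1/\lambda_1$ is about 0.618, so 0.3 is safely inside the convergent range.

```python
# Adjacency matrix of the 5-node example (rows are sources, columns are targets).
A = [
    [0, 1, 0, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 1, 0, 1, 1],
    [0, 0, 0, 0, 1],
    [0, 1, 0, 0, 0],
]
n = len(A)

def matvec(M, v):
    """Matrix-vector product M v."""
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def alpha_centrality(M, alpha, terms=200):
    """Row sums of sum_k alpha^k M^(k+1), truncated after `terms` extra terms."""
    term = matvec(M, [1.0] * len(M))                  # k = 0: row sums of M
    total = term[:]
    for _ in range(terms):
        term = [alpha * x for x in matvec(M, term)]   # next term of the series
        total = [t + x for t, x in zip(total, term)]
    return total

alpha = 0.3   # illustrative value below 1/lambda_1
c = alpha_centrality(A, alpha)

# Self-consistency check: c_i = sum_j (1 + alpha * c_j) * A_ij
rhs = [sum((1 + alpha * c[j]) * A[i][j] for j in range(n)) for i in range(n)]
max_gap = max(abs(x - y) for x, y in zip(c, rhs))
print([round(x, 4) for x in c], max_gap)
```

A direct linear solve of $(I - \alpha A)\,c = A\mathbf{1}$ would give the same vector. The series form is used here only because it mirrors the slide.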
Radius of centrality
Parameter $\alpha$ sets the length scale of communication or
interactions.
• For $\alpha = 0$, only local interactions (with neighbors) are
considered
– Only local structure is important
– Centrality is the same as degree centrality
• As $\alpha$ grows, the length of interaction grows
– Global structure becomes more important
– Centrality depends on node’s position within a larger
structure, e.g., a community
• As $\alpha \to 1/\lambda_1$, length of interactions becomes infinite
– Global structure is important
– Centrality is the same as eigenvector centrality
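Both limits can be illustrated numerically on the 5-node example. This is a sketch under stated assumptions. With the row-sum convention used here, "degree" means out-degree. The upper limit is approximated by $\alpha = 0.99/\lambda_1$, with $\lambda_1$ estimated by power iteration. The factor 0.99 and the iteration counts are arbitrary choices.

```python
import math

# Adjacency matrix of the 5-node example (rows are sources, columns are targets).
A = [
    [0, 1, 0, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 1, 0, 1, 1],
    [0, 0, 0, 0, 1],
    [0, 1, 0, 0, 0],
]

def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def alpha_centrality(M, alpha, terms):
    """Row sums of sum_k alpha^k M^(k+1), truncated after `terms` extra terms."""
    term = matvec(M, [1.0] * len(M))
    total = term[:]
    for _ in range(terms):
        term = [alpha * x for x in matvec(M, term)]
        total = [t + x for t, x in zip(total, term)]
    return total

def dominant_eigenpair(M, iterations=200):
    """Power iteration for the largest eigenvalue and its eigenvector."""
    e = [1.0] * len(M)
    lam = 1.0
    for _ in range(iterations):
        w = matvec(M, e)
        lam = math.sqrt(sum(x * x for x in w))
        e = [x / lam for x in w]
    return lam, e

def cosine(x, y):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(a * b for a, b in zip(x, y))
    return dot / math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))

out_degree = [sum(row) for row in A]
c_local = alpha_centrality(A, 0.0, terms=1)             # alpha = 0
lam, e = dominant_eigenpair(A)
c_global = alpha_centrality(A, 0.99 / lam, terms=5000)  # alpha close to 1/lambda_1

print(c_local, out_degree)
print(cosine(c_global, e))
```

At $\alpha = 0$ the scores equal the out-degrees. Near $1/\lambda_1$ the score vector points in almost the same direction as the eigenvector, so the two rankings coincide.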
Normalized Alpha-Centrality [Ghosh & Lerman 2011]
• Alpha-Centrality diverges for $\alpha > 1/\lambda_1$
• Solution: Normalized Alpha-Centrality
$c_N(\alpha) = \dfrac{c(\alpha)}{\sum_{i,j} c_{ij}(\alpha)}$
– Holds for $0 \le \alpha \le 1$
Multi-scale analysis with Alpha-Centrality
• Parameter $\alpha$ allows for multi-scale analysis of networks
– Differentiate between local and global structures
• Study how rankings change with $\alpha$
• Leaders: high influence on group members
– Nodes with high centrality for small values of $\alpha$
• Bridges: mediate communication between groups
– Nodes with low centrality for small values of $\alpha$
– But high centrality for large values of $\alpha$
• Peripherals: poorly connected to everyone
– Nodes with low centrality for any value of $\alpha$
Karate club network [Zachary, 1977]
[Figure: karate club network, with the administrator and the instructor labeled]
Ranking karate club members
Centrality scores of nodes vs. $\alpha$
Florentine families in 15th century Italy
Ranking Florentine families
Summary
• Network position confers advantages or disadvantages to a
node, but how you measure it depends on what you mean by
advantage
– Ability to directly reach many nodes → degree centrality
– Ability to control information → betweenness centrality
– Ability to avoid control → closeness centrality
• Self-consistent definitions of centrality
– Node’s centrality depends on centrality of those it is
connected to, directly or indirectly, but contribution of
distant nodes is attenuated by how far they are
• Attenuation parameter sets the length scale of interactions
• Can probe structure at different scales by varying this parameter
PageRank: Standing on the Shoulders of Giants [Franceschet]
Key insights
• Analyzes the structure of the web of hyperlinks to determine
importance score of web pages
– A web page is important if it is pointed to by other
important pages
• An algorithm with deep mathematical roots
– Random walks
– Social network theory
PageRank and the Random Surfer
Random Surfer
• Starts at arbitrary page
• Bounces from page to page by following links randomly
• PageRank score of a web page is the relative number of
times it is visited by the Random Surfer
[Figure: example web graph with pages A, B, C, D, E, F, G, H, I, L, M]
Mathematics of PageRank
• PageRank is a solution to a random walk on a graph
Adjacency matrix of the graph A (rows: linking page, columns: linked page)
    A  B  C  D  E  F  G  H  I  L  M
A   0  0  0  0  0  0  0  0  0  0  0
B   0  0  1  0  0  0  0  0  0  0  0
C   0  1  0  0  0  0  0  0  0  0  0
D   1  1  0  0  0  0  0  0  0  0  0
E   0  1  0  1  0  1  0  0  0  0  0
F   0  1  0  0  1  0  0  0  0  0  0
G   0  1  0  0  1  0  0  0  0  0  0
H   0  1  0  0  1  0  0  0  0  0  0
I   0  1  0  0  1  0  0  0  0  0  0
L   0  0  0  0  1  0  0  0  0  0  0
M   0  0  0  0  1  0  0  0  0  0  0
Mathematics of PageRank
• PageRank is a solution to a random walk on a graph
(Diagonal) Out-degree matrix D: the out-degree of each page on the diagonal, zeros elsewhere
page        A  B  C  D  E  F  G  H  I  L  M
out-degree  0  1  1  2  3  2  2  2  2  1  1
Mathematics of PageRank
• PageRank is a solution to a random walk on a graph
• $h_{ij}$ is the probability to go from node i to node j
$h_{ij} = A_{ij}/d_i$, i.e. $H = D^{-1}A$ (the row of a page with no outlinks, here A, stays all zeros)
     A    B    C    D    E    F    G    H    I    L    M
A    0    0    0    0    0    0    0    0    0    0    0
B    0    0    1    0    0    0    0    0    0    0    0
C    0    1    0    0    0    0    0    0    0    0    0
D    .5   .5   0    0    0    0    0    0    0    0    0
E    0    .33  0    .33  0    .33  0    0    0    0    0
F    0    .5   0    0    .5   0    0    0    0    0    0
G    0    .5   0    0    .5   0    0    0    0    0    0
H    0    .5   0    0    .5   0    0    0    0    0    0
I    0    .5   0    0    .5   0    0    0    0    0    0
L    0    0    0    0    1    0    0    0    0    0    0
M    0    0    0    0    1    0    0    0    0    0    0
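The transition matrix can be built directly from the link structure. This sketch encodes the 11-page example as a dictionary of out-links, read off the adjacency matrix, and divides each row by the page's out-degree. The names `links`, `pages` and `index` are illustrative.

```python
# Out-links of the 11-page example graph (page -> pages it links to).
links = {
    "A": [], "B": ["C"], "C": ["B"], "D": ["A", "B"], "E": ["B", "D", "F"],
    "F": ["B", "E"], "G": ["B", "E"], "H": ["B", "E"], "I": ["B", "E"],
    "L": ["E"], "M": ["E"],
}
pages = list(links)
index = {page: k for k, page in enumerate(pages)}
n = len(pages)

# h_ij = 1/d_i if page i links to page j, else 0; dangling pages keep a zero row.
H = [[0.0] * n for _ in range(n)]
for page, targets in links.items():
    for target in targets:
        H[index[page]][index[target]] = 1.0 / len(targets)

row_sums = {page: sum(H[index[page]]) for page in pages}
for page in pages:
    print(page, [round(x, 2) for x in H[index[page]]])
```

Every row sums to 1 except the row of A, which has no outlinks and therefore sums to 0.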
Mathematics of PageRank
• PageRank of page j is defined recursively as
$p_j = \sum_i p_i h_{ij}$
– Or in matrix form $p = pH$
• What contributes to PageRank score?
– Number of links page j receives
• Cf B and D
– Number of outgoing links of linking pages
• Cf E’s effect on F and B’s effect on C
– PageRank scores of linking pages
• Cf E and B
… but there are problems
• Random Surfer gets trapped by dangling nodes! (no outlinks)
• Solution: matrix S
– Replace zero rows in H with u = [.09, .09, …, .09] (each entry is 1/11)
– From a dangling node, surfer jumps to any node
S (row A, the only dangling node, is now u):
     A    B    C    D    E    F    G    H    I    L    M
A    .09  .09  .09  .09  .09  .09  .09  .09  .09  .09  .09
B    0    0    1    0    0    0    0    0    0    0    0
C    0    1    0    0    0    0    0    0    0    0    0
D    .5   .5   0    0    0    0    0    0    0    0    0
E    0    .33  0    .33  0    .33  0    0    0    0    0
F    0    .5   0    0    .5   0    0    0    0    0    0
G    0    .5   0    0    .5   0    0    0    0    0    0
H    0    .5   0    0    .5   0    0    0    0    0    0
I    0    .5   0    0    .5   0    0    0    0    0    0
L    0    0    0    0    1    0    0    0    0    0    0
M    0    0    0    0    1    0    0    0    0    0    0
Still problems
• Random Surfer gets trapped in buckets
– Reachable strongly connected component without
outlinks (here, B and C link only to each other)
• Solution: teleportation matrix E
– Matrix of u: each of the 11 rows of E is u = [.09, .09, …, .09]
… finally
• Google matrix
$G = \alpha S + (1-\alpha) E$
• Where $\alpha$ is the damping factor
• Interpretation of G
– With probability $\alpha$, Random Surfer follows a hyperlink
from a page (selected at random)
– With probability $1-\alpha$, Random Surfer jumps to any page
(e.g., by entering a new URL in the browser)
• PageRank scores are the solution of the self-consistent equation
$p = pG = \alpha pS + (1-\alpha) u$
(using $pE = u$, since the entries of p sum to 1)
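The self-consistent equation can be solved by repeatedly applying $p \leftarrow pG$. The sketch below does this for the 11-page example. The damping factor 0.85 is an assumption, since the slides do not state a value, and the function names and iteration count are illustrative. With that choice the result agrees, to rounding, with the PageRank scores listed for this graph.

```python
# Out-links of the 11-page example graph (page -> pages it links to).
links = {
    "A": [], "B": ["C"], "C": ["B"], "D": ["A", "B"], "E": ["B", "D", "F"],
    "F": ["B", "E"], "G": ["B", "E"], "H": ["B", "E"], "I": ["B", "E"],
    "L": ["E"], "M": ["E"],
}
pages = list(links)
index = {page: k for k, page in enumerate(pages)}
n = len(pages)

def google_matrix(alpha):
    """G = alpha * S + (1 - alpha) * E, with every row of E equal to u = 1/n."""
    u = 1.0 / n
    G = []
    for page in pages:
        targets = links[page]
        if targets:
            row = [0.0] * n
            for target in targets:
                row[index[target]] = 1.0 / len(targets)   # row of H
        else:
            row = [u] * n                                 # dangling page: row of S is u
        G.append([alpha * s + (1 - alpha) * u for s in row])
    return G

def pagerank(alpha=0.85, iterations=200):
    """Power iteration on p <- p G, starting from the uniform distribution."""
    G = google_matrix(alpha)
    p = [1.0 / n] * n
    for _ in range(iterations):
        p = [sum(p[i] * G[i][j] for i in range(n)) for j in range(n)]
    return dict(zip(pages, p))

scores = pagerank()
for page, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(page, round(100 * score, 1))
```

The scores are printed as percentages. Pages with no in-links (G, H, I, L, M) all receive the same small score, which comes only from teleportation and the dangling page.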
PageRank scores
page   score
I      1.6
H      1.6
L      1.6
M      1.6
G      1.6
F      3.9
E      8.1
C      34.3
B      38.4
D      3.9
A      3.3
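The same scores can be approximated by simulating the Random Surfer and counting visits. This is a Monte Carlo sketch with assumed settings: the damping factor 0.85, the number of steps and the seed are illustrative choices. Being random, it reproduces the scores only approximately.

```python
import random

# Out-links of the 11-page example graph (page -> pages it links to).
links = {
    "A": [], "B": ["C"], "C": ["B"], "D": ["A", "B"], "E": ["B", "D", "F"],
    "F": ["B", "E"], "G": ["B", "E"], "H": ["B", "E"], "I": ["B", "E"],
    "L": ["E"], "M": ["E"],
}
pages = list(links)

def random_surfer(steps, alpha=0.85, seed=0):
    """Fraction of steps the surfer spends on each page."""
    rng = random.Random(seed)
    visits = {page: 0 for page in pages}
    current = rng.choice(pages)               # starts at an arbitrary page
    for _ in range(steps):
        visits[current] += 1
        targets = links[current]
        if targets and rng.random() < alpha:
            current = rng.choice(targets)     # follow a random out-link
        else:
            current = rng.choice(pages)       # dangling page or teleport: jump anywhere
    return {page: count / steps for page, count in visits.items()}

freq = random_surfer(200_000)
top_two = sorted(freq, key=freq.get, reverse=True)[:2]
for page in sorted(freq, key=freq.get, reverse=True):
    print(page, round(100 * freq[page], 1))
```

Longer runs bring the visit frequencies closer to the solution of $p = pG$.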
Summary
• Recursive (or self-consistent) nature of PageRank has roots in
social network analysis metrics
• PageRank is fundamentally related to random walks on
graphs
– Lots of research to compute it efficiently
– Huge economic and social impact!