Transcript Slides

Centrality in Social Networks
Kristina Lerman
University of Southern California
CS 599: Social Media Analysis
Network analysis
4.2
Three views of social network analysis
1. Freeman, L. 1979 “Centrality in Social Networks: Conceptual
Clarification”, Social Networks 1, No. 3.
2. Bonacich, P. 1987 “Power and Centrality: A Family of Measures”, American Journal of Sociology
3. Franceschet, M. 2011 “PageRank: standing on the shoulders of giants”, Commun. ACM, Vol. 54, pp. 92-101.
Power in social networks
Certain positions within the network give nodes more power
– Directly affect/influence others
– Control the flow of information
– Avoid being controlled by others
Centrality in social networks
• Centrality encodes the relationship between structure and power in groups
– Certain positions within the network give nodes more power or importance
• How do we measure importance?
– Who can directly affect/influence others?
• Highest degree nodes are “in the thick of it”
– Who controls information flow?
• Nodes that fall on shortest paths between others can disrupt the
flow of information between them
– Who can quickly inform most others?
• Nodes that are close to other nodes can quickly get information to them
Degree centrality
• The number of others a node is connected to
– Node with high degree has high potential communication
activity
node  Indegree  Outdegree  Total degree
1     0         1          1
2     3         2          5
3     1         3          4
4     2         1          3
5     2         1          3
[Figure: example directed network on nodes 1-5]
Mathematical representation and operations on graphs
Adjacency matrix A ($A_{ij} = 1$ if node i links to node j, otherwise 0)
    1  2  3  4  5
1   0  1  0  0  0
2   0  0  1  1  0
3   0  1  0  1  1
4   0  0  0  0  1
5   0  1  0  0  0
Out-degree: row sum, $d_i^{out} = \sum_j A_{ij}$
In-degree: column sum, $d_i^{in} = \sum_j A_{ji}$
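The row-sum and column-sum rules can be checked with a few lines of Python. This is a minimal sketch that hard-codes the 5-node adjacency matrix from the slide (nodes 1-5 stored at list indices 0-4); the variable names are illustrative, not from the slides.

```python
# Illustrative sketch; variable names are our own.
# A[i][j] = 1 if node i+1 links to node j+1 (the 5-node example).
A = [
    [0, 1, 0, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 1, 0, 1, 1],
    [0, 0, 0, 0, 1],
    [0, 1, 0, 0, 0],
]
n = len(A)

out_degree = [sum(A[i][j] for j in range(n)) for i in range(n)]  # row sums
in_degree = [sum(A[j][i] for j in range(n)) for i in range(n)]   # column sums
total_degree = [i + o for i, o in zip(in_degree, out_degree)]

print("node indegree outdegree total")
for node in range(n):
    print(node + 1, in_degree[node], out_degree[node], total_degree[node])
```

The printed rows match the degree table: node 2 has the highest total degree (5).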
Betweenness centrality
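• Number of shortest paths (geodesics) between pairs of other nodes that pass through a given node; when a pair is joined by several geodesics, each one counts fractionally:
$$C_B(v) = \sum_{s \ne v \ne t} \frac{\sigma_{st}(v)}{\sigma_{st}}$$
where $\sigma_{st}$ is the number of geodesics from s to t and $\sigma_{st}(v)$ is the number of those that pass through v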
– Node with highest betweenness can potentially control or
distort communication
12
123
124
1245
23
24
245
32
34
35
4
5
2
3
1
4
452
52
4523
523
45
524
5
1
2
3
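The betweenness definition can be made concrete with a brute-force Python sketch suited to small graphs. It assumes the directed 5-node example (edges read off the adjacency matrix), counts ordered pairs, and splits credit when a pair has several geodesics (here 1→5 and 2→5 each have two). The helper names are hypothetical; for large graphs, Brandes' algorithm is the usual choice instead of this per-triple check.

```python
from collections import deque
from itertools import permutations

# Illustrative sketch; helper names are our own.
# Directed edges of the 5-node example network.
EDGES = [(1, 2), (2, 3), (2, 4), (3, 2), (3, 4), (3, 5), (4, 5), (5, 2)]
NODES = sorted({u for edge in EDGES for u in edge})
OUT = {u: [v for (x, v) in EDGES if x == u] for u in NODES}


def bfs_counts(source):
    """Return (dist, sigma): hop distance and number of shortest paths from source."""
    dist, sigma = {source: 0}, {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in OUT[u]:
            if v not in dist:            # first visit: v is one level deeper than u
                dist[v] = dist[u] + 1
                sigma[v] = 0
                queue.append(v)
            if dist[v] == dist[u] + 1:   # u precedes v on a shortest path
                sigma[v] += sigma[u]
    return dist, sigma


def betweenness():
    """Sum over ordered pairs (s, t) of the fraction of s->t geodesics through v."""
    info = {s: bfs_counts(s) for s in NODES}
    score = {v: 0.0 for v in NODES}
    for s, t in permutations(NODES, 2):
        dist_s, sigma_s = info[s]
        if t not in dist_s:
            continue                     # t is unreachable from s
        for v in NODES:
            if v in (s, t) or v not in dist_s:
                continue
            dist_v, sigma_v = info[v]
            # v lies on an s->t geodesic iff d(s, v) + d(v, t) == d(s, t)
            if t in dist_v and dist_s[v] + dist_v[t] == dist_s[t]:
                score[v] += sigma_s[v] * sigma_v[t] / sigma_s[t]
    return score


print(betweenness())
```

For this graph the scores come out as node 2: 6, node 5: 2, nodes 3 and 4: 1 each, node 1: 0, so node 2 sits on most geodesics.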
Closeness centrality
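• A node that is close to all other nodes can reach them in the shortest amount of time
– It can best avoid being controlled by others
• Closeness centrality is based on the sum of geodesic distances from a node to all other nodes: the smaller the sum, the more central the node, so closeness is usually taken as its reciprocal, $C_C(i) = 1 / \sum_j d(i, j)$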
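Closeness needs every node to be reachable, which fails in the directed example (no node can reach node 1). The Python sketch below therefore assumes the undirected version of the 5-node network and reports the normalized reciprocal (n - 1) / (sum of distances); both the undirected view and the (n - 1) factor are our choices, and the function names are hypothetical.

```python
from collections import deque

# Illustrative sketch; names are our own. Assumption: edge directions of the
# 5-node example are dropped, because in the directed graph no node reaches node 1.
EDGES = [(1, 2), (2, 3), (2, 4), (3, 2), (3, 4), (3, 5), (4, 5), (5, 2)]


def undirected_adjacency(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def bfs_distances(adj, source):
    """Geodesic (hop) distance from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def closeness(adj):
    """(n - 1) / (sum of geodesic distances); assumes the graph is connected."""
    n = len(adj)
    return {node: (n - 1) / sum(bfs_distances(adj, node).values()) for node in adj}


adj = undirected_adjacency(EDGES)
scores = closeness(adj)
for node in sorted(scores):
    print(node, sum(bfs_distances(adj, node).values()), round(scores[node], 3))
```

Node 2, adjacent to every other node, gets the maximum score 1.0, while node 1 (distance sum 7) gets 4/7.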
Self-consistent measures of centrality
• Katz (1953): Katz score
– “not only on how many others a person is connected to,
but who he is connected to”
– One’s status is determined by the status of the people
s/he is connected to
• Bonacich (1972): Eigenvector centrality
– A node's centrality is proportional to the sum of the centralities of its connections
$$\lambda e_i = \sum_j A_{ij} e_j \qquad \lambda e = A e$$
– e is the eigenvector of A and λ its associated eigenvalue (the largest eigenvalue)
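Eigenvector centrality is commonly computed by power iteration: multiply by A and rescale until the vector stops changing. A minimal numpy sketch on the 5-node example, assuming (as in the formula above) that node i sums over the nodes j it links to; the function name is our own.

```python
import numpy as np

# Illustrative sketch; function name is our own.
# A[i, j] = 1 if node i+1 links to node j+1 (the 5-node example).
A = np.array([
    [0, 1, 0, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 1, 0, 1, 1],
    [0, 0, 0, 0, 1],
    [0, 1, 0, 0, 0],
], dtype=float)


def eigenvector_centrality(A, iters=1000, tol=1e-12):
    """Power iteration for lambda * e = A e; returns (lambda, e) with e summing to 1."""
    e = np.ones(A.shape[0]) / A.shape[0]
    for _ in range(iters):
        new_e = A @ e
        new_e /= new_e.sum()          # rescale so the iterate neither blows up nor vanishes
        converged = np.abs(new_e - e).max() < tol
        e = new_e
        if converged:
            break
    lam = (A @ e).sum() / e.sum()     # ratio estimate: at convergence A e = lambda e
    return lam, e


lam, e = eigenvector_centrality(A)
print(round(lam, 4))
print(np.round(e, 3))
```

For this graph the estimate should approach (1 + √5)/2 ≈ 1.618, because the characteristic polynomial of A factors as x(x² − x − 1)(x² + x + 1); node 3 ends up with the highest score.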
Alpha-centrality (Bonacich, 1987)
• Similar to eigenvector centrality, but the degree to which a node's centrality contributes to the centralities of other nodes depends on a parameter α
$$c_i(\alpha) = \sum_j \left(1 + \alpha\, c_j(\alpha)\right) A_{ij}$$
• Mathematical interpretation
– $c_i(\alpha)$ is the expected number of paths activated directly or indirectly by node i
$$c_i(\alpha) = \sum_j \left[A + \alpha A^2 + \alpha^2 A^3 + \cdots\right]_{ij}$$
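Unrolling the recursive definition reproduces the series: starting from c = 0, each pass of c ← A(1 + αc) adds one more term, summed over j. The numpy sketch below checks this numerically; α = 0.3 is a hypothetical value chosen below 1/λ1 ≈ 0.618 for this graph, and the function names are our own.

```python
import numpy as np

# Illustrative sketch; function names are our own.
A = np.array([
    [0, 1, 0, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 1, 0, 1, 1],
    [0, 0, 0, 0, 1],
    [0, 1, 0, 0, 0],
], dtype=float)
alpha = 0.3  # hypothetical value; must stay below 1/lambda_1 (about 0.618 here)


def alpha_centrality_recursive(A, alpha, iters=200):
    """Iterate c_i <- sum_j (1 + alpha * c_j) A_ij, i.e. c <- A (1 + alpha c), from c = 0."""
    c = np.zeros(len(A))
    for _ in range(iters):
        c = A @ (1 + alpha * c)
    return c


def alpha_centrality_series(A, alpha, terms=200):
    """Row sums of A + alpha A^2 + alpha^2 A^3 + ..., truncated after `terms` terms."""
    total = np.zeros_like(A)
    power = A.copy()  # holds A^(k+1) at step k
    for k in range(terms):
        total += alpha**k * power
        power = power @ A
    return total.sum(axis=1)


print(np.round(alpha_centrality_recursive(A, alpha), 4))
print(np.allclose(alpha_centrality_recursive(A, alpha), alpha_centrality_series(A, alpha)))
```

After n passes the recursion equals the series truncated after n terms, which is why the two agree once both have converged.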
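A closer look at Alpha-Centrality
• Alpha-Centrality matrix
$$c(\alpha) = A + \alpha A^2 + \alpha^2 A^3 + \cdots$$
• 1st term: number of paths of length 1 (edges) from i to j
$$A = \begin{bmatrix} 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 1 & 0 \\ 0 & 1 & 0 & 1 & 1 \\ 0 & 0 & 0 & 0 & 1 \\ 0 & 1 & 0 & 0 & 0 \end{bmatrix}$$
• Contribution of this term to $c_i(\alpha)$ is $\sum_j A_{ij}$, the out-degree of node i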
• 2nd term: number of paths of length 2 from i to j (strictly, walks, since a node may repeat: $(A^2)_{22} = 1$ counts 2 → 3 → 2)
$$A^2 = A \times A = \begin{bmatrix} 0 & 0 & 1 & 1 & 0 \\ 0 & 1 & 0 & 1 & 2 \\ 0 & 1 & 1 & 1 & 1 \\ 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 1 & 0 \end{bmatrix}$$
• 3rd term: number of paths of length 3 from i to j
$$A^3 = A^2 \times A = \begin{bmatrix} 0 & 1 & 0 & 1 & 2 \\ 0 & 2 & 1 & 1 & 1 \\ 0 & 2 & 1 & 2 & 2 \\ 0 & 0 & 1 & 1 & 0 \\ 0 & 1 & 0 & 1 & 2 \end{bmatrix}$$
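The matrix products on these slides are easy to reproduce. The plain-Python sketch below (helper names are illustrative) computes A² and A³ for the 5-node example and confirms them against a brute-force count of node sequences of the given length, which is why the entries count walks that may revisit nodes.

```python
from itertools import product

# Illustrative sketch; helper names are our own.
A = [
    [0, 1, 0, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 1, 0, 1, 1],
    [0, 0, 0, 0, 1],
    [0, 1, 0, 0, 0],
]
N = len(A)


def matmul(X, Y):
    """Plain product of two N x N integer matrices."""
    return [[sum(X[i][k] * Y[k][j] for k in range(N)) for j in range(N)] for i in range(N)]


def count_walks(length):
    """Brute force: count node sequences v0, ..., v_length in which every step is an edge."""
    counts = [[0] * N for _ in range(N)]
    for seq in product(range(N), repeat=length + 1):
        if all(A[seq[s]][seq[s + 1]] for s in range(length)):
            counts[seq[0]][seq[-1]] += 1
    return counts


A2 = matmul(A, A)   # 2nd term of c(alpha), before weighting by alpha
A3 = matmul(A2, A)  # 3rd term, before weighting by alpha^2
for row in A3:
    print(row)
print(A2 == count_walks(2), A3 == count_walks(3))
```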
• Alpha-Centrality matrix
$$c(\alpha) = A + \alpha A^2 + \alpha^2 A^3 + \cdots + \alpha^k A^{k+1} + \cdots = \sum_{k=0}^{\infty} \alpha^k A^{k+1}$$
• The number of paths grows without bound (roughly as $\lambda_1^k$) as the path length k grows
• To keep the infinite sum finite, α < 1/λ1, where λ1 is the largest eigenvalue of A (also called the radius of centrality)
$$c(\alpha) = A \sum_{k=0}^{\infty} \alpha^k A^k = (I - \alpha A)^{-1} A$$
• Interpretation: a node's centrality is the number of paths of any length connecting it to other nodes, exponentially attenuated by path length, so that longer paths contribute less than shorter paths
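The closed form can be checked against the truncated series. The numpy sketch below assumes α = 0.5/λ1 (a hypothetical choice; any value in [0, 1/λ1) would do) and uses np.linalg.solve rather than forming the inverse explicitly.

```python
import numpy as np

# Illustrative sketch; alpha below is a hypothetical choice.
A = np.array([
    [0, 1, 0, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 1, 0, 1, 1],
    [0, 0, 0, 0, 1],
    [0, 1, 0, 0, 0],
], dtype=float)
I = np.eye(len(A))

lam1 = np.max(np.abs(np.linalg.eigvals(A)))  # largest eigenvalue (about 1.618 for this graph)
alpha = 0.5 / lam1                            # inside [0, 1/lam1), so the series converges

closed_form = np.linalg.solve(I - alpha * A, A)  # (I - alpha A)^{-1} A without an explicit inverse
partial_sum = sum(alpha**k * np.linalg.matrix_power(A, k + 1) for k in range(100))

print(np.allclose(closed_form, partial_sum))  # truncated series vs closed form
print(np.round(closed_form.sum(axis=1), 3))   # c_i(alpha): row sums of the matrix c(alpha)
```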
Radius of centrality
Parameter α sets the length scale of communication or interactions.
• For α = 0, only local interactions (with neighbors) are considered
– Only local structure is important
– Centrality is the same as (out-)degree centrality
• As α grows, the length of interactions grows
– Global structure becomes more important
– Centrality depends on a node's position within a larger structure, e.g., a community
• As α → 1/λ1, the length of interactions becomes infinite
– Global structure is important
– Centrality rankings approach those of eigenvector centrality
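Both limits can be seen numerically. This numpy sketch (helper names are ours; the α values are arbitrary) shows that c(0) equals the out-degrees and that, as α approaches 1/λ1, the direction of c(α) lines up with the leading eigenvector, measured by cosine similarity.

```python
import numpy as np

# Illustrative sketch; helper names are our own.
A = np.array([
    [0, 1, 0, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 1, 0, 1, 1],
    [0, 0, 0, 0, 1],
    [0, 1, 0, 0, 0],
], dtype=float)
n = len(A)
ones = np.ones(n)

eigvals, eigvecs = np.linalg.eig(A)
k = int(np.argmax(eigvals.real))        # index of the largest eigenvalue
lam1 = eigvals[k].real
v = np.abs(eigvecs[:, k].real)          # eigenvector centrality, sign made positive


def alpha_centrality(alpha):
    """c_i(alpha) = sum_j [(I - alpha A)^{-1} A]_ij, via a linear solve."""
    return np.linalg.solve(np.eye(n) - alpha * A, A @ ones)


def cosine(x, y):
    return float(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))


print(alpha_centrality(0.0))            # the out-degrees
for frac in (0.5, 0.9, 0.99, 0.999):
    c = alpha_centrality(frac / lam1)
    print(frac, round(cosine(c, v), 5))  # similarity to eigenvector centrality
```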
Normalized Alpha-Centrality [Ghosh & Lerman 2011]
• Alpha-Centrality diverges for α ≥ 1/λ1
• Solution: Normalized Alpha-Centrality
$$n(\alpha) = \frac{c(\alpha)}{\sum_{i,j=1}^{N} c_{ij}(\alpha)}$$
– Holds for values of α between 0 and 1
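A sketch of the normalization, restricted to 0 ≤ α < 1/λ1 where the closed form exists; it does not attempt the α ≥ 1/λ1 regime. Dividing by the total Σ_{i,j} c_ij(α), which equals Σ_i c_i(α), keeps the scores on a fixed scale even as the raw scores blow up near 1/λ1. Function names are hypothetical.

```python
import numpy as np

# Illustrative sketch; function names are our own. Only 0 <= alpha < 1/lam1 is handled.
A = np.array([
    [0, 1, 0, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 1, 0, 1, 1],
    [0, 0, 0, 0, 1],
    [0, 1, 0, 0, 0],
], dtype=float)
n = len(A)
lam1 = np.max(np.abs(np.linalg.eigvals(A)))


def alpha_centrality(alpha):
    """c_i(alpha) = sum_j [(I - alpha A)^{-1} A]_ij; valid only for 0 <= alpha < 1/lam1."""
    return np.linalg.solve(np.eye(n) - alpha * A, A @ np.ones(n))


def normalized_alpha_centrality(alpha):
    """n(alpha): divide by sum_{i,j} c_ij(alpha), which equals sum_i c_i(alpha)."""
    c = alpha_centrality(alpha)
    return c / c.sum()


for frac in (0.0, 0.5, 0.9, 0.999):
    alpha = frac / lam1
    raw_total = alpha_centrality(alpha).sum()
    print(frac, round(raw_total, 1), np.round(normalized_alpha_centrality(alpha), 3))
```

The raw total grows by orders of magnitude as α approaches 1/λ1, while the normalized scores always sum to 1.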
Multi-scale analysis with Alpha-Centrality
• Parameter α allows for multi-scale analysis of networks
– Differentiate between local and global structures
• Study how rankings change with α
– Leaders: high influence on group members
• Nodes with high centrality for small values of α
– Bridges: mediate communication between groups
• Nodes with low centrality for small values of α
• But high centrality for large values of α
– Peripherals: poorly connected to everyone
• Nodes with low centrality for any value of α
Karate club network [Zachary, 1977]
[Figure: the karate club network, with the administrator and the instructor marked]
Ranking karate club members
[Figure: centrality scores of nodes vs. α]
Florentine families in 15th century Italy
Ranking Florentine families
Summary
• Network position confers advantages or disadvantages to a node, but how you measure it depends on what you mean by advantage
– Ability to directly reach many nodes → degree centrality
– Ability to control information → betweenness centrality
– Ability to avoid control → closeness centrality
• Self-consistent definitions of centrality
– Node’s centrality depends on centrality of those it is
connected to, directly or indirectly, but contribution of
distant nodes is attenuated by how far they are
• Attenuation parameter sets the length scale of interactions
• Can probe structure at different scales by varying this parameter
PageRank: Standing on the Shoulders of Giants [Franceschet]
Key insights
• Analyzes the structure of the web of hyperlinks to determine the importance scores of web pages
– A web page is important if it is pointed to by other
important pages
• An algorithm with deep mathematical roots
– Random walks
– Social network theory
PageRank and the Random Surfer
[Figure: example web graph with pages A, B, C, D, E, F, G, H, I, L, M]
Random Surfer
• Starts at an arbitrary page
• Bounces from page to page by following links at random
• The PageRank score of a web page is the relative number of times it is visited by the Random Surfer
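The Random Surfer can be simulated directly. This Python sketch (standard library only) borrows two assumptions from the later slides so that the walk never gets stuck: from a page with no outlinks the surfer jumps to a uniformly random page, and otherwise it teleports with probability 0.15 (damping factor 0.85, a common choice that the slides do not fix). The link lists are read off the adjacency matrix of the example graph; the function name is ours.

```python
import random
from collections import Counter

# Illustrative sketch; link lists from the example graph, names are our own.
LINKS = {
    "A": [], "B": ["C"], "C": ["B"], "D": ["A", "B"],
    "E": ["B", "D", "F"], "F": ["B", "E"],
    "G": ["B", "E"], "H": ["B", "E"], "I": ["B", "E"],
    "L": ["E"], "M": ["E"],
}
PAGES = sorted(LINKS)


def random_surfer(steps=200_000, damping=0.85, seed=0):
    """Return the fraction of steps the surfer spends on each page."""
    rng = random.Random(seed)
    page = rng.choice(PAGES)                 # start at an arbitrary page
    visits = Counter()
    for _ in range(steps):
        visits[page] += 1
        out = LINKS[page]
        if out and rng.random() < damping:
            page = rng.choice(out)           # follow an outgoing link at random
        else:
            page = rng.choice(PAGES)         # no outlinks, or teleport: jump to any page
    return {p: visits[p] / steps for p in PAGES}


freq = random_surfer()
for page in PAGES:
    print(page, round(100 * freq[page], 1))
```

With a fixed seed the visit frequencies of B and C should land close to the 38.4% and 34.3% listed on the PageRank scores slide.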
Mathematics of PageRank
• PageRank is derived from a random walk on the graph: a page's score is the long-run fraction of time the walker spends on it
Adjacency matrix A of the graph (row: linking page, column: linked page)
   A  B  C  D  E  F  G  H  I  L  M
A  0  0  0  0  0  0  0  0  0  0  0
B  0  0  1  0  0  0  0  0  0  0  0
C  0  1  0  0  0  0  0  0  0  0  0
D  1  1  0  0  0  0  0  0  0  0  0
E  0  1  0  1  0  1  0  0  0  0  0
F  0  1  0  0  1  0  0  0  0  0  0
G  0  1  0  0  1  0  0  0  0  0  0
H  0  1  0  0  1  0  0  0  0  0  0
I  0  1  0  0  1  0  0  0  0  0  0
L  0  0  0  0  1  0  0  0  0  0  0
M  0  0  0  0  1  0  0  0  0  0  0
(Diagonal) out-degree matrix D, where $D_{ii} = d_i$ is the number of outlinks of page i:
$D = \mathrm{diag}(0, 1, 1, 2, 3, 2, 2, 2, 2, 1, 1)$ for pages A, B, C, D, E, F, G, H, I, L, M
• $h_{ij}$ is the probability of moving from node i to node j: $h_{ij} = A_{ij}/d_i$, i.e. $H = D^{-1}A$ (the row of a dangling page such as A, with $d_i = 0$, stays all zeros)
    A   B   C   D   E   F   G   H   I   L   M
A   0   0   0   0   0   0   0   0   0   0   0
B   0   0   1   0   0   0   0   0   0   0   0
C   0   1   0   0   0   0   0   0   0   0   0
D  .5  .5   0   0   0   0   0   0   0   0   0
E   0 .33   0 .33   0 .33   0   0   0   0   0
F   0  .5   0   0  .5   0   0   0   0   0   0
G   0  .5   0   0  .5   0   0   0   0   0   0
H   0  .5   0   0  .5   0   0   0   0   0   0
I   0  .5   0   0  .5   0   0   0   0   0   0
L   0   0   0   0   1   0   0   0   0   0   0
M   0   0   0   0   1   0   0   0   0   0   0
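Building H from A takes one division per row. A numpy sketch, assuming the link lists of the example graph; the zero row of the dangling page A is left as zeros because D has no inverse there. Variable names are ours.

```python
import numpy as np

# Illustrative sketch; link lists from the example graph, names are our own.
PAGES = ["A", "B", "C", "D", "E", "F", "G", "H", "I", "L", "M"]
LINKS = {
    "A": [], "B": ["C"], "C": ["B"], "D": ["A", "B"],
    "E": ["B", "D", "F"], "F": ["B", "E"],
    "G": ["B", "E"], "H": ["B", "E"], "I": ["B", "E"],
    "L": ["E"], "M": ["E"],
}
idx = {page: i for i, page in enumerate(PAGES)}
n = len(PAGES)

# Adjacency matrix: A[i, j] = 1 if page i links to page j.
A = np.zeros((n, n))
for src, targets in LINKS.items():
    for dst in targets:
        A[idx[src], idx[dst]] = 1.0

d = A.sum(axis=1)  # out-degrees, i.e. the diagonal of D
# H = D^{-1} A: divide each row by its out-degree. D has no inverse when a page has
# no outlinks (page A), so such rows are left as zeros instead.
H = np.divide(A, d[:, None], out=np.zeros_like(A), where=d[:, None] > 0)

print(d)
print(np.round(H[idx["E"]], 2))  # E splits its weight over B, D and F
```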
• The PageRank of page j is defined recursively as
$$p_j = \sum_i p_i h_{ij}$$
– Or, in matrix form, $p = pH$, with p a row vector
• What contributes to the PageRank score?
– The number of links page j receives
• Cf. B and D
– The number of outgoing links of the linking pages
• Cf. E's effect on F and B's effect on C
– The PageRank scores of the linking pages
• Cf. E and B
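Iterating p ← pH directly shows why the definition needs repair. In this numpy sketch (the uniform start and 100 iterations are arbitrary choices; names are ours), probability that reaches the dangling page A disappears, so the total falls below 1.

```python
import numpy as np

# Illustrative sketch; example graph from the slides, names are our own.
PAGES = ["A", "B", "C", "D", "E", "F", "G", "H", "I", "L", "M"]
LINKS = {
    "A": [], "B": ["C"], "C": ["B"], "D": ["A", "B"],
    "E": ["B", "D", "F"], "F": ["B", "E"],
    "G": ["B", "E"], "H": ["B", "E"], "I": ["B", "E"],
    "L": ["E"], "M": ["E"],
}
idx = {page: i for i, page in enumerate(PAGES)}
n = len(PAGES)

A = np.zeros((n, n))
for src, targets in LINKS.items():
    for dst in targets:
        A[idx[src], idx[dst]] = 1.0
d = A.sum(axis=1)
H = np.divide(A, d[:, None], out=np.zeros_like(A), where=d[:, None] > 0)

p = np.full(n, 1.0 / n)   # start from the uniform distribution
for _ in range(100):
    p = p @ H             # p_j <- sum_i p_i h_ij
print(round(p.sum(), 4))  # total probability left: below 1
print({page: round(p[idx[page]], 3) for page in PAGES})
```

What remains after 100 steps sits entirely on B and C, bouncing between them.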
… but there are problems
• The Random Surfer gets trapped by dangling nodes (no outlinks)!
• Solution: matrix S
– Replace the zero rows in H with u = [0.09, 0.09, …, 0.09], the uniform vector with entries 1/n = 1/11
– From a dangling node, the surfer jumps to a page chosen uniformly at random
Matrix S:
    A   B   C   D   E   F   G   H   I   L   M
A .09 .09 .09 .09 .09 .09 .09 .09 .09 .09 .09
B   0   0   1   0   0   0   0   0   0   0   0
C   0   1   0   0   0   0   0   0   0   0   0
D  .5  .5   0   0   0   0   0   0   0   0   0
E   0 .33   0 .33   0 .33   0   0   0   0   0
F   0  .5   0   0  .5   0   0   0   0   0   0
G   0  .5   0   0  .5   0   0   0   0   0   0
H   0  .5   0   0  .5   0   0   0   0   0   0
I   0  .5   0   0  .5   0   0   0   0   0   0
L   0   0   0   0   1   0   0   0   0   0   0
M   0   0   0   0   1   0   0   0   0   0   0
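Patching the dangling row gives a row-stochastic S, but the walk can still be absorbed. The numpy sketch below builds S and iterates p ← pS; because B and C link only to each other, nearly all probability ends up there. Names and the 200-step horizon are our choices.

```python
import numpy as np

# Illustrative sketch; example graph from the slides, names are our own.
PAGES = ["A", "B", "C", "D", "E", "F", "G", "H", "I", "L", "M"]
LINKS = {
    "A": [], "B": ["C"], "C": ["B"], "D": ["A", "B"],
    "E": ["B", "D", "F"], "F": ["B", "E"],
    "G": ["B", "E"], "H": ["B", "E"], "I": ["B", "E"],
    "L": ["E"], "M": ["E"],
}
idx = {page: i for i, page in enumerate(PAGES)}
n = len(PAGES)

A = np.zeros((n, n))
for src, targets in LINKS.items():
    for dst in targets:
        A[idx[src], idx[dst]] = 1.0
d = A.sum(axis=1)
H = np.divide(A, d[:, None], out=np.zeros_like(A), where=d[:, None] > 0)

u = np.full(n, 1.0 / n)   # uniform vector, 1/11 (about .09) per page
S = H.copy()
S[d == 0] = u             # dangling pages (here only A) jump anywhere
print(np.round(S.sum(axis=1), 12))  # every row of S now sums to 1

p = u.copy()
for _ in range(200):
    p = p @ S
print(round(p[idx["B"]] + p[idx["C"]], 6))  # close to 1: the B-C bucket absorbs the walk
```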
Still problems
• The Random Surfer gets trapped in buckets
– A bucket is a reachable strongly connected component with no links leaving it (here B and C, which link only to each other)
• Solution: teleportation matrix E
– Every row of E equals u, so E is an 11 × 11 matrix with all entries 1/11 ≈ .09
… finally
• Google matrix
$$G = \alpha S + (1 - \alpha) E$$
• where α is the damping factor
• Interpretation of G
– With probability α, the Random Surfer follows a hyperlink (selected at random) from the current page
– With probability 1 − α, the Random Surfer jumps to any page (e.g., by entering a new URL in the browser)
• PageRank scores are the solution of the self-consistent equation
$$p = pG = \alpha pS + (1 - \alpha) u$$
– The second equality uses $pE = u$, which holds because the entries of p sum to 1
PageRank scores (percent)
page  score
A      3.3
B     38.4
C     34.3
D      3.9
E      8.1
F      3.9
G      1.6
H      1.6
I      1.6
L      1.6
M      1.6
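Putting the pieces together: a numpy power-iteration sketch of p = pG. The slides do not state α; assuming the commonly used α = 0.85, the result should agree with the scores above to one decimal place. The link lists and variable names are ours.

```python
import numpy as np

# Illustrative sketch; example graph from the slides, names are our own.
PAGES = ["A", "B", "C", "D", "E", "F", "G", "H", "I", "L", "M"]
LINKS = {
    "A": [], "B": ["C"], "C": ["B"], "D": ["A", "B"],
    "E": ["B", "D", "F"], "F": ["B", "E"],
    "G": ["B", "E"], "H": ["B", "E"], "I": ["B", "E"],
    "L": ["E"], "M": ["E"],
}
idx = {page: i for i, page in enumerate(PAGES)}
n = len(PAGES)

A = np.zeros((n, n))
for src, targets in LINKS.items():
    for dst in targets:
        A[idx[src], idx[dst]] = 1.0
d = A.sum(axis=1)
H = np.divide(A, d[:, None], out=np.zeros_like(A), where=d[:, None] > 0)

u = np.full(n, 1.0 / n)
S = H.copy()
S[d == 0] = u                      # fix dangling pages
E = np.tile(u, (n, 1))             # teleportation matrix: every row equals u
alpha = 0.85                       # assumed damping factor (not given on the slides)
G = alpha * S + (1 - alpha) * E    # Google matrix

p = u.copy()
for _ in range(1000):              # power iteration for p = pG
    new_p = p @ G
    if np.abs(new_p - p).sum() < 1e-12:
        p = new_p
        break
    p = new_p

for page in PAGES:
    print(page, round(100 * p[idx[page]], 1))
```

Because G has strictly positive entries, the iteration converges to a unique distribution regardless of the starting vector.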
Summary
• Recursive (or self-consistent) nature of PageRank has roots in
social network analysis metrics
• PageRank is fundamentally related to random walks on
graphs
– Lots of research to compute it efficiently
– Huge economic and social impact!