Transcript Slide 1
Complete Network Analysis
Network Connections: Large-Scale network structure
The basic network hypothesis is that the structure of a network affects the
likelihood that goods will flow through the network. While direct
measures are fine for smaller networks, we often want to make
generalizations to very large-scale network structure.
The next section covers large-scale network topography and bridges us to
generalized images of the network structure captured by cohesive groups
and blockmodels.
We focus on 3 such factors today:
1) Basic structure of large-scale networks
2) Cohesive Peer Groups
3) Identifying Role positions (blockmodels)
Based on Milgram’s (1967) famous work,
the substantive point is that networks
are structured such that even when
most of our connections are local,
any pair of people can be connected
by a fairly small number of relational
steps.
Watts says there are 4 conditions that make the small world phenomenon
interesting:
1) The network is large - O(Billions)
2) The network is sparse - people are connected to a small fraction of
the total network
3) The network is decentralized -- there is no single star (or small number of stars)
4) The network is highly clustered -- most friendship circles are
overlapping
Formally, we can characterize a graph through 2 statistics.
1) The characteristic path length, L
The average length of the shortest paths connecting
any two actors.
(note this only works for connected graphs)
2) The clustering coefficient, C
•Version 1: the average local density. That is, Cv =
ego-network density of node v, and C = (sum of Cv over all nodes) / n
•Version 2: transitivity ratio. Number of closed triads
divided by the number of closed and open triads.
A small world graph is any graph with a relatively small L
and a relatively large C.
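The two statistics can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: the function names and the toy graph (a triangle with one pendant node) are my own assumptions, and the transitivity ratio is counted over two-paths centered on each node.

```python
from collections import deque
from itertools import combinations


def path_length(adj):
    """Characteristic path length L: mean shortest-path distance over all pairs."""
    total, pairs = 0, 0
    for source in adj:
        # Breadth-first search gives shortest path distances from `source`.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        if len(dist) < len(adj):
            raise ValueError("graph is not connected, so L is undefined")
        total += sum(dist.values())
        pairs += len(adj) - 1
    return total / pairs


def local_density(adj, v):
    """C_v: density of ties among v's neighbours (0 with fewer than 2 neighbours)."""
    nbrs = adj[v]
    if len(nbrs) < 2:
        return 0.0
    ties = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return ties / (len(nbrs) * (len(nbrs) - 1) / 2)


def clustering_v1(adj):
    """Version 1: average of the local densities C_v."""
    return sum(local_density(adj, v) for v in adj) / len(adj)


def clustering_v2(adj):
    """Version 2: closed two-paths divided by all (closed + open) two-paths."""
    closed = opened = 0
    for v in adj:
        for a, b in combinations(adj[v], 2):
            if b in adj[a]:
                closed += 1
            else:
                opened += 1
    return closed / (closed + opened)


# Hypothetical toy graph: triangle 1-2-3 plus a pendant node 4 tied to 3.
toy = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
L = path_length(toy)
C1 = clustering_v1(toy)
C2 = clustering_v2(toy)
```

On this toy graph the two versions of C differ (the pendant node pulls the average local density down), which is worth keeping in mind when comparing published values.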
The most clustered graph is Watts’s
“Caveman” graph:
[Figure: C and L as functions of k for a Caveman graph of n=1000. x-axis: Degree (k); y-axes: Clustering Coefficient and Characteristic Path Length]
Compared to random graphs, C is large and L is long. The
intuition, then, is that clustered graphs tend to have (relatively)
long characteristic path lengths. But the small world
phenomenon rests on just the opposite: high clustering and
short path distances. How is this so?
A model for pair formation, as a function of mutual contacts.
$$
R_{i,j} =
\begin{cases}
1, & m_{i,j} \ge k \\
\left[ \dfrac{m_{i,j}}{k} \right]^{\alpha} (1 - p) + p, & k > m_{i,j} > 0 \\
p, & m_{i,j} = 0
\end{cases}
$$
where m_{i,j} is the number of mutual contacts shared by i and j.
Using this equation produces networks that range from completely
ordered (caveman-like) to random.
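As a rough sketch, the pair-formation rule can be written as a small function. The argument names are my own, and the exponent `alpha` follows Watts's published alpha model (an assumption here, since the slide does not spell the symbols out).

```python
def tie_propensity(m_ij, k, p, alpha):
    """Propensity R_ij for i and j to form a tie, given m_ij mutual contacts.

    k is the average degree, p a small baseline propensity, and alpha the
    tuning exponent (names assumed for illustration).
    """
    if m_ij >= k:
        return 1.0                      # enough mutual contacts: tie is certain
    if m_ij > 0:
        return (m_ij / k) ** alpha * (1 - p) + p
    return p                            # no mutual contacts: baseline only


# Small alpha: even one mutual contact makes a tie likely (ordered, caveman-like).
ordered = tie_propensity(1, 10, 0.001, 0.1)
# Large alpha: mutual contacts barely matter (close to a random graph).
random_like = tie_propensity(1, 10, 0.001, 10.0)
```

Comparing `ordered` and `random_like` shows how one parameter moves the model between the two extremes.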
C is large, L is small = small world (SW) graphs
Why does this work? The key is the
fraction of shortcuts in the
network.
In a highly clustered, ordered
network, a single random
connection will create a shortcut
that lowers L dramatically.
Watts demonstrates that
small world graphs occur
in graphs with a small
number of shortcuts.
1) Movie network (actors linked through movies): Lo/Lr = 1.22, Co/Cr = 2925
2) Western Power Grid: Lo/Lr = 1.50, Co/Cr = 16
3) C. elegans: Lo/Lr = 1.17, Co/Cr = 5.6
What are the substantive implications?
Return to the initial interest in connectivity: disease diffusion
1) Diseases move more slowly in highly clustered graphs
(fig. 11) - not a new finding.
2) The dynamics are very non-linear -- with no clear pattern based on
local connectivity. Implication: small local changes (shortcuts) can
have dramatic global outcomes (disease diffusion)
How do we know if an observed graph fits the SW model?
Random expectations:
For basic one-mode networks (such as acquaintance nets), we can
get approximate random values for L and C as:
Lrandom ~ ln(n) / ln(k)
Crandom ~ k / n
as k and n get large.
Note that Crandom essentially approaches zero as n increases when k is held
fixed. This formula uses the density-based measure of C, but the
substantive implications are similar for the triad formula.
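A small sketch of how the random expectations might be used to build the Lo/Lr and Co/Cr ratios. The function names and the "observed" values below are hypothetical, chosen only to show the arithmetic.

```python
import math


def random_expectations(n, k):
    """Approximate L and C for a random graph with n nodes and mean degree k."""
    l_random = math.log(n) / math.log(k)
    c_random = k / n
    return l_random, c_random


def small_world_ratios(n, k, l_observed, c_observed):
    """Return (Lo/Lr, Co/Cr); a small world has the first near 1, the second >> 1."""
    l_random, c_random = random_expectations(n, k)
    return l_observed / l_random, c_observed / c_random


# Hypothetical observed graph: n = 1000, k = 10, L = 3.6, C = 0.40.
l_ratio, c_ratio = small_world_ratios(1000, 10, 3.6, 0.40)
```

Here the path length is only slightly longer than the random expectation while clustering is many times larger, which is the small world signature.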
How do we know if an observed graph fits the SW model?
One problem with using the simple formulas for most extant data on large
graphs is that, because the data result from people overlapping in
groups/movies/publications, necessary clustering results from the
assignment to groups.
          G1  G2  G3  G4  G5
Amy        1   0   1   0   0
Billy      0   1   0   1   0
Charlie    0   1   0   1   0
Debbie     1   0   0   0   0
Elaine     1   0   1   0   1
Frank      0   1   0   1   0
George     0   1   0   1   0
. . . . LINES CUT . . . . .
William    0   1   0   0   0
Xavier     0   1   0   1   0
Yolanda    1   0   1   0   0
Zanfir     0   1   1   1   1
Total     12  14   9  14   5
How do we know if an observed graph fits the SW model?
Newman, M. E. J., Strogatz, S. H., and Watts, D. J. “Random Graphs with
arbitrary degree distributions and their applications” Phys. Rev. E. 2001
This paper extends the formulas for expected clustering and path length using
a generating functions approach, making it possible to calculate E(C,L) for
graphs with any degree distribution.
Importantly, this procedure also makes it possible to account for clustering in
a two-mode graph caused by the distribution of assignment to groups.
How do we know if an observed graph fits the SW model?
Newman, M. E. J., Strogatz, S. H., and Watts, D. J. “Random Graphs with arbitrary
degree distributions and their applications” Phys. Rev. E. 2001
$$
\ell = \frac{\log(N / z_1)}{\log(z_2 / z_1)} + 1
$$
Where N is the size of the graph, Z1 is the average number of people 1 step away
(degree) and Z2 is the average number of people 2 steps away.
Theoretically, these formulas can be used to calculate many properties of the network
– including largest component size, based on degree distributions.
A word of warning: The math in these papers is not simple, so sharpen your calculus
pencil before reading the paper…
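The expected path length can be sketched as a one-line function. The version below includes the +1 term from the published formula; the function name and the example values are my own assumptions.

```python
import math


def expected_path_length(n, z1, z2):
    """Expected path length from N, mean first neighbours z1 and second neighbours z2."""
    if z2 <= z1:
        raise ValueError("z2 must exceed z1 for the formula to apply")
    return math.log(n / z1) / math.log(z2 / z1) + 1


# In a simple random graph z2 = z1 ** 2, and the formula reduces to ln(N) / ln(z1).
simple_random = expected_path_length(1000, 10, 100)
# A skewed degree distribution with more second neighbours shortens paths.
skewed = expected_path_length(1000, 10, 250)
```

The first case is a useful sanity check, since it reproduces the simple Lrandom ~ ln(n) / ln(k) approximation.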
How do we know if an observed graph fits the SW model?
Since C is just the transitivity ratio, there are a number of good formulas for
calculating the expected value. Using the ratio of complete to (incomplete +
complete) triads, we can use the expected values from the triad distribution in
PAJEK for a simple graph or we can use the expected value conditional on the
dyad types (if we have directed data) using the formulas in SPAN and
Wasserman and Faust (1994).
Across a large number of substantive
settings, Barabási points out that the
distribution of network involvement
(degree) is highly and characteristically
skewed.
Many large networks are characterized by a highly skewed distribution of the
number of partners (degree)
$$
p(k) \sim k^{-\gamma}
$$
The scale-free model focuses on the distance-reducing capacity of high-degree nodes, as ‘hubs’ create shortcuts that carry network flow.
The diffusion implications
of mathematical models
based on the preferential
attachment model are dim,
because the carrying
capacity of the network
comes to depend entirely
on a vanishingly small
number of stars, who are
statistically hard to find.
Thus, random treatment to
the network does no good,
but targeted treatment does.
The primary mechanism hypothesized to drive a power-law degree distribution is the
“preferential attachment” model. This model suggests that new nodes enter the
population and connect to current nodes with probability proportional to the current
node’s degree.
This implies that “The rich get richer” and the graph takes on a decidedly star-like
shape.
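A minimal simulation sketch of preferential attachment, assuming each new node brings m ties and the process starts from a small fully connected core. Both choices, and all names below, are my own simplifications rather than a specific published variant.

```python
import random
from collections import Counter


def preferential_attachment(n, m=1, seed=0):
    """Grow a graph of n nodes; each new node ties to m existing nodes,
    chosen with probability proportional to their current degree."""
    rng = random.Random(seed)
    # Seed network: a fully connected core of m + 1 nodes.
    edges = [(i, j) for i in range(m + 1) for j in range(i)]
    # Each node appears in `stubs` once per tie it holds, so a uniform draw
    # from `stubs` selects a node with probability proportional to degree.
    stubs = [v for edge in edges for v in edge]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(stubs))
        for target in targets:
            edges.append((new, target))
            stubs.extend((new, target))
    return edges


edges = preferential_attachment(200, m=1, seed=0)
degree = Counter(v for edge in edges for v in edge)
degree_counts = Counter(degree.values())  # how many nodes have each degree
```

Printing `sorted(degree_counts.items())` typically shows many degree-1 nodes and a handful of high-degree nodes, the "rich get richer" pattern.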
Critiques of the Scale-free model:
1) The insights are not particularly new, having been anticipated in the
epidemiology of STDs for some time.
2) Many of the empirical claims are over-stated.
• The most common ‘test’ for a scale free network is to plot the degree
histogram on a log-log scale and fit a regression line to it. This is poor
statistical practice, and better models for fitting distributions show that
most of the sexual networks are not, in fact, scale free (see Jones and
Handcock. 2003. "Sexual contacts and epidemic thresholds." Nature
423:605-606)
3) Theoretically, any degree-based metric has no necessary relation to the
arrangement of ties within the network. That is, there are many graphs with
identical degree distributions but very different topologies.
• Preferential attachment implies a scale-free degree distribution, but not vice versa
• Finding a power-law degree distribution is really not that useful if there
is any kind of blocking structure (focal aspects) to the network.
Colorado Springs High-Risk
(Sexual contact only)
•Network is approximately
scale-free, with exponent = -1.3
•But connectivity does not
depend on the hubs.
White, D. R. and F. Harary. 2001. "The Cohesiveness of Blocks in Social Networks: Node Connectivity and Conditional Density." Sociological Methodology 31:305-59.
Moody, James and Douglas R. White. 2003. "Structural Cohesion and Embeddedness: A Hierarchical Conception of Social Groups." American Sociological Review 68:103-127.
White, Douglas R., Jason Owen-Smith, James Moody, and Walter W. Powell. 2004. "Networks, Fields, and Organizations: Scale, Topology and Cohesive Embeddings." Computational and Mathematical Organization Theory 10:95-117.
Moody, James. 2004. "The Structure of a Social Science Collaboration Network: Disciplinary Cohesion from 1963 to 1999." American Sociological Review 69:213-238.
Analytically, most work on connectivity has focused on summaries of
completely local properties (degree distributions or clustering). We turn the
argument around and ask: what features of a network are essential for holding
the whole structure together?
Def. 1:
“A collectivity is cohesive to the extent that the social relations of
its members hold it together.”
What network pattern embodies all the elements of this intuitive definition?
This definition contains 5 essential elements:
1. Focuses on what holds the group together
2. Expressed as a group level property
3. The conception is continuous
4. Rests on observable social relations
5. Applies to groups of any size
1) Actors must be connected: a collection of isolates is not cohesive.
Not cohesive
Minimally cohesive:
a single path connects
everyone
1) Reachability is an essential element of relational cohesion. As more paths
re-link actors in the group, the ability to ‘hold together’ increases.
The important feature is not the density
of relations, but the pattern.
Cohesion increases as #
of paths connecting
people increases
Consider the minimally cohesive group:
D = .25    D = .25
Moving a line keeps density constant, but changes
reachability.
What if density increases, but through a single person?
D = .25    D = .39
Removal of 1 person destroys the group.
Cohesion increases as the number of independent paths in the network
increases. Ties through a single person are minimally cohesive.
D = .39 (minimal cohesion)    D = .39 (more cohesive)
Substantive differences between networks connected through a single actor and those
connected through many.
Minimally Cohesive               Strongly Cohesive
Power is centralized             Power is decentralized
Information is concentrated      Information is distributed
Expect actor inequality          Actor equality
Vulnerable to unilateral action  Robust to unilateral action
Segmented structure              Even structure
Def 2.
“A group is structurally cohesive to the extent that multiple independent relational
paths among all pairs of members hold it together.”
Def 2.
“A group is structurally cohesive to the extent that multiple independent relational paths
among all pairs of members hold it together.”
[Figure: example graphs labeled by node connectivity: 0, 1, 2, and 3]
Formalize the argument:
If there is a path between every pair of nodes in a graph, the graph is connected, and called
a component.
In every component, the paths linking actors i and j must pass through a set of
nodes, S, that if removed would disconnect the graph.
The number of nodes in the smallest S is equal to the number of independent paths
connecting i and j.
The relation between cut-set size and number of paths (recall our
discussion of bicomponents) leads to the two versions of our final
definition:
Def 3a
“A group’s structural cohesion is equal to the minimum
number of actors who, if removed from the group, would disconnect the
group.”
Def 3b
“A group’s structural cohesion is equal to the minimum
number of independent paths linking each pair of actors in the group.”
These two definitions are equivalent.
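Definition 3a can be checked directly on small graphs by brute force: try removing every set of 0, 1, 2, ... actors and stop at the first set whose removal disconnects the group. The sketch below does exactly that; it is only practical for small groups, and the function names and example graphs are my own.

```python
from itertools import combinations


def is_connected(adj, removed=frozenset()):
    """True if the graph stays connected after deleting the `removed` nodes."""
    nodes = [v for v in adj if v not in removed]
    if not nodes:
        return True
    seen, stack = {nodes[0]}, [nodes[0]]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v not in removed and v not in seen:
                seen.add(v)
                stack.append(v)
    return len(seen) == len(nodes)


def structural_cohesion(adj):
    """Smallest number of actors whose removal disconnects the group (Def 3a)."""
    n = len(adj)
    for size in range(n - 1):
        for cut in combinations(adj, size):
            if not is_connected(adj, set(cut)):
                return size
    return n - 1  # a clique cannot be disconnected; it is n-1 connected


def graph(edges):
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


# Two triangles sharing actor 3: every path between them runs through 3.
bow_tie = graph([(1, 2), (2, 3), (1, 3), (3, 4), (4, 5), (3, 5)])
# A four-cycle: two independent paths between every pair.
ring = graph([(1, 2), (2, 3), (3, 4), (4, 1)])
# A clique on four actors.
clique = graph([(i, j) for i in range(4) for j in range(i)])
```

By the equivalence of 3a and 3b, the value returned is also the minimum number of independent paths linking any pair in the group.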
Some graph theoretic properties of k-components
1) Every member of a k-component must have at least k ties.
If a person has fewer than k ties, then there would be fewer than k paths
connecting them to the rest of the network.
2) A graph where every person has k ties is not necessarily a k-component.
That is, (1) does not work in reverse. Structures can have high degree, but
low connectivity.
3) Two k-components can overlap by at most k-1 members.
If the k-components overlap by more than k-1 members, then there would
be at least k paths connecting the two components, and they would be a
single k-component.
4) A clique is n-1 connected.
5) k-components can be nested, such that a (k+1)-component is contained within a k-component.
Nested connectivity sets: An operationalization of embeddedness.
[Figure: example network of 23 nodes, numbered 1 to 23]
“Embeddedness” refers to the fact that economic action and outcomes, like all social
action and outcomes, are affected by actors’ dyadic (pairwise) relations and by the
structure of the overall network of relations. As a shorthand, I will refer to these as the
relational and the structural aspects of embeddedness. The structural aspect is
especially crucial to keep in mind because it is easy to slip into “dyadic atomization,”
a type of reductionism.
(Granovetter 1992:33, italics in original)
G
  {7, 8, 9, 10, 11, 12, 13, 14, 15, 16}
    {7, 8, 11, 14}
  {1, 2, 3, 4, 5, 6, 7, 17, 18, 19, 20, 21, 22, 23}
    {1, 2, 3, 4, 5, 6, 7}
    {17, 18, 19, 20, 21, 22, 23}
Empirical Examples:
a) Embeddedness and School Attachment
b) Political similarity among Large American Firms
School Attachment
Business Political Action
Theoretical Implications:
•Resource and Risk Flow
Structural cohesion increases the probability of diffusion in a network,
particularly if flow depends on individual behavior (as opposed to edge
capacity).
Structural Cohesion also provides a new way of thinking about STD cores
Project 90, Sex-only network (n=695)
3-Component (n=58)
IV Drug Sharing
Largest BC: 247
k > 4: 318
Max k: 12
Structural Cohesion
simultaneously gives
us a positional and
subgroup analysis.
Connected
Bicomponents
Development of STD cores in low-degree networks: rapid transition without stars.
Network Connections: Social Subgroups
A primary interest in Social Network Analysis is the identification of
“significant social subgroups” – some smaller collection of nodes in
the graph that can be considered, at least in some senses, as a “unit”
based on the pattern, strength, or frequency of ties.
There are many ways to identify groups. They all insist on a group
being in a connected component, but other than that the variation is
wide.
A) Graph theoretical methods: Cliques and extensions of cliques
•Cliques
•k-cores
•k-plexes
•Freeman (1992) Models
•K-components
B) Algorithmic methods: search through a network trying to maximize for a particular
pattern. Adjust assignment of actors to groups until a particular pattern of ties (block
diagonal, usually) is identified.
•Standard models:
- Factions (UCINET)
- NEGOPY (Richards)
- KliqueFinder (Frank)
- RNM (Moody)
- CROWDS (Moody)
- General Distance & Clustering Methods
Graph Theoretical Models.
Start with a clique. A clique is defined as a maximal subgraph in which every member
of the graph is connected to every other member of the graph. Cliques are collections
of nodes where density = 1.0.
Properties of cliques:
• Density: 1.0
• Everyone connected to n-1 alters
• Distance between every pair is 1
• Ratio of within group ties to between group
ties is infinite
• All triads are transitive
Graph Theoretical Models.
In practice, complete cliques are not very useful. They tend to overlap
heavily and are limited in their size.
Graph theorists have thus
relaxed the complete
connectivity requirement
(with varying degrees of
success). See Moody
& White (2003) for a
discussion of these
attempts.
Graph Theoretical Models.
k-cores: Every person connected to at least k other people.
Ideally, they would
look something like
this (here two 3-cores).
However, adding a
single tie from A to B
would make the
whole graph a 3-core.
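The k-core idea, and the fragility just described, can be sketched by repeatedly dropping anyone with fewer than k ties to the remaining members. The example graph (two four-person cliques plus a hanger-on) and all names are hypothetical.

```python
def k_core(adj, k):
    """Nodes that survive repeatedly dropping anyone with fewer than k ties."""
    alive = set(adj)
    changed = True
    while changed:
        changed = False
        for v in list(alive):
            if sum(1 for u in adj[v] if u in alive) < k:
                alive.discard(v)
                changed = True
    return alive


def components(adj, nodes):
    """Split a node set into connected pieces, using only ties inside the set."""
    remaining, pieces = set(nodes), []
    while remaining:
        start = remaining.pop()
        piece, stack = {start}, [start]
        while stack:
            u = stack.pop()
            for v in adj[u]:
                if v in remaining:
                    remaining.discard(v)
                    piece.add(v)
                    stack.append(v)
        pieces.append(piece)
    return pieces


def graph(edges):
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


clique_a = [(i, j) for i in (1, 2, 3, 4) for j in (1, 2, 3, 4) if i < j]
clique_b = [(i, j) for i in (5, 6, 7, 8) for j in (5, 6, 7, 8) if i < j]

# Two separate 3-cores, plus node 9 hanging off node 1.
separate = graph(clique_a + clique_b + [(1, 9)])
# The same graph with a single extra tie between the two groups.
bridged = graph(clique_a + clique_b + [(1, 9), (4, 5)])

cores_separate = components(separate, k_core(separate, 3))
cores_bridged = components(bridged, k_core(bridged, 3))
```

With one added tie the two groups become a single connected 3-core, even though nothing about their internal structure has changed.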
Extensions of this idea include:
k-core: Every person has ties to at least k other people in the set.
k-plex: Every member is connected to at least n-k other
people in the graph (recall that in a clique everyone is connected to n-1, so this
relaxes that condition).
n-clique: Every person is connected to every other by a path of length n or less (recall that in a clique every
distance = 1).
n-clan: Same as an n-clique, but all paths must be inside the group.
I’ve never had much luck with any of these methods empirically. Real data are
usually too messy to work well. Since many of the graph-theoretic options seem
not to work well, authors have used optimization techniques that attempt to
identify groups iteratively.
Algorithmic Approaches to Identifying Primary Groups:
1) Measures of fit
To identify a primary group, we need some measure of how clustered the
network is. Usually, this is a function of the number of ties that fall within
group to the number of ties that fall between group.
2.1) Processes designed to maximize (1)
Once we have such an index, we need a method for searching through the
network to maximize the fit.
2.2) Generalized cluster analysis
In addition to maximizing a group function such as (1) we can use the
relational distance directly, and look for clusters in the data.
Segregation Index
(Freeman, L. C. 1978. "Segregation in Social Networks." Sociological Methods and
Research 6:411-30.)
Freeman asked how we could identify segregation in a social network.
Theoretically, he argues, if a given attribute (group label) does not matter for
social relations, then relations should be distributed randomly with respect to the
attribute. Thus, the difference between the number of cross-group ties expected
by chance and the number observed measures segregation.
$$
\mathrm{Seg} = \frac{E(X) - X}{E(X)}
$$
Consider the (hypothetical) network below. There are two attributes in
this network: people with Blue eyes and Brown eyes and people who
are square or not (they must be hip).
Segregation Index
Mixing Matrix:
         Blue  Brown
Blue        6     17
Brown      17     16
Seg = -0.25

         Hip  Square
Hip       20       3
Square     3      30
Seg = 0.78
Segregation Index
To calculate the expected number of ties, we use the standard formula
for a contingency table: row marginal * column marginal / total.
In matrix form: E(X) = R*C/T
Observed
         Blue  Brown  Total
Blue        6     17     23
Brown      17     16     33
Total      23     33     56
Expected
         Blue   Brown  Total
Blue     9.45   13.55     23
Brown   13.55   19.45     33
Total      23      33     56
Segregation Index
Observed
         Blue  Brown  Total
Blue        6     17     23
Brown      17     16     33
Total      23     33     56
Expected
         Blue   Brown  Total
Blue     9.45   13.55     23
Brown   13.55   19.45     33
Total      23      33     56
E(X) = (13.55 + 13.55) = 27.1
X    = (17 + 17) = 34
Seg  = (27.1 - 34) / 27.1
     = -6.9 / 27.1
     = -0.25
Segregation Index
Observed
         Hip  Square  Total
Hip       20       3     23
Square     3      30     33
Total     23      33     56
Expected
          Hip  Square  Total
Hip      9.45   13.55     23
Square  13.55   19.45     33
Total      23      33     56
E(X) = (13.55 + 13.55) = 27.1
X    = (3 + 3) = 6
Seg  = (27.1 - 6) / 27.1
     = 21.1 / 27.1
     = 0.78
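The two worked examples can be reproduced with a short function that takes the mixing matrix, builds the expected cross-group ties from the marginals, and compares them to the observed cross-group ties. The function and variable names are my own.

```python
def segregation_index(mixing):
    """Freeman segregation index from a square group-by-group mixing matrix."""
    groups = range(len(mixing))
    total = sum(sum(row) for row in mixing)
    row_totals = [sum(row) for row in mixing]
    col_totals = [sum(col) for col in zip(*mixing)]
    # Expected cross-group ties under random mixing: row * column / total.
    expected_cross = sum(
        row_totals[i] * col_totals[j] / total
        for i in groups for j in groups if i != j
    )
    observed_cross = sum(mixing[i][j] for i in groups for j in groups if i != j)
    return (expected_cross - observed_cross) / expected_cross


eye_color = [[6, 17], [17, 16]]   # Blue / Brown
style = [[20, 3], [3, 30]]        # Hip / Square

seg_eye = segregation_index(eye_color)
seg_style = segregation_index(style)
```

Rounded to two decimals these give the values shown on the slides: a negative index for eye color (more cross-group ties than chance) and a strongly positive one for hip versus square.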
Segregation Index
One problem with the segregation index is that it is not ‘margin free.’ That
is, if you were to change the distribution of the category of interest (say
race) by a constant but not the core association between race and friendship
choice, you can get a different segregation level.
One antidote to this problem is to use odds ratios. In this case, an odds
ratio tells us the relative likelihood that two people in the same category
will choose each other as friends.
Odds Ratios
The odds ratio tells us how much more likely people in the same group are to
nominate each other. You calculate the odds ratio based on the number of ties in a
group and their relative size, based on the following table:
               Member of:
               Same Group   Different Group
Friends            A               B
Not Friends        C               D
OR = AD / BC
Odds Ratios
Observed
         Hip  Square  Total
Hip       20       3     23
Square     3      30     33
Total     23      33     56
There are 6 hip people and 9 square people in this network. This
implies that there are the following number of possible ties in the
network:
         Hip  Square
Hip       30      54
Square    54      72
Diagonal = ni(ni - 1)
Off diagonal = ni * nj
                Group
                Same   Dif
Friend   Yes      50     6
         No       52   102
OR = (50 * 102) / (52 * 6)
   = 16.35
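The same-group odds ratio for this example can be rebuilt from the mixing matrix and the group sizes. This is a sketch with names of my own choosing; it assumes directed nominations, so a group of size n has n(n-1) possible within-group ties.

```python
def same_group_odds_ratio(mixing, sizes):
    """Odds of a tie within groups relative to the odds of a tie between groups."""
    groups = range(len(sizes))
    same_ties = sum(mixing[i][i] for i in groups)
    diff_ties = sum(mixing[i][j] for i in groups for j in groups if i != j)
    same_possible = sum(n * (n - 1) for n in sizes)
    diff_possible = sum(sizes[i] * sizes[j] for i in groups for j in groups if i != j)
    a, b = same_ties, diff_ties                 # friends: same / different group
    c = same_possible - same_ties               # not friends, same group
    d = diff_possible - diff_ties               # not friends, different group
    return (a * d) / (b * c)


style = [[20, 3], [3, 30]]                      # Hip / Square mixing matrix
odds_ratio = same_group_odds_ratio(style, sizes=[6, 9])
```

Because the possible-tie counts enter the calculation, the odds ratio adjusts for group size in a way the raw segregation index does not.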
Segregation index compared to the odds ratio:
[Figure: scatter plot of the Friendship Segregation Index against Log(Same-Sex Odds Ratio); r = .95]
The segregation index is one metric used to identify groups. Others include:
a) The ratio of in-group to out-group ties (NEGOPY, UCINET Factions)
b) Maximizing the probability of in-group contact (KliqueFinder)
c) The Segregation Matrix Index (SMI)
d) The dyadic factor loadings for overlapping groups (akin to a latent
class model)
e) Minimize the within-group distance
Once a metric has been chosen, some algorithm is needed to search through
the graph to identify clusters. These algorithms range from very sophisticated
“graph-intelligent” algorithms, such as NEGOPY, to simple cluster analysis
of distance matrices.
In most cases, you have to pre-set the number of groups to use (the exceptions
are NEGOPY and KliqueFinder). Moody’s CROWDS algorithm also has
automatic stopping criteria, but you have to give it starting values.
In practice, the different
algorithms will give
different results.
Here, I compare the
NEGOPY results to the
RNM results. NEGOPY
returned one large group,
RNM found many smaller,
denser groups.
It’s usually a good idea to
explore multiple solutions
and algorithms.
Gagnon Prison Network
In practice, the different
algorithms will give
different results.
Here, I compare
NEGOPY, FACTIONS
and RNM. Groups A and
B are identical, C is close.
F, E and D differ.
It’s usually a good idea to
explore multiple solutions
and algorithms.
(all solutions constrained to 6 groups)
Cluster analysis
In addition to tools like FACTIONS, we can use the distance information contained in a
network to cluster observations that are ‘close’ to each other. In general, cluster analysis
is a set of techniques that allows you to identify collections of objects that are similar
to each other in some degree.
A very good reference is the SAS/STAT manual section called, “Introduction to
clustering procedures.” (http://wks.uts.ohio-state.edu/sasdoc/8/sashtml/stat/chap8/index.htm)
(See also Wasserman and Faust, though the coverage is spotty).
We are going to start with the general problem of hierarchical clustering applied to any
set of analytic objects based on similarity, and then transfer that to clustering nodes in a
network.
Cluster analysis
Imagine a set of objects (say
people) arrayed in a two-dimensional
space. You want to
identify groups of people based
on their position in that space.
How do you do it?
[Figure: scatter plot of people; axes: How Smart you are, How Cool you are]
Cluster analysis
Start by choosing a pair of
people who are very close
to each other (such as 15 &
16) and now treat that pair
as one point, with a value
equal to the mean position
of the two nodes.
Cluster analysis
Now repeat that
process for as long
as possible.
Cluster analysis
This process is captured in the cluster tree (called a dendrogram)
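The merge-and-repeat process can be sketched as follows. This version replaces each merged pair with the size-weighted mean of its members (an assumption on my part; for two single points it is just their midpoint), and the returned merge list is the information a dendrogram draws. The points are made up.

```python
import math


def centroid_clustering(points):
    """Repeatedly merge the two closest clusters; return the merge history."""
    # cluster id -> (centroid, list of original point indices)
    clusters = {i: (p, [i]) for i, p in enumerate(points)}
    merges = []
    next_id = len(points)
    while len(clusters) > 1:
        ids = list(clusters)
        # Find the pair of clusters whose centroids are closest.
        a, b = min(
            ((i, j) for pos, i in enumerate(ids) for j in ids[pos + 1:]),
            key=lambda pair: math.dist(clusters[pair[0]][0], clusters[pair[1]][0]),
        )
        (ca, ma), (cb, mb) = clusters.pop(a), clusters.pop(b)
        members = ma + mb
        # Treat the merged pair as one point at the weighted mean position.
        centroid = tuple(
            (x * len(ma) + y * len(mb)) / len(members) for x, y in zip(ca, cb)
        )
        merges.append((sorted(ma), sorted(mb), math.dist(ca, cb)))
        clusters[next_id] = (centroid, members)
        next_id += 1
    return merges


# Hypothetical positions: two tight pairs and one outlier.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (10.0, 0.0)]
merges = centroid_clustering(points)
```

Reading `merges` from first to last gives the tree from the leaves upward; the third element of each entry is the height at which that join would be drawn.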
Cluster analysis
As with the network cluster algorithms, there are many options for
clustering. The three that I use most are:
•Ward’s Minimum Variance -- the one I use almost 95% of the time
•Average Distance -- the one used in the example above
•Median Distance -- very similar
The SAS manual is the best single place I’ve found for information on
each of these techniques.
Some things to keep in mind:
Units matter. The example above draws together pairs
horizontally because the range there is smaller. Get around this by
standardizing your data.
This is an inductive technique. You can find clusters in a purely
random distribution of points. Consider the following example.
Cluster analysis
The data in this scatter
plot are produced using
this code:
/* Draw 20 points with independent standard normal x and y values */
data random;
  do i=1 to 20;
    x=rannor(0);   /* a seed of 0 uses the system clock */
    y=rannor(0);
    output;        /* write one observation per iteration */
  end;
run;
Resulting dendrogram
Resulting cluster solution
Cluster analysis works by building a distance matrix between each pair
of points. In the example above, it used the Euclidean distance which
in two dimensions is simply the physical distance between the points in
a plot.
This can work on any number of dimensions.
To use cluster analysis in a network, we base the distance on the path distance between pairs of people in the network.
Consider again the blue-eye hip example:
Distance Matrix
0 1 3 2 3 3 4 3 3 2 3 2 2 1 1
1 0 2 2 2 3 3 3 2 1 2 2 1 2 1
3 2 0 3 2 4 3 3 2 1 1 1 2 2 3
2 2 3 0 1 1 2 1 1 2 3 3 3 2 1
3 2 2 1 0 2 1 1 1 1 2 2 3 3 2
3 3 4 1 2 0 1 1 2 3 4 4 4 3 2
4 3 3 2 1 1 0 2 2 2 3 3 4 4 3
3 3 3 1 1 1 2 0 1 2 3 3 4 3 2
3 2 2 1 1 2 2 1 0 1 2 2 3 3 2
2 1 1 2 1 3 2 2 1 0 1 1 2 2 2
3 2 1 3 2 4 3 3 2 1 0 1 2 2 3
2 2 1 3 2 4 3 3 2 1 1 0 1 1 2
2 1 2 3 3 4 4 4 3 2 2 1 0 2 2
1 2 2 2 3 3 4 3 3 2 2 1 2 0 1
1 1 3 1 2 2 3 2 2 2 3 2 2 1 0
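A path-distance matrix of this kind can be computed from the ties with a breadth-first search from each node. The sketch below uses a small made-up network rather than the example network, and the function name is my own.

```python
from collections import deque


def path_distance_matrix(adj, order):
    """Shortest path distance between every pair of nodes, as a list of rows.

    Unreachable pairs are reported as None.
    """
    matrix = []
    for source in order:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        matrix.append([dist.get(target) for target in order])
    return matrix


# Hypothetical five-person network.
adj = {
    "a": {"b"},
    "b": {"a", "c", "d"},
    "c": {"b", "d"},
    "d": {"b", "c", "e"},
    "e": {"d"},
}
order = ["a", "b", "c", "d", "e"]
D = path_distance_matrix(adj, order)
```

The resulting matrix is symmetric with zeros on the diagonal, and can be handed to a clustering routine in place of distances computed from variables.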
The distance matrix implies a space that nodes are embedded within. Using something
like MDS, we can represent the space implied by the distance matrix in two
dimensions. This is the image of the network you would get if you did that.
When you use variables, the cluster analysis program generates a distance matrix. We
can instead use the network distance matrix directly. If we do that with this example
network, we get the following:
The CROWDS algorithm combines the density approach above with an initial cluster analysis and a routine for
determining how many clusters are in the network. It does so by using the Segregation index and all of the
information from the cluster hierarchy, combining two groups only if it improves the segregation fit for both groups.
[Figure: cluster hierarchy annotated with segregation index values for each candidate group and for the total solution]
The one other program you should know about is NEGOPY. NEGOPY is a
program that combines elements of the density-based approach and the
graph-theoretic approach to find groups and positions. Like CROWDS, NEGOPY
assigns people both to groups and to ‘outsider’ or ‘between’ group positions.
NEGOPY also determines how many groups are in the network, though in my
experience it often finds a single large group.
Complete Network Analysis
Network Connections: Social Subgroups
The Recursive Neighborhood Means algorithm creates the variables that are
then used in the cluster analysis to identify groups based on a simulated
peer influence process.
•Start by assigning every node a random value on each of k variables
•Then, for each person, calculate the average of each variable across the
people that person is tied to
•Repeat this process many times
This results in people who have many ties to each other having similar
values on the k random variables. This similarity then gets picked up in a
cluster analysis.
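The steps above can be sketched in a few lines. This is a minimal illustration, not the published RNM routine: the function name, the `neighbors` dictionary format, the uniform random starting values, and the choice to average over neighbors only (excluding the node itself) are all assumptions made for the example.

```python
import random

def recursive_neighborhood_means(neighbors, k=2, iterations=30, seed=0):
    """Return {node: [k values]} after repeatedly replacing each node's
    values with the mean of its neighbors' values (hypothetical helper)."""
    rng = random.Random(seed)
    # Step 1: a random starting value on each of the k variables.
    values = {v: [rng.random() for _ in range(k)] for v in neighbors}
    for _ in range(iterations):
        new_values = {}
        for v, nbrs in neighbors.items():
            if not nbrs:  # isolates simply keep their current values
                new_values[v] = values[v]
                continue
            # Step 2: average each variable over the people v is tied to.
            new_values[v] = [sum(values[u][d] for u in nbrs) / len(nbrs)
                             for d in range(k)]
        values = new_values  # Step 3: repeat
    return values

# Two disconnected triangles: nodes 0-2 and nodes 3-5.
two_triangles = {0: [1, 2], 1: [0, 2], 2: [0, 1],
                 3: [4, 5], 4: [3, 5], 5: [3, 4]}
rnm_values = recursive_neighborhood_means(two_triangles)
```

Within each triangle the values converge to a common point, which is the similarity a subsequent cluster analysis would pick up.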
Complete Network Analysis
Network Connections: Social Subgroups
Example of the RNM procedure
Time 1
Time 2
Time 3
Complete Network Analysis
Network Connections: Social Subgroups
As an example, consider the process active on a network known to be
clustered, starting with 2 random variables (k = 2).
You get something like this, where the nodes are now placed according
to their resulting values on the 2 variables.
Complete Network Analysis
Network Connections: Social Subgroups
All of these techniques are inductive procedures. It is possible to specify a deductive test
for group membership if you have a priori reasons to assume that a particular set of people
are a group.
The simplest way would be to specify a dyadic model on the adjacency matrix, then model
the probability (strength) of a tie as a function of dyadic characteristics and your indicator
for being in the same group.
If the parameter for that indicator is large and significant, then you have evidence for the group.
This is, in fact, what KliqueFinder does inductively.
Complete Network Analysis
Network Connections: Role Positions
Overview
•Social life can be described (at least in part) through social roles.
•To the extent that roles can be characterized by regular interaction
patterns, we can summarize roles through common relational patterns.
•Identifying these sets is the goal of block-model analyses.
Nadel: The Coherence of Role Systems
•Background ideas for White, Boorman and Breiger. Social life as an
interconnected system of roles
•Important feature: thinking of roles as connected in a role system =
social structure
White, Harrison C.; Boorman, Scott A., and Breiger, Ronald L. Social
Structure from Multiple Networks I. American Journal of Sociology.
1976; 81: 730-780.
•The key article describing the theoretical and technical elements of
block-modeling
Complete Network Analysis
Network Connections: Role Positions
Elements of a Role:
•Rights and obligations with respect to other people or classes of people
•Roles require a ‘role complement’: another person with respect to whom the
role-occupant acts
Examples:
Parent – child
Teacher – student
Lover – lover
Friend – Friend
Husband - Wife
Nadel (Following functional anthropologists and sociologists) defines ‘logical’ types
of roles, and then examines how they can be linked together.
Complete Network Analysis
Network Connections: Role Positions
White et al: From logical role systems to empirical social structures
Start with some basic ideas of what a role is: An exchange of something (support,
ideas, commands, etc) between actors. Thus, we might represent a family as:
[Figure: a family network with nodes H, W, C, C, C, linked by three relations: Romantic Love, Provides food for, Bickers with]
(and there are, of course, many other relations inside a family)
Complete Network Analysis
Network Connections: Role Positions
The key idea is that we can express a role through a relation (or set of relations), and thus a
social system through its inventory of roles. If roles equate to positions in an exchange system,
then we need only identify particular aspects of a position. But what aspect?
Structural Equivalence
Two actors are structurally equivalent if they have the same types of
ties to the same people.
Complete Network Analysis
Network Connections: Role Positions
Structural Equivalence
A single relation
Complete Network Analysis
Network Connections: Role Positions
Structural Equivalence
Graph reduced to positions
Complete Network Analysis
Network Connections: Role Positions
Blockmodeling: basic steps
In any positional analysis, there are 4 basic steps:
1) Identify a definition of equivalence
2) Measure the degree to which pairs of actors are equivalent
3) Develop a representation of the equivalencies
4) Assess the adequacy of the representation
Complete Network Analysis
Network Connections: Role Positions
1) Identify a definition of equivalence
Structural Equivalence:
Two actors are equivalent if they have the same type of ties to the same people.
Complete Network Analysis
Network Connections: Role Positions
Automorphic Equivalence:
Actors occupy indistinguishable structural locations in the network. That is,
they are in isomorphic positions in the network.
Automorphically equivalent nodes are equivalent with respect to all graph
theoretic properties (i.e., degree, number of people reachable, centrality, etc.)
(Which suggests a simple way of using cluster analyses to find these groups)
Complete Network Analysis
Network Connections: Role Positions
Automorphic Equivalence:
Complete Network Analysis
Network Connections: Role Positions
Regular Equivalence:
Regular equivalence does not require actors to have identical
ties to identical actors or to be structurally indistinguishable.
Actors who are regularly equivalent have identical ties to and
from equivalent actors.
If actors i and j are regularly equivalent, then for all relations
and for all actors k, if i → k, then there exists some actor l such
that j → l and k is regularly equivalent to l.
Complete Network Analysis
Network Connections: Role Positions
Regular Equivalence:
There may be multiple regular equivalence partitions in a network, and thus we tend
to want to find the maximal regular equivalence partition, the one with the fewest
positions.
Complete Network Analysis
Network Connections: Role Positions
Role or Local Equivalence:
While most equivalence measures focus on position within the full network, some
measures focus only on the patterns within the local tie neighborhood. These have
been called ‘local role’ equivalence.
Note that:
Structurally equivalent actors are automorphically equivalent.
Automorphically equivalent actors are regularly equivalent.
Structurally equivalent and automorphically equivalent actors are role equivalent.
In practice, we tend to ignore some of these distinctions, as they get blurred quickly
once we have to operationalize them in real-world graphs. It turns out that few
people are ever exactly equivalent, and thus we approximate the links between the
types.
In all cases, the procedure can work over multiple relations simultaneously.
The process of identifying positions is called blockmodeling, and requires identifying
a measure of similarity among nodes.
Complete Network Analysis
Network Connections: Role Positions
Once you identify equivalent actors, block them in the matrix and reduce it, based on the number of ties
in the cell of interest. The key values are a zero block (no ties) and a one-block (all ties present):
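Blocked adjacency matrix (rows and columns grouped by position):

Position | 1 | 2 | 3   | 4   | 5       | 6
1        | . | 1 | 1 1 | 0 0 | 0 0 0 0 | 0 0 0 0
2        | 1 | . | 0 0 | 1 1 | 0 0 0 0 | 0 0 0 0
3        | 1 | 0 | . 1 | 0 0 | 1 1 1 1 | 0 0 0 0
3        | 1 | 0 | 1 . | 0 0 | 1 1 1 1 | 0 0 0 0
4        | 0 | 1 | 0 0 | . 1 | 0 0 0 0 | 1 1 1 1
4        | 0 | 1 | 0 0 | 1 . | 0 0 0 0 | 1 1 1 1
5        | 0 | 0 | 1 1 | 0 0 | . 0 0 0 | 0 0 0 0
5        | 0 | 0 | 1 1 | 0 0 | 0 . 0 0 | 0 0 0 0
5        | 0 | 0 | 1 1 | 0 0 | 0 0 . 0 | 0 0 0 0
5        | 0 | 0 | 1 1 | 0 0 | 0 0 0 . | 0 0 0 0
6        | 0 | 0 | 0 0 | 1 1 | 0 0 0 0 | . 0 0 0
6        | 0 | 0 | 0 0 | 1 1 | 0 0 0 0 | 0 . 0 0
6        | 0 | 0 | 0 0 | 1 1 | 0 0 0 0 | 0 0 . 0
6        | 0 | 0 | 0 0 | 1 1 | 0 0 0 0 | 0 0 0 .

Reduced (image) matrix:

  | 1 2 3 4 5 6
1 | 0 1 1 0 0 0
2 | 1 0 0 1 0 0
3 | 1 0 1 0 1 0
4 | 0 1 0 1 0 1
5 | 0 0 1 0 0 0
6 | 0 0 0 1 0 0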
Structural equivalence thus generates 6 positions in the network
Complete Network Analysis
Network Connections: Role Positions
Once you partition the matrix, reduce it:
. 1 1 1 0 0 0 0 0 0 0 0 0 0
1 . 0 0 1 1 0 0 0 0 0 0 0 0
1 0 . 1 0 0 1 1 1 1 0 0 0 0
1 0 1 . 0 0 1 1 1 1 0 0 0 0
0 1 0 0 . 1 0 0 0 0 1 1 1 1
0 1 0 0 1 . 0 0 0 0 1 1 1 1
0 0 1 1 0 0 . 0 0 0 0 0 0 0
0 0 1 1 0 0 0 . 0 0 0 0 0 0
0 0 1 1 0 0 0 0 . 0 0 0 0 0
0 0 1 1 0 0 0 0 0 . 0 0 0 0
0 0 0 0 1 1 0 0 0 0 . 0 0 0
0 0 0 0 1 1 0 0 0 0 0 . 0 0
0 0 0 0 1 1 0 0 0 0 0 0 . 0
0 0 0 0 1 1 0 0 0 0 0 0 0 .

Regular equivalence
(here I placed a one in the image matrix if there were any ties in the ij block)

  | 1 2 3
1 | 1 1 0
2 | 1 1 1
3 | 0 1 0
Complete Network Analysis
Network Connections: Role Positions
Operationally, you have to measure the similarity between actors. If two actors
are structurally equivalent, then they will have identical ties to other people.
Consider the example again:
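Position | 1 | 2 | 3   | 4   | 5       | 6
1        | . | 1 | 1 1 | 0 0 | 0 0 0 0 | 0 0 0 0
2        | 1 | . | 0 0 | 1 1 | 0 0 0 0 | 0 0 0 0
3        | 1 | 0 | . 1 | 0 0 | 1 1 1 1 | 0 0 0 0
3        | 1 | 0 | 1 . | 0 0 | 1 1 1 1 | 0 0 0 0
4        | 0 | 1 | 0 0 | . 1 | 0 0 0 0 | 1 1 1 1
4        | 0 | 1 | 0 0 | 1 . | 0 0 0 0 | 1 1 1 1
5        | 0 | 0 | 1 1 | 0 0 | . 0 0 0 | 0 0 0 0
5        | 0 | 0 | 1 1 | 0 0 | 0 . 0 0 | 0 0 0 0
5        | 0 | 0 | 1 1 | 0 0 | 0 0 . 0 | 0 0 0 0
5        | 0 | 0 | 1 1 | 0 0 | 0 0 0 . | 0 0 0 0
6        | 0 | 0 | 0 0 | 1 1 | 0 0 0 0 | . 0 0 0
6        | 0 | 0 | 0 0 | 1 1 | 0 0 0 0 | 0 . 0 0
6        | 0 | 0 | 0 0 | 1 1 | 0 0 0 0 | 0 0 . 0
6        | 0 | 0 | 0 0 | 1 1 | 0 0 0 0 | 0 0 0 .

Comparing the tie vectors of C and D (the third and fourth nodes):

Other node | C | D | Match
1          | 1 | 1 | 1
2          | 0 | 0 | 1
3 (C)      | . | 1 | .
4 (D)      | 1 | . | .
5          | 0 | 0 | 1
6          | 0 | 0 | 1
7          | 1 | 1 | 1
8          | 1 | 1 | 1
9          | 1 | 1 | 1
10         | 1 | 1 | 1
11         | 0 | 0 | 1
12         | 0 | 0 | 1
13         | 0 | 0 | 1
14         | 0 | 0 | 1
Sum: 12

C and D match on all 12 other people, and are thus structurally equivalent.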
Complete Network Analysis
Network Connections: Role Positions
If the model is going to be based on asymmetric or multiple relations, you simply stack the
various relations, usually including both “directions” of asymmetric relations:
[Figure: the family network (nodes H, W, C, C, C) with relations Romantic Love, Provides food for, Bickers with]

Rows and columns are ordered H, W, C, C, C.

Romance
0 1 0 0 0
1 0 0 0 0
0 0 0 0 0
0 0 0 0 0
0 0 0 0 0

Feeds
0 0 1 1 1
0 0 1 1 1
0 0 0 0 0
0 0 0 0 0
0 0 0 0 0

Bicker
0 0 0 0 0
0 0 0 0 0
0 0 0 1 1
0 0 1 0 1
0 0 1 1 0

Stacked (Romance, then Feeds, then Feeds transposed, then Bicker)
0 1 0 0 0
1 0 0 0 0
0 0 0 0 0
0 0 0 0 0
0 0 0 0 0
0 0 1 1 1
0 0 1 1 1
0 0 0 0 0
0 0 0 0 0
0 0 0 0 0
0 0 0 0 0
0 0 0 0 0
1 1 0 0 0
1 1 0 0 0
1 1 0 0 0
0 0 0 0 0
0 0 0 0 0
0 0 0 1 1
0 0 1 0 1
0 0 1 1 0
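The stacking step can be sketched as follows. This is an illustrative sketch only: the helper names are hypothetical, the node order H, W, C, C, C is read off the matrices, and the Bicker relation is assumed to be symmetric among the three children.

```python
def transpose(matrix):
    """Return the transpose of a square 0/1 matrix given as a list of rows."""
    return [list(col) for col in zip(*matrix)]

def stack_relations(relations):
    """Stack relations vertically. `relations` is a list of
    (matrix, is_asymmetric) pairs; asymmetric relations contribute both the
    matrix and its transpose, so each node's column holds ties sent and received."""
    stacked = []
    for matrix, is_asymmetric in relations:
        stacked.extend(row[:] for row in matrix)
        if is_asymmetric:
            stacked.extend(transpose(matrix))
    return stacked

# Node order assumed to be H, W, C, C, C.
romance = [[0, 1, 0, 0, 0], [1, 0, 0, 0, 0], [0] * 5, [0] * 5, [0] * 5]
feeds = [[0, 0, 1, 1, 1], [0, 0, 1, 1, 1], [0] * 5, [0] * 5, [0] * 5]
bicker = [[0] * 5, [0] * 5, [0, 0, 0, 1, 1], [0, 0, 1, 0, 1], [0, 0, 1, 1, 0]]

stacked = stack_relations([(romance, False), (feeds, True), (bicker, False)])
h_profile = [row[0] for row in stacked]  # column for H: 20 entries
```

Each column of `stacked` is one actor's full tie profile across all relations, which is what gets compared when measuring equivalence.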
Complete Network Analysis
Network Connections: Role Positions
The metric used to measure structural equivalence by White, Boorman and Breiger is
the correlation between each node’s set of ties. For the example, this would be:

 1.00 -0.20  0.08  0.08 -0.19 -0.19  0.77  0.77  0.77  0.77 -0.26 -0.26 -0.26 -0.26
-0.20  1.00 -0.19 -0.19  0.08  0.08 -0.26 -0.26 -0.26 -0.26  0.77  0.77  0.77  0.77
 0.08 -0.19  1.00  1.00 -1.00 -1.00  0.36  0.36  0.36  0.36 -0.45 -0.45 -0.45 -0.45
 0.08 -0.19  1.00  1.00 -1.00 -1.00  0.36  0.36  0.36  0.36 -0.45 -0.45 -0.45 -0.45
-0.19  0.08 -1.00 -1.00  1.00  1.00 -0.45 -0.45 -0.45 -0.45  0.36  0.36  0.36  0.36
-0.19  0.08 -1.00 -1.00  1.00  1.00 -0.45 -0.45 -0.45 -0.45  0.36  0.36  0.36  0.36
 0.77 -0.26  0.36  0.36 -0.45 -0.45  1.00  1.00  1.00  1.00 -0.20 -0.20 -0.20 -0.20
 0.77 -0.26  0.36  0.36 -0.45 -0.45  1.00  1.00  1.00  1.00 -0.20 -0.20 -0.20 -0.20
 0.77 -0.26  0.36  0.36 -0.45 -0.45  1.00  1.00  1.00  1.00 -0.20 -0.20 -0.20 -0.20
 0.77 -0.26  0.36  0.36 -0.45 -0.45  1.00  1.00  1.00  1.00 -0.20 -0.20 -0.20 -0.20
-0.26  0.77 -0.45 -0.45  0.36  0.36 -0.20 -0.20 -0.20 -0.20  1.00  1.00  1.00  1.00
-0.26  0.77 -0.45 -0.45  0.36  0.36 -0.20 -0.20 -0.20 -0.20  1.00  1.00  1.00  1.00
-0.26  0.77 -0.45 -0.45  0.36  0.36 -0.20 -0.20 -0.20 -0.20  1.00  1.00  1.00  1.00
-0.26  0.77 -0.45 -0.45  0.36  0.36 -0.20 -0.20 -0.20 -0.20  1.00  1.00  1.00  1.00
Another common metric is the Euclidean distance between pairs of actors, which you
then use in a standard cluster analysis.
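Both metrics can be sketched on the 14-node example. The code below is a minimal illustration with hypothetical function names. It assumes the correlation between actors i and j is computed over their ties to the other actors, leaving out the i and j entries; that convention reproduces the rounded values in the matrix above, but the software used for the slide may differ in other details.

```python
from math import sqrt

# Adjacency matrix of the 14-node example (diagonal entries set to 0).
ROWS = [
    "01110000000000", "10001100000000",
    "10010011110000", "10100011110000",
    "01000100001111", "01001000001111",
    "00110000000000", "00110000000000", "00110000000000", "00110000000000",
    "00001100000000", "00001100000000", "00001100000000", "00001100000000",
]
ADJ = [[int(c) for c in row] for row in ROWS]

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)

def tie_correlation(adj, i, j):
    """Correlate the tie vectors of i and j over all other actors."""
    keep = [k for k in range(len(adj)) if k != i and k != j]
    return pearson([adj[i][k] for k in keep], [adj[j][k] for k in keep])

def tie_distance(adj, i, j):
    """Euclidean distance between the tie vectors of i and j over all other actors."""
    keep = [k for k in range(len(adj)) if k != i and k != j]
    return sqrt(sum((adj[i][k] - adj[j][k]) ** 2 for k in keep))

corr_1_2 = tie_correlation(ADJ, 0, 1)   # first and second actors
corr_c_d = tie_correlation(ADJ, 2, 3)   # C and D, structurally equivalent
dist_c_d = tie_distance(ADJ, 2, 3)      # zero for structurally equivalent actors
```

Structurally equivalent pairs such as C and D get a correlation of 1 and a distance of 0; either matrix can then be passed to a standard clustering routine.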
Complete Network Analysis
Network Connections: Role Positions
The initial method for finding structurally equivalent positions was CONCOR, the
CONvergence of iterated CORrelations.
Concor iteration 1:
1.00 -.77 0.55 0.55 -.57 -.57 0.95 0.95 0.95 0.95 -.75 -.75 -.75 -.75
-.77 1.00 -.57 -.57 0.55 0.55 -.75 -.75 -.75 -.75 0.95 0.95 0.95 0.95
0.55 -.57 1.00 1.00 -1.0 -1.0 0.73 0.73 0.73 0.73 -.75 -.75 -.75 -.75
0.55 -.57 1.00 1.00 -1.0 -1.0 0.73 0.73 0.73 0.73 -.75 -.75 -.75 -.75
-.57 0.55 -1.0 -1.0 1.00 1.00 -.75 -.75 -.75 -.75 0.73 0.73 0.73 0.73
-.57 0.55 -1.0 -1.0 1.00 1.00 -.75 -.75 -.75 -.75 0.73 0.73 0.73 0.73
0.95 -.75 0.73 0.73 -.75 -.75 1.00 1.00 1.00 1.00 -.77 -.77 -.77 -.77
0.95 -.75 0.73 0.73 -.75 -.75 1.00 1.00 1.00 1.00 -.77 -.77 -.77 -.77
0.95 -.75 0.73 0.73 -.75 -.75 1.00 1.00 1.00 1.00 -.77 -.77 -.77 -.77
0.95 -.75 0.73 0.73 -.75 -.75 1.00 1.00 1.00 1.00 -.77 -.77 -.77 -.77
-.75 0.95 -.75 -.75 0.73 0.73 -.77 -.77 -.77 -.77 1.00 1.00 1.00 1.00
-.75 0.95 -.75 -.75 0.73 0.73 -.77 -.77 -.77 -.77 1.00 1.00 1.00 1.00
-.75 0.95 -.75 -.75 0.73 0.73 -.77 -.77 -.77 -.77 1.00 1.00 1.00 1.00
-.75 0.95 -.75 -.75 0.73 0.73 -.77 -.77 -.77 -.77 1.00 1.00 1.00 1.00
Complete Network Analysis
Network Connections: Role Positions
The initial method for finding structurally equivalent positions was CONCOR, the
CONvergence of iterated CORrelations.
Concor iteration 2:
1.00 -.99 0.94 0.94 -.94 -.94 0.99 0.99 0.99 0.99 -.99 -.99 -.99 -.99
-.99 1.00 -.94 -.94 0.94 0.94 -.99 -.99 -.99 -.99 0.99 0.99 0.99 0.99
0.94 -.94 1.00 1.00 -1.0 -1.0 0.97 0.97 0.97 0.97 -.97 -.97 -.97 -.97
0.94 -.94 1.00 1.00 -1.0 -1.0 0.97 0.97 0.97 0.97 -.97 -.97 -.97 -.97
-.94 0.94 -1.0 -1.0 1.00 1.00 -.97 -.97 -.97 -.97 0.97 0.97 0.97 0.97
-.94 0.94 -1.0 -1.0 1.00 1.00 -.97 -.97 -.97 -.97 0.97 0.97 0.97 0.97
0.99 -.99 0.97 0.97 -.97 -.97 1.00 1.00 1.00 1.00 -.99 -.99 -.99 -.99
0.99 -.99 0.97 0.97 -.97 -.97 1.00 1.00 1.00 1.00 -.99 -.99 -.99 -.99
0.99 -.99 0.97 0.97 -.97 -.97 1.00 1.00 1.00 1.00 -.99 -.99 -.99 -.99
0.99 -.99 0.97 0.97 -.97 -.97 1.00 1.00 1.00 1.00 -.99 -.99 -.99 -.99
-.99 0.99 -.97 -.97 0.97 0.97 -.99 -.99 -.99 -.99 1.00 1.00 1.00 1.00
-.99 0.99 -.97 -.97 0.97 0.97 -.99 -.99 -.99 -.99 1.00 1.00 1.00 1.00
-.99 0.99 -.97 -.97 0.97 0.97 -.99 -.99 -.99 -.99 1.00 1.00 1.00 1.00
-.99 0.99 -.97 -.97 0.97 0.97 -.99 -.99 -.99 -.99 1.00 1.00 1.00 1.00
Complete Network Analysis
Network Connections: Role Positions
The initial method for finding structurally equivalent positions was CONCOR, the
CONvergence of iterated CORrelations.
Concor iteration 3:
1.00
-1.0
1.00
1.00
-1.0
-1.0
1.00
1.00
1.00
1.00
-1.0
-1.0
-1.0
-1.0
-1.0
1.00
-1.0
-1.0
1.00
1.00
-1.0
-1.0
-1.0
-1.0
1.00
1.00
1.00
1.00
1.00
-1.0
1.00
1.00
-1.0
-1.0
1.00
1.00
1.00
1.00
-1.0
-1.0
-1.0
-1.0
1.00
-1.0
1.00
1.00
-1.0
-1.0
1.00
1.00
1.00
1.00
-1.0
-1.0
-1.0
-1.0
-1.0
1.00
-1.0
-1.0
1.00
1.00
-1.0
-1.0
-1.0
-1.0
1.00
1.00
1.00
1.00
-1.0
1.00
-1.0
-1.0
1.00
1.00
-1.0
-1.0
-1.0
-1.0
1.00
1.00
1.00
1.00
1.00
-1.0
1.00
1.00
-1.0
-1.0
1.00
1.00
1.00
1.00
-1.0
-1.0
-1.0
-1.0
1.00
-1.0
1.00
1.00
-1.0
-1.0
1.00
1.00
1.00
1.00
-1.0
-1.0
-1.0
-1.0
1.00
-1.0
1.00
1.00
-1.0
-1.0
1.00
1.00
1.00
1.00
-1.0
-1.0
-1.0
-1.0
1.00
-1.0
1.00
1.00
-1.0
-1.0
1.00
1.00
1.00
1.00
-1.0
-1.0
-1.0
-1.0
-1.0
1.00
-1.0
-1.0
1.00
1.00
-1.0
-1.0
-1.0
-1.0
1.00
1.00
1.00
1.00
-1.0
1.00
-1.0
-1.0
1.00
1.00
-1.0
-1.0
-1.0
-1.0
1.00
1.00
1.00
1.00
-1.0
1.00
-1.0
-1.0
1.00
1.00
-1.0
-1.0
-1.0
-1.0
1.00
1.00
1.00
1.00
-1.0
1.00
-1.0
-1.0
1.00
1.00
-1.0
-1.0
-1.0
-1.0
1.00
1.00
1.00
1.00
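The CONCOR iterations can be sketched as below: start from the actor-by-actor correlation matrix, repeatedly correlate its rows with each other, and read the two-way split off the signs once the entries settle at +1 and -1. This is a minimal illustration with hypothetical function names, not the original CONCOR program. It assumes the first correlation matrix leaves out the i and j entries for each pair, and that later iterations correlate full rows.

```python
from math import sqrt

ROWS = [
    "01110000000000", "10001100000000",
    "10010011110000", "10100011110000",
    "01000100001111", "01001000001111",
    "00110000000000", "00110000000000", "00110000000000", "00110000000000",
    "00001100000000", "00001100000000", "00001100000000", "00001100000000",
]
ADJ = [[int(c) for c in row] for row in ROWS]

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return 0.0 if sxx == 0 or syy == 0 else sxy / sqrt(sxx * syy)

def first_correlations(adj):
    """Correlate each pair of tie vectors over the remaining actors."""
    n = len(adj)
    def corr(i, j):
        keep = [k for k in range(n) if k != i and k != j]
        return pearson([adj[i][k] for k in keep], [adj[j][k] for k in keep])
    return [[corr(i, j) for j in range(n)] for i in range(n)]

def concor_split(adj, max_iter=100, tol=1e-12):
    """Iterate row correlations until they stop changing, then split on sign."""
    n = len(adj)
    m = first_correlations(adj)
    for _ in range(max_iter):
        new = [[pearson(m[i], m[j]) for j in range(n)] for i in range(n)]
        change = max(abs(new[i][j] - m[i][j]) for i in range(n) for j in range(n))
        m = new
        if change < tol:
            break
    same = [j for j in range(n) if m[0][j] > 0]       # same side as the first actor
    other = [j for j in range(n) if m[0][j] <= 0]
    return same, other

first_half, second_half = concor_split(ADJ)
```

To get a finer partition, the same routine would be applied again within each half; how far to keep splitting is a substantive judgment.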
Complete Network Analysis
Network Connections: Role Positions
Automorphic and regular equivalence are more difficult to find, and require
iteratively searching over possible class assignments for sets that have the same
graph theoretic patterns. One usually starts with a set of nodes defined as similar on
a number of network measures, then looks within these classes for automorphic
equivalence classes.
A theoretically appealing method for finding structures that are very similar to
regular equivalence, role equivalence, uses the triad census. Each node is
involved in (n-1)(n-2)/2 triads, and occupies a particular position in each of
these triads.
Complete Network Analysis
Network Connections: Role Positions
Triadic Position Census:
40 Positions within all on
two types of mutual ties
Complete Network Analysis
Network Connections: Role Positions
Moving from a similarity/distance matrix to a blockmodel:
number of groups and determining blocks:
“An important decision in an analysis using CONCOR is how fine the
partition should be; in other words, when should one stop splitting
positions? Theory and the interpretability of the solution are the
primary consideration in deciding how many positions to produce.”
(W&F, p.378)
“In defining positions of actors, the ‘trick’ is to choose the point along
the series that gives a useful and interpretable partition of the actors
into equivalence classes.” (W&F p.383)
Complete Network Analysis
Network Connections: Role Positions
Once you have decided on a number of blocks, you need to determine what
counts as a ‘one’ block or a ‘zero’ block. Usually this is some function of the
density of the resulting block.
General rules:
“Fat Fit”: Only put a one in blocks with all ones in the adjacency
matrix
“Lean Fit”: Put a zero if all the cells are zero, else put a one
“Density fit”: Put a one if the average value of the cells in the block is
above a certain cutoff, else put a zero
White, Boorman and Breiger used a ‘lean fit’ (zeroblock) rule for the
examples in their paper.
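The three rules can be compared on the 14-node example used earlier. The sketch below is illustrative only: the function names and the 0.5 default cutoff are assumptions, diagonal cells are ignored, and a block with no off-diagonal cells (a single actor's diagonal block) is treated as a zero block.

```python
ROWS = [
    "01110000000000", "10001100000000",
    "10010011110000", "10100011110000",
    "01000100001111", "01001000001111",
    "00110000000000", "00110000000000", "00110000000000", "00110000000000",
    "00001100000000", "00001100000000", "00001100000000", "00001100000000",
]
ADJ = [[int(c) for c in row] for row in ROWS]

def block_density(adj, rows, cols):
    """Share of off-diagonal cells in the block that hold a tie."""
    cells = [(i, j) for i in rows for j in cols if i != j]
    if not cells:
        return 0.0
    return sum(adj[i][j] for i, j in cells) / len(cells)

def image_matrix(adj, positions, rule="lean", cutoff=0.5):
    """Reduce a blocked adjacency matrix to a 0/1 image matrix."""
    image = []
    for rows in positions:
        line = []
        for cols in positions:
            d = block_density(adj, rows, cols)
            if rule == "fat":        # one only if every cell is a tie
                one = d == 1.0
            elif rule == "lean":     # one if any cell is a tie
                one = d > 0.0
            else:                    # density fit against the cutoff
                one = d > cutoff
            line.append(1 if one else 0)
        image.append(line)
    return image

six_positions = [[0], [1], [2, 3], [4, 5], [6, 7, 8, 9], [10, 11, 12, 13]]
three_positions = [[0, 1], [2, 3, 4, 5], list(range(6, 14))]

lean_six = image_matrix(ADJ, six_positions, rule="lean")
fat_six = image_matrix(ADJ, six_positions, rule="fat")
lean_three = image_matrix(ADJ, three_positions, rule="lean")
fat_three = image_matrix(ADJ, three_positions, rule="fat")
```

With the six structurally equivalent positions every block is all ones or all zeros, so the fat and lean rules agree. With the coarser three-position partition the rules diverge, which is why the choice of rule matters in practice.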
Complete Network Analysis
Network Connections: Role Positions
Most common block structures identified in Add Health
Based on CONCOR, imposing a 5-block fit
Complete Network Analysis
Network Connections: Role Positions
An example:
Padgett, J. F. and Ansell, C. K.
Robust action and the rise of
the Medici, 1400-1434.
American Journal of
Sociology. 1993; 98: 1259-1319.
“Political Groups” in the attribute
sense do not seem to exist, so
P&A turn to the pattern of
network relations among
families.
This is the block reduction of the
full 92 family network.
Complete Network Analysis
Network Connections: Role Positions
An example based on regular equivalence using the Add Health data.
Position labels shown in the figure: 003, 021C_S, 030T_S, 120U_E, 012_S, 021C_B, 030T_B, 120U_S, 012_E, 021C_E, 030T_E, 120C_S, 012_I, 111D_S, 030C, 120C_B, 102_D, 111D_B, 201_S, 102_I, 111D_E, 201_B, 210_S, 021D_S, 111U_S, 120D_S, 210_B, 021D_E, 111U_B, 120D_E, 021U_S, 111U_E, 021U_E, 120C_E, 210_E, 300
Complete Network Analysis
Network Connections: Role Positions
Jefferson High School
School provides a good boundary for social
relations
Sunshine High School
School does not provide a good boundary for
social relations
Complete Network Analysis
Network Connections: Role Positions
Jefferson High School
Sunshine High School
4%
34%
43%
32%
52%
33%
Image networks. Width of tie is proportional to the ratio of cell density to mean cell density.
Complete Network Analysis
Network Connections: Role Positions
Jefferson High School
Block | Size | Popularity | Out Degree | Within-Block Ties | Other-Block Popularity | Within-Block Reciprocity | Within-Block Transitivity | Out-of-school ties
3: Periphery - Outside | 147 | 0.93 | 1.68 | 0.204 | 0.721 | 0.067 | 0.00 | 2.80
4: Periphery - Inside | 121 | 3.29 | 2.91 | 0.678 | 2.612 | 0.732 | 0.054 | 1.88
1: Semi-Per - Outside | 64 | 1.50 | 5.89 | 0.25 | 1.25 | 0.125 | 0.40 | 1.50
2: Semi-Per - Inside | 212 | 3.37 | 6.43 | 1.37 | 2.00 | 0.593 | 0.107 | 0.94
7: Core - 2nd String | 121 | 6.03 | 7.72 | 1.98 | 4.05 | 0.267 | 0.186 | 0.67
5: Core - Aloof | 80 | 6.97 | 3.39 | 0.488 | 6.49 | 0.359 | 0.263 | 0.89
6: Core - Most Popular | 87 | 10.08 | 7.16 | 2.77 | 7.31 | 0.73 | 0.568 | 0.62
Sunshine High School
Block | Size | Popularity | Out Degree | Within-Block Ties | Other-Block Popularity | Within-Block Reciprocity | Within-Block Transitivity | Out-of-school ties
1: Receiving Periphery | 432 | 1.23 | 0.20 | 0.04 | 1.19 | 0.44 | 0.00 | 2.16
2: Sending Periphery | 471 | 0.656 | 2.58 | 0.38 | 0.27 | 0.00 | 0.16 | 2.16
3: Semi-Periphery | 255 | 2.34 | 5.09 | 0.89 | 1.46 | 0.75 | 0.21 | 1.51
4: Lieutenants | 487 | 3.77 | 2.90 | 1.33 | 2.45 | 0.41 | 0.18 | 1.74
5: Popular Core | 76 | 7.29 | 6.76 | 2.83 | 4.46 | 0.78 | 0.50 | 0.92
Complete Network Analysis
Network Connections: Role Positions
Jefferson High School
Sunshine High School
Complete Network Analysis
Network Connections: Role Positions
Jefferson High School
•Being in the same block significantly increases the
likelihood of being in the same behavioral cluster
•“Locally” defined: OR = 1.13
•“Globally” defined: OR = 1.12
•The effect is differential across blocks:

Block | Local | Global
1: Semi P Outsiders | 1.03 | 1.16 ***
2: Semi P Insiders | 0.89 *** | 0.83 ***
3: Periphery Outsiders | 1.76 *** | 2.08 ***
4: Periphery Insiders | 1.03 | 1.09 **
5: “Aloof” Core | 1.15 *** | 0.97
6: Popular Core | 1.01 | 0.79 ***
7: 2nd string Core | 1.08 | 1.04

•Being adjacent in the network has a consistent positive
effect:
•Local: OR = 1.21
•Global: OR = 1.35
Coefficients based on a dyad-level logistic regression model. Models control for grade, gender and SES.
Complete Network Analysis
Network Connections: Role Positions
Sunshine High School
•Being in the same block barely increases the likelihood
of being in the same behavioral cluster
•“Locally” defined: OR = 1.03
•“Globally” defined: OR = 1.02
•The effect is differential across blocks:

Block | Local | Global
1: Receiving Periphery | 1.25 *** | 1.27 **
2: Sending Periphery | 0.99 | 1.01
3: Semi-Periphery | 0.93 ** | 0.89 **
4: Lieutenants | 0.93 ** | 0.88 **
5: Popular Core | 0.87 ** | 1.08

•Being adjacent in the network has a weaker, but still
positive effect:
•Local: OR = 1.13
•Global: OR = 1.08
Coefficients based on a dyad-level logistic regression model. Models control for grade, race, gender & SES.
Complete Network Analysis
Network Connections: Role Positions
Compound Relations
One of the most powerful tools in role analysis involves looking at role systems
through compound relations.
A compound relation is formed by combining single relations.
The best example of compound relations comes from kinship.
Sibling (S)        Child of (C)       Nephew/Niece (SC)
0 1 0 0 0          0 0 1 1 0          0 0 0 0 1
1 0 0 0 0          0 0 0 0 1          0 0 1 1 0
0 0 0 1 0     x    0 0 0 0 0     =    0 0 0 0 0
0 0 1 0 0          0 0 0 0 0          0 0 0 0 0
0 0 0 0 0          0 0 0 0 0          0 0 0 0 0
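The compound relation is the Boolean product of the two matrices: i is linked to j in the compound if there is some k with a Sibling tie from i to k and a Child-of tie from k to j. A minimal sketch, with a hypothetical function name, using the matrices from the slide:

```python
def compose(first, second):
    """Boolean matrix product: 1 in cell (i, j) if some k has
    first[i][k] == 1 and second[k][j] == 1."""
    n = len(first)
    return [[1 if any(first[i][k] and second[k][j] for k in range(n)) else 0
             for j in range(n)]
            for i in range(n)]

sibling = [
    [0, 1, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [0, 0, 0, 1, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0],
]
child_of = [
    [0, 0, 1, 1, 0],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]

nephew_niece = compose(sibling, child_of)
```

Order matters: composing the relations the other way round generally gives a different compound.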
Complete Network Analysis
Network Connections: Role Positions
An example of compound relations can be found in W&F. This role table
catalogues the compounds for two relations: “Is boss of” and “Is on the
same level as”.
Complete Network Analysis
Network Connections: Role Positions
The newest work in block modeling comes from Doreian, Batagelj, and Ferligoj,
who have proposed a system for ‘generalized block models’
Instead of having blocks composed of zeros or ones, you can specify the type of
relation within and between each block as an ideal type.
For example, you might specify a block as “row regular” meaning that every row in
the block has at least one tie (i.e. every person in the block sends a tie to at least one
person in another block).
Two advances:
a) conceptually they generalize the meaning of a block-block tie
b) algorithmically, they make it possible to specify the tie pattern in advance,
moving us from an inductive to a deductive approach.
•While promising, the routine is still a little sensitive. Many of the models
described are not well matched to substantive theory and many of the
examples fit are quite small and don’t always reduce the data much.
•But it is likely where blockmodeling will go in the future.
Complete Network Analysis
Stochastic Network Analysis
Confidence Intervals: Bootstraps and Jackknifes
(Snijders & Borgatti, 1999)
Goal: “Useful to have an indication of how precise a given description is,
particularly when making comparisons between groups.”
Assumes that “a researcher is interested in some descriptive statistic …
and wishes to have a standard error for this descriptive statistic without
making implausibly strong assumptions about how the network came
about.”
Complete Network Analysis
Stochastic Network Analysis
Jackknifes.
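Given a dataset with N sample elements, N artificial datasets are
created by deleting each sample element in turn from the
observed dataset.
In standard practice, the formula for the standard error is then:
$$ SE_j = \sqrt{\frac{N-1}{N} \sum_{i=1}^{N} \left(Z_{-i} - \bar{Z}_{-\bullet}\right)^2} $$
where Z_{-i} is the statistic calculated without element i, and \bar{Z}_{-\bullet} is the average of Z_{-1} … Z_{-N}.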
Complete Network Analysis
Stochastic Network Analysis
Jackknifes: Example on regular data
Obs | i | x | s1 | s2 | s3 | s4 | s5 | s6 | s7 | s8 | s9 | s10
1 | 1 | 0.85 | . | 0.85 | 0.85 | 0.85 | 0.85 | 0.85 | 0.85 | 0.85 | 0.85 | 0.85
2 | 2 | 0.70 | 0.70 | . | 0.70 | 0.70 | 0.70 | 0.70 | 0.70 | 0.70 | 0.70 | 0.70
3 | 3 | 1.00 | 1.00 | 1.00 | . | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00
4 | 4 | 0.59 | 0.59 | 0.59 | 0.59 | . | 0.59 | 0.59 | 0.59 | 0.59 | 0.59 | 0.59
5 | 5 | 0.22 | 0.22 | 0.22 | 0.22 | 0.22 | . | 0.22 | 0.22 | 0.22 | 0.22 | 0.22
6 | 6 | 0.69 | 0.69 | 0.69 | 0.69 | 0.69 | 0.69 | . | 0.69 | 0.69 | 0.69 | 0.69
7 | 7 | 0.43 | 0.43 | 0.43 | 0.43 | 0.43 | 0.43 | 0.43 | . | 0.43 | 0.43 | 0.43
8 | 8 | 0.32 | 0.32 | 0.32 | 0.32 | 0.32 | 0.32 | 0.32 | 0.32 | . | 0.32 | 0.32
9 | 9 | 0.50 | 0.50 | 0.50 | 0.50 | 0.50 | 0.50 | 0.50 | 0.50 | 0.50 | . | 0.50
10 | 10 | 0.67 | 0.67 | 0.67 | 0.67 | 0.67 | 0.67 | 0.67 | 0.67 | 0.67 | 0.67 | .
MEAN: | | 0.60 | 0.57 | 0.58 | 0.55 | 0.60 | 0.64 | 0.59 | 0.61 | 0.63 | 0.61 | 0.59
Complete Network Analysis
Stochastic Network Analysis
SEj = 0.0753
SE = 0.0753
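The agreement between the two standard errors can be checked directly. The sketch below uses the ten x values as printed in the table, so it is an approximation: those values are rounded to two decimals, and the result comes out near 0.075 rather than exactly the 0.0753 reported on the slide. Function and variable names are hypothetical.

```python
from math import sqrt

x = [0.85, 0.70, 1.00, 0.59, 0.22, 0.69, 0.43, 0.32, 0.50, 0.67]

def mean(values):
    return sum(values) / len(values)

def jackknife_se(data, statistic):
    """Standard jackknife SE: delete each element in turn and recompute."""
    n = len(data)
    leave_one_out = [statistic(data[:i] + data[i + 1:]) for i in range(n)]
    center = mean(leave_one_out)
    return sqrt((n - 1) / n * sum((z - center) ** 2 for z in leave_one_out))

def classical_se_of_mean(data):
    """Usual standard error of the mean: s / sqrt(n), with s using n - 1."""
    n = len(data)
    m = mean(data)
    variance = sum((v - m) ** 2 for v in data) / (n - 1)
    return sqrt(variance / n)

se_jackknife = jackknife_se(x, mean)
se_classical = classical_se_of_mean(x)
```

For the mean, the jackknife reproduces the classical standard error exactly; the point of the method is that the same recipe works for statistics that have no closed-form standard error.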
Complete Network Analysis
Stochastic Network Analysis
For networks, we need to adjust the scaling parameter:
$$ SE_j = \sqrt{\frac{N-2}{2N} \sum_{i=1}^{N} \left(Z_{-i} - \bar{Z}_{-\bullet}\right)^2} $$
Where Z_{-i} is the network statistic calculated without vertex i, and \bar{Z}_{-\bullet} is
the average of Z_{-1} … Z_{-N}.
Theoretically, this procedure will work for any network statistic Z.
UCINET will use it to test differences in network density.
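A minimal sketch of the network version, using density as the statistic Z. The function names are hypothetical and the two small test graphs are made up for illustration; this is not the UCINET routine.

```python
from math import sqrt

def density(adj):
    """Share of off-diagonal cells that hold a tie."""
    n = len(adj)
    if n < 2:
        return 0.0
    ties = sum(adj[i][j] for i in range(n) for j in range(n) if i != j)
    return ties / (n * (n - 1))

def drop_vertex(adj, v):
    """Return the adjacency matrix with vertex v's row and column removed."""
    n = len(adj)
    return [[adj[i][j] for j in range(n) if j != v] for i in range(n) if i != v]

def network_jackknife_se(adj, statistic):
    """Vertex-deletion jackknife with the (N - 2) / (2N) scaling factor."""
    n = len(adj)
    z = [statistic(drop_vertex(adj, v)) for v in range(n)]
    z_bar = sum(z) / n
    return sqrt((n - 2) / (2 * n) * sum((zi - z_bar) ** 2 for zi in z))

# A 5-node star (node 0 is the center) and a complete graph on 5 nodes.
star = [[0, 1, 1, 1, 1]] + [[1, 0, 0, 0, 0] for _ in range(4)]
complete = [[0 if i == j else 1 for j in range(5)] for i in range(5)]

se_star = network_jackknife_se(star, density)
se_complete = network_jackknife_se(complete, density)
```

In the star, deleting the center changes the density a great deal while deleting a leaf changes it little, so the standard error is large. In the complete graph every deletion leaves density at 1 and the standard error is 0.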
Complete Network Analysis
Stochastic Network Analysis
An example based on the Trade data. Density, Std. Errors and
confidence intervals for each matrix.
Matrix | Density (DEN) | Jackknife SE (SEJ) | Upper bound (UB) | Lower bound (LB)
DIP | 0.6684783 | 0.0636125 | 0.7931588 | 0.5437978
CRUDE | 0.5561594 | 0.0676669 | 0.6887866 | 0.4235323
FOOD | 0.5561594 | 0.0633776 | 0.6803794 | 0.4319394
MAN | 0.5615942 | 0.0724143 | 0.7035263 | 0.4196621
MIN | 0.2445652 | 0.0530224 | 0.3484891 | 0.1406414
In practice, I think the estimates can be pretty wide for other network statistics
Complete Network Analysis
Stochastic Network Analysis
In general, bootstrap techniques effectively treat the given sample as the
population, then draw samples, with replacement, from the observed
distribution.
For networks, we draw random samples of the vertices, creating a new
network Y*:
$$ Y^{*}_{kh} = Y_{i(k)\,i(h)}, \quad \text{for } i(k) \neq i(h) $$
If i(k) = i(h), then fill in the dyad with a random draw from the set of all possible
dyads (i.e., fill in this cell with a random draw from the population).
Complete Network Analysis
Stochastic Network Analysis
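For each bootstrap sample:
• Draw N random numbers, with replacement, from 1 to N, denoted
i(1)..i(N)
• Construct Y* based on i(1)..i(N)
• Calculate the statistic of interest, called Z*(m)
Repeat this process M times (M in the thousands). The bootstrap standard error is:
$$ SE_b = \sqrt{\frac{1}{M-1} \sum_{m=1}^{M} \left(Z^{*(m)} - Z^{*(\bullet)}\right)^2} $$
where Z^{*(\bullet)} is the average of the M bootstrap values.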
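The bootstrap steps can be sketched as follows. This is an illustration under stated assumptions, not the UCINET implementation: the function names are hypothetical, the network is treated as directed (so cells are filled one at a time), and a cell whose two resampled vertices coincide is filled with the value of a dyad drawn at random from the observed network.

```python
import random
from math import sqrt

def density(adj):
    n = len(adj)
    ties = sum(adj[i][j] for i in range(n) for j in range(n) if i != j)
    return ties / (n * (n - 1))

def bootstrap_network(adj, rng):
    """Resample vertices with replacement and build the network Y*."""
    n = len(adj)
    draw = [rng.randrange(n) for _ in range(n)]          # i(1)..i(N)
    dyads = [(i, j) for i in range(n) for j in range(n) if i != j]
    new = [[0] * n for _ in range(n)]
    for k in range(n):
        for h in range(n):
            if k == h:
                continue
            if draw[k] != draw[h]:
                new[k][h] = adj[draw[k]][draw[h]]
            else:
                i, j = rng.choice(dyads)                 # same vertex drawn twice
                new[k][h] = adj[i][j]
    return new

def bootstrap_se(adj, statistic, samples=1000, seed=0):
    """Standard deviation of the statistic across bootstrap networks."""
    rng = random.Random(seed)
    z = [statistic(bootstrap_network(adj, rng)) for _ in range(samples)]
    z_bar = sum(z) / samples
    return sqrt(sum((v - z_bar) ** 2 for v in z) / (samples - 1))

complete = [[0 if i == j else 1 for j in range(5)] for i in range(5)]
star = [[0, 1, 1, 1, 1]] + [[1, 0, 0, 0, 0] for _ in range(4)]

se_complete = bootstrap_se(complete, density, samples=200)
se_star = bootstrap_se(star, density, samples=200)
```

A complete graph gives a standard error of 0 because every resampled network is again complete, which makes a handy sanity check on the resampling code.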
Complete Network Analysis
Stochastic Network Analysis
Bootstraps: Comparing density
BOOTSTRAP PAIRED SAMPLE T-TEST
--------------------------------------------------------------------------------
Density of trade_min is: 0.2446
Density of trade_dip is: 0.6685
Difference in density is: -0.4239
Number of bootstrap samples: 5000
Variance of ties for trade_min: 0.1851
Variance of ties for trade_dip: 0.2220
Classical standard error of difference: 0.0272
Classical t-test (indep samples): -15.6096
Estimated bootstrap standard error for density of trade_min: 0.0458
Estimated bootstrap standard error for density of trade_dip: 0.0553
Bootstrap standard error of the difference (indep samples): 0.0719
95% confidence interval for the difference (indep samples): [-0.5648, -0.2831]
bootstrap t-statistic (indep samples): -5.8994
Bootstrap SE for the difference (paired samples): 0.0430
95% bootstrap CI for the difference (paired samples): [-0.5082, -0.3396]
t-statistic: -9.8547
Average bootstrap difference: -0.3972
Proportion of absolute differences as large as observed: 0.0002
Proportion of differences as large as observed: 1.0000
Proportion of differences as large as observed: 0.0002
Complete Network Analysis
Stochastic Network Analysis
In general, one can test the sensitivity of a particular network measure by randomly
perturbing the original network.
1) Randomly add or delete a small percent of ties and recalculate Z. Do this 1000s of
times and generate a distribution for your statistic of interest.
2) Treat the ties in the network as realizations of an underlying probability distribution,
and then generate networks from this probability distribution:
Simple:
Specify probabilities based on the current structure:
Pij = P1 if Xij = Xji = 1
Pij = P2 if Xij = 1 or Xji = 1
Pij = P3 if Xij = Xji = 0
This is a standard sensitivity process, so you specify P1..Pn as reasonable
ranges, perhaps with constraints.
Complex:
Use a random graph model to generate the edge probabilities, and simulate
from that
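Option 1 above can be sketched as a simple perturbation loop. The interpretation used here is an assumption: each off-diagonal cell is flipped (a tie added or deleted) independently with a small probability `rate`. The function names and the default of 1000 draws are hypothetical.

```python
import random

def density(adj):
    n = len(adj)
    ties = sum(adj[i][j] for i in range(n) for j in range(n) if i != j)
    return ties / (n * (n - 1))

def perturb(adj, rate, rng):
    """Flip each off-diagonal cell with probability `rate`."""
    n = len(adj)
    new = [row[:] for row in adj]
    for i in range(n):
        for j in range(n):
            if i != j and rng.random() < rate:
                new[i][j] = 1 - new[i][j]
    return new

def sensitivity(adj, statistic, rate=0.05, draws=1000, seed=0):
    """Distribution of the statistic over randomly perturbed networks."""
    rng = random.Random(seed)
    return [statistic(perturb(adj, rate, rng)) for _ in range(draws)]

star = [[0, 1, 1, 1, 1]] + [[1, 0, 0, 0, 0] for _ in range(4)]
perturbed_densities = sensitivity(star, density, rate=0.05, draws=200)
```

The spread of `perturbed_densities` around the observed value indicates how sensitive the measure is to small amounts of measurement error in the ties.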
Complete Network Analysis
Stochastic Network Analysis: Exponential Random Graph Models
A long research tradition in statistics and random graph theory has led to
parametric models of networks.
These are models of the entire graph, though as we will see they often work on
the dyads in the graph to be estimated.
Substantively, the approach is to ask whether the graph in question is an element
of the class of all random graphs with the given known elements. For example,
all graphs with 5 nodes and 3 edges, or, put probabilistically, the probability of
observing the current graph given the conditions.
Complete Network Analysis
Stochastic Network Analysis
The earliest approaches are based on simple random graph theory, but there’s
been a flurry of activity in the last 10 years or so.
Key references:
- Holland and Leinhardt (1981) JASA
- Frank and Strauss (1986) JASA
- Wasserman and Faust (1994) – Chap 15 & 16
- Wasserman and Pattison (1996)
Thanks to Mark Handcock for sharing some figures/slides about these models.
Complete Network Analysis
Stochastic Network Analysis
$$ p(X = x) = \frac{\exp\{\theta' z(x)\}}{\kappa(\theta)} $$
Where:
θ is a vector of parameters (like regression coefficients)
z is a vector of network statistics, conditioning the graph
κ(θ) is a normalizing constant, to ensure the probabilities sum to 1.
Complete Network Analysis
Stochastic Network Analysis
The simplest graph is a Bernoulli random graph, where each Xij is
independent:
$$ p(X = x) = \frac{\exp\{\sum_{i,j} \theta_{ij} x_{ij}\}}{\kappa(\theta)} $$
Where:
θ_ij = logit[P(Xij = 1)]
κ(θ) = Π_{i,j} [1 + exp(θ_ij)]
Note this is one of the few cases where κ(θ) can be written.
Complete Network Analysis
Stochastic Network Analysis
Typically, we add a homogeneity condition, so that all isomorphic
graphs are equally likely. The homogeneous Bernoulli graph model:
$$ p(X = x) = \frac{\exp\{\theta \sum_{i,j} x_{ij}\}}{\kappa(\theta)} $$
Where:
κ(θ) = [1 + exp(θ)]^g
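Because the normalizing constant has a closed form here, the model can be checked by brute force on a tiny graph. In the sketch below the exponent of κ(θ) is taken to be the number of dyads that can carry a tie (6 for a directed graph on 3 nodes); that reading of the exponent, the function name, and the value θ = -0.5 are assumptions made for the illustration.

```python
from itertools import product
from math import exp

def bernoulli_graph_probability(ties, n_dyads, theta):
    """Probability of one particular graph with `ties` ties present
    out of `n_dyads` possible, under the homogeneous Bernoulli model."""
    return exp(theta * ties) / (1 + exp(theta)) ** n_dyads

n_dyads = 6          # directed graph on 3 nodes: 3 * 2 ordered pairs
theta = -0.5

# Enumerate all 2**6 possible graphs and add up their probabilities.
total_probability = sum(
    bernoulli_graph_probability(sum(cells), n_dyads, theta)
    for cells in product([0, 1], repeat=n_dyads)
)

# With theta = 0 every tie has probability 0.5 and all 64 graphs are equally likely.
uniform_probability = bernoulli_graph_probability(3, n_dyads, 0.0)
```

The probabilities sum to 1, which is the job of the normalizing constant. For richer models the same enumeration is impossible on networks of realistic size, which is the problem the next slides address.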
Complete Network Analysis
Stochastic Network Analysis
If we want to condition on anything much more complicated than density, the
normalizing constant ends up being a problem. We need a way to express the
probability of the graph that doesn’t depend on that constant. It turns out we
can do this by conditioning on a ‘complement’ graph.
First some terms:
X_ij^+ = Sociomatrix with the ij element forced to 1
X_ij^- = Sociomatrix with the ij element forced to 0
X_ij^c = Sociomatrix with no tie between i and j
Complete Network Analysis
Stochastic Network Analysis
After some algebra:
$$ \omega_{ij} = \log\left[\frac{p(X_{ij} = 1 \mid X_{ij}^{c})}{p(X_{ij} = 0 \mid X_{ij}^{c})}\right] = \theta'\left[z(x_{ij}^{+}) - z(x_{ij}^{-})\right] $$
Note that we can now model the conditional probability of the graph as a function
of a set of difference statistics, without reference to the normalizing constant.
The model, then, simply reduces to a logit model on the dyads.
This is a pseudo-likelihood estimate, and it is not optimal under many circumstances.
In new work (2005), Wasserman suggests that the statistical inference on the
parameters be viewed with caution.
New methods based on MCMC are coming out, and they are much better.
Complete Network Analysis
Stochastic Network Analysis
Fitting p* models
I highly recommend working through the p* primer examples, which can be
found at:
http://kentucky.psych.uiuc.edu/pstar/index.html
Including:
A Practical Guide To Fitting p* Social Network Models
Via Logistic Regression
The site includes the PREPSTAR program for creating the difference variables
of interest.
Complete Network Analysis
Stochastic Network Analysis
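[Figure: sociomatrix x for six actors, partitioned into positions {1, 2, 3} and {4, 5, 6}, shown with the corresponding graph. The cell values are not recoverable from the transcript.]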
We can model this network based on parameters for overall degree of Choice
(φ), Differential Choice Within Positions (φ_W), Mutuality (ρ), Differential
Mutuality Within Positions (ρ_W), and Transitivity (τ_T).
The vector of model parameters to be estimated is: θ = {φ, φ_W, ρ, ρ_W, τ_T}.
Complete Network Analysis
Stochastic Network Analysis
The first step is to calculate the vector of change statistics. This is done by first
calculating the value of the statistic if the ij tie is present, then if it is absent, then
taking the difference. The program PREPSTAR does this for you (see also pspar,
for large networks:
http://www.sfu.ca/~richards/Pages/pspar.html).
For example, the simple choice statistic is the sum of the Xij, so if the tie is forced
present Xij=1, if absent Xij=0, and the difference is always 1. Since this is true for every
dyad, it is a constant, equivalent to the model intercept.
The model described above would be written in W&P notation as:
• $z_1(x) = L = \sum_{i,j} X_{ij}$ is the statistic for the Choice parameter, $\phi$,
• $z_2(x) = L_W = \sum_{i,j} X_{ij}\,\delta_{ij}$ is the statistic for the Choice Within Positions
parameter, $\phi_W$,
• $z_3(x) = M = \sum_{i<j} X_{ij} X_{ji}$ is the statistic for the Mutuality parameter, $\rho$,
• $z_4(x) = M_W = \sum_{i<j} X_{ij} X_{ji}\,\delta_{ij}$ is the statistic for the Mutuality Within Positions
parameter, $\rho_W$,
• $z_5(x) = T_T = \sum_{i,j,k} X_{ij} X_{jk} X_{ik}$ is the statistic for the Transitivity parameter, $\tau_T$.
Note that the indicator variable $\delta_{ij} = 1$ if actors i and j are in the same position,
and 0 otherwise.
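As a rough illustration of these five statistics and their change statistics, here is a minimal Python sketch. The function names, the 4-actor example matrix, and the position labels are all hypothetical (they are not the slide's 6-actor network and not PREPSTAR's implementation), and the transitivity sum is assumed to run over distinct i, j, k.

```python
def graph_statistics(x, position):
    """Return (L, LW, M, MW, TT) for a 0/1 sociomatrix x (list of lists)."""
    nodes = range(len(x))
    # delta_ij: 1 if actors i and j share a position, else 0
    same = lambda i, j: int(position[i] == position[j])
    L = sum(x[i][j] for i in nodes for j in nodes if i != j)
    LW = sum(x[i][j] * same(i, j) for i in nodes for j in nodes if i != j)
    M = sum(x[i][j] * x[j][i] for i in nodes for j in nodes if i < j)
    MW = sum(x[i][j] * x[j][i] * same(i, j)
             for i in nodes for j in nodes if i < j)
    TT = sum(x[i][j] * x[j][k] * x[i][k]
             for i in nodes for j in nodes for k in nodes
             if len({i, j, k}) == 3)
    return (L, LW, M, MW, TT)


def change_statistics(x, position, i, j):
    """Statistics with tie i->j forced present minus forced absent."""
    saved = x[i][j]
    x[i][j] = 1
    present = graph_statistics(x, position)
    x[i][j] = 0
    absent = graph_statistics(x, position)
    x[i][j] = saved  # restore the observed value
    return tuple(p - a for p, a in zip(present, absent))


# Hypothetical example: actors 0 and 1 in one position, 2 and 3 in another.
x = [[0, 1, 1, 0],
     [1, 0, 1, 0],
     [0, 0, 0, 1],
     [0, 0, 0, 0]]
position = [0, 0, 1, 1]

print(graph_statistics(x, position))
print(change_statistics(x, position, 0, 3))
```

Running `change_statistics` over every ordered pair (i, j) would give one row of predictors per dyad, with the observed tie as the response in the logit model.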
proc logistic descending;
  /* tie = observed dyad value; predictors are the change statistics */
  model tie = l lw m mw tt / noint;
run;
L = Choice
LW = Within Group
M = Mutuality
MW = Mutual within Group
TT = Transitivity
Substantively, this graph looks like a draw from the class of random graphs with similar mutuality and size.
One practical problem is that the resulting values are often quite correlated,
making estimation difficult. This is particularly difficult with “star”
parameters.
Correlations among the change statistics (p-values in parentheses):
        lw                  m                   mw                  tt
lw      1.00000             0.58333 (0.0007)    0.80178 (<.0001)    0.15830 (0.4034)
m       0.58333 (0.0007)    1.00000             0.80178 (<.0001)    -0.02435 (0.8984)
mw      0.80178 (<.0001)    0.80178 (<.0001)    1.00000             -0.11716 (0.5375)
tt      0.15830 (0.4034)    -0.02435 (0.8984)   -0.11716 (0.5375)   1.00000
Parameters that are often fit include:
1) Expansiveness and attractiveness parameters: dummies for
each sender/receiver in the network
2) Degree distribution
3) Mutuality
4) Group membership (and all other parameters by group)
5) Transitivity / Intransitivity
6) K-in-stars, k-out-stars
7) Cyclicity
A second, perhaps more fundamental, problem is that many of the models themselves are
impossible to fit, because they imply graphs that cannot exist in the real world.
Mark Handcock (UW, Statistics) has shown that some of the simplest models predict
‘degenerate networks’: networks where everyone is connected to everyone or to no one.
Others have recently suggested that this is a problem of model specification, and that if
you include higher-order graph statistics, the models do not fail.
In either case, the implied link between a probability model of the graph and the
statistical estimation of the graph makes it simple to simulate graphs from parameter
estimates.
This might hold the key for moving from local network data to global network estimates.
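To make the idea of simulating graphs from parameter estimates concrete, here is a minimal Metropolis sketch for a model with only choice and mutuality terms. The function names, parameter values, and the choice of a Metropolis toggle scheme are assumptions made for illustration; this is not the algorithm used in any particular package.

```python
import math
import random


def change_stats(x, i, j):
    """Change in (choice, mutuality) when tie i->j goes from absent to present."""
    return (1, x[j][i])


def simulate_graph(n, theta, n_steps, seed=None):
    """Draw one directed graph on n nodes by toggling ties one at a time.

    theta = (choice, mutuality) parameters; both values are hypothetical inputs.
    """
    rng = random.Random(seed)
    x = [[0] * n for _ in range(n)]  # start from the empty graph
    for _ in range(n_steps):
        i, j = rng.sample(range(n), 2)  # pick a random ordered pair, i != j
        # log-odds of the tie being present rather than absent
        delta = sum(t * d for t, d in zip(theta, change_stats(x, i, j)))
        # log of p(proposed graph) / p(current graph) for the toggle
        log_ratio = delta if x[i][j] == 0 else -delta
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            x[i][j] = 1 - x[i][j]
    return x


def tie_count(x):
    return sum(sum(row) for row in x)


dense = simulate_graph(6, (10.0, 0.0), 3000, seed=1)
sparse = simulate_graph(6, (-10.0, 0.0), 3000, seed=1)
print(tie_count(dense), tie_count(sparse))
```

With a strongly positive choice parameter the draws are nearly complete graphs, and with a strongly negative one they are nearly empty. That is a small-scale picture of the degenerate behavior described above.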
An example:
[Figure: Network Model Coefficients, In-school Networks. Vertical axis runs from -0.6 to 0.8.]
Other statistical / computational models for social networks:
1) Actor-oriented models (Snijders). These models attempt to get to the same place as
the p* models, but by specifying the “parameters” as optimization rules in an
actor-oriented micro-simulation. Very effective at dealing with real-world graphs, so
long as they are not too big. The SIENA software fits these models.
2) Dynamic network models
• Both the actor-oriented models and the ERGM models can use time, by
including past graph features as covariates. This effectively models the change
in an arc/edge over time.
• Tom Snijders has developed a set of HLM-like models for dealing with
networks over time.
A conceptual merge between random graph models and QAP models is to identify a
sample of graphs from the universe you are trying to model. So, instead of
estimating:
$$p(X = x) = \frac{\exp\{\theta' z(x)\}}{\kappa(\theta)}$$
generate X empirically, then compare z(x) to see how likely a measure on x would
be given X. The difficulty, however, is generating X.
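The comparison step can be sketched in a few lines of Python, assuming a sample of graphs is already in hand and the statistic z has been computed on each. The function name and the numbers are hypothetical, and the one-sided "at least as large" comparison is one choice among several.

```python
def empirical_p_value(observed, simulated):
    """Share of simulated statistic values at least as large as the observed one."""
    return sum(1 for s in simulated if s >= observed) / len(simulated)


# Hypothetical values of a statistic (e.g., a count of mutual dyads)
# on four sampled graphs, compared with an observed value of 5.
sampled_values = [1, 2, 5, 7]
print(empirical_p_value(5, sampled_values))  # 0.5
```

In practice the sample would contain many more graphs, so that small tail proportions can be estimated with some stability.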
The first option would be to generate all non-isomorphic graphs within a given
constraint.
This is possible for small graphs, but the number gets large fast. For a
network with 3 nodes, there are 16 possible directed graphs. For a
network with 4 nodes, there are 218; for 5 nodes, 9,608; for 6
nodes, 1,540,944; and so on…
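These counts can be checked with a short Python sketch that counts directed graphs up to isomorphism using Burnside's lemma. The function name is hypothetical, and the brute-force loop over all node permutations is only practical for very small n.

```python
from itertools import permutations
from math import factorial


def count_digraphs(n):
    """Number of directed graphs on n nodes, counted up to isomorphism."""
    pairs = [(i, j) for i in range(n) for j in range(n) if i != j]
    total = 0
    for perm in permutations(range(n)):
        # Count the cycles of the permutation induced on ordered pairs;
        # a digraph is fixed by perm iff it is constant on each cycle.
        seen = set()
        cycles = 0
        for pair in pairs:
            if pair in seen:
                continue
            cycles += 1
            q = pair
            while q not in seen:
                seen.add(q)
                q = (perm[q[0]], perm[q[1]])
        total += 2 ** cycles
    # Burnside: average number of fixed digraphs over all permutations
    return total // factorial(n)


for n in range(3, 7):
    print(n, count_digraphs(n))
```

Because the loop visits all n! permutations, this check itself shows how quickly exhaustive enumeration becomes impractical.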
So, the best approach is to sample from the universe, but, of course, if you
had the universe you wouldn’t need to sample from it. How do you
sample from a population you haven’t observed?
Use a construction algorithm that generates a random graph with known
constraints.
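One common way to draw random graphs under a known constraint is degree-preserving edge swapping, sketched below for an undirected graph. This is a rewiring approach swapped in for illustration, not necessarily the construction algorithm used for the simulations in these slides. It preserves only the degree sequence, and the function names and example edge list are hypothetical.

```python
import random
from collections import Counter


def degree_preserving_sample(edges, n_swaps, seed=None):
    """Rewire an undirected edge list while keeping every node's degree fixed."""
    rng = random.Random(seed)
    edge_list = [tuple(sorted(e)) for e in edges]
    edge_set = set(edge_list)
    for _ in range(n_swaps):
        idx1, idx2 = rng.sample(range(len(edge_list)), 2)
        a, b = edge_list[idx1]
        c, d = edge_list[idx2]
        if rng.random() < 0.5:  # randomize the orientation of the second edge
            c, d = d, c
        # Propose replacing a-b and c-d with a-d and c-b
        new1 = tuple(sorted((a, d)))
        new2 = tuple(sorted((c, b)))
        if a == d or c == b or new1 in edge_set or new2 in edge_set:
            continue  # would create a self-loop or a duplicate edge
        edge_set.discard(edge_list[idx1])
        edge_set.discard(edge_list[idx2])
        edge_set.update((new1, new2))
        edge_list[idx1], edge_list[idx2] = new1, new2
    return edge_list


def degrees(edges):
    """Degree of each node in an undirected edge list."""
    return Counter(node for edge in edges for node in edge)


observed = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]
sampled = degree_preserving_sample(observed, 200, seed=7)
print(degrees(sampled) == degrees(observed))  # True
```

Further constraints (for example, on particular cycles) could be imposed by rejecting any swap that violates them, at the cost of slower mixing.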
Example: Bearman, Peter S., James Moody, and Katherine Stovel (2004) “Chains of Affection:
The Structure of Adolescent Romantic and Sexual Networks.” American Journal of Sociology
110:44-91.
Romantic Relations in Jefferson High
Simulate random networks with similar degree distribution:
Simulated networks preserve observed degree, isolated dyad
distribution, and four-cycle constraint
Simulated networks preserve observed degree, isolated dyad
distribution, and four-cycle constraint: 4 examples from the
simulated set
Social Network Software
UCINET
•The Standard network analysis program, runs in Windows
•Good for computing measures of network topography for single nets
•Input-Output of data is a special 2-file format, but is now able to read
PAJEK files directly.
•Not optimal for large networks
•Available from:
Analytic Technologies
PAJEK
•Program for analyzing and plotting very large networks
•Intuitive windows interface
•Used for most of the real data plots in this presentation
•Started mainly as a graphics program, but has expanded to a wide range of
analytic capabilities
•Can link to the R statistical package
•Free
•Available from:
Cyram Netminer for Windows
•Newest Product, not yet widely used
•Price range depends on application
•Limited to smaller networks O(100)
http://www.netminer.com/NetMiner/home_01.jsp
NetDraw
•Also very new, but by one of the best known names
in network analysis software.
•Free
•Limited to smaller networks O(100)
NEGOPY
•Program designed to identify cohesive sub-groups in a network,
based on the relative density of ties.
•DOS based program, need to have data in arc-list format
•Moving the results back into an analysis program is difficult.
•Available from:
William D. Richards
http://www.sfu.ca/~richards/Pages/negopy.htm
SPAN - SAS Programs for Analyzing Networks (Moody, ongoing)
•A collection of IML and Macro programs that allow one to:
a) create network data structures from nomination data
b) import/export data to/from the other network programs
c) calculate measures of network pattern and composition
d) analyze network models
•Allows one to work with multiple, large networks
•Easy to move from creating measures to analyzing data
•Available by sending an email to:
[email protected]