Complete Network Analysis
Exploratory Analysis
Social Networks capture the relations between people. These relations form a system
that can be thought of as a social space.
The advantage of the space analogy is that it captures the “topography” of social
networks: classes, clusters, distance, “centrality”, etc.
The disadvantage is that “spaces” and “fields” are notoriously difficult to study,
because key features are simultaneously active. Current calls for “relational”
sociology make this point clearly (See Martin 2003, Abbott 2001).
“Field serves as some sort of representation for those overarching social regularities
that may also be visualized … as quasi-organisms, systems or structures” J. L.
Martin AJS 2003.
Examples of fields range from abstract notions of status spaces to concrete examples
such as the French academic system.
Complete Network Analysis
Exploratory Analysis
Sociologists often use spatial
analogies, such as MDS or
correspondence analysis,
based on patterns of actor
attributes.
Social Network Analysis lets
you explore the relational
space directly, by mapping
relations directly.
The first step in this
exploration is often
visualizing the network.
Bourdieu “Social Space and Symbolic Space”
Complete Network Analysis
Exploratory Analysis: Network visualization
Network visualization helps build intuition, but you have to keep the drawing
algorithm in mind:
Tree-based layouts
Most effective for very sparse, regular graphs. Very useful when relations are strongly directed, such as organization charts or internet connections.
Spring-embedder layouts
Most effective with graphs that have a strong community structure (clustering, etc.). Provide a very clear correspondence between social distance and plotted distance.
[Figure: two images of the same network, one per layout.]
Complete Network Analysis
Exploratory Analysis: Network visualization
Network visualization helps build intuition, but you have to keep the drawing
algorithm in mind.
Hierarchy & Tree models
Use optimization routines to add meaning to the “Y-axis” of the plot. This
makes it easy to see who is most central, because the most central actors sit
at the top of the figure. Usually includes some routine for minimizing line crossings.
Spring Embedder layouts
Work on an analogy to a physical system: ties connecting a pair have
‘springs’ that pull them together. Unconnected nodes have springs that push
them apart. The resulting image reflects the balance of these two features.
This usually creates a correspondence between physical closeness and
network distance.
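The spring analogy can be sketched in a few lines. This is a minimal illustration in the spirit of force-directed layouts, not the routine any particular package uses: the function name, the cooling schedule, and the force formulas are assumptions, and repulsion is applied to every pair (a common simplification) rather than only to unconnected nodes.

```python
import math
import random

def spring_layout(nodes, edges, iterations=300, seed=0):
    """Return {node: (x, y)} from a simple force-directed layout (illustrative)."""
    rng = random.Random(seed)
    pos = {v: [rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)] for v in nodes}
    k = 1.0 / math.sqrt(len(nodes))            # preferred spacing between nodes
    tied = {frozenset(e) for e in edges}
    for step in range(iterations):
        max_move = 0.1 * (1.0 - step / iterations)   # cooling: shrink the step cap
        push = {v: [0.0, 0.0] for v in nodes}
        for i, u in enumerate(nodes):
            for v in nodes[i + 1:]:
                dx = pos[u][0] - pos[v][0]
                dy = pos[u][1] - pos[v][1]
                dist = max(math.hypot(dx, dy), 1e-9)
                force = k * k / dist                 # every pair pushes apart
                if frozenset((u, v)) in tied:
                    force -= dist * dist / k         # tied pairs also pull together
                push[u][0] += force * dx / dist
                push[u][1] += force * dy / dist
                push[v][0] -= force * dx / dist
                push[v][1] -= force * dy / dist
        for v in nodes:
            length = math.hypot(push[v][0], push[v][1])
            if length > 0:
                scale = min(length, max_move) / length
                pos[v][0] += push[v][0] * scale
                pos[v][1] += push[v][1] * scale
    return {v: tuple(p) for v, p in pos.items()}

# Hypothetical three-actor chain a - b - c.
layout = spring_layout(["a", "b", "c"], [("a", "b"), ("b", "c")])
```

In the chain, a and c are two steps apart and should end up farther from each other on the page than a and b, which is the correspondence between plotted distance and network distance described above.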
Complete Network Analysis
Exploratory Analysis: Network visualization
[Figure: network plot with nodes coded Male / Female.]
Complete Network Analysis
Exploratory Analysis: Network visualization
Using colors to code
attributes makes it simpler to
compare attributes to
relations.
Here we can assess the
effectiveness of two different
clustering routines on a
school friendship network.
Complete Network Analysis
Exploratory Analysis: Network visualization
As networks increase in size, the
effectiveness of a point-and-line
display diminishes, because you
simply run out of plotting
dimensions.
I’ve found that you can still get
some insight by using the
‘overlap’ that results from a
space-based layout as
information.
Here you see the clustering
evident in movie co-starring for
about 8000 actors.
Complete Network Analysis
Exploratory Analysis: Network visualization
This figure contains over 29,000
social science authors. The two
dense regions reflect different
topics.
Complete Network Analysis
Exploratory Analysis: Network visualization
Adding time to social networks is
also complicated, as you run out
of space to put time in most
network figures.
One solution is to animate the
network.
Here we see streaming interaction
in a classroom, where the teacher
(yellow square) has trouble
maintaining order.
The SONIA software program
(McFarland and Bender-deMoll)
will produce these figures.
Complete Network Analysis
Exploratory Analysis: Network visualization
Visualization is a tool, but networks are complex and our visualization
tools can sometimes confound. The strong advantage is that you get a
complete overview of multiple features at once. The difficulty comes
with trying to map a complex multi-dimensional object in low-dimensional space.
Complete Network Analysis
Network Connections
“Goods” flow through networks:
Complete Network Analysis
Network Connections
We often care about networks because of how “goods” travel through the
network.
In addition to the simple pairwise probability that one actor passes
information on to another (pij), two factors affect flow through a network:
Topology
- the shape, or form, of the network
- Example: one actor cannot pass information to another unless they
are either directly or indirectly connected
Time
- the timing of contact matters
- Example: an actor cannot pass information he or she has not yet received
Complete Network Analysis
Network Connections: Topology
Two features of the network’s topology are known to be important: connectivity
and centrality
Connectivity refers to how actors in one part of the network are connected to
actors in another part of the network.
• Reachability: Is it possible for actor i to reach actor j? This can only be
true if there is a chain of contact from one actor to another.
• Distance: Given they can be reached, how many steps are they from
each other?
• Number of paths: How many different paths connect each pair?
Complete Network Analysis
Network Connections: Topology
Without full network data, you can’t distinguish actors with limited flow
potential from those more deeply embedded in a setting.
[Figure: example network highlighting actors a, b, and c.]
Complete Network Analysis
Network Connections: Connectivity
Indirect connections are what make networks systems. One actor can
reach another if there is a path in the graph connecting them.
[Figure: two example graphs on nodes a through f.]
Paths can be directed, leading to a distinction between strong and weak
components
Complete Network Analysis
Network Connections: Connectivity
Reachability
If you can trace a sequence of relations from one actor to another,
then the two are reachable. If there is at least one path connecting
every pair of actors in the graph, the graph is connected and is called
a component.
Intuitively, a component is the set of people who are all connected by
a chain of relations.
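A minimal sketch of how components can be found in an undirected network: start from any unvisited actor and follow chains of relations until no new actors turn up. The function name and the example ties are hypothetical.

```python
from collections import deque

def components(nodes, edges):
    """Return the components of an undirected graph as a list of sets."""
    neighbors = {v: set() for v in nodes}
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    seen, result = set(), []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        members, queue = {start}, deque([start])
        while queue:                      # walk every chain of relations from start
            u = queue.popleft()
            for w in neighbors[u]:
                if w not in seen:
                    seen.add(w)
                    members.add(w)
                    queue.append(w)
        result.append(members)
    return result

# Hypothetical network: a-b-c form a chain, d-e a pair, f is an isolate.
example = components(list("abcdef"), [("a", "b"), ("b", "c"), ("d", "e")])
```

Here `example` holds three components; the isolate f counts as a component of size 1.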
Complete Network Analysis
Network Connections: Connectivity
This example
contains many
components.
Complete Network Analysis
Network Connections: Connectivity
Because relations can be directed or undirected, components come in two flavors.
For a graph with any directed edges, there are two types of components:
Strong components consist of the set(s) of all nodes that are mutually
reachable.
Weak components consist of the set(s) of all nodes that are connected when the
direction of the ties is ignored.
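The distinction can be sketched as follows, assuming the usual graph-theoretic definitions: a strong component requires that each member can reach every other member along directed paths, while a weak component only requires a connection once direction is ignored. The function names and the example arcs are hypothetical, and the approach is a simple reachability check rather than an efficient algorithm.

```python
def reach_set(start, out):
    """All nodes reachable from start by following directed ties."""
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for w in out[u]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen

def strong_components(nodes, arcs):
    out = {v: set() for v in nodes}
    for u, v in arcs:
        out[u].add(v)
    reach = {v: reach_set(v, out) for v in nodes}
    assigned, result = set(), []
    for v in nodes:
        if v in assigned:
            continue
        members = {w for w in reach[v] if v in reach[w]}   # mutually reachable
        result.append(members)
        assigned |= members
    return result

def weak_components(nodes, arcs):
    # Ignoring direction is the same as adding every arc in reverse.
    both_ways = list(arcs) + [(v, u) for u, v in arcs]
    return strong_components(nodes, both_ways)

# Hypothetical directed ties: a <-> b, b -> c, d -> c.
arcs = [("a", "b"), ("b", "a"), ("b", "c"), ("d", "c")]
strong = strong_components(list("abcd"), arcs)
weak = weak_components(list("abcd"), arcs)
```

Only a and b are mutually reachable, so the strong components are {a, b}, {c}, and {d}, while all four actors fall in a single weak component.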
Complete Network Analysis
Network Connections: Connectivity
There are only 2 strong
components with more than
1 person in this network.
Components are the
minimum requirement for
social groups. As we will see
later, they are necessary but
not sufficient
All of the major network analysis
software identifies strong and weak
components
Complete Network Analysis
Network Connections: Connectivity
We can extend our conception of component to increase the structural
cohesion of the definition.
Multiple connectivity:
Two paths with the same start and end point, but that have no other
nodes in common are called node independent.
In every component, the paths linking actors i and j must pass through a
set of nodes, S, that if removed would disconnect the graph.
The number of nodes in the smallest S is equal to the number of
independent paths connecting i and j.
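For small graphs, the smallest separating set S can be found by brute force: try removing ever larger sets of nodes until i can no longer reach j. The sketch below assumes i and j are not directly tied; the function names and both example graphs are hypothetical, and the search is exponential, so it only illustrates the idea.

```python
from itertools import combinations

def still_connected(i, j, neighbors, removed):
    """Can i reach j without passing through any removed node?"""
    seen, stack = {i}, [i]
    while stack:
        u = stack.pop()
        for w in neighbors[u]:
            if w not in seen and w not in removed:
                seen.add(w)
                stack.append(w)
    return j in seen

def smallest_separating_set(i, j, nodes, edges):
    neighbors = {v: set() for v in nodes}
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    if j in neighbors[i]:
        return None                       # adjacent pairs cannot be separated
    others = [v for v in nodes if v not in (i, j)]
    for size in range(len(others) + 1):   # smallest sets first
        for candidate in combinations(others, size):
            if not still_connected(i, j, neighbors, set(candidate)):
                return set(candidate)
    return None

nodes = [1, 2, 3, 4, 5, 6, 7, 8]
# Hypothetical graph in which every path from 1 to 8 runs through node 4.
one_route = [(1, 2), (1, 3), (2, 4), (3, 4), (4, 5), (4, 6), (5, 8), (6, 7), (7, 8)]
# Adding the tie 3-6 opens a second route that avoids node 4.
two_routes = one_route + [(3, 6)]
cut_one = smallest_separating_set(1, 8, nodes, one_route)
cut_two = smallest_separating_set(1, 8, nodes, two_routes)
```

The size of the returned set equals the number of node-independent paths between the pair: 1 in the first graph and 2 in the second.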
Complete Network Analysis
Network Connections: Connectivity
Simple component
[Figure: graph on nodes 1 through 8.]
Every path from 1 to 8 must go through node 4, so S(1,8) = {4} and
N(1,8) = 1. That is, the graph is a simple (1-connected) component.
Complete Network Analysis
Network Connections: Connectivity
Multiple connectivity
In this graph, there are multiple paths connecting nodes 1 and 8,
but only 2 of them are node-independent: N(1,8) = 2.
[Figure: graph on nodes 1 through 8, with the paths from 1 to 8 drawn out.]
Complete Network Analysis
Network Connections: Connectivity
A bicomponent is the set of all nodes connected by at least 2 node-independent paths.
Complete Network Analysis
Network Connections: Connectivity
Bicomponents can overlap by at most 1 person. These nodes are cutpoints in
the graph. If that node is removed, the graph would be disconnected.
[Figure: graph on nodes 1 through 8; nodes 1 and 4 are cutpoints.]
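A cutpoint can be identified directly from its definition: remove the node and check whether the remaining actors are still connected. The sketch below assumes the graph is connected to begin with; the function names and the bow-tie example are hypothetical.

```python
def reached(start, neighbors, removed):
    """Nodes reachable from start when the removed node is deleted."""
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for w in neighbors[u]:
            if w not in seen and w != removed:
                seen.add(w)
                stack.append(w)
    return seen

def cutpoints(nodes, edges):
    neighbors = {v: set() for v in nodes}
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    found = []
    for v in nodes:
        rest = [u for u in nodes if u != v]
        if rest and len(reached(rest[0], neighbors, v)) < len(rest):
            found.append(v)               # removing v disconnects the rest
    return found

# Hypothetical "bow tie": two triangles (bicomponents) that share node 3.
bow_tie = [(1, 2), (2, 3), (1, 3), (3, 4), (4, 5), (3, 5)]
cuts = cutpoints([1, 2, 3, 4, 5], bow_tie)
```

Node 3 is the only member of both bicomponents, and it is the only cutpoint.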
Complete Network Analysis
Network Connections: Distance
Geodesic distance is measured by the smallest (weighted) number of relations separating a pair.
Actor “a” is:
1 step from 4 others
2 steps from 5 others
3 steps from 4 others
4 steps from 3 others
5 steps from 1 other
[Figure: example graph with actor a highlighted.]
Complete Network Analysis
Network Connections: Distance
Probability of transfer by distance and number of paths, assuming a constant pij of 0.6
[Figure: line chart; x-axis: path distance (2 to 6); y-axis: probability; series: 1, 2, 5, and 10 paths.]
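A rough way to see how distance and redundancy trade off: if each step succeeds with probability pij, a single path of length d transmits with probability pij^d, and with several independent paths the good arrives unless every path fails. The sketch below assumes the paths are fully independent and equally long, which is an idealization and not necessarily the calculation behind the slide's chart; the function name is hypothetical.

```python
def transfer_probability(p, distance, paths):
    """Chance that a good arrives over `paths` independent routes of equal length."""
    single_path = p ** distance              # every step on one path must succeed
    return 1.0 - (1.0 - single_path) ** paths

one_path = transfer_probability(0.6, 2, 1)   # 0.6 * 0.6
two_paths = transfer_probability(0.6, 2, 2)  # 1 - (1 - 0.36) ** 2
far_single = transfer_probability(0.6, 6, 1)
far_many = transfer_probability(0.6, 6, 10)
```

With pij = 0.6, one two-step path gives 0.36 and two such paths give about 0.59. Probability falls quickly with distance, and extra paths only partly make up for it.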
Complete Network Analysis
Network Connections: Distance
Reachability in Colorado Springs
(Sexual contact only)
•High-risk actors over 4 years
•695 people represented
•Longest path is 17 steps
•Average distance is about 5 steps
•Average person is within 3 steps of
75 other people
•137 people connected through 2
independent paths, core of 30 people
connected through 4 independent
paths
(Node size = log of degree)
Complete Network Analysis
Network Connections: Distance
Calculating distance in global networks: Powers of the adjacency matrix
Calculate reachability through matrix multiplication.
(see p.162 of W&F)
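[Figure: example graph on nodes a through f.]
Rows and columns are ordered a, b, c, d, e, f. In the distance matrices, "." marks
the diagonal and 0 marks a pair that has not yet been reached.

X (paths of length 1)
   a b c d e f
a  0 1 0 0 0 1
b  1 0 1 0 0 0
c  0 1 0 1 1 1
d  0 0 1 0 1 0
e  0 0 1 1 0 0
f  1 0 1 0 0 0

X2 (paths of length 2)
   a b c d e f
a  2 0 2 0 0 0
b  0 2 0 1 1 2
c  2 0 4 1 1 0
d  0 1 1 2 1 1
e  0 1 1 1 2 1
f  0 2 0 1 1 2

Distance (after X2)
   a b c d e f
a  . 1 2 0 0 1
b  1 . 1 2 2 2
c  2 1 . 1 1 1
d  0 2 1 . 1 2
e  0 2 1 1 . 2
f  1 2 1 2 2 .

X3 (paths of length 3)
   a b c d e f
a  0 4 0 2 2 4
b  4 0 6 1 1 0
c  0 6 2 5 5 6
d  2 1 5 2 3 1
e  2 1 5 3 2 1
f  4 0 6 1 1 0

Distance (after X3)
   a b c d e f
a  . 1 2 3 3 1
b  1 . 1 2 2 2
c  2 1 . 1 1 1
d  3 2 1 . 1 2
e  3 2 1 1 . 2
f  1 2 1 2 2 .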
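The matrix-power procedure can be sketched as follows: entry (i, j) of the k-th power of X counts the walks of length k from i to j, so the distance between i and j is the first power at which that entry becomes nonzero. The adjacency matrix below is my reading of the six-node example (rows and columns ordered a to f); the function names are hypothetical.

```python
def mat_mult(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def distances_from_powers(X):
    """Distance = first power of X with a nonzero (i, j) entry; None if never reached."""
    n = len(X)
    dist = [[0 if i == j else None for j in range(n)] for i in range(n)]
    power = [row[:] for row in X]                 # X to the first power
    for steps in range(1, n):                     # geodesics have at most n - 1 steps
        for i in range(n):
            for j in range(n):
                if dist[i][j] is None and power[i][j] > 0:
                    dist[i][j] = steps
        power = mat_mult(power, X)
    return dist

X = [
    [0, 1, 0, 0, 0, 1],   # a
    [1, 0, 1, 0, 0, 0],   # b
    [0, 1, 0, 1, 1, 1],   # c
    [0, 0, 1, 0, 1, 0],   # d
    [0, 0, 1, 1, 0, 0],   # e
    [1, 0, 1, 0, 0, 0],   # f
]
dist = distances_from_powers(X)
```

Each multiplication costs on the order of n cubed operations, which is why this approach becomes impractical on large networks.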
Complete Network Analysis
Network Connections: Distance
Calculating distance in global networks: Breadth-First Search
In large networks, matrix multiplication is just too slow. A breadth-first
search algorithm works by walking through the graph, reaching all nodes
from a particular start node.
Distance is calculated directly in most SNA software packages.
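A minimal sketch of breadth-first search for distance: visit the start node's neighbors first, then their unvisited neighbors, and so on, recording the step at which each node is first reached. The function name is hypothetical, and the adjacency lists are my reading of the six-node example used for the matrix-power calculation.

```python
from collections import deque

def bfs_distances(start, neighbors):
    """Geodesic distance from start to every node it can reach."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for w in neighbors[u]:
            if w not in dist:             # first visit is along a shortest path
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

neighbors = {
    "a": ["b", "f"],
    "b": ["a", "c"],
    "c": ["b", "d", "e", "f"],
    "d": ["c", "e"],
    "e": ["c", "d"],
    "f": ["a", "c"],
}
from_a = bfs_distances("a", neighbors)
```

Each search touches every tie only a constant number of times, so running it from every node is far cheaper than repeated matrix multiplication on large, sparse networks.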
Complete Network Analysis
Network Connections: Distance
As a graph statistic, the distribution of distance can tell you a good deal about how close
people are to each other (we’ll see this more fully when we get to closeness centrality).
The diameter of a graph is the longest geodesic, giving the maximum distance. We often
use l, the mean distance between every pair, to characterize the entire graph.
For example, all else equal, we would expect rumors to travel faster through settings
where the average distance is small.
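Both summaries follow directly from the pairwise distances. The sketch below assumes a connected, undirected graph; the function name and the four-actor chain are hypothetical.

```python
from collections import deque

def distance_summary(nodes, edges):
    """Return (diameter, mean geodesic distance) for a connected undirected graph."""
    neighbors = {v: set() for v in nodes}
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    all_distances = []
    for start in nodes:
        dist = {start: 0}
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for w in neighbors[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        all_distances.extend(d for v, d in dist.items() if v != start)
    return max(all_distances), sum(all_distances) / len(all_distances)

# Hypothetical chain of four actors: 1 - 2 - 3 - 4.
diameter, mean_distance = distance_summary([1, 2, 3, 4], [(1, 2), (2, 3), (3, 4)])
```

For the chain, the diameter is 3 (from one end to the other) and the mean distance is 10/6, or about 1.67.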
Complete Network Analysis
Network Connections: Distance
Travers and Milgram’s work on the small world is responsible for the standard
belief that “everyone is connected by a chain of about 6 steps.”
Two questions:
Given what we know about networks, what is the longest path (defined by
handshakes) that separates any two people?
Is 6 steps a long distance or a short distance?
Complete Network Analysis
Network Connections: Distance
When the graph is directed, distance
is also directed (distance to vs
distance from), following the
direction of the tie.
     a  b  c  d  e  f  g  h  i  j  k  l  m
    --------------------------------------
a.   .  1  2  .  .  .  .  .  .  .  .  2  1
b.   3  .  1  .  .  .  .  .  .  .  .  1  2
c.   .  .  .  .  .  .  .  .  .  .  .  .  .
d.   4  3  1  .  1  2  1  .  2  .  .  2  3
e.   3  2  2  1  .  1  2  .  1  .  .  1  2
f.   4  3  3  2  1  .  3  .  2  .  .  2  3
g.   5  4  4  3  2  1  .  .  3  .  .  3  4
h.   .  .  .  .  .  .  .  .  1  .  .  .  .
i.   .  .  .  .  .  .  .  .  .  .  .  .  .
j.   .  .  .  .  .  .  .  .  1  .  .  .  .
k.   .  .  .  .  .  .  .  .  1  .  .  .  .
l.   2  1  2  .  .  .  .  .  .  .  .  .  1
m.   1  2  3  .  .  .  .  .  .  .  .  1  .
[Figure: directed graph on nodes a through m.]
Complete Network Analysis
Network Connections: Distance
What if everyone maximized
structural holes?
Associates do not know each other:
Results in an exponential growth curve.
Reach entire planet quickly.
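The arithmetic behind the exponential curve can be sketched under a strong assumption: every person has exactly k associates and none of those associates overlap. Each newly reached person then contributes k - 1 new contacts at the next step. The function names, k = 100, and the population of 8 billion are illustrative assumptions.

```python
def reached_within(k, steps):
    """People reached within `steps` steps if each person has k non-overlapping ties."""
    total, newly_reached = 0, k
    for _ in range(steps):
        total += newly_reached
        newly_reached *= k - 1        # each new person adds k - 1 further contacts
    return total

def steps_needed(k, population):
    steps = 0
    while reached_within(k, steps) < population:
        steps += 1
    return steps

small_case = reached_within(3, 3)                 # 3 + 6 + 12
planet = steps_needed(100, 8_000_000_000)
```

With 100 non-overlapping associates each, five steps are enough to exceed 8 billion people. Real networks are clustered, so actual reach grows more slowly.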
Complete Network Analysis
Network Connections: Distance
What if people know each
other at random? Random
graph theory shows that we
could reach people quite quickly
if ties were random.
Complete Network Analysis
Network Connections: Distance
Random Reachability: by number of close friends
[Figure: line chart; x-axis: steps removed (0 to 15); y-axis: percent contacted (0 to 100%); series: degree = 2, 3, and 4.]
Complete Network Analysis
Network Connections: Distance
Distance-Reach Distribution for a large Jr. High School (Add Health data)
[Figure: "Pine Brook Jr. High"; line chart; x-axis: steps removed (0 to 14); y-axis: proportion reached (0 to 1); series: random graph, observed.]
Complete Network Analysis
Network Connections: Distance
Milgram’s test: Send a packet from sets of randomly selected
people to a stockbroker in Boston.
Experimental Setup:
Arbitrarily select people from 3 pools:
a) People in Boston
b) Random in Nebraska
c) Stockholders in Nebraska
Complete Network Analysis
Network Connections: Distance
Milgram’s Findings:
Distance to target person, by sending group.
Complete Network Analysis
Network Connections: Distance
Most chains found their way
through a small number of
intermediaries.
Understanding why this is true has
been called the “Small-World
Problem,” which has since been
generalized to a much more formal
understanding of tie patterns in large
networks (see below)
For purposes of flow through graphs,
distance is a primary concern so long
as pij < 1. Most measures of position
in a network account for some aspect
of distance.
Complete Network Analysis
Network Connections: Centrality
Distance measures “locate” a node by number of steps that separate them from
the remainder of the network, but there are many other ways of locating nodes
in networks.
Centrality refers to (one dimension of) location, identifying where an actor
resides in a network.
• For example, we can compare actors at the edge of the network to actors
at the center.
• In general, this is a way to formalize intuitive notions about the
distinction between insiders and outsiders.
As a terminology point, some authors distinguish centrality from prestige based on the
directionality of the tie. Since the formulas are the same in every other respect, I stick with
“centrality” for simplicity.
Complete Network Analysis
Network Connections: Centrality
Conceptually, centrality is fairly straightforward: we want to identify
which nodes are in the ‘center’ of the network. In practice, identifying
exactly what we mean by ‘center’ is somewhat complicated, but
substantively we often have reason to believe that people at the center are
very important.
The standard centrality measures capture a wide range of “importance” in
a network:
•Degree
•Closeness
•Betweenness
•Eigenvector / Power measures
After discussing these, I will describe measures that combine features of
each of them.
Complete Network Analysis
Network Connections: Centrality
The most intuitive notion of centrality focuses on degree. Degree is the
number of direct contacts a person has. The idea is that the actor with the
most ties is the most important:
$$C_D = d(n_i) = X_{i+} = \sum_j X_{ij}$$
Complete Network Analysis
Network Connections: Centrality
In a simple random graph (Gn,p), degree will have a Poisson distribution, and the nodes
with high degree are likely to be at the intuitive center. Deviations from a Poisson
distribution suggest non-random processes, which is at the heart of current “scale-free”
work on networks (see below).
Complete Network Analysis
Network Connections: Centrality
Degree centrality,
however, can be
deceiving, because it is a
purely local measure.
Complete Network Analysis
Network Connections: Centrality
If we want to measure the degree to which the graph as a whole is centralized,
we look at the dispersion of centrality:
Simple: variance of the individual centrality scores.
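$$S_D^2 = \left[ \sum_{i=1}^{g} \left( C_D(n_i) - \bar{C}_D \right)^2 \right] / g$$
Or, using Freeman’s general formula for centralization (which ranges from 0 to 1):
$$C_D = \frac{\sum_{i=1}^{g} \left[ C_D(n^*) - C_D(n_i) \right]}{(g-1)(g-2)}$$
where C_D(n*) is the largest degree score observed in the graph.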
UCINET, SPAN, PAJEK and most other network software will calculate these measures.
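Both centralization scores are easy to compute from the degree scores. The sketch below uses two hypothetical eight-node graphs, a star and a ring, as test cases; the function names are assumptions rather than the interface of any of the packages named above.

```python
def degree_scores(nodes, edges):
    degree = {v: 0 for v in nodes}
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return [degree[v] for v in nodes]

def degree_variance(scores):
    g = len(scores)
    mean = sum(scores) / g
    return sum((c - mean) ** 2 for c in scores) / g

def freeman_degree_centralization(scores):
    g = len(scores)
    top = max(scores)                     # C_D(n*): the largest observed degree
    return sum(top - c for c in scores) / ((g - 1) * (g - 2))

nodes = list(range(8))
star = [(0, leaf) for leaf in range(1, 8)]        # one hub tied to seven others
ring = [(i, (i + 1) % 8) for i in range(8)]       # everyone has exactly two ties

star_scores = degree_scores(nodes, star)
ring_scores = degree_scores(nodes, ring)
```

The star is maximally centralized (Freeman score 1.0, variance about 3.9), while the ring has no variation in degree, so both scores are 0.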
Complete Network Analysis
Degree Centralization Scores
Network Connections: Centrality
Freeman: 1.0
Variance: 3.9
Freeman: .02
Variance: .17
Freeman: .07
Variance: .20
Freeman: 0.0
Variance: 0.0
Complete Network Analysis
Network Connections: Centrality
A second measure of centrality is closeness centrality. An actor is considered
important if he/she is relatively close to all other actors.
Closeness is based on the inverse of the distance of each actor to every other actor
in the network.
Closeness Centrality:
$$C_C(n_i) = \left[ \sum_{j=1}^{g} d(n_i, n_j) \right]^{-1}$$
Normalized Closeness Centrality:
$$C'_C(n_i) = C_C(n_i)\,(g-1)$$
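A minimal sketch of closeness and normalized closeness, assuming a connected, undirected graph (closeness is undefined when some actors cannot be reached). The function name and the eight-node star are hypothetical.

```python
from collections import deque

def closeness(nodes, edges):
    """Return {node: (closeness, normalized closeness)} for a connected graph."""
    neighbors = {v: set() for v in nodes}
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    g = len(nodes)
    scores = {}
    for start in nodes:
        dist = {start: 0}
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for w in neighbors[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        raw = 1.0 / sum(dist.values())    # inverse of total distance to all others
        scores[start] = (raw, raw * (g - 1))
    return scores

# Hypothetical star: node 0 is tied to nodes 1 through 7.
star_scores = closeness(list(range(8)), [(0, leaf) for leaf in range(1, 8)])
hub_raw, hub_norm = star_scores[0]
leaf_raw, leaf_norm = star_scores[1]
```

The hub is one step from everyone (closeness 1/7, normalized 1.0); each leaf is one step from the hub and two from everyone else (closeness 1/13, normalized about .538).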
Complete Network Analysis
Closeness Centrality in the examples
Network Connections: Centrality
First example (8 nodes; the distances are those of a star)
Distance           Closeness  Normalized
0 1 1 1 1 1 1 1    .143       1.00
1 0 2 2 2 2 2 2    .077       .538
1 2 0 2 2 2 2 2    .077       .538
1 2 2 0 2 2 2 2    .077       .538
1 2 2 2 0 2 2 2    .077       .538
1 2 2 2 2 0 2 2    .077       .538
1 2 2 2 2 2 0 2    .077       .538
1 2 2 2 2 2 2 0    .077       .538

Second example (9 nodes; the distances are those of a ring)
Distance             Closeness  Normalized
0 1 2 3 4 4 3 2 1    .050       .400
1 0 1 2 3 4 4 3 2    .050       .400
2 1 0 1 2 3 4 4 3    .050       .400
3 2 1 0 1 2 3 4 4    .050       .400
4 3 2 1 0 1 2 3 4    .050       .400
4 4 3 2 1 0 1 2 3    .050       .400
3 4 4 3 2 1 0 1 2    .050       .400
2 3 4 4 3 2 1 0 1    .050       .400
1 2 3 4 4 3 2 1 0    .050       .400
Complete Network Analysis
Network Connections: Centrality
Closeness Centrality in the examples
Distance         Closeness  Normalized
0 1 2 3 4 5 6    .048       .286
1 0 1 2 3 4 5    .063       .375
2 1 0 1 2 3 4    .077       .462
3 2 1 0 1 2 3    .083       .500
4 3 2 1 0 1 2    .077       .462
5 4 3 2 1 0 1    .063       .375
6 5 4 3 2 1 0    .048       .286
Complete Network Analysis
Closeness Centrality in the examples
Network Connections: Centrality
Distance                     Closeness  Normalized
0 1 1 2 3 4 4 5 5 6 5 5 6    .021       .255
1 0 1 1 2 3 3 4 4 5 4 4 5    .027       .324
1 1 0 1 2 3 3 4 4 5 4 4 5    .027       .324
2 1 1 0 1 2 2 3 3 4 3 3 4    .034       .414
3 2 2 1 0 1 1 2 2 3 2 2 3    .042       .500
4 3 3 2 1 0 2 3 3 4 1 1 2    .034       .414
4 3 3 2 1 2 0 1 1 2 3 3 4    .034       .414
5 4 4 3 2 3 1 0 1 1 4 4 5    .027       .324
5 4 4 3 2 3 1 1 0 1 4 4 5    .027       .324
6 5 5 4 3 4 2 1 1 0 5 5 6    .021       .255
5 4 4 3 2 1 3 4 4 5 0 1 1    .027       .324
5 4 4 3 2 1 3 4 4 5 1 0 1    .027       .324
6 5 5 4 3 2 4 5 5 6 1 1 0    .021       .255
Complete Network Analysis
Network Connections: Centrality
Betweenness Centrality:
Model based on communication flow: A person who lies on communication
paths can control communication flow, and is thus important. Betweenness centrality
counts the number of shortest paths between i and k that actor j resides on.
[Figure: example graph on nodes a through h.]
Complete Network Analysis
Network Connections: Centrality
Betweenness Centrality:
$$C_B(n_i) = \sum_{j<k} g_{jk}(n_i) / g_{jk}$$
Where g_jk = the number of geodesics connecting j and k, and
g_jk(n_i) = the number of those geodesics that actor i is on.
Usually normalized by:
$$C'_B(n_i) = C_B(n_i) / [(g-1)(g-2)/2]$$
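The formula can be computed directly for small graphs: count the geodesics between each pair j, k, then count how many of them pass through i. A node i lies on a j-k geodesic exactly when d(j,i) + d(i,k) = d(j,k). This is a plain illustration of the definition, not the faster algorithm that network packages typically use; the function names and example graphs are hypothetical.

```python
from collections import deque

def geodesic_counts(start, neighbors):
    """Distance to, and number of shortest paths to, every node reachable from start."""
    dist, count = {start: 0}, {start: 1}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for w in neighbors[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                count[w] = 0
                queue.append(w)
            if dist[w] == dist[u] + 1:    # u precedes w on a shortest path
                count[w] += count[u]
    return dist, count

def betweenness(nodes, edges):
    neighbors = {v: set() for v in nodes}
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    info = {v: geodesic_counts(v, neighbors) for v in nodes}
    score = {v: 0.0 for v in nodes}
    for index, j in enumerate(nodes):
        dist_j, count_j = info[j]
        for k in nodes[index + 1:]:       # each unordered pair j < k once
            if k not in dist_j:
                continue
            dist_k, count_k = info[k]
            for i in nodes:
                if i in (j, k) or i not in dist_j or i not in dist_k:
                    continue
                if dist_j[i] + dist_k[i] == dist_j[k]:
                    score[i] += count_j[i] * count_k[i] / count_j[k]
    return score

def normalize(raw, g):
    return raw / ((g - 1) * (g - 2) / 2)

star = betweenness([0, 1, 2, 3, 4], [(0, 1), (0, 2), (0, 3), (0, 4)])
square = betweenness([0, 1, 2, 3], [(0, 1), (1, 2), (2, 3), (3, 0)])
```

In the star, the hub sits on the single geodesic between all 6 pairs of leaves, so its raw score is 6 and its normalized score is 1.0. In the four-node ring, each node carries one of the two geodesics between its two neighbors, for a score of 0.5.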
Complete Network Analysis
Network Connections: Centrality
Betweenness Centrality:
Centralization: 1.0
Centralization: .59
Centralization: .31
Centralization: 0
Complete Network Analysis
Network Connections: Centrality
Betweenness Centrality:
Centralization: .183
Complete Network Analysis
Network Connections: Centrality
Information Centrality:
It is quite likely that information can flow through paths other than the geodesic. The
Information Centrality score uses all paths in the network, and weights them based on their length.
Complete Network Analysis
Network Connections: Centrality
Graph Theoretic Center
(Barycenter or Jordan Center).
Identify the points with the
smallest maximum distance
to all other points.
Value = longest
distance to any other
node.
The graph theoretic
center is ‘3’, but you
might also consider a
continuous measure, such as
the inverse of the
maximum geodesic.
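A sketch of the Jordan center: compute each node's longest geodesic to any other node (its eccentricity) and keep the nodes where that value is smallest. The graph is assumed to be connected; the function names and the five-node chain are hypothetical.

```python
from collections import deque

def eccentricities(nodes, edges):
    """Longest geodesic from each node to any other node (connected graph assumed)."""
    neighbors = {v: set() for v in nodes}
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    result = {}
    for start in nodes:
        dist = {start: 0}
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for w in neighbors[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        result[start] = max(dist.values())
    return result

def jordan_center(nodes, edges):
    ecc = eccentricities(nodes, edges)
    smallest = min(ecc.values())
    return [v for v in nodes if ecc[v] == smallest]

# Hypothetical chain 1 - 2 - 3 - 4 - 5.
chain = [(1, 2), (2, 3), (3, 4), (4, 5)]
ecc = eccentricities([1, 2, 3, 4, 5], chain)
center = jordan_center([1, 2, 3, 4, 5], chain)
```

The continuous version mentioned above would simply be 1 / ecc[v] for each node.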
Complete Network Analysis
Network Connections: Centrality
Comparing across these 3 centrality values
•Generally, the 3 centrality types will be positively correlated
•When they are not (or only weakly) correlated, it probably tells you something interesting about the network.
High Degree, Low Closeness: Embedded in a cluster that is far from the rest of the network.
High Degree, Low Betweenness: Ego's connections are redundant; communication bypasses him/her.
High Closeness, Low Degree: Key player tied to important/active alters.
High Closeness, Low Betweenness: Probably multiple paths in the network; ego is near many people, but so are many others.
High Betweenness, Low Degree: Ego's few ties are crucial for network flow.
High Betweenness, Low Closeness: Very rare cell. Would mean that ego monopolizes the ties from a small number of people to many others.
Complete Network Analysis
Network Connections: Centrality
Bonacich Power Centrality: An actor’s centrality (prestige) is a function of the
prestige of those they are connected to. Thus, actors who are tied to very central actors
should have higher prestige/centrality than those who are not.
$$C(\alpha, \beta) = \alpha (I - \beta R)^{-1} R \mathbf{1}$$
• α is a scaling constant, which is set to normalize the score.
• β reflects the extent to which you weight the centrality of people ego is tied to.
• R is the adjacency matrix (can be valued)
• I is the identity matrix (1s down the diagonal)
• 1 is a column vector of ones.
Complete Network Analysis
Network Connections: Centrality
Bonacich Power Centrality:
The magnitude of β reflects the radius of power. Small values of β weight
local structure, larger values weight global structure.
If β is positive, then ego has higher centrality when tied to people who are
central.
If β is negative, then ego has higher centrality when tied to people who are not
central.
As β approaches zero, you get degree centrality.
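The role of β can be explored with a small sketch. Instead of inverting a matrix, it repeats the update c = R1 + βRc, which settles on the same scores as the inverse formula when β is small relative to the largest eigenvalue of R. The scaling constant α is left at 1 (no normalization); the function name and the four-node star are assumptions.

```python
def bonacich_power(R, beta, iterations=200):
    """Unnormalized Bonacich power scores, found by repeating c = R1 + beta * R c."""
    n = len(R)
    row_sums = [sum(row) for row in R]            # R times a vector of ones (degree)
    c = row_sums[:]
    for _ in range(iterations):
        c = [row_sums[i] + beta * sum(R[i][j] * c[j] for j in range(n))
             for i in range(n)]
    return c

# Hypothetical star: node 0 is tied to nodes 1, 2, and 3.
R = [
    [0, 1, 1, 1],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
]
as_degree = bonacich_power(R, 0.0)
positive = bonacich_power(R, 0.2)
negative = bonacich_power(R, -0.2)
```

With β = 0 the scores are just degree. With β = 0.2 the leaves gain from being tied to the well-connected hub; with β = -0.2 they are penalized for it, so the gap between hub and leaves widens in relative terms.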
Complete Network Analysis
Network Connections: Centrality
Bonacich Power Centrality:
[Figure: chart of centrality scores (0 to 2) for nodes 1 through 7, with β = 0.23; series: positive and negative β.]
Complete Network Analysis
Network Connections: Centrality
Bonacich Power Centrality:
[Figure: two network plots, β = .35 and β = -.35.]
Complete Network Analysis
Network Connections: Centrality
Bonacich Power Centrality:
[Figure: two network plots, β = .23 and β = -.23.]
Complete Network Analysis
Network Connections: Centrality
In recent work, Borgatti (2003; 2005) discusses centrality in terms of two key
dimensions:
          Frequency                 Distance
Radial    Degree Centrality,        Closeness Centrality
          Bon. Power centrality
Medial    Betweenness               (empty: but would be an interruption
                                    measure based on distance)
Complete Network Analysis
Network Connections: Centrality
Substantively, the key question for centrality is knowing what is flowing
through the network. The key features are:
•Whether the actor retains the good to pass to others (information,
diseases) or whether they pass the good and then lose it (physical
objects)
•Whether the key factor for spread is distance (disease with low pij) or
multiple sources (information)
The off-the-shelf measures do not always match the social process of
interest, so researchers need to be mindful of this.
Complete Network Analysis
Network Connections: Centrality
Actors that appear very
different when seen
individually, are
comparable in the global
network.
Graph is 27% centralized
(Node size proportional to betweenness centrality )
Complete Network Analysis
Network Connections: Centrality
Centrality example: Add Health
Node size proportional to
betweenness centrality
Graph is 45% centralized
Network Topology: Centrality and Centralization
Measures research:
Rothenberg, et al. 1995. "Choosing a Centrality Measure: Epidemiologic
Correlates in the Colorado Springs Study of Social Networks." Social
Networks: Special Edition on Social Networks and Infectious Disease:
HIV/AIDS 17:273-97.
•Found that the HIV positive actors were not central to the overall
network
Bell, D. C., J. S. Atkinson, and J. W. Carlson. 1999. "Centrality Measures for
Disease Transmission Networks." Social Networks 21:1-21.
•Using a data-based simulation on 22 people, found that simple degree
measures were adequate, relative to complexity
Poulin, R., M.-C. Boily, and B. R. Masse. 2000. "Dynamical Systems to
Define Centrality in Social Networks." Social Networks 22:187-220
•Method that allows one to compare across non-connected portions of a
network, applied to a network of 40 people w. AIDS
Complete Network Analysis
Network Connections: Network Evolution
Two factors that affect network flows:
Topology
- the shape, or form, of the network
- simple example: one actor cannot pass information to
another unless they are either directly or indirectly
connected
Time
- the timing of contacts matters
- simple example: an actor cannot pass information he has
not yet received.
Complete Network Analysis
Network Connections: Network Evolution
Timing in networks
A focus on contact structure has often slighted the importance of network
dynamics, though a number of recent pieces are addressing this.
Time affects networks in two important ways:
1) The structure itself evolves, in ways that will affect the topology and
thus flow.
Wasserheit and Aral, 1996. “The dynamic topology of Sexually
Transmitted Disease Epidemics” The Journal of Infectious Diseases
74:S201-13
Rothenberg, et al. 1997 “Using Social Network and Ethnographic Tools
to Evaluate Syphilis Transmission” Sexually Transmitted Diseases
25: 154-160
2) The timing of contact constrains flow
Moody 2002, Social Forces
Morris and Kretchmar, 1995
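The second point can be made concrete with a small sketch: process contacts in time order and let a good pass along a contact only if one of the two parties already holds it. Contacts are treated as undirected and each transfer as certain, both simplifications; contacts that share a timestamp are processed in sorted order, so a good could pass through more than one of them at that instant. The function name and the contact list are hypothetical.

```python
def reached_over_time(source, contacts):
    """Actors who can receive a good that starts at `source`.

    `contacts` is a list of (time, actor, actor) tuples.
    """
    holders = {source}
    for time, u, v in sorted(contacts):       # earliest contacts first
        if u in holders or v in holders:
            holders.update((u, v))
    return holders

# Hypothetical timing: b and c meet at time 1, a and b meet at time 2.
contacts = [(2, "a", "b"), (1, "b", "c")]
from_a = reached_over_time("a", contacts)
from_c = reached_over_time("c", contacts)
```

Topologically, a, b, and c form a connected chain, yet a cannot reach c, because the b-c contact ended before a's good arrived at b. Flow in the other direction works: c reaches b at time 1, and b reaches a at time 2.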
Complete Network Analysis
Network Connections: Network Evolution
Sexual Relations among A syphilis outbreak
Rothenberg et al map the
pattern of sexual contact
among youth involved in
a Syphilis outbreak in
Atlanta over a one year
period.
(Syphilis cases in red)
Jan - June, 1995
Sexual Relations among A syphilis outbreak
July-Dec, 1995
Complete Network Analysis
Network Connections: Network Evolution
Drug Relations, Colorado Springs, Year 1
Data on drug users in
Colorado Springs, over
5 years
Complete Network Analysis
Network Connections: Network Evolution
Drug Relations, Colorado Springs, Year 2
Current year in red, past relations in gray
Complete Network Analysis
Network Connections: Network Evolution
Drug Relations, Colorado Springs, Year 3
Current year in red, past relations in gray
Complete Network Analysis
Network Connections: Network Evolution
Drug Relations, Colorado Springs, Year 4
Current year in red, past relations in gray
Complete Network Analysis
Network Connections: Network Evolution
Drug Relations, Colorado Springs, Year 5
Current year in red, past relations in gray
Complete Network Analysis
Network Connections: Network Evolution
How do we analyze change in networks over time?
a) Descriptive techniques (change in measures over time)
b) Visualization
c) Network statistical models (SIENA, see below under models)
Complete Network Analysis
Network Connections: Social Balance
One of the best theoretical approaches to understanding change in networks over time
is to ask how the current relational patterns are likely to affect future relations. That
is, make relational change endogenous.
There are many models that do this, but the most famous for affective relations is
social balance.
Other models include:
•Preferential attachment: “the rich get richer” (Barabasi)
•Avoiding asymmetry (Gould)
•Avoiding close past relations (cycles of 4) (Bearman, Moody & Stovel)
•Development of Hierarchy (Ivan Chase)
Complete Network Analysis
Network Connections: Social Balance
Social Balance & Transitivity
We determine balance based on the product of the edges:
(+)(+)(+) = (+)  Balanced: “A friend of a friend is a friend”
(-)(+)(-) = (+)  Balanced: “An enemy of my enemy is a friend”
(-)(-)(-) = (-)  Unbalanced: “An enemy of my enemy is an enemy”
(+)(-)(+) = (-)  Unbalanced: “A friend of a friend is an enemy”
[Figure: the four signed triads.]
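The product rule is simple enough to state in code. A minimal sketch, coding positive ties as +1 and negative ties as -1; the function name is hypothetical.

```python
def is_balanced(sign_1, sign_2, sign_3):
    """A triad is balanced when the product of its three tie signs is positive."""
    return sign_1 * sign_2 * sign_3 > 0

all_friends = is_balanced(+1, +1, +1)       # a friend of a friend is a friend
shared_enemy = is_balanced(-1, +1, -1)      # an enemy of my enemy is a friend
all_enemies = is_balanced(-1, -1, -1)       # an enemy of my enemy is an enemy
torn_friend = is_balanced(+1, -1, +1)       # a friend of a friend is an enemy
```

A triad is balanced whenever it has an even number of negative ties (zero or two) and unbalanced when it has an odd number (one or three).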
Complete Network Analysis
Network Connections: Social Balance
Heider argued that unbalanced triads would be unstable: They should
transform toward balance
[Figure: unbalanced triads transforming toward balance; the transitions are labeled “Become Friends” and “Become Enemies”.]
Complete Network Analysis
Network Connections: Social Balance
If such a balancing process were active throughout the graph, all
intransitive triads would be eliminated from the network. This would
result in one of two possible graphs (Balance Theorem):
Complete Clique
Balanced Opposition
Friends with
Enemies with
Complete Network Analysis
Network Connections: Social Balance
Empirically, we often find that graphs break up into more than two groups.
What does this imply for balance theory?
It turns out that if you allow all-negative triads, you can get a graph with
many clusters. That is, instead of treating (-)(-)(-) as a forbidden triad,
treat it as allowed. This implies that the micro rule is different: negative
ties among enemies are not as motivating as positive ties.
Complete Network Analysis
Network Connections: Social Balance
Empirically, we also rarely have symmetric relations (at least on
affect), thus we need to identify balance in directed relations.
Directed dyads can be in one of three states:
1) Mutual
2) Asymmetric
3) Null
Every triad is composed of 3 dyads, and we can identify triads based on the
number of each type, called the MAN label system
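The numeric part of a MAN label can be computed by classifying the triad's three dyads. The sketch below returns only the three counts; the direction letters that distinguish some triad types (D, U, C, T) are not derived here. The function name and the example ties are hypothetical.

```python
from itertools import combinations

def man_label(triad, arcs):
    """Count Mutual, Asymmetric, and Null dyads among three actors."""
    arcs = set(arcs)
    mutual = asymmetric = null = 0
    for u, v in combinations(triad, 2):       # the triad's three dyads
        forward, backward = (u, v) in arcs, (v, u) in arcs
        if forward and backward:
            mutual += 1
        elif forward or backward:
            asymmetric += 1
        else:
            null += 1
    return f"{mutual}{asymmetric}{null}"

# Hypothetical ties: a -> b, b -> c, and a mutual tie between a and c.
label = man_label(("a", "b", "c"), [("a", "b"), ("b", "c"), ("a", "c"), ("c", "a")])
empty = man_label(("a", "b", "c"), [])
```

The first triad has 1 mutual, 2 asymmetric, and 0 null dyads, giving "120"; a triad with no ties at all is "003".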
Complete Network Analysis
Network Connections: Social Balance
Balance in directed relations
Actors seek out transitive relations, and avoid intransitive
relations. A triple is transitive
if: i → j and j → k
then: i → k
• A property of triples within triads
• Assumes directed relations
• The saliency of a triad may differ for each actor, depending on
their position within the triad.
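Each triad contains six ordered triples, and each can be checked against the transitivity rule. In the sketch below a triple (i, j, k) is called vacuous when the condition i → j and j → k is not met, so the rule makes no prediction. The function name and the example ties are hypothetical.

```python
from itertools import permutations

def classify_triples(triad, arcs):
    """Label each ordered triple as transitive, intransitive, or vacuous."""
    arcs = set(arcs)
    labels = {}
    for i, j, k in permutations(triad, 3):
        if (i, j) in arcs and (j, k) in arcs:
            labels[(i, j, k)] = "transitive" if (i, k) in arcs else "intransitive"
        else:
            labels[(i, j, k)] = "vacuous"     # the "if" part of the rule fails
    return labels

# Hypothetical ties: a -> b, b -> c, and a mutual tie between a and c.
triples = classify_triples(("a", "b", "c"),
                           [("a", "b"), ("b", "c"), ("a", "c"), ("c", "a")])
counts = {}
for kind in triples.values():
    counts[kind] = counts.get(kind, 0) + 1
```

This triad holds one transitive triple (a, b, c), two intransitive ones, and three vacuous ones, which shows why a single triad can look different depending on whose position you take.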
Complete Network Analysis
Network Connections: Social Balance
Once we admit directed relations, we need to decompose triads
into their constituent triples.
Ordered Triples:
[Figure: triad 120C on nodes a, b, and c.]
a → b → c; a → c   Transitive
a → c → b; a → b   Vacuous
b → a → c; b → c   Vacuous
b → c → a; b → a   Intransitive
c → a → b; c → b   Intransitive
c → b → a; c → a   Vacuous
Complete Network Analysis
Network Connections: Social Balance
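Network Sub-Structure: Triads
Triad types, grouped by number of ties:
(0) 003
(1) 012
(2) 102, 021D, 021U, 021C
(3) 111D, 111U, 030T, 030C
(4) 201, 120D, 120U, 120C
(5) 210
(6) 300
[Figure: the 16 triad types, each marked as intransitive, transitive, or mixed.]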
Complete Network Analysis
Network Connections: Social Balance
An Example of the triad census
Type         Number of triads
-----------------------------
 1 - 003     21
-----------------------------
 2 - 012     26
 3 - 102     11
 4 - 021D     1
 5 - 021U     5
 6 - 021C     3
 7 - 111D     2
 8 - 111U     5
 9 - 030T     3
10 - 030C     1
11 - 201      1
12 - 120D     1
13 - 120U     1
14 - 120C     1
15 - 210      1
16 - 300      1
-----------------------------
Sum (2 - 16): 63
Pajek & SPAN will give you the triad census
Complete Network Analysis
Network Connections: Social Balance
As with undirected graphs, you can use the type of triads allowed to characterize the
total graph. But now the potential patterns are much more diverse
1) All triads are 030T:
A perfect linear hierarchy.
Complete Network Analysis
Network Connections: Social Balance
Triads allowed: {300, 102}
[Figure: two-block structure with M and N* blocks, and its image matrix: 1 0 / 0 1.]
Complete Network Analysis
Network Connections: Social Balance
Cluster Structure, allows triads: {003, 300, 102}
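Eugene Johnsen (1985, 1986) specifies a number of structures that result
from various triad configurations.
[Figure: four clusters with mutual (M) ties within each cluster and null (N*) ties between clusters, shown with the corresponding image matrix.]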
Complete Network Analysis
Network Connections: Social Balance
PRC{300,102, 003, 120D, 120U, 030T, 021D, 021U} Ranked Cluster:
[Figure: ranked-cluster structure built from mutual (M), asymmetric (A*), and null (N*) relations among clusters, shown with its 5 x 5 image matrix.]
And many more...
Complete Network Analysis
Network Connections: Social Balance
Substantively, specifying a set of triads defines a behavioral mechanism, and we
can use the distribution of triads in a network to test whether the hypothesized
mechanism is active.
We do this by (1) counting the number of each triad type in a given network and (2)
comparing it to the expected number, given some random distribution of ties in the
network.
See Wasserman and Faust, Chapter 14 for computation details, and the SPAN
manual for SAS code that will generate these distributions, if you so choose.
Complete Network Analysis
Network Connections: Social Balance
Structural Indices based on the distribution of triads
The observed distribution of triads can be fit to the hypothesized structures using
weighting vectors for each type of triad:
$$\tau(l) = \frac{l'T - l'\mu_T}{\sqrt{l'\,\Sigma_T\,l}}$$
Where:
l = 16 element weighting vector for the triad types
T = the observed triad census
μ_T = the expected value of T
Σ_T = the variance-covariance matrix for T
Complete Network Analysis
Network Connections: Social Balance
For the Add Health data, the observed distribution of the tau statistic
for various models was:
Indicating that a ranked-cluster model fits the best.
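The tau statistic itself is a short computation once the weighting vector, the observed census, and its expected value and variance-covariance matrix are in hand. The sketch below is illustrative only: the function name `tau` is hypothetical, and the three-element vectors are made-up numbers standing in for the 16-element vectors of a real triad census.

```python
import math

def tau(l, T, mu, sigma):
    """tau(l) = (l'T - l'mu_T) / sqrt(l' Sigma_T l)."""
    observed = sum(w * t for w, t in zip(l, T))
    expected = sum(w * m for w, m in zip(l, mu))
    variance = sum(
        l[i] * sigma[i][j] * l[j]
        for i in range(len(l))
        for j in range(len(l))
    )
    return (observed - expected) / math.sqrt(variance)

# Hypothetical three-type illustration (a real census has 16 types)
l = [1, 0, 1]        # weight the first and third triad types
T = [5, 3, 2]        # observed counts
mu = [3, 3, 1]       # expected counts under the random baseline
sigma = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]  # assumed covariance matrix
tau_value = tau(l, T, mu, sigma)
print(round(tau_value, 3))
```

Large positive values indicate that the weighted triad types occur more often than the random baseline predicts.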
Complete Network Analysis
Network Connections: Social Balance
So far, the structural features of a network focus on the graph ‘at equilibrium.’
That is, we have hypothesized structures once people have made all the
choices they are going to make. What we have not done is look closely
at the implications of changing relations.
That is, we might say that triad 030C should not occur, but what would a
change in this triad imply from the standpoint of the actor making a relational
change?
Complete Network Analysis
Network Connections: Social Balance
[Figure: possible transitions among the 16 triad types (003, 012, 102, 021D, 021U, 021C, 111D, 111U, 030T, 030C, 201, 120D, 120U, 120C, 210, 300). Arrows are coded by outcome: transition to a vacuous triple, to a transitive triple, or to an intransitive triple.]
Complete Network Analysis
Network Connections: Social Balance
Observed triad transition patterns, from Hallinan’s data.
[Figure: observed transitions among the 16 triad types.]
Complete Network Analysis
Network Connections: Social Balance
Doreian, Kapuscinski, Krackhardt & Szczypula:
A brief history of balance through time.
Reanalyzes the Newcomb fraternity data, to look at changes in social balance over
time.
The basic balance theory hypothesis is that people who find themselves in an
unbalanced position should change their relations to generate balance.
Hypothetically, this should lead to greater balance over time.
After discussing a set of problems imposed because the data are forced ranks, they
first look at simple reciprocity.
Complete Network Analysis
Network Connections: Social Balance
Doreian, Kapuscinski, Krackhardt & Szczypula:
A brief history of balance through time.
[Figure: Relational Stability. Y axis: % change in ties (0 to 40); X axis: week (1 to 14).]
Complete Network Analysis
Network Connections: Social Balance
Doreian, Kapuscinski, Krackhardt & Szczypula:
A brief history of balance through time.
In addition to the simple degree of transitivity, they want to measure whether the
structure as a whole conforms to the prediction of structural balance.
They identify groups by partitioning the network to minimize the number of negative
ties within group and the number of positive ties between group (this algorithm is
implemented in PAJEK).
They can then measure structural imbalance as the sum of departures from structural
balance (2 and only 2 groups) and generalized balance (greater than 2 groups).
Complete Network Analysis
Network Connections: Social Balance
[Figure: Extent of Structural Imbalance. Two series, Structural Imbalance and Generalized Imbalance; Y axis from 9 to 17; X axis: week (1 to 15).]
Complete Network Analysis
Network Connections: Social Balance
Doreian, Kapuscinski, Krackhardt & Szczypula:
A brief history of balance through time.
They point out that the dynamic action of individuals had group implications, which is
part of what makes balance so attractive.
“…the micro-level processes can be viewed as generating social forces that
move the structure toward group balance.”
They also point out that negative ties within groups are likely less tolerated than positive
ties between groups, as negatives within group may threaten the group in ways that
positive ties between groups do not.
Complete Network Analysis
Network Connections: Time Constraint
What impact does this kind of timing have on disease flow?
The most dramatic effect occurs with the distinction between concurrent and
serial relations.
Relations are concurrent whenever an actor has more than one sex partner
during the same time interval. Concurrency is dangerous for disease spread
because:
a) compared to serially monogamous couples, an STD is not trapped
inside a single dyad
b) the STD can travel in two directions - through ego - to either of his/her
partners at the same time
Complete Network Analysis
Network Connections: Time Constraint
Concurrency and Epidemic Size
Morris & Kretzschmar (1995)
[Figure: epidemic size (Y axis, 0 to 1200) across scenarios 0 to 7, with panels labeled Monogamy, Disassortative, Random, and Assortative.]
Population size is 2000, simulation ran over 3 ‘years’
Complete Network Analysis
Network Connections: Time Constraint
Concurrency and disease spread
Adjusting for other mixing patterns:
Variable              Coefficient
Constant                  84.18
Concurrent               357.07
K2                       440.38
Degree Correlation      -557.40
Bias                     982.31
Each .1 increase in concurrency results in 45 more positive cases
Complete Network Analysis
Network Connections: Time Constraint
What impact does timing have on flow through the network?
[Figure: hypothetical contact network among actors A to F. Numbers above lines indicate contact periods; the labels shown are 2-5, 8-9, and 3-5.]
Complete Network Analysis
Network Connections: Time Constraint
The path graph for the hypothetical contact network
[Figure: path graph among actors A to F.]
While clearly important, this is not often handled well by current software.
Complete Network Analysis
Network Connections: Time Constraint
Direct Contact Network of 8 people in a ring
[Figure: ring of 8 people with its direct contact matrix.]
Complete Network Analysis
Network Connections: Time Constraint
Implied Contact Network of 8 people in a ring
All relations Concurrent
[Figure: ring of 8 people with all relations concurrent, shown with the implied contact matrix.]
Complete Network Analysis
Network Connections: Time Constraint
Implied Contact Network of 8 people in a ring
Mixed Concurrent
[Figure: ring of 8 people with mixed concurrency, shown with the implied contact matrix.]
Complete Network Analysis
Network Connections: Time Constraint
Implied Contact Network of 8 people in a ring
Serial Monogamy (1)
[Figure: ring of 8 people under serial monogamy, first time ordering, shown with the implied contact matrix.]
Complete Network Analysis
Network Connections: Time Constraint
Implied Contact Network of 8 people in a ring
Serial Monogamy (2)
[Figure: ring of 8 people under serial monogamy, second time ordering, shown with the implied contact matrix.]
Complete Network Analysis
Network Connections: Time Constraint
Implied Contact Network of 8 people in a ring
Serial Monogamy (3)
[Figure: ring of 8 people under serial monogamy, third time ordering, shown with the implied contact matrix.]
Complete Network Analysis
Network Connections: Time Constraint
Identifying the Minimum Path Density of a Graph
It turns out that the safest network is one where relations are ‘inter-woven’
in an “early-late-earlier” pattern. To identify the paths empirically, you
must search all possible paths in the network.
[Figure: ring network with relations labeled by time period, t1 or t2.]
Complete Network Analysis
Network Connections: Time Constraint
Implications of Time-ordered Networks
• Any measure calculated on the adjacency structure that rests on
reachability or flow may be misleading.
•There are highly non-linear effects to changing the timing of a
relation on total reachability
• Within connected components, time order may partition the
network into reachable sub-groups.
• Infection risk can be assessed on a continuum from complete
concurrency to some minimum level of reachability in the network.
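The effect of time ordering on reachability can be sketched with an earliest-arrival search over contact periods. This is a minimal illustration, not the method of any particular package: the function `earliest_arrival`, the contact list, and the rule that transmission is possible at any moment while a contact is active are all assumptions made for the example.

```python
import heapq

def earliest_arrival(contacts, source, start_time=0):
    """Earliest time each node can be reached along time-respecting paths.

    contacts: list of (u, v, begin, end) undirected contact periods.
    A carrier can pass something across a contact only if the carrier
    already has it no later than the moment the contact ends.
    """
    neighbors = {}
    for u, v, begin, end in contacts:
        neighbors.setdefault(u, []).append((v, begin, end))
        neighbors.setdefault(v, []).append((u, begin, end))

    arrival = {source: start_time}
    queue = [(start_time, source)]
    while queue:
        time, node = heapq.heappop(queue)
        if time > arrival.get(node, float("inf")):
            continue  # stale queue entry
        for other, begin, end in neighbors.get(node, []):
            if time <= end:  # the contact is still ahead or ongoing
                reach = max(time, begin)
                if reach < arrival.get(other, float("inf")):
                    arrival[other] = reach
                    heapq.heappush(queue, (reach, other))
    return arrival

# Hypothetical chain: A-B active in periods 3-5, B-C active in periods 1-2
contacts = [("A", "B", 3, 5), ("B", "C", 1, 2)]
from_a = earliest_arrival(contacts, "A")
from_c = earliest_arrival(contacts, "C")
print(from_a)
print(from_c)
```

In this toy chain C can reach A, but A cannot reach C, even though the adjacency structure is a single connected path. That asymmetry is why reachability measures computed on the adjacency matrix alone can mislead.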
Complete Network Analysis
Network Connections: Network Diffusion & Peer Influence
Most of our interest in networks is in how things flow through the network, which brings
us to questions about the diffusion of goods through networks.
We have already seen the limits to diffusion through network connections and timing, but
a number of studies focus on how network structure affects diffusion directly.
These include questions about the diffusion of goods and ideas through a network as well
as the outcomes of diffusion.
Examples include:
Spatial Diffusion Models
Critical Mass Models
Dyadic Contact models
Peer Influence Models
Complete Network Analysis
Network Connections: Network Diffusion & Peer Influence
Coleman, Katz and Menzel, “Diffusion of an innovation among physicians”
Sociometry (1957)
[Figure: cumulative % using “Gammanym” (Y axis, 0 to 1) by week since introduction (X axis, 2 to 18), plotted separately for physicians with > 3 nominations, 1 - 2 nominations, and 0 nominations.]
Complete Network Analysis
Network Connections: Network Diffusion & Peer Influence
Attitudes are a function of two sources:
a) Individual characteristics
•Gender, Age, Race, Education, Etc. Standard sociology
b) Interpersonal influences
•Actors negotiate opinions with others
Complete Network Analysis
Network Connections: Network Diffusion & Peer Influence
Friedkin claims in his Structural Theory of Social Influence that the theory has four
benefits:
•relaxes the simplifying assumption of actors who must either
conform or deviate from a fixed consensus of others (public choice
model)
•Does not necessarily result in consensus, but can have a stable
pattern of disagreement
•Is a multi-level theory:
•micro level: cognitive theory about how people weigh and
combine other’s opinions
•macro level: concerned with how social structural
arrangements enter into and constrain the opinion-formation
process
•Allows an analysis of the systemic consequences of social structures
Complete Network Analysis
Network Connections: Network Diffusion & Peer Influence
Formal Model
$$Y^{(1)} = XB \qquad (1)$$
$$Y^{(t)} = \alpha W Y^{(t-1)} + (1-\alpha) Y^{(1)} \qquad (2)$$
Y(1) = an N x M matrix of initial opinions on M issues for N
actors
X = an N x K matrix of K exogenous variables that affect Y
B = a K x M matrix of coefficients relating X to Y
α = a weight of the strength of endogenous interpersonal
influences
W = an N x N matrix of interpersonal influences
Complete Network Analysis
Network Connections: Network Diffusion & Peer Influence
Formal Model
$$Y^{(1)} = XB \qquad (1)$$
This is the standard sociology model for explaining anything: the General Linear Model.
It says that a dependent variable (Y) is some function (B) of a set of independent
variables (X). At the individual level, the model says that:
$$Y_i = \sum_k X_{ik} B_k$$
Usually, one of the X variables is e, the model error term.
Complete Network Analysis
Network Connections: Network Diffusion & Peer Influence
$$Y^{(t)} = \alpha W Y^{(t-1)} + (1-\alpha) Y^{(1)} \qquad (2)$$
This part of the model taps social influence. It says that each person’s final opinion is
a weighted average of their own initial opinions,
$$(1-\alpha) Y^{(1)}$$
and the opinions of those they communicate with (which can include their own current
opinions),
$$\alpha W Y^{(t-1)}$$
Complete Network Analysis
Network Connections: Network Diffusion & Peer Influence
The key to the peer influence part of the model is W, a matrix of
interpersonal weights. W is a function of the communication structure of the
network, and is usually a transformation of the adjacency matrix. In general:
$$0 \le w_{ij} \le 1, \qquad \sum_j w_{ij} = 1$$
Various specifications of the model change the value of wii, the extent to which
one weighs their own current opinion and the relative weight of alters.
Complete Network Analysis
Network Connections: Network Diffusion & Peer Influence
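[Figure: four-actor network with ties 1-2, 1-3, 2-3, and 3-4]
Adjacency matrix (with self ties):
     1  2  3  4
1    1  1  1  0
2    1  1  1  0
3    1  1  1  1
4    0  0  1  1

Self weight: Even
Raw weights          W
     1  2  3  4           1    2    3    4
1    1  1  1  0      1  .33  .33  .33    0
2    1  1  1  0      2  .33  .33  .33    0
3    1  1  1  1      3  .25  .25  .25  .25
4    0  0  1  1      4    0    0  .50  .50

Self weight: 2*self
Raw weights          W
     1  2  3  4           1    2    3    4
1    2  1  1  0      1  .50  .25  .25    0
2    1  2  1  0      2  .25  .50  .25    0
3    1  1  2  1      3  .20  .20  .40  .20
4    0  0  1  2      4    0    0  .33  .67

Self weight: degree
Raw weights          W
     1  2  3  4           1    2    3    4
1    2  1  1  0      1  .50  .25  .25    0
2    1  2  1  0      2  .25  .50  .25    0
3    1  1  3  1      3  .17  .17  .50  .17
4    0  0  1  1      4    0    0  .50  .50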
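The three self-weight specifications can be reproduced by setting the diagonal of the adjacency matrix and then dividing each row by its sum. The sketch below is illustrative: the function `influence_weights` and the scheme names "even", "double", and "degree" are labels chosen here, not part of the original materials, and the code assumes every actor has at least one tie.

```python
def influence_weights(adjacency, self_weight="even"):
    """Row-normalize an adjacency matrix after setting the self weight.

    self_weight: "even"   -> self counts the same as one alter
                 "double" -> self counts twice as much as one alter
                 "degree" -> self counts as much as all alters combined
    Assumes every actor has at least one tie to someone else.
    """
    n = len(adjacency)
    raw = [list(row) for row in adjacency]
    for i in range(n):
        alters = sum(adjacency[i][j] for j in range(n) if j != i)
        if self_weight == "even":
            raw[i][i] = 1
        elif self_weight == "double":
            raw[i][i] = 2
        elif self_weight == "degree":
            raw[i][i] = alters
        else:
            raise ValueError("unknown self_weight scheme")
    return [[value / sum(row) for value in row] for row in raw]

# Four-actor example with ties 1-2, 1-3, 2-3, and 3-4
adjacency = [
    [1, 1, 1, 0],
    [1, 1, 1, 0],
    [1, 1, 1, 1],
    [0, 0, 1, 1],
]
for scheme in ("even", "double", "degree"):
    print(scheme)
    for row in influence_weights(adjacency, scheme):
        print([round(w, 2) for w in row])
```

Every row of the result sums to 1, so each actor's updated opinion is a weighted average of the opinions they attend to.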
Complete Network Analysis
Network Connections: Network Diffusion & Peer Influence
Formal Properties of the model
$$Y^{(t)} = \alpha W Y^{(t-1)} + (1-\alpha) Y^{(1)} \qquad (2)$$
When interpersonal influence is complete (α = 1), model reduces to:
$$Y^{(t)} = 1 \cdot W Y^{(t-1)} + 0 \cdot Y^{(1)} = W Y^{(t-1)}$$
When interpersonal influence is absent (α = 0), model reduces to:
$$Y^{(t)} = 0 \cdot W Y^{(t-1)} + Y^{(1)} = Y^{(1)}$$
Complete Network Analysis
Network Connections: Network Diffusion & Peer Influence
Formal Properties of the model
If we allow the model to run over t, we can describe the model as:
$$Y^{(\infty)} = \alpha W Y^{(\infty)} + (1-\alpha) XB$$
The model is directly related to spatial econometric models:
$$Y^{(\infty)} = \alpha W Y^{(\infty)} + X\tilde{\beta} + e$$
Where the two coefficients (α and β) are estimated directly (See Doreian,
1982, SMR)
Complete Network Analysis
Network Connections: Network Diffusion & Peer Influence
Simple example
[Figure: four-actor network with ties 1-2, 1-3, 2-3, and 3-4]
W:
      1    2    3    4
1   .33  .33  .33    0
2   .33  .33  .33    0
3   .25  .25  .25  .25
4     0    0  .50  .50
Initial Y = (1, 3, 5, 7), α = .8

Actor   T: 0     1     2     3     4     5     6     7
1       1.00  2.60  2.81  2.93  2.98  3.00  3.01  3.01
2       3.00  3.00  3.21  3.33  3.38  3.40  3.41  3.41
3       5.00  4.20  4.20  4.16  4.14  4.14  4.13  4.13
4       7.00  6.20  5.56  5.30  5.18  5.13  5.11  5.10
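The α = .8 trajectory can be reproduced by iterating the update rule directly. This is a minimal sketch: the function name `peer_influence` is hypothetical, and the weights of .33 shown on the slide are assumed to be exact thirds.

```python
def peer_influence(W, y_initial, alpha, steps):
    """Iterate Y(t) = alpha * W Y(t-1) + (1 - alpha) * Y(1)."""
    history = [list(y_initial)]
    current = list(y_initial)
    for _ in range(steps):
        current = [
            alpha * sum(w * y for w, y in zip(row, current))
            + (1 - alpha) * y0
            for row, y0 in zip(W, y_initial)
        ]
        history.append(current)
    return history

third = 1 / 3  # the slide's .33 entries, assumed to be exact thirds
W = [
    [third, third, third, 0],
    [third, third, third, 0],
    [0.25, 0.25, 0.25, 0.25],
    [0, 0, 0.5, 0.5],
]
history = peer_influence(W, [1, 3, 5, 7], alpha=0.8, steps=7)
for t, opinions in enumerate(history):
    print(t, [round(y, 2) for y in opinions])
```

Because α is below 1, each actor keeps a fixed pull toward their initial opinion, so the opinions settle into a stable pattern of disagreement rather than a single consensus value.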
Complete Network Analysis
Network Connections: Network Diffusion & Peer Influence
Simple example
[Figure: four-actor network with ties 1-2, 1-3, 2-3, and 3-4]
W:
      1    2    3    4
1   .33  .33  .33    0
2   .33  .33  .33    0
3   .25  .25  .25  .25
4     0    0  .50  .50
Initial Y = (1, 3, 5, 7), α = 1.0

Actor   T: 0     1     2     3     4     5     6     7
1       1.00  3.00  3.33  3.56  3.68  3.74  3.78  3.81
2       3.00  3.00  3.33  3.56  3.68  3.74  3.78  3.81
3       5.00  4.00  4.00  3.92  3.88  3.86  3.85  3.84
4       7.00  6.00  5.00  4.50  4.21  4.05  3.95  3.90
Complete Network Analysis
Network Connections: Network Diffusion & Peer Influence
Extended example: building intuition
Consider a network with three cohesive groups, and an initially random distribution of
opinions:
(to run this model, use peerinfl1.sas)
Complete Network Analysis
Network Connections: Network Diffusion & Peer Influence
Simulated Peer Influence:
75 actors, 2 initially random opinions, Alpha = .8, 7 iterations
Complete Network Analysis
Network Connections: Network Diffusion & Peer Influence
Extended example: building intuition
Consider a network with three cohesive groups, and an initially random distribution of
opinions:
Now weight in-group ties higher than between group ties
Complete Network Analysis
Network Connections: Network Diffusion & Peer Influence
Simulated Peer Influence:
75 actors, 2 initially random opinions, Alpha = .8, 7 iterations, in-group tie: 2
Complete Network Analysis
Network Connections: Network Diffusion & Peer Influence
Consider the implications for populations of different structures. For example, we might
have two groups, a large orthodox population and a small heterodox population. We can
imagine the groups mixing in various levels:
Heterodox: 10 people
Orthodox: 100 People
Mixing matrices:
Little Mixing      Moderate Mixing    Heavy Mixing
.95   .001         .95   .008         .95   .05
.001  .02          .008  .02          .05   .02
Complete Network Analysis
Network Connections: Network Diffusion & Peer Influence
[Figure: three panels showing the Heavy, Light, and Moderate mixing scenarios.]
Complete Network Analysis
Network Connections: Network Diffusion & Peer Influence
Light mixing
Complete Network Analysis
Network Connections: Network Diffusion & Peer Influence
Moderate mixing
Complete Network Analysis
Network Connections: Network Diffusion & Peer Influence
High mixing
Complete Network Analysis
Network Connections: Network Diffusion & Peer Influence
In an unbalanced situation (small group vs large group) the extent of contact
can easily overwhelm the small group. Applications of this idea are evident
in:
•Missionary work (Must be certain to send missionaries out into the
world with strong in-group contacts)
•Overcoming deviant culture (I.e. youth gangs vs. adults)
•Work by Hyojung Kim (U Washington) focuses on the first of these
two processes in social movement models
Complete Network Analysis
Network Connections: Network Diffusion & Peer Influence
In extensions (Friedkin, 1998), Friedkin generalizes the model so that alpha
varies across people. We can extend the basic model by (1) simply changing
α to a vector (A), which then changes each person’s opinion directly, and (2)
by linking the self weight (wii) to alpha.
$$Y^{(t)} = A W Y^{(t-1)} + (I - A) Y^{(1)}$$
Where A is a diagonal matrix of endogenous weights, with 0 < aii < 1. A
further restriction on the model sets wii = 1-aii.
This leads to a great deal more flexibility in the theory, and some
interesting insights. Consider the case of group opinion leaders with
unchanging opinions (i.e., many people have high aii, while a few have
low):
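The opinion-leader case can be illustrated with a small sketch in which each actor has their own susceptibility aii. Everything here is hypothetical: the function `peer_influence_varying`, the three-actor group, and the chosen weights are made up for illustration, and the sketch does not impose the additional restriction wii = 1-aii.

```python
def peer_influence_varying(W, y_initial, alphas, steps):
    """Iterate Y(t) = A W Y(t-1) + (I - A) Y(1) with diagonal A."""
    current = list(y_initial)
    for _ in range(steps):
        current = [
            a * sum(w * y for w, y in zip(row, current)) + (1 - a) * y0
            for row, a, y0 in zip(W, alphas, y_initial)
        ]
    return current

third = 1 / 3
W = [[third, third, third] for _ in range(3)]  # everyone weighs everyone equally
y_initial = [10.0, 0.0, 0.0]                   # actor 1 starts far from the others
alphas = [0.1, 0.9, 0.9]                       # actor 1 is a leader (low aii)
final = peer_influence_varying(W, y_initial, alphas, steps=25)
print([round(y, 2) for y in final])
```

The leader barely moves from their initial position while the two followers are drawn toward it, which is the pattern the opinion-leader figures depict at the group level.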
Complete Network Analysis
Network Connections: Network Diffusion & Peer Influence
Peer Opinion Leaders
Group 1
Leaders
Group 2
Leaders
Group 3
Leaders
Complete Network Analysis
Network Connections: Network Diffusion & Peer Influence
Further extensions of the model might:
• Time dependent α: people likely value other’s opinions more early than later in a
decision context
• Interact α with XB: people’s self weights are a function of their behaviors &
attributes
• Make W dependent on structure of the network (weight transitive ties greater
than intransitive ties, for example)
• Time dependent W: The network of contacts does not remain constant, but is
dynamic, meaning that influence likely moves unevenly through the network
• And others likely abound….
Complete Network Analysis
Network Connections: Network Diffusion & Peer Influence
Testing the fit of the general model
Identifying peer influence in real data
There are two general ways to test for peer influence in an observed network.
The first estimates the parameters (α and β) of the peer influence model directly;
the second transforms the network into a dyadic model, predicting similarity
among actors.
Peer influence model:
$$Y^{(\infty)} = \alpha W Y^{(\infty)} + X\tilde{\beta} + e$$
For details, see Doreian, 1982, Sociological Methods and Research. Also Roger Gould
(AJS, Paris Commune paper for example)
Complete Network Analysis
Network Connections: Network Diffusion & Peer Influence
For details, see Doreian, 1982, Sociological Methods and Research. Also Roger
Gould (AJS, Paris Commune paper for example)
The basic model says that people’s opinions are a function of the opinions of others
and their characteristics.
$$Y^{(\infty)} = \alpha W Y^{(\infty)} + X\tilde{\beta} + e$$
WY = A simple vector which can be added to your model. That is, multiply
Y by a W matrix, and run the regression with WY as a new variable, and the
regression coefficient is an estimate of α.
This is what Doreian calls the QAD estimate of peer influence.
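The mechanics of the QAD estimate (compute WY, then regress Y on WY and X) can be sketched without any statistics package. The example below is an assumption-laden illustration: the helper functions `solve`, `matvec`, and `ols` are hypothetical, the five-actor W and the covariate x are made up, and Y is generated without noise so that the regression recovers the chosen coefficients exactly. With real, noisy data the estimate is only approximate.

```python
def solve(matrix, rhs):
    """Solve a small linear system by Gauss-Jordan elimination with pivoting."""
    n = len(rhs)
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * p for x, p in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]

def matvec(matrix, vector):
    """Multiply a matrix by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

def ols(columns, y):
    """OLS coefficients from the normal equations (X'X) b = X'y."""
    xtx = [[sum(a * b for a, b in zip(ci, cj)) for cj in columns] for ci in columns]
    xty = [sum(a * b for a, b in zip(ci, y)) for ci in columns]
    return solve(xtx, xty)

# Hypothetical data: 5 actors, row-normalized W with a zero diagonal
W = [
    [0.0, 0.5, 0.5, 0.0, 0.0],
    [0.5, 0.0, 0.5, 0.0, 0.0],
    [1 / 3, 1 / 3, 0.0, 1 / 3, 0.0],
    [0.0, 0.0, 0.5, 0.0, 0.5],
    [0.0, 0.0, 0.0, 1.0, 0.0],
]
x = [1.0, 4.0, 2.0, 8.0, 5.0]
true_alpha, true_beta, intercept = 0.5, 2.0, 1.0

# Build noise-free Y satisfying Y = alpha*W*Y + intercept + beta*x
n = len(x)
lhs = [[(1.0 if i == j else 0.0) - true_alpha * W[i][j] for j in range(n)]
       for i in range(n)]
y = solve(lhs, [intercept + true_beta * xi for xi in x])

wy = matvec(W, y)  # the peer term WY, added as an ordinary regressor
b0, alpha_hat, beta_hat = ols([[1.0] * n, wy, x], y)
print(round(b0, 3), round(alpha_hat, 3), round(beta_hat, 3))
```

The coefficient on the WY column plays the role of α, and the coefficient on x plays the role of β.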
Complete Network Analysis
Network Connections: Network Diffusion & Peer Influence
The problem with the above regression is that cases are, by definition, not
independent. In fact, the coefficient on WY is also known as the ‘network autocorrelation’
coefficient, since a ‘peer influence’ effect is an autocorrelation effect -- your value is a function of
the people you are connected to. In general, OLS is not the best way to estimate this
equation. That is, QAD = Quick and Dirty, and your results will not be exact.
In practice, the QAD approach (perhaps combined with a GLS estimator) results in
empirical estimates that are “virtually indistinguishable” from MLE (Doreian et al,
1984)
The proper way to estimate the peer equation is to use maximum likelihood estimates,
and Doreian gives the formulas for this in his paper.
The other way is to use non-parametric approaches, such as the Quadratic Assignment
Procedure, to estimate the effects.
Complete Network Analysis
Network Connections: Network Diffusion & Peer Influence
An empirical Example: Peer influence in the OSU Graduate Student Network.
Each person was asked to rank their satisfaction with the program, which is the dependent variable
in this analysis.
I constructed two W matrices, one from HELP and the other from Best Friend. I treat relations as
symmetric and valued, such that:
$$W_{ijt} = \begin{cases} 1 & \text{if } A_{ijt} = 1 \text{ or } A_{jit} = 1 \\ 2 & \text{if } A_{ijt} = 1 \text{ and } A_{jit} = 1 \\ 0 & \text{otherwise} \end{cases}$$
$$\sum_j W_{ij} = 1, \qquad W_{ii} = 0$$
I also include Race (white/non-white), Gender, and Cohort Year as exogenous variables in the model.
(to run the model, see osupeerpi1.sas)
Complete Network Analysis
Network Connections: Network Diffusion & Peer Influence
Distribution of Satisfaction with the department.
Complete Network Analysis
Network Connections: Network Diffusion & Peer Influence
Parameter Estimates
Variable     Parameter Estimate   Pr > |t|   Standardized Estimate
Intercept         2.60252          0.0931          0
FEMALE           -1.07540          0.0142         -0.25455
NONWHITE         -0.22087          0.5975         -0.05491
y00               0.93176          0.0798          0.21627
y99              -0.19375          0.7052         -0.04586
y98              -0.45912          0.4637         -0.08289
y97               0.60670          0.3060          0.11919
PEER_BF           0.23936          0.0002          0.42084
PEER_H            0.50668          0.0277          0.23321
Model R2 = .41, compared to .15 without the peer effects
Alternative is to use a QAP Model (see below)
Complete Network Analysis
Network Connections: Network Diffusion & Peer Influence
Peer influence through Dyad Models
Another way to get at peer influence is not through the level of Y, but through the
extent to which actors are similar with respect to Y. Recall the simulated
example: peer influence is reflected in how close points are to each other.
Complete Network Analysis
Network Connections: Network Diffusion & Peer Influence
Peer influence through Dyad Models
The model is now expressed at the dyad level as:
$$Y_{ij} = b_0 + b_1 A_{ij} + \sum_k b_k X_k + e_{ij}$$
Where Y is a matrix of similarities, A is an adjacency matrix,
and Xk is a matrix of similarities on attributes
Complete Network Analysis
Network Connections: Network Diffusion & Peer Influence
If we break the original peer influence model into its components, the attribute
part of the model suggests that any two people with the same attribute should have
the same value for Y.
The Peer influence model says that (a) if you and I are tied to each other, then we
should have similar opinions and (b) that if we are tied to many of the same people,
then we should have similar opinions. We can test both sides of these (and many
other dyadic properties) directly at the dyad level.
Complete Network Analysis
Network Connections: Network Diffusion & Peer Influence
ADJMAT
NODE   1 2 3 4 5 6 7 8 9
1      0 1 1 1 0 0 0 0 0
2      1 0 1 0 0 0 1 0 0
3      1 1 0 0 1 0 1 0 0
4      1 0 0 0 1 0 0 0 0
5      0 0 1 1 0 1 0 1 0
6      0 0 0 0 1 0 0 1 1
7      0 1 1 0 0 0 0 0 0
8      0 0 0 0 1 1 0 0 1
9      0 0 0 0 0 1 0 1 0

SAMERCE
NODE   1 2 3 4 5 6 7 8 9
1      0 1 0 0 1 0 0 0 1
2      1 0 0 0 1 0 0 0 1
3      0 0 0 1 0 1 1 1 0
4      0 0 1 0 0 1 1 1 0
5      1 1 0 0 0 0 0 0 1
6      0 0 1 1 0 0 1 1 0
7      0 0 1 1 0 1 0 1 0
8      0 0 1 1 0 1 1 0 0
9      1 1 0 0 1 0 0 0 0

SAMESEX
NODE   1 2 3 4 5 6 7 8 9
1      0 0 1 1 0 0 1 1 0
2      0 0 0 0 1 1 0 0 1
3      1 0 0 1 0 0 1 1 0
4      1 0 1 0 0 0 1 1 0
5      0 1 0 0 0 1 0 0 1
6      0 1 0 0 1 0 0 0 1
7      1 0 1 1 0 0 0 1 0
8      1 0 1 1 0 0 1 0 0
9      0 1 0 0 1 1 0 0 0
Complete Network Analysis
Network Connections: Network Diffusion & Peer Influence
NODE      Y
1      0.32
2      0.59
3      0.54
4      0.50
5      0.04
6      0.02
7      0.41
8      0.01
9     -0.17

Distance (Dij = abs(Yi - Yj))
NODE     1     2     3     4     5     6     7     8     9
1     .000  .277  .228  .181  .278  .298  .095  .307  .481
2     .277  .000  .049  .096  .555  .575  .182  .584  .758
3     .228  .049  .000  .047  .506  .526  .134  .535  .710
4     .181  .096  .047  .000  .459  .479  .087  .488  .663
5     .278  .555  .506  .459  .000  .020  .372  .029  .204
6     .298  .575  .526  .479  .020  .000  .392  .009  .184
7     .095  .182  .134  .087  .372  .392  .000  .401  .576
8     .307  .584  .535  .488  .029  .009  .401  .000  .175
9     .481  .758  .710  .663  .204  .184  .576  .175  .000
Complete Network Analysis
Network Connections: Network Diffusion & Peer Influence
Obs   SENDER   RCVER       SIM   NOM   SAMERCE   SAMESEX
  1        1       2   0.27694     1         1         0
  2        1       3   0.22828     1         0         1
  3        1       4   0.18136     1         0         1
  4        1       5   0.27766     0         1         0
  5        1       6   0.29763     0         0         0
  6        1       7   0.09473     0         0         1
  7        1       8   0.30671     0         0         1
  8        1       9   0.48148     0         1         0
  9        2       1   0.27694     1         1         0
 10        2       3   0.04866     1         0         0
 11        2       4   0.09559     0         0         0
 12        2       5   0.55460     0         1         1
 13        2       6   0.57457     0         0         1
 14        2       7   0.18221     1         0         0
 15        2       8   0.58365     0         0         0
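Building the dyad-level file from node-level data is mostly bookkeeping, and can be sketched as below. This is an illustration under stated assumptions: the function `dyad_table` is hypothetical, the demo uses only the first four actors, and the race and sex codes are arbitrary labels chosen to be consistent with the SAMERCE and SAMESEX entries among those four actors. NCOMFND is assumed to mean the number of friends two actors have in common.

```python
def dyad_table(y, adjacency, attributes):
    """Build one record per ordered pair (i, j) with i != j."""
    n = len(y)
    rows = []
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            row = {
                "sender": i + 1,
                "receiver": j + 1,
                "sim": abs(y[i] - y[j]),  # distance on Y, as in the slide
                "nom": adjacency[i][j],   # 1 if i nominates j
                # assumed meaning of NCOMFND: friends shared by i and j
                "ncomfnd": sum(adjacency[i][k] * adjacency[j][k] for k in range(n)),
            }
            for name, values in attributes.items():
                row["same_" + name] = int(values[i] == values[j])
            rows.append(row)
    return rows

# Toy input: first four actors only; attribute codes are hypothetical
y = [0.32, 0.59, 0.54, 0.50]
adjacency = [
    [0, 1, 1, 1],
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
]
attributes = {"race": ["a", "a", "b", "b"], "sex": ["f", "m", "f", "f"]}
rows = dyad_table(y, adjacency, attributes)
for row in rows[:3]:
    print(row)
```

Each actor appears in n - 1 sender rows and n - 1 receiver rows, which is one source of the non-independence among dyads.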
Complete Network Analysis
Network Connections: Network Diffusion & Peer Influence
The REG Procedure
Model: MODEL1
Dependent Variable: SIM

Analysis of Variance
Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
Model               4          0.90657       0.22664      9.29   <.0001
Error              31          0.75591       0.02438
Corrected Total    35          1.66248

Root MSE           0.15615    R-Square   0.5453
Dependent Mean     0.33161    Adj R-Sq   0.4866
Coeff Var         47.08929

Parameter Estimates
Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept    1              0.51931          0.05116     10.15     <.0001
NOM          1             -0.17054          0.05963     -2.86     0.0075
SAMERCE      1              0.05387          0.05916      0.91     0.3696
SAMESEX      1             -0.06535          0.05365     -1.22     0.2324
NCOMFND      1             -0.16134          0.03862     -4.18     0.0002
Complete Network Analysis
Network Connections: Network Diffusion & Peer Influence
Like the basic Peer influence model, cases in a dyad model are not
independent. However, the non-independence now comes from two sources:
the fact that the same person is represented in (n-1) dyads and that i and j are
linked through relations.
One of the best solutions to this problem is QAP: Quadratic Assignment
Procedure. A non-parametric procedure for significance testing.
QAP runs the model of interest on the real data, then randomly permutes the
rows/cols of the data matrix and estimates the model again. In so doing, it
generates an empirical distribution of the coefficients.
Complete Network Analysis
Network Connections: QAP
Comparing multiple networks: QAP
The substantive question is how one set of relations (or dyadic attributes)
relates to another.
For example:
• Do marriage ties correlate with business ties in the Medici family
network?
• Are friendship relations correlated with joint membership in a club?
Complete Network Analysis
Network Connections: QAP
Assessing the correlation is straightforward, as we simply correlate each
corresponding cell of the two matrices:
Marriage
                1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
 1 ACCIAIUOL    0 0 0 0 0 0 0 0 1  0  0  0  0  0  0  0
 2 ALBIZZI      0 0 0 0 0 1 1 0 1  0  0  0  0  0  0  0
 3 BARBADORI    0 0 0 0 1 0 0 0 1  0  0  0  0  0  0  0
 4 BISCHERI     0 0 0 0 0 0 1 0 0  0  1  0  0  0  1  0
 5 CASTELLAN    0 0 1 0 0 0 0 0 0  0  1  0  0  0  1  0
 6 GINORI       0 1 0 0 0 0 0 0 0  0  0  0  0  0  0  0
 7 GUADAGNI     0 1 0 1 0 0 0 1 0  0  0  0  0  0  0  1
 8 LAMBERTES    0 0 0 0 0 0 1 0 0  0  0  0  0  0  0  0
 9 MEDICI       1 1 1 0 0 0 0 0 0  0  0  0  1  1  0  1
10 PAZZI        0 0 0 0 0 0 0 0 0  0  0  0  0  1  0  0
11 PERUZZI      0 0 0 1 1 0 0 0 0  0  0  0  0  0  1  0
12 PUCCI        0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0
13 RIDOLFI      0 0 0 0 0 0 0 0 1  0  0  0  0  0  1  1
14 SALVIATI     0 0 0 0 0 0 0 0 1  1  0  0  0  0  0  0
15 STROZZI      0 0 0 1 1 0 0 0 0  0  1  0  1  0  0  0
16 TORNABUON    0 0 0 0 0 0 1 0 1  0  0  0  1  0  0  0

Business
                1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
 1              0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0
 2              0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0
 3              0 0 0 0 1 1 0 0 1  0  1  0  0  0  0  0
 4              0 0 0 0 0 0 1 1 0  0  1  0  0  0  0  0
 5              0 0 1 0 0 0 0 1 0  0  1  0  0  0  0  0
 6              0 0 1 0 0 0 0 0 1  0  0  0  0  0  0  0
 7              0 0 0 1 0 0 0 1 0  0  0  0  0  0  0  0
 8              0 0 0 1 1 0 1 0 0  0  1  0  0  0  0  0
 9              0 0 1 0 0 1 0 0 0  1  0  0  0  1  0  1
10              0 0 0 0 0 0 0 0 1  0  0  0  0  0  0  0
11              0 0 1 1 1 0 0 1 0  0  0  0  0  0  0  0
12              0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0
13              0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0
14              0 0 0 0 0 0 0 0 1  0  0  0  0  0  0  0
15              0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0
16              0 0 0 0 0 0 0 0 1  0  0  0  0  0  0  0

Correlation:
1          0.3718679
0.3718679  1

Dyads (first 30 shown):
 i   j  Marriage  Business
 1   2      0         0
 1   3      0         0
 1   4      0         0
 1   5      0         0
 1   6      0         0
 1   7      0         0
 1   8      0         0
 1   9      1         0
 1  10      0         0
 1  11      0         0
 1  12      0         0
 1  13      0         0
 1  14      0         0
 1  15      0         0
 1  16      0         0
 2   1      0         0
 2   3      0         0
 2   4      0         0
 2   5      0         0
 2   6      1         0
 2   7      1         0
 2   8      0         0
 2   9      1         0
 2  10      0         0
 2  11      0         0
 2  12      0         0
 2  13      0         0
 2  14      0         0
 2  15      0         0
 2  16      0         0
Complete Network Analysis
Network Connections: QAP
But is the observed value statistically significant?
We cannot use standard inference, since its assumptions (independent observations) are
violated. Instead, we use a permutation approach.
Essentially, we are asking whether the observed correlation is large (small)
compared to that which we would get if the assignment of variables to nodes
were random, but the interdependencies within variables were maintained.
Do this by randomly sorting the rows and columns of the matrix, then re-estimating the correlation.
Complete Network Analysis
Network Connections: QAP
Comparing multiple networks: QAP
When you permute, you have to permute both the rows and the columns
simultaneously to maintain the interdependencies in the data:
Original:
     A  B  C  D  E
A    0  1  2  3  4
B    0  0  1  2  3
C    0  0  0  1  2
D    0  0  0  0  1
E    0  0  0  0  0

Sorted:
     A  D  B  C  E
A    0  3  1  2  4
D    0  0  0  0  1
B    0  2  0  1  3
C    0  1  0  0  2
E    0  0  0  0  0
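A minimal sketch of the simultaneous permutation, assuming the example matrix reads with row A as 0 1 2 3 4 (the transposed reading works the same way). The function name permute is made up for this illustration. Applying one index order to both rows and columns keeps every dyad's value attached to the same pair of nodes; permuting rows alone does not.

```python
def permute(matrix, order):
    """Reorder rows AND columns of a square matrix with the same index order."""
    return [[matrix[i][j] for j in order] for i in order]


labels = ["A", "B", "C", "D", "E"]
orig = [
    [0, 1, 2, 3, 4],
    [0, 0, 1, 2, 3],
    [0, 0, 0, 1, 2],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0],
]

order = [0, 3, 1, 2, 4]            # new ID order: A D B C E
sorted_m = permute(orig, order)

print("    " + "  ".join(labels[k] for k in order))
for k, row in zip(order, sorted_m):
    print(labels[k] + "   " + "  ".join(str(v) for v in row))

# The A -> D tie keeps its value (3); it has only moved to a new cell.
a_to_d_kept = sorted_m[0][1] == orig[0][3]

# Permuting rows only breaks the structure: the diagonal is no longer
# made of self-ties, so non-zero values land on it.
rows_only = [orig[i] for i in order]
diagonal_rows_only = [rows_only[k][k] for k in range(len(order))]
print(a_to_d_kept, diagonal_rows_only)
```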
Complete Network Analysis
Network Connections: QAP
Procedure:
1. Calculate the observed correlation.
2. For K iterations:
   a) randomly sort the rows and columns (in the same order) of one of the matrices
   b) recalculate the correlation
   c) store the outcome
3. Compare the observed correlation to the distribution of
correlations created by the random permutations.
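The procedure can be sketched in a few lines. This is an illustration under stated assumptions, not UCINET's implementation: the two 6-node networks are invented toy data, the function names are hypothetical, and the two reported proportions are simply the share of permuted correlations at least as large, or at least as small, as the observed one.

```python
import random
from math import sqrt


def offdiag(m):
    """Off-diagonal cells of a square matrix as a flat vector."""
    n = len(m)
    return [m[i][j] for i in range(n) for j in range(n) if i != j]


def pearson(x, y):
    k = len(x)
    mx, my = sum(x) / k, sum(y) / k
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def permute(m, order):
    """Apply the same random order to rows and columns."""
    return [[m[i][j] for j in order] for i in order]


def qap_test(a, b, k=1000, seed=0):
    """Steps 1-3: observed r, K permuted r values, and tail proportions."""
    rng = random.Random(seed)
    observed = pearson(offdiag(a), offdiag(b))        # step 1
    draws = []
    for _ in range(k):                                # step 2
        order = list(range(len(a)))
        rng.shuffle(order)                            # 2a: random sort
        draws.append(pearson(offdiag(permute(a, order)), offdiag(b)))  # 2b, 2c
    eps = 1e-12                                       # guard against float ties
    p_large = sum(r >= observed - eps for r in draws) / k   # step 3
    p_small = sum(r <= observed + eps for r in draws) / k
    return observed, draws, p_large, p_small


def adjacency(edges, n):
    m = [[0] * n for _ in range(n)]
    for i, j in edges:
        m[i][j] = m[j][i] = 1
    return m


# Invented toy networks on 6 nodes (0-based ties), sharing three ties.
a = adjacency([(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)], 6)
b = adjacency([(0, 1), (1, 2), (2, 3), (0, 5), (3, 5)], 6)

observed, draws, p_large, p_small = qap_test(a, b, k=2000, seed=356)
print(round(observed, 3), p_large, p_small)
```

Only one matrix is permuted in each iteration; permuting both with independent orders would be equivalent but is unnecessary.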
Complete Network Analysis
Network Connections: QAP
This can be done simply in UCINET:

QAP MATRIX CORRELATION
--------------------------------------------------------------------------------
Observed matrix:      PadgBUS
Structure matrix:     PadgMAR
# of Permutations:    2500
Random seed:          356

Univariate statistics

                    1        2
              PadgBUS  PadgMAR
              -------  -------
 1     Mean     0.125    0.167
 2  Std Dev     0.331    0.373
 3      Sum    30.000   40.000
 4 Variance     0.109    0.139
 5      SSQ    30.000   40.000
 6    MCSSQ    26.250   33.333
 7 Euc Norm     5.477    6.325
 8  Minimum     0.000    0.000
 9  Maximum     1.000    1.000
10 N of Obs   240.000  240.000

Hubert's gamma: 16.000

Bivariate Statistics

                                  1         2         3         4         5         6         7
                              Value    Signif       Avg        SD  P(Large)  P(Small)     NPerm
                          --------- --------- --------- --------- --------- --------- ---------
1    Pearson Correlation:     0.372     0.000     0.001     0.092     0.000     1.000  2500.000
2        Simple Matching:     0.842     0.000     0.750     0.027     0.000     1.000  2500.000
3    Jaccard Coefficient:     0.296     0.000     0.079     0.046     0.000     1.000  2500.000
4  Goodman-Kruskal Gamma:     0.797     0.000    -0.064     0.382     0.000     1.000  2500.000
5       Hamming Distance:    38.000     0.000    59.908     5.581     1.000     0.000  2500.000
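Several of the observed values in this output can be recovered from four numbers it already reports. The sketch below is a small arithmetic check, assuming that for 0/1 matrices Hubert's gamma (the sum of cell-by-cell products) counts the cells where both matrices equal 1; the variable names are made up for the example.

```python
from math import sqrt

n_obs = 240    # N of Obs: ordered dyads among 16 families (16 * 15)
sum_bus = 30   # Sum for PadgBUS: cells equal to 1
sum_mar = 40   # Sum for PadgMAR: cells equal to 1
both = 16      # Hubert's gamma: cells where both matrices equal 1 (assumed reading)

either = sum_bus + sum_mar - both       # cells where at least one matrix has a tie
hamming = either - both                 # cells where the two matrices disagree
simple_matching = (n_obs - hamming) / n_obs
jaccard = both / either

# Pearson correlation for two 0/1 variables, from the same counts.
mean_bus, mean_mar = sum_bus / n_obs, sum_mar / n_obs
cov = both / n_obs - mean_bus * mean_mar
pearson_r = cov / sqrt(mean_bus * (1 - mean_bus) * mean_mar * (1 - mean_mar))

print(hamming, round(simple_matching, 3), round(jaccard, 3), round(pearson_r, 3))
```

The Avg and SD columns cannot be recovered this way; they summarize the 2500 permuted values.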
Complete Network Analysis
Network Connections: QAP
Using the same logic, we can estimate alternative models, such as
regression, logits, probits, etc. The only complication is that you need
to permute all of the independent matrices in the same way in each
iteration.
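A rough sketch of a QAP regression along these lines, not the UCINET routine: every independent matrix is re-sorted with the same random order in each iteration, the model is refitted, and each observed coefficient is compared with its permutation distribution. All data and names here are invented for illustration (x_dist, x_same, and a dependent matrix built as exactly 1 + 2*x_dist - 3*x_same so the fit can be checked by eye).

```python
import random


def offdiag(m):
    n = len(m)
    return [m[i][j] for i in range(n) for j in range(n) if i != j]


def permute(m, order):
    return [[m[i][j] for j in order] for i in order]


def solve(a, b):
    """Solve a small linear system by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(n):
            if r != c:
                f = m[r][c] / m[c][c]
                m[r] = [v - f * w for v, w in zip(m[r], m[c])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(y, xs):
    """Least-squares coefficients [intercept, b1, b2, ...] via normal equations."""
    cols = [[1.0] * len(y)] + xs
    xtx = [[sum(u * v for u, v in zip(ci, cj)) for cj in cols] for ci in cols]
    xty = [sum(u * v for u, v in zip(ci, y)) for ci in cols]
    return solve(xtx, xty)


def mrqap(y, xs, k=500, seed=0):
    rng = random.Random(seed)
    yv = offdiag(y)
    observed = ols(yv, [offdiag(x) for x in xs])
    as_large = [0] * len(observed)
    for _ in range(k):
        order = list(range(len(y)))
        rng.shuffle(order)
        # Same order for every independent matrix in this iteration.
        perm = ols(yv, [offdiag(permute(x, order)) for x in xs])
        for c, (p, o) in enumerate(zip(perm, observed)):
            as_large[c] += p >= o
    return observed, [count / k for count in as_large]


n = 5
group = [0, 0, 1, 1, 1]  # hypothetical node attribute
x_dist = [[abs(i - j) for j in range(n)] for i in range(n)]
x_same = [[int(group[i] == group[j]) for j in range(n)] for i in range(n)]
y = [[1 + 2 * x_dist[i][j] - 3 * x_same[i][j] for j in range(n)] for i in range(n)]

coefs, prop_as_large = mrqap(y, [x_dist, x_same], k=500, seed=995)
print([round(c, 6) for c in coefs], prop_as_large)
```

Because one order is shared across the independent matrices, their relation to each other is unchanged within an iteration; only their alignment with the dependent matrix is scrambled.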
Complete Network Analysis
Network Connections: QAP
Peer-influence results on similarity dyad model, using QAP

# of permutations:       2000
Diagonal valid?          NO
Random seed:             995
Dependent variable:      EX_SIM
Expected values:         C:\moody\Classes\soc884\examples\UCINET\mrqap-predicted
Independent variables:   EX_SSEX
                         EX_SRCE
                         EX_ADJ

Number of valid observations among the X variables = 72
N = 72
Number of permutations performed: 1999

MODEL FIT

R-square  Adj R-Sqr  Probability   # of Obs
--------  ---------  -----------  ---------
   0.289      0.269        0.059         72

REGRESSION COEFFICIENTS

              Un-stdized      Stdized                 Proportion   Proportion
Independent  Coefficient  Coefficient  Significance     As Large     As Small
-----------  -----------  -----------  ------------  -----------  -----------
Intercept       0.460139     0.000000         0.034        0.034        0.966
EX_SSEX        -0.073787    -0.170620         0.140        0.860        0.140
EX_SRCE        -0.020472    -0.047338         0.272        0.728        0.272
EX_ADJ         -0.239896    -0.536211         0.012        0.988        0.012