Diffusion & Visualization in Dynamic Networks
By
James Moody
Duke University
Thanks to Dan McFarland, Skye Bender-deMoll, Martina Morris, & the network modeling group at UW
& the Social Structure Reading Group at OSU. Supported by NIH grants DA12831 and HD41877
What is a network?
“We will refer to the presence of regular patterns in relationship as structure.”
- Wasserman & Faust p.3
As a description of the social network perspective:
2) “Relational ties … between actors are channels for transfer or flow of
resources”
4) “Network models conceptualize structure … as lasting patterns of
relations among actors.”
- Wasserman & Faust p.4
But how does a structural approach work when the patterns are transient?
When is a network?
Source: Bender-deMoll & McFarland “The Art and Science of Dynamic Network Visualization” JoSS Forthcoming
When is a network?
At the finest levels of aggregation networks disappear, but at the higher levels of
aggregation we mistake momentary events for long-lasting structure.
Is there a principled way to analyze and visualize networks where the edges are not
stable?
There is unlikely to be a single answer for all questions, but the set of types of
questions might be manageable:
• Diffusion and flow (networks as resources or constraints for actors):
  • The timing of relations affects flow in a way that changes many of our
    standard measures. If our interest is in “Relational ties [as] channels for
    transfer or flow of resources” (W&F p.4), then we can use the diffusion
    process to shape our analyses.
• Structural change (networks as dynamic objects of study):
  • The interest is in mapping changes in the topography of the network, to
    model how the field itself changes over time.
  • Ultimately, this has to be linked to questions about how network macrostructures emerge as the result of actor behavior rules.
Network Dynamics & Flow
The key element that makes a network a system is the path: it’s how sets of actors are
linked together indirectly.
A walk is a sequence of nodes and lines, starting and ending with nodes, in which
each node is incident with the lines following and preceding it in a sequence.
A path is a walk where all of the nodes and lines are distinct.
Paths are the routes through networks that make diffusion possible.
In a dynamic network, the timing of edges affects whether a good can flow across a
path. A good cannot pass along a relation that ends before the actor receives the
good: goods can only flow forward in time.
A time-ordered path exists between i and j if a graph-path from i to j can be identified
where the starting time for each edge step precedes the ending time for the next edge.
The notion of a time-ordered path must change our understanding of the system
structure of the network. Networks exist both in relation-space and time-space.
Note that this allows for non-intuitive non-transitivity. Consider this simple example:
A --[1-2]-- B --[3-4]-- C --[1-2]-- D   (labels give each relation's start and end time)
Here A can reach B, B can reach C, and C can reach D.
But A cannot reach D, since any flow from A to C would have happened after the
relation between C and D ended.
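The non-transitivity in this example can be checked mechanically. What follows is a minimal sketch, not the author's code. It assumes ties are undirected, that a good crosses an active tie instantly, and that a tie can carry the good as long as it has not ended before the sender holds the good. The function names `earliest_arrival` and `can_reach` are hypothetical.

```python
import math

# Each tie is (node_u, node_v, start_day, end_day), taken from the example above.
EDGES = [("A", "B", 1, 2), ("B", "C", 3, 4), ("C", "D", 1, 2)]

def earliest_arrival(edges, source):
    """Earliest day each reachable node could hold a good started at `source`."""
    arrival = {source: -math.inf}  # the source holds the good from the outset
    changed = True
    while changed:  # relax ties until no arrival day improves
        changed = False
        for u, v, start, end in edges:
            for sender, receiver in ((u, v), (v, u)):
                # A tie that ended before the sender got the good cannot carry it.
                if sender in arrival and arrival[sender] <= end:
                    day = max(arrival[sender], start)
                    if day < arrival.get(receiver, math.inf):
                        arrival[receiver] = day
                        changed = True
    return arrival

def can_reach(edges, i, j):
    """True if a time-ordered path leads from i to j."""
    return j in earliest_arrival(edges, i)

for i, j in [("A", "B"), ("B", "C"), ("C", "D"), ("A", "D")]:
    print(i, "->", j, can_reach(EDGES, i, j))
```

Each adjacent pair is reachable, and A can reach C on day 3, but by then the C-D relation is over, so A never reaches D.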
Network Dynamics & Flow
This can also introduce a new dimension for “shortest” paths:
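[Figure: example network on nodes A, B, C, D, and E with timed edges; only the B-C label (3-4) survives in this transcript.]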
The geodesic from A to D is AE, ED and is two steps long.
But the fastest path would be AB, BC, CD, which while 3 steps long
could get there by day 5 compared to day 7.
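The contrast between the geodesic and the fastest path can be sketched as follows. Only the B-C label (3-4) is taken from the slide; every other start and end day below is an assumption chosen so that the two routes arrive on day 5 and day 7 as described. The function names are hypothetical, and ties are treated as undirected with instant transfer.

```python
import math
from collections import deque

EDGES = [
    ("A", "B", 1, 2),  # assumed timing
    ("B", "C", 3, 4),  # timing shown on the slide
    ("C", "D", 5, 6),  # assumed timing
    ("A", "E", 1, 2),  # assumed timing
    ("E", "D", 7, 8),  # assumed timing
]

def trace(parent, target):
    """Follow parent pointers back from target and return the path in order."""
    path = []
    node = target
    while node is not None:
        path.append(node)
        node = parent[node]
    return path[::-1]

def static_geodesic(edges, source, target):
    """Fewest-step path ignoring timing (plain breadth-first search)."""
    neighbors = {}
    for u, v, _, _ in edges:
        neighbors.setdefault(u, []).append(v)
        neighbors.setdefault(v, []).append(u)
    parent = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in neighbors.get(node, []):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return trace(parent, target) if target in parent else None

def fastest_path(edges, source, target):
    """Earliest-arrival time-ordered path, returned as (arrival_day, path)."""
    arrival = {source: -math.inf}
    parent = {source: None}
    changed = True
    while changed:
        changed = False
        for u, v, start, end in edges:
            for sender, receiver in ((u, v), (v, u)):
                if sender in arrival and arrival[sender] <= end:
                    day = max(arrival[sender], start)
                    if day < arrival.get(receiver, math.inf):
                        arrival[receiver] = day
                        parent[receiver] = sender
                        changed = True
    if target not in arrival:
        return None
    return arrival[target], trace(parent, target)

print(static_geodesic(EDGES, "A", "D"))  # two steps, through E
print(fastest_path(EDGES, "A", "D"))     # three steps, but arrives earlier
```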
Network Dynamics & Flow
[Figure: Direct Contact Network of 8 people in a ring]
Network Dynamics & Flow
[Figure: Implied Contact Network of 8 people in a ring. All relations concurrent.]
Network Dynamics & Flow
[Figure: Implied Contact Network of 8 people in a ring. Mixed concurrent. Reachability = 0.57]
Network Dynamics & Flow
[Figure: Implied Contact Network of 8 people in a ring. Serial Monogamy (1). Reachability = 0.71]
Network Dynamics & Flow
[Figure: Implied Contact Network of 8 people in a ring. Serial Monogamy (2). Reachability = 0.51]
Network Dynamics & Flow
[Figure: Minimum Contact Network of 8 people in a ring, with edges labeled t1 or t2. Serial Monogamy (3). Reachability = 0.43]
Network Dynamics & Flow
In this graph, timing alone can change mean reachability from 2.0 when all ties are
concurrent to 0.43: a factor of ~4.7.
In general, ignoring time order is equivalent to assuming all relations occur
simultaneously; that is, it assumes perfect concurrency across all relations.
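The effect of timing on the ring can be illustrated with a small sketch. It is not the computation behind the slides: it measures the share of other nodes each node can reach, and it assumes the minimum-contact case alternates two time periods (t1, t2) around the ring. Under those assumptions the alternating ring gives 3/7, or about 0.43, in line with the 0.43 quoted above. The 2.0 quoted for the all-concurrent case is evidently on a different scale and is not reproduced here. Function names are hypothetical.

```python
import math

def earliest_arrival(edges, source):
    """Earliest time each reachable node could hold a good started at `source`."""
    arrival = {source: -math.inf}
    changed = True
    while changed:
        changed = False
        for u, v, start, end in edges:
            for sender, receiver in ((u, v), (v, u)):
                if sender in arrival and arrival[sender] <= end:
                    t = max(arrival[sender], start)
                    if t < arrival.get(receiver, math.inf):
                        arrival[receiver] = t
                        changed = True
    return arrival

def mean_reach_share(edges, nodes):
    """Average share of the other nodes that each node can reach over time."""
    n = len(nodes)
    reached = sum(len(earliest_arrival(edges, i)) - 1 for i in nodes)
    return reached / (n * (n - 1))

nodes = list(range(8))
ring = [(i, (i + 1) % 8) for i in nodes]

# Every tie active over the same interval.
concurrent = [(u, v, 1, 2) for u, v in ring]
# Assumed pattern: ties alternate between period t1 = 1 and period t2 = 2.
alternating = [(u, v, 1, 1) if u % 2 == 0 else (u, v, 2, 2) for u, v in ring]

print(mean_reach_share(concurrent, nodes))
print(round(mean_reach_share(alternating, nodes), 2))
```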
Network Dynamics & Flow
The distribution of paths is important for many of the measures we typically construct
on networks, and these will change if timing is taken into consideration:
Centrality:
Closeness centrality
Path Centrality
Information Centrality
Betweenness centrality
Network Topography
Clustering
Path Distance
Groups & Roles:
Correspondence between degree-based position and reach-based position
Structural Cohesion & Embeddedness
Opportunities for Time-based block-models (similar reachability profiles)
In general, any measures that take the systems nature of the graph into account will
differ.
Network Dynamics & Flow
New versions of classic reachability measures:
1) Temporal reach: The ij cell = 1 if i can reach j through time.
2) Temporal geodesic: The ij cell equals the number of steps in the shortest path
linking i to j over time.
3) Temporal paths: The ij cell equals the number of time-ordered paths linking i to j.
These will only equal the standard versions when all ties are concurrent.
Duration explicit measures
4) Quickest path: The ij cell equals the shortest time within which i could reach j.
5) Earliest path: The ij cell equals the real-clock time when i could first reach j.
6) Latest path: The ij cell equals the real-clock time when i could last reach j.
7) Exposure duration: The ij cell equals the longest (shortest) interval of time over
which i could transfer a good to j.
Each of these also implies different types of “betweenness” roles for nodes or edges, such
as a “limiting time” edge, which would be the edge whose comparatively short
duration places the greatest limits on other paths.
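As an illustration of measures 1 and 2, the sketch below computes temporal reach and the temporal geodesic from one source. It is a hedged example under assumptions of my own: ties are undirected, transfer is instant, and a tie can carry the good if it has not ended before the sender holds it. The edge list is hypothetical, built so that the static geodesic from A to D (two steps through X) is not time-ordered.

```python
import math

# Hypothetical ties: (node_u, node_v, start, end).
EDGES = [
    ("A", "X", 5, 6), ("X", "D", 1, 2),                    # 2 steps, wrong time order
    ("A", "B", 1, 2), ("B", "C", 3, 4), ("C", "D", 5, 6),  # 3 steps, time-ordered
]

def temporal_geodesics(edges, source):
    """Fewest steps on any time-ordered path from `source` to each reachable node."""
    arrival = {source: -math.inf}  # earliest arrival using at most k steps
    steps = {source: 0}
    k = 0
    while True:
        k += 1
        updated = dict(arrival)
        for u, v, start, end in edges:
            for sender, receiver in ((u, v), (v, u)):
                # Read from `arrival` (previous round) so each round adds one step.
                if sender in arrival and arrival[sender] <= end:
                    t = max(arrival[sender], start)
                    if t < updated.get(receiver, math.inf):
                        updated[receiver] = t
        if updated == arrival:
            return steps
        for node in updated:
            steps.setdefault(node, k)  # first round in which the node is reached
        arrival = updated

steps_from_a = temporal_geodesics(EDGES, "A")
temporal_reach = {j: 1 for j in steps_from_a if j != "A"}  # row A of the reach matrix
print(steps_from_a)
```

Here D is two steps from A in the static graph but three steps away once time order is respected.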
Network Dynamics & Flow
Define time-dependent closeness as the inverse of the sum of the
distances needed for an actor to reach others in the network.*
$$C_{TDCloseness} = \frac{1}{\sum_{j} D^{T}_{ij}}$$
Actors with high time-dependent closeness centrality are
those that can reach others in few steps. Note that this is directed,
since $D_{ij} \neq D_{ji}$ (in most cases) once you take time into
account.
*If i cannot reach j, I set the distance to n+1.
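A minimal sketch of time-dependent closeness follows, using the four-node chain A, B, C, D from the earlier example. It assumes undirected ties, instant transfer, and distances counted as the fewest steps on a time-ordered path, with n+1 for unreachable pairs as in the footnote. Function names are hypothetical and this is not the author's implementation.

```python
import math

EDGES = [("A", "B", 1, 2), ("B", "C", 3, 4), ("C", "D", 1, 2)]
NODES = ["A", "B", "C", "D"]

def temporal_geodesics(edges, source):
    """Fewest steps on any time-ordered path from `source` to each reachable node."""
    arrival = {source: -math.inf}
    steps = {source: 0}
    k = 0
    while True:
        k += 1
        updated = dict(arrival)
        for u, v, start, end in edges:
            for sender, receiver in ((u, v), (v, u)):
                if sender in arrival and arrival[sender] <= end:
                    t = max(arrival[sender], start)
                    if t < updated.get(receiver, math.inf):
                        updated[receiver] = t
        if updated == arrival:
            return steps
        for node in updated:
            steps.setdefault(node, k)
        arrival = updated

def td_closeness(edges, nodes, i):
    """Inverse of summed temporal distances; unreachable nodes count as n + 1."""
    steps = temporal_geodesics(edges, i)
    n = len(nodes)
    total = sum(steps.get(j, n + 1) for j in nodes if j != i)
    return 1 / total

for i in NODES:
    print(i, td_closeness(EDGES, NODES, i))
```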
Network Dynamics & Flow
Define fastness centrality as the average of the clock-time needed
for an actor to reach others in the network:
$$C_{fast} = \frac{1}{N-1} \sum_{j} \left( \max(time) - time_{ij} \right)$$
Actors with high fastness centrality are those that would
reach the most people early. These are likely important for
any “first mover” problem.
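Fastness can be sketched in the same way. The example below rests on several assumptions that the slide does not state: max(time) is taken as the latest end time in the data, time_ij is the earliest arrival time from i to j, and an unreachable j contributes zero. The edge timings are hypothetical apart from the B-C label (3-4), and the function names are my own.

```python
import math

# Hypothetical ties: (node_u, node_v, start_day, end_day).
EDGES = [
    ("A", "B", 1, 2), ("B", "C", 3, 4), ("C", "D", 5, 6),
    ("A", "E", 1, 2), ("E", "D", 7, 8),
]
NODES = ["A", "B", "C", "D", "E"]

def earliest_arrival(edges, source):
    """Earliest day each reachable node could hold a good started at `source`."""
    arrival = {source: -math.inf}
    changed = True
    while changed:
        changed = False
        for u, v, start, end in edges:
            for sender, receiver in ((u, v), (v, u)):
                if sender in arrival and arrival[sender] <= end:
                    day = max(arrival[sender], start)
                    if day < arrival.get(receiver, math.inf):
                        arrival[receiver] = day
                        changed = True
    return arrival

def fastness(edges, nodes, i):
    """Average of (latest end day - earliest arrival day) over the other nodes."""
    horizon = max(end for _, _, _, end in edges)  # assumed meaning of max(time)
    arrival = earliest_arrival(edges, i)
    # Unreachable nodes add nothing (an assumption, not stated on the slide).
    total = sum(horizon - arrival[j] for j in nodes if j != i and j in arrival)
    return total / (len(nodes) - 1)

print(fastness(EDGES, NODES, "A"))  # reaches everyone, and early
print(fastness(EDGES, NODES, "D"))  # reaches only C and E, and late
```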
Network Dynamics & Flow
Define quickness centrality as the average of the minimum
amount of time needed for an actor to reach others in the network:
$$C_{quick} = \frac{1}{N-1} \sum_{j} \min\left( T_{jit} - T_{it} \right)$$
Where $T_{jit}$ is the time that j receives the good sent by i at time t, and $T_{it}$ is
the time that i sent the good. This then represents the shortest duration
between transmission and receipt between i and j.
Note that this is a time-dependent feature, depending on when i
“transmits” the good out into the population. Note min is one of many
functions, since the time-to-target speed is really a profile over the
duration of t.
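The dependence on the send time t can be made concrete with a short sketch. It is illustrative only and makes assumptions beyond the slide: send times are tried only at the observed start and end days, transfer across an active tie is instant, and pairs that are never reachable are left out of the average. Function names are hypothetical.

```python
import math

EDGES = [("A", "B", 1, 2), ("B", "C", 3, 4), ("C", "D", 1, 2)]
NODES = ["A", "B", "C", "D"]

def earliest_arrival(edges, source, send_time):
    """Earliest day each node could receive a good that `source` sends at send_time."""
    arrival = {source: send_time}
    changed = True
    while changed:
        changed = False
        for u, v, start, end in edges:
            for sender, receiver in ((u, v), (v, u)):
                if sender in arrival and arrival[sender] <= end:
                    day = max(arrival[sender], start)
                    if day < arrival.get(receiver, math.inf):
                        arrival[receiver] = day
                        changed = True
    return arrival

def quickest(edges, i, j):
    """Smallest gap between sending and receipt over the candidate send days."""
    send_days = sorted({t for _, _, s, e in edges for t in (s, e)})
    gaps = []
    for day in send_days:
        arrival = earliest_arrival(edges, i, day)
        if j in arrival:
            gaps.append(arrival[j] - day)
    return min(gaps) if gaps else None

def quickness(edges, nodes, i):
    """Mean quickest gap over the nodes i can reach (None if it reaches nobody)."""
    gaps = [quickest(edges, i, j) for j in nodes if j != i]
    gaps = [g for g in gaps if g is not None]
    return sum(gaps) / len(gaps) if gaps else None

print(quickest(EDGES, "A", "C"))  # sending on day 2 beats sending on day 1
print(quickness(EDGES, NODES, "A"))
```

Replacing `min` with another summary of the gaps (mean, maximum) gives other points on the same time-to-target profile.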
Network Dynamics & Flow
Define exposure centrality as the average of the amount of time
that actor j is at risk to a good introduced by actor i.
$$C_{exposure} = \frac{1}{N-1} \sum_{j} \left( T_{ijl} - T_{ijf} \right)$$
Where $T_{ijl}$ is the last time that j could receive the good from i
and $T_{ijf}$ is the first time that j could receive the good from i,
so the difference is the interval in time when j is at risk from
i.
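One way to operationalize the first and last receipt times is sketched below. This is an interpretation, not the author's procedure. It assumes that a node keeps the good once it has it, that j can receive it from any neighbor who holds it before their tie ends, that routes passing through j itself are ignored, and that unreachable pairs contribute zero. Function names are hypothetical.

```python
import math

EDGES = [("A", "B", 1, 2), ("B", "C", 3, 4), ("C", "D", 1, 2)]
NODES = ["A", "B", "C", "D"]

def earliest_arrival(edges, source):
    """Earliest day each reachable node could hold a good started at `source`."""
    arrival = {source: -math.inf}
    changed = True
    while changed:
        changed = False
        for u, v, start, end in edges:
            for sender, receiver in ((u, v), (v, u)):
                if sender in arrival and arrival[sender] <= end:
                    day = max(arrival[sender], start)
                    if day < arrival.get(receiver, math.inf):
                        arrival[receiver] = day
                        changed = True
    return arrival

def exposure_window(edges, i, j):
    """(first, last) day on which j could receive i's good, or None if never."""
    # Who holds the good, and since when, without routing through j.
    without_j = [e for e in edges if j not in (e[0], e[1])]
    holders = earliest_arrival(without_j, i)
    first, last = math.inf, -math.inf
    for u, v, start, end in edges:
        if j not in (u, v):
            continue
        neighbor = v if u == j else u
        if neighbor in holders and holders[neighbor] <= end:
            first = min(first, max(holders[neighbor], start))
            last = max(last, end)
    return None if first == math.inf else (first, last)

def exposure(edges, nodes, i):
    """Average length of the exposure window over the other nodes."""
    total = 0
    for j in nodes:
        if j == i:
            continue
        window = exposure_window(edges, i, j)
        if window is not None:
            total += window[1] - window[0]
    return total / (len(nodes) - 1)

print(exposure_window(EDGES, "A", "C"))
print(exposure(EDGES, NODES, "A"))
```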
Network Dynamics & Flow
How do these centrality scores compare?
Here I compare the duration-dependent measures to the standard measures
on this example graph.
Based only on the structure of the ties, not the timing, the most central nodes are
nodes 13, 16 and 4.
Since this is a simulation, I permute the observed time ranges on this graph to
test the general relation between the fixed and temporal measures.
Network Dynamics & Flow
Box plots based on 500 permutations of the observed time durations. This holds constant
the duration distribution and the number of edges active at any given time.
Network Dynamics & Flow
Box plots based on 500 separate permutations of the start and end times. This changes
the duration distribution and the number of edges active at any given time.
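The two permutation schemes can be sketched as follows. This is a hedged illustration with a hypothetical edge list and hypothetical function names. The slides do not say how a shuffled start that lands after its shuffled end is handled, so the second function simply reorders each pair, which is an assumption of mine.

```python
import random

# Hypothetical ties: (node_u, node_v, start_day, end_day).
EDGES = [(13, 8, 10, 30), (13, 5, 20, 40), (5, 4, 50, 60), (10, 4, 40, 72)]

def permute_ranges(edges, rng):
    """Shuffle whole (start, end) ranges across ties; the contact structure stays fixed.

    The multiset of durations is unchanged.
    """
    ranges = [(s, e) for _, _, s, e in edges]
    rng.shuffle(ranges)
    return [(u, v, s, e) for (u, v, _, _), (s, e) in zip(edges, ranges)]

def permute_endpoints(edges, rng):
    """Shuffle start days and end days separately, so durations change."""
    starts = [s for _, _, s, _ in edges]
    ends = [e for _, _, _, e in edges]
    rng.shuffle(starts)
    rng.shuffle(ends)
    # Assumption: if a shuffled start falls after its end, swap the two.
    return [
        (u, v, min(s, e), max(s, e))
        for (u, v, _, _), s, e in zip(edges, starts, ends)
    ]

rng = random.Random(2024)  # fixed seed so the sketch is repeatable
same_durations = permute_ranges(EDGES, rng)
new_durations = permute_endpoints(EDGES, rng)
print(same_durations)
print(new_durations)
```

Repeating either function 500 times and recomputing the temporal measures on each draw would give distributions like those summarized in the box plots.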
Network Dynamics & Flow
How do these centrality scores compare?
The “most important actors” in the graph depend crucially on when they are
active. The correlations can range wildly over the exact same contact
structure.
The “centrality” scores described here are low-hanging fruit: simple
extensions of graph-based ideas.
But the crucial features for population interests will be creating aggregations
of these features – something like “centralization” that captures the
regularity, asymmetry and temporal role-structure of the network.
Network Dynamics & Flow
How can we visualize such graphs?
Animation of the edges, when the graph is sparse, helps us see the emergence of the graph, but
diffusion paths are difficult to see:
Consider an example:
Romantic Relations at
“Jefferson” high school
Network Dynamics & Flow
How can we visualize such graphs?
Animation of the edges, even when the graph is sparse, does not typically help us see the
potential flow space, as it’s just too hard to follow the implication paths with our eyes, so it
seems better to plot the implied paths directly.
Consider an example:
Plotting the reachability
matrix can be informative if
the graph has clear pockets of
reachability:
(Good readability example)
Network Dynamics & Flow
Edges have discrete start and
end times, tagged as days over
a 2-year window: so first
contact between nodes 10 and
4 was on day 40, last contact
on day 72.
Network Dynamics & Flow
Here we plot the reachability matrix over the coordinates for the direct network.
Direct ties are retained as green lines. If node i can reach node j, then a
directed arrow joins the two nodes. Here I mark cases where two nodes can reach
each other with red, and purely asymmetric cases with blue.
This is accurate, but hard to read when reachability paths are long.
(Poor readability example)
Network Dynamics & Flow
Various weightings of the indirect paths also don’t help in an example like this
one. Here I weight the edges of the reachability graph as 1/d, and plot using the
Fruchterman-Reingold (FR) layout. You get some sense of which nodes reach many
others (size is proportional to out-reach).
Here you really miss the asymmetry in reach (the correlation between number
reached and number reached by is nearly 0).
Network Dynamics & Flow
How can we visualize such graphs?
Another tack is to shift our attention from nodes to edges, by plotting the line graph (thanks to
Scott Feld for making this suggestion). The idea is to identify an ordering to the vertical
dimension of the graph to capture the flow through the network.
Consider an example:
So now we:
1) Convert every edge to a node
2) Draw a directed arc between
edges that (a) share a node and
(b) precede each other in time.
3) Concurrent edges (such as {13-8 and 13-5} or {1-16, 2-16}) will be
connected with a bi-directed edge (they will form completely connected
cliques), while the remainder of the graph will be asymmetric and ordered
in time.
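The line-graph construction can be sketched in a few lines. This is an illustration under an assumed reading of rule (b): an arc runs from edge e to edge f when they share a node and e starts no later than f ends, so overlapping (concurrent) edges get arcs in both directions. Only the 10-4 timing (days 40 to 72) comes from the slides. The other start and end days, and the function name, are hypothetical.

```python
# Ties are (node_u, node_v, start_day, end_day).
EDGES = [
    (13, 8, 10, 30),  # assumed timing
    (13, 5, 20, 40),  # assumed timing, overlaps with 13-8
    (5, 4, 50, 60),   # assumed timing
    (10, 4, 40, 72),  # timing from the slides
]

def line_graph_arcs(edges):
    """Directed arcs between ties that share a node and are ordered in time."""
    arcs = set()
    for a, (u1, v1, start1, _) in enumerate(edges):
        for b, (u2, v2, _, end2) in enumerate(edges):
            if a == b:
                continue
            shares_node = bool({u1, v1} & {u2, v2})
            # Edge a can feed edge b if a has started by the time b ends.
            if shares_node and start1 <= end2:
                arcs.add(((u1, v1), (u2, v2)))
    return arcs

for arc in sorted(line_graph_arcs(EDGES)):
    print(arc)
```

With these timings, 13-8 and 13-5 are joined in both directions, as are 5-4 and 10-4, while 13-5 points to 5-4 but not the reverse.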
Network Dynamics & Flow
Further complications that ultimately link us back to the question of
“When is a network?”
1) Range of temporal activity
- When the graph is globally sparse (like the example above), the
path-structure will also be sparse. Increasing density will lead to
lots of repeated interactions, and thus reachability cycles.
- Consider email exchange networks or classroom communication
networks vs. sexual networks. In sexual or romantic networks,
returning to a partner once the relation has ended is rare; in
communication networks it is common.
2) Observed vs. Real
- We will often have discrete observations of real-time processes.
How do we account for between-wave temporal ordering? What
are the limits of observed measures to such inter-wave activity?
- The Snijders et al. SIENA modeling approach is an obvious first
step here.
3) Temporal reachability as higher-order model feature
- As the capacity of ERGM models continues to expand, we can start
to build temporal sequence rules into the local models (such as
communication triplets, or avoidance of past relations once ended),
which then makes it sensible to ask whether the models fit the
time-structure of the data.
4) Optimal observation windows
Either for data collection or visualization, we often have to decide on a
time-range for our analyses. What should that range be?
5) Relational temporal asymmetry. For many types of relations, it is
difficult to decide when relations end. This taps a distinction between
activated and potential relations.
