
Measurement Sensitivity

A reasonable approach to assessing the effect of measurement error on the ties in a network is to ask how the network measures would change if the ties differed from those observed. This question can be answered simply with Monte Carlo simulations on the observed network. Thus, the procedure I propose is to:
• Generate a probability matrix from the set of observed ties,
• Generate many realizations of the network based on these underlying probabilities, and
• Compare the distribution of generated statistics to those observed in the data.

• How do we set p_ij?
  • Range based on observed features (sensitivity analysis)
  • Outcome of a model based on observed patterns (ERGM)
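The three-step Monte Carlo procedure can be sketched in a few lines. This is a minimal illustration, not the author's implementation: the rule for setting p_ij (observed ties kept with probability `p_keep`, absent ties added with probability `p_add`), the toy network, and all function names are assumptions made for the example.

```python
import random

def density(adj):
    """Share of ordered dyads (i != j) that carry a tie."""
    n = len(adj)
    ties = sum(adj[i][j] for i in range(n) for j in range(n) if i != j)
    return ties / (n * (n - 1))

def probability_matrix(adj, p_keep=0.9, p_add=0.05):
    """Step 1 (assumed rule): observed ties persist with p_keep,
    unobserved ties appear with p_add; the diagonal stays at 0."""
    n = len(adj)
    return [[0.0 if i == j else (p_keep if adj[i][j] else p_add)
             for j in range(n)] for i in range(n)]

def draw_network(prob, rng):
    """Step 2: one realization, each dyad an independent Bernoulli draw."""
    n = len(prob)
    return [[1 if rng.random() < prob[i][j] else 0 for j in range(n)]
            for i in range(n)]

def simulate_statistic(adj, statistic, n_draws=1000, seed=1):
    """Step 3: distribution of a statistic over many realizations."""
    rng = random.Random(seed)
    prob = probability_matrix(adj)
    return [statistic(draw_network(prob, rng)) for _ in range(n_draws)]

# Toy observed network (hypothetical data)
observed = [[0, 1, 1, 0],
            [1, 0, 0, 0],
            [0, 1, 0, 1],
            [0, 0, 1, 0]]
draws = simulate_statistic(observed, density)
print("observed density:", density(observed))
print("mean simulated density:", round(sum(draws) / len(draws), 3))
```

Any other network statistic (centralization, reciprocity, and so on) can be passed in place of `density`; the comparison of interest is where the observed value falls in the simulated distribution.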

Measurement Sensitivity

As an example, consider the problem of defining “friendship” ties in high schools.

Should we count nominations that are not reciprocated?

[Figure: the same network drawn with all ties and with reciprocated ties only]

Statistical Analysis of Social Networks
Comparing multiple networks: QAP (review)

The substantive question is how one set of relations (or dyadic attributes) relates to another. For example:
• Do marriage ties correlate with business ties in the Medici family network?
• Are friendship relations correlated with joint membership in a club?
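As a reminder of the mechanics, a QAP test correlates the off-diagonal cells of two matrices and then builds a reference distribution by permuting the rows and columns of one matrix together. The sketch below assumes a one-sided test on the Pearson correlation; the two 5-node matrices are made-up toy data, not the Medici network.

```python
import random

def offdiag_pairs(a, b):
    """Pair up the off-diagonal cells of two square matrices."""
    n = len(a)
    return [(a[i][j], b[i][j]) for i in range(n) for j in range(n) if i != j]

def correlation(pairs):
    """Pearson correlation of a list of (x, y) pairs."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    return sxy / (sxx * syy) ** 0.5

def qap_test(a, b, n_perm=2000, seed=1):
    """Observed correlation and the share of node relabelings of `a`
    whose correlation with `b` is at least as large."""
    rng = random.Random(seed)
    n = len(a)
    observed = correlation(offdiag_pairs(a, b))
    at_least = 0
    for _ in range(n_perm):
        perm = list(range(n))
        rng.shuffle(perm)
        # Permute rows and columns together so the structure of `a` is kept
        a_perm = [[a[perm[i]][perm[j]] for j in range(n)] for i in range(n)]
        if correlation(offdiag_pairs(a_perm, b)) >= observed:
            at_least += 1
    return observed, at_least / n_perm

# Hypothetical toy relations on five nodes
marriage = [[0, 1, 0, 0, 1],
            [1, 0, 1, 0, 0],
            [0, 1, 0, 1, 0],
            [0, 0, 1, 0, 0],
            [1, 0, 0, 0, 0]]
business = [[0, 1, 0, 0, 1],
            [1, 0, 0, 0, 0],
            [0, 0, 0, 1, 0],
            [0, 0, 1, 0, 1],
            [1, 0, 0, 1, 0]]
r, p = qap_test(marriage, business)
print("correlation:", round(r, 3), "permutation p-value:", p)
```

Permuting whole nodes rather than individual cells is the design choice that matters here: it preserves the dependence among cells within each matrix while breaking the link between the two matrices.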

Modeling Social Networks parametrically: ERGM approaches

The earliest approaches are based on simple random graph theory, but there has been a flurry of activity in the last 10 years or so.

Key historical references:
- Holland and Leinhardt (1981), JASA
- Frank and Strauss (1986), JASA
- Wasserman and Faust (1994), Chap. 15 & 16
- Wasserman and Pattison (1996)

Good practical overview: http://www.jstatsoft.org/v24
Great tutorial (last year’s Sunbelt): http://statnet.csde.washington.edu/workshops/SUNBELT/EUSN/ergm/ergm_tutorial.html
Or (lots of how-to slides): https://statnet.csde.washington.edu/trac/wiki/Sunbelt2014

Modeling Social Networks parametrically: ERGM approaches

The “p1” model of Holland and Leinhardt is the classic foundation. The basic idea is that you can generate a statistical model of the network by predicting the counts of types of ties (asymmetric, null, symmetric). They formulate a log-linear model for these counts, but the model is equivalent to a logit model on the dyads:

\[ \operatorname{logit} P(X_{ij} = 1) = \alpha_i + \beta_j + \gamma X_{ji} \]

Note the subscripts! This implies a distinct parameter for every node i and j in the model, plus one for reciprocity.

Modeling Social Networks parametrically: ERGM approaches
[Figure: results from the SAS version on PROSPER datasets]

Modeling Social Networks parametrically: ERGM approaches

Once you know the basic model format, you can imagine other specifications:

\[ \operatorname{logit} P(X_{ij} = 1) = \alpha_i + \beta_j + \gamma X_{ji} \quad \text{(orig)} \]
\[ \operatorname{logit} P(X_{ij} = 1) = \alpha_i + \beta_j + \gamma_i X_{ji} \quad \text{(differential reciprocity)} \]
\[ \operatorname{logit} P(X_{ij} = 1) = \theta + \delta\,(\text{node chars}) + \gamma X_{ji} \quad \text{(node chars)} \]

The key is to ensure that the specification doesn’t imply a linear dependency of terms. Model fit is hard to judge; newer work shows that the standard errors are “approximate” ;-)

Modeling Social Networks parametrically: ERGM approaches

\[ P(X = x) = \frac{\exp\{\theta' z(x)\}}{\kappa(\theta)} \]

Where:
θ is a vector of parameters (like regression coefficients),
z(x) is a vector of network statistics, conditioning the graph, and
κ(θ) is a normalizing constant, to ensure the probabilities sum to 1.

Modeling Social Networks parametrically: ERGM approaches

The simplest graph is a Bernoulli random graph, where each X_ij is independent:

\[ P(X = x) = \frac{\exp\{\sum_{i,j} \theta_{ij} x_{ij}\}}{\kappa(\theta)} \]

Where:
\[ \theta_{ij} = \operatorname{logit}[P(X_{ij} = 1)] \]
\[ \kappa(\theta) = \prod_{i,j} [1 + \exp(\theta_{ij})] \]

Note this is one of the few cases where κ(θ) can be written out explicitly.

Modeling Social Networks parametrically: ERGM approaches

Typically, we add a homogeneity condition, so that all isomorphic graphs are equally likely. The homogeneous Bernoulli graph model:

\[ P(X = x) = \frac{\exp\{\theta \sum_{i,j} x_{ij}\}}{\kappa(\theta)} \]

Where:
\[ \kappa(\theta) = [1 + \exp(\theta)]^{g} \]
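The closed form for the homogeneous Bernoulli normalizing constant can be checked by brute force on a tiny graph. The sketch below sums exp{θ · (number of ties)} over every directed graph on three nodes and compares the total with [1 + exp(θ)] raised to the number of ordered dyads. Reading the exponent as the count of ordered dyads is an assumption about what the slide's g denotes; the function names are illustrative.

```python
import itertools
import math

def kappa_by_enumeration(theta, n_nodes):
    """Sum exp(theta * ties) over all directed graphs on n_nodes nodes."""
    n_dyads = n_nodes * (n_nodes - 1)  # ordered pairs, no self-ties
    total = 0.0
    for x in itertools.product((0, 1), repeat=n_dyads):
        total += math.exp(theta * sum(x))
    return total

def kappa_closed_form(theta, n_nodes):
    """[1 + exp(theta)] ** (number of ordered dyads)."""
    n_dyads = n_nodes * (n_nodes - 1)
    return (1.0 + math.exp(theta)) ** n_dyads

theta = -0.8  # arbitrary illustrative value
print(kappa_by_enumeration(theta, 3))
print(kappa_closed_form(theta, 3))
```

The enumeration already needs 64 terms for three nodes and 4,096 for four, which is why the constant becomes unusable once the model includes statistics that do not factor dyad by dyad.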

Modeling Social Networks parametrically: ERGM approaches

If we want to condition on anything much more complicated than density, the normalizing constant ends up being a problem. We need a way to express the probability of the graph that doesn’t depend on that constant. First some terms:

\( X_{ij}^{+} \): sociomatrix with the ij element forced to 1
\( X_{ij}^{-} \): sociomatrix with the ij element forced to 0
\( X_{ij}^{c} \): sociomatrix with no tie between i and j

Modeling Social Networks parametrically: ERGM approaches

\[ \exp(\omega_{ij}) = \frac{P(X_{ij} = 1 \mid X_{ij}^{c})}{P(X_{ij} = 0 \mid X_{ij}^{c})} \]

\[ \frac{P(X_{ij} = 1 \mid X_{ij}^{c})}{P(X_{ij} = 0 \mid X_{ij}^{c})} = \frac{\exp\{\theta' z(x_{ij}^{+})\}}{\exp\{\theta' z(x_{ij}^{-})\}} = \exp\{\theta' [z(x_{ij}^{+}) - z(x_{ij}^{-})]\} \]

\[ \log\left[\frac{P(X_{ij} = 1 \mid X_{ij}^{c})}{P(X_{ij} = 0 \mid X_{ij}^{c})}\right] = \theta' [z(x_{ij}^{+}) - z(x_{ij}^{-})] \]

Modeling Social Networks parametrically: ERGM approaches

\[ \log\left[\frac{P(X_{ij} = 1 \mid X_{ij}^{c})}{P(X_{ij} = 0 \mid X_{ij}^{c})}\right] = \theta' [z(x_{ij}^{+}) - z(x_{ij}^{-})] \]

Note that we can now model the conditional probability of the graph, as a function of a set of difference statistics, without reference to the normalizing constant. The model, then, simply reduces to a logit model on the dyads.

Modeling Social Networks parametrically: ERGM approaches

\[ \log\left[\frac{P(X_{ij} = 1 \mid X_{ij}^{c})}{P(X_{ij} = 0 \mid X_{ij}^{c})}\right] = \theta' [z(x_{ij}^{+}) - z(x_{ij}^{-})] \]

Consider the simplest possible model: the Bernoulli random graph model, which says the only feature of interest is the number of edges in the graph. What is the change statistic for that feature?

\[ x_{ij}^{+} = 1 \quad \text{(assume edge is present, so value is one)} \]
\[ x_{ij}^{-} = 0 \quad \text{(assume edge is absent, so value is zero)} \]
\[ [x_{ij}^{+} - x_{ij}^{-}] = 1 \quad \text{(difference is 1 for all dyads)} \]

Forcing the ij tie on adds exactly one edge relative to forcing it off, so the edge-count change statistic is 1 for every dyad.
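Change statistics for other terms follow the same recipe: force the ij tie to 1, force it to 0, and difference the statistic. The sketch below does this literally for an edge count and a mutual-dyad count; the toy network and the function names are illustrative assumptions, not part of the original slides.

```python
def edge_count(adj):
    """Number of directed ties."""
    n = len(adj)
    return sum(adj[i][j] for i in range(n) for j in range(n) if i != j)

def mutual_count(adj):
    """Number of dyads with ties in both directions."""
    n = len(adj)
    return sum(adj[i][j] * adj[j][i] for i in range(n) for j in range(i + 1, n))

def change_statistic(adj, i, j, statistic):
    """z(x_ij^+) - z(x_ij^-): the statistic with tie ij forced to 1,
    minus the statistic with tie ij forced to 0."""
    plus = [row[:] for row in adj]
    minus = [row[:] for row in adj]
    plus[i][j] = 1
    minus[i][j] = 0
    return statistic(plus) - statistic(minus)

# Toy directed network (hypothetical): 0->1, 1->2, 2->0
adj = [[0, 1, 0],
       [0, 0, 1],
       [1, 0, 0]]

for i in range(3):
    for j in range(3):
        if i != j:
            print(i, j,
                  change_statistic(adj, i, j, edge_count),
                  change_statistic(adj, i, j, mutual_count))
```

The edge change statistic is constant across dyads, which is why the edges term behaves like an intercept, while the mutuality change statistic for ij equals the value of the reverse tie ji.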

Modeling Social Networks parametrically: ERGM approaches

The “Edges” parameter is simply an intercept-only model.

NODE  ADJMAT
1     0 1 1 1 0 0 0 0 0
2     1 0 1 0 0 0 1 0 0
3     1 1 0 0 1 0 1 0 0
4     1 0 0 0 1 0 0 0 0
5     0 0 1 1 0 1 0 1 0
6     0 0 0 0 1 0 0 1 1
7     0 1 1 0 0 0 0 0 0
8     0 0 0 0 1 1 0 0 1
9     0 0 0 0 0 1 0 1 0

Density: 0.361

    /* intercept-only logit model on the dyads */
    proc logistic descending data=dydat;
      model nom = ;
    run;
    quit;

    /* see results; copy the intercept coefficient */
    data chk;
      /* inverse logit of the intercept recovers the density */
      x = exp(-0.5705) / (1 + exp(-0.5705));
    run;

    proc print data=chk;
    run;
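The same check can be done without fitting anything, because the maximum-likelihood intercept of an intercept-only logit model equals the logit of the sample proportion. The sketch below recomputes the density of the nine-node matrix and its logit; it is an illustrative cross-check with made-up function names, not the SAS procedure itself.

```python
import math

# The nine-node adjacency matrix from the slide
ADJ = [
    [0, 1, 1, 1, 0, 0, 0, 0, 0],
    [1, 0, 1, 0, 0, 0, 1, 0, 0],
    [1, 1, 0, 0, 1, 0, 1, 0, 0],
    [1, 0, 0, 0, 1, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 1, 0, 1, 0],
    [0, 0, 0, 0, 1, 0, 0, 1, 1],
    [0, 1, 1, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1, 0, 0, 1],
    [0, 0, 0, 0, 0, 1, 0, 1, 0],
]

def density(adj):
    """Ties divided by the number of off-diagonal cells."""
    n = len(adj)
    ties = sum(adj[i][j] for i in range(n) for j in range(n) if i != j)
    return ties / (n * (n - 1))

def logit(p):
    return math.log(p / (1.0 - p))

def inverse_logit(b):
    return math.exp(b) / (1.0 + math.exp(b))

d = density(ADJ)
print("density:", round(d, 4))
print("logit(density):", round(logit(d), 4))
print("inverse logit of -0.5705:", round(inverse_logit(-0.5705), 4))
```

The matrix holds 26 ones in 72 off-diagonal cells (13 undirected edges among 36 pairs), so the density and the back-transformed coefficient should agree to rounding.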

Modeling Social Networks parametrically: ERGM approaches

The logit model estimation procedure was popularized by Wasserman & colleagues. A good guide to this approach is “A Practical Guide To Fitting p* Social Network Models Via Logistic Regression”. The site includes the PREPSTAR program for creating the variables of interest. The following example draws from this work, which nicely walks you through the logic of constructing change variables, model fit and so forth.

But the estimates are not very good for any parameters other than “dyad independent” parameters!

Modeling Social Networks parametrically: ERGM approaches

Parameters that are often fit include:
1) Expansiveness and attractiveness parameters (= dummies for each sender/receiver in the network)
2) Degree distribution
3) Mutuality
4) Group membership (and all other parameters by group)
5) Transitivity / intransitivity
6) k-in-stars, k-out-stars
7) Cyclicity
8) Node-level covariates (matching, difference)
9) Edge-level covariates (dyad-level features such as exposure)
10) Temporal data, such as relations in prior waves

Modeling Social Networks parametrically: Exponential Random Graph Models

…and there are LOTS of terms…

The terms currently available are (help(ergm.terms)):

Node Main Effects:

nodecov(attrname)

Main effect of a covariate:

nodefactor(attrname, base=1)

Factor attribute effect:

nodeicov(attrname)

Main effect of a covariate for in-edges:

nodeifactor(attrname, base=1)

Factor attribute effect for in-edges:

nodeocov(attrname)

Main effect of a covariate for out-edges:

nodeofactor(attrname, base=1)

Factor attribute effect for out-edges:

receiver(base=1)

Receiver effect:

sender(base=1)

Sender effect:

sociality(attrname=NULL, base=1)

Undirected degree:


Attribute Mixing Effects

absdiff(attrname, pow=1)

Absolute difference:

absdiffcat(attrname, base=NULL)

Categorical absolute difference:

dyadcov(x, attrname=NULL)

Dyadic covariate:

edgecov(x, attrname=NULL)

Edge covariate:

The edgecov and dyadcov terms are equivalent for undirected networks.

hamming(x, cov, attrname=NULL)

Hamming distance:

hammingmix(attrname, x, base=0)

Hamming distance within mixing:

match(attrname, diff=FALSE, keep=NULL)

Uniform homophily and differential homophily:

This is an alias for nodematch(attrname, diff=FALSE).

nodematch(attrname, diff=FALSE, keep=NULL)

Uniform homophily and differential homophily:

nodemix(attrname, base=NULL)

Nodal attribute mixing:


Structural Effects

Base Volume

density

Density:

edges

Edges:

meandeg

Mean vertex degree:

Degree/Star effects

altkstar(lambda, fixed=FALSE)

Alternating k-star:

gwdegree(decay, fixed=FALSE, cutoff=30)

Geometrically weighted degree distribution:

gwidegree(decay, fixed=FALSE, cutoff=30)

Geometrically weighted in-degree distribution:

gwodegree(decay, fixed=FALSE, cutoff=30)

Geometrically weighted out-degree distribution:

idegree(d, by=NULL, homophily=FALSE)

In-degree:

isolates

Isolates:

istar(k, attrname=NULL)

In-stars:

kstar(k, attrname=NULL)

k-Stars:

odegree(d, by=NULL, homophily=FALSE)

Out-degree:

ostar(k, attrname=NULL)

k-Outstars:


Dyadic Effects

asymmetric(attrname=NULL, diff=FALSE, keep=NULL)

Asymmetric dyads:

degree(d, by=NULL, homophily=FALSE)

Degree:

degcrossprod

Degree Cross-Product:

degcor

Degree Correlation:

mutual(same=NULL, diff=FALSE, by=NULL, keep=NULL)

Mutuality:

Path Effects

m2star

Mixed 2-stars, a.k.a 2-paths:

See also twopath.

threepath(keep=1:4)

Three-paths:

twopath

2-Paths:


Triadic Effects

ctriple(attrname=NULL)

Cyclic triples:

cycle(k)

Cycles:

dsp(d)

Dyadwise shared partners:

esp(d)

Edgewise shared partners:

balance

Balanced triads:

gwdsp(alpha, fixed=FALSE, cutoff=30)

Geometrically weighted dyadwise shared partner distribution:

gwesp(alpha, fixed=FALSE, cutoff=30)

Geometrically weighted edgewise shared partner distribution:

gwnsp(alpha, fixed=FALSE, cutoff=30)

Geometrically weighted nonedgewise shared partner distribution:

intransitive

Intransitive triads:

localtriangle(x)

Triangles within neighborhoods:

nearsimmelian

Near simmelian triads:

nsp(d)

Nonedgewise shared partners:

simmelian

Simmelian triads:

simmelianties

Ties in simmelian triads:

transitive

Transitive triads:

transitiveties(attrname=NULL)

Transitive ties:

triadcensus(d)

Triad census:

triangle(attrname=NULL)

Triangles:

tripercent(attrname=NULL)

Triangle percentage:

ttriple(attrname=NULL)

Transitive triples:


Two Mode Networks

b1concurrent(by=NULL)

Concurrent node count for the first mode in a bipartite (aka two mode) network:

b1degree(d, by=NULL)

Degree for the first mode in a bipartite (aka two-mode) network:

b1factor(attrname, base=1)

Factor attribute effect for the first mode in a bipartite (aka two-mode) network :

b1star(k, attrname=NULL)

k-Stars for the first mode in a bipartite (aka two-mode) network:

b1starmix(k, attrname, base=NULL, diff=TRUE)

Mixing matrix for k-stars centered on the first mode of a bipartite network:

b1twostar(b1attrname, b2attrname, base=NULL)

Two-star census for central nodes centered on the first mode of a bipartite network:

b2concurrent(by=NULL)

Concurrent node count for the second mode in a bipartite (aka two-mode) network:

b2degree(d, by=NULL)

Degree for the second mode in a bipartite (aka two-mode) network:

b2factor(attrname, base=1)

Factor attribute effect for the second mode in a bipartite (aka two-mode) network :

b2star(k, attrname=NULL)

k-Stars for the second mode in a bipartite (aka two-mode) network:

b2starmix(k, attrname, base=NULL, diff=TRUE)

Mixing matrix for k-stars centered on the second mode of a bipartite network:

b2twostar(b1attrname, b2attrname, base=NULL)

Two-star census for central nodes centered on the second mode of a bipartite network:

gwb1degree(decay, fixed=FALSE, cutoff=30)

Geometrically weighted degree distribution for the first mode in a bipartite (aka two-mode) network:

gwb2degree(decay, fixed=FALSE, cutoff=30)

Geometrically weighted degree distribution for the second mode in a bipartite (aka two-mode) network:

concurrent(by=NULL)

Concurrent node count:

Modeling Social Networks parametrically: Exponential Random Graph Models

In practice, models fit by logit estimation are hard to estimate well, and we have no good sense of how approximate the PMLE (pseudo-maximum-likelihood estimate) is.

The STATNET generalization is to use MCMC methods to better estimate the parameters. This is essentially a simulation procedure working “under the hood” to explore the space of graphs described by the model parameters, searching for the best fit to the observed data.
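To make the "simulation under the hood" idea concrete, the sketch below draws a graph from a model with edges and mutuality terms by repeatedly resampling one tie at a time from its conditional distribution, using the change statistics as the log-odds. It is a Gibbs-style toy written for illustration and is not statnet's actual sampler or estimation routine; the parameter values and names are assumptions.

```python
import math
import random

def simulate_ergm(n_nodes, theta_edges, theta_mutual, n_steps=20000, seed=1):
    """Gibbs-style sampler for a directed graph with edges + mutual terms."""
    rng = random.Random(seed)
    adj = [[0] * n_nodes for _ in range(n_nodes)]
    for _ in range(n_steps):
        i, j = rng.sample(range(n_nodes), 2)  # a random ordered dyad, i != j
        # Conditional log-odds of tie i->j given the rest of the graph:
        # edge change statistic is 1, mutual change statistic is adj[j][i]
        log_odds = theta_edges * 1 + theta_mutual * adj[j][i]
        p_tie = 1.0 / (1.0 + math.exp(-log_odds))
        adj[i][j] = 1 if rng.random() < p_tie else 0
    return adj

def density(adj):
    n = len(adj)
    ties = sum(adj[i][j] for i in range(n) for j in range(n) if i != j)
    return ties / (n * (n - 1))

# Illustrative parameter values (assumed, not estimated from any data)
sample = simulate_ergm(n_nodes=10, theta_edges=-1.5, theta_mutual=2.0)
print("density of simulated graph:", round(density(sample), 3))
```

Estimation then amounts to adjusting the parameters until statistics of the simulated graphs match those of the observed network, which is the part the software automates.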

You can specify a model as a simple statement on terms:

Modeling Social Networks parametrically: Exponential Random Graph Models

A simple example: one of the schools in PROSPER

    library(statnet)
    library(foreign)

    # read the school network from a Pajek file
    g <- read.paj("C:/jwmdata/prosper/Network_data_files/PAJEK/MATCHED/SC1C1W1Sch101.net")

    # store degree scores as vertex attributes
    g %v% "indegree" <- degree(g, cmode="indegree")
    g %v% "outdegree" <- degree(g, cmode="outdegree")

    # attach node-level attributes from the attribute file
    atr <- read.table("C:/jwmdata/prosper/Network_data_files/Rfiles/ergmfiles/n111101.txt")
    g %v% "sex" <- atr[,2]
    g %v% "white" <- atr[,3]
    g %v% "slun" <- atr[,4]
    g %v% "irtuse" <- atr[,5]
    g %v% "irtdev" <- atr[,6]
    g %v% "tgrad" <- atr[,7]
    g %v% "discip" <- atr[,8]
    g %v% "church" <- atr[,9]
    g %v% "sens" <- atr[,10]

    # plot the network, coloring nodes by attribute
    plot(g, vertex.col="sex")
    plot(g, vertex.col="slun")
    plot(g, vertex.col="white")

Dynamics 1: Simple time-lag model: PROSPER Peers

Complete Network Analysis
Stochastic Network Analysis

Panel model in PROSPER: an example

Modeling Social Networks parametrically: Exponential Random Graph Models: Degeneracy

“Assessing Degeneracy in Statistical Models of Social Networks,” Mark S. Handcock, CSSS Working Paper #39

Quick example (demo)

Modeling Social Networks parametrically: Latent Space Models

Z = a dimension in some unknown space that, once accounted for, makes ties independent. Z is effectively chosen with respect to some latent cluster-space, G. These “groups” define different social sources for association.

[Figure: PROSPER data, with three groups]
[Figure: PROSPER data, with three groups (posterior density plots)]

…note there is a non-R option…

Generating Random Graph Samples

A conceptual merge between exponential random graph models and QAP/sensitivity models is to attempt to identify a sample of graphs from the universe you are trying to model.

\[ P(X = x) = \frac{\exp\{\theta' z(x)\}}{\kappa(\theta)} \]

That is, generate X empirically, then compare z(x) to see how likely a measure on x would be given X. The difficulty, however, is generating X.

Generating Random Graph Samples

The first option would be to generate all non-isomorphic graphs within a given constraint.

This is possible for small graphs, but the number gets large fast. For a network with 3 nodes, there are 16 non-isomorphic directed graphs. For a network with 4 nodes, there are 218; for 5 nodes, 9,608; for 6 nodes, 1,540,944; and so on. So the best approach is to sample from the universe, but, of course, if you had the universe you wouldn’t need to sample from it. How do you sample from a population you haven’t observed?
(a) use a construction algorithm that generates a random graph with known constraints
(b) use an ERGM like the one above.
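The counts for small networks can be reproduced by brute force: enumerate every directed graph, relabel it under every node permutation, and keep one canonical form per isomorphism class. The sketch below does this for three and four nodes; the function name is illustrative, and the approach is only practical for very small n.

```python
from itertools import permutations, product

def count_nonisomorphic_digraphs(n_nodes):
    """Count directed graphs on n_nodes nodes up to relabeling of nodes."""
    pairs = [(i, j) for i in range(n_nodes) for j in range(n_nodes) if i != j]
    perms = list(permutations(range(n_nodes)))
    canonical_forms = set()
    for bits in product((0, 1), repeat=len(pairs)):
        edges = [pair for pair, bit in zip(pairs, bits) if bit]
        # Canonical form: the smallest sorted edge list over all relabelings
        canonical = min(
            tuple(sorted((perm[i], perm[j]) for i, j in edges))
            for perm in perms
        )
        canonical_forms.add(canonical)
    return len(canonical_forms)

print(count_nonisomorphic_digraphs(3))
print(count_nonisomorphic_digraphs(4))
```

Four nodes already require checking 4,096 labeled graphs against 24 relabelings each, and the work grows far faster than the counts themselves, which is the practical argument for sampling instead of enumerating.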

Generating Random Graph Samples
Romantic Networks

[Figure: a draw from the simulation; this is what appeared in “Glamour”]

Generating Random Graph Samples
Edge-matching random permutation

Can easily generate networks with appropriate degree distributions by generating “edge stems” and sorting:

Degree distribution (degree: number of nodes): 1: 2, 2: 2, 3: 1

Edge stems:
d_i = 1: a, b
d_i = 2: c c, d d
d_i = 3: f f f

(need to ensure you have a valid edge list!)
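A minimal sketch of edge-stem matching, under stated assumptions: stems are shuffled and paired off in order, and a draw is rejected if it contains a self-tie or a duplicate edge, which is one simple way to enforce a valid edge list. The degree sequence below adds a hypothetical sixth node "e" with degree 1 so that the number of stems is even; the function name and the rejection rule are illustrative choices.

```python
import random
from collections import Counter

def edge_matching_graph(degrees, seed=1, max_tries=1000):
    """Random simple undirected graph with the given degree per node."""
    if sum(degrees.values()) % 2 != 0:
        raise ValueError("total degree must be even to pair every stem")
    rng = random.Random(seed)
    # One stem per unit of degree: a node with degree 3 appears three times
    stems = [node for node, d in degrees.items() for _ in range(d)]
    for _ in range(max_tries):
        rng.shuffle(stems)
        pairs = [tuple(sorted(stems[k:k + 2])) for k in range(0, len(stems), 2)]
        no_self_ties = all(a != b for a, b in pairs)
        no_duplicates = len(set(pairs)) == len(pairs)
        if no_self_ties and no_duplicates:
            return pairs
    raise RuntimeError("no valid edge list found; sequence may not be graphical")

# Hypothetical degree sequence with an even stem total
degrees = {"a": 1, "b": 1, "e": 1, "c": 2, "d": 2, "f": 3}
edges = edge_matching_graph(degrees)
realized = Counter(node for edge in edges for node in edge)
print(edges)
print(dict(realized))
```

Rejecting and redrawing whole shuffles is the simplest approach for small, sparse sequences; for larger or denser degree sequences it becomes slow and a rewiring-based method is usually preferred.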

Generating Random Graph Samples
Emergent Connectivity in low-degree networks
[Figure panels: Partner Distribution; Component Size/Shape]

Complete Network Analysis
Network Connections: Connectivity

Development of STD cores in low-degree networks: rapid transition without stars.

Extend this view across the space of low-degree distributions defined by shape and volume...

Complete Network Analysis
Network Connections: Connectivity

ERGMs make it (fairly) easy to simulate networks from models.

• Simple: simulation from an estimated ERGM (this is how the GOF function works)
• Simple II: simulate from a pre-defined ERGM formula (i.e., set the parameters by hand)
• A little harder: simulate from ego networks. Here you can use ERGM to match the observed distribution for mixing by node characteristics reported in an ego-network survey.
  • Can use degree, attribute mixing
• A bit harder: fit global structure features using ego-nets by modeling the distribution of sub-structures (see Jeff Smith’s work)

Generating Random Graph Samples
Model based estimates

[Figure: ERGM used to simulate networks from Add Health]

Modeling Network Dynamics
Rule-based simulation models

The network-science approach to dynamic networks has been to identify toy behavioral models and play out the implications of these models for network dynamics. Focus is typically on how the network evolves (or reaches a steady state).

• Dynamics OF networks: balance, preferential attachment, voter models
• Dynamics ON networks: diffusion simulations

These are usually agent-based models and are difficult to specify; there is a tradeoff between simplicity and realism.
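As one example of a toy rule played out over time, the sketch below grows a network by preferential attachment: each new node forms one tie to an existing node chosen with probability proportional to its current degree. The single-tie rule, the two-node seed network, and the function name are simplifying assumptions for illustration.

```python
import random
from collections import Counter

def preferential_attachment(n_nodes, seed=1):
    """Grow a network one node at a time; each newcomer ties to one
    existing node picked in proportion to that node's degree."""
    rng = random.Random(seed)
    edges = [(0, 1)]   # seed network: a single tie
    stems = [0, 1]     # each node appears once per tie it holds
    for new_node in range(2, n_nodes):
        target = rng.choice(stems)  # degree-proportional choice
        edges.append((new_node, target))
        stems.extend([new_node, target])
    return edges

edges = preferential_attachment(200)
degree = Counter(node for edge in edges for node in edge)
print("edges:", len(edges))
print("largest degrees:", sorted(degree.values(), reverse=True)[:5])
```

Even this one-line behavioral rule produces a skewed degree distribution, which is the kind of macro-level implication these models are built to expose.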

Modeling Network Dynamics
Descriptive dynamic techniques

The goal here is to make sense of how networks change, or how things flow through them, using a clear measurement/metrics approach. The challenge is defining the network.

Examples of looking at change in networks: Roy and interlocking directorates (ASR)
[Figures: interlocks for 1886-1890, 1891-1895, 1896-1900, and 1901-1905]

Bearman and Everett: The Structure of Social Protest
[Figure: group positions by period: ’61-63, ’66-68, ’71-73, ’76-78, ’81-83. See paper for group compositions.]

Data on drug users in Colorado Springs, over 5 years

Representing dynamic networks?

Animation captures much of the dynamism we care about: STD Diffusion

http://csde.washington.edu/statnet/movies/ConcurrencyAndReachability.mov


Modeling Network Dynamics
Random Graph models

Panel ERGM: If you simply want to account for the effect of past structures, you can add temporal covariates to the standard ERGM. Really only good for two waves.

STERGM: Separable Temporal ERGM. This is a two-equation model, with one equation for the formation of ties and a second for the dissolution of ties. The goal is like ERGM: to explain the dynamics of the network.
http://statnet.csde.washington.edu/workshops/SUNBELT/current/tergm/tergm_tutorial.pdf

RELEVENT: Relational Events Model. This is really a model of action on a network: think of conversation events or similar. Dynamic networks of very short duration events.
http://statnet.csde.washington.edu/workshops/SUNBELT/current/relevent/statnet_sunbelt2014_relevent.pdf

SIENA: Stochastic Actor Oriented Model (SAOM). Used to disentangle selection from influence, by jointly modeling both as functions of each other. Multi-equation model; the simplest has one equation for behavior and one for network formation.
Intro: https://www.stats.ox.ac.uk/~snijders/siena/SnijdersSteglichVdBunt2009.pdf
Manual: https://www.stats.ox.ac.uk/~snijders/siena/RSiena_Manual.pdf

Modeling Network Dynamics
Random Graph models: STERGM

Under certain assumptions, you can model a single network with average duration information (assumes an equilibrium process).

http://statnet.csde.washington.edu/workshops/SUNBELT/current/tergm/tergm_tutorial.html
Slides adapted from the workshop materials: http://statnet.csde.washington.edu/EpiModel/nme/index.html

Modeling Network Dynamics
Random Graph models: STERGM

    samp.fit <- stergm(samp,
                       formation = ~edges+mutual+cyclicalties+transitiveties,
                       dissolution = ~edges+mutual+cyclicalties+transitiveties,
                       estimate = "CMLE",
                       times = 1:3)

SIENA: Key Assumptions of the model

The key element is how actors make changes. This is based on an evaluation of “utility” functions, similar to discrete choice models.

The model is then implemented as an actor-simulation, where actors are striving to maximize their utility.

Note: Tom is adamant that this is an “as if” model, with no clear ontological commitment to a “choice” model!
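The discrete-choice logic can be sketched as a single "ministep": the focal actor considers toggling each outgoing tie (or changing nothing), scores every resulting network with an evaluation function, and picks among the options with multinomial-logit probabilities. The evaluation function below uses only outdegree and reciprocity effects with made-up weights; these effects, weights, and function names are assumptions for illustration, not SIENA's code or estimates.

```python
import math

def evaluation(adj, i, beta_out, beta_rec):
    """Assumed evaluation function for actor i: outdegree and reciprocity."""
    n = len(adj)
    outdegree = sum(adj[i][j] for j in range(n) if j != i)
    reciprocated = sum(adj[i][j] * adj[j][i] for j in range(n) if j != i)
    return beta_out * outdegree + beta_rec * reciprocated

def ministep_probabilities(adj, i, beta_out=-1.0, beta_rec=2.0):
    """Choice probabilities over actor i's options: toggle the tie to j,
    or (for j == i) leave the network unchanged."""
    n = len(adj)
    utilities = {}
    for j in range(n):
        trial = [row[:] for row in adj]
        if j != i:
            trial[i][j] = 1 - trial[i][j]
        utilities[j] = evaluation(trial, i, beta_out, beta_rec)
    top = max(utilities.values())  # subtract the max for numerical stability
    weights = {j: math.exp(u - top) for j, u in utilities.items()}
    total = sum(weights.values())
    return {j: w / total for j, w in weights.items()}

# Toy network (hypothetical): actor 1 nominates actor 0, nothing else
adj = [[0, 0, 0],
       [1, 0, 0],
       [0, 0, 0]]
probs = ministep_probabilities(adj, i=0)
print({j: round(p, 3) for j, p in probs.items()})
```

With a negative outdegree weight and a positive reciprocity weight, the option that reciprocates actor 1's nomination gets the highest probability, while an unreciprocated new tie is the least likely choice.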

Modeling Network Dynamics
Random Graph models: Siena

Osgood, D. W., Ragan, D. T., Wallace, L., Gest, S. D., Feinberg, M. E., & Moody, J. 2013. “Peers and the emergence of alcohol use: Influence and selection processes in adolescent friendship networks.” Journal of Research on Adolescence 23:500-512.

Modeling Network Dynamics
Random Graph models: RelEvent

For repeated interactions amongst nodes
