
Introduction to Social Network Analysis

Columbia University November 2004 James Moody Ohio State University

Introduction

We live in a connected world:

“To speak of social life is to speak of the association between people – their associating in work and in play, in love and in war, to trade or to worship, to help or to hinder. It is in the social relations men establish that their interests find expression and their desires become realized.” Peter M. Blau, Exchange and Power in Social Life, 1964

“If we ever get to the point of charting a whole city or a whole nation, we would have … a picture of a vast solar system of intangible structures, powerfully influencing conduct, as gravitation does in space. Such an invisible structure underlies society and has its influence in determining the conduct of society as a whole.” J.L. Moreno, New York Times, April 13, 1933

These patterns of connection form a social space that can be seen in multiple contexts:

Introduction

Source: Linton Freeman, “See you in the funny pages,” Connections, 23, 2000, 32-42.

Introduction High Schools as Networks

Introduction And yet, standard social science analysis methods do not take this space into account.

“For the last thirty years, empirical social research has been dominated by the sample survey. But as usually practiced, …, the survey is a sociological meat grinder, tearing the individual from his social context and guaranteeing that nobody in the study interacts with anyone else in it.”

Allen Barton, 1968 (Quoted in Freeman 2004)

Moreover, the complexity of the relational world makes it impossible to identify social connectivity using only our intuition.

Social Network Analysis (SNA) provides a set of tools to empirically extend our theoretical intuition of the patterns that compose social structure.

Introduction Why do Networks Matter?

Local vision


Introduction

Social network analysis is a set of relational methods for systematically understanding and identifying connections among actors. SNA:
• is motivated by a structural intuition based on ties linking social actors
• is grounded in systematic empirical data
• draws heavily on graphic imagery
• relies on the use of mathematical and/or computational models

Social network analysis embodies a range of theories relating types of observable social spaces and their relation to individual and group behavior.

Introduction

1. Introduction
2. Social Network data
   a. Basic data elements
   b. Network data sources
3. Local (ego) Network Analysis
   a. Introduction
   b. Network Composition
   c. Network Structure
   d. Local Network Models
4. Complete Network Analysis
   a. Exploratory Analysis
   b. Network Connections
   c. Network Macro Structure
   d. Stochastic Network Analyses
5. Social Network Software

Introduction Key Questions

Social network analysis lets us answer questions about social interdependence. These include:

“Networks as Variables” approaches
• Are kids with smoking peers more likely to smoke themselves?
• Do unpopular kids get in more trouble than popular kids?
• Are people with many weak ties more likely to find a job?
• Do central actors control resources?

“Networks as Structures” approaches
• What generates hierarchy in social relations?
• What network patterns spread diseases most quickly?
• How do role sets evolve out of consistent relational activity?

We don’t want to draw this line too sharply: emergent role positions can affect individual outcomes in a ‘variable’ way, and variable approaches constrain relational activity.

Social Network Data

The unit of interest in a network is the combined set of actors and their relations.

We represent actors with points and relations with lines.
Actors are referred to variously as: nodes, vertices, actors or points.
Relations are referred to variously as: edges, arcs, lines, ties.

Example: [Figure: a small network on nodes a, b, c, d, e]

Social Network Data Basic Data Elements

Social network data consist of two linked classes of data:

a) Nodes: information on the individuals (actors, nodes, points, vertices)
• Network nodes are most often people, but can be any other unit capable of being linked to another (schools, countries, organizations, personalities, etc.)
• The information about nodes is what we usually collect in standard social science research: demographics, attitudes, behaviors, etc.
• Often includes dynamic information about when the node is active

b) Edges: information on the relations among individuals (lines, edges, arcs)
• Records a connection between the nodes in the network
• Can be valued, directed (arcs), binary or undirected (edges)
• One-mode (direct ties between actors) or two-mode (actors share membership in an organization)
• Includes the times when the relation is active

Graph theory notation: G(V,E), a graph G made up of a set of vertices V and a set of edges E.

Social Network Data Basic Data Elements

In general, a relation can be:
(1) Binary or Valued
(2) Directed or Undirected

[Figure: the same network on nodes a to e drawn as undirected binary, undirected valued, directed binary, and directed valued]

The social process of interest will often determine what form your data take. Almost all of the techniques and measures we describe can be generalized across data formats.

Social Network Data Basic Data Elements: Levels of analysis

[Figure: levels of analysis, labeled Best Friend Dyad, Primary Group, Ego-Net, 2-step Partial network, and Global-Net]

Social Network Data Basic Data Elements: Levels of analysis

We can examine networks across multiple levels:

1) Ego-network
- Have data on a respondent (ego) and the people they are connected to (alters). Example: 1985 GSS module
- May include estimates of connections among alters

2) Partial network
- Ego networks plus some amount of tracing to reach contacts of contacts
- Something less than a full account of connections among all pairs of actors in the relevant population
- Example: CDC contact tracing data for STDs

3) Complete or “Global” data
- Data on all actors within a particular (relevant) boundary
- Never exactly complete (due to missing data), but boundaries are set
- Example: Coauthorship data among all writers in the social sciences, friendships among all students in a classroom

Social Network Data Basic Data Structures

Working with pictures.

There is no standard way to draw a sociogram: each of these drawings is equally valid:

Social Network Data Basic Data Structures

In general, graphs are cumbersome to work with analytically, though there is a great deal of good work to be done on using visualization to build network intuition. I recommend using layouts that optimize on the feature you are most interested in. The two I use most are a hierarchical layout and a force-directed layout.

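Social Network Data Basic Data Structures

From pictures to matrices

Undirected, binary: [Figure: network on nodes a to e] and its adjacency matrix:

     a  b  c  d  e
  a  0  1  0  0  0
  b  1  0  1  0  0
  c  0  1  0  1  1
  d  0  0  1  0  1
  e  0  0  1  1  0

Directed, binary: [Figure: a directed network on nodes a to e and its asymmetric adjacency matrix]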

Social Network Data Basic Data Structures

From matrices to lists

Adjacency matrix:
     a  b  c  d  e
  a  0  1  0  0  0
  b  1  0  1  0  0
  c  0  1  0  1  1
  d  0  0  1  0  1
  e  0  0  1  1  0

Adjacency List:
  a: b
  b: a c
  c: b d e
  d: c e
  e: c d

Arc List:
  a b, b a, b c, c b, c d, c e, d c, d e, e c, e d
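As a rough illustration, the matrix-to-list conversion can be sketched in Python. This is a minimal sketch assuming the undirected, binary a-to-e example stored as a plain list of lists; `to_adjacency_list` and `to_arc_list` are hypothetical helper names, not part of any network package. Because the relation is undirected, each tie shows up twice in the arc list.

```python
# Undirected, binary example network on nodes a to e.
nodes = ["a", "b", "c", "d", "e"]
matrix = [
    # a  b  c  d  e
    [0, 1, 0, 0, 0],  # a
    [1, 0, 1, 0, 0],  # b
    [0, 1, 0, 1, 1],  # c
    [0, 0, 1, 0, 1],  # d
    [0, 0, 1, 1, 0],  # e
]


def to_adjacency_list(labels, adj):
    """Map each node to the nodes in the non-zero cells of its row."""
    return {
        labels[i]: [labels[j] for j, value in enumerate(row) if value]
        for i, row in enumerate(adj)
    }


def to_arc_list(labels, adj):
    """List every (sender, receiver) pair with a non-zero cell, row by row."""
    return [
        (labels[i], labels[j])
        for i, row in enumerate(adj)
        for j, value in enumerate(row)
        if value
    ]


adjacency_list = to_adjacency_list(nodes, matrix)
arc_list = to_arc_list(nodes, matrix)
print(adjacency_list)
print(arc_list)
```

Storing only the non-zero cells is what makes list formats attractive for large, sparse networks.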

Social Network Data Basic Data Elements: Modes

Social network data are substantively divided by the number of modes in the data.

1-mode data represent edges based on direct contact between actors in the network. All the nodes are of the same type (people, organizations, ideas, etc.). Examples: communication, friendship, giving orders, sending email.

There are no constraints on connections between classes of nodes. 1-mode data are usually singly reported (each person reports on their friends), but you can use multiple-informant data, which are more common in child development research (Cairns and Cairns).

Social Network Data Basic Data Elements: Modes

Social network data are substantively divided by the number of modes in the data.

2-mode data represent nodes from two separate classes, where all relations cross classes. Examples:
• People as members of groups
• People as authors on papers
• Words used often by people
• Events in the life history of people

The two modes of the data represent a duality: you can project the data as people connected to people through joint membership in a group, or as groups connected to each other through common membership.

N-mode data generalize the constraint on ties between classes to N groups.

Social Network Data Basic Data Elements: Modes

Breiger (1974), “The Duality of Persons and Groups”

Metaphor: people intersect through their associations, which define (in part) their individuality.

The duality argument is that relations among groups imply relations among individuals.

Social Network Data Basic Data Elements: Modes

Bipartite networks imply a constraint on the mixing, such that ties only cross classes.

Here each tie connects a woman with a party she attended (Davis data).


Social Network Data Basic Data Elements: Modes

By projecting the data, one can look at the memberships shared between people or the common memberships in groups: this is the person-to-person projection of the 2-mode data.

Social Network Data Basic Data Elements: Modes

By projecting the data, one can look at the memberships shared between people or the common memberships in groups: this is the group-to-group projection of the 2-mode data.

Social Network Data Basic Data Elements: Modes Working with two-mode data

A person-to-group adjacency matrix is rectangular, with persons down rows and groups across columns:

A =
     1  2  3  4  5
  A  0  0  0  0  1
  B  1  0  0  0  0
  C  1  1  0  0  0
  D  0  1  1  1  1
  E  0  0  1  0  0
  F  0  0  1  1  0

Each column is a group, each row a person, and the cell = 1 if the person in that row belongs to that group.

You can tell how many groups two people both belong to by comparing the rows: identify every place where both rows = 1, sum them, and you have the overlap.

Social Network Data Basic Data Elements: Modes Working with two-mode data

Compare persons A and F (the AF row is the cell-by-cell product of rows A and F):

        1  2  3  4  5   Sum
  A     0  0  0  0  1   = 1
  F     0  0  1  1  0   = 2
  AF    0  0  0  0  0   = 0

Person A is in 1 group, Person F is in two groups, and they are in no groups together.

Or persons D and F:

        1  2  3  4  5   Sum
  D     0  1  1  1  1   = 4
  F     0  0  1  1  0   = 2
  DF    0  0  1  1  0   = 2

Person D is in 4 groups, Person F is in two groups, and they are in 2 groups together.
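The row comparison can be written as a small Python sketch: multiply the two rows cell by cell (the product is 1 only where both people belong to the group) and sum. The dictionary layout and the `shared_groups` helper are illustrative assumptions, not part of the original material.

```python
# Person-to-group matrix A: one row per person, groups 1-5 across.
A = {
    "A": [0, 0, 0, 0, 1],
    "B": [1, 0, 0, 0, 0],
    "C": [1, 1, 0, 0, 0],
    "D": [0, 1, 1, 1, 1],
    "E": [0, 0, 1, 0, 0],
    "F": [0, 0, 1, 1, 0],
}


def shared_groups(row_i, row_j):
    """Multiply the rows cell by cell (1 only where both are 1) and sum."""
    return sum(x * y for x, y in zip(row_i, row_j))


for i, j in [("A", "F"), ("D", "F")]:
    print(i, "in", sum(A[i]), "|", j, "in", sum(A[j]), "| overlap:", shared_groups(A[i], A[j]))
```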

Social Network Data Basic Data Elements: Modes Working with two-mode data

Similarly for groups, compare columns 1 and 2 of A:

        1  2  1•2
  A     0  0  0
  B     1  0  0
  C     1  1  1
  D     0  1  0
  E     0  0  0
  F     0  0  0
  Sum   2  2  1

Group 1 has 2 members, group 2 has 2 members, and they overlap by 1 member (C).

Social Network Data Basic Data Elements: Modes Working with two-mode data

In general, you can get the overlap for any pair of groups / persons by summing the multiplied elements of the corresponding rows/columns of the persons-to-groups adjacency matrix. That is:

Persons-to-Persons:
$$P_{ij} = \sum_{k=1}^{g} A_{ik} A_{jk}$$

Groups-to-Groups:
$$G_{ij} = \sum_{k=1}^{p} A_{ki} A_{kj}$$

where g is the number of groups and p is the number of persons.

Social Network Data Basic Data Elements: Modes Working with two-mode data

One can get these easily with a little matrix multiplication. First define A^T as the transpose of A (simply reverse the rows and columns):

$$(A^T)_{ij} = A_{ji}$$

If A is of size P x G, then A^T will be of size G x P.

A =
     1  2  3  4  5
  A  0  0  0  0  1
  B  1  0  0  0  0
  C  1  1  0  0  0
  D  0  1  1  1  1
  E  0  0  1  0  0
  F  0  0  1  1  0

A^T =
     A  B  C  D  E  F
  1  0  1  1  0  0  0
  2  0  0  1  1  0  0
  3  0  0  0  1  1  1
  4  0  0  0  1  0  1
  5  1  0  0  1  0  0

Social Network Data Basic Data Elements: Modes

$$P = A A^T \qquad G = A^T A$$

A * A^T = P, with dimensions (6 x 5)(5 x 6) = (6 x 6):

P =
     A  B  C  D  E  F
  A  1  0  0  1  0  0
  B  0  1  1  0  0  0
  C  0  1  2  1  0  0
  D  1  0  1  4  1  2
  E  0  0  0  1  1  1
  F  0  0  0  2  1  2

A^T * A = G, with dimensions (5 x 6)(6 x 5) = (5 x 5):

G =
     1  2  3  4  5
  1  2  1  0  0  0
  2  1  2  1  1  1
  3  0  1  3  2  1
  4  0  1  2  2  1
  5  0  1  1  1  2

Social Network Data Basic Data Elements: Modes

Theoretically, these two equations define what Breiger means by duality: “With respect to the membership network,…, persons who are actors in one picture (the P matrix) are with equal legitimacy viewed as connections in the dual picture (the G matrix), and conversely for groups.” (p. 87)

The resulting network:
1) is always symmetric
2) has a diagonal that tells you how many groups (persons) a person (group) belongs to (has)

In practice, most network software (UCINET, PAJEK) will do all of these operations. It is also simple to do the matrix multiplication in programs like SAS or SPSS.
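For readers without UCINET or PAJEK at hand, here is a minimal pure-Python sketch of both projections for the six-person, five-group example. `transpose` and `matmul` are hypothetical helpers written out by hand (a linear algebra library would normally do this); the final checks restate the two properties of the resulting network listed above.

```python
# Person-to-group matrix A: persons A-F down rows, groups 1-5 across columns.
A = [
    [0, 0, 0, 0, 1],  # A
    [1, 0, 0, 0, 0],  # B
    [1, 1, 0, 0, 0],  # C
    [0, 1, 1, 1, 1],  # D
    [0, 0, 1, 0, 0],  # E
    [0, 0, 1, 1, 0],  # F
]


def transpose(m):
    """(A^T)_ij = A_ji: rows become columns."""
    return [list(col) for col in zip(*m)]


def matmul(x, y):
    """Matrix product: cell (i, j) is the sum over k of x[i][k] * y[k][j]."""
    y_cols = list(zip(*y))
    return [[sum(a * b for a, b in zip(row, col)) for col in y_cols] for row in x]


P = matmul(A, transpose(A))  # (6 x 5)(5 x 6) -> 6 x 6 person-by-person overlaps
G = matmul(transpose(A), A)  # (5 x 6)(6 x 5) -> 5 x 5 group-by-group overlaps

for row in P:
    print(row)

# Both projections are symmetric ...
assert P == transpose(P) and G == transpose(G)
# ... and the diagonals count groups per person and members per group.
assert [P[i][i] for i in range(6)] == [sum(r) for r in A]
assert [G[k][k] for k in range(5)] == [sum(c) for c in transpose(A)]
```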

Social Network Data Network Data Sources: Existing data sources

Existing Sources of Social Network Data:
• There are lots of archived network data sets. Check INSNA for a listing.
• The PAJEK data page includes a number of exemplars for large-scale networks.

2-Mode Data
• One can construct networks from many different data sources if you want to work with 2-mode data. Any list (for example, director interlocks or protest event participation) can be so transformed.

1-Mode Data
Local network data:
• Fairly common, because they are easy to collect from sample surveys
• GSS, NHSL, Urban Inequality Surveys, etc.
• Pay attention to the question asked
• Key features are (a) the number of people named and (b) whether alters are able to nominate each other.

Social Network Data Network Data Sources: Existing data sources

Existing Sources of Social Network Data: 1-Mode Data

Partial network data:
• Much less common, because cost goes up significantly once you start tracing to contacts
• Snowball data: start with focal nodes and trace to contacts
  • CDC style data on sexual contact tracing
• Limited snowball samples
  • Colorado Springs drug users data
  • Genealogy data
  • Small-world network samples
• Limited boundary data: select data within a limited bound
  • Cross-national trade data
  • Friendships within a classroom
  • Family support ties

Social Network Data Network Data Sources: Existing data sources

Existing Sources of Social Network Data: 1-Mode Data

Complete network data:
• Significantly less common and never perfect.
• Start by defining a theoretically relevant boundary
• Then identify all relations among nodes within that boundary
• Examples:
  • Co-sponsorship patterns among legislators
  • Friendships within strongly bounded settings (sororities, schools)
  • Add Health on adolescent friendships
  • Hallinan data on within-school friendships
  • McFarland’s data on verbal interaction
  • Electronic data on citations or coauthorship (see Pajek data page)
  • See INSNA home page for many small-scale networks

Social Network Data Network Data Sources: Collecting network data Boundary Specification Problem

Network methods describe positions in relevant social fields, where flows of particular goods are of interest. As such, boundaries are a fundamentally theoretical question about what you think matters in the setting of interest.

See Marsden (19xx) for a good review of the boundary specification problem. In general, there are usually relevant social foci that bound the relevant social field. We expect that social relations will be very clumpy. Consider the example of friendship ties within and between a high school and a Jr. high:

Social Network Data Network Data Sources: Collecting network data

a) Network data collection can be time consuming. It is better (I think) to have breadth over depth. Having detailed information on <50% of the sample will make it very difficult to draw conclusions about the general network structure.

b) Question format:
• If you ask people to recall names (an open list format), fatigue will result in under-reporting.
• If you ask people to check off names from a full list, you can often get over-reporting.

c) It is common to limit people to a small number of nominations (~5). This will bias network measures, but is sometimes the best choice to avoid fatigue.

d) Concrete relational indicators (who did you talk to?) are better than attitudes that are harder to define (who do you like?).

Social Network Data Network Data Sources: Collecting network data Boundary Specification Problem

While students were given the option to name friends in the other school, they rarely did so. As such, the school likely serves as a strong substantive boundary.

Social Network Data Network Data Sources: Collecting network data

Local Network data:
• When using a survey, it is common to use an “ego-network module.”
• First part: a “name generator” question to elicit a list of names
• Second part: working through the list of names to get information about each person named
• Third part: asking about relations among the people named

GSS Name Generator: “From time to time, most people discuss important matters with other people. Looking back over the last six months -- who are the people with whom you discussed matters important to you? Just tell me their first names or initials.” Why this question?

• Only time for one question
• Normative pressure and influence likely travel through strong ties
• Similar to ‘best friend’ or other strong tie generators

Note there are significant substantive problems with this name generator.

Social Network Data Network Data Sources: Collecting network data Electronic Small World name generator:

Social Network Data Network Data Sources: Collecting network data

Local Network data: The second part usually asks a series of questions about each person.
• GSS example: “Is (NAME) Asian, Black, Hispanic, White or something else?”
• ESWP example: [figure]

For N people named, this adds N x (number of attributes) questions to the survey.

Social Network Data Network Data Sources: Collecting network data

Local Network data: The third part usually asks about relations among the alters. Do this by looping over all possible combinations. If you are asking about a symmetric relation, then you can limit your questions to the n(n-1)/2 cells of one triangle of the adjacency matrix.

[Figure: a 5 x 5 adjacency matrix for alters 1 to 5]

GSS: Please think about the relations between the people you just mentioned. Some of them may be total strangers in the sense that they wouldn't recognize each other if they bumped into each other on the street. Others may be especially close, as close or closer to each other as they are to you. First, think about NAME 1 and NAME 2. A. Are NAME 1 and NAME 2 total strangers? B. Are they especially close? PROBE: As close or closer to each other as they are to you?
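The looping step can be sketched as follows, assuming a hypothetical list of five alter names. `itertools.combinations` yields each unordered pair exactly once, which corresponds to one triangle of the alter-by-alter matrix; the question wording simply echoes the GSS items.

```python
from itertools import combinations

# Hypothetical alters produced by a name generator.
alters = ["NAME 1", "NAME 2", "NAME 3", "NAME 4", "NAME 5"]

# For a symmetric relation, ask about each unordered pair once: these are
# the n(n-1)/2 cells in one triangle of the alter-by-alter matrix.
pairs = list(combinations(alters, 2))

for first, second in pairs:
    print(f"Are {first} and {second} total strangers?")
    print("Are they especially close?")

n = len(alters)
print(len(pairs), "pairs; n(n-1)/2 =", n * (n - 1) // 2)
```

With five alters this is 10 pairs; the count grows quadratically, which is one reason alter-alter questions are costly in long name lists.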


Social Network Data Network Data Sources: Collecting network data

Snowball Samples:
• Snowball samples work much the same as ego-network modules, and if time allows I recommend asking at least some of the basic ego-network questions, even if you plan to sample (some of) the people your respondent names.
• Start with a name generator, then any demographic or relational questions.
• Have a sample strategy:
  • Random walk designs (Klovdahl)
  • Strong tie designs
  • All names designs
• Get contact information from the people named.
• Snowball samples are very effective at providing network context around focal nodes. Detailed treatments of snowball sampling estimates are given in Frank ().
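A random walk design can be sketched as repeatedly moving to one randomly chosen contact named by the current respondent. This is a simplified illustration, not Klovdahl's full procedure: the nomination data and the `random_walk_sample` helper are hypothetical, and the nominations are assumed to be already in hand, whereas in the field each step would require a new interview.

```python
import random

# Hypothetical nominations: whom each respondent names.
names = {
    "s1": ["a", "b", "c"],
    "a": ["s1", "d"],
    "b": ["s1"],
    "c": ["d", "e"],
    "d": ["a", "c"],
    "e": ["c"],
}


def random_walk_sample(nominations, seed, steps, rng):
    """Start at the seed and follow one randomly chosen named contact per step.

    Stops early if the current respondent names nobody.
    """
    walk = [seed]
    current = seed
    for _ in range(steps):
        named = nominations.get(current, [])
        if not named:
            break
        current = rng.choice(named)
        walk.append(current)
    return walk


walk = random_walk_sample(names, "s1", steps=4, rng=random.Random(42))
print(walk)
```

Passing a seeded `random.Random` makes the sampled path reproducible, which helps when documenting how respondents were selected.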

Social Network Data Network Data Sources: Collecting network data Snowball Samples:

Social Network Data Network Data Sources: Collecting network data

Complete Network data
• Data collection is concerned with all relations within a specified boundary.
• Requires sampling every actor in the population of interest (all kids in the class, all nations in the alliance system, etc.)
• The network survey itself can be much shorter, because you are getting information from each person (so ego does not report on alters).
• Two general formats:
  • Recall surveys (“Name all of your best friends”)
  • Check-list formats: give people a list of names, have them check off those with whom they have relations.

Social Network Data Network Data Sources: Collecting network data

Complete network surveys require a process that lets you link answers to respondents.
• You cannot have anonymous surveys.
• Recall: need ID numbers & a roster to link, or hand code names to find matches
• Checklists: need a roster for people to check through
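For recall surveys, the linking step can be sketched as matching each recalled name against a roster keyed by ID number, leaving anything that does not match for hand coding. The roster, names, and normalization rule (lowercasing and collapsing whitespace) are hypothetical; real rosters usually need more careful matching (nicknames, misspellings).

```python
# Hypothetical roster (ID -> name) and one respondent's recalled names.
roster = {101: "Ana Diaz", 102: "Ben Okafor", 103: "Cai Lin"}
recalled = ["ben okafor", "Cai  Lin", "Dana"]


def normalize(name):
    """Lowercase and collapse runs of whitespace so trivial differences match."""
    return " ".join(name.lower().split())


lookup = {normalize(name): rid for rid, name in roster.items()}

matched, unmatched = [], []
for name in recalled:
    rid = lookup.get(normalize(name))
    if rid is None:
        unmatched.append(name)  # left for hand coding
    else:
        matched.append(rid)

print("linked IDs:", matched)
print("to hand code:", unmatched)
```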

Social Network Data Network Data Sources: Missing Data

Whatever method is used, data will always be incomplete. What are the implications for analysis?

Example 1. Ego is a matchable person in the school.
[Figure: true network vs. observed network around ego, with alters labeled M, Un, and Out]

Social Network Data Network Data Sources: Missing Data

Example 2. Ego is not on the school roster.
[Figure: true network vs. observed network, with alters labeled M and Un]

Social Network Data Network Data Sources: Missing Data

Example 3:
Node population: 2-step neighborhood of Actor X
Relational population: any connection among all nodes

[Figure: block adjacency matrix with rows and columns for the focal actor F, 1-step contacts (1.1 to 1.5), 2-step contacts (2.1 to 2.8), and 3-step contacts (3.1 to 3.3); each block is marked Full, Full (0), or Unknown (UK)]

Social Network Data Network Data Sources: Missing Data

Example 4:
Node population: 2-step neighborhood of Actor X
Relational population: trace, plus all connections among 1-step contacts

[Figure: block adjacency matrix with rows and columns for the focal actor F, 1-step contacts (1.1 to 1.5), 2-step contacts (2.1 to 2.8), and 3-step contacts (3.1 to 3.3); each block is marked Full, Full (0), or Unknown (UK)]

Social Network Data Network Data Sources: Missing Data

Example 5:
Node population: 2-step neighborhood of Actor X
Relational population: only tracing contacts

[Figure: block adjacency matrix with rows and columns for the focal actor F, 1-step contacts (1.1 to 1.5), 2-step contacts (2.1 to 2.8), and 3-step contacts (3.1 to 3.3); each block is marked Full, Full (0), or Unknown (UK)]

Social Network Data Network Data Sources: Missing Data

Example 6:
Node population: 2-step neighborhood from 3 focal actors
Relational population: all relations among actors

            Focal      1-Step     2-Step     3-Step
  Focal     Full       Full       Full (0)   Full (0)
  1-Step    Full       Full       Full       Full (0)
  2-Step    Full (0)   Full       Full       Unknown
  3-Step    Full (0)   Full (0)   Unknown    Unknown

Social Network Data Network Data Sources: Missing Data

Example 7:
Node population: 1-step neighborhood from 3 focal actors
Relational population: only relations from focal nodes

            Focal      1-Step     2-Step     3-Step
  Focal     Full       Full       Full (0)   Full (0)
  1-Step    Full       Unknown    Unknown    Full (0)
  2-Step    Full (0)   Unknown    Unknown    Unknown
  3-Step    Full (0)   Full (0)   Unknown    Unknown

Local Network Analysis Introduction

Local network analysis uses data from a simple ego-network survey. These might include information on relations among ego’s contacts, but often do not. Questions include:

Population Mixing
• The extent to which one type of person is tied to another type of person (race by race, etc.)

Local Network Composition
• Peer behavior
• Cultural milieu
• Opportunities or resources in the network
• Social support

Local Network Structure
• Network size
• Density
• Holes & constraint
• Concurrency

Dyadic behavior
• Frequency of contact
• Interaction content
• Specific exchange behaviors

Local Network Analysis Introduction

Advantages
• Cost: data are easy to collect and can be sampled
• Methods are relatively simple extensions of common variable-based methods social scientists are already familiar with
• Provides information on the local network context, which is often the primary substantive interest
• Can be used to describe general features of the global network context: population mixing, concurrency, activity distribution (limited)

Disadvantages
• Treats each local network as independent, which is false. The poor performance of ‘number of partners’ for predicting STD spread is a clear example.
• Impossible to account for how position in a larger context affects local network characteristics (“popular with whom?”)
• If “structure matters”, ego-networks strongly constrain the information you can get on overall structure

Local Network Analysis Network Composition

Perhaps the simplest network question is “what types of alters does ego interact with?”

Network composition refers to the distribution of types of people in your network.

Networks tend to be more homogeneous than the population. Using the GSS, Marsden reports heterogeneity in age, education, race and gender. He finds that:
• Age distribution is fairly wide, almost evenly distributed, though lower than the population at large
• Homogeneous by education (30% differ by less than a year, on average)
• Very homogeneous with respect to race (96% are single race)
• Heterogeneous with respect to gender

Local Network Analysis Network Composition Claude Fischer’s book “To Dwell Among Friends” is a classic study of urbanism that makes good use of local network data.

Age heterogeneity varies by ego’s age and across urban settings.

Local Network Analysis Network Composition Claude Fischer’s book “To Dwell Among Friends” is a classic study of urbanism that makes good use of local network data.

Marital composition similarly varies across respondents and settings

Local Network Analysis Network Composition

Calculating network composition using “GSS style” data.

Generally you have a separate variable for each alter characteristic, and you can construct items by summing over the relevant variables. You would, for example, have variables on the age of each alter such as:

  age_alt1  age_alt2  age_alt3  age_alt4  age_alt5
     15        35        20        12        .

You get the mean age, then, with a statement such as:

  meanage=mean(age_alt1, age_alt2, age_alt3, age_alt4, age_alt5);

Be sure you know how the program you use (SAS, SPSS) deals with missing data.
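The missing-data warning matters because the two obvious ways of handling the “.” give different answers. A minimal Python sketch, assuming the missing fifth alter is stored as `None`:

```python
# Alter ages for one respondent; None stands in for the missing "." value.
ages = [15, 35, 20, 12, None]

# Average over the alters actually reported (missing excluded).
observed = [a for a in ages if a is not None]
mean_age = sum(observed) / len(observed)  # 82 / 4

# A common mistake: treating the missing alter as age 0.
naive_mean = sum(a if a is not None else 0 for a in ages) / len(ages)  # 82 / 5

print(mean_age, naive_mean)
```

Dropping the missing alter gives 20.5; counting it as zero pulls the mean down to 16.4, so it is worth checking which behavior your package uses by default.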

Local Network Analysis Network Composition

Calculating local network information from global network data

Define the local neighborhood:
• Distance (1-step, 2-steps, what?)
• Direction of tie: sent, received, or both?

Then:
• Pull the relevant alters
• Match the alters to the variables of interest

Once you decide on a type of tie, you need to get the information of interest in a form similar to that in the example above.
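These steps can be sketched in Python, assuming the global network is stored as a hypothetical list of directed nominations (sender, receiver) and node attributes as a dictionary. `local_network` (a hypothetical helper) takes the tie direction and the distance, using a breadth-first search; the last lines match the selected alters to an attribute.

```python
from collections import deque
from statistics import mean

# Hypothetical global network: directed nominations (sender, receiver).
arcs = [(1, 2), (1, 3), (2, 1), (3, 4), (4, 5), (5, 1)]
# Hypothetical node attribute to match onto the alters.
age = {1: 17, 2: 16, 3: 18, 4: 17, 5: 16}


def one_step(node, direction):
    """Alters one step from node along sent, received, or both kinds of ties."""
    sent = {r for s, r in arcs if s == node}
    received = {s for s, r in arcs if r == node}
    if direction == "sent":
        return sent
    if direction == "received":
        return received
    return sent | received


def local_network(ego, direction="both", steps=1):
    """Alters within `steps` steps of ego (breadth-first search), ego excluded."""
    distance = {ego: 0}
    queue = deque([ego])
    while queue:
        node = queue.popleft()
        if distance[node] == steps:
            continue  # do not expand past the chosen distance
        for alter in one_step(node, direction):
            if alter not in distance:
                distance[alter] = distance[node] + 1
                queue.append(alter)
    return sorted(n for n in distance if n != ego)


sent_alters = local_network(1, "sent")
print(sent_alters)                        # alters ego 1 nominated
print(mean(age[a] for a in sent_alters))  # mean age of those alters
print(local_network(1, "both", steps=2))  # 2-step neighborhood, any direction
```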

Local Network Analysis Network Composition An example network: All senior males from a small (n~350) public HS.

SPAN will do this for you

Local Network Analysis Network Composition

Common composition measures:

Level measures:
• Mean of a given attribute (average income of alters)
• Proportion with a particular attribute (proportion who smoke)
• Counts (number of peers who have had sex)

Dispersion measures:
• Heterogeneity index (racial heterogeneity)
• Index of dissimilarity
• Standard deviation
• Absolute value of the differences
• Variable range of values

Composition measures for multiple variables simultaneously:
• Average correlation across all alters
• Euclidean / Mahalanobis distance measures
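A few of these measures, sketched in Python for one hypothetical ego with four alters. The heterogeneity index here is assumed to be the common 1 - Σ p_k² form (p_k being the share of alters in category k); the slide does not say which index is meant.

```python
from collections import Counter
from statistics import mean, pstdev

# Hypothetical alters of one ego.
alters = [
    {"income": 40, "smokes": True, "race": "White"},
    {"income": 55, "smokes": False, "race": "Black"},
    {"income": 35, "smokes": True, "race": "White"},
    {"income": 70, "smokes": False, "race": "Hispanic"},
]

# Level measures.
mean_income = mean(a["income"] for a in alters)
n_smoke = sum(a["smokes"] for a in alters)  # count (True counts as 1)
prop_smoke = n_smoke / len(alters)          # proportion

# Dispersion: standard deviation of alter income (population form).
sd_income = pstdev(a["income"] for a in alters)

# Dispersion: assumed heterogeneity index 1 - sum(p_k^2) over race categories;
# 0 means all alters share one category.
counts = Counter(a["race"] for a in alters)
heterogeneity = 1 - sum((c / len(alters)) ** 2 for c in counts.values())

print(mean_income, prop_smoke, n_smoke, round(sd_income, 2), heterogeneity)
```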

Local Network Analysis Network Mixing

A common interest in network research is identifying how likely persons of one category are to interact with people of another category. Examples:
• Race mixing: how likely are people of one race to interact with people of another?
• Sexual activity mixing: are people with many partners likely to associate with each other?
• Neighborhood / location mixing: are people likely to name friends from the same neighborhood?

These questions can be answered by cross-classifying the category of the nominator with the category of the nominated in a “mixing matrix”.
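Cross-classifying nominations into a mixing matrix can be sketched with a `Counter` keyed by (nominator category, nominee category). The people, categories, and nominations below are hypothetical.

```python
from collections import Counter

# Hypothetical people, their category, and who nominated whom.
race = {1: "White", 2: "White", 3: "Black", 4: "Black", 5: "Hispanic"}
nominations = [(1, 2), (1, 3), (2, 1), (3, 4), (4, 3), (4, 5), (5, 3)]

# Cross-classify: rows = nominator's category, columns = nominee's category.
mixing = Counter((race[sender], race[receiver]) for sender, receiver in nominations)

categories = ["White", "Black", "Hispanic"]
print(f"{'':<10}" + "".join(f"{c:>10}" for c in categories))
for row in categories:
    # Counter returns 0 for combinations that never occur.
    print(f"{row:<10}" + "".join(f"{mixing[(row, col)]:>10}" for col in categories))
```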

Local Network Analysis Network Mixing Race mixing in one of the Add Health schools

Local Network Analysis Network Mixing

              White   Black   Hispanic   Asian   Mix/Other
  White        1099     128         53       0         231
  Black          97   10218       1032       0         539
  Hispanic       54     961        104       1          91
  Asian           0       0          0       0           0
  Mix/Other     191     560         66       0         106

Local Network Analysis Network Mixing

Working with mixing matrices:
• Group segregation index (Freeman 1972)
• Associations between rows and columns (valued relations)
  • “Assortative mixing”
  • Correlations or Q
  • Log-linear models

Assessing chance levels depends on the data available. If you have full network data, you can look at density between groups; without it, you can only focus on the sheer volume of ties (without information on the size of the “target” groups).
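As an illustration on the Add Health mixing table above, the sketch below computes the share of ties that fall on the diagonal and a Yule's Q for the White/Black 2 x 2 sub-table. Reading the slide's “Q” as Yule's Q is an assumption, and the raw diagonal share ignores group sizes, which is exactly the volume-only limitation noted above.

```python
# Mixing matrix from the Add Health school (rows and columns in this order).
labels = ["White", "Black", "Hispanic", "Asian", "Mix/Other"]
M = [
    [1099, 128, 53, 0, 231],
    [97, 10218, 1032, 0, 539],
    [54, 961, 104, 1, 91],
    [0, 0, 0, 0, 0],
    [191, 560, 66, 0, 106],
]

total = sum(map(sum, M))
within = sum(M[k][k] for k in range(len(M)))
print(f"share of ties within category: {within / total:.3f}")


def yules_q(a, b, c, d):
    """Yule's Q for a 2 x 2 table [[a, b], [c, d]]: (ad - bc) / (ad + bc)."""
    return (a * d - b * c) / (a * d + b * c)


# White/Black sub-table: values near 1 mean ties strongly concentrate
# on the diagonal (same-category ties).
q_wb = yules_q(M[0][0], M[0][1], M[1][0], M[1][1])
print(f"Yule's Q, White vs. Black: {q_wb:.3f}")
```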

Local Network Analysis Network Structure

While network structure data are limited, there are a number of features that can be of interest, assuming you have data on the relations among ego’s contacts. Basic arguments:

a) “Structural amplification”: that some feature of the arrangement of ties amplifies any peer effect of network composition (see Haynie’s paper)

b) “Network range effects”: that being connected to a diverse set of alters, who are not connected to each other, provides profitable returns.
• Granovetter’s “Strength of Weak Ties”, Burt’s “Structural Holes”
• Familiar to students of social theory as the Tertius Gaudens argument from Simmel

In both cases, we use the pattern of ties surrounding ego to characterize the local structure. We start with volume measures, then move on to more complex pattern measures.

Local Network Analysis Network Structure: volume

Network Size

[Figure: Distribution of total network size, GSS 1985; bar chart over network sizes 0, 1, 2, 3, 4, 5, and 6+]

Local Network Analysis Network Structure: volume

Network Size by:
• Age: drops with age at an increasing rate. Elderly have few close ties.
• Education: increases with education. College degree: ~1.8 times larger.
• Sex (Female): no gender differences in network size.
• Race: African Americans’ networks are smaller (2.25) than Whites’ networks (3.1).

Local Network Analysis Network Structure: volume What does Fischer have to say about the size of local nets (by context)?

Local Network Analysis Network Structure: volume

Density is the average value of the relation among all pairs of nodes. For a binary, undirected network with N nodes and T ties:

$$D = \frac{T}{N(N-1)/2}$$

Density is usually calculated over the alters in the network.

[Figure: ego network (node R with alters 1 to 5) and the adjacency matrix among alters 1 to 5, which contains 5 ties]

D = 5 / ((5*4)/2) = 5 / 10 = 0.5
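The density calculation as a short Python sketch. The figure's exact ties are not recoverable here, so the five ties among the five alters are hypothetical; only their count matters for density.

```python
# Hypothetical undirected ties among ego's five alters (each tie listed once).
alters = [1, 2, 3, 4, 5]
ties = [(1, 2), (1, 5), (2, 3), (3, 4), (4, 5)]


def density(nodes, edges):
    """Binary, undirected density: T / (N(N-1)/2)."""
    n = len(nodes)
    possible_pairs = n * (n - 1) / 2
    return len(edges) / possible_pairs


print(density(alters, ties))  # 5 / 10 = 0.5
```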

Local Network Analysis Network Structure: volume What does Fischer have to say about the density of local nets (by context)?

Local Network Analysis Network Structure: volume

In general, dense networks should be more cohesive, and we would expect that “goods” will flow through the network more efficiently.
• Social support & peer influence, for example, should be stronger in dense networks.
• Density is a volume measure, however, and can mask significant structural differences: these two networks, for example, have the same density but very different structures.

Most network analysis programs will calculate ego-network density directly.

Local Network Analysis Network Structure: Weak Ties & Structural Holes

“The Strength of Weak Ties”

In a classic article, Granovetter (1973) argues that for many purposes (such as getting a job), the most useful network contacts are through “weak ties.” This is because weak ties connect you to a more diverse set of alters, increasing the ‘range’ of your network. Your strong ties tend to be tied to each other, making them redundant for the purposes of bringing information.

Essentially, this argument works on a spurious relation: the key value of weak ties is not in the weak affective bond, but in the structural location of the ties. We can measure this directly, and Ron Burt provides a series of measures for doing so.

Local Network Analysis Network Structure: Weak Ties & Structural Holes
[Figure: efficiency by number of contacts, with panels labeled Maximum Efficiency, Decreasing Efficiency, Increasing Efficiency, and Minimum Efficiency]

Local Network Analysis Network Structure: Weak Ties & Structural Holes
Effective Size
Conceptually, the effective size is the number of people ego is connected to, minus the redundancy in the network; that is, it reduces to the non-redundant elements of the network.

Effective size = Size - Redundancy

$$\text{Effective size}_i = \sum_j \Big[\, 1 - \sum_q p_{iq}\, m_{jq} \Big], \qquad q \neq i, j$$

where j indexes all of the people that ego i has contact with, and q is every third person other than i or j. The quantity $\sum_q p_{iq} m_{jq}$ inside the brackets is the level of redundancy between ego and a particular alter, j.

Local Network Analysis Network Structure: Weak Ties & Structural Holes
Effective Size: $p_{iq}$ is the proportion of actor i’s relations that are spent with q.

[Figure: the five-node example network]

Adjacency matrix A and proportion matrix P (rows and columns are nodes 1 to 5):

$$A = \begin{pmatrix} 0&1&1&1&1 \\ 1&0&0&0&1 \\ 1&0&0&0&0 \\ 1&0&0&0&1 \\ 1&1&0&1&0 \end{pmatrix} \qquad P = \begin{pmatrix} .00&.25&.25&.25&.25 \\ .50&.00&.00&.00&.50 \\ 1.0&.00&.00&.00&.00 \\ .50&.00&.00&.00&.50 \\ .33&.33&.00&.33&.00 \end{pmatrix}$$
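A minimal Python sketch of how P follows from the adjacency matrix in this binary, undirected example: each row is divided by its row sum, that is, by that actor's number of ties. It covers only the binary, symmetric case shown here; the function name is hypothetical.

```python
A = [
    [0, 1, 1, 1, 1],
    [1, 0, 0, 0, 1],
    [1, 0, 0, 0, 0],
    [1, 0, 0, 0, 1],
    [1, 1, 0, 1, 0],
]


def proportional_ties(adj):
    """p_iq: the share of actor i's relations spent with q (row divided by row sum)."""
    P = []
    for row in adj:
        total = sum(row)
        # An isolate has no relations to divide up, so its row stays all zeros.
        P.append([x / total if total else 0.0 for x in row])
    return P


for row in proportional_ties(A):
    print(" ".join(f"{x:.2f}" for x in row))
```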

Local Network Analysis Network Structure: Weak Ties & Structural Holes
Effective Size: $m_{jq}$ is the marginal strength of contact j’s relation with contact q, which is j’s interaction with q divided by j’s strongest interaction with anyone.

For a binary network, the strongest link is always 1, and thus $m_{jq}$ reduces to 0 or 1 (whether j is connected to q or not). The sum of the products, $\sum_q p_{iq} m_{jq}$, measures the portion of i’s relation with j that is redundant to i’s relations with other primary contacts.

Local Network Analysis Network Structure: Weak Ties & Structural Holes
Effective Size
Working with 1 as ego, the P matrix above gives the following redundancy terms $p_{1q} m_{jq}$ (rows are alters j, columns are third parties q; ego's own column is omitted):

$$\begin{array}{c|cccc} p_{1q} m_{jq} & q=2 & q=3 & q=4 & q=5 \\ \hline j=2 & .00 & .00 & .00 & .25 \\ j=3 & .00 & .00 & .00 & .00 \\ j=4 & .00 & .00 & .00 & .25 \\ j=5 & .25 & .00 & .25 & .00 \end{array}$$

Redundancy = .25 + .00 + .25 + .50 = 1

Effective size = 4 - 1 = 3
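The same calculation as a minimal Python sketch of Burt's formula, with nodes 1 to 5 stored at indices 0 to 4. It computes $m_{jq}$ as j's tie to q divided by j's strongest tie, which for binary data is just 0 or 1; the function name is hypothetical.

```python
A = [
    [0, 1, 1, 1, 1],
    [1, 0, 0, 0, 1],
    [1, 0, 0, 0, 0],
    [1, 0, 0, 0, 1],
    [1, 1, 0, 1, 0],
]


def effective_size(adj, i):
    """Burt's effective size: sum over contacts j of [1 - sum_q p_iq * m_jq], q != i, j."""
    n = len(adj)
    total_i = sum(adj[i])
    if total_i == 0:
        return None  # isolates have no contacts, so the measure does not apply
    p_i = [adj[i][q] / total_i for q in range(n)]  # proportion of i's relations spent with q

    def m(j, q):
        # j's tie to q relative to j's strongest tie (0 or 1 in a binary network)
        strongest = max(adj[j])
        return adj[j][q] / strongest if strongest else 0.0

    size = 0.0
    for j in range(n):
        if j == i or adj[i][j] == 0:
            continue  # only ego's contacts count
        redundancy = sum(p_i[q] * m(j, q) for q in range(n) if q not in (i, j))
        size += 1 - redundancy
    return size


print(effective_size(A, 0))  # ego 1 -> 3.0
```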

Local Network Analysis Network Structure: Weak Ties & Structural Holes
Effective Size
When you work it out, in a binary network redundancy reduces to the average degree of ego’s alters, not counting their ties with ego. Average degree is closely related to density (average degree = density × (n - 1)), so we can calculate redundancy as 2t/n, where t is the number of ties (not counting ties to ego) and n is the number of people in the network (not counting ego).

This means that effective size = n - 2t/n. For ego 1 above, n = 4 and t = 2, so effective size = 4 - 4/4 = 3. UCINET, STRUCTURE, SPAN and PAJEK all calculate effective size.

Local Network Analysis Network Structure: Weak Ties & Structural Holes
Efficiency is simply effective size divided by observed size. Taken from each ego’s point of view, efficiency in this network would be:

$$\begin{array}{lccccc} \text{Ego} & 1 & 2 & 3 & 4 & 5 \\ \hline \text{Effective size} & 3 & 1 & 1 & 1 & 1.67 \\ \text{Size} & 4 & 2 & 1 & 2 & 3 \\ \text{Efficiency} & .75 & .50 & 1.00 & .50 & .56 \end{array}$$
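A minimal Python sketch that reproduces the efficiency table using the binary shortcut, effective size = n - 2t/n, and then divides by size. It assumes an undirected, binary edge list (the five-node example network); the function name is hypothetical, and isolates return None because the measures do not apply to them.

```python
from itertools import combinations

EDGES = [(1, 2), (1, 3), (1, 4), (1, 5), (2, 5), (4, 5)]


def size_and_effective_size(edges, ego):
    """Return (n, n - 2t/n) for ego in a binary, undirected network."""
    ties = {frozenset(e) for e in edges}
    alters = {b if a == ego else a for a, b in edges if ego in (a, b)}
    n = len(alters)
    if n == 0:
        return None  # isolate: no alters, so neither measure applies
    # t = ties among the alters themselves (ties to ego are not counted)
    t = sum(1 for a, b in combinations(sorted(alters), 2) if frozenset((a, b)) in ties)
    return n, n - 2 * t / n


for ego in range(1, 6):
    n, eff = size_and_effective_size(EDGES, ego)
    print(f"ego {ego}: size {n}, effective size {eff:.2f}, efficiency {eff / n:.2f}")
```

For this network the shortcut gives the same effective sizes as Burt's full formula, which is what the binary reduction predicts.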

Local Network Analysis Network Structure: Weak Ties & Structural Holes
Constraint
Conceptually, constraint refers to how much room you have to negotiate or exploit potential structural holes in your network.

“..opportunities are constrained to the extent that (a) another of your contacts q, in whom you have invested a large portion of your network time and energy, has (b) invested heavily in a relationship with contact j.” (p.54)

$$c_{ij} = \Big( p_{ij} + \sum_q p_{iq}\, p_{qj} \Big)^2, \qquad q \neq i, j$$

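Local Network Analysis Network Structure: Weak Ties & Structural Holes
Constraint
[Figure: ego i with a direct tie $p_{ij}$ to contact j, and an indirect path $p_{iq}$, $p_{qj}$ through contact q]

$$c_{ij} = \Big( \underbrace{p_{ij}}_{\text{direct investment}} + \underbrace{\textstyle\sum_q p_{iq}\, p_{qj}}_{\text{indirect investment}} \Big)^2$$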

Local Network Analysis Network Structure: Weak Ties & Structural Holes
Constraint
Given the P matrix, you can get indirect constraint ($\sum_q p_{iq} p_{qj}$) by simply squaring the matrix, since the (i, j) cell of $P^2$ is exactly $\sum_q p_{iq} p_{qj}$. The terms with q = i or q = j drop out on their own because $p_{ii} = p_{jj} = 0$. Diagonal cells are not used:

$$P^2 = \begin{pmatrix} \cdot & .083 & .000 & .083 & .250 \\ .167 & \cdot & .125 & .292 & .125 \\ .000 & .250 & \cdot & .250 & .250 \\ .167 & .292 & .125 & \cdot & .125 \\ .333 & .083 & .083 & .083 & \cdot \end{pmatrix}$$

Local Network Analysis Network Structure: Weak Ties & Structural Holes
Constraint
Total constraint between any two people, then, is:

$$[c_{ij}] = \left(P + P^2\right)^{\circ 2}$$

where P is the normalized (proportional) adjacency matrix, and $^{\circ 2}$ means to square each element of the matrix, not to multiply the matrix by itself.
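A minimal Python sketch of the matrix route, using plain lists so it runs without extra libraries: build P, multiply it by itself for the indirect term, add the direct term, and square element by element. For ego 1 it yields dyadic constraints of about .11, .06, .11 and .25 (contacts 2 to 5) and an aggregate constraint of about .53. Function names are hypothetical.

```python
A = [
    [0, 1, 1, 1, 1],
    [1, 0, 0, 0, 1],
    [1, 0, 0, 0, 0],
    [1, 0, 0, 0, 1],
    [1, 1, 0, 1, 0],
]


def proportions(adj):
    # p_iq: share of i's relations spent with q (row divided by row sum).
    P = []
    for row in adj:
        total = sum(row)
        P.append([x / total if total else 0.0 for x in row])
    return P


def matmul(a, b):
    # Plain matrix product: cell (i, j) = sum over q of a[i][q] * b[q][j].
    return [[sum(a[i][q] * b[q][j] for q in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def constraint(adj):
    """c_ij = (p_ij + sum_q p_iq p_qj)^2, i.e. the element-wise square of P + P^2."""
    P = proportions(adj)
    P2 = matmul(P, P)  # indirect investment; q = i or q = j terms vanish since p_ii = 0
    n = len(P)
    return [[(P[i][j] + P2[i][j]) ** 2 if i != j else 0.0 for j in range(n)]
            for i in range(n)]


C = constraint(A)
ego = 0  # node 1
dyadic = [C[ego][j] for j in range(len(A)) if A[ego][j]]
print([round(c, 2) for c in dyadic], round(sum(dyadic), 2))
# [0.11, 0.06, 0.11, 0.25] 0.53
```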

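Local Network Analysis Network Structure: Weak Ties & Structural Holes
Hierarchy
Conceptually, hierarchy (for Burt) is really the extent to which constraint is concentrated in a single actor. It is calculated as:

$$H = \frac{\sum_j \left(\dfrac{c_{ij}}{C/N}\right) \ln\!\left(\dfrac{c_{ij}}{C/N}\right)}{N \ln(N)}$$

where N is the number of ego's contacts and C is ego's aggregate constraint, $C = \sum_j c_{ij}$.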

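Local Network Analysis Network Structure: Weak Ties & Structural Holes
Hierarchy
[Figure: the five-node example network, with 1 as ego]

For ego 1, with N = 4 contacts and aggregate constraint C = .53:

$$\begin{array}{lcccc} j & 2 & 3 & 4 & 5 \\ \hline c_{1j} & .11 & .06 & .11 & .25 \\ c_{1j}/(C/N) & .83 & .47 & .83 & 1.87 \end{array}$$

The numerator of H sums to about .51; dividing by $N \ln(N) = 4 \ln 4 \approx 5.55$ gives H ≈ .09.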
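A minimal Python sketch of Burt's hierarchy formula, taking ego's dyadic constraints as input. The example input uses the exact fractions behind the rounded values above (1/9, 1/16, 1/9, 1/4); the function name is hypothetical.

```python
import math


def hierarchy(dyadic_constraints):
    """Burt's hierarchy: how concentrated ego's constraint is on a single contact."""
    n = len(dyadic_constraints)
    if n < 2:
        raise ValueError("hierarchy needs at least two contacts")  # N ln N = 0 when N = 1
    total = sum(dyadic_constraints)  # aggregate constraint C
    ratios = [c / (total / n) for c in dyadic_constraints]
    # r * ln(r) -> 0 as r -> 0, so contacts that add no constraint are skipped.
    numerator = sum(r * math.log(r) for r in ratios if r > 0)
    return numerator / (n * math.log(n))


print(round(hierarchy([1 / 9, 1 / 16, 1 / 9, 1 / 4]), 3))  # 0.092
```

Equal constraints give H = 0, and all constraint coming from one contact gives H = 1, which matches the idea of hierarchy as concentration.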

Local Network Analysis Network Structure: Weak Ties & Structural Holes
Burt (2004) AJS 110:349-399


Local Network Analysis Local Network Models: Modeling Issues
Local Network modeling issues
Case independence
In very clustered settings, the alters that each person names will overlap. This will lead to non-independence among the cases.
•If you have enough cases or over-time data, you can use random or fixed effect models.
•If you know the names of alters, you can link them to build in a direct network autocorrelation effect.

Small network effects
Be aware of the size of your networks. Substantively, having a 50% white network means something different in a net of size 2 than in a net of size 10. I often suggest interaction terms (e.g., with network size) to check for these kinds of effects.

Dealing with isolates
•Isolated nodes have no network alters, so none of these measures apply. Depending on the context, you can either leave them out of the analysis or use interaction terms to selectively apply the measures of interest.

Local Network Analysis Local Network Models: Modeling Issues
•Selection
 - That some unobserved factor, z, creates both friendships and the outcome of interest.
•Endogeneity
 - That the causal order of peer relations and outcomes is reversed: peers do not cause Y, but Y causes friendship relations.

Local Network Analysis Local Network Models: Modeling Issues
Selection
•What do we know about how friendships form?
•Opportunity / focal factors
 - Being members of the same group
 - In the same class
 - On the same team
 - Members of the same church
•Structural Relationship factors
 - Reciprocity
 - Social Balance
•Behavior Homophily
 - Smoking
 - Drinking

Local Network Analysis Local Network Models: Modeling Issues
Selection
How to correct this problem?
•Essentially, this is an omitted variable problem, and the obvious “solution” has been to identify as many potentially relevant alternative variables as you can find.
•Sensitivity measures (see Ken Frank’s work here)
•Propensity score matching
•Individual-level fixed effect models
 - Substantively, you only look at change in Y as a function of change in X, holding constant (because dummied out) any individual-level effect.
 - This works, but it’s drastic: any endogenous effect of networks on the self is essentially removed.
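A small simulation can show both halves of the fixed-effects argument. It uses a hypothetical data-generating process, not one from the source: an unobserved, stable trait z raises both a peer measure x (selection) and the outcome y, and x has a true effect of 1. Pooled OLS mixes in z's influence; in expectation its slope is cov(x, y)/var(x) = 4/2 = 2. Subtracting each person's own mean (the within transformation that individual dummies perform) sweeps z out and should recover a slope near 1. The same step would also sweep out any stable network effect, which is what makes the approach drastic.

```python
import random
from collections import defaultdict


def ols_slope(x, y):
    """Bivariate OLS slope of y on x: cov(x, y) / var(x)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def demean_by_person(values, ids):
    """Subtract each person's own mean (the within / fixed-effects transformation)."""
    totals, counts = defaultdict(float), defaultdict(int)
    for v, i in zip(values, ids):
        totals[i] += v
        counts[i] += 1
    return [v - totals[i] / counts[i] for v, i in zip(values, ids)]


rng = random.Random(42)
TRUE_EFFECT = 1.0
x, y, ids = [], [], []
for person in range(2000):
    z = rng.gauss(0, 1)  # unobserved, time-invariant trait
    for wave in range(2):
        xi = z + rng.gauss(0, 1)  # peer measure partly selected on z
        yi = TRUE_EFFECT * xi + 2 * z + rng.gauss(0, 1)  # z also raises the outcome
        x.append(xi)
        y.append(yi)
        ids.append(person)

pooled = ols_slope(x, y)
within = ols_slope(demean_by_person(x, ids), demean_by_person(y, ids))
print(f"pooled: {pooled:.2f}  within: {within:.2f}")
```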

Local Network Analysis Local Network Models: Modeling Issues
Endogeneity
Estimated: $Y = b_0 + b_1 P + e$, where P = some peer function.

But the actual model may really be: $P = b'_0 + b'_1 Y + e$

Local Network Analysis Local Network Models: Modeling Issues
Endogeneity
Does it matter?

Algebraically, the relation between y and p should be a direct translation of the coefficients, since:

$$\text{If } y = b_0 + b_1 p, \quad \text{then } p = -\frac{b_0}{b_1} + \frac{1}{b_1}\, y$$

The statistical problem of endogeneity is that when you estimate $b'_1$, it does not equal $1/b_1$, because of our assumptions about the right-hand-side variable (x), and hence about where the error, e, sits. There are other models that make different assumptions, where this direction is irrelevant, but they are uncommon and hard to work with in the multivariate context (see Joel H. Levine, Exceptions are the Rule, for a full discussion of this).
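A minimal simulation of the point about $b'_1$, using a hypothetical setup that is not from the source: y depends on p with $b_1 = 0.5$ plus noise, and there is no reverse causation at all. Regressing p on y still does not give $1/b_1$, because OLS puts all of the error in whichever variable is on the left. In fact the product of the two slopes equals the squared correlation between y and p, so $b'_1 = 1/b_1$ only when the fit is perfect.

```python
import random


def ols_slope(x, y):
    """Bivariate OLS slope of y on x: cov(x, y) / var(x)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


rng = random.Random(7)
B1 = 0.5
p = [rng.gauss(0, 1) for _ in range(5000)]   # peer measure
y = [B1 * pi + rng.gauss(0, 1) for pi in p]  # outcome: y = b1 * p + e

b1_hat = ols_slope(p, y)  # forward regression, close to 0.5
b1_rev = ols_slope(y, p)  # reverse regression, close to 0.5 / (0.25 + 1) = 0.4, not 2
print(f"b1 = {b1_hat:.2f}, 1/b1 = {1 / b1_hat:.2f}, b'1 = {b1_rev:.2f}")
```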

Local Network Analysis Local Network Models: Modeling Issues
Possible solutions:
• Theory: Given what we know about how friendships form, is it reasonable to assume a bi-directional cause? That is, work through the meeting, socializing, etc. process and ask whether it makes sense that Y is a cause of P.
• Models:
 - Time Order. We are on somewhat firmer ground if P precedes Y in time.
 - Simultaneous Equation Models. Model both the friendship pattern and the outcome of interest simultaneously. It is difficult, however, to identify “instruments” or to specify orderings that do not make the model logically inestimable.

Local Network Analysis Local Network Models: Peer influence example
Haynie asks whether peers matter for delinquent behavior, focusing on:
a) the distinction between selection and influence
b) the effect of friendship structure on peer influence
Two basic theories underlie her work:
a) Hirschi’s Social Control Theory
•Social bonds constrain otherwise criminal behavior
•The theory itself is largely ambivalent toward the direction of network effects
b) Sutherland’s Differential Association
•Behavior is the result of internalized definitions of the situation
•The effect of peers is through communication of the appropriateness of particular behaviors
Haynie adds to these the idea that the structural context of the network can “boost” the effect of peers: (a) transmission is more effective in locally dense networks, and (b) the effect of peers is stronger on central actors.

Local Network Analysis Local Network Models: Peer influence example
