KliqueFinder: Identifying Clusters in Network Data

Kenneth A. Frank, Michigan State University

Based on:

• Frank, K.A. 1995. Identifying Cohesive Subgroups. Social Networks 17: 27-56.
• Frank, K. 1996. Mapping interactions within and between cohesive subgroups. Social Networks 18: 93-119.
• *Field, S., *Frank, K.A., Schiller, K., Riegle-Crumb, C., and Muller, C. (2006). "Identifying Social Contexts in Affiliation Networks: Preserving the Duality of People and Events." Social Networks 28: 97-123. (* co-first authors.)

https://www.msu.edu/user/k/e/kenfrank/web/research.htm#representation

Overview

• Clustering and Graphical Representations of Networks
• Running KliqueFinder...
  – Step 1) Criteria for Determining Group Membership
  – Step 2) Maximizing Criterion
  – Step 3) Examine evidence of clusters
  – Step 4) Evaluating the Performance of the Algorithm: Did...
• Make Sociogram in Netdraw
• Confidentiality/Ethical issues in Collecting Network Data
• Modifying the Image: Adding Node Data or Relations...
• Two mode
• Software Challenge...
• Batch KliqueFinder
• Prepping and Converting data
• A Priori Clusters

Clustering and Graphical Representations of Networks

Video (26:09-31:41). ID: [email protected] PW: kenfrank2014

Goal: to identify patterns in the network

• Rearrange rows and columns of the social network matrix to reveal clustering
• Plot actors and ties in two dimensions to reveal clustering

Theory for defining cluster membership

• Cohesion (clusters are called subgroups): an actor should be in a cluster if the actor has demonstrated a preference for engaging in ties with members of the cluster.
  – Result: ties are concentrated within subgroups.
• Structural equivalence (blocks): an actor should be in a cluster if the actor engages in a similar pattern of ties as members of that cluster.
  – Result: blocks represent positions, but ties are not necessarily concentrated within blocks.

Crystallized Sociogram: Friendships Among the French Financial Elite

• Lines indicate friendships: solid within subgroups, dotted between subgroups.
• Numbers represent actors.
• Rgt, Cen, Soc, Non = political parties; B = Banker; T = Treasury; E = École Nationale d'Administration.

Frank, K.A. & Yasumoto, J. (1998). "Linking Action to Social Structure within a System: Social Capital Within and Between Subgroups." American Journal of Sociology, Volume 104, No. 3, pages 642-686.

Crystallized Sociogram: Clusters in Foodwebs

Krause, A., Frank, K.A., Mason, D.M., Ulanowicz, R.E. and Taylor, W.M. (2003). "Compartments exposed in food-web structure." Nature 426: 282-285.

Data Input

• File name must be less than 20 characters. Best if the file name is six characters followed by .list (xxxxxx.list), for example stanne.list.
• Old format: 10 spaces for each entry. Example row: actor 1 interacts with actor 2 at a level of 3.
• Extent of relation can be binary or weighted.
• New format: flexible columns, same results.
• IDs should be 6 digits or less.

See also: Prepping data in Excel; Prepping Data in UCINET; Converting data using SAS.

Data: Edgelist

• The first two rows do not appear in the data; I put them there to show the format: 10 spaces for each entry.
• Example row: actor 1 interacts with actor 2 at a level of 3.
• Extent of relation can be binary or weighted.
• Best if the file name is six characters followed by .list (xxxxxx.list), for example stanne.list.
• The new version of KliqueFinder is more flexible about the 10-column widths.
• IDs should be 6 digits or less.

See also: Prepping data in Excel; Prepping Data in UCINET; Converting data using SAS.
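The fixed-width layout described above (three entries of 10 columns each) can be sketched in a few lines of Python. This is only an illustration, not part of KliqueFinder: the function names are hypothetical, and right-aligning each entry within its 10 columns is an assumption (it matches what a numeric `10.` format in SAS would produce).

```python
from pathlib import Path

def format_edgelist(ties, width=10):
    """Return one fixed-width line per tie: chooser, chosen, weight.

    Each entry is right-aligned in `width` columns (alignment is an assumption).
    """
    lines = []
    for chooser, chosen, weight in ties:
        if len(str(chooser)) > 6 or len(str(chosen)) > 6:
            raise ValueError("IDs should be 6 digits or less")
        lines.append(f"{chooser:>{width}}{chosen:>{width}}{weight:>{width}}")
    return lines

def write_edgelist(path, ties):
    """Write the ties to a .list file, checking the file-name length rule."""
    if len(Path(path).name) >= 20:
        raise ValueError("file name must be less than 20 characters")
    Path(path).write_text("\n".join(format_edgelist(ties)) + "\n")

# Hypothetical example: actor 1 interacts with actor 2 at a level of 3, etc.
example_ties = [(1, 2, 3), (1, 4, 3), (2, 26, 3)]
example_lines = format_edgelist(example_ties)
```

Calling `write_edgelist("stanne.list", example_ties)` would then produce a file in the old 10-column layout.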

Steps for finding clusters

Video (31:41-43:30). ID: [email protected] PW: kenfrank2014

1) Determine criterion for defining clusters
2) Maximize criterion
3) Examine evidence of clusters
4) Evaluate performance of the algorithm
5) Interpret clusters
   – commonality of attributes
   – focal experiences
   – subsequent behavior

Step 1) Criteria for Determining Group Membership

Structural Equivalence:

• Factor analyze the sociomatrix (Katz & Kahn)
• Iteratively rearrange and revalue rows and columns (CONCOR; White et al., 1976)

Cohesion:

• Utilize fixed criteria (e.g., must be connected to at least k others in the cluster, or must be within a minimal path length from k others, etc.)
• Use a flexible criterion: preference relative to group sizes and number of ties

Model Based Cohesion

• W_ii' = 1 if there is a tie between actors i and i', 0 otherwise.
• samegroup_ii' = 1 if actors i and i' are members of the same subgroup, 0 otherwise.

Then θ1 represents subgroup salience. So: maximize θ1 (odds ratio).

Odds Ratio for Association Between Common Subgroup Membership and the Occurrence of Ties Between Actors
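As a rough illustration of this odds ratio, the sketch below cross-tabulates every ordered pair of actors by whether a tie occurs and whether the two actors share a subgroup, then forms AD/BC. The cell labels (A = different subgroup, no tie; B = different subgroup, tie; C = same subgroup, no tie; D = same subgroup, tie), the function names and the toy data are assumptions made for the example. Elsewhere the slides note that θ1 = log odds/2, so the raw log odds ratio computed here should not be read as KliqueFinder's θ1.

```python
import math
from itertools import permutations

def tie_by_samegroup_table(ties, group):
    """Count ordered pairs (i, i') by tie occurrence and common subgroup.

    ties:  set of (i, i') pairs with a tie
    group: dict mapping actor -> subgroup label
    Returns (a, b, c, d) as described in the lead-in.
    """
    a = b = c = d = 0
    for i, j in permutations(group, 2):
        same = group[i] == group[j]
        tie = (i, j) in ties
        if not same and not tie:
            a += 1
        elif not same and tie:
            b += 1
        elif same and not tie:
            c += 1
        else:
            d += 1
    return a, b, c, d

def odds_ratio(a, b, c, d):
    """Odds ratio AD/BC; undefined (ZeroDivisionError) if b or c is zero."""
    return (a * d) / (b * c)

# Hypothetical toy network: two subgroups of two actors each.
example_group = {1: "X", 2: "X", 3: "Y", 4: "Y"}
example_ties = {(1, 2), (2, 1), (3, 4), (1, 3)}
example_table = tie_by_samegroup_table(example_ties, example_group)
example_or = odds_ratio(*example_table)
example_log_or = math.log(example_or)
```

Here three of the four same-subgroup pairs have ties but only one of the eight different-subgroup pairs does, so the odds ratio is large.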

Step 2: Maximizing Criterion

• 1) Find a subgroup seed (3 actors who interact with each other, and with similar others).
• 2) Add to the cluster to maximize θ1 until you cannot do any more.
• 3) Start a new subgroup with a new seed.
• 4) Shuffle between existing subgroups.
• 5) Make new subgroups as necessary, dissolve existing ones as necessary.

KliqueFinder Algorithm: Phase I

Computationally intensive; modify for large networks.

• Initialize: assign each actor to its own subgroup.
• Find a subgroup seed of 2 or 3. For finding the best subgroup seed:
  1) can only choose from unaffiliated actors;
  2) each actor can only be a seed once.
• Identify the single move that most increases the objective function θ1.
• Does the move increase the function?
  – Yes: reassign the actor that makes the best move. If the assignment moves an actor out of a group of 3, reassign the remaining 2 to their next best groups.
  – No: continue to Phase II.

KliqueFinder Algorithm: Phases II and III

• Phase II: If the best move does not increase the objective function and there are fewer than 3 actors available for subgroups, then:
  – Attach all isolated (or singleton) actors to the best existing subgroups, even if this reduces the objective function.
• Phase III: Shuffle actors between existing subgroups without seeding new ones or disbanding existing ones.
  – Number of subgroups is fixed.
  – This is simple hill climbing and can be cast as an EM algorithm.
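The Phase III shuffle can be sketched as plain hill climbing. The code below is a simplified illustration under stated assumptions, not KliqueFinder's implementation: the objective is the log odds ratio of ties by common subgroup membership, 0.5 is added to every cell to avoid division by zero, and all names are hypothetical. On each pass it applies the single reassignment that most increases the objective, never empties a subgroup, and stops when no move helps.

```python
import math
from itertools import permutations

def log_odds_ratio(ties, group):
    """Log odds ratio of tie by common subgroup membership (ordered pairs)."""
    a = b = c = d = 0.5  # 0.5 per cell is an assumption, to avoid zero cells
    for i, j in permutations(group, 2):
        same = group[i] == group[j]
        tie = (i, j) in ties
        if same and tie:
            d += 1
        elif same:
            c += 1
        elif tie:
            b += 1
        else:
            a += 1
    return math.log((a * d) / (b * c))

def shuffle_actors(ties, group):
    """Hill climbing: repeatedly apply the best single reassignment."""
    group = dict(group)
    labels = sorted(set(group.values()))  # number of subgroups stays fixed
    while True:
        best_value = log_odds_ratio(ties, group)
        best_move = None
        for actor in list(group):
            home = group[actor]
            # Do not disband a subgroup by removing its last member.
            if sum(1 for g in group.values() if g == home) == 1:
                continue
            for label in labels:
                if label == home:
                    continue
                group[actor] = label  # try the move
                value = log_odds_ratio(ties, group)
                if value > best_value + 1e-12:
                    best_value, best_move = value, (actor, label)
            group[actor] = home  # undo the trial moves
        if best_move is None:
            return group
        group[best_move[0]] = best_move[1]

# Hypothetical example: two cliques of three, with actor 3 misplaced in "B".
cliques = [(1, 2, 3), (4, 5, 6)]
example_ties = {(i, j) for clique in cliques for i in clique for j in clique if i != j}
start = {1: "A", 2: "A", 3: "B", 4: "B", 5: "B", 6: "B"}
result = shuffle_actors(example_ties, start)
```

In this example the only improving move is to shift actor 3 into subgroup "A", after which no single reassignment raises the objective.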

Running KliqueFinder

Video (43:30-1:01:00). ID: [email protected] PW: kenfrank2014

• Download KliqueFinder at http://hlmsoft.net/wkf/
  – Follow the instructions to install. Put it in c:\kliqfind
  – Mac users: VMware Fusion, Windows 7, 32 bit: http://store.vmware.com/store/vmware/pd/productID.165310200/Currency.USD/
• Click on the “Browse…” button to specify the directory where the data file is located.
• Choose “Basic setup” and then click the “Run setup file” button.
• Click on the “Browse” button to choose a data file.
• Run the analysis on the data file.

New Version of Data Input is More Flexible

• File name must be less than 20 characters.
• IDs should be 6 digits or less.
• Example row: actor 1 interacts with actor 2 at a level of 3.
• Extent of relation can be binary or weighted.
• New: flexible columns. Old: 10 spaces for each entry. Same results.

See also: Prepping data in Excel; Prepping Data in UCINET; Converting data using SAS.

View Clusters Output

Blocked Network Data

 N  Group And Actor Id
 24
            |AAAA|BBBBBB|CCCCCCCC|DDDDDD|
            |    |      |        |      |
            | 2 1|221  1|  11   2|111122|
Group     ID|7445|612214|98133560|796037|
------------+----+------+--------+------+
  1 A      7|A213|......|........|...1..|
  1 A     24|4A3.|......|.4......|......|
  1 A      4|33A.|......|........|......|
  1 A     15|433A|......|........|......|
------------+----+------+--------+------+
  2 B     26|.2..|B443..|........|......|
  2 B     21|.1..|4B....|...4....|....2.|
  2 B     12|....|4.B...|........|......|
  2 B      2|....|33.B..|........|...1..|
  2 B      1|..3.|3..3B.|........|.3..2.|
  2 B     14|....|....1B|........|......|
------------+----+------+--------+------+
  3 C      9|....|......|C...3.33|.3....|
  3 C      8|.4..|..4...|.C.4..4.|4.....|
  3 C     11|....|......|33C.4.3.|..4...|
  3 C     13|.4..|.4....|444C....|......|
  3 C      3|3...|.4....|4.44C...|......|
  3 C      5|.1..|.....4|3.2.3C..|......|
  3 C      6|....|......|444..4C4|......|
  3 C     20|....|......|3..3.44C|......|
------------+----+------+--------+------+
  4 D     17|.1..|......|.1......|D.1...|
  4 D     19|....|......|4.3.....|3D4...|
  4 D     16|....|......|4..4...4|44D...|
  4 D     10|..3.|...1..|........|...D3.|
  4 D     23|....|.3....|........|.343D.|
  4 D     27|.1..|.1....|........|.3..3D|

θ1 = 1.1738

Step 3) Examine evidence of clusters

1) Randomly redistribute ties
2) Apply algorithm
3) Record value of odds ratio and θ1
4) Repeat 1000 times to generate distribution
5) Use mean of distribution as baseline for comparison
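The five steps above amount to a Monte Carlo loop, sketched below. The slides do not say exactly how KliqueFinder redistributes ties, so the scheme here (each actor keeps the same number of ties and the same weights, but the targets are drawn at random) is an assumption, and `statistic` is a hypothetical stand-in for "apply the algorithm and record θ1".

```python
import random
from statistics import mean

def redistribute_ties(ties, actors, rng):
    """Randomly redistribute ties, keeping each chooser's tie count and weights.

    ties: dict mapping (chooser, chosen) -> weight
    """
    weights_by_chooser = {}
    for (chooser, _chosen), weight in ties.items():
        weights_by_chooser.setdefault(chooser, []).append(weight)
    new_ties = {}
    for chooser, weights in weights_by_chooser.items():
        others = [a for a in actors if a != chooser]
        # Draw distinct random targets, one per original tie.
        for chosen, weight in zip(rng.sample(others, len(weights)), weights):
            new_ties[(chooser, chosen)] = weight
    return new_ties

def baseline(ties, actors, statistic, replications=1000, seed=0):
    """Steps 1-5: simulate, apply `statistic`, and return the mean and all values."""
    rng = random.Random(seed)
    values = [statistic(redistribute_ties(ties, actors, rng))
              for _ in range(replications)]
    return mean(values), values

# Hypothetical example; `len` stands in for the clustering statistic.
example_ties = {(1, 2): 3, (1, 3): 1, (2, 3): 4}
example_actors = [1, 2, 3, 4]
example_random = redistribute_ties(example_ties, example_actors, random.Random(1))
example_mean, example_values = baseline(example_ties, example_actors, len, replications=20)
```

In a real run, `statistic` would cluster each simulated network and return its θ1, and the observed θ1 would be compared with the mean of the simulated values.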

Randomly Redistributing Ties

Apply Algorithm to Random Data: θ1 = .81822

Monte Carlo Sampling Distribution

Video (1:06:35-1:18:50). ID: [email protected] PW: kenfrank2014

• Set up sampling.
• Indicate simulate data.
• Data can include weights.
• Output in sampdist.dat: θ1 (= log odds/2) and odds ratio.
• Remember to do the “new data” setup when done, to prepare for the next analysis.

Code for Reading in Sample Distribution Data

SPSS

GET DATA /TYPE=TXT
  /FILE="C:\KLIQFIND\sampdist.dat"
  /FIXCASE=1
  /ARRANGEMENT=FIXED
  /FIRSTCASE=1
  /IMPORTCASE=ALL
  /VARIABLES=
  /1 theta1 0-29 F30.10
  oddsratio 30-59 F30.10
  samplesize 60-89 F30.10.
CACHE.
EXECUTE.
DATASET NAME DataSet9 WINDOW=FRONT.
DATASET ACTIVATE DataSet9.
GRAPH /HISTOGRAM=theta1.

SAS

title "Sampling distribution for theta1";
data one;
infile "sampdist.dat" missover;
input theta1 odds1;
proc univariate plot;
var theta1;
run;

Stata

*This command imports the data file
import delimited C:\KLIQFIND\sampdist.dat, delimiter(" ", asstring)
*These commands perform data management:
drop v1
rename v2 theta1
rename v3 oddsratio
rename v4 samplesize
*This command plots histogram for theta1:
hist theta1,freq

Comparison of Sampling Distributions

Distribution of θ1base from application of the algorithm to data simulated without regard for subgroup membership. Observed value: 1.1738

Sampling Distribution Parameters

• Edit simulation parameters.
• The first element is the number of replications.
• Must keep the number of replications in the first 5 columns.

Approximate p-value Based on Previous Simulations

PREDICTED THETA (1 base) BASED ON SIMULATIONS.
VALUE BASED ON UNWEIGHTED DATA.   0.76985
ESTIMATE OF THETA (1 subgroup processes)   0.40397
  (total - predicted = evidence of groups: 1.1738 - .76985 = .40397)
THE TOTAL THETA1 IS: 1.1738

APPROXIMATE TEST OF CONCENTRATION OF TIES WITHIN SUBGROUPS BASED ON SIZE OF THETA1 subgroup processes:

THETA1 SUBGROUP PROCESSES | APPROX LRT | APPROX P-VALUE
          0.40            |   34.82    |     0.00

Reject the null hypothesis of no clusters: H0: θ1 subgroup processes = 0

Step 4) Evaluating the Performance of the Algorithm: Did the Algorithm Recover the Correct Subgroups?

• Many algorithms search for optimal subgroups. KliqueFinder does not, but how different are the subgroups it finds from the optimal or known subgroups?

Output for Recovery of Subgroups

PREDICTED ACCURACY: LOG ODDS OF COMMON SUBGROUP MEMBERSHIP, + OR - .5734 (FOR A 95% CI)   1.4989

The log odds applies to the following table:

                         OBSERVED SUBGROUP
                      DIFFERENT      SAME
KNOWN      DIFFERENT      A            B
SUBGROUP   SAME           C            D

THE LOGODDS TRANSLATES TO AN ODDS RATIO OF 4.4766, WHICH INDICATES THE INCREASE IN THE ODDS THAT KLIQUEFINDER WILL ASSIGN TWO ACTORS TO THE SAME SUBGROUP IF THEY ARE TRULY IN THE SAME SUBGROUP.

The specific accuracy for a given data set is not known; results are predicted from thousands of simulations (see next slide).

Odds of Recovery (Toy Example)

[Figure: two 6 × 6 sociomatrices for actors 1-6, one for simulated data with known subgroups and one for the observed subgroups identified by KliqueFinder.]

                         OBSERVED SUBGROUP
                      DIFFERENT      SAME
KNOWN      DIFFERENT    A (6)        B (3)
SUBGROUP   SAME         C (2)        D (4)

• Cell A: 6 pairs correctly assigned to different subgroups: (1,5; 2,5; 3,5; 1,6; 2,6; 3,6).
• Cell D: 4 pairs correctly assigned to the same subgroup: (1,2; 1,3; 2,3; 5,6).
• Misassignment of actor 4 contributes 3 to cell B and 2 to cell C.

Odds of recovery = (AD)/(BC) = (6 × 4)/(3 × 2) = 4.00
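The toy example can be checked by counting pairs directly. In the sketch below the two assignments are inferred from the pair lists on the slide (known subgroups {1,2,3} and {4,5,6}; observed subgroups {1,2,3,4} and {5,6}), which is an assumption since the original matrices did not survive; the function name is hypothetical.

```python
from itertools import combinations

def recovery_table(known, observed):
    """Cross-classify every pair of actors by known and observed co-membership.

    Returns (a, b, c, d):
      a = known different, observed different
      b = known different, observed same
      c = known same, observed different
      d = known same, observed same
    """
    a = b = c = d = 0
    for i, j in combinations(sorted(known), 2):
        same_known = known[i] == known[j]
        same_observed = observed[i] == observed[j]
        if not same_known and not same_observed:
            a += 1
        elif not same_known and same_observed:
            b += 1
        elif same_known and not same_observed:
            c += 1
        else:
            d += 1
    return a, b, c, d

# Assignments inferred from the slide: actor 4 is placed in the wrong subgroup.
known = {1: 1, 2: 1, 3: 1, 4: 2, 5: 2, 6: 2}
observed = {1: 1, 2: 1, 3: 1, 4: 1, 5: 2, 6: 2}
a, b, c, d = recovery_table(known, observed)
odds_of_recovery = (a * d) / (b * c)
```

The counts reproduce the table (A = 6, B = 3, C = 2, D = 4) and the odds of recovery of 4.00.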

Make Sociogram in Netdraw

Video (1:01:00-1:06:22). ID: [email protected] PW: kenfrank2014

Sometimes Netdraw can’t find the file; retrieve it manually.

Modifying Image in Netdraw

Data used for multidimensional scaling within subgroups

Distance = maximum value / cell entry. For example, if the maximum value is 4, a tie of 2 gives 4/2 = 2, a distance of 2.

The blocked network data are the same matrix shown under Blocked Network Data (N = 24, subgroups A-D).

Density = 4/(4 × 8) = 1/8. KliqueFinder uses density = 4/(4 × 5) = .20 because the maximum number of nominations is 5.

DIRECT ASSOCIATIONS (in xxxxxx.clusters)

GROUP       1      2      3      4
LABEL       A      B      C      D
N           4      6      8      6
GROUP
  1      2.42   0.00   0.20   0.05
  2      0.25   1.07   0.13   0.27
  3      0.38   0.40   2.40   0.28
  4      0.21   0.17   0.67   1.17

Distance in multidimensional scaling between subgroups = maximum value / density
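The density and distance arithmetic can be written out as below. The function names are hypothetical. Capping the number of columns at the maximum number of nominations follows the slide's 4/(4 × 5) example; applying the same cap via `min()` to blocks with fewer than 5 columns is an assumption, although it agrees with the reported 0.38 for subgroup C to subgroup A (tie values summing to 12 over 8 rows and 4 columns).

```python
def block_density(block_sum, n_rows, n_cols, max_nominations=5):
    """Density of an off-diagonal block: sum of tie values divided by
    rows x columns, with columns capped at the maximum number of nominations."""
    return block_sum / (n_rows * min(n_cols, max_nominations))

def mds_distance(max_value, value):
    """Distance used for scaling: maximum value divided by the tie value or density."""
    return max_value / value

# Subgroup A (4 actors) to subgroup C (8 actors): tie values sum to 4.
density_a_to_c = block_density(4, n_rows=4, n_cols=8)      # 4 / (4 x 5)
# Subgroup C (8 actors) to subgroup A (4 actors): tie values sum to 12.
density_c_to_a = block_density(12, n_rows=8, n_cols=4)     # 12 / (8 x 4)
within_distance = mds_distance(4, 2)                       # a tie of 2, maximum 4
between_distance = mds_distance(4, density_a_to_c)         # maximum value / density
```

With a maximum tie value of 4, a density of .20 between two subgroups corresponds to a between-subgroup distance of 20.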

Frank, K. 1996. Mapping interactions within and between cohesive subgroups. Social Networks 18: 93-119.

Cohesion; structural similarity

Video (1:19:15-1:23:40). ID: [email protected] PW: kenfrank2014

Choosing lines: Groups

Confidentiality/Ethical issues in Collecting Network Data

Video (1:23:41-1:28). ID: [email protected] PW: kenfrank2014

• Need names on survey.
• Data can be confidential but not anonymous (especially for longitudinal data).
  – R.L. Breiger, “Ethical Dilemmas in Social Network Research: Introduction to Special Issue.” Social Networks 27/2 (2005): 89-93. Read it online: http://www.u.arizona.edu/~breiger/2005BreigerIntroEthics.pdf
  – (All issues of Social Networks are available via ScienceDirect.)
• Who benefits from network analysis? Who bears the cost?
  – Kadushin, Charles. “Who benefits from network analysis: ethics of social network research.” Social Networks 27/2 (2005): pages 139-153.
• Issues to raise when dealing with a Human Subjects Board:
  – Klovdahl, Alden S. “Social network research and human subjects protection: Towards more effective infectious disease control.” Pages 119-137.
• Hint on Human Subjects boards: they like precedents. Once you have one network study accepted, refer to it when submitting others!
• https://www.msu.edu/~kenfrank/social%20network/irb%20with%20network%20data.htm

The SRI/KliqueFinder Solution to confidentiality: aggregate to subgroups

1) Provide information about who is in which cluster, as well as information regarding the resources embedded in each cluster. Resources could be information, expertise, material resources, etc.
   – Benefit: reveals the location of resources relative to social structure.
   – Protection: does not reveal specific responses, because all information is at the cluster level.

2) Provide locations in a sociogram, unique for each respondent, indicating where that person is located (“you are here”). But the figure does not include the lines from a sociogram, so respondents cannot infer others’ responses.
   – Benefit: respondents then use this as a guide to individual behavior for identifying further resources or information.
   – Protection: specific responses of others are not revealed, so confidentiality is preserved.

Can even include names of actors

Using subgroups for feedback to respondents and in a proposal

Choosing Lines: Actor Level Within

Choosing Lines: Actor Level (Remove group nodes)

Choosing Lines: Actor Level Between

Choosing Lines: Group Level

Modifying the Image: Adding Node Data or Relations

Video (1:49:35-2:07:48). ID: [email protected] PW: kenfrank2014

http://www.analytictech.com/ucinet/download.htm

http://www.analytictech.com/Netdraw/NetdrawGuide.doc

http://faculty.ucr.edu/~hanneman/nettext/C4_netdraw.html#data

Files for KliqueFinder

Input data:
• Network data: xxxxxx.list
• Node data: xxxxxx.ilabel
• Alternative network data: xxxxxx.xnet

Parameters:
• Kliqfind.par
• Printo
• Simulate.par

Output from KliqueFinder:
• xxxxxx.place: data containing actor IDs and subgroup placement
• xxxxxx.clusters: diagnostics and matrix-formatted data
• xxxxxx.vna: data for Netdraw

Modifying node data by Editing [datafile].vna

The file is read by Netdraw. Copy the relevant data into Excel, edit, and replace. Add the new node variable (e.g. gender) to the header of the *node data section, then add the data.

*node data
id type group gender
"0A " 2 1 0
"0B " 2 2 0
"0C " 2 3 0
"0D " 2 4 0
1 1 2 1
2 1 2 2
*Node properties
ID x y color shape size shortlabel active
"0A " -2.01889 -15.04530 16777215 1 30 A TRUE
"0B " -9.41864 15.75047 16777215 1 85 B TRUE
"0C " 2.06574 2.09162 16777215 1 52 C TRUE
"0D " 8.54812 10.10988 16777215 1 79 D TRUE
1 -10.52314 14.16442 16711680 1 10 1 TRUE
2 -8.29999 13.27802 16711680 1 10 2 TRUE
*Tie data
from to any strength actor group between within technology
1 2 1 3 1 0 0
1 4 1 3 1 0 1
1 19 1 3 1 0 1
1 23 1 2 1 0 1
1 26 1 3 1 0 0
2 26 1 3 1 0 0
2 10 1 1 1 0 1
*Tie properties
FROM TO color size headcolor headsize active
"0A " "0B " 12632256 1 12632256 0 TRUE
"0A " "0C " 12632256 9 12632256 0 TRUE
1 2 0 3 0 8 TRUE
1 4 12632256 3 0 8 TRUE

Adding Node Attributes with Extra File

KliqueFinder will put the attributes into the vna file. Inputs: xxxxxx.list (e.g. stanne.list) and xxxxxx.ilabel.

File = xxxxxx.ilabel, where xxxxxx is the first 6 characters of your data file. Layout: 10 columns for ID; skip a space; name; node attributes 1-5. Cut and paste into stanne.ilabel:

1 Jacob 1 3 5
2 Stan 1 2 5
3 Linton 1 2 5
4 Charles 1 3 3
5 Mark 1 3 3
6 Tom 2 3 3
7 Ronald 2 3 5
8 Nan 2 1 3
9 Elizabeth 2 1 4
10 Barry 2 2 3
11 Martin 2 3 1
12 Steve 2 3 1
13 PeterC 2 1 5
14 Patrick 1 1 1
15 Katy 1 1 3
16 Kathleen 3 3 3
17 Ove 2 2 2
18 JamesC 5 5 5
19 Robert 4 4 4
20 JamesM 1 2 3 4
21 Noah 4 3 2 1
22 Marijtje 1 2 1 2
23 Ronald 2 1 2 1
24 Harrison 3 1 3 1
25 Duncan 4 1 4 1

Interactive: adding node data or

Include Node Data in Image

Modifying Links

• Lines indicate friendships: solid within subgroups, dotted between subgroups.
• Numbers represent actors.
• Rgt, Cen, Soc, Non = political parties; B = Banker; T = Treasury; E = École Nationale d'Administration.

Frank, K.A. & Yasumoto, J. (1998). "Linking Action to Social Structure within a System: Social Capital Within and Between Subgroups." American Journal of Sociology, Volume 104, No. 3, pages 642-686.

Hostile Actions

Supportive Actions

[Figure: crystallized sociogram of teachers, with subgroups labeled A-E.]

• Each number is a teacher.
• G_ indicates the grade in which the teacher teaches.
• Lines connecting two numbers indicate teachers who are close colleagues: solid lines within subgroups, dashed between.
• Circles indicate cohesive subgroups.

Ripple Plot

• Overlay talk about technology on the social geography of the crystallized sociogram.
• Lines indicate talk about technology.
• Size of dot indicates the teacher’s use of technology at time 1.
• Ripples indicate the increase in use from time 1 to time 2.

Frank, K.A. and Zhao, Y. (2005). "Subgroups as a Meso-Level Entity in the Social Organization of Schools." Chapter 10, pages 279-318. Book honoring Charles Bidwell's retirement, edited by Larry Hedges and Barbara Schneider. New York: Sage Publications.

Modifying Links by Editing [datafile].vna

The file is read by Netdraw. Copy the relevant data into Excel, edit, and replace. The file layout is the same as under Modifying node data by Editing [datafile].vna. Add the new relation (e.g. technology) to the header of the *Tie data section, then add the data:

*Tie data
from to any strength actor group between within technology

Modifying Links with Extra File

KliqueFinder will put the extra relation into the vna file. Inputs: xxxxxx.list (e.g. stanne.list) and xxxxxx.xnet.

File = xxxxxx.xnet, where xxxxxx is the first 6 characters of your data file. The file contains the extra network: nominator, nominee, strength of tie. For example, stanne.xnet:

1 2 4
19 15 3
22 26 1

Modifying Links: Interactive – Finicky

Interactive Modifying Links

Two mode

*Field, S., *Frank, K.A., Schiller, K., Riegle-Crumb, C., and Muller, C. 2006. “Identifying Social Contexts in Affiliation Networks: Preserving the Duality of People and Events.” Social Networks 28: 97-123. (* co-first authors.)

Video (1:39:25-1:49:35). ID: [email protected] PW: kenfrank2014

Copy homact.list from c:\kliqfind\setups to c:\kliqfind

Two-mode Data: Edgelist

• The first two rows do not appear in the data; I put them there to show the format: 10 spaces for each entry.
• Example row: actor 1 participates in event 19 at a level of 1.
• Extent of relation can be binary or weighted.
• The new version of KliqueFinder is more flexible about the 10-column widths.
• IDs should be 6 digits or less.

See also: Prepping data in Excel; Prepping Data in UCINET; Converting data using SAS.

Two-mode Clusters Output

Blocked Two-Mode Network Data

Two-mode Crystallized Sociogram

Centralization & Centrality in KliqueFinder

• KliqueFinder produces a measure of Warp.
• Starts with distances defined by:
  – Maximum value in network / observed value. E.g., if the maximum is 4 and a particular tie is 1, then the distance is 4/1 = 4.
  – These are the distances used in the MDS to produce the sociograms (see “running KliqueFinder ppt”).
• Obtains eigenvalues:
  – within each cluster, based on raw data within the cluster;
  – between clusters, based on 1/density of ties between clusters.
• Density = average value in a given block (variances are more additive).
• Warp = sum of positive eigenvalues / sum of all eigenvalues.
  – Note it does not use the square root of the eigenvalues.
• Output into xxxxxx.bcord (9th element) and into Netdraw as a node attribute for groups, called “centrality”.
• Centrality for individuals is the distance to the center of their subgroup (radius).

Running on a Large Data File (more than 1000 actors)

If you start the program and it just sits there, it is looking for the best seed for the first subgroup. The seed is 3 actors, but the program looks through all combinations of 3 that share common ties in the network. This is intensive, and unnecessary for large data (the first subgroup does not matter so much). To shortcut: change the value from 1 to 2, save, and run.

Software Challenge

Video (2:07:57-2:08:15). ID: [email protected] PW: kenfrank2014

• Analyze nonpr1.list
  – Evidence of clusters?
  – Performance of algorithm?
• Replace lines with nonpr2
• Describe the KliqueFinder algorithm

KliqueFinder Applications: Adding Individual Attributes in SAS

1) Run KliqueFinder with data file collt1.list.
2) Make graph.
3) Use ID from other file? Select “Yes” for “User ID (character) from other SAS file?”
4) Type the following information in the corresponding boxes:
   – SAS file name: c:\kliqfind\indiv [be sure to include full path]
   – ID variable: nominator
   – String variable: gradelev
5) Click “Save”.
6) In SAS, run socgramz in the working directory.

Choosing an ID Variable

With ID based on Grade

KliqueFinder Applications: Replacing Lines

1) Run KliqueFinder with data file collt1.list.
2) Make graph.
3) Save.
4) Retrieve socgramz.sas in the working directory.
5) Replace all occurrences of collt1.list with collt2.list.
6) Run.

Opening socgramz.sas

Changing lines: change lines to a different source

New Lines based on Collt2

Batch KliqueFinder

Basics

• The program runs KliqueFinder on multiple files.
• Input:
  – List of filenames
  – Files containing data
  – BACK UP YOUR DATA FIRST!
• Output:
  – Clustering output (.place, .clusters, .vna) for each list file

Files

• File containing names of data files: testb.txt
• Data file: stanne.list
• Data file: ffe.list
• BACK UP YOUR DATA FIRST!

Running Batch Mode

• Browse to the directory you want to work in.
• Choose “Basic setup” and then click the “Run setup file” button.
• Specify the file with the names of the data files, then click to run as batch.
• BACK UP DATA FILES BEFORE RUNNING!

Prepping data in Excel

Video (1:28-1:39). ID: [email protected] PW: kenfrank2014

• Name your file xxxxxx.list, e.g., test01.list.
• Right click and choose Formatted text (space delimited).

Prepping Data in UCINET

• Navigate to the UCINET data.
• Navigate to where you want to save: c:\kliqfind
• Must remove “!” from the file. There may be several “!” marks; they are there because of multiple data sets.

Converting data using SAS

Video (2:10:43-2:19). ID: [email protected] PW: kenfrank2014

data one;
infile "badform.list";
input chooser chosen wt;
data two;
set one;
file "ready1.list";
if wt ne . then put (chooser chosen wt) (10.);
run;

A Priori Clusters

A line with 99999 in the data file indicates in which a priori cluster an actor is placed. For example, actor 1 is in a priori cluster 3.

Run the repeat2 setup, and then proceed as usual. Remember to do the “new data” setup when done.

Comparison of A Priori Clusters and Identified Solution

• Use the data with a priori cluster assignments and run as new data.
• Run as usual, then look at the cluster output:

SIMILARITY BETWEEN THE START AND END GROUPS:
ACTUAL   POSS   STANDARDIZED
   52.    88.        9.55565

• The standardized value is a QAP standardized measure; compare it with the normal distribution.

Data Containing Cluster Assignments

File called stanne.place [datafile.place]. There may be slightly different numeric formats depending on the version of KliqueFinder. The last two columns can be ignored (for simulation only).

Internal ID   User ID   Cluster   (ignore)   (ignore)
        1.0       1.0       2.0        1.0        3.0
        2.0       2.0       2.0        1.0        3.0
        3.0       4.0       1.0        1.0        3.0
        4.0      19.0       4.0        1.0        3.0
        5.0      23.0       4.0        1.0        3.0
        6.0      26.0       2.0        1.0        3.0
       17.0       6.0       3.0        1.0        3.0
       18.0       8.0       3.0        1.0        3.0
       19.0      20.0       3.0        1.0        3.0
       20.0      15.0       1.0        1.0        3.0
       21.0      12.0       2.0        1.0        3.0
       22.0      17.0       4.0        1.0        3.0
       23.0      16.0       4.0        1.0        3.0
       24.0      27.0       4.0        1.0        3.0
      -27.0      28.0       4.0        1.0        3.0

If the first number (internal ID) is negative, this indicates a tagalong: an actor connected to only one other. In this case, the last line should be read as the tagee, tagger, and group. So, actor 28 is connected to only one other actor (27) and is therefore assigned to actor 27’s cluster, which is cluster 4.
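A minimal sketch of reading such a .place file follows, assuming whitespace-separated fields in the order shown above. The function name and the returned structures are hypothetical; a negative first field is treated as a tagalong row whose absolute value is the actor the tagalong is attached to, following the description above.

```python
def parse_place(lines):
    """Parse .place rows into {user_id: cluster} plus a map of tagalongs.

    Regular row:  internal ID, user ID, cluster, (two ignored columns)
    Tagalong row: -(actor tagged onto), tagalong's ID, cluster, ...
    """
    placement = {}
    tagalongs = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        first, second, cluster = (int(float(x)) for x in fields[:3])
        if first < 0:
            # Negative first field: `second` is connected only to actor |first|.
            tagalongs[second] = -first
        placement[second] = cluster
    return placement, tagalongs

# A few rows copied from the listing above.
example_rows = [
    "1.0 1.0 2.0 1.0 3.0",
    "4.0 19.0 4.0 1.0 3.0",
    "24.0 27.0 4.0 1.0 3.0",
    "-27.0 28.0 4.0 1.0 3.0",
]
example_placement, example_tagalongs = parse_place(example_rows)
```

For these rows, actor 28 is recorded as a tagalong of actor 27 and both end up in cluster 4.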


Including Cluster Membership in Influence Model

Advanced: run the influence model for technology. Identify clusters from talkt2, then include cluster membership in the influence model.

SPSS

DATA LIST / intid 1-10 nominee 11-20 cluster 21-30 simx 31-40 extra 41-50.
BEGIN DATA
       1.0       1.0       1.0       1.0       3.0
       2.0       2.0       1.0       1.0       3.0
       3.0       3.0       1.0       1.0       3.0
       4.0       4.0       2.0       1.0       3.0
       5.0       5.0       2.0       1.0       3.0
       6.0       6.0       2.0       1.0       3.0
END DATA.
DATASET NAME clusters WINDOW=FRONT.
SORT CASES BY nominee(A).
EXECUTE.
MATCH FILES /FILE=yvar1 /FILE='indeg' /FILE=clusters /BY nominee.
EXECUTE.

SAS

data clusters;
*groups from KliqueFinder;
input intid nominator cluster simx extra;
cards;
1.0 1.0 1.0 1.0 3.0
2.0 2.0 1.0 1.0 3.0
3.0 3.0 1.0 1.0 3.0
4.0 4.0 2.0 1.0 3.0
5.0 5.0 2.0 1.0 3.0
6.0 6.0 2.0 1.0 3.0
;
*sort the data set created above (named clusters);
proc sort data=clusters;
by nominator;
data withinfl;
merge yvar2 yvar1 infl expanse clusters attract(rename=(nominee=nominator));
by nominator;
drop nominee _type_ _freq_;
run;

Adding Patches

• Patch for one-mode
• Patch for two-mode

Alternative community detection algorithms

• http://cs.stanford.edu/people/jure/pubs/communities-www10.pdf
• http://www.uvm.edu/~pdodds/files/papers/others/2009/lancichinetti2009a.pdf
• http://fatweasel.net/analytics/network-analysis/community-detection-in-networks/