KliqueFinder: Identifying Clusters in Network Data
Kenneth A. Frank
Michigan State University
Based on:
• Frank, K.A. 1995. Identifying Cohesive Subgroups. Social Networks 17: 27-56.
• Frank, K.A. 1996. Mapping interactions within and between cohesive subgroups. Social Networks 18: 93-119.
• *Field, S., *Frank, K.A., Schiller, K., Riegle-Crumb, C., and Muller, C. 2006. "Identifying Social Contexts in Affiliation Networks: Preserving the Duality of People and Events." Social Networks 28: 97-123. (* co-first authors.)
https://www.msu.edu/user/k/e/kenfrank/web/research.htm#representation
Overview
• Clustering and Graphical Representations of Networks
  – Step 1) Criteria for Determining Group Membership
  – Step 3) Examine evidence of clusters
  – Step 4) Evaluating the Performance of the Algorithm : Did...
• Confidentiality/Ethical issues in Collecting Network Data
• Modifying the Image: Adding Node Data or Relations...
• Two mode
Clustering and Graphical Representations of Networks
video: (26:09-31:41): ID: [email protected] PW:kenfrank2014
Goal: to identify patterns in the network
• Rearrange rows and columns of social network matrix to reveal clustering
• Plot actors and ties in two dimensions to reveal clustering
Theory for defining cluster membership
• cohesion (clusters are called subgroups): an actor should be in a cluster if the actor has demonstrated a preference for engaging in ties with members of the cluster.
  – Result: ties are concentrated within subgroups
• structural equivalence (blocks): an actor should be in a cluster if the actor engages in a similar pattern of ties as members of that cluster.
  – Result: blocks represent positions, but ties are not necessarily concentrated within blocks.
Crystallized Sociogram: Friendships Among the French Financial Elite Lines indicate friendships: solid within subgroups, dotted between subgroups.
Numbers represent actors. Rgt, Cen, Soc, Non = political parties; B = Banker; T = Treasury; E = École Nationale d'Administration.
Frank, K.A. & Yasumoto, J. (1998). "Linking Action to Social Structure within a System: Social Capital Within and Between Subgroups." American Journal of Sociology, Volume 104, No. 3, pages 642-686.
Crystallized Sociogram: Clusters in Foodwebs
Krause, A., Frank, K.A., Mason, D.M., Ulanowicz, R.E. and Taylor, W.M. (2003). "Compartments exposed in food-web structure." Nature 426: 282-285.
Data Input
• File name must be less than 20 characters. Best if the file name is six characters followed by .list (xxxxxx.list), for example stanne.list.
• Each line reads, for example: actor 1 interacts with actor 2 at a level of 3.
• Extent of relation can be binary or weighted.
• Old format: 10 spaces for each entry. New format: flexible columns. Same results.
• IDs should be 6 digits or less.
Data
Edgelist. The first two rows do not appear in the data; I put them there to show the format: 10 spaces for each entry.
• Actor 1 interacts with actor 2 at a level of 3
• Extent of relation can be binary or weighted
• Best if file name is six characters followed by .list (xxxxxx.list), for example stanne.list
• The new version of KliqueFinder is more flexible about the 10-column widths
• IDs should be 6 digits or less
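The input rules above can be sketched in a few lines of Python. This is only an illustration of the old fixed-width layout (three right-aligned fields of 10 characters each) and of the stated limits on file names and IDs; the function names `check_file_name`, `format_edge`, and `format_edgelist` are hypothetical and are not part of KliqueFinder.

```python
FIELD_WIDTH = 10       # "10 spaces for each entry" (old fixed format)
MAX_ID_DIGITS = 6      # "IDs should be 6 digits or less"
MAX_NAME_LENGTH = 20   # "File name must be less than 20 characters"


def check_file_name(name):
    """Return True if the name satisfies the limits stated on the slide."""
    return len(name) < MAX_NAME_LENGTH and name.endswith(".list")


def format_edge(chooser, chosen, weight):
    """Format one tie as three right-aligned 10-character fields."""
    for actor_id in (chooser, chosen):
        if actor_id < 0 or len(str(actor_id)) > MAX_ID_DIGITS:
            raise ValueError(f"ID {actor_id} must have at most {MAX_ID_DIGITS} digits")
    return "".join(f"{value:>{FIELD_WIDTH}}" for value in (chooser, chosen, weight))


def format_edgelist(edges):
    """Format an iterable of (chooser, chosen, weight) tuples, one tie per line."""
    return "\n".join(format_edge(*edge) for edge in edges) + "\n"


if __name__ == "__main__":
    # Actor 1 interacts with actor 2 at a level of 3.
    print(format_edgelist([(1, 2, 3), (2, 26, 3)]), end="")
```

Writing the returned text to a file such as stanne.list gives the old-style layout; the newer, flexible-column reader should accept the same file.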
Steps for finding clusters
video: (31:41-43:30): ID: [email protected] PW:kenfrank2014
1) Determine criterion for defining clusters
2) Maximize criterion
3) Examine evidence of clusters
4) Evaluate performance of the algorithm
5) Interpret clusters
  – commonality of attributes
  – focal experiences
  – subsequent behavior
Step 1) Criteria for Determining Group Membership
Structural Equivalence:
• Factor analyze sociomatrix (Katz & Kahn)
• Iteratively rearrange and revalue rows and columns (CONCOR; White et al., 1976)
Cohesion:
• Utilize fixed criteria (e.g., must be connected to at least k others in clusters, or must be minimal path length from k others, etc.)
• Use flexible criterion: preference relative to group sizes and number of ties
Model Based Cohesion
W_ii' = 1 if there is a tie between actors i and i', 0 otherwise.
samegroup_ii' = 1 if actors i and i' are members of the same subgroup, 0 otherwise.
Then θ1 represents subgroup salience. So ... maximize θ1 (odds ratio).
Odds Ratio for Association Between Common Subgroup Membership and the Occurrence of Ties Between Actors
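The association between common subgroup membership and ties can be illustrated with a small Python sketch. It cross-tabulates ordered pairs of actors and computes the odds ratio (AD)/(BC). Several points are assumptions rather than KliqueFinder's documented behavior: counting ordered (directed) pairs, adding 0.5 to every cell so the ratio is defined when a cell is empty, and scaling θ1 as half the log odds ratio (taken from the later Monte Carlo slide, which labels the output "θ1 = log odds/2").

```python
import math


def tie_by_subgroup_table(ties, groups):
    """Cross-tabulate ordered pairs (i, i') by common subgroup and tie.

    ties   -- set of (chooser, chosen) pairs, i.e. W_ii' = 1
    groups -- dict mapping actor -> subgroup label
    Returns (a, b, c, d):
      a = different subgroup, no tie    b = different subgroup, tie
      c = same subgroup, no tie         d = same subgroup, tie
    """
    a = b = c = d = 0
    for i in groups:
        for j in groups:
            if i == j:
                continue
            same = groups[i] == groups[j]
            tie = (i, j) in ties
            if same and tie:
                d += 1
            elif same:
                c += 1
            elif tie:
                b += 1
            else:
                a += 1
    return a, b, c, d


def theta1(ties, groups, correction=0.5):
    """Return (odds ratio, theta1); `correction` is an assumed continuity fix."""
    a, b, c, d = (cell + correction for cell in tie_by_subgroup_table(ties, groups))
    odds_ratio = (a * d) / (b * c)
    return odds_ratio, math.log(odds_ratio) / 2


if __name__ == "__main__":
    ties = {(1, 2), (2, 1), (3, 4), (4, 3), (1, 3)}
    groups = {1: "A", 2: "A", 3: "B", 4: "B"}
    print(tie_by_subgroup_table(ties, groups))
    print(theta1(ties, groups))
```

A larger odds ratio means ties are more concentrated within subgroups, which is the quantity the algorithm tries to maximize.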
Step 2: Maximizing Criterion
• 1) find a subgroup seed (3 actors who interact with each other, and with similar others)
• 2) add to the cluster to maximize θ1 until you cannot do any more
• 3) start new subgroup with new seed
• 4) shuffle between existing subgroups
• 5) make new subgroups as necessary, dissolve existing ones as necessary.
KliqueFinder Algorithm: Phase I
(Computationally intensive; modify for large networks)
• Initialize: assign each actor to own subgroup
• Find subgroup seed of 2 or 3. For finding the best subgroup seed:
  – 1) can only choose from unaffiliated actors
  – 2) each actor can only be a seed once
• Identify the single move that most increases the objective function θ1
• Does the move increase the function?
  – Yes: reassign the actor that makes the best move. If the assignment moves an actor out of a group of 3, reassign the remaining 2 to their next best groups
  – No: look for a new subgroup seed
KliqueFinder Algorithm: Phases II and III
• Phase II: if the best move does not increase the objective function and there are fewer than 3 actors available for subgroups, then
  – Attach all isolated (or singleton) actors to best existing subgroups, even if this reduces the objective function
• Phase III: shuffle actors between existing subgroups without seeding new ones or disbanding existing ones
  – Number of subgroups is fixed
  – This is simple hill climbing and can be cast as an EM algorithm
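The Phase III shuffle can be sketched as plain hill climbing. The Python below is a minimal illustration, not KliqueFinder's implementation: it uses the log odds ratio of ties by common subgroup membership as the objective, counts ordered pairs, adds an assumed 0.5 to each cell to avoid division by zero, and at each pass applies the single reassignment that most increases the objective. The function names are hypothetical.

```python
import math


def log_odds_ratio(ties, groups, correction=0.5):
    """Log odds ratio of tie by common subgroup over ordered pairs.

    `correction` is an assumed continuity fix, not taken from the slides.
    """
    a = b = c = d = correction
    for i in groups:
        for j in groups:
            if i == j:
                continue
            same = groups[i] == groups[j]
            tie = (i, j) in ties
            if same and tie:
                d += 1
            elif same:
                c += 1
            elif tie:
                b += 1
            else:
                a += 1
    return math.log((a * d) / (b * c))


def shuffle_actors(ties, groups):
    """Move one actor at a time between existing subgroups (fixed number)."""
    groups = dict(groups)
    labels = sorted(set(groups.values()))
    best = log_odds_ratio(ties, groups)
    while True:
        best_move = None
        for actor in sorted(groups):
            home = groups[actor]
            # Never disband a subgroup by removing its last member.
            if sum(1 for label in groups.values() if label == home) == 1:
                continue
            for label in labels:
                if label == home:
                    continue
                groups[actor] = label            # try the move
                value = log_odds_ratio(ties, groups)
                if value > best + 1e-12:
                    best, best_move = value, (actor, label)
            groups[actor] = home                 # undo the trial
        if best_move is None:
            return groups, best                  # no move helps: local optimum
        groups[best_move[0]] = best_move[1]      # apply the best single move


if __name__ == "__main__":
    clique = lambda members: {(i, j) for i in members for j in members if i != j}
    ties = clique([1, 2, 3]) | clique([4, 5, 6])
    start = {1: "A", 2: "A", 3: "B", 4: "B", 5: "B", 6: "B"}
    print(shuffle_actors(ties, start))
```

Because every accepted move strictly increases the objective and the number of assignments is finite, the loop stops at a local optimum, which is why the slides describe the result as not guaranteed to be optimal.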
Running KliqueFinder
video: (43:30-1:01:00): ID: [email protected] PW:kenfrank2014
• Download KliqueFinder at http://hlmsoft.net/wkf/
  – Follow instructions to install. Put in c:\kliqfind
  – Mac users: vmware fusion, Windows 7, 32 bit: http://store.vmware.com/store/vmware/pd/productID.165310200/Currency.USD/
• Click on the “Browse …” button to specify the directory where the data file is located.
KliqueFinder
• Choose “Basic setup” and then click the “Run setup file” button.
KliqueFinder
• Click on the “Browse” button to choose a data file.
Run Analysis
Data file
New Version of Data Input More Flexible
• File name must be less than 20 characters
• IDs should be 6 digits or less
• Actor 1 interacts with actor 2 at a level of 3
• Extent of relation can be binary or weighted
• New: flexible columns. Old: 10 spaces for each. Same results.
View Clusters Output
Blocked Network Data
N Group And Actor Id
          24|AAAA|BBBBBB|CCCCCCCC|DDDDDD|
            |    |      |        |      |
            | 2 1|221  1|  11   2|111122|
Group     ID|7445|612214|98133560|796037|
------------+----+------+--------+------+
1 A        7|A213|......|........|...1..|
1 A       24|4A3.|......|.4......|......|
1 A        4|33A.|......|........|......|
1 A       15|433A|......|........|......|
------------+----+------+--------+------+
2 B       26|.2..|B443..|........|......|
2 B       21|.1..|4B....|...4....|....2.|
2 B       12|....|4.B...|........|......|
2 B        2|....|33.B..|........|...1..|
2 B        1|..3.|3..3B.|........|.3..2.|
2 B       14|....|....1B|........|......|
------------+----+------+--------+------+
3 C        9|....|......|C...3.33|.3....|
3 C        8|.4..|..4...|.C.4..4.|4.....|
3 C       11|....|......|33C.4.3.|..4...|
3 C       13|.4..|.4....|444C....|......|
3 C        3|3...|.4....|4.44C...|......|
3 C        5|.1..|.....4|3.2.3C..|......|
3 C        6|....|......|444..4C4|......|
3 C       20|....|......|3..3.44C|......|
------------+----+------+--------+------+
4 D       17|.1..|......|.1......|D.1...|
4 D       19|....|......|4.3.....|3D4...|
4 D       16|....|......|4..4...4|44D...|
4 D       10|..3.|...1..|........|...D3.|
4 D       23|....|.3....|........|.343D.|
4 D       27|.1..|.1....|........|.3..3D|
θ1 = 1.1738
Step 3) Examine evidence of clusters
1) randomly redistribute ties
2) apply algorithm
3) record value of odds ratio and θ1
4) repeat 1000 times to generate distribution
5) use mean of distribution as baseline for comparison
Randomly Redistributing Ties
Apply Algorithm to Random Data: θ1 = .81822
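The five steps above can be sketched in Python as follows. The slides do not say exactly how ties are redistributed, so this sketch assumes each actor keeps the same number of outgoing ties and aims them at randomly chosen others. The `fit` argument is a hypothetical stand-in for "apply the algorithm and return θ1"; it is supplied by the caller and is not a KliqueFinder function.

```python
import random
import statistics


def redistribute_ties(ties, actors, rng):
    """Step 1: keep each actor's number of outgoing ties, randomize targets.

    Assumption: out-degree is preserved; the slides do not specify this.
    """
    out_degree = {actor: 0 for actor in actors}
    for chooser, _chosen in ties:
        out_degree[chooser] += 1
    shuffled = set()
    for actor in actors:
        others = [other for other in actors if other != actor]
        for chosen in rng.sample(others, out_degree[actor]):
            shuffled.add((actor, chosen))
    return shuffled


def baseline_distribution(ties, actors, fit, replications=1000, seed=0):
    """Steps 2-5: apply `fit` to each randomized network and summarize.

    Returns (mean of simulated values, list of simulated values).
    """
    rng = random.Random(seed)
    values = [fit(redistribute_ties(ties, actors, rng)) for _ in range(replications)]
    return statistics.mean(values), values


if __name__ == "__main__":
    ties = {(1, 2), (1, 3), (2, 1), (3, 4)}
    actors = [1, 2, 3, 4]
    # Placeholder statistic (number of mutual ties) just to show the plumbing.
    mutual = lambda t: sum(1 for (i, j) in t if (j, i) in t)
    mean_value, values = baseline_distribution(ties, actors, mutual, replications=100)
    print(mean_value)
```

The observed θ1 is then compared with the mean of the simulated values, which serves as the baseline.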
Monte Carlo Sampling Distribution
video: (1:06:35-1:18:50) ID: [email protected] PW:kenfrank2014
• Output in sampdist.dat: θ1 = log odds/2, and the odds ratio
• Data can include weights
• Indicate simulate data
• Set up sampling
• Remember to do the “new data” setup when done, to prepare for the next analysis
Code for Reading in Sample Distribution Data

SPSS
GET DATA /TYPE=TXT
  /FILE="C:\KLIQFIND\sampdist.dat"
  /FIXCASE=1
  /ARRANGEMENT=FIXED
  /FIRSTCASE=1
  /IMPORTCASE=ALL
  /VARIABLES=
  /1 theta1 0-29 F30.10
  oddsratio 30-59 F30.10
  samplesize 60-89 F30.10.
CACHE.
EXECUTE.
DATASET NAME DataSet9 WINDOW=FRONT.
DATASET ACTIVATE DataSet9.
GRAPH /HISTOGRAM=theta1.

SAS
title "Sampling distribution for theta1";
data one;
infile "sampdist.dat" missover;
input theta1 odds1;
proc univariate plot;
var theta1;

Stata
*This command imports the data file
import delimited C:\KLIQFIND\sampdist.dat, delimiter(" ", asstring)
*These commands perform data management:
drop v1
rename v2 theta1
rename v3 oddsratio
rename v4 samplesize
*This command plots histogram for theta1:
hist theta1,freq
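For readers without SPSS, SAS, or Stata, the same file can be read with a short Python sketch. It assumes, following the SPSS layout above, that each line of sampdist.dat holds three whitespace-separated numbers (theta1, odds ratio, sample size); the function name and the dictionary keys are illustrative only.

```python
import statistics


def parse_sampling_distribution(lines):
    """Parse lines of sampdist.dat into a list of dicts.

    Assumes three numeric fields per line: theta1, odds ratio, sample size.
    Lines with fewer than three fields are skipped.
    """
    rows = []
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue
        theta1, odds_ratio, sample_size = (float(field) for field in fields[:3])
        rows.append({"theta1": theta1, "oddsratio": odds_ratio, "samplesize": sample_size})
    return rows


def theta1_baseline(rows):
    """Mean of the simulated theta1 values, used as the baseline."""
    return statistics.mean(row["theta1"] for row in rows)


if __name__ == "__main__":
    example = [
        "   0.8182200000   5.1370000000  24.0000000000",
        "   0.7500000000   4.4816890703  24.0000000000",
    ]
    rows = parse_sampling_distribution(example)
    print(theta1_baseline(rows))
```

With the real file, the lines would come from something like `open(r"C:\KLIQFIND\sampdist.dat")`; the example values above are made up for illustration.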
Comparison of Sampling Distributions
Distribution of θ1base from application of the algorithm to data simulated without regard for subgroup membership.
Observed value: 1.1738
Sampling Distribution Parameters
• Edit simulation parameters.
• First element is the number of replications.
• Must keep the number of replications in the first 5 columns.
Approximate p-value Based on Previous Simulations
PREDICTED THETA (1 base) BASED ON SIMULATIONS.
VALUE BASED ON UNWEIGHTED DATA.
  0.76985
ESTIMATE OF THETA (1 subgroup processes)
  0.40397
(total - predicted = evidence of groups: 1.1738 - .76985 = .40397)
THE TOTAL THETA1 IS: 1.1738
APPROXIMATE TEST OF CONCENTRATION OF TIES WITHIN SUBGROUPS BASED ON SIZE OF THETA1 subgroup processes:
  THETA1    |        |
  SUBGROUP  | APPROX | APPROX
  PROCESSES | LRT    | P-VALUE
  0.40        34.82    0.00
Reject the null hypothesis of no clusters: H0: θ1 subgroup processes = 0
Step 4) Evaluating the Performance of the Algorithm : Did the Algorithm Recover the Correct Subgroups?
• Many algorithms search for optimal subgroups. KliqueFinder does not, but how different are the subgroups it finds from the optimal or known subgroups?
Output for Recovery of Subgroups
PREDICTED ACCURACY: LOG ODDS OF COMMON SUBGROUP MEMBERSHIP, + OR - .5734 (FOR A 95% CI)
  1.4989
The log odds applies to the following table:

                       OBSERVED SUBGROUP
                      DIFFERENT     SAME
                     ___________________
KNOWN     DIFFERENT |    A    |    B    |
SUBGROUP            |---------|---------|
          SAME      |    C    |    D    |
                     -------------------

THE LOGODDS TRANSLATES TO AN ODDS RATIO OF 4.4766, WHICH INDICATES THE INCREASE IN THE ODDS THAT KLIQUEFINDER WILL ASSIGN TWO ACTORS TO THE SAME SUBGROUP IF THEY ARE TRULY IN THE SAME SUBGROUP.
Specific accuracy for a given data set is not known; results are predicted from thousands of simulations (see next slide).
Odds of Recovery (Toy Example)
[Figure: two 6 x 6 sociomatrices for actors 1-6: simulated data with known subgroups, and observed subgroups identified by KliqueFinder]

                       OBSERVED SUBGROUP
                      DIFFERENT     SAME
                     ___________________
KNOWN     DIFFERENT |  A (6)  |  B (3)  |
SUBGROUP            |---------|---------|
          SAME      |  C (2)  |  D (4)  |
                     -------------------

• Cell A: 6 pairs correctly assigned to different subgroups: 1,5; 2,5; 3,5; 1,6; 2,6; 3,6
• Cell D: 4 pairs correctly assigned to the same subgroup: 1,2; 1,3; 2,3; 5,6
• Misassignment of actor 4 contributes 3 to cell B and 2 to cell C
• Odds of recovery = (AD)/(BC) = 6x4/(3x2) = 4.00
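The toy example can be checked with a few lines of Python. The two partitions below are inferred from the pair lists on the slide (known subgroups {1,2,3} and {4,5,6}; observed subgroups {1,2,3,4} and {5,6}) and are not printed on the slide in this form; the function name is hypothetical.

```python
from itertools import combinations


def recovery_table(known, observed):
    """Cross-classify every unordered pair of actors.

    Returns (a, b, c, d):
      a = known different, observed different
      b = known different, observed same
      c = known same, observed different
      d = known same, observed same
    """
    a = b = c = d = 0
    for i, j in combinations(sorted(known), 2):
        same_known = known[i] == known[j]
        same_observed = observed[i] == observed[j]
        if same_known and same_observed:
            d += 1
        elif same_known:
            c += 1
        elif same_observed:
            b += 1
        else:
            a += 1
    return a, b, c, d


# Partitions inferred from the slide's pair lists (assumption).
known = {1: 1, 2: 1, 3: 1, 4: 2, 5: 2, 6: 2}
observed = {1: 1, 2: 1, 3: 1, 4: 1, 5: 2, 6: 2}

a, b, c, d = recovery_table(known, observed)
odds_of_recovery = (a * d) / (b * c)

if __name__ == "__main__":
    print((a, b, c, d), odds_of_recovery)  # (6, 3, 2, 4) 4.0
```

Moving actor 4 back to the second observed subgroup would empty cells B and C, which is why a single misassigned actor has such a visible effect in a network this small.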
Make Sociogram in Netdraw
video: (1:01:00-1:06:22): ID: [email protected] PW:kenfrank2014
Sometimes Netdraw can’t find the file; retrieve it manually.
Modifying Image in Netdraw
Data used for multidimensional scaling within subgroups.
Distance = maximum value / cell entry; e.g., the maximum value is 4, so a tie of 2 has a distance of 4/2 = 2.

N Group And Actor Id
          24|AAAA|BBBBBB|CCCCCCCC|DDDDDD|
            |    |      |        |      |
            | 2 1|221  1|  11   2|111122|
Group     ID|7445|612214|98133560|796037|
------------+----+------+--------+------+
1 A        7|A213|......|........|...1..|
1 A       24|4A3.|......|.4......|......|
1 A        4|33A.|......|........|......|
1 A       15|433A|......|........|......|
------------+----+------+--------+------+
2 B       26|.2..|B443..|........|......|
2 B       21|.1..|4B....|...4....|....2.|
2 B       12|....|4.B...|........|......|
2 B        2|....|33.B..|........|...1..|
2 B        1|..3.|3..3B.|........|.3..2.|
2 B       14|....|....1B|........|......|
------------+----+------+--------+------+
3 C        9|....|......|C...3.33|.3....|
3 C        8|.4..|..4...|.C.4..4.|4.....|
3 C       11|....|......|33C.4.3.|..4...|
3 C       13|.4..|.4....|444C....|......|
3 C        3|3...|.4....|4.44C...|......|
3 C        5|.1..|.....4|3.2.3C..|......|
3 C        6|....|......|444..4C4|......|
3 C       20|....|......|3..3.44C|......|
------------+----+------+--------+------+
4 D       17|.1..|......|.1......|D.1...|
4 D       19|....|......|4.3.....|3D4...|
4 D       16|....|......|4..4...4|44D...|
4 D       10|..3.|...1..|........|...D3.|
4 D       23|....|.3....|........|.343D.|
4 D       27|.1..|.1....|........|.3..3D|

Density of ties from group A to group C = 4/(4x8) = 1/8. KliqueFinder uses density = 4/(4x5) = .20 because the maximum number of nominations is 5.

DIRECT ASSOCIATIONS (in xxxxxx.clusters)
GROUP       1     2     3     4
LABEL       A     B     C     D
N           4     6     8     6
GROUP 1  2.42  0.00  0.20  0.05
      2  0.25  1.07  0.13  0.27
      3  0.38  0.40  2.40  0.28
      4  0.21  0.17  0.67  1.17

Distance in multidimensional scaling between subgroups = maximum value / density
Frank, K. 1996. Mapping interactions within and between cohesive subgroups. Social Networks 18: 93-119.
[Figure: cohesion versus structural similarity]
video: (1:19:15-1:23:40) ID: [email protected] PW:kenfrank2014
Choosing lines: Groups
Confidentiality/Ethical issues in Collecting Network Data
video: (1:23:41-1:28) ID: [email protected] PW:kenfrank2014
• Need names on survey
• Data can be confidential but not anonymous (especially for longitudinal)
• R.L. Breiger, “Ethical Dilemmas in Social Network Research: Introduction to Special Issue.” Social Networks 27/2 (2005): 89-93. Read it online: http://www.u.arizona.edu/~breiger/2005BreigerIntroEthics.pdf
  – (All issues of Social Networks are available via ScienceDirect)
• Who benefits from network analysis? Who bears the cost?
  – Kadushin, Charles. “Who benefits from network analysis: ethics of social network research.” Social Networks 27/2 (2005): 139-153.
• Issues to raise when dealing with Human Subjects Board:
  – Klovdahl, Alden S. “Social network research and human subjects protection: Towards more effective infectious disease control.” Pages 119-137.
• Hint on Human Subjects boards: they like precedents. Once you have one network study accepted, refer to it when submitting others!
• https://www.msu.edu/~kenfrank/social%20network/irb%20with%20network%20data.htm
The SRI/KliqueFinder Solution to confidentiality: aggregate to subgroups
1) Provide information about who is in which cluster, as well as information regarding the resources embedded in each cluster. Resources could be information, expertise, material resources, etc.
  – Benefit: reveals location of resources relative to social structure.
  – Protection: does not reveal specific responses because all information is at the cluster level.
2) Provide a sociogram unique for each respondent, indicating where that person is located (“you are here”). The figure does not include the lines from a sociogram, so respondents cannot infer others’ responses.
  – Benefit: respondents then use this as a guide to individual behavior for identifying further resources or information.
  – Protection: specific responses of others are not revealed, so confidentiality is preserved.
Can even include names of actors
Using subgroups for feedback to respondents and in a proposal
Choosing Lines: Actor Level Within
Choosing Lines: Actor Level (remove group nodes)
Choosing Lines: Actor Level Between
Choosing Lines: Group Level
Modifying the Image: Adding Node Data or Relations
video: (1:49:35-2:07:48) ID: [email protected] PW:kenfrank2014
• http://www.analytictech.com/ucinet/download.htm
• http://www.analytictech.com/Netdraw/NetdrawGuide.doc
• http://faculty.ucr.edu/~hanneman/nettext/C4_netdraw.html#data
Files for KliqueFinder
Input data
• Network data: xxxxxx.list
• Node data: xxxxxx.ilabel
• Alternative network data: xxxxxx.xnet
Parameters
• Kliqfind.par
• Printo
• Simulate.par
Output
• xxxxxx.place: data containing actor IDs and subgroup placement
• xxxxxx.clusters, xxxxxx.vna: diagnostics and matrix formatted data for Netdraw
Modifying node data by Editing [datafile].vna
The file is read by Netdraw. Copy the relevant data into Excel, edit, and replace. Add the new node variable (e.g. gender) to the header of the *node data section, then add the data.

*node data
id type group gender
"0A " 2 1 0
"0B " 2 2 0
"0C " 2 3 0
"0D " 2 4 0
1 1 2 1
2 1 2 2
*Node properties
ID x y color shape size shortlabel active
"0A " -2.01889 -15.04530 16777215 1 30 A TRUE
"0B " -9.41864 15.75047 16777215 1 85 B TRUE
"0C " 2.06574 2.09162 16777215 1 52 C TRUE
"0D " 8.54812 10.10988 16777215 1 79 D TRUE
1 -10.52314 14.16442 16711680 1 10 1 TRUE
2 -8.29999 13.27802 16711680 1 10 2 TRUE
*Tie data
from to any strength actor group between within technology
1 2 1 3 1 0 0
1 4 1 3 1 0 1
1 19 1 3 1 0 1
1 23 1 2 1 0 1
1 26 1 3 1 0 0
2 26 1 3 1 0 0
2 10 1 1 1 0 1
*Tie properties
FROM TO color size headcolor headsize active
"0A " "0B " 12632256 1 12632256 0 TRUE
"0A " "0C " 12632256 9 12632256 0 TRUE
1 2 0 3 0 8 TRUE
1 4 12632256 3 0 8 TRUE
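As an alternative to editing in Excel, the same change can be scripted. The Python sketch below appends one new variable to the header of the *node data section and one value to each node row; it treats the .vna file as plain text with sections introduced by lines starting with "*", which is a simplification of the format. The function name and the default value "0" are assumptions for illustration.

```python
import shlex


def add_node_attribute(vna_text, name, values, default="0"):
    """Append a node variable to the *node data section of VNA text.

    values  -- dict mapping node ID (as a string, without padding) to a value
    default -- value written for nodes missing from `values` (assumption)
    """
    out = []
    section = None
    header_pending = False
    for line in vna_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("*"):
            section = stripped.lower()
            header_pending = section == "*node data"
            out.append(line)
        elif section == "*node data" and stripped:
            if header_pending:
                out.append(f"{stripped} {name}")      # extend the header row
                header_pending = False
            else:
                # IDs may be quoted and padded, e.g. "0A "
                node_id = shlex.split(stripped)[0].strip()
                out.append(f"{stripped} {values.get(node_id, default)}")
        else:
            out.append(line)                           # other sections unchanged
    return "\n".join(out)


if __name__ == "__main__":
    sample = '*node data\nid type group\n"0A " 2 1\n1 1 2\n2 1 2\n*Node properties\nID x y\n1 0.0 0.0'
    print(add_node_attribute(sample, "gender", {"1": "1", "2": "2"}))
```

Only the *node data section is touched, so node positions, tie data, and tie properties are passed through as they were.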
Adding Node Attributes with Extra File
KliqueFinder will put attributes into the vna file. Inputs: xxxxxx.list (e.g. stanne.list) and xxxxxx.ilabel.
File = xxxxxx.ilabel, where xxxxxx is the first 6 characters of your data file.
Format: 10 columns for ID; skip a space; name; node attributes 1-5. Cut and paste into stanne.ilabel:

         1 Jacob 1 3 5
         2 Stan 1 2 5
         3 Linton 1 2 5
         4 Charles 1 3 3
         5 Mark 1 3 3
         6 Tom 2 3 3
         7 Ronald 2 3 5
         8 Nan 2 1 3
         9 Elizabeth 2 1 4
        10 Barry 2 2 3
        11 Martin 2 3 1
        12 Steve 2 3 1
        13 PeterC 2 1 5
        14 Patrick 1 1 1
        15 Katy 1 1 3
        16 Kathleen 3 3 3
        17 Ove 2 2 2
        18 JamesC 5 5 5
        19 Robert 4 4 4
        20 JamesM 1 2 3 4
        21 Noah 4 3 2 1
        22 Marijtje 1 2 1 2
        23 Ronald 2 1 2 1
        24 Harrison 3 1 3 1
        25 Duncan 4 1 4 1
Interactive: adding node data or
Include Node Data in Image
Modifying Links
Lines indicate friendships: solid within subgroups, dotted between subgroups.
Numbers represent actors. Rgt, Cen, Soc, Non = political parties; B = Banker; T = Treasury; E = École Nationale d'Administration.
Frank, K.A. & Yasumoto, J. (1998). "Linking Action to Social Structure within a System: Social Capital Within and Between Subgroups." American Journal of Sociology, Volume 104, No. 3, pages 642-686.
Hostile Actions
Supportive Actions
[Figure: crystallized sociogram of teachers, with subgroups labeled A through E]
• Each number is a teacher
• G_ indicates the grade in which the teacher teaches
• Lines connecting two numbers indicate teachers who are close colleagues: solid lines within subgroups, dashed between
• Circles indicate cohesive subgroups
Ripple Plot
• Overlay talk about technology on social geography of crystallized sociogram
• Lines indicate talk about technology
• Size of dot indicates teacher’s use of technology at time 1
• Ripples indicate increase in use from time 1 to time 2
Frank, K.A. and Zhao, Y. (2005). "Subgroups as a Meso-Level Entity in the Social Organization of Schools." Chapter 10, pages 279-318. Book honoring Charles Bidwell's retirement, edited by Larry Hedges and Barbara Schneider. New York: Sage Publications.
Modifying Links by Editing [datafile].vna
The file is read by Netdraw. Copy the relevant data into Excel, edit, and replace. Add the new relation (e.g. technology) to the header of the *Tie data section, then add the data.

*node data
id type group gender
"0A " 2 1 0
"0B " 2 2 0
"0C " 2 3 0
"0D " 2 4 0
1 1 2 1
2 1 2 2
*Node properties
ID x y color shape size shortlabel active
"0A " -2.01889 -15.04530 16777215 1 30 A TRUE
"0B " -9.41864 15.75047 16777215 1 85 B TRUE
"0C " 2.06574 2.09162 16777215 1 52 C TRUE
"0D " 8.54812 10.10988 16777215 1 79 D TRUE
1 -10.52314 14.16442 16711680 1 10 1 TRUE
2 -8.29999 13.27802 16711680 1 10 2 TRUE
*Tie data
from to any strength actor group between within technology
1 2 1 3 1 0 0
1 4 1 3 1 0 1
1 19 1 3 1 0 1
1 23 1 2 1 0 1
1 26 1 3 1 0 0
2 26 1 3 1 0 0
2 10 1 1 1 0 1
*Tie properties
FROM TO color size headcolor headsize active
"0A " "0B " 12632256 1 12632256 0 TRUE
"0A " "0C " 12632256 9 12632256 0 TRUE
1 2 0 3 0 8 TRUE
1 4 12632256 3 0 8 TRUE
Modifying Links with Extra File
KliqueFinder will put attributes into the vna file. Inputs: xxxxxx.list (e.g. stanne.list) and xxxxxx.xnet.
File = xxxxxx.xnet, where xxxxxx is the first 6 characters of your data file.
File containing extra network (stanne.xnet):

Nominator  nominee  strength of tie
1          2        4
19         15       3
22         26       1
Modifying Links: Interactive (Finicky)
Interactive Modifying Links
Two mode
video: (1:39:25-1:49:35) ID: [email protected] PW:kenfrank2014
*Field, S., *Frank, K.A., Schiller, K., Riegle-Crumb, C., and Muller, C. 2006. “Identifying Social Contexts in Affiliation Networks: Preserving the Duality of People and Events.” Social Networks 28: 97-123. (* co-first authors.)
Data source 1 2
Copy homact.list from c:\kliqfind\setups to c:\kliqfind
Two-mode Data
Edgelist. The first two rows do not appear in the data; I put them there to show the format: 10 spaces for each entry.
• Actor 1 participates in event 19 at a level of 1
• Extent of relation can be binary or weighted
• The new version of KliqueFinder is more flexible about the 10-column widths
• IDs should be 6 digits or less
Two-mode Clusters Output
Blocked Two-Mode Network Data
Two-mode Crystallized Sociogram
Centralization & Centrality in KliqueFinder
• KliqueFinder produces a measure of Warp.
• Starts with distances defined by
  – Maximum value in network / observed value. E.g. maximum is 4 and a particular tie is 1, then distance is 4/1 = 4.
  – These are the distances used in the MDS to produce the sociograms (see “running KliqueFinder ppt”)
• Obtains eigenvalues
  – Within each cluster, based on raw data within cluster
  – Between clusters, based on 1/density of ties between clusters
• Warp = sum of positive eigenvalues / sum of all eigenvalues
  – Note it does not use the square root of the eigenvalues
• Output into xxxxxx.bcord (9th element) and into Netdraw as node attribute for groups, called “centrality”
• Centrality for individuals is distance to the center of their
• Density = average value in a given block (variances are more additive)
Running on a Large Data File (more than 1000 actors)
If you start the program and it just sits there, it is looking for the best seed for the first subgroup. The seed is 3 actors, but the program looks at all combinations of 3 that share common ties in the network. This is intensive, and unnecessary for large data (the first subgroup does not matter so much). To shortcut: change the value from 1 to 2, save, and run.
Software Challenge
video: (2:07:57-2:08:15) ID: [email protected] PW:kenfrank2014
• Analyze nonpr1.list
  – Evidence of clusters?
  – Performance of algorithm?
• Replace lines with nonpr2
• Describe the KliqueFinder algorithm
KliqueFinder Applications: Adding Individual Attributes in SAS
1) Run KliqueFinder with data file collt1.list
2) Make graph
3) Use ID from other file? Yes
  – SAS file name: c:\kliqfind\indiv [be sure to include full path]
  – ID variable: nominator
  – String variable: gradelev
  – Save
4) In SAS, run socgramz in the working directory
KliqueFinder Applications: Adding Individual Attributes
• Select “Yes” for “User ID (character) from other SAS file?”
• Type the following information in the corresponding boxes
• Then click “Save”
Choosing an ID Variable
With ID based on Grade
KliqueFinder Applications: Replacing Lines
1) Run KliqueFinder with data file collt1.list
2) Make graph
3) Save
4) Retrieve socgramz.sas in the working directory
5) Replace all occurrences of collt1.list with collt2.list
6) Run
Opening socgramz.sas
Changing lines
Change lines to different source
New Lines based on Collt2
Batch KliqueFinder
Basics
• Program runs KliqueFinder on multiple files
• Input
  – List of filenames
  – Files containing data
  – BACK UP YOUR DATA FIRST!
• Output
  – Clustering output (.place, .clusters, .vna) for each list file
Files
• File containing names of data files: testb.txt
• Data file: stanne.list
• Data file: ffe.list
Running Batch Mode
• Browse to the directory you want to work in
• Choose “Basic setup” and then click the “Run setup file” button
• Specify the file with the names of the data files, then click to run as batch
• BACK UP DATA FILES BEFORE RUNNING!
Prepping Data in Excel
video: (1:28-1:39) ID: [email protected] PW:kenfrank2014
• Name your file xxxxxx.list, e.g., test01.list
• Right click and choose Formatted text (space delimited)
Prepping Data in UCINET
• Navigate to UCINET data
• Navigate to where you want to save: c:\kliqfind
• Must remove “!” from the file. There may be several “!” marks; they are there because of multiple data sets.
Converting Data Using SAS
video: (2:10:43-2:19) ID: [email protected] PW:kenfrank2014
data one;
infile "badform.list";
input chooser chosen wt;
data two;
set one;
file "ready1.list";
if wt ne . then put (chooser chosen wt) (10.);
run;
A Priori Clusters
A line with 99999 in the data file indicates in which a priori cluster an actor is placed. For example, actor 1 is in a priori cluster 3.
Run the repeat2 setup, and then proceed as usual. Remember to do the “new data” setup when done.
Based on a priori clusters
Comparison of A Priori Clusters and Identified Solution
• Run as new data: data with a priori cluster assignments
• Run as usual, then look at cluster output:
SIMILARITY BETWEEN THE START AND END GROUPS:
ACTUAL  POSS  STANDARDIZED
52.     88.   9.55565
QAP standardized measure; compare with normal distribution.
Data Containing Cluster Assignments
File called stanne.place [datafile.place]. There may be slightly different numeric formats depending on the version of KliqueFinder.
Columns: Internal ID, User ID, Cluster; the last two columns are for simulation only (ignore).
1.0 1.0 2.0 1.0 3.0
2.0 2.0 2.0 1.0 3.0
3.0 4.0 1.0 1.0 3.0
4.0 19.0 4.0 1.0 3.0
5.0 23.0 4.0 1.0 3.0
6.0 26.0 2.0 1.0 3.0
17.0 6.0 3.0 1.0 3.0
18.0 8.0 3.0 1.0 3.0
19.0 20.0 3.0 1.0 3.0
20.0 15.0 1.0 1.0 3.0
21.0 12.0 2.0 1.0 3.0
22.0 17.0 4.0 1.0 3.0
23.0 16.0 4.0 1.0 3.0
24.0 27.0 4.0 1.0 3.0
-27.0 28.0 4.0 1.0 3.0
If first number (internal ID) is negative, this indicates a tagalong – an actor connected to only one other. In this case, the last line should be read as the tagee, tagger, and group. So, actor 28 is connected to only one other actor (27) and is therefore assigned to actor 27’s cluster, which is cluster 4.
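The reading rule for the .place file, including tagalongs, can be sketched in Python. The column order (internal ID, user ID, cluster, then two columns to ignore) follows the slide; whitespace-separated fields are an assumption, and the function name is hypothetical.

```python
def read_place(lines):
    """Parse .place lines into cluster assignments and tagalong links.

    Returns (clusters, tagalongs):
      clusters  -- dict mapping user ID -> cluster
      tagalongs -- dict mapping tagalong actor -> the one actor it is tied to
    """
    clusters = {}
    tagalongs = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue
        try:
            first, second, cluster = (int(float(field)) for field in fields[:3])
        except ValueError:
            continue  # skip any non-numeric header line
        if first < 0:
            # Negative first number: read the line as tagee, tagger, group.
            tagalongs[second] = -first
        clusters[second] = cluster
    return clusters, tagalongs


if __name__ == "__main__":
    sample = [
        "1.0 1.0 2.0 1.0 3.0",
        "24.0 27.0 4.0 1.0 3.0",
        "-27.0 28.0 4.0 1.0 3.0",
    ]
    print(read_place(sample))
```

For the sample lines, actor 28 is recorded as a tagalong of actor 27 and shares cluster 4 with it.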
Including Cluster Membership in Influence Model
Advanced: run influence model for technology
• Identify clusters from talkt2
• Include cluster membership in the influence model

SPSS
DATA LIST / intid 1-10 nominee 11-20 cluster 21-30 simx 31-40 extra 41-50.
BEGIN DATA
1.0 1.0 1.0 1.0 3.0
2.0 2.0 1.0 1.0 3.0
3.0 3.0 1.0 1.0 3.0
4.0 4.0 2.0 1.0 3.0
5.0 5.0 2.0 1.0 3.0
6.0 6.0 2.0 1.0 3.0
END DATA.
DATASET NAME clusters WINDOW=FRONT.
SORT CASES BY nominee(A).
EXECUTE.
MATCH FILES /FILE=yvar1 /FILE='indeg' /FILE=clusters /BY nominee.
EXECUTE.

SAS
data clusters;
*groups from KliqueFinder;
input intid nominator cluster simx extra;
cards;
1.0 1.0 1.0 1.0 3.0
2.0 2.0 1.0 1.0 3.0
3.0 3.0 1.0 1.0 3.0
4.0 4.0 2.0 1.0 3.0
5.0 5.0 2.0 1.0 3.0
6.0 6.0 2.0 1.0 3.0
;
proc sort data=clusters;
by nominator;
data withinfl;
merge yvar2 yvar1 infl expanse clusters attract(rename=(nominee=nominator));
by nominator;
drop nominee _type_ _freq_;
run;
Adding Patches
• Patch for one-mode
• Patch for two-mode
Alternative community detection algorithms
• http://cs.stanford.edu/people/jure/pubs/communities-www10.pdf
• http://www.uvm.edu/~pdodds/files/papers/others/2009/lancichinetti2009a.pdf
• http://fatweasel.net/analytics/network analysis/community-detection-in-networks/