A Graph-Based Approach to Link Prediction in Social Networks Using a Pareto-Optimal Genetic Algorithm Jeff Naruchitparames University of Nevada, Reno - CSE CS 790: Complex Networks, Fall 2010
Download ReportTranscript A Graph-Based Approach to Link Prediction in Social Networks Using a Pareto-Optimal Genetic Algorithm Jeff Naruchitparames University of Nevada, Reno - CSE CS 790: Complex Networks, Fall 2010
A Graph-Based Approach to Link Prediction in Social Networks Using a Pareto-Optimal Genetic Algorithm Jeff Naruchitparames University of Nevada, Reno - CSE CS 790: Complex Networks, Fall 2010 • biological social 2 3 4 very dynamic ‣ Social networks =heterogeneous ‣ Dynamic, judgmental environment ‣ Affect friendships over time 5 6 ‣ 1-2 hop distance only ‣ Friend-of-friend 7 ‣ Multiple hops; >1 ‣ Structural; purely graphbased ‣ No explicit correlation between potential friends... 8 ‣ Silva, et. al., ‣ A Graph-based Recommendation System Using Genetic Algorithms, 2010 9 10 11 Friends-of-Friends 2 hops Filter Order 12 Filtering “It’s more probable that you know a friend of your friend than any other random person” Mitchell M., Complex Systems: Network Thinking, 2006. 13 14 15 Indexes 16 What’s missing? ‣ Heterogeneity ‣ Human behavior and preferences ‣ Multiple hops 17 My approach • Pretty much a filtering problem... 18 My approach ‣ Components (for filtering) ‣ Betweenness centrality ‣ Community detection ‣ Clique Percolation Method (CPM) ‣ Friends of friends ‣ 10-dimensional Pareto-optimal genetic algorithm 19 Betweenness Centrality 20 Community Detection 21 ‣Remove duplicates ‣Remove our test cases ‣ (More on this later...) 22 The Genetic Algorithm Part 23 Pareto Fronts 24 The Features 1. # of shared friends 2. location 3. age range 4. general interest 5. music 6. attended same events 7. groups 8. movies 9. education 10. religion/politics 25 Pareto Optimality ‣ Localized to implementation of selection ‣ Feature subset selection ‣ We want to find the best combination of these subsets that can give us the best solutions for how we determine friendships 26 Pareto Optimality and Feature Subset Selection F1 F2 F3 F4 F5 F6 F7 F8 F9 F10 C1 0 1 0 1 0 0 0 1 1 0 C2 1 1 0 0 0 1 0 1 0 1 . . . Cn 0 0 0 0 1 0 0 1 0 27 0 A Point System F1 F2 F3 F4 F5 F6 F7 F8 F9 F10 U1 - 3 - 11 - - - 20 44 - U2 - 1 - 13 - - - 31 9 - - 49 61 - - . . . Un - 10 - 14 - 28 Pareto Optimality ‣ ‣ Compare with the test cases we removed earlier... For all chromosomes in population, do: ‣ If ALL test cases ≥ optimal Pareto front ‣ Calculate fitness ‣ ‣ Good to go Else ‣ ‣ Calculate fitness Continue onto next chromosome 29 Fitness Function • ∑ ∑ p ln( f n 10 i i=1 j p ) i-1 j=1 30 Continuing on with the Evolutionary Process ‣ Apply fitness proportional selection ‣ Randomly select 2 parents to mate ‣ Apply 1-point crossover (82% chance) ‣ Bit mutation (0.05% chance) ‣ Do this until ALL test cases better than Pareto front OR fitness does not improve for 5 consecutive generations 31 1-Point Crossover 32 Conclusion ‣ Complex network theory + Genetic algorithm + social theory ‣ Betweenness centrality ‣ Community detection ‣ ‣ Binary 10-dimensional Pareto-optimal genetic algorithm ‣ ‣ Clique Percolation Method Dominant, fitness proportional selection Several levels of filtering and selection (aka filtering ☺) 33 Future Work ‣ Better fitness function (need to ask Sociologists) ‣ Weighted chromosome for Pareto optimization (as opposed to binary) ‣ Prove all this stuff actually works (sociology standpoint??) ‣ Parallelize or GPU-ize the code (it’s in Python) 34 35