Transcript PPT
Community-based Greedy Algorithm for Mining Top-K Influential Nodes in Mobile Social Networks
Yu Wang¹, Gao Cong², Guojie Song¹, Kunqing Xie¹
¹ Peking University, China
² Nanyang Technological University, Singapore
Problem and Background
Problem: Given a mobile social network, we aim to mine a set S of top-K influential nodes such that R(S), the influence degree of S, is maximized under the extended Independent Cascade information diffusion model.
• A mobile social network plays an essential role in spreading information and influence in the form of "word of mouth".
• The problem is NP-hard.
• It is computationally expensive to run the greedy algorithm on a large network: previous greedy algorithms take days to finish on a network of 723k nodes.
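The cost problem is easiest to see in the standard greedy hill-climbing approach (as in Kempe et al.), sketched minimally below. It assumes a plain Independent Cascade model with known per-edge activation probabilities, and all function names are hypothetical. Every one of the K rounds re-estimates R(S ∪ {v}) for every candidate v by Monte Carlo simulation, so the work grows roughly as K × N × (number of simulations).

```python
import random


def simulate_ic(graph, seeds, rng):
    """One Independent Cascade run; returns the number of activated nodes.

    graph: dict mapping node -> list of (neighbor, activation_probability).
    """
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        new_frontier = []
        for u in frontier:
            for v, p in graph.get(u, []):
                # Each newly active node gets one chance to activate each inactive neighbor.
                if v not in active and rng.random() < p:
                    active.add(v)
                    new_frontier.append(v)
        frontier = new_frontier
    return len(active)


def estimate_influence(graph, seeds, runs, rng):
    """Monte Carlo estimate of R(S): average cascade size over `runs` simulations."""
    return sum(simulate_ic(graph, seeds, rng) for _ in range(runs)) / runs


def greedy_top_k(graph, nodes, k, runs=200, seed=0):
    """Standard greedy: each round adds the node with the largest estimated R(S + v)."""
    rng = random.Random(seed)
    selected = []
    current = 0.0
    for _ in range(k):
        best_node, best_value = None, -1.0
        for v in nodes:
            if v in selected:
                continue
            # This inner loop is what makes greedy slow:
            # N candidates x `runs` simulations in every round.
            value = estimate_influence(graph, selected + [v], runs, rng)
            if value > best_value:
                best_node, best_value = v, value
        selected.append(best_node)
        current = best_value
    return selected, current
```

For example, on a chain a → b → c → d plus an isolated node e with every probability set to 1, `greedy_top_k(graph, ["a", "b", "c", "d", "e"], 2)` picks `a` first (it reaches four nodes) and then `e`, the only node that still adds coverage.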
Basic Idea of the Algorithm
Construct a network from CDR (call detail record) data
→ Community detection based on the diffusion model on the MSN (mobile social network)
→ Dynamic programming algorithm & greedy algorithm on the selected communities
Step1: Extracting Mobile Social Network
Extract a mobile social network from CDR data and model it as a directed weighted graph:
• A phone user is a node.
• A directed edge u → v is established if there exists communication from u to v.
• The communication time is the weight of the edge.
[Figure: example mobile social network; each directed edge is labeled with its communication time]
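A minimal sketch of Step1, assuming each CDR record has already been reduced to a hypothetical (caller, callee, duration) tuple and that the edge weight is the total call duration from u to v. Real CDR schemas, and the exact weighting used in the paper, may differ; dropping self-calls is also an assumption.

```python
from collections import defaultdict


def build_msn(cdr_records):
    """Aggregate call records into a directed weighted graph.

    cdr_records: iterable of (caller, callee, duration) tuples (hypothetical layout).
    Returns a dict mapping the directed edge (u, v) to the total communication
    time from u to v.
    """
    weights = defaultdict(float)
    for caller, callee, duration in cdr_records:
        if caller == callee:
            continue  # ignore self-calls (assumption)
        # Calls u -> v and v -> u are kept as two separate directed edges.
        weights[(caller, callee)] += duration
    return dict(weights)
```

For instance, two calls from u to v lasting 3 and 5 become a single edge (u, v) with weight 8, while a call from v back to u becomes its own edge (v, u).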
Extended Independent Cascade Model
• Two node states: active and inactive.
• Diffusion speed λ: when an active node vi contacts an inactive node vj, the inactive node becomes active with probability (rate) λij.
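The slide does not spell out how repeated contacts over time are handled, so the sketch below is one possible reading, labeled as an assumption: time advances in discrete steps, and in every step each node that was already active contacts each out-neighbor once, activating an inactive neighbor with probability λij. Unlike the one-shot standard IC model, an active node can keep trying in later steps. The function names and the `edges` layout are hypothetical.

```python
import random


def simulate_timed_cascade(edges, seeds, steps, rng):
    """Run one time-stepped cascade and return the set of active nodes.

    edges: dict node -> list of (neighbor, lam), where lam is the assumed
           per-contact activation probability lambda_ij.
    Hypothetical reading of the extended model: in every time step, each node
    active at the start of the step contacts each out-neighbor once, and an
    inactive neighbor becomes active with probability lam.
    """
    active = set(seeds)
    for _ in range(steps):
        newly_active = set()
        for u in active:
            for v, lam in edges.get(u, []):
                if v not in active and v not in newly_active and rng.random() < lam:
                    newly_active.add(v)
        # Nodes activated in this step only start contacting others in the next step.
        active |= newly_active
    return active


def estimate_influence(edges, seeds, steps, runs=1000, seed=0):
    """Monte Carlo estimate of R(S): mean number of active nodes after `steps` steps."""
    rng = random.Random(seed)
    total = sum(len(simulate_timed_cascade(edges, seeds, steps, rng)) for _ in range(runs))
    return total / runs
```

With λij = 1 on a chain a → b → c → d, two steps activate {a, b, c}. Under this reading a larger λ lets the cascade cover more nodes in fewer steps, which is consistent with calling λ a diffusion speed.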
[Figure: step-by-step example of the Extended Independent Cascade Model on the weighted network; inactive nodes become active over successive snapshots]
Step2: Influence-Model-Based Community Detection Algorithm
Community Partition
• Each node is assigned a unique community label from 1 to N.
• For each node, compute the set of its influenced neighbors using the Independent Cascade diffusion model.
• Iteratively propagate the labels through the network for a finite number of iterations: for each node v, set the label of v to the label of the community that the majority of its influenced neighbors belong to.
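A minimal sketch of the Community Partition step, assuming the influenced-neighbor sets have already been computed (for instance by Independent Cascade simulations) and are passed in as a hypothetical `influenced` mapping. Ties between equally common labels are broken toward the smallest label, and nodes are updated in place in a fixed order; both choices are assumptions, since the slide does not specify them.

```python
from collections import Counter


def propagate_labels(influenced, max_iter=20):
    """Influence-based label propagation (sketch).

    influenced: dict node -> list of nodes it influences (assumed precomputed).
    Nodes get labels 1..N in sorted order; each node then takes the label held
    by the majority of its influenced neighbors (ties -> smallest label).
    """
    nodes = sorted(influenced)
    labels = {v: i + 1 for i, v in enumerate(nodes)}
    for _ in range(max_iter):  # finite number of iterations
        changed = False
        for v in nodes:
            nbrs = [u for u in influenced[v] if u in labels]
            if not nbrs:
                continue  # a node that influences nobody keeps its label
            counts = Counter(labels[u] for u in nbrs)
            top = max(counts.values())
            new_label = min(lab for lab, c in counts.items() if c == top)
            if new_label != labels[v]:
                labels[v] = new_label
                changed = True
        if not changed:
            break  # labels are stable
    return labels
```

On two groups {1, 2, 3} and {4, 5, 6} whose members influence only each other, the labels settle to 2 for the first group and 5 for the second.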
Community Combination
• Communities are combined so that, for each node, the difference between its influence degree within its community and its influence degree in the whole network is smaller than a threshold.
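The slide states the stopping condition but not how communities are chosen for merging, so the sketch below fills that in with a hypothetical rule: a community containing a node whose gap exceeds the threshold is merged into the community that receives the most edges leaving it. To keep the example deterministic, influence degree is approximated by the number of reachable nodes, which is what the IC model gives when every activation probability is 1. The communities are assumed to partition all nodes.

```python
from collections import Counter, deque


def reach_count(graph, source, allowed):
    """Number of nodes in `allowed` reachable from `source` (including itself).

    Stand-in for influence degree when every activation probability is 1.
    """
    seen = {source}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph.get(u, []):
            if v in allowed and v not in seen:
                seen.add(v)
                queue.append(v)
    return len(seen)


def combine_communities(graph, communities, threshold):
    """Merge communities until every node's in-community influence is within
    `threshold` of its whole-network influence (hypothetical merge rule)."""
    comms = [set(c) for c in communities]
    all_nodes = set().union(*comms)
    while True:
        violator = None
        for i, c in enumerate(comms):
            gap = max(reach_count(graph, v, all_nodes) - reach_count(graph, v, c) for v in c)
            if gap > threshold:
                violator = i
                break
        if violator is None:
            return comms
        c = comms[violator]
        # Merge into the community receiving the most edges that leave c (assumption).
        targets = Counter(
            j
            for u in c
            for v in graph.get(u, [])
            for j, other in enumerate(comms)
            if j != violator and v in other
        )
        j = targets.most_common(1)[0][0]
        comms[j] |= c
        del comms[violator]
```

With edges a → b → c → d, communities {a, b} and {c, d}, and threshold 0, node a reaches 4 nodes in the network but only 2 inside {a, b}, so the two communities are merged; with threshold 5 they are left as they are.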
Step3: Community-Based Greedy Algorithm
Choose communities to find the Top-1 influential node
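Marginal gains (increase in influence degree from mining one more node in each community): ΔR1 = 0.2 (C1), ΔR2 = 0.3 (C2), ΔR3 = 0.1 (C3)
Recurrence over the 3 communities, where R[m,k] is the best influence degree for k nodes when the k-th node is taken from one of C1..Cm, and s[m,k] records the chosen community:
R[m,k] = max{R[m-1,k], R[3,k-1] + ΔRm}
R[1,1] = max{R[0,1], R[3,0] + ΔR1} = 0.2, s[1,1] = C1
R[2,1] = max{R[1,1], R[3,0] + ΔR2} = 0.3, s[2,1] = C2
R[3,1] = max{R[2,1], R[3,0] + ΔR3} = 0.3, s[3,1] = C2
So the top-1 node is mined in C2.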
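The recurrence for the top-1 example can be checked with a short sketch of one round of the dynamic program. It treats R[0,k] as R[M,k-1] (no new node chosen yet), which is an assumption since the slide does not state R[0,k], and it uses 1-based community indices so that the returned row matches s[m,k]. The function name is hypothetical.

```python
def dp_round(delta, prev_total):
    """One round of R[m,k] = max{R[m-1,k], R[M,k-1] + dR_m} over communities 1..M.

    delta: marginal gains [dR_1, ..., dR_M] for mining one more node in each community.
    prev_total: R[M, k-1], the influence degree reached with k-1 nodes.
    Returns (r_row, s_row) with r_row[m-1] = R[m,k] and s_row[m-1] = s[m,k].
    """
    r_row, s_row = [], []
    best, choice = prev_total, None  # R[0,k] and s[0,k] (assumed)
    for m, gain in enumerate(delta, start=1):
        candidate = prev_total + gain  # put the k-th node in community m
        if candidate > best:  # strict '>' keeps the earlier community on ties
            best, choice = candidate, m
        r_row.append(best)
        s_row.append(choice)
    return r_row, s_row
```

`dp_round([0.2, 0.3, 0.1], 0.0)` reproduces the table above: R = [0.2, 0.3, 0.3] and s = [1, 2, 2], so the node is mined in C2. As written, the recurrence amounts to picking the community with the largest current ΔRm, keeping the earlier community on ties.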
Community-Based Greedy Algorithm
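Choose communities to find the Top-2 influential nodes
Marginal gains after the top-1 node has been mined in C2: ΔR1 = 0.2 (C1), ΔR2 = 0.06 (C2), ΔR3 = 0.1 (C3)
Note that ΔR2 is now 0.06, not 0.3: C2 already contains the top-1 node, so mining another node there adds less, while ΔR1 and ΔR3 are unchanged.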
R[1,2]= max{R[0,2], R[3,1]+ΔR1}=0.5
s[1,2]=C1;
R[2,2]= max{R[1,2], R[3,1]+ΔR2}=0.5
s[2,2]=C1;
R[3,2]= max{R[2,2], R[3,1]+ΔR3}=0.5
s[3,2]=C1;
We mine the second node in C1
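Putting the two rounds together, a minimal sketch of the outer loop is shown below. The greedy search inside a community is replaced by a hypothetical `gain_lists` table of precomputed marginal gains, and after each round only the chosen community's ΔR is refreshed, as in the example where ΔR2 drops from 0.3 to 0.06 while ΔR1 and ΔR3 stay the same.

```python
def community_greedy(gain_lists, k):
    """Sketch of the community-based greedy outer loop.

    gain_lists: hypothetical stand-in for the greedy step inside each community;
    gain_lists[m][j] is the marginal gain of the (j+1)-th node mined in community m+1.
    Returns (picks, total): 1-based community of each mined node and the final R.
    """
    m_count = len(gain_lists)
    mined = [0] * m_count  # nodes mined so far in each community
    delta = [g[0] if g else 0.0 for g in gain_lists]
    total, picks = 0.0, []
    for _ in range(k):
        # DP over communities: R[m,k] = max{R[m-1,k], R[M,k-1] + dR_m}
        best, choice = total, None
        for m in range(m_count):
            if total + delta[m] > best:
                best, choice = total + delta[m], m
        if choice is None:
            break  # no community offers a positive gain
        picks.append(choice + 1)
        total = best
        # Only the chosen community's marginal gain is recomputed.
        mined[choice] += 1
        g = gain_lists[choice]
        delta[choice] = g[mined[choice]] if mined[choice] < len(g) else 0.0
    return picks, total
```

With `gain_lists = [[0.2, 0.15], [0.3, 0.06], [0.1]]` (the 0.15 entry is an arbitrary placeholder) and k = 2, the picks are C2 then C1 and the total influence degree is 0.5, matching the two slides.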
Experiments
Data Sets
• A mobile social network extracted from three months of CDR (call detail record) data for a city in China, provided by China Mobile
• Number of nodes: 723,201
• Average degree: 13.4
• Community distribution: the largest community has 95,690 nodes
Experiments
Top-K node mining methods compared:
• MixedGreedy algorithm
• NewGreedy algorithm
• DegreeDiscount
• Random method
• CGA (Community-based Greedy Algorithm)
• SPCGA
Parameter study: K, diffusion speed λ, data size
Results
Influence degree and time vs K
Results
Influence degree and time vs diffusion speed λ
Results
Influence degree and time vs network size
Summary
• Handles large-scale networks with power-law degree distributions.
• Improves the efficiency of existing algorithms by an order of magnitude, with only a small loss in approximation precision.
• Can be combined with any existing algorithm to find influential nodes within communities.
Related work on Top-K Algorithm
• Typical greedy algorithm (Kempe et al., KDD 2003)
• CELF greedy algorithm (Leskovec et al., KDD 2007)
• An improved greedy algorithm (Kimura et al., AAAI 2007)
• NewGreedy, MixedGreedy, and DegreeDiscount algorithms (Chen et al., KDD 2009)
• MIA algorithm (Chen et al., KDD 2010)
None of them takes community structure into account.