11th IEEE International Conference on Peer-to-Peer Computing
Kyoto, Japan, 2011
Inferring Peer Centrality
in Socially-Informed P2P Systems
Nicolas Kourtellis, Adriana Iamnitchi
Department of Computer Science & Engineering
University of South Florida
Tampa, USA
Socially-aware Applications
• Applications collect and use social information:
  – Location, collocation, history of interactions, etc.
• Build (implicit/explicit) social network of users
• Use: reduce spam, provide recommendations, etc.
• Wide range of system architectures; decentralization of user social data:
  – PeerSoN, LifeSocial.KOM, Safebook, Prometheus, …
  – MobiClique, Yarta, ...
• How does the social network of users affect the load in a P2P architecture?
Social Graphs & P2P Networks
• Users connected with application-specific edges
• User-contributed peers form a P2P network
• User social graph is partitioned into subgraphs & stored on peers
Questions:
• How do applications traverse a distributed social graph?
• What does this mean for P2P routing?
Application Example
• Invite user G's 2-hop hiking contacts to a trip
  => 1-hop = {B, C, E}, 2-hop = {A, D, F, I}
• Social graph traversals => many P2P lookups
• Application performance is affected by the projection of the social graph onto peers
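The traversal in this example is a breadth-first search bounded at two hops. Below is a minimal Python sketch under stated assumptions: the social edges and the user-to-peer placement (`SOCIAL_EDGES`, `USER_TO_PEER`, peers `P1`–`P4`) are hypothetical, chosen only so that G's neighborhoods match the slide, and it assumes one P2P lookup each time the traversal follows a social edge to a user stored on a different peer.

```python
from collections import deque

# Hypothetical social edges, invented so that G's 1-hop and 2-hop
# neighborhoods match the slide's example.
SOCIAL_EDGES = [
    ("G", "B"), ("G", "C"), ("G", "E"),
    ("B", "A"), ("C", "D"), ("E", "F"), ("E", "I"),
]

# Hypothetical placement of users on peers (several users per peer).
USER_TO_PEER = {"A": "P1", "B": "P1", "C": "P2", "D": "P3",
                "E": "P2", "F": "P3", "G": "P2", "I": "P4"}


def build_adjacency(edges):
    """Undirected adjacency sets from a list of (a, b) social edges."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def k_hop_contacts(adj, user_to_peer, source, k):
    """BFS from source up to depth k.

    Returns users grouped by hop distance and the number of P2P lookups,
    assuming one lookup whenever a newly discovered user lives on a
    different peer than the user it was reached from.
    """
    dist = {source: 0}
    frontier = deque([source])
    lookups = 0
    while frontier:
        u = frontier.popleft()
        if dist[u] == k:          # do not expand beyond k hops
            continue
        for v in sorted(adj.get(u, ())):
            if v in dist:
                continue
            dist[v] = dist[u] + 1
            frontier.append(v)
            if user_to_peer[v] != user_to_peer[u]:
                lookups += 1      # crossing a peer boundary
    by_hop = {}
    for user, d in dist.items():
        if d > 0:
            by_hop.setdefault(d, set()).add(user)
    return by_hop, lookups


adj = build_adjacency(SOCIAL_EDGES)
by_hop, lookups = k_hop_contacts(adj, USER_TO_PEER, "G", 2)
print(sorted(by_hop[1]), sorted(by_hop[2]), lookups)
# ['B', 'C', 'E'] ['A', 'D', 'F', 'I'] 4
```

Even in this toy placement, half of the traversal steps cross a peer boundary, which is why the projection of the social graph onto peers matters for performance.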
Projection Graph
[Figure: three layers: Social Graph (SG), Projection Graph (PG), P2P Overlay]
• How do the properties of the projection graph compare with the properties of the projected social graph?
Projection Graph Model
Social graph $SG = (V, E)$: $V$ = set of users, $E$ = set of social edges
Projection graph $PG = (V_P, E_P)$: $V_P$ = set of peers, $E_P$ = set of P2P edges
$PV(i)$ = set of users mapped on peer $P_i$, $P_i \in V_P$

$$(P_i, P_j) \in E_P \iff \exists\, a \in PV(i),\ \exists\, b \in PV(j) \text{ s.t. } (a, b) \in E$$

$$w(P_i, P_j) = \left|\{(a, b) \in E \mid a \in PV(i),\ b \in PV(j)\}\right|$$

• Uses:
  – Study properties of peers such as centrality
  – Study how the social graph topology affects P2P routing & system performance
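The definitions of $E_P$ and $w(P_i, P_j)$ translate directly into a single pass over the social edges. This is a minimal sketch, not the authors' code; the function name `projection_graph` and the example data are hypothetical, and it assumes (the slides do not say) that a social edge whose endpoints sit on the same peer produces no P2P edge, i.e. the PG has no self-loops.

```python
from collections import Counter


def projection_graph(social_edges, user_to_peer):
    """Build PG = (V_P, E_P) with weights w(P_i, P_j).

    A P2P edge (P_i, P_j) exists iff some social edge (a, b) has a on P_i
    and b on P_j; its weight counts such social edges. Intra-peer social
    edges are skipped (assumption: no self-loops in the PG).
    """
    peers = set(user_to_peer.values())
    weights = Counter()
    for a, b in social_edges:
        pa, pb = user_to_peer[a], user_to_peer[b]
        if pa != pb:
            # Undirected graph: store each P2P edge once, in sorted order.
            weights[tuple(sorted((pa, pb)))] += 1
    return peers, dict(weights)


# Hypothetical example: five users placed on three peers.
social = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"), ("d", "e"), ("b", "d")]
mapping = {"a": "P1", "b": "P1", "c": "P2", "d": "P2", "e": "P3"}
peers, weights = projection_graph(social, mapping)
print(sorted(peers), weights)
# ['P1', 'P2', 'P3'] {('P1', 'P2'): 3, ('P2', 'P3'): 1}
```

Two of the six social edges stay inside a peer, so the weighted PG already hides part of the social structure; this is the effect the experiments measure as users/peer grows.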
Outline
• Motivation
• Projection Graph Model
• Social Network Centrality Metrics
  – Degree Centrality
  – Node Betweenness Centrality
  – Edge Betweenness Centrality
• Centrality Calculation: Limitations
• Experimental Questions
• Experimental Methodology
• Experimental Results
• Impacts on Applications & Systems
Degree Centrality
• Number of edges of a node
• High degree centrality peers: network hubs
  – Can be targeted to directly influence many other peers with a message broadcast, or to distribute a search query
[Figure: example network with nodes A–O]
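A minimal sketch of degree centrality and hub selection follows. The slide only says "number of edges of a node"; normalizing by $n - 1$ (so a node connected to every other node scores 1.0) is an assumption, and the helper names and the small example graph are hypothetical.

```python
def degree_centrality(adj):
    """Normalized degree centrality: deg(v) / (n - 1).

    adj maps each node to the set of its neighbors (undirected graph).
    """
    n = len(adj)
    if n < 2:
        return {v: 0.0 for v in adj}
    return {v: len(nbrs) / (n - 1) for v, nbrs in adj.items()}


def top_hubs(adj, k):
    """The k highest-degree nodes; ties broken by node label."""
    dc = degree_centrality(adj)
    return sorted(dc, key=lambda v: (-dc[v], v))[:k]


# Hypothetical peer graph: H is connected to every other peer.
adj = {"H": {"A", "B", "C"}, "A": {"H"}, "B": {"H", "C"}, "C": {"H", "B"}}
print(top_hubs(adj, 2))  # ['H', 'B']
```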
Node Betweenness Centrality
• Measures the extent to which a node lies on the shortest paths between pairs of other nodes
• High betweenness centrality peers: control communication between distant peers
  – Can host data caches to reduce the latency of locating data
[Figure: example network with nodes A–O]
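The slides do not say how betweenness was computed. The sketch below swaps in Brandes' standard algorithm (2001) for unweighted, undirected graphs: one BFS per source, then dependencies are accumulated in order of decreasing distance. The function name and the normalization by $(n-1)(n-2)/2$ unordered pairs are assumptions.

```python
from collections import deque


def node_betweenness(adj, normalized=True):
    """Brandes betweenness for an unweighted, undirected graph.

    adj maps each node to an iterable of its neighbors.
    """
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        # Single-source shortest paths by BFS.
        stack = []
        preds = {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)   # number of shortest s-v paths
        dist = dict.fromkeys(adj, -1)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Back-propagate dependencies, farthest nodes first.
        delta = dict.fromkeys(adj, 0.0)
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    n = len(adj)
    # Every unordered pair (s, t) was counted once from s and once from t.
    denom = (n - 1) * (n - 2) if normalized and n > 2 else 2
    return {v: c / denom for v, c in bc.items()}


# Hypothetical star: the center lies on every path between two leaves.
star = {"H": {"A", "B", "C"}, "A": {"H"}, "B": {"H"}, "C": {"H"}}
print(node_betweenness(star)["H"])  # 1.0
```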
Edge Betweenness Centrality
• Measures the extent to which an edge lies on the shortest paths between pairs of nodes
• High betweenness centrality edges: connect distant parts of the P2P network
  – Can be monitored to block malware traffic
[Figure: example network with nodes A–O]
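Edge betweenness follows from the same Brandes accumulation: the share of shortest paths flowing from $w$ back to its predecessor $v$ is credited to the edge $(v, w)$. This is a sketch under the same assumptions as a standard Brandes implementation (unweighted, undirected, comparable node labels so edges can be keyed by sorted tuples); normalizing by $n(n-1)/2$ node pairs is an assumption, not something the slides state.

```python
from collections import deque


def edge_betweenness(adj, normalized=True):
    """Brandes-style edge betweenness, unweighted and undirected.

    Returns a dict keyed by sorted (u, v) tuples (labels must be comparable).
    """
    eb = {tuple(sorted((v, w))): 0.0 for v in adj for w in adj[v]}
    for s in adj:
        stack, preds = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)
        dist = dict.fromkeys(adj, -1)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        while stack:
            w = stack.pop()
            for v in preds[w]:
                # Dependency of s on edge (v, w).
                c = sigma[v] / sigma[w] * (1 + delta[w])
                eb[tuple(sorted((v, w)))] += c
                delta[v] += c
    n = len(adj)
    # Every unordered pair (s, t) was counted once from each endpoint.
    denom = n * (n - 1) if normalized and n > 1 else 2
    return {e: c / denom for e, c in eb.items()}


# Hypothetical graph: two triangles joined by the bridge C-D.
g = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"},
     "D": {"C", "E", "F"}, "E": {"D", "F"}, "F": {"D", "E"}}
eb = edge_betweenness(g, normalized=False)
print(max(eb, key=eb.get), eb[("C", "D")])  # ('C', 'D') 9.0
```

The bridge carries all 3 × 3 cross-triangle pairs, which is the "connects distant parts of the network" property the slide describes.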
Calculating Peer Centrality
• Challenging because of:
  – Limited access to user data (e.g., privacy settings)
  – P2P network scale
  – Peer churn
• Through experimental analysis of the social and projection graphs, we investigate how to circumvent these limitations
Experimental Questions
• Can we approximate the centrality of peers using the centrality scores of their users?
• How does the number of users whose data is stored on each peer affect peer centrality?
• Social graph is less dynamic than the P2P network
  – Calculate user centrality scores infrequently & use them to estimate their peers' centrality
Spoiler Alert!
• [1, ~150] users/peer: degree & node betweenness centrality of peers can be estimated with good accuracy
• Above 150 users/peer: the projection graph becomes highly connected => peers do not differ in centrality
Experimental Methodology
• Naturally-formed communities offer incentives for resource sharing → 1 community subgraph mapped per peer
• Projection graphs generated from 5 real social graphs:

  Social Network   Users    Edges
  gnutella04       10,876    39,994
  gnutella31       62,561   147,878
  enron            33,696   180,811
  epinions         75,877   405,739
  slashdot         82,168   504,230

  – Communities detected via recursive Louvain algorithm*
  – Varied average community size: 5, 10, 20, …, 1000 users/peer
• Calculate correlation of centralities of users and their peers
• Compare average centralities of users and their peers
• Identify top centrality peers from their users' scores
*V. D. Blondel et al., "Fast unfolding of communities in large networks", Journal of Statistical Mechanics: Theory and Experiment, vol. 10, 2008.
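The correlation step can be sketched as follows. The results slides estimate a peer's centrality from the "cumulative scores" of its users; this sketch assumes that means the sum of the users' scores, and it assumes Pearson's coefficient because the slides do not name the correlation used. All function names and data are hypothetical.

```python
import math


def cumulative_scores(user_scores, user_to_peer):
    """Estimate each peer's centrality as the sum of its users' scores."""
    totals = {}
    for user, score in user_scores.items():
        peer = user_to_peer[user]
        totals[peer] = totals.get(peer, 0.0) + score
    return totals


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return float("nan")  # undefined when one side is constant
    return cov / (sx * sy)


def user_peer_correlation(user_scores, user_to_peer, peer_scores):
    """Correlate cumulative user scores with centrality measured on the PG."""
    cum = cumulative_scores(user_scores, user_to_peer)
    peers = sorted(peer_scores)
    return pearson([cum.get(p, 0.0) for p in peers],
                   [peer_scores[p] for p in peers])


# Hypothetical scores: user centrality on SG, peer centrality on PG.
user_scores = {"a": 0.1, "b": 0.2, "c": 0.3, "d": 0.4}
user_to_peer = {"a": "P1", "b": "P1", "c": "P2", "d": "P3"}
peer_scores = {"P1": 0.5, "P2": 0.2, "P3": 0.6}
print(round(user_peer_correlation(user_scores, user_to_peer, peer_scores), 3))
```

In the experiments this would be repeated per dataset, per centrality metric, and per average community size, giving one point per curve in the correlation plots.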
Correlation of Centrality Scores
[Figure, three panels; x-axis: Users/Peer (1–1000); series: gnutella04, enron, gnutella31, epinions, slashdot. (a) Users/Peer vs. Degree (y: Degree Centrality Correlation); (b) Users/Peer vs. Node Betweenness (y: Node Betweenness Centrality Correlation); (c) Users/Peer vs. Edge Betweenness (y: Edge Betweenness Centrality Correlation)]
[1–150] users/peer:
• Projection graph closely resembles the social graph
• Highest correlation of social & projection graph metrics
• Degree & node betweenness can be estimated from local information (cumulative scores)
After 150 users/peer:
• Projection graph topology loses social properties
• Highly connected network
• Peers participate equally in graph traversal
Comparison of Centrality Scores
[Figure, three panels; x-axis: Users/Peer (1–1000); log-scale y-axis; series for gnutella04, enron, gnutella31, epinions, slashdot. (a) Degree Centrality: <dataset>_CDCU vs. <dataset>_DCP; (b) Node Betweenness Centrality: <dataset>_CNBCU vs. <dataset>_NBCP; (c) Edge Betweenness Centrality: <dataset>_CEBCU vs. <dataset>_EBCP]
• Increasing the number of users/peer → turning point in the projection graph
  – More connections with other peers → peer degree & node betweenness increase to their maximum
  – More social edges within peers → edge betweenness decreases to its minimum
Finding High Betweenness Peers
• Placing data caches on high betweenness peers can reduce the latency of locating data
• Can we identify such peers, knowing the top betweenness users or communities?
[Figure, two panels; y-axis: Peer Overlap (0–1); x-axis: Users/Peer (1–1000); series: 1%, 5%, 10%. Method 1: with Top-N% users; Method 2: with Top-N% communities]
• Top 5% betweenness centrality users => top betweenness centrality peers with 80–90% accuracy
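Method 1 can be sketched as: take the top-N% users by betweenness, map them to the peers that store them, and measure how many of the true top-N% peers are recovered. The slides do not define "Peer Overlap" precisely; this sketch assumes it is the fraction of the true top-N% peers found among the predicted peers, and it rounds N% of the population to at least one item. Names and data are hypothetical.

```python
def top_fraction(scores, fraction):
    """Top round(fraction * n) keys by score (at least one; ties by key)."""
    k = max(1, round(fraction * len(scores)))
    return set(sorted(scores, key=lambda x: (-scores[x], x))[:k])


def peer_overlap(user_scores, user_to_peer, peer_scores, fraction):
    """Assumed form of Method 1: predict the top peers as the peers that
    host the top users, then report the share of the true top peers found."""
    top_users = top_fraction(user_scores, fraction)
    predicted = {user_to_peer[u] for u in top_users}
    actual = top_fraction(peer_scores, fraction)
    return len(predicted & actual) / len(actual)


# Hypothetical betweenness scores for four users on three peers.
users = {"u1": 0.5, "u2": 0.4, "u3": 0.05, "u4": 0.05}
placement = {"u1": "P1", "u2": "P2", "u3": "P2", "u4": "P3"}
good_peers = {"P1": 0.6, "P2": 0.3, "P3": 0.1}  # PG agrees with users
bad_peers = {"P1": 0.1, "P2": 0.6, "P3": 0.3}   # PG disagrees
print(peer_overlap(users, placement, good_peers, 0.5))   # 1.0
print(peer_overlap(users, placement, bad_peers, 0.25))   # 0.0
```

Method 2 would differ only in ranking whole communities (i.e., peers, since one community is mapped per peer) by an aggregate of their users' scores instead of ranking individual users.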
Summary of Findings
• [1, ~150] users/peer:
  – Projection graph closely resembles the social graph
  – Highest correlation of social & projection graph metrics
  – Degree & node betweenness can be estimated from local information (cumulative scores of users)
  – Edge betweenness cannot be estimated well
• Above 150 users/peer:
  – Projection graph topology loses social properties
  – A highly connected projection graph
  – No differentiation in peer centrality
• Top betweenness centrality users can pinpoint the top betweenness centrality peers with good accuracy
• Overall: applications can calculate user centrality scores infrequently to estimate peer centrality
  – Social graph changes slowly compared to the P2P network
Impact on Applications & Systems
• Target high degree peers to:
  – Decrease search time
  – Increase breadth of search and diversity of results
• Target high betweenness peers to:
  – Monitor information flow and collect traces
  – Place data caches and indexes of data location
  – Quarantine malware outbreaks
  – Disseminate software patches
• Tackle P2P churn
• Predict centrality of peers to allocate resources
• Reduce overlay overhead
• Enhance routing tables with P2P edges for faster & more secure peer discovery
Thank you!
This work was supported by NSF Grants:
CNS 0952420 and CNS 0831785
http://www.cse.usf.edu/dsg/
[email protected]