11th IEEE International Conference on Peer-to-Peer Computing
Kyoto, Japan, 2011
Inferring Peer Centrality
in Socially-Informed P2P Systems
Nicolas Kourtellis, Adriana Iamnitchi
Department of Computer Science & Engineering
University of South Florida
Tampa, USA
Socially-aware Applications
Applications collect and use social information:
Location, collocation, history of interactions, etc.
Build (implicit/explicit) social network of users
Use: reduce spam, provide recommendations, etc.
Wide range of system architectures
How does the social network of users affect the load
in a P2P architecture?
• PeerSoN
• LifeSocial.KOM
• Safebook
• Prometheus
• …
Decentralization of user social data
• MobiClique
• Yarta
• ...
Social Graphs & P2P Networks
Users connected with application-specific edges
User-contributed peers form a P2P network
User social graph is partitioned into subgraphs &
stored on peers
Questions:
How do applications traverse a distributed social graph?
What does it mean for the P2P routing?
Application Example
Invite user G’s 2-hop hiking contacts to a trip
=> 1-hop={B, C, E} 2-hops={A, D, F, I}
Social graph traversals => many P2P lookups
Application performance affected by projection
of social graph on peers
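The traversal above can be sketched as a breadth-first search that also counts how often it crosses from one peer to another. This is only an illustration: the edge list and the user-to-peer placement are hypothetical, chosen so that G's 1-hop and 2-hop contacts match the slide, and counting one remote lookup per neighbor stored on a different peer is a simplifying assumption.

```python
from collections import deque

# Hypothetical social edges, picked only to reproduce the slide's hop sets.
SOCIAL_EDGES = [
    ("G", "B"), ("G", "C"), ("G", "E"),
    ("B", "A"), ("C", "D"), ("E", "F"), ("E", "I"),
]

# Hypothetical placement of users on peers.
USER_TO_PEER = {
    "A": "P1", "B": "P1", "C": "P2", "D": "P2",
    "E": "P3", "F": "P3", "G": "P3", "I": "P4",
}


def build_adjacency(edges):
    """Undirected adjacency sets from a list of (user, user) pairs."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def k_hop_contacts(adj, user_to_peer, source, k):
    """BFS up to k hops; returns ({hop: users}, number of remote lookups)."""
    hops = {}
    seen = {source}
    frontier = deque([(source, 0)])
    remote_lookups = 0
    while frontier:
        user, dist = frontier.popleft()
        if dist == k:
            continue  # do not expand beyond k hops
        for nbr in sorted(adj.get(user, ())):
            if nbr in seen:
                continue
            seen.add(nbr)
            hops.setdefault(dist + 1, set()).add(nbr)
            # Assumption: a neighbor stored on another peer costs one lookup.
            if user_to_peer[nbr] != user_to_peer[user]:
                remote_lookups += 1
            frontier.append((nbr, dist + 1))
    return hops, remote_lookups


hops, lookups = k_hop_contacts(build_adjacency(SOCIAL_EDGES), USER_TO_PEER, "G", 2)
```

A different placement of the same users on peers changes the lookup count without changing the hop sets, which is the effect the projection graph is meant to capture.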
Projection Graph
[Figure: Social Graph (SG), Projection Graph (PG), P2P Overlay]
How do the properties of the projection graph compare with
the properties of the social graph projected?
Projection Graph Model
Social Graph $SG = (V, E)$
$V$ = set of users, $E$ = set of social edges
Projection Graph $PG = (V_P, E_P)$
$V_P$ = set of peers, $E_P$ = set of P2P edges
$P_V(i)$ = set of users mapped on peer $P_i$, $P_i \in V_P$
$(P_i, P_j) \in E_P$ iff $\exists\, a \in P_V(i),\ \exists\, b \in P_V(j)$ s.t. $(a, b) \in E$
$w(P_i, P_j) = |\{(a, b) \in E \mid a \in P_V(i),\ b \in P_V(j)\}|$
Uses:
Study properties of peers such as centrality
Study how the social graph topology affects P2P
routing & system performance
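As a minimal sketch of the model above, the projection graph can be built in one pass over the social edges. The function name and the toy input are hypothetical. Treating edges as undirected and skipping social edges whose endpoints sit on the same peer (no self-loops) are assumptions that the slide does not state.

```python
from collections import defaultdict


def build_projection_graph(social_edges, user_to_peer):
    """Return {(Pi, Pj): weight} for the projection graph.

    A P2P edge exists between two distinct peers iff at least one social
    edge connects a user on one to a user on the other; the weight counts
    those social edges.
    """
    weights = defaultdict(int)
    for a, b in social_edges:
        pi, pj = user_to_peer[a], user_to_peer[b]
        if pi == pj:
            continue  # assumption: edges internal to a peer add no P2P edge
        weights[tuple(sorted((pi, pj)))] += 1
    return dict(weights)


# Toy input (hypothetical): five users stored on three peers.
toy_edges = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d"), ("d", "e")]
toy_mapping = {"a": "P1", "b": "P1", "c": "P2", "d": "P2", "e": "P3"}
pg = build_projection_graph(toy_edges, toy_mapping)
```

Here P1 and P2 are joined by an edge of weight 2 (from a-c and b-d), and P2 and P3 by an edge of weight 1.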
Outline
Motivation
Projection Graph Model
Social Network Centrality Metrics
Degree Centrality
Node Betweenness Centrality
Edge Betweenness Centrality
Centrality Calculation: Limitations
Experimental Questions
Experimental Methodology
Experimental Results
Impacts on Applications & Systems
Degree Centrality
Number of edges of a node
High degree centrality peers: Network Hubs
Can be targeted to directly influence many other
peers with a message broadcast or distribute a
search query
[Figure: example graph with lettered nodes]
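Degree centrality as defined on this slide is a plain edge count, so a short sketch is enough. The helper names and the toy edge list are hypothetical, and the input is assumed to be a simple undirected graph with no duplicate edges.

```python
def degree_centrality(edges):
    """Number of edges incident to each node (unnormalized)."""
    degree = {}
    for a, b in edges:
        degree[a] = degree.get(a, 0) + 1
        degree[b] = degree.get(b, 0) + 1
    return degree


def hubs(degree, top_n):
    """The top_n highest-degree nodes, ties broken by name."""
    return sorted(degree, key=lambda n: (-degree[n], n))[:top_n]


# Toy graph (hypothetical): A is the hub.
toy_edges = [("A", "B"), ("A", "C"), ("A", "D"), ("D", "E")]
deg = degree_centrality(toy_edges)
top = hubs(deg, 2)
```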
Node Betweenness Centrality
Measures the extent to which a node lies on the
shortest path between two other nodes
High betweenness centrality peers: Control
communication between distant peers
Can host data caches for reduced latency to locate
data
[Figure: example graph with lettered nodes]
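One common way to compute node betweenness is Brandes' algorithm. The sketch below assumes an unweighted, undirected graph and returns unnormalized scores. The slides do not say which algorithm or normalization the authors used, so this is an illustration rather than their method.

```python
from collections import deque


def node_betweenness(adj):
    """Brandes' algorithm for an unweighted, undirected graph.

    adj maps every node to an iterable of its neighbours. Returns, per node,
    the (possibly fractional) number of shortest paths between other node
    pairs that pass through it.
    """
    score = {v: 0.0 for v in adj}
    for s in adj:
        # BFS from s: count shortest paths (sigma) and record predecessors.
        dist = {s: 0}
        sigma = {v: 0 for v in adj}
        sigma[s] = 1
        preds = {v: [] for v in adj}
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Back-propagate dependencies, farthest nodes first.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each unordered pair was visited from both endpoints.
    return {v: c / 2 for v, c in score.items()}


# Toy path graph A - B - C - D (hypothetical).
path = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
scores = node_betweenness(path)
```

On the path graph the two interior nodes each score 2 and the endpoints score 0, matching the intuition that interior nodes control communication between the ends.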
Edge Betweenness Centrality
Measures the extent to which an edge lies on the
shortest path between two nodes
High betweenness centrality edges: Connect
distant parts of P2P network
Can be monitored to block malware traffic
[Figure: example graph with lettered nodes]
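Edge betweenness can be sketched with the same shortest-path counting, accumulating the score on edges instead of nodes. As before, this assumes an unweighted, undirected graph and unnormalized scores. The toy "two triangles joined by a bridge" graph is hypothetical.

```python
from collections import deque


def edge_betweenness(adj):
    """Edge variant of Brandes' algorithm (unweighted, undirected).

    adj maps every node to an iterable of its neighbours. Returns
    {(u, v): score} with u < v, where score is the number of shortest
    paths between node pairs that use the edge.
    """
    score = {}
    for s in adj:
        dist = {s: 0}
        sigma = {v: 0 for v in adj}
        sigma[s] = 1
        preds = {v: [] for v in adj}
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                # Share of s-to-w (and beyond) paths that use edge (v, w).
                share = sigma[v] / sigma[w] * (1 + delta[w])
                edge = tuple(sorted((v, w)))
                score[edge] = score.get(edge, 0.0) + share
                delta[v] += share
    # Each unordered pair was visited from both endpoints.
    return {e: c / 2 for e, c in score.items()}


# Toy graph (hypothetical): triangles A-B-C and D-E-F joined by bridge C-D.
barbell = {
    "A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B", "D"],
    "D": ["C", "E", "F"], "E": ["D", "F"], "F": ["D", "E"],
}
eb = edge_betweenness(barbell)
bridge = max(eb, key=eb.get)
```

The bridge C-D carries every shortest path between the two triangles, so it gets the highest score. That is the kind of edge the slide suggests monitoring.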
Calculating Peer Centrality
Challenging because of:
Limited access to user data (e.g., privacy settings)
P2P network scale
Peer churn
Through experimental analysis on the social and
projection graph, we investigate how to
circumvent these limitations
Experimental Questions
Can we approximate the centrality of peers using
the centrality scores of their users?
How does the number of users storing data per
peer affect the centrality scores of their peers?
Social graph is less dynamic than the P2P network
Calculate users' centrality scores infrequently & use them
to estimate their peers' centrality
Spoiler Alert!
[1, ~150] users/peer: Can estimate degree &
betweenness centrality of peers with good
accuracy
Above 150 users/peer: The projection graph
becomes highly connected => peers are no longer
differentiated by centrality
Experimental Methodology
Naturally-formed communities offer incentives for resource
sharing => 1 community subgraph mapped per peer
Projection graphs generated from 5 real social graphs
Social Network    Users     Edges
gnutella04        10,876    39,994
gnutella31        62,561    147,878
enron             33,696    180,811
epinions          75,877    405,739
slashdot          82,168    504,230
Communities detected via recursive Louvain algorithm*
Varied average community size: 5,10,20,…,1000 users/peer
Calculate correlation of centralities of users and their peers
Compare average centralities of users and their peers
Identify top centrality peers from their users’ scores
*V. D. Blondel et al, “Fast unfolding of communities in large networks”,
Journal of Statistical Mechanics: Theory and Experiment, vol. 10, 2008.
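The correlation step can be illustrated for degree centrality: sum the scores of the users stored on each peer, compute each peer's degree in the projection graph, and correlate the two. The slides do not name the correlation coefficient, so Pearson is an assumption here, as are the unweighted peer degree, the helper names, and the toy data.

```python
from math import sqrt


def user_degree(social_edges):
    """Degree of each user in the social graph."""
    deg = {}
    for a, b in social_edges:
        deg[a] = deg.get(a, 0) + 1
        deg[b] = deg.get(b, 0) + 1
    return deg


def cumulative_user_score(user_score, user_to_peer):
    """Sum of the users' scores on each peer (local information only)."""
    total = {}
    for user, peer in user_to_peer.items():
        total[peer] = total.get(peer, 0) + user_score.get(user, 0)
    return total


def peer_degree(social_edges, user_to_peer):
    """Number of distinct neighbouring peers in the projection graph."""
    nbrs = {p: set() for p in user_to_peer.values()}
    for a, b in social_edges:
        pa, pb = user_to_peer[a], user_to_peer[b]
        if pa != pb:
            nbrs[pa].add(pb)
            nbrs[pb].add(pa)
    return {p: len(n) for p, n in nbrs.items()}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Toy data (hypothetical): five users on three peers.
toy_edges = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d"), ("d", "e")]
toy_mapping = {"a": "P1", "b": "P1", "c": "P2", "d": "P2", "e": "P3"}

cumulative = cumulative_user_score(user_degree(toy_edges), toy_mapping)
actual = peer_degree(toy_edges, toy_mapping)
peers = sorted(actual)
r = pearson([cumulative[p] for p in peers], [actual[p] for p in peers])
```

Repeating this for each average community size would give one point per users/peer value, which is the shape of the experiment described above.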
Correlation of Centrality Scores
[Figure: correlation (0 to 1) of user and peer centrality scores vs. Users/Peer (1 to 1000) for gnutella04, enron, gnutella31, epinions, slashdot. (a) Users/Peer vs. Degree; (b) Users/Peer vs. Node Betweenness; (c) Users/Peer vs. Edge Betweenness]
[1-150] users/peer:
Projection graph resembles closely social graph
Highest correlation of social & projection graph metrics
Degree & node betweenness estimated from local
information (cumulative scores)
After 150 users/peer:
Projection graph topology loses social properties
Highly connected network
Peers participate equally in graph traversal
Comparison of Centrality Scores
[Figure: centrality scores vs. Users/Peer (1 to 1000) for gnutella04, enron, gnutella31, epinions, slashdot. (a) Users/Peer vs. Degree: Degree Centrality, series CDCU and DCP; (b) Users/Peer vs. Node Betweenness: Node Betweenness Centrality, series CNBCU and NBCP; (c) Users/Peer vs. Edge Betweenness: Edge Betweenness Centrality, series CEBCU and EBCP]
Increasing the number of users/peer => turning point in
projection graph
More connections with other peers =>
peer degree & betweenness increase to a maximum
More social edges within peers =>
edge betweenness decreases to a minimum
Finding High Betweenness Peers
Placing data caches on high betweenness peers
can reduce latency to locate data
Can we identify such peers, knowing the top
betweenness users or communities?
[Figure: Peer Overlap (0 to 1) vs. Users/Peer (1 to 1000) for top 1%, 5%, and 10%. (Method 1) With Top-N% users; (Method 2) With Top-N% communities]
Top 5% betweenness centrality users => top betweenness
centrality peers with 80–90% accuracy
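A minimal sketch of Method 1, under stated assumptions: take the top-N% users by betweenness, look up the peers that store them, and measure what fraction of the true top-N% peers that set recovers. The slides do not define the overlap metric precisely, so this definition, the helper names, and the toy scores are hypothetical.

```python
def top_fraction(score, fraction):
    """The top `fraction` of keys by score (at least one), ties by name."""
    k = max(1, int(len(score) * fraction))
    ranked = sorted(score, key=lambda n: (-score[n], str(n)))
    return set(ranked[:k])


def peer_overlap(user_score, peer_score, user_to_peer, fraction):
    """Fraction of the true top peers found via the top users' peers."""
    predicted = {user_to_peer[u] for u in top_fraction(user_score, fraction)}
    actual = top_fraction(peer_score, fraction)
    return len(predicted & actual) / len(actual)


# Toy scores (hypothetical): ten users on five peers.
user_score = {f"u{i}": 11 - i for i in range(1, 11)}  # u1 highest
user_to_peer = {f"u{i}": f"P{(i + 1) // 2}" for i in range(1, 11)}
peer_score = {"P1": 5, "P2": 3, "P3": 4, "P4": 1, "P5": 0}

overlap = peer_overlap(user_score, peer_score, user_to_peer, 0.4)
```

In this toy case the top four users sit on P1 and P2 while the true top two peers are P1 and P3, so the overlap is 0.5.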
Summary of Findings
[1, ~150] users/peer:
Projection graph resembles closely social graph
Highest correlation of social & projection graph metrics
Degree & node betweenness can be estimated from
local information (cumulative scores of users)
Edge betweenness cannot be estimated well
Above 150 users/peer:
Projection graph topology loses social properties
A highly connected projection graph
No differentiation in peer centrality
Top betweenness centrality users can pinpoint the top
betweenness centrality peers with good accuracy
Overall: Applications can calculate users' centrality
scores infrequently to estimate peer centrality
Social graph changes slowly compared to P2P network
Impact on Applications & Systems
Target high degree peers to:
Decrease search time
Increase breadth of search and diversity of results
Target high betweenness peers to:
Monitor information flow and collect traces
Place data caches and indexes of data location
Quarantine malware outbursts
Disseminate software patches
Tackle P2P churn
Predict centrality of peers to allocate resources
Reduce overlay overhead
Enhance routing tables with P2P edges for faster &
more secure peer discovery
Thank you!
This work was supported by NSF Grants:
CNS 0952420 and CNS 0831785
http://www.cse.usf.edu/dsg/