DSybil: Optimal Sybil-Resistance
for Recommendation Systems
Haifeng Yu
National University of Singapore
Chenwei Shi
National University of Singapore
Michael Kaminsky
Intel Research Pittsburgh
Phillip B. Gibbons
Intel Research Pittsburgh
Feng Xiao
National University of Singapore
Attacks on Recommendation Systems
 Netflix, Amazon, Razor,
Digg, YouTube, …
 Attacker may cast
misleading votes
 To be more effective
 Bribe other users
 Compromise other users
 Ultimate form: Sybil
attack
Haifeng Yu, National University of Singapore
2
Sybil Attack
[Figure: a malicious user launches a sybil attack against honest users; an automated sybil attack tool is available for $147]
 “Post at random intervals to make it look like real people”
 “Supports multiple random proxies to make posts look like they came from visitors across the world”
 “Multithreaded comment blaster with account rotation”
 …
Background: Defending Against Sybil Attack
Sybil defense is widely considered challenging: >1000 papers acknowledge the sybil attack, most without offering a solution
 Tie identities to human beings based on credentials (e.g., passport)
 Privacy concerns, etc.
 Resource challenges
 Vulnerable to attacks from botnets
 Social-network-based defense
 SybilGuard [SIGCOMM’06], SybilLimit [Oakland’08], SybilInfer [NDSS’09], SumUp [NSDI’09]
 Better guarantees
Rec Systems Are More Vulnerable
System                    # sybil identities we can tolerate (n identities total)
Byzantine consensus       n/3
DHT                       n/4
…                         …
Recommendation systems    n/500
 On an avg Digg object, only 1 out of every 500 honest users votes
 n/500 sybil identities are sufficient to outvote the honest voters
Social-network-based Defenses Not Sufficiently Strong For Rec Systems
 Lower bound on all social-network-based approaches
 Applicable to SybilGuard, SybilLimit, SybilInfer, SumUp, etc.
 Compromising a degree-10 node creates 10 sybil identities
 To create n/500 sybil identities: Compromising 1 node out of every 5000 honest nodes is sufficient
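The arithmetic behind the last bullet can be checked with a short sketch. It assumes, as in the slide's example, that every compromised node has degree 10 and that each of its edges yields one sybil identity; the function name `sybil_fraction` is hypothetical.

```python
from fractions import Fraction

def sybil_fraction(compromised_fraction: Fraction, degree: int) -> Fraction:
    """Sybil identities per honest node, assuming each compromised node
    of the given degree yields `degree` sybil identities."""
    return compromised_fraction * degree

# 1 compromised node out of every 5000 honest nodes, each of degree 10.
fraction = sybil_fraction(Fraction(1, 5000), 10)
print(fraction)  # 1/500, i.e. n/500 sybil identities among n honest nodes
```

Under these assumptions the result matches the n/500 threshold at which sybil identities can outvote the honest voters on an average Digg object.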
Alternative: Leverage History and Trust
 Ancient idea: Adjust “trust” to an identity based on its historical behavior
 Numerous heuristics proposed -- target a few fixed attack strategies
 No guarantees beyond the few strategies targeted
 Attacker is intelligent and will adapt → arms race
Our Results
 DSybil: A novel defense mechanism
 Based on feedback and trust
 Loss (# of bad recommendations) is provably
O( D log M ) even under worst-case attack
D : Dimension of the objects (< 10 in Digg)
M : Max # of sybil identities voting on each obj
 We prove that DSybil’s loss is optimal
 Experimental results (from 1-year Digg trace):
 High-quality recommendation under potential sybil
attack (with optimal strategy) from million-node botnet
Outline
 Background and our contribution
 Trust-based approaches – The obvious, the
subtle, and the challenge
 Main component of DSybil: DSybil’s
recommendation algorithm
 Experimental results
Subtle Aspects of Using Trust
1. How to identify “correct” but “non-helpful” votes?
 Vote for a good object that already has many votes -- this additional vote is “non-helpful”
 Sybil identities may gain trust for free
 Determining the “contribution amount” by voting order does not work – see paper
[Figure: a sybil identity votes “this obj is good!” on a good object and gains trust for free]
Subtle Aspects of Using Trust
2. How to assign initial trust to new identities?
 Positive initial trust for all: Invites whitewashing
 “Trial period of 5 votes” not effective
 Cast 5 “correct” votes and then cheat
3. How exactly to grow trust?
 Multiplicatively? Additively?
4. How exactly to make recommendations?
 Pick obj with most votes? Probabilistically? How
about negative votes?
…..
The Central Challenge
 Numerous design choices -- fundamental
tension between
 Giving trust to honest identities
 Not giving trust to sybil identities casting “correct”
votes (who may cause damage later)
 Impossible to explore all design alternatives
 Our approach: Directly design an optimal algorithm
 Needs to strike the optimal balance
DSybil’s Key Insights
Key #1: Leverage typical voting behavior of honest users
 Heavy-tail distribution
 There exist very active users who cast many votes
[Figure: % of users casting x votes vs. # votes cast (on various objs), all log-scale]
Key #2: If user is already getting “enough help”, then do not give out more trust
 Enables us to strike an optimal balance
System Model and Attack Model
 Objects to be recommended are either good or
bad (e.g., Digg)
 DSybil is personalized
 Each user may have different subjective opinions
 Different users may get different recommendations
 From now on, always with respect to a user Alice
 Run by either Alice or a central server
System Model and Attack Model
[Figure: a pool of 2 good objs and 2 bad objs; DSybil does not know which are good]
 Each round has a pool of objects
 DSybil recommends one object for Alice to consume
 Alice provides feedback after consumption
 DSybil adjusts trust based on feedback
 See paper for generalizations…
System Model and Attack Model
[Figure: 2 good objs with votes from E, F, and H; 2 bad objs with votes from G and H]
 Other identities have cast votes
 DSybil uses only positive votes
 We prove that using negative votes will not help…
 Each identity casts at most one vote per object
 At most M (e.g., 10^10) sybil identities voting on each object
DSybil Rec Algorithm: Classifying Objects
[Figure: trust of each voter and total trust per object]
2 good objs: (E: 0.2, H: 0.2; total: 0.4), (F: 0.2; total: 0.2)
2 bad objs: (G: 0.2; total: 0.2), (H: 0.2; total: 0.2)
 Reminder: Trust is always with respect to Alice (how much Alice “trusts” the given identity)
 Each identity starts with initial trust 0.2 -- Fix later…
 An object is overwhelming if total trust ≥ C
 C = 1.0
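The classification rule can be sketched as follows. This is a minimal illustration, not the authors' implementation: the object names (`good1`, `bad1`, ...) and helper names are hypothetical, and the assignment of voters to objects follows one reading of the slide's figure.

```python
C = 1.0  # overwhelming threshold from the slide

def total_trust(voters, trust):
    """Sum of Alice's trust in the identities that voted for one object."""
    return sum(trust.get(v, 0.0) for v in voters)

def overwhelming_objects(votes, trust, c=C):
    """Objects whose total trust is at least c."""
    return [obj for obj, voters in votes.items()
            if total_trust(voters, trust) >= c]

# Every identity starts with trust 0.2, as on the slide.
trust = {"E": 0.2, "F": 0.2, "G": 0.2, "H": 0.2}
votes = {"good1": ["E", "H"], "good2": ["F"], "bad1": ["G"], "bad2": ["H"]}
print(overwhelming_objects(votes, trust))  # [] since the largest total is 0.4

# Once trust in E has grown to 1.0, the object E voted for is overwhelming.
trust["E"] = 1.0
print(overwhelming_objects(votes, trust))  # ['good1']
```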
Rounds without Overwhelming Objects
[Figure: 2 good objs (E: 0.2, H: 0.2; total: 0.4), (F: 0.2; total: 0.2) and 2 bad objs (G: 0.2; total: 0.2), (H: 0.2; total: 0.2); annotations show the trust of E, F, and G being updated from 0.2 after feedback]
1. Recommend: Uniformly random obj
 Recommending the obj with largest total trust would result in linear (instead of logarithmic) loss…
2. Adjust trust after feedback:
 If obj bad, multiply trust of voters by α, where 0 ≤ α < 1
 If obj good, multiply trust of voters by β, where β > 1
 Additive increase would result in linear (instead of logarithmic) loss…
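The two steps above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the slide gives no concrete factors, so `ALPHA = 0.5` and `BETA = 2.0` are assumed values, and the object names and the `is_good` callback (standing in for Alice's feedback) are hypothetical.

```python
import random

ALPHA = 0.5  # assumed penalty factor, must satisfy 0 <= ALPHA < 1
BETA = 2.0   # assumed reward factor, must satisfy BETA > 1

def round_without_overwhelming(votes, trust, is_good, rng=random):
    """Run one round in which no object is overwhelming.

    votes: object name -> list of identities that voted for it
    trust: identity -> Alice's trust in it (updated in place)
    is_good: callable standing in for Alice's feedback after consumption
    """
    obj = rng.choice(sorted(votes))            # step 1: uniformly random object
    factor = BETA if is_good(obj) else ALPHA   # step 2: multiplicative update
    for voter in votes[obj]:
        trust[voter] *= factor
    return obj

trust = {"E": 0.2, "F": 0.2, "G": 0.2, "H": 0.2}
votes = {"good1": ["E", "H"], "good2": ["F"], "bad1": ["G"], "bad2": ["H"]}
picked = round_without_overwhelming(
    votes, trust, lambda obj: obj.startswith("good"), random.Random(0))
print(picked, trust)
```

With these assumed factors, an identity that starts at 0.2 reaches C = 1.0 after three rewarded votes (0.2 → 0.4 → 0.8 → 1.6), which gives some intuition for why multiplicative growth is fast.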
Defining Guides and Dimension
 Guides: Honest users with same/similar “opinion” as Alice
 Never/seldom vote for bad objects
 Dimension: # of guides needed to “cover” a large fraction (e.g., 60%) of the good objects -- Called critical guides
[Figure: good objects and the guides X, Y, Z, and W voting on them]
Dimension = 2; Critical guides = {X, Y} or {X, W}
DSybil does not know who are the guides (critical guides) or what the dimension is
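For offline analysis of a trace, the number of guides needed for a given coverage can be estimated greedily. The sketch below is an assumption-laden illustration: greedy selection is a standard set-cover heuristic and only gives an upper bound on the dimension, it is not claimed to be the procedure used in the paper, and the vote sets are made-up data chosen to resemble the figure. DSybil itself never runs this, since it does not know who the guides are.

```python
def greedy_guides(votes_by_guide, good_objects, coverage=0.6):
    """Greedily pick guides until `coverage` of the good objects is covered."""
    covered, picked = set(), []
    while len(covered) < coverage * len(good_objects):
        # Newly covered good objects that each unpicked guide would add.
        gains = {g: (votes_by_guide[g] & good_objects) - covered
                 for g in sorted(votes_by_guide) if g not in picked}
        best = max(gains, key=lambda g: len(gains[g]), default=None)
        if best is None or not gains[best]:
            break  # target coverage is unreachable with these guides
        covered |= gains[best]
        picked.append(best)
    return picked

# Hypothetical data: 10 good objects and the good objects each guide voted for.
good_objects = set(range(1, 11))
votes_by_guide = {"X": {1, 2, 3, 4}, "Y": {5, 6, 7}, "W": {5, 6, 8}, "Z": {1, 2}}
guides = greedy_guides(votes_by_guide, good_objects)
print(guides, len(guides))  # ['X', 'W'] 2
```

On this made-up data, two guides cover at least 60% of the good objects, and replacing W with Y would work equally well.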
Key #1: Leverage Small Dimension
 Dimension is typically small in practice – results later…
 Small dimension → Will encounter critical guides frequently when picking random objects
 Trust in critical guides quickly grows to C
 This will result in overwhelming objects…
Rounds with Overwhelming Objects
[Figure: a pool of 1 good obj and 2 bad objs; annotations show trust to E: 1.0 → 1.0 and trust to H: 0.2 → 0.2]
1 good obj: (E: 1.0, H: 0.2; total: 1.2)
2 bad objs: (G: 0.2, H: 0.2; total: 0.4), (F: 1.0; total: 1.0)
1. Recommend: Arbitrary overwhelming obj
 Will confiscate sufficient trust if object is bad…
2. Adjust trust after feedback:
 If obj bad, multiply trust of the voters by α, where 0 ≤ α < 1
 If obj good, no additional trust given out
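A round with an overwhelming object might look like the sketch below. It is a hedged illustration rather than the authors' code: `ALPHA = 0.5` is an assumed value for the penalty factor, the object names are hypothetical, and "arbitrary" is realized here by simply taking the first overwhelming object found.

```python
ALPHA = 0.5  # assumed penalty factor (any value with 0 <= ALPHA < 1)
C = 1.0      # overwhelming threshold from the slides

def round_with_overwhelming(votes, trust, is_good):
    """Run one round in which at least one object is overwhelming.

    votes: object name -> list of identities that voted for it
    trust: identity -> Alice's trust in it (updated in place)
    is_good: callable standing in for Alice's feedback after consumption
    """
    overwhelming = [obj for obj, voters in votes.items()
                    if sum(trust[v] for v in voters) >= C]
    if not overwhelming:
        raise ValueError("no overwhelming object in this round")
    obj = overwhelming[0]  # step 1: any overwhelming object will do
    if not is_good(obj):   # step 2: penalize voters only on bad feedback
        for voter in votes[obj]:
            trust[voter] *= ALPHA
    return obj

# Pool and trust values from the slide (object names are hypothetical).
trust = {"E": 1.0, "F": 1.0, "G": 0.2, "H": 0.2}
votes = {"good": ["E", "H"], "bad1": ["G", "H"], "bad2": ["F"]}
picked = round_with_overwhelming(votes, trust, lambda obj: obj == "good")
print(picked, trust)
```

Here the good object is recommended and nobody's trust changes, matching the figure's annotations (E stays at 1.0, H stays at 0.2). Had the bad object voted for by F been chosen instead, F's trust would drop to 0.5 under the assumed `ALPHA`.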
Key #2: Identify Whether Help is Sufficient
 Consuming a good overwhelming object = Alice already has “sufficient help”
 Thus do not give out additional trust
 Prevents sybil identities from getting trust “for free”
 May hurt honest identities (But remember this is optimal…)
Omitted Details
 No “free” initial trust given out when Alice is getting “sufficient help”
 Proof for O( D log M ) loss even under worst-case attack
 Optimality
 Alternative designs/tweaks
 Most will break optimality
Results on Dimension
 One-year Digg dataset with half-million users
 Pessimistically assuming guides are only 2% of the honest users
-- see paper for other settings…
 To cover 60% of good objs, need only 3 guides
 Robustness:
 Remove the previous 3 guides – 5 guides to cover 60%
 Remove top 100 heaviest voters – 5 guides needed to cover 60%
 See paper for more…
 Relates to heavy-tail distribution of votes cast by individual users – see paper
 There exist very active users who cast many votes
 Similar heavy-tail distribution observed in 4 other datasets
Results on Loss (Based on Digg Dataset)
 Attack capacity: Max 10 billion sybil voters on any obj
 In Digg, avg # honest voters on each obj is only ~1,000
 Fraction of bad recommendations (under worst-case attack): 12%
 Growing defense: 5% if user has used DSybil for a week before attack starts
 If attack starts at random point, applies to 51/52 = 98% of users
 1-minute computational puzzle per week
 10 billion identities need a million-node botnet
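The botnet size in the last bullet can be sanity-checked with a small calculation. The sketch assumes one reading of the slide: every sybil identity must solve one 1-minute puzzle per week, and each botnet node solves puzzles back to back around the clock. The function name `nodes_needed` is hypothetical.

```python
MINUTES_PER_WEEK = 7 * 24 * 60  # 10080

def nodes_needed(identities: int, puzzle_minutes: int = 1) -> int:
    """Machines needed to solve one puzzle per identity within a week,
    assuming each machine solves puzzles continuously."""
    total_minutes = identities * puzzle_minutes
    return -(-total_minutes // MINUTES_PER_WEEK)  # ceiling division

print(nodes_needed(10**10))  # 992064, i.e. roughly a million nodes
```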
Conclusion
 Defending against sybil attacks is challenging
 It is even harder in the context of rec systems
 DSybil: Provable and optimal O( D log M ) loss
 Almost no previous approaches provide provable
guarantees against worst-case attack
 DSybil key insights:
 Leverage small dimension of the voting pattern
 Carefully identify when help is already “sufficient”
Which object to pick?
Central Question Answered by This Work
Can trust sufficiently diminish the influence of
sybil identities in recommendation systems?
Aim for provable guarantees under all attack strategies
(including worst-case attack from intelligent attacker)
Short answer: YES!
Our Results
 DSybil: A novel defense mechanism
 Growing defense: If the user has used DSybil for
some time before the attack starts, loss will be even
smaller
 Experimental results (from one-year trace of Digg): High-quality recommendation even under potential sybil attack (with optimal strategy) from a million-node botnet