Transcript [slides]

Ranking Systems:
Manipulability and Efficiency
Eric Friedman*, ORIE
Cornell University
(Currently visiting: Dept of CS,
U.C. Berkeley, 2005-6)
*Work supported by NSF ITR-0325453
Ranking and Reputations
• Reputations are important
– Webpage ranking: links are
“recommendations”
• High ranks lead to more “clicks”
– P2P: choosing partners
– Ebay: reputations are crucial (and quite
valuable).
• Higher reputations lead to higher prices
– PGP: web of trust.
– Spam and DDoS protections
Problems with Reputation Systems
• Gaming reputation systems is becoming a
serious problem.
– P2P: seti@home, Kazaa-lite
– Webpage ranking: link spamming
• Note: most (all?) current reputation
systems are ad-hoc
– No formal requirements etc.
A research agenda:
Understanding the tradeoffs between
manipulability and efficiency
1) Quantify the manipulability of ranking
systems.
2) Quantify the efficiency of ranking
systems.
3) Find the ranking systems that are on the
efficient frontier and maximize various
objectives.
Today’s talk
(some first steps)
• A framework for manipulability (w/ Alice Cheng)
– Characterization of manipulability of ranking
systems.
• Empirical analysis of PageRank on the WWW
(w/Alice Cheng)
• Evaluating the Efficiency of ranking
mechanisms (work in progress)
Part I: Goals and Approach
• Our goal: create a formalism for analyzing
and designing reputation systems that are
robust to attacks.
– Here we focus on sybils. Although this is
important in itself, our goals are much
broader.
• Note: the definitions were harder than the
proofs.
• Approach: Game theory, mechanism
design (e.g., Arrow’s Theorem)
Trust Graphs
• Most reputation systems use trust graphs:
– G=(V,E)
– If e=(i,j), then T(e) = i’s (direct) trust of j.
– higher T(e) is better
[Figure: example trust graph with edge weights 1, 2, and 3]
• Reputation function: f(G)_i = reputation of i.
• Rank: i outranks j if f(G)_i > f(G)_j
– Note: we focus on rank
• Why use a trust graph?
– Many (most?) interactions are 1st time interactions
• (i,j) ∉ E
Some Representative Reputation
Systems
• PageRank and related systems (Brin and Page
98, Kleinberg 98, Guha et al. 04)
– Start at an arbitrary node and then take a
random walk on the graph.
• Flow methods (e.g., Flake et al. 02, Chuang and
Stoica 02)
– Compute the max flow from i to j.
• Shortest path method.
– Let c(e)=1/T(e), then find the shortest path
from i to j in terms of the c’s.
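The random-walk idea behind PageRank can be sketched as a power iteration. This is a minimal illustration, not the production algorithm: the function name `pagerank`, the restart probability `eps=0.15`, and the uniform treatment of dangling nodes are all assumptions made here for the example.

```python
def pagerank(out_links, eps=0.15, iters=100):
    """Power iteration on a dict {node: [successors]}.

    Assumes every successor also appears as a key of out_links.
    eps is the (assumed) probability of restarting at a random node.
    """
    nodes = list(out_links)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        # Every node receives the restart mass eps / n.
        new = {v: eps / n for v in nodes}
        for v in nodes:
            succ = out_links[v]
            if succ:
                # The walker follows one outgoing link uniformly at random.
                share = (1 - eps) * rank[v] / len(succ)
                for w in succ:
                    new[w] += share
            else:
                # Dangling node: assumed to jump uniformly to any node.
                for w in nodes:
                    new[w] += (1 - eps) * rank[v] / n
        rank = new
    return rank


example = {"a": ["b"], "b": ["c"], "c": ["a"], "d": ["a"]}
ranks = pagerank(example)
for node in sorted(ranks, key=ranks.get, reverse=True):
    print(node, round(ranks[node], 4))
```

In this toy graph, node d has no incoming links, so it only ever holds the restart mass, while a collects rank from both c and d.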
Pagerank = Random Walk on Graph
Maxflow = compute flow from a chosen
source to a node
[Figure: flow from source s to target t]
Shortest Path
[Figure: shortest path from source s to target t]
Sybils
• A single “agent” can replicate itself under a
variety of pseudonyms.
Sybil Attacks
• Sybils are essentially unavoidable
(Douceur 02)
• Sybil clouds can forge trust among each
other.
– Using strong cryptography to prevent them is
expensive and awkward.
Sybils in Practice
• Web ranking: create a large number of dummy
websites that all link to each other.
• P2P: create a large number of peers that
give each other high ratings.
• Ebay: fake transactions with yourself.
• Amazon shopping: post high evaluations of your
own products.
Robustness Against Sybils
• Pagerank: not robust.
– Empirically, can increase pageranks
dramatically with a few sybils. (more later)
• Max-flow: value robust but not rank robust.
• Shortest path: robust.
Robustness: Pagerank
• Pagerank: not robust.
– Create a “flower”
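A toy version of the “flower” attack can be sketched as follows. The exact construction is an assumption made for illustration: the attacker redirects its outgoing links to k sybil “petals”, and each petal links back to the attacker. The helper names `pagerank` and `add_flower`, the sybil naming scheme, and `eps=0.15` are hypothetical.

```python
def pagerank(out_links, eps=0.15, iters=200):
    """Power iteration on {node: [successors]}; eps is the restart probability."""
    nodes = list(out_links)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        new = {v: eps / n for v in nodes}
        for v in nodes:
            # Dangling nodes are assumed to jump uniformly to any node.
            targets = out_links[v] or nodes
            share = (1 - eps) * rank[v] / len(targets)
            for w in targets:
                new[w] += share
        rank = new
    return rank


def add_flower(out_links, attacker, k):
    """Return a copy of the graph in which `attacker` has k sybil petals.

    Assumed construction: the attacker links only to its petals,
    and every petal links only back to the attacker.
    """
    g = {v: list(s) for v, s in out_links.items()}
    petals = [f"{attacker}_sybil{m}" for m in range(k)]
    g[attacker] = list(petals)
    for p in petals:
        g[p] = [attacker]
    return g


base = {"a": ["b"], "b": ["c"], "c": ["a"], "d": ["a"]}
before = pagerank(base)
after = pagerank(add_flower(base, "d", 3))
print("d outranks a before:", before["d"] > before["a"])
print("d outranks a after: ", after["d"] > after["a"])
```

In this small graph, d starts as the lowest-ranked node and overtakes a with three sybils, because the attacker and its petals form a closed loop that keeps recirculating the restart mass they receive.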
Robustness: Maxflow
• Max-flow: Designed for value robustness
– Flow into and out of sybil cloud cannot be changed!
[Figure: min cut separating the source s from the sybil cloud]
Robustness: Maxflow
• Max-flow: not rank robust
– b is higher ranked than a
[Figure: edges s→a (capacity 1), a→b (capacity 0.7), s→b (capacity 0.5), with the min cut marked; values a = [1], b = [1.2]]
Robustness: Maxflow
• Max-flow: not rank robust
– a is higher ranked than b
[Figure: same graph after a reduces its edge to b from 0.7 to 0; values a = [1], b = [0.5]]
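The two max-flow slides can be checked numerically. The sketch below assumes the figure reads as: source s trusts a with 1 and b with 0.5, and a trusts b with 0.7. The function `max_flow` is an illustrative Edmonds-Karp implementation written for this example, not code from the talk.

```python
from collections import deque


def max_flow(cap, s, t):
    """Edmonds-Karp max flow on a dict-of-dicts capacity map {u: {v: c}}."""
    res = {}  # residual capacities, including reverse edges
    for u, nbrs in cap.items():
        res.setdefault(u, {})
        for v, c in nbrs.items():
            res.setdefault(v, {})
            res[u][v] = res[u].get(v, 0.0) + c
            res[v].setdefault(u, 0.0)
    res.setdefault(s, {})
    res.setdefault(t, {})
    total = 0.0
    while True:
        # Breadth-first search for an augmenting path from s to t.
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, c in res[u].items():
                if c > 1e-12 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return total
        # Find the bottleneck along the path, then push that much flow.
        bottleneck = float("inf")
        v = t
        while parent[v] is not None:
            bottleneck = min(bottleneck, res[parent[v]][v])
            v = parent[v]
        v = t
        while parent[v] is not None:
            u = parent[v]
            res[u][v] -= bottleneck
            res[v][u] += bottleneck
            v = u
        total += bottleneck


honest = {"s": {"a": 1.0, "b": 0.5}, "a": {"b": 0.7}}
manipulated = {"s": {"a": 1.0, "b": 0.5}, "a": {}}  # a withdraws its trust in b
for name, graph in [("honest", honest), ("manipulated", manipulated)]:
    print(name, {v: max_flow(graph, "s", v) for v in ("a", "b")})
```

Under these assumptions a's own value stays at 1 in both graphs (value robustness), but withdrawing the 0.7 edge drops b from 1.2 to 0.5, so a moves ahead of b in rank.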
Robustness: Shortest Path
• Shortest path: robust
– a is higher ranked than b
[Figure: edges s→a (c=1), a→b (c=1), s→b (c=3); distances a = [1], b = [2]]
Robustness: Shortest Path
• Shortest path: robust
– a is higher ranked than b
– a can harm b, but a is already higher ranked than b
– b cannot hurt a, since it is not on the shortest path to
a
[Figure: same graph after a raises the cost of its edge to b from c=1 to c=3; distances a = [1], b = [3]]
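The shortest-path example can be reproduced with a few lines of Dijkstra. The sketch assumes the figure reads as: s→a with cost 1, s→b with cost 3, and a→b with cost 1 (raised to 3 in the second slide). A smaller distance from s means a higher rank; `shortest_distances` is a name chosen here for illustration.

```python
import heapq


def shortest_distances(cost, source):
    """Dijkstra on a dict-of-dicts cost map {u: {v: c}} with c >= 0."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, c in cost.get(u, {}).items():
            nd = d + c
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


honest = {"s": {"a": 1, "b": 3}, "a": {"b": 1}}
manipulated = {"s": {"a": 1, "b": 3}, "a": {"b": 3}}  # a lowers its trust in b
print("honest:     ", shortest_distances(honest, "s"))
print("manipulated:", shortest_distances(manipulated, "s"))
```

In both graphs a is closer to s than b, so a's manipulation hurts b's value without changing the ranking between them.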
Sybilproofness
• Def: A sybil strategy for node i in
G=(V,E) is a graph G’=(V’,E’) and a subset U’ ⊆ V’
such that collapsing U’ into a single node
yields G. (T’s are added together.)
• Def: f is k-sybilproof if there do
not exist a pair of nodes i, j and a
sybil strategy for i such that f(G)_i <
f(G)_j and f(G’)_r > f(G)_j for some r ∈ U’, with
|U’| ≤ k+1.
• Def: f is sybilproof if it is k-sybilproof for all k > 0.
• Key: sybils can only
forge recommendations
among each other.
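The “collapsing” step in the definition of a sybil strategy can be made concrete. This is a sketch under stated assumptions: the trust graph is stored as a dict mapping edges (u, v) to T values, edges inside the sybil set disappear on collapse, and the graph has no self-loops. The function name `collapse` and the node labels are hypothetical.

```python
def collapse(trust, sybils, name):
    """Merge every node in `sybils` into one node called `name`.

    trust: dict {(u, v): T} of directed trust edges.
    Parallel edges created by the merge have their T values added together.
    Edges between two sybils are dropped (assumes no self-loops elsewhere).
    """
    merged = {}
    for (u, v), t in trust.items():
        cu = name if u in sybils else u
        cv = name if v in sybils else v
        if cu == cv:
            continue  # forged trust inside the sybil cloud vanishes
        merged[(cu, cv)] = merged.get((cu, cv), 0.0) + t
    return merged


# G': node i has split into i1 and i2, which forge high trust in each other.
g_prime = {
    ("j", "i1"): 1.0,
    ("j", "i2"): 0.5,
    ("i1", "i2"): 5.0,
    ("i2", "i1"): 5.0,
    ("i1", "k"): 2.0,
}
print(collapse(g_prime, {"i1", "i2"}, "i"))
```

A pair (G’, U’) is then a sybil strategy for i in G when `collapse` applied to G’ and U’ returns exactly G, which captures the point that sybils can only forge recommendations among themselves.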
Results: Symmetric Reputations
• Def: A reputation function is symmetric if it is
covariant under graph isomorphism.
• Theorem: There is no nontrivial symmetric
sybilproof mechanism.
– In fact, for any G, any node (except the top one) can
improve their ranking via sybils
• Theorem: There is no nontrivial symmetric k-sybilproof mechanism, for any k ≥ 1.
– (How often this occurs for small k is open.)
Proof (via the butterfly)
[Figure: butterfly construction with nodes j, s, i, the graph G, and the sybil set U’]
• Sybilproofness: by symmetry, f(G’)_j = f(G’)_s
• k-sybilproofness: build G’ one sybil at a time
Results: Non-Symmetric
• Theorem: There exist sybilproof reputation
functions. (e.g., shortest path)
• Def: Given a root node s ∈ V, let 𝒫 be the
set of all collections of edge-disjoint paths*
from s to i. Let g be a function from paths
to reals and ⊕ be an (addition-like)
operator on the reals.
Results: Non-Symmetric
• Let f(G)_i = max_{P ∈ 𝒫} ⊕_{p ∈ P} g(p)
• Max flow: g(p) = min{T(e) | e ∈ p}, ⊕ = +
• Shortest path: g(p) = min{T(e) | e ∈ p}, ⊕ = min
• Other generalizations
– Leaky pipes etc.
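The general form “maximize, over collections of edge-disjoint paths from s to i, the combination of per-path values g(p)” can be evaluated by brute force on tiny graphs. The sketch below follows that definition literally and is exponential in the number of paths; the names `simple_paths`, `reputation`, and `bottleneck` are hypothetical, and the example graph (s→a with 1, a→b with 0.7, s→b with 0.5) is an assumption chosen for illustration.

```python
import operator
from functools import reduce
from itertools import combinations


def simple_paths(trust, s, t):
    """All simple s->t paths, each returned as a tuple of edges (u, v)."""
    out = {}
    for (u, v) in trust:
        out.setdefault(u, []).append(v)
    paths = []

    def dfs(node, visited, edges):
        if node == t:
            paths.append(tuple(edges))
            return
        for nxt in out.get(node, []):
            if nxt not in visited:
                dfs(nxt, visited | {nxt}, edges + [(node, nxt)])

    dfs(s, {s}, [])
    return paths


def reputation(trust, s, i, g, combine):
    """Max over collections of edge-disjoint paths of combine-folded g(p).

    Returns None if i is unreachable from s. Assumes s != i.
    """
    paths = simple_paths(trust, s, i)
    best = None
    for r in range(1, len(paths) + 1):
        for coll in combinations(paths, r):
            edges = [e for p in coll for e in p]
            if len(edges) != len(set(edges)):
                continue  # some edge is shared, so not edge-disjoint
            value = reduce(combine, (g(trust, p) for p in coll))
            if best is None or value > best:
                best = value
    return best


def bottleneck(trust, path):
    """g(p) = min{T(e) | e in p}."""
    return min(trust[e] for e in path)


trust = {("s", "a"): 1.0, ("a", "b"): 0.7, ("s", "b"): 0.5}
print("combine = +  :", reputation(trust, "s", "b", bottleneck, operator.add))
print("combine = max:", reputation(trust, "s", "b", bottleneck, max))
```

With addition as the combining operator, b's value is the sum over both paths; with max, only the single best path counts, which is the case in which the value depends on one maximal path.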
Results: Non-Symmetric
• Theorem: f as defined above is value
sybilproof assuming
– If p’ is an extension of p, then g(p’)<g(p).
– ⊕ is nondecreasing and g is nondecreasing
with respect to T.
– If p = p’ + p’’, then g(p) = g(p’) ⊕ g(p’’).
Results: Non-Symmetric
• Theorem: f as defined above is rank
sybilproof iff ⊕ = max, assuming:
– For any p there exists an extension p’ such that
g(p) = g(p’).
• I.e., f depends on the maximal path.
Summary (Part I)
• A framework for the analysis of the
manipulability of ranking systems.
• Key distinction: rank vs. value
• Result 1: all symmetric ranking systems are
manipulable.
• Result 2: “flow based” ranking systems are not
value manipulable but are rank manipulable.
• Result 3: “path based” ranking systems are not
manipulable.
Part II: Empirical Analysis of PageRank
• (Joint with Alice Cheng)
• (Inspired by Zhang et al. on collusion)
• Stanford web matrix -- ~280k pages.
• Question: How often are a small number
of sybils helpful?
• Answer: Surprisingly often!
[Plot: Value magnification with 1 sybil]
[Plot: Value magnification by number of sybils]
[Plot: Rank as a function of old rank, 1 sybil]
[Plot: Effect of ε on values]
[Plot: Effect of ε on ranks]
Summary of Empirical
• Analytic approximations for these.
• PageRank is quite manipulable
– Especially for low ranked pages
• (but that’s where automated methods are
supposed to work!)
Part III: Quantifying the Efficiency of
Ranking Mechanisms
• Work in progress – some preliminary
results.
• Is FlowRank or PageRank better than
PathRank?
Model
• Random graph model (descriptive, not
constructive)
• Follow the intuition behind pagerank
– Pages link more to “better pages”
– Better pages are more selective.
– Pr(link) = f(q_i, q_j)
• Increasing in q_j
• FOSD (first-order stochastic dominance) in q_i
– Average outdegree = k (n → ∞)
– (Many results have k → ∞, and miss important
aspects of ranking.)
Finding “Baddies”
• 2 layer example:
– ½ of the nodes are H and ½ are L
– L’s link uniformly at random
– H’s link to H’s with (relative) probability
(1+a) and to L’s with (relative) probability (1-a).
– a=0, random graph
– a=1, two tiered graph
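One way to sample from the two-layer model is sketched below. Several details are assumptions made for illustration rather than taken from the talk: each node draws k link targets with replacement (so duplicates collapse and the outdegree is at most k), self-links are excluded, and the first half of the node ids are the H nodes. The function name `two_layer_graph` is hypothetical.

```python
import random


def two_layer_graph(n, k, a, seed=0):
    """Sample a directed graph from the two-layer H/L model.

    Nodes 0 .. n//2 - 1 are H, the rest are L.
    L nodes pick targets uniformly; H nodes pick H targets with relative
    weight (1 + a) and L targets with relative weight (1 - a).
    Returns (is_high, edges) where edges is a set of (u, v) pairs.
    """
    rng = random.Random(seed)
    half = n // 2
    is_high = [v < half for v in range(n)]
    edges = set()
    for u in range(n):
        if is_high[u]:
            weights = [(1 + a) if is_high[v] else (1 - a) for v in range(n)]
        else:
            weights = [1.0] * n
        weights[u] = 0.0  # assumed: no self-links
        for v in rng.choices(range(n), weights=weights, k=k):
            edges.add((u, v))
    return is_high, edges


is_high, edges = two_layer_graph(n=100, k=5, a=1.0, seed=1)
h_to_l = sum(1 for u, v in edges if is_high[u] and not is_high[v])
print("edges:", len(edges), "H->L edges:", h_to_l)
```

At a=1 the weight on L targets is zero for every H node, so the sample has no H→L edges (the two-tiered case); at a=0 all nodes link uniformly and the graph is an ordinary random graph.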
Statistical Inference
• Now, ranking is a problem of statistical
inference
– G is a random variable
– r is a statistical estimate of true qualities
– Note: unlike most inference problems we
only have a single sample
3 methods
• PageRank
• InRank: rank by indegree
• MLRank: compute a maximum likelihood
estimate.
Results
• Pr(error) = Pr(r_i > r_j | q_i < q_j)
• InRank: difference of Poissons
• PageRank: two stage calculation
– First by quality then statistical manipulations of
PageRank equations.
• MLRank: find a subgraph with the maximal number of
edges.
– NP-complete
– Implemented a greedy algorithm
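The error measure can be estimated empirically for InRank, the simplest of the three methods. The sketch below samples a two-layer graph, ranks by indegree, and counts the fraction of (L, H) pairs in which the L node strictly outranks the H node. The sampling details (k targets drawn with replacement, no self-links, first half of the ids are H) and the name `inrank_error` are assumptions made for this example; it is a simulation, not the analytic difference-of-Poissons calculation.

```python
import random


def inrank_error(n, k, a, seed=0):
    """Estimate Pr(r_i > r_j | q_i < q_j) for InRank on one sampled graph."""
    rng = random.Random(seed)
    half = n // 2  # nodes 0 .. half-1 are H, the rest are L
    indegree = [0] * n
    for u in range(n):
        if u < half:
            weights = [(1 + a) if v < half else (1 - a) for v in range(n)]
        else:
            weights = [1.0] * n
        weights[u] = 0.0  # assumed: no self-links
        # Duplicate draws collapse into a single link.
        for v in set(rng.choices(range(n), weights=weights, k=k)):
            indegree[v] += 1
    # An error is a low-quality node with strictly larger indegree
    # than a high-quality node.
    errors = sum(
        1
        for i in range(half, n)
        for j in range(half)
        if indegree[i] > indegree[j]
    )
    return errors / (half * (n - half))


for a in (0.0, 0.5, 1.0):
    print("a =", a, "estimated Pr(error) =", inrank_error(200, 8, a))
```

Because only a single graph is sampled per value of a, the estimate is noisy, which mirrors the point that ranking here is inference from a single sample.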
Results
[Plot: Pr(error) as a function of a, for PageRank, InRank, and MLRank]
Results
• InRank is better than PageRank when the graph
is close to random, and PageRank is better when it is not.
(General Theorem)
• Differences can be significant!
• MLRank is significantly better.
Some Intuition
• Case a=0 (Sketch -- ignoring special cases)
• PageRank
– r_j’s are iid (in the limit)
– r_i = ε + (1 − ε) Σ_{j ∈ P(i)} r_j / |S(j)|
• InRank
– r_i = ε + (1 − ε) Σ_{j ∈ P(i)} 1
• Theorem: PageRank is more random.
• (But, also need to consider expected values)
Concluding Comments
• Reputation systems should be designed from
requirements and subject to formal validation.
– Ex: What problem does PageRank solve? How well
does it do it?
– Ex: Why is FlowRank better than PathRank? Is it?
When and why?
• Aside: fighting link spam
– Results show that most of the proposed methods can
be defeated!
– Perhaps they work so well because they are not
being used and spammers haven’t tried to defeat
them. Endogeneity is important!
Concluding Comments
• Reputation systems are important
and deserve formal, careful study!
– Axiomatic analyses.
– Econometric analyses.
• Lots of challenging open problems!