
Privacy risks of collaborative filtering
Yuval Madar, June 2012
Based on a paper by J. A. Calandrino, A. Kilzer, A. Narayanan, E. W. Felten & V. Shmatikov


Recommender systems suggest items (and other users) to a user's liking by deducing preferences from previous purchases.
Numerous examples
◦ Amazon, iTunes, CNN, Last.fm, Pandora, Netflix, YouTube, Hunch, Hulu, LibraryThing, IMDb and many others.



User to item – “You might also like…”
Item to item – Similar items list
User to user – Another customer with
common interests

Content-based filtering
◦ Based on a priori similarity between items; recommendations are derived from a user's own history.
◦ Does not pose a privacy threat.

Collaborative filtering
◦ Based on correlations between other users' purchases.
◦ Our attacks target this type of system.

Hybrid
◦ A system employing both filtering techniques.



The data the recommendation system uses is modeled as a matrix, where rows correspond to users and columns correspond to items.
Some auxiliary information on the target user is available (a subset of the target user's transaction history).
An attack is successful if it allows the attacker to learn transactions that are not part of its auxiliary information.
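As a concrete illustration, here is a minimal sketch of the matrix model and the attacker's goal; the user names, item names, and data below are hypothetical.

```python
import numpy as np

# Toy user-item matrix: rows = users, columns = items.
# A 1 means the user purchased (or rated) the item; all names are hypothetical.
users = ["alice", "bob", "carol", "target"]
items = ["book_a", "book_b", "book_c", "book_d"]
R = np.array([
    [1, 0, 1, 0],   # alice
    [0, 1, 1, 0],   # bob
    [1, 1, 0, 1],   # carol
    [1, 0, 1, 1],   # target
])

# Auxiliary information: a subset of the target user's known transactions.
aux = {"book_a", "book_c"}

# The attack succeeds if it lets the attacker infer a target transaction
# that is NOT already in the auxiliary set (here, "book_d").
target_row = R[users.index("target")]
unknown = {items[j] for j in np.flatnonzero(target_row)} - aux
print(unknown)   # {'book_d'} -- what the attacker hopes to learn
```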






Sources of auxiliary information:
Users' public ratings and comments on products
Shared transactions (via Facebook or other mediums)
Discussions on third-party sites
Favorite books in a Facebook profile
Non-online interactions (with friends, neighbors, coworkers, etc.)
Other sources…

Passive attack – related items lists

Input:
◦ a set of target items T and a set of auxiliary items A

Observe the related-items list of each item in A until an item in T appears or moves up.

If a target item appears in enough related-items lists at the same time, the attacker may infer it was bought by the target user (a sketch follows the notes below).

Note 1 – Scoring may be far more complex, since different items in A are correlated (books belonging to a single series, bundle discounts, etc.).

Note 2 – It is preferable that A consist of obscure and uncommon items, to improve the effect of the target user's choices on their related-items lists.
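A minimal sketch of this passive inference step, under the assumption that the attacker can periodically snapshot the related-items list of each auxiliary item; the function names and the threshold value are hypothetical, and real scoring (per Note 1) would be more involved.

```python
# Compare successive snapshots of the related-items lists of auxiliary items A
# and count, per target item in T, how many lists it newly appeared in or
# moved up in.

def delta_score(old_lists, new_lists, target_items):
    """old_lists / new_lists map auxiliary item -> ranked list of related items."""
    scores = {t: 0 for t in target_items}
    for aux_item, new_list in new_lists.items():
        old_list = old_lists.get(aux_item, [])
        for t in target_items:
            old_rank = old_list.index(t) if t in old_list else None
            new_rank = new_list.index(t) if t in new_list else None
            if new_rank is not None and (old_rank is None or new_rank < old_rank):
                scores[t] += 1          # t appeared, or moved up, in this list
    return scores

THRESHOLD = 3                            # hypothetical; tuned per site

def infer(old_lists, new_lists, target_items):
    """Emit an inference only if enough lists moved at the same time."""
    return [t for t, s in delta_score(old_lists, new_lists, target_items).items()
            if s >= THRESHOLD]
```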




On some sites, the covariance matrix describing the correlation between items on the site is exposed to the users (Hunch is one such website).
Similarly, the attacker watches for increases in the correlation between the auxiliary items and the target items.
Note 1 – Different matrix cells may be updated asynchronously.
Note 2 – Inference probability improves if the auxiliary items are user-unique (no other user bought all of the auxiliary items). This is more likely if some of them are unpopular, or if there are enough of them.
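A sketch of the covariance-based variant under the same assumptions; the exposed matrices are represented here as dictionaries keyed by item pairs, and the min_pairs threshold is a hypothetical parameter.

```python
# On sites that expose an item-item covariance (or correlation) matrix, the
# attacker watches the cells pairing each auxiliary item with each target item
# and flags simultaneous increases.

def covariance_inferences(cov_old, cov_new, aux_items, target_items, min_pairs=3):
    """cov_old / cov_new map (item_a, item_b) -> covariance value."""
    inferred = []
    for t in target_items:
        increased = sum(
            1 for a in aux_items
            if cov_new[(a, t)] > cov_old[(a, t)]
        )
        # If the correlation with enough auxiliary items rose in the same
        # update window, infer that the target user acquired item t.
        if increased >= min_pairs:
            inferred.append(t)
    return inferred
```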

System model
◦ For each user, the system finds the k users most similar to it and ranks items purchased by them by total number of sales.

Active attack
◦ Create k dummy (sybil) users, each buying all known auxiliary items.
◦ With high probability, the k dummy users and the target user will be clustered together (given an auxiliary item list of size logarithmic in the total number of users; in practice, 8 items were found to be enough for most sites).
◦ In that case, the recommendations to the dummy users will consist of transactions of the target user previously unknown to the attacker (sketched below).

Note – The attack is more feasible in a system where user interaction with items does not involve spending money.
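A toy sketch of such a k-NN system and the sybil attack against it; all data, names, and parameters are hypothetical, and items are ranked here simply by how many neighbours bought them, a simplification of ranking by total number of sales.

```python
import numpy as np

def knn_recommendations(R, user_idx, k):
    """Toy k-NN recommender: find the k most similar users (cosine similarity)
    and rank items they bought, excluding items the user already has."""
    norms = np.linalg.norm(R, axis=1) + 1e-9
    sims = (R @ R[user_idx]) / (norms * norms[user_idx])
    sims[user_idx] = -1                       # never pick the user itself
    neighbours = np.argsort(sims)[-k:]
    scores = R[neighbours].sum(axis=0) * (R[user_idx] == 0)
    return [j for j in np.argsort(scores)[::-1] if scores[j] > 0]

# Active attack sketch: the target bought the three auxiliary items plus one
# secret item (index 7); the attacker injects k sybils buying only the aux items.
aux_cols, secret_col, n_items, k = [0, 2, 5], 7, 8, 3
rng = np.random.default_rng(0)
crowd = (rng.random((20, n_items)) < 0.3).astype(float)    # background users
target = np.zeros(n_items); target[aux_cols + [secret_col]] = 1
sybil = np.zeros(n_items); sybil[aux_cols] = 1
R = np.vstack([crowd, target] + [sybil] * k)

# The sybils cluster with the target, so recommendations to a sybil tend to
# surface the target's unknown purchase.
print(knn_recommendations(R, user_idx=len(R) - 1, k=k))    # often includes 7
```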

The main parameters for evaluation of an inference attack are:
◦ Yield – how many inferences are produced.
◦ Accuracy – how likely each inference is to be correct.

Yield–accuracy tradeoff – stricter algorithms reject less probable inferences, raising accuracy at the cost of yield.
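A minimal sketch of how these two metrics can be computed for a batch of emitted inferences; the names and data structures are hypothetical.

```python
def evaluate(inferences, ground_truth):
    """inferences and ground_truth are sets of (user, item) pairs.
    Yield: how many inferences were produced.
    Accuracy: what fraction of them are correct."""
    n = len(inferences)
    accuracy = len(set(inferences) & set(ground_truth)) / n if n else 0.0
    return n, accuracy

# Raising the inference threshold usually rejects the least probable guesses
# first, lowering yield but raising accuracy (the yield-accuracy tradeoff).
```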

The paper further discusses specific attacks performed against:
◦ Hunch
◦ LibraryThing
◦ Last.fm
◦ Amazon
and measures the accuracy and yield of these attacks, arriving in some instances at impressive tradeoff figures (such as 70% accuracy at 100% yield on Hunch).



Provable privacy guarantees
◦ Not discussed in the paper.
◦ Achieved in other work for static recommendation databases.
◦ Remain an open problem for dynamic systems (which all real-world examples are).


Limited-length related-items lists – the first elements of such lists have low sensitivity to single purchases.
Factoring item popularity into update frequency – less popular items are more sensitive to single purchases; batching their purchases together decreases the information leak (a sketch follows this list).


Limit data access rate – prevents large-scale privacy attacks, though it lowers utility and may be circumvented using a botnet.
User opt-out – a privacy-conscious user may decide to opt out of recommender systems entirely (at a clear cost in utility).
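As an illustration of the popularity-based batching mitigation above, here is a minimal sketch; the thresholds and the update_related_lists callback are hypothetical.

```python
# Purchases of unpopular items are buffered and only folded into the public
# related-items data once a batch threshold is reached, so no single purchase
# is individually visible in the lists.

from collections import defaultdict

BATCH_SIZE = 20        # minimum purchases before an unpopular item's lists update
POPULAR_CUTOFF = 1000  # items with more total sales than this update immediately

pending = defaultdict(list)

def record_purchase(item_id, user_id, total_sales, update_related_lists):
    if total_sales >= POPULAR_CUTOFF:
        update_related_lists([(item_id, user_id)])     # low marginal leak
        return
    pending[item_id].append((item_id, user_id))
    if len(pending[item_id]) >= BATCH_SIZE:
        update_related_lists(pending.pop(item_id))     # release as one batch
```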



A passive attack on recommender systems, using auxiliary information about a target user's purchases, allows the attacker to infer undisclosed private transactions.
Increased user awareness is required.
Several methods were suggested to decrease the information leaked by these systems.
Questions?