Top-N Recommendation Algorithm Based on Item-Graph Allen, Zhenjiang LIN CSE, CUHK

Top-N Recommendation Algorithm
Based on Item-Graph
Allen, Zhenjiang LIN
CSE, CUHK
June 7, 2007
Outline

1. Top-N Recommendation Problem

2. Top-N Recommendation Algorithm

3. Item-Graph Model and GCP-based Method

   Item-Graph Model

   Generalized Conditional Probability (GCP)-based Recommendation Algorithm

4. Preliminary Experimental Results

5. Conclusion and Future Work
1. Top-N Recommendation Problem

The Top-N Recommendation Problem

Given the preference information of users, recommend a set of N
items that a certain user might be interested in, based on the
items he has already selected.
E-commerce system example: Amazon.com, customers vs. products.

User-Item matrix (the New User row is the active user; its known entries form the basket):

            Item 1   Item 2   Item 3   …   Item m
User 1         1        0        1     …      0
User 2         1        1        0     …      0
User n         0        1        0     …      1
New User       1        ?        1     …      ?
Example: Amazon.com
[Figure: Amazon.com page showing the active user, the user's basket, and the resulting recommendations]

Challenges in E-commerce Systems

Huge amounts of data: millions of users and/or items.
Results must be returned in real time.
Preference information for new users is limited.
Users' preference information is volatile.

Contributions

Propose the Item-Graph model:
  simple and incremental;
  reflects the relationships among items.
Develop the Generalized Conditional Probability-based top-N
recommendation algorithm:
  item-centric;
  based on the Item-Graph model.
2. Top-N Recommendation Algorithm

Two main paradigms

Content-based: recommend items based on the content
(textual information) of items.
  Fab system [Balabanovic97], Syskill & Webert system [Pazzani97].
Collaborative Filtering (CF): recommend items by
collecting taste information from other users.
  Collaboration between users (link information).
  More popular than content-based recommendation, since in many
  domains (such as music, restaurants) it is hard to extract useful
  features from items.
  Tapestry system [Goldberg92], Video Recommender [Hill95],
  Ringo [Shardanand95], GroupLens [Konstan97],
  Jester system [Goldberg01], Amazon [Linden03].

CF algorithms classified by how they use the data

Memory-based: make recommendations based on the entire
collection of user preferences.
  No pre-computation is needed, but these methods suffer from serious scalability problems.
  E.g., Correlation-based [Resnick94], Cosine-based [Breese98].
Model-based: use the collection of user preferences to learn
a model, which is then used to make recommendations.
  The model is built off-line, so these methods are more scalable.
  E.g., Cluster models [Ungar98], Bayesian network model [Breese98],
  Association Rule Mining approach [Lin00].

CF algorithms classified by the objects they compare

User-centric: look for similar (like-minded) users first and
then make recommendations.
  Similarity between users is relatively dynamic.
  Pre-computing user neighborhoods may lead to poor predictions.
Item-centric: look for similar (or related) items first and
then make recommendations.
  Similarity between items is relatively static.
  This enables pre-computing of item-item similarity and is therefore more scalable.

The aim of our work

A model-based, item-centric CF top-N recommendation algorithm.

Notations

Item set I = {I1, I2, …, Im}.
User set U = {U1, U2, …, Un}.
User-Item matrix D = (Dn,m).
Basket of the active user B ⊆ I.
Similarity score of x and y: sim(x, y).

Formal definition of top-N recommendation problem

Given a user-item matrix D and a set of items B that have
been purchased by the active user, identify an ordered set
of items X such that |X| ≤ N and X ∩ B = ∅.

Two classical item-item similarity measures

Cosine-based (symmetric):
sim(I_i, I_j) = \cos(D_{*,i}, D_{*,j})    (1)

Conditional Probability (CP)-based (asymmetric):
sim(I_i, I_j) = P(I_j \mid I_i) \approx Freq(I_i I_j) / Freq(I_i)    (2)

Freq(X): the number of customers who have purchased the
item set X.

The ranking score for item x:
RS(x) = \sum_{b \in B} sim(b, x)    (3)
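
The three formulas above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy matrix `D`, the function names, and the tie-breaking rule (lower item index first) are hypothetical and not taken from the slides.

```python
import math

# Toy user-item matrix (rows: users, columns: items); values are illustrative only.
D = [
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [0, 1, 0, 1],
    [1, 1, 1, 0],
]

def column(D, j):
    """Return column j of the matrix, i.e. the purchase vector of item j."""
    return [row[j] for row in D]

def freq(D, items):
    """Freq(X): number of users who purchased every item in `items`."""
    return sum(all(row[j] for j in items) for row in D)

def sim_cos(D, i, j):
    """Equation (1): cosine of the two item columns."""
    a, b = column(D, i), column(D, j)
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def sim_cp(D, i, j):
    """Equation (2): P(I_j | I_i) approximated by Freq(I_i I_j) / Freq(I_i)."""
    f_i = freq(D, [i])
    return freq(D, [i, j]) / f_i if f_i else 0.0

def top_n(D, basket, n, sim):
    """Equation (3): rank items outside the basket by summed similarity."""
    m = len(D[0])
    scores = {x: sum(sim(D, b, x) for b in basket)
              for x in range(m) if x not in basket}
    # Assumed tie-break: lower item index first.
    ranked = sorted(scores, key=lambda x: (-scores[x], x))
    return ranked[:n]

print(top_n(D, {0}, 2, sim_cp))
print(top_n(D, {0}, 2, sim_cos))
```

On this toy matrix the two measures order the candidates differently for the basket {item 0}, which is the kind of disagreement the experiments later compare.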
3. Item-Graph Model & GCP-based Method

Intuitions behind the Item-Graph

The similarity between two items is proportional to the number of
times they are co-purchased.
The similarity of item pairs is transitive.
E.g., [Figure: items a, b, and c connected by edges with weights 1 and 2]

Definition of the Item-Graph

Given a dataset D = (Dn,m), the Item-Graph is defined as a
weighted, undirected graph G(V, E, W), where
  V is the item set I.
  An edge (x, y) ∈ E if and only if items x and y have been co-purchased.
  The weight of edge (x, y) is the number of co-purchases of items x and y.

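Updating the Item-Graph is easy

Adding a new user's preference information T to the graph needs
O(|T|^2) operations, including adding edges and/or increasing
the weights of edges.
E.g., [Figure: adding the basket (a, b, c) to a graph with edges a-b (weight 2) and b-c (weight 1) gives edges a-b (weight 3), b-c (weight 2), and a-c (weight 1)]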
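
The incremental update can be sketched as follows. The representation is an assumption made for illustration: the Item-Graph is stored as a `Counter` keyed by sorted item pairs, and the names `add_basket` and `build_item_graph` are hypothetical.

```python
from collections import Counter
from itertools import combinations

def add_basket(graph, basket):
    """Add one user's basket T to the Item-Graph with O(|T|^2) edge updates."""
    for x, y in combinations(sorted(set(basket)), 2):
        # A missing edge starts at 0, so this either creates it with
        # weight 1 or increases the weight of an existing edge.
        graph[(x, y)] += 1
    return graph

def build_item_graph(baskets):
    """Build the graph from scratch by adding baskets one at a time."""
    graph = Counter()
    for basket in baskets:
        add_basket(graph, basket)
    return graph

# The (a, b, c) example: start from a-b = 2 and b-c = 1, then add one basket.
graph = Counter({("a", "b"): 2, ("b", "c"): 1})
add_basket(graph, ["a", "b", "c"])
print(dict(graph))
```

Because building the graph is just repeated updating, the same routine serves both the off-line construction and the on-line update.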

Potential direct applications of the Item-Graph

Clustering the items.
Measuring item-item similarity.
Measuring the importance of items.

Ideas in Generalized Conditional Probability-based method

According to the definition of the top-N recommendation problem,
for any x in I-B, we just need to compute the “basket-based”
conditional probability P(x|B) = Freq(xB) / Freq(B). However,
  Freq(xB) or Freq(B) may not exist, or
  Freq(xB) or Freq(B) may be too small to make much sense.
The CP-based method considers the sum of “1-item”-based
conditional probabilities P(x|y) instead, where x∈I-B, y∈B.
However, the “multi-item”-based conditional probabilities may
also contribute to the recommendation.
E.g., suppose the ranking scores of x and y computed by the
CP-based method are equal, and we also know P(x|B) > P(y|B).
Which one should be ranked higher, x or y?

The Generalized Conditional Probability (GCP)-based
recommendation algorithm

The ranking score of item x is defined as the sum of all possible
“multi-item”-based conditional probabilities, that is,
GCP(x \mid B) = \sum_{S \subseteq B} P(x \mid S) \approx \sum_{S \subseteq B} Freq(xS) / Freq(S)    (4)

However, the number of subsets of B is 2^{|B|}, so evaluating (4) exactly is infeasible for large baskets.

Use GCP_d(x|B) instead (d = 2 in the following experiments):
GCP_d(x \mid B) = \sum_{S \subseteq B, |S| \le d} P(x \mid S)    (5)

Freq(xS) and Freq(S) can be extracted approximately from the Item-Graph.
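
A minimal sketch of equation (5) follows. To keep it self-contained it assumes exact frequencies counted directly from a list of baskets rather than values extracted from the Item-Graph. It also assumes that S ranges over non-empty subsets and that a subset never purchased together contributes nothing. The names `freq`, `gcp_d`, and the toy `baskets` are hypothetical.

```python
from itertools import combinations

def freq(baskets, items):
    """Exact Freq(X): number of baskets containing every item in X."""
    items = set(items)
    return sum(items <= basket for basket in baskets)

def gcp_d(baskets, x, B, d=2):
    """Equation (5): sum of P(x|S) over non-empty subsets S of B with |S| <= d."""
    score = 0.0
    for size in range(1, d + 1):
        for S in combinations(sorted(B), size):
            f_s = freq(baskets, S)
            if f_s:  # assumption: subsets with zero frequency contribute 0
                score += freq(baskets, S + (x,)) / f_s
    return score

# Toy data (illustrative only): c and d tie under the 1-item (CP) score,
# but only c is ever bought together with both a and b.
baskets = [
    {"a", "b", "c"}, {"a", "b", "c"}, {"a", "d"}, {"b", "d"},
    {"a", "b"}, {"a", "d"}, {"b", "d"},
]
B = {"a", "b"}
for item in ("c", "d"):
    print(item, gcp_d(baskets, item, B, d=1), gcp_d(baskets, item, B, d=2))
```

With d = 1 the score reduces to the CP-based ranking score and the two candidates tie; with d = 2 the pair term P(x|ab) separates them, which is the situation raised by the "which one should be ranked higher" question.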

Extracting Freq(A) from Item-Graph approximately

For an item set A, obtaining the exact Freq(A) from the
Item-Graph may not be possible.

Extract an approximate Freq(A) from the Item-Graph instead:

Find the complete sub-graph of A (denoted by CSG(A)) in the
Item-Graph; running time O(|A|^2).

Freq(A) ≈ the minimal edge weight in CSG(A).

E.g., in a graph with edges a-b (weight 3), b-c (weight 2), and a-c (weight 1):

for A = {a,b}, Freq(A) ≈ 3.

for B = {a,b,c}, Freq(B) ≈ 1.

P(c|ab) ≈ Freq(abc) / Freq(ab) ≈ 1 / 3.
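
The approximation can be sketched as below, assuming the Item-Graph is stored as a dictionary keyed by sorted item pairs. Returning 0 when some pair was never co-purchased, and rejecting single-item sets (whose counts are not stored on edges), are choices made for this illustration; `approx_freq` is a hypothetical name.

```python
from itertools import combinations

def approx_freq(graph, items):
    """Approximate Freq(A) as the minimal edge weight in CSG(A).

    Checks all |A|(|A|-1)/2 pairs, i.e. O(|A|^2) edge lookups.
    Returns 0 if some pair in A has no edge (no complete sub-graph).
    """
    items = sorted(set(items))
    if len(items) < 2:
        raise ValueError("need at least two items: edges only store pair counts")
    return min(graph.get(pair, 0) for pair in combinations(items, 2))

# The example graph: a-b = 3, b-c = 2, a-c = 1.
graph = {("a", "b"): 3, ("b", "c"): 2, ("a", "c"): 1}
p_c_given_ab = approx_freq(graph, ["a", "b", "c"]) / approx_freq(graph, ["a", "b"])
print(approx_freq(graph, ["a", "b"]), approx_freq(graph, ["a", "b", "c"]), p_c_given_ab)
```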
4. Preliminary Experimental Results

Dataset

MovieLens (http://www.grouplens.org/data)
  A web-based movie recommender system.
  Contains multi-valued ratings that indicate how much each user
  liked a particular movie.
  Each user has rated at least 20 movies.
We treat the ratings as an indication of whether the users have seen the
movies (nonzero) or not (zero).

Table 1: The characteristics of the MovieLens dataset
# of Users   # of Items   Density¹   Average Basket Size
943          1682         6.31%      106.04
¹ Density: the percentage of nonzero entries in the user-item matrix.
4. Preliminary Experimental Results-1

Evaluation Design

Split the dataset into a training set and a test set by
  randomly selecting one rated movie of each user to be part of the test set, and
  using the remaining rated movies for training.
Compare the Cosine (COS)-based, CP-based, and GCP-based methods; results are averaged over 10 runs.

Evaluation Metrics

Hit-Rate (HR):
HR = (# of hits) / n    (6)
Average Reciprocal Hit-Rate (ARHR):
ARHR = \frac{1}{n} \sum_{i=1}^{h} \frac{1}{p_i}    (7)

n: the number of users, each of whom contributes one test item.
# of hits: the number of items in the test set that were also in the top-N lists.
h is the number of hits, which occurred at positions p_1, p_2, …, p_h within the
top-N lists (i.e., 1 ≤ p_i ≤ N).
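
Both metrics can be computed in one pass over the users. The sketch below assumes one held-out test item per user, so that n is the number of users; the function name and the toy recommendation lists are hypothetical.

```python
def hit_rate_and_arhr(top_n_lists, held_out):
    """Compute HR (6) and ARHR (7).

    top_n_lists[u]: ranked top-N recommendations for user u.
    held_out[u]:    the single test item withheld from user u.
    """
    n = len(held_out)
    hits = 0
    reciprocal_sum = 0.0
    for user, item in held_out.items():
        recs = top_n_lists.get(user, [])
        if item in recs:
            hits += 1
            reciprocal_sum += 1.0 / (recs.index(item) + 1)  # positions are 1-based
    return hits / n, reciprocal_sum / n

# Toy lists (illustrative only): hits at positions 1, 2 and 3, plus one miss.
top_n_lists = {
    "u1": ["x", "y", "z"],
    "u2": ["y", "x", "z"],
    "u3": ["z", "y", "x"],
    "u4": ["x", "y", "z"],
}
held_out = {"u1": "x", "u2": "x", "u3": "w", "u4": "z"}
hr, arhr = hit_rate_and_arhr(top_n_lists, held_out)
print(hr, arhr)
```

HR treats every hit equally, while ARHR weights a hit by the reciprocal of its position, so hits near the top of the list count more.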

Performance of Top-N Recommendation Algorithms
[Figure] HR (left): x-axis: top-N items; y-axis: hit-rate of all users.
[Figure] ARHR (right): x-axis: top-N items; y-axis: average reciprocal hit-rate of all users.
(For the GCP-based method, d = 2.)
4. Preliminary Experimental Results-2

Testing the Parameter d in GCP Method

Testing the effect of d (d = 1, 2, 3).

Evaluation: Online Shopping Simulation

Randomly select part of the user records to be the training set;
use the remaining user records for testing.

STEP 0: Construct the item-graph based on the training set.

STEP 1: For each user in the test set,
  randomly move one item out of the user’s basket and make
  recommendations based on the remaining items in the basket;
  compute the position of this item in the recommendation list;
  update the item-graph.

STEP 2: Compute the HR and ARHR metrics.

Performance of Top-N Recommendation Algorithms
[Figure] HR (left): x-axis: top-N items; y-axis: hit-rate of all users.
[Figure] ARHR (right): x-axis: top-N items; y-axis: average reciprocal hit-rate of all users.
5. Conclusion and Future Work

Conclusion

Top-N recommendation problem and item-centric algorithms
  Cosine-based, conditional probability-based
Item-Graph model
  Visualizes the relationships among items.
  Easy to update.
Generalized Conditional Probability-based top-N recommendation algorithm
  Item-centric and based on the Item-Graph model

Future Work

Clustering items and measuring item-item similarities based on the Item-Graph model.
Speeding up the GCP method.
References

[Balabanovic97] M. Balabanovic and Y. Shoham. Fab: Content-based, Collaborative
Recommendation. Commun. ACM, 40(3):66-72, 1997.

[Breese98] J. S. Breese, D. Heckerman, and C. Kadie. Empirical Analysis of
Predictive Algorithms for Collaborative Filtering. In Proceedings of the 14th
Conference on Uncertainty in Artificial Intelligence (UAI-98), pages 43-52, San
Francisco, 1998.

[Deshpande04] M. Deshpande and G. Karypis. Item-based Top-N Recommendation
Algorithms. ACM Trans. Inf. Syst., 22(1):143-177, 2004.

[Lin00] W. Lin. Association Rule Mining for Collaborative Recommender Systems.
Thesis submitted for the Degree of M.S. in Computer Science.

[Linden03] G. Linden, B. Smith and J. York. Amazon.com Recommendations: Item-to-Item
Collaborative Filtering. IEEE Internet Computing, 7(1):76-80, 2003.

[Resnick94] P. Resnick, N. Iacovou, M. Suchak, P. Bergstrom and J. Riedl.
GroupLens: An Open Architecture for Collaborative Filtering of Netnews. Proc.
Computer Supported Cooperative Work Conf., pages 175-186, 1994.