LCARS: A Location-Content-Aware Recommender System

Hongzhi Yin†, Yizhou Sun‡, Bin Cui†, Zhiting Hu†, Ling Chen
†Peking University
‡Northeastern University
University of Technology, Sydney
1
Outline
■ Introduction
  • Background
  • Challenges
■ Our Solution – LCARS
  • Offline Modeling - LCA-LDA
  • Online Recommendation – TA algorithm
■ Experiments
  • Experimental Setup
  • Experimental Results
■ Conclusions
2
Background
■ Location-based Social Networks
  • Loopt
  • Foursquare
  • Facebook Places
  • Users share photos, comments or check-ins at a location
  • Expanded rapidly, e.g., Foursquare gets over 3 million check-ins every day
4
Background
■ Event-based Social Networks (e.g., Meetup.com)
6
Problem Definition
■ We aim to mine useful knowledge from the user activity history data in LBSNs and EBSNs to answer two typical questions in our daily life:
  • If we want to visit venues in a city such as Beijing, where should we go?
  • If we want to attend local events such as dramas and exhibitions in a city, which events should we attend?
  • For simplicity, we propose the notion of spatial items to denote both venues and events in a unified way, so that we can define our problem as follows: given a querying user $u$ with a querying city $l_u$, find $k$ interesting spatial items within $l_u$ that match the preference of $u$.
8
Challenge(1/5)
■ Spatial Item Recommendations in LBSN and EBSN
■ Existing Solutions
  • Based on item/user collaborative filtering
  • Similar users give similar ratings to similar items
  • Latent factor models
[Diagram: users visit some spatial items → user activity histories → build recommendation models (similar users, similar items) based on the model of co-rating and co-visit → recommendation for user + querying city]
So, what is the PROBLEM here? Why?
Mao Ye, Peifeng Yin, Wang-Chien Lee: “Location recommendation for location-based social networks.” GIS2010
Justin J. Levandoski, Mohamed Sarwat, Ahmed Eldawy, and Mohamed F. Mokbel: “LARS: A Location-Aware Recommender System.” ICDE2012
9
Challenge(2/5)
■ User-item rating/visiting matrix
  • Millions of locations around the world
  • A user visits ~100 spatial items
  • User activity histories are locally clustered
  • Recommendation queries target an area (a very specific subset)
[Figure: user-item matrix with rows U0 … Ui, Uj … Un and columns V1, V2, V3 … Vm-2, Vm-1, Vm; column groups are labeled Los Angeles and New York City]
A. Noulas, S. Scellato, C. Mascolo and M. Pontil: “An Empirical Study of Geographic User Activity Patterns in Foursquare.” ICWSM 2011
10
Challenge(3/5)
■ Users' activities are very limited in distant locations
  • May NOT get any recommendations in some areas
  • Things can get worse in NEW areas (small cities and abroad), which is where you need recommendations the most
11
Challenge(4/5)
User activity histories are locally clustered
[Figure: two clusters of users (U1–U8) and spatial items (V1–V6), one in Los Angeles and one in New York City, separated by a gap]
New City Problem: When U3 travels to New York City, which is new to her because she has no activity history there, how can we recommend spatial items to her? In other words, how can we link the users on one side to the items on the other side?
Both user-based and item-based CF methods would fail in this scenario.
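To make the failure concrete, here is a minimal sketch in Python. It assumes a toy check-in history in which the Los Angeles user and the New York City users have visited disjoint sets of items; the specific visits, the `histories` dictionary and both helper functions are invented for illustration. Because the visit sets do not overlap, the cosine similarity between U3 and every New York user is zero, so user-based CF gives every New York item a score of zero and cannot rank them.

```python
import math

# Hypothetical toy histories: the visits below are invented for illustration.
histories = {
    "U3": {"V1", "V3"},   # Los Angeles user
    "U5": {"V4", "V5"},   # New York City users
    "U6": {"V5", "V6"},
}
nyc_items = ["V4", "V5", "V6"]

def cosine(a, b):
    """Cosine similarity between two binary visit sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))

def user_based_cf_scores(user, candidates):
    """Score each candidate by the summed similarity of other users who visited it."""
    scores = {}
    for item in candidates:
        scores[item] = sum(
            cosine(histories[user], visited)
            for other, visited in histories.items()
            if other != user and item in visited
        )
    return scores

cf_scores = user_based_cf_scores("U3", nyc_items)
print(cf_scores)  # every New York item gets the same zero score for U3
```

Item-based CF hits the same wall from the other direction: no New York item has been co-visited with anything in U3's history.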
12
Challenge(5/5)
■ Existing latent factor models also fail to alleviate the new city problem. When we use existing topic models to analyze user activity history data, spatial items in the discovered topics are clustered by their locations.
  • So the topics describe the users' spatial areas of activity rather than their interest-related features (e.g., categories and genres of spatial items) such as concert, film and exhibition.
Table 1: Topics discovered by LDA in an event-based social network
13
Framework of LCARS
15
Our Main Ideas (1/3)
For spatial item recommendation, we need to consider (1) the querying user’s interest and (2) the local preference of the querying city, i.e., the local word-of-mouth opinion for a spatial item in the querying city.
[Diagram: (1) user personal interests/preferences and (2) local preference both feed into the recommender system]
16
Our Main Ideas (2/3)
■ Main idea #1 (user personal interests/preferences): Identify user interest using semantic information from the user activity history
■ Main idea #2 (local preference in a querying city): Discover local preference in a specific querying city
■ Main idea #3: Combine user interest and local preference for recommendation in a unified way
17
Our Main Ideas (3/3)
Content Words of Items
Such as tags and categories (e.g., movie, shopping, nightlife)
[Figure: users U1–U4 and items V1–V3 in Los Angeles; users U5–U8 and items V4–V6 in New York City]
The users on one side and the items on the other side can be linked together by the item contents.
18
Offline Modeling LCA-LDA Model
■ Some basic definitions
  • User Profile: For each user $u$ in the dataset, we create a user profile $D_u$, which is a set of triples $\langle v, l_v, c_v \rangle$. $l_v$ denotes the location of item $v$ at the region level (e.g., city). $c_v$ is a content word, such as a tag or a category word, associated with $v$.
  • Topic: Each topic $z$ in our work has two topic models, $\phi_z$ and $\phi'_z$. The former is a probability distribution over items (item IDs) and the latter is a probability distribution over content words.
  • User Interest: The intrinsic interest of user $u$ is represented by $\theta_u$, a probability distribution over topics.
  • Local Preference: The local preference in region $l$ is represented by $\theta_l$, a probability distribution over topics.
19
The Generative Process of LCA-LDA
We use the LCA-LDA model to simulate the user's decision-making process for visiting behaviors.
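The generative story can be sketched in Python as follows. This is a minimal sketch under a stated assumption: each visit first flips a coin with bias $\lambda_u$ (the mixing weight) to choose between the user's own interest $\theta_u$ and the local preference $\theta_l$ of the visited city, then draws a topic $z$, an item from $\phi_z$ and a content word from $\phi'_z$. The function names and all parameter values are invented for illustration and are not taken from the slides.

```python
import random

def sample(dist, rng):
    """Draw one key from a dict that maps outcomes to probabilities."""
    outcomes = list(dist)
    return rng.choices(outcomes, weights=[dist[o] for o in outcomes], k=1)[0]

def generate_visit(lambda_u, theta_u, theta_l, phi, phi_prime, rng):
    """Generate one (item, content word, topic) triple for a user visiting a city."""
    # Assumed coin flip: follow personal interest (prob. lambda_u) or local preference.
    use_personal = rng.random() < lambda_u
    z = sample(theta_u if use_personal else theta_l, rng)
    item = sample(phi[z], rng)         # phi[z]: distribution over item IDs
    word = sample(phi_prime[z], rng)   # phi_prime[z]: distribution over content words
    return item, word, z

# Hypothetical parameters, invented for illustration.
theta_u = {"music": 0.9, "food": 0.1}
theta_l = {"music": 0.2, "food": 0.8}
phi = {"music": {"V1": 0.7, "V4": 0.3}, "food": {"V2": 0.5, "V5": 0.5}}
phi_prime = {"music": {"concert": 1.0}, "food": {"restaurant": 0.6, "cafe": 0.4}}

rng = random.Random(0)
visits = [generate_visit(0.7, theta_u, theta_l, phi, phi_prime, rng) for _ in range(5)]
```

Because the item and its content word are drawn from the same topic, topics learned this way are tied to content words rather than only to co-visited item IDs.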
20
Online Recommendation
■ Once we have inferred the model parameters of LCA-LDA in the offline modeling phase (user interest $\theta_u$, local preference $\theta_l$, topics $\phi_z$ and $\phi'_z$, and mixing weights $\lambda_u$), the online recommendation part computes a ranking score for each spatial item $v$ within the querying region $l_u$, and then returns the top-$k$ ranked spatial items as the recommendations.
■ The ranking score of $v$ w.r.t. query $(u, l_u)$ is built from two quantities:
  • The preference weight of query $(u, l_u)$ on topic $z$
  • The score of item $v$ on topic $z$
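One way to write this down is sketched below. It assumes the ranking score aggregates over topics in the same way as the threshold value $T_a$ of the threshold-based algorithm, and that the preference weight is the $\lambda_u$-weighted mixture of user interest and local preference. The symbol $S$, the component notation $\theta_{u,z}$ and $\theta_{l_u,z}$, and the exact form of $W$ are assumptions made for illustration; the definition of $F$ is left open.

```latex
S(u, l_u, v) = \sum_{z} W(u, l_u, z)\, F(l_u, v, z),
\qquad
W(u, l_u, z) = \lambda_u\, \theta_{u,z} + (1 - \lambda_u)\, \theta_{l_u,z}
```

Under these assumptions $W$ depends only on the query and $F$ only on the item and region, so the per-topic item scores can be pre-computed offline.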
22
Naïve online algorithm
■ Given a query $(u, l_u)$
■ Compute the ranking scores for all items within the querying region $l_u$
■ Find the best one, then the second best one, …, the $k$-th best one
■ Good for small-scale problems
■ Still not feasible at large scale, e.g., when there are millions of items in the dataset
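The naïve procedure can be sketched in Python as below. This is a minimal sketch, assuming the ranking score is the topic-wise weighted sum of per-topic item scores; the dictionaries `W` and `F`, their values and the function name are hypothetical.

```python
import heapq

def naive_top_k(W, F, k):
    """Score every item in the querying region and return the k best.

    W: dict topic -> preference weight of the query (u, l_u) on that topic.
    F: dict item -> dict topic -> score of the item on that topic.
    """
    scores = {
        item: sum(W[z] * topic_scores.get(z, 0.0) for z in W)
        for item, topic_scores in F.items()
    }
    # Every item is scored, so the cost grows with the size of the region.
    return heapq.nlargest(k, scores.items(), key=lambda pair: pair[1])

# Hypothetical weights and topic scores, invented for illustration.
W = {"music": 0.75, "food": 0.25}
F = {
    "V4": {"music": 1.0, "food": 0.0},
    "V5": {"music": 0.5, "food": 0.5},
    "V6": {"music": 0.0, "food": 1.0},
}
top2 = naive_top_k(W, F, k=2)
```

Each query costs one pass over all items in the region times the number of topics, which is what becomes infeasible with millions of items.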
23
Threshold-based Algorithm
■ For each region $l$, we pre-compute $K$ sorted lists of spatial items. In each list $L_z$, the items are sorted by their score on topic $z$, i.e., $F(l, v, z)$.
■ Given a query $(u, l_u)$, we sequentially access the items in each sorted list and compute their ranking scores.
■ For each list $L_z$, let $v_z$ be the last item examined under sorted access. Define the threshold value $T_a$ as follows:
  $$T_a = \sum_{z} W(u, l_u, z)\, F(l_u, v_z, z)$$
  As soon as at least $k$ items have been examined whose ranking scores are equal to or larger than the threshold value, halt.
  • Let $L$ be a list containing the $k$ examined items with the highest ranking scores. Return $L$ to the querying user.
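The steps above can be sketched in Python as follows. This is a minimal sketch of a threshold-style top-k search, not the authors' implementation: it assumes non-negative weights `W`, assumes every item has a score on every topic, and builds the sorted lists inside the function, whereas the slides pre-compute them offline. All names and values are hypothetical.

```python
import heapq

def ta_top_k(W, F, k):
    """Threshold-based top-k over per-topic sorted lists.

    W: dict topic -> non-negative preference weight of the query on that topic.
    F: dict item -> dict topic -> score (every item has a score on every topic).
    Returns (top-k list of (score, item) pairs, number of items fully scored).
    """
    topics = list(W)
    # One list per topic, sorted by descending topic score (offline in LCARS).
    sorted_lists = {
        z: sorted(F, key=lambda v, z=z: F[v][z], reverse=True) for z in topics
    }
    seen = set()
    top = []  # min-heap holding the best k (score, item) pairs seen so far
    for depth in range(len(F)):
        for z in topics:
            v = sorted_lists[z][depth]  # sorted access
            if v not in seen:
                seen.add(v)
                score = sum(W[t] * F[v][t] for t in topics)  # full ranking score
                heapq.heappush(top, (score, v))
                if len(top) > k:
                    heapq.heappop(top)
        # Threshold: the best score any unseen item could still reach.
        threshold = sum(W[z] * F[sorted_lists[z][depth]][z] for z in topics)
        if len(top) >= k and top[0][0] >= threshold:
            break
    return sorted(top, reverse=True), len(seen)

# Hypothetical weights and topic scores, invented for illustration.
W = {"music": 0.5, "food": 0.5}
F = {
    "V1": {"music": 1.0, "food": 0.5},
    "V2": {"music": 0.5, "food": 1.0},
    "V3": {"music": 0.25, "food": 0.25},
    "V4": {"music": 0.0, "food": 0.0},
}
result, examined = ta_top_k(W, F, k=2)
```

On this toy input the search halts after two rounds of sorted access, having scored only V1 and V2: the threshold has dropped to 0.5, which is below both of their ranking scores (0.75), so V3 and V4 are never scored.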
24
Nice Properties of TA
■ The TA algorithm is able to correctly find the top-$k$ items by examining the minimum number of items, since the ranking function we define is strictly monotone.
■ The threshold value $T_a$ is obtained by aggregating, for each list $L_z$, the score $F(l_u, v_z, z)$ of the last item seen; because the list is sorted, this is the largest topic score that any unexamined item in that list can have. Consequently, $T_a$ is the maximum possible ranking score that can be achieved by the remaining unexamined items. Hence, if the smallest ranking score of the $k$ examined items is no less than the threshold score, the algorithm can terminate immediately because no remaining item will have a higher ranking score than the $k$ items found.
25
Experimental Data Sets
■ Data Sets
  • DoubanEvent: DoubanEvent is China’s largest event-based social networking site, where users can publish and participate in social events. This data set consists of 100,000 users, 300,000 events and 3,500,000 check-ins.
  • Foursquare: This data set contains 11,326 users, 182,968 venues and 1,385,223 check-ins.
[Figure: User and Event Distributions over Cities in DoubanEvent]
27
Evaluation Method (1/2)
■ We design two real-world settings to evaluate the recommendation effectiveness of our LCA-LDA model:
  • Querying cities are new cities to querying users;
  • Querying cities are home cities to querying users.
■ We then divide a user’s activity history into a test set and a training set. We adopt two different dividing strategies with respect to the two settings.
  • For the first setting, we randomly select a visited non-home city as the new city, mark off all spatial items visited by the user in that city as the test set, and use the rest of the user's activity history in other cities as the training set.
  • For the second setting, we randomly select 20% of the spatial items visited by the user in their home city as the test set, and use the rest of the user's activity history as the training set.
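The two dividing strategies can be sketched in Python as below. This is a minimal sketch, assuming a user's history is a list of (item, city) pairs with each item appearing once and that the home city is already known; how the home city is determined is not stated on the slide. The function names and the toy history are hypothetical.

```python
import random

def split_new_city(history, home_city, rng):
    """Setting 1: hold out all items visited in one randomly chosen non-home city."""
    other_cities = sorted({city for _, city in history if city != home_city})
    new_city = rng.choice(other_cities)
    test = [(v, c) for v, c in history if c == new_city]
    train = [(v, c) for v, c in history if c != new_city]
    return train, test, new_city

def split_home_city(history, home_city, rng, ratio=0.2):
    """Setting 2: hold out a random 20% of the items visited in the home city."""
    home = [(v, c) for v, c in history if c == home_city]
    test = rng.sample(home, round(ratio * len(home)))
    train = [pair for pair in history if pair not in test]
    return train, test

# Hypothetical history, invented for illustration.
history = [
    ("V1", "Beijing"), ("V2", "Beijing"), ("V3", "Beijing"),
    ("V4", "Beijing"), ("V5", "Beijing"),
    ("V6", "Shanghai"), ("V7", "Shanghai"),
]
rng = random.Random(0)
train1, test1, new_city = split_new_city(history, "Beijing", rng)
train2, test2 = split_home_city(history, "Beijing", rng)
```

In the first setting the training set contains nothing from the held-out city, which is exactly the new city situation the model is meant to handle.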
28
Evaluation Method (2/2)
■ For each test case $(u, v, l_v)$ in the test set:
  • Randomly select 1000 additional items located at $l_v$ and unrated by user $u$.
  • Compute the ranking score for the test item $v$ as well as for the additional 1000 spatial items.
  • Form a ranked list by ordering the 1001 items according to their ranking scores. Let $p$ denote the rank of the item $v$ within this list (the best result is $p = 0$).
  • Form a top-$k$ recommendation list by picking the $k$ top-ranked items from the list. If $p < k$ we have a hit; otherwise we have a miss.
■ For any single test case
  • Recall for a single test case is either 0 (miss) or 1 (hit)
  • The overall recall is defined by averaging over all test cases
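The protocol above can be sketched in Python as follows. This is a minimal sketch with several assumptions: `score` is any callable that returns a ranking score for an item, ties are broken in favor of the test item, and the number of sampled items is capped at the size of the candidate pool so that the toy example runs. The function names and toy scores are hypothetical.

```python
import random

def hit_at_k(score, test_item, candidates, k, rng, n_negatives=1000):
    """Return 1 if the test item ranks in the top k among sampled unrated items."""
    negatives = rng.sample(candidates, min(n_negatives, len(candidates)))
    test_score = score(test_item)
    # p = number of sampled items that outrank the test item (best result: p = 0).
    p = sum(1 for v in negatives if score(v) > test_score)
    return 1 if p < k else 0

def recall_at_k(score, test_cases, k, rng):
    """Average the per-case hits; test_cases is a list of (test_item, candidates)."""
    hits = [hit_at_k(score, v, cands, k, rng) for v, cands in test_cases]
    return sum(hits) / len(hits)

# Hypothetical ranking scores, invented for illustration.
toy_scores = {"a": 0.9, "b": 0.1, "n1": 0.5, "n2": 0.4, "n3": 0.3}
test_cases = [("a", ["n1", "n2", "n3"]), ("b", ["n1", "n2", "n3"])]
recall = recall_at_k(toy_scores.get, test_cases, k=1, rng=random.Random(0))
```

Here item "a" outranks all three sampled items (a hit) and item "b" is outranked by all of them (a miss), so the overall recall at $k = 1$ is 0.5.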
29
Baseline Methods
■ USG: A unified location recommendation framework which linearly fuses User interest along with Social influence and Geographical influence.
■ User-based CF methods
  • CKNN: A Category-based k-Nearest Neighbors algorithm. CKNN projects a user's activity history into the category space and models user preference using a weighted category hierarchy. The similarity between two users in CKNN is computed according to their weights in the category hierarchy.
  • IKNN: An Item-based k-Nearest Neighbors algorithm. The similarity between two users is computed as the cosine similarity between the two users' item vectors.
■ LDA: A user is viewed as a document, and the items visited by the user are viewed as words in the document.
■ Location-Aware LDA (LA-LDA): One component of LCA-LDA
■ Content-Aware LDA (CA-LDA): Another component of LCA-LDA
30
Experimental Results
■ Recommendation Effectiveness
32
Experimental Results
■ Efficiency of online recommendation; the querying cities are Beijing and Shanghai
34
Experimental Results
■ To show the performance of LCARS more clearly, we zoom in on the results as follows.
35
Latent Information Analysis
36
Conclusion
■ Spatial Item Recommendations
  • Data sparsity is a big challenge in recommender systems
  • The new city problem amplifies the data sparsity challenge
  • The mobile scenario requires the recommender system to respond to user queries in real time
■ Our Solution - LCARS
  • Exploit the Local Preference of the querying city to alleviate data sparsity. Local word-of-mouth is a valuable resource for making a recommendation.
  • Take advantage of the Content Information of items to overcome the sparsity. The contents build a bridge between users and items from disjoint regions.
  • Extend the Threshold-based Algorithm (TA) to produce fast online recommendations
■ Result
  • LCARS can produce more effective and more efficient recommendations
38
Q&A
Thanks
39