Transcript Document

Utilizing Transfer Learning for
Collaborative Recommender Systems
Lior Rokach
Dept. of Information Systems Eng.,
Ben-Gurion University of the Negev
Joint work with: Bracha Shapira, Guy Shani, Orly Moreno , Edita Grolman, Erez
Lefel, Ariel Bar et al.
Agenda
• Introduction
• Transfer Learning for Cross Domain Recommendation
• The TALMUD Algorithm
• Experimental Study – Take 1
• Scalable TALMUD – Getting TALMUD Out of the Lab.
• Experimental Study – Take 2
• Conclusions and Future Research
Recommender Systems
• A recommender system (RS) helps people who lack the personal experience or competence to evaluate the potentially overwhelming number of alternatives offered by a web site.
– In their simplest form, RSs recommend to their users personalized, ranked lists of items
– RSs provide consumers with information to help them decide which items to purchase
Recommendation Techniques
State-of-the-art solutions
[Table: recommendation methods vs. the examined solutions (Jinni, Taste Kid, Nanocrowd, Clerkdogs, Criticker, IMDb, Flixster, Movielens, Netflix, Shazam, Pandora, LastFM, YooChoose, Think Analytics, Itunes, Amazon), with check marks indicating which solutions use each method. Methods: Collaborative Filtering; Content-Based Techniques; Knowledge-Based Techniques; Stereotype-Based Recommender Systems; Ontologies and Semantic Web Technologies for Recommender Systems; Hybrid Techniques; Ensemble Techniques for Improving Recommendation (marked "future"); Context Dependent Recommender Systems; Conversational/Critiquing Recommender Systems; Community Based Recommender Systems and Recommender Systems 2.0. The per-cell check marks are not recoverable from the transcript.]
Collaborative Filtering
Overview
The Idea
 Predict the opinion each user will have on the different items, so that the "best" items can be recommended to each user, based on the user's previous likings and the opinions of other like-minded users
Selected Techniques
 Nearest Neighbor
 Matrix Factorization
21.07.2015
Collaborative Filtering
Rating Matrix
Example of Rating Matrix
 The ratings of users and items are represented in a matrix
 All CF methods are based on such a rating matrix
Collaborative Filtering
Approach 1: Nearest Neighbors
"People who liked this also liked…"
User-to-User
 Recommendations are made by finding users with similar tastes. Jane and Tim both liked Item 2 and disliked Item 3; it seems they might have similar taste, which suggests that in general Jane agrees with Tim. This makes Item 1 a good recommendation for Tim.
 This approach does not scale well to millions of users.
Item-to-Item
 Recommendations are made by finding items that have similar appeal to many users. Tom and Sandra are two users who liked both Item 1 and Item 4. That suggests that, in general, people who liked Item 4 will also like Item 1, so Item 1 will be recommended to Tim. This approach scales to millions of users and millions of items.
Methods
 Using similarity measures (such as Pearson)
 Learning the relation weights via optimization
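The item-to-item idea above can be sketched in a few lines of NumPy. The toy rating matrix, the cosine similarity measure, and the neighborhood size k are illustrative assumptions, not from the talk:

```python
import numpy as np

# Toy rating matrix (rows = users, columns = items); 0 marks "not rated".
R = np.array([
    [5, 0, 4, 1],
    [4, 1, 5, 0],
    [1, 5, 0, 4],
    [0, 4, 1, 5],
], dtype=float)

def item_cosine_similarity(R):
    """Cosine similarity between item columns."""
    norms = np.linalg.norm(R, axis=0)
    norms[norms == 0] = 1.0              # avoid division by zero
    return (R.T @ R) / np.outer(norms, norms)

def predict(R, user, item, k=2):
    """Predict a rating as a similarity-weighted mean over the user's rated items."""
    sims = item_cosine_similarity(R)[item]
    rated = np.flatnonzero(R[user])      # items this user has rated
    top = rated[np.argsort(sims[rated])[::-1][:k]]
    return float(sims[top] @ R[user, top] / sims[top].sum())

print(round(predict(R, user=0, item=1), 2))  # → 1.55
```

User 0's predicted rating for item 1 is pulled down by item 3 (very similar, rated 1), which is exactly the neighborhood effect the slide describes.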
Collaborative Filtering
Approach 2: Matrix Factorization
 The Singular Value Decomposition (SVD) is a widely used technique to decompose a matrix into several component matrices, exposing many of the useful and interesting properties of the original matrix.
 In the recommender-systems field, SVD models users and items as vectors of latent features whose inner product produces the rating the user gives the item.
 With SVD a matrix is factored into a series of linear approximations that expose the underlying structure of the matrix.
 The goal is to uncover latent features that explain the observed ratings.
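The decomposition described above takes a few lines of NumPy. The dense toy matrix and the choice of two latent factors are illustrative assumptions:

```python
import numpy as np

# Toy dense rating matrix (users x items); illustrative values only.
X = np.array([
    [5, 4, 1, 1],
    [4, 5, 1, 2],
    [1, 1, 5, 4],
    [2, 1, 4, 5],
], dtype=float)

U, s, Vt = np.linalg.svd(X, full_matrices=False)

d = 2                                    # keep the two strongest latent factors
X_hat = U[:, :d] @ np.diag(s[:d]) @ Vt[:d, :]

# Each user (row of U) and item (column of Vt) is now a d-dimensional
# latent-feature vector; their weighted inner product approximates the rating.
print(np.round(X_hat, 1))
```

By the Eckart–Young theorem the rank-d truncation is the best rank-d approximation, and its residual norm equals the norm of the discarded singular values.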
Latent Factor Models
Example
[Figure: users & ratings mapped onto latent concepts or factors. Applying the SVD process to the user ratings reveals a hidden concept, together with the strength of each connection to it. In the follow-up slide, SVD reveals a movie this user might like.]
Latent Factor Models
Concept space
[Figure: the latent concept space.]
Popular Factorization
• SVD
  𝑋𝑚×𝑛 ≈ 𝑈𝑚×𝑑 ∙ Σ𝑑×𝑑 ∙ 𝑉𝑛×𝑑ᵀ,  d = min(m, n)
  (Σ is a diagonal matrix whose singular values indicate the factor importance)
• Low Rank Factorization
  𝑋𝑚×𝑛 ≈ 𝑈𝑚×𝑑 ∙ 𝑉𝑛×𝑑ᵀ
• Code-Book
  𝑋𝑚×𝑛 ≈ 𝑈𝑚×𝑑 ∙ 𝐵𝑑×𝑙 ∙ 𝑉𝑛×𝑙ᵀ
  (here U and V are binary permutation/membership matrices)
Estimate latent factors through optimization
• Decision Variables:
– Matrices U, V
• Goal function:
– Minimize some loss function on available entries in the
training rating matrix
– Most frequently MSE is used:
• Easy to optimize
• A proxy to other predictive performance measures
• Methods:
– e.g. use stochastic gradient descent
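The optimization recipe above (decision variables U and V, MSE loss on observed entries, stochastic gradient descent) can be sketched as follows; the toy triples, learning rate, and regularization constant are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Observed ratings as (user, item, rating) triples; the matrix itself is sparse.
ratings = [(0, 0, 5.0), (0, 1, 4.0), (1, 0, 4.0), (1, 2, 1.0),
           (2, 1, 1.0), (2, 2, 5.0), (3, 0, 1.0), (3, 2, 4.0)]
m, n, d = 4, 3, 2                        # users, items, latent dimensions

U = 0.1 * rng.standard_normal((m, d))    # decision variables: user factors
V = 0.1 * rng.standard_normal((n, d))    # decision variables: item factors

lr, reg = 0.05, 0.01
for epoch in range(300):                 # stochastic gradient descent on the MSE loss
    for u, i, r in ratings:
        err = r - U[u] @ V[i]            # error on one observed entry
        U[u] += lr * (err * V[i] - reg * U[u])
        V[i] += lr * (err * U[u] - reg * V[i])

mse = np.mean([(r - U[u] @ V[i]) ** 2 for u, i, r in ratings])
print(mse)
```

The loss is computed only on the available entries, exactly as the slide prescribes; unobserved cells never contribute a gradient.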
Three Related Issues
• Sparseness
• Long Tail
– many items in the Long Tail have only a few ratings
• Cold Start
– System cannot draw any
inferences for users or items
about which it has not yet
gathered sufficient data
Alleviating CF issues
Using
External Sources
• Crowdsourcing
– "Using Wikipedia to boost collaborative filtering
techniques." Proceedings of the fifth ACM conference
on Recommender systems. 2011.
• Social Media
– "Facebook single and cross domain data for recommendation systems." User Modeling and User-Adapted Interaction (2012): 1-37.
• Transfer Learning
– “TALMUD: transfer learning for multiple domains.”,
21st ACM International Conference on Information
and Knowledge Management, CIKM 2012: 425-434
Agenda
• Introduction
• Transfer Learning for Cross Domain Recommendation
• The TALMUD Algorithm
• Experimental Study – Take 1
• Scalable TALMUD – Getting TALMUD Out of the Lab?
• Experimental Study – Take 2
• Conclusions and Future Research
Transfer Learning (TL)
Transfer previously learned "knowledge" to new domains, making them capable of learning a model from very few training examples.
[Diagram: in traditional machine learning, a separate learning system is trained for each of the different tasks (source domain, target domain); in transfer learning, knowledge extracted from the source domain's learning system is fed into the target domain's learning system.]
Transfer Learning
Share-Nothing
[Illustration over two slides: two share-nothing domains, Games and Music, with no common users or items; each contains best-seller, trendy, and classic items.]
𝑋𝑚×𝑛 ≈ 𝑈𝑚×𝑑 ∙ 𝐵𝑑×𝑙 ∙ 𝑉𝑛×𝑙ᵀ

Rating Matrix (users 1–7 × items a–e):

      a  b  c  d  e
 1    ?  1  3  3  1
 2    3  3  2  ?  3
 3    2  2  ?  3  ?
 4    1  1  3  ?  ?
 5    1  ?  ?  3  1
 6    3  ?  2  2  3
 7    ?  2  3  3  2

On the next slide the same rating matrix is shown with the missing entry of user 1 on item a filled in with the value 1.
CBT - Codebook Transfer
• Main Ideas:
– A cluster-level rating pattern (CodeBook) is used to bridge
the two domains.
• Remarks:
– Does not require common users or items (share-nothing).
– Preserves privacy!
Codebook Transfer
• Assumption: related domains share similar cluster-level rating patterns.
[Figure: the dense source-domain (music) rating matrix is co-clustered and its rows and columns are permuted, revealing a 3×3 cluster-level codebook (user clusters X, Y, Z × item clusters A, B, C):

        A  B  C
   X    3  1  2
   Y    2  3  3
   Z    3  2  1

The sparse target-domain (games) rating matrix is then permuted so that its observed entries align with the same cluster-level pattern; its user and item clusters map onto a sub-block of the codebook:

        A  B
   X    3  1
   Y    2  3
   Z    3  2

The individual entries of the source and target matrices are not fully recoverable from the transcript.]
Why does it make sense?
• The rows/columns in the code-book matrix represent the users'/items' rating distributions:

        A  B  C  D  E  F  G  H  I  J
   a    3  1  1  2  2  1  1  3  2  2
   b    2  4  4  5  5  5  4  5  3  3
   c    1  5  3  2  4  3  4  2  5  1
   d    2  1  2  3  2  2  3  4  4  1
   e    3  1  5  3  3  4  3  2  2  1
   f    3  5  1  3  1  2  1  2  3  2

[Figure: two rating-distribution histograms over the rating scale 1–5.]
• Fewer training instances are required to match users/items to existing patterns than to discover new patterns.
Codebook Transfer From A Single Source Domain
• Problem settings:
 – Xsrc – a dense (n × m) rating matrix
 – Xtgt – a sparse (p × q) rating matrix containing few observed ratings
 – B – the codebook: a (k × l) cluster-level user-item rating matrix that encodes the user-item clusters
Codebook Transfer From A Single Source Domain
Main steps:
1) Construct a “codebook” matrix for the source
domain
2) Transfer the codebook to the sparse target domain
and find the best cluster assignments
3) Reconstruct a dense rating matrix in the target
domain
Step 1: CodeBook Construction
• Input:
 – Source rating matrix Xsrc
 – K, L – user-defined codebook dimensions
• Co-Clustering: simultaneously cluster users and items

Source rating matrix (users 1–6 × items a–f):

      a  b  c  d  e  f
 1    2  3  3  3  2  ?
 2    3  1  2  2  ?  1
 3    3  1  ?  2  3  1
 4    ?  2  1  1  3  2
 5    2  3  3  3  2  3
 6    3  2  1  ?  3  2

After permutation (rows 2, 3, 1, 5, 4, 6; columns a, e, b, f, c, d):

      a  e  b  f  c  d
 2    3  ?  1  1  2  2
 3    3  3  1  1  ?  2
 1    2  2  3  ?  3  3
 5    2  2  3  3  3  3
 4    ?  3  2  2  1  1
 6    3  3  2  2  1  ?

Codebook (user clusters X, Y, Z × item clusters A, B, C):

        A  B  C
   X    3  1  2
   Y    2  3  3
   Z    3  2  1
Simultaneously cluster users and items
(George and Merugu, 2005)
Procedure: Co-Clustering
1. Randomly initialize U and V
2. Do:
   2.1. Compute the co-cluster means (Codebook B)
   2.2. For each user i=1 to m: assign user i to the user cluster j (of 1..K) that minimizes the error
   2.3. Update the co-cluster means (Codebook B)
   2.4. For each item i=1 to n: assign item i to the item cluster j (of 1..L) that minimizes the error
3. Until no error improvement
4. Return U, V, and Codebook B
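The procedure above can be sketched in NumPy. The 4×4 block-structured toy matrix and the single random restart are illustrative simplifications (a production run would use several restarts):

```python
import numpy as np

def codebook(X, u, v, K, L):
    """Codebook B: mean rating of each (user-cluster, item-cluster) block."""
    B = np.full((K, L), X.mean())        # fallback for empty blocks
    for k in range(K):
        for l in range(L):
            block = X[np.ix_(u == k, v == l)]
            if block.size:
                B[k, l] = block.mean()
    return B

def cocluster(X, K, L, iters=20, seed=0):
    """Alternate user/item cluster assignment against the co-cluster means."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    u = rng.integers(K, size=m)          # user cluster assignments
    v = rng.integers(L, size=n)          # item cluster assignments
    for _ in range(iters):
        B = codebook(X, u, v, K, L)
        for i in range(m):               # assign each user to its best cluster
            u[i] = int(np.argmin([((X[i] - B[k, v]) ** 2).sum() for k in range(K)]))
        B = codebook(X, u, v, K, L)
        for j in range(n):               # assign each item to its best cluster
            v[j] = int(np.argmin([((X[:, j] - B[u, l]) ** 2).sum() for l in range(L)]))
    return u, v, codebook(X, u, v, K, L)

X = np.array([[5, 5, 1, 1],
              [5, 4, 1, 2],
              [1, 2, 5, 4],
              [1, 1, 5, 5]], dtype=float)
u, v, B = cocluster(X, K=2, L=2)
print(u, v, B, sep="\n")
```

Each assignment step minimizes the error given the current codebook, and each codebook update minimizes it given the assignments, so the objective never increases between iterations.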
Step 2: Finding the Cluster Assignment
• Algorithm:
 – B is known and obtained from the source domain
 – Randomly initialize Utgt and Vtgt
 – Alternately update Utgt and Vtgt by fixing one matrix and finding the best setting for the other matrix
Agenda
• Introduction
• Transfer Learning for Cross Domain Recommendation
• The TALMUD Algorithm
• Experimental Study – Take 1
• Scalable TALMUD – Getting TALMUD Out of the Lab?
• Experimental Study – Take 2
• Conclusions and Future Research
TALMUD
TrAnsfer Learning from MUltiple Domains
• Extends the codebook-transfer concept to support multiple source domains with varying levels of relevance.

TALMUD – Problem Definition
1. Objective: minimize the MSE (mean squared error) in the target domain:

   min over U_n ∈ {0,1}^(p×k_n), V_n ∈ {0,1}^(q×l_n), α_n ∈ ℝ ∀n∈N of
   ‖(X_tgt − Σ_{n=1..N} α_n · U_n B_n V_nᵀ) ∘ W‖²
   s.t. U_n·1 = 1, V_n·1 = 1

2. Variables:
 • U_n, V_n – the users' and items' cluster memberships in each source domain n
 • α_n – the relatedness coefficient between each source domain n and the target domain
The TALMUD Algorithm
• Step 1: create a cluster-level codebook B_n for each source domain.
• Step 2: learn the target cluster memberships based on all source domains simultaneously.
  2.1: Find each user's corresponding clusters:
     j* = argmin_j ‖([X_tgt]_(i∗) − Σ_{n=1..N} α_n · B_n (V_n^(t−1))ᵀ) ∘ W_(i∗)‖²
  2.2: Find each item's corresponding clusters:
     j* = argmin_j ‖([X_tgt]_(∗i) − Σ_{n=1..N} α_n · U_n^(t) B_n) ∘ W_(∗i)‖²
  2.3: Learn the coefficients α_n.
• Step 3: calculate the filled-in target rating matrix:
     X̂_tgt = W ∘ X_tgt + (1 − W) ∘ Σ_{n=1..N} α_n (U_n B_n V_nᵀ)
Step 2.1
[Animation over several slides: the predicted rating matrix is computed as

   α₁ × U₁ × B₁ × V₁ᵀ + α₂ × U₂ × B₂ × V₂ᵀ

where U₁ (6 users × user clusters K11–K13) and U₂ (6 users × user clusters K21–K24) are binary membership matrices, and B₁ and B₂ are the source-domain codebooks:

   B1       L11  L12  L13        B2       L21  L22  L23  L24
   K11       3    1    2         K21       3    2    1    2
   K12       2    3    3         K22       2    1    2    2
   K13       3    2    1         K23       1    3    3    1
                                 K24       3    2    1    3

Each frame of the animation tries a different cluster assignment for user U1 in each source domain, keeping the combination that minimizes the reconstruction error. The full membership matrices are not recoverable from the transcript.]
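The search illustrated above can be sketched as an exhaustive scan over the cross product of cluster choices (which is exactly the C^D term in the later complexity analysis). The codebooks B1 and B2 are taken from the slides; the α values, item-cluster assignments v1/v2, the user's ratings, and the mask are hypothetical:

```python
import numpy as np
from itertools import product

def assign_user_clusters(x, w, sources):
    """For one user row x (with observed-entry mask w), pick one cluster per
    source domain that jointly minimizes the masked squared error.
    `sources` is a list of (alpha, B, v) tuples: relatedness coefficient,
    codebook, and the current item-cluster assignment of the target items."""
    best, best_choice = None, None
    # Exhaustive search over the cross product of cluster choices (C^D).
    for choice in product(*[range(B.shape[0]) for _, B, _ in sources]):
        pred = sum(alpha * B[k, v] for (alpha, B, v), k in zip(sources, choice))
        err = (w * (x - pred) ** 2).sum()
        if best is None or err < best:
            best, best_choice = err, choice
    return best_choice

B1 = np.array([[3., 1., 2.], [2., 3., 3.], [3., 2., 1.]])        # codebook, source 1
B2 = np.array([[3., 2., 1., 2.], [2., 1., 2., 2.],
               [1., 3., 3., 1.], [3., 2., 1., 3.]])              # codebook, source 2
v1 = np.array([0, 1, 1, 2])     # item -> item cluster in source 1 (hypothetical)
v2 = np.array([0, 2, 2, 3])     # item -> item cluster in source 2 (hypothetical)
x = np.array([5., 4., 4., 2.])  # one target user's observed ratings (hypothetical)
w = np.array([1., 1., 1., 0.])  # last item unrated

sources = [(0.5, B1, v1), (0.5, B2, v2)]
choice = assign_user_clusters(x, w, sources)
print(choice)
```

With D source domains of C clusters each, the loop tries C^D combinations per user, which motivates the search-tree heuristic introduced later in the talk.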
Step 2.3: Learning The Relatedness Coefficients
The goal is to minimize the error

   E = ‖(X_tgt − Σ_{n=1..N} α_n · U_n B_n V_nᵀ) ∘ W‖²_F

which can be written as

   E = Σ_{i=1..p} Σ_{j=1..q} (X̃_ij − Σ_{n=1..N} α_n [T_n]_ij)²

where we denote (U_n B_n V_nᵀ) ∘ W as T_n and X_tgt ∘ W as X̃.

Setting the partial derivative with respect to each α_(n*) to zero:

   ∂E/∂α_(n*) = −2 Σ_{i,j} (X̃_ij − Σ_{n=1..N} α_n [T_n]_ij) [T_(n*)]_ij
              = −2 (Σ_{i,j} X̃_ij [T_(n*)]_ij − Σ_{n=1..N} α_n Σ_{i,j} [T_n]_ij [T_(n*)]_ij) = 0

Rearranging gives:

   Σ_{n=1..N} α_n Σ_{i,j} [T_n]_ij [T_(n*)]_ij = Σ_{i,j} X̃_ij [T_(n*)]_ij
Step 2.3: Learning the Relatedness Coefficients (cont.)
Thus we obtain the following set of linear equations, writing ⟨T_a, T_b⟩ for Σ_{i,j} [T_a]_ij [T_b]_ij and ⟨X̃, T_a⟩ for Σ_{i,j} X̃_ij [T_a]_ij:

   | ⟨T₁,T₁⟩   ⟨T₂,T₁⟩   ⋯  ⟨T_N,T₁⟩  |   | α₁ |   | ⟨X̃,T₁⟩  |
   | ⟨T₁,T₂⟩   ⟨T₂,T₂⟩   ⋯     ⋮      | · | α₂ | = | ⟨X̃,T₂⟩  |
   |    ⋮         ⋮      ⋱     ⋮      |   | ⋮  |   |    ⋮     |
   | ⟨T₁,T_N⟩     ⋯      ⋯  ⟨T_N,T_N⟩ |   | α_N|   | ⟨X̃,T_N⟩ |

The optimal α_n values are found by solving α = A⁻¹ b, where A_ab = ⟨T_a, T_b⟩ and b_a = ⟨X̃, T_a⟩.

• The Hessian matrix is positive semi-definite, and therefore the function is convex in α.
• Conclusion: TALMUD monotonically reduces the value of the objective function.
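The closed-form solve above is a couple of lines of NumPy. The T_n and X matrices below are random stand-ins, just to exercise the linear algebra:

```python
import numpy as np

rng = np.random.default_rng(1)

# T[n] = (U_n B_n V_n^T) ∘ W : each source's masked prediction matrix.
N, p, q = 3, 6, 5
W = (rng.random((p, q)) < 0.5).astype(float)   # observed-entry mask
T = [rng.random((p, q)) * W for _ in range(N)]
X = rng.random((p, q)) * W                     # masked target ratings

# Normal equations: A[a,b] = sum_ij T_a T_b,  b[a] = sum_ij X T_a
A = np.array([[(Ta * Tb).sum() for Tb in T] for Ta in T])
b = np.array([(X * Tn).sum() for Tn in T])
alpha = np.linalg.solve(A, b)

print(np.round(alpha, 3))
```

A is a Gram matrix, hence positive semi-definite, which is the convexity observation on the slide.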
Forward Selection of Sources
1) Add sources gradually:
 • Begin with an empty set of sources
 • Examine the addition of each candidate source
 • Add the source that improves the model the most
 • A wrapper approach is used to decide when to stop
2) Retrain using the entire dataset with the selected sources
[Figure: three train/validation/test split configurations used during source selection and final retraining.]
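The greedy wrapper loop above can be sketched generically; the candidate source names and their validation errors below are hypothetical, chosen so that a third source stops helping (the "curse of sources"):

```python
def forward_select(candidates, evaluate):
    """Greedy forward selection with a wrapper stopping criterion:
    stop when no remaining candidate improves the validation error."""
    selected, best = [], evaluate([])
    while True:
        scored = [(evaluate(selected + [c]), c)
                  for c in candidates if c not in selected]
        if not scored:
            break
        err, c = min(scored)
        if err >= best:          # no improvement -> stop (avoids over-fitting)
            break
        best = err
        selected.append(c)
    return selected

# Hypothetical validation errors per source subset (keys are sorted tuples).
errors = {(): 100, ('music',): 70, ('books',): 80, ('movies',): 90,
          ('books', 'music'): 60, ('movies', 'music'): 75,
          ('books', 'movies', 'music'): 65}
eval_fn = lambda s: errors.get(tuple(sorted(s)), 999)
print(forward_select(['music', 'books', 'movies'], eval_fn))  # → ['music', 'books']
```

The third source is rejected because the validation error rises from 60 to 65, mirroring the over-fitting behavior shown in the "Curse of Sources" chart.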
Agenda
• Introduction
• Transfer Learning for Cross Domain Recommendation
• The TALMUD Algorithm
• Experimental Study – Take 1
• Scalable TALMUD – Getting TALMUD Out of the Lab?
• Experimental Study – Take 2
• Conclusions and Future Research
Datasets
• Netfilx (Movies) - We extracted 110 users, 110 items, and 12,100 rating
records to ensure no sparsity in order to use it as a dense source matrix in
the experiment.
• Jester (Jokes) - We used a subset with no sparsity that contains 500 users,
100 items, and 50,000 rating records.
• MovieLense (Movies) –We randomly extracted 943 users, 1682 items and
100,000 rating records with ~94% sparsity
• Music loads - We randomly selected 632 users who have events on at least
10 different items, and 817 music items with at least 5 events of different
user’s ~97% sparsity.
• Games loads - We randomly selected 632 users who have events on at least
10 different items, and 1264 items with at least 5 events ~97% sparsity.
• BookCrossing (Books) - We randomly extracted 13869 users, 20358 items
and 146,960 rating records with ~99.9% sparsity.
47
Compared Methods
1. CB (George and Merugu, 2005) – codebook-based factorization for a single domain
2. SVD (Koren et al., 2009) – a single-domain algorithm that fills the matrix by finding the users' and items' latent factors
3. CBT (Li et al., 2009a) – which learns from only one source domain
4. RMGM (Li et al., 2009b) – a multi-task TL method that aims to fill missing values in multiple domains
5. TALMUD
Evaluation Procedure
• Train/Test split
 – Using timestamps
 – Training data: 80% of the ratings
 – Testing data: the remaining 20%
• Evaluation metric:
 – MAE = (Σ_{i=1..T} |P_i − R_i|) / T
Comparison Results
[Bar chart: MAE of TALMUD, CBT, RMGM, SVD, and CB on three target domains (Games, Music, BookCrossing). TALMUD's MAE: 48.67 on Games, 74.84 on Music, and 49.56 on BookCrossing; the competing methods' values range from 53.38 up to 219.21. The full method-to-bar assignment is not recoverable from the transcript.]
Curse of Sources
[Line chart, target = Games: test and train MAE of complete forward selection as a function of the number of sources (0–4).]
Using too many sources leads to over-fitting: not all of the given source domains should be used.
Convergence in Practice
[Line chart: training MAE over iterations 1–10.]
TALMUD converges to a local minimum in a few iterations; in our empirical tests it usually takes fewer than ten iterations.
Agenda
• Introduction
• Transfer Learning for Cross Domain Recommendation
• The TALMUD Algorithm
• Experimental Study – Take 1
• Scalable TALMUD – Getting TALMUD Out of the Lab?
• Experimental Study – Take 2
• Conclusions and Future Research
Complexity Analysis
Naive TALMUD: D³(U·I·C^(D+1) + R·D) ≈ D³·U·I·C^(D+1)

Notation:
U – number of users
I – number of items
R – number of ratings in TT (note: R << U·I, the rating matrix is very sparse)
D – number of source domains
C – number of user/item clusters in each source domain (typical values: 20/50)
n – number of machines
Scaling Up TALMUD for Massive Datasets
• Dedicated Data Structures and Matrix
Operations for Permutation Matrix
• Heuristic Methods
• MapReduce Framework
– Parallelizing Co-Clustering
– Parallelizing Search
Compact Representation of Matrices U, V and Loss Function Calculation

With the compact representation, the objective sums only over the observed ratings TT:

   min over U, V, α of  Σ_{⟨uid,iid⟩∈TT} ( TT[uid, iid] − Σ_{n=1..N} α_n · B_n[U[uid, n], V[iid, n]] )²

where:
 • TT[uid, iid] – the observed ratings; Σ α_n B_n[·,·] is the predicted rating
 • B_n – the codebook of each source domain, multiplied by the source domain's relatedness
 • α_n – the relatedness coefficient between source domain n and the target domain
 • U[uid, n] – the mapping of each user to its cluster in each source domain
 • V[iid, n] – the mapping of each item to its cluster in each source domain

This is equivalent to the original objective

   min over U_n ∈ {0,1}^(p×k_n), V_n ∈ {0,1}^(q×l_n), α_n ∈ ℝ ∀n∈N of  ‖(X_tgt − Σ_{n=1..N} α_n · U_n B_n V_nᵀ) ∘ W‖²

where W is a binary matrix: W_ij = 1 if entry (i,j) is rated, otherwise W_ij = 0. The one-hot membership matrices U_n, V_n are replaced by compact index arrays.
Complexity Analysis
Naive TALMUD: ≈ D³·U·I·C^(D+1)
+ Data structure: D³·R(C^D + D) ≈ D³·R·C^D
Heuristics
• Ranking Based Source Selection instead of
Forward Selection.
• Cluster Assignment using Binary Search Tree
instead of Exhaustive Search.
Ranking Based Source Selection
1) Pre-compute a source–target correlation estimate: compare the target's codebook with each source's codebook:
   a) Permute the rows and columns of the source codebook to make it as similar as possible to the target codebook
   b) Compare the permuted codebooks (using Euclidean distance)
2) In each iteration, examine the next top k (k=2) unselected sources
Complexity Analysis
Naive TALMUD: ≈ D³·U·I·C^(D+1)
+ Data structure: ≈ D³·R·C^D
+ Ranking based source selection: D²·R·C^D
Cluster Assignment using Search Tree
• Bottom-up cluster the codebook's rows to create a search tree
• Instead of going over all possible codebook assignments (C^D), examine each codebook in sequence while fixing all other codebook assignments, using the search tree (D·logC)

        L1  L2  L3
   K1    3   1   2
   K2    3   2   1
   K3   20   8  18
   K4    8   1   3
Distance Measures
The distance between cluster K1 and K2 is:
   d(K1, K2) = √((3 − 3)² + (1 − 2)² + (2 − 1)²) = √2
The distance between cluster K1 and K3 is:
   d(K1, K3) = √((3 − 20)² + (1 − 8)² + (2 − 18)²) = √594
The distance between cluster K1 and K4 is:
   d(K1, K4) = √((3 − 8)² + (1 − 1)² + (2 − 3)²) = √26
The 2 closest clusters to K1 are K2 and K4.

        L1  L2  L3
   K1    3   1   2
   K2    3   2   1
   K3   20   8  18
   K4    8   1   3
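The nearest-cluster lookup above is easy to verify with NumPy on the codebook rows from the slide:

```python
import numpy as np

codebook = np.array([[3., 1., 2.],     # K1
                     [3., 2., 1.],     # K2
                     [20., 8., 18.],   # K3
                     [8., 1., 3.]])    # K4

d = np.linalg.norm(codebook - codebook[0], axis=1)  # Euclidean distances from K1
nearest = np.argsort(d)[1:3]                        # skip K1 itself
print(nearest)                                      # rows K2 and K4
```

A search tree built bottom-up over these rows lets the algorithm descend toward the closest cluster in O(logC) comparisons instead of scanning all C rows.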
Complexity Analysis
Naive TALMUD: ≈ D³·U·I·C^(D+1)
+ Data structure: ≈ D³·R·C^D
+ Ranking based source selection: D²·R·C^D
+ Search tree: D²·R·max(logC, D)
MapReduce
[Dean et al., OSDI 2004, CACM Jan 2010]
VLDB 2010 Tutorial
MapReduce For Step 1: Co-Clustering
Procedure: Co-Clustering
1. Randomly initialize U and V
2. Do:
   2.1. Compute the group means (Codebook B)
   2.2. For each user i=1 to m: assign user i to the user cluster j (of 1..K) that minimizes the error
   2.3. Update the group means (Codebook B)
   2.4. For each item i=1 to n: assign item i to the item cluster j (of 1..L) that minimizes the error
3. Until no error improvement
4. Return U, V, and Codebook B
MapReduce For Step 1: Co-Clustering
Procedure: Co-Clustering User Mapper (User i, UserRating Xi)
1. Globals: Codebook B, Matrix U, Matrix V
2. For each user cluster j=1 to K: assign user i to the cluster j* that minimizes the error, and update Ui
3. Cj* ← contribution of user i to the aligned codebook row
4. Emit (j*, Cj*)

(C is an array with two rows: the first row aggregates the rating values; the second row counts the number of ratings.)

Procedure: Co-Clustering User Reducer (Cluster j, Contributions C1, C2, …, Cl)
1. Bj ← 0
2. For i=1 to l:
   Bj ← Bj + Ci,1
   Tj ← Tj + Ci,2
3. Emit (j, Bj/Tj)

To save network time, a combiner can pre-aggregate, at the mapper nodes, the values of all users that belong to the same user cluster.
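The mapper/reducer pair above can be simulated in plain Python to see one shuffle round; the toy matrix, codebook, and item assignment are illustrative assumptions:

```python
from collections import defaultdict
import numpy as np

def user_mapper(i, x_i, B, v):
    """Mapper: assign user i to its best user cluster, emit its contribution."""
    errs = [((x_i - B[k, v]) ** 2).sum() for k in range(B.shape[0])]
    j_star = int(np.argmin(errs))
    L = B.shape[1]
    # Contribution C: per item-cluster rating sums (row 1) and counts (row 2).
    sums = np.array([x_i[v == l].sum() for l in range(L)])
    counts = np.array([(v == l).sum() for l in range(L)])
    return j_star, (sums, counts)

def cluster_reducer(contributions):
    """Reducer: aggregate contributions into one codebook row (mean ratings)."""
    sums = sum(c[0] for c in contributions)
    counts = sum(c[1] for c in contributions)
    return sums / np.maximum(counts, 1)

X = np.array([[5., 5., 1.], [4., 5., 2.], [1., 2., 5.]])  # toy ratings
B = np.array([[5., 1.], [1., 5.]])   # current codebook (2 user x 2 item clusters)
v = np.array([0, 0, 1])              # item -> item-cluster assignment

shuffled = defaultdict(list)         # simulate the shuffle phase
for i in range(X.shape[0]):
    j, contrib = user_mapper(i, X[i], B, v)
    shuffled[j].append(contrib)

new_B = np.vstack([cluster_reducer(shuffled[j]) if j in shuffled else B[j]
                   for j in range(B.shape[0])])
print(new_B)  # → [[4.75 1.5 ] [1.5  5.  ]]
```

Grouping the emitted (cluster, contribution) pairs by key is exactly what the MapReduce shuffle does; the reducer then recomputes each codebook row as a mean, matching step 2.3 of the procedure.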
MapReduce for Step 2.1 and 2.2
• Each user/item is addressed independently
(similar to step 1) and therefore can be mapped
in a separate node.
Procedure: Step 2.1 (User i, UserRating Xi)
1. Globals: Codebook B, Matrix U, Matrix V
2. For each user cluster j=1 to K
1. Assign user i to cluster j* that minimizes error and update Ui
3. Emit (1, 1)
• A single Reduce node that is used only for
synchronizing the mappers.
Step 2.3
• Calculating the Hessian is done in MapReduce
– Map the matrix for each valid User,Item pair
– Reduce aggregate the matrices
• Solving the set of linear equations can be
done using MapReduce – but don’t bother (N
is relatively small).
Step 3
• Calculate the validation error:
– Map
• Calculate the error of a certain <user,item> pair in the
validation
– Reduce
• Aggregate the errors
Complexity Analysis
Naive TALMUD: ≈ D³·U·I·C^(D+1)
+ Data structure: ≈ D³·R·C^D
+ Ranking based source selection: D²·R·C^D
+ Search tree: D²·R·max(logC, D)
+ MapReduce parallelization: (1/n)·D²·R·max(logC, D)
Talmud Code
• Map-Reduce implementation for Apache
Hadoop 2.0 using Pig can be obtained from:
– https://github.com/marnun/Talmud
• Non-distributed Matlab implementation can
be obtained from:
– http://ise.bgu.ac.il/faculty/liorr/Talmud
Agenda
• Introduction
• Transfer Learning for Cross Domain Recommendation
• The TALMUD Algorithm
• Experimental Study – Take 1
• Scalable TALMUD – Getting TALMUD Out of the Lab?
• Experimental Study – Take 2
• Conclusions and Future Research
Results of the Scaling Performance

Target Dataset     | Target Ratings | Source Ratings (total) | MAE TALMUD | MAE Scalable TALMUD | Time TALMUD (sec) | Time Scalable TALMUD (sec)
Small Games        | 20K            | 324K                   | 48.67      | 49.3                | 43K               | 0.9K
Small Music        | 16K            | 328K                   | 74.84      | 74.91               | 28K               | 0.4K
Small BookCrossing | 147K           | 177K                   | 49.56      | 49.56               | 208K              | 2.5K
Full Games         | 2.5M           | 125M                   | N/A        | 42.66               | N/A               | 92K
Full Music         | 10.2M          | 118M                   | N/A        | 53.17               | N/A               | 269K
Full BookCrossing  | 1.1M           | 126M                   | N/A        | 46.17               | N/A               | 34.1K
Agenda
• Introduction
• Transfer Learning for Cross Domain Recommendation
• The TALMUD Algorithm
• Experimental Study – Take 1
• Scalable TALMUD – Getting TALMUD Out of the Lab?
• Experimental Study – Take 2
• Conclusions and Future Research
Conclusions
• A new heterogeneous transfer-learning method for RecSys, which transfers knowledge from multiple sources.
• Automatically selects which source domains should be used, and with what intensity, thereby avoiding negative transfer (which occurs when the new classifier performs worse than if there had been no transfer at all).
• Over-fitting can be avoided by gradually adding sources and evaluating them with a wrapper approach.
• Outperforms existing state-of-the-art TL methods (CBT and RMGM).
• Scalable.
Current Activities
• Cross-Sources Transfer
• In-Domain Self-Transfer
• Cross-Events Transfer
• Experiments with other bi-partite graph tasks
• Generalization to DAGs
• Goodbye MapReduce – implementations on top of a DSL for linear algebraic operations
Thank You!