Transcript Document
Utilizing Transfer Learning for Collaborative Recommender Systems
Lior Rokach, Dept. of Information Systems Eng., Ben-Gurion University of the Negev
Joint work with: Bracha Shapira, Guy Shani, Orly Moreno, Edita Grolman, Erez Lefel, Ariel Bar et al.

Agenda
• Introduction
• Transfer Learning for Cross-Domain Recommendation
• The TALMUD Algorithm
• Experimental Study – Take 1
• Scalable TALMUD – Getting TALMUD Out of the Lab
• Experimental Study – Take 2
• Conclusions and Future Research

Recommender Systems
• A recommender system (RS) helps people who lack the personal experience or competence to evaluate the potentially overwhelming number of alternatives offered by a web site.
– In their simplest form, RSs recommend personalized, ranked lists of items to their users.
– RSs provide consumers with information that helps them decide which items to purchase.

Recommendation Techniques – State-of-the-Art Solutions
[Table: recommendation techniques (Collaborative Filtering; Content-Based; Knowledge-Based; Stereotype-Based; Ontologies and Semantic Web Technologies; Hybrid; Ensemble; Context-Dependent; Conversational/Critiquing; Community-Based and Recommender Systems 2.0) versus deployed systems (Jinni, Taste Kid, Nanocrowd, Clerkdogs, Criticker, IMDb, Flixster, MovieLens, Netflix, Shazam, Pandora, LastFM, YooChoose, Think Analytics, iTunes, Amazon); checkmarks indicate which techniques each system employs.]

Collaborative Filtering – Overview
• The idea: predict the opinion the user will have of the different items and recommend the "best" items to each user, based on the user's previous likings and the opinions of other like-minded users.
• Selected techniques: nearest neighbors; matrix factorization.
[Figure: rating matrix with positive and negative ratings; "?" marks unknown entries.]

Collaborative Filtering – Rating Matrix
• The ratings of users for items are represented in a matrix.
• All CF methods are based on such a rating matrix.
[Figure: example of a rating matrix.]

Collaborative Filtering – Approach 1: Nearest Neighbors
• User-to-user ("People who liked this also liked…"): recommendations are made by finding users with similar tastes. Jane and Tim both liked Item 2 and disliked Item 3; it seems they have similar taste, which suggests that in general Jane agrees with Tim. This makes Item 1 a good recommendation for Tim. This approach does not scale well to millions of users.
• Item-to-item: recommendations are made by finding items that have similar appeal to many users. Tom and Sandra are two users who liked both Item 1 and Item 4. That suggests that, in general, people who liked Item 4 will also like Item 1, so Item 1 will be recommended to Tim. This approach scales to millions of users and millions of items.
• Methods: similarity measures (such as Pearson correlation), or learning the relation weights via optimization.
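To make the neighborhood approach concrete, here is a minimal user-based sketch (an illustration, not code from the talk: the toy ratings, the cosine-similarity choice, and all names are assumptions).

```python
import numpy as np

# Toy user x item rating matrix; 0 marks "not rated" (illustrative data).
R = np.array([
    [5, 3, 0, 1],
    [4, 0, 0, 1],
    [1, 1, 0, 5],
    [1, 0, 0, 4],
    [0, 1, 5, 4],
], dtype=float)

def predict_user_based(R, user, item, k=2):
    """Predict R[user, item] from the k most similar users who rated the item.
    Assumes at least k users rated the item."""
    rated = R[:, item] > 0          # users who rated this item
    rated[user] = False             # never use the target user itself
    norm = np.linalg.norm
    sims = np.array([
        R[user] @ R[v] / (norm(R[user]) * norm(R[v]) + 1e-9) if rated[v] else -np.inf
        for v in range(R.shape[0])
    ])
    neighbors = np.argsort(sims)[-k:]          # indices of the top-k raters
    weights = sims[neighbors]
    return float(weights @ R[neighbors, item] / weights.sum())

print(predict_user_based(R, user=1, item=1))   # ~2.3 on this toy data
```

An item-to-item variant runs the same computation on the transposed matrix.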
Collaborative Filtering – Approach 2: Matrix Factorization
• The Singular Value Decomposition (SVD) is a widely used technique for decomposing a matrix into several component matrices, exposing many useful and interesting properties of the original matrix.
• In the recommender-systems field, SVD models users and items as vectors of latent features whose inner product produces the rating of the user for the item.
• With SVD, a matrix is factored into a series of linear approximations that expose the underlying structure of the matrix. The goal is to uncover latent features that explain the observed ratings.

Latent Factor Models – Example
[Figure: users and their ratings mapped to latent concepts (hidden factors); SVD reveals these hidden connections and their strength.]
[Figure: using the recovered factors, SVD reveals a movie this user might like.]
[Figure: users and items plotted in the concept space.]

Popular Factorizations
• SVD: $X_{m\times n} \approx U_{m\times d} \cdot \Sigma_{d\times d} \cdot V_{n\times d}^{T}$, where $d = \min(m,n)$ and $\Sigma$ is a diagonal matrix whose singular values indicate the importance of each factor.
• Low-rank factorization: $X_{m\times n} \approx U_{m\times d} \cdot V_{n\times d}^{T}$
• Codebook: $X_{m\times n} \approx U_{m\times d} \cdot B_{d\times l} \cdot V_{n\times l}^{T}$, where $U$ and $V$ are binary (permutation-like) membership matrices.

Estimating Latent Factors Through Optimization
• Decision variables: the matrices U, V.
• Goal function: minimize some loss function over the available entries of the training rating matrix. Most frequently MSE is used: it is easy to optimize and is a proxy for other predictive performance measures.
• Methods: e.g., stochastic gradient descent.

Three Related Issues
• Sparseness.
• Long tail – many items in the long tail have only a few ratings.
• Cold start – the system cannot draw inferences for users or items about which it has not yet gathered sufficient data.

Alleviating CF Issues Using External Sources
• Crowdsourcing – "Using Wikipedia to boost collaborative filtering techniques." Proceedings of the Fifth ACM Conference on Recommender Systems, 2011.
• Social media – "Facebook single and cross domain data for recommendation systems." User Modeling and User-Adapted Interaction (2012): 1-37.
• Transfer learning – "TALMUD: transfer learning for multiple domains." 21st ACM International Conference on Information and Knowledge Management, CIKM 2012: 425-434.

Agenda
• Introduction
• Transfer Learning for Cross Domain Recommendation
• The TALMUD Algorithm
• Experimental Study – Take 1
• Scalable TALMUD – Getting TALMUD Out of the Lab?
• Experimental Study – Take 2
• Conclusions and Future Research

Transfer Learning (TL)
• Transfer previously learned "knowledge" to new domains, making it possible to learn a model there from very few training examples.
[Figure: traditional machine learning trains a separate learning system per task; transfer learning passes knowledge from the source-domain learning system to the target-domain learning system.]

Transfer Learning – Share-Nothing
[Figure: two domains, Games and Music, each with best-seller, trendy, and classic items; no users or items are shared between the domains.]

Rating Matrix
$X_{m\times n} \approx U_{m\times d} \cdot B_{d\times l} \cdot V_{n\times l}^{T}$

Example (users 1–7 × items a–e):
     a  b  c  d  e
1    ?  1  3  3  1
2    3  3  2  ?  3
3    2  2  ?  3  ?
4    1  1  3  ?  ?
5    1  ?  ?  3  1
6    3  ?  2  2  3
7    ?  2  3  3  2

The factorization fills in the unknown entries; for example, entry (1, a) is predicted to be 1.

CBT – Codebook Transfer
• Main idea: a cluster-level rating pattern (codebook) is used to bridge the two domains.
• Remarks:
– Does not require common users or items (share-nothing).
– Preserves privacy!

Codebook Transfer
• Assumption: related domains share similar cluster-level rating patterns.
[Example: the rows and columns of a dense source (music) rating matrix are permuted to expose a block structure, which is summarized by the 3×3 codebook below; the sparse target (games) matrix is then permuted so that its users and items align with the same codebook.]

Codebook:
     A  B  C
X    3  1  2
Y    2  3  3
Z    3  2  1
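To see the codebook factorization in action, here is a minimal NumPy sketch (illustrative, not the talk's code) that reconstructs a rating matrix from the 3×3 codebook above; the one-hot membership assignments are made-up assumptions.

```python
import numpy as np

# Cluster-level codebook from the slide (user clusters X,Y,Z x item clusters A,B,C).
B = np.array([[3, 1, 2],
              [2, 3, 3],
              [3, 2, 1]], dtype=float)

# Illustrative one-hot memberships: 4 users and 5 items mapped to clusters.
U = np.array([[1, 0, 0],   # user 1 -> cluster X
              [0, 1, 0],   # user 2 -> cluster Y
              [0, 1, 0],   # user 3 -> cluster Y
              [0, 0, 1]])  # user 4 -> cluster Z
V = np.array([[1, 0, 0],   # item 1 -> cluster A
              [0, 1, 0],
              [0, 0, 1],
              [1, 0, 0],
              [0, 1, 0]])

# X ≈ U B V^T: every user inherits the rating pattern of its cluster.
X_hat = U @ B @ V.T
print(X_hat)   # each row repeats a codebook row according to the memberships
```

Because U and V are one-hot, every entry of X_hat is literally a codebook cell, which is why a codebook learned on a dense source can fill a sparse target.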
Why Does It Make Sense?
• The rows/columns of the codebook matrix represent the users'/items' rating distributions:

     A  B  C  D  E  F  G  H  I  J
a    3  1  1  2  2  1  1  3  2  2
b    2  4  4  5  5  5  4  5  3  3
c    1  5  3  2  4  3  4  2  5  1
d    2  1  2  3  2  2  3  4  4  1
e    3  1  5  3  3  4  3  2  2  1
f    3  5  1  3  1  2  1  2  3  2

• Fewer training instances are required to match users/items to existing patterns than to discover new patterns.
[Charts: two histograms of rating-value distributions (ratings 1–5).]

Codebook Transfer From a Single Source Domain
• Problem settings:
– Xsrc – a dense (n × m) rating matrix.
– Xtgt – a sparse (p × q) rating matrix containing few observed ratings.
– B – the (k × l) cluster-level user-item rating matrix (codebook) that encodes the user-item clusters.

• Main steps:
1) Construct a "codebook" matrix for the source domain.
2) Transfer the codebook to the sparse target domain and find the best cluster assignments.
3) Reconstruct a dense rating matrix in the target domain.

Step 1: Codebook Construction
• Input: source rating matrix Xsrc; K, L – user-defined codebook dimensions.
• Co-clustering: simultaneously cluster users and items.
[Example: co-clustering the 6×6 source matrix and permuting its rows and columns groups the ratings into blocks; the block means form the 3×3 codebook.]

Procedure: Co-Clustering (George and Merugu, 2005)
1. Randomly initialize U and V.
2. Repeat:
   a. Compute the co-cluster means (codebook B).
   b. For each user i = 1..m: assign user i to the user cluster j in 1..K that minimizes the error.
   c. Update the co-cluster means (codebook B).
   d. For each item i = 1..n: assign item i to the item cluster j in 1..L that minimizes the error.
3. Until no error improvement.
4. Return U, V, and codebook B.

Step 2: Finding the Cluster Assignment
• Algorithm:
– B is known and obtained from the source domain.
– Randomly initialize Utgt and Vtgt.
– Alternately update Utgt and Vtgt by fixing one matrix and finding the best setting of the other matrix (a sketch follows below).
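A minimal sketch of the alternating assignment just described (illustrative: the toy matrices, the mask density, and the fixed iteration count are assumptions, not the paper's implementation).

```python
import numpy as np

def fit_memberships(X, W, B, iters=10):
    """CBT step 2 sketch: with codebook B fixed, alternately assign each
    target user to a codebook row and each target item to a codebook
    column, minimizing squared error on the observed entries only."""
    (p, q), (k, l) = X.shape, B.shape
    u = np.zeros(p, dtype=int)                 # user -> user-cluster index
    v = np.random.randint(l, size=q)           # item -> item-cluster index
    for _ in range(iters):
        for i in range(p):                     # best row pattern for user i
            errs = [(W[i] * (X[i] - B[j, v]) ** 2).sum() for j in range(k)]
            u[i] = int(np.argmin(errs))
        for t in range(q):                     # best column pattern for item t
            errs = [(W[:, t] * (X[:, t] - B[u, j]) ** 2).sum() for j in range(l)]
            v[t] = int(np.argmin(errs))
    return u, v

B = np.array([[3., 1., 2.], [2., 3., 3.], [3., 2., 1.]])   # source codebook
X = np.array([[3., 1., 2., 3.], [2., 3., 3., 2.], [3., 1., 2., 3.]])
W = (np.random.rand(*X.shape) < 0.7).astype(float)         # ~70% observed
u, v = fit_memberships(X, W, B)
print(B[np.ix_(u, v)])                         # reconstructed dense target
```

Each sweep can only lower the masked squared error, so the alternation settles; in the talk's variant the loop stops when the error no longer improves rather than after a fixed number of iterations.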
Agenda
• Introduction
• Transfer Learning for Cross Domain Recommendation
• The TALMUD Algorithm
• Experimental Study – Take 1
• Scalable TALMUD – Getting TALMUD Out of the Lab?
• Experimental Study – Take 2
• Conclusions and Future Research

TALMUD – TrAnsfer Learning from MUltiple Domains
• Extends the codebook-transfer concept to support multiple source domains with varying levels of relevance.

TALMUD – Problem Definition
1. Objective: minimize the MSE (mean squared error) in the target domain:

$$\min_{\substack{U_n \in \{0,1\}^{p \times k_n},\; V_n \in \{0,1\}^{q \times l_n},\; \alpha_n \in \mathbb{R}\;\; \forall n \in N}} \left\| \left( X_{tgt} - \sum_{n=1}^{N} \alpha_n U_n B_n V_n^{T} \right) \circ W \right\|_F^2 \quad \text{s.t. } U_n \mathbf{1} = \mathbf{1},\; V_n \mathbf{1} = \mathbf{1}$$

2. Variables:
• $U_n$, $V_n$ – the users' and items' cluster memberships in each source domain n.
• $\alpha_n$ – the relatedness coefficient between source domain n and the target domain.

The TALMUD Algorithm
• Step 1: create a codebook $B_n$ for each source domain.
• Step 2: learn the target cluster memberships based on all source domains simultaneously.
– 2.1: find the users' corresponding clusters (jointly over all sources):
$$j^{*} = \arg\min_{j=(j_1,\dots,j_N)} \left\| \left( [X_{tgt}]_{i*} - \sum_{n=1}^{N} \alpha_n \left[ B_n (V_n^{(t-1)})^{T} \right]_{j_n *} \right) \circ W_{i*} \right\|^{2}$$
– 2.2: find the items' corresponding clusters:
$$j^{*} = \arg\min_{j=(j_1,\dots,j_N)} \left\| \left( [X_{tgt}]_{*i} - \sum_{n=1}^{N} \alpha_n \left[ U_n^{(t)} B_n \right]_{* j_n} \right) \circ W_{*i} \right\|^{2}$$
– 2.3: learn the coefficients $\alpha_n$.
• Step 3: calculate the filled-in target rating matrix:
$$X_{tgt} = W \circ X_{tgt} + (1 - W) \circ \sum_{n=1}^{N} \alpha_n \left( U_n B_n V_n^{T} \right)$$

Step 2.1 – Illustration
[Animation over several slides: with two sources, the prediction is $\alpha_1 U_1 B_1 V_1^T + \alpha_2 U_2 B_2 V_2^T$; user U1's one-hot row in each membership matrix is moved across the candidate clusters (K11–K13 in source 1, K21–K24 in source 2), and the assignment that minimizes the error is kept. The two codebooks:]

B1:
      L11  L12  L13
K11    3    1    2
K12    2    3    3
K13    3    2    1

B2:
      L21  L22  L23  L24
K21    3    2    1    2
K22    2    1    2    2
K23    1    3    3    1
K24    3    2    1    3

Step 2.3: Learning the Relatedness Coefficients
The goal is to minimize the error
$$E = \left\| \left( X_{tgt} - \sum_{n=1}^{N} \alpha_n U_n B_n V_n^{T} \right) \circ W \right\|_F^2$$
which can be written as
$$E = \sum_{i=1}^{p} \sum_{j=1}^{q} \left( X_{ij} - \sum_{n=1}^{N} \alpha_n T_{n,ij} \right)^2$$
where we denote $T_n = (U_n B_n V_n^{T}) \circ W$ and $X = X_{tgt} \circ W$. Setting the partial derivatives to zero:
$$\frac{\partial E}{\partial \alpha_{n^*}} = -2 \sum_{i,j} \left( X_{ij} - \sum_{n=1}^{N} \alpha_n T_{n,ij} \right) T_{n^*,ij} = 0$$
Rearranging gives
$$\sum_{n=1}^{N} \alpha_n \sum_{i,j} T_{n,ij}\, T_{n^*,ij} = \sum_{i,j} X_{ij}\, T_{n^*,ij}$$
Thus we obtain the following set of linear equations, one for each $n^* = 1..N$:
$$\begin{pmatrix} \sum_{i,j} T_{1,ij}^2 & \cdots & \sum_{i,j} T_{1,ij} T_{N,ij} \\ \vdots & \ddots & \vdots \\ \sum_{i,j} T_{N,ij} T_{1,ij} & \cdots & \sum_{i,j} T_{N,ij}^2 \end{pmatrix} \begin{pmatrix} \alpha_1 \\ \vdots \\ \alpha_N \end{pmatrix} = \begin{pmatrix} \sum_{i,j} X_{ij} T_{1,ij} \\ \vdots \\ \sum_{i,j} X_{ij} T_{N,ij} \end{pmatrix}$$
and the optimal coefficients are found by inverting the system, $\boldsymbol{\alpha} = A^{-1}\mathbf{b}$.
• The Hessian matrix is proved to be positive semi-definite, and therefore the function is convex.
• Conclusion: TALMUD monotonically reduces the value of the objective function.
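A small NumPy sketch of Step 2.3 (illustrative; the toy shapes and random inputs are assumptions): build the N×N system above from the per-source masked predictions and solve for the relatedness coefficients.

```python
import numpy as np

def learn_alphas(X, W, T_list):
    """Solve the linear system A @ alpha = b, where T_list holds each
    source's masked prediction T_n = (U_n B_n V_n^T) * W."""
    N = len(T_list)
    A = np.array([[(T_list[m] * T_list[n]).sum() for n in range(N)]
                  for m in range(N)])
    b = np.array([((X * W) * T_list[n]).sum() for n in range(N)])
    return np.linalg.lstsq(A, b, rcond=None)[0]   # robust if A is singular

# Toy data: two sources predicting a 4x5 target with ~60% observed entries.
rng = np.random.default_rng(0)
X = rng.integers(1, 6, size=(4, 5)).astype(float)
W = (rng.random((4, 5)) < 0.6).astype(float)
T_list = [rng.integers(1, 6, size=(4, 5)) * W for _ in range(2)]
print(learn_alphas(X, W, T_list))   # one alpha_n per source: its relatedness weight
```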
Forward Selection of Sources
1) Add sources gradually (a code sketch appears at the end of this section):
• Begin with an empty set of sources.
• Examine the addition of each source.
• Add the source that improves the model the most.
• A wrapper approach is used to decide when to stop.
2) Retrain using the entire dataset with the selected sources.
[Figure: data splits – source selection is tuned on a validation portion of the training data; the final model is retrained on the full training set and evaluated on the held-out test set.]

Agenda
• Introduction
• Transfer Learning for Cross Domain Recommendation
• The TALMUD Algorithm
• Experimental Study – Take 1
• Scalable TALMUD – Getting TALMUD Out of the Lab?
• Experimental Study – Take 2
• Conclusions and Future Research

Datasets
• Netflix (movies) – We extracted 110 users, 110 items, and 12,100 rating records, ensuring no sparsity, to use as a dense source matrix in the experiment.
• Jester (jokes) – We used a subset with no sparsity containing 500 users, 100 items, and 50,000 rating records.
• MovieLens (movies) – We randomly extracted 943 users, 1,682 items, and 100,000 rating records, with ~94% sparsity.
• Music loads – We randomly selected 632 users with events on at least 10 different items, and 817 music items with at least 5 events from different users; ~97% sparsity.
• Games loads – We randomly selected 632 users with events on at least 10 different items, and 1,264 items with at least 5 events; ~97% sparsity.
• BookCrossing (books) – We randomly extracted 13,869 users, 20,358 items, and 146,960 rating records, with ~99.9% sparsity.

Compared Methods
1. CB (George and Merugu, 2005) – codebook-based factorization for a single domain.
2. SVD (Koren et al., 2009) – a single-domain algorithm that fills the matrix by finding the users' and items' latent factors.
3. CBT (Li et al., 2009a) – learns from only one source domain.
4. RMGM (Li et al., 2009b) – a multi-task TL method that aims to fill missing values in multiple domains.
5. TALMUD.

Evaluation Procedure
• Train/test split using timestamps: training data consisted of 80% of the ratings; testing data consisted of the remaining 20%.
• Evaluation metric:
$$MAE = \frac{\sum_{i=1}^{T} |P_i - R_i|}{T}$$

Comparison Results
[Bar chart: MAE of TALMUD, CBT, RMGM, SVD, and CB on the Games, Music, and BookCrossing target domains. TALMUD achieves the lowest MAE on every target (Games 48.67, Music 74.84, BookCrossing 49.56); the competing methods' MAE values range from 53.38 up to 219.21.]

Curse of Sources
[Line chart (target: Games): test and train MAE of complete forward selection as a function of the number of sources (0–4).]
• Too many sources lead to over-fitting.
• Not all given source domains should be used.

Convergence in Practice
[Line chart: training MAE over iterations 1–10.]
• TALMUD converges to a local minimum in a few iterations; in our empirical tests it usually takes fewer than ten.
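For concreteness, here is a sketch of the wrapper-style forward selection described at the start of this section (illustrative: the toy scoring function and all names are assumptions).

```python
def forward_select(sources, train_fn, val_error_fn):
    """Greedily add the source whose addition most reduces validation
    error; stop when no remaining source improves it (wrapper approach)."""
    selected, best_err = [], float("inf")
    remaining = list(sources)
    while remaining:
        errs = {s: val_error_fn(train_fn(selected + [s])) for s in remaining}
        s_best = min(errs, key=errs.get)
        if errs[s_best] >= best_err:      # stopping rule: no improvement
            break
        best_err = errs[s_best]
        selected.append(s_best)
        remaining.remove(s_best)
    return selected

# Toy usage: pretend each source has a fixed usefulness, with a growing
# complexity penalty standing in for the over-fitting seen in the chart.
useful = {"netflix": 10.0, "jester": 4.0, "movielens": 7.0}
train = lambda subset: subset
val_err = lambda subset: 100.0 - sum(useful[s] for s in subset) + 2.0 * len(subset) ** 2
print(forward_select(useful, train, val_err))   # ['netflix', 'movielens']
```

The stopping rule is what keeps the "curse of sources" in check: once validation error stops improving, no further source is added, and the model is then retrained on the full data with the selected sources.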
Agenda
• Introduction
• Transfer Learning for Cross Domain Recommendation
• The TALMUD Algorithm
• Experimental Study – Take 1
• Scalable TALMUD – Getting TALMUD Out of the Lab?
• Experimental Study – Take 2
• Conclusions and Future Research

Complexity Analysis
• Naive TALMUD: $D^3 (U I C^{D+1} + R D) \approx D^3 U I C^{D+1}$
where: U – number of users; I – number of items; R – number of ratings in TT (note: R << U·I, the rating matrix is very sparse); D – number of source domains; C – number of user/item clusters in each source domain (typical values: 20–50); n – number of machines.

Scaling Up TALMUD for Massive Datasets
• Dedicated data structures and matrix operations for the permutation (membership) matrices.
• Heuristic methods.
• MapReduce framework:
– Parallelizing co-clustering.
– Parallelizing the search.

Compact Representation of Matrices U, V and the Loss-Function Calculation
$$\min_{U,V,\alpha} \sum_{(uid,\,iid) \in TT} \left( TT_{uid,iid} - \sum_{n=1}^{N} \alpha_n B_n[U_{uid,n}][V_{iid,n}] \right)^2$$
where: TT holds the observed ratings of $X_{tgt}$; the inner sum adds up each source domain's prediction multiplied by its relatedness; $\alpha_n$ is the relatedness coefficient between source domain n and the target domain; $B_n$ is the codebook of source domain n; $U_{uid,n}$ maps user uid to its cluster in source n; and $V_{iid,n}$ maps item iid to its cluster in source n.
This is equivalent to the original objective
$$\min_{\substack{U_n \in \{0,1\}^{p\times k_n},\; V_n \in \{0,1\}^{q\times l_n},\; \alpha_n \in \mathbb{R}\; \forall n \in N}} \left\| \left( X_{tgt} - \sum_{n=1}^{N} \alpha_n U_n B_n V_n^{T} \right) \circ W \right\|_F^2$$
where W is a binary matrix: $W_{ij} = 1$ if entry (i, j) is rated, and $W_{ij} = 0$ otherwise.

Complexity Analysis (same notation as above)
• Naive TALMUD: $D^3 U I C^{D+1}$
• With the compact data structure: $D^3 R (C^D + D) \approx D^3 R C^D$

Heuristics
• Ranking-based source selection instead of forward selection.
• Cluster assignment using a binary search tree instead of exhaustive search.

Ranking-Based Source Selection
1) Pre-compute a source-target correlation estimate by comparing the target's codebook with each source's codebook:
a. Permute the rows and columns of the source codebook to make it as similar as possible to the target codebook.
b. Compare the permuted codebooks (using Euclidean distance).
2) In each iteration, examine the next top k (k = 2) unselected sources.

Complexity Analysis (same notation as above)
• With ranking-based source selection: $D^2 R C^D$

Cluster Assignment Using a Search Tree
• Bottom-up, cluster the codebook's rows to create a search tree.
• Instead of going over all possible codebook assignments ($C^D$), examine one codebook at a time while fixing all other codebook assignments, using the search tree ($D \log C$).

Distance Measures
      L1  L2  L3
K1     3   1   2
K2     3   2   1
K3    20   8  18
K4     8   1   3

The distance between clusters K1 and K2 is $d(K1,K2) = \sqrt{(3-3)^2 + (1-2)^2 + (2-1)^2} = \sqrt{2} \approx 1.4$.
The distance between clusters K1 and K3 is $d(K1,K3) = \sqrt{(3-20)^2 + (1-8)^2 + (2-18)^2} = \sqrt{594} \approx 24.4$.
The distance between clusters K1 and K4 is $d(K1,K4) = \sqrt{(3-8)^2 + (1-1)^2 + (2-3)^2} = \sqrt{26} \approx 5.1$.
The two closest clusters to K1 are therefore K2 and K4.

Complexity Analysis (same notation as above)
• With the search tree: $D^2 R \max(\log C, D)$
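A toy sketch of the search-tree idea (an illustration under assumptions: the slide builds the tree bottom-up, whereas this sketch uses a simple top-down recursive split for brevity, and the seed choice is arbitrary).

```python
import numpy as np

def build_tree(rows, idx):
    """Recursively split codebook row indices into two groups around two
    distant seed rows, yielding a binary search tree over the rows.
    Assumes the rows are distinct."""
    if len(idx) == 1:
        return {"leaf": idx[0]}
    far = max(idx[1:], key=lambda j: np.linalg.norm(rows[j] - rows[idx[0]]))
    a, b = rows[idx[0]], rows[far]
    left = [j for j in idx if np.linalg.norm(rows[j] - a) <= np.linalg.norm(rows[j] - b)]
    right = [j for j in idx if j not in left]
    return {"pivots": (a, b), "left": build_tree(rows, left), "right": build_tree(rows, right)}

def nearest(tree, x):
    """Descend the tree: O(log C) comparisons instead of scanning all C rows."""
    while "leaf" not in tree:
        a, b = tree["pivots"]
        tree = tree["left"] if np.linalg.norm(x - a) <= np.linalg.norm(x - b) else tree["right"]
    return tree["leaf"]

# Codebook rows K1..K4 from the distance-measures slide.
B = np.array([[3, 1, 2], [3, 2, 1], [20, 8, 18], [8, 1, 3]], dtype=float)
tree = build_tree(B, list(range(len(B))))
print(nearest(tree, np.array([4., 1., 2.])))   # 0, i.e. K1 is the closest row
```

The descent returns an approximate nearest row (a pivot split can occasionally hide the true nearest neighbor), which is the usual trade-off such heuristics accept in exchange for avoiding the exhaustive $C^D$ scan.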
MapReduce
[Dean et al., OSDI 2004; CACM Jan. 2010. Figure from a VLDB 2010 tutorial.]

MapReduce for Step 1: Co-Clustering
Procedure: Co-Clustering
1. Randomly initialize U and V.
2. Repeat:
   a. Compute the group means (codebook B).
   b. For each user i = 1..m: assign user i to the user cluster j in 1..K that minimizes the error.
   c. Update the group means (codebook B).
   d. For each item i = 1..n: assign item i to the item cluster j in 1..L that minimizes the error.
3. Until no error improvement.
4. Return U, V, and codebook B.

Procedure: Co-Clustering User Mapper (User i, UserRating Xi)
1. Globals: codebook B, matrix U, matrix V.
2. Assign user i to the user cluster j* (out of 1..K) that minimizes the error, and update Ui.
3. Cj* ← the contribution of user i to the aligned codebook row.
4. Emit (j*, Cj*).
C is an array with two rows: the first row aggregates the rating values and the second row counts the number of ratings.

Procedure: Co-Clustering User Reducer (Cluster j, Contributions C1, C2, …, Cl)
1. Bj ← 0; Tj ← 0.
2. For i = 1..l: Bj ← Bj + Ci,1; Tj ← Tj + Ci,2.
3. Emit (j, Bj / Tj).
To save network time, a combiner can pre-aggregate values at the mapper nodes for all users that belong to the same user cluster.

MapReduce for Steps 2.1 and 2.2
• Each user/item is addressed independently (similar to step 1) and can therefore be mapped to a separate node.
Procedure: Step 2.1 (User i, UserRating Xi)
1. Globals: codebook B, matrix U, matrix V.
2. Assign user i to the user cluster j* (out of 1..K) that minimizes the error, and update Ui.
3. Emit (1, 1).
• A single reduce node is used only for synchronizing the mappers.

Step 2.3
• Calculating the Hessian is done in MapReduce:
– Map: emit the matrix contribution of each valid (user, item) pair.
– Reduce: aggregate the matrices.
• Solving the set of linear equations could also be done with MapReduce – but don't bother (N is relatively small).

Step 3
• Calculate the validation error:
– Map: calculate the error of a given <user, item> pair in the validation set.
– Reduce: aggregate the errors.

Complexity Analysis (same notation as above)
• Naive TALMUD: $D^3 U I C^{D+1}$
• Data structure: $D^3 R C^D$
• Ranking-based source selection: $D^2 R C^D$
• Search tree: $D^2 R \max(\log C, D)$
• MapReduce parallelization: $\frac{1}{n} D^2 R \max(\log C, D)$

Talmud Code
• A MapReduce implementation for Apache Hadoop 2.0 using Pig can be obtained from: https://github.com/marnun/Talmud
• A non-distributed Matlab implementation can be obtained from: http://ise.bgu.ac.il/faculty/liorr/Talmud

Agenda
• Introduction
• Transfer Learning for Cross Domain Recommendation
• The TALMUD Algorithm
• Experimental Study – Take 1
• Scalable TALMUD – Getting TALMUD Out of the Lab?
• Experimental Study – Take 2
• Conclusions and Future Research

Results of the Scaling Performance

Target dataset      | Target ratings | Total source ratings | MAE TALMUD | MAE Scalable TALMUD | Time TALMUD (sec) | Time Scalable TALMUD (sec)
Small Games         | 20K            | 324K                 | 48.67      | 49.30               | 43K               | 0.9K
Small Music         | 16K            | 328K                 | 74.84      | 74.91               | 28K               | 0.4K
Small BookCrossing  | 147K           | 177K                 | 49.56      | 49.56               | 208K              | 2.5K
Full Games          | 2.5M           | 125M                 | N/A        | 42.66               | N/A               | 92K
Full Music          | 10.2M          | 118M                 | N/A        | 53.17               | N/A               | 269K
Full BookCrossing   | 1.1M           | 126M                 | N/A        | 46.17               | N/A               | 34.1K
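To put the table in perspective, a quick back-of-the-envelope script (the numbers are copied from the table above; this is just arithmetic, not part of the talk):

```python
# Runtime speedup of Scalable TALMUD over TALMUD on the small datasets
# (seconds, from the table above).
times = {"Small Games": (43_000, 900),
         "Small Music": (28_000, 400),
         "Small BookCrossing": (208_000, 2_500)}
for name, (talmud, scalable) in times.items():
    print(f"{name}: {talmud / scalable:.0f}x faster")
# Small Games: 48x, Small Music: 70x, Small BookCrossing: 83x -
# at near-identical MAE, while the full datasets become tractable at all.
```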
Agenda
• Introduction
• Transfer Learning for Cross Domain Recommendation
• The TALMUD Algorithm
• Experimental Study – Take 1
• Scalable TALMUD – Getting TALMUD Out of the Lab?
• Experimental Study – Take 2
• Conclusions and Future Research

Conclusions
• A new heterogeneous transfer-learning method for recommender systems, which involves transferring knowledge from multiple sources.
• Automatically selects which source domains should be used, and with what intensity, thereby avoiding negative transfer (which occurs when the new model performs worse than if there had been no transfer at all).
• Overfitting can be avoided by gradually adding sources and evaluating them using a wrapper approach.
• Outperforms existing state-of-the-art TL methods (CBT and RMGM).
• Scalable.

Current Activities
• Cross-sources transfer
• In-domain self-transfer
• Cross-events transfer
• Experiments with other bipartite-graph tasks
• Generalization to DAGs
• Goodbye MapReduce: implementations on top of a DSL for linear algebraic operations

Thank You!