Recommendation Systems
Prof. Dr. Daning Hu
Department of Informatics, University of Zurich
Nov 13th, 2012

Outline
  Introduction
  Recommendation Systems Approaches
    Collaborative Filtering
    Content-based
  Social Contagion
  Ref Book: Social Network Analysis: Methods and Applications (Structural Analysis in the Social Sciences)
  http://www.amazon.com/Social-Network-Analysis-ApplicationsStructural/dp/0521387078

Introduction
  Recommendation systems are a subclass of information filtering systems that seek to predict the 'rating' or 'preference' that a user would give to an item or social element they had not yet considered (Wiki), using a model built from the characteristics of an item (content-based approaches) or the user's social environment (collaborative filtering approaches).
  Studying consumer purchase behavior in an e-commerce setting: in particular, the evolution of interactions among consumers and products reflected in online sales transactions.

Underlying Technologies: Machine Learning
  Recommendation systems are instances of personalization software:
    adapting to the individual needs, interests, and preferences of each user,
    as part of Customer Relationship Management (CRM).
  Machine Learning (ML) aims to learn a user model or profile of a particular user based on:
    Sample interactions
    Rated examples
  The learned profile is used to filter information and predict consumer behavior.

Collaborative Filtering
  Maintain a database of many users' ratings of a variety of items.
  For a given user, find other similar users whose ratings strongly correlate with the current user's.
  Recommend items that these similar users rated highly but that the current user has not yet rated.
  Examples: Amazon, etc.

Collaborative Filtering
  [Figure: a user database of ratings for items A to Z; a correlation match against the active user's ratings; recommendations extracted from the matched user (here, item C).]

Collaborative Filtering Method
  Weight all users with respect to similarity with the active user.
  Select a subset of the users (neighbors) to use as predictors.
  Normalize ratings and compute a prediction from a weighted combination of the selected neighbors' ratings.
  Present items with the highest predicted ratings as recommendations.

Similarity Weighting
  Typically use the Pearson correlation coefficient between the ratings of the active user, a, and another user, u:
    $$c_{a,u} = \frac{\mathrm{covar}(r_a, r_u)}{\sigma_{r_a}\,\sigma_{r_u}}$$
  r_a and r_u are the ratings vectors for the m items rated by both a and u.
  r_{i,j} is user i's rating for item j, and \bar{r}_x is user x's mean rating over those m items.
  Covariance:
    $$\mathrm{covar}(r_a, r_u) = \frac{\sum_{i=1}^{m} (r_{a,i} - \bar{r}_a)(r_{u,i} - \bar{r}_u)}{m}$$
  Standard deviation:
    $$\sigma_{r_x} = \sqrt{\frac{\sum_{i=1}^{m} (r_{x,i} - \bar{r}_x)^2}{m}}$$

Cons of Collaborative Filtering
  Cold Start: There must be enough other users already in the system to find a match.
  Sparsity: The user/ratings matrix is sparse, so it is hard to find users who have rated the same items.
  First Rater: Cannot recommend an item that has not been previously rated.
  Popularity Bias: Cannot recommend items to someone with unique tastes; tends to recommend popular items.

Content-based Approaches
  Recommendations are based on information about the content of items rather than on other users' opinions.
  Uses machine learning algorithms to induce a profile of the user's preferences from examples, based on content features.
  No need for data on other users.
  No cold-start or sparsity problems.
  Able to recommend to users with unique tastes.
  No first-rater problem.

Combining Content and Collaboration
  Content-based and collaborative methods have complementary strengths and weaknesses.
  Combine the methods to obtain the best of both:
    Apply both methods and combine recommendations.
    Use collaborative data as content.
    Use a content-based predictor as another collaborator.
    Use a content-based predictor to complete collaborative data.

Using Social Contagion for Recommendations
  Intelligent Advertising, Product Recommendation
  Who are the most influential people?
  What are the patterns of information diffusion?

Social Contagion Theory: Le Bon et al. 1895
  Le Bon, Park, and Blumer, the three major theorists, assumed that something happens in a crowd situation that can cause people to become irrational.
  The social pathology and social contagion perspectives: the idea that someone who already has the affliction (behavior) can pass it on to someone else, and that it can rapidly infect others.
  Gabriel Tarde's work on the 'laws of imitation'.
  Applications: viral marketing, social media marketing.

Social Recommendations for Marketing
  Mass marketing is not the best way to attract people:
    Expensive
    Usually not very focused
  Recommendations by people we know are more effective than input from unknown individuals:
    Content: Our friends know what we like.
    Homophily: Our friends and we are more likely to share interests and preferences.
    Biased: We listen more to what our friends say (usually).
    Inexpensive

Data
  The dataset for this study was collected from a large online OSS community, Ohloh, which provides information about 11,800 OSS projects involving 94,330 people:
    Positive evaluation relationship
    Developers' sociological features: nationality, geographical location, etc.
    OSS project related information: primary programming language, development activity, ratings, etc.
      From software revision control repositories: Subversion, CVS and Git.
  The Ohloh web site provides a REST-based application programming interface (API) for users to access and query its data.

  Name             Created_at            Location      Country  Kudo_rank  Programming Language  Total Commits
  Jason Allen      2006-09-15T02:23:01Z  Sammamish WA  US       9          Java                  789
  Robin Luckey     2006-09-15T02:23:01Z  Seattle WA    US       9          Java                  11358
  Scott Collison   2006-09-15T02:23:01Z  Seattle WA    US       8          C                     254
  The Ohloh Slave  2006-09-15T02:23:02Z  Redmond WA    US       9          Php                   13

  Figure 1. Sample data from Ohloh developers

Statistical Analysis on Link Formation
  Dependent variable: whether a developer D participates in an OSS project P at time T, coded as a binary variable ("Kudo" link).
  Independent variables include three types of possible determinants:
    Homophily factors
    Shared affiliation factors
    Preferential attachment factors

Conditional Logit Analysis
  Conditional logistic models (CLM) have been widely used to examine the determinants that affect individuals' choices (McFadden 1980; McFadden et al. 1974; Powell et al. 2005).
  Here they model human choice behavior: project participation choices.
  The model is specified as follows:
    $$\Pr(y = i) = \frac{\exp(\beta X_i)}{\sum_{j \in J} \exp(\beta X_j)}$$
  where y is the observed choice of the new developer to participate in project i, X_i is a vector of the factors that influence that choice, and J is the alternative set of projects available.
  The unknown coefficients \beta are typically estimated by maximum likelihood methods.

Predicting Future Evaluation Choices
  Our analysis also provides a prediction mechanism using the conditional logistic model and the discovered determinants:
    $$\Pr(y = i) = \frac{\exp(\beta X_i)}{\sum_{j \in J} \exp(\beta X_j)}$$
  For instance, if developer a and developer b both:
    Live in New York City (coefficient of homophily in location: 5.190)
    Use Java as their primary programming language (coefficient: 1.623)
    etc.
  then the probability that a chooses to positively evaluate b from an alternative set J can be calculated.
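
The prediction step above can be sketched in Python. This is a minimal illustration, not the study's implementation: each alternative's probability is its exponentiated weighted feature sum divided by the same quantity summed over every alternative in J. Only the two coefficients quoted on the slide are used. The function name `choice_probabilities`, the candidate developers b, c and d, and the encoding of each determinant as a 0/1 indicator are assumptions made for the example.

```python
import math

# Coefficients quoted on the slide; the remaining determinants ("etc.") are omitted.
COEFFICIENTS = {"same_location": 5.190, "same_language": 1.623}


def choice_probabilities(alternatives):
    """Return Pr(y = i) for every alternative i under a conditional logit model.

    `alternatives` maps a candidate's name to a dict of feature values
    (hypothetical 0/1 indicators in this sketch).
    """
    utilities = {
        name: sum(COEFFICIENTS[feature] * value for feature, value in features.items())
        for name, features in alternatives.items()
    }
    # Subtract the largest utility before exponentiating for numerical
    # stability; the shift cancels in the ratio.
    top = max(utilities.values())
    weights = {name: math.exp(u - top) for name, u in utilities.items()}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


# Hypothetical alternative set J for developer a (illustrative values only).
alternatives = {
    "b": {"same_location": 1, "same_language": 1},  # same city, same language
    "c": {"same_location": 0, "same_language": 1},  # same language only
    "d": {"same_location": 0, "same_language": 0},  # neither
}
probs = choice_probabilities(alternatives)
for name, p in sorted(probs.items()):
    print(f"Pr(a evaluates {name}) = {p:.4f}")
```

In this toy set the candidate who shares both location and language with a receives the largest probability, and the probabilities over J sum to one by construction.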