Recommendation Systems
Prof. Dr. Daning Hu
Department of Informatics
University of Zurich
Nov 13th, 2012
Outline
Introduction
Approaches to Recommendation Systems
Collaborative Filtering
Content-based
Social Contagion
Ref Book: Social Network Analysis: Methods and Applications
(Structural Analysis in the Social Sciences)
http://www.amazon.com/Social-Network-Analysis-ApplicationsStructural/dp/0521387078
Introduction
Recommendation systems are a subclass of information filtering systems that seek to predict the 'rating' or 'preference' that a user would give to an item or social element they had not yet considered (Wiki), using a model built from the characteristics of an item (content-based approaches) or the user's social environment (collaborative filtering approaches).
Studying consumer purchase behavior in e-commerce settings, in particular the evolution of interactions among consumers and products reflected in online sales transactions.
Underlying Technologies: Machine Learning
Recommendation systems are instances of personalization software.
Personalization means adapting to the individual needs, interests, and preferences of each user.
It is often treated as part of Customer Relationship Management (CRM).
Machine Learning (ML) aims to learn a user model or profile of a
particular user based on:
Sample interaction
Rated examples
Used to filter information and predict consumer behaviors
Collaborative Filtering
A database of many users’ ratings of a variety of items.
For a given user, find other similar users whose ratings strongly
correlate with the current user.
Recommend items that these similar users rated highly but that the current user has not yet rated.
Amazon, etc.
Collaborative Filtering
[Figure: the active user's ratings for items A to Z are matched by correlation against each user's ratings in the user database; from the best-matching user, items the active user has not yet rated (here, item C) are extracted as recommendations.]
Collaborative Filtering Method
Weight all users with respect to similarity with the active user.
Select a subset of the users (neighbors) to use as predictors.
Normalize ratings and compute a prediction from a weighted
combination of the selected neighbors’ ratings.
Present items with highest predicted ratings as recommendations.
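The four steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the lecture's implementation: ratings are held in a users x items NumPy array with `np.nan` for unrated items, similarity is the Pearson correlation on co-rated items, and the function name `predict_ratings` and the neighborhood size `k` are hypothetical.

```python
import numpy as np

def predict_ratings(R, active, k=2):
    """Predict the active user's missing ratings from the k most similar users.

    R is a users x items array with np.nan marking unrated items.
    Returns (item_index, predicted_rating) pairs, best first.
    """
    n_users, n_items = R.shape
    means = np.nanmean(R, axis=1)  # each user's mean over the items they rated

    # Step 1: weight every other user by Pearson correlation on co-rated items.
    weights = np.zeros(n_users)
    for u in range(n_users):
        if u == active:
            continue
        both = ~np.isnan(R[active]) & ~np.isnan(R[u])
        if both.sum() < 2:
            continue  # too few co-rated items to correlate
        a, b = R[active, both], R[u, both]
        if a.std() == 0 or b.std() == 0:
            continue  # correlation is undefined for constant ratings
        weights[u] = np.corrcoef(a, b)[0, 1]

    # Step 2: keep the k users with the largest absolute weight as neighbors.
    ranked = np.argsort(-np.abs(weights))[:k]
    neighbors = [u for u in ranked if weights[u] != 0]

    # Step 3: normalize by each neighbor's mean and combine with the weights.
    predictions = {}
    for item in range(n_items):
        if not np.isnan(R[active, item]):
            continue  # already rated by the active user
        raters = [u for u in neighbors if not np.isnan(R[u, item])]
        norm = sum(abs(weights[u]) for u in raters)
        if norm == 0:
            continue
        offset = sum(weights[u] * (R[u, item] - means[u]) for u in raters) / norm
        predictions[item] = means[active] + offset

    # Step 4: items with the highest predicted ratings come first.
    return sorted(predictions.items(), key=lambda kv: -kv[1])
```

Subtracting each neighbor's mean before combining is one common way to normalize; it keeps a generous rater and a strict rater from pulling the prediction in different directions just because of their rating scales.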
Similarity Weighting
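Typically use the Pearson correlation coefficient between the ratings of the active user, a, and another user, u:
$$c_{a,u} = \frac{\mathrm{covar}(r_a, r_u)}{\sigma_{r_a} \sigma_{r_u}}$$
r_a and r_u are the ratings vectors for the m items rated by both a and u.
Covariance:
$$\mathrm{covar}(r_a, r_u) = \frac{\sum_{i=1}^{m} (r_{a,i} - \bar{r}_a)(r_{u,i} - \bar{r}_u)}{m}$$
Standard deviation:
$$\sigma_{r_x} = \sqrt{\frac{\sum_{i=1}^{m} (r_{x,i} - \bar{r}_x)^2}{m}}$$
Here $\bar{r}_x$ is user x's mean rating over the m co-rated items. In general, $r_{i,j}$ is user i's rating for item j.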
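As a worked illustration of the similarity weight, the sketch below computes the Pearson correlation term by term (mean, covariance, standard deviation, each divided by m) over the items both users rated. The function name `pearson`, the dict-based rating format, and the toy ratings in the usage note are assumptions for illustration only.

```python
from math import sqrt

def pearson(ratings_a, ratings_u):
    """Pearson correlation c_{a,u} over the items rated by both users.

    Each argument is a dict mapping item -> rating.
    Returns None when the correlation is undefined.
    """
    common = sorted(set(ratings_a) & set(ratings_u))
    m = len(common)
    if m == 0:
        return None  # no co-rated items
    mean_a = sum(ratings_a[i] for i in common) / m
    mean_u = sum(ratings_u[i] for i in common) / m
    covar = sum((ratings_a[i] - mean_a) * (ratings_u[i] - mean_u)
                for i in common) / m
    std_a = sqrt(sum((ratings_a[i] - mean_a) ** 2 for i in common) / m)
    std_u = sqrt(sum((ratings_u[i] - mean_u) ** 2 for i in common) / m)
    if std_a == 0 or std_u == 0:
        return None  # a user with constant ratings has no defined correlation
    return covar / (std_a * std_u)
```

For example, with toy ratings `{"A": 9, "B": 3, "Z": 5}` and `{"A": 10, "B": 4, "C": 8, "Z": 6}`, the two users differ by a constant on the three co-rated items, so the correlation is 1 up to floating-point rounding; item C is ignored because only one of them rated it.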
Cons
Cold start: there must be enough other users already in the system to find a match.
Sparsity: the user/ratings matrix is sparse, so it is hard to find users who have rated the same items.
First rater: cannot recommend an item that has not been previously rated.
Popularity bias: cannot recommend items to someone with unique tastes; tends to recommend popular items.
Content-based Approaches
Recommendations are based on information on the content of
items rather than on other users’ opinions.
Uses machine learning algorithms to induce a profile of the user's
preferences from examples based on content features.
No need for data on other users.
No cold-start or sparsity problems.
Able to recommend to users with unique tastes.
No first-rater problem.
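To make the idea of a content-induced profile concrete, here is a minimal sketch. It swaps in a very simple learner (a rating-weighted sum of item feature vectors, scored by cosine similarity) in place of whatever machine learning algorithm a real system would use; the function name `content_based_scores` and the feature encoding are hypothetical. Note that it needs only one user's ratings and the item features, with no data on other users.

```python
import numpy as np

def content_based_scores(item_features, user_ratings):
    """Score unrated items by cosine similarity to a profile built from rated ones.

    item_features: dict item -> feature vector (list of numbers)
    user_ratings:  dict item -> rating given by this one user
    """
    mean_rating = np.mean(list(user_ratings.values()))
    # Profile: features of rated items, weighted by how far each rating
    # lies above or below the user's own mean rating.
    profile = sum((r - mean_rating) * np.asarray(item_features[i], dtype=float)
                  for i, r in user_ratings.items())
    scores = {}
    for item, feats in item_features.items():
        if item in user_ratings:
            continue  # only score items the user has not rated yet
        v = np.asarray(feats, dtype=float)
        denom = np.linalg.norm(profile) * np.linalg.norm(v)
        scores[item] = float(profile @ v / denom) if denom > 0 else 0.0
    return scores
```

With two hypothetical features per item, a user who rated an item with features `[1, 0]` highly and an item with features `[0, 1]` poorly gets a higher score for an unseen `[1, 0]` item than for an unseen `[0, 1]` item, even if nobody else has rated either.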
Combining Content and Collaboration
Content-based and collaborative methods have complementary
strengths and weaknesses. Combine the methods to obtain the best
of both.
Apply both methods and combine recommendations.
Use collaborative data as content.
Use content-based predictor as another collaborator.
Use content-based predictor to complete collaborative data.
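Two of these options are simple enough to sketch directly. The helpers below are illustrative assumptions, not a prescribed design: `blend` combines the two methods' scores with a hypothetical mixing weight `alpha`, and `complete_ratings` fills the gaps in a sparse ratings matrix with content-based predictions so that collaborative filtering can then run on dense data.

```python
import numpy as np

def blend(cf_score, cb_score, alpha=0.5):
    """Combine a collaborative score and a content-based score.

    alpha is the weight on the collaborative score (0 <= alpha <= 1).
    """
    return alpha * cf_score + (1 - alpha) * cb_score

def complete_ratings(R, content_pred):
    """Fill unrated cells (np.nan) of R with content-based predictions.

    Observed ratings are kept unchanged; both inputs are users x items.
    """
    R = np.asarray(R, dtype=float)
    content_pred = np.asarray(content_pred, dtype=float)
    return np.where(np.isnan(R), content_pred, R)
```

Completing the matrix first addresses sparsity, since every pair of users then shares all items; blending is the simpler choice when both predictors already produce scores on the same scale.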
Using Social Contagion for Recommendations
Intelligent Advertising, Product Recommendation
Who are the most influential people?
What are the patterns of information diffusion?
Social Contagion Theory – Le Bon et al. 1895
Le Bon, Park, and Blumer, the three major theorists, assumed that
something happens in a crowd situation that can cause people to
become irrational.
The social pathology and social contagion perspectives: the idea
that someone who already has the affliction (behavior) can pass it
on to someone else, and it can rapidly infect others.
Gabriel Tarde's work on the 'laws of imitation'
Applications: Viral marketing, social media marketing
Social Recommendations for Marketing
Mass marketing is not the best way to attract people
Expensive
Usually not very focused
Recommendations by people we know are more effective than
input from unknown individuals
Content: Our friends know what we like
Homophily: Our friends and we are more likely to share interests
and preferences
Biased: We listen more to what our friends say (usually)
Inexpensive
Data
The dataset for this study was collected from a large online OSS
community – Ohloh, which provides information about 11,800 OSS projects
involving 94,330 people
Positive evaluation relationship
Developers’ sociological features
Nationality, geographical location, etc.
OSS project related information
Primary programming language, development activity, ratings, etc.
From software revision control repositories – Subversion, CVS and Git.
The Ohloh web site provides a REST-based application programming interface
(API) for users to access and query its data.
Name             Created_at            Location      Country  Kudo_rank  Programming Language  Total Commits
Jason Allen      2006-09-15T02:23:01Z  Sammamish WA  US       9          Java                  789
Robin Luckey     2006-09-15T02:23:01Z  Seattle WA    US       9          Java                  11358
Scott Collison   2006-09-15T02:23:01Z  Seattle WA    US       8          C                     254
The Ohloh Slave  2006-09-15T02:23:02Z  Redmond WA    US       9          Php                   13
Figure 1. Sample data from Ohloh developers
Statistical Analysis on Link Formation
Dependent variable: whether a developer D participates in an OSS
project P at time T, coded as a binary variable (“Kudo” link).
Independent variables include three types of possible
determinants
Homophily factors
Shared affiliation factors
Preferential attachment factors
Conditional Logit Analysis
Conditional logistic models (CLM) have been widely used to examine the
determinants that affect individuals’ choices (McFadden 1980; McFadden et al. 1974;
Powell et al. 2005).
Model human choice behavior – project participation choices.
It is specified as follows:
$$\Pr(y = i) = \frac{\exp(\beta X_i)}{\sum_{j \in J} \exp(\beta X_j)}$$
where y is the observed choice of the new developer to participate in project i, and
$X_i$ is a vector of the factors that influence that choice. J is the set of alternative
projects available. The unknown coefficients $\beta$ are typically estimated by maximum
likelihood methods.
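For reference, the maximum likelihood step can be written out as follows. This is the standard log-likelihood for a conditional logit model, with $\beta$ denoting the coefficient vector and $X_i$ the factor vector of alternative i; the indexing by observation n (chosen project $i_n$, alternative set $J_n$, and N observed choices in total) is notation introduced here for illustration and does not come from the slides.

```latex
\ell(\beta) = \sum_{n=1}^{N} \left[ \beta X_{n,i_n}
  - \log \sum_{j \in J_n} \exp\!\left(\beta X_{n,j}\right) \right],
\qquad
\hat{\beta} = \arg\max_{\beta} \, \ell(\beta)
```

Each term is the log of the choice probability for the alternative that was actually chosen, so maximizing the sum favors coefficients under which the observed choices are as probable as possible.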
Predicting Future Evaluation Choices
Our analysis also provides a prediction mechanism using the conditional
logistic model and the discovered determinants.
$$\Pr(y = i) = \frac{\exp(\beta X_i)}{\sum_{j \in J} \exp(\beta X_j)}$$
For instance, if developer a and developer b
Live in New York City (coefficient of homophily in location: 5.190)
Use Java as their primary programming language (coefficient: 1.623)
etc.
then the probability that a chooses to positively evaluate b from an
alternative set J can be calculated.
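The calculation can be sketched as below. Only the two coefficients (5.190 and 1.623) come from the slide; the alternative set with developers b, c, and d, their 0/1 indicator values, and the names `COEF` and `choice_probabilities` are hypothetical, and a real prediction would include all estimated determinants.

```python
from math import exp

# Coefficients reported on the slide; all other determinants are omitted here.
COEF = {"same_location": 5.190, "same_language": 1.623}

def choice_probabilities(alternatives):
    """Conditional logit: Pr(y = i) = exp(beta X_i) / sum_j exp(beta X_j).

    alternatives: dict name -> dict of 0/1 indicators keyed like COEF.
    """
    utility = {name: sum(COEF[k] * x.get(k, 0) for k in COEF)
               for name, x in alternatives.items()}
    top = max(utility.values())  # subtract the max for numerical stability
    expu = {name: exp(v - top) for name, v in utility.items()}
    total = sum(expu.values())
    return {name: e / total for name, e in expu.items()}

# Hypothetical alternative set for developer a (not from the study's data).
alternatives = {
    "b": {"same_location": 1, "same_language": 1},  # same city, same language
    "c": {"same_location": 0, "same_language": 1},  # same language only
    "d": {"same_location": 0, "same_language": 0},  # neither
}
probs = choice_probabilities(alternatives)
```

In this toy set, b receives a probability of roughly 0.99, because the location coefficient dominates the exponentiated score; the probabilities over the whole alternative set sum to 1.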