Transcript Document

Recommendation Systems
Prof. Dr. Daning Hu
Department of Informatics
University of Zurich
Nov 13th, 2012
Outline

Introduction

Approaches to Recommendation Systems


Collaborative Filtering

Content-based

Social Contagion
Ref Book: Social Network Analysis: Methods and Applications
(Structural Analysis in the Social Sciences)

http://www.amazon.com/Social-Network-Analysis-ApplicationsStructural/dp/0521387078
Introduction

Recommendation systems are a subclass of information
filtering systems that seek to predict the 'rating' or 'preference'
that a user would give to an item or social element they had
not yet considered (Wiki), using a model built from:

the characteristics of an item (Content-based approaches), or

the user's social environment (Collaborative Filtering approaches)

Studying consumer purchase behavior in e-commerce settings

In particular, the evolution of interactions among consumers
and products reflected in online-sales transactions.
Underlying Technologies: Machine Learning


Recommendation systems are instances of personalization
software.

Personalization means adapting to the individual needs, interests, and preferences of each
user.

Personalization is also viewed as part of Customer Relationship Management (CRM).
Machine Learning (ML) aims to learn a user model or profile of a
particular user based on:

Sample interaction

Rated examples

The learned model or profile is then used to filter information and predict consumer behavior.
Collaborative Filtering

A database of many users’ ratings of a variety of items.

For a given user, find other similar users whose ratings strongly
correlate with the current user.

Recommend items rated highly by these similar users but not
yet rated by the current user.

Used by Amazon, etc.
Collaborative Filtering
[Figure: Collaborative Filtering. The active user's ratings of items A to Z are correlated with the ratings of each user in the user database (Correlation Match); the ratings of the best-matching users are then used to extract recommendations, here item C, which the active user has not yet rated.]
Collaborative Filtering Method

Weight all users with respect to similarity with the active user.

Select a subset of the users (neighbors) to use as predictors.

Normalize ratings and compute a prediction from a weighted
combination of the selected neighbors’ ratings.

Present items with highest predicted ratings as recommendations.
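The four steps above can be sketched in Python as follows. This is a minimal sketch, not the lecture's implementation: the toy ratings, the function names `pearson`, `mean` and `recommend`, the neighborhood size `k`, and the mean-centred weighted average used for the "normalize and combine" step (a common Resnick-style choice) are all assumptions for illustration.

```python
from math import sqrt

# Hypothetical toy data (1-10 scale): user -> {item: rating}.
ratings = {
    "active": {"A": 9, "B": 3, "D": 4, "Z": 5},
    "u1": {"A": 10, "B": 4, "C": 8, "D": 5, "E": 3, "Z": 6},
    "u2": {"A": 2, "B": 9, "C": 3, "D": 8, "Z": 1},
    "u3": {"A": 8, "B": 2, "C": 9, "D": 3, "E": 4, "Z": 5},
}


def pearson(ra, ru):
    """Pearson correlation over co-rated items (the 1/m factors cancel)."""
    common = [i for i in ra if i in ru]
    if len(common) < 2:
        return 0.0
    ma = sum(ra[i] for i in common) / len(common)
    mu = sum(ru[i] for i in common) / len(common)
    cov = sum((ra[i] - ma) * (ru[i] - mu) for i in common)
    sa = sqrt(sum((ra[i] - ma) ** 2 for i in common))
    su = sqrt(sum((ru[i] - mu) ** 2 for i in common))
    return cov / (sa * su) if sa and su else 0.0


def mean(r):
    """Average of all ratings given by one user."""
    return sum(r.values()) / len(r)


def recommend(ratings, active, k=2, top_n=3):
    ra = ratings[active]
    # 1. Weight all other users by similarity with the active user.
    weights = {u: pearson(ra, r) for u, r in ratings.items() if u != active}
    # 2. Select the k most similar, positively correlated users as neighbors.
    positive = [u for u in weights if weights[u] > 0]
    neighbors = sorted(positive, key=weights.get, reverse=True)[:k]
    # 3. Normalize (mean-centre) neighbors' ratings and take a weighted average.
    candidates = {i for u in neighbors for i in ratings[u]} - set(ra)
    preds = {}
    for item in candidates:
        raters = [u for u in neighbors if item in ratings[u]]
        num = sum(weights[u] * (ratings[u][item] - mean(ratings[u])) for u in raters)
        den = sum(abs(weights[u]) for u in raters)
        if den:
            preds[item] = mean(ra) + num / den
    # 4. Present the items with the highest predicted ratings.
    return sorted(preds.items(), key=lambda kv: kv[1], reverse=True)[:top_n]


print(recommend(ratings, "active"))
```

Here `u2` rates items almost opposite to the active user, gets a negative weight, and is left out of the neighborhood; only items the neighbors rated but the active user did not (C and E) are scored.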
Similarity Weighting

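Typically use the Pearson correlation coefficient between the ratings of the
active user, a, and another user, u:

$$c_{a,u} = \frac{\operatorname{covar}(r_a, r_u)}{\sigma_{r_a}\,\sigma_{r_u}}$$

r_a and r_u are the ratings vectors for the m items rated by both a
and u; \bar{r}_x is the mean of user x's ratings over those m items.

Covariance:

$$\operatorname{covar}(r_a, r_u) = \frac{\sum_{i=1}^{m} (r_{a,i} - \bar{r}_a)(r_{u,i} - \bar{r}_u)}{m}$$

Standard Deviation:

$$\sigma_{r_x} = \sqrt{\frac{\sum_{i=1}^{m} (r_{x,i} - \bar{r}_x)^2}{m}}$$

r_{i,j} is user i's rating for item j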
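A direct transcription of these formulas into Python, as a sketch: the dictionary representation of ratings, the function names, the toy ratings, and the rule of returning 0 when there are fewer than two co-rated items (or zero variance) are assumptions, not part of the lecture.

```python
from math import sqrt


def co_rated(ra, ru):
    """The m items rated by both users a and u."""
    return [i for i in ra if i in ru]


def covar(ra, ru, items):
    """Covariance of the two users' ratings over the co-rated items."""
    m = len(items)
    mean_a = sum(ra[i] for i in items) / m
    mean_u = sum(ru[i] for i in items) / m
    return sum((ra[i] - mean_a) * (ru[i] - mean_u) for i in items) / m


def std_dev(rx, items):
    """Standard deviation of one user's ratings over the co-rated items."""
    m = len(items)
    mean_x = sum(rx[i] for i in items) / m
    return sqrt(sum((rx[i] - mean_x) ** 2 for i in items) / m)


def similarity(ra, ru):
    """c_{a,u} = covar(r_a, r_u) / (sigma_{r_a} * sigma_{r_u})."""
    items = co_rated(ra, ru)
    if len(items) < 2:
        return 0.0  # too few co-rated items to correlate (a sparsity symptom)
    denom = std_dev(ra, items) * std_dev(ru, items)
    return covar(ra, ru, items) / denom if denom else 0.0


active = {"A": 9, "B": 3, "Z": 5}
user = {"A": 10, "B": 4, "C": 8, "Z": 1}
print(similarity(active, user))  # 11/14, about 0.786
```

Item C is ignored because only the active user's co-rated items (A, B, Z) enter the sums, so m = 3 here.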
Cons

Cold Start: There need to be enough users in the system to find a match.

Sparsity: The user/ratings matrix is sparse, and it is hard to find
users that have rated the same items.

First Rater: Cannot recommend an item that has not been previously rated.

Popularity Bias: Cannot recommend items to someone with
unique tastes.

Tends to recommend popular items.
Content-based Approaches

Recommendations are based on information on the content of
items rather than on other users’ opinions.

Uses machine learning algorithms to induce a profile of the user's
preferences from examples based on content features.

No need for data on other users.

No cold-start or sparsity problems.

Able to recommend to users with unique tastes.

No first-rater problem.
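One simple way to induce such a profile is sketched below under stated assumptions: items are described by hypothetical binary keyword features, the profile is the average feature vector of the items the user liked, and unseen items are ranked by cosine similarity to that profile. The lecture does not name a specific learning algorithm; a classifier such as naive Bayes would be another common choice.

```python
from math import sqrt

# Hypothetical item features (e.g., genre keywords); 1 = item has the feature.
ITEMS = {
    "m1": {"action": 1, "scifi": 1},
    "m2": {"action": 1, "thriller": 1},
    "m3": {"romance": 1, "drama": 1},
    "m4": {"scifi": 1, "drama": 1},
    "m5": {"action": 1, "scifi": 1, "thriller": 1},
}


def build_profile(liked):
    """User profile = average feature vector of the items the user liked."""
    profile = {}
    for item in liked:
        for feature, value in ITEMS[item].items():
            profile[feature] = profile.get(feature, 0.0) + value / len(liked)
    return profile


def cosine(p, q):
    """Cosine similarity between two sparse feature vectors."""
    dot = sum(v * q.get(f, 0.0) for f, v in p.items())
    norm = sqrt(sum(v * v for v in p.values())) * sqrt(sum(v * v for v in q.values()))
    return dot / norm if norm else 0.0


def recommend_by_content(liked, top_n=2):
    """Rank items the user has not seen by similarity to the user's profile."""
    profile = build_profile(liked)
    unseen = [i for i in ITEMS if i not in liked]
    return sorted(unseen, key=lambda i: cosine(profile, ITEMS[i]), reverse=True)[:top_n]


print(recommend_by_content(["m1", "m2"]))
```

Only the active user's own liked items and the item features are used; no other user's ratings are needed, which is why a new item with no ratings (no first rater) can still be recommended.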
Combining Content and Collaboration

Content-based and collaborative methods have complementary
strengths and weaknesses. Combining them aims to obtain the best
of both.

Apply both methods and combine recommendations.

Use collaborative data as content.

Use content-based predictor as another collaborator.

Use content-based predictor to complete collaborative data.
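The first strategy, applying both methods and combining their recommendations, can be sketched as a weighted hybrid. The weight `alpha`, the assumption that both predictors output scores on the same rating scale, the fallback rule, and the function name `combine` are illustrative choices, not part of the lecture.

```python
def combine(cf_scores, cb_scores, alpha=0.5):
    """Weighted hybrid: alpha * collaborative + (1 - alpha) * content-based.

    Items with only one prediction (e.g., a new item nobody has rated,
    so collaborative filtering cannot score it) keep that single score.
    """
    combined = {}
    for item in set(cf_scores) | set(cb_scores):
        if item in cf_scores and item in cb_scores:
            combined[item] = alpha * cf_scores[item] + (1 - alpha) * cb_scores[item]
        else:
            combined[item] = cf_scores.get(item, cb_scores.get(item))
    return sorted(combined.items(), key=lambda kv: kv[1], reverse=True)


cf = {"C": 8.0, "E": 3.0}  # hypothetical collaborative predictions
cb = {"C": 6.0, "E": 5.0, "F": 7.5}  # hypothetical content-based predictions; F is unrated
print(combine(cf, cb))
```

The fallback for item F also illustrates the last strategy listed: the content-based predictor fills in where collaborative data is missing.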
Using Social Contagion for Recommendations
Intelligent Advertising, Product Recommendation
Who are the most influential people?
What are the patterns of information diffusion?
Social Contagion Theory – Le Bon et al. 1895

Le Bon, Park, and Blumer, the three major theorists, assumed
that something happens in a crowd situation that can cause people
to become irrational.

The social pathology and social contagion perspectives – the idea
that someone who already has the affliction (behavior) can pass it
on to someone else, and it can rapidly infect others


Gabriel Tarde’s work on the ‘laws of imitation’
Applications: Viral marketing, social media marketing
Social Recommendations for Marketing


Mass marketing is not the best way to attract people

$ Expensive $

Usually not very focused
Recommendations by people we know are more effective than
input from unknown individuals

Content: Our friends know what we like

Homophily: We and our friends are more likely to share interests
and preferences

Biased: We listen more to what our friends say (usually)

Inexpensive
Data

The dataset for this study was collected from Ohloh, a large online OSS
community that provides information about 11,800 OSS projects
involving 94,330 people

Positive evaluation relationship

Developers’ sociological features: nationality, geographical location, etc.

OSS project related information: primary programming language, development activity, ratings, etc.

From software revision control repositories – Subversion, CVS and Git.

The Ohloh web site provides a REST-based application programming interface
(API) for users to access and query its data.
Name            | Created_at           | Location     | Country | Kudo_rank | Programming Language | Total Commits
Jason Allen     | 2006-09-15T02:23:01Z | Sammamish WA | US      | 9         | Java                 | 789
Robin Luckey    | 2006-09-15T02:23:01Z | Seattle WA   | US      | 9         | Java                 | 11358
Scott Collison  | 2006-09-15T02:23:01Z | Seattle WA   | US      | 8         | C                    | 254
The Ohloh Slave | 2006-09-15T02:23:02Z | Redmond WA   | US      | 9         | Php                  | 13
Figure 1. Sample data from Ohloh developers
Statistical Analysis on Link Formation

Dependent variable: whether a developer D participates in an
OSS project P at time T, coded as a binary “Kudo” link
variable.

Independent variables include three types of possible
determinants

Homophily factors

Shared affiliation factors

Preferential attachment factors
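To make these determinants concrete, the sketch below turns the Figure 1 sample records into a feature vector for a candidate link "developer a gives kudos to developer b". The specific features are hypothetical illustrations, not the study's variables: two homophily indicators (same location, same primary language) and b's Kudo_rank as a rough preferential-attachment proxy. Shared-affiliation factors would need project membership data, which the sample does not show, so they are omitted.

```python
# Sample records from Figure 1 (Ohloh developers).
DEVELOPERS = {
    "Jason Allen": {"location": "Sammamish WA", "country": "US",
                    "kudo_rank": 9, "language": "Java", "commits": 789},
    "Robin Luckey": {"location": "Seattle WA", "country": "US",
                     "kudo_rank": 9, "language": "Java", "commits": 11358},
    "Scott Collison": {"location": "Seattle WA", "country": "US",
                       "kudo_rank": 8, "language": "C", "commits": 254},
    "The Ohloh Slave": {"location": "Redmond WA", "country": "US",
                        "kudo_rank": 9, "language": "Php", "commits": 13},
}


def pair_features(a, b):
    """Hypothetical features for a candidate kudo link from developer a to developer b."""
    da, db = DEVELOPERS[a], DEVELOPERS[b]
    return [
        int(da["location"] == db["location"]),  # homophily: same location
        int(da["language"] == db["language"]),  # homophily: same primary language
        db["kudo_rank"],  # preferential attachment proxy: b's existing standing
    ]


for b in DEVELOPERS:
    if b != "Scott Collison":
        print(b, pair_features("Scott Collison", b))
```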
Conditional Logit Analysis

Conditional logistic models (CLM) have been widely used to examine the
determinants that affect individuals’ choices (McFadden 1980; McFadden et al. 1974;
Powell et al. 2005).


Model human choice behavior – project participation choices.
It is specified as follows:

$$\Pr(y = i) = \frac{\exp(X_i \beta)}{\sum_{j \in J} \exp(X_j \beta)}$$

where y is the observed choice of the new developer to participate in project i, and
X_i is a vector of the factors that influence such choice. J is the alternative set of projects
available. The unknown coefficients β are typically estimated by maximum likelihood
methods.
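The choice probability Pr(y = i) = exp(X_i β) / Σ_{j∈J} exp(X_j β), together with the log-likelihood that maximum likelihood estimation maximizes over β, can be sketched as below. The list-of-lists feature layout, the function names, and the toy alternative set are assumptions; in practice the maximization over β would be done with a numerical optimizer or a statistics package rather than by hand.

```python
from math import exp, log


def choice_probabilities(X, beta):
    """Pr(y = i) for every alternative i; X holds one feature row per alternative in J."""
    utilities = [sum(x * b for x, b in zip(row, beta)) for row in X]
    top = max(utilities)  # subtracting the max avoids overflow; the ratios are unchanged
    weights = [exp(u - top) for u in utilities]
    total = sum(weights)
    return [w / total for w in weights]


def log_likelihood(observations, beta):
    """Sum of log Pr(observed choice); observations are (X, chosen_index) pairs."""
    return sum(log(choice_probabilities(X, beta)[chosen]) for X, chosen in observations)


X = [[1, 0], [0, 1], [0, 0]]  # hypothetical alternative set of three projects
print(choice_probabilities(X, [1.0, 0.5]))
print(log_likelihood([(X, 0), (X, 1)], [1.0, 0.5]))
```

With β = 0 every alternative gets probability 1/|J|; MLE moves β away from zero in the direction that makes the observed choices more probable.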
Predicting Future Evaluation Choices

Our analysis also provided a prediction mechanism using the conditional
logistic model and the discovered determinants:

$$\Pr(y = i) = \frac{\exp(X_i \beta)}{\sum_{j \in J} \exp(X_j \beta)}$$

For instance, if developers a and b

live in New York City (coefficient of homophily in location: 5.190),

use Java as their primary programming language (coefficient: 1.623),

etc.,

then the probability that a chooses to positively evaluate b from an
alternative set J can be calculated.
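As a worked illustration of that calculation, the sketch below uses only the two coefficients quoted above (5.190 for shared location, 1.623 for shared primary language) and a hypothetical alternative set of three candidates `b1`, `b2`, `b3`. The real model includes further determinants, so these numbers only show the mechanics, not the study's predictions.

```python
from math import exp

BETA = [5.190, 1.623]  # coefficients quoted above: same location, same language

# Hypothetical candidates for developer a: [lives in NYC like a, uses Java like a]
candidates = {
    "b1": [1, 1],
    "b2": [1, 0],
    "b3": [0, 0],
}

scores = {b: exp(sum(x * w for x, w in zip(feats, BETA))) for b, feats in candidates.items()}
total = sum(scores.values())
probs = {b: s / total for b, s in scores.items()}
for b, p in probs.items():
    print(f"Pr(a evaluates {b}) = {p:.3f}")

# The odds between two alternatives depend only on their feature difference:
# b1 and b2 differ only in language, so Pr(b1) / Pr(b2) = exp(1.623).
print(probs["b1"] / probs["b2"], exp(1.623))
```

Because the location coefficient is large, almost all of the probability mass goes to the candidates who share a's location, and sharing the language as well gives b1 the largest share.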