BobBell_Dimacs1

Lessons from
the Netflix Prize
Robert Bell
AT&T Labs-Research
In collaboration with
Chris Volinsky, AT&T Labs-Research
& Yehuda Koren, Yahoo! Research
“We’re quite curious, really. To the tune of
one million dollars.” – Netflix Prize rules
• Goal to improve on Netflix’s existing movie
recommendation technology
• Contest began October 2, 2006
• Prize
– Based on reduction in root mean squared error
(RMSE) on test data
– $1,000,000 grand prize for 10% drop (19% for MSE)
– Or, $50,000 Progress Prize for the best result each year
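The "19% for MSE" figure follows from RMSE being the square root of MSE: cutting RMSE by 10% multiplies MSE by 0.9 squared. A minimal sketch of that arithmetic; the `rmse` helper and the toy ratings are illustrative and not contest data.

```python
import math

def rmse(predicted, actual):
    """Root mean squared error between two equal-length rating lists."""
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Toy example with made-up ratings.
toy_rmse = rmse([3.5, 4.0, 2.0], [4, 4, 1])

# A 10% reduction in RMSE scales MSE by 0.9 ** 2 = 0.81, a 19% reduction.
mse_reduction = 1 - (1 - 0.10) ** 2
```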
Data Details
• Training data
– 100 million ratings (from 1 to 5 stars)
– 6 years (2000-2005)
– 480,000 users
– 17,770 “movies”
• Test data
– Last few ratings of each user
– Split as shown on next slide
Test Data Split into Three Pieces
• Probe
– Ratings released
– Allows participants to assess
methods directly
• Daily submissions allowed
for combined Quiz/Test
data
– Identity of Quiz cases
withheld
– RMSE released for Quiz
– Test RMSE withheld
– Prizes based on Test RMSE
Higher Mean Rating in Probe Data
[Figure: Percentage of ratings (y-axis, 0 to 40) by Rating (x-axis, 1 to 5) for Training (m = 3.60) and Probe (m = 3.67)]
Something Happened in Early 2004
[Figure: plot with the year 2004 marked]
Data about the Movies
Most Loved Movies
  Title                                        Avg rating   Count
  The Shawshank Redemption                     4.593        137812
  Lord of the Rings: The Return of the King    4.545        133597
  The Green Mile                               4.306        180883
  Lord of the Rings: The Two Towers            4.460        150676
  Finding Nemo                                 4.415        139050
  Raiders of the Lost Ark                      4.504        117456

  Most Rated Movies            Highest Variance
  Miss Congeniality            The Royal Tenenbaums
  Independence Day             Lost In Translation
  The Patriot                  Pearl Harbor
  The Day After Tomorrow       Miss Congeniality
  Pretty Woman                 Napoleon Dynamite
  Pirates of the Caribbean     Fahrenheit 9/11
Most Active Users
  User ID    # Ratings   Mean Rating
  305344     17,651      1.90
  387418     17,432      1.81
  2439493    16,560      1.22
  1664010    15,811      4.26
  2118461    14,829      4.08
  1461435    9,820       1.37
  1639792    9,764       1.33
  1314869    9,739       2.95
Major Challenges
1. Size of data
– Places premium on efficient algorithms
– Stretched memory limits of standard PCs
2. 99% of data are missing
– Eliminates many standard prediction methods
– Certainly not missing at random
3. Training and test data differ systematically
– Test ratings are later
– Test cases are spread uniformly across users
Major Challenges (cont.)
4. Countless factors may affect ratings
– Genre, movie/TV series/other
– Style of action, dialogue, plot, music et al.
– Director, actors
– Rater’s mood
5. Large imbalance in training data
– Number of ratings per user or movie varies by several orders of magnitude
– Information to estimate individual parameters varies widely
Ratings per Movie in Training Data
Avg #ratings/movie: 5627
Ratings per User in Training Data
Avg #ratings/user: 208
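The two averages, and the "99% of data are missing" figure from the Major Challenges slide, can be checked from the rounded counts on the Data Details slide. A small sketch of that arithmetic; the variable names are illustrative, and the rounded counts are taken as given rather than exact.

```python
n_ratings = 100_000_000   # rounded training-set size from the Data Details slide
n_users = 480_000
n_movies = 17_770

avg_per_movie = n_ratings / n_movies      # about 5627
avg_per_user = n_ratings / n_users        # about 208

# Share of the user-by-movie grid that actually holds a rating.
fraction_observed = n_ratings / (n_users * n_movies)
fraction_missing = 1 - fraction_observed  # about 0.988, i.e. roughly 99% missing
```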
The Fundamental Challenge
• How can we estimate as much signal as possible where there are sufficient data, without overfitting where data are scarce?
Recommender Systems
• Personalized recommendations of items
(e.g., movies) to users
• Increasingly common
– To deal with explosive number of choices on
the internet
– Netflix
– Amazon
– Many others
Content Based Systems
• A pre-specified list of attributes
• Score each item on all attributes
• User interest obtained for the same
attributes
– Direct solicitation, or
– Estimated based on user rating, purchases,
or other behavior
Pandora
• Music recommendation system
• Songs rated on 400+ attributes
– Music genome project
– Roots, instrumentation, lyrics, vocals
• Two types of user feedback
– Seed songs
– Thumbs up/down for recommended songs
Collaborative Filtering (CF)
• Avoids need for:
– Determining “proper” content
– Collecting information about items or users
• Infers user-item relationships from
purchases or ratings
• Used by Amazon and Netflix
• Two main CF tools
– Nearest neighbors
– Latent factor models
Nearest Neighbor Methods
• Most common CF tool at the beginning of the contest
• Predict rating for a specific user-item pair based on
ratings of
– Similar items
– By the same user
– Or vice versa
• $\hat{r}_{ui} = \dfrac{\sum_{j \in N(i;u)} s_{ij}\, r_{uj}}{\sum_{j \in N(i;u)} s_{ij}}$
• Pearson correlation or cosine similarity
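The item-based neighborhood prediction on this slide can be sketched as follows. This is a minimal illustration under stated assumptions, not the contest implementation: the matrix is small and dense, missing ratings are stored as 0, cosine similarity is computed on those raw columns, and the function names and toy data are made up.

```python
import numpy as np

def cosine_item_similarity(ratings):
    """Cosine similarity between item columns; missing ratings are stored as 0."""
    norms = np.linalg.norm(ratings, axis=0)
    norms[norms == 0] = 1.0  # avoid division by zero for unrated items
    normalized = ratings / norms
    return normalized.T @ normalized

def predict_knn(ratings, sim, u, i, k=2):
    """Similarity-weighted average of user u's ratings on the k items most similar to i."""
    rated = np.flatnonzero(ratings[u])   # items user u has rated
    rated = rated[rated != i]
    if rated.size == 0:
        return float("nan")
    # N(i; u): the k rated items with the highest similarity to item i.
    neighbors = rated[np.argsort(sim[i, rated])[::-1][:k]]
    weights = sim[i, neighbors]
    if weights.sum() == 0:
        return float("nan")
    return float(weights @ ratings[u, neighbors] / weights.sum())

# Toy matrix (made up): rows are users, columns are items, 0 = not rated.
R = np.array([
    [5, 4, 0, 1],
    [4, 5, 2, 0],
    [1, 0, 5, 4],
    [0, 1, 4, 5],
], dtype=float)
S = cosine_item_similarity(R)
prediction = predict_knn(R, S, u=0, i=2, k=2)
```

Here user 0 has not rated item 2; the two nearest rated items are items 3 and 1, so the prediction is pulled toward the low rating the user gave item 3.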
Merits of Nearest Neighbors
• Few modeling assumptions
• Few tuning parameters to learn
• Easy to explain to users
– Dear Amazon.com Customer, We've noticed that
customers who have purchased or rated How Does
the Show Go On: An Introduction to the Theater by
Thomas Schumacher have also purchased Princess
Protection Program #1: A Royal Makeover (Disney
Early Readers).
Latent Factor Models
• Models with latent classes of items and users
– Individual items and users are assigned to either a
single class or a mixture of classes
• Neural networks
– Restricted Boltzmann machines
• Singular Value Decomposition (SVD)
– AKA matrix factorization
– Items and users described by unobserved factors
– Main method used by leaders of competition
SVD
• Dimension reduction technique for matrices
• Each item summarized by a
d-dimensional vector qi
• Similarly, each user summarized by pu
• Choose d much smaller than number of items or
users
– e.g., d = 50 << 18,000 or 480,000
• Predicted rating for Item i by User u
– Inner product of qi and pu
– $\hat{r}_{ui} = q_i' p_u$ or $\hat{r}_{ui} = \mu + a_u + b_i + q_i' p_u$
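Both prediction forms on this slide reduce to an inner product plus optional bias terms. A minimal sketch, assuming random stand-in factor vectors and hypothetical bias values; only the 3.6 global mean echoes a number from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 3                                  # number of latent factors (toy value)
q_i = rng.normal(scale=0.1, size=d)    # item factor vector (random stand-in)
p_u = rng.normal(scale=0.1, size=d)    # user factor vector (random stand-in)

mu = 3.6      # global mean rating (training mean quoted earlier in the talk)
a_u = -0.2    # user bias, hypothetical value
b_i = 0.4     # item bias, hypothetical value

def predict_plain(q_i, p_u):
    """Inner product form: r_hat = q_i' p_u."""
    return float(q_i @ p_u)

def predict_with_biases(mu, a_u, b_i, q_i, p_u):
    """Bias form: r_hat = mu + a_u + b_i + q_i' p_u."""
    return mu + a_u + b_i + float(q_i @ p_u)

r_hat = predict_with_biases(mu, a_u, b_i, q_i, p_u)
```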
[Figure, two slides: movies placed in a two-factor space. Vertical axis runs from serious to escapist; horizontal axis runs from geared towards females to geared towards males. Movies shown: Braveheart, The Color Purple, Amadeus, Lethal Weapon, Sense and Sensibility, Ocean’s 11, The Lion King, The Princess Diaries, Dumb and Dumber, Independence Day. The second slide adds users Dave and Gus.]
Regularization for SVD
• Want to minimize SSE for Test data
• One idea: Minimize SSE for Training data
– Want large d to capture all the signals
– But, Test RMSE begins to rise for d > 2
• Regularization is needed
– Allow rich model where there are sufficient data
– Shrink aggressively where data are scarce
• Minimize $\sum_{(u,i) \in \text{training}} (r_{ui} - p_u' q_i)^2 + \lambda \left( \sum_u \|p_u\|^2 + \sum_i \|q_i\|^2 \right)$
[Figure, four slides: the same two-factor movie map with user Gus; Gus’s position changes from slide to slide.]
Estimation for SVD
• Fit by gradient descent
– Loop over observed ratings
– Update each relevant parameter
– Small step in each parameter, proportional to gradient
– Repeat until convergence
• Alternatively, fit by sequence of ridge regressions
– Fix item factors
– Loop over users, estimating user factors
– Do same to estimate item factors
– Repeat until convergence
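The gradient descent recipe above can be sketched in a few lines for the penalized least-squares objective with an L2 penalty on the factors. This is an illustrative sketch, not the team's code: the function names, the fixed epoch count in place of a convergence test, and the values of `d`, `lam`, and `lr` are all assumptions.

```python
import numpy as np

def fit_sgd(ratings, n_users, n_items, d=2, lam=0.05, lr=0.02, n_epochs=200, seed=0):
    """Fit r_ui ~ p_u' q_i by stochastic gradient descent with an L2 penalty.

    ratings: list of (user, item, rating) triples for the observed cells only.
    d, lam, lr, and n_epochs are hypothetical tuning values, not from the talk.
    """
    rng = np.random.default_rng(seed)
    P = rng.normal(scale=0.1, size=(n_users, d))   # user factors p_u
    Q = rng.normal(scale=0.1, size=(n_items, d))   # item factors q_i
    for _ in range(n_epochs):
        # Loop over the observed ratings in random order.
        for idx in rng.permutation(len(ratings)):
            u, i, r = ratings[idx]
            err = r - P[u] @ Q[i]
            p_old = P[u].copy()
            # Small step against the gradient of (err^2 + lam*(|p_u|^2 + |q_i|^2)) / 2.
            P[u] += lr * (err * Q[i] - lam * P[u])
            Q[i] += lr * (err * p_old - lam * Q[i])
    return P, Q

def training_rmse(ratings, P, Q):
    """RMSE of the factor model on the observed triples."""
    errors = [r - P[u] @ Q[i] for u, i, r in ratings]
    return float(np.sqrt(np.mean(np.square(errors))))

# Toy data (made up): 3 users, 3 items, 6 observed ratings.
toy = [(0, 0, 5), (0, 1, 4), (1, 0, 4), (1, 2, 1), (2, 1, 1), (2, 2, 5)]
P0, Q0 = fit_sgd(toy, 3, 3, n_epochs=0)   # random starting point
P, Q = fit_sgd(toy, 3, 3)
rmse_before = training_rmse(toy, P0, Q0)
rmse_after = training_rmse(toy, P, Q)
```

Only the parameters touched by a given rating are updated at each step, which is what keeps the method cheap when 99% of the matrix is missing.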
Improvements to
Collaborative Filtering
• Fine tune existing methods
• Incorporate alternative “effects”
• Incorporate a variety of modeling methods
• Careful regularization to avoid overfitting
Localized SVD
• SVD uses all of a user’s ratings to train the user’s
factors
• But what if the user is multiple people?
– Different factor values may apply to movies rated by
Mom vs. Dad vs. the Kids
• This approach computes user factors, pu ,
specific to the movie being predicted
– Given all the {qi}, pu is the solution of a ridge regression
– Weighted ridge regressions with higher weights for
movies similar to the target movie
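With the item factors held fixed, the weighted ridge regression for one user has a closed-form solution, $p_u = (Q' W Q + \lambda I)^{-1} Q' W r_u$. A minimal sketch under assumptions: the slide does not say how the similarity weights are computed, so the weights, the `lam` value, and the function name below are hypothetical.

```python
import numpy as np

def weighted_ridge_user_factors(Q_rated, r_u, weights, lam=1.0):
    """Solve min_p sum_j w_j (r_uj - q_j' p)^2 + lam * |p|^2 for one user.

    Q_rated: (n_rated, d) factors of the movies the user rated, held fixed.
    r_u:     (n_rated,) the user's ratings (or residuals) for those movies.
    weights: (n_rated,) nonnegative weights, larger for movies similar to the target.
    """
    d = Q_rated.shape[1]
    QtW = Q_rated.T * weights                 # scales column j of Q' by w_j
    A = QtW @ Q_rated + lam * np.eye(d)
    b = QtW @ r_u
    return np.linalg.solve(A, b)

# Toy inputs (made up). Equal weights reproduce the ordinary ridge fit.
Q_rated = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
r_u = np.array([1.0, -1.0, 0.5])
p_global = weighted_ridge_user_factors(Q_rated, r_u, np.ones(3))
# Hypothetical weights for a target movie that resembles the first rated movie.
p_local = weighted_ridge_user_factors(Q_rated, r_u, np.array([1.0, 0.1, 0.1]))
```

The same solver with equal weights is one step of the alternating ridge regressions described on the estimation slide.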
Improvement from Localized SVD
Lesson #1: Data >> Models
• Very limited feature set
– User, movie, date
– Places focus on models/algorithms
• Major steps forward associated with
incorporating new data features
– What movies a user rated
– Temporal effects
You are What You Rate
• What you rate (and don’t) provides
information about your preferences
• Paterek’s NSVD explicitly characterizes
users by which movies they like
• Incorporate what a user rated into the user
factor
– $\hat{r}_{ui} = \mu + a_u + b_i + q_i' \left( p_u + |N(u)|^{-1/2} \sum_{j \in N(u)} y_j \right)$
• Substantially reduces RMSE
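The prediction rule on this slide adds a normalized sum of item-specific vectors $y_j$ to the user factor. A minimal sketch of the prediction step only; the function name and all toy values are made up, and fitting the $y_j$ is not shown.

```python
import numpy as np

def predict_with_implicit(mu, a_u, b_i, q_i, p_u, Y, rated_by_u):
    """r_hat = mu + a_u + b_i + q_i' (p_u + |N(u)|^(-1/2) * sum_{j in N(u)} y_j).

    Y:          (n_items, d) array of item-specific vectors y_j.
    rated_by_u: indices of the items user u rated, i.e. N(u).
    """
    rated_by_u = np.asarray(rated_by_u, dtype=int)
    if rated_by_u.size:
        implicit = Y[rated_by_u].sum(axis=0) / np.sqrt(rated_by_u.size)
    else:
        implicit = np.zeros_like(p_u)   # a user with no ratings gets no implicit term
    return mu + a_u + b_i + float(q_i @ (p_u + implicit))

# Toy values (made up) with d = 2 factors and 4 items.
Y = np.array([[0.1, 0.0], [0.0, 0.2], [0.3, 0.1], [0.0, 0.0]])
q_i = np.array([1.0, 2.0])
p_u = np.array([0.5, -0.5])
r_hat = predict_with_implicit(3.6, 0.0, 0.1, q_i, p_u, Y, rated_by_u=[0, 1, 2, 3])
```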
Temporal Effects
• User behavior may change over time
– Ratings go up or down
– Interests change
– For example, with addition of a new rater
• Allow user biases and/or factors to change over time
– $\hat{r}_{ui}(t) = \mu + a_u(t) + b_i(t) + q_i' \left( p_u(t) + |N(u)|^{-1/2} \sum_{j \in N(u)} y_j \right)$
– Model $a_u(t)$ and $p_u(t)$ as linear, unrestricted, or a sum of both types
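One way to read "linear, unrestricted, or a sum of both types" for the user bias is a linear drift plus free day-specific offsets. The sketch below illustrates that reading only; the parameterization, the function name, and every number in it are assumptions rather than the model used in the contest.

```python
def user_bias(t, base, slope=0.0, t_center=0.0, per_day=None):
    """a_u(t) = base + slope * (t - t_center) + per_day.get(t, 0).

    slope gives the linear part; per_day holds unrestricted day-specific offsets.
    Leaving either at its default recovers the other form alone.
    """
    day_offset = 0.0 if per_day is None else per_day.get(t, 0.0)
    return base + slope * (t - t_center) + day_offset

# Hypothetical user: bias drifts up by 0.001 per day, with a one-day spike on day 120.
linear_only = user_bias(200, base=-0.2, slope=0.001, t_center=100)
with_spike = user_bias(120, base=-0.2, slope=0.001, t_center=100, per_day={120: 0.5})
```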
[Figure, four slides: the same two-factor movie map with user Gus; Gus’s position changes from slide to slide, and the last slide labels the point "Gus +".]
#2: The Power of Regularized
SVD Fit by Gradient Descent
• Allowed anyone to approach early leaders
– Powerful predictor
– Efficient
– Easy to program
• Flexibility to incorporate additional features
– Implicit feedback
– Temporal effects
– Neighborhood effects
• Accurate regularization is essential
Factor models: RMSE vs. #parameters
[Figure: RMSE (y-axis, 0.875 to 0.905) against Millions of Parameters (x-axis, log scale, 10 to 100000) for five model families: Basic SVD; … + What was Rated; … + Linear Time Factors; … + Per-Day User Biases; … + per-Day User Factors. Point labels range from 50 to 1500.]
#3: The Wisdom of Crowds (of Models)
• All models are wrong; some are useful – G. Box
• Used linear blends of many prediction sets
– 107 in Year 1
– Over 800 at the end
• Difficult, or impossible, to build the grand
unified model
• Mega blends are not needed in practice
– A handful of simple models achieves 80 percent of
the improvement of the full blend
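A linear blend combines the prediction sets with one weight per model. The slide does not say how the weights were fit, so the sketch below assumes ordinary least squares against held-out ratings (for example a probe-style set); the data are synthetic and the function names are illustrative.

```python
import numpy as np

def fit_blend_weights(predictions, targets):
    """Least-squares weights for a linear blend of model predictions.

    predictions: (n_ratings, n_models) array, one column per model.
    targets:     (n_ratings,) held-out true ratings.
    """
    weights, *_ = np.linalg.lstsq(predictions, targets, rcond=None)
    return weights

def rmse(pred, actual):
    """Root mean squared error between two arrays."""
    return float(np.sqrt(np.mean((pred - actual) ** 2)))

# Synthetic demo: three noisy "models" of the same made-up ratings.
rng = np.random.default_rng(0)
truth = rng.uniform(1, 5, size=500)
models = np.column_stack(
    [truth + rng.normal(scale=s, size=500) for s in (0.9, 1.0, 1.1)]
)
w = fit_blend_weights(models, truth)
blend_rmse = rmse(models @ w, truth)
best_single_rmse = min(rmse(models[:, k], truth) for k in range(3))
```

On the data used to fit the weights, the blend can never be worse than the best single model, since picking that model alone is one of the allowed weight vectors; whether the gain carries over to new data depends on how different the models' errors are.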
#4: Find Good Teammates
• Yehuda Koren
– The engine of progress for the Netflix Prize
– Implicit feedback
– Temporal effects
– Nearest neighbor modeling
• Big Chaos: Michael Jahrer, Andreas Töscher (Year 2)
– Optimization of tuning parameters
– Blending methods
• Pragmatic Theory: Martin Chabbert, Martin Piotte (Year 3)
– Some movies age better than others
– Link functions
The Final Leaderboard
Test Set Results
• The Ensemble: 0.856714
• BellKor’s Pragmatic Chaos: 0.856704
• Both scores round to 0.8567
• Tie breaker is submission date/time
Final Test Set Leaderboard
Who Got the Money?
• AT&T donated its full share to organizations supporting science education
• Young Science Achievers Program
• New Jersey Institute of Technology pre-college
and educational opportunity programs
• North Jersey Regional Science Fair
• Neighborhoods Focused on African American
Youth
#5: Is This the Way to Do Science?
• Big Success for Netflix
– Lots of cheap labor, good publicity
– Already incorporated 6 percent improvement
– Potential for much more using other data they have
• Big advances to the science of recommender
systems
– Regularized SVD
– Identification of new features
– Understanding nearest neighbors
– Contributions to literature
Why Did this Work so Well?
• Industrial strength data
• Very good design
• Accessibility to anyone with a PC
• Free flow of ideas
– Leaderboard
– Forum
– Workshop and papers
• Money?
But There are Limitations
• Need a conceptually simple task
• Winner-take-all has drawbacks
• Intellectual property and liability issues
• How many prizes can overlap?
Thank You!
• [email protected]
• www.netflixprize.com
– …/leaderboard
– …/community
• Click BellKor’s Pragmatic Chaos or The
Ensemble on Leaderboard for details