Quest for $1,000,000: The Netflix Prize Competition

Quest for $1,000,000:
The Netflix Prize
Bob Bell
AT&T Labs-Research
July 15, 2009
Joint work with Chris Volinsky, AT&T Labs-Research
and Yehuda Koren, Yahoo! Research
Recommender Systems
• Personalized recommendations of items
(e.g., movies) to users
• Increasingly common
– To deal with explosive number of choices on
the internet
– Netflix
– Amazon
– Many others
2
Content Based Systems
• A pre-specified list of attributes
• Score each item on all attributes
• User interest obtained for the same
attributes
– Direct solicitation, or
– Estimated based on user purchases or ratings
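A minimal sketch of the matching step described above, assuming items and user interest are scored on the same attributes and compared with a dot product. The attribute values and the function name `content_scores` are hypothetical, chosen only for illustration.

```python
import numpy as np

# Hypothetical attribute scores (rows: items, columns: attributes).
item_attributes = np.array([
    [0.9, 0.1, 0.3],   # item 0
    [0.2, 0.8, 0.5],   # item 1
    [0.4, 0.4, 0.9],   # item 2
])

# Hypothetical user interest in the same three attributes.
user_interest = np.array([1.0, 0.0, 0.5])

def content_scores(items, interest):
    """Score each item as the dot product of its attributes with user interest."""
    return items @ interest

scores = content_scores(item_attributes, user_interest)
ranking = np.argsort(-scores)  # item indices, best match first
```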
3
Pandora
• Music recommendation system
• Songs rated on 400+ attributes
– Music genome project
– Roots, instrumentation, lyrics, vocals
• Two types of user feedback
– Seed songs
– Thumbs up/down for recommended songs
4
Drawbacks of Content Based Systems
• Effort to score all items on many attributes
– Best attributes may be unknown
– Some attributes may be unscorable
• Need for direct solicitation of data from
users in some systems
5
Collaborative Filtering (CF)
• Does not require content information about
items or solicitation of users
• Infers user-item relationships from
purchases or ratings
• Used by Amazon and Netflix
6
“We’re quite curious, really. To the tune of
one million dollars.” – Netflix Prize rules
• Goal: improve on Netflix's existing movie
recommendation technology
• Prize
– Based on reduction in root mean squared
error (RMSE) on test data
– $1,000,000 grand prize for a 10% drop
– Or, $50,000 Progress Prize for the best result each
year
• Contest began October 2, 2006
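The prize criterion can be sketched as follows: compute RMSE on held-out ratings, then the percentage drop relative to a baseline RMSE. The function names and the numbers in the usage lines are hypothetical and are not contest figures.

```python
import math

def rmse(predicted, actual):
    """Root mean squared error between predicted and actual ratings."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))

def percent_improvement(baseline_rmse, new_rmse):
    """Percentage reduction in RMSE relative to the baseline."""
    return 100.0 * (baseline_rmse - new_rmse) / baseline_rmse

# Hypothetical ratings, for illustration only.
example_rmse = rmse([3.5, 3.0, 4.0, 2.0], [4, 3, 5, 1])
example_gain = percent_improvement(1.0, 0.9)
```

Under this definition, a 10% drop means the new RMSE is 0.9 times the baseline RMSE.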
7
Data Details
• Training data
– 100 million ratings (from 1 to 5 stars)
– 6 years (2000-2005)
– 480,000 users
– 17,770 “movies”
• Test data
– Last few ratings of each user
– User, movie, date given
– Ratings withheld (for most of test data)
– Teams are allowed daily feedback on their RMSE
8
Higher Mean Rating in Test Data
[Figure: histogram of the percentage of ratings (0 to 40) at each rating value (1 to 5), comparing Training (m = 3.60) with Probe (m = 3.67)]
9
Something Happened in Early 2004
[Figure: plot over time, with 2004 marked on the axis]
10
Movies Rated Most Often
Title                       # Ratings   Mean Rating
Miss Congeniality             227,715          3.36
Independence Day              216,233          3.72
The Patriot                   200,490          3.78
The Day After Tomorrow        194,695          3.44
Pretty Woman                  190,320          3.90
Pirates of the Caribbean      188,849          4.15
The Green Mile                180,883          4.31
Forrest Gump                  180,736          4.30
11
Most Active Users
User ID    # Ratings   Mean Rating
305344        17,651          1.90
387418        17,432          1.81
2439493       16,560          1.22
1664010       15,811          4.26
2118461       14,829          4.08
1461435        9,820          1.37
1639792        9,764          1.33
1314869        9,739          2.95
12
Ratings per Movie in Training Data
Avg #ratings/movie: 5627
13
Ratings per User in Training Data
Avg #ratings/user: 208
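As a quick arithmetic check, the two averages follow from the rounded totals on the Data Details slide. The density figure at the end is an added illustration of how sparse the user-by-movie matrix is; it does not appear in the slides.

```python
# Rounded totals from the Data Details slide.
n_ratings = 100_000_000
n_users = 480_000
n_movies = 17_770

avg_per_movie = n_ratings / n_movies   # about 5627
avg_per_user = n_ratings / n_users     # about 208

# Fraction of all possible (user, movie) pairs that carry a rating.
density = n_ratings / (n_users * n_movies)   # a little over 1%
```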
14
Progress after 2 Months
Top contenders for Progress Prize 2007
[Figure: % improvement (0 to 10) over time, 10/2/06 to 10/2/07. Teams shown: ML@Toronto, How low can he go?, wxyzConsulting. Grand prize level marked.]
15
Progress after 8 Months
Top contenders for Progress Prize 2007
[Figure: % improvement (0 to 10) over time, 10/2/2006 to 10/2/2007. Teams shown: ML@Toronto, How low can he go?, wxyzConsulting, Gravity, BellKor. Grand prize level marked.]
16
Nearest Neighbor (NN) Methods
• Most common CF tool
• Predict rating for a specific user-item pair based
on ratings of
– Similar items
– By the same user
– Or vice versa
• Requires no “content” about items or users
• Easy to apply
• Easy to explain to users
• But not as powerful as other methods
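A minimal sketch of an item-based nearest neighbor prediction, assuming cosine similarity over co-rated users and a similarity-weighted average of the user's own ratings. The similarity measure, the choice of k, and the toy matrix are illustrative assumptions, not the method used in the competition.

```python
import numpy as np

def item_similarity(ratings, i, j):
    """Cosine similarity between items i and j over users who rated both."""
    both = ~np.isnan(ratings[:, i]) & ~np.isnan(ratings[:, j])
    if both.sum() == 0:
        return 0.0
    a, b = ratings[both, i], ratings[both, j]
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

def predict_knn(ratings, user, item, k=2):
    """Weighted average of the user's ratings on the k items most similar to `item`."""
    rated = [j for j in range(ratings.shape[1])
             if j != item and not np.isnan(ratings[user, j])]
    sims = sorted(((item_similarity(ratings, item, j), j) for j in rated),
                  reverse=True)[:k]
    total = sum(s for s, _ in sims)
    if total <= 0:
        return float(np.nanmean(ratings))  # no usable neighbors: global mean
    return float(sum(s * ratings[user, j] for s, j in sims) / total)

# Hypothetical ratings (rows: users, columns: items); NaN marks a missing rating.
nan = np.nan
toy = np.array([
    [5.0, 4.0, nan, 1.0],
    [4.0, 5.0, 4.0, 2.0],
    [1.0, 2.0, 1.0, 5.0],
    [5.0, nan, 5.0, 1.0],
])
prediction = predict_knn(toy, user=0, item=2)
```

Swapping the roles of rows and columns gives the user-based variant ("or vice versa").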
17
Latent Factor Models
• Explain ratings by a set of latent factors
(attributes)
– Factors are learned from the data
– No need for pre-specification
• Neural networks
• SVD (Singular Value Decomposition)
– AKA matrix factorization
– Dominant method used by leaders of competition
18
Item Factors
• Each item summarized by a
d-dimensional vector $q_i$
• Potential factors
– Comedy vs. drama
– Amount of action
– Depth of character development
– Totally uninterpretable
• Choose d much smaller than number of
items or users
– e.g., d = 50 << 18,000 or 480,000
19
User Factors
• Similarly, each user summarized by $p_u$
• Same number of factors
• User factors measure interest in
corresponding item factors
• Predicted rating for Item i by User u
– Inner product of $q_i$ and $p_u$
– $\hat{r}_{ui} \approx q_i' p_u$ or $\hat{r}_{ui} \approx \mu + b_u + b_i + q_i' p_u$
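The second form of the prediction can be sketched directly, reading mu as the overall mean rating and b_u and b_i as user and item offsets (an interpretation; the slide does not define them). All numeric values below are hypothetical except 3.6, which is the training mean shown earlier.

```python
import numpy as np

def predict_rating(mu, b_u, b_i, q_i, p_u):
    """Predicted rating: overall mean plus user and item offsets plus q_i' p_u."""
    return mu + b_u + b_i + float(np.dot(q_i, p_u))

# Hypothetical two-factor example (d = 2).
q_i = np.array([1.0, -0.5])   # item factors
p_u = np.array([0.4, 0.2])    # user factors
r_hat = predict_rating(mu=3.6, b_u=-0.2, b_i=0.5, q_i=q_i, p_u=p_u)  # about 4.2
```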
20
[Figure: movies plotted on two latent factors. Axes: serious vs. escapist, and geared towards females vs. geared towards males. Movies shown: Braveheart, The Color Purple, Amadeus, Lethal Weapon, Sense and Sensibility, Ocean’s 11, The Lion King, The Princess Diaries, Dumb and Dumber, Independence Day.]
21
[Figure: factor-space plot of movies (axes: serious vs. escapist, geared towards females vs. geared towards males), with users Dave and Gus added]
22
Challenges in Using SVD
• Need lots of factors (large d)
23
Challenges in Using SVD
• Need lots of factors (large d)
• Easy to overfit
24
The Fundamental Challenge
• How can we estimate as much signal
as possible where there are sufficient
data, without overfitting where data
are scarce?
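One common response to this challenge is to penalize large factor values so that estimates are shrunk toward zero. The sketch below fits the factor model by stochastic gradient descent with an L2 penalty. The penalty, the learning rate, and every name here are assumptions made for illustration; the slides do not say how the factors were fit.

```python
import numpy as np

def fit_factors(ratings, n_users, n_items, d=2, lam=0.05, lr=0.01,
                epochs=500, seed=0):
    """Fit r_ui ~ mu + q_i' p_u by SGD with an L2 penalty `lam` on the factors."""
    rng = np.random.default_rng(seed)
    P = 0.1 * rng.standard_normal((n_users, d))   # user factors p_u
    Q = 0.1 * rng.standard_normal((n_items, d))   # item factors q_i
    mu = float(np.mean([r for _, _, r in ratings]))
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - (mu + Q[i] @ P[u])
            p_old = P[u].copy()
            # The -lam terms shrink the factors toward zero.
            P[u] += lr * (err * Q[i] - lam * P[u])
            Q[i] += lr * (err * p_old - lam * Q[i])
    return mu, P, Q

def rmse_of_fit(ratings, mu, P, Q):
    """Training RMSE of the fitted model."""
    sq = [(r - (mu + Q[i] @ P[u])) ** 2 for u, i, r in ratings]
    return float(np.sqrt(np.mean(sq)))

# Hypothetical (user, item, rating) triples.
toy_ratings = [
    (0, 0, 5), (0, 1, 4), (0, 3, 1),
    (1, 0, 4), (1, 1, 5), (1, 2, 4), (1, 3, 2),
    (2, 0, 1), (2, 1, 2), (2, 2, 1), (2, 3, 5),
    (3, 0, 5), (3, 2, 5), (3, 3, 1),
]
mu, P, Q = fit_factors(toy_ratings, n_users=4, n_items=4)
fit_rmse = rmse_of_fit(toy_ratings, mu, P, Q)
```

Larger values of `lam` pull the factors harder toward zero, trading some fit on the training ratings for less overfitting.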
25
[Figure: factor-space plot of movies (axes: serious vs. escapist, geared towards females vs. geared towards males), with user Gus shown]
26
[Figure: factor-space plot of movies (axes: serious vs. escapist, geared towards females vs. geared towards males), with user Gus shown]
27
[Figure: factor-space plot of movies (axes: serious vs. escapist, geared towards females vs. geared towards males), with user Gus shown in a different position]
28
[Figure: factor-space plot of movies (axes: serious vs. escapist, geared towards females vs. geared towards males), with user Gus shown in a different position]
29
Challenges in Using SVD
• Need lots of factors (large d)
• Easy to overfit
• User behavior may change over time
– Ratings go up or down
– Interests may change
– Composition of account may change, for
example, with addition of a new rater
30
[Figure: factor-space plot of movies (axes: serious vs. escapist, geared towards females vs. geared towards males), with user Gus shown]
31
[Figure: factor-space plot of movies (axes: serious vs. escapist, geared towards females vs. geared towards males), with user Gus shown near Dumb and Dumber]
32
[Figure: factor-space plot of movies (axes: serious vs. escapist, geared towards females vs. geared towards males), with a point labeled “Gus +” shown]
33
Challenges in Using SVD
• Need lots of factors (large d)
• Easy to overfit
• User behavior may change over time
• Misses some types of patterns
34
Neither SVD nor NN is Perfect
• SVD is poorly situated to fully capture
strong “local” relationships
– e.g., among sequels
• NN ignores cumulative effect of many
small signals
– May be ineffective for items with no close
neighbors
• Each method complements the other
35
The Wisdom of Crowds (of Models)
• All models are wrong; some are useful – G. Box
• Our best entry during Year 1 was a linear
combination of 107 sets of predictions
– Nearest neighbors, SVD, neural nets, et al.
– Many variations of model structure and parameter
settings
• Years 2 and 3
– Individual models are more comprehensive and
much more accurate
– Combining many models still helps
– Five models suffice to beat Year 1 score
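A linear combination of prediction sets can be sketched as a least-squares fit of blending weights against known ratings. Fitting the weights by ordinary least squares is an assumption here, since the slides say only that the entry was a linear combination, and the two toy "models" below are hypothetical.

```python
import numpy as np

def fit_blend_weights(prediction_sets, actual):
    """Least-squares weights for a linear combination of prediction sets."""
    X = np.column_stack(prediction_sets)
    weights, *_ = np.linalg.lstsq(X, np.asarray(actual, dtype=float), rcond=None)
    return weights

def blend(prediction_sets, weights):
    """Apply the blending weights to the prediction sets."""
    return np.column_stack(prediction_sets) @ weights

def rmse(predicted, actual):
    """Root mean squared error between two arrays of ratings."""
    diff = np.asarray(predicted, dtype=float) - np.asarray(actual, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))

# Hypothetical ratings and two hypothetical sets of predictions.
actual = np.array([4.0, 3.0, 5.0, 1.0, 2.0])
model_a = np.array([4.5, 2.5, 4.5, 1.5, 2.5])
model_b = np.array([3.5, 3.5, 4.0, 2.0, 1.0])

weights = fit_blend_weights([model_a, model_b], actual)
blended = blend([model_a, model_b], weights)
```

On the ratings used to fit the weights, the blend can do no worse than any single model, because putting all the weight on one model is itself a valid combination. In practice the weights would be fit on ratings held out from model training.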
36
Progress after 1 Year
Top contenders for Progress Prize 2007
[Figure: % improvement (0 to 10) over time, 10/2/2006 to 10/2/2007. Teams shown: ML@Toronto, How low can he go?, wxyzConsulting, Gravity, BellKor. Grand prize level marked.]
37
Is this Any Way to do Science?
• Wide participation
– Submissions from 5,000 teams
– 8,300 posts on the Netflix Prize forum
• Generation and dissemination of new methods
– Presentations/workshops in academic conferences
– Journal publications
• Reasons for success
– Well designed by Netflix
– Industrial strength data set
– Opportunity to build on work of others
– Collegial spirit of competitors
38
The Race is On
Thank You!
• [email protected]
• www.netflixprize.com
– …/leaderboard
– …/community
• Click BellKor on Leaderboard for details
40