Mining Semantic Data for Solving First-rater and Cold

IDEAS 2011
Lisbon
21-23 September
MINING SEMANTIC DATA FOR SOLVING
FIRST-RATER AND COLD-START
PROBLEMS IN RECOMMENDER SYSTEMS
María N. Moreno, Saddys Segrera, Vivian F. López,
M. Dolores Muñoz and Ángel Luis Sánchez
Data Mining Research Group
http://mida.usal.es
Department of
Computing and Automatic
CEDI 2010
Contents

Introduction

Recommender Systems

Recommendation framework

Case Study

Conclusions
Introduction

Recommender systems
• Recommender systems provide users with intelligent mechanisms to find products to purchase
• Applications: e-commerce, e-learning, tourism, news pages…
• Drawbacks: low performance, low reliability of recommendations…
[Figure: e-commerce server, catalog and client]
Introduction

Proposal
• Objective: overcome critical drawbacks in recommender systems
• Methodology: semantic-based Web mining
  - Associative classification (Web Mining): machine learning technique that combines concepts from classification and association
  - Domain-specific ontology (Semantic Web): enrichment of the data to be mined with semantic annotations
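To make the idea of associative classification concrete, here is a minimal sketch in Python. It is not CBA, CMAR or CPAR as used in the case study: it mines every rule of the form "attribute values -> class" by brute force, keeps those above assumed support and confidence thresholds, and classifies with the strongest matching rule. The function names, thresholds and toy records are all hypothetical.

```python
from collections import Counter
from itertools import combinations

def mine_class_rules(rows, labels, min_support=0.2, min_confidence=0.6, max_len=2):
    """Mine rules 'antecedent -> class' that meet the support and confidence thresholds."""
    n = len(rows)
    antecedent_counts = Counter()  # how often each antecedent occurs
    rule_counts = Counter()        # how often antecedent and class occur together
    for row, label in zip(rows, labels):
        items = sorted(row.items())
        for size in range(1, max_len + 1):
            for antecedent in combinations(items, size):
                antecedent_counts[antecedent] += 1
                rule_counts[(antecedent, label)] += 1
    rules = []
    for (antecedent, label), count in rule_counts.items():
        support = count / n
        confidence = count / antecedent_counts[antecedent]
        if support >= min_support and confidence >= min_confidence:
            rules.append((antecedent, label, support, confidence))
    # Strongest rules first: confidence, then support, then shorter antecedent.
    rules.sort(key=lambda r: (-r[3], -r[2], len(r[0])))
    return rules

def classify(rules, row, default=None):
    """Return the class of the strongest rule whose antecedent matches the row."""
    items = set(row.items())
    for antecedent, label, _support, _confidence in rules:
        if items.issuperset(antecedent):
            return label
    return default

# Toy records (invented for illustration): user age interval and movie genre.
rows = [
    {"age": "[18, 24]", "genre": "Comedy"}, {"age": "[18, 24]", "genre": "Comedy"},
    {"age": "[18, 24]", "genre": "Drama"}, {"age": "[18, 24]", "genre": "Drama"},
    {"age": "> 55", "genre": "Drama"}, {"age": "> 55", "genre": "Drama"},
    {"age": "> 55", "genre": "Comedy"}, {"age": "> 55", "genre": "Comedy"},
]
labels = ["like", "like", "dislike", "dislike", "like", "like", "dislike", "dislike"]
rules = mine_class_rules(rows, labels)
print(classify(rules, {"age": "[18, 24]", "genre": "Comedy"}))
```

In this toy data no single attribute predicts the class, so only the two-attribute rules survive the confidence threshold. Real associative classifiers add rule pruning and more careful rule selection on top of this basic scheme.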
Recommender Systems

Classification of recommendation methods
• Content-based: compare text documents to user profiles
• Collaborative filtering: based on the opinions of other users (ratings)
  - Memory-based (user-based): find users with similar preferences (neighbors) by means of statistical techniques
  - Model-based (item-based): use data mining techniques to develop a model of user ratings
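The memory-based (user-based) approach can be sketched as follows. This is a generic illustration, assuming cosine similarity as the statistical measure and a similarity-weighted average of the nearest neighbors' ratings; the slides do not say which measure is used, and the users, movies and ratings are invented.

```python
from math import sqrt

def cosine_similarity(a, b):
    """Cosine similarity computed over the items both users have rated."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = sqrt(sum(a[i] ** 2 for i in common))
    norm_b = sqrt(sum(b[i] ** 2 for i in common))
    return dot / (norm_a * norm_b)

def predict(ratings, user, item, k=2):
    """Similarity-weighted average of the k nearest neighbors' ratings for `item`."""
    neighbors = [
        (cosine_similarity(ratings[user], other_ratings), other_ratings[item])
        for other, other_ratings in ratings.items()
        if other != user and item in other_ratings
    ]
    neighbors.sort(reverse=True)  # most similar neighbors first
    top = [(sim, rating) for sim, rating in neighbors[:k] if sim > 0]
    if not top:
        return None  # nobody comparable has rated the item
    return sum(sim * rating for sim, rating in top) / sum(sim for sim, _ in top)

# Invented ratings on a 1 to 5 scale.
ratings = {
    "ana": {"m1": 5, "m2": 4},
    "ben": {"m1": 5, "m2": 4, "m3": 5},
    "cho": {"m1": 1, "m2": 5, "m3": 2},
}
print(predict(ratings, "ana", "m3"))
```

Every prediction scans all other users, which is why this family of methods tends to slow down as the numbers of users and products grow.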
Recommender Systems

Critical drawbacks
• Sparsity: the number of ratings needed for prediction is greater than the number of ratings obtained from users
• Scalability: performance problems arise mainly in memory-based methods, where the computation time grows linearly with both the number of customers and the number of products on the site
• First-rater problem: new products have never been rated, therefore they cannot be recommended
• Cold-start problem: new users cannot receive recommendations since they have not yet rated any products
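As a small illustration of three of these drawbacks, the sketch below computes the density of a rating set and lists the products and users that have no ratings at all. The function name and the toy data are hypothetical.

```python
def diagnose(users, items, ratings):
    """Summarize sparsity, first-rater items and cold-start users for a rating set."""
    rated_items = {item for _user, item, _score in ratings}
    active_users = {user for user, _item, _score in ratings}
    return {
        # Fraction of the user x item matrix that actually holds a rating.
        "density": len(ratings) / (len(users) * len(items)),
        # Products nobody has rated yet (first-rater problem).
        "first_rater_items": items - rated_items,
        # Users who have not rated anything yet (cold-start problem).
        "cold_start_users": users - active_users,
    }

users = {"u1", "u2", "u3"}
items = {"m1", "m2", "m3", "m4"}
ratings = [("u1", "m1", 5), ("u1", "m2", 3), ("u2", "m1", 4)]
report = diagnose(users, items, ratings)
print(report)
```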

Recommendation framework

Associative classification (Web Mining)
• Sparsity: only slightly sensitive to sparse data
• Scalability: model-based approach

Domain-specific ontology (Semantic Web)
• First-rater problem:
  - Use of taxonomies to classify products
  - Induction of abstract patterns which relate user profiles with categories of products
• Cold-start problem:
  - Recommendations based on user profiles
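One way to picture the semantic enrichment is to replace user and product identifiers with profile attributes and product categories before mining, so that the induced patterns speak about kinds of users and kinds of products. The sketch below assumes a flat genre lookup in place of a real ontology; the field names and the "yes"/"no" values of `rating_bin` are illustrative assumptions, not taken from the slides.

```python
def annotate(ratings, users, movie_genre):
    """Swap identifiers for user-profile attributes and the movie's category."""
    return [
        {
            "gender": users[user_id]["gender"],
            "age": users[user_id]["age"],
            "genre": movie_genre[movie_id],
            "rating_bin": rating_bin,
        }
        for user_id, movie_id, rating_bin in ratings
    ]

# Hypothetical lookups standing in for the user table and the domain taxonomy.
users = {1: {"gender": "F", "age": "[25, 34]"}, 2: {"gender": "M", "age": "> 55"}}
movie_genre = {10: "Comedy", 11: "Drama"}
ratings = [(1, 10, "yes"), (2, 11, "no")]

annotated = annotate(ratings, users, movie_genre)
print(annotated[0])
```

Because the annotated records no longer mention a specific movie or user, a pattern mined from them can apply to a movie that has never been rated (through its genre) or to a user who has never rated anything (through the profile).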
Recommendation framework
Off-line process
• Historical data -> data mining algorithms -> low level model
• Domain ontology provides annotations -> historical data with semantic annotations -> data mining algorithms -> high level model
On-line process
• [new user] Registration -> check high level model -> recommendations
• [old user] Active user -> recommendation request
  - old products -> check low level model -> recommendations
  - new products -> check high level model -> recommendations
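The on-line part of the framework amounts to choosing which model to consult. A minimal sketch of that dispatch, assuming each model is a callable that returns True when a product should be recommended to a user (the interface and all names are hypothetical):

```python
def recommend(user, candidates, known_users, rated_products,
              low_level_model, high_level_model):
    """Return the candidates accepted by the model that the on-line process selects."""
    accepted = []
    for product in candidates:
        is_new_user = user not in known_users
        is_new_product = product not in rated_products
        # New users and new products have no usable ratings,
        # so they are checked against the high level model.
        if is_new_user or is_new_product:
            model = high_level_model
        else:
            model = low_level_model
        if model(user, product):
            accepted.append(product)
    return accepted

# Stub models that only record which level was consulted.
calls = []

def low(user, product):
    calls.append(("low", user, product))
    return True

def high(user, product):
    calls.append(("high", user, product))
    return True

result = recommend("old_user", ["m1", "m_new"], {"old_user"}, {"m1"}, low, high)
print(result, calls)
```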
Case Study

MovieLens Data

User Data
  ID          Num.
  Gender      Binary
  Age         Num.
  Occupation  String
  Zip         Num.

Movies Data
  ID                     Num.
  Title                  String
  Genre (19 attributes)  Binary

Ratings Data
  ID        Num.
  User ID   Num.
  Movie ID  Num.
  Rating    Num. (1 - 5)

[Figure labels: score, rating_bin]
Case Study

MovieLens Data
  ID               Num.
  User Gender      Binary
  *User Age        < 18, [18, 24], [25, 34], [35, 44], [45, 49], [50, 55], > 55
  User Occupation  String
  Movie Title      String
  *Movie Genre     String
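The two starred attributes suggest a preprocessing step: numeric ages become intervals and the binary genre flags become a genre name. The sketch below shows one way this could be done, assuming integer ages; how the original study handled movies with several genre flags set is not stated, so `genres_from_flags` simply returns every matching name. Both function names are hypothetical.

```python
from bisect import bisect_left

# Upper bound of every interval except the last, open-ended one.
AGE_UPPER_BOUNDS = [17, 24, 34, 44, 49, 55]
AGE_LABELS = ["< 18", "[18, 24]", "[25, 34]", "[35, 44]", "[45, 49]", "[50, 55]", "> 55"]

def age_interval(age):
    """Map an integer age to the label of the interval that contains it."""
    return AGE_LABELS[bisect_left(AGE_UPPER_BOUNDS, age)]

def genres_from_flags(flags, genre_names):
    """Turn a row of binary genre flags into the list of matching genre names."""
    return [name for flag, name in zip(flags, genre_names) if flag == 1]

print(age_interval(30))
print(genres_from_flags([0, 1, 0], ["Action", "Comedy", "Drama"]))
```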
Case Study

Ontology definition
Case Study

Results

Associative classification methods (CBA, CMAR, FOIL and CPAR)
were compared to non-associative classification algorithms
Conclusions


• A framework for recommender systems is proposed in order to overcome some critical drawbacks
• The proposal combines web mining methods and domain-specific ontologies in order to induce models at two abstraction levels:
  - The low level model relates users, movies and ratings for making the recommendations
  - The high level model is used for recommending unrated movies or for making recommendations to new users, overcoming the first-rater and cold-start problems
• The off-line model induction avoids scalability problems at recommendation time
• Associative classification methods provide a way to deal with the sparsity problem
IDEAS 2011
Lisbon
21-23 September
THANKS FOR YOUR ATTENTION !
MINING SEMANTIC DATA FOR SOLVING FIRST-RATER AND COLD-START PROBLEMS IN RECOMMENDER SYSTEMS
María N. Moreno*, Saddys Segrera, Vivian F. López, M. Dolores Muñoz & Ángel Luis Sánchez
*[email protected]
Department of
Computing and Automatic