Multiclass Sentiment Analysis
with Restaurant Reviews
Moontae Lee and Patrick Grafe
OpenTable.com Data Set
• Overall Rating (1 to 5 stars)
• Food Rating (1 to 5 stars)
• Ambiance Rating (1 to 5 stars)
• Service Rating (1 to 5 stars)
• Noise Rating (1 to 3)
• Data Set statistics
• Heavily biased toward 5-star ratings
Strategies
• Spell Correction
• POS Tagging
• Unigram/Bigram/Trigram
• Stop Words
• Pruning
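The n-gram, stop-word, and pruning strategies can be sketched together as a small feature-extraction step. This is a minimal illustration, not the authors' pipeline: the stop-word list, the sample reviews, and the names `tokenize`, `ngrams`, and `build_vocabulary` are hypothetical, and "pruning" is assumed here to mean dropping n-grams that occur fewer than `min_count` times.

```python
from collections import Counter

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"the", "a", "an", "is", "was", "and", "of", "to"}

def tokenize(text):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    tokens = (w.strip(".,!?;:\"'()") for w in text.lower().split())
    return [t for t in tokens if t]

def ngrams(tokens, max_n=3):
    """Return every n-gram of length 1..max_n as a tuple of tokens."""
    return [tuple(tokens[i:i + n])
            for n in range(1, max_n + 1)
            for i in range(len(tokens) - n + 1)]

def build_vocabulary(reviews, max_n=3, remove_stop_words=True, min_count=2):
    """Count n-grams over all reviews, then prune the rare ones.

    Assumption: pruning means discarding n-grams seen fewer than
    min_count times across the whole training set.
    """
    counts = Counter()
    for review in reviews:
        tokens = tokenize(review)
        if remove_stop_words:
            tokens = [t for t in tokens if t not in STOP_WORDS]
        counts.update(ngrams(tokens, max_n))
    return {gram: c for gram, c in counts.items() if c >= min_count}

# Made-up reviews, not taken from the OpenTable.com data set.
reviews = [
    "The food was great",
    "Great food and great service",
    "The service was slow",
]
vocab = build_vocabulary(reviews)
```

Because stop words are removed before the n-grams are formed, a bigram such as ("food", "great") can bridge words that were not adjacent in the original review. Whether that is desirable is a design choice; forming n-grams first and filtering afterwards would behave differently.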
Spell Correction
Common Spelling Mistakes:
• Restaurant: resturant, restuarant, restaurante
• Waiter: waitor
• Service: sevice, serivce
Distance Metrics:
• Edit Distance
• Levenshtein Distance
• Keyboard Distance
• Sound Distance
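As an illustration of the first of these metrics, the sketch below computes Levenshtein distance with the standard dynamic-programming recurrence and uses it to map the misspellings listed above onto a tiny dictionary. The dictionary `VOCAB`, the helper `correct`, and the `max_distance` threshold of 2 are assumptions made for the example; the slides do not say how candidates were chosen.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn string a into string b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]

# Hypothetical dictionary; a real one would cover the full vocabulary.
VOCAB = ["restaurant", "waiter", "service"]

def correct(word, vocab=VOCAB, max_distance=2):
    """Return the closest dictionary word, or the word unchanged
    if nothing lies within max_distance edits."""
    best = min(vocab, key=lambda v: levenshtein(word, v))
    return best if levenshtein(word, best) <= max_distance else word

misspellings = ["resturant", "restuarant", "restaurante",
                "waitor", "sevice", "serivce"]
corrected = [correct(w) for w in misspellings]
```

Transposed letters, as in "restuarant" and "serivce", cost two edits under plain Levenshtein distance, which is why the threshold here is 2 and not 1. A keyboard or sound distance would weight such errors differently.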
Parsing
Problem Sentences:
• The atmosphere is pretty bad and food is quite good
• The food, service, and atmosphere were fantastic!
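One way to see why the first sentence is a problem for n-gram features: it contains both a negative and a positive opinion word, and neither sits within a bigram or trigram of the aspect it describes. The sketch below checks this directly. The opinion-word sets are hypothetical and exist only for this example.

```python
def tokenize(text):
    """Lowercase, split on whitespace, and strip basic punctuation."""
    tokens = (w.strip(".,!?") for w in text.lower().split())
    return [t for t in tokens if t]

def ngrams(tokens, n):
    """Return all n-grams of exactly length n as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

sentence = "The atmosphere is pretty bad and food is quite good"
tokens = tokenize(sentence)
grams = set(ngrams(tokens, 2)) | set(ngrams(tokens, 3))

# Hypothetical opinion lexicon, for illustration only.
POSITIVE = {"good", "fantastic"}
NEGATIVE = {"bad"}

# The bag of words carries both polarities at once.
has_positive = any(t in POSITIVE for t in tokens)
has_negative = any(t in NEGATIVE for t in tokens)

# No bigram or trigram contains both an aspect and its opinion word.
atmosphere_linked = any("atmosphere" in g and "bad" in g for g in grams)
food_linked = any("food" in g and "good" in g for g in grams)
```

Here `has_positive` and `has_negative` are both true while `atmosphere_linked` and `food_linked` are both false, so the features cannot say which aspect is bad and which is good. The second sentence raises the opposite issue: a single opinion word applies to three aspects at once.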
Results
Features                                                             Training Set Accuracy  Training Set MSE  Test Set Accuracy  Test Set MSE
Unigram                                                              84.62%                 0.3398            57.36%             0.8231
Unigram with spell check                                             84.53%                 0.3256            57.12%             0.8297
Unigram/Bigram/Trigram                                               95.65%                 0.1058            57.18%             0.9181
Unigram/Bigram/Trigram with pruning                                  98.66%                 0.0321            56.71%             0.9052
Unigram/Bigram/Trigram with spell checking                           95.53%                 0.1088            57.27%             0.8984
Unigram/Bigram/Trigram with no stop words and spell check            95.38%                 0.1110            57.42%             0.8936
Unigram/Bigram/Trigram with no stop words, pruning, and spell check  98.77%                 0.0304            56.56%             0.8970
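The two metrics reported above can be computed as in the following sketch. It assumes MSE is taken over the star ratings treated as integers, which the slides do not state explicitly, and the labels below are made up for illustration and are not drawn from the data set.

```python
def accuracy(y_true, y_pred):
    """Fraction of reviews whose predicted rating equals the true rating."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def mean_squared_error(y_true, y_pred):
    """Average squared difference between predicted and true ratings.

    Unlike accuracy, this penalizes a 1-vs-5 mistake far more
    than a 4-vs-5 mistake.
    """
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

# Made-up ratings, for illustration only.
y_true = [5, 4, 5, 3, 1]
y_pred = [5, 5, 5, 4, 3]

acc = accuracy(y_true, y_pred)            # 2 of 5 exact matches
mse = mean_squared_error(y_true, y_pred)  # (0 + 1 + 0 + 1 + 4) / 5
```

Reporting both is useful on an ordinal scale: two classifiers with the same accuracy can differ in MSE if one of them tends to miss by several stars.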
Conclusions
• Inherently Difficult Data Set
• More Advanced Techniques Necessary