Opinion Lexicon - Columbia University


Sentiment Lexicons
Instructor: Smaranda Muresan
Columbia University
[email protected]
Announcements
• Class set up on Courseworks; the class website is
linked from Courseworks (the “Class website” tab)
• TA’s (Arpit Gupta) office hours
– Monday 4:15-5:15pm in TA room in Mudd
• TA’s email:
– [email protected]
Class Today
• Word level sentiment analysis (Sentiment Lexicons)
• Discussion of the two papers
• Introduction to Sentiment Analysis beyond words
(sentence level, text level)
(to facilitate discussion of articles next week)
What is sentiment analysis?
• Attempts to identify the sentiment/opinion that a
person may hold towards an object/person/topic, etc.
• It is a finer-grained analysis compared to subjectivity
analysis
Sentiment analysis: Positive / Negative / Neutral
Subjectivity analysis: Subjective / Objective
Example: “This film should be brilliant. It sounds like a great plot,
the actors are first grade, and the supporting cast is
good as well, and Stallone is attempting to deliver a
good performance. However, it can’t hold up.”
Why sentiment analysis?
• Movie: is this review positive or negative?
• Products: what do people think about the new iPhone?
• Public sentiment: how is consumer confidence? Is despair
increasing?
• Politics: what do people think about this candidate or issue?
• Prediction: predict election outcomes or market trends from
sentiment
Goal of today’s lecture
• Gain insights into how sentiment is expressed
lexically
• Begin developing resources that are useful in
higher level classification (phrase level,
sentence level, document level)
• Explore different philosophies on how to build
such large scale sentiment lexicons
What are we classifying?
gross
(gross,adj)
(gross,noun)
(gross,verb)
gross out
GROSS!!!
The soup was gross – 1 star
The horror movie was gross – 5 stars
Words
• Adjectives
– positive: honest important mature large patient
• Ron Paul is the only honest man in Washington.
• Kitchell’s writing is unbelievably mature and is only
likely to get better.
• To humour me my patient father agrees yet again to
my choice of film
Words
• Adjectives
– negative: harmful hypocritical inefficient
insecure
• It was a macabre and hypocritical circus.
• Why are they being so inefficient ?
Slide from Janyce Wiebe
Other parts of speech
• Verbs
– positive: praise, love
– negative: blame, criticize
• Nouns
– positive: pleasure, enjoyment
– negative: pain, criticism
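Word entries like these support a first, purely lexical baseline: look each word up and sum its polarity. A minimal sketch (the tiny lexicon below is illustrative, drawn from the example words on these slides, not from any published resource):

```python
# Minimal lexicon-based sentiment scorer (toy lexicon; illustrative only).
TOY_LEXICON = {
    "honest": 1, "mature": 1, "patient": 1, "praise": 1, "pleasure": 1,
    "harmful": -1, "hypocritical": -1, "inefficient": -1, "blame": -1, "pain": -1,
}

def lexicon_score(text):
    """Sum the polarity of known words; a positive total suggests positive sentiment."""
    tokens = text.lower().split()
    return sum(TOY_LEXICON.get(tok, 0) for tok in tokens)

print(lexicon_score("an honest and mature performance"))   # 2
print(lexicon_score("a harmful hypocritical circus"))      # -2
```

Real systems would use one of the large lexicons discussed next, plus proper tokenization and negation handling.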
How to build sentiment
lexicons
• Hand Annotated/Compiled Lexicons
• WordNet-based approaches
• Distributional Approaches
General Inquirer (GI)
• Harvard General Inquirer Database (Stone, 1966)
– Total of 11,788 terms
– http://www.wjh.harvard.edu/~inquirer/spreadsheet_guide.htm
– http://www.wjh.harvard.edu/~inquirer/homecat.htm
– Positive (1915 words) vs Negative (2291 words)
• (the remaining 7,582 could be considered Neutral)
– Strong vs Weak
– Active vs Passive
– Overstated vs Understated
– Pleasure, Pain, Virtue, Vice
– Motivation, Cognitive Orientation, etc.
WordNet (Miller, 1995; Fellbaum, 1998)
• Semantic Lexical resource
• http://wordnetweb.princeton.edu/perl/webwn
• www.globalwordnet.org (multilingual)
• Synsets denote the different senses of a word
Micro-WNOp (Cerini et al 2007)
• 1,105 WordNet synsets related to the opinion topic
(the initial words were selected from the GI)
• http://www-3.unipv.it/wnop/
Micro-WNOp (Cerini et al 2007)
Micro-WNOp statistics reduced to the 702 synsets on which
all annotators agreed
Issues with hand-built lexicons such as GI and Micro-WNOp?
How to build sentiment
lexicons
• Hand Annotated/Compiled Lexicons
• WordNet-based approaches
• Distributional Approaches
Simple sense/sentiment propagation
• Hypothesis: Sentiment is constant throughout regions of lexically
related items. Thus, sentiment properties of hand-built seed-sets
will be preserved as we follow WordNet relations out from them.
• SentiWordNet (Esuli and Sebastiani, 2006)
– Approx 1.7 Million words
– Using WordNet and Machine Learning (Classifiers).
– Each synset is assigned three scores
• Positive
• Negative
• Objective
The values in the three dimensions sum to 1.
Ex: P=0.75, N=0, O=0.25
Building SentiWordNet
• Lp, Ln, Lo are the three seed sets
• Iteratively expand the seed sets through K steps
• Train the classifier on the expanded sets
Expansion of seed sets
[Diagram: the positive and negative seed sets Lp and Ln expanding outward]
The sets at the end of the kth step are called Tr(k,p) and Tr(k,n);
Tr(k,o) is the set of synsets in neither Tr(k,p) nor Tr(k,n).
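The expansion step can be sketched as a graph walk: start from a seed set and follow lexical-relation edges for K iterations. The synonym graph below is a made-up toy; the real method walks WordNet relations.

```python
# Sketch of SentiWordNet-style seed expansion: follow "synonym" edges for K steps.
# SYNONYMS is a toy graph for illustration, not WordNet data.
SYNONYMS = {
    "good": {"nice", "fine"},
    "nice": {"pleasant"},
    "bad": {"awful"},
    "awful": {"terrible"},
}

def expand(seeds, k):
    """Return Tr(k, .): the seed set after k expansion steps."""
    current = set(seeds)
    for _ in range(k):
        frontier = set()
        for word in current:
            frontier |= SYNONYMS.get(word, set())
        current |= frontier
    return current

print(expand({"good"}, 2))  # {'good', 'nice', 'fine', 'pleasant'}
print(expand({"bad"}, 1))   # {'bad', 'awful'}
```

Low K keeps the sets small and precise; high K grows recall at the cost of precision, matching the observations on the next slide.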
Committee of classifiers
• Train a committee of classifiers of different
types and different K-values for the given data
• Observations:
– Low values of K give high precision and low recall
– Accuracy in determining positivity or negativity,
however, remains almost constant
Useful Sentiment Tutorial
• http://sentiment.christopherpotts.net/
• Has code related to WordNet propagation
methods (used in SentiWordNet)
• Many other pointers!
• Issues with the WordNet based propagation
lexicons?
Other Sentiment Lexicons
MPQA Subjectivity Cues Lexicon
Theresa Wilson, Janyce Wiebe, and Paul Hoffmann (2005). Recognizing Contextual Polarity in
Phrase-Level Sentiment Analysis. Proc. of HLT-EMNLP-2005.
Riloff and Wiebe (2003). Learning extraction patterns for subjective expressions. EMNLP-2003.
• Home page:
http://www.cs.pitt.edu/mpqa/subj_lexicon.html
• 6885 words from 8221 lemmas
– 2718 positive
– 4912 negative
• Each word annotated for intensity (strong, weak)
• GNU GPL
Bing Liu Opinion Lexicon
Minqing Hu and Bing Liu. Mining and Summarizing Customer Reviews. ACM SIGKDD-2004.
• Bing Liu's Page on Opinion Mining
• http://www.cs.uic.edu/~liub/FBS/opinionlexicon-English.rar
• 6786 words
– 2006 positive
– 4783 negative
Disagreements between polarity lexicons
Christopher Potts, Sentiment Tutorial, 2011
Pairwise disagreement (disagreeing entries / entries shared by both lexicons):

                   Opinion Lexicon   General Inquirer   SentiWordNet
MPQA               33/5402 (0.6%)    49/2867 (2%)       1127/4214 (27%)
Opinion Lexicon                      32/2411 (1%)       1004/3994 (25%)
General Inquirer                                        520/2306 (23%)
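Disagreement figures like those in the table are computed by intersecting two lexicons and counting conflicting labels. A minimal sketch, using invented toy lexicons:

```python
# Disagreement rate between two polarity lexicons,
# each a dict mapping word -> "pos"/"neg". Toy data for illustration.
def disagreement(lex_a, lex_b):
    shared = set(lex_a) & set(lex_b)
    diff = sum(1 for w in shared if lex_a[w] != lex_b[w])
    return diff, len(shared)

a = {"good": "pos", "bad": "neg", "sick": "neg"}
b = {"good": "pos", "bad": "neg", "sick": "pos", "fine": "pos"}
diff, total = disagreement(a, b)
print(f"{diff}/{total} ({diff / total:.0%})")  # 1/3 (33%)
```

Note the pattern in the table: the hand-built lexicons agree closely with each other, while SentiWordNet (automatically built) disagrees with all of them far more often.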
How to build sentiment
lexicons
• Hand Annotated/Compiled Lexicons
• WordNet-based approaches
• Distributional Approaches
– 2 papers for discussion today
Predicting the semantic orientation of adjectives
Hatzivassiloglou & McKeown 1997
Presenter: Smaranda Muresan
Goal
• Predicting polarity of adjectives from a large
corpus
• Test the hypothesis:
the morphosyntactic properties of coordination
provide reliable information about adjectival
oppositions and lexical polarities
• Adjectives conjoined by “and” have same polarity
– Fair and legitimate, corrupt and brutal
– *fair and brutal, *corrupt and legitimate
• Adjectives conjoined by “but” do not
– fair but brutal
Approach
• Extract conjunctions of adjectives from a large
corpus, along with relevant morphological
relations
• Use a log-linear regression model to predict
whether two adjectives have the same or different
orientation
• Use a clustering algorithm to separate the
adjectives into two subsets of different
orientation
• Use the average frequency in each group to assign
the label (the group with the higher average
frequency is labeled positive)
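The last two steps can be sketched as follows: split the adjectives into two clusters from pairwise dissimilarities, then label the cluster with the higher average corpus frequency as positive. The dissimilarity scores and frequencies below are invented toy values, and the greedy clustering is a simplification of the paper's algorithm.

```python
# Sketch of steps 3-4: greedy two-way clustering from pairwise dissimilarities,
# then labeling by average frequency. All numbers are toy data.
DISSIM = {  # near 0 = same orientation likely, near 1 = opposite likely
    ("fair", "helpful"): 0.1, ("fair", "brutal"): 0.9,
    ("helpful", "brutal"): 0.8, ("brutal", "corrupt"): 0.2,
    ("fair", "corrupt"): 0.9, ("helpful", "corrupt"): 0.85,
}
FREQ = {"fair": 120, "helpful": 90, "brutal": 40, "corrupt": 30}

def d(a, b):
    return DISSIM.get((a, b), DISSIM.get((b, a), 0.5))

def two_cluster(words):
    """Seed two clusters with the most dissimilar pair, then assign the rest."""
    seed_a, seed_b = max(((a, b) for a in words for b in words if a < b),
                         key=lambda p: d(*p))
    ca, cb = {seed_a}, {seed_b}
    for w in words:
        if w in (seed_a, seed_b):
            continue
        avg_a = sum(d(w, x) for x in ca) / len(ca)
        avg_b = sum(d(w, x) for x in cb) / len(cb)
        (ca if avg_a < avg_b else cb).add(w)
    return ca, cb

def label(ca, cb):
    mean = lambda c: sum(FREQ[w] for w in c) / len(c)
    return (ca, cb) if mean(ca) >= mean(cb) else (cb, ca)

pos, neg = label(*two_cluster(["fair", "helpful", "brutal", "corrupt"]))
print(pos, neg)
```

On this toy input the frequent cluster {fair, helpful} comes out positive and {brutal, corrupt} negative, mirroring the labeling heuristic described later in the talk.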
Seed data
• Label a seed set of 1336 adjectives (all occurring more than 20 times
in a 21-million-word Wall Street Journal corpus)
– 657 positive
• adequate central clever famous intelligent remarkable
reputed sensitive slender thriving…
– 679 negative
• contagious drunken ignorant lanky listless primitive
strident troublesome unresolved unsuspecting…
Further validation: ask 4 human judges to label a subset of
500 adjectives: 96.97% average inter-judge agreement
Validating the Hypothesis
• Run a parser on the 21-million-word dataset to get 15,048 conjunction
tokens involving 9,296 distinct adjective pairs.
• Each conjunction was classified by:
– 1) the conjunction used (and, or, but, …)
– 2) the type of modification (attributive, predicative)
– 3) the number of the modified noun (singular or plural)
• Considered conjunctions where both members were in the seed set
(e.g. clever and sensitive)
• Count the percentage of conjunctions in each category joining adjectives
of the same or different orientation
Validating Hypothesis
For almost all cases the p-values are low, so the statistics are significant:
‘and’ usually joins adjectives of the same orientation;
‘but’ is the opposite and usually joins adjectives of different orientation.
Link Prediction
[Graph: adjectives brutal, helpful, corrupt, nice, fair, irrational, classy
linked by predicted same-/different-orientation edges]
• Baseline: always predict same orientation: 77.84%
• the “but” rule
• morphological rules (adequate-inadequate)
• Better idea: supervised learning using log-linear regression
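The morphological rule above can be sketched as a simple prefix check: an adjective and its negation-prefixed form are predicted to have opposite orientation. The prefix list is illustrative, not exhaustive.

```python
# Sketch of the morphological rule: a word and its negation-prefixed form
# (in-, un-, dis-, im-, ir-) are predicted to have opposite orientation.
NEG_PREFIXES = ("in", "un", "dis", "im", "ir")

def opposite_by_morphology(a, b):
    """True if one word looks like a negation-prefixed form of the other."""
    for x, y in ((a, b), (b, a)):
        for p in NEG_PREFIXES:
            if x == p + y:
                return True
    return False

print(opposite_by_morphology("adequate", "inadequate"))  # True
print(opposite_by_morphology("fair", "brutal"))          # False
```

Rules like this supply high-precision links, but coverage is low, which is why the paper moves to supervised learning over all the conjunction evidence.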
Result of Prediction
• The log-linear regression model performs
slightly better than the baseline
Clustering for partitioning the graph
into two groups
The log-linear model generates a dissimilarity score
between 0 and 1 for each pair of adjectives.
[Graph: the adjectives brutal, helpful, corrupt, nice, fair,
irrational, classy partitioned into two clusters]
Labeling the clusters
Two key insights about pairs of words of opposite orientation:
- the semantically unmarked member has positive orientation
(e.g. honest (unmarked) vs dishonest (marked))
- the semantically unmarked member is the more frequent
[Graph: the two clusters of adjectives, with the positive cluster labeled +]
Output polarity lexicon
• Positive
– bold decisive disturbing generous good honest
important large mature patient peaceful positive
proud sound stimulating straightforward strange
talented vigorous witty…
• Negative
– ambiguous cautious cynical evasive harmful
hypocritical inefficient insecure irrational irresponsible
minor outspoken pleasant reckless risky selfish
tedious unsupported vulnerable wasteful…
Evaluating Clustering of Adjectives
• Tried to account for graph connectivity
• Used the adjectives from the seed set (A) and the links
given by conjunctions and morphological rules
• Split into training/testing using a parameter α
– a higher α selects a subset of A in which more
adjectives are connected to each other
Clustering Results
• The highest accuracy was obtained when the highest number
of links was present.
• The ratio of group frequencies correctly identified the
positive subgroup.
Graph Connectivity and Performance
• Parameter P measures how well each link is
predicted independently (precision)
• Parameter k is the average number of links per
adjective
• Goal: even if P is low, given enough data (high
k), high performance for group prediction is
achieved
Results
Discussion points
What do you see as the major contribution of this paper?
- It highlights in a quantitative way the relationship
between sentiment and particular words and
constructions (coordination): a useful linguistic insight
- It is a corpus-based method (thus avoiding the limitations
of human-built resources such as WordNet)
- It can be extended to nouns and verbs.
• A classic paper, cited 1127 times
Discussion points
• Does it have all the information for anyone to
be able to replicate the results?
– How is the dissimilarity value computed? (multiple
values are delivered for an adjective pair in
different environments)
• What are the limitations of the approach?
– Method is limited by human cleverness in coming
up with useful constructions
Velikovich et al
Class Today
• Word level sentiment analysis (Sentiment Lexicons)
• Discussion of the two papers
• Introduction to Sentiment Analysis beyond words
(phrase level, text level)
(to facilitate discussion of articles next week)
What is sentiment analysis?
• Attempts to identify the sentiment/opinion/attitude
that a person may hold towards an
object/person/topic etc
Components
1. Holder (source) of attitude
2. Target (aspect) of attitude
3. Type of attitude
• From a set of types
– Like, love, hate, value, desire, etc.
• Or (more commonly) simple weighted polarity:
– positive, negative, neutral, together with strength
4. Text containing the attitude
• Sentence or entire document
This film should be brilliant. It sounds like a great plot,
the actors are first grade, and the supporting cast is
good as well, and Stallone is attempting to deliver a
good performance. However, it can’t hold up.
Sentiment Analysis
• Simplest task:
– Is the attitude of this text positive or negative?
• More complex:
– Rank the attitude of this text from 1 to 5
• Advanced:
– Detect the target, source, or complex attitude
types
Sentiment Analysis
A Baseline Algorithm
Sentiment Classification in Movie Reviews
Bo Pang, Lillian Lee, and Shivakumar Vaithyanathan. 2002. Thumbs up? Sentiment Classification using Machine
Learning Techniques. EMNLP-2002, 79—86.
Bo Pang and Lillian Lee. 2004. A Sentimental Education: Sentiment Analysis Using Subjectivity Summarization Based
on Minimum Cuts. ACL, 271-278
• Polarity detection:
– Is an IMDB movie review positive or negative?
• Data: Polarity Data 2.0:
– http://www.cs.cornell.edu/people/pabo/movie-review-data
Text Classification: definition
• The classifier (test phase):
– Input: a document d (e.g., a movie review)
– Output: a predicted class c from some fixed set of
labels c1,...,cK
(e.g., pos, neg)
• The learner (training phase):
– Input: a set of m hand-labeled documents
(d1,c1),....,(dm,cm)
– Output: a learned classifier f: d → c
IMDB data in the Pang and Lee
database
✓
when _star wars_ came out some twenty years ago
, the image of traveling throughout the stars has
become a commonplace image . […]
when han solo goes light speed , the stars change
to bright lines , going towards the viewer in lines
that converge at an invisible point .
cool .
_october sky_ offers a much simpler image–that of
a single white dot , traveling horizontally across the
night sky . [. . . ]
✗
“ snake eyes ” is the most aggravating
kind of movie : the kind that shows so
much potential then becomes
unbelievably disappointing .
it’s not just because this is a brian
depalma film , and since he’s a great
director and one who’s films are always
greeted with at least some fanfare .
and it’s not even because this was a film
starring nicolas cage and since he gives a
brauvara performance , this film is hardly
worth his talents .
Baseline Algorithm (adapted from
Pang and Lee)
• Tokenization
• Feature Extraction
• Classification using different classifiers
– Naïve Bayes
– MaxEnt
– Support Vector Machines (SVM)
Sentiment Tokenization Issues
• Deal with HTML and XML markup
• Twitter mark-up (names, hash tags)
• Capitalization (preserve for words in all caps)
• Phone numbers, dates
• Emoticons
• Useful code:
– Christopher Potts sentiment tokenizer
– Brendan O’Connor twitter tokenizer
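The idea behind those tokenizers can be sketched with one regular expression that keeps emoticons, hashtags, and @-mentions intact. This pattern is a rough simplification for illustration, not the Potts or O'Connor code.

```python
import re

# Minimal sentiment-aware tokenizer sketch: keeps emoticons, hashtags and
# @-mentions as single tokens. A simplification, not the real tokenizers.
TOKEN_RE = re.compile(r"""
    [<>]?[:;=8][\-o\*']?[\)\]\(\[dDpP/\\]   # emoticons like :-) ;D :(
  | [@\#]\w+                                # @mentions and #hashtags
  | \w+(?:'\w+)?                            # words, with simple apostrophes
  | [^\w\s]                                 # any remaining symbol
""", re.VERBOSE)

def tokenize(text):
    return TOKEN_RE.findall(text)

print(tokenize("GREAT movie :-) #loved it, didn't you @bob?"))
# ['GREAT', 'movie', ':-)', '#loved', ...]
```

A plain whitespace split would shred ":-)" and "#loved"; ordering the emoticon alternative first is what protects them.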
Extracting Features for Sentiment
Classification
• How to handle negation
– I didn’t like this movie
vs
– I really like this movie
• Which words to use?
– Only adjectives
– All words
• All words turns out to work better, at least on this data
Negation
Das, Sanjiv and Mike Chen. 2001. Yahoo! for Amazon: Extracting market sentiment from stock
message boards. In Proceedings of the Asia Pacific Finance Association Annual Conference (APFA).
Bo Pang, Lillian Lee, and Shivakumar Vaithyanathan. 2002. Thumbs up? Sentiment Classification using Machine
Learning Techniques. EMNLP-2002, 79—86.
Add NOT_ to every word between negation and following punctuation:
didn’t like this movie , but I
didn’t NOT_like NOT_this NOT_movie but I
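The NOT_ transformation above can be implemented as a single pass over the tokens, toggling a flag at negation words and punctuation:

```python
import re

# Sketch of the Das & Chen / Pang et al. negation trick: prefix NOT_ to every
# token between a negation word and the next punctuation mark.
NEGATION_RE = re.compile(r"\b(?:not|no|never)\b|n't\b", re.IGNORECASE)
PUNCT = {",", ".", "!", "?", ";", ":"}

def mark_negation(tokens):
    out, negating = [], False
    for tok in tokens:
        if tok in PUNCT:
            negating = False
            out.append(tok)
        elif negating:
            out.append("NOT_" + tok)
        else:
            out.append(tok)
            if NEGATION_RE.search(tok):
                negating = True
    return out

print(mark_negation(["didn't", "like", "this", "movie", ",", "but", "I"]))
# ["didn't", 'NOT_like', 'NOT_this', 'NOT_movie', ',', 'but', 'I']
```

The NOT_-prefixed forms then act as distinct features, so "like" and "NOT_like" get separate weights in the classifier.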
Classification methods
• Naïve Bayes
• MaxEnt
• SVM
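The first of these can be sketched end to end: a minimal multinomial Naive Bayes over bag-of-words features with add-one smoothing. The four training "reviews" are invented toy data, not the Pang and Lee corpus.

```python
import math
from collections import Counter

# Minimal multinomial Naive Bayes for polarity, with add-one smoothing.
# Toy training data for illustration only.
def train_nb(docs):
    """docs: list of (token_list, label). Returns model parameters."""
    labels = {lab for _, lab in docs}
    priors, word_counts, totals = {}, {}, {}
    vocab = {w for toks, _ in docs for w in toks}
    for lab in labels:
        in_class = [toks for toks, l in docs if l == lab]
        priors[lab] = math.log(len(in_class) / len(docs))
        counts = Counter(w for toks in in_class for w in toks)
        word_counts[lab] = counts
        totals[lab] = sum(counts.values())
    return priors, word_counts, totals, vocab

def classify(tokens, priors, word_counts, totals, vocab):
    def score(lab):
        s = priors[lab]
        for w in tokens:
            if w in vocab:  # ignore unseen words
                s += math.log((word_counts[lab][w] + 1) / (totals[lab] + len(vocab)))
        return s
    return max(priors, key=score)

train = [
    (["great", "plot", "brilliant"], "pos"),
    (["talented", "great", "cast"], "pos"),
    (["boring", "disappointing", "plot"], "neg"),
    (["awful", "boring", "cast"], "neg"),
]
model = train_nb(train)
print(classify(["great", "talented"], *model))  # pos
```

MaxEnt and SVM replace the generative scoring with discriminative training, which (as noted on a later slide) tends to perform better on this task.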
Evaluating Classification
• Evaluation must be done on test data that are independent of
the training data
– usually a disjoint set of instances
• Classification accuracy: c/n where n is the total number of
test instances and c is the number of test instances correctly
classified by the system.
– Adequate if one class per document
• Results can vary due to sampling error from different
training and test sets.
– Average results over multiple training/test splits of the
overall data for more reliable estimates.
Slide from Chris Manning
Cross-Validation
• Break up data into 10
folds
– (Equal positive and
negative inside each fold?)
• For each fold
– Choose the fold as a
temporary test set
– Train on 9 folds, compute
performance on the test
fold
• Report average
performance of the 10
runs
[Diagram: iterations shown side by side; in each iteration a different
fold serves as the Test set and the remaining folds are Training]
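The procedure above can be sketched in a few lines. The "classifier" here is a stub that predicts the majority training label, just to make the loop runnable; the data are toy documents.

```python
# Sketch of k-fold cross-validation: each fold serves once as the test set
# while the remaining folds train a majority-label stub classifier.
def k_fold_accuracy(data, k=10):
    folds = [data[i::k] for i in range(k)]  # round-robin split into k folds
    accuracies = []
    for i, test in enumerate(folds):
        train = [ex for j, f in enumerate(folds) if j != i for ex in f]
        labels = [lab for _, lab in train]
        majority = max(set(labels), key=labels.count)
        correct = sum(1 for _, lab in test if lab == majority)
        accuracies.append(correct / len(test))
    return sum(accuracies) / k  # report the average over the k runs

data = [(f"doc{i}", "pos" if i < 14 else "neg") for i in range(20)]
print(k_fold_accuracy(data, k=10))  # 0.7
```

Note the round-robin split does not balance positive and negative examples inside each fold; a stratified split (as the slide hints) would.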
Other issues in Classification
• MaxEnt and SVM tend to do better than Naïve
Bayes
Problems: what makes reviews hard to classify?
• Subtlety:
– Perfume review in Perfumes: the Guide:
• “If you are reading this because it is your darling fragrance,
please wear it at home exclusively, and tape the windows
shut.”
Thwarted Expectations
and Ordering Effects
• “This film should be brilliant. It sounds like a great plot,
the actors are first grade, and the supporting cast is
good as well, and Stallone is attempting to deliver a
good performance. However, it can’t hold up.”
• Well as usual Keanu Reeves is nothing special, but
surprisingly, the very talented Laurence Fishbourne is
not so good either, I was surprised.
Due Next Class
• Readings
– Chapter 4 from Pang and Lee “Opinion Mining and
Sentiment Analysis” book
– 2 papers for discussions
• A short data analysis assignment
– Description on Courseworks under Assignments
– Goal is to get a better understanding of data and the
problems discussed in class
– Grade: Excellent/Good/Insufficient
– Due before class. No late submissions
Next class
• Discussion of 2 papers (50 minutes)
– 25 minutes per paper
– Prepare a 15-min presentation and lead the discussion
for 10 minutes
• 5 min break
• More in depth lecture on sentiment analysis &
open questions (can lead to ideas for projects)
– 30 minutes
• Introduction to Emotion/Mood (25 minutes)
Announcements
• The assignment of papers for discussion will be
done by Saturday, Feb 1, 5pm.
• TA office hours
– 4:15-5:15pm Mondays in the TA room in Mudd
• TA email:
[email protected]
Email TA if you’d like a tutorial on Text Classification
and existing toolkits
Announcements
• Grading policy slightly updated to include data
analysis assignments
– 10% data analysis assignments (3 assignments,
graded Excellent/Good/Insufficient). No late
submissions! See the class website for details
– 30% discussion of papers
– 60% project
• 10% literature review part
• 5% class presentation
• 45% final paper and project