Topic-Sentiment Mixture: Modeling Facets and Opinions in Weblogs Qiaozhu Mei†, Xu Ling†, Matthew Wondra†, Hang Su‡, and ChengXiang Zhai† † University of Illinois at Urbana-Champaign ‡

Topic-Sentiment Mixture:
Modeling Facets and Opinions
in Weblogs
Qiaozhu Mei†, Xu Ling†, Matthew Wondra†,
Hang Su‡, and ChengXiang Zhai†
† University of Illinois at Urbana-Champaign
‡ Yahoo! Inc.
1
Why Opinion Analysis?
• Customers: need peer opinions to make
purchase decisions
• Business providers:
– need customers’ opinions to improve products
– need to track opinions to make marketing decisions
• Social researchers: want to know people’s
reactions to social events
• Government: wants to know people’s reactions
to a new policy
• Psychology, education, etc.
2
An Illustrative Example
Should I buy an iPod?
• What do people say about iPod?
Price, battery, warranty,
nano, … (Topics)
• Thumbs up or thumbs down?
Positive, negative,
neutral… (Sentiments)
• What aspects are good/bad?
Sound is good, battery is bad..
(Faceted opinions)
• Are their opinions changing?
Negative before 2005, but positive
recently… (Dynamics)
3
Why Extract Opinions from Blogs?
• Easy to collect: huge amount, clean format
• Broadly distributed: demographics
• Topic diversified: free discussion about any
topic/product/event
• Opinion rich: highly personalized
4
Evidence from Blog Search
• Topic diversity
• Availability
• Broad distribution
• Opinion rich
– Positive: …the trail leads to fascinating places that are richly …
– Negative: …when I first watched the big-screen version of The Da Vinci Code, I fell asleep twice. Not once. Twice! …
5
Existing Blog-opinion Analysis Work
• Opinmind: sentiment classification/search of blogs
No faceted analysis, no neutral fact description:
Not informative enough to support decision making
6
Existing Blog-opinion Analysis
Work (Cont.)
• Use content to predict
sales
– Blog level topic analysis
– Information Diffusion
through blogspace
– Use topic bursting to
predict sales spikes
– E.g., [Gruhl et al. 2005]
[from Gruhl et al. 2005]
No sentiment analysis, no faceted analysis:
what if the hot discussion is “Negative”?
Hot criticisms may not lead to sales spikes
7
What’s Missing Here?
• Discussions are faceted
– E.g. iPod: battery? Price? Nano? …
– Usually different opinions on different facets
• Opinions have polarities
– Positive, negative, and neutral …
– Non-discriminative analysis may lead to
wrong decisions
• Opinions are changing over time …
8
Our Goal
• Model the mixture of facets and opinions (topics and sentiments)
• Generate a faceted opinion summary for an ad hoc query
• Track the change of opinions over time
Topic-sentiment summary (Query: Dell Laptop)
Topic 1 (Price)
– Positive: it is the best site and they show Dell coupon code as early as possible
– Negative: Even though Dell's price is cheaper, we still don't want it.
– Neutral: mac pro vs. dell precision: a price comparis..
– Neutral: DELL is trading at $24.66
Topic 2 (Battery)
– Positive: One thing I really like about this Dell battery is the Express Charge feature.
– Negative: my Dell battery sucks
– Negative: Stupid Dell laptop battery
– Neutral: i still want a free battery from dell..
[Figure: Topic-sentiment dynamics (Topic = Price): strength over time for the Positive, Negative, and Neutral curves]
9
Challenges in
Opinion Analysis from Blogs
• Topics and sentiments are mixed together
• No existing facet structure for ad hoc topics
• Difficult to identify sentiment polarities
• Difficult to associate sentiment polarities with facets
• Difficult to segment topics and sentiments
– Tracking sentiment dynamics
10
Our Approach: Modeling Topic-Sentiment Mixture
• Use language models to represent facets and
sentiments
– Facets represented with topic models, extracted in an
unsupervised/semi-supervised way
– Sentiment models extracted in a supervised way
• Model the mixture of topics and sentiments with
a probabilistic generative model
• Segment associated topics and sentiments with
a topical hidden Markov model
11
Probabilistic Model of Topic-Sentiment
Mixture
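• Choose a facet (subtopic) θ_i
• Draw a word from the mixture of topics and sentiments (F, P, N)
Word distributions:
– Facet 1 (θ_1): battery 0.3, life 0.2, ..
– Facet 2 (θ_2): nano 0.1, release 0.05, screen 0.02, ..
– …
– Facet k (θ_k): apple 0.2, microsoft 0.1, compete 0.05, ..
– Background (B): is 0.05, the 0.04, a 0.03, ..
– Positive (θ_P): love 0.2, awesome 0.05, good 0.01, ..
– Negative (θ_N): suck 0.07, hate 0.06, stupid 0.02, ..
[Figure: each facet 1 … k branches into a neutral (F), a positive (P), and a negative (N) component, next to the background B; example generated words: battery, love, hate, the]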
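The word-generation steps on this slide can be sketched in a few lines of Python. This is only an illustration: the listed word probabilities come from the slide, while the `<other>` entries, the two-facet setup, the background weight, and the per-document weights `pi_d` and `delta_d` are assumptions made up for the example.

```python
import random

# Toy language models. The listed word probabilities are from the slide;
# the "<other>" entries are an assumption that absorbs the remaining mass.
FACETS = [
    {"battery": 0.3, "life": 0.2, "<other>": 0.5},
    {"nano": 0.1, "release": 0.05, "screen": 0.02, "<other>": 0.83},
]
BACKGROUND = {"is": 0.05, "the": 0.04, "a": 0.03, "<other>": 0.88}
POSITIVE = {"love": 0.2, "awesome": 0.05, "good": 0.01, "<other>": 0.74}
NEGATIVE = {"suck": 0.07, "hate": 0.06, "stupid": 0.02, "<other>": 0.85}


def draw(dist, rng):
    """Draw one key from a {item: probability} dictionary."""
    items = list(dist)
    return rng.choices(items, weights=[dist[i] for i in items], k=1)[0]


def generate_word(pi_d, delta_d, lambda_b, rng):
    """Generate one (source label, word) pair for a document d.

    pi_d[j]    : probability that document d picks facet j
    delta_d[j] : {"F": .., "P": .., "N": ..} neutral/positive/negative weights
    lambda_b   : probability of emitting a background word
    """
    if rng.random() < lambda_b:
        return "B", draw(BACKGROUND, rng)
    j = rng.choices(range(len(FACETS)), weights=pi_d, k=1)[0]
    source = draw(delta_d[j], rng)  # "F", "P", or "N"
    model = {"F": FACETS[j], "P": POSITIVE, "N": NEGATIVE}[source]
    return f"{source}{j + 1}", draw(model, rng)


# Hypothetical document-level parameters, chosen only for illustration.
rng = random.Random(0)
pi_d = [0.7, 0.3]
delta_d = [{"F": 0.5, "P": 0.1, "N": 0.4}, {"F": 0.6, "P": 0.3, "N": 0.1}]
sample = [generate_word(pi_d, delta_d, 0.3, rng) for _ in range(10)]
```

Each element of `sample` pairs a word with the component that produced it (for example `N1` for a negative word about facet 1), which is the hidden information the model has to recover from real text.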
12
The “Generation” Process
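[Figure: a word w in document d is drawn from the background model B with probability λ_B; otherwise, with probability 1 − λ_B, a facet j ∈ {1, …, k} is chosen with probability π_{dj}, and w is drawn from the facet's topic model θ_j (neutral, facts), the positive model θ_P, or the negative model θ_N with probabilities δ_{j,d,F}, δ_{j,d,P}, and δ_{j,d,N}]
\[
\log(C) = \sum_{d \in C} \sum_{w \in V} c(w,d) \log\Big[ \lambda_B\, p(w \mid B) + (1-\lambda_B) \sum_{j=1}^{k} \pi_{dj} \big( \delta_{j,d,F}\, p(w \mid \theta_j) + \delta_{j,d,P}\, p(w \mid \theta_P) + \delta_{j,d,N}\, p(w \mid \theta_N) \big) \Big]
\]
• p(w|θ_i), p(w|θ_P), p(w|θ_N)
can be estimated with a
Maximum Likelihood
Estimator (MLE) through
an EM algorithm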
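To make the objective concrete, the sketch below evaluates the collection log-likelihood for fixed parameters: a background term weighted by λ_B plus, for each facet, a weighted mix of the facet, positive, and negative word distributions. It does not implement the EM updates themselves, and every number in the toy setup is a made-up assumption.

```python
import math


def word_prob(w, topics, p_b, p_pos, p_neg, pi_d, delta_d, lambda_b):
    """p(w | d): background plus the facet/sentiment mixture."""
    themed = 0.0
    for j, theta_j in enumerate(topics):
        mix = (delta_d[j]["F"] * theta_j.get(w, 0.0)
               + delta_d[j]["P"] * p_pos.get(w, 0.0)
               + delta_d[j]["N"] * p_neg.get(w, 0.0))
        themed += pi_d[j] * mix
    return lambda_b * p_b.get(w, 0.0) + (1.0 - lambda_b) * themed


def log_likelihood(counts, topics, p_b, p_pos, p_neg, pi, delta, lambda_b):
    """Sum over documents d and words w of c(w, d) * log p(w | d).

    Assumes every observed word has non-zero probability under the mixture;
    otherwise math.log raises ValueError.
    """
    total = 0.0
    for d, doc_counts in enumerate(counts):
        for w, c in doc_counts.items():
            total += c * math.log(
                word_prob(w, topics, p_b, p_pos, p_neg, pi[d], delta[d], lambda_b))
    return total


# Toy setup (all values are illustrative assumptions): one facet, one document.
topics = [{"battery": 1.0}]
p_b = {"the": 0.5, "battery": 0.1, "love": 0.2, "hate": 0.2}
p_pos = {"love": 1.0}
p_neg = {"hate": 1.0}
pi = [[1.0]]
delta = [[{"F": 0.5, "P": 0.25, "N": 0.25}]]
counts = [{"battery": 2, "love": 1}]
ll = log_likelihood(counts, topics, p_b, p_pos, p_neg, pi, delta, lambda_b=0.5)
```

An EM procedure would alternate between assigning each word occurrence fractionally to the components and re-estimating the parameters, and this quantity should not decrease from one iteration to the next.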
13
Learning Sentiment Models
• Problem: Sentiment expressions are topic-biased
– E.g., “fearful” is negative in general, but how about for
a ghost movie?
– E.g., “heavy” is positive for rock music, but how about
for laptops?
• Impossible to create training data for every ad
hoc topic
• Solution:
– Collect sentiment-labeled data with diversified topics
– Learn a general sentiment model from the mixed training data in
training mode
– Use this general sentiment model as a prior to get the topic-biased
sentiment models in testing mode
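One common way to use a general model as a prior is a conjugate Dirichlet prior, which turns the maximum a posteriori estimate into a simple interpolation between observed counts and the prior. The slide only says "prior", so the Dirichlet form, the strength parameter `mu`, and all the numbers below are assumptions for illustration.

```python
from collections import Counter


def map_estimate(counts, prior, mu):
    """MAP word distribution with a Dirichlet prior centered on `prior`.

    p(w) = (c(w) + mu * prior(w)) / (sum of all counts + mu)
    A larger mu keeps the estimate closer to the general model.
    """
    vocab = set(counts) | set(prior)
    total = sum(counts.values()) + mu
    return {w: (counts.get(w, 0.0) + mu * prior.get(w, 0.0)) / total
            for w in vocab}


# Hypothetical general negative model learned from mixed-topic training data.
general_negative = {"suck": 0.5, "hate": 0.3, "fearful": 0.2}
# Hypothetical counts of words attributed to the negative model for one topic.
topic_counts = Counter({"suck": 6, "hate": 3, "fearful": 1})

weak = map_estimate(topic_counts, general_negative, mu=1.0)
strong = map_estimate(topic_counts, general_negative, mu=1000.0)
```

With a small `mu` the topic-specific counts dominate, so a word like "fearful" can lose negative weight for a ghost-movie topic; with a large `mu` the estimate stays near the general model.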
16
Estimating Topic Models
• Problem: no existing facet structure for ad hoc
topics
• Unsupervised extraction: facets might not be
what you like
– E.g., user wants “battery”, “price” and “sound quality”
– System returns “ipod nano”, “ipod video”, “ipod
shuffle”..
• Solution: Incorporate user specified interests into
automatically extracted facets
– User provides hints; add priors into the topic model
– Using MAP estimation instead of MLE
– See paper for technical details
17
Sentiment Segmentation and
Dynamics Tracking
• Design a topic-sentiment enhanced HMM
• Associate states with topic/sentiment models
• Learn the transition prob. and segment the text
• Plot the sentiment dynamics by counting segments over time (tagged with each facet and sentiment)
[Figure: HMM over the states T1, T2, T3, P, N, B, and E]
Example text to segment: … the battery really sucks and it's really heavy in my part but where could you find laptops so affordable nowadays?...
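Segmenting text with an HMM whose states are tied to topic and sentiment language models amounts to decoding the most likely state sequence. The sketch below uses standard Viterbi decoding over three states. The slides say the transition probabilities are learned; here they are fixed, hypothetical values, and the emission tables and the smoothing floor are also assumptions.

```python
import math

FLOOR = 1e-3  # assumed probability for words a state does not list


def viterbi(tokens, states, start, trans, emit):
    """Return the most likely state sequence for `tokens` (log-space Viterbi)."""
    def log_emit(s, w):
        return math.log(emit[s].get(w, FLOOR))

    # best[s] = log-probability of the best path that ends in state s
    best = {s: math.log(start[s]) + log_emit(s, tokens[0]) for s in states}
    back = []  # back[t][s] = best predecessor of s at position t + 1
    for w in tokens[1:]:
        new_best, pointers = {}, {}
        for s in states:
            prev = max(states, key=lambda r: best[r] + math.log(trans[r][s]))
            new_best[s] = best[prev] + math.log(trans[prev][s]) + log_emit(s, w)
            pointers[s] = prev
        best = new_best
        back.append(pointers)

    # Trace back from the best final state.
    state = max(states, key=lambda s: best[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]


def segments(tokens, path):
    """Group consecutive tokens that share a state into (state, words) pairs."""
    out = []
    for w, s in zip(tokens, path):
        if out and out[-1][0] == s:
            out[-1][1].append(w)
        else:
            out.append((s, [w]))
    return out


# Hypothetical setup: one topic state plus positive and negative states.
states = ["T1", "P", "N"]
start = {s: 1.0 / 3.0 for s in states}
trans = {r: {s: (0.8 if r == s else 0.1) for s in states} for r in states}
emit = {
    "T1": {"battery": 0.5, "life": 0.4},
    "P": {"love": 0.6, "awesome": 0.3},
    "N": {"suck": 0.5, "hate": 0.4},
}
tokens = ["battery", "life", "love", "awesome", "hate"]
path = viterbi(tokens, states, start, trans, emit)
segs = segments(tokens, path)
```

Counting the resulting segments per state and per time bucket would then give the kind of curve the last bullet describes.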
18
Experiment Setup
• Training data for sentiment models (diversified topics,
downloaded from Opinmind)
Topic         # Pos  # Neg    Topic       # Pos  # Neg
laptops       346    142      people      441    475
movies        396    398      banks       292    229
universities  464    414      insurances  354    297
airlines      283    400      nba teams   262    191
cities        500    500      cars        399    334
• Test dataset: created by querying Google blog search
and crawling from original sites (ad hoc)
Dataset        # docs  Time Period    Query Term
iPod           2988    01/06 ~ 11/06  ipod
Da Vinci Code  1000    01/06 ~ 10/06  da+vinci+code
19
Results: General Sentiment Models
• Sentiment models trained from a diversified topic mixture
vs. single topics
Pos-Cities  Neg-Cities  Pos-Mix    Neg-Mix
beautiful   hate        love       suck
love        suck        awesome    hate
awesome     people      good       stupid
amaze       traffic     miss       ass
live        drive       amaze      fuck
good        fuck        pretty     horrible
night       stink       job        shitty
nice        move        god        crappy
time        weather     yeah       terrible
air         city        bless      people
greatest    transport   excellent  evil
[Figure: KL divergence between the learnt θ_P and θ_N and an unseen topic, plotted against the number of topics mixed in the training data]
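For readers unfamiliar with the measure used in this figure, KL divergence between two word distributions can be computed as below. The direction of the divergence and any smoothing used in the original experiment are not stated on the slide, so the `eps` floor and the toy distributions here are assumptions.

```python
import math


def kl_divergence(p, q, eps=1e-9):
    """KL(p || q) = sum over w of p(w) * log(p(w) / q(w)).

    q is floored at eps so that words missing from q do not cause
    a division by zero; this floor is an assumed smoothing choice.
    """
    return sum(pw * math.log(pw / max(q.get(w, 0.0), eps))
               for w, pw in p.items() if pw > 0.0)


# Toy positive-word distributions, invented for illustration.
learnt = {"love": 0.5, "good": 0.5}
unseen = {"love": 0.25, "good": 0.75}
same = kl_divergence(learnt, learnt)
diff = kl_divergence(learnt, unseen)
```

A lower value means the learnt sentiment model is closer to the sentiment model of the held-out topic.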
20
Results: Facets and Topic Models (I)
• Facets for iPod :
No Prior                               With Prior
Battery, nano  Marketing  Ads, spam    Nano   Battery
battery        apple      free         nano   battery
shuffle        microsoft  sign         color  shuffle
charge         market     offer        thin   charge
nano           zune       freepay      hold   usb
dock           device     complete     model  hour
itune          company    virus        4gb    mini
usb            consumer   freeipod     dock   life
hour           sale       trial        inch   rechargable
21
Results: Facets and Topic Models (II)
• Facets for the Da Vinci Code
No Prior                             With Prior
Story    Book       Background       Movie   Religion
landon   author     jesus            movie   religion
secret   idea       mary             hank    belief
murder   holy       gospel           tom     cardinal
louvre   court      magdalene        film    fashion
thrill   brown      testament        watch   conflict
clue     blood      gnostic          howard  metaphor
neveu    copyright  constantine      ron     complaint
curator  publish    bible            actor   communism
22
Results: Faceted Opinions
(the Da Vinci Code)
Facet 1: Movie
– Neutral:
  ... Ron Howards selection of Tom Hanks to play Robert Langdon.
  Directed by: Ron Howard Writing credits: Akiva Goldsman ...
  After watching the movie I went online and some research on ...
– Positive:
  Tom Hanks stars in the movie,who can be mad at that?
  Tom Hanks, who is my favorite movie star act the leading role.
  Anybody is interested in it?
– Negative:
  But the movie might get delayed, and even killed off if he loses.
  protesting ... will lose your faith by ... watching the movie.
  ... so sick of people making such a big deal about a FICTION book and movie.
Facet 2: Book
– Neutral:
  I remembered when i first read the book, I finished the book in two days.
  I’m reading “Da Vinci Code” now.
  …
– Positive:
  Awesome book.
  So still a good book to past time.
– Negative:
  ... so sick of people making such a big deal about a FICTION book and movie.
  This controversy book cause lots conflict in west society.
23
Results: Comparison with Opinmind
• Faceted opinions from TSM
Facet: iPod Nano
– Thumbs Up: (sweat) iPod Nano ok so ...
– Thumbs Up: Ipod Nano is a cool design, ...
– Thumbs Down: WHAT IS THIS SHIT??!!
– Thumbs Down: ipod nanos are TOO small!!!!
Facet: Battery
– Thumbs Up: the battery is one serious example of excellent relibability
– Thumbs Down: Poor battery life ...
– Thumbs Down: ...iPod’s battery completely died
Facet: iPod Video
– Thumbs Up: My new VIDEO ipod arrived!!!
– Thumbs Up: Oh yeah! New iPod video
– Thumbs Down: fake video ipod
– Thumbs Down: Watch video podcasts ...
• Opinions from Opinmind
Thumbs Up                         Thumbs Down
I love my iPod, I love my G5...   I hate ipod.
I love my little black 60GB iPod  Stupid ipod out of batteries...
I LOVE MY iPOD                    “ hate ipod ” = 489..
I love my iPod.                   my iPod looked uglier...surface...
- I love my iPod.                 i hate my ipod.
... iPod video looks SO awesome   ... microsoft ... the iPod sucks
24
Results: Sentiment Dynamics
• Facet: the book “the da vinci code” (bursts during the movie, Pos > Neg)
• Facet: the impact on religious beliefs (bursts during the movie, Neg > Pos)
25
Summary and Future Work
• Algorithm: A new way to model the mixture of
topics and sentiments
• Application: A new way to summarize faceted
opinions, and track their dynamics
• Future Work:
– Beyond unigram language model?
– Better segmentation of sentiments and topics?
– Adapting existing facet structures?
– Develop an end user application for opinion analysis
26
Thank You!
27