A Holistic Lexicon-Based Approach to Opinion Mining


Conference on Web Search and Data Mining (WSDM’08)
Xiaowen Ding, Bing Liu, Philip S. Yu
Department of Computer Science
University of Illinois at Chicago
1
Target: Customer Reviews of Products
 an increasing number of people are writing reviews
→ users who do not read all the reviews may come away with a biased opinion
→ businesses need to track how their products are received
 It is thus highly desirable to produce a summary of reviews
 opinion mining or sentiment analysis
 product features that have been commented on by reviewers
 whether the comments are positive or negative (or neutral)
 lexicon-based method
 “small” can indicate a positive or a negative opinion on a
product feature depending on the product feature and the
context
2


Essentially, the classic work:
 M. Hu and B. Liu. Mining and summarizing
customer reviews. KDD’04, 2004.

was outperformed by:
 A-M. Popescu and O. Etzioni. Extracting
Product Features and Opinions from Reviews.
EMNLP-05, 2005.
3

Review - Amazon
4
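Solves the problem of opinion words whose
orientation depends on the context
Solves the problem of one sentence containing both
positive and negative opinions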
5



a holistic approach that can accurately
infer the semantic orientation of an opinion
word based on the review context
a new function aggregating multiple
opinion words in the same sentence
performs better than existing state-of-the-art
methods
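The slides do not spell out the aggregation function. One natural choice, assumed here for illustration, is a distance-weighted sum: each opinion word contributes its orientation (+1 or -1) divided by its token distance to the feature, so nearer words weigh more, and the sign of the sum gives the orientation. `aggregate_orientation` and the toy `LEXICON` are hypothetical names.

```python
# Toy context-independent lexicon (hypothetical): +1 positive, -1 negative.
# "short" is treated as negative only for simplicity; its real orientation
# depends on the feature it describes.
LEXICON = {"good": 1, "amazing": 1, "great": 1, "poor": -1, "bad": -1, "short": -1}


def aggregate_orientation(tokens, feature_index, lexicon=LEXICON):
    """Return +1, -1 or 0 for the feature at tokens[feature_index].

    Each opinion word adds orientation / distance, so opinion words
    closer to the feature dominate the decision.
    """
    score = 0.0
    for i, token in enumerate(tokens):
        orientation = lexicon.get(token.lower())
        if orientation is None or i == feature_index:
            continue
        score += orientation / abs(i - feature_index)
    if score > 0:
        return 1
    if score < 0:
        return -1
    return 0
```

For "the picture quality is good but the battery life is short", "good" sits nearer to "quality" and "short" nearer to "life", so the two features get opposite orientations even though both opinion words occur in the same sentence.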
6
Two main research directions are sentiment classification and
feature-based opinion mining
 Document level vs. Sentence level
▪ based on identification of opinion words or phrases
▪ corpus-based approaches
▪ dictionary-based approaches
 Holistic lexicon-based approach to identifying the orientations of
context dependent opinion words is closely related to works that
identify domain opinion words
 use conjunction rules to find such words from large domain
corpora
 “this room is beautiful and spacious”
 “the battery life is very long” && “it takes a long time to focus”
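A toy version of the conjunction idea: orientations spread from seed words through "X and Y" (same orientation) and "X but Y" (opposite orientation) links found in a corpus. The regex matching, the seed set, and the name `propagate` are assumptions; a real system would restrict matches to adjectives with a part-of-speech tagger.

```python
import re
from collections import deque


def propagate(sentences, seeds):
    """Spread orientations from seed words through "and"/"but" links.

    "and" links words with the same orientation; "but" links words
    with opposite orientations. Returns a dict word -> +1/-1.
    """
    links = {}  # word -> list of (neighbour, relation sign)
    pattern = re.compile(r"\b(\w+) (and|but) (\w+)\b")
    for sentence in sentences:
        for left, conj, right in pattern.findall(sentence.lower()):
            rel = 1 if conj == "and" else -1
            links.setdefault(left, []).append((right, rel))
            links.setdefault(right, []).append((left, rel))
    orientation = dict(seeds)
    queue = deque(seeds)  # start the breadth-first search from the seed words
    while queue:
        word = queue.popleft()
        for neighbour, rel in links.get(word, []):
            if neighbour not in orientation:
                orientation[neighbour] = orientation[word] * rel
                queue.append(neighbour)
    return orientation
```

Note that such corpus-level propagation assigns one orientation per word, which is exactly what fails for "long" in the battery-life and focusing examples above.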

7

Object
 the entity that has been commented on (the thing being reviewed)
 has a set of components (or parts) and also a
set of attributes (or properties): its constituents
 can be hierarchically decomposed according
to the part-of relationship (a hierarchy of components)
8

Example 1





A particular brand of digital camera: object
Battery: component
Picture resolution (pixels): attribute
Battery life: attribute of a component
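The part-of decomposition in Example 1 can be held in a tree. A minimal sketch, assuming each node stores its own attributes and child components; the class name `Node` and the path-string format are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A node in the part-of hierarchy: an object or one of its components."""
    name: str
    attributes: list[str] = field(default_factory=list)
    components: list["Node"] = field(default_factory=list)

    def features(self, prefix=""):
        """Yield every attribute and component below this node as a path string."""
        path = f"{prefix}{self.name}"
        for attr in self.attributes:
            yield f"{path}/{attr}"
        for comp in self.components:
            yield f"{path}/{comp.name}"
            yield from comp.features(prefix=f"{path}/")


camera = Node(
    "camera",
    attributes=["resolution"],
    components=[Node("battery", attributes=["battery life"])],
)
```

Here `list(camera.features())` lists the object's attribute, its component, and the component's attribute, each of which can be a target of opinions.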
Example 2
“I do not like this camera”
 Users can express opinions on an object, a component, or an attribute
 Example 3
“the picture quality of this camera is poor”
 “The battery life of this camera is too short”
 “This camera is too large”
▪ “Size” is an implicit feature in this sentence, as it does not appear
in the sentence
▪ “large” is called a feature indicator
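Explicit features can be matched by name, while implicit ones must be inferred from feature indicators. A minimal sketch, assuming a hand-built indicator-to-feature table; the table entries and `find_features` are hypothetical.

```python
# Hypothetical mapping from feature-indicator adjectives to implicit features.
INDICATOR_TO_FEATURE = {
    "large": "size",
    "small": "size",
    "heavy": "weight",
    "expensive": "price",
    "reliable": "reliability",
}


def find_features(tokens, explicit_features):
    """Return (feature, kind) pairs found in a tokenized sentence.

    Explicit features (single or multi-word) are matched by name;
    implicit features are inferred from feature-indicator adjectives.
    """
    found = []
    text = " ".join(t.lower() for t in tokens)
    for feature in explicit_features:
        if f" {feature} " in f" {text} ":
            found.append((feature, "explicit"))
    for token in tokens:
        implicit = INDICATOR_TO_FEATURE.get(token.lower())
        if implicit is not None:
            found.append((implicit, "implicit"))
    return found
```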
9

“The picture quality is good, but the battery life is short”.


Definition (explicit and implicit opinion): An explicit opinion
on feature f is a subjective sentence that directly expresses a
positive or negative opinion. An implicit opinion on feature f is
an objective sentence that implies an opinion.
Example 4: The following sentence expresses an explicit
positive opinion:
 “The picture quality of this camera is amazing.”
 The following sentence expresses an implicit negative opinion:
 “The earphone broke in two days.”
10


Definition (opinion holder)
 The holder of a particular opinion is the person or the
organization that holds the opinion.
 “John expressed his disagreement on the treaty”
Definition (semantic orientation of an opinion)
 The semantic orientation of an opinion on a feature f
states whether the opinion is positive, negative or
neutral.
Complex case:
“the view-finder and the lens of this camera are
too close”
11




Both F and W are unknown. Then, in opinion analysis,
we need to perform three tasks:
Task 1
 Identifying and extracting object features that have
been commented on in each review d ∈ D.
Task 2
 Determining whether the opinions on the features are
positive, negative or neutral.
Task 3
 Grouping synonyms of features, as different people
may use different words to express the same feature.
12

F is known but W is unknown. This is
similar to Problem 1, but slightly easier.

All three tasks of Problem 1 still need
to be performed,

but Task 3 becomes the problem of
matching discovered features with the set
of given features F
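When F is given, Task 3 reduces to mapping each discovered feature phrase onto a member of F. A minimal sketch, assuming a hand-built synonym set for each given feature; the feature names, synonyms, and `match_to_given` are hypothetical, and a real system would need fuzzier matching.

```python
# Given feature set F with hand-written synonyms (hypothetical data).
GIVEN_FEATURES = {
    "picture quality": {"picture quality", "image quality", "photo quality"},
    "battery life": {"battery life", "battery"},
    "size": {"size", "dimensions"},
}


def match_to_given(discovered, given=GIVEN_FEATURES):
    """Map each discovered feature phrase to a feature in F, or None if unmatched."""
    result = {}
    for phrase in discovered:
        key = phrase.lower().strip()
        result[phrase] = next(
            (f for f, synonyms in given.items() if key in synonyms), None
        )
    return result
```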
13

W is known (then F is also known).

We only need to perform Task 2 above,
namely, determining whether the opinions
on the known features are positive,
negative or neutral

after all the sentences that contain them
are extracted.
14


The final output for each evaluative text d
is a set of pairs.
Each pair is denoted by (f, SO)
 f is a feature
 SO is the semantic or opinion orientation
(positive or negative) expressed in d on
feature f
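The (f, SO) pairs produced for many reviews can be rolled up into the feature-based summary motivated at the start. A sketch assuming SO is encoded as +1 (positive) or -1 (negative); `summarize` is a hypothetical name.

```python
from collections import Counter, defaultdict


def summarize(reviews):
    """Count positive and negative opinions per feature.

    `reviews` is a list of reviews, each a set of (feature, SO) pairs
    with SO = +1 (positive) or -1 (negative).
    """
    summary = defaultdict(Counter)
    for pairs in reviews:
        for feature, so in pairs:
            summary[feature]["positive" if so > 0 else "negative"] += 1
    return {f: dict(c) for f, c in summary.items()}
```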
15

Use the opinion words around each product feature in a
review sentence to determine the opinion orientation on
the product feature (by collecting opinion words and idioms)
1. how to combine multiple opinion words (which may be
conflicting) to arrive at the final decision
2. how to deal with context or domain dependent opinion
words without any prior knowledge from the user
3. how to deal with many important language constructs
which can change the semantic orientations of opinion
words
16
The feature itself can be an
opinion word, as it may be
an adjective representing a
feature indicator, e.g.,
“This camera is very reliable”
Negation Rules
“But” Clause Rules
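A minimal sketch of the two rules, under simplifying assumptions that are ours rather than the slides': a negation word within three tokens before an opinion word flips that word's orientation, and when one side of "but" has no known opinion word it is assumed to be opposite to the other side. The lexicon, window size, and function names are hypothetical.

```python
NEGATIONS = {"not", "no", "never"}
LEXICON = {"good": 1, "great": 1, "amazing": 1, "reliable": 1, "poor": -1, "bad": -1}


def clause_orientation(tokens, lexicon=LEXICON, window=3):
    """Sum word orientations in a clause, flipping a word's orientation
    when a negation appears within `window` tokens before it."""
    total = 0
    for i, token in enumerate(tokens):
        so = lexicon.get(token.lower())
        if so is None:
            continue
        preceding = {t.lower() for t in tokens[max(0, i - window):i]}
        if preceding & NEGATIONS:
            so = -so  # negation rule
        total += so
    return (total > 0) - (total < 0)


def sentence_orientations(tokens):
    """Return (before, after) orientations around the first "but".

    If one clause has no opinion word, assume it is the opposite of
    the other clause (simplified "but" clause rule).
    """
    lowered = [t.lower() for t in tokens]
    if "but" not in lowered:
        so = clause_orientation(tokens)
        return so, so
    k = lowered.index("but")
    before = clause_orientation(tokens[:k])
    after = clause_orientation(tokens[k + 1:])
    if after == 0:
        after = -before
    elif before == 0:
        before = -after
    return before, after
```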
17
18
 Adjectives as feature indicators
▪ “this camera is very small”
 Explicit features that are not adjectives
▪ “the battery life of this camera is long”
Majority voting
 Intra-sentence conjunction rule
 “the battery life is very long”
 “This camera takes great pictures and has a long battery life”
 Pseudo intra-sentence conjunction rule
 “The camera has a long battery life, which is great”
 Inter-sentence conjunction rule
 “The picture quality is amazing. The battery life is long”
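The conjunction rules and the majority vote can be sketched for context-dependent words. This is a simplified reading under our own assumptions: sentences are already reduced to (feature, opinion word) pairs, contrast words such as "but" are ignored, the pseudo intra-sentence rule is treated like the intra-sentence rule, and each (feature, word) pair takes the majority of its votes across all reviews. `infer_context_words` and the toy lexicon are hypothetical.

```python
from collections import Counter, defaultdict

# Toy context-independent lexicon (hypothetical).
LEXICON = {"great": 1, "amazing": 1, "good": 1, "poor": -1, "bad": -1}


def sign(x):
    return (x > 0) - (x < 0)


def infer_context_words(reviews, lexicon=LEXICON):
    """Infer orientations of (feature, opinion word) pairs missing from the lexicon.

    Each review is a list of sentences; each sentence is a list of
    (feature, opinion_word) pairs. Neighbouring opinions are assumed
    to agree, since contrast words are not modelled.
    """
    votes = defaultdict(Counter)
    for sentences in reviews:
        prev_so = 0
        for pairs in sentences:
            known = sign(sum(lexicon.get(w, 0) for _, w in pairs))
            for feature, word in pairs:
                if word in lexicon:
                    continue
                # Intra-sentence rule first, then the inter-sentence rule.
                so = known if known != 0 else prev_so
                if so != 0:
                    votes[(feature, word)][so] += 1
            prev_so = known
    # Majority vote across the whole review collection; ties give 0.
    return {pair: sign(c[1] - c[-1]) for pair, c in votes.items()}


reviews = [
    [[("camera", "great"), ("battery life", "long")]],
    [[("picture quality", "amazing")], [("battery life", "long")]],
    [[("focus time", "long"), ("focus", "poor")]],
]
```

On this toy input, "long" comes out positive for "battery life" (one intra-sentence and one inter-sentence vote) and negative for "focus time", matching the context dependence the rules are meant to capture.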
19