An Opinion Mining System for Chinese Automobile Reviews
Tianfang Yao Qingyang Nie Jianchao Li Linlin Li Decheng Lou Ke Chen Yu Fu
Department of Computer Science and Engineering, Shanghai Jiao Tong University
800 Dong Chuan Rd., Shanghai 200240, China
1. Introduction
As online shopping becomes more popular, the number of product reviews written by
customers is growing rapidly as well, so it is difficult for a customer to read all of
the reviews and make a well-founded decision about whether to purchase a given product.
Our main task is to extract the opinions that customers express in reviews about different
features of different car brands, and to determine whether these opinions are positive,
negative, or neutral and how strong they are. This paper introduces Surveyer, a practical
system that accomplishes these opinion mining tasks with natural language processing
techniques, together with its related algorithms.
2. Interface of Opinion Observer
In this system, users can make two kinds of comparisons: between different brands and
between different parts of a given car.
In the left figure, six products are selected for comparison. Users choose brands from the
left column of the interface and “compared cars” from the top menu, and a bar chart appears
on the right. Bars above the x-axis show the number of positive opinions (in red) and bars
below the x-axis show the number of negative opinions (in blue), so the statistical
evaluation of consumer reviews can be read at a glance.
The right figure looks much the same as the left one; the main difference is that it deals
with features of cars. It gives a clear impression of how consumers view different features
of each product.
4. Resource Building: Ontology & Polarity Dictionary
Ontology:
There are two taxonomies in our ontology, which represent cars and features of cars. Each
category in a taxonomy has a name, weight attributes, and extra information such as synonyms
of the name. All categories are arranged in a hierarchical structure to describe relations
between different cars or car features.
Polarity Dictionary:
The polarity dictionary contains two kinds of words: polarity words and modifier words.
Polarity words have six attributes: text, POS, def, exceptional-feature, dynamic-polarity,
and strength. The text attribute is the word itself. The POS attribute gives the word's part
of speech. The def attribute is the concept definition of the word from HowNet. The
exceptional-feature and dynamic-polarity attributes handle special cases in which a word has
a polarity different from its basic polarity. For example, the word “high” is positive when it
modifies the word “quality”, but negative when it modifies the word “price”. The strength
attribute reflects how strong the word's polarity is. Modifier words are words that can
strengthen, weaken, or even reverse the polarity of polarity words, and they have attributes
very similar to those of polarity words.
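The “high quality” versus “high price” case can be illustrated with a minimal Python sketch. It is not the Surveyer implementation: the class names, the numeric `factor` on modifiers, and the use of a single feature-to-polarity mapping in place of the separate exceptional-feature and dynamic-polarity attributes are all assumptions made for illustration, and the HowNet definition is left as a placeholder.

```python
from dataclasses import dataclass, field

@dataclass
class PolarityWord:
    text: str
    pos: str
    definition: str      # stands in for the "def" attribute (HowNet concept)
    polarity: int        # basic polarity: +1 positive, -1 negative
    strength: float
    # Assumed encoding of the special cases: feature name -> overriding polarity.
    exceptional_features: dict = field(default_factory=dict)

@dataclass
class Modifier:
    text: str
    factor: float        # assumed: >1 strengthens, 0..1 weakens, negative reverses

def score(word, feature, modifiers=()):
    """Signed opinion strength of `word` when it modifies `feature`."""
    polarity = word.exceptional_features.get(feature, word.polarity)
    value = polarity * word.strength
    for modifier in modifiers:
        value *= modifier.factor
    return value

# Hypothetical entries; the values are illustrative only.
high = PolarityWord("high", "adj", "(HowNet definition omitted)", +1, 1.0,
                    exceptional_features={"price": -1})
very = Modifier("very", 1.5)
not_ = Modifier("not", -1.0)

print(score(high, "quality"))          # 1.0
print(score(high, "price"))            # -1.0
print(score(high, "price", [very]))    # -1.5
print(score(high, "quality", [not_]))  # -1.0
```

Keeping the override on the polarity word rather than on the feature mirrors the description above, where the special-case attributes belong to the dictionary entry.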
3. System Architecture
[Figure: system architecture. Labels: Corpus, Comments, Topic, Preprocess, Simple Sentence
Split, Element Construction (Syntactic Parser, POS Tagger, Ontology, Polarity Dictionary),
Elements, Pronominal Resolution & Ellipsis Recovery, Elements Resoluted, Element
Reconstruction, Constituent Relation Extraction, Patterns, Structured Analysis Result,
Paragraph Polarity Analysis, Merged Structured Analysis Result]
The corpus used in our system consists of reviews from a bulletin board, available at
the following website: http://autobbs.pconline.com.cn/
Many reviews in the corpus are written with irregular punctuation, so criteria for splitting
sentences need to be built first. Each sentence is then processed in a stage we call
element construction, in which we use several tools and resources (the syntactic parser,
POS tagger, Ontology, and Polarity Dictionary) to build a dependency syntactic structure and
assign tags to each word in the sentence according to its potential use in the following
stages. The pronominal resolution and ellipsis recovery model mainly deals with feature
words, which in our system are car names or names of car features. After that comes a stage
that reconstructs the elements. In the last two stages, we first identify constituent
relations using a pattern library built from training data, and then summarize the opinions
at the paragraph level. Finally, the results can be visualized with the Opinion Observer.
5. Pattern Generation and Effective Evaluation
[Figure: pattern types. Labels: car-feature-patterns, feature-polarity-patterns,
car-polarity-patterns, car-car-patterns, feature-feature-patterns,
polarity-modifier-patterns; patterns from syntactic parser, from POS tagger; POS for each
word; up-route, down-route; rules]
Generation: Two kinds of features are used to generate patterns: the syntactic nodes in the
parse tree and the part of speech of each related word. Four annotators hand-crafted the
training data, and rules are generated automatically from the annotated texts according to
predefined criteria. Several optimization methods are applied before the automatically
generated rules are put into the pattern library, which is the source for identifying new
relations.
Evaluation: Several tests were used to evaluate the effectiveness of this pattern-building
method. With human-annotated test data as the gold standard, we obtained an average recall
of 80% and precision of 60%, mainly on feature-polarity-patterns and car-polarity-patterns.
Most mistakes concern polarity strength, while the direction of polarity is correct most of
the time. The result is quite promising in that this method achieves relatively high
performance with only part-of-speech and syntactic features. In future research, we will
consider adding more features to rebuild the pattern knowledge base.
6. A Self-developed Annotation Tool
The Surveyer annotation tool is designed not only to meet the needs of annotation, but also
to describe the processing flow of the system. You can get a clear view of how Surveyer
extracts opinions and determines their polarity step by step. You can also export the
automatically generated rule file from annotated data here.
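As a rough illustration of how rules might be generated from annotated data and then used to identify new relations, the Python sketch below derives part-of-speech patterns from annotated feature-polarity pairs. It is a simplification under stated assumptions, not the Surveyer algorithm: it uses only POS tags (no syntactic nodes), it treats a minimum frequency as the sole "predefined criterion", and the function names, tag set, and English example sentences are all hypothetical.

```python
from collections import Counter

def pos_pattern(tokens, feature_idx, polarity_idx):
    """Return (feature_comes_first, POS tags spanning the two positions)."""
    lo, hi = sorted((feature_idx, polarity_idx))
    tags = tuple(pos for _, pos in tokens[lo:hi + 1])
    return (feature_idx < polarity_idx, tags)

def generate_rules(annotated, min_count=2):
    """Keep every pattern seen at least `min_count` times (assumed criterion)."""
    counts = Counter()
    for tokens, relations in annotated:
        for feature_idx, polarity_idx in relations:
            counts[pos_pattern(tokens, feature_idx, polarity_idx)] += 1
    return {pattern for pattern, n in counts.items() if n >= min_count}

def find_relations(tokens, rules, feature_idxs, polarity_idxs):
    """Pair feature and polarity words whose spanning POS pattern is a known rule."""
    return [(f, p) for f in feature_idxs for p in polarity_idxs
            if pos_pattern(tokens, f, p) in rules]

# Hypothetical annotated data: (word, POS) tokens plus (feature, polarity) index pairs.
annotated = [
    ([("engine", "n"), ("is", "v"), ("very", "d"), ("quiet", "a")], [(0, 3)]),
    ([("seat", "n"), ("is", "v"), ("quite", "d"), ("comfortable", "a")], [(0, 3)]),
    ([("good", "a"), ("brakes", "n")], [(1, 0)]),
]
rules = generate_rules(annotated)

new_sentence = [("steering", "n"), ("feels", "v"), ("really", "d"), ("heavy", "a")]
print(find_relations(new_sentence, rules, feature_idxs=[0], polarity_idxs=[3]))
```

The pattern seen only once (polarity word before feature word) is dropped by the frequency threshold, which is one simple way to filter unreliable rules before they enter a pattern library.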