Using a Sentiment Map for Visualizing Credibility of News Sites on the Web
Yukiko Kawai*, Yusuke Fujita*, Tadahiko Kumamoto**,
Jianwei Zhang*, Katsumi Tanaka***
* Kyoto Sangyo University, Japan
** Chiba Institute of Technology, Japan
*** Kyoto University, Japan
Outline
Background
Research goal
System overview
Offline processing
Online processing
Experimental evaluation
Conclusion and future work
Background
A question:
What is your attitude towards the “Iraq war”: do you agree or disagree?
To answer this question, I want to read some news to form an opinion about this topic.
Rapid spread of web news sites (e.g., MSN, Google News)
Different sites may have different opinions about the topic
Background
[Figure: A reader asks “Is the Iraq war right or wrong?” and consults a single news site. If it is a pro-war site, the reader concludes “I agree with this war”; if it is an anti-war site, the reader concludes “I disagree with this war”.]
A misconception may arise if the sites’ tendencies are not known in advance.
Sentiment tendencies of sites
[Figure: The reader asks the same question and sees the positive/negative tendencies of Site A and Site B.]
“Well, now I have the opinions of different sites.”
This may lead to a more fair-minded judgment.
Information credibility is improved.
The concept of a sentiment map
A sentiment graph is mapped to the location of each news site.
[Figure: Sentiment map for the query “Iraq war”, showing a positive/negative graph and the top-ranked articles from each news site.]
Demonstration
System overview
Offline processing (Preprocessing)
crawling news sites on the Web: Yomiuri (Osaka), Yomiuri (Tokyo), Asahi (Tokyo), ・・・
news articles collection
morphological analysis
tf-idf value calculation
sentiment values calculation (using the sentiment dictionary)
articles database (including tf-idf, sentiment values)
Online processing (Runtime processing): from a query to a sentiment map
1) retrieve articles from each news site
2) rank the articles based on tf-idf in each site
3) calculate the average of sentiment values for each site
4) generate a sentiment map
Offline processing
News articles collection
Crawl news articles from various news sites and store them into DB
News articles analysis
Eliminate HTML tags
Perform morphological analysis to extract nouns, verbs, and adjectives
Calculate the tf-idf value of each extracted word j for each news article pi:
$$\mathit{tfidf}_{ij} = \frac{\log(F_j + 1)}{\log(F_{all})} \cdot \log\frac{N}{N_j}$$
Fj: the frequency of word j in article pi
Fall: the number of all words in pi
N: the number of all articles
Nj: the number of articles including j
Attach a sentiment vector to each news article
Use a sentiment dictionary
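The tf-idf step on this slide can be sketched in Python as follows. This is a minimal illustration, assuming the formula reads log(Fj + 1) / log(Fall) × log(N / Nj); the function name `tf_idf`, the token-list input, and the guard for one-word articles are assumptions made for the example, not part of the presented system.

```python
import math
from collections import Counter

def tf_idf(articles):
    """articles: list of token lists (extracted nouns, verbs, adjectives).

    Returns one {word: weight} dict per article.
    """
    n_articles = len(articles)                      # N
    # N_j: number of articles that contain word j
    doc_freq = Counter(word for tokens in articles for word in set(tokens))
    weights = []
    for tokens in articles:
        counts = Counter(tokens)                    # F_j for each word j
        f_all = len(tokens)                         # F_all
        # Assumption: avoid dividing by log(1) = 0 for one-word articles.
        denom = math.log(f_all) if f_all > 1 else 1.0
        weights.append({
            word: math.log(f_j + 1) / denom * math.log(n_articles / doc_freq[word])
            for word, f_j in counts.items()
        })
    return weights
```

A word that appears in every article gets weight 0, because log(N / Nj) = log(1) = 0.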
Sample of sentiment dictionary
Sentiment (e), e = a, b, c, d:
a: Dark ⇔ Bright, b: Rejection ⇔ Acceptance, c: Tension ⇔ Relaxation, d: Fear ⇔ Anger

Entry word (w)   a       b       c       d
challenge        0.618   0.687   0.752   0.500
collide          0.344   0.353   0.315   0.529
death            0.28    0.358   0.260   0.364
derailment       0.31    0.546   0.403   0.291
revival          0.91    0.521   0.429   0.000
rich             0.597   0.676   0.761   0.466

Example: Oc(death) = 0.260
• Sentiment value Oe(w) of an entry word w
• A value between 0 and 1 (e.g., 0: dark, 1: bright)
• Calculated by analyzing co-occurrence with the original sentiment words, based on 200 million articles of Nikkei newspapers
Calculation of sentiment value Oe(w)
• Sentiments and their corresponding original sentiment words

Sentiment (e = a, b, c, d)   Original sentiment words e1    Original sentiment words e2
a: Bright ⇔ Dark             bright, glad, happy            dark, sad, painful
b: Acceptance ⇔ Rejection    approval, love, like           reject, aversion, dislike
c: Relaxation ⇔ Tension      comfortable, peaceful, slow    tension, emergency
d: Anger ⇔ Fear              angry, roar                    fear, scary, dread

Sentiment value:
$$O_e(w) = \frac{P(e_1, w)}{P(e_1, w) + P(e_2, w)}, \qquad P(e, w) = \frac{df(e \,\&\, w)}{df(e)}$$
df(e): the number of occurrences of the original sentiment words e
df(e&w): the number of co-occurrences of the original sentiment words e and an entry word w
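As a rough illustration of the sentiment value calculation, the sketch below assumes the formula reads Oe(w) = P(e1, w) / (P(e1, w) + P(e2, w)) with P(e, w) = df(e&w) / df(e). The function names, the neutral fallback of 0.5, and the counts in the usage note are invented for the example and do not come from the Nikkei data.

```python
def cooccurrence_ratio(df_e_and_w, df_e):
    """P(e, w) = df(e & w) / df(e); returns 0.0 if e never occurs."""
    return df_e_and_w / df_e if df_e else 0.0

def sentiment_value(df_e1_and_w, df_e1, df_e2_and_w, df_e2):
    """O_e(w): share of the e1 side in the two co-occurrence ratios."""
    p1 = cooccurrence_ratio(df_e1_and_w, df_e1)
    p2 = cooccurrence_ratio(df_e2_and_w, df_e2)
    if p1 + p2 == 0:
        # Assumption: with no co-occurrence evidence, treat the word as neutral.
        return 0.5
    return p1 / (p1 + p2)
```

With made-up counts, `sentiment_value(30, 1000, 90, 1000)` gives a value of about 0.25: the word co-occurs three times as often with the e2 words, so its value falls well below 0.5.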
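Example: sentiment value of the word “death” on dimension c (Relaxation ⇔ Tension):
Oc(death) = 0.260
This is because the co-occurrence counts with the e1 words, df(“comfortable” & “death”), df(“peaceful” & “death”), and df(“slow” & “death”), are much smaller (<<) than those with the e2 words, df(“tension” & “death”) and df(“emergency” & “death”).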
Sentiment vector O(TEXT) of a news article
a news article text = TEXT
TEXT has n keywords
keywords = {w1, ..., wn}
Each sentiment value Oe(TEXT) is the average over the keywords:
$$O_e(\mathrm{TEXT}) = \frac{1}{n} \sum_{i=1}^{n} O_e(w_i)$$
Sentiment vector O(TEXT) of the article:
$$O(\mathrm{TEXT}) = (O_a(\mathrm{TEXT}), O_b(\mathrm{TEXT}), O_c(\mathrm{TEXT}), O_d(\mathrm{TEXT}))$$
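A small sketch of the article-level vector, assuming each Oe(TEXT) is the plain average of the keyword values. The three dictionary entries are copied from the sample dictionary slide; the function name and the choice to skip keywords missing from the dictionary are assumptions for illustration.

```python
# Values for axes (a, b, c, d), copied from the sample dictionary slide.
DICTIONARY = {
    "challenge": (0.618, 0.687, 0.752, 0.500),
    "collide": (0.344, 0.353, 0.315, 0.529),
    "rich": (0.597, 0.676, 0.761, 0.466),
}

def article_sentiment(keywords, dictionary):
    """Average the sentiment values of an article's keywords on each axis.

    Assumption: keywords without a dictionary entry are ignored, so n is
    the number of keywords that were found. Returns None if none are found.
    """
    vectors = [dictionary[w] for w in keywords if w in dictionary]
    if not vectors:
        return None
    n = len(vectors)
    # zip(*vectors) groups the values axis by axis: all a's, all b's, ...
    return tuple(sum(axis) / n for axis in zip(*vectors))
```

For example, `article_sentiment(["challenge", "rich"], DICTIONARY)` averages the two rows axis by axis, giving roughly (0.608, 0.682, 0.757, 0.483).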
Online processing
When a user enters query keywords,
1. Retrieve news articles including the keywords
2. Rank articles based on tf-idf values for each news site
3. Calculate the average of sentiment vectors of top n articles for
each site
4. Attach sentiment graphs to corresponding locations of news sites
Also present a list of articles grouped by each site
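Steps 1 to 3 above can be sketched as follows. This is only an illustration under several assumptions: the record layout (`site`, `tfidf`, `sentiment`), requiring every query keyword to appear in an article, ranking by the sum of the keywords' tf-idf values, and the default `top_n` are all hypothetical choices. Step 4, drawing the graphs on the map, is not covered.

```python
from collections import defaultdict

def site_sentiments(articles, query_terms, top_n=10):
    """articles: dicts with 'site', 'tfidf' ({word: weight}), 'sentiment' (4-tuple).

    Returns {site: averaged sentiment vector of its top-ranked articles}.
    """
    by_site = defaultdict(list)
    for art in articles:
        # Step 1: keep articles that include all query keywords.
        if all(term in art["tfidf"] for term in query_terms):
            score = sum(art["tfidf"][term] for term in query_terms)
            by_site[art["site"]].append((score, art))
    result = {}
    for site, scored in by_site.items():
        # Step 2: rank the articles of this site by their tf-idf score.
        scored.sort(key=lambda pair: pair[0], reverse=True)
        top = [art["sentiment"] for _, art in scored[:top_n]]
        # Step 3: average the sentiment vectors of the top n articles.
        result[site] = tuple(sum(axis) / len(top) for axis in zip(*top))
    return result
```

Sites with no matching article are simply absent from the result, so a map built from it would show no graph for them.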
Experimental evaluation
Query: Daisuke Matsuzaka
A famous Japanese Major Leaguer
A reviewer read all the retrieved articles from the different news sites and judged the sentiment of each news site as positive, negative, or neutral
For comparison, the numeric sentiment values produced by our system were categorized into the same discrete values: positive, negative, or neutral
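One way such a comparison could be set up is sketched below. The slides do not state the thresholds or how precision was computed, so the `margin` of 0.05 around 0.5, the axis orientation (values near 1 mapped to the second label of each pair), and the agreement measure are all assumptions made for illustration.

```python
# Assumed orientation: (label for values near 0, label for values near 1).
AXES = {
    "a": ("Dark", "Bright"),
    "b": ("Rejection", "Acceptance"),
    "c": ("Tension", "Relaxation"),
    "d": ("Fear", "Anger"),
}

def categorize(value, axis, margin=0.05):
    """Map a numeric sentiment value to a discrete label on one axis.

    The margin is a hypothetical threshold, not taken from the slides.
    """
    low_label, high_label = AXES[axis]
    if value > 0.5 + margin:
        return high_label
    if value < 0.5 - margin:
        return low_label
    return "Neutral"

def agreement(system_labels, reviewer_labels):
    """Fraction of positions where the system label equals the reviewer's."""
    matches = sum(s == r for s, r in zip(system_labels, reviewer_labels))
    return matches / len(reviewer_labels)
```

For instance, `categorize(0.7, "a")` returns "Bright" and `categorize(0.5, "d")` returns "Neutral" under these assumed thresholds.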
Experimental evaluation
Sentiment (e): a: Dark ⇔ Bright, b: Rejection ⇔ Acceptance, c: Tension ⇔ Relaxation, d: Fear ⇔ Anger

              a         b            c            d
reviewer      Bright    Acceptance   Tension      Neutral
Web site 1    Bright    Acceptance   Tension      Neutral
reviewer      Bright    Acceptance   Relaxation   Neutral
Web site 2    Bright    Acceptance   Tension      Neutral
reviewer      Bright    Acceptance   Relaxation   Fear
Web site 3    Bright    Acceptance   Tension      Fear
reviewer      Neutral   Neutral      Neutral      Anger
Web site 4    Dark      Acceptance   Tension      Fear
Precision is about 70%
The sentiment tendencies differ among the news sites
Conclusion and future work
Conclusion
Developed a system called sentiment map for visualizing the
sentiment distinction of different news sites
Tested its effectiveness
A prototype: http://klab.kyoto-su.ac.jp/~fujita/cgi-bin/Fuzilla/News/
Future work
More experiments
Sentiment analysis of readers and information recommendation
based on it
Thank you for your attention
Sample of sentiment dictionary
Sentiment (e), e = a, b, c, d:
a: Bright ⇔ Dark, b: Acceptance ⇔ Rejection, c: Relaxation ⇔ Tension, d: Anger ⇔ Fear
Se(w): impression value
Me(w): weight

Entry word (w)                     a       b       c       d
chosen-suru (challenge)      Se    0.618   0.687   0.752   0.500
                             Me    1.399   1.330   1.251   1.090
dassen (derailment)          Se    0.31    0.546   0.403   0.291
                             Me    0.514   0.603   0.737   0.549
hofu-da (rich)               Se    0.597   0.676   0.761   0.466
                             Me    1.416   1.352   1.299   1.109
shibou (death)               Se    0.28    0.358   0.260   0.364
                             Me    1.132   1.272   1.306   1.112
shototsu-suru (collide)      Se    0.344   0.353   0.315   0.529
                             Me    1.004   1.016   1.099   0.948
sosei (revival)              Se    0.91    0.521   0.429   0.000
                             Me    0.464   0.582   0.732   0.328

Examples: Sc(death) = 0.260, Mc(death) = 1.306
Sentiment value Oe(w) of an entry word w
• Original impression words and their correspondence with sentiments

Sentiment (e), e = a, b, c, d
a: Bright ⇔ Dark
   e1: akarui (bright), ureshii (glad), tanoshii (happy)
   e2: kurai (dark), kanashii (sad), kurushii (painful)
b: Acceptance ⇔ Rejection
   e1: shonin (approval), aikou (love), suki-da (like)
   e2: kyohi (reject), ken’o (aversion), kirai-da (dislike)
c: Relaxation ⇔ Tension
   e1: yuttari (comfortable), nonbiri (peaceful), yukkuri (slow)
   e2: kincho (tension), kinkyuu (emergency)
d: Anger ⇔ Fear
   e1: okoru (angry), dogou (roar)
   e2: osoreru (fear), kowai (scary), kyofu (dread)

• Sentiment value Oe(w) of an entry word w
• A value between 1 and 0 (1: positive, 0: negative)
• Calculated by analyzing the co-occurrence with the original impression words, based on the Nikkei Newspaper Full Text Database (about 200 million articles)
oe(w): Se(w), Me(w)
Se(w): impression value
$$S_e(w) = \frac{P(e_1, w)}{P(e_1, w) + P(e_2, w)}, \qquad P(e, w) = \frac{df(e \,\&\, w)}{df(e)}$$
Me(w): weight
$$M_e(w) = \log\bigl(df(e_1 \,\&\, w) + df(e_2 \,\&\, w)\bigr)$$
Sentiment value of the word “death” on dimension c: Oc(death) = 0.260
Co-occurrences of “comfortable” and “death” and of “peaceful” and “death” << co-occurrences of “tension” and “death” and of “emergency” and “death”
A proposal of a sentiment map
[Figure: Sentiment map for each news site for the query “scandal”, with a graph axis running from 0.5 (positive) through 0 to -0.5 (negative), and the top-ranked articles from each news site.]
Demonstration