Using a Sentiment Map for Visualizing Credibility of News


Using a Sentiment Map for Visualizing
Credibility of News Sites on the Web
Yukiko Kawai*, Yusuke Fujita*, Tadahiko Kumamoto**,
Jianwei Zhang*, Katsumi Tanaka***
* Kyoto Sangyo University, Japan
** Chiba Institute of Technology, Japan
*** Kyoto University, Japan
Outline
• Background
• Research goal
• System overview
• Offline processing
• Online processing
• Experimental evaluation
• Conclusion and future work
Background
A question: What is your attitude towards the “Iraq war”? Do you agree or disagree?
• To answer this question, I want to read some news to form an opinion about the topic.
• Web news sites have spread rapidly (e.g., MSN, Google News)
• Different sites may have different opinions about the topic
Background
• A reader puts the question “Is the Iraq war right or wrong?” to a news site whose tendency is unknown (???)
• If it is a pro-war site: “I agree with this war”
• If it is an anti-war site: “I disagree with this war”
• A misconception may be caused if the sites’ tendencies are not known in advance

Sentiment tendencies of sites
• The same question, “Is the Iraq war right or wrong?”, is answered with the positive and negative tendencies of Site A and Site B
• “Well, I now have opinions from different sites”
• This may lead to a more fair-minded judgment
• Information credibility is improved
A concept of sentiment map
• Mapping: a graph of sentiment is placed on a map based on the location of each news site
• Example query: “Iraq war”
• Each graph shows positive and negative sentiment
• Top ranked articles from each news site
Demonstration
System overview
Offline processing (Preprocessing)
• Crawling news sites on the Web: Yomiuri (Osaka), Yomiuri (Tokyo), Asahi (Tokyo), ...
• News articles collection
• Morphological analysis
• tf-idf value calculation
• Sentiment values calculation, using the sentiment dictionary
• Articles database (including tf-idf, sentiment values)
Online processing (Runtime processing): from a query to a sentiment map
1) retrieve articles from each news site
2) rank the articles based on tf-idf in each site
3) calculate the average of sentiment values for each site
4) generate a sentiment map
Offline processing
• News articles collection
  ◦ Crawl news articles from various news sites and store them in the DB
• News articles analysis
  ◦ Eliminate HTML tags
  ◦ Perform morphological analysis to extract nouns, verbs, and adjectives
  ◦ Calculate the tf-idf value of each extracted word j for each news article p_i:

$$\text{tf-idf}_{ij} = \frac{\log(F_j + 1)}{\log(F_{all})} \times \log\frac{N}{N_j}$$

    F_j: the frequency of word j in article p_i
    F_all: the number of all words in p_i
    N: the number of all articles
    N_j: the number of articles including j
  ◦ Attach a sentiment vector to each news article
    - Use a sentiment dictionary
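The tf-idf weighting described under Offline processing can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name, the argument names, the use of the natural logarithm (the slides do not state a base), and the handling of degenerate inputs are all assumptions.

```python
import math


def tf_idf(word_count, total_words, num_articles, num_articles_with_word):
    """tf-idf of one word in one article, following the slide's formula.

    word_count             -> F_j   (frequency of word j in article p_i)
    total_words            -> F_all (number of all words in p_i)
    num_articles           -> N     (number of all articles)
    num_articles_with_word -> N_j   (number of articles including j)
    """
    if total_words < 2 or num_articles_with_word < 1:
        # log(F_all) would be zero or undefined here; returning 0.0 is
        # our own choice, not something the slides specify.
        return 0.0
    tf = math.log(word_count + 1) / math.log(total_words)
    idf = math.log(num_articles / num_articles_with_word)
    return tf * idf


# Hypothetical counts, chosen only to exercise the formula.
score = tf_idf(word_count=3, total_words=100,
               num_articles=1000, num_articles_with_word=10)
```

With these made-up counts the tf factor is log(4)/log(100) and the idf factor is log(100), so the score reduces to log(4).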
Sample of sentiment dictionary (e = a, b, c, d)

Entry word (w)   Sentiment a:     Sentiment b:              Sentiment c:            Sentiment d:
                 Dark ⇔ Bright    Rejection ⇔ Acceptance    Tension ⇔ Relaxation    Fear ⇔ Anger
challenge        0.618            0.687                     0.752                   0.500
collide          0.344            0.353                     0.315                   0.529
death            0.28             0.358                     0.260                   0.364
derailment       0.31             0.546                     0.403                   0.291
revival          0.91             0.521                     0.429                   0.000
rich             0.597            0.676                     0.761                   0.466

Example: Oc(death) = 0.260

• Sentiment value Oe(w) of an entry word w
• A value between 0 and 1 (e.g., 0: dark, 1: bright)
• Calculated by analyzing co-occurrence with the original sentiment words, based on 200 million articles of Nikkei newspapers
Calculation of Sentiment value Oe(w)
• Sentiments and their corresponding original sentiment words

Sentiment (e = a, b, c, d)    Original sentiment words e1    Original sentiment words e2
a: Bright ⇔ Dark              bright, glad, happy            dark, sad, painful
b: Acceptance ⇔ Rejection     approval, love, like           reject, aversion, dislike
c: Relaxation ⇔ Tension       comfortable, peaceful, slow    tension, emergency
d: Anger ⇔ Fear               angry, roar                    fear, scary, dread

Sentiment value:

$$O_e(w) = \frac{P(e_1, w)}{P(e_1, w) + P(e_2, w)}, \qquad P(e, w) = \frac{df(e \,\&\, w)}{df(e)}$$

df(e): the number of occurrences of the original sentiment words e
df(e&w): the number of co-occurrences of the original sentiment words e and an entry word w
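The sentiment value of an entry word can be computed from document-frequency counts as in the sketch below. The function names and the counts are hypothetical: the counts are invented for illustration and are not taken from the Nikkei corpus. Returning 0.5 when there is no co-occurrence evidence at all is also our assumption.

```python
def cooccurrence_ratio(df_e_and_w, df_e):
    """P(e, w) = df(e & w) / df(e)."""
    return df_e_and_w / df_e if df_e else 0.0


def sentiment_value(df_e1_and_w, df_e1, df_e2_and_w, df_e2):
    """O_e(w) = P(e1, w) / (P(e1, w) + P(e2, w)), a value between 0 and 1."""
    p1 = cooccurrence_ratio(df_e1_and_w, df_e1)
    p2 = cooccurrence_ratio(df_e2_and_w, df_e2)
    if p1 + p2 == 0:
        # No co-occurrence with either word group: treat as the midpoint
        # (an assumption; the slides do not cover this case).
        return 0.5
    return p1 / (p1 + p2)


# Invented counts: the entry word co-occurs far more often with the e2
# words than with the e1 words, so the value falls well below 0.5.
value = sentiment_value(df_e1_and_w=26, df_e1=1000,
                        df_e2_and_w=74, df_e2=1000)
```

Because each co-occurrence count is normalized by df(e), a word group that is simply more frequent in the corpus does not dominate the ratio.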
Calculation of Sentiment value Oe(w): example
• Sentiment value of the word “death” on dimension c (Relaxation ⇔ Tension): Oc(death) = 0.260
• The value is low because the co-occurrence counts with the e1 words, df(“comfortable” & “death”), df(“peaceful” & “death”), and df(“slow” & “death”), are much smaller (<<) than those with the e2 words, df(“tension” & “death”) and df(“emergency” & “death”)
Sentiment vector O(TEXT) of a news article
• A news article text = TEXT
• TEXT has n keywords
• keywords = {w_i} (i = 1, ..., n)
• Each sentiment value Oe(TEXT) is the average over the keywords:

$$O_e(\mathrm{TEXT}) = \frac{1}{n} \sum_{i=1}^{n} O_e(w_i)$$

• Sentiment vector O(TEXT) of the article:

$$O(\mathrm{TEXT}) = \bigl(O_a(\mathrm{TEXT}),\ O_b(\mathrm{TEXT}),\ O_c(\mathrm{TEXT}),\ O_d(\mathrm{TEXT})\bigr)$$
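The article-level sentiment vector is a per-dimension average of the keyword values, which can be sketched as below. The word values are copied from the sample dictionary slide. The names `DICTIONARY` and `article_sentiment_vector`, and the choice to skip keywords missing from the dictionary, are assumptions made for this illustration.

```python
SENTIMENT_DIMENSIONS = ("a", "b", "c", "d")

# A few entries copied from the sample sentiment dictionary.
DICTIONARY = {
    "challenge": {"a": 0.618, "b": 0.687, "c": 0.752, "d": 0.500},
    "death": {"a": 0.28, "b": 0.358, "c": 0.260, "d": 0.364},
    "revival": {"a": 0.91, "b": 0.521, "c": 0.429, "d": 0.000},
}


def article_sentiment_vector(keywords, dictionary=DICTIONARY):
    """Average O_e(w) over the keywords, for each dimension e.

    Keywords that are not in the dictionary are skipped (our assumption).
    Returns None when no keyword is known.
    """
    known = [dictionary[w] for w in keywords if w in dictionary]
    if not known:
        return None
    return tuple(
        sum(entry[e] for entry in known) / len(known)
        for e in SENTIMENT_DIMENSIONS
    )


vector = article_sentiment_vector(["death", "revival", "unknown-word"])
```

Here `vector` averages the entries for “death” and “revival” only, since the third keyword has no dictionary entry.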
Online processing
• When a user enters query keywords,
  1. Retrieve news articles including the keywords
  2. Rank the articles based on tf-idf values for each news site
  3. Calculate the average of the sentiment vectors of the top n articles for each site
  4. Attach sentiment graphs to the corresponding locations of the news sites
• Also present a list of articles grouped by site
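Steps 1 to 3 of the online processing can be sketched as follows. This is a simplified illustration under stated assumptions: the article record layout (`site`, `tfidf`, `sentiment`), the function name, and the restriction to a single query keyword are all hypothetical, and step 4 (drawing the graphs on the map) is not covered.

```python
from collections import defaultdict


def site_sentiment(articles, query, top_n=2):
    """Return {site: averaged 4-dimensional sentiment vector} for a query."""
    # Step 1: retrieve the articles that include the query keyword.
    by_site = defaultdict(list)
    for article in articles:
        if query in article["tfidf"]:
            by_site[article["site"]].append(article)

    result = {}
    for site, hits in by_site.items():
        # Step 2: rank the articles of each site by the keyword's tf-idf.
        ranked = sorted(hits, key=lambda a: a["tfidf"][query], reverse=True)
        top = ranked[:top_n]
        # Step 3: average the sentiment vectors of the top n articles.
        result[site] = tuple(
            sum(a["sentiment"][k] for a in top) / len(top) for k in range(4)
        )
    return result


# Invented sample data, for illustration only.
articles = [
    {"site": "Site A", "tfidf": {"war": 0.9}, "sentiment": (0.2, 0.4, 0.2, 0.4)},
    {"site": "Site A", "tfidf": {"war": 0.5}, "sentiment": (0.4, 0.2, 0.4, 0.2)},
    {"site": "Site A", "tfidf": {"war": 0.1}, "sentiment": (1.0, 1.0, 1.0, 1.0)},
    {"site": "Site B", "tfidf": {"war": 0.3}, "sentiment": (0.8, 0.6, 0.8, 0.6)},
    {"site": "Site B", "tfidf": {"sports": 0.7}, "sentiment": (0.0, 0.0, 0.0, 0.0)},
]
averages = site_sentiment(articles, "war", top_n=2)
```

Since the tf-idf values and sentiment vectors are computed offline and stored in the articles database, the runtime work is limited to filtering, sorting, and averaging.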
Experimental evaluation
• Query: Daisuke Matsuzaka
  ◦ A famous Japanese Major Leaguer
• A reviewer read all the retrieved articles from the different news sites and judged the sentiment of each news site
  ◦ positive, negative, or neutral
• For comparison, the numeric sentiment values produced by our system were categorized into discrete values
  ◦ positive, negative, or neutral
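The slides do not say how the numeric values were categorized. One possible scheme, shown purely as an assumption, treats 0.5 as the neutral midpoint of the 0 to 1 scale and uses a symmetric margin around it. The function name, the labels passed in, and the margin of 0.1 are all hypothetical.

```python
def discretize(value, low_label, high_label, margin=0.1):
    """Map a sentiment value in [0, 1] to a discrete label.

    Values within `margin` of the midpoint 0.5 are called "Neutral".
    The midpoint rule and the margin are assumptions, not values
    reported in the evaluation.
    """
    if value >= 0.5 + margin:
        return high_label
    if value <= 0.5 - margin:
        return low_label
    return "Neutral"


label = discretize(0.26, low_label="Tension", high_label="Relaxation")
```

A wider margin would label more sites as neutral, so the agreement with the reviewer depends on this choice.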
Experimental evaluation

              a: Dark ⇔ Bright    b: Rejection ⇔ Acceptance    c: Tension ⇔ Relaxation    d: Fear ⇔ Anger
reviewer      Bright              Acceptance                   Tension                    Neutral
Web site 1    Bright              Acceptance                   Tension                    Neutral

reviewer      Bright              Acceptance                   Relaxation                 Neutral
Web site 2    Bright              Acceptance                   Tension                    Neutral

reviewer      Bright              Acceptance                   Relaxation                 Fear
Web site 3    Bright              Acceptance                   Tension                    Fear

reviewer      Neutral             Neutral                      Neutral                    Anger
Web site 4    Dark                Acceptance                   Tension                    Fear

• Precision is about 70%
• There are some distinctions among the different news sites
Conclusion and future work
• Conclusion
  ◦ Developed a system called sentiment map for visualizing the sentiment distinction of different news sites
  ◦ Tested its effectiveness
  ◦ A prototype: http://klab.kyoto-su.ac.jp/~fujita/cgi-bin/Fuzilla/News/
• Future work
  ◦ More experiments
  ◦ Sentiment analysis of readers and information recommendation based on it
Thank you for your attention
Sample of sentiment dictionary (e = a, b, c, d)
Se(w): impression value
Me(w): weight

Entry word (w)             Value    a: Bright ⇔ Dark    b: Acceptance ⇔ Rejection    c: Relaxation ⇔ Tension    d: Anger ⇔ Fear
chosen-suru (challenge)    Se(w)    0.618               0.687                        0.752                      0.500
                           Me(w)    1.399               1.330                        1.251                      1.090
dassen (derailment)        Se(w)    0.31                0.546                        0.403                      0.291
                           Me(w)    0.514               0.603                        0.737                      0.549
hofu-da (rich)             Se(w)    0.597               0.676                        0.761                      0.466
                           Me(w)    1.416               1.352                        1.299                      1.109
shibou (death)             Se(w)    0.28                0.358                        0.260                      0.364
                           Me(w)    1.132               1.272                        1.306                      1.112
shototsu-suru (collide)    Se(w)    0.344               0.353                        0.315                      0.529
                           Me(w)    1.004               1.016                        1.099                      0.948
sosei (revival)            Se(w)    0.91                0.521                        0.429                      0.000
                           Me(w)    0.464               0.582                        0.732                      0.328

Example: Sc(death) = 0.260, Mc(death) = 1.306
Sentiment value Oe(w) of an entry word w
• Original impression words and their correspondence with sentiments (e = a, b, c, d)

a: Bright ⇔ Dark
   e1: akarui (bright), ureshii (glad), tanoshii (happy)
   e2: kurai (dark), kanashii (sad), kurushii (painful)
b: Acceptance ⇔ Rejection
   e1: shonin (approval), aikou (love), suki-da (like)
   e2: kyohi (reject), ken’o (aversion), kirai-da (dislike)
c: Relaxation ⇔ Tension
   e1: yuttari (comfortable), nonbiri (peaceful), yukkuri (slow)
   e2: kincho (tension), kinkyuu (emergency)
d: Anger ⇔ Fear
   e1: okoru (angry), dogou (roar)
   e2: osoreru (fear), kowai (scary), kyofu (dread)

• A value between 0 and 1 (1: positive, 0: negative)
• Calculated by analyzing the co-occurrence with the original impression words, based on the Nikkei Newspaper Full Text Database (about 200 million articles)
Sentiment value Oe(w) of an entry word w

$$o_e(w) = S_e(w) \times M_e(w)$$

$$S_e(w) = \frac{P(e_1, w)}{P(e_1, w) + P(e_2, w)}, \qquad P(e, w) = \frac{df(e \,\&\, w)}{df(e)}$$

$$M_e(w) = \log\bigl(df(e_1 \,\&\, w) + df(e_2 \,\&\, w)\bigr)$$

Se(w): impression value
Me(w): weight

• Impression value of the word “death” on dimension c: Sc(death) = 0.260
• The value is low because the co-occurrence of “comfortable” and “death” and of “peaceful” and “death” is much lower (<<) than that of “tension” and “death” and of “emergency” and “death”
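As a worked instance, reading the slide's formula as the product of the impression value and the weight, the dictionary entries for “death” on dimension c combine as shown below. The rounding to three decimals is ours, and the result is not a figure stated on the slides.

$$o_c(\text{death}) = S_c(\text{death}) \times M_c(\text{death}) = 0.260 \times 1.306 \approx 0.340$$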
A proposition of sentiment map
• Example query: “scandal”
• Sentiment map: a graph for each news site, with an axis running from -0.5 (negative) through 0 to 0.5 (positive)
• Top ranked articles from each news site
Demonstration