Social Psychological Analysis of Public Political Comments

Social Psychological Analysis of Public Political Comments on Facebook
Márton Miháltz
21st November 2014, Budapest
www.trendminer-project.eu
TrendMiner Overview
• What kinds of socio-political trends appear in Hungarian comments on political posts on Facebook?
– Facebook in Hungary: 4.27M registered users = 59.2% of internet
users, 43% of total population
• Download all public comments from the Facebook pages of Hungarian politicians and parties
• Analysis of comments:
– Basic NLP (tokenization, PoS, stemming), domain-adapted
– Entities: political actors (people, organizations)
– Sentiment
– Social psychology dimensions: agency/communion, individualism/collectivism, optimism/pessimism, primordial/conceptual thinking
• In cooperation with Narrative Psychology Research Group,
Hungarian Academy of Sciences
Data Acquisition
• Get comments via fb Graph API
– 1.9M comments for 141K fb posts (2013.10.01 – 2014.09.02)
– from 1344 fb pages
• Organizations: parties, regional and associated branches
• People: candidate and elected representatives (MPs), government,
party officials
• Official and fan pages
– In 3 categories
• Hungarian parliament 2010-2014
• Hungarian parliament elections 2014 (6th April)
• EU parliament elections 2014 (25th May)
• Sources: valasztas.hu, wikipedia.hu
• Everything in a MySQL database
– For arbitrary queries (political groups, time etc.)
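The download step can be sketched as pagination over the comments edge. The Graph API returns lists as JSON with a `data` array and, when more results exist, a `paging.next` URL; the sketch below assumes only that response shape. The `fetch_json` callable (an HTTP GET that returns parsed JSON) and the function name are hypothetical, not taken from the project's script.

```python
from typing import Callable, Dict, Iterator


def iter_comments(first_url: str,
                  fetch_json: Callable[[str], Dict]) -> Iterator[Dict]:
    """Yield every comment, following `paging.next` links until none remain.

    `fetch_json` is a caller-supplied function (assumption) that performs
    the HTTP request for a URL and returns the decoded JSON object.
    """
    url = first_url
    while url:
        page = fetch_json(url)
        for comment in page.get("data", []):
            yield comment
        # A missing "next" link means the last page has been reached.
        url = page.get("paging", {}).get("next")
```

Passing the fetch function in keeps the pagination logic testable without network access or an access token.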
Data model
• Fb_pages
– Id, URL, Page title
– Type: person or organization
– Affiliated party (3 campaigns)
• Fb_posts, Fb_comments
– Id, Created_timestamp
– Message text, Author_user_id
• Comments_annotations
– Sentence_id, Start_token, End_token index
– Annotated text, Lemmatized_annotated_text, Annotation_tag
• Fb_comments_scores
– 16 scores and counts (sentiment, RID, agency, communion, optimism, …)
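A minimal sketch of how part of this data model supports "arbitrary queries" by political group and time. The project stored everything in MySQL; `sqlite3` is used here only so the example is self-contained. The foreign-key columns (`page_id`, `post_id`), the single `affiliated_party` column (the slides mention one affiliation per campaign), and the exact column names are assumptions.

```python
import sqlite3

# Simplified schema; the join columns and column names are assumptions.
SCHEMA = """
CREATE TABLE fb_pages (
    id TEXT PRIMARY KEY, url TEXT, page_title TEXT,
    type TEXT CHECK (type IN ('person', 'organization')),
    affiliated_party TEXT);
CREATE TABLE fb_posts (
    id TEXT PRIMARY KEY, page_id TEXT REFERENCES fb_pages(id),
    created_timestamp TEXT, message TEXT, author_user_id TEXT);
CREATE TABLE fb_comments (
    id TEXT PRIMARY KEY, post_id TEXT REFERENCES fb_posts(id),
    created_timestamp TEXT, message TEXT, author_user_id TEXT);
"""


def comments_per_party(conn, start, end):
    """Count comments per affiliated party with start <= timestamp < end.

    Timestamps are ISO-formatted strings, so string comparison orders them.
    """
    rows = conn.execute(
        """SELECT p.affiliated_party, COUNT(*)
           FROM fb_comments c
           JOIN fb_posts po ON c.post_id = po.id
           JOIN fb_pages p ON po.page_id = p.id
           WHERE c.created_timestamp >= ? AND c.created_timestamp < ?
           GROUP BY p.affiliated_party
           ORDER BY p.affiliated_party""",
        (start, end),
    ).fetchall()
    return dict(rows)
```

For example, `comments_per_party(conn, "2014-01-01", "2014-04-06")` would count the comments posted before the parliamentary election date.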
Hungarian Political Ontology
• Extending TM multilingual political ontology
– 8 new classes, 3+3 new object/data properties, 1579 new instances (1 Country, 18 Party, 661 Politician, 899 Nomination)
– Nominated and elected MPs (2010 Hu. Parl., 2014 Hu. Parl., 2014 EU Parl.), nominating parties
– Names, abbreviated names, nicknames, Facebook page URLs etc.
• Example: Benedek Jávor was a member of the Hungarian Parliament during 2010-2014 (nominated by LMP) and has been a member of the European Parliament since 2014 (nominated by EGYÜTT-PM).
Processing Pipeline
• Downloading (Fb Graph API, Python script)
• Tokenization (huntoken tool)
• PoS-tagging (hunpos tool)
• Morphological analysis (hunmorph tool)
• Stem+analysis disambiguation (Python script)
• Content analysis (Java NooJ)
• Scoring & storage in DB
• Uploading in RDF to TM Integration Server
Domain Adaptation
• Problem: existing NLP tools were developed on a different domain and (f)ail on social media language (Facebook comments)
• Using corpus for survey:
– 1.25M fb comments (29M tokens)
– 2.25M unknown tokens (694K types)
– Frequency list, f > 15 items manually revised
– Identify common problems
– Lists of frequent, relevant unknown, new words etc.
Domain Adaptation: Tokenization
• Huntoken tool
• Frequent problems:
– missing spaces around punctuation
... end of sentence.Beginning of another ...
– Repeated punctuation
first part……. Second part
– Contracted words (slang)
asszem = azt hiszem (“I think”)
– Consonant multiplication (interjections, onomatopoeic words etc.)
e.g. pfffffffff, uffffff, ejjjjjjjj (pff(f*), uff(f*), ej(j*))
– split large numbers by decimal groups
125 000
– split URLs
– split emoticons
:D
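Some of the fixes above can be sketched as a pre-tokenization normalizer. This is an illustrative sketch, not the project's huntoken adaptation: the function name, the contraction table (only "asszem" comes from the slides) and the order of the steps are assumptions. The interjection patterns follow the slide's pff(f*), uff(f*), ej(j*) notation.

```python
import re

# Hypothetical contraction table; the slides give only the "asszem" example.
CONTRACTIONS = {"asszem": "azt hiszem"}

# Interjection patterns from the slide and their canonical forms.
INTERJECTIONS = [
    (re.compile(r"\bpff+\b"), "pff"),
    (re.compile(r"\buff+\b"), "uff"),
    (re.compile(r"\bej+\b"), "ej"),
]


def normalize(text: str) -> str:
    # 1. Collapse runs of sentence punctuation to the first mark.
    text = re.sub(r"([.!?…])[.!?…]+", r"\1", text)
    # 2. Insert a missing space after punctuation followed by a letter.
    #    Caveat: this would also split URLs and abbreviations, so those
    #    would need to be protected first in a real tool.
    text = re.sub(r"([.!?…,;])(?=[^\W\d_])", r"\1 ", text)
    # 3. Reduce lengthened interjections to their canonical form.
    for pattern, canonical in INTERJECTIONS:
        text = pattern.sub(canonical, text)
    # 4. Expand known slang contractions word by word.
    return re.sub(
        r"\b\w+\b",
        lambda m: CONTRACTIONS.get(m.group(0).lower(), m.group(0)),
        text,
    )
```

Note that step 1 also reduces a deliberate "..." to a single full stop, which is acceptable for sentence splitting but loses the ellipsis.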
Domain Adaptation: PoS/stemming
• Hunpos tagger + hunmorph analyzer + stemming script
• Frequent problems:
– Unknown words (no lemma/PoS)
• add to hunmorph analyzer’s lexicon
• using analogous words (morphological paradigm)
• Compounds, abbreviations, acronyms, slang words etc.
– Frequently misspelled word forms:
• replace with correct forms
– Wrong capitalization
e.g. SENTENCES IN ALL CAPS
– Missing accent characters: disambiguation model needed
E.g. kor (age), kór (disease), kör (circle)
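The kor/kór/kör example shows why a disambiguation model is needed: one unaccented form maps to several valid words. A minimal sketch of the candidate-generation half of that problem follows. The function name and the toy lexicon are hypothetical, only lowercase input is handled, and choosing among the candidates (the actual disambiguation) is not implemented.

```python
from itertools import product

# Unaccented Hungarian vowels and the letters each may stand for.
VARIANTS = {"a": "aá", "e": "eé", "i": "ií", "o": "oóöő", "u": "uúüű"}


def accent_candidates(word, lexicon):
    """Return every lexicon word obtainable by restoring accents in `word`."""
    # Each character contributes its possible variants (or just itself).
    options = [VARIANTS.get(ch, ch) for ch in word]
    return [w for w in ("".join(p) for p in product(*options)) if w in lexicon]
```

With a toy lexicon `{"kor", "kór", "kör"}`, the input "kor" yields all three words, so context is needed to pick one.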
NooJ, Java NooJ, NooJ-Cmd
• Java NooJ
– Open source version of NooJ: define and run finite state
machines for querying, annotation etc. (morphology, syntax)
– NooJ-Cmd extension: all NooJ GUI features => command line
options
– Open source: https://github.com/tkb-/nooj-cmd
• NooJ grammars (FSMs) for annotation:
– Actors (entities)
– Emotional valence (sentiment polarity)
– Regressive imagery dictionary
– Agency-communion
– Optimism-pessimism
– Individualism-collectivism
Development of NooJ Grammars
• In collaboration with social psychology researchers
– Social Psychology Department, Eötvös Loránd University, Budapest
– Narrative Psychology Research Group, Hungarian Academy of
Sciences
• Development Corpus
– 176K sample fb comments from 570 fb pages (4.9M tokens)
– NLP annotation
– Frequency lists (lemmas, lemmas+PoS, lemmas+morphological
info etc.)
• Development:
– Content words with f > 100 from the development corpus (3500 types)
– 7 independent annotators
– >= 4 annotators agree: manual revision
– Compile into NooJ grammar with polarity shifters, items to be excluded etc.
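The agreement threshold can be sketched as a simple vote filter. This sketch assumes each annotator assigns one label per word and that words reaching the threshold go forward to manual revision. The label names and the function name are hypothetical.

```python
from collections import Counter


def agreed_label(votes, min_agree=4):
    """Return the label chosen by at least `min_agree` annotators, else None."""
    if not votes:
        return None
    label, count = Counter(votes).most_common(1)[0]
    return label if count >= min_agree else None
```

With 7 annotators and a threshold of 4, at most one label can qualify, so no tie-breaking is needed.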
1. Political Actors (NEs)
• Maxent NE tool (huntag): low performance on
domain
– Trained on standard language news texts
– Miscategorization, false positive NEs, entity boundary
recognition problems
• NooJ grammar/lexicon for TrendMiner
– Person names:
family_name (given_name_lemmatized)? |
frequent_nicknames …
– Organization names:
Standard_form | abbreviated_forms… | nicknames…
– Created automatically (names from DB) + manually
(nicknames from freq. lists)
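The person-name pattern above can be approximated with a regular expression. This is only an illustration of the pattern's shape, not the NooJ grammar itself: it matches raw text instead of lemmatized tokens, and the name "Kovács János" and the nickname "Jani" in the usage note are invented examples.

```python
import re


def person_pattern(family, given, nicknames=()):
    """Build a regex: family name with optional given name, or a nickname.

    Hungarian name order puts the family name first.
    """
    alternatives = [rf"{re.escape(family)}(?: {re.escape(given)})?"]
    alternatives += [re.escape(n) for n in nicknames]
    return re.compile(r"\b(?:" + "|".join(alternatives) + r")\b")
```

For instance, `person_pattern("Kovács", "János", ["Jani"])` matches "Kovács János", a bare "Kovács", and "Jani". A bare family name is ambiguous when several politicians share it, a case this sketch does not resolve.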
2. Emotional Valence
• Emotions with positive or negative polarity
• Polarity in context: recognize negation using simple
rules
• Nouns, adjectives, verbs, adverbs, emoticons, multiword expressions
• 500 positive, 420 negative entries
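A minimal sketch of polarity counting with a simple negation rule, assuming lemmatized tokens as input. The three-word lexicon, the one-token negation window and the function name are illustrative assumptions; the project's actual rules are NooJ grammars and are not reproduced here.

```python
POSITIVE = {"jó"}     # "good" (toy lexicon entry)
NEGATIVE = {"rossz"}  # "bad" (toy lexicon entry)
NEGATORS = {"nem"}    # "not"


def valence(tokens, window=1):
    """Return (positive, negative) counts, flipping polarity after a negator."""
    pos = neg = 0
    for i, tok in enumerate(tokens):
        polarity = 1 if tok in POSITIVE else -1 if tok in NEGATIVE else 0
        if polarity == 0:
            continue
        # A negator within `window` tokens before the word shifts its polarity.
        if any(t in NEGATORS for t in tokens[max(0, i - window):i]):
            polarity = -polarity
        if polarity > 0:
            pos += 1
        else:
            neg += 1
    return pos, neg
```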
3. Regressive Imagery Dictionary
• Martindale (1975, 1990): uncover psychological
processes reflected in the text
• 2 basic categories of thinking:
– Primordial (primary): associative, concrete, and takes little
account of reality (fantasy, dreams)
– Conceptual (secondary): abstract, logical, reality oriented,
aimed at problem solving
• 7+29 more subcategories (social behavior,
cognition, perceptions, sensations etc.)
• Hungarian version by Pólya and Szász
• 3000+ terms
4. Agency/Communion
• 2 fundamental dimensions of social values:
– Communion: moral and emotional aspects of an individual’s
relations to others (affection, expressiveness, cooperation,
social benefit etc.)
– Agency: efficiency of an individual’s goal-orientated behavior
(motivation, competence, control)
• Positive or negative for both dimensions
– Context dependent (e.g. negation)
• 640 expressions
5. Optimism/Pessimism
• Based on PoS and morphology annotations +
time expressions
• 2 measures:
1. |future_tense_verbs| /
(|present_tense_verbs| + |past_tense_verbs|)
2. |present_tense_verbs| /
|past_tense_verbs|
• Both correlate with degree of optimism
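The two ratios can be computed directly from verb tense counts. The sketch below assumes the counts have already been extracted from the PoS and morphology annotations. The function name and the choice to return `None` for an undefined ratio are assumptions.

```python
def optimism_measures(future, present, past):
    """Return the two tense ratios; None where the denominator is zero."""
    m1 = future / (present + past) if (present + past) else None
    m2 = present / past if past else None
    return m1, m2
```

For a text with 2 future, 6 present and 4 past tense verbs, the first measure is 2 / (6 + 4) and the second is 6 / 4.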
6. Individualism/Collectivism
• Based on PoS and morphology annotations
• 1 measure:
|personal pronouns| /
(|verbs with personal inflection| +
|nouns with possessive inflection|)
• Higher score: higher degree of individualism
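The measure can be written as a short function over pre-computed counts. The parameter names are hypothetical, and returning `None` when the denominator is zero is an assumption.

```python
def individualism_score(personal_pronouns, inflected_verbs, possessed_nouns):
    """Personal pronouns relative to person-marked verbs and possessed nouns."""
    denominator = inflected_verbs + possessed_nouns
    return personal_pronouns / denominator if denominator else None
```

Hungarian marks person on the verb and on possessed nouns, so an explicit personal pronoun is usually optional. That may be why its relative frequency is informative here.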
Visualisation
[Slides 19-23: figures not preserved in this transcript]
Dissemination and Exploitation
• Presentations
– Hungarian NLP Meetup, 25 September 2014, Budapest
– conText, 20 November 2014, Budapest
• Conference papers, presentations
– 2 papers at the 11th Conference on Hungarian Computational Linguistics (15-16 January 2015, Szeged)
• Source code
– https://github.com/mmihaltz/trendminer-hunlp
– https://github.com/mmihaltz/trendminer-hutools
– https://github.com/tkb-/nooj-cmd
• Project website (http://corpus.nytud.hu/trendminer)
– Download political ontology
– Download 1.9M facebook comments corpus (w/ annotations)
– Project info, papers, presentation slides
Thank You!