CS 424P/ LINGUIST 287
Extracting Social Meaning and Sentiment
Dan Jurafsky
Lecture 5: Romantic Interest and Personality
Joint work with:
Rajesh Ranganath,
Dan McFarland
Dan Jurafsky, Rajesh Ranganath, and Dan McFarland. 2009.
Extracting Social Meaning: Identifying Interactional Style in
Spoken Conversation. Proceedings of NAACL HLT 2009.
Rajesh Ranganath, Dan Jurafsky, and Dan McFarland. 2009.
It's Not You, it's Me: Detecting Flirting and its Misperception
in Speed-Dates. Proceedings of EMNLP 2009.
Detecting social meaning:
our study
Given speech and text from a conversation
Can we detect ‘styles’, like whether a speaker is
Awkward?
Flirtatious?
Friendly?
Can we tell if the speakers like each other?
Dataset:
991 4-minute “speed-dates”
Each participant rated their partner and themselves for
these styles
Speed dating
Our speed date setup
What do you do for fun? Dance?
Uh, dance, uh, I like to go, like camping. Uh, snowboarding, but I'm not
good, but I like to go anyway.
You like boarding.
Yeah. I like to do anything. Like I, I'm up for anything.
Really?
Yeah.
Are you open-minded about most everything?
Not everything, but a lot of stuff.
What is not everything? [laugh]
I don't know. Think of something, and I'll say if I do it or not. [laugh]
Okay. [unintelligible].
Skydiving. I wouldn't do skydiving I don't think.
Yeah I'm afraid of heights.
F: Yeah, yeah, me too.
M: [laugh] Are you afraid of heights?
F: [laugh] Yeah [laugh]
The SpeedDate corpus
991 4-minute dates
3 events, each with ~20x20=400 dates, some data loss
Participants: graduate student volunteers in 2005
participated in return for the chance to date
Speech
~60 hours, from shoulder sash recorders; high noise
Transcripts
~800K words, hand-transcribed, w/turn boundary times
Surveys
(Pre-test surveys, event scorecards, post-test surveys)
Date perceptions and follow-up interest
General attitudes, preferences, demographics
Largest experiment with audio, text, + survey info
What we attempted to predict
Conversational style:
How often did you behave in the following ways
on this date?
How often did they behave in the following ways
on this date?
On a scale of 1-10 (1=never, 10=constantly)
1. flirtatious
2. friendly
3. awkward
4. assertive
Features
Prosodic
pitch (min, mean, max, std)
intensity (min, max, mean, std)
duration of turn
rate of speech (words per second)
Dialog
questions
backchannels (“uh-huh”, “yeah”)
appreciations (“Wow!”, “That’s great!”)
Lexical
negative emotion (bad, weird, crazy, hate) words
storytelling words (past tense) + food words (eat, dinner)
love and sexual/emotional words (love, passionate, screw)
personal pronouns (I, you, we, us)
Features extracted within turns
[Figure: conversation turns, with F0 max and F0 min marked within each turn]
Features: Pitch
F0 min, max, mean
Thus to compute, e.g., F0 min for a conversation side
Take F0 min of each turn (not counting zero values)
Average over all turns in the side
“F0 min, F0 max, F0 mean”
We also compute measures of variation
Standard deviation, pitch range
F0 min sd, F0 max sd, F0 mean sd
pitch range = (f0 max – f0 min)
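The per-side computation above can be sketched in a few lines of Python. The data layout (each turn as a list of F0 samples in Hz, with 0 for unvoiced frames) is an assumption for illustration, not the study's actual pipeline:

```python
def f0_stats(turns):
    """Per-side F0 statistics: average the per-turn min/max/mean over
    all turns in a conversation side, ignoring unvoiced (0) frames."""
    mins, maxs, means = [], [], []
    for turn in turns:
        voiced = [f for f in turn if f > 0]
        if not voiced:
            continue
        mins.append(min(voiced))
        maxs.append(max(voiced))
        means.append(sum(voiced) / len(voiced))
    n = len(mins)
    return {
        "f0_min": sum(mins) / n,
        "f0_max": sum(maxs) / n,
        "f0_mean": sum(means) / n,
        # pitch range = f0 max - f0 min, as defined above
        "pitch_range": sum(maxs) / n - sum(mins) / n,
    }
```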
Features: Other Prosodic
Intensity min, max, mean, std
computed as for pitch
Duration of turn
Total time for conversation side
Rate of speech (words per second)
Prosodic features
Dialog act features
Questions: # of questions in side
Laughter: # of instances of laughter in side
Turns: total # of turns in side
Backchannels (“Uh-huh.”, “Yeah.”, “Right.”, “Oh, okay.”): # of backchannels in side
Appreciations (“Wow.”, “That’s true.”, “Oh, great!”, “Oh, gosh!”): # of appreciations in side
Regular expressions drawn from hand-labeled
Switchboard Dialogue Act Corpus (Jurafsky, Biasca,
Shriberg 1997)
Appreciations:
Wow. | Oh, wow. | That's great. | That's good. | That's right. | Oh, no. | Oh, my goodness. | That's true. | Well, that's good. | Oh, that's great. | Oh, gosh. | Great. | Good. | Oh, my. | Oh, that's good. | Oh, great! | Oh, boy. | I know. | Oh, yeah.
Backchannels:
Uh-huh | Yeah | Right | Oh | Yes | Huh | Oh, yeah | Okay | Sure | Really | Oh, really | I see | yep
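Dialog act categories like these can be operationalized as regular-expression counters over turns. The few patterns below are illustrative toys, not the Switchboard-derived expressions the study actually used:

```python
import re

# Toy stand-ins for the dialog act patterns; real patterns were drawn
# from the hand-labeled Switchboard Dialogue Act Corpus.
BACKCHANNEL = re.compile(r"^(?:uh-huh|yeah|right|okay|oh,? okay)[.!?]?$", re.I)
APPRECIATION = re.compile(r"^(?:wow|that's (?:great|good|true|right)|oh, great)[.!?]*$", re.I)

def dialog_act_counts(turns):
    """Count backchannels, appreciations, and questions in a side's turns."""
    return {
        "backchannels": sum(bool(BACKCHANNEL.match(t.strip())) for t in turns),
        "appreciations": sum(bool(APPRECIATION.match(t.strip())) for t in turns),
        "questions": sum(t.strip().endswith("?") for t in turns),
    }
```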
Clarifications
A: I’ve been goofing off big time
B: You’ve been what?
A: I’ve been goofing off big time
Collaborative Completion
a turn where a speaker completes the utterance
begun by the alter (Lerner, 1991; Lerner, 1996).
And I’m wearing a
yellow shirt
And black pants
Heuristic:
first word of sentence_i is predictable
from last two words of sentence_{i-1}
(using a trigram grammar trained on Switchboard)
Dialog feature:
Collaborative Completion
Heuristic: first word of sentence_i is predictable from
last two words of sentence_{i-1}
Result: Tends to find “locally coherent phrasal answers”
M: What year did you graduate?
F: From high school?
F: What department are you in?
M: The business school.
But not:
F: What department are you in?
M: I’m in the teacher education program.
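A rough Python sketch of this heuristic, with a toy trigram model trained on a single stand-in sentence rather than Switchboard, and an illustrative probability threshold:

```python
from collections import defaultdict

def train_trigrams(corpus):
    """Count trigram continuations: (w1, w2) -> {w3: count}."""
    counts = defaultdict(lambda: defaultdict(int))
    for sent in corpus:
        words = sent.lower().split()
        for a, b, c in zip(words, words[1:], words[2:]):
            counts[(a, b)][c] += 1
    return counts

def is_collaborative_completion(prev_turn, turn, counts, threshold=0.5):
    """Flag a turn as a candidate completion if its first word is
    predictable from the last two words of the previous turn."""
    prev = prev_turn.lower().split()
    words = turn.lower().split()
    if len(prev) < 2 or not words:
        return False
    context = (prev[-2], prev[-1])
    total = sum(counts[context].values())
    return total > 0 and counts[context][words[0]] / total >= threshold
```

Trained on real Switchboard data, this tends to find the "locally coherent phrasal answers" shown above.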
Disfluency features
UH/UM:
# of filled pauses (uh or um) in side
M: Um, eventually, yeah, but right now I want to get some more experience,
uh, in research.
F: Oh.
M: Uh, so I will probably work for, uh, a research lab for, uh, big companies.
RESTART:
# of disfluent restarts in side
Uh, I–there’s a group of us that came in–
OVERLAP: # of turns in side where speakers overlapped
M: But-and also obviously–
F: It sounds bigger.
M: –people in the CS school are not quite as social in general as other–
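The UH/UM and RESTART counters can be sketched with regular expressions; these patterns are simplifications for illustration, not the study's exact expressions:

```python
import re

# Filled pauses: standalone "uh" or "um" tokens.
FILLED_PAUSE = re.compile(r"\b(?:uh|um)\b", re.IGNORECASE)
# Disfluent restart: a word cut off with a dash, e.g. "I- there's".
RESTART = re.compile(r"\w+-{1,2}\s")

def disfluency_counts(side_text):
    """Count filled pauses and restarts in one conversation side."""
    return {
        "uh_um": len(FILLED_PAUSE.findall(side_text)),
        "restarts": len(RESTART.findall(side_text)),
    }
```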
Livejournal.com:
I, me, my on or after Sep 11, 2001
Cohn, Mehl, Pennebaker. 2004. Linguistic markers of psychological change
surrounding September 11, 2001. Psychological Science 15, 10: 687-693.
[Graph from Pennebaker slides: frequency of I/me/my by week, September to November 2001]
September 11 LiveJournal.com study:
We, us, our
Cohn, Mehl, Pennebaker. 2004. Linguistic markers of psychological change
surrounding September 11, 2001. Psychological Science 15, 10: 687-693.
[Graph from Pennebaker slides: frequency of we/us/our by week, September to November 2001]
LiveJournal.com September 11, 2001 study:
Positive and negative emotion words
Cohn, Mehl, Pennebaker. 2004. Linguistic markers of psychological change
surrounding September 11, 2001. Psychological Science 15, 10: 687-693.
[Graph from Pennebaker slides: frequency of positive and negative emotion words]
LIWC
Linguistic Inquiry and Word Count
Pennebaker, Francis, & Booth, 2001
dictionary of 2300 words grouped into > 70 classes
negative emotion (bad, weird, hate, problem, tough)
sexual (love, loves, lover, passion, passionate, sex, …)
1st person pronouns (I me mine myself I’d I’ll I’m…)
2nd person pronouns (you, you’d you’ll your you’ve…)
ingest (food, eat, eats, cook, dinner, drink, restaurant…)
swear (hell, sucks, damn, fuck,…)
…
after 9/11
greater negative emotion
more socially engaged
Lexical features
Domain-specific lexical features
via an autoencoder
Our first paper showed lexical features help
but not as much as prosodic or dialog features
Better: data-driven lexical features?
Pilot experiment: Using only Naïve Bayes with word existence
features works better than chance
How do we extract lexical features that we can combine
with the previous features?
Intuition:
Create multinomial vector of all words with counts
Use dimensionality reduction to create a 30-dimensional
vector
Use these 30 dimensions as 30 features
Dimensionality reduction:
autoencoders
Goal: Reduce the lexical
information in the
document to a smaller
number of features.
Autoencoders have
been shown to perform
better than other
compressive techniques
(Hinton and Salakhutdinov 2006).
Autoencoder
A deep belief network (Hinton and Salakhutdinov
2006, Hinton 2007) used to form compact
representations of an input space
The input space, for each conversation:
multinomial distribution (1000 most common words) for
words used by each speaker x 2
Two phases of training:
Pretraining: Use contrastive divergence to train
hierarchical RBMs to find a good initial point
Fine-tuning: Use backpropagation to fine-tune the
weights
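The input space (a normalized count vector over the most common words, one per conversation side) can be sketched in a few lines; the RBM pretraining and backpropagation fine-tuning that compress it to 30 dimensions are omitted, and the tiny vocabulary size here is for illustration only:

```python
from collections import Counter

def vocab_top_n(docs, n):
    """The n most common words across all conversation sides
    (n = 1000 in the study)."""
    counts = Counter(w for doc in docs for w in doc.lower().split())
    return [w for w, _ in counts.most_common(n)]

def multinomial_vector(doc, vocab):
    """Normalized counts of the vocabulary words in one side's transcript;
    this is the multinomial input the autoencoder compresses."""
    counts = Counter(doc.lower().split())
    total = sum(counts[w] for w in vocab) or 1
    return [counts[w] / total for w in vocab]
```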
Autoencoder stages
Pre-processing before classifier
training
Standardized all variables to have zero mean and unit
variance
Removed all features correlated greater than .7
To remove colinearity from the regression so weights
could be interpreted
To use fewer features, since # of training examples was
small
Example: Male Flirtatious
Removed f0 range (correlated with f0 max)
Removed f0 min sd (correlated with f0 min)
Removed Swear (correlated with Anger)
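A small pure-Python sketch of both preprocessing steps: z-scoring each feature, then greedily dropping any feature whose correlation with an already-kept feature exceeds .7 (the greedy keep-first order is an assumption):

```python
def zscore(xs):
    """Standardize to zero mean, unit variance."""
    m = sum(xs) / len(xs)
    sd = (sum((x - m) ** 2 for x in xs) / len(xs)) ** 0.5
    return [(x - m) / sd for x in xs]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs) ** 0.5
    vy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (vx * vy)

def decorrelate(features, threshold=0.7):
    """features: dict name -> list of values. Keep a feature only if it is
    not correlated above the threshold with any already-kept feature."""
    kept = []
    for name, vals in features.items():
        if all(abs(pearson(vals, features[k])) <= threshold for k in kept):
            kept.append(name)
    return kept
```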
Architecture: 6 binary classifiers
Female ±Awkward, Male ±Awkward
Female ±Friendly, Male ±Friendly
Female ±Flirtatious, Male ±Flirtatious
Multiple classifier experiments
L1-regularized logistic regression
SVM w/RBF kernel
5-fold cross-validation
tested on held-out test set of 10% highest and 10% lowest
5 folds: 3 train, 1 validation, 1 test
Experiments
K-fold cross validation.
5 folds: 3 train, 1 validation, 1 test
Randomized the data ordering, repeated k-fold cross
validation 25 times.
Feature weights (θ)
We calculated a separate θ for each randomized run,
resulting in a vector of weights for each feature.
We kept a feature if the median of its weight vector
was non-zero.
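A sketch of this selection rule; the `fit` argument stands in for the actual L1-regularized logistic regression run over each randomized cross-validation split, which is not reproduced here:

```python
import random
import statistics

def select_features(fit, data, n_runs=25, seed=0):
    """Refit on n_runs random shufflings of the data; keep a feature
    only if the median of its weight across runs is non-zero."""
    rng = random.Random(seed)
    runs = []
    for _ in range(n_runs):
        shuffled = data[:]
        rng.shuffle(shuffled)
        runs.append(fit(shuffled))  # fit returns {feature_name: weight}
    return [f for f in runs[0]
            if statistics.median(r[f] for r in runs) != 0]
```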
Illustrating features: 10 most
significant features, 1 not
For male flirtation intention
Results with SVM:
predicting flirt intention
Using my speech to predict whether I say I am
flirting
I say I’m flirting:
Male speaker: 72%
Female speaker: 76%
Results with SVM:
Predicting flirt perception
Using my speech to predict whether partner says I
am flirting
Partner says I’m flirting:
Male speaker: 80%
Female speaker: 68%
Summary: flirt detection
Using my speech to predict whether I am flirting
                  I say I’m flirting   Partner says I’m flirting
Male speaker:           72%                      80%
Female speaker:         76%                      68%
Fine, but how good is 72 or 76?
In speech we generally use human
performance as a “ceiling”
Checking human performance:
If John says Jane is flirting
And Jane says Jane is flirting
Then we say John is right.
Details of human experiment
We converted the Likert values to a binary classification by
splitting the space around the mid-value
John thinks Jane is flirting:
If John’s Likert (1-10) value for “Jane flirting” is > 5
We evaluate John
By comparing John’s perception to Jane’s intention
We used only the relatively certain cases of intention
Computed by taking the top 10%/bottom 10% of intention ratings
(We also tried other ways to derive binary classes like
median, z-scores, etc. this was the most generous to the
humans)
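A sketch of the binarization and human-accuracy computation described above; the tie-breaking and the fraction of extreme cases kept are illustrative assumptions:

```python
def binarize(likert, mid=5):
    """Map a 1-10 Likert rating to a binary flirting label (> 5 = flirting)."""
    return likert > mid

def human_accuracy(pairs, keep_frac=0.2):
    """pairs: (perceiver's rating of partner, partner's self-rating).
    Keep only the most extreme self-ratings (top/bottom keep_frac/2 each),
    then score the perceiver against the partner's stated intention."""
    ordered = sorted(pairs, key=lambda p: p[1])
    k = max(1, int(len(ordered) * keep_frac / 2))
    certain = ordered[:k] + ordered[-k:]
    hits = sum(binarize(p) == binarize(s) for p, s in certain)
    return hits / len(certain)
```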
Fine, but how good is 72 or 76?
In NLP we use human performance as a “ceiling”
Checking human performance:
If John says Jane is flirting
And Jane says Jane is flirting
Then we say John is right.
Male speaker (female perceiver): 64%
Female speaker (male perceiver): 57%
Implication #1
Females are better than males at
detecting flirting
or males give off clearer flirting cues
Male speaker (female perceiver): 64%
Female speaker (male perceiver): 57%
Implication #2: Machines are better
than humans at detecting flirting
                    Overall   Male speaker   Female speaker
Computer detector:    74%         72%             76%
Human detector:       61%         64%             57%
How can this be?
Why are humans so bad at detecting flirtation?
(Busso and Narayanan 2008: similar result for emotion detection)
Our Intuition:
                  I am flirting   Other is flirting
Male 101 says:          8                 7
Female 127 says:        1                 1
What correlates with my perception
of others flirting
Pearson correlation coefficients:
How I see other flirting & how other sees themself flirting: ρ = .15
How I see other flirting & how I see myself flirting: ρ = .73
What correlates with my perception
of others style
Pearson correlation coefficients, my perception of the other correlated with:
             my self-intention   the other's intention
Flirting:          .73                  .15
Friendly:          .77                  .05
Awkward:           .58                  .07
Assertive:         .58                  .09
“It’s not you, it’s me”
My perception of whether my date is flirting
Is the same as my perception of whether I am
flirting
Why?
Speakers aren’t very good at capturing
intentions of others in 4 minutes
Speakers instead base judgments on their own
behavior/intentions
What about the features?
How much do autoencoders help?
                      SVM    +autoencoder
Male intention:       66%        72%
Female intention:     72%        76%
Male perception:      77%        80%
Female perception:    60%        68%
Likely (positive or negative) words from
one of the 30 autoencoder features
More likely to flirt:
S_phone
O_phone
S_party
S_girl
O_girl
S_dating
S_hate
S_weird
S_dating
O_party
Less likely to flirt:
O_academia
S_academia
S_interview
S_teacher
O_phd
O_advisor
O_lab
S_research
S_management
O_management
[Chart: intention regression weights, men]
[Chart: intention regression weights, women]
Gender differences in flirt intention
Both genders when flirting:
use words related to negative emotion
especially men
Women when flirting:
use words related to love or sex
use appreciations
laugh, and use I
Men when flirting:
raise their pitch floor
are more fluent
What are these “negative emotion”
words we use when flirting?
M: “Oh wow, that’s terrible”
M: “That is awful”
M: “Wow, are you serious?”
M: “Yeah, like, I hated it too”
F: That’s crazy.
M: It’s like kind of weird
Sympathy!
What are these “love/sex” words
women use when flirting?
love, loved, loves, passion, passionate
Well, I love to cook.
I really love San Francisco.
Oh, I love that show
…my passion is teaching.
…cooking is my passion.
Um, right now I’m passionate about getting
through my first year of my PhD program.
Strong positive affect toward
hobbies or interests!
Missing the cues!!
Men think women are flirting when women:
use love/sex words,
tell stories
have higher pitch max,
vary their loudness.
But women who are flirting actually:
use love/sex words [men get this right]
use more I
laugh more
use more appreciations
Missing the cues!!
Women think men are flirting when:
men ask questions
men speak faster.
But men who are flirting actually:
raise their pitch floor
are sympathetic
are more fluent
What about friendliness,
awkwardness, etc?
Detecting awkward and friendly
speakers
Using what I do & what my date does to predict what
my date calls me
Simpler (logistic regression) classifier
Accuracy detecting: Awkward (M, F), Friendly (M, F)
51, 72, 68, 64, 73, 75
Using speaker words/speech: 63%
+ partner words/speech: 64%
What makes someone seem friendly?
“Collaborative conversational style”
Related to the “collaborative floor” of Edelsky (1981), Coates (1996)
Collaborative completions (Lerner 1991, 1996)
M: And I’m wearing a green shirt.
F: And blue pants.
Clarifications
F: I'm working at Pottery Barn this summer.
M: I'm sorry, who?
Other questions
Use of “you”
Laughter
Plus perhaps
Appreciations (for women)
Overlaps (for men)
What makes a man seem awkward?
More disfluent
Increased uh/um and restarts
Not collaborative conversationalists
(no appreciations, repair questions, collab completions,
you)
Take fewer turns
Don’t overlap
(Prosodically hard to characterize)
Work in progress:
Can we predict liking?
That is, can we predict the binary variable:
‘willing to give this person my email’
Either for a single speaker (baseline 53%=no)
Or for a dyad (baseline 81% = no)
What you do when you like someone:
Preliminary results
Men when they like their date
use more appreciations (“Great!”, “Wow!”,
“That’s cool”)
Women when they like their date
vary their pitch and loudness more,
raise their max pitch
laugh
tell stories
Who do you say yes to?
Preliminary results
Men say yes to women who:
show interest by asking clarification
questions (“excuse me?”)
use “love” and “passion”
talk about food
Women say yes to men who:
don’t use appreciations
talk about food
tell stories
laugh
Current work: Accommodation
In general, speakers change their behavior to match
(or not match) their interlocutor
Natale 1975, Giles, Mulac, Bradac, & Johnson 1987, Bilous & Krauss
1988, Giles, Coupland, and Coupland, 1991, Giles and Coupland
1992, Niederhoffer and Pennebaker 2002, Pardo 2006, Nenkova
and Hirschberg 2008, inter alia.
Matching rate of speech
Matching F0
Matching intensity (loudness)
Matching vocabulary and grammar
Matching dialect
Our question:
Do we see more accommodation when people like each
other?
Future: New variables!
“How would you rate the other person on each of the
following attributes? (1=not at all, 10=very much)”
Attractive
Sincere
Intelligent
Funny
Ambitious
Courteous
Conclusions – for daters
Talking about your advisor is a bad idea
on a date
Sympathy is a good idea, if you’re a guy
Passion is good, if you’re a woman
Food is good, if you eat
Conclusions – for psychology
Humans project their internal
state on others
Men and women (at least in 4
minutes) seem to focus on the
wrong verbal cues to flirtation
Conclusions – for computer science
We can do automatic extraction of rich
social variables from speech and text.
For at least this variable (“does speaker
intend to flirt”) we beat human
performance
Work in progress:
Flirting for fun and for real
“Flirting but not interested” -> “For Fun Flirting”
“Flirting and interested” -> “For Real Flirting”
For fun flirters
Men: raise min pitch
Men: use more “we”
Women: laugh
For real flirters
Men + Women: “love”, “passionate”, “sexy”
Women: eating words
Men: use less “we” and fewer hedges (“I think”)
I think: softener, but also characteristic of formal situations and
middle class speech
Work in progress:
laughter and irony
more on hedges
http://blog.okcupid.com/index.php/online-dating-advice-exactly-what-to-say-in-a-first-message/
Part II: Personality
Personality and Cultural Values
Personality refers to the structures and propensities
inside a person that explain his or her characteristic
patterns of thought, emotion, and behavior.
Personality captures what people are like.
Traits are defined as recurring regularities or trends
in people’s responses to their environment.
Cultural values, defined as shared beliefs about desirable
end states or modes of conduct in a given culture,
influence the expression of a person’s traits.
McGraw-Hill/Irwin Chapter 9
The Big Five Dimensions of
Personality
Extraversion vs. Introversion
(sociable, assertive, playful vs. aloof, reserved, shy)
Emotional stability vs. Neuroticism
(calm, unemotional vs. insecure, anxious)
Agreeableness vs. Disagreeable
(friendly, cooperative vs. antagonistic, faultfinding)
Conscientiousness vs. Unconscientious
(self-disciplined, organized vs. inefficient, careless)
Openness to experience
(intellectual, insightful vs. shallow, unimaginative)
Aside: Do Animals Have
Personalities?
Gosling (1998) studied spotted hyenas. He:
had human observers use personality scales to
rate the different hyenas in the group
did a factor analysis on these findings
found five dimensions
three closely resembled the Big Five traits of
neuroticism, openness to experience, and
agreeableness
Slide from Randall E. Osborne
BFI – Big Five Inventory – John et al.
http://www.outofservice.com/bigfive/
The Big Five Personality Traits
Conscientiousness - dependable, organized, reliable,
ambitious, hardworking, and persevering.
The Big Five Personality Traits,
Cont’d
Agreeableness - warm, kind, cooperative, sympathetic,
helpful, and courteous.
Prioritize communion striving, which reflects a strong desire
to obtain acceptance in personal relationships as a means of
expressing personality.
Agreeable people focus on “getting along,” not necessarily
“getting ahead.”
The Big Five Personality Traits,
Cont’d
Extraversion - talkative, sociable, passionate,
assertive, bold, and dominant.
Easiest to judge in zero acquaintance situations —
situations in which two people have only just met.
Prioritize status striving, which reflects a strong desire
to obtain power and influence within a social structure
as a means of expressing personality.
Tend to be high in what’s called positive affectivity — a
dispositional tendency to experience pleasant, engaging
moods such as enthusiasm, excitement, and elation.
The Big Five Personality Traits,
Cont’d
Neuroticism - nervous, moody, emotional,
insecure, and jealous.
Synonymous with negative affectivity —a dispositional
tendency to experience unpleasant moods such as
hostility, nervousness, and annoyance.
Associated with a differential exposure to stressors,
meaning that neurotic people are more likely to appraise
day-to-day situations as stressful.
Associated with a differential reactivity to stressors,
meaning that neurotic people are less likely to believe
they can cope with the stressors that they experience.
The Big Five Personality Traits,
Cont’d
Neuroticism, continued
Neuroticism is also strongly related to locus of control,
which reflects whether people attribute the causes of
events to themselves or to the external environment.
Tend to hold an external locus of control, meaning that they
often believe that the events that occur around them are driven
by luck, chance, or fate.
Less neurotic people tend to hold an internal locus of control,
meaning that they believe that their own behavior dictates
events.
External and Internal Locus of
Control
The Big Five Personality Traits,
Cont’d
Openness to experience - curious, imaginative, creative,
complex, refined, and sophisticated.
Also called “Inquisitiveness” or “Intellectualness” or even
“Culture.”
Openness to experience is also more likely to be valuable in
jobs that require high levels of creativity, defined as the
capacity to generate novel and useful ideas and solutions.
Highly open individuals are more likely to migrate into artistic
and scientific fields.
Changes in Big Five Dimensions
Over the Life Span
Personality demo
Demo:
http://mi.eng.cam.ac.uk/~farm2/personality/demo.html: find
your personality type
Relationship between Dating and
Personality studies
Observed versus self-reports
Agreeableness (in Mairesse et al.) and Friendliness (in
Jurafsky et al.)
Pickiness in Dating
Finkel and Eastwick 2009, Psych Science
Men are less selective than women in speed dating
Novel explanation: act of physically approaching a
partner increases attraction to that partner
in traditional events, the men always rotate
Ran 15 speed dating events
in 8, men rotated: men more selective
in 7, women rotated: men and women equally selective
Conclusion?