CS 424P / LINGUIST 287: Extracting Social Meaning and Sentiment
Dan Jurafsky
Lecture 5: Romantic Interest and Personality
Joint work with Rajesh Ranganath and Dan McFarland.

Dan Jurafsky, Rajesh Ranganath, and Dan McFarland. 2009. Extracting Social Meaning: Identifying Interactional Style in Spoken Conversation. Proceedings of NAACL HLT 2009.
Rajesh Ranganath, Dan Jurafsky, and Dan McFarland. 2009. It's Not You, it's Me: Detecting Flirting and its Misperception in Speed-Dates. EMNLP 2009.

Detecting social meaning: our study
Given speech and text from a conversation:
- Can we detect "styles," like whether a speaker is awkward? flirtatious? friendly?
- Can we tell if the speakers like each other?
Dataset: 991 four-minute "speed-dates." Each participant rated their partner and themselves for these styles.

Speed dating: our speed-date setup
An example exchange:
"What do you do for fun? Dance? Uh, dance, uh, I like to go, like, camping. Uh, snowboarding, but I'm not good, but I like to go anyway. You like boarding. Yeah. I like to do anything. Like I, I'm up for anything. Really? Yeah. Are you open-minded about most everything? Not everything, but a lot of stuff. What is not everything? [laugh] I don't know. Think of something, and I'll say if I do it or not. [laugh] Okay. [unintelligible]. Skydiving. I wouldn't do skydiving, I don't think. Yeah, I'm afraid of heights. F: Yeah, yeah, me too. M: [laugh] Are you afraid of heights? F: [laugh] Yeah [laugh]"

The SpeedDate corpus
- 991 four-minute dates: 3 events, each with ~20x20 = 400 dates, some data loss
- Participants: graduate student volunteers in 2005, who participated in return for the chance to date
- Speech: ~60 hours, from shoulder-sash recorders; high noise
- Transcripts: ~800K words, hand-transcribed, with turn boundary times
- Surveys (pre-test surveys, event scorecards, post-test surveys): date perceptions and follow-up interest; general attitudes, preferences, demographics
- The largest experiment with audio, text, and survey information

What we attempted to predict
Conversational style: "How often did you behave in the following ways on this date? How often did they behave in the following ways on this date?" On a scale of 1-10 (1 = never, 10 = constantly):
1. flirtatious
2. friendly
3. awkward
4. assertive

Features
- Prosodic: pitch (min, mean, max, std); intensity (min, max, mean, std); duration of turn; rate of speech (words per second)
- Dialog: questions; backchannels ("uh-huh", "yeah"); appreciations ("Wow!", "That's great!")
- Lexical: negative emotion words (bad, weird, crazy, hate); storytelling words (past tense) and food words (eat, dinner); love and sexual/emotional words (love, passionate, screw); personal pronouns (I, you, we, us)

Features extracted within turns (e.g., F0 max and F0 min computed within each turn)

Features: Pitch
F0 min, max, mean. To compute, e.g., F0 min for a conversation side: take the F0 min of each turn (not counting zero values) and average over all turns in the side. This gives F0 min, F0 max, and F0 mean. We also compute measures of variation: the standard deviations (F0 min sd, F0 max sd, F0 mean sd) and the pitch range, where pitch range = (F0 max - F0 min).

Features: Other prosodic
- Intensity min, max, mean, std, computed as for pitch
- Duration of turn
- Total time for the conversation side
- Rate of speech (words per second)

Dialog act features
- Questions: # of questions in the side
- Laughter: # of instances of laughter in the side
- Turns: total # of turns in the side
- Backchannels ("Uh-huh." "Yeah." "Right." "Oh, okay."): # of backchannels in the side
- Appreciations ("Wow." "That's true." "Oh, great!" "Oh, gosh!"): # of appreciations in the side
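As a concrete illustration of how these per-side features could be assembled, here is a minimal sketch in Python. It assumes each conversation side is available as a list of turns that already carry per-turn F0 and intensity statistics (e.g., from a pitch tracker) plus a transcript string; the Turn fields, the tiny backchannel/appreciation keyword lists, and the question/laughter matching are illustrative assumptions, not the authors' actual extraction code.

```python
# Sketch: aggregate turn-level prosody and dialog-act counts into one
# feature vector per conversation side. Field names and the tiny
# backchannel/appreciation lists are illustrative, not the original code.
from dataclasses import dataclass
from statistics import mean, stdev
from typing import Dict, List

@dataclass
class Turn:
    f0_min: float         # per-turn F0 minimum (Hz), zero frames already excluded
    f0_max: float
    f0_mean: float
    intensity_mean: float
    duration: float       # seconds
    text: str

BACKCHANNELS = {"uh-huh", "yeah", "right", "okay"}        # toy subset
APPRECIATIONS = ("wow", "that's great", "oh, great")      # toy subset

def side_features(turns: List[Turn]) -> Dict[str, float]:
    feats = {}
    # Prosodic features: average each per-turn statistic over the side,
    # then add variation measures (std over turns, pitch range).
    for name in ("f0_min", "f0_max", "f0_mean", "intensity_mean", "duration"):
        vals = [getattr(t, name) for t in turns]
        feats[name] = mean(vals)
        feats[name + "_sd"] = stdev(vals) if len(vals) > 1 else 0.0
    feats["pitch_range"] = feats["f0_max"] - feats["f0_min"]
    # Rate of speech: words per second over the whole side.
    total_words = sum(len(t.text.split()) for t in turns)
    total_time = sum(t.duration for t in turns)
    feats["total_time"] = total_time
    feats["rate_of_speech"] = total_words / total_time if total_time else 0.0
    # Dialog features: simple counts per side.
    feats["n_turns"] = len(turns)
    feats["n_questions"] = sum(t.text.count("?") for t in turns)
    feats["n_laughter"] = sum(t.text.lower().count("[laugh]") for t in turns)
    lowered = [t.text.lower().strip().rstrip(".!? ") for t in turns]
    feats["n_backchannels"] = sum(s in BACKCHANNELS for s in lowered)
    feats["n_appreciations"] = sum(
        any(s.startswith(a) for a in APPRECIATIONS) for s in lowered)
    return feats
```

In the real pipeline the backchannel and appreciation patterns come from the Switchboard-derived regular expressions described in the next section.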
Regular expressions drawn from the hand-labeled Switchboard Dialogue Act Corpus (Jurafsky, Biasca, Shriberg 1997):
Appreciations: Wow. / Oh, wow. / That's great. / That's good. / That's right. / Oh, no. / Oh, my goodness. / That's true. / Well, that's good. / Oh, that's great. / Oh, gosh. / Great. / Good. / Oh, my. / Oh, that's good. / Oh, great! / Oh, boy. / I know. / Oh, yeah.
Backchannels: Uh-huh / Yeah / Right / Oh / Yes / Huh / Oh, yeah / Okay / Sure / Really / Oh, really / I see / Yep

Clarifications
I've been goofing off big time.
You've been what?
I've been goofing off big time.

Collaborative completion
A turn where a speaker completes the utterance begun by the alter (Lerner, 1991; Lerner, 1996):
And I'm wearing a yellow shirt. / And black pants.
Heuristic: the first word of sentence i is predictable from the last two words of sentence i-1 (using a trigram grammar trained on Switchboard). A sketch of this heuristic appears at the end of this section.

Dialog feature: Collaborative completion
Heuristic: the first word of sentence i is predictable from the last two words of sentence i-1.
Result: tends to find "locally coherent phrasal answers":
M: What year did you graduate? F: From high school?
F: What department are you in? M: The business school.
But not:
F: What department are you in? M: I'm in the teacher education program.

Disfluency features
UH/UM: # of filled pauses (uh or um) in the side
M: Um, eventually, yeah, but right now I want to get some more experience, uh, in research. F: Oh. M: Uh, so I will probably work for, uh, a research lab for, uh, big companies.
RESTART: # of disfluent restarts in the side
Uh, I– there's a group of us that came in–
OVERLAP: # of turns in the side where the speakers overlapped
M: But– and also obviously– F: It sounds bigger. M: –people in the CS school are not quite as social in general as other–

LiveJournal.com: I, me, my, on or after Sep 11, 2001
Cohn, Mehl, and Pennebaker. 2004. Linguistic markers of psychological change surrounding September 11, 2001. Psychological Science 15(10): 687-693.
Graph from Pennebaker slides: use of first-person singular pronouns on LiveJournal.com over the weeks from mid-September through early November 2001.

LiveJournal.com study: we, us, our
(Cohn, Mehl, & Pennebaker 2004)
Graph from Pennebaker slides: use of first-person plural pronouns over the same weeks around September 11.

LiveJournal.com September 11, 2001 study: positive and negative emotion words
(Cohn, Mehl, & Pennebaker 2004)
Graph from Pennebaker slides.

LIWC: Linguistic Inquiry and Word Count (Pennebaker, Francis, & Booth, 2001)
A dictionary of 2,300 words grouped into more than 70 classes:
- negative emotion (bad, weird, hate, problem, tough)
- sexual (love, loves, lover, passion, passionate, sex, ...)
- 1st person pronouns (I, me, mine, myself, I'd, I'll, I'm, ...)
- 2nd person pronouns (you, you'd, you'll, your, you've, ...)
- ingest (food, eat, eats, cook, dinner, drink, restaurant, ...)
- swear (hell, sucks, damn, fuck, ...)
- ...
After 9/11: greater negative emotion, and more socially engaged.

Lexical features
Domain-specific lexical features via an autoencoder. Our first paper showed that lexical features help, but not as much as prosodic or dialog features. Better: data-driven lexical features?
Pilot experiment: using only Naïve Bayes with word-existence features works better than chance (see the sketch below).
How do we extract lexical features that we can combine with the previous features?
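The collaborative-completion heuristic described above could be sketched as follows. It assumes a trigram conditional-probability function trained elsewhere (the original used a trigram grammar trained on Switchboard); the function interface and the predictability threshold are assumptions for illustration.

```python
# Sketch of the collaborative-completion heuristic: flag turn i when the
# first word of turn i is "predictable" from the last two words of the
# other speaker's turn i-1 under a trigram model. The trigram_prob
# callable and the threshold are illustrative stand-ins for a model
# trained on Switchboard.
from typing import Callable, List

def is_collaborative_completion(
    prev_turn: str,
    curr_turn: str,
    trigram_prob: Callable[[str, str, str], float],
    threshold: float = 1e-3,          # assumed cutoff; would need tuning
) -> bool:
    prev_words = prev_turn.lower().split()
    curr_words = curr_turn.lower().split()
    if len(prev_words) < 2 or not curr_words:
        return False
    w1, w2 = prev_words[-2], prev_words[-1]   # last two words of previous turn
    w3 = curr_words[0]                        # first word of current turn
    return trigram_prob(w1, w2, w3) > threshold

def count_collab_completions(turns: List[str], trigram_prob) -> int:
    """Count turns in a side flagged as collaborative completions."""
    return sum(
        is_collaborative_completion(turns[i - 1], turns[i], trigram_prob)
        for i in range(1, len(turns))
    )
```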
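The Naïve Bayes pilot mentioned in the last slide might look roughly like this with scikit-learn; the toy transcripts and the Bernoulli variant over word presence/absence are illustrative choices, not the original experiment's exact setup.

```python
# Sketch of the pilot: word-existence (presence/absence) features plus
# Bernoulli Naive Bayes to predict a binary flirtation label per side.
# The toy transcripts and labels below are placeholders; the real inputs
# would be the transcripts and the binarized self-ratings.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import BernoulliNB

train_texts = [
    "i love dancing and i love to travel",                     # toy transcript
    "my advisor has me in the lab all week doing research",    # toy transcript
]
train_labels = [1, 0]   # 1 = rated flirtatious, 0 = not (toy labels)

vectorizer = CountVectorizer(binary=True)   # binary=True -> word existence
X_train = vectorizer.fit_transform(train_texts)

clf = BernoulliNB().fit(X_train, train_labels)

X_new = vectorizer.transform(["i really love cooking, it is my passion"])
print(clf.predict(X_new))   # predicted flirtation label for a new side
```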
Intuition: create a multinomial vector of all words with counts; use dimensionality reduction to create a 30-dimensional vector; use these 30 dimensions as 30 features.

Dimensionality reduction: autoencoders
Goal: reduce the lexical information in the document to a smaller number of features. Autoencoders have been shown to perform better than other compressive techniques (Hinton and Salakhutdinov 2006).

Autoencoder
A deep belief network (Hinton and Salakhutdinov 2006, Hinton 2007) used to form compact representations of an input space. The input space, for each conversation: a multinomial distribution (over the 1000 most common words) of the words used by each speaker (x 2). Two phases of training:
- Pretraining: use contrastive divergence to train hierarchical RBMs to find a good initial point
- Fine-tuning: use backpropagation to fine-tune the weights
(A simplified sketch appears at the end of this section.)

Autoencoder stages

Pre-processing before classifier training
- Standardized all variables to have zero mean and unit variance
- Removed all features correlated greater than .7:
  - to remove collinearity from the regression so the weights could be interpreted
  - to use fewer features, since the number of training examples was small
Example (Male Flirtatious): removed f0 range (correlated with f0 max), f0 min sd (correlated with f0 min), and Swear (correlated with Anger).

Architecture: 6 binary classifiers
Female ±Awkward, Male ±Awkward, Female ±Friendly, Male ±Friendly, Female ±Flirtatious, Male ±Flirtatious

Multiple classifier experiments
- L1-regularized logistic regression
- SVM with RBF kernel
- 5-fold cross-validation, tested on a held-out test set of the 10% highest and 10% lowest ratings
- 5 folds: 3 train, 1 validation, 1 test

Experiments
K-fold cross-validation with 5 folds: 3 train, 1 validation, 1 test. We randomized the data ordering and repeated the k-fold cross-validation 25 times.

Feature weights (θ)
We calculated a separate θ for each randomized run, resulting in a vector of weights for each feature. We kept a feature if the median of its weight vector was non-zero. (This recipe is also sketched at the end of this section.)

Illustrating features: the 10 most significant features (and 1 that is not), for male flirtation intention

Results with SVM: predicting flirt intention
Using my speech to predict whether I say I am flirting:
  Male speaker: 72%    Female speaker: 76%

Results with SVM: predicting flirt perception
Using my speech to predict whether my partner says I am flirting:
  Male speaker: 80%    Female speaker: 68%

Summary: flirt detection
Using my speech to predict whether I am flirting:
                              Male speaker   Female speaker
  I say I'm flirting              72%             76%
  Partner says I'm flirting       80%             68%

Fine, but how good is 72 or 76?
In speech (and NLP generally) we use human performance as a "ceiling." Checking human performance: if John says Jane is flirting, and Jane says Jane is flirting, then we say John is right.

Details of the human experiment
We converted the Likert values to a binary classification by splitting the scale around the mid-value: John thinks Jane is flirting if John's Likert (1-10) value for "Jane flirting" is > 5. We evaluate John by comparing John's perception to Jane's intention. We used only the relatively certain cases of intention, computed by taking the top 10% / bottom 10% of intention ratings. (We also tried other ways to derive binary classes, such as medians and z-scores; this was the most generous to the humans.)
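A simplified sketch of the autoencoder idea from the start of this section: compress each side's distribution over the 1000 most common words down to a 30-dimensional code. The original model is a deep belief network pretrained with RBMs via contrastive divergence and fine-tuned with backpropagation (Hinton and Salakhutdinov 2006); the plain feedforward PyTorch version below, with its layer sizes, loss, and training loop, is an assumed stand-in that shows only the compression step.

```python
# Sketch: compress each side's word distribution (1000 most common words)
# to a 30-dimensional code with a feedforward autoencoder. This skips the
# RBM pretraining of the original deep belief net; layer sizes, optimizer,
# loss, and epoch count are illustrative choices.
import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    def __init__(self, vocab_size=1000, code_size=30):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(vocab_size, 250), nn.ReLU(),
            nn.Linear(250, code_size),
        )
        self.decoder = nn.Sequential(
            nn.Linear(code_size, 250), nn.ReLU(),
            nn.Linear(250, vocab_size),
        )

    def forward(self, x):
        code = self.encoder(x)
        return self.decoder(code), code

# X: one row per conversation side, each row a normalized count vector
# over the 1000 most common words (random data here as a stand-in).
X = torch.rand(991, 1000)
X = X / X.sum(dim=1, keepdim=True)

model = Autoencoder()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

for epoch in range(50):
    recon, _ = model(X)
    loss = loss_fn(recon, X)
    opt.zero_grad()
    loss.backward()
    opt.step()

# The 30-dimensional codes become 30 extra features per side.
with torch.no_grad():
    _, codes = model(X)
print(codes.shape)   # torch.Size([991, 30])
```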
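The preprocessing and feature-selection recipe above (standardize, drop one of any feature pair correlated above .7, fit L1-regularized logistic regression over repeated randomized folds, keep features whose weight has a non-zero median) might be sketched like this; the stand-in data, the regularization strength C, the greedy correlation filter, and the use of scikit-learn's RepeatedKFold are assumptions.

```python
# Sketch of the classifier/feature-selection recipe: z-score the features,
# drop one of each pair with |r| > .7, then fit L1 logistic regression
# over repeated shuffled folds and keep features whose weight has a
# non-zero median across runs. The study repeated 5-fold CV 25 times.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedKFold

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))          # stand-in feature matrix
y = rng.integers(0, 2, size=200)        # stand-in binary labels

X = StandardScaler().fit_transform(X)

# Greedily drop the later feature of any pair correlated above .7
corr = np.corrcoef(X, rowvar=False)
keep = []
for j in range(X.shape[1]):
    if all(abs(corr[j, k]) <= 0.7 for k in keep):
        keep.append(j)
X = X[:, keep]

weights = []
cv = RepeatedKFold(n_splits=5, n_repeats=25, random_state=0)
for train_idx, _ in cv.split(X):
    clf = LogisticRegression(penalty="l1", solver="liblinear", C=1.0)
    clf.fit(X[train_idx], y[train_idx])
    weights.append(clf.coef_[0])

median_w = np.median(np.vstack(weights), axis=0)
selected = [keep[j] for j in np.flatnonzero(median_w)]
print("features with non-zero median weight:", selected)
```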
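And a small sketch of the human-ceiling evaluation just described: binarize the perceiver's Likert rating at the midpoint, keep only the partner's most certain intention ratings (top and bottom 10%), and score the perceiver against the partner's intention. The toy ratings and array names are illustrative.

```python
# Sketch of the human-performance check: John's perception of Jane is
# binarized at the Likert midpoint (> 5), Jane's own intention rating is
# kept only if it falls in the top or bottom 10% of intention ratings,
# and John is scored correct when the two binary labels agree.
# With random toy data the accuracy will be near chance.
import numpy as np

rng = np.random.default_rng(1)
perception = rng.integers(1, 11, size=500).astype(float)  # toy 1-10 ratings of partner
intention = rng.integers(1, 11, size=500).astype(float)   # toy 1-10 self-ratings

lo, hi = np.quantile(intention, [0.10, 0.90])
certain = (intention <= lo) | (intention >= hi)   # relatively certain intentions only

perceived_flirt = perception[certain] > 5         # binarize at the midpoint
intended_flirt = intention[certain] >= hi         # top 10% = flirting, bottom 10% = not

accuracy = np.mean(perceived_flirt == intended_flirt)
print(f"human accuracy on certain cases: {accuracy:.2f}")
```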
Human performance at detecting flirtation:
  Male speaker (female perceiver): 64%
  Female speaker (male perceiver): 57%

Implication #1
Females are better than males at detecting flirting, or males give off clearer flirting cues.

Implication #2: Machines are better than humans at detecting flirting
                       Overall   Male speaker   Female speaker
  Computer detector      74%         72%             76%
  Human detector         61%         64%             57%

How can this be? Why are humans so bad at detecting flirtation? (Busso and Narayanan 2008 report a similar result for emotion detection.)

Our intuition
                       "I am flirting"   "Other is flirting"
  Male 101 says:             8                   7
  Female 127 says:           1                   1

What correlates with my perception of others flirting (Pearson correlation coefficients)
  How I see the other flirting & how the other sees themself flirting:  ρ = .15
  How I see the other flirting & how I see myself flirting:             ρ = .73

What correlates with my perception of others' style (Pearson correlation coefficients)
              My perception of other    My perception of other
              & self-intention          & other-intention
  Flirting           .73                       .15
  Friendly           .77                       .05
  Awkward            .58                       .07
  Assertive          .58                       .09

"It's not you, it's me"
My perception of whether my date is flirting is the same as my perception of whether I am flirting. Why? Speakers aren't very good at capturing the intentions of others in 4 minutes; instead, speakers base judgments on their own behavior and intentions.

What about the features? How much do autoencoders help?
                        SVM    + autoencoder
  Male intention        66%        72%
  Female intention      72%        76%
  Male perception       77%        80%
  Female perception     60%        68%

Likely (positive or negative) words from one of the 30 autoencoder features
More likely to flirt: S_phone, O_phone, S_party, S_girl, O_girl, S_dating, S_hate, S_weird, S_dating, O_party
Less likely to flirt: O_academia, S_academia, S_interview, S_teacher, O_phd, O_advisor, O_lab, S_research, S_management, O_management

Intention regression weights: men. Intention regression weights: women.

Gender differences in flirt intention
Both genders when flirting: use words related to negative emotion, especially men.
Women when flirting: use words related to love or sex, use appreciations, laugh, and use "I".
Men when flirting: raise their pitch floor and are more fluent.

What are these "negative emotion" words we use when flirting?
M: "Oh wow, that's terrible." M: "That is awful." M: "Wow, are you serious?" M: "Yeah, like, I hated it too." F: "That's crazy." M: "It's like kind of weird."
Sympathy!

What are these "love/sex" words women use when flirting?
love, loved, loves, passion, passionate
"Well, I love to cook." "I really love San Francisco." "Oh, I love that show." "...my passion is teaching." "...cooking is my passion." "Um, right now I'm passionate about getting through my first year of my PhD program."
Strong positive affect toward hobbies or interests!

Missing the cues!
Men think women are flirting when women: use love/sex words, tell stories, have a higher pitch max, and vary their loudness.
But women who are flirting actually: use love/sex words [men get this right], use more "I", laugh more, and use more appreciations.

Missing the cues!
Women think men are flirting when men: ask questions and speak faster.
But men who are flirting actually: raise their pitch floor, are sympathetic, and are more fluent.

What about friendliness, awkwardness, etc.?

Detecting awkward and friendly speakers
Using what I do and what my date does to predict what my date calls me, with a simpler (logistic regression) classifier. Accuracies for Awkward and Friendly, for male and female speakers, comparing the speaker's own words/speech with adding the partner's words/speech, ranged from 51% to 75% (reported values: 51, 72, 68, 64, 73, 75, 63, 64).

What makes someone seem friendly?
"Collaborative conversational style"
Related to the "collaborative floor" of Edelsky (1981) and Coates (1996):
- Collaborative completions (Lerner 1991, 1996): M: And I'm wearing a green shirt. F: And blue pants.
- Clarifications: F: I'm working at Pottery Barn this summer. M: I'm sorry, who?
- Other questions
- "You"
- Laughter
Plus perhaps: appreciations (for women) and overlaps (for men).

What makes a man seem awkward?
- More disfluent: increased uh/um and restarts
- Not a collaborative conversationalist: no appreciations, repair questions, collaborative completions, or "you"
- Takes fewer turns
- Doesn't overlap
- (Prosodically hard to characterize)

Work in progress: Can we predict liking?
That is, can we predict the binary variable "willing to give this person my email," either for a single speaker (baseline 53% = no) or for a dyad (baseline 81% = no)?

What you do when you like someone: preliminary results
Men, when they like their date, use more appreciations ("Great!", "Wow!", "That's cool").
Women, when they like their date, vary their pitch and loudness more, raise their max pitch, laugh, and tell stories.

Who do you say yes to? Preliminary results
Men say yes to women who: show interest by asking clarification questions ("excuse me?"), use "love" and "passion," and talk about food.
Women say yes to men who: don't use appreciations, talk about food, tell stories, and laugh.

Current work: Accommodation
In general, speakers change their behavior to match (or not match) their interlocutor (Natale 1975; Giles, Mulac, Bradac, & Johnson 1987; Bilous & Krauss 1988; Giles, Coupland, and Coupland 1991; Giles and Coupland 1992; Niederhoffer and Pennebaker 2002; Pardo 2006; Nenkova and Hirschberg 2008; inter alia): matching rate of speech, F0, intensity (loudness), vocabulary and grammar, and dialect. Our question: do we see more accommodation when people like each other?

Future: new variables!
"How would you rate the other person on each of the following attributes? (1 = not at all, 10 = very much)": attractive, sincere, intelligent, funny, ambitious, courteous.

Conclusions – for daters
- Talking about your advisor is a bad idea on a date.
- Sympathy is a good idea, if you're a guy.
- Passion is good, if you're a woman.
- Food is good, if you eat.

Conclusions – for psychology
- Humans project their internal state onto others.
- Men and women (at least in 4 minutes) seem to focus on the wrong verbal cues to flirtation.

Conclusions – for computer science
- We can automatically extract rich social variables from speech and text.
- For at least this variable ("does the speaker intend to flirt") we beat human performance.

Work in progress: flirting for fun and for real
"Flirting but not interested" -> "for fun" flirting; "flirting and interested" -> "for real" flirting.
For-fun flirters: men raise their min pitch and use more "we"; women laugh.
For-real flirters: men and women use "love," "passionate," "sexy"; women use eating words; men use less "we" and fewer hedges ("I think"). "I think" is a softener, but also characteristic of formal situations and middle-class speech.

Work in progress: laughter and irony; more on hedges
http://blog.okcupid.com/index.php/online-dating-advice-exactly-what-to-say-in-a-first-message/

Part II: Personality

Personality and cultural values
Personality refers to the structures and propensities inside a person that explain his or her characteristic patterns of thought, emotion, and behavior. Personality captures what people are like. Traits are defined as recurring regularities or trends in people's responses to their environment.
Cultural values, defined as shared beliefs about desirable end states or modes of conduct in a given culture, influence the expression of a person's traits. (Slides in this part adapted from McGraw-Hill/Irwin, Chapter 9.)

The Big Five Dimensions of Personality
- Extraversion vs. Introversion (sociable, assertive, playful vs. aloof, reserved, shy)
- Emotional stability vs. Neuroticism (calm, unemotional vs. insecure, anxious)
- Agreeableness vs. Disagreeableness (friendly, cooperative vs. antagonistic, faultfinding)
- Conscientiousness vs. Unconscientiousness (self-disciplined, organised vs. inefficient, careless)
- Openness to experience (intellectual, insightful vs. shallow, unimaginative)

Aside: Do animals have personalities?
Gosling (1998) studied spotted hyenas. He:
- had human observers use personality scales to rate the different hyenas in the group
- did a factor analysis on these ratings
- found five dimensions, three of which closely resembled the Big Five traits of neuroticism, openness to experience, and agreeableness
(Slide from Randall E. Osborne)

BFI – Big Five Inventory – John et al.
http://www.outofservice.com/bigfive/

The Big Five Personality Traits
Conscientiousness: dependable, organized, reliable, ambitious, hardworking, and persevering.

Agreeableness: warm, kind, cooperative, sympathetic, helpful, and courteous. Agreeable people prioritize communion striving, which reflects a strong desire to obtain acceptance in personal relationships as a means of expressing personality. They focus on "getting along," not necessarily "getting ahead."

Extraversion: talkative, sociable, passionate, assertive, bold, and dominant. Easiest to judge in zero-acquaintance situations, that is, situations in which two people have only just met. Extraverts prioritize status striving, which reflects a strong desire to obtain power and influence within a social structure as a means of expressing personality. They tend to be high in what is called positive affectivity, a dispositional tendency to experience pleasant, engaging moods such as enthusiasm, excitement, and elation.

Neuroticism: nervous, moody, emotional, insecure, and jealous. Synonymous with negative affectivity, a dispositional tendency to experience unpleasant moods such as hostility, nervousness, and annoyance. Associated with a differential exposure to stressors, meaning that neurotic people are more likely to appraise day-to-day situations as stressful, and with a differential reactivity to stressors, meaning that neurotic people are less likely to believe they can cope with the stressors they experience.

Neuroticism, continued: Neuroticism is also strongly related to locus of control, which reflects whether people attribute the causes of events to themselves or to the external environment. Neurotic people tend to hold an external locus of control, meaning that they often believe the events that occur around them are driven by luck, chance, or fate; less neurotic people tend to hold an internal locus of control, meaning that they believe their own behavior dictates events.

External and internal locus of control

Openness to experience: curious, imaginative, creative, complex, refined, and sophisticated.
Also called "Inquisitiveness" or "Intellectualness" or even "Culture." Openness to experience is also more likely to be valuable in jobs that require high levels of creativity, defined as the capacity to generate novel and useful ideas and solutions. Highly open individuals are more likely to migrate into artistic and scientific fields.

Changes in Big Five dimensions over the life span

Personality demo
Demo (find your personality type): http://mi.eng.cam.ac.uk/~farm2/personality/demo.html

Relationship between dating and personality studies
- Observed versus self-reports
- Agreeableness (in Mairesse et al.) and Friendliness (in Jurafsky et al.)
- Pickiness in dating

Finkel and Eastwick 2009, Psychological Science
- Men are less selective than women in speed dating
- Novel explanation: the act of physically approaching a partner increases attraction to that partner
- In traditional events, the men always rotate
- Ran 15 speed-dating events: in 8, the men rotated (men more selective); in 7, the women rotated (men equally selective to women)
- Conclusion?