
Computational
Models of
Discourse Analysis
Carolyn Penstein Rosé
Language Technologies Institute/
Human-Computer Interaction Institute
Warm-Up Discussion

What is the distinction between personality, identity, and perspective?

Does the distinction matter computationally?

How do they relate to one another as lenses for understanding social media data?

What do we take from today’s readings for assignment 4?

Diagram labels: Identity, Personality, Perspective
Student Comment

At first the paper did not seem related to
our task of identifying gender but perhaps
this paper shows that the way we see
ourselves is extremely consistent. No
matter how you ask the question a subject
will always give you an honest answer as to
how they see themselves. This could mean
that no matter how hard we try we will
sooner or later embed signals into our blog
posts that indicate our perceived gender.
Student Comment

It seems that the importance of "spiritual self" in
presentation is the most important takeaway from
this paper. 96% of users attempt to describe
themselves with aspects of their "spiritual self"
(i.e., perceived abilities). So focusing on these
instead of the material or the social might be
better (although, it's possible that a particular
gender uses one of these sub-types significantly
more than another, which could also be handy,
but we don't have that information).

Is this personality or identity? How would you
expect it to relate to other online behavior?
Semester in Review

Unit 1: Theoretical
Foundation

Unit 2: Linguistic Structure

Unit 3: Sentiment

Unit 4: Identity and
Personality

Unit 5: Social Positioning

In each Unit:

Readings from
Discourse Analysis and
Sociolinguistics

Readings from
Language Technologies

Hands-on assignment
• Implementation and corpus-based experiment
• Competitive error analysis

Student Presentations
Building Tasks

According to Gee’s theory, whenever we speak
or write, we are constructing 7 areas of reality

What we build: Significance, Practices,
Identities, Relationships, Politics, Connections,
Sign systems and knowledge
How we build them: Social languages, Socially
situated identities, Discourses, Conversations,
Figured worlds, intertextuality

What we Build

Significance: things and people made more or less significant through
the text
Practices: ritualized activities and how they are being enacted
through the text (for example, lecturing or mentoring)
Identities: manner in which things and people are being cast in a role
through the text
Relationships: style of social relationship, like level of formality
Politics: how “social goods” are being distributed, who is responsible
for the flow, where is it going
Connections: connections and disconnections between things and
people, e.g., what ideas are related, how are things causally
connected, what is affecting what?
Sign Systems and Knowledge: languages, social languages, and
ways of knowing, what ways of communicating and knowing are
treated as standard and acceptable in the context, e.g., that you’re
expected to speak in English in class
Form-Function Correspondence: Range of meanings for the word “sustainability”
Discourse: Environmentalism
Socially Situated Identity: Environmentalist
Social Language: Liberal rhetoric
Figured World: Expected structure of Conservationist Commercial
Situated Meaning: Meaning of “sustainability” in the commercial
Imagine an environmentalist commercial
Conversation: Global Warming
Discourse: Status Quo
Computationalizing Gee?

Challenge: not variationist
Form-function correspondences can be
modeled naturally through rules
• Cells of table like feature extractors?
• Social Languages like topic models?
• Figured worlds related to “social causality”

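One way to read the "cells of table like feature extractors" idea is sketched below: each rule pairs a surface form with a function label, and a text is described by how often each rule fires. This is only an illustration under stated assumptions. The rule names, the regular expressions, and the example sentence are hypothetical and are not taken from Gee or from the course materials.

```python
import re
from typing import Dict

# Hypothetical form-to-function rules. Labels and patterns are illustrative
# assumptions, not categories defined by Gee.
RULES: Dict[str, re.Pattern] = {
    "inclusive_we": re.compile(r"\b(?:we|our|us)\b", re.IGNORECASE),
    "obligation": re.compile(r"\b(?:must|should|have to)\b", re.IGNORECASE),
    "intensifier": re.compile(r"\b(?:very|really|extremely)\b", re.IGNORECASE),
}


def extract_features(text: str) -> Dict[str, int]:
    """Return, for each rule, the number of times its form occurs in the text."""
    return {name: len(pattern.findall(text)) for name, pattern in RULES.items()}


if __name__ == "__main__":
    print(extract_features("We should really protect our forests."))
```

Rules like these are context-free, which is exactly the "not variationist" challenge noted above: the same form can serve different functions in different settings, and a plain pattern match cannot tell them apart.
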
Metafunctions
What is a system?
Computationalizing SFL?
See Elijah’s ACL paper!
• We had to REALLY simplify to get there
• Not clear how to do that for Heteroglossia yet

Computational Techniques

Text entailment/ similarity measures/ paraphrase/
constraint relaxation
Topic models
Machine Learning
Techniques: bootstrapping, HMMs, other
statistical modeling techniques
Basic features: unigrams, bigrams, POS bigrams,
acoustic and prosodic features (speech)
Created features: dictionaries, templates,
syntactic dependency relations
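
The "basic features" listed above can be sketched in a few lines. The following is a minimal illustration of unigram and bigram counting, assuming a deliberately naive whitespace tokenizer. The function names and feature-key prefixes are hypothetical, and POS bigrams are left out because they would require a part-of-speech tagger.

```python
from collections import Counter
from typing import List


def tokenize(text: str) -> List[str]:
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def ngram_counts(tokens: List[str], n: int) -> Counter:
    """Count contiguous n-grams; keys are prefixed with the n-gram order."""
    return Counter(
        f"{n}gram:" + " ".join(tokens[i:i + n])
        for i in range(len(tokens) - n + 1)
    )


def basic_features(text: str) -> Counter:
    """Combine unigram and bigram counts into one sparse feature vector."""
    tokens = tokenize(text)
    return ngram_counts(tokens, 1) + ngram_counts(tokens, 2)


if __name__ == "__main__":
    print(basic_features("Go team go"))
```

The order prefix keeps unigram and bigram keys from colliding when both are stored in the same counter.
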
Basic Aspects of Discourse
Structure are Easiest to Model

Turn taking
Topic segments
Speech acts (at least direct ones)
More recent computational work focuses on more
challenging “discoursey” problems like sentiment
and stance
Some recent work on metaphors (related to
frames), but not applied to discourse level
problems
Problems

Labels in public datasets don’t necessarily match the theory

Computational approaches embody variationist assumptions, but much of the theory is grounded in a more contextualized view of meaning making

Lack of a fully satisfying operationalization of style (style is hard to separate from content)

Grammatical metaphor and other indirect strategies
• Same effect can be achieved in so many ways: each technique only captures one slice, so you’re always just catching a glimpse of what’s there

Overfitting spurious correlations
• “Subpopulations” leading to problems with generalization
• Similar variation arising due to numerous different factors (gender, age, SES)
• Features at too low a level: words serving multiple purposes simultaneously

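One common way to probe the subpopulation problem is to hold out an entire subpopulation at test time, so that a model cannot score well by memorizing cues specific to that group. The sketch below only produces the index splits. The group labels are hypothetical, and training and scoring a classifier on each split is left to whatever toolkit is in use.

```python
from typing import Iterator, List, Tuple


def leave_one_group_out(
    groups: List[str],
) -> Iterator[Tuple[str, List[int], List[int]]]:
    """Yield (held_out_group, train_indices, test_indices) for each group.

    groups[i] is the subpopulation label of example i (hypothetical labels,
    for instance an age band or a blog community).
    """
    for held_out in sorted(set(groups)):
        train = [i for i, g in enumerate(groups) if g != held_out]
        test = [i for i, g in enumerate(groups) if g == held_out]
        yield held_out, train, test


if __name__ == "__main__":
    for name, train, test in leave_one_group_out(["teen", "adult", "teen"]):
        print(name, train, test)
```

A large gap between ordinary cross-validation accuracy and accuracy on held-out groups is one sign that the features are tracking the subpopulation rather than the target variable.
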
Engagement and Personality?
How would you expect an Engagement style
analysis to relate to personality?
• What effect would you expect to see on conversations?
• Are these necessarily connected?
Freshman Engineering Study

131 freshman engineering students worked in groups of 3 or 4 to design a better wrench
• Applying principles related to stress and leverage

Procedure
• Tutorial on computer-aided engineering
• Pretest
• Collaborative design activity
• Posttest
• Questionnaire
Heteroglossia Manipulation
Social Manipulation
Tutor Agent Design
[Figure: tutor agent architecture. Components shown: ConcertChat Server, ConcertChatListener, ConcertChatActor, MessageFilter, PresenceFilter, AnnotationFilter, ActivityDetector, ProgressDetector, RequestDetector, IntroductionsManager, IntroductionsActor, PromptingManager, PromptingActor, TutoringManager, TutoringActor, SocialController, PlanExecutor, T.TakingCoordinator, OutputCoordinator, DiscourseMemory]

Kumar, R. & Rosé, C. P. (2011). Architecture for building Conversational Agents that support Collaborative Learning, IEEE Transactions on Learning Technologies special issue on Intelligent and Innovative Support Systems for Computer Supported Collaborative Learning

Tutor
One last thing on this topic, Does more (or less) stress in a wrench make it easier to use?
S95
no change?
Tutor
You are correct about that.. Stress doesn't determine ease of use.
Tutor
It's the moment achieved by the wrench that determines the ease of use.
S89
yay!
Tutor
It's good to have your contributions in the discussion Jackie :-)
Tutor
Go team :-)
S89
Go team yay
Tutor
I am happy to be working with our team
S89
Me too
S95
whoa the bot knows my name
Tutor
Unfortunately maximum stress (12800) in our design1 is way above the maximum allowed stress (i.e. 8750)
Tutor
This wrench cannot be safely used!
Results on Breadth of Coverage of
Design Space

Significant main effect of Heteroglossia on number of ideas mentioned
• Heteroglossia was better than Monoglossia and Neutral

Significant interaction
• In the Social condition, Monoglossia was worse than the other two
Results on Perception

Students were significantly happier with the interaction in
the Heteroglossia condition than Neutral, with
Monoglossia in the middle
Students liked the Heteroglossic and Monoglossic agents
better than the Neutral agent
Students in the Heteroglossia condition felt marginally
more successful than students in the Monoglossia
condition
No effect on Personality indicators such as Pushy, Wishy
Washy, etc.
Does that mean that impression of personality and how
you feel about an interaction with someone are not
linked?
Student Comment

I would also note that English is a very
gender neutral language, so gender
performativity is harder to classify.
Engagement

Already established: Positioning a
proposition
• But can it also be primarily positioning between people?
• Patterns of positioning propositions as having the same or different alignment between speaker and hearer could do this

Is positioning in communication always
positioning by means of propositional
content?
Connection between
Heteroglossia and Attitude
But is this really different from a disclaim?
And is this really different from a proclaim?
Hedging and Occupation?

And as such, I believe hedging is a much
more effective tool in showing generational
or occupational differences rather than
gender differences.
• For example, teenagers often use verbs such
as 'like' and 'all' to report speech: he was all
'that's stupid' and then he was like 'but I'm
stupid too'. The occupational differences I
would attribute to the differences between
people who need exact values as opposed to
people who can accept generalizations or
approximations.
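
The two kinds of markers in this comment could be counted with simple patterns, as in the sketch below. It is a rough illustration only: the word lists are hypothetical, the quotative pattern will also match non-quotative uses such as "he was all alone", and nothing here comes from the readings or the assignment.

```python
import re
from typing import Dict

# Hypothetical marker lists, chosen only to illustrate the comment above.
QUOTATIVE = re.compile(r"\b(?:was|were|is|am|are)\s+(?:like|all)\b", re.IGNORECASE)
APPROXIMATOR = re.compile(
    r"\b(?:about|around|roughly|approximately)\b", re.IGNORECASE
)


def marker_counts(text: str) -> Dict[str, int]:
    """Count quotative and approximator markers plus whitespace tokens."""
    return {
        "quotative": len(QUOTATIVE.findall(text)),
        "approximator": len(APPROXIMATOR.findall(text)),
        "tokens": len(text.split()),
    }


if __name__ == "__main__":
    print(marker_counts("he was all 'that's stupid' and then he was like 'ok'"))
```

Dividing each count by the token count gives a rate that can be compared across authors who write posts of different lengths.
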
Questions?