Transcript Slide 1

Computational
Models of
Discourse Analysis
Carolyn Penstein Rosé
Language Technologies Institute/
Human-Computer Interaction Institute
Warm-Up

BTW: Excellent posts! As usual, you’re doing
MUCH better than you seem to think!!
Can you describe the takeaways from these graphs
and tables?
CD = likelihood of state X
is conditioned on state Y.
2^K(2^K - 1) transition
probabilities.
CI = likelihood of person
X’s behavior is
conditioned on state Y.
K(2^K) transition
probabilities.
MI = likelihood of person
X’s behavior just depends
on that person’s behavior
at the previous time point.
2K transition probabilities.
Models with subscript k
are person specific.
Models with subscript any
are general across people.
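To get a feel for how quickly the three models grow, the counts above can be tabulated for a few meeting sizes. This is only a sketch: it assumes K participants who are each either talking or silent (so 2^K joint states), and it reads the slide's counts as 2^K(2^K - 1), K(2^K) and 2K. The function name is made up for illustration.

```python
def transition_parameter_counts(k: int) -> dict:
    """Transition probabilities to estimate for K on/off participants.

    Assumption: each participant is either talking (1) or silent (0),
    so the joint state space has 2**k states.
    """
    n_states = 2 ** k
    return {
        # CD: full joint-state to joint-state matrix; each row sums to 1,
        # so one entry per row is determined by the others.
        "CD": n_states * (n_states - 1),
        # CI: each person's next on/off value is conditioned on the
        # whole previous joint state.
        "CI": k * n_states,
        # MI: each person's next value is conditioned only on that
        # person's own previous value (2 contexts per person).
        "MI": 2 * k,
    }


if __name__ == "__main__":
    for k in (2, 3, 4, 5):
        print(k, transition_parameter_counts(k))
```

Even at K = 5 the CD model already needs 992 probabilities while the MI model needs 10, which is one way to see why the amount of available data matters when comparing them.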
Summary of Results
In Laskowski's model, how does the
system decide who out of K participants is
speaking next? How does it assign the
speaker? From what I could tell it only
knows someone else is most likely
speaking.
Elijah’s continuation
of SIDE discussion
and discussion about
Assignment 2….
What does it take for Language
technologies students and HCI students
to be able to collaborate?
(1) Pace yourself. Keep your
expectations realistic.
(2) Find your place. You all
have something very
valuable to contribute from
your own field and
expertise.
(3) Respect what others can
contribute from their field.
(4) Have the patience to
listen to each other. Ask
questions. Answer
questions. And approach
issues iteratively.
What usually happens….
* And I would add this is mostly what is happening in the fields we are drawing from….
What I would rather see…
* My goal is just to start you on this path… the big question is where you’ll go from here.
Tips!




• Look up words you don’t know in Wikipedia
• Look for definitions embedded in the text
• Don’t get hung up on formalism – look for where the author gives a conceptual feel for the argument
• If you’re not a computational linguist, then a good goal to shoot for is just to be able to join in a discussion about whether the knowledge being brought to bear in a model is reasonable
Tips!



Use your knowledge about the rhetorical structure of
a research paper to zen out the message in a top-down way.
Keep it high level! Don’t get bogged down…
For example, start with the conclusion - that will tell
you what the evaluation is trying to show. Then look
at what is being compared: there are several
different models. Now go back to the intro to see
what the hypotheses or questions are -- they must
map on to what is being compared. Now look at the
graphs and tables -- try to match them to the
structure of the evaluation and the conclusions that
are drawn.
Interesting Student/Levinson quote
Student: Entirely non-vocal cues like gaze
and posture probably do help, when
available - I'd be curious to compare the
rate of turn-taking collisions between
telephone and in-person conversations
(especially with more than two
participants).
Levinson: the same system seems to work
equally well both in face-to-face interaction
and in the absence of visual monitoring, as
on the telephone…

What is a turn?




• Syntactic units: sentences, clauses, noun phrases
• Prosody: intonation tells us where we are in “the arc” of communicating an idea
• Projectability: we need to be able to identify places where control over the floor could shift – doesn’t mean it will shift.
• What do you think would happen if the data was not from meetings per se?
Tricky…

How were backchannels handled in the
Laskowski models?
I was just saying how it doesn't look like
anyone has jumped on the use of "talk
spurts" as opposed to something more
linguistically relevant. I would argue that
the use of a heuristic like this encourages
Laskowski to not look at the data for
linguistic elements if the data was not split
correctly in the first place.
What else comes into play?

Devices for selecting a next speaker:
• Questions directed at a person
• Address terms
• Gaze
• Gesture
How would Levinson evaluate
the models in today’s paper?
(p299 of Levinson)

A good model should
• Predict where we find overlaps?
• Predict which overlaps seem rude?
• Predict where we find pauses?

So what would he say?
• Does the perplexity measure answer any of these questions?
Nice perspective!!

For question 2, there are a number of ways
that we judge when we should talk. One
way is using physical cues. In a multi-party
conversation, the speaker turning to you is
a good indication that you should be the
next person to speak. Similarly, another
indicator could be when someone gestures
towards you.

One thing missing from Laskowski is a
notion that it’s someone’s turn.
Good question!

Beka's musings on chatiquette suggest both one-on-one and group chats as an interesting
environment for further experiments - with only
the written text and the one nonverbal hint that
another speaker is preparing to take the floor
(plus explicit naming of participants by
@username), how would some of these turn-taking models fare? Are the turn-taking patterns
of group-chat participants comparable to
speakers in a meeting? How can interruptions be
discussed when the utterances are delivered in
non-overlapping (but inter-weaving) chunks?
Another good point!

One thing I found odd about Laskowski's
approach is that it's rare to have more than
one speaker going at once (less than 5% of
the time, according to Levinson). How is it
that a representation of who's talking is
useful, especially if precisely who doesn't
matter? Won't every erstwhile non-speaker
be equally likely to take their turn next?

How could you tweak his model to do this?
Student comment


Exactly right to question this!!
For question 1, I wanted to first outline what it meant for something to
be locally managed. In the Levinson reading, he describes a theory
suggested by Sacks, Schegloff, and Jefferson, that turns are
constructed of syntactic units which a speaker can employ until the
end of a unit/transitional relevance place where speakers may
change. Levinson later writes that a locally managed system is
indifferent to the pool of potential next speakers, and is, instead,
concerned with the transitions/relationships between speakers.
Consequently, speaking is somewhat dependent on what other
participants say, if only to fit in with where the last speaker left off and
to resolve overlaps. This fits in with the idea of adjacency pairs that
he discusses in the next section.
After reading the Laskowski paper, I'm not sure how what I've
read fits in with the idea of conversations being locally managed,
though I am leaning towards Laskowski rejecting models that suggest
that dialogue can be analyzed with turns/speakers being analyzed
separately, and therefore supporting the ideas put forth by Levinson.
Interesting Student Idea!

I guess what I'm trying to say is that one
feature of the interpersonal dependence
should be their relative rank (I understand
that this is hard because a younger sister
may be the boss of an older brother in the
workplace, but they are ranked higher in
some cultures at home).

Note: studies show men interrupt women
more than the reverse…
Food for thought…

Of course, then the problem becomes, how
do we define what these roles are? When
do we assign someone to one role or
another? When do we allow people to shift
from one role to another? What happens
when multiple people are acting in the
same role at the same time? All of these
things make that modeling enormously
complex and probably reliant on far more
data than is actually available, especially
annotated. …
In your experience with Machine
Learning, what has a bigger
effect on performance:
the representation of the data or
the algorithm you use?
Try to think of specific
examples…
Nice Idea!

I understand the computational challenges
of storing and computing on exponential
factors but I think they could have picked
the most likely, say 10 future instances of
who was talking for the vocal interaction
record and add the rest on, or hash the
different possibilities.

How might you rework the model? What
would be the states?
What do you conclude from this:
• R-specific: specific to one set of speakers
• K-specific: specific to one ordering of users
• See other interesting comments in the conclusion – where you’ll find a lot of what is most relevant for this class.
Questions?
Tips for next time





• We will look at a paper about turn taking
• When perplexity is high, the model is having a harder time predicting what is next
• For turn-taking perplexity, we have a state representation that specifies at one time point which participants are talking and which are not
• The model takes the current state into account and measures how surprised it is at the next state
• If the next state is surprising given the current state, the perplexity at that time point is high
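To make the idea of turn-taking perplexity concrete, here is a small sketch. It is not Laskowski's implementation: it assumes each time point is a tuple of 0/1 flags (one per participant), estimates first-order transition probabilities with add-alpha smoothing, and reports 2 raised to the average surprise in bits. The function names and the smoothing choice are assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict
from itertools import product


def fit_transitions(states, alpha=1.0):
    """Estimate P(next state | current state) with add-alpha smoothing."""
    k = len(states[0])
    all_states = list(product((0, 1), repeat=k))  # all 2**k on/off patterns
    counts = defaultdict(Counter)
    for cur, nxt in zip(states, states[1:]):
        counts[cur][nxt] += 1
    probs = {}
    for cur in all_states:
        total = sum(counts[cur].values()) + alpha * len(all_states)
        probs[cur] = {
            nxt: (counts[cur][nxt] + alpha) / total for nxt in all_states
        }
    return probs


def perplexity(states, probs):
    """2 ** (average surprise in bits) over all observed transitions."""
    pairs = list(zip(states, states[1:]))
    bits = -sum(math.log2(probs[cur][nxt]) for cur, nxt in pairs)
    return 2 ** (bits / len(pairs))


if __name__ == "__main__":
    # Two participants politely alternating: A talks, then B talks, ...
    regular = [(1, 0), (0, 1)] * 20
    # A made-up sequence with overlaps and silences in no fixed order.
    erratic = [(1, 0), (1, 1), (0, 0), (0, 1), (0, 0), (1, 1), (1, 0)]
    model = fit_transitions(regular)
    print("regular:", perplexity(regular, model))
    print("erratic:", perplexity(erratic, model))
```

A model trained on the alternating sequence is rarely surprised by more alternation, so its perplexity there is low; the erratic sequence contains transitions it has never seen, so the perplexity is higher.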
Tips for next time
If you compare models based on turn
taking perplexity, the one with lower
perplexity probably has more of the
information needed to account for
transitions between states
Differences between models:
• Whose behavior is contingent on whose behavior
• Which data is used to build the model, and which data is used to test
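The comparison described above can be sketched on toy data. The example below is an illustration under stated assumptions, not the paper's experiment: it builds a joint model (next joint state depends on the current joint state) and an independent model (each person depends only on their own previous value), trains both on one stretch of a made-up two-person conversation, and tests on another. All names and the synthetic data are hypothetical.

```python
import math
from collections import Counter, defaultdict
from itertools import product


def fit_joint(states, alpha=1.0):
    """Joint model: P(next joint state | current joint state)."""
    k = len(states[0])
    all_states = list(product((0, 1), repeat=k))
    counts = defaultdict(Counter)
    for cur, nxt in zip(states, states[1:]):
        counts[cur][nxt] += 1
    table = {}
    for cur in all_states:
        total = sum(counts[cur].values()) + alpha * len(all_states)
        table[cur] = {
            nxt: (counts[cur][nxt] + alpha) / total for nxt in all_states
        }
    return lambda cur, nxt: table[cur][nxt]


def fit_independent(states, alpha=1.0):
    """Independent model: each person depends only on their own last value."""
    k = len(states[0])
    counts = [defaultdict(Counter) for _ in range(k)]
    for cur, nxt in zip(states, states[1:]):
        for p in range(k):
            counts[p][cur[p]][nxt[p]] += 1

    def prob(cur, nxt):
        result = 1.0
        for p in range(k):
            c = counts[p][cur[p]]
            # Two possible next values (talking / silent) per person.
            result *= (c[nxt[p]] + alpha) / (sum(c.values()) + 2 * alpha)
        return result

    return prob


def perplexity(states, prob):
    """2 ** (average surprise in bits) over all observed transitions."""
    pairs = list(zip(states, states[1:]))
    bits = -sum(math.log2(prob(cur, nxt)) for cur, nxt in pairs)
    return 2 ** (bits / len(pairs))


if __name__ == "__main__":
    # Synthetic data: A talks for 3 frames, then B talks for 3 frames.
    pattern = [(1, 0)] * 3 + [(0, 1)] * 3
    train, test = pattern * 20, pattern * 5
    print("joint:      ", perplexity(test, fit_joint(train)))
    print("independent:", perplexity(test, fit_independent(train)))
```

On this toy conversation the joint model should come out with the lower test perplexity, because it can represent that B starts exactly when A stops, whereas the independent model treats the two events as unrelated.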