Transcript Slide 1

Computational Models of Discourse Analysis
Carolyn Penstein Rosé
Language Technologies Institute / Human-Computer Interaction Institute
Warm-Up
• Look at the data and find places where you disagree with the tags.
• Do the AMI tags “feel” like illocutionary acts to you? Why or why not?
• How would you change the definitions of the codes to seem more like illocutionary acts?

Announcement

• Please fill in the early course evaluation:
  http://www.surveymonkey.com/s/T3NCXWG
Chicken and Egg…
Operationalization
Computationalization
Reminder…
AMI Annotation Scheme
Student Comment

That being said, while looking through the
conversations for this week I was struck by
the fact that this scheme doesn't really
cover anything related to roles or
identities...or really context. The whole idea
of 'figured worlds' and various discourses
going on doesn't really exist. Granted, a
similar thing could be said for speech acts
and I'm having difficulty finding the words to
distinguish these "dialogue acts" from speech
acts. They seem rather similar to me.
Student Comment

I could not find a lot of evidence to suggest
that a dialog act is synonymous with a
speech act, but I can imagine that this
system makes it a lot easier to find speech
acts because the scope of data is now
reduced to a sentence or long
paragraph. For our purposes the resulting
data would be useful to apply a further
analysis to split the dialog acts into
speech acts.
What is the goal of the DA
coding scheme?
• Provide information on the structure of the conversation
• Capture speaker attitudes and intentions
• Capture speaker roles
• Capture level of involvement

* If you just saw this description without having seen the coding manual or data, what would you expect about the relationship between their coding and what we have discussed from Gee, Martin & Rose, and Levinson?
Are DA tags really like illocutionary acts?
* What’s the real distinction here?
Feature Extraction from Speech
* Not used for DA recognition!
Factored Language Models

• Language models predict the probability of sequences of words: P(w1, w2, w3, …, wn)
• A factorized language model computes this joint probability by multiplying together the conditional probability of each word given its history: P(w1, …, wn) = P(w1) × P(w2 | w1) × … × P(wn | w1, w2, …, wn-1)
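For concreteness, here is a minimal sketch (not the paper's model) of that chain-rule idea, with the history truncated to the previous word; the toy corpus and the add-one smoothing are assumptions for illustration only.

from collections import defaultdict
import math

# A minimal sketch: estimate P(w_n | w_{n-1}) from a toy corpus and score a
# sentence by summing log conditional probabilities (chain rule, bigram history).
corpus = [["okay", "let's", "start"], ["okay", "so", "let's", "start"]]  # toy data

bigram_counts = defaultdict(lambda: defaultdict(int))
history_counts = defaultdict(int)
vocab = {w for sentence in corpus for w in sentence} | {"<s>"}

for sentence in corpus:
    tokens = ["<s>"] + sentence
    for prev, curr in zip(tokens, tokens[1:]):
        bigram_counts[prev][curr] += 1
        history_counts[prev] += 1

def log_prob(sentence):
    """Sum of log P(w_i | w_{i-1}) with add-one smoothing (an illustrative choice)."""
    tokens = ["<s>"] + sentence
    total = 0.0
    for prev, curr in zip(tokens, tokens[1:]):
        count = bigram_counts[prev][curr]
        total += math.log((count + 1) / (history_counts[prev] + len(vocab)))
    return total

print(log_prob(["okay", "let's", "start"]))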
Interpolated Factored Language Models

• Rather than words as tokens, we can use feature bundles
  • previous word, DA, position in sequence (i.e., which block of 5 words)
  • Replace word-level conditional probabilities with P(wn | wn-1, position, DA)
• Interpolation allows us to simplify the model by dropping some of the complexity, thus making the models less sparse
  • i.e., each probability is computed based on more data, and is thus less idiosyncratic
  • Drop one feature in the bundle at a time, in the order listed
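As an illustration of the feature-bundle and interpolation ideas above (a sketch under assumed details, not the paper's exact estimator), the following conditions each word on the bundle (previous word, DA, position block) and linearly interpolates across successively reduced contexts, dropping one feature at a time; the interpolation weights and toy observations are assumptions.

from collections import defaultdict

# Sketch of an interpolated factored model. The context tuple is ordered so
# that dropping from the end removes the previous word first, then the DA,
# then the position block.
counts = defaultdict(lambda: defaultdict(int))   # context tuple -> word -> count
totals = defaultdict(int)                        # context tuple -> total count

def observe(word, prev_word, position, da):
    """Record a word under the full context and every reduced context."""
    context = (position, da, prev_word)
    for k in range(len(context), -1, -1):
        counts[context[:k]][word] += 1
        totals[context[:k]] += 1

def prob(word, prev_word, position, da, weights=(0.5, 0.25, 0.15, 0.1)):
    """Interpolated P(word | prev word, position, DA); the weights are illustrative."""
    context = (position, da, prev_word)
    p = 0.0
    for weight, k in zip(weights, range(len(context), -1, -1)):
        if totals[context[:k]] > 0:
            p += weight * counts[context[:k]][word] / totals[context[:k]]
    return p

# Toy usage: "position" is which block of 5 words the token falls in.
observe("right", "yeah", 0, "Assess")
observe("okay", "yeah", 0, "Assess")
print(prob("right", "yeah", 0, "Assess"))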
Student Question

I can't quite tell from the paper, but it seems that they relied mostly on relatively plain word-level data - it isn't clear where or if the prosodic features were incorporated.

Not used for the DA recognition task at all (see p. 9)
Student Comment

As noted by the coding manual, the labels
assigned to a given utterance are very
dependent on the content of the utterances
before them, and not just on the preceding
label.

Was this information ever used?...
What does word position give us?
Is it really just standing in for length?
Model Structure
• Note that although this model jointly predicts segmentation and DAs, once the segmentation is done, the DA assignment is done again by a CRF classifier (exact results for DA classification not given)
• Note that the parts of the model that predict sequences of words, segmentation, and how all of this is influenced by prosodic information can be enhanced with data from other corpora; but if those corpora aren’t annotated with the same DAs, then that data won’t help for predicting either sequences of DAs or the association between words and DAs.
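To make the second-pass classification concrete, here is a minimal sketch (using sklearn-crfsuite, which is not necessarily the toolkit used in the paper) that labels already-segmented utterances with DA tags via a linear-chain CRF; the feature functions and toy dialogues are illustrative assumptions.

import sklearn_crfsuite  # third-party linear-chain CRF wrapper (an assumed choice)

def segment_features(tokens):
    # Illustrative features over one segmented utterance.
    return {
        "first_word": tokens[0].lower(),
        "last_word": tokens[-1].lower(),
        "length": len(tokens),
        "is_question": tokens[-1].endswith("?"),
    }

# Each training example is one dialogue: a sequence of segments plus DA labels (toy data).
dialogues = [
    ([["so", "we", "need", "a", "remote"], ["yeah", "that's", "right"]],
     ["Inform", "Assess"]),
    ([["what", "about", "the", "colour", "?"], ["I", "like", "yellow"]],
     ["Elicit-Inform", "Inform"]),
]

X = [[segment_features(seg) for seg in segs] for segs, _ in dialogues]
y = [labels for _, labels in dialogues]

crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=50)
crf.fit(X, y)
print(crf.predict(X)[0])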
Results
* Uni+POSbi+ SVM gives 49.1% error
* What do we conclude from these results?
What about ordering information?
(computed on one dialogue only)
• Inform is the most common class (37.4%)
  • Next most frequent is Assess (18.5%)
• With bigrams, if we look for conditional probabilities above 25%:
  • The only case where the most likely next class is not Inform is Elicit-Assessment, which is followed by Assessment 36% of the time
    • It is followed by Inform 33% of the time
    • It only occurs about 1% of the time
• Trigrams might be better, but this makes ordering information look pretty useless
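A short sketch of the tabulation behind numbers like these (the slide's counts come from one AMI dialogue; the label sequence below is a toy stand-in): count DA bigrams and report the conditional probabilities above the 25% threshold.

from collections import Counter, defaultdict

# Toy DA label sequence; the real counts on this slide came from one dialogue.
da_sequence = ["Inform", "Inform", "Assess", "Inform", "Elicit-Assessment",
               "Assess", "Inform", "Assess", "Inform"]

bigrams = defaultdict(Counter)
for curr, nxt in zip(da_sequence, da_sequence[1:]):
    bigrams[curr][nxt] += 1

for curr, followers in sorted(bigrams.items()):
    total = sum(followers.values())
    for nxt, count in followers.most_common():
        p = count / total
        if p > 0.25:  # the same threshold used on the slide
            print(f"P({nxt} | {curr}) = {p:.2f}")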
Student Comment

Then they describe a whole bunch of things I don't understand and, after reading the summary and conclusion sections, I'm not even sure if these results are good compared to other work in this field, though they say their method improved recognition accuracy.
Student Comment

One of the things I found most confusing was the evaluation section. I maybe skimmed through too quickly, but I couldn't figure out:
a) what kind of agreement do humans get in coding
b) how well is the system performing compared to something which breaks up into acts randomly and assigns a random category using one of the frequent categories?
c) how well is the system doing compared to humans
d) whether it was possible to ascertain which features were the most discriminating, and how various features were balanced (e.g., does it make sense that probability according to the DA sequence model should be given the same weight as probability according to energy?)
Assignment 2 (not due until Feb 23)
• Look at the Maptask dataset and Negotiation coding that is provided
• Think about what distinguishes the codes at a linguistic level
• Do an error analysis on the dataset using a simple unigram baseline, and from that propose one or a few new types of features motivated by your linguistic understanding of the Negotiation framework (a minimal baseline sketch follows below)
Due on Week 7, lecture 2
• Turn in your data, your feature extractors (documented code), and a formal write-up of your experimentation
• Have a 5-minute PowerPoint presentation ready for class on Week 7, lecture 2
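As one possible starting point for the unigram baseline mentioned above (a sketch only; the classifier choice, cross-validation setup, and toy Negotiation-style labels are assumptions, not part of the assignment spec):

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict
from sklearn.pipeline import make_pipeline

# Toy utterances with made-up Negotiation-style codes; substitute the Maptask data.
utterances = [
    "go left around the lake", "okay", "do you see the bridge",
    "no I don't", "right okay", "head up towards the mill",
    "is there a fence there", "yes there is",
]
labels = ["instruct", "acknowledge", "query", "reply",
          "acknowledge", "instruct", "query", "reply"]

# Unigram counts feeding a simple linear classifier.
pipeline = make_pipeline(CountVectorizer(), LogisticRegression(max_iter=1000))
predictions = cross_val_predict(pipeline, utterances, labels, cv=2)

# Error analysis: look at what the unigram baseline gets wrong.
for text, gold, pred in zip(utterances, labels, predictions):
    if gold != pred:
        print(f"MISSED: {text!r}  gold={gold}  predicted={pred}")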
Questions?