A Question of Questions: Prosodic Cues to Question Form and Function
Julia Hirschberg
(Joint work with Jennifer Venditti and Jackson Liscombe)
Questioning in Dialogue
• A fundamental activity in conversation
• Elicit information
• Elicit action
• But
• How to define a question?
• Bolinger ’57: “fundamentally an attitude…an utterance
that ‘craves’ a verbal or other semiotic … response”
• Ginzburg & Sag ‘00: “the semantic object associated with
the attitude of wondering and the speech act of
questioning”
• How to identify a question as such
• How to represent its semantics? The intention of the
questioner?
Distinguishing Question Form and
Function
• Questions may take many syntactic forms
• Is it a question? What is a question? It’s a question, isn’t it?
Is it a question or an answer? Right? It’s a question?
• Questions may serve many pragmatic functions
• Clarification-seeking? Information-seeking? Confirmation-seeking?
• Possible Indicators
• Syntactic cues
• Context
• Intonation
Questions in Spoken Dialogue Systems
• Goals
• Examine question form and function
• How are they related?
• What features characterize them?
• Identify form and function automatically in an
Intelligent Tutoring domain
Previous Studies
• Integrating a prosodic tree model with a word-based
language model yields the best accuracy in
detecting questions/question form
(Shriberg et al.’98: English)
• Some corpus-based (MapTask) studies have
examined tune/accent types wrt. question function
(Kowtko’96: Glaswegian English; Grice et al.’95:
German, Italian, Bulgarian)
• Studies of different types (functions) of clarification
questions (Rodríguez & Schlangen’04: German;
Edlund et al.’05: Swedish)
• Our goal: a comprehensive quantitative analysis of
question form and function in English which will
permit question form/function identification
Domain: Intelligent Tutoring Systems
• ITSs must be able to recognize both the form
and function of student questions
• Students ask human tutors many questions
• More questions → better learning
• Different question FORMs seek different
information
• e.g. polar questions seek yes-no answer
• wh-questions seek different information
• Different question FUNCTIONs also often
require different types of answers
• Wh-questions, e.g.
• Information-seeking:
(S has just submitted an essay to the tutor)
S: Ok, what do you think about that?
T: Uh, well that uh you have uh there are too many
parameters here which uh need definition ...
• Clarification-seeking:
T: So if there is if the only force on an object in
earth’s gravity then what is its motion called?
S: What was the motion called?
T: Yes, what’s the name for this motion?
• Yes-no questions, e.g.
• Information-seeking → tutor provides additional
information
• Clarification → clarification subdialogue
• Successful ITSs must be able to recognize
the presence of a question in a student turn
and its form and function
Question Corpus
• Human-human tutoring dialogs collected by Litman et
al.’04 for development of ITSpoke, a speech-enabled
ITS designed to teach physics
• Why2-Atlas (Kurt VanLehn (U. Pitt), Art Graesser (U.
Memphis))
• Corpus includes 1030 student questions
• ‘Question’ defined a la Bolinger ‘57 as “an utterance
that craves a response”
• 25.2 Qs/hour
• 13.3% of total student speaking time
• This study: a subset of 643 tokens
[pr01_sess00_prob58]
Question Detection
what symbol are you talking about
do i have to rewrite this again
am i ok with that
so it’d be one meter per second squared
Coding question type
• Form coding based on surface syntax
• Declarative question (dQ): It’s a vector? A vector?
• Yes-no question (ynQ): Is it a vector?
• Wh-question (whQ): What is a vector?
• Tag question (ynTAG): It’s a vector, isn’t it?
• Alternative question (altQ): Is it a vector or a scalar?
• Particle (part): Huh?
• Function coding derived from Stenström ‘84
• Confirmation-seeking check question (chk)
• Clarification-seeking question (clar)
• Information-seeking question (info)
• Other (oth)
Form/Function Distribution
        dQ          ynQ         whQ        ynTAG     altQ      part     N    (%)
chk     257         53          -          41        6         -        357  (55.5)
clar    81          80          47         5         5         8        226  (35.1)
info    2           27          21         -         1         -        51   (7.9)
oth     4           5           -          -         -         -        9    (1.4)
N (%)   344 (53.5)  165 (25.7)  68 (10.6)  46 (7.2)  12 (1.9)  8 (1.2)  643  (100)
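The marginal totals and percentages in this table can be re-derived from the cell counts. The following is a minimal Python sketch, not part of the original slides: the dictionary layout and variable names are illustrative assumptions, and the empty cells are treated as zero, with their positions inferred from the row and column totals.

```python
# Cell counts from the Form/Function Distribution slide.
# Assumption: empty cells are zero; their positions are inferred from the totals.
FORMS = ["dQ", "ynQ", "whQ", "ynTAG", "altQ", "part"]
COUNTS = {
    "chk":  [257, 53, 0, 41, 6, 0],
    "clar": [81, 80, 47, 5, 5, 8],
    "info": [2, 27, 21, 0, 1, 0],
    "oth":  [4, 5, 0, 0, 0, 0],
}

total = sum(sum(row) for row in COUNTS.values())

# Row marginals: share of each question function among all tokens.
function_totals = {func: sum(row) for func, row in COUNTS.items()}
function_pct = {func: round(100 * n / total, 1) for func, n in function_totals.items()}

# Column marginals: share of each question form among all tokens.
form_totals = {form: sum(row[i] for row in COUNTS.values())
               for i, form in enumerate(FORMS)}
form_pct = {form: round(100 * n / total, 1) for form, n in form_totals.items()}

print(total)
print(function_totals, function_pct)
print(form_totals, form_pct)
```

With these counts the recomputed marginals agree with the figures reported on the slide, which supports the reading of the empty cells as zeros.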
Falling (L-L%) F0 contours
        dQ       ynQ       whQ        ynTAG    altQ      part  N    (%)
chk     3        -         -          1        2         -     6    (1.7)
clar    4        4         12         1        5         -     26   (11.5)
info    -        5         17         -        1         -     23   (45.1)
oth     -        2         -          -        -         -     2    (22.2)
N (%)   7 (2.0)  11 (6.7)  29 (42.6)  2 (4.3)  8 (66.7)  -     57   (100)
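The percentages in the falling-contour table appear to be relative to each function's total in the distribution table (for example, 6 of 357 check questions), not to the 57 falling tokens. A short Python sketch of that reading follows; the variable names are illustrative assumptions, and the counts are copied from the two slides.

```python
# Totals per function from the Form/Function Distribution slide.
FUNCTION_TOTALS = {"chk": 357, "clar": 226, "info": 51, "oth": 9}
# Falling (L-L%) tokens per function from the falling-contours slide.
FALLING = {"chk": 6, "clar": 26, "info": 23, "oth": 2}

# Share of each function's questions that end in a falling contour.
falling_pct = {func: round(100 * FALLING[func] / FUNCTION_TOTALS[func], 1)
               for func in FALLING}

# Tokens without a falling contour, before any filtering for usable F0 data.
non_falling = sum(FUNCTION_TOTALS.values()) - sum(FALLING.values())

print(falling_pct)
print(non_falling)
```

Under this reading, information-seeking questions fall far more often than check questions, which is the contrast the table is meant to show.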
F0 measures of non-falling questions
• Quantitative analysis of F0 height in the 573
non-falling tokens w/sufficient data for
analysis
• Examined question nucleus (nucF0) and tail
(btF0) only
• Speaker-normalized (z-score) F0 of:
• 1. nuclear accent (nucF0)
• 2. rightmost edge of question (btF0)
• 3. difference between 1 & 2 (riserange)
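The three measures can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' procedure: the slides do not say which speaker statistics were used for normalization or the sign of the difference, so the sketch assumes a z-score against all of a speaker's F0 samples (population standard deviation) and defines riserange as boundary minus nucleus. Function and argument names are hypothetical.

```python
from statistics import mean, pstdev


def zscore_by_speaker(values_hz, speaker_f0_hz):
    """Normalize F0 values against one speaker's own F0 distribution."""
    mu = mean(speaker_f0_hz)
    sigma = pstdev(speaker_f0_hz)  # assumption: population standard deviation
    return [(v - mu) / sigma for v in values_hz]


def question_f0_measures(nuc_hz, bt_hz, speaker_f0_hz):
    """Return nucF0, btF0 and riserange for one question token."""
    nuc_z, bt_z = zscore_by_speaker([nuc_hz, bt_hz], speaker_f0_hz)
    # Assumption: riserange = boundary minus nucleus, so a rise is positive.
    return {"nucF0": nuc_z, "btF0": bt_z, "riserange": bt_z - nuc_z}


# Hypothetical speaker F0 samples (Hz) and one question token.
speaker_samples = [100.0, 120.0, 140.0, 160.0, 180.0]
print(question_f0_measures(nuc_hz=150.0, bt_hz=190.0, speaker_f0_hz=speaker_samples))
```

Normalizing per speaker lets tokens from speakers with different pitch ranges be pooled in one analysis.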
Question Form and F0
• DeclQs and YNQs both thought to rise (H* H-H% vs. L* H-H%?): Are there F0 height
differences between them?
• 2-way ANOVA on form x function:
FORM: nucF0: F(5)=19.34, p=0
btF0: F(5)=10.71, p=0
riserange: F(5)=3.6, p<.01
• Planned comparisons (Tukey, alpha=.01) show no
difference between declarative Qs and yes-no Qs
• Main effect of form caused by yes-no tags (low
F0) and particles (high F0)
Normalized means at nucF0 and btF0
[Figure: mean speaker-normalized F0 at the nuclear accent and at the boundary, with data series labeled by question function (chk, clar, info) and by question form (dQ, ynQ, whQ, ynTAG, part); y-axis: normalized F0, from -1 to 2.5]
Question Function and F0
• Question dialog acts thought to correlate with
F0: Does question FUNCTION affect F0?
• 2-way ANOVA on form x function:
FUNCTION:
nucF0: F(3)=16.6, p=0
btF0: F(3)=8.56, p<.001
riserange: F(3)=3.94, p<.01
• Main effect; planned comparisons show:
• clarQ > chkQ
(nucF0 & btF0)
• infoQ > clarQ/chkQ
(nucF0)
• No interactions for any measure
Clarification types and F0
Clark ‘96 levels of coordination: sources of communication problems
1 Channel: Problem hearing if the tutor actually said something or not
(Huh?, Hm?)
2 Perception: Problem hearing what the tutor said (‘G’ as in God?, Did you
say a word or a letter?), including reprise/echo questions (A what?)
3 Understanding: Problem with reference resolution (This up here?, What
did I imply or what does the statement imply?), or with general
understanding (Is that the same thing or is that different?, What do you
mean?)
4 Intention: Problem determining what the tutor intended by his utterance
(You want an exact number?, Uh are you asking me another characteristic
of freefall?)
+ Non-interlocutor-related (NIR): Problem understanding the task (Am I
supposed to speak this or type it?), or clarification of the examination
question (Should I assume both vehicles are going at the same speed?)
Effects of Clarification Type
• One-way ANOVA combining levels 1&2 into
single acoustic/perceptual category:
nucF0: F(3)=5.41, p=.001
btF0: F(3)=6.6, p<.001
riserange: F(3)=2.59, p=.05
• Main effect for clarification type
• Ranking for each measure (higher F0 to lower F0):
acoust/percept > understanding > NIR > intention
• Planned comparisons (Tukey, alpha=.01)
show that the only significant comparison was
acoust/percept > intention
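The one-way ANOVA behind these F values compares between-group and within-group variance across the four clarification categories, which is why the statistic has 3 degrees of freedom for the group factor. Below is a minimal pure-Python sketch of the F statistic. The data are invented z-scores for illustration only, the function name is hypothetical, and the Tukey planned comparisons are not reproduced.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of samples, one list per group."""
    all_values = [x for group in groups for x in group]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(group) / len(group) for group in groups]

    # Variation of group means around the grand mean, weighted by group size.
    ss_between = sum(len(group) * (m - grand_mean) ** 2
                     for group, m in zip(groups, group_means))
    # Variation of individual values around their own group mean.
    ss_within = sum((x - m) ** 2
                    for group, m in zip(groups, group_means) for x in group)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Invented normalized F0 values for the four categories (illustration only).
hypothetical = {
    "acoust/percept": [1.9, 1.4, 2.2, 1.7],
    "understanding": [1.2, 0.9, 1.5, 1.1],
    "NIR": [0.8, 1.1, 0.6, 1.0],
    "intention": [0.4, 0.7, 0.2, 0.6],
}
f_stat, df_b, df_w = one_way_anova_f(list(hypothetical.values()))
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```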
Can Prosody Distinguish Question Form?
Question Function?
• Only a few question forms prosodically
distinct in our study – lexico/syntactic
information can help
• Question function more successfully
differentiated prosodically – where there is
less reliable lexico/syntactic information
• Can we use prosodic information with lexico-syntactic
information to help identify question
form and function automatically?
Detecting Student Questions
• Syntax
• Wh-words, subject/auxiliary inversion
• Prosody
• Phrase-final rising intonation (Pierrehumbert &
Hirschberg ‘90)
• Duration and pausing (Shriberg et al. ‘98)
• Lexico-pragmatics
• personal pronouns, utterance-initial pronouns
(Geluykens 1987; Beun 1990)
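The syntactic cues listed above can be illustrated with a small rule-based check. This is a hedged sketch, not the system described in the talk: the word lists and the function name `syntactic_cues` are assumptions made for illustration, and the test utterances are the four examples from the Question Detection slide.

```python
# Illustrative word lists (assumptions, far from complete).
WH_WORDS = {"what", "who", "whom", "whose", "which", "when", "where", "why", "how"}
AUXILIARIES = {"do", "does", "did", "is", "are", "am", "was", "were", "can",
               "could", "will", "would", "should", "have", "has", "had"}
SUBJECTS = {"i", "you", "he", "she", "it", "we", "they", "that", "this", "there"}


def syntactic_cues(utterance):
    """Flag an utterance-initial wh-word and subject/auxiliary inversion."""
    tokens = utterance.lower().replace("?", "").split()
    wh_initial = bool(tokens) and tokens[0] in WH_WORDS
    aux_inversion = (len(tokens) >= 2
                     and tokens[0] in AUXILIARIES
                     and tokens[1] in SUBJECTS)
    return {"wh_initial": wh_initial, "aux_inversion": aux_inversion}


examples = [
    "what symbol are you talking about",
    "do i have to rewrite this again",
    "am i ok with that",
    "so it’d be one meter per second squared",
]
for text in examples:
    print(text, syntactic_cues(text))
```

The last example carries neither cue, which is the case for declarative questions and the reason prosodic and lexico-pragmatic features are also considered.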
Corpus
• 141 ITSpoke dialogues
• 5 hours of student speech
• Student turns average 2.5 seconds
• 1,030 questions
• 25 questions per hour
• 70% of turns consist entirely of the question
• 89% of questions are turn-final
Question Form Distribution in ITSpoke
Form         Example                        Distr.
yes/no       Is that right?                 24%
wh-          What do you mean?              10%
yes/no tag   It will stay the same, right?  7%
alternative  Force or something?            3%
particle     Huh?                           2%
declarative  The weight?                    54%
Question-Bearing Turns
• Contain one or more questions
• N = 918
Features Extracted
• Prosodic
• pitch
• loudness
• pausing
• speaking rate
• calculated over entire turn and last 200 ms
• Syntactic
• unigram and bigram part-of-speech tags
Feature Extraction
• Lexical
• unigram and bigram hand-labeled transcriptions
• Student and task dependent
• pre-test score
• gender
• correctness
• previous tutor dialogue act
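The unigram and bigram features used for the lexical (and, over part-of-speech tags, the syntactic) feature sets can be sketched as simple n-gram counts. The Python below is a minimal illustration; the function name and the whitespace tokenization are assumptions, and the example utterance is taken from the Question Detection slide.

```python
from collections import Counter


def ngram_counts(tokens):
    """Count unigrams and adjacent-pair bigrams in a token sequence."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    return unigrams, bigrams


# Works the same way for words or for part-of-speech tags.
words = "do i have to rewrite this again".split()
unigrams, bigrams = ngram_counts(words)
print(unigrams)
print(bigrams)
```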
Machine Learning Experiments
• Question-bearing vs. non-question-bearing
• Down-sampled to 50/50 distribution
• Experimented by feature type
• Adaboosted C4.5 decision trees
• 5-fold cross validation
• Best results with all features
• Accuracy = 79.7%
• Precision = Recall = F-measure = 0.8
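Two steps of this setup, down-sampling to a 50/50 class distribution and 5-fold cross-validation, can be sketched in plain Python. This is an illustration under assumptions: the function names, the random seed, and the interleaved fold assignment are hypothetical choices, and the AdaBoosted C4.5 classifier itself is not reproduced here.

```python
import random


def downsample_balanced(items, labels, seed=0):
    """Randomly drop majority-class items until both classes are the same size."""
    rng = random.Random(seed)
    positives = [i for i, y in enumerate(labels) if y]
    negatives = [i for i, y in enumerate(labels) if not y]
    n = min(len(positives), len(negatives))
    keep = rng.sample(positives, n) + rng.sample(negatives, n)
    rng.shuffle(keep)
    return [items[i] for i in keep], [labels[i] for i in keep]


def k_fold_indices(n_items, k=5):
    """Yield (train, test) index lists; each item is held out exactly once."""
    folds = [list(range(start, n_items, k)) for start in range(k)]
    for held_out in range(k):
        test = folds[held_out]
        train = [i for j, fold in enumerate(folds) if j != held_out for i in fold]
        yield train, test


# Hypothetical turns: 30 question-bearing, 70 non-question-bearing.
turns = list(range(100))
is_question = [True] * 30 + [False] * 70
balanced_turns, balanced_labels = downsample_balanced(turns, is_question)
for train, test in k_fold_indices(len(balanced_turns), k=5):
    print(len(train), len(test))
```

After balancing, chance accuracy is 50%, which is the baseline against which the reported accuracies should be read.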
Accuracy by Feature Type
Feature type                         Accuracy
prosody: pausing and speaking rate   52.6%
student and task dependent           56.1%
prosody: loudness                    61.8%
syntactic                            65.3%
lexical                              67.2%
prosody: last 200 ms                 70.3%
prosody: pitch                       72.6%
prosody: all                         74.5%
Feature Type Discussion
• Which features most informative?
• pitch slope of last 200 ms and entire turn
• maximum and mean pitch of turn
• Which features most often used in learning?
• pre-test score
• slope of last 200 ms
• maximum pitch of entire turn
• cumulative pause duration
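The pitch-slope features named above can be illustrated as a least-squares line fit to the F0 track, computed either over the whole turn or over its final 200 ms. The sketch below is an assumption about how such a feature might be computed, not the extraction code used in the study; the function name, the millisecond time base, and the sample values are hypothetical.

```python
def pitch_slope(times_ms, f0_hz, window_ms=None):
    """Least-squares F0 slope in Hz per second, optionally over the final window."""
    points = list(zip(times_ms, f0_hz))
    if window_ms is not None:
        start = times_ms[-1] - window_ms
        points = [(t, f) for t, f in points if t >= start]
    n = len(points)
    if n < 2:
        return 0.0
    mean_t = sum(t for t, _ in points) / n
    mean_f = sum(f for _, f in points) / n
    numerator = sum((t - mean_t) * (f - mean_f) for t, f in points)
    denominator = sum((t - mean_t) ** 2 for t, _ in points)
    if denominator == 0:
        return 0.0
    # Times are in milliseconds, so scale to Hz per second.
    return 1000.0 * numerator / denominator


# Hypothetical F0 track: flat, then a phrase-final rise.
times = list(range(0, 1100, 100))   # 0, 100, ..., 1000 ms
f0 = [200.0] * 9 + [220.0, 240.0]
print(pitch_slope(times, f0))                 # slope over the entire turn
print(pitch_slope(times, f0, window_ms=200))  # slope over the last 200 ms
```

A final-window slope that is much steeper than the whole-turn slope is the kind of pattern a phrase-final rise would produce.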
Other Observations
• Syntactic features were informative
• personal pronoun + verb, wh-pronoun, interjection
• Lexical features were informative
• yes, right, what, I, you
Conclusions
• Most questions in our tutoring corpus are
declarative in form
• More than syntax is needed to identify these as
questions
• Prosodic features are very important
• Detecting question-bearing turns is possible
• Detecting question function is needed
Question Forms in ITSpoke
Form         Distr.  Example
declarative  54%     The weight?
yes/no       24%     Is that right?
wh-          10%     What do you mean?
yes/no tag   7%      It will stay the same, right?
alternative  3%      Force or something?
particle     2%      Huh?