A Scalable Reinforcement Learning Approach to Error Handling in Spoken Language Interfaces
Dan Bohus
www.cs.cmu.edu/~dbohus  [email protected]
Computer Science Department, Carnegie Mellon University, Pittsburgh, PA 15213
problem
spoken language interfaces lack robustness when faced with understanding errors.
more concretely …
S: What city are you leaving from?
U: Urbana Champaign
[OKAY IN THAT SAME PAY]
S: Sorry I’m not sure I understood what you said. What city are you leaving from ?
U: Urbana Champaign
[ FOR MINUTE SINCE HEY]
S: I’m still having trouble understanding you. To go on, I need you to answer the following question. What city are you leaving from?
U: Chicago
[CHICAGO]
S: traveling from Chicago. Where would you like to go?
U: Huntsville
[SEOUL]
S: traveling to Seoul. What day did you need to travel?
U: no no I’m traveling to Birmingham
[THE TRAVELING TO BERLIN P_M]
S: traveling in the afternoon. Okay what day would you be departing Chicago?
U: /uh/ the tenth of august
[AT THE TENTH OF AUGUST]
S: a flight tomorrow… I have a Korean Airlines flight departing Chicago at 1:40pm, arrives Seoul at 5pm the next day. Do you want to take that?
U: my destination is Birmingham
[ FLIGHT DESTINATION MR WEEKEND]
S: traveling on Saturday, August 12th… I have a flight departing Chicago at 1:40pm, arrives Seoul at…
problem source
stems mostly from speech recognition
spans most domains and interaction types
exacerbated by operating conditions: spontaneous speech, medium / large vocabularies, large, varied, and changing user populations
speech recognition impact
typical word-error-rates: 10-20% for native (novice) users, 40% and above for non-native users
significant negative impact on performance [Walker, Sanders]
[plot: task success vs. word-error-rate]
approaches for increasing robustness
fix recognition
gracefully handle errors through interaction:
detect the problems
develop a set of recovery strategies
know how to choose between them (policy)
a closer look : RL in spoken dialog systems : current challenges : RL for error handling
outline
a closer look at the problem
RL in spoken dialog systems
current challenges
a proposed RL approach for error handling
non- and mis-understandings
NON-understandings vs. MIS-understandings (see the dialog example above: the first two failed turns are non-understandings; the Seoul and afternoon errors are misunderstandings)
six not-so-easy pieces
detection
misunderstandings: recognition or semantic confidence scores
non-understandings: typically trivial [some exceptions may apply]
strategies
explicit confirmation: Did you say 10am?
implicit confirmation: Starting at 10am… until what time?
accept, reject
non-understanding strategies: Sorry, I didn’t catch that… Can you repeat that? Can you rephrase that? You can say something like “at 10 a.m.” [MoveOn]
policy
confidence threshold model: 0 → reject | explicit confirm | implicit confirm | accept ← 1
handcrafted heuristics: first notify, then ask to repeat, then give help, then give up
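The confidence threshold model above can be sketched as a simple mapping; the thresholds 0.3 / 0.6 / 0.9 are hypothetical illustrations, not values from the talk.

```python
# Hypothetical sketch of the confidence-threshold policy for misunderstandings.
# The threshold values are illustrative, not from the talk.

def choose_action(confidence: float) -> str:
    """Map a recognition confidence score in [0, 1] to an error-handling action."""
    if confidence < 0.3:
        return "reject"            # discard the hypothesis, re-ask
    elif confidence < 0.6:
        return "explicit_confirm"  # "Did you say 10am?"
    elif confidence < 0.9:
        return "implicit_confirm"  # "Starting at 10am... until what time?"
    else:
        return "accept"            # take the hypothesis as-is

print(choose_action(0.95))  # accept
```

In practice the thresholds are tuned per system, trading off the cost of a wrong acceptance against the cost of an unnecessary confirmation.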
spoken dialog system architecture
Speech Recognition → Language Understanding → Dialog Manager → Language Generation → Speech Synthesis
Domain Back-end (connected to the Dialog Manager)
reinforcement learning in dialog systems
debate over design choices → learn the choices using reinforcement learning
the Dialog Manager is an agent interacting with an environment:
Speech Recognition and Language Understanding deliver noisy semantic input
the Dialog Manager emits actions (semantic output) through Generation
the Domain Back-end answers queries
a natural RL fit: noisy inputs, temporal / sequential aspect, task success / failure reward
NJFun
“Optimizing Dialog Management with Reinforcement Learning: Experiments with the NJFun System” [Singh, Litman, Kearns, Walker]
provides information about “fun things to do in New Jersey”
slot-filling dialog: type-of-activity, location, time
provides information from a database
NJFun as an MDP
define state-space
define action-space
define reward structure
collect data for training
learn policy
evaluate learned policy
NJFun as an MDP: state-space
internal system state: 14 variables
state for RL → vector of 7 variables:
greet: has the system greeted the user
attribute: which attribute the system is currently querying
confidence: recognition confidence level (binned)
value: has a value been obtained for the current attribute
tries: how many times the current attribute was asked
grammar: was a non-restrictive or restrictive grammar used
history: was there any trouble on previous attributes
62 different states
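The 7-variable state vector above can be sketched as a simple record; the variable names follow the slide, while the example values are hypothetical.

```python
from collections import namedtuple

# Sketch of NJFun's 7-variable RL state vector. Variable names follow the
# talk; the example values below are hypothetical.
State = namedtuple("State",
                   ["greet", "attribute", "confidence", "value",
                    "tries", "grammar", "history"])

s = State(greet=1,        # system has greeted the user
          attribute=2,    # currently querying the 2nd attribute (location)
          confidence=0,   # binned recognition confidence: low
          value=0,        # no value obtained yet for this attribute
          tries=1,        # attribute asked once so far
          grammar=0,      # non-restrictive grammar used
          history=1)      # there was trouble on a previous attribute
print(s.confidence)  # 0
```

Binning each variable into a handful of values is what keeps the state-space down to 62 reachable states.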
NJFun as an MDP: actions & rewards
type of initiative (3 choices): system initiative, mixed initiative, user initiative
confirmation strategy (2 choices): explicit confirmation, no confirmation
resulting MDP has only 2 action choices per state
reward: binary task success
NJFun as an MDP: learning a policy
training data: 311 complete dialogs collected using an exploratory policy
learned the policy using value iteration
learned policy: begin with user initiative; back off to mixed or system initiative when re-asking for an attribute (the specific back-off differs across attributes); confirm when confidence is low
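A toy value-iteration sketch in NJFun's spirit: two action choices per state and a binary task-success reward. The states and transition probabilities below are invented for illustration; NJFun estimated its model from the 311 exploratory dialogs.

```python
# Toy value iteration over an invented 2-actions-per-state MDP.
# T[state][action] = list of (probability, next_state); "success"/"fail"
# are terminal states carrying the binary task-success reward.
T = {
    "ask":   {"user_init":   [(0.9, "success"), (0.1, "reask")],
              "system_init": [(0.8, "success"), (0.2, "reask")]},
    "reask": {"user_init":   [(0.5, "success"), (0.5, "fail")],
              "system_init": [(0.9, "success"), (0.1, "fail")]},
}
V = {"success": 1.0, "fail": 0.0, "ask": 0.0, "reask": 0.0}

for _ in range(50):  # sweep until well past convergence
    for s, actions in T.items():
        V[s] = max(sum(p * V[ns] for p, ns in outs) for outs in actions.values())

# Greedy policy with respect to the converged value function.
policy = {s: max(actions, key=lambda a: sum(p * V[ns] for p, ns in actions[a]))
          for s, actions in T.items()}
print(policy)  # {'ask': 'user_init', 'reask': 'system_init'}
```

With these invented numbers the learned policy happens to mirror the slide: start with user initiative, back off to system initiative when re-asking.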
NJFun as an MDP: evaluation
evaluated the policy on 124 testing dialogs
task success rate: 52% → 64%
weak task completion: 1.72 → 2.18
subjective evaluation: no significant improvements, but a move-to-the-mean effect
learned policy better than hand-crafted policies (policies evaluated comparatively on the learned MDP)
challenge 1: scalability
contrast NJFun with RoomLine: conference room reservation and scheduling
mixed-initiative, task-oriented interaction
system obtains a list of rooms matching initial constraints, then negotiates with the user to identify the room that best matches their needs
37 concepts (slots), 25 questions that can be asked
another example: LARRI
a full-blown MDP is intractable, and it is not clear how to do state abstraction
challenge 2: reusability
the underlying MDP is system-specific
MDP design still requires a lot of human expertise
a new MDP for each system means new training & new evaluation
are we really saving time & expertise?
maybe we’re asking for too much?
addressing the scalability problem
approach 1: user models / simulations
costly to obtain real data → simulate
simplistic simulators [Eckert, Levin]
more complex, task-specific simulators [Scheffler & Young]
real-world evaluation becomes paramount
approach 2: value function approximation
data-driven state abstraction / state aggregation [Denecke]
reinforcement learning in dialog systems
Speech Recognition → Language Understanding → (semantic input) → Dialog Manager → (actions / semantic output) → Language Generation → Speech Synthesis
Domain Back-end
focus RL only on the difficult decisions!
task-decoupled approach
error handling decisions: use reinforcement learning
domain-specific dialog control decisions: decoupled; use your favorite DM framework
advantages:
reduces the size of the learning problem
favors reusability of learned policies
lessens system authoring effort
RavenClaw
[RavenClaw architecture diagram]
Dialogue Task (Specification): a tree of dialog agents, e.g. RoomLine → Login (Welcome, AskRegistered → registered, GreetUser, AskName → user_name), GetQuery (DateTime, Location, Network → query), GetResults → results, DiscussResults
Domain-Independent Dialogue Engine: maintains a Dialogue Stack (e.g. RoomLine, Login, AskRegistered, with an ExplicitConfirm strategy pushed on top) and an Expectation Agenda (e.g. registered: [No] → false, [Yes] → true; user_name: [UserName]; query.date_time: [DateTime]; query.location: [Location]; query.network: [Network])
error indicators feed an Error Handling Decision Process that selects among error-handling strategies
decision process architecture
[decision process architecture diagram: one Concept-MDP per concept (registered, user_name, …) and one Topic-MDP per task agent (Login, RoomLine, …), each proposing an action such as No Action or Explicit Confirm; a gating mechanism selects the action to execute]
small-size models; parameters can be tied across models
accommodates dynamic task generation
favors reusability of policies; initial policies can be easily handcrafted
relies on an independence assumption between the models
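One way to picture the gating mechanism: each concept runs its own small policy, and the gate executes at most one error-handling action per turn. The priority scheme, threshold, and names below are assumptions for illustration, not the talk's actual decision process.

```python
# Hypothetical sketch of the gating mechanism over per-concept policies.
# The 0.4 threshold and the priority scheme are illustrative assumptions.

def concept_policy(confidence):
    """Per-concept policy: return (proposed action, priority of acting)."""
    if confidence < 0.4:
        return "explicit_confirm", 1.0 - confidence
    return "no_action", 0.0

def gate(concept_confidences):
    """Select the single highest-priority action proposed across all concepts."""
    proposals = {c: concept_policy(conf) for c, conf in concept_confidences.items()}
    concept = max(proposals, key=lambda c: proposals[c][1])
    action, priority = proposals[concept]
    return (concept, action) if priority > 0 else (None, "no_action")

print(gate({"registered": 0.9, "user_name": 0.2}))  # ('user_name', 'explicit_confirm')
```

Because each model only sees its own concept, the gate is where the independence assumption bites: two simultaneously uncertain concepts compete rather than being handled jointly.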
reward structure & learning
option 1: local rewards, assigned to each MDP before the action gating mechanism → multiple, standard RL problems, but risks solving the local problems without solving the global one
option 2: global, post-gate rewards, based on any dialogue performance metric → an atypical, multi-agent reinforcement learning setting
conclusion
reinforcement learning: a very appealing approach for dialog control
in practical systems, scalability is a big issue
how can we leverage the knowledge we have?
state-space design
solutions that account for / handle sparse data
bounds on policies
hierarchical models
thank you!
Structure of Individual MDPs
Concept MDPs
state-space: belief indicators (confidence binned, e.g. low / medium / high)
action-space: concept-scoped system actions (NoAct, ExplConf, ImplConf)
Topic MDPs
state-space: non-understanding, dialogue-on-track indicators
action-space: non-understanding actions, topic-level actions
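As a rough sketch, a Concept-MDP is small enough that its tabular policy fits in a few lines; the state and action names below are assumptions based on the slide, illustrating why initial policies are easy to handcraft and why parameters can be tied across concepts.

```python
# Illustrative sketch of a Concept-MDP's state and action spaces.
# Names are assumptions based on the slide's belief indicators and
# concept-scoped actions, not the system's actual identifiers.

CONCEPT_STATES = ["empty", "low_conf", "medium_conf", "high_conf"]
CONCEPT_ACTIONS = ["no_action", "explicit_confirm", "implicit_confirm"]

# A tabular policy is just a state -> action map; because the table is tiny,
# an initial policy is easy to handcraft, and tying parameters across models
# means many concepts can share this same table.
handcrafted_policy = {
    "empty": "no_action",
    "low_conf": "explicit_confirm",
    "medium_conf": "implicit_confirm",
    "high_conf": "no_action",
}

assert set(handcrafted_policy) == set(CONCEPT_STATES)
assert set(handcrafted_policy.values()) <= set(CONCEPT_ACTIONS)
```

Learning then amounts to refining these few table entries from data, rather than solving one full-dialog MDP.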