A Scalable Reinforcement Learning Approach to Error Handling in Spoken Language Interfaces
Dan Bohus
www.cs.cmu.edu/~dbohus | [email protected]
Computer Science Department, Carnegie Mellon University, Pittsburgh, PA 15217




2

problem

spoken language interfaces lack robustness when faced with understanding errors.

3

more concretely …

S: What city are you leaving from?

U: Urbana Champaign

[OKAY IN THAT SAME PAY]

S: Sorry, I’m not sure I understood what you said. What city are you leaving from?

U: Urbana Champaign

[ FOR MINUTE SINCE HEY]

S: I’m still having trouble understanding you. To go on, I need you to answer the following question. What city are you leaving from?

U: Chicago

[CHICAGO]

S: traveling from Chicago. Where would you like to go?

U: Huntsville

[SEOUL]

S: traveling to Seoul. What day did you need to travel?

U: no no I’m traveling to Birmingham

[THE TRAVELING TO BERLIN P_M]

S: traveling in the afternoon. Okay what day would you be departing Chicago?

U: /uh/ the tenth of august

[AT THE TENTH OF AUGUST]

S: a flight tomorrow… I have a Korean Airlines flight departing Chicago at 1:40pm, arrives Seoul at 5pm the next day. Do you want to take that?

U: my destination is Birmingham

[ FLIGHT DESTINATION MR WEEKEND]

S: traveling on Saturday, August 12th … I have a flight departing Chicago at 1:40pm, arrives Seoul at …

4

problem source

 stems mostly from speech recognition
 spans most domains and interaction types
 exacerbated by operating conditions
   spontaneous speech
   medium / large vocabularies
   large, varied, and changing user populations

5

speech recognition impact

 typical word-error-rates
   10-20% for native speakers (novice users)
   40% and above for non-native users
 significant negative impact on performance [Walker, Sanders]

[chart: task success vs. word-error-rate]

6

approaches for increasing robustness

 fix recognition
 gracefully handle errors through interaction
   detect the problems
   develop a set of recovery strategies
   know how to choose between them (policy)

a closer look : RL in spoken dialog systems : current challenges : RL for error handling

outline

 a closer look at the problem
 RL in spoken dialog systems
 current challenges
 a proposed RL approach for error handling


non- and mis-understandings

8

NON-understandings and MIS-understandings, annotated on the same dialog:

S: What city are you leaving from?

U: Urbana Champaign

[OKAY IN THAT SAME PAY]

S: Sorry, I’m not sure I understood what you said. What city are you leaving from?

U: Urbana Champaign

[ FOR MINUTE SINCE HEY]

S: I’m still having trouble understanding you. To go on, I need you to answer the following question. What city are you leaving from?

U: Chicago

[CHICAGO]

S: traveling from Chicago. Where would you like to go?

U: Huntsville

[SEOUL]

S: traveling to Seoul. What day did you need to travel?

U: no no I’m traveling to Birmingham

[THE TRAVELING TO BERLIN P_M]

S: traveling in the afternoon. Okay what day would you be departing Chicago?

U: /uh/ the tenth of august

[AT THE TENTH OF AUGUST]

S: a flight tomorrow… I have a Korean Airlines flight departing Chicago at 1:40pm, arrives Seoul at 5pm the next day. Do you want to take that?

U: my destination is Birmingham

[ FLIGHT DESTINATION MR WEEKEND]

S: traveling on Saturday, August 12th … I have a flight departing Chicago at 1:40pm, arrives Seoul at …


9

six not-so-easy pieces

 detection
   misunderstandings: recognition or semantic confidence scores
   non-understandings: typically trivial [some exceptions may apply]
 strategies
   misunderstandings: explicit confirmation (“Did you say 10am?”), implicit confirmation (“Starting at 10am… until what time?”), accept, reject
   non-understandings: “Sorry, I didn’t catch that… Can you repeat that?”, “Can you rephrase that?”, “You can say something like ‘at 10 a.m.’”, [MoveOn]
 policy
   misunderstandings: confidence threshold model (confidence 0 → 1: reject | explicit | implicit | accept)
   non-understandings: handcrafted heuristics (first notify, then ask to repeat, then give help, then give up)
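The confidence threshold model above amounts to a simple decision rule. A minimal sketch in Python; the function name and the threshold values are hypothetical, since deployed systems tune the thresholds from data:

```python
# Sketch of a confidence threshold model for handling misunderstandings.
# The thresholds below are hypothetical, not values from the talk.

def error_handling_action(confidence: float) -> str:
    """Map a [0, 1] recognition confidence score to an error-handling action."""
    if confidence < 0.3:
        return "reject"            # hypothesis too unreliable to use
    elif confidence < 0.6:
        return "explicit_confirm"  # "Did you say 10am?"
    elif confidence < 0.9:
        return "implicit_confirm"  # "Starting at 10am... until what time?"
    else:
        return "accept"            # take the hypothesis as-is
```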


outline

 a closer look at the problem
 RL in spoken dialog systems
 current challenges
 a proposed RL approach for error handling


spoken dialog system architecture

Speech Recognition → Language Understanding → Dialog Manager → Language Generation → Speech Synthesis
(Dialog Manager ↔ Domain Back-end)

11


12

reinforcement learning in dialog systems

 debate over design choices → learn the choices using reinforcement learning
 the dialog manager as an agent interacting with an environment
   noisy semantic inputs
   actions (semantic output)
   temporal / sequential aspect
   task success / failure


13

NJFun

“Optimizing Dialog Management with Reinforcement Learning: Experiments with the NJFun System” [Singh, Litman, Kearns, Walker]

 provides information about “fun things to do in New Jersey”
 slot-filling dialog: type-of-activity, location, time
 provides information from a database


NJFun as an MDP

 define state-space
 define action-space
 define reward structure
 collect data for training
 learn policy
 evaluate learned policy


15

NJFun as an MDP: state-space

 internal system state: 14 variables
 state for RL → vector of 7 variables
   greet: has the system greeted the user
   attribute: which attribute the system is currently querying
   confidence: recognition confidence level (binned)
   value: has a value been obtained for the current attribute
   tries: how many times the current attribute was asked
   grammar: whether a non-restrictive or restrictive grammar was used
   history: was there any trouble on previous attributes
 62 different states
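The 7-variable state vector can be pictured as a small record. A sketch using the variable names from the slide; the concrete values and binnings below are illustrative, not the exact NJFun encoding:

```python
from collections import namedtuple

# The 7 state variables NJFun exposes to RL (names from the slide).
# Example values are illustrative, not the actual NJFun binning.
DialogState = namedtuple(
    "DialogState",
    ["greet", "attribute", "confidence", "value", "tries", "grammar", "history"],
)

state = DialogState(
    greet=1,       # system has greeted the user
    attribute=2,   # which attribute is being queried
    confidence=0,  # binned recognition confidence
    value=1,       # a value has been obtained for the current attribute
    tries=1,       # times the current attribute was asked
    grammar=0,     # 0 = non-restrictive, 1 = restrictive grammar
    history=0,     # trouble on previous attributes?
)
```

Constraints between the variables leave only the 62 reachable states reported on the slide, rather than the full cross-product.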


NJFun as an MDP: actions & rewards

 type of initiative (3 types): system initiative, mixed initiative, user initiative
 confirmation strategy (2 types): explicit confirmation, no confirmation
 resulting MDP has only 2 action choices per state
 reward: binary task success


17

NJFun as an MDP: learning a policy

 training data: 311 complete dialogs, collected using an exploratory policy
 learned the policy using value iteration
   begin with user initiative
   back off to mixed or system initiative when re-asking for an attribute
   specific type of back-off differs across attributes
   confirm when confidence is low
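Value iteration itself is standard tabular RL. A minimal sketch on a toy one-decision MDP; the toy model is illustrative and not NJFun’s actual transition data:

```python
# Minimal tabular value iteration.  P[s][a] is a list of
# (probability, next_state, reward) outcomes; states with no actions
# are terminal.

def value_iteration(P, gamma=1.0, eps=1e-6):
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s, actions in P.items():
            if not actions:  # terminal state
                continue
            best = max(
                sum(p * (r + gamma * V[s2]) for p, s2, r in outcomes)
                for outcomes in actions.values()
            )
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < eps:
            return V

# Toy MDP: asking once succeeds (reward 1) with probability 0.8.
P = {
    "start": {"ask": [(0.8, "done_ok", 1.0), (0.2, "done_fail", 0.0)]},
    "done_ok": {},
    "done_fail": {},
}
values = value_iteration(P)  # values["start"] is approximately 0.8
```

With 62 states and 2 actions per state, NJFun’s MDP is small enough for exactly this kind of tabular solution.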


18

NJFun as an MDP: evaluation

 evaluated policy on 124 testing dialogs
   task success rate: 52% → 64%
   weak task completion: 1.72 → 2.18
   subjective evaluation: no significant improvements, but a move-to-the-mean effect
 learned policy better than hand-crafted policies
   policies comparatively evaluated on the learned MDP


outline

 a closer look at the problem
 RL in spoken dialog systems
 current challenges
 a proposed RL approach for error handling


20

challenge 1: scalability

 contrast NJFun with RoomLine
   conference room reservation and scheduling
   mixed-initiative, task-oriented interaction
   system obtains a list of rooms matching initial constraints, then negotiates with the user to identify the room that best matches their needs
   37 concepts (slots), 25 questions that can be asked
 another example: LARRI
 full-blown MDP is intractable
 not clear how to do state abstraction


21

challenge 2: reusability

 underlying MDP is system-specific
 MDP design still requires a lot of human expertise
 new MDP for each system → new training & new evaluation
 are we really saving time & expertise?
 maybe we’re asking for too much?


22

addressing the scalability problem

 approach 1: user models / simulations
   costly to obtain real data → simulate
   simplistic simulators [Eckert, Levin]
   more complex, task-specific simulators [Scheffler & Young]
   real-world evaluation becomes paramount
 approach 2: value function approximation
   data-driven state abstraction / state aggregation [Denecke]


outline

 a closer look at the problem
 RL in spoken dialog systems
 current challenges
 a proposed RL approach for error handling


24

reinforcement learning in dialog systems

Speech Recognition → Language Understanding → (semantic input) → Dialog Manager → (actions / semantic output) → Language Generation → Speech Synthesis; Domain Back-end

 Focus RL only on the difficult decisions!


25

task-decoupled approach

 decouple
   error handling decisions → use reinforcement learning
   domain-specific dialog control decisions → use your favorite DM framework
 advantages
   reduces the size of the learning problem
   favors reusability of learned policies
   lessens system authoring effort


26

RavenClaw

[RavenClaw architecture diagram: a Dialogue Task Specification (Welcome; Login with AskRegistered, GreetUser, AskName; GetQuery with DateTime, Location, Network; GetResults; DiscussResults; room properties such as Projector, Whiteboard) is executed by a Domain-Independent Dialogue Engine that maintains a Dialogue Stack and an Expectation Agenda (e.g. registered: [No] → false, [Yes] → true; user_name: [UserName]; query.date_time: [DateTime]; query.location: [Location]; query.network: [Network]); error indicators feed an Error Handling Decision Process that triggers strategies such as ExplicitConfirm]


decision process architecture

27

 one Concept-MDP per concept (e.g. registered, user_name) and one Topic-MDP per topic (e.g. Login, RoomLine); a gating mechanism selects among the actions they propose (No Action, Explicit Confirm, …)
 small-size models; parameters can be tied across models
 accommodates dynamic task generation
 favors reusability of policies; initial policies can be easily handcrafted
 independence assumption
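The gating step can be sketched as follows: each concept- or topic-MDP proposes an action for the current turn, and the gate admits at most one. The priority ordering below is a hypothetical handcrafted choice, not the learned policy:

```python
# Sketch of the action gating mechanism over per-concept / per-topic MDPs.
# Each MDP proposes one action per turn; the gate lets at most one
# non-trivial action through.  The priority scheme is hypothetical.

NO_ACTION = "no_action"
PRIORITY = {"explicit_confirm": 2, "implicit_confirm": 1, NO_ACTION: 0}

def gate(proposals):
    """proposals: dict mapping MDP name -> proposed action.
    Returns (mdp_name, action) for the one action taken this turn."""
    name, action = max(proposals.items(), key=lambda kv: PRIORITY[kv[1]])
    if PRIORITY[action] == 0:
        return None, NO_ACTION  # no MDP requested error handling
    return name, action

proposals = {
    "registered": NO_ACTION,
    "user_name": "explicit_confirm",
    "Login": NO_ACTION,
}
winner = gate(proposals)  # ("user_name", "explicit_confirm")
```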


reward structure & learning

 local rewards (assigned before the action gating mechanism, one per MDP)
   multiple, standard RL problems
   risk: solving the local problems, but not the global one
 global, post-gate rewards
   rewards based on any dialogue performance metric
   atypical, multi-agent reinforcement learning setting


29

conclusion

 reinforcement learning: a very appealing approach for dialog control
 in practical systems, scalability is a big issue
 how can we leverage the knowledge we have?
   state-space design
   solutions that account for / handle sparse data
   bounds on policies
   hierarchical models

30

thank you!

31

Structure of Individual MDPs

 Concept MDPs
   state-space: belief indicators [diagram: confidence bins LC / MC / HC, each with actions NoAct, ExplConf, ImplConf]
   action-space: concept-scoped system actions
 Topic MDPs
   state-space: non-understanding and dialogue-on-track indicators
   action-space: non-understanding actions, topic-level actions