HIS Demo System - University of Cambridge


Still Talking to Machines
(Cognitively Speaking)
Steve Young
Machine Intelligence Laboratory
Information Engineering Division
Cambridge University Engineering Department
Cambridge, UK
Outline of Talk
• A brief historical perspective
• Cognitive User Interfaces
• Statistical Dialogue Modelling
• Scaling to the Real World
• System Architecture
• Some Examples and Results
• Conclusions and future work
Interspeech Plenary September 2010 © Steve Young
Why Talk to Machines?
• it should be an easy and efficient way of finding out information and controlling behaviour
• sometimes it is the only way
o hands-busy, e.g. surgeon, driver, package handler, etc.
o no internet and no call-centres, e.g. areas of the 3rd world
o very small devices
• one day it might be fun
c.f. Project Natal - Milo
VODIS - circa 1985
Natural language/mixed initiative Train-timetable Inquiry Service
150 word DTW connected speech recognition
[Figure: VODIS system block diagram. Labels: 8 x 8086 processors; Logos speech recogniser; DecTalk synthesiser; PDP11/45; 128k memory / 2 x 5 Mb disk; frame-based dialogue manager; data flows labelled words, recognition grammars and text.]
Demo
Collaboration between BT, Logica and Cambridge U.
Some desirable properties of a Spoken Dialogue System
• able to support reasoning and inference
o interpret noisy inputs and resolve ambiguities in context
• able to plan under uncertainty
o clearly defined communicative goals
o performance quantified as rewards
o plans optimized to maximize rewards
• able to adapt on-line
o robust to speaker (accent, vocab, behaviour, ...)
o robust to environment (noise, location, ...)
• able to learn from experience
o progressively optimize models and plans over time
A system with these properties is a Cognitive User Interface.
S. Young (2010). "Cognitive User Interfaces." Signal Processing Magazine 27(3)
Essential Ingredients of a Cognitive User Interface (CUI)
• Explicit representation of uncertainty using a probability
model over dialogue states e.g. using Bayesian networks
• Inputs regarded as observations used to update the
posterior state probabilities via inference
• Responses defined by plans which map internal states to
actions
• The system’s design objectives defined by rewards
associated with specific state/action pairs
• Plans optimized via reinforcement learning
• Model parameters estimated via supervised learning
and/or optimized via reinforcement learning
Partially Observable Markov Decision Process (POMDP)
A Framework for Statistical Dialogue Management
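[Figure: block diagram of the statistical dialogue manager, a loop between the User and the system, with these labelled elements.]
• Speech Understanding turns the user's speech into an observation $o_t$
• Belief: $b_t = P(s_t \mid o_{t-1}, b_{t-1}; \lambda)$, a distribution over dialogue states $s_t$ given by a model with distribution parameters $\lambda$
• Policy: $\pi(a_t \mid b_t, \theta)$ with policy parameters $\theta$, which selects the action $a_t$ passed to Response Generation
• Reward function $r$ applied to $(b_t, a_t)$, with total reward $R = \sum_t r(b_t, a_t)$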
Belief Tracking aka Belief Monitoring
Belief is updated following each new user input
$$b'(s') = k \cdot P(o' \mid s', a) \sum_{s} P(s' \mid a, s)\, b(s)$$
However, the state space is huge and the above equation is
intractable for practical systems. So we approximate:
• Track just the N most likely states: Hidden Information State System (HIS)
• Factorise the state space and ignore all but major conditional dependencies: Graphical Model System (GMS aka BUDS)
S. Young (2010). "The Hidden Information State Model" Computer Speech and Language 24(2)
B. Thomson (2010). "Bayesian update of dialogue state" Computer Speech and Language 24(4)
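As a minimal sketch of the exact belief update on this slide, assuming a tiny discrete state space held in plain dictionaries: the names `belief_update`, `trans` and `obs_prob`, the two-state food example and all of its probabilities are illustrative assumptions, not part of HIS or BUDS.

```python
def belief_update(belief, action, observation, trans, obs_prob):
    """Exact POMDP belief update: b'(s') = k * P(o'|s',a) * sum_s P(s'|a,s) b(s).

    belief:   dict state -> probability, b(s)
    trans:    function (s_next, action, s) -> P(s'|a,s)
    obs_prob: function (observation, s_next, action) -> P(o'|s',a)
    """
    states = list(belief)
    unnormalised = {}
    for s_next in states:
        # Prediction step: sum over all previous states s.
        predicted = sum(trans(s_next, action, s) * belief[s] for s in states)
        # Correction step: weight by the observation likelihood.
        unnormalised[s_next] = obs_prob(observation, s_next, action) * predicted
    total = sum(unnormalised.values())
    if total == 0.0:
        raise ValueError("observation has zero probability under the model")
    # Dividing by the total plays the role of the normalising constant k.
    return {s: p / total for s, p in unnormalised.items()}


# Hypothetical two-state example: the user's goal is French or Chinese food.
def trans(s_next, action, s):
    return 0.9 if s_next == s else 0.1  # assumed: goals rarely change

def obs_prob(observation, s_next, action):
    return 0.8 if observation == s_next else 0.2  # assumed error model

prior = {"french": 0.5, "chinese": 0.5}
posterior = belief_update(prior, "request_food", "french", trans, obs_prob)
print(posterior)
```

The nested loop over `s_next` and `s` costs time quadratic in the number of states, which is why the exact update does not scale and the HIS and BUDS approximations are needed.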
Dialogue State
Tourist Information Domain
• type = bar, restaurant
• food = French, Chinese, none
[Figure: dynamic Bayesian network for the two slots, shown at time t with links into the next time slice t+1. Nodes per slot: goal (g_type, g_food), user act (u_type, u_food), observation at time t (o_type, o_food) and history (h_type, h_food). Annotations: User Behaviour, Recognition/Understanding Errors, Memory.]
J. Williams (2007). "POMDPs for Spoken Dialog Systems." Computer Speech and Language 21(2)
Dialogue Model Parameters
(ignoring history nodes for simplicity)
[Figure: the goal, user act and observation nodes (g_type, g_food, u_type, u_food, o_type, o_food) at time t and time t+1, annotated with two conditional probability tables for the food slot: p(u|g), the user act given the goal, and p(o|u), the observation given the user act. Row and column values: French, Chinese, NoMention/None. Table entries: 0.7, 0.3, 0.8, 0.2, 1.0 and 0.]
Belief Monitoring (Tracking)
[Figure: network at t=1 and t=2 with belief bar charts over g_type (B = bar, R = restaurant) and g_food (F = French, C = Chinese, - = none).]
t=1 observation: inform(food=french) {0.9}
System action: confirm(food=french)
t=2 observation: affirm() {0.9}
Belief Monitoring (Tracking)
[Figure: network at t=1 and t=2 with belief bar charts over g_type (B = bar, R = restaurant) and g_food (F = French, C = Chinese, - = none).]
t=1 observations: inform(type=bar, food=french) {0.6}; inform(type=restaurant, food=french) {0.3}
System action: confirm(type=restaurant, food=french)
t=2 observation: affirm() {0.9}
Belief Monitoring (Tracking)
[Figure: network at t=1 and t=2 with belief bar charts over g_type (B = bar, R = restaurant) and g_food (F = French, C = Chinese, - = none).]
t=1 observation: inform(type=bar) {0.4}
System action: select(type=bar, type=restaurant)
t=2 observation: inform(type=bar) {0.4}
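One way to see how two low-confidence observations such as inform(type=bar) {0.4} can reinforce each other is a single-slot Bayesian update. This is only a sketch under stated assumptions: the function `update_slot`, the uniform prior, the static goal and the observation model (confidence mass on the heard value, the rest spread uniformly) are hypothetical and are not the HIS or BUDS update.

```python
def update_slot(belief, value, confidence):
    """Bayesian update of a single-slot belief after observing `value`.

    Assumed observation model (not from the slides): with probability
    `confidence` the recogniser heard the true goal value; the remaining
    mass is uninformative and spread uniformly over all values.
    """
    n = len(belief)
    unnormalised = {
        g: (confidence * (1.0 if g == value else 0.0) + (1.0 - confidence) / n) * p
        for g, p in belief.items()
    }
    total = sum(unnormalised.values())
    return {g: p / total for g, p in unnormalised.items()}


belief = {"bar": 0.5, "restaurant": 0.5}  # assumed uniform prior
for turn in (1, 2):
    belief = update_slot(belief, "bar", 0.4)  # inform(type=bar) {0.4}
    print(turn, round(belief["bar"], 3))
```

Under these assumptions the belief in type=bar rises from 0.5 to 0.7 after the first turn and to about 0.845 after the second, even though neither observation is individually reliable.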
Choosing the next action – the Policy
[Figure: pipeline from belief to action, following the observation inform(type=bar) {0.4}.]
• Quantize: the belief over g_type (B, R) and g_food (F, C, -) is quantized into binary feature vectors φ1, φ2, φ3 for type and food, e.g. 0 0 0 1 0 1 0 0
• Policy Vector: the features are multiplied by θ to give a distribution over all possible summary actions (inform, select, confirm, etc.):
$$\pi(a \mid b, \theta) = \frac{e^{\theta \cdot \phi_a(b)}}{\sum_{a'} e^{\theta \cdot \phi_{a'}(b)}}$$
• Sample: a = select
• Map: select is mapped to select(type=bar, type=restaurant)
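The softmax policy on this slide can be sketched as follows. The function names `softmax_policy` and `sample_action`, the weight vector and the binary feature vectors are hypothetical values chosen for illustration, not trained parameters from the system.

```python
import math
import random


def softmax_policy(theta, features):
    """Return pi(a | b, theta) for each summary action.

    theta:    list of policy weights
    features: dict action -> feature vector phi_a(b), same length as theta
    """
    scores = {
        a: sum(t * f for t, f in zip(theta, phi)) for a, phi in features.items()
    }
    top = max(scores.values())  # subtract the max score for numerical stability
    exps = {a: math.exp(s - top) for a, s in scores.items()}
    z = sum(exps.values())
    return {a: e / z for a, e in exps.items()}


def sample_action(probs, rng=random):
    """Draw one summary action according to the policy distribution."""
    actions = list(probs)
    return rng.choices(actions, weights=[probs[a] for a in actions], k=1)[0]


# Hypothetical weights and binary features for three summary actions.
theta = [1.0, -0.5, 2.0]
features = {
    "inform": [1, 0, 0],
    "select": [0, 0, 1],
    "confirm": [0, 1, 0],
}
probs = softmax_policy(theta, features)
action = sample_action(probs, random.Random(0))
print(probs, action)
```

Sampling rather than always taking the most probable action keeps the policy stochastic, which is what allows gradient-based optimisation of θ from sampled dialogues.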
Policy Optimization
Policy parameters chosen to maximize expected reward
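$$J(\theta) = E\left[\frac{1}{T} \sum_{t} r(s_t, a_t) \,\middle|\, \pi_\theta\right]$$
Natural gradient ascent works well
[Equation: natural gradient update, with the Fisher Information Matrix term labelled]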
Gradient is estimated by sampling dialogues and in practice
Fisher Information Matrix does not need to be explicitly computed.
This is the Natural Actor Critic Algorithm.
J. Peters and S. Schaal (2008). "Natural Actor-Critic." Neurocomputing 71(7-9)
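For reference, natural gradient ascent is usually written in the following standard form. The step size $\alpha$ and the exact notation are assumptions and may differ from the equation shown on the slide.

$$\tilde{\nabla} J(\theta) = F_\theta^{-1} \nabla_\theta J(\theta), \qquad \theta \leftarrow \theta + \alpha\, \tilde{\nabla} J(\theta)$$

$$F_\theta = E\left[\nabla_\theta \log \pi(a \mid b, \theta)\, \nabla_\theta \log \pi(a \mid b, \theta)^{\top}\right]$$

Here $F_\theta$ is the Fisher Information Matrix of the policy. Premultiplying the ordinary gradient by $F_\theta^{-1}$ makes the update direction independent of how the policy is parameterised.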
Dialogue Model Parameter Optimization
Approximation of belief distribution via feature vectors φ_a
prevents policy differentiation wrt Dialogue Model parameters λ.
However a trick can be used. Assume that λ are drawn from
a prior p(λ; α) which is differentiable wrt α. Then optimize
reward wrt α and sample p(λ; α) to get λ.
This is the Natural Belief Critic Algorithm.
It is also possible to do maximum likelihood model parameter
estimation using Expectation Propagation.
F. Jurcicek (2010). "Natural Belief-Critic" Interspeech 2010
B. Thomson (2010). "Parameter learning for POMDP spoken dialogue models." SLT 2010
Performance Comparison in Simulated TownInfo Domain
[Figure: performance comparison plot for four systems.]
• Handcrafted Model and Handcrafted Policy
• Handcrafted Model and Trained Policy
• Handcrafted Policy and Trained Model
• Trained Model and Trained Policy
Reward = 100 for success, minus 1 for each turn taken
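The reward used in this comparison (100 for success, minus 1 for each turn taken) can be sketched as below. The function name `dialogue_reward` and the three example dialogues are hypothetical and are not results from the talk.

```python
def dialogue_reward(success, num_turns, success_reward=100, turn_penalty=1):
    """Total reward for one dialogue: 100 for success, minus 1 per turn taken."""
    return (success_reward if success else 0) - turn_penalty * num_turns


# Hypothetical simulated dialogues: (task completed?, number of turns)
dialogues = [(True, 6), (True, 9), (False, 12)]
rewards = [dialogue_reward(ok, turns) for ok, turns in dialogues]
average = sum(rewards) / len(rewards)
print(rewards, average)
```

The per-turn penalty means that, between two policies with equal success rates, the one that completes tasks in fewer turns scores higher.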
Scaling up to Real World Problems
Several of the key ideas have already been covered
• compact representation of dialogue state, e.g. HIS, BUDS
• mapping belief states into summary states via quantisation, feature vectors, etc.
• mapping actions in summary space back into full space
But inference itself is also a problem …
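The two mappings listed on this slide, from belief to a summary state and from a summary action back to a full action, can be sketched for a single slot. The functions `summarise` and `to_full_action`, the thresholds and the expansion rules are assumptions made for illustration and are not the mappings used in HIS or BUDS.

```python
def summarise(belief, thresholds=(0.5, 0.8)):
    """Map a slot belief to a small summary: top two values and a quantised level."""
    ranked = sorted(belief.items(), key=lambda kv: kv[1], reverse=True)
    top_value, top_prob = ranked[0]
    # Count how many thresholds the top probability reaches: 0, 1 or 2.
    level = sum(top_prob >= t for t in thresholds)
    second_value = ranked[1][0] if len(ranked) > 1 else None
    return {"top": top_value, "second": second_value, "level": level}


def to_full_action(summary_action, slot, summary):
    """Expand a summary action into a full dialogue act for one slot."""
    if summary_action == "confirm":
        return f"confirm({slot}={summary['top']})"
    if summary_action == "select":
        return f"select({slot}={summary['top']}, {slot}={summary['second']})"
    if summary_action == "inform":
        return f"inform({slot}={summary['top']})"
    raise ValueError(f"unknown summary action: {summary_action}")


belief_type = {"bar": 0.55, "restaurant": 0.45}  # hypothetical belief
summary = summarise(belief_type)
print(summary)
print(to_full_action("select", "type", summary))
```

The policy only has to choose among a handful of summary actions. The slot values are filled in afterwards from the belief, which keeps the action space small as the ontology grows.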
CamInfo Ontology
[Figure: CamInfo ontology tree rooted at entity, then venue.]
Venue classes: placetostay, placetoeat, placetodrink, placetosee, entsvenue, univenue, sportsvenue, transvenue, shopvenue, amenity
Type slots: type, staytype, eattype, drinktype, seetype, entstype, unitype, transtype, shoptype, amtype
Other node labels: name, area, near, location, addr, phone, postcode, rating, reviews, food, price, pricerange, stars, openhours, hasinternet, hasparking, hasmusic, hastv, hasfood, childrenallowed, sport, entertainment, guesthouse, hotel, restaurant, coffeeshop, bar, pub, architecture, museum, park, cinema, theatre, nightclub, boat, concert, college, department, library, airport, busstation, trainstation, supermarket, shoppingcentre, hospital, bank, postoffice, touristinfo
• Many concepts
• Many values per concept
[Figure: excerpt of the ontology (entity, venue, name, type, location, addr, phone, placetoeat, eattype, restaurant, coffeeshop, food, pricerange, openhours, price) with multiple nodes per concept, giving a complex dialogue state.]
Belief Propagation Times
[Figure: time against network branching factor for three algorithms.]
• Standard LBP (loopy belief propagation)
• LBP with Grouping
• LBP with Grouping & Const Prob of Change
B. Thomson (2010). "Bayesian update of dialogue state" Computer Speech and Language 24(4)
Architecture of the Cambridge Statistical SDS
Run-time mode
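[Figure: run-time pipeline.]
• Input path: speech y -> Speech Recognition -> words p(w|y) -> Semantic Decoder -> dialog acts p(v|y) -> Dialogue Manager (HIS or BUDS)
• Output path: Dialogue Manager -> dialog acts a -> Message Generator -> words p(m|a) -> Speech Synthesiser -> speech p(x|a)
• Corpus Data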
Architecture of the Cambridge Statistical SDS
Training mode
[Figure: training loop.]
• User Simulator -> dialog acts -> Error Model -> dialog acts p(v|y) -> Dialogue Manager (HIS or BUDS)
• Dialogue Manager -> dialog acts a -> User Simulator
• Corpus Data
CMU Let’s Go Spoken Dialogue Challenge
• Telephone-based spoken dialog system to provide bus
schedule information for the City of Pittsburgh, PA (USA).
• Based on existing system with real users.
• Two stage evaluation process
1. Control Test with recruited subjects given specific
known tasks
2. Live Test with competing implementations switched
according to a daily schedule
• Full results to be presented at a special session at SLT
Organised by the Dialog Research Center, CMU
See http://www.dialrc.org/sdc/
Let’s Go 2010 Control Test Results
[Figure: scatter plot of predicted success rate against word error rate (WER) for all qualifying systems.]
System     Success   WER
System X   65%       42%
System Y   75%       34%
System Z   89%       33%
Average Success = 64.8%
Average WER = 42.4%
B. Thomson. "Bayesian Update of State for the Let's Go Spoken Dialogue Challenge." SLT 2010.
CamInfo Demo
Conclusions
• End-to-end statistical dialogue systems can be built and are competitive
• Core is a POMDP-based dialogue manager which provides an explicit representation of uncertainty with the following benefits
o robust to recognition errors
o objective measure of goodness via reward function
o ability to optimize performance against objectives
o reduced development costs: no hand-tuning, no complex design processes, easily ported to new applications
o natural dialogue: say anything, any time
• Still much to do
o faster learning, off-policy learning, long term adaptation, dynamic ontologies, multi-modal input/output
• Perhaps talking to machines is within reach ...
Credits
EU FP7 Project: Computational Learning in
Adaptive Systems for Spoken Conversation
Spoken Dialogue Management using Partially
Observable Markov Decision Processes
Past and Present Members of the CUED Dialogue Systems Group
Milica Gasic, Filip Jurcicek, Simon Keizer, Fabrice Lefevre,
Francois Mairesse, Jorge Prombonas, Jost Schatzmann,
Matt Stuttle, Blaise Thomson, Karl Weilhammer, Jason Williams,
Hui Ye, Kai Yu