POSESL T-bot (postech English as a second language
POSTECH Dialog-Based Computer
Assisted Language Learning System
Intelligent Software Lab. POSTECH
Prof. Gary Geunbae Lee
Contents
Introduction
Methods
DB-CALL System
Language Learner Simulation
Example-based Dialog Modeling
Feedback Generation
Translation Assistance
Comprehension Assistance
User Simulation
Grammar Error Simulation
Discussion
RESEARCH BACKGROUND
BACKGROUND
• Globalization makes English more important as a world language
• Extremely high cost of native speaker tutors
• Most language learning software is dedicated to pronunciation practice
• Dialog-based Computer-assisted Language Learning will be an excellent solution
ISSUES
• DB-CALL system should be able to understand student’s poor and non-native expressions
• DB-CALL system should have high domain scalability to support various practical scenarios
• DB-CALL system should provide educational functionalities which help students improve their linguistic ability
PREVIOUS WORKS ON DB-CALL
Let’s Go (CMU, 02-04)
Providing bus schedule information for CMU non-native students
Adapting the acoustic model and language model to non-native speakers
Edit-distance based corrective feedback
PREVIOUS WORKS ON DB-CALL
SPELL (Edinburgh, 05)
Restaurant Domain
Scenario-based virtual space
Incorporating mal-rules into
the ASR grammar
PREVIOUS WORKS ON DB-CALL
DEAL (KTH, 07)
Trade Domain
Finite State Network-based
limited dialog management
When learners get stuck, the system provides hints
POSTECH DB-CALL System
ESL Dialog Tutoring
[Figure: system overview. The ESL Dialog Tutoring window shows alternating Tutor and User turns, takes User Input, and offers "Try this expression" suggestions. A Crawler feeds a Description Extractor, which produces Expression / Description examples (Example 1 Description 1, Example 2 Description 2, ...), and a Parallel Sentence Extractor, which produces Korean EXP / English EXP pairs stored as <parallel> records with <source>, <target>, alignment info (<s2t>, <t2s>, <composition>), and additional info (<url>).]
DB-CALL System
1. Example-based Dialog Modeling
INTRODUCTION
Spoken Dialog System
Applications
Human-Robot Interface, Telematics, Tutoring, ...
PROBLEM & GOAL
PROBLEM
How to determine the next system action
Knowledge-based approach
Plan recipe / ISU rule / Agenda
Data-driven approach
Statistical approach
Supervised Learning based on state approximation
Reinforcement Learning based on MDP/POMDP
Example-based approach
GOAL
To develop a simple and practical approach to dialog modeling
for multi-domain dialog systems
IDEA
Dialog Corpus
Turn #1 (Domain=Building_Guidance)
USER: Where is the meeting room? (회의 실 이 어디 지 ?)
[Dialog Act = WH-QUESTION]
[Main Goal = SEARCH-LOC]
[ROOM-TYPE = meeting room (회의실)]
SYSTEM: The faculty meeting room is on the 3rd floor; the main conference room and the small meeting room are on the 2nd floor. (3층에 교수회의실, 2층에 대회의실, 소회의실이 있습니다.)
[System Action = inform(Floor)]
Indexed by using semantic & discourse features
Dialog Example
Domain = Building_Guidance
Dialog Act = WH-QUESTION
Main Goal = SEARCH-LOC
ROOM-TYPE=1 (filled), ROOM-NAME=0 (unfilled)
LOC-FLOOR=0, PER-NAME=0, PER-TITLE=0
Previous Dialog Act = <s>, Previous Main Goal = <s>
Discourse History Vector = [1,0,0,0,0]
Lexico-semantic Pattern = ROOM_TYPE 이 어디 지 ? ("Where is ROOM_TYPE?")
System Action = inform(Floor)
$e^* = \operatorname*{argmax}_{e_i \in E} S(e_i, h)$
[Figure: Dialog State Space, with the selected example having the similar state]
Lee et al., (2006), A Situation-based Dialogue Management using Dialogue Examples, IEEE ICASSP
ALGORITHM
[Figure: Noisy Input (from ASR/SLU) → Query Generation (with Discourse History) → Example Search (with Relaxation Strategy, over the Example DB) → Example Selection → NLG (with System Template and Content DB)]
Query Generation
Making SQL statement using Discourse History and SLU results.
Example Search
Trying to search semantically close dialog examples in example DB given the current dialog state.
Example Selection
Selecting the best example to maximize the utterance similarity measure based on lexical and discourse information.
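The three steps above (query generation, example search with relaxation, example selection) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the table layout, the constraint order used for relaxation, the `inform(Phone)` action, and the purely lexical similarity are all assumptions made for the example.

```python
import sqlite3
from difflib import SequenceMatcher

# Constraint columns, ordered from most to least important (an assumption).
CONSTRAINTS = ["dialog_act", "main_goal", "room_type", "prev_dialog_act"]

def build_example_db():
    """Create a toy in-memory example DB; the rows are illustrative only."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE examples (dialog_act TEXT, main_goal TEXT, "
        "room_type INTEGER, prev_dialog_act TEXT, pattern TEXT, system_action TEXT)"
    )
    db.executemany(
        "INSERT INTO examples VALUES (?, ?, ?, ?, ?, ?)",
        [
            ("WH-QUESTION", "SEARCH-LOC", 1, "<s>",
             "where is ROOM_TYPE", "inform(Floor)"),
            ("WH-QUESTION", "SEARCH-PHONE", 1, "<s>",
             "what is the phone number of ROOM_TYPE", "inform(Phone)"),
        ],
    )
    return db

def search(db, state, keys):
    """Query generation and example search for one set of constraints."""
    where = " AND ".join(f"{key} = ?" for key in keys)
    sql = f"SELECT pattern, system_action FROM examples WHERE {where}"
    return db.execute(sql, [state[key] for key in keys]).fetchall()

def next_action(db, state, user_pattern):
    """Return (system_action, match_type) via search, relaxation, selection."""
    keys = list(CONSTRAINTS)
    while keys:
        rows = search(db, state, keys)
        if rows:
            # Example selection: only lexical similarity is used in this sketch.
            best = max(
                rows,
                key=lambda row: SequenceMatcher(
                    None, row[0].split(), user_pattern.split()
                ).ratio(),
            )
            match = "exact" if len(keys) == len(CONSTRAINTS) else "partial"
            return best[1], match
        keys.pop()  # relaxation: drop the least important constraint
    return None, "no example"
```

With the state of Turn #1 (WH-QUESTION, SEARCH-LOC, ROOM-TYPE filled, no previous dialog act) the sketch returns `inform(Floor)` as an exact match; changing only the previous dialog act yields the same action as a partial match.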
EXPERIMENTAL RESULTS
Real user evaluation
10 undergraduates
Evaluation Metric
STR (Success Turn Rate): # of successful turns / # of total turns
TCR (Task Completion Rate): # of successful dialogs / # of total dialogs
AvgUserTurn: Average user’s turn length per dialog

System               #Dialogs  AvgUserTurn  STR (%)  TCR (%)
Car Navigation       50        4.54         86.25    92.00
Weather Information  50        4.46         89.01    94.00
EPG                  50        4.50         83.99    90.00
Chatbot              50        5.60         64.31    -
Multi-domain         15        6.08         78.77    86.67
Lee et al., (2009), Example-based Dialog Modeling for Practical Multi-domain Dialog Systems, SPECOM
EXPERIMENTAL RESULTS
System               Exact match  Partial match  No example
Car Navigation       50.22        44.49          5.29
Weather Information  69.49        25.00          5.51
EPG                  58.33        37.22          4.45
Chatbot              50.71        14.29          35.00
Multi-domain         69.23        24.62          6.15
Example match rate (%) of each dialog system
Lee et al., (2009), Example-based Dialog Modeling for Practical Multi-domain Dialog Systems, SPECOM
ROBUST DIALOG MANAGEMENT
PROBLEM
How to overcome errors in the real world
ASR (+ERROR): Noise reduction, Adaptation, N-best & lattice & CN
SLU (+ERROR): Robust parsing, Data-driven approach
DM: Error handling, N-best support
ROBUST DIALOG MANAGEMENT
Error handling
Recovering ASR/SLU errors by interacting with the user at the
conversational level
N-best support
Estimating the current state with uncertainty
Lee et al., (2008), Robust management with n-best hypotheses using dialog examples and agenda, ACL
GOAL & IDEA
To increase the robustness of EBDM with prior
knowledge
1) Error Handling
If the system knows what the user will do next: Dynamic Help Generation
AgendaHelp
S: Next, you can do the subtask 1) Asking the room's role, or 2) Asking the office phone number, or 3) Selecting the desired room for navigation.
UtterHelp
S: Next, you can say 1) “What is it?”, or 2) “What’s the phone number of [ROOM_NAME]?”, or 3) “Let’s go there.”
[Figure: agenda graph with FOCUS NODE LOCATION and NEXT_TASK nodes ROOM ROLE, OFFICE PHONE NUMBER, and GUIDE]
GOAL & IDEA
To increase the robustness of EBDM with prior
knowledge
2) N-best support
If the system knows which subtask will be more probable next: Rescoring N-best hypotheses (h1~hn)
Current state
Subtask: LOCATION
System Utterance: The director’s room is Room No. 201.
System Action: Inform(RoomNumber)

N-best   User Utterances                          Subtask              P(hi|S)
U1 (h1)  What are office rooms in this building?  ROOM NAME            0.2
U2 (h2)  What is the floor?                       FLOOR                0.4
U3 (h3)  Where is it?                             LOCATION             0.3
U4 (h4)  What is the phone number?                OFFICE PHONE NUMBER  0.5 (More probable)
[Figure: agenda graph linking hypotheses h1 to h4 to the subtask nodes ROOM NAME, FLOOR, LOCATION, and OFFICE PHONE NUMBER]
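A minimal sketch of rescoring with the values from the N-best support slide (U1 to U4 with P(hi|S) of 0.2, 0.4, 0.3, 0.5). The slide only gives P(hi|S); the `asr_score` field and the linear interpolation weight `asr_weight` are assumptions added to show how a recognizer score could be mixed in.

```python
from dataclasses import dataclass

@dataclass
class Hypothesis:
    label: str
    utterance: str
    subtask: str
    p_given_state: float    # P(h_i | S), taken from the slide
    asr_score: float = 0.0  # hypothetical recognizer confidence

def rescore(hypotheses, asr_weight=0.0):
    """Sort hypotheses by an interpolation of ASR score and P(h_i | S).

    asr_weight=0.0 ranks purely by the agenda-based probability.
    """
    def score(h):
        return asr_weight * h.asr_score + (1.0 - asr_weight) * h.p_given_state
    return sorted(hypotheses, key=score, reverse=True)

NBEST = [
    Hypothesis("U1", "What are office rooms in this building?", "ROOM NAME", 0.2),
    Hypothesis("U2", "What is the floor?", "FLOOR", 0.4),
    Hypothesis("U3", "Where is it?", "LOCATION", 0.3),
    Hypothesis("U4", "What is the phone number?", "OFFICE PHONE NUMBER", 0.5),
]

best = rescore(NBEST)[0]
print(best.label, best.subtask)
```

Ranking by P(hi|S) alone promotes U4 (OFFICE PHONE NUMBER) to the top, which matches the hypothesis marked "More probable" on the slide.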
ALGORITHM
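[Figure: input from the user passes through ASR (word hypotheses w1 ... wn), SLU (u1 ... un), and EBDM (s1 ... sn). Discourse interpretation over the focus stack and the agenda graph nodes (V1 to V9) selects the argmax node, and the argmax example ej* yields the system action am*.]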
EXPERIMENT SET-UP
Simulated User Evaluation
Test set : 1000 simulated dialogs (<20 user turns)
Domain : Intelligent robot for building guidance
Using 5-best recognition hypotheses
Evaluation Metric
TCR: # of successful dialogs / # of total dialogs
AvgUserTurn: Average user’s turn length per dialog
AvgScore: 20 * TCR + (-1) * AvgUserTurn
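As a worked example of the three metrics, the sketch below computes them from a list of dialog outcomes. It assumes TCR is expressed as a fraction in [0, 1], which is consistent with the score range on the result plots but is not stated on the slide; the sample dialogs are made up.

```python
def evaluate(dialogs):
    """Compute (TCR, AvgUserTurn, AvgScore) from (success, user_turns) pairs.

    TCR is treated as a fraction in [0, 1] (an assumption).
    """
    total = len(dialogs)
    tcr = sum(1 for success, _ in dialogs if success) / total
    avg_user_turn = sum(turns for _, turns in dialogs) / total
    avg_score = 20 * tcr + (-1) * avg_user_turn
    return tcr, avg_user_turn, avg_score

# Four hypothetical simulated dialogs: (task completed?, number of user turns)
sample = [(True, 5), (True, 7), (False, 12), (True, 4)]
print(evaluate(sample))  # (0.75, 7.0, 8.0)
```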
EXPERIMENTAL RESULTS
[Figure: Average Score (y-axis, 3 to 17) versus WER (%) (x-axis, 0 to 50), one curve per method]
Legends
P-E: Using only Examples
P-ER: Using Examples + Recovery
P-EA: Using Examples + Agenda Graph
P-EAR: Using Examples + Agenda Graph + Recovery
The average score of different methods
Lee et al., (2009), Hybrid Approach to Robust Dialog Management using Agenda and Dialog Examples, CSL, (Submitted)
EXPERIMENTAL RESULTS
[Figure: Average Score (y-axis, 2 to 18) versus n-best size (x-axis: 1, 2, 5, 10, 15, 20, 30, 50, 100), one curve per WER setting (WER0, WER10, WER20, WER30, WER40, WER50)]
The average score of the P-EAR system according to n-best size
Lee et al., (2009), Hybrid Approach to Robust Dialog Management using Agenda and Dialog Examples, CSL, (Submitted)
DEMO VIDEO
PC demo
DEMO VIDEO
Robot demo
2. Feedback Generation
INTRODUCTION
Recast Feedback
Tutoring Process
Tutor: What is the purpose of your trip?
User: My purpose business
Tutor: Sorry, I don’t understand. What did you say? (Clarification Request)
Recast Feedback: Try this expression “I am here on business”
User: I am here on business (Learner Uptake)
INTRODUCTION
Expression Suggestion
Tutoring Process
Tutor: What is the purpose of your trip?
(TIMEOUT)
Tutor: Sorry, I can’t hear you.
Expression Suggestion: Try this expression “I am here on business”
User: I am here on business (Learner Uptake)
PROBLEMS
How to recognize user intentions despite numerous errors in their utterances
The mal-rule based technique used in previous studies doesn’t work for low-level learners, whose utterances contain multiple errors
Some utterances even seem to have a meaning that differs from what the learner intended to say
Intended meaning: When does the bus leave?
Learner’s utterance: Which time I have to leave?
How to choose appropriate user intentions to suggest when a timeout expires
The system should take the dialog context into consideration, as human tutors do
Performing intention-based soft pattern matching to generate correct feedback
METHODS
Context-aware & Level-specific Intention Recognition
Intention-based pattern matching
[Figure: the Learner’s Utterance is scored by level-specific Utterance Models (Level 1 ... Level N, each trained on that level's data) and by a Dialog State-based Model built through Example Search over the Dialog State. The Intention Recognizer outputs the Learner’s Intention, which drives the Dialog State Update. The Dialog Manager then performs Pattern Matching against Example Expressions from the Example Expression DB to produce Feedback.]
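A minimal sketch of the pattern-matching step, assuming the learner's intention has already been recognized. The intention label `inform_purpose`, the contents of the expression table, and the word-level `difflib` similarity are assumptions for illustration; the slides do not specify the matching measure.

```python
from difflib import SequenceMatcher

# Hypothetical example-expression DB, keyed by recognized intention.
EXAMPLE_EXPRESSIONS = {
    "inform_purpose": ["I am here on business", "I am here for sightseeing"],
}

def similarity(a, b):
    """Word-level similarity in [0, 1] between two utterances."""
    return SequenceMatcher(None, a.lower().split(), b.lower().split()).ratio()

def generate_feedback(intention, utterance=None):
    """Return (feedback_type, expression) for a recognized intention.

    utterance=None models a timeout, which triggers an expression suggestion.
    """
    candidates = EXAMPLE_EXPRESSIONS[intention]
    if utterance is None:
        return ("expression_suggestion", candidates[0])
    best = max(candidates, key=lambda expr: similarity(utterance, expr))
    if similarity(utterance, best) == 1.0:
        return ("none", None)  # the utterance already matches an example
    return ("recast", best)    # soft match: suggest the closest example

print(generate_feedback("inform_purpose", "My purpose business"))
```

For the learner utterance "My purpose business" the closest stored expression is "I am here on business", so the sketch returns a recast with that expression.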
EXPERIMENT SET-UP
Primitive data set
Immigration domain
192 dialogs, 3517 utterances (18.32 utt/dialog)
Annotation
Manually annotated each utterance with the speaker’s intention and
component slot-values
Automatically annotated each utterance with the discourse
information
EXPERIMENTAL RESULTS
[Figure: comparison of the Utterance Model and the Hybrid Model]
EXPERIMENTAL RESULTS
[Figure: comparison of Level-spec Hybrid, Level-ignore Hybrid, Level-spec Utterance, and Level-ignore Utterance]
EXPERIMENTAL RESULTS
Demo: POSTECH DB-CALL initial version 2008
3. Translation Assistance
Architecture
[Figure: Web → Extraction → Parallel Sentence → Example Analysis → stored examples. A Search Engine answers a Query Expression from the ESL Dialog system / Other Applications through an Interface (function call).]
Example format
<parallel>
<source>~~~~~~~</source>
<target>~~~~~~~~</target>
</parallel>
<Alignment>
<s2t>~~~~~~~~</s2t>
<t2s>~~~~~~~~</t2s>
<composition>~~~~</composition>
</Alignment>
<Additional>
<url>~~~~~~</url>
</Additional>
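To make the example format concrete, here is a small sketch that reads one such record. The slide elides the field contents, so everything inside the tags below is invented for illustration, as are the wrapping `<example>` root element and the `i-j` link notation used for `<s2t>` and `<t2s>`.

```python
import xml.etree.ElementTree as ET

# Hypothetical record; tag names follow the slide, contents are made up.
RECORD = """
<example>
  <parallel>
    <source>저는 출장 왔습니다</source>
    <target>I am here on business</target>
  </parallel>
  <Alignment>
    <s2t>0-0 1-4 2-1</s2t>
    <t2s>0-0 1-2 2-2 3-1 4-1</t2s>
    <composition>sentence</composition>
  </Alignment>
  <Additional>
    <url>http://example.com/page</url>
  </Additional>
</example>
"""

def parse_links(text):
    """Turn 'i-j i-j ...' into a list of (i, j) index pairs (assumed notation)."""
    return [tuple(int(n) for n in link.split("-")) for link in text.split()]

def parse_record(xml_text):
    """Extract the fields of one bilingual example into a dict."""
    root = ET.fromstring(xml_text.strip())
    return {
        "source": root.findtext("parallel/source"),
        "target": root.findtext("parallel/target"),
        "s2t": parse_links(root.findtext("Alignment/s2t")),
        "t2s": parse_links(root.findtext("Alignment/t2s")),
        "composition": root.findtext("Alignment/composition"),
        "url": root.findtext("Additional/url"),
    }

record = parse_record(RECORD)
print(record["target"], record["s2t"])
```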
Building Bilingual Example
Word alignment
Widely used in Statistical Machine Translation
IBM Model 1~5, Symmetrization heuristics
Word alignment presents a correspondence of each
word/phrase in a given bilingual example
Example word alignment ( GIZA++ )
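The slide uses GIZA++ for word alignment. As a rough illustration of what the simplest of the IBM models does, the sketch below trains IBM Model 1 with EM on a toy corpus and aligns each target word to its most probable source word. It omits the NULL word, the higher IBM models, and symmetrization, and the German-English corpus is a textbook-style toy example, not the system's data.

```python
from collections import defaultdict

def train_ibm_model1(pairs, iterations=10):
    """Estimate t[(target_word, source_word)] with EM (no NULL word)."""
    target_vocab = {e for _, target in pairs for e in target}
    t = defaultdict(lambda: 1.0 / len(target_vocab))  # uniform initialization
    for _ in range(iterations):
        count = defaultdict(float)
        total = defaultdict(float)
        # E-step: collect expected alignment counts.
        for source, target in pairs:
            for e in target:
                norm = sum(t[(e, f)] for f in source)
                for f in source:
                    frac = t[(e, f)] / norm
                    count[(e, f)] += frac
                    total[f] += frac
        # M-step: renormalize counts into translation probabilities.
        for (e, f), c in count.items():
            t[(e, f)] = c / total[f]
    return t

def align(source, target, t):
    """Link each target word to its most probable source word."""
    return [(e, max(source, key=lambda f: t[(e, f)])) for e in target]

CORPUS = [
    (["das", "haus"], ["the", "house"]),
    (["das", "buch"], ["the", "book"]),
    (["ein", "buch"], ["a", "book"]),
]

model = train_ibm_model1(CORPUS)
print(align(["das", "haus"], ["the", "house"], model))
```

Running the model in both directions gives the source-to-target and target-to-source alignments that symmetrization heuristics then combine.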
4. Comprehension Assistance
INTRODUCTION
English Expression-Description Example
Suggestion System
When the user asks about an unfamiliar English expression, the system presents its description to help understanding
[Figure: ESL podcast website → Expression detection → Expression-description DB → Description Suggestion System, which recommends a sentence and its description to the Dialog System]
INTRODUCTION
Expression-Description Pair Extraction
System
To present the expression example and its description, the system extracts expression-description pairs from the ESL podcast site
Phrase: routine test
Description: … we mean it's a normal, regular test that the doctor runs many, many different times with different patients, not a special test.
Phrase: Treatment
Description: “Treatment” is another word for what the doctor gives you or does to you to help you.
EXAMPLE
[script]
[description]
EXAMPLE
[script]
[description]
Language Learner Simulation
1. User Simulation
INTRODUCTION
User Simulation For Spoken Dialog System
Developing a ‘simulated user’ who can replace real users
Application
Automated evaluation of Spoken Dialog System
Detecting potential flaws
Predicting overall behaviors of system
Learning dialog strategy in reinforcement learning framework
PROBLEM & GOAL
PROBLEM
How to model real user
User Intention simulation
User Surface simulation
ASR channel simulation
GOAL
Natural Simulation
Diverse Simulation
Controllable Simulation
IDEA – User Intention Simulation
Dialog is a sequence of behaviors, and user intention in particular is sequential
User intention simulation should take various discourse information into account
[Figure: alternating User and Sys turns, with each user turn depending on Discourse Factors + Knowledge + Events]
Jung et al., 2009, Data-driven user simulation for automated evaluation of spoken dialog systems, Computer Speech and Language.
User Intention Simulation
- Linear Conditional Random Field model
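[Figure: linear-chain CRF over turns, with one user intention node (UI) per turn linked to the neighboring UI nodes and to that turn's discourse information node (DI)]
Assumption: A user utterance has only one intention
UI: User Intention State
State=[dialog_act, main_goal, named_entities]
DI: Previous Discourse Information
System Response + Discourse History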
Jung et al., 2009, Data-driven user simulation for automated evaluation of spoken dialog systems,
Computer Speech and Language.
User Surface Simulation
PROBLEM
How to generate a surface utterance that expresses a given user intention
Approach
2-phase user utterance generation
Phase 1: candidate generation
Phase 2: rescoring
[Figure: in phase 1 the User Utterance Model simulates several candidate utterances; in phase 2 rescoring picks the selected utterances among them]
1 phase - Generation
[Figure: a chain of structure tags S1 → S2 → ... → S5 (Structure Tag Transition), conditioned on Dialog_Act × Main_Goal; each tag Si generates a word Wi with an emission probability]
Structure Tags: Component Slot Names + Part of Speech Tags
S: a member of the structure tag space
W: a member of the vocabulary space
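A minimal sketch of phase 1 as an HMM-style sampler: a structure tag sequence is drawn from transition probabilities selected by (dialog act, main goal), and each tag emits a word. The tag names, English vocabulary, and all probabilities are hypothetical toy values, not the model trained in the cited work.

```python
import random

# Hypothetical transition tables, one per (dialog_act, main_goal) pair.
TRANSITIONS = {
    ("WH-QUESTION", "SEARCH-LOC"): {
        "<s>": {"WH": 1.0},
        "WH": {"VERB": 1.0},
        "VERB": {"DET": 1.0},
        "DET": {"ROOM_TYPE": 1.0},
        "ROOM_TYPE": {"</s>": 1.0},
    }
}

# Hypothetical emission probabilities P(word | structure tag).
EMISSIONS = {
    "WH": {"where": 1.0},
    "VERB": {"is": 1.0},
    "DET": {"the": 1.0},
    "ROOM_TYPE": {"meeting room": 0.5, "director's room": 0.5},
}

def draw(dist, rng):
    """Sample one key from a {value: probability} dictionary."""
    return rng.choices(list(dist), weights=list(dist.values()), k=1)[0]

def generate(dialog_act, main_goal, rng, max_len=20):
    """Sample a (tags, words) candidate utterance for the given intention."""
    transitions = TRANSITIONS[(dialog_act, main_goal)]
    tags, words = [], []
    tag = draw(transitions["<s>"], rng)
    while tag != "</s>" and len(tags) < max_len:
        tags.append(tag)
        words.append(draw(EMISSIONS[tag], rng))
        tag = draw(transitions[tag], rng)
    return tags, words

tags, words = generate("WH-QUESTION", "SEARCH-LOC", random.Random(0))
print(tags, " ".join(words))
```

Calling `generate` repeatedly yields the pool of candidates that phase 2 rescoring would then rank.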
2 phase - Rescoring
PROBLEM
Rescoring and selecting the good utterances
Criteria
Human-like utterance
Natural word transition
APPROACH
Structure and Word interpolated BLEU score
SWB score
Notice that evaluating system-generated utterances in utterance simulation is essentially the same task as evaluating machine translation output
SWB = β * Structure_Sequence_BLEU + (1 - β) * Word_Sequence_BLEU,
where 0 ≤ β ≤ 1
We set β to 0.2 because Korean is an agglutinative language and is therefore relatively free in its structural grammar.
Jung et al., 2009, Data-driven user simulation for automated evaluation of spoken dialog systems,
Computer Speech and Language.
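The SWB interpolation can be sketched as below. The `bleu` helper is a simplified sentence-level BLEU (modified n-gram precision up to bigrams, single reference, brevity penalty, no smoothing) written for this illustration; the cited work may compute BLEU differently. Only the interpolation formula and β = 0.2 come from the slide.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, reference, max_n=2):
    """Simplified single-reference BLEU; returns 0.0 if any precision is zero."""
    if not candidate:
        return 0.0
    log_precision = 0.0
    for n in range(1, max_n + 1):
        cand, ref = ngrams(candidate, n), ngrams(reference, n)
        overlap = sum(min(count, ref[gram]) for gram, count in cand.items())
        total = sum(cand.values())
        if overlap == 0 or total == 0:
            return 0.0
        log_precision += math.log(overlap / total) / max_n
    if len(candidate) >= len(reference):
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1 - len(reference) / len(candidate))
    return brevity_penalty * math.exp(log_precision)

def swb(cand_tags, ref_tags, cand_words, ref_words, beta=0.2):
    """SWB = beta * structure-sequence BLEU + (1 - beta) * word-sequence BLEU."""
    return beta * bleu(cand_tags, ref_tags) + (1 - beta) * bleu(cand_words, ref_words)

tags = ["WH", "VERB", "DET", "ROOM_TYPE"]
print(swb(tags, tags, ["where", "is", "the", "lab"], ["where", "is", "the", "office"]))
```

With β = 0.2 the word sequence dominates the score, so a candidate with the right structure but different words is penalized more than the reverse.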
ASR Channel Simulation
PROBLEM
How to simulate ASR channel
Knowledge-based approach
Statistical Approach
It is difficult to collect ‘speech’ data for target domain.
WER controllable simulation
APPROACH
Linguistic Knowledge based simulation
Step 1 : Determining error position
Step 2 : Generating Error types on error marked words
Step 3 : Generating ASR Errors (Substitution, Deletion, Insertion)
Step 4 : Rescoring and selecting erroneous utterance
Jung et al., 2009, Data-driven user simulation for automated evaluation of spoken dialog systems,
Computer Speech and Language.
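Steps 1 to 3 can be sketched as a small channel simulator. This is an illustration under several assumptions: each word is marked as an error independently with probability `wer`, the error type weights are made-up values, substitutions are drawn at random from a vocabulary rather than by similarity, and Step 4 (rescoring) is omitted.

```python
import random

# Hypothetical error type distribution.
DEFAULT_TYPE_WEIGHTS = {"substitution": 0.6, "deletion": 0.2, "insertion": 0.2}

def simulate_asr_errors(words, wer, vocabulary, rng, type_weights=None):
    """Return a noisy copy of `words` with a target error rate of `wer`."""
    weights = type_weights or DEFAULT_TYPE_WEIGHTS
    labels, probs = list(weights), list(weights.values())
    output = []
    for word in words:
        # Step 1: decide whether this position is an error position.
        if rng.random() >= wer:
            output.append(word)
            continue
        # Step 2: choose the error type for the marked word.
        error = rng.choices(labels, weights=probs, k=1)[0]
        # Step 3: realize the error.
        if error == "deletion":
            continue
        if error == "insertion":
            output.append(rng.choice(vocabulary))  # random word before the mark
            output.append(word)
        else:  # substitution with a different vocabulary word, if one exists
            others = [v for v in vocabulary if v != word]
            output.append(rng.choice(others) if others else word)
    return output

rng = random.Random(0)
vocab = ["where", "is", "the", "station", "nearest", "there"]
print(simulate_asr_errors(["where", "is", "the", "station"], 0.3, vocab, rng))
```

Because positions are sampled independently, the realized error rate of a single utterance only approximates the requested `wer`; it matches on average over many samples.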
Error Type Distribution
Determining Error types
Based on the results of English Speech Recognition
We assume that Korean speech recognition has similar error
distribution generally.
Greenberg et al., 2000
Error Generation
Insertion error: Insert a random word before the ‘insertion error mark’
Deletion error: Just delete it
Substitution Error
Based on Sequence Alignment Algorithm
Syllable- and Phone-based Alignment
Selecting some candidates in a dictionary
Dynamic programming (global) alignment algorithm:
Needleman and Wunsch (1970)
Get the similarity score
Similarity = α * Syllable_Alignment_Score + (1 - α) * Phoneme_Alignment_Score,
where 0 ≤ α ≤ 1
Vowel Confusion Matrix example
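The similarity score for substitution candidates can be sketched with a plain Needleman-Wunsch alignment. The flat match, mismatch, and gap scores are assumptions; a confusion matrix like the one shown on the slide could replace the flat mismatch score. The romanized syllable and phoneme sequences are made-up inputs.

```python
def needleman_wunsch(a, b, match=1.0, mismatch=-1.0, gap=-1.0):
    """Global alignment score of sequences a and b (Needleman-Wunsch)."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            diagonal = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diagonal, score[i - 1][j] + gap, score[i][j - 1] + gap)
    return score[-1][-1]

def similarity(syllables_a, syllables_b, phonemes_a, phonemes_b, alpha=0.5):
    """alpha * syllable alignment score + (1 - alpha) * phoneme alignment score."""
    return (alpha * needleman_wunsch(syllables_a, syllables_b)
            + (1 - alpha) * needleman_wunsch(phonemes_a, phonemes_b))

# Hypothetical romanized word pair differing in the final consonant.
print(similarity(["seo", "ul"], ["seo", "un"],
                 ["s", "eo", "u", "l"], ["s", "eo", "u", "n"]))  # 1.0
```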
EXPERIMENT SET-UP
Korean Car navigation Dialog system
SLU : Jeong and Lee (2006)
DM : Lee et al. (2009)
Word Error Rate : 0.0 ~ 0.4
5000 dialog samples at each WER setting
[Result figures: Intention, Utterance, ASR channel, Overall prediction]
D-BLEU (Discourse BLEU) is a metric for measuring the naturalness of simulated dialogs in terms of n-gram precision, based on the BLEU metric calculation.
Jung et al., 2009, Data-driven user simulation for automated evaluation of spoken dialog systems, Computer Speech and Language.
2. Grammar Error Simulation
INTRODUCTION
Language learner simulation requires us to invent
grammar error simulation on top of the general user
simulation
[Figure: the Language Learner Simulator (User Intention Simulator, User Utterance Simulator, Grammar Errors Simulator, ASR Errors Simulator) interacting with the Dialog System (Non-native ASR, SLU, Dialog Manager, System Utterance Generator, TTS)]
REALISTIC ERROR
Original: He wants to go to a movie theater
"He wants to to a movie theater" VS. "He want go to movie theater"
PROBLEMS
How to incorporate expert knowledge about error
characteristics of Korean language learners into the
statistical model
Subject-verb agreement errors
Omission errors of the preposition of prepositional verbs
Omission errors of articles
Etc.
MARKOV LOGIC NETWORK
Sungjin Lee, Gary Geunbae Lee. Realistic grammar error simulation using markov logic. ACL 2009
METHOD
The generation procedure involves three steps:
Step 1 (Inference): Generating probability over error types for each word through MLN inference
Step 2 (Sampling): Determining an error type by sampling the generated probability for each word
Step 3 (Realization): Creating an ill-formed output sentence by realizing the chosen error types

Word     v_agr_sub  prp_lex_del  at_del  none   Sampling     Realization
He       0.000      0.000        0.000   0.921  none         He
wants    0.371      0.000        0.000   0.449  v_agr_sub    want
to       0.000      0.284        0.000   0.604  prp_lex_del  (deleted)
go       0.000      0.000        0.000   0.866  none         go
to       0.000      0.269        0.000   0.605  none         to
a        0.000      0.000        0.355   0.506  at_del       (deleted)
movie    0.000      0.000        0.000   0.781  none         movie
theater  0.000      0.000        0.000   0.798  none         theater
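Steps 2 and 3 can be sketched with the probabilities from the example (Step 1, MLN inference, is taken as given). Two simplifications are assumptions: sampling is done only over the four error types shown, with the listed weights renormalized implicitly, and the agreement error is realized by stripping a final "s" from the verb.

```python
import random

WORDS = ["He", "wants", "to", "go", "to", "a", "movie", "theater"]

# Per-word probabilities over the error types shown in the example.
ERROR_PROBS = [
    {"v_agr_sub": 0.000, "prp_lex_del": 0.000, "at_del": 0.000, "none": 0.921},
    {"v_agr_sub": 0.371, "prp_lex_del": 0.000, "at_del": 0.000, "none": 0.449},
    {"v_agr_sub": 0.000, "prp_lex_del": 0.284, "at_del": 0.000, "none": 0.604},
    {"v_agr_sub": 0.000, "prp_lex_del": 0.000, "at_del": 0.000, "none": 0.866},
    {"v_agr_sub": 0.000, "prp_lex_del": 0.269, "at_del": 0.000, "none": 0.605},
    {"v_agr_sub": 0.000, "prp_lex_del": 0.000, "at_del": 0.355, "none": 0.506},
    {"v_agr_sub": 0.000, "prp_lex_del": 0.000, "at_del": 0.000, "none": 0.781},
    {"v_agr_sub": 0.000, "prp_lex_del": 0.000, "at_del": 0.000, "none": 0.798},
]

def sample_error_types(probs, rng):
    """Step 2: draw one error type per word from its distribution."""
    return [
        rng.choices(list(dist), weights=list(dist.values()), k=1)[0]
        for dist in probs
    ]

def realize(words, error_types):
    """Step 3: apply the chosen error types to produce the output sentence."""
    output = []
    for word, error in zip(words, error_types):
        if error in ("prp_lex_del", "at_del"):
            continue  # omit the preposition or article
        if error == "v_agr_sub" and word.endswith("s"):
            word = word[:-1]  # toy realization of a subject-verb agreement error
        output.append(word)
    return " ".join(output)

# The error types sampled in the example:
slide_types = ["none", "v_agr_sub", "prp_lex_del", "none",
               "none", "at_del", "none", "none"]
print(realize(WORDS, slide_types))  # He want go to movie theater
print(realize(WORDS, sample_error_types(ERROR_PROBS, random.Random(0))))
```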
EXPERIMENT SET-UP
Data Sets
NICT JLE Corpus
Dividing the 167 error annotated files into 3 level groups:
Beginner(1-4) : 2,905
Intermediate(5-6) : 3,296
Advanced(7-9) : 2,752
Evaluation
10-fold cross validations performed for each group
The validation results were added together across the rounds
EXPERIMENTAL RESULTS
Advanced
DKL(Real || Proposed)=0.068 vs. DKL(Real || Baseline)=0.122
EXPERIMENTAL RESULTS
Intermediate
DKL(Real || Proposed)=0.075 vs. DKL(Real || Baseline)=0.142
EXPERIMENTAL RESULTS
Beginner
DKL(Real || Proposed)=0.075 vs. DKL(Real || Baseline)=0.092
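The D_KL values above compare the error type distribution of the real data with that of each simulator. A minimal sketch of the computation, using the natural logarithm and made-up distributions (the slides do not give the underlying distributions or the log base):

```python
import math

def kl_divergence(p, q):
    """D_KL(P || Q) = sum_i p_i * log(p_i / q_i), skipping terms with p_i = 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical distributions over three error types.
real = [0.5, 0.3, 0.2]
simulated = [0.4, 0.4, 0.2]
print(round(kl_divergence(real, simulated), 4))
```

A lower value means the simulated error distribution is closer to the real one, which is how the Proposed and Baseline columns are compared.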
EXPERIMENTAL RESULTS
Human Judgment
Evaluated 100 randomly chosen sentences consisting of 50 sentences each from the real and simulated data
The sequence of the test sentences was mixed so that the human judges did not know whether the source of the sentence was real or simulated
Two-level scale (0: Unrealistic, 1: Realistic)
Sungjin Lee, Gary Geunbae Lee. Realistic grammar error simulation using markov logic. ACL 2009
Q&A