POSESL T-bot (POSTECH English as a Second Language)


POSTECH Dialog-Based Computer-Assisted Language Learning System
Intelligent Software Lab. POSTECH
Prof. Gary Geunbae Lee
Contents
• Introduction
• Methods
  - DB-CALL System
    • Example-based Dialog Modeling
    • Feedback Generation
    • Translation Assistance
    • Comprehension Assistance
  - Language Learner Simulation
    • User Simulation
    • Grammar Error Simulation
• Discussion
RESEARCH BACKGROUND
BACKGROUND
• Globalization makes English ever more important as a world language
• Native-speaker tutors are extremely expensive
• Most language learning software is dedicated to pronunciation practice
• Dialog-based Computer-Assisted Language Learning (DB-CALL) can be an excellent solution
ISSUES
• A DB-CALL system should be able to understand students' poor and non-native expressions
• A DB-CALL system should have high domain scalability to support various practical scenarios
• A DB-CALL system should provide educational functionality that helps students improve their linguistic ability
PREVIOUS WORKS ON DB-CALL
• Let's Go (CMU, 02-04)
  - Providing bus schedule information for non-native CMU students
  - Adapting the acoustic model and language model to non-native speakers
  - Edit-distance based corrective feedback
PREVIOUS WORKS ON DB-CALL
• SPELL (Edinburgh, 05)
  - Restaurant domain
  - Scenario-based virtual space
  - Incorporating mal-rules into the ASR grammar
PREVIOUS WORKS ON DB-CALL
• DEAL (KTH, 07)
  - Trade domain
  - Finite-state network-based limited dialog management
  - When learners get stuck, the system provides hints
POSTECH DB-CALL System
ESL Dialog Tutoring
[Architecture figure: the ESL dialog tutor converses with the user turn by turn and, when needed, suggests an expression ("Try this expression") together with its description. A web crawler feeds a Description Extractor, which builds expression-description examples, and a Parallel Sentence Extractor, which builds Korean-English parallel sentence entries stored with alignment information (s2t, t2s, composition) and source URLs.]
DB-CALL System
1. Example-based Dialog Modeling
INTRODUCTION

Spoken Dialog System

Applications

Human-Robot Interface, Telematics, Tutoring, ...
PROBLEM & GOAL
• PROBLEM: how to determine the next system action
  - Knowledge-based approach: plan recipes / ISU rules / agendas
  - Data-driven approach
    • Statistical approach: supervised learning based on state approximation; reinforcement learning based on MDP/POMDP
    • Example-based approach
• GOAL: to develop a simple and practical approach to dialog modeling for multi-domain dialog systems
IDEA
Turn #1 (Domain = Building_Guidance), from the dialog corpus:
  USER: Where is the meeting room?
    [Dialog Act = WH-QUESTION] [Main Goal = SEARCH-LOC] [ROOM-TYPE = meeting room]
  SYSTEM: The faculty meeting room is on the 3rd floor; the large and small meeting rooms are on the 2nd floor.
    [System Action = inform(Floor)]
Each turn is indexed as a dialog example using semantic and discourse features:
  Domain = Building_Guidance
  Dialog Act = WH-QUESTION, Main Goal = SEARCH-LOC
  ROOM-TYPE = 1 (filled), ROOM-NAME = 0 (unfilled)
  LOC-FLOOR = 0, PER-NAME = 0, PER-TITLE = 0
  Previous Dialog Act = <s>, Previous Main Goal = <s>
  Discourse History Vector = [1,0,0,0,0]
  Lexico-semantic Pattern = "ROOM_TYPE 이 어디 지?" (literally, "Where is [ROOM_TYPE]?")
  System Action = inform(Floor)
The next system action is chosen by retrieving, from the dialog state space, the example whose state is most similar to the current dialog history h:
  e* = argmax_{e_i ∈ E} S(e_i, h)
Lee et al., (2006), A Situation-based Dialogue Management using Dialogue Examples, IEEE ICASSP
ALGORITHM
[Flow: noisy input from ASR/SLU + discourse history → Query Generation → Example Search over the Example DB (with a relaxation strategy) → Example Selection → NLG using system templates and the content DB]
• Query Generation: build an SQL query from the discourse history and the SLU results.
• Example Search: search the example DB for dialog examples that are semantically close to the current dialog state.
• Example Selection: select the best example by maximizing an utterance similarity measure based on lexical and discourse information. A minimal sketch follows.
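Below is a minimal, hypothetical sketch of the example-based selection step, not the authors' implementation: dialog examples are indexed by semantic and discourse features, and the next system action comes from the stored example whose state best matches the current one. The feature names, the similarity measure, and the relaxation order are illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass
class DialogExample:
    features: dict       # e.g. {"dialog_act": "WH-QUESTION", "main_goal": "SEARCH-LOC", "room_type": 1}
    system_action: str   # e.g. "inform(Floor)"

def state_similarity(query: dict, example: DialogExample) -> float:
    """Fraction of semantic/discourse features that agree (a stand-in for S(e_i, h))."""
    keys = set(query) | set(example.features)
    if not keys:
        return 0.0
    return sum(query.get(k) == example.features.get(k) for k in keys) / len(keys)

def select_example(query, example_db, relax_order=("room_name", "prev_dialog_act")):
    """e* = argmax_{e_i in E} S(e_i, h); drop constraints one by one if nothing matches well."""
    query = dict(query)
    for step in range(len(relax_order) + 1):
        best = max(example_db, key=lambda e: state_similarity(query, e), default=None)
        if best is not None and state_similarity(query, best) > 0.5:
            return best
        if step < len(relax_order):
            query.pop(relax_order[step], None)   # relaxation strategy: remove one constraint
    return None                                  # fall back to an error-handling action
```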
EXPERIMENTAL RESULTS
• Real user evaluation: 10 undergraduates
• Evaluation metrics
  - STR (Success Turn Rate) = # of successful turns / # of total turns
  - TCR (Task Completion Rate) = # of successful dialogs / # of total dialogs
  - AvgUserTurn = average number of user turns per dialog
System                #Dialogs   AvgUserTurn   STR (%)   TCR (%)
Car Navigation           50         4.54        86.25     92.00
Weather Information      50         4.46        89.01     94.00
EPG                      50         4.50        83.99     90.00
Chatbot                  50         5.60        64.31       -
Multi-domain             15         6.08        78.77     86.67
Lee et al. (2009), Example-based Dialog Modeling for Practical Multi-domain Dialog Systems, SPECOM
EXPERIMENTAL RESULTS
System                Exact match (%)   Partial match (%)   No example (%)
Car Navigation             50.22             44.49               5.29
Weather Information        69.49             25.00               5.51
EPG                        58.33             37.22               4.45
Chatbot                    50.71             14.29              35.00
Multi-domain               69.23             24.62               6.15
Example match rate of each dialog system
Lee et al. (2009), Example-based Dialog Modeling for Practical Multi-domain Dialog Systems, SPECOM
ROBUST DIALOG MANAGEMENT
• PROBLEM: how to overcome errors in the real world
[Pipeline figure: speech → ASR (+errors) → SLU (+errors) → DM. Countermeasures at each stage: ASR: noise reduction, adaptation, n-best / lattice / confusion network output; SLU: robust parsing, data-driven approaches; DM: error handling, n-best support]
ROBUST DIALOG MANAGEMENT
• Error handling: recovering ASR/SLU errors by interacting with the user at the conversational level
• N-best support: estimating the current state under uncertainty
Lee et al. (2008), Robust Dialog Management with N-best Hypotheses Using Dialog Examples and Agenda, ACL
GOAL & IDEA
• To increase the robustness of EBDM with prior knowledge
• 1) Error handling: if the system knows what the user will do next, it can generate dynamic help.
  - AgendaHelp. S: "Next, you can do the subtask 1) asking the room's role, or 2) asking the office phone number, or 3) selecting the desired room for navigation."
  - UtterHelp. S: "Next, you can say 1) 'What is it?', or 2) 'What's the phone number of [ROOM_NAME]?', or 3) 'Let's go there.'"
[Agenda graph fragment: focus node LOCATION with successor subtasks ROOM ROLE, OFFICE PHONE NUMBER, and GUIDE (NEXT_TASK)]
GOAL & IDEA
• To increase the robustness of EBDM with prior knowledge
• 2) N-best support: if the system knows which subtask is more probable next, it can rescore the N-best hypotheses (h1...hn). A minimal sketch follows the table.
  - Current subtask: LOCATION
  - System utterance: "The director's room is Room No. 201."  System action: Inform(RoomNumber)

  N-best user utterance hypotheses                   Subtask               P(hi|S)
  U1 (h1)  What are office rooms in this building?   ROOM NAME               0.2
  U2 (h2)  What is the floor?                        FLOOR                   0.4
  U3 (h3)  Where is it?                              LOCATION                0.3
  U4 (h4)  What is the phone number?                 OFFICE PHONE NUMBER     0.5
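A small hedged sketch of the rescoring idea, not the authors' code: hypotheses whose predicted subtask is expected after the current focus node of the agenda graph are boosted. The interpolation weight and the agenda probabilities below are illustrative assumptions.

```python
def rescore_nbest(hypotheses, current_node, agenda, asr_weight=0.5):
    """
    hypotheses: list of (utterance, subtask, asr_score)
    agenda: dict mapping a focus node to {next_subtask: P(subtask | node)}
    Returns hypotheses sorted by the interpolated score (best first).
    """
    expected = agenda.get(current_node, {})
    def score(h):
        _, subtask, asr_score = h
        return asr_weight * asr_score + (1 - asr_weight) * expected.get(subtask, 0.0)
    return sorted(hypotheses, key=score, reverse=True)

# Example with the slide's LOCATION focus node (all numbers are illustrative):
agenda = {"LOCATION": {"ROOM_ROLE": 0.3, "OFFICE_PHONE_NUMBER": 0.5, "GUIDE": 0.2}}
nbest = [("What are office rooms in this building?", "ROOM_NAME", 0.40),
         ("What is the floor?", "FLOOR", 0.35),
         ("Where is it?", "LOCATION", 0.15),
         ("What is the phone number?", "OFFICE_PHONE_NUMBER", 0.10)]
print(rescore_nbest(nbest, "LOCATION", agenda)[0])   # the phone-number hypothesis is boosted
```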
ALGORITHM
[Architecture figure: the user's speech goes through ASR (word hypotheses w1..wn) and SLU (semantic hypotheses u1..un) into the EBDM. Discourse interpretation over a focus stack maps the hypotheses onto nodes of the agenda graph (V1..V9); the best example e_j* and the best node/action a_m* are then chosen by argmax over the rescored hypotheses.]
EXPERIMENT SET-UP
• Simulated user evaluation
  - Test set: 1000 simulated dialogs (< 20 user turns each)
  - Domain: intelligent robot for building guidance
  - Using 5-best recognition hypotheses
• Evaluation metrics
  - TCR = # of successful dialogs / # of total dialogs
  - AvgUserTurn = average number of user turns per dialog
  - AvgScore = 20 * TCR - AvgUserTurn
EXPERIMENTAL RESULTS
[Figure: average score of each method as a function of WER (0-50%). Legend:
  P-E    using only examples
  P-ER   using examples + recovery
  P-EA   using examples + agenda graph
  P-EAR  using examples + agenda graph + recovery]
The average score of the different methods
Lee et al., (2009), Hybrid Approach to Robust Dialog Management using Agenda and Dialog Examples, CSL, (Submitted)
EXPERIMENTAL RESULTS
[Figure: average score of the P-EAR system as the n-best size varies (1-100), plotted for WER 0-50%]
The average score of the P-EAR system according to n-best size
Lee et al., (2009), Hybrid Approach to Robust Dialog Management using Agenda and Dialog Examples, CSL, (Submitted)
DEMO VIDEO
• PC demo
• Robot demo
2. Feedback Generation
INTRODUCTION
Recast Feedback (tutoring process)
  Tutor: What is the purpose of your trip?
  User:  My purpose business
  Tutor: Sorry, I don't understand. What did you say?   (clarification request)
         Try this expression: "I am here on business"   (recast feedback)
  User:  I am here on business                          (learner uptake)
INTRODUCTION
Expression Suggestion (tutoring process)
  Tutor: What is the purpose of your trip?
  (TIMEOUT: the user does not respond)
  Tutor: Sorry, I can't hear you.
         Try this expression: "I am here on business"   (expression suggestion)
  User:  I am here on business                          (learner uptake)
PROBLEMS
• How to recognize user intentions despite numerous errors in the learners' utterances
  - The mal-rule based technique used in previous studies does not work for low-level learners because of multiple errors
  - Some utterances even seem to have a meaning that differs from what the learner intended to say
    • Intended meaning: "When does the bus leave?"
    • Learner's utterance: "Which time I have to leave?"
• How to choose appropriate user intentions to suggest when a timeout expires
  - The system should take the dialog context into consideration, as human tutors do
METHODS
• Context-aware & level-specific intention recognition
• Intention-based soft pattern matching to generate correct feedback (sketched after the architecture below)
[Architecture figure: the learner's utterance is scored by level-specific utterance models (trained on level 1..N data) and by a dialog-state-based model obtained via example search; the intention recognizer combines them to produce the learner's intention, which updates the dialog state in the dialog manager. Example expressions retrieved from the example expression DB are then pattern-matched against the utterance to generate feedback.]
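A hedged sketch of the two method components under simplified assumptions: the interpolation weight, the model interfaces, and the string-similarity matcher below are illustrative, not the paper's exact formulation.

```python
import difflib

def recognize_intention(utterance, level, dialog_state,
                        utterance_models, state_model, lam=0.6):
    """P(intention) ~ lam * P(intention | utterance, level) + (1 - lam) * P(intention | state)."""
    p_utt = utterance_models[level](utterance)     # dict: intention -> prob, level-specific model
    p_state = state_model(dialog_state)            # dict: intention -> prob, dialog-state-based model
    intents = set(p_utt) | set(p_state)
    scores = {i: lam * p_utt.get(i, 0.0) + (1 - lam) * p_state.get(i, 0.0) for i in intents}
    return max(scores, key=scores.get)

def recast_feedback(utterance, intention, example_db, threshold=0.8):
    """Suggest the closest example expression when the utterance deviates from it."""
    examples = example_db.get(intention, [])
    def sim(e):
        return difflib.SequenceMatcher(None, utterance.lower(), e.lower()).ratio()
    best = max(examples, key=sim, default=None)
    if best is not None and sim(best) < threshold:
        return f'Try this expression: "{best}"'
    return None
```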
EXPERIMENT SET-UP
• Primitive data set
  - Immigration domain
  - 192 dialogs, 3,517 utterances (18.32 utterances/dialog)
• Annotation
  - Each utterance manually annotated with the speaker's intention and component slot-values
  - Each utterance automatically annotated with discourse information
EXPERIMENTAL RESULTS
[Figure: intention recognition accuracy of the utterance model vs. the hybrid model]
EXPERIMENTAL RESULTS
[Figure: accuracy of level-specific vs. level-ignoring variants of the hybrid and utterance models]
EXPERIMENTAL RESULTS
• Demo: POSTECH DB-CALL initial version (2008)
3. Translation Assistance
Architecture
• Web extraction → parallel sentence example analysis → search engine
• An ESL dialog system (or other application) queries the search engine with an expression through a function-call interface
Example format:
  <parallel>
    <source>~~~</source>
    <target>~~~</target>
  </parallel>
  <Alignment>
    <s2t>~~~</s2t>
    <t2s>~~~</t2s>
    <composition>~~~</composition>
  </Alignment>
  <Additional>
    <url>~~~</url>
  </Additional>
Building Bilingual Examples
• Word alignment
  - Widely used in statistical machine translation (IBM Models 1-5, symmetrization heuristics)
  - A word alignment gives the correspondence between the words/phrases of a bilingual example
• Example word alignment (GIZA++); a minimal Model 1 sketch follows
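A minimal IBM Model 1 EM sketch to illustrate the word-alignment idea; this is not the GIZA++ implementation referenced above, and the toy sentence pairs are invented.

```python
from collections import defaultdict

def ibm_model1(bitext, iterations=10):
    """bitext: list of (source_words, target_words) pairs; returns t[(tgt, src)] = P(tgt | src)."""
    tgt_vocab = {w for _, tgt in bitext for w in tgt}
    t = defaultdict(lambda: 1.0 / len(tgt_vocab))       # uniform initialization
    for _ in range(iterations):
        count = defaultdict(float)                       # expected co-occurrence counts (E-step)
        total = defaultdict(float)
        for src, tgt in bitext:
            for tw in tgt:
                norm = sum(t[(tw, sw)] for sw in src)    # normalize over possible alignments
                for sw in src:
                    c = t[(tw, sw)] / norm
                    count[(tw, sw)] += c
                    total[sw] += c
        for (tw, sw), c in count.items():                # M-step: re-estimate translation table
            t[(tw, sw)] = c / total[sw]
    return t

# Toy usage with two invented Korean-English pairs:
pairs = [("나는 출장 왔다".split(), "I am here on business".split()),
         ("나는 학생 이다".split(), "I am a student".split())]
t = ibm_model1(pairs)   # t[("business", "출장")] etc. give word-level correspondences
```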
4. Comprehension Assistance
INTRODUCTION
• English Expression-Description Example Suggestion System
  - When the user asks about an unfamiliar English expression, the system presents its description to help understanding (see the sketch below)
[Flow: ESL podcast website → expression detection → expression-description DB; the dialog system sends a sentence to the Description Suggestion System, which recommends descriptions for the detected expressions]
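A small hedged sketch of the suggestion step; the DB layout and the simple substring-based detection are simplifying assumptions, and the two entries come from the extraction example on the next slide.

```python
def suggest_descriptions(sentence, expression_db):
    """expression_db: dict mapping an expression (phrase) to its description."""
    found = {}
    lowered = sentence.lower()
    for expression, description in expression_db.items():
        if expression.lower() in lowered:        # naive detection of a known expression
            found[expression] = description
    return found

db = {"routine test": "a normal, regular test that the doctor runs many times with different patients",
      "treatment": "another word for what the doctor gives you or does to you to help you"}
print(suggest_descriptions("The doctor said it was just a routine test.", db))
```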
INTRODUCTION
• Expression-Description Pair Extraction System
  - To present an expression example and its description, the system extracts expression-description pairs from the ESL Podcast site

  Phrase         Description
  routine test   ... we mean it's a normal, regular test that the doctor runs many, many different times with different patients, not a special test.
  treatment      "Treatment" is another word for what the doctor gives you or does to you to help you.
EXAMPLE
[Screenshots: an ESL Podcast script and the description extracted from it]
Language Learner Simulation
1. User Simulation
INTRODUCTION
• User simulation for spoken dialog systems
  - Developing a 'simulated user' that can replace real users
• Applications
  - Automated evaluation of spoken dialog systems: detecting potential flaws, predicting the overall behavior of the system
  - Learning dialog strategies in a reinforcement learning framework
PROBLEM & GOAL
• PROBLEM: how to model a real user
  - User intention simulation
  - User surface simulation
  - ASR channel simulation
• GOAL
  - Natural simulation
  - Diverse simulation
  - Controllable simulation
IDEA - User Intention Simulation
• Dialog is sequential behavior; this is especially true of user intention
• User intention simulation should therefore take various kinds of discourse information into account
[Figure: alternating User/System turns driven by discourse factors + knowledge + events]
Jung et al. (2009), Data-driven user simulation for automated evaluation of spoken dialog systems, Computer Speech and Language
User Intention Simulation - Linear-Chain Conditional Random Field Model
[Figure: a linear-chain CRF over turns; each turn's user intention state UI is conditioned on the previous discourse information DI]
• Assumption: a user utterance has only one intention
• UI: user intention state = [dialog_act, main_goal, named_entities]
• DI: previous discourse information = system response + discourse history
Jung et al. (2009), Data-driven user simulation for automated evaluation of spoken dialog systems, Computer Speech and Language
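A hedged sketch of CRF-based intention simulation using the sklearn-crfsuite package; the paper trains its own linear-chain CRF, and the feature set, labels, and sampling step here are simplified assumptions.

```python
import random
import sklearn_crfsuite

def turn_features(prev_system_action, discourse_history):
    """DI features for one turn (simplified)."""
    feats = {"prev_system_action": prev_system_action}
    feats.update({f"filled_{slot}": str(v) for slot, v in discourse_history.items()})
    return feats

# X: list of dialogs, each a list of feature dicts; y: list of intention-label sequences,
# e.g. y[0] = ["WH-QUESTION#SEARCH-LOC", "REQUEST#SEARCH-PHONE", ...]
def train(X, y):
    crf = sklearn_crfsuite.CRF(algorithm="lbfgs", c1=0.1, c2=0.1, max_iterations=100)
    crf.fit(X, y)
    return crf

def simulate_intention(crf, dialog_so_far):
    """Sample the next user intention from the CRF marginals of the latest turn."""
    marginals = crf.predict_marginals_single(dialog_so_far)[-1]   # dict: label -> prob
    labels, probs = zip(*marginals.items())
    return random.choices(labels, weights=probs, k=1)[0]
```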
ALGORITHM
Jung et al., 2009, Data-driven user simulation for automated evaluation of spoken dialog systems,
Computer Speech and Language.
User Surface Simulation
• PROBLEM: how to generate a user surface utterance that expresses a given user intention
• APPROACH: two-phase user utterance generation
  - Phase 1: candidate generation with the user utterance model
  - Phase 2: rescoring the candidate utterances and selecting the best ones
Phase 1 - Generation
[Figure: for the structure "Dialog_Act _X_ Main_Goal", a sequence of structure tags S1..S5 is generated from structure-tag transition probabilities, and each tag emits a word W1..W5 with an emission probability]
• Structure tags: component slot names + part-of-speech tags
• S: a member of the structure tag set for the given space
• W: a member of the vocabulary for the given space
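A minimal sketch of phase-1 candidate generation; the transition and emission tables below are invented toy values, not the paper's trained model.

```python
import random

def sample(dist):
    """Sample a key from a {item: prob} dict."""
    items, probs = zip(*dist.items())
    return random.choices(items, weights=probs, k=1)[0]

def generate_candidate(transitions, emissions, start="<s>", end="</s>", max_len=12):
    """transitions[tag] -> {next_tag: prob}; emissions[tag] -> {word: prob}."""
    tag, words = start, []
    for _ in range(max_len):
        tag = sample(transitions[tag])        # structure-tag transition
        if tag == end:
            break
        words.append(sample(emissions[tag]))  # word emission for that tag
    return " ".join(words)

# Toy model for the intention WH-QUESTION # SEARCH-LOC:
transitions = {"<s>": {"WRB": 1.0},
               "WRB": {"VBZ": 1.0},
               "VBZ": {"DT": 0.7, "ROOM_TYPE": 0.3},
               "DT": {"ROOM_TYPE": 1.0},
               "ROOM_TYPE": {"</s>": 1.0}}
emissions = {"WRB": {"where": 1.0}, "VBZ": {"is": 1.0}, "DT": {"the": 1.0},
             "ROOM_TYPE": {"meeting room": 0.6, "seminar room": 0.4}}
print(generate_candidate(transitions, emissions))   # e.g. "where is the meeting room"
```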
Phase 2 - Rescoring
• PROBLEM: rescoring the candidates and selecting the good utterances
• Criteria: human-like utterances with natural word transitions
• APPROACH: Structure and Word interpolated BLEU (SWB) score
  - Note that evaluating system-generated utterances in utterance simulation and in machine translation is essentially the same task
  - SWB = β * Structure_Sequence_BLEU + (1 - β) * Word_Sequence_BLEU, where 0 ≤ β ≤ 1
  - We set β = 0.2: Korean is an agglutinative language with relatively free word order, so the structure sequence is only weakly constrained
Jung et al. (2009), Data-driven user simulation for automated evaluation of spoken dialog systems, Computer Speech and Language
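A hedged sketch of the SWB score using NLTK's sentence-level BLEU; the paper does not specify its exact BLEU configuration, so the smoothing choice below is an assumption.

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

def swb_score(cand_words, cand_tags, ref_words_list, ref_tags_list, beta=0.2):
    """SWB = beta * BLEU(structure-tag sequence) + (1 - beta) * BLEU(word sequence)."""
    smooth = SmoothingFunction().method1
    structure_bleu = sentence_bleu(ref_tags_list, cand_tags, smoothing_function=smooth)
    word_bleu = sentence_bleu(ref_words_list, cand_words, smoothing_function=smooth)
    return beta * structure_bleu + (1 - beta) * word_bleu

# Candidates generated in phase 1 are rescored against reference utterances from the
# corpus that share the same intention, and the top-scoring candidates are kept.
```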
ALGORITHM
Jung et al., 2009, Data-driven user simulation for automated evaluation of spoken dialog systems,
Computer Speech and Language.
ASR Channel Simulation
• PROBLEM: how to simulate the ASR channel
  - Knowledge-based approach
  - Statistical approach: it is difficult to collect speech data for the target domain
  - The simulation should be WER-controllable
• APPROACH: linguistic-knowledge-based simulation
  - Step 1: determine the error positions
  - Step 2: generate error types for the error-marked words
  - Step 3: generate the ASR errors (substitutions, deletions, insertions)
  - Step 4: rescore and select an erroneous utterance
Jung et al. (2009), Data-driven user simulation for automated evaluation of spoken dialog systems, Computer Speech and Language
Error Type Distribution
• Determining error types
  - Based on the error-type distribution reported for English speech recognition (Greenberg et al., 2000)
  - We assume that Korean speech recognition generally has a similar error distribution
Error Generation
• Insertion error: insert a random word before the insertion-error mark
• Deletion error: simply delete the marked word
• Substitution error: based on a sequence alignment algorithm
  - Syllable- and phoneme-based alignment
  - Select candidate words from a dictionary
  - A dynamic-programming alignment (Needleman and Wunsch, 1970) gives the similarity score:
    Similarity = α * Syllable_Alignment_Score + (1 - α) * Phoneme_Alignment_Score, where 0 ≤ α ≤ 1
[Figure: vowel confusion matrix example]
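A hedged sketch of the substitution step: a Needleman-Wunsch alignment score interpolated over syllable and phoneme sequences picks a confusable dictionary word. The scoring values and the phoneme-conversion function are placeholders, not the paper's resources (which also use a vowel confusion matrix).

```python
def align_score(a, b, match=1.0, mismatch=-1.0, gap=-1.0):
    """Needleman-Wunsch global alignment score between two symbol sequences."""
    n, m = len(a), len(b)
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dp[i][0] = gap * i
    for j in range(m + 1):
        dp[0][j] = gap * j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(dp[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch),
                           dp[i - 1][j] + gap,
                           dp[i][j - 1] + gap)
    return dp[n][m]

def similarity(word, cand, to_phonemes, alpha=0.5):
    """Similarity = alpha * syllable alignment + (1 - alpha) * phoneme alignment."""
    syll = align_score(list(word), list(cand))
    phon = align_score(to_phonemes(word), to_phonemes(cand))
    return alpha * syll + (1 - alpha) * phon

def substitute(word, dictionary, to_phonemes, alpha=0.5):
    """Replace the error-marked word with its most confusable dictionary entry."""
    return max((c for c in dictionary if c != word),
               key=lambda c: similarity(word, c, to_phonemes, alpha))
```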
EXPERIMENT SET-UP
• Korean car navigation dialog system
  - SLU: Jeong and Lee (2006)
  - DM: Lee et al. (2009)
• Word error rate: 0.0 - 0.4
  - 5,000 dialog samples at each WER setting
Intention
[Figure: intention simulation results]
• D-BLEU (Discourse BLEU) is a metric for measuring the naturalness of simulated dialogs as an n-gram precision, following the BLEU calculation.
Jung et al. (2009), Data-driven user simulation for automated evaluation of spoken dialog systems, Computer Speech and Language
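A hedged sketch of a D-BLEU-style computation, assuming the n-gram precision is taken over the intention sequences of simulated dialogs against the real corpus; the exact formulation is in the cited paper.

```python
from nltk.translate.bleu_score import corpus_bleu, SmoothingFunction

def d_bleu(simulated_dialogs, real_dialogs):
    """Each dialog is a sequence of user intention labels, e.g. 'WH-QUESTION#SEARCH-LOC'."""
    references = [real_dialogs] * len(simulated_dialogs)   # every real dialog serves as a reference
    return corpus_bleu(references, simulated_dialogs,
                       smoothing_function=SmoothingFunction().method1)
```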
Utterance
[Figure: utterance simulation results]
ASR channel
[Figure: ASR channel simulation results]
Overall prediction
[Figure: overall prediction results]
Jung et al. (2009), Data-driven user simulation for automated evaluation of spoken dialog systems, Computer Speech and Language
2. Grammar Error Simulation
INTRODUCTION
• Language learner simulation requires grammar error simulation on top of the general user simulation
[Architecture figure: the language learner simulator chains a user intention simulator, a user utterance simulator, a grammar error simulator, and an ASR error simulator; its output is consumed by the dialog system (non-native ASR → SLU → dialog manager → system utterance generator → TTS)]
REALISTIC ERROR
He wants to go to a movie theater
→ He wants to ___ to a movie theater
vs.
→ He want ___ go to ___ movie theater
PROBLEMS
• How to incorporate expert knowledge about the error characteristics of Korean language learners into the statistical model
  - Subject-verb agreement errors
  - Omission errors of the preposition of prepositional verbs
  - Omission errors of articles
  - Etc.
MARKOV LOGIC NETWORK
Sungjin Lee, Gary Geunbae Lee, Realistic Grammar Error Simulation Using Markov Logic, ACL 2009
METHOD
• The generation procedure involves three steps:
  1. Generate a probability distribution over error types for each word through MLN inference
  2. Determine an error type for each word by sampling from the generated distribution
  3. Create the ill-formed output sentence by realizing the chosen error types
Worked example for "He wants to go to a movie theater":
  Step 1 (inference): MLN inference assigns each of the eight words a probability distribution over the error types v_agr_sub, prp_lex_del, at_del, and none; most words are dominated by none, while the sampled error types below imply nonzero mass on v_agr_sub for "wants", prp_lex_del for the first "to", and at_del for "a".
  Step 2 (sampling): one draw yields none, v_agr_sub, prp_lex_del, none, none, at_del, none, none for the eight words.
  Step 3 (realization): "He want go to movie theater"
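A hedged sketch of steps 2 and 3 (sampling and realization); step 1, the MLN inference that yields the per-word error-type distributions, is assumed to have already been run, and the probability values and realization rules below are illustrative rather than taken from the paper.

```python
import random

def sample_error_types(word_distributions):
    """word_distributions: list of (word, {error_type: prob}) pairs from MLN inference."""
    chosen = []
    for word, dist in word_distributions:
        types, probs = zip(*dist.items())
        chosen.append((word, random.choices(types, weights=probs, k=1)[0]))
    return chosen

def realize(tagged_words):
    """Apply the sampled error types to produce the ill-formed sentence."""
    out = []
    for word, err in tagged_words:
        if err in ("prp_lex_del", "at_del"):
            continue                                                   # drop prepositions / articles
        if err == "v_agr_sub":
            word = word[:-1] if word.endswith("s") else word + "s"     # toy agreement error
        out.append(word)
    return " ".join(out)

dists = [("He", {"none": 1.0}),
         ("wants", {"v_agr_sub": 0.55, "none": 0.45}),
         ("to", {"prp_lex_del": 0.4, "none": 0.6}),
         ("go", {"none": 1.0}), ("to", {"prp_lex_del": 0.4, "none": 0.6}),
         ("a", {"at_del": 0.5, "none": 0.5}),
         ("movie", {"none": 1.0}), ("theater", {"none": 1.0})]
print(realize(sample_error_types(dists)))   # e.g. "He want go to movie theater"
```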
EXPERIMENT SET-UP
• Data set
  - NICT JLE Corpus
  - The 167 error-annotated files were divided into 3 level groups:
    • Beginner (levels 1-4): 2,905
    • Intermediate (levels 5-6): 3,296
    • Advanced (levels 7-9): 2,752
• Evaluation
  - 10-fold cross validation performed for each group
  - The validation results were added together across the rounds
EXPERIMENTAL RESULTS
• Advanced:     D_KL(Real || Proposed) = 0.068  vs.  D_KL(Real || Baseline) = 0.122
• Intermediate: D_KL(Real || Proposed) = 0.075  vs.  D_KL(Real || Baseline) = 0.142
• Beginner:     D_KL(Real || Proposed) = 0.075  vs.  D_KL(Real || Baseline) = 0.092
EXPERIMENTAL RESULTS
• Human judgment
  - Evaluated 100 randomly chosen sentences: 50 each from the real and the simulated data
  - The test sentences were shuffled so that the human judges did not know whether each sentence was real or simulated
  - Two-level scale (0: unrealistic, 1: realistic)
Sungjin Lee, Gary Geunbae Lee, Realistic Grammar Error Simulation Using Markov Logic, ACL 2009
Q&A