Speech Recognition

Factory Automation Lab., Dept. of Industrial Engineering, Seoul National University
심억수
July 28, 1998
Factory Automation Lab.
Dept. of Industrial Engineering
2015-07-20
Contents
• Hybrid Method of Data Collection for Evaluating Speech Dialogue System
• The role of speech processing in human-computer intelligent communication
Hybrid Method of Data Collection
for Evaluating Speech Dialogue System
IEICE Trans. Inf. & Syst. Vol. E79-D, No.1 January 1996
Shu Nakazato*, Ikuo Kudo**, Katsuhiko Shirai*
*Dept. of Electrical Engineering, School of Science and Engineering, Waseda Univ., Tokyo
**Texas Instruments Tsukuba Research and Development Center Limited, Tsukuba-shi
Introduction
• New data collection method: the hybrid method
– Human-human dialogue data: (pro) spontaneous speech; (con) complex phenomena such as fillers, restarts, pauses, overlaps, ellipses, and hesitations are hard to handle
– Human-machine dialogue data: (pro) simple, using registered words; (con) unnatural and restricted in the available vocabulary
– A human takes the role of some modules of the system
– merits : lexicon, grammar, topic
• Hybrid Method of Data Collection
– An example classification of data collection methods (Table 1)
– Another important factor for data collection: the mental factors of the users
Table 1 Method of data collection
Type  Speech recognition  Dialogue management  Response generation
A     H                   H                    H
B     H                   S                    S
C     H                   H                    S
D     S                   H                    S
E     S                   H                    H
F     S                   S                    H
G     S                   S                    S
H: human, S: system
Every type has 2 conditions (SR and HR)
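The classification in Table 1 amounts to a three-stage pipeline in which each stage is played by either a human or the system. Below is a minimal sketch of that idea. The human/system assignment per type follows the table as read from the slide. `build_pipeline`, `tagger`, and the string-passing stages are hypothetical illustrations, not the authors' implementation. The SR/HR conditions are not modelled.

```python
from typing import Callable, Dict, Tuple

# Module order follows the columns of Table 1.
MODULES = ("speech_recognition", "dialogue_management", "response_generation")

# Who plays each module in types A-G: "H" = human, "S" = system.
TABLE_1: Dict[str, Tuple[str, str, str]] = {
    "A": ("H", "H", "H"),
    "B": ("H", "S", "S"),
    "C": ("H", "H", "S"),
    "D": ("S", "H", "S"),
    "E": ("S", "H", "H"),
    "F": ("S", "S", "H"),
    "G": ("S", "S", "S"),
}

Stage = Callable[[str], str]


def build_pipeline(method: str,
                   human: Dict[str, Stage],
                   system: Dict[str, Stage]) -> Callable[[str], str]:
    """Chain the three modules, taking each one from the human or the system side."""
    roles = TABLE_1[method]
    stages = [human[m] if role == "H" else system[m]
              for m, role in zip(MODULES, roles)]

    def run(user_input: str) -> str:
        data = user_input
        for stage in stages:  # recognition -> dialogue management -> response generation
            data = stage(data)
        return data

    return run


def tagger(side: str) -> Dict[str, Stage]:
    """Hypothetical stand-in stages that only record who handled each module."""
    return {m: (lambda x, m=m: f"{x} > {side}:{m}") for m in MODULES}


if __name__ == "__main__":
    run_c = build_pipeline("C", tagger("human"), tagger("system"))
    print(run_c("utterance"))
    # utterance > human:speech_recognition > human:dialogue_management > system:response_generation
```

Swapping a human stage for a system stage changes only one dictionary entry. This mirrors the point of the hybrid method: the modules a human plays can later be replaced by real system modules and evaluated on the same data.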
Data Collection
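• Method
– Method C was used, with a car-navigation system as the task
– 13 Japanese students participated with no prior training; a video guide of about 6 minutes was provided; 6 subjects were in the SR condition and 7 in the HR condition (Fig. 1, Fig. 2)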
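• Analysis
– Collected data: 1040
– Example data (Table 2)
– Lexicon: the lexicon consists of a limited set of words and is task-dependent.
– Grammar: the subjects expressed their intentions concisely and clearly.
– Topics: there were almost no utterances outside the task domain.
– The main differences between SR and HR were in fillers and pause length.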
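The SR and HR conditions are compared on fillers and pause length. One way to quantify both is a filler rate and a mean inter-word pause per condition, assuming utterances are available as time-aligned tokens. The sketch below does this. The token format, the `FILLERS` inventory, and the `min_gap` threshold are hypothetical choices, not the paper's annotation scheme. The toy utterance is invented and is not taken from the 1040 collected items.

```python
from statistics import mean
from typing import List, Sequence, Tuple

Word = Tuple[str, float, float]  # (token, start time in s, end time in s)

# Hypothetical filler inventory (romanized Japanese fillers) for illustration only.
FILLERS = {"eeto", "ano", "ee", "maa"}


def filler_rate(utterance: Sequence[Word]) -> float:
    """Fraction of tokens in the utterance that are fillers."""
    if not utterance:
        return 0.0
    return sum(1 for token, _, _ in utterance if token in FILLERS) / len(utterance)


def pause_lengths(utterance: Sequence[Word], min_gap: float = 0.0) -> List[float]:
    """Silent gaps between consecutive tokens that are longer than min_gap seconds."""
    gaps = []
    for (_, _, prev_end), (_, next_start, _) in zip(utterance, utterance[1:]):
        gap = next_start - prev_end
        if gap > min_gap:
            gaps.append(gap)
    return gaps


def summarize(condition: List[Sequence[Word]]) -> Tuple[float, float]:
    """Mean filler rate and mean pause length over all utterances of one condition."""
    rates = [filler_rate(u) for u in condition]
    pauses = [p for u in condition for p in pause_lengths(u)]
    return (mean(rates) if rates else 0.0, mean(pauses) if pauses else 0.0)


if __name__ == "__main__":
    toy = [("eeto", 0.0, 0.4), ("tokyo", 0.9, 1.4), ("eki", 1.5, 1.8)]  # invented example
    print(summarize([toy]))  # filler rate ~0.33, mean pause ~0.3 s
```

Running `summarize` once on the SR utterances and once on the HR utterances gives two comparable pairs of numbers. Any significance test on top of them is left open here.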
Fig.1 An example of Hybrid Method (diagram: the input passes through Speech Recognition, Dialogue Management, and Response Generation, each module handled by either the human or the system, to produce the output)
Fig.2 The configuration of the experiment
Table 2 Example of dialogue
Evaluation of a Module Test and Conclusion
• A Car-Navigation System
– System configuration (Fig. 6)
• Evaluation of the Intention Extractor
– Sources of error
• Word spotting, calculation of word chains, decision of an intention label
• Conclusion
– Data collected with the hybrid data collection method can be used, without any modification, as test data for module evaluation.
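The intention extractor's three error sources are word spotting, word-chain calculation, and the intention-label decision. They can be made concrete with a toy extractor that works on recognized text. In the sketch below, each step is a separate function so that an error can be traced to one of them. Everything here is hypothetical and is not the paper's extractor: the keyword lexicon, the concept classes, the intention patterns, and the use of a longest-common-subsequence score for the "word chain" step. Word spotting in the real system presumably operates on speech rather than on text.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical keyword lexicon for a car-navigation task: word -> concept class.
KEYWORDS: Dict[str, str] = {
    "go": "MOVE", "route": "MOVE", "to": "DEST_MARK",
    "station": "PLACE", "restaurant": "PLACE",
    "nearest": "SEARCH", "find": "SEARCH",
}

# Hypothetical intention patterns: label -> expected ordered chain of concepts.
INTENTIONS: Dict[str, Tuple[str, ...]] = {
    "set_destination": ("MOVE", "DEST_MARK", "PLACE"),
    "search_facility": ("SEARCH", "PLACE"),
}


def spot_words(words: Sequence[str]) -> List[str]:
    """Step 1, word spotting: keep only lexicon words, mapped to their concepts."""
    return [KEYWORDS[w] for w in words if w in KEYWORDS]


def chain_score(chain: Sequence[str], pattern: Sequence[str]) -> int:
    """Step 2, word chain: longest common subsequence of spotted and expected concepts."""
    dp = [[0] * (len(pattern) + 1) for _ in range(len(chain) + 1)]
    for i, c in enumerate(chain, 1):
        for j, p in enumerate(pattern, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if c == p else max(dp[i - 1][j], dp[i][j - 1])
    return dp[-1][-1]


def extract_intention(words: Sequence[str]) -> Optional[str]:
    """Step 3, decision: choose the label whose pattern the chain covers best."""
    chain = spot_words(words)
    best, best_cov = None, 0.0
    for label, pattern in INTENTIONS.items():
        cov = chain_score(chain, pattern) / len(pattern)
        if cov > best_cov:
            best, best_cov = label, cov
    return best


if __name__ == "__main__":
    print(extract_intention("please find the nearest restaurant".split()))  # search_facility
    print(extract_intention("go to the station".split()))  # set_destination
```

A missed or spurious keyword corrupts step 1. A chain that partially matches several patterns stresses step 2. Ties or low coverage expose step 3, which here returns `None` when nothing matches.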
Fig. 6 Modules of our system
The role of speech processing in human-computer intelligent communication
Speech Communication 23 (1997) 263-278
Candace Kamm, Marilyn Walker, Lawrence Rabiner
Speech and Image Processing Services Research Laboratory
AT&T Labs-Research, Florham Park, NJ 07932, USA
Introduction
• Major aspects of the changes taking place in communication
– switches from narrowband to broadband
– switches from wireline to combinations of wired and wireless
– expands from people-to-people to people-to-machine
• Technologies whose strengths and weaknesses must be clearly understood
– coding technology
– speech synthesis, speech recognition, spoken language understanding technology
– user interface technology
The evolution of telecommunication networks
• Fig. 1
– Main problem with the current network: because they evolved separately, the POTS and packet networks are only weakly connected.
• Fig. 2
Applications of spoken language interfaces
• Interactive Voice Response (IVR) systems
• Application domains for SLIs can be divided into the following two categories:
– Human-computer communication applications
• Personal calendars, stock market quotes, movie schedules and reviews, classified advertisements, train and airline schedule information
– Computer-mediated human-human communication applications
• Voice calling, email or voice mail by phone, speech-to-speech translation
Speech technologies
• Speech and audio coding technology - mouth
– Efficient transmission or storage of speech
– Main areas where it plays a role: the wired telephone network, wireless networks, and voice security for privacy and encryption
– Four attributes that characterize a speech coder: bit rate, quality, signal delay, and complexity
• Speech synthesis - ear
– Text-to-speech (TTS) systems
– Evaluated by the intelligibility and the naturalness of the resulting speech
• Speech recognition and spoken language understanding - brain
– For word-for-word transcription, word error rate is a good measure.
– For classification by keywords (key phrases), concept accuracy is a good measure.
– AT&T's CHRONUS system for air travel information
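Word error rate is the number of word substitutions, deletions, and insertions in the minimum edit alignment, divided by the number of reference words. The sketch below computes it with a standard dynamic-programming edit distance. The concept accuracy function is an assumption. It applies the same edit distance to sequences of concepts, as one common formulation, and does not reproduce the CHRONUS scoring. The `ATTRIBUTE=value` concept strings are hypothetical.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Minimum substitutions + deletions + insertions turning ref into hyp."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, 1):
            cur[j] = min(prev[j] + 1,              # deletion of r
                         cur[j - 1] + 1,           # insertion of h
                         prev[j - 1] + (r != h))   # substitution, or match at no cost
        prev = cur
    return prev[-1]


def word_error_rate(ref: str, hyp: str) -> float:
    """WER = (S + D + I) / N, where N is the number of reference words."""
    ref_words = ref.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hyp.split()) / len(ref_words)


def concept_accuracy(ref_concepts: Sequence[str], hyp_concepts: Sequence[str]) -> float:
    """Assumed definition: 1 - (concept S + D + I) / number of reference concepts.
    Can go negative when the hypothesis inserts many spurious concepts."""
    if not ref_concepts:
        raise ValueError("reference must contain at least one concept")
    return 1.0 - edit_distance(ref_concepts, hyp_concepts) / len(ref_concepts)


if __name__ == "__main__":
    ref = "show flights from boston to denver"
    hyp = "show flight from boston denver"
    print(word_error_rate(ref, hyp))  # 2 errors / 6 words
    print(concept_accuracy(["ORIGIN=boston", "DEST=denver"],
                           ["ORIGIN=boston", "DEST=denver"]))  # 1.0
```

The usage example illustrates the slide's distinction. The hypothesis has one substitution and one deletion, so its WER is about 0.33. Its concepts still match the reference exactly, so concept accuracy remains 1.0.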
Human-computer interface technologies
Fig. 5 The spoken dialogue development cycle (Design, Build, Experiment, Evaluate)
• Three major design principles for GUIs
– Continuous representation of the objects and actions of interest
– Rapid, incremental, reversible operations whose impact on the object of interest is immediately visible
– Physical actions or labeled button presses instead of complex syntax with natural language text commands
– Examples
– Expert dialogue and novice dialogue
Future research directions
• Encompass not only the underlying speech technologies but also the integration of the component technologies, the relevant information sources, and the user interface
• Provide the capability for TTS systems to have an arbitrary voice, accent, and language, so that the system can be customized for different applications
• Improve the robustness of ASR to variations in the acoustic environment, user populations, and transmission characteristics of the system
• A critical research topic related to integrating user interface technologies into dialogue systems is the basic issue of how to evaluate spoken dialogue systems