Transcript Slide 1
Information Retrieval using Intelligent Speech Communication Interface
Institute of Informatics of the Slovak Academy of Sciences, Bratislava [email protected]
Overview
1. Introduction
2. IRKR system
3. Architecture
4. Pilot applications
5. Realization of services
WIKT 2006 2
What is a Speech Communication Interface (SCI)?
• An SCI, or Spoken Language Dialog System (SLDS), is a computer system that you can talk to in order to carry out some task
• Contemporary SLDSs are typically of two kinds:
– Transaction-based systems, which let the user carry out a transaction, such as buying or selling stocks or reserving a seat on a plane
– Information-provision systems, which provide information in response to a query, such as a request for timetable or weather information
• The circle of a typical speech dialog in an SCI also shows the main components of an SCI
The Speech Dialog Circle in SLDS
• Speech → ASR (Automatic Speech Recognition) → Words spoken: "I need a flight from Košice to Bratislava roundtrip"
• Words spoken → SLU (Spoken Language Understanding) → Meaning: ORIGIN_CITY: KOŠICE, DESTINATION_CITY: BRATISLAVA, FLIGHT_TYPE: ROUNDTRIP
• Meaning → DM (Dialog Management) → Action: GET DEPARTURE DATE
• Action → RG (Response Generation) → Text: "Which date do you want to fly from Košice to Bratislava?"
• Text → TTS (Text-to-Speech) → Speech
• Data, Rules: resources shown at the center of the circle
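One turn of the circle can be sketched in Python. This is only an illustration, not the IRKR implementation: the keyword-based `understand` function, the city list, the order of `REQUIRED_SLOTS` and the prompt texts are assumptions made for the example. Only the slot names, the example sentences and the "get departure date" action come from the slide (the action is written here with underscores).

```python
import re

# Hypothetical city list and slot order, chosen only for this example.
CITIES = ["Košice", "Bratislava", "Žilina"]
REQUIRED_SLOTS = ["ORIGIN_CITY", "DESTINATION_CITY", "DEPARTURE_DATE"]

# Hypothetical prompt templates, one per dialog action.
PROMPTS = {
    "GET_ORIGIN_CITY": "Which city do you want to fly from?",
    "GET_DESTINATION_CITY": "Which city do you want to fly to?",
    "GET_DEPARTURE_DATE": "Which date do you want to fly from {ORIGIN_CITY} to {DESTINATION_CITY}?",
    "QUERY_FLIGHTS": "Searching flights from {ORIGIN_CITY} to {DESTINATION_CITY}.",
}


def understand(words):
    """SLU: map recognized words to a frame of slots (toy keyword matching)."""
    frame = {}
    cities = "|".join(CITIES)
    match = re.search(rf"from ({cities})", words)
    if match:
        frame["ORIGIN_CITY"] = match.group(1).upper()
    match = re.search(rf"to ({cities})", words)
    if match:
        frame["DESTINATION_CITY"] = match.group(1).upper()
    if "roundtrip" in words:
        frame["FLIGHT_TYPE"] = "ROUNDTRIP"
    return frame


def next_action(frame):
    """DM: ask for the first missing required slot, else query the backend."""
    for slot in REQUIRED_SLOTS:
        if slot not in frame:
            return "GET_" + slot
    return "QUERY_FLIGHTS"


def generate_response(action, frame):
    """RG: turn the action into the text that would be handed to TTS."""
    shown = {name: value.title() for name, value in frame.items()}
    return PROMPTS[action].format(**shown)


frame = understand("I need a flight from Košice to Bratislava roundtrip")
action = next_action(frame)
prompt = generate_response(action, frame)
```

The ASR and TTS steps are left out; in the sketch the recognized words go in as a string and the prompt comes out as a string.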
IRKR
• the first SLDS able to interact in the Slovak language
• developed in the period from July 2003 to June 2006
• supported by the National program for R&D “Building of the information society”
IRKR - partners
• Technical University of Košice
• Institute of Informatics, the Slovak Academy of Sciences
• Slovak University of Technology in Bratislava
• University of Žilina
IRKR - specification
• natural interaction
• multi-user interaction
• Slovak language
• fixed and mobile telephone networks
• access to distributed information (on the Internet)
IRKR - architecture
• DARPA Communicator architecture
• ‘hub-and-spoke’
• each module seeks services from and provides services to the other modules
• modules communicate with each other through the central software router, the Galaxy hub
• communicator.sourceforge.net
Galaxy – basic overview
• distributed, message-based, hub-and-spoke infrastructure optimized for constructing spoken dialogue systems
• available under a liberal open source license
• not an end-to-end dialogue system, but provides tools for constructing such a system out of a suite of servers
• provides a sophisticated and general transport layer for connecting servers and Hubs, as well as a message syntax (it does not specify message semantics)
• the core Galaxy Communicator infrastructure is written in C
• supports defining server and connection initialization functions in C, Python, Java and Allegro Common Lisp
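The hub-and-spoke idea can be sketched in a few lines of Python. This is an in-process toy and does not use the Galaxy Communicator API: the `Hub` class, its `register` and `send` methods, and the message fields are all hypothetical names chosen for the illustration.

```python
class Hub:
    """Toy central router: servers never call each other directly."""

    def __init__(self):
        self.servers = {}  # server name -> handler function
        self.log = []      # (sender, receiver, operation) for every routed message

    def register(self, name, handler):
        self.servers[name] = handler

    def send(self, sender, receiver, message):
        """Route a message (a plain dict) to the receiver and return its reply."""
        if receiver not in self.servers:
            raise KeyError(f"no server registered as {receiver!r}")
        self.log.append((sender, receiver, message.get("op")))
        return self.servers[receiver](message)


def asr_server(message):
    # Stand-in for a recognizer: always returns the same word string.
    return {"op": "recognized", "words": "weather forecast"}


def make_dialogue_manager(hub):
    def dialogue_manager(message):
        # The dialogue manager asks the hub, not the ASR server itself.
        reply = hub.send("dialogue_manager", "asr", {"op": "recognize"})
        return {"op": "prompt", "text": f"Did you say {reply['words']}?"}
    return dialogue_manager


hub = Hub()
hub.register("asr", asr_server)
hub.register("dialogue_manager", make_dialogue_manager(hub))
result = hub.send("telephony", "dialogue_manager", {"op": "new_call"})
```

Because every message passes through `hub.send`, the hub's log records the full exchange, and a server can be replaced by registering a different handler under the same name.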
IRKR - architecture
[Diagram: the SLDS (Spoken Language Dialog System) consists of the ASR server, Information server, Telephony server, TTS server and Dialogue manager, all connected to the central HUB. Shown outside the SLDS: Telephone network, Internet, VoiceXML.]
Automatic speech recognition server
• conversion of incoming speech to the corresponding text
• two speech recognizers, both freely available for nonprofit research:
– ATK - htk.eng.cam.ac.uk/develop/atk.shtml
– SPHINX - cmusphinx.sourceforge.net
• Phoneme acoustic models:
– built following the REFREC 0.96 training procedure
– acoustic features: conventional 39-dimensional MFCCs, including energy and first- and second-order deltas
– 3-state left-to-right HMMs
– context-dependent (triphone) acoustic models
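A short Python sketch shows how a 39-dimensional feature vector can be assembled. It assumes the common layout of 13 static values per frame (12 MFCCs plus energy) followed by 13 first-order and 13 second-order deltas, and a regression window of ±2 frames; the slide gives neither detail, and the function names are made up for the example. The static values are toy numbers, not real MFCCs.

```python
def deltas(frames, window=2):
    """Regression deltas: d[t] = sum_n n*(c[t+n] - c[t-n]) / (2 * sum_n n^2).

    Frames beyond either end are replaced by the first or last frame.
    """
    denom = 2 * sum(n * n for n in range(1, window + 1))
    last = len(frames) - 1
    out = []
    for t in range(len(frames)):
        row = []
        for k in range(len(frames[t])):
            num = 0.0
            for n in range(1, window + 1):
                ahead = frames[min(t + n, last)][k]
                behind = frames[max(t - n, 0)][k]
                num += n * (ahead - behind)
            row.append(num / denom)
        out.append(row)
    return out


def add_deltas(static):
    """Stack static, first-order and second-order features for each frame."""
    d1 = deltas(static)
    d2 = deltas(d1)
    return [s + a + b for s, a, b in zip(static, d1, d2)]


# Toy input: 6 frames of 13 static values, each value rising by 1.0 per frame.
static = [[float(t)] * 13 for t in range(6)]
features = add_deltas(static)
```

For the linearly rising toy input, the first-order deltas of the interior frames equal the slope of 1.0 per frame, which is a quick sanity check on the regression formula.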
Databases used for ASR training
• SpeechDat-E SK
– 1000 speakers, PSTN (office, home, phone booth)
• MobilDat SK
– 1100 speakers, GSM networks (office, home, street, vehicle, public building)
• Both are balanced for age, regional accent and sex of the speakers
• Each speaker recorded 50 files: numbers, names, dates, money amounts, embedded command words, geographical names, phonetically balanced words, phonetically balanced sentences, Yes/No answers and one longer optional spontaneous utterance
Text-to-speech synthesis
• TTS converts outgoing information in text form to speech
• intelligibility, naturalness
• we developed two TTS modules using two different approaches:
– diphone: intelligible speech; flexible and totally domain-independent; computationally inexpensive; small memory footprint; sounds a bit robotic and tedious
– unit-selection: better naturalness; some problems with intelligibility; limited domain
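The diphone approach can be illustrated with a small Python sketch. A diphone runs from the middle of one phone to the middle of the next, so an utterance of N phones, padded with silence on both sides, is covered by N + 1 units. The `to_diphones` function, the "_" silence symbol and the SAMPA-like phone string for "Košice" are assumptions made for the example, not taken from the IRKR synthesizer.

```python
def to_diphones(phones, silence="_"):
    """Return the diphone units needed to synthesize a phone sequence."""
    padded = [silence] + list(phones) + [silence]
    return [f"{left}-{right}" for left, right in zip(padded, padded[1:])]


# Illustrative phone string for the word "Košice".
units = to_diphones(["k", "o", "S", "i", "ts", "e"])
```

Each unit name would then be looked up in the recorded segment inventory and the segments concatenated.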
TTS architecture
Diphone synthesizer (TEXT → SPEECH)
• text analysis: Text preprocessor; Orthoepic transcription (SAMPA code); Syntactic-prosodic parser; Prosody generation (F0, energy, duration, ...)
• preparation: Segment list generation; Index of speech segments DB; Index of acousticons
• signal processing: Prosody matching; Segment concatenation; Signal synthesis; Speech segments DB; Acousticons DB

Unit selection synthesizer (TTS server)
• GALAXY wrapper (to the HUB); TTS control block; phrase cache
• high level synthesis: processing of numerals and abbreviations; dictionary; data driven phonetic transcription
• low level synthesis: unit selection; corpus / SDB; unit concatenation; audio file
• broker channel to the Telephony server
Dialogue manager
• The dialogue manager controls the dialogue between the system and the user
• The heart of the dialogue manager is an interpreter of the VoiceXML mark-up language, which:
– simplifies speech application development
– enables distributed application design
– accelerates the development of interactive voice response (IVR) environments
Dialogue manager architecture
[Diagram: the Dialogue manager, shown together with the HUB and VoiceXML. Its components: VoiceXML interpreter (core), XML parser, Grammars handling unit, ECMAScript unit, Document manager, Input interface, Output interface, Logging interface.]
Audio server
• provides the whole information system with a reliable multi-user connection to the telephone networks
• supports telephone hardware: the Dialogic D120/41JCT-LSEuro card
• provides a direct (broker) connection between the audio server and the ASR server or TTS server
Audio server architecture
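[Diagram: GSM, IP (H.323, SIP) and ISDN/PSTN networks reach the telephony I/O board of the Audio server through GSM-GW, M-GW and a PABX/switch (BRA/PRA and a/b lines, 4..12 ....); the Audio server is connected over the IP network to the HUB, TTS and ASR.]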
Information server - IS
• IS connects the system to information sources and retrieves the information required by the user
• a special IS for every pilot application, with a special web wrapper
• a rule-based, ad hoc IS that searches only a few predefined web servers with a relatively well-known page structure does a much better job
• returns the data in XML format
• caches results with a user-defined expiration
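Caching with a user-defined expiration can be sketched as a small time-to-live cache in Python. The `ExpiringCache` class, the cache key and the ten-minute lifetime are assumptions made for the example; the slide does not say how the IRKR cache is built.

```python
import time


class ExpiringCache:
    """Keep fetched results for ttl_seconds, then fetch them again."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so the expiry can be tested
        self._store = {}    # key -> (expiry time, value)

    def get(self, key, fetch):
        """Return a cached value, calling fetch() only when missing or expired."""
        now = self.clock()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]
        value = fetch()
        self._store[key] = (now + self.ttl, value)
        return value


# Usage with a fake clock and a stand-in for the web wrapper call.
fake_now = [0.0]
calls = []


def fetch_forecast():
    calls.append("fetch")
    return "<forecast town='Bratislava'>sunny</forecast>"


cache = ExpiringCache(ttl_seconds=600, clock=lambda: fake_now[0])
first = cache.get(("weather", "Bratislava"), fetch_forecast)
second = cache.get(("weather", "Bratislava"), fetch_forecast)  # served from cache
fake_now[0] = 601.0
third = cache.get(("weather", "Bratislava"), fetch_forecast)   # expired, fetched again
```

A short lifetime suits data that changes often, such as weather; a longer one might be acceptable for timetables.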
IS architecture
[Diagram, top to bottom: HUB; Galaxy interface; Integrator; IS Backend; four web wrappers; Internet; four web sources.]
WEB wrapper
• navigation through the web server
• extraction from the web pages
• mapping onto a structured format (XML)
• data verification
• as robust as possible against changes in the structure of the web pages
[Diagram: Internet → Web wrapper (Navigation module → HTML → Extraction module → Mapping module → XML → Data verification) → SQL Database]
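The extraction, mapping and verification steps can be sketched with the Python standard library. The HTML fragment, the field names and the temperature range check are invented for the example and do not reflect the structure of any real site; navigation and the SQL database are left out. For brevity the sketch verifies the extracted rows before mapping them to XML.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Invented page fragment; real pages will be structured differently.
SAMPLE_HTML = """
<table id="forecast">
  <tr><td class="town">Bratislava</td><td class="sky">sunny</td><td class="temp">32</td></tr>
  <tr><td class="town">Košice</td><td class="sky">cloudy</td><td class="temp">n/a</td></tr>
</table>
"""

FIELDS = ("town", "sky", "temp")


class ForecastExtractor(HTMLParser):
    """Extraction: collect the text of <td> cells whose class is a known field."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._field = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append({})
        elif tag == "td":
            cls = dict(attrs).get("class")
            self._field = cls if cls in FIELDS else None

    def handle_data(self, data):
        if self._field and self.rows and data.strip():
            self.rows[-1][self._field] = data.strip()

    def handle_endtag(self, tag):
        if tag == "td":
            self._field = None


def verify(row):
    """Data verification: all fields present and a plausible temperature."""
    if any(field not in row for field in FIELDS):
        return False
    try:
        return -60 <= int(row["temp"]) <= 60
    except ValueError:
        return False


def to_xml(rows):
    """Mapping: put the rows into a structured XML document."""
    root = ET.Element("forecasts")
    for row in rows:
        item = ET.SubElement(root, "forecast", town=row["town"])
        ET.SubElement(item, "sky").text = row["sky"]
        ET.SubElement(item, "temp", unit="C").text = row["temp"]
    return ET.tostring(root, encoding="unicode")


extractor = ForecastExtractor()
extractor.feed(SAMPLE_HTML)
valid_rows = [row for row in extractor.rows if verify(row)]
xml_text = to_xml(valid_rows)
```

Selecting cells by class name rather than by position is one simple way to tolerate small layout changes, in the spirit of the robustness requirement.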
Pilot applications
• “Weather forecast in Slovakia”
– www.meteo.sk; www.shmu.sk
– weather forecast for about 80 Slovak district towns
– Place: district town or holiday locality
– Date: relative date / absolute date
• “Timetable of Slovak Railways”
– www.cp.sk
– information about the Slovak railways timetable
– Starting place: railway station in Slovakia
– Destination place: railway station in Slovakia
– Date: relative date (today, tomorrow etc.) / absolute date (“the twentieth of December” etc.)
– Time: departure time (hour, minute)
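Both services accept a relative or an absolute date, so the spoken value has to be turned into a calendar date before the source is queried. A minimal Python sketch follows; the English vocabulary, the "DD.MM." input form and the rule that a past date means next year are assumptions made for the example, since the real grammar is in Slovak and is not shown on the slide.

```python
from datetime import date, timedelta

# Hypothetical vocabulary; the real grammar would be in Slovak.
RELATIVE_DAYS = {"today": 0, "tomorrow": 1, "the day after tomorrow": 2}


def resolve_date(expression, today):
    """Return the calendar date meant by a relative or absolute expression."""
    text = expression.strip().lower()
    if text in RELATIVE_DAYS:
        return today + timedelta(days=RELATIVE_DAYS[text])
    # Absolute form used in this sketch: "DD.MM." (day and month only).
    day, month = (int(part) for part in text.rstrip(".").split("."))
    candidate = date(today.year, month, day)
    # Assumption: a date that has already passed refers to next year.
    if candidate < today:
        candidate = date(today.year + 1, month, day)
    return candidate


new_year = resolve_date("tomorrow", date(2006, 12, 31))
december = resolve_date("20.12.", date(2006, 11, 21))
```

Passing `today` in explicitly keeps the function deterministic and easy to test.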
Realization of services
• available at: +421 55 602 2297, +421 2 5941 1118 (T-Com), +421 911 650 038 (T-Mobile), +421 918 717 491 (Orange), irkr_pub (Skype)
• IRKR on the web: irkr.fei.tuke.sk
Here we show a typical dialogue between the user (U) and the system (S):
S: Welcome to the IRKR portal. Would you like to play the introduction?
U: No.
S: Choose one of the services: Weather forecast or Railway timetable.
U: Weather forecast.
S: Please name a city and a day for which you want the weather forecast.
U: Bratislava, tomorrow.
S: Did you say Bratislava, tomorrow?
U: Yes.
S: The weather forecast for Bratislava for tomorrow is: sunny, 32 degrees Celsius...
Thank you for your attention