Information Retrieval using Intelligent Speech Communication Interface

Institute of Informatics of the Slovak Academy of Sciences, Bratislava

Overview

1. Introduction
2. IRKR system
3. Architecture
4. Pilot applications
5. Realization of service

WIKT 2006 2

What is a Speech Communication Interface (SCI)?

• An SCI, or Spoken Language Dialog System (SLDS), is a computer system that you can talk to in order to carry out some task
• Contemporary SLDSs are typically of two kinds:
– Transaction-based systems, which let the user carry out a transaction, such as buying or selling stocks, or reserving a seat on a plane
– Information-provision systems, which provide information in response to a query, such as a request for timetable or weather information
• The circle of a typical speech dialog in an SCI also shows the main components of an SCI

The Speech Dialog Circle in SLDS

[Diagram: the stages of the circle, starting from the user's speech]

• ASR (Automatic Speech Recognition): speech to words spoken
  "I need a flight from Košice to Bratislava roundtrip"
• SLU (Spoken Language Understanding): words to meaning
  ORIGIN_CITY: KOŠICE
  DESTINATION_CITY: BRATISLAVA
  FLIGHT_TYPE: ROUNDTRIP
• DM (Dialog Management): meaning plus data and rules to action
  GET DEPARTURE DATE
• RG (Response Generation): action to response
  "Which date do you want to fly from Košice to Bratislava?"
• TTS (Text-to-Speech): response to speech
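One pass around the circle can be traced in a few lines of Python. This is only a sketch, not the IRKR implementation: the city list, the keyword matching in `understand`, the `DEPARTURE_DATE` slot, and all function names are assumptions made for illustration. Only the three slot names, the GET DEPARTURE DATE action, and the two utterances come from the slide.

```python
import re

# Hypothetical city list; a real system would rely on a recognition grammar.
CITIES = ["Košice", "Bratislava"]

# Slots the dialog manager wants filled, in the order it asks for them (assumed).
REQUIRED_SLOTS = ["ORIGIN_CITY", "DESTINATION_CITY", "DEPARTURE_DATE"]


def understand(words):
    """SLU: map the recognized words to a flat slot/value meaning frame."""
    frame = {}
    origin = re.search(r"\bfrom (\w+)", words)
    destination = re.search(r"\bto (\w+)", words)
    if origin and origin.group(1) in CITIES:
        frame["ORIGIN_CITY"] = origin.group(1).upper()
    if destination and destination.group(1) in CITIES:
        frame["DESTINATION_CITY"] = destination.group(1).upper()
    if "roundtrip" in words.lower():
        frame["FLIGHT_TYPE"] = "ROUNDTRIP"
    return frame


def manage(frame):
    """DM: choose the next action by asking for the first missing slot."""
    for slot in REQUIRED_SLOTS:
        if slot not in frame:
            return "GET_" + slot
    return "QUERY_DATABASE"


def generate(action, frame):
    """RG: turn the chosen action into a prompt for the TTS module."""
    if action == "GET_DEPARTURE_DATE":
        return "Which date do you want to fly from {} to {}?".format(
            frame["ORIGIN_CITY"].title(), frame["DESTINATION_CITY"].title())
    return "Please tell me more about your trip."


words = "I need a flight from Košice to Bratislava roundtrip"
frame = understand(words)
action = manage(frame)
response = generate(action, frame)
```

ASR and TTS are left out on purpose: in this sketch the recognized words and the response text stand in for the audio at both ends of the circle.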

IRKR

• first SLDS which is able to interact in the Slovak language
• developed in the period from July 2003 to June 2006
• supported by the National program for R&D “Building of the information society”

IRKR - partners

• Technical University of Košice
• Institute of Informatics, the Slovak Academy of Sciences
• Slovak University of Technology in Bratislava
• University of Žilina

IRKR - specification

• natural interaction
• multi-user interaction
• Slovak language
• fixed and mobile telephone networks
• access to distributed information (on the internet)

IRKR - architecture

• DARPA Communicator architecture
• ‘hub-and-spoke’
• each module seeks services from and provides services to the other modules
• modules communicate with each other through the central software router, the Galaxy hub
• communicator.sourceforge.net

Galaxy – basic overview

• distributed, message-based, hub-and-spoke infrastructure optimized for constructing spoken dialogue systems
• available under a liberal open source license
• not an end-to-end dialogue system, but provides tools for constructing such a system out of a suite of servers
• provides a sophisticated and general transport layer for connecting servers and Hubs, as well as a message syntax (does not provide specifications about semantics)
• the core Galaxy Communicator infrastructure is written in C
• support for defining server and connection initialization functions in C, Python, Java and Allegro Common Lisp
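The hub-and-spoke idea can be illustrated with a toy router. This sketch does not use the Galaxy Communicator API: the `Hub` class, its `register` and `send` methods, and the dictionary message format are all hypothetical. It only shows that servers reach each other through the hub, never directly.

```python
class Hub:
    """Central router: servers never call each other directly."""

    def __init__(self):
        self.servers = {}  # server name -> handler function
        self.log = []      # routed messages, in order

    def register(self, name, handler):
        self.servers[name] = handler

    def send(self, target, message):
        """Route a message (a plain dict) to the named server, return its reply."""
        if target not in self.servers:
            raise KeyError("no server registered as %r" % target)
        self.log.append((target, message))
        return self.servers[target](self, message)


def asr_server(hub, message):
    # Placeholder recognizer: pretend the audio was decoded to this text.
    return {"text": "weather forecast"}


def dialogue_manager(hub, message):
    # The dialogue manager obtains the recognized text through the hub.
    recognized = hub.send("asr", {"audio": message["audio"]})
    return {"prompt": "You said: " + recognized["text"]}


hub = Hub()
hub.register("asr", asr_server)
hub.register("dialogue", dialogue_manager)
reply = hub.send("dialogue", {"audio": b"..."})
```

Because every message passes through `send`, the hub is also the natural place for logging, which is what the `log` list stands in for here.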

IRKR - architecture

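[Diagram: SLDS (Spoken Language Dialog System)]

• HUB in the centre, connected to: ASR server, TTS server, Telephony server, Information server, Dialogue manager
• Telephony server: connected to the telephone network
• Information server: connected to the Internet
• Dialogue manager: driven by VoiceXML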

Automatic speech recognition server

• conversion of incoming speech to the corresponding text
• two speech recognizers, both freely available for nonprofit research:
– ATK - htk.eng.cam.ac.uk/develop/atk.shtml
– SPHINX - cmusphinx.sourceforge.net
• phoneme acoustic models:
– built following the REFREC 0.96 training procedure
– acoustic features were conventional 39-dimensional MFCCs, including energy and first and second order deltas
– 3-state left-to-right HMMs
– context dependent (triphone) acoustic models
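How a 39-dimensional vector arises can be shown with a short sketch. It assumes the usual split of 13 static values per frame (12 MFCCs plus energy) with first and second order deltas appended, and a regression window of two frames on each side. The slide states only the total of 39, so both numbers, like the function names, are assumptions.

```python
def deltas(frames, window=2):
    """Regression deltas over +/- `window` frames, repeating the edge frames."""
    norm = 2 * sum(k * k for k in range(1, window + 1))
    last = len(frames) - 1
    result = []
    for t in range(len(frames)):
        row = []
        for d in range(len(frames[t])):
            acc = 0.0
            for k in range(1, window + 1):
                ahead = frames[min(t + k, last)][d]
                behind = frames[max(t - k, 0)][d]
                acc += k * (ahead - behind)
            row.append(acc / norm)
        result.append(row)
    return result


def add_deltas(static):
    """Append first and second order deltas to each static frame."""
    first = deltas(static)
    second = deltas(first)
    return [s + d1 + d2 for s, d1, d2 in zip(static, first, second)]


# Five toy frames of 13 static values each, rising linearly over time.
static = [[float(t)] * 13 for t in range(5)]
features = add_deltas(static)
```

Each output frame has 13 + 13 + 13 = 39 values. Real front ends compute the static coefficients from the audio signal, which this sketch skips.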

Databases used for ASR training

• SpeechDat-E SK
– 1000 speakers, PSTN (office, home, phone booth)
• MobilDat SK
– 1100 speakers, GSM networks (office, home, street, vehicle, public building)
• both balanced for age, regional accent, and sex of the speakers
• every speaker recorded 50 files: numbers, names, dates, money amounts, embedded command words, geographical names, phonetically balanced words, phonetically balanced sentences, Yes/No answers and one longer optional spontaneous utterance

Text-to-speech synthesis

• TTS converts outgoing information in text form to speech
• intelligibility, naturalness
• we developed two TTS modules using two different approaches:
• diphone
– intelligible speech
– flexible and totally domain-independent
– computationally inexpensive
– small memory footprint
– sounds a bit robotic and tedious
• unit-selection
– better naturalness
– some problems with intelligibility
– limited domain

TTS architecture

[Diagram 1: Diphone synthesizer, from TEXT to SPEECH]

• text analysis: Text preprocessor, Syntactic-prosodic parser, Prosody generation (F0, energy, duration, ...), Orthoepic transcription (SAMPA code)
• preparation: Segment list generation, Index of speech segments DB, Index of acousticons
• signal processing: Prosody matching, Segment concatenation, Signal synthesis, Speech segments DB, Acousticons DB

[Diagram 2: TTS server with the unit selection synthesizer]

• connections: HUB, Telephony server (broker channel)
• GALAXY wrapper, phrase cache, TTS control block
• high level synthesis: processing of numerals and abbreviations, dictionary, data driven phonetic transcription
• low level synthesis: unit selection, corpus / SDB, unit concatenation, audio file
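The segment list generation step of a diphone synthesizer can be sketched as follows: a phoneme sequence is padded with pauses and cut into overlapping pairs, one per diphone unit. The pause symbol `_`, the `left-right` unit naming, and the function name are assumptions for illustration, not taken from the IRKR modules.

```python
SILENCE = "_"  # assumed pause symbol, not taken from the slides


def diphone_list(phonemes):
    """Return the diphone units covering a phoneme sequence, pause to pause."""
    padded = [SILENCE] + list(phonemes) + [SILENCE]
    return [left + "-" + right for left, right in zip(padded, padded[1:])]


# The word "mama" as a phoneme sequence.
segments = diphone_list(["m", "a", "m", "a"])
```

A sequence of n phonemes always yields n + 1 units. Each unit would then be looked up in the speech segments database before prosody matching and concatenation.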

Dialogue manager

• The dialogue manager controls the dialogue of the system with the user
• The heart of the dialogue manager is the interpreter of the VoiceXML mark-up language, which:
– simplifies speech application development
– enables distributed application design
– accelerates the development of interactive voice response (IVR) environments
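To give a feel for what a VoiceXML interpreter works with, here is a small VoiceXML-style form and a Python sketch that picks the next field to prompt for. The document (form id, field names, prompts) is invented for illustration and the function is not the IRKR interpreter. It only loosely mirrors how a form is processed: the first field without a value is visited next.

```python
import xml.etree.ElementTree as ET

# Invented example document; the xmlns attribute is omitted to keep tag names short.
VXML = """
<vxml version="2.0">
  <form id="weather">
    <field name="city">
      <prompt>Please name a city.</prompt>
    </field>
    <field name="day">
      <prompt>For which day?</prompt>
    </field>
  </form>
</vxml>
"""


def next_prompt(document, filled):
    """Return (field name, prompt) for the first unfilled field, else None."""
    root = ET.fromstring(document)
    for field in root.iter("field"):
        name = field.get("name")
        if name not in filled:
            return name, field.findtext("prompt").strip()
    return None


first = next_prompt(VXML, {})
second = next_prompt(VXML, {"city": "Bratislava"})
done = next_prompt(VXML, {"city": "Bratislava", "day": "tomorrow"})
```

A full interpreter also handles grammars, events and ECMAScript expressions, which is why the architecture lists separate units for them.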

Dialogue manager architecture

[Diagram: Dialogue manager, connected to the HUB and fed with VoiceXML documents]

• VoiceXML interpreter (core)
• XML parser
• Grammars handling unit
• ECMAScript unit
• Document manager
• Input interface
• Output interface
• Logging interface

Audioserver

• provides the whole information system with a reliable multi-user connection to the telephone networks
• supports telephone hardware: Dialogic D120/41JCT-LSEuro card
• direct (broker) connection between the audio server and the ASR server or TTS server

Audio server architecture

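[Diagram: connection of the audio server to the telephone networks]

• networks: GSM, IP (H.323, SIP), ISDN/PSTN
• gateways and lines: GSM-GW, M-GW, BRA/PRA, BRA, a/b, PABX switch (4..12 ...)
• Telephony I/O board (a/b lines) in the audio server
• the audio server reaches TTS, HUB and ASR over the IP network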

Information server - IS

• IS connects the system to information sources and retrieves the information required by the user
• a special IS for every pilot application, built on a special web wrapper
• a rule-based ad-hoc IS that searches only a few predefined web servers with a relatively well-known page structure will do a much better job
• returns the data in XML format
• caches results with a user-defined expiration
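Caching with a user-defined expiration can be sketched as a small time-to-live cache. The class name, the `fetch` callback, and the injectable clock are assumptions for illustration; the slides do not describe how the IRKR cache is implemented.

```python
import time


class ExpiringCache:
    """Keep query results for ttl_seconds, then fetch them again."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock   # injectable so the expiry can be tested
        self.entries = {}    # query -> (stored_at, result)

    def get(self, query, fetch):
        """Return the cached result, calling fetch(query) if missing or expired."""
        now = self.clock()
        hit = self.entries.get(query)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]
        result = fetch(query)
        self.entries[query] = (now, result)
        return result


# Usage with a fake clock and a fake web source.
fake_time = [0.0]
calls = []


def fetch_forecast(town):
    calls.append(town)
    return "<forecast town='%s'/>" % town


cache = ExpiringCache(ttl_seconds=60, clock=lambda: fake_time[0])
cache.get("Bratislava", fetch_forecast)
cache.get("Bratislava", fetch_forecast)   # served from the cache
calls_before_expiry = len(calls)
fake_time[0] = 61.0
cache.get("Bratislava", fetch_forecast)   # expired, fetched again
```

For a telephone service this keeps repeated questions about the same town from hitting the web source on every call.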

IS architecture

[Diagram: layers of the information server, from the HUB down to the web]

• HUB
• Galaxy interface
• Integrator
• IS Backend
• web wrappers (one per web source)
• Internet
• web sources

WEB wrapper

• navigation through the web server
• extraction from the web pages
• mapping onto a structured format (XML)
• data verification
• as robust as possible against changes in the structure of the web pages

[Diagram: Web wrapper]

• Internet
• Navigation module, which delivers HTML
• Extraction module
• Mapping module, which delivers XML
• Data verification
• SQL Database
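The extraction, mapping and verification steps can be sketched with the Python standard library. The page fragment, the cell classes `town` and `temp`, the XML element names, and the plausibility range are all invented for illustration; real source pages look different and the IRKR wrappers are not shown in the slides.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Invented page fragment standing in for a downloaded forecast page.
PAGE = ("<table><tr><td class='town'>Bratislava</td>"
        "<td class='temp'>32</td></tr></table>")


class CellExtractor(HTMLParser):
    """Extraction module: collect the text of <td> cells keyed by their class."""

    def __init__(self):
        super().__init__()
        self.current = None
        self.cells = {}

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self.current = dict(attrs).get("class")

    def handle_data(self, data):
        if self.current:
            self.cells[self.current] = data.strip()

    def handle_endtag(self, tag):
        if tag == "td":
            self.current = None


def wrap(page):
    """Extract, verify and map one forecast record to an XML string."""
    extractor = CellExtractor()
    extractor.feed(page)
    cells = extractor.cells
    # Data verification: reject missing fields and implausible temperatures.
    if "town" not in cells or "temp" not in cells:
        raise ValueError("page structure changed: expected cells not found")
    temperature = int(cells["temp"])
    if not -50 <= temperature <= 50:
        raise ValueError("implausible temperature: %d" % temperature)
    # Mapping module: emit a structured XML record.
    forecast = ET.Element("forecast", town=cells["town"])
    ET.SubElement(forecast, "temperature", unit="C").text = str(temperature)
    return ET.tostring(forecast, encoding="unicode")


record = wrap(PAGE)
```

Raising an error when the expected cells are missing is one simple way to notice that a source page has changed its structure instead of passing on wrong data.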

Pilot applications

• “Weather forecast in Slovakia”
– www.meteo.sk; www.shmu.sk
– weather forecast for about 80 Slovak district towns
– Place: district town or holiday locality
– Date: relative date / exact date
• “Timetable of Slovak Railways”
– www.cp.sk
– information about the Slovak railways timetable
– Starting place: railway station in Slovakia
– Destination place: railway station in Slovakia
– Date: relative date (today, tomorrow etc.) / absolute date (“the twentieth of December” etc.)
– Time: departure time (hour, minute)
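Both applications accept relative and absolute dates, so the recognized expression has to be resolved to a calendar date before the web source is queried. A minimal sketch, assuming English keywords and an ISO `YYYY-MM-DD` string for absolute dates; the real system works with spoken Slovak, and the table and function name here are hypothetical.

```python
import datetime

# Assumed relative expressions and their offsets in days.
RELATIVE_DAYS = {"today": 0, "tomorrow": 1, "the day after tomorrow": 2}


def resolve_date(expression, today):
    """Turn a relative or ISO-formatted absolute date expression into a date."""
    key = expression.strip().lower()
    if key in RELATIVE_DAYS:
        return today + datetime.timedelta(days=RELATIVE_DAYS[key])
    # Anything else must be an absolute date; strptime raises ValueError otherwise.
    return datetime.datetime.strptime(key, "%Y-%m-%d").date()


reference = datetime.date(2006, 11, 20)
tomorrow = resolve_date("tomorrow", reference)
absolute = resolve_date("2006-12-20", reference)
```

Passing `today` in as a parameter keeps the function deterministic, which makes the month and year boundaries easy to check.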

Realization of services

• available at: +421 55 602 2297, +421 2 5941 1118 (T-Com), +421 911 650 038 (T-Mobile), +421 918 717 491 (Orange), irkr_pub (Skype)
• IRKR on the web: irkr.fei.tuke.sk

Here we show a typical dialogue between the user (U) and the system (S):

S: Welcome to the IRKR portal. Would you like to play the introduction?
U: No.
S: Choose one of the services: Weather forecast or Railway timetable.
U: Weather forecast.
S: Please name a city and a day for which you want to get the weather forecast.
U: Bratislava, tomorrow.
S: Did you say Bratislava, tomorrow?
U: Yes.
S: The weather forecast for Bratislava for tomorrow is: sunny, 32 degrees Celsius...

Thank you for your attention
