VoiceXML - Kundan Singh

VoiceXML and Internet Telephony

Kundan Singh and Henning Schulzrinne
Columbia University
{kns10,hgs}@cs.columbia.edu

Joint work (in progress) with Daniel, Naho, Visda and Sean.

Overview

A language for specifying voice dialogs in interactive voice response systems.

• Information retrieval
  – News, sports, traffic, stock quotes
• e-business
  – Customer service, banking, stock trading
• Notification service

18 April, 2001 VoiceXML/Kundan Singh/Columbia University

PSTN based IVR Platform

[Diagram: End user, PSTN, IVR [1] platform reached at 1-212-8545224]

Example prompt: "Welcome to voice mail. Press 3 to listen to new messages..."

The platform:
• Receives incoming PSTN [5] call
• Responds back with prompts
• Accepts user input (DTMF or speech)
• Takes action based on user input (usually the service logic is programmed for the specific application, say a weather report)

IVR platform
• Voice and telephony functions (ASR [2], TTS [3], DTMF [4])
• Service logic (application specific)

[1] Interactive voice response
[2] Automated speech recognition
[3] Text to speech
[4] Dual tone multi-frequency (touch tone)
[5] Public switched telephone network
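The four platform steps can be sketched as a small, hard-coded service. This is only an illustration in Python: the function name `run_ivr` and the menu entries are hypothetical, and real telephony, ASR and TTS are replaced by plain strings.

```python
# Hypothetical hard-coded IVR service logic: DTMF digit -> spoken response.
# Only the welcome prompt comes from the slide; the menu entries are assumed.
WELCOME = "Welcome to voice mail. Press 3 to listen to new messages."

MENU = {
    "3": "Playing new messages.",
    "7": "Message deleted.",
}


def run_ivr(dtmf_digits):
    """Simulate one call and return the prompts 'spoken' to the caller."""
    spoken = [WELCOME]                      # respond to the call with a prompt
    for digit in dtmf_digits:               # accept user input (DTMF)
        action = MENU.get(digit)
        if action is None:
            spoken.append("Sorry, %s is not a valid option." % digit)
        else:
            spoken.append(action)           # take action based on the input
    return spoken
```

Because the menu is built into the platform code, changing the service means changing the platform itself.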

Decomposition

[Diagram: End user, PSTN, Voice gateway, Internet, Web server]

IVR platform
• Voice and telephony functions (ASR, TTS, DTMF)
• Service logic (application specific)

is split into:

Voice gateway
• Voice and telephony functions

Web server
• Service logic

VoiceXML

[Diagram: End user (PSTN), End user (HTML), Voice gateway, Internet, Web server, DB; the web server serves VXML, HTML, multimedia, scripts and audio/grammar files]

Voice gateway
• Voice and telephony functions
• VoiceXML browser

Web server
• Service logic (CGI, servlet, JSP)
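As a rough illustration of service logic living on the web server, the sketch below builds a minimal VoiceXML page in Python instead of CGI, a servlet or JSP. The helper name `build_vxml` is hypothetical; the `vxml`, `form` and `block` elements are the standard VoiceXML 1.0 ones.

```python
import xml.etree.ElementTree as ET


def build_vxml(prompt_text):
    """Return a minimal VoiceXML 1.0 document that speaks prompt_text."""
    vxml = ET.Element("vxml", {"version": "1.0"})
    form = ET.SubElement(vxml, "form")
    block = ET.SubElement(form, "block")
    block.text = prompt_text                # ElementTree escapes <, > and &
    return ET.tostring(vxml, encoding="unicode")


page = build_vxml("Hello, World!")
```

A voice gateway with a VoiceXML browser would fetch such a page over HTTP, exactly as a web browser fetches HTML.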

Why VoiceXML

• Programming voice services for telephony platforms
• Alternative: write a C/C++ application on the telephony platform?
• Separates application-specific service logic (HTML, VoiceXML) from user interaction (browser, IO device)
• Can use existing web development tools
• Can have a single application for both web and voice
• Can use existing infrastructure: HTTP, web servers, etc.

VoiceXML vs HTML

• Phone vs PC (IO device: phone)
• Transport: HTTP
• Voice browser vs web browser
• VoiceXML vs HTML form

HTML form:      Enter your Id: <input name='id'>
VoiceXML form:  <field name='id'> Your ID, please. </field>

VoiceXML examples [1]

Hello, World!

VoiceXML examples [2]

Welcome to the weather information service.
<field name="state">
  What state?
  <grammar src="state.gram" type="application/x-jsgf"/>
  <catch event="help">
    Please speak the state for which you want the weather.
  </catch>
</field>

VoiceXML examples [2] (continued)

<field name="city">
  What city?
  <help>
    Please speak the city for which you want the weather.
  </help>
</field>
<submit next="/servlet/weather" namelist="city state"/>

Grammar (state.gram): California | Illinois | New Jersey | New York
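To make the roles of the grammar and of `submit` concrete, here is a minimal Python sketch. It assumes a plain list of alternatives separated by `|` (a small subset of what a real grammar can express) and assumes the variables in `namelist` are sent as an HTTP GET query string. The function names are hypothetical.

```python
from urllib.parse import urlencode

# Alternatives copied from the grammar shown on the slide.
STATE_GRAMMAR = "California | Illinois | New Jersey | New York"


def parse_alternatives(grammar_text):
    """Split 'a | b | c' into a lookup from lower-cased phrase to phrase."""
    phrases = [p.strip() for p in grammar_text.split("|")]
    return {p.lower(): p for p in phrases if p}


def recognize(utterance, grammar_text):
    """Return the matching phrase, or None (a 'nomatch' in VoiceXML terms)."""
    normalized = " ".join(utterance.lower().split())
    return parse_alternatives(grammar_text).get(normalized)


def submit_url(next_url, namelist, variables):
    """Build the URL requested when the variables in namelist are submitted."""
    pairs = [(name, variables[name]) for name in namelist.split()]
    return next_url + "?" + urlencode(pairs)
```

For example, once both fields are filled, `submit_url("/servlet/weather", "city state", {"city": "Chicago", "state": "Illinois"})` gives the request the web server's service logic would receive.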

VoiceXML examples [3]

…
<grammar>
  visa {visa} | master [card] {mastercard} | amex {amex} | american [express] {amex}
</grammar>
Please say Visa, Mastercard, or American Express.
…
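The inline grammar uses `[word]` for an optional word and `{tag}` for the value returned on a match. The Python sketch below expands it into every accepted phrase. It is a simplified reading of that notation (single-word optional parts only, exactly one tag per alternative), not a full grammar parser, and `expand` is a hypothetical name.

```python
import itertools
import re

CARD_GRAMMAR = ("visa {visa} | master [card] {mastercard} | "
                "amex {amex} | american [express] {amex}")


def expand(grammar_text):
    """Map every phrase the grammar accepts to its {tag} value."""
    table = {}
    for alternative in grammar_text.split("|"):
        tag = re.search(r"\{(\w+)\}", alternative).group(1)
        tokens = re.sub(r"\{\w+\}", "", alternative).split()
        # A plain token has one option; an [optional] token has two:
        # the word itself, or nothing.
        options = [
            (tok[1:-1], "") if tok.startswith("[") and tok.endswith("]") else (tok,)
            for tok in tokens
        ]
        for combo in itertools.product(*options):
            phrase = " ".join(word for word in combo if word)
            table[phrase] = tag
    return table


CARD_TABLE = expand(CARD_GRAMMAR)
```

With this table, both "master" and "master card" fill the field with `mastercard`, and both "amex" and "american express" fill it with `amex`.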

VoiceXML examples [4]

Would you like coffee, tea, milk or nothing?
<option value="coffee">coffee</option>

VoiceXML examples [5]

<menu>
  Would you like coffee, tea, milk or nothing?
  <choice next="http://…coffee.vxml">coffee</choice>
  <choice next="…">tea</choice>
  <choice next="…">milk</choice>
  <choice next="…">nothing</choice>
  <nomatch count="1">I did not understand what you said. Please say one of coffee, tea, milk or nothing.</nomatch>
  <noinput>You must say something.</noinput>
</menu>

Alternatively: "Would you like <enumerate/>"
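The menu behaviour can be sketched in Python as follows. The exact wording a browser produces for `<enumerate/>` is platform dependent, so the comma-and-"or" join here is an assumption, as are the function names.

```python
CHOICES = ["coffee", "tea", "milk", "nothing"]


def enumerate_choices(choices):
    """Join the choices for a prompt, e.g. 'coffee, tea, milk or nothing'."""
    if not choices:
        return ""
    if len(choices) == 1:
        return choices[0]
    return ", ".join(choices[:-1]) + " or " + choices[-1]


def menu_turn(utterance, choices):
    """Classify one caller turn as a choice, a noinput or a nomatch event."""
    if utterance is None or not utterance.strip():
        return ("noinput", "You must say something.")
    said = utterance.strip().lower()
    if said in choices:
        return ("choice", said)
    return ("nomatch", "I did not understand what you said. "
                       "Please say one of " + enumerate_choices(choices))
```

A matched choice would then trigger the transition named by that choice's `next` attribute.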

Form Interpretation Algorithm

• Initialize variables, counters.
• Main loop
  – Select phase: select the next form item
  – Collect phase: prompt and collect input
  – Process phase: process the collected input or event
• Document: collection of forms
• An application can use multiple documents
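A heavily simplified Python sketch of the select, collect and process phases follows. It only handles fields with a fixed set of allowed values and models events as strings; the function name `run_form` and the data layout are assumptions, and the algorithm in the VoiceXML specification covers much more (guard conditions, event counters, filled actions).

```python
def run_form(fields, answers):
    """Simplified form interpretation loop.

    fields:  ordered list of (name, prompt, allowed_values)
    answers: iterable of caller inputs (None models silence)
    Returns (variables, transcript).
    """
    variables = {name: None for name, _, _ in fields}    # initialize
    transcript = []
    answers = iter(answers)
    while True:
        # Select phase: pick the first field that is still unfilled.
        pending = [f for f in fields if variables[f[0]] is None]
        if not pending:
            break
        name, prompt, allowed = pending[0]
        # Collect phase: play the prompt, then collect input.
        transcript.append(prompt)
        try:
            heard = next(answers)
        except StopIteration:
            break                                        # caller hung up
        # Process phase: fill the variable or record the event.
        if heard is None:
            transcript.append("noinput")
        elif heard in allowed:
            variables[name] = heard
        else:
            transcript.append("nomatch")
    return variables, transcript
```

A field that raised `nomatch` or `noinput` stays unfilled, so the next pass of the loop selects and prompts for it again.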

VoiceXML scope

• Human-Machine Interaction
  – Audio output (TTS, pre-recorded file)
  – Audio input (speech recognition, audio recording)
  – Character input (DTMF)
  – Presentation logic (scripting)
• Basic Connection Control
  – disconnect
  – transfer

Application scope

• General service logic
• State management
• Dialog generation
• Dialog sequencing
• Database operation

VoiceXML features

• Menus, forms, sub-dialogs
• Inputs (grammar, record, dtmf)
• Outputs (audio, text-to-speech)
• Events (error handling: nomatch, noinput, catch-throw)
• Variables and scripting (var, assign, if)
• Transitions or links (goto, submit)
• Transfer to 3rd party (also add third party)
• Disconnect the call
• Platform-specific object and property
• Pre-fetching

VoiceXML 1.0

Elements: assign, audio, block, break, catch, choice, clear, disconnect, div, dtmf, else, elseif, emp, enumerate, error, exit, field, filled, form, goto, grammar, help, if, initial, link, menu, meta, noinput, nomatch, object, option, param, property, pros, record, reprompt, return, sayas, script, subdialog, submit, throw, transfer, value, var, vxml

Element categories: telephony; speech synthesis or audio output; user input and grammar; program flow; variables and properties; error handling; misc.

Internet Telephony

[Diagram: End user, PSTN, Voice gateway, Internet, End user, Web server]

Voice gateway
• Voice and telephony function
• VoiceXML browser

Web server
• Service logic (CGI, servlet, JSP)

Internet Telephony

[Diagram: End user, PSTN, Voice gateway (PSTN/SIP), SIP phone, SIP user agent, VoiceXML browser with SIP (new module), Web server]

Web server
• CGI, servlet, JSP

Internet Telephony

Web server (CGI, servlet, JSP)
Examples: email by phone, voicemail by phone, directory services for the department, web browsing by phone (not WAP), …

VoiceXML browser with SIP
• Accept SIP connection
• Fetch XML page over HTTP
• Parse XML
• Interpret VoiceXML tags
• Do text-to-speech
• Receive and detect user input (DTMF, or in future speech)
• Parse according to the grammar
• Fetch audio file from web and play to the user
• …
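The parse and interpret steps of such a browser can be sketched in Python. This is only an illustration under several assumptions: the SIP and HTTP steps are left out (the page is an inline string), text-to-speech and audio playback are recorded as actions rather than performed, and the page content and function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical page, standing in for one fetched from the web server.
PAGE = """<vxml version="1.0">
  <form>
    <block>Welcome.</block>
    <field name="id">
      <prompt>Your ID, please.</prompt>
    </field>
    <block><audio src="bye.wav"/></block>
  </form>
</vxml>"""


def speak(element):
    """Turn element text into TTS actions and <audio> children into plays."""
    out = []
    if element.text and element.text.strip():
        out.append(("tts", element.text.strip()))
    for child in element:
        if child.tag == "audio":
            out.append(("play", child.get("src")))
    return out


def interpret(page_text, dtmf_inputs):
    """Walk the first form and list the actions a browser would take."""
    root = ET.fromstring(page_text)                 # parse XML
    inputs = iter(dtmf_inputs)
    actions = []
    for item in root.find("form"):                  # interpret VoiceXML tags
        if item.tag == "block":
            actions.extend(speak(item))
        elif item.tag == "field":
            prompt = item.find("prompt")
            if prompt is not None:
                actions.extend(speak(prompt))
            # Receive user input (DTMF digits) for this field.
            actions.append(("input", item.get("name"), next(inputs, None)))
    return actions
```

Calling `interpret(PAGE, ["1234"])` yields the welcome prompt, the ID prompt, the collected input and the audio file to play, in that order.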

Gateway, SIP phone
• SIP for signaling, RTP for audio, DTMF (either in-band audio tones or RFC 2833)
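When DTMF arrives as RFC 2833 events rather than in-band tones, each RTP payload is a 4-byte structure: an event code, an end bit, a reserved bit, a 6-bit volume, and a 16-bit duration. The Python sketch below decodes that payload; the function name is hypothetical and RTP header parsing is left out.

```python
import struct

# RFC 2833 event codes 0..15 map to the 16 DTMF keys in this order.
EVENT_TO_KEY = "0123456789*#ABCD"


def decode_telephone_event(payload):
    """Decode a 4-byte RFC 2833 telephone-event payload into a dict."""
    event, flags, duration = struct.unpack("!BBH", payload[:4])
    if event >= len(EVENT_TO_KEY):
        raise ValueError("not a DTMF event: %d" % event)
    return {
        "key": EVENT_TO_KEY[event],
        "end": bool(flags & 0x80),      # E bit: set on the final packets
        "volume": flags & 0x3F,         # 0..63, meaning 0 to -63 dBm0
        "duration": duration,           # in RTP timestamp units
    }
```

A browser would typically report the key to the dialog once, when it sees a packet with the end bit set, since the same event is repeated across several packets.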

Status

• Email by phone (using TellMe voice browser)
• VoiceXML browser: ongoing