Transcript VoiceXML - Kundan Singh
VoiceXML and Internet Telephony
Kundan Singh and Henning Schulzrinne Columbia University {kns10,hgs}@cs.columbia.edu
Joint work (in progress) with Daniel, Naho, Visda and Sean.
Overview
A language for specifying voice dialogs in interactive voice response systems
• Information retrieval
  – News, sports, traffic, stock quotes
• e-business
  – Customer service, banking, stock trading
• Notification service
18 April, 2001 VoiceXML/Kundan Singh/Columbia University
PSTN based IVR Platform
[Diagram: End user, PSTN, IVR [1] platform at 1-212-8545224. Prompt played to the caller: "Welcome to voice mail. Press 3 to listen to new messages..."]
• Receives incoming PSTN [5] call
• Responds back with prompts
• Accepts user input (DTMF or speech)
• Takes action based on user input
(Usually the service logic is programmed for the specific application, say weather report.)
IVR [1] platform
• Voice and telephony functions (ASR [2], TTS [3], DTMF [4])
• Service logic (application specific)
[1] Interactive voice response
[2] Automated speech recognition
[3] Text to speech
[4] Dual tone multi-frequency (touch tone)
[5] Public switched telephone network
Decomposition
[Diagram: End user, PSTN, voice gateway, Internet, web server; a second end user on the IVR platform]
Voice gateway
• Voice and telephony functions
IVR platform
• Voice and telephony functions (ASR, TTS, DTMF)
• Service logic (application specific)
Web server
• Service logic
VoiceXML
[Diagram: End user, PSTN, voice gateway, Internet, web server, DB; a second end user receives HTML]
Voice gateway
• Voice and telephony functions
• VoiceXML browser
Web server
• Service logic (CGI, servlet, JSP)
Content served: VXML, HTML, multimedia, scripts, audio/grammar
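As an illustration of service logic living on the web server, the following minimal Python sketch generates a VoiceXML page on the fly, much as a CGI script or servlet might. The function name `weather_page` and the weather wording are hypothetical; only the `vxml`, `form` and `block` elements come from VoiceXML itself.

```python
import xml.etree.ElementTree as ET

def weather_page(city, forecast):
    """Build a one-form VoiceXML document that speaks a forecast.
    Hypothetical helper: a real service would look the forecast up in a DB."""
    vxml = ET.Element("vxml", version="1.0")
    form = ET.SubElement(vxml, "form")
    # Text inside <block> is spoken to the caller by the voice browser (TTS).
    block = ET.SubElement(form, "block")
    block.text = f"The weather in {city} is {forecast}."
    return ET.tostring(vxml, encoding="unicode")

if __name__ == "__main__":
    print(weather_page("New York", "sunny"))
```

The voice gateway would fetch this page over HTTP exactly as a web browser fetches HTML, which is why existing web servers and tools can be reused.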
Why VoiceXML
• Alternative: write a C/C++ application on the telephony platform?
• Separates application-specific service logic (HTML, VoiceXML) from user interaction (browser, I/O device)
• Can use existing web development tools
• Can have a single application for both web and voice
• Can use existing infrastructure: HTTP, web servers, etc.
• Programming voice services for telephony platforms
VoiceXML vs HTML
• Phone vs PC; IO phone
• Transport: HTTP
• Voice browser vs web browser
• VoiceXML vs HTML form
[Side-by-side markup example not recovered; only the attribute name='id' survives from each fragment.]
VoiceXML examples [1]
VoiceXML examples [2]
Grammar (city.gram): California | Illinois | New Jersey | New York
VoiceXML examples [3]
<grammar> visa {visa} | master [card] {mastercard} | amex {amex} | american [express] {amex} </grammar>
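The grammar notation in these examples uses `|` for alternatives, `[word]` for an optional word, and `{tag}` for the value returned on a match. A minimal Python sketch of matching recognized text against such a rule follows. The helper names `parse_rule`, `expansions` and `match` are hypothetical, and the parser covers only this simple notation with single-word optionals, not a full grammar format.

```python
import itertools
import re

def parse_rule(rule):
    """Split a rule such as 'visa {visa} | master [card] {mastercard}'
    into alternatives of (tokens, tag); each token is (word, is_optional).
    Only single-word optionals like [card] are handled in this sketch."""
    alternatives = []
    for alt in rule.split("|"):
        tag_match = re.search(r"\{(.*?)\}", alt)
        tag = tag_match.group(1) if tag_match else None
        words = re.sub(r"\{.*?\}", "", alt).split()
        tokens = [(w.strip("[]"), w.startswith("[")) for w in words]
        alternatives.append((tokens, tag))
    return alternatives

def expansions(tokens):
    """Yield every lower-cased word sequence that one alternative allows."""
    options = [[[], [word]] if optional else [[word]] for word, optional in tokens]
    for combo in itertools.product(*options):
        yield [word.lower() for part in combo for word in part]

def match(alternatives, utterance):
    """Return the tag of the first matching alternative, the alternative's
    own words if it carries no tag, or None when nothing matches."""
    said = utterance.lower().split()
    for tokens, tag in alternatives:
        if said in expansions(tokens):
            return tag if tag is not None else " ".join(w for w, _ in tokens)
    return None

if __name__ == "__main__":
    cards = parse_rule("visa {visa} | master [card] {mastercard} | "
                       "amex {amex} | american [express] {amex}")
    print(match(cards, "master card"), match(cards, "american"))
```

A `None` result corresponds to the situation in which a voice browser would raise a nomatch event.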
VoiceXML examples [4]
VoiceXML examples [5]
<menu>
  <choice next="http://…coffee.vxml">coffee</choice>
  <nomatch count="1">I did not understand what you said.</nomatch>
  <noinput>You must say something.</noinput>
</menu>
Alternatively: "Would you like <enumerate/>"
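To make the role of `<enumerate/>` concrete, here is a small Python sketch that expands it into the list of choices and resolves a spoken choice to its `next` target. The `<prompt>` wrapper, the second `tea` choice, and the relative URIs `coffee.vxml` and `tea.vxml` are placeholders invented for the illustration (the URL on the slide is elided and is not reproduced here); `render_prompt` and `select` are hypothetical helper names.

```python
import xml.etree.ElementTree as ET

# Hypothetical menu; URIs are placeholders, not the elided URL from the slide.
MENU = """
<menu>
  <prompt>Would you like <enumerate/></prompt>
  <choice next="coffee.vxml">coffee</choice>
  <choice next="tea.vxml">tea</choice>
</menu>
"""

def render_prompt(menu_xml):
    """Return the prompt text with <enumerate/> replaced by the choice names."""
    menu = ET.fromstring(menu_xml)
    choices = [c.text.strip() for c in menu.findall("choice")]
    prompt = menu.find("prompt")
    parts = [prompt.text or ""]
    for child in prompt:
        if child.tag == "enumerate":
            parts.append(", ".join(choices))
        parts.append(child.tail or "")
    return "".join(parts).strip()

def select(menu_xml, utterance):
    """Return the 'next' URI of the choice the caller said, else None."""
    menu = ET.fromstring(menu_xml)
    for choice in menu.findall("choice"):
        if choice.text.strip().lower() == utterance.strip().lower():
            return choice.get("next")
    return None  # a voice browser would raise a nomatch event here

if __name__ == "__main__":
    print(render_prompt(MENU))
    print(select(MENU, "coffee"))
```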
Form Interpretation Algorithm
• Initialize variables, counters.
• Main loop
  – Select phase: select the next form item to visit
  – Collect phase: prompt and collect input
  – Process phase: process the collected input or event
• Document: collection of forms
• An application can use multiple documents
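The select, collect and process phases above can be sketched as a small Python loop. This is a simplified illustration, not the full algorithm from the VoiceXML specification: it assumes each form item is a field with a fixed set of accepted inputs, and the names `run_form`, `get_input`, `say` and `max_events` are hypothetical.

```python
def run_form(fields, get_input, say=print, max_events=3):
    """Fill every field of a form and return the collected values.

    fields: list of dicts with 'name', 'prompt' and 'accept' (set of inputs).
    get_input: callable returning the caller's input, or None for silence.
    """
    # Initialize variables and counters.
    values = {f["name"]: None for f in fields}
    counters = {f["name"]: 0 for f in fields}
    while True:
        # Select phase: the first field whose variable is still unfilled.
        item = next((f for f in fields if values[f["name"]] is None), None)
        if item is None:
            return values  # nothing left to visit, the form is complete
        # Collect phase: play the prompt and wait for input.
        say(item["prompt"])
        heard = get_input()
        # Process phase: fill the field, or handle a noinput/nomatch event.
        if heard is not None and heard in item["accept"]:
            values[item["name"]] = heard
            continue
        counters[item["name"]] += 1
        if counters[item["name"]] >= max_events:
            raise RuntimeError("too many failed attempts for " + item["name"])
        say("You must say something." if heard is None
            else "I did not understand what you said.")

if __name__ == "__main__":
    demo = [{"name": "card", "prompt": "Which card?", "accept": {"visa", "amex"}}]
    answers = iter([None, "visa"])
    print(run_form(demo, lambda: next(answers)))
```

Because the select phase always picks the first unfilled field, a failed attempt simply causes the same field to be prompted again on the next iteration.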
VoiceXML scope
• Human-Machine Interaction
  – Audio output (TTS, pre-recorded file)
  – Audio input (Speech recognition, audio recording)
  – Character input (DTMF)
  – Presentation logic (scripting)
• Basic Connection Control
  – disconnect
  – transfer
Application scope
• General service logic
• State management
• Dialog generation
• Dialog sequencing
• Database operation
VoiceXML features
• Menus, Forms, Sub-Dialogs
• Inputs (grammar, record, dtmf)
• Outputs (audio, text-to-speech)
• Events (error handling: nomatch, noinput, catch-throw)
• Variables and scripting (var, assign, if)
• Transition or links (goto, submit)
• Transfer to 3rd party (also add third party)
• Disconnect the call
• Platform specific object, and property
• Pre-fetching
VoiceXML 1.0
Elements: assign, audio, block, break, catch, choice, clear, disconnect, div, dtmf, else, elseif, emp, enumerate, error, exit, field, filled, form, goto, grammar, help, if, initial, link, menu, meta, noinput, nomatch, object, option, param, property, pros, record, reprompt, return, sayas, script, subdialog, submit, throw, transfer, value, var, vxml
Element categories: Telephony, Speech synthesis or audio output, User input and grammar, Program flow, Variables and properties, Error handling, Misc.
Internet Telephony
[Diagram: End user, PSTN, voice gateway, Internet, web server; a second end user on the web server side]
Voice gateway
• Voice and telephony functions
• VoiceXML browser
Web server
• Service logic (CGI, servlet, JSP)
Internet Telephony
[Diagram: End user, PSTN, voice gateway (PSTN/SIP), SIP phone, SIP user agent, web server]
New module: VoiceXML browser with SIP
Web server
• CGI, servlet, JSP
Internet Telephony
[Diagram: gateway, SIP phone, VoiceXML browser with SIP, web server (CGI, servlet, JSP)]
VoiceXML browser with SIP
• Accept SIP connection
• Fetch XML page over HTTP
• Parse XML
• Interpret VoiceXML tags
• Do text-to-speech
• Receive and detect user input (DTMF, or in future speech)
• Parse according to the grammar
• Fetch audio file from web and play to the user
. . .
SIP for signaling, RTP for audio, DTMF (either in-band audio tones or RFC 2833)
Examples: email by phone, voicemail by phone, directory services for department, web browsing by phone (not WAP), …
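For the RFC 2833 option, DTMF digits arrive as small RTP payloads instead of audio tones. The Python sketch below decodes the 4-byte telephone-event payload (event code, end bit, reserved bit, volume, duration) as laid out in RFC 2833. The function name `decode_telephone_event` is hypothetical, and the sketch covers only the DTMF event codes 0 to 15; RTP header parsing is left out.

```python
import struct

# RFC 2833 DTMF event codes: 0-9 are digits, 10 is '*', 11 is '#', 12-15 are A-D.
EVENT_TO_KEY = "0123456789*#ABCD"

def decode_telephone_event(payload):
    """Decode one RFC 2833 telephone-event payload (first 4 bytes)."""
    if len(payload) < 4:
        raise ValueError("telephone-event payload must be at least 4 bytes")
    # Network byte order: 8-bit event, 8-bit flags/volume, 16-bit duration.
    event, flags, duration = struct.unpack("!BBH", payload[:4])
    return {
        "key": EVENT_TO_KEY[event] if event < len(EVENT_TO_KEY) else None,
        "end": bool(flags & 0x80),   # E bit: set on the final packet of the event
        "volume": flags & 0x3F,      # low 6 bits; the R bit (0x40) is ignored
        "duration": duration,        # in RTP timestamp units
    }

if __name__ == "__main__":
    print(decode_telephone_event(bytes([3, 0x80 | 10, 0x03, 0x20])))
```

A browser would typically act on a key once per event, for example when it first sees a packet with the end bit set, since the same event is sent in several packets.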
Status
• Email by phone (using TellMe voice browser)
• VoiceXML browser: ongoing