MRCPv2 -- the End of Proprietary Speech APIs?

Daniel C. Burnett
SpeechTek West 2007
Overview
• What is MRCP?
• Why MRCP?
• Why MRCPv2?
• Why an IETF protocol?/Status
• Relationship to other standards
• Features of MRCP
• Sample call flow with ASR/TTS
What is MRCP?
• IETF Protocol allowing a client to control the server’s
ASR, TTS, Recording, and SIV resources
• A standard, programming-language agnostic API for using
ASR, TTS, and SIV resources
Why MRCP?
• Pre-MRCP
– Every ASR and TTS vendor has a proprietary API
– Some vendors support Microsoft’s SAPI
– Some vendors support JSAPI
• Today: every major ASR and TTS vendor supports MRCP
Why MRCPv2?
• MRCP v1
– Was designed by Cisco, Nuance, and SpeechWorks
– “Tunneled over” RTSP
– Published as an Informational RFC, not an IETF standard
(http://www.ietf.org/rfc/rfc4463.txt)
• MRCP v2
– Designed in a public forum by
• Multiple ASR/TTS vendors
• Multiple technology integrators
• Multiple VoiceXML implementers
– “Top-level” application protocol similar to HTTP
– IETF standards-track document
Why an IETF protocol?/Status
• IETF protocols are
– Implementation programming language agnostic
– Public
– Widely reviewed
– Well-respected
• Status
– Developed in the SPEECHSC Working Group (Real-time Applications
area)
– Published as a Working Group Last Call draft (http://www.ietf.org/internet-drafts/draft-ietf-speechsc-mrcpv2-11.txt)
Relationship to other standards
• TCP: carrier for MRCP messages
• SIP: used to set up calls
• RTP: carries MRCP-controlled media
• VoiceXML: higher-level language for ASR/TTS that is
often built on top of an MRCP client
• IMS: framework that allows mobile phones to use MRCP-controlled resources
• SRGS, SSML: ASR grammars and TTS controls that
MRCP clients can use to configure ASR/TTS resources
• TLS: secure alternative to TCP for carrying MRCP
Features of MRCP
• Control of
– Synthesizer resource
– Recognizer resource
– Recorder resource
– Speaker Identification and Verification resource
• Optional control channel sharing among resources
Synthesizer
• Two resource types
– “basicsynth”: concatenated audio clips only
– “speechsynth”: full SSML support
• Capabilities
– Start/stop/pause/resume speaking
– Optional stop on barge-in
– Live notification of <mark> encounters
Recognizer
• Two resource types
– “speechrecog”: full speech and DTMF recognition with user-enrolled
phrases
– “dtmfrecog”: DTMF digit string recognition only
• Capabilities
– Start/stop recognition
– Support for SRGS grammars
– Interpretation of text string
– Hotword mode capability (listen until match)
– Voice- (user-) enrolled phrases
– Recording of recognized audio
– Barge-in support
Recorder
• One resource type
– “recorder”
• Capabilities
– Start/stop recording
– Barge-in support
– Optional speech activity detection
– Optional automatic end trimming
SIV
• One resource type
– “speakverify”
• Capabilities
– Verification and identification using one or multiple utterances
– Simultaneous verification and recognition or recording
– Verification using live or buffered utterances
– Voiceprint creation, querying, and deletion
NLSML
• XML data format
• Carries results from the MRCP server
– Can store simultaneous recognition, enrollment, and verification
results
• W3C’s EMMA is a future replacement for this format
Sample call flow with ASR/TTS
• Setup
– Client contacts server using SIP
– Setup of synthesizer resource
– Setup of recognizer resource
• Play
– Client issues SPEAK request
– <mark> and SPEAK completion
• Play & Recognize (with barge-in)
– Client issues RECOGNIZE request
– Client issues bargeable SPEAK request
– Barge-in occurs
– Server returns result
• Teardown
– Client closes session
Client contacts server using SIP
C->S:
INVITE sip:[email protected] SIP/2.0
Max-Forwards:6
To:MediaServer <sip:[email protected]>
From:sarvi <sip:[email protected]>;tag=1928301774
Call-ID:a84b4c76e66710
CSeq:314159 INVITE
Contact:<sip:[email protected]>
Content-Type:application/sdp
Content-Length:142
v=0
o=sarvi 2890844526 2890842807 IN IP4 126.16.64.4
s=SDP Seminar
i=A session for processing media
c=IN IP4 224.2.17.12/127
S->C:
SIP/2.0 200 OK
To:MediaServer <sip:[email protected]>
From:sarvi <sip:[email protected]>;tag=1928301774
Call-ID:a84b4c76e66710
CSeq:314159 INVITE
Contact:<sip:[email protected]>
Content-Type:application/sdp
Content-Length:131
v=0
o=sarvi 2890844526 2890842807 IN IP4 126.16.64.4
s=SDP Seminar
i=A session for processing media
c=IN IP4 224.2.17.12/127
C->S:
ACK sip:[email protected] SIP/2.0
Max-Forwards:6
To:MediaServer <sip:[email protected]>;tag=a6c85cf
From:Sarvi <sip:[email protected]>;tag=1928301774
Call-ID:a84b4c76e66710
CSeq:314160 ACK
Content-Length:0
Setup of synthesizer resource
C->S:
INVITE sip:[email protected] SIP/2.0
Max-Forwards:6
To:MediaServer <sip:[email protected]>
From:sarvi <sip:[email protected]>;tag=1928301774
Call-ID:a84b4c76e66710
CSeq:314161 INVITE
Contact:<sip:[email protected]>
Content-Type:application/sdp
Content-Length:142
v=0
o=sarvi 2890844526 2890842808 IN IP4 126.16.64.4
s=SDP Seminar
i=A session for processing media
c=IN IP4 224.2.17.12/127
m=application 9 TCP/MRCPv2
a=setup:active
a=connection:existing
a=resource:speechsynth
a=cmid:1
m=audio 49170 RTP/AVP 0 96
a=rtpmap:0 pcmu/8000
a=recvonly
a=mid:1
S->C:
SIP/2.0 200 OK
To:MediaServer <sip:[email protected]>
From:sarvi <sip:[email protected]>;tag=1928301774
Call-ID:a84b4c76e66710
CSeq:314161 INVITE
Contact:<sip:[email protected]>
Content-Type:application/sdp
Content-Length:131
v=0
o=sarvi 2890844526 2890842808 IN IP4 126.16.64.4
s=SDP Seminar
i=A session for processing media
c=IN IP4 224.2.17.12/127
m=application 32416 TCP/MRCPv2
a=setup:passive
a=connection:existing
a=channel:32AECB23433801@speechsynth
a=cmid:1
m=audio 48260 RTP/AVP 0
a=rtpmap:0 pcmu/8000
a=sendonly
a=mid:1
C->S:
ACK sip:[email protected] SIP/2.0
Max-Forwards:6
To:MediaServer <sip:[email protected]>;tag=a6c85cf
From:Sarvi <sip:[email protected]>;tag=1928301774
Call-ID:a84b4c76e66710
CSeq:314162 ACK
Content-Length:0
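After this exchange the client needs two values from the server's SDP answer before it can send any MRCP message: the TCP port of the control channel and the channel identifier from the a=channel line. The following is a minimal Python sketch of that lookup, not a full SDP parser. The function names parse_media_sections and find_control_channel are hypothetical, and the SDP text is copied from the 200 OK above.

```python
def parse_media_sections(sdp):
    """Split an SDP body into one dict per m= line, with its a= attributes."""
    sections = []
    for raw in sdp.splitlines():
        line = raw.strip()
        if line.startswith("m="):
            media, port, proto = line[2:].split()[:3]
            sections.append({"media": media, "port": int(port),
                             "proto": proto, "attrs": {}})
        elif line.startswith("a=") and sections:
            # Attributes look like "name:value" or a bare flag such as "sendonly".
            name, _, value = line[2:].partition(":")
            sections[-1]["attrs"][name] = value
    return sections


def find_control_channel(sdp, resource):
    """Return (channel identifier, server TCP port) for one resource type."""
    for section in parse_media_sections(sdp):
        channel = section["attrs"].get("channel", "")
        if section["proto"] == "TCP/MRCPv2" and channel.endswith("@" + resource):
            return channel, section["port"]
    raise LookupError("no %s control channel in SDP" % resource)


# SDP answer taken from the S->C 200 OK on this slide.
ANSWER_SDP = """\
v=0
o=sarvi 2890844526 2890842808 IN IP4 126.16.64.4
s=SDP Seminar
i=A session for processing media
c=IN IP4 224.2.17.12/127
m=application 32416 TCP/MRCPv2
a=setup:passive
a=connection:existing
a=channel:32AECB23433801@speechsynth
a=cmid:1
m=audio 48260 RTP/AVP 0
a=rtpmap:0 pcmu/8000
a=sendonly
a=mid:1
"""

channel, port = find_control_channel(ANSWER_SDP, "speechsynth")
print(channel, port)
```

The channel identifier returned here is the value that later appears in the Channel-Identifier header of every MRCP message for this resource.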
Setup of recognizer resource
C->S:
INVITE sip:[email protected] SIP/2.0
Max-Forwards:6
To:MediaServer <sip:[email protected]>
From:sarvi <sip:[email protected]>;tag=1928301774
Call-ID:a84b4c76e66710
CSeq:314163 INVITE
Contact:<sip:[email protected]>
Content-Type:application/sdp
Content-Length:142
v=0
o=sarvi 2890844526 2890842809 IN IP4 126.16.64.4
s=SDP Seminar
i=A session for processing media
c=IN IP4 224.2.17.12/127
m=application 9 TCP/MRCPv2
a=setup:active
a=connection:existing
a=resource:speechsynth
a=cmid:1
m=audio 49170 RTP/AVP 0 96
a=rtpmap:0 pcmu/8000
a=recvonly
a=mid:1
m=application 9 TCP/MRCPv2
a=setup:active
a=connection:existing
a=resource:speechrecog
a=cmid:2
m=audio 49180 RTP/AVP 0 96
a=rtpmap:0 pcmu/8000
a=rtpmap:96 telephone-event/8000
a=fmtp:96 0-15
a=sendonly
a=mid:2
S->C:
SIP/2.0 200 OK
To:MediaServer <sip:[email protected]>
From:sarvi <sip:[email protected]>;tag=1928301774
Call-ID:a84b4c76e66710
CSeq:314163 INVITE
Contact:<sip:[email protected]>
Content-Type:application/sdp
Content-Length:131
v=0
o=sarvi 2890844526 2890842809 IN IP4 126.16.64.4
s=SDP Seminar
i=A session for processing media
c=IN IP4 224.2.17.12/127
m=application 32416 TCP/MRCPv2
a=channel:32AECB23433801@speechsynth
a=cmid:1
m=audio 48260 RTP/AVP 0
a=rtpmap:0 pcmu/8000
a=sendonly
a=mid:1
m=application 32416 TCP/MRCPv2
a=channel:32AECB23433801@speechrecog
a=cmid:2
m=audio 48260 RTP/AVP 0
a=rtpmap:0 pcmu/8000
a=rtpmap:96 telephone-event/8000
a=fmtp:96 0-15
a=recvonly
a=mid:2
Note: final C->S ACK not shown
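With two resources in one session, each control channel is tied to its audio stream by matching a=cmid on the m=application section against a=mid on the m=audio section. The Python sketch below illustrates that pairing on the answer shown above. The function name pair_channels_with_audio and the shape of the returned dictionary are assumptions made for illustration only.

```python
DIRECTIONS = ("sendonly", "recvonly", "sendrecv", "inactive")


def pair_channels_with_audio(sdp):
    """Map each MRCPv2 channel identifier to the audio stream it controls."""
    sections = []
    for raw in sdp.splitlines():
        line = raw.strip()
        if line.startswith("m="):
            media, port = line[2:].split()[:2]
            sections.append({"media": media, "port": int(port), "attrs": {}})
        elif line.startswith("a=") and sections:
            name, _, value = line[2:].partition(":")
            # Keep the first value only; repeated attributes such as rtpmap
            # are not needed for the pairing.
            sections[-1]["attrs"].setdefault(name, value)

    # Index the audio streams by their a=mid value.
    audio_by_mid = {}
    for section in sections:
        attrs = section["attrs"]
        if section["media"] == "audio" and "mid" in attrs:
            # SDP treats a stream with no direction attribute as sendrecv.
            direction = next((d for d in DIRECTIONS if d in attrs), "sendrecv")
            audio_by_mid[attrs["mid"]] = {"port": section["port"],
                                          "direction": direction}

    # Resolve each control channel's a=cmid against that index.
    pairs = {}
    for section in sections:
        attrs = section["attrs"]
        if section["media"] == "application" and "channel" in attrs:
            stream = audio_by_mid.get(attrs.get("cmid"))
            if stream is not None:
                pairs[attrs["channel"]] = stream
    return pairs


# SDP answer taken from the S->C 200 OK on this slide.
ANSWER_SDP = """\
v=0
o=sarvi 2890844526 2890842809 IN IP4 126.16.64.4
s=SDP Seminar
i=A session for processing media
c=IN IP4 224.2.17.12/127
m=application 32416 TCP/MRCPv2
a=channel:32AECB23433801@speechsynth
a=cmid:1
m=audio 48260 RTP/AVP 0
a=rtpmap:0 pcmu/8000
a=sendonly
a=mid:1
m=application 32416 TCP/MRCPv2
a=channel:32AECB23433801@speechrecog
a=cmid:2
m=audio 48260 RTP/AVP 0
a=rtpmap:0 pcmu/8000
a=rtpmap:96 telephone-event/8000
a=fmtp:96 0-15
a=recvonly
a=mid:2
"""

pairs = pair_channels_with_audio(ANSWER_SDP)
for channel_id, stream in pairs.items():
    print(channel_id, stream)
```

Seen from the server, the synthesizer stream is sendonly (the server plays audio) and the recognizer stream is recvonly (the server listens), which matches the answer on the slide.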
Client issues SPEAK request
C->S:
MRCP/2.0 386 SPEAK 543257
Channel-Identifier:32AECB23433801@speechsynth
Kill-On-Barge-In:false
Voice-gender:neutral
Voice-age:25
Prosody-volume:medium
Content-Type:application/ssml+xml
Content-Length:104

<?xml version="1.0"?>
<speak version="1.0"
    xmlns="http://www.w3.org/2001/10/synthesis"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2001/10/synthesis
      http://www.w3.org/TR/speech-synthesis/synthesis.xsd"
    xml:lang="en-US">
  <p>
    <s>You have 4 new messages.</s>
    <s>The first is from Stephanie Williams
      <mark name="Stephanie"/>
      and arrived at <break/>
      <say-as interpret-as="vxml:time">0345p</say-as>.</s>
    <s>The subject is <prosody
      rate="-20%">ski trip</prosody></s>
  </p>
</speak>

S->C:
MRCP/2.0 49 543257 200 IN-PROGRESS
Channel-Identifier:32AECB23433801@speechsynth
Speech-Marker:timestamp=857205015059
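The second field of the request line is the length of the whole message in bytes, and that count includes the request line itself. A client therefore cannot write the length until it knows how many digits the length takes. The Python sketch below shows one way to handle this by iterating until the value is stable. The function name build_request is hypothetical, the SSML body is shortened, and the lengths are computed rather than copied from the slide.

```python
def build_request(method, request_id, headers, body=b""):
    """Serialize an MRCPv2 request with a self-consistent message length."""
    header_block = "".join("%s:%s\r\n" % (name, value) for name, value in headers)
    if body:
        header_block += "Content-Length:%d\r\n" % len(body)
    # Headers, the blank line that ends them, then the optional body.
    rest = header_block.encode("utf-8") + b"\r\n" + body

    # The length field counts its own digits, so repeat until it stops changing.
    length = len(rest)
    while True:
        start_line = ("MRCP/2.0 %d %s %d\r\n" % (length, method, request_id)).encode("ascii")
        total = len(start_line) + len(rest)
        if total == length:
            return start_line + rest
        length = total


# Shortened SSML document, for illustration only.
SSML = (b'<?xml version="1.0"?>'
        b'<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" '
        b'xml:lang="en-US"><p><s>You have 4 new messages.</s></p></speak>')

message = build_request(
    "SPEAK",
    543257,
    [("Channel-Identifier", "32AECB23433801@speechsynth"),
     ("Kill-On-Barge-In", "false"),
     ("Content-Type", "application/ssml+xml")],
    SSML,
)
print(message.decode("utf-8"))
```

The loop settles quickly because the digit count of the length changes only when the total crosses a power of ten.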
<mark> and SPEAK completion
S->C: MRCP/2.0 46 SPEECH-MARKER 543257 IN-PROGRESS
Channel-Identifier:32AECB23433801@speechsynth
Speech-Marker:timestamp=857206027059;Stephanie

S->C: MRCP/2.0 48 SPEAK-COMPLETE 543257 COMPLETE
Channel-Identifier:32AECB23433801@speechsynth
Speech-Marker:timestamp=857207685213;Stephanie
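The three kinds of MRCPv2 message seen so far can be told apart by their first line alone: a request has four fields, while a response and an event both have five and differ in whether the third field is a request ID or an event name. The Python sketch below classifies the start lines from these slides. The function name parse_start_line and the dictionary keys are illustrative assumptions, and no header parsing is attempted.

```python
def parse_start_line(line):
    """Classify an MRCPv2 start line as a request, response, or event."""
    fields = line.split()
    if len(fields) not in (4, 5) or fields[0] != "MRCP/2.0":
        raise ValueError("not an MRCPv2 start line: %r" % line)
    length = int(fields[1])
    if len(fields) == 4:
        # MRCP/2.0 <length> <method> <request-id>
        return {"kind": "request", "length": length,
                "method": fields[2], "request_id": int(fields[3])}
    if fields[2].isdigit():
        # MRCP/2.0 <length> <request-id> <status-code> <request-state>
        return {"kind": "response", "length": length,
                "request_id": int(fields[2]), "status": int(fields[3]),
                "state": fields[4]}
    # MRCP/2.0 <length> <event-name> <request-id> <request-state>
    return {"kind": "event", "length": length, "event": fields[2],
            "request_id": int(fields[3]), "state": fields[4]}


request = parse_start_line("MRCP/2.0 386 SPEAK 543257")
response = parse_start_line("MRCP/2.0 49 543257 200 IN-PROGRESS")
marker = parse_start_line("MRCP/2.0 46 SPEECH-MARKER 543257 IN-PROGRESS")
done = parse_start_line("MRCP/2.0 48 SPEAK-COMPLETE 543257 COMPLETE")

for parsed in (request, response, marker, done):
    print(parsed["kind"], parsed["request_id"])
```

All four lines carry request ID 543257, which is how the client ties the SPEECH-MARKER and SPEAK-COMPLETE events back to the original SPEAK request.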
Client issues RECOGNIZE request
C->S: MRCP/2.0 343 RECOGNIZE 543258
Channel-Identifier:32AECB23433801@speechrecog
Content-Type:application/srgs+xml
Content-Length:104

<?xml version="1.0"?>
<!-- the default grammar language is US English -->
<grammar xmlns="http://www.w3.org/2001/06/grammar"
         xml:lang="en-US" version="1.0" root="request">
  <!-- single language attachment to a rule expansion -->
  <rule id="request">
    Can I speak to
    <one-of xml:lang="fr-CA">
      <item>Michel Tremblay</item>
      <item>Andre Roy</item>
    </one-of>
  </rule>
</grammar>

S->C: MRCP/2.0 49 543258 200 IN-PROGRESS
Channel-Identifier:32AECB23433801@speechrecog
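The number after MRCP/2.0 in a request start line is the length of the whole message, start line included, so its value depends on its own digit count. A rough Python sketch of one way a client could frame such a request follows; the helper name `build_request` is hypothetical, and the lengths are computed from the actual bytes rather than copied from the illustrative figures on the slide.

```python
def build_request(method, request_id, headers, body=b""):
    """Frame an MRCPv2 request as bytes (sketch, not a full implementation).

    headers is a list of (name, value) pairs; Content-Length is added
    automatically when a body is present.
    """
    crlf = "\r\n"
    header_lines = [f"{name}:{value}" for name, value in headers]
    if body:
        header_lines.append(f"Content-Length:{len(body)}")
    # Headers end with a blank line, then the body follows.
    head = (crlf.join(header_lines) + crlf + crlf).encode("utf-8")

    # The message-length field counts its own digits, so iterate
    # until the declared length matches the real length.
    length = 0
    while True:
        start_line = f"MRCP/2.0 {length} {method} {request_id}{crlf}".encode("utf-8")
        total = len(start_line) + len(head) + len(body)
        if total == length:
            break
        length = total
    return start_line + head + body


# Placeholder grammar body; a real client would send the SRGS document.
message = build_request(
    "RECOGNIZE",
    543258,
    [("Channel-Identifier", "32AECB23433801@speechrecog"),
     ("Content-Type", "application/srgs+xml")],
    b"<grammar/>",
)
print(message.decode("utf-8"))
```

The same framing would apply to SPEAK and the other requests in this call flow; only the method name, headers, and body change.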
Client issues bargeable SPEAK request
C->S: MRCP/2.0 289 SPEAK 543259
Channel-Identifier:32AECB23433801@speechsynth
Kill-On-Barge-In:true
Content-Type:application/ssml+xml
Content-Length:104

<?xml version="1.0"?>
<speak version="1.0"
       xmlns="http://www.w3.org/2001/10/synthesis"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="http://www.w3.org/2001/10/synthesis
       http://www.w3.org/TR/speech-synthesis/synthesis.xsd"
       xml:lang="en-US">
  <p>
    <s>Welcome to ABC corporation.</s>
    <s>Who would you like to talk to?</s>
  </p>
</speak>

S->C: MRCP/2.0 52 543259 200 IN-PROGRESS
Channel-Identifier:32AECB23433801@speechsynth
Speech-Marker:timestamp=857207696314
Barge-in occurs
S->C: MRCP/2.0 49 START-OF-INPUT 543258 IN-PROGRESS
Channel-Identifier:32AECB23433801@speechrecog
Proxy-Sync-Id:987654321

C->S: MRCP/2.0 69 BARGE-IN-OCCURRED 543259
Channel-Identifier:32AECB23433801@speechsynth
Proxy-Sync-Id:987654321

S->C: MRCP/2.0 72 543259 200 COMPLETE
Channel-Identifier:32AECB23433801@speechsynth
Active-Request-Id-List:543258
Speech-Marker:timestamp=857206096314

S->C: MRCP/2.0 73 SPEAK-COMPLETE 543259 COMPLETE
Channel-Identifier:32AECB23433801@speechsynth
Completion-Cause:001 barge-in
Speech-Marker:timestamp=857207685213

• Recognizer (MRCP server) sends start of input to client when input is detected
• MRCP client notifies synthesizer (MRCP server) that barge-in has occurred
• Because Kill-On-Barge-In was set to true, the synthesizer stops playing
• Note that combined ASR/TTS resources can sometimes automatically terminate playback sooner
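The client's part in this exchange is to turn a START-OF-INPUT event from the recognizer into a BARGE-IN-OCCURRED request to the synthesizer, carrying over the Proxy-Sync-Id. A small Python sketch of that step is below; `parse_event` and `barge_in_request` are hypothetical helpers, and the request is returned as a plain dictionary rather than wire bytes to keep the example short.

```python
def parse_event(text):
    """Parse an MRCPv2 event into its name, request id, state, and headers.

    Assumes the start-line layout shown on the slide:
    MRCP/2.0 <length> <event-name> <request-id> <request-state>
    """
    lines = text.strip().splitlines()
    start = lines[0].split()
    event = {"name": start[2], "request_id": start[3],
             "state": start[4], "headers": {}}
    for line in lines[1:]:
        if ":" in line:
            name, value = line.split(":", 1)
            event["headers"][name.strip()] = value.strip()
    return event


def barge_in_request(event, synth_channel, request_id):
    """Return a BARGE-IN-OCCURRED request description, or None if not needed."""
    if event["name"] != "START-OF-INPUT":
        return None
    headers = [("Channel-Identifier", synth_channel)]
    sync_id = event["headers"].get("Proxy-Sync-Id")
    if sync_id is not None:
        # Echo the recognizer's Proxy-Sync-Id to the synthesizer.
        headers.append(("Proxy-Sync-Id", sync_id))
    return {"method": "BARGE-IN-OCCURRED",
            "request_id": request_id,
            "headers": headers}


# Event text copied from the slide, without the 'S->C:' direction label.
start_of_input = parse_event(
    "MRCP/2.0 49 START-OF-INPUT 543258 IN-PROGRESS\n"
    "Channel-Identifier:32AECB23433801@speechrecog\n"
    "Proxy-Sync-Id:987654321\n"
)
request = barge_in_request(start_of_input, "32AECB23433801@speechsynth", 543259)
print(request)
```

Whether the synthesizer then stops depends on the Kill-On-Barge-In value sent with the SPEAK request, as the notes above describe.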
Server returns result
S->C: MRCP/2.0 412 RECOGNITION-COMPLETE 543258 COMPLETE
Channel-Identifier:32AECB23433801@speechrecog
Completion-Cause:000 success
Waveform-URI:<http://web.media.com/session123/audio.wav>;size=423523;duration=25432
Content-Type:application/nlsml+xml
Content-Length:104

<?xml version="1.0"?>
<result xmlns="http://www.ietf.org/xml/ns/mrcpv2"
        xmlns:ex="http://www.example.com/example"
        grammar="session:[email protected]">
  <interpretation>
    <instance name="Person">
      <ex:Person>
        <ex:Name> Andre Roy </ex:Name>
      </ex:Person>
    </instance>
    <input> Can I speak to Andre Roy </input>
  </interpretation>
</result>
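The NLSML body above can be read with any namespace-aware XML parser. The following Python sketch uses the standard library's ElementTree to pull out the recognized text and the name; the function `first_interpretation` is hypothetical, the `ex:` elements are specific to this example's grammar, and the `grammar` attribute is left out of the sample because its value is not legible in the source.

```python
import xml.etree.ElementTree as ET

# Namespace prefixes used in the XPath-style lookups below.
NS = {
    "mrcp": "http://www.ietf.org/xml/ns/mrcpv2",
    "ex": "http://www.example.com/example",
}

# Result body from the slide (grammar attribute omitted, see lead-in).
SAMPLE = """<?xml version="1.0"?>
<result xmlns="http://www.ietf.org/xml/ns/mrcpv2"
        xmlns:ex="http://www.example.com/example">
  <interpretation>
    <instance name="Person">
      <ex:Person>
        <ex:Name> Andre Roy </ex:Name>
      </ex:Person>
    </instance>
    <input> Can I speak to Andre Roy </input>
  </interpretation>
</result>"""


def first_interpretation(nlsml):
    """Return the input text and person name of the first interpretation."""
    root = ET.fromstring(nlsml)
    interp = root.find("mrcp:interpretation", NS)
    if interp is None:
        return None
    heard = interp.findtext("mrcp:input", default="", namespaces=NS).strip()
    name = interp.findtext("mrcp:instance/ex:Person/ex:Name",
                           default="", namespaces=NS).strip()
    return {"input": heard, "name": name}


parsed = first_interpretation(SAMPLE)
print(parsed)
```

A VoiceXML interpreter or other MRCP client would typically map the instance content onto its own application variables at this point.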
Client closes session
C->S: BYE sip:[email protected] SIP/2.0
Max-Forwards:6
From:Sarvi <sip:[email protected]>;tag=a6c85cf
To:MediaServer <sip:[email protected]>;tag=1928301774
Call-ID:a84b4c76e66710
CSeq:231 BYE
Content-Length:0
• Dan Burnett
• [email protected]