Beyond VoiceXML: New Standard Languages from the World

A Standard for Developing Multimodal Applications
James A. Larson
Larson Technical Services
jim@larson-tech.com
SpeechTEK West
February 23, 2007
Status of W3C Multimodal Interface Languages
[Chart: W3C specifications placed along the maturity stages Requirements, Working Draft, Last Call Working Draft, Candidate Recommendation, Proposed Recommendation, and Recommendation. Specifications shown: Speech Synthesis Markup Language (SSML) 1.0; Speech Recognition Grammar Specification (SRGS) 1.0; Semantic Interpretation for Speech Recognition (SISR) 1.0; VoiceXML 2.0; VoiceXML 2.1; Extensible MultiModal Annotation (EMMA) 1.0; State Chart XML (SCXML) 1.0; InkML 1.0.]
Developing & Delivering Multimodal Applications
Interaction Manager Approaches
• SALT: Interaction Manager (XHTML) with SALT
• Object-oriented: Interaction Manager (C#) with SAPI 5.3
• X+V: Interaction Manager (XHTML) with VoiceXML 2.0 modules
• W3C: Interaction Manager (SCXML) and Data Model, with XHTML, VoiceXML 3.0, and InkML modality components
Standard Languages
• SALT
– Standard languages: XHTML, SRGS, SSML
– Interaction Manager: XHTML
– Modes: GUI, speech
• Object-oriented
– Standard languages: SRGS, SSML
– Interaction Manager: C#
– Modes: GUI, speech
• X+V
– Standard languages: VoiceXML, SRGS, SSML, SISR, XHTML
– Interaction Manager: XHTML
– Modes: GUI, speech
• W3C
– Standard languages: SCXML, SRGS, VoiceXML, SSML, SISR, XHTML, EMMA, CCXML
– Interaction Manager: SCXML
– Modes: GUI, speech, ink, …
MMI Architecture: Basic Components
• Interaction Manager: coordinates modality components and provides application flow
• Modality Components: provide modality capabilities such as speech, pen, keyboard, and mouse
• Data Model: handles shared data
[Diagram: the Interaction Manager (SCXML) and Data Model, connected to XHTML, VoiceXML 3.0, and InkML modality components]
Multimodal Architecture and Interfaces
• A loosely coupled, event-based architecture for integrating multiple modalities into applications
• All communication is event-based
• Based on a set of standard life-cycle events
• Components can also expose other events as required
• Encapsulation protects component data
• Encapsulation enhances extensibility to new modalities
• Can be used outside a Web environment
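The loosely coupled, event-based arrangement above can be made concrete with a small simulation. The following Python sketch models a hub-and-spoke design: each component keeps its field values private and exchanges only events, and the Interaction Manager forwards every event to all other registered components. The class names, the Event record, and the broadcast routing policy are illustrative assumptions, not definitions from the W3C architecture drafts.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Event:
    """Hypothetical event record: name, sending component, payload."""
    name: str
    source: str
    data: dict = field(default_factory=dict)


class InteractionManager:
    """Sole router of events; components never reference each other."""

    def __init__(self):
        self.components = {}  # component id -> component object
        self.queue = deque()  # pending events, delivered first in, first out
        self.log = []         # (event name, source, receiver) per delivery

    def register(self, component):
        # Supporting a new modality only means registering another component.
        self.components[component.cid] = component
        component.im = self

    def post(self, event):
        self.queue.append(event)

    def run(self):
        # Assumed routing policy: forward each event to every component
        # except the one that sent it.
        while self.queue:
            event = self.queue.popleft()
            for cid, component in self.components.items():
                if cid != event.source:
                    self.log.append((event.name, event.source, cid))
                    component.receive(event)


class ModalityComponent:
    """A modality whose field values are private (encapsulated)."""

    def __init__(self, cid):
        self.cid = cid
        self.im = None
        self._fields = {}  # changed only by this component or by events

    def receive(self, event):
        if event.name == "data":
            self._fields.update(event.data)

    def user_input(self, name, value):
        # Record the input locally, then notify the IM rather than a peer.
        self._fields[name] = value
        self.im.post(Event("data", self.cid, {name: value}))

    def value(self, name):
        return self._fields.get(name)
```

Registering a third component, for example an ink modality, requires no change to the voice or GUI components, which is one way to read the claim that encapsulation eases extension to new modalities. A real Interaction Manager would apply application-specific routing instead of broadcasting.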
Specify Interaction Manager Using Harel State Charts
• Extension of state transition systems
– States
– Transitions
• Nested state-transition systems
• Parallel state-transition systems
• History
[Diagram: PrepareState moves to StartState on Prepare Response (success) or to FailState on Prepare Response (fail); StartState moves to WaitState on Start Response or to FailState on StartFail; WaitState moves to EndState on Done Success or to FailState on DoneFail.]
Example State Transition System
[Diagram: the PrepareState, StartState, WaitState, FailState, and EndState chart, annotated with its SCXML encoding]
State Chart XML (SCXML):
…
<state id="PrepareState">
  <onentry>
    <!-- ask the modality component to prepare the dialog in hello.vxml -->
    <send event="prepare" contentURL="hello.vxml"/>
  </onentry>
  <transition event="prepareResponse" cond="status='success'" target="StartState"/>
  <transition event="prepareResponse" cond="status='failure'" target="FailState"/>
</state>
…
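The chart and the SCXML fragment amount to a transition table. As a minimal Python sketch, assuming that events arrive as (name, status) pairs and that the start and done events are called startResponse and done (names chosen here for illustration; only prepareResponse appears in the SCXML above), the same chart can be run directly:

```python
# Transition table transcribed from the chart: (state, event, status) -> target.
# "prepareResponse" and its status values come from the SCXML example; the
# "startResponse" and "done" names are assumptions standing in for the chart's
# Start Response/StartFail and Done Success/DoneFail arrows.
TRANSITIONS = {
    ("PrepareState", "prepareResponse", "success"): "StartState",
    ("PrepareState", "prepareResponse", "failure"): "FailState",
    ("StartState", "startResponse", "success"): "WaitState",
    ("StartState", "startResponse", "failure"): "FailState",
    ("WaitState", "done", "success"): "EndState",
    ("WaitState", "done", "failure"): "FailState",
}
FINAL_STATES = {"EndState", "FailState"}


def run_chart(events, state="PrepareState"):
    """Feed (event, status) pairs to the chart and return the states visited."""
    visited = [state]
    for event, status in events:
        if state in FINAL_STATES:
            break  # no transitions leave a final state
        target = TRANSITIONS.get((state, event, status))
        if target is not None:  # unmatched events are discarded, as in SCXML
            state = target
            visited.append(state)
    return visited
```

A table-driven loop keeps the chart's structure visible; SCXML adds what this sketch omits, such as nested and parallel states and history.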
Example State Chart with Parallel States
[Diagram: a parallel state with two regions, Voice and GUI, each running the same chart. In each region, Prepare goes to Start on Prepare Response Success or to Fail on Prepare Response Fail, Start goes to Wait on Start Response, and Wait goes to End on Done Success. The Voice region's states are Prepare Voice, Start Voice, Wait Voice, End Voice, and Fail Voice; the GUI region's states are Prepare GUI, Start GUI, Wait GUI, End GUI, and Fail GUI.]
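A parallel state keeps one active state per region and completes only when every region has reached a final state. The Python sketch below assumes, for simplicity, that each event is addressed to a single region; in SCXML an event is offered to all active regions. The ParallelState class and the event names are hypothetical.

```python
# Per-region chart from the slide, shared by the Voice and GUI regions.
REGION_CHART = {
    ("Prepare", "prepareResponse", "success"): "Start",
    ("Prepare", "prepareResponse", "fail"): "Fail",
    ("Start", "startResponse", "success"): "Wait",
    ("Wait", "done", "success"): "End",
}
REGION_FINAL = {"End", "Fail"}


class ParallelState:
    """Hypothetical model of an SCXML <parallel>: all regions active at once."""

    def __init__(self, regions):
        # Entering the parallel state enters every region's initial state.
        self.states = {region: "Prepare" for region in regions}

    def dispatch(self, region, event, status):
        # Simplification: each event names the region it is meant for.
        current = self.states[region]
        if current not in REGION_FINAL:
            self.states[region] = REGION_CHART.get((current, event, status), current)

    def is_done(self):
        # The parallel state completes only when every region is final.
        return all(state in REGION_FINAL for state in self.states.values())
```

With this model, the voice region can finish its dialog while the GUI region is still preparing, and the Interaction Manager moves on only after both regions have ended or failed.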
The Life Cycle Events
[Diagram: life-cycle events passing between the SCXML Interaction Manager and the XHTML and VoiceXML modality components]
More Life Cycle Events
[Diagram: further life-cycle events passing between the SCXML Interaction Manager and the XHTML and VoiceXML modality components]
Synchronization Using the Lifecycle Data Event
• Intent-based events
– Capture the underlying intent rather than the physical manifestation of user-interface events
– Independent of the physical characteristics of particular devices
• Data/reset
– Reset one or more field values to null
• Data/focus
– Focus on another field
• Data/change
– Field value has changed
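Intent-based data events can be applied to a shared data model without knowing which device produced them. A minimal Python sketch, assuming payload dictionaries with field, fields, and value keys (an illustrative format, not one defined by the MMI drafts) and a hypothetical DataModel class:

```python
class DataModel:
    """Hypothetical shared data model updated only through data events."""

    def __init__(self, fields):
        self.values = {name: None for name in fields}
        self.focus = None

    def apply(self, event, payload):
        if event == "data/reset":
            # Reset one or more field values to null (None in Python).
            for name in payload["fields"]:
                self.values[name] = None
        elif event == "data/focus":
            # Move the focus to another field.
            self.focus = payload["field"]
        elif event == "data/change":
            # A field value has changed, whichever device changed it.
            self.values[payload["field"]] = payload["value"]
        else:
            raise ValueError(f"unknown data event: {event}")
```

Whether the city was spoken or typed, the modality sends the same data/change event, which is the device independence the slide describes.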
Lifecycle Events between Interaction Manager and Modality
[Diagram: the Interaction Manager's state chart (PrepareState, StartState, WaitState, FailState, EndState) beside a modality component, with the life-cycle events exchanged between them in order: prepare; prepare response (success) or prepare response (failure); start; start response (success) or start response (failure); data; done.]
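The exchange shown above can be sketched as a synchronous Python simulation. It assumes a stub modality that scripts its replies and an Interaction Manager function, drive, that follows the state chart; both names are hypothetical, and real life-cycle events are asynchronous messages rather than function return values.

```python
class StubModality:
    """Hypothetical modality component that answers life-cycle requests."""

    def __init__(self, prepare_ok=True, start_ok=True, user_data=None):
        self.prepare_ok = prepare_ok
        self.start_ok = start_ok
        self.user_data = user_data or {}

    def handle(self, request):
        """Return the (event, payload) pairs sent back to the IM."""
        if request == "prepare":
            return [("prepareResponse", "success" if self.prepare_ok else "failure")]
        if request == "start":
            if not self.start_ok:
                return [("startResponse", "failure")]
            # After a successful start the user interacts: data, then done.
            return [("startResponse", "success"),
                    ("data", self.user_data),
                    ("done", "success")]
        return []


def drive(modality):
    """IM side: prepare, then start; return (trace, final state, data)."""
    trace, collected = ["prepare"], {}
    [(event, status)] = modality.handle("prepare")
    trace.append(f"{event}({status})")
    if status != "success":
        return trace, "FailState", collected
    trace.append("start")
    state = "StartState"
    for event, payload in modality.handle("start"):
        if event == "data":
            collected.update(payload)
            trace.append("data")
            continue
        trace.append(f"{event}({payload})")
        if payload != "success":
            return trace, "FailState", collected
        # startResponse(success) leads to WaitState, done(success) to EndState.
        state = "WaitState" if event == "startResponse" else "EndState"
    return trace, state, collected
```

For a stub created with user_data={"city": "Boston"}, the trace follows the diagram's order (prepare, prepare response, start, start response, data, done) and the Interaction Manager ends in EndState holding the city value.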
MMI Architecture Principles
• The Interaction Manager communicates with Modality Components through asynchronous events
• Modality Components don’t communicate directly with each other, but indirectly through the Interaction Manager
• Components must implement the basic life-cycle events and may expose other events
• Modality Components can be nested (e.g., a voice dialog component such as a VoiceXML <form>)
• Components need not be markup-based
• EMMA communicates users’ inputs to the Interaction Manager
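To illustrate how an Interaction Manager might consume EMMA, the Python sketch below reads interpretations with the standard xml.etree.ElementTree module. It assumes the EMMA namespace http://www.w3.org/2003/04/emma and the emma:confidence and emma:mode annotations; the sample document and its <city> payload are invented for illustration.

```python
import xml.etree.ElementTree as ET

EMMA_NS = "http://www.w3.org/2003/04/emma"

# Hypothetical recognizer result for the spoken input "Boston".
SAMPLE = """<emma:emma version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma">
  <emma:interpretation id="int1" emma:confidence="0.75"
      emma:medium="acoustic" emma:mode="voice" emma:tokens="boston">
    <city>Boston</city>
  </emma:interpretation>
</emma:emma>"""


def read_interpretations(document):
    """Return (confidence, mode, application data) per interpretation."""
    root = ET.fromstring(document)
    emma = f"{{{EMMA_NS}}}"  # ElementTree spells namespaced names as {uri}local
    results = []
    for interp in root.iter(emma + "interpretation"):
        raw = interp.get(emma + "confidence")
        confidence = float(raw) if raw is not None else None
        mode = interp.get(emma + "mode")
        # Application payload: child elements outside the EMMA namespace.
        data = {child.tag: child.text for child in interp
                if not child.tag.startswith(emma)}
        results.append((confidence, mode, data))
    return results
```

Because EMMA wraps the application payload in mode-independent annotations, the same reader would handle an interpretation produced by ink or keyboard input.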
Modalities
• GUI modality (XHTML)
– An adapter converts life-cycle events to XHTML events
– XHTML events are converted to life-cycle events
• Voice modality (VoiceXML 3.0)
– Life-cycle events are embedded into VoiceXML 3.0
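The GUI adapter's two-way translation can be sketched as a pair of lookup tables. In the Python sketch below, the mapping between data events and DOM event names, the XhtmlAdapter class, and the send_to_im callback are all assumptions made for illustration; as the References slide notes, such adapters had to be hand-coded.

```python
# Hypothetical correspondence between life-cycle data events and DOM events.
TO_DOM = {"data/change": "change", "data/focus": "focus", "data/reset": "reset"}
FROM_DOM = {dom: lifecycle for lifecycle, dom in TO_DOM.items()}


class XhtmlAdapter:
    """Sketch of a GUI adapter translating events in both directions."""

    def __init__(self, send_to_im):
        self.send_to_im = send_to_im  # callable(event, payload) to the IM
        self.dispatched = []          # DOM events raised in the page

    def from_im(self, lifecycle_event, target_id, value=None):
        # Life-cycle event from the IM -> DOM event on a page element.
        self.dispatched.append((TO_DOM[lifecycle_event], target_id, value))

    def from_page(self, dom_event, target_id, value=None):
        # DOM event from the page -> life-cycle event for the IM.
        lifecycle_event = FROM_DOM.get(dom_event)
        if lifecycle_event is None:
            return  # events with no life-cycle meaning (e.g. mouseover) stay local
        self.send_to_im(lifecycle_event, {"field": target_id, "value": value})
```

Filtering out purely physical events such as mouseover keeps the Interaction Manager working with intent-based events only.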
Modalities
• VoiceXML supports
– Events sent from the Interaction Manager
– Sending events to the Interaction Manager
<form>
  <catch event="data.change">
    <!-- "data" stands for the value carried by the incoming event -->
    <assign name="city" expr="data"/>
  </catch>
  …
  <field name="city">
    <prompt>Which city?</prompt>
    <grammar src="city.grxml"/>
    <filled>
      <!-- <send> is the life-cycle extension envisioned for VoiceXML 3.0 -->
      <send event="data.change" data="city"/>
    </filled>
  </field>
</form>
Modalities
• XHTML is extended to send events to the Interaction Manager
<head>
  …
  <ev:listener event="change" observer="app1" handler="#onChangeHandler"/>
  …
  <script type="text/javascript" id="onChangeHandler">
    // post() stands for the adapter call that sends a life-cycle event to
    // the Interaction Manager; it is not a standard DOM function
    post("data", { city: document.getElementById("city").value });
  </script>
</head>
…
<body id="app1">
  <input type="text" id="city" value=""/>
</body>
Modalities
• XHTML is extended to support events received from the Interaction Manager
<head>
  …
  <handler type="text/javascript" ev:event="data">
    // event.name and event.data are assumed fields of the incoming data event
    if (event.name === "change") {
      document.getElementById("city").value = event.data.city;
    }
  </handler>
  …
</head>
…
<body id="app1">
  <input type="text" id="city" value=""/>
</body>
…
References
• SCXML
– Second working draft available at http://www.w3.org/TR/2006/WD-scxml-20060124/
– Open-source implementation available from http://jakarta.apache.org/commons/sandbox/scxml/
• Multimodal Architecture and Interfaces
– Working draft available at http://www.w3.org/TR/2006/WD-mmi-arch-20060414/
• Voice modality
– First working draft of VoiceXML 3.0 scheduled for November 2007
• XHTML
– Full Recommendation
– Adapters must be hand-coded
• Other modalities
– TBD
Availability
• SAPI 5.3
– Microsoft Windows Vista®
• X+V
– ACCESS Systems’ NetFront Multimodal Browser for PocketPC 2003: http://www-306.ibm.com/software/pervasive/multimodal/?Open&ca=daw-prod-mmb
– Opera Software Multimodal Browser for Sharp Zaurus: http://www-306.ibm.com/software/pervasive/multimodal/?Open&ca=daw-prod-mmb
– Opera 9 for Windows: http://www.opera.com/
• W3C
– First working draft of VoiceXML 3.0 not yet available
– Working drafts of SCXML are available; some open-source implementations are available
• Proprietary APIs
– Available from the vendor
Final Advice
• The W3C is defining a rich collection of languages for
authoring multimodal applications
– SCXML can be used as an Interaction Manager
– Many languages for modalities: VoiceXML, XHTML, …
– EMMA may be used to describe data transmitted among
modules
• Avoid getting locked into using proprietary languages
available only on a single platform
– The W3C languages will be available on multiple
platforms
Web Resources
http://www.w3.org/voice
– Specification of grammar, semantic interpretation, and speech synthesis
languages
http://www.w3.org/2002/mmi
– Specification of EMMA and InkML languages
http://www.microsoft.com (search for SALT)
– SALT specification and download instructions for adding SALT to Internet
Explorer
http://www-306.ibm.com/software/pervasive/multimodal/
– X+V specification; download Opera and ACCESS browsers
http://www.larson-tech.com/SALT/ReadMeFirst.html
– Student projects using SALT to develop multimodal applications
http://www.larson-tech.com/MMGuide.html or
http://www.w3.org/2002/mmi/Group/2006/Guidelines/
– User interface guidelines for multimodal applications