Tutorial
Developing and Deploying Multimodal Applications
James A. Larson
Larson Technical Services
jim @ larson-tech.com
SpeechTEK West
February 23, 2007
Developing and Deploying Multimodal Applications

What applications should be multimodal?
What is the multimodal application development process?
What standard languages can be used to develop multimodal applications?
What standard platforms are available for multimodal applications?
Capturing Input from the User
Medium | Mode | Input Device
Acoustic | Speech | Microphone
Tactile | Keypad | Key, Keyboard
Tactile | Ink | Pen
Tactile | GUI | Mouse, Joystick
Visual | Photograph | Scanner, Still camera
Visual | Movie | Video camera
Capturing Input From the User
Medium | Mode | Input Device
Acoustic | Speech | Microphone
Tactile | Keypad | Key, Keyboard
Tactile | Ink | Pen
Tactile | GUI | Mouse, Joystick
Visual | Photograph | Scanner, Still camera
Visual | Movie, Gaze tracking, Gesture reco | Video camera
Electronic | Digital data, Biometric | RFID, GPS

(Combining several of these modes makes an application multimodal.)
Presenting Output to the User
Medium | Mode | Output Device
Acoustic | Speech | Speaker
Visual | Text, Photograph, Movie | Display
Tactile | Pressure | Joystick

(Presenting output in several of these media at once is multimedia.)
Multimodal and Multimedia Application Benefits

Provide a natural user interface by using multiple channels for user interactions
Simplify interaction with small devices with limited keyboard and display, especially on portable devices
Leverage advantages of different modes in different contexts
Decrease error rates and time required to perform tasks
Increase accessibility of applications for special users
Enable new kinds of applications
Exercise 1
What new multimodal applications would be useful for your work?
What new multimodal applications would be entertaining to you, your family, or friends?
Voice as a “Third Hand”
Game Commander 3
http://www.gamecommander.com/
Voice-Enabled Games
Scansoft’s VoCon Games Speech SDK
• PlayStation® 2
• Nintendo® GameCube™
• http://www.scansoft.com/games/
• http://www.omnipage.com/games/poweredby/
Education
Tucker Maxon School of Oral Education
http://www.tmos.org/
Education
Reading Tutor Project
http://cslr.colorado.edu/beginweb/reading/reading.html
Multimodal Applications Developed by PSU and OHSU Students
Hands-busy
Troubleshooting a car’s motor
Repairing a leaky faucet
Tuning musical instruments
Construction
Complex origami artifact
Project book for children
Cooking—Talking recipe book
Entertainment
Child’s fairy tale book
Audio-controlled juke box
Games (Battleship, Go)
Multimodal Applications Developed by PSU and OHSU Students (continued)
Data collection
Buy a car
Collect health data
Buy movie tickets
Order meals from a restaurant
Conduct banking business
Locate a business
Order a computer
Choose homeless pets from an animal shelter
Authoring
Photo album tour
Education
Flash cards—Addition tables
Download Opera and the speech plug-in
Go to www.larson-tech.com/mm-Projects/Demos.htm
New Application Classes
Active listening
• Verbal VCR controls: start, stop, fast forward, rewind, etc.
Virtual assistants
• Listen for requests and immediately perform them
  – Violin tuner
  – TV controller
  – Environmental controller
  – Family-activity coordinator
Synthetic experiences
• Synthetic interviews
Speech-enabled games
Education and training
Authoring content
Two General Uses of Multiple Modes of Input
Redundancy—One mode acts as backup for another mode
In noisy environments, use keypad instead of speech input.
In cold environments, use speech instead of keypad.
Complementary—One mode supplements another mode
Voice as a third hand
“Move that (point) to there (point)” (late fusion)
Lip reading = video + speech (early fusion)
Potential Problems with Multimodal Applications

Voice may make an application "noisy."
• Privacy and security concerns
• Noise pollution
Speech and handwriting recognition systems sometimes fail.
Users have false expectations of natural language understanding: full natural language (incorrectly called "NLP") is possible only on Star Trek.
Full natural language processing requires:
• Knowledge of the outside world
• A history of the user-computer interaction
• A sophisticated understanding of language structure
"Natural language-like" processing simulates natural language for a small domain, a short history, and specialized language structures.
Adding a New Mode to an Application

Only if…
• The new mode enables new features not previously possible.
• The new mode dramatically improves usability.
Always…
• Redesign the application to take advantage of the new mode.
• Provide backup for the new mode.
• Test, test, and test some more.
Exercise 2
Where will multimodal applications be used?
A. At home
B. At work
C. “On the road”
D. Other?
Developing and Deploying Multimodal Applications

What applications should be multimodal?
What is the multimodal application development process?
What standard languages can be used to develop multimodal applications?
What standard platforms are available for multimodal applications?
The Playbill—Who’s Who on the Team

Users—Their lives will be improved by using the multimodal application
Interaction designer—Designs the dialog—when and how the user and system interchange requests and information
Multimodal programmer—Implements the user interface
Voice talent—Records spoken prompts and messages
Grammar writer—Specifies words and phrases the user may speak in response to a prompt
TTS specialist—Specifies verbal and audio sounds and inflections
Quality assurance specialist—Performs tests to validate that the application is both useful and usable
Customer—Pays the bills
Program manager—Organizes the work and makes sure it is completed according to schedule and under budget
Development Process

Investigation Stage → Design Stage → Development Stage → Testing Stage → Sustaining Stage
• Each stage involves users
• Iterative refinement
Development Process (Investigation Stage): Identify the Application

• Conduct ethnography studies
• Identify candidate applications
• Conduct focus groups
• Select the application
Exercise 3
What will be the “killer” consumer multimodal applications?
Development Process (Design Stage): Specify the Application

• Construct the conceptual model
• Construct scenarios
• Specify performance and preference requirements
Specify Performance and Preference Requirements

Performance—Is the application useful?
• Measure what the users actually accomplished.
• Validate that the users achieved success.
Preference—Is the application enjoyable?
• Measure users’ likes and dislikes.
• Validate that the users enjoyed the application and will use it again.
Performance Metrics

User Task | Measure | Typical Criteria
Speak a command | Word error rate | Less than 3%
The caller supplies values into a form | Enters valid values into each field of a form | Less than 5 seconds per value
Navigate a list | The user successfully selects the specified option | Greater than 95%
Purchase a product | The user successfully completes the purchase | Greater than 93%
Exercise 4

Specify performance metrics for the multimodal email application.

User Task | Measure | Typical Criteria
Preference Metrics

Question | Typical Criteria
On a scale from 1 to 10, rate the help facility. | The average caller score is greater than 8.
On a scale from 1 to 10, rate the ease of use of this application. | The average caller score is greater than 8.
Would you recommend using this voice portal to a friend? | Over 80% of callers respond by saying "yes."
What would you be willing to pay each time you use this application? | Over 80% of callers indicate that they are willing to pay $1.00 or more per use.
Exercise 5

Specify preference metrics for the multimodal email application.

Question | Typical Criteria
Preference Metrics (Open-ended Questions)

What did you like the best about this voice-enabled application? (Do not change these features.)
What did you like the least about this voice-enabled application? (Consider changing these features.)
What new features would you like to have added? (Consider adding these features in this or a later release.)
What features do you think you will never use? (Consider deleting these features.)
Do you have any other comments and suggestions? (Pay attention to these responses. Callers frequently suggest very useful ideas.)
Development Process (Development Stage): Develop the Application

• Specify the persona
• Specify the modes and modalities
• Specify the dialog script
UI Design Guidelines

Guidelines for Voice User Interfaces
• Bruce Balentine and David P. Morgan. How to Build a Speech Recognition Application, Second Edition. http://www.eiginc.com
Guidelines for Graphical User Interfaces
• Research-Based Web Design and Usability Guidelines. U.S. Department of Health and Human Services. http://www.usability.gov/pdfs/guidelines.html
Guidelines for Multimodal User Interfaces
• Common Sense Guidelines for Developing Multimodal User Interfaces. W3C Working Group Note, 19 April 2006. http://www.w3.org/2002/mmi/Group/2006/Guidelines/
Common-sense Suggestions
1. Satisfy Real-World Constraints

Task-oriented Guidelines
1.1. Guideline: For each task, use the easiest mode available on the device.
Physical Guidelines
1.2. Guideline: If the user’s hands are busy, then use speech.
1.3. Guideline: If the user’s eyes are busy, then use speech.
1.4. Guideline: If the user may be walking, use speech for input.
Environmental Guidelines
1.5. Guideline: If the user may be in a noisy environment, then use a pen, keys, or mouse.
1.6. Guideline: If the user’s manual dexterity may be impaired, then use speech.
Exercise 6

What input mode(s) should be used for each of the following tasks?
A. Selecting objects
B. Entering text
C. Entering symbols
D. Entering sketches or illustrations
Common-sense Suggestions
2. Communicate Clearly, Concisely, and Consistently with Users

Consistency Guidelines
2.1. Phrase all prompts consistently.
2.2. Enable the user to speak keyword utterances rather than natural language sentences.
2.3. Switch presentation modes only when the information is not easily presented in the current mode.
2.4. Make commands consistent.
2.5. Make the focus consistent across modes.
Organizational Guidelines
2.6. Use audio to indicate the verbal structure.
2.7. Use pauses to divide information into natural "chunks."
2.8. Use animation and sound to show transitions.
2.9. Use voice navigation to reduce the number of screens.
2.10. Synchronize multiple modalities appropriately.
2.11. Keep the user interface as simple as possible.
Common-sense Suggestions
3. Help Users Recover Quickly and Efficiently from Errors

Conversational Guidelines
3.1. Users tend to use the same mode that was used to prompt them.
3.2. If privacy is not a concern, use speech as output to provide commentary or help.
3.3. Use directed user interfaces, unless the user is always knowledgeable and experienced in the domain.
3.4. Always provide context-sensitive help for every field and command.
Common-sense Suggestions
3. Help Users Recover Quickly and Efficiently from Errors (Continued)

Reliability Guidelines
Operational status
3.5. The user always should be able to determine easily if the device is listening to the user.
3.6. For devices with batteries, users always should be able to determine easily how much longer the device will be operational.
3.7. Support at least two input modes so one input mode can be used when the other cannot.
Visual feedback
3.8. Present words recognized by the speech recognition system on the display, so the user can verify they are correct.
3.9. Display the n-best list to enable easy speech recognition error correction.
3.10. Try to keep response times less than 5 seconds. Inform the user of longer response times.
Common-sense Suggestions
4. Make Users Comfortable

Listening mode
4.1. Speak after pressing a speak key, which automatically releases after the user finishes speaking.
System status
4.2. Always present the current system status to the user.
Human-memory constraints
4.3. Use the screen to ease stress on the user’s short-term memory.
Common-sense Suggestions
4. Make Users Comfortable (Continued)

Social Guidelines
4.4. If the user may need privacy, use a display rather than render speech.
4.5. If the user may need privacy, use a pen or keys.
4.6. If the device may be used during a business meeting, then use a pen or keys (with the keyboard sounds turned off).
Advertising Guidelines
4.7. Use animation and sound to attract the user’s attention.
4.8. Use landmarks to help users know where they are.
Common-sense Suggestions
4. Make Users Comfortable (Continued)

Ambience
4.9. Use audio and graphic design to set the mood and convey emotion in games and entertainment applications.
Accessibility
4.10. For each traditional output technique, provide an alternative output technique.
4.11. Enable users to adjust the output presentation.
Books

Ramon Lopez-Cozar Delgado and Masahiro Araki. Spoken, Multilingual and Multimodal Dialog Systems—Development and Assessment. West Sussex, England: Wiley, 2005.
Julie A. Jacko and Andrew Sears (Editors). The Human-Computer Interaction Handbook—Fundamentals, Evolving Technologies, and Emerging Applications. Mahwah, New Jersey: Lawrence Erlbaum Associates, 2003.
Development Process (Testing Stage): Test the Application

• Component test
• Usability test
• Stress test
• Field test
Testing Resources

Jeffrey Rubin. Handbook of Usability Testing. New York: Wiley Technical Communication Library, 1994.
Peter and David Leppik. Gourmet Customer Service. Eden Prairie, MN: VocalLabs, 2005. [email protected]
Development Process (Sustaining Stage): Deploy and Monitor the Application

• User surveys
• Usage reports from log files
• User feedback and comments
Developing and Deploying Multimodal Applications

What applications should be multimodal?
What is the multimodal application development process?
What standard languages can be used to develop multimodal applications?
What standard platforms are available for multimodal applications?
W3C Multimodal Interaction Framework

A general description of speech application components and how they relate:
• Recognition Grammar (SRGS)
• Semantic Interpretation (SISR)
• Extensible MultiModal Annotation (EMMA)
• Speech Synthesis (SSML)
• Interaction Managers
W3C Multimodal Interaction Framework

[Diagram: Input and Output components communicate through an Interaction Manager, which connects to Application Functions and Telephony Properties.]
W3C Multimodal Interaction Framework

[Diagram: the user’s input flows through ASR and Ink components to Semantic Interpretation and Information Integration, then to the Interaction Manager, which invokes Application Functions and Telephony Functions; output flows back through Language Generation and Media Planning to the TTS, Audio, and Display components.]
W3C Multimodal Interaction Framework

SRGS: Describes what the user may say at each point in the dialog.

[Same framework diagram as above, with SRGS annotating the ASR component.]
Speech Recognition Engines

Feature | Low-end | High-end | Other
Speaking mode | Isolated (discrete) | Continuous | Keywords
Enrollment | Speaker dependent | Speaker independent | Adaptive
Vocabulary size | Small | Large | Switch vocabularies
Speaking style | Read | Spontaneous |
Number of simultaneous callers | Single-threaded | Multi-threaded |
Grammars

Describe what the user may say or handwrite at a point in the dialog
Enable the recognition engine to work faster and more accurately
Two types of grammars:
– Structured grammars
– Statistical grammars (N-grams)
Structured Grammars

Specify the words that a user may speak or write.
Two representation formats:
1. Augmented Backus-Naur Form (ABNF) (sketched below)
   Production rules:
   Single_digit ::= zero | one | two | … | nine
   Zero_thru_ten ::= Single_digit | ten
2. XML format
   Can be processed by an XML validator.
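For reference, here is a minimal sketch of the production rules above in SRGS’s ABNF form; the self-identifying header and $-prefixed rule names follow the SRGS 1.0 specification:

#ABNF 1.0;
language en;
mode voice;
root $zero_thru_ten;

// equivalent to the XML grammar on the next slide
$single_digit = zero | one | two | three | four |
                five | six | seven | eight | nine;
$zero_thru_ten = $single_digit | ten;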
Example XML Grammar

<grammar mode = "voice"
         type = "application/srgs+xml"
         root = "zero_to_ten">

  <rule id = "zero_to_ten">
    <one-of>
      <ruleref uri = "#single_digit"/>
      <item> ten </item>
    </one-of>
  </rule>

  <rule id = "single_digit">
    <one-of>
      <item> zero </item>
      <item> one </item>
      <item> two </item>
      <item> three </item>
      <item> four </item>
      <item> five </item>
      <item> six </item>
      <item> seven </item>
      <item> eight </item>
      <item> nine </item>
    </one-of>
  </rule>

</grammar>
Exercise 7
Write a grammar that recognizes the digits zero through nineteen
(Hint: Modify the previous page)
Reusing Existing Grammars
<grammar
  type = "application/srgs+xml"
  root = "size"
  src = "http://www.example.com/size.grxml"/>
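The referenced file might contain something like the following hypothetical sketch of size.grxml; the grammar namespace is the standard SRGS 1.0 one, and the word list is invented for illustration:

<?xml version="1.0"?>
<!-- size.grxml: a self-contained grammar pulled in by the reference above -->
<grammar xmlns = "http://www.w3.org/2001/06/grammar"
         version = "1.0" mode = "voice" root = "size">
  <rule id = "size">
    <one-of>
      <item> small </item>
      <item> medium </item>
      <item> large </item>
    </one-of>
  </rule>
</grammar>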
Exercise 8
Write a grammar for positive responses to a yes/no question (i.e., "yes," "sure," "affirmative," and so forth)
When Is a Grammar Too Large?
[Chart comparing word coverage and response as the grammar grows.]
W3C Multimodal Interaction Framework

SISR: A procedural, JavaScript-like language for interpreting the text strings returned by the speech recognition engine.

[Same framework diagram as above, with SISR annotating the Semantic Interpretation component.]
Semantic Interpretation

Semantic scripts employ ECMAScript.
Advantages:
– Translate aliases to vocabulary words (see the sketch after this list)
– Perform calculations
– Produce a rich structure rather than a text string
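For instance, a tag can both map an alias onto a canonical value and compute one. A small sketch using SISR-style <tag> scripts; the rule and its phrases are invented for illustration:

<rule id = "quantity">
  <one-of>
    <!-- alias: "a dozen" becomes the canonical count 12 -->
    <item> a dozen <tag> out.count = 12; </tag> </item>
    <!-- calculation: the script computes the value -->
    <item> two dozen <tag> out.count = 2 * 12; </tag> </item>
  </one-of>
</rule>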
Semantic Interpretation

[Diagram: the user says "Big white t-shirt"; the recognizer, constrained by the grammar, passes the raw text "Big white t-shirt" to the conversation manager, which must map it to the canonical "Large white t-shirt" itself.]
Semantic Interpretation

[Diagram: the user says "Big white t-shirt"; the recognizer consults a grammar with semantic interpretation scripts, and the semantic interpretation processor hands the conversation manager a structure rather than a string.]

<rule id = "action">
  <one-of>
    <item> small <tag> out.size = "small"; </tag> </item>
    <item> medium <tag> out.size = "medium"; </tag> </item>
    <item> large <tag> out.size = "large"; </tag> </item>
    <item> big <tag> out.size = "large"; </tag> </item>
  </one-of>
  <one-of>
    <item> green <tag> out.color = "green"; </tag> </item>
    <item> blue <tag> out.color = "blue"; </tag> </item>
    <item> white <tag> out.color = "white"; </tag> </item>
  </one-of>
</rule>

Result:
{
  size: large
  color: white
}
Exercise 9
Modify this rule to return only “yes”
<grammar type = "application/srgs+xml" root = "yes" mode = "voice">
<rule id = "yes">
<one-of>
<item> yes </item>
<item> sure </item>
<item> affirmative </item>
…
</one-of>
</rule>
</grammar>
W3C Multimodal Interaction Framework

EMMA: A language for representing the semantic content from speech recognizers, handwriting recognizers, and other input devices.

[Same framework diagram as above, with EMMA annotating the Information Integration component.]
EMMA

Extensible MultiModal Annotation markup language
A canonical structure for semantic interpretations of a variety of inputs, including:
• Speech
• Natural language text
• GUI
• Ink

See the minimal EMMA sketch after this list.
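A minimal sketch of what an EMMA result might look like; the emma namespace is the one used in the W3C working drafts of the time, and the city element and attribute values are invented for illustration:

<emma:emma version="1.0"
           xmlns:emma="http://www.w3.org/2003/04/emma">
  <!-- one interpretation of the user's spoken input -->
  <emma:interpretation id="int1"
                       emma:mode="voice"
                       emma:confidence="0.8">
    <city> Portland </city>
  </emma:interpretation>
</emma:emma>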
EMMA

[Diagram: speech input passes through speech recognition (driven by a grammar plus semantic interpretation instructions) to produce EMMA; keyboard input passes through keyboard interpretation (driven by interpretation instructions) to produce EMMA; a merging/unification step combines the two EMMA results into a single EMMA result for applications.]
EMMA

The speech leg of the pipeline might produce:

<interpretation mode = "speech">
  <travel>
    <to hook="ink"/>
    <from hook="ink"/>
    <day> Tuesday </day>
  </travel>
</interpretation>
EMMA

The ink leg of the pipeline might produce:

<interpretation mode = "ink">
  <travel>
    <to> Las Vegas </to>
    <from> Portland </from>
  </travel>
</interpretation>
EMMA

Merging/unification fills the speech interpretation’s "hooks" with the ink values:

<interpretation mode = "interp1">
  <travel>
    <to> Las Vegas </to>
    <from> Portland </from>
    <day> Tuesday </day>
  </travel>
</interpretation>
Exercise 10

Given the following two EMMA specifications, what is the unified EMMA specification?

<interpretation mode = "speech">
  <moneyTransfer>
    <sourceAcct hook="ink"/>
    <targetAcct hook="ink"/>
    <amount> 300 </amount>
  </moneyTransfer>
</interpretation>

<interpretation mode = "ink">
  <moneyTransfer>
    <sourceAcct> savings </sourceAcct>
    <targetAcct> checking </targetAcct>
  </moneyTransfer>
</interpretation>

Unified EMMA specification:

<interpretation mode = "intp1">
  <moneyTransfer>
    <sourceAcct> ______ </sourceAcct>
    <targetAcct> ______ </targetAcct>
    <amount> ______ </amount>
  </moneyTransfer>
</interpretation>
W3C Multimodal Interaction Framework

SSML: A language for rendering text as synthesized speech.

[Same framework diagram as above, with SSML annotating the TTS component.]
Speech Synthesis Markup Language

Processing pipeline: structure analysis → text normalization → text-to-phoneme conversion → prosody analysis → waveform production.

• Structure analysis. Markup support: paragraph, sentence. Non-markup behavior: infer structure by automated text analysis.
• Text normalization. Markup support: say-as for dates, times, etc. (sketched below). Non-markup behavior: automatically identify and convert constructs.
• Text-to-phoneme conversion. Markup support: phoneme, say-as. Non-markup behavior: look up in pronunciation dictionary.
• Prosody analysis. Markup support: emphasis, break, prosody. Non-markup behavior: automatically generate prosody through analysis of document structure and sentence syntax.
• Waveform production.
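A small sketch of the normalization markup. The say-as element is SSML’s; the interpret-as and format values follow the W3C say-as guidance, though individual engines vary in what they accept:

<speak version="1.0" xml:lang="en-US"
       xmlns="http://www.w3.org/2001/10/synthesis">
  <!-- tell the normalizer this is a month/day/year date, not arithmetic -->
  The tutorial was given on
  <say-as interpret-as="date" format="mdy"> 2/23/2007 </say-as>.
</speak>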
Speech Synthesis Markup Language Examples

<phoneme alphabet="ipa" ph="wɪnɛfɛks"> WinFX </phoneme>
is a great platform

<prosody pitch = "x-low">
  Who’s been sleeping in my bed?
</prosody>
said papa bear.

<prosody pitch = "medium">
  Who’s been sleeping in my bed?
</prosody>
said momma bear.

<prosody pitch = "x-high">
  Who’s been sleeping in my bed?
</prosody>
said baby bear.
Popular Strategy

1. Develop dialogs using SSML.
2. Usability test the dialogs.
3. Extract the prompts.
4. Hire voice talent to record the prompts.
5. Replace <prompt> text with <audio> (sketched below).
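In VoiceXML terms, the last step looks roughly like this. A sketch: the .wav filename is invented, and the original text is kept as fallback TTS in case the recording is unavailable:

<!-- before: synthesized prompt -->
<prompt> Say a city name </prompt>

<!-- after: recorded prompt, original text as TTS fallback -->
<prompt>
  <audio src="say_city.wav"> Say a city name </audio>
</prompt>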
W3C Multimodal Interaction Framework

VoiceXML: A language for controlling the exchange of information and commands between the user and the system.

[Same framework diagram as above, with VoiceXML annotating the Interaction Manager component.]
Developing and Deploying Multimodal Applications

What applications should be multimodal?
What is the multimodal application development process?
What standard languages can be used to develop multimodal applications?
What standard platforms are available for multimodal applications?
Speech APIs and SDKs

• JSAPI—Java Speech Application Program Interface
  – http://java.sun.com/products/java-media/speech/
  – http://developer.mozilla.org/en/docs/JSAPI_Reference
• Nuance Mobile Speech Platform
  – http://www.nuance.com/speechplatform/components.asp
• VSAPI—Voice Signal API
  – http://www.voicesignal.com/news/articles/2006-06-21-SymbianOne.htm
• SALT
  – http://www.saltforum.org/
Interaction Manager Approaches

• Object-oriented—Interaction Manager (C#) controlling SAPI 5.3
• X+V—Interaction Manager (XHTML) controlling VoiceXML 2.0 modules
• W3C—Interaction Manager (SCXML) controlling XHTML, VoiceXML 3.0, and InkML
SAPI 5.3 & Windows Vista™
Speech Synthesis

W3C Speech Synthesis Markup Language 1.0:

<speak>
  <phoneme alphabet="ipa" ph="wɪnɛfɛks">
    WinFX
  </phoneme>
  is a great platform
</speak>

Microsoft proprietary PromptBuilder:

myPrompt.AppendTextWithPronunciation
  ("WinFX", "wɪnɛfɛks");
myPrompt.AppendText("is a great platform.");
SAPI 5.3 & Windows Vista™
Speech Recognition

W3C Speech Recognition Grammar Specification 1.0:

<grammar type="application/srgs+xml" root="city" mode="voice">
  <rule id = "city">
    <one-of>
      <item> New York City </item>
      <item> New York </item>
      <item> Boston </item>
    </one-of>
  </rule>
</grammar>

Microsoft proprietary GrammarBuilder:

Choices cityChoices = new Choices();
cityChoices.AddPhrase ("New York City");
cityChoices.AddPhrase ("New York");
cityChoices.AddPhrase ("Boston");
Grammar cityGrammar
    = new Grammar (new GrammarBuilder(cityChoices));
SAPI 5.3 & Windows Vista™
Semantic Interpretation

Augment an SRGS grammar with JScript® for semantic interpretation:

<grammar type="application/srgs+xml" root="city" mode="voice">
  <rule id = "city">
    <one-of>
      <item> New York City <tag> city="JFK" </tag> </item>
      <item> New York <tag> city="JFK" </tag> </item>
      <item> Portland <tag> city="PDX" </tag> </item>
    </one-of>
  </rule>
</grammar>

User-specified "shortcuts"—the recognizer replaces a shortcut word with an expanded string:

User says: my address
System enters: 1033 Smith Street, Apt. 7C, Bloggsville 00000
SAPI 5.3 & Windows Vista™
Dialog

1. Import the System.Speech.Recognition namespace
2. Instantiate a SpeechRecognizer object
3. Build a grammar
4. Attach an event handler
5. Load the grammar into the recognizer
6. When the recognizer hears something that fits the grammar, the SpeechRecognized event handler is invoked, which accesses the Result object and works with the recognized text
SAPI 5.3 & Windows Vista™
Dialog

using System;
using System.Windows.Forms;
using System.ComponentModel;
using System.Collections.Generic;
using System.Speech.Recognition;

namespace Reco_Sample_1
{
    public partial class Form1 : Form
    {
        //create a recognizer
        SpeechRecognizer _recognizer = new SpeechRecognizer();

        public Form1() { InitializeComponent(); }

        private void Form1_Load(object sender, EventArgs e)
        {
            //Create a pizza grammar
            Choices pizzaChoices = new Choices();
            pizzaChoices.AddPhrase("I'd like a cheese pizza");
            pizzaChoices.AddPhrase("I'd like a pepperoni pizza");
            pizzaChoices.AddPhrase("I'd like a large pepperoni pizza");
            pizzaChoices.AddPhrase(
                "I'd like a small thin crust vegetarian pizza");
            Grammar pizzaGrammar =
                new Grammar(new GrammarBuilder(pizzaChoices));

            //Attach an event handler
            pizzaGrammar.SpeechRecognized +=
                new EventHandler<RecognitionEventArgs>(
                    PizzaGrammar_SpeechRecognized);

            _recognizer.LoadGrammar(pizzaGrammar);
        }

        void PizzaGrammar_SpeechRecognized(
            object sender, RecognitionEventArgs e)
        {
            MessageBox.Show(e.Result.Text);
        }
    }
}
SAPI 5.3 & Windows Vista™
References

Speech API Overview
http://msdn2.microsoft.com/en-us/library/ms720151.aspx#API_Speech_Recognition
Microsoft Speech API (SAPI) 5.3
http://msdn2.microsoft.com/en-us/library/ms723627.aspx
"Exploring New Speech Recognition And Synthesis APIs In Windows Vista" by Robert Brown
http://msdn.microsoft.com/msdnmag/issues/06/01/speechinWindowsVista/default.aspx#Resources
Step 1: Start with Standard VoiceXML and Standard XHTML

VoiceXML (using the W3C grammar language):

<form id="topform">
  <field name="city">
    <prompt>Say a name</prompt>
    <grammar src="city.grxml"/>
  </field>
</form>

XHTML:

<form>
  Result: <input type="text" name="in1"/>
</form>
Step 2: Combine

<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <form id="topform">
      <field name="city">
        <prompt>Say a name</prompt>
        <grammar src="city.grxml"/>
      </field>
    </form>
  </head>
  <body>
    <form>
      Result: <input type="text" name="in1"/>
    </form>
  </body>
</html>
Step 3: Insert vxml Namespace

<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:vxml="http://www.w3.org/2001/vxml">
  <head>
    <vxml:form id="topform">
      <vxml:field name="city">
        <vxml:prompt>Say a name</vxml:prompt>
        <vxml:grammar src="city.grxml"/>
      </vxml:field>
    </vxml:form>
  </head>
  <body>
    <form>
      Result: <input type="text" name="in1"/>
    </form>
  </body>
</html>
Step 4: Insert Events

<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:vxml="http://www.w3.org/2001/vxml"
      xmlns:ev="http://www.w3.org/2001/xml-events">
  <head>
    <vxml:form id="topform">
      <vxml:field name="city">
        <vxml:prompt>Say a name</vxml:prompt>
        <vxml:grammar src="city.grxml"/>
      </vxml:field>
    </vxml:form>
  </head>
  <body>
    <form ev:event="load" ev:handler="#topform">
      Result: <input type="text" name="in1"/>
    </form>
  </body>
</html>
Step 5: Insert <sync>

<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:vxml="http://www.w3.org/2001/vxml"
      xmlns:ev="http://www.w3.org/2001/xml-events"
      xmlns:xv="http://www.w3.org/2002/xhtml+voice">
  <head>
    <xv:sync xv:input="in1" xv:field="#result"/>
    <vxml:form id="topform">
      <vxml:field name="city" xv:id="result">
        <vxml:prompt>Say a name</vxml:prompt>
        <vxml:grammar src="city.grxml"/>
      </vxml:field>
    </vxml:form>
  </head>
  <body>
    <form ev:event="load" ev:handler="#topform">
      Result: <input type="text" name="in1"/>
    </form>
  </body>
</html>
XHTML plus Voice (X+V) References

• Available on
  – ACCESS Systems’ NetFront Multimodal Browser for PocketPC 2003
    http://www-306.ibm.com/software/pervasive/multimodal/?Open&ca=daw-prod-mmb
  – Opera Software Multimodal Browser for Sharp Zaurus
    http://www-306.ibm.com/software/pervasive/multimodal/?Open&ca=daw-prod-mmb
  – Opera 9 for Windows
    http://www.opera.com/
• Programmers Guide
  – ftp://ftp.software.ibm.com/software/pervasive/info/multimodal/XHTML_voice_programmers_guide.pdf
• For a variety of small illustrative applications
  – http://www.larson-tech.com/MM-Projects/Demos.htm
Exercise 11

Specify the X+V notation for integrating the following VoiceXML and XHTML code by completing the code on the next page.

VoiceXML:

<form id="stateForm">
  <field name="state">
    <prompt>Say a state name</prompt>
    <grammar src="state.grxml"/>
  </field>
</form>

XHTML:

<form>
  Result: <input type="text" name="in1"/>
</form>
Exercise 11 (continued)

<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:vxml="http://www.w3.org/2001/vxml"
      xmlns:ev="http://www.w3.org/2001/xml-events"
      xmlns:xv="http://www.w3.org/2002/xhtml+voice">
  <head>
    <xv:sync xv:input="_______" xv:field="________"/>
    <vxml:form id="________">
      <vxml:field name="state" xv:id="________">
        <vxml:prompt>Say a state name</vxml:prompt>
        <vxml:grammar src="state.grxml"/>
      </vxml:field>
    </vxml:form>
  </head>
  <body>
    <form ev:event="load" ev:handler="#________">
      Result: <input type="text" name="_______"/>
    </form>
  </body>
</html>
MMI Architecture—4 Basic Components

• Runtime Framework or Browser—initializes the application and interprets the markup
• Interaction Manager—coordinates modality components and provides application flow
• Modality Components—provide modality capabilities such as speech, pen, keyboard, mouse
• Data Model—handles shared data

[Diagram: an Interaction Manager (SCXML) and a Data Model sit above XHTML, VoiceXML 3.0, and InkML modality components.]
Multimodal Architecture and Interfaces

• A loosely-coupled, event-based architecture for integrating multiple modalities into applications
• All communication is event-based
• Based on a set of standard life-cycle events
• Components can also expose other events as required
• Encapsulation protects component data
• Encapsulation enhances extensibility to new modalities
• Can be used outside a Web environment
Specify Interaction Manager Using Harel State Charts

An extension of state transition systems:
• States
• Transitions
• Nested state-transition systems
• Parallel state-transition systems
• History

[Diagram: PrepareState moves to StartState on Prepare Response (success) or to FailState on Prepare Response (fail); StartState moves through Start Response to WaitState, or to FailState on StartFail; WaitState moves to EndState on Done Success, or to FailState on DoneFail.]
Example State Transition System

State Chart XML (SCXML) for the PrepareState in the diagram above:

…
<state id="PrepareState">

  <send event="prepare"
        contentURL="hello.vxml"/>

  <transition event="prepareResponse"
              cond="status='success'"
              target="StartState"/>

  <transition event="prepareResponse"
              cond="status='failure'"
              target="FailState"/>

</state>
…
Example State Chart with Parallel States

[Diagram: two state chains run in parallel—Prepare Voice → Start Voice → Wait Voice → End Voice (with Fail Voice on prepare/start failures) and Prepare GUI → Start GUI → Wait GUI → End GUI (with Fail GUI on failures).]

See the SCXML sketch below.
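In SCXML this shape is expressed with a <parallel> element. A minimal sketch: the state and event names are taken from the diagram, and the transition details are simplified for illustration:

<parallel id="VoiceAndGUI">
  <!-- voice chain -->
  <state id="Voice" initial="PrepareVoice">
    <state id="PrepareVoice">
      <transition event="prepareResponse.success" target="StartVoice"/>
      <transition event="prepareResponse.fail" target="FailVoice"/>
    </state>
    <state id="StartVoice">
      <transition event="startResponse" target="WaitVoice"/>
    </state>
    <state id="WaitVoice">
      <transition event="done.success" target="EndVoice"/>
    </state>
    <state id="EndVoice"/>
    <state id="FailVoice"/>
  </state>
  <!-- GUI chain, same shape -->
  <state id="GUI" initial="PrepareGUI">
    <state id="PrepareGUI">
      <transition event="prepareResponse.success" target="StartGUI"/>
      <transition event="prepareResponse.fail" target="FailGUI"/>
    </state>
    <state id="StartGUI">
      <transition event="startResponse" target="WaitGUI"/>
    </state>
    <state id="WaitGUI">
      <transition event="done.success" target="EndGUI"/>
    </state>
    <state id="EndGUI"/>
    <state id="FailGUI"/>
  </state>
</parallel>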
The Life Cycle Events

[Diagram: a sequence of exchanges in which the Interaction Manager sends life-cycle events to the GUI and VUI modality components and receives their responses.]
More Life Cycle Events

[Diagram: further exchanges of life-cycle events between the Interaction Manager and the GUI and VUI modality components.]
Synchronization Using the Lifecycle Data Event

• Intent-based events
  – Capture the underlying intent rather than the physical manifestation of user-interaction events
  – Independent of the physical characteristics of particular devices
• Data/reset—Reset one or more field values to null
• Data/focus—Focus on another field
• Data/change—Field value has changed
Lifecycle Events between Interaction Manager and Modality

[Diagram: the Interaction Manager sends "prepare" to the modality and receives "prepare response (success)" or "prepare response (failure)"; it then sends "start" and receives "start response (success)" or "start response (failure)"; while waiting, the two exchange "data" events, and the modality finishes with "done".]
MMI Architecture Principles

• The Runtime Framework communicates with Modality Components through asynchronous events
• Modality Components don’t communicate directly with each other, but indirectly through the Runtime Framework
• Components must implement the basic life cycle events and may expose other events
• Modality Components can be nested (e.g., a Voice Dialog component like a VoiceXML <form>)
• Components need not be markup-based
• EMMA communicates users’ inputs to the Interaction Manager
Modalities

• GUI Modality (XHTML)
  – An adapter converts lifecycle events to XHTML events
  – XHTML events are converted to lifecycle events
• Voice Modality (VoiceXML 3.0)
  – Lifecycle events are embedded into VoiceXML 3.0
Exercise 12
What should VoiceXML do when it receives each of the following
events?
A. Reset
B. Change
C. Focus
Modalities

VoiceXML 3.0 will support lifecycle events:

<form>
  <catch name="change">
    <assign name="city" value="data"/>
  </catch>
  …
  <field name = "city">
    <prompt> Blah </prompt>
    <grammar src="city.grxml"/>
    <filled>
      <send event="data.change" data="city"/>
    </filled>
  </field>
</form>
Exercise 13
What should HTML do when it receives each of the following events?
A. Reset
B. Change
C. Focus
Modalities

XHTML is extended to support lifecycle events sent to a modality:

<head>
  …
  <ev:listener ev:event="onChange"
               ev:observer="app1"
               ev:handler="onChangeHandler()"/>
  …
  <script type="text/javascript">
    function onChangeHandler()
      { post("data", data="city") }
  </script>
</head>
…
<body id="app1">
  <input type="text" id="city" value=" "/>
</body>
…
Modalities

XHTML is extended to support lifecycle events sent to the interaction manager:

<head>
  …
  <handler type="text/javascript" ev:event="data">
    if (event == "change")
      { document.app1.city.value = "data.city" }
  </handler>
  …
</head>
…
<body id="app1">
  <input type="text" id="city" value=" "/>
</body>
…
References

• SCXML
  – Second working draft available at http://www.w3.org/TR/2006/WD-scxml-20060124/
  – Open source available from http://jakarta.apache.org/commons/sandbox/scxml/
• Multimodal Architecture and Interfaces
  – Working draft available at http://www.w3.org/TR/2006/WD-mmi-arch-20060414/
• Voice Modality
  – First working draft of VoiceXML 3.0 scheduled for November 2007
• XHTML
  – Full recommendation
  – Adapters must be hand-coded
• Other modalities
  – TBD
Comparison

 | Object-oriented | X+V | W3C
Standard Languages | SRGS, SISR, SSML | VoiceXML, SRGS, SSML, SISR, XHTML | SCXML, SRGS, VoiceXML, SSML, SISR, XHTML, EMMA, CCXML
Interaction Manager | C# | XHTML | SCXML
Modes | GUI, Speech | GUI, Speech | GUI, Speech, Ink, …
Availability

SAPI 5.3
– Microsoft Windows Vista®
X+V
– ACCESS Systems’ NetFront Multimodal Browser for PocketPC 2003
  http://www-306.ibm.com/software/pervasive/multimodal/?Open&ca=daw-prod-mmb
– Opera Software Multimodal Browser for Sharp Zaurus
  http://www-306.ibm.com/software/pervasive/multimodal/?Open&ca=daw-prod-mmb
– Opera 9 for Windows
  http://www.opera.com/
W3C
– First working draft of VoiceXML 3.0 not yet available
– Working drafts of SCXML are available; some open-source implementations are available
Proprietary APIs
– Available from vendor
Discussion Question
Should a developer insert SALT tags or X+V modules into an existing Web page without redesigning the Web page?
Conclusion

• Multimodal applications offer benefits over today’s traditional GUIs.
• Only use multimodal if there is a clear benefit.
• Standard languages are available today to develop multimodal applications.
• Don’t reinvent the wheel.
• Creativity and lots of usability testing are necessary to create world-class multimodal applications.
Web Resources

http://www.w3.org/voice
– Specification of grammar, semantic interpretation, and speech synthesis languages
http://www.w3.org/2002/mmi
– Specification of EMMA and InkML languages
http://www.microsoft.com (and query SALT)
– SALT specification and download instructions for adding SALT to Internet Explorer
http://www-306.ibm.com/software/pervasive/multimodal/
– X+V specification; download Opera and ACCESS browsers
http://www.larson-tech.com/SALT/ReadMeFirst.html
– Student projects using SALT to develop multimodal applications
http://www.larson-tech.com/MMGuide.html or http://www.w3.org/2002/mmi/Group/2006/Guidelines/
– User interface guidelines for multimodal applications
Status of W3C Multimodal Interface Languages

[Chart: each language’s position on the W3C Recommendation track (Requirements, Working Draft, Last Call Working Draft, Candidate Recommendation, Proposed Recommendation, Recommendation) for VoiceXML 2.0, VoiceXML 2.1, Speech Recognition Grammar Format (SRGS) 1.0, Speech Synthesis Markup Language (SSML) 1.0, Semantic Interpretation of Speech Recognition (SISR) 1.0, Extensible MultiModal Annotation (EMMA) 1.0, State Chart XML (SCXML) 1.0, and InkML 1.0.]
Questions?
Answer to Exercise 6

(Numbers rank the modes from best (1) to worst (4) for each task.)

Content-manipulation task | Voice | Pen | Keyboard/keypad | Mouse/joystick
Select objects | (3) Speak the name of the object | (1) Point to or circle the object | (4) Press keys to position the cursor on the object and press the select key | (2) Point to and click on the object or drag to select text
Enter text | (2) Speak the words in the text | (3) Write the text | (1) Press keys to spell the words in the text | (4) Spell the text by selecting letters from a soft keyboard
Enter symbols | (3) Say the name of the symbol and where it should be placed | (1) Draw the symbol where it should be placed | (4) Enter one or more characters that together represent the symbol | (2) Select the symbol from a menu and indicate where it should be placed
Enter sketches or illustrations | (2) Verbally describe the sketch or illustration | (1) Draw the sketch or illustration | (4) Impossible | (3) Create the sketch by moving the mouse so it leaves a trail (similar to an Etch-a-Sketch™)
Answer to Exercise 7

Write a grammar for zero to nineteen:

<grammar
  type = "application/srgs+xml"
  root = "zero_to_19"
  mode = "voice">

  <rule id = "zero_to_19">
    <one-of>
      <ruleref uri = "#single_digit"/>
      <ruleref uri = "#teens"/>
    </one-of>
  </rule>

  <rule id = "single_digit">
    <one-of>
      <item> zero </item>
      <item> one </item>
      <item> two </item>
      <item> three </item>
      <item> four </item>
      <item> five </item>
      <item> six </item>
      <item> seven </item>
      <item> eight </item>
      <item> nine </item>
    </one-of>
  </rule>

  <rule id = "teens">
    <one-of>
      <item> ten </item>
      <item> eleven </item>
      <item> twelve </item>
      <item> thirteen </item>
      <item> fourteen </item>
      <item> fifteen </item>
      <item> sixteen </item>
      <item> seventeen </item>
      <item> eighteen </item>
      <item> nineteen </item>
    </one-of>
  </rule>

</grammar>
Answer to Exercise 8
<grammar type = "application/srgs+xml" root = "yes" mode = "voice">
<rule id = "yes">
<one-of>
<item> yes </item>
<item> sure </item>
<item> affirmative </item>
…
</one-of>
</rule>
</grammar>
Answer to Exercise 9
<grammar type = "application/srgs+xml" root = "yes" mode = "voice">
<rule id = "yes">
<one-of>
<item> yes </item>
<item> sure <tag> out = "yes" </tag> </item>
<item> affirmative <tag> out = "yes" </tag> </item>
…
</one-of>
</rule>
</grammar>
Answer to Exercise 10

Given the following two EMMA specifications:

<interpretation mode = "speech">
  <moneyTransfer>
    <sourceAcct hook="ink"/>
    <targetAcct hook="ink"/>
    <amount> 300 </amount>
  </moneyTransfer>
</interpretation>

<interpretation mode = "ink">
  <moneyTransfer>
    <sourceAcct> savings </sourceAcct>
    <targetAcct> checking </targetAcct>
  </moneyTransfer>
</interpretation>

the unified EMMA specification is:

<interpretation mode = "intp1">
  <moneyTransfer>
    <sourceAcct> savings </sourceAcct>
    <targetAcct> checking </targetAcct>
    <amount> 300 </amount>
  </moneyTransfer>
</interpretation>
Answer to Exercise 11

<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:vxml="http://www.w3.org/2001/vxml"
      xmlns:ev="http://www.w3.org/2001/xml-events"
      xmlns:xv="http://www.w3.org/2002/xhtml+voice">
  <head>
    <xv:sync xv:input="in4" xv:field="#answer"/>
    <vxml:form id="stateForm">
      <vxml:field name="state" xv:id="answer">
        <vxml:prompt>Say a state name</vxml:prompt>
        <vxml:grammar src="state.grxml"/>
      </vxml:field>
    </vxml:form>
  </head>
  <body>
    <form ev:event="load" ev:handler="#stateForm">
      Result: <input type="text" name="in4"/>
    </form>
  </body>
</html>
Answer to Exercise 12

What should VoiceXML do when it receives each of the following events?
• Reset
  – Reset the value
• Change
  – Change the value
• Focus
  – Prompt for the value now in focus
Answer to Exercise 13

What should HTML do when it receives each of the following events?
• Reset
  – Reset the value
  – The author decides if the cursor should be moved to the reset value
• Change
  – Change the value
  – The author decides if the cursor should be moved to the changed value
• Focus
  – Move the cursor to the item in focus