Speech Interfaces - Georgia Institute of Technology
Speech User Interfaces
CS 160, Spring 2002
Professor James Landay
February 20, 2002
2/20/2002
1
UI Hall of Fame or Shame?
Dialog box
ask if you want to delete
UI Hall of Shame!
Dialog box
ask if you want to delete
Problems?
use of color problematic
Yes (green), No (red)
R-G color deficiency
cultural mismatch
Western
• green good
• red bad
Eastern & others differ
Outline
Review
Motivation for speech UIs
Speech recognition
UI problems with speech UIs
SpeechActs: Guidelines for speech UIs
Announcements
Speech UI design tools
Multimodal UIs
Review
Why do we prototype?
get feedback on our design from customers – faster & cheaper
Why use low-fi prototypes?
traditional methods take too long & focus designers & customers
on the wrong (visual) issues
What is the Wizard of Oz technique?
faking the interaction
What are the advantages of using informal tools like SILK, DENIM, & SUEDE?
advantages of electronic medium (editing, reuse, distribution, etc.)
faster than traditional UI tools
do not focus designers/customers on the wrong issues
ability to support testing & analysis of resulting data
Motivation for Speech UIs: Pervasive Information Access
[Figure: Information & Services; I-Land vision by Streitz et al.]
UIs in the Pervasive Computing Era
Future computing devices won’t
have the same UI as current PCs
wide range of devices
small or embedded in environment
often w/ “alternative” I/O & w/o screens
information appliances
I-Land vision by Streitz et al.
Information Access via Speech
“Read my important email”
Speech UI Motivation
Smaller devices -> difficult I/O
people can talk at ~ 90 wpm -> high speed
“Virtually unlimited” set of commands
Freedom for other body parts
imagine you are working on your car & need to know
something from the manual
Natural
evolutionarily selected for
reading, writing, & typing are not (too new)
Why are Speech UIs Hard to Get Right?
Speech recognition far from perfect
imagine inputting commands w/ the mouse &
getting the wrong result 5-20% of the time
Speech UIs have no visible state
can’t see what you have done before or what
effect your commands have had
Speech UIs are hard to learn
how do you explore the interface? how do
you find out what you can say?
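A rough estimate shows why a 5-20% error rate is so costly. Assuming (this is not stated on the slide) that each command is misrecognized independently with probability p, the chance of finishing an n-command task with no misrecognition at all is:

```latex
\[
P(\text{no errors in } n \text{ commands}) = (1 - p)^{n},
\qquad
(1 - 0.05)^{10} \approx 0.60,
\quad
(1 - 0.20)^{10} \approx 0.11
\]
```

Under that independence assumption, even the low end of the range leaves roughly four in ten 10-command tasks with at least one error to recover from.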
Speech UIs Require
Speech recognition
the computer understanding what the customer is saying
Speech production (or synthesis)
the computer talking to the customer
Speech Recognition
Continuous vs. non-continuous
Speaker independent vs. dependent
Speech often misunderstood by people
feedback via speech, facial expressions, & gesture
Recognizers trained with real samples
often get gender-based problems
Based on probabilities (HMMs - Bayes)
trigrams of sounds or words
Several popular recognizers
Nuance, SpeechWorks, IBM ViaVoice
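The "HMMs - Bayes" and trigram bullets above can be written out in the conventional textbook notation. The symbols are assumptions, not from the slides: W is a candidate word sequence w_1 ... w_n and A is the observed acoustic signal.

```latex
\[
\hat{W} = \arg\max_{W} P(W \mid A)
        = \arg\max_{W} \frac{P(A \mid W)\, P(W)}{P(A)}
        = \arg\max_{W} P(A \mid W)\, P(W)
\]
\[
P(W) \approx \prod_{i=1}^{n} P(w_i \mid w_{i-2}, w_{i-1})
\]
```

Here P(A | W) is the acoustic model (typically an HMM), P(W) is the language model, and the second line is the trigram approximation: each word is predicted from the two words before it. P(A) drops out because it does not depend on W.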
Speech Production
Three frequency regions of great intensity
visible on oscilloscope
come from larynx, throat, mouth
Only two are needed for recognition, but the result sounds “tinny”
Can generate emotional affect in speech
Demo
anger, disgust, gladness, sadness, fear, & surprise
http://cahn.www.media.mit.edu/people/cahn/emot-speech.html
Recognition Problems
Poor recognition
humans < 1% error rate on dictation
top recognition systems get 5-10% error rates
computers don’t use much context
Background noise
even worse recognition rates (20-40% error)
Slow
simple matter of hardware getting faster
in 10 years gone from 5 high-end workstations required to
some speech systems running on laptops or even PDAs
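Error rates like the ones above are commonly reported as word error rate: the word-level edit distance between what was said and what was recognized, divided by the number of words said. The slides do not say which metric was used, so the sketch below is an assumption about how such a figure could be computed; the function name is hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the reference words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# One wrong word out of four spoken words.
print(word_error_rate("read my important email", "read my import email"))
```

Because insertions count as errors, this measure can exceed 1.0 when the recognizer outputs many spurious words.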
More Recognition Problems
Isolated, short words difficult
common words become short
Segmentation
silly versus sill lea
Spelling
mail vs. male -> need to understand language
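The mail/male bullet says the recognizer needs some knowledge of language to choose between words that sound the same. A minimal sketch of one way to do that, using trigram counts over a toy corpus: the corpus, function names, and padding markers are all invented for illustration, and a real recognizer would use smoothed probabilities from far more text.

```python
from collections import Counter

# Toy training text, invented purely for illustration.
CORPUS = [
    "please read my mail now",
    "check my mail please",
    "read my mail again",
    "the male speaker was misrecognized",
]


def train_trigrams(sentences):
    """Count word trigrams, padding each sentence with start/end markers."""
    counts = Counter()
    for sentence in sentences:
        words = ["<s>", "<s>"] + sentence.split() + ["</s>"]
        for i in range(len(words) - 2):
            counts[tuple(words[i:i + 3])] += 1
    return counts


def pick_homophone(counts, left_context, candidates):
    """Return the candidate seen most often after the two preceding words."""
    w1, w2 = left_context
    return max(candidates, key=lambda word: counts[(w1, w2, word)])


counts = train_trigrams(CORPUS)
choice = pick_homophone(counts, ("read", "my"), ["male", "mail"])
print(choice)
```

With raw counts, an unseen context gives every candidate a count of zero and the first candidate wins by default, which is one reason real language models smooth their estimates.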
Speech UI Problems
Speech UI no-nos
modes (no feedback)
certain commands only work when in specific states
deep hierarchies (aka voice mail hell)
Verbose feedback wastes time/patience
only confirm consequential things
use meaningful, short cues
Interruption
half-duplex communication (i.e., no barge-in support)
Too much speech on the part of customer is tiring
Speech takes up space in working memory
can cause problems when problem solving
SpeechActs: Guidelines for Speech UIs
Speech interface to computer tools
email, calendar, weather, stock quotes
Establish common ground & shared context
make sure people know where they are in the conversation
Pacing
recognition delays are unnatural; make it clear when one occurs
barge-in lets user interrupt like in real conversations
tapering of prompts
progressive assistance: short error messages at first,
longer ones when the user needs more help
implicit confirmation: include confirm in next command
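Progressive assistance and implicit confirmation can be sketched as a small piece of dialog state. This is an illustration of the two guidelines only, not SpeechActs code; the class name, prompt wording, and calendar scenario are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical error messages, ordered from terse to detailed.
HELP_LEVELS = [
    "Sorry?",
    "Sorry, I didn't catch that. Which day?",
    "I didn't understand. Say a day of the week, for example 'Monday', or say 'cancel'.",
]


@dataclass
class Prompter:
    """Tracks consecutive recognition failures for one dialog state."""
    failures: int = 0

    def on_failure(self) -> str:
        """Return the next error message, escalating with each failure."""
        level = min(self.failures, len(HELP_LEVELS) - 1)
        self.failures += 1
        return HELP_LEVELS[level]

    def on_success(self, day: str) -> str:
        """Reset the escalation and echo the recognized value in the next prompt."""
        self.failures = 0
        return f"What would you like to do on {day}?"


prompter = Prompter()
print(prompter.on_failure())        # shortest message first
print(prompter.on_failure())        # more detail on the second failure
print(prompter.on_success("Monday"))  # implicit confirmation of "Monday"
```

Echoing the recognized day inside the next prompt lets the user catch a misrecognition without a separate "Did you say Monday?" turn.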
SpeechActs Video
Announcements
Task analysis / Contextual inquiry HW
average = 79/100, stdev. 8.4
Low-fi user test due Monday
questions
If you haven’t gotten a laptop yet, check
with Wai-ling after class
SUEDE: Low-fi Prototyping for Speech-based UIs
Supports design practice
example scripts
Wizard of Oz
error simulation
iterative design (design-test-analysis)
Informal user interface
no speech recognition/synthesis
need not be a programming expert
fast & fluid design
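The idea of error simulation in a Wizard of Oz test can be sketched as follows: a human wizard decides what the participant meant, and the tool occasionally substitutes a wrong response so the design is tested under realistic recognition failures. This is a guess at the general technique, not SUEDE's implementation; the function name, parameters, and the 20% rate are assumptions.

```python
import random


def simulate_recognition(wizard_choice, alternatives, error_rate, rng):
    """Return the wizard's choice, or a wrong alternative at the given rate.

    wizard_choice: the response the human wizard judged the user to mean.
    alternatives: all responses the design allows at this point.
    error_rate: probability in [0, 1] of injecting a misrecognition.
    rng: a random.Random instance, passed in so sessions are repeatable.
    """
    wrong = [a for a in alternatives if a != wizard_choice]
    if wrong and rng.random() < error_rate:
        return rng.choice(wrong)
    return wizard_choice


rng = random.Random(0)  # fixed seed so a test session can be replayed
responses = ["read email", "delete email", "help"]
outcomes = [
    simulate_recognition("read email", responses, 0.2, rng)
    for _ in range(1000)
]
error_count = sum(outcome != "read email" for outcome in outcomes)
print(f"simulated errors: {error_count} of {len(outcomes)}")
```

Setting the rate near the error rates expected from a real recognizer lets the designer see how the dialog holds up before any recognizer is integrated.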
[Figure: labels “machine prompt” and “user response”]
SUEDE Summary
SUEDE supports speech-based UI design
moving from concrete examples to abstractions
allows designer to accept responses that aren’t
exactly what they originally had in mind
embeds iterative design w/ design-test-analyze
Designers using SUEDE need not be experts
in speech recognition technology
One Vision of Future User Interfaces
Star Trek style UI
verbally ask the computer for information
may be common in mobile/hands-busy situations
problem: hard to design, build, & use!
requires perfect speech recognition & language understanding
Our Vision of Future User Interfaces
Multimodal, Context-aware UIs
multimodal
uses multiple input modalities (speech & gesture) to
disambiguate
user says “move it to this screen” while pointing
context-aware
apps can be aware of location, user, what they are doing, …
people are talking -> don’t rely on speech I/O
Problem: how to prototype & test new ideas?
Informal UI Design Tools!
combine Wizard of Oz & informal storyboarding
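The "move it to this screen" example above depends on pairing a spoken word with a gesture made at about the same time. A minimal sketch of that pairing, assuming timestamped pointing events are available: the type, function name, and 1.5-second window are hypothetical choices, not taken from any system in the slides.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PointingEvent:
    time_s: float  # when the gesture occurred, in seconds
    target: str    # what the user pointed at, e.g. a screen name


def resolve_deictic(word_time_s: float,
                    gestures: list[PointingEvent],
                    max_gap_s: float = 1.5) -> Optional[str]:
    """Return the target of the gesture closest in time to a spoken "this".

    Gestures further than max_gap_s from the word are ignored, so speech
    with no accompanying gesture stays unresolved (None).
    """
    nearby = [g for g in gestures if abs(g.time_s - word_time_s) <= max_gap_s]
    if not nearby:
        return None
    return min(nearby, key=lambda g: abs(g.time_s - word_time_s)).target


gestures = [PointingEvent(0.4, "left screen"), PointingEvent(2.1, "wall display")]
# The user says "this" at t = 2.0 s while pointing at the wall display.
print(resolve_deictic(2.0, gestures))
```

Returning None when no gesture is close enough gives the application a chance to ask a clarifying question instead of guessing.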
Multimodal Error Correction
Dictation error correction study
found users are better at correcting recognition
errors with a different input modality
recognizer got it wrong the first time -> it will
get it wrong the second time
hyperarticulating aggravates the errors
Correct dictation errors with
vocal spelling, writing, typing, etc.
Summary
Speech UIs
may permit more natural computer access
allow us to use computers in more situations
are hard to get to work well
lack of visible state, tax working memory, recognition problems,
etc.
UI tools are needed for speech UI design
Multimodal UIs address some of the problems with
pure speech UIs
help disambiguate
help w/ correction
Next Time
Web Design
Reading
The Limits of Speech Recognition by
Shneiderman
Optional: Designing SpeechActs: Issues in
Speech User Interfaces by Nicole Yankelovich,
Gina-Anne Levow, & Matt Marx