Speech Interfaces - Georgia Institute of Technology

Speech User Interfaces
CS 160, Spring 2002
Professor James Landay
February 20, 2002
2/20/2002
1
UI Hall of Fame or Shame?
 Dialog box
 ask if you want to delete
UI Hall of Shame!
 Dialog box
 ask if you want to delete
 Problems?
 use of color problematic
 Yes (green), No (red)
 R-G color deficiency
 cultural mismatch
 Western
• green good
• red bad
 Eastern & others differ
Outline
 Review
 Motivation for speech UIs
 Speech recognition
 UI problems with speech UIs
 SpeechActs: Guidelines for speech UIs
 Announcements
 Speech UI design tools
 Multimodal UIs
Review
 Why do we prototype?
 get feedback on our design from customers – faster & cheaper
 Why use low-fi prototypes?
 traditional methods take too long & focus designers & customers
on the wrong (visual) issues
 What is the Wizard of Oz technique?
 faking the interaction
 What is the advantage of using informal tools like SILK, DENIM, & SUEDE?
 advantages of electronic medium (editing, reuse, distribution, etc.)
 faster than traditional UI tools
 do not focus designers/customers on the wrong issues
 ability to support testing & analysis of resulting data
Motivation for Speech UIs: Pervasive Information Access
Information & Services
I-Land vision by Streitz et al.
UIs in the Pervasive Computing Era
 Future computing devices won’t
have the same UI as current PCs
 wide range of devices
 small or embedded in environment
 often w/ “alternative” I/O & w/o screens
 information appliances
I-Land vision by Streitz et al.
Information Access via Speech
“Read my important email”
Speech UI Motivation
 Smaller devices -> difficult I/O
 people can talk at ~ 90 wpm -> high speed
 “Virtually unlimited” set of commands
 Freedom for other body parts
 imagine you are working on your car & need to know
something from the manual
 Natural
 evolutionarily selected for
 reading, writing, & typing are not (too new)
Why are Speech UIs Hard to Get Right?
 Speech recognition far from perfect
 imagine inputting commands w/ the mouse &
getting the wrong result 5-20% of the time
 Speech UIs have no visible state
 can’t see what you have done before or what effect your commands have had
 Speech UIs are hard to learn
 how do you explore the interface? how do
you find out what you can say?
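A rough sketch of why a 5-20% per-command error rate hurts so much. It assumes each command is misrecognized independently of the others, which is a simplification made here for illustration and not a claim from the lecture. The function name is hypothetical.

```python
def task_success_probability(per_command_error: float, num_commands: int) -> float:
    """Chance that every command in a sequence is recognized correctly.

    Assumes recognition errors are independent from one command to the next.
    """
    if not 0.0 <= per_command_error <= 1.0:
        raise ValueError("per_command_error must be between 0 and 1")
    if num_commands < 0:
        raise ValueError("num_commands must be non-negative")
    return (1.0 - per_command_error) ** num_commands


# Compare the two ends of the 5-20% range for a five-command task.
for error in (0.05, 0.20):
    p = task_success_probability(error, 5)
    print(f"{error:.0%} error per command, 5 commands: {p:.0%} chance of no error")
```

Under that assumption, a five-command task finishes without any misrecognition about 77% of the time at 5% error, and only about 33% of the time at 20% error.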
Speech UIs Require
 Speech recognition
 the computer understanding what the customer is saying
 Speech production (or synthesis)
 the computer talking to the customer
Speech Recognition
 Continuous vs. non-continuous
 Speaker independent vs. dependent
 Speech often misunderstood by people
 feedback via speech, facial expressions, & gesture
 Recognizers trained with real samples
 often get gender-based problems
 Based on probabilities (HMMs - Bayes)
 trigrams of sounds or words
 Several popular recognizers
 Nuance, SpeechWorks, IBM ViaVoice
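The "trigrams of words" idea can be illustrated with a toy language model: estimate how likely a word is given the two words before it, then prefer the candidate with the higher probability. This is only a sketch of the language-model part, not of the HMM acoustic model, and not how any of the recognizers named above are implemented. The corpus, the function names, and the add-alpha smoothing are assumptions chosen for the example.

```python
from collections import Counter, defaultdict


def train_trigrams(sentences):
    """Count how often each word follows each pair of preceding words."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = ["<s>", "<s>"] + sentence.lower().split() + ["</s>"]
        for a, b, c in zip(words, words[1:], words[2:]):
            counts[(a, b)][c] += 1
    return counts


def trigram_probability(counts, a, b, c, vocab_size, alpha=1.0):
    """Estimate P(c | a, b) with add-alpha smoothing so unseen words get a small probability."""
    context = counts.get((a, b), Counter())
    return (context[c] + alpha) / (sum(context.values()) + alpha * vocab_size)


def pick_word(counts, a, b, candidates, vocab_size):
    """Choose the candidate word that is most probable after the words a, b."""
    return max(candidates, key=lambda w: trigram_probability(counts, a, b, w, vocab_size))


# Made-up training sentences, for illustration only.
corpus = [
    "read my mail",
    "check my mail",
    "read my important mail",
    "the male speaker is talking",
]
vocab = {w for s in corpus for w in s.lower().split()} | {"</s>"}
counts = train_trigrams(corpus)

# Two words that sound alike; the preceding words decide between them.
print(pick_word(counts, "read", "my", ["mail", "male"], len(vocab)))  # prints: mail
```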
Speech Production
 Three frequency regions of great intensity
visible on oscilloscope
 come from larynx, throat, mouth
 Two needed for recognition but “tinny”
 Can generate emotional affect in speech
 Demo
 anger, disgust, gladness, sadness, fear, & surprise
http://cahn.www.media.mit.edu/people/cahn/emot-speech.html
Recognition Problems
 Poor recognition
 humans < 1% error rate on dictation
 top recognition systems get 5-10% error rates
 computers don’t use much context
 Background noise
 even worse recognition rates (20-40% error)
 Slow
 simple matter of hardware getting faster
 in 10 years gone from 5 high-end workstations required to
some speech systems running on laptops or even PDAs
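The slide does not say how these error rates are measured. One common measure is word error rate (WER): the word-level edit distance between what was said and what was recognized, divided by the number of words said. The following is a minimal sketch assuming that definition; the sample sentences are made up.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance (substitutions, insertions, deletions)
    divided by the number of words in the reference transcript."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# One substituted word out of four gives a 25% word error rate.
print(word_error_rate("read my important email", "read my import email"))
```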
More Recognition Problems
 Isolated, short words difficult
 common words become short
 Segmentation
 silly versus sill lea
 Spelling
 mail vs. male -> need to understand language
Speech UI Problems
 Speech UI no-nos
 modes (no feedback)
 certain commands only work when in specific states
 deep hierarchies (aka voice mail hell)
 Verbose feedback wastes time/patience
 only confirm consequential things
 use meaningful, short cues
 Interruption
 half-duplex communication (i.e., no barge-in support)
 Too much speech on the part of customer is tiring
 Speech takes up space in working memory
 can cause problems when problem solving
SpeechActs: Guidelines for Speech UIs
 Speech interface to computer tools
 email, calendar, weather, stock quotes
 Establish common ground & shared context
 make sure people know where they are in the conversation
 Pacing
 recognition delays are unnatural; make it clear when they occur
 barge-in lets user interrupt like in real conversations
 tapering of prompts
 progressive assistance: short error messages at first, longer when user needs more help
 implicit confirmation: include confirm in next command
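Progressive assistance and implicit confirmation can be sketched in a few lines. This is an illustration of the two guidelines, not code from SpeechActs; the class, the function, and the prompt wording are all hypothetical.

```python
class ProgressivePrompter:
    """Give a short error prompt first and longer help on repeated failures."""

    def __init__(self, prompts):
        if not prompts:
            raise ValueError("at least one prompt is required")
        self.prompts = list(prompts)  # ordered from shortest to most detailed
        self.failures = 0

    def next_error_prompt(self) -> str:
        """Return the prompt for the current failure, then escalate.
        After the last prompt is reached it is repeated."""
        index = min(self.failures, len(self.prompts) - 1)
        self.failures += 1
        return self.prompts[index]

    def reset(self) -> None:
        """Call after a successful recognition so help starts short again."""
        self.failures = 0


def with_implicit_confirmation(heard: str, next_prompt: str) -> str:
    """Fold the confirmation into the next prompt instead of asking
    a separate 'Did you say ...?' question."""
    return f"{heard}. {next_prompt}"


prompter = ProgressivePrompter([
    "Sorry?",
    "Sorry, which day?",
    "Please say a day of the week, for example Monday.",
])
print(prompter.next_error_prompt())
print(prompter.next_error_prompt())
print(with_implicit_confirmation("Calendar for Monday", "Morning or afternoon?"))
```

The user can correct a misrecognition at the next turn, and a correct recognition costs no extra confirmation step.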




SpeechActs Video
Announcements
 Task analysis / Contextual inquiry HW
 average = 79/100, stdev. 8.4
 Low-fi user test due Monday
 questions
 If you haven’t gotten a laptop yet, check
with Wai-ling after class
SUEDE: Low-fi Prototyping for Speech-based UIs
 Supports design practice
 example scripts
 Wizard of Oz
 error simulation
 iterative design (design-test-analysis)
 Informal user interface
 no speech recognition/synthesis
 need not be programming expert
 fast & fluid design
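Error simulation in a Wizard of Oz test can be approximated very simply: the wizard picks the response the participant actually gave, and the prototype occasionally swaps in a different one to mimic a misrecognition. The sketch below is an assumption about how such a feature could work, not SUEDE's implementation; the function name and parameters are hypothetical.

```python
import random


def simulate_recognition(wizard_choice, alternatives, error_rate, rng):
    """Return the response the prototype should act on.

    With probability error_rate, a wrong alternative replaces the wizard's
    choice, so the designer can see how the dialog copes with errors.
    """
    if not 0.0 <= error_rate <= 1.0:
        raise ValueError("error_rate must be between 0 and 1")
    wrong = [a for a in alternatives if a != wizard_choice]
    if wrong and rng.random() < error_rate:
        return rng.choice(wrong)
    return wizard_choice


rng = random.Random(2002)  # fixed seed so a test session can be replayed
responses = ["read email", "check calendar", "get weather"]
acted_on = [simulate_recognition("read email", responses, 0.2, rng) for _ in range(10)]
print(acted_on)
```

Passing in a seeded random.Random keeps a test session reproducible, which helps when analyzing the results afterwards.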
[Figure: SUEDE screens, with labels "machine prompt" and "user response"]
SUEDE Summary
 SUEDE supports speech-based UI design
 moving from concrete examples to abstractions
 allows designer to accept responses that aren’t
exactly what they originally had in mind
 embeds iterative design w/ design-test-analyze
 Designers using SUEDE need not be experts
in speech recognition technology
One Vision of Future User Interfaces
 Star Trek style UI
 verbally ask the computer for information
 may be common in mobile/hands-busy situations
 problem: hard to design, build, & use!
 requires perfect speech recognition & language understanding
Our Vision of Future User Interfaces
 Multimodal, Context-aware UIs
 multimodal
 uses multiple input modalities (speech & gesture) to
disambiguate
 user says “move it to this screen” while pointing
 context-aware
 apps can be aware of location, user, what they are doing, …
 people are talking -> don’t rely on speech I/O
 Problem: how to prototype & test new ideas?
 Informal UI Design Tools!
 combine Wizard of Oz & informal storyboarding
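The "move it to this screen" example can be sketched as a small fusion step: speech supplies the command, and the pointing gesture closest in time supplies the target. This is an illustrative sketch only; the data class, the function name, and the 1.5 second matching window are assumptions, not part of any system described in the lecture.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class PointingEvent:
    target: str    # what the user pointed at
    time_s: float  # when the gesture happened, in seconds


def resolve_deictic(
    utterance: str,
    spoken_at_s: float,
    pointing_events: Sequence[PointingEvent],
    window_s: float = 1.5,
) -> Optional[str]:
    """Replace 'this screen' with the target pointed at closest in time.

    Returns None when no gesture falls inside the time window, so the
    caller can ask the user which screen was meant.
    """
    if "this screen" not in utterance:
        return utterance
    nearby = [e for e in pointing_events if abs(e.time_s - spoken_at_s) <= window_s]
    if not nearby:
        return None
    nearest = min(nearby, key=lambda e: abs(e.time_s - spoken_at_s))
    return utterance.replace("this screen", nearest.target)


events = [PointingEvent("the tablet", 3.0), PointingEvent("the wall display", 10.2)]
print(resolve_deictic("move it to this screen", 10.0, events))
```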
Multimodal Error Correction
 Dictation error correction study
 found users are better at correcting recognition
errors with a different input modality
 recognizer got it wrong the first time -> it will
get it wrong the second time
 hyperarticulating aggravates recognition errors
 Correct dictation errors with
 vocal spelling, writing, typing, etc
Summary
 Speech UIs
 may permit more natural computer access
 allow us to use computers in more situations
 are hard to get to work well
 lack of visible state, tax working memory, recognition problems,
etc.
 UI tools are needed for speech UI design
 Multimodal UIs address some of the problems with
pure speech UIs
 help disambiguate
 help w/ correction
Next Time
 Web Design
 Reading
 The Limits of Speech Recognition by Shneiderman
 Optional: Designing SpeechActs: Issues in
Speech User Interfaces by Nicole Yankelovich,
Gina-Anne Levow, & Matt Marx