The Universal Speech Interface (USI) PDG Progress Report

The Universal Speech Interface (USI)
PDG Progress Report
Thomas Harris, Stefanie Tomko, Arthur Toth, James Sanders,
Alex Rudnicky, Roni Rosenfeld
School of Computer Science
Carnegie Mellon University
4 June 2003
Outline
• USI Project Summary
• USI Device Control
• USI User Studies
• Tech Transfer Initiative
– USI Application Generator
Program Goals and Plan
• Overall program goal:
– Design a universal (i.e. device-independent)
interface for speech-based interaction with
wearable and home devices
• Program plan & milestones:
– Q1: analysis, interaction principles
– Q2: build device-simulation environment
– Q3: build first device prototype
– Q4: initial user studies; development tools
Program Deliverables
• A novel universal design for speech-based
interaction with wearable and home devices
• At least one demonstration system
exemplifying the new interface
• A set of tools for rapid prototyping of
compliant applications
The Universal Speech Interface (USI)
In a Nutshell
• Unifying approach to human-machine
speech communication
• Unified “look and feel” across all
applications
– analogous to the Xerox/Macintosh/Windows GUI
look-and-feel
• Stylized, semi-natural interaction
– analogous to the “Graffiti” alphabet for the Palm
PDA
Existing Speech Paradigm 1:
Command-and-control Systems
• Specialized language, optimized for a given
application
– each application has its own interface
• Intensive training of each user
• Daily use helps retain knowledge
Existing Speech Paradigm 2:
Unconstrained Dialog Systems
• “Off-the-street” users, no training required
• System models existing human behavior
• But this comes at a cost:
– each application requires a great deal of data,
labor, human expertise
– Speech Recognition technology is pushed to
the limit
– user does not easily grasp the application’s
functional limits
• Out-Of-Vocabulary words (OOV)
• Out-Of-Domain concepts, requests
Is a Third Paradigm Needed?
• In practice, people are likely to use:
– a handful of apps daily:
• scheduler, contact manager, email,...
– many apps occasionally:
• weather, restaurants, ...
• To exploit this, we need:
– flexible, powerful interface for familiar applications.
– immediate engagement with occasional or new
applications.
Our Approach
• Identify application-independent universals:
– user-side
– machine-side
• Find suitable, general solutions
– Human and machine meeting halfway
• Design a stylized, universal “look and feel”
• Teach it in 5 minutes
Universal Semantic primitives
• Help primitives
– what can the machine do? how do I do X? what can I say?
• Speech channel primitives
– detect & correct ASR errors; finished talking?
• Interaction primitives
– turn taking; question answering; session management; undo
• Application primitives
– environment variables: query, set
– objects (e.g. lists): describe, navigate, create, modify, delete
USI Systems Developed
• Information Access
– MovieLine
– FlightLine
– ApartmentLine
• Device Control
– Stereo system
– X-10 control (e.g., lights)
– Alarm Clock applet
– Digital Video Camera
– Windows Media Player
USI Demonstration
• MovieLine
– Experimental subject
USI Device Control
Device Interaction Analysis
• Analysis was done on multiple devices
– alarm clock / radio
– VCR
– cell phone
– MP3 player
– memo pad / email / vmail
– copier/fax
USI/Device Design Issues
• Confirmation strategy
• Error handling strategy
• Exploration
• Navigation
• Disambiguation / context mgmt
• Orientation
• Querying state variables
USI/Device Design Issues
• Confirmation strategy: restate-&-execute
• Error handling strategy: ignore
• Exploration: “OPTIONS”
• Navigation: use concept of ‘focus’
• Disambiguation / context mgmt: implicit
• Orientation: “STATUS”
• Querying state variables: “WHAT IS THE...?”
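The design choices above can be illustrated with a small sketch. It is not the USI implementation: the class `DeviceSession`, its method `handle`, the state dictionary layout, and the "variable value" command form are all assumptions made for illustration. Only the keywords “OPTIONS”, “STATUS”, and “WHAT IS THE...?” and the strategies themselves come from the slide.

```python
class DeviceSession:
    """Toy interpreter illustrating the listed strategies (hypothetical design)."""

    def __init__(self, state):
        # state maps a device name to its variables,
        # e.g. {"stereo": {"mode": "tuner", "volume": "5"}}
        self.state = state
        self.focus = None  # navigation: the device currently in focus

    def handle(self, utterance):
        words = utterance.lower().rstrip("?").split()
        # Navigation: naming a device moves the focus to it (and restates it).
        if len(words) == 1 and words[0] in self.state:
            self.focus = words[0]
            return words[0]
        if self.focus is None:
            # Exploration without a focus lists the known devices.
            if words == ["options"]:
                return ", ".join(sorted(self.state))
            return None  # error handling: ignore
        # Disambiguation is implicit: variables are resolved within the focus.
        variables = self.state[self.focus]
        if words == ["options"]:  # exploration
            return ", ".join(sorted(variables))
        if words == ["status"]:  # orientation
            return ", ".join(f"{k} {v}" for k, v in sorted(variables.items()))
        if words[:3] == ["what", "is", "the"] and len(words) == 4:
            name = words[3]  # querying a state variable
            return f"{name} {variables[name]}" if name in variables else None
        if len(words) == 2 and words[0] in variables:
            # Confirmation: execute, then restate what was understood.
            variables[words[0]] = words[1]
            return f"{words[0]} {words[1]}"
        return None  # error handling: ignore anything not understood
```

In this sketch an unrecognized utterance simply returns `None`, mirroring the "ignore" error strategy, and every successful command echoes what was understood so the user can catch recognition errors.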
Hooking up with the PUC project
• Fits within the PUC project’s vision of
automatically generated interfaces with
different modalities and form factors
• But, can also be used as a standalone
speech interface
• Compatibility with visual design is desirable,
but not always natural:
– nameless states (speech interface must have
name for everything!)
– speech interface can have shortcuts (“MODE: CD”
vs. “CD”)
Meshing with the PUC project
• Device capabilities specified by XML doc
• States vs. Action dichotomy of the visual
interface does not always conform to
speech interface intuition.
• For now, creating our own interface
specification document
• Ultimately, will augment XML DTD, so
both interfaces can co-exist
USI Device control
(a.k.a. James the Butler)
[Figure: command tree rooted at "James". Node labels: Stereo (<turns stereo on>), x-bass, (mode), tuner, (radioband), am, fm, frequency..., station..., seek, forward, backward, auxiliary, cd, (status), play, pause, stop, disc, #, next track, last track, random..., repeat..., volume, volume up, volume down, on, off, digital camera...]
Hardware hacking courtesy of the PUC project
USI Demonstration
• Device Control
– Alarm Clock Example
User Studies
User study
• Compared Speech Graffiti (SG) & natural
language MovieLines
• How does Speech Graffiti compare to a
natural language interface?
– Subjective user satisfaction
– Task completion rates
– Word error rates
• How well do users "get" Speech Graffiti?
– How often do they speak within the grammar?
– In what ways do they deviate from the grammar?
Subjective user satisfaction
• 17 of 23 preferred Speech Graffiti (SG)
• SG user satisfaction ratings higher than NL in all categories
• SG ratings positive except in annoyance & habitability
[Figure: mean user satisfaction rating (scale 1 to 7) for NL-ML and SG-ML by category: overall, speed, habitability, annoyance, cog. demand, likeability, system resp. acc.]
Computer experience & training
• Computer Science / Engineering
backgrounds and / or programming
experience
– Higher user satisfaction ratings
– Better task completion rates
• Training in-domain vs. out-of-domain
– No differences in user satisfaction or task
completion rates
Task completion
• Overall
– 67.9% SG tasks
– 67.4% NL tasks
• Individual means
– 5.43 of 8 SG tasks
– 5.30 of 8 NL tasks
[Figure: mean task completion rate (0 to 8 tasks) for SG-ML and NL-ML]
Time-to-completion
• Completed tasks
– 67.9 seconds SG
– 73.4 seconds NL
• Incomplete tasks: 59 incompletes
[Figure: time to completion, in seconds, for SG-ML and NL-ML; two panels, "best case" and "real world", with incomplete tasks marked "(inc)"]
Turns-to-completion
• Completed tasks
– 8.2 turns SG
– 3.9 turns NL
• Incomplete tasks: 59 incompletes
[Figure: number of turns to completion for SG-ML and NL-ML; two panels, "best case" and "real world", with incomplete tasks marked "(inc)"]
Word error rates
• Very high for both systems
– On "cleaned" set (on-task, non-noisy utts)
            WER      # of utts   subj mean   subj median
SG Movie    35.1%    3626        35.0%       30.0%
NL Movie    51.2%    1854        50.3%       48.9%
• Concept error is lower for USI
– SG: –29.2% from WER
– NL: +0.8% from WER
• Low error rate is key to acceptance
– 6 who preferred NL-ML had highest SG WER
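For readers unfamiliar with the metric, word error rate is conventionally computed as the word-level edit distance (substitutions, deletions, insertions) between the reference transcript and the recognizer output, divided by the number of reference words. The sketch below shows that standard calculation. The function name `word_error_rate` and the sample utterances are illustrative assumptions, and the study's actual scoring tool is not described in the slides.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


# Made-up example: one deletion ("the") and one substitution out of 4 words.
print(word_error_rate("theater is the manor", "theater is manner"))  # 0.5
```

Because insertions count as errors, this ratio can exceed 100% when the recognizer outputs many spurious words.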
WER & user satisfaction
• Good correlation for SG
[Figure: user satisfaction rating vs. % word-error rate; one panel each for SG-ML and NL-ML]
How often do users speak within the
Speech Graffiti grammar?
• Actually, pretty often!
– mean: 80.5%
– median: 87.4%
• … and grammaticality leads to user satisfaction
[Figure: user satisfaction rating (1 to 7) vs. % grammatical (0% to 100%)]
How do users deviate from
the grammar?
– subject-verb agreement: 5.7%
– missing is/are: 11%
– value+options: 1%
– time syntax: 1.3%
– more syntax: 4%
– disfluency: 4.3%
– out-of-vocabulary concept: 5.1%
– value only: 6.7%
– plural+options: 2%
– endpoint: 1.6%
– general syntax: 20.6%
– slot only: 14.6%
– keyword problem: 8.1%
– out-of-vocabulary word: 14.0%
Future Interface Design Work
• Redesign Help facility
– SG works best for those who "get it"
– Current system provides no assistance to "clueless user"
• Error analysis
– Compare failure cases in SG and NL interfaces
– Compare user recovery attempts in SG and NL
• Address issues of generalizability
– Promoting transparency of slot set and response sets
– Accessing information sets rather than single items
• Adjust grammar components
Future Architecture Work
• Integrate current USI environments
– Information Access
– Device Control
• Improve interface between PUC and
USI components
• Identify USI-specific techniques to
achieve lower WER
• Improved documentation and
distribution packaging
Tech Transfer Initiative
Tech Transfer Initiative
• Tools for creating new USI apps
– 3 days to create a new application
– prior exposure to speech technology highly
beneficial
– decided to further reduce the barrier: create an application generator
From 3 Days to a Few Hours
• A USI Application Generator
• New USI applications w/out programming!
• XML document fully specifies the
application
– slot names
– accepted inputs
– data types
– slot properties
– ...
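To make the idea of an XML-specified application concrete, here is a minimal sketch of reading such a document. The actual USI schema is not shown in the slides, so the element and attribute names (`application`, `slot`, `value`, `name`, `type`, `required`), the sample application `ExampleLine`, and the helper `load_slots` are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical application spec; the real USI document format may differ.
SPEC = """<application name="ExampleLine">
  <slot name="title" type="string" required="true"/>
  <slot name="day" type="enum">
    <value>today</value>
    <value>tomorrow</value>
  </slot>
</application>"""


def load_slots(xml_text):
    """Return a dict mapping each slot name to its type, properties, and inputs."""
    root = ET.fromstring(xml_text)
    slots = {}
    for slot in root.findall("slot"):
        slots[slot.get("name")] = {
            "type": slot.get("type"),                            # data type
            "required": slot.get("required", "false") == "true",  # slot property
            "values": [v.text for v in slot.findall("value")],   # accepted inputs
        }
    return slots


slots = load_slots(SPEC)
print(sorted(slots))
```

A generator working from a structure like this could derive grammar entries and database queries per slot without any application-specific code, which is the point of the "no programming" claim.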
From a Few Hours to 15 minutes?
• Created a Web interface for generating the
XML document
• Form filling, pulldown menus
• Strong effort to further simplify the process,
minimize complexity of form
– many defaults
– for less common choices, edit the XML doc.
• More importantly, no computer savvy needed
Web Application Generator
• Repository and tool for creating USI
database applications
• Abundant online help to guide users
through process
• Accessible to anyone with an Internet
connection
Web Application Generator
• Two step process:
– General specification
– Slot-by-slot specification
• choose datatype from built-in list, or create own
• Fully featured system with save, copy,
delete functionality
• Hides intricacies of XML document writing
• Advanced users have ability to further
alter the final XML document
General Specification screen with help box displayed.
Web Application Generator
• Built-in generic voice; can record own voice
• DB backend
– Postgres
– Oracle
– ODBC (including ASCII files)
– Ultimately: web tables
• Platform:
– originally: mixed Unix/Windows, telephone based
– converted to: pure Windows, telephone or laptop
Transferring USI to PDG members
• We do house calls!
– Carnegie Mellon will install USI developer
environment for each interested member
and will train member staff in the use of the
developer environment
– Provide a short tutorial on USI principles
and interface design
Thank you!
Pittsburgh Digital Greenhouse