Speech .NET - brains-N

Speech in .NET
Sphinx CMU
November 2002

Presenter
casey chesnut
brains-N-brawn.com
– Web Services
– Mobile / Wireless
– Speech

Audience
Java / C++ / VB / C# ?
VoiceXml ?
SALT / Speech .NET ?

Outline
MS Technologies
VoiceXml
– Demo
Speech .NET
– Demo
Future
Questions (throughout)
~25 slides

MS Technologies
Tools
Devices
– Phone
– Desktop PC
– Pocket PC
– Tablet PC

Tools
MS Agents
SAPI / Speech SDK 5.1 (.NET wrappable)
Office
AutoPC ???
ASP .NET (VoiceXml)
(beta) Speech .NET / IE Speech Add-In
… SALT Telephony gateway (early 2003)
… Pocket IE Speech Add-In (mid 2003)

Devices
Phone
– billions of devices, people are comfortable speaking to
Desktop PC
– large market, speech input is slower and uncomfortable
Pocket PC
– small market, opportunities for speech (device limitations)
Tablet PC
– new market, speech friendly (slate models don’t have keyboards)

Phone
ASP .NET w/ VoiceXml 2.0
– Production quality now
– Multiple vendor support
Speech .NET VoiceOnly
– Currently no way to deploy and test over a phone
– Speech .NET Beta 2 has telephony simulation
– MS target market for Speech .NET

Desktop PC
Web
– Speech .NET MultiModal
  – Beta 2 IE Speech Add-In
– Embedded control w/SAPI
– MS Agents
Fat
– SAPI
– MS Agents

Pocket PC
Web
– SALT Pocket IE Speech Add-Ins (mid 2003)
Fat
– 3rd parties only
– MS Reader does not support TTS

Tablet PC - TODAY!
Web
– … same as desktop PC
– Beta 2 has added support for Tablet PC
– Virtual keyboard has speech control
Fat
– … same as desktop PC
– Virtual keyboard has speech control
– MS Reader should be able to support TTS
– Digital Ink is currently more compelling to MS

VoiceXml
XML-based language
– Declarative – XML tags, grammars
– Procedural – Javascript
Telephony Gateway is the client
– Event driven – Bargein, Goodbye
– Object oriented – Properties

Usage
Input
– Speech Recognition (Command and Control)
– DTMF
– Voice recording and posting to a server
Output
– Text-To-Speech
– Prerecorded audio files
Telephony control
– Hang-up, Transfers, …
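
To make the input and output bullets concrete, here is a minimal sketch (not from the original deck) of a page generator that emits a VoiceXml 2.0 menu. The prompt is the output side, rendered by the gateway with text-to-speech; each choice is the input side, matched by either a spoken phrase or a DTMF key. The element names (`vxml`, `menu`, `prompt`, `choice`, `form`, `block`) come from the VoiceXml 2.0 spec. The function name `build_menu`, the dialog ids, and the prompt wording are assumptions made up for illustration.

```python
import xml.etree.ElementTree as ET

VXML_NS = "http://www.w3.org/2001/vxml"  # VoiceXml 2.0 namespace


def build_menu(choices):
    """Return a VoiceXml 2.0 document holding one speech/DTMF menu.

    choices: list of (dtmf_key, phrase) pairs, e.g. [("1", "sales")].
    Each phrase doubles as a dialog id, so it must not contain spaces.
    """
    vxml = ET.Element("vxml", {"version": "2.0", "xmlns": VXML_NS})

    menu = ET.SubElement(vxml, "menu", {"id": "main"})
    # Output: the telephony gateway speaks this prompt via text-to-speech.
    spoken = " ".join(f"Say {phrase} or press {key}." for key, phrase in choices)
    ET.SubElement(menu, "prompt").text = spoken

    for key, phrase in choices:
        # Input: a choice fires on the spoken phrase or on the DTMF key.
        choice = ET.SubElement(menu, "choice", {"dtmf": key, "next": f"#{phrase}"})
        choice.text = phrase

    for _, phrase in choices:
        # Hypothetical target dialog for each choice.
        form = ET.SubElement(vxml, "form", {"id": phrase})
        block = ET.SubElement(form, "block")
        ET.SubElement(block, "prompt").text = f"You chose {phrase}."

    return ET.tostring(vxml, encoding="unicode")


if __name__ == "__main__":
    print(build_menu([("1", "sales"), ("2", "support")]))
```

A server page would return this string with an XML content type. Voice recording and call transfers use further VoiceXml elements that this sketch does not cover.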

Architecture

VoiceXml
DEMO
– /vxml (VS.NET)
– Mobile ADK (menu1.aspx)
– BeVocal

VoiceXml - SALT
VoiceXml : ??? :: SALT : Speech .NET
– Nuance has some WYSIWYG
SALT is considered lightweight compared to VoiceXml
SALT was submitted to W3C in August 2002
VoiceXml is v2.0 in W3C
– Mandatory W3C grammar spec
Beta 2 Speech .NET has moved to W3C SRGS
VoiceXml has complementary specs (ccXml)
VoiceXml is moving to MultiModal as well

VoiceXml - SALT
VoiceXml = AT&T, Motorola, TellMe, (IBM)
SALT = MS, SpeechWorks, Intel, (BeVocal)
VoiceXml has multiple vendor support with venture capital from before the burst
Most vendors will support both specs
VoiceXml has ~ 15,000 developers
SALT has potentially millions

SALT
I have not read the new spec
Remember doing an in-head mapping to VoiceXml when reading an early spec
Why
– Common spec for MultiModal operation
– Multiple modes of interaction with the same syntax
– Speech enabling existing sites
Why not VoiceXml
– MultiModal retrofit harder than redo

Speech .NET
MS implementation of SALT
(VoiceWebSolutions + DreamWeaver MX)
Some Beta 1 Speech .NET apps still work, because SALT has not changed much, but Speech .NET Beta 2 controls have
VoiceXml is not as portable between vendors as it should be; the Speech .NET controls could help mitigate this for SALT
– i.e. layer of abstraction for voice browser wars

Architecture

Code
Creating static grammars and prompts
Very little server-side code
– Only dynamic grammars / prompts
– Server-side code mods to better support speech
Mainly setting properties on Speech controls and tying to client-side javascript
Tie javascript to mouse-click events to avoid redundant code
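
The "dynamic grammars" bullet is where server-side code comes in: the list of phrases is only known at request time, so the grammar has to be generated. The sketch below is an illustration, not code from the deck. It builds a W3C SRGS XML grammar (the format Beta 2 moved to) from a list of phrases. The function name `build_grammar`, the rule id, and the sample phrases are hypothetical. In Speech .NET this would be C# behind an ASP .NET page, and Python is used here only to show the shape of the output.

```python
import xml.etree.ElementTree as ET

SRGS_NS = "http://www.w3.org/2001/06/grammar"             # W3C SRGS namespace
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"   # serialized as xml:lang


def build_grammar(rule_id, phrases, lang="en-US"):
    """Return an SRGS XML grammar whose root rule accepts any one phrase."""
    grammar = ET.Element("grammar", {
        "version": "1.0",
        "xmlns": SRGS_NS,
        "root": rule_id,      # rule the recognizer starts from
        "mode": "voice",
        XML_LANG: lang,
    })
    rule = ET.SubElement(grammar, "rule", {"id": rule_id, "scope": "public"})
    one_of = ET.SubElement(rule, "one-of")   # alternatives: exactly one must match
    for phrase in phrases:
        # ElementTree escapes characters such as "&" when serializing.
        ET.SubElement(one_of, "item").text = phrase
    return ET.tostring(grammar, encoding="unicode")


if __name__ == "__main__":
    # Phrases would normally come from a database query at request time.
    print(build_grammar("city", ["Austin", "Dallas", "Houston"]))
```

Static grammars need none of this: they can be authored once with the grammar tools and shipped as files.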

Impression
Separate app layers to reduce complexity
– Voice UI will be less functional, design is key
Learning low level SALT might be easier than high level Speech .NET controls
Application controls change this in Beta 2
Speech .NET has a great debugger (now server side too), grammar, and prompt tools
Speech Control Editor was needed for dev
IE Audio meter was needed for MultiModal
MultiModal has some time to grow

Speech .NET
DEMO
– Speech .NET Beta 2 (VS .NET)
– /noHands (VoiceOnly web app)

Industry
Wrote 1st VoiceXml article a year ago
– Received 1st proposal request last month
– 1 other proposal request since then
Wrote 1st Speech .NET article 5 months ago
– Request for an article from MSDN magazine

Voice Recognition
PSTN is less secure than Internet!
– More accessible, and a hack is easier to automate
Traditionally spoken password OR DTMF pin, also #
Clients always confuse it with speech recognition
Not a part of VoiceXml or SALT specs
– Telephony gateways use proprietary implementations
Not useful for identifying somebody
Useful for confirming somebody is who they say they are
Prints have to change when device changes
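
The difference between identifying and confirming can be shown with a toy sketch. Confirming (verification) is a 1:1 comparison: the caller first claims an identity, for example with an account number, and the new sample is scored only against the print enrolled for that identity. Everything below is an assumption for illustration: the three-number "prints", the cosine score, and the 0.9 threshold are made up, and real gateways use proprietary feature extraction and scoring.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def verify(claimed_id, sample, enrolled, threshold=0.9):
    """1:1 check: does the sample match the print stored for claimed_id?"""
    voiceprint = enrolled.get(claimed_id)
    if voiceprint is None:
        return False  # nobody enrolled under that identity
    return cosine(sample, voiceprint) >= threshold


if __name__ == "__main__":
    # Toy enrolled prints keyed by account id (hypothetical data).
    enrolled = {"alice": [1.0, 0.0, 0.2], "bob": [0.0, 1.0, 0.1]}
    sample = [0.9, 0.1, 0.2]
    print(verify("alice", sample, enrolled))
    print(verify("bob", sample, enrolled))
```

Because the stored print depends on how it was captured, a print enrolled on one device may fall below the threshold on another, which is the point of the last bullet.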

Future (MS Speech)
SALT Telephony gateways
Speech .NET (VoiceOnly then MultiModal)
Pocket IE Speech Add-In
.NET Fat-client Speech APIs
– Desktop / Tablet / PPC
MS or 3rd party VS .NET VoiceXml controls
Possibility for Speech .NET controls to render both SALT and VoiceXml

Future
Lots of W3C Voice specs …
VoiceXml MultiModal browser
Auto (hands-free, navigation, radio)
3G (bridge voice and wireless web)
– offload Speech processing
– VOIP or PSTN
– Pocket PC Phone Edition / SmartPhones
IBM recently announced chip for Speech on mobile devices

Questions