SpeechWorks Solutions from ScanSoft


Transforming Contact Centers with Speech and IP
Jack Chase, Director of Product Management, NMS
Rob Kassel, Senior Manager, Network Speech Products, Nuance
Agenda
• The Evolution of Contact Centers
• Business trends
• Architectures
• Speech Technology Update: Rob Kassel, Nuance
• MRCP-enabled speech
www.nmscommunications.com
Slide 2
Contact Center Evolution
Slide 3
Evolution of Contact Centers: Business Trends
First Generation → Second Generation → Third Generation → Virtual Call Center
• Stand-alone sites
• Limited PBX routing
• Customer talks into phone → Agent types into computer
• Single and distributed sites
• Some use of IVRU and ACD
• IVRU & ACD integration
• Multi-media access: email, fax, web
• Screen pops
• Some call routing via ACD
• Integrated ERP/CRM
• Skills-based routing
Hardware-based (Cost Center) → Integration and Technology → Solving Business Problems (Profit Center)
Slide 4
The Obvious Cost Savings Target
• Outsourced Calls: 7%
• Telecom Costs: 15%
• Technology: 12%
• Agent Costs: 66%
Source: Benchmark Portal, 2002
Slide 5
The Cost of Customer Interaction is Reduced with Self Service
[Bar chart: cost per interaction, axis from $0.00 to $16.00]
Assisted Service: Email, Chat, Phone (bar labels: $7.00, $5.00, $5.50)
Self-Service: Web, IVR (bar labels: $0.24, $0.45)
Source: Gartner Group, 2002
Slide 6
Evolution of Contact Centers: Technology Trends
• Self-service using web, ASR, and TTS is reducing costs and the dependency on live agents
• Web, email, and messaging are freely mixed with phone calls in a single queue
• Network-based contact centers are becoming a significant phenomenon
• VoIP is lowering system costs at the agent and between system components
  • By 2007, 30% of contact center agents will be on VoIP
Slide 7
Circuit-Based Contact Center
[Diagram components: CRM, CTI, IVR, ACD, PSTN. Link types: Circuit, Data]
Slide 8
VoIP in an IP Contact Center
[Diagram components: CRM, Self-Service, Operations Center, Contact Center (ACD + CTI + IVR + Speech), IP-PBX, PSTN, Site A, Site B. Link types: Circuit, Data, VoIP]
Slide 9
Upgrading with MRCP and VXML
[Diagram components: CRM, VXML Server, VXML Application Server, Operations Center, Speech Server, Media Server, IP-PBX, PSTN, Site A, Site B. Protocol labels: MRCP; SIP, CCXML; RTP. Link types: Circuit, Data, VoIP]
Slide 10
Speech Technology Update
Rob Kassel, Senior Manager, Network Speech Products, Nuance
www.nuance.com
Slide 11
The Need For Speech Recognition
• DTMF is often used for customer self-service
  • Numeric entry is easy… unless you are reading
  • Spelling entry is more difficult
  • Menus need to be enumerated and can’t be too long
  • Deep menu structures become tiresome
  • Key assignments are inconsistent between vendors (e.g., voicemail)
  • How do you enter “5 ½%” or “Albuquerque”?
• With speech, questions are answered naturally
  • Caller satisfaction is higher
  • Fewer zero-outs lead to additional cost savings
Slide 12
Speech Recognition Process
Speech → Speech Detector → Feature Extraction → Phoneme Classifier → Search → Confidence Scoring → Results
Supporting components: Acoustic Models, Grammar → Grammar Compiler, System Dictionary, Pronunciation Rules
Slide 13
Speech Recognition Challenges
• Processor and memory demands
• Speech can be difficult to decode, even for humans
• Fixed, confusable vocabularies: “B-C-D-E-G-P-T-V-Z”
• Ambiguous boundaries: “It’s hard to wreck a nice beach!”
• Speaker variability: dialect, volume, gender, etc.
• Noise rejection: hands-free, mobile, telematics
• Out-of-vocabulary rejection & confidence measures
• Callers don’t always say what you might expect…
  “Yes or no?”
Slide 14
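The confidence measures and out-of-vocabulary rejection listed on this slide are commonly used to decide whether a dialog accepts, confirms, or rejects a recognition result. The following is a minimal sketch of that decision, assuming a recognizer that reports a score between 0 and 1. The `Hypothesis` type, the threshold values, and the action names are hypothetical and are not taken from any Nuance product.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Hypothesis:
    """Hypothetical recognition result: the text plus a 0.0-1.0 confidence."""
    text: str
    confidence: float


def next_action(hyp: Optional[Hypothesis],
                accept_at: float = 0.80,
                reject_below: float = 0.40) -> str:
    """Map a recognition result to a dialog action.

    Thresholds are illustrative assumptions; real deployments tune them
    per grammar from recorded calls.
    """
    # No result, or a very low score: treat as out-of-vocabulary or noise.
    if hyp is None or hyp.confidence < reject_below:
        return "reprompt"
    # High score: use the answer without bothering the caller.
    if hyp.confidence >= accept_at:
        return "accept"
    # Middle band: ask the caller to confirm ("Did you say ...?").
    return "confirm"


if __name__ == "__main__":
    for hyp in (Hypothesis("yes", 0.93), Hypothesis("no", 0.55), None):
        print(hyp, "->", next_action(hyp))
```

Widening the middle band trades more confirmation prompts for fewer wrong answers passed to the application.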
Speech Recognition: State of the Art
• Callers speak naturally in directed dialogs
• High accuracy, infrequent confirmation
• Million-word vocabularies: stocks, proper names, street addresses
• Scripting to control values returned to application: “half past three” can return “1530” or “afternoon”
• Open-ended responses, especially for call routing
• Allows for questions like “How may I help you?”
• Based on statistical methods trained from examples
Slide 15
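The "half past three" bullet describes scripting that turns recognized words into the value an application wants. A minimal sketch of that idea follows, assuming a tiny hand-written phrase set and an afternoon default. The function names, the supported phrases, and the day-part boundaries are illustrative assumptions, not a real grammar scripting API.

```python
from typing import Tuple

HOUR_WORDS = {
    "one": 1, "two": 2, "three": 3, "four": 4, "five": 5, "six": 6,
    "seven": 7, "eight": 8, "nine": 9, "ten": 10, "eleven": 11, "twelve": 12,
}


def parse_time(utterance: str, assume_pm: bool = True) -> Tuple[int, int]:
    """Return (hour, minute) on a 24-hour clock for a few spoken patterns."""
    words = utterance.lower().split()
    if len(words) == 3 and words[:2] == ["half", "past"]:
        minute, hour_word = 30, words[2]
    elif len(words) == 3 and words[:2] == ["quarter", "past"]:
        minute, hour_word = 15, words[2]
    elif len(words) == 2 and words[1] == "o'clock":
        minute, hour_word = 0, words[0]
    else:
        raise ValueError(f"unsupported phrase: {utterance!r}")
    hour = HOUR_WORDS[hour_word] % 12      # "twelve" becomes 0 before the PM shift
    if assume_pm:                          # assumption: callers mean the afternoon
        hour += 12
    return hour, minute


def as_hhmm(utterance: str) -> str:
    """Value for an application that wants a clock time."""
    hour, minute = parse_time(utterance)
    return f"{hour:02d}{minute:02d}"


def as_day_part(utterance: str) -> str:
    """Value for an application that only needs a coarse time of day."""
    hour, _ = parse_time(utterance)
    if hour < 12:
        return "morning"
    return "afternoon" if hour < 18 else "evening"


if __name__ == "__main__":
    print(as_hhmm("half past three"), as_day_part("half past three"))
```

The same utterance yields two different return values, so the choice lives in the script rather than in the recognizer.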
The Need For Text-To-Speech
• Professional recordings are best for fixed content
• Word concatenation is difficult to do well
  • Often used for numeric output
  • Can sound mechanical; irritating when frequent
• Large output vocabularies are fairly common (e.g., city names)
• Some applications defy recordings (e.g., messaging)
Slide 16
TTS Text Analysis
Source Text → Text Normalization → Homograph Disambiguation → Pronunciation Generation → Prosody Generation → Annotated Text
Resources: System Dictionary, Pronunciation Rules
Text Normalization
• “Are you there?” → are + you + there + <question>
• $31 → thirty one dollars
• ATM → eh tee em; NATO → nay-toh
• A.M. → eh em; CUL8R → see you later
Homograph Disambiguation
• minute = 60 seconds; minute = tiny
• Dr. Jones → doctor jones; Jones Dr. → jones drive
• 11210 → eleven thousand two hundred ten (number)
• 11210 → one one two one oh (ZIP code)
Prosody Generation
• Determine which words require emphasis
• Insert pauses based on phrase boundaries, lung capacity
• Assign duration, pitch, and volume to each phoneme
Slide 17
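Several of the normalization and disambiguation examples on this slide can be reproduced with a few context rules. The sketch below is a toy illustration under stated assumptions: the regular expressions, the `zip_context` flag, and the function names are hypothetical, and a production TTS front end uses far richer dictionaries and context models.

```python
import re

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]


def number_to_words(n: int) -> str:
    """Spell out 0..999999 in the slide's style (no 'and', no hyphens)."""
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] + ("" if ones == 0 else " " + ONES[ones])
    if n < 1000:
        hundreds, rest = divmod(n, 100)
        tail = "" if rest == 0 else " " + number_to_words(rest)
        return ONES[hundreds] + " hundred" + tail
    thousands, rest = divmod(n, 1000)
    tail = "" if rest == 0 else " " + number_to_words(rest)
    return number_to_words(thousands) + " thousand" + tail


def digits_to_words(digits: str) -> str:
    """Read a digit string one digit at a time, saying 'oh' for zero."""
    return " ".join("oh" if d == "0" else ONES[int(d)] for d in digits)


def normalize(text: str, zip_context: bool = False) -> str:
    """Expand currency, 'Dr.', and digit strings into speakable words."""
    # Currency: "$31" -> "thirty one dollars"
    text = re.sub(r"\$(\d+)",
                  lambda m: number_to_words(int(m.group(1))) + " dollars", text)
    # "Dr." before a capitalized word is a title; after a word it is a street.
    text = re.sub(r"\bDr\.(?=\s+[A-Z])", "doctor", text)
    text = re.sub(r"(?<=[a-z])\s+Dr\.", " drive", text)
    # Remaining digit strings: ZIP codes are read digit by digit.
    def spell(match: "re.Match[str]") -> str:
        digits = match.group(0)
        return digits_to_words(digits) if zip_context else number_to_words(int(digits))
    text = re.sub(r"\d+", spell, text)
    return text.lower()


if __name__ == "__main__":
    for sample in ("$31", "Dr. Jones", "Jones Dr.", "11210"):
        print(sample, "->", normalize(sample))
    print("11210 (ZIP) ->", normalize("11210", zip_context=True))
```

Here the caller supplies the ZIP-code context explicitly; a real system would have to infer it from the surrounding text, such as a preceding state abbreviation.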
TTS Waveform Generation
Parametric: Annotated Text → Parameter Generation → Vocal Tract Model → Speech
• Can mimic natural speech if parameters are set by hand
• In practice sounds somewhat robotic, the “drunken Swede”
• Can produce a variety of voices
• Extremely compact
Concatenative: Annotated Text → Unit Selection (from Voice Database) → Concatenate and Smooth → Speech
• Units can be smaller or larger than a phoneme
• Database tends to be very large
• Preserves speaker characteristics and speaking style of voice talent
Slide 18
[Audio samples: FEMALE, FEMALE, CHILD]
Text-to-Speech: State of the Art
• Naturalness of concatenative TTS is generally preferred for call center applications
• …but voice talent takes direction, more expressive
• Custom voices to maintain brand identity
• Use one voice talent for both recordings and TTS
• Seamlessly mix dynamic data with static prompts
• Apply prompt “patches” rapidly until cost of recording session can be justified
Slide 19
Designing Speech Applications
• Observe & interview call center agents
• Listen to calls, develop caller profiles
  • Who are they?
  • What do they know?
  • Where are they calling from?
  • What are their goals?
  • What are their priorities?
• Determine business objectives & rules
• Define speech user interface
  • Call flows
  • Prompt wording
  • Error recovery; help and instructions
  • Anthropomorphism and persona
Slide 20
MRCP and Natural Access
Slide 21
What is MRCP v1?
[Diagram: PSTN, IVR Servers, IP network, Speech Servers (MRCP Server)]
Control: MRCP/RTSP/TCP/IP
Speech: G.711/RTP/UDP/IP
• Speech servers are connected by VoIP to IVR servers
• Standard API for ASR and TTS
• Easy to reconfigure system as needs change
• Easy to implement redundancy
Slide 22
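The control stack on this slide (MRCP over RTSP over TCP) means each MRCP v1 request travels as the body of an RTSP message. The sketch below builds a text-to-speech SPEAK request and wraps it in an RTSP ANNOUNCE, assuming the framing described in RFC 4463. The helper names, the server URL, and the session and request identifiers are hypothetical, and the code only formats text without opening a connection.

```python
CRLF = "\r\n"


def mrcp_speak(request_id: int, ssml: str) -> str:
    """Format an MRCP v1 SPEAK request carrying an SSML body."""
    headers = [
        "Content-Type: application/synthesis+ssml",
        f"Content-Length: {len(ssml.encode('utf-8'))}",
    ]
    # Request line: method, request id, protocol version.
    return (f"SPEAK {request_id} MRCP/1.0{CRLF}"
            + CRLF.join(headers) + CRLF + CRLF + ssml)


def rtsp_announce(url: str, cseq: int, session: str, mrcp_message: str) -> str:
    """Tunnel an MRCP v1 message inside an RTSP ANNOUNCE request."""
    headers = [
        f"CSeq: {cseq}",
        f"Session: {session}",
        "Content-Type: application/mrcp",
        f"Content-Length: {len(mrcp_message.encode('utf-8'))}",
    ]
    return (f"ANNOUNCE {url} RTSP/1.0{CRLF}"
            + CRLF.join(headers) + CRLF + CRLF + mrcp_message)


if __name__ == "__main__":
    speak = mrcp_speak(543257, "<speak>Your balance is thirty one dollars.</speak>")
    # Hypothetical speech server URL and session id.
    packet = rtsp_announce("rtsp://speech.example.com/media/synthesizer",
                           cseq=4, session="12345678", mrcp_message=speak)
    print(packet)
```

The synthesized audio itself does not use this channel; as the slide notes, it flows separately as G.711 over RTP.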
Natural Access and MRCP
[Diagram: Natural Access software stack]
Services: Call Control, PSTN Trunking, IVR Services, USAI (MRCP), Conferencing, VoIP (Fusion), Fax Services, Video Access, OAM
Service Managers, Libraries; SNMP
Drivers: PCI, IPC, IP
Platforms: CX Boards, AG Boards, CG Boards, PacketMedia HMP
Slide 23
Universal Speech Access Makes Speech Integration Easy
Slide 24
Current Support for Universal Speech Access
Vendor            | Type | Universal Speech Access 1.0 | Universal Speech Access 1.1
Nuance            | ASR  | MRCP Server SP5, Nuance 8.5 | MRCP Server SP7, Nuance 8.5
Nuance (ScanSoft) | ASR  | OSMS 2.0.1, OSR 2.0         | SWMS 3.1, OSR 3.0
Nuance            | TTS  | Vocalizer 3.0               | Vocalizer 3.0.8
Nuance (ScanSoft) | TTS  | OSMS 2.0.1, Speechify 2.0   | SWMS 3.1, RealSpeak 4.0
Telisma           | ASR  | Philsoft 3.2                | teliSpeech 1.0 SP4
Loquendo          | ASR  | N/A                         | Loquendo ASR LSS 6.0
Slide 25
What’s Next for MRCP?
• MRCP v2
  • draft-ietf-speechsc-mrcpv2-06, Feb 20, 2005
  • Adds SIP/SDP for session setup
  • Replaces RTSP
  • Adds support for speaker verification
  • Little deployment yet
• NMS will update USAI when deployments occur
Slide 26
Questions?
Contact Info:
[email protected]
[email protected]