SpeechWorks Solutions from ScanSoft
Transforming Contact Centers with Speech and IP
Jack Chase, Director of Product Management, NMS
Rob Kassel, Senior Manager, Network Speech Products, Nuance
Agenda
The Evolution of Contact Centers
Business trends
Architectures
Speech Technology Update — Rob Kassel, Nuance
MRCP-enabled speech
www.nmscommunications.com
Slide 2
Contact Center Evolution
Slide 3
Evolution of Contact Centers: Business Trends
First Generation
Second Generation
Third Generation
Virtual Call Center
Stand-alone sites
Limited PBX routing
Customer talks into phone; agent types into computer
Single and distributed sites
Some use of IVRU and ACD
IVRU & ACD integration
Multi-media access: email, fax, web
Screen pops
Some call routing via ACD
Integrated ERP/CRM
Skills-based routing
Chart labels: Hardware-based, Cost Center, Integration and Technology, Solving Business Problems, Profit Center
Slide 4
The Obvious Cost Savings Target
Outsourced Calls: 7%
Telecom Costs: 15%
Technology: 12%
Agent Costs: 66%
Source: Benchmark Portal, 2002
Slide 5
The Cost of Customer Interaction is Reduced with Self Service
[Bar chart of cost per interaction. Assisted Service channels: Email, Chat, Phone. Self-Service channels: Web, IVR. Data labels: $40, $7.00, $5.00, $5.50, $0.24, $0.45. Vertical axis: $0.00 to $16.00.]
Source: Gartner Group, 2002
Slide 6
Evolution of Contact Centers:
Technology Trends
Self-service using web, ASR and TTS is reducing costs and the dependency on live agents
Web, email, and messaging are freely mixed with phone calls in a single queue
Network-based contact centers are becoming a significant phenomenon
VoIP is lowering system costs at the agent and between system components
By 2007, 30% of contact center agents will be on VoIP
Slide 7
Circuit-Based Contact Center
[Diagram components: CRM, CTI, IVR, ACD, PSTN. Link types: Circuit, Data.]
Slide 8
VoIP in an IP Contact Center
[Diagram components: CRM, Self-Service, Operations Center, Contact Center (ACD+CTI+IVR+Speech), IP-PBX, PSTN, Site A, Site B. Link types: Circuit, Data, VoIP.]
Slide 9
Upgrading with MRCP and VXML
[Diagram components: CRM, VXML Server, VXML Application Server, Operations Center, Speech Server, Media Server, IP-PBX, PSTN, Site A, Site B. Protocols: MRCP, SIP, CCXML, RTP. Link types: Circuit, Data, VoIP.]
Slide 10
Speech Technology Update
Rob Kassel, Senior Manager,
Network Speech Products, Nuance
www.nuance.com
Slide 11
The Need For Speech Recognition
DTMF often is used for customer self-service
Numeric entry is easy… unless you are reading
Spelling entry is more difficult
Menus need to be enumerated, can’t be too long
Deep menu structure becomes tiresome
Key assignments are inconsistent between vendors (e.g., voicemail)
How do you enter “5 ½%” or “Albuquerque”?
With speech, questions are answered naturally
Caller satisfaction is higher
Fewer zero-outs lead to additional cost savings
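To make the DTMF spelling problem concrete, here is a minimal sketch that maps letters to the standard telephone keypad digits. It shows how long a spelled city name becomes and how different words collapse onto the same key sequence. The helper name `to_dtmf` and the sample words are assumptions made for illustration; they are not from the presentation.

```python
# Standard telephone keypad letter groups.
KEYPAD = {
    "2": "ABC", "3": "DEF", "4": "GHI", "5": "JKL",
    "6": "MNO", "7": "PQRS", "8": "TUV", "9": "WXYZ",
}
LETTER_TO_DIGIT = {
    letter: digit for digit, letters in KEYPAD.items() for letter in letters
}


def to_dtmf(word: str) -> str:
    """Return the keypad digits a caller would press to spell `word`.

    Hypothetical helper: characters that are not letters are skipped.
    """
    return "".join(
        LETTER_TO_DIGIT[ch] for ch in word.upper() if ch in LETTER_TO_DIGIT
    )


if __name__ == "__main__":
    # Eleven key presses for one city name.
    print(to_dtmf("Albuquerque"))
    # Two different words that produce the same digits.
    print(to_dtmf("home"), to_dtmf("good"))
```

Because several letters share each key, the application has to disambiguate after the fact, which is one reason spoken input is attractive for names.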
Slide 12
Speech Recognition Process
[Diagram. Processing stages: Speech, Speech Detector, Feature Extraction, Phoneme Classifier, Search, Confidence Scoring, Results. Supporting components: Grammar, Grammar Compiler, System Dictionary, Pronunciation Rules, Acoustic Models.]
Slide 13
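As an illustration of the first stage in the diagram, the speech detector, the following sketch marks where speech begins and ends by comparing short-term frame energy against a fixed threshold. This is a deliberately simple stand-in, not the method used by any product named in the slides; the frame length, the threshold, and the function names are all assumptions.

```python
from typing import List, Optional, Tuple


def frame_energies(samples: List[float], frame_len: int) -> List[float]:
    """Mean squared amplitude of each complete frame of `frame_len` samples."""
    return [
        sum(s * s for s in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def detect_speech(
    samples: List[float], frame_len: int = 80, threshold: float = 0.01
) -> Optional[Tuple[int, int]]:
    """Return (first, last) indices of frames whose energy exceeds `threshold`.

    Returns None when no frame is loud enough. The default values are
    illustrative assumptions, not tuned settings.
    """
    energies = frame_energies(samples, frame_len)
    active = [i for i, energy in enumerate(energies) if energy > threshold]
    if not active:
        return None
    return active[0], active[-1]


if __name__ == "__main__":
    # Two silent frames, three loud frames, two silent frames.
    signal = [0.0] * 160 + [0.5] * 240 + [0.0] * 160
    print(detect_speech(signal))
```

A fixed threshold like this would struggle with the noisy hands-free and mobile conditions a deployed detector has to handle, so treat it only as a picture of what the stage does.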
Speech Recognition Challenges
Processor and memory demands
Speech can be difficult to decode, even for humans
Fixed, confusable vocabularies: “B-C-D-E-G-P-T-V-Z”
Ambiguous boundaries: “It’s hard to wreck a nice beach!”
Speaker variability: dialect, volume, gender, etc.
Noise rejection: hands-free, mobile, telematics
Out-of-vocabulary rejection & confidence measures
Callers don’t always say what you might expect…
Yes or no?
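Out-of-vocabulary rejection and confidence measures usually surface in the application as a three-way decision on each recognition result. The sketch below assumes the recognizer reports a confidence score between 0 and 1; the threshold values and the name `decide` are hypothetical and would need tuning against real call data.

```python
def decide(confidence: float, accept_at: float = 0.80, confirm_at: float = 0.45) -> str:
    """Map a recognizer confidence score to a dialog action.

    Assumed convention: scores run from 0.0 to 1.0, higher is better.
    The two thresholds are illustrative, not recommended settings.
    """
    if confidence >= accept_at:
        return "accept"   # use the result without bothering the caller
    if confidence >= confirm_at:
        return "confirm"  # ask "Did you say ...?"
    return "reject"       # treat as out-of-vocabulary and re-prompt


if __name__ == "__main__":
    for word, score in [("yes", 0.93), ("no", 0.60), ("yeah right", 0.20)]:
        print(word, score, decide(score))
```

Raising `accept_at` reduces false accepts at the cost of more confirmation prompts, which is the trade-off behind "high accuracy, infrequent confirmation".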
Slide 14
Speech Recognition: State of the Art
Callers speak naturally in directed dialogs
High accuracy, infrequent confirmation
Million-word vocabularies: stocks, proper names, street addresses
Scripting to control values returned to application:
“half past three” can return “1530” or “afternoon”
Open-ended responses, especially for call routing
Allows for questions like “How may I help you?”
Based on statistical methods trained from examples
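The scripting point above can be sketched as a small interpretation function that turns recognized words into the value the application wants. This toy version handles only a handful of phrases, and it assumes, as the slide's "1530" example implies, that hours one through six mean afternoon. The function name `interpret_time` and the `style` parameter are inventions for this example, not a vendor API.

```python
HOURS = {
    "one": 1, "two": 2, "three": 3, "four": 4, "five": 5, "six": 6,
    "seven": 7, "eight": 8, "nine": 9, "ten": 10, "eleven": 11, "twelve": 12,
}
OFFSETS = {"half past": 30, "quarter past": 15}


def interpret_time(utterance: str, style: str = "clock") -> str:
    """Return "HHMM" (style="clock") or a part of day for a spoken time.

    Assumption: hours one through six are taken to be afternoon.
    Raises KeyError for phrases outside this toy grammar.
    """
    text = utterance.lower().strip()
    minutes = 0
    for phrase, offset in OFFSETS.items():
        if text.startswith(phrase + " "):
            minutes = offset
            text = text[len(phrase):].strip()
            break
    hour = HOURS[text]
    if hour < 7:
        hour += 12  # assumed business-hours convention
    if style == "clock":
        return f"{hour:02d}{minutes:02d}"
    return "morning" if hour < 12 else "afternoon"


if __name__ == "__main__":
    print(interpret_time("half past three"))
    print(interpret_time("half past three", style="part_of_day"))
```

The same utterance yields different return values depending on what the application asked for, which is the point of keeping this logic in the grammar rather than in application code.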
Slide 15
The Need For Text-To-Speech
Professional recordings best for fixed content
Word concatenation is difficult to do well
Often used for numeric output
Can sound mechanical; irritating when frequent
Large output vocabularies fairly common (e.g. city names)
Some applications defy recordings (e.g. messaging)
Slide 16
TTS Text Analysis
Source Text
System Dictionary
Text Normalization
“Are you there?” → are + you + there + <question>
$31 → thirty one dollars
ATM → eh tee em; NATO → nay-toh
A.M. → eh em; CUL8R → see you later
Homograph Disambiguation
minute = 60 seconds; minute = tiny
Dr. Jones → doctor jones; Jones Dr. → jones drive
11210 → eleven thousand two hundred ten (number)
11210 → one one two one oh (ZIP code)
Pronunciation Generation
Pronunciation Rules
Prosody Generation
Determine which words require emphasis
Insert pauses based on phrase boundaries, lung capacity
Assign duration, pitch, and volume to each phoneme
Annotated Text
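The numeric examples on this slide can be reproduced with a short normalization sketch. It spells out whole numbers in the slide's style (no hyphens, no "and") and reads ZIP codes digit by digit with "oh" for zero. In a real system the choice between the two readings would come from homograph disambiguation; here the caller passes it in as `kind`, and all function names are assumptions for illustration.

```python
ONES = [
    "zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
    "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
    "sixteen", "seventeen", "eighteen", "nineteen",
]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]


def number_to_words(n: int) -> str:
    """Spell out an integer from 0 to 999999."""
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] + (" " + ONES[ones] if ones else "")
    if n < 1000:
        hundreds, rest = divmod(n, 100)
        tail = " " + number_to_words(rest) if rest else ""
        return ONES[hundreds] + " hundred" + tail
    thousands, rest = divmod(n, 1000)
    tail = " " + number_to_words(rest) if rest else ""
    return number_to_words(thousands) + " thousand" + tail


def digits_to_words(digits: str) -> str:
    """Read digits one at a time, saying "oh" for zero."""
    return " ".join("oh" if d == "0" else ONES[int(d)] for d in digits)


def normalize(token: str, kind: str = "number") -> str:
    """Expand one token; `kind` stands in for the disambiguation step."""
    if token.startswith("$"):
        amount = int(token[1:])
        unit = "dollar" if amount == 1 else "dollars"
        return number_to_words(amount) + " " + unit
    if kind == "zip":
        return digits_to_words(token)
    return number_to_words(int(token))


if __name__ == "__main__":
    print(normalize("$31"))
    print(normalize("11210"))
    print(normalize("11210", kind="zip"))
```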
Slide 17
TTS Waveform Generation
Parametric
Annotated Text → Parameter Generation → Vocal Tract Model → Speech
Can mimic natural speech if parameters are set by hand
In practice sounds somewhat robotic, the “drunken Swede”
Can produce a variety of voices
Extremely compact
Concatenative
Annotated Text → Unit Selection (Voice Database) → Concatenate and Smooth → Speech
Units can be smaller or larger than a phoneme
Database tends to be very large
Preserves speaker characteristics and speaking style of voice talent
Voice labels shown: FEMALE, FEMALE, CHILD
Slide 18
Text-to-Speech: State of the Art
Naturalness of concatenative TTS is generally preferred for call center applications
…but voice talent takes direction, more expressive
Custom voices to maintain brand identity
Use one voice talent for both recordings and TTS
Seamlessly mix dynamic data with static prompts
Apply prompt “patches” rapidly until cost of recording session can be justified
Slide 19
Designing Speech Applications
Observe & interview call center agents
Listen to calls, develop caller profiles
Who are they?
What do they know?
Where are they calling from?
What are their goals?
What are their priorities?
Determine business objectives & rules
Define speech user interface
Call flows
Prompt wording
Error recovery; help and instructions
Anthropomorphism and persona
Slide 20
MRCP and Natural Access
Slide 21
What is MRCP v1?
[Diagram components: PSTN, IVR Servers, IP network, Speech Servers, MRCP Server]
Control: MRCP/RTSP/TCP/IP
Speech: G.711/RTP/UDP/IP
Speech servers are connected by VoIP to IVR servers
Standard API for ASR and TTS
Easy to reconfigure system as needs change
Easy to implement redundancy
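The control stack named on this slide (MRCP over RTSP over TCP) can be pictured as one text message nested inside another. The sketch below builds a synthesizer SPEAK request and wraps it in an RTSP ANNOUNCE, following the general message layout described in RFC 4463. It only constructs strings and performs no network I/O. The server URL, session ID, request ID, and function names are made-up values for illustration, and header details should be checked against the RFC before use.

```python
CRLF = "\r\n"


def build_mrcp_speak(request_id: int, text: str) -> str:
    """Build an MRCP v1 SPEAK request carrying plain text (illustrative)."""
    body_length = len(text.encode("utf-8"))
    lines = [
        f"SPEAK {request_id} MRCP/1.0",
        "Content-Type: text/plain",
        f"Content-Length: {body_length}",
        "",  # blank line separates headers from body
        text,
    ]
    return CRLF.join(lines)


def wrap_in_rtsp_announce(url: str, cseq: int, session: str, mrcp_message: str) -> str:
    """Tunnel an MRCP message inside an RTSP ANNOUNCE request (illustrative)."""
    payload_length = len(mrcp_message.encode("utf-8"))
    lines = [
        f"ANNOUNCE {url} RTSP/1.0",
        f"CSeq: {cseq}",
        f"Session: {session}",
        "Content-Type: application/mrcp",
        f"Content-Length: {payload_length}",
        "",
        mrcp_message,
    ]
    return CRLF.join(lines)


if __name__ == "__main__":
    speak = build_mrcp_speak(1, "Your balance is thirty one dollars.")
    # Hypothetical server address and session identifier.
    request = wrap_in_rtsp_announce(
        "rtsp://speech.example.com/media/synthesizer", 4, "12345678", speak
    )
    print(request)
```

The synthesized audio does not travel on this connection; per the slide it flows separately as G.711 over RTP, which is what lets speech servers be added or moved without touching the IVR servers' telephony side.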
Slide 22
Natural Access and MRCP
[Diagram components: Call Control, PSTN Trunking, IVR Services, USAI (MRCP), Conferencing, VoIP (Fusion), Fax Services, Video Access, OAM, Service Managers, Libraries, SNMP, Drivers, IPC, PCI, IP, HMP, CX Boards, AG Boards, CG Boards, PacketMedia HMP]
Slide 23
Universal Speech Access Makes Speech Integration Easy
Slide 24
Current Support for Universal Speech Access
Vendor            | Type | Universal Speech Access 1.0 | Universal Speech Access 1.1
Nuance            | ASR  | MRCP Server SP5, Nuance 8.5 | MRCP Server SP7, Nuance 8.5
Nuance (ScanSoft) | ASR  | OSMS 2.0.1, OSR 2.0         | SWMS 3.1, OSR 3.0
Nuance            | TTS  | Vocalizer 3.0               | Vocalizer 3.0.8
Nuance (ScanSoft) | TTS  | OSMS 2.0.1, Speechify 2.0   | SWMS 3.1, RealSpeak 4.0
Telisma           | ASR  | Philsoft 3.2                | teliSpeech 1.0 SP4
Loquendo          | ASR  | N/A                         | Loquendo ASR LSS 6.0
Slide 25
What’s Next for MRCP?
MRCP v2
draft-ietf-speechsc-mrcpv2-06, Feb 20, 2005
Adds SIP/SDP for session setup
Replaces RTSP
Adds support for speaker verification
Little deployment yet
NMS will update USAI when deployments occur
Slide 26
Questions?
Contact Info:
[email protected]
[email protected]