Wake-Up-Word Speech Recognition: A Missing Link to Natural Language Understanding
Dr. Veton Këpuska
ECE Department
What is: Wake-Up-Word Recognition
Wake-Up-Word (WUW) Speech/Voice Recognition (SR):
The automatic speech recognition task of identifying a single
word/phrase in continuous free speech (Correct Recognition),
e.g.:
<HAL>
<Computer>
<Operator>
– Arthur C. Clarke’s “2001: A Space Odyssey”,
– Capt. Picard’s computer on the starship
“Enterprise” in Star Trek, or
– Capt. Këpuska’s WUW-SR System
& more importantly
The automatic recognition of any other noise/sound/word/phrase etc.
as NOT being that WUW (Correct Rejection).
May 25, 2016
Dr. Veton Këpuska
Slide 2
WUW-SR
WUW-SR requires continuous monitoring of speech.
WUW can be used to:
Get attention,
Provide/change context,
Resynchronize communication,
Mimic human-to-human interaction and communication in a way that
is currently not possible, &
Provide a significantly more efficient solution (memory and
CPU) than any Natural Language Understanding system.
It is a mode of communication that would enable more natural
interaction between man and machine.
Slide 3
Natural Language Understanding (NLU) Task
Massachusetts Institute of Technology’s (MIT’s) Spoken Language
Systems Laboratory’s mission statement states:
“Our goal is both simple and ambitious – create technology that
makes it possible for everyone in the world to interact with
computers via natural spoken language. Conversational
interfaces will enable us to converse with machines in much the
same way that we communicate with one another and will play a
fundamental role in facilitating our move toward an information-based society”.
To achieve this goal, the SR and NLU communities implicitly position
the solution to the WUW problem in the context of solving the overall
natural language understanding problem.
Once a system that can understand the whole language is
developed, the WUW problem will be solved.
Slide 4
Natural Language Understanding Task - Problem
There are two major problems with the approach that
requires solving the WUW problem within a general
framework of the speech and natural language
understanding system:
It is an expensive solution (CPU, memory, etc.).
It does not exist yet because it is very difficult to achieve.
Even if it becomes possible to develop NLU systems close to
human capabilities, WUW is still needed (see
slide 3).
Slide 5
WUW-SR Acoustic-Linguistic Context
The current implementation of WUW recognizes the word
the way a speaker would intuitively use a proper
name to get attention:
It does not respond to other contexts where the
same word (e.g., “OPERATOR”) is used for
other purposes.
What are other WUW contexts?
Slide 6
Wizard of Oz Experiment (NSF 05-551 Proposal)
Study possible uses of WUW in human-to-human communication.
Collaboration with:
Dr. Deborah Carstens – Human Machine
Interface Specialist (FIT - Management
Information Systems)
Dr. Ron Wallace – Bio-Behavioral
Anthropology and English Language (UCF).
Department of Psychology – Behavior
Analysis Laboratory.
Slide 7
History of Wake-Up-Word Speech Recognition
Wildfire of Waltham, Massachusetts:
Introduced a rudimentary capability for Wake-Up-Word (WUW) recognition through its Personal
Assistant application in the mid-90s.
At that time the problem was neither recognized nor
approached as a WUW-SR problem.
The application was restricted to one specific word:
“Wildfire”
This custom solution did not perform sufficiently well,
and thus Wildfire no longer exists.
Slide 8
History of Wake-Up-Word Speech Recognition (cont.)
Këpuska generalized WUW recognition and introduced a novel way of
performing it while at ThinkEngine
Networks, Marlborough, MA (2001-2003).
The recognition performance of the patented solution
allows practical application of WUW to any suitable
word (e.g., Verizon’s “IOBI” project).
The demonstration uses a fixed-point DSP implementation
simulated on the Windows platform.
A new generation of the WUW-SR system, using a floating-point C++ implementation, is almost ready for prime time.
Simulations of the floating-point system indicate a significant
improvement over the fixed-point implementation.
Slide 9
Wake-Up-Word Speech Recognition Technology
~26,000 lines of fixed-point C code & model data.
Uses the Dynamic Time Warping (DTW) algorithm for
pattern matching.
Features are based on Mel-scale cepstral
coefficients (MFCC) plus deltas and second-order
deltas.
Uses a single speaker-independent model.
Achieves high channel density on the DSP.
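As a point of reference for the pattern-matching step, the following is a minimal sketch of classic textbook DTW between a stored template and an incoming utterance, each given as a sequence of feature vectors. It is not the extended, patented matching used in the WUW-SR system, whose details are not given in these slides; the function name and the Euclidean local distance are assumptions made for illustration.

```python
import math

def dtw_distance(template, utterance):
    """Accumulated cost of the best time alignment between two
    sequences of equal-dimension feature vectors (classic DTW)."""
    n, m = len(template), len(utterance)
    inf = float("inf")
    # cost[i][j] = best accumulated cost aligning template[:i] with utterance[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Local distance between the two frames (assumed Euclidean).
            local = math.dist(template[i - 1], utterance[j - 1])
            # Allowed moves: stretch the template, stretch the utterance, or advance both.
            cost[i][j] = local + min(cost[i - 1][j],
                                     cost[i][j - 1],
                                     cost[i - 1][j - 1])
    return cost[n][m]

# Toy one-dimensional "features": the same shape spoken twice as slowly
# aligns at zero cost, while a reversed shape does not.
template = [(0.0,), (1.0,), (2.0,)]
slow = [(0.0,), (0.0,), (1.0,), (1.0,), (2.0,), (2.0,)]
reverse = [(2.0,), (1.0,), (0.0,)]
print(dtw_distance(template, slow), dtw_distance(template, reverse))
```

The warping is what lets a single template tolerate differences in speaking rate, which is one reason DTW suits a single-word task.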
Slide 10
WUW-SR System: Initial Development
ThinkEngine Networks, Marlborough, MA
84 simultaneous channels of WUW
recognition on each fixed-point TI
TMS320C205 DSP
200 MHz
Memory space:
64 KB program
64 KB data
2 MB external data
Total of 672 channels with a farm of 8 DSPs (8 × 84)
Recognition rate >95% with ~0% false
acceptance.
Slide 11
Solution: 3 Patented Inventions
Fundamental Contribution to Pattern
Recognition
Patent Application 13323-009001 - 10/152,095: “Dynamic
Time Warping (DTW) Matching”
Extended DTW Matching.
Patent Application 13323-010001 - 10/152,447:
“Rescoring using Distribution Distortion Measurements of
Dynamic Time Warping Match”
Feature Based Voice Activity Detector (VAD)
Patent Application 13323-011001 - 10/144,248: “Voice
Activity Detection Based on Cepstral Features”
Slide 12
WUW Fixed-Point System Performance
[Figure: Distribution Plot of Confidence Scores for WUW "Operator". Curves: INV, INV-CUMULATIVE, OOV, OOV-CUMULATIVE, with the Equal Error Rate and the Operating Threshold marked. Horizontal axis: Confidence Score (0-100)%; vertical axis: [%], 0 to 1.0.]
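The plot marks an Equal Error Rate and an operating threshold on the confidence-score axis. As a rough sketch of how such a point can be located from two sets of confidence scores, the code below scans integer thresholds from 0 to 100 and picks the one where the false-rejection rate of in-vocabulary (INV) samples and the false-acceptance rate of out-of-vocabulary (OOV) samples are closest. The score lists and function names are invented for illustration and are not data from the plot.

```python
def error_rates(inv_scores, oov_scores, threshold):
    """Return (false rejection rate, false acceptance rate) at one threshold.
    A sample is accepted when its confidence score is >= threshold."""
    false_reject = sum(s < threshold for s in inv_scores) / len(inv_scores)
    false_accept = sum(s >= threshold for s in oov_scores) / len(oov_scores)
    return false_reject, false_accept

def equal_error_threshold(inv_scores, oov_scores):
    """Scan thresholds 0..100 and return the first one where the two
    error rates are closest, together with the rates at that threshold."""
    def gap(t):
        fr, fa = error_rates(inv_scores, oov_scores, t)
        return abs(fr - fa)
    best = min(range(101), key=gap)
    return best, error_rates(inv_scores, oov_scores, best)

# Hypothetical confidence scores (0-100) for illustration only.
inv = [90, 85, 80, 70, 40]
oov = [5, 10, 20, 30, 60]
print(equal_error_threshold(inv, oov))
```

In practice the operating threshold need not sit at the equal-error point; moving it trades false rejections against false acceptances.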
Slide 13
WUW-SR Development Status
Implemented a C++ ETSI-MFCC front end:
Extraction of Mel-filtered cepstral coefficients
Standard processing technique, to be used as a
baseline
The C++ framework and its implementation
emphasize modularity to facilitate research.
Implemented Dynamic Time Warping (DTW) as
the back end of the recognition system.
Integrated Perl scripts to automate the model
building and accuracy testing procedures.
Includes automatic graph generation.
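The front end produces cepstral coefficients per frame, and the feature set described for the fixed-point system adds deltas and second-order deltas. A minimal sketch of one common way to compute them is shown below, assuming the usual regression formula over a window of n frames on each side with edge frames replicated. The slides do not state the window length or the exact formula used, so n=2 and the function names are assumptions.

```python
def delta(frames, n=2):
    """Regression-based time derivative of a list of feature vectors.
    Frames beyond either end are replaced by the first or last frame."""
    denom = 2 * sum(k * k for k in range(1, n + 1))
    last = len(frames) - 1
    out = []
    for t in range(len(frames)):
        vec = []
        for d in range(len(frames[t])):
            acc = 0.0
            for k in range(1, n + 1):
                ahead = frames[min(t + k, last)][d]
                behind = frames[max(t - k, 0)][d]
                acc += k * (ahead - behind)
            vec.append(acc / denom)
        out.append(vec)
    return out

def add_deltas(frames, n=2):
    """Append deltas and second-order deltas to every static frame."""
    d1 = delta(frames, n)
    d2 = delta(d1, n)
    return [list(f) + a + b for f, a, b in zip(frames, d1, d2)]

# A one-coefficient ramp: away from the edges the delta equals the slope.
ramp = [[float(t)] for t in range(6)]
print(add_deltas(ramp)[2])
```

Each output frame is three times the size of the static frame: static coefficients, deltas, then second-order deltas.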
Slide 14
Current Architecture of WUW-SR System
VAD
Front-End
Back-End
Slide 15
Performance of WUW-SR Floating-Point System
Slide 16
WUW-SR System Performance
How is it possible to achieve this
performance? Considering:
Single Speaker Independent Model for
WUW
No Additional Modeling for other acoustic
events: noise/tone/sound/word/phrase
Clever use of Two-Pass Scoring
Slide 17
Usual Recognition Scoring: First Score
[Figure: standard “First” recognition score performance, with the lowest score of an OOV sample marked.]
Slide 18
“Second” Score is NOT Independent from the “First” Score
[Figure: distribution of the second score as a function of the first score, with the lowest score of an OOV sample marked.]
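The slides do not disclose how the two scores are combined, so the following is only an illustration of why looking at both scores together can help. With the invented (first, second) score pairs below, no single threshold on either score separates INV from OOV samples, but a threshold on a weighted sum of the two does. The data, the weights, and the linear combination are all assumptions, not the system's method.

```python
def separable_by_threshold(inv_values, oov_values):
    """True if one threshold puts every INV value above every OOV value."""
    return min(inv_values) > max(oov_values)

def combined(sample, w_first=1.0, w_second=1.0):
    """Hypothetical linear combination of the first and second score."""
    first, second = sample
    return w_first * first + w_second * second

# Invented (first score, second score) pairs for illustration only.
inv = [(80, 40), (50, 75), (65, 60)]
oov = [(60, 30), (35, 55), (45, 45)]

print(separable_by_threshold([s[0] for s in inv], [s[0] for s in oov]))
print(separable_by_threshold([s[1] for s in inv], [s[1] for s in oov]))
print(separable_by_threshold([combined(s) for s in inv],
                             [combined(s) for s in oov]))
```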
Slide 19
How to Obtain “Second” Score?
All modern speech recognition systems use
multiple scoring techniques:
Re-scoring the N-best hypotheses to improve Correct
Recognition, based on:
A more elaborate recognition algorithm
Baum-Welch Forward-Backward HMM scoring vs.
Viterbi scoring
Different features
MFCC (Mel-scale Filtered Cepstral Coefficients)
RASTA-PLP (Relative Spectral Transform - Perceptual Linear
Prediction)
Other proprietary front ends
Re-scoring using additional models (of non-WUWs)
to improve Correct Rejection (“Garbage Models”)
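The two conventional techniques listed above can be sketched generically as follows. The first function re-ranks an N-best list by mixing each hypothesis's first-pass score with a second-pass score; the second accepts a wake-up word only when its model score beats a garbage (filler) model score by a margin. The function names, the mixing weight, the margin, and the example scores are assumptions for illustration and do not come from any particular recognizer.

```python
def rescore_nbest(hypotheses, second_pass, weight=0.5):
    """Re-rank (text, first_score) pairs by a weighted mix of the
    first-pass score and a second-pass score; higher is better."""
    rescored = [(text, (1 - weight) * s1 + weight * second_pass(text))
                for text, s1 in hypotheses]
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)

def accept_wuw(wuw_score, garbage_score, margin=0.0):
    """Accept only if the WUW model outscores the garbage model by a margin."""
    return wuw_score - garbage_score > margin

# Hypothetical log-likelihood-style scores.
nbest = [("operator", -10.0), ("operate her", -9.0)]
second = {"operator": -4.0, "operate her": -12.0}
print(rescore_nbest(nbest, second.get))
print(accept_wuw(-7.0, -9.5, margin=1.0))
```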
Slide 20
WUW-SR System
Uses a proprietary solution that:
Does not require additional “Garbage
Models” to increase robustness and the Correct
Rejection rate.
Is model independent, and even
Is matching-algorithm independent (DTW,
HMM, graphical modeling, or any other
paradigm).
Slide 21
What Next?
WUW-SR: a useful technology for numerous
applications:
“Voice Activated” car navigation systems
Current solutions apply mixed interfaces: the driver must
press a button while speaking to the system.
Dictation systems: require launching the application
and “informing” the system when dictation is “on”
and when it is “off”.
PDAs – removing the stylus as a necessary interface tool.
Keyboard-less laptop computers.
“Smart Rooms”
Slide 22
Smart Room Application
[Figure: floor plan of a smart room (25'-0" dimension marked) with a microphone array and a keypad connected to the Wake-Up-Word Speech Recognition System. Speech labels: “COMPUTER!”, “COMPUTER, play Todd Agnew CD”, and “Yes Master”; background noise: <Percolating Sound>.]
Slide 23
Microphone Arrays
Applied Perception Laboratory CE313
Slide 24
Noise Removal
First Place at the UML-ADI Competition,
June 2005.
Developed Wiener filter noise removal and implemented it on the
Analog Devices SHARC DSP.
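The slide names the technique but not its details. As a generic sketch, a Wiener-type noise suppressor scales each frequency bin of a frame by a gain that depends on the estimated signal-to-noise ratio in that bin. The code below computes such gains from a noisy power spectrum and a noise power estimate, assuming a simple power-subtraction estimate of the clean speech and a gain floor; the function name and the floor value are illustrative assumptions, not the competition implementation.

```python
def wiener_gains(noisy_power, noise_power, floor=0.05):
    """Per-bin gain G = max(P_noisy - P_noise, 0) / P_noisy, limited
    from below by `floor` to avoid zeroing bins completely."""
    gains = []
    for p_noisy, p_noise in zip(noisy_power, noise_power):
        if p_noisy <= 0.0:
            gains.append(floor)
            continue
        clean_estimate = max(p_noisy - p_noise, 0.0)
        gains.append(max(clean_estimate / p_noisy, floor))
    return gains

# Three bins: strong speech, noise only, moderate speech.
print(wiener_gains([10.0, 1.0, 4.0], [1.0, 1.0, 1.0]))
```

Bins dominated by speech pass almost unchanged, while bins that look like noise are attenuated down to the floor.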
Slide 25
Speech Processing and Recognition System Architecture
48 kHz to 8 kHz down-sampling with a 70-tap FIR filter
Wiener filter based noise removal:
Switch-controlled activation of the de-noising algorithm
Automatic gain control:
Switch-controlled activation of the algorithm
LEDs indicate the processing state of the system
[Diagram: system components]
Wake-Up-Word Speech Recognition Software: ~26,000 lines of speech recognition engine code & model data in C
Host PC
EZ-Kit Lite, SHARC Processor AD21161N: ~5,000 lines of embedded C code
Microphone
Speakers
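Going from 48 kHz to 8 kHz is a decimation by 6, which needs a low-pass filter ahead of the sample dropping to limit aliasing. The sketch below designs a 70-tap low-pass FIR with a Hamming-windowed sinc and keeps every sixth filtered sample. Only the tap count and the two sample rates come from the slide; the window choice, the 3.6 kHz cutoff, and the function names are assumptions.

```python
import math

def lowpass_taps(num_taps=70, cutoff_hz=3600.0, fs_hz=48000.0):
    """Hamming-windowed sinc low-pass FIR, normalised to unity gain at DC."""
    fc = cutoff_hz / fs_hz                      # cutoff in cycles per sample
    centre = (num_taps - 1) / 2.0
    taps = []
    for n in range(num_taps):
        x = n - centre
        if x == 0:
            ideal = 2 * fc
        else:
            ideal = math.sin(2 * math.pi * fc * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [t / total for t in taps]

def decimate(signal, taps, factor=6):
    """Filter the signal and keep every `factor`-th output sample.
    Only the samples that are kept are actually computed."""
    out = []
    for i in range(0, len(signal), factor):
        acc = 0.0
        for k, h in enumerate(taps):
            if i - k >= 0:
                acc += h * signal[i - k]
        out.append(acc)
    return out

# A 1 kHz tone sampled at 48 kHz for 10 ms becomes 80 samples at 8 kHz.
tone = [math.sin(2 * math.pi * 1000.0 * n / 48000.0) for n in range(480)]
print(len(decimate(tone, lowpass_taps())))
```

Computing only every sixth output is the usual saving in a decimating FIR and matters on a DSP with a fixed cycle budget.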
Slide 26
Experimental Results
Windows PC
Noisy test file:
After de-noise:
Slide 27
Experimental Results
Windows PC
Footloose:
Not Footloose:
Slide 28
Results: why didn’t this work?
Hair dryer:
Still there?!?!:
Slide 29
Experimental Results
Windows PC
Hair dryer:
Gone:
Slide 30
Experimental Results on DSP
Brown Noise Example:
Slide 31
Experimental Results on DSP
Drill Test
Slide 32
Experimental Results on DSP
Closer Drill Noise
Slide 33
Experimental Results on DSP
Brown Noise + Drill
Slide 34
Research: Tools Development
MATLAB (NSF EMD-MLR), Perl, gnuplot
Slide 35
What is missing?
In need of more highly motivated
students.
No news there!
Business opportunities and ventures
need to be considered.
Help, advice, … welcome.
Slide 36
