Natural Language Processing and Speech Enabled Applications

by Pavlovic Nenad
Presentation Content
What is natural language processing
– Speech synthesis
– Speech recognition
– Natural language understanding
Basic concepts and terms
Types of speech recognition engines
Hardware requirements
How speech recognition/synthesis works
Speech enabled applications
Applications of speech enabled system
Commercial & non-commercial software
Natural language processing
Natural Language Processing (NLP) or
Computational Linguistics (CL) “is a discipline
between linguistics and computer science which
is concerned with the computational aspects of
the human language faculty” [1].
“It belongs to the cognitive sciences and
overlaps with the field of artificial intelligence
(AI), a branch of computer science that is aiming
at computational models of human cognition” [1].
Natural Language Processing
In other words, NLP is a discipline that aims
to build computer systems that are able
to analyze, understand and generate
human speech.
Therefore, the research subareas of NLP are:
– Speech Recognition (speech analysis),
– Speech Synthesis (speech generation), and
– Natural Language Understanding (NLU).
Speech Recognition & Synthesis
Speech recognition is the process of
converting spoken language to written text
or some similar form.
Speech synthesis is the process of
converting the text into spoken language.
Natural Language Understanding
Natural Language Understanding (NLU) is
the process of analyzing recognized words
and transforming them into data
meaningful to a computer.
In other words, an NLU system is a computer-based
system that “understands” human
language.
NLU is used in combination with speech
recognition.
Basic Terms and Concepts
Utterance is any stream of speech between two
periods of silence.
Pronunciation is what the speech engine thinks
a word should sound like.
Grammars define the domain (of words) within
which the recognition engine works.
Vocabulary (dictionary) is a list of words
(utterances) that can be recognized by the
speech recognition engine.
Training is the process of adapting the
recognition system to a speaker.
Basic Terms and Concepts
Accuracy is a measure of the recognizer’s
ability to correctly recognize utterances.
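The slides do not say how accuracy is computed. One widely used measure is word error rate (WER): the word-level edit distance between what was said and what was recognized, divided by the number of words actually said. The sketch below is only an illustration; the function name and the example sentences in the usage note are assumptions, not part of the presentation.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)
```

For example, recognizing “please say her name” when the user said “please say your name” is one substitution in four words. A lower WER means a more accurate recognizer.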
Speaker Dependence
– A speaker-dependent system is designed for
only one user (at a time).
– A speaker-independent system is designed
for a variety of speakers.
Types Of Speech Recognition
Speech recognizers are divided into several
different classes according to the type of
utterance that they can recognize:
– Isolated words,
– Connected words,
– Continuous speech (computer dictation),
– Spontaneous speech,
– Voice verification,
– Voice identification.
Hardware Requirements
Natural Language Processing requires
strong systems in order to work accurately
and with a minimum response time.
The important hardware parts are:
– Sound Card
– Microphone
– Processor/RAM
How does speech synthesis work?
There are five major steps in the process of
speech synthesis:
– Structure analysis: processes the structure of the input
text.
– Text pre-processing: analyzes the input text for special
constructs of the language.
– Text-to-phoneme conversion: converts each word
to phonemes (e.g. “times” = “t ay m s”).
– Prosody analysis: determines the appropriate prosody
for the sentence (e.g. pitch, timing, pausing, etc.).
– Waveform production: uses the phoneme and prosody
information to produce the audio waveform.
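The first three steps can be sketched as a tiny text front end. This is a toy illustration, not how any particular engine works: the digit table and the pronunciation lexicon are hypothetical (only the “times” entry comes from the slide), and the prosody and waveform steps are left out because they need real acoustic data.

```python
import re

# Hypothetical expansion table for the text pre-processing step.
NUMBERS = {"2": "two", "3": "three"}

# Hypothetical pronunciation lexicon; the "times" entry is the slide's example.
LEXICON = {
    "two": "t uw",
    "three": "th r iy",
    "times": "t ay m s",
}

def structure_analysis(text):
    """Split the input text into sentences at '.', '!' or '?'."""
    parts = re.split(r"[.!?]+", text)
    return [part.strip() for part in parts if part.strip()]

def preprocess(sentence):
    """Lower-case the sentence, tokenize it, and expand known digits into words."""
    tokens = re.findall(r"[a-z]+|\d", sentence.lower())
    return [NUMBERS.get(token, token) for token in tokens]

def to_phonemes(words):
    """Look each word up in the lexicon; unknown words are marked, not guessed."""
    return [LEXICON.get(word, "<unknown:" + word + ">") for word in words]

def front_end(text):
    """Run the three steps and return one phoneme list per sentence."""
    return [to_phonemes(preprocess(s)) for s in structure_analysis(text)]
```

Calling `front_end("3 times.")` runs all three steps on one short sentence. A real engine would fall back to letter-to-sound rules for words missing from the lexicon instead of marking them as unknown.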
How does speech recognition work?
The most commonly used speech recognizers
share these basic characteristics:
– Mono-lingual,
– Process a single input at a time,
– Can optionally adapt to the voice of the speaker,
– Grammars can be dynamically updated, and
– Have a small, defined set of properties.
How does speech recognition work?
1. Grammar design:
Defines the words that may be spoken
by a user and the pattern in which they
may be spoken.
2. Signal processing:
Analyzes the spectrum
(frequency) characteristics
of the incoming audio.
3. Phoneme recognition:
Compares spectrum patterns
to the patterns of the phonemes.
4. Word recognition:
Compares the sequence of likely
phonemes against the words and
patterns of words specified by the grammar.
5. Result generation:
Provides information about
the words that the recognizer has
detected.
Diagram: Speech -> Speech Recognition Engine -> Text. The engine draws on the Grammars and the Acoustic Model.
Acoustic Model: holds the knowledge of the
environment (how the user pronounces
phonemes) – user profile.
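Step 4 (word recognition) can be illustrated by matching a sequence of likely phonemes against the words the grammar allows. The sketch below uses plain edit distance as a stand-in for the probabilistic scoring a real engine would apply. The lexicon, its phoneme spellings, and the `max_distance` threshold are all assumptions made for the example.

```python
# Hypothetical grammar lexicon: the only words the recognizer may return,
# each with an assumed phoneme sequence.
GRAMMAR_LEXICON = {
    "transfer": ["t", "r", "ae", "n", "s", "f", "er"],
    "status": ["s", "t", "ae", "t", "ah", "s"],
    "yes": ["y", "eh", "s"],
    "no": ["n", "ow"],
}

def edit_distance(a, b):
    """Minimum number of phoneme substitutions, insertions and deletions."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j - 1] + (x != y), prev[j] + 1, curr[j - 1] + 1))
        prev = curr
    return prev[-1]

def recognize_word(phonemes, lexicon=GRAMMAR_LEXICON, max_distance=2):
    """Return the grammar word closest to the phoneme sequence, or None
    if every word is more than max_distance edits away."""
    best_word, best_distance = None, max_distance + 1
    for word, pronunciation in lexicon.items():
        distance = edit_distance(phonemes, pronunciation)
        if distance < best_distance:
            best_word, best_distance = word, distance
    return best_word
```

A slightly mispronounced “status” still maps to the right word, while a sequence that resembles nothing in the grammar is rejected. This mirrors the point that words outside the grammar are not recognized.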
Speech Enabled Applications -1
The primary aim of speech enabled
applications is to improve interaction
between user and machine.
For this purpose, speech recognition and
synthesis are used together, or only one of
them is used. This mostly depends on the type of
application and its purpose.
Speech Enabled Applications -2
Speech synthesis is fairly easy to use.
After setting up the “type” of voice, the
speed of “speaking”, the duration of the pause
between sentences, and so on, the speech
synthesis engine is ready for use.
Speech enabled applications -3
Applying speech recognition requires careful
analysis of the possible inputs to
the system and the way in which the user provides
the input.
The way in which the user provides input to the
system, and the way the application responds to
the user, is called Natural Language Dialog.
Choosing the type of Natural Language Dialog is the first decision that
the developer must make.
Natural Language Dialog -1
Three essential types of interaction that
are available to software applications are:
– Direct dialog,
– Mixed initiative dialog, and
– Natural dialog.
Natural Language Dialog -2
Direct Dialog
The interaction directs the user to perform a specific task by
asking for information at each turn and expecting
specific words or phrases in response.
System: “Welcome to ABC bank customer services
system. Please say your name.”
User: “Nenad Pavlovic”
System: “Please say your account number.”
User: “1234-123-12332-1233”
System: “Would you like to perform a transfer or to see
the status on your account?”
User: “Transfer.”, etc…
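A direct dialog like the one above can be sketched as a fixed list of prompts, each with a pattern for the answers it accepts. This is only an illustration: the prompts come from the slide, while the slot names, the answer patterns, and the function name are assumptions. The answers are taken to be text already produced by the recognizer.

```python
import re

# (slot name, prompt from the slide, assumed pattern for an acceptable answer)
STEPS = [
    ("name", "Please say your name.", r"[a-z]+(?: [a-z]+)+"),
    ("account", "Please say your account number.", r"[\d-]+"),
    ("action",
     "Would you like to perform a transfer or to see the status on your account?",
     r"transfer|status"),
]

def run_direct_dialog(answers):
    """Ask each prompt in order, consuming one recognized answer per turn.

    An answer that does not match the expected pattern makes the system
    repeat the same prompt. Raises StopIteration if the answers run out.
    """
    answers = iter(answers)
    slots = {}
    for slot, prompt, pattern in STEPS:
        while slot not in slots:
            print("System:", prompt)
            reply = next(answers).strip().rstrip(".")
            print("User:", reply)
            if re.fullmatch(pattern, reply, re.IGNORECASE):
                slots[slot] = reply
    return slots
```

Because the system controls every turn, each answer only has to be checked against one small pattern, which is what keeps this dialog type simple to implement.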
Natural Language Dialog - 3
Mixed initiative dialog
This is similar to the direct dialog, but it gives
the speaker some freedom: it allows the user to have
as much or as little control as s/he desires.
System: “Welcome to ABC bank customer services
system. Please say your name.”
User: “My name is Nenad Pavlovic, and my account
number is: 1234-123-12332-1233”
System: “Would you like to perform a transfer or to see
the status on your account?”
User: “Show me the status and then go to
transfers.”, etc…
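One way to picture a mixed initiative dialog is slot filling: the system pulls every piece of information it can find out of each utterance and prompts only for what is still missing. The sketch below is a toy version under stated assumptions; the slot names, the extraction patterns, and the function names are illustrative and not taken from the slides.

```python
import re

# Assumed extraction patterns, one per slot.
SLOT_PATTERNS = {
    "name": re.compile(r"my name is ([a-z]+(?: [a-z]+)*)", re.IGNORECASE),
    "account": re.compile(r"account number is:? ?([\d-]+)", re.IGNORECASE),
}

# Prompts reused from the direct dialog example.
PROMPTS = {
    "name": "Please say your name.",
    "account": "Please say your account number.",
}

def extract_slots(utterance):
    """Return every slot value that can be found in a single utterance."""
    found = {}
    for slot, pattern in SLOT_PATTERNS.items():
        match = pattern.search(utterance)
        if match:
            found[slot] = match.group(1)
    return found

def next_prompt(slots):
    """Return the prompt for the first slot still missing, or None when all are filled."""
    for slot, prompt in PROMPTS.items():
        if slot not in slots:
            return prompt
    return None
```

With the user's first reply from the example above, both slots are filled at once, so the system can skip the account-number question and move straight to the next step.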
Natural Language Dialog - 4
Natural dialog
Allows the user to enjoy a more unstructured interaction with
an application (as natural as possible).
System: “Welcome to City Directory Dialer, how can I
help you?”
User: “I’d like to call Mr. George Eleftherakis in
Tsimiski building.”
System: “George Eleftherakis – Tsimiski building. Is
this correct?”
User: “Yes”
System: “George Eleftherakis is found in directory.
Calling…”, etc…
Grammars vs. Statistical NLU
The more freedom the user is given to
interact with the application, the more complex
the processing of input data becomes.
Depending on the complexity of the possible user
inputs and the type of dialog used, one of
two implementation approaches
is chosen:
– Grammar-based NLU
– Statistical NLU
Grammars vs. Statistical NLU
Grammar-based NLU: relies on defining
(creating) the grammar, which means
constructing the phrases and stating all
possible words that can be used.
– Advantages: fast, allows freedom of phrase
construction.
– Disadvantages: suitable only for a small set of
phrases and words; a word or phrase that is not
defined will not be recognized.
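A minimal sketch of the grammar-based approach follows. Each rule lists the allowed alternatives at each position of a phrase, and an utterance is understood only if it matches one of the phrases the rules generate. The grammar, the intent names, and the function names are hypothetical and were made up for this illustration.

```python
import itertools

# Hypothetical grammar: each rule is a sequence of positions, and each
# position lists its alternatives. An empty string marks an optional word.
GRAMMAR = {
    "transfer": [["", "please"], ["make", "perform"], ["a"], ["transfer"]],
    "status": [["", "please"], ["show", "see"], ["the", "my"], ["status"]],
}

def expand(rule):
    """Yield every phrase that the rule allows."""
    for choice in itertools.product(*rule):
        yield " ".join(word for word in choice if word)

# Lookup table from every allowed phrase to its intent.
PHRASES = {
    phrase: intent
    for intent, rule in GRAMMAR.items()
    for phrase in expand(rule)
}

def interpret(utterance):
    """Return the intent for an in-grammar utterance, or None otherwise."""
    normalized = " ".join(utterance.lower().split())
    return PHRASES.get(normalized)
```

Lookup is a single dictionary access, which reflects the speed advantage. The disadvantage is equally visible: a phrase such as “move some money” is not in the table, so it is simply not understood.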
Grammars vs. Statistical NLU
Statistical NLU: relies on a
statistical model of utterances derived
from actual conversation data.
– Advantages: handles a huge set of phrases and words.
– Disadvantages: slow, difficult to add new
phrases.
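To make the contrast concrete, here is a very small statistical classifier: a naive Bayes model with add-one smoothing that learns word counts per intent from example utterances. It is a sketch under assumptions, not the method used by any system named in this presentation. The training sentences are invented stand-ins for real conversation data, and the intent and function names are hypothetical.

```python
import math
from collections import Counter, defaultdict

# Hypothetical training utterances standing in for real conversation data.
TRAINING = [
    ("i want to move money to savings", "transfer"),
    ("send money to another account", "transfer"),
    ("transfer funds please", "transfer"),
    ("what is my balance", "status"),
    ("show me the status of my account", "status"),
    ("how much money do i have", "status"),
]

def train(examples):
    """Count words per intent and collect the overall vocabulary."""
    word_counts = defaultdict(Counter)
    intent_counts = Counter()
    for text, intent in examples:
        intent_counts[intent] += 1
        word_counts[intent].update(text.split())
    vocabulary = {word for counts in word_counts.values() for word in counts}
    return word_counts, intent_counts, vocabulary

def classify(text, model):
    """Return the intent with the highest smoothed log-probability."""
    word_counts, intent_counts, vocabulary = model
    total = sum(intent_counts.values())
    scores = {}
    for intent, count in intent_counts.items():
        denominator = sum(word_counts[intent].values()) + len(vocabulary)
        score = math.log(count / total)
        for word in text.lower().split():
            if word in vocabulary:  # words never seen in training are skipped
                score += math.log((word_counts[intent][word] + 1) / denominator)
        scores[intent] = score
    return max(scores, key=scores.get)
```

After `model = train(TRAINING)`, the classifier can assign an intent to phrasings that never appeared verbatim in the training data, which a grammar cannot do. Supporting a new kind of request, however, means collecting more examples and retraining.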
Uses of speech applications
Speech technology is mostly used in
the following areas:
– Dictation
– Command and Control
– Telephony
– Wearables
– Medical Disabilities
– Embedded Applications
Speech Systems
Commercial
– IBM’s ViaVoice (Linux, Windows, MacOS)
– Dragon NaturallySpeaking (Windows)
– Microsoft’s Speech Engine (Windows)
– BaBear (Linux, Windows, MacOS)
– SpeechWorks (Linux, Sparc & x86 Solaris, Tru64,
Unixware, Windows)
Non-commercial
– OpenMind Speech (Linux)
– XVoice (Linux)
– CVoiceControl/kVoiceControl (Linux)
– GVoice (Linux)
Conclusion
Developers’ perspective: developing a speech
enabled application does not require redesigning
or explicitly designing systems to support
speech. Speech is treated as an “attached entity” and
can be viewed as a separate module. Also, it does
not require special linguistic or programming
skills.
Business perspective: the use of speech
enabled applications can noticeably improve the
accuracy and effectiveness of employees who work
with large amounts of data, many people, or both.
Thank you 
Pavlovic Nenad
[email protected]
References
[1] Radev, D. R. (2001), “Natural Language Processing FAQ”, Columbia
University, Dept. of Computer Science, NYC.