Natural Language Processing

August 23, 2007
SpeechTEK University
Deborah Dahl
Conversational Technologies
Description of the Tutorial
An introduction to the principles of natural language processing and the role of natural language processing in current and future speech applications
- 9:00-9:15 Introduction: what is natural language?
- 9:15-10:15 Part 1: Overview and Principles
- 10:15-10:45 Break (30 minutes)
- 10:45-12:00 Part 2: Detailed Examples
Attendees

Backgrounds and goals
Audience and Background


- A general technical background.
- No natural language processing background will be assumed, but experience developing speech applications would be helpful.
What is Natural Language?
Natural language is the kind of language that’s used to communicate between people
- Can be spoken, written or gestural (in the case of sign languages)
- There are several thousand currently spoken human languages
Why are We Interested in Natural Language?

Support for more natural and effective computer-human interactions by accommodating the ways that people already communicate
Natural Language Processing



Natural language understanding
Natural language generation
Machine translation
Part 1: Overview and Principles
Goals





- Understand what natural language is
- Learn about the most common techniques for processing natural language
  - Their strengths and weaknesses
- Understand where natural language processing technology is headed in the future
- Focus is on commercial applications
Topics







- What is natural language?
- Issues in spoken natural language and how to handle them
- Statistical Language Models (SLM’s)
- Speech grammars with semantic tags
- Variability in expression, pronouns, and filling multiple slots from a single utterance
- How emerging standards such as EMMA will contribute to more sophisticated future applications
- Recent topics in natural language research and how this research may eventually be used in future applications
Natural Language Understanding

The task of automatically assigning meaning to language
What natural language processing isn’t

- Speech recognition, which turns the sounds of spoken language into the words of written language
- Dialog management, which manages a natural language interaction between a user and a computer
- Artificial intelligence, which studies how to provide intelligent capabilities to computers
Assigning Meaning to Language




- In most applications, the developer decides what the set of possible meanings is
- Meanings can be simple or complex
- Language can be simple or complex
- Current commercial techniques can
  - Assign simple meanings to simple language
  - Assign simple meanings to complex language
- Research systems can handle more complex meanings and language, but no existing system can handle all meanings and all language for even one human language
Examples of Complex Language




Shakespeare
Religious texts
The United States Constitution
We don’t have to worry about assigning meaning to these texts!
Simple to Slightly More Complex Language

- “yes”
- “New York”
- “call home”
- “a red t-shirt, size large”
- “I want to go from Philadelphia to New York on Sunday, August 19”
- The more complex language becomes, the more we need special techniques to process it
Human Communication Process?
[Diagram: Thought (Person A) → language → Thought (Person B)]
More Realistic Communication Process
[Diagram: Thought 1 (Person A) → language → a thought somewhat similar to Thought 1 (Person B)]
Questions for Person A, the speaker:
- How should I express this?
- Is this something I really need to say?
- What does B already know?
- Why do I want to express this thought?
- Do I want to impress B?
- Might I offend B by saying this?
- What language should I use?
Questions for Person B, the listener:
- Should I believe this?
- Could A be lying or lacking credibility?
- If I think A is lying should I say so?
- Did I hear it right?
- Did I understand it?
- Why did Person A say that?
Issues in Natural Language





- Variability of expression
- Infinite number of meanings that can be expressed
- Infinite number of possible sentences in a language
- Many ways to say the same thing
- The same thing can have different meanings in different contexts
What is a Meaning?



- Many approaches to representing meanings in traditional linguistics and philosophy of language
- Most widely used commercial representation is as a token or as a set of slot/value pairs (also called “key/value” or “attribute/value” pairs)
- Often structured into a set of related slot/value pairs (for example, the fields of a VoiceXML <form>, or a traditional frame)
Tokens

- “my printer is printing horizontal bands and everything is printing in blue” → “printer problem”
- “I can’t connect to the internet” → “internet problem”
What is a Meaning? Slot/Value Pairs

- “I want to go from Chicago to New York on August 19 midafternoon on United”
- Form/frame – airline reservation
  - Destination: New York
  - Departure city: Chicago
  - Departure date: August 19
  - Departure time: midafternoon
  - Airline: United
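As a rough illustration of the frame above, the slots can be held in a small data structure that also reports which slots are still unfilled. This is only a sketch: the class name `FlightRequest`, its field names, and `missing_slots` are hypothetical and do not come from any speech platform.

```python
from dataclasses import dataclass, asdict
from typing import List, Optional


@dataclass
class FlightRequest:
    """Hypothetical airline reservation frame; one field per slot."""
    destination: Optional[str] = None
    departure_city: Optional[str] = None
    departure_date: Optional[str] = None
    departure_time: Optional[str] = None
    airline: Optional[str] = None

    def missing_slots(self) -> List[str]:
        # Slots with no value yet; a dialog would prompt for these next.
        return [name for name, value in asdict(self).items() if value is None]


# All five slots filled from the single utterance on the slide.
frame = FlightRequest(
    destination="New York",
    departure_city="Chicago",
    departure_date="August 19",
    departure_time="midafternoon",
    airline="United",
)

# A shorter utterance such as "I want to go to New York" fills one slot.
partial = FlightRequest(destination="New York")
```

An application could call `partial.missing_slots()` to decide what to ask the caller for next.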
Information Available for Extracting Meaning
Used by today’s commercial systems
- Words of the utterance
- Word order
- Grammatical endings
- Specific grammar for the application
- Information about what previous instances of that utterance have meant
Used by research systems and people
- Prosody (intonation, pauses, loudness, stress, timing)
- General information about the language itself (dictionaries, grammars, thesauri)
- Context of the utterance
- Information about the topic
- Facial expressions, gestures
Traditional Tasks in Natural Language Understanding







(Recognition – speech, handwriting, OCR…)
Lexical lookup
Part of speech tagging
Sense disambiguation
Syntactic parsing
Semantic analysis
Pragmatic analysis
Problems with Traditional Approaches


- Traditional approaches try to describe the full language and a broad set of meanings
- For practical applications, it’s much easier to just write a small grammar for a specific application
Natural Language Tasks in Commercial Speech Systems

- (Recognition – speech, handwriting, OCR…)
- Lexical lookup (part of recognition)
- Part of speech tagging – parts of speech not used
- Sense disambiguation – not needed, constrained application
- Syntactic parsing – syntactic structure used indirectly
- Semantic analysis
- Pragmatic analysis
[Slide annotation: a brace labeled “Done in parallel” groups several of the steps above]
Extracting Meaning in Commercial Applications

- Filling slots by using semantically tagged grammars (CFG’s)
- Mapping complex utterances to categories (SLM’s)
Semantically Tagged Grammars



- A grammar defines what the recognizer can recognize (recognized strings)
- Tags define return values for different recognized strings
- Information used: words of the utterance and a special-purpose grammar
Context-Free Grammar Formats







- Represent what a speech recognizer can recognize
- Example: Request → PoliteWord + Action + Item (“please open the door”)
- Speech Recognition Grammar Specification (SRGS) (ABNF and XML formats)
- Java Speech Grammar Format (JSGF)
- Nuance GSL
- Microsoft Speech Application Programming Interface (SAPI)
Semantic Tags







Reduce variability of expression
Assign return values to recognized strings
W3C Semantic Interpretation for Speech Recognition (SISR)
JSGF tags
SAPI tags
IBM ECMAScript tags
Nuance GSL
Capabilities of Tag Formats
- Assign tokens to strings (JSGF)
  - “yeah” → yes
- Create key-value pairs (SAPI)
  - “to chicago” → <destination>ord</destination>
- Perform computations (SISR, IBM, GSL)
  - “three days from now” → August 26, 2007
  - “two medium and three large pizzas” → 5 pizzas
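The “perform computations” capability can be pictured with a short Python sketch. It is not SISR, IBM, or GSL tag syntax; the function names, the tiny number vocabulary, and the fixed utterance patterns are all assumptions made for illustration.

```python
import re
from datetime import date, timedelta
from typing import Optional

# Small illustrative vocabulary; a real grammar would cover far more.
NUMBER_WORDS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5}


def days_from_now(utterance: str, today: date) -> Optional[date]:
    """Turn e.g. 'three days from now' into a calendar date."""
    match = re.fullmatch(r"(\w+) days? from now", utterance.lower().strip())
    if not match or match.group(1) not in NUMBER_WORDS:
        return None
    return today + timedelta(days=NUMBER_WORDS[match.group(1)])


def count_pizzas(utterance: str) -> int:
    """Add up every number word in the order, ignoring sizes."""
    return sum(NUMBER_WORDS.get(word, 0) for word in utterance.lower().split())


# With the tutorial date (August 23, 2007) as "today":
delivery = days_from_now("three days from now", date(2007, 8, 23))
total = count_pizzas("two medium and three large pizzas")
```

Here `delivery` is August 26, 2007 and `total` is 5, matching the two examples on the slide.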
SISR Tags for “yes” and “no”
<rule id="yes">
  <one-of>
    <item>yes</item>
    <item>yeah<tag>yes</tag></item>
    <item><token>you bet</token><tag>yes</tag></item>
    <item xml:lang="fr-CA">oui<tag>yes</tag></item>
  </one-of>
</rule>
<rule id="no">
  <one-of>
    <item>no</item>
    <item>nope</item>
    <item><token>no way</token></item>
  </one-of>
  <tag>no</tag>
</rule>
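The effect of the two rules above is that several surface forms come back as one canonical token. A minimal Python sketch of the same mapping follows; the table and the `interpret` function are illustrative stand-ins and not part of any SISR implementation.

```python
from typing import Optional

# Each recognized string maps to the value its <tag> would return.
YES_NO = {
    "yes": "yes",
    "yeah": "yes",
    "you bet": "yes",
    "oui": "yes",
    "no": "no",
    "nope": "no",
    "no way": "no",
}


def interpret(utterance: str) -> Optional[str]:
    """Return the canonical token, or None if the string is out of grammar."""
    return YES_NO.get(utterance.strip().lower())
```

The dialog logic then only has to test for `"yes"` or `"no"`, however the caller phrased it.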
GSL Token
DigitValue [
  ([zero oh] one) { return (01) }
  ...]
“oh one” → 01
SISR Slot/Value
“I would like a small coca cola and three large pizzas with pepperoni and mushrooms.”
<rule id="order">
  I would like a
  <ruleref uri="#drink"/>
  <tag>out.drink = new Object();
    out.drink.liquid=rules.drink.type;
    out.drink.drinksize=rules.drink.drinksize;</tag>
  and
  <ruleref uri="#pizza"/>
  <tag>out.pizza=rules.pizza;</tag>
</rule>
GSL Slot/Value
;GSL 2.0;
ColoredObject:public (Color Object)
Color [
  [red pink]      { <color red> }
  [yellow canary] { <color yellow> }
  [green khaki]   { <color green> }
]
Object [
  [truck car]     { <object vehicle> }
  [ball block]    { <object toy> }
  [shirt blouse]  { <object clothing> }
]
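To show what this grammar returns, here is a rough Python equivalent of the ColoredObject rule: a color word followed by an object word, each reduced to its slot value. The dictionaries and the function name are assumptions for illustration and this is not how a GSL engine is implemented.

```python
from typing import Dict, Optional

# Word -> slot value, mirroring the Color and Object subgrammars.
COLOR = {"red": "red", "pink": "red",
         "yellow": "yellow", "canary": "yellow",
         "green": "green", "khaki": "green"}
OBJECT = {"truck": "vehicle", "car": "vehicle",
          "ball": "toy", "block": "toy",
          "shirt": "clothing", "blouse": "clothing"}


def parse_colored_object(utterance: str) -> Optional[Dict[str, str]]:
    """Accept exactly '<color word> <object word>' and fill both slots."""
    words = utterance.lower().split()
    if len(words) != 2:
        return None
    color, obj = COLOR.get(words[0]), OBJECT.get(words[1])
    if color is None or obj is None:
        return None
    return {"color": color, "object": obj}
```

For example, “pink blouse” fills the slots as color = red and object = clothing, while a word order the grammar does not allow returns nothing.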
SAPI Slot-Value
<RULE name="elvis">
  <L PROPNAME="artist">
    <P VALSTR="elvis_presley">elvis <O>presley</O></P>
    <P VALSTR="elvis_presley">the king</P>
  </L>
</RULE>
Problems with Tagged Grammars





- Hard to maintain when complex
- Hard to anticipate all the variations in how someone might say something
- Can use wildcards/garbage to ignore parts of utterance
- Speech recognition suffers when grammars are too complex
- Speech recognition suffers when wildcards are used
Statistical Language Models (SLM’s)



- Speech recognition is based on statistical models, not grammars
- In commercial systems, natural language processing with SLM’s is a process of classification: relatively coarse meaning extraction
- Works well if goal is to extract very simple meanings
Stages in SLM Processing

- Ngram speech recognition: probabilities of word sequences, usually 2-3 words
  - Much more flexible (but less accurate) than a grammar
  - However, accuracy is not as critical with SLM’s because you don’t have to get every single word right
- Text classification: given a text, assign it to categories based on training from previous texts
  - There are many algorithms for classification
Problems with SLM’s



Less accurate than CFG’s
Expensive to implement and maintain
Require a lot of data for good performance
Tagged Grammars or SLM’s?



- Deeply nested menus → SLM’s
- Complex applications with many slots to fill and precise meanings needed → grammars
- Can combine both approaches in one application
  - Front-end SLM followed by grammar
  - Prompt asks specific question to catch most common tasks but has “other” category
Other Combination Approaches


- Use SLM technology to recognize but grammar to interpret
- Rules combined with SLM’s
  - Robust parsing
  - Rules combined with wildcard
  - “I want um make that a large pizza with pepperoni and onions”
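One way to picture “rules combined with wildcard” is a pattern that skips whatever comes before the part the rule understands. The sketch below uses a Python regular expression as a stand-in for a grammar wildcard; the pattern, the size list, and `parse_order` are assumptions for illustration, not a speech platform’s robust parser.

```python
import re
from typing import Dict, List, Optional, Union

# ".*?" plays the role of the wildcard: it absorbs "I want um make that a".
ORDER = re.compile(
    r".*?\b(?P<size>small|medium|large) pizza(?: with (?P<toppings>.+))?$"
)


def parse_order(utterance: str) -> Optional[Dict[str, Union[str, List[str]]]]:
    """Extract size and toppings, ignoring leading hesitations and restarts."""
    match = ORDER.match(utterance.lower().strip())
    if not match:
        return None
    raw = match.group("toppings")
    toppings = re.split(r",? and |, ", raw) if raw else []
    return {"size": match.group("size"), "toppings": toppings}


order = parse_order("I want um make that a large pizza with pepperoni and onions")
```

The false start “I want um make that” is simply skipped, and `order` carries the size and the two toppings.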
Emerging Standards: EMMA



- EMMA (Extensible MultiModal Annotation)
- Developed by the World Wide Web Consortium Multimodal Interaction Working Group
- An XML format for representing users’ inputs and the results of processing them
How does EMMA relate to natural language understanding?

EMMA represents the results of a natural language understanding process
EMMA Benefits (1)

- EMMA’s standard format lets all kinds of EMMA producers (multimodal modality components) exchange results
  - handwriting recognizers
  - speech recognizers
  - text classifiers
  - face recognizers
  - speaker identification and verification
  - …
EMMA Benefits (2)

- Through “<derived-from>”, provides a way for “specialist” processing components to cooperate in processing a single input
[Diagram: processing components shown on the slide: speech recognition, Ngram speech recognition, lexical lookup, part of speech tagging, parsing, semantic analysis, classification]
EMMA Example – (1) Annotation Elements
“from philadelphia to boston and i want a vegetarian meal”
<emma:emma version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma/">
  <emma:info>
    <application>airline</application>
  </emma:info>
  <emma:model>
    <model class="airline">
      <source></source>
      <destination></destination>
      <days></days>
      <meals></meals>
    </model>
  </emma:model>
EMMA Example – (2) Annotation Attributes
<emma:interpretation
  id="interp5"
  emma:start="1186519245101"
  emma:mode="speech"
  emma:end="1186519248391"
  emma:confidence="0.03"
  emma:function="dialog"
  emma:duration="3290"
  emma:uninterpreted="false"
  emma:lang="en-US"
  emma:verbal="true"
  emma:dialog-turn="1"
  emma:tokens="from philadelphia to boston and i want a vegetarian meal "
  emma:medium="acoustic"
  emma:process="file://Microsoft Speech Recognizer 8.0 for Windows (English - US), SAPI5, Microsoft">
EMMA Example – (3) Application Semantics
<source>philadelphia</source>
<destination>boston</destination>
<meal>vegetarian</meal>
Part 2: Detailed Examples
SAPI XML Grammar Examples






- Windows Speech Recognition (Vista)
- Office 2003 Speech Recognition
- Example – music player interface
  - “I’d like to hear Beethoven’s 5th”
  - “Please play Brandenburg Concertos by Bach”
  - “Play something by Elvis”
Canonicalizing Forms
<RULE name="elvis">
  <L PROPNAME="artist">
    <P VALSTR="elvis_presley">elvis <O>presley</O></P>
    <P VALSTR="elvis_presley">the king</P>
  </L>
</RULE>
Canonicalizing Forms (2)
<RULE name="name">
  <L PROPNAME="name">
    <P VALSTR="opus 125">ninth <O>symphony</O></P>
    <P VALSTR="opus 92">seventh <O>symphony</O></P>
    <P VALSTR="opus 67">fifth <O>symphony</O></P>
    <P VALSTR="brandenburg_concertos">Brandenburg Concertos</P>
    <P VALSTR="opus 55">third symphony</P>
    <P VALSTR="you ain't nothing but a hound dog">hound dog</P>
    <P VALSTR="anything">something</P>
    <P VALSTR="anything">anything</P>
    <P VALSTR="opus 3">symphony in d major <O>opus 3</O></P>
  </L>
</RULE>
Disambiguating
<RULE name="jsbach">
  <P PROPNAME="composer" VALSTR="johann_sebastian_bach">
    <O>
      <L>
        <P>J S </P>
        <P>Johann Sebastian</P>
      </L>
    </O>
    <P>Bach</P>
  </P>
</RULE>
<RULE name="jcbach">
  <P PROPNAME="composer" VALSTR="johann_christian_bach">
    <L>
      <P>J C </P>
      <P>Johann Christian</P>
    </L>
    <P>Bach</P>
  </P>
</RULE>
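In the two rules above, the first names are optional for J. S. Bach but required for J. C. Bach, so a bare “Bach” resolves to Johann Sebastian. The Python sketch below mimics that behavior with regular expressions; the patterns and the `composer` function are illustrative assumptions, not SAPI code.

```python
import re
from typing import Optional

# First names optional: "bach" alone falls back to Johann Sebastian.
JS_BACH = re.compile(r"(?:(?:j s|johann sebastian) )?bach")
# First names required: J. C. Bach must be named explicitly.
JC_BACH = re.compile(r"(?:j c|johann christian) bach")


def composer(utterance: str) -> Optional[str]:
    """Return the composer property value for a recognized name."""
    text = " ".join(utterance.lower().split())  # normalize case and spacing
    if JC_BACH.fullmatch(text):
        return "johann_christian_bach"
    if JS_BACH.fullmatch(text):
        return "johann_sebastian_bach"
    return None
```

The design choice is the same as in the grammar: the more famous composer is the default reading, and the less common one needs the extra words.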
SLM Examples




- Meta-utterances for channel control
  - “I’m confused”
  - “Speak louder please”
  - “Could you say that again?”
Training Data

- Find out how people ask these questions
- Manually tag them with their categories

Category: repeat
  could you say that again please
  i didn't catch that
  sorry
  pardon me?
  repeat that please
  say that again
  what?
Category: operator
  I need to speak to a human
  are there any humans I can talk to?
  please get me an operator
  I want an operator
  operator please
  I need an agent
Use NGram Speech Grammar




- Ngrams are sequences of two or three words, with the probabilities that the words will occur together in that order
- Much less constrained than CFG’s
- Less accurate
- Used in “How may I help you?” applications, dictation systems, and research
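To make the idea of word-sequence probabilities concrete, here is a minimal bigram (two-word) sketch in Python. It uses plain relative frequencies with no smoothing, a three-sentence toy corpus, and hypothetical function names; real recognizers train on far more data and use more careful estimates.

```python
from collections import Counter
from typing import Iterable, Tuple


def train_bigrams(sentences: Iterable[str]) -> Tuple[Counter, Counter]:
    """Count each word as a history and each adjacent word pair."""
    histories, pairs = Counter(), Counter()
    for sentence in sentences:
        words = ["<s>"] + sentence.lower().split() + ["</s>"]
        histories.update(words[:-1])          # every word that has a successor
        pairs.update(zip(words, words[1:]))   # adjacent (previous, next) pairs
    return histories, pairs


def bigram_probability(prev: str, word: str,
                       histories: Counter, pairs: Counter) -> float:
    """Estimate P(word | prev) as count(prev word) / count(prev)."""
    if histories[prev] == 0:
        return 0.0
    return pairs[(prev, word)] / histories[prev]


corpus = ["say that again", "repeat that please", "could you say that again please"]
histories, pairs = train_bigrams(corpus)
p_again = bigram_probability("that", "again", histories, pairs)
```

In this toy corpus “that” is followed by “again” in two of its three occurrences, so `p_again` is 2/3.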
Use Text Classification Software



- Uses training data to develop probabilities that a new text is in one of the training categories
- Many algorithms and approaches to text classification
- Similar to the technology used in spam filters, but input is speech
Example

User says:
  Pardon me, I didn’t catch that
Speech recognizer hears:
  party may i didn't catch that
Classifier classifies:
  increase_volume  0.4595725150090289
  decrease_volume  0.0
  slower           0.0
  faster           0.0
  confused         0.4447495899966607
  repeat           0.567774973957669
  operator         0.5163977794943222
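The slide does not say which classification algorithm produced these scores. As one simple possibility, the sketch below scores a recognized string against each category by bag-of-words cosine similarity, using only the “repeat” and “operator” training sentences shown earlier. The algorithm choice, the function names, and the reduced category set are all assumptions, and the numbers it produces will not match the scores on the slide.

```python
import math
from collections import Counter
from typing import Dict

# Training sentences from the two categories shown in the tutorial.
TRAINING = {
    "repeat": ["could you say that again please", "i didn't catch that", "sorry",
               "pardon me", "repeat that please", "say that again", "what"],
    "operator": ["i need to speak to a human", "are there any humans i can talk to",
                 "please get me an operator", "i want an operator",
                 "operator please", "i need an agent"],
}


def bag(text: str) -> Counter:
    """Word counts for a text, ignoring case."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def classify(text: str) -> Dict[str, float]:
    """Score the text against the summed word counts of each category."""
    query = bag(text)
    scores = {}
    for category, examples in TRAINING.items():
        centroid = Counter()
        for example in examples:
            centroid.update(bag(example))
        scores[category] = cosine(query, centroid)
    return scores


scores = classify("party may i didn't catch that")
best = max(scores, key=scores.get)
```

Even though the recognizer misheard “pardon me” as “party may”, the remaining words are enough for “repeat” to outscore “operator”, which is the point the slide makes about not needing every word right.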
EMMA Text Input Example
<emma:emma version="1.0"
    xmlns:emma="http://www.w3.org/2003/04/emma/">
  <emma:interpretation id="interp4"
      emma:duration="3038"
      emma:confidence="1.0"
      emma:process="file://Microsoft Speech Recognizer 8.0 for Windows (English - US), SAPI5, Microsoft"
      emma:medium="tactile" emma:verbal="true" emma:mode="keys"
      emma:start="1187040519583" emma:uninterpreted="false"
      emma:function="dialog" emma:dialog-turn="4"
      emma:end="1187040737446" emma:lang="en-US"
      emma:tokens="i'd like to go from boston to philadelphia on tuesday ">
    <source>boston</source>
    <destination>philadelphia</destination>
    <day>Tuesday</day>
  </emma:interpretation>
</emma:emma>
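A consumer of EMMA results mostly needs the annotations and the application slots. The sketch below reads a trimmed copy of the text-input example with Python’s standard `xml.etree.ElementTree`; the `read_interpretation` function, the shortened sample document, and the shape of the returned dictionary are assumptions for illustration.

```python
import xml.etree.ElementTree as ET
from typing import Any, Dict

EMMA_NS = "http://www.w3.org/2003/04/emma/"

# Trimmed, well-formed version of the example; most annotations omitted.
SAMPLE = """<emma:emma version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma/">
  <emma:interpretation id="interp4" emma:confidence="1.0" emma:mode="keys"
      emma:tokens="i'd like to go from boston to philadelphia on tuesday">
    <source>boston</source>
    <destination>philadelphia</destination>
    <day>Tuesday</day>
  </emma:interpretation>
</emma:emma>"""


def read_interpretation(xml_text: str) -> Dict[str, Any]:
    """Pull the confidence, tokens, and slot values out of an EMMA document."""
    root = ET.fromstring(xml_text)
    interp = root.find(f"{{{EMMA_NS}}}interpretation")
    # Namespaced attributes are keyed as "{namespace}name" in ElementTree.
    confidence = float(interp.get(f"{{{EMMA_NS}}}confidence"))
    tokens = interp.get(f"{{{EMMA_NS}}}tokens")
    # Child elements without a namespace are the application semantics.
    slots = {child.tag: (child.text or "").strip() for child in interp}
    return {"confidence": confidence, "tokens": tokens, "slots": slots}


result = read_interpretation(SAMPLE)
```

Because the format is the same whatever produced it, the same reader would work for a result from a speech recognizer or a text classifier.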
EMMA: Classification Example
<emma:emma version="1.0"
    xmlns:emma="http://www.w3.org/2003/04/emma/">
  <emma:interpretation id="interp4"
      emma:duration="3038"
      emma:confidence=".5" emma:process="tech-support-slm"
      emma:medium="acoustic" emma:verbal="true" emma:mode="voice"
      emma:start="1187040519583" emma:uninterpreted="false"
      emma:function="dialog" emma:dialog-turn="4"
      emma:end="1187040737446" emma:lang="en-US"
      emma:tokens="my internet connection keeps going off ">
    <problem>internet connectivity</problem>
  </emma:interpretation>
</emma:emma>
Natural Language Research


- Natural language processing is an active area of academic and industrial research
- Topics studied include spoken dialog processing, text understanding, natural language generation, automatic translation, acquisition of natural language information such as words and grammars, information extraction, summarization and support for search
Natural Language Research

- Most interesting to this audience are topics such as
  - Broadening domains (sense disambiguation and parsing disambiguation)
  - Handling spoken dialog phenomena such as pronouns and ellipses
  - Handling speech errors such as hesitations, false starts
  - Multimodal communication, such as integrating speech and gestures
  - Extracting information provided by prosody and other suprasegmentals
- The main academic organization is The Association for Computational Linguistics (www.aclweb.org)
More Information: Websites
W3C Voice Browser WG SISR
http://www.w3.org/TR/semantic-interpretation/

W3C Multimodal Interaction WG (EMMA)
http://www.w3.org/TR/emma/

Association for Computational Linguistics (www.aclweb.org)

Loquendo Café (for testing SISR grammars)
http://www.loquendocafe.com

Voxeo Prophecy Platform (for testing Nuance grammars)
www.voxeo.com

SAPI XML grammars (test with Windows Speech Recognition or the Office 2003 Microsoft 6.1 recognizer)
http://www.microsoft.com/speech/SDK/51/sapi.chm

Conversational Technologies
http://www.conversational-technologies.com

More Information: Books, Journals, Articles

“Natural Language Processing: the Next Steps” (September 2006)
http://www.speechtechmag.com/Articles/ReadArticle.aspx?ArticleID=29474

Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics and Speech Recognition by Daniel Jurafsky and James H. Martin (2000)

Computational Linguistics (journal)
Natural Language Engineering (journal)