Sequence Package Analysis


Natural Language Processing and Cognitive Science
NLPCS 2012
University of Wroclaw, School of Economics
Wroclaw, Poland
June 28, 2012
Amy Neustein, Ph.D.
Founder and CEO
Linguistic Technology Systems
www.lingtechsys.com
The Role of Cognitive Science in Natural
Language Processing
Goals:
• Broaden definition of Cognitive Science (CS) to make its
application to NLP timeless – as opposed to fashionable
(e.g. neurolinguistics, psycholinguistics, semiotics, social
psychology, theoretical linguistics, etc.)
• Use a generic definition of CS – an interdisciplinary field
of study concerned with how information is represented,
processed and transformed – to encompass certain
disciplines that are traditionally outside the scope of
cognitive studies.
What Constitutes a Sensible Interdisciplinary
Approach to NLP?
• Selecting apposite methods for EACH task at hand
rather than broadly applying methods – individually or
collectively – of human language study
• Understanding both the method AND its origins when
applying different disciplines to solve natural language
problems
RELATED DISCIPLINES FOR THE
STUDY OF HUMAN LANGUAGE
Discourse Analysis
and Computational
Linguistics
Cognitive
Psychology and
Psycholinguistics
Artificial
Intelligence, Soft
Computing, and
Argument-Based
Computing
Sociology,
Sociolinguistics,
and Conversation
Analysis
Linguistic
Philosophy and
Speech Act Theory
EXAMPLES OF CHALLENGING NLP TASKS THAT MAY
BENEFIT FROM A COGNITIVE SCIENCE APPROACH
Machine Translation of Web
Pages (and cross-lingual text
mining in which material is
extracted from a specific
portion of the text) in Under-Resourced Languages
Computer-Aided Bilingual
Instruction for Hearing-Impaired Primary School
Students that use Situated
Learning Techniques
Intelligent Tutoring
Systems that Can Reach
Deeper levels of
Understanding by
Responding to Human
Emotions and
Disengagement Behavior
Design of Spoken Dialog
Systems that Must Adjust
to Ambiguous Customer
Requests or to Complaints
that do not Contain
Standard Keywords used to
Express Anger/Frustration
Design of Electronic
Dictionaries and Search
Engines that Conform to
Dynamic (rather than
static) Referential Practices
NLP PROBLEMS POSED BY UNDER-RESOURCED
LANGUAGES WHICH CANNOT BE SOLVED STATISTICALLY
Under-resourced languages pose several problems for computational
modeling:
1) They produce small quantities of parallel corpus data;
2) Speech recognition accuracy is compromised by small quantities of
parallel corpus data (this is especially true for computational (statistical)
models that have come to depend on large amounts of corpus data to
perform with a high level of recognition accuracy); and
3) Performing disambiguation of sense-meaning of words (among other
NLP tasks) can be hampered by limited parallel corpus data.
COGNITIVE STRATEGIES FOR WORD
SENSE DISAMBIGUATION (WSD)
A Lexically Sensitive Model of WSD
(Kwong 2012)
How Does it Work and Why is it
Important?
Given that intrinsic properties of words
are closely related to our cognition, a
lexically sensitive model of WSD presents
one possible solution to the ambiguities
found in natural language
Such a model would separate words, or
more accurately word senses, into fairly
distinct groups (sense types) according to
their responses to disambiguation, based
on different knowledge sources
Such sense types go beyond simple
linguistic categories such as POS because
they are more likely to be semantic and
perceptual
SEPARATING WORDS INTO SENSE TYPES
ACHIEVES DISAMBIGUATION
By separating words into sense
types, the knowledge pertaining
to information susceptibility of
target words (the relation
between the intrinsic properties
of a word and the effectiveness of
various types of lexico-semantic
knowledge to characterize and
disambiguate it) can help fine-tune WSD systems and inform
the optimal combination of
knowledge sources for
disambiguation.
USING PSYCHOLINGUISTIC EVIDENCE TO RESOLVE
AMBIGUITIES PRESENT IN NATURAL LANGUAGE
Kwong (2012) demonstrates that a
lexically sensitive model for WSD,
one that combines both a
cognitive and computational
perspective, will better inform
automatic systems with
psycholinguistic evidence instead
of “resting entirely and helplessly
with specific machine learning
algorithms and their feature
selection mechanisms” (p. 92).
AN INTERDISCIPLINARY APPROACH
TO THE DESIGN OF SPOKEN DIALOG SYSTEMS
Bel-Enguix and Jimenez-Lopez propose
Conversational Grammar Systems (CGS) to model
dialog as inter-action: “a sequence of acts performed
by two or more agents in a common environment”
(2008: 209).
The authors drew from the conversation analytic
literature, which they combined with their knowledge
of computational linguistics, formal language theory,
and speech act theory, maintaining that because the
“investigation and modeling of human language is
clearly an interdisciplinary task…methods for
language technology have to come from different
disciplines” (p. 219)
THE PROBLEM OF METAPHORS FOR
SPOKEN LANGUAGE UNDERSTANDING
Barnden (2008) explored the problem presented
by the use of metaphors when performing both
text and speech-based NLP tasks, pointing out
that while the problem of metaphor may be
viewed “as a peripheral problem (perhaps mostly
to do with poetry and other literary language) it
is in fact a pervasive feature of mundane
language…” (p. 121).
Barnden employed an interdisciplinary
approach, augmenting discourse analysis with
conversation analysis by drawing on the research
of Paul Drew and Elizabeth Holt (1998) who
showed how speakers employ the art of
metaphor to achieve topic transition in
conversation.
HOW AN INTERDISCIPLINARY APPROACH
AIDS THE DIALOG MANAGER
HERE IS AN ILLUSTRATION OF HOW AN
INTERDISCIPLINARY APPROACH ASSISTS THE
DIALOG MANAGER IN GAUGING WHAT IS
HAPPENING AT EACH TURN:
ANALYZING THE TURN-TAKING FEATURES
OF CONCESSIVE CONNECTORS
Popescu, Caelen, and Burileanu (2009) discuss
the importance of a dialog manager’s correct
reading of a “concessive connector”
(Moeschler and Reboul, 1994) or what might
be seen as a clue word – “so,” “anyway,” “now,”
“but,” “although” – that connects the various
utterances (or context space, Reichman,
1985) that comprise a multi-utterance
speaking turn.
Computational linguists, in the absence of
conversation analysts who study in
painstaking detail the turn-taking features of
talk-in-interaction (and how speakers
demonstrate, through the design of their
speaking turns, their understanding and
interpretation of each other’s social actions)
can misinterpret the meaning/function of the
concessive connector which may in fact NOT
serve its literal meaning of “connecting” parts
of a multi-utterance speaking turn (Neustein,
2001; 2004; 2007; 2011)
GIVING DIALOG MANAGER TWO CONTRASTIVE
SCENARIOS FOR CONCESSIVE CONNECTOR
Scenario 1: The current speaker, immediately
after producing the concessive
connector (“so”), displays a
“holding” silence (Jefferson, 1983;
1986) – an abrupt silence
accompanied by marked inhalation
that indicates the speaker’s intent to
“hold” the turn and continue
speaking.
Scenario 2: The current speaker, unlike the first
scenario, immediately after
producing a concessive connector
(“so”), displays a “trail off” silence –
a gradual silence accompanied by
exhalation – the kind of silence that
provides a clear transition relevance
place for the other speaker to begin
to speak.
AIDING THE DIALOG MANAGER BY ADDING
CONVERSATION ANALYSIS TO COMPUTATIONAL
LINGUISTICS
The two scenarios show that a concessive connector or clue
word does not always serve the purpose of connecting the
utterances of a multi-utterance turn because, as
demonstrated in the second scenario, the speaker’s intent
may be to yield the turn to the next speaker rather than to
continue speaking.
This is amply demonstrated by the speaker’s use of a clue
word followed by a “trail off” silence, which indicates the
speaker’s intent to relinquish his turn to the next speaker.
Dialog managers must be able to recognize some of the
formal properties of conversation interaction, such as the
difference between a “holding” silence and a “trail off”
silence to gauge what is happening at each turn.
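The contrast between the two silences can be sketched in code. This is only an illustration under stated assumptions, not the author's implementation: the `TurnEnd` record, its field values, and the `next_action` function are hypothetical names, and the sketch assumes that an upstream acoustic front end already labels the silence onset and the accompanying breath.

```python
from dataclasses import dataclass

# Clue words quoted in the slides; treating them as a closed set is an assumption.
CLUE_WORDS = {"so", "anyway", "now", "but", "although"}


@dataclass
class TurnEnd:
    last_word: str      # final word spoken before the silence
    silence_onset: str  # "abrupt" or "gradual" (hypothetical upstream label)
    breath: str         # "inhalation", "exhalation", or "none" (hypothetical label)


def next_action(turn_end: TurnEnd) -> str:
    """Return "wait", "take_turn", or "undetermined" for the dialog manager."""
    if turn_end.last_word.lower() not in CLUE_WORDS:
        return "undetermined"
    if turn_end.silence_onset == "abrupt" and turn_end.breath == "inhalation":
        return "wait"        # "holding" silence: the speaker keeps the turn
    if turn_end.silence_onset == "gradual" and turn_end.breath == "exhalation":
        return "take_turn"   # "trail off" silence: transition relevance place
    return "undetermined"
```

The same clue word ("so") leads to opposite decisions depending on the silence that follows it, which is the point of the two scenarios.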
LIMITATIONS OF NATURAL LANGUAGE UNDERSTANDING
METHODS FOR PARSING OF CALL CENTER DIALOG
Since it would be practically
impossible to construct grammars
that could cover all spontaneous
utterances, including all
concomitant disfluencies, robust
parsing of spontaneous speech has
proven to be a practical alternative
to the crafting of rule-based
grammars (Pieraccini, 2012, p. 162).
Using statistical modeling
“conceptual HMMs can find the
most probable concepts represented
by a sequence of words, just as
acoustic HMMs can find the most
probable phonemes for a given
sequence of acoustic observations.”
(p. 164).
Reduced to sequences of words
and their associated probabilities
depending on context, robust
parsing methods, however, can be
severely hampered when keywords
are not found in the dialog
(Neustein, 2001; 2006; 2007)
ANGRY UTTERANCES THAT ELUDE THE NATURAL
LANGUAGE UNDERSTANDING MODULE
Keywords associated with
anger and frustration
“cancel my
account”
“give me a
supervisor”
“I’m switching
to X
(competitor)”
When keywords are absent
from the dialog, the Spoken
Language Understanding
module fails to identify
angry/frustrated customers.
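A minimal sketch of why keyword spotting fails here, assuming a simple substring match over the three keyword phrases listed above. The function name `keyword_flags_anger` is hypothetical and the matcher is a deliberately naive stand-in for a real Spoken Language Understanding module.

```python
# Keyword phrases taken from the slide; the competitor name "X" is left open.
ANGER_KEYWORDS = ("cancel my account", "give me a supervisor", "i'm switching to")


def keyword_flags_anger(utterance: str) -> bool:
    """Return True only if a standard anger keyword occurs in the utterance."""
    # Normalize case and curly apostrophes before matching.
    text = utterance.lower().replace("\u2019", "'")
    return any(keyword in text for keyword in ANGER_KEYWORDS)


# An utterance with a standard keyword is caught.
caught = keyword_flags_anger("Just cancel my account")
# An irate utterance without any keyword goes unnoticed.
missed = keyword_flags_anger("Absolutely unbelievable! What is your? name")
```

Here `caught` is true and `missed` is false, even though the second utterance comes from a clearly irate caller.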
ADDING CONVERSATION ANALYSIS TO THE DESIGN OF
SPOKEN DIALOG SYSTEMS TO IDENTIFY SPEAKER STATE
GOALS:
To devise a parsing
method that builds
conversation analysis into
Spoken Dialog Systems
To base this parsing
method on a statistical
language modeling
approach to understanding
natural language dialog in
lieu of rule-based
grammars that anticipate
all constructions of
spontaneous utterances
and their associated
disfluencies
GOALS, continued
To build a BNF table (built upon
more elemental units) consisting of
a set of non-terminals – context-free grammatical units and their
related prosodic features for which
there is a corresponding list of
interchangeable terminals (words,
phrases, or a whole utterance)
(Neustein, 2007)
To build this multi-tiered BNF table
with an elaborate incremental
design of complex grammatical
units that capture the kind of
speaker state data
(angry/frustrated) that elude
natural language systems that
search for standard keywords (e.g.
“cancel my account”) (Neustein,
2006; 2011)
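One way to picture such a multi-tiered table is sketched below. It is a hedged illustration only: the dictionary layout, the function names, and the composition of the hypothetical `<Very Angry Complaint>` unit are assumptions, prosodic features are omitted, and the non-terminal labels and terminals are borrowed from the angry-caller illustration in this presentation.

```python
# Tier 1: non-terminals mapped to interchangeable terminals (words or phrases).
TIER_1 = {
    "<Exaggerative Qualifier>": ["absolutely unbelievable", "absolutely ridiculous"],
    "<Declarative Assertion>": ["i intend to take this much further"],
}

# Tier 2: a more complex unit defined in terms of tier-1 non-terminals.
# The composition chosen here is an assumption made for illustration.
TIER_2 = {
    "<Very Angry Complaint>": ["<Exaggerative Qualifier>", "<Declarative Assertion>"],
}


def tier1_labels(utterance: str) -> set:
    """Return the tier-1 non-terminals whose terminals occur in the utterance."""
    text = utterance.lower()
    return {
        label
        for label, terminals in TIER_1.items()
        if any(terminal in text for terminal in terminals)
    }


def tier2_labels(utterances: list) -> set:
    """Return tier-2 units whose tier-1 parts all occur somewhere in the dialog."""
    found = set()
    for utterance in utterances:
        found |= tier1_labels(utterance)
    return {
        label
        for label, parts in TIER_2.items()
        if all(part in found for part in parts)
    }
```

Because the higher tier is defined over non-terminals rather than literal keywords, a dialog can be labeled without containing a phrase such as "cancel my account".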
NEW NLU METHOD FOR BUILDING MULTI-TIERED
TABLE OF SPEAKER-STATE PARSING STRUCTURES
SEQUENCE PACKAGE ANALYSIS
(or SPA) constitutes a new NLU
method for classifying speaker
state (Neustein, 2001; 2004; 2006;
2011)
SPA algorithms identify in spoken
language dialog (and blogs,
tweets, and other social media)
the conversational sequence
patterns of natural language
dialog that reflect elusive,
sometimes confounding, human
emotions
SPA draws from the field of
conversation analysis, a rigorous,
empirically based method of
recording and transcribing verbal
interaction (using highly refined
transcription symbols to identify
linguistic and paralinguistic
features) to study how speakers
demonstrate, through the design
of their speaking turn, their
understanding and interpretation
of each other’s social actions
HOW DOES SPA WORK?
SPA relies more on the
sequence package (a series of
related turns and turn
construction units or parts of
turns that are discretely
packaged as a sequence of
conversational interaction) in
its entirety, as the primary unit
of analysis, than on isolated
syntactic parts
By marking sequence package
boundaries and specifying
package properties, the SPA-enhanced mining program
gives the software downstream
the contextual indicia—the
precise location points in the
flow of interactive dialog,
signifying the different
conversational activities and
phases of the dialog—needed
to interpret the rest of the data
stream reliably.
By parsing dialog for its
relevant sequence packages
(each discretely packaged as a
sequence of conversational
interaction), the SPA-designed
natural language interface
extracts important data,
including emotional content on
speaker state, by looking at the
sequential order and frequency
of the totality of the context-free grammatical components
that make up each sequence
package
ILLUSTRATION OF ANGRY CALLER IN THE ABSENCE OF
STANDARD KEYWORDS THAT SIGNIFY AN ANGRY CALLER
Note: Punctuation
symbols below are
acoustic and not
grammatical:
question marks
appear mid-sentence to indicate
an upward query at
that location point
in the dialog;
exclamatory marker
is used to indicate a
rise in inflection
• Caller: Absolutely
unbelievable! What is your?
name
• Agent: Mr. Smith
• Caller: Well! I intend to take
this much further…This is
just absolutely ridiculous!
SEQUENCE PACKAGES AND
CORRESPONDING ANGER INDEX
• Absolutely Unbelievable! <Exaggerative Qualifier> (8)
• What is your? name <Identification Request with
Inflection> (non sequitur; accusatory tone as
indicated by displaced (mid-sentence) inflection) (9)
• Well! <Exclamation with Prosody> (7)
• I intend to take this much further… <Declarative
Assertion> (9)
• This is absolutely ridiculous! <Exaggerative Qualifier>
(8)
Total Score for Customer Anger Index: 41
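The total can be checked with a short worked example. The data structure and the function name `anger_index` are hypothetical; the labels and scores are copied from the list above, and plain summation is assumed because the five scores add up to the stated total.

```python
# (utterance, parsing structure, score) triples copied from the slide.
SEQUENCE_PACKAGES = [
    ("Absolutely unbelievable!", "Exaggerative Qualifier", 8),
    ("What is your? name", "Identification Request with Inflection", 9),
    ("Well!", "Exclamation with Prosody", 7),
    ("I intend to take this much further...", "Declarative Assertion", 9),
    ("This is absolutely ridiculous!", "Exaggerative Qualifier", 8),
]


def anger_index(packages) -> int:
    """Sum the per-package scores into a single Customer Anger Index."""
    return sum(score for _utterance, _label, score in packages)


total = anger_index(SEQUENCE_PACKAGES)  # 8 + 9 + 7 + 9 + 8 = 41
```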
ACCRETION OF MORE ELEMENTAL PARSING
FEATURES IN ANGRY CALLER EXAMPLE
A “very angry complaint” is
illustrated on the BNF table
as the natural accretion of
its more elemental parsing
features:
• assertions
• exaggerations
• declarations
SALIENCE VALUE ATTRIBUTED TO PARSING
STRUCTURES
• Note: For the purposes of this
illustration, I am not addressing
the smaller POS grammatical units
that make up the larger parsing
structures, such as exaggerative
qualifiers or exclamation with
prosody, since it is a given that a
spoken language system would
identify the smaller units that
make up these larger parsing
structures.
Descriptors (“absolutely
unbelievable” “absolutely
ridiculous”) have “high
salience value” (they co-occur with the emotion
class “anger” or “surprise” as
opposed to a low salience
value ascribed to more
neutral words, such as
“continue” or “yes”); yet
there are still no “catch”
phrases or standard
keywords in dialog to
signify an irate caller
DESIGN OF DYNAMIC ELECTRONIC DICTIONARIES AND DYNAMIC
SEARCH ENGINES TO CONFORM TO HUMAN LANGUAGE USE
Language Use is Not a Static Process: The
meanings of words continually evolve as words
derive their meaning from their contextual
usage and in turn “reflexively” re-define
context through their use (e.g., political usage
of “transparency” or “bailout”)
Query-Based Search is Not a Static Process:
Users do not engage in what is known as
“precise search,” which presupposes that users
know exactly what to look for: a specific paper
identified by its title, authors, and major theme.
It is, therefore, not unusual for users to input
search terms that are different from index
terms used by the system (Kboubi, et al. 2012)
ELECTRONIC DICTIONARIES
Electronic dictionaries
“promise dynamic, proactive
search via multiple criteria
(meaning, sound, related
words) and via diverse access
routes.
Navigation (supported by our
understanding of the mental
lexicon and an integration of
these findings into the design
of electronic dictionaries)
takes place in a huge
conceptual lexical space, and
the results are displayable in a
multitude of forms (e.g. as
trees, as lists, as graphs, or
sorted alphabetically, by
topic, by frequency)” (Zock
and Rapp, 2012).
DYNAMIC AND FLEXIBLE SEARCH ENGINES
Kboubi, et al. (2012) Propose Alternative Search
Types:
• “thematic search” (allowing users to navigate the
corpus according to a particular theme)
• “connotative search” (allowing users to discover the
associated and similar concepts to their target
concepts)
• “exploratory search” (allowing users to ‘consult’
with the corpus so that they will derive a better idea
of what they were not able to initially define)
APPLYING CONVERSATION ANALYSIS TO STUDY
WEB USERS’ DYNAMIC SEARCH PROCESS
Moore (2012)
showed that web
searchers display
interactional
competencies
found in
conversational
dialog in the course
of their query
formulations
produced during
an Internet search:
• First, they formulate
their queries using
names for the entity that
occupies their online
search
• Second, they resort to
generic descriptions
when they don’t know
the name of the entity in
question
• Third, they use the newly
learned name (uncovered
during a generic search)
as opposed to generic
descriptions in all
subsequent searches
ONLINE SEARCH PROBLEMS AND THEIR
CONVERSATIONAL DIALOG COUNTERPART
REFERENCE GENERATION: In the process of performing reference
generation the speaker must move from his own “egocentric point of
view…to the listener’s position” so that the referring expressions are aimed at
the listener’s frame of reference and cognitive state (Zock, et al., 2012). In
conversation speakers display a preference for recognitionals [referring
expressions that are recognized/understood by the other speaker] as
“stronger than the preference for minimization” [use of a single reference
form, usually a name] (Sacks and Schegloff, 1979).
SIGNS OF TROUBLE: When listeners fail to recognize the referring
expression (e.g., the name of a person or object), speakers tend to become
verbose, giving multiple descriptions in an attempt to gain recognition over
concision. “[W]hen [online] search queries are verbose due to their
descriptive nature, they can be taken as signs of interactional trouble and of
a knowledge gap on the part of the user…The occurrence of verbose queries
suggests that the user chose to relax the preference for minimization
[usually a single reference form]…After a few failed queries… [users] often
formulated their queries as grammatical questions thereby increasing query
length” (Moore, 2012)
ARRIVING AT NATURAL LANGUAGE
SOLUTIONS TO WEB SEARCH CHALLENGES
Determine whether web searches that fail to bring up useful results are
caused by scarcity of available information on the web OR by users’ failure to use a
correct entity name.
Look for indications of what Moore (2012) calls “kitchen-sink” queries (in the absence
of proper entity names, users progressively pile on additional descriptive words and
phrases until they become unwieldy)
Break online search sessions into segments. Pay close attention to midway points in
users’ online search sessions where long/verbose queries (“kitchen-sink”
queries) signifying trouble are most likely to occur during difficult tasks (Moore 2012)
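The steps above can be sketched as a simple heuristic over a search session. The function name and both thresholds (`min_words`, `min_growth_steps`) are assumptions chosen for illustration, not values reported by Moore (2012); the sketch flags a query when it ends a run of progressively longer queries and is itself long.

```python
def find_kitchen_sink_queries(session, min_words=6, min_growth_steps=2):
    """Return indices of long queries that end a run of growing query lengths.

    session: list of query strings in the order the user issued them.
    min_words: hypothetical minimum word count for a "verbose" query.
    min_growth_steps: hypothetical number of consecutive length increases.
    """
    flagged = []
    growth = 0
    previous_length = None
    for index, query in enumerate(session):
        length = len(query.split())
        if previous_length is not None and length > previous_length:
            growth += 1   # the user piled on more descriptive words
        else:
            growth = 0    # the run of growing queries is broken
        if growth >= min_growth_steps and length >= min_words:
            flagged.append(index)
        previous_length = length
    return flagged


session = [
    "pot",
    "large pot",
    "large pot with lid",
    "large heavy pot with lid for oven",
    "casserole pot",
]
trouble_points = find_kitchen_sink_queries(session)
```

In this made-up session only the fourth query (index 3) is flagged; once the user learns the entity name, the query shortens again and the run resets.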
A NATURAL LANGUAGE HEURISTIC FOR
IMPROVING ONLINE SEARCH RESULTS
Similar to web-based
applications for
teaching foreign
speakers how to
perform reference
generation (Zock
et al., 2012), users
can be trained to
use the correct
search item in the
following way:
• Using both images of entity descriptors and text
descriptions that provide the best alternate match
for the user’s incorrect entity name, users can be
taught the correct entity name for future online
search purposes (e.g., “large pot” vs. “casserole pot”)
• Users can be presented with a pop up link labeled ‘I
don’t know what it’s called’ (Moore 2012) when they
first begin to demonstrate signs of trouble (e.g.,
repetitive “kitchen-sink” queries). This link may be
used as a standard query option (Moore 2012) or it
may be activated only if users display search
difficulties.
REFERENCES
Barnden, J. A., 2008.
Challenges in natural
language
understanding: the
case of metaphor
(commentary).
International Journal
of Speech
Technology, 11(3-4):
121-123.
Bel-Enguix, G.,
Jimenez-Lopez,
M.D., 2008.
Modelling dialog as
inter-action.
International Journal
of Speech
Technology, 11(3-4):
209-221.
Drew, P., Holt, E.,
1998. Figures of
speech: Figurative
expressions and the
management of topic
transition in
conversation.
Language in Society,
27(4): 495-522.
Kwong, O., 2012. New
Perspectives on
Computational and
Cognitive Strategies
for Word Sense
Disambiguation.
SpringerBriefs, Series
in Speech
Technology
(Neustein, A., Ed.)
Springer-Verlag,
Berlin Heidelberg
New York.
REFERENCES, CONT.
Jefferson, G., 1983.
On a failed
hypothesis:
“Conjunctionals” as
overlap vulnerable.
Tilburg Papers in
Language and
Literature, 28: 29-33.
Jefferson, G., 1986.
Notes on ‘latency’ in
overlap onset.
Human Studies, 9(2-3): 153-183.
Kboubi, F., Habacha
Chaibi, A., and
BenAhmed, M., 2012.
Semantic
visualization and
navigation in textual
corpus. International
Journal of
Information Sciences
and Techniques, 2
(1):53-63
Moeschler, J.,
Reboul, A., 1994.
Dictionnaire
Encyclopedique de
pragmatique. Seuil,
Paris.
REFERENCES, CONT.
Moore, R. J., forthcoming.
A Name is Worth a
Thousand Pictures:
Referential Practice
in Search Engine
Interactions. In A.
Neustein and J. M.
Markowitz (Eds.), Machine
Talk: The Next Generation
of Natural Language
Processing and Speech
Technology. Springer-Verlag, Berlin Heidelberg
New York.
Neustein, A., 2001. Using
sequence package analysis
to improve natural
language understanding.
International Journal of
Speech Technology, 4(1):
31-44.
Neustein, A., 2004.
Sequence Package
Analysis: a new natural
language understanding
method for performing
data mining of help-line
calls and doctor-patient
interviews. In B. Sharp
(Ed.), Proceedings of first
international workshop on
natural language
understanding and
cognitive science, ICEIS
2004, University of
Portugal, (April 13) pp. 64-74.
Neustein, A., 2006.
Sequence Package
Analysis: A new natural
language understanding
method for improving
human response in critical
systems. International
Journal of
Speech Technology, 9(3-4):
109-120.
REFERENCES, CONT.
Neustein, A., 2007. Sequence
Package Analysis: A new method for
intelligent mining of patient dialog,
blogs and help-line calls. Journal of
Computers, 2(10): 45-51.
Neustein, A., 2011. Sequence Package
Analysis and Soft Computing:
Introducing a new hybrid method to
adjust to the fluid and dynamic
nature of human speech. In E.
Corchado, V. Snasel, J. Sedano, A.E.
Hassanien, J.L. Calvo, and D. Slezak
(Eds.) Soft Computing Models in
Industrial and Environmental
Applications, 6th International
Conference SOCO2011: Advances in
Intelligent and Soft Computing,
Volume 87, Springer-Verlag,
Berlin Heidelberg New York, pp. 1-10.
Pieraccini, R., 2012. The Voice in the
Machine. MIT Press, Cambridge,
Mass.
REFERENCES, CONT.
Popescu, V., Caelen, J.,
and Burileanu, D., 2009.
A constraint satisfaction
approach to context-sensitive utterance
generation in multi-party dialogue systems.
International Journal of
Speech Technology, 12(2-3): 95-112.
Sacks, H., Schegloff, E.
A., 1979. Two preferences
in the organization of
reference to persons in
conversation and their
interaction. In G.
Psathas (Ed.), Everyday
Language: Studies in
Ethnomethodology.
Irvington Publishers,
Inc., New York, pp. 15-21.
Zock, M., Lapalme, G.,
and Yousfi-Monod, M.,
2012. Learn to speak like
normal people do: The
case of object
descriptions. In B. Sharp
and M. Zock (Eds.),
Proceedings of 9th
International Workshop
on Natural Language
Processing and
Cognitive Science.
Wroclaw University of
Economics, Wroclaw,
Poland (June 28, 2012).
Zock, M., Rapp, R., 2012.
Cognitive Aspects of the
Lexicon (CogALex-III).
Workshop in
conjunction with the
24th International
Conference on
Computational
Linguistics, Mumbai,
India, (December 8-15,
2012)